The World

[as I find it]

Split Experiments in NetLogo XML


I was recently approached by Michael Weisberg about a problem he was having using NetLogo. He has designed a NetLogo model and is running many repetitions of it to collect a broad dataset. This process is getting computationally expensive, and he wanted to farm out the experiment runs to different threads to speed up processing time.

I don’t know all the details, but it turns out that NetLogo, despite being Java-based, is not multithreaded, so Professor Weisberg is running these jobs on separate NetLogo instances running on separate machines, linked up by Apple’s Xgrid. He needed a way to take an experiment definition and split up the runs, so that the work could be spread to different nodes in the grid.

NetLogo provides a way to define an experiment using XML. Here’s an example Michael sent me:

<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE experiments SYSTEM "behaviorspace.dtd">
<experiments>
  <experiment name="Radius" repetitions="25" runMetricsEveryStep="false">
    <setup>setup</setup>
    <go>go</go>
    <timeLimit steps="500"/>
    <metric>count patches with [item 0 visited > 0]</metric>
    <metric>count patches with [item 1 visited > 0]</metric>
    <metric>count patches with [item 0 visited > 0 and first significance > 0]</metric>
    <metric>count patches with [item 1 visited > 0 and first significance > 0]</metric>
    <metric>count patches with [first significance > 0]</metric>
    <metric>count followers with [previous_significance > 0]</metric>
    <metric>count mavericks with [previous_significance > 0]</metric>
    <metric>count followers with [int xcor = 25 and int ycor = 25] + count followers with [int xcor = -5 and int ycor = -5]</metric>
    <metric>count mavericks with [int xcor = 25 and int ycor = 25] + count mavericks with [int xcor = -5 and int ycor = -5]</metric>
    <enumeratedValueSet variable="number_of_mavericks">
      <value value="200"/>
    </enumeratedValueSet>
    <enumeratedValueSet variable="number_of_control">
      <value value="0"/>
    </enumeratedValueSet>
    <enumeratedValueSet variable="number_of_followers">
      <value value="0"/>
    </enumeratedValueSet>
    <steppedValueSet variable="community_size" first="1" step="0.5" last="20"/>
  </experiment>
</experiments>

The key node to notice is the <steppedValueSet> node, which defines how a variable should change across runs of the experiment. He wanted to be able to “chunk” this range of values into multiple experiments. For example, the community_size variable defined above runs from 1 to 20 in steps of 0.5, which gives (20 - 1) / 0.5 + 1 = 39 distinct values; with repetitions="25", each value is run 25 times, for 975 runs in all. He wanted to define a chunk size of 10, so that each chunk covers roughly 10 units of community_size, and end up with two experiment nodes that look as follows:

<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE experiments SYSTEM "behaviorspace.dtd">
<experiments>
  <experiment name="Radius-1" repetitions="25" runMetricsEveryStep="false">
    <!-- ... -->
    <steppedValueSet variable="community_size" first="1" step="0.5" last="10"/>
  </experiment>
  <experiment name="Radius-2" repetitions="25" runMetricsEveryStep="false">
    <!-- ... -->
    <!-- starts one step after Radius-1 ends, so the value 10 is not run twice -->
    <steppedValueSet variable="community_size" first="10.5" step="0.5" last="20"/>
  </experiment>
</experiments>
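As a rough illustration of the arithmetic, here is a minimal Python sketch that enumerates the values of a stepped range and cuts them into non-overlapping chunks. The function names are hypothetical, and the boundary rule (a chunk starting at value lo holds every value from lo up to, but not including, lo + chunk_size) is my own assumption, so its endpoints differ slightly from the hand-written example above.

```python
from decimal import Decimal


def stepped_values(first, step, last):
    """List every value a steppedValueSet covers, from first to last inclusive.

    Arguments are converted with Decimal; passing the XML attribute strings
    (e.g. "0.5") keeps steps such as "0.1" free of float rounding drift.
    """
    first, step, last = Decimal(first), Decimal(step), Decimal(last)
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    v = first
    while v <= last:
        values.append(v)
        v += step
    return values


def chunk_stepped(first, step, last, chunk_size):
    """Split a stepped range into non-overlapping (first, last) chunks.

    Hypothetical rule: chunk_size is in the variable's own units, and a chunk
    starting at value lo holds every value v with lo <= v < lo + chunk_size.
    """
    width = Decimal(chunk_size)
    if width <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    current = []  # values collected for the chunk being built
    for v in stepped_values(first, step, last):
        if current and v >= current[0] + width:
            # v falls outside the current chunk: close it and start a new one
            chunks.append((current[0], current[-1]))
            current = []
        current.append(v)
    if current:
        chunks.append((current[0], current[-1]))
    return chunks
```

With the community_size attributes, chunk_stepped("1", "0.5", "20", 10) returns two chunks, running from 1 to 10.5 and from 11 to 20, and stepped_values on the same range returns the 39 values counted above.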

This is a simple enough manipulation when you only have one stepped variable: all you have to do is read the steppedValueSet node, make one copy of its containing experiment node per chunk, and adjust the first and last attributes of each copy’s steppedValueSet to the endpoints of that chunk. But when there is more than one steppedValueSet node per experiment, it gets a little trickier: now you need an experiment node for each tuple in the Cartesian product of the variables’ chunk lists. If you have two variables with two chunks each, you need 2*2 = 4 experiment nodes; if you have three variables with four chunks each, you need 4*4*4 = 64 experiment nodes.
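The counting rule can be checked with a short sketch built on itertools.product. The function names are illustrative, not part of NetLogo or of the module described in this post, and each variable’s chunks are assumed to be given as (first, last) pairs.

```python
from itertools import product
from math import prod


def experiment_combinations(chunks_by_variable):
    """Yield one {variable: (first, last)} dict per experiment node needed.

    chunks_by_variable maps each steppedValueSet variable to its list of
    (first, last) chunks; the experiments needed are the Cartesian product
    of those lists, with the first variable varying slowest.
    """
    names = list(chunks_by_variable)
    for combo in product(*(chunks_by_variable[n] for n in names)):
        yield dict(zip(names, combo))


def experiment_count(chunks_by_variable):
    """Number of experiment nodes: the product of the per-variable chunk counts."""
    return prod(len(chunks) for chunks in chunks_by_variable.values())
```

For three variables with four chunks each, experiment_count returns 64, matching the arithmetic above. Because experiment_combinations is a generator, you can check the count before building any XML.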

I wrote some Python code to perform this manipulation; you can get it here. For every experiment node in an input XML document, it reads the steppedValueSet nodes, chunks them according to sizes you define for each variable, and creates a new steppedValueSet node for each chunk. It then creates an experiment node for every ordered tuple in the Cartesian product of the sets of steppedValueSet nodes, and adds those experiments to a new XML document.
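The original module isn’t reproduced in this post, so here is a minimal sketch of the same idea using only the standard library (xml.dom.minidom, itertools, decimal). It is not the nlogoxml code: split_xml and _chunks are hypothetical names, it edits the parsed document in place rather than building a new one, and it assumes chunk sizes are measured in each variable’s own units, with a chunk starting at lo holding the values lo <= v < lo + chunk_size.

```python
from decimal import Decimal
from itertools import product
from xml.dom import minidom


def _chunks(first, step, last, width):
    """Non-overlapping (lo, hi) chunks; each holds values lo <= v < lo + width."""
    first, step, last, width = (Decimal(x) for x in (first, step, last, width))
    if step <= 0 or width <= 0:
        raise ValueError("step and chunk size must be positive")
    chunks, current, v = [], [], first
    while v <= last:
        if current and v >= current[0] + width:
            chunks.append((current[0], current[-1]))
            current = []
        current.append(v)
        v += step
    if current:
        chunks.append((current[0], current[-1]))
    return chunks


def _chunked_nodes(experiment, chunk_sizes):
    """steppedValueSet nodes (in document order) whose variable has a chunk size."""
    return [s for s in experiment.getElementsByTagName("steppedValueSet")
            if s.getAttribute("variable") in chunk_sizes]


def split_xml(xml_text, chunk_sizes):
    """Return XML text with each experiment replaced by its chunked copies.

    chunk_sizes maps a steppedValueSet variable name to a chunk width;
    variables not listed are left whole.
    """
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    for exp in list(root.getElementsByTagName("experiment")):
        stepped = _chunked_nodes(exp, chunk_sizes)
        if not stepped:
            continue  # nothing to split in this experiment
        options = [
            _chunks(s.getAttribute("first"), s.getAttribute("step"),
                    s.getAttribute("last"), chunk_sizes[s.getAttribute("variable")])
            for s in stepped
        ]
        name = exp.getAttribute("name")
        # one deep copy of the experiment per tuple in the Cartesian product
        for i, combo in enumerate(product(*options), start=1):
            clone = exp.cloneNode(True)
            clone.setAttribute("name", f"{name}-{i}")
            # the clone's nodes come back in the same order, lining up with combo
            for node, (lo, hi) in zip(_chunked_nodes(clone, chunk_sizes), combo):
                node.setAttribute("first", str(lo))
                node.setAttribute("last", str(hi))
            root.insertBefore(clone, exp)
        root.removeChild(exp)
    return doc.toxml()
```

With {"community_size": 10}, the Radius experiment should come out as Radius-1 and Radius-2; experiments with no chunked variables are left as they were. Like the original, this is DOM-based, so the whole output sits in memory.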

Here’s how you use the code:

# import the split_experiments function
>>> from nlogoxml import split_experiments
>>> from xml.dom import minidom

# open the input file and parse it into a DOM Document object
>>> with open("experiment.xml") as f:
...     doc = minidom.parse(f)
...

# define chunk sizes for each variable:
# keys should be variable names exactly as they appear in the original XML
# variables with no defined chunk size will not be chunked
>>> chunk_sizes = {
...   'var_1' : 10,
...   'var_2' : 2,
...   'var_3' : 34,
...   # ...
... }

# use split_experiments to obtain a new DOM Document instance
>>> new_doc = split_experiments(doc, chunk_sizes)

# do whatever post-processing on new_doc you want; here we just write it out
# (writexml returns None, so the REPL doesn't echo a character count)
>>> with open("output_file.xml", "w") as f2:
...     new_doc.writexml(f2)
...

A few warnings: first of all, this code is NOT extensively tested. I tried it on the sample XML file Michael Weisberg sent me, and it works for that, but I have not tried it with more complex experiment definitions. Second, the code for the split_experiments function is not very modular and doesn’t contain much error handling, so please let me know if you find it breaking down in particular places, or if you see functions that would be useful to abstract out. Third, all of the XML processing is DOM-based, so the entire output document is held in memory. That can add up quickly: the number of experiment nodes is the product of the chunk counts for every steppedValueSet variable, so it grows with the number of chunks per variable and exponentially with the number of stepped variables. This is probably not a problem if you have dedicated machines for running NetLogo experiments, but keep it in mind.

Please let me know if you find this program useful!

Written by whereofwecannotspeak

June 11, 2008 at 12:11 pm

Posted in Free Software
