On database wrangling

Leif Johnson — 23 Apr 2009, 15:04

Wasn't I just sitting here writing about time passing quickly? Well, as the end of the semester approaches and I find myself increasingly stressed, time just zips right along!

Kohonen maps

This week I wrote up a little JavaScript visualization of a bimodal Kohonen map that I think is pretty ok. The dataset stored in the map is a sequence of %% (\rho, \theta, x, y) %% tuples. The %%x%% and %%y%% components of each tuple identify a single point in the 2-dimensional Cartesian plane, and the %%\rho%% and %%\theta%% components describe the normal vector to a line in the same plane. The point %%(x, y)%% is constrained to lie on the line that's normal to the %%\rho%%-%%\theta%% vector, so any three of the four dimensions are sufficient to calculate the fourth. (This is actually not entirely true, since %%\theta%% can take on two values for any particular %%\rho%%, %%x%%, and %%y%%. But the other dimensions are ok.)
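The two-valued %%\theta%% can be seen in a small sketch. It assumes the usual normal-form line equation, %%\rho = x \cos\theta + y \sin\theta%%, which the post does not spell out, and the function names `rho_from` and `thetas_from` are made up for illustration.

```python
import math

TWO_PI = 2.0 * math.pi

def rho_from(theta, x, y):
    """Signed distance from the origin to the line through (x, y) with normal angle theta."""
    return x * math.cos(theta) + y * math.sin(theta)

def thetas_from(rho, x, y):
    """Both normal angles in [0, 2*pi) that are consistent with rho, x and y."""
    r = math.hypot(x, y)
    if r == 0.0:
        raise ValueError("theta is unconstrained when (x, y) is the origin")
    ratio = rho / r
    if abs(ratio) > 1.0 + 1e-12:
        raise ValueError("no line at distance rho passes through (x, y)")
    ratio = max(-1.0, min(1.0, ratio))  # guard acos against rounding error
    phi = math.atan2(y, x)              # direction of the point (x, y)
    delta = math.acos(ratio)            # angle between the normal and that direction
    return [(phi + delta) % TWO_PI, (phi - delta) % TWO_PI]
```

Under that assumption, `thetas_from(rho_from(1.0, 0.5, -1.2), 0.5, -1.2)` gives two candidates, one of which is the original angle; the other describes a different line through the same point at the same distance from the origin.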

A training dataset consists of randomly sampled points from this 4-dimensional space, such that the constraints are met. The sampling process draws %%x%% and %%y%% from Normal(0, 1), and %%\theta%% from Uniform(0, %%2\pi%%). Then %%\rho%% is calculated using the three sampled values.
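The sampling process might look roughly like the following. This is a sketch rather than the code behind the post: it assumes %%\rho = x \cos\theta + y \sin\theta%% as the constraint, and `sample_point` and `sample_dataset` are illustrative names.

```python
import math
import random

def sample_point(rng):
    """Draw one (rho, theta, x, y) tuple that satisfies the line constraint."""
    x = rng.gauss(0.0, 1.0)                  # x ~ Normal(0, 1)
    y = rng.gauss(0.0, 1.0)                  # y ~ Normal(0, 1)
    theta = rng.uniform(0.0, 2.0 * math.pi)  # theta ~ Uniform(0, 2*pi)
    # Assumed constraint: (x, y) lies on the line with unit normal
    # (cos theta, sin theta) at signed distance rho from the origin.
    rho = x * math.cos(theta) + y * math.sin(theta)
    return rho, theta, x, y

def sample_dataset(n, seed=0):
    """Draw n constrained tuples from a seeded generator for repeatability."""
    rng = random.Random(seed)
    return [sample_point(rng) for _ in range(n)]
```

Note that %%\rho%% comes out signed under this convention, so it can be negative.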

The bimodal map actually consists of three separate Kohonen maps, each of the same topology—in this dataset, I used 2-dimensional rectangular grid maps with 50 rows and 50 columns. One map stores the “external” cues (the %%\rho%% values), one map stores the “internal” responses (the %%(\theta, x, y)%% values), and the third stores the Jacobian that maps changes in the internal values to changes in the external value.

The external map is presented with the %%\rho%% component of each data point, and the coordinates of the winner are used to identify both the internal vector and Jacobian matrix for that external cue. These values are mapped onto a series of (hopefully increasingly accurate) %%\rho%% values. This process is described by [Ritter, Martinetz and Schulten][] in their work on controlling a robot arm.
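Here is a rough sketch of the lookup step: find the winner in the external map, then read the internal vector and Jacobian stored at the same grid coordinates. Several things are assumptions for illustration only. The grids are filled with values computed from %%\rho = x \cos\theta + y \sin\theta%% instead of being trained, the first-order estimate in `predict_rho` is just one reading of how a stored Jacobian could be used, and none of this is the update rule from the post or from Ritter, Martinetz and Schulten.

```python
import math
import random

ROWS, COLS = 5, 5  # the post uses 50 x 50; kept small for the sketch

def make_maps(seed=0):
    """Fill three same-shaped grids with hand-computed, untrained values."""
    rng = random.Random(seed)
    external, internal, jacobian = {}, {}, {}
    for row in range(ROWS):
        for col in range(COLS):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            cos_t, sin_t = math.cos(theta), math.sin(theta)
            internal[row, col] = (theta, x, y)
            external[row, col] = x * cos_t + y * sin_t
            # Partial derivatives of rho with respect to (theta, x, y).
            jacobian[row, col] = (-x * sin_t + y * cos_t, cos_t, sin_t)
    return external, internal, jacobian

def winner(external, rho):
    """Grid coordinates of the external unit closest to the cue rho."""
    return min(external, key=lambda cell: abs(external[cell] - rho))

def predict_rho(maps, cell, response):
    """First-order estimate of rho for an internal response near the stored one."""
    external, internal, jacobian = maps
    deltas = [r - s for r, s in zip(response, internal[cell])]
    return external[cell] + sum(j * d for j, d in zip(jacobian[cell], deltas))
```

The point of the shared coordinates is that a single `winner` call indexes all three maps, so the external cue alone selects both the response and its local linear model.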

The maps have actually been working fairly well. There are some issues remaining with initialization, since the map tends to explode into NaN territory relatively often, but in general I've been happy with the results. The next step is to generate a more biologically relevant dataset by simulating the motions of a human shoulder/elbow(/wrist eventually).

I'm also working on putting my Python Kohonen map code up on Bitbucket.

MySQL boo hoos

On a completely separate front, I'm finding myself once again butting heads with MySQL. As hokey as it may be to complain about such things, I thought I'd gotten away from battling with this relational database system once I left my post in the lawless territory of corporate software. But MySQL is just too useful for my task: I'm implementing the algorithm described in "Finding community structure in very large networks", but because the graph in question has about 3 million vertices and 70 million edges, I just can't hold the thing in memory very well. So it's off to the races with the MySQL documentation, setting variables like key_buffer_size and such nonsense. Mostly I'm just whining; it's really the graph's fault that these operations take so much time. But waiting for a database to finish is worse than watching paint dry: you can't even blow on it to get it to hurry up already.
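For a flavor of what keeping the graph in a database looks like, here is a minimal sketch that stores an edge list in a table and computes vertex degrees with a query. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere, and the table and column names (`edges`, `src`, `dst`) are hypothetical, not the schema used for the real graph.

```python
import sqlite3

def build_edge_table(conn, edges):
    """Store an undirected edge list as (src, dst) rows, indexed on both columns."""
    conn.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
    conn.executemany("INSERT INTO edges (src, dst) VALUES (?, ?)", edges)
    # Index both endpoints so per-vertex lookups don't scan the whole table.
    conn.execute("CREATE INDEX edges_src ON edges (src)")
    conn.execute("CREATE INDEX edges_dst ON edges (dst)")
    conn.commit()

def vertex_degrees(conn):
    """Degree of every vertex, counting each edge once at each endpoint."""
    query = """
        SELECT v, COUNT(*) FROM (
            SELECT src AS v FROM edges
            UNION ALL
            SELECT dst AS v FROM edges
        ) AS endpoints
        GROUP BY v
        ORDER BY v
    """
    return dict(conn.execute(query).fetchall())
```

With a toy graph, `build_edge_table(conn, [(1, 2), (2, 3), (1, 3), (3, 4)])` on an in-memory connection followed by `vertex_degrees(conn)` returns one degree per vertex. At 70 million edges the same query is where the waiting starts.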

The softer side

The garden is going fairly well, but I'm a bit worried that the peas are drying out: their bottom leaves are all shriveled and yellow, but the top leaves seem fine. I think it might just be a growth strategy, so I'm leaving it alone for now. I have a couple of photos to put up, whenever I get photos set up on this web site (soon!).

I've been refreshing my hands on the piano more often lately, which has been quite pleasant. It's difficult to play so badly right now, knowing that the only thing lying between where I am now and where I want to get is a lot of practice time. Right now I'm still struggling with reading the music; I think a lot of my practice time will improve once I've learned the mapping between black dots on paper and motions of my hands.

Finally, I think I've just managed to squeeze one game of disc golf in since the last update, but this time I did remember my stroke counts on each hole, so I put them on a neato graph:

It'll be fun to track this over time. It's too bad the labels in the legend aren't sorted, but whatevs.