Imports and the Community II
Thanks for the feedback on the previous post on imports; it got me thinking about whether my model had been over-elaborate. Specifically, I wondered about the limit on the model agents, which capped the maximum “completeness” level at which they would contribute. I thought this was a necessary part of the model, but after investigation it turns out it isn’t.
The following graph shows what happens when this restriction is removed. Note that I’ve had to run the simulation at higher levels of import for the effects to be really visible. If editors really do continue to contribute after the map has passed their personal threshold of “good enough” (or they don’t have such a threshold), then imports clearly have less of an effect than they do when such a threshold is assumed. But they still have a detrimental effect.
Let’s examine the evidence, taken from the changeset history of OpenStreetMap. If people don’t have a threshold then we would expect to see, in areas which are very well-mapped, a similar editor distribution in later edits as in earlier edits. The editor distribution plots the cumulative number of edits (in this case changesets) against the cumulative number of editors. For comparison, it looks a lot like the income inequality charts (Lorenz curves) used to calculate the Gini coefficient. Here’s the chart for London, UK.
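As a rough illustration of the editor distribution described above, here is a minimal sketch of how a Lorenz-style curve and a Gini coefficient could be computed from a list of changeset authors. The function names and the input format (one editor ID per changeset) are assumptions made for this example, not the code used to produce the charts in this post.

```python
from collections import Counter


def lorenz_curve(editor_ids):
    """Return Lorenz-curve points for changesets per editor.

    editor_ids: one entry per changeset, naming its author (assumed format).
    Each point is (cumulative share of editors, cumulative share of edits),
    with the least active editors first.
    """
    counts = sorted(Counter(editor_ids).values())
    total = sum(counts)
    n = len(counts)
    points = [(0.0, 0.0)]
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        points.append((i / n, running / total))
    return points


def gini(points):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    return 1 - 2 * area


# Three changesets by editor "a", one by editor "b".
curve = lorenz_curve(["a", "a", "a", "b"])
print(curve, gini(curve))
```

A Gini value of 0 means every editor made the same number of changesets; values closer to 1 mean a few editors account for most of the edits.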
The red curve represents those edits which are in the top 100 most recent for any particular bounding box, the green line represents those which aren’t. The reason for using recency count, rather than the time of the changeset, is so that there’s no bias towards recently-mapped areas; the 100th changeset in an area is always the 100th changeset regardless of history.
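The recency split could be sketched roughly as follows. This assumes each changeset has already been assigned to a bounding box and carries some sortable key (a timestamp or changeset ID); the tuple layout and the name `split_by_recency` are hypothetical, chosen for the example.

```python
from collections import defaultdict


def split_by_recency(changesets, top_n=100):
    """Split changeset authors into 'recent' and 'older' groups per area.

    changesets: iterable of (area_id, sort_key, user) tuples (assumed layout),
    where a larger sort_key means a more recent changeset.
    Returns (recent_users, older_users), one entry per changeset.
    """
    by_area = defaultdict(list)
    for area, key, user in changesets:
        by_area[area].append((key, user))

    recent, older = [], []
    for items in by_area.values():
        # Rank within the area, newest first, so the cut-off depends on
        # position in the area's own history rather than on calendar time.
        items.sort(key=lambda pair: pair[0], reverse=True)
        recent.extend(user for _, user in items[:top_n])
        older.extend(user for _, user in items[top_n:])
    return recent, older


sample = [("A", 1, "x"), ("A", 2, "y"), ("A", 3, "z"), ("B", 1, "x")]
print(split_by_recency(sample, top_n=2))
```

Each of the two returned lists can then be turned into its own editor distribution curve.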
It’s clear that the more recent edits are less equally distributed amongst the users, but the gap between them isn’t huge. Possibly this is an artefact of choosing London as the test-area, as it was the first place to be edited in OpenStreetMap, so maybe has a weird editing history. Let’s look at Den Haag, NL instead.
The same effect is visible, but not as pronounced. There is no firm conclusion to draw from this, but maybe a slight suggestion that there is a difference between recent editor activity and older editor activity in the same area. There are some problems with this approach, however. Primary amongst them: not all changesets are equal, and counting them in this fashion makes assumptions about the statistical distribution of work (FSVO “work”) within changesets.
Let’s look at some primary data then: the number of unique users per month in particular areas. First up, the USA.
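Counting unique users per month is simple enough to sketch. The version below assumes the changesets for one area are available as (timestamp, user) pairs; that layout and the function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def unique_users_per_month(changesets):
    """Count distinct users per calendar month.

    changesets: iterable of (datetime, user) pairs (assumed layout).
    Returns a dict mapping (year, month) to the number of distinct users,
    ordered chronologically.
    """
    users = defaultdict(set)
    for timestamp, user in changesets:
        users[(timestamp.year, timestamp.month)].add(user)
    return {month: len(names) for month, names in sorted(users.items())}


sample = [
    (datetime(2008, 1, 3), "x"),
    (datetime(2008, 1, 20), "x"),  # same user, same month: counted once
    (datetime(2008, 1, 25), "y"),
    (datetime(2008, 2, 1), "x"),
]
print(unique_users_per_month(sample))
```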
The grey areas behind the curve are the periods in which TIGER data was being imported. Interestingly the first import doesn’t seem to have had much effect. About three months after the second, though, the editor growth rate seems to have dropped off. Maybe it’s an artefact, due to the low population density or something. Let’s look at somewhere with a higher density, the Netherlands.
The grey area, again, is the import. And, again, there’s a drop-off about three months after the import is finished. Maybe everywhere has a drop-off around the beginning of 2008? Let’s look at some more places.
Both Germany and Denmark had drop-offs in editor growth, albeit about 6 months after the Netherlands. But the UK seems to have had its drop-off much earlier, or maybe it just hasn’t had it yet. The kink in the graph, then, seems to be something natural (well, I fitted two curves, so I’d be surprised if it didn’t kink somewhere), but USA and NL kink earlier and lower than any of the others except the UK. The UK, though, has the highest current fit growth rate, so maybe it wasn’t a good candidate for a two-line fit.
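For readers wondering what a two-line fit might look like, here is one possible sketch: try every breakpoint, fit a least-squares straight line on each side, and keep the split with the smallest total squared error. It assumes straight-line segments and an exhaustive breakpoint search, which may differ from the fitting procedure behind the graphs above; the function names are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, squared error).

    Assumes the x values are not all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def two_line_fit(xs, ys, min_points=2):
    """Find the split index k minimising the combined error of two lines.

    Returns (total_sse, k, (slope, intercept) left, (slope, intercept) right),
    where the left line covers xs[:k] and the right line covers xs[k:].
    """
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        left = fit_line(xs[:k], ys[:k])
        right = fit_line(xs[k:], ys[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, k, left[:2], right[:2])
    return best


# Synthetic series: fast growth for five months, then slower growth.
months = list(range(10))
editors = [2 * m for m in range(5)] + [9 + 0.5 * (m - 5) for m in range(5, 10)]
print(two_line_fit(months, editors))
```

As the parenthetical above hints, a fit like this always finds a kink somewhere, so the position of the breakpoint on its own says little without comparing regions.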
In conclusion, even without the assumption that agent-editors have a threshold for contributions, the theoretical model still predicts that imports damage the growth of the editor community. There’s no conclusive evidence for this in practice, although there is some circumstantial evidence. At this point it’s difficult to say for sure whether the effects shown here are due to imports, or due to seasonal effects or anything else. For answers to those questions we’ll need more sophisticated analyses or more data.
September 10th, 2009





