Imports and the Community
September 6th, 2009
I’ve been thinking recently about the effects of imports on the OpenStreetMap community. But there’s no real hard data, and repeatable experiments would be difficult if not impossible. So I’ve turned to that old “what if” machine: the Monte Carlo simulation.
A caveat before reading the whole post: I’ve no way of backing any of this up, so treat it as an “informed guess”. There are assumptions about the dynamics of the community which I’ve used in the model which may, or may not, be true in reality.
The first case looks at the effect of an initial import (i.e. one done before the community simulation starts) on the completion of the map. What I’m simulating is thousands of contributing agents. New agents are generated based on levels of local activity, on the assumption that many mappers are recruited either through friends or by local events. There’s also a background, random level of recruitment from news articles, long-range friends, etc. All contributor sign-up is linked to a population density field so that there are more mappers in urban areas, as there are in real life. Mappers also have a “comfort radius” within which they will map, but not go beyond, which is probably the case for most casual mappers. Finally, there is a “completeness” threshold for each mapper which controls the extent to which they’ll “complete” the map. This simulates an effect I’ve seen in real life: different people care about, and will map, different things, and few people will map absolutely everything and anything.
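To make the moving parts concrete, here is a minimal sketch of a model along these lines. It is not the code behind the graphs in this post. The 1-D strip of cells, the triangular density field, the function name `simulate`, and every parameter value (`background`, `local_boost`, `edit_step`, the radius and threshold ranges) are assumptions chosen purely for illustration.

```python
import random

def simulate(n_cells=50, steps=200, initial_import=0.0,
             background=0.002, local_boost=0.05, edit_step=0.05, seed=0):
    """Toy agent model: returns (mean completeness per step, mapper count)."""
    rng = random.Random(seed)
    half = n_cells / 2
    # Assumed density field: "urban" in the middle, "rural" at the edges.
    density = [0.1 + 0.9 * (1 - abs(i - half) / half) for i in range(n_cells)]
    # An initial import raises every cell to the same level before mapping starts.
    completeness = [initial_import] * n_cells
    mappers = []   # each mapper is (home cell, comfort radius, completeness threshold)
    history = []
    for _ in range(steps):
        activity = [0.0] * n_cells
        for home, radius, threshold in mappers:
            # A mapper only ever works within the comfort radius of home.
            lo = max(0, home - radius)
            hi = min(n_cells - 1, home + radius)
            cell = rng.randint(lo, hi)
            # They stop caring once the cell is as complete as they want it.
            if completeness[cell] < threshold:
                completeness[cell] = min(threshold, completeness[cell] + edit_step)
                activity[cell] += 1.0
        for i in range(n_cells):
            # Recruitment: background rate plus a boost from local activity,
            # both scaled by population density.
            p = density[i] * (background + local_boost * activity[i])
            if rng.random() < min(1.0, p):
                mappers.append((i, rng.randint(1, 3), rng.uniform(0.3, 1.0)))
        history.append(sum(completeness) / n_cells)
    return history, len(mappers)
```

Calling `simulate(initial_import=0.0)` and `simulate(initial_import=0.5)` and comparing the two returned histories is the kind of experiment described here. In this sketch the only way an import can hurt is by leaving mappers with nothing they care to add, which reduces local activity and therefore recruitment.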
Anyway, on to the results. The first graph is instantly informative:

Basically, the more we start with, the less we end up with and the longer it takes to get there. The worrying thing is that the asymptotic level of “completeness” (for whatever value you consider as an indicator of quality) is inversely related to the level of initial import. The asymptotic level of completion isn’t 100% because the model accounts for very rural areas, where recruitment is poor, and doesn’t account for the mobility of mappers: a simplifying assumption is that they’ll stay within their “comfort radius” of home.
But we’ve got communities everywhere. So what happens when the import is done after the community has started growing, and started mapping its area?

Hmmm… Pretty much the same thing, but less so. A word here, though, about the “import” that I’m doing into the model. It’s a best-case import which completes the map to a certain degree everywhere and doesn’t stomp on, duplicate or otherwise maul anyone’s existing data, so I don’t model the corresponding drop in the community which might result from people feeling their efforts were steam-rollered. Also, the “import” is considered to be good data, something which is rarely the case in the real world.
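A best-case import of this kind can be sketched as an operation that only ever fills gaps. The function name `best_case_import` and the list-of-floats representation of the map are assumptions made for illustration, not the model's actual code.

```python
def best_case_import(completeness, level):
    """Raise every cell to at least `level` without lowering existing work."""
    # max() keeps whatever mappers have already done if it exceeds the import.
    return [max(c, level) for c in completeness]
```

For example, importing at level 0.5 into cells at 0.0, 0.2 and 0.9 leaves them at 0.5, 0.5 and 0.9: the well-mapped cell is untouched, which is exactly the optimistic assumption being made.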
But what happens when the import covers only a part of the total area?

This is better still, if only because the impact of the import is localised to a part of the community. But what happens if the import isn’t contiguous, but spread randomly over a fraction of the area?

Ah… Finally, some benefit to the imports — there doesn’t seem to be any long-term damage to the community and the “completeness” levels out at roughly the same place. The “import” here was to a small fraction of the area and, again, I didn’t simulate any direct effect on the community, or any time lost in fixing-up and integrating such data into the existing map.
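The difference between the contiguous and the scattered partial import comes down to which cells receive data. A rough sketch of the two selections follows, assuming the area is a row of numbered cells; the function names and the `fraction` and `seed` parameters are hypothetical.

```python
import random

def contiguous_cells(n_cells, fraction, start=0):
    """Pick one unbroken block covering roughly `fraction` of the area."""
    count = round(n_cells * fraction)
    return list(range(start, min(n_cells, start + count)))

def scattered_cells(n_cells, fraction, seed=0):
    """Pick the same number of cells, spread at random over the whole area."""
    count = round(n_cells * fraction)
    return sorted(random.Random(seed).sample(range(n_cells), count))
```

Both selections cover the same share of the map. The contiguous block concentrates the whole import on one local community, while the scattered one leaves every community with un-imported cells inside its comfort radius.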
So what happens if we break an import up into lots of mini-imports and spread them out so that the rate of importing is roughly constant?

Interestingly, here we can see even less of a long-term effect on the rate of completion. In fact, all the curves seem to converge pretty rapidly over time. However, that also means there is little to gain: if you’re looking for 90% completion, it seems the approach with incremental imports makes no measurable difference to the time taken, or to the final levels of completion.
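One way to picture the incremental approach is as a schedule of equal-sized batches at a fixed interval. This is only a sketch: the function name `incremental_schedule` and its parameters are assumptions, and the batch sizes used for the graph above are not given here.

```python
def incremental_schedule(total_level, n_batches, start_step, interval):
    """Return (step, cumulative import level) pairs for equal mini-imports."""
    return [
        (start_step + k * interval, total_level * (k + 1) / n_batches)
        for k in range(n_batches)
    ]
```

With `incremental_schedule(0.5, 5, 10, 20)` the batches land at steps 10, 30, 50, 70 and 90, and the cumulative level reaches 0.5 at the last one, so the rate of importing is constant between the first and last batch.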
What, if any, conclusions can we draw from this? It seems that small, non-local imports aren’t detrimental, and may even be beneficial, to the growth of the community and the map. Imports of specialised features, such as bus stops or water areas, can be very useful in filling in the gaps where other contributors may not want to contribute.
However, it seems that large-scale imports, particularly of “fundamental” features such as road networks, can cause problems with the growth of a community, particularly if the import is done at a very early stage in the growth of that community.
In conclusion, there’s no evidence here that imports are all bad, but it’s my personal opinion that imports, particularly if they’re badly thought-through, too wide-ranging or don’t do enough to engage the existing community, can be bad. And often are.
Entry Filed under: OSM
7 Comments
1. | September 7th, 2009 at 2:54 pm
[...] This post was mentioned on Twitter by Artem Dudarev. Artem Dudarev said: interesting speculation: data imports are not too good for community grows #openstreetmap http://bit.ly/IJVPh via @zerebubuth via @jokru [...]
2. Andrew MacKinnon | September 8th, 2009 at 8:07 am
In my opinion, this is not true. In my experience a complete map attracts more users (and thus mappers) than an incomplete map, as it is more useful to users. For example, if a map has a more-or-less complete road network imported from some government database, then it will attract users who then start making corrections, adding POIs, etc. The same is true if one hardcore user mapped a large area by him/herself – this has exactly the same effect as a data import. In contrast, if an area has little or no data, many potential users will just say “OpenStreetMap is useless” and not bother contributing. In addition, I don’t think that contributors will generally stop mapping at a certain amount of detail (unless the map is truly 100% complete); instead, they will tend to get addicted, and once they have completed the map to a certain level of detail, they will start adding even more detail.
3. Harry Wood | September 9th, 2009 at 5:02 pm
There’s definitely psychology at play. The first impression a visitor has when they look at our map… will they become interested in mapping? A visitor may also decide to become a user of OSM data (e.g. setting up a website showing maps) which in turn feeds more community enthusiasm as Andrew says.
But maybe those two mindsets are quite different in fact. I think people judge our maps on two different metrics: “Coverage” and “Quality”. For using OSM, coverage is very important and quality is quite important (depends on the usage, but in general people perhaps decide they can’t use our data if they see variable coverage) BUT for piquing somebody’s curiosity and getting them involved in mapping, they’re interested in great examples of beautifully mapped areas, but also inspired when they see areas with poor coverage. What they are not inspired by, is completely uniform full coverage of a poor quality.
4. Richard Weait | September 30th, 2009 at 4:01 pm
Your simulation is interesting and certainly the graphs make a strong argument for caution in imports. Do I understand correctly that all of the data is simulated?
I wonder if any real data can be found to support the simulation? It seems like a lot of work, but perhaps selecting a few hundred US counties, then extracting history data, before and after the TIGER import in the area will advise us on improvements for the simulation.
Still, that only addresses community effects when the import is TIGER-like, but it is a start.
There were mappers in USA before TIGER, but perhaps not a statistically significant community.
Perhaps Netherlands data around the AND import would be helpful. Data from the current French Corine import may help in future.
5. Matt | September 30th, 2009 at 11:04 pm
@Richard: yes. in this post all the data is simulated. i’ve posted a follow-up which includes some analysis of real data (for the US and Netherlands in particular).
6. Mikel | November 5th, 2009 at 6:50 pm
I have to disagree with some of the fundamental assumptions of the psychology of the agents in the model. As the maps “complete”, and community activity declines, perhaps it’s the way community engages that needs to change. Ongoing maintenance and improvement of the map need to become a part of every day life.
In short, I don’t think imports are the problem, but our lack of ideas so far about how to make mapping fun when the map goes 1.0.
7. Matt | November 6th, 2009 at 9:04 pm
@Mikel: of course the simulation is simplistic, and is missing any special features to simulate hypothetical community engagement. however, the fundamental mechanism by which imports slow community growth is a plausible one.
in short, i don’t think all imports are the problem – just badly thought-through, automated, mass imports. we should be very careful in selecting the data which is good enough to import and make sure that all of it is thoroughly checked by a real user before it is committed into the database. otherwise we risk slowing community growth and putting off potential users.