The Crowdsourcing Model for Open Data Portals

April 24, 2020

Maintaining open data portals is taxing and requires consistency. Leveraging a community of data enthusiasts may be your way of keeping up with data demand.

In my last blog post, I had a look at the Kenya open data portal, a well-intentioned government initiative, funded under the Kenyan Ministry of Information, Communication and Technology, that ran into a few problems.

In that top-down model, a team of professional technologists and data scientists is hired to keep the open data portal running by researching, scraping and uploading data as CSVs or in other machine-readable formats.

That is the ideal situation, but there is a danger, as in the case of the Kenya Open Data Portal, that the specific interests and resource limitations of the team behind it limit the amount of data uploaded to the platform in the long term, and thereby undermine its sustainability.

However, several other models exist for making open data portals work. What if you could share the burden of work with other interested parties, for example? This would enable contributors to provide a much more diverse range of datasets, and a much higher volume of data overall, for everyone to use.

When we think of crowdsourcing, we may already have an idea about the sorts of projects that can be achieved with this model. After all, it’s been around in various forms for a while, and is the basis for countless online initiatives, ranging from Airbnb’s rentals system to solving mysteries through Reddit.

But open data platforms can also greatly benefit from this model — especially in the absence of adequate funds to employ a team of dedicated specialists to keep an open data portal active.

A number of applications have been created to help independent crowdsourced open data projects emerge, flourish and grow.

For example, CKAN (which stands for ‘Comprehensive Knowledge Archive Network’) is an open-source tool for making open data portals. It is similar to a content management system like WordPress, but for data. Its codebase is maintained by the Open Knowledge Foundation and it has developed into a powerful data catalogue system that is mostly used by public institutions looking to share their data with the public, for instance the UK’s and the United States government’s open data portals.

Screenshot of CKAN home page

CKAN assists in the storage and distribution of data and can be a powerful apparatus for a volunteer-based open data portal.
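To give a rough idea of what a volunteer contribution could look like in practice, here is a minimal Python sketch that prepares a request to register a new dataset through CKAN’s Action API (the `package_create` action). The portal address, the API token and the helper names `slugify` and `build_package_create_request` are placeholders made up for illustration; the sketch only builds the request and does not send it, and a real portal may require extra fields such as an owning organization.

```python
import json
import re
import urllib.request


def slugify(title):
    """Turn a human-readable title into a CKAN-style dataset name.

    CKAN dataset names are lowercase and limited to letters, digits,
    '-' and '_', so everything else is collapsed into hyphens.
    """
    slug = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    return slug[:100]


def build_package_create_request(portal_url, api_token, title, notes, owner_org=None):
    """Build (but do not send) a POST request for CKAN's package_create action."""
    payload = {"name": slugify(title), "title": title, "notes": notes}
    if owner_org:
        # Hypothetical organization id; many portals require one.
        payload["owner_org"] = owner_org
    return urllib.request.Request(
        url=portal_url.rstrip("/") + "/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder portal and token, for illustration only.
request = build_package_create_request(
    "https://portal.example.org/",
    "my-api-token",
    "Community Water Points (Survey)",
    "Locations of water points collected by volunteers.",
)
print(request.full_url)
print(request.data.decode("utf-8"))
# Passing `request` to urllib.request.urlopen would submit it to a real portal.
```

Keeping the request-building step separate from the sending step makes it easy to review what a contributor is about to publish before anything reaches the portal.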

Organizations such as the Open Knowledge Foundation, Open Heroines, and GovLab also provide plenty of guides and best practices for putting together the structure of your open data portal and figuring out how it might work.

So could it be a useful model for you? Let’s have a look at the pros and cons of crowdsourcing-based open data portals:

The Advantages

Diverse Contributors, Diverse Data— A crowdsourced model allows a broad range of different actors to produce and contribute data, and create the basis on which the platform can become self-sufficient. These actors could include data-producing NGOs, journalists, researchers, and data enthusiasts, among others. By contributing to the portal, they are able to share their resources in a machine-readable format while also potentially benefiting the work of others. In return, they will gain access to an ever-growing resource of valuable information, on a wide variety of topics.

Because of the wide range of data uploaders who are from different countries, backgrounds, races, religions and genders, the data shared also has the potential to be much richer in its scope.

Do More With Less! — A crowdsourcing model can significantly reduce the number of permanent staff needed to maintain the platform. Although more energy will need to go into community engagement and data verification, fewer resources will have to be dedicated to sourcing, cleaning, and structuring datasets internally, which makes the venture more sustainable and cost-efficient.

Community Engagement Builds Trust in the Data— In a world where people are becoming more and more aware of how narratives can be spun and controlled by different actors for various purposes, the fact that there is no single controller of the data that goes up onto the portal lends credibility to the project’s objectives and can counter accusations that data is being manipulated or selectively uploaded by the data platform. This can encourage further engagement with the website from the general public.

Contributors Will Contribute With A Purpose — Most importantly, this model increases the likelihood that contributors will publish data with a purpose. As OpenDataCharter puts it: “In our efforts to encourage a shift towards governments being ‘open by default’, we have learned that publishing data to solve specific policy problems is more effective than doing so in isolation. ‘Publish with purpose’ creates more incentives and momentum than ‘publish and they will come.’”

The Drawbacks

Maintaining Data Consistency Can Be Tricky — One risk to consider is inconsistency in how regularly new data is published, since this depends entirely on the users. If there is a lull in the amount of data being uploaded, it can be challenging to regain momentum.

This can be tackled by building a community of interested parties to become regular contributors, by creating a presence online through social media platforms, or wherever keen data enthusiasts exist. This might also involve getting organizations that regularly produce data to use your portal as their go-to storage and distribution network.

Be Prepared To Deal With Bad Data & Spammers – The quality of data uploaded to the platform could also be a point of contention, especially in the early stages. This can be tackled by clearly outlining a strict policy on the portal’s website and constantly checking what is being uploaded. Regularly deleting spam will therefore be a major task for any site manager.
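Some of those checks can be automated before a human moderator ever looks at an upload. The Python sketch below is one possible starting point, not a complete quality policy: it assumes uploads arrive as CSV text and only looks at basic structure (a usable header and rows that match it). The function name `check_csv_upload` and the specific rules are illustrative assumptions.

```python
import csv
import io


def check_csv_upload(text, min_rows=1):
    """Return a list of structural problems found in an uploaded CSV.

    An empty list means the file passed these basic checks; it says
    nothing about whether the values themselves are accurate.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(not cell.strip() for cell in header):
        problems.append("header has blank column names")
    if len({cell.strip().lower() for cell in header}) != len(header):
        problems.append("header has duplicate column names")

    # csv.reader yields an empty list for a blank line; ignore those.
    body = [row for row in rows[1:] if row]
    if len(body) < min_rows:
        problems.append("file has fewer than %d data row(s)" % min_rows)

    mismatched = [row for row in body if len(row) != len(header)]
    if mismatched:
        problems.append("%d row(s) do not match the header width" % len(mismatched))
    return problems


# Illustrative inputs only.
print(check_csv_upload("name,value\nA,1\nB,2\n"))
print(check_csv_upload("name,value\nA,1,extra\n"))
```

Uploads that return a non-empty list could be held back for manual review rather than rejected outright, so that well-meaning contributors get a chance to fix their files.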

Hybrid Models May Be More Effective — Depending on the resources available, this system can exist in a hybrid form where half of the work is done by volunteers, and the other half is done by a hired team. This allows some flexibility to shift between the two models depending on access to funding and resources, and is a good launching pad to get an open data platform off the ground.

We hope this blog provided some insights into how and why you might want to use a crowdsourcing model for your open data project. It’s not a straightforward decision to take, and whether you go for a top-down model like the Kenya Open Data Portal, or a bottom-up crowdsourced one instead, there will be unique challenges and opportunities ahead.

Luckily, there are plenty of organisations out there to give you more support and guidance!

Resources and More Reading

Some examples of open data portals taking advantage of this community model to different degrees include:

If you’re interested in learning more about how to establish and support community-based open data platforms, the resources below should be able to provide some more insights: