Above: WWF team member exploring datasets in Indonesia, 2018. Photo by Stefan Kraus.
Data research is a method that helps you systematically assess existing data and data sources, so that you can identify gaps in the data you need and where your programme can add value. At the same time, data research gives you the tools to think about your stakeholders and audience. In this chapter, you’ll find an overview of four consecutive steps that will help you conduct data research:
- Make an inventory of existing data/evidence
- Evaluate existing data
- Perform a gap analysis
- Understand who will use your data
Make an inventory of existing data/evidence
Once you’ve identified your programme’s data needs, you will need to start gathering the data. Some of it may be readily available, while some may still need to be captured. You can start by making an inventory of existing data.
First, look into your own organisation’s data resources, including what is gathered in reports and stored in databases. Consider both quantitative data, which expresses a quantity, amount or range, and qualitative data, which is more descriptive and comes from small-scale surveys, focus group discussions, observations and interviews. Then think about what data may be available and easily accessible outside your organisation. Are there data sharing platforms, or other organisations that deal with the same problem or try to answer the same question? What data do they have on this problem? Is it open access? Even if the data is not openly accessible, you might be able to persuade the organisation that holds it to share it.
Evaluate existing data
Once you’ve created an inventory of existing data sources, evaluate the data on its accessibility, granularity, credibility and relevance. The following questions can help you decide whether the existing data is available for you to use, detailed enough and at the right scale, and reliable enough for your programme:
- Is the data openly available, or does it require special permission to access? (Accessibility)
- Is the data structured in a way that is useful for your programme? (Relevance)
- How often is the data collected? (Granularity)
- How granular or detailed is the data geographically? (Granularity)
- How granular or detailed is the data demographically? (Granularity)
- When was the data collected? How long has it been retained? (Relevance and Granularity)
- Do the people currently working on the problem use it for decision-making, evaluation, or something else? (Credibility)
- Who collected the data? What was the purpose of their data collection? Has the data been cleaned and/or analysed? And if so, in what way? (Credibility)
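If your inventory lives in a spreadsheet or script rather than on paper, it can help to record the answers to these questions for each source and flag where a source falls short. The sketch below is purely illustrative: the `DataSource` record, its field names and the `evaluation_flags` helper are hypothetical, and it covers only a subset of the questions above, each mapped to the criterion it informs.

```python
from dataclasses import dataclass

# Hypothetical inventory record: field names are illustrative, not a standard schema.
@dataclass
class DataSource:
    name: str
    holder: str                  # who collected or holds the data
    open_access: bool            # Accessibility: usable without special permission?
    collection_frequency: str    # Granularity (time), e.g. "monthly", "one-off"
    geographic_level: str        # Granularity (place), e.g. "village", "province"
    demographic_breakdown: bool  # Granularity (people): split by gender, age, etc.?
    last_collected: int          # Relevance: year of the most recent collection
    used_for_decisions: bool     # Credibility: relied on by those working on the problem?


def evaluation_flags(src, required_geo_levels, oldest_useful_year):
    """Return the criteria on which a source falls short for this programme."""
    flags = []
    if not src.open_access:
        flags.append("accessibility: request permission from " + src.holder)
    if src.geographic_level not in required_geo_levels:
        flags.append("granularity: geographic level too coarse")
    if not src.demographic_breakdown:
        flags.append("granularity: no demographic breakdown")
    if src.last_collected < oldest_useful_year:
        flags.append("relevance: data may be out of date")
    if not src.used_for_decisions:
        flags.append("credibility: check who collected it and why")
    return flags


# Example with made-up values for a single inventory entry.
survey = DataSource("Household water survey", "Partner NGO", False, "one-off",
                    "province", True, 2015, False)
flags = evaluation_flags(survey, {"village", "district"}, 2016)
for flag in flags:
    print(flag)
```

A flag is a prompt for a follow-up question, not a verdict: a source that is not openly accessible may still be worth requesting.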
Perform a gap analysis
Now that you’ve identified which data sources are available to you and which data you can use for your programme, think about what data you still need to collect to answer your questions. A useful question to ask is: what data do I need to answer my questions or measure my indicators? At first, don’t think about the constraints you might face in collecting this data. Only after identifying the data you need should you start considering potential constraints, such as time, resources and feasibility. Data you initially assumed would be infeasible to collect may turn out to be easier to gather than expected.
Once you have identified all the data gaps, take a critical look at the data you’ve marked as necessary. Do you really need to collect all of it? What will you use each element for? Although it’s tempting to collect data you think might be useful in future, a good rule of thumb is that less is more. Focusing on the things that really matter minimises complexity: it’s less expensive and less time-consuming, and it lowers the risk of collecting the wrong data.
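At its core, a gap analysis compares what each question or indicator needs with what your evaluated inventory already provides. A minimal sketch, assuming you can list the data items per indicator; the indicator names, data items and the `find_gaps` helper are all hypothetical examples:

```python
# Hypothetical example: the data each indicator needs, and the data items that
# usable sources in the inventory already cover. All names are illustrative.
needed_by_indicator = {
    "households with safe water": {"household count", "water source type"},
    "time spent collecting water": {"household count", "collection time"},
}
available = {"household count", "rainfall"}


def find_gaps(needed_by_indicator, available):
    """Map each indicator to the data items it needs that no usable source provides."""
    return {
        indicator: needed - available
        for indicator, needed in needed_by_indicator.items()
        if needed - available
    }


gaps = find_gaps(needed_by_indicator, available)

# Only items that some indicator actually needs end up on the collection list,
# which keeps the "less is more" check explicit.
to_collect = set().union(*gaps.values())
print(sorted(to_collect))  # ['collection time', 'water source type']
```

Because every item on the collection list traces back to an indicator, it is easy to answer “what will we use this for?” for each one.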
Understand who will use your data
If you are collecting data to help solve a problem, or to underline the importance of addressing one, it is crucial to involve all relevant stakeholders from the start of the data research process. This creates ownership of the data, keeps it relevant and useful, helps communities feel represented by it, and makes it less likely that decision makers will ignore it or question its credibility. Start your data collection exercise with an inventory of what the different stakeholders want to know and how you are going to reach them.
Sharing the data with the people directly involved in the problem empowers them to take action. However, this means thinking about how to share the data in an understandable and accessible way. In remote communities, accessing data online may be difficult, and radio broadcasts or printed materials may be a better way to disseminate it. Consider making a data dissemination plan that identifies your stakeholders and the communication channels that reach each of them.
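A dissemination plan can be as simple as a table of stakeholders, what they want to know and how to reach them. The sketch below assumes a single deciding factor, whether a group has reliable internet access; the `Stakeholder` record, the `choose_channels` rule and the channel names are hypothetical, and a real plan would weigh more factors.

```python
from dataclasses import dataclass


# Hypothetical stakeholder record for a dissemination plan.
@dataclass
class Stakeholder:
    name: str
    wants_to_know: str
    has_internet: bool


def choose_channels(stakeholder):
    """Pick channels, falling back to offline options for groups without reliable internet."""
    if stakeholder.has_internet:
        return ["online dashboard", "email summary"]
    return ["community radio", "printed summary"]


stakeholders = [
    Stakeholder("District water office", "coverage by village", True),
    Stakeholder("Remote fishing community", "local water quality", False),
]

# The plan maps each stakeholder to the channels that can reach them.
plan = {s.name: choose_channels(s) for s in stakeholders}
for name, channels in plan.items():
    print(name, "->", ", ".join(channels))
```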
Data research is an approach that helps you create focus in your programme. Thinking about data gathering in this structured way helps you avoid collecting duplicate data and encourages everyone involved to assess the quality and usefulness of the available data. It also lets you check whether the data you plan to collect is truly relevant to your programme and to the stakeholders involved, and it forces you to think about how to share the data with them before collection has actually started.
This blog was written with the help of Karolina Sarna, a data scientist who worked at Akvo in Amsterdam.