“I don’t side-sell. I only sell maize to the private crop aggregator company you mentioned,” said Sam (name changed to protect privacy). But later in the day, as we became more familiar with each other, Sam introduced me to two gentlemen visiting his village in Eastern Rwanda. They were private buyers, and Sam and his fellow villagers sold their crops to them. Some villagers sold regularly to these buyers, while others sold to them when they needed cash for a family emergency.
“But you said you didn’t side-sell,” I feigned surprise. “I lied,” Sam replied with an impish grin.
We have witnessed this theatre play out across contexts for nearly two decades now - the untruth that seems close to the truth, the incentives that dictate answers, and the biases of social desirability and acquiescence that influence respondent behaviour.
As each year comes to a close, we take time to pause and reflect. This year, we’ve decided to delve into our experiences of collecting data. For almost two decades, we have been gathering data on WASH, agriculture, climate, and energy across the world. While we’ve shared some of our data collection experiences before - from remote data collection to collecting farmer data at scale - we felt a broader overview, paired with actionable recommendations, would complement those more focused lessons.
We’ve divided this blog into six sections. The first uncovers the incentives to lie, while the second explores the power of conversation. The third highlights the importance of narratives beyond numbers. The fourth examines the potential of snap surveys over single snapshots in time, and the fifth discusses using a mix of technologies and techniques to conduct surveys. Each section contains conclusions and recommendations, but in the final section, we summarise these insights and make the recommendations more explicit and actionable. We hope readers and practitioners in the development sector will find them useful.
Lies, damned lies and statistics
How we collect data, who we talk to, and who is asking the question all influence data quality. A respondent hoping to access a government scheme meant for people living below the poverty line (and if you are surveying on behalf of the government) may have an incentive to underreport their income. However, if social status comes into play, the same respondent might report a higher income. That said, income is one of the hardest indicators to measure via a questionnaire - something we’ve been optimising with IDH for the last four years.
Similarly, questions like “Do you use the toilet?” or “Who makes financial decisions in the household?” almost always yield socially acceptable answers. People want to be seen doing the right thing. We’ve learned to be mindful of the biases that emerge when asking questions. Phrasing questions appropriately, using the right proxies (e.g. consumption to understand income), remaining observant of surroundings (sights, sounds, and smells), and empathising with respondents have all helped us navigate these untruths - because these inaccuracies only magnify when aggregated into statistics.
The fine art of conversation
Surveys often fail to provide reliable information because respondents don’t always trust the interviewers. Respondents dislike long Q&A sessions and can suffer from survey fatigue, particularly when they see little benefit in participating. “I don’t know what these survey agencies/NGOs do with all the information they collect from me,” a fisherman in Bangladesh once lamented. Even when respondents consent and know they can stop at any time, they sometimes continue out of politeness, providing answers just to satisfy the interviewer—undermining data quality.
Over the years, my colleague Francis Warui taught me the power of small talk. For the first five minutes of a survey, I learned to discuss the weather, check on the crops (even when surveying toilet use), and talk about anything except the survey. Understanding the culture that drives conversation is key - what’s appropriate in Indonesia might not be in Uganda. Once the conversation flows naturally, you can introduce questions (not necessarily in the order of the survey) and listen carefully, as answers often hide in the subtext.
Field-testing questionnaires, rehearsing with trained enumerators (like those in Akvo’s network, who share our values and ensure data quality), and practising extensively before collecting actual data all help make conversations more organic. The more natural the dialogue, the closer we get to the truth.
Truth hides in numbers and narratives
In a focus group discussion (FGD) in Western India, we engaged with a group of men and women. During the discussion, 90% of respondents claimed that women played a key role in deciding what crops to grow and where to sell. Meanwhile, most men sat on chairs, while the women sat on the floor with us. In a separate FGD with women, we learned that they hardly played any role in these decisions.
Getting to the bottom of an issue and capturing its nuance takes time, effort, and cost. Numbers are useful, but without a context-sensitive narrative, they can hide more than they reveal.
Snap surveys, not snapshots
The development sector is a graveyard of baseline and endline surveys. While these snapshots in time serve an important purpose, it is perhaps time to rethink traditional surveys. We acknowledge the power of immersive field investigations, which combine narratives with numbers and a presence in the field, but we also want to highlight the importance of breaking down large surveys into smaller chunks and administering them remotely. Establishing a regular and effective system for monitoring users and beneficiaries is often more valuable than surveying a larger number of users once or twice.
For example, we broke down user satisfaction with WASH services (in Kenya) and farm-level business models (in Zambia) into smaller closed-ended questionnaires administered via USSD, WhatsApp, and phone calls. Insights from these frequent “snap surveys” both complemented and contradicted findings from large-scale field investigations. In cases of conflicting results, we either validated the findings in subsequent surveys or applied pre-defined criteria to determine which data to prioritise.
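To make the idea of a pre-defined criterion concrete, here is a minimal, hypothetical sketch of one such rule in Python. The indicator, the numbers, and the tolerance threshold are all invented for illustration; an actual rule would be agreed with the programme team before data collection begins.

```python
# A hypothetical sketch of reconciling a snap-survey estimate with a
# field-survey baseline using a pre-defined rule: if the two estimates
# diverge beyond a tolerance, keep the field value and flag the
# indicator for re-validation in the next round; otherwise accept the
# more recent snap-survey value. All numbers here are invented.

TOLERANCE = 0.10  # maximum acceptable gap between the two estimates

def reconcile(field_value: float, snap_value: float) -> tuple[float, str]:
    """Return the value to report and the action taken."""
    if abs(field_value - snap_value) <= TOLERANCE:
        return snap_value, "accept snap survey (within tolerance)"
    return field_value, "keep field value; re-validate next round"

# Share of users satisfied with the water service, from two sources:
value, action = reconcile(field_value=0.72, snap_value=0.55)
print(f"reported={value:.2f} -> {action}")
```

The point is not the specific threshold but that the tie-breaking rule is written down before the conflicting results arrive, so prioritisation is not decided after the fact.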
Above: a GIF illustrating the difference between in-person data collection and remote data collection.
Technologies and techniques
Advances in AI and machine learning have added interesting possibilities to the technology mix. In the past, we have administered USSD, WhatsApp, web-based, and phone surveys in addition to mobile app-based field data collection. But AI and machine learning have made qualitative inquiries more tractable: it is now easier to record, code, and analyse open-ended responses at scale.
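As an illustration, here is a minimal sketch of machine-assisted coding of open-ended answers, using off-the-shelf scikit-learn clustering. The example responses are invented, and this is one of many possible approaches rather than a description of our production setup: the clusters still need a human to review and label them.

```python
# A minimal sketch of machine-assisted coding of open-ended survey
# responses: cluster free-text answers so a researcher can review and
# label each cluster instead of reading every response cold.
# Assumes scikit-learn is installed; the responses are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = [
    "We sell maize to the buyer who comes to the village",
    "I sell at the local market when prices are good",
    "The aggregator pays late so we sell to traders",
    "Market prices decide where I sell my crop",
]

# Turn free text into TF-IDF vectors, then group similar answers.
vectors = TfidfVectorizer(stop_words="english").fit_transform(responses)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, text in sorted(zip(labels, responses)):
    print(label, "|", text)
```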
Sampling techniques are equally crucial. We have carried out multi-stage cluster sampling in countries with proper sampling frames, and used snowball sampling to build a sampling frame before administering the actual survey where none existed. Whatever the country, and whichever technologies or techniques are selected, we have learnt to remain adaptive and flexible in the field: we oversampled in Uganda to avoid conflicts between residents and refugees, built sampling frames together with farmers in Kenya, crossed flooding rivers in Liberia, and dealt with displaced populations and days without internet access.
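For readers unfamiliar with the first of these techniques, here is a toy two-stage cluster sample in Python. The villages and household lists are invented; a real frame would come from census data or be built in the field, as described above.

```python
# A toy sketch of two-stage cluster sampling, assuming a sampling frame
# of villages (clusters) with a household list per village.
import random

random.seed(42)  # fixed seed so the draw is reproducible

frame = {
    "village_a": [f"hh_a{i}" for i in range(50)],
    "village_b": [f"hh_b{i}" for i in range(80)],
    "village_c": [f"hh_c{i}" for i in range(30)],
    "village_d": [f"hh_d{i}" for i in range(60)],
}

# Stage 1: randomly select clusters (villages).
clusters = random.sample(list(frame), k=2)

# Stage 2: randomly select households within each chosen cluster.
sample = {village: random.sample(frame[village], k=10) for village in clusters}

for village, households in sample.items():
    print(village, households[:3], "...")
```

In practice, stage-one selection is often weighted by cluster size (probability proportional to size) rather than drawn uniformly as in this toy example.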
Conclusions and recommendations
It is difficult to capture the richness and diversity of nearly two decades of experience in data collection—both in the field and remotely. Summarising our conclusions and making recommendations is even harder, as I am aware I may have missed things that matter and occasions that deserved highlighting. Yet one feels the need to conclude, recommend, and at least offer the reader a sense of direction. So, here goes the list:
- Understand the incentives: Human beings alter their behaviours based on incentives. Understand the incentives, ask the questions in the right manner, and you are more likely to get a truthful response.
- Enjoy the conversation: All of us enjoy a good conversation. Rather than treating a survey like an interview, focus on having a genuine conversation. If you can build rapport with respondents and establish mutual trust, you are more likely to uncover the truth.
- Find the stories behind numbers: Numbers are useful markers of a situation, but without context or a story, they tend to hide more than they reveal. Invest the time, immerse yourself in the lives of your respondents, and uncover the stories behind the numbers—the benefits will outweigh the costs.
- Complement immersive investigations with lean and agile surveys: Immersive field investigations serve an important purpose. However, breaking down large, complicated surveys into more manageable chunks and frequently administering smaller, remote surveys with closed-ended questions can be equally effective.
- Tools and techniques are a means to an end: Focus on the end goals. There are situations when on-field surveys are a must, while at other times, remote surveys are more useful. Assess the situation and determine whether an on-field, remote, or combined approach would be more appropriate for your context. Remain adaptive, flexible, and alert to the situation. Getting fixated on a single tool will not help, but a toolbox of solutions—technologies (phone-based, USSD, WhatsApp, web-based, or on-field surveys) and techniques (multi-stage cluster sampling, snowball sampling, etc.)—will offer more possibilities.
Collecting data is an exercise in truth-seeking. However, it requires investments of money, time, and effort. While quick data collection methods and linear narratives of social impact may seem attractive at first glance, we should reject simplistic notions of change and strive for the truth. The truth is messy but magical. And it is this magic that empowers us to make powerful decisions that improve people's lives.