How to use machine learning to predict waterpoint status

  • Written by Lars Heemskerk
    3 February 2021
< The water point you selected is probably no longer functional >

If you’re responsible for providing drinking water to as many people as possible, this is the kind of information you want to have access to, especially when you’re hundreds of miles from the water point in question. Thanks to the support of the Dutch Ministry of Foreign Affairs and the Coca-Cola Foundation (TCCF), Akvo, together with WPDx and DataRobot, was able to conduct a pilot in Sierra Leone that uses machine learning algorithms to automate decision intelligence.


Improving water services in Sierra Leone


Since 2012, the government of Sierra Leone has been monitoring water points through a large-scale national inventory, complemented by small-scale monitoring efforts by NGOs. Data has been collected on functionality, year of construction, type of pump, type of management, distance to village and other attributes, in order to calculate the percentage of the population that has access to drinking water. This data provides an overall insight into the state of WASH infrastructure in the country and, because Sierra Leone is at the forefront of African countries sharing data openly, a lot of it is available on platforms like WASH data Sierra Leone and WPDx.


Unfortunately, this data is not regularly updated, so the information on these portals quickly becomes outdated and therefore less reliable. Thanks to efforts from WPDx, among others, the importance of regularly uploading data has been emphasised in the National Digital Monitoring Approach. The recent signing of a letter by the director of the Water Directorate, which makes sharing water point data mandatory for every organisation and government body in Sierra Leone, is an indispensable step in this process.


In addition, Akvo, in collaboration with WPDx and the Ministry of Water Resources, has started to explore how more can be done with the existing data, at local and national level, to generate data-driven insights that can improve decision making. Machine learning is relatively new in the water sector, but can be applied very well to historical data to predict outcomes and uncover patterns not easily spotted by humans.

Setting up the foundation for advanced analytics

Machine learning is about recognising patterns in data. Using data collected in the past, machine learning techniques can recognise patterns and make predictions for the future. This can be applied to historical water point data, too.


Based on the available data, and with the help of DataRobot software, we have been able to identify a number of indicators that are related to the metric we want to predict: functionality. By combining historical functionality records with other indicators, such as district, county, management, age, water source, and type, the system can teach itself to predict the probability that a water point is functional now or will be in the future. The tool is available on the Water Point Data Exchange.
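
To make the idea more concrete, here is a minimal sketch of how a model can learn the probability of functionality from categorical indicators. It is not the DataRobot model used in the pilot, which is not described in this post. Instead it uses a simple naive Bayes classifier written in plain Python. The records, field names and values are all invented for illustration and are not real WPDx data.

```python
from collections import Counter, defaultdict

# Toy records invented purely for illustration; they are NOT real WPDx data.
# Each record: (hypothetical indicator values, True if the point was functional).
RECORDS = [
    ({"management": "community", "source": "hand pump", "age": "old"}, False),
    ({"management": "community", "source": "hand pump", "age": "old"}, False),
    ({"management": "community", "source": "hand pump", "age": "new"}, True),
    ({"management": "community", "source": "tap", "age": "old"}, False),
    ({"management": "utility", "source": "tap", "age": "new"}, True),
    ({"management": "utility", "source": "tap", "age": "new"}, True),
    ({"management": "utility", "source": "hand pump", "age": "new"}, True),
    ({"management": "utility", "source": "tap", "age": "old"}, True),
]


def train(records, alpha=1.0):
    """Count how often each label and each (indicator, value) pair occurs."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (indicator, label) -> value counts
    vocab = defaultdict(set)             # indicator -> values seen in training
    for features, functional in records:
        label_counts[functional] += 1
        for name, value in features.items():
            value_counts[(name, functional)][value] += 1
            vocab[name].add(value)
    return label_counts, value_counts, vocab, alpha


def functional_probability(model, features):
    """Estimate P(functional | indicators) under the naive Bayes assumption."""
    label_counts, value_counts, vocab, alpha = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = count / total  # prior: share of records with this label
        for name, value in features.items():
            seen = value_counts[(name, label)][value]
            # Laplace smoothing keeps unseen values from zeroing the product.
            score *= (seen + alpha) / (count + alpha * len(vocab[name]))
        scores[label] = score
    return scores.get(True, 0.0) / sum(scores.values())


model = train(RECORDS)
for profile in (
    {"management": "utility", "source": "tap", "age": "new"},
    {"management": "community", "source": "hand pump", "age": "old"},
):
    print(profile, round(functional_probability(model, profile), 2))
```

With real data, the same principle applies at a much larger scale: the more historical records and indicators are available, the better a model can separate water points that tend to keep working from those that tend to break.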


Using the DataRobot platform, we were able to predict which water points are going to break with an accuracy of 85%. By applying these machine learning models, it’s possible to determine which broken water point, out of thousands, should be fixed first to help the most people. In addition to this tool, decision makers can use other geographic information system (GIS) tools that have been developed to analyse water points, identify high-impact locations for rehabilitation and construction, and estimate basic water coverage in line with the Sustainable Development Goals (SDGs).
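
One simple way to think about "which water point first" is to weigh the predicted chance of failure by the number of people who depend on the point. The sketch below illustrates that idea only. It is not how the WPDx tools calculate priorities, and the `people_served` field, the names and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WaterPoint:
    name: str
    p_non_functional: float  # model output between 0 and 1 (hypothetical values)
    people_served: int       # hypothetical field, not named in this post


def expected_people_affected(point):
    """Chance of failure multiplied by the number of people relying on the point."""
    return point.p_non_functional * point.people_served


def rank_for_repair(points, top_n=None):
    """Sort water points so the highest expected impact comes first."""
    ranked = sorted(points, key=expected_people_affected, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Invented example values, for illustration only.
points = [
    WaterPoint("Point A", 0.9, 100),
    WaterPoint("Point B", 0.5, 400),
    WaterPoint("Point C", 0.2, 300),
]

for point in rank_for_repair(points):
    print(point.name, expected_people_affected(point))
```

In this example a point with a moderate chance of failure but many users ranks above a point that is almost certainly broken but serves few people, which is the kind of trade-off a decision maker may want to see made explicit.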

Pilot Training and Support

When implementing these new advanced analytics techniques, it is just as important to involve and train stakeholders. This is not an easy process because it involves major process changes and the involvement of various governmental and non-governmental organisations. In 2019, the Global Water Challenge held a three-day training session with all district water directorates to discuss the transformation of the WASH sector to improve efficiency through the use of data. Following this session, a meeting was held to brief NGOs on the WPDx approach. Building on this general training, more focused training was provided to district mapping officers and NGOs. The next step was to set up a plan on how to use and implement the decision support tool. At the time of writing, a draft plan has been created and a workshop has been organised to dig deeper into how the decision support tools can contribute to safe water for all in Sierra Leone.

The need for more accurate data

Besides the involvement of NGOs and government bodies, reliable and up-to-date data is crucial for making correct predictions. Since the last national inventory dates back to 2016, it’s important that water points are monitored on a regular basis. The above-mentioned letter from the director of the Water Directorate should boost the supply of more recent data, which will certainly have a positive effect.


We also encourage stakeholders to test whether the machine learning predictions correspond to reality. This can be done on a small scale. There are talks with the Ministry of Water Resources and InterAide to carry this out and test whether the outcomes of the tools are correct and usable in the daily life of decision makers. We would like to continue with this in 2021, in order to prove the power of advanced analytics, but above all to provide drinking water to as many Sierra Leoneans as possible.
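
A small-scale check like this could be as simple as comparing the predicted status with what a field visit finds. The sketch below assumes two hypothetical mappings from a water point ID to its status (`True` for functional). The IDs and values are invented, and the approach is only one possible way to carry out such a test.

```python
def compare_with_field_checks(predicted, observed):
    """Compare predicted and observed status for points present in both mappings.

    Both arguments map a water point ID to True (functional) or False.
    Returns the share of correct predictions and a breakdown of outcomes.
    """
    shared = sorted(set(predicted) & set(observed))
    if not shared:
        raise ValueError("no water points appear in both data sets")
    counts = {
        "correct_functional": 0,  # predicted functional, found functional
        "correct_broken": 0,      # predicted broken, found broken
        "missed_breakdown": 0,    # predicted functional, found broken
        "false_alarm": 0,         # predicted broken, found functional
    }
    for point_id in shared:
        p, o = predicted[point_id], observed[point_id]
        if p and o:
            counts["correct_functional"] += 1
        elif not p and not o:
            counts["correct_broken"] += 1
        elif p and not o:
            counts["missed_breakdown"] += 1
        else:
            counts["false_alarm"] += 1
    accuracy = (counts["correct_functional"] + counts["correct_broken"]) / len(shared)
    return accuracy, counts


# Invented example values, for illustration only.
predicted = {"wp-1": True, "wp-2": True, "wp-3": False, "wp-4": False, "wp-5": True}
observed = {"wp-1": True, "wp-2": False, "wp-3": False, "wp-4": True}

accuracy, counts = compare_with_field_checks(predicted, observed)
print(accuracy, counts)
```

Looking at the breakdown as well as the overall accuracy matters here: a missed breakdown leaves people without water, while a false alarm costs an unnecessary visit.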






Lars Heemskerk

Lars Heemskerk is a Consultant, based in South East Asia and Pacific.

Posted in: Water, Data services