Behind the AgriFieldNet Model
We are excited to introduce Muhamed Tuo, a data scientist and a member of the winning team of the AgriFieldNet India Challenge. This competition aimed to classify crop types in agricultural fields across Northern India and was hosted on Zindi. It was organized with a grant from the Enabling Crop Analytics at Scale (ECAAS) Initiative, which is funded by the Bill & Melinda Gates Foundation and implemented by Tetra Tech. The competition concluded in October 2022, with 635 participants from around the world vying to build machine learning models that can classify crop types in agricultural fields across various districts in the four Northern Indian states of Uttar Pradesh, Rajasthan, Odisha, and Bihar.
After a rigorous evaluation process, Team Starlink was declared the winner, having demonstrated exceptional skill in applying machine learning to satellite data. We had the pleasure of sitting down with Muhamed to discuss his journey to becoming a data scientist and the team’s approach to tackling the problem. The other team members are Taiwo Ogundare and Caleb Emelike. Their winning solution, the AgriFieldNet Model for Crop Types Detection from Satellite Imagery, is available for download on Radiant MLHub.
Congratulations on winning the AgriFieldNet India Challenge! What inspired you to get involved in this field? How did you become interested in machine learning? Tell us about your machine learning journey.
I have a Dual Bachelor’s in Mathematics and Computer Science from the University of Paris 12 in France, and I recently received my Master’s degree in Big Data and Artificial Intelligence. Back in my freshman year, two friends and I had a game of challenging ourselves to learn new technologies and programming languages. One day, one of us came up with the idea of taking part in a data challenge on Zindi. The goal was to see what we could do and learn in two months. After that competition, I got curious and wanted to know more. From there, I started researching everything related to machine learning and AI. And I never stopped.
Where did you learn about the AgriFieldNet India Challenge, and what made you decide to participate?
I’ve been doing ML competitions for four years now on Zindi and Kaggle. So, when I saw a post on LinkedIn from Radiant Earth’s account about the competition, I got curious and went on to learn more about the challenge and its specifics. When I saw on the problem description page that the training dataset was heavily imbalanced and that the test set didn’t follow that distribution, I knew that this was a good challenge to test my knowledge and learn more about building a robust and generalizable model.
Your winning algorithm outperformed 635 teams/individuals. How did you approach the problem, and what do you think set you apart?
We started by benchmarking a set of promising crop and imbalanced data classification techniques and reduced that list to the most effective ones. Then, we spent a significant amount of time on data engineering. That was the most important part of our solution. The data creation process takes about 7 hours to complete. I believe that is what ultimately set us apart from the other teams.
Were you familiar with using machine learning on satellite imagery before this competition? How does this differ from common problems in computer vision?
Yes. Prior to participating in this challenge, we had joined a few competitions where we had to work with satellite imagery, so we had some experience with it.
They differ in two ways: the first is the models used to tackle these problems, and the second is the input data of these models. In common computer vision problems, the model is nearly always a deep learning model (a CNN or a Transformer), and the difference between the raw data and the actual model input is minimal.
Any challenges you would like to share?
Most of the fields were very small. So it took a lot of work to calculate the field statistics because of the low in-field variance.
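To illustrate what calculating field statistics can involve, here is a minimal sketch in Python. It is not the team's actual pipeline: the function name field_statistics, the nested-list data layout, and the convention that a field ID of 0 marks background pixels are all assumptions made for this example. It groups the pixel values of one spectral band by field ID and summarizes each field.

```python
from collections import defaultdict
from statistics import mean, pstdev


def field_statistics(band, field_ids):
    """Aggregate one band's pixel values into per-field summary statistics.

    band      : 2D list of pixel values for one spectral band (hypothetical layout)
    field_ids : 2D list of the same shape; 0 is assumed to mean "no field"
    """
    pixels = defaultdict(list)
    for band_row, id_row in zip(band, field_ids):
        for value, fid in zip(band_row, id_row):
            if fid != 0:  # skip background pixels
                pixels[fid].append(value)

    return {
        fid: {
            "n_pixels": len(values),
            "mean": mean(values),
            "std": pstdev(values),  # population std; 0.0 for a single-pixel field
            "min": min(values),
            "max": max(values),
        }
        for fid, values in pixels.items()
    }


# Toy 3x4 tile: field 1 covers four pixels, field 2 covers a single pixel.
band = [
    [0.10, 0.12, 0.30, 0.31],
    [0.11, 0.13, 0.29, 0.50],
    [0.09, 0.08, 0.07, 0.06],
]
field_ids = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
]

stats = field_statistics(band, field_ids)
for fid, summary in sorted(stats.items()):
    print(fid, summary)
```

In this toy tile, the single-pixel field has a standard deviation of exactly zero, which shows why statistics computed over very small fields carry little information on their own.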
Machine learning is a fast-growing field. How do you stay up-to-date with the latest technological developments?
I find Twitter to be a great place to learn about the latest academic research and techniques. Competition platforms like Zindi and Kaggle are also great places to stay up-to-date with the best methods and algorithms used in the field.
Any advice for beginner data scientists who want to participate in data competitions?
I would suggest joining a competition that they find both challenging and exciting. Then download the data, and start playing with it to build a strong understanding of the problem. While doing that, regularly go to the competition forum and read the discussions, as they always contain important information. Then, they can try to build a simple baseline model or even take the starter notebook and start from there. It doesn’t matter if one’s model isn’t as good as other competitors. The goal is to have a starting point and keep improving from there.
Once the competition ends, read the winners’ solution description and try to implement most of the doable parts. And lastly, remember that it is only scary until you try it.