Publishing Planetary-Scale Data is Easy
Making it is hard.
Last week, Taylor Geospatial announced the availability of Fields of the World (FTW) global field boundaries, a data product composed of agricultural field boundaries at unprecedented scale: 3.17 billion fields identified across 241 countries and territories. It’s a phenomenal data product, the result of years of rigorous research and an extraordinary collaboration across industry, academia, and philanthropy.
Publishing it was easy.

Technically, they’re "objects"
All told, the FTW global fields data product is made up of about 540,000 objects weighing in at 350 terabytes of data. Anyone can browse and download any of it in the browser on Source Cooperative.
350 terabytes is a lot of data. It’s enough to describe over 3 billion farm fields across the entire planet, but it’s only about 6% of the total holdings we manage in Source, which is a bit over 5 petabytes (1 petabyte is 1,000 terabytes).
5 petabytes is a lot of data, but it’s only about 1.5% of the total holdings of the AWS Open Data program, which amounts to over 300 petabytes of data made openly available in Amazon S3.
300 petabytes is an incredible amount of data. It’s big enough that it’s hard for most people to comprehend. One point of reference is that it takes about 5 years to download just one petabyte of data over a normal broadband connection. At that rate, transferring 300 petabytes would take 1,500 years. (We are very grateful to the AWS Open Data Program, which covers most of our storage costs.)
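For anyone who wants to check the download-time arithmetic, here is a minimal sketch. It assumes decimal units (1 petabyte is 10^15 bytes), a 365.25-day year, and a connection that runs flat out the whole time; the function names are made up for illustration.

```python
# Back-of-the-envelope check of the download times quoted above.
# Assumptions: decimal units (1 PB = 10**15 bytes), a 365.25-day year,
# and a connection that sustains its full speed without interruption.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
BYTES_PER_PETABYTE = 10**15


def implied_mbps(petabytes: float, years: float) -> float:
    """Sustained speed in megabits per second needed to move `petabytes` in `years`."""
    bits = petabytes * BYTES_PER_PETABYTE * 8
    return bits / (years * SECONDS_PER_YEAR) / 1e6


def years_to_download(petabytes: float, mbps: float) -> float:
    """Years needed to move `petabytes` at a sustained `mbps` megabits per second."""
    bits = petabytes * BYTES_PER_PETABYTE * 8
    return bits / (mbps * 1e6) / SECONDS_PER_YEAR


speed = implied_mbps(petabytes=1, years=5)
print(f"1 PB in 5 years implies about {speed:.0f} Mbps")
print(f"300 PB at that speed: about {years_to_download(300, speed):.0f} years")
```

Under those assumptions, one petabyte in five years works out to a sustained connection of roughly 51 megabits per second, which is a plausible reading of "normal broadband."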
S3 has been around for about 20 years now. Mai-Lan Tomsen Bukovec, the AWS VP who oversees S3, recently revealed that S3 hosts hundreds of exabytes (1 exabyte is 1,000 petabytes) of data. That means that the holdings of the AWS Open Data program amount to at most 0.15% of all of the data in Amazon S3.
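The ladder of percentages above can be reproduced in a few lines. This is only a sketch using the rounded figures from the text, and it assumes "hundreds of exabytes" means at least 200 exabytes. Because Source holds "a bit over" 5 petabytes and the Open Data program holds "over" 300, the rounded inputs give slightly higher shares for the first two steps than the figures quoted above.

```python
# Each tier of storage as a share of the next tier up, using rounded figures.
TB = 1
PB = 1_000 * TB
EB = 1_000 * PB

tiers = [
    ("FTW global fields", 350 * TB),
    ("Source Cooperative", 5 * PB),       # text says "a bit over 5 petabytes"
    ("AWS Open Data program", 300 * PB),  # text says "over 300 petabytes"
    ("All of Amazon S3", 200 * EB),       # assumption: "hundreds" means at least 200
]


def share_percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


# Pair each tier with the one above it and print the share.
for (small_name, small), (big_name, big) in zip(tiers, tiers[1:]):
    print(f"{small_name} is about {share_percent(small, big):.2f}% of {big_name}")
```

The last step is an upper bound: the more exabytes S3 actually holds, the smaller the Open Data program's share becomes.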
The point of all these numbers is to show that it really is easy to publish planetary-scale data now. Cloud object storage is probably the most boring, most commoditized part of the cloud computing stack, which makes it incredibly powerful. We built Source Cooperative to make it easy to use object storage to share public interest data.
This astonishing adoption of S3 is evidence that S3-compatible object storage has become the de facto standard way to share data on the Internet. Every major programming language has a robust software development kit that allows it to interact with S3-compatible object storage. Organizations all over the world use object storage for any kind of data imaginable, and there are many different service providers competing to provide it. That is to say, AWS does not have a monopoly on this technology, but they were the first to provide it at scale. Like Kleenex and Hoover, “S3-compatible” has become a brand name applied to an entire industry of competing object storage services.
What used to be extremely daunting is now easy. What remains hard is producing great data products.
Creating Great Data Products is Hard
FTW wouldn’t exist without Taylor Geospatial’s Jen Marcus. Jen has worked relentlessly for well over two years to assemble a dream team from Microsoft’s AI for Good Lab, Arizona State University, Washington University in St. Louis, Clark University, Planet, and Wherobots to make it possible. We’ve known for a long time that it should be possible to use AI to detect global field boundaries from space, but it’s taken a heroic collaborative effort to train a model to do it well and do it in the open. It’s been anything but easy. It’s required gathering huge volumes of training data, convening dozens of data practitioners from around the world to develop shared metadata specifications, and building open source tools. We’re extremely proud to have been on Jen’s team.
This data release is a major step change in our shared understanding of agricultural practices around the world, but it’s also just a start. FTW doesn’t describe itself as an authoritative source of field boundary data, but as “an open ecosystem for agricultural field boundary detection.” The team understands that the best way to improve the data is to enable as many people as possible to work with it. To do this, they’ve:
- published the data in cloud-native formats with widely adopted metadata on Source Cooperative so that anyone can build on top of it using whatever tools they want
- created a simple web viewer that allows anyone to browse the data and provide feedback to improve the model
- published peer-reviewed research, including models and tutorials to help others contribute
This is hard, ongoing work designed so that FTW doesn’t own the space, but creates an environment in which anyone can contribute to improve our understanding of agriculture at planetary scale.
Almost two years ago, in the early stages of this project, I wrote:
The farm field is a good place to start thinking about how to improve agricultural practices. Fields are a foundational unit of property where critical decisions are made such as what to plant, how to fertilize, how to irrigate, and how to insure. If we intend to improve agricultural practices globally, we need to be able to influence decision making at the field level. This isn’t possible if we don’t know where fields are located or how to refer to them.
We’ve proven that it’s possible to create this kind of planetary-scale data. The data will only get better from here. Now we have to figure out how to sustain it, improve it, and ensure that everyone can benefit from it. Plenty more fulfilling hard work to come.