More Data More Available to More People
The following is an excerpt from the Radiant Earth Foundation 2022 Annual Report.
As we enter the Year of Open Science, we’re giving some extra thought to what “open” means and why it matters. When Tim Berners-Lee first proposed “the WorldWideWeb project,” the sharing of information was core to his philosophy:
The project started with the philosophy that much academic information should be freely available to anyone. It aims to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups.
Berners-Lee’s vision was quite prescient. The volume of data shared on the web today is astonishing. Indeed, the volume and complexity of data that people can access over the Internet is so great that it can be overwhelming. It’s quite common for organizations to launch a data portal or put something on GitHub and dust off their hands, satisfied that they’ve done their part to “be open.” We enter 2023 recognizing that we’ll never be “done” with open science: sharing data in ways that accelerate scientific discovery will always be a continual process. The framework we use to guide us on this journey is to make more data more available to more people.
In one sense, “more data” isn’t a problem. As more satellites are launched and new sensors are invented, we can expect to have more than enough data to work with for a long time. But still, we want more. Radiant has historically focused on Earth imagery. In fact, our official legal name is “Open Imagery Network.” We’re already working on ways to expand Radiant ML Hub to support other types of data that our community needs, such as vector data, point clouds, and even tabular data.
We also know that data begets data. Members of our community frequently access open data and use it to produce their own derived products that they want to share with others. We will make it easier for our community members to publish their own data products on Radiant ML Hub in 2023.
Making data more available is the thrust of our work on STAC and with the cloud-native geospatial community more broadly.
Is it really fair to say that data is “open” if it takes a day to download, or if you need more than a terabyte of available storage to work with it? What if there’s no documentation for it in the language you speak? And even if you can use a bit of open data with open source code, does that openness matter if the code only runs on a computer you can’t afford? In short: making data available for download is not enough! Making data available requires thinking about the needs and capacity of your users.
There is a lot of exciting work already underway to make massive planetary-scale datasets in a variety of formats easier to work with. This year, we plan to apply some of the lessons we’ve learned from the rapid adoption of the STAC specification to improve the availability of many other types of Earth science data.
The future of our species depends on our ability to develop sustainable methods of sharing Earth data. I recognize that this is a big claim, so here’s why I believe it’s true.
Our ability to respond effectively to global crises is contingent on widespread access to trustworthy and accurate data about our planet. Whether we’re confronting climate change or a pandemic, we need to make policy decisions at all levels of society, and shared access to data helps us make those decisions collaboratively. Perhaps more significantly, it allows us to assess the impact of those decisions collaboratively.
While policy changes at public and private institutions have made significant progress in improving access to data, more needs to be done to make Earth data work for all of us. In particular, large environmental datasets have traditionally been available only to institutions with large computing and storage infrastructure, putting them out of reach of the underrepresented communities that are likely to be most impacted by climate change.
The impact of our work to make more data more available will be cut short if we aren’t also deliberate about creating a larger and more diverse community of users who can work with the data to inform decisions about their communities. The peril of failing to do this is summarized well in a paper titled “The co-development of models with expert judgement suppresses model diversity and underestimates risk,” by Erica Thompson, who recently published the book Escape from Model Land.
As she says in the paper:
All “climate decisions” are also political decisions about which industries to support or restrain, which goals to prioritise, which voices to amplify or to ignore. All “climate decisions” are also moral decisions about whose lives matter; what species matter; what levels of risk we are prepared to live with and accept on behalf of future generations. In framing climate decisions as technical decisions primarily to be answered by modelling studies, it is imperative to consider the political and ethical dimensions of that framing and what interests are served by doing so.
Increasing the size and diversity of our community is the most consequential aspect of our work and, we believe, the most difficult.
We certainly won’t be able to do it alone, so please get in touch if you’re interested in helping us reach more people.