Charitable projects by tech giants tend to highlight their strengths in data collection and processing. One Google-backed project collects agricultural data to predict which pests will eat away at crops. A Microsoft initiative processes health data in search of “insights into the causes of disease.”
There are dozens of these projects, all tackling real problems with good intentions and boundless optimism. The results are sometimes underwhelming, but not always. And they have set a trend for the kind of smart, data-driven philanthropy that appeals to the tech industry's general ethos: if optimization is good for the company, why wouldn't it be good for the world?
But as the big data ethos hardens, some groups are rethinking the deal. In a new report for Access Now, researcher Giulio Coppi examines these moves in the humanitarian sector, where big US tech platforms are becoming increasingly inescapable. The point of the report is not to reject data collection and optimization altogether, but to take a hard look at exactly what humanitarian organizations are giving up in the bargain.
“When the big data craze happened, there was this idea that we could leverage data to do better. It was a static vision of how data worked,” says Coppi. “Now we're entering a completely different situation where data is in flux. It's something that's constantly growing, always looking for new data. And you have to feed it.”
As large language models compete for ever more training data, this concern goes far beyond humanitarian work. Anyone who does business with big tech companies is used to trading away data privacy in exchange for free services. But the terms of that trade are typically hard to pin down and even harder to negotiate.
This is part of a broader pattern Coppi tracks in his report. Collecting and optimizing data can have real benefits, but it also has serious consequences that humanitarian organizations are not accustomed to dealing with. In one example, Microsoft-sponsored research data on height and malnutrition was left available as training data for future projects. In other cases, UN agencies end up acting as de facto cloud providers and immigration authorities simply because they collect the relevant data.
In either case, the harm can be difficult to identify. These projects succeeded on their own terms, collecting and optimizing data at scale, and there were no breaches or other catastrophes to drive home the risks involved. But the structure of each project meant that humanitarian organizations were pushing for more data and better optimization rather than asserting the privacy and autonomy of the people they were trying to help. If you view technology as a political struggle between Silicon Valley data collectors and a globally distributed user base, humanitarian organizations end up on the wrong side.
It's a difficult problem, and knowing how to deal with it is even harder. Data makes it easier to measure and optimize a project's impact on the world, so there are real reasons to collect it. But data collection also has real costs, and those costs can surface suddenly and unexpectedly.
“We need to be more careful about the whole data chain,” Coppi told me. “We can no longer be naive.”