Which countries are the top data producers? After all, with data-fueled applications of artificial intelligence projected, by McKinsey, to generate $13 trillion in new global economic activity by 2030, this could determine the next world order, much like the role that oil production has played in creating economic power players in the preceding century.
While China and the U.S. could emerge as two AI superpowers, data sources can’t be limited to concentrations in a few places, as they are in an oil-driven economy. Data needs to be drawn from many diverse sources, and future AI applications will emerge from new and unexpected players. The new world order taking shape is likely to be more complex than a simple bipolar structure, especially since data is being produced at a pace that boggles the mind.
Building on our past work mapping the digital evolution and digital competitiveness of different countries around the world, we wanted to try to locate the deepest and widest pools of useful data. This is essential to run the myriad machine learning models critical to AI. To do so, it is useful to make a distinction between the raw volume of data and a measure that we shall call “gross data product” – our version of the new GDP. To identify the world’s top “gross data product” producers, we propose using four criteria:
- Volume: Absolute amount of broadband consumed by a country, as a proxy for the raw data generated.
- Usage: Number of users active on the internet, as a proxy for the breadth of usage behaviors, needs and contexts.
- Accessibility: Institutional openness to data flows as a way to assess whether the data generated in a country permits wider usability and accessibility by multiple AI researchers, innovators, and applications.
- Complexity: Volume of broadband consumption per capita, as a proxy for the sophistication and complexity of digital activity.
There are several nuances to note. For one, we recognize that the digital trace that is generated by computers around the world spans a very wide range of activities, from sending an SMS text message to making a financial transaction. To enable an apples-to-apples comparison across the world, we use broadband per capita as a measure of such breadth and complexity (in some ways, mimicking the use of per capita income as a proxy for overall prosperity).
Second, there are differences across countries in terms of how private data is shared across agencies and whether there are digital identity frameworks that can help connect individuals to their digital activities. These institutional factors could make a difference to how data could eventually be pieced together. We do not call out these distinctions.
We chose the countries included in our analysis based on a few considerations: 1) Countries that are the most significant contributors to the global digital economy either because they are high on our earlier digital evolution index score or because they have strong momentum in their digital activities; 2) Countries that represent a reasonable spread in terms of region and socio-economic position; and 3) Countries that provided us with a solid data and evidence base to do the analyses.
Finally, an important consideration in determining accessibility is privacy. Privacy concerns and data protection regulations can help or hinder the ability of algorithms to develop new capabilities. We take the position for this analysis that an established framework for ensuring privacy and data protection and openness to the mobility of data is a net benefit and a positive contributor to the development of AI over the long term. As an example, consider the problem of fraud detection in financial transactions. Applications that draw upon insights from diverse geographic locations and multiple usage contexts help establish patterns of trustworthiness and help flag security risks; such applications benefit from systems that meet the accessibility criterion. That said, we acknowledge that in the near term there could be some countries, China being the pre-eminent example, where data-sharing between public and private sector agencies with very little mobility beyond the national borders could violate privacy and openness norms and yet yield a temporary advantage in training algorithms inside a “walled garden.”
Which of these criteria should be used in assessing a potential new world order, based on data? We believe accessibility should remain a foundational criterion. If one takes the view that the biggest and highest-impact AI applications are the ones that serve the greatest public purpose, access to data is key. In its recent study of AI for the public good, McKinsey cites access as one of the principal barriers: of the 18 bottlenecks identified by McKinsey, six relate to data availability, volume, quality, and usability.
The chart below shows what happens when the 30 countries we studied are mapped using three of our criteria:
The U.S. scores well on all three criteria. China, and this might seem counter-intuitive given prevailing wisdom, operates with a handicap if global accessibility of the data is considered essential for creating successful AI applications in the future. If the EU (currently including the UK) were to act as a collective, it would represent a key producer that could rival the U.S. Besides China, the other BRIC nations (Brazil, India, and Russia) could emerge as strong tier-two contenders, largely on the strength of the raw data they produce; however, they too would be handicapped by accessibility concerns.
A different set of implications emerges for smaller countries, such as New Zealand, or those unaffiliated with larger economic unions, such as South Korea, but with high openness and mobility in data flows; such countries would benefit from establishing trade agreements in data with other “open” countries and thereby overcome their natural limitations, either in terms of number of users or in terms of total broadband consumed within the country. The forms such trade or data-sharing agreements might take are yet to be determined; however, we can envision them as a distinct possibility, especially when we recognize that gross data product has value just like any other product that is freely traded today.
Of course, the direction of high-value AI applications is still emerging. There is also a risk of AI itself being over-hyped, misunderstood, and set up for disappointments down the road. But it’s clear that many important applications are already in use and more are coming. Our analytical framework is flexible enough to account for such fluidity. If we use a different set of criteria as being more relevant for driving successful AI applications, we find a different picture emerging. The chart below offers one such possibility, where only complexity and accessibility are considered.
When viewed in this manner, there is a more linear structuring of this “new” data-driven world order. The countries with high broadband consumption per capita and institutional openness (in the top right-hand portion of the graphic) emerge as the clear winners. One can imagine a scenario where the high complexity and mobility of data flows in the top right of the graphic allow for a more productive “free-trade” zone, where countries mutually benefit from tapping into each other’s data reservoirs.
Finally, we considered a scenario where all four criteria ought to be considered important. If we assign equivalent weights to all four, a ranking of “new” data producers and an updated world order emerges.
1. United States
2. United Kingdom
3. China
4. Switzerland
5. South Korea
6. France
7. Canada
8. Sweden
9. Australia
10. Czech Republic
11. Japan
12. New Zealand
13. Germany
14. Spain
15. Ireland
16. Italy
17. Portugal
18. Mexico
19. Argentina
20. Chile
21. Poland
22. Brazil
23. Greece
24. India
25. South Africa
26. Hungary
27. Malaysia
28. Russia
29. Turkey
30. Indonesia
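To make the equal-weighting idea concrete, here is a minimal Python sketch of how a composite ranking of this kind could be computed. It is not our actual methodology: we do not spell out here how the four criteria were scaled before being combined, so the min-max normalization is an assumption, as are the function names, the field names (`broadband`, `users`, `openness`, `population`), and the four made-up countries and their figures.

```python
CRITERIA = ("volume", "usage", "accessibility", "complexity")


def min_max(values):
    """Rescale numbers to the 0..1 range (an assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_countries(raw, weights=None):
    """Return (country, score) pairs sorted from highest to lowest score."""
    if weights is None:
        # Equal weights across all four criteria.
        weights = {c: 1.0 / len(CRITERIA) for c in CRITERIA}
    names = list(raw)
    # Map the raw inputs onto the four criteria; complexity is derived
    # as broadband consumption per capita.
    table = {
        n: {
            "volume": raw[n]["broadband"],
            "usage": raw[n]["users"],
            "accessibility": raw[n]["openness"],
            "complexity": raw[n]["broadband"] / raw[n]["population"],
        }
        for n in names
    }
    scores = dict.fromkeys(names, 0.0)
    for c in CRITERIA:
        normalized = min_max([table[n][c] for n in names])
        for n, v in zip(names, normalized):
            scores[n] += weights.get(c, 0.0) * v
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical figures, for illustration only (not real country data).
SAMPLE = {
    "A": {"broadband": 100, "users": 50, "openness": 0.9, "population": 60},
    "B": {"broadband": 300, "users": 200, "openness": 0.5, "population": 1000},
    "C": {"broadband": 20, "users": 4, "openness": 1.0, "population": 5},
    "D": {"broadband": 60, "users": 80, "openness": 0.2, "population": 250},
}

equal_weight_order = [name for name, _ in rank_countries(SAMPLE)]
two_criteria_order = [
    name
    for name, _ in rank_countries(
        SAMPLE, {"complexity": 0.5, "accessibility": 0.5}
    )
]
print(equal_weight_order)
print(two_criteria_order)
```

Passing a different `weights` dictionary, as in the second call, reproduces the kind of alternative scenario discussed earlier, where only complexity and accessibility are considered; with the sample figures, the large but less open country leads under equal weights, while the small, open, high-consumption country leads under the two-criteria view.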
Of course, these segmentations provide insight into where the major data producers are, based on a set of assumptions about what will be important for the highest-value applications in the future. Our purpose was to acknowledge the uncertainties and show how alternative assumptions yield different scenarios for the world order. A different segmentation and ranking would emerge if we were to ask a different set of questions focused on outcomes, such as the economic or geopolitical value through AI that might be assigned to each country, or how countries currently rank in terms of ease of doing digital business as they prepare for such a future. We are developing these in future research projects.
Data is the fuel of the new economy, and even more so of the economy to come. In declaring back in 2017 that the world’s most valuable resource is no longer oil, but data, The Economist said: “Whether you are going for a run, watching TV or even just sitting in traffic, virtually every activity creates a digital trace — more raw material for the data distilleries.” Algorithms trained on all these digital traces will be globally transformational. It’s possible that a new world order will emerge from them, along with a new “GDP” (gross data product) that captures an emerging measure of the wealth and power of nations. It is time we identified what the field looks like now that new competitive and collaborative opportunities are developing.