The potential impact of the worldwide data explosion continues to excite the imagination. A 2018 report estimated that every second of every day, each person produces 1.7 MB of data on average; annual data creation has more than doubled since then and is projected to double again by 2025. A report by the McKinsey Global Institute estimates that clever uses of big data could generate $3 trillion in economic activity, enabling applications as diverse as self-driving cars, personalized health care, and traceable food supply chains.
But adding all of this data to the system is also creating confusion about how to find it, use it, manage it, and share it legally, safely, and efficiently. Where did a certain data set come from? Who owns what? Who is allowed to see which parts? Where does it live? Can it be shared? Can it be sold? Can people see how it was used?
As data applications grow and become more ubiquitous, data producers, consumers, owners, and managers find they have no blueprint to follow. Consumers want to connect to data they trust so they can make the best possible decisions. Producers need tools to share their data securely with those who need it. But technology platforms fall short, and there is no true common source of truth to connect the two parties.
How do we find the data? When should we move it?
In a perfect world, data would flow freely as a utility accessible to all. It could be packaged and sold as raw material. It could be viewed easily, without complications, by anyone authorized to view it. Its origins and movements could be traced, eliminating any worries about dire uses somewhere along the line.
Today’s world, of course, does not work this way. The massive data explosion has created a long list of problems and opportunities that make it difficult to share bits of information.
Since data is created almost everywhere within and outside of an organization, the first challenge is identifying what is being collected and how to organize it so that it can be found.
A lack of transparency and sovereignty over how data is stored and processed creates trust problems. Today, moving data from multiple technology stacks to centralized locations is costly and inefficient. The absence of open metadata standards and easily accessible application programming interfaces can make data hard to access and consume. Industry-specific data ontologies can make it difficult for outsiders to benefit from new data sources. And with multiple stakeholders and hard-to-reach data services, sharing is difficult without a governance model.
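One common way to address the findability and metadata problems described above is to publish a machine-readable catalog entry for each data set. The sketch below is purely illustrative; the field names, paths, and URLs are assumptions for the example, not drawn from any published metadata standard or from HPE's platform.

```python
# Illustrative data-set catalog entry. All field names, paths, and URLs
# are hypothetical examples, not an actual metadata standard.
catalog_entry = {
    "id": "sales-transactions-eu",
    "owner": "retail-analytics-team",
    "origin": "point-of-sale systems, EU region",
    "location": "s3://example-bucket/sales/eu/",        # assumed storage path
    "schema_url": "https://example.com/schemas/sales-v2.json",
    "access_policy": "contract-partners-only",
    "lineage": ["raw-pos-feed", "dedup-job-v3"],        # how the data was derived
    "shareable": True,
    "sellable": False,
}

# With such a record, the article's questions become lookups, not guesswork:
print("Who owns it?        ", catalog_entry["owner"])
print("Where does it live? ", catalog_entry["location"])
print("Can it be sold?     ", catalog_entry["sellable"])
```

A real deployment would align these fields with an open vocabulary so that catalogs from different organizations remain interoperable.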
Europe is taking the lead
Despite the problems, large-scale data exchange projects are being undertaken. One, backed by the European Union and a non-profit group, is creating an interoperable data exchange called Gaia-X, where companies can share data under the protection of strict European data privacy laws. The exchange is conceived as a vehicle for sharing data between industries and a repository of information about data services around artificial intelligence (AI), analytics, and the internet of things.
Hewlett Packard Enterprise recently announced a solution framework to support the participation of companies, service providers, and public bodies in Gaia-X. The data space platform, which is currently in development and is based on open, cloud-native standards, democratizes data, data analytics, and AI by making them more accessible to domain experts and ordinary users. It provides a place where domain experts can more easily identify reliable data sets and securely perform analytics on operational data, without always requiring costly data movement to centralized locations.
By using this framework to integrate complex data sources into IT environments, companies will be able to provide data transparency at scale, so that everyone, data scientist or not, knows what data they have, how to access it, and how to use it in real time.
Data sharing initiatives are also at the top of companies’ agendas. A major priority companies face is verifying the data used to train internal AI and machine learning models. AI and machine learning are already being used extensively in business and industry to drive continuous improvements in everything from product development to recruiting and manufacturing. And we are just beginning. IDC projects that the global AI market will grow from $328 billion in 2021 to $554 billion in 2025.
To unlock the true potential of AI, governments and businesses must better understand the collective lineage of all the data that drives these models. How do AI models make their decisions? Are they biased? Are they trustworthy? Have untrusted people been able to access or change the data on which a company has trained its model? Connecting data producers with data consumers more transparently and more efficiently can help answer some of these questions.
Building data maturity
Businesses are not going to figure out how to unlock all their data overnight. But they can prepare to take advantage of management concepts and technologies that help create a data sharing mindset. They can ensure that they are developing the maturity to consume or share data strategically and effectively rather than ad hoc.
Data producers can prepare for a broader distribution of data by following a series of steps. They need to understand where their data is and how it is collected. Then they need to ensure that the people who consume the data can access the right data sets at the right time. That is the starting point.
Then comes the hardest part. If a data producer has consumers, who can be inside or outside the organization, it must connect those consumers to the data. That is both an organizational and a technological challenge. Many organizations want to govern the exchange of data with other organizations. Democratizing data, at a minimum making it findable across the organization, is a matter of organizational maturity. How do you handle that?
Companies that contribute to the automotive industry actively share data with suppliers, partners, and subcontractors. It takes a lot of parts, and a lot of coordination, to assemble a car. Partners readily share information on everything from engines to tires to web-enabled repair channels. Automotive data spaces can serve more than 10,000 vendors. But other industries can be more insular. Some large companies may not want to share confidential information even within their own network of business units.
Creating a data mindset
Businesses on both sides of the consumer-producer continuum can advance their data sharing mindset by asking these strategic questions:
- If companies are building artificial intelligence and machine learning solutions, where are teams getting their data from? How do they connect to that data? And how do they track that history to ensure the reliability and provenance of the data?
- If data has value to others, what is the monetization path the team is taking today to expand that value, and how will it be governed?
- If a business is already exchanging or monetizing data, can it authorize a broader set of services across multiple platforms, on premises, and in the cloud?
- For organizations that need to share data with vendors, how do you keep those vendors synchronized with the same data sets and updates today?
- Do producers want to replicate their data, or require consumers to bring models to it? Data sets can be so large that they cannot practically be replicated. Should a company host software developers on the platform where its data lives and move models in and out?
- How can workers in a data-consuming department influence data producers’ practices from the bottom up within their organization?
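The replicate-or-not question above is often framed as a data-gravity tradeoff: when a data set dwarfs the model, shipping the model to the data is cheaper than copying the data. The sketch below illustrates that arithmetic with purely hypothetical figures; the function names and the per-gigabyte transfer cost are assumptions for the example, not actual cloud or HPE pricing.

```python
# Hypothetical cost comparison for "move the data" vs. "move the model".
# The $0.09/GB egress rate and all sizes are illustrative assumptions.

def cost_move_data(dataset_gb, egress_per_gb=0.09):
    """Copy the full data set to where the model runs (one bulk transfer)."""
    return dataset_gb * egress_per_gb

def cost_move_model(model_gb, runs, egress_per_gb=0.09):
    """Ship the model to where the data lives, once per training/scoring run."""
    return model_gb * egress_per_gb * runs

# A 50 TB data set vs. a 2 GB model retrained weekly for a year:
data_cost = cost_move_data(50_000)       # one-time bulk copy of the data
model_cost = cost_move_model(2, runs=52)  # 52 round trips for the model
print(f"move data:  ${data_cost:,.2f}")
print(f"move model: ${model_cost:,.2f}")
```

Under these assumptions the model travels three orders of magnitude more cheaply, which is why large producers often host consumers' workloads next to the data rather than replicate it.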
The data revolution is creating business opportunities, along with a lot of confusion about how to find, collect, manage, and draw insight from data in a strategic way. Data producers and data consumers are increasingly disconnected from each other. HPE is building a platform that supports both public cloud and on-premises environments, using open source as the foundation and solutions such as the HPE Ezmeral software platform to provide the common ground both parties need to make the data revolution work for them.
Read the original article at Enterprise.nxt.
This content was produced by Hewlett Packard Enterprise. It was not written by the editorial staff of MIT Technology Review.