The Data Web

As we explored in the previous article, the prospects for data production over the next five years (to 2020) are both exciting and challenging. On the one hand, the supply of digital data is expected to grow exponentially; on the other hand, a significant percentage of these data may not be useful for anything.

Traditional architectures for data storage, especially in the pre-internet era, were designed to store data in files isolated from the outside world, creating true islands of data and information. This model gave rise to numerous problems, most notably data redundancy, which persists to this day. Over the years, data storage evolved into databases and, later, into clustered models such as distributed database systems and database federations [1].

Meanwhile, in mid-1996, Tim Berners-Lee published the article “The World Wide Web: Past, Present and Future”, which set out what the Web was meant to be at the time and what it should become. Already then, almost 20 years ago, Berners-Lee established that the Web should be a space for sharing information through which people (and machines) can communicate with each other. He also anticipated that people would interact with hypertext that is both intuitive and machine readable.

However, the Web we know today was built around hypertext, in the form of web pages, whose main focus is the presentation of information. Although Tim Berners-Lee foresaw data being read by machines, the current Web is interpreted primarily by humans.

The Web has opened up numerous ways of producing information over time: HTML pages, websites, portals, multimedia content, files of various kinds and, more recently with the “social era”, blogs and social media, among others. In other words, the Web has become a global information space that grows every day.

As the volume of information increased, new problems of information search and retrieval emerged. It quickly became clear that the human capacity to find information on the Web is very limited, and that locating and retrieving data on the Web should be done by machines. What was missing was data about the information, in a form that machines can understand. This data is known as metadata. In addition, the current Web is syntactic: search is done mainly by matching keywords against a large number of pages, which yields low precision. Furthermore, pages are integrated and linked in a poorly structured, manual way.
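To make the precision problem concrete, here is a minimal sketch in Python. The four pages, their `topic` metadata field and the function names are all hypothetical and invented for illustration; they do not come from any real search engine. It compares a purely syntactic keyword match with the same match narrowed by machine-readable metadata.

```python
# Hypothetical mini-corpus: each page has free text plus a hand-written
# "topic" metadata field. None of this is real data.
PAGES = [
    {"text": "The jaguar is a large cat native to the Americas.", "topic": "animal"},
    {"text": "The new Jaguar sedan has a quieter engine.", "topic": "car"},
    {"text": "Jaguar sightings rise in the Pantanal wetlands.", "topic": "animal"},
    {"text": "Used Jaguar prices fall as the market cools.", "topic": "car"},
]


def keyword_search(pages, keyword):
    """Syntactic search: return pages whose text contains the keyword."""
    needle = keyword.lower()
    return [page for page in pages if needle in page["text"].lower()]


def metadata_search(pages, keyword, topic):
    """Keyword search narrowed by the machine-readable topic field."""
    return [page for page in keyword_search(pages, keyword) if page["topic"] == topic]


def precision(results, wanted_topic):
    """Fraction of the results that are about the wanted topic."""
    if not results:
        return 0.0
    relevant = sum(1 for page in results if page["topic"] == wanted_topic)
    return relevant / len(results)


keyword_hits = keyword_search(PAGES, "jaguar")
metadata_hits = metadata_search(PAGES, "jaguar", "animal")

print(precision(keyword_hits, "animal"))   # 0.5
print(precision(metadata_hits, "animal"))  # 1.0
```

Someone looking for the animal gets all four pages from the keyword match, and only half of them are relevant. Once each page carries metadata that a machine can read, the same search returns only the two relevant pages.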

As a result, not all data can be found through traditional web search engines, and even less is possible with complex queries over data spread across multiple pages, such as “What are the full names of the captains of all the soccer teams that have won a World Cup?” That is, just as in the era of isolated files, the data on the Web still live in isolation from one another.

Fortunately, various institutions and researchers around the world are well aware of this paradox, most notably the World Wide Web Consortium (W3C). W3C’s mission is to lead the Web to its full potential by developing protocols and guidelines that support large-scale Web development. Its vision for the Web involves sharing knowledge and building trust on a global scale. This vision also calls for a single Web (One Web) that adopts open principles and standards.

I do not need to explain much about what the Web has to do with the vast supply of data on a global scale, do I? After all, where do most of these billions and trillions of pieces of data travel?

To achieve this vision, W3C has been working hard to build a new Web, based on open principles and standards, that goes beyond the Web we know, which is composed primarily of files and HTML pages. This new, more connected and open Web is being called the “Data Web”.

In the “Data Web”, data are expected to be easy to locate and to be associated with semantic elements such as vocabularies. Data are also treated as resources, and each resource needs a unique identifier that allows it to be accessed individually. Moreover, the way data relate to each other changes from traditional tables and database schemas to a subject-predicate-object structure, known as a triple, among other advances.
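As a rough illustration of the triple idea, the sketch below stores a handful of subject-predicate-object triples in plain Python and answers a simplified version of the World Cup question quoted earlier. Everything in it is an assumption made for the example: the `http://example.org/` identifiers, the vocabulary terms (`wonCup`, `captain`, `fullName`), the invented names and the helper functions. It also attaches a single captain to each team rather than one per tournament. A real deployment would use an RDF store and a query language, not a Python set.

```python
# All identifiers below are hypothetical placeholders, not a real vocabulary.
EX = "http://example.org/"
WON_CUP = EX + "vocab/wonCup"
CAPTAIN = EX + "vocab/captain"
FULL_NAME = EX + "vocab/fullName"

# Each fact is one (subject, predicate, object) triple. Resources are
# identified by unique URIs; plain strings are literal values.
TRIPLES = {
    (EX + "team/A", WON_CUP, EX + "cup/1"),
    (EX + "team/A", CAPTAIN, EX + "person/1"),
    (EX + "person/1", FULL_NAME, "Alice Example"),
    (EX + "team/B", WON_CUP, EX + "cup/2"),
    (EX + "team/B", CAPTAIN, EX + "person/2"),
    (EX + "person/2", FULL_NAME, "Bruno Example"),
    (EX + "team/C", CAPTAIN, EX + "person/3"),  # team C never won a cup
    (EX + "person/3", FULL_NAME, "Carla Example"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def captains_of_winners(triples):
    """Follow team -> captain -> full name for every team that won a cup."""
    names = set()
    for team, _, _ in match(triples, p=WON_CUP):
        for _, _, person in match(triples, s=team, p=CAPTAIN):
            for _, _, name in match(triples, s=person, p=FULL_NAME):
                names.add(name)
    return sorted(names)


print(captains_of_winners(TRIPLES))  # ['Alice Example', 'Bruno Example']
```

Because every resource has its own identifier, facts published in different places can be joined simply by following matching identifiers. Keyword search over isolated pages cannot do this.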

Fortunately, despite the problems described in the previous article, the outlook is promising given all the work being done by countless experts worldwide under the coordination of W3C. In the next articles, we will explore the Data Web further, looking at how it is being structured, its new concepts and its relevant applications.
