Sunday, January 8, 2012

2012: The end of the world of Web as we know it

When we talk about the Web, we usually mean hypertext. However, that is only the medium-sized answer. The small answer is: a hyperlink to an information resource on the Internet. The small idea that changed the world consists of one HTML tag. Without this tag, HTML would be a trivial markup language for text formatting, no better and no worse than others. By that moment, the ideas underlying the Web had existed for almost 30 years: HTML descends from SGML, which descends from GML, developed at IBM in the 1960s. Hyperlinks themselves were used in systems such as NLS and the HyperCard database well before 1991, when the first draft of the HTML specification was issued. Then why is the hyperlink revolutionary? First, it provides immediate transfer to another computer resource; second, it carries a description of that reference. Each piece existed before: (a) URLs are quite similar to file or network references, (b) some applications could follow such references, and (c) you could always describe a reference in a text file. The hyperlink, however, converged all of this into one entity, which finally made it easy both to navigate between computer resources and to describe them.

Of course, there is also the large answer, because a number of factors made possible the Web in general, and the hyperlink in particular, and without them we could not live online:
1. The Internet. Without it, hypertext could have been just another data format, possibly long forgotten by now.
2. Protocols (link, internet, transport, and application layers). Routing deserves special mention here: it hides the details of connecting to a remote host. Without it, we would have to specify every detail of the intervening subnetworks. The other protocols are no less important.
3. DNS and the domain system. Can you imagine an ad like "Visit our site at 198.51.100.42"? Or "Visit our site at fedc:ba09:8765:4321:fedc:ba09:8765:4321"?
4. Hypertext. The Internet by itself could not revolutionize anything. You can see this easily when you work in a local network without hypertext: you pass network references around as plain text, copy-paste them, lose their descriptions, and share them through emails or forums, which is tedious.
5. Text formatting. This factor matters too, because it made hypertext user-friendly. Without it, hypertext would be yet another text format known only to geeks.
6. Markup tags as plain text. Although today we have plenty of HTML editors, the simplicity of writing HTML by hand is a major reason it became so widely used. It also allowed HTML to be embedded in text fragments, which extended its usage even further.
7. Free form. No less important, HTML's leniency enhanced the survival of information: hypertext can contain invalid tags and unknown attributes, omit headers, and so on. Without that, HTML could have been displaced by another format requiring a specialized editor to check validity.
8. Forms. They gave birth to Web applications, without which the Web revolution would not have been possible.
9. Email and other communications. Without them, the "Webolution" would not have been possible either: how else would you disseminate news and notify people about new information or updates?

However, although hypertext facilitates content integration, it is not always efficient. Have you ever tried to maintain a hypertext page describing a directory? First, you need to synchronize it every time the directory content changes. Second, you must synchronize the content itself (for example, if you described some picture on two Web pages, then, when the description changes, you must update both pages). Hypertext had other inefficiencies too, which forced further Web evolution in several directions:
1. Imitating desktop applications.
2. Extended usage of Web capabilities (communicating, socializing, collaborating, etc).
3. Information management enhancements (search, etc).
4. Extending hypertext (data, semantics, etc).


Gradually, users, developers, and designers understood that hypertext alone could not represent everything they wanted. This drove the creation of JavaScript, DHTML, CSS, Ajax, and so on. All of these map onto the MVC pattern, where the model (content) corresponds to the text inside HTML, the view to its formatting, and the controller to JavaScript code. However, even though CSS was conceived to separate content (model) from form (view), and JavaScript can in theory be separated from HTML, in practice all three components are mixed together. Real-world usage of CSS and JavaScript shows that view and controller can be separated from the model when applied globally (to a page as a whole), but many local tasks are solved in mixed mode: for example, moving a specific HTML element or attaching behavior to it. Ironically, this is the fate of almost all technologies: they end up applied in ways their designers neither predicted nor recommended.

Did these technologies revolutionize anything? They certainly transformed hypertext itself. But, generally speaking, they just moved hypertext closer and closer to desktop applications. Achieving desktop-like functionality in a Web application still requires more effort, a consequence of development complexity (hypertext was not designed for application development). That is, this is an eternal pursuit rather than a revolution.

Later, many realized that complex behavior and complex design are difficult to implement in hypertext. This is natural, because they were never part of its design. Therefore Flash, HTML5, and the like were created, and different media (audio, video) were used more and more frequently. But multimedia progress is not specific to hypertext; it is part of the general progress of computer systems. Is there any revolution here? Certainly not. It is mere coincidence that these media appeared in hypertext in parallel with their broader usage on personal computers. Of course, when we see some sites posting their news only as video, it looks as if everything could soon turn into "sheer television". This won't happen, if only because some information is difficult to represent as video.


Though the term Web 2.0 was coined in the 2000s, its roots can be traced back to the mid-1990s, when personal Web pages became widely popular. Everyone wanted to broadcast themselves to the world, which was not so simple. First, it required knowing HTML basics. Second, it demanded taste (some of us still shudder recalling certain pages from the 1990s) or money (for a designer). Third, you needed to stay in touch with the world, so you had to have an email address or even a forum (chat). When the first Web euphoria passed, the requirements for a Web page had grown, but they were simultaneously simplified by various services offering a uniform approach. Enter social networks.

The Web 2.0 term was coined for more than that, though. The Web changed itself: Rich Internet Applications (a consequence of MVC improvements), enhanced search, tagging (which begot folksonomies), wikis, and so on. Of course, Web 2.0 is inseparable from the progress of Web applications, of hypertext itself, and of the computer industry in general. For example, broad usage of animation and video was impossible in the 1990s because hardware constantly lagged behind. The same goes for bandwidth, which at first did not allow Web applications to come close to desktop ones. That is, the Web evolved rather than changed the direction of progress. Did social networks revolutionize anything? In some sense, yes. But at the same time, most of their traffic is informational "noise": samples of "read and throw away" or "see and forget". Fortunately, this anarchy is alleviated by collaboration sites (wikis, etc). Still, this is just a continuation of the Webolution, which simply came later than it could have (in the 1990s).

Though a revolution in the name of one company sounds too Google-centric, this company has done more for the Web than many others taken together. First of all there is search, then maps, then innovations here and there. On the other hand, the value of search is sometimes exaggerated, though 10 years ago it changed the way we look for information. Back in the 1990s, the academic approach prevailed: information was to be categorized into hierarchies, the so-called portals, while search was a secondary tool. Analogies with the real world often play dirty jokes on computer technologies. Their creators draw parallels with the real world, whereas the computer environment is in some sense richer than that. Thus, portals were organized like library catalogs, which in the real world are more efficient than searching for a string throughout books. In computers, everything is quite the contrary: a search can be more efficient than using a hierarchy. Google proved that by making search better than anyone had before. Today, however, all recent Google innovations either fail (like Wave) or change little (like the most recent search enhancements, which are rather cosmetic and sometimes even irritating).

Worse, search sometimes works plainly inefficiently and gives absurd results. This is a direct result of overrated PageRank, which is rather a statistical hack. After all, what is statistics? What can the average temperature across a hospital tell you? It helps to understand tendencies, and it can help a search engine rank results that are more relevant than others. But that should come after precise search is available, and modern search is not precise. If you have ever read Google's help on searching, all its advice is about making a query simpler. This is quite natural, because modern search works efficiently only for plain queries: ones consisting of 1-3 words, or coinciding with some identifier (like "New York Times"), or where you can predict a page title in advance. Such an approach works to some extent, but as soon as a query becomes more complex (when words are linked by complex relations, and when a query grows to 5 or more words), the search starts giving incorrect results or none at all. Even if you call this a revolution, now is the time to revise its results.
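The limitation can be illustrated with a toy retrieval sketch (the documents and queries below are invented for illustration): an index that treats a query as a bag of words happily returns documents where the words merely co-occur, without the relation the query asks about.

```python
# Toy bag-of-words retrieval: a sketch of why plain keyword search
# ignores the relations between query words (documents are invented).
docs = {
    1: "the cat chased the dog across the yard",
    2: "the dog chased the cat across the yard",
}

def index(documents):
    """Build an inverted index: word -> set of document ids."""
    inv = {}
    for doc_id, text in documents.items():
        for word in set(text.split()):
            inv.setdefault(word, set()).add(doc_id)
    return inv

def search(inv, query):
    """AND-match: return ids of docs containing every query word."""
    sets = [inv.get(w, set()) for w in query.split()]
    return set.intersection(*sets) if sets else set()

inv = index(docs)
# Both documents match "cat chased dog", although only one of them
# states that the cat did the chasing: word-level matching cannot
# tell subject from object.
print(sorted(search(inv, "cat chased dog")))  # [1, 2]
```

This is of course far cruder than a real engine, but the blindness to word relations is the same in kind as the one the paragraph describes.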


As soon as hypertext became generally available, its architects were disappointed with how it was used. One of its purposes was text structuring, whereas it was applied mostly visually, usually with presentational tags (which some adepts consider a fault). Here came the first attempt to adjust Web evolution: XML. It was designed as an extensible markup language emphasizing simplicity, generality, and usability over the Internet, as expressed in the 10 design goals of its specification. Already here we have a problem: markup is meaningless unless it is understood by a human. The same problem applies to any data format: meaning resides not in the data, and not even in the format (metadata), but in the understanding of both.

Moreover, XML does not work as its designers envisioned. Instead of human-readable text markup, XML became a data serialization format that happens to be based on plain text. Many complain about its complexity and verbosity. An XML document of even middling complexity is hardly human-readable (recognizing the information in it is too complicated); that is, such a document can effectively be read only by an application. But if so, then XML is merely a textual (as opposed to binary) data form, made affordable by larger storage. Convenient, yes; revolutionary, no.
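The point can be seen in a few lines of Python with its standard library (the record itself is invented): even a tiny structure serialized as XML is trivial for a program to traverse, yet already noisy to read as text.

```python
import xml.etree.ElementTree as ET

# A tiny invented record serialized as XML: transparent to a parser,
# already tiresome to a human eye once nesting appears.
text = (
    '<person><name>Alice</name>'
    '<address><city>Paris</city><zip>75001</zip></address></person>'
)

root = ET.fromstring(text)
# The application, not the reader, recovers the structure.
city = root.find("./address/city").text
print(city)  # Paris
```

Nothing here requires a human to read the angle brackets; the format serves the machine, which is exactly the serialization role described above.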


The conventional Web changed the world in 10 years. Over the same 10 years, the Semantic Web became just another technology with certain benefits and shortcomings. Why? The Semantic Web is a Web of data (as its adepts declare), considered in the context of machine-understandable information (metadata). One of its most interesting promised applications is intelligent agents, which would help us search for information more efficiently. However, it is 2012, yet only some of us have ever heard of it, and few of us understand what it is for. And we are talking not only about ordinary users; developers are not much interested in it either. Remember the 1990s, when every developer was eager to learn new standards and contribute something? Nothing similar is happening for the Semantic Web. Most developers either know only theoretically what it is, or do not want to learn it because it is awkward, or think it is not applicable at all. Semantic Web experts, meanwhile, keep describing how good everything will be. Someday. Something is wrong here. History knows many technologies that could have changed the world but didn't (CORBA, OS/2, etc).

So what is wrong with the Semantic Web? A reality check? Its architects plainly state that they built on the achievements of artificial intelligence, made long before the Semantic Web. However, these achievements were known only to a narrow circle of experts. Is that a sign of success? If not, why should a new text format succeed where an old (possibly binary) one didn't? Only because it is used on the Web? The history of the Web shows that successful technologies are (1) completely new ones, (2) ones already used successfully before, or (3) ones that became successful in parallel with the Web's progress. None of these applies to the Semantic Web. And there are problems with the Semantic Web's basics too.

Why does the Semantic Web use URLs to identify not only information resources but also things of the real world? There is an evident problem: an information resource may describe things of the real world or abstract conceptions (that is, everything), but it is only a part of everything; not everything is a part of an information resource. This is quite a serious problem. The principles of identifying computer resources and of identifying things and conceptions are very different. Would anyone want to use library classification codes for people or car names? Then why should this be acceptable for the Semantic Web? Moreover, this choice resulted in many URLs with nothing behind them, because they are used purely as identifiers. And a URL is not a very reliable identifier: it depends on a site, a web server, a file system, any of which may simply become unavailable. Such usage of URLs breaks one of the main advantages of the conventional Web: the hyperlink and its behavior.

Why does the Semantic Web use triples for semantics? The arguments are quite solid: triples are time-tested in the AI field, and any information may be represented with them. Of course, there are no less solid counterarguments. Not everything researched by AI science is successfully used in practice. The triple is not the only model that can represent anything: any general-purpose programming language, a relational database, natural language, and some other forms can do this too. The difference is in how difficult each is to apply, and judging by the Semantic Web's dissemination, its model is not as good as some think. Quite an evident hint: why is the ternary relation the base one, when we use a lot of unary and binary relations? For graph representation? But that could be achieved in many other models too.

The deeper reason is that the triple model is arbitrary, or, rephrasing George Box: "All models are wrong, but some are useful." It is based on the subject-predicate-object relation, which comes from natural language sentences like "I take a book". However, already "I go home" is not so straightforward: "home" is an object only abstractly; in fact, it indicates a place. The usage of triples is stipulated by our living in a space-time continuum, where each action involves at least two things. Natural language abstracts this and fits any situation into such a form. For example, "I am a user" has no action inside, because "is" is a relation between "I" and "user". Such a model works in natural language, but only because a human knows whether an action or a relation is meant. The Semantic Web goes further and forces us to break any situation into triples. For example, "Today I go home quickly" should be decomposed into "I go home", "I go today", and "I go quickly", whereas natural language treats each word as a separate part of speech, of the sentence, of the language, and so on.
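Representing the triples as plain Python tuples (the predicate names here are my invention, purely for illustration) shows how a single sentence gets shredded:

```python
# "Today I go home quickly", forced into subject-predicate-object
# triples. Predicate names are invented for illustration; RDF would
# additionally require minting a URI for every one of them.
sentence = "Today I go home quickly"

triples = [
    ("I", "go", "home"),         # the core action
    ("I", "go-when", "today"),   # time squeezed into its own triple
    ("I", "go-how", "quickly"),  # manner squeezed into another
]

# Each adverb costs an extra triple, and the fact that all three
# statements describe one single event is no longer stated anywhere.
subjects = {s for s, _, _ in triples}
print(len(triples), subjects)  # 3 {'I'}
```

The sketch makes the cost visible: five words become three separate statements whose unity has to be reconstructed by whoever reads them.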

Of course, such a model can work in some situations. But there are well-grounded doubts about its efficacy, because after 10 years the Semantic Web is still an expensive toy. Compare this with the conventional Web, which was always quite affordable both in understanding and in cost. Machine orientation played a dirty joke on the Semantic Web: it is so deeply oriented toward machines that humans cannot use it appropriately. The very experts of the Semantic Web still admit there is no good representation for it, no human-understandable interface. This partly explains why developers can hardly understand what it is. The problem might be partially resolved with microformats, which use HTML for semantics; however, they are restricted and not extensible.


Today we may confidently state that the potential of the Web (or rather, of the hyperlink) is already exhausted: Web applications still cannot reach the level of desktop ones, multimedia progress is not specific to the Web, search does not advance because it cannot handle complex queries, XML did not fulfill its designers' expectations, and the Semantic Web is too expensive and awkward to influence the Web. Yet there are many areas where even the conventional Web can be improved. One simple hyperlink changed the world once; one semantic reference can do it again.

The idea is simple: any word, phrase, sentence, article, or book is a semantic reference. A semantic reference may lead to things, to conceptions, and to the information resources that describe them. A semantic reference is self-descriptive, so it need not be defined explicitly (though it may require some specification to avoid ambiguity). For example, "HTML specification" refers not to a specific file on a specific server, but to all HTML specifications that have been or will be published. Of course, we are talking about the "Web of Things", an idea that has been in the air for a long time, though nobody clearly realizes how it would work. But the "Web of Things" is not the full story. The question is actually broader: why not provide semantics for ordinary users? Can human beings tame semantics? Yes, with natural language. But there are still no applications that handle natural language reliably (which, by the way, is an AI area too). And none are needed, if there is a human-friendly way to define semantics.

And this is the key problem with semantics: humans and machines handle it quite differently. Any data format is a definite ordering of information according to that format's rules. Each format (including textual ones like XML and the Semantic Web formats) has its own rules, so we have to solve compatibility problems between them. The story is different for natural language: there is a small set of rules used for any domain, and compatibility concerns identifiers instead. That is, data formats deal with coarse-grained compatibility (a whole format against a whole format), while natural language deals with fine-grained compatibility (an identifier against an identifier).
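The difference can be sketched with two invented vocabularies (all identifiers below are assumptions of mine, not any standard): format-level compatibility would demand converting one whole schema into the other, while identifier-level compatibility only needs an equivalence map for the identifiers that actually differ.

```python
# Two invented datasets naming the same concepts differently.
site_a = {
    "browser:opera": "a web browser",
    "art:opera": "a musical drama",
}
site_b = {"app:opera-browser": "a web browser"}

# Fine-grained compatibility: align single identifiers, one by one,
# instead of translating the whole of site_b's format into site_a's.
same_as = {"app:opera-browser": "browser:opera"}

def lookup(identifier):
    """Resolve through the equivalence map, then consult site A."""
    canonical = same_as.get(identifier, identifier)
    return site_a.get(canonical)

print(lookup("app:opera-browser"))  # a web browser
print(lookup("art:opera"))          # a musical drama
```

Adding one more shared concept costs one more entry in the map; nothing else in either vocabulary has to change, which is the fine-grained property the paragraph describes.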

Some may ask whether ordinary users need semantics at all. Have you ever used forums? Have you ever asked for some information and got answers like "This has already been discussed, look in other topics" (even if there are 100 topics, 30-50 pages each)? How many times have you searched for some simple fact (like a site address) that you visited a week ago or that a friend sent you? Such examples clearly show that humans have no access to semantics: they simply cannot retrieve facts from a discussion or an email. All this happens because information is ordered by computer rules, not by human ones. Users are forced to order information with directories, files, pages, emails, and other information containers, whereas humans order information by meaning, topic, context, etc. Of course, such ordering is used in computers too; however, it depends on data formats and applications, which makes it computer-dependent as well.

So what is necessary to let humans work with semantics too?
1. A human-friendly semantics interface. The solution is evident: hypertext is already such a human-friendly form; it merely has to mark up semantics appropriately.
2. Human-friendly identification. Its purpose is to make natural language less ambiguous and to allow linking it with computer entities. For example, to discern opera as art from the Opera browser, we can use art:opera and browser:opera identifiers. In compact form they could look rather like hints: <s-id="art">opera</s-id> and <s-id="browser">opera</s-id>.
3. Identification should use cloud routing, because the same identifier can have different meanings for different subjects. For example, "home" can be used as the same identifier by different people, and routing will decide which specific meaning it has.
4. Human-friendly semantic relations. Their set should be restricted and understandable for ordinary users. We can use quite simple structural relations: (a) identification, equality, or similarity (usually expressed with "is": "I am a user"), (b) specifying/generalizing or "part-of" relations (expressed with "of"/"have": "I have a home", "home of mine"), (c) association, an otherwise undefined relation (that is, all other relations, whose sense is defined by the identifiers used). In essence, we only have to decide whether two things or conceptions are peer entities, parts of each other, or linked in some other way. Of course, this does not cover all possible relations, but it is enough to understand how pieces of information are linked with each other.
5. Semantics as a whole is a graph of identifiers linked with relations. However, human-friendliness means you are not forced to define semantics for all the information (for a whole page); instead you may semantize only part of it.
6. Semantic wrapping for computer entities. Files, graphical controls, web page elements, etc have no meaning by themselves; meaning is usually attributed by the human mind. However, to order information efficiently, we need meaning linked directly with these elements.
7. The notion of a textlet, and questions-and-answers as the base form of information exchange. In essence, a search for an answer is a comparison of two graphs: the question's and the answer's. For example, "Do you go home?" returns true only if it coincides with the "We go home" graph, and "you" in the question matches "we" in the given information.
8. Compatibility should be fine-grained, applicable even to a single identifier against another single identifier (or to complexes of identifiers and relations).
9. Context is needed to make a meaning area narrower or wider. For example, a context of tools may narrow to one of hammers, but we may also extend a context of hammers to one of tools.
10. A semantic line interface (SLI) may combine features of the command line (CLI) and the graphical interface (GUI). CLI features may include top-level access to any identifier and an ordering of identifiers similar to natural language. GUI features may include convenient representation of information in checkboxes, lists, etc.
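A minimal sketch of points 2, 4, and 7 above (all identifiers, relation names, and the matching rule are my assumptions for illustration, not a defined standard): statements as (subject, relation, object) edges over namespaced identifiers, with a question answered by matching its graph against the stored one.

```python
# Sketch of the proposal above: namespaced identifiers, a restricted
# relation set ("is", "of", "assoc"), and question answering as graph
# matching. All names here are illustrative assumptions.
facts = {
    ("person:we", "is", "state:going-home"),
    ("person:i", "of", "group:we"),  # "I" is part of "we"
}

# Peer identifiers a matcher might consult: "you" asked of "we".
peers = {"person:you": {"person:we"}}

def answer(subject, relation, obj):
    """Return True if the question edge matches a stored edge,
    allowing peer identifiers to stand in for the subject."""
    candidates = peers.get(subject, {subject})
    return any((s, relation, obj) in facts for s in candidates)

# "Do you go home?" matches the stored "We go home".
print(answer("person:you", "is", "state:going-home"))  # True
print(answer("person:you", "is", "state:going-work"))  # False
```

A real matcher would of course need contexts (point 9) and graph-to-graph rather than edge-to-edge comparison, but even this toy shows how a question and a statement can meet on identifiers rather than on strings.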

Humans should have access to semantics, at least because machines cannot manage it automatically. There are many irresolvable ambiguities in natural language. If you went to the opera one day, nobody can guess when, where, and which opera you attended unless you answer the corresponding questions (or provide that information some other way). Machines are still just number and text grinders; they cannot think. That's why we need a human-friendly Semantic Web.

More details can be found in Future of operating system and Web: Q&A
