Monday, January 23, 2012

How to get developers to document their code? Make it reasonable.

A follow-up to "How to get developers to document their code"

Documenting code is a must, if only because it saves time. It helps to understand why and how the code was written, which is helpful not only for other developers but also for yourself (if you return to the code after, say, a year). This is especially true if the code base is really huge (thousands of files, millions of lines), and especially if you are new to it. But even otherwise you may not understand all implications of the code and why it was written exactly this way: a requirement, a management decision, a customer request, a contract, a design choice, etc. However, documenting has one major drawback: it is affected by the human factor, which leads to its misuse.

The worst example of such misuse is when someone follows the rule too strictly and duplicates the behavior of code in natural language. Of course, any tool can be misused, but in this case the problem is different: developers do not understand, or misunderstand, the reason behind documenting. You may persuade them. But will it work? Since the very beginning of programming, everyone has been persuading junior developers to write good code in an appropriate style and formatting, to avoid obstructive stereotypes and anti-patterns, to design early, and to document appropriately. Eventually some developers understand the reasons behind this, but some do not. And, unfortunately, some of them happen to work near you. Persuade further? Motivate? Even this does not work sometimes. Or it works, but only after some time, when the code is already infected with human arrogance ("I know better what to do").

The situation is quite the opposite in areas where reason works. Have you recently heard much fuss about the usefulness of object-oriented programming? About bug tracking? About patterns or refactoring? No? The secret is that persuading is not needed here: reason works much more efficiently. Evidently, it should be supported by availability. Bug tracking became widely used only when it was both described theoretically and implemented.

But what is the reason behind code documenting? Today it is mostly focused on explaining the meaning of code to other people, which is needed because there are no explicit ways to communicate it. Explanation is linking between meanings; however, modern applications and information are scattered (requirements are written in a word processor, coding and compiling happen in another application, bug tracking in yet another, etc.). Of course, applications can be integrated or made compatible on the basis of some format, but in many cases this is impossible. Therefore, today information can be exchanged only implicitly, with the help of natural language and the human mind. And this creates another problem, because natural language is ambiguous (in contrast with precise code) and humans interpret text and meaning very differently.

What if we could fill the gap between applications without natural language? In that case we could reduce the need to explain the meaning of code. Is it possible today? Can we link code with other meanings and documents explicitly? A file reference is not appropriate at all, because it works only on a specific computer. A hyperlink does not fit the purpose either, because it depends on the availability of a server and a path (that is, if the path changes, you lose the meaning behind the reference). Another problem is more subtle: a hyperlink refers to an information resource, which means that (1) our future usage will be tightly coupled with exactly this resource, and (2) there is still no link with natural language, for which we should generalize things (and which is what you usually do with code comments). Imagine you use a wiki link like http://mycompany.com/wiki/amadeus/Rating, which generalizes and links your code with some functional area. Everything looks great except for one "but": you need to persuade developers to fill this wiki page with information about the area. And if you have to persuade, it is a clear sign that the activity is not reasonable enough (at least, this is more or less true for programming and related things, where reasonable alternatives are possible). Of course, you can use tags (keywords), but they have another problem: they are text and not precise enough. Moreover, though they help to manage information, their growing number forces you to manage the tags themselves.

Are there alternatives? Today, no. However, I propose to use a semantic link for that. A semantic link is a bridge between precise meaning (like data or code) and ambiguous or generalized meaning (like natural language), and it allows making things as precise or as general as necessary. It allows linking code with a real-world domain, because it works as a reference to real-world things and conceptions too. It is not coupled with a specific information resource. For example, "Rating" as a string is ambiguous, because it is not clear which rating we are talking about. However, we can make it a little more precise with the link <s-id="My Company:Amadeus:Rating">Rating</s-id>. But, unlike a file reference or a hyperlink, a semantic link is not just an identifier: it is meaning itself and, partially, a natural language identifier (you can use a more compact representation of it: <s-id="My Company:Amadeus:">Rating</s-id>). This identifier refers to the conception of music rating as it is seen in the Amadeus project of My Company. At the same time, it refers to derivable specific information, namely that (a) the rating is exactly as designed and implemented by the Amadeus project of My Company, as well as to derivable general information, namely that (b) the rating is about music. This identifier may also refer to a set of documents which describe the rating system. And, because any meaning is itself a filter, this identifier is a filter for everything that relates to the rating system.
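To make the two forms of the identifier concrete, here is a minimal sketch in Python. It assumes that the s-id markup is exactly as written in this post (it is only a proposal, not an existing standard) and that a trailing colon in the compact form means "append the marked-up text". The names S_ID, expand, and parse_links are hypothetical and exist only for illustration.

```python
import re

# Hypothetical pattern for the s-id markup proposed in this post (not a standard).
S_ID = re.compile(r'<s-id="([^"]*)">(.*?)</s-id>')


def expand(identifier, text):
    """Expand the compact form: a trailing ':' is assumed to mean 'append the marked-up text'."""
    return identifier + text if identifier.endswith(":") else identifier


def parse_links(markup):
    """Return (text, path) pairs; the path runs from the most general part to the most specific."""
    links = []
    for identifier, text in S_ID.findall(markup):
        text = text.strip()
        links.append((text, expand(identifier, text).split(":")))
    return links


full = '<s-id="My Company:Amadeus:Rating">Rating</s-id>'
compact = '<s-id="My Company:Amadeus:">Rating</s-id>'

print(parse_links(full))
print(parse_links(compact))
```

Under these assumptions both forms resolve to the same path, so the leading parts of the path ("My Company", "Amadeus") can serve as progressively more general filters.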

How does it solve the problem of code documenting, and how does it make documenting reasonable? First, because a semantic link may be as precise as necessary (to avoid the ambiguities of words) or as general as necessary (to avoid a too specific meaning of code), we may link code and natural language more flexibly. Second, because a semantic link does not refer to specific resources, we may apply it anywhere (provided that semantic links themselves are supported). Third, because a semantic link is not specific to programming, it may be used in documents too (for requirements, etc.). The main outcomes of these features are: (1) you may link a description in natural language (like a requirement) with code, and (2) you may navigate (filter) the project more easily.

Example 1
For example, the rating requirement may be written as "User can rate releases and, optionally, review them." So, code documenting starts here. A developer just needs to make the meaning of this requirement more precise. Consider it a typical example of top-down design. It may involve code generation, or a developer may write the code by hand, like this:

//<s-id="My Company:Amadeus:Rating">
void rate() {
    // implementation omitted
}

//<s-id="My Company:Amadeus:Review">
void review() {
    // implementation omitted
}

Bottom-up design is possible too: a developer may write some code which will be generalized in a description only later. Why is it reasonable? Because a manager sees which requirements are covered by code, and because a developer sees which code is covered by descriptions, or whether code and requirements (which may change) still correspond, etc. Moreover, as the code grows, it becomes easier to navigate between the code itself, documents (requirements, specifications, manuals), bug tracking, version control, database, and interface, because each of them may use semantic links too.
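The coverage check described above can be sketched roughly as follows. This is only an illustration in Python, assuming that s-id comments sit directly above the function they describe; the REQUIREMENTS register, the "Sharing" requirement, and the helper names are invented for the example and are not part of the proposal.

```python
import re

# Source text annotated the way Example 1 shows.
SOURCE = '''\
//<s-id="My Company:Amadeus:Rating">
void rate() {
}

//<s-id="My Company:Amadeus:Review">
void review() {
}
'''

# Hypothetical requirement register: semantic identifier -> requirement text.
REQUIREMENTS = {
    "My Company:Amadeus:Rating": "User can rate releases.",
    "My Company:Amadeus:Review": "User can optionally review releases.",
    "My Company:Amadeus:Sharing": "User can share a rating.",  # invented for illustration
}

LINK = re.compile(r'//\s*<s-id="([^"]+)">')
FUNC = re.compile(r'\b(\w+)\s*\(')


def index_code(source):
    """Map each semantic identifier to the functions declared right after its comment."""
    index, pending = {}, None
    for line in source.splitlines():
        link = LINK.search(line)
        if link:
            pending = link.group(1)
            continue
        func = FUNC.search(line)
        if pending and func:
            index.setdefault(pending, []).append(func.group(1))
            pending = None
    return index


def uncovered(requirements, index):
    """List requirement identifiers that no code refers to yet."""
    return sorted(set(requirements) - set(index))


code_index = index_code(SOURCE)
print(code_index)
print(uncovered(REQUIREMENTS, code_index))
```

The same index could be read in the other direction, to list code that no requirement refers to, which is the developer's side of the picture.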

Example 2
Take an easier but more intriguing example: string utils, for which a project uses either some utility library or a home-brewed one. But how do you make everyone aware of it? Of course, you can send an email describing what we use in our project, or describe it somewhere on your wiki/portal, or announce it at a meeting. Unfortunately, an email may be missed or forgotten, and there are too many rules and documents on the wiki/portal, so it is not easy to find the one which relates to string utils. Finally, a new developer may join your team later, when everyone already knows these things and assumes that all other people know them too. Quite naturally, the new developer starts to write their own string utils, because he or she should not and will not ask about every feature in the code (asking too much and asking too little both have drawbacks and side effects). Of course, code review helps, but it depends on the human factor too: developers may be ill or on vacation, may skip some files, may be inattentive to details because of family problems, etc.

Instead, any developer should reach the string utils already in use as soon as he or she intends to use them. That is, autocomplete should work not only for specific fields and methods of a class, but also for the semantics of code. For example, you type "string1 is not empty" and autocomplete proposes to convert it into "!StringUtils1.isEmpty(string1)". A semantic link is needed for that too, because it is not sufficient just to add a comment that StringUtils.isEmpty(String s) "Defines whether a string is empty or not"; the method should also be linked to meaning, like this:

//<s-id="empty"/> <s-id="#param" s-is="#empty">
boolean isEmpty(String param) {
    // null is treated as empty here (an assumption)
    return param == null || param.isEmpty();
}

(Where "#param" refers to a parameter of the method, and "#empty" refers to a local identifier. Of course, this is only a proposal; the syntax may look too complex and is only a hint, not a must.) As a result, an IDE may deduce that this method may be applied to any other String to determine whether it is empty.
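How an IDE might use such a link can be sketched as follows. This is a toy illustration in Python, not a description of any existing IDE feature: the REGISTRY structure, the phrase patterns, and the propose function are all assumptions made for the example, and negation is handled in the simplest possible way.

```python
import re

# Hypothetical registry that an IDE could build from s-id annotations:
# a semantic identifier, phrasings that express it, and a code template.
REGISTRY = [
    {
        "s_id": "empty",
        "phrases": [r"(?P<arg>\w+) is empty"],
        "template": "StringUtils.isEmpty({arg})",
    },
]


def propose(typed):
    """Return a code proposal for a typed phrase, or None if no registered meaning matches."""
    negated = " is not " in typed
    normalized = typed.replace(" is not ", " is ").strip()
    for entry in REGISTRY:
        for phrase in entry["phrases"]:
            match = re.fullmatch(phrase, normalized)
            if match:
                code = entry["template"].format(**match.groupdict())
                return "!" + code if negated else code
    return None


print(propose("string1 is not empty"))
print(propose("string1 is empty"))
print(propose("string1 is blue"))
```

A new developer typing the phrase would thus be led to the utility the project already uses instead of writing another one.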

Example 3
Take a more complex example. A method which searches for a substring in a string is described differently in different languages:

* C: int strpos (const signed char *string, signed char c). The strpos function searches string for the first occurrence of c. The null character terminating string is included in the search. The strpos function returns the index of the character matching c in string or a value of -1 if no matching character was found. The index of the first character in string is 0.
* C++: size_t find ( const string& str, size_t pos = 0 ) const. Find content in string. Searches the string for the content specified in either str, s or c, and returns the position of the first occurrence in the string.
* C#: int IndexOf(string). Reports the index of the first occurrence of the specified String in this instance.
* Common Lisp: search sequence-1 sequence-2 &key from-end test test-not key start1 start2 end1 end2 => position. Searches sequence-2 for a subsequence that matches sequence-1.
* Delphi: function AnsiIndexStr ( const Source : string; const StringList : array of string ) : Integer. The AnsiIndexStr function checks to see if any of the strings in StringList exactly match the Source string. When a match is found, its (0 based) index is returned. Otherwise, -1 is returned.
* Erlang: str(String, SubString) -> Index. Returns the position where the first/last occurrence of SubString begins in String. 0 is returned if SubString does not exist in String.
* Fortran: INDEX takes two arguments, both of them are strings, it looks for the first string inside the second and returns the place the first string begins inside the second.
* Java: int indexOf(String str). Returns the index within this string of the first occurrence of the specified substring.
* JavaScript: string.indexOf(searchstring, start). The indexOf() method returns the position of the first occurrence of a specified value in a string.
* PHP: int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] ). Find the position of the first occurrence of a substring in a string. Find the numeric position of the first occurrence of needle in the haystack string.
* Python: str.find(sub[, start[, end]]). Return the lowest index in the string where substring sub is found, such that sub is contained in the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.
* Ruby: index(substring [, offset]) => fixnum or nil. Returns the index of the first occurrence of the given substring in str. Returns nil if not found.
* XSLT, XPath: fn:contains(string1,string2). Returns true if string1 contains string2, otherwise it returns false.

As you may see, it is sometimes described quite differently, with important details missing or unnecessary details added. Of course, parts of the "language contract" are missing too. For example, the fact that indexes start with 0 (which is a machine-oriented detail, because a data offset in memory is calculated from 0), whereas in real life humans usually count from 1. Naturally, there are differences in language syntax, but we can leave them aside, because we consider only the descriptions.

Even with such a simple example we can understand why code documenting is needed: method names are intentionally short and abbreviated (instead of, at least, "index of"), and the same concerns parameters. We may also understand why a plain text search will not be fruitful: though there is only one description per programming language, there are infinite variants of how this function may be described and searched for in natural language.

But all these descriptions have something in common: the meaning ("a method which searches for a substring in a string"), which consists of:

1. The very method (which includes #2...#6)
2. "String"
3. "Searches" (which applies to #2 and #4)
4. "Substring"
5. A result value...
6. ...which is returned from the method (by #1, with #5)

The difference between descriptions is expressed only in the different words chosen for each of these elements:

1. Method: function, procedure, anonymous class, closure.
2. String: sequence.
3. Searches: occurs, finds, checks, contains.
4. Substring: string, content, subsequence.
5. Result: index, position.
6. Returns: reports.

That is, code documenting is meaning recognition applied to classes, functions, methods, etc. What is different compared with plain text: meaning elements are separate entities, therefore you can easily replace them with similar ones (like using "finds" instead of "searches").
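The idea of meaning elements as separate, replaceable entities can be illustrated with a small Python sketch. It is a deliberately naive word lookup, not real meaning recognition. The synonym sets follow the lists above, with a few extra word forms ("find", "occurrence", "return") added as an assumption so that the quoted descriptions match; "string" is counted only as element 2, because a single word cannot tell a string from a substring.

```python
import re

# Canonical meaning elements and alternative words (extra word forms are assumptions).
ELEMENTS = {
    "method": {"method", "function", "procedure", "closure"},
    "string": {"string", "sequence"},
    "searches": {"searches", "occurs", "occurrence", "finds", "find", "checks", "contains"},
    "substring": {"substring", "content", "subsequence"},
    "result": {"index", "position"},
    "returns": {"returns", "return", "reports"},
}


def meaning_elements(description):
    """Return the set of canonical elements whose words appear in a description."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return {element for element, synonyms in ELEMENTS.items() if words & synonyms}


JAVA = "Returns the index within this string of the first occurrence of the specified substring."
PHP = "Find the position of the first occurrence of a substring in a string."

shared = meaning_elements(JAVA) & meaning_elements(PHP)
print(sorted(shared))
```

Although the Java and PHP descriptions share few words, they map to mostly the same elements, which is what a search by meaning rather than by text would rely on.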

Tuesday, January 17, 2012

Why is my approach different?

1. The statistical approach to search, recommendations, related things, and similar people really works. However, it cannot cover all needs, simply because statistics is always about probability and averaging. There are a lot of automatic algorithms which decide on our behalf what information to provide and how to process it. At the same time, it is evident that such decisions cannot be ideal even in theory, because we make decisions based on our whole experience and all our knowledge, whereas machines take into account only a tiny fraction of personal information and behavior. Of course, human decisions and processing are not ideal either. Therefore, we need to fill the gap between decisions made by humans and by machines in general, and between human-friendly representations of information (like natural language and user interfaces) and machine-friendly representations of information (data, formats, etc.) in particular.
2. The Semantic Web positions itself as a "Web of data", the conventional Web is, in fact, a "Web of pages" (or information resources), and there are also proposals for a "Web of things". Of course, ideally, the future Web should include everything: data, pages, things, abstract conceptions, and even natural language. Therefore the conception of a semantic link is proposed, which allows referring both to specific things and conceptions and to abstract information and contexts (which cover some area of meaning).

3. The Semantic Web does not provide a user interface for semantics. But we need not only a representation for semantics, but also a human-friendly way to deal with it. The proposed approach provides such a way through a simplified Semantic Web: usage of hypertext as a human-friendly representation, plus improvements in user interfaces. The semantic line interface (SLI) combines features of the command line (CLI) and the graphical interface (GUI). CLI features may include top-level access to any identifier and an order of identifiers which is similar to that of natural language. GUI features may include convenient representation of information in checkboxes, lists, etc.
4. The library-like approach of file systems and URLs works too. However, it works more efficiently in the real world, where it beats searching for a string throughout books. In computers everything is quite the contrary: search can be more efficient than using a hierarchy. Google proved that by making search more efficient than portals. Later, Wikipedia proved that organized information can in some cases be more efficient than search. The truth is somewhere between the opposites: information may be organized with the help of semantics, based on identification, relationships, and contexts.

5. Today we live in a scattered world of data, code, applications, interfaces, and other computer entities. As a result, integration of different systems and applications is considered a separate task, because to work together, data and applications should either comply with one standard or be integrated in some way. The proposed approach solves this problem with several techniques: semantic wrapping of any computer entity (with the ability to use semantic links for that), and representation of any information (provided by a page, an application, whatever) in a uniform way (a textlet).

Tuesday, January 10, 2012

Can humans treat semantics? Can machines treat natural language?

I constantly hear objections like "Humans are pretty bad at semantics, they miscommunicate meaning, they have different assumptions about relationships or meanings attached to a term", with references to Frege, Russell, Mill, etc. On the other hand, I hear "Natural language cannot be interpreted by computers; there were a lot of such attempts in the past, but they all failed", etc. Of course, I agree with this, because it is true.

Yes, these are impossible tasks. But everything we use today was an impossible task at some moment in the past. Mathematics and physics are full of unsolvable (and sometimes impossible) tasks; however, this does not prevent us from using particular cases of these tasks. Impossible tasks are not always solved by intuition or a sudden burst of inspiration; some of them are solved only because they were restricted appropriately. Do not think "Humans are bad at semantics, because they miscommunicate constantly", because despite such problems we eventually understand each other. Do not think "Machines cannot learn natural language, therefore it is impossible forever", because search engines can in some cases fetch meaningful results. There is constant progress in both directions, and these directions converge at some point. And exactly this point could be the turning one.

Machines prefer to have information as specific as possible (because they do only what they are instructed to do). Humans prefer to have information as general as possible (because we constantly abstract and it saves time; for example, instead of "I go to Main St. 21, Springfield" we say "I go home"). Therefore, when machines meet natural language, which usually consists of terms that are as general as possible, they have problems with it. The same concerns humans: as soon as they need to specify something in detail, they start to make mistakes, lag, and misuse. Or, to rephrase: machines can interpret only what they have learned to interpret, but when they do, they do it thoroughly; humans can interpret anything, but when they do, they may do it unreliably.

The consequence of this difference in thinking models is that information is matched differently too. In the case of machines, any data format can define only what it was set up to define. No more than that. If you need to link data from different formats, you usually need to solve the task of compatibility of one format with another. Natural language is quite different: it can define anything based on a small set of rules, and to link information you just need to match identifiers. For example, you have separate databases (which are a sort of format too) and applications for movies, history, and geography. If you want to know more about the events and places which some historical movie is based on, you need to work with three applications and copy-paste information from one application to another. Or you need to have one application which has predefined linkage between these databases. There are no other ways. The same concerns even hypertext: if there are no links from the page of the movie to the relevant events, you need to copy-paste the text which relates to the events and open or search for them separately. Of course, copy-paste is one of the most important technologies of information exchange (no joke). But it is a forced solution. Natural language works differently: you can easily combine words which relate to different domains in one sentence.

But what's worse, we are forced to comply with the machine-friendly way of information processing. It was appropriate back in the 1960s, when hardware performance was not enough to go far from machine commands. Slow progress in this direction is marked by the following events: (1) interpreting bytes as text, (2) the invention of the OS and the command line, (3) the introduction of the graphical user interface, (4) using search instead of portals, etc. In each case, a computer entity was replaced by a more human-friendly entity. But computers (and mobile devices) are still far from being human-friendly. There are a lot of examples where we are forced to use computer entities. We manipulate files and URLs according to the computer locations of information, we manipulate the GUI according to the interface locations of controls, and we manipulate emails, pages, and other information containers just because everyone does this out of habit. And because there is no convergence point between the human-inclined and machine-inclined ways of thinking.

At first, improving this situation seems an impossible task. All attempts to move from the opposite corners have failed or are very restricted (which is what makes them work, at least). From one corner, the movement goes in the direction of natural language analysis. From the other, we move into the area of a human-friendly way of creating semantics. Why did these attempts fail? Because machine-driven analysis of natural language may deal with the rules of natural language, but it surrenders in the face of the identifiers of the language. These identifiers (words, phrases, sentences, etc.) are generalized by humans themselves and have different and sometimes ambiguous meanings. What's worse, machines cannot match them, because they work with a different type of compatibility.

Humans always fail to interpret full meaning, because it is always infinite. For example, "art" has infinite definitions, which talk about senses, emotions, intellect, music, literature, painting, or sculpture. There are a lot of books which describe art very differently. There are infinite ways of interpreting and unwinding all the implications, because (1) the meaning of any word may ascend to "everything" (thanks to abstraction), (2) there are infinite paths between a thing or a conception and "everything" (like "Nature -> Humanities -> Art" or "Entertainment -> Art", etc.), (3) meaning may change with time and thought, (4) subjective meaning may differ significantly, etc. So, a thorough definition of meaning is impossible (even machines cannot do this), but should we discard meaning altogether? Apparently not.

Some consider tagging the best-case scenario for humans to treat meaning. However, tagging is only a continuation of the idea of keywords, which has been used for a long time. It flourishes nowadays not because it treats meaning, but because it provides multiple entry points to information (which is impossible with a URL). But it fails constantly where meaning is concerned. For example, take the article "Full speed ahead for UK high speed rail network", which is tagged "Birmingham, London, Transport, United Kingdom". Quite strange choices for tags. Why do Manchester and Leeds have no tags, though they are mentioned in the article? Why "Transport" and not "Railway"?

There are several problems with tagging (and keywords), because tags are:
- arbitrary, because, as mentioned above, anything has a potentially infinite set of meanings
- fixed: an article classified as "Transport" will not fall into a "Railway" category created later, when "Transport" covers too many articles (and becomes useless because of that)
- overlapping, because creators of information often want it to fall into all possible categories (that is, into "everything").

What about the title of the article? In fact, it also abstracts the article's meaning. However, there is another problem: a title usually contains a lot of metaphors and allusions to attract consumers of information, which may also distort the meaning.

However, everything changes if we lower the bar. We do not need full analysis of natural language and full unfolding of meaning. It is enough just to link them. Such linking exists today too: (a) developers may link computer entities and natural language in an interface, (b) anyone may mark up a word as a hyperlink, etc. However, developing an application for each case is too expensive, whereas a hyperlink can refer only to an information resource. But it is exactly the hyperlink that gives a notion of what the sought solution has to be. This is the semantic link, which is just a further development of the idea of the hyperlink, with the following features:
- it relates to both natural language and computer entities
- it may refer to real things and abstract conceptions (and, in particular, to computer entities)
- it is flexible (can be as general or as specific as necessary).

As a model we may use another HTML tag, which relates to both natural language and computer entities, similarly to a URL. For example, "I was in New York" has several implicit semantic links: "I", "was", "New York", and "I was in New York" as an event. The explicit form of a semantic link for New York is <s-id="city"> New York </s-id>; thus, we have linked the words "New York" with the specific city, not a state or a hotel or a cafe. The implicit form of a semantic link is any text (word, phrase, sentence, etc.) or data. Also, a semantic link implies infrastructure behind it:
1. Identification with the help of globally resolvable identifiers. That is, "city:New York" should be resolvable on any computer.
2. Establishing relations (of identity, belonging, or association) between identifiers and complexes of identifiers. You should be able to link "I", "was", and "New York" explicitly, or it can be done by some automatic tool.
3. Identification routing, which allows delegating the establishment of derivable relations to identification routing servers. Thus, "city:New York" has derivable relations with "country:USA" or "USA:state:New York".
4. Usage of semantic context, which allows making the meaning area narrower or wider. That is, if you have a lot of pictures of yourself in New York, you can limit their number by moving to the context of pictures of yourself in a specific district of New York, or extend it by moving to the context of the USA. Context is rather a mix between a folder system (with a fixed number of scopes) and search (with top-level access to any item): it is top-level access to any item through a flexible scope.
5. Semantic wrapping for computer entities, which means that semantics may be attached to a file or other entity and accompanies it when it is transferred.
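Routing and context (items 3 and 4) can be sketched together in a few lines of Python. This is only a toy model under stated assumptions: the routing table is a local dictionary rather than a routing server, the identifier "place:Times Square" and the picture file names are invented for illustration, and only the "belongs to" relation is modeled.

```python
# Hypothetical routing table: each identifier points to the wider identifier it belongs to.
BELONGS_TO = {
    "place:Times Square": "city:New York",  # identifier invented for illustration
    "city:New York": "USA:state:New York",
    "USA:state:New York": "country:USA",
}


def generalizations(identifier):
    """Derive every wider identifier by following 'belongs to' relations."""
    chain = []
    while identifier in BELONGS_TO:
        identifier = BELONGS_TO[identifier]
        chain.append(identifier)
    return chain


def in_context(items, context):
    """Select items whose identifier is the context itself or belongs to it."""
    return [name for name, identifier in items
            if identifier == context or context in generalizations(identifier)]


# Pictures with semantics attached (file names are made up).
PICTURES = [
    ("family.jpg", "place:Times Square"),
    ("skyline.jpg", "city:New York"),
    ("canyon.jpg", "country:USA"),
]

print(generalizations("city:New York"))
print(in_context(PICTURES, "place:Times Square"))
print(in_context(PICTURES, "city:New York"))
print(in_context(PICTURES, "country:USA"))
```

Moving from "place:Times Square" to "city:New York" to "country:USA" widens the scope step by step, while every picture stays reachable from the top level, which is the mix of folders and search that item 4 describes.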

Though a semantic link is similar to a hyperlink, it differs in one aspect: it does not refer to computer entities only. Its purpose is identification, association, generalization, and specification of meaning at any level, though ultimately it may refer to a thing, a conception, or a computer entity. For example, a picture of your family in New York can be named in very different ways, like "I and New York" (for personal use), "My family at Times Square" (for those who are acquainted with your family and New York), "John Doe in USA" (for someone who knows only one person from your family and is not aware of the difference between US cities), etc. Semantic links make these forms equivalent in some sense (not equal), because behind the scenes "New York" may route to "Times Square" and to "USA", and vice versa. That is, "New York" and "USA" here are not just names but rather semantics itself.

This is the key to understanding the semantic link: everything is semantics. That includes the semantic link itself, the destination of the link and the context which the destination belongs to, and the subject which uses the link together with the subject's own context. This is a consequence not only of the nature of the semantic link but also of the possibility to attach it to a computer entity like a file. This makes it possible to avoid additional hypertext to describe the entity and to avoid redefining it on transfer. For example, the picture will automatically appear in the corresponding context on your friend's side ("New York" or "USA" or "you", depending on the necessary level of generalization and intentions).

Usually I call identification "human-friendly", which means it has to allow humans to define semantics. However, I emphasize human-friendliness only because most previous attempts were machine-friendly. In fact, the proposed ways of identification and of establishing relations are both human- and machine-friendly. Not only will humans be able to mark up semantics, but it could also be done automatically or semi-automatically. For example, an application may help to find all ambiguities and propose to resolve them in a human-friendly way (through GUI or SLI), for instance by proposing the variants of "New York" as the city and as the state. Is semantic markup needed for machines? Yes, because today data often contains unstructured information, which may be stored in fields like "description" or "custom data". And a semantic link allows handling exactly such information by generalizing or specifying it as much as necessary.

So, can humans treat semantics? Can they treat the meaning of "Full speed ahead for UK high speed rail network" with manually added semantic hints? Yes. For example, they can identify it as "High-speed rail in the United Kingdom" (though, of course, such a semantic link should refer not to the Wikipedia article but to "High-speed rail in the United Kingdom" itself). The difference from a hyperlink is that such an identifier implies (with the help of routing) that it relates to London, Birmingham, Transport, the United Kingdom, and many items which we cannot predict. However, there are a lot of variants between full identification and no identification: for example, you can identify that the article is about "UK" and "railway", or about "high speed rail network", etc. Such identification is quite affordable for an ordinary user, especially considering that the text itself gives hints on semantics and that identification could be (semi)automatic. Of course, you may ask: "But what is the difference if we consider it as plain text?" Check the queries "UK high-speed rail network", "UK rail network", and "UK railways": they give very different results, and articles similar to the one mentioned above appear only for the first query and not for the third.

Can humans differentiate simple relations? How are "UK" and "railway" linked? Is the UK a railway? Is the UK similar to a railway? The answer is evident, therefore identity, equality, or similarity cannot be applied here. Does the UK belong to railway? No. Does railway belong to the UK? Yes. What about "I was in New York"? How are "I" and "New York" linked? Identity? No. Neither do I belong to New York, nor it to me. So, we can always safely link them with plain association. The proposed four relations are plain and easily recognized by users: actually, they already use them when working with files and directories. Identification is file naming, generalizing/specifying is placing a file in a directory, and associating is not supported by all file systems, but it has already been introduced by the hyperlink.

Can machines handle natural language? Can they analyze it? No. But semantic markup gives a chance to improve the situation by setting semantic links only where they can be established automatically or set manually. Can machines convert natural language into data? Not quite. But you can link the two with semantics. For example, if you have a database of movies, you can mark up some movie data with semantic links to events, but you could not create databases of history or geography this way.

But, in the end, who handles semantics and natural language better? Humans? Machines? In fact, the best result is achieved when their abilities are united. Yes, machines can identify some things and establish relations faster. However, they cannot summarize them for a piece of text, they cannot abstract, and sometimes they cannot choose the correct variant based on statistical analysis. Don't get me wrong: statistics is very helpful when applied appropriately. Once the match is as precise as possible, statistics may help to choose among several equivalent variants. However, statistics cannot replace a precise match.

The purpose of the proposed approach is collaboration between humans and machines, made possible in a simple and straightforward way, here and now. Additionally, it may change how we interact with user interfaces and the Web. However, this approach does not resolve all problems of meaning. Yes, it is still possible to miscommunicate meaning. But humans make mistakes constantly, and they miscommunicate meaning with plain text too. Does it mean we should discard any information? Definitely not. Also, this approach does not allow converting natural language words and phrases into computer data. But machines are still too far from human-like thinking and from dealing with arbitrary abstractions. Does it mean we should wait for that? Certainly not. Just know the limits.

Sunday, January 8, 2012

2012: The end of the world of Web as we know it

When we talk about the Web, we usually mean hypertext. However, that is the medium-size answer. The small-size answer is a hyperreference to an information resource on the Internet. The small idea that changed the world consists of one HTML tag. Without this tag, HTML would be a trivial markup language for text formatting, no better and no worse than others. By that moment, the ideas that lie at the foundation of the Web had existed for almost 30 years. HTML descends from SGML, which descends from GML, which was developed at IBM in the 1960s. The hyperreference itself was used in various systems, such as NLS or HyperCard, before 1991, when the first draft of the HTML specification was issued. Then why is the hyperreference revolutionary? First, it is an immediate transfer to another computer resource; second, it is a description of a reference. All of this had been used before: (a) URLs are quite similar to file or network references, (b) some applications provided navigation by such references, and (c) you could always describe any reference in a text file. However, the hyperreference merged all of this into one entity, which finally made navigation between computer resources, and their description, easy.

Of course, there is also the large-size answer, because a number of factors made the Web in general, and the hyperreference in particular, possible, and we cannot live online without them:
1. Internet. Without it, hypertext would be just another data format, which might already be forgotten.
2. Protocols (link, internet, transport, application). Here we should specially mention routing, which hides the details of the connection to a remote host. Without it, we would have to specify all the details of subnetworks. Other protocols are no less important.
3. DNS and the domain system. Can you imagine an ad like "Visit our site at 210.987.654.321"? Or "Visit our site at fedc:ba09:8765:4321:fedc:ba09:8765:4321"?
4. Hypertext. Of course, the Internet by itself could not revolutionize anything. You can easily see this when you work in a local network without hypertext. In that case, you use network references as plain text, copy and paste them, lose their descriptions, and share them through emails or forums, which is tedious.
5. Text formatting. It is an important factor too, because it made hypertext user-friendly. Without it, hypertext would be yet another text format known only to geeks.
6. Markup, tags as plain text. Though today we have a bunch of HTML editors, the simplicity of creating HTML is a very important factor that made HTML so widely used. It also allowed HTML to be used in text fragments, which extends its usage even more.
7. Free form. No less important is that HTML improved the survival of information. Hypertext can have invalid tags, unknown attributes, omitted headers, etc. Without that, HTML could have been replaced by another format that would not require a specialized editor for checking validity.
8. Forms. They gave birth to Web applications, without which the Web revolution would not have been possible.
9. Email and other communications. Without them, the Webolution would not have been possible either. Otherwise, how would news be disseminated and users notified about new information or updates?

However, though hypertext facilitates content integration, it is sometimes not quite efficient. Have you tried to maintain a hypertext description of a directory? First, you need to synchronize it each time the directory content changes. Second, you have to synchronize the content itself (for example, if you described a picture on two Web pages and the description changes, you have to update both pages). Hypertext had other inefficiencies too, which forced further Web evolution in several directions:
1. Imitating desktop applications.
2. Extended usage of Web capabilities (communicating, socializing, collaborating, etc.).
3. Information management enhancements (search, etc.).
4. Extending hypertext (data, semantics, etc.).


Gradually, users, developers, and designers understood that hypertext capabilities are not sufficient to represent everything they want. This led to the creation of JavaScript, DHTML, CSS, Ajax, etc. All of them relate to the MVC pattern, where the model (or content) corresponds to the text inside HTML, the view corresponds to text formatting, and the controller corresponds to JavaScript code. However, even though CSS was conceived to separate content (model) from form (view), and JavaScript can theoretically be separated from HTML, in reality all three components are mixed with each other. The practice of CSS and JavaScript usage shows that view and controller can be separated from the model when applied globally (to the whole). However, there are a lot of local tasks that are usually solved in mixed mode: for example, moving a specific HTML element or adding behavior to it. Ironically, this is the fate of almost all technologies that are applied in a way not predicted or recommended by their designers.

Did these technologies revolutionize anything? They definitely revolutionized hypertext itself. But, generally speaking, they just moved hypertext closer and closer to desktop applications. However, Web application functionality similar to that of a desktop application requires more effort. This is a consequence of development complexity (because hypertext was not designed for application development). That is, this is an eternal pursuit rather than a revolution.

Later, many realized that complex behavior and complex design are difficult to implement in hypertext. This is natural, because it was not designed for them. Therefore, Flash, HTML5, etc. were created, and different media (audio, video) were used more and more frequently. But multimedia progress is not a specific feature of hypertext; it is part of the general progress of computer systems. Is there any revolution here? Certainly not. It is only a coincidence that these media came to hypertext in parallel with their broader usage in personal computers. Of course, when we see that some sites post their news only as video, it looks as if everything could turn into "sheer television" soon. This won't happen, simply because some information is difficult to represent as video.


Though the term Web 2.0 was coined in the 2000s, its roots may be traced back to the middle of the 1990s. That was when personal Web pages became widely popular. Everyone wanted to broadcast oneself to the world, which was not so simple. First, it required knowing the basics of HTML. Second, it demanded taste (some of us still shudder when recalling some pages from the 1990s) or money (for a designer). Third, you needed to stay in touch with the world, so you had to have an email address or even a forum (chat). When the first Web euphoria passed, the requirements for Web pages grew, but at the same time meeting them was simplified by various services that offered a uniform approach. Here come social networks.

However, the term Web 2.0 was coined not only for that. The Web itself changed: Rich Internet Applications (as a consequence of MVC improvements), enhanced search, tagging (which begot folksonomies), wikis, etc. Of course, Web 2.0 is inseparable from the progress of Web applications, of hypertext itself, and of the computer industry in general. For example, broad usage of animation and video was impossible in the 1990s, because hardware constantly lagged behind. The same concerns bandwidth, which at first did not allow Web applications to come closer to desktop ones. That is, the Web evolved rather than changed the direction of progress. Did social networks revolutionize anything? In some sense, yes. But, at the same time, most of their traffic is information "noise", samples of "read and throw away" or "see and forget". Fortunately, this anarchy is alleviated by collaboration sites (wikis, etc.). However, this is just a continuation of the Webolution, which came later than it could have (in the 1990s).

Though a revolution in the name of one company is too Google-centric, this company has done more for the Web than several others taken together. First of all, there is search, then maps, then innovations here and there. On the other hand, the value of search is sometimes exaggerated, though 10 years ago it changed the way we look for information. Back in the 1990s, the academic approach prevailed. It declared that information should be categorized into hierarchies, the so-called portals, whereas search was a secondary tool. Analogies with the real world quite often play dirty tricks on computer technologies. Their creators attempt to draw parallels with the real world, whereas the computer environment is in some sense richer. Thus, portals were organized similarly to library catalogs, which in the real world are more efficient than searching for a string throughout books. In computers, however, everything is quite the contrary: search can be more efficient than using a hierarchy. Google proved that by making search more efficient than anyone had before. However, today, recent Google innovations either fail (like Wave) or change little (like the most recent search enhancements, which are rather cosmetic and sometimes even irritating).

What is worse, search sometimes works inefficiently and gives absurd results. This is a direct result of the overrated PageRank, which is rather a statistical hack. After all, what is statistics? What can the average temperature across a hospital tell us? Statistics is helpful for understanding tendencies, and it could help search to sort out the results that are more relevant than others. But that can be done only after precise search is ready. However, modern search is not precise. If you have ever read Google's search help, all the advice is about making a query simpler. This is quite natural, because modern search works efficiently only for plain queries, which consist of 1-3 words, or coincide with some identifier (like New York Times), or when you can predict a page title in advance. Of course, such an approach works to some extent, but as soon as a query becomes more complex (when words are linked by complex relations and the query grows to 5 or more words), search starts giving incorrect results or none at all. Even if you call this a revolution, now is the time to revise its results.


As soon as hypertext became generally available, its architects were disappointed with how it was used. One of its purposes was text structuring, whereas it was applied mostly visually, usually with presentational tags (which some adherents consider a fault). Here comes the first attempt to adjust Web evolution: XML. It was designed as an extensible markup that emphasizes simplicity, generality, and usability over the Internet, which is expressed in the 10 design goals of its specification. Already here we have a problem: markup is meaningless unless it is understood by a human. The same problem exists with any data format: meaning is not in the data, and not even in the format (metadata), but in the understanding of both data and format.

Moreover, XML does not work as its designers envisioned. Instead of human-readable text markup, XML became a data serialization format based on plain text. Moreover, many complain about its complexity and verbosity. An XML document of even medium complexity is hardly human-readable (because it is difficult to recognize information in it), that is, such a document can be read only by an application. But if so, then XML is only a textual (not binary) data form, which became possible thanks to larger storage. Of course, it is convenient, but it is not revolutionary.


The conventional Web changed the world in 10 years. In the same 10 years, the Semantic Web became just another technology with some benefits and shortcomings. Why? The Semantic Web is a Web of data (as declared by its adherents), considered in the context of machine-understandable information (metadata). One of its most interesting applications should be intelligent agents, which would help us search for information more efficiently. However, it is 2012 now, and only some of us have ever heard about it, and few of us understand what it is for. And we are talking not only about ordinary users: developers are not much interested in it either. Remember the 1990s, when any developer was eager to learn new standards and contribute something? There is nothing similar for the Semantic Web. Most developers either know what it is only theoretically, or do not want to learn it because it is awkward, or think it is not applicable at all. But Semantic Web experts still describe how good everything will be. Someday. Something is wrong here. History knows many technologies that could have changed the world but didn't (CORBA, OS/2, etc.).

What is wrong with the Semantic Web? A reality check? Its architects clearly state that they used the latest achievements of artificial intelligence, which had been in use long before the Semantic Web. However, these achievements were known only to a narrow circle of experts. Is this a sign of success? If so, why should a new text format be successful where an old (possibly binary) one wasn't? Only because it is used for the Web? But the history of the Web shows that successful technologies are (1) completely new ones, (2) ones that were successfully used before, or (3) ones that became successful in parallel with the progress of the Web. This is not the case for the Semantic Web. And there are some problems with the basics of the Semantic Web too.

Why does the Semantic Web use URLs to identify not only information resources but also things of the real world? There is an evident problem: an information resource describes things of the real world or abstract concepts (that is, everything), but it is itself only a part of everything; not everything is a part of an information resource. This is quite a serious problem. The principles of identifying computer resources and of identifying things and concepts are very different. Does anyone want to use library classification for the names of people or cars? Then why is this acceptable for the Semantic Web? Moreover, this choice resulted in many URLs that have nothing behind them, because they are used only as identifiers. But a URL is not a very reliable identifier, because it depends on a site, a web server, and a file system, which are sometimes just unavailable. So such usage of URLs broke one of the main advantages of the conventional Web: the hyperreference and its behavior.

Why does the Semantic Web use triples for semantics? The arguments are quite solid: triples have been tested by long usage in the AI area, and any information may be represented with them. Of course, the cons are no less solid. Not everything researched by AI science is successfully used in reality. The triple is not the only model that can represent anything: any general-purpose programming language, relational database, natural language, and some other forms can do this too. The difference is in how difficult the model is to apply. Judging by the spread of the Semantic Web, its model is not as good as some think. A quite evident argument: why is a ternary relation the base one, whereas we use a lot of unary and binary relations? For representing graphs? But that can be done in many other models too.

The deeper reason is that the triple model is arbitrary. Or, to rephrase: "All models are wrong, but some are useful" (George Box). It is based on the subject-predicate-object relation, which comes from natural language sentences like "I take a book". However, already "I go home" is not so straightforward, because "home" is an object only abstractly; in fact, it indicates a place. The usage of triples stems from our living in a space-time continuum, where each action involves at least two things. Of course, natural language abstracts from this and fits any situation into such a form. For example, "I am a user" has no action inside, because "is" is a relation between "I" and "user". Such a model works in natural language, but only because a human knows whether an action or a relation is meant. The Semantic Web goes further and is forced to break any situation into triples. For example, "Today I go home quickly" has to be split into "I go home", "I go today", and "I go quickly", whereas natural language treats each word as a separate part of speech, of a sentence, of the language, etc.
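
The splitting of a sentence into triples can be shown with a rough Python sketch. It is not how any Semantic Web toolkit works: the `Triple` type and the `to_triples` function are assumptions made for illustration, the sentence parts are labelled by hand, and real RDF would use URIs rather than plain words.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_triples(subject: str, verb: str, complements: List[str]) -> List[Triple]:
    """Pair the subject and the verb with each complement in turn."""
    return [Triple(subject, verb, complement) for complement in complements]


# "Today I go home quickly": the parts are labelled by hand here.
triples = to_triples("I", "go", ["home", "today", "quickly"])
for triple in triples:
    print(" ".join(triple))
```

Note what the three triples no longer record: that "home" is a place, "today" is a time, and "quickly" is a manner. All three sit in the same object slot, and only a human reader restores the difference.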

Of course, such a model could work in some situations. But there are well-grounded doubts about its efficacy, because after 10 years the Semantic Web is still an expensive toy. Compare this with the conventional Web, which was always quite affordable in terms of both understanding and cost. Machine orientation played a dirty trick on the Semantic Web: it is so deeply oriented to machines that humans are not able to use it appropriately. Semantic Web experts themselves still declare that there is no good representation of it, no human-understandable interface. This partly explains why developers can hardly understand what it is. The problem might be partially resolved with microformats, which use HTML for semantics; however, they are restricted and not extensible.


Today we may confidently declare that the potential of the Web (namely, of the hyperreference) is already exhausted: Web applications still cannot reach the level of desktop ones, multimedia progress is not specific to the Web, search does not advance because it cannot handle complex queries, XML did not fulfill the expectations of its designers, and the Semantic Web is too expensive and awkward to influence the Web. But there are a lot of areas that can be improved even in the conventional Web. One simple hyperreference changed the world once; one semantic reference can do it again.

The idea is simple: any word, phrase, sentence, article, or book is a semantic reference. A semantic reference may navigate to things, to concepts, and to information resources that describe them. A semantic reference is self-descriptive, so it does not have to be defined explicitly (though it may require some specification to avoid ambiguities). For example, "HTML specification" refers not to a specific file on a specific server but to all HTML specifications that have been published and that will be published. Of course, we are talking about the "Web of Things", an idea that has been in the air for a long time, though nobody clearly understands how it would work. But the "Web of Things" is not the full story. Actually, the question is broader: why not provide semantics for ordinary users? Can human beings tame semantics? Yes, with natural language. But there are still no applications that handle it reliably (which, by the way, is an area of AI too). But that is not needed if there is a human-friendly way to define semantics.

And this is the key problem for semantics: humans and machines handle it quite differently. Any data format is a definite order of information governed by its rules. Each format (including text ones like XML or the Semantic Web ones) has its own rules, so we have to solve problems of compatibility between them. The story is different for natural language: there is a small set of rules, which is used for any domain, whereas compatibility concerns identifiers. That is, data formats deal with coarse-grained compatibility (a whole format with a whole format), while natural language deals with fine-grained compatibility (an identifier with an identifier).

Some may ask whether semantics is needed for ordinary users. Have you ever used forums? Have you ever asked for some information and got answers like "This has already been discussed, look in other topics" (even if there are 100 topics of 30-50 pages each)? How many times have you searched for some simple fact (like the address of a site) that you visited a week ago or that a friend sent to you? Such examples clearly show that humans have no access to semantics: they just cannot retrieve facts from a discussion or an email. All this happens because information is ordered by computer rules, not by human ones. Users are forced to order information with directories, files, pages, emails, and other information containers, whereas humans order information by meaning, topics, context, etc. Of course, such ordering is used in computers too; however, it depends on data formats and applications, which makes it computer-dependent as well.

So what is necessary to allow humans to work with semantics too?
1. A human-friendly semantics interface. The solution is evident: hypertext is such a human-friendly form; it just has to mark up semantics appropriately.
2. Human-friendly identification. Its purpose is to make natural language less ambiguous and to allow linking it with computer entities. For example, to distinguish opera as an art from Opera as a browser, we can use the art:opera and browser:opera identifiers. In compact form they can look rather like hints: <s-id="art"> opera </s-id> and <s-id="browser"> opera </s-id>.
3. Identification should use cloud routing, because the same identifier can have different meanings for different subjects. For example, "home" can be used as the same identifier by different people, whereas routing will decide which specific meaning it has.
4. Human-friendly semantic relations. Their set should be restricted and understandable for ordinary users. For that we can use quite simple structural relations: (a) identification, equality, or similarity (usually expressed with "is": "I am a user"), (b) specifying/generalizing, or the "part-of" relation (expressed with "of"/"have": "I have a home" and "Home of me"), (c) association, or an undefined relation (that is, all other relations, which are defined by the identifiers used). In essence, we just have to decide whether two things or concepts are peer entities, parts of each other, or linked in some other way. Of course, this does not cover all possible relations, but it is enough to understand how parts of information are linked with each other.
5. Semantics as a whole is a graph of identifiers linked by relations. However, human-friendliness means you are not forced to define semantics for all the information (for the whole page); instead, you may semantize only a part of it.
6. Semantic wrapping for computer entities. Files, graphical controls, web page elements, etc. do not have meaning by themselves. Usually it is attributed by the human mind. However, to order information efficiently, we need meaning linked directly to these elements.
7. The notion of a textlet, and using questions and answers as the base form of information exchange. In fact, a search for an answer is a comparison of two graphs, that of the answer and that of the question. For example, "Do you go home?" will return true only if it coincides with the graph of "We go home" and "you" in the question coincides with "we" in the given information.
8. Compatibility should be fine-grained, so that it can be applied even to a single identifier versus another single identifier (or to complexes of identifiers and relations).
9. Context is needed to make the area of meaning narrower or wider. For example, the context of tools may be narrowed to that of hammers, and the context of hammers may be extended to that of tools.
10. A semantic line interface (SLI) may combine features of the command line (CLI) and the graphical interface (GUI). CLI features may include top-level access to any identifier and an order of identifiers similar to that of natural language. GUI features may include convenient representation of information in checkboxes, lists, etc.
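
Three of the points above (hinted identifiers, routing, and comparing the graph of a question with the graph of known information) can be sketched in Python. This is a minimal sketch under assumptions, not a specification: the functions `parse_identifier`, `route`, and `answers`, the `ROUTES` table, and the people in it are all hypothetical, and a graph is reduced to a set of plain links between identifiers.

```python
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[str, str]  # a plain link between two identifiers


def parse_identifier(text: str) -> Tuple[Optional[str], str]:
    """Split 'art:opera' into ('art', 'opera'); a bare word carries no hint."""
    hint, separator, name = text.partition(":")
    return (hint, name) if separator else (None, text)


# Hypothetical routing table: the same identifier resolves per subject.
ROUTES: Dict[Tuple[str, str], str] = {
    ("alice", "home"): "alice's flat",
    ("bob", "home"): "bob's house",
}


def route(subject: str, identifier: str) -> str:
    """Resolve an identifier for a subject; unknown ones stay as they are."""
    return ROUTES.get((subject, identifier), identifier)


def answers(question: Set[Edge], facts: Set[Edge], same_as: Dict[str, str]) -> bool:
    """True if every link of the question, after renaming, is among the facts."""
    renamed = {(same_as.get(a, a), same_as.get(b, b)) for a, b in question}
    return renamed <= facts


facts = {("we", "go"), ("go", "home")}      # "We go home"
question = {("you", "go"), ("go", "home")}  # "Do you go home?"

print(parse_identifier("browser:opera"))
print(route("alice", "home"))
print(answers(question, facts, {"you": "we"}))
```

The last call succeeds only because "you" is explicitly mapped to "we"; with an empty mapping the same question finds no match, which is exactly the condition described in point 7.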

Humans should have access to semantics, if only because machines cannot manage it automatically. There are a lot of unresolvable ambiguities in natural language. If you were at the opera one day, nobody can guess when, where, and which opera you attended unless you answer the corresponding questions (or provide this information otherwise). Machines are still just number and text grinders; they cannot think. That's why we need a human-friendly Semantic Web.

You may find more details in "Future of operating system and Web: Q&A".

Sunday, January 1, 2012

2012: The end of the world of user interfaces as we know it

Games matter for humans. Games simulate a reality that is inaccessible to us for some reason. Boys (grown-up and not quite) usually play with gadgets. Girls of any age like behavioral games. The touch interface combines features of both. That's why boys and girls are still playing with it. The paradox is that the touch interface still has not influenced the PC world.

Why? The first cause: big touch screens are expensive. But that is resolvable by mass production. Another cause is deeper. Can touch screens replace the mouse (even leaving aside that some models have problems with accidental touches)? Mouse accuracy is 1 pixel (which is sometimes enough to call a different function). Touch screen accuracy (for a finger) is 20-40 pixels (which is enough for a one-word button or for 2-3 icons on a non-mobile screen). To replace the mouse, touch screens would have to be 10-20 times bigger than they are now. But to use such screens we would have to sit further away from them, which makes using fingers impossible.

However, maybe touch screen devices will simply outnumber and replace PCs?
1. The first problem: size mobility. In essence, a tablet is a notebook or even a netbook without a keyboard. One straightforward reason for the popularity of tablets is that they can be used everywhere, even where using a notebook is not comfortable. The minus: typing is awkward and the screen is too small (but if we increase the size, we get a notebook again).
2. The second problem: energy mobility. A tablet performs only slightly better than a notebook here; a smartphone performs much better, but its screen is even smaller.
3. The third problem: text input. Compared with PCs, typing is slow, error-prone, and not quite comfortable. In fact, any text medium (email or social network messages) on a mobile device eventually turns into SMS.
4. The fourth problem: interface. The touch screen adds no value to the conventional graphical interface, except for a different way of positioning; this problem is described above. Voice control? It is not perfect now, and sometimes it is not acceptable: when the noise level is high or when you are in an environment where sound is not allowed. But the bigger problem is that the modern interface does not suit voice control. Calling menus and submenus by voice is an anachronism, once you realize that you could call any function without using nested paths to graphical controls (like "File - Open").
5. The fifth problem: icon hell. How many icons can you remember well enough to use all functions reliably? 30? 50? 100? While a touch interface offers restricted functionality, this does not matter. But if the number of functions grows, it will.
6. The sixth problem: augmented reality looks exciting only because users have not played with it enough. It is restricted by the necessity of visual contact with an object, which makes it impossible to use for abstract concepts. Performance? On the one hand, augmented reality appeared in order to accelerate referring to real objects, which works perfectly for immovable objects with constant geographic coordinates. But movable objects require at least image recognition, which works imperfectly.

As we can see, touch devices in particular, and mobile devices in general, are good in the niche that they occupy today:
1. Mobility.
2. Casual communication (SMS style), telephony.
3. Casual entertainment.
4. Casual work.
5. Information, linked with geographic positioning and visual contact.

Of course, mobile devices may oust PCs if most users are happy with them. All the advantages of personal computers that attracted users in the past (games, entertainment, Internet) are available on mobile devices now. What remains? An alternative to the PC as a typewriter depends on the reliability of voice recognition. That's all an ordinary user wants. But there are a lot of tasks that cannot be moved to mobile devices. And these tasks require the conventional interface of traditional operating systems.

The history of this interface passed through two key points: the command line interface (CLI) and the graphical user interface (GUI). Both stages are tightly coupled with hardware performance. The power of the first computers was enough only for the input and output of symbols. Gradually, performance reached a level that allowed a graphical interface. When thinking about it, many recall the affordable visualization of data and applications, which made PCs so popular. Some may think about simplification, which lets users avoid the CLI and manuals (to some degree). What matters more is that the GUI allowed the complexity of applications to increase, which extended the usage of computers to many new domains.

However, the graphical interface appeared on the mass market around 30 years ago. By now, it has lost its impetus and does not allow application complexity to increase any further. And how could it? Through further simplification? Nowadays a simplified interface is somewhat "fashionable" thanks to the minimalist style a la Apple and Google, and thanks to mobile applications, which cannot be more complex because of physical restrictions. However, this simplification is achieved by reducing functions, either by throwing them away or by automating them (which often only irritates, because an application makes wrong choices). That is, this is a rather straightforward simplification, which decreases the complexity of applications.

In some sense, such minimalism (and the accompanying simplification) is a reaction to the unsuccessful attempts to improve the graphical interface through 3D interfaces and virtual reality, which were quite promising 10 years ago. In the end, however, they became useful only for a restricted set of applications. Why? What can we improve with them? A 3D interface gives nothing special except a visual representation quite similar to that of the real world. This is where it converges with minimalism: both are intended to impress rather than to serve efficiently.

The problem for the 3D interface is that humans prefer to see things in 3D but to manipulate them in 2D. Imagine that some information is written on a square or on a cube. In the first case, we can see it fully (maybe with zooming or scrolling). In the second case, we cannot see it completely, and moreover we need additional means for rotating the cube, etc. That is, an additional dimension makes information processing more complex. Of course, there are a lot of examples where 3D models are preferable (like vehicle design, etc.). But in most cases we would prefer 2D models. And even if a model has more dimensions (like the Web, which consists of 1D texts represented on 2D pages, which are linked by hyperlinks through the third dimension), we would prefer to make it two-dimensional (pages are 2D anyway, but "tunnelled" through the third dimension).

Virtual reality goes further and allows 3D interactions; in essence, we are already dealing with a 4D interface that imitates the real world. But is virtual reality efficient? We can ask this in an even broader context: is the "interface" of the real world more efficient than a graphical computer interface? Look at real-world things. While they provide simple functions, their interface is ergonomic and fits human hands well. But as soon as functions become more complex and more numerous, we see buttons, switches, etc.; that is, the interface becomes very similar to a 2D computer one. This is the key problem of any interface. While it covers a few functions, we can play with ergonomics and visual representation. As soon as complexity grows, learnability becomes more important.

Let's return to the original problem. Why did the graphical interface allow application complexity to increase? The simplicity of a GUI is only on the surface. Looking deeper, the GUI extended the possibilities of the command line, because it packed commands into menus, their parameters into dialogs, and values into controls. Thus, (a) access to functions was simplified, because they can be found in one place, (b) access became reliable, because function and parameter names can no longer be mistyped, (c) validation is needed for only one element or can be avoided altogether (because all valid values are packed into a list), etc. As a result, a modern application can have about 200 functions at the top level alone, and with all dialogs, tabs, and other controls their number easily grows to thousands. You can only guess how such a number of functions could be managed from a command line. In essence, the GUI made a quantitative leap in information ordering, which allowed applications to become more complex. But can this trend continue?

To understand that, let us look at how interfaces work in real life:
1. Any tool has its own ergonomics, optimized for its usage.
2. As soon as the number of a tool's functions grows, it starts using buttons and switches, and sometimes neglects ergonomics.
3. As soon as the number of tools grows, we need a reliable way to find them. This can be solved by grouping them by functional area, or by shelves and boxes. But this works efficiently only if you understand and remember the principle of grouping. To explain this principle to others, you need to show it (which does not require mental calculation). However, not everything can be shown visually (especially abstract concepts); moreover, visualisation requires more time and space (if it is video) than an explanation. What is worse, a fixed physical grouping is not flexible, for example, if someone violates it.
4. To be more flexible, we use natural language. But understanding it is more difficult, because you need to link words to things, their locations, relations between things and locations, etc., which requires spatial and other mental calculations.

The most important aspects of information ordering can be explained with Goedel's incompleteness theorems (which, loosely put, imply that a sufficiently expressive formal system cannot be both complete and consistent). Thus, a few functions allow a very minimal design, which is easily understood by humans (that is, consistent for them). As soon as the number of functions grows (and the system becomes more complete), the interface becomes more complex (and less consistent). The same theorems explain why an explanation may be either brief but requiring calculations, or long but without them (that is, "speed vs. size"). This is true for both real-world things and the graphical interface, which cannot surpass a certain level of information ordering.

We see a similar balance between flexibility and efficiency in modern interfaces. The CLI provides a flexible approach, in which you can easily combine functions and their parameters. The GUI uses a fixed order of elements, which is not flexible but is more efficient. Neither approach is fully flexible, because functions have fixed names (that is, if a function is called "copy file", you cannot call it "write file").

But let's look at the situation from another angle. If we cannot have a single system that is both complete and consistent, can we have several systems, some complete and some consistent? This simply means having several abstraction levels (from the most complete to the most consistent), which may be used depending on circumstances. In turn, this means that where we now use information, we need meaning. Information deals with fixed names such as "hammer" (which may be treated as a sequence of 6 letters), whose function could only be "driving a nail". Meaning deals with multiple aspects of a hammer as "a tool for impacting objects", etc., whose properties may be applied to different situations. Its functions may include driving and removing nails, fitting parts, and breaking up objects, up to throwing and propping up.

To make meaning work, we need the following innovations:
1. Human-friendly identification of meaning. Information, which is usually represented as plain text, should be precisely identified (and refer to real-world things or abstract concepts). This would exclude the ambiguities of natural language.
2. Human-friendly definition of relations. Meaning without appropriately defined relations may be incorrect. For example, "a hammer is at the second shelf in the left box" may mean either that the given shelf is inside the box or that the box is on the given shelf.
3. Usage of semantics inside hypertext, which makes it (together with identification and relations) human-friendly. This is, in fact, simplified Semantic Web.
4. Semantic wrapping. Information elements (files, graphical controls, web page elements, etc.) do not have meaning by themselves. Usually it is attributed to them by the human mind. However, to order information efficiently, we need meaning linked directly to these elements.
5. The notion of a textlet, which may (a) be queried with questions in a human-friendly form that is quite close to natural language, and (b) respond with answers in the same form.
6. Context, which is needed to make the area of meaning narrower or wider. For example, the context of tools may be narrowed to that of hammers, and the context of hammers may be extended to that of tools.
7. Semantic line interface (SLI), which may combine features of the command line (CLI) and the graphical interface (GUI). CLI features may include top-level access to any identifier and an order of identifiers similar to that of natural language. GUI features may include convenient representation of information in checkboxes, lists, etc.

Such an approach suits several concepts that may work more efficiently under the new circumstances. Thus, voice control may be more efficient when coupled with SLI, which is closer to natural language. Augmented reality and image recognition may use direct references to real things, which users could then access more easily. But what matters even more is that this approach is part of a broader semantic ecosystem, which embraces not only the interface and its elements, but also other parts of the OS, the Web, etc. This, in turn, means that an ordinary user may get access even to programming interfaces (semantically wrapped and represented as textlets).

You may find more details in Future of operating system and Web: Q&A

Consider an example of SLI: how do you set the timeout for turning off the monitor? Almost any function is hidden in a hierarchy of menu/dialog/tab calls, which makes calling it not very intuitive.

Instead, we may use SLI, which provides top-level access to any function and combines certain features of both CLI and GUI. Thus, you can start typing "turn off" and SLI would hint at which identifiers are available (possibly restricted by some context). Once the "turn off monitor" function is found, SLI would display the corresponding control.
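The hinting behaviour described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry `FUNCTIONS`, its entries, and the functions `hint` and `resolve` are hypothetical names invented for this sketch, not part of any existing SLI implementation.

```python
# Hypothetical registry: identifier phrase -> description of the control SLI would show.
FUNCTIONS = {
    "turn off monitor": "slider: monitor timeout (minutes)",
    "turn off updates": "checkbox: download updates",
    "turn on bluetooth": "switch: bluetooth",
}


def hint(typed, functions=FUNCTIONS):
    """Return the identifiers whose words start with the words typed so far."""
    typed_words = typed.lower().split()
    matches = []
    for name in functions:
        words = name.split()
        # Every typed word must be a prefix of the word in the same position.
        if len(typed_words) <= len(words) and all(
            word.startswith(part) for part, word in zip(typed_words, words)
        ):
            matches.append(name)
    return sorted(matches)


def resolve(typed, functions=FUNCTIONS):
    """Return the control for a uniquely matched identifier, otherwise None."""
    matches = hint(typed, functions)
    return functions[matches[0]] if len(matches) == 1 else None
```

Typing "turn off" would hint at both "turn off monitor" and "turn off updates"; once the input narrows to a single identifier, `resolve` returns the control to display. A context, as mentioned above, could be modelled by passing a smaller registry.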

Monday, December 12, 2011

Future of operating system and Web: Q&A

 Brief introduction (Google Docs)

 Q&A (Google Docs)

How can OS look like?

How do we exchange information today?

How can we exchange information tomorrow?

Wednesday, September 28, 2011

Browser vs. Operating System: Paradigm Shift

Recently we have observed the rise of a pack of new operating systems, which are considered competitors to traditional ones (which date back 30 years and more). Some of the new ones are oriented toward mobile devices; others are Web- and browser-centric. The latter, in part, assume that a browser may replace the traditional user interface with a similar one (which uses a desktop, icons, etc.) though based on hypertext. Of course, the appearance of new operating systems is not accidental. A number of factors force the whole industry to think in this direction:

- possibility (or even necessity) of lightweight solutions vs. heavyweight solutions of traditional operating systems and applications;
- deprecated legacy of old graphical interface vs. hypertext interface;
- growing significance of hypertext itself.

But can a browser-centric operating system replace a traditional one completely? Or are we observing the rise of alternative lightweight systems, which will exist in parallel with traditional heavyweight ones? Will the interface change in the coming years? Is the file deprecated as a unit of information storage? These and other problems are considered below.


There are several factors which cause a demand for lightweight solutions:

1. Lightweight environment. Sometimes we really need only standard tools for viewing and creating quite simple content.
2. Lightweight functions or SOA (service-oriented architecture) in action. Sometimes we use only several functions from thousands. It is good to have a choice between lightweight functions (or remote/local services) and heavyweight applications.
3. Transfer of heavyweight functions to remote servers or cloud.
4. It is impossible to create heavyweight solutions for all possible use cases.

Of course, the choice between lightweight and heavyweight functions did not appear today, or even yesterday. This choice is closely linked with the choice between loose and tight coupling, with the added nuance of choosing between local and remote components (introduced by the Web), and between explicit and implicit location of processing (introduced by clouds).

The choice between a lightweight and a heavyweight solution is not simple and depends on circumstances. Usually any product is a combination of both. In the real world, tight coupling is preferred when efficiency is achieved by configuring dependencies between components in advance, or when composing the components is quite difficult. Thus, any car should be assembled before it is sold to a driver. Loose coupling is preferred when we need flexibility and replaceability of components. Thus, some parts of a car can be replaced, because otherwise we would have to replace the whole car because of a small failure.

The choice is not simple in the computer environment either. This is exactly why thin clients cannot completely replace thick ones, Internet applications cannot replace OS-specific ones, and cloud computing cannot replace local computing. They are alternatives, which are usually combined as necessary. And any future operating system has to combine both lightweight and heavyweight approaches.


The graphical interface no longer influences the evolution of operating systems. The recent touch-sensing revolution enhanced the way of interacting with the interface but not the interface itself. However, many companies and developers still try to improve the interface, which is due to the following factors:

5. Convergence of graphical interfaces. Almost any platform (including mobile ones) is uniform now: desktops, icons, windows, etc.
6. Growing significance of the hypertext interface. Most information is represented today with hypertext, so its interface is sometimes more important than that of the local environment.
7. Innate problems of the graphical interface, which are consequences of Goedel's incompleteness theorems. That is, a user deals either with an understandable interface with few functions, or with a more sophisticated interface (with a greater number of functions) that has a steep learning curve.
8. Lesser flexibility of the graphical interface compared with the command-line interface (which allows reuse of commands and parameters, e.g. by shell scripting).
9. Forced (for a user) awareness about low-level aspects of operating system (like files, services, applications, etc).

The tasks of an interface are not only to represent information in human-readable form but also to order and filter it. Of course, complete ordering of information is impossible (see Goedel's incompleteness theorems). Therefore, we have either the ordered hierarchies of a GUI (where we cannot get access to all information at once, because it is hidden in different branches of the hierarchy), or the unordered interface of the command line (where all information is accessible from the top level but you have to read the manual more often). The situation is similar on the Web, where ordered portals confront search engines. Another example of this duality is news sites, which try either to order a limited number of news items (but then some will be hidden) or to show all of them (but then a user may get lost).

Can this duality be resolved with design (which is sometimes considered the solution)? Yes, but only partially, because design is just another (visual) way of ordering information (and is subject to the same incompleteness theorems). Can this duality be resolved with some kind of simple interface? Yes, but only by dropping some functions (again, see the theorems).

Can this situation be resolved with video and interactive lessons? It is true that some things are easily explained with video. However, a simple explanation fits only simple things. The more complex the information, the more complex the explanation required. Video may convey more information but less meaning (that is, ordered information) than text. Video is also more resource-consuming both to create and to use, and not only for the computer but for the user too: the same explanation (especially of abstract concepts) may be quicker to read than to watch. Actually, these facts were known centuries ago. A teacher gives "video" lessons (in person) in real time, but these lessons cover only the basics, whereas full understanding is available only through self-education with books.

Can the problem be resolved by reading a manual? Yes, but here we have a sort of dilemma. The more complex the interface is, the more complex its description is, and the more entities are introduced. Eventually the tail starts to wag the dog: instead of merely being used, the interface dictates how we should use it. The problem is aggravated by the separation of interface and documentation, which are often written by entirely different people. Therefore, reading manuals is an adequate solution only when the manuals are adequate too (which is not always the case).

However, looking deeper, it is clear that learning an interface may become even more difficult because of the very concepts and principles of the interface. Have you tried to look for an option when you are forced to browse through a list of all (50+) options? Have you tried to look for an option in a graphical interface when it is reachable in one specific way only (like menu A -> dialog B -> button 1 -> checkbox 2), which has to be remembered for each option? Have you tried to remember 30-50 keyboard shortcuts for each application you use? Have you tried to use an icon-laden application by remembering all its icons?

You can see that even the "read the manual" principle is limited. The cause of interface problems lies rather in understanding the interface as a layer between applications and humans, though in fact an interface is a representation of semantics in human-readable form. That is, it is a sort of human-friendly semantics (like natural language) as opposed to the computer-friendly semantics of data and formats.


The influence of hypertext is difficult to overestimate, and yet it is easy to do so. Hypertext long ago ceased to be hypertext (that is, text linked with other information sources by hyperreferences). Dynamic content, real-time updates, and video transformed hypertext into a mix of design and programming elements. That is, it evolved to be more or less similar to a standard graphical interface. This is not bad in itself, but unfortunately it also means that hypertext has drifted away from the features of text and from conveying semantics. In general, the peak of hypertext evolution has not been reached yet, because of a number of factors:

10. Hypertext (which is used for describing information) has not transformed into hyperinformation (which is information itself).
11. Hypertext allows transparent integration of local and remote information sources, which, in turn, facilitates the development of lightweight solutions.
12. Necessity of broad usage of semantics. Semantic Web started this trend; however, it is restricted to the Web, though it should include operating systems too.
13. Necessity of human-friendly semantics (which may become possible thanks to simplified Semantic Web, which may facilitate the creation and usage of understandable semantics).
14. Necessity of content ordering/filtering/aggregating (which may be done not only with search engines, but with semantics itself).
15. Browsing is a part of semantics. Indeed, each act of following a reference changes the context of the information environment. And each modification of the context may change the scope of references (by filtering them).

Everything is semantics if it has some meaning for a human being: not only the interface, but browsing itself, any data, any application, etc. Today, applications express meaning in data and formats, but usually it is revealed to a user only partially (through the graphical interface), because disclosing it is considered too complex or not secure. However, there is a lot of user data which by definition is neither complex nor a security risk for the user himself or herself (because often the user created it). As a result, we have a situation where meaning inside the computer is detached from meaning inside humans. Of course, such detachment is an evident problem, which then has to be fixed by other means.

For example, imagine that an application starts and shows a dialog that proposes installing an update and asks whether you want to update the application or not. Of course, such attention to the user is appropriate and necessary; however, there are a lot of questions that are ignored by the concept of "detached meaning" in general and "simple interface" in particular:
- what is included in the update and why is each change required?
- what is the size of the update and where is it downloaded to (is there enough space on disk)?
- are there any options (do not download updates, download only security updates, etc.)?
- is there any video lesson that would help to configure the update more finely?

Of course, such advanced management of updates can be implemented with existing tools and environments. However, every piece of advanced management requires more developer resources and time, which, in turn, affects the user, because fewer resources and less time will be spent on the updates themselves, etc. Can this vicious circle be broken? Yes, but for that we need new principles for interfaces, operating systems, and applications.

All the questions about the update asked above are, in fact, the semantics of the update. Therefore, we need a more advanced way to attribute meaning to applications and their parts, data, interface, text, video, files and their parts, as well as to real things and concepts. Today, we already have one way of attributing meaning: hypertext. But it can be enhanced as follows: (1) include semantics (identification and association by relations), (2) allow fragmented and detached semantic wrappers for any piece of information, which in their minimal form could consist of a single identifier. As a result, we would have not hypertext but rather hyperinformation, because it can use not only text but any information (applications, data, any file, etc.). You may imagine the whole operating system (or rather semantic ecosystem) as one semantic canvas, where any elements (like a fragment of video or an interface control) may easily be linked with each other.

What technology should be used for hyperinformation? The traditional Web is too oriented toward the graphical interface. Semantic Web is too oriented toward meaning in the form of data and formats. Both technologies are inappropriate here: just as it is impossible to create a personal application for every personal use case of every user, it is impossible to create personal data and formats for every case. This is exactly why we need a technology that would allow creating personal meaning, that is, simplified Semantic Web.

The ideas behind simplified Semantic Web are simple: (1) human-friendly identification as a balance between precise computer identification (e.g. by hyperreferences) and ambiguous natural-language identification, (2) human-friendly representation of semantic relations through the legacy of hypertext, (3) a restricted set of semantic relations, to keep them human-friendly too, (4) semantic wrappers for any data or code. These ideas lead us to the understanding that it is the browser that can play the key role in future operating systems.


As the basic tool for surfing the traditional Web, the browser can easily migrate to hyperinformation. In turn, it can be easily adapted for lightweight solutions and semantics. This does not mean it should handle any format or be able to represent any Semantic Web graph. However, the browser should work with:
- "Web of Things" (or more correctly "Web of Things and Conceptions"), that is, allow navigation between things, conceptions, and information resources on them;
- legacy data and applications;
- interface;
- Big Data.

For that, the future browser should incorporate the following features:

1. To be semantics-aware.

The browser should be able to refer to anything (unlike hypertext, which can refer only to information resources). The only alternative that can refer to anything today is natural language, which, unfortunately, is ambiguous (mostly because its identifiers are not unique). For example, "Springfield" as a city can be found in different states of the USA. When you use a map application, the software proposes that you resolve this ambiguity manually. However, a search engine cannot resolve it fully, because sometimes it is not clear from the text and context which exact Springfield is meant. The only solution is to have a unique identifier that can be used explicitly, for example, "Springfield, IL".

Imagine you have a document describing the economy of Springfield, IL. Today, the trendiest way to facilitate access to it is keywords (or tags), evidently because traditional full-text search is considered not quite fruitful. We can guess that such a set of tags could look like "Springfield, Illinois, USA, economy, business, trade, employment". Their shortcomings are: (a) they are only associations that you happened to recall (though, apparently, a set of associations can never be complete and is subjective in itself), (b) some tags are parts of other tags, which makes them redundant, (c) the meaning of the document is blurred if based on these tags: is it "economy"? is it "type of economy"? is it "trade in USA and Illinois"? etc.

The only alternative to keywords is more precise identification as a part of simplified Semantic Web. In our example, if you really want to make access to the document faster, you should use an aggregated and precise identification of the document. That is, such identification should (a) convey the concise meaning of the document, (b) use identifiers that are as precise as possible (not text). Of course, you may object that information usually has several meanings. However, this is the case for all things in the Universe: cities have many meanings too, yet they should be identified precisely. Similarly, any information should be precisely identified (and not by a title, which often tends to be attractive and full of metaphors), whereas all possible associations (that is, additional meaning) should be calculable.

In our case, the document should have an identifier that consists of two precise identifiers: "Springfield, IL", which refers to the city, and "economy" (which is a precise identifier, though the scope of what the term includes is vague), linked with each other (which means the document is specifically about "economy OF Springfield, IL", not about "Springfield, IL and economy" or "economy document written in Springfield, IL", etc.). The role of the browser is to be able to use precise identifiers (for both computer entities and real things and concepts) and the relations between them.
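A composite identifier of this kind can be sketched as a pair of precise identifiers joined by a named relation. The sketch below is an illustration only: the classes `Identifier` and `Relation`, and the relation names "of" and "written in", are assumptions made for this example and are not defined by simplified Semantic Web.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """A precise, human-readable identifier, e.g. "Springfield, IL"."""
    name: str


@dataclass(frozen=True)
class Relation:
    """Two identifiers linked by a named relation (hypothetical model)."""
    subject: Identifier
    kind: str
    target: Identifier

    def phrase(self):
        # Render the relation the way the text writes it: "economy OF Springfield, IL".
        return f"{self.subject.name} {self.kind.upper()} {self.target.name}"


economy = Identifier("economy")
springfield_il = Identifier("Springfield, IL")
springfield_mo = Identifier("Springfield, MO")

document_topic = Relation(economy, "of", springfield_il)
```

Because both the identifiers and the relation are explicit, "economy OF Springfield, IL" compares unequal to the same relation with another Springfield, and to a different relation between the same two identifiers, which is exactly the distinction that a flat set of tags loses.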

That is, when you have surfed to such a document, you can surf further to other information resources or to real things or concepts (and not only those). Why do we need that? In fact, traditional surfing only helps to explore random links to documents that describe the entity concerned and related things and concepts. The situation can be improved by an encyclopedia article. However, no encyclopedia can cover all things in the world, because the number of such things and concepts is much greater than the capacity of encyclopedia creators. This is exactly why surfing through real things and concepts is needed. It would help to understand the role of the thing concerned and its associations with other things, and to surf between them.

2. To work with legacy data and applications.

Today, browsers are not used instead of file managers, mostly because hypertext support for files is resource- and time-consuming. For example, for a directory you need to create a hypertext that describes it, and then synchronize it every time you create, move, or delete files. To semantize files and directories, we can wrap them with meaning. A semantic wrapper should identify the content, which would allow including legacy data (files and their parts, applications and their functions and controls) in the semantic ecosystem. Semantic wrapping is the key principle of hyperinformation, which considers any information as a separate entity available for referring.

This opens new perspectives, sometimes in unexpected areas. Today, a lot of information is duplicated across the Internet. Unfortunately, there is no way for a machine to determine whether information is a duplicate or not (unless file identifiers coincide). Semantic wrapping may change that. Unlike arbitrary file identifiers (for example, the document about Springfield, IL may be named "Spr.doc" or "Sprinfield_IL.doc", etc.), semantic identifiers are more unequivocal. Therefore, it would become possible to download only a wrapper from an untrusted source, and then to download the information (which this wrapper refers to) from a trusted source. We can do the same today, but explicitly, whereas a semantic wrapper could save our time. For example, you read about some book in a blog and click on a reference to a semantic wrapper, which automatically redirects you to the e-book download at your favorite shop (or proposes several sources).
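As a rough illustration of this idea, the sketch below treats a wrapper as a small record carrying a semantic identifier, and resolves it against catalogs of sources. Everything here is hypothetical: the wrapper layout, the catalog contents, the source names, and the functions `is_duplicate` and `trusted_locations` are assumptions made for the example.

```python
# Hypothetical semantic identifier shared by differently named files.
TOPIC = ("economy", "of", "Springfield, IL")

wrapper_from_blog = {"id": TOPIC, "file": "Spr.doc"}
wrapper_from_forum = {"id": TOPIC, "file": "Sprinfield_IL.doc"}

# Hypothetical catalogs: source name -> {semantic identifier -> location}.
CATALOGS = {
    "favorite-shop": {TOPIC: "favorite-shop/item-17"},
    "unknown-mirror": {TOPIC: "unknown-mirror/files/Spr.doc"},
}


def is_duplicate(wrapper_a, wrapper_b):
    """Two wrappers describe the same information if their identifiers match."""
    return wrapper_a["id"] == wrapper_b["id"]


def trusted_locations(wrapper, catalogs, trusted):
    """Return locations of the wrapped information, from trusted sources only."""
    return [
        catalog[wrapper["id"]]
        for source, catalog in sorted(catalogs.items())
        if source in trusted and wrapper["id"] in catalog
    ]
```

With these assumptions, the two wrappers are recognized as duplicates despite their different file names, and a wrapper picked up from an untrusted page can be resolved to `"favorite-shop/item-17"` once "favorite-shop" is listed as trusted.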

3. To interact with semantics.

Semantics is based on references to things and concepts, whereas the role of an interface is interaction with such references. Thus, natural language is interaction with ambiguous word-references to everything; the command-line interface is interaction with (mostly) unique references to computer resources; the graphical interface is interaction with visual references to computer resources (which, unfortunately, are hard to formalize in symbols); finally, the hypertext interface is interaction with unique references to information resources. Evidently, we need references to computer resources, to information resources, and to real things and concepts alike. Therefore, the question is: can we combine the advantages of all these interfaces?

In fact, hypertext already combines some features of the graphical interface and natural language. Any hyperreference is covered by an ambiguous natural-language reference and uses a unique reference to an information resource. The concepts of hyperinformation, simplified Semantic Web, and the semantic ecosystem add new shades. Hyperinformation can uniquely refer to things, concepts, and computer resources. But changes in the interface are needed to extend the usage of hypertext beyond the "point-and-click" metaphor.

The phrase "interface is semantics" now sounds less abstract: indeed, instead of the graph of web sites, interaction with semantics is interaction with the graph of everything (things, concepts, sites, data, functions, etc.). And while "point-and-click" is enough for the graph of sites, the graph of everything requires quite a different approach. This is evident from the success of search engines. Indeed, why are they needed? Because mere "point-and-click" is not appropriate for navigating the graph of everything, and with search engines we use natural language for navigating.

The next step is using hyperinformation for navigating the graph. Each navigating operation is a comparison of the graph of a query with the graph of everything or, put simply, a comparison of a question with the available answers. Such navigation is very similar to the command-line interface, except that (1) hyperinformation is used instead of commands (that is, natural language with unique identifiers behind it), (2) the result is hyperinformation too (that is, references to things, concepts, data, UI controls, functions, etc.). In this way, as you may note, all types of interface are merged into one, because hyperinformation may wrap elements of all interface types. As a result, even the graphical interface may change (though the semantic interface may operate in parallel with it). For example, hyperinformation allows atomic UI operations that involve any single UI control, which means any UI control may be reused in navigating the graph of everything.

Returning to the example of the update dialog above, the semantic interface may change it in the following ways:
- the dialog may refer to the semantic wrapper of an update, which can be downloaded to your computer;
- an update may refer to its own components;
- an update may refer to application options or even UI controls;
- a precise reference to an update can help to retrieve references to it;
- an update itself is a graph of hyperinformation, which may be queried through the semantic interface.
That is, for example, to turn off updates you may query an update with the "Turn off" string (which becomes an identifier as soon as it matches the update graph), and the query may respond with a UI control that allows turning off downloads.
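Querying an update as a graph can be sketched as a simple walk over its nodes. The graph contents, the control descriptions, and the functions `query` and `controls_for` below are hypothetical and serve only to illustrate how a typed string might match an identifier and lead to a UI control.

```python
# Hypothetical update graph: identifier -> identifiers it refers to.
UPDATE_GRAPH = {
    "update": ["components", "size", "options"],
    "components": [],
    "size": [],
    "options": ["turn off downloads", "security updates only"],
    "turn off downloads": [],
    "security updates only": [],
}

# Hypothetical mapping from identifiers to the UI controls that wrap them.
CONTROLS = {
    "turn off downloads": "checkbox: download updates",
    "security updates only": "checkbox: security updates only",
}


def query(graph, root, text):
    """Walk the graph from root and return the identifiers that start with text."""
    wanted = text.lower()
    found, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against cycles in the graph
        seen.add(node)
        if node.startswith(wanted):
            found.append(node)
        stack.extend(graph.get(node, []))
    return sorted(found)


def controls_for(graph, root, text, controls=CONTROLS):
    """Return the UI controls attached to the identifiers matched by the query."""
    return [controls[node] for node in query(graph, root, text) if node in controls]
```

Under these assumptions, querying the update with "Turn off" matches the "turn off downloads" identifier, and the answer is the checkbox that controls downloading, without going through any menu or dialog hierarchy.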

4. To tame Big Data (micro and macro data management).

The problem of big data was not created by the computer age; in fact, it is known to any living creature: thinking is processing big data (the representation of billions of atoms in a receptor) and aggregating it into a compact inner representation (a memory or a word, which refers to something). And if you want to live with any big data, the answer is always aggregation. Aggregate; if the data is still big, aggregate the aggregations; repeat as many times as needed until the result is appropriate.
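The "aggregate the aggregations" loop can be written down directly. In this minimal sketch the aggregation is a plain sum over fixed-size groups, which is only a stand-in assumption; the function names `aggregate` and `aggregate_until`, and the parameters `group_size` and `max_items`, are likewise invented for the illustration.

```python
def aggregate(values, group_size):
    """One aggregation pass: replace each group of values by its sum."""
    return [
        sum(values[i:i + group_size])
        for i in range(0, len(values), group_size)
    ]


def aggregate_until(values, group_size, max_items):
    """Repeat aggregation until at most max_items values remain.

    Returns every level, from the raw data to the final aggregation.
    """
    if group_size < 2 or max_items < 1:
        raise ValueError("group_size must be >= 2 and max_items >= 1")
    levels = [list(values)]
    while len(levels[-1]) > max_items:
        levels.append(aggregate(levels[-1], group_size))
    return levels
```

For example, 100 values with groups of 10 and a limit of 5 items need two passes: 100 values become 10 sums, and the 10 sums become a single total. Keeping all levels reflects the idea of several abstraction levels that can be consulted as needed.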

Aggregation is crucial not only for really big data but also for local data: whereas Big Data is hard for computers to swallow, locally big data is usually a problem for users, who get lost even in their own personal data and cannot find the information they need. Can the problem of locally big data be resolved by search? Not quite, because search uses its own aggregation algorithm, whereas a user needs his or her own ordering, which can be supported by self-aggregation and the interface.

Indeed, why should information be categorized each time it is copied to a new computer? Imagine you download a document about Springfield, then save it to disk in the "Documents/Geography/USA" folder. It seems you will find this document easily in the future. That is true, but only until you have thousands of documents in this folder. Then you need to create new folders subdivided by state or by city, but then you can have problems with the folders themselves (if there are enough of them). What is worse, in some cases you may not have time to categorize new information, and then the chances of finding "Downloads/2011/Springfld_2.doc" are even lower.

Now imagine that the document is pre-categorized (which should be supported by the semantic ecosystem) or pre-aggregated (which should be done by the document's creators) before it reaches your computer. The aggregation of the document can be expressed as a reference to "Springfield, IL". Categorization is derivable from the aggregation; that is, the topic has derivable relations to geography, Earth, North America, USA, Illinois, Springfield itself, etc. Now, if you have a predefined folder for geography or USA, the document will be placed there. But are folders needed at all? In fact, a folder is a constant categorization of information, but if we can place information automatically, then only the identification of information matters, whereas categorization is always derivable.
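
A small Python sketch may show what "categorization is derivable" means in practice. It assumes that the semantic ecosystem supplies a "broader topic" relation; the `BROADER` table, the `derive_categories` and `belongs_to` functions and the document record are all hypothetical and stand in for whatever a real ecosystem would provide.

```python
# Hypothetical "broader topic" relation, assumed to come from the
# semantic ecosystem rather than from the user.
BROADER = {
    "Springfield, IL": "Illinois",
    "Illinois": "USA",
    "USA": "North America",
    "North America": "Earth",
    "Earth": "geography",
}


def derive_categories(reference):
    """Follow the broader-topic relation upward from a single reference."""
    chain = [reference]
    seen = {reference}
    while chain[-1] in BROADER:
        parent = BROADER[chain[-1]]
        if parent in seen:  # guard against accidental cycles in the relation
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def belongs_to(document, topic):
    """A document matches a topic if the topic is derivable from its reference."""
    return topic in derive_categories(document["about"])


# The only thing stored with the document is its aggregation (one reference).
document = {"name": "Springfld_2.doc", "about": "Springfield, IL"}
print(derive_categories(document["about"]))
print(belongs_to(document, "USA"), belongs_to(document, "France"))
```

Only the reference is stored with the document; every "folder" it could be placed in is computed when needed, which is why the folder hierarchy itself becomes optional.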

But aggregation should be supported by the interface. This is where context comes into play. What is a context? You can consider it as the following variations on the folder concept:
- dynamic folder: a context should cover all information that matches the given topic (however, because of possible performance issues with fully dynamic contexts, the number of contexts can be restricted, as is done with modern desktops);
- hyperinformation folder: a context should cover not only files and subfolders, but any information wrapped with semantics (a part of a file, an application, a function, an interface control, a web site, a web page, etc);
- symbiotic folder: changing the context may help to reach information, and reaching information may, in turn, change the context (to find a document about Springfield, IL you go to the context of Illinois, but upon reaching the document, the context can change to that of Springfield, IL);
- filtering folder: because information may belong to multiple contexts, a context can be finely tuned to reach the information you need;
- associative folder: a context may store information shared between contexts (for example, a document about Springfield, IL goes to the USA context, the Illinois one, the Springfield one, the trade one, etc);
- semantic folder: a context allows only information that relates to its meaning (that is, if you have the context of USA, you cannot put a file on France into it).
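
Three of these variations (dynamic, filtering and semantic folders) can be sketched in Python as follows. This is a minimal sketch under assumptions: items are taken to carry a set of topics already, and the `Item` and `Context` classes, their methods, and all file names except "Springfld_2.doc" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    topics: frozenset  # topics assumed to be attached by the semantic ecosystem


class Context:
    """A context is defined by topics, not by a stored list of files."""

    def __init__(self, *topics):
        self.topics = frozenset(topics)

    def accepts(self, item):
        # Semantic folder: only information related to the meaning is allowed.
        return self.topics <= item.topics

    def select(self, items):
        # Dynamic folder: the content is computed from the items each time.
        return [item.name for item in items if self.accepts(item)]

    def narrow(self, topic):
        # Filtering folder: tune the context by adding one more topic.
        return Context(*self.topics, topic)


items = [
    Item("Springfld_2.doc", frozenset({"USA", "Illinois", "Springfield, IL"})),
    Item("chicago_trade.doc", frozenset({"USA", "Illinois", "trade"})),
    Item("paris_guide.doc", frozenset({"France", "Paris"})),
]

usa = Context("USA")
print(usa.select(items))                  # both USA documents
print(usa.narrow("trade").select(items))  # only the trade document
print(usa.accepts(items[2]))              # False: a file on France is rejected
```

The same item shows up in every context whose topics it carries, which is the associative case from the list: nothing is copied between contexts, because the contexts do not store anything.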


So, what answers can we give to the questions from the beginning of this article? Can a browser-centric operating system replace a traditional one completely? Of course not, because an operating system is an interface between hardware and software, whereas a browser is a part of the interface between software and users. On the other hand, a browser may surpass the traditional, low-level user interface (files, graphical interface, etc) with a semantic user interface, which may change the principles of how humans interact with computers.

Do we observe the rise of alternative lightweight operating systems, which will exist in parallel with traditional heavyweight ones? Possibly. Being lightweight may be a necessity for devices oriented toward media content or Web surfing. However, in general, any lightweight solution tends to take on heavyweight features in the long run. We can observe the same effect in many applications that became successful thanks to being lightweight, but turned into heavyweight monsters as more and more features were requested.

Will the interface change in the coming years? Definitely yes, and the changes should concern not so much new visual features as a combination of visual and semantic ones, which would dramatically improve the learnability of the interface.

Is the file deprecated as a unit of information storage? Yes and no. Yes, it is deprecated as a low-level feature of operating systems, which should be represented to users as a high-level entity. But no, it is not deprecated as an atomic unit of information (there is no reason to replace one atomic unit with another).

Then which paradigm shift are we talking about? We are talking about a new semantic level of operating system architecture, placed over the traditional user level. The role of the browser as a "run anywhere" tool is to navigate through this semantic level and complement the traditional user interface. That is, not browser vs. operating system, but browser over operating system. Clearly, it would not be the browser as we are used to seeing it: surfing the Web would be only one of many functions, alongside identifying things and concepts, working with semantics in general and context in particular, efficient management of local information, etc.