Monday, January 10, 2011

Is a perfect search engine possible?

Have you ever wondered why a search returns millions of pages even for a quite simple query? The answer is generalization. We generalize for many reasons: we have no time for details, we assume that others know what we know, or we are simply unable to express ourselves more precisely. A search engine cannot avoid generalizations either, so it cannot be something that magically "understands exactly what you mean and gives you back exactly what you want" (as Larry Page once described the "perfect search engine"). This is not the search engine's fault: a result is just a guess, based on complex mathematical methods for finding words in a database of billions of pages. Even if computers become many times more powerful than they are today, they will not be able to understand exactly what we mean and want, because generalization always loses a part of the information, and that part cannot be recovered.

In essence, generalization is simplification. No surprise, then, that search engines try to sidestep this problem by pushing users to make queries simpler and shorter. This follows the beginning of the famous quotation but not its ending: "Make everything as simple as possible, but not simpler". Only reaching a goal can show whether something was simpler than necessary. The goal of a search is to get an answer to any question. But how can we do that if we are advised to simplify our queries? This is far from reality, where we ask questions that are as complex as necessary. Such advice resembles a conversation in which you ask someone a question in quite general words and he or she answers in general words too: each of you means something specific, which may coincide with the other's specifics or contradict them completely. The bottom line: you have some chance of getting a correct answer, but it will always be only a chance.

But what can be done about this situation? We need to be able to make information more precise (including questions such as search queries). The solution should be complex and simple at the same time. In fact, it can be based on several simple ideas:

1. Anything has meaning.
2. Identify once, mean anywhere.
3. Generalize, specify, and combine information as necessary.

These ideas have many implications: (a) existing formats like email give meaning only to specific parts of information (subject, recipient, body), but there is meaning inside the text of an email body too; (b) identification is simpler than specification, but it can matter more: for example, you know that a bird is a bird even without deep knowledge of ornithology, so "bird" as a name gives you more information than specifics like "it has one head and two wings", which are not always precise and can have exceptions and other implications; (c) everything should be part of one knowledge space, in contrast to the current situation, where information is scattered across emails, information systems, files, etc.; (d) reusing information is no less important a concept than reusing code in programming.
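To make the three ideas a little more concrete, here is a minimal Python sketch of what a shared "knowledge space" might look like. It is only an illustration under my own assumptions: the class name `KnowledgeSpace`, its methods, and the sample notes are all hypothetical, and the post does not describe any particular implementation. An identifier is registered once, pieces of text from any source (an email, a file) can point to it, and a query for a general identifier such as "bird" also finds everything tagged with a more specific one such as "penguin".

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeSpace:
    """Hypothetical toy model: identifiers plus text annotated with them."""

    # identifier -> the more general identifier it belongs to (or None)
    parents: dict = field(default_factory=dict)
    # (source, text, identifier) triples from any place information lives
    notes: list = field(default_factory=list)

    def identify(self, ident, generalizes_to=None):
        # "Identify once": register a name and, optionally, its generalization.
        self.parents[ident] = generalizes_to

    def annotate(self, source, text, ident):
        # "Mean anywhere": any text, from any source, can reuse the identifier.
        if ident not in self.parents:
            raise KeyError(f"unknown identifier: {ident}")
        self.notes.append((source, text, ident))

    def generalize(self, ident):
        # Walk from a specific identifier up to the most general one.
        chain = []
        while ident is not None:
            chain.append(ident)
            ident = self.parents.get(ident)
        return chain

    def find(self, ident):
        # Return notes tagged with ident or with anything more specific.
        return [(s, t) for s, t, i in self.notes if ident in self.generalize(i)]


space = KnowledgeSpace()
space.identify("animal")
space.identify("bird", generalizes_to="animal")
space.identify("penguin", generalizes_to="bird")

space.annotate("email", "The penguin exhibit opens on Friday.", "penguin")
space.annotate("file", "Birds migrate south in autumn.", "bird")
space.annotate("file", "The dog show is in May.", "animal")

for source, text in space.find("bird"):
    print(source, "->", text)
```

Asking for "bird" returns both the email about penguins and the file about birds, while asking for "penguin" returns only the email. The query can be as general or as specific as the question requires, because the specifics were recorded instead of being lost.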

More details can be found at http://on-meaning.blogspot.com/2010/12/how-to-solve-problems-of-contemporary.html.
