Monday, January 23, 2012

How to get developers to document their code? Make it reasonable.

A follow-up to "How to get developers to document their code"

Documenting code is a must, if only because it saves time. It helps you understand why and how the code was written, which helps not only other developers but also yourself (if you return to the code after, say, a year). This is especially true when the codebase is really huge (thousands of files, millions of lines), and especially when you are new to it. But even otherwise you may not grasp all the implications of the code and why it was written exactly this way: a requirement, a management decision, a customer request, a contract, a design choice, etc. However, documentation has one major drawback: it is subject to the human factor, which leads to its misuse.

The worst example of such misuse is when someone follows the rule strictly and duplicates the code's behavior in natural language. Of course, any tool can be misused; in this case, however, the problem is different: developers don't understand, or misunderstand, the reason behind documenting. You may try to persuade them. But would it work? Since the very beginning of programming, everyone has been persuading junior developers to write good code in an appropriate style and format, to avoid counterproductive stereotypes and anti-patterns, to design early, and to document appropriately. Eventually some developers understand the reasons behind this, but some do not. And, unfortunately, some of them happen to work next to you. Persuade them further? Motivate them? Even this sometimes does not work. Or it works, but only after some time, when the code is already infected with human arrogance ("I know better what to do").

The situation is quite the opposite in areas where reason works. Have you heard much fuss recently about the usefulness of object-oriented programming? About bug tracking? About patterns or refactoring? No? The secret is that no persuasion is needed there: reason works much more efficiently. Evidently, reason must be backed by availability: bug tracking became widely used only after it was described theoretically and implemented.

But what reason is behind documenting code? Today it is focused mostly on explaining the meaning of code to other people, which is needed because there are no explicit ways to communicate that meaning. Explanation is linking between meanings; however, modern applications and information are scattered (requirements are written in a word processor, coding and compiling happen in another application, bug tracking in yet another, etc.). Of course, applications can be integrated or made compatible through some shared format, but in many cases this is impossible. Therefore, today information can be exchanged only implicitly, with the help of natural language and the human mind. And this creates another problem, because natural language is ambiguous (in contrast to precise code) and humans interpret text and meaning very differently.

What if we could bridge the gap between applications without natural language? Then we could reduce the need to explain the meaning of code. Is it possible today? Can we link code with other meanings and documents explicitly? A file reference is not suitable at all, because it works only on a specific computer. A hyperlink does not fit this purpose either, because it depends on the availability of a server and path (that is, if the path changes, you lose the meaning behind the reference). Another problem is more subtle: a hyperlink refers to an information resource, which means that (1) our future usage will be tightly coupled to exactly this resource, and (2) there is still no link with natural language, for which we need to generalize things (as you usually do in code comments). Suppose you use a wiki link like http://mycompany.com/wiki/amadeus/Rating, which generalizes and links your code to some functional area. Everything looks great except for one "but": you need to persuade developers to fill this wiki page with information about that area. And if you have to persuade, that is a clear sign the activity is not reasonable enough (at least, this is more or less true for programming and related things, where reasonable alternatives are possible). Of course, you can use tags (keywords), but they have another problem: they are text and are not precise enough. Moreover, although they help you manage information, a growing number of them forces you to manage the tags themselves.

Are there alternatives? Today, no. However, I propose using semantic links for this. A semantic link is a bridge between precise meaning (like data or code) and ambiguous or generalized meaning (like natural language), and it allows things to be made as precise or as general as necessary. It allows linking code with the real-world domain, because it also works as a reference to real-world things and concepts. It is not coupled to a specific information resource. For example, "Rating" as a string is ambiguous, because it is not clear which rating we are talking about. However, we can make it a little more precise with the link <s-id="My Company:Amadeus:Rating">Rating</s-id>. But, unlike file references or hyperlinks, a semantic link is not just an identifier: it is meaning itself and, partially, a natural language identifier (you can use a more compact representation of it: <s-id="My Company:Amadeus:">Rating</s-id>). This identifier refers to the concept of music rating as it is seen in the Amadeus project of My Company. At the same time, it refers to derivable specific information, (a) that the rating is exactly as designed and implemented by the Amadeus project of My Company, as well as to derivable general information, (b) that the rating is about music. This identifier may also refer to a set of documents that describe the rating system. And because any meaning is itself a filter, this identifier is a filter of everything that relates to the rating system.
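To make the "as precise or as general as necessary" idea concrete, here is a minimal Java sketch of how a tool might treat such identifiers. It assumes (my assumption, not part of the proposal) that colon-separated segments are read from general to specific; `SemanticId`, `parse`, `parent` and `covers` are hypothetical names. Dropping the last segment generalizes the meaning, and a prefix test acts as the filter. Deriving general facts such as "rating is about music" would need a knowledge base and is not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SemanticIdDemo {
    public static void main(String[] args) {
        // Compact form: <s-id="My Company:Amadeus:">Rating</s-id>
        SemanticId rating = SemanticId.parse("My Company:Amadeus:", "Rating");
        SemanticId review = SemanticId.parse("My Company:Amadeus:Review", "Review");
        SemanticId amadeus = rating.parent();

        System.out.println(rating);                 // My Company:Amadeus:Rating
        System.out.println(amadeus);                // My Company:Amadeus
        System.out.println(amadeus.covers(review)); // true: the project filter includes reviews
        System.out.println(rating.covers(review));  // false: rating is more specific
    }
}

// Hypothetical model of a semantic identifier: colon-separated segments,
// each one narrowing the meaning of the segments before it.
final class SemanticId {
    private final List<String> segments;

    private SemanticId(List<String> segments) {
        this.segments = List.copyOf(segments);
    }

    // Parses "A:B:C". A trailing colon means the last segment is taken from
    // the linked text, as in <s-id="My Company:Amadeus:">Rating</s-id>.
    static SemanticId parse(String id, String linkedText) {
        List<String> parts = new ArrayList<>(Arrays.asList(id.split(":", -1)));
        int last = parts.size() - 1;
        if (parts.get(last).isEmpty()) {
            parts.set(last, linkedText);
        }
        return new SemanticId(parts);
    }

    // Generalization: drops the most specific segment.
    SemanticId parent() {
        return new SemanticId(segments.subList(0, Math.max(0, segments.size() - 1)));
    }

    // Filter: true if this id equals the other one or is a generalization of it.
    boolean covers(SemanticId other) {
        return other.segments.size() >= segments.size()
                && other.segments.subList(0, segments.size()).equals(segments);
    }

    @Override
    public String toString() {
        return String.join(":", segments);
    }
}
```

A prefix test is the simplest possible filter; it treats every more specific identifier under a project as part of that project.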

How does this solve the problem of documenting code, and how does it make documenting reasonable? First, because a semantic link may be as precise (precise enough to avoid all the ambiguities of words) or as general (general enough to avoid an overly specific meaning of code) as necessary, we can link code and natural language more flexibly. Second, because a semantic link does not refer to specific resources, we can apply it anywhere (provided semantic links themselves are supported). Third, because semantic links are not specific to programming, they may be used in documents too (for requirements, etc.). The main outcomes of these features are: (1) you can link a description in natural language (like a requirement) with code, and (2) you can navigate (filter) the project more easily.

Example 1
For example, the rating requirement may be written as "User can rate releases and, optionally, review them." So code documenting starts here: a developer only needs to make the meaning of this requirement more precise. Consider this a typical example of top-down design. It may involve code generation, or a developer may write their own code like this:

//<s-id="My Company:Amadeus:Rating">
void rate() {
    // Rating logic goes here.
}

//<s-id="My Company:Amadeus:Review">
void review() {
    // Review logic goes here.
}
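A tool would first need to collect such links from the sources. The following rough Java sketch scans source text for `//<s-id="...">` comments and records the method declared right after each one. `SemanticLinkScanner` and `scan` are hypothetical names, and the regular expressions only handle simple single-line method headers like the ones above; a real tool would use a proper parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SemanticLinkScanner {
    // Matches link comments such as //<s-id="My Company:Amadeus:Rating">
    private static final Pattern LINK = Pattern.compile("//\\s*<s-id=\"([^\"]+)\"");
    // Rough match for a method header: the first identifier followed by "(".
    private static final Pattern METHOD = Pattern.compile("(\\w+)\\s*\\(");

    // Maps each semantic id to the name of the method declared right after it.
    static Map<String, String> scan(String source) {
        Map<String, String> links = new LinkedHashMap<>();
        String pendingId = null;
        for (String line : source.split("\n")) {
            Matcher link = LINK.matcher(line);
            if (link.find()) {
                pendingId = link.group(1);
                continue;
            }
            Matcher method = METHOD.matcher(line);
            if (pendingId != null && method.find()) {
                links.put(pendingId, method.group(1));
                pendingId = null;
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "//<s-id=\"My Company:Amadeus:Rating\">",
                "void rate() {",
                "}",
                "",
                "//<s-id=\"My Company:Amadeus:Review\">",
                "void review() {",
                "}");
        // Prints one "id -> method()" line per link, in source order.
        scan(source).forEach((id, name) -> System.out.println(id + " -> " + name + "()"));
    }
}
```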

Bottom-up design is possible too: a developer may write some code, which is generalized in a description only later. Why is this reasonable? Because a manager sees which requirements are covered by code, and a developer sees which code is covered by descriptions, or whether code and requirements (which may change) still correspond, etc. Moreover, as code grows, it makes navigation easier between the code itself, documents (requirements, specifications, manuals), bug tracking, version control, the database, and the interface, because each of them may use semantic links too.
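Once requirements and code carry the same identifiers, checking the correspondence in both directions reduces to set differences. A minimal sketch under that assumption: `CoverageReport` and its methods are hypothetical, and the `My Company:Amadeus:Search` id is made up to stand for code written bottom-up before any description exists.

```java
import java.util.Set;
import java.util.TreeSet;

public class CoverageReport {
    // Requirements that no code links to yet (what a manager wants to see).
    static Set<String> uncoveredRequirements(Set<String> requirementIds, Set<String> codeIds) {
        Set<String> result = new TreeSet<>(requirementIds);
        result.removeAll(codeIds);
        return result;
    }

    // Code that is not yet generalized into any description (bottom-up design).
    static Set<String> undocumentedCode(Set<String> requirementIds, Set<String> codeIds) {
        Set<String> result = new TreeSet<>(codeIds);
        result.removeAll(requirementIds);
        return result;
    }

    public static void main(String[] args) {
        Set<String> requirements = Set.of("My Company:Amadeus:Rating", "My Company:Amadeus:Review");
        // Hypothetical state: review() is not written yet, and a search method was written bottom-up.
        Set<String> code = Set.of("My Company:Amadeus:Rating", "My Company:Amadeus:Search");

        System.out.println("Requirements without code: " + uncoveredRequirements(requirements, code));
        System.out.println("Code without description: " + undocumentedCode(requirements, code));
    }
}
```

Running it reports `My Company:Amadeus:Review` as an uncovered requirement and `My Company:Amadeus:Search` as undocumented code. Exact matching is the simplest choice; a generalization-aware check could reuse a prefix filter instead.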

Example 2
Take a simpler but more intriguing example: string utils, which come either from some utility library or are home-brewed. But how do you make everyone aware of them? Of course, you can send an email describing what we use in our project, describe it somewhere on your wiki/portal, or announce it at a meeting. Unfortunately, an email may be missed or forgotten, and there are too many rules and documents on the wiki/portal, so it is not easy to find the one that relates to string utils. Finally, a new developer may join your team later, when everyone knows certain things and assumes all other people know them too. Quite naturally, the new developer starts writing his or her own string utils, because he or she shouldn't and won't ask about every feature in the code (asking too much or too little has its own drawbacks and side effects). Of course, code review helps, but it depends on the human factor too: developers may be ill or on vacation, may overlook some files, may be inattentive to details because of family problems, etc.

Instead, any developer should reach the string utils already in use as soon as he or she intends to use them. That is, autocomplete should work not only for specific fields and methods of a class, but also for the semantics of code. For example, you type "string1 is not empty", and autocomplete automatically proposes to convert it into "!StringUtils.isEmpty(string1)". Semantic links are needed for this too, because it is not enough just to add a comment that StringUtils.isEmpty(String s) "Defines whether a string is empty or not"; we also need to link it to a meaning, like this:

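//<s-id="empty"/> <s-id="#param" s-is="#empty">
static boolean isEmpty(String param) {
    // Treats null like an empty string, as Apache Commons StringUtils.isEmpty does.
    return param == null || param.isEmpty();
}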

(Here "#param" refers to a parameter of the method, and "#empty" refers to a local identifier. Of course, this is only a proposal, so the syntax, which may look too complex, is a hint rather than a must.) As a result, the IDE may deduce that this method can be applied to any other String to determine whether it is empty.
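The autocomplete idea can be sketched as a lookup from a recognized meaning to a call template. This Java sketch assumes a hypothetical registry (`REGISTRY`) that an IDE would fill from links such as `<s-id="#param" s-is="#empty">`, and it recognizes only phrases of the form "variable is [not] meaning"; real natural-language input would need far more than one regular expression. `SemanticAutocomplete` and `propose` are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SemanticAutocomplete {
    // Hypothetical registry derived from semantic links:
    // meaning -> call template, where %s stands for the linked parameter (#param).
    private static final Map<String, String> REGISTRY = Map.of(
            "empty", "StringUtils.isEmpty(%s)");

    // Recognizes "<variable> is [not] <meaning>".
    private static final Pattern PHRASE = Pattern.compile("(\\w+) is (not )?(\\w+)");

    static Optional<String> propose(String typed) {
        Matcher m = PHRASE.matcher(typed.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String template = REGISTRY.get(m.group(3));
        if (template == null) {
            return Optional.empty(); // no code is linked to this meaning
        }
        String call = String.format(template, m.group(1));
        // group(2) is null when "not" is absent.
        return Optional.of(m.group(2) != null ? "!" + call : call);
    }

    public static void main(String[] args) {
        System.out.println(propose("string1 is not empty").orElse("(no proposal)"));
        System.out.println(propose("string1 is empty").orElse("(no proposal)"));
        System.out.println(propose("string1 is sorted").orElse("(no proposal)"));
    }
}
```

The point of the registry is that the proposal comes from the linked meaning ("empty"), not from the method name, so a team-wide utility is found even by someone who has never heard of it.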

Example 3
Take a more complex example. A method that searches for a substring in a string is described differently in different languages:

* C: int strpos (const signed char *string, signed char c). The strpos function searches string for the first occurrence of c. The null character terminating string is included in the search. The strpos function returns the index of the character matching c in string or a value of -1 if no matching character was found. The index of the first character in string is 0.
* C++: size_t find ( const string& str, size_t pos = 0 ) const. Find content in string. Searches the string for the content specified in either str, s or c, and returns the position of the first occurrence in the string.
* C#: int IndexOf(string). Reports the index of the first occurrence of the specified String in this instance.
* Common Lisp: search sequence-1 sequence-2 &key from-end test test-not key start1 start2 end1 end2 => position. Searches sequence-2 for a subsequence that matches sequence-1.
* Delphi: function AnsiIndexStr ( const Source : string; const StringList : array of string ) : Integer. The AnsiIndexStr function checks to see if any of the strings in StringList exactly match the Source string. When a match is found, its (0 based) index is returned. Otherwise, -1 is returned.
* Erlang: str(String, SubString) -> Index. Returns the position where the first/last occurrence of SubString begins in String. 0 is returned if SubString does not exist in String.
* Fortran: INDEX takes two arguments, both of them strings; it looks for the second string inside the first and returns the position where the second string begins inside the first.
* Java: int indexOf(String str). Returns the index within this string of the first occurrence of the specified substring.
* JavaScript: string.indexOf(searchstring, start). The indexOf() method returns the position of the first occurrence of a specified value in a string.
* PHP: int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] ). Find the position of the first occurrence of a substring in a string. Find the numeric position of the first occurrence of needle in the haystack string.
* Python: str.find(sub[, start[, end]]). Return the lowest index in the string where substring sub is found, such that sub is contained in the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.
* Ruby: index(substring [, offset]) => fixnum or nil. Returns the index of the first occurrence of the given substring in str. Returns nil if not found.
* XSLT, XPath: fn:contains(string1,string2). Returns true if string1 contains string2, otherwise it returns false.

As you can see, the same thing is sometimes described in deliberately different ways, with important details missing or unnecessary details added. Of course, the "language contract" is left out too. For example, indexes start at 0 (a machine-oriented detail, because data offsets in memory are calculated from 0), whereas in real life humans usually count from 1. Naturally, there are differences in language syntax too, but we can ignore them, because we are considering only the descriptions.
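The "language contract" difference can be made explicit in code. In this small sketch, Java's `String.indexOf` uses 0-based positions and -1 for "not found", while Fortran `INDEX` and Erlang `string:str/2` use 1-based positions and 0 for "not found". The class and method names are hypothetical.

```java
public class IndexContract {
    // Machine-oriented contract (Java): 0-based position, -1 if not found.
    static int zeroBased(String string, String substring) {
        return string.indexOf(substring);
    }

    // Human-oriented contract (as in Fortran INDEX, Erlang string:str/2):
    // 1-based position, 0 if not found.
    static int oneBased(String string, String substring) {
        return string.indexOf(substring) + 1;
    }

    public static void main(String[] args) {
        System.out.println(zeroBased("semantic link", "link")); // 9
        System.out.println(oneBased("semantic link", "link"));  // 10
        System.out.println(zeroBased("semantic link", "tag"));  // -1
        System.out.println(oneBased("semantic link", "tag"));   // 0
    }
}
```

Adding 1 happens to map -1 to 0, so a single expression converts both the position and the "not found" value between the two contracts.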

Even this simple example shows why code documentation is needed: method names are too short and abbreviated (instead of at least something like "index of"), and the same goes for parameters. We can also see why plain text search won't be fruitful: although there is only one description per programming language, there are infinitely many ways this function may be described and searched for in natural language.

But all these descriptions have something in common: the meaning ("a method that searches for a substring in a string"), which consists of:

1. The method itself (which includes #2...#6)
2. "String"
3. "Searches" (which applies to #2 and #4)
4. "Substring"
5. A result value...
6. ...which is returned from the method (by #1, with #5)

The descriptions differ only in the words chosen for each of these elements:

1. Method: function, procedure, anonymous class, closure.
2. String: sequence.
3. Searches: occurs, finds, checks, contains.
4. Substring: string, content, subsequence.
5. Result: index, position.
6. Returns: reports.

That is, documenting code is meaning recognition applied to classes, functions, methods, etc. What differs from plain text is that meaning elements are separate entities, so you can easily replace them with similar ones (like using "finds" instead of "searches").
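A toy version of this recognition in Java: each word is mapped to the meaning element it expresses, and two descriptions are compared by their elements instead of their text. The synonym table is taken from the lists above; `MeaningRecognition` and `recognize` are hypothetical names. The sketch deliberately ignores the relations between elements (for example, that #3 applies to #2 and #4), and it maps "string" only to the String element, although in some descriptions the word plays the substring role; resolving that kind of ambiguity is exactly what a semantic link would be for.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class MeaningRecognition {
    // Meaning elements of "a method that searches for a substring in a string".
    enum Element { METHOD, STRING, SEARCHES, SUBSTRING, RESULT, RETURNS }

    // Words from the lists above, mapped to the element they express.
    private static final Map<String, Element> SYNONYMS = Map.ofEntries(
            Map.entry("method", Element.METHOD), Map.entry("function", Element.METHOD),
            Map.entry("procedure", Element.METHOD),
            Map.entry("string", Element.STRING), Map.entry("sequence", Element.STRING),
            Map.entry("searches", Element.SEARCHES), Map.entry("finds", Element.SEARCHES),
            Map.entry("checks", Element.SEARCHES),
            Map.entry("substring", Element.SUBSTRING), Map.entry("subsequence", Element.SUBSTRING),
            Map.entry("content", Element.SUBSTRING),
            Map.entry("index", Element.RESULT), Map.entry("position", Element.RESULT),
            Map.entry("returns", Element.RETURNS), Map.entry("reports", Element.RETURNS));

    // Collects the elements mentioned in a description, ignoring unknown words.
    static Set<Element> recognize(String description) {
        Set<Element> found = EnumSet.noneOf(Element.class);
        for (String word : description.toLowerCase(Locale.ROOT).split("\\W+")) {
            Element element = SYNONYMS.get(word);
            if (element != null) {
                found.add(element);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String a = "A method, which searches a substring in a string";
        String b = "A function, which finds a subsequence in a sequence";

        System.out.println(a.equals(b));                       // false: the texts differ
        System.out.println(recognize(a));                      // [METHOD, STRING, SEARCHES, SUBSTRING]
        System.out.println(recognize(a).equals(recognize(b))); // true: the meanings match
    }
}
```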
