Shostack + Friends Blog Archive

 

Folksonomies, Tested

I’ve just stumbled across this abstract [link to http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V0B-48YW2RW-1&_coverDate=07%2F15%2F2003&_alid=241295300&_rdoc=1&_fmt=&_orig=search&_qd=1&_cdi=5642&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=8ef3f41f1ea30add7d9baa797146e0bf no longer works] comparing full-text searching to controlled-vocabulary searching. Its relevance to Clay’s posts on controlled vocabularies is that our intuitive belief that a controlled vocabulary helps searching may be wrong. Unfortunately, the full paper costs $30; perhaps someone with access to an academic library can comment.

…In this paper, we focus on an experiment in which different component indexing and retrieval methods were tested. The results are surprising. Earlier work had often shown that controlled vocabulary indexing and retrieval performed better than full-text indexing and retrieval…, but the differences in performance were often so small that some questioned whether those differences were worth the much greater cost of controlled vocabulary indexing and retrieval … In our experiment, we found that full-text indexing and retrieval of software components provided comparable precision but much better recall than controlled vocabulary indexing and retrieval of components. There are a number of explanations for this somewhat counter-intuitive result, including the nature of software artifacts, and the notion of relevance that was used in our experiment. We bring to the fore some fundamental questions related to reuse repositories.
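
For readers who don't have the two measures at their fingertips: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. Here is a minimal sketch in Python of what "comparable precision but much better recall" can look like. The component IDs and result lists are invented for illustration and are not data from the paper, and `precision_recall` is a hypothetical helper, not anything the authors describe.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: four components are actually relevant to the query.
relevant = {"c1", "c2", "c3", "c4"}

# Hypothetical controlled-vocabulary search: few results, half of them relevant.
controlled = ["c1", "c9"]

# Hypothetical full-text search: more results, still half of them relevant.
full_text = ["c1", "c2", "c3", "c7", "c8", "c10"]

print("controlled vocabulary:", precision_recall(controlled, relevant))
print("full text:            ", precision_recall(full_text, relevant))
```

In this toy case both searches have the same precision, but the full-text search finds three of the four relevant components where the controlled-vocabulary search finds only one.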