Throwing Shoes In the Machinery of the World

Michael Gorman is still showing his ass to the world over at Britannica Blog. This time he demonstrates that he hasn’t got a clue how Google works:

Information retrieval systems have been studied for many decades. In the course of that study two important criteria have been developed to evaluate such systems—those criteria are recall and relevance. The first measures the percentage of pertinent documents retrieved from a database (for example, if there are 100 documents on Zambian agriculture in a database and a search on that topic retrieves 76 of them, the recall is 76%). The second measures the supposed appropriateness of the documents that have been retrieved (for example, if you retrieve 100 documents when searching for Zambian agriculture and 76 of them are actually about Zambian agriculture, the relevance is 76%).

Information retrieval systems achieve high recall and relevance rates by the use of controlled vocabularies (indexing terms, etc.) and present the results of complex searches in a meaningful and usable order. By any of these criteria, Google and its like are miserable failures. A search on those engines on anything but the most minutely detailed topic will yield many thousands of “results” in no useful order and with wretched recall and relevance ratios. However, even when the documents retrieved by a search engine are on the subject sought, the quality of the material – often community-generated material that pops up high on a hit list because the material is free and easily accessible — is shoddy or irresponsible.
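For what it's worth, the two measures Gorman describes boil down to simple ratios. Here is a minimal Python sketch using his own Zambian agriculture numbers. The function names are my own, and note that what he calls "relevance" is what the information retrieval literature usually calls precision.

```python
def recall(relevant_retrieved: int, relevant_in_database: int) -> float:
    """Share of all pertinent documents in the database that the search found."""
    return relevant_retrieved / relevant_in_database


def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Share of the retrieved documents that are actually on topic.

    This is the measure Gorman calls "relevance".
    """
    return relevant_retrieved / total_retrieved


# Gorman's first example: 100 pertinent documents exist, the search finds 76.
gorman_recall = recall(relevant_retrieved=76, relevant_in_database=100)

# Gorman's second example: 100 documents come back, 76 are on topic.
gorman_precision = precision(relevant_retrieved=76, total_retrieved=100)

print(f"recall = {gorman_recall:.0%}, precision = {gorman_precision:.0%}")
```

Both ratios need a known count of pertinent or retrieved documents, which is easy to come by in a closed, hand-indexed database and much harder to pin down for the open Web.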

Let’s unpack some of the misconceptions that Gorman is, once again, heedlessly spreading.

Most academically oriented Information Retrieval Systems are designed to handle relatively small sets of data, especially when compared to the likes of Google, Yahoo or Ask. Ebsco, OCLC, WilsonWeb, etc. have a large number of documents that they index and search through, it’s true. But it’s a finite number, and a known quantity. Someone has to index it all, after all, so there is control over what goes in, which makes it likely that what comes out of any given search will be on topic and relevant. If you’re searching for a certain book for a paper you’re writing, you might use Books In Print through OCLC FirstSearch, because chances are that particular system is going to have information about books that are in print. You wouldn’t use Books In Print to find recipes on how to fix a delicious bass, for instance, because the index criteria for Books In Print don’t cover that. Google, however, does. It also indexes books that are in print, but from different sources.* The difference between the two systems is that Google is not searching a finite set of specific, indexed data sets that were chosen by vendors and academics; it’s searching and indexing the entire Surface Web.** If Books In Print is a fishing boat, Google is the Queen Mary. Both travel across the ocean, but for different purposes. What Gorman is saying is that Books In Print (like other Academic Information Retrieval Systems) is better than Google because it’s a fishing boat and not a cruise ship. This isn’t even comparing apples and oranges; it’s comparing apple pie with raisins to the salad bar at Denny’s. Both have their uses, and while they may overlap in some cases (you probably could find apples, pie and raisins on Denny’s salad bar), that’s just the happenstance of shared content, not design.

Another point to consider, which Gorman doesn’t, is that Google is free, can be accessed by anyone with a networked computer, and gets the job done quickly and to most everyone’s satisfaction. Not many people need an academically verified and indexed search portal to find out the latest rumors about the next Batman movie. Academic research is a very small part of what the Internet is used for today. Now, if I were writing a paper on the history of the Batman mythology, I might want to use a more reliable database, one that has indexed information on the subject. Of course, I’d be out of luck. No academically oriented Information Retrieval System currently covers comic book scholarship. Michael Gorman would probably say that one shouldn’t either, harrumph, but whether or not he likes it, comics scholarship is a fast-growing area of interest to many academics, as it is one of the few areas that is a confluence of history, pop culture, mythology, literature and social criticism. But in Michael Gorman’s world, comics scholars are part of the problem. You see, we academics today just don’t respect the Tradition of the Written Word:

Over many centuries civilizations have developed an ethos of scholarship based on respect for the individual mind and veneration for learning and the learned. The thoughts of those individuals have been preserved in texts—many of them centuries old from China, Arabia, Greece, and Rome—that comprise the most important part of the human record. That record is not, alas, complete. Many texts were lost completely in the Manuscript Age and many have come to us in fragmentary or corrupted forms. Though we like to think that the history of society is a story of continuing progress, many electronic texts are in as much danger as manuscript texts—they are subject to loss or corruption in the same manner as those from before the Age of Print. If the culture of learning that has sustained our civilizations for millennia is to be preserved, it is imperative that we ensure that texts are preserved and authentic, that they contain the author’s ideas in the author’s words, and that we respect authorial intent.

He segues neatly from the Web as the world’s largest virtual landfill into the preservation debate, without doing the hard work of acknowledging that many texts that would have been lost to the ages have been saved and are available for use by academics and the general public due in large part to the Internet, specifically Google.

As it happens, I just finished reading The Book of Lost Books by Stuart Kelly, which is a catalog of a great deal of literature that no one will ever read because it’s been lost. As Kelly points out in the introduction to the book, it’s really a feat to be proud of that any manuscripts survived from antiquity at all, what with wars, politics, fervor (the Library of Alexandria was burned no fewer than three times, by three different fanatics, two of them religious) and a concerted effort to keep people ill-educated and under control. Besides being a highly readable and fun book, it makes you appreciate the lengths Google and other institutions are going to in order to create the Google Books service. The legal wranglings alone are enough to make one wonder if publishers and academics even want to preserve books, as many of the items that are due to be scanned by Google (a proposition that I’m sure makes Michael Gorman cringe in horror) would otherwise go unread and unused, left to decay slowly in the basements and on the shelves of out-of-the-way places.

If you think Michael Gorman has drifted a bit in the scope of his argument, you aren’t alone. We were talking about content, not format. But somewhere along the line, Gorman pulled a Chomsky and switched from message to medium. Clay Shirky responds to this inconsistency in Gorman’s essay:

Gorman then defends traditional publishing methods, and ends up conflating several separate concepts into one false conclusion, saying “To think that digitization is the answer to all that ails the world is to ignore the uncomfortable fact that most people, young and old, prefer to interact with recorded knowledge and literature in the form of print on paper.” Dispensing with the obvious straw man of “all that ails the world,” a claim no one has made, we are presented with a fact that is supposed to be uncomfortable — it’s good to read on paper. Well, “Duh,” as the kids say; there’s nothing uncomfortable about that. Paper is obviously superior to the screen for both contrast and resolution; Hewlett-Packard would be about half the size it is today if that were not true. But how did we get to talking about paper when we were talking about knowledge a moment ago? Gorman is relying on metonymy. When he notes a preference for reading on paper he means a preference for traditional printed forms such as books and journals, but this is simply wrong. The uncomfortable fact is that the advantages of paper have become decoupled from the advantages of publishing; a big part of preference for reading on paper is expressed by hitting the print button. As we know from Lyman and Varian’s “How Much Information?” study, “the vast majority of original information on paper is produced by individuals in office documents and postal mail, not in formally published titles such as books, newspapers and journals.”

Things are changing, and not just in Library Land but all across the Universe. Michael Gorman wishes they would just stop and go back to the way things used to be, when only highly trained elites with a language all their own could access the hallowed words of the elders. Too bad for him that the academic world is evolving. Scholarship has become decoupled from academic jargon, and reliable information is no longer to be found just in the fiefdom of the printed word. This means that our concepts of knowledge, scholarship and access are changing as well. And unlike during the Industrial Revolution, you can’t even hope to stop progress these days by throwing your shoes in the machinery.
________

* Unless you happen to use Google while logged in to a computer on a network that ties all its database access to the network ID. Our computers at work do this, which has the unintended side effect of allowing me to use Google to search Books In Print, along with everything else on the Surface Web.

** A popular misconception is that Google searches and indexes the entire Internet. In fact, it searches only the 10% of the Web that is accessible to the public. Most data on the Web is buried in giant proprietary databases concealed behind firewalls. You wouldn’t want to access all that through Google anyway, as it consists of such things as financial information and listings of airline flights. You need a controlled mediator for that info, because if you could randomly retrieve a list of Delta’s flights from New York to London through a Google search, it would not only make no sense to the average user but could potentially be a security risk in the hands of a terrorist. Whether or not scholarship should be secured behind the same sort of protection is a subject for another debate.


Responses to Throwing Shoes In the Machinery of the World

  1. Bryan says:

    Gorman is so bad that even Kevin Drum noticed his bushel of BS.

    Yes, once I locate the research, I want a printed-on-paper version to use while I work, but I do not want the original, because I like to make notes and mark up my research, and you can’t do that with an original.

    The paperback book is an ideal form for reading, but most hardbound books are a PITA for research. I may want information from multiple segments of a book for what I’m doing, and that is difficult to manage and unnecessary today.

    There is a point-of-sale book publishing system available. I know because I designed it and tried to get someone interested in the college bookstore market, but they weren’t interested.

    People have no idea how fast some page printers are these days, or that they can print color for everything except art books and some specialized science works that require photographic quality.

    Nothing would go out of stock, large print of every work, no shipping, everything in electronic format. It’s doable, but too much is invested in the existing system.

  2. Keith says:

    But it is changing, gradually.

    Except if Gorman had his way, we’d abandon the Internet and go back to doing all research through Interlibrary Loan, or by traveling by horse-drawn carriage and steamboat to the reclusive abbey where the manuscript is kept. How romantic! And silly.
