10.07.09
Phil Patton | Opinions

Why Is Google Giving Us the Finger?


Here's what a Joseph Conrad novel might look like when you find it in the library...

Like just about every professional writer and reader, I have been curious about Google’s much-debated library of scanned books — for personal reasons. After critics of the Google Books project charged the company with copyright infringement, a tentative agreement was reached last year that promises to pay authors $60 for the rights to copy each of their publications, with other fees to come. But I’m less interested, frankly, in any future royalties than in the benefits of instant access to a library that is estimated to eventually top 20 million books.

So when a mobile version of Google Book Search showed up among the apps offered on my relatively new iPhone, I tried it out. I was delighted to find that I could browse every issue of Life magazine, from 1936 on, much as I had as a child (though I no longer retreated to the dark closet under the stairs in the decrepit ancient house of my great-aunt Olive). And I learned some surprising things.

One of the sample texts on Book Search was a Joseph Conrad novella from 1917, The Shadow Line. I read it while stranded in an airport waiting room, happy for the emergency material. The novella put me in mind of Conrad’s Under Western Eyes, a 1911 novel about terrorism that struck me as having renewed relevance for our time. I searched the free-books list on Book Search, and there it was, in the public domain.

The type was clear, and I found it easy to drag the text down the screen with my thumb, easier than with Kindle or the Sony Reader, with its irritating page-refresh flicker. But I noticed here that the scanning process occasionally stuttered. Bits of grit or loose paper appeared to throw off the character-recognition software. French phrases so confused the device that it threw in asterisks, tildes and carets. Now and then, it would give up completely and erupt in a string of dingbats like comic-book cursing. A couple of underlined sentences were suddenly reproduced photographically in the original book type rather than the screen type. Then the device seemed to hit the virtual carriage return a few times, producing a three-quarter-inch blank space.

I was surprised that Google didn’t make use of a higher class of scanner. And I was really surprised by what happened next: like a dirty photo falling from between the pages of a book, a photograph popped up.



...And here's what it might look like when you download it from Google Books

It showed the hand of whoever fed pages into the scanner — a hand with a latex sheath on its index finger, like a condom. The person’s nails were nothing to brag about. The condom and the nails, combined with the sudden, unexpected appearance, made the picture seem obscene and unhealthy. I thought with horror of the guy who found a finger in his bowl of fast-food chili.

Was this the literal hand of Google? The fickle finger of the company that holds my copyrights? The sticky fingers that, to hear some tell it, threaten to grab our literary heritage?
      
I wondered what such sloppiness said about the book-scanning project — about how much we can trust Google and how much we should fear it. Even as people involved with publishing have debated the issue of Google’s right to digital content, most of us, impressed with the company’s search engine and maps, have assumed it would at least get the technical part right.

Rereading press coverage of Google Books, I learned that others had found finger photos, and some had posted them online. But these technical concerns were crowded out by the lovefest for the project engaged in by important writers. Take Jeffrey Toobin's sloppy kiss to the deal in the February 5, 2007, New Yorker.

In Toobin’s account, details about the scanning process are not so easy to pin down. He depicts Google’s chief scanner, Dan Clancy, a NASA veteran, as a lovable geek with granola-bar crumbs clinging to his clothes.

Clancy tells Toobin that the project’s enormous scope required the development of special scanning tools and leaves it at that. Says Toobin, “Google will not discuss its proprietary scanning technology, but, rather than investing in page-turning equipment, the company employs people to operate the machines, I was told by someone familiar with the process. ‘Automatic page-turners are optimized for a normal book, but there is no such thing as a normal book,’ Clancy said. ‘There is a great deal of variability over books in a library, in terms of size or dust or brittle pages.’”

According to a Wikipedia contributor, Google currently uses Elphel cameras for book scanning. These were apparently adapted from models used to capture street imagery for Google Maps. (Elphel is a little-known company based in Utah that, ironically, given Google’s secrecy, uses open-source software to operate its equipment.)
   
Some critics, of course, have highlighted concerns about the technical side of Google Books. In August, the linguist Geoffrey Nunberg, writing in The Chronicle of Higher Education, attacked the project for errors in the data used to file the books: author, title, subject and year of publication, to begin with the most basic classification elements.

Nunberg wrote that the “book search's metadata are a train wreck: a mishmash wrapped in a muddle wrapped in a mess...”

To take Google's word for it, 1899 was a literary annus mirabilis, which saw the publication of Raymond Chandler's Killer in the Rain, The Portable Dorothy Parker, André Malraux's La Condition Humaine, Stephen King's Christine, The Complete Shorter Fiction of Virginia Woolf, Raymond Williams's Culture and Society 1780–1950, and Robert Shelton's biography of Bob Dylan, to name just a few. And while there may be particular reasons why 1899 comes up so often, such misdatings are spread out across the centuries. A book on Peter F. Drucker is dated 1905, four years before the management consultant was even born; a book of Virginia Woolf's letters is dated 1900, when she would have been 8 years old. Tom Wolfe's Bonfire of the Vanities is dated 1888, and an edition of Henry James's What Maisie Knew is dated 1848.

Part of the problem is the stupidity in software, or grayware. But scanning technology is also at fault, Nunberg believes. For instance, a simple misreading of the copyright page seems to lie behind many incorrect datings.

Nowhere in Google’s FAQs or anywhere else is there a clear answer to the question of how books are physically scanned: whether the books are disassembled in the process of scanning; what measures are taken to avert damage to scanned books, especially to older, more fragile ones with dry bindings and acidic paper; or what sort of action readers or authors can take if they encounter errors in the scanning, dating or classification.

Nor has Google's press department answered my email asking these questions.

So it is likely that the company will also ignore this question: If the process of creating Google Books is open and its motives good, why is there so much secrecy about the nuts and bolts? Many experts feel there is room only for a single digital super library and Google is it. Geoffrey Nunberg writes, “No competitor will be able to come after it on the same scale. Nor is technology going to lower the cost of entry. Scanning will always be an expensive, labor-intensive project.” So why then does Google seem to fear competition from the disclosure of information about the process?

In an October 9 New York Times op-ed piece, Sergey Brin promised to improve on the bibliographic information in Google Books. But he said nothing about scanning errors and seems to dispute the prediction that the service is likely to emerge as a de facto monopoly. Writing about the millions of out-of-print books threatened with extinction, the books he aims to preserve, he said, “I wish there were a hundred services with which I could easily look at such a book; it would have saved me a lot of time, and it would have spared Google a tremendous amount of effort. But despite a number of important digitization efforts to date (Google has even helped fund others, including some by the Library of Congress), none have been at a comparable scale, simply because no one else has chosen to invest the requisite resources. At least one such service will have to exist if there are ever to be one hundred. If Google Books is successful, others will follow.”

If there are to be many libraries, it is all the more important to get the quality of the original scans right. The same files might serve as material not only for other libraries but also for other formats, including Kindle or open-source-based readers.

Concentrating power and responsibility for any purpose in the hands of a single entity is rarely positive. You don't have to read millions of scanned books to glean that lesson. Just try Suetonius, The Federalist Papers, Barbarians at the Gate or All the King’s Men. You can find them for free — at your public library.




Comments [16]

But of course you could have gone to gutenberg.org and read a clean, proofread copy of the book, downloaded in perhaps a second or two, for free.
j gold
10.12.09
09:53

Many people wear finger protection on the job. Have you ever gotten a paper cut?

If the image were of workers in the Post Office or a library, or an administrative assistant or other worker who has to handle a high volume of paper in their daily job, would you say they were wearing condoms?

Sad you are reading books on your iPhone, but I'm reading Design Observer online, so go figure.
FPO
10.12.09
10:41

"Elphel is a little-known company based in Utah that, ironically, given Google’s secrecy, uses open-source software to operate its equipment.)"

Uh? Are you implying that Google's 'secret' approach to doing things should use 'secret' tools (as in non-open source software)?

You are confusing a process with a tool. The irony in the passage above is misconceived.
Mauro Mello Jr.
10.13.09
07:17

"Elphel is a little-known company based in Utah that ... uses open-source software to operate its equipment."

That statement is somewhat misleading - we (Elphel) are not just _using_ free and open source software (like some of our customers), our products _are_ licensed under GNU licenses (GNU GPL v.3 for the software and FPGA code, GNU FDL for the circuit diagrams and PCB layout).

As for our customers - these licenses mandate releasing the derivative code only if the (derivative) products themselves are distributed. As long as they are used in-house it is OK to be secretive.

Andrey
Andrey Filippov
10.13.09
09:57


For Anthony Grafton's recent essay discussing his reservations and placing the Google project in a broader historical context see:

Future Reading, Digitization and its discontents.
http://www.newyorker.com/reporting/2007/11/05/071105fa_fact_grafton?printable=true
Michael Robinson
10.13.09
12:55

You might find this Google Tech Talk of interest: http://www.youtube.com/watch?v=2D3BZTJZyQY. One point of interest is that the digitization of these books is not necessarily a one-time process but will conceivably be repeated several times as OCR software is improved and is rerun on the saved scans of these books.

I also have to heartily applaud the wonderful volunteer efforts of Project Gutenberg. Anyone can become a transcriber. I am interested in the implications of paying people simply to transcribe books into digital format. Interesting how we are ok with it being free (volunteer) but would suddenly consider it objectionable to pay people $.01, $.10, etc. per page to do transcription.

In a time when many are excited by the potential of social networking, crowdsourcing, and coordinated action mediated by the internet, it seems beneficial to have volunteers and/or paid individuals use their wonderful grey matter to digitize these texts.

10.13.09
03:08


Very good design blog! Great finger! Thanks from Argentina.
diseño de cd
10.13.09
05:59

I love how America has become so lazy that we can't even go to the library anymore and read the classic books in their original bindings; they are not meant to be read on your iPhone. Also, the fact that a picture shows a person scanning with a finger protector on their index finger does not make them less professional; it makes them smart. Maybe Google should have cropped it out, but if you were scanning that many pages so somebody could read the book on the internet, you might get a little sloppy and lazy too.
Aleah Pavlicek
10.13.09
09:48

There's something that always bothers me when someone complains about something that they get for free. I wouldn't even want to guess at the number of pages Google has scanned, and one has to presume that there's going to be errors. The picture of the finger is a funny surprise, and that's about the worst interpretation that should be put on it.

"How much should we trust Google?" About as much as anything else on the internet.

Charles H. Bryan
10.14.09
01:53

First of all, I would like to respond to Aleah. I do not think America is lazy. I believe Google is doing a great thing, bringing books to everyone, and for free! Many libraries throughout the U.S. do not even have as many books as Google is offering. So what if it comes with a few mistakes now and then? What great thing in history started out perfect? Everything evolves and takes time to perfect. And to be honest, I don't think you're lazy if you are reading on the internet, especially if it is a book; at least you're not watching TV or playing video games. One in four kids is illiterate in America. AMERICA, not Africa. And if a kid or even an adult is reading a book on the internet to kill time at the airport, or in line or whatever, instead of playing Block Breaker, then more power to them, because that is a great thing. Not to mention, if I were an author I don't think I would care, because it is getting your book out there and more accessible to read. Maybe someone will see it on Google and be inspired to go purchase it to have in their collection, but wouldn't have if they had not read part of it on Google Books. You know, technology changes, and I think books are included in that. The printing press was a history-changing invention, but the internet is bigger and more accessible.
Courtney
10.14.09
03:18

I find it a little outrageous to read a book in its entirety on your phone, or even to download an entire book onto your phone. I have yet to see or meet somebody who uses the app and actually reads books on their phone, but I would find it rather humorous to see such a thing. It's comparable to bootlegging movies: no matter what, the real version is always better.
Scot Ferguson
10.14.09
03:38

But let's not forget how bad it could really get:

http://en.wikipedia.org/wiki/Rainbows_End
Haig Evans-Kavaldjian
10.15.09
10:22

What? No mention of Internet Archive. Download a copy of a book from both Internet Archive and Googlebooks and see the difference yourself.
Monty
10.16.09
01:01

Brin & Co. have no genuine interest in preserving anything. Google is the BASF of the internet: they don't produce the content, they present it. Books are just another form of content to be harnessed for commercial gain.
Josh
10.16.09
02:28

To Aleah:

Unfortunately, going to the library is not a choice for some people in America! Perhaps you are from a city and do not realize this -- many people are in that boat -- but there are vast areas of this country where there are NO LIBRARIES!

Imagine ... entire communities and counties with no libraries. That is rural America for you! Please think outside your big-city box.

Linda
01.09.10
09:21

This is somewhat humorous and also sad. It is funny that the person accidentally scanned their hand but the sad realization is that there is a human out there making probably 2 cents an hour scanning books for your $300 i___ (insert pad, pod, phone) appliances. Nothing is pretty anymore, not even technology and it takes accidental photos like this to bring us back to reality.
L. Dinger
02.07.11
11:52

