Posted on 03/04/2009 9:00:26 PM PST by Lorianne
With the help of a MacArthur genius grant, von Ahn set out to make amends. Now a growing number of websites, from e-commerce (Ticketmaster) to social networking (Facebook) to blogging (WordPress), have implemented the precocious professor's new tool, dubbed reCAPTCHA. If you've visited those sites, your squiggly-letter-reading ability has been harnessed for a massive project that aims to scan and make freely available every out-of-copyright book in the world, by deciphering words from old texts that have stumped scanning software.
The largest scanning centre in this project, directed by a San Francisco-based non-profit called the Open Content Alliance, occupies a dimly lit corner of the seventh floor of Robarts Library, on the University of Toronto's downtown campus. The space is filled with twenty-three cubicle-like scanning stations draped on all sides with light-proof black cloth, like rows of coin-operated peep shows.
When the centre opened in 2004, its single robotic scanner used a vacuum suction arm to turn pages automatically. "We ran it into the ground," recalls coordinator Gabe Juszel, a cheerfully earnest former filmmaker who sports a soul patch. "It was literally smoking by the time we were done with it." But with the wide variations in book sizes, binding, and condition, they consistently found that they could achieve a higher scanning rate by simply turning off the robotic arm and flipping the pages manually (something else, it seems, that humans are still better at).
Two shifts of dedicated employees keep pages turning from 8:30 in the morning to 11 at night, leavening the monotony by listening to music on their iPods, reading, or (in one particularly talented case) knitting as they go. Two Canon digital SLR cameras mounted in opposing corners of each booth click at an adjustable pre-set interval. Rookies opt for seven seconds, the slowest possible; veterans can scan a page per second.
Juszel had just returned from the OCA's annual meeting, where it was announced that the number of books available on the group's Internet Archive had broken the one million mark. U of T is currently adding about 1,500 books a week, and at that rate there's no need to be choosy about which ones to scan. "It's a real beast to feed, actually," says Jonathan Bengtson, the librarian who oversees the university's role. Entire subject areas are scanned by sorting for pre-1923 works (in accordance with US copyright laws), eliminating duplicates, and taking everything that's left. Scholars from around the world can also request books for ten cents a page, and typically see them online in less than twenty-four hours.
The most popular Toronto contribution, Juszel reports, is a 1475 edition of St. Augustine's De civitate Dei, downloaded a baffling 75,911 times (at press time). "Who knew people liked Latin so much?" he says. Toward the other end of the spectrum is a book pulled from the stacks around the same time: the Montreal Philatelist, a monthly journal that ran from 1898 to 1902, which features lurid tales of stamp counterfeiting in Newfoundland.
... excerpt