
John Wilkin of the University of Michigan libraries was badly misquoted in an AP story on the digitization projects at the University of Michigan.

On his blog, he uses that glitch (he called Brewster Kahle's issues with Google "polemical," but the AP substituted "theoretical") to generate a great discussion about what he describes as degrees of openness.

I think all of this debate begs us to ask the question "what is open"? For the longest time (since the mid-1990's), Michigan digitized public domain content and made it freely viewable, searchable and printable. Anyone, anywhere could come to a collection like Making of America and read, search and print to his heart's delight. If the same user wanted to download the OCR, that too was made possible and, in fact, the Distributed Proofreader's project has made good use of this and other MOA functionality. We didn't make it possible for anyone to get a collection of our source files because we were actively involved in setting up Print-on-Demand (POD), POD typically has up-front, per-title costs, and making the source files available would have cost us some sales that might otherwise pay for that initial investment. As we moved into the agreement with Google, we made clear our intention to do the same "open" thing with the Google-digitized content, and to throw in our lot with a (then) yet-to-be-defined multi-institutional "Shared Digital Repository." In fact, now we have hundreds of thousands of public domain works online, all of which are readable, searchable and printable by anyone in the world in much the same way.


Could the Google books be more open? Absolutely. Along with Carl Malamud, for example, I would love to see all of the government documents that have been digitized by Google available for transfer to other entities so that the content could be improved and integrated into a wide variety of systems, thus opening up our government as well as our libraries. I believe that will happen, in fact, and that Google will one day (after they've had a chance to gain some competitive advantage) open up far more. In the meantime, however, when we talk about "open," let's mean it the way that the OCA FAQ means it. Let's mean it in the same way that the bulk of our audience means it. Let's talk about the ability to read, cite and search the contents of these books, and let's call the Google Books project and particularly Michigan's copies Open. Let's stop being theoretical, er, I mean polemical.

Here are some of the comments on Wilkin's blog:

Brewster Kahle on paragraph 2:

John– while it may not be appropriate to start this in a comment, but I am quite taken aback by your seeming implication that “open” includes what google is doing and what UMich is doing.

“Open” started to be widely used in the Internet community in association with certain software. Richard Stallman calls it “free”, but “open” has also come to be used as well. Lets start with that.

“Open Source” in that community means the source code can be downloaded in bulk, read, analyzed, modified, and reused.

“Open Content” has followed much the same trajectory. Creative Commons evolved a set of licenses to help the widespread downloading of creative works, or “content”. Downloading, and downloading in bulk, is part of this overall approach as we see it at the Internet Archive.

Researchers (and more general users, but we can stick with researchers because they are a community that research libraries are supposed to serve) require downloadability to materials so they can be read, compared, analyzed, and recontextualized.

Page at a time interfaces, therefore, would not be “open” in this sense. Downloadable crippled versions would not be open in the Open Source or Open Content sense either.

As a library community, we can build on the traditions from the analog world of sharing widely even as we move into the digital world. We see this as why we get public support.

Lets build that open world.

We would be happy to work with UMich to support its open activities.


April 25, 2008 4:59 pm
jpwilkin on paragraph 2:

I think this is precisely the sort of rhetoric that’s muddying the waters right now, Brewster. There is no uniformly defined constituency called “researchers” who “require downloadability.” I know ‘em, I work with ‘em, and I know that’s not true. Access (and openness) is defined on a continuum. What we do is extraordinarily open and has made a tremendous difference for research and in the lives of ordinary users. This sort of differentiation in the full accessibility of source materials is one of the key incentives that has brought organizations like Google and Microsoft to the table, and if it didn’t make sense, the OCA wouldn’t go to pains to stipulate that “all contributors of collections can specify use restrictions on material that they contribute.” Is more open better? Damned right. That’s one reason why for two years we’ve been offering OCA the texts Michigan digitizes as part of its own in-house work. But is what we’re doing with Google texts open? Absolutely.
April 26, 2008 8:43 am
Carl Malamud :

I’m not sure I get all these degrees of open … let me add a hypothetical if that helps clear this up.

What if a bunch of students in Ann Arbor organized themselves into a Democracy Club and started grabbing all the public domain documents they can find on MBooks and uploading them to some site such as scribd.com or pacer.resource.org for recycling? If the docs are open (and we’re just talking “works of the government” which are clearly in the public domain), would you consider that a mis-use of your system and try and stop it or would that fall inside of the open side of the open continuum we’re all trying to mutually understand in this dialogue?

Hypothetically speaking, of course. I’m not advocating that students form a Democracy Club and crawl your site to recycle public domain materials, I’m just trying to understand if the restrictions on reuse are passive ones like obscuring how to download files or if these are active restraints where the library is involved in enforcing restrictions on access to public domain materials.

Again, I’m not at all suggesting that students interested in furthering the public domain form Democracy Clubs and start harvesting documents from the public taxpayer-financed web sites at UMich and re-injecting them into the public domain.
April 26, 2008 4:01 pm
jpwilkin on paragraph 2:

What if? If there really were that sort of interest, I’d hope that we’d have a chance to talk to the students and make sure they were aware of powerful options to make “in situ” use of the openly accessible government documents that they find in MBooks. I’d want to make sure they knew that in late June we’re releasing a “collection builder” application that will allow them to leverage our investment in permanent (did I say permanent?) curation of these materials so that the materials could be found and used after the current crop of students comes and goes, that the students could add to the body of works as more get digitized from our collection and the collections of other partner libraries (e.g., Wisconsin’s are coming in soon) and that we would want to hear what sorts of services (an RSS feed of newly added gov docs?) might aid them in their work. I’d want to talk to them about the issue of authority and quality, and would see if there were ways that their efforts could help improve the works in MBooks rather than dispersing the effort to copies in multiple places. And if they needed computational resources to do things like data mining, I’d let them know that we’re glad to help. But if none of this satisfied them, would we try to stop them? Assuming Google digitized the works, according to our agreement (4.4.1) we would make “reasonable efforts … to prevent [them] from … automated and systematic downloading” of the content, something we currently do and which does not undermine the ability of those same students to read, search and print the documents. Lots of openness there.

What Wilkin dismisses as "rhetoric" and "polemical" is a real-world difference: usability that enables imaginative and powerful uses of materials that we (in our modest 2008 mindsets) can't yet imagine. Free Software, like liberal democracy, is constitutionally structured to limit binding future initiatives. Let's face it. We don't know what people of 2058 will want to do with digitized material. So let's avoid proprietary formats (because they die), corporate control (because corporations fail and morph), and closed standards and code (because they limit improvisation and correction).

Wilkin is just as polemical as Kahle (not that there is anything wrong with that). His rhetoric just rests on the default polemic of our age: proprietary neoliberalism.

Now here is where it gets all slippery. Let's face it, Google's ambitions and Michigan's hyperbole about their partnership are just as polemical as (if not more polemical than) Brewster's OCA principles. The difference is that we are so used to proprietary and neoliberal models being the default mechanisms for getting anything done in the university and library worlds that we discount a bold and profound distinction that Brewster is trying to import: Freedom.

When librarians are bound by nondisclosure agreements from discussing a major project involving state resources, we cannot pretend that there is meaningful freedom. When libraries are restricted in what they can do with copies of their own materials (in other library-Google contracts; not Michigan's), we can't pretend that there is meaningful freedom. When one secretive company claims as its mission to "organize the world's information" and public university libraries defer to its hegemony rather than building alternatives and challenging its troubling policies, we can't pretend that availability equals meaningful openness, let alone freedom.

So while "openness" lies on a continuum (see John Willinsky's essential book, The Access Principle, for a brilliant discussion of all these issues), "freedom" is not so smooth a concept. There are degrees of freedom. But they are more staggered and clearly defined than those of openness.

This is the main reason that Richard Stallman resists the clever branding of "open source." It invites all sorts of slipperiness at the expense of the public good. I think Stallman sacrifices many useful partners and allies for ideological purity. But that is not Kahle's problem. His problem is that his project is dwarfed by one of the richest institutions in the history of the world and he can't get a fair hearing without being dismissed as "polemical." Yeesh. If only.

Paul Courant calls Kahle's issues "the perfect being the enemy of the good." If only it were that simple. It's actually the case of the not-as-good-as-advertised crowding out the could-be-great-if-everyone-would-start-asking-tough-questions.


Comments (2)

Sherman Dorn (Editor, Education Policy Analysis Archives) on April 29, 2008 9:29 PM:

I side with Wilkin here, not because I like Michigan's arrangement with Google but because there's a huge difference between saying that the arrangement has limitations (which it certainly does), on the one hand, and saying that we should avoid all encumbrances on the use of digitized material, on the other hand. Whatever develops will be at the hands of multiple individuals. Yes, criticize Google, but please don't underestimate or misunderstand the reasons why libraries might partner with Google.

As Steven Weber has documented in The Success of Open Source, the open-source movement evolved away from its origins in Stallman's puritanism and towards a more flexible and inevitably political arrangement for property. In other words, the model of software that people are building on here is an ingenious solution to the political problem of complex entities, but it developed in a very human environment.

Raizel Liebler on May 9, 2008 5:40 PM:

The "theoretical" aspects of Google's scanning project are important, but I find the reality even more troubling. I have blogged about the public domain materials that according to Google’s own policies should be available for download.

I wasn't the first to note this: One of the examples given by the Prelinger Library over two years ago was a specific version of the copyright law, digitized September 2005 from the University of Michigan. It is still not available today for viewing or downloading.

Whom can we ask to correct this? Last year, I contacted some of the partner libraries; the response was that it was Google’s responsibility to make those documents viewable or downloadable.

This doesn't seem very "open" to me. Or "accessible." I view librarians (and others) pushing on Google and partner libraries as a means of moving towards "could-be-great-if-everyone-would-start-asking-tough-questions."

A book in progress by

Siva Vaidhyanathan

This blog, the result of a collaboration between myself and the Institute for the Future of the Book, is dedicated to exploring the process of writing a critical interpretation of the actions and intentions behind the cultural behemoth that is Google, Inc. The book will answer three key questions: What does the world look like through the lens of Google?; How is Google's ubiquity affecting the production and dissemination of knowledge?; and how has the corporation altered the rules and practices that govern other companies, institutions, and states?