If I handed you a book and asked whether it was in copyright or in the public domain, you'd probably turn to the copyright page first. Unfortunately, a copyright page can't answer that question definitively -- at best, it could tell you when the book in your hands was published, and who owned the rights to it at that time. Ownership can change, though: rights revert to authors, and after enough time has passed, the book enters the public domain, letting people copy and adapt it as they wish.
So how much time is "enough"? It varies by country, by when the book was published, and by whether the author is still living. For U.S. books published between 1923 and 1963, the rights holder needed to submit a form to the U.S. Copyright Office renewing the copyright 28 years after publication. In most cases, books that were never renewed are now in the public domain. Estimates of how many books were renewed vary, but everyone agrees that most books weren't renewed. If true, that means that the majority of U.S. books published between 1923 and 1963 are freely usable.
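The renewal rule for this period comes down to a little arithmetic, which can be sketched in Python. This is only a minimal illustration of the rule as stated here: the function names and status strings are hypothetical, and `renewal_found` is assumed to be the result of a lookup in the Copyright Office records.

```python
from typing import Optional

# Assumption: "between 1923 and 1963" is read as inclusive of both years.
RENEWAL_WINDOW = range(1923, 1964)
FIRST_TERM_YEARS = 28  # renewal was due 28 years after publication


def renewal_due_year(publication_year: int) -> Optional[int]:
    """Return the year a renewal was due, or None if the rule doesn't apply."""
    if publication_year not in RENEWAL_WINDOW:
        return None
    return publication_year + FIRST_TERM_YEARS


def renewal_status(publication_year: int, renewal_found: bool) -> str:
    """Summarize what the renewal rule alone suggests for a U.S. book."""
    if renewal_due_year(publication_year) is None:
        return "not covered by the 1923-1963 renewal rule"
    if renewal_found:
        return "renewal on record"
    # "In most cases", so this is a likelihood, not a certainty.
    return "no renewal found: likely public domain"
```

For a book published in 1950, `renewal_due_year(1950)` gives 1978, so that is the year to look for in the renewal records.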
How do you find out whether a book was renewed? You have to check the U.S. Copyright Office records. Records from 1978 onward are online (see http://www.copyright.gov/records) but not downloadable in bulk. The Copyright Office hasn't digitized its earlier records, but Carnegie Mellon scanned them as part of its Universal Library Project, and the tireless folks at Project Gutenberg and the Distributed Proofreaders painstakingly corrected the OCR.
Thanks to the efforts of Google software engineer Jarkko Hietaniemi, we've gathered the records from both sources, massaged them a bit for easier parsing, and combined them into a single XML file available for download here.
There are undoubtedly errors in these records, but we believe this is the best and most comprehensive set of renewal records available today. These records are free and in the public domain, and we hope you're able to use them to determine the copyright status of books that interest you.
At Google, we're committed to making as many books available online to users as possible while respecting copyright, and this is one example of that commitment. Watch this space for more to come.
This is great news for historians, journalists, researchers, publishers, and librarians. It's also great for the Open Content Alliance and other book digitization projects.
Of course, this does not help much with books published and copyrighted outside of the United States. But that's always a complication.
However, I wonder if Google itself is going to use these records to change the format of many of the scanned books published between 1923 and 1963. Currently, these are only available in "snippet" form. Will Google Book Search change significantly now that this file is available?




Comments (2)
According to the basic guidelines, pre-1923 works should be in the public domain (PD). However, many of them, including old copies of classics, are blocked on Google. For example, pre-1923 issues of Publishers' Weekly should not be blocked, but they are.
Sometimes this is because some enterprising entrepreneur has taken the PG or MS or G files and "published" them with a print-on-demand system. They don't get new copyrights on the PD material (only on something new they added, like illustrations, an introduction, an index, etc.), yet the books are blocked on Google Books.
I think Google is trying to avoid antagonizing too many publishers. The end result is that they are granting them a copyright extension beyond the statutory limit.
James
Siva, it is not just foreign works that are not helped by the Google records. Copyright restoration has made it almost impossible to determine US copyright status, as I describe in my article "Copyright Renewal, Copyright Restoration, and the Difficulty of Determining Copyright Status" at http://www.dlib.org/dlib/july08/hirtle/07hirtle.html.
Here is the conclusion:
This paper has demonstrated that it is almost impossible to determine with certainty whether a work published from 1923 through 1963 in the US is in the public domain because of copyright restoration of foreign works. First you have to determine if the work was also published abroad or if it is based on or derived from a work published abroad. If a foreign edition is found, one then has to establish the order of publication, and whether the foreign publication occurred less than 30 days before the US publication. If foreign publication was more than 30 days before American publication, one next needs to determine if publication occurred in an eligible country and if at least one of the authors of the work was living in or a citizen of an eligible nation. Checking the copyright renewal database is still important, but only after one has determined that the work's foreign copyright was not restored or that it does not draw upon subsisting foreign copyrights.
Copyright restoration has been criticized for unnecessarily removing thousands of foreign-published works from the public domain in the United States. What has been little noticed up to now is its negative impact on the determination of the potential public domain status of works published in the US. In many cases the impossibility of determining with certainty the absence of subsisting foreign copyrights in American publications that otherwise would be in the public domain means that American institutions will either have to keep these works inaccessible to the general public or risk the possibility of an infringement suit.
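The order of checks described in the conclusion above can be sketched as a small decision procedure in Python. This is a rough sketch under stated assumptions, not a statement of the law: the `Work` fields, the `assess` function, and the status strings are all hypothetical, each field is assumed to have been researched by hand, and a gap of exactly 30 days (which the text does not address) is conservatively treated as needing the eligibility check.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Work:
    """Hypothetical record of facts a researcher has already established."""
    us_publication: date
    foreign_publication: Optional[date] = None  # None: no foreign edition found
    derived_from_foreign_work: bool = False
    published_in_eligible_country: bool = False
    author_from_eligible_nation: bool = False
    renewal_found: bool = False


def assess(work: Work) -> str:
    # Step 1: a work based on a foreign work may draw on subsisting
    # foreign copyrights, which renewal records cannot rule out.
    if work.derived_from_foreign_work:
        return "review underlying foreign work"

    # Steps 2-3: if a foreign edition exists, establish the order of
    # publication and the gap before the US edition appeared.
    if work.foreign_publication is not None:
        gap_days = (work.us_publication - work.foreign_publication).days
        # Assumption: exactly 30 days is treated like "more than 30 days".
        if (gap_days >= 30
                and work.published_in_eligible_country
                and work.author_from_eligible_nation):
            return "possible restored copyright"

    # Step 4: only now is the renewal database the deciding factor.
    if work.renewal_found:
        return "renewal on record"
    return "no renewal found: likely public domain"
```

The point of the ordering is that the renewal lookup comes last: a missing renewal only means something once the foreign-publication questions have been answered.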