Introducing the End of Term Harvest project at Pratt SILS
Posted: September 29, 2012
I am excited to be contributing this semester to the End of Term (EOT) harvest project.
What, you may rightly ask, is the EOT harvest?
So here’s the short answer: The EOT harvest archives web information from the U.S. federal government in the months before and after presidential elections.
Who is behind the EOT Web Archive: The archive is created through a partnership of several organizations, including the Internet Archive, the Library of Congress, the California Digital Library, the University of North Texas and the Government Printing Office.
More detail: The federal government authors a lot of content in many formats, over 95% of it born digital. Some of this content is very well managed and protected by legislation: it is cataloged, authenticated, given metadata and maintained in archival-quality file formats. This includes bills and laws and titles such as the Congressional Record or the Code of Federal Regulations. Such content is available from FDsys, the content management system maintained by the Government Printing Office.
But other types of government content do not benefit from these practices. These include websites and their content created by over 500 government agencies.
At the end of a presidential term, particularly if a new president is elected, this web content is at high risk of disappearing from the web. This content serves as a record or blueprint of the government. The EOT project was created to collect, preserve, archive and maintain this content. During the 2008 presidential election, the EOT harvest collected 16 terabytes of content, which is now available on the Internet Archive website.
Now the project continues for the 2012 presidential election, and students taking Government Information Sources (LIS 613) at Pratt SILS are volunteering for the project. Very briefly, we are collecting URLs of the social media sites of government agencies, such as the State Dept. Twitter feed, for inclusion in the 2012 EOT Archive.
In the next blog posts I will describe in detail our involvement, our workflow, the questions we are encountering, the solutions we are finding and the lessons we are learning.