EOT Harvest, part III: Getting to work

Before I begin describing our workflow, here is a brief recap of the previous blog posts:
Students taking the course in government information sources this semester are working to assist in capturing social media websites of the federal government. These websites will be archived by The Internet Archive.

After a presentation by the EOT team, we began to chart our course. The sixteen students in the class divided into four teams. Each team has at least one member who is well versed in social media, and in many cases more than one. Students selected names for their teams. While the names had no operational function, they were instrumental in building team spirit and generating jokes. We have the L4 Nominators, whose members all have first names that begin with L; the Obama-net Preservation Society; and two groups that pay homage to Adelaide Hasse: Team Adelaide and the Hasse Harvesters.

The basic workflow was simple: divide the government agencies into four parts and have each team examine all the agencies in one part, locating which social media websites they use. Each team created a spreadsheet which lists the URL and other information requested on the nominating tool. Once I review and approve each nomination, students can begin live nominations.

The plethora of available social media websites meant that we had to narrow the scope and focus on the main venues: Facebook, Twitter, and YouTube received first priority, followed by LinkedIn and Pinterest. Students were free to add other social media sites, and many nominated agency blogs and some lesser-known sites like GitHub, a site for sharing and collaborating on computer code.

We debated which agency list to use. While the most comprehensive and authoritative list is probably the U.S. Government Manual, we chose to work from the A-Z agency list available on USA.gov, which offers a simpler interface and concise information.

There are approximately 500 agencies listed, so each team was assigned 125 agencies. In turn, each team divided its agencies among its own members, resulting in approximately 31 agencies per student. While the plan was to distribute the workload evenly among students, this proved to be all but impossible. Some agencies are self-contained and do not branch out or down, for example, the Office of the Architect of the Capitol, whereas others have a deep organizational structure in which each division in the chart includes additional sites that need to be checked. A classic example is the State Department, with embassies throughout the world, each with its own social media presence. Following a discussion with the students who drew these short straws, we decided that they would select a few embassies to nominate for the harvest, focusing on contentious locations such as Afghanistan or Syria rather than countries like Canada and Finland.
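For readers who like to see the arithmetic, here is a minimal sketch of the two-level split in Python. It is only an illustration, not the method we actually used: the agency names are placeholders, the round figure of 500 is the approximation given above, and split_evenly is a hypothetical helper written for this example.

```python
def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Placeholder names standing in for the roughly 500 agencies on the A-Z list.
agencies = [f"Agency {n}" for n in range(1, 501)]

# First level: four teams. Second level: four students per team.
teams = split_evenly(agencies, 4)
per_student = [split_evenly(team, 4) for team in teams]

team_sizes = [len(team) for team in teams]
student_sizes = [len(share) for share in per_student[0]]
```

Since 125 does not divide evenly by four, one student on each team ends up with 32 agencies and the others with 31. The count says nothing about how much work each agency takes, which is the uneven part described above.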

In addition to agency websites, we were asked to nominate the social media accounts of elected officials currently serving in the Senate or the House who are not running for reelection, and this list too was divided among the four groups.

Once students began work, we began to encounter problems regarding the status of agencies (are they official or quasi-official?) and multiple pages on a particular social media website.

Detailed descriptions of the problems we encountered will follow in the next post.

A final note: I would love to receive your comments or thoughts on this project. You can leave comments here or email me.


No-fee public access to government information

My bags are packed and I’m ready to go to Washington DC to do my bit for “no-fee public access to government information in all forms and from all three branches of government now and in the future.”
Oct. 14-18 brings the DLC Meeting and FDL Conference. For those unfamiliar with this annual ritual, let me explain. The Government Printing Office is a federal agency with several responsibilities, among them disseminating information created by the U.S. government to the American people. This information is made available through about 1,200 libraries, where librarians with expertise in government information maintain the collections and provide research, reference, and informational services. During the four days of the FDL Conference, attendees discuss the content, policies, and politics of public information. The content sessions include presentations on census data, public health information, and other information sources. The policy sessions discuss upcoming projects, such as new databases or collections that will be added to FDsys, and updates on library services and content management. The politics sessions discuss the future vision for the depository library program; the sessions about the forecast recently conducted by GPO will be streamed live.
In the current political climate, when many government programs are on the chopping block, advocating for public information is everyone’s responsibility. Follow GPO on Twitter for updates during the conference: #dlcf12 OR #dlc12 OR #fdlp

EOT Harvest, part II: How we got involved

I became aware of the EOT Harvest through meetings and presentations at the Depository Library Council conference, so when I saw that the project was seeking volunteers to nominate content for the archive, I already had some notion of what the project might involve.

I saw the notice on my favorite blog, Free Government Information (FGI). The notice briefly described the project and called for volunteers to nominate, through an online form, U.S. Federal government domains to be archived.
I used the generic e-mail address that was provided for further questions, and e-mailed the following:

Dear EOT project managers,
I saw the call for volunteers on FGI and thought this may be an opportunity to involve students taking my Government Information Sources (Fall 2012) course. Since they are new to GovDocs I would have to have something a little more contained and targeted for them. If there are any specific agencies/sub domains and such that I can have students work on, I would be glad to help. This is both a great learning opportunity as well as an act of civic responsibility. If this is at all of interest we can pursue this further.
Thank you.
Debbie Rabina

The first promising sign was a reply that came within 24 hours from one of the partners, the Library of Congress. They were excited about the suggestion and invited further discussion.
I cannot emphasize enough the importance of the relationship between volunteers and the EOT team. In our case it is making this project so much easier. In the weeks since we began work, I have been e-mailing the EOT team regularly, usually several times a week. At least one team member gets back to me within hours, including on weekends. This is instrumental in keeping up the pace of the project. Students are waiting for answers so that they can submit nominations, and every effort is made not to slow them down.

Following some e-mail exchanges and a conference call between the EOT team and me, we came up with a project for students. During the course of the semester, students will systematically locate social media sites that are maintained by all three branches of the federal government, such as NASA’s Twitter feed.

Sixteen students are involved in the project, all taking LIS 613 Government Information Sources. Most students are new to government information and the project was initially not very clear to them.

Several elements contributed to getting students to understand and get excited about the project. These included several classroom discussions that were reinforced by a detailed write-up of the assignment, published literature on the 2008 EOT harvest, and finally, a conference call with the EOT team.
Sitting in our classroom in New York City, with a combination of Skype, a landline conference call, and PowerPoint slides, we discussed the project and our role, and students had an opportunity to have their questions answered. It was after this conference call, which took place during Week 3 of the semester, that we finalized the workflow and began nominating websites for inclusion in the 2012 EOT archive.

As the project and our own involvement became clear, lessons about government information began to emerge:

Lesson #1: The amount of at-risk information is enormous.
Students were under the impression that the Federal Depository Library Program preserves all content authored by the government and were surprised to learn that most agencies’ web content is not part of the distribution and preservation efforts of GPO. This required us to set up a good workflow that would capture as much of this information as possible.

Coming next: Our workflow