Capturing government websites during the 2013 shutdown

The shutdown of the U.S. government from Oct. 1 to 16 left government websites in varying stages of disarray. Some agencies shut their websites completely, others remained accessible although no longer maintained, and still others seemed unaffected.
Libraries and commercial vendors stepped in to fill the gap. Librarians created LibGuides with updates on the status of government websites and sources, and commercial vendors provided temporary complimentary access to their databases that are based on government information. In the few days that have passed since the government reopened, these LibGuides have ceased to exist and commercial access has been removed (in fact, access to Social Explorer ceased two days before the shutdown ended). This should not be surprising given the immediacy of the web, where here-today-gone-tomorrow is the prevailing approach and the historic value of documenting transitions is overlooked.

I thought it worthwhile to document government websites during the shutdown and was looking for a way to do so quickly, with the limited technology tools and skills available to me. The immediate solution was to capture government websites with Zotero and create a library of websites as they appeared at the time of the shutdown.

Zotero is an open source bibliographic citation manager from George Mason University. It can be integrated into a web browser, and clicking ‘add’ captures the website displayed in the browser, saving a screenshot of the website as well as the bibliographic citation. It is quick and easy to use and answered the needs of this project.

My first priority was to capture all official government websites. Using the A-Z list from USA.gov, I captured 405 websites from the legislative, executive and judicial branches as well as quasi-governmental websites. All the websites are available from this Zotero library.

Next, I decided to capture the official social media websites used by the U.S. government. This part of the project was done by my colleague Anthony Cocciolo. We based the capture of social media sites on work previously done for the End of Term Harvest. Anthony wrote a script for a program that would crawl all the social media websites, import them into Zotero and capture their screenshots. The result is a library of 1356 government social media websites.
The final step of this project is still in progress and involves adding tags to all snapshots. These tags will allow future researchers to search the collection with different filters: by shutdown status (Completely shut down, Available but not updated, No apparent change), by branch of government (Executive, Executive Office of the President, etc.) and by agency (Dept. of Labor, Interior, etc.). These tags must be added manually, and I will continue to do so over the next few weeks.
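To illustrate how such tags could support filtering, here is a minimal Python sketch. It is not how Zotero stores or searches tags; the `Snapshot` class, the `filter_by_tags` helper, the agency names and the example.org URLs are all hypothetical, and the statuses assigned to them are invented placeholders rather than findings from the collection.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One captured website with its manually assigned tags (hypothetical model)."""
    title: str
    url: str
    tags: set = field(default_factory=set)


def filter_by_tags(snapshots, required):
    """Return the snapshots that carry every tag in `required`."""
    required = set(required)
    return [s for s in snapshots if required <= s.tags]


# Placeholder records: the names, URLs and statuses are made up for illustration.
snapshots = [
    Snapshot("Agency A", "https://example.org/a",
             {"Executive", "Completely shut down"}),
    Snapshot("Agency B", "https://example.org/b",
             {"Executive", "Available but not updated"}),
    Snapshot("Agency C", "https://example.org/c",
             {"Legislative", "No apparent change"}),
]

# Combine a branch tag with a shutdown-status tag.
for snap in filter_by_tags(snapshots, ["Executive", "Completely shut down"]):
    print(snap.title, snap.url)
```

Because each snapshot carries one tag per facet (status, branch, agency), combining tags narrows the results without any fixed hierarchy.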

While we recognize this is not a true archive, we hope this capture will help those who are interested in learning more about the status of government websites during the shutdown.


The End of Term Harvest: An Abundant Crop

I am so proud to be featured today on Signal, the Library of Congress Digital Preservation blog, for the work I did with my students contributing content to the End of Term harvest.

I want to thank all the students who worked on this project: Laural Angrist, Leo Bellino, Denis Chaves, Megan Fenton, Eloise Flood, Shanta Gee, Lucia Kasiske, Mike Kohler, Emily Lundeen, Julia Marden, Joan, Erin Noto, Lauren Reinhalter, Megan Roberts, Malina Thiede and Rachel Wittmann (who provided the title for this blog).

You can read the Signal blog here:


EOT Harvest, part V: By the numbers

No report is complete without the numbers, and our numbers are in.
In all, the 16 students in the class nominated 1513 social media sites. Many more were reviewed, but not all made the cut, either because they were out of scope, required a password to access or had some other problem associated with them.

Total 1513 sites nominated
Average 92 nominations per student
Range: 53-275 nominations

In terms of social media sites, the three leading sites are:
Facebook (32%)
Twitter (30%)
YouTube (17%)

In addition to these, there is a very long tail of over twenty social media sites, including both well-known and lesser-known sites such as Tumblr, Pinterest, Vimeo, Picasa and SlideShare.

The leading agencies were:
Dept. of Defense, State Department, Department of Agriculture and NASA.

The limitations of numbers being what they are, I will begin with a disclaimer: the numbers apply only to this project and the way in which it was managed. I would be hard pressed to venture a guess at the total number of social media sites the federal government maintains, or their distribution.

Overall, students were impressed by the wide use and variety of information they found. As one student said, “The kind of information that each one produces is quite a bit more extensive and somewhat more focused than I realized.”
Students were surprised by the widespread use of social media in government, including by agencies that traditionally avoided interaction with the general public. The Secret Service uses social media extensively for public relations and marketing services, and without a doubt J. Edgar Hoover is turning in his grave.
And while marketing and public relations constitute a major use of social media, we also saw some creative uses. For example, the Central Texas Dept. of Veterans Affairs tweets its daily lunch menu (for Thanksgiving: Lunch: Rst Turkey/Gravy, Cranberry Sce, Cornbread Dressing, Sweet Potatoes, Green Beans Supreme, Dinner Roll, Pumpkin Pie, Coffee/Tea) and the Border Patrol uses Pinterest for public education.

Menu of what’s being served at Central Texas VA Center


The project helped students understand, and often appreciate, the role of social media in communicating with the public. We discovered that almost every agency we searched had a Twitter account. As the work coincided with Hurricane Sandy in New York, one student observed: “During the hurricane, I was without power and relied heavily on Twitter for information from the City and Con Edison. After that experience (and after being told by some friendly police officers that it was where they were getting all their information), I understood why the federal government would rely so heavily on Twitter as opposed to other social media outlets.”

Most directly, in terms of supporting the course goals, students learned a lot about the information sources of the federal government and the limitations of social media.
There were also indirect lessons. Many students felt this project made them more aware of the work of government. As one student said, “this project inspired me to become a more informed citizen.” Some drew broader conclusions about the shifting role of government, from making information available to actively trying to communicate information directly to citizens.

With the conclusion of the project, I would like to thank all the students who participated in the End of Term harvest: Laural Angrist, Leo Bellino, Denis Chaves, Megan Fenton, Eloise Flood, Shanta Gee, Lucia Kasiske, Mike Kohler, Emily Lundeen, Julia Marden, Joan, Erin Noto, Lauren Reinhalter, Megan Roberts, Malina Thiede and Rachel Wittmann.


EOT Harvest, part IV: Problems encountered

Once the nomination process got under way, problems arose, but each problem provided a learning opportunity for students. Some questions we encountered include:
The status of quasi agencies. Some quasi agencies, like the Smithsonian, were considered within scope, but others were not quite as clear. For example, the John F. Kennedy Center for the Performing Arts is considered a quasi agency. Eventually, however, we determined that it is out of scope for the EOT harvest. The EOT team ran this by Brewster Kahle, who determined that “even though funded by government, this is pushing the boundaries a bit hard for this go round from a policy perspective.”
What is social media? Narrowly defined, “social media” refers to websites with interactive user-generated content that allow people to communicate with one another. While Facebook, Twitter and YouTube were clearly within scope, we debated whether or not to include blogs, such as the blog by the Chairman of the Joint Chiefs of Staff. Our concern was that most blogs contained a low level of interactivity and functioned more like press releases. After some consideration we decided to include blogs, since their purpose is to communicate directly and foster unmediated communication with the public.
A similar question came up regarding podcasts such as this one from the USGS, which we decided to nominate since the USGS includes podcasts on its main site for social media.
Restrictions on public access. Only websites that are accessible to the general public without a password were considered within scope for the EOT harvest. However, we did encounter some websites with tiered access, specifically LinkedIn sites. For example, the LinkedIn page of the Peace Corps is publicly accessible, but the tab for “employee insights” requires a password. Since we were unable to separate the sections, we decided against nomination.

As I write this, we are two days away from the elections. Our goal is to complete nominations by Election Day. Hurricane Sandy, which directly affected our area, left many students without power. We coordinated our efforts, and students who still had internet access took over for those who did not, so the work could be done on time. I am encouraged by the collaborative spirit of the students, who did not hesitate to contribute above their quota so that the work would get done.
How much work, exactly? Stay tuned for the next post, which will provide the numbers.


EOT Harvest, part III: Getting to work

Before I begin describing our workflow, here is a brief recap of the previous blog posts:
Students taking the course in government information sources this semester are working to assist in capturing social media websites of the federal government. These websites will be archived by The Internet Archive.

After a presentation by the EOT team, we began to chart our course. The sixteen students in the class divided into four teams. Each team has at least one member who is versed in social media, and in many cases more than one. Students selected names for their teams. While the names had no operational function, they were instrumental in building team spirit and generating jokes. We have the L4 nominators, whose members all have first names that begin with L; the Obama-net Preservation Society; and two groups that paid homage to Adelaide Hasse, Team Adelaide and the Hasse Harvesters.

The basic workflow was simple: divide the government agencies into four parts and have each team examine all the agencies in one part, locating which social media websites they use. Each team created a spreadsheet which lists the URL and other information requested on the nominating tool. Once I review and approve each nomination, students can begin live nominations.

The plethora of available social media websites meant that we had to narrow the scope and focus on the main venues: Facebook, Twitter, and YouTube received first priority, followed by LinkedIn and Pinterest. Students were free to add other social media sites, and many students nominated agency blogs and some lesser-known sites like GitHub, a site for sharing and collaborating on computer code.

We debated which agency list to use. While the most comprehensive and authoritative list is probably the U.S. Government Manual, we chose to work from the A-Z agency list available on USA.gov, which offers a simpler interface and concise information.

There are approximately 500 agencies listed, so each team was assigned 125 agencies. In turn, each team divided the agencies among its own members, resulting in approximately 31 agencies per student. While the plan was to distribute the workload evenly among students, this proved to be all but impossible. Some agencies are self-contained and do not branch out or down, for example, the Office of the Architect of the Capitol, whereas others have a deep organizational structure where each division in the chart includes additional sites that need to be checked. A classic example would be the State Department, with embassies throughout the world, each with its own social media presence. Following a discussion with the students who drew these short straws, we decided that students would select a few embassies to nominate for the harvest, focusing on contentious locations such as Afghanistan or Syria and not nominating countries like Canada and Finland.
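The arithmetic behind that split can be sketched in a few lines of Python. This is only an illustration of dividing a list by count, not the tool we used: the `split_evenly` helper and the placeholder agency names are hypothetical, and the round figure of 500 is taken from the approximate count above.

```python
def split_evenly(items, n_groups):
    """Split items into n_groups consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_groups)
    groups = []
    start = 0
    for i in range(n_groups):
        # The first `extra` groups absorb the remainder, one item each.
        size = base + (1 if i < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


# Placeholder names standing in for the roughly 500 agencies on the A-Z list.
agencies = [f"Agency {n}" for n in range(1, 501)]

teams = split_evenly(agencies, 4)                         # four teams
per_student = [split_evenly(team, 4) for team in teams]   # four students per team

print([len(team) for team in teams])
print([len(share) for share in per_student[0]])
```

Each team receives 125 agencies and each student 31 or 32. As the State Department example shows, an even count is not an even workload, since a single agency can hide dozens of additional sites.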

In addition to agency websites, we were asked to nominate the social media of elected individuals currently serving in the Senate or the House who are not running for reelection, and this list too was divided among the four groups.

Once students began work, we began to encounter problems regarding the status of agencies (are they official or quasi-official?) and multiple pages on a particular social media website.

Detailed descriptions of the problems we encountered will follow in the next post.

And a final note: I would love to receive your comments or thoughts on this project. You can leave comments here or email me.

