voxgov, first launched for subscription access in January 2014, is a unique “discovery platform” which aggregates a broad range of official and ephemeral information resources issued by individual representatives and organizations from all branches of the U.S. Federal Government, and links that content to publicly accessible government documentation.
voxgov developers indicated in 2014 that they access content from over 10,000 government website locations in continuous daily indexing, and cache web content including “information that is no longer retrievable.” They note that many governmental press releases, as well as social media posts, have “never before been aggregated, much less collected in real-time.”
voxgov harvests and indexes content from elected congressional members' “release sites” including official press releases, speeches, testimony, articles and editorial statements, as well as commentary posted by their offices on over 2,000 social media sites (Twitter, YouTube, and Facebook). Federal agency release sites yield news, announcements, reports and links to content on social media sites from executive agencies such as the Department of State, as well as judicial and regulatory information disseminated by various other federal government bodies. voxgov also indexes release sites of autonomous bodies such as the Voice of America and U.S. political parties.
Additionally, the service indexes official documents such as the Congressional Record, Congressional Research Service (CRS) reports, the Federal Register, legislation (bills, amendments, resolutions, etc.), and congressional documents (including testimony transcripts and committee reports). In 2014 voxgov estimated that the primary source documents available elsewhere amounted to only around 13% of voxgov indexed content, or 1.28 million document files out of a total of 9.4 million. Results from these sources are reported in line with the other collected material, enabling users to track commentary and official responses to issues during all phases of the political, legislative and regulatory process. voxgov links out to some of these sources on external platforms, including open access content on Thomas.gov and the beta version of its successor site, congress.gov. Federal Register content is gathered from GPO.gov, with public inspection documents from FederalRegister.gov.
Representing the published output of tens of thousands of U.S. Federal Government web locations, the resulting database yields an enormous amount of information. voxgov indicates that more than 12,500 files are added daily. Content ranges widely in scope, from official statements by congressional representatives on legislation down to congratulatory messages to local Girl Scout troops. Inevitably, there is some redundancy in this content, as press releases are often picked up in news reporting. On the other hand, harvested content can also incorporate third-party sources, such as news article transcripts included on congressional representatives’ pages, or broadcast material reproduced on YouTube and embedded in harvested sites. voxgov applies rigorous indexing to each entry to aid filtering and navigation by source, topic, and content type. Tools built into the platform enable detection and analysis of trends in topics, and can be used to further parse the data according to the speaker’s background (see below).
voxgov plans an ambitious campaign of annual content growth, expecting to add five to eight million document files in the initial year following launch. Approximately five million of those documents (consisting of updated website entries, single page releases, and longer reports) will come from updates to resources already being captured by voxgov from existing sources (organizational entities); a million documents are expected to come from a broader definition of eligible resource types from those source entities; and an estimated two million documents are expected to be gathered as new source entities are added.
While the voxgov platform was first released in 2014, it includes retrospective materials gathered since at least 2002 (the developers indicate that some materials date back to twenty years prior to launch). As new source entities and resource types are added, the developers seek to “include all associated available archives so they remain complete.” However, specific data was not available to CRL on the full extent of pre-2002 material to be added, largely because meaningful data is difficult to obtain before the harvesting is done. Some legislative documentation currently indexed by the resource in fact dates back earlier than the 1990s.
Given voxgov’s broad aims for harvesting content, the researcher may be left with the interesting methodological question of what portion of congressional online and social media content the voxgov database will come to represent. The inherent difficulty in defining the entire U.S. federal government “domain” on the web has often been cited by authorities on preservation, most recently by James A. Jacobs in his March 2014 report Born-Digital U.S. Federal Government Information: Preservation and Access. This difficulty complicates the task of interpreting the term frequency data used in the interface.