Executive Summary
In 2016, the Andrew W. Mellon Foundation awarded CRL funding to develop an “integrated, self-sustaining, international cooperative framework to support area and international studies (AIS).” The chief goal of the Global Collections Initiative is to expand electronic access to primary source documentation and data from major world regions where the information landscape differs from that in the U.S. and Western Europe. The initial phase of the project focused on one region: Latin America and the Caribbean. A major focus of the initiative has been access to materials existing only in digital form. This report evaluates efforts in the U.S. to archive open web content from the Caribbean and Latin America for future use by researchers.
The web is booming throughout this world region and has become a fertile resource for research and publication across the disciplinary spectrum. At the same time, the ephemerality of web content, the result of deletion, migration, alteration, or adulteration, collectively known as “reference rot,” has led to a crisis in scholarly communication. Conducting, sharing, and reading web-based research is “like trying to stand on quicksand.” While largely resolved for journals, this crisis still needs to be addressed for the great variety of open web resources.
Part I of this report introduces the systemic and region-specific issues of web use (and abuse) that are at the root of the problem. Part II, using the example of a hypothetical scholar researching the Landless Workers Movement in Brazil, shows how these issues affect actual research in ways unimaginable in the pre-web era.
In Part III, three prominent area studies–relevant archival programs are examined: the Library of Congress Web Archiving Program (LCWA); Columbia University’s Human Rights Web Archive (HRWA); and two programs at the University of Texas at Austin, the Latin American Government Documents Archive (LAGDA) and the Human Rights Documentation Initiative (HRDI). For each, we describe its history; governance; scoping and selection; metadata and search; in-house use analysis; and self-assessed challenges and future hopes. In Part IV, we consider what external evidence exists that these archives are actually being used. We find that “views” do not necessarily translate into scholarly citations, and that methodological problems make any comprehensive use analysis, which the National Digital Stewardship Alliance recognizes as a high priority, difficult at this time. Above all, the lack of consensus on how, and even whether, to cite web archives as sources compromises any analysis of their research relevance.
Finally, in Part V, this report describes opportunities for moving ahead. These include efforts to standardize metadata across the library/archives divide; to develop better finding aids and expose them to web crawlers; to improve citation standards for web content—especially in popular style manuals and bibliographic software used by students, scholars, and publishers; to introduce certification standards for web archives, thereby enhancing their credibility among skeptical scholars; to push education, outreach, and exchange outside the bubble of web archivists both on campus and beyond, for example at discipline-specific professional meetings; to promote data mining and whole-collection analysis; and finally to advance inter-institutional—and international—collaboration, leveraging the strengths of multiple partners.