Before I changed the focus of my Knight journalism project (more on that in the coming weeks), I spent the first six months of my fellowship learning a lot about the state of digital news archives. In fact, this was my original innovation proposal for my Knight Fellowship application.
TL;DR: How can we preserve and analyze digital news archives to better cover our communities?
I interviewed dozens of historians, archivists, librarians, journalists and executives who care about preserving the news, but no one has quite figured it out. Here are a few of the challenges, and naturally the opportunities, for journalists and news organizations to consider:
Historical vs. digital preservation. In the past, archiving the news was relatively straightforward. Newspapers and radio and TV broadcasts were frozen moments in time. Once they were published or hit the airwaves, they could no longer be altered. Companies such as LexisNexis, ProQuest, Factiva and MerlinOne, and memory institutions like the National Digital Newspaper Program at the Library of Congress, are refining the process of digitizing physical newspapers, but at least there’s a process. When it comes to born-digital content, it’s a bit more complicated.
The nature of digital is dynamic. Stories are constantly updated and delivered in a variety of formats. At what point do we preserve them? Is it when they’re first published online? Or later when the news dies down and the story is more complete? How do you capture tweets, Vines, Instagrams, interactives, links, comments, ads, surveys, quizzes, and other types of content?
A few organizations have taken a stab at it. The Internet Archive’s Wayback Machine, for instance, takes snapshots of millions of webpages over time so you can see what the NYT’s homepage looked like in 1996 compared with what it looks like now. But you can’t search for specific news stories. NewsDiffs is a neat tool that tracks changes in articles from the NYT, WaPo, CNN, Politico and BBC, but it has yet to include videos, photos and other forms of multimedia. Wikipedia addresses the Internet’s revisionist tendencies by documenting user edits in its “view history” tab. Then there’s the Knight Foundation-backed Digital Public Library of America, which is digitizing and visualizing historical collections, but they aren’t centered around news. If journalism’s mission is to document our lives, how can we preserve our journalism?
Newsroom culture and priorities. In the past eight months, the San Francisco Bay Guardian, GigaOm, the Bold Italic and Homicide Watch DC (just to name a few) have ceased to exist. The financial conundrum of running a news organization is real. Newsroom leaders are still wrestling with how to make journalism sustainable, and hopefully profitable, but the drive to measure an immediate ROI doesn’t allow for experimentation and discovery, especially when it comes to reimagining news archives. It encourages newsrooms to revert to what they know and accept things the way they are. The culture needs to change.
I’m not the only one who believes this, but a key competitive advantage legacy news organizations have over digital news startups is the depth of their institutional knowledge. Local journalists have been covering their beats and communities for decades, producing stories, photos and other forms of multimedia all along the way. That’s a lot of data, which, if structured correctly, could be valuable to reporters and residents alike. How can we leverage that inherent strength? The NYT’s Cooking collection is a taste (pun intended) of how to surface, showcase and monetize archive stories. The LAT developed a similar recipes section too. Legacy news organizations are sitting on a trove of content that could evolve into a range of potential products, but it requires a shift in newsroom culture and priorities to create or adopt something that never existed before.
Structured journalism. Championed by Reg Chua, the executive editor at Reuters, and Bill Adair, the creator of PolitiFact, structured journalism is a movement “to change the way we create content so as to maximize its shelf-life, as well as structuring – as much as possible – the information in stories, at the time of creation, for use in databases that can form the basis of new stories or information products.” Essentially, how can we rethink how we produce stories and present them in different ways? Spaceprob.es, Emergent and Event Registry are just a few projects that have been mentioned on the structured journalism listserv. Why does this matter? How we structure our stories is connected to the value we can derive from our archives. Imagine if we could navigate through our own content in visual ways. How could that help editors make more informed decisions about news coverage? How quickly could reporters learn a new beat, historically contextualize their coverage, and generate new story ideas?
So, what’s next? I’m collaborating with a co-conspirator, Tiago Etiene, a programmer based in SF, who’s equally interested in reaching out to news organizations and testing our hypothesis. We believe digital news archives are a source of untapped data and a natural competitive advantage for news organizations, but their full potential has yet to be realized. We want to build a tool that can help journalists leverage their institutional knowledge.
If you’re a news outlet that’s game for experimenting (at no financial cost), please reach me at yleow [at] stanford [dot] edu.
And if you’re a designer who nerds out about news, history or data visualizations, please shoot me a note.
The more we test in this space, the more we’ll learn. The Knight Foundation has been actively funding libraries in an effort to “build more knowledgeable communities,” and it’s no coincidence that they’re investing in institutions dedicated to preserving the past. The Educopia Institute is hosting a conference, Dodging the Memory Hole II: An Action Assembly, on May 11-12, 2015, at UNC to bring together news publishers, press associations, technologists, researchers, libraries, corporations and funding agencies to tackle the challenge of preserving digital news content.
My time at Stanford officially wraps up on June 5, but it’s not over. If there’s anything I learned this year, it’s that this project, like all worthwhile ideas, is a constant work in progress.
A special thanks to:
James Robinson, New York Times
Evan Sandhaus, New York Times
Liz McClure, JSK fellow 2012
Jeremy Hay, JSK fellow 2015
Michael Morisy, JSK fellow 2015
Zena Barakat, JSK fellow 2015
Donna Borak, JSK fellow 2015
Christina Passariello, JSK fellow 2015
Akoto Ofori-Atta, JSK fellow 2015
Anne Kornblut, JSK fellow 2015
Charla Bear, JSK fellow 2015
Leigh Pointinger, San Jose Mercury News
Carlos DelaSerna, JSK fellow 2014
Michelle Price, Associated Press
Amy Wang, Arizona Republic
Tom Huang, Dallas Morning News
Michelle Holmes, Alabama Media Group
Frank Shyong, LAT
Matt Stevens, LAT
Paolo Carretta, Universidade de São Paulo
Andy Waters, Columbia Daily Tribune
Clea Benson, Bloomberg News
Peter Rippon, BBC
Cary Schneider, LAT
Ted Han, DocumentCloud
Trei Brundrett, Vox Media
Logan McClure, Palantir
Miguel Paz, Poderopedia
Steve Jones, SF Bay Guardian
Mark Bieschke, SF Bay Guardian
Anne Wooton, Pop-up Archive
Bailey Smith, Pop-up Archive
Kathleen Hansen, University of Minnesota
Nora Paul, University of Minnesota
Edward McCain, Donald W. Reynolds Institute
Victoria McCargar, Mount St. Mary’s College
David Hansen, UNC Chapel Hill
Jonathan Kalan, Timeline
Heather Corcoran, Colloq
Zach Kaplan, Colloq
Paul Quinn, Minezy
T. Christian Miller, ProPublica
Deborah Thomas, Library of Congress
Kenny Whitebloom, Digital Public Library of America
David Riordan, New York Public Library
Abigail Grotke, Library of Congress
Amy Rudersdorf, Digital Public Library of America
Gretchen Gueguen, Digital Public Library of America
Matt Galligan, Circa