Introducing StoryTracker 1.0
Today marks the official release of version 1.0 of StoryTracker, a set of open source tools for archiving and analyzing news homepages.
The project started in June thanks to funding from the Reynolds Journalism Institute, a research and development center based at the University of Missouri.
Today I’m presenting the result of our work at “Dodging the Memory Hole,” a conference for archivists and professionals hosted at the Institute’s headquarters in Columbia, Missouri.
Because the entire codebase is free and open source, you can run wild with the code already published on GitHub and distributed via the Python Package Index.
StoryTracker offers a menu of options, documented here, for three jobs: creating an orderly archive of HTML snapshots; extracting hyperlinks, along with a bonus set of metadata that captures each link’s prominence on the page; and visualizing a page’s layout with animations that show changes over time.
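To give a rough feel for the link-extraction idea, here is a minimal sketch that uses only Python's standard library. It is not StoryTracker's own API: the `LinkExtractor` class and `extract_links` function are hypothetical names made up for this illustration, and the order in which links appear in the HTML stands in as a crude proxy for prominence. Measuring a link's actual position or size on the rendered page would require a browser.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect each <a href> with its text and document order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors that have no href attribute
                self._current = {
                    "href": href,
                    "text": "",
                    # Assumption: earlier in the HTML means more prominent.
                    "order": len(self.links),
                }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace inside the link text.
            self._current["text"] = " ".join(self._current["text"].split())
            self.links.append(self._current)
            self._current = None


def extract_links(html):
    """Return a list of dicts describing the hyperlinks found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<h1><a href="/lead">Lead story</a></h1><p><a href="/more">More news</a></p>'
    for link in extract_links(sample):
        print(link["order"], link["href"], link["text"])
```

Running the same function over snapshots saved at different times would show how a given link moves up or down the page, which is the kind of question the real tools are meant to answer with much richer metadata.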
Our work is far from complete, but you can see what is possible today in the slide deck above.
StoryTracker has already been integrated into PastPages to archive the HTML from a select set of sites, and I hope to expand soon to automated analysis of those homepages.
All of this could benefit greatly from your ideas, critiques, bug reports and, most of all, patches. And if you have any thoughts you’d like to share privately, please email me at ben.welsh@gmail.com.