Last night I cobbled together pastpages2gif, a command-line tool that pulls images from the new PastPages API and combines them into an animated GIF.
Right now, you’ll have to know a little Python to get it going, but if it proves useful it could grow into something for anyone to use via a web interface. The GIF at the top of this post was made like so:
More examples and a copy of the code are at https://github.com/pastpages/pastpages2gif. If you see anything that sucks or have an idea for improvements, please email me or file a ticket.
Credit for the idea goes to PastPages users who impressed me with GIFs of their own, including Jeremy Singer-Vine, Andrei Scheinkman and Zachary M. Seward.
And please keep hacking on that new API!
I’m happy to announce the launch of the PastPages API, which offers a machine-readable version of the site that programmers can use to mine our homepage archive.
You can easily get a list of all the sites we track, see the latest homepages from France, find out how Xinhua covered New Year’s Eve or run any other query you can dream up.
The data are published in JSON, JSONP, XML and other popular formats. Documentation is available at http://www.pastpages.org/api/docs/.
While the API is currently free and requires no registration, access is throttled and the system’s structure is likely to change in the future. It was developed using django-tastypie and follows its common conventions.
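For anyone who wants a starting point, here is a minimal Python sketch of a client that leans on the usual django-tastypie conventions: a `format` query parameter, `limit` and `offset` for paging, and list responses wrapped in `meta` and `objects`. The API root and the `sites` resource name below are assumptions made for illustration, so check the documentation for the real paths before relying on them.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Assumption: a hypothetical API root. Look up the real one in the docs.
API_ROOT = "http://www.pastpages.org/api/beta/"


def build_list_url(resource, limit=20, offset=0, **filters):
    """Build a tastypie-style list URL for a resource (e.g. "sites")."""
    params = {"format": "json", "limit": limit, "offset": offset}
    params.update(filters)
    # Sort the parameters so the same query always yields the same URL.
    query = urlencode(sorted(params.items()))
    return urljoin(API_ROOT, resource.strip("/") + "/") + "?" + query


def next_url(page):
    """Return the absolute URL of the next page, or None on the last page."""
    next_path = page.get("meta", {}).get("next")
    return urljoin(API_ROOT, next_path) if next_path else None


def fetch(url):
    """Download one page of results and decode the JSON body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_all(resource, **filters):
    """Yield every record for a resource, following the paging links."""
    url = build_list_url(resource, **filters)
    while url:
        page = fetch(url)
        for record in page.get("objects", []):
            yield record
        url = next_url(page)
```

Since access is throttled, a loop like `iter_all("sites")` is best run with a modest `limit` and a pause between requests.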
If you encounter any problems, please contact me via email or file a ticket. Try it out. Tell me what sucks.
Today marks the release of the second generation of PastPages' code base, nicknamed “bradlee.” The screenshotting system has been rewritten to make it faster and cheaper by shedding dependencies and introducing a task queue. Here's a quick rundown:
The result is that a significantly less powerful server now completes a screenshotting run in half the time the old server took. That saves money in addition to time.
All of the code is open source on GitHub with the entire deployment route included as a Chef cookbook. Patches welcome!
Before at 6pm CDT
After at 8pm CDT
For a few minutes this morning, CNN ran an incorrect headline declaring that a significant part of President Barack Obama’s healthcare law had been struck down by the Supreme Court.
I captured the image above manually several minutes ago. CNN has already corrected the page with a new headline.
The error was missed by PastPages’ hourly script, which visits CNN once per hour. The last visit happened at 10:02 AM EDT, before CNN made a judgement. When it visits again in the next hour, the error will almost certainly be gone.
This shows just how quickly news sites can change the framing of stories and proves that even PastPages’ hourly screenshot is wanting. One of my goals with the future development of the site is to increase how frequently it captures data. Al Shaw has suggested we allow for instant on-demand archival when a human spots an error that ought to be captured.
@palewire PastPages totally needs a GO NOW button that you can mash when shit gets crazy— Al Shaw (@A_L) June 28, 2012
If you’re a developer and you’d like to help make this happen, all of the code is open on GitHub and I’d welcome your contributions.
This morning the United States Supreme Court issued a split decision on the legality of a hardline immigration law adopted by the state of Arizona. Four of the law’s provisions were reviewed, but only three were struck down, according to Kevin Russell at SCOTUSblog.
English-language news outlets in the U.S. and Britain jumped on the news, but disagreed on how to frame the results. Some emphasized that much of the law went down. Others emphasized the survival of a part of the law that, according to the Los Angeles Times, will allow “state officials to begin enforcing a provision that calls on police, when making lawful stops, to check the immigration status of people who may be in the country illegally.”
Fox News and the Los Angeles Times are examples of a “glass three-quarters empty” frame.
Reuters and the BBC are examples of the “glass quarter full” frame, presenting the ruling as good news for the law’s supporters.
You can review all of the homepages archived by PastPages for that same hour right here.
Also, the Los Angeles Times is my employer, but in no way associated with PastPages, which I maintain on my own time with the support of a network of individual donors. Read all about it.
Update: Soon after, Reuters changed its play, opting for a more ambiguous frame with this revised headline.
PastPages now offers an advanced search that allows users to quickly pull up screenshots from any date range by title or tag. Try it out. Tell me what sucks.
Knight News Challenge Round 2: The Mapping L.A. API
Check out my Knight News Challenge pitch related to my day job at the Los Angeles Times
PastPages profiled by the Voice of America's Learning English Technology Report
PastPages now publishes automatically generated citations for every screenshot. Visit a screenshot’s detail page and simply click on the new “citations” button to see a popup like the one pictured above.
It currently provides drafted citations in MLA, APA and Chicago styles. It also provides Wikipedia citation markup that can be immediately pasted into an entry and used as a reference.
My hope is that this will make it easier for scholars, both professional and amateur, to use PastPages. The style above was copied from instructions at Purdue University’s Online Writing Lab. If you see any errors, please let me know or file a ticket on GitHub. If you’re interested in how it’s implemented, you can see the code that makes this work here.
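To give a rough idea of what generating the Wikipedia markup involves, here is a small Python sketch that fills in a `{{cite web}}` template. It is an illustration only, not the code running on the site: the function name, the title wording and the example screenshot URL are all assumptions.

```python
from datetime import date, datetime


def wikipedia_citation(site_name, screenshot_url, captured_at, accessed_on):
    """Draft {{cite web}} markup for one archived homepage screenshot.

    Hypothetical helper: the field choices and title wording are
    assumptions, not the format PastPages actually emits.
    """
    fields = [
        ("url", screenshot_url),
        ("title", f"{site_name} homepage at {captured_at:%Y-%m-%d %H:%M} UTC"),
        ("publisher", "PastPages"),
        ("date", captured_at.date().isoformat()),
        ("access-date", accessed_on.isoformat()),
    ]
    # Join the name=value pairs with the pipe separator the template expects.
    body = " |".join(f"{name}={value}" for name, value in fields)
    return "{{cite web |" + body + "}}"


if __name__ == "__main__":
    print(
        wikipedia_citation(
            "CNN",
            "http://www.pastpages.org/screenshot/1/",  # made-up example URL
            datetime(2012, 6, 28, 14, 2),
            date(2012, 7, 1),
        )
    )
```

A fuller version would also need to escape any pipe characters in the URL or title, since they would otherwise be read as field separators.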