PastPages is the news homepage archive.

Every hour it captures a snapshot of the top stories reported by news organizations around the world.

This blog is a selection from its files by Editor and Publisher Ben Welsh.

The Seattle Times homepage, animated by pastpages2gif.

New tool makes animated GIFs from the PastPages homepage archive


Last night I cobbled together pastpages2gif, a command-line tool that pulls images from the new PastPages API and combines them into an animated GIF.

Right now, you’ll have to know a little Python to get it going, but if it proves useful it could grow into something for anyone to use via a web interface. The GIF at the top of this post was made like so:

More examples and a copy of the code are at https://github.com/pastpages/pastpages2gif. If you see anything that sucks or have an idea for improvements, please email me or file a ticket.

Credit for the idea goes to PastPages users who impressed me with GIFs of their own, including Jeremy Singer-Vine, Andrei Scheinkman and Zachary M. Seward.

And please keep hacking on that new API!

Say hello to the PastPages API


I’m happy to announce the launch of the PastPages API, which offers a machine-readable version of the site that programmers can use to mine our homepage archive.

You can easily get a list of all the sites we track, see the latest homepages from France, find out how Xinhua covered New Year’s Eve or any other query you can dream up.

The data are published in JSON, JSONP, XML and other popular formats. Documentation is available at http://www.pastpages.org/api/docs/.

While the API is currently free and requires no registration, access is throttled and the system’s structure is likely to change in the future. It was developed using django-tastypie and follows its common conventions.
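For programmers who want to page through a list of results, here is a minimal sketch. It leans on the usual django-tastypie conventions: a `format`, `limit` and `offset` in the query string, and a response with a `meta` block plus a list of `objects`. The `/api/v1` root and the `sites` resource name are my assumptions for illustration, not taken from the documentation, so check the docs for the real ones.

```python
from urllib.parse import urlencode

# Hypothetical root; the real path is in the API documentation.
API_ROOT = "http://www.pastpages.org/api/v1"


def build_list_url(resource, limit=20, offset=0, **filters):
    """Build a tastypie-style list URL with paging and optional field filters."""
    params = {"format": "json", "limit": limit, "offset": offset}
    params.update(filters)
    # Sort the keys so the same query always yields the same URL.
    query = urlencode(sorted(params.items()))
    return f"{API_ROOT}/{resource}/?{query}"


def iter_objects(fetch_page, resource, limit=20, **filters):
    """Yield every object from a list endpoint, one page at a time.

    fetch_page is any callable that takes a URL and returns the decoded
    JSON body as a dict.
    """
    offset = 0
    while True:
        page = fetch_page(build_list_url(resource, limit, offset, **filters))
        yield from page["objects"]
        # tastypie sets meta.next to null on the last page.
        if not page["meta"].get("next"):
            break
        offset += limit
```

In real use, `fetch_page` could wrap `urllib.request.urlopen` and `json.load`, with a short sleep between calls since access is throttled.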

If you encounter any problems, please contact me via email or file a ticket. Try it out. Tell me what sucks.

Second-generation PastPages code base is all the way live


Today marks the release of the second generation of PastPages' code base, nicknamed “bradlee.” The screenshotting system has been rewritten to make it faster and cheaper by shedding dependencies and introducing a task queue. Here's a quick rundown:

  • Firefox -> Webkit
  • Selenium -> PhantomJS
  • Xvfb headless server -> Nothing!
  • One-by-one screenshot script -> Concurrent Celery queue
  • Memcached -> Varnish

The result is that a significantly less powerful server now completes a screenshotting run in half the time the old server took. That saves money as well as time.
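To show the difference between a one-by-one script and a concurrent run, here is a rough sketch. It is not the Celery setup PastPages uses; it swaps in a thread pool from Python's standard library so the example runs on its own. The `capture` argument is a hypothetical stand-in for whatever takes the screenshot.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def capture_all(urls, capture, max_workers=4):
    """Run capture(url) for every URL concurrently.

    Returns (results, errors): two dicts keyed by URL. One failing site
    does not stop the rest of the run.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every job up front, then collect them as they finish.
        futures = {pool.submit(capture, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return results, errors
```

A slow or broken homepage only ties up one worker, so the rest of the run keeps moving instead of waiting in line behind it.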

All of the code is open source on GitHub with the entire deployment route included as a Chef cookbook. Patches welcome!

Chicago Tribune redesign, before and after

Before at 6pm CDT

After at 8pm CDT

For a few minutes this morning, CNN ran an incorrect headline declaring that a significant part of President Barack Obama’s healthcare law had been struck down by the Supreme Court. 

I captured the image above manually several minutes ago. CNN has already corrected the page with a new headline.

The error was missed by PastPages’ hourly script, which visits CNN once per hour. The last visit happened at 10:02 AM EDT, before CNN published the incorrect headline. When it visits again in the next hour, the error will almost certainly be gone.

This shows just how quickly news sites can change the framing of stories and proves that even PastPages’ hourly screenshot is wanting. One of my goals with the future development of the site is to increase how frequently it captures data. Al Shaw has suggested we allow for instant on-demand archival when a human spots an error that ought to be captured.
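The arithmetic behind that gap is simple enough to sketch. The function below checks whether any scheduled capture lands inside the window when a headline was live. The 10:02 AM capture time comes from the post above; the 10:07 to 10:15 window in the usage example is made up for illustration, since I don't have exact times for when the error went up and came down.

```python
from datetime import datetime, timedelta


def was_captured(live_from, live_until, first_capture, interval):
    """Return True if any scheduled capture falls inside [live_from, live_until]."""
    if live_until < first_capture:
        return False
    # Whole intervals from the first capture up to live_from, rounded up.
    elapsed = max(live_from - first_capture, timedelta(0))
    steps = -(-elapsed // interval)  # ceiling division on timedeltas
    next_capture = first_capture + steps * interval
    return next_capture <= live_until


last_visit = datetime(2012, 6, 28, 10, 2)
# Hypothetical window for the incorrect headline.
error_up = datetime(2012, 6, 28, 10, 7)
error_down = datetime(2012, 6, 28, 10, 15)

hourly = was_captured(error_up, error_down, last_visit, timedelta(hours=1))
five_min = was_captured(error_up, error_down, last_visit, timedelta(minutes=5))
```

Under those assumed times, the hourly schedule misses the error and a five-minute schedule would have caught it.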

If you’re a developer and you’d like to help make this happen, all of the code is open on GitHub and I’d welcome your contributions.

Media split on how to frame decision on Arizona’s controversial immigration law

This morning the United States Supreme Court issued a split decision on the legality of a hardline immigration law adopted by the state of Arizona. Four of the law’s provisions were reviewed, and three of them were struck down, according to Kevin Russell at SCOTUSblog.

English-language news outlets in the U.S. and Britain jumped on the news, but disagreed on how to frame the results. Some emphasized that much of the law went down. Others emphasized the survival of a part of the law that, according to the Los Angeles Times, will allow “state officials to begin enforcing a provision that calls on police, when making lawful stops, to check the immigration status of people who may be in the country illegally.”

Fox News and the Los Angeles Times are examples of a “glass three-quarters empty” frame.

Reuters and the BBC are examples of the “glass quarter full” frame, presenting the decision as good news for the law’s supporters.

You can review all of the homepages archived by PastPages for that same hour right here.

Also, the Los Angeles Times is my employer, but in no way associated with PastPages, which I maintain on my own time with the support of a network of individual donors. Read all about it.

Update: Soon after, Reuters changed its play, opting for a more ambiguous frame with this revised headline.

Advanced search by title, tag and date range

PastPages now offers an advanced search that allows users to quickly pull up screenshots from any date range by title or tag. Try it out. Tell me what sucks.

Knight News Challenge Round 2: The Mapping L.A. API 

Check out my Knight News Challenge pitch related to my day job at the Los Angeles Times.
