The PastPages Web Log

Feb 04

Second-generation PastPages code base is all the way live

Today marks the release of the second generation of PastPages’ code base, nicknamed “bradlee.” The screenshotting system has been rewritten to make it faster and cheaper by shedding dependencies and introducing a task queue. Here’s a quick rundown:

The result is that a significantly less powerful server now completes a screenshotting run in half the time the old server needed. That saves money as well as time.
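For readers unfamiliar with the pattern, here is a minimal sketch of what a task queue for screenshot jobs can look like. It is not the PastPages implementation: it uses only Python's standard library, and `take_screenshot`, `run_screenshots` and the example URLs are hypothetical names chosen for illustration.

```python
import queue
import threading


def take_screenshot(url):
    # Hypothetical stand-in: a real worker would drive a browser here.
    return f"screenshot of {url}"


def run_screenshots(urls, worker_count=4):
    """Capture every URL using workers that pull jobs from a shared queue."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:  # sentinel: no more work for this worker
                tasks.task_done()
                break
            try:
                outcome = take_screenshot(url)
            except Exception as exc:
                # Record the failure so one bad site does not stop the run.
                outcome = exc
            with lock:
                results[url] = outcome
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for thread in threads:
        thread.join()
    return results
```

Because each worker pulls the next URL as soon as it finishes the last one, a slow site holds up only one worker rather than the whole run.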

All of the code is open source on GitHub with the entire deployment route included as a Chef cookbook. Patches welcome!

Jun 29

Chicago Tribune redesign, before and after

Before at 6pm CDT

After at 8pm CDT

Jun 28

For a few minutes this morning, CNN ran an incorrect headline declaring that a significant part of President Barack Obama’s healthcare law had been struck down by the Supreme Court. 

I captured the image above manually several minutes ago. CNN has already corrected the page with a new headline.

The error was missed by PastPages’ script, which visits CNN once per hour. The last visit happened at 10:02 AM EDT, before CNN made a judgement. When it visits again in the next hour, the error will almost certainly be gone.

This shows just how quickly news sites can change the framing of stories and proves that even PastPages’ hourly screenshot is wanting. One of my goals with the future development of the site is to increase how frequently it captures data. Al Shaw has suggested we allow for instant on-demand archival when a human spots an error that ought to be captured.

@palewire PastPages totally needs a GO NOW button that you can mash when shit gets crazy

— Al Shaw (@A_L) June 28, 2012

If you’re a developer and you’d like to help make this happen, all of the code is open on GitHub and I’d welcome your contributions.

Jun 25

Media split on how to frame decision on Arizona’s controversial immigration law

This morning the United States Supreme Court issued a split decision on the legality of a hardline immigration law adopted by the state of Arizona. Four of the law’s provisions were reviewed, but only three were struck down, according to Kevin Russell at SCOTUSblog.

English-language news outlets in the U.S. and Britain jumped on the news, but disagreed on how to frame the results. Some emphasized that much of the law went down. Others emphasized the survival of a part of the law that, according to the Los Angeles Times, will allow “state officials to begin enforcing a provision that calls on police, when making lawful stops, to check the immigration status of people who may be in the country illegally.”

Fox News and the Los Angeles Times are examples of a “glass three-quarters empty” frame.

Reuters and BBC are examples of the “glass quarter full” frame, presenting the decision as good news for the law’s supporters.

You can review all of the homepages archived by PastPages for that same hour right here.

Also, the Los Angeles Times is my employer, but in no way associated with PastPages, which I maintain on my own time with the support of a network of individual donors. Read all about it.

Update: Soon after, Reuters changed its play, opting for a more ambiguous frame with this revised headline.

Jun 14

Advanced search by title, tag and date range

PastPages now offers an advanced search that allows users to quickly pull up screenshots from any date range by title or tag. Try it out. Tell me what sucks.

Jun 13

Knight News Challenge Round 2: The Mapping L.A. API

Check out my Knight News Challenge pitch related to my day job at the Los Angeles Times

(Source: newschallenge2)

Jun 11

PastPages profiled by the Voice of America's Learning English Technology Report

PastPages now provides automatic citations

PastPages now publishes automatically generated citations for every screenshot. Visit a screenshot’s detail page and simply click on the new “citations” button to see a popup like the one pictured above.

It currently provides drafted citations in MLA, APA and Chicago styles. It also provides Wikipedia citation markup that can be immediately pasted into an entry and used as a reference.

My hope is that this will make it easier for scholars, both professional and amateur, to use PastPages. The style above was copied from instructions at Purdue University’s Online Writing Lab. If you see any errors, please let me know or file a ticket on GitHub. If you’re interested in how it’s implemented, you can see the code that makes this work here.
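Separately from the site's real implementation linked above, here is a minimal sketch of how Wikipedia citation markup can be generated from a screenshot's metadata. It assumes Wikipedia's `{{cite web}}` template with the `url`, `title`, `website`, `date` and `access-date` parameters. The function name, the choice of fields and the example values are hypothetical and may not match what PastPages emits.

```python
from datetime import date


def wikipedia_citation(url, title, site_name, captured_on, accessed_on):
    """Build {{cite web}} markup from a screenshot's metadata.

    Values are inserted as-is, so a title containing "|" would need
    escaping before use. That step is left out of this sketch.
    """
    fields = [
        ("url", url),
        ("title", title),
        ("website", site_name),
        ("date", captured_on.isoformat()),
        ("access-date", accessed_on.isoformat()),
    ]
    body = " ".join(f"|{name}={value}" for name, value in fields)
    return "{{cite web " + body + "}}"


# Hypothetical example values, not a real PastPages URL.
markup = wikipedia_citation(
    url="http://example.com/screenshot/1/",
    title="Screenshot of the CNN homepage",
    site_name="PastPages",
    captured_on=date(2012, 6, 28),
    accessed_on=date(2012, 6, 29),
)
```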

French parliamentary election viz from L’Express.

Jun 10

PastPages changes to global timestamps

The first versions of PastPages had only one user: Me. So I printed all the timestamps in Los Angeles time, since that’s where I live.

Now that approximately 50 percent of PastPages visitors come from outside the United States, that doesn’t make sense anymore. 

In response, I’ve tried to globalize how the site reports the time.

Where appropriate, the site now prints a relative timestamp that will be correct wherever it’s viewed. For example, the homepage now reports:

In other locations, the site now presents a default timestamp in Coordinated Universal Time, also known as UTC and, for everyday purposes, the same as Greenwich Mean Time. It’s the international standard for this sort of thing. If you’re not familiar with it, it’s roughly the current time in London, though it does not change in the summer for daylight saving time.

This may prove a little awkward at first, especially for U.S. users accustomed to the Internet catering to our vantage on the world. But I’ve tried to make it a little easier to swallow by also providing a publication’s local time, where appropriate. 
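The display styles described so far (a relative timestamp, and a UTC timestamp paired with the publication's local time) can be sketched with Python's standard library. This is an illustration under assumptions, not the site's code: `relative_age` and `describe_capture` are hypothetical names, the output wording is invented, and `zoneinfo` needs Python 3.9+ plus a time zone database on the machine.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def relative_age(captured_at, now):
    """Describe a capture relative to `now`; reads the same in any time zone."""
    minutes = int((now - captured_at).total_seconds() // 60)
    if minutes < 1:
        return "just now"
    if minutes < 60:
        return f"{minutes} minute{'' if minutes == 1 else 's'} ago"
    hours = minutes // 60
    return f"{hours} hour{'' if hours == 1 else 's'} ago"


def describe_capture(captured_at, publication_tz):
    """Show the UTC time first, then the publication's local time."""
    utc = captured_at.astimezone(timezone.utc)
    local = captured_at.astimezone(ZoneInfo(publication_tz))
    return f"{utc:%Y-%m-%d %H:%M} UTC ({local:%H:%M} {local:%Z} local)"


captured = datetime(2012, 6, 28, 14, 2, tzinfo=timezone.utc)
print(describe_capture(captured, "America/New_York"))
# prints "2012-06-28 14:02 UTC (10:02 EDT local)"
```

Storing every timestamp as a time-zone-aware UTC value and converting only at display time keeps the two views consistent with each other.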

You can see this on site detail pages:

And on screenshot detail pages:

Also, wherever screenshots are grouped by date, each day now begins and ends at midnight UTC. This seems like the slipperiest thing to me, and I’d be interested to hear opinions on the best way to present that. Should these also be grouped by a publication’s local time?
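To make the trade-off concrete, here is a small sketch showing how the same two captures fall into different days depending on whether the boundary is midnight UTC or midnight in a publication's local time. The `group_by_day` helper and the sample timestamps are hypothetical, written only for this illustration.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo


def group_by_day(timestamps, tz=timezone.utc):
    """Bucket aware datetimes by calendar date as seen in time zone `tz`."""
    groups = defaultdict(list)
    for ts in timestamps:
        groups[ts.astimezone(tz).date()].append(ts)
    return dict(groups)


captures = [
    datetime(2012, 6, 10, 23, 0, tzinfo=timezone.utc),
    datetime(2012, 6, 11, 2, 0, tzinfo=timezone.utc),
]

by_utc = group_by_day(captures)
by_local = group_by_day(captures, ZoneInfo("America/Los_Angeles"))

print(sorted(by_utc))    # two days: June 10 and June 11
print(sorted(by_local))  # one day: June 10
```

A capture made at 02:00 UTC on June 11 is still the evening of June 10 in Los Angeles, so a reader browsing by a West Coast paper's local date would expect to find it under June 10.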

This is new ground for me as a developer, so there’s a good chance I’ve overlooked a better solution or created a new problem. But I want to get this right, so if you have advice please contact me or file a bug report on GitHub.