Miskatonic University Press

Notifying the Internet Archive when a new post is published

code4lib indieweb jekyll

I saw a mention of the IndieWeb idea of notifying the Internet Archive when a new page is posted.

Trigger an Archive

You can tell archive.org to crawl and archive a specific URL immediately.

$ curl -I -H "Accept: application/json" http://web.archive.org/save/{url to archive} | grep Content-Location

and you'll get a response like:

Content-Location: /web/20160715203015/http://indieweb.org

The response includes the path to the archived page on web.archive.org. Append this path to http://web.archive.org to build the final URL for the archived page.
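That last step can be sketched as a small POSIX shell function. The function name archive_url_from_header is hypothetical, and it assumes the header looks like the example above. It strips everything up to the first colon, so the header name's capitalization doesn't matter. It also strips trailing whitespace, including the carriage return that ends raw HTTP header lines.

```shell
#!/bin/sh
# Hypothetical helper: turn a "Content-Location: /web/..." header line
# into a full Wayback Machine URL by prefixing http://web.archive.org.
archive_url_from_header() {
    # Drop the header name (everything up to the first colon) plus the
    # following spaces, then drop trailing whitespace such as a CR.
    path=$(printf '%s\n' "$1" | sed -e 's/^[^:]*:[[:space:]]*//' -e 's/[[:space:]]*$//')
    printf 'http://web.archive.org%s\n' "$path"
}

# Example, using the response shown above:
archive_url_from_header "Content-Location: /web/20160715203015/http://indieweb.org"
# prints http://web.archive.org/web/20160715203015/http://indieweb.org
```

Assuming the save endpoint still returns that header, the output of the curl and grep pipeline could be passed straight to the function as its argument.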

I use Jekyll for this site, and I manage building and publishing with a Makefile. I added this trigger to it, and now the publish target looks like this:

publish:
        rsync --archive --compress --itemize-changes /var/www/miskatonic/production/ myhostingsite:public_html/miskatonic.org/
        curl --head --silent --header "Accept: application/json" http://web.archive.org/save/www.miskatonic.org/ | grep Content-Location
        notify-send "Web site is now live"

Now, this only tells the Internet Archive to save my site’s home page. It doesn’t specify which pages have been added or updated. That would require keeping track of all the site’s content and checking for differences every time I publish, which is certainly possible but would mean writing a new plugin. Adding one line to the Makefile is far easier and gets 95% of the work done.
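One hypothetical alternative to a plugin, and not the approach used here, would be to reuse what rsync already reports. With --itemize-changes, each uploaded file is printed after a change code, and lines starting with `<f` are regular files sent to the remote host. This sketch assumes rsync 3.x's 11-character change codes and the www.miskatonic.org host from the Makefile. It also assumes the server serves dir/index.html at dir/. The function name changed_save_urls is made up. The function only prints save URLs and never contacts the Internet Archive.

```shell
#!/bin/sh
# Hypothetical sketch: read rsync --itemize-changes output on stdin and
# print a Wayback "save" URL for each uploaded HTML file.
# SITE is an assumption taken from the Makefile's curl line.
SITE="www.miskatonic.org"

changed_save_urls() {
    # "<f" marks a regular file sent to the remote host; the file name
    # starts at column 13, after the 11-character code and one space.
    grep '^<f' | cut -c13- | grep '\.html$' | while IFS= read -r file; do
        # Assume dir/index.html is served as dir/ and archive that URL.
        case "$file" in
            index.html) page="" ;;
            */index.html) page="${file%index.html}" ;;
            *) page="$file" ;;
        esac
        printf 'http://web.archive.org/save/%s/%s\n' "$SITE" "$page"
    done
}

# Example with made-up rsync output; prints two URLs:
#   http://web.archive.org/save/www.miskatonic.org/2016/07/15/new-post/
#   http://web.archive.org/save/www.miskatonic.org/
printf '%s\n' '<f+++++++++ 2016/07/15/new-post/index.html' \
    '<f.st...... index.html' '<f.st...... css/main.css' '.d..t...... 2016/' |
    changed_save_urls
```

The rsync output could then be piped through this function and each URL requested with something like `xargs -n 1 curl --head --silent`. That wiring is untested, and the save endpoint may not behave the same way today.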