Today’s Data Scraping Mini Episode of the Partially Derivative podcast appalled me. It’s about scraping data off web sites, and there’s lots of laughter and joking as Chris Albon relates how, in preparation for his PhD comps, he wrote a script to download thousands of articles from JSTOR. And sure enough, he got in a little bit of trouble because he hadn’t read the terms of service, and the next day the university librarian passed the word down that all that had to stop immediately or there would be serious repercussions. More wry laughter at having escaped without anything worse.
All that and no mention, not one, of Aaron Swartz. I don’t know when Albon was doing the downloading, but from the dates on his MA and PhD it looks like it might have been after Swartz got arrested in January 2011 (and before he committed suicide in 2013). Even if it was before that, it astounds me that anyone now could talk about doing a mass download from JSTOR without mentioning Swartz. I can understand there wasn’t time to get into how messed up everything about scholarly publishing is, but a joking reminder to read the terms of service trivializes important issues that everyone in this field should know and discuss.
I met Swartz very briefly one day in 2008, and wrote it up after he died.