Miskatonic University Press

Access 2009, Saturday #4: William J. Turkel, Hacking as a Way of Knowing


Access 2009 closed with an excellent talk by William J. Turkel, a historian at the University of Western Ontario. Check out the Lab for Humanistic Fabrication and an online book he co-wrote, The Programming Historian, "an open-access introduction to programming in Python, aimed at working historians (and other humanists)."

The video of the talk isn't online, but it probably will be soon, and I recommend you watch it or listen to it. Peter Zimmerman blogged about it, and I know there was a lot of talk in the IRC backchannel but I haven't seen other blog posts about it.

This was exciting stuff and a great talk to close the conference. I talked to Turkel after about this and that and really enjoyed the conversation. To exchange contact information we each held our Android phone up to the other's and scanned a QR code that contained all the details.

A note about the data compression and Kolmogorov complexity stuff mentioned below. I asked him about this afterwards and this is my understanding. If you have a set of texts, for example biographies from the Dictionary of Canadian Biography, you can find groupings of related texts by using compression. Take text A and text B and compress them to A.bz2 and B.bz2 (using bzip2). Concatenate A and B into AB, compress that to AB.bz2, and then compare the size of AB.bz2 to the size of A.bz2 + B.bz2. If A and B have a lot of words in common, then AB will have duplications that let the compression algorithm shrink the compressed file it generates. If AB.bz2 is smaller than A.bz2 + B.bz2, you know bzip2 has been able to do some extra compressing, so there's duplication between A and B, so they're related. Doing pairwise comparisons of compression ratios like this will group all the texts into clusters. For example, the articles about the Canadian Pacific Railway and Sir Sandford Fleming will have a lot of overlap, but the article about Leonard Cohen will use different words, so when you compress and compare them this way the ratios will tell you that the first two are more related to each other than either is to the third.
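Here's a minimal sketch of that pairwise test in Python, using the standard library's bz2 module. The three "biographies" are invented stand-ins, not actual Dictionary of Canadian Biography entries:

```python
import bz2

def compressed_size(text: bytes) -> int:
    """Size of a text after bzip2 compression."""
    return len(bz2.compress(text))

def relatedness_gap(a: bytes, b: bytes) -> int:
    """How many bytes compressing A and B together saves over
    compressing them separately. A bigger gap means more shared
    wording, hence more related texts."""
    return (compressed_size(a) + compressed_size(b)) - compressed_size(a + b)

# Invented stand-in texts: the first two share a lot of wording.
fleming = (b"Sandford Fleming surveyed the route of the Canadian Pacific Railway. " * 10
           + b"Fleming also proposed worldwide standard time.")
railway = (b"Sandford Fleming surveyed the route of the Canadian Pacific Railway. " * 10
           + b"The line crossed the prairies and the mountains.")
cohen = b"Leonard Cohen wrote poems and novels in Montreal before turning to song. " * 10

print(relatedness_gap(fleming, railway))  # large gap: lots of duplication
print(relatedness_gap(fleming, cohen))    # smaller gap: little shared wording
```

The gap here is the raw version of the comparison described above (A.bz2 + B.bz2 versus AB.bz2); real implementations usually normalize it into a distance.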

When he says that data compression is a proxy for Kolmogorov complexity, he means that if you have the source code for bzip2 and a compressed text file, you have, roughly, the minimum amount of information you need to express the original text file. (Assume that bzip2's source code is as efficient as possible and that its algorithm is the most efficient we know.) I'd come across this idea of complexity before in reading some of Gregory Chaitin's work on algorithmic information theory, but I don't know much about it, and I had never heard of this clustering and compression stuff. Very cool.

Here are my notes:

Research is built on a framework of citations, but it's not presented that way; people write books as though they were a seamless whole. Footnotes are there to plug one's work into everything else. Part of what historians do is dig into other people's footnotes, to find things or to attack.

Works built on other works. Audience wants to talk back. Rich backchannel going on here at Access, for example. Scholars should do something with all of that audience/reader-generated stuff.

How can scholarly works be designed so that they can be hacked, remixed, reused in uncontrolled ways? That requires open source, open access.

What if history were written by anonymous people? Showed http://en.wikipedia.org/wiki/Hack as example. Wikipedia tip of iceberg of how we deal with authority etc. New form of culture, relevance, peer review, etc. Traditional forms of scholarship can be rethought so we can do peer review as networks.

Textual analysts would love Wikipedia. Every change is recorded and available for download. Showed History Flow diagram of history of edits of a Wikipedia page.

Another project: code_swarm. Take history of edits to a FLOSS project and visualize it as a movie. [I ran this on the VuFind source code a few months ago. It's easy to use and the results are wild. Try it out.]

When writing the history of the Apache project, for example, in two or three decades a historian will have all kinds of material available that was never available to historians before. Tens, hundreds, thousands of individuals networked together, exchanging information in time. The job of humanists is to make sense of that, of what we do as human beings.

PW Anderson (Nobel) wrote short paper in Science (Vol. 177, No. 4047, 4 August 1972): More Is Different. At different scales, different laws kick in. He's against reductionism.

Compare paper vs digital. Reproducing paper costs about the same as producing it. Reproducing digital is zero cost.

Little attention paid to what scholarly communication will/should look like after the monograph.

Plebeian Lives. Will be able to follow about 40% of the people who lived in eighteenth-century London all through their lives, through all of the data that's been aggregated about them. http://www.shef.ac.uk/hri/projects/projectpages/plebeianlives.html

Also mentioned Old Bailey project. http://www.oldbaileyonline.org/

Machine learning, users training machines to generate better search results, machines learning from learners as they learn.

He's also doing visualization with large data sets in the humanities: "This is really awesome and it's pretty simple to do and it's surprisingly deep." Take compression algorithms (gzip, zip) and use them to do other cool things. You can use them to build a general-purpose yardstick to see if two files are related or different. Use the compressor's ability to find redundancies between texts. Used on the Dictionary of Canadian Biography it sorts the entries into clusters: Jesuits, Frobishers, Quebeckers all fall into groups. Data compression serves as a proxy for Kolmogorov complexity.
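A rough sketch of how that clustering step could start, again with Python's built-in bz2 and invented stand-in texts (a real run would use actual DCB entries and a proper clustering algorithm on the full distance matrix). For each text, find which other text it compresses best against, using the normalized compression distance:

```python
import bz2

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance: near 0 for near-duplicates,
    closer to 1 for unrelated texts."""
    ca, cb = len(bz2.compress(a)), len(bz2.compress(b))
    return (len(bz2.compress(a + b)) - min(ca, cb)) / max(ca, cb)

def nearest_neighbours(docs: dict) -> dict:
    """Map each text's name to the name of the other text it is
    closest to under NCD."""
    return {
        name: min((other for other in docs if other != name),
                  key=lambda other: ncd(docs[name], docs[other]))
        for name in docs
    }

# Invented stand-ins: the two "Jesuit" articles share wording, the third doesn't.
docs = {
    "brebeuf": (b"The Jesuit missionaries travelled by canoe to the Huron settlements. " * 10
                + b"Brebeuf kept a detailed journal of the mission."),
    "lalemant": (b"The Jesuit missionaries travelled by canoe to the Huron settlements. " * 10
                 + b"Lalemant wrote long letters back to France."),
    "cohen": b"Leonard Cohen wrote poems and novels in Montreal before turning to song. " * 10,
}

print(nearest_neighbours(docs))  # the two Jesuit articles pair up
```

Grouping texts by nearest neighbour like this is the simplest possible clustering; the pairwise NCD matrix also feeds directly into standard hierarchical clustering.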

http://en.semapedia.org/ Print out 2d bar codes that give a link to the Wikipedia entry for the place the code is stuck. Geotagging.

Smartphones have compasses and accelerometers built in. Augmented reality. He has an Android.

Merging of physical and virtual changing the way we understand place and the way we understand the past. Each thing in a place is a time traveler from the past.

Some scenes become contested: a crime scene, a plane crash, tracks at a watering hole. Each scene has interpreters who can make sense of what it means. Now we can attach virtual sources to physical objects. Every physical thing can carry more information about itself.

Quoted Bruce Sterling. Every object can become the protagonist of a documented process.

What happens to the idea of a curated object when every object knows its own history?

The future of public history looks like the holodeck, say Turkel's students. In Star Trek, the holodeck is often used to simulate the past, e.g. Janeway being an Austen-era governess.

He tells students: Brainstorm gadgets, appliances that somehow magically dispense history. What would it be like if a tap on the wall could pour out history? But you can't reinvent the holodeck.

They come up with: Heritage Knitting Needles that remember every pattern they've ever knitted. A Reverse Babel Fish: put it in your ears and it makes everyone around you sound as though they're speaking Middle English or Ancient Greek or some language you don't know. Tangible Spray: spray it around, it makes a cloud, and you can reach in and feel the past until the cloud dissipates. [This reminded both Bess Sadler and me of Ubik.]

The Star Trek world is a cradle-to-cradle universe. Everything exists for only a brief time: it comes out of the replicator, is used until it's no longer needed, and is then recycled and re-replicated as something else. What's the role of curation there? What about a Buddhist temple that's burned down and rebuilt every 20 years? (It's not been given heritage status because the object itself is not old enough, even though the ritual is thousands of years old.)

For way too long we've thought of computers as objects on desks. They let us become brains in vats if we want. We can use them to make virtual ghosts to haunt real places. They can let us manipulate physical objects and data at the same time.

Ubiquitous computing, fabrication. Student assignment: visualize the work of William Harvey. He used some ethically repugnant techniques in his work.

"Matter is the new medium." With computers we can make documents on our desks, and now it's getting so we can make objects on our desks. For less than $5,000 you can buy a laser scanner that will turn a 3D thing into a 3D model. They did that with a model heart for the Harvey project.

Digital data very plastic, reusable, scalable. They had a computer-controlled milling machine to print out physical objects. Using a computerized casting process means you can change the scale and size of the objects, which you can't with regular casting.

Loop around between virtual and tangible. Take something real, digitize it, change it, make a new real thing. We can take advantage of each half and make hybrids that invite hacking, open source, reshaping.

Most of Turkel's colleagues don't get all this stuff he's talking about. They think it's for scientists and engineers.

He runs hackfests for humanists. "Ever since Lévi-Strauss we've been celebrating the bricoleur, but it's another thing to be faced with a pile of waste and turn it into something else."

We need new spaces and tools to do all this. Kindergartens are better equipped for hands-on research than grad schools. Not too hard to buy a milling machine, harder not to poison your students or set the carpet on fire when you're soldering.

Role of hobbyists mostly written out of history of technology.

Mention of a woman who has a home fabbing setup and can make her own silicon chips. The Toaster Project: a guy making a toaster from scratch, starting by mining iron.

RepRap, a machine that can replicate things, including (most of) itself.


Ultimate goal: you have the scanner, the 3d printer, and then a recycling device all on your table.

What happens if you could make all of those plastic things in stores? What if you could give away the plans for them, as open source?

Instructables. People sharing how to do things.

Pachube. Mash up things in the Internet of things, feeds of realtime data open to everyone.

"If you look through a Victorian book of mechanisms ..." Now you can print out everything in that book to see how it works.

We can now print out things from past environments, handle them, feel them, use them to enrich the research process.