Miskatonic University Press

ARL statistics visualized with R and Google Motion Charts

28 July 2011 r librarystats

I fixed a couple of things after the last post and now we can use motion charts to visualize all the Association of Research Library statistics using R and googleVis. I’ve got a small example below, set up so that it’s showing four particularly interesting things to start, and you can just press play to see something good:

  • along the y-axis: TOTSTU, total number of students at the universities (see the ARL's Principles of Membership for what it means for a library and its parent university to belong to the ARL; serious research will be happening there)
  • along the x-axis: FAC, number of faculty
  • colour of the circles: TYPE, blue for Canadian universities (C); green for private American universities (P); yellow for public state American universities (S)
  • size of the circles: TOTEXP, total expenditures of the libraries

(Again, if you’re viewing this through an RSS feed and not seeing a fancy graph with a lot of coloured circles on it, come to this page and try out the motion chart.)

Chart generated with R version 2.12.1 (2010-12-16) and googleVis 0.2.7.

Notice how the green (private American) universities sit lower down: they generally have fewer students. The line they form lies closer to the x-axis because they generally have more faculty per student. The yellow circles (American state universities) angle up much more steeply because those schools generally have more students and fewer faculty per student. The blue Canadian universities are mixed in with the yellow ones.

Pennsylvania State is a big outlier in the American universities, far up and to the right with lots of faculty and lots of students. U of Toronto is the outlier among Canadian universities—it’s the biggest in this country. When the graph stops in 2009 there are three large green circles with just over 2,000 faculty, running up the centre of the chart: Yale, Harvard and Columbia.
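The pattern in the chart comes down to a ratio of students to faculty, so it can also be checked numerically. Here's a minimal sketch in base R, assuming a data frame shaped like arl with YEAR, TYPE, TOTSTU and FAC columns; students_per_faculty is a hypothetical helper name, not part of the ARL data or of googleVis.

```r
# Hypothetical helper: median students per instructional faculty member,
# grouped by institution TYPE (C, P, S), for one YEAR.
# Assumes columns YEAR, TYPE, TOTSTU and FAC, as in the ARL CSV.
students_per_faculty <- function(df, year) {
  # Keep the requested year with usable student and faculty counts;
  # which() drops rows where the condition evaluates to NA
  keep <- which(df$YEAR == year & !is.na(df$TOTSTU) &
                  !is.na(df$FAC) & df$FAC > 0)
  rows <- df[keep, ]
  # Each institution's own ratio, so one very large school
  # doesn't swamp the rest of its group
  ratios <- rows$TOTSTU / rows$FAC
  tapply(ratios, as.character(rows$TYPE), median)
}

# Usage, once arl has been loaded as in the commands below:
# students_per_faculty(arl, 2009)
```

Using the median of per-institution ratios rather than total students over total faculty keeps an outlier like Penn State from pulling its whole group around.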

A related pair of variables to chart is TOTSTU vs PRFSTF (professional staff). A really interesting pair is EXPSER (expenditures for current serials) vs SERPUR (current serials purchased). EXPSER/SERPUR is how much a library spends, on average, on each serial it buys in a year, and the motion chart of the last twenty years of EXPSER against SERPUR shows how crazy all this has become and why serials purchasing is such a problem for libraries now.
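That per-serial cost can also be added as its own column and charted next to the two raw variables. A minimal sketch, assuming arl has been loaded as below; add_serial_cost and the SERCOST column are hypothetical names, not ARL variables.

```r
# Hypothetical helper: add SERCOST, the average amount spent per current
# serial purchased (EXPSER / SERPUR). Rows where SERPUR is missing or zero
# get NA instead of Inf or NaN.
add_serial_cost <- function(df) {
  ok <- !is.na(df$SERPUR) & df$SERPUR > 0
  df$SERCOST <- NA_real_
  df$SERCOST[ok] <- df$EXPSER[ok] / df$SERPUR[ok]
  df
}

# Usage (needs the ARL data and googleVis, as in the commands below):
# arl.ser <- add_serial_cost(arl)
# M <- gvisMotionChart(arl.ser[, c("INAM", "YEAR", "EXPSER", "SERPUR", "SERCOST")],
#                      idvar = "INAM", timevar = "YEAR")
# plot(M)
```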

How to recreate this in R:

> arl <- read.csv("http://www.miskatonic.org/files/arl-1989-2009.csv")
> install.packages("googleVis")
> library(googleVis)
> arl.toplot <- subset(arl, subset = TYPE %in% c("C", "P", "S"),
select=c(YEAR, TYPE, INAM, FAC, TOTSTU, TOTEXP, VOLS, SERPUR,
TOTCIRC, PRFSTF, TOTSTF, EXPMONO, EXPSER, SALPRF))
> M <- gvisMotionChart(arl.toplot, idvar="INAM", timevar="YEAR")
> plot(M)

To keep this page small enough that my CMS could deal with it (there’s about 300K of data embedded in it) I’m only showing a very small subset of all available variables:

  • FAC: instructional faculty
  • TOTSTU: total full-time student enrolment
  • TOTEXP: total library expenditures
  • VOLS: volumes held
  • SERPUR: current serials purchased
  • TOTCIRC: total circulations
  • PRFSTF: professional staff (librarians and others)
  • TOTSTF: total professional and support staff
  • EXPMONO: expenditures for monographs
  • EXPSER: expenditures for current serials
  • SALPRF: professional salaries

Using the code above it's easy to recreate this chart at home. If you do, you can leave out the select bit and it will graph all the variables. The subset command also picks out three kinds of institutions and leaves out national libraries like Library and Archives Canada and the Library of Congress, which in many ways aren't comparable to university libraries; drop the subsetting to see what happens when they're included. To visualize the entire ARL data set, run this:

> arl <- read.csv("http://www.miskatonic.org/files/arl-1989-2009.csv")
> install.packages("googleVis")
> library(googleVis)
> M <- gvisMotionChart(arl, idvar="INAM", timevar="YEAR")
> plot(M)

(Aside from loading in the googleVis package, that’s three lines: get the data, prep the data, show the data. Powerful!)
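Before deciding whether to keep the national libraries in, it can help to see exactly which institutions the TYPE filter leaves out. A sketch, assuming the same arl data frame; dropped_institutions is a hypothetical helper.

```r
# Hypothetical helper: names of institutions whose TYPE is not one of the
# kept codes (by default the C, P and S used in the subset above).
# Rows with a missing TYPE count as dropped, as they are by subset().
dropped_institutions <- function(df, keep = c("C", "P", "S")) {
  out <- df$INAM[!(df$TYPE %in% keep)]
  sort(unique(as.character(out)))
}

# Usage, after read.csv() as above:
# dropped_institutions(arl)
```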


Canadian library statistics visualized with R and Google Motion Charts

21 July 2011 r librarystats

Here’s an example of using the googleVis package for R, which makes it easy to use the Google Visualization API.

(If you’re reading this through an RSS feed and don’t see an interactive chart below, come see the full blog post. You’ll want to play with it.)

I already had some cleaned-up Association of Research Library statistics sitting around in arl-1989-2009.csv, and then all it took was this in R (and setting up canada.toplot could have been done in one line):

> install.packages('googleVis')
> library(googleVis)
> arl <- read.csv("http://www.miskatonic.org/files/arl-1989-2009.csv")
> canada <- subset(arl, REGION == 10) # the Canadian libraries
> canada.toplot <- canada[, c(1, 3, 39, 41, 42, 44, 55, 57, 66, 67, 70)] # eleven columns, picked by position
> M <- gvisMotionChart(canada.toplot, idvar="INAM", timevar="YEAR")
> plot(M) # to plot it locally
> cat(M$html$chart, file="chart.html") # so I could include it here
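As mentioned above, the canada and canada.toplot steps can be folded into one line. A minimal sketch that wraps that one line in a function (canada_toplot is a hypothetical name) and keeps the same column positions; which() drops rows with a missing REGION, just as subset() does.

```r
# Hypothetical helper: the REGION == 10 rows of arl and the same eleven
# columns, chosen by position, in a single indexing step.
canada_toplot <- function(arl,
                          cols = c(1, 3, 39, 41, 42, 44, 55, 57, 66, 67, 70)) {
  arl[which(arl$REGION == 10), cols]
}

# Usage, then plot exactly as before:
# canada.toplot <- canada_toplot(arl)
```

Picking columns by name instead of by position would keep this working if the column order in the CSV ever changed.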

The variable names are what the ARL uses. They are:

Try starting off with TOTSTU against FAC or TOTSTU against TOTCIRC. There are lots of other variables that could be plotted, but to keep it manageable I just picked out some I thought would be interesting. Try turning on the log view and seeing how that changes things.

(It's funny how Library and Archives Canada jumps in at the end out of nowhere with a very large number of staff. Did they just recently join the ARL? Is my data wrong?)


Zen problems

20 July 2011 drupal

The Drupal theme Zen, that is, not the branch of Buddhism. Earlier tonight I used drush to do some upgrades to this site and in the process the page layout got completely buggered. I use a theme based on Zen (a sub-theme, as it’s called), and it broke. Rob Casson told me that the Zen theme people had made some big changes and it turned out that upgrading Zen from 6.x-1.x to 6.x-2.x pretty much requires that you redo your subtheme. So I did that and things are more or less back to where they were.

I might migrate to WordPress. Drupal is overkill for what I do here. I've used the same theme on another blog, a WordPress one, for six years and never had the slightest problem with it.


Divisiveness, Communication Failure, and Boundary Wars as Tragicomedy

10 July 2011 talks theatre

Adam Taves and I were down in New Orleans last month to take part in Strange Bedfellows: IT and Reference Collaborations to Enhance User Experiences. It was a day-long workshop run by RUSA (the Reference and User Services Association) the day before the big American Library Association annual conference started.

We did Divisiveness, Communication Failure, and Boundary Wars as Tragicomedy, a revised version of After Launching Search and Discovery, Who Is Mission Control? which we performed at Access last October.

Bess Sadler projected on screen behind me

(Photo by Matt Critchlow of me “talking on videophone” to Bess Sadler. Matt and Dan Suchy were also part of the workshop and talked about two projects they’d worked on at UCSD, one of which they wrote up with Lia Friedman for the Code4Lib Journal: Using an Agile-based Approach to Develop a Library Mobile Website.)

We made some cuts and revisions to the script, but the biggest change was the addition of new, thoughtful and well-informed video clips from Bess Sadler as my fellow systems librarian and from Sophie Bury, Patti Ryan and Lisa Sloniowski as reference and instruction librarians, saying what they want from IT people. Everything is up in our institutional repository:

It's all under a Creative Commons Attribution-Share Alike license, which means you can perform the play yourself at your own library! If you do, please send a picture.

Picture of cat sleeping on front steps of a New Orleans house on a hot evening

Most of the cats I saw in New Orleans were moving pretty slowly or zonked out having a snooze. It was hot. I had a fantastic time. Thanks, New Orleans.


Seven Questions Over Breakfast with Kady MacDonald Denton

06 July 2011 kady.macdonald.denton

My mother, Kady MacDonald Denton, is the subject of a wonderful interview: Seven Questions Over Breakfast with Kady MacDonald Denton.

It’s part of a series at Seven Impossible Things Before Breakfast, a blog about children’s books. The interview is profusely illustrated with pictures of my mother’s books and art and studio and it’s well worth a look.

Here’s a stop-motion video (no sound) of her drawing Bear from the popular Bear and Mouse books:


Usability Testing of VuFind at an Academic Library

14 June 2011 publications

My fellow York University librarian Sarah J. Coysh and I have a paper just out in Library Hi Tech vol. 29 no. 2: "Usability Testing of VuFind at an Academic Library" (DOI: 10.1108/07378831111138189). It does what it says on the tin: it's about usability testing of VuFind as implemented as the York University Libraries catalogue.

Our postprint of the article is available in our institutional repository: Usability Testing of VuFind at an Academic Library. It’s the same as the printed version except for the formatting.

Here’s the abstract:

Purpose – The purpose of this paper is to present the findings of an academic library’s implementation of a discovery layer (VuFind 1.0 RC1) as a next-generation catalogue, based on usability testing and an online survey.

Design/methodology/approach – Usability tests were performed on ten students (eight undergraduates, two graduates), asking a set of 14 task-oriented questions about the customized VuFind interface. Task completion was scored using a simple formula to generate a percentage indicating success or failure. Changes to the interface were made based on resulting scores and on feedback and observations of users during testing. An online survey was also run for three weeks, to which 75 people responded. The results were analyzed, compared and cross-tested with the findings of the usability testing.

Findings – Both the usability testing and survey demonstrated that users preferred VuFind’s interface over the classic catalogue. They particularly liked the facets and the richness of the search results listings. Users intuitively understood how to use the deconcatenated Library of Congress Subject Headings. Despite the discovery layer’s new functionality, known journal title searching still presents a challenge to users and certain terms used in the interface were problematic.

Practical implications – It is hoped that the findings will assist implementers of VuFind and other next-generation catalogues to improve their own systems. The questions add to the body of knowledge about usability testing of library catalogues.

Originality/value – No previous papers have been published documenting VuFind usability testing. Not only will the findings be relevant, not just to VuFind, but they will also add to the growing body of literature on next-generation catalogues.

The "no previous papers" claim became untrue after the paper was accepted and before it was printed. Usability Testing of the VuFind Next-Generation Online Catalogue by Jennifer Emanuel came out in the March 2011 issue of Information Technology and Libraries. It covers very similar ground, and we're glad that the body of work on VuFind is building up.


Punk math

09 June 2011 mathematics

I really enjoyed listening to the Strongly Connected Components podcast interview with Tom Henderson. Henderson (@mathpunk on Twitter) is a professor who takes a punk do-it-yourself attitude to doing and teaching mathematics. In The Philosophy of Punk Rock Mathematics, an interview at Technoccult, he sets it out:

1) People use the average Joe’s poor mathematics as a way to control, exploit, and numerically fuck him over.

2) Mathematics is the subject in which, regardless of what the authorities tell you is true, you can verify every last iota of truth, with a minimum of equipment.

I like this quote:

I’m trying to get across that if you are highly motivating, if you have a high degree of fire and “Fuck yeah!” and “What, that’s impossible, but true!”, you can get students to express interest in theorems named after dead Hungarians.

See also: Edupunk and Libpunk. And maybe listen to “Marquee Moon” by Television.


Footnotes update

08 June 2011 footnotes

Short note that I added Will Self’s Walking to Hollywood and Margaret Weis and Tracy Hickman’s Deathgate Cycle to Fictional Footnotes and Indexes. I’m always interested in hearing about more.