# Better ways of using R on LibStats (2): durations

Posted: 22 April 2014

(In the previous post, Better ways of using R on LibStats (1), I explain the background for this reference desk statistics analysis with R, and I set up the data I use. This follows on, showing another example of how I figured out how to do something more cleanly and quickly.)

In Ref desk 4: Calculating hours of interactions (from almost exactly two years ago) I explained in laborious detail how I calculated the total hours of interaction at the reference desks. I quote myself:

Another fact we record about each reference desk interaction is its duration, which in our `libstats` data frame is in the `time.spent` column. As I explained in Ref Desk 1: LibStats these are the options:

• NA (“not applicable,” which I’ve used, though I can’t remember why)
• 0-1 minute
• 1-5 minutes
• 5-10 minutes
• 10-20 minutes
• 20-30 minutes
• 30-60 minutes
• 60+ minutes

We can use this information to estimate the total amount of time we spend working with people at the desk: it’s just a matter of multiplying the number of interactions by their duration. Except we don’t know the exact length of each duration, we only know it with some error bars: if we say an interaction took 5-10 minutes then it could have taken 5, 6, 7, 8, 9, or 10 minutes. 10 is 100% more than 5: relatively that’s a pretty big range. (Of course, mathematically it makes no sense to have a 5-10 minute range and a 10-20 minute range, because if something took exactly 10 minutes it could go in either category.)

Let’s make some generous estimates about a single number we can assign to the duration of reference desk interactions.

| Duration | Estimate |
|---|---|
| NA | 0 minutes |
| 0-1 minute | 1 minute |
| 1-5 minutes | 5 minutes |
| 5-10 minutes | 10 minutes |
| 10-20 minutes | 15 minutes |
| 20-30 minutes | 25 minutes |
| 30-60 minutes | 40 minutes |
| 60+ minutes | 65 minutes |

This means that if we have 10 transactions of duration 1-5 minutes we’ll call it 10 * 5 = 50 minutes total. If we have 10 transactions of duration 20-30 minutes we’ll call it a 10 * 25 = 250 minutes total. These estimates are arguable but I think they’re good enough. They’re on the generous side for the shorter durations, which make up most of the interactions.

To do all those calculations I made a function, then a data frame of sums, then looped through all the library branches, built up a new data frame for each by applying the function to the sums, then put all those data frames together into a new one. Ugly! And bad!

When I went back to the problem and tackled it with `dplyr` I realized I’d made a mistake right off the bat back then: I shouldn’t have added up the number of “20-30 minute” durations (e.g. 10) and then multiplied by 25 to get 250 minutes total. It’s much easier to use the `time.spent` column in the big data frame to generate a new column of estimated durations and then add those up. For example, in each row that has a `time.spent` of “20-30 minutes” put 25 in the `est.duration` column, then later add up all those 25s. Doing it this way means only ever having to deal with vectors, and R is great at that.
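To make the difference between the two approaches concrete, here is a minimal base R sketch. The four-row `time.spent` vector is made up, the names `durations` and `estimates` are only illustrative, and the estimates are the ones from the table quoted above. Both routes arrive at the same total; the second never leaves the row-per-interaction vector.

```r
# Made-up stand-in for the time.spent column (not real LibStats data).
time.spent <- c("20-30 minutes", "1-5 minutes", "20-30 minutes", "0-1 minute")

# Illustrative names; the estimates are taken from the table quoted above.
durations <- c("0-1 minute", "1-5 minutes", "5-10 minutes", "10-20 minutes",
               "20-30 minutes", "30-60 minutes", "60+ minutes")
estimates <- c(1, 5, 10, 15, 25, 40, 65)

# Old approach: count each category first, then multiply by its estimate.
counts <- table(factor(time.spent, levels = durations))
total.old <- sum(as.vector(counts) * estimates)

# New approach: one estimate per row, then add the rows up.
est.duration <- estimates[match(time.spent, durations)]
total.new <- sum(est.duration)

print(c(old = total.old, new = total.new))
```

The per-row version is the one that combines naturally with grouping, because each row still carries its library and month.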

Here’s the data I’m interested in. I want to have a new `est.duration` column with numbers in it.

``````
> head(subset(l, select=c("day", "question.type", "time.spent", "library.name")))
         day                  question.type    time.spent library.name
1 2011-02-01              4. Strategy-Based  5-10 minutes        Scott
2 2011-02-01              4. Strategy-Based 10-20 minutes        Scott
3 2011-02-01              4. Strategy-Based  5-10 minutes        Scott
4 2011-02-01  3. Skill-Based: Non-Technical  5-10 minutes        Scott
5 2011-02-01              4. Strategy-Based  5-10 minutes        Scott
6 2011-02-01              4. Strategy-Based  5-10 minutes        Scott
``````

I’ll do it with these two vectors and the `match` command, which the documentation says “returns a vector of the positions of (first) matches of its first argument in its second.” Here I set them up and show an example of using them to convert the words to an estimated number.

``````
> possible.durations <- c("0-1 minute", "1-5 minutes", "5-10 minutes", "10-20 minutes", "20-30 minutes", "30-60 minutes", "60+ minutes")
> duration.times <- c(1, 4, 8, 15, 25, 40, 65)
> match("20-30 minutes", possible.durations)
[1] 5
> duration.times[5]
[1] 25
> duration.times[match("20-30 minutes", possible.durations)]
[1] 25
``````

That’s how to do it for one line, and thanks to the way R works, if we say we want this to be done on a column, it will do the right thing.

``````
> l$est.duration <- duration.times[match(l$time.spent, possible.durations)]
> head(subset(l, select=c("day", "question.type", "time.spent", "library.name", "est.duration")))
         day                  question.type    time.spent library.name est.duration
1 2011-02-01              4. Strategy-Based  5-10 minutes        Scott            8
2 2011-02-01              4. Strategy-Based 10-20 minutes        Scott           15
3 2011-02-01              4. Strategy-Based  5-10 minutes        Scott            8
4 2011-02-01  3. Skill-Based: Non-Technical  5-10 minutes        Scott            8
5 2011-02-01              4. Strategy-Based  5-10 minutes        Scott            8
6 2011-02-01              4. Strategy-Based  5-10 minutes        Scott            8
``````
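Two things about this lookup are worth seeing on a tiny example: what happens to a value that is not in `possible.durations`, and an equivalent way of writing the lookup with a named vector. This is only a sketch. The three-element `time.spent` vector is invented, and `duration.lookup` is a name introduced here for illustration.

```r
possible.durations <- c("0-1 minute", "1-5 minutes", "5-10 minutes", "10-20 minutes",
                        "20-30 minutes", "30-60 minutes", "60+ minutes")
duration.times <- c(1, 4, 8, 15, 25, 40, 65)

# Invented column with a missing value, as read.csv would produce for an "NA" entry.
time.spent <- c("5-10 minutes", NA, "60+ minutes")

# match() gives NA for anything it cannot find, and indexing with NA gives NA.
by.match <- duration.times[match(time.spent, possible.durations)]

# Alternative: a named vector used as a lookup table (hypothetical name).
duration.lookup <- setNames(duration.times, possible.durations)
by.name <- unname(duration.lookup[time.spent])

print(by.match)
print(by.name)
```

Either way the unmatched row ends up as `NA` in the new column rather than as 0 minutes, so it has to be dealt with when the durations are added up.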

Now with `dplyr` it’s easy to make a new data frame that lists, for each library and month, how many ref desk interactions happened and an estimate of their total duration. First I’ll take a fresh sample so I can use the `est.duration` column.

``````
> l.sample <- l[sample(nrow(l), 10000),]
> sample.durations.pm <- l.sample %.% group_by(library.name, month) %.% summarise(minutes = sum(est.duration, na.rm = TRUE), count=n())
> sample.durations.pm
Source: local data frame [274 x 4]
Groups: library.name

   library.name      month minutes count
1           ASC 2011-09-01      77     7
2           ASC 2011-10-01      66     2
3           ASC 2011-11-01      13     7
4           ASC 2012-01-01      41     3
5           ASC 2012-02-01      11     5
6           ASC 2012-03-01       1     1
7           ASC 2012-04-01       4     1
8           ASC 2012-05-01      23     3
9           ASC 2012-06-01       8     2
10          ASC 2012-07-01       4     1
..          ...        ...     ...   ...
> ggplot(sample.durations.pm, aes(x=month, y=minutes/60)) + geom_bar(stat="identity") + facet_grid(library.name ~ .) + labs(x="", y="Hours", title="Estimated total interaction time (based on a small sample only)")
``````

The `count` column is made the same way as last time, and the `minutes` column uses the `sum` function to add up all the durations in each grouping of the data. (`na.rm = TRUE` removes any NA values before adding; without that R would say 5 + NA = NA.)

So easy compared to all the confusing stuff I was doing before.
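For anyone who wants to see the `na.rm` behaviour and the grouped sum without `dplyr`, here is a small base R sketch. The `toy` data frame is invented, and `tapply` is base R's tool for this, swapped in for `group_by` and `summarise`.

```r
# Invented miniature of the l data frame.
toy <- data.frame(
  library.name = c("Scott", "Scott", "Steacie", "Steacie"),
  est.duration = c(8, NA, 15, 25)
)

total.with.na    <- sum(toy$est.duration)                # NA: one value is missing
total.without.na <- sum(toy$est.duration, na.rm = TRUE)  # 48

# Per-library totals; na.rm = TRUE is passed through to sum().
minutes <- tapply(toy$est.duration, toy$library.name, sum, na.rm = TRUE)
print(minutes)
```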

Finally, finding the average duration is just a matter of dividing (`mutate` comes in `dplyr`):

``````
> sample.durations.pm <- mutate(sample.durations.pm, average.length = minutes/count)
> sample.durations.pm
Source: local data frame [274 x 5]
Groups: library.name

   library.name      month minutes count average.length
1           ASC 2011-09-01      77     7      11.000000
2           ASC 2011-10-01      66     2      33.000000
3           ASC 2011-11-01      13     7       1.857143
4           ASC 2012-01-01      41     3      13.666667
5           ASC 2012-02-01      11     5       2.200000
6           ASC 2012-03-01       1     1       1.000000
7           ASC 2012-04-01       4     1       4.000000
8           ASC 2012-05-01      23     3       7.666667
9           ASC 2012-06-01       8     2       4.000000
10          ASC 2012-07-01       4     1       4.000000
..          ...        ...     ...   ...            ...
> ggplot(sample.durations.pm, aes(x=month, y=average.length)) + geom_bar(stat="identity") + facet_grid(library.name ~ .) + labs(x="", y="Minutes", title="Estimated average interaction time (based on a small sample only)")
``````

Don’t take those numbers as reflecting the actual real activity going on at YUL. It’s just a sample, and it conflates all kinds of questions, from directional (“where’s the bathroom”), which take 0-1 minutes, to specialized (generally the deep and time-consuming upper-year, grad and faculty questions, or ones requiring specialized subject knowledge), which can take hours. Include the usual warnings about data gathering, analysis, visualization, interpretation, problem(at)ization, etc.
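One more caveat follows from how the average is built. `n()` counts every row in a group, including rows whose `est.duration` is `NA`, while the sum skips them, so `minutes/count` can come out lower than the average of the known durations. A minimal sketch with invented numbers follows; whether this matters in practice depends on how many `NA` rows a group has.

```r
# Invented group of four interactions, one with no recorded duration.
est.duration <- c(8, NA, 15, 25)

minutes <- sum(est.duration, na.rm = TRUE)  # 48
count   <- length(est.duration)             # 4, the NA row is counted

avg.all.rows   <- minutes / count                   # 12
avg.known.only <- mean(est.duration, na.rm = TRUE)  # 16

print(c(all.rows = avg.all.rows, known.only = avg.known.only))
```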

# Better ways of using R on LibStats (1)

Posted: 22 April 2014

A couple of years ago I wrote some R scripts to analyze the reference desk statistics that we keep at York University Libraries with LibStats. I wrote five posts here about what I found; the last one, Ref desk 5: Fifteen minutes for under one per cent, links to the other four.

Those scripts did their job, but they were ugly, and there were some more things I wanted to do. Because of my recent Ubuntu upgrade, I’m running R version 3.0.2 now, which means I can use the new `dplyr` package by R wizard Hadley Wickham and others. (It doesn’t work on 3.0.1.) The vignette for dplyr has lots of examples, and I’ve been seeing great posts about it, and I was eager to try it. So I’m going back to the old work and refreshing it and figuring out how to do what I wanted to do in 2012—or couldn’t because we only had one year of data; now that we have four, year-to-year comparisons are interesting.

This first post is about how I used to do things in an ugly and slow way, and how to do them faster and better.

I begin with a CSV file containing a slightly munged and cleaned dump of all the information from LibStats.

``````
$ head libstats.csv
timestamp,question.type,question.format,time.spent,library.name,location.name,initials
02/01/2011 09:20:11 AM,4. Strategy-Based,In-person,5-10 minutes,Scott,Drop-in Desk,AA
02/01/2011 09:43:09 AM,4. Strategy-Based,In-person,10-20 minutes,Scott,Drop-in Desk,AA
02/01/2011 10:00:56 AM,4. Strategy-Based,In-person,5-10 minutes,Scott,Drop-in Desk,AA
02/01/2011 10:05:05 AM,3. Skill-Based: Non-Technical,Phone,5-10 minutes,Scott,Drop-in Desk,AA
02/01/2011 10:17:20 AM,4. Strategy-Based,In-person,5-10 minutes,Scott,Drop-in Desk,AA
02/01/2011 10:30:07 AM,4. Strategy-Based,In-person,5-10 minutes,Scott,Drop-in Desk,AA
02/01/2011 10:54:41 AM,4. Strategy-Based,In-person,5-10 minutes,Scott,Drop-in Desk,AA
02/01/2011 11:08:00 AM,4. Strategy-Based,In-person,10-20 minutes,Scott,Drop-in Desk,AA
02/01/2011 11:32:00 AM,3. Skill-Based: Non-Technical,In-person,10-20 minutes,Scott,Drop-in Desk,AA
``````

I read the CSV file into a data frame, then fix a couple of things. The date is a string and needs to be turned into a Date, and I use a nice function from `lubridate` to find the floor of the date, which aggregates everything to the month it’s in.

``````
> l <- read.csv("libstats.csv")
> library(lubridate)
> l$day <- as.Date(l$timestamp, format="%m/%d/%Y %r")
> l$month <- floor_date(l$day, "month")
> str(l)
'data.frame': 187944 obs. of  9 variables:
 $ timestamp      : chr  "02/01/2011 09:20:11 AM" "02/01/2011 09:43:09 AM" "02/01/2011 10:00:56 AM" "02/01/2011 10:05:05 AM" ...
 $ question.type  : chr  "4. Strategy-Based" "4. Strategy-Based" "4. Strategy-Based" "3. Skill-Based: Non-Technical" ...
 $ question.format: chr  "In-person" "In-person" "In-person" "Phone" ...
 $ time.spent     : chr  "5-10 minutes" "10-20 minutes" "5-10 minutes" "5-10 minutes" ...
 $ library.name   : chr  "Scott" "Scott" "Scott" "Scott" ...
 $ location.name  : chr  "Drop-in Desk" "Drop-in Desk" "Drop-in Desk" "Drop-in Desk" ...
 $ initials       : chr  "AA" "AA" "AA" "AA" ...
 $ day            : Date, format: "2011-02-01" "2011-02-01" "2011-02-01" "2011-02-01" ...
 $ month          : Date, format: "2011-02-01" "2011-02-01" "2011-02-01" "2011-02-01" ...
               timestamp                 question.type question.format    time.spent library.name location.name initials
1 02/01/2011 09:20:11 AM             4. Strategy-Based       In-person  5-10 minutes        Scott  Drop-in Desk       AA
2 02/01/2011 09:43:09 AM             4. Strategy-Based       In-person 10-20 minutes        Scott  Drop-in Desk       AA
3 02/01/2011 10:00:56 AM             4. Strategy-Based       In-person  5-10 minutes        Scott  Drop-in Desk       AA
4 02/01/2011 10:05:05 AM 3. Skill-Based: Non-Technical           Phone  5-10 minutes        Scott  Drop-in Desk       AA
5 02/01/2011 10:17:20 AM             4. Strategy-Based       In-person  5-10 minutes        Scott  Drop-in Desk       AA
6 02/01/2011 10:30:07 AM             4. Strategy-Based       In-person  5-10 minutes        Scott  Drop-in Desk       AA
``````

The columns are:

• timestamp: timestamp (not in a standard format)
• question.type: one of five categories of question (1 = directional, 5 = specialized)
• question.format: how or where the question was asked (in person, phone, chat)
• time.spent: time spent giving help
• library.name: the library name
• location.name: where in the library (ref desk, office, info desk)
• initials: initials of the person (or people) who helped

Now I have these fields in the data frame that I will use:

``````
> head(subset(l, select=c("day", "month", "question.type", "time.spent", "library.name")))
         day      month                 question.type    time.spent library.name
1 2011-02-01 2011-02-01             4. Strategy-Based  5-10 minutes        Scott
2 2011-02-01 2011-02-01             4. Strategy-Based 10-20 minutes        Scott
3 2011-02-01 2011-02-01             4. Strategy-Based  5-10 minutes        Scott
4 2011-02-01 2011-02-01 3. Skill-Based: Non-Technical  5-10 minutes        Scott
5 2011-02-01 2011-02-01             4. Strategy-Based  5-10 minutes        Scott
6 2011-02-01 2011-02-01             4. Strategy-Based  5-10 minutes        Scott
``````
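The `day` and `month` columns can also be built without `lubridate`. This is a base R sketch, not what the scripts here do: the second timestamp is invented, and rebuilding the date with day `01` stands in for `floor_date`.

```r
# First value is from the CSV excerpt; the second is invented for illustration.
timestamp <- c("02/01/2011 09:20:11 AM", "03/15/2011 11:32:00 AM")

# as.Date only needs the date part; trailing characters are ignored.
day <- as.Date(timestamp, format = "%m/%d/%Y")

# Floor to the first of the month by rebuilding the date with day 01.
month <- as.Date(format(day, "%Y-%m-01"))

print(data.frame(day, month))
```

Every interaction in a given month ends up with the same `month` value, which is what makes it usable as a grouping column.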

But I’m going to just take a sample of all of this data, because this is just for illustrative purposes, not real analysis. Let’s grab 10,000 random entries from this data frame and put that into `l.sample`.

``````
> l.sample <- l[sample(nrow(l), 10000),]
``````

An easy thing to ask first is: How many questions are asked each month in each library?

Here’s how I did it before. I’ll run the command and show the resulting data frame. I used the `plyr` package, which is (was) great, and its `ddply` function, which applies a function to a data frame and gives a data frame back. Here I have it collapse the data frame `l.sample` along the two columns specified (month and library.name) and use `nrow` to count how many rows result. Then I check how long it would take to perform that operation on the entire data set.

``````
> library(plyr)
> sample.allquestions.pm <- ddply(l.sample, .(month, library.name), nrow)
> head(sample.allquestions.pm)
       month      library.name  V1
1 2011-02-01          Bronfman  63
2 2011-02-01             Scott  60
3 2011-02-01 Scott Information 183
4 2011-02-01              SMIL  57
5 2011-02-01           Steacie  57
6 2011-03-01          Bronfman  46
> system.time(allquestions.pm <- ddply(l, .(month, library.name), nrow))
   user  system elapsed
  2.812   0.518   3.359
``````

The `system.time` line there shows how long the same command takes to run on the entire data frame: about 3.4 seconds! That is slow. Do a few of those, chopping and slicing the data in various ways, and it will really add up.

This is a bad way of doing it. It works! But it’s slow and I wasn’t thinking about the problem the right way. Using `ddply` and `nrow` was wrong: I should have been using `count` (also from `plyr`), which I wrote up a while back, with some examples. That’s a much faster and more sensible way of counting up the number of rows in a data set.

But now that I can use `dplyr`, I can approach the problem in a whole new way.

First, I’ll clear `plyr` out of the way, then load `dplyr`. Doing it this way means no function names collide.

``````
> search()
 [1] ".GlobalEnv"        "package:plyr"      "package:lubridate" "package:ggplot2"   "ESSR"              "package:stats"     "package:graphics"  "package:grDevices"
[9] "package:utils"     "package:datasets"  "package:methods"   "Autoloads"         "package:base"
> detach("package:plyr")
> library(dplyr)
``````

See how nicely you can construct and chain operations with `dplyr`:

``````
> l.sample %.% group_by(month, library.name) %.% summarise(count=n())
Source: local data frame [277 x 3]
Groups: month

        month      library.name count
1  2011-02-01          Bronfman    63
2  2011-02-01              SMIL    57
3  2011-02-01             Scott    60
4  2011-02-01 Scott Information   183
5  2011-02-01           Steacie    57
6  2011-03-01          Bronfman    46
7  2011-03-01              SMIL    59
8  2011-03-01             Scott    71
9  2011-03-01 Scott Information   220
10 2011-03-01           Steacie    61
..        ...               ...   ...
``````

The `%.%` operator lets you chain together different operations, and just for the sake of clarity of reading, I like to arrange things so first I specify the data frame on its own and then walk through the things I do to it. First, `group_by` breaks down the data frame by columns and does some magic. Then `summarise` collapses the different chunks of resulting data into one line each, and I use `count=n()` to make a new column, `count`, which contains the count of how many rows there were in each chunk, calculated with the `n()` function. In English I’m saying, “take the `l.sample` data frame, group it by `month` and `library.name`, and count how many rows are in each grouping.” (Also, notice I didn’t need to use the `head` command to stop it running off the screen; it made it nicely readable on its own.)

It’s easier to think about, it’s easier to read, it’s easier to play with … and it’s much faster. How long would this take to run on the entire data set?

``````
> system.time(l %.% group_by(month, library.name) %.% summarise(count=n()))
   user  system elapsed
  0.032   0.000   0.033
``````

0.033 seconds elapsed time! That is about 1% of the 3.36 seconds the old way took.
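For comparison, the same "count rows per month and library" question can be answered in base R with `table`. This is only a sketch on an invented four-row data frame (`toy`), and it makes no claim about speed relative to `dplyr`.

```r
# Invented stand-in for l.sample.
toy <- data.frame(
  month = as.Date(c("2011-02-01", "2011-02-01", "2011-02-01", "2011-03-01")),
  library.name = c("Scott", "Scott", "Steacie", "Scott")
)

# table() cross-tabulates the two columns; as.data.frame() turns the result
# into one row per combination, with the tally in a column named "count".
counts <- as.data.frame(table(month = toy$month, library.name = toy$library.name),
                        responseName = "count", stringsAsFactors = FALSE)

# table() also lists combinations that never occur; drop those.
counts <- counts[counts$count > 0, ]
print(counts)
```

Note that `month` comes back as a character column here, so it would need `as.Date` again before plotting on a date axis.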

Graphing it is easy, using Hadley Wickham’s marvellous ggplot2 package.

``````
> library(ggplot2)
> sample.allquestions.pm <- l.sample %.% group_by(month, library.name) %.% summarise(count=n())
> ggplot(sample.allquestions.pm, aes(x=month, y=count)) + geom_bar(stat="identity") + facet_grid(library.name ~ .) + labs(x="", y="", title="All questions")
> ggsave(filename="20140422-all-questions-1.png", width=8.33, dpi=72, units="in")
``````

You can see the ebb and flow of the academic year: September, October and November are very busy, then things quiet down in December, then January, February and March are busy, then it cools off in April and through the summer. (Students don’t ask a lot of questions close to and during exam time: they’re studying, and their assignments are finished.)

What about comparing year to year? Here’s a nice way of doing that.

First, pick out the numbers of the months and years. The `format` command knows all about how to handle dates and times. See the man page for `strptime` or your favourite language’s date manipulation commands for all the options possible. Here I use %m to find the month number and %Y to find the four-digit year. Two examples, then the commands:

``````
> format(as.Date("2014-04-22"), "%m")
[1] "04"
> format(as.Date("2014-04-22"), "%Y")
[1] "2014"
> sample.allquestions.pm$mon  <- format(as.Date(sample.allquestions.pm$month), "%m")
> sample.allquestions.pm$year <- format(as.Date(sample.allquestions.pm$month), "%Y")
> head(sample.allquestions.pm)
Source: local data frame [6 x 5]
Groups: month

       month      library.name count mon year
1 2011-02-01          Bronfman    63  02 2011
2 2011-02-01              SMIL    57  02 2011
3 2011-02-01             Scott    60  02 2011
4 2011-02-01 Scott Information   183  02 2011
5 2011-02-01           Steacie    57  02 2011
6 2011-03-01          Bronfman    46  03 2011
> ggplot(sample.allquestions.pm, aes(x=year, y=count)) + geom_bar(stat="identity") + facet_grid(library.name ~ mon) + labs(x="", y="", title="All questions")
> ggsave(filename="20140422-all-questions-2.png", width=8.33, dpi=72, units="in")
``````

This plot changes the x-axis to the year, and facets along two variables, breaking the chart up vertically by library and horizontally by month. It’s easy now to see how months compare to each other across years.

With a little more work we can rotate the x-axis labels so they’re readable, and put month names along the top. The `month` function from `lubridate` makes this easy.

``````
> sample.allquestions.pm$month.name <- month(sample.allquestions.pm$month, label = TRUE)
> head(sample.allquestions.pm)
Source: local data frame [6 x 6]
Groups: month

       month      library.name count mon year month.name
1 2011-02-01          Bronfman    63  02 2011        Feb
2 2011-02-01              SMIL    57  02 2011        Feb
3 2011-02-01             Scott    60  02 2011        Feb
4 2011-02-01 Scott Information   183  02 2011        Feb
5 2011-02-01           Steacie    57  02 2011        Feb
6 2011-03-01          Bronfman    46  03 2011        Mar
> ggplot(sample.allquestions.pm, aes(x=year, y=count)) + geom_bar(stat="identity") + facet_grid(library.name ~ month.name) + labs(x="", y="", title="All questions") + theme(axis.text.x = element_text(angle = 90))
> ggsave(filename="20140422-all-questions-3.png", width=8.33, dpi=72, units="in")
``````
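The reason month labels work well as facets is that they are an ordered factor rather than plain strings, so they sort January to December instead of alphabetically. Here is a base R sketch of the same idea, using the built-in `month.abb` constant in place of `lubridate`'s `month`. The three dates are invented.

```r
# Invented month values.
month <- as.Date(c("2011-09-01", "2011-02-01", "2011-03-01"))

# month.abb is a built-in constant: "Jan", "Feb", ..., "Dec".
month.number <- as.integer(format(month, "%m"))
month.name <- factor(month.abb[month.number], levels = month.abb, ordered = TRUE)

print(sort(month.name))                # calendar order: Feb, Mar, Sep
print(sort(as.character(month.name)))  # same order here, but only by alphabetical luck
print(sort(c("Apr", "Feb", "Jan")))    # plain strings put Apr before Jan
```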

# Ubuntu 14.04 and grub

Posted: 18 April 2014

Ubuntu 14.04 was released yesterday. I have two laptops that run it. I did the unimportant one first, and everything went fine. Then I did the important one, the one where I do all my work, and after restarting it came up with a boot error:

`error: symbol 'grub_term_highlight_color' not found`

I had two reactions. First, boot errors are solvable. The boot stuff is on one part of my hard drive, and my real stuff is on another part, and it’s fine where it is, I just need to fix the boot stuff. Besides, I have backups. So with a bit of fiddling, I’ll be able to fix it. Second, cripes, what the hell? I’ve been using this laptop for six months or a year or more since a major upgrade, and now it’s telling me there’s some problem with how it boots up? That is a load of shite.

Searching turned up evidence other people had the same problem, and they were being blamed for having an improper boot sector or some such business. For a few minutes I felt like non-geeks feel when presented with problems like this: despair … annoyance … frustration … the first pangs of hate.

But such is life. When upgrading a system we must be prepared for possible problems. We cannot expect it to always go smoothly. Even in the face of such technical problems we must try to remain tranquil.

It’s solvable, I remembered. So I downloaded a Boot-Repair Disk image—this is a very useful tool, and it works even though it’s a year old—and put it on a USB drive with startup disk creator, then booted up, ran `sudo boot-repair`, used all the default answers, let it do its work, and everything was all right. Phew.

Aside from that, everything about the upgrade went perfectly fine. This time I did it at the command line with `sudo do-release-upgrade`. It took a while to download all the upgraded packages, but the actual update went quickly and smoothly. My thanks to everyone involved with Debian, Ubuntu, GNU/Linux, and everything else.

(However, I’m glad I had another machine available where I could do the download and set up the boot disk. Without it, I would have been in trouble. I don’t know if a similar problem might have arisen when Windows or MacOS users do an upgrade.)

# The heart of the university

Posted: 15 April 2014

The library is the heart of the University. From it, the lifeblood of scholarship flows to all parts of the University; to it the resources of scholarship flow to enrich the academic body. With a mediocre library, even a distinguished faculty cannot attain its highest potential; with a distinguished library, even a less than brilliant faculty may fulfill its mission. For the scientist, the library provides an indispensable backstop to his laboratory and field work. For the humanist, the library is not only his reference centre; it is indeed his laboratory and the field of his explorations. What he studies, researches and writes is the product of his reading in the library. For these reasons, the University library must be one of the primary concerns for those responsible for the development and welfare of the institution. At the same time, the enormous cost of acquisitions, the growing scarcity of older books, the problem of storage and cataloguing make the library one of the most painful headaches of the University administrator.

From the Report to the Committee on University Affairs and the Committee of Presidents of Provincially-Assisted Universities, by the Commission to Study the Development of Graduate Programmes in Ontario Universities, chaired by Gustave O. Arlt, F. Kenneth Hare, and J.W.T. Spinks, published in 1966. I think I found this in Evolution of the Heart: A History of the University of Toronto Library Up to 1981 by Robert Blackburn, which is a fine book, and very interesting. Blackburn was the chief librarian there for about 25 years.

# Stuff, Standards and Sites

Posted: 14 April 2014

On 26 March 2014 I gave a short talk at the March 2014 AR Standards Community Meeting in Arlington, Virginia. The talk was called “Stuff, Standards and Sites: Libraries and Archives in AR.” My slides and the text of what I said are online.

I struggled with how best to talk to non-library people, all experts in different aspects of augmented reality, about how our work can fit with theirs. The stuff/standards/sites components gave me something to hang the talk on, but it didn’t all come together as well as I’d hoped and in the heat of actually speaking I forgot to mention a couple of important things. Ah well.

I made the slides a new way. They are done with reveal.js, but I wrote them in Emacs with Org and then used org-reveal to export them. It worked beautifully! The diagrams in the slides are done in text in Org with ditaa and turned into images on export.

What I write in Org looks like this (here I turned image display off, but one keystroke makes them show):

When turned into slides, that looks like this:

Working this way was a delight. No more nonsense about dragging boxes and stuff around like in PowerPoint. I get to work with pure text, in my favourite editor, and generate great-looking slides, all with free software.

To turn all the slides into little screenshots, I used this little script I found in a GitHub gist: PhantomJS script to capture/render screenshots of the slides of a Reveal.js powered slideshow. I had to install PhantomJS first, but on my Ubuntu system that was just a simple `sudo apt-get install phantomjs`.

# Dying Every Day

Posted: 11 April 2014

I was in New York a couple of weeks ago, and I went to the Strand Bookstore, that multistory heaven of used and new books. I wandered around a while and got some things I’d been wanting. I wanted to read something set in New York so I looked first at Lawrence Block’s books and got The Burglar in the Closet, which opens with Bernie Rhodenbarr sitting in Gramercy Park, which I’d just passed by on the walk down, and then at Donald E. Westlake and got Get Real, the last of the Dortmunder series, and mostly set in the Lower East Side. Welcome to New York.

While I was standing near a table in the main aisle on the ground floor an older woman carrying some bags passed behind me and accidentally knocked some books to the floor. “Oh, I’m sorry, did I do that?” she said in a thick local accent. A young woman and I both leaned over to pick up the books. I was confused for a moment, because it looked like the cover had ripped, but it hadn’t, the rip was printed.

Then I saw what the book was: Dying Every Day: Seneca at the Court of Nero, by James Romm. A new book about Seneca, the Roman senator and Stoic philosopher! Fate had actually put this book in my hand. “It is destined to be,” I thought, and immediately bought it.

It’s a fine book, a gripping history and biography, covering in full something I only knew a tiny bit about. Seneca wrote a good amount of philosophy, including the Epistles, a series of letters full of Stoic advice to a younger friend, but the editions of his philosophy (or his tragedies) don’t go much into the details of Seneca’s life. They might mention he was a senator and advisor to Nero, and rich (as rich as a billionaire today), but then they get on to analyzing the subtleties of his thoughts on nature or equanimity.

Seneca led an incredible life: he was a senator in Rome, he was banished by the emperor Claudius on trumped-up charges of an affair with Caligula’s sister, but was later called back to Rome at the behest of Agrippina, Nero’s mother, to act as an advisor and tutor to the young man. Five years later, Agrippina poisoned Claudius, and Nero became emperor.

Seneca was very close to Nero and stayed as his advisor for years. It worked fairly well at first, but Nero was Nero. This is the main matter of the book: how Seneca, the wise Stoic, stayed close to Nero, who gradually went out of control: wild behaviour, crimes, killings, and eventually the murder of his mother Agrippina. An attempt to kill her on a boat failed, and then:

None of Seneca’s meditations on morality, Virtue, Reason, and the good life could have prepared him for this. Before him, as he entered Nero’s room, stood a frightened and enraged youth of twenty-three, his student and protégé for the past ten years. For the past five, he had allied with the princeps against his dangerous mother. Now the path he had first opened for Nero, by supporting his dalliance with Acte, had led to a botched murder and a political debacle of the first magnitude. It was too late for Seneca to detach himself. The path had to be followed to its end.

Every word Seneca wrote, every treatise he published, must be read against his presence in this room at this moment. He stood in silence for a long time, as though contemplating the choices before him. There were no good ones. When he finally spoke, it was to pass the buck to Burrus. Seneca asked whether Burrus could dispatch his Praetorians to take Agrippina’s life.

Seneca supported Nero’s matricide.

It’s impossible to match that, and other things Seneca did, with his Stoic writings, but it was all the same man. It’s a remarkable and paradoxical life.

Romm’s done a great job of writing this history. It’s full of detail (especially drawing on Tacitus), with lots of people and events to follow, but it’s all presented clearly and with a strong narrative. If you liked I, Claudius you’ll like this, and I see similar comments about House of Cards and Game of Thrones.

I especially recommend this to anyone interested in Stoicism. Thrasea Paetus is a minor figure in the book, another senator and also a Stoic, but one who acted like a Stoic should have, by opposing Nero. He was new to me. Seneca’s Stoic nephew Lucan, who wrote the epic poem The Civil War, also appears. He was friends with Nero but later took part in a conspiracy to kill the emperor. It failed, and Lucan had to commit suicide, as did Seneca, who wasn’t part of the plot.

There’s a nice chain of philosophers at the end of the book. After Nero’s death, Thrasea’s Stoic son-in-law Helvidius Priscus returns to Rome, as do the great Stoic Musonius Rufus and Demetrius the Cynic. The emperor Vespasian later banished philosophers from Rome (an action that seems very puzzling these days; I’m not sure what the modern equivalent would be), but for some reason let Musonius Rufus stay. One of his students was Epictetus, who had been a slave belonging to Epaphroditus, who in turn had been Nero’s assistant and had been with him when Nero, on the run, committed suicide; in fact, Epaphroditus helped his master by opening up the cut in his throat.

Later the Stoics were banished from Rome again, and Epictetus went to Greece and taught there. He never wrote anything himself, but one of his students, Arrian, wrote down what he said, which is why we now have the very powerful Discourses. And years later this was read by Marcus Aurelius, the Stoic emperor, a real philosopher king.

For a good introduction to the book, listen to this interview with James Romm on WNYC in late March. It’s just twenty minutes.

# Don't give up your library card number

Posted: 09 April 2014

There was good news Monday from the Toronto Public Library (TPL): Toronto Public Library Introduces Online Music and Video. It seemed good at the start, anyway.

Toronto Public Library has introduced a new service that allows customers to download or stream a wide variety of music and video content. With a library card, customers can access music albums from a wide variety of genres, movies, educational television and documentaries. More information is available at tpl.ca/hoopla.

“We’re happy to now offer customers a great selection of music and videos that they can easily stream or download. E-content is our fastest area of growth, with customers borrowing more than 2 million ebooks, eaudio-books and emagazines in 2013. We expect we’ll see even more growth this year with the introduction of online music and video,” said Vickery Bowles, Director of Collections Management at Toronto Public Library.

With just a library card, customers can listen to a wide selection of music albums and watch a variety of video content. Content may be borrowed via a browser, smartphone or tablet and instantly streamed or downloaded with no waiting lists or late fees. Customers may borrow up to five items per month.

Here’s a CBC news report about it: Hoopla comes to Toronto: Toronto’s libraries are introducing a new Netflix-like service.

Seems like a very nice service. I’m happy to see my local library system working to get more streaming media to people in Toronto. I’m unhappy with the privacy implications of this, however. (As is Kate Johnson, a professor at the library school at the University of Western Ontario, who’s interviewed in that video clip: she raises the privacy question, but the reporter completely drops the issue.) Here are my speculations based on a brief examination of what I see.

The TPL’s page about the new service explains how it works. It says you need an “account at hoopladigital.com (library card and email address required to create)” and “because Hoopla requires a separate account to be created, you may wish to review their privacy policy.” The privacy policy is, oddly, a PDF hosted at an unmemorable Cloudfront URL, and not the official privacy policy on Hoopla’s web site. They are different. For example, the web site version says, “As you use the hoopla service, we record how you use our application, including the materials you borrow. This information is reported to your library, content providers, and licensing agencies. When it is reported, it is always reported in aggregate with other patrons. It is never reported in a manner that associates your account with specific content or activities.” (Update at 19:25: that privacy policy link has been corrected to go to Hoopla’s site.)

None of that bothered me particularly, so I went to sign up for an account to try it out. This is the third step in the process:

“Enter your library card number,” it says. “If your library gave you a PIN to use with your library card, please enter it.” I have a PIN, but I stopped here. (I don’t know what happens to people without a password; I’d guess they’re asked to set one up.)

So Hoopla wants my library card number. I posted a comment on Twitter about that and got a number of responses, including three from Michelle Leung (@mishiechau), who said, “we review 3rd prty privcy polcies 2 portect cust + we suggest cust. do the same…” and “we haven’t given hoopla a dump of card #s in advance their systms chk w/ours@ acnt creation time 2 c if user valid..”

Certainly Hoopla needs to be sure that anyone claiming to be a Toronto Public Library user actually is. But it looks like they’re doing it by asking the user for their library card number and password and then asking TPL if that is a valid account.

This is not right. There’s no need for any third party to know my library card number. OAuth would be a better way to do it: as it says, it’s “an open protocol to allow secure authorization in a simple and standard method from web, mobile and desktop applications.” This is what they say to anyone offering services online: “If you’re storing protected data on your users' behalf, they shouldn’t be spreading their passwords around the web to get access to it. Use OAuth to give your users access to their data while protecting their account credentials.”

Who’s behind Hoopla, anyway? It’s a service run by Midwest Tape, who on their Twitter account say “Midwest Tape is a full service DVD, Blu-ray, music CD, audiobook, and Playaway distributor, conducting business exclusively with public libraries since 1989.” They’re run out of Holland, Ohio, in the United States.

I suspect this means the Toronto Public Library is offering a service that requires users to give their library card number and password to an American company that will store it on American servers, which means the data is available to the US government through the PATRIOT Act. (Of course, we also need to assume that all library data can be accessed by our spy agencies, but we need to do what we can.)

I may be wrong. I’ll ask Hoopla and TPL and update this with what I find.

# The Norman Conquests

Posted: 04 March 2014

I had the enormous pleasure on Saturday and Monday of seeing the three plays that make up The Norman Conquests by Alan Ayckbourn, put on by the Soulpepper company here in Toronto. This review of the October 2013 production explains how well done they all were and what great plays they are. It was more excellent work by Soulpepper, even more enjoyable than usual because seeing three plays in such a short time (two Saturday and one Monday) concentrates and intensifies everything.

Here I note two especially interesting things about the trilogy: the chronology and the fact that Norman is a librarian. I admit that second fact is of limited interest to non-librarians, but after all I myself am a librarian.

## Chronology

The three plays all take place over the same weekend with the same six characters, but Table Manners is set in the dining room, Living Together in the sitting room, and Round and Round the Garden in the garden. Each has two acts with two scenes, but the times are staggered, so as you see them—I saw them in that order—the pieces all lock together, and when someone enters a room in one play you realize you saw them leave from another room in another play, or when someone says something offhand in one play you realize they’re covering up an intense experience from another play.

Table Manners

• I.i: The dining room. Saturday evening, 6 pm
• I.ii: The dining room. Sunday morning, 9 am
• II.i: The dining room. Sunday evening, 8 pm
• II.ii: The dining room. Monday morning, 8 am

Living Together

• I.i: The sitting room. Saturday, 6:30 pm
• I.ii: The sitting room. Saturday, 8 pm
• II.i: The sitting room. Sunday, 9 pm
• II.ii: The sitting room. Monday, 8 am

Round and Round the Garden

• I.i: The garden. Saturday, 5:30 pm
• I.ii: The garden. Saturday, 9 pm
• II.i: The garden. Sunday, 11 am
• II.ii: The garden. Monday, 9 am

Round and Round the Garden comes third in the sequence but contains the weekend in time: it begins first, Saturday at 5:30 pm, and ends last, in the garden on Monday morning at 9 am when people are leaving.

Seeing all three, and spending over six hours with the six actors—while sitting in the front row of an arena theatre!—was a marvellous experience.

## Digression

The Norman Conquests was first produced at the Library Theatre, which at the time was inside the library in Scarborough in Yorkshire.

## Librarian

One of the characters is a librarian: Norman, played by Albert Schultz. He does it as a great hairy shambling kind of a man, as many male librarians are, and suitably dressed in a cardigan, as all librarians are. There are a few good library-related lines:

From Table Manners:

I.ii:

Norman: The trouble is, I was born in the wrong damn body. Look at me. A gigolo trapped in a haystack. The tragedy of my life. Norman Dewers—gigolo and assistant librarian.

II.i:

Ruth: Forget it. You couldn’t possibly take Norman away from me. That assumes I own him in the first place. I’ve never done that. I always feel with Norman that I have him on loan from somewhere. Like one of his library books. I’ll get a card one day informing me he’s overdue and there’s a fine to pay on him.

From Living Together:

I.i:

Sarah: I thought you were in a hurry to go somewhere, Norman.

Norman: Not at all.

Reg: Yes, I thought you said you had a—librarian’s conference.

Norman: It’s been cancelled.

Reg: When?

Norman: About ten seconds ago. Due to lack of interest.

Reg: Funny lot these librarians.

I.i:

Sarah: It’s a bit late to consider his feelings now, isn’t it? Having tried to steal Annie from under his nose.

Norman: I wasn’t stealing her, I was borrowing her. For the weekend.

Sarah: Makes her sound like one of your library books.

I.i:

Annie: What are you going to tell Ruth?

Norman: What I was going to tell her anyway. I’ve been on a conference.

Annie: Which finished early?

Norman: Something like that. We ran out of things to talk about. What does it matter? She won’t care. She probably thinks I’m in the attic mending the roof.

Annie: I didn’t know Assistant Librarians had conferences.

Norman: Everybody has conferences.

II.ii:

Ruth: You’re supposed to be at work too.

Norman: I was taken ill, haven’t you heard?

Ruth: I’m amazed they keep you on.

Norman: I’m a very good librarian, that’s why. I know where all the dirty bits are in all the books.

From Round and Round the Garden:

III.I

Tom: Oh. I thought you said you were staying.

Norman: No, I’m just passing through on my way to East Grinstead.