Miskatonic University Press

Bruce Dickinson on Christmas University Challenge

music

I was delighted to see that Iron Maiden singer Bruce Dickinson appeared on BBC Television’s Christmas University Challenge late last year, representing Queen Mary. The Christmas version is a slightly easier celebrity take on the regular University Challenge, which Canadians can think of as an extremely difficult university-level version of Reach for the Top.

Bruce did very well and contributed a lot to the team. Queen Mary won their match, but didn’t make it through to the semi-finals because their score wasn’t high enough compared to the other winners.

Bruce introduces himself.

Bruce introduced himself thus:

Hi, I’m Bruce Dickinson and I graduated in 1978 with a Desmond in modern history, and I ended up being a heavy metal singer, an airline pilot, [also] I brew beer, and I’m just about to discover what does this button do at this quiz.

(A Desmond is rhyming slang for a 2:2 in the UK university marking scheme.)

Bruce recognizes Roger Moore quote.

Above, he has just buzzed to answer this starter for ten:

“Young man, with your devastating good looks and disastrous lack of talent, you should take any job offered to you.” This advice was given by Noël Coward to which actor, who died in 2017? The actor in question once described his range as “left eyebrow raised, right—”

Bruce cut off host Jeremy Paxman with the correct answer: Roger Moore.

In a later question, Bruce stared into space as he tried to remember the answer.

Bruce thinking.

Here he is regretting that he didn’t recognize the Fine Young Cannibals song “She Drives Me Crazy.” However, he did know that Mick Ronson played guitar on The Rise and Fall of Ziggy Stardust and the Spiders from Mars.

Bruce misses a question.

Here he is answering correctly that nickel, like iron, is magnetic at room temperature (the only other one being cobalt).

Bruce knows his metals.

Nickel, of course, is metal.

Dilton smells a book in the library

archie libraries

Recent Archie comics have had several scenes in libraries. Here’s a one-page gag, “Nothing But a Hound-Dog,” about Dilton Doiley (the Riverdale boy genius):

Dilton should not be ashamed.

I’m with Dilton on this. In fact, not only is it nice to smell new books, it’s even more interesting to smell old books. If you ever pick up a book over 100 years old, make sure you stick your nose up close and give it a good smell. If anyone sees you they may make fun of you, but they just don’t get it. Librarians, archivists, book collectors and related types get it.

The comic is most likely set in the fiction section of the Riverdale High School library, with books arranged by the last name of the author. It is unusual that the book Dilton is looking at has nothing written on the cover or the spine.

This is from Archie and Me Comics Digest 3 (February 2018). There are no credits. It is copyright by Archie Comics.

Hiram in the library

archie libraries

Here’s another Archie Comics panel in a library, but this one is different: it’s Hiram Lodge when he was a teenager. The story is “The Collector,” a two-parter, eleven pages long, where Mr. Lodge’s valuable collection of baseball memorabilia, which he has been collecting since he was a boy, is stolen. Spoiler: Archie figures out who did it.

This panel is from Mr. Lodge’s flashback describing how much time he spent on his collection when he was young.

A young Hiram Lodge.

“210–270” is a Dewey Decimal Classification range. The 200s are for religion (almost entirely Christianity, reflecting the origins of the system).

“The Collector” was written by George Gladir, pencilled by Stan Goldberg, inked by Mike Esposito, coloured by Barry Grossman and lettered by Bill Yoshida. It is copyright by Archie Comics. I read it in Archie Comics Double Digest 264 (January 2018), but it originally appeared in Life with Archie 284 (May 1991).

Jughead in the library

archie libraries

Here’s another instance of our favourite Riverdale High students going to the school library: this time it’s from “The All-Seeing Eye,” which I saw in Archie and Me Comics Digest 2 (January 2018) but which was originally printed in Archie’s Pal Jughead 157 (June 2004).

Notice that the books have no shelf labels.

I’m a librarian and I can confirm students sleep in the library all the time. We don’t mind. They’re working hard, they’re tired, and there are few places to grab a quick nap. Nevertheless, no librarian would ever let a student block an aisle like this, or put his feet up on a stack of books. Jughead should know better.

“The All-Seeing Eye” was written by Craig Boldman, pencilled by Rex Lindsay, inked by Rich Koslowski, coloured by Barry Grossman and lettered by Vickie Williams. The comic is copyright by Archie Comics.

A mysterious event earlier today

vagaries

Early this afternoon I was walking south down a major street in Toronto near where I live. A red car, southbound, appears from behind me, pulls to the side, and parks a short way ahead of me. A grey-haired woman, about 60, short and squat, gets out from the passenger side. She is wearing a long wool coat and a longer dress or skirt, and has a long scarf around her neck. She comes towards me on the sidewalk, walking in the opposite direction to her previous travel. I assume she is going to one of the apartment buildings along the street.

When she is very near, she looks at me and says, “Do you speak Polish?”

“No.”

“Russian?”

“No.”

She nods. “Thank you.” Then she walks back to the small red car and it drives off.

I have been unable to come up with any reasonable explanation for this. Do I look, from behind, like I speak Polish? What if I had said yes? Perhaps some things are best left unknown.

Org clocktables II: Summarizing a month

code4lib emacs libraries r york

In Org clocktables I: The daily structure I explained how I track my time working at an academic library, clocking in to projects that are categorized as PPK (“professional performance and knowledge,” our term for “librarianship”), PCS (“professional contributions and standing,” which covers research, professional development and the like) or Service. I do this by clocking in and out of tasks with the magic of Org.
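(For the record, the clocking itself is just org-clock-in and org-clock-out on a task heading, bound by default to C-c C-x C-i and C-c C-x C-o.)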

I’ll add a day to the example I used before, to make it more interesting. This is what the raw text looks like:

* 2017-12 December

** [2017-12-01 Fri]
:LOGBOOK:
CLOCK: [2017-12-01 Fri 09:30]--[2017-12-01 Fri 09:50] =>  0:20
CLOCK: [2017-12-01 Fri 13:15]--[2017-12-01 Fri 13:40] =>  0:25
:END:

*** PPK

**** Libstats stuff
:LOGBOOK:
CLOCK: [2017-12-01 Fri 09:50]--[2017-12-01 Fri 10:15] =>  0:25
:END:

Pull numbers on weekend desk activity for A.

**** Ebook usage
:LOGBOOK:
CLOCK: [2017-12-01 Fri 13:40]--[2017-12-01 Fri 16:30] =>  2:50
:END:

Wrote code to grok EZProxy logs and look up ISBNs of Scholars Portal ebooks.

*** PCS

*** Service

**** Stewards' Council meeting
:LOGBOOK:
CLOCK: [2017-12-01 Fri 10:15]--[2017-12-01 Fri 13:15] =>  3:00
:END:

Copious meeting notes here.

** [2017-12-04 Mon]
:LOGBOOK:
CLOCK: [2017-12-04 Mon 09:30]--[2017-12-04 Mon 09:50] =>  0:20
CLOCK: [2017-12-04 Mon 12:15]--[2017-12-04 Mon 13:00] =>  0:45
CLOCK: [2017-12-04 Mon 16:00]--[2017-12-04 Mon 16:15] =>  0:15
:END:

*** PPK

**** ProQuest visit
:LOGBOOK:
CLOCK: [2017-12-04 Mon 09:50]--[2017-12-04 Mon 12:15] =>  2:25
:END:

Notes on this here.

**** Math print journals
:LOGBOOK:
CLOCK: [2017-12-04 Mon 16:15]--[2017-12-04 Mon 17:15] =>  1:00
:END:

Check current subs and costs; update list of print subs to drop.

*** PCS

**** Pull together sonification notes
:LOGBOOK:
CLOCK: [2017-12-04 Mon 13:00]--[2017-12-04 Mon 16:00] =>  3:00
:END:

*** Service

All raw Org text looks ugly, especially all those LOGBOOK and PROPERTIES drawers. Don’t let that put you off. This is what it looks like on my screen with my customizations (see my .emacs for details):

Much nicer in Emacs.

At the bottom of the month I use Org’s clock table to summarize all this.

#+BEGIN: clocktable :maxlevel 3 :scope tree :compact nil :header "#+NAME: clock_201712\n"
#+NAME: clock_201712
| Headline             | Time  |      |      |
|----------------------+-------+------+------|
| *Total time*           | *14:45* |      |      |
|----------------------+-------+------+------|
| 2017-12 December     | 14:45 |      |      |
| \_  [2017-12-01 Fri] |       | 7:00 |      |
| \_    PPK            |       |      | 3:15 |
| \_    Service        |       |      | 3:00 |
| \_  [2017-12-04 Mon] |       | 7:45 |      |
| \_    PPK            |       |      | 3:25 |
| \_    PCS            |       |      | 3:00 |
#+END:

I just put in the BEGIN/END lines and then hit C-c C-c, and Org creates that table. Whenever I add some more time, I can put point on the BEGIN line and hit C-c C-c and it updates everything.

Now, there are lots of commands I could use to customize this, but this is pretty vanilla and it suits me. It makes it clear how much time I have down for each day and how much time I spent in each of the three pillars. It’s easy to read at a glance. I fiddled with various options but decided to stay with this.

It looks like this on my screen:

Much nicer in Emacs.

That’s a start, but the data is not in a format I can use as is. The times are split across different columns, there are multiple levels of indents, there’s a heading and a summation row, etc. But! The data is in a table in Org, which means I can easily ingest it and process it in any language I choose, in the same Org file. That’s part of the power of Org: it turns raw data into structured data, which I can process with a script into a better structure, all in the same file, mixing text, data and output.

Which language, though? A real Emacs hacker would use Lisp, but that’s beyond me. I can get by in two languages: Ruby and R. I started doing this in Ruby, and got things mostly working, then realized how it should go and what the right steps were to take, and switched to R.

Here’s the plan:

  • ignore “Headline” and “Total time” and “2017-12 December” … in fact, ignore everything that doesn’t start with “\_”
  • clean up the remaining lines by removing “\_”
  • the first line will be a date stamp, with the total day’s time in the first column, so grab it
  • after that, every line will either be a PPK/PCS/Service line, in which case grab that time
  • or it will be a new date stamp, in which case capture that information and write out the previous day’s information
  • continue on through all the lines
  • until the end, at which point a day is finished but not written out, so write it out

I did this in R, using three packages to make things easier. For managing the time intervals I’m using hms, which seems like a useful tool. It needs to be a very recent version to make use of some time-parsing functions, so it needs to be installed from GitHub. Here’s the R:

library(tidyverse)
library(hms) ## Right now, needs GitHub version
library(stringr)
clean_monthly_clocktable <- function (raw_clocktable) {
  ## Clean up the table into something simple
  clock <- raw_clocktable %>%
    filter(grepl("\\\\_", Headline)) %>%
    mutate(heading = str_replace(Headline, "\\\\_ *", "")) %>%
    mutate(heading = str_replace(heading, "] .*", "]")) %>%
    rename(total = X, subtotal = X.1) %>%
    select(heading, total, subtotal)

  ## Set up the table we'll populate line by line
  newclock <- tribble(~date, ~ppk, ~pcs, ~service, ~total)

  ## The first line we know has a date and time, and always will
  date_old <- substr(clock[1,1], 2, 11)
  total_time_old <- clock[1,2]
  date_new <- NA
  ppk <- pcs <- service <- vacation <- total_time_new <- "0:00"

  ## Loop through all lines ...
  for (i in 2:nrow(clock)) {
    if      (clock[i,1] == "PPK")     { ppk      <- clock[i,3] }
    else if (clock[i,1] == "PCS")     { pcs      <- clock[i,3] }
    else if (clock[i,1] == "Service") { service  <- clock[i,3] }
    else {
     date_new <- substr(clock[i,1], 2, 11)
     total_time_new <- clock[i,2]
    }
    ## When we see a new date, add the previous date's details to the table
    if (! is.na(date_new)) {
     newclock <- newclock %>% add_row(date = date_old, ppk, pcs, service, total = total_time_old)
     ppk <- pcs <- service <- "0:00"
     date_old <- date_new
     date_new <- NA
     total_time_old <- total_time_new
    }
  }

  ## Finally, add the final date to the table, when all the rows are read.
  newclock <- newclock %>% add_row(date = date_old, ppk, pcs, service, total = total_time_old)
  newclock <- newclock %>%
    mutate(ppk = parse_hm(ppk), pcs = parse_hm(pcs),
           service = parse_hm(service), total = parse_hm(total),
           lost = as.hms(total - (ppk + pcs + service))) %>%
    mutate(date = as.Date(date))
}

All of that is in a SRC block like below, but I separated the two in case it makes the syntax highlighting clearer. I don’t think it does, but such is life. Imagine the above code pasted into this block:

#+BEGIN_SRC R :session :results values

#+END_SRC

Running C-c C-c on that will produce no output, but it does create an R session and set up the function. (Of course, all of this will fail if you don’t have R (and those three packages) installed.)

With that ready, now I can parse that monthly clocktable by running C-c C-c on this next source block, which reads in the raw clock table (note the var setting, which matches the #+NAME above), parses it with that function, and outputs cleaner data. I have this right below the December clock table.

#+BEGIN_SRC R :session :results values :var clock_201712=clock_201712 :colnames yes
clean_monthly_clocktable(clock_201712)
#+END_SRC

#+RESULTS:
|       date |      ppk |      pcs |  service |    total |     lost |
|------------+----------+----------+----------+----------+----------|
| 2017-12-01 | 03:15:00 | 00:00:00 | 03:00:00 | 07:00:00 | 00:45:00 |
| 2017-12-04 | 03:25:00 | 03:00:00 | 00:00:00 | 07:45:00 | 01:20:00 |

This is tidy data. It looks like this:

Again, in Emacs

That’s what I wanted. The code I wrote to generate it could be better, but it works, and that’s good enough.

Notice all of the same dates and time durations are there, but they’re organized much more nicely—and I’ve added “lost.” The “lost” count is how much time in the day was unaccounted for. This includes lunch (maybe I’ll end up classifying that differently), short breaks, ploughing through email first thing in the morning, catching up with colleagues, tidying up my desk, falling into Wikipedia, and all those other blocks of time that can’t be directly assigned to some project.

My aim is to keep track of the “lost” time and to minimize it, by a) not wasting time and b) properly classifying work. Talking to colleagues and tidying my desk is work, after all. It’s not immortally important work that people will talk about centuries from now, but it’s work. Not everything I do on the job can be classified against projects. (Not the way I think of projects—maybe lawyers and doctors and the self-employed think of them differently.)

The one technical problem with this is that when I restart Emacs I need to rerun the source block with the R function in it, to set up the R session and the function, before I can rerun the simple “update the monthly clocktable” block. However, because I don’t restart Emacs very often, that’s not a big problem.

The next stage of this is showing how I summarize the cleaned data to understand, each month, how much of my time I spent on PPK, PCS and Service. I’ll cover that in another post.
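As a rough preview, here is a minimal sketch only, assuming the cleaned table from clean_monthly_clocktable() is sitting in a data frame called newclock and using the same hms functions as above:

newclock %>%
  summarise(ppk     = as.hms(sum(ppk)),
            pcs     = as.hms(sum(pcs)),
            service = as.hms(sum(service)),
            total   = as.hms(sum(total)),
            lost    = as.hms(sum(lost)))

That would give one row of monthly totals; grouping by month would extend it across a whole year.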

COUNTER data made tidy

code4lib libraries r

At work I’m analysing usage of ebooks, as reported by vendors in COUNTER reports. The Excel spreadsheet versions are ugly but a little bit of R can bring them into the tidyverse and give you nice, clean, usable data that meets the three rules of tidy data:

  1. Each variable must have its own column.
  2. Each observation must have its own row.
  3. Each value must have its own cell.

There are two kinds of COUNTER reports for books: BR1 (“Number of Successful Title Requests by Month and Title”) counts how many times people looked at a book, and BR2 (“Number of Successful Section Requests by Month and Title”) counts how many times they looked at a part (like a chapter) of a book. The reports are formatted in the same human-readable way, so this code works for both, but be careful to handle them separately.

Fragment of COUNTER report

They start with seven lines of metadata about the report, and then you get the actual data. There are a few required columns, one of which is the title of the book, but that column doesn’t have a heading! It’s blank! Further to the right are columns for each month of the reporting period. Rows are for books or sections, but there is also a “Total for all titles” row that sums them all up.

This formatting is human-readable but terrible for machines. Happily, that’s easy to fix.

First, in R, load in some packages:

  • the basic set of tidyverse packages;
  • readxl, to read Excel spreadsheets;
  • lubridate, to manipulate dates; and
  • yulr, my own package of some little helper functions. If you want to use it you’ll need to install it specially, as explained in its documentation.
library(tidyverse)
library(readxl)
library(lubridate)
library(yulr)

As it happens the COUNTER reports are all in one Excel spreadsheet, organized by sheets. Brill’s 2014 report is in the sheet named “Brill 2014,” so I need to pick it out and work on it. The flow is:

  • load in the sheet, skipping the first seven lines (including the one that tells you if it’s BR1 or BR2)
  • cut out columns I don’t want with a minus select
  • use gather to reshape the table by moving the month columns to rows, so the month name ends up in a column named “month”; the other columns, excluded with a minus sign, are carried along unchanged
  • rename two columns
  • reformat the month name into a proper date, and rename the unnamed title column (which ended up being called X__1) while truncating it to 50 characters
  • filter out the row that adds up all the numbers
  • reorder the columns for human viewing
brill_2014 <- read_xlsx("eBook Usage.xlsx", sheet = "Brill 2014", skip = 7) %>%
  select(-ISSN, -`Book DOI`, -`Proprietary Identifier`, -`Reporting Period Total`) %>%
  gather(month, usage, -X__1, -ISBN, -Publisher, -Platform) %>%
  rename(platform = Platform, publisher = Publisher) %>%
  mutate(month = floor_date(as.Date(as.numeric(month), origin = "1900-01-01"), "month"),
         title = substr(X__1, 1, 50)) %>%
  filter(! title == "Total for all titles") %>%
  select(month, usage, ISBN, platform, publisher, title)

Looking at this I think that date mutation business may not always be needed, but some of the date formatting I had was wonky, and this made it all work.
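(If the converted dates ever come out a couple of days off, it may be because Excel’s serial dates actually correspond to an origin of 1899-12-30 when converted in R; the floor_date to the start of the month hides that small shift here, since COUNTER’s month columns all fall on the first of the month.)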

That pipeline above just works for one year. I had four years of Brill data, and didn’t want to repeat it for each one, because if I ever needed to make a change I’d have to make it four times, and if I missed one there’d be a problem. This is the time to create a function. Now my code looks like this:

counter_parse_brill <- function (x) {
  x %>%
    select(-ISSN, -`Book DOI`, -`Proprietary Identifier`, -`Reporting Period Total`) %>%
    gather(month, usage, -X__1, -ISBN, -Publisher, -Platform) %>%
    rename(platform = Platform, publisher = Publisher) %>%
    mutate(month = floor_date(as.Date(as.numeric(month), origin = "1900-01-01"), "month"),
           title = substr(X__1, 1, 50)) %>%
    filter(! title == "Total for all titles") %>%
    select(month, usage, ISBN, platform, publisher, title)
}

brill_2014 <- read_xlsx("eBook Usage.xlsx", sheet = "Brill 2014", skip = 7) %>% counter_parse_brill()
brill_2015 <- read_xlsx("eBook Usage.xlsx", sheet = "Brill 2015", skip = 7) %>% counter_parse_brill()
brill_2016 <- read_xlsx("eBook Usage.xlsx", sheet = "Brill 2016", skip = 7) %>% counter_parse_brill()
brill_2017 <- read_xlsx("eBook Usage.xlsx", sheet = "Brill 2017", skip = 7) %>% counter_parse_brill()
brill <- rbind(brill_2014, brill_2015, brill_2016, brill_2017)

That looks much nicer in Emacs (in Org, of course):

R in Org

I have similar functions for other vendors. They are all very similar, but sometimes a (mandatory) Book DOI field or something else is missing, so a little fiddling is needed. Each vendor’s complete data goes into its own tibble, which I then glue together. Then I delete all the rows where no month is defined (which, come to think of it, I should investigate to make sure they aren’t being introduced by some mistake I made in reshaping the data), I add the ayear column so I can group things by academic year, and where the usage of a book in a given month is missing, I set it to 0 instead of leaving it as NA.

ebook_usage <- rbind(brill, ebl, ebook_central, iet, scholars_portal, spie)

ebook_usage <- ebook_usage %>% filter(! is.na(month))
ebook_usage <- ebook_usage %>% mutate(ayear = academic_year(month))
ebook_usage$usage[is.na(ebook_usage$usage)] <- 0
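That last step could equally be done inside the pipe with tidyr’s replace_na (just an alternative, not what I ran above):

ebook_usage <- ebook_usage %>% mutate(usage = replace_na(usage, 0))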

The data now looks like this (truncating the title even more for display here):

| month      | usage | ISBN          | platform | publisher | title   | ayear |
|------------+-------+---------------+----------+-----------+---------+-------|
| 2014-01-01 |     0 | 9789004216921 | BOPI     | Brill     | A Comme |  2013 |
| 2014-01-01 |     0 | 9789047427018 | BOPI     | Brill     | A Wande |  2013 |
| 2014-01-01 |     0 | 9789004222656 | BOPI     | Brill     | A World |  2013 |
> str(ebook_usage)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	1343899 obs. of  7 variables:
 $ month    : Date, format: "2014-01-01" "2014-01-01" "2014-01-01" "2014-01-01" ...
 $ usage    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ ISBN     : chr  "9789004216921" "9789047427018" "9789004222656" "9789004214149" ...
 $ platform : chr  "BOPI" "BOPI" "BOPI" "BOPI" ...
 $ publisher: chr  "Brill" "Brill" "Brill" "Brill" ...
 $ title    : chr  "A Commentary on the United Nations Convention on t" "A Wandering Galilean: Essays in Honour of Seán Fre" "A World of Beasts: A Thirteenth-Century Illustrate" "American Diplomacy" ...
 $ ayear    : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...

The data is now ready for analysis.
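For example, a quick sketch of the kind of question it can now answer (illustrative only, not actual results): total usage by publisher and academic year is just a group and a sum.

ebook_usage %>%
  group_by(publisher, ayear) %>%
  summarise(usage = sum(usage)) %>%
  arrange(publisher, ayear)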

Brooklyn Raga Massive, In C

music

Earlier this year I saw mention in The New Yorker of a performance by the Brooklyn Raga Massive of In C by Terry Riley. It got a great review, and I was delighted to find the recording on Bandcamp, so I ordered it immediately. It came out last month and I’ve listened to it several dozen times.

The other day I saw a link from NPR that includes a video the band posted that trims the 70-odd minute long performance to a delightful 7.5 minutes, with the band waking up and moving through a day.

If you have access to a streaming music service, there will probably be several performances of “In C” available. If you like one, try them all. It’s not to everyone’s taste, but I think it’s an incredible composition and every performance I’ve heard has had at minimum a good dose of magic in it.
