Ref desk 4: Calculating hours of interactions


24 April 2012 r librarystats

(Fourth in a series about using R to look at reference desk statistics recorded in LibStats. Third was Ref desk 3: Comparing question.type across branches.)

The last two posts looked at two straightforward breakdowns of reference desk activity at a library with several branches: questions by branch (to look at all activity at a single branch) and branches by question (to compare how many questions of the same type are asked at all branches). Both are very useful and lead to interesting questions and discussions. I'm sticking to numbers and charts here, but they are just the beginning of a larger discussion about what the purpose of the reference desk is and how it works at a library---and how reference desk statistics are recorded, and what the act of recording means.

It's important to talk about all of that, and I do enjoy those discussions. But right now I'm going to make another chart.

Another fact we record about each reference desk interaction is its duration, which in our libstats data frame is in the time.spent column. As I explained in Ref Desk 1: LibStats, these are the options:

NA ("not applicable," which I've used, though I can't remember why)
0-1 minute
1-5 minutes
5-10 minutes
10-20 minutes
20-30 minutes
30-60 minutes
60+ minutes

We can use this information to estimate the total amount of time we spend working with people at the desk: it's just a matter of multiplying the number of interactions by their duration.

Except we don't know the exact length of each interaction; we only know it falls within a range. If we say an interaction took 5-10 minutes then it could have taken 5, 6, 7, 8, 9, or 10 minutes. 10 is 100% more than 5: relatively that's a pretty big range. (Of course, mathematically it makes no sense to have a 5-10 minute range and a 10-20 minute range, because if something took exactly 10 minutes it could go in either category.)

Let's make some generous estimates about a single number we can assign to the duration of reference desk interactions.

Duration	Estimate
NA	0 minutes
0-1 minute	1 minute
1-5 minutes	5 minutes
5-10 minutes	10 minutes
10-20 minutes	15 minutes
20-30 minutes	25 minutes
30-60 minutes	40 minutes
60+ minutes	65 minutes

This means that if we have 10 transactions of duration 1-5 minutes we'll call it 10 * 5 = 50 minutes total. If we have 10 transactions of duration 20-30 minutes we'll call it 10 * 25 = 250 minutes total. These estimates are arguable but I think they're good enough. They're on the generous side for the shorter durations, which make up most of the interactions.
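One way to hold these estimates in R is a named numeric vector indexed by the duration label. This is only a sketch: the name estimate.mins is my own for illustration and does not come from the script used in the rest of this post.

```r
# Hypothetical lookup vector: names are the LibStats duration labels,
# values are the estimates (in minutes) from the table above.
estimate.mins <- c("0-1 minute"    = 1,
                   "1-5 minutes"   = 5,
                   "5-10 minutes"  = 10,
                   "10-20 minutes" = 15,
                   "20-30 minutes" = 25,
                   "30-60 minutes" = 40,
                   "60+ minutes"   = 65)

# Ten interactions of 1-5 minutes, and ten of 20-30 minutes
short.total <- 10 * estimate.mins[["1-5 minutes"]]
long.total  <- 10 * estimate.mins[["20-30 minutes"]]
```

If time.spent is stored as a factor, wrap it in as.character() before using it as an index, because indexing a named vector with a factor uses the integer codes rather than the labels.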

Next, we need to figure out how many interactions happened of each possible duration. This is easily done using ddply as we did before, but this time to count by time.spent:

> tmp <- ddply(libstats, .(time.spent, week), nrow)
> head(tmp)
  time.spent       week   V1
1 0-1 minute 2011-01-31  811
2 0-1 minute 2011-02-07 1220
3 0-1 minute 2011-02-14 1177
4 0-1 minute 2011-02-21  592
5 0-1 minute 2011-02-28  949
6 0-1 minute 2011-03-07 1037

This tells us that across our library system in the week starting 2011-01-31 there were 811 interactions that took 0-1 minutes. What else happened that week? How many interactions were there of all other durations?

> subset(tmp, week == "2011-01-31")
       time.spent       week  V1
1      0-1 minute 2011-01-31 811
65    1-5 minutes 2011-01-31 299
129 10-20 minutes 2011-01-31  72
212 20-30 minutes 2011-01-31  28
274 30-60 minutes 2011-01-31   6
332  5-10 minutes 2011-01-31 153
447          <NA> 2011-01-31  11

What's the total amount of time spent dealing with people at the desk? We'll use our rule defined above, and just ignore the NAs:

> 811*1 + 299*5 + 72*15 + 28*25 + 6*40 + 153*10
[1] 5856
> round(5856/60)
[1] 98

Answer: about 98 hours.

That's just one week, though. How can we figure it out for each week, and then chart that? I'd hoped to be able to do it in a nice R way with one of the apply functions, because I was in exactly the situation that Neil Saunders describes in A brief introduction to "apply" in R:

At any R Q&A site, you'll frequently see an exchange like this one:

Q: How can I use a loop to [...insert task here...] ?

A: Don't. Use one of the apply functions.

I'd hoped to go at it with some kind of arrangement with a data frame of duration factors and estimated times:

     duration       estimate
   0-1 minute              1
  1-5 minutes              5
 5-10 minutes             10
10-20 minutes             15
20-30 minutes             25
30-60 minutes             40
  60+ minutes             65

Then I'd apply a function to each row of tmp that would check the value of time.spent, look up the equivalent duration in this data frame to find the estimate that goes with it, multiply V1 * estimate and then put the result in an interaction.time column. I took a stab at it but couldn't get it working, so I did it with a loop. It's not the R way, but it worked, and I'll take another crack at it another day.
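For what it's worth, here is a minimal sketch of how that lookup could work without a loop, using match() and aggregate() from base R. It is not the approach used in the rest of this post. The small tmp data frame below is a stand-in built only from rows printed earlier (all of the week of 2011-01-31, plus the single 0-1 minute row for 2011-02-07, so the second week's total is incomplete), and it uses 65 minutes for 60+ minutes, as in the estimate table earlier.

```r
# Stand-in for tmp, built from rows printed earlier in this post.
tmp <- data.frame(
  time.spent = c("0-1 minute", "0-1 minute", "1-5 minutes", "10-20 minutes",
                 "20-30 minutes", "30-60 minutes", "5-10 minutes", NA),
  week = as.Date(c("2011-01-31", "2011-02-07", rep("2011-01-31", 6))),
  V1 = c(811, 1220, 299, 72, 28, 6, 153, 11))

# The lookup table of durations and estimated minutes.
estimates <- data.frame(
  duration = c("0-1 minute", "1-5 minutes", "5-10 minutes", "10-20 minutes",
               "20-30 minutes", "30-60 minutes", "60+ minutes"),
  estimate = c(1, 5, 10, 15, 25, 40, 65))

# match() gives, for each row of tmp, the matching row of estimates
# (NA when time.spent is NA). as.character() guards against factors.
idx <- match(as.character(tmp$time.spent), estimates$duration)
tmp$interaction.time <- tmp$V1 * estimates$estimate[idx]

# Total minutes per week; the formula interface drops rows where
# interaction.time is NA, which is how the NAs get ignored.
weekly <- aggregate(interaction.time ~ week, data = tmp, FUN = sum)
```

For the week of 2011-01-31 this gives the same 5856 minutes as the arithmetic done by hand above.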

So to do it the ugly way, first I define a function:

> desk.time.spent <- function(x) {
    # x has two columns: the time.spent label and the number of interactions
    x <- na.omit(x)  # drop NA durations wherever they fall, not just at the end
    if (nrow(x) == 0) { return(0) }
    total <- 0       # avoid shadowing the built-in sum()
    for (i in seq_len(nrow(x))) {
      if      (x[i,1] == "0-1 minute"   ) { total <- total + 1 * x[i,2]  }
      else if (x[i,1] == "1-5 minutes"  ) { total <- total + 5 * x[i,2]  }
      else if (x[i,1] == "5-10 minutes" ) { total <- total + 10 * x[i,2] }
      else if (x[i,1] == "10-20 minutes") { total <- total + 15 * x[i,2] }
      else if (x[i,1] == "20-30 minutes") { total <- total + 25 * x[i,2] }
      else if (x[i,1] == "30-60 minutes") { total <- total + 40 * x[i,2] }
      else if (x[i,1] == "60+ minutes"  ) { total <- total + 65 * x[i,2] }
    }
    return(total)
  }

Then I create an empty data frame and reset which branches to look at, just in case I fiddled that somewhere before:

> interaction.time <- data.frame(library.name = factor(), week = factor(), desk.mins = numeric())
> interaction.time$week = as.Date(interaction.time$week) # Can't set this in the line above
> branches <- levels(libstats$library.name)

Then I loop through the branches and, within each branch, through the weeks, using ddply to count how many interactions of each duration happened that week, and passing that data frame to the desk.time.spent() function for multiplication. It returns the number of minutes spent that week helping people, and then rbind adds that as a new row to the interaction.time data frame.

> for (i in 1:length(branches)) {
    branchname <- branches[i]
    write (branchname, stderr())
    for (j in 1:length(weeks)) {
      spent <- desk.time.spent(ddply(subset(libstats,
                                            library.name == branchname & week == weeks[j]),
                                     .(time.spent), nrow))
      rbind(interaction.time,
            data.frame(library.name = branchname, week = weeks[j], desk.mins = spent)) -> interaction.time
    }
  }
> tail(interaction.time)
    library.name       week desk.mins
429      Steacie 2012-02-27      1460
430      Steacie 2012-03-05      1180
431      Steacie 2012-03-12      1110
432      Steacie 2012-03-19      1286
433      Steacie 2012-03-26      1042
434      Steacie 2012-04-02      1523

(desk.mins is actually time spent not only at the desk but in offices and anywhere else. Just ignore the name, or change it---I'm copying this from a script that does something slightly different.)
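The nested loop could also be replaced by attaching an estimate to every interaction and summing by branch and week. The following is a rough sketch of that idea, not the script used here. The libstats.toy data frame is a handful of made-up rows purely for illustration (not real LibStats data), and estimate.mins is a hypothetical lookup vector holding the estimates from earlier in this post.

```r
# Made-up toy rows standing in for libstats: one row per interaction.
libstats.toy <- data.frame(
  library.name = c("Scott", "Scott", "Scott", "Steacie", "Steacie"),
  week = as.Date(c("2012-03-26", "2012-03-26", "2012-04-02",
                   "2012-03-26", "2012-04-02")),
  time.spent = c("0-1 minute", "5-10 minutes", "1-5 minutes",
                 "10-20 minutes", NA))

# Hypothetical lookup vector of estimated minutes per duration label.
estimate.mins <- c("0-1 minute" = 1, "1-5 minutes" = 5, "5-10 minutes" = 10,
                   "10-20 minutes" = 15, "20-30 minutes" = 25,
                   "30-60 minutes" = 40, "60+ minutes" = 65)

# One estimate per interaction; NA durations stay NA.
libstats.toy$desk.mins <-
  unname(estimate.mins[as.character(libstats.toy$time.spent)])

# Sum by branch and week; rows with NA minutes are dropped by default.
interaction.time.toy <- aggregate(desk.mins ~ library.name + week,
                                  data = libstats.toy, FUN = sum)
```

One difference to keep in mind: the loop records 0 minutes for a branch in a week with no interactions, while aggregate() simply leaves that branch and week out, so the two results would need reconciling before charting.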

> xyplot(desk.mins/60 ~ as.Date(week)|library.name, data = interaction.time,
         type = "h",
         ylab = "Hours",
         xlab = "Week",
         main = "Hours of interactions",
         sub = paste("From Feb 2011 to", up.to.week)
        )

Hours of interactions at all branches

Let's narrow things down and look at only research questions (4s and 5s) at the reference desk. Librarians and archivists often help people with research questions in an office, not at the desk---such practices vary from library to library and discipline to discipline, and of course not all branches have reference/research desks---and those aren't counted here. Nevertheless, it's an interesting view on what happens at the desk, and an instructive example of how to slice the data more finely. As always, we have to remember what data is being shown.

> branches <- c("Bronfman", "Frost", "Scott", "Steacie")
> research.interaction.time <- data.frame(library.name = factor(), week = factor(), research.mins = numeric())
> for (i in 1:length(branches)) {
    branchname <- branches[i]
    write (branchname, stderr())
    for (j in 1:length(weeks)) {
      spent <- desk.time.spent(ddply(subset(libstats,
                                          library.name == branchname & week==weeks[j] &
                                          location.name %in% c("Consultation Desk", "Drop-in Desk", "Reference Desk") &
                                          question.type %in% c("4. Strategy-Based", "5. Specialized")
                                          ),
                                   .(time.spent), nrow))
      rbind(research.interaction.time,
            data.frame(library.name = branchname, week = weeks[j], research.mins = spent)) -> research.interaction.time
    }
  }
> interaction.time <- merge(interaction.time, research.interaction.time, by=c("week", "library.name"))
> tail(interaction.time)
          week library.name desk.mins research.mins
243 2012-03-26        Scott      2856          1946
244 2012-03-26      Steacie      1042           110
245 2012-04-02     Bronfman       781           270
246 2012-04-02        Frost       351            55
247 2012-04-02        Scott      1467           960
248 2012-04-02      Steacie      1523            20
> xyplot(research.mins/60 ~ as.Date(week) | library.name, data = interaction.time,
         type = "h",
         ylab = "Hours",
         xlab = "Week",
         main = "Research desk research interactions (hours)",
         sub = paste("From Feb 2011 to", up.to.week)
        )

Hours of research interactions at research desks