Miskatonic University Press

Sorting LCC call numbers in R

code4lib r

Here’s the easiest way to sort Library of Congress Classification call numbers in R:

call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")
library(gtools)
mixedsort(call_numbers)
## [1] "QA 7 H3 1992"          "QA 76.73 R3 W53 2015"  "QA 90 H33 2016"        "QA 276.45 R3 A35 2010"
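For contrast, here is a minimal sketch of what base R's sort() does with the same vector. The method = "radix" argument is used only so the result does not depend on locale: it compares strings as in the C locale. Other locales may shuffle the entries differently.

```r
call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")

# Plain character sort compares one character at a time,
# so "2" sorts before "7" and QA 276.45 jumps to the front.
plain_sorted <- sort(call_numbers, method = "radix")
plain_sorted[1]
## [1] "QA 276.45 R3 A35 2010"
```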

gtools is a package on CRAN, not part of base R, so it needs to be installed once with install.packages("gtools"). The docs say this about mixedsort and mixedorder:

These functions sort or order character strings containing embedded numbers so that the numbers are numerically sorted rather than sorted by character value. I.e. “Asprin 50mg” will come before “Asprin 100mg”. In addition, case of character strings is ignored so that “a”, will come before “B” and “C”.

(I don’t know why “Aspirin” is misspelled.)
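To make the quoted behaviour concrete, here is a rough base R sketch of the underlying idea: pull the number out of each string and compare it as a number instead of as text. It is not how gtools implements mixedsort. It only looks at the class letters and the class number and ignores everything after them, and the names class_letters and class_number are made up for this illustration.

```r
call_numbers <- c("QA 276.45 R3 A35 2010", "QA 7 H3 1992", "QA 90 H33 2016", "QA 76.73 R3 W53 2015")

# Leading letters of each call number, e.g. "QA".
class_letters <- sub("^([A-Z]+).*$", "\\1", call_numbers)

# First number after the letters, converted to numeric, e.g. 276.45.
class_number <- as.numeric(sub("^[A-Z]+ *([0-9.]+).*$", "\\1", call_numbers))

# Order by letters as text, then by the class number as a number.
by_class <- call_numbers[order(class_letters, class_number)]
by_class
```

Because 7 < 76.73 < 90 < 276.45 when compared as numbers, this puts "QA 7 H3 1992" first and "QA 276.45 R3 A35 2010" last, which is the order mixedsort gives above.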

If you have a data frame (df) with column call_number then you would use mixedorder to sort the whole thing by call number thusly:

df[mixedorder(df$call_number), ]
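A small worked version of that line, assuming gtools is installed. The data frame is hypothetical: it reuses the four call numbers from above in scrambled order, and the copies column is made up so there is something else in each row to carry along.

```r
library(gtools)  # assumes the gtools package is installed

# Hypothetical data: the "copies" values are invented for illustration.
df <- data.frame(
  call_number = c("QA 276.45 R3 A35 2010", "QA 7 H3 1992",
                  "QA 90 H33 2016", "QA 76.73 R3 W53 2015"),
  copies = c(2, 1, 3, 1),
  stringsAsFactors = FALSE
)

# mixedorder() returns the row positions that put call_number in mixed order;
# indexing with them reorders whole rows, so copies stays with its call number.
sorted_df <- df[mixedorder(df$call_number), ]
sorted_df$call_number
```

The difference between the two functions is the same as between sort and order in base R: mixedsort returns the sorted values, while mixedorder returns the index positions, which is what you need to rearrange rows.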

I asked about this on Stack Overflow and on the Code4Lib mailing list last July, then I went on vacation and sort of forgot about it. Nine months later, I thanked Li Kai, who had pointed me to a Stack Overflow post that solved my problem, and that let me answer my own question.

Unrelated library sign.