Miskatonic University Press

Nice-Looking Printed LibraryThing Catalogue

Last October I was at Access 2010, a conference about libraries and technology. Since 2002, the day before the conference begins there is a Hackfest. Last year it was organized by Dan Scott and Alex Homanchuk. They did a great job and we all had a lot of fun. It was a very productive Hackfest! I worked on something there that I've been meaning to write up for a while, and here it is.

I proposed a Nice-Looking Printed LibraryThing Catalogue: "It's possible to export from LibraryThing in custom spreadsheet formats--but what if you want a nice-looking printed catalogue for a small special library? Use OpenOffice.org or LaTeX, with a scripting language, to generate an attractive printed catalogue."

This was on my mind because I recently became the librarian at The Arts and Letters Club of Toronto. We have about 1700 books in the collection. Our collection policy centres on books by or about members, and then there are sections on Toronto and on clubs, and we have some general reference as well.

The catalogue I inherited was kept in Word and every now and then would be updated, printed out, and put in a binder on a table in the library. I'd like to make the catalogue searchable, and it would be easy enough to generate a PDF from the Word file and put that on our web site. People could do keyword searches in the PDF. That's very basic, but it's enough for basic author or title or keyword searches.

But I'd like to do more. The library uses an unusual classification system. There are several high-level categories, based around the LAMPS disciplines that are the fundamental basis of the Club: Literature, Architecture, Music, Painting, Stage. Along with them we have Reference, Toronto, Clubs, and Subjects Other Than LAMPS. Within each category books are sorted into two sections: by title and by author. This system does a good job of collocating items broadly by LAMPS discipline, but it means that books about A.Y. Jackson are scattered across the Painting shelves. Instead of being all together under his name, they are shelved either by title or by the author of the book.

One way to fix this would be to reclassify everything with the Dewey Decimal Classification. How would I get the Dewey numbers? One easy way would be to use LibraryThing to manage our collection. When one enters a book into LT, it often knows the Dewey number already, or it can find it. The numbers it doesn't know I could look up in the Toronto Public Library's catalogue. They have an excellent collection for Toronto-related things. I could add the Dewey number to the LibraryThing record for my own use and for everyone else's.

Using LibraryThing as a catalogue would simplify a lot of things for me. But it would be overkill for what people in the library actually need. A simple printed catalogue is enough. How could I use LibraryThing to generate a nice-looking printed catalogue? Aha, I thought: a Hackfest project.

At the Hackfest three other people were interested in the idea: Ganga Dakshinamurti (who works at the University of Manitoba), Wendy Huot (Queen's University), and Rebecca Larocque (North Bay Public Library). We sat at a small table in a reading room and started working. One or two other Hackfest groups were working in the room as well, along with a bunch of students who were either reading or sleeping or both.

For the quick view of what we did, look at the Hackfest report. Our bit (done by by Wendy and Rebecca) starts on page 14.

Rebecca, as the slides show, hacked up a way of exporting the data from LibraryThing and pulling it into OpenOffice where she pulled apart the metadata and put it back together in a new way on the page, making a nicely formatted catalogue entry for each book. She fitted the metadata to a template, sort of like a mail merge, where you have a bunch of addresses and print off many copies of the same letter, addressed to different people. Rebecca got all the way done except for one small thing: each entry in the catalogue ended up on a separate page! Instead of having twenty or thirty on the same page, OpenOffice wanted to do a fresh page for each. This presented a small problem for a library with 1700 books, but there must be some simple solution for it.

Having not the slightest clue how to do any of that, I stuck with my limited Ruby and LaTeX skills.

Both Rebecca and I were using LibraryThing's export feature. (You will only see it on that page if you're a LibraryThing user (it's free — if you haven't already tried it, do so) and you're logged in. You can get your data out of LibraryThing in two formats: CSV and TSV labelled as Excel (again, those won't work unless you're logged in).

Rebecca used the TSV and I used the CSV. We were working away, discussing what they contained, when we realized they were slightly different. Here's the order of the CSV:

1. TITLE
2. AUTHOR (first, last)
3. AUTHOR (last, first)
4. DATE
5. LCC
6. DDC
7. ISBN
8. PUBLICATION INFO
10. RATING
11. REVIEW
12. ENTRY DATE
13. COPIES
14. SUBJECTS
15. TAGS

And here's TSV:

1. book id
2. title
3. author (last, first)
4. author (first, last)
5. other authors
6. publication
7. date
8. ISBNs
9. series
10. source
11. language 1
12. language 2
13. original language
14. LCC
15. DDC
16. BCID
17. date entered
18. date acquired
19. date started
20. date ended
21. stars
22. collections
24. review
25. summary
29. encoding

Knowing the LibraryThing book ID is extremely useful because then you can use all the LibraryThing APIs, including librarything.ck.getwork, which gets the "Common Knowledge" for a book. Unfortunately, I didn't have it in the CSV. I asked Tim Spalding about this on Twitter and he said "APIs are a big problem. It's very unclear what we're allowed to do, so they've stagnated while we discuss the issue." He liked the idea of the nice-looking printed catalogue, though, and said he might put someone on the job of making it a built-in feature. I hope that happens!

Enough about the data dumps. Rebecca was working with hers and I was working with mine. I used a Ruby CSV library to parse the CSV, and then I used ERB to build a LaTeX template. It looked like this:

<% items.sort_by{ |a| [ a[2], a[0].gsub(/^(The |A |An )/, '') ]}.each do |i| %>

<% ddc    = i[5] || "" %>
<% title  = coder.decode(i[0]).gsub(/&/, 'and') %>
<% author = coder.decode(i[2]).gsub(/&/, 'and') || "[unknown]" %>
<% publication_info = coder.decode(i[7]) .gsub(/&/, 'and') || "" %>
<% tags = coder.decode(i[15]).gsub(/&/, 'and').gsub(/,/, ', ') || "" %>

<% if author == lastauthor %>------------<% else %>\MakeUppercase{\textsf{\textbf{<%= author %>}}}<% end %>
\hspace{2em}\textsf{<%= title %>}
\newline
<% unless publication_info == "" %>\textit{\hspace{1em}<%= publication_info %>}\newline<% end %>
\hspace{1em}<% unless tags == "" %>\textsc{<%= tags %>}<% end %>

<% lastauthor = author %>

&

<%= ddc %> \\\

<% end %>


All the beauties of typesetting language and a scripting language combined into one.

All of that LaTeX business was to make the printed catalogue look the way Ganga and Wendy had designed. They spent the morning looking at examples of printed catalogues online and at what information we would want to include in one. In the afternoon Wendy was on her own, and she found more excellent examples, photographs and scans of beautiful old catalogues, and made a design for what ours should look like. The catalogues were absolutely gorgeous for their design and typography, and they contained much more information than we'd expected, especially, if I recall, in booksellers' catalogues.

Unfortunately my LaTeX skills weren't good enough to come near to these, but the basics are there, as you can see in this sample that contains a mishmash of various books (PDF). Here's an example of the author listing:

After the books are listed by author there's a classified listing where they are in Dewey Decimal order, the way they would be on the shelf.

I put all of the code on GitHub: nlpltc. There's the Ruby script, the LaTeX template, and a sample CSV file. If you have Ruby and LaTeX installed then it should work on any CSV export from LibraryThing.

My hope is that this will inspire Tim Spalding and his team of book-loving programmers to build a feature like this into LibraryThing so that users can fiddle a few options and then generate their own Nice-Looking Printed LibraryThing Catalogue.

Thanks to Wendy, Rebecca, and Ganga for making the Hackfest so much fun.