Miskatonic University Press

Code4Lib 2010: Monday 22 February


My brief notes on what I saw at Code4Lib 2010 in Asheville, North Carolina. I had a great time at the conference and am glad I went. My thanks to Kevin Clarke, Jodi Schneider, and all the other organizers and volunteers.

Preconference 1: Solr White Belt

Bess Sadler mostly ran through the Solr tutorial, elaborating with lots of examples from Blacklight work and personal experience. There were about 50 people in the room and everyone got Solr installed and working.

We tried:

  • requesting particular fields
  • highlighting
  • looking at how Solr parses a search
  • facets
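For reference, here is a rough sketch of how those four things map onto Solr request parameters (`fl`, `hl`, `debugQuery`, `facet`). The URL assumes a local Solr on the default port, and the field names (`id`, `title`, `format`, `language`) are made up for illustration; they are not from the tutorial.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumption: a local Solr instance on the default port.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(q, fields=None, highlight_field=None,
                    facet_fields=(), debug=False):
    """Build a Solr select URL from standard request parameters."""
    params = [("q", q), ("wt", "json")]
    if fields:
        # fl: return only these stored fields
        params.append(("fl", ",".join(fields)))
    if highlight_field:
        # hl / hl.fl: turn on highlighting for one field
        params.append(("hl", "true"))
        params.append(("hl.fl", highlight_field))
    if facet_fields:
        # facet / facet.field: one facet.field parameter per facet
        params.append(("facet", "true"))
        for name in facet_fields:
            params.append(("facet.field", name))
    if debug:
        # debugQuery: include how Solr parsed and scored the search
        params.append(("debugQuery", "true"))
    return SOLR_SELECT + "?" + urlencode(params)


# Hypothetical field names, for illustration only.
url = build_query_url(
    "nature",
    fields=["id", "title"],
    highlight_field="title",
    facet_fields=["format", "language"],
    debug=True,
)
print(url)
```

Pasting the printed URL into a browser (with a running Solr and real field names) should show each feature in the response.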

It's useful to know about Stanford's Solr config, which shows their relevancy weightings:

<str name="qf">
title_245a_unstem_search^100000
title_245a_search^75000
vern_title_245a_search^75000
title_245_unstem_search^50000
...

That solves the Nature Problem. Searching for "Nature" used to bring up a book called Naturalism first, but now an unstemmed match on the title is weighted much higher than a stemmed one, so Nature is the top result.
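To see why the weighting helps, here is a toy sketch. It is not how Solr actually scores documents: the stemmer is a crude stand-in, and the scoring simply adds the boost of every field that matches. Only the two field names and their boosts come from the config above.

```python
# Boosts taken from the first two qf entries quoted above.
BOOSTS = {
    "title_245a_unstem_search": 100000,
    "title_245a_search": 75000,
}


def toy_stem(word):
    """Crude stand-in for a real stemmer: strip one known suffix."""
    for suffix in ("alism", "e"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def index_title(title):
    """Index a title twice: once as-is, once stemmed."""
    words = title.lower().split()
    return {
        "title_245a_unstem_search": set(words),
        "title_245a_search": {toy_stem(w) for w in words},
    }


def score(query, doc):
    """Toy scoring: add the boost of each field the query matches."""
    q = query.lower()
    total = 0
    if q in doc["title_245a_unstem_search"]:
        total += BOOSTS["title_245a_unstem_search"]
    if toy_stem(q) in doc["title_245a_search"]:
        total += BOOSTS["title_245a_search"]
    return total


titles = ["Naturalism", "Nature"]
ranked = sorted(titles, key=lambda t: score("Nature", index_title(t)),
                reverse=True)
print(ranked)
```

Both titles match in the stemmed field, but only Nature also matches in the unstemmed one, so it picks up the larger boost and sorts first.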

It was a very good morning. Bess did a fine job and everyone learned a lot of practical, hands-on stuff.

Preconference 2: Hacker 201 with Dan Chudnov

Dan Chudnov had done Hacking 101 in the morning, which used Processing as a way of learning programming and good programming habits. The afternoon session was to use Python and pymarc to hack on MARC, but there was some trouble getting all the right things installed for everyone in the room.

I wasn't too interested in hacking on MARC in Python, so I worked on OpenFRBR and ended up making some good progress through the afternoon and into the evening. I counted that as a hacking success.

Dan talked about the thirty-minute rule: if you're stuck on a problem for 30 minutes, take a break. A good rule. Here's another I'd forgotten: start with the simplest thing and then make it more complicated. I knew I wanted to deal with data from five or six sources in three different formats (JSON, RDF, XML). Different fields from different sources would mean different things and there would be different relations between them.

I was getting dismayed at how complicated this was turning out to be. Finally I realized, "I don't need to get all that working right off the bat. I just need to get one source working. I'll ignore everything else for now and add it later."

And that worked very well. By doing just one thing everything fell into place for me and I got it working quickly and enjoyably. That was nice.
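In code, the idea looks something like this. It's only a sketch of the approach, not what I actually wrote for OpenFRBR: the `title` key, the sample data, and the function names are all invented for illustration.

```python
import json


def load_json(text):
    """Turn one JSON source into plain dicts with just a 'title' key."""
    return [{"title": item["title"]} for item in json.loads(text)]


# Registry of loaders by format. Only JSON is wired up to begin with;
# RDF and XML loaders can be added here later without touching load().
LOADERS = {"json": load_json}


def load(fmt, text):
    """Dispatch to the loader for this format, if there is one yet."""
    if fmt not in LOADERS:
        raise NotImplementedError(fmt + " support comes later")
    return LOADERS[fmt](text)


# Hypothetical sample data: extra fields are ignored for now.
sample = '[{"title": "Nature", "extra": "ignored for now"}]'
records = load("json", sample)
print(records)
```

Once the one source works end to end, each new format is just another entry in `LOADERS`.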