Miskatonic University Press

York University Open Access Policy for Librarians and Archivists

13 October 2009 open.access york

As my colleague John Dupuis reported last week, York University Libraries, where we work, has adopted the York University Open Access Policy for Librarians and Archivists, which as I understand it is a green self-archiving mandate:

York University Open Access Policy for Librarians and Archivists

Librarians and archivists at York University recognize the importance of open access to content creators and researchers in fostering new ideas, creating knowledge and ensuring that it is available as widely as possible. In keeping with our long-standing support of the Open Access movement, York librarians and archivists move to adopt a policy which would ensure our research is disseminated as widely as possible and available in perpetuity through deposit in York’s institutional repository, YorkSpace.

Policy Statement

Academic librarians and archivists at York University commit to making the best possible effort to publish in venues providing unrestricted public access to their works. They will endeavour to secure the right to self-archive their published materials, and will deposit these works in YorkSpace.

The York University academic librarian and archivist complement grant York University Libraries the non-exclusive right to make their scholarly publications accessible through self-archiving in the YorkSpace institutional repository subject to copyright restrictions.

Guidelines

This policy applies to all scholarly and professional work produced as a member of York University academic staff as of the date of the adoption of this policy. Retrospective deposit is encouraged. Co-authored works should be included with the permission of the other author(s). Examples of works include:

  • Scholarly and professional articles
  • Substantive presentations, including slides and text
  • Books/book chapters
  • Reports
  • Substantive pedagogical materials such as online tutorials

Works should be deposited in YorkSpace as soon as is possible, recognizing that some publishers may impose an embargo period.

This policy is effective as of 01/10/2009 and will be assessed a year after implementation.

I have six things in YorkSpace right now: a book chapter, a magazine article, and slides (two with audio) from four talks.


Access 2009, Saturday #3: Gwendolyn MacNairn, Zotero: A Better Way to Go?

11 October 2009 conferences

This was Gwendolyn MacNairn of Dalhousie University talking about research she’d done about students using Zotero. There’s no video of the talk up yet, but Peter Zimmerman blogged about it more fully than I do here, as he did with all the other talks. My notes:

They do information literacy instruction for new students in comp sci: Springer, IEEE Xplore, Safari, ScienceDirect, etc. They tell them the importance of academic integrity; it’s been a problem for them.

Question: “How do you [the students] organize the information you have collected so that you know exactly what you have and where you got it?”

RefWorks isn’t popular with their students. All of the clicking and stuff just to get a list of references out isn’t worth it. Too much work, too confusing. So she thought that Zotero might make a difference.

She took ten students, gave them Firefox, had them install Zotero, then gave them a job: pick a subject, find and store four references (an article, a PDF, a blog post, a YouTube video) about it in Zotero, then generate a list of references.

To demonstrate what she wanted, she did a demo with the students on the question “Where does Wikipedia fit in the academic world?” She found the PDF of the Pew report.

She was doing this with Zotero version 1 [too bad, in a way, version 2 is so much more powerful]. She described the various steps up to generating the bib in APA.

She found: students save a LOT of PDFs. Sometimes without information to make proper citations (given the way PDFs save).

About 1 in 10 students kept on (or said they would keep on?) using Zotero.

She covered the Endnote/GMU case.

Timeline feature in version 2 is good. Students remember things by when they looked at them, not so much by title or other citation information. Makes it easier to find something archived away a while ago.

See also games: BiblioBouts (U Michigan game about doing good citations, built on Zotero), Vertov at Concordia.


Access 2009, Saturday #4: William J. Turkel, Hacking as a Way of Knowing

11 October 2009 conferences

Access 2009 closed with an excellent talk by William J. Turkel, a historian at the University of Western Ontario. Check out the Lab for Humanistic Fabrication and an online book he co-wrote, The Programming Historian, “an open-access introduction to programming in Python, aimed at working historians (and other humanists).”

The video of the talk isn’t online, but it probably will be soon, and I recommend you watch it or listen to it. Peter Zimmerman blogged about it, and I know there was a lot of talk in the IRC backchannel but I haven’t seen other blog posts about it.

This was exciting stuff and a great talk to close the conference. I talked to Turkel after about this and that and really enjoyed the conversation. To exchange contact information we each held our Android phone up to the other’s and scanned a QR code that contained all the details.

A note about the data compression and Kolmogorov complexity stuff mentioned below. I asked him about this after and this is my understanding. If you have a set of texts, for example biographies from the Dictionary of Canadian Biography, you can find groupings of related texts by using compression. Take text A and text B and compress them to A.bz2 and B.bz2 (using bzip2). Concatenate A and B to AB, and compress to AB.bz2, and then compare the size of AB.bz2 to A.bz2 + B.bz2. If there are a lot of words in common in A and B then AB will have duplications that will let the compression algorithm reduce the size of the compressed file it generates. If AB.bz2 is smaller than A.bz2 + B.bz2 then you know bzip2 has been able to do some extra compressing, so there’s duplication in A and B, so they’re related. Doing pairwise comparisons of compression ratios like this will group all the texts into clusters. For example, the articles about the Canadian Pacific Railway and Sir Sandford Fleming will have a lot of overlap, but the article about Leonard Cohen will use different words, so when you compress and compare them this way the ratios will tell you that the first two are more related to each other than either is to the third.

When he says that data compression is a proxy for Kolmogorov complexity, he means that if you have the source code for bzip2 and a compressed text file, you have, roughly, the minimum amount of information you need to express the original text file. (Assume that bzip2’s source code is as efficient as possible and that its algorithm is the most efficient we know.) I’d come across this idea of complexity before in reading some of Gregory Chaitin’s work on algorithmic information theory but I don’t know much about it, and had never heard of this clustering and compression stuff. Very cool.
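Here’s a minimal sketch of that pairwise comparison in Python, using what’s called the normalized compression distance; the sample texts are hypothetical stand-ins, not real DCB entries:

    import bz2

    def csize(text):
        # Size in bytes of the bzip2-compressed text.
        return len(bz2.compress(text.encode("utf-8")))

    def ncd(a, b):
        # Normalized compression distance: near 0 for closely
        # related texts, near 1 for unrelated ones.
        ca, cb, cab = csize(a), csize(b), csize(a + b)
        return (cab - min(ca, cb)) / max(ca, cb)

    # Hypothetical stand-ins for DCB biographies.
    texts = {
        "fleming": "railway surveyor standard time engineer railway",
        "cpr": "railway construction prairies surveyor engineer",
        "cohen": "poet novelist songwriter montreal suzanne",
    }
    for x, y in [(x, y) for x in texts for y in texts if x < y]:
        print(x, y, round(ncd(texts[x], texts[y]), 3))

Feed the full matrix of pairwise distances to any standard clustering algorithm and the related biographies fall into groups.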

Here are my notes:

Research is built on a framework of citations, but it’s not presented that way: people write books as though they’re a seamless whole. Footnotes are there to plug one’s work into everything else. Part of what historians do is to dig into other people’s footnotes, to find things or to attack.

Works built on other works. Audience wants to talk back. Rich backchannel going on here at Access, for example. Scholars should do something with all of that audience/reader-generated stuff.

How can scholarly works be designed so that they can be hacked, remixed, reused in uncontrolled ways? That requires open source, open access.

What if history were written by anonymous people? Showed http://en.wikipedia.org/wiki/Hack as example. Wikipedia tip of iceberg of how we deal with authority etc. New form of culture, relevance, peer review, etc. Traditional forms of scholarship can be rethought so we can do peer review as networks.

Textual analysts would love Wikipedia. Every change is recorded and available for download. Showed History Flow diagram of history of edits of a Wikipedia page.

Another project: code_swarm. Take history of edits to a FLOSS project and visualize it as a movie. [I ran this on the VuFind source code a few months ago. It’s easy to use and the results are wild. Try it out.]

A historian writing the history of the Apache project in two or three decades, for example, would have all kinds of material available that was never available to historians before. Tens, hundreds, thousands of individuals networked together, exchanging information in time. Job of humanists is to make sense of that, what we do as human beings.

PW Anderson (Nobel) wrote short paper in Science (Vol. 177, No. 4047, 4 August 1972): More Is Different. At different scales, different laws kick in. He’s against reductionism.

Compare paper vs digital. Reproducing paper costs about the same as producing it. Reproducing digital is zero cost.

Little attention paid to what scholarly communication will/should look like after the monograph.

Plebeian Lives. Will be able to follow about 40% of the people who lived in 18th-century London all through their lives, through all of the data that’s been aggregated about them. http://www.shef.ac.uk/hri/projects/projectpages/plebeianlives.html

Also mentioned Old Bailey project. http://www.oldbaileyonline.org/

Machine learning, users training machines to generate better search results, machines learning from learners as they learn.

He’s also doing viz with large data sets in humanities: “This is really awesome and it’s pretty simple to do and it’s surprisingly deep.” Take compression algorithms (gzip, zip) and use them to do other cool things. Can use them to build a general purpose yardstick to see if two files are related or different. Use compressor’s ability to find redundancies between texts. Used on Dict Cdn Bio it sorts them out into clusters: Jesuits, Frobishers, Quebeckers, all fall into groups. Data compression serves as a proxy for the Kolmogorov complexity.

http://en.semapedia.org/ Print out 2D bar codes that link to the Wikipedia entry for the place where the code is stuck up. Geotagging.
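Generating one of those codes yourself takes a few lines of Python with the third-party qrcode library (the URL is just an example):

    import qrcode  # pip install qrcode[pil]

    # Make a QR code pointing at a Wikipedia article, ready to
    # print out and stick up at the place it describes.
    img = qrcode.make("http://en.wikipedia.org/wiki/CN_Tower")
    img.save("cn-tower-qr.png")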

Smartphones have compasses and accelerometers built in. Augmented reality. He has an Android.

Merging of physical and virtual changing the way we understand place and the way we understand the past. Each thing in a place is a time traveler from the past.

Some scenes become contested: a crime scene, plane crash, tracks at a watering hole. Each scene has interpreters who can make sense of what it means. Now we can attach virtual sources to physical objects. Every physical thing can carry more information about itself.

Quoted Bruce Sterling. Every object can become the protagonist of a documented process.

What happens to the idea of a curated object when every object knows its own history?

The future of public history looks like the holodeck, say Turkel’s students. In Star Trek, the holodeck is often used to simulate the past, eg Janeway being an Austen-era governess.

He tells students: Brainstorm gadgets, appliances that somehow magically dispense history. What would it be like if a tap on the wall could pour out history? But you can’t reinvent the holodeck.

They come up with: Heritage Knitting Needles that remember every pattern they’ve ever knitted. Reverse Babel Fish. Put it on your ears and make it so that everyone around you is speaking Middle English or Ancient Greek or some language you don’t know. Tangible Spray. Spray it around, it makes a cloud, and you can reach in and feel the past until the cloud has dissipated. [This reminded both Bess Sadler and me of Ubik.]

Star Trek world is a cradle-to-cradle universe. Everything exists for only a brief time: it comes out of the replicator, gets used until it’s no longer needed, then is recycled and rereplicated as something else. What’s the role of curation there? What about a Buddhist temple that’s burned down and rebuilt every 20 years? (It’s not been given heritage status because the object itself is not old enough, even though the ritual is thousands of years old.)

For way too long we’ve thought of computers as objects on desks. They let us become brains in vats if we want. We can use them to make virtual ghosts to haunt real places. They can let us manipulate physical objects and data at the same time.

Ubiquitous computing, fabrication. Student assignment: visualize the work of William Harvey. He used some ethically repugnant techniques in his work.

“Matter is the new medium.” With computers we can make documents on our desks, and now it’s getting so we can make objects on our desks. For less than $5,000 you can buy a laser scanner that will turn a 3D thing into a 3D model. They did that with a model heart for the Harvey project.

Digital data very plastic, reusable, scalable. They had a computer-controlled milling machine to print out physical objects. Using a computerized casting process means you can change the scale and size of the objects, which you can’t with regular casting.

Loop around between virtual and tangible. Take something real, digitize it, change it, make a new real thing. We can take advantage of each half and make hybrids that invite hacking, open source, reshaping.

Most of Turkel’s colleagues don’t get all this stuff he’s talking about. They think it’s for scientists and engineers.

He runs hackfests for humanists. “Ever since Lévi-Strauss we’ve been celebrating the bricoleur, but it’s another thing to be faced with a pile of waste and turn it into something else.”

We need new spaces and tools to do all this. Kindergartens are better equipped for hands-on research than grad schools. Not too hard to buy a milling machine, harder not to poison your students or set the carpet on fire when you’re soldering.

Role of hobbyists mostly written out of history of technology.

Mention of a woman who has a home fabbing setup and can make her own silicon chips. Toaster Project: a guy making a toaster from scratch, starting by mining iron.

RepRap, a 3D printer that can make many of its own parts: a machine that can (mostly) replicate itself.

Thingiverse.

Ultimate goal: you have the scanner, the 3D printer, and then a recycling device all on your table.

What happens if you could make all of those plastic things in stores? What if you could give away the plans for them, as open source?

Instructables. People sharing how to do things.

Pachube. Mash up things in the Internet of things, feeds of realtime data open to everyone.

“If you look through a Victorian book of mechanisms …” Now you can print out everything in that book to see how it works.

We can now print out things from past environments, handle them, feel them, use them to enrich the research process.


Access 2009, Saturday #2: Cathy Nelson Hartman and Mark Phillips: The Portal to Texas History

11 October 2009 conferences

Peter Zimmerman’s notes, video of the talk.

The Portal to Texas History looked like a great project, but the conference was catching up with me now and all I wrote down were two URLs: http://texashistory.unt.edu, http://beta.texashistory.unt.edu.


Access 2009, Friday #6: Roy Tennant, Inspecting the Elephant: Characterizing the Hathi Trust Collection

11 October 2009 conferences

Roy Tennant talked about the Hathi Trust and the work he’d done hacking around with the metadata. As interesting a project as it is and as his hacking was, it was overshadowed by the video he did to go with Stan Rogers’s song White Collar Holler, which many people bought from iTunes while it was still playing up at the front. Watch the video of the talk to see it.


Access 2009, Friday #5: Dorothea Salo, Grab a Bucket! It's Raining Data!

11 October 2009 conferences

I’ve been reading Dorothea Salo online for years and it was nice to finally meet her and to see her in action. She gave a very good talk. Watch the video of her talk to get the full benefit. One thing I liked: instead of putting a lot of words up on her slides she used big interesting pictures, and she’d reuse them, coming back to one she’d shown before but saying new things about the related subject. As her talk went on we saw some pictures several times, each connected to some particular topic, and the visuals helped us connect what she was saying to what she’d said before about the topic. Interesting technique.

Here are my notes, and as usual Peter Zimmerman blogged it copiously.

Exciting time to be a digital librarian: there’s a whole new world of stuff out there.

Says she’d been cast as the Cassandra of Open Access. But she’s definitely in favour of it—hard to be against something that is an unambiguous good. But actually working at it shows it’s not easy. We’re asking for something for nothing from faculty members. And the same kind of foresight and designing and planning is what she’s seeing in the data world. Is she now the Cassandra of data curation?

Says she’s going to point out a lot of problems about data curation, but she’s all in favour of it.

Focus on one problem: fit between content and container.

What do we know about research data? There’s a lot of it. Even if we admit that a lot of big projects (LHC) will take care of their own stuff, we have a lot left over to handle. Can we?

“Data are there to be interacted with.”

We’ll need to get rid of all technical barriers around reusing data. “Different kinds of data have different kinds of affordances.” They get used for different purposes.

“Data are wildly diverse” so you need different places to put them, depending on their nature. Two photographs can be treated the same way, but not all content can: a book encoded in TEI and a book that’s a collection of page scans need to be treated differently. Example of a scientist who takes lots of images of the same cell and then builds a 3D model of it: they need to be all tied together in a sensible way and they won’t all fit into eg DSpace.

“Data are already out there.”

All the data that we already have is sitting around in different silos and it’s all in danger. We made our own mess.

Lots of data is analog (eg handwritten lab notebooks) that needs to be digitized. Can we scale up to that? Today, probably not.

“Data are project-based.” Example: http://exploringthehyper.net/, a web-based thesis. How to preserve that? [dchud and vphill said in IRC: Fire up Heritrix.]

“Data are sloppy.” If our systems only accept nice clean data then they won’t handle real-world stuff.

“Data aren’t standardized.”

Our big bucket: The digital library. Another big bucket: The institutional repository. But you can’t just put an IR in place and then expect it to work. “There is no magic pixie dust for digital curation.”

Why keep calling things digital as though it’s something special? How do you then brand a digital collection if you think that digital is normal?

We built digital libraries same as print ones: careful selection and attention. Eg. Naskapi Lexicon at LAC.

Made a point about cyberinfrastructure going to the people with the money, but I missed it while checking IRC.

Archives don’t keep everything. They throw a lot out. So we can’t keep every bit of research data that comes our way. How do we decide? How do we balance giving good service to faculty and not swamping ourselves with garbage?

“Production is a Taylorist’s dream.” So you end up only digitizing the stuff that easily fits your production line model. Might have great images but no finding aids. Digital libraries specialize themselves by data types for efficiency’s sake. That’s a problem when more data comes in: it’s diverse in nature, it doesn’t fit into our Fordist model for handling it.

How do we handle it when the user’s technology and ours don’t match? Eg the web-based thesis. The best you could do with that site is to take on its whole technology stack and then future-proof it. That’s a lot of work, a ton of work.

She’s scared to death that so many librarians don’t realize all this is a problem. They just don’t realize what’s involved and what it’ll mean to handle it all.

“Everything about how we cost out and budget a project will have to change.”

What if you don’t have a Taylorist production model? You just hack something together for a given project, for now, not for the future? You have a silo. Many digital libraries are silos. Example: Decameron Web.

“Project silos are not really part of the web.” Article called them “cabinets of curiosities.”

Some librarians talk about the “context of an object” as though if you take something out of its context everything is broken. She says context is fluid. Context is constantly being built and rebuilt. We should expose our digital objects so they can be used and reused in new contexts: that’s not decontextualization, that’s REcontextualization.

Presentation is content-specific. Books browse differently than maps, etc.

We’ve lost a lot of projects already to all these problems. Mellon-funded digital humanities projects have disappeared completely.

What about IRs? “Institutional” is becoming a problem. If we want to bring stuff into our IR, we need to prove a link to our institution. The more tenuous the link, the more work involved, no matter how worthy the project.

We can take things into our IR that are final, static and immutable: but to researchers, that stuff is dead. By the time it’s in that state, they’re done with it.

IRs terrible at dealing with diverse kinds of data. Not everything is a PDF of a research paper.

People see the ugly interfaces we have (DSpace, Fedora) and run away.

“Any metadata you want … as long as it’s key-value pairs.” “Do anything you want … as long as it’s download.” No way to deal with real-world data.

Summarizing: “We need bigger, better buckets. Silos are both necessary and unacceptable. We have a lot of modeling to do. And meta-modeling. We have a lot of code to write. We can’t code or model in isolation. Fedora is the new world. But Fedora must change. Focus on the start … not so much on the finished product. Solr brings it all together.”

Hopes she can stop being the Cassandra of data curation and instead become its Clio.


Access 2009, Friday #7: Bess Sadler, Blacklight: Findability for Your Whole Collection

11 October 2009 conferences

Bess Sadler was back up, talking about Blacklight. Pete Zimmerman took notes on this and the next talk, which were both short and took up one 45-minute slot. See the video.

“If your interface requires instructions, it needs to be redesigned.” —Dan Rubin

Bad things about their old catalogue:

  • lack of relevance ranking
  • lack of permanent URLs/RSS
  • siloing of collections
  • lack of object type-appropriate behaviour
  • inability to respond to user requests and suggestions

Data sources they want to bring all together:

  • library catalogue
  • IR
  • theses and dissertations
  • Google Books
  • library digitization projects
  • departmental digitization projects
  • faculty research output
  • archival finding aids
  • licensed journals and databases

“Solr is the anti-silo.”

“The 30,000 steamboats problem.” How do you make sense out of piles of stuff that are all the same and overwhelm all the other results?

They use Cucumber to define and test how their relevancy rankings should work, giving specific cases and the expected results. When her librarians send her bug reports about rankings they send them in Cucumber format!!
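I didn’t write down one of their scenarios, but a relevancy expectation in Cucumber’s Gherkin format would look something like this (a made-up example, not one of theirs):

    Feature: Relevancy ranking
      Scenario: Exact title match comes first
        Given I am on the catalogue search page
        When I search for "origin of species"
        Then the first result should have the title "On the Origin of Species"

The nice thing is that a report like that is both readable by humans and runnable as an automated test.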

On the home page, they assume that the most relevant result is an exact match on the title. On the special music search page that’s not true, so they boost the importance of names. Music students have such special needs that they did special stuff for them.
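In Solr terms, that kind of context-dependent boosting can be done with a dismax query where the field weights change per page; this is my own sketch, not their actual configuration, and the host, fields, and weights are all made up:

    import urllib.parse
    import urllib.request

    # On the music search page, boost name/author matches over titles.
    params = urllib.parse.urlencode({
        "q": "beethoven sonatas",
        "defType": "dismax",
        "qf": "author^100 title^10 text^1",  # hypothetical fields and weights
        "wt": "json",
    })
    url = "http://localhost:8983/solr/select?" + params  # assumed Solr location
    print(urllib.request.urlopen(url).read()[:500])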

http://projectblacklight.org

Plugin structure allows for local customizations without forking.


Access 2009, Friday #9: Andrew Nagy and Heather Tones White: Mobile Apps

11 October 2009 conferences

This was another twofer, with two people talking about mobile apps. Now that I have an Android these are much more on my mind! The video of the talk isn’t up and it looks like Pete Zimmerman didn’t blog it so I don’t have any supplemental links for you.

(Before I give my short notes, I hope people will bear in mind that not every mobile phone is an iPhone. Building iPhone apps is building for a very closed, locked-down environment. People should build web sites that work on all devices. Thought: the mobile web is where the web was in 1996, when everyone was building web sites optimized for one particular browser. Nevertheless, all of this mobile work is exciting stuff and I’m glad to see it.)

Andrew Nagy showed a Duke U iPhone app, then the U Virginia mobile app. The Internet was b0rked so he couldn’t pull them up. http://mobile.virginia.edu/, http://m.lib.virginia.edu/.

Heather Tones White (who put her talk in U Saskatchewan’s institutional repository) showed a video showing how the U Sask mobile app works.

20% of incoming students have an iPhone or iPod Touch. 77% of arts and sciences students have a cell phone, mostly used for talk and text.

Mobile version of web site will launch soon. Using a customized version of Drupal’s Mobile Tools module. They use the browscap module to detect the device type.

Athabasca U is working on a Device Detection Framework.

I asked why they made it an iPhone app instead of a mobile-friendly web site, and they said it basically grew out of a course about iPhone development. The library’s site is a mobile site, not an app.