Miskatonic University Press

Basic citations in Org (Part 1)


I did some sessions at work showing what Zotero can do, and my preparations got me caught up on the improvements in version 6 almost two years ago, which I’d read about but not tried. It’s fantastic. A group at Rice University did a great twenty-minute video that covers it all: Reading, Annotating, Note-taking, and Drafting/Outlining with Zotero 6.

The Zotero project did incredible work on this upgrade. Zotero was already the best research management tool and general-purpose citation manager around, and now you can use it for PDF annotations, note-taking and draft-writing, and then easily move all that into your word processor, carrying all your citations along. I’ve always recommended it and now have even more reasons to do so (with still more when version 7 comes out).

But I don’t use it that way myself. Notes I take digitally I do in Org mode in Emacs, and if I want to mark up a PDF I print it and use a (mechanical) pencil. I use Zotero mostly as a research management tool, collecting citations and PDFs and web snapshots; lately I’m starting to use it to help with Wikipedia editing (see Wikipedia’s Citing sources with Zotero and Zotero’s Zotero and Wikipedia/Wikidata).

Seeing everything Zotero can do now made me wonder: Can I do that in Emacs with Org? Zotero users can go all in and make it their main research tool. That has a lot to offer. But I don’t want to write in Zotero, because I write in Org. Where should I draw the line between Zotero and Emacs?

That sent me looking into various Emacs packages and tools. It got complicated. This is a big subject. I decided to start with a core feature of any such system: citations. I’m a librarian. Citations I understand.

Citations in Org

A citation system in Org was released in summer 2021 after years of discussion and a lot of intense work. It builds on some great existing work and is itself an extremely impressive achievement. Handling citations is hard and now Org can do it.

But I’d never tried it, not once. And I’m a librarian! I decided I was going to learn it. I enjoy formatting citations and bibliographies by hand—indeed I enjoy everything about The Chicago Manual of Style—but now is the time to figure out how Org does them. The Org manual pages on citations are still rather sparse, and I thought this would be a path to me adding some documentation to improve them. That’s my plan.

When the citation system came out, the best documentation on it was in July 2021’s This Month in Org by the mononymic Timothy. I think this is still the main thing people refer to when they want to know how the system works. Timothy’s piece is very thorough, and it was a huge contribution to helping people understand how the new features worked. I found his example citation unclear, however, because the author is “org, mode and Syntax, Citation and List, Mailing and Effort, Time.” There is so much going on there it made it hard for me to see how things worked.

The citation processors

Citations in Org are meant to be exported to other document formats such as LaTeX (for PDFs), OpenDocument or HTML. As we’ll see, that’s where a sort of formula specifying a citation is turned into something readable. There are five citation processors available to do this exporting: basic and csl, which export to several different formats, and bibtex, biblatex and natbib, which only go to LaTeX.

I decided to start by working through a simple example with the basic processor, which is the only one requiring no dependencies or anything from outside. It turned out there were some bugs with it, which were fixed by Org maintainer Ihor Radchenko (for example this commit). Clearly some tests are needed for citations—but perhaps no one had tried using the basic system in these past two-and-a-half years? Yet another reason to try them out and write them up.

Here is the first part of my look into the basic citation system in Org.

Disclaimer

This is at the end of the commentary section at the top of oc-basic.el, the file with the code that controls the basic processor.

;; Disclaimer: this citation processor is meant to be a proof of concept, and
;; possibly a fall-back mechanism when nothing else is available.  It is too
;; limited for any serious use case.

This is true, but it’s still a great place to start.

Example book and .bib

First, we need something to cite. I’m going to use LaTeX and Friends by M.R.C. van Dongen (Berlin: Springer, 2012). I chose this for three reasons: first, it’s about LaTeX, which will be part of all this; second, the author’s surname begins with a lower-case letter, which will help with examples; and third, it’s a good book. Check out this fifteen-minute video about it.

Next, we’re going to put that book’s metadata into a .bib file, a bibliography database format used by BibTeX and BibLaTeX. We will skip those tools for now but come back to them later. I’ll also come back to the excellent Better BibTeX Zotero extension, which is going to be important for all this and can generate these files magically. For now, we’ll just make a file called Basic.bib that has this in it:

@book{friends,
  title = {{{LaTeX}} and Friends},
  author = {van Dongen, M.R.C.},
  date = {2012},
  location = {Berlin},
  publisher = {Springer},
  doi = {10.1007/978-3-642-23816-1},
  isbn = {978-3-642-23816-1}
}

This says we have a book which will be identified with the key “friends”. We know the title, author (in Surname, Forename order), date of publication, place of publication as “location,” the name of the publisher, the digital object identifier and the International Standard Book Number. Different citation styles will use or ignore this information in their own ways.

“{{LaTeX}}” is in curly braces so its unusual capitalization will be preserved. “Bib(La)TeX case protection rules are incredibly convoluted,” as the Better BibTeX FAQ says.

Example Org file

Now we’re ready to work on an Org file. Let’s make basic.org. We’ll use some settings to keep exports clean and simple: no title, author, date or table of contents, and don’t number sections.

#+options: title:nil author:nil date:nil toc:nil num:nil

#+bibliography: Basic.bib

"Most scholarly works have citations and a bibliography or reference
section," wrote a computer scientist [cite:@friends].

That [cite:@friends] is an Org “citation object” in its simplest form. It means: cite the work identified with the key “friends”. Where to look for the metadata about this work? In the Basic.bib bibliography file, as specified with #+bibliography: Basic.bib.

Exporting to text

Since we’re keeping things simple, we’ll start by exporting to plain text (C-c C-e t A), which gives:

"Most scholarly works have citations and a bibliography or reference
section," wrote a computer scientist (van Dongen, M.R.C., 2012).

It works!

What citation style is “(van Dongen, M.R.C., 2012)” using? It’s Org’s default basic author-year style. We’ll come back to that later.

When we have citations, we need a bibliography. We add that with one line (#+print_bibliography:), and give it a heading to make it look nicer:

#+options: title:nil author:nil date:nil toc:nil num:nil

#+bibliography: Basic.bib

"Most scholarly works have citations and a bibliography or reference
section," wrote a computer scientist [cite:@friends].

* Bibliography

#+print_bibliography:

Then we export again:

"Most scholarly works have citations and a bibliography or reference
section," wrote a computer scientist (van Dongen, M.R.C., 2012).

Bibliography
============

  van Dongen, M.R.C. (2012). /{{LaTeX}} and Friends/, Springer.

That looks a bit strange. We see the braces because of a bug in Emacs where something else should be tidying up the BibTeX formatting but doesn’t. This shows in text, HTML and ODT exports. We also see /slashes/ on either end of the title, but that is just the text way of indicating italics. The basic processor is so basic that’s all it can do.

This isn’t showing Org and its citations in the best light, so I’ll switch to exporting to LaTeX and making PDFs, which look much nicer.

Exporting to LaTeX

Let’s export basic.org to LaTeX and generate a PDF, with C-c C-e l p (this assumes a working LaTeX system is installed, of course).

[Figure: LaTeX export example]

Phew! That looks a lot better.

Styles and variants

I said [cite:@friends] is the simplest form of citation. There are several “styles” that can be added to it, and most styles have variants. (The word “style” is being unfortunately overloaded given the context of “citation styles” meaning The Chicago Manual of Style and such, but that’s the way it is.) The styles make the citations look different, sometimes very much so. The variants control if the citation is wrapped in brackets and if the first letter is capitalized. [cite:@friends] is in fact using the default style with no variant.

These are the available styles and their variants (as listed in lisp/oc-basic.el in the source code):

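| style     | code | variants              | intention           |
|-----------+------+-----------------------+---------------------|
| (default) |      | bare, caps            |                     |
| author    | a    | caps                  | show only author(s) |
| note      | ft   | bare, bare-caps, caps | footnotes           |
| nocite    | n    |                       | put in bibliography |
| noauthor  | na   | bare                  | date only           |
| numeric   | nb   |                       | use numbers         |
| text      | t    | bare, bare-caps, caps | plain text          |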

The variants use codes b (bare; no brackets), bc (bare-caps) or c (caps; first letter of the name is capitalized). Styles and variants are specified using slashes after cite in the citation. For example, the author style uses a, so to use it the default way you would specify [cite/a:@friends], or to use the caps variant, [cite/a/c:@friends]. To use the default style with caps variant, use [cite//c:@friends] with no style code given.
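As a rough illustration of that syntax, here is a sketch of how a few of these citation objects might sit in running prose. The sentences are invented for this example; only the citation objects follow the scheme just described.

```org
#+bibliography: Basic.bib

# Default style, no variant: author and year in brackets.
LaTeX has many companion tools [cite:@friends].

# Text style with the caps variant, for the start of a sentence.
[cite/t/c:@friends] covers LaTeX and its companion tools.

# Noauthor style with the bare variant: just the year, no brackets.
The book appeared in [cite/na/b:@friends].
```

The caps variant earns its place here because the author’s surname begins with a lower-case letter and the sentence needs a capital at its start.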

Here is a table of styles (s), variants (v), how they’re specified, what the citation object looks like in the raw, and what it becomes when exported.

| s  | v  | codes  | citation              | result                     |
|----+----+--------+-----------------------+----------------------------|
|    |    |        | [cite:@friends]       | (van Dongen, M.R.C., 2012) |
|    | b  | //b    | [cite//b:@friends]    | van Dongen, M.R.C., 2012   |
|    | c  | //c    | [cite//c:@friends]    | (Van Dongen, M.R.C., 2012) |
| a  |    | /a     | [cite/a:@friends]     | van Dongen, M.R.C.         |
| a  | c  | /a/c   | [cite/a/c:@friends]   | Van Dongen, M.R.C.         |
| ft |    | /ft    | [cite/ft:@friends]    | ¹                          |
| ft | b  | /ft/b  | [cite/ft/b:@friends]  | ²                          |
| ft | bc | /ft/bc | [cite/ft/bc:@friends] | ³                          |
| ft | c  | /ft/c  | [cite/ft/c:@friends]  | ⁴                          |
| n  |    | /n     | [cite/n:@friends]     |                            |
| na |    | /na    | [cite/na:@friends]    | (2012)                     |
| na | b  | /na/b  | [cite/na/b:@friends]  | 2012                       |
| nb |    | /nb    | [cite/nb:@friends]    | (1)                        |
| t  |    | /t     | [cite/t:@friends]     | van Dongen, M.R.C. (2012)  |
| t  | b  | /t/b   | [cite/t/b:@friends]   | van Dongen, M.R.C. 2012    |
| t  | bc | /t/bc  | [cite/t/bc:@friends]  | Van Dongen, M.R.C. 2012    |
| t  | c  | /t/c   | [cite/t/c:@friends]   | Van Dongen, M.R.C. (2012)  |

Here’s the table in the LaTeX export, with an extra column specifying the style name.

[Figure: LaTeX export of table of examples]

Notice that the bare (b) variants don’t have brackets, and the caps (c) variants turn “van Dongen” into “Van Dongen.” Bare-caps (bc) does both.

You can see the nocite (n) style is unusual because it produces nothing. Its use is to force the entry into the bibliography even though the work is not cited; this need arises now and then.
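A minimal sketch of where nocite might be used: a file that lists the book in its bibliography without citing it anywhere in the text. The paragraph is invented for illustration.

```org
#+bibliography: Basic.bib

This paragraph mentions no sources by name.

# Nothing is printed for this citation, but the entry joins the bibliography.
[cite/n:@friends]

* Bibliography

#+print_bibliography:
```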

The numeric (nb) style is different because it’s pointing directly to the bibliography. We’ll get to that next.

I recognize these as more or less common citation forms—shown in a simple way—except for some of those text (t) variants. Maybe I’ll figure them out later. It’s good the basic citation processor is complete and offers all variants, because it helps show what’s going on. As the disclaimer said, it’s a proof of concept.

The footnotes look like this. These variants in order are default, bare, bare-caps and caps.

¹ van Dongen, M.R.C. (2012)

² van Dongen, M.R.C. 2012

³ Van Dongen, M.R.C. 2012

⁴ Van Dongen, M.R.C. (2012)

Bibliographies and cite_export

The bibliography generated after that export looks the same as before:

[Figure: Very simple bibliography]

That works for all the citations but one: the numeric style, where the citation was “(1).” For that we need a matching number in the bibliography. This will give the bones of the Vancouver system, which is common in the physical sciences.

To make that appear we need to use the third part of the citation system, the cite_export keyword, which we’ve left out so far. By not specifying it we were using all the defaults. The default citation processor is basic, which we wanted. To have a numbered bibliography matching the numbered citations we’ll need to get away from other defaults and make some customizations, which I’ll cover next.
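For orientation, the keyword takes a processor name, optionally followed by a bibliography style and a citation style. The sketch below spells out the default, then shows (commented out) an assumed form for a numeric setup. That second line is an untested guess at the style names, not something shown in the exports above, so check the Org manual and oc-basic.el before relying on it.

```org
# Same as leaving the keyword out: the basic processor with its defaults.
#+cite_export: basic

# Assumed form for numbered output: processor, bibliography style, citation style.
# Only one cite_export line should be active in a file.
# #+cite_export: basic numeric numeric
```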