I work at a university library, and when I analyse data I like to arrange things by academic year (September to August), so I often need to find the academic year for a given date. Here are Ruby and R functions I made to do that. Both are pretty simple. They could be better, I’m sure, but they’re good enough for now. They use the same method: subtract eight months, so that 1 September lands on 1 January, and then find the year you’re in.
The Ruby version is the shorter of the two, and uses the Date class. First, subtract eight months, with <<. The documentation says:
d << n: Returns a date object pointing n months before self. The n should be a numeric value.
Rather cryptic. Then we find the year with .year, which is pretty clear. This is the function:
require 'date'
def academic_year(date)
  # Step back eight months, then take the year of the resulting date.
  (Date.parse(date) << 8).year
end
Example:
> academic_year("2016-09-22")
=> 2016
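To see the subtract-eight-months rule at the edges of the academic year, here is a small sketch that repeats the function and runs it on dates either side of the 1 September boundary. The sample dates are just picked for illustration.

```ruby
require 'date'

# Same function as in the post: step back eight months, then read off the year.
def academic_year(date)
  (Date.parse(date) << 8).year
end

# Illustrative dates on either side of the 1 September boundary.
%w[2016-08-31 2016-09-01 2016-12-31 2017-01-01 2017-08-31].each do |d|
  puts "#{d} => #{academic_year(d)}"
end
```

The last day of August 2016 still belongs to academic year 2015, and everything from 1 September 2016 to 31 August 2017 comes back as 2016.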
The function is very short because Ruby nicely handles leap years and months of varying lengths: if the target month is too short, << falls back to its last day. What is 30 October 2015 minus eight months?
> Date.parse("2015-10-30") << 8
=> #<Date: 2015-02-28 ((2457082j,0s,0n),+0s,2299161j)>
2016 is a leap year, so what is 30 October 2016 minus eight months?
> Date.parse("2016-10-30") << 8
=> #<Date: 2016-02-29 ((2457448j,0s,0n),+0s,2299161j)>
Sensible. And the function returns a number (a Fixnum, or an Integer from Ruby 2.4 on), not a string, which is what I want.
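Because << only ever adjusts the day within the target month, the clamping never changes which year you land in. As a rough check, this sketch compares the eight-month method with a plain month comparison over every day from 2015 to 2017. The helper academic_year_by_month is a hypothetical alternative written for this comparison, not something the function above uses.

```ruby
require 'date'

# The approach from this post: step back eight months, then take the year.
def academic_year(date)
  (Date.parse(date) << 8).year
end

# Hypothetical alternative for comparison: September onwards counts as the
# current year, January to August as the previous one.
def academic_year_by_month(date)
  d = Date.parse(date)
  d.month >= 9 ? d.year : d.year - 1
end

# Check every day from 2015 through 2017, which includes the 2016 leap year.
mismatches = (Date.new(2015, 1, 1)..Date.new(2017, 12, 31)).reject do |d|
  s = d.iso8601
  academic_year(s) == academic_year_by_month(s)
end

puts mismatches.empty? ? "both methods agree" : "disagree on: #{mismatches.first(5).join(', ')}"
```

Either version would do; the << one is shorter, and the comparison one avoids date arithmetic altogether.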
In R things are more complicated. “How to subtract months from a date in R?” gives a few answers, but none are pretty. Using lubridate makes things much easier (and besides, I use lubridate in pretty much everything anyway).
library(lubridate)
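academic_year <- function(date) {
  # Floor to the first of the month, subtract eight months,
  # floor to 1 January, then pull out the year as an integer.
  as.integer(format(floor_date(floor_date(as.Date(date), "month") - months(8), "year"), "%Y"))
}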
Example:
> academic_year("2016-09-22")
[1] 2016
The floor_date function gets called twice. The first call drops back to the start of the month, which avoids a problem with short months: lubridate returns NA when subtracting months would land on a day that doesn’t exist, such as 30 February:
> as.Date("2016-10-30") - months(8)
[1] NA
But you can always subtract 8 months from the first of a month. Then the function goes to 01 January of that year, pulls out just the year (“%Y”) and returns it as an integer. I’m sure it could be faster.
And once the academic year is identified, when making charts it’s nice to have September–August on the x axis. I often do something like this, with a data frame called data that has a date column:
library(dplyr) # I always use it
library(lubridate)
data <- data %>% mutate(month_name = month(date, label = TRUE))
data$month_name <- factor(data$month_name, levels = c("Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"))
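For the Ruby side, the same September-to-August ordering can be sketched by rotating the built-in month abbreviations rather than typing them out. The names academic_order and academic_month_index are made up for this example.

```ruby
require 'date'

# Date::ABBR_MONTHNAMES is [nil, "Jan", ..., "Dec"]; compact drops the leading nil.
calendar_order = Date::ABBR_MONTHNAMES.compact

# Rotate by eight so the list starts at September and ends at August.
academic_order = calendar_order.rotate(8)

# Position of a date's month within the academic year (Sep = 0, Aug = 11).
def academic_month_index(date)
  (Date.parse(date).month - 9) % 12
end

puts academic_order.join(" ")
puts academic_month_index("2016-09-22")
```

Sorting rows by academic_month_index gives the same order as the factor levels above.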
Finding the academic year of a date could be a code golf thing, but Stack Overflow has too many rules.