Last month I wrote about poking into the CBC’s list of public appearances made by its on-air hosts, which emerged after an interview on Jesse Brown’s Canadaland caused a ruckus. Eventually the CBC started to make available data about where its hosts appeared and whether they got paid. Out of curiosity, I decided to dig into the data a little more. Here’s what I found, and how.
As I write this in early July, the default data showing on the appearances listing is for June. It looks like this:
If you look at the source you won’t see all of the data in the page itself—it’s not a big table that someone entered by hand, for example. Where’s it all coming from? It’s not obvious in the page source because it’s happening in some Javascript, but if you use Firefox’s web console (Ctrl-Shift-i) to follow what’s going on when the page loads, you’ll see that along with all the stuff it’s getting from the CBC’s web site it’s also getting something from Google:
The line in question is:
GET https://spreadsheets.google.com/feeds/list/11Kk-vaj_MKGZdImP54YEh-KxhUMjzDINtZLohfnvbLU/6/public/values?alt=json
Aha, a Google spreadsheet! What’s in there if we look at it ourselves? It’s JSON, so run it through jsonlint to clean it up:
curl "https://spreadsheets.google.com/feeds/list/11Kk-vaj_MKGZdImP54YEh-KxhUMjzDINtZLohfnvbLU/6/public/values?alt=json" | jsonlint | more
Here’s the top:
Bunch of Google stuff … let’s skip by that and look at a chunk from the middle:
Now we’re getting somewhere. There’s a name, a date, an event, a role, and a fee category. There’s the data! It’s all jumbled up and messy the way we’re seeing it, but that’s because it’s ugly JSON. Some Javascript is taking this and reformatting it into a nice table.
Going back to the CBC appearances page, notice there’s also a “Network Radio” tab in the upper right. That data is coming from another Google spreadsheet. Using the console shows this request:
GET https://spreadsheets.google.com/feeds/list/1qqXnT1--bKn2qXigFqoaH09T9vBm4dXlQNQgTjS60tE/6/public/values?alt=json
The ugly string in the middle of the URL is different.
According to the docs on Google’s spreadsheet API, when you publish a spreadsheet the URL has this format:
https://spreadsheets.google.com/feeds/list/key/worksheetId/public/basic
The ugly string in the URLs is the key and 6 is the worksheet ID. June is the sixth month. What if we change that to 5? Turns out it works: you get May’s data.
So now we know that CBC is keeping this appearance data in two Google spreadsheets, each of which has one worksheet per month, and both spreadsheets have been published to the web, which means the data is available in JSON or XML. That’s good, but it’s awkward. I wish the CBC was making the data available in a simple format. They’re not, but we can easily work around that.
I wrote a little Ruby script, cbcappearances.rb, that runs through both spreadsheets and all their monthly worksheets, grabs all the data, and dumps it out in nice, easy to munge CSV. The resulting CSV looks like this:
name,date,event,role,fee
Adrian Harewood,2014-04-24,Don't Quit your Day Job Fundraiser,Host,Unpaid
Heather Hiscox,2014-04-24,Canadian Medical Hall of Fame 20th Anniversary Gala,Host,Paid
Peter Mansbridge,2014-04-25,Canadian Centre for Male Survivors of Child Sexual Abuse Fundrasier,Speech,Unpaid
David Gray,2014-04-25,Canadian Centre for Male Survivors of Child Sexual Abuse Fundrasier,Host,Unpaid
Miyoung Lee,2014-04-25,RBC Top 25 Canadian Immigrant Awards,Host,Unpaid
All of the data in one file and in a format any program can read. (And with dates specified properly.) Perfect!
You can poke at this and dig into it with whatever tool you want—it’s easy to load into any spreadsheet program—but I like R, so I wrote a script that just dumps out some little text reports, host-activity.R:
(If you get the data and run these commands one by one, you can examine the data frames at each step to see how the data gets simplified and reshaped.)
If you run it, it doesn’t make any fancy charts, it just tells you who’s been busy and how:
[1] "Define as being 'busy' anyone who has more than this many appearances:"
[1] 2.333333
[1] "Who is busy?"
name Expenses Paid Unpaid Total Paid.PerCent
1 Lucy van Oldenbarneveld 0 0 12 12 0
2 Adrian Harewood 0 0 10 10 0
3 Amanda Lang 0 8 0 8 100
4 Nora Young 0 6 2 8 75
5 Peter Mansbridge 0 4 4 8 50
6 Rex Murphy 0 8 0 8 100
7 Bob McDonald 0 4 3 7 57
8 Laurence Wall 0 0 7 7 0
9 Matt Galloway 0 0 7 7 0
10 Bruce Rainnie 0 0 6 6 0
11 Doug Dirks 0 1 5 6 17
12 Evan Solomon 0 3 3 6 50
13 Wendy Mesley 0 4 2 6 67
14 Craig Norris 0 0 5 5 0
15 Heather Hiscox 0 5 0 5 100
16 Miyoung Lee 0 0 5 5 0
17 Stephen Quinn 0 0 5 5 0
18 Brian Goldman 0 4 0 4 100
19 Jian Ghomeshi 0 4 0 4 100
20 Rick Cluff 0 0 4 4 0
21 Shelagh Rogers 0 0 4 4 0
22 Stephanie Domet 0 1 3 4 25
23 Tom Harrington 2 0 2 4 0
24 Anne-Marie Mediwake 0 0 3 3 0
25 Dave Brown 0 0 3 3 0
26 David Gray 0 0 3 3 0
27 Duncan McCue 0 1 2 3 33
28 Frank Cavallaro 0 0 3 3 0
29 Harry Forestell 0 0 3 3 0
30 Joanna Awa 0 0 3 3 0
31 Loren McGinnis 0 0 3 3 0
32 Margaret Gallagher 0 0 3 3 0
33 Mark Connolly 0 2 1 3 67
34 Mark Kelley 0 2 1 3 67
35 Matt Rainnie 0 0 3 3 0
36 Shane Foxman 0 0 3 3 0
37 Wendy Bergfeldt 0 0 3 3 0
[1] "Of busy people, who has only done paid appearances?"
name Expenses Paid Unpaid Total Paid.PerCent
1 Amanda Lang 0 8 0 8 100
2 Rex Murphy 0 8 0 8 100
3 Heather Hiscox 0 5 0 5 100
4 Brian Goldman 0 4 0 4 100
5 Jian Ghomeshi 0 4 0 4 100
[1] "Of busy people, who has never done a paid appearance?"
name Expenses Paid Unpaid Total Paid.PerCent
1 Lucy van Oldenbarneveld 0 0 12 12 0
2 Adrian Harewood 0 0 10 10 0
3 Laurence Wall 0 0 7 7 0
4 Matt Galloway 0 0 7 7 0
5 Bruce Rainnie 0 0 6 6 0
6 Craig Norris 0 0 5 5 0
7 Miyoung Lee 0 0 5 5 0
8 Stephen Quinn 0 0 5 5 0
9 Rick Cluff 0 0 4 4 0
10 Shelagh Rogers 0 0 4 4 0
11 Tom Harrington 2 0 2 4 0
12 Anne-Marie Mediwake 0 0 3 3 0
13 Dave Brown 0 0 3 3 0
14 David Gray 0 0 3 3 0
15 Frank Cavallaro 0 0 3 3 0
16 Harry Forestell 0 0 3 3 0
17 Joanna Awa 0 0 3 3 0
18 Loren McGinnis 0 0 3 3 0
19 Margaret Gallagher 0 0 3 3 0
20 Matt Rainnie 0 0 3 3 0
21 Shane Foxman 0 0 3 3 0
22 Wendy Bergfeldt 0 0 3 3 0
[1] "Of the other busy people, who has some paid and some unpaid appearances?"
name Expenses Paid Unpaid Total Paid.PerCent
1 Nora Young 0 6 2 8 75
2 Peter Mansbridge 0 4 4 8 50
3 Bob McDonald 0 4 3 7 57
4 Doug Dirks 0 1 5 6 17
5 Evan Solomon 0 3 3 6 50
6 Wendy Mesley 0 4 2 6 67
7 Stephanie Domet 0 1 3 4 25
8 Duncan McCue 0 1 2 3 33
9 Mark Connolly 0 2 1 3 67
10 Mark Kelley 0 2 1 3 67
Lucy van Oldenbarneveld was extra busy because she hosted six gigs at the Ottawa Jazz Festival.
Rex Murphy and Amanda Lang lead the ranks of people who have only done paid appearances so far. Where did they talk?
Rex Murphy:
- Association of Professional Engineers and Geoscientists of Alberta
- Canadian Association of Members of Public Utility Tribunals
- Building & Construction Trades Unions (Canadian Office)
- CGOV Asset Management
- Jewish National Fund of Canada
- Association of Canadian Community Colleges
- Fort McMurray Chamber of Commerce: conference
- Canadian Taxpayers Federation
Amanda Lang:
- Canadian Hotel Investment Conference
- BASF Canada
- Canadian Restaurant Investment Summit
- Young Presidents Organization
- Alberta Urban Development Institute
- GeoConvention Show Calgary
- Manulife Asset Management Seminar
- Manulife Asset Management Seminar
Hmm. Both generally businessy, and tilting Alberta-wards. Compare to Toronto radio host Matt Galloway, all of whose seven appearances were unpaid:
- Food on Film - TIFF Series (3 of 6)
- Toronto Region Immigrant Employment Council 10th Anniversary & Immigrant Success Awards
- Food on Film - TIFF Series (4 of 6)
- Dundas West Fest Lulaworld Stage
- YorkTown Family Services Dinner
- Toronto Community Foundation Vital Toronto Celebration
- Toronto International Film Festival Food on Film Series
The listings start on 24 April, so there’s only two months and a bit worth of data in the feeds right now, but it’s a good start. It will be interesting to watch how things change over the course of the year. I’ll rerun the script every now and then and if anything looks interesting, I’ll post about it. Of course, going deeper than this, it will be important to see if anyone seems to be in a conflict of interest.