Sean Craig’s piece on Canadaland, “Amanda Lang took money from Manulife & Sun Life, gave them favourable CBC coverage,” got me looking again at the cbcappearances script I wrote earlier this year.
It wasn’t getting any of the recent appearances. It looks like the CBC changed how they store the data that is presented: instead of pulling it in on the fly from Google spreadsheets, it’s all in the page in a hidden table (generated by their content management system, I guess) and shown as needed.
They should be making the data available in a reusable format, but they still aren’t, so we need to scrape it. That’s easy, so I updated the script and regenerated appearances.csv, a nice reusable CSV file suitable for importing into your favourite data analysis tool. The last appearance listed was on 29 November 2014; I assume the December ones will show up in January.
The data shows 218 people have made 716 appearances since 24 April 2014. A quick histogram of appearances per person shows that most made only 1 or 2 appearances, and then it quickly tails off. I did all of this in R.
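The counting and the histogram can be sketched roughly as follows. This is a minimal illustration, not the original code: the column names `name` and `fee` are assumptions about how appearances.csv is laid out, and the tiny data frame is made-up stand-in data so the snippet runs on its own.

```r
# Made-up stand-in for appearances.csv; column names are assumptions.
# With the real file, something like read.csv("appearances.csv") would go here.
appearances <- data.frame(
  name = c("Host A", "Host A", "Host A", "Host B", "Host C", "Host C"),
  fee  = c("Paid", "Unpaid", "Paid", "Unpaid", "Expenses", "Unpaid"),
  stringsAsFactors = FALSE
)

# One row per person with the number of appearances they made.
per_person <- as.data.frame(table(name = appearances$name),
                            responseName = "count",
                            stringsAsFactors = FALSE)

# Histogram with one bar per whole number of appearances.
hist(per_person$count,
     breaks = seq(0.5, max(per_person$count) + 0.5, by = 1),
     main = "Appearances per person",
     xlab = "Number of appearances")
```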
The median number of appearances is 2, the mean is about 3.3, and the third quartile is 4. Let’s label anyone above the third quartile as “busy,” pick out everyone who is busy, and then make a data frame of just the appearance information about busy people.
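One way to do that selection in base R is sketched below. The column names and the data are made up for illustration, and "busy" is taken to mean strictly more appearances than the third quartile, which is an assumption about where the cut-off sits.

```r
# Made-up stand-in data; column names are assumptions.
appearances <- data.frame(
  name = c(rep("Host A", 6), rep("Host B", 2), "Host C", "Host D",
           rep("Host E", 3)),
  fee  = c(rep("Paid", 4), rep("Unpaid", 9)),
  stringsAsFactors = FALSE
)

# Appearances per person, as a plain numeric vector plus matching names.
counts <- table(appearances$name)
n <- as.vector(counts)

# Median, mean and quartiles in one go.
print(summary(n))

# Third quartile (R's default quantile method), then everyone above it.
q3 <- unname(quantile(n, 0.75))
busy_names <- names(counts)[n > q3]

# Keep only the appearance rows belonging to busy people.
busy <- appearances[appearances$name %in% busy_names, ]
```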
busy is a data frame of information about who did what where, but only for people with more than 4 appearances. It’s easy to do a stacked bar chart that shows how many of each type of fee (Paid, Unpaid, Expenses) each person received. There aren’t many situations where someone did a gig for expenses (red). Most are unpaid (blue) and some are paid (green).
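A stacked bar chart like that can be drawn in base R from a fee-by-person table. This is only a sketch: the `busy` data frame here is made-up stand-in data with assumed column names, and the colours are picked to match the description above rather than taken from the original plot.

```r
# Made-up stand-in for the busy data frame; column names are assumptions.
busy <- data.frame(
  name = c(rep("Host A", 5), rep("Host B", 5)),
  fee  = c("Paid", "Paid", "Unpaid", "Unpaid", "Expenses",
           "Unpaid", "Unpaid", "Unpaid", "Unpaid", "Unpaid"),
  stringsAsFactors = FALSE
)

# Fix the order of fee types so the colours below line up with them.
busy$fee <- factor(busy$fee, levels = c("Expenses", "Paid", "Unpaid"))

# Rows are fee types, columns are people.
fee_counts <- table(busy$fee, busy$name)

# barplot() stacks the rows of a matrix within each column by default.
barplot(fee_counts,
        col = c("red", "green", "blue"),
        legend.text = TRUE,
        las = 2,
        ylab = "Appearances")
```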
Lawrence Wall is doing a lot of unpaid appearances, and has never done any for pay. Good for him. Rex Murphy is the only busy person who only does paid appearances. Tells you something, that.
Let’s pick out just the paid appearances of the busy people. No need to colour anything this time.
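A rough sketch of that step, again on made-up data with assumed column names: filter to the paid rows, count per person, and plot the counts from most to fewest.

```r
# Made-up stand-in for the busy data frame; column names are assumptions.
busy <- data.frame(
  name = c(rep("Host A", 5), rep("Host B", 5), rep("Host C", 5)),
  fee  = c("Paid", "Paid", "Paid", "Unpaid", "Unpaid",
           "Paid", "Unpaid", "Unpaid", "Unpaid", "Unpaid",
           "Unpaid", "Unpaid", "Unpaid", "Unpaid", "Expenses"),
  stringsAsFactors = FALSE
)

# Only the paid appearances.
paid <- busy[busy$fee == "Paid", ]

# Paid appearances per person, largest first.
paid_counts <- sort(table(paid$name), decreasing = TRUE)

barplot(paid_counts, las = 2, ylab = "Paid appearances")
```

People with no paid appearances at all (Host C in this toy data) drop out of the chart entirely, since they have no rows left after the filter.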
Amanda Lang is way out in the lead, with Peter Mansbridge second and Heather Hiscox and Dianne Buckner tied to show. In R, with dplyr, it’s easy to poke around in the data and see what’s going on, for example by looking at the paid appearances of Amanda Lang and, as someone I’d expect (and hope) to be a lot different, Nora Young.
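A comparison like that might look something like the following with dplyr. It is a sketch on made-up rows: the `date` and `event` columns are assumed names, and the hosts and events are placeholders rather than anything from the real file.

```r
library(dplyr)

# Made-up stand-in data; column names and values are assumptions.
appearances <- data.frame(
  name  = c("Host A", "Host A", "Host B", "Host B", "Host C"),
  date  = c("2014-05-01", "2014-06-10", "2014-05-20", "2014-07-02",
            "2014-08-15"),
  event = c("Event 1", "Event 2", "Event 3", "Event 4", "Event 5"),
  fee   = c("Paid", "Unpaid", "Paid", "Paid", "Paid"),
  stringsAsFactors = FALSE
)

# Paid appearances of two chosen people, side by side in date order.
result <- appearances %>%
  filter(name %in% c("Host A", "Host B"), fee == "Paid") %>%
  select(name, date, event) %>%
  arrange(name, date)

print(result)
```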
Nora Young spoke about healthy communities and education to planners and colleges and universities … Amanda Lang spoke to developers and business groups and insurance companies. They are a lot different.
At this point, following up on any relation between Amanda Lang (or another host) and paid corporate gigs requires examination by hand. If the transcripts of The Exchange with Amanda Lang were available then it would be possible to write a script to look through them for mentions of these corporate hosts, which would provide clues for further examination. If the interviews were catalogued by a librarian with a controlled vocabulary then it would be even easier: you’d just do a query to find all occasions where (“Amanda Lang” wasPaidBy ?company) AND (“Amanda Lang” interviewed ?person) AND (?person isEmployeeOf ?company) and there you go, a list of interviews that require further investigation.
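If such catalogue data existed as plain tables, that query would boil down to two joins. The sketch below is purely hypothetical: the three tables, their column names and every value in them are invented for illustration, since no such dataset is available.

```r
# Hypothetical tables; nothing here comes from real data.
paid_by <- data.frame(            # host wasPaidBy company
  host    = c("Host A", "Host A"),
  company = c("Company 1", "Company 2"),
  stringsAsFactors = FALSE
)
interviews <- data.frame(         # host interviewed guest
  host  = c("Host A", "Host A", "Host B"),
  guest = c("Guest X", "Guest Y", "Guest X"),
  stringsAsFactors = FALSE
)
employment <- data.frame(         # person isEmployeeOf company
  person  = c("Guest X", "Guest Y"),
  company = c("Company 1", "Company 3"),
  stringsAsFactors = FALSE
)

# Attach each guest's employer to every interview.
with_employer <- merge(interviews, employment,
                       by.x = "guest", by.y = "person")

# Keep interviews where the guest's employer also paid the host.
flagged <- merge(with_employer, paid_by, by = c("host", "company"))

print(flagged)
```

Each row of `flagged` is an interview worth a closer look by hand, nothing more; a match says only that the host was paid by the guest's employer at some point.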
But it’s not all catalogued neatly, so journalists need to dig. This kind of initial data munging and visualization may, however, be helpful in pointing out who should be looked at first. Lang, Mansbridge and Murphy are the first three that Canadaland looked at, which does make me wonder what checking Hiscox and Buckner would show … are they different, and if so, how and why, and what does that say? I don’t know. This is as far as I’ll go with this cursory analysis.
In any case: hurrah to the CBC for making the data available, but boo for not making the raw data easy to use. Hurrah to Canadaland for investigating all this and forcing the issue.