Algorithms and journalisms, part one

by Andy Boyle.

Late night at the Daily Nebraskan

This morning a story from Wired made the homepage of Hacker News. It was about a company that can take in data, run it through a program and spit out a basic news story. The company has been written about before, and when it first bounced around journalism blog-o-circle-sphericals I had meant to write something about it.

This company is doing what we should be doing more of: Automation. Journalists tend to do a lot of work on daily tasks that could be automated. I’m going to give you an example from my younger days as a reporter, something I’m sure many journalists can relate to.

It was early 2007, a year before that photo up there was taken. I was a cops reporter at the Daily Nebraskan, my college newspaper. Three or four times a week I would head to the city/county building complex for crime information. First you would look through a stack of overnight reports from the sheriff’s office and note any interesting cases. Then the Lincoln/Lancaster County morning media briefing would begin.

Officers would hand you a print-off listing everyone arrested overnight, with a brief summary (fewer characters than a tweet) of the crime committed. The assembled reporters, from television, radio and print, would comb through the thick stack and ask questions about interesting crimes. The public information officer would look things up, read off arrest reports and give us information. (They often had to do this for legal reasons, if I recall, because open investigative documents aren’t public. Hence the need for someone to read it to you.) Sometimes the police chief would appear and tip us off to stories or give context. Someone from the sheriff’s office would wander in and do the same.

My job was to do normal stories that affected campus and students. But I also had to scan the list of everyone arrested and check for anyone who was 18 to 22, prime college ages. Monday’s briefing was huge because it included information from Friday night through Sunday, prime collegiate boozing time. I would head back to the newspaper and start typing these names into the unl.edu student directory and Facebook, seeing if I got a match. If I did, I’d circle them on my list.

Then I would sometimes hand the list to the sports department and have them scan the names for “important people,” namely athletes. If there were any, I’d note the name. I would then call the public information officer and ask about each case. The officer usually had noted whether each person was a student, since that would be listed as their occupation. This was a nice way to double-check that these were indeed students and thus worthy of our coverage.

I’d take the information and either write up individual full articles or throw them into a cops brief column, attempting to get comment from the students. I did this daily. So did/does someone at the Lincoln Journal Star. So did/do probably thousands of journalists across the country.

And it’s incredibly inefficient.

Almost everything I did to identify people on the list was something a computer could have done. Instead of an hour of my time, a computer could do it in 10 seconds. It could give me a list of everyone whose name, and perhaps date of birth, matched someone we were tracking, along with any other information we had entered about them. Then I could still do my reporting and writing, some of which can’t be replaced by an algorithm, but I’d have taken a bit of the hunting and scanning out of it.
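To make that concrete, here is a minimal Python sketch of the matching step. It is not what any newsroom actually runs. It assumes the police data arrives as a CSV with hypothetical `name`, `dob` and `charge` columns, that dates of birth are in ISO format, and that the "important names" list is a plain list of strings. The names and charges in the sample are made up.

```python
import csv
import io
from datetime import date


def normalize(name):
    """Lowercase, collapse whitespace and turn 'Last, First' into 'first last'."""
    name = name.strip()
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    return " ".join(name.lower().split())


def age_on(dob, day):
    """Age in whole years on a given day."""
    years = day.year - dob.year
    if (day.month, day.day) < (dob.month, dob.day):
        years -= 1  # birthday hasn't happened yet this year
    return years


def flag_arrests(arrest_rows, watchlist, day, min_age=18, max_age=22):
    """Return (name, reasons) for each arrest worth a closer look."""
    watch = {normalize(n) for n in watchlist}
    flagged = []
    for row in arrest_rows:
        reasons = []
        if normalize(row["name"]) in watch:
            reasons.append("watchlist")
        dob = date.fromisoformat(row["dob"])  # assumes YYYY-MM-DD
        if min_age <= age_on(dob, day) <= max_age:
            reasons.append("college-aged")
        if reasons:
            flagged.append((row["name"], reasons))
    return flagged


# Made-up sample data standing in for the daily police file.
SAMPLE_CSV = """name,dob,charge
"Doe, Jane",1987-03-14,DUI
John Smith,1960-01-02,Trespassing
Pat Jones,1985-06-30,Minor in possession
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name, reasons in flag_arrests(rows, ["Pat Jones", "John Smith"], date(2007, 2, 5)):
    print(name, reasons)
```

Exact name matching like this will miss nicknames, middle initials and misspellings, so treat every hit as a lead to check with the public information officer, not as a confirmed identity.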

These are the steps to follow at your news organization to do this:

  • Get daily data from the police department in a format you can ingest
  • Enter the “important names” into a list
  • Write a program to match the names and also find college-aged people

I think the first part would be the hardest. But maybe not in Lincoln, as their police chief was one of the data-savviest people I’ve ever encountered in my life. Tom Casady, now Lincoln’s public safety director, has blogged for years about data and other musings. He once gave me about 10 years of geocoded crime data for free. You won’t find many heads of departments like him, I gather.

Convincing your public safety officials to get you this data will take some time and some tact. You could present it as a cost-saving measure: fewer printed pages mean lower costs for their department (unless they’re requiring your news organization to foot the bill, of course). But odds are they are already emailing themselves this sort of data. They just need to add your news organization to the list.

Finding the important names may be something you will have to undertake yourself. From my experience trying to set up something similar at a previous job in 2009 (where I failed to get the project going), I’d say you should just take the initiative and build the list yourself. Add everyone you can think of: city and county council members, important business people, heads of local institutions, sports teams, etc.

Lastly, you will need to write the program. This I can’t help you with. Thankfully, at least one news organization already does this and has brains you can pick. Ben Welsh from the Los Angeles Times already did just this. He talks about that and lots of other awesome ideas in this amazing video.

Seriously. Watch that video if you haven’t. Do it now. I’ll be writing a second post on this topic that he also touches on in the video.

What does a project like this do? It takes a bit of the daily grind off your back, allowing you to do other reporting. If I had done this project while I was still in college, I probably would’ve still attended the morning briefings. But even if it saved me only an hour a day, that’s five hours a week, or 12.5 percent of a 40-hour work week. That’s a lot. Over one year, five hours a week adds up to 6.5 work weeks.
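If you want to check that math or plug in your own numbers, here is the arithmetic as a few lines of Python. The 40-hour week and the 52 working weeks are my assumptions; they are what makes the figures above come out.

```python
HOURS_SAVED_PER_DAY = 1
WORKDAYS_PER_WEEK = 5
HOURS_PER_WORK_WEEK = 40  # assumption: a standard full-time week
WEEKS_PER_YEAR = 52       # assumption: no vacation or holidays

hours_saved_per_week = HOURS_SAVED_PER_DAY * WORKDAYS_PER_WEEK
share_of_week = hours_saved_per_week / HOURS_PER_WORK_WEEK
hours_saved_per_year = hours_saved_per_week * WEEKS_PER_YEAR
work_weeks_saved = hours_saved_per_year / HOURS_PER_WORK_WEEK

print(f"{hours_saved_per_week} hours a week ({share_of_week:.1%} of the week)")
print(f"{hours_saved_per_year} hours a year, or {work_weeks_saved} work weeks")
```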

Wouldn’t you rather have that time to do other reporting? I sure would.

Have you tried this? Would you be willing to try it? What problems do you see that I don’t? Let me know what you think in the comments.

Photo by Jay Carlson, via Facebook.