Quick CSV to JSON parser in python
by Andy Boyle.
I did something in python today. It wasn’t that hard, but I thought I should write up something quickly about it.
We’ve got some people entering in some data in a Google Doc spreadsheet for a project. After fiddling around with attempting to make Google Docs API spit out not-ugly JSON, I said screw it and came up with a new plan: I was going to write a cron job that downloads the CSV from the spreadsheet and then parses it into JSON.
And of course, as a guy who likes Django, I turned to python for the parsing.
First, you will need to remove the first line in the CSV if it had any field names. Put it into a folder somewhere, perhaps. Then make an empty file and name it parsed.json and place it in the same file.
If you’re on a Mac/Unix-y thing, use your terminal and cd into the folder your parsed.json file is. Enter
pwd
in the terminal to see the path to your directory. If you’re on a Windows box, then it’s probably C:\wherever\your\stuff\is\
Now create a new file called parser.py. Enter in the code below, changing /path/to/filename part on line 5 to the bit that was spit out after you typed pwd (or where you inherently found it if you’re in Windows). On line 7, change fieldname1 and whatnot to the name you want each row to have in your JSON file.
import csv import json # Open the CSV f = open( '/path/to/filename.csv', 'rU' ) # Change each fieldname to the appropriate field name. I know, so difficult. reader = csv.DictReader( f, fieldnames = ( "fieldname0","fieldname1","fieldname2","fieldname3" )) # Parse the CSV into JSON out = json.dumps( [ row for row in reader ] ) print "JSON parsed!" # Save the JSON f = open( '/path/to/parsed.json', 'w') f.write(out) print "JSON saved!"
In your terminal, while you’re in the proper directory, type:
python parser.py
BOOM! It should spit out “JSON parsed!” and “JSON saved!” If you wanted to spit out the JSON in the terminal, you could add a line at the bottom:
print out
Voila. Now you can turn your CSV files into JSON.

With no intent to diminish showing your code, I’ll throw out that csvkit has a csvjson utility which does this. It gives you a few other niceties, like unicode support, optional pretty-printing, and the ability to reformat the output keyed on a field from the csv.
(I got tired of writing the same script you just did over and over.)
Cheers,
C
Yeah, I should’ve added in the post that I can’t use such tools like that for this project. It has to run as a cronjob on a server I’m not in charge of and can’t install any new software on. But thanks for the help.
You can avoid having to change the script whenever the column names or input and output file paths change by using command line arguments and redirection. All you have to put in the script is:
import sys
import csv
import json
reader = csv.DictReader(sys.stdin, fieldnames=sys.argv[1:])
print json.dumps( [ row for row in reader ] )
Now, just run “python parser.py fieldname0 fieldname1 fieldname2 fieldname3 /path/to/parsed.json”. The great thing about redirection is that input can come from anywhere. If you’re downloading the CSV with curl, you can just “curl http://spreadsheets.google.com/blahblahblah | python parser.py fieldname0 fieldname1 fieldname2 fieldname3″ and send that output wherever (to a file, like above, or to another program…)
Also, since the reader can be iterated,
list(reader)
will suffice for
[ row for row in reader ]
without having to come up with a throwaway name for the row.
Your comment form ate the redirection in the above comment. It was supposed to be (I’ll try this with ampersands…):
“python parser.py fieldname0 fieldname1 fieldname2 fieldname3 < /path/to/filename.csv > /path/to/parsed.json”
Thanks so much for This one Andy !
I am new to this and just began learning Python.
I noticed that the output JSON get formed like this:
[{"filed1":"2012-02-01", "field2":"hello"}]
Is there a way I change the format to
[{'filed1':'2012-02-01', 'field2':'hello'}].
Basically replace the “” with ”.
Thanks