One of the biggest problems I faced when starting out with Django was figuring out how to load data into databases. It’s also at the heart of one of online journalism’s biggest problems: Getting data from your computer to something you can put online. While loading data from spreadsheets into a database isn’t the last step in getting data online, understanding how it works and developing the engineering mindset behind it is certainly helpful.
Many companies sell expensive software that offer some point-and-click methods of getting data online, but I don’t think that’s good for the industry, as it may create a mentality that software isn’t hard, and that “one-size-fits-all” solutions are good for different news organizations with different needs. I’d much rather see organizations have talented folks who understand how to put data online, and one of the first steps to that is loading data into a database.
But I never saw a walkthrough meant for the beginner, showing you how to take a spreadsheet and write a script that allows you to load it into Django. Thankfully, my coworkers at the Chicago Tribune News Applications team taught me some good ways to load data from Google Docs. And because I haven’t seen many write-ups about it, I thought I would start a series on it.
What I plan to do in the next few blog posts is teach you how to set up a local project using Django and PostgreSQL, write an importer for data stored in a Google doc and then write a basic view and a basic template to display the data. If I have time, I’ll also show you how to bake out this file and push it to Amazon S3 as a flat file. If you’re looking for a nifty project that does something similar but uses Flask and is much more robust, check out Tarbell, which was created by some awesome members on my team.
Note: If you find any errors in this tutorial, please either email me (my Twitter name at that google email provider) or comment on the post. This is a work in progress, so I’m bound to have made some mistakes or not fully explained things. Thanks for your help!
Setting Up Your Machine
First things first, let’s get your machine set up properly. Brian Boyer’s got a great gist that explains how, so I’ll be cribbing from that and giving a more detailed explanation. If you can go through his tutorial without needing extra instructions, awesome. If not, go through the next few paragraphs, as I’ll be trying to explain what things mean in more detail. I won’t be doing everything Brian’s asking of you, so feel free to compare between our lists and deciding what you need. I’ll just be helping you install the software you need for this specific project, whereas he includes a lot of nifty tools you may want to adopt.
Furthermore, this tutorial is meant for Mac OSX users, so if you’re on a PC, I’m sorry, but perhaps you should use my tutorial and write up your own way on how to do all of this in Windows. If you use Linux, I’m sure you’re fine and everything I’ll be telling you will make you bored.
Okay. Now, let’s install XCode from the Apple Mac Store. This brings in a bunch of nifty tools for you to be able to do everything I’ll be asking of you later. XCode is a big file, so it may take awhile to download. Once that’s done, get the XCode command line tools by opening up the program, clicking Preferences, then Downloads, then Command Line Tools.
Next you’ll need to install Homebrew, which we will be using to install Postgres, an open source database. You can install it easily by entering this into your Terminal:
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
If you don’t know what your Terminal is or how to find it, read up on it about halfway through this post. Follow the instructions listed when prompted during the Homebrew installation.
Once it’s installed, you’ll want to edit your /etc/paths/ to make sure the installed version of Postgres is the one we use, and not anything already installed on your machine. This is where I would tell you to install Vim and use it to edit everything, including this file, but since I don’t know your level of computer wizardry, let’s just use the built-in nano editor.
In your terminal, type
A screen should pop up, and you can use your up/down keys to move around. You’ll want to go to the line that has /usr/local/bin and delete that whole line. Then, move your cursor to the top left. Hit enter, then type in
Once that’s done, hit CTRL + O, then CTRL + X to exit.
Installing Postgres and other tricks
Next, you’ll install Postgres using Homebrew this way and initialize the database:
brew install postgres initdb /usr/local/var/postgres
As I shall assume you’re using bash instead of zsh (if this sentence makes no sense to you, then you’re using bash), you’ll need to edit your bash_profile (if zsh, then your zshrc). We’ll make some aliases so you can type shorter commands to spin up and down your postgres server.
We’ll use nano again, so type this:
If you’re using zsh, use ~/.zshrc instead of ~/.bash_profile
Now copy and paste this in somewhere, as it doesn’t really matter where:
alias pgup='pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start' alias pgdown='pg_ctl -D /usr/local/var/postgres stop -s -m fast'
What this does is make it so you can start and stop your database with the simple commands pgup and pgdown instead of a really long command that’s hard to remember. Feel free when going about your development travels to edit your bash/zsh profile and throw in aliases. Another one I like to use is to create a local debugging mail server, so I can test that I’m writing/sending emails properly using smtp. Here’s that line of code:
alias pymail='python -m smtpd -n -c DebuggingServer localhost:1025'
Installing pip and virtual environments
Now we’ll install pip, which is a tool for managing Python packages. Think of it like this: Lots of people have written lots of open source software for you to use. Instead of having to go to different websites and find it and install it, and run through all of the different installation and setup commands AND perhaps needing to install software that’s needed for the specific software you just downloaded, you can instead type “pip install packagename.” It’s pretty nifty.
So, first, let’s install pip:
sudo easy_install pip
Next we’re going to install virtualenv and the virtualenvwrapper. This is one of the best things I’ve learned working at the Chicago Tribune. The idea around this is you want to make sure your local environment — your machine where you’re programming the code — matches the same site packages and software of your staging and production environments, so the place you will be ultimately deploying your code for the world to see. Using virtualenv and virtualenvwrapper make it so you can install packages and software using pip and it won’t be installed on the rest of your machine.
Think of it like you’re compartmentalizing your code per project, so you can install newer/older versions of software to match wherever you plan on deploying your project. This way you know what works on your local machine should definitely work elsewhere, and if a coworker or someone else plans on working on your project, they can install all the same software into a virtual environment (where the virtualenv comes from) and hopefully not have any problems.
So let’s install these wonders of modern computing:
sudo pip install virtualenv sudo pip install virtualenvwrapper
Huzzah. Now, let’s add this line to the end of your bash_profile, this time without having to use nano, as the echo does it for you. (If you’re using zsh, change the last part to ~/.zshrc)
echo 'source /usr/local/bin/virtualenvwrapper.sh' >> ~/.bash_profile
Using Virtual Environments
Now we’ve got our virtualenv cracking. Let’s make one and destroy one just for fun. Type this, which means “make virtual environment”:
You’ve now got a virtual environment called gdocs_importer, which this current terminal window will be using. When you create a new virtual environment, you are automatically in that environment. If you open a new terminal window (Apple + T), you won’t be in that virtual environment anymore, you’ll just be using whatever packages you’ve installed system-wide on your computer. You can deactivate the virtualenv — which means to close it out and go to your computer’s normal environment — by typing:
To work on that environment again, type:
Pretty simple, eh? Now let’s say you borked something you installed really bad on the virtual environment gdocs_importer. Normally this would require you having to uninstall local packages on your machine, which can be a real pain. Instead you just need to deactivate it and remove the virtual environment by type the following commands:
deactivate rmvirtualenv gdocs_importer
Now if you try to workon gdocs_importer it won’t show up. That’s because we deleted that virtual environment. Simple create a new one with the command I showed you previously:
And voila, gdocs_importer is back. This is where we will be making our Google Docs importer, so you can keep it open in your terminal. So that’s enough for the first few big steps in creating a project that imports data from a Google doc into a postgres-backed Django application.