Importing Data From Google Docs Into Django, Step Three: Setting up your spreadsheet and models.py

by Andy Boyle.

Everything going into its proper place.

Now that you’ve installed all the proper site packages and set up your basic Django area, it’s time to actually make some data you’ll want to import and some models to match them.

If you don’t have a Google account, get one and remember your username and password. Next, go to Google Drive and create a new spreadsheet. Let’s call it 2012 Homicides. Then let’s download this spreadsheet, which contains information about the homicides in 2012.

Import this into your Google spreadsheet. The first row of your spreadsheet should have these headers: Address, Date, Time, Location, Neighborhood, Age, Gender, Race, Name, Cause, Story_url, RDNumber, Charges and trials.

These headers should be self-explanatory. Now we’re going to edit our models.py so it matches the data in the Google doc. Here’s what you should make your models.py file look like (or use this gist):

from django.db import models


class Homicide(models.Model):
    address = models.CharField(max_length=100, blank=True, null=True)
    date_time = models.DateTimeField(blank=True, null=True)
    location = models.CharField(max_length=50, blank=True, null=True)
    neighborhood = models.CharField(max_length=100, blank=True, null=True)
    age = models.IntegerField(blank=True, null=True)
    gender = models.CharField(max_length=7, blank=True, null=True)
    race = models.CharField(max_length=20, blank=True, null=True)
    name = models.CharField(max_length=100, blank=True, null=True)
    cause = models.CharField(max_length=20, blank=True, null=True)
    story_url = models.URLField(blank=True, null=True)
    rd_number = models.CharField(max_length=15, blank=True, null=True)
    charges_url = models.URLField(blank=True, null=True)
    created_date = models.DateTimeField(auto_now=True)

    def __unicode__(self):
        return self.address

If you’re new to Django, let me explain what this is doing. I’m creating a class called Homicide, which represents a single row in the Google doc. I’m then mapping the fields in the Google doc (Address, Date, Time, Location, etc.) to individual fields on this Homicide class.

Because address, location, neighborhood, gender, race, name, cause and rd_number are all normal alphanumeric values (they aren’t going to be used for math, basically), I’m making them CharFields, or character fields. I’m giving each one a maximum length based on inferences about the data (an address field will be longer than a gender field, for instance). I’m also allowing each of them to be either blank or null. Briefly, blank controls whether Django’s forms and validation allow an empty value, while null controls whether the database column itself can store NULL. You can find out more about the difference between blank and null here.

Age is just an integer field, IntegerField, because that’s a number we can do math with. And story_url and charges_url are actual web addresses, so we make them URLFields. Now, you may have noticed I made a single date_time field instead of two separate fields for date and time. That’s because when we import the data, we’ll combine the two columns into one value, so it’s a DateTimeField. It’ll be nifty and you’ll be able to impress your friends.
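To give you a rough idea of that merge, here’s a minimal sketch of what the importer will do with the Date and Time columns. The format strings here are my assumption (dates like 1/21/2012 and times like 2:30 AM); adjust them to match whatever your spreadsheet actually contains.

from datetime import datetime

# A minimal sketch of merging a date string and a time string into one
# datetime, which is the value we'll store in the date_time field.
# Assumes dates like "1/21/2012" and times like "2:30 AM" -- change the
# format string if your spreadsheet uses something different.
def combine_date_time(date_str, time_str):
    return datetime.strptime(
        "%s %s" % (date_str.strip(), time_str.strip()),
        "%m/%d/%Y %I:%M %p",
    )

# combine_date_time("1/21/2012", "2:30 AM")
# -> datetime.datetime(2012, 1, 21, 2, 30)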

Now we should sync our models with the database on our local machine. First, if you haven’t yet, let’s spin up postgres:

pgup

That command should work, because we made it an alias in step one. Next, let’s make your gdocs_importer database:

createdb gdocs_importer

If you see no errors, that means it worked correctly. Now, make sure you’re in the same directory as your manage.py file, in the main gdocs_importer folder, and let’s sync the database to our models:

python manage.py syncdb

It’ll ask if you want to create a superuser. Just say no for the time being.

Now let’s take a look into the database, just to check that your models were created on your local machine.

psql gdocs_importer

This takes you into the postgres database prompt. Here’s how you display all of the tables in your gdocs_importer database:

\dt

You should see about 10 different rows, all pointing to different tables. The one we’ll want to check out is gdocs_importer_homicide. We can take a look at that with some simple SQL:

select * from gdocs_importer_homicide;

Basically what we’re saying is, hey, give me everything (*) in the table called gdocs_importer_homicide. This should show you all of those fields we just created, starting with id. Nothing’s in the database yet, but we’ve primed it for pulling in data from our Google doc. Exit psql by typing:

\q
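You can also run the same sanity check from Django’s side if you’d rather not use psql. As a quick sketch (assuming the Homicide model is importable as gdocs_importer.models, matching the table name we just saw), open a Django shell with python manage.py shell and try:

from gdocs_importer.models import Homicide

# The table exists but has no rows yet, so this should print 0.
print Homicide.objects.count()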

Here’s a neat thing a coworker showed me. Say we were going to make changes to models.py. Previously I thought you had to drop the database and run syncdb again. The other option was to write SQL and edit it within the PostgreSQL console. That’s all really annoying. Thankfully, there’s an easier trick:

./manage.py sqlreset gdocs_importer |psql gdocs_importer

With this, you run a command called sqlreset, which prints the SQL needed to drop and recreate your app’s tables. Then you pipe that output into the gdocs_importer database, so it writes and runs the SQL for you. It’s a much cleaner way of making database schema changes. Neat, huh?

We’re getting close to writing the importer script that’ll pull in all of that data we just put into Google docs. That’s what we’ll be doing next.