Programmatically uploading to blobstore in python

Officially, this is something that cannot be done, or rather shouldn't be done. When you look at the Google App Engine docs on "uploading to blobstore", this is what they have to say:

Blobs are useful for serving large files, such as video or image files, and for allowing users to upload large data files.

To prompt a user to upload a Blobstore value, your app presents a web form with a file upload field.

So OK, the official documentation isn't of much use here, since it only talks about letting users upload files. I needed something different: to fetch an image from a URL (obtained by intricate means, different story) and store it in the blobstore so it could later be served to many users. Since writing to the filesystem isn't permitted on App Engine, the only choice left was storing the file in the blobstore.

Naturally someone else has had this problem before, right?

No. There are no solutions I could find online. None. Nada. Zilch. Niente.

After a few hours of hacking a week or so ago, however, I got it working.

Essentially the solution is to fake a form post to the upload URL the blobstore API creates. An interesting gotcha is that a redirect happens. Initially I thought I was making the form post right back to my application, but you're actually posting to the blobstore first, and the blobstore then posts back to your app. For some reason I couldn't get any associated metadata to survive that round trip, so there's an ugly-ish workaround: the metadata travels inside the filename.
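
To make "faking a form post" a bit more concrete, here is a minimal sketch of what such a request body looks like on the wire. It is not the code used in this post (that job is done by the patched urllib2_file described at the end). The name encode_multipart and its parameters are made up for illustration, it only handles a single file field, and it is written to run on stock Python with nothing but the standard library.

```python
import uuid


def encode_multipart(field_name, filename, data, content_type="application/octet-stream"):
    """Build a multipart/form-data body holding a single file field.

    Hypothetical helper for illustration only. `data` must be bytes.
    Returns (content_type_header_value, body_bytes).
    """
    # A random boundary that is very unlikely to appear inside the payload.
    boundary = uuid.uuid4().hex
    # Note: field_name and filename are not escaped here; keep them simple.
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{n}"; filename="{f}"\r\n'
        "Content-Type: {c}\r\n"
        "\r\n"
    ).format(b=boundary, n=field_name, f=filename, c=content_type)
    tail = "\r\n--{b}--\r\n".format(b=boundary)
    body = head.encode("utf-8") + data + tail.encode("utf-8")
    return "multipart/form-data; boundary=" + boundary, body


header, body = encode_multipart("file", "42.png", b"fake image bytes", "image/png")
```

The returned header value goes into the request's Content-Type header and the body is sent as the POST payload to the upload URL.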

Another thing that's important to note for this tutorial/howto is that I am using django-nonrel and that the initial event that starts the process is triggered by appengine's task queue.

The howto

First, these are all the imports I'm using. There are quite a few, so heh :)

import cgi  # used by get_uploads below
from django import forms  # used by the form classes below
from django.http import HttpResponse, HttpResponseBadRequest, HttpRequest, HttpResponseRedirect
from django.views.decorators.csrf import csrf_exempt
from django.conf import settings
from django.core.urlresolvers import reverse
import simplejson as json
import urllib2, urllib
from cStringIO import StringIO

from google.appengine.api.urlfetch import ResponseTooLargeError, DownloadError
from google.appengine.ext import blobstore
from google.appengine.ext.blobstore import BlobInfo, BlobReader  # used when serving the blob
from google.appengine.api import urlfetch
from google.appengine.runtime.apiproxy_errors import RequestTooLargeError  # caught in the article view

from forms import ArticleProcForm
from models import Article
from lib import ImageExtractor
from lib import urllib2_file
from lib.urllib2_file import UploadFile
from lib.decorators import form_valid

The first thing you're going to need is the function that starts the whole process (in my case this is a Django view):

class ArticleProcForm(forms.Form):
    article = forms.IntegerField(required=True)

@csrf_exempt
@form_valid(ArticleProcForm, 'POST')
def article(request):
    try:
        article = Article.objects.get(id=request.form.cleaned_data['article'])
    except Article.DoesNotExist:
        return HttpResponse(json.dumps({'status': 'Bad Article'}))

    try:
        image_url = ImageExtractor.getImages(article.url)[0]['url']
    except IndexError:
        pass
    else:
        # important bit
        try:
            image = StringIO(urllib2.urlopen(image_url).read())
        except (urllib2.HTTPError, DownloadError):
            pass
        else:
            image = UploadFile(image, '.'.join([str(article.id), image_url.rsplit('.', 1)[1][:4]]))
            upload_url = blobstore.create_upload_url(reverse('Articles.views.upload'))

            try:
                urllib2.urlopen(upload_url, {'file': image})
            except (DownloadError, RequestTooLargeError):
                pass
        # end of important bit

    return HttpResponse(json.dumps({'status': 'OK'}))

Here is basically what happens in the important bit:

1. Download the image from the URL and wrap it in a StringIO
2. Make an UploadFile (basically a bundle of byte-string data and the desired filename)
3. Create an upload_url with the blobstore API
4. Fake a file-upload form post
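
The "bundle of bytes and a filename" idea can be sketched in a few lines. This is not the UploadFile class from my patched urllib2_file; NamedUpload and its size method are hypothetical names, shown only to illustrate how an in-memory file-like object can carry a filename without ever touching the filesystem.

```python
import io


class NamedUpload(io.BytesIO):
    """In-memory file-like object that also carries a filename.

    Hypothetical stand-in for illustration, not the real UploadFile.
    """

    def __init__(self, data, name):
        super(NamedUpload, self).__init__(data)
        # Upload code typically reads a .name attribute to fill in the
        # filename part of the Content-Disposition header.
        self.name = name

    def size(self):
        # Length of the payload, computed without any filesystem call.
        return len(self.getvalue())


upload = NamedUpload(b"fake image bytes", "42.png")
```

Anything that expects a file can call upload.read(), and anything that needs the filename can look at upload.name.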

The next thing we need is a view that will handle the request the blobstore will send back to our app.

@csrf_exempt
def upload(request):
    if request.method == 'POST':
        blobs = get_uploads(request, field_name='file', populate_post=True)

        article = Article.objects.get(id=int(blobs[0].filename.split('.')[0]))
        article.media = blobs[0].filename
        article.parsed = True
        article.save()

        return HttpResponseRedirect(reverse('Articles.views.upload'))
    else:
        return HttpResponse('meow')

Basically, it extracts the article's id from the filename (the only way I could find to pass that information through) and stores some changes in the datastore. You'll notice that I'm basically just storing the article's id again in another field, as part of the filename; this is to preserve knowledge of the file extension. It's also important to note that the blobstore requires a redirect response upon success, otherwise it will throw an error.
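
The filename trick boils down to a small encode/decode pair. The sketch below is a slightly more defensive variant of what the views do inline, not the exact code from this post: make_blob_filename, parse_article_id and the default_ext fallback are hypothetical names, and unlike the inline version it ignores any query string on the image URL.

```python
import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def make_blob_filename(article_id, image_url, default_ext="img"):
    """Encode the article id and the image's extension into one filename."""
    # Only look at the path so "?w=100" or "#top" never leaks into the extension.
    path = urlparse(image_url).path
    ext = posixpath.splitext(path)[1].lstrip(".")[:4] or default_ext
    return "%d.%s" % (article_id, ext)


def parse_article_id(filename):
    """Recover the article id from a name built by make_blob_filename."""
    return int(filename.split(".", 1)[0])


name = make_blob_filename(42, "http://example.com/pics/photo.jpeg?w=100")
article_id = parse_article_id(name)
```

The id goes in on the upload side and comes back out in the handler the blobstore calls, which is all the metadata this approach needs.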

Here is the get_uploads function I found online somewhere.

def get_uploads(request, field_name=None, populate_post=False):
    """Get uploads sent to this handler.
    Args:
      field_name: Only select uploads that were sent as a specific field.
      populate_post: Add the non blob fields to request.POST
    Returns:
      A list of BlobInfo records corresponding to each upload.
      Empty list if there are no blob-info records for field_name.
    """

    if not hasattr(request, '__uploads'):
        request.META['wsgi.input'].seek(0)
        fields = cgi.FieldStorage(request.META['wsgi.input'], environ=request.META)

        request.__uploads = {}
        if populate_post:
            request.POST = {}

        for key in fields.keys():
            field = fields[key]
            if isinstance(field, cgi.FieldStorage) and 'blob-key' in field.type_options:
                request.__uploads.setdefault(key, []).append(blobstore.parse_blob_info(field))
            elif populate_post:
                request.POST[key] = field.value
    if field_name:
        try:
            return list(request.__uploads[field_name])
        except KeyError:
            return []
    else:
        results = []
        for uploads in request.__uploads.itervalues():
            results += uploads
        return results
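
The check for 'blob-key' in field.type_options works because, as far as I can tell, the blobstore replaces each uploaded file part with a small stub whose Content-Type header carries the blob key as a parameter. Here is a standalone sketch of pulling that parameter out of such a header with the standard library. The exact header value in the example is an assumption based on what I saw, and blob_key_from_content_type is a hypothetical helper name.

```python
from email.message import Message


def blob_key_from_content_type(content_type):
    """Return the blob-key parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param looks the parameter up on the Content-Type header
    # and strips the surrounding quotes from its value.
    return msg.get_param("blob-key")


# Assumed shape of the rewritten part header; the key is made up.
stub = 'message/external-body; blob-key="abc123"; access-type="X-AppEngine-BlobKey"'
key = blob_key_from_content_type(stub)
```

A part without that parameter, such as a plain form field, simply yields None, which matches the elif branch in get_uploads.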

Now the process of serving this blob to the browser is very simple and goes something like this:

class ImageForm(forms.Form):
    id = forms.CharField(required=True)

@form_valid(ImageForm, 'GET')
@cache_response
def image(request):
    # Bind the filename as a query parameter instead of formatting it into the GQL string
    blob = BlobInfo.gql("WHERE filename = :1 LIMIT 1", request.form.cleaned_data['id'])[0]

    return HttpResponse(BlobReader(blob.key()).read(),
                        content_type=blob.content_type)
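
One caveat: calling read() with no arguments pulls the whole blob into memory at once. Since BlobReader behaves like a file, a possible alternative is to read it in pieces. The sketch below is an assumption-laden illustration rather than something I run in production: iter_chunks is a hypothetical helper, the 64 KB chunk size is arbitrary, and an in-memory buffer stands in for the BlobReader.

```python
import io


def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# io.BytesIO stands in for BlobReader(blob.key()) here.
chunks = list(iter_chunks(io.BytesIO(b"abcdefg"), chunk_size=3))
```

In the view, the idea would be to hand iter_chunks(BlobReader(blob.key())) to HttpResponse in place of the single read() call; whether that actually streams depends on the Django version in use.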

One final note

And one VERY important final note. The vanilla urllib2 library can't handle file uploads, so I found one online that can. It's called urllib2_file.

However, it doesn't quite work on Google App Engine. For example, it can't be told what you want the filename to be, and it trips over a few other details, because it relies on raw file access. So I changed it a little bit. Unfortunately I don't quite know how to upstream my changes, so I'm hosting it on GitHub.

You can get it on GitHub; feel free to contribute.

Published on August 10th, 2010 in Application programming interface, django, Form (web), Google AppEngine, python, Uncategorized, Uploading and downloading

Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.