Swizec Teller - a geek with a hat (swizec.com)

Senior Mindset Book

Get promoted, earn a bigger salary, work for top companies


    Possibly the ugliest python ever to escape my brain

    Lensing data from one format to another always produces the most horrible code ... here's some I wrote last night:

        # some other code here
    
        def stitch(*iterables):
            return [(iterables[0][i][0], sum([l[i][1] for l in iterables]))
                    for i in xrange(len(iterables[0]))]
    
        def one(z):
            d = dict(stitch(*[[b for b in zip(a.keys(), a.values()) if b[0] in ['count', 'step_conv_ratio', 'overall_conv_ratio']] for a in z]))
    
            d['overall_conv_ratio'] = d['overall_conv_ratio']/len(funnel['meta']['dates'])
            d['step_conv_ratio'] = d['step_conv_ratio']/len(funnel['meta']['dates'])
            return d
    
        print map(one,
                  zip(*[v['steps'][:entry['counts']['p']] for v in funnel['data'].values()]))
    
    The funnel of the Mauretania.

    Can you guess what all of that does? Me neither, let me try to explain.

    WTF!

    Firstly, the basic problem I'm trying to solve is fetching some funnel data from Mixpanel. It returns all the data I need, but split into different days, and I'd really like to have it all mashed together. Due to the design of funnels in their API, the response also contains more data than I need, which clutters what I'm looking for.

    Their format is something like this:

    {'Signup flow': {'data': {'2010-05-24': {'analysis': {'completion': 0.064679359580052493,
                                                   'starting_amount': 762,
                                                   'steps': 3,
                                                   'worst': 2},
                                      'steps': [{'count': 762,
                                                 'goal': 'pages',
                                                 'overall_conv_ratio': 1.0,
                                                 'step_conv_ratio': 1.0},
                                                {'count': 69,
                                                 'goal': 'View signup',
                                                 'overall_conv_ratio': 0.09055118110236221,
                                                 'step_conv_ratio': 0.09055118110236221},
                                                // etc.},
                       '2010-05-31': // etc.
              'meta': {'dates': ['2010-05-24', '2010-05-31']}}}
    

    What I want is something where "Signup flow" is just a list of steps, where each step's count is the sum of that step's counts across all days and its conversion ratios are averaged across all days. Yes, this discards some useful data, but it's not too relevant for what I'm trying to analyze.

    Justified?

    Hopefully I can defend this code and explain why it's ugly.

    map(one,
        zip(*[v['steps'][:entry['counts']['p']] for v in funnel['data'].values()]))
    

    This code takes the data for each day, keeps only the list under the steps key, and cuts that list to an appropriate length (the funnel's got 60+ steps; entry['counts']['p'] is extra data that tells me how many actually apply). Then it zips all the days' steps together.

    Now we have a list of step tuples - all the first steps, second steps and so on.
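To make the regrouping concrete, here is a minimal sketch of the same truncate-then-zip move on a cut-down stand-in for funnel['data']. It is written for Python 3, where zip returns an iterator, hence the list() call. The counts 762 and 69 come from the sample above; every other number, and the names data, applicable, per_day and per_step, are made up for illustration.

```python
# Hypothetical miniature of funnel['data']: two days, three steps each.
data = {
    "2010-05-24": {"steps": [{"count": 762}, {"count": 69}, {"count": 12}]},
    "2010-05-31": {"steps": [{"count": 640}, {"count": 58}, {"count": 9}]},
}
applicable = 2  # stands in for entry['counts']['p']

# One truncated list of steps per day (day-major layout).
per_day = [day["steps"][:applicable] for day in data.values()]

# zip(*per_day) regroups the same steps by position (step-major layout).
per_step = list(zip(*per_day))

print(per_step)
# [({'count': 762}, {'count': 640}), ({'count': 69}, {'count': 58})]
```

Each tuple now holds one step as seen on every day, which is the shape the per-step function expects.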

    Then we apply the one function to each tuple:

    def one(z):
        d = dict(stitch(*[[b for b in zip(a.keys(), a.values()) if b[0] in ['count', 'step_conv_ratio', 'overall_conv_ratio']] for a in z]))

        d['overall_conv_ratio'] = d['overall_conv_ratio']/len(funnel['meta']['dates'])
        d['step_conv_ratio'] = d['step_conv_ratio']/len(funnel['meta']['dates'])
        return d
    

    Right, so the first thing that happens is that each step dictionary is turned into a list of (key, value) tuples, which is then filtered so only the ones we're actually interested in remain (count and the conversion ratios).
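The filtering step can be sketched on its own. zip(a.keys(), a.values()) yields the same pairs as a.items(), so one alternative is to skip the filter and look the wanted keys up directly. This is a Python 3 sketch, not the code from the post: WANTED and numeric_pairs are hypothetical names, and unlike the original filter it raises KeyError if a step is missing one of the keys.

```python
WANTED = ("count", "step_conv_ratio", "overall_conv_ratio")

def numeric_pairs(step):
    """Return (key, value) pairs for the wanted keys, always in WANTED order."""
    return [(key, step[key]) for key in WANTED]

# One step copied from the sample response above.
step = {"count": 69,
        "goal": "View signup",
        "overall_conv_ratio": 0.09055118110236221,
        "step_conv_ratio": 0.09055118110236221}

print(numeric_pairs(step))
```

Looking keys up in a fixed order means every day's list lines up position by position, whatever order the dictionaries happen to iterate in.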

    These are then stitched together:

    def stitch(*iterables):
        return [(iterables[0][i][0], sum([l[i][1] for l in iterables]))
                for i in xrange(len(iterables[0]))]
    

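    Looks pretty bad, but it really just turns several lists of (key, value) tuples, one per day, into a single list of (key, sum of corresponding values) tuples. It pairs values up by position, so it relies on every day's list having its keys in the same order.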
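A small worked example may help here. This is stitch ported to Python 3 (range instead of xrange, a generator inside sum) and called on two tiny lists. The first day's values come from the sample response; the second day's are made up.

```python
def stitch(*iterables):
    """Sum values position by position, taking each key from the first list."""
    return [(iterables[0][i][0], sum(pairs[i][1] for pairs in iterables))
            for i in range(len(iterables[0]))]

day_one = [("count", 762), ("step_conv_ratio", 1.0)]
day_two = [("count", 640), ("step_conv_ratio", 1.0)]  # hypothetical second day

print(stitch(day_one, day_two))
# [('count', 1402), ('step_conv_ratio', 2.0)]
```

Note that the ratio comes out as a sum (2.0), not an average; the division happens afterwards.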

    Finally, the conversion values are turned into the average by simply dividing by the number of days this funnel was tracked for.
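The averaging step in isolation might look like the sketch below. average_ratios is a hypothetical helper, not part of the original code, and the input numbers are illustrative. The float() call is there because in Python 2 dividing one int by another truncates; the ratios here are already floats, and in Python 3 the / operator always does true division, so it is only a guard.

```python
def average_ratios(summed, num_days):
    """Turn summed conversion ratios into per-day averages; count stays a sum."""
    averaged = dict(summed)  # copy so the input is left untouched
    for key in ("step_conv_ratio", "overall_conv_ratio"):
        averaged[key] = summed[key] / float(num_days)
    return averaged

summed = {"count": 1402, "step_conv_ratio": 2.0, "overall_conv_ratio": 2.0}

print(average_ratios(summed, 2))
# {'count': 1402, 'step_conv_ratio': 1.0, 'overall_conv_ratio': 1.0}
```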

    Better way?

    This is the cleanest and most elegant way I could come up with to solve this problem - there's bound to be something cleaner and easier. Actually, please tell me there's a better way!
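One possible answer, offered as a sketch rather than the definitive fix: keep the same truncate, zip, sum, divide pipeline, but spell it out as a plain loop with named intermediate values. This is Python 3, merge_funnel and RATIO_KEYS are hypothetical names, num_steps stands in for entry['counts']['p'], and the numbers in the sample funnel are made up so the averages are easy to check by eye.

```python
RATIO_KEYS = ("step_conv_ratio", "overall_conv_ratio")

def merge_funnel(funnel, num_steps):
    """Collapse per-day steps into one list: counts summed, ratios averaged."""
    days = [day["steps"][:num_steps] for day in funnel["data"].values()]
    num_days = len(funnel["meta"]["dates"])  # same divisor as the original code
    merged = []
    for same_step in zip(*days):  # one tuple per step, one entry per day
        row = {"count": sum(step["count"] for step in same_step)}
        for key in RATIO_KEYS:
            row[key] = sum(step[key] for step in same_step) / num_days
        merged.append(row)
    return merged

# Hypothetical two-day funnel in the same shape as the Mixpanel response.
funnel = {
    "data": {
        "2010-05-24": {"steps": [
            {"count": 100, "goal": "pages",
             "overall_conv_ratio": 1.0, "step_conv_ratio": 1.0},
            {"count": 25, "goal": "View signup",
             "overall_conv_ratio": 0.25, "step_conv_ratio": 0.25},
        ]},
        "2010-05-31": {"steps": [
            {"count": 100, "goal": "pages",
             "overall_conv_ratio": 1.0, "step_conv_ratio": 1.0},
            {"count": 75, "goal": "View signup",
             "overall_conv_ratio": 0.75, "step_conv_ratio": 0.75},
        ]},
    },
    "meta": {"dates": ["2010-05-24", "2010-05-31"]},
}

print(merge_funnel(funnel, 2))
# [{'count': 200, 'step_conv_ratio': 1.0, 'overall_conv_ratio': 1.0},
#  {'count': 100, 'step_conv_ratio': 0.5, 'overall_conv_ratio': 0.5}]
```

Reading each value by key instead of by position removes the dependence on dictionary ordering, at the cost of a few more lines.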

    Generally speaking, is there even a good approach to take for converting data returned from somewhere into the kind of data you need somewhere else? Or are we doomed to forever spend most of our time figuring out how to get APIs talking to one another?

    Published on February 13th, 2012 in Uncategorized


    Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.
