
    Python and lazy evaluation

    Lazy, by CurlysGirly

    I woke up this morning with a clear thought in mind: "Generators are awesome! I will write a post about using them for lazy evaluation! Hoorah!"

    After much dabbling I realized how wrong I was. Python's generators are indeed cool: they give us infinite lists and they help conserve memory, but they can't quite do lazy evaluation.

    Let's make a generator for natural numbers:

    def generator():
        i = 1
        while True:
            yield i
            i += 1
    

    A simple function with a loop counting from one to infinity. The yield keyword is what saves us from actually looping to infinity by turning the function into a generator. We can now take any number of natural numbers:

    from itertools import islice
    
    def take(n, iterable):
        "Return first n items of the iterable as a list"
        return list(islice(iterable, n))
    
    print take(5, generator())
    # [1, 2, 3, 4, 5]
    

    Cool, we've essentially reimplemented Python's native range function. Handy, but nothing special.

    The power of generators lies in using something more than +1 as the core function. How about implementing a naive algorithm for listing prime numbers?

    def primes():
        for n in generator():
            # trial division: n is prime when no i between 2 and n-1 divides it
            if not any(i > 1 and i != n and n%i == 0
                       for i in islice(generator(), n)):
                yield n
    
    print take(10, primes())
    # [1, 2, 3, 5, 7, 11, 13, 17, 19, 23]
    

    OK, the number one isn't actually prime, but that's easily fixed by changing the generator we iterate over in primes() so it starts at 2 instead of 1.
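
    Here's a quick sketch of that fix, assuming we give the counting generator a start parameter (the naturals name is mine, not part of the original code):

    from itertools import islice
    
    def naturals(start=1):
        # count upward from `start` forever
        i = start
        while True:
            yield i
            i += 1
    
    def primes():
        for n in naturals(2):
            # trial divisions against 2..n-1 only
            if not any(n % i == 0 for i in islice(naturals(2), n - 2)):
                yield n
    
    print take(10, primes())
    # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]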

    But this algorithm is slow for anything significant. It would be great if we could improve it by only doing trial divisions with known primes rather than everything.

    But ... we can't do that. At least I haven't found a good way to do it. Logically speaking, we should be able to iterate over a list of all primes() as long as the numbers returned are smaller than the one we are currently checking:

    from itertools import takewhile
    
    def primes():
        for n in generator():
            if not any(p != n and n%p == 0
                       for p in takewhile(lambda x: n>x,
                                          primes())):
                yield n
    
    print take(10, primes())
    

    This produces an infinite recursion and the script dies after spitting out a bunch of errors. It might look like we aren't doing anything to stop the recursion, but we actually are.

    Internally primes() is stopped by the takewhile since it only takes from the generator while a condition is met. And externally it's stopped by the take() since it finishes after 10 primes have been yielded.

    No laziness :(

    By no means are generators bad - using generator expressions instead of list comprehensions is a great idea. It will save plenty of memory when you're doing something like this:

    # builds a big list and immediately discards it
    >>> sum([x*x for x in xrange(2000000)])
    2666664666667000000L
    
    # only keeps one value at a time in memory
    >>> sum(x*x for x in xrange(2000000))
    2666664666667000000L
    

    As noted in Improving your code with modern idioms, a lot of cool tricks like that have been backported from Python 3, and you should start using them.

    But what's happened to our dreams of lazy evaluation?

    We can check that Python is indeed very eager to evaluate everything:

    >>> import time
    >>> ["no sleep", time.sleep(1), time.sleep(2)][0]
    'no sleep'  # takes 3 seconds to print
    

    As this example shows, Python immediately evaluates the whole data structure with wanton disregard for what you're actually using. Even doing it as a tuple doesn't work.
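
    For example, spelling the same thing as a tuple still builds every element before we index into it:

    >>> ("no sleep", time.sleep(1), time.sleep(2))[0]
    'no sleep'  # still takes 3 seconds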

    Using a generator helps a little bit:

    >>> list(islice((time.sleep(x) for x in xrange(3)), 1))
    [None] # takes 0 seconds
    >>> list(islice((time.sleep(x) for x in xrange(3)), 2))
    [None, None] # takes 1 second
    >>> list(islice((time.sleep(x) for x in xrange(3)), 3))
    [None, None, None] # takes 3 seconds
    

    But this is awkward. Our only other bet is using lambdas to postpone evaluation of particular members of a list until they're actually needed:

    >>> [lambda: "no sleep", lambda: time.sleep(1), lambda: time.sleep(2)][0]()
    'no sleep' # takes 0 seconds
    

    Only somewhat awkward to use, and it isn't even real lazy evaluation. Of course nothing gets evaluated eagerly when we're just assigning functions to variables so they can be called later. It's a trick that works in any language that lets you store functions. JavaScript comes immediately to mind.
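
    Real call-by-need laziness would go one step further and cache the result so the wrapped function runs at most once. A minimal sketch of that idea (the Lazy class and its names are mine, not something Python provides):

    import time
    
    class Lazy(object):
        "Wrap a zero-argument function and evaluate it at most once."
        def __init__(self, fn):
            self.fn = fn
            self.evaluated = False
    
        def force(self):
            if not self.evaluated:
                self.result = self.fn()
                self.evaluated = True
            return self.result
    
    cheap = Lazy(lambda: "no sleep")
    expensive = Lazy(lambda: time.sleep(2))
    
    print cheap.force()  # prints "no sleep" instantly; expensive never runs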

    While we might be able to expand on this lambda and generator trickery throughout our codebase ... let's not. Let's be content that python is an eager language, asking only "How high?" when we tell it to jump, and rejoice in the fact it can be kind of sort of almost lazy when we explicitly tell it to be.

    Update May 16th, 2016

    Reader Jasen Betts fixed a bug in my lazy eval implementation. Here's what he has to say:

    Note that takewhile does not stop UNTIL it finds an unacceptable result; it has to find the first unacceptable result to terminate. That's why your second primes function doesn't work.
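
    You can see that behaviour in isolation. A quick REPL check (my example, not Jasen's): takewhile has to pull 99, the first failing value, before it gives up.

    >>> from itertools import takewhile
    >>> def noisy():
    ...     for i in [1, 2, 3, 99, 4]:
    ...         print "pulled", i
    ...         yield i
    ...
    >>> list(takewhile(lambda x: x < 10, noisy()))
    pulled 1
    pulled 2
    pulled 3
    pulled 99
    [1, 2, 3]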

    Code that works:

    from itertools import takewhile
    
    def primes():
    
        # print ("(yield=2)")
        yield 2
    
        for f in generator():
            n = f*2 + 1
            if not any(p != n and n % p == 0
                       for p in takewhile(lambda x: n >= x*x, primes())):
                # print ("(yield=%d)" % (n))
                yield n

    Points to note:

    `yield 2` and `n>=x*x` ensure that it terminates by ensuring the recursive calls require smaller maximum results.

    `n=f*2+1` helps efficiency, but `n=f+2` would also work. After `yield 2` we need to start searching at 3.

    It's still horribly inefficient; uncomment the `print()` calls to see why.

    Perhaps this can be fixed by storing the found results in a dict and yielding those first before looping over the generator.
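
    A rough sketch of that memoization idea, using a module-level list as the cache so the order of results is preserved (the _known_primes name and the exact structure are mine, not from the post):

    from itertools import takewhile
    
    _known_primes = [2]  # cache shared between all primes() generators
    
    def primes():
        # yield everything we've already found...
        for p in _known_primes:
            yield p
    
        # ...then keep extending the cache with new primes
        n = _known_primes[-1]
        while True:
            n += 1
            if not any(n % p == 0
                       for p in takewhile(lambda x: x*x <= n, primes())):
                _known_primes.append(n)
                yield n
    
    print take(10, primes())
    # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

    Each fresh primes() call now replays the cache instead of recomputing it, so the recursive trial divisions stay shallow.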

    Published on August 24th, 2012 in Lazy evaluation, Prime number, python, Sequence, Time complexity, Uncategorized
