Swizec Teller - a geek with a hat (swizec.com)

Senior Mindset Book

Get promoted, earn a bigger salary, work for top companies

    Livecoding #31: Wherein we learn that datasets are hard and find 2 good papers

    This is a Livecoding Recap – an almost-weekly post about interesting things discovered while livecoding. Always under 500 words and with pictures. You can follow my channel, here. New content almost every Sunday at 2pm PDT. There’s live chat, come say hai.

    There were things happening this weekend that left me glued to Twitter. Distracted and ineffective, I didn't get much done. It shows in the Livecoding session too.

    I'm still deciding whether I want to write about the things that were happening. Maybe I should, maybe I shouldn't, maybe I have nothing useful to add. Who knows… ¯\_(ツ)_/¯

    Completely coincidentally, I wanted to use the Livecoding session to build an immigration dataviz. Something that would show the positive economic impact of immigration, and not just This Is How Many People Came. Numbers are more interesting when coupled with impact.

    Did you know that 24,000,000 people immigrated to the US in 2015 alone? That's a bunch of people.

    Immigration chord diagram

    Curran Kelleher built a great chord diagram of the UN migrations dataset. And there's this cool chord diagram of flights in and out of the United States that @espinielli built.

    Flights chord diagram

    I wanted something more. I wanted to build something that shows how many businesses are created by immigrants, how much money is pumped into the economy, and how many people were employed.

    I failed. For now.

    Those datasets are hard to find. I was able to find a dataset that shows the number of people who are self-employed, have jobs, or run a business, broken down by race and ethnicity. It's called the Survey of Business Owners. It’s collected by the US Census Bureau every five years.

    Then there's the US Census Bureau's Current Population Survey, which also looked promising.

    But I was unable to put them together and build a comprehensive dataset that shows what I wanted. Or even mentions it.

    Looks like the US government is much more concerned with tracking whether somebody is Black, Asian, Hispanic, White, or a veteran than with whether they're an immigrant. I wonder why…

    That said, I found two amazing studies talking about what I wanted to show.

    The first is Immigrant Entrepreneurship, a 2016 paper from Harvard. It's 68 pages, so I haven't read it yet.

    We examine immigrant entrepreneurship and the survival and growth of immigrant-founded businesses over time relative to native-founded companies. Our work quantifies immigrant contributions to new firm creation in a wide variety of fields and using multiple definitions. While significant research effort has gone into understanding the economic impact of immigration into the United States, comprehensive data for quantifying immigrant entrepreneurship are difficult to assemble. We combine several restricted-access U.S. Census Bureau data sets to create a unique longitudinal data platform that covers 1992-2008 and many states. We describe differences in the types of businesses initially formed by immigrants and their medium-term growth patterns. We also consider the relationship of these outcomes to the immigrant's age at arrival to the United States.

    Sounds perfect, doesn't it?

    Except for the "combine several restricted-access" part. I don't have restricted access.

    The second, Immigrant Entrepreneurs and Small Business Owners, and their Access to Financial Capital, is a 2012 paper from the SBA that tells a similar tale. Its datasets are also constructed from restricted-access materials.

    Alas, that puts a stop to this project for now. But I'll email the paper authors to see if they're willing to share.

    Published on January 30th, 2017 in Livecoding, Technical

    Have a burning question that you think I can answer? Hit me up on Twitter and I'll do my best.

    Who am I and who do I help? I'm Swizec Teller and I turn coders into engineers with "Raw and honest from the heart!" writing. No bullshit. Real insights into the career and skills of a modern software engineer.

    Want to become a true senior engineer? Take ownership, have autonomy, and be a force multiplier on your team. The Senior Engineer Mindset ebook can help 👉 swizec.com/senior-mindset. These are the shifts in mindset that unlocked my career.

    Curious about Serverless and the modern backend? Check out Serverless Handbook, for frontend engineers 👉 ServerlessHandbook.dev

    Want to stop copy-pasting D3 examples and create data visualizations of your own? Learn how to build scalable dataviz React components your whole team can understand with React for Data Visualization

    Want to get my best emails on JavaScript, React, Serverless, Fullstack Web, or Indie Hacking? Check out swizec.com/collections

    Did someone amazing share this letter with you? Wonderful! You can sign up for my weekly letters for software engineers on their path to greatness, here: swizec.com/blog

    Want to brush up on your modern JavaScript syntax? Check out my interactive cheatsheet: es6cheatsheet.com

    By the way, just in case no one has told you it yet today: I love and appreciate you for who you are ❤️

    Created by Swizec with ❤️