Swizec Teller - a geek with a hat | swizec.com

    Moving 13 years of Wordpress blog to Gatsby Markdown

    More and more people are creating blogs with Gatsby and that's exciting as heck! I miss the internet days when everyone had a place to call their own.

    It's pretty easy, too.

    You initiate a repo, follow a quick tutorial, write your first article, press build, and voila: A beautiful new place to publish your ideas.

    And you get some amazing benefits we didn't get when I built my first site on Geocities.

    Great Lighthouse scores, fast load times, an easy build-and-publish process with Zeit or Netlify, and image optimization unlike anything I've ever seen. Plop a 40MB photo from a DSLR into your blog and Gatsby plugins convert it to a tiny, fast-loading 900KB version.

    Done that before, didn't even notice. On Wordpress it breaks your site. Tried that too 😇

    But what if you already have a blog? That's where it gets tricky.

    How to move an existing Wordpress blog to Gatsby

    Creating the blog itself is easy:

    1. Set up a repo
    2. Follow the Gatsby tutorial
    3. Add a bunch of plugins that sound useful

    Now, your old stuff may not be great but you should keep it anyway. For the memories, for not breaking old links, for being a good member of the internet.

    Did you know most websites go offline within 2 years of publishing? Don't let your writing join the graveyard.

    There are 2 parts to this and both are hard:

    1. Convert your Wordpress history to a format Gatsby understands
    2. Keep the old links working

    I haven't solved #2 yet. I imagine there's going to be a redirect page on Gatsby that takes a URL from the old format and pushes you to the correct article.
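
    A redirect like that could start as a plain lookup from old path to new path. The sketch below is a guess at the shape, not a solution: it assumes the old links use WordPress's "day and name" permalinks (/YYYY/MM/DD/slug/) and that new articles live under /blog/slug/. Both formats and the redirectTarget name are assumptions for illustration.

```javascript
// Hypothetical sketch: map an old WordPress-style permalink to a new Gatsby path.
// Assumes old URLs look like /YYYY/MM/DD/slug/ and new ones like /blog/slug/.
function redirectTarget(oldPath) {
  const match = /^\/(\d{4})\/(\d{2})\/(\d{2})\/([^/]+)\/?$/.exec(oldPath);
  if (!match) {
    // Not an old-style permalink, so let the regular 404 page handle it
    return null;
  }
  const slug = match[4];
  return `/blog/${slug}/`;
}

console.log(redirectTarget("/2019/09/23/moving-to-gatsby/"));
```

    If your old permalinks follow a different pattern, only the regular expression needs to change.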

    No. 1 proved harder than I thought.

    A script that converts Wordpress to Gatsby

    Gatsby has a plugin to source data directly from Wordpress's API. This is great, if you want to continue using Wordpress as your CMS and use Gatsby as the frontend.

    I prefer writing in Markdown.

    Your other option is downloading the wordpress.xml datadump and converting it to Gatsby. Get all your content, your blogs, even the comments in one big file ... then what?

    Then you 👇

    1. Clone this wordpress-to-markdown repo
    2. Run yarn install or npm install as preferred
    3. Name your file export.xml and place it in root
    4. Edit line 181 to change the author
    5. Run node convert.js
    6. Wait

    Yes it's a little hacky right now. Might productize later :)

    convert.js started life as ytechie's hack some 6 years ago. A couple forks later I found a version that mostly works in 2019.

    Making it work for Gatsby took some tinkering.

    The generated Markdown wasn't valid MDX (which I prefer for blogs since it lets you run JavaScript), downloading images was wonky, and I needed more self-contained posts with better headers. Gatsby likes it when each article comes in its own folder with its own images.

    Using a folder-per-article schema also makes filename collisions less likely. ✌️
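
    To make the folder-per-article idea concrete, here is a small sketch of how paths might be derived. The content/posts root, the img subfolder, and the articlePaths name are all assumptions for illustration, not what convert.js necessarily produces.

```javascript
// Hypothetical layout: one folder per article, with its images next to index.mdx.
function articlePaths(slug, imageNames) {
  const dir = `content/posts/${slug}`;
  return {
    dir,
    post: `${dir}/index.mdx`,
    // Images are scoped to the article folder, so two posts can both own "photo.png"
    images: imageNames.map((name) => `${dir}/img/${name}`),
  };
}

console.log(articlePaths("first-post", ["photo.png"]));
```

    Two articles that both reference an image called photo.png end up with different file paths, which is where the collision protection comes from.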

    Parse XML posts into Gatsby

    The core of convert.js is a method that parses XML, iterates through posts, generates markdown, and downloads all images.

    It uses xml2js to parse XML, rehype to parse HTML content, remark to generate Markdown, and good old node-fetch to download images.

    You can see the full code on GitHub, although it's not the prettiest.

    Here's the fun part that takes HTML and spits out Markdown 😛

    Click through for source

    I forked from a version with a complex homegrown Markdown generator and many bugs. Mixing rehype, remark, unified, and plugins makes the process more reliable and easier to maintain.

    1. Parse HTML to an AST
    2. Convert to Markdown AST
    3. Stringify
    4. Output

    👌

    My surrounding code adds some headers and other metadata. Makes it easier to plop straight into Gatsby.
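
    Those headers are typically a front matter block at the top of each file. A minimal sketch of that step follows, assuming YAML front matter. The withFrontMatter name and the title and date fields are assumptions for illustration, not the fields the script actually writes.

```javascript
// Builds a YAML front matter block followed by the Markdown body.
// Field names passed in (title, date, ...) are illustrative assumptions.
function withFrontMatter(meta, markdown) {
  // JSON.stringify quotes and escapes values; JSON strings are valid YAML scalars
  const lines = Object.entries(meta).map(
    ([key, value]) => `${key}: ${JSON.stringify(value)}`
  );
  return ["---", ...lines, "---", "", markdown].join("\n");
}

console.log(withFrontMatter({ title: "Hello: world", date: "2019-09-23" }, "# Hi"));
```

    Quoting every value is a cautious choice: titles with colons or quotes would otherwise break the YAML parser.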

    Download all the images

    Most images on my old posts are dead. 404, 500, 301, you name the error, it's in there somewhere.

    The only way to avoid that fate in the future is to keep local copies of images that you host yourself. Gatsby supports that really well with MDX – put images next to your words and use relative paths.

    But you need to download them first.

    I created this processImage method and tried a few different ways to download images one by one. The original crashed my computer. Too much parallelism.

    Click through for source

    Ignore the ancient var syntax, I didn't want to rewrite everything 😇

    We take an image URL, split it into parts, create a new filePath, run the downloadFile method, and replace the original URL in our post with the new relative path.

    Updating the images array is important so we can collect candidates for our hero image, the part that goes into social thumbnails and on the homepage.
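
    The URL-rewriting half of that step can be sketched without any network access. This is a simplified stand-in, not the processImage method from the repo: localizeImage, the post object shape, and the ./img/ folder are assumptions for illustration.

```javascript
// Sketch of the URL-rewriting step. Names are hypothetical, not those in convert.js.
// post is assumed to look like { dir, markdown, images }.
function localizeImage(post, imageUrl) {
  const { pathname } = new URL(imageUrl);
  // The last path segment is the file name, e.g. "photo.png"
  const fileName = decodeURIComponent(pathname.split("/").pop());
  const relativePath = `./img/${fileName}`;
  return {
    // Where the file should be fetched from and saved to
    download: { from: imageUrl, to: `${post.dir}/img/${fileName}` },
    // split/join replaces every occurrence without needing regex escaping
    markdown: post.markdown.split(imageUrl).join(relativePath),
    // Keep track of local images as hero image candidates
    images: [...post.images, relativePath],
  };
}

const post = {
  dir: "content/posts/hello",
  markdown: "![a](https://example.com/wp/2019/photo.png)",
  images: [],
};
console.log(localizeImage(post, "https://example.com/wp/2019/photo.png"));
```

    The sketch does not handle URLs that end in a slash or two different URLs sharing one file name. A real migration would need a rule for both.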

    Tried a bunch of ways to download files. This is the one that worked best in the end.

    Click through for source

    Check the URL is an image, use fetch(), make sure the response is an image (some server errors return a success with a bunch of HTML, that was fun to learn), and write the file.
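
    Those four checks can be sketched roughly like this. It is not the downloadFile method from the repo: the downloadImage name, the extension list, and the result shape are assumptions, and it uses the global fetch() built into Node 18+ as a substitute for the node-fetch package. The fetch function is a parameter so the checks can be exercised without a network.

```javascript
// Sketch only: downloadImage and its parameters are hypothetical names.
// fetchImpl defaults to the global fetch available in Node 18+.
const IMAGE_EXTENSIONS = /\.(png|jpe?g|gif|webp|svg)$/i;

async function downloadImage(url, filePath, fetchImpl = fetch) {
  // 1. Check the URL looks like an image
  if (!IMAGE_EXTENSIONS.test(new URL(url).pathname)) {
    return { ok: false, reason: "not an image URL" };
  }
  // 2. Fetch it, treating network failures as a result rather than a crash
  let response;
  try {
    response = await fetchImpl(url);
  } catch (err) {
    return { ok: false, reason: `fetch failed: ${err.message}` };
  }
  if (!response.ok) {
    return { ok: false, reason: `HTTP ${response.status}` };
  }
  // 3. Some servers answer 200 with an HTML error page, so check the type
  const contentType = response.headers.get("content-type") || "";
  if (!contentType.startsWith("image/")) {
    return { ok: false, reason: `unexpected content-type: ${contentType}` };
  }
  // 4. Write the file (dynamic import works in both CommonJS and ES modules)
  const { writeFile } = await import("node:fs/promises");
  await writeFile(filePath, Buffer.from(await response.arrayBuffer()));
  return { ok: true };
}
```

    Returning a reason instead of throwing makes it easy to log which images were skipped and why.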

    You'd think this was the easiest part, but it isn't. Most libraries you find are meant to work with APIs, not file downloads. Handling all sorts of errors gets tricky, and when you have thousands of files to download you gotta be really careful with your async code.

    First version downloaded everything in parallel. Crashed my computer.

    Now it goes one-by-one which is slower, but at least it finishes.
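
    The difference comes down to where the await sits. Mapping every URL through Promise.all starts all requests at once; awaiting inside a for...of loop starts the next one only after the previous finishes. A minimal sketch, assuming downloadOne is any async function that takes a URL (both names are hypothetical):

```javascript
// Downloads run one at a time: each await finishes before the next one starts.
// downloadOne is any async function that takes a URL.
async function downloadSequentially(urls, downloadOne) {
  const results = [];
  for (const url of urls) {
    try {
      results.push({ url, ok: true, value: await downloadOne(url) });
    } catch (err) {
      // One dead image should not stop the rest of the migration
      results.push({ url, ok: false, error: err.message });
    }
  }
  return results;
}
```

    A middle ground would be a small fixed pool of parallel downloads, but one at a time is the simplest thing that finishes.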

    The result

    An error.

    I told you this was hard.

    The way I set up my new Gatsby blog, it expects every article to have a hero image. Looks like some either have no images, or no working images.

    That's next on my list to fix. What do we do with missing images? Do I create a 404 graphic? Do I remove them from the post? What about the hero image? 🤔

    Until then this will continue to be the main comment I get

    Click through for source

    Cheers,
    ~Swizec

    Published on September 23rd, 2019 in Technical
