Swizec Teller - a geek with a hat
swizec.com

    Moving 13 years of Wordpress blog to Gatsby Markdown



    More and more people are creating blogs with Gatsby and that's exciting as heck! I miss the internet days when everyone had a place to call their own.

    It's pretty easy, too.

    You initialize a repo, follow a quick tutorial, write your first article, press build, and voilà: a beautiful new place to publish your ideas.

    And you get some amazing benefits we didn't get when I built my first site on Geocities.

    Great Lighthouse scores, fast load times, an easy build & publish process with Zeit or Netlify, and image optimization unlike anything I've ever seen. Plop a 40MB photo from a DSLR into your blog and Gatsby plugins convert it to a tiny, fast-loading 900KB version.

    Done that before, didn't even notice. On Wordpress, it breaks your site. Tried that too 😇

    But what if you already have a blog? That's where it gets tricky.


    How to move an existing Wordpress blog to Gatsby

    Creating the blog itself is easy:

    1. Set up a repo
    2. Follow the Gatsby tutorial
    3. Add a bunch of plugins that sound useful

    Now, your old stuff may not be great but you should keep it anyway. For the memories, for not breaking old links, for being a good member of the internet.

    Did you know most websites go offline within 2 years of publishing? Don't let your writing join the graveyard.

    There are 2 parts to this, and both are hard:

    1. Convert your Wordpress history to a format Gatsby understands
    2. Keep the old links working

    I haven't solved #2 yet. I imagine there's going to be a redirect page on Gatsby that takes a URL from the old format and pushes you to the correct article.
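    One way that redirect could work, sketched under assumptions: old Wordpress permalinks looked like `/YYYY/MM/DD/post-slug/` (or `/YYYY/MM/post-slug/`), and each new Gatsby article lives at `/post-slug`. Both URL formats and the `oldUrlToNewPath` helper are hypothetical; check your own permalink settings before trusting the mapping.

```javascript
// Hypothetical helper: map an old Wordpress permalink to a new Gatsby path.
// ASSUMPTION: old URLs follow "/YYYY/MM/DD/slug/" or "/YYYY/MM/slug/",
// and new articles live at "/slug".
function oldUrlToNewPath(oldUrl) {
  // Accept both absolute URLs and bare paths.
  const { pathname } = new URL(oldUrl, "https://example.com");
  // Drop the empty segments produced by leading/trailing slashes.
  const segments = pathname.split("/").filter(Boolean);

  let i = 0;
  // Skip a 4-digit year, then up to two 2-digit segments (month, day).
  if (/^\d{4}$/.test(segments[0] || "")) {
    i = 1;
    while (i < 3 && /^\d{2}$/.test(segments[i] || "")) i++;
  }

  const slug = segments[i];
  // Return null for URLs we can't map so the caller can show a 404 instead.
  return slug ? `/${slug}` : null;
}

console.log(oldUrlToNewPath("/2019/09/23/my-post/")); // "/my-post"
```

    Pairs of old and new paths from a helper like this could then be handed to Gatsby's `createRedirect` action in `gatsby-node.js`. Default `?p=123` style permalinks would need a lookup table keyed by post ID instead.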

    No. 1 proved harder than I thought.

    A script that converts Wordpress to Gatsby

    Gatsby has a plugin to source data directly from Wordpress's API. This is great if you want to keep using Wordpress as your CMS with Gatsby as the frontend.

    I prefer writing in Markdown.

    Your other option is to download the wordpress.xml data dump and convert it for Gatsby. You get all your content, your posts, even the comments, in one big file ... then what?


    Then you 👇

    1. Clone this wordpress-to-markdown repo
    2. Run yarn install or npm install, as you prefer
    3. Name your export file export.xml and place it in the repo root
    4. Edit line 181 to change the author
    5. Run node convert.js
    6. Wait

    Yes, it's a little hacky right now. Might productize later :)

    convert.js started life as ytechie's hack some 6 years ago. A couple of forks later, I found a version that still mostly worked in 2019.

    Making it work for Gatsby took some tinkering.

    The generated Markdown wasn't valid MDX (which I prefer for blogs because it lets you run JavaScript), downloading images was wonky, and I needed more self-contained posts with better headers. Gatsby likes it when each article comes in its own folder with its own images.

    Using a folder-per-article schema also makes filename collisions less likely. ✌️
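    A minimal sketch of that folder-per-article layout, assuming a `content/blog` root, one `index.mdx` per post, and a `slugify` step for folder names. The `slugify` and `articlePaths` helpers and the directory layout are illustrative, not necessarily what convert.js does.

```javascript
const path = require("path");

// Hypothetical helper: turn a post title into a safe folder name.
function slugify(title) {
  return title
    .normalize("NFKD") // split accented letters into letter + accent mark
    .replace(/[\u0300-\u036f]/g, "") // drop the accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse everything else into dashes
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
}

// Hypothetical layout: <rootDir>/<slug>/index.mdx, images in the same folder.
function articlePaths(rootDir, title, imageNames = []) {
  const dir = path.join(rootDir, slugify(title));
  return {
    dir,
    mdx: path.join(dir, "index.mdx"),
    images: imageNames.map((name) => path.join(dir, name)),
  };
}

console.log(articlePaths("content/blog", "Hello World", ["screenshot.png"]));
```

    Two posts that both embed a `screenshot.png` end up with two different files, because each copy lives in its own folder.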

    Parse XML posts into Gatsby

    The core of convert.js is a method that parses XML, iterates through posts, generates markdown, and downloads all images.

    It uses xml2js to parse XML, rehype to parse the HTML content, remark to generate Markdown, and good old node-fetch to download images.
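    The XML half of that core, sketched under assumptions: xml2js with default options (so every child element comes back wrapped in an array) and the standard WXR tag names such as `wp:post_type`, `wp:status`, and `content:encoded`. `extractPosts` is a hypothetical helper, not the actual convert.js code.

```javascript
// Hypothetical helper: pull published posts out of a Wordpress export that
// xml2js has already parsed. ASSUMPTION: xml2js default options (every child
// element is an array) and standard WXR tags like "wp:post_type".
function extractPosts(parsed) {
  // First value of an xml2js array property, or "" when the tag is missing.
  const first = (node, key) => (Array.isArray(node[key]) ? node[key][0] : "");

  const items = parsed.rss.channel[0].item || [];
  return items
    .filter((item) => first(item, "wp:post_type") === "post") // skip pages, attachments, menus
    .filter((item) => first(item, "wp:status") === "publish") // skip drafts and trash
    .map((item) => ({
      title: first(item, "title"),
      slug: first(item, "wp:post_name"),
      date: first(item, "wp:post_date"),
      html: first(item, "content:encoded"),
    }));
}
```

    In a real run, `parsed` would come from something like `await xml2js.parseStringPromise(fs.readFileSync("export.xml", "utf8"))`, and each post's `html` would then go through the rehype/remark step.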

    You can see the full code on GitHub, although it's not the prettiest.

    Here's the fun part that takes HTML and spits out Markdown 😛

    Click through for source

    I forked from a version with a complex homegrown Markdown generator and many bugs. Mixing rehype, remark, unified, and plugins makes the process more reliable and easier to maintain.

    1. Parse HTML into an HTML AST (hast) with rehype
    2. Convert it to a Markdown AST (mdast)
    3. Stringify the Markdown AST with remark
    4. Output


    My surrounding code adds some headers and other metadata, which makes the output easy to plop straight into Gatsby.
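    A minimal sketch of that header step, assuming the headers are YAML frontmatter at the top of each MDX file. The `toMdxFile` helper and the field names are hypothetical; use whatever fields your Gatsby queries expect.

```javascript
// Hypothetical helper: prepend YAML frontmatter to generated Markdown.
// ASSUMPTION: field names like title/date/slug/heroImage are illustrative.
function toMdxFile(meta, markdown) {
  const lines = Object.entries(meta)
    // Skip missing values so the frontmatter stays valid.
    .filter(([, value]) => value !== undefined && value !== null)
    // JSON strings are valid YAML double-quoted scalars, which safely handles
    // colons and quotes that break naive "key: value" output.
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`);
  return ["---", ...lines, "---", "", markdown.trim(), ""].join("\n");
}

console.log(toMdxFile({ title: "Hello: World", date: "2019-09-23" }, "Some **Markdown**"));
```

    Quoting every value is a deliberately blunt choice: old Wordpress titles are full of colons, quotes, and emoji, and a single unquoted colon is enough to break a build.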

    Download all the images

    Most images on my old posts are dead. 404s, 500s, 301s: you name the status code, it's in there somewhere.

    The only way to avoid that fate in the future is to keep local copies of images that you host yourself. Gatsby supports that really well with MDX – put images next to your words and use relative paths.

    But you need to download them first.

    I created this processImage method and tried a few different ways to download images one by one. The original crashed my computer. Too much parallelism.

    Click through for source

    Ignore the ancient var syntax; I didn't want to rewrite everything 😇

    We take an image URL, split it into parts, create a new filePath, run the downloadFile method, and replace the original URL in our post with the new relative path.

    Updating the images array is important because that's how we collect candidates for our hero image: the picture that goes into social thumbnails and on the homepage.
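    A rough sketch of those steps in modern syntax. The `post` shape (`{ dir, markdown, images }`) and the injected `download(url, filePath)` function are hypothetical stand-ins, not the real processImage signature, and it assumes image URLs are absolute.

```javascript
const path = require("path");

// Hypothetical processImage-style step.
// ASSUMPTIONS: post = { dir, markdown, images }; download(url, filePath) is
// any async downloader that throws on failure.
async function processImage(post, imageUrl, download) {
  // Split the URL into parts and keep only the file name (no query string).
  const { pathname } = new URL(imageUrl);
  const fileName = decodeURIComponent(path.posix.basename(pathname));
  const filePath = path.join(post.dir, fileName);

  // If this throws, the post keeps its original URL untouched.
  await download(imageUrl, filePath);

  // Point the post at the local copy with a relative path.
  const relativePath = `./${fileName}`;
  post.markdown = post.markdown.split(imageUrl).join(relativePath);

  // Remember it as a hero image candidate.
  post.images.push(relativePath);
  return relativePath;
}
```

    Injecting `download` keeps the URL rewriting testable without touching the network. Two images with the same file name in one post would overwrite each other here, so a real version may want to de-duplicate names.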

    I tried a bunch of ways to download files; this is the one that worked best in the end.

    Click through for source

    Check that the URL looks like an image, fetch() it, make sure the response really is an image (some servers answer errors with a success status and a bunch of HTML; that was fun to learn), and write the file.
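    Those four checks could look something like this. It assumes Node 18+ for the built-in `fetch` (the post used node-fetch, whose responses offer a similar `ok`, `status`, `headers.get()`, and `arrayBuffer()` surface). The extension list and the injectable `fetchImpl` parameter are illustrative choices, not the original code.

```javascript
const fs = require("fs/promises");
const path = require("path");

// ASSUMPTION: an illustrative list of extensions we treat as images.
const IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"];

async function downloadFile(url, filePath, fetchImpl = globalThis.fetch) {
  // 1. Check the URL looks like an image before making any request.
  const ext = path.extname(new URL(url).pathname).toLowerCase();
  if (!IMAGE_EXTENSIONS.includes(ext)) {
    throw new Error(`Not an image URL: ${url}`);
  }

  // 2. Fetch it, treating non-2xx statuses as failures.
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for ${url}`);
  }

  // 3. Some servers answer "success" with an HTML error page,
  //    so check the Content-Type header as well.
  const type = res.headers.get("content-type") || "";
  if (!type.startsWith("image/")) {
    throw new Error(`Expected an image, got "${type}" for ${url}`);
  }

  // 4. Write the bytes, creating the article folder if needed.
  await fs.mkdir(path.dirname(filePath), { recursive: true });
  await fs.writeFile(filePath, Buffer.from(await res.arrayBuffer()));
  return filePath;
}
```

    Every failure throws instead of writing a broken file, so the caller decides whether to skip the image, log it, or retry.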

    You'd think this was the easiest part, but it isn't. Most libraries you find are meant to work with APIs, not file downloads. Handling all sorts of errors gets tricky, and when you have thousands of files to download you gotta be really careful with your async code.

    First version downloaded everything in parallel. Crashed my computer.

    Now it goes one-by-one which is slower, but at least it finishes.
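    The difference between the crashing version and the one that finishes is roughly `Promise.all(urls.map(download))`, which starts every request at once, versus awaiting inside a loop. A minimal sketch of the one-by-one approach, assuming each job is a function that returns a promise; the `runSequentially` name is hypothetical.

```javascript
// Hypothetical helper: run async jobs strictly one at a time.
// ASSUMPTION: jobs is an array of functions returning promises,
// e.g. () => downloadFile(url, filePath).
async function runSequentially(jobs) {
  const results = [];
  for (const job of jobs) {
    try {
      // Awaiting inside the loop means the next job doesn't start
      // until this one has settled.
      results.push({ ok: true, value: await job() });
    } catch (error) {
      // One dead image shouldn't stop the other thousands.
      results.push({ ok: false, error });
    }
  }
  return results;
}
```

    A small concurrency limit (a handful of downloads in flight at a time) is a middle ground between the two, at the cost of a bit more code.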

    The result

    An error.


    I told you this was hard.

    The way I set up my new Gatsby blog, it expects every article to have a hero image. Looks like some posts have either no images or no working images.

    That's next on my list to fix. What do we do with missing images? Do I create a 404 graphic? Do I remove them from the post? What about the hero image? 🤔
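    One possible answer to the hero image question, sketched as an option rather than anything settled here: use the first image that actually downloaded, otherwise fall back to a shared placeholder. `pickHeroImage` and the `DEFAULT_HERO` path are hypothetical; the placeholder is an image you'd have to add to the repo yourself.

```javascript
// Hypothetical placeholder image shared by posts with no working images.
const DEFAULT_HERO = "../default-hero.png";

// images: hero candidates in post order (relative paths).
// downloaded: a Set of the relative paths that actually made it to disk.
function pickHeroImage(images, downloaded) {
  return images.find((image) => downloaded.has(image)) ?? DEFAULT_HERO;
}

console.log(pickHeroImage(["./dead.png", "./cat.png"], new Set(["./cat.png"]))); // "./cat.png"
```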

    Until then, this will continue to be the main comment I get:

    Click through for source



    Published on September 23rd, 2019 in Technical


    Created by Swizec with ❤️