
More and more people are creating blogs with Gatsby and that’s exciting as heck! I miss the internet days when everyone had a place to call their own.

It’s pretty easy, too.

You initialize a repo, follow a quick tutorial, write your first article, press build, and voilà: a beautiful new place to publish your ideas.

And you get some amazing benefits we didn’t get when I built my first site on GeoCities.

Great Lighthouse scores, fast load times, an easy build-and-publish process with Zeit or Netlify, and image optimization unlike anything I’ve ever seen. Plop a 40MB photo from a DSLR into your blog and Gatsby plugins convert it to a tiny, fast-loading 900KB version.

Done that before, didn’t even notice. On WordPress it breaks your site. Tried that too 😇

But what if you already have a blog? That’s where it gets tricky.

How to move an existing WordPress blog to Gatsby

Creating the blog itself is easy:

  1. Set up a repo
  2. Follow the Gatsby tutorial
  3. Add a bunch of plugins that sound useful

Now, your old stuff may not be great, but you should keep it anyway. For the memories, for not breaking old links, for being a good member of the internet.

Did you know most websites go offline within 2 years of publishing? Don’t let your writing join the graveyard.

There are 2 parts to this, and both are hard:

  1. Convert your WordPress history to a format Gatsby understands
  2. Keep the old links working

I haven’t solved #2 yet. I imagine there’s going to be a redirect page on Gatsby that takes a URL from the old format and pushes you to the correct article.
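If the old posts used WordPress’s date-based permalinks, that redirect could start as a small mapping function. This is a minimal sketch, assuming old URLs look like /YYYY/MM/DD/slug/ and new Gatsby posts live at /slug/. The name oldUrlToNewPath is hypothetical, and other permalink settings (like ?p=123) would need their own rules.

```javascript
// Hypothetical sketch: map an old WordPress permalink to a new Gatsby path.
// ASSUMPTION: old posts used /YYYY/MM/DD/slug/ and new posts live at /slug/.
const OLD_PERMALINK = /^\/(\d{4})\/(\d{2})\/(\d{2})\/([^/?#]+)\/?$/;

function oldUrlToNewPath(oldUrl) {
  // URL() needs a base to parse relative paths; the host itself is ignored.
  const { pathname } = new URL(oldUrl, "https://example.com");
  const match = pathname.match(OLD_PERMALINK);
  if (!match) return null; // not an old-style post URL, leave it alone
  const slug = match[4];
  return `/${slug}/`;
}
```

In a Gatsby site you could feed pairs like these to actions.createRedirect({ fromPath, toPath, isPermanent: true }) in gatsby-node.js, though whether those redirects also work server-side depends on your host and its Gatsby plugin.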

No. 1 proved harder than I thought.

A script that converts WordPress to Gatsby

Gatsby has a plugin to source data directly from WordPress’s API. This is great if you want to keep using WordPress as your CMS and use Gatsby as the frontend.

I prefer writing in Markdown.

Your other option is to download the wordpress.xml data dump and convert it for Gatsby. You get all your content, your blog posts, even the comments, in one big file … then what?

Then you 👇

  1. Clone this wordpress-to-markdown repo
  2. Run yarn install or npm install, as preferred
  3. Name your export file export.xml and place it in the project root
  4. Edit line 181 to change the author
  5. Run node convert.js
  6. Wait

Yes it’s a little hacky right now. Might productize later 🙂

convert.js started life as ytechie’s hack some 6 years ago. A couple of forks later, I found a version that mostly works in 2019.

Making it work for Gatsby took some tinkering.

The generated Markdown wasn’t valid MDX (which I prefer for blogs because it lets you run JavaScript), downloading images was wonky, and I needed more self-contained posts with better headers. Gatsby likes it when each article comes in its own folder with its own images.
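One common reason converted Markdown fails as MDX is inline HTML: MDX parses tags as JSX, so an unclosed <br> or <img ...> from WordPress becomes a syntax error. Here’s a minimal regex-based sketch of one fix. selfCloseVoidTags is a hypothetical helper; it doesn’t skip code blocks and assumes attribute values contain no < or > characters.

```javascript
// HTML void elements that WordPress often leaves unclosed.
const VOID_TAGS = ["br", "hr", "img", "input", "source", "wbr", "area", "col", "embed", "track"];

// Matches <tag ...>, <tag .../> or <tag/> for any tag in VOID_TAGS (any case).
const voidTagPattern = new RegExp(`<(${VOID_TAGS.join("|")})\\b([^<>]*?)\\s*\\/?>`, "gi");

// Rewrite void tags as JSX-friendly self-closing tags, e.g. <br> -> <br />.
// Lowercasing matters: MDX treats <BR /> as a React component named BR.
function selfCloseVoidTags(markdown) {
  return markdown.replace(voidTagPattern, (_match, tag, attrs) => `<${tag.toLowerCase()}${attrs} />`);
}
```

Running it on tags that are already self-closed leaves them as they are, so it’s safe to apply more than once.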

Using a folder-per-article schema also makes filename collisions less likely. ✌️
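A folder-per-article layout mostly comes down to building paths. This sketch assumes a hypothetical content/blog/<slug>/index.mdx layout and falls back to a slug made from the title when WordPress left the post slug empty (which can happen on drafts). slugify and articlePaths are illustrative names, not part of convert.js.

```javascript
const path = require("path");

// Turn a title into a URL- and filesystem-friendly slug.
function slugify(title) {
  return title
    .toLowerCase()
    .normalize("NFKD") // split accented letters into base letter + mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/[^a-z0-9]+/g, "-") // collapse everything else into dashes
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
}

// Hypothetical layout: <rootDir>/<slug>/index.mdx with images next to it.
function articlePaths(rootDir, { slug, title }) {
  const dir = path.join(rootDir, slug || slugify(title));
  return {
    dir,
    markdownFile: path.join(dir, "index.mdx"),
    imageFile: (fileName) => path.join(dir, fileName),
  };
}
```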

Parse XML posts into Gatsby

The core of convert.js is a method that parses XML, iterates through posts, generates markdown, and downloads all images.

It uses xml2js to parse XML, rehype to parse HTML content, remark to generate Markdown, and good old node-fetch to download images.
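With its default options, xml2js turns every child element into an array, so a WordPress export comes back as nested arrays of strings. A minimal sketch of pulling published posts out of that structure, assuming the standard WordPress export element names (wp:post_type, wp:status, wp:post_name, wp:post_date, content:encoded). extractPublishedPosts is a hypothetical name, not the function in convert.js.

```javascript
// Return the first value of an xml2js child array, or a fallback.
function first(node, key, fallback = "") {
  const value = node[key];
  return Array.isArray(value) && value.length > 0 ? value[0] : fallback;
}

// ASSUMPTION: `parsed` is xml2js output (explicitArray: true, the default)
// for a WordPress export, shaped rss > channel[0] > item[].
function extractPublishedPosts(parsed) {
  const items = parsed.rss.channel[0].item || [];
  return items
    .filter((item) => first(item, "wp:post_type") === "post") // skip pages, attachments
    .filter((item) => first(item, "wp:status") === "publish") // skip drafts, trash
    .map((item) => ({
      title: first(item, "title"),
      slug: first(item, "wp:post_name"),
      date: first(item, "wp:post_date"),
      html: first(item, "content:encoded"),
    }));
}
```

In recent xml2js versions, parsed would come from something like await xml2js.parseStringPromise(xmlString).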

You can see the full code on GitHub, although it’s not the prettiest.

The fun part is the bit that takes HTML and spits out Markdown 😛

I forked from a version with a complex homegrown Markdown generator and many bugs. Mixing rehype, remark, unified, and plugins makes the process more reliable and easier to maintain.

  1. Parse HTML to an AST
  2. Convert to Markdown AST
  3. Stringify
  4. Output

👌

My surrounding code adds some headers and other metadata, which makes it easier to plop posts straight into Gatsby.
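Those headers are Markdown frontmatter. A minimal sketch of generating it, with hypothetical field names (title, date, hero) that you’d match to your own Gatsby queries. Values go through JSON.stringify because a JSON string is also a valid YAML double-quoted string, which keeps titles with colons or quotes from breaking the YAML.

```javascript
// Build a YAML frontmatter block for an MDX post.
function toFrontmatter({ title, date, heroImage }) {
  const lines = [
    "---",
    `title: ${JSON.stringify(title)}`,
    `date: ${JSON.stringify(date)}`,
  ];
  // Only emit a hero field when we actually have an image for it.
  if (heroImage) lines.push(`hero: ${JSON.stringify(heroImage)}`);
  lines.push("---", ""); // closing fence plus a trailing newline
  return lines.join("\n");
}
```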

Download all the images

Most images on my old posts are dead. 404, 500, 301, you name the error, it’s in there somewhere.

The only way to avoid that fate in the future is to keep local copies of images that you host yourself. Gatsby supports that really well with MDX – put images next to your words and use relative paths.

But you need to download them first.

I created a processImage method and tried a few different ways to download images one by one. The original crashed my computer: too much parallelism.

Ignore the ancient var syntax, I didn’t want to rewrite everything 😇

We take an image URL, split it into parts, create a new filePath, run the downloadFile method, and replace the original URL in our post with the new relative path.

Updating the images array is important because it collects candidates for our hero image, the one that goes into social thumbnails and on the homepage.
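The URL-to-relative-path part of that flow can be sketched without any networking. This is an illustration under assumptions, not the actual processImage code: localizeImage and safeFileName are hypothetical helpers, each image is assumed to be saved next to the post’s index.mdx, and the download itself is left to downloadFile.

```javascript
const path = require("path");

// Derive a filesystem-safe local file name from an image URL.
function safeFileName(urlString) {
  const raw = path.posix.basename(new URL(urlString).pathname);
  let decoded = raw;
  try {
    decoded = decodeURIComponent(raw); // "my%20photo.jpg" -> "my photo.jpg"
  } catch {
    // malformed escape sequence: keep the raw name
  }
  return decoded.replace(/[^\w.-]+/g, "-"); // spaces etc. become dashes
}

// Rewrite every occurrence of imageUrl in the post and record the new path.
function localizeImage(markdown, imageUrl, articleDir, images) {
  const fileName = safeFileName(imageUrl);
  const filePath = path.join(articleDir, fileName); // where the download goes
  const relativePath = `./${fileName}`; // what the post links to
  images.push(relativePath); // hero image candidates
  return { filePath, markdown: markdown.split(imageUrl).join(relativePath) };
}
```

Two images with the same file name from different upload folders would collide in one article folder, so a real converter might prefix names with a counter.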

I tried a bunch of ways to download files; here’s what worked best in the end.

Check that the URL is an image, use fetch(), make sure the response is an image (some server errors return a success status with a bunch of HTML, that was fun to learn), and write the file.
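A sketch of those checks, assuming Node 18 or newer so the built-in global fetch can stand in for node-fetch. The helper names and the list of image extensions are assumptions, not the post’s actual code.

```javascript
const fs = require("fs/promises");
const path = require("path");

// ASSUMPTION: these extensions cover the images worth keeping.
const IMAGE_EXTENSIONS = /\.(jpe?g|png|gif|webp|svg)$/i;

function looksLikeImageUrl(url) {
  try {
    return IMAGE_EXTENSIONS.test(new URL(url).pathname);
  } catch {
    return false; // not a valid absolute URL
  }
}

function isImageContentType(contentType) {
  return typeof contentType === "string" && contentType.toLowerCase().startsWith("image/");
}

async function downloadFile(url, filePath) {
  if (!looksLikeImageUrl(url)) throw new Error(`Not an image URL: ${url}`);
  const response = await fetch(url); // follows redirects by default
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  // Some servers answer 200 with an HTML error page, so check the type too.
  const contentType = response.headers.get("content-type");
  if (!isImageContentType(contentType)) {
    throw new Error(`Expected an image, got ${contentType} for ${url}`);
  }
  const bytes = Buffer.from(await response.arrayBuffer());
  await fs.mkdir(path.dirname(filePath), { recursive: true });
  await fs.writeFile(filePath, bytes);
  return filePath;
}
```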

You’d think this was the easiest part, but it isn’t. Most libraries you find are meant to work with APIs, not file downloads. Handling all sorts of errors gets tricky, and when you have thousands of files to download you gotta be really careful with your async code.

First version downloaded everything in parallel. Crashed my computer.

Now it goes one-by-one which is slower, but at least it finishes.
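Going one-by-one can be as simple as awaiting inside a for...of loop instead of starting everything at once with Promise.all(urls.map(download)). A minimal sketch that also records failures instead of stopping the whole run, since dead images are the norm here; runSequentially is a hypothetical helper.

```javascript
// Run an async worker over items strictly one at a time.
// A failing item is recorded and skipped, so one dead image can't abort the run.
async function runSequentially(items, worker) {
  const results = [];
  const failures = [];
  for (const item of items) {
    try {
      results.push(await worker(item)); // next item starts only after this resolves
    } catch (error) {
      failures.push({ item, error });
    }
  }
  return { results, failures };
}
```

If strictly one-at-a-time turns out too slow, a small fixed pool of concurrent workers is the usual middle ground.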

The result

An error.

I told you this was hard.

The way I set up my new Gatsby blog, it expects every article to have a hero image. Looks like some articles have either no images or no working images.

That’s next on my list to fix. What do we do with missing images? Do I create a 404 graphic? Do I remove them from the post? What about the hero image? 🤔
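One possible answer, purely an assumption since the question is still open: strip image tags whose downloads failed and fall back to a shared placeholder hero. pickHeroImage, removeBrokenImages, and the placeholder path are hypothetical, and the regex only handles the basic ![alt](url "title") form.

```javascript
// Escape a string for literal use inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Remove Markdown image tags that point at any failed image.
function removeBrokenImages(markdown, failed) {
  let result = markdown;
  for (const url of failed) {
    const pattern = new RegExp(`!\\[[^\\]]*\\]\\(${escapeRegExp(url)}(?:\\s+"[^"]*")?\\)`, "g");
    result = result.replace(pattern, "");
  }
  return result;
}

// First candidate that didn't fail, else the shared placeholder.
function pickHeroImage(candidates, failed, placeholder) {
  const working = candidates.filter((image) => !failed.has(image));
  return working[0] ?? placeholder;
}
```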

Until then, this will continue to be the main comment I get.

Cheers,
~Swizec
