Swizec Teller - a geek with a hat (swizec.com)

    Parsing JavaScript with JavaScript

    Over the weekend I started working on llamaduck, a simple tool that aims to figure out whether your code will run on the newly released node 0.6.0. Eventually it might handle other compatibility checks as well, but I'm focusing on the simple stuff first.

    Or at least I thought it was simple.

    The list of API changes since 0.4.x doesn't seem that long, and it should be easy enough to digest. But as it turns out, I spent almost all of Sunday just figuring out how to turn JavaScript into a beautiful, analyzable AST.

    If you don't know what an AST is: it's a so-called abstract syntax tree, a tree that records the structure of the code rather than its surface syntax. Whitespace, semicolons, and similar details drop out, so the tree should look identical however the source was written. It will still differ between genuinely different languages. A CoffeeScript AST should look the same as JavaScript's, but Python's will differ.
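    To make that concrete, here is a minimal sketch that turns a tiny tree back into source text. The tree is written by hand in the nested-array shape uglify-js prints later in this post. `toSource` is a hypothetical helper made up for this illustration, and it only knows the four node types it needs.

```javascript
// Hand-written tree for the expression require("fs").readFile, using the
// "dot" / "call" / "name" / "string" node shapes from the parser output.
var ast = ["dot", ["call", ["name", "require"], [["string", "fs"]]], "readFile"];

// toSource is a hypothetical helper: it walks a node and rebuilds source text.
function toSource(node) {
  switch (node[0]) {
    case "name":
      // ["name", identifier]
      return node[1];
    case "string":
      // ["string", value]; JSON.stringify adds the quotes back
      return JSON.stringify(node[1]);
    case "dot":
      // ["dot", object expression, property name]
      return toSource(node[1]) + "." + node[2];
    case "call":
      // ["call", callee expression, array of argument nodes]
      return toSource(node[1]) + "(" + node[2].map(toSource).join(", ") + ")";
    default:
      throw new Error("unhandled node type: " + node[0]);
  }
}

console.log(toSource(ast)); // require("fs").readFile
```

    Notice what the tree does not contain: spacing, line breaks, and the kind of quotes used around the string. That is the "abstract" part.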

    My research came up with three options:

    1. Take a parser generator and a JavaScript grammar, hope for the best
    2. JSLint has a parser ... somewhere around line 2000
    3. Uglify-JS supposedly has a parser too

    The only viable option was uglify-js. It's a neatly packaged node.js module that does a bit more than I need, but at least it's got an easy-to-use parser with an exposed API.

    Score!

    Here's an example of a file that outputs its own AST to give you a feel for what I'm talking about:

    var parser = require("uglify-js").parser;
    var util = require("util");
    
    (function get_ast(path, callback) {
      require("fs").readFile(path, "utf-8", function (err, data) {
        if (err) throw err;
    
        callback(parser.parse(data));
      });
    })("./example.js", function (data) {
      console.log(util.inspect(data, true, null));
    });
    

    The file parses itself and outputs a tree encoded as a JavaScript array (scroll past the insanity, there's a bit more text after it):

    [
      "toplevel",
      [
        [
          "var",
          [
            [
              "parser",
              [
                "dot",
                [
                  "call",
                  ["name", "require", ([length]: 2)],
                  [["string", "uglify-js", ([length]: 2)], ([length]: 1)],
                  ([length]: 3),
                ],
                "parser",
                ([length]: 3),
              ],
              ([length]: 2),
            ],
            ([length]: 1),
          ],
          ([length]: 2),
        ],
        [
          "var",
          [
            [
              "util",
              [
                "call",
                ["name", "require", ([length]: 2)],
                [["string", "util", ([length]: 2)], ([length]: 1)],
                ([length]: 3),
              ],
              ([length]: 2),
            ],
            ([length]: 1),
          ],
          ([length]: 2),
        ],
        [
          "stat",
          [
            "call",
            [
              "function",
              "get_ast",
              ["path", "callback", ([length]: 2)],
              [
                [
                  "stat",
                  [
                    "call",
                    [
                      "dot",
                      [
                        "call",
                        ["name", "require", ([length]: 2)],
                        [["string", "fs", ([length]: 2)], ([length]: 1)],
                        ([length]: 3),
                      ],
                      "readFile",
                      ([length]: 3),
                    ],
                    [
                      ["name", "path", ([length]: 2)],
                      ["string", "utf-8", ([length]: 2)],
                      [
                        "function",
                        null,
                        ["err", "data", ([length]: 2)],
                        [
                          [
                            "if",
                            ["name", "err", ([length]: 2)],
                            [
                              "throw",
                              ["name", "err", ([length]: 2)],
                              ([length]: 2),
                            ],
                            undefined,
                            ([length]: 4),
                          ],
                          [
                            "stat",
                            [
                              "call",
                              ["name", "callback", ([length]: 2)],
                              [
                                [
                                  "call",
                                  [
                                    "dot",
                                    ["name", "parser", ([length]: 2)],
                                    "parse",
                                    ([length]: 3),
                                  ],
                                  [["name", "data", ([length]: 2)], ([length]: 1)],
                                  ([length]: 3),
                                ],
                                ([length]: 1),
                              ],
                              ([length]: 3),
                            ],
                            ([length]: 2),
                          ],
                          ([length]: 2),
                        ],
                        ([length]: 4),
                      ],
                      ([length]: 3),
                    ],
                    ([length]: 3),
                  ],
                  ([length]: 2),
                ],
                ([length]: 1),
              ],
              ([length]: 4),
            ],
            [
              ["string", "./example.js", ([length]: 2)],
              [
                "function",
                null,
                ["data", ([length]: 1)],
                [
                  [
                    "stat",
                    [
                      "call",
                      [
                        "dot",
                        ["name", "console", ([length]: 2)],
                        "log",
                        ([length]: 3),
                      ],
                      [
                        [
                          "call",
                          [
                            "dot",
                            ["name", "util", ([length]: 2)],
                            "inspect",
                            ([length]: 3),
                          ],
                          [
                            ["name", "data", ([length]: 2)],
                            ["name", "true", ([length]: 2)],
                            ["name", "null", ([length]: 2)],
                            ([length]: 3),
                          ],
                          ([length]: 3),
                        ],
                        ([length]: 1),
                      ],
                      ([length]: 3),
                    ],
                    ([length]: 2),
                  ],
                  ([length]: 1),
                ],
                ([length]: 4),
              ],
              ([length]: 2),
            ],
            ([length]: 3),
          ],
          ([length]: 2),
        ],
        ([length]: 3),
      ],
      ([length]: 2),
    ];
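    Most of the noise in that dump is the `[length]` markers. They show up because the second argument to `util.inspect` (showHidden) is `true`, which also prints non-enumerable properties such as an array's `length`. A minimal sketch of the difference, using a small hand-written tree as a stand-in for real parser output:

```javascript
var util = require("util");

// Stand-in for a parsed tree: the require("util") call from the output above.
var ast = ["call", ["name", "require"], [["string", "util"]]];

// showHidden = true: non-enumerable properties like `length` are printed too.
var noisy = util.inspect(ast, true, null);

// showHidden = false: only the array elements are printed.
// depth = null still means "recurse all the way down".
var clean = util.inspect(ast, false, null);

console.log(noisy);
console.log(clean);
```

    `JSON.stringify(ast, null, 2)` is another option, with the caveat that `undefined` entries (like the missing else branch of the `if` above) are written out as `null`.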
    

    Conclusion

    Now we have a simple tree we can recursively analyze and look for incompatibilities. But before anything really practical can be done, I need to figure out how to track variable scope. That's really the hard bit, because the tool needs to notice when a variable starts referring to something affected by an API change, and then confirm that it does in fact eventually get used in an affected way.
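    The recursive part on its own is short. Here is a rough sketch, not llamaduck's actual code, that walks the nested arrays and collects the module name from every `require("...")` call. `findRequires` is a hypothetical name, and the input is a hand-copied slice of the tree above rather than live parser output.

```javascript
// A small piece of the tree above: var util = require("util");
var ast = [
  "toplevel",
  [["var", [["util", ["call", ["name", "require"], [["string", "util"]]]]]]],
];

// findRequires is a hypothetical helper, not part of llamaduck or uglify-js.
// It visits every nested array and records the argument of each call that
// looks like require("<string literal>").
function findRequires(node, found) {
  found = found || [];
  if (!Array.isArray(node)) return found; // strings, null, undefined: nothing to walk

  var callee = node[1];
  var args = node[2];
  if (
    node[0] === "call" &&
    Array.isArray(callee) && callee[0] === "name" && callee[1] === "require" &&
    Array.isArray(args) && args.length === 1 &&
    Array.isArray(args[0]) && args[0][0] === "string"
  ) {
    found.push(args[0][1]);
  }

  // Recurse into every element; non-array children return immediately.
  node.forEach(function (child) {
    findRequires(child, found);
  });
  return found;
}

console.log(findRequires(ast)); // [ 'util' ]
```

    A compatibility check could compare the collected names against a list of modules whose API changed. What this sketch cannot do is follow names around: something like `var r = require; r("util")` slips straight past it, which is exactly the scope problem.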

    But once that nut is cracked, llamaduck will be a neat little tool useful for many things.

    If you've got some coding inclination, I'd love a helping hand over at the llamaduck github repo.

    Published on November 11th, 2011 in Abstract syntax tree, Application programming interface, AST, Compilers, JavaScript, Languages, Parsing, Programming, Uncategorized
