How I built this site with awk, make and emacs

2023-07-11 Tue

If you want to make a static site and you're a reasonable person, you'd probably use one of the popular static site generators out there, like Gatsby, Jekyll or Hugo. You might also just write the HTML by hand, or use a document converter like pandoc or org mode's export facilities.

But not all of us are reasonable people. After realizing that generating a static site is mostly about wrangling text and managing dependencies, I could not help but do it myself, and I decided to use awk and make for the task.

Awk and make are not fashionable software - both are almost 50 years old, which is absolutely ancient by software standards. But they're powerful tools that directly address the problem domain: awk is a line-oriented language specialized for text processing, and make is a build automation tool that lets you compose rules like "update X if Y changes". They're also practically universally available - you'll find them installed on virtually any Unix-like system.

So after some time tinkering away at the idea I ended up with this. I call it Mossgreen, and it's a collection of eight scripts written in awk, bash and emacs lisp, tied together with a makefile, totaling just under 500 lines at the time of writing. Some of its capabilities are:

exporting content written in org mode to HTML
super simple templating engine that uses only four directives
syntax highlighted code blocks
atom feed generation

It's not a solution for everyone: doing simple things like adding a meta description to a blog post requires the use of command line tools such as grep and sed, but the example I've provided with the code is pretty comprehensive and should be useful to get a basic blog up and running with minimal effort.

Project Overview: Who does what

The eight scripts that compose Mossgreen are the following:

scripts
├── export.awk
├── syntax-highlight.el
├── export-new.sh
├── fill-template.awk
├── surround-content.awk
├── condense-blog.awk
├── atomize-news-entry.awk
└── html-encode.awk

Let's begin with export.awk: it exports content written in org mode to HTML. I've used a slightly restricted subset of the org mode syntax: export.awk supports all the basics like headings, lists, bold, italic, underlined, strikethrough text, inline code and links, while ignoring some org features such as tables, comments and macros. It also supports the export settings #+TITLE and #+DATE, plus an additional setting named #+META for arbitrary metadata to be appended at the end of the file as a comment block. (Remember that bit about grep and sed? We'll come to it shortly.)
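To give a flavor of why awk suits this kind of line-by-line conversion, here is a minimal sketch that handles only #+TITLE, headings and one-line paragraphs. It is not the real export.awk: the function name org_to_html and the choice to map "*" to <h2> are assumptions made purely for illustration.

```shell
# Hypothetical sketch, NOT the real export.awk: converts a tiny org subset to HTML.
org_to_html() {
  awk '
    # "#+TITLE: text" becomes the page heading.
    /^#\+TITLE: / { sub(/^#\+TITLE: /, ""); print "<h1>" $0 "</h1>"; next }

    # "* text", "** text", ... become <h2>, <h3>, ... (assumed mapping).
    /^\*+ / {
      stars = index($0, " ") - 1        # number of leading asterisks
      text  = substr($0, stars + 2)     # everything after the separating space
      print "<h" (stars + 1) ">" text "</h" (stars + 1) ">"
      next
    }

    # Blank lines are dropped; any other line is treated as a one-line paragraph.
    /^$/ { next }
    { print "<p>" $0 "</p>" }
  '
}
```

It reads org text on standard input and writes HTML to standard output, for example: org_to_html < post.org. Each pattern-action pair handles one kind of line, which is the shape most of a converter like this tends to take.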

export.awk exports the contents of org mode's #+begin and #+end blocks literally, with the exception of #+begin_src and #+end_src, which are used to export highlighted source code with the help of syntax-highlight.el. Let's use syntax-highlight.el to syntax highlight syntax-highlight.el:

#! /usr/local/bin/emacs --script

;; Load htmlize
(require 'package)
(package-initialize)
(require 'htmlize)

(let ((target (pop command-line-args-left)))
  (find-file target)
  (kill-whole-line)          ; Delete the first line, which is used to set mode
  (font-lock-fontify-buffer) ; Force fontification, required in batch mode
  (with-current-buffer (htmlize-buffer)
    (princ (buffer-string))))

This is an emacs lisp script, called from export.awk and executed by emacs in batch mode. The crux of it has been taken from this stackoverflow answer.

When done, export.awk returns the timestamp of the content it has exported through an otherwise unused file descriptor (courtesy of another stackoverflow answer). export.awk is called from export-new.sh, which takes a source and a target directory as arguments and scans the source directory for any files that do not have a timestamp in their name. These files are then exported to the target directory and renamed using the timestamp, so that they won't be exported again later - unless they change, which is handled by the makefile using static pattern rules like the following:

$(NEWSEXPORTED): build/news-exported/%: src/news-raw/%
        @echo "Updating: $@"
        cd scripts; ./export.awk ../$^ 3>/dev/null 1>../$@
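The file descriptor trick mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the actual export.awk or export-new.sh: the function names are made up, the "export" step just copies lines through, and writing to /dev/fd/3 assumes a system that exposes open descriptors there (Linux and macOS do, and gawk handles the name itself).

```shell
# Hypothetical stand-in for export.awk: content goes to stdout,
# the value of "#+DATE: ..." goes to file descriptor 3.
export_sketch() {
  awk '
    /^#\+DATE: / { sub(/^#\+DATE: /, ""); print > "/dev/fd/3"; next }
    { print }
  ' "$1"
}

# Hypothetical caller: capture fd 3 in a variable while stdout goes to a file.
# Inside $(...), "3>&1" points fd 3 at the capture pipe first, then
# "1>target" sends the regular output to the target file.
export_with_stamp() {
  src=$1
  target=$2
  stamp=$(export_sketch "$src" 3>&1 1>"$target")
  printf '%s\n' "$stamp"
}
```

The makefile rule above takes the opposite route with 3>/dev/null: when a file is merely being updated, the timestamp is not needed, so it is discarded while stdout still lands in the target.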

At this point we have directories containing content in org mode, and directories containing exported partial HTML. We have now arrived at the templating part: we want the partial HTML to be inserted into templates so we can have full HTML pages with headers, footers and whatnot. This is handled by two scripts: surround-content.awk and fill-template.awk.

surround-content.awk is for when you want to generate one page per content file, e.g. for blog posts. It takes one content file and one template as input, and combines them through the following two directives:

#+CONTENT: Inserts the content into the template
#+EXEC <command>: Executes command and inserts the output into the template

The EXEC directive uses the contentfile keyword to refer to the content file in the command, which is how you do things like setting document titles and descriptions: you grep for the relevant bits and sed them into shape like this:

<head>
  <meta charset="UTF-8" />
  #+EXEC grep contentfile -e "#+META Description: " | sed 's/^#+META Description: \(.*\)$/<meta name="description" content="\1">/'
  #+EXEC grep contentfile -e "<h1>.*</h1>" | sed 's/^<h1>\(.*\)<\/h1>$/<title>\1<\/title>/'
  <link href="/style.css" rel="stylesheet" />
</head>
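A rough sketch of how the two directives could be processed is shown below. It is not the real surround-content.awk: the function name surround_sketch is made up, and it assumes the content path contains no spaces, backslashes or "&" characters, since the path is substituted into the command as plain text.

```shell
# Hypothetical sketch, NOT the real surround-content.awk.
# Usage: surround_sketch CONTENT_FILE TEMPLATE_FILE
surround_sketch() {
  awk -v content="$1" '
    # "#+CONTENT": copy the content file into the output at this point.
    /^[ \t]*#\+CONTENT/ {
      while ((getline line < content) > 0) print line
      close(content)
      next
    }
    # "#+EXEC <command>": replace the contentfile keyword with the real path,
    # run the command through the shell and insert whatever it prints.
    /^[ \t]*#\+EXEC / {
      cmd = $0
      sub(/^[ \t]*#\+EXEC /, "", cmd)
      gsub(/contentfile/, content, cmd)
      while ((cmd | getline line) > 0) print line
      close(cmd)
      next
    }
    # Every other template line passes through unchanged.
    { print }
  ' "$2"
}
```

Calling surround_sketch post.html template.html prints the finished page to standard output. Since each EXEC line is an ordinary shell pipeline, anything grep and sed can extract from the content file can end up in the template.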

fill-template.awk is for when you want to generate one page per template, e.g. blog index pages that list all blog posts. It understands the following two directives:

#+INSERT <path>: Inserts file(s) into the template
#+SH <command>: Executes command and inserts the output into the template

The INSERT directive does the heavy lifting here; at its simplest it'll just take a file like a header/footer and insert it into the file:

#+INSERT header.html

But it can also do things like "take 10 files, starting from the 20th file, wrap them between <li> tags, process them with a command and insert the output into the file":

#+INSERT posts/* 10 20 LI EXEC: ./condense-blog.awk -v file={}

Which is where condense-blog.awk comes into play - as its name suggests, it takes a blog post and condenses it to a small entry intended for the blog index page. The same method is also used in the generation of atom feeds with atomize-news-entry.awk. The only remaining script, html-encode.awk, is intended as a helper in SH and EXEC directives: the output of commands like git log isn't safe to insert into HTML directly, because it can contain syntactically significant characters like < and >.
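For illustration, an encoder of this kind can be very small. The sketch below is an assumption about the general approach, not the contents of html-encode.awk, and the function name html_encode is made up. It escapes only &, < and >, which covers text inserted between tags but not attribute values.

```shell
# Hypothetical sketch, NOT the real html-encode.awk.
# "&" must be replaced first, otherwise the "&" in "&lt;" and "&gt;" would be
# escaped a second time. In an awk replacement string a bare "&" means "the
# matched text", so a literal ampersand has to be written as "\\&".
html_encode() {
  awk '{
    gsub(/&/, "\\&amp;")
    gsub(/</, "\\&lt;")
    gsub(/>/, "\\&gt;")
    print
  }'
}
```

It works as a filter at the end of a pipeline, e.g. git log --oneline | html_encode.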

Final Thoughts

And that's it! I hope you've found this interesting. I'm satisfied with how small and straightforward the project ended up being - it's basically one step above writing HTML by hand, and gives me the sense of control and predictability that I sought when I set out to do this.

If you're the kind of person who likes minimal solutions or building things yourself, you may find this appealing or even useful. At any rate, feel free to share your thoughts with me, and definitely let me know if you decide to use this!