Using ox-hugo without duplicating content in the repository

The Org mode unicorn logo, by Greg Newman.

My new blogging setup mashes up Org mode with Hugo using the ox-hugo library. This is part of my attempt to bring more of my work product into the single format of Org mode. Hugo has rudimentary support for Org mode through go-org, but it admittedly only tries to cover the 80/20 use case of Org mode. Since I intend to use Org mode for everything, the chances are high that I will eventually rely on a feature that go-org doesn’t support and have to work around it. I didn’t want to do that, so I stuck with the ox-hugo export library.

Every example that I saw of a site that uses ox-hugo ends up duplicating the Org mode documents as Markdown in the repository. This felt noisy and redundant to me, so I wanted to find a way to have Netlify render the Org document to its constituent Hugo pages as part of the deploy process. This article documents the process that I now use and shares the script that I wrote to render the file during a deployment.

Project

The layout of my project looks like the following:

michaeljherold.com/
├── blog.org
├── config.toml
├── Makefile
├── netlify.toml
├── script/
│   └── publish
├── static/
└── themes/
    └── v1/

The blog.org file holds my Org mode blog pages in “one post per Org subtree” mode. This allows me to have the entire site live in a single file, which I think is pretty cool. As time goes on and I write more, perhaps I will split it into multiple files, but for now, I like how this works.

For Hugo, we have config.toml, static/, and themes/. There’s nothing exciting in any of them, since Hugo still renders the site from ordinary Hugo-flavored Markdown.

The netlify.toml file contains my Netlify-specific configuration. This includes some redirects and commands for making production deployments, deploy previews, and branch-based deployments. Again, there’s not a lot that is interesting here.

The Makefile is a simple affair that we will discuss after the publish script.

The meat of this article lives in the script/publish file, which we will talk about next.

Hopefully, other than the complete lack of a content/ folder, none of this is surprising for you if you’re familiar with Hugo. I’m striving to have as little magic as possible as part of my publication routine so that there are fewer things for me to tweak instead of sitting down and writing.

Publish script

The solution to the problem of duplicating content in your Git repository lives in the script/publish file. This is an Emacs Lisp-in-shell script that I run during my Netlify deployment to generate all of the Markdown files for the pages on the site.

I will break the script down section-by-section here and explain what I’m doing in each section.

Run me a shell script, please

#!/usr/bin/env sh
:; set -e # -*- mode: emacs-lisp; lexical-binding: t -*-
:; emacs --no-site-file --script "$0" -- "$@" || __EXITCODE=$?
:; exit ${__EXITCODE:-0}

I borrowed the setup for this section from Henrik Lissner’s fantastic Doom Emacs scripts because I thought it was so ingenious. First, the shebang line tells the operating system to run the file with sh. Then comes the “wow!” moment. To explain it, I need to give a little background first.

Historically, POSIX shells didn’t have the concepts of true and false¹. Instead, they used :, a built-in command that ignores its arguments and always succeeds, to mean something equivalent to true. So we could rewrite the :; on each of these lines as true; and the shell would treat it the same way: a no-op followed by the rest of the line. You might be asking, “why would you do that?” The reason requires a little bit of Emacs Lisp knowledge: the ; character starts a comment that runs to the end of the line, and a symbol starting with : is a keyword. In this case, it’s the keyword : itself, which evaluates to itself, so in Emacs it is effectively a no-op.

Combine all of this together and you get the following effect: each of these lines runs in a POSIX shell like Bash but does not run anything in Emacs. How cool is that?
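To see the shell side of the trick in isolation, here is a small sketch that is not part of the publish script. It assumes only a POSIX sh and mktemp; the polyglot_demo function name and the messages are made up for illustration. It writes a tiny file whose lines start with :; and runs it with plain sh:

```shell
#!/usr/bin/env sh
# Sketch of the ":;" polyglot trick, run with plain sh (no Emacs needed).
# ":" ignores its arguments and returns 0, and ";" ends that no-op command,
# so whatever follows on the same line runs normally in the shell.
polyglot_demo() {
  tmp=$(mktemp)
  cat > "$tmp" <<'EOF'
:; echo "line 1 ran in sh"
:; exit 0
(message "only Emacs would evaluate this form")
EOF
  sh "$tmp"   # prints one line; "exit 0" stops sh before the Lisp form
  status=$?
  rm -f "$tmp"
  return "$status"
}

polyglot_demo
```

Only the first echo prints. The exit 0 line, which mirrors the third line of the real header, stops sh before it ever reaches the Lisp form at the bottom.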

So what do these lines do?

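The first line turns on the shell’s errexit option with set -e: if any command in the script fails, the shell exits immediately with that command’s exit code. The rest of the line is a shell comment that doubles as an Emacs file-variables line, telling Emacs to read the file as Emacs Lisp and to use lexical binding for all its variables. Emacs normally looks for this -*- line on the first line of a file, but when the first line is a shebang, it looks at the second line instead.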
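Here is a minimal sketch of errexit on its own, assuming nothing but a POSIX sh; errexit_demo and the exit code 3 are hypothetical. A shell ignores set -e for commands on the left-hand side of ||, and in common shells such as bash that exemption also covers functions and subshells called from there, so the sketch runs the failing script in a fresh sh process:

```shell
#!/usr/bin/env sh
# Sketch of "set -e": the first failing command ends the shell, and the
# shell's exit status is that command's status.
errexit_demo() {
  # A fresh sh process, so errexit is not suppressed by the caller's "||".
  sh -c '
    set -e
    echo "before the failure"
    sh -c "exit 3"     # stand-in for a failing command
    echo "never printed"
  '
}

errexit_demo || echo "errexit_demo exited with status $?"
```

This prints “before the failure” followed by the status line reporting 3; “never printed” never appears.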

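The second line runs this file (via --script "$0") in Emacs without loading the site-wide site-start.el file (via --no-site-file), passing along any arguments you give to the script (via -- "$@"). Because --script implies batch mode, Emacs also skips your personal init file. If Emacs exits with a non-zero exit code, the || saves that exit code in the __EXITCODE variable². Since the Emacs command sits on the left-hand side of ||, set -e does not abort the script when it fails.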

The third line then exits the script with the exit code saved by the previous line or, if we didn’t set it, a default of zero, meaning success. Because the shell exits here, it never reads the rest of the file, so we’re free to write whatever we want below it! Again, Henrik’s pattern here is impressive.
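The second and third lines work together to forward Emacs’s exit code. Below is a sketch of that pattern with a hypothetical fake_emacs function standing in for the real Emacs invocation, so it runs without Emacs installed; run_like_header is likewise a made-up name:

```shell
#!/usr/bin/env sh
# Sketch of the header's exit-code forwarding. fake_emacs is a hypothetical
# stand-in that simply returns the exit code it is given.
fake_emacs() { return "$1"; }

run_like_header() {
  # Subshell so each call starts with __EXITCODE unset.
  (
    set -e
    fake_emacs "$1" || __EXITCODE=$?   # "||" keeps set -e from aborting here
    exit ${__EXITCODE:-0}              # unset means success, so default to 0
  )
}

for code in 0 2; do
  status=0
  run_like_header "$code" || status=$?
  echo "fake Emacs exited $code, script exited $status"
done
```

Each run prints the fake exit code alongside the forwarded one, and they match: 0 for success and 2 for the simulated failure.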

Okay, with the entirety of the shell script covered, let’s move on to some Emacs Lisp!

Just kidding, actually run some Emacs code

The first section of Emacs Lisp is boring because it just sets up some boilerplate. But it’s necessary, so here we go:

(defvar bootstrap-version)
(defvar straight-base-dir)
(defvar straight-fix-org)
(defvar straight-vc-git-default-clone-depth 1)
(defvar publish--straight-repos-dir)

(setq gc-cons-threshold 83886080 ; 80MiB
      straight-base-dir (expand-file-name "../.." (or load-file-name buffer-file-name))
      straight-fix-org t
      straight-vc-git-default-clone-depth 1
      publish--straight-repos-dir (expand-file-name "straight/repos/" straight-base-dir))

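The first group of forms declares some variables so that the byte compiler doesn’t warn about them. The first four are for setting up the Straight package manager, which I’ll talk about shortly, and the last one is a variable for this script. Because the file uses lexical binding, the bare (defvar bootstrap-version) also matters at runtime: it marks the variable as dynamically scoped, so the let binding in the bootstrap block is visible to the bootstrap.el file that block loads.

The second group sets values for several variables, as follows: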

gc-cons-threshold sets how many bytes Emacs may allocate before it runs the garbage collector again. By default, Emacs uses a very low threshold of 800,000 bytes (on 64-bit builds). This causes many garbage collection cycles, which slows down the script. You can see in the comment that I set it to 80MiB, which is a good deal larger than the default but still conservative enough that I don’t worry that Netlify will run out of memory in the build container.
straight-base-dir sets the directory in which Straight will install itself, relative to the current file. Confusingly, the path ../.. resolves to the project’s root directory: expand-file-name treats the script’s own path (script/publish) as a directory, so the first .. only strips publish and the second strips script/. Straight then adds a straight/ folder to that directory for its files.
straight-fix-org sets a Boolean variable for Straight to run a “fix” on the Org mode package during download. Org mode doesn’t store its org-version variable in the source repository; instead, it is generated when a release package is built. Since Straight pulls from a Git repository, we must set this variable or Org will not build correctly³.
straight-vc-git-default-clone-depth sets the depth to which Straight will clone the Git repository for each package when it downloads it. Since we only use these packages to build our Markdown files, not for daily editing, we can safely limit this to a single commit to speed up the build process on Netlify.
publish--straight-repos-dir contains the directory in which we want to store the repositories for the packages we download. We set this to be straight/repos/ in the root of the project.
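To double-check the numbers and the path, here is a small shell sketch. The arithmetic confirms that 83886080 is exactly 80 MiB. The project_root_of function is a hypothetical shell analogue of the expand-file-name call, not something the publish script runs, and /site/script/publish is an example path:

```shell
#!/usr/bin/env sh
# Worked arithmetic for the gc-cons-threshold value in the publish script.
MIB=$((1024 * 1024))
echo "80 MiB = $((80 * MIB)) bytes"                          # 83886080
echo "about $((80 * MIB / 800000))x the 800000-byte default"  # integer division

# Hypothetical shell analogue of (expand-file-name "../.." "/site/script/publish"):
# Emacs treats the second argument as a directory, so one ".." drops "publish"
# and the other drops "script", leaving the project root.
project_root_of() {
  dirname "$(dirname "$1")"
}
echo "project root: $(project_root_of /site/script/publish)"  # /site
```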

Now that we have all of that set up, we can run the commands to bootstrap and install Straight. I decided to go with Straight here because it’s declarative and flexible. Right now, I only need Org mode and ox-hugo, but Straight lets me quickly and easily add a new package or a monkey-patch to either of the packages that I currently use. Its declarative recipes also help make sure the build does the same thing every time.

(let ((bootstrap-file (expand-file-name "straight/repos/straight.el/bootstrap.el" straight-base-dir))
      (bootstrap-version 5))
  (unless (file-exists-p bootstrap-file)
    (with-current-buffer
        (url-retrieve-synchronously
         "https://raw.githubusercontent.com/raxod502/straight.el/develop/install.el"
         'silent 'inhibit-cookies)
      (goto-char (point-max))
      (eval-print-last-sexp)))
  (load bootstrap-file nil 'nomessage))

This bootstrap block comes almost verbatim from the Straight documentation. It runs Straight’s installer only if the bootstrap file doesn’t already exist in the Straight directory that we configured above, then loads that file without displaying any messages. Please note that this ultimately downloads and executes a script from the Internet. Many people use Straight, so I’m confident doing this for my use case, but think twice if you’re building something that handles sensitive data. I am not, so I am comfortable doing this.

Once we have downloaded and bootstrapped Straight, we can install the packages we need:

(straight-use-package
 '(org-mode :type git
            :host github
            :repo "emacs-straight/org-mode"
            :files ("*.el" "lisp/*.el" "contrib/lisp/*.el")))

(straight-use-package
 '(ox-hugo :type git
           :host github
           :repo "kaushalmodi/ox-hugo"
           :nonrecursive t))

As previously stated, we are only going to use two packages, Org mode and ox-hugo. The :files declaration on Org mode tells Straight to only create symbolic links for those particular files. The :nonrecursive declaration on ox-hugo tells Straight not to pull the submodules in the repository, which helps the download complete faster.

Now that we have our dependencies, we can run the actual script to publish the Org subtrees as Markdown files:

(with-current-buffer (find-file-noselect (expand-file-name "blog.org" "."))
  (org-next-visible-heading 1)

  (when (equal (org-entry-get (point) "CUSTOM_ID") "toc")
    (let ((inhibit-message t))
      (org-cut-subtree)))

  (message "Publishing...")
  (org-hugo-export-wim-to-md t))

This block opens the blog.org file from the current working directory, which is the project root because that is where the build runs. It then removes the table of contents that I keep in the file, because that subtree prevents ox-hugo from rendering the file. To do this, it moves to the first heading in the file, checks whether that heading’s CUSTOM_ID property is toc (for “table of contents”), and, if so, cuts the subtree out of the buffer. The cut only affects the in-memory buffer; the script never saves it, so blog.org on disk stays intact. Then, we print “Publishing...” (in batch mode, message writes to standard error) and call org-hugo-export-wim-to-md with a non-nil argument, which exports every valid post subtree in the file to its own Markdown file.

Et voilà! We now have all of our Markdown files in the content/ directory, just as Hugo expects.
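If you want the build to fail loudly when the export produces nothing, a small check like the following could run after the publish script. This is a hypothetical addition, not part of my setup; the check_content name and the messages are illustrative, and it assumes only find, wc, and tr:

```shell
#!/usr/bin/env sh
# Hypothetical post-export check (not part of the original setup): fail if a
# directory contains no Markdown files after the publish step.
check_content() {
  dir=${1:-content}
  # Count .md files; tr strips the padding that some wc implementations add.
  count=$(find "$dir" -type f -name '*.md' 2>/dev/null | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "no Markdown files found in $dir/" >&2
    return 1
  fi
  echo "found $count Markdown file(s) in $dir/"
}

# Demo against a scratch directory holding one fake post.
demo=$(mktemp -d)
mkdir -p "$demo/content/posts"
touch "$demo/content/posts/hello.md"
check_content "$demo/content"
rm -rf "$demo"
```

Saved as a small script and run at the end of the md target, a non-zero exit from a check like this would make the Netlify build fail instead of deploying a site with no pages.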

Makefile

To coordinate the use of our publish script, I find it useful to create a Makefile so that we can write a simple command in our netlify.toml instead of a complex list of commands. Here is the content of the Makefile we need:

.PHONY: md production preview clean
md:
	./script/publish

production: md
	hugo

preview: md
	hugo -FD

clean:
	rm -rf ./content ./public ./resources

We have four phony targets that handle different pieces of the site. The md target runs our publish script to create our Markdown files. Both production and preview depend on it: production builds only published pages, while preview also builds future-dated and draft pages (hugo’s -F is short for --buildFuture and -D for --buildDrafts). And to help clean it all up, the clean target removes everything we built: the content/, public/, and resources/ directories.
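To see the dependency order without running Emacs or Hugo, you can ask make for a dry run with -n, which prints commands instead of executing them. This sketch assumes GNU make is installed and recreates only the md and preview targets in a scratch directory; preview_dry_run is a made-up name:

```shell
#!/usr/bin/env sh
# Dry-run sketch: rebuild the md and preview targets in a scratch directory
# and ask make to print (not run) what "make preview" would execute.
# printf emits the literal tab characters that make recipes require.
preview_dry_run() {
  tmp=$(mktemp -d)
  printf '.PHONY: md preview\nmd:\n\t./script/publish\n\npreview: md\n\thugo -FD\n' > "$tmp/Makefile"
  (cd "$tmp" && make -n preview)
  status=$?
  rm -rf "$tmp"
  return "$status"
}

preview_dry_run
```

The output lists ./script/publish before hugo -FD, because make builds the md prerequisite first.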

With these targets, we can then fill out the relevant portions of our Netlify configuration.

netlify.toml

For this article, we only need a simple, three-section Netlify configuration. You can see it as follows:

[context.production]
command = "make production"

[context.deploy-preview]
command = "make preview"

[context.branch-deploy]
command = "make preview"

For production deployments, we only want to build published pages, so we use make production to generate the site. For previews, it’s handy to see future and draft pages as well, so we use make preview for those environments.
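Netlify exposes the deploy context to the build through its read-only CONTEXT environment variable (production, deploy-preview, or branch-deploy). As an illustration of the mapping in this configuration, here is a hypothetical shell function that picks the same make target from that value; it is a sketch of an alternative, not how my configuration works:

```shell
#!/usr/bin/env sh
# Hypothetical helper: map a Netlify deploy context to the make target that
# the three [context.*] blocks in netlify.toml would run.
make_target_for() {
  case "$1" in
    production) echo production ;;
    *) echo preview ;;  # deploy-preview, branch-deploy, and anything else
  esac
}

echo "make $(make_target_for "${CONTEXT:-deploy-preview}")"
```

The three explicit [context.*] blocks avoid needing any such logic in the build command, which keeps the configuration easy to read.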

And that’s it! That’s all we need.

Conclusion

Because I am still a novice-to-journeyman Emacs Lisp programmer, it took me a while to figure out how to do the programmatic render of my Org mode file into Markdown. But, once I figured it out, the solution ended up being very simple and, I believe, future-proof.

This allows me to keep the size of my site repository down, which pays dividends going forward. I still have to worry about the size of assets in the repository, so that might be something I will look into in the future. But at least I don’t have two copies of the page contents that could only lead to confusion. Even better is the fact that it only uses software that is readily available in Netlify’s build images.

Did I make any mistakes? What do you think of Org mode? Have you ever written a website in it?

At least according to this Stack Overflow answer. I haven’t been able to track down a primary source for this quirk. ↩
As I write this, I think I found an unnecessary feature here. Given that I just explained that set -e will cause the script to exit with the failing command’s exit code, I think this is not needed and that the next line can just be exit 0, but I will have to try it out to double-check. ↩
There’s nothing like writing an article for you to catch errors in your scripts. I looked this variable up and the Straight maintainer marked it as obsolete on 2020-10-23. I don’t think it’s necessary any more, but again, I have to double-check. ↩