This article will explain how the blog is organized at a technical level, and show how I implemented various IndieWeb features.
Motivation
Earlier this year I migrated this blog off WordPress to the Hugo static site generator. Over the last six months I’ve been gradually sculpting and shaping this digital place to feel more like my cozy corner of the web. This was partly motivated by withering social media, partly inspired by the IndieWeb POSSE philosophy (Publish on your Own Site, Syndicate Elsewhere), and partly driven by my desire to do more with the space, like embed JavaScript demos and have footnotes and endnotes.
So one of the big new features is that I have a “shortform” section on the blog, which corresponds roughly to threads I would post on Twitter or Mastodon. Syndication automatically converts those to full threads on all platforms, with a link back to the website.
The main constraint I had for myself when doing all this was to avoid managing a continuously running server, and certainly not deal with self-hosting. As the rest of the article shows, this means I rely heavily on GitHub Actions to run automation, as well as Netlify to trigger deployment webhooks and various third-party services for webmentions.
Depending so heavily on GitHub Actions mildly triggers my reliability brain, but because the custom features are implemented as standard Python scripts and the “databases” are flat files stored in the repo, migrating off GH Actions should be relatively straightforward if something goes awry.
As a standard disclaimer, the scripts in this article are very hacky and specific to my setup. I will likely never take the time to make them into something general-purpose. It’s my cozy corner and it can be a little messy.
Structure and Deployment
The blog is a relatively standard Hugo site using the Paperesque theme with various customizations. The blog lives in a private GitHub repository (with the theme fully vendored within), and deploys to Netlify on every push.
Here is my hugo.toml:
baseURL = 'https://www.jeremykun.com/'
languageCode = 'en-us'
title = 'Math ∩ Programming'
theme = 'paperesque'
# This parameter is branched on in templates to determine if MathJAX should be
# served when the page is loaded.
[params]
math = true
[[params.topmenu]]
name = "Main Content"
url = "main-content/"
[[params.topmenu]]
name = "Primers"
url = "primers/"
[[params.topmenu]]
name = "All articles"
url = "posts/"
[[params.topmenu]]
name = "About"
url = "about/"
[[params.topmenu]]
name = "rss"
url = "rss/"
[markup.goldmark.extensions.passthrough]
enable = true
[markup.goldmark.extensions.passthrough.delimiters]
inline = [['$', '$'], ['\(', '\)']]
block = [['$$', '$$'], ['\[', '\]']]
Note the passthrough extension, which allows Hugo to work well with standard TeX delimiters; I added that feature to Hugo in v0.122.0. See the docs here, and note there is still one bug I haven’t figured out how to fix.
And my netlify.toml:
[build]
# npx -y pagefind ... runs pagefind to create a static search index
command = "hugo --gc --minify && npx -y pagefind --site public"
publish = "public"
# exclude flat files in scripts/ that contain database mappings for published
# social media posts
ignore = "git diff --quiet $CACHED_COMMIT_REF $COMMIT_REF -- . ':(exclude)scripts/*.txt'"
[build.environment]
HUGO_VERSION = "v0.122.0"
TZ = "America/Los_Angeles"
[[redirects]]
from = "/feed/"
to = "/index.xml"
status = 301
[context.deploy-preview]
command = "hugo --gc --minify --buildFuture -b $DEPLOY_PRIME_URL && npx -y pagefind --site public"
Static search index
The build command above has an extra step that builds a static search index using Pagefind. When it runs, it generates the index files under public/pagefind/ so that the following snippet just works.
<div style="margin: 20px 0px 20px 0px;">
<link href="/pagefind/pagefind-ui.css" rel="stylesheet">
<script src="/pagefind/pagefind-ui.js"></script>
<div id="search"></div>
<script>
window.addEventListener('DOMContentLoaded', (event) => {
new PagefindUI({ element: "#search", showSubResults: true });
});
</script>
</div>
I slapped it in themes/paperesque/layouts/partials/homepage_display_section.html
and it creates the search bar and dynamic experience you see on the blog homepage.
I have not noticed it producing significant additional bandwidth usage,
especially not compared to serving images.
Running scripts via GitHub Actions
Because the site is static, all of the automation on the blog is done through GitHub Actions, modifying static files in the repository and triggering a new deployment if any files change.
To accomplish this,
I use the workflow_dispatch
trigger in GitHub Actions,
which allows one to remotely trigger a workflow via an HTTP POST request,
optionally with arguments passed in the request body.
For example, my social media syndication script starts like this:
# .github/workflows/syndicate.yml
name: Syndicate to social media

permissions:
  contents: write

on:
  workflow_dispatch:
  schedule:
    # https://crontab.guru/once-a-day
    - cron: '0 0 * * *'
To trigger this action remotely, I can POST to
https://api.github.com/repos/j2kun/math-intersect-programming/actions/workflows/syndicate.yml/dispatches
with the following headers
{
  "Accept": "application/vnd.github+json",
  "Authorization": "Bearer <API_KEY>",
  "X-GitHub-Api-Version": "2022-11-28",
  "Content-Type": "application/x-www-form-urlencoded"
}
where <API_KEY>
is a GitHub personal access token with permissions to trigger
workflows. (I have it set to “Read access to actions variables, metadata, and
secrets” and “Read and Write access to actions”)
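Putting that together, a minimal Python sketch of the trigger request might look like the following. This is a hypothetical standalone script, not something in the repo, and it assumes the token lives in an environment variable I’m calling GH_PAT.

import os
import requests

# Hypothetical sketch of triggering the syndicate.yml workflow remotely.
WORKFLOW_DISPATCH_URL = (
    "https://api.github.com/repos/j2kun/math-intersect-programming"
    "/actions/workflows/syndicate.yml/dispatches"
)


def trigger_syndication():
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GH_PAT']}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    # The ref is required; workflow inputs (if any) would go in an "inputs" dict.
    response = requests.post(
        WORKFLOW_DISPATCH_URL, headers=headers, json={"ref": "main"}
    )
    # GitHub returns 204 No Content on success.
    response.raise_for_status()


if __name__ == "__main__":
    trigger_syndication()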
I will show how this is set up for the various instances in which it’s invoked
in the later sections.
As a bonus, workflow_dispatch
gives a nice button in the GitHub Actions UI
to trigger the workflow, which is helpful for debugging.
Once I can trigger a workflow remotely,
the rest of the workflow involves
checking out the repo, installing the script’s dependencies,
running a script that does whatever the action is supposed to do,
and committing and pushing any changes to main.
Running the script is simple, but this was the first time
I tried actually pushing commits from a GH Actions workflow.
To do this, I set up a GITHUB_TOKEN
secret that has write permissions,
and then each workflow has a set of steps that looks like this:
- name: Run my_script.py
  run: |
    python -m scripts.my_script
  env:
    SOME_SECRET: ${{ secrets.SOME_SECRET }}
- name: Commit changes
  run: |
    git config --local user.name ${{ github.actor }}
    git config --local user.email "${{ github.actor }}@users.noreply.github.com"
    test -z "$(git status --porcelain)" || git add <FILE_THAT_COULD_BE_CHANGED>
    test -z "$(git status --porcelain)" || git commit -m "<SOME MESSAGE>"
- name: Push changes
  uses: ad-m/github-push-action@master
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    branch: main
The git status --porcelain
command returns an empty string if there are no changes to commit,
and test -z
returns true if the argument string is empty,
so this skips the git add
and git commit
commands if there is nothing to commit.
This boilerplate is repeated often enough in the next few sections that I will omit it and just show the script that is being run.
Social media syndication and the “shortform” section
The big one is social media syndication. Here I do a few things:
- For long articles, post a single social media post with the title of the article and a link to it.
- For a new special “shortform” section of the blog, split the article into posts and publish them on each social media platform as a thread.
- For each syndicated post, save the URL of the syndicated post in a text file (“database”), so I can know it’s been processed in the future.
- For each syndicated post, add a link to the end of the post to the syndicated post, so readers can “discover” that I syndicate stuff and see any associated discussion.
Each social media platform I want to syndicate to gets its own script,
but they all have a common structure.
Each has a flat file stored in the repository
where each line of the file contains a pair of the blog post URL
and the syndicated post URL.
E.g., published_mastodon.txt
has as its first line
https://www.jeremykun.com/shortform/2024-05-06-1018/ https://mathstodon.xyz/@j2kun/112426452236288901
Then each syndication script loads the markdown files for all relevant posts from the repo, determines their canonical URL, looks them up in the database to see if they’ve been syndicated, and syndicates each one that hasn’t. Finally, it writes the database with any new entries back to disk.
I have a set of helpers in a “utils” module
that I use to parse markdown and convert articles to threads, among other things.
I copied the current utils.py
into a GitHub gist for full reference,
and will highlight some of the more relevant parts shortly.
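For reference, the flat-file “database” helpers are about as simple as you’d expect. Here is a rough sketch of what load_database and dump_database could look like; the real versions live in the gist.

from pathlib import Path


def load_database(path: Path) -> dict[str, str]:
    # Each line is "<blog post URL> <syndicated post URL>".
    mapping = {}
    if path.exists():
        for line in path.read_text().splitlines():
            if line.strip():
                blog_url, syndicated_url = line.split()
                mapping[blog_url] = syndicated_url
    return mapping


def dump_database(mapping: dict[str, str], path: Path) -> None:
    lines = [f"{blog_url} {syndicated_url}" for blog_url, syndicated_url in mapping.items()]
    path.write_text("\n".join(lines) + "\n")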
Then there is a common syndication.py
that handles the common logic
across platforms:
import pprint

from scripts import utils as utils


def publish_single_post(
    publish_callback, abspath: str, published_posts, dry_run=False, **kwargs
):
    """
    publish_callback takes as input the post and any additional args passed in
    kwargs, and returns the URI to the post that should be stored in the
    database.
    """
    print(f"Processing {abspath} as single post")
    blog_post_permalink = utils.canonical_url(abspath, post_type="")
    with open(abspath, "r") as infile:
        post = utils.title_and_link_as_post(infile.read(), blog_post_permalink)

    # a debug print of the posts about to be posted
    print(f"Printing post for {abspath}:\n----------------------")
    print(f"\n{post}")
    print("\n----------------------\n")

    if dry_run:
        print("Dry run enabled, skipping post creation")
        return

    print(f"Publishing post for {abspath}")
    output_uri = publish_callback(post, **kwargs)
    print(f"Successfully posted post {output_uri}")
    published_posts[blog_post_permalink] = output_uri


def publish_thread(
    publish_callback,
    post_adjuster,
    abspath: str,
    published_posts,
    dry_run=False,
    **kwargs,
):
    """
    publish_callback takes as input the list of posts and any additional args
    passed in kwargs, and returns the URI to the root post that should be
    stored in the database.

    post_adjuster takes as input the list of posts and returns the list of
    posts that should be published, adjusting the length and/or splitting them
    as needed for the platform.
    """
    print(f"Processing {abspath} as thread")
    blog_post_permalink = utils.canonical_url(abspath)
    convert_math = kwargs.pop("convert_math", True)
    with open(abspath, "r") as infile:
        posts = utils.convert_post_to_thread(
            infile.read(),
            blog_post_permalink,
            convert_math=convert_math,
        )

    posts = post_adjuster(posts, blog_post_permalink=blog_post_permalink, **kwargs)

    print(f"Printing post thread for {abspath}:\n----------------------")
    for i, post in enumerate(posts):
        print(f"\n{i}.\t{post}")
    print("\n----------------------\n")

    if dry_run:
        print("Dry run enabled, skipping post creation")
        return

    print(f"Publishing post thread for {abspath}")
    root_uri = publish_callback(posts, **kwargs)
    published_posts[blog_post_permalink] = root_uri


def syndicate_to_service(
    name: str,
    database_filepath,
    thread_publisher,
    thread_adjuster,
    post_publisher,
    since_days=1,
    dry_run=False,
    **kwargs,
):
    print("Syndicating to", name)
    # dict mapping Blog URL to first post url in published thread.
    git_root = utils.get_git_root()
    database_path = git_root / database_filepath
    published_posts = utils.load_database(database_path)
    print(f"Existing {name} posts from {database_path}:")
    pprint.pp(published_posts)

    posts_to_try = utils.get_blog_posts()
    posts_to_publish = utils.get_posts_without_mapping(
        posts_to_try, published_posts, since_days=since_days
    )

    try:
        for abspath in posts_to_publish["shortform"]:
            publish_thread(
                thread_publisher,
                thread_adjuster,
                abspath,
                published_posts,
                dry_run=dry_run,
                **kwargs,
            )
        for abspath in posts_to_publish["posts"]:
            publish_single_post(
                post_publisher,
                abspath,
                published_posts,
                dry_run=dry_run,
                **kwargs,
            )
    finally:
        print("Writing successful post URLs to disk")
        utils.dump_database(published_posts, database_path)
The main part is syndicate_to_service,
which loads the database file, extracts the posts,
finds the ones that haven’t been syndicated yet,
and then decides whether to publish a thread or a single post.
The publish_thread
function is the tricky part.
It takes two callbacks,
publish_callback and post_adjuster,
which respectively implement the platform-specific ways
to publish all the posts in a thread
and to “adjust” a thread to fit the platform’s constraints.
By “adjust” I mean that platforms differ
in the allowed length of a post,
as well as in how they count the character contributions
of links and such.
The adjuster is also responsible for adding the link back to the original post.
The slightly devious part is that the post_adjuster
and publish_callback
receive **kwargs
passed through from syndicate_to_service,
which lets the original caller propagate platform-specific options
(mainly the client for the platform’s API).
The Mastodon client is the simplest:
import os

import fire
from mastodon import Mastodon

from scripts import syndication as syndication

# A simple text file with two urls per line
DATABASE_FILE = "scripts/published_mastodon.txt"


def mastodon_post_publisher(post: str, mastodon_client=None, **kwargs):
    if not mastodon_client:
        raise ValueError("mastodon_client must be provided")
    status_dict = mastodon_client.status_post(post)
    return status_dict['url']


def mastodon_thread_adjuster(posts, blog_post_permalink=None, **kwargs):
    if not blog_post_permalink:
        raise ValueError("blog_post_permalink must be provided")
    posts[0] += f"\n\nArchived at: {blog_post_permalink}"
    return posts


def mastodon_thread_publisher(posts, mastodon_client=None, **kwargs):
    if not mastodon_client:
        raise ValueError("mastodon_client must be provided")
    toots_for_post = []
    for i, toot in enumerate(posts):
        reply_id = toots_for_post[-1]["id"] if len(toots_for_post) > 0 else None
        status_dict = mastodon_client.status_post(toot, in_reply_to_id=reply_id)
        print(
            f"Successfully posted toot {i} of the thread: "
            f"{status_dict['id']} -> {status_dict['url']}"
        )
        toots_for_post.append(status_dict)
    return toots_for_post[0]["url"]


def publish_to_mastodon(since_days=1, dry_run=False):
    """Idempotently publish shortform and regular posts to mastodon."""
    # File generated by scripts/login_with_mastodon.py or else set in
    # environment for headless usage in GH actions.
    mastodon_client = Mastodon(
        api_base_url="https://mathstodon.xyz",
        access_token=os.getenv(
            "MASTODON_TOKEN", "scripts/jeremykun_tootbot_usercred.secret"
        ),
    )
    syndication.syndicate_to_service(
        "mastodon",
        database_filepath=DATABASE_FILE,
        thread_publisher=mastodon_thread_publisher,
        thread_adjuster=mastodon_thread_adjuster,
        post_publisher=mastodon_post_publisher,
        since_days=since_days,
        dry_run=dry_run,
        mastodon_client=mastodon_client,
    )


if __name__ == "__main__":
    fire.Fire(publish_to_mastodon)
Twitter is similar,
but requires more “thread adjusting” because of its shorter character limits,
and I have a split_post
function that handles that.
It tries to split posts at sentence boundaries close to the character limit,
then at comma boundaries.
import re
from collections import deque
from itertools import zip_longest


def split_post(post, max_char_len=300):
    if len(post) < max_char_len:
        return [post]

    # weird because re.split keeps the separators as list items
    # re_joined rejoins them together
    re_split = [p.strip() for p in re.split(r"(\. |, )", post)]
    re_joined = [
        i + j for i, j in zip_longest(re_split[::2], re_split[1::2], fillvalue="")
    ]
    subposts = deque(re_joined)

    for subpost in subposts:
        if len(subpost) > max_char_len:
            raise ValueError(f"Sentence is too long: {subpost}")

    accumulated_subposts = []
    while subposts:
        next_subpost = subposts.popleft()
        if not accumulated_subposts:
            accumulated_subposts.append(next_subpost)
            continue
        merged = accumulated_subposts[-1] + " " + next_subpost
        if len(merged) > max_char_len:
            accumulated_subposts.append(next_subpost)
        else:
            accumulated_subposts[-1] = merged
    return accumulated_subposts
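For a sense of its behavior, here is a quick hypothetical usage (the text and limit are made up for illustration):

# Hypothetical example: a post that exceeds an 80-character limit gets split at
# the sentence boundary.
post = (
    "Here is a long first sentence that rambles on well past the limit. "
    "And here is the second sentence."
)
print(split_post(post, max_char_len=80))
# ['Here is a long first sentence that rambles on well past the limit.',
#  'And here is the second sentence.']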
And Bluesky is the weirdest,
because the API is more complicated (create_strong_ref??),
mainly because you have to provide things like links
in terms of something called “facets,”
which have a lot of extra structure.
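My Bluesky script isn’t shown here, but to give a flavor of what facets involve, here is a rough sketch of building the link facet for a URL by hand. The byte-offset structure is what the AT Protocol expects for rich text; the helper name is mine.

def link_facet(post_text: str, url: str) -> dict:
    # Facets identify a byte range of the post text and attach a feature to it,
    # here a link. Offsets are byte offsets into the UTF-8 encoding, not
    # character offsets.
    start = post_text.encode("utf-8").find(url.encode("utf-8"))
    if start == -1:
        raise ValueError(f"URL {url} not found in post text")
    end = start + len(url.encode("utf-8"))
    return {
        "index": {"byteStart": start, "byteEnd": end},
        "features": [{"$type": "app.bsky.richtext.facet#link", "uri": url}],
    }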
Links to syndicated versions at the end of each post
The GH Actions workflow runs the three scripts sequentially,
makes three separate commits for each database update,
and then at the end runs a final script add_links_on_posts
that adds the syndication links to the end of each article.
The script itself is quite simple, except for Bluesky:
from scripts import utils as utils

SYNDICATION_FILES = {
    "mastodon": "scripts/published_mastodon.txt",
    "twitter": "scripts/published_twitter.txt",
    "bluesky": "scripts/published_bluesky.txt",
}


def add_links_on_posts():
    git_root = utils.get_git_root()
    shortform_path = git_root / "content" / "shortform"

    for service, file in SYNDICATION_FILES.items():
        print(f"Adding {service} links to shortform posts.")
        database_path = git_root / file
        syndicated_posts = utils.load_database(database_path)

        for blog_url, syndicated_url in syndicated_posts.items():
            # bluesky is weird, have to transform
            #
            #   at://did:plc:6st2p3o4niwz5olbxkuimxlk/app.bsky.feed.post/3ksggi2tfnk2t
            #
            # to
            #
            #   https://bsky.app/profile/jeremykun.com/post/3ksggi2tfnk2t
            #
            if service == "bluesky":
                key = syndicated_url.strip("/").split("/")[-1]
                syndicated_url = f"https://bsky.app/profile/jeremykun.com/post/{key}"

            blog_filename = blog_url.strip("/").split("/")[-1] + ".md"
            post_path = shortform_path / blog_filename
            with open(post_path, "r") as infile:
                post_lines = infile.readlines()

            output = utils.add_link(post_lines, f"{service}_url", syndicated_url)

            with open(post_path, "w") as outfile:
                outfile.write(output)


if __name__ == "__main__":
    add_links_on_posts()
The add_link
function (at the end of utils.py)
hides how the actual link is displayed.
What I do is put the link in the YAML frontmatter of the markdown file,
and then add some Hugo templating to display it if it’s present.
For example, the script would change the frontmatter to look like this:
# content/shortform/2024-05-06-1018.md
---
title: "Remez and function approximations"
date: 2024-05-06T10:18:29-07:00
shortform: true
...
mastodon_url: https://mathstodon.xyz/@j2kun/112426452236288901
twitter_url: https://x.com/jeremyjkun/status/1798788845896339460
---
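The add_link helper itself just splices a new key into the frontmatter. Here is a rough sketch of the idea; the real version is in the gist and handles more edge cases.

def add_link(post_lines: list[str], key: str, url: str) -> str:
    # Find the closing "---" of the YAML frontmatter and insert "key: url"
    # just before it, skipping posts that already have the key.
    if any(line.startswith(f"{key}:") for line in post_lines):
        return "".join(post_lines)
    closing_index = post_lines.index("---\n", 1)
    new_lines = (
        post_lines[:closing_index] + [f"{key}: {url}\n"] + post_lines[closing_index:]
    )
    return "".join(new_lines)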
And then in my themes/paperesque/layouts/partials/single-article.html
template I have the following code to display the links:
{{ if .Params.shortform }}
<p>This article is syndicated on:</p>
<ul>
{{ if .Params.mastodon_url }}
<li><a href="{{ .Params.mastodon_url }}" rel="syndication">Mastodon</a></li>
{{ end }}
{{ if .Params.bluesky_url }}
<li><a href="{{ .Params.bluesky_url }}" rel="syndication">Bluesky</a></li>
{{ end }}
{{ if .Params.twitter_url }}
<li><a href="{{ .Params.twitter_url }}" rel="syndication">Twitter</a></li>
{{ end }}
</ul>
{{ end }}
Warning for a too-long first paragraph
The first paragraph of a shortform post becomes the first post in a social media thread. I want to make sure that first post is not awkwardly cut into pieces, so one thing I do is have a rendering error baked into Hugo that says, “shortform articles can’t have a first paragraph that’s too long.”
This gets put into single-article.html before {{ .Content }}:
{{ if .Params.shortform }}
{{ $maxLength := 299 }}
{{ $collapsed := (strings.Replace .Content "\n" " ")}}
{{ $firstParagraph := index (index (strings.FindRESubmatch `<p.*?>(.*?)</p>` $collapsed 1) 0) 1}}
{{ if gt (len $firstParagraph) $maxLength }}
{{ errorf "The length of the first paragraph is %d, exceeds %d characters.\n\nFirst paragraph is:\n\n%s\n\n" (len $firstParagraph) $maxLength $firstParagraph }}
{{ end }}
{{ end }}
Triggering this workflow automatically after deployment
Netlify supports webhooks after a build is completed,
and you can cnofigure this by adding a netlify/functions/deploy-succeeded.mjs
to the repo. This file contains some javascript to fire a POST request
with some secrets that you have to give to Netlify.
import fetch from "node-fetch";

const syndicate_url = 'https://api.github.com/repos/j2kun/math-intersect-programming/actions/workflows/syndicate.yml/dispatches';

export default async (req, context) => {
  const apiKey = Netlify.env.get("GITHUB_TRIGGER_ACTION_PAT");
  if (apiKey == null) {
    return new Response("Need env GITHUB_TRIGGER_ACTION_PAT", { status: 401 });
  }

  const response = await fetch(syndicate_url, {
    method: 'POST',
    headers: {
      'Accept': 'application/vnd.github+json',
      'Authorization': 'Bearer ' + apiKey,
      'X-GitHub-Api-Version': '2022-11-28',
      'Content-Type': 'application/x-www-form-urlencoded'
    },
    body: '{"ref":"main"}'
  });

  if (response.ok) {
    return new Response("Successfully triggered action", { status: 200 });
  } else {
    return new Response("Failed to trigger action", { status: 500 });
  }
};
So with this, pushing a new article to main kicks off the Netlify build, which publishes the new post and then calls the syndication workflow. The workflow syndicates the post and commits new content (the syndication links), and when that commit lands, Netlify deploys again.
Blogroll (the “What I’m Reading” page)
The next most complicated thing I implemented is a twist on the idea of a blogroll. I don’t like blogrolls because they typically get stale. Most blogrolls point to dead blogs. Instead, I wanted to make a blogroll that points to specific articles I enjoyed, with some sense of recency so that people who visit my homepage get a sense of where my head is at these days. I also wanted to have an easy way to add to the list without having to fire up a text editor.
So here’s how I did that, which culminated in my “What I’m Reading” page and homepage sidebar which shows the latest three entries I added to the blogroll.
Internally, it’s driven by a flat file called blogroll.txt
that contains entries as alternating lines of URL and title:
https://vickiboykis.com/2024/09/19/dead-internet-souls/
Dead Internet Souls
https://blog.nelhage.com/post/fuzzy-dedup/
Finding near-duplicates with Jaccard similarity and MinHash
https://mathenchant.wordpress.com/2023/09/18/when-five-isnt-prime/
When Five Isn’t Prime
...
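Parsing that format is just pairing up consecutive lines. Roughly, as a sketch (not the actual script):

from pathlib import Path


def load_blogroll(path: str) -> list[tuple[str, str]]:
    # Pair each URL line with the title line that follows it.
    lines = [
        line.strip() for line in Path(path).read_text().splitlines() if line.strip()
    ]
    return list(zip(lines[::2], lines[1::2]))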
Then there is a script add_url_to_blogroll.py
that takes a URL as a CLI input,
updates the blogroll.txt
file,
and then updates the two template pages that have the blogroll data in them.
The hard part here is actually determining from the URL what the page title is.
The <title>
tag often has extra stuff in it (like the title of the blog).
The <meta property="og:title">
tag is often more reliable, but not always present.
Otherwise I will typically dig through <h1>
tags and use hints about the css class names
or parent tags
to figure out if it’s the actual title, but it breaks in stupid ways.
For one, IACR preprint pages
will detect that you’re not using JavaScript, and then
insert an <h1>
that says “What a lovely hat”.
Thankfully IACR also uses og:title
, but still, don’t use h1
for this.
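The heuristic boils down to something like the following sketch, using requests and BeautifulSoup; the real script has more special cases and fallbacks than this.

import requests
from bs4 import BeautifulSoup


def guess_title(url: str) -> str:
    # Prefer og:title, fall back to <title>, then to the first <h1>.
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

    og_title = soup.find("meta", property="og:title")
    if og_title and og_title.get("content"):
        return og_title["content"].strip()

    if soup.title and soup.title.string:
        return soup.title.string.strip()

    h1 = soup.find("h1")
    if h1:
        return h1.get_text().strip()

    raise ValueError(f"Could not guess a title for {url}")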
Once the database is updated, the script
updates content/blogroll/_index.md
with a simple blogroll
shortcode call for each entry (taking url, title, and domain parameters),
which renders as, e.g., “Data scientists work alone and that's bad (www.ethanrosenthal.com)”
via this shortcode template:
# layouts/shortcodes/blogroll.html
<a class="u-like-of" href="{{ .Get "url" }}" target="_blank">{{ .Get "title" }}</a> ({{ .Get "domain" }})
The script also updates more manual HTML in layouts/partials/sidebar.html
so that it looks like this:
<div class="sidebarblogroll">
<a href="/blogroll" class="sidebar-large">What I'm Reading</a>
<ul>
<li><a href="https://www.ethanrosenthal.com/2023/01/10/data-scientists-alone/">Data scientists work alone and that's bad</a> (www.ethanrosenthal.com)</li>
<li><a href="https://billwear.github.io/art-of-attention.html">the quiet art of attention</a> (billwear.github.io)</li>
<li><a href="https://blog.jreyesr.com/posts/typst/">Exploring Typst, a new typesetting system similar to LaTeX</a> (blog.jreyesr.com)</li>
</ul>
</div>
This is all done with very manual string processing kludgery as you can see for yourself in the linked GitHub gist.
Chrome extension
To make adding articles to my blogroll easy, I made a Chrome Extension that, when clicking the extension icon, triggers the GH Action workflow to add a URL, using the URL of the current page as the parameter.
This works for any workflow, and I have it loaded into my browser
and configured to trigger my add_url_to_blogroll
workflow and script.
My workflow looks like this—just like all the others,
run the script and commit—the only difference is the workflow_dispatch
inputs
and how that’s referenced as a CLI arg.
name: Add URL to blogroll

permissions:
  contents: write

on:
  workflow_dispatch:
    inputs:
      url:
        description: 'URL to add to blogroll'
        required: true

jobs:
  add_url:
    runs-on: ubuntu-latest
    steps:
      - name: Echo URL
        run: echo "${{ github.event.inputs.url }}"
      - uses: actions/checkout@v4
      - name: Set up Python 3.11
        uses: actions/setup-python@v5
        with:
          python-version: 3.11
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      # Run script
      - name: Run add_url_to_blogroll.py
        run: |
          python -m scripts.add_url_to_blogroll --url="${{ github.event.inputs.url }}"
      - name: Commit changes to blogroll pages
        run: |
          git config --local user.name ${{ github.actor }}
          git config --local user.email "${{ github.actor }}@users.noreply.github.com"
          test -z "$(git status --porcelain)" || git add scripts/blogroll.txt content/blogroll/_index.md layouts/partials/sidebar.html
          test -z "$(git status --porcelain)" || git commit -m "blogroll: add ${{ github.event.inputs.url }}"
      - name: Push changes
        uses: ad-m/github-push-action@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          branch: ${{ github.ref }}
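The only difference from the earlier dispatch request is the inputs field in the request body. In Python terms, the extension’s request looks roughly like this sketch (the extension itself is JavaScript, and I’m assuming here that the workflow file is named add_url_to_blogroll.yml and the token is in a GH_PAT environment variable):

import os
import requests

BLOGROLL_DISPATCH_URL = (
    "https://api.github.com/repos/j2kun/math-intersect-programming"
    "/actions/workflows/add_url_to_blogroll.yml/dispatches"
)


def add_to_blogroll(url: str):
    # The url input surfaces in the workflow as github.event.inputs.url.
    response = requests.post(
        BLOGROLL_DISPATCH_URL,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {os.environ['GH_PAT']}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
        json={"ref": "main", "inputs": {"url": url}},
    )
    response.raise_for_status()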
Webmentions and referencing external discussion threads
Next we have webmentions. I start by adding https://webmention.io support
via the layouts/partials/webmentions.html
containing
<link rel="webmention" href="https://webmention.io/www.jeremykun.com/webmention" />
And added {{ partial "webmentions" . }}
to the header section of
themes/paperesque/layouts/_default/baseof.html
.
This allows people who want to post webmentions to find the right endpoint.
Finally, I added a webmention.js
to static/
which contains a minified version of my fork of the webmention.js
project.
This queries webmention.io
for my webmentions at page load time.
I forked it basically because I didn’t like how the webmentions were displayed by default,
so I hacked it up to make everything look like the “comment” style,
and put special cases
for Hacker News and Reddit.
I run the minifier from the README, copy the output to my static directory, and it was easy to tweak to my liking.
Bridgy
Bridgy is a nice service that lets you find folks linking to your site on places like Reddit and Twitter, and it sends you webmentions.
It’s easy to set up but it has one fatal flaw: it detects each and every post in my syndicated threads as webmentions. The dev who maintains bridgy basically said, “yeah that sucks”. But now, a few months later, when I test it the behavior seems slightly different. I don’t get any webmentions from anything I post myself, but when other people favorite my posts, I get webmentions for those.
So maybe I just need some more client-side filtering or something else to get that part right. I’ll keep working on it.
At least, what’s nice here is that I don’t need to add anything to my blog, since the webmentions are handled externally.
Hacker News backlinks
My blog gets on Hacker News regularly, but Bridgy doesn’t support it.
So I added a simple script that doesn’t touch my blog itself, but queries the Hacker News API for links to my blog, and then sends webmentions to my blog for each one.
The script
is relatively simple,
and runs on a schedule via GitHub Actions.
I ran it once with a huge since_days
value
to get the entire HN history queried once,
and now it just checks the last week of stories when it runs.
This uses the very nice indieweb_utils
package
in a util file to send the webmentions.
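The gist of it is a search query against the HN Algolia API, followed by a webmention for each hit. A stripped-down sketch of the idea (the real script uses indieweb_utils and a since_days window):

import requests

HN_SEARCH = "https://hn.algolia.com/api/v1/search"
WEBMENTION_ENDPOINT = "https://webmention.io/www.jeremykun.com/webmention"


def send_hn_webmentions(domain: str = "jeremykun.com"):
    # Find HN stories that link to the blog.
    hits = requests.get(
        HN_SEARCH, params={"query": domain, "tags": "story"}, timeout=30
    ).json()["hits"]

    for hit in hits:
        target = hit.get("url")
        if not target or domain not in target:
            continue
        source = f"https://news.ycombinator.com/item?id={hit['objectID']}"
        # A webmention is just a form-encoded POST with source and target.
        requests.post(
            WEBMENTION_ENDPOINT,
            data={"source": source, "target": target},
            timeout=30,
        )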
Outgoing webmentions
To participate in the IndieWeb ecosystem, I also want to send outgoing webmentions whenever I link to someone else’s blog in an article.
I have another script, outgoing.py, which handles outgoing webmentions. In the same workflow as the one that looks for Hacker News articles, I scan my own blog for posts that haven’t been processed (with special handling for the blogroll), parse each one looking for links, and send webmentions to those pages.
As with the Hacker News script, this uses the indieweb_utils package to send the webmentions.
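Under the hood, sending a webmention means discovering the target’s advertised endpoint and POSTing source and target to it. A bare-bones sketch of that flow (indieweb_utils wraps this up more robustly, including endpoints advertised via HTTP headers):

import requests
from bs4 import BeautifulSoup


def send_webmention(source: str, target: str) -> bool:
    # Discover the target's webmention endpoint from its <link rel="webmention">.
    soup = BeautifulSoup(requests.get(target, timeout=30).text, "html.parser")
    link = soup.find("link", rel="webmention")
    if not link or not link.get("href"):
        return False
    endpoint = requests.compat.urljoin(target, link["href"])

    # The webmention itself is a form-encoded POST.
    response = requests.post(
        endpoint, data={"source": source, "target": target}, timeout=30
    )
    return response.ok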
That said, I find very few webmentions are actually sent. I may be doing something wrong, or maybe I’m just not linking to enough IndieWeb people.
DOIs
Rogue Scholar is a service that provides DOIs for articles on scientific blogs. It automates the whole process of getting DOIs and provides an API for querying them.
So I have a script, again run in a workflow on a schedule, that queries Rogue Scholar for DOIs for my blog posts and, similarly to the syndication script, adds them to the frontmatter of each post. The template then renders the DOI if it’s present:
{{ if .Params.doi }}
<p>DOI: <a href="{{ .Params.doi }}" rel="doi">{{ .Params.doi }}</a></p>
{{ end }}
Dead link checker
I have yet another workflow that runs the Lychee dead link checker on my blog a couple of times a month.
Here’s the workflow:
name: Check for dead links

on:
  repository_dispatch:
  workflow_dispatch:
  schedule:
    - cron: "0 6 7,21 * *"

jobs:
  linkChecker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Restore lychee cache
        uses: actions/cache@v4
        with:
          path: .lycheecache
          key: cache-lychee-${{ github.sha }}
          restore-keys: cache-lychee-

      # check for existing issue
      - name: Find Link Checker Issue
        id: link-checker-issue
        uses: micalevisk/last-issue-action@v2
        with:
          state: open
          labels: |
            link-checker

      - name: Setup Hugo
        uses: peaceiris/actions-hugo@v3
        with:
          hugo-version: '0.119.0'
          extended: true

      - name: Build
        run: hugo

      - name: Link Checker
        id: lychee
        uses: lycheeverse/lychee-action@v1
        with:
          args: --accept 200,429 --verbose --max-concurrency 1 --cache --max-cache-age 7d --exclude 'linkedin.com' --exclude 'fonts.googleapis.com' --exclude 'pnas.org' --exclude 'tandfonline.com' --exclude 'ogp.me' --exclude 'fonts.gstatic.com' --exclude 'dl.acm.org/doi' --exclude 'sciencemag.org' --exclude 'web.archive.org' --exclude 'doi.org' --exclude 'gmplib.org' --exclude 'github.com/j2kun/mlir-tutorial' -r 5 -t 50 --archive wayback --suggest --base https://www.jeremykun.com public

      - name: Update Issue
        uses: peter-evans/create-issue-from-file@v5
        if: env.lychee_exit_code != 0
        with:
          title: Broken links detected in docs 🔗
          issue-number: "${{ steps.link-checker-issue.outputs.issue_number }}"
          content-filepath: ./lychee/out.md
          token: ${{ secrets.GITHUB_TOKEN }}
          labels: |
            link-checker
Note: there are so many --exclude
flags because the tool just fails on a bunch of domains
in ways I can’t quite understand. So for sites that seem to have longevity
(like journal websites) but were constantly giving false positives,
I just ignore the failures.
The last step creates an issue, so I get an email when stuff breaks.