Daniel Gray

Thoughts, Notes, Ideas, Projects

← Back to home

Blog Architecture - From Markdown to Web

This article explores the content processing pipeline: how Obsidian markdown files are transformed into web pages. We'll trace a file's journey from your Obsidian vault to a published blog post.

This is part 3 of the Blog Architecture Deep Dive series. Start with Blog Architecture - Overview if you haven't read it yet.

The Processing Pipeline

When you write a markdown file in Obsidian and set published: published in the frontmatter, the blog system processes it through several stages:

  1. File Discovery - Finding all published markdown files

  2. Metadata Extraction - Reading frontmatter and file information

  3. Content Processing - Converting markdown to HTML

  4. Link Resolution - Converting Obsidian links to web links

  5. Static Generation - Pre-rendering pages at build time

Let's explore each stage in detail.

Stage 1: File Discovery

The system starts by scanning your Obsidian vault for all markdown files. This happens in the getAllMarkdownFiles() function, which:

Recursive Directory Scanning

The scanner:

  • Recursively traverses all subdirectories

  • Finds all .md files at any depth

  • Handles symlinks safely by tracking visited real paths to prevent infinite loops

  • Skips hidden files (like .obsidian, .git) automatically

  • Respects depth limits (max 100 levels) to prevent stack overflow

Symlink Handling

Obsidian vaults can contain symlinks (symbolic links) that point to other directories. The scanner uses fs.realpathSync() to resolve symlinks and tracks visited real paths to prevent circular references.

Publishing Detection

Not all markdown files should be published. The system checks each file for:

  • published: published in frontmatter

Files with published: draft in frontmatter, or without a published field, are excluded.

Stage 2: Metadata Extraction

Once a file is identified as published, the system extracts metadata from it.

Title Extraction (Priority Order)

  1. Frontmatter title: field - Explicit title in YAML

  2. First # H1 heading - First heading in the content

  3. Filename - Basename with spaces/underscores converted

Date Extraction

  1. Frontmatter date: field - ISO format: YYYY-MM-DD

  2. File creation time - Fallback for simple posts without frontmatter

Categories

Extracted from multiple sources:

  • categories: field (array or comma-separated)

  • category: field (single category)

  • tags: field (all tags except "published")

Series Information

The system extracts series metadata:

  • series: - Name of the series

  • seriesOrder: - Order within the series (1, 2, 3, etc.)

  • parentSeries: - If this series is nested within another

Slug Generation

Slugs are generated from the file's relative path within the vault, preserving directory structure:


File: Science/Planetary Science & Space.md

Slug: Science/Planetary_Science___Space

The slug generation:

  • Replaces spaces with underscores

  • Handles special characters

  • Preserves directory structure in URLs

  • Allows custom URLs via url: in frontmatter

Excerpt Generation

Excerpts are automatically generated from content:

  • Removes frontmatter and HTML comments

  • Strips markdown formatting

  • Converts Obsidian links to plain text

  • Truncates to 200 characters

Stage 3: Content Processing

The markdown content is converted to HTML using the remark and rehype ecosystem.

Markdown to HTML Pipeline

The processing pipeline:

  1. Obsidian Image Processing - Converts ![image](/blog-content/image.png) to standard markdown

  2. Obsidian Link Processing - Converts <span class="obsidian-link disabled" data-link="links" title="Link to: links (not yet created)">links</span> to HTML links (before markdown conversion)

  3. Markdown Parsing - Uses remark to parse markdown

  4. GitHub Flavored Markdown - Adds support for tables, strikethrough, task lists

  5. LaTeX Math - Parses and renders math equations with KaTeX

  6. Syntax Highlighting - Adds code highlighting with Prism

  7. HTML Conversion - Converts the AST to HTML

  8. Post-Processing - Handles escaped links, removes duplicate H1 tags

Obsidian Image Processing

Obsidian uses ![image](/blog-content/image.png) syntax for images. The system converts this to standard markdown:


![image](/blog-content/image.png) → ![alt](/blog-content/image.png)

This handles:

  • Images with spaces in filenames (URL encoding)

  • Alt text from ![alt text](/blog-content/image.png)

  • Path resolution from the vault structure

Stage 4: Link Resolution

Obsidian <span class="obsidian-link disabled" data-link="wiki links" title="Link to: wiki links (not yet created)">wiki links</span> are converted to HTML links. This is one of the most complex parts of the system.

Link Processing Pipeline

Links are processed in multiple stages:

  1. Pre-HTML conversion - Process links in markdown before remark converts to HTML

  2. Post-HTML conversion - Catch any escaped links (from code blocks, etc.)

  3. HTML entity handling - Process HTML-escaped brackets

This multi-pass approach ensures links are resolved even in edge cases.

Matching Strategy

The link resolver uses a three-tier matching strategy:

1. Path-based Matching

For links with / (directory structure):

  • Matches against relativePath (preserves directory structure)

  • Case-insensitive

  • Handles URL encoding

2. Filename Matching

  • Matches against fileName (basename without extension)

  • Also matches last segment of relativePath

  • Handles variations in spacing/capitalization

3. Fuzzy Matching

  • Normalizes both link and filename (removes spaces, special chars)

  • Handles edge cases like "Moon forming Impact" vs "Moon Forming Impact"

Link States

Links can be in two states:

  • Active: Points to an existing published post → <a href="/blog/slug">

  • Disabled: Points to non-existent post → <span class="obsidian-link disabled"> (styled differently, not clickable)

Disabled links are rendered but not clickable, allowing you to see what you intended to link to even if the post doesn't exist yet.

Stage 5: Static Generation

Next.js pre-generates all post pages at build time using generateStaticParams().

This ensures:

  • All pages are pre-rendered (fast, SEO-friendly)

  • 404s are handled for non-existent posts

  • URL encoding is handled correctly

Catch-All Route Handling

The [...slug] route handles nested paths:


URL: /blog/Science/Planetary_Science

params.slug = ['Science', 'Planetary_Science']

slugString = 'Science/Planetary_Science'

Each segment is URL-decoded to handle special characters properly.

Development vs Production

The system operates differently in development and production:

Development Mode

  • Reads directly from source Obsidian vault

  • Live updates: Changes to markdown files appear immediately (no rebuild needed)

  • Fast iteration: Edit in Obsidian, see changes in browser

Production Mode

  • Reads from copied content (blog-content/ directory)

  • Build-time copy: npm run build copies vault content to blog-content/

  • Deployment-ready: All content is colocated with the app

This dual-mode approach provides the best of both worlds: fast development and reliable deployment.

Error Handling

The system includes comprehensive error handling:

  • Missing vault: Clear error message with setup instructions

  • Permission errors: Helpful macOS-specific guidance (Full Disk Access)

  • Circular symlinks: Detected and skipped with warnings

  • Invalid frontmatter: Gracefully falls back to defaults

  • Broken links: Rendered as disabled spans (not errors)

Performance Optimizations

Several optimizations ensure fast performance:

  1. Static generation: All pages pre-rendered at build time

  2. Parallel processing: Promise.all() for concurrent file reading

  3. Efficient matching: Early returns in link resolution

  4. Caching: Next.js caches static pages aggressively

  5. Lazy components: LinkedPosts uses Intersection Observer

Conclusion

The content processing pipeline transforms Obsidian markdown files into web pages through a series of well-defined stages. Each stage handles specific aspects of the conversion:

  • File Discovery finds all published content

  • Metadata Extraction reads frontmatter and file information

  • Content Processing converts markdown to HTML with rich features

  • Link Resolution creates navigation from wiki links

  • Static Generation pre-renders pages for fast loading

The result is a blog that maintains Obsidian's authoring experience while delivering a modern web experience.

Next Steps

Continue to Blog Architecture - Links and Navigation to learn how Obsidian links become web navigation and how the content tree is built.

Related Content


This article is part of the Blog Architecture Deep Dive series. Previous: Blog Architecture - Why Obsidian?. Next: Blog Architecture - Links and Navigation.

Explore Categories

Related Content

Blog Architecture - Overview

Blog Architecture - Overview This is the first in a series of articles exploring the architecture of this blog. We'll start with a high-level overview that anyone can understand, then progressively di...

Blog Architecture - Links and Navigation

Blog Architecture - Links and Navigation This article explores how Obsidian `wiki links` become web navigation, how the content tree is built, and how users navigate the blog. This is part 4 of the Bl...

Blog Architecture - Why Obsidian?

Blog Architecture - Why Obsidian?

Blog Architecture - Why Obsidian? This article explores why Obsidian was chosen as the authoring tool for this blog and how its unique features integrate with the web publishing system. This is part 2...

Blog Architecture

As a side project getting started with this blog, I worked though An adventure with 3js 3d Backgrounds in 3js. I wanted something unique. Not quite what I intended when I set off to make it, but it se...

Navigating Blog Content

Navigating Blog Content

Navigating Blog Content !images/Pasted image 20251123205959.png This blog is designed with multiple ways to explore and discover content, inspired by tools like Obsidian that emphasize connections and...