A Test-Driven Approach to Translating a Markdown Document into HTML

Introduction

Currently, I write these blog posts in Obsidian, owing to its clean interface and the relatively simple way in which images and tables can be created in markdown.[1] However, to display these posts on this blog this markdown needs to be replaced by HTML.

Doing this by hand is tedious, error–prone, and not an efficient use of time. There are some options which already exist, such as Showdown in JavaScript.[2] However, I wanted the HTML to contain some class names so that they can take advantage of the CSS options provided by Bulma.[3]

Here, I follow a test–driven approach[4] to writing a simple program to take in a markdown file and convert it into HTML. This reads the input token–by–token and determines what rule to apply based on what tokens it sees. In a later post I will present a better way to do this with an existing tool.

Throughout the post, I highlight some instances where the initial grammar is incomplete, or not sufficiently understood, and some of the changes this brings about.

A First Pass at The Grammar

The grammar is a translation guide, stating how rules in one system are expressed in another.

The markdown rules followed by Obsidian are all listed on their website.[5] Those for Bulma are also listed on their website. Here, I am only concerned with a subset of the rules needed to display certain elements of text.

To start, I scanned posts which I've already written and picked out some rules.

   
Markdown HTML
text; more text text <p> more text </p>
`text` <code>text</code>
```some_programming_lang text ``` <code>text</code>
| col name one | col name two | |-|-| | row contents one | row contents two | <table class="table"> <thead> <tr> <th scope="col">col name one</th> <th scope="col">col name two</th> </tr> </thead> <tbody> <tr> <th scope="row">row contents one</th> <td>row contents two</td> </tr> </tbody> </table>
# text <h1>text</h1>
## text <h2>text</h2>
[^1] <a id="footnote–anchor–1" href="#footnote–1">[1]</a>
[^1]: text <p id="footnote–1"> <a href="#footnote–anchor–1">[1]</a> text </p>
*text* <i>text</i>
**text** <b>text</b>
***text*** <i><b>text</b></i>
' '
< <
> >
" "
– One, – Two, – Three <ul> <li> One,</li> <li> Two,</li> <li> Three.</li> <ul>
All other characters Add as–is

There are a few assumptions made here, which are particular to the blog posts.

  • I assume that all posts start with some type of heading. Therefore, whatever the first line is, it is not wrapped in <p></p> tags.
  • At this point, I determined that the grammar was not going to be fully–recursive, but had not yet figured out what parts would be. For example, I do not currently allow tables to exist within other tables. Moreover, '#' characters in URLs and lists do not creating headings. However, there may have been other cases where a recursive relationship should be allowed.
  • The footnote numbers may not be in the correct order in text. This may arise due when editing the file to include a footnote, resulting in a case where footnote 2 is before footnote 1. When output, however, they should be renumbered to be in the correct order.

These are subject to change, particularly as omissions (of which there are a couple) are drawn out by test cases.

Approach

I took a token–by–token approach, where the rule to be applied is figured out based on the token at hand. An alternative would have been to feed in a string and replace parts of it with string manipulation functions. I avoided this as I wanted to have some assurance that certain rules were not applied in particular areas. For example, only HTML entities should be replaced within code blocks.

A note on new lines.

Windows systems use '\r\n' to draw a new line, whilst Linux systems simply use '\n'.[6] To ensure that this was caught when a new file is loaded, I initially stored test cases in text files. Note however that when comparing the results, if one output uses carriage returns and the other uses new lines, the difference will not be shown in most string outputs. Code for loading these files and running the tests is included in the main_test.go file on GitHub. In the below, I will show tests within the code rather than using files.

New lines between HTML tags are ignored when checking the output of the interpreter against the solution. Browsers are agnostic to whether there are new lines between HTML tags, and tests may end up failing due to an additional line character being included between HTML tags.

For code blocks, however, new line characters do matter. Here, I have included the new lines.

Aside: Possible error if using a string slice.

Note that here I've used a string builder rather than a slice of runes.[7] This is to avoid an issue where earlier runes in the slice can be overwritten.[8] The error does not always arise, but when it does it can be rather annoying to debug. In a different version of this project, I noticed the following when print debugging:


   ../test_files/plain_text_over_multiple_lines.txt
   rune matcher []
   rune matcher [T]
   rune matcher [T h]
   rune matcher [T h i]
   rune matcher [T h i s]
   ]une matcher [T h i s
   check if paragraph
   ]T h i s
   next r 105 i
   ]une matcher [T h i s
    i]e matcher [T h i s
    i s]matcher [T h i s
   ]i s matcher [T h i s
   check if paragraph
   ]i s i s
   next r 115 s
   ]i s matcher [T h i s
    s]s matcher [T h i s
    s o]matcher [T h i s
    s o m]tcher [T h i s
    s o m e]her [T h i s
    s o m e  ]r [T h i s
   

One can either use a string builder, or create a linked list which recursively appends strings at each node together to avoid this. The former is lighter, and thus used here.

File Structure

The directory for this will be very simple:


   – main
   | – main.go
   | – main_test.go
   

Single–Criteria Tests

An empty line

The simplest input file is an empty one, which should also return an empty string.

main_test.go


   package main
   
   import (
       "bytes"
       "testing")
   
   func TestConvertMarkdownFileToBlogHTML(t *testing.T) {
       if convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(""))) != "" {
          t.Error("expected an empty input string ")
       }
   }
   

Making main.go:


   package main
   
   import (
       "bytes"
   )
   
   func main() {}
   
   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
       return ""
   }
   

Whilst on the command line, in the ./main directory, one can run:


   go test
   

To run the tests. Here, it passes with flying colours.

A line without markdown

Similarly, a single line of text without any markdown characters should also be returned as–is:


   func TestConvertMarkdownFileToBlogHTML(t *testing.T) {
       if convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(""))) != "" {
          t.Error("expected an empty input string ")
       }
       if convertMarkdownFileToBlogHTML(bytes.NewReader([]byte("This is a plain text file."))) != "This is a plain text file." {
          t.Errorf("expected %s to be returned as–is", "This is a plain text file.")}
   }
   

The immediately causes main.go to fail:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:13: expected This is a plain text file. to be returned as–is
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   
   FAIL
   

This is easily fixed with a little shenanigan.


   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
       _, _, err := br.ReadRune()
       if err == io.EOF {
          return ""
       }
       if err != nil {
          log.Fatal("unable to read rune:", err)
       }
       return "This is a plain text file."
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Headers

A header on a single line should be returned in header tags. At this point, it is worth creating a 'test' struct to put the inputs and expected outputs into.


   type testCase struct {
       input  string
       output string
   }
   
   func TestConvertMarkdownFileToBlogHTML(t *testing.T) {
       testCases := []testCase{
          {
             input:  "",
             output: "",
          },
          {
             input:  "This is a plain text file.",
             output: "This is a plain text file.",
          },
          {
             input:  "# This is an h1 header",
             output: "<h1> This is an h1 header</h1>",
          },
       }
   
       for i, tst := range testCases {
          res := convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(tst.input)))
          if res != tst.output {
             t.Errorf(
                "TestConvertMarkdownFileToBlogHTML test number: %d \nexpected: \n%s \nbut got: \n%s",
                i, tst.output, res,
             )
          }
       }
   }
   

Back to failure mode:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:32: TestConvertMarkdownFileToBlogHTML test number: 2
           expected:
           <h1> This is an h1 header</h1>
           but got:
           This is a plain text file.
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Now one has to read at least the first rune. This is easy to add.


   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
       r, _, err := br.ReadRune()
       if err == io.EOF {
          return ""
       }
       if err != nil {
          log.Fatal("unable to read rune:", err)
       }
       if r == '#' {
          return "<h1> This is an h1 header</h1>"
       }
       return "This is a plain text file."
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

What about an h2 header?


   func TestConvertMarkdownFileToBlogHTML(t *testing.T) {
       testCases := []testCase{
          // ...
          {
             input:  "## This is an h2 header",
             output: "<h2> This is an h2 header</h2>",
          },
       }
   
       for i, tst := range testCases {
          res := convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(tst.input)))
          if res != tst.output {
             t.Errorf(
                "TestConvertMarkdownFileToBlogHTML test number: %d \nexpected: \n%s \nbut got: \n%s",
                i, tst.output, res,
             )
          }
       }
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:36: TestConvertMarkdownFileToBlogHTML test number: 3
           expected:
           <h2> This is an h2 header</h2>
           but got:
           <h1> This is an h1 header</h1>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Now one has to read at least 16 runes. Let's try a solution which reads the whole line. There will be a string builder to update the value to be returned. Otherwise, a solution can still be hacked together.


   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
       sb := strings.Builder{}
   
      headerCount := 0
   
      for {
          r, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read rune:", err)
          }
          if r == '#' {
             if headerCount == 0 {
                sb.WriteString("<h1>")
                headerCount++
   
             } else {
                headerCount++
             }
          } else {
             sb.WriteRune(r)
          }
      }
   
      if headerCount > 0 {
          sb.WriteString("</h1>")
      }
   
      res := sb.String()
   
      if headerCount > 1 {
          res = strings.ReplaceAll(res, "1", "2")
      }
   
      return res
   }
   

An h3 test will require a slight adjustment:


   //...
      {
          input:  "### This is an h3 header",
          output: "<h3> This is an h3 header</h3>",
      },
   //...
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:40: TestConvertMarkdownFileToBlogHTML test number: 4
           expected:
           <h3> This is an h3 header</h3>
           but got:
           <h2> This is an h3 header</h2>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   


   //...
      if headerCount > 1 {
          res = strings.ReplaceAll(res, "1", strconv.Itoa(headerCount))
      }
   //...
   

Now, what if there is one of more '#' in the middle of the header string?


   //...
      {
          input:  "### This is an ### h3 ### header",
          output: "<h3> This is an ### h3 ### header</h3>",
      },
   //...
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:44: TestConvertMarkdownFileToBlogHTML test number: 5
           expected:
           <h3> This is an ### h3 ### header</h3>
           but got:
           <h9> This is an  h3  header</h9>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Well, that failed. It failed in a strange way as well – the '3's in the header tags were replaced, but not the '3' in the middle of the string.

More condition checking and hack–ons will resolve this:


   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
       sb := strings.Builder{}
   
       headerCount := 0
       finishedCountingHeaderTagsForLine := false
   
       for {
          r, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read rune:", err)
          }
          if r == '#' {
             if headerCount == 0 {
                sb.WriteString("<h1>")
                headerCount++
   
             } else if !finishedCountingHeaderTagsForLine {
                headerCount++
   
             } else if finishedCountingHeaderTagsForLine {
                sb.WriteRune('#')
             }
   
          } else if r == ' ' && !finishedCountingHeaderTagsForLine {
             sb.WriteRune(' ')
             finishedCountingHeaderTagsForLine = true
   
          } else {
             sb.WriteRune(r)
          }
       }
   
       if headerCount > 0 {
          sb.WriteString("</h1>")
       }
   
       res := sb.String()
   
       if headerCount > 1 {
          res = strings.ReplaceAll(res, "1", strconv.Itoa(headerCount))
       }
   
       return res
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Pause to refactor

The string of if statements has the makings of a switch statement. Let's try replacing it.


   //...
      for {
          r, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read rune:", err)
          }
   
          switch r {
          case '#':
             if headerCount == 0 {
                sb.WriteString("<h1>")
                headerCount++
   
             } else if !finishedCountingHeaderTagsForLine {
                headerCount++
   
             } else if finishedCountingHeaderTagsForLine {
                sb.WriteRune('#')
             }
   
          case ' ':
             if !finishedCountingHeaderTagsForLine {
                sb.WriteRune(' ')
                finishedCountingHeaderTagsForLine = true
   
             } else {
                sb.WriteRune(r)
             }
   
          default:
             sb.WriteRune(r)
          }
      }
   //...
   

Which can further be simplified:


   //...
      switch r {
      case '#':
          if !finishedCountingHeaderTagsForLine {
             headerCount++
   
          } else if finishedCountingHeaderTagsForLine {
             sb.WriteRune('#')
          }
   
      case ' ':
          if headerCount > 0 && !finishedCountingHeaderTagsForLine {
             sb.WriteString("<h" + strconv.Itoa(headerCount) + "> ")
             finishedCountingHeaderTagsForLine = true
   
          } else {
             sb.WriteRune(r)
          }
   
      default:
          sb.WriteRune(r)
      }
   //...
   

The hacked on strings.ReplaceAll() function at the end can also be refactored out. When there is a new line character, the header is complete.


   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
       sb := strings.Builder{}
   
       headerCount := 0
       finishedCountingHeaderTagsForLine := false
   
       for {
          r, _, err := br.ReadRune()
          if err == io.EOF {
             if headerCount > 0 {
                sb.WriteString("</h" + strconv.Itoa(headerCount) + ">")
             }
             break
          }
          if err != nil {
             log.Fatal("unable to read rune:", err)
          }
   
          switch r {
          case '#':
             if !finishedCountingHeaderTagsForLine {
                headerCount++
   
             } else if finishedCountingHeaderTagsForLine {
                sb.WriteRune('#')
             }
   
          case ' ':
             if headerCount > 0 && !finishedCountingHeaderTagsForLine {
                sb.WriteString("<h" + strconv.Itoa(headerCount) + "> ")
                finishedCountingHeaderTagsForLine = true
   
             } else {
                sb.WriteRune(r)
             }
   
          case '\n':
             if headerCount > 0 {
                sb.WriteString("</h" + strconv.Itoa(headerCount) + ">")
             }
   
          default:
             sb.WriteRune(r)
          }
       }
   
       return sb.String()
   

There is some duplicated code which also can be refactored out:


   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
      //...
      for {
          r, _, err := br.ReadRune()
          if err == io.EOF {
             if headerCount > 0 {
                addClosingHeaderTag(sb, headerCount)
             }
             break
          }
   
         //...
          case '\n':
             if headerCount > 0 {
                addClosingHeaderTag(&sb, headerCount)
             }
   
         //...
      }
      //...
   }
   
   func addClosingHeaderTag(sb *strings.Builder, headerCount int) {
       sb.WriteString("</h" + strconv.Itoa(headerCount) + ">")
   }
   

Note that a pointer to sb has to be passed as the argument to addClosingHeaderTag. If not, sb will be copied onto the stack, resulting in the following error:


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   panic: strings: illegal use of non–zero Builder copied by value [recovered]
      panic: strings: illegal use of non–zero Builder copied by value
   

As a pointer, all runs well:


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

The tests can also be given names, to provide more feedback for debugging.


   func TestConvertMarkdownFileToBlogHTML(t *testing.T) {
       testCases := []testCase{
          {
             name:   "an empty string should be returned as–is",
             input:  "",
             output: "",
          },
          {
             name:   "a single line with no markdown character should be returned as–is",
             input:  "This is a plain text file.",
             output: "This is a plain text file.",
          },
          //...
      }
   
      for i, tst := range testCases {
          res := convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(tst.input)))
          if res != tst.output {
             t.Errorf(
                "TestConvertMarkdownFileToBlogHTML test number: %d \nTest name: %s \nexpected: \n%s \nbut got: \n%s",
                i, tst.name, tst.output, res,
             )
          }
      }
   }
   

Plain text over multiple lines

So far, all of the input text has been on a single line. It is worth checking for text over multiple lines now and for paragraphs.

Note that in the testCases variable I have explicitly included \n characters into the string, rather than having the string itself be over multiple lines. This is to avoid introducing a lot of whitespace runes between different lines when it shouldn't be there. It does, however, come at the cost of legibility. To alleviate this somewhat, I will show the input and output both as they should be in the files being read from and written to, and then in the test cases in the code.

The output for the below text over multiple lines will be the same as the input.


   This
   is
   some simple text
   which has been spread out
   across multiple lines.
   


   //...
      {
          name:   "text across multiple lines with no markdown should be returned as–is",
          input:  "This\nis\nsome simple text\nwhich has been spread out\nacross multiple lines.",
          output: "This\nis\nsome simple text\nwhich has been spread out\nacross multiple lines.",
      },
   //...
   

The tests fail again:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:48: TestConvertMarkdownFileToBlogHTML test number: 6
           expected:
           This
           is
           some simple text
           which has been spread out
           across multiple lines.
           but got:
           Thisissome simple textwhich has been spread outacross multiple lines.
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

A one–line fix is all that is needed:


   case '\n':
       if headerCount > 0 {
          addClosingHeaderTag(&sb, headerCount)
       }
       sb.WriteRune('\n')
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Paragraphs

Now with paragraph tags:


   This is a first line.
   
   Paragraph one.
   


   This is a first line.
   <p>
   Paragraph one.
   </p>
   


   {
       name:   "paragraph tags should be added if there is a blank line before a line of plain text",
       input:  "This is a first line.\n\nParagraph one.",
       output: "This is a first line.\n<p>\nParagraph one.\n</p>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:61: TestConvertMarkdownFileToBlogHTML test number: 7
           Test name: paragraph tags should be added if there is a blank line before a line of plain text
           expected:
           This is a first line.
           <p>
           Paragraph one.
           </p>
           but got:
           This is a first line.
   
           Paragraph one.
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

If there are two successive \n characters, add a paragraph tag. At the next \n, add a closing paragraph tag.


   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
       //...
       lastCharacterWasANewLine := false
       thereIsAParagraphToClose := false
      for {
         // ...
         case '\n':
             if headerCount > 0 {
                addClosingHeaderTag(&sb, headerCount)
             }
             sb.WriteRune('\n')
   
             if thereIsAParagraphToClose {
                sb.WriteString("</p>")
                thereIsAParagraphToClose = false
             }
   
             if lastCharacterWasANewLine {
                sb.WriteString("<p>")
                thereIsAParagraphToClose = true
   
             } else {
                lastCharacterWasANewLine = true
             }
   
         default:
             lastCharacterWasANewLine = false
             sb.WriteRune(r)
         }
      }
   
      if thereIsAParagraphToClose {
          sb.WriteString("</p>")
      }
   
      //...
   }
   

Which fails, ironically enough, due to new line characters:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:61: TestConvertMarkdownFileToBlogHTML test number: 7
           Test name: paragraph tags should be added if there is a blank line before a line of plain text
           expected:
           This is a first line.
           <p>
           Paragraph one.
           </p>
           but got:
           This is a first line.
   
           <p>Paragraph one.</p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

A little rearranging resolves this:


   case '\n':
       if headerCount > 0 {
          addClosingHeaderTag(&sb, headerCount)
       }
   
       if thereIsAParagraphToClose {
          sb.WriteRune('\n')
          sb.WriteString("</p>")
          thereIsAParagraphToClose = false
       }
   
       if lastCharacterWasANewLine {
          sb.WriteString("<p>")
          thereIsAParagraphToClose = true
       } else {
          lastCharacterWasANewLine = true
       }
       sb.WriteRune('\n')
   


   if thereIsAParagraphToClose {
       sb.WriteRune('\n')
       sb.WriteString("</p>")
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now what about two paragraphs?


   This is a first line.
   
   Paragraph one.
   
   Paragraph two.
   


   This is a first line.
   <p>
   Paragraph one.
   </p>
   <p>
   Paragraph two.
   </p>
   


   {
       name:   "paragraph tags should be added if there is a blank line before a line of plain text for multiple paragraphs",
       input:  "This is a first line.\n\nParagraph one.\n\nParagraph two.",
       output: "This is a first line.\n<p>\nParagraph one.\n</p>\n<p>\nParagraph two.\n</p>",
   },
   

This passed without issue with the code as–is:


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Italics & Bold

Italics text is, seemingly, straightforward:


   *italic text*
   


   <i>italic text</i>
   

Simply added:


   thereIsAnItalicsTagToClose := false
   
   //...
   
   case '*':
       if thereIsAnItalicsTagToClose {
          sb.WriteString("</i>")
          thereIsAnItalicsTagToClose = false
       } else {
          sb.WriteString("<i>")
          thereIsAnItalicsTagToClose = true
       }
   

What about bold tags?


   **bold text**
   


   <b>bold text</b>
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:76: TestConvertMarkdownFileToBlogHTML test number: 10
           Test name: bold tags should be added if a string is surrounded by '**'
           expected:
           <b>bold text</b>
           but got:
           <i>bold text</i>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Well. That was to be expected.

One can try the same trick used with counting headers.


   countAsterisks := 0
   thereIsAnItalicsOrBoldTagToClose := false
   
   //...
   
   case '*':
       countAsterisks++
   
   for {
      default:
          lastCharacterWasANewLine = false
   
          if countAsterisks > 0 {
             if thereIsAnItalicsOrBoldTagToClose {
                if countAsterisks == 1 {
                   sb.WriteString("</i>")
                } else if countAsterisks == 2 {
                   sb.WriteString("</b>")
                }
                countAsterisks = 0
                thereIsAnItalicsOrBoldTagToClose = false
   
             } else {
                if countAsterisks == 1 {
                   sb.WriteString("<i>")
                } else if countAsterisks == 2 {
                   sb.WriteString("<b>")
                }
                countAsterisks = 0
                thereIsAnItalicsOrBoldTagToClose = true
             }
          }
   }
   
   // ...
   
   if countAsterisks > 0 {
       if countAsterisks == 1 {
          sb.WriteString("</i>")
       } else if countAsterisks == 2 {
          sb.WriteString("</b>")
       }
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

This passes, but is very disparate.

A quick add–on are cases of italicised an emboldened text?


   ***italic and bold text***
   


   <i><b>italic and bold text</b></i>
   


   if countAsterisks == 1 {
       sb.WriteString("<i>")
   } else if countAsterisks == 2 {
       sb.WriteString("<b>")
   } else if countAsterisks == 3 {
       sb.WriteString("<i><b>")
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

There are some natural extensions to these test cases, such as having a lone asterisk within text between two other asterisks, bold text in the middle of italicised text or vice–versa, and so–forth. I will return to these trickier test cases later.

HTML Entities

HTML entities are used to display reserved characters in HTML code.[9] Currently, I assume that there is only a subset of interest.


   This is a file ' which is <filled> with – HTML "entities" of interest.
   


   {
       name:   "reserved characters should be substituted with html entities",
       input:  "This is a file ' which is <filled> with – HTML \"entities\" of interest.",
       output: "This is a file ' which is <filled> with – HTML "entities" of interest.",
   },
   

Each of these is simply a case of looking up and replacing a character. A hash map is perfect for this.


   var htmlEntityMap = map[rune]string{
       '\'': "'",
       '<':  "<",
       '>':  ">",
       '"':  """,
       '–':  "–",
   }
   

Then in the for loop:


   case '\'': // Rune needs to be escaped
       sb.WriteString(htmlEntityMap[r])
   case '<':
       sb.WriteString(htmlEntityMap[r])
   case '>':
       sb.WriteString(htmlEntityMap[r])
   case '"':
       sb.WriteString(htmlEntityMap[r])
   case '–':
       sb.WriteString(htmlEntityMap[r])
   

The tests now pass again.

Code Blocks

There are two types of code blocks: inline and multi–line.

Here, one needs to decide on whether empty code blocs should be skipped, or not.


   {
       name:   "an empty inline code block should be skipped",
       input:  "``",
       output: "",
   },
   


   {
       name:   "an empty inline code block should be skipped",
       input:  "``",
       output: "<code></code>",
   },
   

Superficially, the latter seems easier to code – as soon as a '`' appears, open a code block. With the former, one needs to read ahead to check if there are one, three backquotes, or some other number of back quotes. On the flip–side, the first test case allows one to avoid adding unnecessary code block tags.

The context here means that it doesn't really matter if there are additional code blocks with nothing between them. This program processes text files in a one–off manner. It doesn't process audio nor video where this type of optimisation is required.

For now, I assume that the former test case should be followed.


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:101: TestConvertMarkdownFileToBlogHTML test number: 13
           Test name: an empty inline code block should be skipped
           expected:
   
           but got:
           ``
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

One can read ahead by a token. If it is also a backquote, don't insert any code block.

I assume here that all code blocks are closed, and that a code block does not open if a backquote is the final character.


   thereIsACodeBlockOpen := false
   //...
   
      case '':
          nextR, _, err := br.ReadRune()
         if err == io.EOF {
             if thereIsACodeBlockOpen {
                sb.WriteString("</code>")
                thereIsACodeBlockOpen = false
             }
             break
         }
         if err != nil {
             log.Fatal("unable to read next rune:", err)
         }
         if nextR == '' {
             continue
   
         } else {
             if thereIsACodeBlockOpen {
                sb.WriteString("</code>")
                thereIsACodeBlockOpen = false
             } else {
                sb.WriteString("<code>")
                thereIsACodeBlockOpen = true
             }
         }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now a code block with some content:


   {
       name:   "plain text between a code block should be kept as–is",
       input:  "This is a simple inline code block",
       output: "<code>This is a simple inline code block</code>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:121: TestConvertMarkdownFileToBlogHTML test number: 14
           Test name: plain text between a code block should be kept as–is
           expected:
           <code>This is a simple inline code block</code>
           but got:
           <code>his is a simple inline code block</code>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Oops – there is a rune being ignored.


   //...
      nextR, _, err := br.ReadRune()
      //...
      } else {
          //...
   
          err = br.UnreadRune()
          if err != nil {
             log.Fatal("unable to unread rune:", err)
          }
      }
   //...
   

Sorted.


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

What about a code block in a sentence?


   {
       name:   "code tags should be positioned correctly around an inline code block within another sentence",
       input:  "This is some text surrounding and `inline code block`.",
       output: "This is some text surrounding <code>and inline code block</code>.",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

No problem there. What if HTML entities exist within the code block? Here, they should still be updated.


   {
       name:   "reserved characters within a code block should be replaced with HTML entities",
       input:  "This file contains `a code block` with `a number of ' <> – \" ` html entities in it.",
       output: "This file contains <code>a code block</code> with <code>a number of ' <> – " </code> html entities in it.",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Again, no need to change any code.

Now for a multi–line code block. As I'm using the Bulma CSS framework, the multiline code blocks have to be contained within <pre> tags.[10]


   {
       name:   "multi–line plain text within a code block should be kept as–is",
       input:  "```programming_language\nThis is a multiline code block.\nLine one,\nLine two,\nLine three.\n```",
       output: "<pre><code>\nThis is a multiline code block.\nLine one,\nLine two,\nLine three.\n</code></pre>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:121: TestConvertMarkdownFileToBlogHTML test number: 17
           Test name: multi–line plain text within a code block should be kept as–is
           expected:
           <pre><code>
           This is a multiline code block.
           Line one,
           Line two,
           Line three.
           </code></pre>
           but got:
           <code>programming_language
           This is a multiline code block.
           Line one,
           Line two,
           Line three.
           </code>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Finally, the code has broken.

Now a new rule needs to be added in. If a code block was opened with a trio of back quotes, then include <pre><code> and skip all other characters until the end of the line.

In the first instance, some more conditional statements were added:


   //...
   numberOfCurrentBackQuotes := 0
   
   //...
   case '`':
       fmt.Println("number of `", numberOfCurrentBackQuotes)
       fmt.Println(sb.String())
       numberOfCurrentBackQuotes++
       nextR, _, err := br.ReadRune()
       if err == io.EOF {
          if thereIsACodeBlockOpen {
             sb.WriteString("</code>")
             if numberOfCurrentBackQuotes == 3 {
                sb.WriteString("</pre>")
             }
             thereIsACodeBlockOpen = false
             numberOfCurrentBackQuotes = 0
          }
          break
       }
       if err != nil {
          log.Fatal("unable to read next rune:", err)
       }
       if nextR == '`' {
          numberOfCurrentBackQuotes++
          continue
   
       } else {
          if numberOfCurrentBackQuotes == 3 {
             if thereIsACodeBlockOpen {
                sb.WriteString("</code></pre>")
                thereIsACodeBlockOpen = false
                numberOfCurrentBackQuotes = 0
             } else {
                sb.WriteString("<pre><code>")
                thereIsACodeBlockOpen = true
   
                for nextR != '\n' {
                   nextR, _, err = br.ReadRune()
                   if err == io.EOF {
                      break
                   }
                   if err != nil {
                      log.Fatal("unable to read next rune:", err)
                   }
                }
             }
          } else {
             if thereIsACodeBlockOpen {
                sb.WriteString("</code>")
                thereIsACodeBlockOpen = false
                numberOfCurrentBackQuotes = 0
             } else {
                sb.WriteString("<code>")
                thereIsACodeBlockOpen = true
             }
          }
   
          err = br.UnreadRune()
          if err != nil {
             log.Fatal("unable to unread rune:", err)
          }
       }
   

Which doesn't work quite as intended:


   === RUN   Test name: multi–line plain text within a code block should be kept as–is
           expected:
           <pre><code>
           This is a multiline code block.
           Line one,
           Line two,
           Line three.
           </code></pre>
           but got:
           <pre><code>
           This is a multiline code block.
           Line one,
           Line two,
           Line three.
           </code>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

The print statement explains why:


   number of ` 5
   <pre><code>
   This is a multiline code block.
   Line one,
   Line two,
   Line three.
   

Changing the code to check if there are six back quotes solves this.


   //...
   if err == io.EOF {
       if thereIsACodeBlockOpen {
          sb.WriteString("</code>")
          if numberOfCurrentBackQuotes == 6 {
             sb.WriteString("</pre>")
          }
          thereIsACodeBlockOpen = false
          numberOfCurrentBackQuotes = 0
       }
       break
   }
   //...
   } else {
       if numberOfCurrentBackQuotes == 3 || numberOfCurrentBackQuotes == 6 {
          if thereIsACodeBlockOpen {
             sb.WriteString("</code></pre>")
             thereIsACodeBlockOpen = false
             numberOfCurrentBackQuotes = 0
   
          } else {
             sb.WriteString("<pre><code>")
             thereIsACodeBlockOpen = true
   
             for nextR != '\n' {
                nextR, _, err = br.ReadRune()
                if err == io.EOF {
                   break
                }
                if err != nil {
                   log.Fatal("unable to read next rune:", err)
                }
             }
          }
       }
       //...
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Paragraphs in code blocks are next:


   {
       name:   "paragraphs of plain text within a code block should be kept as–is (without paragraph tags)",
       input:  "```some programming language\nThis is a line.\n\nHere is another line. It should not be in paragraph tags.\n\nA final line.\n```",
       output: "<pre><code>\nThis is a line.\n\nHere is another line. It should not be in paragraph tags.\n\nA final line.\n</code></pre>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:126: TestConvertMarkdownFileToBlogHTML test number: 18
           Test name: paragraphs of plain text within a code block should be kept as–is (without paragraph tags)
           expected:
           <pre><code>
           This is a line.
   
           Here is another line. It should not be in paragraph tags.
   
           A final line.
           </code></pre>
           but got:
           <pre><code>
           This is a line.
           <p>
           Here is another line. It should not be in paragraph tags.
           </p>
           <p>
           A final line.
           </p>
           </code></pre>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

The additional rule of not having paragraph tags within a code block is straight–forward to add.


   case '\n':
       if headerCount > 0 {
          addClosingHeaderTag(&sb, headerCount)
       }
   
       if thereIsAParagraphToClose {
          sb.WriteRune('\n')
          sb.WriteString("</p>")
          thereIsAParagraphToClose = false
       }
   
       if !thereIsACodeBlockOpen {
          if lastCharacterWasANewLine {
             sb.WriteString("<p>")
             thereIsAParagraphToClose = true
          } else {
             lastCharacterWasANewLine = true
          }
       }
       sb.WriteRune('\n')
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

What about a paragraph with a code block in it?


   This is a line.
   
   Paragraph `with a code block` in it.
   


   {
       name:   "a paragraph of plain text with an inline code block in it should wrap the <code> tags around it properly",
       input:  "This is a line.\n\nParagraph `with a code block` in it.",
       output: "This is a line.\n<p>\nParagraph <code>with a code block</code> in it.\n</p>",
   },
   

That works as–is. Still, it is nice to have about for regression testing.


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

How about a paragraph with a multi–line code block within it?


   This is a line.
   
   Here is a multi–line code block:
   
   ```code
   

Line one,

Line two,

line three.

``` That's the end of the code block.


   {
       name:   "a paragraph of plain text with an inline code block in it should wrap the <code> tags around it properly",
       input:  "This is a line.\n\nHere is a multi–line code block:\n\n```code\nLine one,\n\nLine two,\n\nline three.\n```\n\nThat's the end of the code block.",
       output: "This is a line.\n<p>\nHere is a multi–line code block:\n</p>\n<p>\n<pre><code>\nLine one,\n\nLine two,\n\nline three.\n</code></pre>\n</p>\n<p>\nThat's the end of the code block.\n</p>",
   },
   

Which fails:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:136: TestConvertMarkdownFileToBlogHTML test number: 20
           Test name: a paragraph of plain text with an inline code block in it should wrap the <code> tags around it properly
           expected:
           This is a line.
           <p>
           Here is a multi–line code block:
           </p>
           <p>
           <pre><code>
           Line one,
   
           Line two,
   
           line three.
           </code></pre>
           </p>
           <p>
           That's the end of the code block.
           </p>
           but got:
           This is a line.
           <p>
           Here is a multi–line code block:
           </p>
           <p>
           <pre><code>
           </p>
           Line one,
   
           Line two,
   
           line three.
           </code></pre>
           <p>
           That's the end of the code block.
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

There was a slight flaw in the logic for the new lines in the paragraph case. This can be corrected by moving the 'thereIsAParagraphToClose' conditional within the '!thereIsACodeBlockOpen' conditional.


   case '\n':
       if headerCount > 0 {
          addClosingHeaderTag(&sb, headerCount)
       }
   
       if !thereIsACodeBlockOpen {
          if thereIsAParagraphToClose {
             sb.WriteRune('\n')
             sb.WriteString("</p>")
             thereIsAParagraphToClose = false
          }
          if lastCharacterWasANewLine {
             sb.WriteString("<p>")
             thereIsAParagraphToClose = true
          } else {
             lastCharacterWasANewLine = true
          }
       }
       sb.WriteRune('\n')
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

What about a code block with a directory structure in it?


   ```
   – dashboard
   | – frontend
   | – backend
   ```
   


   <code>
   – dashboard
   | – frontend
   | – backend
   </code>
   


   {
       name:   "a multi–line code block with a directory structure within it should be rendered correctly",
       input:  "```\n– dashboard  \n| – frontend  \n| – backend  \n```",
       output: "<pre><code>\n– dashboard\n| – frontend\n| – backend\n</code></pre>",
   },
   

This fails, though it seemingly gives the correct output:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:141: TestConvertMarkdownFileToBlogHTML test number: 21
           Test name: a multi–line code block with a directory structure within it should be rendered correctly
           expected:
           <pre><code>
           – dashboard
           | – frontend
           | – backend
           </code></pre>
           but got:
           <pre><code>
           – dashboard
           | – frontend
           | – backend
           </code></pre>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

The keen–eyed amongst you have noticed the additional space characters in the test input, which are not included in the text output. These were added by mistake when copying text back–and–forth from Obsidian and the IDE. This test can be split into two: one without the trailing spaces, and one with, to ensure that both cases are accounted for.


   {
       name:   "a multi–line code block with a directory structure within it should be rendered correctly",
       input:  "```\n– dashboard  \n| – frontend  \n| – backend  \n```",
       output: "<pre><code>\n– dashboard  \n| – frontend  \n| – backend  \n</code></pre>",
   },
   {
       name:   "a multi–line code block with a directory structure within it should be rendered correctly",
       input:  "```\n– dashboard\n| – frontend\n| – backend\n```",
       output: "<pre><code>\n– dashboard\n| – frontend\n| – backend\n</code></pre>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

What about a code block with has asterisks, for instance when code is imported?


   ```js
   import * as echarts from 'echarts';
   ```
   


   {
       name:   "asterisks within a code block should be left as–is",
       input:  "```js\nimport * as echarts from 'echarts';\n```",
       output: "<pre><code>\nimport * as echarts from 'echarts';\n</code></pre>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:151: TestConvertMarkdownFileToBlogHTML test number: 23
           Test name: asterisks within a code block should be left as–is
           expected:
           <pre><code>
           import * as echarts from 'echarts';</code></pre>
           but got:
           <pre><code>
           import  <i>as echarts from 'echarts';
           </code></pre>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

This is easily solved:


   case '*':
       if thereIsACodeBlockOpen {
          sb.WriteRune('*')
       } else {
          countAsterisks++
       }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

There are a lot of 'if in a code block' conditionals littered throughout the switch statement. In most cases, it is to break out of the normal flow of that case and skip adding characters.

There are a couple more code block–related test cases to add, mainly due to being in combination with other markdown rules. I'll add those first before refactoring out.

Footnotes

The footnotes that I've been using are listed as:


   [^1]
   
   [^1]: This is the footnote
   

Let's start with just the in–text footnote number.


   Here is a footnote.[^1]
   

For these, I'll use anchor tags, so that a viewer can click on the footnote number to jump to the end of the post. It would also be useful if a view could click on the footnote number at the end and jump back up to where the footnote is in text. Each footnote will need to have it's own id to jump back to, and


   {
       name:   "inline footnotes should be replaced with  <a id=\"footnote–anchor–n\" href=\"#footnote–n\">[n]</a>",
       input:  "Here is a footnote.[^1]",
       output: "Here is a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:156: TestConvertMarkdownFileToBlogHTML test number: 24
           Test name: inline footnotes should be replaced with  <a id=\"footnote–anchor–n\" href=\"#footnote–n\">[n]</a>
           expected:
           Here is a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           but got:
           Here is a footnote.[^1]
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

One can start with a blunt case:


   case '[':
       for {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read next rune:", err)
          }
          if nextR == ']' {
             sb.WriteString("<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>")
             break
          }
       }
   

Adding another footnote breaks this immediately.


   {
       name:   "successive inline footnotes should be replaced with  <a id=\"footnote–anchor–n\" href=\"#footnote–n\">[n]</a> and be numbered correctly",
       input:  "Here is a footnote[^1] and another footnote.[^2]",
       output: "Here is a footnote<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a> and another footnote.<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:161: TestConvertMarkdownFileToBlogHTML test number: 25
           Test name: inline footnotes should be replaced with  <a id=\"footnote–anchor–n\" href=\"#footnote–n\">[n]</a>
           expected:
           Here is a footnote<a id="footnote–anchor–1" href="#footnote–1">[1]</a> and another footnote.<a id="footnote–anchor–2" href="#footnote–2">[2]</a>
           but got:
           Here is a footnote<a id="footnote–anchor–1" href="#footnote–1">[1]</a> and another footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

One option might be to extract the number from the footnote.


   case '[':
       var footnoteNum = strings.Builder{}
       for {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read next rune:", err)
          }
          if nextR == ']' {
             sb.WriteString("<a id=\"footnote–anchor–" + footnoteNum.String() + "\" href=\"#footnote–" + footnoteNum.String() + "\">[" + footnoteNum.String() + "]</a>")
             break
          } else if nextR != '^' {
             footnoteNum.WriteRune(nextR)
          }
       }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

This presents a problem, however. When editing a post, footnotes might be added out–of–order. For instance:


   # Some heading
   
   This is a line which was added later one.[^2]
   
   This used to be the first line of the text.[^1]
   

However, in a post, viewers should see the numbers in order.


   # Some heading
   
   This is a line which was added later one.[^1]
   
   This used to be the first line of the text.[^2]
   

The footnotes at the end will need to be reordered as well. We'll get to that step when we need to.

For now, let's keep track of which numbers exist.


   {
       name:   "out–of–order footnote numbers should be updated to be in increasing order",
       input:  "Here is a footnote[^2] and another footnote.[^1]",
       output: "Here is a footnote<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a> and another footnote.<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a>",
   },
   


   footnoteNumber := 0
   
   //...
   
   case '[':
       for {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read next rune:", err)
          }
          if nextR == ']' {
             sb.WriteString(
             "<a id=\"footnote–anchor–" +
                strconv.Itoa(footnoteNumber) +
                "\" href=\"#footnote–" +
                strconv.Itoa(footnoteNumber) +
                "\">[" + strconv.Itoa(footnoteNumber) +
                "]</a>",
           )
             break
          } else if nextR != '^' {
             footnoteNumber++
          }
       }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Let's check quickly that this still works correctly when a footnote is in a paragraph.


   # This is a heading
   
   Here is a footnote.[^1]
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:171: TestConvertMarkdownFileToBlogHTML test number: 27
           Test name: a footnote in a paragraph should have paragraph and anchor tags added correctly
           expected:
           <h1> This is a heading</h1>
           <p>
           Here is a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           but got:
           <h1> This is a heading</h1>
           </h1><p>
           Here is a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a></h1>
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Well, that's a little distressing.

Adding a couple of print statements reveals the issue:


   fmt.Println(sb.String())
   fmt.Println("rune:", string(r))
   


   rune:
   
   <h1> This is a heading</h1>
   
   rune:
   
   <h1> This is a heading</h1>
   </h1><p>
   
   rune: H
   <h1> This is a heading</h1>
   </h1><p>
   H
   

There is an additional </hn> being added when a new line rune is found.


   case '\n':
       if headerCount > 0 {
          addClosingHeaderTag(&sb, headerCount)
       }
   

The solution is a simple, and overlooked, point from earlier. The header count needs to be reset to 0 when the header has been read in. If it is not, then later headers will have quite large header count values.


   case '\n':
       if headerCount > 0 {
          addClosingHeaderTag(&sb, headerCount)
          headerCount = 0
       }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Let's add another test for this, where there are two footnotes.


   # This is a heading
   
   Here is a footnote.[^2] Here's another.[^1]
   


   {
       name:   "a footnote in a paragraph should have paragraph and anchor tags added correctly, and successive footnotes should be numbered in increasing order",
       input:  "# This is a heading\n\nHere is a footnote.[^2] Here's another.[^1]",
       output: "<h1> This is a heading</h1>\n<p>\nHere is a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a> Here's another.<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a>\n</p>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Good.

Now it is time to add the footnotes section at the end.


   Throwaway line
   
   This paragraph references a footnote.[^1]
   
   [^1]: This is the reference.
   


   Throwaway line
   <p>
   This paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>
   </p>
   <p id="footnote–1">
   <a href="#footnote–anchor–1">[1]</a>
   This is the reference.
   </p>
   


   {
       name:   "a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly",
       input:  "Throwaway line\n\nThis paragraph references a footnote.[^1]\n\n[^1]: This is the reference.",
       output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is the reference.\n</p>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:181: TestConvertMarkdownFileToBlogHTML test number: 29
           Test name: a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly
           expected:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
           </p>
           but got:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p>
           <a id="footnote–anchor–2" href="#footnote–2">[2]</a>: This is the reference.
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Notice the paragraph tags around the footnote at the bottom. I'll circle back to these momentarily.

The next rule to add: if a heading is followed by : then it is a footnote at the end of the post. This isn't too difficult to add, but does get knarly.


   inlineFootnoteNumber := 0
   endFootnoteNumber := 0
   
   //...
   
      case '[':
          for {
             nextR, _, err := br.ReadRune()
             if err == io.EOF {
                break
             }
             if err != nil {
                log.Fatal("unable to read next rune:", err)
             }
             if nextR == ']' {
              // Check if the footnote is inline, or at the end of the document.
                nextR, _, err = br.ReadRune()
                // If the file ends there, assume it is an inline footnote.
                if err == io.EOF {
                   sb.WriteString(
                      "<a id=\"footnote–anchor–" +
                         strconv.Itoa(inlineFootnoteNumber) +
                         "\" href=\"#footnote–" +
                         strconv.Itoa(inlineFootnoteNumber) +
                         "\">[" + strconv.Itoa(inlineFootnoteNumber) +
                         "]</a>",
                   )
                   break
                }
                if err != nil {
                   log.Fatal("unable to read next rune:", err)
                }
                // Else, check if it is a footnote at the end of the blog or not.
                if nextR == ':' {
                   endFootnoteNumber++
                   sb.WriteString(
                      "<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n",
                   )
   
                   for nextR != '\n' {
                      nextR, _, err = br.ReadRune()
                      if err == io.EOF {
                         break
                      }
                      if err != nil {
                         log.Fatal("unable to read next rune:", err)
                      }
                      sb.WriteRune(nextR)
                   }
   
                sb.WriteString("\n</p>")
   
                } else {
                   sb.WriteString(
                      "<a id=\"footnote–anchor–" +
                         strconv.Itoa(inlineFootnoteNumber) +
                         "\" href=\"#footnote–" +
                         strconv.Itoa(inlineFootnoteNumber) +
                         "\">[" + strconv.Itoa(inlineFootnoteNumber) +
                         "]</a>",
                   )
                   err = br.UnreadRune()
                   if err != nil {
                      log.Fatal("unable to unread rune:", err)
                   }
                }
                break
   
             } else if nextR != '^' {
                inlineFootnoteNumber++
             }
          }
   

This fails, owing to the additional paragraph tags noted above:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:181: TestConvertMarkdownFileToBlogHTML test number: 29
           Test name: a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly
           expected:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
           </p>
           but got:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
           </p>
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Now a decision to make. One an either change the rules to have all footnotes at the end of the post be surrounded by paragraph tags, or when one reads in a \n rune, one can look ahead to see if the next character is for a footnote.

I will try the latter to start with.


   case '\n':
       //...
   
       if !thereIsACodeBlockOpen {
          //...
         if lastCharacterWasANewLine {
             nextR, _, err := br.ReadRune()
             if err == io.EOF {
                break
             }
             if err != nil {
                log.Fatal("unable to read rune:", err)
             }
             if nextR == '[' {
                err = br.UnreadRune()
                if err != nil {
                   log.Fatal("unable to unread rune:", err)
                }
                continue
             }
   
             sb.WriteString("<p>")
             thereIsAParagraphToClose = true
   
             err = br.UnreadRune()
             if err != nil {
                log.Fatal("unable to unread rune:", err)
             }
          } else {
             lastCharacterWasANewLine = true
          }
       }
       sb.WriteRune('\n')
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now for successive footnotes.


   Throwaway line
   
   This paragraph references a footnote.[^1]
   
   This paragraph[^2] also has a footnote.
   
   [^1]: This is the reference.
   [^2]: This is a footnote.
   


   Throwaway line
   <p>
   This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
   </p>
   <p>
   This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
   </p>
   <p id="footnote–1">
   <a href="#footnote–anchor–1">[1]</a>
    This is the reference.
   </p>
   <p id="footnote–1">
   <a href="#footnote–anchor–1">[1]</a>
    This is a footnote.
   </p>
   

Feedback on the next failed test is:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:186: TestConvertMarkdownFileToBlogHTML test number: 30
           Test name: successive footnotes in text and at the end should be numbered correctly
           expected:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p>
           This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
           </p>
           <p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            This is a footnote.
           </p>
           but got:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p>
           This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
   
           </p><p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is a footnote.
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

The numbering is, of course, off. There also is an additional space after the first footnote at the end, and no new line rune after it. This is created by a combination of the sb.WriteString("<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n") and the sb.WriteString("\n</p>") statements.

Let's remove the new line runes from both, and replace the hard–coded 1 with the value of endFootnoteNumber.


   case '[':
       for {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read next rune:", err)
          }
          if nextR == ']' {
   
             nextR, _, err = br.ReadRune()
             if err == io.EOF {
                sb.WriteString(
                   "<a id=\"footnote–anchor–" +
                      strconv.Itoa(inlineFootnoteNumber) +
                      "\" href=\"#footnote–" +
                      strconv.Itoa(inlineFootnoteNumber) +
                      "\">[" + strconv.Itoa(inlineFootnoteNumber) +
                      "]</a>",
                )
                break
             }
             if err != nil {
                log.Fatal("unable to read next rune:", err)
             }
             if nextR == ':' {
                endFootnoteNumber++
                sb.WriteString(
               "<p id=\"footnote–" +
                   strconv.Itoa(endFootnoteNumber) +
                   "\">\n<a href=\"#footnote–anchor–" +
                   strconv.Itoa(endFootnoteNumber) +
                   "\">[" +
                   strconv.Itoa(endFootnoteNumber) +
                   "]</a>",
                )
   
                for nextR != '\n' {
                   nextR, _, err = br.ReadRune()
                   if err == io.EOF {
                      break
                   }
                   if err != nil {
                      log.Fatal("unable to read next rune:", err)
                   }
                   sb.WriteRune(nextR)
                }
   
                sb.WriteString("</p>")
   
             } else {
                sb.WriteString(
                   "<a id=\"footnote–anchor–" +
                      strconv.Itoa(inlineFootnoteNumber) +
                      "\" href=\"#footnote–" +
                      strconv.Itoa(inlineFootnoteNumber) +
                      "\">[" + strconv.Itoa(inlineFootnoteNumber) +
                      "]</a>",
                )
                err = br.UnreadRune()
                if err != nil {
                   log.Fatal("unable to unread rune:", err)
                }
             }
   
             break
          } else if nextR != '^' {
             inlineFootnoteNumber++
          }
       }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:186: TestConvertMarkdownFileToBlogHTML test number: 29
           Test name: a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly
           expected:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
           </p>
           but got:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a> This is the reference.</p>
       main_test.go:186: TestConvertMarkdownFileToBlogHTML test number: 30
           Test name: successive footnotes in text and at the end should be numbered correctly
           expected:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p>
           This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
           </p>
           <p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            This is a footnote.
           </p>
           but got:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p>
           This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a> This is the reference.
           </p><p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a> This is a footnote.</p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

A–ha – regression!

Let's add the \n back in and try a different approach.

Moving the sb.WriteRune('\n') into the err == io.EOF statement fixes test 29, but does not resolve the issue with test 30.


   #...
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
           </p><p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            This is a footnote.
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Now, does this matter?

For the final output, a browser is not going to care if it reads </p><p> or </p>\n<p>. Both will render in the same way.

One solution, on the test side, is simply to get rid of all of the \n runes between these types of paragraph tags. As long as the new lines within code blocks and the like are kept as–is, it shouldn't be a problem.

However, this looks fixable without having to add this odd rig–ma–role to the testing loop. One simply needs to check if err == io.EOF after the losing paragraph tag has been added.


   for nextR != '\n' {
       nextR, _, err = br.ReadRune()
       if err == io.EOF {
          sb.WriteRune('\n')
          break
       }
       if err != nil {
          log.Fatal("unable to read next rune:", err)
       }
       sb.WriteRune(nextR)
   }
   
   sb.WriteString("</p>")
   if err == io.EOF {
       break
   }
   sb.WriteRune('\n')
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

There's one more footnote test I'd like to add before moving on. Many of these footnotes will contain URLs, and some of those may contain # symbols in them.


   Throwaway line
   
   This paragraph references a footnote.[^1]
   
   [^1]: This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.
   


   Throwaway line
   <p>
   This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
   </p>
   <p id="footnote–1">
   <a href="#footnote–anchor–1">[1]</a>
    This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.
   </p>
   


   {
       name:   "'#' in footnotes should not cause header tags to be added",
       input:  "Throwaway line\n\nThis paragraph references a footnote.[^1]\n\n[^1]: This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.",
       output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.\n</p>",
   },
   

Failure incoming!


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:191: TestConvertMarkdownFileToBlogHTML test number: 31
           Test name: '#' in footnotes should not cause header tags to be added
           expected:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.
           </p>
           but got:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Well, that is good to know. The # are being rendered correctly, so that isn't an issue. However, the HTML entities are not being replaced correctly.

This is easy to solve. Rather than just adding runes as–is, one can check if the run should be converted to an HTML entity.


   if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
       sb.WriteString(htmlEntityMap[nextR])
   } else {
       sb.WriteRune(nextR)
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

What about double–digit amounts of footnotes?


   Throwaway line
   
   [^1]
   [^2]
   [^3]
   [^4]
   [^5]
   [^6]
   [^7]
   [^8]
   [^9]
   [^10]
   [^11]
   [^12]
   
   [^1]: 1
   [^2]: 2
   [^3]: 3
   [^4]: 4
   [^5]: 5
   [^6]: 6
   [^7]: 7
   [^8]: 8
   [^9]: 9
   [^10]: 10
   [^11]: 11
   [^12]: 12
   


   Throwaway line
   
   <a id="footnote–anchor–1" href="#footnote–1">[1]</a>
   <a id="footnote–anchor–2" href="#footnote–2">[2]</a>
   <a id="footnote–anchor–3" href="#footnote–3">[3]</a>
   <a id="footnote–anchor–4" href="#footnote–4">[4]</a>
   <a id="footnote–anchor–5" href="#footnote–5">[5]</a>
   <a id="footnote–anchor–6" href="#footnote–6">[6]</a>
   <a id="footnote–anchor–7" href="#footnote–7">[7]</a>
   <a id="footnote–anchor–8" href="#footnote–8">[8]</a>
   <a id="footnote–anchor–9" href="#footnote–9">[9]</a>
   <a id="footnote–anchor–10" href="#footnote–10">[10]</a>
   <a id="footnote–anchor–11" href="#footnote–11">[11]</a>
   <a id="footnote–anchor–12" href="#footnote–12">[12]</a>
   
   <p id="footnote–1">
   <a href="#footnote–anchor–1">[1]</a>
    1
   </p>
   <p id="footnote–2">
   <a href="#footnote–anchor–2">[2]</a>
    2
   </p>
   <p id="footnote–3">
   <a href="#footnote–anchor–3">[3]</a>
    3
   </p>
   <p id="footnote–4">
   <a href="#footnote–anchor–4">[4]</a>
    4
   </p>
   <p id="footnote–5">
   <a href="#footnote–anchor–5">[5]</a>
    5
   </p>
   <p id="footnote–6">
   <a href="#footnote–anchor–6">[6]</a>
    6
   </p>
   <p id="footnote–7">
   <a href="#footnote–anchor–7">[7]</a>
    7
   </p>
   <p id="footnote–8">
   <a href="#footnote–anchor–8">[8]</a>
    8
   </p>
   <p id="footnote–9">
   <a href="#footnote–anchor–9">[9]</a>
    9
   </p>
   <p id="footnote–10">
   <a href="#footnote–anchor–10">[10]</a>
    10
   </p>
   <p id="footnote–11">
   <a href="#footnote–anchor–11">[11]</a>
    11
   </p>
   <p id="footnote–12">
   <a href="#footnote–anchor–12">[12]</a>
    12
   </p>
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:201: TestConvertMarkdownFileToBlogHTML test number: 32
           Test name: double–digit footnotes should be numbered correctly
           expected:
           Throwaway line
           <p>
           <a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           <a id="footnote–anchor–2" href="#footnote–2">[2]</a>
           <a id="footnote–anchor–3" href="#footnote–3">[3]</a>
           <a id="footnote–anchor–4" href="#footnote–4">[4]</a>
           <a id="footnote–anchor–5" href="#footnote–5">[5]</a>
           <a id="footnote–anchor–6" href="#footnote–6">[6]</a>
           <a id="footnote–anchor–7" href="#footnote–7">[7]</a>
           <a id="footnote–anchor–8" href="#footnote–8">[8]</a>
           <a id="footnote–anchor–9" href="#footnote–9">[9]</a>
           <a id="footnote–anchor–10" href="#footnote–10">[10]</a>
           <a id="footnote–anchor–11" href="#footnote–11">[11]</a>
           <a id="footnote–anchor–12" href="#footnote–12">[12]</a>
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            1
           </p>
           <p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            2
           </p>
           <p id="footnote–3">
           <a href="#footnote–anchor–3">[3]</a>
            3
           </p>
           <p id="footnote–4">
           <a href="#footnote–anchor–4">[4]</a>
            4
           </p>
           <p id="footnote–5">
           <a href="#footnote–anchor–5">[5]</a>
            5
           </p>
           <p id="footnote–6">
           <a href="#footnote–anchor–6">[6]</a>
            6
           </p>
           <p id="footnote–7">
           <a href="#footnote–anchor–7">[7]</a>
            7
           </p>
           <p id="footnote–8">
           <a href="#footnote–anchor–8">[8]</a>
            8
           </p>
           <p id="footnote–9">
           <a href="#footnote–anchor–9">[9]</a>
            9
           </p>
           <p id="footnote–10">
           <a href="#footnote–anchor–10">[10]</a>
            10
           </p>
           <p id="footnote–11">
           <a href="#footnote–anchor–11">[11]</a>
            11
           </p>
           <p id="footnote–12">
           <a href="#footnote–anchor–12">[12]</a>
            12
           </p>
           but got:
           Throwaway line
           <a id="footnote–anchor–1" href="#footnote–1">[1]</a><a id="footnote–anchor–2" href="#footnote–2">[2]</a><a id="footnote–anchor–3" href="#footnote–3">[3]</a><a id="footnote–anchor–4" href="#footnote–4">[4]</a><a id="footnote–anchor–5" href="#footnote–5">[5]</a><a id="footnote–anchor–6" href="#footnote–6">[6]</a><a id="footnote–anchor–7" href="#footnote–7">[7]</a><a id="footnote–anchor–8" href="#footnote–8">[8]</a><a id="footnote–anchor–9" href="#footnote–9">[9]</a><a id="footnote–anchor–11" href="#footnote–11">[11]</a><a id="footnote–anchor–13" href="#footnote–13">[13]</a><a id="footnote–anchor–15" href="#footnote–15">[15]</a><p>
   
           </p><p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            1
           </p>
           <p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            2
           </p>
           <p id="footnote–3">
           <a href="#footnote–anchor–3">[3]</a>
            3
           </p>
           <p id="footnote–4">
           <a href="#footnote–anchor–4">[4]</a>
            4
           </p>
           <p id="footnote–5">
           <a href="#footnote–anchor–5">[5]</a>
            5
           </p>
           <p id="footnote–6">
           <a href="#footnote–anchor–6">[6]</a>
            6
           </p>
           <p id="footnote–7">
           <a href="#footnote–anchor–7">[7]</a>
            7
           </p>
           <p id="footnote–8">
           <a href="#footnote–anchor–8">[8]</a>
            8
           </p>
           <p id="footnote–9">
           <a href="#footnote–anchor–9">[9]</a>
            9
           </p>
           <p id="footnote–10">
           <a href="#footnote–anchor–10">[10]</a>
            10
           </p>
           <p id="footnote–11">
           <a href="#footnote–anchor–11">[11]</a>
            11
           </p>
           <p id="footnote–12">
           <a href="#footnote–anchor–12">[12]</a>
            12
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

A few issues:

  • There are no new lines characters after the second in–text footnote.
  • The in–text footnotes start skipping numbers as soon as the double–digit numbers are reached.
  • There are some unnecessary paragraph tags between the in–line footnotes and the

For the first point, one can attempt to add a new line character when the start of a footnote has been found after a new line character.


   if nextR == '[' {
       err = br.UnreadRune()
       if err != nil {
          log.Fatal("unable to unread rune:", err)
       }
       sb.WriteRune('\n')
       continue
   }
   

This breaks some earlier tests. For example:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:201: TestConvertMarkdownFileToBlogHTML test number: 29
           Test name: a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly
           expected:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
           </p>
           but got:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
   
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
           </p>
   

Here, we return to the question of whether this difference matters. As noted before, there is no semantic difference on whether new line characters are included between closing and opening paragraph tags or not, it is more for the convenience of making the output from tests easier to read. The places where one needs to care about new line characters are within code blocks, as this affects how the code is displayed.

Therefore, for convenience I'm wiling to change the expected outputs from the tests to allow for an additional new line character between the </p> and <p ...> tags just before the footnotes at the end.

Four tests have to be updated to allow for this:


   {
       name:   "a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly",
       input:  "Throwaway line\n\nThis paragraph references a footnote.[^1]\n\n[^1]: This is the reference.",
       output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is the reference.\n</p>",
   },
   {
       name:   "successive footnotes in text and at the end should be numbered correctly",
       input:  "Throwaway line\n\nThis paragraph references a footnote.[^1]\n\nThis paragraph[^2] also has a footnote.\n\n[^1]: This is the reference.\n[^2]: This is a footnote.",
       output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n<p>\nThis paragraph<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a> also has a footnote.\n</p>\n\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is the reference.\n</p>\n<p id=\"footnote–2\">\n<a href=\"#footnote–anchor–2\">[2]</a>\n This is a footnote.\n</p>",
   },
   {
       name:   "'#' in footnotes should not cause header tags to be added",
       input:  "Throwaway line\n\nThis paragraph references a footnote.[^1]\n\n[^1]: This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.",
       output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.\n</p>",
   },
   {
       name:   "double–digit footnotes should be numbered correctly",
       input:  "Throwaway line\n\n[^1]\n[^2]\n[^3]\n[^4]\n[^5]\n[^6]\n[^7]\n[^8]\n[^9]\n[^10]\n[^11]\n[^12]\n\n[^1]: 1\n[^2]: 2\n[^3]: 3\n[^4]: 4\n[^5]: 5\n[^6]: 6\n[^7]: 7\n[^8]: 8\n[^9]: 9\n[^10]: 10\n[^11]: 11\n[^12]: 12",
       output: "Throwaway line\n<p>\n<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a>\n<a id=\"footnote–anchor–3\" href=\"#footnote–3\">[3]</a>\n<a id=\"footnote–anchor–4\" href=\"#footnote–4\">[4]</a>\n<a id=\"footnote–anchor–5\" href=\"#footnote–5\">[5]</a>\n<a id=\"footnote–anchor–6\" href=\"#footnote–6\">[6]</a>\n<a id=\"footnote–anchor–7\" href=\"#footnote–7\">[7]</a>\n<a id=\"footnote–anchor–8\" href=\"#footnote–8\">[8]</a>\n<a id=\"footnote–anchor–9\" href=\"#footnote–9\">[9]</a>\n<a id=\"footnote–anchor–10\" href=\"#footnote–10\">[10]</a>\n<a id=\"footnote–anchor–11\" href=\"#footnote–11\">[11]</a>\n<a id=\"footnote–anchor–12\" href=\"#footnote–12\">[12]</a>\n</p>\n\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n 1\n</p>\n<p id=\"footnote–2\">\n<a href=\"#footnote–anchor–2\">[2]</a>\n 2\n</p>\n<p id=\"footnote–3\">\n<a href=\"#footnote–anchor–3\">[3]</a>\n 3\n</p>\n<p id=\"footnote–4\">\n<a href=\"#footnote–anchor–4\">[4]</a>\n 4\n</p>\n<p id=\"footnote–5\">\n<a href=\"#footnote–anchor–5\">[5]</a>\n 5\n</p>\n<p id=\"footnote–6\">\n<a href=\"#footnote–anchor–6\">[6]</a>\n 6\n</p>\n<p id=\"footnote–7\">\n<a href=\"#footnote–anchor–7\">[7]</a>\n 7\n</p>\n<p id=\"footnote–8\">\n<a href=\"#footnote–anchor–8\">[8]</a>\n 8\n</p>\n<p id=\"footnote–9\">\n<a href=\"#footnote–anchor–9\">[9]</a>\n 9\n</p>\n<p id=\"footnote–10\">\n<a href=\"#footnote–anchor–10\">[10]</a>\n 10\n</p>\n<p id=\"footnote–11\">\n<a href=\"#footnote–anchor–11\">[11]</a>\n 11\n</p>\n<p id=\"footnote–12\">\n<a href=\"#footnote–anchor–12\">[12]</a>\n 12\n</p>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:201: TestConvertMarkdownFileToBlogHTML test number: 32
           Test name: double–digit footnotes should be numbered correctly
           expected:
           Throwaway line
           <p>
           <a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           <a id="footnote–anchor–2" href="#footnote–2">[2]</a>
           <a id="footnote–anchor–3" href="#footnote–3">[3]</a>
           <a id="footnote–anchor–4" href="#footnote–4">[4]</a>
           <a id="footnote–anchor–5" href="#footnote–5">[5]</a>
           <a id="footnote–anchor–6" href="#footnote–6">[6]</a>
           <a id="footnote–anchor–7" href="#footnote–7">[7]</a>
           <a id="footnote–anchor–8" href="#footnote–8">[8]</a>
           <a id="footnote–anchor–9" href="#footnote–9">[9]</a>
           <a id="footnote–anchor–10" href="#footnote–10">[10]</a>
           <a id="footnote–anchor–11" href="#footnote–11">[11]</a>
           <a id="footnote–anchor–12" href="#footnote–12">[12]</a>
           </p>
   
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            1
           </p>
           <p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            2
           </p>
           <p id="footnote–3">
           <a href="#footnote–anchor–3">[3]</a>
            3
           </p>
           <p id="footnote–4">
           <a href="#footnote–anchor–4">[4]</a>
            4
           </p>
           <p id="footnote–5">
           <a href="#footnote–anchor–5">[5]</a>
            5
           </p>
           <p id="footnote–6">
           <a href="#footnote–anchor–6">[6]</a>
            6
           </p>
           <p id="footnote–7">
           <a href="#footnote–anchor–7">[7]</a>
            7
           </p>
           <p id="footnote–8">
           <a href="#footnote–anchor–8">[8]</a>
            8
           </p>
           <p id="footnote–9">
           <a href="#footnote–anchor–9">[9]</a>
            9
           </p>
           <p id="footnote–10">
           <a href="#footnote–anchor–10">[10]</a>
            10
           </p>
           <p id="footnote–11">
           <a href="#footnote–anchor–11">[11]</a>
            11
           </p>
           <p id="footnote–12">
           <a href="#footnote–anchor–12">[12]</a>
            12
           </p>
           but got:
           Throwaway line
   
           <a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           <a id="footnote–anchor–2" href="#footnote–2">[2]</a>
           <a id="footnote–anchor–3" href="#footnote–3">[3]</a>
           <a id="footnote–anchor–4" href="#footnote–4">[4]</a>
           <a id="footnote–anchor–5" href="#footnote–5">[5]</a>
           <a id="footnote–anchor–6" href="#footnote–6">[6]</a>
           <a id="footnote–anchor–7" href="#footnote–7">[7]</a>
           <a id="footnote–anchor–8" href="#footnote–8">[8]</a>
           <a id="footnote–anchor–9" href="#footnote–9">[9]</a>
           <a id="footnote–anchor–11" href="#footnote–11">[11]</a>
           <a id="footnote–anchor–13" href="#footnote–13">[13]</a>
           <a id="footnote–anchor–15" href="#footnote–15">[15]</a><p>
   
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            1
           </p>
           <p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            2
           </p>
           <p id="footnote–3">
           <a href="#footnote–anchor–3">[3]</a>
            3
           </p>
           <p id="footnote–4">
           <a href="#footnote–anchor–4">[4]</a>
            4
           </p>
           <p id="footnote–5">
           <a href="#footnote–anchor–5">[5]</a>
            5
           </p>
           <p id="footnote–6">
           <a href="#footnote–anchor–6">[6]</a>
            6
           </p>
           <p id="footnote–7">
           <a href="#footnote–anchor–7">[7]</a>
            7
           </p>
           <p id="footnote–8">
           <a href="#footnote–anchor–8">[8]</a>
            8
           </p>
           <p id="footnote–9">
           <a href="#footnote–anchor–9">[9]</a>
            9
           </p>
           <p id="footnote–10">
           <a href="#footnote–anchor–10">[10]</a>
            10
           </p>
           <p id="footnote–11">
           <a href="#footnote–anchor–11">[11]</a>
            11
           </p>
           <p id="footnote–12">
           <a href="#footnote–anchor–12">[12]</a>
            12
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

There is still the issue that no paragraph tag has been included before the slew of in–text footnotes. This is an artifact of the [ check in the \n case, added earlier to avoid having <p> around the footnotes at the end of the post:


   if nextR == '[' {
       err = br.UnreadRune()
       if err != nil {
          log.Fatal("unable to unread rune:", err)
       }
       sb.WriteRune('\n')
       continue
   }
   

At this point, I noticed something rather annoying. lastCharacterWasANewLine is not set to false outside of the default case. At the end of every other statement, I added in a clause to ensure that this is the case.


   lastCharacterWasANewLine = false
   

Doing so doesn't break any tests (apart from the latest, which is already broken). However, it does change the output, removing the <p> tags completely.


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:201: TestConvertMarkdownFileToBlogHTML test number: 32
           Test name: double–digit footnotes should be numbered correctly
           expected:
           #...
           but got:
           Throwaway line
   
           <a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           <a id="footnote–anchor–2" href="#footnote–2">[2]</a>
           <a id="footnote–anchor–3" href="#footnote–3">[3]</a>
           <a id="footnote–anchor–4" href="#footnote–4">[4]</a>
           <a id="footnote–anchor–5" href="#footnote–5">[5]</a>
           <a id="footnote–anchor–6" href="#footnote–6">[6]</a>
           <a id="footnote–anchor–7" href="#footnote–7">[7]</a>
           <a id="footnote–anchor–8" href="#footnote–8">[8]</a>
           <a id="footnote–anchor–9" href="#footnote–9">[9]</a>
           <a id="footnote–anchor–11" href="#footnote–11">[11]</a>
           <a id="footnote–anchor–13" href="#footnote–13">[13]</a>
           <a id="footnote–anchor–15" href="#footnote–15">[15]</a>
   
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            1
           </p>
           <p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            2
           </p>
           <p id="footnote–3">
           <a href="#footnote–anchor–3">[3]</a>
            3
           </p>
           <p id="footnote–4">
           <a href="#footnote–anchor–4">[4]</a>
            4
           </p>
           <p id="footnote–5">
           <a href="#footnote–anchor–5">[5]</a>
            5
           </p>
           <p id="footnote–6">
           <a href="#footnote–anchor–6">[6]</a>
            6
           </p>
           <p id="footnote–7">
           <a href="#footnote–anchor–7">[7]</a>
            7
           </p>
           <p id="footnote–8">
           <a href="#footnote–anchor–8">[8]</a>
            8
           </p>
           <p id="footnote–9">
           <a href="#footnote–anchor–9">[9]</a>
            9
           </p>
           <p id="footnote–10">
           <a href="#footnote–anchor–10">[10]</a>
            10
           </p>
           <p id="footnote–11">
           <a href="#footnote–anchor–11">[11]</a>
            11
           </p>
           <p id="footnote–12">
           <a href="#footnote–anchor–12">[12]</a>
            12
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

With the current code, one can either have no paragraph tags around paragraphs made just of footnotes and none around the footnotes at the end, or have paragraph tags around both. Determining if the function is in the footnotes at the end requires finding a [ and then reading ahead by at least four runes. With the byte reader, one cannot unread a rune after having already unread a rune. Having decided to use a different data structure, such as a linked list, to store the output would allow for a lot of flexibility here: it would be possible to look back through the most recent nodes and replace them as needed rather than just working forwards.

At this point, I'm going to take a rather unsatisfying step back. The only lines in a blog post with start with a footnote number should be those at the end. Moreover, there shouldn't be any paragraphs of just footnote numbers in the text – that would be very strange to read. Adding these two assumptions allows me to remove the paragraph tags around the footnotes section of the expected answer in the test. One of three \n runes around the </p> also goes.


   {
       name:   "double–digit footnotes should be numbered correctly",
       input:  "Throwaway line\n\n[^1]\n[^2]\n[^3]\n[^4]\n[^5]\n[^6]\n[^7]\n[^8]\n[^9]\n[^10]\n[^11]\n[^12]\n\n[^1]: 1\n[^2]: 2\n[^3]: 3\n[^4]: 4\n[^5]: 5\n[^6]: 6\n[^7]: 7\n[^8]: 8\n[^9]: 9\n[^10]: 10\n[^11]: 11\n[^12]: 12",
       output: "Throwaway line\n\n<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a>\n<a id=\"footnote–anchor–3\" href=\"#footnote–3\">[3]</a>\n<a id=\"footnote–anchor–4\" href=\"#footnote–4\">[4]</a>\n<a id=\"footnote–anchor–5\" href=\"#footnote–5\">[5]</a>\n<a id=\"footnote–anchor–6\" href=\"#footnote–6\">[6]</a>\n<a id=\"footnote–anchor–7\" href=\"#footnote–7\">[7]</a>\n<a id=\"footnote–anchor–8\" href=\"#footnote–8\">[8]</a>\n<a id=\"footnote–anchor–9\" href=\"#footnote–9\">[9]</a>\n<a id=\"footnote–anchor–10\" href=\"#footnote–10\">[10]</a>\n<a id=\"footnote–anchor–11\" href=\"#footnote–11\">[11]</a>\n<a id=\"footnote–anchor–12\" href=\"#footnote–12\">[12]</a>\n\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n 1\n</p>\n<p id=\"footnote–2\">\n<a href=\"#footnote–anchor–2\">[2]</a>\n 2\n</p>\n<p id=\"footnote–3\">\n<a href=\"#footnote–anchor–3\">[3]</a>\n 3\n</p>\n<p id=\"footnote–4\">\n<a href=\"#footnote–anchor–4\">[4]</a>\n 4\n</p>\n<p id=\"footnote–5\">\n<a href=\"#footnote–anchor–5\">[5]</a>\n 5\n</p>\n<p id=\"footnote–6\">\n<a href=\"#footnote–anchor–6\">[6]</a>\n 6\n</p>\n<p id=\"footnote–7\">\n<a href=\"#footnote–anchor–7\">[7]</a>\n 7\n</p>\n<p id=\"footnote–8\">\n<a href=\"#footnote–anchor–8\">[8]</a>\n 8\n</p>\n<p id=\"footnote–9\">\n<a href=\"#footnote–anchor–9\">[9]</a>\n 9\n</p>\n<p id=\"footnote–10\">\n<a href=\"#footnote–anchor–10\">[10]</a>\n 10\n</p>\n<p id=\"footnote–11\">\n<a href=\"#footnote–anchor–11\">[11]</a>\n 11\n</p>\n<p id=\"footnote–12\">\n<a href=\"#footnote–anchor–12\">[12]</a>\n 12\n</p>",
   },
   

This is a little unsatisfying, as it is a second case of changing a test after running it. Overall, I'm comfortable with making this change as the end goal should still be achieved with it. If this was different, for example there was a paragraph of text with footnote numbers strewn throughout it and then the paragraph tags were not being added then there would be a very material change to the output.

Now for the counting of in–text footnotes. Currently, these are double–counted as the footnote number increases for each rune between ^ and ].


   } else if nextR != '^' {
       inlineFootnoteNumber++
   }
   

One fix is very simple. Increase the count each time the rune ^ is seen.


   } else if nextR == '^' {
       inlineFootnoteNumber++
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now, how about renumbering footnotes at the end, such that they align with the new footnote numbers provided earlier on. Here, I assume that the


   Throwaway line
   
   This paragraph references a footnote.[^2]
   
   This paragraph[^1] also has a footnote.
   
   [^1]: This is the reference.
   [^2]: This is a footnote.
   


   {
       name:   "footnotes at the end should be renumbered if footnotes in text were renumbered",
       input:  "Throwaway line\n\nThis paragraph references a footnote.[^2]\n\nThis paragraph[^1] also has a footnote.\n\n[^1]: This is the reference.\n[^2]: This is a footnote.",
       output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n<p>\nThis paragraph<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a> also has a footnote.\n</p>\n\n<p id=\"footnote–2\">\n<a href=\"#footnote–anchor–2\">[2]</a>\n This is the reference.\n</p>\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is a footnote.\n</p>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:201: TestConvertMarkdownFileToBlogHTML test number: 33
           Test name: footnotes at the end should be renumbered if footnotes in text were renumbered
           expected:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p>
           This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
           </p>
           <p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            This is the reference.
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is a footnote.
           </p>
           but got:
           Throwaway line
           <p>
           This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p>
           This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
           </p>
   
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            This is the reference.
           </p>
           <p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            This is a footnote.
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

A hash map can help match these up. First, by taking the value of the original footnote, and pairing it with the value it was updated to.


   footnoteNumberMap := map[int]int{}
   
   //...
   
      } else {
          sb.WriteString(
             "<a id=\"footnote–anchor–" +
                strconv.Itoa(inlineFootnoteNumber) +
                "\" href=\"#footnote–" +
                strconv.Itoa(inlineFootnoteNumber) +
                "\">[" + strconv.Itoa(inlineFootnoteNumber) +
                "]</a>",
          )
   
          footnoteOriginalNumber, err := strconv.Atoi(inTextFootnoteNumber.String())
          if err != nil {
             log.Fatal("unable to convert string to number:", err)
          }
          footnoteNumberMap[footnoteOriginalNumber] = inlineFootnoteNumber
      //...
      } else if nextR == '^' {
          inlineFootnoteNumber++
   
      } else if nextR != '^' {
          inTextFootnoteNumber.WriteRune(nextR)
      }
   

Rather than counting up the endFootnoteNumber as before, we take the value from inside the [^]: look up the value from the map.


   if nextR == ':' {
       footnoteNumber, err := strconv.Atoi(inTextFootnoteNumber.String())
       if err != nil {
          log.Fatal("unable to convert string to number:", err)
       }
   
       sb.WriteString(
          "<p id=\"footnote–" +
             strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
             "\">\n<a href=\"#footnote–anchor–" +
             strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
             "\">[" +
             strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
             "]</a>",
       )
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Unordered List

This will hopefully be straightforward.


   # Unordered List!
   
   – This is an unordered list with a – dash.
   – One,
   – Two,
   – Three.
   


   <h1> Unordered List!</h1>
   <p>
   <ul>
   <li> This is an unordered list with a – dash.</li>
   <li> One,</li>
   <li> Two,</li>
   <li> Three.</li>
   </ul>
   </p>
   


   {
       name:   "unordered lists should have <ul> tags and <li> tags",
       input:  "# Unordered List!\n\n– This is an unordered list with a – dash.\n– One,\n– Two,\n– Three.",
       output: "<h1> Unordered List!</h1>\n<p>\n<ul>\n<li> This is an unordered list with a – dash.</li>\n<li> One,</li>\n<li> Two,</li>\n<li> Three.</li>\n</ul>\n</p>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:206: TestConvertMarkdownFileToBlogHTML test number: 34
           Test name: unordered lists should have <ul> tags and <li> tags
           expected:
           <h1> Unordered List!</h1>
   
           <ul>
           <li> This is an unordered list with a – dash.</li>
           <li> One,</li>
           <li> Two,</li>
           <li> Three.</li>
           </ul>
           but got:
           <h1> Unordered List!</h1>
           <p>
           – This is an unordered list with a – dash.
           </p>
           – One,
           – Two,
           – Three.
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Here, one needs to distinguish between a which is simply in–text, and one which is the start of a list. The distinction is on whether a \n is followed by or not. The list ends when there are two \n in a row.

Let's try this.


   thereIsAnUnorderedListOpen := false
   //...
   
      case '\n':
          //fmt.Println("\n", thereIsACodeBlockOpen, lastCharacterWasANewLine)
          if headerCount > 0 {
             addClosingHeaderTag(&sb, headerCount)
             headerCount = 0
          }
   
          if !thereIsACodeBlockOpen {
             if thereIsAnUnorderedListOpen {
                sb.WriteRune('\n')
                sb.WriteString("</ul>")
                thereIsAnUnorderedListOpen = false
             }
   
         sb.WriteRune('\n')
         lastCharacterWasANewLine = true // <– this was missing from earlier
      //...
      case '–':
       if lastCharacterWasANewLine {
          if thereIsAnUnorderedListOpen {
             sb.WriteString("<li>")
   
           var nextR rune
           for nextR != '\n' {
               nextR, _, err := br.ReadRune()
               if err == io.EOF {
                  sb.WriteString("</li>")
                  break
               }
               if err != nil {
                  log.Fatal("unable to read ahead by one rune:", err)
               }
               if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
                  sb.WriteString(htmlEntityMap[nextR])
               } else {
                  sb.WriteRune(nextR)
               }
           }
           sb.WriteString("</li>")
             sb.WriteRune('\n')
   
          } else {
             sb.WriteString("<ul>")
             thereIsAnUnorderedListOpen = true
          }
       } else {
          sb.WriteString(htmlEntityMap[r])
       }
       lastCharacterWasANewLine = false
   

Here, I noticed, and corrected, the issue of not having lastCharacterWasANewLine set to true at the end of the case: '\n'.

A few tests broke here.


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:206: TestConvertMarkdownFileToBlogHTML test number: 21
           Test name: a multi–line code block with a directory structure within it should be rendered correctly
           expected:
           <pre><code>
           – dashboard
           | – frontend
           | – backend
           </code></pre>
           but got:
           <pre><code>
           <ul> dashboard
           | – frontend
           | – backend
           </code></pre>
   
       main_test.go:206: TestConvertMarkdownFileToBlogHTML test number: 22
           Test name: a multi–line code block with a directory structure within it should be rendered correctly
           expected:
           <pre><code>
           – dashboard
           | – frontend
           | – backend
           </code></pre>
           but got:
           <pre><code>
           <ul> dashboard
           | – frontend
           | – backend
           </code></pre>
   
       main_test.go:206: TestConvertMarkdownFileToBlogHTML test number: 34
           Test name: unordered lists should have <ul> tags and <li> tags
           expected:
           <h1> Unordered List!</h1>
           <p>
           <ul>
           <li> This is an unordered list with a – dash.</li>
           <li> One,</li>
           <li> Two,</li>
           <li> Three.</li>
           </ul>
           </p>
           but got:
           <h1> Unordered List!</h1>
           <p>
           <ul> This is an unordered list with a – dash.
           </ul>
           </p>
           <ul> One,
           </ul>
           <ul> Two,
           </ul>
           <ul> Three.
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

If there is a code block open, then \n can be followed by without starting an unordered list. There's also an issue where none of the <li> tags are added.

Let's fix the first one.


   case '–':
       if lastCharacterWasANewLine {
          if thereIsACodeBlockOpen {
             sb.WriteString("–")
             continue
          }
   

Now there's only the unordered list test to fix. This is quite a bit more involved.


   thereIsAnUnorderedListOpen := false
   
   //...
   
      case '\n':
          //...
   
          if !thereIsACodeBlockOpen {
             if thereIsAnUnorderedListOpen {
                sb.WriteString("</ul>")
                thereIsAnUnorderedListOpen = false
             }
   
      //...
      case '–':
          if thereIsAnUnorderedListOpen {
            if thereIsACodeBlockOpen {
                sb.WriteString("–")
                continue
             }
             sb.WriteString("<li>")
   
             var nextR rune
             for nextR != '\n' {
                nextR, _, err = br.ReadRune()
                if err == io.EOF {
                   break
                }
                if err != nil {
                   log.Fatal("unable to read ahead by one rune:", err)
                }
   
                if nextR == '\n' {
                   break
                }
                if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
                   sb.WriteString(htmlEntityMap[nextR])
                } else {
                   sb.WriteRune(nextR)
                }
             }
             sb.WriteString("</li>")
             sb.WriteRune('\n')
   
          } else if lastCharacterWasANewLine {
             if thereIsACodeBlockOpen {
                sb.WriteString("–")
                continue
             }
             if thereIsAnUnorderedListOpen {
                sb.WriteString("<li>")
   
                var nextR rune
                for nextR != '\n' {
                   nextR, _, err := br.ReadRune()
                   if err == io.EOF {
                      break
                   }
                   if err != nil {
                      log.Fatal("unable to read ahead by one rune:", err)
                   }
   
                   if nextR == '\n' {
                      break
                   }
                   if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
                      sb.WriteString(htmlEntityMap[nextR])
                   } else {
                      sb.WriteRune(nextR)
                   }
                }
                sb.WriteString("</li>")
                sb.WriteRune('\n')
   
             } else {
                sb.WriteString("<ul>")
                thereIsAnUnorderedListOpen = true
                sb.WriteRune('\n')
                sb.WriteString("<li>")
   
                var nextR rune
                for nextR != '\n' {
                   nextR, _, err = br.ReadRune()
                   if err == io.EOF {
                      break
                   }
                   if err != nil {
                      log.Fatal("unable to read ahead by one rune:", err)
                   }
   
                   if nextR == '\n' {
                      break
                   }
                   if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
                      sb.WriteString(htmlEntityMap[nextR])
                   } else {
                      sb.WriteRune(nextR)
                   }
                }
                sb.WriteString("</li>")
                sb.WriteRune('\n')
             }
          } else {
             sb.WriteString(htmlEntityMap[r])
          }
          lastCharacterWasANewLine = false
      //...
   
   if thereIsAnUnorderedListOpen {
       sb.WriteString("</ul>")
   }
   
   if thereIsAParagraphToClose {
       sb.WriteRune('\n')
       sb.WriteString("</p>")
   }
   
   return sb.String()
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

There is a lot of code here which needs to be refactored out later on.

Tables

Tables are all started with a |. Let's start with just a header. I assume that the entire set of table tags should be added if a table has been started.


   | Table | Head |
   


   <table class="table is–hoverable">
   <thead>
   <tr>
   <th> Table </th>
   <th> Head </th>
   </tr>
   </thead>
   <tbody>
   </tbody>
   </table>
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:211: TestConvertMarkdownFileToBlogHTML test number: 35
           Test name: the head of a table should be added correctly
           expected:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> Table </th>
           <th> Head </th>
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table>
           but got:
           | Table | Head |
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Let's try:


   case '|':
       sb.WriteString("<table class=\"table is–hoverable\">")
       sb.WriteRune('\n')
       sb.WriteString("<thead>")
       sb.WriteRune('\n')
       sb.WriteString("<tr>")
       sb.WriteRune('\n')
       sb.WriteString("<th>")
   
       var nextR rune
       for nextR != '\n' {
          nextR, _, err = br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read next rune:", err)
          }
          if nextR == '|' {
             nextR, _, err = br.ReadRune()
             if err == io.EOF {
                sb.WriteString("</th>")
                sb.WriteRune('\n')
                break
             }
             if err != nil {
                log.Fatal("unable to read next rune:", err)
             }
             if nextR == '\n' {
                break
             }
             err = br.UnreadRune()
             if err != nil {
                log.Fatal("unable to unread rune:", err)
             }
             sb.WriteString("</th>")
             sb.WriteRune('\n')
             sb.WriteString("<th scope=\"col\">")
          } else {
             sb.WriteRune(nextR)
          }
       }
       sb.WriteString("</tr>")
       sb.WriteRune('\n')
       sb.WriteString("</thead>")
       sb.WriteRune('\n')
       sb.WriteString("<tbody>")
       sb.WriteRune('\n')
       sb.WriteString("</tbody>")
       sb.WriteRune('\n')
       sb.WriteString("</table>")
   

On re–running the tests:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:211: TestConvertMarkdownFileToBlogHTML test number: 21
           Test name: a multi–line code block with a directory structure within it should be rendered correctly
           expected:
           <pre><code>
           – dashboard
           | – frontend
           | – backend
           </code></pre>
           but got:
           <pre><code>
           – dashboard
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> – frontend
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table><table class="table is–hoverable">
           <thead>
           <tr>
           <th> – backend
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table></code></pre>
       main_test.go:211: TestConvertMarkdownFileToBlogHTML test number: 22
           Test name: a multi–line code block with a directory structure within it should be rendered correctly
           expected:
           <pre><code>
           – dashboard
           | – frontend
           | – backend
           </code></pre>
           but got:
           <pre><code>
           – dashboard
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> – frontend
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table><table class="table is–hoverable">
           <thead>
           <tr>
           <th> – backend
   
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table></code></pre>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Oh dear. This both failed to account for | within code blocks. At least the condition is clear.


   case '|':
       if thereIsACodeBlockOpen {
          sb.WriteRune('|')
          continue
       }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now to add the border line after the header. I assume that this should be skipped, as the set of table tags is added when the table header is found.


   | Table | Head |
   |––|––|
   


   {
       name:   "the border line after the header of a table should be added correctly",
       input:  "| Table | Head |\n|––|––|",
       output: "<table class=\"table is–hoverable\">\n<thead>\n<tr>\n<th> Table </th>\n<th> Head </th>\n</tr>\n</thead>\n<tbody>\n</tbody>\n</table>",
   },
   

This will fail quite spectacularly.


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:216: TestConvertMarkdownFileToBlogHTML test number: 36
           Test name: the border line after the header of a table should be added correctly
           expected:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> Table </th>
           <th> Head </th>
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table>
           but got:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> Table </th>
           <th> Head </tr>
           </thead>
           <tbody>
           </tbody>
           </table><table class="table is–hoverable">
           <thead>
           <tr>
           <th>––</th>
           <th>––</th>
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

We can read past the border to the next new line character.


   case '|':
       //...
       for nextR != '\n' {
          //...
          if nextR == '|' {
             //...
             if nextR == '\n' {
                sb.WriteString("</th>")
                sb.WriteRune('\n')
                break
             }
             //...
       }
       sb.WriteString("</tr>")
       sb.WriteRune('\n')
       sb.WriteString("</thead>")
       sb.WriteRune('\n')
   
       var afterR rune
       for afterR != '\n' {
          afterR, _, err = br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read rune:", err)
          }
       }
   
       sb.WriteString("<tbody>")
       sb.WriteRune('\n')
       sb.WriteString("</tbody>")
       sb.WriteRune('\n')
       sb.WriteString("</table>")
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now for a table with content.


   | col name one | col name two |
   |–|–|
   | row contents one | row contents two |
   


   <table class="table table–hover">
   <thead>
   <tr>
   <th> col name one </th>
   <th> col name two </th>
   </tr>
   </thead>
   <tbody>
   <tr>
   <td> row contents one </td>
   <td> row contents two </td>
   </tr>
   </tbody>
   </table>
   

And now the output:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:221: TestConvertMarkdownFileToBlogHTML test number: 37
           Test name: simple tables without markdown characters in them should have the appropriate table tags added
           expected:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> col name one </th>
           <th> col name two </th>
           </tr>
           </thead>
           <tbody>
           <tr>
           <td> row contents one </td>
           <td> row contents two </td>
           </tr>
           </tbody>
           </table>
           but got:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> col name one </th>
           <th> col name two </th>
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table><table class="table is–hoverable">
           <thead>
           <tr>
           <th> row contents one </th>
           <th> row contents two </th>
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

For each line after this, one can read runes until an \n is found.


   sb.WriteString("<tbody>")
   sb.WriteRune('\n')
   
   nextR, _, err = br.ReadRune()
   if err == io.EOF {
       sb.WriteString("</tbody>")
       sb.WriteRune('\n')
       sb.WriteString("</table>")
       break
   }
   if err != nil {
       log.Fatal("unable to read next rune:", err)
   }
   if nextR == '|' {
       sb.WriteString("<tr>")
       sb.WriteRune('\n')
       sb.WriteString("<td>")
   
       numOfConsecutiveNewLines := 0
   
       for numOfConsecutiveNewLines < 2 {
          nextR, _, err = br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read next rune:", err)
          }
          if nextR == '\n' {
             sb.WriteString("</td>")
             sb.WriteRune('\n')
             sb.WriteString("</tr>")
             sb.WriteRune('\n')
             numOfConsecutiveNewLines++
   
          } else {
             if numOfConsecutiveNewLines == 1 {
                sb.WriteString("<td>")
             } else {
                if nextR == '|' {
                   sb.WriteString("</td>")
   
                   nextR, _, err = br.ReadRune()
                   if err == io.EOF {
                      sb.WriteRune('\n')
                      sb.WriteString("</tr>")
                      sb.WriteRune('\n')
                      break
                   }
                   if err != nil {
                      log.Fatal("unable to read next rune:", err)
                   }
                   sb.WriteRune('\n')
                   if nextR != '\n' {
                      sb.WriteString("<td>")
                   }
                   err = br.UnreadRune()
                   if err != nil {
                      log.Fatal("unable to unread rune:", err)
                   }
   
             } else {
                   sb.WriteRune(nextR)
                }
             }
             numOfConsecutiveNewLines = 0
          }
       }
   }
   
   sb.WriteString("</tbody>")
   sb.WriteRune('\n')
   sb.WriteString("</table>")
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

This will be simplified by extracting out much of this logic into a function.

Tables can contain HTML entities as well.


   | col name one | col name two |
   |–|–|
   | A non–entity / | Some entities – ' |
   | < More entities > | "And I quote..." |
   


   <table class="table is–hoverable">
   <thead>
   <tr>
   <th> col name one </th>
   <th> col name two </th>
   </tr>
   </thead>
   <tbody>
   <tr>
   <td> A non–entity / </td>
   <td> Some entities – ' </td>
   </tr>
   <tr>
   <td> < More entities > </td>
   <td> "And I quote..." </td>
   </tr>
   </tbody>
   </table>
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:226: TestConvertMarkdownFileToBlogHTML test number: 38
           Test name: a table with html entities should have them replaced
           expected:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> col name one </th>
           <th> col name two </th>
           </tr>
           </thead>
           <tbody>
           <tr>
           <td> A non–entity / </td>
           <td> Some entities – ' </td>
           </tr>
           <tr>
           <td> < More entities > </td>
           <td> "And I quote..." </td>
           </tr>
           </tbody>
           </table>
           but got:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> col name one </th>
           <th> col name two </th>
           </tr>
           </thead>
           <tbody>
           <tr>
           <td> A non–entity / </td>
           <td> Some entities – ' </td>
           </td>
           </tr>
           <td> < More entities > </td>
           <td> "And I quote..." </td>
           </tr>
           </tbody>
           </table>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

There's an additional </td> coming from somewhere, and one–too–few <tr> tags.

For the first, let's try removing the </tr> included when a new line character is found.


   for numOfConsecutiveNewLines < 2 {
       nextR, _, err = br.ReadRune()
       if err == io.EOF {
          break
       }
       if err != nil {
          log.Fatal("unable to read next rune:", err)
       }
       if nextR == '\n' {
          sb.WriteRune('\n')
          numOfConsecutiveNewLines++
       } else {
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:226: TestConvertMarkdownFileToBlogHTML test number: 38
           Test name: a table with html entities should have them replaced
           expected:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> col name one </th>
           <th> col name two </th>
           </tr>
           </thead>
           <tbody>
           <tr>
           <td> A non–entity / </td>
           <td> Some entities – ' </td>
           </tr>
           <tr>
           <td> < More entities > </td>
           <td> "And I quote..." </td>
           </tr>
           </tbody>
           </table>
           but got:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> col name one </th>
           <th> col name two </th>
           </tr>
           </thead>
           <tbody>
           <tr>
           <td> A non–entity / </td>
           <td> Some entities – ' </td>
           </tr>
           <td> < More entities > </td>
           <td> "And I quote..." </td>
           </tr>
           </tbody>
           </table>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

That seems to have solved it, and removed some redundancy.

Just a little further down in the code there is a check to add a new <td> if the last character was \n. This should also include a <tr> as a new table row starts after a single new line character.


   for numOfConsecutiveNewLines < 2 {
       nextR, _, err = br.ReadRune()
       //...
   
       } else {
          if numOfConsecutiveNewLines == 1 {
             sb.WriteString("<tr>")
             sb.WriteRune('\n')
             sb.WriteString("<td>")
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now what if a table has footnotes? I'll return to this after refactoring the code, as extracting the footnote–translating code into its own functions will make this significantly easier.

Images

One set of items I had forgotten to account for in the initial grammar were images.

This is a little trickier. Currently, I don't decide on whether an image should take the full width of a column, or if it should sit side–by–side with another image until actually submitting the post.

The former would suggest:

   
Markdown HTML
![[image_name.png]] <figure class="image"> <img src="/directory_name/image_name.png"> </figure>
hilst the latter *might* require:

   
Markdown HTML
![[image_name.png]] <div class="columns"> <div class="column"> <figure class="image is–5by4"> <img src="image_name.png"> </figure> </div> </div>

For now, I will assume the former, simpler case. The blog posts have relatively thin columns, and it will be in relatively particular cases that two images sit side–by–side.

The directory_name can be supplied by the user as a command–line argument. This will be added later on.


   {
       name:   "images should be placed into <figure> and <img> tags",
       input:  "![[image_name.png]]",
       output: "<figure class=\"image\">\n<img src=\"/directory_name/image_name.png\">\n</figure>",
   },
   


   // Default directory name. To be overwritten by user with
   // a command–line flag.
   var imageDirectoryName = "/directory_name"
   

Time to run the latest test:


   === RUN   TestConvertMarkdownFileToBlogHTML
   2024/02/03 13:40:42 unable to convert string to number:strconv.Atoi: parsing "[image_name.png": invalid syntax
   

As there is no case for the ! character, it is being read as a plain rune and the function then attempts to read the image as if it was a footnote. Time to add a new case.


   // Default directory name. To be overwritten by user with
   // a command–line flag.
   var imageDirectoryName = "/directory_name"
   
   //...
   
   case '!':
       // Assumes structure of ![[image_name.png]]
       nextR, _, err := br.ReadRune()
       if err == io.EOF {
          break
       }
       if err != nil {
          log.Fatal("unable to read rune:", err)
       }
       if nextR != '[' {
          // not an image
          continue
       }
   
       var imageNameAndExtension = strings.Builder{}
   
       for nextR != '\n' {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             break // to update for plain '!'
          }
          if err != nil {
             log.Fatal("unable to read rune:", err)
          }
          if nextR != '[' && nextR != ']' {
             imageNameAndExtension.WriteRune(nextR)
          }
       }
   
       sb.WriteString("<figure class=\"image\">")
       sb.WriteRune('\n')
       sb.WriteString("<img src=\"" + imageDirectoryName + "/" + imageNameAndExtension.String() + "\">")
       sb.WriteRune('\n')
       sb.WriteString("</figure>")
   

Here, once a ! is found, read the next character to see if it is an image. I assume that only images will have the combination of ![.

Let's see how this shakes out.


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:236: TestConvertMarkdownFileToBlogHTML test number: 34
           Test name: unordered lists should have <ul> tags and <li> tags
           expected:
           <h1> Unordered List!</h1>
           <p>
           <ul>
           <li> This is an unordered list with a – dash.</li>
           <li> One,</li>
           <li> Two,</li>
           <li> Three.</li>
           </ul>
           </p>
           but got:
           <h1> Unordered List</h1>
           <ul>
           <li> This is an unordered list with a – dash.</li>
           <li> One,</li>
           <li> Two,</li>
           <li> Three.</li>
           </ul>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

So, the new test passes, but the unordered list test now fails. This is due to the ! not being written as a plain character.


   if nextR != '[' {
       sb.WriteRune('!')
       err = br.UnreadRune()
       if err != nil {
          log.Fatal("unable to unread rune:", err)
       }
       continue
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now what about a ! at the end of a file?


   {
       name:   "! at the end of a file should be written correctly.",
       input:  "A sentence!",
       output: "A sentence!",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:241: TestConvertMarkdownFileToBlogHTML test number: 40
           Test name: ! at the end of a file should be written correctly.
           expected:
           A sentence!
           but got:
           A sentence
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

A small correction to the io.EOF condition fixes this.


   case '!':
       nextR, _, err := br.ReadRune()
       if err == io.EOF {
          sb.WriteRune('!')
          break
       }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

It is high–time to refactor the code.

Refactoring

Header

Simply extracting the header case from the main function takes us from:


   case '#':
       if !finishedCountingHeaderTagsForLine {
          headerCount++
   
       } else if finishedCountingHeaderTagsForLine {
          sb.WriteRune('#')
       }
       lastCharacterWasANewLine = false
   

To:


   case '#':
      countOpeningHeaderTagNumber(
          &sb,
          &finishedCountingHeaderTagsForLine,
          &headerCount,
          &lastCharacterWasANewLine,
      )
   
   //...
   
   func countOpeningHeaderTagNumber(sb *strings.Builder, finishedCountingHeaderTagsForLine *bool, headerCount *int, lastCharacterWasANewLine *bool) {
       if *finishedCountingHeaderTagsForLine {
          sb.WriteRune('#')
       } else {
          *headerCount++
       }
       *lastCharacterWasANewLine = false
   }
   

This passes the test suite:


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

However, there's an opportunity sitting on the sidewalk here. The header tag is still not added until the case ' ': is found, and multiple variables are passed into this function to provide context. One can check if a header function needs to be added at all, and if so undertake all the work to add header tags in the same function. The closing header tags are also in other places. There is no need to have this information strewn across multiple cases.

Before proceeding, I'll add one more test to the test suite, checking that HTML elements in a header at replaced correctly.


   {
       name:   "html elements in a header should be replaced correctly",
       input:  "# A header with html < > \" ' – elements",
       output: "<h1> A header with html < > " ' – elements</h1>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Good. Time to rewrite this.

First, to check if a header tag should be added at all. This should be if this is the first character in the file, or the last character was a new line. If the string builder is empty, I assume the byte reader is at the first character of the file.


   case '#':
       if sb.Len() == 0 {
          addHeaderTags(//...)
       } else if lastCharacterWasANewLine {
          addHeaderTags(//...)
       } else {
          sb.WriteRune('#')
       }
   

As the closing header tags will be added in the function, the headerCount and finishedCountingHeaderTagsForLine variables can be declared in there rather than in this outer function. lastCharacterWasANewLine is also not needed in this function, as it is decided by the coordinating function.


   func addHeaderTags(br *bytes.Reader, sb *strings.Builder) {
       var finishedCountingHeaderTagsForLine = false
       var headerCount = 1 // assume 1, as a '#' has been seen in order to get to here
       var nextR rune
   
       for nextR != '\n' {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read next rune in header:", err)
          }
          if !finishedCountingHeaderTagsForLine {
             if nextR == '#' {
                headerCount++
             }
             if nextR == ' ' {
                finishedCountingHeaderTagsForLine = true
                sb.WriteString("<h" + strconv.Itoa(headerCount) + ">")
                sb.WriteRune(' ')
             }
   
          } else if nextR == '\n' {
             err := br.UnreadRune()
             if err != nil {
                log.Fatal("unable to unread rune when adding header tags:", err)
             }
             break
   
         } else {
             sb.WriteRune(nextR)
          }
       }
   
       sb.WriteString("</h" + strconv.Itoa(headerCount) + ">")
   }
   

There was a bit of a regression here:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:246: TestConvertMarkdownFileToBlogHTML test number: 41
           Test name: html elements in a header should be replaced correctly
           expected:
           <h1> A header with html < > " ' – elements</h1>
           but got:
           <h1> A header with html < > " ' – elements</h1>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Throughout the function there is a repeating check of:


   if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
       sb.WriteString(htmlEntityMap[nextR])
   } else {
       sb.WriteRune(nextR)
   }
   

This can become its own function.


   func addRuneOrHTMLEntity(r rune, sb *strings.Builder) {
       if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, r) {
          sb.WriteString(htmlEntityMap[r])
       } else {
          sb.WriteRune(r)
       }
   }
   

All other instances can be replaced with a call to the function, together with the else { sb.WriteRune(nextR) } in addHeaderTags.


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now the headerCount and finishedCountingHeaderTagsForLine variables can be removed from the main function, together with any remaining header tag functions elsewhere. This simplifies the:

  • first io.EOF check in the for loop,
  • the case ' ':,
  • the case '\n':.


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

We can go a step further as well, and take the check for whether a # leads to a header into its own function too.


   case '#':
       addHeaderTagsOrPoundRune(br, &sb, lastCharacterWasANewLine)
   
   //...
   
   func addHeaderTagsOrPoundRune(br *bytes.Reader, sb*strings.Builder, lastCharacterWasANewLine bool){
       if sb.Len() == 0 {
          addHeaderTags(br, sb)
   
       } else if lastCharacterWasANewLine {
          addHeaderTags(br, sb)
   
       } else {
          sb.WriteRune('#')
       }
   }
   

The addClosingHeaderTag function is now redundant and can be removed.

Spaces

In extracting out the header case, the space case is now simple enough to leave as–is. The same type of simplification will naturally arise if code blocks are refactored out before other cases.


   case ' ':
       sb.WriteRune(r)
       lastCharacterWasANewLine = false
   

Code Blocks


   case '`':
       addCodeBlock(br, &sb, &numberOfCurrentBackQuotes, &thereIsACodeBlockOpen)
       lastCharacterWasANewLine = false
   
   //...
   
   func addCodeBlock(br *bytes.Reader, sb *strings.Builder, numberOfCurrentBackQuotes *int, thereIsACodeBlockOpen *bool) {
       *numberOfCurrentBackQuotes++
       nextR, _, err := br.ReadRune()
       if err == io.EOF {
          if *thereIsACodeBlockOpen {
             sb.WriteString("</code>")
             if *numberOfCurrentBackQuotes == 6 {
                sb.WriteString("</pre>")
             }
             *thereIsACodeBlockOpen = false
             *numberOfCurrentBackQuotes = 0
          }
          return
       }
       if err != nil {
          log.Fatal("unable to read next rune:", err)
       }
       if nextR == '`' {
          *numberOfCurrentBackQuotes++
          return
   
       } else {
          if *numberOfCurrentBackQuotes == 3 || *numberOfCurrentBackQuotes == 6 {
             if *thereIsACodeBlockOpen {
                sb.WriteString("</code></pre>")
                *thereIsACodeBlockOpen = false
                *numberOfCurrentBackQuotes = 0
             } else {
                sb.WriteString("<pre><code>")
                *thereIsACodeBlockOpen = true
   
                for nextR != '\n' {
                   nextR, _, err = br.ReadRune()
                   if err == io.EOF {
                      break
                   }
                   if err != nil {
                      log.Fatal("unable to read next rune:", err)
                   }
                }
             }
          } else {
             if *thereIsACodeBlockOpen {
                sb.WriteString("</code>")
                *thereIsACodeBlockOpen = false
                *numberOfCurrentBackQuotes = 0
             } else {
                sb.WriteString("<code>")
                *thereIsACodeBlockOpen = true
             }
          }
   
          err = br.UnreadRune()
          if err != nil {
             log.Fatal("unable to unread rune:", err)
          }
       }
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Time to improve upon this.

Neither numberOfCurrentBackQuotes nor thereIsACodeBlockOpen needs to be available to other parts of the for loop. They can be declared in addCodeBlock function.

To contain more of the code–block–specific logic in this function, it will loop over runes. The code can also be rearranged slightly to make it slightly easier to read. Additionally, the addRuneOrHTMLEntity function can be used to add all of the characters and HTML entities which needed to be added by other rune cases.


   func addCodeBlock(br *bytes.Reader, sb *strings.Builder) {
       var numberOfCurrentBackQuotes = 1
       var thereIsACodeBlockOpen = false
   
       for {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             if thereIsACodeBlockOpen {
                sb.WriteString("</code>")
                if numberOfCurrentBackQuotes == 6 {
                   sb.WriteString("</pre>")
                }
                thereIsACodeBlockOpen = false
                numberOfCurrentBackQuotes = 0
             }
             return
          }
          if err != nil {
             log.Fatal("unable to read next rune:", err)
          }
   
          if nextR == '`' {
             numberOfCurrentBackQuotes++
   
             if thereIsACodeBlockOpen {
                if numberOfCurrentBackQuotes == 2 {
                   sb.WriteString("</code>")
                   return
                }
             }
   
             if numberOfCurrentBackQuotes == 3 {
                sb.WriteString("<pre><code>")
                thereIsACodeBlockOpen = true
   
                for nextR != '\n' {
                   nextR, _, err = br.ReadRune()
                   if err == io.EOF {
                      break
                   }
                   if err != nil {
                      log.Fatal("unable to read next rune:", err)
                   }
                }
                sb.WriteRune('\n')
             }
   
             if numberOfCurrentBackQuotes == 6 {
                sb.WriteString("</code></pre>")
                return
             }
   
          } else {
             if !thereIsACodeBlockOpen {
                if numberOfCurrentBackQuotes == 1 {
                   sb.WriteString("<code>")
                   thereIsACodeBlockOpen = true
   
                } else if numberOfCurrentBackQuotes == 2 {
                   sb.WriteString("</code>")
                   return
                }
             }
   
             addRuneOrHTMLEntity(nextR, sb)
          }
       }
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now the thereIsACodeBlockOpen conditions littered through the different cases can be deleted. This simplifies:

  • case '\n':,
  • case '*':,
  • case '–':,
  • case '|':.

For example, the asterisk case changes from:


   case '*':
       if thereIsACodeBlockOpen {
          sb.WriteRune('*')
       } else {
          countAsterisks++
       }
       lastCharacterWasANewLine = false
   

To:


   case '*':
       countAsterisks++
       lastCharacterWasANewLine = false
   

Images

The logic for adding an image tag is relatively simple and can just be extracted out as–is for now. As it doesn't interact with any other cases, it passes tests as it.


   case '!':
       addImageTags(br, &sb)
   
   //...
   
   func addImageTags(br *bytes.Reader, sb *strings.Builder) {
       // Assumes structure of ![[image_name.png]]
       nextR, _, err := br.ReadRune()
       if err == io.EOF {
          sb.WriteRune('!')
          return
       }
       if err != nil {
          log.Fatal("unable to read rune:", err)
       }
       if nextR != '[' {
          sb.WriteRune('!')
          err = br.UnreadRune()
          if err != nil {
             log.Fatal("unable to unread rune:", err)
          }
          return
       }
   
       var imageNameAndExtension = strings.Builder{}
   
       for nextR != '\n' {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read rune:", err)
          }
          if nextR != '[' && nextR != ']' {
             imageNameAndExtension.WriteRune(nextR)
          }
       }
   
       sb.WriteString("<figure class=\"image\">")
       sb.WriteRune('\n')
       sb.WriteString("<img src=\"" + imageDirectoryName + "/" + imageNameAndExtension.String() + "\">")
       sb.WriteRune('\n')
       sb.WriteString("</figure>")
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Unordered List

As above, this will first be extracted out as–is to start with.


   case '–':
       addUnorderedList(br, &sb, r, &thereIsAnUnorderedListOpen, &lastCharacterWasANewLine)
       lastCharacterWasANewLine = false
   
   //...
   func addUnorderedList(br *bytes.Reader, sb *strings.Builder, r rune, thereIsAnUnorderedListOpen *bool, lastCharacterWasANewLine *bool) {
       var err error
       if *thereIsAnUnorderedListOpen {
          sb.WriteString("<li>")
   
          var nextR rune
          for nextR != '\n' {
             nextR, _, err = br.ReadRune()
             if err == io.EOF {
                break
             }
             if err != nil {
                log.Fatal("unable to read ahead by one rune:", err)
             }
   
             if nextR == '\n' {
                break
             }
             addRuneOrHTMLEntity(nextR, sb)
          }
          sb.WriteString("</li>")
          sb.WriteRune('\n')
   
       } else if *lastCharacterWasANewLine {
          if *thereIsAnUnorderedListOpen {
             sb.WriteString("<li>")
   
             var nextR rune
             for nextR != '\n' {
                nextR, _, err := br.ReadRune()
                if err == io.EOF {
                   break
                }
                if err != nil {
                   log.Fatal("unable to read ahead by one rune:", err)
                }
   
                if nextR == '\n' {
                   break
                }
                addRuneOrHTMLEntity(nextR, sb)
             }
             sb.WriteString("</li>")
             sb.WriteRune('\n')
   
          } else {
             sb.WriteString("<ul>")
             *thereIsAnUnorderedListOpen = true
             sb.WriteRune('\n')
             sb.WriteString("<li>")
   
             var nextR rune
             for nextR != '\n' {
                nextR, _, err = br.ReadRune()
                if err == io.EOF {
                   break
                }
                if err != nil {
                   log.Fatal("unable to read ahead by one rune:", err)
                }
   
                if nextR == '\n' {
                   break
                }
                addRuneOrHTMLEntity(nextR, sb)
             }
             sb.WriteString("</li>")
             sb.WriteRune('\n')
          }
       } else {
          sb.WriteString(htmlEntityMap[r])
       }
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

The rune r is only passed into to add an HTML entity if the last character was not a new line. Both of these can be extracted out rather than being passed into the function. thereIsAnUnorderedListOpen can also be declared within the function.


   case '–':
       if lastCharacterWasANewLine {
          addUnorderedList(br, &sb)
   
       } else {
          sb.WriteString(htmlEntityMap[r])
       }
       lastCharacterWasANewLine = false
   

addUnorderedList also needs to loop over runes. In doing this, the remaining code can be simplified.


   func addUnorderedList(br *bytes.Reader, sb *strings.Builder) {
       var lastCharacterWasANewLine = false
       var nextR rune
       var err error
   
       sb.WriteString("<ul>")
       sb.WriteRune('\n')
       sb.WriteString("<li>")
   
       for {
          nextR, _, err = br.ReadRune()
          if err == io.EOF {
             sb.WriteString("</li>")
             sb.WriteRune('\n')
             break
          }
          if err != nil {
             log.Fatal("unable to read rune when creating an unordered list:", err)
          }
   
          if nextR == '–' {
             if lastCharacterWasANewLine {
                sb.WriteString("<li>")
   
             } else {
                sb.WriteString(htmlEntityMap[nextR])
             }
             lastCharacterWasANewLine = false
   
          } else if nextR == '\n' {
             sb.WriteString("</li>")
             sb.WriteRune('\n')
             lastCharacterWasANewLine = true
   
          } else {
             addRuneOrHTMLEntity(nextR, sb)
             lastCharacterWasANewLine = false
          }
       }
   
       sb.WriteString("</ul>")
   }
   

The only way out of this new code is via an io.EOF error. Let's add a test.


   # Header
   
   – Unordered
   – List
   
   End of file.
   


   <h1> Header</h1>
   <p>
   <ul>
   <li> Unordered</li>
   <li> List</li>
   </ul>
   </p>
   <p>
   End of file.
   </p>
   

This confirms the above suspicion. There's also an unnecessary \n between list items.


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:251: TestConvertMarkdownFileToBlogHTML test number: 42
           Test name: unordered lists should have their tags closed correctly before the next piece of content
           expected:
           <h1> Header</h1>
           <p>
           <ul>
           <li> Unordered</li>
           <li> List</li>
           </p>
           <p>
           End of file.
           </p>
           but got:
           <h1> Header</h1>
           <p>
           <ul>
           <li> Unordered</li>
   
           <li> List</li>
           </li>
           End of file.</li>
           </ul>
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

A check for consecutive \n should catch this out.


   func addUnorderedList(br *bytes.Reader, sb *strings.Builder) {
       var lastCharacterWasANewLine = false
   
       sb.WriteString("<ul>")
       sb.WriteRune('\n')
       sb.WriteString("<li>")
   
       for {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             sb.WriteString("</li>")
             sb.WriteRune('\n')
             break
          }
          if err != nil {
             log.Fatal("unable to read rune when creating an unordered list:", err)
          }
   
          if nextR == '–' {
             if lastCharacterWasANewLine {
                sb.WriteString("<li>")
   
             } else {
                sb.WriteString(htmlEntityMap[nextR])
             }
             lastCharacterWasANewLine = false
   
          } else if nextR == '\n' {
             if lastCharacterWasANewLine {
                err = br.UnreadRune()
                if err != nil {
                   log.Fatal("unable to unread rune at end of unordered list:", err)
                }
                break
   
             } else {
                sb.WriteString("</li>")
                sb.WriteRune('\n')
                lastCharacterWasANewLine = true
             }
   
          } else {
             addRuneOrHTMLEntity(nextR, sb)
             lastCharacterWasANewLine = false
          }
       }
   
       sb.WriteString("</ul>")
   }
   

There's a new issue with paragraph tags, however. This will have to be solved as the new line character case is refactored.


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:251: TestConvertMarkdownFileToBlogHTML test number: 42
           Test name: unordered lists should have their tags closed correctly before the next piece of content
           expected:
           <h1> Header</h1>
           <p>
           <ul>
           <li> Unordered</li>
           <li> List</li>
           </ul>
           </p>
           <p>
           End of file.
           </p>
           but got:
           <h1> Header</h1>
           <p>
           <ul>
           <li> Unordered</li>
           <li> List</li>
           </ul>
           </p>
           End of file.
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

The case '–': can set lastCharacterWasANewLine to true after an unordered list has been added.


   case '–':
       if lastCharacterWasANewLine {
          addUnorderedList(br, &sb)
          lastCharacterWasANewLine = true
   
       } else {
          sb.WriteString(htmlEntityMap[r])
          lastCharacterWasANewLine = false
       }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:251: TestConvertMarkdownFileToBlogHTML test number: 42
           Test name: unordered lists should have their tags closed correctly before the next piece of content
           expected:
           <h1> Header</h1>
           <p>
           <ul>
           <li> Unordered</li>
           <li> List</li>
           </ul>
           </p>
           <p>
           End of file.
           </p>
           but got:
           <h1> Header</h1>
           <p>
           <ul>
           <li> Unordered</li>
           <li> List</li>
           </ul>
           </p><p>
           End of file.
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

This has arisen due to the </ul> being added within the logic rather than in the case '\n':, leaving one fewer \n to write between the paragraph tags. It is not possible to unread a rune twice. A tacked–on solution is to add thereIsAnUnorderedListOpen back in, to ensure that the new line character is accounted for. This will be followed for now, though it is admittedly inelegant.


   thereIsAnUnorderedListOpen := false
   //...
   
   case '\n':
       //...
          if thereIsAnUnorderedListOpen {
             sb.WriteRune('\n')
             thereIsAnUnorderedListOpen = false
          }
   
          sb.WriteString("<p>")
   //...
   
   case '–':
       if lastCharacterWasANewLine {
          addUnorderedList(br, &sb)
          thereIsAnUnorderedListOpen = true
          lastCharacterWasANewLine = true
   //...
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Tables

To start with, the table logic can be extracted out as–is.


   case '|':
       addTable(br, &sb)
   
   //...
   
   func addTable(br *bytes.Reader, sb *strings.Builder) {
       var err error
       sb.WriteString("<table class=\"table is–hoverable\">")
       sb.WriteRune('\n')
       sb.WriteString("<thead>")
       sb.WriteRune('\n')
       sb.WriteString("<tr>")
       sb.WriteRune('\n')
       sb.WriteString("<th>")
   
       var nextR rune
       for nextR != '\n' {
          nextR, _, err = br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read next rune:", err)
          }
          if nextR == '|' {
             nextR, _, err = br.ReadRune()
             if err == io.EOF {
                sb.WriteString("</th>")
                sb.WriteRune('\n')
                break
             }
             if err != nil {
                log.Fatal("unable to read next rune:", err)
             }
             if nextR == '\n' {
                sb.WriteString("</th>")
                sb.WriteRune('\n')
                break
             }
             err = br.UnreadRune()
             if err != nil {
                log.Fatal("unable to unread rune:", err)
             }
             sb.WriteString("</th>")
             sb.WriteRune('\n')
             sb.WriteString("<th>")
          } else {
             addRuneOrHTMLEntity(nextR, sb)
          }
       }
       sb.WriteString("</tr>")
       sb.WriteRune('\n')
       sb.WriteString("</thead>")
       sb.WriteRune('\n')
   
       var afterR rune
       for afterR != '\n' {
          afterR, _, err = br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read rune:", err)
          }
       }
   
       sb.WriteString("<tbody>")
       sb.WriteRune('\n')
   
       nextR, _, err = br.ReadRune()
       if err == io.EOF {
          sb.WriteString("</tbody>")
          sb.WriteRune('\n')
          sb.WriteString("</table>")
          return
       }
       if err != nil {
          log.Fatal("unable to read next rune:", err)
       }
       if nextR == '|' {
          sb.WriteString("<tr>")
          sb.WriteRune('\n')
          sb.WriteString("<td>")
   
          numOfConsecutiveNewLines := 0
   
          for numOfConsecutiveNewLines < 2 {
             nextR, _, err = br.ReadRune()
             if err == io.EOF {
                break
             }
             if err != nil {
                log.Fatal("unable to read next rune:", err)
             }
             if nextR == '\n' {
                sb.WriteString("</tr>")
                sb.WriteRune('\n')
                numOfConsecutiveNewLines++
   
             } else {
                if numOfConsecutiveNewLines == 1 {
                   sb.WriteString("<tr>")
                   sb.WriteRune('\n')
                   sb.WriteString("<td>")
   
                } else {
                   if nextR == '|' {
                      sb.WriteString("</td>")
   
                      nextR, _, err = br.ReadRune()
                      if err == io.EOF {
                         sb.WriteRune('\n')
                         sb.WriteString("</tr>")
                         sb.WriteRune('\n')
                         break
                      }
                      if err != nil {
                         log.Fatal("unable to read next rune:", err)
                      }
                      sb.WriteRune('\n')
                      if nextR != '\n' {
                         sb.WriteString("<td>")
                      }
                      err = br.UnreadRune()
                      if err != nil {
                         log.Fatal("unable to unread rune:", err)
                      }
   
                   } else {
                      addRuneOrHTMLEntity(nextR, sb)
                   }
                }
                numOfConsecutiveNewLines = 0
             }
          }
       }
   
       sb.WriteString("</tbody>")
       sb.WriteRune('\n')
       sb.WriteString("</table>")
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

This can further be broken down to have the header section and each table row be added in their own functions.


   func addTable(br *bytes.Reader, sb *strings.Builder) {
       sb.WriteString("<table class=\"table is–hoverable\">")
       sb.WriteRune('\n')
   
       addTableHeader(br, sb)
       skipTableHeaderLine(br)
       addTableBody(br, sb)
   
       sb.WriteString("</table>")
   }
   
   func addTableHeader(br *bytes.Reader, sb *strings.Builder) {
       sb.WriteString("<thead>")
       sb.WriteRune('\n')
       sb.WriteString("<tr>")
       sb.WriteRune('\n')
       sb.WriteString("<th>")
   
       var nextR rune
       var err error
       for nextR != '\n' {
          nextR, _, err = br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read next rune:", err)
          }
          if nextR == '|' {
             nextR, _, err = br.ReadRune()
             if err == io.EOF {
                sb.WriteString("</th>")
                sb.WriteRune('\n')
                break
             }
             if err != nil {
                log.Fatal("unable to read next rune:", err)
             }
             if nextR == '\n' {
                sb.WriteString("</th>")
                sb.WriteRune('\n')
                break
             }
             err = br.UnreadRune()
             if err != nil {
                log.Fatal("unable to unread rune:", err)
             }
             sb.WriteString("</th>")
             sb.WriteRune('\n')
             sb.WriteString("<th>")
   
          } else {
             addRuneOrHTMLEntity(nextR, sb)
          }
       }
       sb.WriteString("</tr>")
       sb.WriteRune('\n')
       sb.WriteString("</thead>")
       sb.WriteRune('\n')
   }
   
   func skipTableHeaderLine(br *bytes.Reader) {
       var afterR rune
       var err error
   
       for afterR != '\n' {
          afterR, _, err = br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read rune:", err)
          }
       }
   }
   
   func addTableBody(br *bytes.Reader, sb *strings.Builder) {
       sb.WriteString("<tbody>")
       sb.WriteRune('\n')
   
       nextR, _, err := br.ReadRune()
       if err == io.EOF {
          sb.WriteString("</tbody>")
          sb.WriteRune('\n')
          sb.WriteString("</table>")
          return
       }
       if err != nil {
          log.Fatal("unable to read next rune:", err)
       }
       if nextR == '|' {
          sb.WriteString("<tr>")
          sb.WriteRune('\n')
          sb.WriteString("<td>")
   
          numOfConsecutiveNewLines := 0
   
          for numOfConsecutiveNewLines < 2 {
             nextR, _, err = br.ReadRune()
             if err == io.EOF {
                break
             }
             if err != nil {
                log.Fatal("unable to read next rune:", err)
             }
             if nextR == '\n' {
                sb.WriteString("</tr>")
                sb.WriteRune('\n')
                numOfConsecutiveNewLines++
   
             } else {
                if numOfConsecutiveNewLines == 1 {
                   sb.WriteString("<tr>")
                   sb.WriteRune('\n')
                   sb.WriteString("<td>")
   
                } else {
                   if nextR == '|' {
                      sb.WriteString("</td>")
   
                      nextR, _, err = br.ReadRune()
                      if err == io.EOF {
                         sb.WriteRune('\n')
                         sb.WriteString("</tr>")
                         sb.WriteRune('\n')
                         break
                      }
                      if err != nil {
                         log.Fatal("unable to read next rune:", err)
                      }
                      sb.WriteRune('\n')
                      if nextR != '\n' {
                         sb.WriteString("<td>")
                      }
                      err = br.UnreadRune()
                      if err != nil {
                         log.Fatal("unable to unread rune:", err)
                      }
   
                   } else {
                      addRuneOrHTMLEntity(nextR, sb)
                   }
                }
                numOfConsecutiveNewLines = 0
             }
          }
       }
   
       sb.WriteString("</tbody>")
       sb.WriteRune('\n')
   }
   

A quick sanity check:


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:251: TestConvertMarkdownFileToBlogHTML test number: 35
           Test name: the head of a table should be added correctly
           expected:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> Table </th>
           <th> Head </th>
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table>
           but got:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> Table </th>
           <th> Head </th>
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table></table>
       main_test.go:251: TestConvertMarkdownFileToBlogHTML test number: 36
           Test name: the border line after the header of a table should be added correctly
           expected:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> Table </th>
           <th> Head </th>
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table>
           but got:
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> Table </th>
           <th> Head </th>
           </tr>
           </thead>
           <tbody>
           </tbody>
           </table></table>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

The </table> tag has doubled up. This is due to the io.EOF condition in addTableBody.


   if err == io.EOF {
       sb.WriteString("</tbody>")
       sb.WriteRune('\n')
       sb.WriteString("</table>")
       return
   }
   

Taking this out resolves the issue.


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Footnotes

To start with, I'll try to extract the footnote code out into its own function. However, I cannot pass a pointer to the footnote map along with it, as it isn't possible to index on a pointer to a map. Either the map could be copied into the function each time and the function could return an updated map, or the map can be moved into a higher scope and thus be available to both functions by default. For now, I've opted for the latter.


   var imageDirectoryName = "/directory_name"
   var footnoteNumberMap = map[int]int{}
   
   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
      //...
      case '[':
       addFootNote(br, &sb, &inlineFootnoteNumber)
       lastCharacterWasANewLine = false
       //...
   }
   
   func addFootNote(br *bytes.Reader, sb *strings.Builder, inlineFootnoteNumber *int) {
       inTextFootnoteNumber := strings.Builder{}
       for {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read next rune:", err)
          }
          if nextR == ']' {
   
             nextR, _, err = br.ReadRune()
             if err == io.EOF {
                sb.WriteString(
                   "<a id=\"footnote–anchor–" +
                      strconv.Itoa(*inlineFootnoteNumber) +
                      "\" href=\"#footnote–" +
                      strconv.Itoa(*inlineFootnoteNumber) +
                      "\">[" + strconv.Itoa(*inlineFootnoteNumber) +
                      "]</a>",
                )
                break
             }
             if err != nil {
                log.Fatal("unable to read next rune:", err)
             }
             if nextR == ':' {
                footnoteNumber, err := strconv.Atoi(inTextFootnoteNumber.String())
                if err != nil {
                   log.Fatal("unable to convert string to number:", err)
                }
   
                sb.WriteString(
                   "<p id=\"footnote–" +
                      strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
                      "\">\n<a href=\"#footnote–anchor–" +
                      strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
                      "\">[" +
                      strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
                      "]</a>",
                )
                sb.WriteRune('\n')
   
                for nextR != '\n' {
                   nextR, _, err = br.ReadRune()
                   if err == io.EOF {
                      sb.WriteRune('\n')
                      break
                   }
                   if err != nil {
                      log.Fatal("unable to read next rune:", err)
                   }
   
                   addRuneOrHTMLEntity(nextR, sb)
                }
   
                sb.WriteString("</p>")
                if err == io.EOF {
                   break
                }
                sb.WriteRune('\n')
   
             } else {
                sb.WriteString(
                   "<a id=\"footnote–anchor–" +
                      strconv.Itoa(*inlineFootnoteNumber) +
                      "\" href=\"#footnote–" +
                      strconv.Itoa(*inlineFootnoteNumber) +
                      "\">[" + strconv.Itoa(*inlineFootnoteNumber) +
                      "]</a>",
                )
   
                footnoteOriginalNumber, err := strconv.Atoi(inTextFootnoteNumber.String())
                if err != nil {
                   log.Fatal("unable to convert string to number:", err)
                }
                footnoteNumberMap[footnoteOriginalNumber] = *inlineFootnoteNumber
   
                err = br.UnreadRune()
                if err != nil {
                   log.Fatal("unable to unread rune:", err)
                }
             }
   
             break
          } else if nextR == '^' {
             *inlineFootnoteNumber++
   
          } else if nextR != '^' {
             inTextFootnoteNumber.WriteRune(nextR)
          }
       }
   }
   

Italics and Bold

The logic for bold and italics is currently spread across two cases.


   case '*':
       countAsterisks++
       lastCharacterWasANewLine = false
   //...
   default:
       lastCharacterWasANewLine = false
   
       if countAsterisks > 0 {
          if thereIsAnItalicsOrBoldTagToClose {
             if countAsterisks == 1 {
                sb.WriteString("</i>")
             } else if countAsterisks == 2 {
                sb.WriteString("</b>")
             } else if countAsterisks == 3 {
                sb.WriteString("</b></i>")
             }
             countAsterisks = 0
             thereIsAnItalicsOrBoldTagToClose = false
   
          } else {
             if countAsterisks == 1 {
                sb.WriteString("<i>")
             } else if countAsterisks == 2 {
                sb.WriteString("<b>")
             } else if countAsterisks == 3 {
                sb.WriteString("<i><b>")
             }
             countAsterisks = 0
             thereIsAnItalicsOrBoldTagToClose = true
          }
       }
   
       sb.WriteRune(r)
   }
   

This can all be put into the same function. Whilst reading ahead, I assume that any characters between the tags are to be added as–is or replaced with HTML entities, and that the only asterisks seen are those which mark the closing of the tags.


   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
      //...
      case '*':
          addItalicsAndOrBoldTags(br, &sb)
          lastCharacterWasANewLine = false
      //...
      default:
          lastCharacterWasANewLine = false
          sb.WriteRune(r)
       //...
   }
   //...
   
   func addItalicsAndOrBoldTags(br *bytes.Reader, sb *strings.Builder) {
       asteriskCount := 1
       asteriskCountNeededToCloseTags := 0
       stillCountingAsterisks := true
       italicsOrBoldTagOpen := false
   
       for {
          nextR, _, err := br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read ahead by one rune when adding italics or bold tags:", err)
          }
   
          if nextR == '*' {
             if stillCountingAsterisks {
                asteriskCount++
   
                if asteriskCount == asteriskCountNeededToCloseTags {
                   break
                }
   
             } else {
                addRuneOrHTMLEntity(nextR, sb)
             }
          }
   
          if nextR != '*' {
             if !italicsOrBoldTagOpen {
                stillCountingAsterisks = false
   
                switch asteriskCount {
                case 1:
                   sb.WriteString("<i>")
                   asteriskCountNeededToCloseTags = 1
                case 2:
                   sb.WriteString("<b>")
                   asteriskCountNeededToCloseTags = 2
                case 3:
                   sb.WriteString("<i><b>")
                   asteriskCountNeededToCloseTags = 3
                }
                asteriskCount = 0
   
                addRuneOrHTMLEntity(nextR, sb)
                italicsOrBoldTagOpen = true
   
             } else {
                stillCountingAsterisks = true
                addRuneOrHTMLEntity(nextR, sb)
             }
          }
       }
   
       switch asteriskCount {
       case 1:
          sb.WriteString("</i>")
       case 2:
          sb.WriteString("</b>")
       case 3:
          sb.WriteString("</b></i>")
       }
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

These assumptions can be broken by adding the following two tests. Here, I will skip implementing these for now and they are only highlighted for awareness.


   {
       name:   "italics tags should correctly surround text with an '*' in it which has spaces either side",
       input:  "*This text contains * an asterisk.*",
       output: "<i>This text contains * an asterisk.</i>",
   },
   {
       name:   "a solitary '*' at the end should not create an italics tag",
       input:  "*This text contains* an asterisk.*",
       output: "<i>This text contains</i> an asterisk.*",
   },
   

A Small Test File

Now to see if passing all of these small tests translates into a small file. There are a few extensions included in the below (such as allowing a list to have italicised or emboldened text). Let's see where it breaks.


   # Introduction
   
   ## A Small File
   
   This is a *small* file. It contains – neigh – requires the program to correctly translate a variety of different Obsidian Markdown elements into the HTML elements I want.
   
   ![[image_name.png]]
   
   For example:
   
   – paragraphs[^1]
   – "0 < 1"
   – "2 > 1"
   – **and**
   – ***headings***
   – `Code blocks`
   
   ```Pseudocode
   fn removeCharacterFromList(remList list, charToRemove char) list {
       match remList {
           case x::[]:
               match x {
                   charToRemove: []
                   _: x
               }
           case x::xs:
               match x {
                   charToRemove: removeCharacterFromList(xs, charToRemove)
                   _: x::removeCharacterFromList(xs, charToRemove)
               }
       }
   }
   
   removeCharacterFromList(['a', 'b', 'c'], 'a')
   ```
   
   ## A table conclusion
   
   Another footnote.[^2]
   
   | A table | must have | columns |
   |––|––|––|
   | and rows. | which may have an arbitrary amount of content | |
   
   [^1]: With footnotes!
   [^2]: Pseudocode.
   


   <h1> Introduction</h1>
   <p>
   </h2> A Small File</h2>
   </p>
   <p>
   This is a <i>small</i> file. It contains – neigh – requires the program to correctly translate a variety of different Obsidian Markdown elements into the HTML elements I want.
   </p>
   <p>
   <figure class="image">
   <img src="/directory_name/image_name.png">
   </figure>
   </p>
   <p>
   For example:
   </p>
   <p>
   <ul>
   <li> paragraphs<a id="footnote–anchor–1" href="#footnote–1">[1]</a></li>
   <li> "0 < 1"</li>
   <li> "2 > 1"</li>
   <li> <b>and</b></li>
   <li> <i><b>headings</b></i></li>
   <li> <code>Code blocks</code></li>
   </ul>
   </p>
   <p>
   <pre><code>
   fn removeCharacterFromList(remList list, charToRemove char) list {
       match remList {
           case x::[]:
               match x {
                   charToRemove: []
                   _: x
               }
           case x::xs:
               match x {
                   charToRemove: removeCharacterFromList(xs, charToRemove)
                   _: x::removeCharacterFromList(xs, charToRemove)
               }
       }
   }
   
   removeCharacterFromList(['a', 'b', 'c'], 'a')
   </code></pre>
   </p>
   <p>
   <h2> A table conclusion</h2>
   </p>
   <p>
   Another footnote.<a id="footnote–anchor–2" href="#footnote–2">[2]</a>
   </p>
   <p>
   <table class="table is–hoverable">
   <thead>
   <tr>
   <th> A table </th>
   <th> must have </th>
   <th> columns </th>
   </tr>
   </thead>
   <tbody>
   <tr>
   <td> and rows. </td>
   <td> which may have an arbitrary amount of content </td>
   <td> </td>
   </tr>
   </tbody>
   </table>
   </p>
   <p id="footnote–1">
   <a href="#footnote–anchor–1">[1]</a>
    With footnotes!
   </p>
   <p id="footnote–2">
   <a href="#footnote–anchor–2">[2]</a>
    Pseudocode.
   </p>
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:256: TestConvertMarkdownFileToBlogHTML test number: 43
           Test name: integration test: a small file
           expected:
           #...
           but got:
           <h1> Introduction</h1>
           <p>
           <h2> A Small File</h2>
           </p><p>
   
           </p><p>
           This is a <i>small</i> file. It contains – neigh – requires the program to correctly translate a variety of different Obsidian Markdown elements into the HTML elements I want.
           </p>
           <p>
           <figure class="image">
           <img src="/directory_name/image_name.png
   
           For example:
   
           – paragraphs^1
           – "0 < 1"
           – "2 > 1"
           – **and**
           – ***headings***
           – `Code blocks`
   
           ```Pseudocode
           fn removeCharacterFromList(remList list, charToRemove char) list {
               match remList {
                   case x:::
                       match x {
                           charToRemove:
                           _: x
                       }
                   case x::xs:
                       match x {
                           charToRemove: removeCharacterFromList(xs, charToRemove)
                           _: x::removeCharacterFromList(xs, charToRemove)
                       }
               }
           }
   
           removeCharacterFromList('a', 'b', 'c', 'a')
           ```
   
           ## A table conclusion
   
           Another footnote.^2
   
           | A table | must have | columns |
           |––|––|––|
           | and rows. | which may have an arbitrary amount of content | |
   
           ^1: With footnotes!
           ^2: Pseudocode.">
           </figure>
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Well, something is going wrong with the image tag.


   sb.WriteString("<img src=\"" + imageDirectoryName + "/" + imageNameAndExtension.String() + "\">")
   

The imageNameAndExtension variable is reading until the end. Let's add an additional way to break out of the loop.


   for nextR != '\n' {
       nextR, _, err := br.ReadRune()
       if err == io.EOF {
          break
       }
       if err != nil {
          log.Fatal("unable to read rune:", err)
       }
       if nextR != '[' && nextR != ']' {
          imageNameAndExtension.WriteRune(nextR)
       }
       if nextR == ']' {
          _, _, err = br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read last ] of an image:", err)
          }
          break
       }
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:256: TestConvertMarkdownFileToBlogHTML test number: 43
           Test name: integration test: a small file
           expected:
           #...
           but got:
           <h1> Introduction</h1>
           <p>
           <h2> A Small File</h2>
           </p><p>
   
           </p><p>
           This is a <i>small</i> file. It contains – neigh – requires the program to correctly translate a variety of different Obsidian Markdown elements into the HTML elements I want.
           </p>
           <p>
           <figure class="image">
           <img src="/directory_name/image_name.png">
           </figure>
           </p><p>
   
           </p><p>
           For example:
           </p>
           <p>
           <ul>
           <li> paragraphs[^1]</li>
           <li> "0 < 1"</li>
           <li> "2 > 1"</li>
           <li> **and**</li>
           <li> ***headings***</li>
           <li> Code blocks</li>
           </ul>
           </p>
           <p>
           <pre><code>
           fn removeCharacterFromList(remList list, charToRemove char) list {
               match remList {
                   case x::[]:
                       match x {
                           charToRemove: []
                           _: x
                       }
                   case x::xs:
                       match x {
                           charToRemove: removeCharacterFromList(xs, charToRemove)
                           _: x::removeCharacterFromList(xs, charToRemove)
                       }
               }
           }
   
           removeCharacterFromList(['a', 'b', 'c'], 'a')
           </code></pre>
           </p>
           <p>
           <h2> A table conclusion</h2>
           </p><p>
   
           </p><p>
           Another footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           <p>
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> A table </th>
   
           <th> must have </th>
           <th> columns </th>
   
           </tr>
           </thead>
           <tbody>
           <tr>
           <td> and rows. </td>
   
           <td> which may have an arbitrary amount of content </td>
           <td> </td>
           </tr>
           </tr>
           </tbody>
           </table><p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            With footnotes!
           </p>
           <p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            Pseudocode.
           </p>
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

That's not too bad. The unordered list, which contains a fair number of untested combinations, is the most egregious failure in this test case. There are some additional paragraph tags for new lines, which seem a little inconsistent.

Let's start with the unordered list. I'll extract out the main part to be tested from the last test, and comment it out for the moment.


   {
       name:   "an unordered list may contain italics tags, bold tags, and inline code blocks",
       input:  "For example:\n\n– paragraphs[^1]\n– \"0 < 1\"\n– \"2 > 1\"\n– **and**\n– ***headings***\n– `Code blocks`",
       output: "For example:\n<p>\n<ul>\n<li> paragraphs<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a></li>\n<li> "0 < 1"</li>\n<li> "2 > 1"</li>\n<li> <b>and</b></li>\n<li> <i><b>headings</b></i></li>\n<li> <code>Code blocks</code></li>\n</ul>\n</p>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:261: TestConvertMarkdownFileToBlogHTML test number: 43
           Test name: an unordered list may contain italics tags, bold tags, and inline code blocks
           expected:
           For example:
           <p>
           <ul>
           <li> paragraphs<a id="footnote–anchor–1" href="#footnote–1">[1]</a></li>
           <li> "0 < 1"</li>
           <li> "2 > 1"</li>
           <li> <b>and</b></li>
           <li> <i><b>headings</b></i></li>
           <li> <code>Code blocks</code></li>
           </ul>
           </p>
           but got:
           For example:
           <p>
           <ul>
           <li> paragraphs[^1]</li>
           <li> "0 < 1"</li>
           <li> "2 > 1"</li>
           <li> **and**</li>
           <li> ***headings***</li>
           <li> `Code blocks`</li>
           </ul>
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Superficially, this should be straightforward now that the footnote, italics, and code block logic are all contained in their own functions. The inlineFootnoteNumber needs to be accessible to this function as well as the the others. I'll move it into the outer scope. Note that this still needs to be initialised to 0 within convertMarkdownFileToBlogHTML to reset it between tests.


   var imageDirectoryName = "/directory_name"
   var footnoteNumberMap = map[int]int{}
   var inlineFootnoteNumber int
   
   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
      //...
       inlineFootnoteNumber = 0
      //...
   }
   //...
   
   func addFootNote(br *bytes.Reader, sb *strings.Builder) {
      //...
   }
   
   func addUnorderedList(br *bytes.Reader, sb *strings.Builder) {
       //...
   
       for {
          nextR, _, err := br.ReadRune()
          //...
          } else if nextR == '[' {
             addFootNote(br, sb, inlineFootnoteNumber)
   
          } else if nextR == '*' {
             addItalicsAndOrBoldTags(br, sb)
   
          } else if nextR == '`' {
             addCodeBlock(br, sb)
   
          } else {
             addRuneOrHTMLEntity(nextR, sb)
             lastCharacterWasANewLine = false
          }
          //...
   }
   

It turns out it was.


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now for the additional paragraph tags. These arise after images, and after h2 headings. We'll add test cases for each of these.


   # Introduction
   
   ![[image_name.png]]
   
   For example:
   


   <h1> Introduction</h1>
   <p>
   <figure class="image">
   <img src="/directory_name/image_name.png">
   </figure>
   </p>
   <p>
   For example:
   </p>
   

The most likely issue to cause this is an additional or lost new line character. Back in the addimageTags function, we read until and including the \n and write it into a string buffer. Let's read the additional \n for the empty line.


   for nextR != '\n' {
       nextR, _, err := br.ReadRune()
       if err == io.EOF {
          break
       }
       if err != nil {
          log.Fatal("unable to read rune:", err)
       }
       if nextR != '[' && nextR != ']' {
          imageNameAndExtension.WriteRune(nextR)
       }
       if nextR == ']' {
          _, _, err = br.ReadRune()
          if err == io.EOF {
             break
          }
          if err != nil {
             log.Fatal("unable to read last ] of an image:", err)
          }
          break
       }
   }
   _, _, err = br.ReadRune()
   if err != nil && err != io.EOF {
       log.Fatal("unable to read rune:", err)
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:271: TestConvertMarkdownFileToBlogHTML test number: 44
           Test name: paragraph tags should be added correctly after an image is added
           expected:
           <h1> Introduction</h1>
           <p>
           <figure class="image">
           <img src="/directory_name/image_name.png">
           </figure>
           </p>
           <p>
           For example:
           </p>
           but got:
           <h1> Introduction</h1>
           <p>
           <figure class="image">
           <img src="/directory_name/image_name.png">
           </figure>
           </p><p>
           For example:
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Well, that produces the correct number of paragraph tags, but now there isn't a new line character to write between them. A workable, though inelegant, solution would be to repurpose the thereIsAnUnorderedListOpen variable as a more general addNewLineCharBeforeOpeningPara.


   func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
       //...
       addNewLineCharBeforeOpeningPara := false
       //...
   
       for {
          //...
          case '!':
             addImageTags(br, &sb)
             lastCharacterWasANewLine = true
             addNewLineCharBeforeOpeningPara = true
          //...
   }
   


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Now to fix the paragraph tags after an h2 header.


   {
       name:   "paragraph tags should be added correctly after an h2 header",
       input:  "# Introduction\n\n## A Small File\n\nThis is a *small* file.",
       output: "<h1> Introduction</h1>\n<p>\n<h2> A Small File</h2>\n</p>\n<p>\nThis is a <i>small</i> file.\n</p>",
   },
   


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:276: TestConvertMarkdownFileToBlogHTML test number: 45
           Test name: paragraph tags should be added correctly after an h2 header
           expected:
           <h1> Introduction</h1>
           <p>
           <h2> A Small File</h2>
           </p>
           <p>
           This is a <i>small</i> file.
           </p>
           but got:
           <h1> Introduction</h1>
           <p>
           <h2> A Small File</h2>
           </p><p>
   
           </p><p>
           This is a <i>small</i> file.
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

Looking back over the cases, the case '#': does not update the lastCharacterWasANewLine variable.


   case '#':
       addHeaderTagsOrPoundRune(br, &sb, lastCharacterWasANewLine)
       lastCharacterWasANewLine = true
   

Doing so breaks another five tests.


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:276: TestConvertMarkdownFileToBlogHTML test number: 27
           Test name: a footnote in a paragraph should have paragraph and anchor tags added correctly
           expected:
           <h1> This is a heading</h1>
           <p>
           Here is a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
           but got:
           <h1> This is a heading</h1><p>
   
           </p><p>
           Here is a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
           </p>
       main_test.go:276: TestConvertMarkdownFileToBlogHTML test number: 28
       #...
   

There's an error in addHeaderTags. Here, upon hitting the \n character, the function unreads a rune. It should only have done this if it read the second new line character as well.


   func addHeaderTags(br *bytes.Reader, sb *strings.Builder) {
       //...
   
       for nextR != '\n' {
          //...
          } else if nextR == '\n' {
             err := br.UnreadRune()
             if err != nil {
                log.Fatal("unable to unread rune when adding header tags:", err)
             }
             break
   
         }
         //...
       }
   
       sb.WriteString("</h" + strconv.Itoa(headerCount) + ">")
   }
   

Removing this, so that the function simply breaks if \n is found leads to the current tests passing:


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Time to uncomment and rerun the small file test.


   === RUN   TestConvertMarkdownFileToBlogHTML
       main_test.go:276: TestConvertMarkdownFileToBlogHTML test number: 43
           Test name: integration test: a small file
           #...
           but got:
           #...
           <p>
           <table class="table is–hoverable">
           <thead>
           <tr>
           <th> A table </th>
   
           <th> must have </th>
           <th> columns </th>
   
           </tr>
           </thead>
           <tbody>
           <tr>
           <td> and rows. </td>
           <td> which may have an arbitrary amount of content </td>
           <td> </td>
           </tr>
           </tr>
           </tbody>
           </table><p id="footnote–1">
           <a href="#footnote–anchor–1">[1]</a>
            With footnotes!
           </p>
           <p id="footnote–2">
           <a href="#footnote–anchor–2">[2]</a>
            Pseudocode.
           </p>
           </p>
   ––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
   

It still fails. The paragraph closing tag, which should wrap the table, instead also wraps the footnotes at the end.

This was another case where lastCharacterWasANewLine had not been set to true. Correcting this doesn't break any more cases, but doesn't solve the small file test.


   case '|':
       addTable(br, &sb)
       lastCharacterWasANewLine = true
   

The addTableBody function failed to unread the new line character after the table.


   func addTableBody(br *bytes.Reader, sb *strings.Builder) {
       //...
       if nextR == '|' {
          //...
          for numOfConsecutiveNewLines < 2 {
             //...
             if nextR == '\n' {
                sb.WriteString("</tr>")
                sb.WriteRune('\n')
                numOfConsecutiveNewLines++
   
                if numOfConsecutiveNewLines == 2 {
                   err = br.UnreadRune()
                   if err != nil {
                      log.Fatal("unable to unread new line character after table:", err)
                   }
                   break
                }
   

Adding these two parts in closes the paragraph tag immediately after the table. However, it does not solve the test case as a whole. There is now an additional </tr> tag to remove.

Removing the </tr> tag can be solved by correcting the logic around the number of consecutive new line characters in addTableBody.


   if nextR == '\n' {
       numOfConsecutiveNewLines++
   
       if numOfConsecutiveNewLines < 2 {
          sb.WriteString("</tr>")
          sb.WriteRune('\n')
       }
   
       if numOfConsecutiveNewLines == 2 {
          err = br.UnreadRune()
          if err != nil {
             log.Fatal("unable to unread new line character after table:", err)
          }
          break
       }
   

Now the small test file passes.


   === RUN   TestConvertMarkdownFileToBlogHTML
   ––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
   PASS
   

Reading and Saving to a File

First, I'll create a tmp directory in the project, to store the program's output. I'll put an input.txt file in here, which the program will read from.


   – project
   | – main
     | – main.go
     | – main_test.go
   | – tmp
     | – input.txt
   

Then add a couple of functions to allow the program to read command line flags, create a byte reader, and save the results to a file. For now, I assume that the entire file can be read into memory. All carriage returns (\r), if they exist, are removed.


   func main() {
       pathName := os.Args
       br := getByteReadForFile(pathName[1])
       res := convertMarkdownFileToBlogHTML(br)
       saveToFile(res, pathName[2])
   }
   
   func getByteReadForFile(pathAndFilename string) *bytes.Reader {
       bytesReadIn, err := os.ReadFile(pathAndFilename)
       if err != nil {
          log.Fatal("unable to find file:", err)
       }
   
      // replace carriage returns
      bytesReadIn = bytes.ReplaceAll(bytesReadIn, []byte{'\r'}, []byte{})
   
       return bytes.NewReader(bytesReadIn)
   }
   
   // See: https://gobyexample.com/writing–files
   func saveToFile(res string, outputPathAndFileName string) {
       f, err := os.Create(outputPathAndFileName)
       if err != nil {
          log.Fatal("unable to create file:", err)
       }
       defer f.Close()
   
       numBytesWritten, err := f.WriteString(res)
       if err != nil {
          log.Fatal("error when writing to file:", err)
       }
       fmt.Printf("wrote %d bytes to file", numBytesWritten)
   
       f.Sync()
   }
   

On the command line, whilst in the project directory, run the following command:


   go run .\main\main.go .\tmp\input.txt .\tmp\output.html
   

This will cause the program to read whatever exists in input.txt and write it out to a file called output.html. If the latter doesn't exist yet, then it is created.

Let's update this to include the update value for the imageDirectoryName variable.


   func main() {
       pathName := os.Args
       br := getByteReadForFile(pathName[1])
       res := convertMarkdownFileToBlogHTML(br, pathName[3])
       saveToFile(res, pathName[2])
   }
   
   func convertMarkdownFileToBlogHTML(br *bytes.Reader, newImageDirectoryName string) string {
       //...
       imageDirectoryName = newImageDirectoryName
       //...
   }
   

This does require an update to main_test.go. We need to feed in the "/directory_name" string, as it is used in the current suite of tests.


   for i, tst := range testCases {
       res := convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(tst.input)), "/directory_name")
       if res != tst.output {
          t.Errorf(
             "TestConvertMarkdownFileToBlogHTML test number: %d \nTest name: %s \nexpected: \n%s \nbut got: \n%s",
             i, tst.name, tst.output, res,
          )
       }
   }
   


   go run .\main\main.go .\tmp\input.txt .\tmp\output.html /the_image_directory_path
   

Putting the small file test input into input.txt results in an HTML file. When dropped in a browser, it shows the following.

Forty–Six Tests Later

If anything, it is a little surprising that only forty–six tests were needed to create a program which could convert a small markdown file into HTML. Taking the test–first approach made this relatively quick to develop, however, and ensured that issues which arose during refactoring were all removed. It has also worked well enough for this post to be translated with it!

There were a few missing parts from the initial, slapdash grammar. For example, images were missing. These omissions were identified through creating test cases and added in without too much difficulty. A few potential extensions were not included, and have been listed below.

There should be an easier way to do this, however. There will be a follow–up post should I get it working.

Optional Extensions

A few extensions one could add include:

  • hyperlinks,
  • recursively allowing bold and italics tags within each other, as well as supporting underscores to indicate bold / italicised text,
  • allowing footnotes to be included in tables,
  • allowing footnotes to be included in code blocks,
  • creating a contents list with the header values.

Code

All of the above code is available on Github

A previous version which used a linked list is also available on GitHub

Update

This post has been edited to correct a broken link.

Footnotes

[1] Li, Shida, Erica Xu, Steph Ango, Liam Cain, Johannes Theiner, Matthew Meyers, Tony Grosinger, and Rebbecca Bishop. 'Obsidian', 2024. https://obsidian.md/.

[2] Soares dos Santos, Estevão. 'Showdown', 2019. https://showdownjs.com/.

[3] Thomas, Jeremy. 'Bulma.Io', 2024. https://bulma.io/.

[4] Wikipedia. 'Test–Driven Development', 28 January 2024. https://en.wikipedia.org/wiki/Test–driven_development; Beck, Kent. _Test–Driven Development: By Example_. The Addison–Wesley Signature Series. Boston: Addison–Wesley, 2003; Siddiqui, Saleem. _Learning Test–Driven Development: A Polyglot Guide to Writing Uncluttered Code_. Sebastopol, CA: O'Reilly Media, Inc, USA, 2021.

[5] Li, Shida, Erica Xu, Steph Ango, Liam Cain, Johannes Theiner, Matthew Meyers, Tony Grosinger, and Rebbecca Bishop. 'Basic Formatting Syntax', 2024. https://help.obsidian.md/Editing+and+formatting/Basic+formatting+syntax.

[6] Mozilla Corporation. 'CRLF', 2024. https://developer.mozilla.org/en–US/docs/Glossary/CRLF.

[7] 'Strings', 9 January 2024. https://pkg.go.dev/strings#Builder.

[8] dh1tw. 'Append Function Overwrites Existing Data in Slice', 31 October 2016. https://stackoverflow.com/questions/40343987/append–function–overwrites–existing–data–in–slice.

[9] Mozilla Corporation. 'Entity', 2024. https://developer.mozilla.org/en–US/docs/Glossary/Entity.

[10] Thomas, Jeremy. 'Content', 2024. https://bulma.io/documentation/elements/content/.