Introduction
Currently, I write these blog posts in Obsidian, owing to its clean interface and the relatively simple way in which images and tables can be created in markdown.[1] However, to display these posts on this blog, the markdown needs to be converted to HTML.
Doing this by hand is tedious, error-prone, and not an efficient use of time. Some options already exist, such as Showdown in JavaScript.[2] However, I wanted the HTML to contain class names so that it can take advantage of the CSS options provided by Bulma.[3]
Here, I follow a test-driven approach[4] to writing a simple program which takes in a markdown file and converts it into HTML. The program reads the input token-by-token and determines which rule to apply based on the tokens it sees. In a later post I will present a better way to do this with an existing tool.
Throughout the post, I highlight some instances where the initial grammar is incomplete, or not sufficiently understood, and some of the changes this brings about.
A First Pass at The Grammar
The grammar is a translation guide, stating how rules in one system are expressed in another.
The markdown rules followed by Obsidian are all listed on their website.[5] Those for Bulma are also listed on their website. Here, I am only concerned with a subset of the rules needed to display certain elements of text.
To start, I scanned posts which I've already written and picked out some rules.
Markdown | HTML |
---|---|
text; more text | text <p> more text </p> |
`text` | <code>text</code> |
```some_programming_lang text ``` | <code>text</code> |
| col name one | col name two | |-|-| | row contents one | row contents two | | <table class="table"> <thead> <tr> <th scope="col">col name one</th> <th scope="col">col name two</th> </tr> </thead> <tbody> <tr> <th scope="row">row contents one</th> <td>row contents two</td> </tr> </tbody> </table> |
# text | <h1>text</h1> |
## text | <h2>text</h2> |
[^1] | <a id="footnote–anchor–1" href="#footnote–1">[1]</a> |
[^1]: text | <p id="footnote–1"> <a href="#footnote–anchor–1">[1]</a> text </p> |
*text* | <i>text</i> |
**text** | <b>text</b> |
***text*** | <i><b>text</b></i> |
' | &#39; |
< | &lt; |
> | &gt; |
" | &quot; |
– | &ndash; |
– One, – Two, – Three | <ul> <li> One,</li> <li> Two,</li> <li> Three</li> </ul> |
All other characters | Add as–is |
There are a few assumptions made here, which are particular to the blog posts.
- I assume that all posts start with some type of heading. Therefore, whatever the first line is, it is not wrapped in <p></p> tags.
- At this point, I determined that the grammar was not going to be fully recursive, but had not yet figured out which parts would be. For example, I do not currently allow tables to exist within other tables. Moreover, '#' characters in URLs and lists do not create headings. However, there may be other cases where a recursive relationship should be allowed.
- The footnote numbers may not be in the correct order in the text. This may arise when editing the file to include a footnote, resulting in a case where footnote 2 appears before footnote 1. When output, however, they should be renumbered to be in the correct order.
These are subject to change, particularly as omissions (of which there are a couple) are drawn out by test cases.
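The footnote assumption can be illustrated on its own. The following is a minimal sketch rather than the approach taken in this post: it uses a regular expression instead of token-by-token reading, it assumes that "correct order" means order of first appearance, and renumberFootnotes is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// footnoteRef matches markdown footnote markers such as [^3].
var footnoteRef = regexp.MustCompile(`\[\^\d+\]`)

// renumberFootnotes rewrites footnote markers so that they are numbered
// by order of first appearance. Hypothetical helper, not from the post.
func renumberFootnotes(md string) string {
	seen := map[string]int{} // original label -> new number
	return footnoteRef.ReplaceAllStringFunc(md, func(marker string) string {
		label := marker[2 : len(marker)-1] // strip "[^" and "]"
		if _, ok := seen[label]; !ok {
			next := len(seen) + 1
			seen[label] = next
		}
		return "[^" + strconv.Itoa(seen[label]) + "]"
	})
}

func main() {
	fmt.Println(renumberFootnotes("First[^2] then[^1] and again[^2]."))
	// First[^1] then[^2] and again[^1].
}
```

Footnote definitions such as [^2]: text contain the same marker, so they are renumbered consistently, although this sketch does not reorder the definitions themselves.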
Approach
I took a token–by–token approach, where the rule to be applied is figured out based on the token at hand. An alternative would have been to feed in a string and replace parts of it with string manipulation functions. I avoided this as I wanted to have some assurance that certain rules were not applied in particular areas. For example, only HTML entities should be replaced within code blocks.
A note on new lines.
Windows systems use '\r\n' to mark a new line, whilst Linux systems simply use '\n'.[6] To ensure that this was caught when a new file is loaded, I initially stored test cases in text files. Note, however, that when comparing the results, if one output uses carriage returns and the other uses only new lines, the difference will not be visible in most printed output. Code for loading these files and running the tests is included in the main_test.go file on GitHub. Below, I will show tests within the code rather than using files.
New lines between HTML tags are ignored when checking the output of the interpreter against the solution. Browsers are agnostic to whether there are new lines between HTML tags, and tests may end up failing due to an additional line character being included between HTML tags.
For code blocks, however, new line characters do matter. Here, I have included the new lines.
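As a rough sketch of how such a comparison could be set up, the helper below converts Windows line endings and drops line breaks that sit directly between two tags. The name normaliseForComparison is an assumption for illustration and is not taken from main_test.go.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// newLinesBetweenTags matches a '>' followed by line breaks and then a '<'.
var newLinesBetweenTags = regexp.MustCompile(`>[\r\n]+<`)

// normaliseForComparison converts "\r\n" to "\n" and then removes any line
// breaks found directly between two HTML tags. Hypothetical helper.
func normaliseForComparison(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return newLinesBetweenTags.ReplaceAllString(s, "><")
}

func main() {
	got := "<p>\r\nParagraph one.\r\n</p>\r\n<p>\r\nParagraph two.\r\n</p>"
	want := "<p>\nParagraph one.\n</p><p>\nParagraph two.\n</p>"
	fmt.Println(normaliseForComparison(got) == normaliseForComparison(want))
	// true
}
```

This sketch treats all tags alike, so it would also remove line breaks between tags inside a code block; a fuller version would need to leave those untouched.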
Aside: Possible error if using a string slice.
Note that here I've used a string builder rather than a slice of runes.[7] This is to avoid an issue where earlier runes in the slice can be overwritten.[8] The error does not always arise, but when it does it can be rather annoying to debug. In a different version of this project, I noticed the following when print debugging:
../test_files/plain_text_over_multiple_lines.txt
rune matcher []
rune matcher [T]
rune matcher [T h]
rune matcher [T h i]
rune matcher [T h i s]
]une matcher [T h i s
check if paragraph
]T h i s
next r 105 i
]une matcher [T h i s
i]e matcher [T h i s
i s]matcher [T h i s
]i s matcher [T h i s
check if paragraph
]i s i s
next r 115 s
]i s matcher [T h i s
s]s matcher [T h i s
s o]matcher [T h i s
s o m]tcher [T h i s
s o m e]her [T h i s
s o m e ]r [T h i s
One can either use a string builder, or create a linked list which recursively appends strings at each node together to avoid this. The former is lighter, and thus used here.
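One well-known way for earlier values to be overwritten in Go is two append calls sharing the same backing array. The sketch below shows that mechanism in isolation; whether it was the exact cause in the earlier version of this project is an assumption, and appendToShared is a name invented for the example.

```go
package main

import "fmt"

// appendToShared appends two different runes to the same base slice and
// returns both results as strings.
func appendToShared() (string, string) {
	base := make([]rune, 3, 8) // length 3, with spare capacity
	copy(base, []rune("abc"))
	first := append(base, 'X')  // writes into the shared backing array
	second := append(base, 'Y') // overwrites the 'X' written above
	return string(first), string(second)
}

func main() {
	first, second := appendToShared()
	fmt.Println(first, second) // abcY abcY
}
```

A strings.Builder sidesteps this because it owns its buffer and only ever grows it through its own methods.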
File Structure
The directory for this will be very simple:
– main
| – main.go
| – main_test.go
Single–Criteria Tests
An empty line
The simplest input file is an empty one, which should also return an empty string.
main_test.go
package main
import (
"bytes"
"testing")
func TestConvertMarkdownFileToBlogHTML(t *testing.T) {
if convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(""))) != "" {
t.Error("expected an empty input string ")
}
}
Making main.go:
package main
import (
"bytes"
)
func main() {}
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
return ""
}
From the command line, in the ./main directory, the tests can be run with:
go test
Here, it passes with flying colours.
A line without markdown
Similarly, a single line of text without any markdown characters should also be returned as–is:
func TestConvertMarkdownFileToBlogHTML(t *testing.T) {
if convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(""))) != "" {
t.Error("expected an empty input string ")
}
if convertMarkdownFileToBlogHTML(bytes.NewReader([]byte("This is a plain text file."))) != "This is a plain text file." {
t.Errorf("expected %s to be returned as–is", "This is a plain text file.")}
}
With main.go as it stands, this immediately fails:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:13: expected This is a plain text file. to be returned as–is
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
FAIL
This is easily fixed with a little shenanigan.
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
_, _, err := br.ReadRune()
if err == io.EOF {
return ""
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
return "This is a plain text file."
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Headers
A header on a single line should be returned in header tags. At this point, it is worth creating a 'test' struct to put the inputs and expected outputs into.
type testCase struct {
input string
output string
}
func TestConvertMarkdownFileToBlogHTML(t *testing.T) {
testCases := []testCase{
{
input: "",
output: "",
},
{
input: "This is a plain text file.",
output: "This is a plain text file.",
},
{
input: "# This is an h1 header",
output: "<h1> This is an h1 header</h1>",
},
}
for i, tst := range testCases {
res := convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(tst.input)))
if res != tst.output {
t.Errorf(
"TestConvertMarkdownFileToBlogHTML test number: %d \nexpected: \n%s \nbut got: \n%s",
i, tst.output, res,
)
}
}
}
Back to failure mode:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:32: TestConvertMarkdownFileToBlogHTML test number: 2
expected:
<h1> This is an h1 header</h1>
but got:
This is a plain text file.
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Now one has to read at least the first rune. This is easy to add.
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
r, _, err := br.ReadRune()
if err == io.EOF {
return ""
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
if r == '#' {
return "<h1> This is an h1 header</h1>"
}
return "This is a plain text file."
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
What about an h2 header?
func TestConvertMarkdownFileToBlogHTML(t *testing.T) {
testCases := []testCase{
// ...
{
input: "## This is an h2 header",
output: "<h2> This is an h2 header</h2>",
},
}
for i, tst := range testCases {
res := convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(tst.input)))
if res != tst.output {
t.Errorf(
"TestConvertMarkdownFileToBlogHTML test number: %d \nexpected: \n%s \nbut got: \n%s",
i, tst.output, res,
)
}
}
}
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:36: TestConvertMarkdownFileToBlogHTML test number: 3
expected:
<h2> This is an h2 header</h2>
but got:
<h1> This is an h1 header</h1>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Now one has to read at least 16 runes. Let's try a solution which reads the whole line instead, using a string builder to accumulate the value to be returned. Beyond that, a solution can still be hacked together.
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
sb := strings.Builder{}
headerCount := 0
for {
r, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
if r == '#' {
if headerCount == 0 {
sb.WriteString("<h1>")
headerCount++
} else {
headerCount++
}
} else {
sb.WriteRune(r)
}
}
if headerCount > 0 {
sb.WriteString("</h1>")
}
res := sb.String()
if headerCount > 1 {
res = strings.ReplaceAll(res, "1", "2")
}
return res
}
An h3 test will require a slight adjustment:
//...
{
input: "### This is an h3 header",
output: "<h3> This is an h3 header</h3>",
},
//...
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:40: TestConvertMarkdownFileToBlogHTML test number: 4
expected:
<h3> This is an h3 header</h3>
but got:
<h2> This is an h3 header</h2>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
//...
if headerCount > 1 {
res = strings.ReplaceAll(res, "1", strconv.Itoa(headerCount))
}
//...
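It is worth noting that replacing every "1" in the output is fragile beyond the header tags. The following sketch isolates the hack; retagWithReplaceAll is a hypothetical name used only for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// retagWithReplaceAll mimics the hack above: every "1" in the string is
// replaced by the header level, not just the ones inside the tags.
func retagWithReplaceAll(html string, level int) string {
	return strings.ReplaceAll(html, "1", strconv.Itoa(level))
}

func main() {
	fmt.Println(retagWithReplaceAll("<h1> Chapter 1</h1>", 2))
	// <h2> Chapter 2</h2>
}
```

Any header whose text contains the digit 1 would be silently altered, which is one more reason to write the correct tag in the first place.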
Now, what if there are one or more '#' characters in the middle of the header string?
//...
{
input: "### This is an ### h3 ### header",
output: "<h3> This is an ### h3 ### header</h3>",
},
//...
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:44: TestConvertMarkdownFileToBlogHTML test number: 5
expected:
<h3> This is an ### h3 ### header</h3>
but got:
<h9> This is an h3 header</h9>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Well, that failed. It failed in a telling way as well: every '#' on the line was counted, giving nine in total, so the '1' in each header tag was replaced with '9' and the '#' characters in the middle of the string were dropped.
More condition checking and hack–ons will resolve this:
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
sb := strings.Builder{}
headerCount := 0
finishedCountingHeaderTagsForLine := false
for {
r, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
if r == '#' {
if headerCount == 0 {
sb.WriteString("<h1>")
headerCount++
} else if !finishedCountingHeaderTagsForLine {
headerCount++
} else if finishedCountingHeaderTagsForLine {
sb.WriteRune('#')
}
} else if r == ' ' && !finishedCountingHeaderTagsForLine {
sb.WriteRune(' ')
finishedCountingHeaderTagsForLine = true
} else {
sb.WriteRune(r)
}
}
if headerCount > 0 {
sb.WriteString("</h1>")
}
res := sb.String()
if headerCount > 1 {
res = strings.ReplaceAll(res, "1", strconv.Itoa(headerCount))
}
return res
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Pause to refactor
The string of if statements has the makings of a switch statement. Let's try replacing it.
//...
for {
r, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
switch r {
case '#':
if headerCount == 0 {
sb.WriteString("<h1>")
headerCount++
} else if !finishedCountingHeaderTagsForLine {
headerCount++
} else if finishedCountingHeaderTagsForLine {
sb.WriteRune('#')
}
case ' ':
if !finishedCountingHeaderTagsForLine {
sb.WriteRune(' ')
finishedCountingHeaderTagsForLine = true
} else {
sb.WriteRune(r)
}
default:
sb.WriteRune(r)
}
}
//...
Which can further be simplified:
//...
switch r {
case '#':
if !finishedCountingHeaderTagsForLine {
headerCount++
} else if finishedCountingHeaderTagsForLine {
sb.WriteRune('#')
}
case ' ':
if headerCount > 0 && !finishedCountingHeaderTagsForLine {
sb.WriteString("<h" + strconv.Itoa(headerCount) + "> ")
finishedCountingHeaderTagsForLine = true
} else {
sb.WriteRune(r)
}
default:
sb.WriteRune(r)
}
//...
The hacked-on strings.ReplaceAll() call at the end can also be refactored out. When there is a new line character, or the input ends, the header is complete.
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
sb := strings.Builder{}
headerCount := 0
finishedCountingHeaderTagsForLine := false
for {
r, _, err := br.ReadRune()
if err == io.EOF {
if headerCount > 0 {
sb.WriteString("</h" + strconv.Itoa(headerCount) + ">")
}
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
switch r {
case '#':
if !finishedCountingHeaderTagsForLine {
headerCount++
} else if finishedCountingHeaderTagsForLine {
sb.WriteRune('#')
}
case ' ':
if headerCount > 0 && !finishedCountingHeaderTagsForLine {
sb.WriteString("<h" + strconv.Itoa(headerCount) + "> ")
finishedCountingHeaderTagsForLine = true
} else {
sb.WriteRune(r)
}
case '\n':
if headerCount > 0 {
sb.WriteString("</h" + strconv.Itoa(headerCount) + ">")
}
default:
sb.WriteRune(r)
}
}
return sb.String()
}
There is some duplicated code which also can be refactored out:
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
//...
for {
r, _, err := br.ReadRune()
if err == io.EOF {
if headerCount > 0 {
addClosingHeaderTag(&sb, headerCount)
}
break
}
//...
case '\n':
if headerCount > 0 {
addClosingHeaderTag(&sb, headerCount)
}
//...
}
//...
}
func addClosingHeaderTag(sb *strings.Builder, headerCount int) {
sb.WriteString("</h" + strconv.Itoa(headerCount) + ">")
}
Note that a pointer to sb has to be passed as the argument to addClosingHeaderTag. If not, sb will be copied by value, resulting in the following panic:
=== RUN TestConvertMarkdownFileToBlogHTML
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
panic: strings: illegal use of non–zero Builder copied by value [recovered]
panic: strings: illegal use of non–zero Builder copied by value
As a pointer, all runs well:
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
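The difference between the two calls can be reproduced in isolation. This is a small sketch with hypothetical function names (writeToCopy, writeToPointer, demo), which recovers from the panic so that both cases can be shown side by side.

```go
package main

import (
	"fmt"
	"strings"
)

// writeToCopy receives the builder by value, so sb here is a copy.
// It reports whether writing to that copy panicked.
func writeToCopy(sb strings.Builder) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	sb.WriteString("</h1>")
	return false
}

// writeToPointer receives a pointer, so it writes to the caller's builder.
func writeToPointer(sb *strings.Builder) {
	sb.WriteString("</h1>")
}

// demo returns whether the by-value call panicked, and the final string
// after the by-pointer call.
func demo() (bool, string) {
	var sb strings.Builder
	sb.WriteString("<h1> Title")
	panicked := writeToCopy(sb)
	writeToPointer(&sb)
	return panicked, sb.String()
}

func main() {
	fmt.Println(demo()) // true <h1> Title</h1>
}
```

The panic only occurs once the builder has been written to; a builder which is still empty can be copied without complaint, which may be why the problem is easy to miss at first.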
The tests can also be given names, to provide more feedback for debugging.
func TestConvertMarkdownFileToBlogHTML(t *testing.T) {
testCases := []testCase{
{
name: "an empty string should be returned as–is",
input: "",
output: "",
},
{
name: "a single line with no markdown character should be returned as–is",
input: "This is a plain text file.",
output: "This is a plain text file.",
},
//...
}
for i, tst := range testCases {
res := convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(tst.input)))
if res != tst.output {
t.Errorf(
"TestConvertMarkdownFileToBlogHTML test number: %d \nTest name: %s \nexpected: \n%s \nbut got: \n%s",
i, tst.name, tst.output, res,
)
}
}
}
Plain text over multiple lines
So far, all of the input text has been on a single line. It is worth checking for text over multiple lines now and for paragraphs.
Note that in the testCases variable I have explicitly included \n characters in the string, rather than having the string itself be over multiple lines. This is to avoid introducing a lot of whitespace runes between different lines when it shouldn't be there. It does, however, come at the cost of legibility. To alleviate this somewhat, I will show the input and output both as they should be in the files being read from and written to, and then in the test cases in the code.
The output for the below text over multiple lines will be the same as the input.
This
is
some simple text
which has been spread out
across multiple lines.
//...
{
name: "text across multiple lines with no markdown should be returned as–is",
input: "This\nis\nsome simple text\nwhich has been spread out\nacross multiple lines.",
output: "This\nis\nsome simple text\nwhich has been spread out\nacross multiple lines.",
},
//...
The tests fail again:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:48: TestConvertMarkdownFileToBlogHTML test number: 6
expected:
This
is
some simple text
which has been spread out
across multiple lines.
but got:
Thisissome simple textwhich has been spread outacross multiple lines.
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
A one–line fix is all that is needed:
case '\n':
if headerCount > 0 {
addClosingHeaderTag(&sb, headerCount)
}
sb.WriteRune('\n')
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Paragraphs
Now with paragraph tags:
This is a first line.
Paragraph one.
This is a first line.
<p>
Paragraph one.
</p>
{
name: "paragraph tags should be added if there is a blank line before a line of plain text",
input: "This is a first line.\n\nParagraph one.",
output: "This is a first line.\n<p>\nParagraph one.\n</p>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:61: TestConvertMarkdownFileToBlogHTML test number: 7
Test name: paragraph tags should be added if there is a blank line before a line of plain text
expected:
This is a first line.
<p>
Paragraph one.
</p>
but got:
This is a first line.
Paragraph one.
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
If there are two successive \n characters, add an opening paragraph tag. At the next \n, add a closing paragraph tag.
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
//...
lastCharacterWasANewLine := false
thereIsAParagraphToClose := false
for {
// ...
case '\n':
if headerCount > 0 {
addClosingHeaderTag(&sb, headerCount)
}
sb.WriteRune('\n')
if thereIsAParagraphToClose {
sb.WriteString("</p>")
thereIsAParagraphToClose = false
}
if lastCharacterWasANewLine {
sb.WriteString("<p>")
thereIsAParagraphToClose = true
} else {
lastCharacterWasANewLine = true
}
default:
lastCharacterWasANewLine = false
sb.WriteRune(r)
}
}
if thereIsAParagraphToClose {
sb.WriteString("</p>")
}
//...
}
Which fails, ironically enough, due to new line characters:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:61: TestConvertMarkdownFileToBlogHTML test number: 7
Test name: paragraph tags should be added if there is a blank line before a line of plain text
expected:
This is a first line.
<p>
Paragraph one.
</p>
but got:
This is a first line.
<p>Paragraph one.</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
A little rearranging resolves this:
case '\n':
if headerCount > 0 {
addClosingHeaderTag(&sb, headerCount)
}
if thereIsAParagraphToClose {
sb.WriteRune('\n')
sb.WriteString("</p>")
thereIsAParagraphToClose = false
}
if lastCharacterWasANewLine {
sb.WriteString("<p>")
thereIsAParagraphToClose = true
} else {
lastCharacterWasANewLine = true
}
sb.WriteRune('\n')
//... then, after the for loop:
if thereIsAParagraphToClose {
sb.WriteRune('\n')
sb.WriteString("</p>")
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Now what about two paragraphs?
This is a first line.
Paragraph one.
Paragraph two.
This is a first line.
<p>
Paragraph one.
</p>
<p>
Paragraph two.
</p>
{
name: "paragraph tags should be added if there is a blank line before a line of plain text for multiple paragraphs",
input: "This is a first line.\n\nParagraph one.\n\nParagraph two.",
output: "This is a first line.\n<p>\nParagraph one.\n</p>\n<p>\nParagraph two.\n</p>",
},
This passed without issue with the code as–is:
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Italics & Bold
Italic text is, seemingly, straightforward:
*italic text*
<i>italic text</i>
Simply added:
thereIsAnItalicsTagToClose := false
//...
case '*':
if thereIsAnItalicsTagToClose {
sb.WriteString("</i>")
thereIsAnItalicsTagToClose = false
} else {
sb.WriteString("<i>")
thereIsAnItalicsTagToClose = true
}
What about bold tags?
**bold text**
<b>bold text</b>
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:76: TestConvertMarkdownFileToBlogHTML test number: 10
Test name: bold tags should be added if a string is surrounded by '**'
expected:
<b>bold text</b>
but got:
<i>bold text</i>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Well. That was to be expected.
One can try the same trick used with counting headers.
countAsterisks := 0
thereIsAnItalicsOrBoldTagToClose := false
//...
for {
//...
case '*':
countAsterisks++
default:
lastCharacterWasANewLine = false
if countAsterisks > 0 {
if thereIsAnItalicsOrBoldTagToClose {
if countAsterisks == 1 {
sb.WriteString("</i>")
} else if countAsterisks == 2 {
sb.WriteString("</b>")
}
countAsterisks = 0
thereIsAnItalicsOrBoldTagToClose = false
} else {
if countAsterisks == 1 {
sb.WriteString("<i>")
} else if countAsterisks == 2 {
sb.WriteString("<b>")
}
countAsterisks = 0
thereIsAnItalicsOrBoldTagToClose = true
}
}
}
// ...
if countAsterisks > 0 {
if countAsterisks == 1 {
sb.WriteString("</i>")
} else if countAsterisks == 2 {
sb.WriteString("</b>")
}
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
This passes, but the logic is now scattered across several places.
A quick add-on: what about text which is both italicised and emboldened?
***italic and bold text***
<i><b>italic and bold text</b></i>
if countAsterisks == 1 {
sb.WriteString("<i>")
} else if countAsterisks == 2 {
sb.WriteString("<b>")
} else if countAsterisks == 3 {
sb.WriteString("<i><b>")
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
There are some natural extensions to these test cases, such as having a lone asterisk within text between two other asterisks, bold text in the middle of italicised text or vice–versa, and so–forth. I will return to these trickier test cases later.
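One way to gather the scattered asterisk checks into a single place is a small lookup function. This is only a sketch based on the grammar table; emphasisTags is a hypothetical helper and is not part of the code in this post.

```go
package main

import "fmt"

// emphasisTags returns the opening and closing tags for a run of n
// asterisks, following the grammar table. ok is false for other counts.
func emphasisTags(n int) (opening, closing string, ok bool) {
	switch n {
	case 1:
		return "<i>", "</i>", true
	case 2:
		return "<b>", "</b>", true
	case 3:
		return "<i><b>", "</b></i>", true
	default:
		return "", "", false
	}
}

func main() {
	opening, closing, _ := emphasisTags(3)
	fmt.Println(opening + "italic and bold text" + closing)
	// <i><b>italic and bold text</b></i>
}
```

Returning both tags together keeps the closing order (bold before italic) next to the opening order, so the two cannot drift apart.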
HTML Entities
HTML entities are used to display reserved characters in HTML code.[9] Currently, I assume that there is only a subset of interest.
This is a file ' which is <filled> with – HTML "entities" of interest.
{
name: "reserved characters should be substituted with html entities",
input: "This is a file ' which is <filled> with – HTML \"entities\" of interest.",
output: "This is a file &#39; which is &lt;filled&gt; with &ndash; HTML &quot;entities&quot; of interest.",
},
Each of these is simply a case of looking up and replacing a character. A hash map is perfect for this.
var htmlEntityMap = map[rune]string{
'\'': "&#39;",
'<': "&lt;",
'>': "&gt;",
'"': "&quot;",
'–': "&ndash;",
}
Then in the for loop:
case '\'', '<', '>', '"', '–': // the single quote rune needs to be escaped
sb.WriteString(htmlEntityMap[r])
The tests now pass again.
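The lookup can also be tried on its own, outside the main loop. In the sketch below the exact entity strings (such as &#39; and &ndash;) are assumptions, and escapeEntities is a hypothetical stand-alone function rather than the code used in this post.

```go
package main

import (
	"fmt"
	"strings"
)

// htmlEntities maps reserved characters to entity strings.
// The specific entity strings are assumptions for this sketch.
var htmlEntities = map[rune]string{
	'\'': "&#39;",
	'<':  "&lt;",
	'>':  "&gt;",
	'"':  "&quot;",
	'–':  "&ndash;",
}

// escapeEntities replaces each rune found in htmlEntities with its entity
// and copies every other rune through unchanged.
func escapeEntities(s string) string {
	var sb strings.Builder
	for _, r := range s {
		if entity, ok := htmlEntities[r]; ok {
			sb.WriteString(entity)
		} else {
			sb.WriteRune(r)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(escapeEntities("<filled> with 'entities'"))
	// &lt;filled&gt; with &#39;entities&#39;
}
```

Checking the map inside a single branch means that adding a new entity later only requires a new map entry, not a new case.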
Code Blocks
There are two types of code blocks: inline and multi–line.
Here, one needs to decide whether empty code blocks should be skipped or not.
{
name: "an empty inline code block should be skipped",
input: "``",
output: "",
},
{
name: "an empty inline code block should produce empty code tags",
input: "``",
output: "<code></code>",
},
Superficially, the latter seems easier to code: as soon as a '`' appears, open a code block. With the former, one needs to read ahead to check whether there are one, three, or some other number of backquotes. On the flip side, the first test case allows one to avoid adding unnecessary code block tags.
The context here means that it doesn't really matter if there are additional code blocks with nothing between them. This program processes text files in a one–off manner. It doesn't process audio nor video where this type of optimisation is required.
For now, I assume that the former test case should be followed.
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:101: TestConvertMarkdownFileToBlogHTML test number: 13
Test name: an empty inline code block should be skipped
expected:
but got:
``
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
One can read ahead by a token. If it is also a backquote, don't insert any code block.
I assume here that all code blocks are closed, and that a code block does not open if a backquote is the final character.
thereIsACodeBlockOpen := false
//...
case '`':
nextR, _, err := br.ReadRune()
if err == io.EOF {
if thereIsACodeBlockOpen {
sb.WriteString("</code>")
thereIsACodeBlockOpen = false
}
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '`' {
continue
} else {
if thereIsACodeBlockOpen {
sb.WriteString("</code>")
thereIsACodeBlockOpen = false
} else {
sb.WriteString("<code>")
thereIsACodeBlockOpen = true
}
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Now a code block with some content:
{
name: "plain text between a code block should be kept as–is",
input: "`This is a simple inline code block`",
output: "<code>This is a simple inline code block</code>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:121: TestConvertMarkdownFileToBlogHTML test number: 14
Test name: plain text between a code block should be kept as–is
expected:
<code>This is a simple inline code block</code>
but got:
<code>his is a simple inline code block</code>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Oops – there is a rune being ignored.
//...
nextR, _, err := br.ReadRune()
//...
} else {
//...
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
}
//...
Sorted.
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
What about a code block in a sentence?
{
name: "code tags should be positioned correctly around an inline code block within another sentence",
input: "This is some text surrounding and `inline code block`.",
output: "This is some text surrounding and <code>inline code block</code>.",
},
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
No problem there. What if HTML entities exist within the code block? Here, they should still be updated.
{
name: "reserved characters within a code block should be replaced with HTML entities",
input: "This file contains `a code block` with `a number of ' <> – \" ` html entities in it.",
output: "This file contains <code>a code block</code> with <code>a number of &#39; &lt;&gt; &ndash; &quot; </code> html entities in it.",
},
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Again, no need to change any code.
Now for a multi-line code block. As I'm using the Bulma CSS framework, the multi-line code blocks have to be contained within <pre> tags.[10]
{
name: "multi–line plain text within a code block should be kept as–is",
input: "```programming_language\nThis is a multiline code block.\nLine one,\nLine two,\nLine three.\n```",
output: "<pre><code>\nThis is a multiline code block.\nLine one,\nLine two,\nLine three.\n</code></pre>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:121: TestConvertMarkdownFileToBlogHTML test number: 17
Test name: multi–line plain text within a code block should be kept as–is
expected:
<pre><code>
This is a multiline code block.
Line one,
Line two,
Line three.
</code></pre>
but got:
<code>programming_language
This is a multiline code block.
Line one,
Line two,
Line three.
</code>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Finally, the code has broken.
Now a new rule needs to be added. If a code block was opened with a trio of backquotes, then write <pre><code> and skip all other characters until the end of the line.
In the first instance, some more conditional statements were added:
//...
numberOfCurrentBackQuotes := 0
//...
case '`':
fmt.Println("number of `", numberOfCurrentBackQuotes)
fmt.Println(sb.String())
numberOfCurrentBackQuotes++
nextR, _, err := br.ReadRune()
if err == io.EOF {
if thereIsACodeBlockOpen {
sb.WriteString("</code>")
if numberOfCurrentBackQuotes == 3 {
sb.WriteString("</pre>")
}
thereIsACodeBlockOpen = false
numberOfCurrentBackQuotes = 0
}
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '`' {
numberOfCurrentBackQuotes++
continue
} else {
if numberOfCurrentBackQuotes == 3 {
if thereIsACodeBlockOpen {
sb.WriteString("</code></pre>")
thereIsACodeBlockOpen = false
numberOfCurrentBackQuotes = 0
} else {
sb.WriteString("<pre><code>")
thereIsACodeBlockOpen = true
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
}
}
} else {
if thereIsACodeBlockOpen {
sb.WriteString("</code>")
thereIsACodeBlockOpen = false
numberOfCurrentBackQuotes = 0
} else {
sb.WriteString("<code>")
thereIsACodeBlockOpen = true
}
}
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
}
Which doesn't work quite as intended:
=== RUN TestConvertMarkdownFileToBlogHTML
Test name: multi–line plain text within a code block should be kept as–is
expected:
<pre><code>
This is a multiline code block.
Line one,
Line two,
Line three.
</code></pre>
but got:
<pre><code>
This is a multiline code block.
Line one,
Line two,
Line three.
</code>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
The print statement explains why:
number of ` 5
<pre><code>
This is a multiline code block.
Line one,
Line two,
Line three.
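The counter is never reset after the opening trio of backquotes, and the print statement runs before the increment, so the closing trio brings the count to six. Changing the code to check for six backquotes as well solves this.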
//...
if err == io.EOF {
if thereIsACodeBlockOpen {
sb.WriteString("</code>")
if numberOfCurrentBackQuotes == 6 {
sb.WriteString("</pre>")
}
thereIsACodeBlockOpen = false
numberOfCurrentBackQuotes = 0
}
break
}
//...
} else {
if numberOfCurrentBackQuotes == 3 || numberOfCurrentBackQuotes == 6 {
if thereIsACodeBlockOpen {
sb.WriteString("</code></pre>")
thereIsACodeBlockOpen = false
numberOfCurrentBackQuotes = 0
} else {
sb.WriteString("<pre><code>")
thereIsACodeBlockOpen = true
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
}
}
}
//...
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
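The count reaching six hints at an alternative: count each run of backquotes separately, and push back the first rune which is not a backquote. The sketch below is one possible shape for this, not the code used in this post; countBackquotes and splitFence are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// countBackquotes reads consecutive '`' runes from br and returns how many
// were found. The first rune that is not a backquote is pushed back with
// UnreadRune so that the caller's main loop still sees it.
func countBackquotes(br *bytes.Reader) (int, error) {
	count := 0
	for {
		r, _, err := br.ReadRune()
		if err == io.EOF {
			return count, nil
		}
		if err != nil {
			return count, err
		}
		if r != '`' {
			return count, br.UnreadRune()
		}
		count++
	}
}

// splitFence is a small wrapper for demonstration: it reports the number of
// leading backquotes in s and the text left unread after them.
func splitFence(s string) (int, string) {
	br := bytes.NewReader([]byte(s))
	count, err := countBackquotes(br)
	if err != nil {
		return count, ""
	}
	rest, _ := io.ReadAll(br)
	return count, string(rest)
}

func main() {
	count, rest := splitFence("```go\nfmt.Println()")
	fmt.Printf("%d %q\n", count, rest) // 3 "go\nfmt.Println()"
}
```

With a count per run, an opening and a closing trio would each report three, so no special case for six would be needed.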
Paragraphs in code blocks are next:
{
name: "paragraphs of plain text within a code block should be kept as–is (without paragraph tags)",
input: "```some programming language\nThis is a line.\n\nHere is another line. It should not be in paragraph tags.\n\nA final line.\n```",
output: "<pre><code>\nThis is a line.\n\nHere is another line. It should not be in paragraph tags.\n\nA final line.\n</code></pre>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:126: TestConvertMarkdownFileToBlogHTML test number: 18
Test name: paragraphs of plain text within a code block should be kept as–is (without paragraph tags)
expected:
<pre><code>
This is a line.
Here is another line. It should not be in paragraph tags.
A final line.
</code></pre>
but got:
<pre><code>
This is a line.
<p>
Here is another line. It should not be in paragraph tags.
</p>
<p>
A final line.
</p>
</code></pre>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
The additional rule of not having paragraph tags within a code block is straightforward to add.
case '\n':
if headerCount > 0 {
addClosingHeaderTag(&sb, headerCount)
}
if thereIsAParagraphToClose {
sb.WriteRune('\n')
sb.WriteString("</p>")
thereIsAParagraphToClose = false
}
if !thereIsACodeBlockOpen {
if lastCharacterWasANewLine {
sb.WriteString("<p>")
thereIsAParagraphToClose = true
} else {
lastCharacterWasANewLine = true
}
}
sb.WriteRune('\n')
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
What about a paragraph with a code block in it?
This is a line.
Paragraph `with a code block` in it.
{
name: "a paragraph of plain text with an inline code block in it should wrap the <code> tags around it properly",
input: "This is a line.\n\nParagraph `with a code block` in it.",
output: "This is a line.\n<p>\nParagraph <code>with a code block</code> in it.\n</p>",
},
That works as–is. Still, it is worth keeping around for regression testing.
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
How about a paragraph with a multi–line code block within it?
This is a line.
Here is a multi–line code block:
```code
Line one,
Line two,
line three.
```
That's the end of the code block.
{
name: "a paragraph of plain text with an inline code block in it should wrap the <code> tags around it properly",
input: "This is a line.\n\nHere is a multi–line code block:\n\n```code\nLine one,\n\nLine two,\n\nline three.\n```\n\nThat's the end of the code block.",
output: "This is a line.\n<p>\nHere is a multi–line code block:\n</p>\n<p>\n<pre><code>\nLine one,\n\nLine two,\n\nline three.\n</code></pre>\n</p>\n<p>\nThat's the end of the code block.\n</p>",
},
Which fails:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:136: TestConvertMarkdownFileToBlogHTML test number: 20
Test name: a paragraph of plain text with an inline code block in it should wrap the <code> tags around it properly
expected:
This is a line.
<p>
Here is a multi–line code block:
</p>
<p>
<pre><code>
Line one,
Line two,
line three.
</code></pre>
</p>
<p>
That's the end of the code block.
</p>
but got:
This is a line.
<p>
Here is a multi–line code block:
</p>
<p>
<pre><code>
</p>
Line one,
Line two,
line three.
</code></pre>
<p>
That's the end of the code block.
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
There was a slight flaw in the logic for the new lines in the paragraph case. This can be corrected by moving the 'thereIsAParagraphToClose' conditional within the '!thereIsACodeBlockOpen' conditional.
case '\n':
if headerCount > 0 {
addClosingHeaderTag(&sb, headerCount)
}
if !thereIsACodeBlockOpen {
if thereIsAParagraphToClose {
sb.WriteRune('\n')
sb.WriteString("</p>")
thereIsAParagraphToClose = false
}
if lastCharacterWasANewLine {
sb.WriteString("<p>")
thereIsAParagraphToClose = true
} else {
lastCharacterWasANewLine = true
}
}
sb.WriteRune('\n')
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
What about a code block with a directory structure in it?
```
– dashboard
| – frontend
| – backend
```
<code>
– dashboard
| – frontend
| – backend
</code>
{
name: "a multi–line code block with a directory structure within it should be rendered correctly",
input: "```\n– dashboard \n| – frontend \n| – backend \n```",
output: "<pre><code>\n– dashboard\n| – frontend\n| – backend\n</code></pre>",
},
This fails, though it seemingly gives the correct output:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:141: TestConvertMarkdownFileToBlogHTML test number: 21
Test name: a multi–line code block with a directory structure within it should be rendered correctly
expected:
<pre><code>
– dashboard
| – frontend
| – backend
</code></pre>
but got:
<pre><code>
– dashboard
| – frontend
| – backend
</code></pre>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
The keen–eyed amongst you will have noticed the trailing space characters in the test input, which are not included in the expected output. These were added by mistake when copying text back–and–forth between Obsidian and the IDE. This test can be split into two: one with the trailing spaces, and one without, to ensure that both cases are accounted for.
{
name: "a multi–line code block with a directory structure within it should be rendered correctly",
input: "```\n– dashboard \n| – frontend \n| – backend \n```",
output: "<pre><code>\n– dashboard \n| – frontend \n| – backend \n</code></pre>",
},
{
name: "a multi–line code block with a directory structure within it should be rendered correctly",
input: "```\n– dashboard\n| – frontend\n| – backend\n```",
output: "<pre><code>\n– dashboard\n| – frontend\n| – backend\n</code></pre>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
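Trailing whitespace is invisible when the expected and actual strings are printed as they are. A minimal sketch of one way to make it visible, assuming the failure message is built with fmt: the %q verb quotes each string and escapes new line runes, so a trailing space shows up just before the \n. The helper name describeMismatch is hypothetical and is not part of the test code in this post.

```go
package main

import "fmt"

// describeMismatch formats the expected and actual strings with %q so that
// trailing spaces and new line runes are visible in the failure message.
// (Hypothetical helper, for illustration only.)
func describeMismatch(expected, got string) string {
	return fmt.Sprintf("expected: %q\nbut got:  %q", expected, got)
}

func main() {
	expected := "- dashboard\n| - frontend\n"
	got := "- dashboard \n| - frontend \n"
	if expected != got {
		fmt.Println(describeMismatch(expected, got))
	}
}
```

With the strings quoted, the space before each \n in the second line of the message stands out immediately.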
What about a code block which has asterisks in it, for instance when code is imported?
```js
import * as echarts from 'echarts';
```
{
name: "asterisks within a code block should be left as–is",
input: "```js\nimport * as echarts from 'echarts';\n```",
output: "<pre><code>\nimport * as echarts from 'echarts';\n</code></pre>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:151: TestConvertMarkdownFileToBlogHTML test number: 23
Test name: asterisks within a code block should be left as–is
expected:
<pre><code>
import * as echarts from 'echarts';</code></pre>
but got:
<pre><code>
import <i>as echarts from 'echarts';
</code></pre>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
This is easily solved:
case '*':
if thereIsACodeBlockOpen {
sb.WriteRune('*')
} else {
countAsterisks++
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
There are a lot of 'if in a code block' conditionals littered throughout the switch statement. In most cases, it is to break out of the normal flow of that case and skip adding characters.
There are a couple more code block–related test cases to add, mainly due to being in combination with other markdown rules. I'll add those first before refactoring out.
Footnotes
The footnotes that I've been using are listed as:
[^1]
[^1]: This is the footnote
Let's start with just the in–text footnote number.
Here is a footnote.[^1]
For these, I'll use anchor tags, so that a viewer can click on the footnote number to jump to the end of the post. It would also be useful if a viewer could click on the footnote number at the end and jump back up to where the footnote is in the text. Each footnote will need to have its own id to jump back to.
{
name: "inline footnotes should be replaced with <a id=\"footnote–anchor–n\" href=\"#footnote–n\">[n]</a>",
input: "Here is a footnote.[^1]",
output: "Here is a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:156: TestConvertMarkdownFileToBlogHTML test number: 24
Test name: inline footnotes should be replaced with <a id=\"footnote–anchor–n\" href=\"#footnote–n\">[n]</a>
expected:
Here is a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
but got:
Here is a footnote.[^1]
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
One can start with a blunt case:
case '[':
for {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == ']' {
sb.WriteString("<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>")
break
}
}
Adding another footnote breaks this immediately.
{
name: "successive inline footnotes should be replaced with <a id=\"footnote–anchor–n\" href=\"#footnote–n\">[n]</a> and be numbered correctly",
input: "Here is a footnote[^1] and another footnote.[^2]",
output: "Here is a footnote<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a> and another footnote.<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:161: TestConvertMarkdownFileToBlogHTML test number: 25
Test name: inline footnotes should be replaced with <a id=\"footnote–anchor–n\" href=\"#footnote–n\">[n]</a>
expected:
Here is a footnote<a id="footnote–anchor–1" href="#footnote–1">[1]</a> and another footnote.<a id="footnote–anchor–2" href="#footnote–2">[2]</a>
but got:
Here is a footnote<a id="footnote–anchor–1" href="#footnote–1">[1]</a> and another footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
One option might be to extract the number from the footnote.
case '[':
var footnoteNum = strings.Builder{}
for {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == ']' {
sb.WriteString("<a id=\"footnote–anchor–" + footnoteNum.String() + "\" href=\"#footnote–" + footnoteNum.String() + "\">[" + footnoteNum.String() + "]</a>")
break
} else if nextR != '^' {
footnoteNum.WriteRune(nextR)
}
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
This presents a problem, however. When editing a post, footnotes might be added out–of–order. For instance:
# Some heading
This is a line which was added later on.[^2]
This used to be the first line of the text.[^1]
However, in a post, viewers should see the numbers in order.
# Some heading
This is a line which was added later on.[^1]
This used to be the first line of the text.[^2]
The footnotes at the end will need to be reordered as well. We'll get to that step when we need to.
For now, let's keep track of which numbers exist.
{
name: "out–of–order footnote numbers should be updated to be in increasing order",
input: "Here is a footnote[^2] and another footnote.[^1]",
output: "Here is a footnote<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a> and another footnote.<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a>",
},
footnoteNumber := 0
//...
case '[':
for {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == ']' {
sb.WriteString(
"<a id=\"footnote–anchor–" +
strconv.Itoa(footnoteNumber) +
"\" href=\"#footnote–" +
strconv.Itoa(footnoteNumber) +
"\">[" + strconv.Itoa(footnoteNumber) +
"]</a>",
)
break
} else if nextR != '^' {
footnoteNumber++
}
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
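The counter above renumbers the in–text markers, but it does not remember which original label each new number came from, and the footnotes at the end will need that when they are reordered. A minimal sketch of one alternative, assuming the label between [^ and ] has already been read into a string: keep a map from the original label to the new number, assigned on first appearance. The type footnoteNumberer and its method are hypothetical names, not part of the code in this post.

```go
package main

import "fmt"

// footnoteNumberer hands out sequential numbers in order of first
// appearance and remembers which original label received which number.
// (Hypothetical type, for illustration only.)
type footnoteNumberer struct {
	next   int
	labels map[string]int
}

func newFootnoteNumberer() *footnoteNumberer {
	return &footnoteNumberer{labels: make(map[string]int)}
}

// numberFor returns the number already assigned to label, or assigns the
// next free number if the label has not been seen before.
func (f *footnoteNumberer) numberFor(label string) int {
	if n, ok := f.labels[label]; ok {
		return n
	}
	f.next++
	f.labels[label] = f.next
	return f.next
}

func main() {
	f := newFootnoteNumberer()
	first := f.numberFor("2")  // first marker in the text, originally [^2]
	second := f.numberFor("1") // second marker in the text, originally [^1]
	again := f.numberFor("2")  // the matching footnote at the end of the post
	fmt.Println(first, second, again)
}
```

Looking up the same label again, for example when the matching footnote at the end is reached, returns the number that was assigned in the text.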
Let's check quickly that this still works correctly when a footnote is in a paragraph.
# This is a heading
Here is a footnote.[^1]
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:171: TestConvertMarkdownFileToBlogHTML test number: 27
Test name: a footnote in a paragraph should have paragraph and anchor tags added correctly
expected:
<h1> This is a heading</h1>
<p>
Here is a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
but got:
<h1> This is a heading</h1>
</h1><p>
Here is a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a></h1>
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Well, that's a little distressing.
Adding a couple of print statements reveals the issue:
fmt.Println(sb.String())
fmt.Println("rune:", string(r))
rune:
<h1> This is a heading</h1>
rune:
<h1> This is a heading</h1>
</h1><p>
rune: H
<h1> This is a heading</h1>
</h1><p>
H
There is an additional </hn> being added when a new line rune is found.
case '\n':
if headerCount > 0 {
addClosingHeaderTag(&sb, headerCount)
}
The solution is a simple point which was overlooked earlier: the header count needs to be reset to 0 once the header has been closed. If it is not, a closing header tag is written at every later new line, and later headers will have quite large header count values.
case '\n':
if headerCount > 0 {
addClosingHeaderTag(&sb, headerCount)
headerCount = 0
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Let's add another test for this, where there are two footnotes.
# This is a heading
Here is a footnote.[^2] Here's another.[^1]
{
name: "a footnote in a paragraph should have paragraph and anchor tags added correctly, and successive footnotes should be numbered in increasing order",
input: "# This is a heading\n\nHere is a footnote.[^2] Here's another.[^1]",
output: "<h1> This is a heading</h1>\n<p>\nHere is a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a> Here's another.<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a>\n</p>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Good.
Now it is time to add the footnotes section at the end.
Throwaway line
This paragraph references a footnote.[^1]
[^1]: This is the reference.
Throwaway line
<p>
This paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
{
name: "a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly",
input: "Throwaway line\n\nThis paragraph references a footnote.[^1]\n\n[^1]: This is the reference.",
output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is the reference.\n</p>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:181: TestConvertMarkdownFileToBlogHTML test number: 29
Test name: a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly
expected:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
but got:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p>
<a id="footnote–anchor–2" href="#footnote–2">[2]</a>: This is the reference.
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Notice the paragraph tags around the footnote at the bottom. I'll circle back to these momentarily.
The next rule to add: if a footnote marker is followed by : then it is a footnote at the end of the post. This isn't too difficult to add, but does get gnarly.
inlineFootnoteNumber := 0
endFootnoteNumber := 0
//...
case '[':
for {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == ']' {
// Check if the footnote is inline, or at the end of the document.
nextR, _, err = br.ReadRune()
// If the file ends there, assume it is an inline footnote.
if err == io.EOF {
sb.WriteString(
"<a id=\"footnote–anchor–" +
strconv.Itoa(inlineFootnoteNumber) +
"\" href=\"#footnote–" +
strconv.Itoa(inlineFootnoteNumber) +
"\">[" + strconv.Itoa(inlineFootnoteNumber) +
"]</a>",
)
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
// Else, check if it is a footnote at the end of the blog or not.
if nextR == ':' {
endFootnoteNumber++
sb.WriteString(
"<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n",
)
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
sb.WriteRune(nextR)
}
sb.WriteString("\n</p>")
} else {
sb.WriteString(
"<a id=\"footnote–anchor–" +
strconv.Itoa(inlineFootnoteNumber) +
"\" href=\"#footnote–" +
strconv.Itoa(inlineFootnoteNumber) +
"\">[" + strconv.Itoa(inlineFootnoteNumber) +
"]</a>",
)
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
}
break
} else if nextR != '^' {
inlineFootnoteNumber++
}
}
This fails, owing to the additional paragraph tags noted above:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:181: TestConvertMarkdownFileToBlogHTML test number: 29
Test name: a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly
expected:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
but got:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Now there is a decision to make. One can either change the rules to have all footnotes at the end of the post be surrounded by paragraph tags, or, when one reads in a \n rune, one can look ahead to see if the next character is for a footnote.
I will try the latter to start with.
case '\n':
//...
if !thereIsACodeBlockOpen {
//...
if lastCharacterWasANewLine {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
if nextR == '[' {
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
continue
}
sb.WriteString("<p>")
thereIsAParagraphToClose = true
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
} else {
lastCharacterWasANewLine = true
}
}
sb.WriteRune('\n')
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Now for successive footnotes.
Throwaway line
This paragraph references a footnote.[^1]
This paragraph[^2] also has a footnote.
[^1]: This is the reference.
[^2]: This is a footnote.
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p>
This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
This is a footnote.
</p>
Feedback on the next failed test is:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:186: TestConvertMarkdownFileToBlogHTML test number: 30
Test name: successive footnotes in text and at the end should be numbered correctly
expected:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p>
This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
This is a footnote.
</p>
but got:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p>
This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p><p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is a footnote.
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
The numbering is, of course, off. There also is an additional space after the first footnote at the end, and no new line rune after it. This is created by a combination of the sb.WriteString("<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n") and the sb.WriteString("\n</p>") statements.
Let's remove the new line runes from both, and replace the hard–coded 1 with the value of endFootnoteNumber.
case '[':
for {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == ']' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteString(
"<a id=\"footnote–anchor–" +
strconv.Itoa(inlineFootnoteNumber) +
"\" href=\"#footnote–" +
strconv.Itoa(inlineFootnoteNumber) +
"\">[" + strconv.Itoa(inlineFootnoteNumber) +
"]</a>",
)
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == ':' {
endFootnoteNumber++
sb.WriteString(
"<p id=\"footnote–" +
strconv.Itoa(endFootnoteNumber) +
"\">\n<a href=\"#footnote–anchor–" +
strconv.Itoa(endFootnoteNumber) +
"\">[" +
strconv.Itoa(endFootnoteNumber) +
"]</a>",
)
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
sb.WriteRune(nextR)
}
sb.WriteString("</p>")
} else {
sb.WriteString(
"<a id=\"footnote–anchor–" +
strconv.Itoa(inlineFootnoteNumber) +
"\" href=\"#footnote–" +
strconv.Itoa(inlineFootnoteNumber) +
"\">[" + strconv.Itoa(inlineFootnoteNumber) +
"]</a>",
)
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
}
break
} else if nextR != '^' {
inlineFootnoteNumber++
}
}
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:186: TestConvertMarkdownFileToBlogHTML test number: 29
Test name: a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly
expected:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
but got:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a> This is the reference.</p>
main_test.go:186: TestConvertMarkdownFileToBlogHTML test number: 30
Test name: successive footnotes in text and at the end should be numbered correctly
expected:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p>
This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
This is a footnote.
</p>
but got:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p>
This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a> This is the reference.
</p><p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a> This is a footnote.</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
A–ha – regression!
Let's add the \n back in and try a different approach.
Moving the sb.WriteRune('\n') into the err == io.EOF statement fixes test 29, but does not resolve the issue with test 30.
#...
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p><p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
This is a footnote.
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Now, does this matter?
For the final output, a browser is not going to care if it reads </p><p> or </p>\n<p>. Both will render in the same way.
One solution, on the test side, is simply to get rid of all of the \n runes between these types of paragraph tags. As long as the new lines within code blocks and the like are kept as–is, it shouldn't be a problem.
However, this looks fixable without having to add this odd rigmarole to the testing loop. One simply needs to check if err == io.EOF after the closing paragraph tag has been added.
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteRune('\n')
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
sb.WriteRune(nextR)
}
sb.WriteString("</p>")
if err == io.EOF {
break
}
sb.WriteRune('\n')
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
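The anchor markup is assembled in several places in the footnote case above by chaining strconv.Itoa calls. A minimal sketch of how that could be gathered into two small functions, using indexed verbs in fmt.Sprintf so the number is passed once. The function names are hypothetical, and the new line runes follow the expected test output rather than any final decision about spacing.

```go
package main

import "fmt"

// inlineFootnoteAnchor returns the anchor tag for an in-text footnote marker.
// (Hypothetical helper, for illustration only.)
func inlineFootnoteAnchor(n int) string {
	return fmt.Sprintf("<a id=\"footnote-anchor-%[1]d\" href=\"#footnote-%[1]d\">[%[1]d]</a>", n)
}

// endFootnoteOpening returns the opening paragraph tag and the link back to
// the in-text marker for a footnote at the end of the post.
// (Hypothetical helper, for illustration only.)
func endFootnoteOpening(n int) string {
	return fmt.Sprintf("<p id=\"footnote-%[1]d\">\n<a href=\"#footnote-anchor-%[1]d\">[%[1]d]</a>\n", n)
}

func main() {
	fmt.Println(inlineFootnoteAnchor(2))
	fmt.Print(endFootnoteOpening(2))
}
```

Each call site then becomes a single sb.WriteString call, and a change to the id format only has to be made in one place.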
There's one more footnote test I'd like to add before moving on. Many of these footnotes will contain URLs, and some of those may contain # symbols in them.
Throwaway line
This paragraph references a footnote.[^1]
[^1]: This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.
</p>
{
name: "'#' in footnotes should not cause header tags to be added",
input: "Throwaway line\n\nThis paragraph references a footnote.[^1]\n\n[^1]: This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.",
output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.\n</p>",
},
Failure incoming!
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:191: TestConvertMarkdownFileToBlogHTML test number: 31
Test name: '#' in footnotes should not cause header tags to be added
expected:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.
</p>
but got:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Well, that is good to know. The # symbols are being rendered correctly, so that isn't an issue. However, the HTML entities are not being replaced correctly.
This is easy to solve. Rather than just adding runes as–is, one can check if the rune should be converted to an HTML entity.
if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
sb.WriteString(htmlEntityMap[nextR])
} else {
sb.WriteRune(nextR)
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
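The contents of htmlEntityMap are not shown in this section. As a point of comparison only, the standard library's html.EscapeString escapes the five characters <, >, &, ' and ", and leaves everything else (dashes included) untouched. The sketch below assumes the footnote text has already been read into a string, which is not how the rune loop above works; the wrapper name escapeFootnoteText is hypothetical.

```go
package main

import (
	"fmt"
	"html"
)

// escapeFootnoteText escapes the characters which have special meaning in
// HTML, using the standard library rather than a hand-written map.
// (Hypothetical wrapper, for illustration only.)
func escapeFootnoteText(s string) string {
	return html.EscapeString(s)
}

func main() {
	fmt.Println(escapeFootnoteText(`See <b> & "quotes" in 'text'.`))
}
```

Whether this suits the converter depends on which entities the blog expects, since the map in the post also covers a dash.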
What about double–digit amounts of footnotes?
Throwaway line
[^1]
[^2]
[^3]
[^4]
[^5]
[^6]
[^7]
[^8]
[^9]
[^10]
[^11]
[^12]
[^1]: 1
[^2]: 2
[^3]: 3
[^4]: 4
[^5]: 5
[^6]: 6
[^7]: 7
[^8]: 8
[^9]: 9
[^10]: 10
[^11]: 11
[^12]: 12
Throwaway line
<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
<a id="footnote–anchor–2" href="#footnote–2">[2]</a>
<a id="footnote–anchor–3" href="#footnote–3">[3]</a>
<a id="footnote–anchor–4" href="#footnote–4">[4]</a>
<a id="footnote–anchor–5" href="#footnote–5">[5]</a>
<a id="footnote–anchor–6" href="#footnote–6">[6]</a>
<a id="footnote–anchor–7" href="#footnote–7">[7]</a>
<a id="footnote–anchor–8" href="#footnote–8">[8]</a>
<a id="footnote–anchor–9" href="#footnote–9">[9]</a>
<a id="footnote–anchor–10" href="#footnote–10">[10]</a>
<a id="footnote–anchor–11" href="#footnote–11">[11]</a>
<a id="footnote–anchor–12" href="#footnote–12">[12]</a>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
1
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
2
</p>
<p id="footnote–3">
<a href="#footnote–anchor–3">[3]</a>
3
</p>
<p id="footnote–4">
<a href="#footnote–anchor–4">[4]</a>
4
</p>
<p id="footnote–5">
<a href="#footnote–anchor–5">[5]</a>
5
</p>
<p id="footnote–6">
<a href="#footnote–anchor–6">[6]</a>
6
</p>
<p id="footnote–7">
<a href="#footnote–anchor–7">[7]</a>
7
</p>
<p id="footnote–8">
<a href="#footnote–anchor–8">[8]</a>
8
</p>
<p id="footnote–9">
<a href="#footnote–anchor–9">[9]</a>
9
</p>
<p id="footnote–10">
<a href="#footnote–anchor–10">[10]</a>
10
</p>
<p id="footnote–11">
<a href="#footnote–anchor–11">[11]</a>
11
</p>
<p id="footnote–12">
<a href="#footnote–anchor–12">[12]</a>
12
</p>
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:201: TestConvertMarkdownFileToBlogHTML test number: 32
Test name: double–digit footnotes should be numbered correctly
expected:
Throwaway line
<p>
<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
<a id="footnote–anchor–2" href="#footnote–2">[2]</a>
<a id="footnote–anchor–3" href="#footnote–3">[3]</a>
<a id="footnote–anchor–4" href="#footnote–4">[4]</a>
<a id="footnote–anchor–5" href="#footnote–5">[5]</a>
<a id="footnote–anchor–6" href="#footnote–6">[6]</a>
<a id="footnote–anchor–7" href="#footnote–7">[7]</a>
<a id="footnote–anchor–8" href="#footnote–8">[8]</a>
<a id="footnote–anchor–9" href="#footnote–9">[9]</a>
<a id="footnote–anchor–10" href="#footnote–10">[10]</a>
<a id="footnote–anchor–11" href="#footnote–11">[11]</a>
<a id="footnote–anchor–12" href="#footnote–12">[12]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
1
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
2
</p>
<p id="footnote–3">
<a href="#footnote–anchor–3">[3]</a>
3
</p>
<p id="footnote–4">
<a href="#footnote–anchor–4">[4]</a>
4
</p>
<p id="footnote–5">
<a href="#footnote–anchor–5">[5]</a>
5
</p>
<p id="footnote–6">
<a href="#footnote–anchor–6">[6]</a>
6
</p>
<p id="footnote–7">
<a href="#footnote–anchor–7">[7]</a>
7
</p>
<p id="footnote–8">
<a href="#footnote–anchor–8">[8]</a>
8
</p>
<p id="footnote–9">
<a href="#footnote–anchor–9">[9]</a>
9
</p>
<p id="footnote–10">
<a href="#footnote–anchor–10">[10]</a>
10
</p>
<p id="footnote–11">
<a href="#footnote–anchor–11">[11]</a>
11
</p>
<p id="footnote–12">
<a href="#footnote–anchor–12">[12]</a>
12
</p>
but got:
Throwaway line
<a id="footnote–anchor–1" href="#footnote–1">[1]</a><a id="footnote–anchor–2" href="#footnote–2">[2]</a><a id="footnote–anchor–3" href="#footnote–3">[3]</a><a id="footnote–anchor–4" href="#footnote–4">[4]</a><a id="footnote–anchor–5" href="#footnote–5">[5]</a><a id="footnote–anchor–6" href="#footnote–6">[6]</a><a id="footnote–anchor–7" href="#footnote–7">[7]</a><a id="footnote–anchor–8" href="#footnote–8">[8]</a><a id="footnote–anchor–9" href="#footnote–9">[9]</a><a id="footnote–anchor–11" href="#footnote–11">[11]</a><a id="footnote–anchor–13" href="#footnote–13">[13]</a><a id="footnote–anchor–15" href="#footnote–15">[15]</a><p>
</p><p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
1
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
2
</p>
<p id="footnote–3">
<a href="#footnote–anchor–3">[3]</a>
3
</p>
<p id="footnote–4">
<a href="#footnote–anchor–4">[4]</a>
4
</p>
<p id="footnote–5">
<a href="#footnote–anchor–5">[5]</a>
5
</p>
<p id="footnote–6">
<a href="#footnote–anchor–6">[6]</a>
6
</p>
<p id="footnote–7">
<a href="#footnote–anchor–7">[7]</a>
7
</p>
<p id="footnote–8">
<a href="#footnote–anchor–8">[8]</a>
8
</p>
<p id="footnote–9">
<a href="#footnote–anchor–9">[9]</a>
9
</p>
<p id="footnote–10">
<a href="#footnote–anchor–10">[10]</a>
10
</p>
<p id="footnote–11">
<a href="#footnote–anchor–11">[11]</a>
11
</p>
<p id="footnote–12">
<a href="#footnote–anchor–12">[12]</a>
12
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
A few issues:
- There are no new line characters after the second in–text footnote.
- The in–text footnotes start skipping numbers as soon as the double–digit numbers are reached.
- There are some unnecessary paragraph tags between the in–line footnotes and the footnotes at the end.
For the first point, one can attempt to add a new line character when the start of a footnote has been found after a new line character.
if nextR == '[' {
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
sb.WriteRune('\n')
continue
}
This breaks some earlier tests. For example:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:201: TestConvertMarkdownFileToBlogHTML test number: 29
Test name: a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly
expected:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
but got:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
Here, we return to the question of whether this difference matters. As noted before, there is no semantic difference on whether new line characters are included between closing and opening paragraph tags or not, it is more for the convenience of making the output from tests easier to read. The places where one needs to care about new line characters are within code blocks, as this affects how the code is displayed.
Therefore, for convenience I'm willing to change the expected outputs from the tests to allow for an additional new line character between the </p>
and <p ...>
tags just before the footnotes at the end.
Four tests have to be updated to allow for this:
{
name: "a footnote in a paragraph and a footnote at the end of the post should have anchor tags added correctly",
input: "Throwaway line\n\nThis paragraph references a footnote.[^1]\n\n[^1]: This is the reference.",
output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is the reference.\n</p>",
},
{
name: "successive footnotes in text and at the end should be numbered correctly",
input: "Throwaway line\n\nThis paragraph references a footnote.[^1]\n\nThis paragraph[^2] also has a footnote.\n\n[^1]: This is the reference.\n[^2]: This is a footnote.",
output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n<p>\nThis paragraph<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a> also has a footnote.\n</p>\n\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is the reference.\n</p>\n<p id=\"footnote–2\">\n<a href=\"#footnote–anchor–2\">[2]</a>\n This is a footnote.\n</p>",
},
{
name: "'#' in footnotes should not cause header tags to be added",
input: "Throwaway line\n\nThis paragraph references a footnote.[^1]\n\n[^1]: This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.",
output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is the reference, it has a url: https://this–is–not–a–real–url.blue/database?query=#a–query.\n</p>",
},
{
name: "double–digit footnotes should be numbered correctly",
input: "Throwaway line\n\n[^1]\n[^2]\n[^3]\n[^4]\n[^5]\n[^6]\n[^7]\n[^8]\n[^9]\n[^10]\n[^11]\n[^12]\n\n[^1]: 1\n[^2]: 2\n[^3]: 3\n[^4]: 4\n[^5]: 5\n[^6]: 6\n[^7]: 7\n[^8]: 8\n[^9]: 9\n[^10]: 10\n[^11]: 11\n[^12]: 12",
output: "Throwaway line\n<p>\n<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a>\n<a id=\"footnote–anchor–3\" href=\"#footnote–3\">[3]</a>\n<a id=\"footnote–anchor–4\" href=\"#footnote–4\">[4]</a>\n<a id=\"footnote–anchor–5\" href=\"#footnote–5\">[5]</a>\n<a id=\"footnote–anchor–6\" href=\"#footnote–6\">[6]</a>\n<a id=\"footnote–anchor–7\" href=\"#footnote–7\">[7]</a>\n<a id=\"footnote–anchor–8\" href=\"#footnote–8\">[8]</a>\n<a id=\"footnote–anchor–9\" href=\"#footnote–9\">[9]</a>\n<a id=\"footnote–anchor–10\" href=\"#footnote–10\">[10]</a>\n<a id=\"footnote–anchor–11\" href=\"#footnote–11\">[11]</a>\n<a id=\"footnote–anchor–12\" href=\"#footnote–12\">[12]</a>\n</p>\n\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n 1\n</p>\n<p id=\"footnote–2\">\n<a href=\"#footnote–anchor–2\">[2]</a>\n 2\n</p>\n<p id=\"footnote–3\">\n<a href=\"#footnote–anchor–3\">[3]</a>\n 3\n</p>\n<p id=\"footnote–4\">\n<a href=\"#footnote–anchor–4\">[4]</a>\n 4\n</p>\n<p id=\"footnote–5\">\n<a href=\"#footnote–anchor–5\">[5]</a>\n 5\n</p>\n<p id=\"footnote–6\">\n<a href=\"#footnote–anchor–6\">[6]</a>\n 6\n</p>\n<p id=\"footnote–7\">\n<a href=\"#footnote–anchor–7\">[7]</a>\n 7\n</p>\n<p id=\"footnote–8\">\n<a href=\"#footnote–anchor–8\">[8]</a>\n 8\n</p>\n<p id=\"footnote–9\">\n<a href=\"#footnote–anchor–9\">[9]</a>\n 9\n</p>\n<p id=\"footnote–10\">\n<a href=\"#footnote–anchor–10\">[10]</a>\n 10\n</p>\n<p id=\"footnote–11\">\n<a href=\"#footnote–anchor–11\">[11]</a>\n 11\n</p>\n<p id=\"footnote–12\">\n<a href=\"#footnote–anchor–12\">[12]</a>\n 12\n</p>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:201: TestConvertMarkdownFileToBlogHTML test number: 32
Test name: double–digit footnotes should be numbered correctly
expected:
Throwaway line
<p>
<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
<a id="footnote–anchor–2" href="#footnote–2">[2]</a>
<a id="footnote–anchor–3" href="#footnote–3">[3]</a>
<a id="footnote–anchor–4" href="#footnote–4">[4]</a>
<a id="footnote–anchor–5" href="#footnote–5">[5]</a>
<a id="footnote–anchor–6" href="#footnote–6">[6]</a>
<a id="footnote–anchor–7" href="#footnote–7">[7]</a>
<a id="footnote–anchor–8" href="#footnote–8">[8]</a>
<a id="footnote–anchor–9" href="#footnote–9">[9]</a>
<a id="footnote–anchor–10" href="#footnote–10">[10]</a>
<a id="footnote–anchor–11" href="#footnote–11">[11]</a>
<a id="footnote–anchor–12" href="#footnote–12">[12]</a>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
1
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
2
</p>
<p id="footnote–3">
<a href="#footnote–anchor–3">[3]</a>
3
</p>
<p id="footnote–4">
<a href="#footnote–anchor–4">[4]</a>
4
</p>
<p id="footnote–5">
<a href="#footnote–anchor–5">[5]</a>
5
</p>
<p id="footnote–6">
<a href="#footnote–anchor–6">[6]</a>
6
</p>
<p id="footnote–7">
<a href="#footnote–anchor–7">[7]</a>
7
</p>
<p id="footnote–8">
<a href="#footnote–anchor–8">[8]</a>
8
</p>
<p id="footnote–9">
<a href="#footnote–anchor–9">[9]</a>
9
</p>
<p id="footnote–10">
<a href="#footnote–anchor–10">[10]</a>
10
</p>
<p id="footnote–11">
<a href="#footnote–anchor–11">[11]</a>
11
</p>
<p id="footnote–12">
<a href="#footnote–anchor–12">[12]</a>
12
</p>
but got:
Throwaway line
<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
<a id="footnote–anchor–2" href="#footnote–2">[2]</a>
<a id="footnote–anchor–3" href="#footnote–3">[3]</a>
<a id="footnote–anchor–4" href="#footnote–4">[4]</a>
<a id="footnote–anchor–5" href="#footnote–5">[5]</a>
<a id="footnote–anchor–6" href="#footnote–6">[6]</a>
<a id="footnote–anchor–7" href="#footnote–7">[7]</a>
<a id="footnote–anchor–8" href="#footnote–8">[8]</a>
<a id="footnote–anchor–9" href="#footnote–9">[9]</a>
<a id="footnote–anchor–11" href="#footnote–11">[11]</a>
<a id="footnote–anchor–13" href="#footnote–13">[13]</a>
<a id="footnote–anchor–15" href="#footnote–15">[15]</a><p>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
1
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
2
</p>
<p id="footnote–3">
<a href="#footnote–anchor–3">[3]</a>
3
</p>
<p id="footnote–4">
<a href="#footnote–anchor–4">[4]</a>
4
</p>
<p id="footnote–5">
<a href="#footnote–anchor–5">[5]</a>
5
</p>
<p id="footnote–6">
<a href="#footnote–anchor–6">[6]</a>
6
</p>
<p id="footnote–7">
<a href="#footnote–anchor–7">[7]</a>
7
</p>
<p id="footnote–8">
<a href="#footnote–anchor–8">[8]</a>
8
</p>
<p id="footnote–9">
<a href="#footnote–anchor–9">[9]</a>
9
</p>
<p id="footnote–10">
<a href="#footnote–anchor–10">[10]</a>
10
</p>
<p id="footnote–11">
<a href="#footnote–anchor–11">[11]</a>
11
</p>
<p id="footnote–12">
<a href="#footnote–anchor–12">[12]</a>
12
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
There is still the issue that no paragraph tag has been included before the slew of in–text footnotes. This is an artifact of the [
check in the \n
case, added earlier to avoid having <p>
around the footnotes at the end of the post:
if nextR == '[' {
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
sb.WriteRune('\n')
continue
}
At this point, I noticed something rather annoying. lastCharacterWasANewLine
is not set to false outside of the default case. At the end of every other case, I added a line to reset it.
lastCharacterWasANewLine = false
Doing so doesn't break any tests (apart from the latest, which is already broken). However, it does change the output, removing the <p>
tags completely.
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:201: TestConvertMarkdownFileToBlogHTML test number: 32
Test name: double–digit footnotes should be numbered correctly
expected:
#...
but got:
Throwaway line
<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
<a id="footnote–anchor–2" href="#footnote–2">[2]</a>
<a id="footnote–anchor–3" href="#footnote–3">[3]</a>
<a id="footnote–anchor–4" href="#footnote–4">[4]</a>
<a id="footnote–anchor–5" href="#footnote–5">[5]</a>
<a id="footnote–anchor–6" href="#footnote–6">[6]</a>
<a id="footnote–anchor–7" href="#footnote–7">[7]</a>
<a id="footnote–anchor–8" href="#footnote–8">[8]</a>
<a id="footnote–anchor–9" href="#footnote–9">[9]</a>
<a id="footnote–anchor–11" href="#footnote–11">[11]</a>
<a id="footnote–anchor–13" href="#footnote–13">[13]</a>
<a id="footnote–anchor–15" href="#footnote–15">[15]</a>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
1
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
2
</p>
<p id="footnote–3">
<a href="#footnote–anchor–3">[3]</a>
3
</p>
<p id="footnote–4">
<a href="#footnote–anchor–4">[4]</a>
4
</p>
<p id="footnote–5">
<a href="#footnote–anchor–5">[5]</a>
5
</p>
<p id="footnote–6">
<a href="#footnote–anchor–6">[6]</a>
6
</p>
<p id="footnote–7">
<a href="#footnote–anchor–7">[7]</a>
7
</p>
<p id="footnote–8">
<a href="#footnote–anchor–8">[8]</a>
8
</p>
<p id="footnote–9">
<a href="#footnote–anchor–9">[9]</a>
9
</p>
<p id="footnote–10">
<a href="#footnote–anchor–10">[10]</a>
10
</p>
<p id="footnote–11">
<a href="#footnote–anchor–11">[11]</a>
11
</p>
<p id="footnote–12">
<a href="#footnote–anchor–12">[12]</a>
12
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
With the current code, one can either have no paragraph tags around paragraphs made just of footnotes and none around the footnotes at the end, or have paragraph tags around both. Determining if the function is in the footnotes at the end requires finding a [
and then reading ahead by at least four runes. With the byte reader, one cannot unread a rune after having already unread a rune. Using a different data structure, such as a linked list, to store the output would have allowed for a lot more flexibility here: it would be possible to look back through the most recent nodes and replace them as needed rather than just working forwards.
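The look ahead problem can be sketched as follows. This is a minimal sketch, assuming br is a *bufio.Reader (the post only calls it a byte reader; a bytes.Reader has no Peek method). The helper names startsEndFootnote, secondUnreadFails and isEndFootnote are hypothetical and not part of the program in this post. Peek inspects upcoming bytes without consuming them, so nothing needs to be unread afterwards.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// startsEndFootnote reports whether the unread input begins with an
// end-of-post footnote definition of the form "[^<digits>]:".
// Assumption: footnote numbers have at most three digits.
func startsEndFootnote(br *bufio.Reader) bool {
	// The longest accepted form is "[^999]:", which is 7 bytes. When fewer
	// bytes remain, Peek returns what is available plus an error, so the
	// error is deliberately ignored here.
	ahead, _ := br.Peek(7)
	s := string(ahead)
	if !strings.HasPrefix(s, "[^") {
		return false
	}
	closing := strings.Index(s, "]")
	if closing < 3 || closing+1 >= len(s) || s[closing+1] != ':' {
		return false
	}
	for _, c := range s[2:closing] {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

// secondUnreadFails demonstrates the limitation described above: after one
// ReadRune, the first UnreadRune succeeds and the second returns an error.
func secondUnreadFails(input string) bool {
	br := bufio.NewReader(strings.NewReader(input))
	if _, _, err := br.ReadRune(); err != nil {
		return false
	}
	first := br.UnreadRune()
	second := br.UnreadRune()
	return first == nil && second != nil
}

// isEndFootnote wraps startsEndFootnote so it can be tried on a string.
func isEndFootnote(input string) bool {
	return startsEndFootnote(bufio.NewReader(strings.NewReader(input)))
}

func main() {
	fmt.Println(secondUnreadFails("[^1]: note"))
	fmt.Println(isEndFootnote("[^12]: note"))
	fmt.Println(isEndFootnote("[^12] in text"))
}
```

Whether this would fit the existing switch statement without further changes is untested; it only shows that the distinction between an in–text footnote and one at the end can be made without unreading more than once.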
At this point, I'm going to take a rather unsatisfying step back. The only lines in a blog post which start with a footnote number should be those at the end. Moreover, there shouldn't be any paragraphs of just footnote numbers in the text – that would be very strange to read. Adding these two assumptions allows me to remove the paragraph tags around the footnotes section of the expected answer in the test. One of three \n
runes around the </p>
also goes.
{
name: "double–digit footnotes should be numbered correctly",
input: "Throwaway line\n\n[^1]\n[^2]\n[^3]\n[^4]\n[^5]\n[^6]\n[^7]\n[^8]\n[^9]\n[^10]\n[^11]\n[^12]\n\n[^1]: 1\n[^2]: 2\n[^3]: 3\n[^4]: 4\n[^5]: 5\n[^6]: 6\n[^7]: 7\n[^8]: 8\n[^9]: 9\n[^10]: 10\n[^11]: 11\n[^12]: 12",
output: "Throwaway line\n\n<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a>\n<a id=\"footnote–anchor–3\" href=\"#footnote–3\">[3]</a>\n<a id=\"footnote–anchor–4\" href=\"#footnote–4\">[4]</a>\n<a id=\"footnote–anchor–5\" href=\"#footnote–5\">[5]</a>\n<a id=\"footnote–anchor–6\" href=\"#footnote–6\">[6]</a>\n<a id=\"footnote–anchor–7\" href=\"#footnote–7\">[7]</a>\n<a id=\"footnote–anchor–8\" href=\"#footnote–8\">[8]</a>\n<a id=\"footnote–anchor–9\" href=\"#footnote–9\">[9]</a>\n<a id=\"footnote–anchor–10\" href=\"#footnote–10\">[10]</a>\n<a id=\"footnote–anchor–11\" href=\"#footnote–11\">[11]</a>\n<a id=\"footnote–anchor–12\" href=\"#footnote–12\">[12]</a>\n\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n 1\n</p>\n<p id=\"footnote–2\">\n<a href=\"#footnote–anchor–2\">[2]</a>\n 2\n</p>\n<p id=\"footnote–3\">\n<a href=\"#footnote–anchor–3\">[3]</a>\n 3\n</p>\n<p id=\"footnote–4\">\n<a href=\"#footnote–anchor–4\">[4]</a>\n 4\n</p>\n<p id=\"footnote–5\">\n<a href=\"#footnote–anchor–5\">[5]</a>\n 5\n</p>\n<p id=\"footnote–6\">\n<a href=\"#footnote–anchor–6\">[6]</a>\n 6\n</p>\n<p id=\"footnote–7\">\n<a href=\"#footnote–anchor–7\">[7]</a>\n 7\n</p>\n<p id=\"footnote–8\">\n<a href=\"#footnote–anchor–8\">[8]</a>\n 8\n</p>\n<p id=\"footnote–9\">\n<a href=\"#footnote–anchor–9\">[9]</a>\n 9\n</p>\n<p id=\"footnote–10\">\n<a href=\"#footnote–anchor–10\">[10]</a>\n 10\n</p>\n<p id=\"footnote–11\">\n<a href=\"#footnote–anchor–11\">[11]</a>\n 11\n</p>\n<p id=\"footnote–12\">\n<a href=\"#footnote–anchor–12\">[12]</a>\n 12\n</p>",
},
This is a little unsatisfying, as it is a second case of changing a test after running it. Overall, I'm comfortable with making this change, as the end goal is still achieved. It would be a different matter if, for example, a paragraph of text with footnote numbers strewn throughout it were missing its paragraph tags: that would be a very material change to the output.
Now for the counting of in–text footnotes. Currently, these are double–counted as the footnote number increases for each rune between ^
and ]
.
} else if nextR != '^' {
inlineFootnoteNumber++
}
One fix is very simple. Increase the count each time the rune ^
is seen.
} else if nextR == '^' {
inlineFootnoteNumber++
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
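The difference between the two counting rules can be shown in isolation. This is a simplified sketch rather than the code from the post: countPerDigit and countPerCaret are hypothetical names, and every ^ is treated as the start of a footnote.

```go
package main

import "fmt"

// countPerDigit mimics the earlier behaviour: the counter goes up once for
// every rune between '^' and ']', so "[^10]" is counted twice.
func countPerDigit(input string) int {
	count := 0
	inFootnote := false
	for _, r := range input {
		switch {
		case r == '^':
			inFootnote = true
		case r == ']':
			inFootnote = false
		case inFootnote:
			count++
		}
	}
	return count
}

// countPerCaret counts once per '^', however many digits follow it.
func countPerCaret(input string) int {
	count := 0
	for _, r := range input {
		if r == '^' {
			count++
		}
	}
	return count
}

func main() {
	input := "[^9]\n[^10]\n[^11]"
	fmt.Println(countPerDigit(input)) // prints 5
	fmt.Println(countPerCaret(input)) // prints 3
}
```

This matches the skipped numbers seen in the failing test: every two–digit footnote advanced the counter by two.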
Now, how about renumbering the footnotes at the end so that they align with the new footnote numbers assigned earlier on? Here, I assume that the
Throwaway line
This paragraph references a footnote.[^2]
This paragraph[^1] also has a footnote.
[^1]: This is the reference.
[^2]: This is a footnote.
{
name: "footnotes at the end should be renumbered if footnotes in text were renumbered",
input: "Throwaway line\n\nThis paragraph references a footnote.[^2]\n\nThis paragraph[^1] also has a footnote.\n\n[^1]: This is the reference.\n[^2]: This is a footnote.",
output: "Throwaway line\n<p>\nThis paragraph references a footnote.<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a>\n</p>\n<p>\nThis paragraph<a id=\"footnote–anchor–2\" href=\"#footnote–2\">[2]</a> also has a footnote.\n</p>\n\n<p id=\"footnote–2\">\n<a href=\"#footnote–anchor–2\">[2]</a>\n This is the reference.\n</p>\n<p id=\"footnote–1\">\n<a href=\"#footnote–anchor–1\">[1]</a>\n This is a footnote.\n</p>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:201: TestConvertMarkdownFileToBlogHTML test number: 33
Test name: footnotes at the end should be renumbered if footnotes in text were renumbered
expected:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p>
This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
This is the reference.
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is a footnote.
</p>
but got:
Throwaway line
<p>
This paragraph references a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p>
This paragraph<a id="footnote–anchor–2" href="#footnote–2">[2]</a> also has a footnote.
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
This is the reference.
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
This is a footnote.
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
A hash map can help match these up. First, each original footnote number is paired with the value it was updated to.
footnoteNumberMap := map[int]int{}
//...
} else {
sb.WriteString(
"<a id=\"footnote–anchor–" +
strconv.Itoa(inlineFootnoteNumber) +
"\" href=\"#footnote–" +
strconv.Itoa(inlineFootnoteNumber) +
"\">[" + strconv.Itoa(inlineFootnoteNumber) +
"]</a>",
)
footnoteOriginalNumber, err := strconv.Atoi(inTextFootnoteNumber.String())
if err != nil {
log.Fatal("unable to convert string to number:", err)
}
footnoteNumberMap[footnoteOriginalNumber] = inlineFootnoteNumber
//...
} else if nextR == '^' {
inlineFootnoteNumber++
} else if nextR != '^' {
inTextFootnoteNumber.WriteRune(nextR)
}
Rather than counting up the endFootnoteNumber
as before, we take the value from inside the [^]:
and look up the corresponding value from the map.
if nextR == ':' {
footnoteNumber, err := strconv.Atoi(inTextFootnoteNumber.String())
if err != nil {
log.Fatal("unable to convert string to number:", err)
}
sb.WriteString(
"<p id=\"footnote–" +
strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
"\">\n<a href=\"#footnote–anchor–" +
strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
"\">[" +
strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
"]</a>",
)
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
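The mapping can be sketched on its own, away from the rune–by–rune loop. The function name renumber is hypothetical, and the handling of a footnote referenced more than once is an assumption; the code in this post increments the counter on every ^ instead.

```go
package main

import "fmt"

// renumber maps each original footnote number to its position in the order
// in which it is first referenced in the text.
func renumber(referencesInTextOrder []int) map[int]int {
	footnoteNumberMap := map[int]int{}
	next := 0
	for _, original := range referencesInTextOrder {
		if _, seen := footnoteNumberMap[original]; seen {
			continue // assumption: a repeated reference keeps its first number
		}
		next++
		footnoteNumberMap[original] = next
	}
	return footnoteNumberMap
}

func main() {
	// The text references [^2] first and [^1] second, as in the test above.
	m := renumber([]int{2, 1})
	fmt.Println(m[2], m[1]) // prints "1 2"

	// A footnote that is defined at the end but never referenced is missing
	// from the map; the two-value lookup makes that visible.
	if _, ok := m[3]; !ok {
		fmt.Println("footnote 3 was never referenced")
	}
}
```

One thing to keep in mind: looking up a missing key in a Go map returns the zero value, so a footnote at the end without a matching reference in the text would be written with the number 0 unless the lookup is checked.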
Unordered List
This will hopefully be straightforward.
# Unordered List!
– This is an unordered list with a – dash.
– One,
– Two,
– Three.
<h1> Unordered List!</h1>
<p>
<ul>
<li> This is an unordered list with a – dash.</li>
<li> One,</li>
<li> Two,</li>
<li> Three.</li>
</ul>
</p>
{
name: "unordered lists should have <ul> tags and <li> tags",
input: "# Unordered List!\n\n– This is an unordered list with a – dash.\n– One,\n– Two,\n– Three.",
output: "<h1> Unordered List!</h1>\n<p>\n<ul>\n<li> This is an unordered list with a – dash.</li>\n<li> One,</li>\n<li> Two,</li>\n<li> Three.</li>\n</ul>\n</p>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:206: TestConvertMarkdownFileToBlogHTML test number: 34
Test name: unordered lists should have <ul> tags and <li> tags
expected:
<h1> Unordered List!</h1>
<ul>
<li> This is an unordered list with a – dash.</li>
<li> One,</li>
<li> Two,</li>
<li> Three.</li>
</ul>
but got:
<h1> Unordered List!</h1>
<p>
– This is an unordered list with a – dash.
</p>
– One,
– Two,
– Three.
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Here, one needs to distinguish between a –
which is simply in–text, and one which is the start of a list. The distinction is whether a \n
is followed by –
or not. The list ends when there are two \n
in a row.
Let's try this.
thereIsAnUnorderedListOpen := false
//...
case '\n':
//fmt.Println("\n", thereIsACodeBlockOpen, lastCharacterWasANewLine)
if headerCount > 0 {
addClosingHeaderTag(&sb, headerCount)
headerCount = 0
}
if !thereIsACodeBlockOpen {
if thereIsAnUnorderedListOpen {
sb.WriteRune('\n')
sb.WriteString("</ul>")
thereIsAnUnorderedListOpen = false
}
sb.WriteRune('\n')
lastCharacterWasANewLine = true // <– this was missing from earlier
//...
case '–':
if lastCharacterWasANewLine {
if thereIsAnUnorderedListOpen {
sb.WriteString("<li>")
var nextR rune
for nextR != '\n' {
nextR, _, err := br.ReadRune()
if err == io.EOF {
sb.WriteString("</li>")
break
}
if err != nil {
log.Fatal("unable to read ahead by one rune:", err)
}
if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
sb.WriteString(htmlEntityMap[nextR])
} else {
sb.WriteRune(nextR)
}
}
sb.WriteString("</li>")
sb.WriteRune('\n')
} else {
sb.WriteString("<ul>")
thereIsAnUnorderedListOpen = true
}
} else {
sb.WriteString(htmlEntityMap[r])
}
lastCharacterWasANewLine = false
Here, I noticed, and corrected, the issue of not having lastCharacterWasANewLine
set to true at the end of the case: '\n'
.
A few tests broke here.
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:206: TestConvertMarkdownFileToBlogHTML test number: 21
Test name: a multi–line code block with a directory structure within it should be rendered correctly
expected:
<pre><code>
– dashboard
| – frontend
| – backend
</code></pre>
but got:
<pre><code>
<ul> dashboard
| – frontend
| – backend
</code></pre>
main_test.go:206: TestConvertMarkdownFileToBlogHTML test number: 22
Test name: a multi–line code block with a directory structure within it should be rendered correctly
expected:
<pre><code>
– dashboard
| – frontend
| – backend
</code></pre>
but got:
<pre><code>
<ul> dashboard
| – frontend
| – backend
</code></pre>
main_test.go:206: TestConvertMarkdownFileToBlogHTML test number: 34
Test name: unordered lists should have <ul> tags and <li> tags
expected:
<h1> Unordered List!</h1>
<p>
<ul>
<li> This is an unordered list with a – dash.</li>
<li> One,</li>
<li> Two,</li>
<li> Three.</li>
</ul>
</p>
but got:
<h1> Unordered List!</h1>
<p>
<ul> This is an unordered list with a – dash.
</ul>
</p>
<ul> One,
</ul>
<ul> Two,
</ul>
<ul> Three.
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
If there is a code block open, then \n
can be followed by –
without starting an unordered list. There's also an issue where none of the <li>
tags are added.
Let's fix the first one.
case '–':
if lastCharacterWasANewLine {
if thereIsACodeBlockOpen {
sb.WriteString("–")
continue
}
Now there's only the unordered list test to fix. This is quite a bit more involved.
thereIsAnUnorderedListOpen := false
//...
case '\n':
//...
if !thereIsACodeBlockOpen {
if thereIsAnUnorderedListOpen {
sb.WriteString("</ul>")
thereIsAnUnorderedListOpen = false
}
//...
case '–':
if thereIsAnUnorderedListOpen {
if thereIsACodeBlockOpen {
sb.WriteString("–")
continue
}
sb.WriteString("<li>")
var nextR rune
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read ahead by one rune:", err)
}
if nextR == '\n' {
break
}
if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
sb.WriteString(htmlEntityMap[nextR])
} else {
sb.WriteRune(nextR)
}
}
sb.WriteString("</li>")
sb.WriteRune('\n')
} else if lastCharacterWasANewLine {
if thereIsACodeBlockOpen {
sb.WriteString("–")
continue
}
if thereIsAnUnorderedListOpen {
sb.WriteString("<li>")
var nextR rune
for nextR != '\n' {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read ahead by one rune:", err)
}
if nextR == '\n' {
break
}
if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
sb.WriteString(htmlEntityMap[nextR])
} else {
sb.WriteRune(nextR)
}
}
sb.WriteString("</li>")
sb.WriteRune('\n')
} else {
sb.WriteString("<ul>")
thereIsAnUnorderedListOpen = true
sb.WriteRune('\n')
sb.WriteString("<li>")
var nextR rune
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read ahead by one rune:", err)
}
if nextR == '\n' {
break
}
if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
sb.WriteString(htmlEntityMap[nextR])
} else {
sb.WriteRune(nextR)
}
}
sb.WriteString("</li>")
sb.WriteRune('\n')
}
} else {
sb.WriteString(htmlEntityMap[r])
}
lastCharacterWasANewLine = false
//...
if thereIsAnUnorderedListOpen {
sb.WriteString("</ul>")
}
if thereIsAParagraphToClose {
sb.WriteRune('\n')
sb.WriteString("</p>")
}
return sb.String()
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
There is a lot of code here which needs to be refactored out later on.
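As a rough idea of what that refactoring could look like, the loop which reads a list item appears three times above and could be pulled into one helper. This is a sketch under assumptions: br is taken to be a *bufio.Reader, writeListItem and listItem are hypothetical names, and the entity values in the map are stand-ins because the contents of htmlEntityMap are not shown in this section (the dash entry is left out for that reason).

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// htmlEntityMap is a stand-in with assumed values, not the map from the post.
var htmlEntityMap = map[rune]string{
	'<':  "&lt;",
	'>':  "&gt;",
	'"':  "&quot;",
	'\'': "&#39;",
}

// writeListItem reads runes up to the next new line (or the end of the
// input) and writes them wrapped in <li> tags, replacing any rune found in
// htmlEntityMap with its entity.
func writeListItem(br *bufio.Reader, sb *strings.Builder) error {
	sb.WriteString("<li>")
	for {
		r, _, err := br.ReadRune()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		if r == '\n' {
			break
		}
		if entity, ok := htmlEntityMap[r]; ok {
			sb.WriteString(entity)
		} else {
			sb.WriteRune(r)
		}
	}
	sb.WriteString("</li>\n")
	return nil
}

// listItem is a small wrapper for trying the helper on a string.
func listItem(line string) string {
	var sb strings.Builder
	br := bufio.NewReader(strings.NewReader(line))
	if err := writeListItem(br, &sb); err != nil {
		return ""
	}
	return sb.String()
}

func main() {
	fmt.Print(listItem(" One,\n- Two,"))
	fmt.Print(listItem(" a < b"))
}
```

Returning the error instead of calling log.Fatal inside the helper is a design choice of this sketch; it leaves the decision on how to fail with the caller.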
Tables
Every table starts with a |
. Let's start with just a header. I assume that the entire set of table tags should be added if a table has been started.
| Table | Head |
<table class="table is–hoverable">
<thead>
<tr>
<th> Table </th>
<th> Head </th>
</tr>
</thead>
<tbody>
</tbody>
</table>
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:211: TestConvertMarkdownFileToBlogHTML test number: 35
Test name: the head of a table should be added correctly
expected:
<table class="table is–hoverable">
<thead>
<tr>
<th> Table </th>
<th> Head </th>
</tr>
</thead>
<tbody>
</tbody>
</table>
but got:
| Table | Head |
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Let's try:
case '|':
sb.WriteString("<table class=\"table is–hoverable\">")
sb.WriteRune('\n')
sb.WriteString("<thead>")
sb.WriteRune('\n')
sb.WriteString("<tr>")
sb.WriteRune('\n')
sb.WriteString("<th>")
var nextR rune
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '|' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteString("</th>")
sb.WriteRune('\n')
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '\n' {
break
}
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
sb.WriteString("</th>")
sb.WriteRune('\n')
sb.WriteString("<th>")
} else {
sb.WriteRune(nextR)
}
}
sb.WriteString("</tr>")
sb.WriteRune('\n')
sb.WriteString("</thead>")
sb.WriteRune('\n')
sb.WriteString("<tbody>")
sb.WriteRune('\n')
sb.WriteString("</tbody>")
sb.WriteRune('\n')
sb.WriteString("</table>")
On re–running the tests:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:211: TestConvertMarkdownFileToBlogHTML test number: 21
Test name: a multi–line code block with a directory structure within it should be rendered correctly
expected:
<pre><code>
– dashboard
| – frontend
| – backend
</code></pre>
but got:
<pre><code>
– dashboard
<table class="table is–hoverable">
<thead>
<tr>
<th> – frontend
</tr>
</thead>
<tbody>
</tbody>
</table><table class="table is–hoverable">
<thead>
<tr>
<th> – backend
</tr>
</thead>
<tbody>
</tbody>
</table></code></pre>
main_test.go:211: TestConvertMarkdownFileToBlogHTML test number: 22
Test name: a multi–line code block with a directory structure within it should be rendered correctly
expected:
<pre><code>
– dashboard
| – frontend
| – backend
</code></pre>
but got:
<pre><code>
– dashboard
<table class="table is–hoverable">
<thead>
<tr>
<th> – frontend
</tr>
</thead>
<tbody>
</tbody>
</table><table class="table is–hoverable">
<thead>
<tr>
<th> – backend
</tr>
</thead>
<tbody>
</tbody>
</table></code></pre>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Oh dear. Both failures come from not accounting for |
within code blocks. At least the condition is clear.
case '|':
if thereIsACodeBlockOpen {
sb.WriteRune('|')
continue
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Now to add the border line after the header. I assume that this should be skipped, as the set of table tags is added when the table header is found.
| Table | Head |
|––|––|
{
name: "the border line after the header of a table should be added correctly",
input: "| Table | Head |\n|––|––|",
output: "<table class=\"table is–hoverable\">\n<thead>\n<tr>\n<th> Table </th>\n<th> Head </th>\n</tr>\n</thead>\n<tbody>\n</tbody>\n</table>",
},
This will fail quite spectacularly.
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:216: TestConvertMarkdownFileToBlogHTML test number: 36
Test name: the border line after the header of a table should be added correctly
expected:
<table class="table is–hoverable">
<thead>
<tr>
<th> Table </th>
<th> Head </th>
</tr>
</thead>
<tbody>
</tbody>
</table>
but got:
<table class="table is–hoverable">
<thead>
<tr>
<th> Table </th>
<th> Head </tr>
</thead>
<tbody>
</tbody>
</table><table class="table is–hoverable">
<thead>
<tr>
<th>––</th>
<th>––</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
We can read past the border to the next new line character.
case '|':
//...
for nextR != '\n' {
//...
if nextR == '|' {
//...
if nextR == '\n' {
sb.WriteString("</th>")
sb.WriteRune('\n')
break
}
//...
}
sb.WriteString("</tr>")
sb.WriteRune('\n')
sb.WriteString("</thead>")
sb.WriteRune('\n')
var afterR rune
for afterR != '\n' {
afterR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
}
sb.WriteString("<tbody>")
sb.WriteRune('\n')
sb.WriteString("</tbody>")
sb.WriteRune('\n')
sb.WriteString("</table>")
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
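The loop which reads past the border line could also be written with ReadString. This is a sketch which assumes br is a *bufio.Reader; skipLine and restAfterSkippingLine are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// skipLine discards everything up to and including the next new line
// character. Reaching the end of the input first is not treated as an error.
func skipLine(br *bufio.Reader) error {
	_, err := br.ReadString('\n')
	if err == io.EOF {
		return nil
	}
	return err
}

// restAfterSkippingLine returns what is left of input after its first line
// has been skipped, which makes the helper easy to try out.
func restAfterSkippingLine(input string) string {
	br := bufio.NewReader(strings.NewReader(input))
	if err := skipLine(br); err != nil {
		return ""
	}
	rest, err := io.ReadAll(br)
	if err != nil {
		return ""
	}
	return string(rest)
}

func main() {
	fmt.Println(restAfterSkippingLine("|--|--|\n| a | b |"))
}
```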
Now for a table with content.
| col name one | col name two |
|–|–|
| row contents one | row contents two |
<table class="table is–hoverable">
<thead>
<tr>
<th> col name one </th>
<th> col name two </th>
</tr>
</thead>
<tbody>
<tr>
<td> row contents one </td>
<td> row contents two </td>
</tr>
</tbody>
</table>
And now the output:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:221: TestConvertMarkdownFileToBlogHTML test number: 37
Test name: simple tables without markdown characters in them should have the appropriate table tags added
expected:
<table class="table is–hoverable">
<thead>
<tr>
<th> col name one </th>
<th> col name two </th>
</tr>
</thead>
<tbody>
<tr>
<td> row contents one </td>
<td> row contents two </td>
</tr>
</tbody>
</table>
but got:
<table class="table is–hoverable">
<thead>
<tr>
<th> col name one </th>
<th> col name two </th>
</tr>
</thead>
<tbody>
</tbody>
</table><table class="table is–hoverable">
<thead>
<tr>
<th> row contents one </th>
<th> row contents two </th>
</tr>
</thead>
<tbody>
</tbody>
</table>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
For each line after this, one can read runes until an \n
is found.
sb.WriteString("<tbody>")
sb.WriteRune('\n')
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteString("</tbody>")
sb.WriteRune('\n')
sb.WriteString("</table>")
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '|' {
sb.WriteString("<tr>")
sb.WriteRune('\n')
sb.WriteString("<td>")
numOfConsecutiveNewLines := 0
for numOfConsecutiveNewLines < 2 {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '\n' {
sb.WriteString("</td>")
sb.WriteRune('\n')
sb.WriteString("</tr>")
sb.WriteRune('\n')
numOfConsecutiveNewLines++
} else {
if numOfConsecutiveNewLines == 1 {
sb.WriteString("<td>")
} else {
if nextR == '|' {
sb.WriteString("</td>")
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteRune('\n')
sb.WriteString("</tr>")
sb.WriteRune('\n')
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
sb.WriteRune('\n')
if nextR != '\n' {
sb.WriteString("<td>")
}
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
} else {
sb.WriteRune(nextR)
}
}
numOfConsecutiveNewLines = 0
}
}
}
sb.WriteString("</tbody>")
sb.WriteRune('\n')
sb.WriteString("</table>")
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
This will be simplified by extracting out much of this logic into a function.
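One possible shape for such a function is sketched below. Note that it swaps the technique: it works on a whole line and splits it on | rather than reading rune by rune, and it does not replace HTML entities. The name tableRow and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// tableRow converts one markdown table line, such as "| a | b |", into a
// <tr> element with one cell per column. cellTag is "th" for the header line
// and "td" for every other line.
func tableRow(line, cellTag string) string {
	trimmed := strings.TrimSpace(line)
	trimmed = strings.TrimPrefix(trimmed, "|")
	trimmed = strings.TrimSuffix(trimmed, "|")

	var sb strings.Builder
	sb.WriteString("<tr>\n")
	for _, cell := range strings.Split(trimmed, "|") {
		sb.WriteString("<" + cellTag + ">" + cell + "</" + cellTag + ">\n")
	}
	sb.WriteString("</tr>\n")
	return sb.String()
}

func main() {
	fmt.Print(tableRow("| col name one | col name two |", "th"))
	fmt.Print(tableRow("| row contents one | row contents two |", "td"))
}
```

With a helper like this, the header and the body rows differ only in the tag passed in, which is where most of the duplication above comes from.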
Tables can contain HTML entities as well.
| col name one | col name two |
|–|–|
| A non–entity / | Some entities – ' |
| < More entities > | "And I quote..." |
<table class="table is–hoverable">
<thead>
<tr>
<th> col name one </th>
<th> col name two </th>
</tr>
</thead>
<tbody>
<tr>
<td> A non–entity / </td>
<td> Some entities – ' </td>
</tr>
<tr>
<td> < More entities > </td>
<td> "And I quote..." </td>
</tr>
</tbody>
</table>
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:226: TestConvertMarkdownFileToBlogHTML test number: 38
Test name: a table with html entities should have them replaced
expected:
<table class="table is–hoverable">
<thead>
<tr>
<th> col name one </th>
<th> col name two </th>
</tr>
</thead>
<tbody>
<tr>
<td> A non–entity / </td>
<td> Some entities – ' </td>
</tr>
<tr>
<td> < More entities > </td>
<td> "And I quote..." </td>
</tr>
</tbody>
</table>
but got:
<table class="table is–hoverable">
<thead>
<tr>
<th> col name one </th>
<th> col name two </th>
</tr>
</thead>
<tbody>
<tr>
<td> A non–entity / </td>
<td> Some entities – ' </td>
</td>
</tr>
<td> < More entities > </td>
<td> "And I quote..." </td>
</tr>
</tbody>
</table>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
There's an additional </td>
coming from somewhere, and one–too–few <tr>
tags.
For the first, let's try removing the </td>
included when a new line character is found.
for numOfConsecutiveNewLines < 2 {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '\n' {
sb.WriteString("</tr>")
sb.WriteRune('\n')
numOfConsecutiveNewLines++
} else {
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:226: TestConvertMarkdownFileToBlogHTML test number: 38
Test name: a table with html entities should have them replaced
expected:
<table class="table is–hoverable">
<thead>
<tr>
<th> col name one </th>
<th> col name two </th>
</tr>
</thead>
<tbody>
<tr>
<td> A non–entity / </td>
<td> Some entities – ' </td>
</tr>
<tr>
<td> < More entities > </td>
<td> "And I quote..." </td>
</tr>
</tbody>
</table>
but got:
<table class="table is–hoverable">
<thead>
<tr>
<th> col name one </th>
<th> col name two </th>
</tr>
</thead>
<tbody>
<tr>
<td> A non–entity / </td>
<td> Some entities – ' </td>
</tr>
<td> < More entities > </td>
<td> "And I quote..." </td>
</tr>
</tbody>
</table>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
That seems to have solved it, and removed some redundancy.
Just a little further down in the code there is a check to add a new <td> if the last character was \n. This should also include a <tr>, as a new table row starts after a single new line character.
for numOfConsecutiveNewLines < 2 {
nextR, _, err = br.ReadRune()
//...
} else {
if numOfConsecutiveNewLines == 1 {
sb.WriteString("<tr>")
sb.WriteRune('\n')
sb.WriteString("<td>")
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Now what if a table has footnotes? I'll return to this after refactoring the code, as extracting the footnote–translating code into its own functions will make this significantly easier.
Images
One set of items I had forgotten to account for in the initial grammar were images.
This is a little trickier. Currently, I don't decide on whether an image should take the full width of a column, or if it should sit side–by–side with another image until actually submitting the post.
The former would suggest:
Markdown | HTML |
---|---|
![[image_name.png]] | <figure class="image"> <img src="/directory_name/image_name.png"> </figure> |
Whilst the latter *might* require:
Markdown | HTML |
---|---|
![[image_name.png]] | <div class="columns"> <div class="column"> <figure class="image is–5by4"> <img src="image_name.png"> </figure> </div> </div> |
For now, I will assume the former, simpler case. The blog posts have relatively thin columns, and only in particular cases will two images sit side–by–side.
The directory_name can be supplied by the user as a command–line argument. This will be added later on.
{
name: "images should be placed into <figure> and <img> tags",
input: "![[image_name.png]]",
output: "<figure class=\"image\">\n<img src=\"/directory_name/image_name.png\">\n</figure>",
},
// Default directory name. To be overwritten by user with
// a command–line flag.
var imageDirectoryName = "/directory_name"
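A minimal sketch of how the directory could later be read from the command line, assuming Go's standard flag package. The flag name image-dir and the helper parseImageDirectory are hypothetical and are not taken from the program in this post.

```go
package main

import (
	"flag"
	"fmt"
)

// parseImageDirectory reads a hypothetical -image-dir flag from args.
// When the flag is absent it falls back to the default used in the tests.
func parseImageDirectory(args []string) (string, error) {
	fs := flag.NewFlagSet("markdown-to-html", flag.ContinueOnError)
	dir := fs.String("image-dir", "/directory_name", "directory prefix for image src attributes")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *dir, nil
}

func main() {
	// In a real program args would be os.Args[1:].
	dir, err := parseImageDirectory([]string{"-image-dir", "/images/post-one"})
	if err != nil {
		fmt.Println("unable to parse flags:", err)
		return
	}
	fmt.Println(dir)
}
```

Using a separate FlagSet rather than the package-level flags keeps the parsing testable, as the arguments can be supplied directly in a test case.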
Time to run the latest test:
=== RUN TestConvertMarkdownFileToBlogHTML
2024/02/03 13:40:42 unable to convert string to number:strconv.Atoi: parsing "[image_name.png": invalid syntax
As there is no case for the ! character, it is being read as a plain rune and the function then attempts to read the image as if it was a footnote. Time to add a new case.
// Default directory name. To be overwritten by user with
// a command–line flag.
var imageDirectoryName = "/directory_name"
//...
case '!':
// Assumes structure of ![[image_name.png]]
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
if nextR != '[' {
// not an image
continue
}
var imageNameAndExtension = strings.Builder{}
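for nextR != '\n' {
// Assign with "=" rather than ":=" so that the loop condition
// sees the rune which was just read, and the loop ends at a new line.
nextR, _, err = br.ReadRune()
if err == io.EOF {
break // to update for plain '!'
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
// Brackets and the trailing new line are not part of the file name.
if nextR != '[' && nextR != ']' && nextR != '\n' {
imageNameAndExtension.WriteRune(nextR)
}
}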
sb.WriteString("<figure class=\"image\">")
sb.WriteRune('\n')
sb.WriteString("<img src=\"" + imageDirectoryName + "/" + imageNameAndExtension.String() + "\">")
sb.WriteRune('\n')
sb.WriteString("</figure>")
Here, once a ! is found, read the next character to see if it is an image. I assume that only images will have the combination of ![.
Let's see how this shakes out.
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:236: TestConvertMarkdownFileToBlogHTML test number: 34
Test name: unordered lists should have <ul> tags and <li> tags
expected:
<h1> Unordered List!</h1>
<p>
<ul>
<li> This is an unordered list with a – dash.</li>
<li> One,</li>
<li> Two,</li>
<li> Three.</li>
</ul>
</p>
but got:
<h1> Unordered List</h1>
<ul>
<li> This is an unordered list with a – dash.</li>
<li> One,</li>
<li> Two,</li>
<li> Three.</li>
</ul>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
So, the new test passes, but the unordered list test now fails. This is due to the ! not being written as a plain character.
if nextR != '[' {
sb.WriteRune('!')
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
continue
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Now what about a ! at the end of a file?
{
name: "! at the end of a file should be written correctly.",
input: "A sentence!",
output: "A sentence!",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:241: TestConvertMarkdownFileToBlogHTML test number: 40
Test name: ! at the end of a file should be written correctly.
expected:
A sentence!
but got:
A sentence
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
A small correction to the io.EOF condition fixes this.
case '!':
nextR, _, err := br.ReadRune()
if err == io.EOF {
sb.WriteRune('!')
break
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
It is high time to refactor the code.
Refactoring
Header
Simply extracting the header case from the main function takes us from:
case '#':
if !finishedCountingHeaderTagsForLine {
headerCount++
} else if finishedCountingHeaderTagsForLine {
sb.WriteRune('#')
}
lastCharacterWasANewLine = false
To:
case '#':
countOpeningHeaderTagNumber(
&sb,
&finishedCountingHeaderTagsForLine,
&headerCount,
&lastCharacterWasANewLine,
)
//...
func countOpeningHeaderTagNumber(sb *strings.Builder, finishedCountingHeaderTagsForLine *bool, headerCount *int, lastCharacterWasANewLine *bool) {
if *finishedCountingHeaderTagsForLine {
sb.WriteRune('#')
} else {
*headerCount++
}
*lastCharacterWasANewLine = false
}
This passes the test suite:
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
However, there's an opportunity sitting on the sidewalk here. The header tag is still not added until the case ' ': is found, and multiple variables are passed into this function to provide context. One can check if a header tag needs to be added at all, and if so undertake all the work to add header tags in the same function. The closing header tags are also added in other places. There is no need to have this information strewn across multiple cases.
Before proceeding, I'll add one more test to the test suite, checking that HTML elements in a header are replaced correctly.
{
name: "html elements in a header should be replaced correctly",
input: "# A header with html < > \" ' – elements",
output: "<h1> A header with html < > " ' – elements</h1>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Good. Time to rewrite this.
First, check whether a header tag should be added at all. This is the case when the # is the first character in the file, or when the last character was a new line. If the string builder is empty, I assume the byte reader is at the first character of the file.
case '#':
if sb.Len() == 0 {
addHeaderTags(//...)
} else if lastCharacterWasANewLine {
addHeaderTags(//...)
} else {
sb.WriteRune('#')
}
As the closing header tags will be added in the function, the headerCount and finishedCountingHeaderTagsForLine variables can be declared in there rather than in this outer function. lastCharacterWasANewLine is also not needed in this function, as it is decided by the coordinating function.
func addHeaderTags(br *bytes.Reader, sb *strings.Builder) {
var finishedCountingHeaderTagsForLine = false
var headerCount = 1 // assume 1, as a '#' has been seen in order to get to here
// The loop exits via the break statements below.
for {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune in header:", err)
}
if !finishedCountingHeaderTagsForLine {
if nextR == '#' {
headerCount++
}
if nextR == ' ' {
finishedCountingHeaderTagsForLine = true
sb.WriteString("<h" + strconv.Itoa(headerCount) + ">")
sb.WriteRune(' ')
}
} else if nextR == '\n' {
err := br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune when adding header tags:", err)
}
break
} else {
sb.WriteRune(nextR)
}
}
sb.WriteString("</h" + strconv.Itoa(headerCount) + ">")
}
There was a bit of a regression here:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:246: TestConvertMarkdownFileToBlogHTML test number: 41
Test name: html elements in a header should be replaced correctly
expected:
<h1> A header with html < > " ' – elements</h1>
but got:
<h1> A header with html < > " ' – elements</h1>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Throughout the function there is a repeating check of:
if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, nextR) {
sb.WriteString(htmlEntityMap[nextR])
} else {
sb.WriteRune(nextR)
}
This can become its own function.
func addRuneOrHTMLEntity(r rune, sb *strings.Builder) {
if slices.Contains([]rune{'\'', '<', '>', '"', '–'}, r) {
sb.WriteString(htmlEntityMap[r])
} else {
sb.WriteRune(r)
}
}
All other instances can be replaced with a call to the function, together with the else { sb.WriteRune(nextR) } in addHeaderTags.
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
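As a possible variation on the function above, the membership check and the lookup can be combined by asking the map itself whether an entity exists for the rune. This is a sketch under assumptions: writeRuneOrEntity and escape are hypothetical names, and the entity strings in this stand-in map are illustrative and may differ from those in the post's htmlEntityMap.

```go
package main

import (
	"fmt"
	"strings"
)

// entityMap is an illustrative stand-in for htmlEntityMap.
// The entity strings are assumptions made for this example.
var entityMap = map[rune]string{
	'<':  "&lt;",
	'>':  "&gt;",
	'"':  "&quot;",
	'\'': "&#39;",
}

// writeRuneOrEntity writes the entity for r when the map has one,
// and the rune itself otherwise.
func writeRuneOrEntity(r rune, sb *strings.Builder) {
	if entity, ok := entityMap[r]; ok {
		sb.WriteString(entity)
		return
	}
	sb.WriteRune(r)
}

// escape applies writeRuneOrEntity to every rune in s.
func escape(s string) string {
	var sb strings.Builder
	for _, r := range s {
		writeRuneOrEntity(r, &sb)
	}
	return sb.String()
}

func main() {
	fmt.Println(escape("a < b"))
}
```

The benefit of this form is that the list of runes lives in one place, so adding a new entity to the map needs no second change.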
Now the headerCount and finishedCountingHeaderTagsForLine variables can be removed from the main function, together with any remaining header tag logic elsewhere. This simplifies:
- the first io.EOF check in the for loop,
- the case ' ':,
- the case '\n':.
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
We can go a step further as well, and move the check for whether a # leads to a header into its own function too.
case '#':
addHeaderTagsOrPoundRune(br, &sb, lastCharacterWasANewLine)
//...
func addHeaderTagsOrPoundRune(br *bytes.Reader, sb *strings.Builder, lastCharacterWasANewLine bool) {
if sb.Len() == 0 {
addHeaderTags(br, sb)
} else if lastCharacterWasANewLine {
addHeaderTags(br, sb)
} else {
sb.WriteRune('#')
}
}
The addClosingHeaderTag function is now redundant and can be removed.
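For comparison, the same header rule can be sketched for a whole line at a time rather than rune by rune. This is an illustrative alternative only: headerLineToHTML is a hypothetical helper, it does no HTML entity replacement, and it is not part of the program above.

```go
package main

import (
	"fmt"
	"strconv"
)

// headerLineToHTML converts a line such as "## text" into "<h2> text</h2>".
// The second return value is false when the line is not a header, that is,
// when it does not start with one or more '#' followed by a space.
func headerLineToHTML(line string) (string, bool) {
	level := 0
	for level < len(line) && line[level] == '#' {
		level++
	}
	if level == 0 || level >= len(line) || line[level] != ' ' {
		return "", false
	}
	tag := "h" + strconv.Itoa(level)
	// The space after the '#' characters is kept, matching the
	// "<h1> text</h1>" shape used in the tests in this post.
	return "<" + tag + ">" + line[level:] + "</" + tag + ">", true
}

func main() {
	html, ok := headerLineToHTML("# Header")
	fmt.Println(html, ok)
}
```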
Spaces
In extracting out the header case, the space case is now simple enough to leave as–is. The same type of simplification will naturally arise if code blocks are refactored out before other cases.
case ' ':
sb.WriteRune(r)
lastCharacterWasANewLine = false
Code Blocks
case '`':
addCodeBlock(br, &sb, &numberOfCurrentBackQuotes, &thereIsACodeBlockOpen)
lastCharacterWasANewLine = false
//...
func addCodeBlock(br *bytes.Reader, sb *strings.Builder, numberOfCurrentBackQuotes *int, thereIsACodeBlockOpen *bool) {
*numberOfCurrentBackQuotes++
nextR, _, err := br.ReadRune()
if err == io.EOF {
if *thereIsACodeBlockOpen {
sb.WriteString("</code>")
if *numberOfCurrentBackQuotes == 6 {
sb.WriteString("</pre>")
}
*thereIsACodeBlockOpen = false
*numberOfCurrentBackQuotes = 0
}
return
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '`' {
*numberOfCurrentBackQuotes++
return
} else {
if *numberOfCurrentBackQuotes == 3 || *numberOfCurrentBackQuotes == 6 {
if *thereIsACodeBlockOpen {
sb.WriteString("</code></pre>")
*thereIsACodeBlockOpen = false
*numberOfCurrentBackQuotes = 0
} else {
sb.WriteString("<pre><code>")
*thereIsACodeBlockOpen = true
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
}
}
} else {
if *thereIsACodeBlockOpen {
sb.WriteString("</code>")
*thereIsACodeBlockOpen = false
*numberOfCurrentBackQuotes = 0
} else {
sb.WriteString("<code>")
*thereIsACodeBlockOpen = true
}
}
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
}
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Time to improve upon this.
Neither numberOfCurrentBackQuotes nor thereIsACodeBlockOpen needs to be available to other parts of the for loop. They can be declared in the addCodeBlock function.
To contain more of the code–block–specific logic in this function, it will loop over runes. The code can also be rearranged to make it slightly easier to read. Additionally, the addRuneOrHTMLEntity function can be used to add all of the characters and HTML entities which previously needed to be added by other rune cases.
func addCodeBlock(br *bytes.Reader, sb *strings.Builder) {
var numberOfCurrentBackQuotes = 1
var thereIsACodeBlockOpen = false
for {
nextR, _, err := br.ReadRune()
if err == io.EOF {
if thereIsACodeBlockOpen {
sb.WriteString("</code>")
if numberOfCurrentBackQuotes == 6 {
sb.WriteString("</pre>")
}
thereIsACodeBlockOpen = false
numberOfCurrentBackQuotes = 0
}
return
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '`' {
numberOfCurrentBackQuotes++
if thereIsACodeBlockOpen {
if numberOfCurrentBackQuotes == 2 {
sb.WriteString("</code>")
return
}
}
if numberOfCurrentBackQuotes == 3 {
sb.WriteString("<pre><code>")
thereIsACodeBlockOpen = true
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
}
sb.WriteRune('\n')
}
if numberOfCurrentBackQuotes == 6 {
sb.WriteString("</code></pre>")
return
}
} else {
if !thereIsACodeBlockOpen {
if numberOfCurrentBackQuotes == 1 {
sb.WriteString("<code>")
thereIsACodeBlockOpen = true
} else if numberOfCurrentBackQuotes == 2 {
sb.WriteString("</code>")
return
}
}
addRuneOrHTMLEntity(nextR, sb)
}
}
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Now the thereIsACodeBlockOpen conditions littered through the different cases can be deleted. This simplifies:
- case '\n':,
- case '*':,
- case '–':,
- case '|':.
For example, the asterisk case changes from:
case '*':
if thereIsACodeBlockOpen {
sb.WriteRune('*')
} else {
countAsterisks++
}
lastCharacterWasANewLine = false
To:
case '*':
countAsterisks++
lastCharacterWasANewLine = false
Images
The logic for adding an image tag is relatively simple and can just be extracted out as–is for now. As it doesn't interact with any other cases, it passes the tests as it is.
case '!':
addImageTags(br, &sb)
//...
func addImageTags(br *bytes.Reader, sb *strings.Builder) {
// Assumes structure of ![[image_name.png]]
nextR, _, err := br.ReadRune()
if err == io.EOF {
sb.WriteRune('!')
return
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
if nextR != '[' {
sb.WriteRune('!')
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
return
}
var imageNameAndExtension = strings.Builder{}
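for nextR != '\n' {
// Assign with "=" rather than ":=" so that the loop condition
// sees the rune which was just read, and the loop ends at a new line.
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
// Brackets and the trailing new line are not part of the file name.
if nextR != '[' && nextR != ']' && nextR != '\n' {
imageNameAndExtension.WriteRune(nextR)
}
}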
sb.WriteString("<figure class=\"image\">")
sb.WriteRune('\n')
sb.WriteString("<img src=\"" + imageDirectoryName + "/" + imageNameAndExtension.String() + "\">")
sb.WriteRune('\n')
sb.WriteString("</figure>")
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
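When the whole image reference is already available as a string, the same translation can be sketched with the standard strings package. This is an illustrative alternative to the rune–by–rune version above; imageTagsFor is a hypothetical helper and assumes the reference is exactly of the form ![[image_name.png]].

```go
package main

import (
	"fmt"
	"strings"
)

// imageTagsFor converts "![[name]]" into <figure> and <img> tags.
// The second return value is false when markdown is not of that form.
func imageTagsFor(markdown, imageDirectoryName string) (string, bool) {
	if !strings.HasPrefix(markdown, "![[") || !strings.HasSuffix(markdown, "]]") {
		return "", false
	}
	name := strings.TrimSuffix(strings.TrimPrefix(markdown, "![["), "]]")
	html := "<figure class=\"image\">\n" +
		"<img src=\"" + imageDirectoryName + "/" + name + "\">\n" +
		"</figure>"
	return html, true
}

func main() {
	html, ok := imageTagsFor("![[image_name.png]]", "/directory_name")
	fmt.Println(ok)
	fmt.Println(html)
}
```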
Unordered List
As above, this will be extracted out as–is to start with.
case '–':
addUnorderedList(br, &sb, r, &thereIsAnUnorderedListOpen, &lastCharacterWasANewLine)
lastCharacterWasANewLine = false
//...
func addUnorderedList(br *bytes.Reader, sb *strings.Builder, r rune, thereIsAnUnorderedListOpen *bool, lastCharacterWasANewLine *bool) {
var err error
if *thereIsAnUnorderedListOpen {
sb.WriteString("<li>")
var nextR rune
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read ahead by one rune:", err)
}
if nextR == '\n' {
break
}
addRuneOrHTMLEntity(nextR, sb)
}
sb.WriteString("</li>")
sb.WriteRune('\n')
} else if *lastCharacterWasANewLine {
if *thereIsAnUnorderedListOpen {
sb.WriteString("<li>")
var nextR rune
for nextR != '\n' {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read ahead by one rune:", err)
}
if nextR == '\n' {
break
}
addRuneOrHTMLEntity(nextR, sb)
}
sb.WriteString("</li>")
sb.WriteRune('\n')
} else {
sb.WriteString("<ul>")
*thereIsAnUnorderedListOpen = true
sb.WriteRune('\n')
sb.WriteString("<li>")
var nextR rune
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read ahead by one rune:", err)
}
if nextR == '\n' {
break
}
addRuneOrHTMLEntity(nextR, sb)
}
sb.WriteString("</li>")
sb.WriteRune('\n')
}
} else {
sb.WriteString(htmlEntityMap[r])
}
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
The rune r is only passed in to add an HTML entity if the last character was not a new line. Both r and lastCharacterWasANewLine can be handled by the caller rather than being passed into the function. thereIsAnUnorderedListOpen can also be declared within the function.
case '–':
if lastCharacterWasANewLine {
addUnorderedList(br, &sb)
} else {
sb.WriteString(htmlEntityMap[r])
}
lastCharacterWasANewLine = false
addUnorderedList also needs to loop over runes. In doing this, the remaining code can be simplified.
func addUnorderedList(br *bytes.Reader, sb *strings.Builder) {
var lastCharacterWasANewLine = false
var nextR rune
var err error
sb.WriteString("<ul>")
sb.WriteRune('\n')
sb.WriteString("<li>")
for {
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteString("</li>")
sb.WriteRune('\n')
break
}
if err != nil {
log.Fatal("unable to read rune when creating an unordered list:", err)
}
if nextR == '–' {
if lastCharacterWasANewLine {
sb.WriteString("<li>")
} else {
sb.WriteString(htmlEntityMap[nextR])
}
lastCharacterWasANewLine = false
} else if nextR == '\n' {
sb.WriteString("</li>")
sb.WriteRune('\n')
lastCharacterWasANewLine = true
} else {
addRuneOrHTMLEntity(nextR, sb)
lastCharacterWasANewLine = false
}
}
sb.WriteString("</ul>")
}
The only way out of this new code is via an io.EOF error. Let's add a test.
# Header
– Unordered
– List
End of file.
<h1> Header</h1>
<p>
<ul>
<li> Unordered</li>
<li> List</li>
</ul>
</p>
<p>
End of file.
</p>
This confirms the above suspicion. There's also an unnecessary \n between list items.
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:251: TestConvertMarkdownFileToBlogHTML test number: 42
Test name: unordered lists should have their tags closed correctly before the next piece of content
expected:
<h1> Header</h1>
<p>
<ul>
<li> Unordered</li>
<li> List</li>
</p>
<p>
End of file.
</p>
but got:
<h1> Header</h1>
<p>
<ul>
<li> Unordered</li>
<li> List</li>
</li>
End of file.</li>
</ul>
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
A check for consecutive \n characters should catch this.
func addUnorderedList(br *bytes.Reader, sb *strings.Builder) {
var lastCharacterWasANewLine = false
sb.WriteString("<ul>")
sb.WriteRune('\n')
sb.WriteString("<li>")
for {
nextR, _, err := br.ReadRune()
if err == io.EOF {
sb.WriteString("</li>")
sb.WriteRune('\n')
break
}
if err != nil {
log.Fatal("unable to read rune when creating an unordered list:", err)
}
if nextR == '–' {
if lastCharacterWasANewLine {
sb.WriteString("<li>")
} else {
sb.WriteString(htmlEntityMap[nextR])
}
lastCharacterWasANewLine = false
} else if nextR == '\n' {
if lastCharacterWasANewLine {
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune at end of unordered list:", err)
}
break
} else {
sb.WriteString("</li>")
sb.WriteRune('\n')
lastCharacterWasANewLine = true
}
} else {
addRuneOrHTMLEntity(nextR, sb)
lastCharacterWasANewLine = false
}
}
sb.WriteString("</ul>")
}
There's a new issue with paragraph tags, however. This will have to be solved as the new line character case is refactored.
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:251: TestConvertMarkdownFileToBlogHTML test number: 42
Test name: unordered lists should have their tags closed correctly before the next piece of content
expected:
<h1> Header</h1>
<p>
<ul>
<li> Unordered</li>
<li> List</li>
</ul>
</p>
<p>
End of file.
</p>
but got:
<h1> Header</h1>
<p>
<ul>
<li> Unordered</li>
<li> List</li>
</ul>
</p>
End of file.
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
The case '–': can set lastCharacterWasANewLine to true after an unordered list has been added.
case '–':
if lastCharacterWasANewLine {
addUnorderedList(br, &sb)
lastCharacterWasANewLine = true
} else {
sb.WriteString(htmlEntityMap[r])
lastCharacterWasANewLine = false
}
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:251: TestConvertMarkdownFileToBlogHTML test number: 42
Test name: unordered lists should have their tags closed correctly before the next piece of content
expected:
<h1> Header</h1>
<p>
<ul>
<li> Unordered</li>
<li> List</li>
</ul>
</p>
<p>
End of file.
</p>
but got:
<h1> Header</h1>
<p>
<ul>
<li> Unordered</li>
<li> List</li>
</ul>
</p><p>
End of file.
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
This has arisen due to the </ul> being added within the logic rather than in the case '\n':, leaving one fewer \n to write between the paragraph tags. It is not possible to unread a rune twice. A tacked–on solution is to add thereIsAnUnorderedListOpen back in, to ensure that the new line character is accounted for. This will be followed for now, though it is admittedly inelegant.
thereIsAnUnorderedListOpen := false
//...
case '\n':
//...
if thereIsAnUnorderedListOpen {
sb.WriteRune('\n')
thereIsAnUnorderedListOpen = false
}
sb.WriteString("<p>")
//...
case '–':
if lastCharacterWasANewLine {
addUnorderedList(br, &sb)
thereIsAnUnorderedListOpen = true
lastCharacterWasANewLine = true
//...
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
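The limitation mentioned above, that a rune cannot be unread twice, comes from bytes.Reader itself: UnreadRune only succeeds when the previous operation on the reader was ReadRune. A small sketch to demonstrate this; unreadTwice is a hypothetical helper written for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// unreadTwice reads two runes from input and then calls UnreadRune twice.
// It returns the error from each of the two UnreadRune calls.
func unreadTwice(input string) (first, second error) {
	br := bytes.NewReader([]byte(input))
	for i := 0; i < 2; i++ {
		if _, _, err := br.ReadRune(); err != nil {
			return err, err
		}
	}
	first = br.UnreadRune()  // allowed: the previous operation was ReadRune
	second = br.UnreadRune() // rejected: the previous operation was UnreadRune
	return first, second
}

func main() {
	first, second := unreadTwice("\n\nEnd of file.")
	fmt.Println("first unread error:", first)
	fmt.Println("second unread error:", second)
}
```

This is why the second new line character cannot simply be pushed back onto the reader, and why a flag is used to carry that information instead.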
Tables
To start with, the table logic can be extracted out as–is.
case '|':
addTable(br, &sb)
//...
func addTable(br *bytes.Reader, sb *strings.Builder) {
var err error
sb.WriteString("<table class=\"table is–hoverable\">")
sb.WriteRune('\n')
sb.WriteString("<thead>")
sb.WriteRune('\n')
sb.WriteString("<tr>")
sb.WriteRune('\n')
sb.WriteString("<th>")
var nextR rune
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '|' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteString("</th>")
sb.WriteRune('\n')
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '\n' {
sb.WriteString("</th>")
sb.WriteRune('\n')
break
}
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
sb.WriteString("</th>")
sb.WriteRune('\n')
sb.WriteString("<th>")
} else {
addRuneOrHTMLEntity(nextR, sb)
}
}
sb.WriteString("</tr>")
sb.WriteRune('\n')
sb.WriteString("</thead>")
sb.WriteRune('\n')
var afterR rune
for afterR != '\n' {
afterR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
}
sb.WriteString("<tbody>")
sb.WriteRune('\n')
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteString("</tbody>")
sb.WriteRune('\n')
sb.WriteString("</table>")
return
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '|' {
sb.WriteString("<tr>")
sb.WriteRune('\n')
sb.WriteString("<td>")
numOfConsecutiveNewLines := 0
for numOfConsecutiveNewLines < 2 {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '\n' {
sb.WriteString("</tr>")
sb.WriteRune('\n')
numOfConsecutiveNewLines++
} else {
if numOfConsecutiveNewLines == 1 {
sb.WriteString("<tr>")
sb.WriteRune('\n')
sb.WriteString("<td>")
} else {
if nextR == '|' {
sb.WriteString("</td>")
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteRune('\n')
sb.WriteString("</tr>")
sb.WriteRune('\n')
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
sb.WriteRune('\n')
if nextR != '\n' {
sb.WriteString("<td>")
}
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
} else {
addRuneOrHTMLEntity(nextR, sb)
}
}
numOfConsecutiveNewLines = 0
}
}
}
sb.WriteString("</tbody>")
sb.WriteRune('\n')
sb.WriteString("</table>")
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
This can be broken down further so that the header section, the border line after it, and the table body are each handled by their own function.
func addTable(br *bytes.Reader, sb *strings.Builder) {
sb.WriteString("<table class=\"table is–hoverable\">")
sb.WriteRune('\n')
addTableHeader(br, sb)
skipTableHeaderLine(br)
addTableBody(br, sb)
sb.WriteString("</table>")
}
func addTableHeader(br *bytes.Reader, sb *strings.Builder) {
sb.WriteString("<thead>")
sb.WriteRune('\n')
sb.WriteString("<tr>")
sb.WriteRune('\n')
sb.WriteString("<th>")
var nextR rune
var err error
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '|' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteString("</th>")
sb.WriteRune('\n')
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '\n' {
sb.WriteString("</th>")
sb.WriteRune('\n')
break
}
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
sb.WriteString("</th>")
sb.WriteRune('\n')
sb.WriteString("<th>")
} else {
addRuneOrHTMLEntity(nextR, sb)
}
}
sb.WriteString("</tr>")
sb.WriteRune('\n')
sb.WriteString("</thead>")
sb.WriteRune('\n')
}
func skipTableHeaderLine(br *bytes.Reader) {
var afterR rune
var err error
for afterR != '\n' {
afterR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
}
}
func addTableBody(br *bytes.Reader, sb *strings.Builder) {
sb.WriteString("<tbody>")
sb.WriteRune('\n')
nextR, _, err := br.ReadRune()
if err == io.EOF {
sb.WriteString("</tbody>")
sb.WriteRune('\n')
sb.WriteString("</table>")
return
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '|' {
sb.WriteString("<tr>")
sb.WriteRune('\n')
sb.WriteString("<td>")
numOfConsecutiveNewLines := 0
for numOfConsecutiveNewLines < 2 {
nextR, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == '\n' {
sb.WriteString("</tr>")
sb.WriteRune('\n')
numOfConsecutiveNewLines++
} else {
if numOfConsecutiveNewLines == 1 {
sb.WriteString("<tr>")
sb.WriteRune('\n')
sb.WriteString("<td>")
} else {
if nextR == '|' {
sb.WriteString("</td>")
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteRune('\n')
sb.WriteString("</tr>")
sb.WriteRune('\n')
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
sb.WriteRune('\n')
if nextR != '\n' {
sb.WriteString("<td>")
}
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
} else {
addRuneOrHTMLEntity(nextR, sb)
}
}
numOfConsecutiveNewLines = 0
}
}
}
sb.WriteString("</tbody>")
sb.WriteRune('\n')
}
A quick sanity check:
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:251: TestConvertMarkdownFileToBlogHTML test number: 35
Test name: the head of a table should be added correctly
expected:
<table class="table is–hoverable">
<thead>
<tr>
<th> Table </th>
<th> Head </th>
</tr>
</thead>
<tbody>
</tbody>
</table>
but got:
<table class="table is–hoverable">
<thead>
<tr>
<th> Table </th>
<th> Head </th>
</tr>
</thead>
<tbody>
</tbody>
</table></table>
main_test.go:251: TestConvertMarkdownFileToBlogHTML test number: 36
Test name: the border line after the header of a table should be added correctly
expected:
<table class="table is–hoverable">
<thead>
<tr>
<th> Table </th>
<th> Head </th>
</tr>
</thead>
<tbody>
</tbody>
</table>
but got:
<table class="table is–hoverable">
<thead>
<tr>
<th> Table </th>
<th> Head </th>
</tr>
</thead>
<tbody>
</tbody>
</table></table>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
The </table> tag has doubled up. This is due to the io.EOF condition in addTableBody.
if err == io.EOF {
sb.WriteString("</tbody>")
sb.WriteRune('\n')
sb.WriteString("</table>")
return
}
Taking this out resolves the issue.
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
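For comparison with the rune–by–rune approach, a single table row can also be sketched by splitting a whole line on the | character. This is an illustrative alternative only: splitTableRow and tableRowToHTML are hypothetical helpers, they assume the row starts and ends with |, and they do not replace HTML entities.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTableRow turns "| a | b |" into the cells " a " and " b ".
// The outer pipes are removed before splitting so that no empty
// cells appear at the start or end of the result.
func splitTableRow(line string) []string {
	trimmed := strings.TrimSpace(line)
	trimmed = strings.TrimPrefix(trimmed, "|")
	trimmed = strings.TrimSuffix(trimmed, "|")
	return strings.Split(trimmed, "|")
}

// tableRowToHTML wraps each cell of the row in <td> tags inside one <tr>.
func tableRowToHTML(line string) string {
	var sb strings.Builder
	sb.WriteString("<tr>\n")
	for _, cell := range splitTableRow(line) {
		sb.WriteString("<td>" + cell + "</td>\n")
	}
	sb.WriteString("</tr>")
	return sb.String()
}

func main() {
	fmt.Println(tableRowToHTML("| row contents one | row contents two |"))
}
```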
Footnotes
To start with, I'll try to extract the footnote code out into its own function. A pointer to the footnote map cannot be indexed directly (it would have to be dereferenced first, as in (*m)[key]), but a pointer is not needed: a Go map passed as an ordinary argument still refers to the same underlying data, so the function could update it in place. Alternatively, the map can be moved into a higher scope and thus be available to both functions by default. For now, I've opted for the latter.
var imageDirectoryName = "/directory_name"
var footnoteNumberMap = map[int]int{}
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
//...
case '[':
addFootNote(br, &sb, &inlineFootnoteNumber)
lastCharacterWasANewLine = false
//...
}
func addFootNote(br *bytes.Reader, sb *strings.Builder, inlineFootnoteNumber *int) {
inTextFootnoteNumber := strings.Builder{}
for {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == ']' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteString(
"<a id=\"footnote–anchor–" +
strconv.Itoa(*inlineFootnoteNumber) +
"\" href=\"#footnote–" +
strconv.Itoa(*inlineFootnoteNumber) +
"\">[" + strconv.Itoa(*inlineFootnoteNumber) +
"]</a>",
)
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
if nextR == ':' {
footnoteNumber, err := strconv.Atoi(inTextFootnoteNumber.String())
if err != nil {
log.Fatal("unable to convert string to number:", err)
}
sb.WriteString(
"<p id=\"footnote–" +
strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
"\">\n<a href=\"#footnote–anchor–" +
strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
"\">[" +
strconv.Itoa(footnoteNumberMap[footnoteNumber]) +
"]</a>",
)
sb.WriteRune('\n')
for nextR != '\n' {
nextR, _, err = br.ReadRune()
if err == io.EOF {
sb.WriteRune('\n')
break
}
if err != nil {
log.Fatal("unable to read next rune:", err)
}
addRuneOrHTMLEntity(nextR, sb)
}
sb.WriteString("</p>")
if err == io.EOF {
break
}
sb.WriteRune('\n')
} else {
sb.WriteString(
"<a id=\"footnote–anchor–" +
strconv.Itoa(*inlineFootnoteNumber) +
"\" href=\"#footnote–" +
strconv.Itoa(*inlineFootnoteNumber) +
"\">[" + strconv.Itoa(*inlineFootnoteNumber) +
"]</a>",
)
footnoteOriginalNumber, err := strconv.Atoi(inTextFootnoteNumber.String())
if err != nil {
log.Fatal("unable to convert string to number:", err)
}
footnoteNumberMap[footnoteOriginalNumber] = *inlineFootnoteNumber
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune:", err)
}
}
break
} else if nextR == '^' {
*inlineFootnoteNumber++
} else if nextR != '^' {
inTextFootnoteNumber.WriteRune(nextR)
}
}
}
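On the question of how the footnote map reaches this function: a Go map can be passed as an ordinary argument, and writes made inside the function are visible to the caller, because the map value refers to shared underlying data. A minimal sketch of that option; recordFootnote is a hypothetical helper and is not part of the program above.

```go
package main

import "fmt"

// recordFootnote stores the mapping from the footnote number written in the
// markdown to the number displayed in the post. No pointer is needed: the
// map argument refers to the same underlying data as the caller's map.
func recordFootnote(footnoteNumberMap map[int]int, originalNumber, displayedNumber int) {
	footnoteNumberMap[originalNumber] = displayedNumber
}

func main() {
	footnoteNumberMap := map[int]int{}
	recordFootnote(footnoteNumberMap, 7, 1)
	fmt.Println(footnoteNumberMap[7]) // prints 1
}
```

Passing the map explicitly would avoid the package–level variable, at the cost of one more parameter on addFootNote.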
Italics and Bold
The logic for bold and italics is currently spread across two cases.
case '*':
countAsterisks++
lastCharacterWasANewLine = false
//...
default:
lastCharacterWasANewLine = false
if countAsterisks > 0 {
if thereIsAnItalicsOrBoldTagToClose {
if countAsterisks == 1 {
sb.WriteString("</i>")
} else if countAsterisks == 2 {
sb.WriteString("</b>")
} else if countAsterisks == 3 {
sb.WriteString("</b></i>")
}
countAsterisks = 0
thereIsAnItalicsOrBoldTagToClose = false
} else {
if countAsterisks == 1 {
sb.WriteString("<i>")
} else if countAsterisks == 2 {
sb.WriteString("<b>")
} else if countAsterisks == 3 {
sb.WriteString("<i><b>")
}
countAsterisks = 0
thereIsAnItalicsOrBoldTagToClose = true
}
}
sb.WriteRune(r)
}
This can all be put into the same function. Whilst reading ahead, I assume that any characters between the tags are to be added as–is or replaced with HTML entities, and that the only asterisks seen are those which mark the closing of the tags.
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
//...
case '*':
addItalicsAndOrBoldTags(br, &sb)
lastCharacterWasANewLine = false
//...
default:
lastCharacterWasANewLine = false
sb.WriteRune(r)
//...
}
//...
func addItalicsAndOrBoldTags(br *bytes.Reader, sb *strings.Builder) {
asteriskCount := 1
asteriskCountNeededToCloseTags := 0
stillCountingAsterisks := true
italicsOrBoldTagOpen := false
for {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read ahead by one rune when adding italics or bold tags:", err)
}
if nextR == '*' {
if stillCountingAsterisks {
asteriskCount++
if asteriskCount == asteriskCountNeededToCloseTags {
break
}
} else {
addRuneOrHTMLEntity(nextR, sb)
}
}
if nextR != '*' {
if !italicsOrBoldTagOpen {
stillCountingAsterisks = false
switch asteriskCount {
case 1:
sb.WriteString("<i>")
asteriskCountNeededToCloseTags = 1
case 2:
sb.WriteString("<b>")
asteriskCountNeededToCloseTags = 2
case 3:
sb.WriteString("<i><b>")
asteriskCountNeededToCloseTags = 3
}
asteriskCount = 0
addRuneOrHTMLEntity(nextR, sb)
italicsOrBoldTagOpen = true
} else {
stillCountingAsterisks = true
addRuneOrHTMLEntity(nextR, sb)
}
}
}
switch asteriskCount {
case 1:
sb.WriteString("</i>")
case 2:
sb.WriteString("</b>")
case 3:
sb.WriteString("</b></i>")
}
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
These assumptions can be broken by adding the following two tests. I will skip implementing these for now; they are highlighted only for awareness.
{
name: "italics tags should correctly surround text with an '*' in it which has spaces either side",
input: "*This text contains * an asterisk.*",
output: "<i>This text contains * an asterisk.</i>",
},
{
name: "a solitary '*' at the end should not create an italics tag",
input: "*This text contains* an asterisk.*",
output: "<i>This text contains</i> an asterisk.*",
},
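One way to approach the first of these cases is sketched below. It is only an illustration, not the code used in this post: isLiteralAsterisk is a hypothetical helper, and it assumes the input is available as a slice of runes so that the characters on both sides of an asterisk can be inspected. The reader in this post only looks one rune ahead, so it would need to remember the previous rune as well. The second test case (a trailing asterisk with no opening partner) is not handled by this sketch.

```go
package main

import (
	"fmt"
	"unicode"
)

// isLiteralAsterisk is a hypothetical helper (not part of the post's code).
// It reports whether the '*' at index i has whitespace on both sides, in
// which case this sketch treats it as plain text rather than a tag marker.
func isLiteralAsterisk(runes []rune, i int) bool {
	// An asterisk at either end of the input cannot be flanked on both sides.
	if i <= 0 || i >= len(runes)-1 || runes[i] != '*' {
		return false
	}
	return unicode.IsSpace(runes[i-1]) && unicode.IsSpace(runes[i+1])
}

func main() {
	input := []rune("*This text contains * an asterisk.*")
	for i, r := range input {
		if r == '*' {
			fmt.Printf("asterisk at index %d is literal: %t\n", i, isLiteralAsterisk(input, i))
		}
	}
}
```

With a check like this, the asterisk in the middle of the first test input could be passed to addRuneOrHTMLEntity instead of being counted towards the closing tag.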
A Small Test File
Now to see if passing all of these small tests translates into correctly converting a small file. There are a few extensions included in the file below (such as allowing a list to have italicised or emboldened text). Let's see where it breaks.
# Introduction
## A Small File
This is a *small* file. It contains – neigh – requires the program to correctly translate a variety of different Obsidian Markdown elements into the HTML elements I want.
![[image_name.png]]
For example:
– paragraphs[^1]
– "0 < 1"
– "2 > 1"
– **and**
– ***headings***
– `Code blocks`
```Pseudocode
fn removeCharacterFromList(remList list, charToRemove char) list {
match remList {
case x::[]:
match x {
charToRemove: []
_: x
}
case x::xs:
match x {
charToRemove: removeCharacterFromList(xs, charToRemove)
_: x::removeCharacterFromList(xs, charToRemove)
}
}
}
removeCharacterFromList(['a', 'b', 'c'], 'a')
```
## A table conclusion
Another footnote.[^2]
| A table | must have | columns |
|––|––|––|
| and rows. | which may have an arbitrary amount of content | |
[^1]: With footnotes!
[^2]: Pseudocode.
<h1> Introduction</h1>
<p>
<h2> A Small File</h2>
</p>
<p>
This is a <i>small</i> file. It contains – neigh – requires the program to correctly translate a variety of different Obsidian Markdown elements into the HTML elements I want.
</p>
<p>
<figure class="image">
<img src="/directory_name/image_name.png">
</figure>
</p>
<p>
For example:
</p>
<p>
<ul>
<li> paragraphs<a id="footnote–anchor–1" href="#footnote–1">[1]</a></li>
<li> "0 < 1"</li>
<li> "2 > 1"</li>
<li> <b>and</b></li>
<li> <i><b>headings</b></i></li>
<li> <code>Code blocks</code></li>
</ul>
</p>
<p>
<pre><code>
fn removeCharacterFromList(remList list, charToRemove char) list {
match remList {
case x::[]:
match x {
charToRemove: []
_: x
}
case x::xs:
match x {
charToRemove: removeCharacterFromList(xs, charToRemove)
_: x::removeCharacterFromList(xs, charToRemove)
}
}
}
removeCharacterFromList(['a', 'b', 'c'], 'a')
</code></pre>
</p>
<p>
<h2> A table conclusion</h2>
</p>
<p>
Another footnote.<a id="footnote–anchor–2" href="#footnote–2">[2]</a>
</p>
<p>
<table class="table is–hoverable">
<thead>
<tr>
<th> A table </th>
<th> must have </th>
<th> columns </th>
</tr>
</thead>
<tbody>
<tr>
<td> and rows. </td>
<td> which may have an arbitrary amount of content </td>
<td> </td>
</tr>
</tbody>
</table>
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
With footnotes!
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
Pseudocode.
</p>
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:256: TestConvertMarkdownFileToBlogHTML test number: 43
Test name: integration test: a small file
expected:
#...
but got:
<h1> Introduction</h1>
<p>
<h2> A Small File</h2>
</p><p>
</p><p>
This is a <i>small</i> file. It contains – neigh – requires the program to correctly translate a variety of different Obsidian Markdown elements into the HTML elements I want.
</p>
<p>
<figure class="image">
<img src="/directory_name/image_name.png
For example:
– paragraphs^1
– "0 < 1"
– "2 > 1"
– **and**
– ***headings***
– `Code blocks`
```Pseudocode
fn removeCharacterFromList(remList list, charToRemove char) list {
match remList {
case x:::
match x {
charToRemove:
_: x
}
case x::xs:
match x {
charToRemove: removeCharacterFromList(xs, charToRemove)
_: x::removeCharacterFromList(xs, charToRemove)
}
}
}
removeCharacterFromList('a', 'b', 'c', 'a')
```
## A table conclusion
Another footnote.^2
| A table | must have | columns |
|––|––|––|
| and rows. | which may have an arbitrary amount of content | |
^1: With footnotes!
^2: Pseudocode.">
</figure>
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Well, something is going wrong with the image tag.
sb.WriteString("<img src=\"" + imageDirectoryName + "/" + imageNameAndExtension.String() + "\">")
The imageNameAndExtension variable is reading until the end. Let's add an additional way to break out of the loop.
for nextR != '\n' {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
if nextR != '[' && nextR != ']' {
imageNameAndExtension.WriteRune(nextR)
}
if nextR == ']' {
_, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read last ] of an image:", err)
}
break
}
}
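A possible reason the loop condition alone never stopped the read is worth sketching. This assumes, as the condition suggests, that nextR is declared before the loop: the := inside the body then declares a new nextR for each iteration, so the outer variable tested by the condition is never updated and the loop can only end through a break. The two functions below are hypothetical and only illustrate the difference; they are not the post's code.

```go
package main

import (
	"bytes"
	"fmt"
)

// readShadowed mirrors the shape of the loop above. The := declares a new
// nextR inside the body, so the loop condition never sees the rune read.
func readShadowed(input string) string {
	br := bytes.NewReader([]byte(input))
	var out []rune
	var nextR rune
	for nextR != '\n' {
		nextR, _, err := br.ReadRune() // shadows the outer nextR
		if err != nil {
			break // io.EOF ends the read
		}
		out = append(out, nextR)
	}
	return string(out)
}

// readAssigned uses = so that the outer nextR is updated and the loop
// condition stops the read after the first new line character.
func readAssigned(input string) string {
	br := bytes.NewReader([]byte(input))
	var out []rune
	var nextR rune
	var err error
	for nextR != '\n' {
		nextR, _, err = br.ReadRune()
		if err != nil {
			break
		}
		out = append(out, nextR)
	}
	return string(out)
}

func main() {
	input := "image_name.png\nFor example:"
	fmt.Printf("shadowed: %q\n", readShadowed(input))
	fmt.Printf("assigned: %q\n", readAssigned(input))
}
```

The explicit break on ] used in the post works either way, which is why it fixes the image tag.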
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:256: TestConvertMarkdownFileToBlogHTML test number: 43
Test name: integration test: a small file
expected:
#...
but got:
<h1> Introduction</h1>
<p>
<h2> A Small File</h2>
</p><p>
</p><p>
This is a <i>small</i> file. It contains – neigh – requires the program to correctly translate a variety of different Obsidian Markdown elements into the HTML elements I want.
</p>
<p>
<figure class="image">
<img src="/directory_name/image_name.png">
</figure>
</p><p>
</p><p>
For example:
</p>
<p>
<ul>
<li> paragraphs[^1]</li>
<li> "0 < 1"</li>
<li> "2 > 1"</li>
<li> **and**</li>
<li> ***headings***</li>
<li> Code blocks</li>
</ul>
</p>
<p>
<pre><code>
fn removeCharacterFromList(remList list, charToRemove char) list {
match remList {
case x::[]:
match x {
charToRemove: []
_: x
}
case x::xs:
match x {
charToRemove: removeCharacterFromList(xs, charToRemove)
_: x::removeCharacterFromList(xs, charToRemove)
}
}
}
removeCharacterFromList(['a', 'b', 'c'], 'a')
</code></pre>
</p>
<p>
<h2> A table conclusion</h2>
</p><p>
</p><p>
Another footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
<p>
<table class="table is–hoverable">
<thead>
<tr>
<th> A table </th>
<th> must have </th>
<th> columns </th>
</tr>
</thead>
<tbody>
<tr>
<td> and rows. </td>
<td> which may have an arbitrary amount of content </td>
<td> </td>
</tr>
</tr>
</tbody>
</table><p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
With footnotes!
</p>
<p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
Pseudocode.
</p>
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
That's not too bad. The unordered list, which contains a fair number of untested combinations, is the most egregious failure in this test case. There are some additional paragraph tags for new lines, which seem a little inconsistent.
Let's start with the unordered list. I'll extract the main part to be tested from the last test into its own test case, and comment the small file test out for the moment.
{
name: "an unordered list may contain italics tags, bold tags, and inline code blocks",
input: "For example:\n\n– paragraphs[^1]\n– \"0 < 1\"\n– \"2 > 1\"\n– **and**\n– ***headings***\n– `Code blocks`",
output: "For example:\n<p>\n<ul>\n<li> paragraphs<a id=\"footnote–anchor–1\" href=\"#footnote–1\">[1]</a></li>\n<li> "0 < 1"</li>\n<li> "2 > 1"</li>\n<li> <b>and</b></li>\n<li> <i><b>headings</b></i></li>\n<li> <code>Code blocks</code></li>\n</ul>\n</p>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:261: TestConvertMarkdownFileToBlogHTML test number: 43
Test name: an unordered list may contain italics tags, bold tags, and inline code blocks
expected:
For example:
<p>
<ul>
<li> paragraphs<a id="footnote–anchor–1" href="#footnote–1">[1]</a></li>
<li> "0 < 1"</li>
<li> "2 > 1"</li>
<li> <b>and</b></li>
<li> <i><b>headings</b></i></li>
<li> <code>Code blocks</code></li>
</ul>
</p>
but got:
For example:
<p>
<ul>
<li> paragraphs[^1]</li>
<li> "0 < 1"</li>
<li> "2 > 1"</li>
<li> **and**</li>
<li> ***headings***</li>
<li> `Code blocks`</li>
</ul>
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Superficially, this should be straightforward now that the footnote, italics, and code block logic are all contained in their own functions. The inlineFootnoteNumber variable needs to be accessible to this function as well as the others. I'll move it into the outer scope. Note that this still needs to be initialised to 0 within convertMarkdownFileToBlogHTML to reset it between tests.
var imageDirectoryName = "/directory_name"
var footnoteNumberMap = map[int]int{}
var inlineFootnoteNumber int
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
//...
inlineFootnoteNumber = 0
//...
}
//...
func addFootNote(br *bytes.Reader, sb *strings.Builder) {
//...
}
func addUnorderedList(br *bytes.Reader, sb *strings.Builder) {
//...
for {
nextR, _, err := br.ReadRune()
//...
} else if nextR == '[' {
addFootNote(br, sb)
} else if nextR == '*' {
addItalicsAndOrBoldTags(br, sb)
} else if nextR == '`' {
addCodeBlock(br, sb)
} else {
addRuneOrHTMLEntity(nextR, sb)
lastCharacterWasANewLine = false
}
//...
}
It turns out it was.
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
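Needing to reset a package-level counter between tests hints at an alternative design, sketched here under my own assumptions rather than taken from the code in this post: keep the footnote counters in a small struct and create a fresh one per conversion. The names footnoteState, newFootnoteState, and record are all hypothetical.

```go
package main

import "fmt"

// footnoteState is a hypothetical container for the counters which the post
// keeps at package level.
type footnoteState struct {
	inlineFootnoteNumber int
	footnoteNumberMap    map[int]int
}

// newFootnoteState returns a zeroed state, so nothing leaks between runs.
func newFootnoteState() *footnoteState {
	return &footnoteState{footnoteNumberMap: map[int]int{}}
}

// record notes a footnote's original number in order of appearance and
// returns the number to display for it.
func (s *footnoteState) record(originalNumber int) int {
	s.inlineFootnoteNumber++
	s.footnoteNumberMap[originalNumber] = s.inlineFootnoteNumber
	return s.inlineFootnoteNumber
}

func main() {
	first := newFootnoteState()
	fmt.Println(first.record(7), first.record(3))

	// A second conversion starts counting from one again without a manual reset.
	second := newFootnoteState()
	fmt.Println(second.record(5))
}
```

The trade-off is that the state would have to be passed to every helper which touches footnotes, whereas the package-level variable keeps the function signatures short.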
Now for the additional paragraph tags. These arise after images, and after h2 headings. We'll add test cases for each of these.
# Introduction
![[image_name.png]]
For example:
<h1> Introduction</h1>
<p>
<figure class="image">
<img src="/directory_name/image_name.png">
</figure>
</p>
<p>
For example:
</p>
The most likely cause is an additional or lost new line character. Back in the addImageTags function, we read up to and including the \n and write it into a string buffer. Let's read the additional \n for the empty line.
for nextR != '\n' {
nextR, _, err := br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read rune:", err)
}
if nextR != '[' && nextR != ']' {
imageNameAndExtension.WriteRune(nextR)
}
if nextR == ']' {
_, _, err = br.ReadRune()
if err == io.EOF {
break
}
if err != nil {
log.Fatal("unable to read last ] of an image:", err)
}
break
}
}
_, _, err = br.ReadRune()
if err != nil && err != io.EOF {
log.Fatal("unable to read rune:", err)
}
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:271: TestConvertMarkdownFileToBlogHTML test number: 44
Test name: paragraph tags should be added correctly after an image is added
expected:
<h1> Introduction</h1>
<p>
<figure class="image">
<img src="/directory_name/image_name.png">
</figure>
</p>
<p>
For example:
</p>
but got:
<h1> Introduction</h1>
<p>
<figure class="image">
<img src="/directory_name/image_name.png">
</figure>
</p><p>
For example:
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Well, that produces the correct number of paragraph tags, but now there isn't a new line character to write between them. A workable, though inelegant, solution would be to repurpose the thereIsAnUnorderedListOpen variable as a more general addNewLineCharBeforeOpeningPara.
func convertMarkdownFileToBlogHTML(br *bytes.Reader) string {
//...
addNewLineCharBeforeOpeningPara := false
//...
for {
//...
case '!':
addImageTags(br, &sb)
lastCharacterWasANewLine = true
addNewLineCharBeforeOpeningPara = true
//...
}
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
Now to fix the paragraph tags after an h2 header.
{
name: "paragraph tags should be added correctly after an h2 header",
input: "# Introduction\n\n## A Small File\n\nThis is a *small* file.",
output: "<h1> Introduction</h1>\n<p>\n<h2> A Small File</h2>\n</p>\n<p>\nThis is a <i>small</i> file.\n</p>",
},
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:276: TestConvertMarkdownFileToBlogHTML test number: 45
Test name: paragraph tags should be added correctly after an h2 header
expected:
<h1> Introduction</h1>
<p>
<h2> A Small File</h2>
</p>
<p>
This is a <i>small</i> file.
</p>
but got:
<h1> Introduction</h1>
<p>
<h2> A Small File</h2>
</p><p>
</p><p>
This is a <i>small</i> file.
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
Looking back over the cases, case '#': does not update the lastCharacterWasANewLine variable.
case '#':
addHeaderTagsOrPoundRune(br, &sb, lastCharacterWasANewLine)
lastCharacterWasANewLine = true
Doing so breaks another five tests.
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:276: TestConvertMarkdownFileToBlogHTML test number: 27
Test name: a footnote in a paragraph should have paragraph and anchor tags added correctly
expected:
<h1> This is a heading</h1>
<p>
Here is a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
but got:
<h1> This is a heading</h1><p>
</p><p>
Here is a footnote.<a id="footnote–anchor–1" href="#footnote–1">[1]</a>
</p>
main_test.go:276: TestConvertMarkdownFileToBlogHTML test number: 28
#...
There's an error in addHeaderTags. Here, upon hitting the \n character, the function unreads a rune. It should only have done this if it read the second new line character as well.
func addHeaderTags(br *bytes.Reader, sb *strings.Builder) {
//...
for nextR != '\n' {
//...
} else if nextR == '\n' {
err := br.UnreadRune()
if err != nil {
log.Fatal("unable to unread rune when adding header tags:", err)
}
break
}
//...
}
sb.WriteString("</h" + strconv.Itoa(headerCount) + ">")
}
Removing this, so that the function simply breaks if \n is found, leads to the current tests passing:
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
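The effect of that unread can be seen in isolation. The sketch below is a stand-alone illustration rather than the post's addHeaderTags: nextAfterNewline is a hypothetical function which reads up to the first \n, optionally unreads it, and then reports which rune the caller would see next.

```go
package main

import (
	"bytes"
	"fmt"
)

// nextAfterNewline reads runes up to and including the first '\n'. If unread
// is true the '\n' is pushed back onto the reader. It returns the rune which
// the next ReadRune call yields, or 0 if the input runs out.
func nextAfterNewline(input string, unread bool) rune {
	br := bytes.NewReader([]byte(input))
	for {
		r, _, err := br.ReadRune()
		if err != nil {
			return 0
		}
		if r == '\n' {
			if unread {
				// UnreadRune only undoes the most recent ReadRune.
				if err := br.UnreadRune(); err != nil {
					return 0
				}
			}
			break
		}
	}
	r, _, err := br.ReadRune()
	if err != nil {
		return 0
	}
	return r
}

func main() {
	input := "# This is a heading\nHere is a footnote."
	fmt.Printf("with unread: %q\n", nextAfterNewline(input, true))
	fmt.Printf("without unread: %q\n", nextAfterNewline(input, false))
}
```

With the unread in place, the outer loop sees the same new line character a second time, which is consistent with the extra paragraph tags in the failing output above.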
Time to uncomment and rerun the small file test.
=== RUN TestConvertMarkdownFileToBlogHTML
main_test.go:276: TestConvertMarkdownFileToBlogHTML test number: 43
Test name: integration test: a small file
#...
but got:
#...
<p>
<table class="table is–hoverable">
<thead>
<tr>
<th> A table </th>
<th> must have </th>
<th> columns </th>
</tr>
</thead>
<tbody>
<tr>
<td> and rows. </td>
<td> which may have an arbitrary amount of content </td>
<td> </td>
</tr>
</tr>
</tbody>
</table><p id="footnote–1">
<a href="#footnote–anchor–1">[1]</a>
With footnotes!
</p>
<p id="footnote–2">
<a href="#footnote–anchor–2">[2]</a>
Pseudocode.
</p>
</p>
––– FAIL: TestConvertMarkdownFileToBlogHTML (0.00s)
It still fails. The paragraph closing tag, which should wrap the table, instead also wraps the footnotes at the end.
This was another case where lastCharacterWasANewLine had not been set to true. Correcting this doesn't break any more cases, but doesn't solve the small file test.
case '|':
addTable(br, &sb)
lastCharacterWasANewLine = true
The addTableBody function failed to unread the new line character after the table.
func addTableBody(br *bytes.Reader, sb *strings.Builder) {
//...
if nextR == '|' {
//...
for numOfConsecutiveNewLines < 2 {
//...
if nextR == '\n' {
sb.WriteString("</tr>")
sb.WriteRune('\n')
numOfConsecutiveNewLines++
if numOfConsecutiveNewLines == 2 {
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread new line character after table:", err)
}
break
}
Adding these two parts in closes the paragraph tag immediately after the table. However, it does not solve the test case as a whole. There is now an additional </tr> tag to remove.
The extra </tr> tag can be removed by correcting the logic around the number of consecutive new line characters in addTableBody.
if nextR == '\n' {
numOfConsecutiveNewLines++
if numOfConsecutiveNewLines < 2 {
sb.WriteString("</tr>")
sb.WriteRune('\n')
}
if numOfConsecutiveNewLines == 2 {
err = br.UnreadRune()
if err != nil {
log.Fatal("unable to unread new line character after table:", err)
}
break
}
Now the small test file passes.
=== RUN TestConvertMarkdownFileToBlogHTML
––– PASS: TestConvertMarkdownFileToBlogHTML (0.00s)
PASS
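The corrected ordering (count first, then decide whether to close a row) can be shown on its own. This is a simplified stand-in for the row-closing part of addTableBody, not the function itself: closeRows is a hypothetical name, it works on a string instead of a reader, and it assumes that any rune other than \n resets the counter.

```go
package main

import (
	"fmt"
	"strings"
)

// closeRows writes "</tr>\n" for the new line which ends a row, and stops
// without writing anything when a second consecutive new line (the blank
// line after the table) is found.
func closeRows(input string) string {
	var sb strings.Builder
	numOfConsecutiveNewLines := 0
	for _, r := range input {
		if r != '\n' {
			// Assumption: any other rune resets the count.
			numOfConsecutiveNewLines = 0
			sb.WriteRune(r)
			continue
		}
		numOfConsecutiveNewLines++
		if numOfConsecutiveNewLines == 2 {
			break // blank line: the table has ended
		}
		sb.WriteString("</tr>\n")
	}
	return sb.String()
}

func main() {
	fmt.Print(closeRows("row one\nrow two\n\ntext after the table"))
}
```

Writing the closing tag before incrementing and checking the counter, as in the earlier version, is what produced the stray </tr> on the blank line.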
Reading and Saving to a File
First, I'll create a tmp directory in the project, to store the program's output. I'll put an input.txt file in here, which the program will read from.
– project
  | – main
    | – main.go
    | – main_test.go
  | – tmp
    | – input.txt
Then add a couple of functions to allow the program to read command line arguments, create a byte reader, and save the results to a file. For now, I assume that the entire file can be read into memory. All carriage returns (\r), if they exist, are removed.
func main() {
pathName := os.Args
br := getByteReadForFile(pathName[1])
res := convertMarkdownFileToBlogHTML(br)
saveToFile(res, pathName[2])
}
func getByteReadForFile(pathAndFilename string) *bytes.Reader {
bytesReadIn, err := os.ReadFile(pathAndFilename)
if err != nil {
log.Fatal("unable to find file:", err)
}
// replace carriage returns
bytesReadIn = bytes.ReplaceAll(bytesReadIn, []byte{'\r'}, []byte{})
return bytes.NewReader(bytesReadIn)
}
// See: https://gobyexample.com/writing-files
func saveToFile(res string, outputPathAndFileName string) {
f, err := os.Create(outputPathAndFileName)
if err != nil {
log.Fatal("unable to create file:", err)
}
defer f.Close()
numBytesWritten, err := f.WriteString(res)
if err != nil {
log.Fatal("error when writing to file:", err)
}
fmt.Printf("wrote %d bytes to file", numBytesWritten)
f.Sync()
}
On the command line, whilst in the project directory, run the following command:
go run .\main\main.go .\tmp\input.txt .\tmp\output.html
This will cause the program to read whatever exists in input.txt and write the converted result out to a file called output.html. If the latter doesn't exist yet, then it is created.
Let's update this so that a new value for the imageDirectoryName variable can be passed in as well.
func main() {
pathName := os.Args
br := getByteReadForFile(pathName[1])
res := convertMarkdownFileToBlogHTML(br, pathName[3])
saveToFile(res, pathName[2])
}
func convertMarkdownFileToBlogHTML(br *bytes.Reader, newImageDirectoryName string) string {
//...
imageDirectoryName = newImageDirectoryName
//...
}
This does require an update to main_test.go. We need to feed in the "/directory_name" string, as it is used in the current suite of tests.
for i, tst := range testCases {
res := convertMarkdownFileToBlogHTML(bytes.NewReader([]byte(tst.input)), "/directory_name")
if res != tst.output {
t.Errorf(
"TestConvertMarkdownFileToBlogHTML test number: %d \nTest name: %s \nexpected: \n%s \nbut got: \n%s",
i, tst.name, tst.output, res,
)
}
}
go run .\main\main.go .\tmp\input.txt .\tmp\output.html /the_image_directory_path
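As written, main indexes os.Args directly, so leaving out an argument would end in an index out of range panic. A minimal sketch of a guard is given below. The cliArgs type, the parseArgs function, and the usage text are my own assumptions and are not part of the program above.

```go
package main

import (
	"errors"
	"fmt"
)

// cliArgs is a hypothetical holder for the three positional arguments.
type cliArgs struct {
	inputPath          string
	outputPath         string
	imageDirectoryName string
}

// parseArgs expects the slice in the same shape as os.Args: the program
// name followed by the input path, the output path, and the image directory.
func parseArgs(args []string) (cliArgs, error) {
	if len(args) != 4 {
		return cliArgs{}, errors.New("usage: main <input file> <output file> <image directory>")
	}
	return cliArgs{
		inputPath:          args[1],
		outputPath:         args[2],
		imageDirectoryName: args[3],
	}, nil
}

func main() {
	// In the real program os.Args would be passed in here; a fixed slice
	// keeps this sketch runnable on its own.
	example := []string{"main", "tmp/input.txt", "tmp/output.html", "/the_image_directory_path"}
	parsed, err := parseArgs(example)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", parsed)
}
```

In main, the returned error could be passed to log.Fatal in the same way as the other errors in this program.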
Putting the small file test input into input.txt results in an HTML file. When dropped in a browser, it shows the following.
Forty–Six Tests Later
If anything, it is a little surprising that only forty–six tests were needed to create a program which could convert a small markdown file into HTML. Taking the test–first approach made this relatively quick to develop, however, and ensured that issues which arose during refactoring were caught and fixed. It has also worked well enough for this post to be translated with it!
There were a few missing parts from the initial, slapdash grammar. For example, images were missing. These omissions were identified through creating test cases and added in without too much difficulty. A few potential extensions were not included, and have been listed below.
There should be an easier way to do this, however. There will be a follow–up post should I get it working.
Optional Extensions
A few extensions one could add include:
- hyperlinks,
- recursively allowing bold and italics tags within each other, as well as supporting underscores to indicate bold / italicised text,
- allowing footnotes to be included in tables,
- allowing footnotes to be included in code blocks,
- creating a contents list with the header values.
Code
All of the above code is available on GitHub.
A previous version which used a linked list is also available on GitHub.
Update
This post has been edited to correct a broken link.
Footnotes
[1] Li, Shida, Erica Xu, Steph Ango, Liam Cain, Johannes Theiner, Matthew Meyers, Tony Grosinger, and Rebbecca Bishop. 'Obsidian', 2024. https://obsidian.md/.
[2] Soares dos Santos, Estevão. 'Showdown', 2019. https://showdownjs.com/.
[3] Thomas, Jeremy. 'Bulma.Io', 2024. https://bulma.io/.
[4] Wikipedia. 'Test–Driven Development', 28 January 2024. https://en.wikipedia.org/wiki/Test-driven_development; Beck, Kent. _Test–Driven Development: By Example_. The Addison–Wesley Signature Series. Boston: Addison–Wesley, 2003; Siddiqui, Saleem. _Learning Test–Driven Development: A Polyglot Guide to Writing Uncluttered Code_. Sebastopol, CA: O'Reilly Media, Inc, USA, 2021.
[5] Li, Shida, Erica Xu, Steph Ango, Liam Cain, Johannes Theiner, Matthew Meyers, Tony Grosinger, and Rebbecca Bishop. 'Basic Formatting Syntax', 2024. https://help.obsidian.md/Editing+and+formatting/Basic+formatting+syntax.
[6] Mozilla Corporation. 'CRLF', 2024. https://developer.mozilla.org/en-US/docs/Glossary/CRLF.
[7] 'Strings', 9 January 2024. https://pkg.go.dev/strings#Builder.
[8] dh1tw. 'Append Function Overwrites Existing Data in Slice', 31 October 2016. https://stackoverflow.com/questions/40343987/append-function-overwrites-existing-data-in-slice.
[9] Mozilla Corporation. 'Entity', 2024. https://developer.mozilla.org/en–US/docs/Glossary/Entity.
[10] Thomas, Jeremy. 'Content', 2024. https://bulma.io/documentation/elements/content/.