Markdown throwdown: what happens when FOSS software gets corporate backing?

This article was originally published in Ars Technica; you can view the original there, complete with graphics, comments, and other fun stuff.

Markdown is a Perl script that converts plain text into Web-ready HTML; it’s also a shorthand syntax for writing HTML tags without needing to write the actual HTML. Markdown has been around for a decade now, but it hasn’t seen an update in all that time—nearly unheard of for a piece of software. In that light, the fact that Markdown continues to work at all is somewhat amazing.

Regrettably, “works” and “works well” are not the same thing. Markdown, despite its longevity, has bugs. But here, the software has an advantage. Because it is free and open source software (FOSS) released under a BSD-style license, anyone can fork Markdown and fix those bugs.

Recently, a group of developers set out to fix some of those bugs, creating what they call a “standard” version of Markdown. From a pure code standpoint, the results are great. Yet there was no surplus of gratitude. Instead, the “standard” group found itself at the center of a much larger and very contentious debate, one that’s ultimately about who we want in control of the tools we use.

HTML is for browsers

The Web turned the whole world into writers. Never in the history of the human race have so many people produced so much text. The Web has not, however, turned the whole world into writers of HTML. If writing HTML were a requirement for writing on the Web, very few people would be writing on the Web.

Not that it’s particularly hard to write HTML. Only a small subset of the hundreds of HTML tags actually end up in the average bit of text. Most of the time you can get by with paragraph tags, em, strong, and anchor tags for links. (And of course list tags—where would the modern Web be without list tags?)

In other words, it’s not that hard to write HTML. But it is a pain.

Typing out all those tags creates an extra wall between you and your thoughts. No one wants to put <p> at the start of every paragraph and then </p> at the end; we just want to hit return and keep typing, which is what I did at the end of the previous paragraph. In fact, even though you’re reading this article as a rendered bit of HTML in a webpage, I have not typed a single HTML tag while writing it.

Chances are you posted something on Twitter today, chatted with your friends on Facebook, wrote something on your WordPress blog, posted something to Tumblr, committed a bit of code to GitHub, answered a question on Stack Overflow, or did a hundred other things that ended up rendered in HTML. You most likely did all that without ever actually typing any HTML tags.

Most of the time HTML is hidden by a “rich” text editor, which takes care of creating all the necessary HTML tags for you. WordPress, Tumblr, and other sites not aimed at developers tend to use rich text editors.

But developers and the sites they interact with, on the other hand, often use Markdown.

Markdown co-creator John Gruber at Webstock 2013.

Markdown is for Web writers

Markdown began life as a little Perl script written by John Gruber and Aaron Swartz back in 2004. Gruber had just started writing daringfireball.net and quickly realized that the article-as-a-fragment-of-HTML model that most publishing systems used at the time was lacking. Like most of us, Gruber wanted to edit and preview his writing in the text editor of his choice before pasting that text into the publishing system.

HTML is great at many things, but reading raw HTML is terrible. HTML is a markup language, a second stage presentation format. That is, you want to get words on the Web and so the first stage is to type those words; the second is to add HTML so they look the way you intended in a browser. No one wants to read, let alone try to edit, text when it’s littered with HTML tags.

Gruber and Swartz wanted to write first and convert to HTML later, which is what Markdown allows you to do. Gruber and Swartz came up with a shorthand syntax for common HTML elements. Markdown then parses your text, finds those shorthand markers, and replaces them with HTML tags. It also automatically wraps your paragraphs in <p> tags (you just need to leave a blank line between them).

Markdown is not an all-or-nothing syntax. You can pick and choose what you want to use. For example, in 10 years of writing in Markdown, I have never used its image syntax. For me, Markdown’s image syntax is no easier to read or simpler to type than an HTML <img> tag, so I just use the tag.

Markdown is something you can make your own, which is one of its great strengths. Don’t like the inline link syntax? Use the reference syntax, or just write your links in HTML. Markdown is very flexible—perhaps too flexible.

Markdown was not the first text-to-HTML converter, but it was simple and took most of its shorthand syntax from the real world. It mimicked informal styles that emerged when people tried to overcome the limitations of plain text—writing styles that grew into conventions in e-mail, IRC, and Usenet.

For example, if you surround a word with asterisks, Markdown wraps it in <em> tags in the HTML output, which means it’s (usually) italicized. Surround a word with double asterisks and it gets wrapped in <strong> tags and displayed in a bold font.
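
As a rough illustration of the conversion just described, here is a minimal Python sketch. It is not Gruber's Perl script, and it handles only blank-line-separated paragraphs, *emphasis*, and **strong**; the function name toy_markdown and its regular expressions are assumptions made up for this example.

```python
import re


def toy_markdown(text: str) -> str:
    """Convert a tiny subset of Markdown to HTML (illustration only)."""
    paragraphs = []
    # A blank line (possibly containing whitespace) separates paragraphs.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Handle **strong** before *em* so that double asterisks are not
        # mistaken for two single-asterisk markers.
        block = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", block)
        block = re.sub(r"\*(.+?)\*", r"<em>\1</em>", block)
        # Any raw HTML in the block is passed through untouched.
        paragraphs.append(f"<p>{block}</p>")
    return "\n".join(paragraphs)


print(toy_markdown("This is *important*.\n\nThis is **very** important."))
```

A real converter has to deal with lists, links, code, escaping, and nesting, which is exactly where the trouble discussed later in this article comes from.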

Dig through old mailing lists, IRC logs, or Usenet postings and you’ll find this style of writing everywhere. Markdown might have been formalized and the parser written by Gruber and Swartz, but much of its language evolved collectively and informally over years of countless people figuring out how to convey meaning effectively in plain text.

Markdown turned out to be wildly successful, particularly among writers who used text editors rather than word processors and who were devoted to the idea that your documents, no matter where they end up, should begin life as a text file. In other words, programmers.

In the last 10 years, Markdown has been forked many times, ported to more than a dozen programming languages, and rolled out on some big, often developer-oriented websites (for example, GitHub and Stack Overflow). Markdown isn’t just popular with developers, though; there are also plugins for every major blogging platform, including WordPress.com.

All that is nice for those of us who grew to depend on Markdown, since it means that we can use the familiar syntax all over the Web.

Where’s the conflict?

The problem with Markdown is that it isn’t entirely clear all the time. There are bugs, but worse there are ambiguities and edge cases where it’s unclear what should happen. Consider Markdown’s list syntax. To create an unordered HTML list in Markdown you write something like this:

* item one
* item two
* item three

Markdown then turns that into this HTML:

<ul>
    <li>item one</li>
    <li>item two</li>
    <li>item three</li>
</ul>

So far, so good. But remember when I said Markdown automatically wraps paragraphs in HTML <p> tags? Okay, so what happens if we do this:

Here's a list of stuff:
* item one
* item two
* item three

There’s no blank line before the list, but any human reader familiar with Markdown would look at this and know there’s supposed to be a list there. That means the parser should close the paragraph tag and start creating a list. Or at least that’s one way to look at it. The parser might also think, well, there’s no blank line so it’s still part of the paragraph, but there are asterisks around “item one” and “item two,” so those should be wrapped in <em> tags.

In fact, depending on which fork of Markdown you use, there are 15 possible ways this snippet of Markdown might be rendered.

This is not an isolated example, either; there are quite a few cases where Markdown is ambiguous. To be clear, there is no real “right” answer. Someone needs to make a decision about which of those 15 possibilities is “right.”
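
To make the ambiguity concrete, the Python sketch below renders the same snippet under two plausible rules. Neither rule is copied from any real Markdown implementation; the render function, its list_needs_blank_line flag, and the naive emphasis handling are all assumptions for illustration, and they cover only two of the many behaviors found in the wild.

```python
import re


def _inline(text: str) -> str:
    """Turn *word* spans into <em> tags (a deliberately naive rule)."""
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


def render(block: str, list_needs_blank_line: bool) -> str:
    """Render one blank-line-delimited block under one of two policies."""
    lines = block.strip().split("\n")
    is_item = [line.startswith("* ") for line in lines]

    # No bullets at all, or bullets that would have to interrupt a
    # paragraph under the stricter policy: emit a single paragraph and
    # treat any asterisks as ordinary inline markers.
    if not any(is_item) or (list_needs_blank_line and not is_item[0]):
        return f"<p>{_inline(' '.join(lines))}</p>"

    # Otherwise the first bullet line ends the paragraph (if there is
    # one) and starts a list.
    first = is_item.index(True)
    para = f"<p>{_inline(' '.join(lines[:first]))}</p>" if first else ""
    items = []
    for line in lines[first:]:
        if line.startswith("* "):
            items.append(line[2:])
        else:
            # A non-bullet line continues the previous item.
            items[-1] += " " + line.strip()
    body = "".join(f"<li>{_inline(item)}</li>" for item in items)
    return f"{para}<ul>{body}</ul>"


snippet = "Here's a list of stuff:\n* item one\n* item two\n* item three"
print(render(snippet, list_needs_blank_line=True))
print(render(snippet, list_needs_blank_line=False))
```

Under the first rule the snippet stays a single paragraph with a stray run of emphasis in the middle; under the second it becomes a paragraph followed by a three-item list. Both are defensible readings of the same input.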

There are plain old bugs in Markdown as well. That’s why when authors port Markdown to other languages they end up creating something slightly different, and you end up with something that can be rendered 15 different ways. That’s not just annoying for programmers trying to roll Markdown into their projects; it’s a huge problem for Web writers, who never really know what’s going to happen when they put some Markdown in a text field.

In a perfect world, Gruber would release an update for Markdown. Perhaps even Markdown 2.0. He might, as Dave Winer has suggested, also move Markdown to some sort of version control system and publicly host the code in such a way that other developers can contribute and improve the code. That is, after all, the point of a FOSS license: allowing others to freely use and modify the code. The easier you make it to contribute, the more people will do so.

Regrettably, we don’t live in that perfect world. Markdown, while widely adopted and widely used, hasn’t seen so much as a bug fix since 2004. There’s nothing wrong with that—it’s certainly Gruber’s right to let Markdown stand as is, but it’s not surprising that other people want to fix the problems and make Markdown better.

That “standard” group of developers referenced earlier made an effort to do just that. They created a fork of Markdown that solves the inconsistencies and edge cases, fixing the bugs. They also offered up two reference implementations and plenty of documentation, all hosted on GitHub. Although not ideal, this at least makes it easier for other developers to contribute.

The “standard” fork might even be able to solve the ambiguities discussed earlier—by consensus even. For example, it solves the earlier is-it-a-list-or-not dilemma by requiring blank lines before lists, a decision made in large part because that’s what the majority of existing parsers do and therefore will be what most users will expect.

That all sounds really nice, right? So why did the project rankle so many developers? Two reasons. First was the name—Standard Markdown.

Were the project not using the Markdown name and simply positioning itself as an entirely new thing, it would quite possibly have been welcomed by the entire Markdown community. But names have power, and names give control. When you use a name, you’re telling the world you don’t want to just improve a thing, you want to control it. Standard Markdown very much wants to be the future of Markdown.

CommonMark's logo. Familiar? (Image: commonmark.org)

Gruber, understandably, did not like the name. He asked the developers to change it and they did. Standard Markdown became CommonMark. That was pretty much the end of the name controversy (though CommonMark could really use a new logo to further distance itself from Markdown).

The far more interesting reason that Standard Markdown, now CommonMark, created a fuss is because of who was behind it—not the individuals, but the companies they represent.

The once and future Web of people

Exploitation of the user is a dominant business model on today’s Web. Whether that’s in the form of data being gathered about you, onerous terms of service you need to abide by, or privacy policies that treat you like a commodity, it’s hard not to feel like everything is designed to turn you into a device for making someone else massive amounts of money. Today’s Web is short on humanity, and that’s something we do need to fix. But the problem is deep and systemic. Fixing it will not happen overnight; it may well not happen at all.

In the meantime, there seems to be a deep sense among developers that what we don’t need is more big companies trying to take over small projects like Markdown. Hence, the second reason for resistance to CommonMark.

Despite the disappointing state of the Web these days, there remain pockets of the Internet that still feel untainted. We jealously guard these spaces, our personal little Fugazis of the Web that we can point to and say, “See, Pinboard.in isn’t taking venture capital,” “Metafilter isn’t manipulating me for an exit,” or “Markdown is still a little script some guy wrote.”

CommonMark, on the other hand, was announced by Jeff Atwood, co-founder of Stack Overflow. Its contributors include developers from GitHub and reddit. It’s unclear to what extent the companies these people represent are involved, but it certainly appears that CommonMark is a project coming out of the very big companies many have learned to distrust.

One of the common arguments leveled at Gruber when he objected to the name Standard Markdown was that there are dozens of other projects using the name Markdown. He did not (publicly anyway) object to those entities—why this one? That is to say, why the apparent hypocrisy?

Gruber initially agreed to talk to Ars for this story, but then did not respond to e-mailed questions. I can’t definitively say why Standard Markdown (or CommonMark) crossed the line for him. But John MacFarlane, creator of the tool Pandoc and the only CommonMark contributor not associated with a Big Internet Corp., told Ars that he first posted the spec to the Markdown mailing list in August, several weeks before making it more widely known. He used the name Standard Markdown and Gruber did not raise any objections at the time.

It was only later, when Atwood announced the project and presented it as an effort backed by some of the biggest industry users of Markdown, that Gruber protested the use of the name.

Now, Gruber was not alone. Plenty of developers balked, ostensibly at the name, but more likely at the name combined with the backers. Developer Dave Winer captured the sentiment nicely when he wrote, “We all use Markdown, not just you and your pals. It isn’t yours to do with as you please.”

The current state of Markdown? "...people will begin to trip on the weeds, and there will be a call for a cleanup."

But it is yours…

Winer is right in one sense: Markdown belongs to everyone who uses it. In a way this is true precisely because Markdown’s license says that anyone may do with it as they please, so long as they don’t use the name Markdown. Doing as you please includes forking the project to move in a different direction. In fact, forking is open source. Names are something else, though.

When Oracle purchased Sun, a group of developers concerned about the future of the MySQL project under Oracle’s leadership forked the code and started a new project. They did not call it Standard MySQL, though. If they had, the MariaDB project most likely would have disappeared under an avalanche of trademark infringement lawsuits. Luckily, the MariaDB developers did the right thing; they chose a fresh start, renaming the project.

The open source world abounds with successful forks. LibreOffice supplanted OpenOffice, Blink is on its way to being used by more projects than WebKit, and WebKit itself completely overshadows the KHTML project. While not all forks are successful, only about 12 percent of them devolve into trademark fights.

Markdown and CommonMark are slightly different since technically CommonMark did not fork the Markdown code but the Markdown syntax. This is much murkier legal territory. Whether or not Markdown’s copyright notice (which applies to derivative works) legally applies to CommonMark is something a judge would have to decide. But legal or no, the name “Standard Markdown” certainly violates the spirit and historical precedent of forking a project.

Changing the name to CommonMark solves the technical problem then, but it allows the bigger problem to go unanswered—who should be allowed to control our tools? Should you go with Markdown or CommonMark?

The free software movement—from which the license governing Markdown is derived—says the answer is any one, or rather anyone who can write code. The license allows anyone to fork and build their own, and it’s all decentralized and open. Except that, as Markdown illustrates, the result of that situation is not always ideal.

MacFarlane likens the current state of Markdown to an untended garden, adding that “it is a predictable result that a garden so tended will become untidy, that people will begin to trip on the weeds, and that there will be a call for a cleanup.” In other words, people want to know that their list will be a list.

The answer to the question of who controls our tools will come in part from us and in part from the services we choose to use. There may be some benefits to CommonMark for developers and for Web writers, but it still has to gain acceptance in the wider world if it has any hope of success.

As Winer writes, “Programmers always underestimate deployment, and think they can wave a magic wand and get everyone to upgrade. It’s actually nothing like that. Once the investment is made, and years have gone by, no one wants to go back and dig out old infrastructure and replace it with something else.”

Stack Overflow, reddit, and GitHub will presumably be moving to CommonMark, which will make it the more familiar version for many users, but unless the CommonMark developers can bring others over to their cause, CommonMark will remain Yet Another Standard.

In the meantime, CommonMark is very much a work in progress. If you have ideas or want to contribute to the project, head on over to GitHub. It might not be Markdown, but it could end up becoming something better (or worse).