New Adventures in HiFi Text

2014 Addendum

This post is very old, most of the links are dead, and none of it is necessary anymore thanks to projects like Pandoc, which can take Markdown and use it to create LaTeX and a dozen other types of files.

When I wrote this in 2006 there was no Pandoc. I guess I was a pioneer in some sense.

But yeah, there are much easier ways to do this now. I leave it up because Daring Fireball linked to it and to this day I get traffic looking for it.

What I stand by 100% is the philosophy and usefulness of plain text. It’s all I use to store data — usually in flat files, as described here, sometimes in databases. Whichever the case, plain text lasts, end of story. I also stand by my pronunciation of LaTeX. I know La Tech is correct, but I’ll be damned if I’m going to say it that way.

In praise of plain text

I sometimes bitch about Microsoft Word in this piece, but let me be clear that I do not hate Windows or Microsoft, nor am I a rabid fan of Apple. In fact prior to the advent of OS X, I was ready to ditch the platform altogether. I could list as many crappy things about Mac OS 7.x-9.x as I can about Windows. Maybe even more. But OS X changed that for me; it’s everything I was looking for in an OS and very little I wasn’t. But I also don’t think Microsoft is inherently evil and Windows is their plan to exploit the vulnerable masses. I mean really, do you think this guy or anything he might do could be evil? Of course not. I happen to much prefer OS X, but that’s just personal preference and computing needs. I use Windows all the time at work and I don’t hate it, it just lacks a certain je ne sais quoi. [2014 update: These days I use Debian Linux because it just works better than any other OS I have used.]

That said, I have never really liked Microsoft Word on any platform. It does all sorts of things I would prefer that it didn’t, such as capitalize URLs while I’m typing or automatically convert email addresses to live links. Probably you can turn these sorts of things on and off in the preferences, but that’s not the point. I dislike the way Word approaches me, assuming that I want every bell and whistle possible, including a shifty looking paperclip with Great Gatsbyesque eyes watching my every move.

Word has too many features and yet fails to implement any of them with much success. Since I don’t work in an office environment, I really don’t have any need for Word (did I mention it’s expensive and crashes with alarming frequency?). I write for a couple of magazines here and there, post things on this site, and slave away at the mediocre American novel, none of which requires me to use MS Word or the .doc format. In short, I don’t need Word.

Yet for years I used it anyway. I still have the copy I purchased in college and even upgraded when it became available for OS X. But I used it mainly out of ignorance of the alternatives, rather than the usefulness of the software. I can now say I have tried pretty much every office/word processing program that’s available for OS X and I only like one of them: Mellel. But aside from that one, I’ve concluded I just don’t like word processors (including Apple’s new Pages program).

These days I do my writing in a text editor, usually BBEdit. Since I’ve always used BBEdit to write code, it was open and ready to go. Over time I noticed that when I wanted to jot down some random idea I turned to BBEdit rather than opening up Word. It started as a convenience thing and just sort of snowballed from there. Now I’m really attached to writing in plain text.

In terms of archival storage, plain text is an excellent way to write. If BareBones, the makers of BBEdit, went bankrupt tomorrow I wouldn’t miss a beat because just about any program out there can read my files. As a file storage format, plain text is almost totally platform independent (I’m sure someone has got a text editor running on their PS2 by now), which makes plain text fairly future proof (and if it’s not then we have bigger issues to deal with). Plain text is also easy to mark up for web display: a couple of <p> tags, maybe a link here and there, and we’re on our way.

In praise of formatted text

But there are some drawbacks to writing in plain text: it sucks for physical documents. No one wants to read printed plain text. Because plain text must be single spaced, printing renders some pretty small text with no room to make corrections, which is less than ideal for editing purposes. Sure, I could adjust the font size and whatnot from within BBEdit’s preferences, but I can’t get the double spacing, which is indispensable for editing, but a waste of space when I’m actually writing.

Of course this may be peculiar to me. It may be hard for some people to write without having the double-spaced screen display. Most people probably look at what they’re writing while they write it. I do not. I look at my hands. Not to find the keys, but rather with a sort of abstract fascination. My hands seem to know where to go without me having to think about it; it’s kind of amazing and I like to watch it happen. I could well be thinking about something entirely different from what I’m typing, and staring down at my hands produces a strange realization: wow, look at those fingers go, I wonder how they know what they’re doing? I’m thinking about the miraculous way they seem to know what they’re doing, rather than what they’re actually doing. It’s highly likely that this is my own freakishness, but it eliminates the need for nicely spaced screen output (and introduces the need for intense editing).

But wait, let’s go back to that earlier part where I said it’s easy to mark up plain text for the web. What if it were possible to mark up plain text for print? Now that would be something.

The Best of Both Worlds (Maybe)

In fact there is a markup language for print documents. Unfortunately it’s pretty clunky. It goes by the name TeX, the terseness of which should make you think: ah, Unix. But TeX is actually really wonderful. It gives you the ability to write in plain text and use an, albeit esoteric and awkward, syntax to mark it up. TeX can then convert your document into something very classy and beautiful.

Now prior to the advent of Adobe’s ubiquitous PDF format I have no idea what sort of things TeX produced, nor do I care, because PDF exists and TeX can leverage it to render printable, distributable, cross-platform, open standard and, most importantly, really good looking documents.

But first let’s deal with the basics. TeX is convoluted, ugly, impossibly verbose and generally useless to anyone without a computer science degree. Recognizing this, some reasonable folks came along and said, hey, what if we wrote some simple macros to access this impossibly verbose, difficult to comprehend language? That would be marvelous. And so some people did and called the result LaTeX because they were nerd/geeks and loved puns and the shift key. Actually I am told that LaTeX is pronounced Lah Tech, and that TeX should not be thought of as tex, but rather the Greek letters tau, epsilon and chi. This is all good and well if you want to convince people you’re using a markup language rather than checking out fetish websites, but the word is spelled latex and will be pronounced laytex as long as I’m the one saying it. (Note to Bono: Your name is actually pronounced bo know. Sorry, that’s just how it is in my world.)

So, while TeX may do the actual work of formatting your plain text document, what you actually use to mark up your documents is called LaTeX. I’m not entirely certain, but I assume that the packages that comprise LaTeX are simple interfaces that take basic input shortcuts and then tell TeX what they mean. Sort of like what Markdown does in converting text to HTML. Hmmm. More on that later.

Installation and RTFM suggestions

So I went through the whole unixy rigamarole of installing packages in usr/bin/ and other weird directories that I try to ignore and got a lovely little Mac OS X-native front end called TeXShop. Here is a link to the detailed instructions for the LaTeX/TeX setup I installed. The process was awkward, but not painful. The instructions comprise only four steps, not as bad as say, um, well, okay, it’s not drag-n-drop, but it’s pretty easy.

I also went a step further because LaTeX in most of its incarnations is pretty picky about what fonts it will work with. If this seems idiotic to you, you are not alone. I thought hey, I have all these great fonts, I should be able to use any of them in a LaTeX document, but no, it’s not that easy. Without delving too deep into the mysterious world of fonts, it seems that, in order to render text as well as it does, TeX needs special fonts, particularly fonts that have specific ligatures included in them. Luckily a very nice gentleman by the name of Jonathan Kew has already solved this problem for those of us using Mac OS X. So I downloaded and installed XeTeX, which is actually a totally different macro processor that runs semi-separately from a standard LaTeX installation (at least I think it is, feel free to correct me if I’m wrong). This link offers more information on XeTeX.

So then I read the fucking manual and the other fucking manual (which should be on your list of best practices when dealing with new software or programming languages). After an hour or so of tinkering with pre-made templates developed by others, and consulting the aforementioned manuals, I was actually able to generate some decent looking documents.

But the syntax for LaTeX is awkward and verbose (remember — written to avoid having to know an awkward and verbose syntax known as TeX). Would you rather write this:

\section{Heading}
\font\a="Bell MT" at 12pt   
\a some text some text some text some text, for the love of god I will not use latin sample text because damnit I am not roman and do not like fiddling. \href{http://www.linkaddress.com}{some link text} to demonstrate what a link looks like in XeTeX. \verb#here is a line of code# to show what inline code looks like in XeTeX. And some more text because I still won't stoop, yes I said stoop, to Latin.

Or this:

###Heading
some text some text some text some text, for the love of god I will not use latin sample text because damnit I am not roman and do not like fiddling. [some link text][1] to demonstrate what a link looks like in Markdown. `here is a line of code` to show what inline code looks like in Markdown. And some more text because I still won't stoop, yes I said stoop, to Latin.

In simple terms of readability, John Gruber’s Markdown (the second sample code) is a stroke of true brilliance. I can honestly say that nothing has changed my writing style as much since my parents bought one of these newfangled computer thingys back in the late ’80s. So, with no more inane hyperbole, let’s just say I like Markdown.

LaTeX on the other hand shows its age like the denture-baring ladies of a burlesque revival show. It ain’t sexy. And believe me, my sample is the tip of the iceberg in terms of markup.

Using Perl and AppleScript to generate XeTeX

Here’s where I get vague, beg personal preferences, hint at a vast undivulged knowledge of AppleScript (not true, I just use the “start recording” feature in BBEdit) and simply say that, with a limited knowledge of Perl, I was able to rewrite Markdown, combine that with some AppleScripts to call various grep patterns (LaTeX must escape certain characters, most notably $ and &) and create a BBEdit Text Factory which combines the first two elements to generate LaTeX markup from a Markdown-syntax plain text document. And no, I haven’t been reading Proust, I just like long, parenthetically-aside sentences.

Yes all of the convolution of the preceding sentence allows me to, in one step, convert this document to a latex document and output it as a PDF file. Don’t believe me? Download this article as a PDF produced using LaTeX. In fact it’s so easy I’m going to batch process all my entries and make them into nice looking PDFs which will be available at the bottom of the page.

Technical Details

I first proposed this idea of using Markdown to generate LaTeX on the BBEdit mailing list and was informed that it would be counter-productive to the whole purpose and philosophy of LaTeX. While I sort of understand this guidance, I disagree.

I already have a ton of documents written with Markdown syntax. Markdown is the most minimal syntax I’ve found for generating HTML. Why not adapt my existing workflow to generate some basic LaTeX? See, I don’t want to write LaTeX documents; I want to write text documents with Markdown syntax in them and generate HTML and PDF from the same initial document. Then I want to revert the initial document back to its original form and stash it away on my hard drive.

I simply wanted a one step method of processing a Markdown syntax text file into XeTeX to complement the one step method I already have for turning the same document into HTML.

Here’s how I do it. I modified Markdown to generate what LaTeX markup I need, i.e. specific definitions for list elements, headings, quotes, code blocks etc. This was actually pretty easy, and keep in mind that I have never gotten beyond a “hello world” script in Perl. Kudos to John Gruber for copious comments and very logical, easy to read code.
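To give a flavor of what that kind of modification amounts to, here is a minimal sketch in Python rather than Perl. It is not my modified Markdown.pl and it handles only headings, inline code and inline links. The function name markdown_to_latex and the mapping of #, ## and ### onto \section, \subsection and \subsubsection are assumptions made purely for illustration.

```python
import re

# Assumed mapping from Markdown heading depth to LaTeX sectioning commands.
HEADING_COMMANDS = {1: "section", 2: "subsection", 3: "subsubsection"}


def markdown_to_latex(text: str) -> str:
    """Convert a tiny subset of Markdown (headings, inline code, inline links)."""
    converted = []
    for line in text.splitlines():
        # Headings: one to three leading # characters, optional trailing #s.
        heading = re.match(r"^(#{1,3})(?!#)\s*(.+?)\s*#*$", line)
        if heading:
            command = HEADING_COMMANDS[len(heading.group(1))]
            converted.append(f"\\{command}{{{heading.group(2)}}}")
            continue
        # Inline code: `code` becomes \verb#code# (code containing # is skipped).
        line = re.sub(r"`([^`#]+)`", r"\\verb#\1#", line)
        # Inline links: [text](url) becomes \href{url}{text}.
        line = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r"\\href{\2}{\1}", line)
        converted.append(line)
    return "\n".join(converted)


if __name__ == "__main__":
    sample = "###Heading\nSee [some link text](http://www.linkaddress.com) and `a line of code`."
    print(markdown_to_latex(sample))
```

Real Markdown has many more cases (lists, block quotes, reference-style links, nested emphasis), which is why starting from Gruber’s existing code was so much easier than starting from scratch.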

That’s all good and well, but then there are some other things I needed to do to get a usable TeX version of my document. For instance certain characters need to be escaped, like the ones mentioned above. Now if I were more knowledgeable about Perl I would have just added these to the Markdown file, but rather than wrestle with Perl I elected to use grep via BBEdit. So I crafted an AppleScript that first parsed out things like &mdash; and replaced them with the Unicode equivalent, which is necessary to get an em-dash in XeTeX (in a normal LaTeX environment you would use --- to generate an em-dash). Other things like quote marks, curly brackets and ampersands are similarly replaced with their XeTeX equivalents (for some it’s Unicode, others like { or } must be escaped like so: \{).
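The escaping step can be sketched roughly like this. Again this is Python, not the grep patterns I actually run through BBEdit, and the names (escape_for_xetex, LATEX_SPECIALS, HTML_ENTITIES) are made up for the example. The tables are not exhaustive, and the sketch assumes it is applied to plain prose, not to text that already contains LaTeX commands (which it would happily mangle).

```python
import re

# LaTeX special characters and their escaped forms (not an exhaustive list).
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "$": r"\$",
    "%": r"\%",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
}

# A few HTML entities and the Unicode characters XeTeX can take directly.
HTML_ENTITIES = {
    "&mdash;": "\u2014",
    "&ndash;": "\u2013",
    "&amp;": "&",
}

_SPECIALS_RE = re.compile("|".join(re.escape(char) for char in LATEX_SPECIALS))


def escape_for_xetex(text: str) -> str:
    """Swap HTML entities for Unicode, then escape LaTeX special characters."""
    # Entities go first so the & inside "&mdash;" is not escaped on its own.
    for entity, char in HTML_ENTITIES.items():
        text = text.replace(entity, char)
    # One regex pass, so braces introduced by \textbackslash{} are not re-escaped.
    return _SPECIALS_RE.sub(lambda match: LATEX_SPECIALS[match.group(0)], text)


if __name__ == "__main__":
    print(escape_for_xetex("Fish &amp; chips cost $5 &mdash; {cheap}"))
```

Doing the escaping in a single regex pass matters: a chain of separate find-and-replace steps can end up escaping the backslashes and braces that an earlier step just inserted.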

Next I created a BBEdit Text Factory to call these scripts in the right order (for instance I need to replace quote marks after running my modified Markdown script, since Markdown will use quotes to identify things like URL title tags, which my version simply discards). Then I created an AppleScript that calls the Text Factory and then applies a BBEdit glossary item to the resulting (selected) text, which adds all the preamble TeX definitions I use and then passes that whole code block off to XeTeX via TeXShop and outputs the result in Preview.
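Why the order matters is easier to see in code than in prose. The following is a small Python sketch, not the Text Factory itself; convert_links, curl_quotes and run_pipeline are hypothetical stand-ins for the real steps. The link step needs to see straight quotes to find (and discard) a link title, so the quote replacement has to run after it.

```python
import re
from typing import Callable, Iterable

Step = Callable[[str], str]


def convert_links(text: str) -> str:
    # [text](url "title") becomes a LaTeX href; the optional title is discarded.
    pattern = r'\[([^\]]+)\]\((\S+?)(?:\s+"[^"]*")?\)'
    return re.sub(pattern, r"\\href{\2}{\1}", text)


def curl_quotes(text: str) -> str:
    # A straight quote at the start of a line, or after whitespace or an
    # opening bracket, is treated as an opening quote ...
    text = re.sub(r'(^|[\s(\[])"', "\\1\u201c", text, flags=re.MULTILINE)
    # ... and every straight quote that remains is treated as a closing quote.
    return text.replace('"', "\u201d")


def run_pipeline(text: str, steps: Iterable[Step]) -> str:
    """Apply each step to the text, in the order given."""
    for step in steps:
        text = step(text)
    return text


if __name__ == "__main__":
    sample = 'She said "hello" and linked [my site](http://example.com "My Title").'
    print(run_pipeline(sample, [convert_links, curl_quotes]))  # link converted
    print(run_pipeline(sample, [curl_quotes, convert_links]))  # link left alone
```

Run the two steps in the wrong order and the title’s quotes have already been curled by the time the link step looks for them, so the link never gets converted.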

Convoluted? Yes. But now that it’s done and assigned a shortcut key it takes less than two seconds to generate a pdf of really good looking (double spaced) text. The best part is if I want to change things around, the only file I have to adjust is the BBEdit glossary item that creates the preamble.
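For the curious, the preamble idea looks something like the sketch below. This is not my actual glossary item: it assumes XeLaTeX with the fontspec, setspace and hyperref packages, and the function name wrap_document, the 12pt article class and the default font are choices made for the example.

```python
def wrap_document(body: str, font: str = "Bell MT", double_spaced: bool = True) -> str:
    """Wrap converted body text in a minimal XeLaTeX preamble and document environment."""
    # setspace provides \doublespacing and \singlespacing.
    spacing = r"\doublespacing" if double_spaced else r"\singlespacing"
    lines = [
        r"\documentclass[12pt]{article}",
        r"\usepackage{fontspec}  % lets XeLaTeX use installed system fonts",
        r"\usepackage{setspace}  % line spacing commands",
        r"\usepackage{hyperref}  % clickable links via \href",
        r"\setmainfont{" + font + "}",
        spacing,
        r"\begin{document}",
        body,
        r"\end{document}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Editing copy: double spaced. Pass double_spaced=False for a reading copy.
    print(wrap_document("Some text some text some text."))
```

Because everything that controls the look of the page lives in this one wrapper, switching from an editing copy to a reading copy is a one-argument change and the body text never has to be touched.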

The only downside is that to undo the various manipulations wrought on the original text file I have to hit the undo command five times. At some point I’ll sit down and figure out how to do everything using Perl and then it will be a one step undo just like regular Markdown. In the meantime I just wrote a quick AppleScript that calls undo five times :)

Am I insane?

I don’t know. I’m willing to admit to esoteric and when pressed will concede stupid, but damnit I like it. And from initial install to completed workflow we’re only talking about six hours, most of which was spent poring over LaTeX manuals. Okay yes, I’m insane. I went to all this effort just to avoid an animated paperclip. But seriously, that thing is creepy.

Note of course that my LaTeX needs are limited and fairly simple. I wanted one version of my process to output a pretty simple double spaced document for editing. Then I whipped up another version for actual reading by others (single spaced, nice margins and header etc). I’m a humanities type, I’m not doing complex math equations, inline images, or typesetting an entire book with table of contents and bibliography. Of course even if I were, the only real change I would need to make is to the LaTeX preamble template. Everything else would remain the same, which is pretty future proof. And if BBEdit disappears and Apple goes belly up, well, I still have plain text files to edit on my PS457.