A few Thoughts about Automated Tools that I Need -- by Eugene Reimer 2011-Feb-28

Today I noticed yet another instance of a double-dash inside an HTML-comment (in a rant on abortion), with the result that my unhoned rough notes, musings, and amusing (to me) digression about "Last Exit To Brooklyn", which were meant to be semi-private, became visible to everyone using Firefox.  Had I still been doing my webpage-testing with Firefox, the error would've been noticed...  I wrote a simple grep-based checking-tool (programs/webcommentcheck) some time ago, but haven't gotten around to automating the use thereof.  'Tis high time to do so.

(HTML-comments are not private;  anyone using View: Source in his/her browser will see such comments.  I've used these normally invisible comments for several reasons;  for example, when removing the section on the founding of NativeOrchid.Org in its About page I simply enclosed it in HTML-comment-markers, so it would be readily available were it needed again;  it remained in that state for most of a decade, until just recently.)
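
For illustration, here is a minimal sketch of the kind of check meant above: flag any HTML-comment whose body contains a double-dash.  It is not the grep-based programs/webcommentcheck, merely a Python stand-in;  the function name find_bad_comments is hypothetical, and the sketch assumes comments are delimited by a plain "<!--" and the first following "-->".

```python
import re

# Matches from "<!--" to the first following "-->" (non-greedy), across line breaks.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)

def find_bad_comments(html):
    """Return a (line_number, comment_body) pair for each comment containing '--'.

    Hypothetical helper, assumed for illustration only.
    """
    bad = []
    for m in COMMENT.finditer(html):
        body = m.group(1)
        if "--" in body:
            # Line number of the comment's opening "<!--", counting from 1.
            line = html.count("\n", 0, m.start()) + 1
            bad.append((line, body))
    return bad

if __name__ == "__main__":
    sample = "<p>ok</p>\n<!-- fine -->\n<!-- rough notes -- semi-private -->\n"
    for line, body in find_bad_comments(sample):
        print("line %d: double-dash inside comment: %r" % (line, body))
```

Run on every page before upload, something like this would report the offending line instead of leaving it to a browser to reveal.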

Another thing I need to automate: spell-checking.  Recently I've been more writer than programmer, spending about half my at-the-computer time writing rants, blogs, essays, etc (see rants/cancer.htm, my blog-about-everything).  I write HTML in a text-editor, and my text-editor, chosen for what a programmer wants rather than what a writer wants, has no spell-checker.  When recently laughing at myself for using an awkward wording just to avoid a word I can't spell, I decided 'twas time spell-checking be automated.
[Why not use a word-processor, you ask?  Only if you've never looked at the HTML produced by MS-Word or one of its clones would you ask that -- one needs a strong stomach to look at such.  (In Windows, you "look at HTML" by using Open-With Notepad.)  These products generate HTML so hideously complex that anyone looking at it with a view toward learning HTML is immediately scared off, which is a pity since HTML is a very nice markup-language that's remarkably simple and wonderfully easy to learn.]
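
One possible shape for such automation, offered only as a sketch: pull the visible text out of a page with Python's standard html.parser module, then list the words missing from a word-list.  The names TextExtractor and unknown_words are hypothetical, and the word-list (known_words, a set of lowercase words) is assumed to be supplied by the caller;  a real spell-checker does considerably more.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text; tags and comments are dropped, and the
    bodies of <script> and <style> elements are skipped."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0          # nesting depth inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def unknown_words(html, known_words):
    """Return the sorted lowercase words of the page not found in known_words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Words are runs of ASCII letters, optionally with internal apostrophes.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    return sorted({w.lower() for w in words} - set(known_words))

if __name__ == "__main__":
    page = "<p>Teh quick <b>brown</b> fox</p><!-- zzzz -->"
    print(unknown_words(page, {"the", "quick", "brown", "fox"}))
```

Text inside HTML-comments is not checked, since the parser hands comments to a separate handler that this sketch leaves untouched.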

One more minor thing needs to be automated.  My wanting a double-space after each sentence has led to 2 different styles over the years.  When preparing other people's material for web-use, I use a script that's a sed-command to add an NBSP-entity (a people-readable Non-Breaking-Space) after each sentence-ending punctuation character, then manually undo any spuriously added ones after Dots (on abbreviations) as opposed to Periods.  (This has led me to ask whether the convention about Dots on abbreviations does more harm than good, and to conclude that it does.)  However, when typing my own prose, I prefer the NBSP-character over the entity.  My keyboard shortcuts (see programs/.Xmodmap) include Windows-key+Space as a shortcut for Nonbreaking-Space, so for a double-space that behaves as such in HTML I type the first with Windows-key depressed, the 2nd without, which is reasonably convenient.  However, merely typing 2 spaces would be nicer still, and possible with a bit of automation:  a revised webpage (on upload) gets Space+Space replaced by NBSP+Space.  It's so trivial that it took longer to write about than to do;  however there's one problem:  my old pages, written with the knowledge that extra spaces are meaningless in HTML, will acquire spacing anomalies as a result of that systematic replacement (eg: old pages often have 2 spaces after an NBSP-entity).  Aha, the simple safe solution is to space-destutter old pages, except within <PRE>...</PRE> (where all spaces are meaningful).
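
The two steps just described might be sketched as follows;  this is an illustration under stated assumptions, not the script actually used.  The names outside_pre, destutter_spaces and nbsp_double_spaces are hypothetical.  Two choices here are assumptions beyond what is said above: the NBSP replacement also leaves <PRE> blocks alone, and it touches only runs of exactly two spaces.  The sketch does not look inside tags, so doubled spaces within a tag's attributes would be affected too.

```python
import re

NBSP = "\u00a0"   # the NBSP-character itself, not the &nbsp; entity

# One capturing group, so re.split() keeps the PRE blocks in its result.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre\s*>)", re.IGNORECASE | re.DOTALL)

def outside_pre(html, fix):
    """Apply fix() to the text outside <PRE>...</PRE>, leaving PRE blocks as-is."""
    parts = PRE_BLOCK.split(html)
    # The split yields outside-text at even indexes, PRE blocks at odd ones.
    return "".join(p if i % 2 else fix(p) for i, p in enumerate(parts))

def destutter_spaces(html):
    """For old pages: collapse each run of plain spaces to a single space."""
    return outside_pre(html, lambda s: re.sub(r" {2,}", " ", s))

def nbsp_double_spaces(html):
    """On upload: replace Space+Space by NBSP+Space (runs of exactly two only)."""
    return outside_pre(html, lambda s: re.sub(r"(?<! ) {2}(?! )", NBSP + " ", s))

if __name__ == "__main__":
    old_page = "One.&nbsp;  Two.<PRE>a  b</PRE>"
    new_page = "One.  Two.<pre>x  y</pre>"
    print(repr(destutter_spaces(old_page)))
    print(repr(nbsp_double_spaces(new_page)))
```

An old page would be destuttered once;  thereafter the NBSP replacement can run on every upload without producing the anomalies mentioned above.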

Send your suggestions to ereimer@shaw.ca.

[this trivial rant goes to show that my standards are low today:]

[Glossary of words I use in ways not found in dictionaries:

"Destutter" as used above means replacing every run of a repeated character by one such character;  I first encountered "destutter" used that way when chatting with a Comp-Sci prof at U of Waterloo circa 1987 and gathered it was his coinage, however I've forgotten his name;  Gord Cormack introduced us, and I recall that we had an interest in NYSIIS in common;  googling shows that Joe Stoy (http://users.comlab.ox.ac.uk/joe.stoy/tla-website/=tla/TLA.html) uses "destutter" that way, but very few others do;  ergo a definition is needed.

"Stroph" as short for "apostrophe" needs a definition;  I first encountered it in the Algol-68c Programmer's Guide, by Bourne et al, in 1975, however the usage has not become common;  incidentally, "strophing" by that compiler's lexical-analyzer, as a workaround for the limitations of monocase keypunches and printers, is of historical interest though not relevant here. 

"Level of indirection" is something I coined when a student, and used to briefly explain how a compiler could implement dynamic-scoping of names to Gord Cormack, who used it in a paper published circa 1978;  several other people are known for sayings containing this phrase, and I'm gratified to see my little innovation having caught on, although it also seems entirely possible that others "invented" it independently.]