I decided to standardize the captions on my photos of living-things. They were a mixture of Common-name only, Scientific-name only, and both kinds of name in Com=Sci style;
the new style puts the Scientific-name in parentheses after the Common-name. I began by constructing a table of name-pairs, which consists of lines
like:
<I>Acorus calamus</I> <b>Sweet flag</b>
<I>Canis lupus</I> <b>Grey wolf</b>
It is a plain-text file, with 2 columns delimited by HTML-like tags. For some species I've used more than one Common-name, and for cases where the
accepted Scientific-name has changed recently, my table has more than one of those too. Multiple names are separated with an equals sign; for example:
<I>Rubus arcticus ssp acaulis=Rubus acaulis</I> <b>Dwarf raspberry=Stemless arctic raspberry=Stemless raspberry</b>
As of 2011-05-07, my sci2com-table contains 1165 such lines, which includes all the species I currently have photos of on my website.
Some Scientific-names are only to genus or family or order.
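To make the table layout concrete, here is a minimal sketch of pulling the two columns out of one line and splitting the alternatives on "=". It assumes every line has exactly the <I>...</I> <b>...</b> shape shown above; the variable names are made up for illustration and are not taken from the actual scripts.

```shell
# Sample line copied from the table description above.
line='<I>Rubus arcticus ssp acaulis=Rubus acaulis</I> <b>Dwarf raspberry=Stemless arctic raspberry=Stemless raspberry</b>'

# Column 1 is the text between <I> and </I>; column 2 is the text between <b> and </b>.
# "|" is used as the s-cmd delimiter so the "/" in the closing tags needs no escaping.
sci=$(printf '%s\n' "$line" | sed 's|^<I>\(.*\)</I> <b>.*</b>$|\1|')
com=$(printf '%s\n' "$line" | sed 's|^<I>.*</I> <b>\(.*\)</b>$|\1|')

# Alternative names within a column are separated by "="; list them one per line.
sci_names=$(printf '%s\n' "$sci" | tr '=' '\n')
com_names=$(printf '%s\n' "$com" | tr '=' '\n')

printf 'Scientific-names:\n%s\n' "$sci_names"
printf 'Common-names:\n%s\n' "$com_names"
```

For this line the sketch lists 2 Scientific-names and 3 Common-names.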
When I began this conversion, my captions were a bit of a mess in another way: in the early years, proper-nouns (place-names, person-names, and species-names) were
written as one word in the "CamelCase" style, because of the way I made the Photos-by-Caption page, which required a single word as the "major-caption".
After going to the use of a colon to separate major- from minor-caption, I became free to use multiple-word names; however, I hadn't finished converting the old
photo-pages, and they still had many CamelCase names. My fix-ER-captions-CamelCase does this standardization, in 2 steps.
First, it inserts a space between any pair of adjacent lowercase+uppercase letters. For a place-name or person-name that's all that's needed (except for McSomething names,
where the CamelCase form is the norm, so no space is added after a "Mc"). For plant- and animal-names, however, we also want to revise the capitalization, and that's
the 2nd step of fix-ER-captions-CamelCase.
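The two steps can be sketched roughly as follows. This is an illustration only, not the actual fix-ER-captions-CamelCase script: the function name split_camel and the sample captions are hypothetical, and the way "Mc" is handled here (split everywhere, then re-join) is an assumption about one possible way to get the described result. LC_ALL=C is set so that [a-z] and [A-Z] mean plain ASCII ranges.

```shell
# Step 1 (sketch): put a space between adjacent lowercase+uppercase letters,
# then re-join any "Mc Something" back into "McSomething".
split_camel() {
  LC_ALL=C sed -e 's/\([a-z]\)\([A-Z]\)/\1 \2/g' \
               -e 's/^Mc \([A-Z]\)/Mc\1/' \
               -e 's/\([^A-Za-z]\)Mc \([A-Z]\)/\1Mc\2/g'
}

species=$(printf '%s\n' 'GreyWolf' | split_camel)
person=$(printf '%s\n' 'JohnMcDonald' | split_camel)

# Step 2 (sketch): for species-names, fix the capitalization with the
# per-species s-cmd that is generated from the sci2com-table.
species_fixed=$(printf '%s\n' "$species" | sed 's/[gG]rey [wW]olf/Grey wolf/')

printf '%s\n' "$species" "$person" "$species_fixed"
```

With these samples, step 1 gives "Grey Wolf" and "John McDonald", and step 2 turns "Grey Wolf" into "Grey wolf".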
Then I used the early version of fix-ER-captions-sci2com-UNDO, followed by fix-ER-captions-sci2com-DO: the first converts "Com=Sci" to "Com"; the second converts "Com" to
"Com (Sci)" and "Sci" to "Com (Sci)". Both scripts were originally written for "Com=Sci" name-pairs, then converted to the "Com (Sci)" style.
Both of these, as well as the 2nd step of fix-ER-captions-CamelCase, use the same approach:
a sed s-cmd rearranges a line of the sci2com-table into a sed s-cmd that makes the desired change to a caption. In other words, the lines of sci2com-table are piped
through sed, then piped into a sed-inplace that modifies all photo-webpages.
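Here is a minimal sketch of that sed-writes-sed idea, using the UNDO form as the generated command. It is not the actual script: the generator s-cmd, the variable names, and the sample caption line are assumptions made for illustration, it handles only a table line with one name per column, and it edits a string rather than running in-place over the photo-webpages.

```shell
table_line='<I>Canis lupus</I> <b>Grey wolf</b>'

# Stage 1: rearrange the table line into a caption-editing s-cmd.
# In the replacement, "\\" emits a literal backslash, \1 is the
# Scientific-name and \2 is the Common-name.
undo_cmd=$(printf '%s\n' "$table_line" |
  sed 's|^<I>\(.*\)</I> <b>\(.*\)</b>$|s/\\([ "+=]\\)\2 (\1)/\\1\2/|')

# Stage 2: run the generated s-cmd on a (hypothetical) caption line.
caption='<img src="wolf.jpg" title="Grey wolf (Canis lupus): resting">'
result=$(printf '%s\n' "$caption" | sed "$undo_cmd")

printf '%s\n' "$undo_cmd" "$result"
```

The first line printed is the generated s-cmd; the second is the caption line with " (Canis lupus)" removed.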
The s-cmd that modifies captions uses LH-context and RH-context to match only the cases that need revision, and avoid ones that don't.
The "context" is a single character in most cases; however, the 1st s-cmd (the "Com-->Com(Sci)" s-cmd) in the "DO" script needs a more complex RH-context, to match
either (1) a space plus a character-other-than-Leftparen, or (2) Colon|Quote|Query|Plus.
Here are all the caption-modifying s-cmds for the "<I>Canis lupus</I> <b>Grey wolf</b>" line of the table:
line from sci2com-table: <I>Canis lupus</I> <b>Grey wolf</b>
is modified to (CamelCase): s/[gG]rey [wW]olf/Grey wolf/
is modified to (UNDO): s/\([ "+=]\)Grey wolf (Canis lupus)/\1Grey wolf/
is modified to (DO cmd-1): s/\([ "+=]\)Grey wolf\( [^(]\|[:"?+]\)/\1Grey wolf (Canis lupus)\2/
is modified to (DO cmd-2): s/\([ "+=]\)Canis lupus\([ :"?+]\)/\1Grey wolf (Canis lupus)\2/
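To see what the RH-context in DO cmd-1 buys, the sketch below runs that s-cmd over three sample captions. The captions and variable names are hypothetical; the s-cmd itself is the DO cmd-1 line from the list above. It assumes GNU sed, because "\|" for alternation in a basic regular expression is a GNU extension.

```shell
do_cmd1='s/\([ "+=]\)Grey wolf\( [^(]\|[:"?+]\)/\1Grey wolf (Canis lupus)\2/'

# Common-name followed by a colon: RH-context case (2), so "(Sci)" is added.
bare=$(printf '%s\n' 'title="Grey wolf: resting"' | sed "$do_cmd1")

# Common-name followed by space + ordinary character: RH-context case (1).
followed=$(printf '%s\n' 'title="Grey wolf at dusk"' | sed "$do_cmd1")

# Common-name already followed by " (": neither case matches, so the
# caption is left alone and "(Sci)" is not added a second time.
already=$(printf '%s\n' 'title="Grey wolf (Canis lupus): resting"' | sed "$do_cmd1")

printf '%s\n' "$bare" "$followed" "$already"
```

The third caption comes out unchanged, which is what lets the DO script be re-run safely on pages that were already converted.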
One detail not shown in the s-cmds above: each of them, except for the CamelCase one, has "/title=/" preceding the s-cmd, so it revises only caption-lines.
(Some photo-pages also contain prose, and I decided against applying these revisions to the prose.)
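A small sketch of the effect of the "/title=/" address, using the UNDO s-cmd for Grey wolf. The two-line sample page and the variable names are made up for illustration.

```shell
# The /title=/ address restricts the s-cmd to lines that contain "title=".
cmd='/title=/s/\([ "+=]\)Grey wolf (Canis lupus)/\1Grey wolf/'

# Hypothetical page fragment: one caption-line and one line of prose.
page='<img src="wolf.jpg" title="Grey wolf (Canis lupus)">
<p>The Grey wolf (Canis lupus) was far away.</p>'

out=$(printf '%s\n' "$page" | sed "$cmd")
printf '%s\n' "$out"
```

Only the caption-line is revised; the prose line keeps its "(Canis lupus)" even though the s-cmd on its own would have matched it.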
Here are the scripts:
Send your questions, suggestions, corrections to ereimer@shaw.ca.