charsetdetective -- by Eugene Reimer 2009-August

Command-line tools to detect which charset/encoding a file is in:
charsetdetective-2.00.tar.gz
charsetdetect


charsetdetective is based on open-source code from Mozilla.  It works amazingly well even on files in hard-to-distinguish charsets: it is good not only at distinguishing UTF-8 from the 8-bit encodings, but also at distinguishing the different windows-125x and 8859-x encodings from each other.  See the ChangeLog for information on version-1.01, version-2.00, test-results, and a description of the test-suite.

charsetdetect gives two opinions for the price of one.  It combines an algorithmic 4-way classification, in which definitions from the relevant international standards provide provable facts, with the heuristic opinion of charsetdetective.  charsetdetective sometimes reports "unknown" and occasionally gives an incorrect answer, which makes the provable facts an important addition.  Conversely, the "facts" often tell less than the whole story, so by themselves they are not enough.  That is why this program gives both.
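
To make the idea of "provable facts" concrete, here is a minimal Python sketch of one possible 4-way classification.  It is not the code used by charsetdetect, and the four categories are an assumption made for illustration; this page does not say which four classes the real program reports.  The sketch relies only on definitions from the standards: bytes below 0x80 are ASCII, UTF-8 has a strict well-formedness rule, and the ISO 8859-x encodings reserve 0x80-0x9F for C1 control characters, whereas the windows-125x encodings put printable characters there.

```python
def classify(data: bytes) -> str:
    """Return one of four coarse classes for the given bytes.

    The class names are hypothetical and chosen for this sketch only.
    """
    # Fact 1: no byte has the high bit set, so the data is pure ASCII
    # (and therefore valid in every ASCII-compatible encoding).
    if all(b < 0x80 for b in data):
        return "ascii"

    # Fact 2: high-bit bytes are present and the whole input is
    # well-formed UTF-8, as checked by Python's strict decoder.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Fact 3: not UTF-8, and some byte falls in 0x80-0x9F.  In ISO 8859-x
    # that range holds C1 controls, so ordinary 8859-x text is unlikely.
    if any(0x80 <= b <= 0x9F for b in data):
        return "8-bit-with-c1"

    # Fact 4: not UTF-8, and every high-bit byte is in 0xA0-0xFF, which
    # is consistent with both the 8859-x and the windows-125x families.
    return "8-bit-no-c1"


if __name__ == "__main__":
    import sys

    # Classify each file named on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, classify(f.read()))
```

Facts of this kind cannot tell, say, 8859-1 from 8859-2, which is where a heuristic opinion such as that of charsetdetective is needed.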



Send your questions, suggestions, corrections to ereimer@shaw.ca.