Date: 2004.VI.2.
This file serves as a brief introduction to some fairly general features of the machine-readable texts (e-texts) available on this site. While the texts have been prepared over a number of years (since the later nineteen-eighties), and consistency is therefore hardly to be expected, there are certain fairly constant features that it will be useful for the prospective downloader and user to be aware of. Various commentary files provide a more detailed description for at least some of the texts. The term "text" is used broadly here to include dictionaries and word lists--anything that is in a text format in typical computer usage.
1. The texts are, as far as I can tell, given the bizarre, obscure, and retrogressive state of the current laws, derived from sources that make them free of copyright restrictions. Except where otherwise stated, I have been solely responsible for getting them into their digitized state.
2. Some of the texts derive from an initial use of optical character recognition, followed by a complete reading to remove the misreadings as far as possible, and to reformat the texts in more satisfactory ways. If a text has a square root sign (ASCII 251/FB) followed by "L" at the very start, a mere book-keeping device of mine, it has certainly received a line-by-line reading. I have not been consistent in using this annotation, but I hope that nothing has slipped through into the current material that has not had at least one complete reading of this kind. (This by no means guarantees an absence of errors of typing or proof-reading on my part, of course.) In some texts, the combination of the fallibility of the software, especially in dealing with italic fonts, and of my eyesight has led to more errors than it is pleasant to be confronted with--this is something to watch out for in the version of Trumbull's Natick to English dictionary, for instance, where vowels in the italicized text (i.e. the native forms within the entries) tend to be neutralized in <u>. All other texts, including all that required the use of films or photocopies of facsimiles, were typed directly, and then proof-read. Some of the details of procedure are provided in separate commentary files.
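For anyone who wants to test for the book-keeping marker mechanically, the following is a minimal Python sketch. It assumes that "at the very start" means the first two bytes of the file, which may not hold for every text; the function names are illustrative only. Since the annotation was not applied consistently, a negative result says nothing about whether a text has been read line by line.

```python
# Byte 251 (hex FB, the square root sign in the DOS character set) followed by "L".
MARKER = b"\xfbL"


def has_line_by_line_marker(data: bytes) -> bool:
    """Return True if the raw file contents begin with the marker.

    Assumption: the marker occupies the first two bytes of the file.
    """
    return data.startswith(MARKER)


def read_head(path: str, size: int = 2) -> bytes:
    """Read the first `size` bytes of a file without decoding them."""
    with open(path, "rb") as handle:
        return handle.read(size)
```

A call such as `has_line_by_line_marker(read_head("sometext.txt"))` would then report whether the marker is present; the file name here is hypothetical.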
3. None of the texts are in HTML format, and they may not be accepted by Web browsers. Furthermore, the character set assumed is a version of the eight-bit extended ASCII system that was the working character set of DOS operating systems, and is still found in the underlying DOS of many Microsoft Windows operating systems. Many of the characters in the so-called "upper ASCII" set (128-255) are used constantly. Since this character set is increasingly unfamiliar, I provide here an image in PDF format (requiring the Adobe Acrobat reader to view) of those members of the whole set that it is simple to get printed out, given my current resources: ASCII character set (size: 24KB). At some points I have been forced to improvise sequences of characters to represent characters, usually those with diacritics beyond the ken of the ASCII set, such as breves and macrons, or superscripted characters, and these may not always be documented adequately. The major locus of this difficulty is in the forms of Trumbull's Natick dictionary, and it should not be hard to grasp what has been done with a copy of the original to hand.
It should be possible to download the texts to the browser, at which point they can be saved. It would have been preferable to allow access via the Unix program ftp, but this requires a server that is unavailable to me.
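Users whose software expects Unicode may wish to convert a downloaded text. The Python sketch below assumes that the DOS character set in question is code page 437; this page says only "a version of" extended ASCII, though the square root sign at position 251 is consistent with that code page. The function names are illustrative, and the improvised character sequences mentioned above are passed through untouched.

```python
def dos_bytes_to_text(data: bytes, codepage: str = "cp437") -> str:
    """Decode raw bytes from a DOS code page into a Python (Unicode) string.

    Assumption: code page 437. Pass another codec name if a text
    turns out to use a different DOS code page.
    """
    return data.decode(codepage)


def convert_file(src: str, dst: str, codepage: str = "cp437") -> None:
    """Write a UTF-8 copy of `src` to `dst`, leaving the original untouched."""
    with open(src, "rb") as handle:
        raw = handle.read()
    text = raw.decode(codepage)
    # newline="" disables newline translation, so the DOS line endings
    # (ASCII 13 followed by ASCII 10) survive exactly as they were.
    with open(dst, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)
```

Writing to a separate file is a deliberate choice here: searches that rely on the original byte values can still be run against the unconverted download.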
4. Some texts essentially preserve the lineation of the original, though generally line-end hyphens are removed, the end of the word being carried back to the previous line. Others, such as the bible texts in Massachusett, where the verse is treated as the basic data unit in the text, use the DOS line (terminated in ASCII 13/0D followed by ASCII 10/0A) to encode that unit. This means that the lines may be much longer than the eighty or so characters that many editors take as the norm, and they may not be accepted by some (graphics-oriented) editors such as current word processors. If they are accepted by such a word processor, it may be the case that reformatting of a more or less radical kind occurs. It may be helpful to set the attributes of the file to the read-only mode, if possible, or at any rate to avoid saving the text in its new form, at least if it is intended to carry out searches outside the context of the editor.
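One way to work with such long lines without involving an editor at all is to split the raw bytes on the DOS line terminator. The following Python sketch assumes that every data unit ends in ASCII 13 followed by ASCII 10, as described above; the function names are illustrative only.

```python
def split_records(data: bytes) -> list:
    """Split raw file contents into records on the DOS terminator CR LF.

    A lone LF or CR inside a record is left alone, so only the
    two-byte sequence counts as the end of a unit.
    """
    records = data.split(b"\r\n")
    # A file that ends with a terminator yields one empty trailing
    # piece, which is not a record in its own right.
    if records and records[-1] == b"":
        records.pop()
    return records


def longest_record_length(data: bytes) -> int:
    """Return the length in bytes of the longest record (0 if none)."""
    return max((len(record) for record in split_records(data)), default=0)
```

Because the file is only read, never rewritten, this sidesteps the reformatting problem described above.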
Pagination generally corresponds to that of the original, with divergences described in commentary files. In general, the page numbering of the original is presented at the left margin within a pair of angle brackets (less than/greater than symbols, "<" and ">").
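The page markers can be picked out with a short pattern match. This Python sketch assumes that a marker is whatever stands between "<" and ">" at the very beginning of a line; individual texts may diverge from this, as the commentary files describe, and the names used are illustrative.

```python
import re

# An opening "<" at the left margin, then anything up to the next ">".
PAGE_MARK = re.compile(r"^<([^<>]+)>")


def page_marks(lines):
    """Yield (line_index, page_label) for each line that opens with <...>.

    Angle brackets occurring later in a line are ignored, since only
    markers at the left margin are taken to be page numbers.
    """
    for index, line in enumerate(lines):
        match = PAGE_MARK.match(line)
        if match:
            yield index, match.group(1).strip()
```

The labels are returned as strings rather than integers, since nothing here guarantees that the original page numbering is purely numeric.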
5. Many of the texts have an abbreviated form of reference between angles (ASCII 169 and 170) at the start of the file, usually in the form date^name^title. This is a habit carried over from the system of stencils used in the Middle English Dictionary for use as abbreviations in the dictionary entries, and applied to the Middle English texts of which I have made machine-readable versions.
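The abbreviated reference can be extracted from the raw bytes as follows. This Python sketch rests on several assumptions not stated above: that ASCII 169 opens the reference and ASCII 170 closes it, that both fall within the first 256 bytes of the file, and that the contents can be decoded as code page 437. The function name and the `limit` parameter are illustrative.

```python
OPEN_ANGLE = b"\xa9"   # ASCII 169
CLOSE_ANGLE = b"\xaa"  # ASCII 170


def parse_reference(data: bytes, limit: int = 256):
    """Return the "^"-separated fields of the reference, or None.

    Only the first `limit` bytes are searched, on the assumption that
    the reference stands at the start of the file. The fields are
    usually date, name, and title, but the list is returned as found
    rather than forced into exactly three parts.
    """
    head = data[:limit]
    start = head.find(OPEN_ANGLE)
    if start == -1:
        return None
    end = head.find(CLOSE_ANGLE, start + 1)
    if end == -1:
        return None
    # Assumption: the text between the angles is in code page 437.
    return head[start + 1:end].decode("cp437").split("^")
```

Returning `None` when either angle is missing reflects the fact that only "many" of the texts carry such a reference.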