
Style guide for the digitization of text to speech

We recently started thinking about developing a ‘style guide’ for digitizing text: common things to look at in terms of text-to-speech functionality. Obviously this would be of great value to anyone participating in the process, and I realised something like it may already exist. Does anyone know of one?

Some examples of a style guide might be:

  • Stick to the original text as much as possible
  • Convert numerals into their word equivalents, e.g. ‘1984’ becomes ‘nineteen eighty four’
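To make the second rule concrete, here is a rough Python sketch of spelling out a four-digit year in the paired style of the ‘1984’ example. The function names and the choices for edge cases (‘oh five’, ‘two thousand five’, ‘nineteen hundred’) are my own assumptions, not an agreed convention, and whether any conversion is needed at all depends on how the text-to-speech engine already reads numerals.

```python
# Hypothetical helper: spell out a four-digit year in the paired style.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Spell out a number from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def year_to_words(year: int) -> str:
    """Spell out a year from 1000 to 9999, e.g. 1984."""
    if not 1000 <= year <= 9999:
        raise ValueError("only four-digit years are handled in this sketch")
    high, low = divmod(year, 100)
    if low == 0:
        # Assumed style: 2000 reads as a thousand, 1900 as a hundred.
        if high % 10 == 0:
            return f"{ONES[high // 10]} thousand"
        return f"{two_digits(high)} hundred"
    if low < 10:
        # Assumed style: 2005 -> "two thousand five", 1905 -> "nineteen oh five".
        if high % 10 == 0:
            return f"{ONES[high // 10]} thousand {ONES[low]}"
        return f"{two_digits(high)} oh {ONES[low]}"
    return f"{two_digits(high)} {two_digits(low)}"


print(year_to_words(1984))
```

A rule like this only suits numbers that really are years; page numbers, quantities and phone numbers would each need their own treatment.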

Any rules devised would need to be developed in consultation with the technology itself and the users of the technology. It would also have to allow for future improvements and developments in the technology.

Anthony, that's a great idea. Pasted below is a copy of the relevant sections of the notes that one of the alternate format producers uses for editing text. Interestingly, they do not convert standard symbols, numbers, or even simple math equations, because text-to-speech software normally recognizes these properly.

I would be very interested to hear what the results are for some of these finer points, depending on the software we use (DAISY Pipeline) and one or two of the frequently used text-to-speech tools (for example VoiceOver and Narrator). Unfortunately, I have not yet found much of this documented publicly, but I will definitely add more as I find it.

Do let us know what else you're considering for the style guide and what technologies you're considering them with.


#### Text ####
*   Check spelling and correct misspelled words. Type in passages of text which have not been recognized by the character recognition software.
*   Put the text into correct linear sequence and type in missing text.

#### Headings of the text ####
*   Use the built-in heading styles function in Microsoft Word to mark headings. For example, mark chapter headings as level 1 and section headings as level 2. (On the `Home` tab, in the `Styles` group, click the style that you want). How many levels to use depends on the layout and context of the book.
*   Create the heading links by using the Table of Contents function in Microsoft Word. (On the `References` tab, in the `Table of Contents` group, click `Table of Contents`, and then click the table of contents style that you want).
#### Special Items: figures, photos, block text (citations), sidebars, textboxes, footnotes, endnotes, key term definitions, annotations, etc... ####
Before each item, apply the language tag `BEGIN` + item name + colon, for example, `BEGIN FOOTNOTE:`. At the end of each item, apply the language tag `END` + item name + period, for example, `END FOOTNOTE.`. For figures, photos, and illustrations, add the word `CAPTION` to the tags, for example, `BEGIN FIGURE CAPTION:` and `END FIGURE CAPTION.`.

All language tags are in **UPPER CASE** and stand on a separate line. If two or more tags are used next to each other, leave a blank line between them.

Use the AutoCorrect function in Microsoft Word to insert the language tags (shortcuts can be created in Word under `Options -> Proofing -> AutoCorrect Options`). Create shortcuts that will be memorable to you for all of the language tags, e.g. "bt" for `BEGIN TEXT BOX:`, "et" for `END TEXT BOX.`, "bkt" for `BEGIN KEY TERM:`, etc.
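If the edited text is exported to plain text, the tag convention above could be checked automatically. The following is only a sketch under my own assumptions: the function name `check_tags` is hypothetical, tags are assumed to use straight apostrophes, and `END OF ...` lines are treated as end-of-file markers that have no matching `BEGIN`.

```python
import re

# A tag is an upper-case line such as "BEGIN FOOTNOTE:" or "END FOOTNOTE."
TAG_RE = re.compile(r"^(BEGIN|END) ([A-Z][A-Z0-9' *]*?)\s*([:.])$")


def check_tags(text: str) -> list[str]:
    """Return a list of problems found with BEGIN/END language tags."""
    problems = []
    open_tags = []  # stack of (item name, line number) for unclosed BEGIN tags
    for number, raw in enumerate(text.splitlines(), start=1):
        match = TAG_RE.match(raw.strip())
        if not match:
            continue  # ordinary text line
        kind, name, closer = match.groups()
        if kind == "END" and name.startswith("OF "):
            continue  # e.g. "END OF CHAPTER 1." has no BEGIN partner
        expected = ":" if kind == "BEGIN" else "."
        if closer != expected:
            problems.append(f"line {number}: {kind} tag should end with '{expected}'")
        if kind == "BEGIN":
            open_tags.append((name, number))
        elif open_tags and open_tags[-1][0] == name:
            open_tags.pop()  # properly nested close
        else:
            problems.append(f"line {number}: END {name} has no matching BEGIN")
    problems.extend(
        f"line {number}: BEGIN {name} is never closed" for name, number in open_tags
    )
    return problems


sample = "BEGIN SIDEBAR:\nSome sidebar text.\nEND SIDEBAR."
print(check_tags(sample))
```

The stack allows one tagged item to sit inside another, but it expects tags to close in the reverse order they were opened.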
#### 1. Tables ####
*   Ensure that tabular data is formatted correctly.
#### 2. Figures, Photos and Illustrations ####
*   Mark these items with the appropriate language tags, for example, `BEGIN FIGURE CAPTION:`, `BEGIN PHOTO CAPTION:`, `BEGIN ILLUSTRATION CAPTION:`.
*   If a description is required, surround it with the tags `BEGIN PRODUCER'S NOTE:` and `END PRODUCER'S NOTE.` and nest it within the caption tags.


*   Place the captions and any accompanying producer's notes after the paragraph in which these items are first referred to.
*   If these items are not referred to anywhere, place the captions on the pages where they originated, either at the beginning or end of the page.
*   If these items contain a heading as well as text, treat the heading like any other heading.
*   (For the description, please refer to CILS Guidelines for Editing Figures.)

        Fig. 7-5. (top) Glass-fibre-reinforced concrete panels are light and strong enough to reduce this building's structural requirements. (bottom) Spray-up fabrication made it easy to create their contoured profiles. (60671, 46228)
        The Bagua or Eight Trigrams
        Beginning at the top and proceeding clockwise, the trigrams represent (1) Moving Water (as rains or streams) and Moon, Kan (K'an); (2) Thunder, Zhen (Chen); (3) Earth, Kun (K'un); (4) Mountain, Gen (Ken); (5) Fire, Sun, Lightning, Li; (6) Wind and Wood, Sun; (7) Heaven or Sky, Quian (Ch'ien); (8) Collected Water (as in a marsh or lake), Dui (Tui).
#### 4. Footnotes and Endnotes ####
*   Mark the note reference (usually marked by a superscript number, letter or symbol) and the note text.
*   Insert FOOTNOTE or ENDNOTE before the note reference marker, for example: FOOTNOTE 1
*   Put the footnotes next to the paragraph where the footnote reference appears and apply the footnote tags.
*   Keep the endnotes in the same location as in the print form; no tags are required.
        “I never could use those things,” Fidelity confessed. “It’s not just my eyes. I feel like that woman in the Colville ENDNOTE 1 painting.”

        For every organ-machine, an energy-machine: all the time, flows and interruptions. Judge Schreber FOOTNOTE * has sunbeams in his ass. A solar anus. And rest assured that it works: Judge Schreber feels something, produces something, and is capable of explaining the process theoretically. Something is produced: the effects of a machine, not mere metaphors.
        BEGIN FOOTNOTE * :
        Daniel Paul Schreber was a German judge who began psychiatric treatment in 1884 at the age of forty-two, and spent the remaining twenty-seven years of his life in and out of mental institutions. In 1903, at the age of sixty-one, he published his Denkwurdigkeiten eines Nervenkranken (Memoirs of a Nervous Illness), which Freud used as the basis of his influential 1911 study on paranoia, "Psycho-Analytic Notes" (reference note 7, page 384 of this volume). pp. 390-472. (Translators' note.)
        END FOOTNOTE *.
#### 5. Other special items ####
*   Edit the other special items as regular text and apply the appropriate language tags. The most commonly used tags include:
        END SIDEBAR.
        END TEXTBOX.
#### 6. Equations, Formulas, Symbols and Coding ####
*   Transcribe these items under the guidance of Handbook for Spoken Mathematics, NBA Tape Recording Manual, CILS In-house Guidelines for Describing Mathematical Expressions and Equations, and some online sources.
*   Make sure that the terms and vocabulary used for transcription are consistent for the book.
#### 7. End of the file ####
*   At the end of each file, use an `END` tag to indicate the end of the chapter, appendix, article, glossary, etc.

        END OF CHAPTER 1.
        END OF APPENDIX.
#### 8. Front Matter ####
*   Not all information on the title page and verso needs to be kept. Include these items in the following template:
        The complete book title (in ALL UPPER CASE)
        Subtitle (if present in the print item)
        Name and location of publisher (of print item)
        Copyright information (of print item)
        CILS alternate format copyright statement
        Producer's general notes (if necessary)
*   Apply the language tags `BEGIN TOC:` and `END TOC.` to the table of contents. Use tabs to indicate the hierarchical organization and insert a space + 3 dots + a space between heading and pagination. For example:
        BEGIN TOC:
        Table of Contents
        Preface ... vii
        Acknowledgments ... xi
        1 Introduction ... 1
        2 Overview of Cognitive Task Analysis Methods ... 9
        I Tools for Exploring Cognition in Context ... 27
        3 Preparation and Framing ... 29
        END TOC.
*   Edit the other parts of the front matter as regular text.
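The table of contents convention above (tabs for hierarchy, then a space, three dots and a space before the page number) could be generated rather than typed by hand. This is a minimal sketch; the function name `format_toc` and the `(level, heading, page)` tuple layout are my own assumptions.

```python
def format_toc(entries: list[tuple[int, str, str]]) -> str:
    """Build a tagged table of contents from (level, heading, page) entries.

    level 0 is a top-level heading; each deeper level adds one tab.
    """
    lines = ["BEGIN TOC:", "Table of Contents"]
    for level, heading, page in entries:
        # One tab per level, then heading + " ... " + page number.
        lines.append("\t" * level + f"{heading} ... {page}")
    lines.append("END TOC.")
    return "\n".join(lines)


print(format_toc([
    (0, "Preface", "vii"),
    (0, "Acknowledgments", "xi"),
    (0, "1 Introduction", "1"),
]))
```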

Wow that sounds great. Good to see a few points of difference there that may change our workflow for the better. Should we get a Google doc started on this to collaborate on a potentially official doc?

We are also attempting digitization of a non-fiction title. This is an area we could especially use some guides for. The book I'm looking at has many charts, diagrams and images, not to mention the index! At this stage we can't see how we could feasibly include the index, and I wonder if it is best left out given the time it would take to mark up: I'm guessing around 4-5 times as long as it takes to correct the book itself! Do DAISY readers have a find-text function equivalent to 'Ctrl F' or the search functionality in an ebook reader? That would make including an index virtually redundant.

We are also looking at what to do with non-fiction titles. Describing images can be a challenge but is achievable. One sticking point we're finding is the index. There are a number of reasons why producing an effective index would be excessively difficult, if not impossible. I would hate to deprive the reader by offering a substandard book, although part of me also wonders how useful an index in audio format is anyway.

My opinion is that the index should be there, and that it is useful to have it read. An interested user could go to the index and hear "owls, 8" to know where they might find references. I would not link the index so that a user could jump automatically to the specific page, though. Links are useful if the file already has them, which is common when the original file is an EPUB (but not a .doc), but I don't think adding them manually is worth the trade-off.

You're correct in assuming that for synthesized DAISY, since the text file is in there, users can search for text much like they would in an ebook.

For images, I suggest following regular website rules. If the image is purely decorative (which is common for chapter heading decoration), then no alternate text is required. If the image has a caption, the caption is generally enough to describe the image. Similarly, if the image or figure is described by the text, then no alt text is required. Where none of those apply, just add a basic description.
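Those rules boil down to a short checklist, sketched here in Python in case it helps when writing the guide. The `Image` record and its field names are hypothetical; deciding whether an image is decorative or already described by the text is still a judgment call for the editor.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical record of what the editor knows about one image."""
    decorative: bool = False         # e.g. chapter heading decoration
    caption: str = ""                # caption printed with the image, if any
    described_in_text: bool = False  # surrounding text already covers it


def needs_description(image: Image) -> bool:
    """Return True only when a basic description should be written."""
    if image.decorative:
        return False
    if image.caption.strip():
        return False
    if image.described_in_text:
        return False
    return True


print(needs_description(Image(caption="Fig. 2. Brain scan.")))
print(needs_description(Image()))
```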

For charts and figures, I would say it depends. Tables should be properly formatted with headers and such. I would imagine other charts and figures being treated as images.

Thanks for your reply. I found the webpage linked below very helpful on how to use alt text for images (it describes images on the web, but I agree the same applies).

The book I am currently digitizing has lots of images of brain scans. They don't add a great deal, as the surrounding text generally does a better job of describing the issue than the small, fuzzy image. Luckily the images also come with captions, which mostly describe the image.