Transcription Guidelines for Captioning

When you closed caption your files with CaptionSync, you can submit your own verbatim transcript. This article shows how to format it correctly.

An accurate transcript is the basis from which CaptionSync generates captions. Following these guidelines ensures the transcript will work with our automated captioning process. Learn more about accuracy and quality results for various forms of transcription.

In summary: pay attention to speaker IDs and parenthetical comments, scan the table of contents below, and save your transcript as a UTF-8 .txt file.

A sample verbatim transcript is available at the bottom of this article.

1. GENERAL GUIDELINES

1.1. Transcribe Verbatim
1.2. Spell Out Words Instead of Using Symbols
1.3. Omit Background Noise/Sounds from the Transcription
1.4. Speaker IDs
1.5. Use Square Brackets for Any Content Not in the Audio
1.6. Ensure Square Brackets are Matched and Symmetrical
1.7. Avoid Abbreviations
1.8. Avoid High ASCII
1.9. Extra Spaces and Carriage Returns/Line Feeds Are Unnecessary

2. CAPTION BREAKS

2.1. Proper Punctuation Terminates a Sentence
2.2. Double Dash Does Not Terminate a Sentence
2.3. Multiple Chevrons Start a New Sentence
2.4. Ellipses Terminate a Sentence if There is a Space After Them
2.5. Caption Breaks Can be Forced with Markup

3. MARKUP

3.1. Caption Breaks
3.2. Sync Markers
3.3. Style
3.4. Position
3.5. Escape Sequences

1. General Guidelines

1.1. Transcribe Verbatim:

The words in the audio should be transcribed exactly as the speaker says them, in the same order, with no additions or deletions. Remove all ancillary text such as title, date, author, pagination, etc. Transcribers sometimes capture the meaning of the audio by summarizing slightly or by leaving out segments of speech that do not affect the meaning. For goals other than automated captioning this can serve a useful purpose, but for automated captioning the transcript must match the audio, even for sentence fragments. If part of the audio cannot be understood, simply transcribe [inaudible].

Example:

Speaker: Alright ladies and gentlemen. If we could get started, please.

Transcription for automated captioning:

  Correct: Alright ladies and gentlemen. If we could get started, please.
  Incorrect: If we could get started ladies and gentlemen.

Exceptions - What does not need to be transcribed:

  • Speaker hesitations and disfluencies, such as “um”, “uh”, and “mmm”, do not need to be transcribed. It is OK to include them in the transcription, but not required.
  • If a speaker backs up a bit and repeats a short phrase, the repetition does not need to be transcribed. Again, it is fine to include it, but not required.

Example:

Speaker: I have…I have some …um…administrative announcements.

Transcription for automated captioning:

  OK: I have some administrative announcements.

1.2. Spell Out Words Instead of Using Symbols:

Special symbols in the text can lead to uncertainty about exactly what was said, which makes automated captioning more difficult. Also, many special symbols are not included in the standard character set for captioning. Instead, transcribe the exact words of the speaker. For example, if the speaker says “backslash”, don’t use a backslash symbol in the transcription; spell it out. The same applies to all mathematical and other representational expressions, such as N² (use “N squared”), division signs (use “divided by”), or multiplication signs.

Exceptions - Digits, such as “6” instead of “six,” or “25” instead of “twenty-five,” are OK.

Example:

Speaker: It is written like this: eight backslash twenty-five.

Transcription for automated captioning:

  Correct: It is written like this: Eight backslash twenty-five.
  Correct: It is written like this: 8 backslash 25.
  Incorrect: It is written like this: 8 \ 25.
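
A quick pre-submission check can catch symbols that should have been spelled out. The sketch below is not part of CaptionSync: the function name find_symbols and the list of flagged symbols are assumptions chosen for illustration, and the list should be adjusted to your own material. Note that a backslash used deliberately as an escape sequence (see section 3.5) would also be flagged.

```python
# Hypothetical list of symbols that a speaker would normally say as words.
# Markup characters such as ^, [ and ] are deliberately left out.
SYMBOL_WORDS = {
    "\\": "backslash",
    "/": "slash",
    "+": "plus",
    "=": "equals",
    "%": "percent",
    "&": "and",
    "@": "at",
}


def find_symbols(text):
    """Return (position, symbol, suggested word) for each flagged symbol."""
    hits = []
    for pos, ch in enumerate(text):
        if ch in SYMBOL_WORDS:
            hits.append((pos, ch, SYMBOL_WORDS[ch]))
    return hits


if __name__ == "__main__":
    for pos, ch, word in find_symbols("It is written like this: 8 \\ 25."):
        print(f"position {pos}: found {ch!r}, consider writing '{word}'")
```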

1.3. Omit Background Noise/Sounds from the Transcription:

Background sounds should not be transcribed. If the transcriber feels a sound is necessary for the caption reader's understanding, it should be transcribed in square brackets to set it off from the speech.

Example:

Speaker: Let’s pause so you can discuss among yourselves.
Background noise while students discuss among themselves for a few minutes.
Speaker: Ok, let’s compare notes.

Transcription for automated captioning:

Correct: Let's pause so you can discuss among yourselves. Ok, let's compare notes.

Correct: Let's pause so you can discuss among yourselves. [Background discussions] Ok, let's compare notes.

Incorrect: Let's pause so you can discuss among yourselves. Now we hear some background noise. Ok, let's compare notes.

1.4. Speaker IDs:

Speaker intros can be formatted in a number of acceptable ways:

  • Standalone multiple chevrons. e.g. >> Hi!
  • Multiple chevrons, name colon. e.g. >> Brent: Hi!
  • Open square brace, name colon, close square brace. No space before closing brace. e.g. [Brent:] Hi!

Example:

Speaker named Paul: Let’s take a look at this function.

Transcription for automated captioning:

  Correct: [Paul:] Let's take a look at this function.
  Correct: >> Paul: Let's take a look at this function.
  Correct: Let's take a look at this function.
  Correct: >> Let's take a look at this function.
  Incorrect: Paul: Let's take a look at this function.

Make sure speaker IDs are no longer than 58 characters. A speaker ID cannot span captions, so choose a line length that accommodates unusually long speaker IDs and the first word of the speech -- a caption that includes a speaker ID must fit the maximum size of the speaker ID (58) plus the first word the speaker says. Don't include an abundance of punctuation or special characters in speaker IDs. E.g. it is correct to write >> Dr. Patrick Smith, Geology Lecturer: , but not >> Dr. *Patrick Smith*, our Geology Lecturer!:

Example:

The settings on your submission are a Line Length of 32 characters and 2 Lines per Caption.

Speaker ID:

Correct: >> Pat Smith, lecturer: Starting ...
(23 + 9 characters)

Correct: >> Patrick George West Smith, Geology Professor: Starting ...
(48 + 9 characters)

Correct: >> Patrick West Smith: [Volcanic Geology Professor] Starting ...
(22 + 29 + 9 characters)

Incorrect: >> Patrick George West Smith, Volcanic Geology Professor in this College: Starting ...
(73 + 9 characters)

Incorrect: >> *Patrick George West Smith* [our Geology Professor!]: Starting ...
(56 + 9 characters, and special characters)

Incorrect: >> Patrick George Allen West Smith: [Volcanic Geology Professor in this College] Starting ...
(35 + 45 + 9 characters)

Incorrect: >> Patrick George Allen West Smith: [ Volcanic Geology Professor in this College ] Starting ...
(35 characters, plus an additional caption)

Please note that our transcribers identify speaker changes just with a double chevron, e.g., >> Speech . If you're making a Captioning and/or Transcription request, and wish to have our transcribers identify speakers by name (e.g. >> Pat: Speech ), you need to make that request in the Guidance for Transcriber field, on the New Submission page.
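
The speaker ID limits described in this section can be checked before you submit. The following is a minimal sketch, not CaptionSync's own validation. The function name check_speaker_id and the list of "special" characters are hypothetical, and the capacity rule (line length multiplied by lines per caption) is an assumption inferred from the examples above. It only handles the ">> Name:" form and will misread a line such as ">> Meet at 5:30", where the first colon is not part of a speaker ID.

```python
import re

MAX_SPEAKER_ID = 58  # limit stated in this article

# Matches ">> Name:" (two or more chevrons) at the start of a line,
# followed by the first word of the speech, if any.
SPEAKER_RE = re.compile(r"^(>{2,}\s*[^:]*:)\s*(\S+)?")


def check_speaker_id(line, line_length=32, lines_per_caption=2):
    """Return a list of problems found in a '>> Name:' speaker ID."""
    match = SPEAKER_RE.match(line)
    if not match:
        return []  # no named speaker ID on this line
    speaker_id = match.group(1)
    first_word = match.group(2) or ""
    problems = []
    if len(speaker_id) > MAX_SPEAKER_ID:
        problems.append("speaker ID longer than 58 characters")
    # Assumption: a caption holds line_length * lines_per_caption characters.
    capacity = line_length * lines_per_caption
    if len(speaker_id) + 1 + len(first_word) > capacity:
        problems.append("speaker ID plus first word does not fit in one caption")
    # Assumption: these characters count as "special" for a speaker ID.
    if re.search(r"[*!#@^_~]", speaker_id):
        problems.append("special characters in speaker ID")
    return problems


if __name__ == "__main__":
    print(check_speaker_id(">> Pat Smith, lecturer: Starting ..."))
    print(check_speaker_id(">> *Patrick Smith*, our Geology Lecturer!: Starting ..."))
```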

1.5. Use Square Brackets for Any Content Not in the Audio:

Any content that you wish to include in the transcript but is not present in the audio (such as credits or Speaker IDs) must be enclosed in square brackets. The simple rule is this: the transcript should contain only what the speaker said, and nothing more. Any other content must be in square brackets.

Note that CaptionSync differentiates between parentheticals with spaces and those without: [ Laughter ] is different from [Laughter]. The former is a standalone descriptive caption, whereas the latter is an inline comment within a caption. For this reason, speaker introductions should not have a space before the closing bracket.

Example:

Speaker named Paul throws his chalk then says: Let’s take a look at this function.

Transcription for automated captioning:

  Correct: [ Throws chalk ] [Paul:] Let's take a look at this function.
  Correct: [ Throws chalk ] Let's take a look at this function.
  Correct: Let's take a look at this function.
  Incorrect: (Throws chalk) Paul: Let's take a look at this function.

Example:

Children playing, music and multiple speakers: Where's the yellow bike? I left it (inaudible). (Shouts) Did you see it?

Transcription for automated captioning:

^M00:03:36
[ Children playing ]
^M00:03:45
[ Music ]
^M00:04:28
>> Where's the yellow bike?
>> I left it [inaudible].
>> [Shouts] Did you see it?

1.6. Ensure Square Brackets are Matched and Symmetrical:

Ensure that every opening square bracket has a matching closing one and the spacing matches.

Example:

Student question: (inaudible)

Transcription for automated captioning:

  Correct: [ Student question: inaudible. ]
  Correct: [ Student question: [inaudible.] ]
  Incorrect: [ Student question: [inaudible]
  Incorrect: [ Laughter]
  Incorrect: This will be the last time [inaudible ] happens.

Ensure square brackets are symmetrical: the spacing after the opening bracket must match the spacing before the closing bracket.

Example:

^M00:05:35
[ Applause ]
^M00:05:43
>> Paul: Guess I don't need a mic. [Laughter] Let's start with...

Transcription for automated captioning:

Incorrect:
^M00:05:35
[ Applause ]
^M00:05:43
>> Paul: Guess I don't need a mic. [Laughter ] Let's start with...

Incorrect:
^M00:05:35
[Applause ]
^M00:05:43
>> Paul: Guess I don't need a mic. [Laughter] Let's start with... 
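
Both bracket rules (every bracket matched, and the same spacing on each side) can be checked mechanically. The sketch below is one possible way to do it; the function name check_brackets is hypothetical and this is not how CaptionSync itself parses transcripts. Brackets preceded by a backslash are skipped, on the assumption that they are the escape sequences described in section 3.5.

```python
def check_brackets(text):
    """Return a list of (position, message) for bracket problems."""
    problems = []
    stack = []  # positions of "[" that have not been closed yet
    for pos, ch in enumerate(text):
        if pos > 0 and text[pos - 1] == "\\":
            continue  # escaped bracket such as \[ is treated as literal text
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            if not stack:
                problems.append((pos, "closing bracket without opening"))
                continue
            start = stack.pop()
            space_after_open = text[start + 1 : start + 2] == " "
            space_before_close = text[pos - 1] == " "
            if space_after_open != space_before_close:
                problems.append((start, "asymmetrical spacing"))
    for start in stack:
        problems.append((start, "opening bracket without closing"))
    return problems


if __name__ == "__main__":
    print(check_brackets("[ Applause ] Guess I don't need a mic. [Laughter ]"))
```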

1.7. Avoid Abbreviations:

Avoid using abbreviations in the text whenever possible, as they are not always clear to an automated parser. “St.” for example, could mean “saint” or “street”. “No.” could be a statement, or an abbreviation for “number”. 

Example: 

Speaker: Use a number one pencil.

Transcription for automated captioning:

  Correct: Use a number one pencil.
  Correct: Use a #1 pencil.
  Incorrect: Use a No. one pencil.

[Image: the “AutoFormat As You Type” menu on the “AutoCorrect” dialog box, referred to in section 1.8]

1.8. Avoid High ASCII Characters:

Depending on the media type, captions usually use a very restricted character set. Most characters in the so-called “high ASCII” set are not permitted. Accented letters (e.g. é, unless you're submitting a non-English request), special symbols (e.g. the degree symbol: °), and curly single quotes (e.g. ’) should not be used. Because they are not permitted in the captioning output, the automated system replaces them with a space, which can result in some odd-looking captions. For accented characters, use the unaccented equivalent (unless you're submitting a non-English request). For special symbols, type out the name; and for curly single quotes, use the straight apostrophe ('). Formatting such as bold, different fonts, bullet points, etc. is also not required and can interfere with the automation process. Keep the formatting as simple as possible.

If you are using Microsoft Word, you need to turn off “smart quotes” to prevent it from automatically substituting curly quotes for apostrophes. To do this, go to Tools -> AutoCorrect Options -> AutoFormat As You Type, and turn off both “smart quotes” and “symbol characters”. This keeps high ASCII out of all subsequent typing, but does not correct what has already been typed!

Example:

Speaker: Mom’s favorite is Nestlé chocolate cooked at 200 degrees.

Transcription for automated captioning:

  Correct: Mom's favorite is Nestle chocolate cooked at 200 degrees.
  Incorrect: Mom’s favorite is Nestlé chocolate cooked at 200°.
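
Text that has already been typed with high ASCII characters can be cleaned up with a short script. The sketch below uses only Python's standard unicodedata module. The function name to_plain_ascii and the replacement table are assumptions for illustration (the article itself only mentions single quotes, accents, and the degree symbol; double curly quotes are added here as a guess). It is meant for English transcripts, since it strips accents.

```python
import unicodedata

# Replacements suggested by this article; extend as needed.
REPLACEMENTS = {
    "\u2018": "'",         # left single curly quote
    "\u2019": "'",         # right single curly quote
    "\u201c": '"',         # left double curly quote (assumption)
    "\u201d": '"',         # right double curly quote (assumption)
    "\u00b0": " degrees",  # degree symbol, typed out by name
}


def to_plain_ascii(text):
    """Return (cleaned_text, leftover_non_ascii_characters)."""
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # Decompose accented letters into base letter + accent, then drop the accent.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII needs a manual fix.
    leftovers = sorted({ch for ch in text if ord(ch) > 127})
    return text, leftovers


if __name__ == "__main__":
    cleaned, leftovers = to_plain_ascii("Mom\u2019s favorite is Nestl\u00e9 chocolate cooked at 200\u00b0.")
    print(cleaned)
    print("Still needs attention:", leftovers)
```

Any characters returned in the second value were not covered by the table and should be replaced by hand.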

1.9. Extra Spaces and Carriage Returns/Line Feeds Are Unnecessary:

The system disregards carriage returns, line feeds, tab characters, and extra spaces. So don’t waste time making it pretty.                          

Example: 

Hello
there
Bob

Smith.  Know   the
time...  or are you
    free?

From our system’s perspective this is the same as:

Hello there Bob Smith. Know the time... or are you free? 
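
The equivalence above can be reproduced in one line of Python. This is only an illustration of the rule as stated in this section, not the system's actual code; collapse_whitespace is a hypothetical name.

```python
def collapse_whitespace(text):
    """Replace every run of spaces, tabs, and line breaks with one space."""
    return " ".join(text.split())


if __name__ == "__main__":
    messy = "Hello\nthere\nBob\n\nSmith.  Know   the\ntime...  or are you\n    free?"
    print(collapse_whitespace(messy))
```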

2. Caption Breaks

2.1. Proper Punctuation Terminates a Sentence:

The text is first broken into sentences, then sentences are subdivided into captions as needed. Punctuation that terminates a sentence is therefore very important.

Example:

Can you correct your spelling please! Yes, that’s better.

This would be broken into two sentences:

Can you correct your spelling please!
Yes, that’s better.

2.2. Double Dash Does Not Terminate a Sentence:

You can certainly use the double dash. While it is considered the most favorable place to break a line when wrapping pop-on captions, it is not interpreted as the end of a sentence.

Example:

What time is the -- Whoa!

This is considered one sentence and would look like:

What time is the -- Whoa! 

2.3. Multiple Chevrons Start a New Sentence:

Double chevrons (or any number of chevrons greater than one) force the start of a new sentence. No terminating punctuation is needed before them.

Example:

Hello Joe, can you tell me where >> Stop right there

This would be broken into two sentences (captions) as follows:

Hello Joe, can you tell me where
>> Stop right there

2.4. Ellipses Terminate a Sentence if There is a Space After Them:

An ellipsis (multiple periods) forces the start of a new sentence only if it is followed by a space.

Example:

Then the sky darkened... ...and it rained

This is broken into two sentences as follows:

Then the sky darkened...
...and it rained

Example:

But...he knew the code.

This will not break the sentence:

But...he knew the code.
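
To preview how a transcript is likely to be split, the rules in sections 2.1 to 2.4 can be approximated with two regular expressions. This is a rough sketch under stated assumptions, not the actual CaptionSync algorithm: split_sentences is a hypothetical name, the terminators are assumed to be the period, question mark, and exclamation point, and markup and escape sequences (section 3) are not handled.

```python
import re


def split_sentences(text):
    """Approximate the sentence-breaking rules of sections 2.1 to 2.4."""
    # Extra whitespace is disregarded (section 1.9).
    text = " ".join(text.split())
    # Two or more chevrons start a new sentence (section 2.3).
    text = re.sub(r"\s*(>{2,})", r"\n\1", text)
    # Terminating punctuation, including an ellipsis, ends a sentence only
    # when whitespace follows it (sections 2.1 and 2.4). A double dash is
    # not listed here, so it never ends a sentence (section 2.2).
    text = re.sub(r"([.?!]+)\s+", r"\1\n", text)
    return [part.strip() for part in text.split("\n") if part.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("Then the sky darkened... ...and it rained"):
        print(sentence)
```

Running it on the examples in this section is a quick way to spot an abbreviation or stray period that would split a sentence in the wrong place.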

2.5. Caption Breaks Can be Forced with Markup:

This is detailed in section 3.1 below.

3. Markup

The EIA-608 captioning character set is essentially ASCII with a couple of exceptions. For .CAP, .ASC, or .XMS files the following characters are not printable:

*  \  ^  _  `  {  |  }  ~

All control codes are case insensitive (e.g. ^it or ^IT are interpreted identically).

3.1. Caption Breaks:

AST uses the following markup to force caption breaks:

  Markup   Description
  ^*       Force end of caption immediately preceding this point

Example:

This caption needs to^*break back there. Then continues -- with previous rules... As before >>> right? [ cough ]
[Paula:] But, this

doesn't...break.

This gets interpreted as:

This caption needs to
break back there.
Then continues -- with previous rules...
As before
>>> right?
[ cough ]
[Paula:] But, this doesn't...break.

3.2. Sync Markers:

AST uses the following markup to communicate timing information. The frame field (:ff) is optional. Sync markers are particularly useful for isolating intro music or heavy “sweetening”.

  Markup          Description
  ^Bhh:mm:ss:ff   Begin synchronization at this timestamp.
  ^Ehh:mm:ss:ff   End synchronization at this timestamp.
  ^Mhh:mm:ss:ff   Arbitrary midstream marker at this timestamp.
  ^Fhh:mm:ss:ff   Hard end caption at this timestamp.

Example: a ^E00:00:30 marker puts an end marker after the current text at 30 seconds, but that caption is allowed to end normally -- i.e., it is subject to all of the caption timing rules about minimum hang, distance from subsequent caption, and caption gapping. A ^F00:00:30 marker, by contrast, puts an end marker after the current text at 30 seconds and forces that caption to end at 00:00:30.

Precise frame accuracy is not required for the timing markers. These markers can be approximate, as long as the following guidelines are observed:

  • The timestamp for a ^B marker should point to somewhere in the silence just before the speech starts.
  • The timestamp for a ^M marker is associated with the text immediately following the marker, and should point to somewhere in the silence just before the relevant speech starts.
  • The timestamp for a ^E marker should reflect the time that the speech ends; i.e. it should point to somewhere in the silence just after the relevant speech ends.

The easiest approach is to sprinkle ^M sync markers through the file at the beginning of captions. It doesn't matter where on the line they appear or how many you have in a row.

Example:

^B00:03:59
>> Hello Bill.
>> What time is it? ^M00:04:01:20 Slow down! That's better.
^M00:04:03
[ Laughter ]
^M00:04:05
>> Ouch!

This gets interpreted as follows:

Hello Bill.           This caption starts after 00:03:59:00
>> What time is it?   This caption ends before 00:04:01:20
Slow down!            This caption starts after 00:04:01:20
That's better.        This caption ends before 00:04:03:00
[ Laughter ]          This caption ends before 00:04:05:00
>> Ouch!              This caption starts after 00:04:05:00

Notes:

  • The frames are optional in timestamps.
  • All markup codes are case insensitive.
  • You can use the ^B and ^E as many times as you like, but they must be logical. 

Example: 

^B00:00:01 Hello Walter.
>> What time does this end?
^B00:00:09:26 Never!!

This is invalid since you cannot have two begins in a row.

  • ^B and ^M tags refer to time at the beginning of the caption -- they will start a new caption if placed in the middle.
  • ^E tags refer to the time at the end of the caption -- it will end the caption where it is placed. 

Example:  

This text is ignored
^B00:01:02
Robert Smith, correct?
>> What's the number called
^E00:01:04:20
This text will be ignored too
^B00:02:01:20
Great tune! We're working again...

This is valid, but keep in mind that text before the ^B or after the ^E is ignored.

Robert Smith, correct?        This caption starts after 00:01:02:00
>> What's the number called   This caption ends after 00:01:04:20
Great tune!                   This caption starts after 00:02:01:20
We're working again...

  • The timestamps must be increasing!

Example:

^M00:04:01:20 Slow down!
^M00:04:01 ...or else!!

This is invalid because the second timestamp is smaller than the first (00:04:01:00 < 00:04:01:20).
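
Two of the rules above (no two ^B markers in a row, and timestamps that never go backwards) are easy to check before submitting. The sketch below is an illustration only; check_sync_markers is a hypothetical name, and the exact rules CaptionSync applies may be stricter. In particular, it assumes that two equal timestamps are acceptable, which the article does not state either way.

```python
import re

# ^B, ^E, ^M or ^F followed by hh:mm:ss and an optional :ff (case insensitive).
MARKER_RE = re.compile(
    r"\^([BEMF])(\d{2}):(\d{2}):(\d{2})(?::(\d{2}))?", re.IGNORECASE
)


def check_sync_markers(text):
    """Return a list of problems found in the sync markers of a transcript."""
    problems = []
    previous = None   # timestamp of the previous marker
    in_sync = False   # True between a ^B and the next ^E
    for match in MARKER_RE.finditer(text):
        kind = match.group(1).upper()
        hh, mm, ss = (int(match.group(i)) for i in (2, 3, 4))
        ff = int(match.group(5) or 0)  # frames are optional
        stamp = (hh, mm, ss, ff)
        if previous is not None and stamp < previous:
            problems.append(
                f"{match.group(0)}: timestamp is smaller than the previous one"
            )
        if kind == "B":
            if in_sync:
                problems.append(f"{match.group(0)}: two begins in a row")
            in_sync = True
        elif kind == "E":
            in_sync = False
        previous = stamp
    return problems


if __name__ == "__main__":
    print(check_sync_markers("^M00:04:01:20 Slow down!\n^M00:04:01 ...or else!!"))
```

Timestamps are compared as (hours, minutes, seconds, frames) tuples, so no frame rate needs to be assumed.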

3.3. Style:

AST uses the following markup to apply style or print special characters:

  Markup   Description
  ^IT      Adds Italics to the current style. In effect until reset to Normal
  ^UL      Adds Underline to the current style. In effect until reset to Normal
  ^ST      Adds Bold to the current style. In effect until reset to Normal
  ^NO      Resets all formatting to Normal
  ^MU      Prints the music symbol character
  ^P       Forces a paragraph break in the clean transcript, i.e., creates a new paragraph

Example:

I need the following words in italics. ^ITUsing these markers, this text will be in italics.^NO ^M00:00:21:20 Let's add a marker, then a music symbol ^MU. Don't  worry  about  spaces!

This gets presented as follows:

I need the following words in italics.
Using these markers, this text will be in italics.
Let's add a marker, then a music symbol ♪.
Don't worry about spaces!

Note that only the short forms described in the table above are supported: ^IT, ^NO and ^MU.

Note that for some outputs a suitable replacement for the ♪ symbol will be presented, as not all of them support this symbol.

3.4. Position:

AST uses the following markup to apply positioning to individual captions. Note that results will only be visible in formats that store positioning data:

  Markup   Description
  ^TO      Caption at top of CEA-608 area (top of the screen).
  ^BO      Caption at bottom of CEA-608 area (bottom of the screen).
  ^RI      Right justification of the caption.
  ^LE      Left justification of the caption.
  ^CE      Center justification of the caption.

 

3.5. Escape Sequences:

If the captions are not limited to the EIA-608 character set (that is, they are not broadcast constrained), the following escape sequences can be used:

  Markup   Description
  \\       Prints the \
  \^       Prints the ^, and interprets following characters as spoken words
  \*       Prints the *
  \[       Prints the [, and does not apply descriptive text processing rules
  \]       Prints the ], and does not apply descriptive text processing rules

Notes:

  • If ^ or * appears without the backslash, it is passed through for webcasts. For broadcast, the transcript is rejected.

Example: 

This webcast shows the \^MUSIC symbol syntax plus \^\*. \[ and that this text is in the audio! \]

This gets presented as follows:

This webcast shows the ^MUSIC symbol syntax plus ^*.
[ and that this text is in the audio! ]

The following escape sequences can be used in both broadcast and web captions: 

  Markup   Description
  \.       Do not treat this period as end of sentence
  \?       Do not treat this question mark as end of sentence
  \!       Do not treat this exclamation point as end of sentence

Example:

This punctuation should be banned\. and\? or limited. Right?

This gets presented as follows:

This punctuation should be banned. and? or limited.
Right?
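
When spoken content really does contain these characters, the escape sequences can be added with a small helper rather than by hand. The two functions below are a sketch with hypothetical names; they simply insert the backslashes described in this section. Per the article, the first is only appropriate for web captions, while the second applies to both broadcast and web captions.

```python
# Characters with an escape sequence for web (non-broadcast) captions.
# The backslash must come first so that added backslashes are not doubled.
LITERAL_CHARS = ("\\", "^", "*", "[", "]")


def escape_literal(text):
    """Escape markup characters so they are printed instead of interpreted."""
    for ch in LITERAL_CHARS:
        text = text.replace(ch, "\\" + ch)
    return text


def escape_terminators(text):
    """Escape . ? and ! so they are not treated as the end of a sentence."""
    for ch in (".", "?", "!"):
        text = text.replace(ch, "\\" + ch)
    return text


if __name__ == "__main__":
    print("The syntax is " + escape_literal("^*") + " as shown.")
    print("This punctuation should be " + escape_terminators("banned. and? or") + " limited. Right?")
```

Apply each helper only to the fragment that needs protecting, not to the whole transcript, since real markup and real sentence endings must stay unescaped.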

 
