Captioning your files with CaptionSync lets you receive several types of transcripts. This article explains how to get each one.
When you submit a captioning job to AST and request that we generate the transcript, our trained transcribers will produce a verbatim transcript of your media, in .txt format, as the first step in the captioning process. AST's transcribers are trained to follow specific guidelines for including non-dialog content and for marking off sections of music or noise. Because the CaptionSync system disregards whitespace in the transcript file, our transcribers have a good deal of latitude in how they use whitespace (that is, where they put carriage returns, line spacing, indentation, and so on).
Getting a "Raw Text Transcript with Markers" for a Redo:
Open the Details page of your request and look for Text Transcript. This is the Raw Text Transcript with Markers as created by the AST transcriber. If you choose to Redo a job, you need to use that version of the transcript. You can also receive the Raw Text Transcript with Markers as a captioning output by selecting it in the Advanced Settings at submission time.
Getting a "Clean Text Transcript":
The variability in whitespace, and the use of various markup codes in the transcript, can make the Raw Text Transcript with Markers a bit difficult to read. If you would like to receive a transcript for use outside of the CaptionSync system (for example, to post alongside your video), select the Clean Text Transcript output in the Advanced Settings section. A clean, consistently formatted transcript will then be returned to you along with your caption results. The Clean Text Transcript has the extension .clean.txt, has all markers removed, and has paragraph breaks only at the start of each new speaker. All excess whitespace and extraneous carriage returns are removed. In general, you will find the Clean Text Transcript much easier to read.
But remember, do not use the Clean Text Transcript for a Redo, as the missing markup tags could make your results poorer. For Redos, go back and work from the Raw Text Transcript with Markers.
Our system ignores whitespace in the raw transcript when formatting both the caption files and the transcript. This allows us to accept a wide variety of poorly formatted transcripts from our transcribers without trouble. And, of course, there is no concept of a "paragraph break" in the caption file.
If you are submitting jobs for transcription only (that is, no caption results), then CaptionSync will automatically return only the Clean Text Transcript file to you. The Raw Text Transcript with Markers file is not available for transcription-only jobs.
Having Periodic Paragraph Breaks (Carriage Returns/Line Feeds) in the Transcript:
When we receive a transcript from our transcribers, our system formats it and places line breaks (Carriage Returns/Line Feeds) only at speaker changes ( >> ) or standalone parenthetical comments ( [ comment ] ). If the audio consists of speech from just one speaker, or has no pauses, the transcript will contain no extra line breaks.
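To make the rule above concrete, here is a minimal Python sketch of that kind of formatting. It is only an illustration, not CaptionSync's actual code: the function name format_transcript is hypothetical, and for simplicity it treats every bracketed comment as a standalone one.

```python
import re

def format_transcript(raw: str) -> str:
    """Illustrative only: collapse whitespace, then break lines at
    speaker changes (>>) and at bracketed comments ([ ... ])."""
    # Whitespace in the raw transcript carries no meaning, so collapse it.
    text = " ".join(raw.split())
    # Start a new line before each speaker-change marker.
    text = re.sub(r"\s*(>>)", r"\n\1", text)
    # Assumption: every bracketed comment is standalone and gets its own line.
    text = re.sub(r"\s*(\[[^\]]*\])\s*", r"\n\1\n", text)
    # Drop empty lines left over from the substitutions.
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    return "\n".join(lines)

raw = ">> Hello   there.\n\n  How are\nyou? >> Fine,\tthanks. [ music ] >> Bye."
print(format_transcript(raw))

# A single speaker with no markers yields one unbroken line.
print(format_transcript("One speaker   talking\nthe whole time."))
```

In this sketch the transcriber's original line breaks have no effect on the output, which is why a single-speaker transcript comes back as one block of text.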
For Clean Text Transcripts, extra paragraph breaks can be added using a special style marker in the transcript. You just need to make that request using our Guidance feature. For example:
Please insert a new paragraph every 30 seconds using the ^P marker.
- CaptionSync allows users to submit Guidance / Persistent Notes for the transcribers as long as they conform to our Transcription Guidelines for Captioning.
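As a rough illustration of how a ^P marker could become a paragraph break in a Clean Text Transcript, here is a small Python sketch. It rests on assumptions: the function name apply_paragraph_markers is hypothetical, and the real CaptionSync processing of the marker may differ.

```python
def apply_paragraph_markers(text: str, marker: str = "^P") -> str:
    """Illustrative only: split the text at each marker and join the
    pieces with a blank line, tidying whitespace inside each piece."""
    # Split on the literal marker, then collapse whitespace in each piece.
    parts = [" ".join(piece.split()) for piece in text.split(marker)]
    # Skip empty pieces (for example, a marker at the very start or end).
    return "\n\n".join(piece for piece in parts if piece)

sample = "First thought. ^P Second thought.   ^P Third thought."
print(apply_paragraph_markers(sample))
```

The marker itself does not appear in the result; only the paragraph breaks it requested remain.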