When you caption your files with CaptionSync, you can submit your own formatted transcript. This article shows how to add sync markers to that transcript and how to resolve sync issues in the resulting captions.
Note that the markup below must be applied to the marked text transcript. If you are doing a Redo to repair sync, edit the marked transcript (.txt), which you can download from the Submission Details page next to Text Transcript.
TABLE OF CONTENTS:
The Basics
No content at the beginning of the media file
Non-spoken content at the beginning of the media file
No content at the end of the media file
Non-spoken content at the end of the media file
Intermediate segments of non-spoken content and pauses
Parenthetical comments followed by pauses
Removing parenthetical comments
Marking non-spoken content when no comments are used in the transcript
Captions out of sync for multiple speakers
Captions out of sync in the middle of long-duration speech
Deleting entire captions
The Basics:
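- A sync marker consists of a caret (^), a letter identifying the marker type, and a timecode. A mid marker, for example, has the form ^Mhh:mm:ss (hh = hours, mm = minutes, ss = seconds). The ^B, ^E, and ^F markers described below use the same timecode format.
- Frames 00 to 29 can optionally be appended, in the form ^Mhh:mm:ss:ff. For example, 00:00:00:15 is about half a second. For those familiar with broadcast, these are drop-frame timecodes, which approximate real time.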
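If you prepare transcripts with a script, converting a time in seconds to a marker is simple arithmetic. The Python sketch below is a hypothetical helper, not part of CaptionSync: the names `format_timecode`, `parse_timecode`, and `marker` are ours, and it assumes 30 frames per second (frames 00 to 29) while ignoring the small drop-frame correction, which is close enough for approximate sync points like the ones in this article.

```python
FPS = 30  # assumption: frames 00-29, i.e. about 30 frames per second


def format_timecode(seconds: float, with_frames: bool = False) -> str:
    """Turn a time in seconds into hh:mm:ss, or hh:mm:ss:ff with frames."""
    if seconds < 0:
        raise ValueError("time cannot be negative")
    total_frames = round(seconds * FPS)  # nearest whole frame
    whole_seconds, frames = divmod(total_frames, FPS)
    hours, rest = divmod(whole_seconds, 3600)
    minutes, secs = divmod(rest, 60)
    timecode = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    if with_frames:
        timecode += f":{frames:02d}"
    # Without frames, any fraction of a second is simply dropped.
    return timecode


def parse_timecode(timecode: str) -> float:
    """Turn hh:mm:ss or hh:mm:ss:ff back into seconds."""
    parts = [int(p) for p in timecode.split(":")]
    if len(parts) == 3:
        parts.append(0)  # no frames given
    if len(parts) != 4:
        raise ValueError(f"not a timecode: {timecode!r}")
    hours, minutes, secs, frames = parts
    if minutes > 59 or secs > 59 or not 0 <= frames < FPS:
        raise ValueError(f"field out of range: {timecode!r}")
    return hours * 3600 + minutes * 60 + secs + frames / FPS


def marker(kind: str, seconds: float, with_frames: bool = False) -> str:
    """Build a sync marker such as ^M00:13:42:15 (kind is M, B, E or F)."""
    if kind not in ("M", "B", "E", "F"):
        raise ValueError(f"unknown marker type: {kind!r}")
    return f"^{kind}{format_timecode(seconds, with_frames)}"


# 13 minutes, 42.5 seconds -> ^M00:13:42:15
print(marker("M", 13 * 60 + 42.5, with_frames=True))
```

Half a second comes out as frame 15, matching the ^M00:13:42:15 markers used in the examples below.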
No content at the beginning of the media file:
- If there is no content at the beginning of the media file before the speech starts, mark the point where speech starts with a ^B marker.
Example: No content before the first 7 seconds, then speech starts.
Transcription for automated captioning:
^B00:00:07
>> Hello. I am Mary.
Non-spoken content at the beginning of the media file:
- If there is non-spoken content at the beginning of the media file before the speech starts, note it with a parenthetical comment, followed by a ^M marker at the point where speech starts.
Example: Intro music plays for about 22 seconds, then talking starts.
Transcription for automated captioning:
[ Music ]
^M00:00:22
>> Welcome to CrimeWatch, my name is John Smith.
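If you assemble transcripts with a script, the choice between the two openings above depends only on whether the opening content gets a caption. This Python sketch is a hypothetical helper (the name `opening_lines` is ours, not a CaptionSync API) that returns the lines to place before the first spoken line, given the timecode where speech starts.

```python
from typing import List, Optional


def opening_lines(speech_start: str, label: Optional[str] = None) -> List[str]:
    """Lines that go before the first spoken line of a transcript.

    speech_start: hh:mm:ss (or hh:mm:ss:ff) timecode where speech begins.
    label: text for a parenthetical comment, e.g. "Music", when the opening
        is non-spoken content worth captioning; None when nothing plays.
    """
    if label is None:
        # Nothing to caption: one ^B marker says where speech begins.
        return [f"^B{speech_start}"]
    # Caption the non-spoken content, then mark where speech takes over.
    return [f"[ {label} ]", f"^M{speech_start}"]


transcript = opening_lines("00:00:22", "Music") + [
    ">> Welcome to CrimeWatch, my name is John Smith.",
]
print("\n".join(transcript))
```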
No content at the end of the media file:
- If there is no content at the end of the media file after the speech stops, mark the point where speech stops with a ^E marker.
Example: The speech stops at 15 minutes, 24 seconds, and then there is no content for the remaining 28 seconds until the end of the file.
Transcription for automated captioning:
>> Hope you had a great time!
^E00:15:24
Non-spoken content at the end of the media file:
- If there is non-spoken content at the end of the media file after the speech stops, add a ^M marker at the point where speech stops, followed by a parenthetical comment.
Example: The talking stops at 27 minutes, 30 seconds, and then there is music for the remaining 4 minutes until the end of the file.
Transcription for automated captioning:
>> I’m John Smith and we’ll see you next week on CrimeWatch.
^M00:27:30
[ Music ]
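The ending of a transcript mirrors its opening. In this hypothetical Python sketch (`closing_lines` is an illustrative name, not a CaptionSync API), the lines to append after the last spoken line depend on whether the remainder of the file is empty or holds non-spoken content you want captioned.

```python
from typing import List, Optional


def closing_lines(speech_end: str, label: Optional[str] = None) -> List[str]:
    """Lines that go after the last spoken line of a transcript.

    speech_end: timecode where speech stops.
    label: comment text such as "Music" for non-spoken content that runs to
        the end of the file; None when the rest of the file is empty.
    """
    if label is None:
        # Nothing follows: a single ^E marker says where speech ends.
        return [f"^E{speech_end}"]
    # Mark where speech ends, then caption what plays until the end of the file.
    return [f"^M{speech_end}", f"[ {label} ]"]


lines = [">> Hope you had a great time!"] + closing_lines("00:15:24")
print("\n".join(lines))
```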
Intermediate segments of non-spoken content and pauses:
- If there is music, applause, noise, silence, etc., between spoken portions, mark it either with ^M markers around a parenthetical comment, or with a ^E/^B marker pair.
Example 1: The speaker stops for applause between 13 minutes, 42.5 seconds and 13 minutes, 57 seconds.
Transcription for automated captioning:
And I will work to make this community safer!
^M00:13:42:15
[ Applause ]
^M00:13:57
Thank you very much!
Example 2: Speaker #1 stops talking at 5 minutes, 23 seconds, music is played, and then speaker #2 starts talking at 6 minutes, 37 seconds.
Transcription for automated captioning:
Thank you for your time.
^M00:05:23
[ Music ]
^M00:06:37
>> Hello. Thanks for being here.
Example 3: Some noise interrupts the speaker between 28 minutes, 17.5 seconds and 28 minutes, 21 seconds.
Transcription for automated captioning:
The cells conform to the--
^M00:28:17:15
[ Noise ]
^M00:28:21
Wow, that was loud. Let's proceed.
Example 4: The speaker pauses between 18 minutes, 3 seconds, and 18 minutes, 7 seconds.
Transcription for automated captioning:
I'll just turn this off.
^E00:18:03
^B00:18:07
So, as I was saying.
Example 5: The speaker pauses between 1 hour, 25 minutes, 3 seconds, and 1 hour, 25 minutes, 31 seconds.
Transcription for automated captioning:
I'll drink some water.
^E01:25:03
^B01:25:31
OK. We'll resume now.
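Both forms of an intermediate break shown above can be produced by one function. The Python sketch below is a hypothetical helper (`break_lines` and `to_seconds` are illustrative names) that also checks the break ends after it starts; like the rest of these sketches, it assumes frames are 1/30 of a second.

```python
from typing import List, Optional

FPS = 30  # assumption: frames run 00-29


def to_seconds(timecode: str) -> float:
    """Convert hh:mm:ss or hh:mm:ss:ff to seconds."""
    hours, minutes, secs, *frames = (int(p) for p in timecode.split(":"))
    return hours * 3600 + minutes * 60 + secs + (frames[0] if frames else 0) / FPS


def break_lines(start: str, end: str, label: Optional[str] = None) -> List[str]:
    """Markers for a break in speech between start and end.

    With a label, the break is captioned as a parenthetical comment between
    two ^M markers; without one, it is marked as a pause with ^E and ^B.
    """
    if to_seconds(end) <= to_seconds(start):
        raise ValueError(f"break must end after it starts: {start} -> {end}")
    if label is None:
        return [f"^E{start}", f"^B{end}"]
    return [f"^M{start}", f"[ {label} ]", f"^M{end}"]


print(break_lines("00:13:42:15", "00:13:57", "Applause"))
print(break_lines("01:25:03", "01:25:31"))
```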
Parenthetical comments followed by pauses:
- If music, applause, noise, silence, etc., is noted as a parenthetical comment and is followed by a pause before speech or another parenthetical comment, place a ^F marker right after the comment to mark where the non-spoken content ends. Then add a ^M marker where the speech or next comment starts.
Example 1: A phone rings between 24 minutes, 16 seconds and 24 minutes, 21 seconds, interrupting the speaker. Silence follows until 24 minutes, 26 seconds, when the speech resumes.
Transcription for automated captioning:
Just heard a phone...
^M00:24:16
[ Ringing ]
^F00:24:21
^M00:24:26
Thanks for muting that.
Example 2: The speaker stops to play music from 38 minutes, 42 seconds to 41 minutes, 49 seconds, followed by silence until 41 minutes, 56 seconds. Music plays again until 43 minutes, 4 seconds, followed by silence until 43 minutes, 7 seconds, when the speech resumes.
Transcription for automated captioning:
Let's hear this piece now.
^M00:38:42
[ Music ]
^F00:41:49
^M00:41:56
[ Music ]
^F00:43:04
^M00:43:07
What lovely music.
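The ^F pattern is regular enough to generate. In this hypothetical Python sketch (`comments_with_pauses` is an illustrative name), each non-spoken segment is a (label, start, end) tuple, and every segment is assumed to be followed by a pause, as in both examples above; `resume` is the timecode where speech picks up again.

```python
from typing import List, Sequence, Tuple


def comments_with_pauses(segments: Sequence[Tuple[str, str, str]],
                         resume: str) -> List[str]:
    """Build ^M / [ comment ] / ^F lines for comments each followed by a pause.

    segments: (label, start, end) for each non-spoken segment, in order.
    resume: timecode where speech resumes after the final pause.
    """
    lines: List[str] = []
    for label, start, end in segments:
        lines.append(f"^M{start}")    # where the non-spoken content starts
        lines.append(f"[ {label} ]")  # the caption for it
        lines.append(f"^F{end}")      # where it stops and the pause begins
    lines.append(f"^M{resume}")       # where speech picks up again
    return lines


# Reproduces the markers of Example 2 above.
for line in comments_with_pauses(
    [("Music", "00:38:42", "00:41:49"), ("Music", "00:41:56", "00:43:04")],
    resume="00:43:07",
):
    print(line)
```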
Removing parenthetical comments:
- If you want to remove a parenthetical comment, either to leave an audio segment uncaptioned or to replace an inaudible section with text, delete the comment but keep the ^M markers around it. You can then leave just the markers, or add text between them.
Example 1: The speaker stops for applause between 13 minutes, 42.5 seconds and 13 minutes, 57 seconds.
Transcription for automated captioning:
And I will work to make this community safer!
^M00:13:42:15
[ Applause ]
^M00:13:57
Thank you very much!
Removing the comment:
And I will work to make this community safer!
^M00:13:42:15
^M00:13:57
Thank you very much!
Example 2: Speaker #1 stops talking at 5 minutes, 23 seconds, music is played, and then speaker #2 starts talking at 6 minutes, 37 seconds.
Transcription for automated captioning:
Thank you for your time.
^M00:05:23
[ Music ]
^M00:06:37
>> Hello. Thanks for being here.
Removing the comment:
Thank you for your time.
^M00:05:23
^M00:06:37
>> Hello. Thanks for being here.
Example 3: Background noise makes it hard to discern the speech between 28 minutes, 17.5 seconds and 28 minutes, 21 seconds.
Transcription for automated captioning:
The cell has a very thin membrane, composed of lipids and protein, holding the cytoplasm--
^M00:28:17:15
[ Noise ]
^M00:28:21
The membrane acts as a...
Replacing the comment with text:
The cell has a very thin membrane, composed of lipids and protein, holding the cytoplasm,
^M00:28:17:15
and it controls the passage of substances into and out of the cell.
^M00:28:21
The membrane acts as a...
Example 4: The mic is turned off between 18 minutes, 3 seconds, and 18 minutes, 7 seconds.
Transcription for automated captioning:
I'll just turn this off and--
^M00:18:03
[ Inaudible ]
^M00:18:07
No. Turning it on again.
Replacing the comment with text:
I'll just turn this off and
^M00:18:03
see if it sounds better.
^M00:18:07
No. Turning it on again.
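When many comments need to be removed or replaced, the edit can be scripted. This Python sketch is a hypothetical helper (`remove_comment` is our name) that finds the comment directly after a given ^M marker and either deletes it or swaps in new text, leaving both ^M markers in place. It does not touch a trailing "--" on the line before the comment; adjust that by hand, as in the examples above.

```python
import re
from typing import List, Optional

# A line that holds nothing but a parenthetical comment, e.g. "[ Applause ]".
COMMENT_LINE = re.compile(r"\[[^\]]*\]")


def remove_comment(lines: List[str], start_marker: str,
                   new_text: Optional[str] = None) -> List[str]:
    """Delete the comment right after start_marker, or replace it with new_text.

    The ^M markers around the comment are kept either way.
    """
    result = list(lines)
    i = result.index(start_marker)  # ValueError if the marker is not there
    if i + 1 >= len(result) or not COMMENT_LINE.fullmatch(result[i + 1].strip()):
        raise ValueError(f"no parenthetical comment after {start_marker}")
    if new_text is None:
        del result[i + 1]
    else:
        result[i + 1] = new_text
    return result


example = ["And I will work to make this community safer!", "^M00:13:42:15",
           "[ Applause ]", "^M00:13:57", "Thank you very much!"]
print("\n".join(remove_comment(example, "^M00:13:42:15")))
```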
Marking non-spoken content when no comments are used in the transcript:
- If you prefer not to use parenthetical comments to note music, applause, noise, etc., in the transcript, you can mark that content with ^E and ^B markers instead.
Example 1: Intro music plays for about 22 seconds, then talking starts.
Transcription for automated captioning:
^B00:00:22
>> Welcome to CrimeWatch, my name is John Smith.
Example 2: The talking stops at 27 minutes, 30 seconds, and then there is music for the remaining 4 minutes until the end of the file.
Transcription for automated captioning:
>> I’m John Smith and we’ll see you next week on CrimeWatch.
^E00:27:30
Example 3: The speaker stops for applause between 13 minutes, 42.5 seconds and 13 minutes, 57 seconds.
Transcription for automated captioning:
And I will work to make this community safer!
^E00:13:42:15
^B00:13:57
Thank you very much!
Example 4: Speaker #1 stops talking at 5 minutes, 23 seconds, music is played, and then speaker #2 starts talking at 6 minutes, 37 seconds.
Transcription for automated captioning:
Thank you for your time.
^E00:05:23
^B00:06:37
>> Hello. Thanks for being here.
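Converting an existing transcript from comments to ^E/^B markers follows the same three patterns as the examples above: an opening comment becomes ^B, a comment between ^M markers becomes an ^E/^B pair, and a closing comment becomes ^E. The Python sketch below is a hypothetical conversion (`comments_to_pauses` is our name); it assumes each comment sits on its own line and leaves ^F blocks unchanged.

```python
from typing import List


def is_comment(line: str) -> bool:
    """True for a line holding only a parenthetical comment, e.g. "[ Music ]"."""
    s = line.strip()
    return s.startswith("[") and s.endswith("]")


def is_mid(line: str) -> bool:
    return line.startswith("^M")


def comments_to_pauses(lines: List[str]) -> List[str]:
    """Rewrite comment blocks as ^B, ^E/^B, or ^E markers."""
    result: List[str] = []
    i, n = 0, len(lines)
    # Opening "[ Music ]" followed by "^M..." becomes a single ^B marker.
    if n >= 2 and is_comment(lines[0]) and is_mid(lines[1]):
        result.append("^B" + lines[1][2:])
        i = 2
    while i < n:
        if (i + 2 < n and is_mid(lines[i]) and is_comment(lines[i + 1])
                and is_mid(lines[i + 2])):
            # Comment between speech: ^M start/^M end become ^E start/^B end.
            result += ["^E" + lines[i][2:], "^B" + lines[i + 2][2:]]
            i += 3
        elif i + 2 == n and is_mid(lines[i]) and is_comment(lines[i + 1]):
            # Closing "^M..." followed by "[ Music ]" becomes a single ^E marker.
            result.append("^E" + lines[i][2:])
            i += 2
        else:
            result.append(lines[i])  # spoken lines and ^F blocks stay as-is
            i += 1
    return result
```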
Captions out of sync for multiple speakers:
- When speakers exchange fast dialog or talk at the same time, captions can drift out of sync. You can correct this by adding mid (^M) markers to the transcript.
Example 1: A fast dialog between two speakers, where captions get out of sync.
Transcription for automated captioning:
^M00:04:14
>> And then I picked up the phone --
^M00:04:15
>> Where did you find it [cough]?
^M00:04:16
>> It was right there.
^M00:04:17
>> You grabbed it?
>> Yes, it was in plain sight! [Inaudible].
^M00:04:19
>> That's awesome.
Example 2: A fast dialog between two speakers, where captions get out of sync.
Transcription for automated captioning:
^M00:07:14
>> I asked for the form --
>> Did you get it?
^M00:07:15
>> Yes.
>> Did you read it first?
^M00:07:16
>> Didn't have time.
>> Oh no!
^M00:07:17
>> It's OK.
>> No, it's not.
Example 3: Multiple speakers talking at the same time, where captions get out of sync.
Transcription for automated captioning:
^M00:02:47
>> Mary was talking --
>> Did you see that?
^M00:02:48
>> --about the car.
>> No I didn't.
^M00:02:49
>> Did she ever fix that broken light?
>> It was right there.
^M00:02:50
>> I don't know.
>> I swear, I just saw John driving by.
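If you note approximate start times while reviewing a fast exchange, a script can place the markers and catch typos. This hypothetical Python sketch (`mark_turns` and `to_seconds` are illustrative names) puts a ^M marker before each turn that has a time, skips turns without one (as in the examples above, where not every line gets a marker), and rejects timecodes that do not move forward.

```python
from typing import List, Optional, Sequence, Tuple

FPS = 30  # assumption: frames run 00-29


def to_seconds(timecode: str) -> float:
    hours, minutes, secs, *frames = (int(p) for p in timecode.split(":"))
    return hours * 3600 + minutes * 60 + secs + (frames[0] if frames else 0) / FPS


def mark_turns(turns: Sequence[Tuple[Optional[str], str]]) -> List[str]:
    """Put a ^M marker before each speaker turn that has a known start time.

    turns: (timecode or None, caption line) pairs in transcript order.
    Raises ValueError if the timecodes do not keep moving forward.
    """
    lines: List[str] = []
    last = -1.0
    for timecode, text in turns:
        if timecode is not None:
            now = to_seconds(timecode)
            if now <= last:
                raise ValueError(f"^M{timecode} is not later than the previous marker")
            last = now
            lines.append(f"^M{timecode}")
        lines.append(text)
    return lines


turns = [("00:07:14", ">> I asked for the form --"), (None, ">> Did you get it?"),
         ("00:07:15", ">> Yes."), (None, ">> Did you read it first?")]
print("\n".join(mark_turns(turns)))
```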
Captions out of sync in the middle of long-duration speech:
- Captions can sometimes drift out of sync in the middle of a long speech. You can correct this by adding mid (^M) markers to the transcript.
Example 1: Captions drift in and out of sync during a speech lasting 10 minutes. Insert ^M markers at the points where the captions are out of sync, and mark any pauses in the content.
Transcription for automated captioning:
(...)
^M00:12:17
Information items to bring in the presence of day, you all know that two weeks ago, maybe three weeks ago, we had a great announcement. And we have had a very positive press. We do believe that this is going to provide a great opportunity for students to just focus on their studies. ^M00:12:32 Yesterday, we had a principal and superintendent's luncheon of the college, and it was very well attended. Superintendents from every single district were there. And I did tell them that we would be working on something that we do believe based on data available nationwide would have a significant impact on improving our graduation rates. ^M00:12:54 And working closely with student services, focus during this coming year on creating a schedule of classes. Let's prepare a schedule for at least four semesters. Studies done nationwide have shown that this has significantly helped students to stay focused.
^E00:13:07
^B00:13:12
In student affairs we're also busy. We are developing the process for students. So we look forward to that to simplifying the process for students. So attend our equity events. Next week we have a national expert, and he'll be sharing a lot of information. The accreditation committee is working. Not everybody could be there because so many people are involved, and everyone was engaged and extremely participatory. ^M00:13:30 They were good participants. It was a good conference. So it's a very good atmosphere, and they also said there would be some flexibility in regards to the data and they understand that institutions are struggling with that. So I encourage you to participate in those events.
It is a lengthy document, so I'm only going to share some high-level information. We're still trying to make sure that under our analysis section that what we're saying is accurate. ^M00:13:52 This is a requirement and has to be updated. And the purpose of the plan is to have institutions look specifically at data, and looking at specific classification types. So we really have to break the data down a lot more. It's important to note though that the data is rich, but they have left that responsibility to the institutions to determine that.
^M00:14:07
(...)
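In a long paragraph, ^M markers go inline, just before the words spoken at that time. The Python sketch below is a hypothetical helper (`insert_inline_markers` is our name) that inserts each marker before the first occurrence of a short anchor phrase, searching only after the previous marker so the markers stay in order.

```python
from typing import List, Tuple


def insert_inline_markers(paragraph: str, anchors: List[Tuple[str, str]]) -> str:
    """Insert ^M markers just before the given phrases in a long paragraph.

    anchors: (phrase, timecode) pairs, in the order they occur in the text.
    """
    out = paragraph
    search_from = 0
    for phrase, timecode in anchors:
        pos = out.find(phrase, search_from)
        if pos == -1:
            raise ValueError(f"phrase not found after previous marker: {phrase!r}")
        marker = f"^M{timecode} "
        out = out[:pos] + marker + out[pos:]
        # Continue searching after the phrase we just marked.
        search_from = pos + len(marker) + len(phrase)
    return out


text = "It was a good conference. They were good participants. So I encourage you."
print(insert_inline_markers(text, [("They were", "00:13:30"), ("So I", "00:13:40")]))
```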
Deleting entire captions:
- To delete entire captions from the results, remove their text from the transcript and use a ^E/^B marker pair to mark where the deleted content starts and ends in the timeline.
Example 1: The speaker stops for applause between 13 minutes, 42.5 seconds and 13 minutes, 57 seconds, but you want to remove the [ Applause ] caption.
Transcription for automated captioning:
And I will work to make this community safer!
^E00:13:42:15
^B00:13:57
Thank you very much!
Example 2: Speaker #1 stops talking at 5 minutes, 23 seconds, music is played, and then speaker #2 starts talking at 6 minutes, 37 seconds, but you want to remove the [ Music ] caption.
Transcription for automated captioning:
Thank you for your time.
^E00:05:23
^B00:06:37
>> Hello. Thanks for being here.
Example 3: There is content you no longer want captioned, for example, ads.
Transcription for automated captioning:
^M00:05:17
>> Best Food is a presentation of Proud Food with support from the Food Council and the following sponsors:
^E00:05:22
^B00:07:08
>> We're back with Best Food and the best produce out there!
Example 4: There are multiple conversations and speakers at the same time, and you want to have just the two main speakers captioned.
Transcription for automated captioning:
^M00:02:47
>> Mary was talking about the car.
^E00:02:48
^B00:02:49
>> Did she ever fix that broken light?
>> I don't know.
^E00:02:51
^B00:02:54
>> Is she buying a new car?
>> She can't right now.
^E00:02:57
^B00:03:00
>> Hope it works out for her.
>> Yeah, me too.
^E00:03:03
^B00:03:05
>> She said maybe she'll buy it next fall.
>> That would be nice.
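Deleting a run of captions amounts to cutting those lines and leaving an ^E/^B pair in their place. This Python sketch is a hypothetical helper (`delete_captions` is our name); it identifies the first and last lines to remove by their exact text and takes the start and end timecodes of the removed content as arguments.

```python
from typing import List


def delete_captions(lines: List[str], first: str, last: str,
                    start: str, end: str) -> List[str]:
    """Replace the lines from `first` through `last` with an ^E/^B pair.

    first, last: the exact first and last lines to remove.
    start, end: timecodes where the removed content starts and ends.
    """
    i = lines.index(first)    # ValueError if not found
    j = lines.index(last, i)  # search only from `first` onward
    return lines[:i] + [f"^E{start}", f"^B{end}"] + lines[j + 1:]


before = ["And I will work to make this community safer!", "^M00:13:42:15",
          "[ Applause ]", "^M00:13:57", "Thank you very much!"]
after = delete_captions(before, "^M00:13:42:15", "^M00:13:57", "00:13:42:15", "00:13:57")
print("\n".join(after))
```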
Complete documentation on transcription formatting and markers is available in our Transcription Guidelines article. For more help with sync issues, see our Caption Results Not in Sync and Getting Sync Markers to Work articles.
Additional Background:
Prominent music or other "sweetening" that lasts a long time can cause problems for AST’s CaptionSync automated captioning system. AST uses "sync markers" in the text transcript to tell the system about prominent non-spoken content, which helps it stay in sync around these points. Sync markers can mark introductory and exit music, or isolate music, applause, noise, pauses, or other sweetening within the file.