CaptionSync accepts many of today's common video and audio formats as input; however, for most of our services, only an audio version of the media is required. When a request is made, CaptionSync automatically extracts an audio version of the input file - this is the file both the transcribers and the system use.
CaptionSync uses an automated method to generate captions based on a verbatim transcript and the audio version of the media input. The media files can be either mono or stereo.
Stereo was created to provide a sense of "depth" or field separation for musical content. Audio producers use slight phase and/or volume differences between the two channels to change the perception of where the sound is coming from. But this concept really does not apply to dialog content, which is the main type of material we caption.
When presented with a stereo signal, there are two choices: we can either pick one channel or mix the two channels together. In the vast majority of dialog content, the audio is not truly stereo; it is just two channels carrying the same audio. In these cases, either approach yields the same result. If the two channels differ, the results of the two approaches diverge.
If the audio is unbalanced, with different content on each channel, then extracting a single channel will fail because we will only get part of the content. If the content is true stereo (with phase differences between the channels), then mixing the channels produces a result that sounds as if reverb has been applied, which is probably undetectable to most listeners but can cause sync issues. In either case, stereo content can yield incomplete or incorrect results.
Because the problem caused by unbalanced audio is very noticeable (whereas the phase offset problem would not be), we opted to use a single-channel extraction process.
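The difference between the two approaches can be illustrated with a minimal Python sketch. It is not CaptionSync's actual code: the function names are hypothetical, the samples are made-up numbers, and keeping the left channel is an assumption (the article does not say which channel is extracted).

```python
def extract_channel(left, right):
    """Single-channel extraction: keep one channel (assumed here: left)."""
    return list(left)


def mix_channels(left, right):
    """Downmix: average the two channels sample by sample."""
    return [(l + r) / 2 for l, r in zip(left, right)]


# Balanced stereo: both channels carry the same samples,
# so both approaches give the same mono signal.
speech = [0.0, 0.5, -0.5, 0.25]
balanced_extract = extract_channel(speech, speech)
balanced_mix = mix_channels(speech, speech)

# Unbalanced audio: one speaker only on the left, another only on the right.
speaker_a = [0.5, 0.5, 0.0, 0.0]
speaker_b = [0.0, 0.0, 0.5, 0.5]
unbalanced_extract = extract_channel(speaker_a, speaker_b)  # speaker B is lost
unbalanced_mix = mix_channels(speaker_a, speaker_b)         # both kept, at half level
```

With the unbalanced input, the extracted signal contains only the first speaker, which is the "missing content" case described above.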
Problems with Results:
We recommend that the input media files submitted to CaptionSync be either mono or balanced stereo (same content in both channels). Unbalanced audio could cause content to be missing. If you notice that the results are incomplete, please open a Support ticket so that we can troubleshoot further.
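If you want to check a file yourself before submitting it, one possible approach is to compare the two channels directly. The sketch below is only an illustration, not a CaptionSync tool: the function name and the tolerance parameter are hypothetical, and it assumes the audio has already been saved as a mono or 16-bit stereo PCM WAV file.

```python
import array
import sys
import wave


def channels_match(wav_file, tolerance=0):
    """Return True if both channels of a 16-bit stereo WAV carry the same samples.

    wav_file may be a path or an open binary file object.
    tolerance is the largest allowed per-sample difference (hypothetical knob).
    """
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() == 1:
            return True  # mono: there is nothing to compare
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono or 16-bit stereo PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Stereo frames are interleaved: left, right, left, right, ...
    left, right = samples[0::2], samples[1::2]
    return all(abs(l - r) <= tolerance for l, r in zip(left, right))
```

A result of False suggests the file is either unbalanced or true stereo, so converting it to mono before submitting may be worthwhile.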
If your media is unbalanced, you can use a converter to transcode it into a similar format with mono audio or, if you don't require video encoding, simply extract the audio into MP3 or WAV format.
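For audio that is already in WAV format, the conversion to mono can be sketched with Python's standard library alone. This is a minimal example under stated assumptions, not a recommended converter: the function name is hypothetical, it handles only 16-bit stereo PCM WAV input, and it averages the two channels so that content present on only one side is kept.

```python
import array
import sys
import wave


def stereo_wav_to_mono(src, dst):
    """Write a mono copy of a 16-bit stereo PCM WAV by averaging both channels.

    src and dst may be paths or open binary file objects.
    """
    with wave.open(src, "rb") as reader:
        if reader.getnchannels() != 2 or reader.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit stereo PCM")
        framerate = reader.getframerate()
        samples = array.array("h", reader.readframes(reader.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Frames are interleaved (left, right, ...); average each pair.
    mono = array.array(
        "h", ((l + r) // 2 for l, r in zip(samples[0::2], samples[1::2]))
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(framerate)
        writer.writeframes(mono.tobytes())
```

Video files and compressed formats such as MP3 are outside the scope of this sketch and need a dedicated transcoding tool.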