The .sub file is an XML 1.0 document with UTF-8 encoding which serves two purposes for SFTP processing:
- Since it is uploaded after the media file (and optionally the verbatim text transcript), it serves as a mechanism to signal that the data is in place for a particular submission. This eliminates issues such as the server reading the files before they have completed uploading.
- It also optionally conveys additional processing directives for the submission, such as submission type, notes to the transcriber, purchase order number, batch identifiers, etc.
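The upload ordering described above can be sketched as follows. This is only an illustration, not AST tooling: `upload` is a hypothetical callable standing in for whatever SFTP client you use, and the file names (including the shared basename and the `.txt` transcript) are made-up assumptions.

```python
from pathlib import Path
from typing import Callable, List, Optional


def upload_submission(
    media: Path,
    sub: Path,
    upload: Callable[[Path], None],
    transcript: Optional[Path] = None,
) -> List[str]:
    """Upload the files for one submission, always sending the .sub file last.

    `upload` is a hypothetical stand-in for your SFTP client's upload call.
    """
    if sub.suffix != ".sub":
        raise ValueError(f"expected a .sub file, got {sub.name}")

    # Media first, then the optional verbatim transcript, then the .sub file,
    # so the .sub file only appears once the data it refers to is in place.
    ordered = [media]
    if transcript is not None:
        ordered.append(transcript)
    ordered.append(sub)

    for path in ordered:
        upload(path)  # if this raises, the .sub file is never sent
    return [path.name for path in ordered]


if __name__ == "__main__":
    sent: List[Path] = []  # records the order instead of doing real SFTP
    upload_submission(
        Path("lecture01.mp4"),          # hypothetical media file
        Path("lecture01.sub"),          # hypothetical .sub file
        sent.append,
        transcript=Path("lecture01.txt"),
    )
    print([p.name for p in sent])
```

Because the loop stops at the first failed upload, a partially transferred media file is never followed by a .sub file that would signal the submission as complete.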
Outline:
Since it is an XML 1.0 document with defaults pulled from the web UI, the simplest valid form would be just:
<submission>
</submission>
A more elaborate example would be:
<?xml version="1.0" encoding="UTF-8" ?>
<submission>
<App>Captioning</App>
<PO>00493-RT</PO>
<Rush>N</Rush>
<Notes>The names of the speakers are Carl Johns and Joel Kee.</Notes>
<BatchID>LearningFlash</BatchID>
</submission>
Keep in mind that the contents of the document must be valid XML. i.e.
- < needs to be replaced with &lt;
- > needs to be replaced with &gt;
- & needs to be replaced with &amp;
- XML tags are case sensitive
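One way to avoid escaping by hand is to let an XML library serialize the document. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `build_sub` and the sample values are illustrative assumptions, not part of the AST specification.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_sub(fields: Dict[str, str]) -> bytes:
    """Serialize a .sub document as UTF-8 XML.

    Tag names are written exactly as given (XML is case sensitive), and
    the library escapes <, > and & inside the text values.
    """
    root = ET.Element("submission")
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = value
    # xml_declaration=True requires Python 3.8 or later.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    doc = build_sub(
        {
            "App": "Captioning",
            "PO": "00493-RT",
            # Hypothetical note containing characters that need escaping.
            "Notes": "Mark speaker changes with >> and keep Q&A labels.",
        }
    )
    print(doc.decode("utf-8"))
```

Parsing the output back with `ET.fromstring` returns the original, unescaped note text, which is a quick check that the file is well-formed before uploading it.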
Optional Tags:
The following is a list of valid tags. All are optional, except possibly PO (purchase order), which is required if the account is set up as requiring a purchase order and no valid one is defined to use as the default.
App | This can be one of the valid submission types associated with your account: e.g. Captioning, Transcription, or Production. Captioning is used when you want a closed caption file, such as a DFXP returned (this is the most common case). Transcription indicates that you need transcription only. In this case, you will receive only a .txt transcript file. Customers might use this for audio-only recordings. Production indicates that you want production transcripts returned. If this tag is not present, the submission will inherit the default set via the web UI. e.g. <submission> |
BatchID |
This is the optional batch identifier to be used with the submission. Numerous submissions can be grouped together under a batch identifier for reporting and collecting of results (all results for a particular batch are zipped together for retrieval via SFTP or the web UI). <submission> Keep in mind that the batch identifier needs to be nicely named -- no spaces, punctuation, or high ASCII characters, please. |
Description |
If present, this will be the description presented to the user in lieu of the basename of the media file. <submission> The database field is 765 characters in length – anything in your tag beyond 765 characters will be truncated. |
Notes |
These are notes to help the transcriber regarding the spelling of proper names, speaker ID formats, etc. <submission> Be particularly mindful of making sure these notes to the transcriber are valid XML; for example, speaker ID formats such as >> need to be marked up as &gt;&gt;. If there are enabled Persistent Notes to the Transcriber configured on your account, these will be added to each submission first, followed by the content in the Notes tag. |
Language |
This can be one of the valid languages associated with your account: English, Spanish, Mixed Spanish/English, French or German. If not present, the submission will inherit the default set via the web UI. <submission> |
PersistentNote |
This tag specifies the integer ID of the persistent note to the transcriber to be used, or 0 if no persistent note is to be used (overriding any default). Normally the default from the web UI is used to determine the persistent note to the transcriber (if any). The IDs for notes can be found via the id attribute of the PerNote tags in the .prefs file (retrieved by SFTP). <submission> |
TransExpert |
This tag specifies the integer ID(s) of the transcriber expertise area(s) to be used, or 0 if no expertise area is to be used (overriding any default). Normally the default from the web UI is used to determine the transcriber expertise area (if any). The IDs for expertise areas can be found via the id attribute of the TransExp tags in the .prefs file (retrieved by SFTP). If more than one ID is passed, they should be comma-delimited. <submission> |
Rush |
This is the flag to specify which transcription SLA should be used. Normally the default priority associated with the account is used to determine the SLA. This tag can be used to override that default. Note that the account must allow the selected SLA level and for Prepaid accounts, sufficient prepaid credits must be present. The permitted SLAs can be found via the code attribute of the TranscriptionSLA tags in the .prefs file (retrieved by SFTP). Presently values may include L (4 Business Day), T (2 Business Day), R (1 Business Day), H (8 Hour), J (Tier Two: Offshore 5+ Business Day), 6 (Tier Three: Machine), as well as a more generic Y (default expedited transcription for account), and N (default non-expedited transcription for account). <submission> |
Redo |
This tag is used to specify that this is a Redo submission. As such, no media file can be submitted with this submission. The original AST ID must be specified within this tag. The original AST ID can be determined from the web UI or from the .id file returned with the original submission and subsequent redos via SFTP. <submission> |
ResultReview |
This is the flag to specify if Result Review should be used or not (Y or N). Normally the default from the web UI is used to determine if Result Review will be requested. This tag can be used to override that default. Note that for Prepaid accounts, sufficient prepaid credits must be present. <submission> |
ReviseSettings |
This flag (value of Y) is to specify that this Redo submission should reread the advanced settings (as opposed to the default Redo behavior which is to use the settings from the original and expect a revised transcript). A revised transcript may be present if desired, but is not required with this tag. This tag is only valid with the presence of a Redo tag – see above for details on the Redo tag. <submission> |
Offset |
This tag is used to specify offset timecode for broadcast outputs for Captioning submissions in the format hh:mm:ss:ff. If not present, the default offset timecode associated with the account is used. If present for any submission type other than Captioning, it will be ignored. <submission> |
PO |
This is the purchase order number for the submission for accounts set up as requiring purchase orders. If not present, the submission will inherit the default or most recently used valid purchase order for the account via the web UI. <submission> Keep in mind that the purchase order number needs to be nicely named -- no spaces, punctuation, or high ASCII characters please. |
Source |
This tag identifies the system from which the media to be captioned originated. The tag can be used for reporting volume coming from a particular system and/or for getting results posted via a callback URL. For the posting of results via the callback URL, an additional CallBack tag is required (see below). <submission> AST needs to approve the value of your tag if any specialized reporting or result posting is required. It also needs to be nicely named and less than 128 characters -- no spaces, punctuation, or high ASCII characters please. |
CallBack |
This tag tells AST where to send result files via raw HTTP or HTTPS POST with a Content-Transfer-Encoding of binary. This tag needs to be present with an approved Source tag; otherwise it will be ignored. <submission> AST needs to approve the value of your Source tag in conjunction with your CallBack tag for any result posting to take place. This callback URL needs to be properly URL-encoded, e.g. %20 for SPACE. In addition, XML encoding may be required; for example, an & in the URL must be written as &amp;. The tag also needs to be less than 512 characters in length. If you need more than one result type returned for the submission and/or you need to differentiate between different types of result files posted back (e.g. a caption file and a transcript file), an additional {fileType} macro also needs to be present in the callback URL and/or different MIME types need to be set up for each result file. <submission> This could result in post URLs like: https://67.34.4.22/bio1/closedCaptioning?id=1283232 The specific file types of the macro expansion (e.g., closedCaptioning, transcript, etc.) will need to be configured by AST. |
StatusURL |
This tag tells AST where to send status information for a particular submission via raw HTTP or HTTPS POST with a Content-Transfer-Encoding of binary. <submission> This callback URL needs to be properly URL-encoded, e.g. %20 for SPACE. In addition, XML encoding may be required; for example, an & in the URL must be written as &amp;. The tag also needs to be less than 512 characters in length.
The status information will be posted directly to the specified URL as a raw POST with a MIME type of text/xml (note that there will not be any form URL encoding). The information will be presented in the following manner: <StatusUpdate>
The status callback will be sent as soon as a fatal error occurs; otherwise it is sent once ingest processing is far enough along to know that the submission is on track to succeed. For example, if an error occurs when attempting to extract audio from the media file, the status callback will be sent at that point. Otherwise, the system will not send the status callback until audio processing has been completed and an AST ID has been assigned.
Note that in the case of a malformed .sub file, no callback will be made; instead, an email will be sent to the account holder (provided the account holder has not deselected that email type in Contact Settings via the web UI).
Type: The tag will be one of Initial or Update. A status callback of type Update will only occur when a submission that was successfully ingested subsequently fails or is rejected (the result will be FAILURE).
Result: The tag will be one of IN_PROCESS, NOT_INGESTED, or FAILURE. A status callback with a result of FAILURE will only occur when a submission that was successfully ingested subsequently fails or is rejected (the type will be Update). e.g. The transcriber tells us there is no audio: <StatusUpdate>
ASTid: The tag will not be present for the Result of NOT_INGESTED; otherwise it will reference the CaptionSync AST ID of the submission.
ASTstatus: The tag will not be present for the Result of NOT_INGESTED; otherwise it will reference the current CaptionSync AST status of the submission.
ErrDetail: The tag will only be present for the Results of NOT_INGESTED and FAILURE, e.g.: <StatusUpdate>
It is important to consider the possibility that the callback itself fails and no status information about the submission is received (e.g. 500 Internal Server Error). If no status information is received, do not make assumptions about the status; further investigation is required. |
Transcriber |
This is the name of the third-party transcription entity to which the submission request is to be sent. Only applicable if there is more than one transcriber associated with the account. If not present, the submission will inherit the default transcriber set via the web UI. This is rarely used. <submission> |
AudioDesc |
This flag (value of Y) is to specify that audio description is to be added to the submission. This can only be added for the App value of Captioning. e.g. <submission> |
ADCustReview |
This is the flag to specify if the customer should review the description before results are generated or not (Y or N). Normally the default from the web UI is used to determine if this will be requested. This tag can be used to override that default. <submission> <AudioDesc>Y</AudioDesc> <ADCustReview>Y</ADCustReview> </submission> |
ADCallBack |
This tag tells AST where to send audio description result files via raw HTTP or HTTPS POST with a Content-Transfer-Encoding of binary. This tag needs to be present with an AudioDesc tag (see above); otherwise it will be ignored. e.g. <submission> This callback URL needs to be properly URL-encoded, e.g. %20 for SPACE. In addition, XML encoding may be required; for example, an & in the URL must be written as &amp;. The tag also needs to be less than 512 characters in length. If you need more than one result type returned for the audio description and/or you need to differentiate between different types of result files posted back (e.g. a VTT file and an MP3 file), an additional {fileType} macro also needs to be present in the callback URL and/or different MIME types need to be set up for each result file. |
ADNotes |
These are notes to help the describer regarding the spelling of proper names, etc. <submission> Be particularly mindful of making sure these notes to the describer are valid XML; for example, speaker ID formats such as >> need to be marked up as &gt;&gt;. If there are enabled Persistent Notes to the Describer configured on your account, these will be added to each submission first, followed by the content in the ADNotes tag. |
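As an illustration of the receiving side of the StatusURL callback described in the table above, the sketch below parses a posted body with Python's standard library. It assumes that Type, Result, ASTid, ASTstatus and ErrDetail arrive as child elements of a StatusUpdate root element; the exact layout, the function and class names, and the sample payload are assumptions made for this example, so check them against a real callback before relying on them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

VALID_TYPES = {"Initial", "Update"}
VALID_RESULTS = {"IN_PROCESS", "NOT_INGESTED", "FAILURE"}


@dataclass
class StatusUpdate:
    type: str
    result: str
    ast_id: Optional[str]      # absent when result is NOT_INGESTED
    ast_status: Optional[str]  # absent when result is NOT_INGESTED
    err_detail: Optional[str]  # only for NOT_INGESTED and FAILURE


def parse_status_update(body: bytes) -> StatusUpdate:
    """Parse a raw text/xml POST body (assumed layout, see lead-in)."""
    root = ET.fromstring(body)
    if root.tag != "StatusUpdate":
        raise ValueError(f"unexpected root element: {root.tag}")

    def text(tag: str) -> Optional[str]:
        value = root.findtext(tag)
        return value.strip() if value is not None else None

    update = StatusUpdate(
        type=text("Type") or "",
        result=text("Result") or "",
        ast_id=text("ASTid"),
        ast_status=text("ASTstatus"),
        err_detail=text("ErrDetail"),
    )
    if update.type not in VALID_TYPES:
        raise ValueError(f"unexpected Type: {update.type!r}")
    if update.result not in VALID_RESULTS:
        raise ValueError(f"unexpected Result: {update.result!r}")
    return update


if __name__ == "__main__":
    # Hypothetical payload; the ErrDetail text is invented for this example.
    sample = (
        b"<StatusUpdate><Type>Initial</Type>"
        b"<Result>NOT_INGESTED</Result>"
        b"<ErrDetail>Could not extract audio</ErrDetail></StatusUpdate>"
    )
    print(parse_status_update(sample))
```

Since the callback itself can fail, a handler like this is best paired with a periodic check for submissions that never received any status, rather than treating silence as success.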
Note that this article corresponds to "XML Layout v21.pdf".