The .lst file is an XML 1.0 document with UTF-8 encoding which conveys the list of URLs to be processed by CaptionSync along with optional additional processing directives such as submission type, notes to the transcriber, purchase order number, etc.
Note: .lst files will only be processed after your Source tag has been approved and configured by AST.
Outline:
Since it is an XML 1.0 document with defaults pulled from the web UI, the simplest valid form would be something like this:
<List>
<Description>2011-08-01 List for Joe</Description>
<ListItem>
<URL>http://myserver.org/mediafile.mov</URL>
</ListItem>
</List>
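If you generate .lst files from a script, having an XML library build the document avoids hand-escaping mistakes. Below is a minimal Python sketch that uses the standard library's xml.etree.ElementTree to produce the simplest form shown above as UTF-8 with an XML declaration. The function name build_minimal_lst is hypothetical and is not part of CaptionSync.

```python
import xml.etree.ElementTree as ET


def build_minimal_lst(description: str, media_url: str) -> bytes:
    """Return a minimal .lst document (one Description, one ListItem) as UTF-8 bytes."""
    root = ET.Element("List")
    ET.SubElement(root, "Description").text = description
    item = ET.SubElement(root, "ListItem")
    ET.SubElement(item, "URL").text = media_url
    ET.indent(root)  # pretty-print for readability (Python 3.9+)
    # xml_declaration=True prepends <?xml version='1.0' encoding='UTF-8'?>
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


doc = build_minimal_lst("2011-08-01 List for Joe", "http://myserver.org/mediafile.mov")
print(doc.decode("utf-8"))
```

ElementTree escapes text content itself, so values passed in should be plain, unescaped strings.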
A more elaborate example would be:
<?xml version="1.0" encoding="UTF-8" ?>
<List>
<Description>Computer Science 143, lectures</Description>
<App>Captioning</App>
<PO>00493-RT</PO>
<Rush>Y</Rush>
<Language>Spanish</Language>
<Source>PodcastVideo</Source>
<Notes>The names of the speakers are Carl Johns and Joel Kee.</Notes>
<ListItem>
<URL>http://myserver.org/12387987983.mov</URL>
<StatusURL>http://myserver.org/status?id=12387987983</StatusURL>
<CallBack>http://myserver.org/post_results?id=12387987983</CallBack>
</ListItem>
<ListItem>
<URL>http://myserver.org/12387987962.mov</URL>
<StatusURL>http://myserver.org/status?id=12387987962</StatusURL>
<CallBack>http://myserver.org/post_results?id=12387987962</CallBack>
</ListItem>
</List>
Keep in mind that the contents of the document must be valid XML, i.e.:
- < needs to be replaced with &lt;
- > needs to be replaced with &gt;
- & needs to be replaced with &amp;
- XML tags are case sensitive
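When assembling the XML as text, the replacements above can be applied with the Python standard library's xml.sax.saxutils.escape, which replaces &, < and > with &amp;, &lt; and &gt;. The helper name text_element is hypothetical. If you build the document with ElementTree instead, text is escaped automatically and should not be escaped a second time.

```python
from xml.sax.saxutils import escape


def text_element(tag: str, text: str) -> str:
    """Wrap text in an XML element, escaping &, < and > as the rules above require."""
    return f"<{tag}>{escape(text)}</{tag}>"


print(text_element("Notes", "Speaker IDs are marked >> ; Q&A at the end"))
# prints <Notes>Speaker IDs are marked &gt;&gt; ; Q&amp;A at the end</Notes>
```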
Required Tags:
A valid list XML document must have exactly one Description tag and at least one ListItem tag. Each ListItem tag must contain one URL tag, which specifies the URL of a media file to be captioned and/or transcribed:
e.g.
<List>
<Description>Computer Science 143, lectures</Description>
<ListItem>
<URL>http://myserver.org/mediafile.mov</URL>
</ListItem>
</List>
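The required-tag rules can be checked locally before a .lst file is uploaded. The following Python sketch reports missing or duplicated required tags and also applies the 2048-character limit described for the URL tag. The function name check_required_tags is hypothetical; it is a pre-flight aid, not the validation CaptionSync itself performs.

```python
import xml.etree.ElementTree as ET


def check_required_tags(lst_xml: bytes) -> list[str]:
    """Return a list of problems with the required structure; an empty list means OK."""
    problems = []
    root = ET.fromstring(lst_xml)  # raises ET.ParseError if the XML is not well-formed
    if root.tag != "List":  # tags are case sensitive, so <list> is rejected
        problems.append(f"root element is <{root.tag}>, expected <List>")
    if len(root.findall("Description")) != 1:
        problems.append("expected exactly one <Description>")
    items = root.findall("ListItem")
    if not items:
        problems.append("expected at least one <ListItem>")
    for i, item in enumerate(items, start=1):
        urls = item.findall("URL")
        url = (urls[0].text or "").strip() if len(urls) == 1 else ""
        if not url:
            problems.append(f"ListItem {i}: expected exactly one non-empty <URL>")
        elif len(url) >= 2048:
            problems.append(f"ListItem {i}: URL must be less than 2048 characters")
    return problems
```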
Description |
This is used for your tracking purposes and will be presented to the user to describe the list. If you choose to have the list items collected together as a batch, this name will be used as the batch identifier. This description should be unique; if it is not, CaptionSync will attempt to make it unique by appending timestamps. The database field is 250 characters long, so anything in your tag beyond 250 characters will be truncated. |
ListItem |
Each list item must have a URL tag and may optionally have a CallBack tag if a valid Source tag is provided for the list (see Optional List Tags and Optional List Item Tags below). URL: This tag tells AST where to get the media file from and what type it is. In general, the URL needs to point specifically to the media file, but additional GET parameters can be added. This URL needs to be properly encoded, e.g. %20 for SPACE. In addition, XML encoding may be required; for example, an & separating GET parameters must be written as &amp;. The tag also needs to be less than 2048 characters in length. |
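The URL tag involves two separate layers of encoding: percent-encoding inside the URL (e.g. %20 for a space) and XML escaping of the finished URL (e.g. & to &amp;). The Python sketch below handles the first with urllib.parse.quote and urlencode and leaves the second to ElementTree. The function name media_url_element and the sample query parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode


def media_url_element(base: str, filename: str, params=None) -> ET.Element:
    """Build a <URL> element: percent-encode the file name, then let ElementTree XML-escape."""
    url = base.rstrip("/") + "/" + quote(filename)  # e.g. SPACE -> %20
    if params:
        url += "?" + urlencode(params)  # extra GET parameters joined with '&'
    if len(url) >= 2048:
        raise ValueError("URL must be less than 2048 characters")
    el = ET.Element("URL")
    el.text = url  # '&' is serialized as '&amp;' automatically
    return el


el = media_url_element("http://myserver.org", "lecture 1.mov", {"id": 1, "token": "abc"})
print(ET.tostring(el, encoding="unicode"))
# prints <URL>http://myserver.org/lecture%201.mov?id=1&amp;token=abc</URL>
```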
Optional List Tags:
The following are optional tags for the list. These tags apply to all the list items in the list. All are optional, except possibly PO (purchase order), which is required if the account is set up as requiring a purchase order and no valid default purchase order is defined.
App | This can be one of the valid submission types associated with your account, e.g. Captioning, Transcription, or Production. Captioning is used when you want a closed caption file, such as DFXP, returned (this is the most common case). Transcription indicates that you need transcription only; in this case, you will receive only a .txt transcript file. Customers might use this for audio-only recordings. Production indicates that you want production transcripts returned. If this tag is not present, the submission will inherit the default set via the web UI. e.g. <App>Captioning</App> |
AsBatch |
This flag specifies that the list items should be grouped together as a batch for reporting and collecting of results. All results for a particular batch are zipped together for retrieval via SFTP or the web UI, and the cleansed value of Description is used as the batch identifier. If not present, the list items will not be grouped together as a batch. |
Notes |
These are notes to help the transcriber with the spelling of proper names, speaker ID formats, etc., and will apply to all list items. e.g. <Notes>The names of the speakers are Carl Johns and Joel Kee.</Notes> Be particularly careful to make sure these notes to the transcriber are valid XML; for example, speaker ID formats such as >> need to be marked up as &gt;&gt;. If there are enabled Persistent Notes to the Transcriber configured on your account, these will be added to each list item first, followed by the content in the Notes tag. |
Language |
This can be one of the valid languages associated with your account: English, Spanish, Mixed Spanish/English, French, or German. If not present, the list items will inherit the default set via the web UI. e.g. <Language>Spanish</Language> |
PersistentNote |
This tag specifies the integer ID of the persistent note to the transcriber to be used, or 0 if no persistent note is to be used (overriding any default). Normally the default from the web UI determines the persistent note to the transcriber (if any). The IDs for notes can be found via the id attribute of the PerNote tags in the .prefs file (retrieved by SFTP). |
TransExpert |
This tag specifies the integer ID(s) of the transcriber expertise area(s) to be used, or 0 if no transcriber expertise area is to be used (overriding any default). Normally the default from the web UI determines the transcriber expertise area (if any). The IDs for expertise areas can be found via the id attribute of the TransExp tags in the .prefs file (retrieved by SFTP). If more than one ID is passed, they should be comma-delimited. |
Result Review |
This flag specifies whether Result Review should be used (Y or N). Normally the default from the web UI determines whether Result Review will be requested; this tag can be used to override that default. Note that for Prepaid accounts, sufficient prepaid credits must be present. The default can be found via the is_default attribute of the CanRR tag in the .prefs file (retrieved by SFTP). If CanRR is N, then Result Review is disabled for the user. |
Rush |
This flag specifies which transcription SLA should be used. Normally the default priority associated with the account determines the SLA; this tag can be used to override that default. Note that the account must allow the selected SLA level, and for Prepaid accounts, sufficient prepaid credits must be present. The permitted SLAs can be found via the code attribute of the TranscriptionSLA tags in the .prefs file (retrieved by SFTP). e.g. <Rush>Y</Rush> |
Offset |
This tag is used to specify an offset timecode for broadcast outputs for Captioning submission types, in the format hh:mm:ss:ff. If not present, the default offset timecode associated with the account is used. If present for any submission type other than Captioning, it will be ignored. |
PO |
This is the purchase order number for the list, for accounts set up as requiring purchase orders. If not present, the submission will inherit the default or most recently used valid purchase order for the account via the web UI. Keep in mind that the purchase order number needs to be nicely named: no spaces, punctuation, or high ASCII characters, please. |
Source |
This tag identifies the system from which the media to be captioned originated. It can be used for reporting volume coming from a particular system and/or for getting results posted via a callback URL. For the posting of results via the callback URL, an additional CallBack tag is required (see next section). e.g. <Source>PodcastVideo</Source> AST needs to approve the value of your tag if any specialized reporting or result posting is required. It also needs to be nicely named and less than 128 characters: no spaces, punctuation, or high ASCII characters, please. |
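Several optional list tags take values that must match entries in the account's .prefs file: PersistentNote (PerNote id), TransExpert (TransExp id, comma-delimited), and Rush (TranscriptionSLA code). Offset must follow hh:mm:ss:ff. The Python sketch below cross-checks these before upload. It assumes only the element and attribute names given above; the full layout of .prefs is not described here, so the code searches the whole tree, and the sample .prefs used for testing is hypothetical. The 00-23 hour range is also an assumption, and frames are only checked for two digits because the valid range depends on the frame rate. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Loose hh:mm:ss:ff check (hours 00-23 assumed; frames checked only for two digits).
OFFSET_RE = re.compile(r"([01]\d|2[0-3]):[0-5]\d:[0-5]\d:\d{2}")


def read_prefs(prefs_xml: bytes) -> dict:
    """Collect ids/codes from a .prefs file; nesting is unknown, so iter() searches everything."""
    root = ET.fromstring(prefs_xml)
    return {
        "PerNote": {e.get("id") for e in root.iter("PerNote")},
        "TransExp": {e.get("id") for e in root.iter("TransExp")},
        "TranscriptionSLA": {e.get("code") for e in root.iter("TranscriptionSLA")},
    }


def check_list_options(lst: ET.Element, prefs: dict) -> list[str]:
    """Return problems with optional list-level tags, checked against .prefs values."""
    problems = []
    note = lst.findtext("PersistentNote")
    if note is not None and note != "0" and note not in prefs["PerNote"]:
        problems.append(f"PersistentNote {note} is not a PerNote id in .prefs")
    experts = lst.findtext("TransExpert")
    if experts is not None and experts != "0":
        for eid in (part.strip() for part in experts.split(",")):
            if eid not in prefs["TransExp"]:
                problems.append(f"TransExpert {eid} is not a TransExp id in .prefs")
    rush = lst.findtext("Rush")
    if rush is not None and rush not in prefs["TranscriptionSLA"]:
        problems.append(f"Rush {rush} is not a TranscriptionSLA code in .prefs")
    offset = lst.findtext("Offset")
    if offset is not None and not OFFSET_RE.fullmatch(offset):
        problems.append(f"Offset {offset!r} is not in hh:mm:ss:ff format")
    return problems
```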
Optional List Item Tags:
CallBack |
This tag tells AST where to send result files for a particular list item, via a raw HTTP or HTTPS POST with a Content-Transfer-Encoding of binary. This tag is only honored when an approved Source tag is also present; otherwise it will be ignored. e.g. <CallBack>http://myserver.org/post_results?id=12387987983</CallBack> AST needs to approve the value of your Source tag in conjunction with your CallBack tag for any result posting to take place. This callback URL needs to be properly encoded, e.g. %20 for SPACE. In addition, XML encoding may be required; for example, an & separating GET parameters must be written as &amp;. The tag also needs to be less than 512 characters in length. If you need more than one result type returned for the list item and/or you need to differentiate between different types of result files posted back (e.g. a caption file and a transcript file), a {fileType} macro also needs to be present in the callback URL and/or different MIME types need to be set up for each result file. This could result in post URLs like: https://67.34.4.22/bio1/closedCaptioning?id=1283232 The specific file types of the macro expansion (e.g. closedCaptioning, transcript, etc.) will need to be configured by AST. |
StatusURL |
This tag tells AST where to send status information for a particular list item, via a raw HTTP or HTTPS POST with a Content-Transfer-Encoding of binary. e.g. <StatusURL>http://myserver.org/status?id=12387987983</StatusURL> This callback URL needs to be properly encoded, e.g. %20 for SPACE. In addition, XML encoding may be required; for example, an & separating GET parameters must be written as &amp;. The tag also needs to be less than 512 characters in length. The status information will be posted directly to the specified URL as a raw POST with a MIME type of text/xml (note that there will not be any form URL encoding), as a StatusUpdate document containing the tags described below. The status callback will be sent as soon as a fatal error occurs, but otherwise only once ingest processing is far enough along to know that it is on track to succeed. For example, if a 404 error occurs when attempting to fetch the media file, the status callback will be sent at that point. Otherwise, the system will not send the status callback until the file has been fetched, audio processing has been completed, and an AST ID has been assigned. Note that in the case of a malformed .lst file, no callback will be made; instead, an email will be sent to the account holder (provided the account holder has not deselected that email type in Contact Settings via the web UI).
Type: One of Initial or Update. A status callback of type Update will only occur when a submission that was successfully ingested subsequently fails or is rejected (the Result will be FAILURE).
Result: One of IN_PROCESS, NOT_INGESTED, or FAILURE. A status callback with a Result of FAILURE will only occur when a submission that was successfully ingested subsequently fails or is rejected (the Type will be Update), e.g. the transcriber tells us there is only silence in the audio.
ASTid: Not present for a Result of NOT_INGESTED; otherwise it references the CaptionSync AST ID of the list item.
ASTstatus: Not present for a Result of NOT_INGESTED; otherwise it references the current CaptionSync AST status of the list item.
ErrDetail: Only present for a Result of NOT_INGESTED or FAILURE.
It is important to consider the possibility that the callback itself fails and no status information about the list item is received (e.g. 500 Internal Server Error). If no status information is received, do not make assumptions about the status; further investigation is required. |
Basename |
This tag tells AST how to name the description of the submission pertaining to this list item. If not present, the system will try to use an appropriate name gleaned from the URL. It needs to be nicely named and less than 128 characters: no spaces, punctuation, or high ASCII characters, please. |
AudioDesc |
This flag (value of Y) specifies that audio description is to be added to the submission pertaining to this list item. It can only be added for the App value of Captioning. |
ADCallBack |
This tag tells AST where to send audio description result files for a particular list item, via a raw HTTP or HTTPS POST with a Content-Transfer-Encoding of binary. This tag is only honored when an AudioDesc tag (see above) is also present; otherwise it will be ignored. This callback URL needs to be properly encoded, e.g. %20 for SPACE. In addition, XML encoding may be required; for example, an & separating GET parameters must be written as &amp;. The tag also needs to be less than 512 characters in length. If you need more than one result type returned for the audio description and/or you need to differentiate between different types of result files posted back (e.g. a VTT file and an MP3 file), a {fileType} macro also needs to be present in the callback URL and/or different MIME types need to be set up for each result file. |
ADNotes |
These are notes to help the describer with the spelling of proper names, etc., and will apply to all list items. Be particularly careful to make sure these notes to the describer are valid XML; for example, speaker ID formats such as >> need to be marked up as &gt;&gt;. If there are enabled Persistent Notes to the Describer configured on your account, these will be added to each list item first, followed by the content in the ADNotes tag. |
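On the receiving side, the CallBack and StatusURL tags both result in raw POSTs with no form encoding. Below is a minimal Python sketch of a receiver built on the standard library's http.server. It assumes your StatusURL points at a /status path and your CallBack at any other path, mirroring the sample URLs in the elaborate example above. It also assumes Type, Result, ASTid, ASTstatus and ErrDetail are direct children of StatusUpdate and that the sender provides a Content-Length header. All names (CaptionSyncReceiver, OUT_DIR, result_filename, serve) are hypothetical. When a {fileType} macro is used, the last path segment identifies the result type.

```python
import re
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

OUT_DIR = Path("results")  # hypothetical directory for posted result files
STATUS_FIELDS = ("Type", "Result", "ASTid", "ASTstatus", "ErrDetail")


def parse_status_update(body: bytes) -> dict:
    """Parse a raw text/xml status POST; absent optional tags come back as None."""
    root = ET.fromstring(body)
    if root.tag != "StatusUpdate":
        raise ValueError(f"unexpected root element <{root.tag}>")
    return {name: root.findtext(name) for name in STATUS_FIELDS}


def result_filename(request_path: str) -> str:
    """Turn e.g. /bio1/closedCaptioning?id=1283232 into a safe local file name."""
    path, _, query = request_path.partition("?")
    file_type = path.rstrip("/").rsplit("/", 1)[-1] or "result"
    raw = f"{file_type}_{query}" if query else file_type
    return re.sub(r"[^A-Za-z0-9_-]", "_", raw)  # no '/' or '.' survives


class CaptionSyncReceiver(BaseHTTPRequestHandler):
    """Handle status POSTs on /status; save every other POST body as a result file."""

    def do_POST(self):
        # Raw binary body, no form encoding; assumes Content-Length is present.
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.path.partition("?")[0] == "/status":
            try:
                update = parse_status_update(body)
            except (ET.ParseError, ValueError):
                self.send_response(400)
                self.end_headers()
                return
            if update["Result"] in ("NOT_INGESTED", "FAILURE"):
                self.log_message("list item needs attention: %s", update["ErrDetail"])
        else:
            OUT_DIR.mkdir(exist_ok=True)
            (OUT_DIR / result_filename(self.path)).write_bytes(body)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver over plain HTTP; put a TLS terminator in front for HTTPS."""
    HTTPServer(("", port), CaptionSyncReceiver).serve_forever()
```

As the text above cautions, a missing status POST carries no information by itself; the receiver can only record what actually arrives.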
Note that this article corresponds to "List XML Layout v16.pdf".