License
Copyright 2012, 2013, 2014, 2015 transLectures-UPV Team / Machine Learning and Language Processing (MLLP) research group.
Licensed under the Apache License, Version 2.0.
The transLectures-UPV Platform (TLP) includes software developed at the Universitat Politècnica de València (UPV) by the MLLP research group as part of the transLectures EU project until TLP Version 1.0.1.
1. Introduction
The transLectures-UPV Platform (TLP) consists of a set of software tools for multilingual automatic subtitling of large video repositories, as well as the integration of these processes into existing workflows. It was developed up to Version 1.0.1 by the MLLP research group from the Universitat Politècnica de València (UPV) as part of the EU research project transLectures. After the end of transLectures, TLP is still being maintained by the MLLP research group, with the release of TLP versions 1.1, 1.2, 2.0 and 2.1.
2. Getting Started
In this section we give a brief overview of the transLectures-UPV Platform (TLP), describing the workflows involved when integrating Automatic Speech Recognition (ASR), Machine Translation (MT) and Text-To-Speech Synthesis (TTS) technologies into large media repositories with the aid of this platform.
2.1. TLP Overview
TLP is a self-contained piece of software that includes everything needed to integrate transcription, translation and speech synthesis technologies into large media repositories. Its main components are the Database, the Web Service, the Ingest Service and the Player, each of which is described in its corresponding section. TLP also offers several client tools, which are described in detail in the Client Tools Section.
The figure below shows the main components of TLP and a simplification of the interactions between them.
2.2. Use Cases
We have defined three use cases to illustrate the main ways a media repository and its users can interact with the transLectures-UPV Platform:
-
A new recording from the media repository is uploaded to TLP for the generation of automatic subtitles and audio tracks.
-
A user plays a media file with subtitles from the media repository’s website.
-
A user corrects subtitle errors (transcription or translation).
A lecturer/speaker records a new lecture/media in a recording studio, in a classroom, or during a conference. To get this video transcribed and translated into several languages, a Media Package File (MPF), made up of the recorded media file plus metadata, is created and sent to the TLP Web Service via the /ingest interface. The TLP Ingest Service unpacks the MPF and launches the required transcription, translation and/or speech synthesis processes. During this stage, the client (the remote media repository) can check the progress of the upload at any time using the /status endpoint of the Web Service. Finally, the Ingest Service creates a new media record in the Database and stores all media, subtitles, and synthesized audiotrack files.
A user browses the media repository’s catalogue and selects the media he or she wants to watch or listen to using the repository’s media player. The user can watch the selected media with subtitles in different languages, or even listen to it in another language using automatically synthesized audio tracks where available. To get the list of all available subtitle languages, the repository’s media player sends a request to the /langs interface of the TLP Web Service and displays the available languages to the user. When the user selects the desired subtitle language, the repository’s media player calls the /subs endpoint to download the corresponding subtitle file in the required format (srt, vtt, dfxp, etc.), which is immediately processed and displayed in the media player. A similar procedure is applied when the user requests a synthesized audio track, but in this case the media player makes use of the /audiotrack interface instead of the /subs one.
A user, while playing a media file with subtitles (as shown above in use case 2), notices that the displayed subtitles contain some errors and decides to correct them. To do this, the user presses an Edit Subtitles button (or similar) that is shown by the repository’s media player, and afterwards, the user is redirected to the TLP Player. The TLP Player offers an ergonomic and efficient interface for subtitle editing. It loads the main media file and the subtitles file by calling the /metadata and /subs interfaces of the Web Service, respectively. Any corrections made by the user are sent back to the Web Service via the /mod interface and appended to the original DFXP file. The updated DFXP file is committed to the Database and afterwards, automatic translations and synthesized audio tracks are automatically re-generated using user corrections.
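The first use case boils down to an upload followed by status polling. The sketch below illustrates only the polling half, under explicit assumptions: the fetch_status callable, the status strings and the poll_upload helper are hypothetical stand-ins and not part of the TLP API. A real client would call the /status interface inside fetch_status, for instance through one of the client tools.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real /status interface defines its own codes.
TERMINAL_STATES = {"finished", "error"}


def poll_upload(upload_id: str,
                fetch_status: Callable[[str], str],
                interval_s: float = 60.0,
                max_polls: int = 100,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll an upload until it reaches a terminal state or max_polls is hit.

    fetch_status is an assumed callable that wraps a call to the /status
    interface and returns a status string for the given upload ID.
    """
    status = "unknown"
    for _ in range(max_polls):
        status = fetch_status(upload_id)
        if status in TERMINAL_STATES:
            return status
        sleep(interval_s)  # wait before asking the Web Service again
    return status


if __name__ == "__main__":
    # Fake transport that simulates an upload moving through three states.
    replies = iter(["queued", "transcribing", "finished"])
    print(poll_upload("up-001", lambda _id: next(replies), sleep=lambda s: None))
```

Passing the transport and the sleep function as parameters keeps the loop testable without a running TLP Server.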
3. Database
The TLP Database is a SQL-based relational database which stores all the data required for the Web Service and the Ingest Service. The main categories of data stored in the Database are the following:
-
Media/Lecture: All the information related to a specific media/lecture is stored in the database, including language, duration, title, keywords and category. An external ID, provided by the client repository, is used to identify the media object in all transactions performed between the client and the Web Service API.
-
Speakers: Information about the speaker/lecturer can be used by the ASR system to adapt the underlying models to the unique characteristics of the given speaker and, therefore, improve the quality of the resulting subtitles.
-
Subtitles: All subtitles automatically generated by the Ingest Service are stored in DFXP format into the database and retrieved by the client via the Web Service.
-
Audiotracks: As in the case of subtitles, automatically synthesized audio tracks from translated subtitles are also stored in the database.
-
Uploads: Every time an /ingest operation is performed, a new upload entry is stored in the database to track its progress.
4. Web Service
The TLP Web Service is the API interface for exchanging information and data between the client’s media repository and the transLectures-UPV Platform. It also enables the subtitle display and editing capabilities of the TLP Player. The Web Service defines a wide set of API HTTP interfaces to allow for the full integration between TLP and the remote media repository:
/ingest | Upload media (audio/video) files and any attachments and metadata to the TLP Server for automatic multilingual subtitling and speech synthesis.
/uploadslist | Get a list of all the user’s uploads.
/status | Check the current status of a specific upload ID.
/systems | Get a list of all available Speech Recognition, Machine Translation, and Text-To-Speech Systems that can be applied to transcribe, translate, and synthesize a media file.
/metadata | Get metadata and media file locations for a given media ID.
/langs | Get a list of all subtitle and audiotrack languages available for a given media ID.
/subs | Download the current subtitle file for a given media ID and language.
/audiotrack | Download an audiotrack file for a given media ID and language.
/start_session | Start an edit session to send and commit modifications to a subtitle file.
/session_status | Return the current status of the given session ID.
/mod | Send and commit subtitle corrections under an edit session.
/end_session | End an open edit session; depending on the confidence of the user, edits are either stored directly in the corresponding subtitle files or left for revision.
/lock_subs | Allow/disallow regular users to send subtitle modifications for a specific media ID.
/edit_history | Return a list of all edit sessions that involved a specific media ID.
/revisions | Return a list of all edit sessions, across all of the API user’s media files, that are pending revision.
/mark_revised | Mark/unmark a specific edit session ID as revised, typically from another session ID on the TLP Player.
/accept | Accept modifications of one or more pending edit sessions without having to revise them. Modifications are committed to the corresponding subtitle files.
/reject | Reject modifications of one or more pending edit sessions without having to revise them.
A detailed description of the Web Service API can be found in this Appendix. In addition, TLP offers several tools to interact with this API; you will find more information about them in the Client Tools Section.
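As a minimal sketch of how a client might address these interfaces, the helper below builds a URL for one of the interface names listed in this section. The function name build_api_url, the base URL and the query parameter names (id, lang) are assumptions made for illustration; the real parameter names are defined in the Web Service API documentation.

```python
from urllib.parse import urlencode

# Interface names taken from the list in this section.
INTERFACES = {
    "ingest", "uploadslist", "status", "systems", "metadata", "langs",
    "subs", "audiotrack", "start_session", "session_status", "mod",
    "end_session", "lock_subs", "edit_history", "revisions",
    "mark_revised", "accept", "reject",
}


def build_api_url(base_url: str, interface: str, **params: str) -> str:
    """Return the URL of a Web Service interface with an optional query string.

    The query parameter names passed by callers are illustrative only.
    """
    if interface not in INTERFACES:
        raise ValueError(f"unknown interface: {interface!r}")
    url = f"{base_url.rstrip('/')}/{interface}"
    if params:
        # Sort the parameters so that the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


if __name__ == "__main__":
    print(build_api_url("http://myserver.com/api", "subs", id="lecture-1", lang="en"))
```

Rejecting unknown interface names early avoids sending malformed requests to the server.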
4.1. API User Authentication
The TLP Web Service comes with a custom API user authentication system based on authentication tokens. Every API call must include a valid authentication token in order to authenticate the API user. TLP offers two different authentication methods:
-
Secret Key: An API user authentication token, associated to the user account, is provided to the Web Service. This token is valid for user authentication on all API interfaces. This is the recommended authentication method for direct client to server API calls.
-
Request Key: A lifetime-limited, request-dependent authentication token is provided to the Web Service. This token is valid for user authentication only on a reduced set of API interfaces and for a limited period of time. This authentication method should be used whenever the Secret Key would otherwise be exposed to third parties, for instance when a user belonging to the API client organisation is using the TLP Player to edit a subtitle file (in this case, the authentication token is sent via URL parameters).
The figure above shows a typical integration scenario between TLP and the remote media repository, in which the Secret Key authentication method is used for all direct API calls between both parts, whilst the alternative Request Key method is used to generate TLP Player URLs that will be followed by the repository’s users to review media subtitles. In this latter case, the Request Key is the authentication token used in all API calls between the Player and the Web Service.
For further information and technical details, please refer to the Preface of the Web Service’s API Documentation.
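To illustrate the general idea behind a lifetime-limited, request-dependent token, here is a generic sketch based on HMAC-SHA256. It is not TLP's actual key derivation, which is specified in the API documentation and may differ; the function names, the message layout and the use of a media ID as the request-dependent part are all assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional


def make_request_key(secret_key: str, media_id: str, expire: int) -> str:
    """Derive a token bound to one media ID and one expiry timestamp."""
    message = f"{media_id}:{expire}".encode("utf-8")
    return hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).hexdigest()


def check_request_key(secret_key: str, media_id: str, expire: int,
                      token: str, now: Optional[float] = None) -> bool:
    """Accept the token only if it matches and has not expired."""
    now = time.time() if now is None else now
    if now > expire:
        return False  # lifetime exceeded
    expected = make_request_key(secret_key, media_id, expire)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, token)
```

Because the token is derived from the secret rather than being the secret itself, it can travel in a URL without revealing the Secret Key.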
5. Player
The TLP Player is an HTML5 media player which allows users to review and modify media subtitles with ease. The Player provides a highly ergonomic editing interface, optimized to reduce user effort.
The TLP Player can be called externally using a valid URL. For further technical information, please refer to the Calling the TLP Player Annex.
5.1. User Guide
The TLP Player will automatically load the media file and subtitles. Manually edited subtitle segments will be shown in green, while automatic subtitle segments will appear in black.
-
: Jump to the beginning of the previous subtitle segment (Up arrow)
-
: Seek video -1.5 seconds (Alt + Left arrow)
-
: Play/Pause video (Tab)
-
: Seek video +1.5 seconds (Alt + Right arrow)
-
: Jump to the beginning of the next subtitle segment (Down arrow)
-
: Reveals a Help layer with descriptions and keyboard shortcuts.
-
: Saves both Reference and Editing subtitle changes, if any.
-
: Allows the user to select the Reference and Editing languages. The Editing language must be selected first, and it is the language the user wishes to edit. The Reference language can be optionally displayed to help the user in the translation process. In this mode, both Reference and Editing subtitles can be edited simultaneously.
-
: Allows the user to select different editing layout modes.
-
: Shows different options such as download/import subtitle file or enable/disable the Advanced mode.
-
Enter (or click): Edit/Confirm the current segment.
-
Shift + Tab: Replay the current segment from the beginning.
-
Ctrl + Enter: Create a new segment starting on current media time.
-
Ctrl + S: Split the current segment.
-
Ctrl + Backspace: Join the current and the previous segments.
6. Ingest Service
The Ingest Service is the service devoted to handling and processing Media Package Files (MPF) uploaded via the /ingest interface of the Web Service. The Ingest Service checks periodically (typically every minute) whether new MPFs have been uploaded in order to start their processing, and also checks whether ongoing uploads are progressing correctly or have failed. The uploads table of the Database is used to keep track of the status of every upload.
Media Package File specifications can be found in this Appendix.
The figure above shows the internal structure of the Ingest Service, which is split into two layers:
The Upper Layer implements the main logic of the Ingest Service using a modular design. It has a central node, the Core, which implements the logic of all possible workflows that can be followed by an MPF, leaving data processing tasks to external modules. This means that the functionalities of the Ingest Service can be easily modified, replaced or extended by swapping these external modules with others, e.g., other Automatic Speech Recognition (ASR) and Machine Translation (MT) modules.
External modules can be divided into two categories:
-
Base Modules: Modules that implement APIs for basic operations used by the Core.
-
URL Downloader: Module that downloads media files from a given URL address. It can also download media from obfuscated URLs, such as those of YouTube or Vimeo, using external plug-ins called URL decoders.
-
Media Module: Module that offers several methods of media format conversion.
-
Mailer Module: Module with routines used to send e-mail notifications regarding upload status updates.
-
transLectures Modules: Modules that integrate transcription, translation and speech synthesis technologies into the Ingest Service.
-
ASR Modules: Automatic Speech Recognition Modules, used to generate transcription subtitle files.
-
MT Modules: Machine Translation Modules, used to generate translated subtitle files.
-
TTS Modules: Text-To-Speech Modules, used to generate synthesized audiotracks in a specific language.
-
Text Retrieval Module: Extracts plain text information from the different file resources included in the MPF. It also downloads related text documents from the web. This text data can be used by ASR Modules to enhance transcription quality by adapting the underlying ASR System to the topic of the media file.
The Lower Layer satisfies all local installation dependencies related to data storage and job scheduling. It is split into two parallel sublayers:
-
Scheduler layer: Implements an API for launching and scheduling the transcription and translation processes, typically in a Grid Engine/Job Management System.
-
Storage layer: Implements an API that allows access to the data stored in the Database and in the TLP Server’s hard drive.
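The modular design described above can be sketched as a Core that only knows a common module interface. Everything in this example is hypothetical: the Module base class, the role names and the two toy modules merely show how an external module could be swapped without touching the workflow logic.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Module(ABC):
    """Hypothetical common interface for external Ingest Service modules."""

    @abstractmethod
    def run(self, payload: str) -> str:
        ...


class UppercaseASR(Module):
    """Toy stand-in for an ASR module; it just upper-cases its input."""

    def run(self, payload: str) -> str:
        return payload.upper()


class ReverseMT(Module):
    """Toy stand-in for an MT module; it just reverses its input."""

    def run(self, payload: str) -> str:
        return payload[::-1]


class Core:
    """Holds the workflow logic and delegates data processing to modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, Module] = {}

    def register(self, role: str, module: Module) -> None:
        # Registering under an existing role swaps the module in place.
        self._modules[role] = module

    def process(self, payload: str, roles: List[str]) -> str:
        for role in roles:
            payload = self._modules[role].run(payload)
        return payload


if __name__ == "__main__":
    core = Core()
    core.register("asr", UppercaseASR())
    core.register("mt", ReverseMT())
    print(core.process("hello", ["asr", "mt"]))
```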
6.1. Uploads Workflow
In this section we explain the different steps an upload can follow from the moment it is ingested into the transLectures-UPV Platform until its processing finishes.
First we must distinguish between four types of operations:
-
New Media: This operation is requested when a newly-recorded, non-existing media is uploaded to TLP for the first time. In this operation, a new Media object is created in the Database.
-
Update Media: This operation is requested when updates are applied to an existing media. For instance, new text resources such as slides might be added to the Media Package File (MPF) to improve the automatic transcription and translations of the existing media, or to update the existing media file with a re-recording.
-
Delete Media: This operation is requested when a media is deleted from the remote repository.
-
Cancel Upload: This operation is requested to cancel an ongoing upload for whatever reason.
Depending on the type of operation and the input data, the steps an upload follows in the Ingest Service may vary. The figure below illustrates the standard Ingest Service workflow:
Media Package Files are uploaded to the transLectures-UPV Platform via the Web Service's /ingest interface and stored in the Database. The Ingest Service reads the uploads table of the database and starts processing the uploaded MPF. An upload typically goes through the following sequential steps, although some of them might be skipped depending on the input data:
-
Media Package Processing: The MPF is processed for the first time, performing several security, data integrity and data format checks, and, if all checks are correct, the upload status moves to the next processing step.
-
Transcription Generation: In this step, a transcription file in DFXP format is generated from the main media file (video, audio) using an Automatic Speech Recognition (ASR) Module.
This step is skipped in the following cases:
-
The Ingest Service does not feature a suitable ASR Module for the source language of the main media file.
-
Subtitles in the source language were provided in the MPF.
-
The client has explicitly not requested this step.
-
In update operations that do not involve re-transcribing the lecture.
-
In delete or cancel operations.
-
Translation(s) Generation: In this step, one or more translation files in DFXP format are generated from a transcription file (either automatically generated in the previous step, or provided in the MPF), using the appropriate Machine Translation Modules.
This step is skipped in the following cases:
-
The Ingest Service does not have suitable MT Modules for the source language of the main media file.
-
Subtitles in all requested translation languages offered by the Ingest Service are already provided in the MPF.
-
The client has explicitly not requested this step.
-
In update operations that do not involve re-translating the lecture.
-
In delete or cancel operations.
-
Text-To-Speech Track Generation: In this step one or more synthesized audiotrack files are generated from a translation file (either automatically generated in the previous step or provided in the MPF), using the appropriate Text-To-Speech Modules. This step is skipped in the following cases:
-
The Ingest Service does not have suitable TTS Modules for the target language of any translation files.
-
Audiotracks in all requested languages offered by the Ingest Service are already provided in the MPF.
-
The client has explicitly not requested this step.
-
In update operations that do not involve re-translating the lecture.
-
In delete or cancel operations.
-
Media Conversion: In this step, the main media file is converted into the media formats required by the TLP Player in order to maximize browser compatibility. This step is skipped in the following cases:
-
All required media files were attached in the MPF.
-
In update operations where the main media file has not changed.
-
In delete or cancel operations.
-
Store Data: This is the final step. For New Media and Update Media operations, the data contained in the MPF and the data automatically generated by the Ingest Service are stored in the Database. For Delete Media operations, all previously stored media files and data are deleted.
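The skip rules listed for each step can be condensed into a small planning function. This is only a sketch of the rules as stated in this section: the UploadInfo fields and the step labels are hypothetical names, and the real Ingest Service may evaluate these conditions differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UploadInfo:
    """Hypothetical summary of an upload; all field names are illustrative."""
    operation: str = "new"                # "new", "update", "delete" or "cancel"
    asr_available: bool = True            # suitable ASR Module for the source language
    mt_available: bool = True             # suitable MT Modules for the source language
    tts_available: bool = True            # suitable TTS Modules for the target languages
    source_subs_in_mpf: bool = False      # source-language subtitles provided in the MPF
    all_translations_in_mpf: bool = False
    all_audiotracks_in_mpf: bool = False
    all_media_formats_in_mpf: bool = False
    want_transcription: bool = True       # the client requested the step
    want_translation: bool = True
    want_tts: bool = True
    retranscribe: bool = True             # only meaningful for update operations
    retranslate: bool = True
    media_changed: bool = True


def plan_steps(up: UploadInfo) -> List[str]:
    """Return the ordered processing steps that are not skipped."""
    steps = ["media_package_processing"]
    if up.operation in ("new", "update"):
        update = up.operation == "update"
        if (up.asr_available and not up.source_subs_in_mpf and up.want_transcription
                and not (update and not up.retranscribe)):
            steps.append("transcription")
        if (up.mt_available and not up.all_translations_in_mpf and up.want_translation
                and not (update and not up.retranslate)):
            steps.append("translation")
        if (up.tts_available and not up.all_audiotracks_in_mpf and up.want_tts
                and not (update and not up.retranslate)):
            steps.append("tts")
        if not up.all_media_formats_in_mpf and not (update and not up.media_changed):
            steps.append("media_conversion")
    # Assumption: the final step is kept for every operation, although the
    # text only describes it for new, update and delete operations.
    steps.append("store_data")
    return steps
```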
In every execution of the Ingest Service, the Core reviews which uploads are being processed, checking whether the related processes are:
-
Queued: Processes are queued when they are waiting to be executed. No action is performed.
-
Running: Processes are being executed. No action is performed.
-
Finished: All processes finished successfully. The Core changes the upload status to the next processing step.
-
Failed: Some processes failed. The Core changes the upload status to an error state.
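The review performed by the Core on each execution can be sketched as a pure function over the states of the processes belonging to an upload. The step labels, the "error" and "done" status names, and the rule that a single failure outweighs the other states are assumptions of this example; for simplicity it also ignores skipped steps and always moves to the next step in the list.

```python
from typing import List

# Illustrative labels for the processing steps listed in this section.
STEPS = ["media_package_processing", "transcription", "translation",
         "tts", "media_conversion", "store_data"]


def review_upload(current_step: str, process_states: List[str]) -> str:
    """Return the upload status after one review pass of the Core.

    process_states holds one of "queued", "running", "finished" or
    "failed" for each process launched for the current step.
    """
    if "failed" in process_states:
        return "error"                      # some process failed
    if process_states and all(s == "finished" for s in process_states):
        index = STEPS.index(current_step)
        # "done" is a hypothetical name for the state after the last step.
        return STEPS[index + 1] if index + 1 < len(STEPS) else "done"
    return current_step                     # queued or running: no action
```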
Detailed information about the Ingest Service workflows and behaviour can be found in this Appendix.
6.2. User Quota
Each TLP user / API client account has an upload quota. This quota represents the remaining number of videos and media time that the user can upload. Once a new media file is uploaded, the Ingest Service checks whether the client has enough quota to process that particular media, updating accordingly the user’s quota after processing the media file (the length of the uploaded media is subtracted from the total remaining time). Automatic re-transcriptions and re-translations do not decrease the user’s quota.
The Ingest Service features a Test Mode that allows the client to perform integration tests without consuming quota while obtaining fast responses. For more information, please refer to the Manifest JSON File Specification.
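The quota rules above can be summarised in a few lines. This is a sketch under assumptions: the Quota class, its field names and the test_mode and reprocessing flags are hypothetical, and the real bookkeeping lives in the Database and the Ingest Service.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Hypothetical per-account quota: remaining videos and media time."""
    videos_left: int
    seconds_left: float

    def can_process(self, duration_s: float) -> bool:
        """Check whether the account has enough quota for one more media file."""
        return self.videos_left >= 1 and self.seconds_left >= duration_s

    def charge(self, duration_s: float, *, test_mode: bool = False,
               reprocessing: bool = False) -> None:
        """Subtract one video and its length after processing.

        Test Mode uploads and automatic re-transcriptions or
        re-translations leave the quota untouched.
        """
        if test_mode or reprocessing:
            return
        if not self.can_process(duration_s):
            raise ValueError("insufficient quota")
        self.videos_left -= 1
        self.seconds_left -= duration_s
```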
7. Client tools
The transLectures-UPV Platform offers several libraries and command-line utilities to facilitate the client’s interaction with the Web Service API and the TLP Player. These tools are located under the misc/client-tools folder.
-
ws-client.py: Script to call all Web Service interfaces.
-
player-url-generator.py: Generates valid URLs to the TLP Player.
-
Python → libtlp.py (see Documentation)
-
PHP → libtlp.php (see Documentation)
Appendices
Appendix A: Installation
In this section we provide installation instructions to properly set up TLP.
Despite the distributed nature of TLP, that is, that each of its components can in theory be installed on different machines, for the sake of simplicity we recommend they be installed on a single machine to create what we have called a TLP Server.
We have tested a TLP installation on Ubuntu 14.04 LTS Desktop and Server versions. It has not been tested on other versions or distributions. All installation notes contained in this documentation are based on Debian/Ubuntu-based distributions and may not, therefore, be applicable to other distributions.
What follows are the minimum and recommended hardware requirements for installing TLP on a single machine.
Minimum hardware requirements
-
Intel Core i5 processor.
-
4 GB RAM.
-
500 GB of free hard disk space.
-
Linux-based operating system.
Recommended hardware requirements (TLP + in-house grid engine)
-
High-end Intel Core i7 processor.
-
2 x high-end GPUs with Nvidia CUDA support.
-
128 GB RAM.
-
2 TB of free hard disk space.
-
Ubuntu Server 14.04 LTS operating system.
First, we recommend the creation of a TLP user on your machine for running all tasks and processes related to TLP workflows. In our installation examples, we will assume that a tluser user and tlgroup group have been created, as well as a home directory for this user (/home/tluser) where all media, subtitles and uploaded files will be stored. In Debian/Ubuntu systems this can be done as follows:
sudo useradd -d /home/tluser -m -s /bin/bash tluser
sudo groupadd tlgroup
sudo usermod -a -G tlgroup tluser
Manual installation and configuration guides for each TLP component:
Database
Requirements
-
PostgreSQL server and client (version 9.1 or above).
Installation Steps
The following steps take you through the process of creating a transLectures Repository, including the creation of a system user, the Database and the required directory structure.
-
Install the PostgreSQL server and client packages. On Ubuntu 14.04 LTS, this is easy to do using the following command line:
sudo apt-get install postgresql
-
Create a new database user, the transLectures user (tluser), which will be used when connecting to the TLP Database. Remember to set a password for this user (e.g. tlpass). Note that, in order to execute the following commands, we must be operating as the default database superuser, postgres.
sudo -u postgres createuser -s tluser
sudo -u postgres psql -c "ALTER USER tluser WITH PASSWORD 'tlpass'"
-
Create a new database with the name tldb, and insert the Database schema and static data (located at db/sql/schema.sql and db/sql/static_data.sql, respectively). Note that, in order to execute the following commands, we must be operating as the transLectures user (tluser, see Manual Installation).
sudo -u tluser createdb tldb
sudo -u tluser psql -f "db/sql/schema.sql" tldb
sudo -u tluser psql -f "db/sql/static_data.sql" tldb
-
Create Media, Transcriptions and Uploads root directories, and set proper directory permissions. Create these directories in the system user’s home directory.
sudo mkdir -p /home/tluser/tlp-repo/media
sudo mkdir -p /home/tluser/tlp-repo/trans
sudo mkdir -p /home/tluser/tlp-repo/uploads
sudo chown tluser:tlgroup /home/tluser/tlp-repo/*
sudo chmod 775 /home/tluser/tlp-repo/*
sudo chmod g+s /home/tluser/tlp-repo/*
-
Insert a new row in the machines table and add a machine named localhost, with IP 127.0.0.1 and ID 0:
sudo -u tluser psql tldb -c "INSERT INTO machines (id, hostname, ip) VALUES (0, 'localhost', '127.0.0.1');"
-
Insert the mount point for each of the three directories mentioned above for the machine ID 0 created in step 5.
sudo -u tluser psql tldb -c "
INSERT INTO mount_points (name, machine_id, path) VALUES
('media', 0, '/home/tluser/tlp-repo/media'),
('transcriptions', 0, '/home/tluser/tlp-repo/trans'),
('uploads', 0, '/home/tluser/tlp-repo/uploads');"
-
(optional) Check the connection to the Database using the following command line:
sudo -u tluser psql tldb
Web Service
Requirements
-
Apache HTTP Server (version 2.0 or above).
-
mod_wsgi module for Apache (version 3.3 or above).
-
Psycopg Python library (version 2.4 or above).
-
Paste Python library (version 1.7 or above).
Installation Steps
-
Install the required software dependencies. On Ubuntu 14.04 LTS, this is easy to do using the following command line:
sudo apt-get install apache2 apache2-utils libapache2-mod-wsgi python-psycopg2 python-paste
-
Copy the web-service and lib directories into the desired installation directory. In our example we use /home/tluser/tlp.
sudo mkdir -p /home/tluser/tlp
sudo cp -r web-service lib /home/tluser/tlp/
-
Configure your Apache sites-enabled file so that a relative address of your HTTP Server points to the Web Service's WSGI script (/home/tluser/tlp/web-service/ws.py). Here we configure it so that the Web Service is accessible from the relative path /api (e.g. http://myserver.com/api). In Ubuntu 14.04 LTS, the Apache configuration file is located at /etc/apache2/sites-enabled/000-default.conf. You have to add the following lines to your <VirtualHost> directive(s):
<VirtualHost *:80>
    ...
    WSGIScriptAlias /api /home/tluser/tlp/web-service/ws.py
    <Directory /home/tluser/tlp/web-service>
        Require all granted
    </Directory>
    ...
</VirtualHost>
-
Add the www-data user to the tlgroup group.
sudo usermod -a -G tlgroup www-data
-
Restart Apache server. On Ubuntu 14.04 LTS:
sudo service apache2 restart
Configuration
The Web Service comes with a configuration file that indicates its root directory and database connection parameters, among other information. This configuration file must be given the name config.ini and be located in the Web Service installation directory (i.e. /home/tluser/tlp/web-service/config.ini).
The specification of the configuration file is as follows:
- key_generator = <string>
-
Web Service’s Secret Key generator.
- db_name = <string>
-
Database name.
- db_user = <string>
-
Database user name.
- db_host = <string>
-
Database hostname or IP address.
- db_passwd = <string>
-
Database user password. You can leave this field empty if Database SSL Auth is enabled.
- use_urls = <boolean>
-
Return URLs instead of absolute file paths when returning URIs of media files.
- base_url = <string>
-
URL prefix used to create full media URLs.
- enabled = <boolean>
-
Send e-mail alerts whenever the Web Service fails for whatever reason.
- smtp_server = <string>
-
SMTP server to send e-mails.
- from_address = <string>
-
From address.
- to_address = <string>
-
Comma-separated recipient e-mails.
- html_msg = <string>
-
HTML code to be returned by the Web Service when accessing nonexistent API endpoints.
Below is a real example of a Web Service configuration file.
[authentication]
key_generator = 12345

[storage]
db_name = tldb
db_user = tluser
db_host = my-tlp-server.com
# You may leave this empty if SSL auth is enabled:
db_passwd =

[media_urls]
use_urls = yes
base_url = http://my-tlp-server.com/data

[mailing]
enabled = yes
smtp_server = smtp.my-tlp.server.com
from_address = noreply@my-tlp-server.com
to_address = admin@my-tlp-server.com

[misc]
html_msg = <html><head><meta charset="UTF-8"></head><body><p>Hi there!</p></body></html>
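Before restarting Apache it can be useful to check that a configuration file of this shape parses as expected. The sketch below uses Python's standard configparser module on a shortened copy of the example; it is not the loader used by the Web Service itself, and the load_ws_config helper and the keys of the dictionary it returns are hypothetical.

```python
import configparser

# Shortened copy of the example configuration shown in this section.
SAMPLE = """\
[authentication]
key_generator = 12345

[storage]
db_name = tldb
db_user = tluser
db_host = my-tlp-server.com
db_passwd =

[media_urls]
use_urls = yes
base_url = http://my-tlp-server.com/data

[mailing]
enabled = yes
smtp_server = smtp.my-tlp.server.com
from_address = noreply@my-tlp-server.com
to_address = admin@my-tlp-server.com
"""


def load_ws_config(text: str) -> dict:
    """Parse the INI text and return a few typed settings."""
    # Interpolation is disabled so that '%' characters are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    recipients = [a.strip()
                  for a in parser.get("mailing", "to_address").split(",")
                  if a.strip()]
    return {
        "db_name": parser.get("storage", "db_name"),
        # An empty password is mapped to None (e.g. when SSL auth is enabled).
        "db_passwd": parser.get("storage", "db_passwd") or None,
        "use_urls": parser.getboolean("media_urls", "use_urls"),
        "base_url": parser.get("media_urls", "base_url"),
        "recipients": recipients,
    }


if __name__ == "__main__":
    print(load_ws_config(SAMPLE))
```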
Player
Requirements
-
HTTP Server (e.g. Apache HTTP Server)
-
PHP5 with cURL support
Installation steps
-
Install the required external software dependencies. In our installation example, we use the open source Apache HTTP Server. On Ubuntu 14.04 LTS, this is easy to do using the following command line:
sudo apt-get install apache2 php5 libapache2-mod-php5 php5-curl
-
Copy all of the files inside the player folder of the TLP package into any folder of your HTTP Server. In our example we will use the directory /home/tluser/tlp/player:
sudo mkdir -p /home/tluser/tlp
sudo cp -r player /home/tluser/tlp/
You will need to add the following lines to the <VirtualHost> directive(s) of the Apache sites-enabled file (located at /etc/apache2/sites-enabled/000-default).
<VirtualHost>
    ...
    Alias /player /home/tluser/tlp/player
    <Directory /home/tluser/tlp/player>
        Options Indexes FollowSymLinks MultiViews
        Require all granted
    </Directory>
    ...
</VirtualHost>
-
Set ownership of the player/translectures/config.json file to the HTTP Server user (in Apache it is www-data), and disable its read and write permissions for groups and others:
sudo chown www-data /home/tluser/tlp/player/translectures/config.json
sudo chmod 600 /home/tluser/tlp/player/translectures/config.json
-
(Optional) Create a symbolic link inside the player directory pointing to the media repository directory (/home/tluser/tlp-repo/media, see the section on installation of the Database), for instance player/data:
sudo ln -s /home/tluser/tlp-repo/media /home/tluser/tlp/player/data
Note that this symbolic link, when in the form of a URL (i.e. http://localhost/player/data), will become the tlbaseurl parameter when calling the Player (please see this Appendix).
-
Restart your HTTP Server where necessary (in our example, it is):
sudo service apache2 restart
-
(Optional) Check whether the Player can be accessed (note that it will show you an error message - this is normal):
curl http://localhost/player/
Configuration
The configuration file player/translectures/config.json must be edited in order to grant the Player access to the Web Service.
{
"ws_url" : <str> ,
"data_url" : <str>
}
Configuration parameters:
-
ws_url: Web Service URL.
-
data_url: Media storage data URL (to serve non-URL videos).
{
"ws_url" : "http://my.server.com/tl/",
"data_url" : "http://my.server.com/player/data/"
}
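A quick sanity check of this file can catch a missing key or a malformed URL before the Player is opened in a browser. The following is a minimal sketch, assuming only the two parameters documented in this section; the load_player_config helper and its validation rules are hypothetical and are not part of the Player.

```python
import json

REQUIRED_KEYS = ("ws_url", "data_url")


def load_player_config(text: str) -> dict:
    """Parse the Player configuration and check both documented parameters."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("missing parameters: " + ", ".join(missing))
    for key in REQUIRED_KEYS:
        value = config[key]
        # Assumption of this sketch: both values are absolute HTTP(S) URLs.
        if not isinstance(value, str) or not value.startswith(("http://", "https://")):
            raise ValueError(f"{key} must be an http(s) URL")
    return config


if __name__ == "__main__":
    sample = '{"ws_url": "http://my.server.com/tl/", "data_url": "http://my.server.com/player/data/"}'
    print(load_player_config(sample))
```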
Ingest Service
Requirements
-
Psycopg Python library (version 2.4 or above).
-
tldextract Python library (version 1.5 or above).
-
FFmpeg with H.264 support.
-
Zip command-line utility.
-
(optional) Job scheduling/queue management system.
Installation Steps
-
Install the required software dependencies. On Ubuntu 14.04 LTS, this is easy to do using the following command line:
sudo apt-get install python-psycopg2 zip
The tldextract Python library is not available in the official Ubuntu repositories. However, it can be easily installed via pip:
sudo apt-get install python-pip
sudo pip install tldextract
The FFmpeg package stored in the Ubuntu repositories does not include H.264 codec support, so you will have to download all sources and compile them by yourself. You will find a useful guide for said compilation in Debian/Ubuntu distributions here.
You can also set up a job scheduling/queue management system, required in order to execute and manage transcription, translation and media conversion processes. In our case we have tested TLP with the open source version of the Sun Grid Engine.
-
Copy the ingest-service directory into the desired installation directory. In our example, we use /home/tluser/tlp.
sudo mkdir -p /home/tluser/tlp
sudo cp -r ingest-service /home/tluser/tlp/
-
Specify the root directory where ASR, MT and TTS modules will be placed in the systems mount point of the Database. Assuming that we will put all these modules under /home/tluser/tlp/ingest-service/modules/tl:
sudo -u tluser psql tldb -c "
INSERT INTO mount_points (name, machine_id, path) VALUES
('systems', 0, '/home/tluser/tlp/ingest-service/modules/tl');"
Configuration
The TLP Ingest Service comes with a configuration file in which several parameters and options are defined. The Ingest Service's Core will attempt to load a file named config.ini located in the same directory (i.e. /home/tluser/tlp/ingest-service/config.ini). If the configuration file does not exist or cannot be parsed, the execution will fail. You can manually specify another path to the configuration file using the option --config-file (see Execution).
The specification of the configuration file is as follows:
[general]
In this section, general settings can be configured.
- hostname = <string>
-
Host name of the machine that will run the Ingest Service.
- tl_user = <string>
-
System user that will run the Ingest Service, in order to set ownership of all stored files.
- tl_group = <string>
-
System group to set ownership of all stored files.
- rm_finished_up_days = <int>
-
Automatically delete temporary data from finished uploads after n days.
- rm_error_up_days = <int>
-
Automatically delete temporary data from failed uploads after n days.
- local_repository = <boolean>
-
Store all media files uploaded to TLP instead of accessing them via URL (when provided).
[storage]
In this section, TLP Database connection settings can be customised.
- db_name = <string>
-
Database name to connect with.
- db_user = <string>
-
Database user name.
- db_passwd = <string>
-
Database user password. You can leave this field empty if database SSL auth is enabled.
- db_host = <string>
-
Database hostname or IP address.
[scheduler]
In this section, some settings relating to the job management system are defined.
- localhost = <string>
-
Local machine hostname for the job management system. It is used to launch media conversion processes on the local machine, as these tasks are very network-intensive.
- status_cmd = <string>
-
System call to get the status of all processes previously submitted to the job management system.
- submit_scr = <string>
-
Path to the script or binary program to submit tasks to the job management system.
- submit_opts = <string>
-
Options of the submit script (which will be appended to all submit calls).
- job_name_prefix = <string>
-
Job name prefix for all tasks submitted to the job management system.
[sessions]
In this section, the edit sessions created by the Ingest Service on update operations are configured.
- enabled = <boolean>
-
Make the Ingest Service create edit sessions during update operations on the media ID that is being processed. As a result, users will not be able to edit subtitles with the TLP Player until the update operation finishes.
- author_id = <string>
-
Author ID of the Ingest Service.
- author_name = <string>
-
Author Name of the Ingest Service.
- author_conf = <int>
-
Confidence level of the Ingest Service, from 0 to 100 (typically 100).
[mailing]
In this section, you can customize several parameters of the Mailing module.
- enabled = <boolean>
-
Enables or disables Mailing module.
- smtp_server = <string>
-
SMTP Server hostname or IP address used to send e-mail notifications.
- from_address = <string>
-
E-mail address that will be used as "From" address.
- send_client_started_mail = <boolean>
-
Enables or disables e-mail notifications to the client to inform that an upload has started to be processed.
- send_client_error_mail = <boolean>
-
Enables or disables e-mail notifications to the client to inform that an upload has failed.
- send_client_finished_mail = <boolean>
-
Enables or disables e-mail notifications to the client to inform that an upload has successfully finished.
- send_admin_started_mail = <boolean>
-
Enables or disables e-mail notifications to the system administrator to inform that an upload has started to be processed.
- admin_address_started = <string>
-
E-mail address of the system administrator that will receive notifications about uploads that have started to be processed.
- send_admin_error_mail = <boolean>
-
Enables or disables e-mail notifications to the system administrator to inform that an upload has failed.
- admin_address_error = <string>
-
E-mail address of the system administrator that will receive notifications about uploads that have failed.
- send_admin_finished_mail = <boolean>
-
Enables or disables e-mail notifications to the system administrator to inform that an upload has finished.
- admin_address_finished = <string>
-
E-mail address of the system administrator that will receive notifications about uploads that have finished.
[file_formats]
In this section, the required/allowed file formats for every type of file that can be uploaded to the Ingest Service are defined. Please note that only file formats in the file_formats table of the TLP Database can be used.
- max_audio_track_length = <int>
-
Defines the maximum length allowed, in seconds, of the audio track of the main media file.
- generate_pcm_stream = <boolean>
-
Generate a PCM stream file to be retrieved by the TLP Player under the Advanced Mode.
- required_video_formats = <string>
-
Defines which video formats are needed to maximise compatibility of the TLP Player with all browsers. If the uploaded media files are not in some of the required formats, then the Ingest Service will do the appropriate conversion.
- allowed_video_formats = <string>
-
Comma-separated list of all video formats that will be allowed to be uploaded as main media.
- allowed_audio_formats = <string>
-
Comma-separated list of all audio formats that will be allowed to be uploaded as main media.
- allowed_slides_text_formats = <string>
-
Comma-separated list of all slides text formats that will be allowed to be uploaded.
- allowed_slides_video_formats = <string>
-
Comma-separated list of all slides video formats (video-recorded slides) that will be allowed to be uploaded.
- allowed_docs_formats = <string>
-
Comma-separated list of all document formats that will be allowed to be uploaded.
- allowed_caption_formats = <string>
-
Comma-separated list of all subtitle formats that will be allowed to be uploaded.
- allowed_thumbnail_formats = <string>
-
Comma-separated list of all thumbnail formats that will be allowed to be uploaded.
- allowed_packages = <string>
-
Comma-separated list of all package formats that will be allowed to be uploaded.
[text_retrieval_module]
In this section, the location of the text retrieval module is defined.
- module_path = <string>
-
Path to the text retrieval module. If it is not available, leave the value empty.
[data]
In this section, paths to external data files are defined.
- audio_background_img = <path>
-
Background image that will be used to encode videos for the TLP Player using an uploaded audio file as audio stream. If it is not provided, then the generated videos will show a black background.
- test_thumbnail_img = <path>
-
Background image that will be used to encode a short video for the TLP Player when using the test mode of the Ingest Service.
[url_decoders]
In this section, URL decoders for non-public URLs can be registered to be used by the URL Downloader Module.
[general]
hostname = my-tlp-server.com
tl_user = tluser
tl_group = users
rm_finished_up_days = 1
rm_error_up_days = 5
local_repository = no

[storage]
db_name = tldb
db_user = tluser
db_passwd =
db_host = my-tlp-server.com

[scheduler]
localhost = my-tlp-server
status_cmd = qstat -u tluser
submit_scr = qsubmit
submit_opts = ""
job_name_prefix = TLP-job

[sessions]
enabled = no
author_id = ingest-service
author_name = Ingest Service
author_conf = 100

[mailing]
enabled = yes
smtp_server = smtp.my-tlp.server.com
from_address = no-reply@my-tlp.server.com
send_client_started_mail = yes
send_client_error_mail = yes
send_client_finished_mail = yes
send_admin_started_mail = yes
admin_address_started = admin@my-tlp.server.com
send_admin_error_mail = yes
admin_address_error = admin@my-tlp.server.com
send_admin_finished_mail = yes
admin_address_finished = admin@my-tlp.server.com

[file_formats]
max_audio_track_length = 10800
generate_pcm_stream = yes
required_video_formats = mp4
allowed_video_formats = mp4, m4v, ogv, wmv, avi, mpg, flv, mov, 3gp, webm, mkv
allowed_audio_formats = wav, mp2, mp3, oga, flac, aac, ape, wma, m4a
allowed_slides_text_formats = txt, ppt, pptx, doc, docx, pdf
allowed_slides_video_formats = mp4, m4v, ogv, wmv, avi, mpg, flv
allowed_docs_formats = pdf, doc, docx, ppt, pptx, txt, html, xls, xlsx
allowed_caption_formats = dfxp, trs, srt
allowed_thumbnail_formats = jpg
allowed_packages = zip

[text_retrieval_module]
module_path = /path/to/my/text_retrieval_module.py

[data]
audio_background_img = /path/to/audio_background.png
test_thumbnail_img = /path/to/test_thumbnail.png

[url_decoders]
youtube = /path/to/url-decoder.youtube.py
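Before running the Core, it can be handy to check that a configuration file parses and contains the keys you rely on. The following is a minimal sketch using Python 3's standard configparser module; it is not part of TLP, it works on a shortened copy of the sample configuration above, and the check_config helper and the REQUIRED mapping are hypothetical names chosen for illustration.

```python
import configparser

# Shortened copy of the sample configuration shown above.
SAMPLE = """
[general]
hostname = my-tlp-server.com
tl_user = tluser
rm_finished_up_days = 1
local_repository = no

[storage]
db_name = tldb
db_user = tluser
db_passwd =
db_host = my-tlp-server.com

[file_formats]
allowed_caption_formats = dfxp, trs, srt
"""

# Hypothetical selection of keys we want to be present, per section.
REQUIRED = {
    "general": ["hostname", "tl_user"],
    "storage": ["db_name", "db_user", "db_host"],
}

def check_config(text):
    """Parse INI text and return (parser, list of missing 'section.key' names)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            if not parser.has_option(section, key):
                missing.append(section + "." + key)
    return parser, missing

parser, missing = check_config(SAMPLE)
local_repo = parser.getboolean("general", "local_repository")  # yes/no -> bool
days = parser.getint("general", "rm_finished_up_days")
captions = [f.strip() for f in
            parser.get("file_formats", "allowed_caption_formats").split(",")]
```

Note that an empty value such as db_passwd is read as an empty string, which matches the convention of leaving the field blank.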
Execution
In order to run the Ingest Service, you have to execute the Ingest Service's Core (ingest-service/core.py). All options of the Core Python script are shown below.
Usage: core.py [options]
Options:
-h, --help show this help message and exit
-v, --verbose Verbose power on!
-d, --debug Debug mode
-C CONFIG_FILE, --config-file=CONFIG_FILE
Configuration file. Default: config.ini
-D DB_NAME, --database=DB_NAME
Database with which to work. Default: specified in
config file
To run the Ingest Service:
python /home/tluser/tlp/ingest-service/core.py -v
However, you will probably want to schedule its execution periodically. On UNIX systems you can use crontab. For instance, if you want to execute the Ingest Service every minute, logging all information into a log file, put this line in tluser's crontab file:
*/1 * * * * /bin/bash -l -c -x 'source /home/tluser/.bashrc; python /home/tluser/tlp/ingest-service/core.py -v >> /home/tluser/tlp/ingest-service/core.log 2>&1;'
To prevent multiple instances of the Ingest Service from running at the same time, you can simply use a custom lock file:
*/1 * * * * /bin/bash -l -c -x 'if [ ! -e /home/tluser/.cron-ingest.lock ]; then touch /home/tluser/.cron-ingest.lock; source /home/tluser/.bashrc; python /home/tluser/tlp/ingest-service/core.py -v >> /home/tluser/tlp/ingest-service/core.log 2>&1; rm /home/tluser/.cron-ingest.lock; fi'
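The lock file check in the crontab entry above is not atomic, and the lock is left behind if the Core is killed before the rm command runs. As an alternative sketch (not part of TLP), a small Python wrapper could create the lock atomically and release it in a finally block. The function names are hypothetical, and the dummy task stands in for a call to core.py.

```python
import os
import tempfile

def acquire_lock(path):
    """Create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
    os.close(fd)
    return True

def run_locked(path, task):
    """Run task() only if the lock can be taken; always release it afterwards."""
    if not acquire_lock(path):
        return None  # another run is in progress
    try:
        return task()
    finally:
        os.remove(path)

# Demonstration with a temporary lock path and a dummy task.
lock_path = os.path.join(tempfile.mkdtemp(), "cron-ingest.lock")
first = run_locked(lock_path, lambda: "done")
held = acquire_lock(lock_path)                  # simulate a running instance
second = run_locked(lock_path, lambda: "done")  # refused while the lock is held
os.remove(lock_path)
```

In a real wrapper, the task would typically invoke the Core through the subprocess module and append its output to the log file.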
Appendix B: Calling the TLP Player
The TLP Player must be called using different input parameters depending on which subtitles are being edited, the language of these subtitles and what kind of user is doing the editing. These parameters are sent as a Base64-encoded JSON string via HTTP GET or POST methods. A full request key is sent for the authentication of the API client on all Web Service calls.
API users should avoid sending their secret key in the Player input parameters, since these parameters are exposed to third parties (i.e. external Player users) as they travel inside the URL to the Player. Please see the annex Generating a Request Key to learn how to produce valid full request keys in order to protect your private secret key.
The transLectures-UPV Platform includes in its Client Tools Package the player-url-generator.py command-line script, which generates valid Player URLs for the given input parameters. The --debug option can be very useful for checking how these URLs are generated. Furthermore, you will find libraries for different platforms that include Player URL generation methods.
Input parameters
{
"id" : <str> ,
"lang" : <str> ,
"author_id" : <str> ,
"author_conf" : <int> ,
"author_name" : <str> ,
"expire" : <int> ,
"api_user" : <str> ,
"request_key" : <str>
}
id:<str>
|
Media ID. |
lang:<str>
|
Language code of the subtitles being edited (e.g. en, es, ca). Optional: if this parameter is not defined, the Player will load the source language transcriptions. |
author_id:<str>
|
ID of the user that will edit the subtitles. It is typically the internal user ID that the API client’s organisation assigns to the user. |
author_conf:<int>
|
Integer value (range 0-100) that indicates the confidence level that the API client’s organisation assigns to the user. |
author_name:<str>
|
Full name of the user that will edit the subtitles (optional). |
expire:<int>
|
Expiration date of the URL in UNIX timestamp format. |
api_user:<str>
|
TLP username / API Client username (Please see Web Service user authentication). |
request_key:<str>
|
Request key (see Generating a Request Key). |
{ "id" : "id-001", "lang" : "en", "author_id" : "bobama", "author_conf" : 100, "author_name" : "Barack Obama", "expire" : 1400173491, "api_user" : "tluser", "request_key" : "5251982f3d00544e6e9a91962a2eec2f0b3df38c" }
Parameters are sent as a Base64-encoded JSON string. The JSON string for the above example would be as follows:
{"id" : "id-001", "lang" : "en", "author_id" : "bobama", "author_conf" : 100, "author_name" : "Barack Obama", "expire" : 1400173491, "api_user" : "tluser", "request_key" : "5251982f3d00544e6e9a91962a2eec2f0b3df38c"}
Base64 encode of the above JSON string:
eyJpZCIgOiAiaWQtMDAxIiwgImxhbmciIDogImVuIiwgImF1dGhvcl9pZCIgOiAiYm9iYW1hIiwgImF1dGhvcl9jb25mIiA6IDEwMCwgImF1dGhvcl9uYW1lIiA6ICJCYXJhY2sgT2JhbWEiLCAiZXhwaXJlIiA6IDE0MDAxNzM0OTEsICJhcGlfdXNlciIgOiAidGx1c2VyIiwgInJlcXVlc3Rfa2V5IiA6ICI1MjUxOTgyZjNkMDA1NDRlNmU5YTkxOTYyYTJlZWMyZjBiM2RmMzhjIn0=
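The encoding step can be reproduced with Python's standard library. This is a minimal sketch, not the implementation used by player-url-generator.py; the helper names encode_request and decode_request are hypothetical. Because json.dumps uses slightly different whitespace from the hand-written example above, the Base64 text it produces differs character by character, but it decodes to the same parameters.

```python
import base64
import json

params = {
    "id": "id-001",
    "lang": "en",
    "author_id": "bobama",
    "author_conf": 100,
    "author_name": "Barack Obama",
    "expire": 1400173491,
    "api_user": "tluser",
    "request_key": "5251982f3d00544e6e9a91962a2eec2f0b3df38c",
}

def encode_request(d):
    """Serialise a dict to JSON and Base64-encode it as ASCII text."""
    return base64.b64encode(json.dumps(d).encode("utf-8")).decode("ascii")

def decode_request(s):
    """Inverse of encode_request: Base64 text back to a dict."""
    return json.loads(base64.b64decode(s).decode("utf-8"))

encoded = encode_request(params)
roundtrip = decode_request(encoded)
```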
HTTP call
The Base64-encoded JSON string is passed to the Player via HTTP GET or POST using the following parameters:
-
request → Base64 JSON string
-
t → Media start time in seconds (optional)
http://ttp.mllp.upv.es/player?request=eyJpZCIgOiAiaWQtMDAxIiwgImxhbmciIDogImVuIiwgImF1dGhvciIgOiAiYm9iYW1hIiwgImF1dGhvcl9uYW1lIiA6ICJCYXJhY2sgT2JhbWEiLA0KImF1dGhvcl9jb25mIiA6IDEwMCwgImludGVybmFsdXNlciIgOiAwLCAiZXhwaXJlIiA6IDE0MDAxNzM0OTEsICJhcGlfdXNlciIgOiAidGx1c2VyIiwgDQoicmVxdWVzdF9rZXkiIDogIjUyNTE5ODJmM2QwMDU0NGU2ZTlhOTE5NjJhMmVlYzJmMGIzZGYzOGMifQ==
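When the Player URL is built by hand, the Base64 string should be percent-encoded, since standard Base64 output may contain the characters +, / and =. The sketch below shows one way to do this with urllib.parse; the player_url function is a hypothetical helper and the short request value is a placeholder, not a valid set of Player parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def player_url(base_url, request_b64, start_time=None):
    """Append the 'request' (and optional 't') parameters to the Player URL."""
    query = {"request": request_b64}
    if start_time is not None:
        query["t"] = int(start_time)
    return base_url + "?" + urlencode(query)

# Placeholder Base64 value, used only to show the escaping.
url = player_url("http://ttp.mllp.upv.es/player",
                 "eyJpZCI6ICJpZC0wMDEifQ==", start_time=90)

# Parsing the URL again recovers the original values.
query = parse_qs(urlsplit(url).query)
```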
Appendix C: Web Service API Specification
In this section, a detailed description of the inputs and outputs of all Web Service interfaces is provided.
The transLectures-UPV Platform includes in its Client Tools Package several libraries for different platforms that implement all the interfaces described below. That package also contains the ws-client.py command-line tool, ready to be used for making all these API calls.
Preface
Please read carefully the following considerations before interacting with this API:
Allowed HTTP methods
The interfaces featured by the Web Service can be called using either GET or POST methods.
-
GET Method:
-
Using a single data GET parameter containing a Base64-encoded JSON dict. Example:
Parameters encoded into a JSON Dict:
{"parameter1":"value1", "parameter2": "value2"}
Base64-encoded JSON dict:
eyJwYXJhbWV0ZXIxIjoidmFsdWUxIiwgInBhcmFtZXRlcjIiOiAidmFsdWUyIn0
URL:
http://ttp.mllp.upv.es/api/action?data=eyJwYXJhbWV0ZXIxIjoidmFsdWUxIiwgInBhcmFtZXRlcjIiOiAidmFsdWUyIn0
-
Using multiple GET parameters. Example:
GET Parameters:
parameter1=value1 parameter2=value2
URL:
http://ttp.mllp.upv.es/api/action?parameter1=value1&parameter2=value2
-
-
POST Method:
-
All parameters must be sent as a Base64-encoded JSON dictionary stored in the body of the request. Example:
Parameters encoded into a JSON Dict:
{"parameter1":"value1", "parameter2": "value2"}
Base64-encoded JSON dict:
eyJwYXJhbWV0ZXIxIjoidmFsdWUxIiwgInBhcmFtZXRlcjIiOiAidmFsdWUyIn0
URL:
http://ttp.mllp.upv.es/api/action
(+ POST data)
-
The explanation above is applicable to all interfaces, except for the /ingest, which requires a combined POST+GET query, where:
-
Query parameters are sent via GET,
-
Media Package File (MPF) is sent via POST in the body of the request.
The Web Service will return an HTTP 400 Bad Request error code whenever required arguments are missing or are provided in an incorrect format.
The ws-client.py command-line tool implements (via the libtlp Python library) all the ways of sending query parameters to the API described above (see the options --use-get-query and --use-data-param). If you plan to implement your own API client, the --debug option of this script can be very useful for checking how parameters and HTTP calls must be generated.
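The three ways of passing parameters described above can be sketched with Python's urllib. This is an illustration only, not the libtlp implementation: the function names are hypothetical, the base URL is taken from the examples in this section, and the requests are only constructed, not sent (passing one to urllib.request.urlopen would send it).

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "http://ttp.mllp.upv.es/api"  # base URL taken from the examples above

def b64_json(params):
    """Base64-encode the JSON serialisation of a dict."""
    return base64.b64encode(json.dumps(params).encode("utf-8")).decode("ascii")

def get_with_data_param(action, params):
    """GET with a single Base64-encoded JSON dict in the 'data' parameter."""
    return Request(API_BASE + "/" + action + "?" + urlencode({"data": b64_json(params)}))

def get_with_query_params(action, params):
    """GET with one query parameter per key."""
    return Request(API_BASE + "/" + action + "?" + urlencode(params))

def post_with_body(action, params):
    """POST with the Base64-encoded JSON dict as the request body."""
    return Request(API_BASE + "/" + action,
                   data=b64_json(params).encode("ascii"), method="POST")

params = {"parameter1": "value1", "parameter2": "value2"}
r1 = get_with_data_param("action", params)
r2 = get_with_query_params("action", params)
r3 = post_with_body("action", params)
```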
API Client Authentication
-
API clients have to send the following parameters in every call to the Web Service:
-
user → API Client username / TLP username,
-
auth_token → The authentication token for user authentication.
-
And additionally, if auth_token is a request_key:
-
expire → Expiration date UNIX timestamp for the request (seconds since 01-01-1970 in UTC to the expiration date).
-
To learn how to generate a valid request key, please refer to the Generating a Request Key Annex.
-
The request key authentication method is valid for all interfaces, except /ingest, /uploadslist, /status, /systems and /revisions.
-
The Web Service will return the following HTTP error codes:
-
401 Unauthorized, if:
-
The API user does not exist,
-
the authentication token is invalid,
-
the API user does not have permissions to get information about the provided object ID.
-
-
419 Authentication Timeout, if:
-
the authentication token has expired (only for Request Key).
-
-
/ingest
This interface allows the client to upload media files to the platform so they can be automatically transcribed and translated into several languages by the Ingest Service. The uploaded data (media, slides, documents, etc.) are bundled into a non-compressed ZIP file called the Media Package File (MPF). The Web Service stores the Media Packages on the server and returns an Upload ID, which can be used afterwards to check the upload progress via the /status interface.
If you are developing your own API client, it is recommended to enable the Test Mode of the Ingest Service when performing test calls to this interface. Please refer to the Manifest JSON File Specification.
Input
Input data is divided into two parts: the query parameters, which are sent in the URL as a GET query, and the Media Package file, which is sent in the body of the request as a POST query. The Media Package must be in ZIP format, sent either with Content-Type application/zip, or as multipart/form-data with a field named file and Content-Type = application/zip.
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Object ID ( | |
opc | str | Yes | Operation code ( | |
email | str | No | Email address to send notifications about status updates. | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
http://ttp.mllp.upv.es/api/ingest?id=MEDIA-ID-1234&opc=0&email=jsnow21@got.com&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
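The combined POST+GET call can be sketched as follows, assuming Python 3. The package is written with zipfile.ZIP_STORED so that it is not compressed, the query parameters are placed in the URL, and the ZIP bytes go in the request body. The helper names are hypothetical, and the file names and contents inside the package are placeholders: a real package must follow the Manifest JSON File Specification. The request is only built here, not sent.

```python
import io
import zipfile
from urllib.parse import urlencode
from urllib.request import Request

def build_media_package(files):
    """Bundle {name: bytes} into a non-compressed (stored) ZIP held in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()

def build_ingest_request(base_url, query, package):
    """Query parameters go in the URL; the ZIP goes in the POST body."""
    return Request(
        base_url + "/ingest?" + urlencode(query),
        data=package,
        headers={"Content-Type": "application/zip"},
        method="POST",
    )

# Placeholder file names and contents, for illustration only.
package = build_media_package({"manifest.json": b"{}", "video.mp4": b"\x00"})
req = build_ingest_request(
    "http://ttp.mllp.upv.es/api",
    {"id": "MEDIA-ID-1234", "opc": 0, "user": "tluser",
     "auth_token": "edbab44c3a3f1ca8db4de8277a3b"},
    package,
)
```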
Output
{
"rcode" : <int> ,
"rcode_description" : <str> ,
"id" : <str> ,
"hash" : <str>
}
rcode:<int>
|
Return code.
|
rcode_description:<str>
|
Description of the return code (rcode). |
id:<str>
|
Upload ID, which can be used afterwards to check the progress of the upload via the /status interface. |
hash:<str>
|
Internal media hash ID. |
Output Examples
{ "rcode" : 0, "rcode_description" : "Ingestion complete", "id" : "UPLOAD-ID-1234", "hash" : "433e11c295c51b94a074a" }
/uploadslist
Returns a list of all the user’s uploads.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
object_id | str | No | Get list of uploads involving the provided object ID (could be an Upload ID or a Media ID). | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
http://ttp.mllp.upv.es/api/uploadslist?user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
[
{
"id": <str>,
"object_id": <str>,
"status_code": <int>,
"uploaded_on": <str>,
"last_update": <str>
},
...
]
id:<str>
|
Upload ID. |
object_id:<str>
|
Object ID involved (could be an Upload ID or a Media ID). |
status_code:<int>
|
Status code of the Upload.
|
uploaded_on:<str>
|
Upload timestamp. |
last_update:<str>
|
Last upload check timestamp. |
Output Examples
[ { "id": "up-ac83be70-a01c-4c18-8cc4-dc0b2676cbb0", "object_id": "MEDIA-ID-1234", "status_code": 2, "uploaded_on": "2015-06-10 17:19:37.239458", "last_update": "2015-06-10 17:20:02.557135" }, { "id": "up-60a70bbd-e111-4d0c-b41f-6e235c434330", "object_id": "MEDIA-ID-1234", "status_code": 101, "uploaded_on": "2015-06-09 11:21:07.735656", "last_update": "2015-06-09 11:22:02.549826" }, { "id": "up-776ae4ec-6904-4da1-afa8-1b68017a524a", "object_id": "MEDIA-ID-5678", "status_code": 6, "uploaded_on": "2015-06-10 17:25:04.673541", "last_update": "2015-06-10 17:26:02.542902" } ]
/status
Returns information about the progress of an uploaded media given an Upload ID. It enables the remote repository to keep track of the automatic uploads and to notice possible processing errors.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Upload ID, returned by the /ingest interface. | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
http://ttp.mllp.upv.es/api/status?id=UPLOAD-ID-1234&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"rcode" : <int> ,
"rcode_description" : <str> ,
"status_code" : <int> ,
"info" : <str> ,
"error_code" : <int> ,
"uploaded_on" : <str> ,
"last_update" : <str>
}
rcode:<int>
|
Return code.
|
rcode_description:<str>
|
Description of the return code (rcode). |
status_code:<int>
|
Status code of the upload.
|
info:<str>
|
Detailed information about the status code. |
error_code:<int>
|
Generic error code that identifies the operation that failed within the process, if any. Otherwise null. |
uploaded_on:<str>
|
Upload timestamp. |
last_update:<str>
|
Last status check timestamp. |
Output Examples
{ "rcode": 1, "rcode_description" : "Upload ID [ up-1234 ] does not exist." }
{ "rcode": 0, "rcode_description" : "Upload ID exists.", "status_code": 2, "info": "Transcription in progress. It may take several hours for it to finish.", "uploaded_on": "2014-03-26 19:02:16.174944", "last_update": "2014-03-26 19:03:05.298861" }
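A client polling this interface usually only needs to tell an error reply from a progress report. The sketch below assumes, based solely on the two examples above, that rcode 0 means the upload exists and that any other value is an error; the summarize_status function is a hypothetical helper and the sample bodies are shortened copies of those examples.

```python
import json

def summarize_status(body):
    """Return a short human-readable summary of a /status JSON response."""
    reply = json.loads(body)
    if reply.get("rcode") != 0:
        return "error: " + reply.get("rcode_description", "unknown")
    return "status {}: {}".format(reply.get("status_code"), reply.get("info", ""))

missing = '{ "rcode": 1, "rcode_description" : "Upload ID [ up-1234 ] does not exist." }'
running = ('{ "rcode": 0, "rcode_description" : "Upload ID exists.", "status_code": 2, '
           '"info": "Transcription in progress. It may take several hours for it to finish." }')

summary_missing = summarize_status(missing)
summary_running = summarize_status(running)
```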
/systems
Gets a list of all available ASR/MT/TTS systems that can be applied to transcribe/translate/synthesize an uploaded media file using the /ingest interface.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
http://ttp.mllp.upv.es/api/systems?user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"asr": [
{
"lang": <str>,
"id": <int>,
"name": <str>,
"description": <str>
},
...
],
"mt": [
{
"source_lang": <str>,
"target_lang": <str>,
"id": <int>,
"name": <str>,
"description": <str>
},
...
],
"tts": [
{
"lang": <str>,
"id": <int>,
"name": <str>,
"description": <str>,
"voice_gender": <str>
},
...
]
}
asr:<list:dict>
|
List of all available Automatic Speech Recognition Systems.
|
mt:<list:dict>
|
List of all available Machine Translation Systems.
|
tts:<list:dict>
|
List of all available Text-To-Speech Systems.
|
The system ID can be used to explicitly request that the Ingest Service apply a particular ASR/MT/TTS system. For further information, please see Requesting Subtitle Languages.
Output Examples
{ "asr": [ { "lang": "en", "id": 43, "name": "English ASR System", "description": "" }, { "lang": "es", "id": 22, "name": "Spanish ASR System", "description": "" }, { "lang": "ca", "id": 64, "name": "Catalan ASR System", "description": "" } ], "mt": [ { "source_lang": "ca", "target_lang": "es", "id": 14, "name": "Catalan-Spanish MT System", "description": "" }, { "source_lang": "es", "target_lang": "ca", "id": 11, "name": "Spanish-Catalan MT System", "description": "" }, { "source_lang": "en", "target_lang": "es", "id": 73, "name": "English-Spanish MT System", "description": "" }, { "source_lang": "es", "target_lang": "en", "id": 24, "name": "Spanish-English MT System", "description": "" } ], "tts": [ { "lang": "en", "id": 71, "name": "English TTS System (Female)", "description": "", "voice_gender": "f" } ] }
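To pick a system ID from this response, a client can filter each list by its language fields. The sketch below is an illustration, not part of the client libraries; find_systems is a hypothetical helper and the reply is a shortened copy of the example above.

```python
import json

def find_systems(reply, kind, **criteria):
    """Return the entries of reply[kind] whose fields match all criteria."""
    return [s for s in reply.get(kind, [])
            if all(s.get(k) == v for k, v in criteria.items())]

reply = json.loads("""
{ "asr": [ { "lang": "en", "id": 43, "name": "English ASR System", "description": "" },
           { "lang": "es", "id": 22, "name": "Spanish ASR System", "description": "" } ],
  "mt":  [ { "source_lang": "en", "target_lang": "es", "id": 73,
             "name": "English-Spanish MT System", "description": "" } ],
  "tts": [ { "lang": "en", "id": 71, "name": "English TTS System (Female)",
             "description": "", "voice_gender": "f" } ] }
""")

asr_ids = [s["id"] for s in find_systems(reply, "asr", lang="en")]
mt_ids = [s["id"] for s in find_systems(reply, "mt", source_lang="en", target_lang="es")]
tts_ids = [s["id"] for s in find_systems(reply, "tts", lang="en", voice_gender="f")]
```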
/metadata
Returns metadata and media file locations of a given media ID. For example, this operation is called by the TLP Player to get the main media file location.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/metadata?id=MEDIA-ID-1234&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"rcode": <int> ,
"rcode_description": <str> ,
"mediainfo": {
"language" : <str> ,
"title" : <str> ,
"category" : <str> ,
"duration" : <str> ,
"speakers" : [
{
"name" : <str>
} ,
...
]
} ,
"media": [
{
"is_url" : <bool> ,
"type_code" : <int> ,
"media_format" : <str> ,
"location" : <str>
} ,
...
],
"audiotracks": [
{
"lang": <str>,
"voice_type": <str>,
"id": <int>,
"location": <str>,
"media_format": <str>,
"audio_type": <str>,
"is_url": <bool>,
"sub_type": <int>,
"description": <str>
} ,
...
],
"attachments": [
{
"is_url" : <bool> ,
"type_code" : <int> ,
"media_format" : <str> ,
"location" : <str>
} ,
...
]
}
rcode:<int>
|
Return code of the WS call.
|
rcode_description:<str>
|
Description of the return code (rcode). |
mediainfo:<dict>
|
Media Metadata.
|
media:<list:dict>
|
List of media files available.
|
audiotracks:<list:dict>
|
List of available audiotracks.
|
attachments:<list:dict>
|
List of attachments available.
|
Output Examples
{ "rcode" : 1, "rcode_description" : "Media ID [ 1234-abcd ] does not exist or has no media" }
{ "rcode": 0, "rcode_description": "Media list and info available.", "attachments": [ { "is_url": true, "type_code": 1, "media_format": "txt", "location": "http://ttp.mllp.upv.es/data/9/9bc70b33e49c2/b3f0bee253651191cdd1f1ee6c865074.txt" }, { "is_url": true, "type_code": 2, "media_format": "txt", "location": "http://ttp.mllp.upv.es/data/9/9bc70b33e49c2/d41d8cd98f00b204e9800998ecf8427e.txt" } ], "media": [ { "is_url": true, "type_code": 0, "media_format": "mp4", "location": "http://ttp.mllp.upv.es/data/9/9bc70b33e49c2/1a4edf93069b3.mp4" }, { "is_url": true, "type_code": 3, "media_format": "jpg", "location": "http://ttp.mllp.upv.es/data/9/9bc70b33e49c2/d7bbbe4210bd3.jpg" }, { "is_url": true, "type_code": 6, "media_format": "pcm", "location": "http://ttp.mllp.upv.es/data/9/9bc70b33e49c2/75bae29f33670.pcm" } ], "lectureinfo": { "duration": 86, "speakers": [ { "name": "John Snow" } ], "language": "en", "title": "I do know nothing" }, "audiotracks": [ { "lang": "es", "voice_type": "tts", "id": 68, "location": "http://ttp.mllp.upv.es/data/9/9bc70b33e49c2/audiotrack.es.mp3", "media_format": "mp3", "audio_type": null, "is_url": true, "sub_type": 2, "description": null } ] }
/langs
Returns the list of subtitle languages available for a specific media ID.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/langs?id=MEDIA-ID-1234&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"rcode" : <int> ,
"rcode_description" : <str> ,
"media_lang" : <str> ,
"subs_locked" : <bool> ,
"langs" : [
{
"lang_code" : <str> ,
"lang_name" : <str> ,
"sup_status" : <str> ,
"audiotracks": [
{
"aid": <int>,
"voice_gender": <str>,
"voice_type": <str>
},
...
]
},
...
]
}
rcode:<int>
|
Return code of the WS call.
|
rcode_description:<str>
|
Description of the return code (rcode). |
media_lang:<str>
|
Language code (ISO-639-1) of the media’s original audio language. |
subs_locked:<bool>
|
Lock status of subtitles. If |
langs:<list:dict>
|
List of languages available.
|
Output Examples
{ "rcode" : 1 , "rcode_description" : "ID 1234-abcd does not exist or has no subtitles" }
{ "rcode": 0, "rcode_description": "Language list available.", "media_lang": "es", "subs_locked": false, "langs": [ { "lang_code": "es", "sup_status": 1, "lang_name": "Español", "audiotracks": [] }, { "lang_code": "ca", "sup_status": 0, "lang_name": "Català", "audiotracks": [] }, { "lang_code": "en", "sup_status": 2, "lang_name": "English", "audiotracks": [ { "aid": 12, "voice_gender": "f", "voice_type": "tts" } ] } ] }
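A client will often want to know which languages are translations and which of them have audiotracks. The following sketch processes a shortened copy of the successful example above; languages_with_audio is a hypothetical helper name.

```python
import json

def languages_with_audio(reply):
    """Map each language code to the audiotrack IDs ('aid') listed for it."""
    return {lang["lang_code"]: [a["aid"] for a in lang.get("audiotracks", [])]
            for lang in reply.get("langs", [])}

reply = json.loads("""{ "rcode": 0, "media_lang": "es", "subs_locked": false, "langs": [
  { "lang_code": "es", "sup_status": 1, "lang_name": "Español", "audiotracks": [] },
  { "lang_code": "en", "sup_status": 2, "lang_name": "English",
    "audiotracks": [ { "aid": 12, "voice_gender": "f", "voice_type": "tts" } ] } ] }""")

tracks = languages_with_audio(reply)
# Every language other than the media's original language is a translation.
translations = [code for code in tracks if code != reply["media_lang"]]
```

The aid values collected here are the ones expected by the aid parameter of the /audiotrack interface.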
/subs
Returns subtitles for a specific media ID and language. By default, subtitles are sent in DFXP format, although they can be retrieved in many other formats.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
lang | str | Yes | Language code (ISO 639-1). | |
format | int | No | Subtitles format. | |
session_id | int | No | Load subtitles modifications from the given session ID (if any). If format=0, modified segments will include the highlight attribute ( | |
seg_filt_policy | int | No | Segment text filtering policy. | |
sel_data_policy | int | No | Subtitle contents to be returned. | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/subs?id=MEDIA-ID-1234&lang=en&format=2&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
-
DFXP file: Content-Type = application/ttml+xml
-
SRT file: Content-Type = application/x-subrip
-
VTT file: Content-Type = text/vtt
-
TXT file: Content-Type = text/plain
-
Content-Type = application/json
{
"rcode" : <int> ,
"rcode_description" : <str>
}
rcode:<int>
|
Return code.
|
rcode_description:<str>
|
Description of the return code (rcode). |
Output Examples
{ "rcode" : 1 , "rcode_description" : "Media ID [MEDIA-ID-1234] does not exist" }
1
00:08:44,090 --> 00:08:48,680
Good morning, my name is Kit Harington, but people is used to call me John Snow.
2
00:08:48,680 --> 00:08:51,850
Winter is coming, isn't it?
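Since this interface answers either with a subtitle file or with a JSON document, a client can branch on the Content-Type header. The sketch below assumes, following the Output list above, that application/json carries the rcode reply and that the other listed types carry subtitles; the function name and the SUBTITLE_TYPES mapping are hypothetical.

```python
import json

# Content types listed in the Output section, mapped to short format names.
SUBTITLE_TYPES = {
    "application/ttml+xml": "dfxp",
    "application/x-subrip": "srt",
    "text/vtt": "vtt",
    "text/plain": "txt",
}

def handle_subs_response(content_type, body):
    """Return ('error', dict) for JSON replies, otherwise (format, text)."""
    mime = content_type.split(";")[0].strip().lower()  # drop "; charset=..."
    if mime == "application/json":
        return "error", json.loads(body)
    return SUBTITLE_TYPES.get(mime, "unknown"), body

kind, payload = handle_subs_response(
    "application/json",
    '{ "rcode" : 1 , "rcode_description" : "Media ID [MEDIA-ID-1234] does not exist" }')
fmt, text = handle_subs_response(
    "application/x-subrip; charset=utf-8",
    "1\n00:08:48,680 --> 00:08:51,850\nWinter is coming, isn't it?\n")
```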
/audiotrack
Sends an audiotrack file in binary format.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
lang | str | Yes | Language code (ISO 639-1). | |
aid | int | Yes | Audiotrack ID (from /metadata). | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/audiotrack?id=MEDIA-ID-1234&lang=en&aid=428&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
-
A Content-Type other than application/json, typically audio/mpeg (mp3) or audio/wav (wav)
-
Content-Type = application/json
{
"rcode" : <int> ,
"rcode_description" : <str>
}
rcode:<int>
|
Return code.
|
rcode_description:<str>
|
Description of the return code (rcode). |
Output Examples
{ "rcode" : 1 , "rcode_description" : "Media ID [MEDIA-ID-1234] does not exist" }
/start_session
Starts an edition session to send and commit modifications to a subtitles file. Edition sessions are a mechanism intended to avoid race conditions between different users editing the same subtitles file.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
author_id | str | Yes | Author ID, authorised by the API client, that starts the edition session. | |
author_name | str | No | Author Name | |
author_conf | int | Yes | Confidence level of the author, from 0 to 100. | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/start_session?id=MEDIA-ID-1234&author_id=jsnow21&author_conf=90&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"rcode": <int>,
"rcode_description": <str>,
"session_id": <int>,
"author_id": <str>,
"author_name": <str>,
"author_conf": <int>,
"author_type": <str>,
"started_at": <str>,
"last_update": <str>
}
rcode:<int> | Return code. |
rcode_description:<str> | Description of the return code (rcode). |
session_id:<int> | Session ID. |
author_id:<str> | Author ID that started the session and is currently editing the Media ID. |
author_conf:<int> | Confidence level of the author ID, from 0 to 100. |
author_name:<str> | Author Name. |
author_type:<str> | Author Type. |
started_at:<str> | Session start timestamp. |
last_update:<str> | Timestamp of the last update (/mod call) made by the user on the session. |
Output Examples
{ "rcode": 0, "rcode_description": "Session started.", "session_id": 8, "author_id": "jsnow21", "author_name": "John Snow", "author_conf": 90, "author_type": "human", "started_at": "2015-07-04 20:44:55.042786", "last_update": "2015-07-04 20:44:55.042786" }
{ "rcode": 4, "rcode_description": "Cannot start mod session: there exist an open mod session for the given media ID.", "session_id": 7, "author_id": "olly666", "author_name": "Olly", "author_conf": 100, "author_type": "human", "started_at": "2015-07-04 12:23:15.040186", "last_update": "2015-07-04 20:32:11.892843" }
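All the example calls in this section are plain GET requests with URL-encoded parameters. As an illustration, the sketch below assembles such a URL with the Python standard library. The helper build_request_url is hypothetical, and the base URL is simply the one used in the examples of this document; optional parameters set to None are omitted.

```python
from urllib.parse import urlencode

API_BASE = "http://ttp.mllp.upv.es/api"  # base URL taken from the examples in this document


def build_request_url(interface, params, base=API_BASE):
    """Return a GET URL for a Web Service interface with URL-encoded parameters."""
    # Skip optional parameters that were not set.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}/{interface}?{query}"


url = build_request_url("start_session", {
    "id": "MEDIA-ID-1234",
    "author_id": "jsnow21",
    "author_name": None,          # optional, left out of the URL
    "author_conf": 90,
    "user": "tluser",
    "auth_token": "edbab44c3a3f1ca8db4de8277a3b",
})
print(url)
```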
/session_status
Returns the current status of the given session ID. If it is alive, it updates the last alive timestamp (last_update output key). This interface is commonly used to avoid the automatic end of session due to user inactivity.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
session_id | int | Yes | Session ID | 8 |
author_id | str | Yes | Author ID, authorised by the API client, owner of the session_id. | |
author_conf | int | Yes | Confidence level of the author, from 0 to 100. | |
alive | int | No | Alive message type. | 1 |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/session_status?id=MEDIA-ID-1234&session_id=8&author_id=jsnow21&author_conf=90&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"rcode": <int>,
"rcode_description": <str>,
"started_at": <str>,
"last_update": <str>,
"last_alive": <str>,
"ended_at": <str>,
"ended_by_id": <str>,
"ended_by_name": <str>,
"ended_by_type": <str>
}
rcode:<int> | Return code. |
rcode_description:<str> | Description of the return code (rcode). |
started_at:<str> | Session start timestamp. |
last_update:<str> | Session last update timestamp. |
last_alive:<str> | Session last alive timestamp. |
ended_at:<str> | Session end timestamp. Will be |
ended_by_id:<str> | Author ID of the user that ended this session. Will be |
ended_by_name:<str> | Author name of the user that ended this session. Will be |
ended_by_type:<str> | Author type of the user that ended this session. Will be |
Output Examples
{ "rcode": 0, "rcode_description": "Session alive.", "started_at": "2015-07-04 20:44:55.042786", "last_update": "2015-07-04 21:03:23.017192", "last_alive": "2015-07-04 21:03:51.932841", "ended_at": null, "ended_by_id": null, "ended_by_name": null, "ended_by_type": null }
{ "rcode": 4, "rcode_description": "Session ID 8 is closed.", "started_at": "2015-07-04 20:44:55.042786", "last_update": "2015-07-04 20:44:55.042786", "last_alive": "2015-07-04 20:45:03.016531", "ended_at": "2015-07-04 22:28:51.075672", "ended_by_id": "lord_stark1", "ended_by_name": "Eddard Stark", "ended_by_type": "human" }
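A client that polls /session_status to keep a session alive needs to tell an open session from a closed one. The sketch below shows one way to read the response. It assumes, based only on the two examples above, that an open session is reported with rcode 0 and a null ended_at; the function name session_is_open is hypothetical.

```python
import json


def session_is_open(status):
    """Return True when a /session_status response describes a live session."""
    # Assumption: rcode 0 together with a null ended_at means the session is open.
    return status.get("rcode") == 0 and status.get("ended_at") is None


alive = json.loads(
    '{ "rcode": 0, "rcode_description": "Session alive.", "ended_at": null }'
)
closed = json.loads(
    '{ "rcode": 4, "rcode_description": "Session ID 8 is closed.", '
    '"ended_at": "2015-07-04 22:28:51.075672", "ended_by_id": "lord_stark1" }'
)
print(session_is_open(alive), session_is_open(closed))
```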
/mod
Sends and commits modifications of subtitles files made by a user under a session ID returned by the /start_session interface.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
session_id | int | Yes | Session ID | 8 |
author_id | str | Yes | Author ID, authorised by the API client, owner of the session_id. | |
author_conf | int | Yes | Confidence level of the author, from 0 to 100. | |
mods | json | Yes | JSON dictionary with one key-value pair per modified subtitle language. Each key is the ISO 639-1 code of an edited subtitle language, and each value is a dictionary containing the following keys: Note: When using the GET method with multiple GET parameters, this JSON object must be encoded in Base64. | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/mod?id=MEDIA-ID-1234&session_id=8&language=en&author_id=jsnow21&author_conf=90&mod=eyAiZW4iOiB7InR4dCI6WyB7InNJIjoxNiwgImIiOjg3Ljk2LCAiZSI6OTEuMzcsICJ0IjoiU2hlIHRvbGQgbWU6IFlvdSBrbm93IG5vdGhpbmcsIEpvaG4gU25vdyJ9IF0sICJkZWwiOlszLDddfSwgImVzIjogeyJkZWwiOls5XSwgInR4dCI6W119IH0=&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"rcode" : <int> ,
"rcode_description" : <str> ,
"details": [
{
"language": <str>,
"rcode_description": <str>,
"rcode": <int>
},
...
]
}
rcode:<int> | Return code. |
rcode_description:<str> | Description of the return code (rcode). |
details:<array:dict> | List of dicts for every subtitle language modifications if format errors were found (rcode |
Output Examples
{ "rcode" : 0 , "rcode_description" : "Changes successfully saved." }
{ "rcode" : 4 , "rcode_description" : "Session ID [ 8 ] is closed, changes have been backuped. Please contact system administrator." }
{ "rcode_description": "Modifications for some subtitle languages were successfully saved, but other failed. Please see 'details' key.", "rcode": 8, "details": [ { "rcode_description": "The given media ID does not have 'en' subtitles.", "rcode": 10, "language": "en" }, { "rcode_description": "Changes successfully saved.", "rcode": 0, "language": "es" } ] }
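The Base64 requirement for the modifications object can be illustrated as follows. This is a sketch only: the helper names encode_mods and decode_mods are hypothetical, and the key names inside the dictionary (txt, del, sI, b, e, t) are simply those obtained by decoding the Base64 string of the example URL above, not a complete specification. The standard Base64 alphabet is assumed; the resulting string still has to be URL-encoded when placed in a query string.

```python
import base64
import json


def encode_mods(mods):
    """Serialize a modifications dictionary to JSON and encode it in Base64."""
    raw = json.dumps(mods, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_mods(encoded):
    """Inverse of encode_mods, handy for debugging a request."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


# Same content as the payload carried by the example URL above.
mods = {
    "en": {
        "txt": [{"sI": 16, "b": 87.96, "e": 91.37,
                 "t": "She told me: You know nothing, John Snow"}],
        "del": [3, 7],
    },
    "es": {"del": [9], "txt": []},
}
encoded = encode_mods(mods)
print(encoded)
```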
/end_session
Ends an open edition session. Depending on the confidence level of the author, the edits are either stored directly in the corresponding DFXP files or left pending revision.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
session_id | int | Yes | Session ID | 8 |
author_id | str | Yes | Author ID, authorised by the API client, that closes the session. | |
author_name | str | No | Author Name | |
author_conf | int | Yes | Confidence level of the author, from 0 to 100. | |
force | int | No | Force end session when user and/or author_id are not the owners. | 1 |
regenerate | int | No | Request regeneration of subtitles and/or synthesized audiotracks immediately after closing the session. | 0 |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/end_session?id=MEDIA-ID-1234&session_id=8&author_id=jsnow21&author_conf=90&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"rcode" : <int> ,
"rcode_description" : <str>
}
rcode:<int> | Return code. |
rcode_description:<str> | Description of the return code (rcode). |
Output Examples
{ "rcode": 0, "rcode_description": "Session succesfully closed." }
/lock_subs
Allows or disallows regular users to send subtitle modifications for a specific Media ID.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
lock | int | Yes | Lock action: | |
author_id | str | Yes | Author ID, authorised by the API client, owner of the session_id. | |
author_conf | int | Yes | Confidence level of the author, from 0 to 100. Must be 100. | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/lock_subs?id=MEDIA-ID-1234&lock=1&author_id=lord_stark1&author_conf=100&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"rcode" : <int> ,
"rcode_description" : <str>
}
rcode:<int> | Return code. |
rcode_description:<str> | Description of the return code (rcode). |
Output Examples
{ "rcode" : 0 , "rcode_description" : "Subtitles successfully locked." }
/edit_history
Returns a list of all edit sessions carried out over a specific Media ID.
Session edits can be applied to a subtitles file by calling the /subs interface and passing the proper Session ID in the session_id parameter.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/edit_history?id=MEDIA-ID-1234&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"rcode_description": <str>,
"rcode": <int>,
"edit_history": [
{
"session_id": <int>,
"author_id": <str>,
"author_conf": <int>,
"author_name": <str>,
"timestamp": <str>,
"requires_revision": <bool>,
"revised": <bool>,
"revised_at": <str>,
"revised_by_id": <str>,
"revised_by_name": <str>,
"revised_via": <str>,
"revised_in_session_id": <int>,
"edit_stats": <dict>
},
...
]
}
rcode:<int> | Return code. |
edit_history:<array:dict> | List of dictionaries containing information about each edit session. |
Output Examples
{ "rcode_description": "Edit history available.", "rcode": 0, "edit_history": [ { "session_id": 9, "author_id": "lord_stark1", "author_conf": 100, "author_name": "Eddard Stark", "timestamp": "2015-07-05 12:14:15.824233", "requires_revision": false, "revised": null, "revised_at": null, "revised_by_id": null, "revised_by_name": null, "revised_via": null, "revised_in_session_id": null, "edit_stats": { "en": { "del_segs": 2, "edit_segs": 67, "edit_time": 631.97, "edit_time_percent": 100.00 }, "es": { "del_segs": 3, "edit_segs": 5, "edit_time": 32.52, "edit_time_percent": 5.12 } } }, { "session_id": 8, "author_id": "jsnow21", "author_conf": 90, "author_name": "John Snow", "timestamp": "2015-07-04 21:23:07.000031", "requires_revision": true, "revised": true, "revised_at": "2015-07-05 12:14:15.824233", "revised_by_id": "lord_stark1", "revised_by_name": "Eddard Stark", "revised_via": "mark_revised", "revised_in_session_id": 9, "edit_stats": { "en": { "del_segs": 0, "edit_segs": 21, "edit_time": 197.23, "edit_time_percent": 31.74 } } }, { "session_id": 7, "author_id": "olly666", "author_conf": 20, "author_name": "Olly", "timestamp": "2015-07-02 18:53:59.565720", "requires_revision": true, "revised": false, "revised_at": null, "revised_by_id": null, "revised_by_name": null, "revised_via": null, "revised_in_session_id": null, "edit_stats": { "en": { "del_segs": 4, "edit_segs": 3, "edit_time": 13.22, "edit_time_percent": 2.58 } } } ] }
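A typical use of this response is to find the sessions that still need attention from a reviewer. The sketch below filters an edit_history list for entries that require revision and have not been revised yet. The function name pending_revision is hypothetical, and the sample data is a trimmed copy of the example above.

```python
def pending_revision(edit_history):
    """Return the session IDs that require revision and have not been revised yet."""
    return [
        session["session_id"]
        for session in edit_history
        # "revised" may be null (None) when no revision was required.
        if session.get("requires_revision") and not session.get("revised")
    ]


history = [
    {"session_id": 9, "author_id": "lord_stark1", "requires_revision": False, "revised": None},
    {"session_id": 8, "author_id": "jsnow21", "requires_revision": True, "revised": True},
    {"session_id": 7, "author_id": "olly666", "requires_revision": True, "revised": False},
]
print(pending_revision(history))
```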
/revisions
Returns a list of all edit sessions pending revision across all of the API user’s media files.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
http://ttp.mllp.upv.es/api/revisions?user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
[
{
"media_id": <str>,
"session_id": <int>,
"author_id": <str>,
"author_conf": <int>,
"author_name": <str>,
"timestamp": <str>,
"edit_stats": <dict>
},
...
]
media_id:<str> | Media ID. |
session_id:<int> | Session ID. |
author_id:<str> | Author ID owner of the session. |
author_conf:<int> | Confidence level of the author, from 0 to 100. |
author_name:<str> | Author name. Can be |
timestamp:<str> | Session end timestamp. |
edit_stats:<dict> | Dictionary containing statistics of the session’s edits. Each dictionary key is a language code (ISO 639-1) whose value is another dictionary that contains edit statistics of the corresponding subtitles language. These dictionaries feature the following keys: |
Output Examples
[ { "media_id": "MEDIA-ID-1234", "session_id": 8, "author_id": "jsnow21", "author_conf": 90, "author_name": "John Snow", "timestamp": "2015-07-04 21:23:07.000031", "edit_stats": { "en": { "del_segs": 2, "edit_segs": 1, "edit_time": 3.41, "edit_time_percent": 1.97 } } } ]
/mark_revised
Mark/unmark an edition session as revised.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Media ID. | |
session_id | int | Yes | Session ID | 8 |
author_id | str | Yes | Author ID, authorised by the API client, who revised the changes made in the Session ID session_id. | |
author_name | str | No | Author Name | |
author_conf | int | Yes | Confidence level of the author. Must be | |
revision_session_id | int | No | Session ID under which the given Author ID revised the changes made in the Session ID session_id. | 12 |
unmark | int | No | Do the inverse process: delete revised mark. | 1 |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/mark_revised?id=MEDIA-ID-1234&session_id=8&author_id=lord_stark1&author_conf=100&revision_session_id=12&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
{
"rcode" : <int> ,
"rcode_description" : <str>
}
rcode:<int> | Return code. |
rcode_description:<str> | Description of the return code (rcode). |
Output Examples
{ "rcode": 0, "rcode_description": "Session ID marked as revised." }
/accept
Accepts the modifications of one or more pending edit sessions without having to revise them. Modifications are committed into the corresponding subtitles files.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Comma-separated list of Session IDs whose edits are meant to be accepted by the given Author ID. | |
author_id | str | Yes | Author ID, authorised by the API client, who revised the changes made in the Session ID session_id. | |
author_name | str | No | Author Name | |
author_conf | int | Yes | Confidence level of the author. Must be | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/accept?id=27,43,82,121&author_id=lord_stark1&author_conf=100&revision_session_id=12&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
Returns a list of dictionaries, one for each Session ID.
[
{
"session_id": <int>,
"rcode" : <int> ,
"rcode_description" : <str>
},
...
]
session_id:<int> | Session ID. |
rcode:<int> | Return code. |
rcode_description:<str> | Description of the return code (rcode). |
Output Examples
[ { "rcode_description": "Session ID does not exist.", "rcode": 1, "session_id": 27 }, { "rcode_description": "Session ID did not require revision.", "rcode": 4, "session_id": 43 }, { "rcode_description": "Session ID already revised.", "rcode": 5, "session_id": 82 }, { "rcode_description": "Changes from Session ID successfully accepted.", "rcode": 0, "session_id": 121 } ]
/reject
Reject modifications of one or more pending edit sessions without having to revise them.
Input
Parameter name | Type | Required | Description | Example |
---|---|---|---|---|
id | str | Yes | Comma-separated list of Session IDs whose edits are meant to be rejected by the given Author ID. | |
author_id | str | Yes | Author ID, authorised by the API client, who revised the changes made in the Session ID session_id. | |
author_name | str | No | Author Name | |
author_conf | int | Yes | Confidence level of the author. Must be | |
user | str | Yes | TLP Username / API Username. | |
auth_token | str | Yes | Authentication token for the provided | |
expire | int | No | Expiration date UNIX timestamp of the request (seconds since 01-01-1970 in UTC to the expiration date). Required only if | |
http://ttp.mllp.upv.es/api/reject?id=27,43,82,121&author_id=lord_stark1&author_conf=100&revision_session_id=12&user=tluser&auth_token=edbab44c3a3f1ca8db4de8277a3b
Output
Returns a list of dictionaries, one for each Session ID.
[
{
"session_id": <int>,
"rcode" : <int> ,
"rcode_description" : <str>
},
...
]
session_id:<int> | Session ID. |
rcode:<int> | Return code. |
rcode_description:<str> | Description of the return code (rcode). |
Output Examples
[ { "rcode_description": "Session ID does not exist.", "rcode": 1, "session_id": 27 }, { "rcode_description": "Session ID did not require revision.", "rcode": 4, "session_id": 43 }, { "rcode_description": "Session ID already revised.", "rcode": 5, "session_id": 82 }, { "rcode_description": "Changes from Session ID successfully rejected.", "rcode": 0, "session_id": 121 } ]
Appendix D: Generating a Request Key
The request key is an alternative authentication method for the TLP Web Service that avoids revealing the API secret key of the client in the requests to the Web Service. This method can be used only with a reduced subset of Web Service interfaces (see Web Service API Specification Preface).
A request key depends on the values of some call parameters, and therefore it has to be explicitly generated for each API call.
The request key token is divided into two parts. The first part, the basic request key, is mandatory for all interfaces, while the second part is needed only when calling the /mod interface. Both parts are SHA-1 sums of a string composed of the concatenation of different call parameters plus the user’s API secret key. The concatenation of both parts is the full request key.
On the one hand, the first part (basic request key) consists of the SHA-1 sum (40 hexadecimal characters) of the concatenation, in this order, of:
- id API call parameter
- expire API call parameter
- user API call parameter
- User’s secret key.
SHA-1 (id + expire + user + secret_key)
On the other hand, the second part is the SHA-1 sum (40 hexadecimal characters) of the concatenation, in this order, of:
- author_id API call parameter
- author_conf API call parameter
- expire API call parameter
- user API call parameter
- User’s secret key.
SHA-1 (author_id + author_conf + expire + user + secret_key)
The full request key (80 hexadecimal characters) is the concatenation of both parts:
SHA-1 (id + expire + user + secret_key ) + SHA-1 (author_id + author_conf + expire + user + secret_key)
It is important to note that all Web Service interfaces read only the first part of the request key (the first 40 characters, ignoring the rest), except /mod, which requires the full request key.
Input parameters:
id = media-1234
expire = 1433875891
user = tluser
author_id = player_user_1234
author_conf = 100
secret_key = mhes28gfj7vfdg7ylnpapom26ksjfyvjmsoe
First part (Basic request key):
SHA-1 (id + expire + user + secret_key)
SHA-1 (media-12341433875891tlusermhes28gfj7vfdg7ylnpapom26ksjfyvjmsoe)
0cce9cd1c2a0f8d4cd1486bc317c24cb137fbf58
Second part:
SHA-1 (author_id + author_conf + expire + user + secret_key)
SHA-1 (player_user_12341001433875891tlusermhes28gfj7vfdg7ylnpapom26ksjfyvjmsoe)
5ff4dec6d50b3aac13f3d0a7de4e3bb111bc107d
Full request key:
0cce9cd1c2a0f8d4cd1486bc317c24cb137fbf585ff4dec6d50b3aac13f3d0a7de4e3bb111bc107d
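The procedure above can be sketched in a few lines of Python with the standard hashlib module. The function names are hypothetical; the sketch assumes that the parameters are concatenated as plain text, without separators, in the documented order, and that the digest is written in lowercase hexadecimal. Running it on the example inputs lets you compare its output with the digests listed above.

```python
import hashlib


def sha1_hex(*parts):
    """SHA-1 hex digest (40 characters) of the plain concatenation of the parts."""
    text = "".join(str(part) for part in parts)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def basic_request_key(media_id, expire, user, secret_key):
    """First part: SHA-1 (id + expire + user + secret_key)."""
    return sha1_hex(media_id, expire, user, secret_key)


def full_request_key(media_id, expire, user, secret_key, author_id, author_conf):
    """Full key for /mod: the basic key followed by the second part."""
    second = sha1_hex(author_id, author_conf, expire, user, secret_key)
    return basic_request_key(media_id, expire, user, secret_key) + second


key = full_request_key(
    media_id="media-1234",
    expire=1433875891,
    user="tluser",
    secret_key="mhes28gfj7vfdg7ylnpapom26ksjfyvjmsoe",
    author_id="player_user_1234",
    author_conf=100,
)
print(key[:40])  # basic request key
print(key)       # full request key
```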
Appendix E: Media Package Specification
A Media Package File is an uncompressed ZIP file that contains several media files and attachments plus a JSON file, named manifest.json, that declares in a JSON object all the uploaded media files and attachments included in the Media Package, in addition to other metadata. All files must be placed in the root of the ZIP package (not inside folders or sub-folders).
Media Package Files (MPF) are uploaded to TLP via the /ingest interface of the Web Service.
Manifest file JSON Specification
{
"operation_code" : <int>,
"media" : {
"url" : <str> ,
"filename" : <str> ,
"fileformat" : <str> ,
"md5" : <str>
} ,
"attachments" : [
{
"filename" : <str> ,
"fileformat" : <str> ,
"md5" : <str> ,
"type_code" : <int> ,
"language" : <str> ,
"human" : <bool>
},
...
] ,
"metadata" : {
"external_id" : <str> ,
"language" : <str> ,
"title" : <str> ,
"topic" : <str> ,
"keywords" : <str> ,
"date" : <str> ,
"speakers" : [
{
"speaker_id" : <int> ,
"speaker_name" : <str> ,
"speaker_gender" : <str> ,
"speaker_email" : <str>
} ,
...
]
},
"requested_langs": <dict> ,
"transLecture" : <int> ,
"tL-regenerate": [ <str>, ...] ,
"tL-force": <int> ,
"delete_mode" : <str>,
"test_mode" : <bool>
}
- operation_code: <int> → Operation type:
  - 0 → New media. A new media will be processed by the Ingest Service and inserted into the database.
  - 1 → Update media. An existing media will be updated after processing the input data by the Ingest Service.
  - 2 → Delete media. An existing media will be deleted from the database.
  - 3 → Cancel upload. An ongoing upload will be cancelled.
- media: <dict> → Main media file to be transcribed and/or translated. <dict> keys:
  - url: <str> → URL to the main media file. If a url field is given, the other fields are ignored.
  - filename: <str> → File name of the main media file.
  - fileformat: <str> → Format of the main media file (see "Allowed attachments" below).
  - md5: <str> → MD5 checksum of the main media file.
- metadata: <dict> → Media metadata. <dict> keys:
  - external_id: <str> → Media ID (typically an internal ID in the client’s media repository database) used to identify the video in further queries to the Web Service, or Upload ID in case of Cancel Upload operation.
  - title: <str> → Title of the media.
  - language: <str> → Media language code in ISO 639-1 format (e.g. "en", "es").
  - speakers: <list:dict> → Information about the speaker(s) of the media. <dict> keys:
    - speaker_id: <int> → Speaker ID (client).
    - speaker_name: <str> → Full name of the speaker.
    - speaker_email: <str> → E-mail of the speaker (optional).
    - speaker_gender: <str> → Gender of the speaker (optional).
      - M → Male.
      - F → Female.
  - topic: <str> → Topic of the media (optional).
  - keywords: <str> → Media keywords (optional).
  - date: <str> → Publication date of the media (optional).
- attachments: <list:dict> → Additional files that have been attached to the media package, such as slides, related documents or subtitles. <dict> keys:
  - filename: <str> → File name of the attachment.
  - fileformat: <str> → Format of the attachment (see "Allowed attachments" below).
  - md5: <str> → MD5 checksum of the attachment.
  - type_code: <int> → Attachment type code (see "Allowed attachments" below).
    - 0 → Media file.
    - 1 → Slides file.
    - 2 → Related text document file.
    - 3 → Video Snapshot/Thumbnail file.
    - 4 → Subtitles file.
    - 5 → Audiotrack file.
  - language: <str> → Language of the attachment, in case it is a subtitles file, in ISO 639-1 format (e.g. "en", "es") (optional, default null).
  - human: <bool> → If the attachment is a subtitles file, indicates whether the provided subtitles have been generated by humans (optional, default true).
- requested_langs: <dict> → Explicit request of subtitles and audiotrack languages, along with some advanced options. Please see Requesting Subtitle Languages section (optional).
- tL-regenerate: <list:str> → On update operation: request a regeneration of transcriptions, translations and/or synthesized audiotracks. Must be a list of keywords (optional). Allowed keywords:
  - tx → Request regeneration of the media transcription.
  - tl → Request regeneration of media translations.
  - tts → Request regeneration of synthesized audiotracks.
- tL-force: <int> → On update operation: Force regeneration of automatic subtitles even if there exist human-supervised subtitles (optional).
  - 0 → Do not force regeneration of subtitles (default).
  - 1 → Force regeneration of subtitles.
- transLecture: <int> → Enable/disable transcription and translation technologies (optional).
  - 0 → Disable automatic transcription and translation of the uploaded media. On new operations, an empty subtitles file for the spoken language is generated.
  - 1 → Enable automatic transcription and translation of the uploaded media (default).
- delete_mode: <str> → On delete operation: Specify delete mode (optional).
  - soft → The media will be marked as "deleted" in the database and its files will be backed up.
  - hard → All media and subtitle files stored will be deleted, as well as all related entries in the Database (default).
- test_mode: <bool> → Enable/Disable the Test Mode of the Ingest Service. When enabled, the uploaded media will be available immediately with an empty subtitles file. This feature is very useful when executing integration tests with the /ingest interface. By default it is disabled (optional).
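To make the specification more concrete, the following sketch builds a Media Package File for a New Media operation with the Python standard library. It is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the "mp4" value in the usage note is only a placeholder for one of the allowed formats, and the package is written with ZIP_STORED so that it stays uncompressed, with every file at the root of the archive.

```python
import hashlib
import json
import os
import zipfile


def md5_of(path):
    """MD5 checksum of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_media_package(zip_path, media_path, fileformat, metadata):
    """Write an uncompressed ZIP holding the media file and its manifest.json."""
    manifest = {
        "operation_code": 0,  # 0 = New media
        "media": {
            "filename": os.path.basename(media_path),
            "fileformat": fileformat,
            "md5": md5_of(media_path),
        },
        "metadata": metadata,
    }
    # ZIP_STORED keeps the archive uncompressed; arcname places files at the root.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.write(media_path, arcname=os.path.basename(media_path))
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

A call could look like build_media_package("package.zip", "lecture.mp4", "mp4", {"external_id": "MEDIA-ID-1234", "title": "Introduction to ASR", "language": "en", "speakers": [{"speaker_id": 1, "speaker_name": "John Snow"}]}), where all the values are made up for the example. The resulting file would then be uploaded through the /ingest interface.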
Requesting Subtitle Languages
The requested_langs option, as stated before, is used to request additional subtitle languages, specifying advanced transcription, translation or text-to-speech options. requested_langs is a JSON dictionary in which keys are ISO 639-1 language codes (e.g. "en", "es"), and values are dictionaries in which keys are the objects or outputs requested to be generated for that particular language, and values are dictionaries with advanced options. Those keys (objects) might be:
- sub: <dict> → Generate subtitles for that specific language.
- tts: <dict> → Generate text-to-speech audiotracks for that specific language.
sub and tts values can be empty dictionaries ({}) or null values, if no advanced options are specified.
"requested_langs": { "es": { "sub": {} }, "en": { "tts": {}, "sub": {} } }
The example above means: "Generate Spanish and English subtitles and an English TTS audiotrack, using default options in all cases".
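When the manifest is generated by a script, the requested_langs dictionary can be assembled programmatically and serialized with a JSON library, which also guarantees valid JSON literals (false, null) and separators. The sketch below is one possible way to do it; the helper sub_request is hypothetical, and the sid, lma and tlpath options it accepts are the advanced options described in the rest of this section.

```python
import json


def sub_request(sid=None, lma=None, tlpath=None):
    """Build the options dictionary for a "sub" request; {} means default options."""
    options = {}
    if sid is not None:
        options["sid"] = sid
    if lma is not None:
        options["lma"] = lma
    if tlpath is not None:
        # Each step is a tuple: (language,) or (language, system_id).
        options["tlpath"] = [
            {"l": step[0], **({"sid": step[1]} if len(step) > 1 else {})}
            for step in tlpath
        ]
    return options


requested_langs = {
    "es": {"sub": sub_request()},
    "en": {"tts": {}, "sub": sub_request()},
    "et": {"sub": sub_request(sid=22, lma=False)},
    "ca": {"sub": sub_request(tlpath=[("en",), ("es", 3), ("ca",)])},
}
fragment = json.dumps({"requested_langs": requested_langs})
print(fragment)
```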
Advanced options:
- sub options:
  - sid: <int> → Specify which System will be applied to generate the transcription, translation, or audiotrack file. If not specified, the default system is used.
  - lma: <bool> → Only for transcriptions (ASR): Enable or disable Language Model Adaptation. By default it is enabled.
  - tlpath: <list> → Only for translations (MT): Explicitly declare a Translation Path. This is useful to generate translations to a language which is not featured directly from the spoken language, using intermediate translation languages. It is declared as an ordered list of dictionaries, where each dictionary specifies the target language code l of the step, and optionally the System ID sid to apply.
    The example below shows how to request Catalan (ca) subtitles from the spoken language (XX) using English (en) and Spanish (es) as intermediate languages, thus defining the following translation path: XX->En->Es->Ca. The intermediate En->Es translation is generated using System ID 3.
    "requested_langs": { "ca": { "sub": { "tlpath": [ { "l":"en" }, { "l":"es", "sid":3}, { "l":"ca"} ] } } }
    If this option is not specified, the Ingest Service will assume that a direct translation from the spoken language is requested.
- tts options:
  - sid: <int> → Specify which System ID will be applied to generate the synthesized audiotrack. If not specified, the default system is used.
The following example of the requested_langs option requests Estonian (et) subtitles, disabling the Language Model Adaptation feature and making use of System ID 22, as well as English (en) subtitles with default options and a synthesized English audiotrack using System ID 54.
"requested_langs": { "et": { "sub": { "lma": false, "sid": 22 } }, "en": { "sub": {}, "tts": { "sid": 54 } } }
Ingest Service Behavior
This section explains how the behavior of the Ingest Service depends on both the data declared in the manifest file and the requested operation type (new, update, delete or cancel).
New Media operation
Required inputs:
-
A media file to be transcribed and/or translated in the media section.
-
Required metadata keys:
-
external_id → Media ID in the remote (client) repository.
-
title → Title of the Media. It must be as descriptive as possible, since it might be used to search and download from the Internet related documents in order to adapt the ASR system to the topic of the video and enhance the quality of the automatic subtitles.
-
language → Spoken language of the media file, in ISO 639-1 format (e.g. "en", "es").
-
speakers → Speaker(s) info.
-
Behavior changes with optional inputs
-
requested_langs option:
-
Not provided: By default the media will only be transcribed, if possible and if needed (depending on the provided attachments). No translation or TTS processes are launched.
-
Provided: Actions that can be inferred from this option will be executed, unless the expected outputs are already provided in the attachments section.
-
-
attachments section:
-
Language Model Adaptation on transcription depending on provided textual attachments: If Language Model Adaptation is enabled for the ASR System that generates the transcription file, this ASR system will be adapted to the topic of the media using different textual resources:
-
No text attachments provided: The adaptation will be carried out using external resources automatically downloaded from the Internet based on a web search using the title of the media.
-
A slides file: The adaptation will be carried out using the text extracted from the slides file.
-
Related documents: The adaptation will be carried out using the text extracted from the attached documents.
-
A slides file + Related documents: The adaptation will be carried out using both the text extracted from the slides file and the text extracted from the attached documents.
-
-
Providing expected outputs:
-
Subtitles file (spoken language): Media won’t be transcribed. Subtitles will be translated and afterwards synthesized if explicitly requested in the requested_langs option.
-
Subtitles files (other languages): Media will be firstly transcribed, and then translated into other destination languages, except for the provided subtitles language, if explicitly requested in the requested_langs option. Synthesized audiotracks will be also generated if explicitly requested.
-
Subtitles files (spoken language + other languages): The media will be translated into the remaining available destination languages, if any and if explicitly requested in the requested_langs option. Synthesized audiotracks will be generated if explicitly requested.
-
Audiotrack files: Synthesized audiotracks for the languages of the attached audiotracks won’t be generated even if they were explicitly requested.
-
-
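The New Media rules above can be condensed into a small decision function. This is a simplified reading of the text, not the actual Ingest Service logic: the function name plan_new_media and its arguments are hypothetical, and aspects such as system availability ("if possible") or Language Model Adaptation are deliberately ignored.

```python
def plan_new_media(spoken, requested_langs=None, sub_langs=(), audio_langs=()):
    """Sketch of which processes a New Media operation would launch.

    spoken          -- spoken language of the media (ISO 639-1)
    requested_langs -- the requested_langs option, or None if not provided
    sub_langs       -- languages of the subtitles files attached
    audio_langs     -- languages of the audiotrack files attached
    """
    requested_langs = requested_langs or {}
    wants_sub = {lang for lang, outs in requested_langs.items() if "sub" in outs}
    wants_tts = {lang for lang, outs in requested_langs.items() if "tts" in outs}
    return {
        # No transcription when subtitles in the spoken language are attached.
        "transcribe": spoken not in sub_langs,
        # Translate only into requested languages whose subtitles were not attached.
        "translate": sorted(wants_sub - set(sub_langs) - {spoken}),
        # Synthesize only requested languages without an attached audiotrack.
        "synthesize": sorted(wants_tts - set(audio_langs)),
    }


print(plan_new_media("en"))
print(plan_new_media("en", {"es": {"sub": {}}, "en": {"tts": {}, "sub": {}}}, sub_langs=["en"]))
```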
Update Media operation
Required inputs:
-
Required metadata keys:
-
external_id → Media ID in the remote (client) repository.
-
Behavior changes with optional inputs
-
media section:
-
Not Provided: the Ingest Service will re-generate transcriptions, translations and/or audiotracks depending on the provided attachments and other options (see below).
-
Provided: the uploaded media is assumed to be a re-recording of the existing media; therefore a new transcription file needs to be generated. The Ingest Service will behave as described in the New Media operation. The only difference between both cases is that the video ID is kept. Old media file and subtitles are backed up.
-
-
requested_langs option:
-
Not provided: By default the media will only be transcribed, if possible and if needed (depending on the provided attachments). No translation or TTS processes are launched.
-
Provided: Actions that can be inferred from this option will be executed, unless the expected outputs are already provided in the attachments section. Actions that involve the re-generation of subtitles that have already been edited or supervised by users won’t be executed, unless the tL-force option is provided and set to 1.
-
-
tL-regenerate option:
-
tx: The transcription file will be automatically regenerated, provided it has not been supervised by a user. A supervised transcription is regenerated only if the tL-force option is provided and set to 1.
-
tl: Translation files will be automatically regenerated, provided they have not been supervised by a user. Supervised translations are regenerated only if the tL-force option is provided and set to 1.
-
tts: Synthesized audiotracks will be automatically regenerated.
-
-
attachments section:
-
Language Model Adaptation on transcription regeneration depending on provided textual attachments: If Language Model Adaptation is enabled for the ASR System that generates the transcription file, this ASR system will be adapted to the topic of the media using different textual resources:
-
No text attachments provided: The adaptation will be carried out using external resources automatically downloaded from the Internet based on a web search using the title of the media.
-
A slides file: The adaptation will be carried out using the text extracted from the slides file.
-
Related documents: The adaptation will be carried out using the text extracted from the attached documents.
-
A slides file + Related documents: The adaptation will be carried out using both the text extracted from the slides file and the text extracted from the attached documents.
-
-
Providing outputs:
-
Subtitles file (spoken language): The current transcription file will be overwritten by the supplied one. Subtitles will be translated and afterwards synthesized if explicitly requested in the requested_langs option, or if the tL-regenerate option contains the token tl (for translation) and tts (for synthesis).
-
Subtitles files (other languages): Current translation files (if existing) will be overwritten by the supplied ones. Other subtitle languages will be generated if explicitly requested in the requested_langs option, or if the tL-regenerate option contains the token tl. Synthesized audiotracks will also be generated if explicitly requested, either with the requested_langs option or the tL-regenerate option.
-
Audiotrack files: Synthesized audiotracks for the languages of the attached audiotracks won’t be generated even if they were explicitly requested.
-
-
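The interplay between the tL-regenerate tokens and the tL-force option described above can be summarised in a few lines of Python. This is only a sketch of one reading of the rules, not code taken from the Ingest Service; the function name and its arguments are hypothetical.

```python
def should_regenerate(kind, supervised, force=0):
    """Return True if a file of the given kind would be regenerated.

    kind:       one of the tL-regenerate tokens "tx", "tl" or "tts".
    supervised: True if a user has already edited or supervised the file.
    force:      value of the tL-force option (0 or 1).
    """
    if kind not in ("tx", "tl", "tts"):
        raise ValueError("unknown tL-regenerate token: %r" % (kind,))
    if kind == "tts":
        # Synthesized audiotracks are regenerated whenever they are requested.
        return True
    # Supervised subtitles are protected unless tL-force is set to 1.
    return (not supervised) or force == 1


print(should_regenerate("tl", supervised=True))           # protected
print(should_regenerate("tl", supervised=True, force=1))  # forced
```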
Delete Media operation
Required inputs:
-
Required metadata keys:
-
external_id: <str> → Media ID in the remote (client) repository.
-
Behavior changes with optional inputs
-
delete_mode option:
-
soft: media will be marked as "deleted" in the database and its files will be backed up.
-
hard: all media and subtitle files stored will be deleted, as well as all related entries in the Database.
-
Cancel Upload operation
Required inputs:
-
Required metadata keys:
-
external_id: <str> → Upload ID to be canceled.
-
Allowed attachments
Type Code ID | Type Code Name | Allowed File Format List |
---|---|---|
0 | Media (video) | mp4, m4v, ogv, wmv, avi, mpg, flv. |
0 | Media (audio) | wav, mp3, oga, flac, aac. |
1 | Slides (text) | txt, ppt, pptx, doc, docx, pdf. |
1 | Slides (video) | mp4, m4v, ogv, wmv, avi, mpg, flv. |
2 | Documents | txt, doc, docx, ppt, pptx, pdf. |
3 | Video Thumbnail | jpg. |
4 | Subtitles | dfxp, srt, trs. |
5 | Audiotracks | wav, mp3, oga, flac, aac. |
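A client may want to check attachments against this table before building a Media Package. The following Python sketch encodes the table as a dictionary; the function name is hypothetical, and matching extensions case-insensitively is an assumption, since the table does not say how the Ingest Service treats upper-case extensions.

```python
import os

# Allowed file formats per attachment type code, copied from the table above.
ALLOWED_FORMATS = {
    0: {"mp4", "m4v", "ogv", "wmv", "avi", "mpg", "flv",   # media (video)
        "wav", "mp3", "oga", "flac", "aac"},               # media (audio)
    1: {"txt", "ppt", "pptx", "doc", "docx", "pdf",        # slides (text)
        "mp4", "m4v", "ogv", "wmv", "avi", "mpg", "flv"},  # slides (video)
    2: {"txt", "doc", "docx", "ppt", "pptx", "pdf"},       # documents
    3: {"jpg"},                                            # video thumbnail
    4: {"dfxp", "srt", "trs"},                             # subtitles
    5: {"wav", "mp3", "oga", "flac", "aac"},               # audiotracks
}


def is_allowed_attachment(filename, type_code):
    """Check a file extension against the allowed formats of a type code."""
    # Assumption: extensions are compared case-insensitively.
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return extension in ALLOWED_FORMATS.get(type_code, set())


print(is_allowed_attachment("slides.pptx", 1))
print(is_allowed_attachment("thumbnail.png", 3))
```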
Manifest JSON Examples
{
  "operation_code": 0,
  "media": {
    "fileformat": "mp4",
    "md5": "05b59346bc3fe5d3eac7a0dcd0022fb6",
    "filename": "main_media.mp4"
  },
  "attachments": [
    {
      "filename": "awesome_slides_in_video_format.wmv",
      "fileformat": "wmv",
      "type_code": 1,
      "md5": "c8722d0e8e27d4b5caaa7122a14676e3"
    }
  ],
  "requested_langs": {
    "es": { "sub": {}, "tts": {} },
    "en": { "sub": {} }
  },
  "metadata": {
    "external_id": "9abc7230fe36a18b885c",
    "language": "en",
    "title": "To know something or not to know nothing: A brief essay about knowledge.",
    "speakers": [
      {
        "speaker_email": "kit@got.com",
        "speaker_name": "Kit Harington",
        "speaker_gender": "M",
        "speaker_id": "kit1234"
      }
    ]
  }
}

{
  "operation_code": 1,
  "metadata": {
    "external_id": "9abc7230fe36a18b885c"
  },
  "tL-regenerate": [ "tl", "tts" ]
}

{
  "operation_code": 1,
  "media": {
    "filename": "main_media_NEW_VERSION.mp4",
    "fileformat": "mp4",
    "md5": "a86fae8e7af6fd6786efa876fa0e4212"
  },
  "metadata": {
    "external_id": "9abc7230fe36a18b885c",
    "language": "en",
    "title": "To know something or not to know nothing: A brief essay about knowledge. (UPDATED)",
    "speakers": [
      {
        "speaker_email": "kit@got.com",
        "speaker_name": "Kit Harington",
        "speaker_gender": "M",
        "speaker_id": "kit1234"
      }
    ]
  }
}

{
  "operation_code": 2,
  "metadata": {
    "external_id": "9abc7230fe36a18b885c"
  }
}

{
  "operation_code": 3,
  "metadata": {
    "external_id": "up-7249dec8-2b38-4413-b182-3481675c550c"
  }
}
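A manifest for the New Media operation can also be assembled programmatically. The Python sketch below mirrors the keys used in the examples above; the helper names are hypothetical, and it assumes that the md5 field holds the hexadecimal MD5 digest of the file contents and that fileformat is the lower-cased file extension.

```python
import hashlib
import json
import os


def file_entry(path, type_code=None):
    """Describe a file with the keys used in the manifest examples."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large media files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    name = os.path.basename(path)
    entry = {
        "filename": name,
        "fileformat": os.path.splitext(name)[1].lstrip(".").lower(),
        "md5": digest.hexdigest(),  # assumption: MD5 of the file contents
    }
    if type_code is not None:
        entry["type_code"] = type_code
    return entry


def new_media_manifest(external_id, language, title, media_path,
                       attachments=(), requested_langs=None):
    """Build a manifest dict for operation_code 0 (New Media)."""
    manifest = {
        "operation_code": 0,
        "media": file_entry(media_path),
        "metadata": {
            "external_id": external_id,
            "language": language,
            "title": title,
        },
    }
    if attachments:
        # attachments: iterable of (path, type_code) pairs.
        manifest["attachments"] = [file_entry(p, c) for p, c in attachments]
    if requested_langs:
        manifest["requested_langs"] = requested_langs
    return manifest
```

Calling `json.dumps(new_media_manifest(...), indent=2)` on the result yields a JSON document with the same layout as the first example.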
Appendix F: Python Commandline Utilities
TLP offers two useful Python command-line scripts to interact with the TLP Server. These scripts are ws-client.py and player-url-generator.py. Both of them make use of the libtlp Python module. A configuration file is needed for both scripts.
In the client tools package, these three files are located at:
client-tools/python/scr/ws-client.py
client-tools/python/scr/player-url-generator.py
client-tools/python/scr/config.ini
Configuration File
Both scripts require a properly set up configuration file (config.ini) in order to work.
Each parameter of the configuration file is detailed below.
- web_service_url = <str>
-
Base URL location of the TLP Web Service.
- player_url = <str>
-
Base URL location of the TLP Player.
- enabled = <bool>
-
Enable or disable HTTP Authentication.
- username = <str>
-
HTTP Auth username ([http_auth] section).
- password = <str>
-
HTTP Auth Password.
- username = <str>
-
TLP username / API username ([api_client_auth] section).
- secret_key = <str>
-
API Secret Key Authentication Token.
- url_lifetime = <int>
-
Time slot, starting from the generation of the URL, in which the user will be allowed to access the Player (expire input parameter).
- user_id = <user_id>
-
Client-side user ID of the user who will edit the subtitles (author_id input parameter).
- user_full_name = <user_full_name>
-
Client-side user name of the user who will edit the subtitles (author_name input parameter). (optional)
- user_confidence = <user_confidence>
-
Confidence level of the above user (author_conf input parameter).
[general]
web_service_url = http://ttp.mllp.upv.es/api
player_url = http://ttp.mllp.upv.es/player

[http_auth]
enabled = no
username =
password =

[api_client_auth]
username = tluser
secret_key = akjsfd982323098qwjs209823id09321io3290d
request_key_expire_lifetime = 1440

[player_user_info]
user_id = jsnow21
user_full_name = John Snow
user_confidence = 100
By default, both scripts will attempt to load a configuration file named config.ini from the directory where the script is located. You can provide an alternative configuration file location with the --config-file option. The --print-sample-config-file option prints a sample config file to the standard output.
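For custom tooling that needs the same settings, the sample file can be read with Python's standard configparser module. This is a minimal sketch and not how ws-client.py or libtlp load the file; the function name and the keys of the returned dictionary are hypothetical, while the section and option names come from the sample configuration above.

```python
import configparser


def load_tlp_config(path="config.ini"):
    """Read the settings of the sample config.ini into a plain dict."""
    parser = configparser.ConfigParser()
    # ConfigParser.read() returns the list of files it managed to parse.
    if not parser.read(path):
        raise FileNotFoundError(path)
    return {
        "web_service_url": parser.get("general", "web_service_url"),
        "player_url": parser.get("general", "player_url"),
        # getboolean() accepts yes/no, true/false, on/off and 1/0.
        "http_auth_enabled": parser.getboolean("http_auth", "enabled"),
        "api_username": parser.get("api_client_auth", "username"),
        "secret_key": parser.get("api_client_auth", "secret_key"),
        "user_id": parser.get("player_user_info", "user_id"),
        "user_confidence": parser.getint("player_user_info", "user_confidence"),
    }
```

With the sample file shown above, `load_tlp_config()["http_auth_enabled"]` evaluates to False because enabled is set to no.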
ws-client.py
ws-client.py is a Python command-line utility that interacts with the TLP Web Service API. It can perform queries to all Web Service interfaces. The configuration file has to be properly set up in order to work.
-
libtlp Python Library (included in the TLP Client Tools Package).
usage: ws-client.py [-h] [-d] [-g] [-D] [-c <file>] [-C] [-I <user_tuple>]
[-B <username>] [-f <dest>]
{systems,uploadslist,ingest,status,langs,metadata,subs,audiotrack,mod}
...
ws-client.py: TLP Web Service API client tool.
optional arguments:
-h, --help show this help message and exit
-d, --debug Print debug information
-g, --use-get-query Use GET HTTP queries instead of POST when possible
-D, --use-data-param Use a single base64-encoded JSON 'data' GET parameter
instead of multiple GET parameters
-c <file>, --config-file <file>
Config file. Default: config.ini
-C, --print-sample-config-file
Print sample config file and exit
-I <user_tuple>, --api-client-auth <user_tuple>
API client user name and authentication token, in the
following format: USERNAME:AUTH_TOKEN. Default: from
config file.
-B <username>, --su <username>
su (substitute user) option: specify username. Only
for admin users.
-f <dest>, --store-output-file <dest>
Store the Web Service response in a file.
Web Service Interfaces:
Use a subcommand to call the corresponding Web Service interface
{systems,uploadslist,ingest,status,langs,metadata,subs,audiotrack,mod}
systems Get a list of all available ASR/MT/TTS Systems that
can be applied to transcribe/translate/synthesize a
media file.
uploadslist Get a list of all user's uploads.
ingest Upload media (audio/video) files and many other
attachments along with other metadata bundled into a
Media Package File (MPF) to the TLP Server.
status Check the current status of an upload.
langs Get a list of subtitle and audiotrack languages
available for a given media ID.
metadata Get metadata and media file locations for a given
media ID.
subs Download subtitles for a given media ID and language.
audiotrack Download audiotrack file for a given media ID and
language.
mod Send and commit subtitle corrections made by a user.
The ws-client.py tool includes several subcommands, one for each Web Service interface.
ws-client.py ingest
ingest help:
usage: ws-client.py ingest [-h] [-F MPF] [-D <dir>] [-n] [-l <language>]
[-t <title>] [-S <speaker_tuple>] [-m <email>]
[-k <keywords>] [-i <topic>] [-M <file/URL>]
[-A <file/URL>] [-s <file>] [-r <file>]
[-b <sub_tuple>] [-K <track_tuple>] [-a <file>]
[-o {0,1,2,3}] [-p <lang>] [-P <lang>] [-L <JSON>]
[-R {tx,tl,tts}] [-f {0,1}] [-T {0,1}]
[-x {soft,hard}]
object_id
positional arguments:
object_id Object ID (Media ID for New, Update and Delete
operations; Upload ID for Cancel operation).
optional arguments:
-h, --help show this help message and exit
-F MPF, --media-package-file MPF
Media Package File. Ingest existing media package file
and exit.
-D <dir>, --data-dir <dir>
Directory to store media package. By default, it is
stored in a temp dir
-n, --no-ingest Create media package only, do not ingest
-X, --test-mode Use Ingest Service test mode
Metadata options:
-l <language>, --language <language>
Media Language (ISO 639-1 code)
-t <title>, --title <title>
Title of the media
-S <speaker_tuple>, --speaker-info <speaker_tuple>
Speaker Info, in the following format:
'SPEAKER_ID:[FULL_NAME]:[GENDER]:[EMAIL]', where
GENDER={M,F}. Example: 'id1234:Pepita
Greus::pgreus@mymail.com'
-m <email>, --mail <email>
List of e-mails separated by commas
-k <keywords>, --keywords <keywords>
Media keywords
-i <topic>, --topic <topic>
Topic of the media
Media file options:
-M <file/URL>, --media-file <file/URL>
Main media file
-A <file/URL>, --extra-media-file <file/URL>
Additional media files (main media file encoded in
other formats)
-s <file>, --slides-file <file>
Slides file
-r <file>, --docs-file <file>
Document file
-b <sub_tuple>, --subtitle-file <sub_tuple>
Subtitle files, in the following format:
'LANG:FILE:[HUMAN]', where HUMAN={0,1 (def)}. Example:
'es:sub_es.srt:'
-K <track_tuple>, --audiotrack-file <track_tuple>
Audiotrack files, in the following format:
'LANG:FILE:[HUMAN]', where HUMAN={0,1 (def)}. Example:
'es:es.tts.mp3:0'
-a <file>, --thumbnail-file <file>
Video thumbnail file
Workflow options:
-o {0,1,2,3}, --operation-code {0,1,2,3}
Operation Code. 0 -> New (def), 1 -> Update, 2 ->
Delete, 3 -> Cancel Upload
-p <lang>, --requested-languages-subs <lang>
Request subtitle languages, in the following format:
'LANG[:SYS_ID]'. Example: '-p es -p en:17'
-P <lang>, --requested-languages-tts <lang>
Request TTS languages, in the following format:
'LANG[:SYS_ID]'. Example: '-P en:23'
-L <JSON>, --requested-languages-json <JSON>
"requested_langs" JSON string. Ignores -p and -P
options. Example: '{ "es":{"sub":{}},
"en":{"sub":{"sid":3}} }'
-R {tx,tl,tts}, --tL-regenerate-opt {tx,tl,tts}
Set 'tL-regenerate' option: request re-generation of
automatic subtitles and/or TTS tracks. Example: '-R tx
-R tl' requests regeneration of transcription and
translations
-f {0,1}, --tL-force-opt {0,1}
Set 'tL-force' option: overwrite any pre-existing
human-supervised subtitles. 0 -> Disabled, 1 -> Force
re-translation of supervised translations
-T {0,1}, --transLecture-opt {0,1}
Set 'transLecture' option: enables or disables
automatic transcription and translation of the
ingested media. 0 -> Disabled, 1 -> Enabled (def)
-x {soft,hard}, --delete-mode {soft,hard}
Delete mode
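The JSON string expected by the -L option can be generated rather than written by hand. The Python sketch below builds it from lists of requested languages. The function name is hypothetical; the "sub", "tts" and "sid" keys come from the help text and manifest examples, but placing "sid" under "tts" as well is an assumption based on the 'LANG[:SYS_ID]' format of the -P option.

```python
import json


def build_requested_langs(subs=(), tts=()):
    """Build a "requested_langs" dict.

    subs, tts: iterables whose items are either a language code ("es")
    or a (language code, system ID) pair (("en", 3)).
    """
    requested = {}
    for key, items in (("sub", subs), ("tts", tts)):
        for item in items:
            lang, sys_id = (item, None) if isinstance(item, str) else item
            # An empty dict requests the default system for that language.
            options = {} if sys_id is None else {"sid": sys_id}
            requested.setdefault(lang, {})[key] = options
    return requested


langs = build_requested_langs(subs=["es", ("en", 3)], tts=["es"])
print(json.dumps(langs))
```

The printed string can be passed as the argument of -L, or the dict can be stored under the requested_langs key of a manifest.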
ws-client.py ingest call: Create and upload a Media Package.
$ ./ws-client.py ingest \
    -M media.mp4 \
    -t "Introduction to Machine Learning" \
    -l en \
    -s "slides.pptx" \
    -r "related-document.pdf" \
    -r "lecture-notes.doc" \
    "MEDIA-ID-1234"
ws-client.py uploadslist
uploadslist help:
usage: ws-client.py uploadslist [-h] [-o OBJECT_ID]
optional arguments:
-h, --help show this help message and exit
-o OBJECT_ID, --object_id OBJECT_ID
Get list of uploads involving the given Object ID.
ws-client.py uploadslist call: Get list of uploads.
$ ./ws-client.py uploadslist
ws-client.py status
status help:
usage: ws-client.py status [-h] upload_id
positional arguments:
upload_id Upload ID
optional arguments:
-h, --help show this help message and exit
ws-client.py status call: Check upload status.
$ ./ws-client.py status "UPLOAD-ID-1234"
ws-client.py systems
systems help:
usage: ws-client.py systems [-h]
optional arguments:
-h, --help show this help message and exit
ws-client.py systems call: Get list of available ASR/MT/TTS systems.
$ ./ws-client.py systems
ws-client.py langs
langs help:
usage: ws-client.py langs [-h] [-k] [-K REQUEST_KEY_LIFETIME] [-H HASH_ID]
media_id
positional arguments:
media_id Media ID
optional arguments:
-h, --help show this help message and exit
-k, --use-request-key
Use request-key as auth_token instead of secret_key.
-K REQUEST_KEY_LIFETIME, --request-key-lifetime REQUEST_KEY_LIFETIME
Set request key lifetime in minutes. Default: from
config file.
-H HASH_ID, --hash-id HASH_ID
Media Hash ID
ws-client.py langs call: Get available subtitle languages for a specific media ID.
$ ./ws-client.py langs "MEDIA-ID-1234"
ws-client.py metadata
metadata help:
usage: ws-client.py metadata [-h] [-H HASH_ID] [-k] [-K REQUEST_KEY_LIFETIME]
media_id
positional arguments:
media_id Media ID
optional arguments:
-h, --help show this help message and exit
-H HASH_ID, --hash-id HASH_ID
Media Hash ID
-k, --use-request-key
Use request-key as auth_token instead of secret_key.
-K REQUEST_KEY_LIFETIME, --request-key-lifetime REQUEST_KEY_LIFETIME
Set request key lifetime in minutes. Default: from
config file.
ws-client.py metadata call: Get metadata and media file locations for a specific media ID.
$ ./ws-client.py metadata "MEDIA-ID-1234"
ws-client.py subs
subs help:
usage: ws-client.py subs [-h] [-f {dfxp,ttml,srt,vtt}] [-D {0,1,2}]
[-s {-1,0,1}] [-H HASH_ID] [-k]
[-K REQUEST_KEY_LIFETIME]
media_id language
positional arguments:
media_id Media ID
language Language ISO 639-1 code (i.e. en, es, ca, ...)
optional arguments:
-h, --help show this help message and exit
-f {dfxp,ttml,srt,vtt,text}, --format {dfxp,ttml,srt,vtt,text}
Subtitles format
-D {0,1,2}, --select-data-policy {0,1,2}
sel_data_policy parameter
-s {-1,0,1}, --segment-filtering-policy {-1,0,1}
seg_filt_policy parameter
-H HASH_ID, --hash-id HASH_ID
Media Hash ID
-k, --use-request-key
Use request-key as auth_token instead of secret_key.
-K REQUEST_KEY_LIFETIME, --request-key-lifetime REQUEST_KEY_LIFETIME
Set request key lifetime in minutes. Default: from
config file.
ws-client.py subs call: Download English subtitles in SRT format for a specific media ID.
$ ./ws-client.py subs "MEDIA-ID-1234" en -f srt
ws-client.py audiotrack
audiotrack help:
usage: ws-client.py subs [-h] [-f {dfxp,ttml,srt,vtt}] [-D {0,1,2}]
[-s {-1,0,1}] [-H HASH_ID] [-k]
[-K REQUEST_KEY_LIFETIME]
media_id language
positional arguments:
media_id Media ID
language Language ISO 639-1 code (i.e. en, es, ca, ...)
optional arguments:
-h, --help show this help message and exit
-f {dfxp,ttml,srt,vtt}, --format {dfxp,ttml,srt,vtt}
Subtitles format
-D {0,1,2}, --select-data-policy {0,1,2}
sel_data_policy parameter
-s {-1,0,1}, --segment-filtering-policy {-1,0,1}
seg_filt_policy parameter
-H HASH_ID, --hash-id HASH_ID
Media Hash ID
-k, --use-request-key
Use request-key as auth_token instead of secret_key.
-K REQUEST_KEY_LIFETIME, --request-key-lifetime REQUEST_KEY_LIFETIME
Set request key lifetime in minutes. Default: from
config file.
ws-client.py audiotrack call: Download the English audio track for a specific media ID and audiotrack ID.
$ ./ws-client.py audiotrack "MEDIA-ID-1234" en 12
ws-client.py mod
mod help:
usage: ws-client.py mod [-h] [-D DATA] [-i MEDIA_ID] [-l LANGUAGE]
[-a AUTHOR_ID] [-c AUTHOR_CONF] [-n AUTHOR_NAME]
[-t TXT_JSON_DICT] [-r DEL_SEGM_ID] [-H HASH_ID] [-k]
[-K REQUEST_KEY_LIFETIME]
optional arguments:
-h, --help show this help message and exit
-D DATA, --data DATA Directly provide input base64-encoded JSON data string
-i MEDIA_ID, --media-id MEDIA_ID
Media Id
-l LANGUAGE, --language LANGUAGE
Subtitle language
-a AUTHOR_ID, --author-id AUTHOR_ID
Author Id
-c AUTHOR_CONF, --author-conf AUTHOR_CONF
Author Confidence level (0-100)
-n AUTHOR_NAME, --author-name AUTHOR_NAME
Author full name
-t TXT_JSON_DICT, --txt-json-dict TXT_JSON_DICT
Segment modification dictionary: {"sI":<int>,
"b":<float>, "e":<float>, "t":<str>}
-r DEL_SEGM_ID, --del-segm-id DEL_SEGM_ID
Segment IDs to delete.
-H HASH_ID, --hash-id HASH_ID
Media Hash ID
-k, --use-request-key
Use request-key as auth_token instead of secret_key.
-K REQUEST_KEY_LIFETIME, --request-key-lifetime REQUEST_KEY_LIFETIME
Set request key lifetime in minutes. Default: from
config file.
ws-client.py mod call: Send modifications for English subtitles of a specific media ID.
$ ./ws-client.py mod \
    --media-id MEDIA-ID-1234 \
    --language en \
    --author-id jsnow21 \
    --author-conf 100 \
    -t '{"sI":1, "b":0.0, "e":2.0, "t":"Winter is coming."}' \
    -t '{"sI":2, "b":3.0, "e":5.0, "t":"Valar Morghulis."}' \
    -n "John Snow"
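When corrections come from another program, quoting the JSON dictionaries by hand is error-prone. The following Python sketch assembles the same mod call as an argument list, serialising each segment with json.dumps. The function name and the tuple layout of the segments are hypothetical; the flags and dictionary keys are the ones listed in the help text above.

```python
import json
import shlex


def build_mod_command(media_id, language, author_id, author_conf,
                      segments, author_name=None):
    """Return the argument list for a 'ws-client.py mod' call.

    segments: iterable of (segment_id, begin, end, text) tuples.
    """
    args = [
        "./ws-client.py", "mod",
        "--media-id", media_id,
        "--language", language,
        "--author-id", author_id,
        "--author-conf", str(author_conf),
    ]
    for seg_id, begin, end, text in segments:
        payload = {"sI": seg_id, "b": begin, "e": end, "t": text}
        # One -t option per modified segment, as in the example call.
        args += ["-t", json.dumps(payload)]
    if author_name:
        args += ["-n", author_name]
    return args


command = build_mod_command(
    "MEDIA-ID-1234", "en", "jsnow21", 100,
    [(1, 0.0, 2.0, "Winter is coming."), (2, 3.0, 5.0, "Valar Morghulis.")],
    author_name="John Snow",
)
# Print a shell-quoted version for inspection.
print(" ".join(shlex.quote(arg) for arg in command))
```

The list can be handed to `subprocess.run(command, check=True)` directly, which avoids shell quoting issues altogether.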
player-url-generator.py
player-url-generator.py is a Python command-line utility that generates valid URL links to the TLP Player according to these specifications. The configuration file has to be properly set up in order to work.
-
libtlp Python Library (included in the TLP Client Tools Package).
usage: player-url-generator.py [-h] [-d] [-C] [-c <file>] [-s START_TIME]
[-l LANGUAGE] [-t TIME_SLOT] [-a AUTHOR_ID]
[-n AUTHOR_NAME] [-k AUTHOR_CONF]
[-I <user_tuple>]
media_id
player-url-generator.py: Generate valid URLs for calling the TLP Player.
positional arguments:
media_id Media ID
optional arguments:
-h, --help show this help message and exit
-d, --debug Print debug information
-C, --print-sample-config-file
Print sample config file and exit
-c <file>, --config-file <file>
Config file. Default: config.ini
-s START_TIME, --start-time START_TIME
Start time in seconds.
-l LANGUAGE, --language LANGUAGE
Subtitles language.
-t TIME_SLOT, --time-slot TIME_SLOT
Time slot for editing in minutes. Default: from config
file.
-a AUTHOR_ID, --author-id AUTHOR_ID
Author ID ('author_id'). Default: from config file.
-n AUTHOR_NAME, --author-name AUTHOR_NAME
Author Name. Default: from config file.
-k AUTHOR_CONF, --author-conf AUTHOR_CONF
Author confidence level [0-100]. Default: from config
file.
-I <user_tuple>, --api_user <user_tuple>
User name and authentication token, in the following
format: USERNAME:AUTH_TOKEN. Default: from config
file.
player-url-generator.py call: Get URL for editing English subtitles of a given media ID:
$ player-url-generator.py -l en MEDIA-ID-1234
Appendix G: DFXP Format Specification
This appendix describes an extension of the original DFXP format. The extension was made to meet the needs of the transLectures EU project:
-
Confidence measures for automatic transcription and translations have to be reflected in the DFXP document.
-
All subtitle edits made by human users, starting from an automatic transcription/translation, have to be tracked.
For this purpose, new XML tags have been proposed. These tags belong to a new namespace called tl. Therefore, the new XML tags take the form <tl:XXX>, where tl is the namespace and XXX is the tag. The root <tt> element has been extended in this way:
<tt xml:lang="en" xmlns="http://www.w3.org/2006/04/ttaf1" xmlns:tts="http://www.w3.org/2006/10/ttaf1#style" xmlns:tl="translectures.eu">
With TLP 1.2, the MLLP research group launched an updated version of the DFXP format, namely DFXP v1.1, which enables the modification of the speech segmentation. The trade-off is that the user edit history cannot be tracked inside the DFXP file.
DFXP Tags
Tags are defined at four levels: document, segment, group and word. Document tags are located at the head section, while segment, group and word tags are located at the body section. An additional tag to relate alternative transcriptions/translations is also included. A detailed explanation of tags follows:
-
<tl:document>: This tag defines the attributes of the transcription/translation at the top level. As the attributes are inherited, the values defined here are the default values, unless otherwise redefined. It contains a specific attribute to associate the current file with a unique video ID. Abbreviation: <tl:d>.
-
<tl:current>: This tag defines the current status of the subtitle file with the last modifications made by users. It contains an ordered sequence of text segments or captions. Abbreviation: <tl:c>.
-
<tl:origin>: This tag defines the former status of the subtitle file (typically the automatic transcription/translation). It contains an ordered sequence of text segments or captions. Abbreviation: <tl:o>.
-
<tl:segment>: This tag defines text segments or captions. Abbreviation: <tl:s>
-
<tl:group>: This tag defines a group of words inside a segment. This tag will usually appear as a result of the interaction with the user. Abbreviation: <tl:g>
-
<tl:word>: A simple tag used to specify single word properties, mostly used for time alignments and confidence measures. Other attributes are generally inherited. Abbreviation: <tl:w>
Next, we define the set of attributes related to the tags just defined. Most of the attributes are applicable to all levels:
-
authorType: Type of author. Possible values are automatic and human: human for transcriptions/translations generated or completely supervised by human experts, and automatic for those fully generated by an ASR/MT system. Abbreviation: aT.
-
authorId: Author identifier. For example: RWTH, XEROX, UPV, Maria Gialama, etc. Abbreviation: aI.
-
authorConf: Confidence measure of the author when the authorType is human. This attribute is coupled with an authorId. It could be useful for non-native users supervising a foreign language. Abbreviation: aC.
-
wordSegId: It identifies the system that performs the automatic segmentation at the word level. It could be different from the authorId, since groups of words supervised by the user may be segmented at the word level with a different system from that providing the automatic transcription. Abbreviation: wS.
-
timeStamp: Instant of creation or modification. The timestamp format is a combination of date and time of day as described in Chapter 5.4 of ISO 8601. The format is [-]CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]. Abbreviation: tS.
-
confMeasure: Confidence measure of the level. These values are generated by ASR and MT systems. Abbreviation: cM.
-
videoId: Attribute only defined at the document level. It links the current transcription or translation DFXP file to a unique video. Abbreviation: vI.
-
segmentId: It is used to uniquely identify a segment in a transcription or translation file. As mentioned above, alternative segments have the same segmentId. Abbreviation: sI.
-
begin: Instant of the beginning of an audio portion of the current tag in seconds. Abbreviation: b.
-
end: Instant of the end of an audio portion of the current tag in seconds. Abbreviation: e.
-
elapsedTime: Processing time. Abbreviation: eT.
-
modelID: Model used by decoder. Abbreviation: mI.
-
processingSteps: Processing steps of decoder. Abbreviation: pS.
-
audioLength: Complete length of the video. Abbreviation: aL.
-
status: Supervision status of the subtitles. Abbreviation: st. Possible values:
-
fully_automatic → All segments are automatic.
-
partially_human → Some segments have been supervised.
-
fully_human → All segments have been supervised.
-
Special characters such as & " < > ' must be escaped in the DFXP files according to the XML standard (see http://xml.silmaril.ie/specials.html).
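In Python, this escaping can be done with the standard library before text is written into a DFXP file. The sketch below uses xml.sax.saxutils.escape, which handles &, < and > by default, and passes the two quote characters as extra entities. The helper name is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() only handles &, < and > by default; quotes are added explicitly.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_dfxp_text(text):
    """Escape the five XML special characters in subtitle text."""
    return escape(text, QUOTE_ENTITIES)


print(escape_dfxp_text('Q&A <"it\'s">'))
```

Libraries that serialise XML, such as xml.etree.ElementTree, apply the required escaping on their own when writing, so explicit escaping is only needed when the file is produced by string concatenation.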
Examples of Extended DFXP Tags
Examples at <head>
<tl:d aT="automatic" aI="UPV-v1.0" tS="2012-10-03T21:32:52" aC="0.6" cM="0.75" vI="1234-abcd" b="0.0" e="400.6"/>
<tl:d aT="human" aI="John Doe" tS="2012-10-03T21:32:52" aC="1.0" cM="1.0" videoId="1234-abcd" b="1.0" e="400.6"/>
Examples at <body>
<tl:s sI="1" aT="automatic" aI="UPV" wS="UPV" tS="2012-10-03T21:32:52" cM="0.62" aC="0.5" b="0.0" e="15.6"> i am very hungry with you </tl:s>
<tl:g aT="human" aI="John Doe" aC="0.75" tS="2012-10-03T21:32:52" cM="1.0" b="2.7" e="3.5"> the way we train in IBM </tl:g>
<tl:w aT="automatic" aI="UPV" aC="0.5" cM="0.61" tS="2012-10-03T21:32:52" b="1.3" e="2.1">the</tl:w>
<tl:w cM="1.3" b="1.6" e="2.1">way</tl:w>
Use cases
-
A transcription/translation is automatically generated by an automatic system creating a DFXP file from scratch.
-
A transcription/translation is manually generated by a human expert creating a new DFXP file.
-
A user supervises an automatic/manual transcription/translation.
Use case examples
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="en" xmlns="http://www.w3.org/2006/04/ttaf1" xmlns:tts="http://www.w3.org/2006/10/ttaf1#style" xmlns:tl="translectures.eu">
  <head>
    <tl:d aT="automatic" aI="UPV-v1.0" wS="UPV-v1.0" tS="2012-10-03T21:32:52" aC="0.56" cM="0.75" videoId="00505-Profesores_Alcoy.M03.B01" b="0.0" e="12.50"/>
  </head>
  <body>
    <tl:c>
      <tl:s sI="1" cM="0.75" b="0.00" e="3.20">
        <tl:w cM="0.85" b="0.00" e="0.75">most</tl:w>
        <tl:w cM="0.89" b="0.75" e="0.95">of</tl:w>
        <tl:w cM="0.63" b="0.95" e="1.15">you</tl:w>
        <tl:w cM="0.40" b="1.15" e="1.35">are</tl:w>
        <tl:w cM="0.90" b="1.35" e="1.50">probably</tl:w>
        <tl:w cM="0.85" b="1.50" e="1.75">ventured</tl:w>
        <tl:w cM="0.55" b="1.75" e="2.00">the </tl:w>
        <tl:w cM="0.98" b="2.00" e="2.75">problem</tl:w>
        <tl:w cM="0.60" b="2.75" e="3.20">that</tl:w>
      </tl:s>
      <tl:s sI="2" cM="0.19" b="8.50" e="12.50">
        <tl:w cM="0.1" b="8.50" e="9.00">To</tl:w>
        <tl:w cM="0.2" b="9.00" e="10.00">solve</tl:w>
        <tl:w cM="0.1" b="10.00" e="10.70">on</tl:w>
        <tl:w cM="0.1" b="10.70" e="12.50">this</tl:w>
      </tl:s>
    </tl:c>
    <tl:o>
      <tl:s sI="1" cM="0.75" b="0.00" e="3.20">
        <tl:w cM="0.85" b="0.00" e="0.75">most</tl:w>
        <tl:w cM="0.89" b="0.75" e="0.95">of</tl:w>
        <tl:w cM="0.63" b="0.95" e="1.15">you</tl:w>
        <tl:w cM="0.40" b="1.15" e="1.35">are</tl:w>
        <tl:w cM="0.90" b="1.35" e="1.50">probably</tl:w>
        <tl:w cM="0.85" b="1.50" e="1.75">ventured</tl:w>
        <tl:w cM="0.55" b="1.75" e="2.00">the </tl:w>
        <tl:w cM="0.98" b="2.00" e="2.75">problem</tl:w>
        <tl:w cM="0.60" b="2.75" e="3.20">that</tl:w>
      </tl:s>
      <tl:s sI="2" cM="0.19" b="8.50" e="12.50">
        <tl:w cM="0.1" b="8.50" e="9.00">To</tl:w>
        <tl:w cM="0.2" b="9.00" e="10.00">solve</tl:w>
        <tl:w cM="0.1" b="10.00" e="10.70">on</tl:w>
        <tl:w cM="0.1" b="10.70" e="12.50">this</tl:w>
      </tl:s>
    </tl:o>
  </body>
</tt>
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="en" xmlns="http://www.w3.org/2006/04/ttaf1" xmlns:tts="http://www.w3.org/2006/10/ttaf1#style" xmlns:tl="translectures.eu">
  <head>
    <tl:d aT="human" aI="Maria" aC="1.0" videoId="00505-Profesores_Alcoy.M03.B01" tS="2012-10-03T21:32:52" cM="1.0" b="0.0" e="12.50"/>
  </head>
  <body>
    <tl:c>
      <tl:s sI="1" b="0.00" e="3.20">
        most of you have probably ventured into the problem set.
      </tl:s>
      <tl:s sI="2" b="8.50" e="12.50">
        The solution is:
      </tl:s>
    </tl:c>
    <tl:o>
      <tl:s sI="1" b="0.00" e="3.20">
        most of you have probably ventured into the problem set.
      </tl:s>
      <tl:s sI="2" b="8.50" e="12.50">
        The solution is:
      </tl:s>
    </tl:o>
  </body>
</tt>
<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="en" xmlns="http://www.w3.org/2006/04/ttaf1" xmlns:tts="http://www.w3.org/2006/10/ttaf1#style" xmlns:tl="translectures.eu">
  <head>
    <tl:d aT="automatic" aI="UPV-v1.0" wS="UPV-v1.0" tS="2012-10-03T21:32:52" aC="0.56" cM="0.75" videoId="00505-Profesores_Alcoy.M03.B01" b="0.0" e="12.50"/>
  </head>
  <body>
    <tl:c>
      <tl:s sI="1" aT="human" aC="0.81" cM="1.0" aI="John" b="0.17" e="3.32" tS="2012-10-04T13:31:45">
        most of you probably ventured into the problem set
      </tl:s>
      <tl:s sI="2" cM="0.19" b="8.5" e="12.50">
        <tl:w cM="0.1" b="8.5" e="9">To</tl:w>
        <tl:w cM="0.2" b="9" e="10">solve</tl:w>
        <tl:w cM="0.1" b="10" e="10.7">on</tl:w>
        <tl:w cM="0.1" b="10.7" e="12.5">this</tl:w>
      </tl:s>
    </tl:c>
    <tl:o>
      <tl:s sI="1" cM="0.75" b="0.00" e="3.20">
        <tl:w cM="0.85" b="0.00" e="0.75">most</tl:w>
        <tl:w cM="0.89" b="0.75" e="0.95">of</tl:w>
        <tl:w cM="0.63" b="0.95" e="1.15">you</tl:w>
        <tl:w cM="0.40" b="1.15" e="1.35">are</tl:w>
        <tl:w cM="0.90" b="1.35" e="1.50">probably</tl:w>
        <tl:w cM="0.85" b="1.50" e="1.75">ventured</tl:w>
        <tl:w cM="0.55" b="1.75" e="2.00">the </tl:w>
        <tl:w cM="0.98" b="2.00" e="2.75">problem</tl:w>
        <tl:w cM="0.60" b="2.75" e="3.20">that</tl:w>
      </tl:s>
      <tl:s sI="2" cM="0.19" b="8.50" e="12.50">
        <tl:w cM="0.1" b="8.50" e="9.00">To</tl:w>
        <tl:w cM="0.2" b="9.00" e="10.00">solve</tl:w>
        <tl:w cM="0.1" b="10.00" e="10.70">on</tl:w>
        <tl:w cM="0.1" b="10.70" e="12.50">this</tl:w>
      </tl:s>
    </tl:o>
  </body>
</tt>
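Files in this extended format can be read with any namespace-aware XML parser. The Python sketch below uses xml.etree.ElementTree to list the segments under <tl:c>, inheriting the author type from <tl:d> when a segment does not redefine it. The function name and the shortened sample document are hypothetical, and only the abbreviated tag and attribute names (tl:s, sI, aT, b, e) are handled; it is not the parser used by TLP.

```python
import xml.etree.ElementTree as ET

# Namespaces declared on the root <tt> element of the examples above.
NS = {"tt": "http://www.w3.org/2006/04/ttaf1", "tl": "translectures.eu"}

# Shortened, hypothetical sample in the style of the third use case.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="en" xmlns="http://www.w3.org/2006/04/ttaf1" xmlns:tl="translectures.eu">
  <head>
    <tl:d aT="automatic" aI="UPV-v1.0" cM="0.75" b="0.0" e="12.50"/>
  </head>
  <body>
    <tl:c>
      <tl:s sI="1" aT="human" aI="John" b="0.17" e="3.32">
        most of you probably ventured into the problem set
      </tl:s>
      <tl:s sI="2" cM="0.19" b="8.5" e="12.50">
        <tl:w cM="0.1" b="8.5" e="9">To</tl:w>
        <tl:w cM="0.2" b="9" e="10">solve</tl:w>
      </tl:s>
    </tl:c>
  </body>
</tt>"""


def current_segments(dfxp_text):
    """Yield one dict per <tl:s> segment found under <tl:c>."""
    root = ET.fromstring(dfxp_text.encode("utf-8"))
    doc = root.find("tt:head/tl:d", NS)
    # Attributes are inherited: the document level provides the defaults.
    default_author_type = doc.get("aT") if doc is not None else None
    for seg in root.findall("tt:body/tl:c/tl:s", NS):
        yield {
            "id": seg.get("sI"),
            "begin": float(seg.get("b")),
            "end": float(seg.get("e")),
            "author_type": seg.get("aT", default_author_type),
            # Join plain text and <tl:w> words, collapsing whitespace.
            "text": " ".join("".join(seg.itertext()).split()),
        }


for segment in current_segments(SAMPLE):
    print(segment)
```

Replacing tl:c with tl:o in the search path would return the original (typically automatic) segments instead.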