API Reference
dlt.TranslationModel
init
dlt.TranslationModel.__init__(self, model_or_path: str = 'm2m100', tokenizer_path: str = None, device: str = 'auto', model_family: str = None, model_options: dict = None, tokenizer_options: dict = None)
Instantiates a multilingual transformer model for translation.
| Parameter | Type | Default | Description | 
|---|---|---|---|
| model_or_path | str | m2m100 | The path or name of the model. Equivalent to the first argument of AutoModel.from_pretrained(). You can also specify the shorthands "mbart50" and "m2m100". | 
| tokenizer_path | str | optional | The path to the tokenizer. By default, it will be set to model_or_path. | 
| device | str | auto | One of "cpu", "gpu" or "auto". If set to "auto", a GPU is selected when available; otherwise it falls back to CPU. | 
| model_family | str | optional | Either "mbart50" or "m2m100". By default, it will be inferred based on model_or_path. Needs to be explicitly set if model_or_path is a path. | 
| model_options | dict | optional | The keyword arguments passed to the model, which is a transformer for conditional generation. | 
| tokenizer_options | dict | optional | The keyword arguments passed to the model's tokenizer. | 
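The "auto" fallback described for device can be pictured with a minimal sketch. The helper name resolve_device and its gpu_available flag are hypothetical and exist only for illustration; this is not the library's implementation, which would presumably ask the underlying framework whether a GPU is present.

```python
def resolve_device(device: str = "auto", gpu_available: bool = False) -> str:
    """Hypothetical helper mirroring the documented "cpu" / "gpu" / "auto" choices."""
    if device == "auto":
        # Prefer a GPU when one is available, otherwise fall back to CPU.
        return "gpu" if gpu_available else "cpu"
    if device in ("cpu", "gpu"):
        # An explicit choice is returned unchanged.
        return device
    raise ValueError(f"device must be 'cpu', 'gpu' or 'auto', got {device!r}")
```

Passing an explicit "cpu" or "gpu" skips the availability check entirely, which is why "auto" is the convenient default when the same script runs on different machines.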
translate
dlt.TranslationModel.translate(self, text: Union[str, List[str]], source: str, target: str, batch_size: int = 32, verbose: bool = False, generation_options: dict = None) -> Union[str, List[str]]
Translates a string or a list of strings from a source to a target language.
| Parameter | Type | Default | Description | 
|---|---|---|---|
| text | Union[str, List[str]] | required | The content you want to translate. | 
| source | str | required | The language of the original text. | 
| target | str | required | The language of the translated text. | 
| batch_size | int | 32 | The number of samples to load at once. If set to None, everything is processed at once. | 
| verbose | bool | False | Whether to display the progress bar for every batch processed. | 
| generation_options | dict | optional | The keyword arguments passed to model.generate(), where model is the underlying transformers model. | 
Note:
- Run print(dlt.utils.available_languages()) to see which languages are available.
- Prefer a smaller batch_size if your RAM or GPU memory (VRAM) is limited.
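To make the batch_size behaviour concrete, here is a minimal sketch of how a list of inputs could be split before translation. The function split_into_batches is a hypothetical illustration of the documented semantics (a fixed number of samples per batch, or a single batch when batch_size is None); it is not taken from the library's source.

```python
from typing import List, Optional


def split_into_batches(texts: List[str], batch_size: Optional[int] = 32) -> List[List[str]]:
    """Hypothetical helper: group texts into chunks of at most batch_size items."""
    if batch_size is None:
        # None means "process everything at once": a single batch.
        return [list(texts)]
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    # Step through the list in strides of batch_size; the last batch may be shorter.
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
```

Only one batch needs to be held in memory by the model at a time, so a smaller batch_size lowers peak memory use at the cost of more iterations.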
get_transformers_model
dlt.TranslationModel.get_transformers_model(self)
Retrieves the underlying transformers model (mBART or M2M100, depending on the model family).
get_tokenizer
dlt.TranslationModel.get_tokenizer(self)
Retrieves the Hugging Face tokenizer that belongs to the underlying model.
available_codes
dlt.TranslationModel.available_codes(self) -> List[str]
Returns all the available codes for a given dlt.TranslationModel instance.
available_languages
dlt.TranslationModel.available_languages(self) -> List[str]
Returns all the available languages for a given dlt.TranslationModel instance.
get_lang_code_map
dlt.TranslationModel.get_lang_code_map(self) -> Dict[str, str]
Returns the dictionary mapping each language to its code for a given dlt.TranslationModel instance.
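The three accessors above are easiest to understand side by side. The sketch below uses a tiny hand-written dictionary in the mBART-50 code style as a stand-in for the result of get_lang_code_map(); the assumption that available_languages() corresponds to its keys and available_codes() to its values is an illustration, not something confirmed by this reference.

```python
from typing import Dict, List

# Toy stand-in for the dictionary a model instance would return (illustrative subset).
lang_code_map: Dict[str, str] = {
    "English": "en_XX",
    "French": "fr_XX",
    "Hindi": "hi_IN",
}

# Assumed relationship: languages are the keys, codes are the values.
languages: List[str] = list(lang_code_map.keys())
codes: List[str] = list(lang_code_map.values())


def code_for(language: str, mapping: Dict[str, str]) -> str:
    """Hypothetical lookup helper with a readable error for unknown languages."""
    try:
        return mapping[language]
    except KeyError:
        raise ValueError(f"{language!r} is not available for this model") from None
```

Because the codes depend on the model family, a lookup like this should use the map from the same instance that performs the translation.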
save_obj
dlt.TranslationModel.save_obj(self, path: str = 'saved_model') -> None
Saves your model as a torch object and saves your tokenizer.
| Parameter | Type | Default | Description | 
|---|---|---|---|
| path | str | saved_model | The directory where you want to save your model and tokenizer. | 
load_obj
dlt.TranslationModel.load_obj(path: str = 'saved_model', **kwargs)
Initializes dlt.TranslationModel from the torch object and tokenizer saved with dlt.TranslationModel.save_obj.
| Parameter | Type | Default | Description | 
|---|---|---|---|
| path | str | saved_model | The directory where your torch model and tokenizer are stored. | 
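A save and reload round trip might look like the sketch below. The wrapper save_and_reload is a hypothetical convenience function; it only relies on save_obj being called on an instance and load_obj being called on the class, as the signatures above suggest. The import name dl_translate in the comment is an assumption.

```python
def save_and_reload(mt, model_cls, path: str = "saved_model"):
    """Hypothetical helper: persist a model, then rebuild it from the same directory."""
    # Writes the torch object and the tokenizer into `path`.
    mt.save_obj(path)
    # load_obj takes no instance, so it is called on the class itself.
    return model_cls.load_obj(path)


# Possible usage with the real library (not executed here):
#   import dl_translate as dlt   # assumed import name
#   mt = dlt.TranslationModel()
#   restored = save_and_reload(mt, dlt.TranslationModel, "saved_model")
```

Using the same path for both calls matters, since load_obj expects the directory layout that save_obj produced.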
dlt.utils
get_lang_code_map
dlt.utils.get_lang_code_map(weights: str = 'mbart50') -> Dict[str, str]
Get a dictionary mapping a language -> code for a given model. The code will depend on the model you choose.
| Parameter | Type | Default | Description | 
|---|---|---|---|
| weights | str | mbart50 | The name of the model you are using. For example, "mbart50" is the multilingual BART Large with 50 languages available to use. | 
available_codes
dlt.utils.available_codes(weights: str = 'mbart50') -> List[str]
Get all the codes available for a given model. The code format will depend on the model you select.
| Parameter | Type | Default | Description | 
|---|---|---|---|
| weights | str | mbart50 | The name of the model you are using. For example, "mbart50" is the multilingual BART Large with 50 codes available to use. | 
available_languages
dlt.utils.available_languages(weights: str = 'mbart50') -> List[str]
Get all the languages available for a given model.
| Parameter | Type | Default | Description | 
|---|---|---|---|
| weights | str | mbart50 | The name of the model you are using. For example, "mbart50" is the multilingual BART Large with 50 languages available to use. |
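One practical use of these utilities is to validate a language name before starting a long translation job. The sketch below assumes the list comes from dlt.utils.available_languages(); the helper ensure_supported is hypothetical and works on any iterable of names, so it can be tried without loading a model.

```python
from typing import Iterable


def ensure_supported(language: str, available: Iterable[str]) -> str:
    """Hypothetical guard: return the language if it is listed, else raise."""
    names = list(available)
    if language not in names:
        raise ValueError(
            f"{language!r} is not supported; {len(names)} languages are available"
        )
    return language


# Possible usage with the real library (not executed here):
#   import dl_translate as dlt   # assumed import name
#   available = dlt.utils.available_languages("m2m100")
#   source = ensure_supported("English", available)
```

Note that weights defaults to "mbart50" in dlt.utils, so pass the name matching the model you actually instantiated.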