API Reference

dlt.TranslationModel

__init__

dlt.TranslationModel.__init__(self, model_or_path: str = 'm2m100', tokenizer_path: str = None, device: str = 'auto', model_family: str = None, model_options: dict = None, tokenizer_options: dict = None)

Instantiates a multilingual transformer model for translation.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_or_path | str | "m2m100" | The path or the name of the model. Equivalent to the first argument of AutoModel.from_pretrained(). You can also use the shorthands "mbart50" and "m2m100". |
| tokenizer_path | str | optional | The path to the tokenizer. By default, it is set to model_or_path. |
| device | str | "auto" | "cpu", "gpu" or "auto". If set to "auto", it will select a GPU when available or else fall back to CPU. |
| model_family | str | optional | Either "mbart50" or "m2m100". By default, it is inferred from model_or_path. It must be set explicitly if model_or_path is a path. |
| model_options | dict | optional | The keyword arguments passed to the model, which is a transformers model for conditional generation. |
| tokenizer_options | dict | optional | The keyword arguments passed to the model's tokenizer. |
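
A minimal construction sketch, assuming the package is installed as dl_translate and imported under the dlt alias used throughout this reference; the local path in the last call is a hypothetical placeholder.

```python
import dl_translate as dlt

# Default weights ("m2m100"); device="auto" picks a GPU when available.
mt = dlt.TranslationModel()

# The "mbart50" shorthand, forced onto CPU.
mt_mbart = dlt.TranslationModel("mbart50", device="cpu")

# Loading from a local path: model_family cannot be inferred from an
# arbitrary path, so it must be given explicitly.
mt_local = dlt.TranslationModel(
    "path/to/saved/model",       # hypothetical local directory
    model_family="mbart50",
)
```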


translate

dlt.TranslationModel.translate(self, text: Union[str, List[str]], source: str, target: str, batch_size: int = 32, verbose: bool = False, generation_options: dict = None) -> Union[str, List[str]]

Translates a string or a list of strings from a source to a target language.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | Union[str, List[str]] | required | The content you want to translate. |
| source | str | required | The language of the original text. |
| target | str | required | The language of the translated text. |
| batch_size | int | 32 | The number of samples to load at once. If set to None, it will process everything at once. |
| verbose | bool | False | Whether to display the progress bar for every batch processed. |
| generation_options | dict | optional | The keyword arguments passed to model.generate(), where model is the underlying transformers model. |

Note:
- Run print(dlt.utils.available_languages()) to see what's available.
- A smaller value is preferred for batch_size if your (video) RAM is limited.
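
A short sketch of both call styles, reusing the mt instance from the __init__ example above; "English" and "French" are assumed to appear in dlt.utils.available_languages() for the loaded weights.

```python
# Single string in, single string out.
print(mt.translate("Hello world!", source="English", target="French"))

# A list of strings is translated in batches; a smaller batch_size limits
# (video) RAM usage and verbose=True shows a progress bar per batch.
sentences = ["Hello world!", "How are you today?"]
translated = mt.translate(
    sentences, source="English", target="French",
    batch_size=8, verbose=True,
)
print(translated)
```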


get_transformers_model

dlt.TranslationModel.get_transformers_model(self)

Retrieve the underlying transformers model (e.g. mBART or M2M100, depending on what was loaded).


get_tokenizer

dlt.TranslationModel.get_tokenizer(self)

Retrieve the underlying Hugging Face tokenizer.
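
The two getters above return plain Hugging Face objects, so anything from the transformers library applies to them directly; a brief sketch:

```python
model = mt.get_transformers_model()
tokenizer = mt.get_tokenizer()

# These are regular transformers objects, e.g. the tokenizer is callable.
print(type(model).__name__)
print(tokenizer("Hello world!"))
```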


available_codes

dlt.TranslationModel.available_codes(self) -> List[str]

Returns all the language codes available for a given dlt.TranslationModel instance.


available_languages

dlt.TranslationModel.available_languages(self) -> List[str]

Returns all the available languages for a given dlt.TranslationModel instance.


get_lang_code_map

dlt.TranslationModel.get_lang_code_map(self) -> Dict[str, str]

Returns the language -> code dictionary for a given dlt.TranslationModel instance.
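
A sketch of the three instance-level helpers above; the names and code formats shown in the comments are illustrative and depend on whether the instance wraps mbart50 or m2m100 weights.

```python
print(mt.available_languages())   # e.g. ["Afrikaans", "Arabic", ...]
print(mt.available_codes())       # e.g. ["af", "ar", ...] for m2m100 weights
print(mt.get_lang_code_map())     # e.g. {"Afrikaans": "af", ...}
```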


save_obj

dlt.TranslationModel.save_obj(self, path: str = 'saved_model') -> None

Saves your model as a torch object and saves your tokenizer.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | "saved_model" | The directory where you want to save your model and tokenizer. |


load_obj

dlt.TranslationModel.load_obj(path: str = 'saved_model', **kwargs)

Initializes a dlt.TranslationModel from the torch object and tokenizer saved with dlt.TranslationModel.save_obj.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| path | str | "saved_model" | The directory where your torch model and tokenizer are stored. |
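
A save/load round trip, assuming mt is an existing dlt.TranslationModel instance and "saved_model" is a writable directory:

```python
# Persist the model weights (torch object) and the tokenizer.
mt.save_obj("saved_model")

# Later, restore a TranslationModel from that same directory.
mt_restored = dlt.TranslationModel.load_obj("saved_model")
```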



dlt.utils

get_lang_code_map

dlt.utils.get_lang_code_map(weights: str = 'mbart50') -> Dict[str, str]

Get a dictionary mapping each language to its code for a given model. The code format depends on the model you choose.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| weights | str | "mbart50" | The name of the model you are using. For example, "mbart50" is the multilingual BART Large with 50 languages available to use. |
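
A quick lookup sketch; the "fr_XX"-style code shown in the comment assumes mBART-style codes for the "mbart50" weights.

```python
lang2code = dlt.utils.get_lang_code_map("mbart50")
print(lang2code.get("French"))    # e.g. "fr_XX" for mbart50
```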


available_codes

dlt.utils.available_codes(weights: str = 'mbart50') -> List[str]

Get all the codes available for a given model. The code format will depend on the model you select.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| weights | str | "mbart50" | The name of the model you are using. For example, "mbart50" is the multilingual BART Large with 50 codes available to use. |


available_languages

dlt.utils.available_languages(weights: str = 'mbart50') -> List[str]

Get all the languages available for a given model.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| weights | str | "mbart50" | The name of the model you are using. For example, "mbart50" is the multilingual BART Large with 50 languages available to use. |
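
These module-level helpers work without loading a model; a brief sketch using the default "mbart50" weights (the "m2m100" shorthand is assumed to be accepted as well):

```python
print(dlt.utils.available_languages())          # e.g. ["Afrikaans", "Arabic", ...]
print(dlt.utils.available_codes("mbart50"))     # e.g. ["af_ZA", "ar_AR", ...]
```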