data
data_layer
Data layer classes
-
class data.data_layer.DataLayer(params, model, num_workers, worker_id)
Bases: object
Abstract class from which all data layers must inherit.
-
__init__(params, model, num_workers, worker_id)
Data layer constructor. The TensorFlow graph should not be created here, but rather in the self.build_graph() method.
Parameters:
- params (dict) – parameters describing the data layer. All supported parameters are listed in the get_required_params() and get_optional_params() functions.
- model (instance of a class derived from Model) – parent model that created this data layer. Could be None if no model access is required for the use case.
- num_workers (int) – number of Horovod processes, or number of GPUs if Horovod is not used.
- worker_id (int) – Horovod process id, or GPU id if Horovod is not used.
Config parameters:
- shuffle (bool) – whether to shuffle the dataset after each epoch. Typically True for training and False for inference and evaluation.
- dtype – data dtype. Could be either tf.float16 or tf.float32.
-
create_feed_dict(model_in)
A function that must be defined for data layers that support interactive inference. The input model_in is an abstract data element whose format is defined by the data layer. The intended use is for the user to build model_in and pass it in from a Jupyter notebook. Given model_in, the data layer must preprocess the raw data and create the feed dict that supplies values for the placeholders defined in create_interactive_placeholders().
-
create_interactive_placeholders()
A function that must be defined for data layers that support interactive inference. It is intended to create the placeholders that are stored in self._input_tensors, which in turn are passed to the model.
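The contract between the two interactive-inference methods can be sketched in plain Python. This is a minimal illustration, not the library's code: the Placeholder class stands in for a graph placeholder, InteractiveSketchLayer is a hypothetical data layer, and treating model_in as a whitespace-separated string is an assumption made only for this example.

```python
class Placeholder:
    """Stand-in for a graph placeholder; not a TensorFlow object."""

    def __init__(self, name):
        self.name = name


class InteractiveSketchLayer:
    """Hypothetical data layer supporting interactive inference."""

    def __init__(self, vocab):
        self.vocab = vocab          # token -> int id
        self._input_tensors = {}

    def create_interactive_placeholders(self):
        # Create the placeholders once and expose them to the model
        # through self._input_tensors.
        self._tokens = Placeholder("tokens")
        self._length = Placeholder("length")
        self._input_tensors["source_tensors"] = [self._tokens, self._length]

    def create_feed_dict(self, model_in):
        # Assumption: model_in is a raw string of whitespace-separated tokens.
        ids = [self.vocab[token] for token in model_in.split()]
        # Map each placeholder to a batch of size one.
        return {self._tokens: [ids], self._length: [len(ids)]}


layer = InteractiveSketchLayer({"a": 0, "b": 1})
layer.create_interactive_placeholders()
feed_dict = layer.create_feed_dict("a b a")
```

The feed dict keys are the same objects that were placed in self._input_tensors, so whatever the model reads from the input tensors is exactly what gets fed.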
-
static get_optional_params()
Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included in the params parameter of the class __init__() method.
Return type: dict
-
static get_required_params()
Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included in the params parameter of the class __init__() method.
Return type: dict
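One way to read get_required_params() and get_optional_params() is as a declarative schema that the constructor checks params against. The sketch below illustrates that pattern under stated assumptions: SketchDataLayer and check_params are hypothetical names, the name-to-type description format is assumed, and classifying mode as required and shuffle as optional is a guess made for illustration only.

```python
def check_params(params, required, optional):
    """Hypothetical helper: validate params against the two descriptions."""
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError("Missing required parameters: {}".format(missing))
    allowed = dict(required)
    allowed.update(optional)
    for name, value in params.items():
        if name not in allowed:
            raise ValueError("Unknown parameter: {}".format(name))
        if not isinstance(value, allowed[name]):
            raise ValueError("Parameter {} has the wrong type".format(name))


class SketchDataLayer:
    """Stand-in class for illustration; not the library's DataLayer."""

    @staticmethod
    def get_required_params():
        # Assumed format: parameter name -> expected Python type.
        return {"mode": str}

    @staticmethod
    def get_optional_params():
        return {"shuffle": bool}

    def __init__(self, params, model=None, num_workers=1, worker_id=0):
        check_params(params, self.get_required_params(),
                     self.get_optional_params())
        self.params = params
        self.model = model
        self.num_workers = num_workers
        self.worker_id = worker_id


layer = SketchDataLayer({"mode": "train", "shuffle": True})

try:
    SketchDataLayer({"shuffle": True})   # "mode" is missing
    rejected = False
except ValueError:
    rejected = True
```

Keeping both descriptions static lets a config be validated before any instance or graph exists.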
-
get_size_in_samples()
Should return the dataset size in samples, that is, the number of objects in the dataset. This method is used to calculate a valid epoch size. If this method is not defined, you must make sure that the evaluation dataset is created for exactly one epoch, and you will not be able to use the num_epochs parameter in the base config.
Returns: dataset size in samples.
Return type: int
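As a rough illustration of why the dataset size is needed, the sketch below converts a sample count into training steps. The formula is an assumption, not taken from the library: it supposes that every step consumes one batch per worker and that a trailing incomplete batch is dropped. The function names are hypothetical.

```python
def steps_per_epoch(size_in_samples, batch_size_per_worker, num_workers):
    """Number of full global batches in one pass over the dataset."""
    global_batch_size = batch_size_per_worker * num_workers
    # Assumption: an incomplete final batch is dropped.
    return size_in_samples // global_batch_size


def total_steps(num_epochs, size_in_samples, batch_size_per_worker, num_workers):
    """Translate an epoch count into a step count."""
    return num_epochs * steps_per_epoch(
        size_in_samples, batch_size_per_worker, num_workers)


epoch_steps = steps_per_epoch(1000, batch_size_per_worker=32, num_workers=4)
train_steps = total_steps(3, 1000, batch_size_per_worker=32, num_workers=4)
```

Without a known dataset size there is no way to perform this conversion, which is consistent with num_epochs being unavailable when get_size_in_samples() is not defined.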
-
input_tensors
Dictionary containing input tensors. This dictionary has to define the key source_tensors, which should contain all tensors describing the input object (i.e. tensors that are passed to the encoder, e.g. input sequence and input length). When self.params['mode'] != "infer", the data layer should also define target_tensors, which is the list of all tensors related to the corresponding target object (i.e. tensors that are passed to the decoder and loss, e.g. target sequence and target length). Note that all tensors have to be created inside the self.build_graph() method.
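The expected layout of the dictionary can be sketched as follows. Plain Python lists stand in for tensors here, and build_input_tensors is a hypothetical helper used only to show which keys are present in each mode.

```python
def build_input_tensors(mode, source_sequence, source_length,
                        target_sequence=None, target_length=None):
    """Assemble the input_tensors dictionary for the given mode."""
    # source_tensors is always required.
    tensors = {"source_tensors": [source_sequence, source_length]}
    # target_tensors is only defined outside of inference.
    if mode != "infer":
        tensors["target_tensors"] = [target_sequence, target_length]
    return tensors


train_tensors = build_input_tensors(
    "train", [[4, 8, 15]], [3], target_sequence=[[16, 23]], target_length=[2])
infer_tensors = build_input_tensors("infer", [[4, 8, 15]], [3])
```

In a real data layer the list elements would be tensors created inside self.build_graph() rather than Python lists.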
-
iterator
tf.data.Dataset iterator. Should be created by self.build_graph().
-
params
Parameters used to construct the data layer (dictionary).
-
utils
-
data.utils.load_pre_existing_vocabulary(path, min_idx=0, read_chars=False)
Loads pre-existing vocabulary into memory.
The vocabulary file should contain one token per line, optionally followed by a token count on the same line, which is ignored. Example:
    a 1234
    b 4321
    c 32342
    d
    e
    word 234
Parameters:
- path (str) – path to vocabulary.
- min_idx (int, optional) – minimum id to assign for a token.
- read_chars (bool, optional) – whether to read only the first symbol of the line.
Returns: vocabulary dictionary mapping tokens (chars/words) to int ids.
Return type: dict
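The documented behaviour can be approximated with the sketch below. It is an independent illustration, not the library's source: load_vocab_sketch is a hypothetical name, and assigning sequential ids starting at min_idx, splitting on whitespace, and skipping blank lines are all assumptions.

```python
import os
import tempfile


def load_vocab_sketch(path, min_idx=0, read_chars=False):
    """Map each token in the file to an int id, ignoring any counts."""
    vocab = {}
    next_id = min_idx
    with open(path, encoding="utf-8") as vocab_file:
        for line in vocab_file:
            if not line.strip():
                continue  # assumption: blank lines are skipped
            # With read_chars, keep only the first symbol of the line;
            # otherwise keep the first whitespace-separated field.
            token = line[0] if read_chars else line.split()[0]
            vocab[token] = next_id
            next_id += 1
    return vocab


# Write the example vocabulary to a temporary file and load it.
with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("a 1234\nb 4321\nc 32342\nd\ne\nword 234\n")

vocab = load_vocab_sketch(tmp.name, min_idx=2)
char_vocab = load_vocab_sketch(tmp.name, read_chars=True)
os.remove(tmp.name)
```

Starting ids at a non-zero min_idx leaves the lower ids free, for example for special tokens such as padding or end-of-sequence markers.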