image2label

class data.image2label.image2label.CifarDataLayer(params, model, num_workers, worker_id)[source]

Bases: open_seq2seq.data.data_layer.DataLayer

build_graph()[source]

Here all TensorFlow graph construction should happen.

static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
get_size_in_samples()[source]

Should return the dataset size in samples, that is, the number of objects in the dataset. This method is used to calculate a valid epoch size. If this method is not defined, you will need to make sure that your evaluation dataset is created for only one epoch, and you will not be able to use the num_epochs parameter in the base config.

Returns:dataset size in samples.
Return type:int
input_tensors

Dictionary containing input tensors. This dictionary has to define the following keys: source_tensors, which should contain all tensors describing the input object (i.e. tensors that are passed to the encoder, e.g. input sequence and input length). When self.params['mode'] != "infer", the data layer should also define target_tensors, the list of all tensors related to the corresponding target object (i.e. tensors that are passed to the decoder and loss, e.g. target sequence and target length). Note that all tensors have to be created inside the self.build_graph() method.
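The expected shape of this dictionary can be sketched in plain Python. This is only a schematic: the placeholder strings stand in for the actual tf.Tensor objects that build_graph() must create, and make_input_tensors is a hypothetical helper, not part of the library.

```python
def make_input_tensors(mode):
    # Schematic of the input_tensors property; placeholder strings
    # stand in for real tf.Tensor objects from build_graph().
    input_tensors = {
        "source_tensors": ["<input image tensor>"],  # consumed by the encoder
    }
    if mode != "infer":
        # target_tensors is only defined outside inference mode
        input_tensors["target_tensors"] = ["<label tensor>"]  # consumed by the loss
    return input_tensors
```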

iterator

tf.data.Dataset iterator. Should be created by self.build_graph().

parse_record(raw_record, is_training, num_classes=10)[source]

Parse CIFAR-10 image and label from a raw record.

preprocess_image(image, is_training)[source]

Preprocess a single image of layout [height, width, depth].

class data.image2label.image2label.ImagenetDataLayer(params, model, num_workers, worker_id)[source]

Bases: open_seq2seq.data.data_layer.DataLayer

build_graph()[source]

Here all TensorFlow graph construction should happen.

static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
get_size_in_samples()[source]

Should return the dataset size in samples, that is, the number of objects in the dataset. This method is used to calculate a valid epoch size. If this method is not defined, you will need to make sure that your evaluation dataset is created for only one epoch, and you will not be able to use the num_epochs parameter in the base config.

Returns:dataset size in samples.
Return type:int
input_tensors

Dictionary containing input tensors. This dictionary has to define the following keys: source_tensors, which should contain all tensors describing the input object (i.e. tensors that are passed to the encoder, e.g. input sequence and input length). When self.params['mode'] != "infer", the data layer should also define target_tensors, the list of all tensors related to the corresponding target object (i.e. tensors that are passed to the decoder and loss, e.g. target sequence and target length). Note that all tensors have to be created inside the self.build_graph() method.

iterator

tf.data.Dataset iterator. Should be created by self.build_graph().

split_data(data)[source]

imagenet_preprocessing

Provides utilities to preprocess images. Training images are sampled using the provided bounding boxes, and subsequently cropped to the sampled bounding box. Images are additionally flipped randomly, then resized to the target output size (without aspect-ratio preservation). Images used during evaluation are resized (with aspect-ratio preservation) and centrally cropped. All images undergo mean color subtraction. Note that these steps are colloquially referred to as “ResNet preprocessing,” and they differ from “VGG preprocessing,” which does not use bounding boxes and instead does an aspect-preserving resize followed by random crop during training. (These both differ from “Inception preprocessing,” which introduces color distortion steps.)

data.image2label.imagenet_preprocessing._aspect_preserving_resize(image, resize_min)[source]

Resize images preserving the original aspect ratio.

Parameters:
  • image – A 3-D image Tensor.
  • resize_min – A python integer or scalar Tensor indicating the size of the smallest side after resize.
Returns:

A 3-D tensor containing the resized image.

Return type:

resized_image

data.image2label.imagenet_preprocessing._central_crop(image, crop_height, crop_width)[source]

Performs a central crop of the given image.

Parameters:
  • image – a 3-D image tensor
  • crop_height – the height of the image following the crop.
  • crop_width – the width of the image following the crop.
Returns:

3-D tensor with cropped image.
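The offset arithmetic behind a central crop can be sketched in plain Python (the helper name central_crop_offsets is hypothetical; the real function operates on tensors):

```python
def central_crop_offsets(height, width, crop_height, crop_width):
    # Top-left corner of a crop window centered in the image.
    offset_h = (height - crop_height) // 2
    offset_w = (width - crop_width) // 2
    return offset_h, offset_w
```

For a 480x640 image and a 224x224 crop, the window starts at row 128, column 208.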

data.image2label.imagenet_preprocessing._decode_crop_and_flip(image_buffer, bbox, num_channels)[source]

Crops the given image to a random part of the image and randomly flips it. We use the fused decode_and_crop op, which performs better than the two ops used separately in series, but note that this requires the image to be passed in as an un-decoded string Tensor.

Parameters:
  • image_buffer – scalar string Tensor representing the raw JPEG image buffer.
  • bbox – 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] where each coordinate is [0, 1) and the coordinates are arranged as [ymin, xmin, ymax, xmax].
  • num_channels – Integer depth of the image buffer for decoding.
Returns:

3-D tensor with cropped image.

data.image2label.imagenet_preprocessing._mean_image_subtraction_and_normalization(image, means, num_channels)[source]

Subtracts the given means from each image channel and divides by 127.5.

For example:

means = [123.68, 116.779, 103.939]
image = _mean_image_subtraction_and_normalization(image, means)

Note that the rank of image must be known.

Parameters:
  • image – a tensor of size [height, width, C].
  • means – a C-vector of values to subtract from each channel.
  • num_channels – number of color channels in the image that will be distorted.
Returns:

the centered and normalized image.

Raises:

ValueError – If the rank of image is unknown, if image has a rank other than three or if the number of channels in image doesn’t match the number of values in means.
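The per-channel arithmetic can be sketched for a single pixel in plain Python (normalize_pixel is a hypothetical illustration, not the library function, which operates on whole image tensors):

```python
CHANNEL_MEANS = [123.68, 116.779, 103.939]  # RGB means quoted in the docstring above

def normalize_pixel(pixel, means, scale=127.5):
    # pixel: per-channel values for one pixel, e.g. [R, G, B]
    if len(pixel) != len(means):
        raise ValueError("number of channels does not match the number of means")
    # subtract the channel mean, then divide by 127.5
    return [(p - m) / scale for p, m in zip(pixel, means)]
```

A pixel equal to the channel means maps to all zeros; values end up roughly in [-1, 1] for 8-bit inputs.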

data.image2label.imagenet_preprocessing._parse_example_proto(example_serialized)[source]

Parses an Example proto containing a training example of an image. The output of the build_image_data.py image preprocessing script is a dataset containing serialized Example protocol buffers. Each Example proto contains the following fields (values are included as examples):

image/height: 462
image/width: 581
image/colorspace: ‘RGB’
image/channels: 3
image/class/label: 615
image/class/synset: ‘n03623198’
image/class/text: ‘knee pad’
image/object/bbox/xmin: 0.1
image/object/bbox/xmax: 0.9
image/object/bbox/ymin: 0.2
image/object/bbox/ymax: 0.6
image/object/bbox/label: 615
image/format: ‘JPEG’
image/filename: ‘ILSVRC2012_val_00041207.JPEG’
image/encoded: <JPEG encoded string>
Parameters:example_serialized – scalar Tensor tf.string containing a serialized Example protocol buffer.
Returns:
  • image_buffer – Tensor tf.string containing the contents of a JPEG file.
  • label – Tensor tf.int32 containing the label.
  • bbox – 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] where each coordinate is [0, 1) and the coordinates are arranged as [ymin, xmin, ymax, xmax].
data.image2label.imagenet_preprocessing._resize_image(image, height, width)[source]

Simple wrapper around tf.image.resize_images. This is primarily to make sure we use the same ResizeMethod and other details each time.

Parameters:
  • image – A 3-D image Tensor.
  • height – The target height for the resized image.
  • width – The target width for the resized image.
Returns:

A 3-D tensor containing the resized image. The first two dimensions have the shape [height, width].

Return type:

resized_image

data.image2label.imagenet_preprocessing._smallest_size_at_least(height, width, resize_min)[source]

Computes a new shape with the smallest side equal to resize_min while preserving the original aspect ratio.

Parameters:
  • height – an int32 scalar tensor indicating the current height.
  • width – an int32 scalar tensor indicating the current width.
  • resize_min – A python integer or scalar Tensor indicating the size of the smallest side after resize.
Returns:
  • new_height – an int32 scalar tensor indicating the new height.
  • new_width – an int32 scalar tensor indicating the new width.
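The underlying arithmetic is a uniform rescale so the smaller side matches resize_min. A minimal sketch in plain Python integers (the real function works on int32 scalar tensors, and its rounding may differ slightly):

```python
def smallest_size_at_least(height, width, resize_min):
    # Scale factor that brings the smaller side to resize_min.
    scale = resize_min / min(height, width)
    # Apply the same scale to both sides to preserve aspect ratio.
    new_height = int(round(height * scale))
    new_width = int(round(width * scale))
    return new_height, new_width
```

For a 480x640 image with resize_min=256, the smaller side (480) becomes 256 and the longer side scales to 341.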

data.image2label.imagenet_preprocessing.parse_record(raw_record, is_training, image_size=224, num_classes=1000)[source]

Parses a record containing a training example of an image. The input record is parsed into a label and image, and the image is passed through preprocessing steps (cropping, flipping, and so on).

Parameters:
  • raw_record – scalar Tensor tf.string containing a serialized Example protocol buffer.
  • is_training – A boolean denoting whether the input is for training.
  • image_size (int) – size that images should be resized to.
  • num_classes (int) – number of output classes.
Returns:

Tuple with processed image tensor and one-hot-encoded label tensor.

data.image2label.imagenet_preprocessing.preprocess_image(image_buffer, bbox, output_height, output_width, num_channels, is_training=False)[source]

Preprocesses the given image. Preprocessing includes decoding, cropping, and resizing for both training and eval images. Training preprocessing, however, introduces some random distortion of the image to improve accuracy.

Parameters:
  • image_buffer – scalar string Tensor representing the raw JPEG image buffer.
  • bbox – 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] where each coordinate is [0, 1) and the coordinates are arranged as [ymin, xmin, ymax, xmax].
  • output_height – The height of the image after preprocessing.
  • output_width – The width of the image after preprocessing.
  • num_channels – Integer depth of the image buffer for decoding.
  • is_training – True if we’re preprocessing the image for training and False otherwise.
Returns:

A preprocessed image.
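The training and evaluation branches described above apply different step orderings. A schematic in plain Python, listing step names only (preprocessing_steps is a hypothetical illustration; each name stands for the corresponding helper documented above):

```python
def preprocessing_steps(is_training):
    # Ordered step names; each stands for a TF op in the real pipeline.
    if is_training:
        # random crop/flip via the fused decode-and-crop op, then resize
        steps = ["decode_crop_and_flip", "resize"]
    else:
        # deterministic: decode, aspect-preserving resize, central crop
        steps = ["decode", "aspect_preserving_resize", "central_crop"]
    # both branches finish with mean subtraction and normalization
    steps.append("mean_subtraction_and_normalization")
    return steps
```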