A tf.data.Dataset object that contains a potentially large set
of elements, where each element is a pair of (input_data, target). The
input_data is the raw input, such as an image or a piece of text, while
the target is the ground truth for that input, e.g. the
classification label of an image.
size
The size of the dataset. A tf.data.Dataset cannot always report its
length directly, since it is lazy-loaded and may be infinite.
Attributes
label_names
num_classes
size
Returns the size of the dataset.
Same functionality as calling len. See the len method definition for
more information.
Folder structure should be:
/
  images/
    .jpg
    ...
  labels.json
The labels.json annotations file should have the following format:
{
  "categories": [{"id": 0, "name": "background"}, ...],
  "images": [{"id": 0, "file_name": ".jpg"}, ...],
  "annotations": [{
    "id": 0,
    "image_id": 0,
    "category_id": 2,
    "bbox": [x-top left, y-top left, width, height]
  }, ...]
}
Note that category id 0 is reserved for the "background" class. It is
optional to include, but if included it must be set to "background".
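The following is a minimal sketch of a labels.json file in the format above, together with a check of the reserved id 0 rule. It uses only the Python standard library. The helper name check_categories, the file name example0.jpg, the category names, and the box values are hypothetical and chosen for illustration; this is not the library's own validation code.

```python
import json
import tempfile
from pathlib import Path


def check_categories(labels: dict) -> dict:
    """Return an id -> name mapping, enforcing the reserved id 0 rule."""
    label_map = {c["id"]: c["name"] for c in labels["categories"]}
    # Category id 0 is optional, but if present it must be "background".
    if 0 in label_map and label_map[0] != "background":
        raise ValueError(
            f"category id 0 must be 'background', got {label_map[0]!r}")
    return label_map


# Hypothetical sample content; file and category names are illustrative.
labels = {
    "categories": [
        {"id": 0, "name": "background"},
        {"id": 1, "name": "cat"},
        {"id": 2, "name": "dog"},
    ],
    "images": [{"id": 0, "file_name": "example0.jpg"}],
    "annotations": [{
        "id": 0,
        "image_id": 0,
        "category_id": 2,
        # [x of top-left corner, y of top-left corner, width, height]
        "bbox": [10, 20, 100, 50],
    }],
}

# Write the file next to an images/ folder, as in the layout above.
data_dir = Path(tempfile.mkdtemp())
(data_dir / "images").mkdir()
(data_dir / "labels.json").write_text(json.dumps(labels, indent=2))

# Read it back and build the id -> name mapping.
loaded = json.loads((data_dir / "labels.json").read_text())
label_map = check_categories(loaded)
```

Running the check before building the dataset surfaces a mislabeled id 0 early, rather than after the images have been processed.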
Args
data_dir
Name of the directory containing the data files.
max_num_images
Max number of images to process.
cache_dir
The cache directory in which to save TFRecord and metadata files. The
TFRecord files are a standardized format for training object detection
models, while the metadata file stores information such as the dataset size
and the mapping from label id to label name. If cache_dir is not set, a
temporary folder is created; it is not removed automatically
after training, so it can be reused later.
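The cache_dir fallback can be pictured with the sketch below. The helper resolve_cache_dir is hypothetical and only illustrates the described behavior with the standard library; it is an assumption about the pattern, not the library's implementation.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_cache_dir(cache_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: pick the directory for cached files."""
    if cache_dir is None:
        # mkdtemp creates the folder and never deletes it on its own,
        # so files written here survive until removed explicitly.
        return Path(tempfile.mkdtemp())
    path = Path(cache_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path


# No cache_dir given: a fresh temporary folder is created and kept.
temp_cache = resolve_cache_dir()

# Explicit cache_dir: the folder is created if needed and then reused.
explicit_cache = resolve_cache_dir(str(temp_cache / "object_detection"))
```

Passing an explicit cache_dir makes the location predictable, which is convenient when the cached files should be shared between runs.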
Returns
Dataset containing images and labels and other related info.
Raises
ValueError
If the input data directory is empty.
ValueError
If the label_name for id 0 is set to something other than
the 'background' class.
Folder structure should be:
/
  images/
    .jpg
    ...
  Annotations/
    .xml
    ...
Each .xml annotation file should have the following format:
file0.jpg
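The XML layout itself is not spelled out in this section. The sketch below assumes the widely used PASCAL VOC layout (a filename element plus object elements, each holding a name and a bndbox with xmin, ymin, xmax, ymax); that layout, the object name, and the box values are assumptions for illustration. It parses one annotation with the standard library and converts the corner coordinates to the [x, y, width, height] form used by the labels.json format.

```python
import xml.etree.ElementTree as ET

# Assumed PASCAL VOC-style annotation; object name and box are made up.
SAMPLE_XML = """
<annotation>
  <filename>file0.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox>
      <xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>70</ymax>
    </bndbox>
  </object>
</annotation>
"""


def parse_annotation(xml_text: str) -> dict:
    """Parse one annotation into a file name and a list of boxes."""
    root = ET.fromstring(xml_text.strip())
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append({
            "name": obj.findtext("name"),
            # Convert corner coordinates to [x, y, width, height].
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
        })
    return {"file_name": root.findtext("filename"), "objects": objects}


parsed = parse_annotation(SAMPLE_XML)
```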
Args
data_dir
Name of the directory containing the data files.
max_num_images
Max number of images to process.
cache_dir
The cache directory in which to save TFRecord and metadata files. The
TFRecord files are a standardized format for training object detection
models, while the metadata file stores information such as the dataset size
and the mapping from label id to label name. If cache_dir is not set, a
temporary folder is created; it is not removed automatically
after training, so it can be reused later.
Returns
Dataset containing images and labels and other related info.
If size is not set, this method falls back to the len method
of the tf.data.Dataset in self._dataset. Calling len on a
tf.data.Dataset instance may raise a TypeError because the dataset may
be lazy-loaded with an unknown size or have infinite size.
In most cases, however, when an instance of this class is created by helper
functions like 'from_folder', the size of the dataset is computed in advance,
and the _size instance variable is already set.
Raises
TypeError if self._size is not set and the cardinality of self._dataset
is INFINITE_CARDINALITY or UNKNOWN_CARDINALITY.
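The fallback order described above can be sketched in plain Python. The classes SizedDataset and UnknownSizeSource are hypothetical stand-ins (the latter imitates a tf.data.Dataset whose length cannot be determined); the sketch avoids importing TensorFlow and is not the class's actual source.

```python
from typing import Optional


class UnknownSizeSource:
    """Stand-in for a dataset whose length cannot be determined."""

    def __len__(self) -> int:
        raise TypeError("The dataset length is unknown.")


class SizedDataset:
    """Hypothetical wrapper mirroring the size fallback described above."""

    def __init__(self, dataset, size: Optional[int] = None):
        self._dataset = dataset
        self._size = size

    def __len__(self) -> int:
        # Prefer the precomputed size when it was provided.
        if self._size is not None:
            return self._size
        # Otherwise defer to the wrapped dataset, which may raise TypeError.
        return len(self._dataset)


# Size set in advance: the wrapped dataset is never asked for its length.
precomputed = SizedDataset(UnknownSizeSource(), size=3)

# Size not set: the length comes from the wrapped object.
delegated = SizedDataset([10, 20])

# Size not set and the wrapped length is unknown: TypeError propagates.
try:
    len(SizedDataset(UnknownSizeSource()))
    unknown_raises = False
except TypeError:
    unknown_raises = True
```

Storing the size when the dataset is built keeps len cheap and avoids the error path for datasets created from files on disk.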