Dataset library for text classifier.
Inherits From: Dataset
mediapipe_model_maker.text_classifier.Dataset(
dataset: tf.data.Dataset, size: int, label_names: List[str]
)
Methods
from_csv
@classmethod
from_csv(
filename: str,
csv_params: mediapipe_model_maker.text_classifier.CSVParams,
shuffle: bool = True
) -> 'Dataset'
Loads text with labels from a CSV file.
Args | |
---|---|
filename | Name of the CSV file. |
csv_params | Parameters used for reading the CSV file. |
shuffle | If True, randomly shuffle the data. |
Returns | |
---|---|
Dataset containing (text, label) pairs and other related info. |
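To make the shape of the loaded data concrete, here is a minimal plain-Python sketch of reading (text, label) pairs from CSV content. It is not the library's implementation: the helper `load_text_label_pairs`, the column names `text` and `label`, the `seed` argument, and the choice to index labels in sorted order are all assumptions made for illustration.

```python
import csv
import io
import random


def load_text_label_pairs(csv_text, text_column, label_column,
                          shuffle=True, seed=0):
    """Hypothetical helper: read (text, label_index) pairs from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if shuffle:
        # Shuffle whole rows so each text stays paired with its label.
        random.Random(seed).shuffle(rows)
    # Assumption of this sketch: sorted label names define the label indices.
    label_names = sorted({row[label_column] for row in rows})
    index_of = {name: i for i, name in enumerate(label_names)}
    pairs = [(row[text_column], index_of[row[label_column]]) for row in rows]
    return pairs, label_names


# Hypothetical CSV content with a header row.
sample = (
    "text,label\n"
    "great movie,positive\n"
    "terrible plot,negative\n"
    "loved it,positive\n"
)
pairs, label_names = load_text_label_pairs(sample, "text", "label",
                                           shuffle=False)
```

With `shuffle=False` the pairs keep the order of the file, which makes the result easier to inspect when debugging a new CSV.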
gen_tf_dataset
gen_tf_dataset(
batch_size: int = 1,
is_training: bool = False,
shuffle: bool = False,
preprocess: Optional[Callable[..., Any]] = None,
drop_remainder: bool = False
) -> tf.data.Dataset
Generates a batched tf.data.Dataset for training/evaluation.
Args | |
---|---|
batch_size | An integer. The returned dataset is batched by this size. |
is_training | A boolean. When True, the returned dataset is optionally shuffled and repeated as an endless dataset. |
shuffle | A boolean. When True, the returned dataset is shuffled to introduce randomness during model training. |
preprocess | A function taking three arguments, in order: feature, label, and the boolean is_training. |
drop_remainder | A boolean. Whether the final batch is dropped when it has fewer than batch_size elements. |
Returns | |
---|---|
A TF dataset ready to be consumed by a Keras model. |
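The effect of `batch_size` and `drop_remainder` can be illustrated without TensorFlow. The sketch below uses a plain list and a hypothetical `batch` helper; it only mirrors the batching semantics described above and is not how the library builds its `tf.data.Dataset`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batch(items: Sequence[T], batch_size: int,
          drop_remainder: bool = False) -> List[List[T]]:
    """Hypothetical helper: group items into consecutive batches."""
    batches = [list(items[i:i + batch_size])
               for i in range(0, len(items), batch_size)]
    # Optionally discard a final batch that is smaller than batch_size.
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


examples = list(range(7))
kept = batch(examples, 3)                          # last batch has 1 element
dropped = batch(examples, 3, drop_remainder=True)  # short batch discarded
```

Dropping the remainder gives every batch the same size, at the cost of skipping a few examples per pass.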
split
split(
fraction: float
) -> Tuple[ds._DatasetT, ds._DatasetT]
Splits dataset into two sub-datasets with the given fraction.
Primarily used for splitting the data set into training and testing sets.
Args | |
---|---|
fraction | A float giving the fraction of the original data assigned to the first returned sub-dataset. |
Returns | |
---|---|
The two sub-datasets produced by the split. |
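As a rough illustration of what a fractional split means, the following plain-Python sketch divides a list at a cut point derived from `fraction`. The helper `split_by_fraction` is hypothetical, and truncating the cut point with `int()` is an assumption of this sketch; the rounding the library applies is not stated here.

```python
def split_by_fraction(items, fraction):
    """Hypothetical helper: first part holds about `fraction` of the items."""
    # Assumption: the cut point is truncated toward zero.
    cut = int(len(items) * fraction)
    return items[:cut], items[cut:]


data = list(range(10))
train, test = split_by_fraction(data, 0.8)
```

Every element lands in exactly one of the two parts, so the sizes of the parts always add up to the size of the original.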
__len__
__len__()
Returns the number of elements in the dataset.