Preprocessor for an Average Word Embedding model.
mediapipe_model_maker.text_classifier.preprocessor.AverageWordEmbeddingClassifierPreprocessor(
seq_len: int, do_lower_case: bool, texts: Sequence[str], vocab_size: int
)
Takes (text, label) data and applies regex tokenization and padding to the
text to generate (token IDs, label) data.
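The sketch below shows what "regex tokenization and padding" could look like for a single text. It is written in pure Python and does not reproduce the library's code. Several details are assumptions: the split regex, prepending the START token, mapping out-of-vocabulary tokens to UNKNOWN, and truncating long inputs to seq_len. The special token strings match the class variables documented on this page, and the helper names tokenize and tokenize_and_pad are hypothetical.

```python
import re
from typing import List, Mapping

# Assumption: split on runs of characters that are neither word characters
# nor apostrophes. The library's actual regex may differ.
_TOKEN_SPLIT = re.compile(r"[^\w']+")

# Special token strings, as listed under Class Variables.
PAD, START, UNKNOWN = "<PAD>", "<START>", "<UNKNOWN>"


def tokenize(text: str, do_lower_case: bool = True) -> List[str]:
    """Regex-tokenizes text, dropping empty strings left by the split."""
    if do_lower_case:
        text = text.lower()
    return [tok for tok in _TOKEN_SPLIT.split(text.strip()) if tok]


def tokenize_and_pad(text: str, vocab: Mapping[str, int], seq_len: int,
                     do_lower_case: bool = True) -> List[int]:
    """Maps text to exactly seq_len token IDs: START, word IDs, then PAD."""
    unknown_id = vocab[UNKNOWN]
    ids = [vocab[START]] + [vocab.get(tok, unknown_id)
                            for tok in tokenize(text, do_lower_case)]
    ids = ids[:seq_len]  # assumed: truncate inputs longer than seq_len
    return ids + [vocab[PAD]] * (seq_len - len(ids))  # pad short inputs


vocab = {PAD: 0, START: 1, UNKNOWN: 2, "great": 3, "movie": 4}
print(tokenize_and_pad("Great movie, great cast!", vocab, seq_len=8))
# → [1, 3, 4, 3, 2, 0, 0, 0]  ("cast" is out of vocabulary)
```

Every output has the same length, which lets the resulting (token IDs, label) pairs be batched into a fixed-shape tensor.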
Attributes
| Attribute | Description |
|---|---|
| seq_len | Length of the input sequence to the model. |
| do_lower_case | Whether text inputs should be converted to lowercase. |
| vocab | Vocabulary of tokens used by the model. |
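The constructor takes texts and vocab_size, which suggests the vocab attribute is built from token frequencies in texts. The sketch below shows one way to build such a vocabulary, under stated assumptions. The three special tokens take the first IDs in the order PAD, START, UNKNOWN. The vocab_size most frequent tokens follow them, so in this sketch vocab_size does not count the special tokens. The tokenizer regex is also assumed. None of these choices is confirmed by this page, and build_vocab is a hypothetical helper.

```python
import collections
import re
from typing import Dict, Sequence

_TOKEN_SPLIT = re.compile(r"[^\w']+")  # assumed tokenizer regex
# Assumed ID order for the special tokens listed under Class Variables.
SPECIAL_TOKENS = ["<PAD>", "<START>", "<UNKNOWN>"]


def build_vocab(texts: Sequence[str], vocab_size: int,
                do_lower_case: bool = True) -> Dict[str, int]:
    """Keeps the vocab_size most frequent tokens after the special tokens."""
    counter = collections.Counter()
    for text in texts:
        if do_lower_case:
            text = text.lower()
        counter.update(tok for tok in _TOKEN_SPLIT.split(text.strip()) if tok)
    # most_common orders equal counts by first occurrence.
    words = [word for word, _ in counter.most_common(vocab_size)]
    return {token: i for i, token in enumerate(SPECIAL_TOKENS + words)}


print(build_vocab(["good movie", "bad movie", "Movie night"], vocab_size=2))
# → {'<PAD>': 0, '<START>': 1, '<UNKNOWN>': 2, 'movie': 3, 'good': 4}
```

Capping the vocabulary at the most frequent tokens bounds the size of the embedding table. Rare words then fall back to the UNKNOWN ID.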
Methods
get_vocab
get_vocab() -> Mapping[str, int]
Returns the vocabulary of the AverageWordEmbeddingClassifierPreprocessor as a mapping from token string to token ID.
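get_vocab returns a plain Mapping[str, int], so inverting it is enough to turn token IDs back into tokens, for example when inspecting preprocessed examples. The sketch below uses a hand-written vocabulary in place of a real get_vocab() result. The decode helper is hypothetical and not part of the library.

```python
from typing import List, Mapping, Sequence


def decode(token_ids: Sequence[int], vocab: Mapping[str, int],
           pad_token: str = "<PAD>") -> List[str]:
    """Maps token IDs back to tokens, dropping padding."""
    id_to_token = {i: token for token, i in vocab.items()}
    return [id_to_token[i] for i in token_ids if id_to_token[i] != pad_token]


# Hypothetical stand-in for preprocessor.get_vocab().
vocab = {"<PAD>": 0, "<START>": 1, "<UNKNOWN>": 2, "great": 3}
print(decode([1, 3, 2, 0, 0], vocab))
# → ['<START>', 'great', '<UNKNOWN>']
```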
preprocess
preprocess(
dataset: mediapipe_model_maker.text_classifier.Dataset
) -> mediapipe_model_maker.text_classifier.Dataset
Preprocesses data into input for an Average Word Embedding model.
Args
| Argument | Description |
|---|---|
| dataset | A Dataset that stores (text, label) data. |

Returns
A Dataset containing (token IDs, label) data.
Class Variables
| Name | Value |
|---|---|
| PAD | '<PAD>' |
| START | '<START>' |
| UNKNOWN | '<UNKNOWN>' |