Attention: This MediaPipe Solutions Preview is an early release.

mediapipe_model_maker.text_classifier.preprocessor.AverageWordEmbeddingClassifierPreprocessor

Preprocessor for an Average Word Embedding model.

Takes (text, label) data and applies regex tokenization and padding to the text to generate (token IDs, label) data.

Attributes
seq_len Length of the input sequence to the model.
do_lower_case Whether text inputs should be converted to lowercase.
vocab Vocabulary of tokens used by the model.
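
To make the tokenize-and-pad step concrete, here is a minimal standalone sketch in plain Python. It is not the library's implementation: the helper names `regex_tokenize` and `tokenize_and_pad`, the splitting pattern, the toy vocabulary, and the choice to prepend the START token are all assumptions made for illustration.

```python
import re

# Special tokens as listed on this page.
PAD, START, UNKNOWN = "<PAD>", "<START>", "<UNKNOWN>"


def regex_tokenize(text, do_lower_case=True):
    """Split text into tokens with a regex (hypothetical helper)."""
    if do_lower_case:
        text = text.lower()
    # Assumed pattern: split on runs of characters that are neither
    # word characters nor apostrophes, then drop empty strings.
    return [tok for tok in re.split(r"[^\w']+", text.strip()) if tok]


def tokenize_and_pad(text, vocab, seq_len, do_lower_case=True):
    """Map text to exactly seq_len token IDs (hypothetical helper)."""
    tokens = regex_tokenize(text, do_lower_case)
    # Assumption: the sequence begins with START, and any token missing
    # from the vocabulary maps to the UNKNOWN ID.
    ids = [vocab[START]] + [vocab.get(tok, vocab[UNKNOWN]) for tok in tokens]
    ids = ids[:seq_len]  # truncate sequences that are too long
    return ids + [vocab[PAD]] * (seq_len - len(ids))  # pad short ones


# Toy vocabulary, invented for this example.
vocab = {PAD: 0, START: 1, UNKNOWN: 2, "great": 3, "movie": 4}
token_ids = tokenize_and_pad("Great movie, really!", vocab, seq_len=6)
print(token_ids)
```

In this sketch every output has length `seq_len`, so a batch of texts can be stacked into a fixed-shape tensor regardless of the original text lengths.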

Methods

get_vocab

Returns the vocab of the AverageWordEmbeddingClassifierPreprocessor.

preprocess

Preprocesses data into input for an Average Word Embedding model.

Args
dataset Stores (text, label) data.

Returns
Dataset containing (token IDs, label) data.

Class Variables
PAD '<PAD>'
START '<START>'
UNKNOWN '<UNKNOWN>'
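
One plausible way these special tokens fit into a vocabulary is sketched below. This is an illustration under stated assumptions, not the library's code: `build_vocab` is a hypothetical helper, and both the ordering of the reserved tokens (IDs 0, 1, 2) and the convention that `vocab_size` counts only the non-special tokens are assumptions.

```python
import collections
import re

# Special tokens as listed on this page.
PAD, START, UNKNOWN = "<PAD>", "<START>", "<UNKNOWN>"


def build_vocab(texts, vocab_size, do_lower_case=True):
    """Build a token -> ID mapping from raw texts (hypothetical helper)."""
    counter = collections.Counter()
    for text in texts:
        if do_lower_case:
            text = text.lower()
        # Same assumed splitting rule as a simple regex tokenizer.
        counter.update(tok for tok in re.split(r"[^\w']+", text.strip()) if tok)
    # Keep the vocab_size most frequent tokens.
    frequent = [tok for tok, _ in counter.most_common(vocab_size)]
    # Assumption: special tokens are reserved at the lowest IDs.
    return {tok: i for i, tok in enumerate([PAD, START, UNKNOWN] + frequent)}


vocab = build_vocab(["good movie", "bad movie", "good good plot"], vocab_size=2)
print(vocab)
```

Reserving fixed low IDs for the special tokens keeps padding and out-of-vocabulary handling independent of whatever training texts produced the rest of the vocabulary.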