Parses a stream into a set of defined tokens, one at a time. The different types of tokens that can be found are numbers, identifiers, quoted strings, and different comment styles. The class can be used for limited processing of source code of programming languages like Java, although it is nowhere near a full parser.
Constant Summary
| int | TT_EOF | The constant representing the end of the stream. |
| int | TT_EOL | The constant representing the end of the line. |
| int | TT_NUMBER | The constant representing a number token. |
| int | TT_WORD | The constant representing a word token. |
Field Summary
| public double | nval | Contains a number if the current token is a number (ttype == TT_NUMBER). |
| public String | sval | Contains a string if the current token is a word (ttype == TT_WORD). |
| public int | ttype | After calling nextToken(), ttype contains the type of token that has been read. |
Public Constructor Summary
| StreamTokenizer(InputStream is) | This constructor is deprecated. Use StreamTokenizer(Reader) |
| StreamTokenizer(Reader r) | Constructs a new StreamTokenizer with r as source reader. |
Public Method Summary
| void | commentChar(int ch) | Specifies that the character ch shall be treated as a comment character. |
| void | eolIsSignificant(boolean flag) | Specifies whether the end of a line is significant and should be returned as TT_EOL in ttype by this tokenizer. |
| int | lineno() | Returns the current line number. |
| void | lowerCaseMode(boolean flag) | Specifies whether word tokens should be converted to lower case when they are stored in sval. |
| int | nextToken() | Parses the next token from this tokenizer's source stream or reader. |
| void | ordinaryChar(int ch) | Specifies that the character ch shall be treated as an ordinary character by this tokenizer. |
| void | ordinaryChars(int low, int hi) | Specifies that the characters in the range from low to hi shall be treated as ordinary characters by this tokenizer. |
| void | parseNumbers() | Specifies that this tokenizer shall parse numbers. |
| void | pushBack() | Indicates that the current token should be pushed back and returned again the next time nextToken() is called. |
| void | quoteChar(int ch) | Specifies that the character ch shall be treated as a quote character. |
| void | resetSyntax() | Specifies that all characters shall be treated as ordinary characters. |
| void | slashSlashComments(boolean flag) | Specifies whether "slash-slash" (C++-style) comments shall be recognized. |
| void | slashStarComments(boolean flag) | Specifies whether "slash-star" (C-style) comments shall be recognized. |
| String | toString() | Returns the state of this tokenizer in a readable format. |
| void | whitespaceChars(int low, int hi) | Specifies that the characters in the range from low to hi shall be treated as whitespace characters by this tokenizer. |
| void | wordChars(int low, int hi) | Specifies that the characters in the range from low to hi shall be treated as word characters by this tokenizer. |
Constants
public static final int TT_EOF
The constant representing the end of the stream.
public static final int TT_EOL
The constant representing the end of the line.
public static final int TT_NUMBER
The constant representing a number token.
public static final int TT_WORD
The constant representing a word token.
Fields
public double nval
Contains a number if the current token is a number (ttype ==
TT_NUMBER).
public String sval
Contains a string if the current token is a word (ttype ==
TT_WORD).
public int ttype
After calling nextToken(), ttype contains the type of
token that has been read. When a single character is read, its value
converted to an integer is stored in ttype. For a quoted string,
the value is the quote character. Otherwise, its value is one of the
following:
- TT_WORD - the token is a word.
- TT_NUMBER - the token is a number.
- TT_EOL - the end of line has been reached. Returned only if eolIsSignificant is true.
- TT_EOF - the end of the stream has been reached.
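The ttype values listed above can be dispatched on with a switch. The following is a minimal sketch, not a canonical usage pattern; the class name TokenTypeDemo, the method describe, and the output labels are hypothetical and chosen only for illustration. It assumes the tokenizer's default syntax.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: labels each token according to its ttype.
final class TokenTypeDemo {
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + st.nval);
                        break;
                    case '"':
                    case '\'':
                        // For a quoted string, ttype is the quote character
                        // and sval holds the string body.
                        out.add("QUOTED:" + st.sval);
                        break;
                    default:
                        // Any other single character is returned as its int value.
                        out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            // A StringReader is not expected to fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [WORD:x, CHAR:=, NUMBER:42.0, CHAR:+, QUOTED:hi]
        System.out.println(describe("x = 42 + \"hi\""));
    }
}
```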
Public Constructors
public StreamTokenizer (InputStream is)
This constructor is deprecated.
Use StreamTokenizer(Reader)
Constructs a new StreamTokenizer with is as source input
stream. This constructor is deprecated; instead, the constructor that
takes a Reader as an argument should be used.
Parameters
| is | the source stream from which to parse tokens. |
|---|
Throws
| NullPointerException | if is is null. |
|---|
public StreamTokenizer (Reader r)
Constructs a new StreamTokenizer with r as source reader.
The tokenizer's initial state is as follows:
- All byte values 'A' through 'Z', 'a' through 'z', and '\u00A0' through '\u00FF' are considered to be alphabetic.
- All byte values '\u0000' through '\u0020' are considered to be white space.
- '/' is a comment character.
- Single quote '\'' and double quote '"' are string quote characters.
- Numbers are parsed.
- End of lines are considered to be white space rather than separate tokens.
- C-style and C++-style comments are not recognized.
Parameters
| r | the source reader from which to parse tokens. |
|---|
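The initial state described above can be observed directly. This sketch assumes a freshly constructed tokenizer with no further configuration; DefaultSyntaxDemo and tokens are hypothetical names. It shows that quoted strings arrive in sval, that a leading minus sign and a decimal point are folded into a number, and that a lone '/' discards the rest of the line.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: renders every token of an unconfigured tokenizer as text.
final class DefaultSyntaxDemo {
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else if (st.sval != null) {
                    out.add(st.sval);               // word or quoted string
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [alpha, b c, -1.5, omega]
        System.out.println(tokens("alpha 'b c' -1.5 / skipped\nomega"));
    }
}
```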
Public Methods
public void commentChar (int ch)
Specifies that the character ch shall be treated as a comment
character.
Parameters
| ch | the character to be considered a comment character. |
|---|
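As an illustration, a tokenizer can be told to treat '#' as a comment character so that everything from '#' to the end of the line is skipped. This is a sketch under that assumption; CommentCharDemo and words are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects word tokens, ignoring '#' comments.
final class CommentCharDemo {
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.commentChar('#');   // '#' now starts a comment that runs to end of line
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [host, example, port, web]
        System.out.println(words("host example # primary\nport web"));
    }
}
```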
public void eolIsSignificant (boolean flag)
Specifies whether the end of a line is significant and should be returned
as TT_EOL in ttype by this tokenizer.
Parameters
| flag | true if EOL is significant, false otherwise. |
|---|
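One use of significant line ends is line-oriented input. The sketch below counts the tokens on each line; EolDemo and tokensPerLine are hypothetical names, and the handling of a final line without a trailing newline is a design choice of this example rather than something the tokenizer prescribes.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: counts tokens per line using TT_EOL as the separator.
final class EolDemo {
    static List<Integer> tokensPerLine(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.eolIsSignificant(true);   // line ends are now reported as TT_EOL
        List<Integer> counts = new ArrayList<>();
        int current = 0;
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_EOL) {
                    counts.add(current);
                    current = 0;
                } else {
                    current++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current > 0) {
            counts.add(current);     // last line had no trailing newline
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints [2, 1]
        System.out.println(tokensPerLine("a b\nc\n"));
    }
}
```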
public int lineno ()
Returns the current line number.
Returns
- this tokenizer's current line number.
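A typical use of lineno() is attaching a position to each token, for example in error messages. This is a minimal sketch; LineNumberDemo and locate are hypothetical names, and it assumes line numbering starts at 1 for a new tokenizer.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: tags each word with the line it was read from.
final class LineNumberDemo {
    static List<String> locate(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval + "@" + st.lineno());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a@1, b@2, c@2, d@4]
        System.out.println(locate("a\nb c\n\nd"));
    }
}
```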
public void lowerCaseMode (boolean flag)
Specifies whether word tokens should be converted to lower case when they
are stored in sval.
Parameters
| flag | true if sval should be converted to lower case, false otherwise. |
|---|
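Lower-case mode is convenient for case-insensitive keyword matching. The sketch below is illustrative only; LowerCaseDemo and words are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects word tokens with or without lower-casing.
final class LowerCaseDemo {
    static List<String> words(String input, boolean lowerCase) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.lowerCaseMode(lowerCase);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [select, name, from, users]
        System.out.println(words("Select NAME From Users", true));
    }
}
```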
public int nextToken ()
Parses the next token from this tokenizer's source stream or reader. The
type of the token is stored in the ttype field; additional
information may be stored in the nval or sval fields.
Returns
- the value of ttype.
Throws
| IOException | if an I/O error occurs while parsing the next token. |
|---|
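Because nextToken() returns the same value it stores in ttype, the return value can drive the read loop directly. The following sketch sums every number in the input and ignores all other tokens; SumDemo and sumNumbers are hypothetical names, and the default syntax is assumed.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper: adds up all number tokens in the input.
final class SumDemo {
    static double sumNumbers(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        double sum = 0;
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_NUMBER) {
                    sum += st.nval;   // nval is valid only for TT_NUMBER tokens
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        // prints 5.5
        System.out.println(sumNumbers("3 apples 4.5 pears -2"));
    }
}
```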
public void ordinaryChar (int ch)
Specifies that the character ch shall be treated as an ordinary
character by this tokenizer. That is, it has no special meaning as a
comment character, word component, white space, string delimiter or
number.
Parameters
| ch | the character to be considered an ordinary character. |
|---|
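When tokenizing arithmetic, the default meanings of '/' (comment character) and '-' (part of a number) usually get in the way. The sketch below makes both ordinary so that they come back as single-character tokens; OrdinaryCharDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: tokenizes a simple arithmetic expression.
final class OrdinaryCharDemo {
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');   // no longer starts a comment
        st.ordinaryChar('-');   // no longer begins a negative number
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [8.0, /, 2.0, -, 3.0]
        System.out.println(tokens("8/2-3"));
    }
}
```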
public void ordinaryChars (int low, int hi)
Specifies that the characters in the range from low to hi
shall be treated as ordinary characters by this tokenizer. That is,
they have no special meaning as a comment character, word component,
white space, string delimiter or number.
Parameters
| low | the first character in the range of ordinary characters. |
|---|---|
| hi | the last character in the range of ordinary characters. |
public void parseNumbers ()
Specifies that this tokenizer shall parse numbers.
public void pushBack ()
Indicates that the current token should be pushed back and returned again
the next time nextToken() is called.
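pushBack() gives a one-token lookahead. As a sketch, the code below reads names that may or may not be followed by a number; when the next token turns out not to be a number, it is pushed back so the outer loop sees it again. PushBackDemo, pairs, and the "default" marker are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pairs each name with an optional trailing number.
final class PushBackDemo {
    static List<String> pairs(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() == StreamTokenizer.TT_WORD) {
                String name = st.sval;
                if (st.nextToken() == StreamTokenizer.TT_NUMBER) {
                    out.add(name + "=" + st.nval);
                } else {
                    st.pushBack();   // not a number: let the outer loop re-read it
                    out.add(name + "=default");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a=1.0, b=default, c=3.0]
        System.out.println(pairs("a 1 b c 3"));
    }
}
```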
public void quoteChar (int ch)
Specifies that the character ch shall be treated as a quote
character.
Parameters
| ch | the character to be considered a quote character. |
|---|
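Any character can serve as a string delimiter. In this sketch the backtick is registered as a quote character; the text between two backticks is delivered in sval and ttype is set to the backtick itself. QuoteCharDemo, tokens, and the angle-bracket marking are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: marks backtick-quoted strings with angle brackets.
final class QuoteCharDemo {
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.quoteChar('`');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '`') {
                    out.add("<" + st.sval + ">");   // body of the quoted string
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [name, <hello world>, end]
        System.out.println(tokens("name `hello world` end"));
    }
}
```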
public void resetSyntax ()
Specifies that all characters shall be treated as ordinary characters.
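resetSyntax() is usually the first step in building a custom syntax from scratch, since it also clears the default number, quote, and comment handling. The sketch below then re-adds only letters and digits as word characters and the low ASCII range as white space, so digit runs come back as words rather than numbers. ResetSyntaxDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a syntax with words and white space only.
final class ResetSyntaxDemo {
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();              // every character is now ordinary
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.wordChars('0', '9');        // digits are word characters, not numbers
        st.whitespaceChars(0, ' ');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [v2, 007, x, -, 1]
        System.out.println(tokens("v2 007 x-1"));
    }
}
```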
public void slashSlashComments (boolean flag)
Specifies whether "slash-slash" (C++-style) comments shall be recognized. This kind of comment ends at the end of the line.
Parameters
| flag | true if // should be recognized as the start of a comment, false otherwise. |
|---|
public void slashStarComments (boolean flag)
Specifies whether "slash-star" (C-style) comments shall be recognized. Slash-star comments cannot be nested and end when a star-slash combination is found.
Parameters
| flag | true if /* should be recognized as the start of a comment, false otherwise. |
|---|
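The two comment flags, slashSlashComments and slashStarComments, are commonly enabled together. In this sketch '/' is first made an ordinary character, on the assumption that a lone '/' should come back as a token instead of starting a comment as it does by default. SlashCommentDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: skips // and /* */ comments but keeps a lone '/'.
final class SlashCommentDemo {
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');          // a single '/' is no longer a comment
        st.slashSlashComments(true);   // "//" runs to the end of the line
        st.slashStarComments(true);    // "/*" runs to the next "*/"
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a, /, b, c]
        System.out.println(tokens("a / b // trailing\n/* block */ c"));
    }
}
```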
public String toString ()
Returns the state of this tokenizer in a readable format.
Returns
- the current state of this tokenizer.
public void whitespaceChars (int low, int hi)
Specifies that the characters in the range from low to hi
shall be treated as whitespace characters by this tokenizer.
Parameters
| low | the first character in the range of whitespace characters. |
|---|---|
| hi | the last character in the range of whitespace characters. |
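Declaring a separator as white space is a simple way to read delimited values. The sketch below treats the comma as white space and collects the numbers; WhitespaceDemo and numbers are hypothetical names, and it is not a substitute for a real CSV parser.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads comma-separated numbers.
final class WhitespaceDemo {
    static List<Double> numbers(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.whitespaceChars(',', ',');   // a one-character range: just the comma
        List<Double> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(st.nval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [1.0, 2.0, 3.0]
        System.out.println(numbers("1,2, 3"));
    }
}
```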
public void wordChars (int low, int hi)
Specifies that the characters in the range from low to hi
shall be treated as word characters by this tokenizer. A word consists of
a word character followed by zero or more word or number characters.
Parameters
| low | the first character in the range of word characters. |
|---|---|
| hi | the last character in the range of word characters. |
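Identifiers in many languages contain underscores, which are ordinary characters by default and therefore split a word. The sketch below compares the two behaviours; WordCharsDemo and words are hypothetical names. Note that the trailing digit stays part of the word in both cases, in line with the rule above that a word may continue with number characters.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects words, optionally allowing '_' inside them.
final class WordCharsDemo {
    static List<String> words(String input, boolean underscoreInWords) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (underscoreInWords) {
            st.wordChars('_', '_');   // a one-character range: just '_'
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("max_value2 = 10", true));    // prints [max_value2]
        System.out.println(words("max_value2 = 10", false));   // prints [max, value2]
    }
}
```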