StreamTokenizer

public class StreamTokenizer extends Object

Parses a stream into a set of defined tokens, one at a time. The different types of tokens that can be found are numbers, identifiers, quoted strings, and different comment styles. The class can be used for limited processing of source code of programming languages like Java, although it is nowhere near a full parser.

Constant Summary

int TT_EOF The constant representing the end of the stream.
int TT_EOL The constant representing the end of the line.
int TT_NUMBER The constant representing a number token.
int TT_WORD The constant representing a word token.

Field Summary

public double nval Contains a number if the current token is a number (ttype == TT_NUMBER).
public String sval Contains a string if the current token is a word (ttype == TT_WORD), or the body of the string if the current token is a quoted string.
public int ttype After calling nextToken(), ttype contains the type of token that has been read.

Public Constructor Summary

StreamTokenizer(InputStream is)
This constructor is deprecated. Use StreamTokenizer(Reader) instead.
StreamTokenizer(Reader r)
Constructs a new StreamTokenizer with r as source reader.

Public Method Summary

void
commentChar(int ch)
Specifies that the character ch shall be treated as a comment character.
void
eolIsSignificant(boolean flag)
Specifies whether the end of a line is significant and should be returned as TT_EOL in ttype by this tokenizer.
int
lineno()
Returns the current line number.
void
lowerCaseMode(boolean flag)
Specifies whether word tokens should be converted to lower case when they are stored in sval.
int
nextToken()
Parses the next token from this tokenizer's source stream or reader.
void
ordinaryChar(int ch)
Specifies that the character ch shall be treated as an ordinary character by this tokenizer.
void
ordinaryChars(int low, int hi)
Specifies that the characters in the range from low to hi shall be treated as ordinary characters by this tokenizer.
void
parseNumbers()
Specifies that this tokenizer shall parse numbers.
void
pushBack()
Indicates that the current token should be pushed back and returned again the next time nextToken() is called.
void
quoteChar(int ch)
Specifies that the character ch shall be treated as a quote character.
void
resetSyntax()
Specifies that all characters shall be treated as ordinary characters.
void
slashSlashComments(boolean flag)
Specifies whether "slash-slash" (C++-style) comments shall be recognized.
void
slashStarComments(boolean flag)
Specifies whether "slash-star" (C-style) comments shall be recognized.
String
toString()
Returns the state of this tokenizer in a readable format.
void
whitespaceChars(int low, int hi)
Specifies that the characters in the range from low to hi shall be treated as whitespace characters by this tokenizer.
void
wordChars(int low, int hi)
Specifies that the characters in the range from low to hi shall be treated as word characters by this tokenizer.

Constants

public static final int TT_EOF

The constant representing the end of the stream.

Constant Value: -1

public static final int TT_EOL

The constant representing the end of the line.

Constant Value: 10

public static final int TT_NUMBER

The constant representing a number token.

Constant Value: -2

public static final int TT_WORD

The constant representing a word token.

Constant Value: -3

Fields

public double nval

Contains a number if the current token is a number (ttype == TT_NUMBER).

public String sval

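Contains a string if the current token is a word (ttype == TT_WORD). If the current token is a quoted string, sval contains the body of the string without the surrounding quote characters.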

public int ttype

After calling nextToken(), ttype contains the type of the token that has been read. When the token is a single ordinary character, ttype holds that character converted to an integer. For a quoted string, ttype holds the quote character that delimits the string. Otherwise, its value is one of the following:

  • TT_WORD - the token is a word.
  • TT_NUMBER - the token is a number.
  • TT_EOL - the end of a line has been reached. This value is only returned if eolIsSignificant(true) has been called.
  • TT_EOF - the end of the stream has been reached.
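
The following is a minimal sketch of how the ttype values described above might be handled in a read loop. The class TokenTypeDemo and the method describe are hypothetical names chosen for illustration, and the tokenizer is left in its default state.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenTypeDemo {
    // Hypothetical helper: labels each token according to the value of ttype.
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + st.nval);
                        break;
                    case '"':
                    case '\'':
                        // For quoted strings, ttype is the quote character itself.
                        out.add("QUOTED:" + st.sval);
                        break;
                    default:
                        // Any other value is a single ordinary character.
                        out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [WORD:total, CHAR:=, NUMBER:42.0, QUOTED:hi there, CHAR:;]
        System.out.println(describe("total = 42 \"hi there\" ;"));
    }
}
```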

Public Constructors

public StreamTokenizer (InputStream is)

This constructor is deprecated.
Use StreamTokenizer(Reader) instead.

Constructs a new StreamTokenizer with is as the source input stream.

Parameters
is the source stream from which to parse tokens.
Throws
NullPointerException if is is null.

public StreamTokenizer (Reader r)

Constructs a new StreamTokenizer with r as source reader. The tokenizer's initial state is as follows:

  • All byte values 'A' through 'Z', 'a' through 'z', and '\u00A0' through '\u00FF' are considered to be alphabetic.
  • All byte values '\u0000' through '\u0020' are considered to be white space.
  • '/' is a comment character.
  • Single quote '\'' and double quote '"' are string quote characters.
  • Numbers are parsed.
  • End of lines are considered to be white space rather than separate tokens.
  • C-style and C++-style comments are not recognized.

Parameters
r the source reader from which to parse tokens.

Public Methods

public void commentChar (int ch)

Specifies that the character ch shall be treated as a comment character.

Parameters
ch the character to be considered a comment character.
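
As an illustration, the sketch below registers '#' as a comment character, so that everything from '#' to the end of the line is skipped. The names HashCommentDemo and words are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class HashCommentDemo {
    // Hypothetical helper: collects word tokens, ignoring '#' comments.
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.commentChar('#'); // text from '#' to the end of the line is discarded
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [alpha, beta]
        System.out.println(words("alpha # ignored words\nbeta"));
    }
}
```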

public void eolIsSignificant (boolean flag)

Specifies whether the end of a line is significant and should be returned as TT_EOL in ttype by this tokenizer.

Parameters
flag true if EOL is significant, false otherwise.
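
One possible use of significant line ends is counting tokens per line. The sketch below assumes a hypothetical helper tokensPerLine in a class named EolDemo.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class EolDemo {
    // Hypothetical helper: counts how many tokens appear on each input line.
    static List<Integer> tokensPerLine(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.eolIsSignificant(true); // line ends are now reported as TT_EOL
        List<Integer> counts = new ArrayList<>();
        int current = 0;
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_EOL) {
                    counts.add(current);
                    current = 0;
                } else {
                    current++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        if (current > 0) {
            counts.add(current); // last line without a trailing newline
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints [2, 3]
        System.out.println(tokensPerLine("a b\nc d e"));
    }
}
```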

public int lineno ()

Returns the current line number.

Returns
  • this tokenizer's current line number.
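
A typical use of lineno() is reporting where a token was found. The sketch below uses a hypothetical helper lineOf that returns the line on which a given word first appears, or -1 if it is absent.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineNumberDemo {
    // Hypothetical helper: returns the line number of the first occurrence of word.
    static int lineOf(String text, String word) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD && word.equals(st.sval)) {
                    return st.lineno(); // line counting starts at 1
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return -1;
    }

    public static void main(String[] args) {
        // prints 3
        System.out.println(lineOf("alpha\nbeta\ngamma", "gamma"));
    }
}
```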

public void lowerCaseMode (boolean flag)

Specifies whether word tokens should be converted to lower case when they are stored in sval.

Parameters
flag true if sval should be converted to lower case, false otherwise.
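
A short sketch of the effect on sval, using the hypothetical names LowerCaseDemo and words:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LowerCaseDemo {
    // Hypothetical helper: collects word tokens with lower-case mode enabled.
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.lowerCaseMode(true); // sval is lower-cased for word tokens
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [hello, world]
        System.out.println(words("Hello WORLD"));
    }
}
```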

public int nextToken ()

Parses the next token from this tokenizer's source stream or reader. The type of the token is stored in the ttype field; additional information may be stored in the nval or sval fields.

Returns
  • the value of ttype.
Throws
IOException if an I/O error occurs while parsing the next token.

public void ordinaryChar (int ch)

Specifies that the character ch shall be treated as an ordinary character by this tokenizer. That is, it has no special meaning as a comment character, word component, white space, string delimiter or number.

Parameters
ch the character to be considered an ordinary character.
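
Because '/' is a comment character in the initial state, an expression such as "a / b" is cut off at the slash unless '/' is made ordinary. The sketch below compares both cases; SlashDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SlashDemo {
    // Hypothetical helper: returns tokens as strings, optionally treating '/' as ordinary.
    static List<String> tokens(String input, boolean slashIsOrdinary) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (slashIsOrdinary) {
            st.ordinaryChar('/'); // '/' no longer starts a comment
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a / b", false)); // prints [a]
        System.out.println(tokens("a / b", true));  // prints [a, /, b]
    }
}
```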

public void ordinaryChars (int low, int hi)

Specifies that the characters in the range from low to hi shall be treated as ordinary characters by this tokenizer. That is, they have no special meaning as a comment character, word component, white space, string delimiter or number.

Parameters
low the first character in the range of ordinary characters.
hi the last character in the range of ordinary characters.
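
One common pattern, sketched here under the assumption that identifiers may start with a digit, is to clear the digit range with ordinaryChars and then register the same range as word characters. DigitWordDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DigitWordDemo {
    // Hypothetical helper: returns tokens as strings, optionally treating digits as word characters.
    static List<String> tokens(String input, boolean digitsAsWordChars) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (digitsAsWordChars) {
            st.ordinaryChars('0', '9'); // digits lose their numeric meaning
            st.wordChars('0', '9');     // and become word components instead
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("7up", false)); // prints [7.0, up]
        System.out.println(tokens("7up", true));  // prints [7up]
    }
}
```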

public void parseNumbers ()

Specifies that this tokenizer shall parse numbers.
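
Number parsing is already enabled in the initial state, so parseNumbers() mainly matters after the syntax table has been cleared. The sketch below, with the hypothetical names NumberSumDemo and sum, resets the syntax, restores white space handling, and re-enables numbers.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class NumberSumDemo {
    // Hypothetical helper: adds up all number tokens found in the input.
    static double sum(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();           // start from an empty syntax table
        st.whitespaceChars(0, ' '); // restore white space handling
        st.parseNumbers();          // digits, '.' and '-' form numbers again
        double total = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    total += st.nval;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return total;
    }

    public static void main(String[] args) {
        // prints 0.5
        System.out.println(sum("1 2.5 -3"));
    }
}
```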

public void pushBack ()

Indicates that the current token should be pushed back and returned again the next time nextToken() is called.
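
pushBack() gives a one-token lookahead. The sketch below reads a token, pushes it back, and reads it again before moving on; PushBackDemo and peekThenRead are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class PushBackDemo {
    // Hypothetical helper: returns the peeked word, the same word re-read, and the following word.
    static String[] peekThenRead(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        try {
            st.nextToken();
            String peeked = st.sval;
            st.pushBack();   // the next call to nextToken() returns the same token
            st.nextToken();
            String again = st.sval;
            st.nextToken();  // now the tokenizer advances
            String next = st.sval;
            return new String[] {peeked, again, next};
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
    }

    public static void main(String[] args) {
        // prints first first second
        System.out.println(String.join(" ", peekThenRead("first second")));
    }
}
```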

public void quoteChar (int ch)

Specifies that the character ch shall be treated as a quote character.

Parameters
ch the character to be considered a quote character.
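
As an illustration, the sketch below adds the backtick as a quote character. The quoted body is delivered in sval and ttype holds the backtick. BacktickDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class BacktickDemo {
    // Hypothetical helper: collects words and backtick-quoted strings.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.quoteChar('`'); // text between two backticks becomes one token
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == '`') {
                    out.add("QUOTED:" + st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [run, QUOTED:ls -l, now]
        System.out.println(tokens("run `ls -l` now"));
    }
}
```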

public void resetSyntax ()

Specifies that all characters shall be treated as ordinary characters.
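
After resetSyntax(), white space, digits and letters all lose their special meaning, so each character is returned as its own token. The sketch below shows this with the hypothetical names ResetSyntaxDemo and rawChars.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ResetSyntaxDemo {
    // Hypothetical helper: returns every character of the input as a separate token.
    static List<String> rawChars(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax(); // every character is now ordinary, including spaces
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                out.add(String.valueOf((char) st.ttype));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a, b,  , 1]
        System.out.println(rawChars("ab 1"));
    }
}
```

A custom syntax is usually built from this empty state by calling wordChars, whitespaceChars and similar methods afterwards.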

public void slashSlashComments (boolean flag)

Specifies whether "slash-slash" (C++-style) comments shall be recognized. This kind of comment ends at the end of the line.

Parameters
flag true if // should be recognized as the start of a comment, false otherwise.

public void slashStarComments (boolean flag)

Specifies whether "slash-star" (C-style) comments shall be recognized. Slash-star comments cannot be nested and end when a star-slash combination is found.

Parameters
flag true if /* should be recognized as the start of a comment, false otherwise.
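
The sketch below enables both comment styles. It first makes '/' ordinary so that a lone slash is returned as a token instead of starting a single-character comment. CommentStyleDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommentStyleDemo {
    // Hypothetical helper: returns tokens as strings, skipping // and /* */ comments.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');        // a single '/' is a plain token
        st.slashSlashComments(true); // "//" skips to the end of the line
        st.slashStarComments(true);  // "/*" skips to the next "*/"
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a, /, b, c]
        System.out.println(tokens("a / b // note\n/* block */ c"));
    }
}
```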

public String toString ()

Returns the state of this tokenizer in a readable format.

Returns
  • the current state of this tokenizer.

public void whitespaceChars (int low, int hi)

Specifies that the characters in the range from low to hi shall be treated as whitespace characters by this tokenizer.

Parameters
low the first character in the range of whitespace characters.
hi the last character in the range of whitespace characters.
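
For example, a comma can be declared white space so that it separates words without appearing as a token. The sketch below uses the hypothetical names CommaWhitespaceDemo and fields.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommaWhitespaceDemo {
    // Hypothetical helper: splits comma-separated words.
    static List<String> fields(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.whitespaceChars(',', ','); // a range containing only the comma
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(fields("a,b,c"));
    }
}
```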

public void wordChars (int low, int hi)

Specifies that the characters in the range from low to hi shall be treated as word characters by this tokenizer. A word consists of a word character followed by zero or more word or number characters.

Parameters
low the first character in the range of word characters.
hi the last character in the range of word characters.
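
In the initial state an underscore is an ordinary character, so an identifier such as my_var is split into three tokens. The sketch below, with the hypothetical names UnderscoreDemo and tokens, shows the effect of registering '_' as a word character.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class UnderscoreDemo {
    // Hypothetical helper: returns tokens as strings, optionally treating '_' as a word character.
    static List<String> tokens(String input, boolean underscoreInWords) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (underscoreInWords) {
            st.wordChars('_', '_'); // a range containing only the underscore
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("my_var", false)); // prints [my, _, var]
        System.out.println(tokens("my_var", true));  // prints [my_var]
    }
}
```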