StreamTokenizer

Page Summary

StreamTokenizer breaks an input stream into tokens like numbers, identifiers, strings, and comments.
It can be used for basic processing of programming language source code, but it's not a full parser.
Tokens are categorized into types like TT_WORD, TT_NUMBER, TT_EOL, and TT_EOF.
StreamTokenizer provides methods to customize how different characters are handled, such as defining comment characters, word characters, and whitespace.
You can configure the StreamTokenizer to recognize C-style and C++-style comments and to convert word tokens to lower case.

public class StreamTokenizer extends Object

Parses a stream into a set of defined tokens, one at a time. The different types of tokens that can be found are numbers, identifiers, quoted strings, and different comment styles. The class can be used for limited processing of source code of programming languages like Java, although it is nowhere near a full parser.
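A minimal sketch of the usual read loop, assuming the input is held in a String and wrapped in a StringReader. The class TokenDump and its tokens() helper are hypothetical names used only for illustration: the helper calls nextToken() until it returns TT_EOF and renders each token according to ttype, using the default syntax.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenDump {
    // Hypothetical helper: tokenizes the input with the default syntax
    // and renders each token as "kind:value".
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("word:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("number:" + st.nval);
                        break;
                    case '"':
                    case '\'':
                        // Quoted string: ttype is the quote character, sval the body.
                        out.add("string:" + st.sval);
                        break;
                    default:
                        // Any other character comes back as its own code in ttype.
                        out.add("char:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // [word:int, word:count, char:=, number:42.0, char:;, word:String, word:s, char:=, string:hi, char:;]
        System.out.println(tokens("int count = 42; String s = \"hi\";"));
    }
}
```

The input deliberately avoids a lone '/', because '/' is a comment character by default and would discard the rest of the line.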

Constant Summary

int	TT_EOF	The constant representing the end of the stream.
int	TT_EOL	The constant representing the end of the line.
int	TT_NUMBER	The constant representing a number token.
int	TT_WORD	The constant representing a word token.

Field Summary

public double	nval	Contains a number if the current token is a number (`ttype` == `TT_NUMBER`).
public String	sval	Contains a string if the current token is a word (`ttype` == `TT_WORD`).
public int	ttype	After calling `nextToken()`, `ttype` contains the type of token that has been read.

Public Constructor Summary

	StreamTokenizer(InputStream is) This constructor is deprecated. Use `StreamTokenizer(Reader)` instead.
	StreamTokenizer(Reader r) Constructs a new `StreamTokenizer` with `r` as source reader.

Public Method Summary

void	commentChar(int ch) Specifies that the character `ch` shall be treated as a comment character.
void	eolIsSignificant(boolean flag) Specifies whether the end of a line is significant and should be returned as `TT_EOL` in `ttype` by this tokenizer.
int	lineno() Returns the current line number.
void	lowerCaseMode(boolean flag) Specifies whether word tokens should be converted to lower case when they are stored in `sval`.
int	nextToken() Parses the next token from this tokenizer's source stream or reader.
void	ordinaryChar(int ch) Specifies that the character `ch` shall be treated as an ordinary character by this tokenizer.
void	ordinaryChars(int low, int hi) Specifies that the characters in the range from `low` to `hi` shall be treated as ordinary characters by this tokenizer.
void	parseNumbers() Specifies that this tokenizer shall parse numbers.
void	pushBack() Indicates that the current token should be pushed back and returned again the next time `nextToken()` is called.
void	quoteChar(int ch) Specifies that the character `ch` shall be treated as a quote character.
void	resetSyntax() Specifies that all characters shall be treated as ordinary characters.
void	slashSlashComments(boolean flag) Specifies whether "slash-slash" (C++-style) comments shall be recognized.
void	slashStarComments(boolean flag) Specifies whether "slash-star" (C-style) comments shall be recognized.
String	toString() Returns the state of this tokenizer in a readable format.
void	whitespaceChars(int low, int hi) Specifies that the characters in the range from `low` to `hi` shall be treated as whitespace characters by this tokenizer.
void	wordChars(int low, int hi) Specifies that the characters in the range from `low` to `hi` shall be treated as word characters by this tokenizer.

Inherited Method Summary

From class java.lang.Object

Object	clone() Creates and returns a copy of this `Object`.
boolean	equals(Object obj) Compares this instance with the specified object and indicates if they are equal.
void	finalize() Invoked when the garbage collector has detected that this instance is no longer reachable.
final Class<?>	getClass() Returns the unique instance of `Class` that represents this object's class.
int	hashCode() Returns an integer hash code for this object.
final void	notify() Causes a thread which is waiting on this object's monitor (by means of calling one of the `wait()` methods) to be woken up.
final void	notifyAll() Causes all threads which are waiting on this object's monitor (by means of calling one of the `wait()` methods) to be woken up.
String	toString() Returns a string containing a concise, human-readable description of this object.
final void	wait(long timeout, int nanos) Causes the calling thread to wait until another thread calls the `notify()` or `notifyAll()` method of this object or until the specified timeout expires.
final void	wait(long timeout) Causes the calling thread to wait until another thread calls the `notify()` or `notifyAll()` method of this object or until the specified timeout expires.
final void	wait() Causes the calling thread to wait until another thread calls the `notify()` or `notifyAll()` method of this object.

Constants

public static final int TT_EOF

The constant representing the end of the stream.

Constant Value: -1

public static final int TT_EOL

The constant representing the end of the line.

Constant Value: 10

public static final int TT_NUMBER

The constant representing a number token.

Constant Value: -2

public static final int TT_WORD

The constant representing a word token.

Constant Value: -3

Fields

public double nval

Contains a number if the current token is a number (ttype == TT_NUMBER).

public String sval

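Contains a string if the current token is a word (ttype == TT_WORD). When the current token is a quoted string, it contains the body of the string without the quote characters.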

public int ttype

After calling nextToken(), ttype contains the type of token that has been read. When a single character is read, its value converted to an integer is stored in ttype. For a quoted string, the value is the quote character that delimits the string. Otherwise, its value is one of the following:

TT_WORD - the token is a word.
TT_NUMBER - the token is a number.
TT_EOL - the end of line has been reached. This type is returned only if eolIsSignificant(true) has been called.
TT_EOF - the end of the stream has been reached.
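To see the raw values, the hypothetical helper below records ttype after each call, using the default syntax and a StringReader as input. A word gives TT_WORD (-3), a number gives TT_NUMBER (-2), an ordinary character such as '+' gives its own code (43), and a double-quoted string gives the quote character '"' (34).

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TtypeDemo {
    // Hypothetical helper: returns the raw ttype of every token before TT_EOF.
    static List<Integer> types(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<Integer> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                out.add(st.ttype);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // word -> -3, '+' -> 43, number -> -2, "abc" -> '"' (34)
        System.out.println(types("name + 7 \"abc\""));
    }
}
```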

Public Constructors

public StreamTokenizer (InputStream is)

This constructor is deprecated.
Use StreamTokenizer(Reader)

Constructs a new StreamTokenizer with is as source input stream. This constructor is deprecated; use the constructor that takes a Reader as an argument instead.

Parameters

is	the source stream from which to parse tokens.

Throws

NullPointerException	if `is` is `null`.
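Code that starts from an InputStream can wrap it in an InputStreamReader with an explicit charset and use the Reader constructor instead. A minimal sketch, assuming UTF-8 input; FromInputStream and firstWord() are hypothetical names. Decoding first matters for non-ASCII text: "café" stays one four-character word, whereas the deprecated constructor reads raw bytes and would treat each of the two UTF-8 bytes of é as a separate character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FromInputStream {
    // Hypothetical helper: decodes the byte stream as UTF-8 and returns the
    // first token if it is a word, or null otherwise.
    static String firstWord(InputStream in) {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StreamTokenizer st = new StreamTokenizer(reader);
        try {
            return st.nextToken() == StreamTokenizer.TT_WORD ? st.sval : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstWord(new ByteArrayInputStream(bytes))); // café
    }
}
```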

public StreamTokenizer (Reader r)

Constructs a new StreamTokenizer with r as source reader. The tokenizer's initial state is as follows:

All byte values 'A' through 'Z', 'a' through 'z', and '\u00A0' through '\u00FF' are considered to be alphabetic.
All byte values '\u0000' through '\u0020' are considered to be white space.
'/' is a comment character.
Single quote '\'' and double quote '"' are string quote characters.
Numbers are parsed.
End of lines are considered to be white space rather than separate tokens.
C-style and C++-style comments are not recognized.
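Two consequences of these defaults are easy to miss: because '/' is a comment character, a lone slash discards the rest of its line, and because '_' is not a word character, an identifier such as my_var splits into three tokens. The hypothetical helper below shows both with the default syntax, rendering numbers as "n=<value>".

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DefaultSyntax {
    // Hypothetical helper: words and ordinary characters as strings, numbers as "n=<value>".
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("n=" + st.nval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("total = a / b")); // [total, =, a]
        System.out.println(tokens("my_var"));        // [my, _, var]
    }
}
```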

Parameters

r	the source reader from which to parse tokens.

Public Methods

public void commentChar (int ch)

Specifies that the character ch shall be treated as a comment character.

Parameters

ch	the character to be considered a comment character.
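A minimal sketch, assuming a configuration-file style where '#' starts a comment. The class HashComments is hypothetical. After commentChar('#'), everything from '#' to the end of the line is skipped; note that '/' keeps its default role as a comment character as well.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class HashComments {
    // Hypothetical helper: tokenizes input in which '#' starts a line comment.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.commentChar('#'); // '#' now starts a comment that runs to the end of the line
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("n=" + st.nval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // [port, n=8080.0, host, local]
        System.out.println(tokens("port 8080 # default port\nhost local"));
    }
}
```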

public void eolIsSignificant (boolean flag)

Specifies whether the end of a line is significant and should be returned as TT_EOL in ttype by this tokenizer.

Parameters

flag	`true` if EOL is significant, `false` otherwise.
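With eolIsSignificant(true), line ends arrive as TT_EOL tokens, which makes line-oriented input easy to split. A minimal sketch; LineSplitter and lines() are hypothetical names, and only word tokens are kept.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineSplitter {
    // Hypothetical helper: groups the words of each input line into one list.
    static List<List<String>> lines(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.eolIsSignificant(true); // report line ends as TT_EOL tokens
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_EOL) {
                    rows.add(row);          // close the current line
                    row = new ArrayList<>();
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    row.add(st.sval);       // this sketch keeps only words
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!row.isEmpty()) {
            rows.add(row);                  // last line had no trailing newline
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(lines("a b\nc\nd e f")); // [[a, b], [c], [d, e, f]]
    }
}
```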

public int lineno ()

Returns the current line number.

Returns

this tokenizer's current line number.
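Line numbers start at 1, and after nextToken() returns, lineno() reports the line of the token just read. A minimal sketch with a hypothetical helper that tags each word with its line, for example to build error messages:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineNumbers {
    // Hypothetical helper: renders each word as "word@line".
    static List<String> wordsWithLines(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval + "@" + st.lineno());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // [alpha@1, beta@2, gamma@2, delta@4]
        System.out.println(wordsWithLines("alpha\nbeta gamma\n\ndelta"));
    }
}
```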

public void lowerCaseMode (boolean flag)

Specifies whether word tokens should be converted to lower case when they are stored in sval.

Parameters

flag	`true` if `sval` should be converted to lower case, `false` otherwise.
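A minimal sketch of lowerCaseMode(true) with a hypothetical helper. Only word tokens are converted; the body of a quoted string is kept as written.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LowerCase {
    // Hypothetical helper: returns the sval of every word and double-quoted string.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.lowerCaseMode(true); // words are stored in sval in lower case
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == '"') {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hello WORLD \"MiXeD\"")); // [hello, world, MiXeD]
    }
}
```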

public int nextToken ()

Parses the next token from this tokenizer's source stream or reader. The type of the token is stored in the ttype field; additional information may be stored in the nval or sval fields.

Returns

the value of ttype.

Throws

IOException	if an I/O error occurs while parsing the next token.

public void ordinaryChar (int ch)

Specifies that the character ch shall be treated as an ordinary character by this tokenizer. That is, it has no special meaning as a comment character, word component, white space, string delimiter or number.

Parameters

ch	the character to be considered an ordinary character.
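A common use is turning '/' back into an ordinary character so that it can serve as a division operator. The sketch below (hypothetical class SlashOperator) tokenizes the same expression with and without ordinaryChar('/').

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SlashOperator {
    // Hypothetical helper: optionally makes '/' an ordinary character.
    static List<String> tokens(String input, boolean slashIsOrdinary) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (slashIsOrdinary) {
            st.ordinaryChar('/'); // '/' is no longer a comment character
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("n=" + st.nval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("total = a / b", false)); // [total, =, a]
        System.out.println(tokens("total = a / b", true));  // [total, =, a, /, b]
    }
}
```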

public void ordinaryChars (int low, int hi)

Specifies that the characters in the range from low to hi shall be treated as ordinary characters by this tokenizer. That is, they have no special meaning as a comment character, word component, white space, string delimiter or number.

Parameters

low	the first character in the range of ordinary characters.
hi	the last character in the range of ordinary characters.
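ordinaryChars() is also a way to strip an attribute before assigning a new one. In the sketch below (hypothetical class DigitsAsWords), digits are cleared with ordinaryChars('0', '9') and then made word characters, so "2024" becomes a word instead of a number. Calling wordChars('0', '9') alone may not be enough, because the digits would keep their numeric attribute and number parsing takes precedence. '.' and '-' keep their numeric meaning here, which this input does not exercise.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DigitsAsWords {
    // Hypothetical helper: optionally treats digits as plain word characters.
    static List<String> tokens(String input, boolean digitsAreWordChars) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (digitsAreWordChars) {
            st.ordinaryChars('0', '9'); // clear the numeric attribute of the digits
            st.wordChars('0', '9');     // then mark them as word characters
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("n=" + st.nval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("v1 2024 x86", false)); // [v1, n=2024.0, x86]
        System.out.println(tokens("v1 2024 x86", true));  // [v1, 2024, x86]
    }
}
```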

public void parseNumbers ()

Specifies that this tokenizer shall parse numbers.
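Number parsing is already active for a new tokenizer, so calling parseNumbers() explicitly is mainly useful after resetSyntax(). The sketch below (hypothetical class NumberSyntax) rebuilds a minimal syntax and shows what the number syntax covers: an optional leading minus sign, digits, and a decimal point. Exponent notation is not part of it, so in the standard implementation 1e3 comes back as the number 1.0 followed by the word e3.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class NumberSyntax {
    // Hypothetical helper: minimal syntax of lower-case words and numbers.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();           // start from a blank table: every character ordinary
        st.whitespaceChars(0, ' '); // spaces and control characters separate tokens
        st.wordChars('a', 'z');     // lower-case letters form words
        st.parseNumbers();          // digits, '.' and '-' start numbers again
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("n=" + st.nval);
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // [n=-3.5, n=42.0, n=0.25, n=1.0, e3]
        System.out.println(tokens("-3.5 42 .25 1e3"));
    }
}
```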

public void pushBack ()

Indicates that the current token should be pushed back and returned again the next time nextToken() is called.
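pushBack() gives one token of lookahead: read a token, inspect it, and push it back if the current rule does not want it. Only one token can be pushed back; calling pushBack() again does not rewind further. A minimal sketch; PushBackDemo, peek() and the "key = value" format are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class PushBackDemo {
    // Hypothetical helper: returns the type of the next token without consuming it.
    static int peek(StreamTokenizer st) throws IOException {
        int type = st.nextToken();
        st.pushBack(); // the next nextToken() call returns this same token again
        return type;
    }

    // Hypothetical parser for "key" or "key = value" entries; a missing value becomes "".
    static Map<String, String> parse(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        Map<String, String> entries = new LinkedHashMap<>();
        try {
            while (st.nextToken() == StreamTokenizer.TT_WORD) {
                String key = st.sval;
                String value = "";
                if (peek(st) == '=') {
                    st.nextToken(); // consume the pushed-back '='
                    if (st.nextToken() == StreamTokenizer.TT_WORD) {
                        value = st.sval;
                    }
                }
                entries.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // {debug=, verbose=yes, color=no}
        System.out.println(parse("debug verbose = yes color = no"));
    }
}
```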

public void quoteChar (int ch)

Specifies that the character ch shall be treated as a quote character.

Parameters

ch	the character to be considered a quote character.
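A minimal sketch, assuming backticks should delimit literal text. The class BacktickQuotes is hypothetical. After quoteChar('`'), the text between two backticks is returned as one token whose ttype is '`' and whose sval is the body without the delimiters.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class BacktickQuotes {
    // Hypothetical helper: renders backtick-quoted tokens as "quoted:<body>".
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.quoteChar('`'); // text between backticks becomes one quoted token
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '`') {
                    out.add("quoted:" + st.sval); // body without the delimiters
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("run `ls -l` now")); // [run, quoted:ls -l, now]
    }
}
```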

public void resetSyntax ()

Specifies that all characters shall be treated as ordinary characters.
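resetSyntax() clears every default, including comment characters, quote characters and number parsing, so a syntax can be built from scratch. A minimal sketch (hypothetical class WhitespaceSplit) that splits only on white space and returns everything else as words:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class WhitespaceSplit {
    // Hypothetical helper: every run of non-space characters becomes one word.
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();           // every character is now ordinary
        st.wordChars(33, 255);      // characters 33 through 255 form words
        st.whitespaceChars(0, ' '); // control characters and space separate them
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                out.add(st.sval); // with this syntax every token is a TT_WORD
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // [a_b, 3.14, x-y, /tmp, "q"]
        System.out.println(words("a_b 3.14 x-y /tmp \"q\""));
    }
}
```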

public void slashSlashComments (boolean flag)

Specifies whether "slash-slash" (C++-style) comments shall be recognized. This kind of comment ends at the end of the line.

Parameters

flag	`true` if `//` should be recognized as the start of a comment, `false` otherwise.

public void slashStarComments (boolean flag)

Specifies whether "slash-star" (C-style) comments shall be recognized. Slash-star comments cannot be nested and end when a star-slash combination is found.

Parameters

flag	`true` if `/*` should be recognized as the start of a comment, `false` otherwise.
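For Java-like input, both comment styles are usually enabled together. Because '/' is also a comment character by default, a lone slash would still discard the rest of its line even with these flags set, so the sketch (hypothetical class JavaLikeComments) also calls ordinaryChar('/') to keep division operators.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class JavaLikeComments {
    // Hypothetical helper: skips // and /* */ comments but keeps '/' operators.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');        // a lone '/' is an operator, not a comment
        st.slashSlashComments(true); // skip "// ..." to the end of the line
        st.slashStarComments(true);  // skip "/* ... */" blocks
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("n=" + st.nval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // [int, x, =, a, /, b, ;, y]
        System.out.println(tokens("int x = a / b; // note\n/* block */ y"));
    }
}
```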

public String toString ()

Returns the state of this tokenizer in a readable format.

Returns

the current state of this tokenizer.

public void whitespaceChars (int low, int hi)

Specifies that the characters in the range from low to hi shall be treated as whitespace characters by this tokenizer.

Parameters

low	the first character in the range of whitespace characters.
hi	the last character in the range of whitespace characters.
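A minimal sketch, assuming comma-separated numbers: whitespaceChars(',', ',') makes commas behave like spaces. The class CommaSeparated is hypothetical. Consecutive commas produce no empty field, so this is not a substitute for a real CSV parser.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CommaSeparated {
    // Hypothetical helper: sums every number in a comma- or space-separated list.
    static double sum(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.whitespaceChars(',', ','); // commas now separate tokens like spaces do
        double total = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    total += st.nval;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum("1,2, 3")); // 6.0
    }
}
```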

public void wordChars (int low, int hi)

Specifies that the characters in the range from low to hi shall be treated as word characters by this tokenizer. A word consists of a word character followed by zero or more word or number characters.

Parameters

low	the first character in the range of word characters.
hi	the last character in the range of word characters.
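Since '_' is not a word character by default, identifiers that contain underscores need wordChars('_', '_'). Digits already continue a word by default, so "_tmp2" stays one token. A minimal sketch with the hypothetical class Identifiers:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class Identifiers {
    // Hypothetical helper: tokenizes with '_' added to the word characters.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.wordChars('_', '_'); // '_' may now start or continue a word
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("n=" + st.nval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("my_var = _tmp2")); // [my_var, =, _tmp2]
    }
}
```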
