Yume
yume::Tokenizer Class Reference

Contains the state while the tokenizer is running, such as the position within the file currently being read.

Public Member Functions

void tokenize ()
 
 Tokenizer (std::istream &in, const char *source_file)
 
auto tokens ()
 

Static Public Attributes

static constexpr const auto is_word
 Words consist of alphanumeric characters or underscores, and must begin with a letter or an underscore.

static constexpr const auto is_str
 Strings are delimited by double quotes " and may contain escapes.

static constexpr const auto is_char_lit
 Character literals begin with a question mark ? and may contain escapes.

static constexpr const auto is_comment
 Comments begin with an octothorpe # and last until the end of the line.

static constexpr const auto is_num_or_hex_num
 This matches both regular numbers (0-9) and hex numbers. Hex numbers begin with 0x and consist of any of 0-9, a-f or A-F. If the first character is a 0, it is ambiguous and must be checked further.

static constexpr const auto is_any_of
 Generate a criterion matching a single character that is any of those within the string checks.

static constexpr const auto is_partial
 Generate a criterion matching the first character, optionally followed by the second.

static constexpr const auto is_char
 Generate a criterion matching the single given character.
 

Detailed Description

Contains the state while the tokenizer is running, such as the position within the file currently being read.

Definition at line 55 of file token.cpp.

Constructor & Destructor Documentation

◆ Tokenizer()

yume::Tokenizer::Tokenizer ( std::istream &  in,
const char *  source_file 
)
inline

Definition at line 214 of file token.cpp.

Member Function Documentation

◆ tokenize()

void yume::Tokenizer::tokenize ( )
inline

◆ tokens()

auto yume::Tokenizer::tokens ( )
inline

Definition at line 216 of file token.cpp.

Member Data Documentation

◆ is_any_of

constexpr const auto yume::Tokenizer::is_any_of
static constexpr
Initial value:
= [](string_view checks) {
    return [checks](TokenState& state) {
        return state.index == 0 && state.accept_validate(checks.find(state.c) != string::npos);
    };
}

Generate a criterion matching a single character that is any of those within the string checks.

Definition at line 155 of file token.cpp.

Referenced by tokenize().
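
The factory pattern above (a function that returns a criterion) can be sketched in isolation. This is a minimal sketch, not the class's own code: MiniState is a hypothetical stand-in for TokenState with only the fields needed here, any_of and matches_any_of are illustrative names, and the character set is captured as a std::string rather than a string_view so that the criterion owns its data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for TokenState: only the fields this sketch needs.
struct MiniState {
  char c = '\0';          // character under inspection
  std::size_t index = 0;  // position of c within the candidate token
  bool valid = false;     // whether the characters so far form a full match
  std::string out;        // characters accepted so far

  // Append c when ok is true, record validity, and report the result.
  bool accept_validate(bool ok) {
    if (ok) out.push_back(c);
    valid = ok;
    return ok;
  }
};

// Factory: builds a criterion matching exactly one character from `checks`.
// The set is captured by value so the criterion owns its own copy.
auto any_of(std::string checks) {
  return [checks](MiniState& state) {
    return state.index == 0 &&
           state.accept_validate(checks.find(state.c) != std::string::npos);
  };
}

// Convenience driver (an assumption of this sketch): does the criterion
// accept character `c` at position `index`?
bool matches_any_of(const std::string& checks, char c, std::size_t index) {
  MiniState state;
  state.c = c;
  state.index = index;
  return any_of(checks)(state);
}
```

For example, matches_any_of("+-*/", '*', 0) accepts the character, while the same character at index 1 is rejected because the criterion only ever matches a single-character token.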

◆ is_char

constexpr const auto yume::Tokenizer::is_char
static constexpr
Initial value:
= [](char chr) {
    return [chr](TokenState& state) { return state.index == 0 && state.accept_validate(chr); };
}

Generate a criterion matching the single given character.

Definition at line 173 of file token.cpp.

Referenced by tokenize().

◆ is_char_lit

constexpr const auto yume::Tokenizer::is_char_lit
static constexpr
Initial value:
= [escape = false](TokenState& state) mutable {
    if (state.index == 0)
        return state.c == '?';
    if (state.index == 1) {
        if (state.c == '\\')
            escape = true;
        else
            state.stream.write(state.c);
        return state.validate();
    }
    if (state.index == 2 && escape) {
        state.stream.write(unescape(state.c));
        return state.validate();
    }
    return false;
}

Character literals begin with a question mark ? and may contain escapes.

Definition at line 109 of file token.cpp.

Referenced by tokenize().
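
To illustrate the literal shape described above, here is a minimal sketch written as a plain function rather than a per-character criterion. The names scan_char_literal and unescape_sketch are hypothetical, and the escape table is an assumption, since the real unescape() is not shown on this page. The sketch also assumes that a question mark followed by a lone backslash is an incomplete literal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical escape table; the real unescape() is not shown on this page.
char unescape_sketch(char c) {
  switch (c) {
    case 'n': return '\n';
    case 't': return '\t';
    case '0': return '\0';
    default:  return c;  // any other character stands for itself
  }
}

// Scans a character literal at the start of `input`.
// Returns the number of characters consumed (0 if there is no literal)
// and stores the decoded character in `value`.
std::size_t scan_char_literal(const std::string& input, char& value) {
  if (input.size() < 2 || input[0] != '?') return 0;
  if (input[1] != '\\') {  // plain literal such as ?a
    value = input[1];
    return 2;
  }
  if (input.size() < 3) return 0;  // assumed: a lone backslash is incomplete
  value = unescape_sketch(input[2]);
  return 3;
}
```

Under these assumptions, ?a consumes two characters and yields 'a', while ?\n consumes three and yields a newline.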

◆ is_comment

constexpr const auto yume::Tokenizer::is_comment
static constexpr
Initial value:
= [](TokenState& state) {
    if (state.index == 0)
        return state.accept_validate('#');
    return state.accept_validate(state.c != '\n');
}

Comments begin with an octothorpe # and last until the end of the line.

Definition at line 128 of file token.cpp.

Referenced by tokenize().
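
The comment rule above amounts to "start at #, stop before the newline". A minimal sketch as a standalone helper follows; comment_length is a hypothetical name and not part of the class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the length of the comment starting at `pos`, or 0 if there is none.
// The terminating newline is not counted as part of the comment.
std::size_t comment_length(const std::string& input, std::size_t pos) {
  if (pos >= input.size() || input[pos] != '#') return 0;
  std::size_t end = pos;
  while (end < input.size() && input[end] != '\n') ++end;
  return end - pos;
}
```

A comment on the last line of a file has no trailing newline, so the helper also stops at the end of the input.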

◆ is_num_or_hex_num

constexpr const auto yume::Tokenizer::is_num_or_hex_num
static constexpr
Initial value:
= [possibly_hex = false](TokenState& state) mutable {
    if (state.index == 0 && state.c == '0') {
        possibly_hex = true;
        return state.accept_validate(true);
    }
    if (possibly_hex && state.index == 1) {
        if (state.c == 'x') {
            state.valid = false;
            return state.accept();
        }
        possibly_hex = false;
    }
    if (possibly_hex)
        return state.accept_validate(llvm::isHexDigit);
    return state.accept_validate(llvm::isDigit);
}

This matches both regular numbers (0-9) and hex numbers. Hex numbers begin with 0x and consist of any of 0-9, a-f or A-F. If the first character is a 0, it is ambiguous and must be checked further.

Definition at line 136 of file token.cpp.

Referenced by tokenize().
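
The leading-zero ambiguity can be traced with a small stand-alone sketch. Everything here is an assumption made for illustration: MiniState is a hypothetical stand-in for TokenState, std::isdigit and std::isxdigit are swapped in for the LLVM helpers, and number_length is a made-up driver that reports the longest prefix that left the state valid. The criterion is produced by a factory so that each token starts with a fresh possibly_hex flag.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for TokenState: only the fields this sketch needs.
struct MiniState {
  char c = '\0';
  std::size_t index = 0;
  bool valid = false;
  bool accept_validate(bool ok) { valid = ok; return ok; }
};

// Returns a fresh criterion so each token starts with possibly_hex reset.
auto make_number_criterion() {
  return [possibly_hex = false](MiniState& state) mutable {
    const unsigned char uc = static_cast<unsigned char>(state.c);
    if (state.index == 0 && state.c == '0') {
      possibly_hex = true;  // could be a plain 0 or the start of 0x...
      return state.accept_validate(true);
    }
    if (possibly_hex && state.index == 1) {
      if (state.c == 'x') {
        state.valid = false;  // "0x" on its own is not yet a number
        return true;          // but keep consuming
      }
      possibly_hex = false;  // ordinary number with a leading zero
    }
    if (possibly_hex) return state.accept_validate(std::isxdigit(uc) != 0);
    return state.accept_validate(std::isdigit(uc) != 0);
  };
}

// Driver (an assumption of this sketch): length of the longest valid prefix.
std::size_t number_length(const std::string& input) {
  auto criterion = make_number_criterion();
  MiniState state;
  std::size_t best = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    state.c = input[i];
    state.index = i;
    if (!criterion(state)) break;
    if (state.valid) best = i + 1;
  }
  return best;
}
```

With this driver, the input 0x followed by no hex digit falls back to the single valid character 0, because the state is marked invalid after the x until a hex digit arrives.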

◆ is_partial

constexpr const auto yume::Tokenizer::is_partial
static constexpr
Initial value:
= [](char c1, char c2) {
    return [c1, c2](TokenState& state) {
        if (state.index == 0)
            return state.accept_validate(c1);
        if (state.index == 1)
            return state.accept_validate(c2);
        return false;
    };
}

Generate a criterion matching the first character, optionally followed by the second.

Definition at line 162 of file token.cpp.

Referenced by tokenize().
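
A two-character criterion of this kind is useful for operators that have a one-character and a two-character form. The following is a minimal sketch under stated assumptions: MiniState is a hypothetical stand-in for TokenState, and partial and longest_match are illustrative names. How the real tokenize() picks the final token length is not shown on this page, so the driver's "longest valid prefix" rule is a guess.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for TokenState: only the fields this sketch needs.
struct MiniState {
  char c = '\0';
  std::size_t index = 0;
  bool valid = false;
  bool accept_validate(bool ok) { valid = ok; return ok; }
};

// Factory: matches c1 alone, or c1 immediately followed by c2.
auto partial(char c1, char c2) {
  return [c1, c2](MiniState& state) {
    if (state.index == 0) return state.accept_validate(state.c == c1);
    if (state.index == 1) return state.accept_validate(state.c == c2);
    return false;  // never longer than two characters
  };
}

// Driver (an assumption of this sketch): feed characters one at a time and
// return the length of the longest prefix that left the state valid.
template <typename Criterion>
std::size_t longest_match(Criterion criterion, const std::string& input) {
  MiniState state;
  std::size_t best = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    state.c = input[i];
    state.index = i;
    if (!criterion(state)) break;
    if (state.valid) best = i + 1;
  }
  return best;
}
```

With partial('=', '='), the input ==x matches two characters and =x matches one, which is the "first character, optionally followed by the second" behaviour.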

◆ is_str

constexpr const auto yume::Tokenizer::is_str
static constexpr
Initial value:
= [end = false, escape = false](TokenState& state) mutable {
    if (end)
        return false;
    if (state.index == 0)
        return state.c == '"';
    if (state.c == '\\' && !escape) {
        escape = true;
    } else if (state.c == '"' && !escape && !end) {
        end = true;
        state.validate();
    } else if (escape) {
        state.stream.write(unescape(state.c));
        escape = false;
    } else {
        state.stream.write(state.c);
    }
    return true;
}

Strings are delimited by double quotes " and may contain escapes.

Definition at line 87 of file token.cpp.

Referenced by tokenize().
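
The escape handling above can be sketched as a plain loop over the input, which is a restructuring for illustration and not the per-character form used by the class. The names scan_string and unescape_sketch are hypothetical, and the escape table is an assumption because the real unescape() is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical escape table; the real unescape() is not shown on this page.
char unescape_sketch(char c) {
  switch (c) {
    case 'n': return '\n';
    case 't': return '\t';
    case '0': return '\0';
    default:  return c;  // covers escaped quotes and escaped backslashes
  }
}

// Scans a double-quoted string literal at the start of `input`.
// On success returns true and stores the unescaped contents in `out`.
bool scan_string(const std::string& input, std::string& out) {
  out.clear();
  if (input.empty() || input[0] != '"') return false;
  bool escape = false;
  for (std::size_t i = 1; i < input.size(); ++i) {
    const char c = input[i];
    if (escape) {
      out.push_back(unescape_sketch(c));  // character after a backslash
      escape = false;
    } else if (c == '\\') {
      escape = true;  // the backslash itself is not written
    } else if (c == '"') {
      return true;  // unescaped closing quote ends the literal
    } else {
      out.push_back(c);
    }
  }
  return false;  // ran out of input before the closing quote
}
```

As in the documented criterion, the delimiting quotes are not written to the output, and an escaped quote does not terminate the string.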

◆ is_word

constexpr const auto yume::Tokenizer::is_word
static constexpr
Initial value:
= [](TokenState& state) {
    if (state.index == 0)
        return state.accept_validate(llvm::isAlpha(state.c) || state.c == '_');
    return state.accept_validate(llvm::isAlnum(state.c) || state.c == '_');
}

Words consist of alphanumeric characters or underscores, and must begin with a letter or an underscore.

Definition at line 80 of file token.cpp.

Referenced by tokenize().
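
The word rule in the initial value above treats the first character differently from the rest. A minimal sketch of the same rule as a standalone helper follows; leading_word is a hypothetical name, and std::isalpha and std::isalnum are swapped in for the LLVM helpers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the longest word at the start of `input`, or an empty string.
std::string leading_word(const std::string& input) {
  std::size_t len = 0;
  for (; len < input.size(); ++len) {
    // Cast to unsigned char before calling the <cctype> classifiers.
    const unsigned char c = static_cast<unsigned char>(input[len]);
    const bool ok = (len == 0) ? (std::isalpha(c) || c == '_')   // first char
                               : (std::isalnum(c) || c == '_');  // the rest
    if (!ok) break;
  }
  return input.substr(0, len);
}
```

Digits are allowed anywhere except the first position, so foo_1 is a word but 1abc yields no match.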


The documentation for this class was generated from the following file: