Added section about input files
Commit eb93293551 (parent bb855836f6)
## Getting started

Now that Lexesis is successfully built and your terminal is in the `build` folder, it's time to generate a lexer from your input file.

### The input file

Input files for Lexesis have a `.lxs` extension and follow a few simple rules:

- Each line specifies a new token type. Its position sets its priority: the line at the top of the file has the highest priority and the line at the bottom has the lowest.
- If a piece of input matches more than one of the regexes in the input file, the generated lexer chooses the token type with the highest priority.
- Each line begins with the name of the token type, followed by a `=` and finally the regex used to match tokens of that type.
- Lines starting with a `#` are comments, and Lexesis ignores them.
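
To make these rules concrete, here is a minimal Python sketch of how a file in this format could be read into an ordered list of token definitions. It is not Lexesis's own parser. The names `parse_lxs` and `NAME_RE` are hypothetical, and several details are assumptions rather than documented behaviour: blank lines are skipped, the name and regex are split on the first `=`, surrounding whitespace is stripped, and names are limited to letters and underscores.

```python
import re

# Hypothetical helper, not Lexesis code: assumes token names may only
# contain ASCII letters and underscores.
NAME_RE = re.compile(r"[A-Za-z_]+")


def parse_lxs(text):
    """Parse .lxs-style text into (name, regex) pairs, highest priority first."""
    rules = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Comment lines start with '#'; blank lines are also skipped (assumption).
        if line.startswith("#") or not line.strip():
            continue
        # Split on the first '=' only and strip surrounding whitespace (assumption).
        name, sep, regex = line.partition("=")
        name, regex = name.strip(), regex.strip()
        if not sep or not NAME_RE.fullmatch(name) or not regex:
            raise ValueError(f"line {lineno}: expected 'Name = regex', got {line!r}")
        # File order is priority order, so earlier entries win.
        rules.append((name, regex))
    return rules


EXAMPLE = """\
Capital_letters = [A-Z]
Numbers = [0-9]

# This is a comment
All_letters = [a-zA-Z]
"""

for priority, (name, regex) in enumerate(parse_lxs(EXAMPLE)):
    print(priority, name, regex)
```

Run on the sample file from this section, the loop prints the three token types in file order, with `Capital_letters` at index 0 (the highest priority). A name such as `Bad-name` is rejected with a `ValueError`.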

Consider the following example:

```
Capital_letters = [A-Z]
Numbers = [0-9]

# This is a comment
All_letters = [a-zA-Z]
```

Here we have three different token types: `Capital_letters`, `Numbers` and `All_letters`.
Note that token names may only consist of uppercase letters, lowercase letters and underscores; other characters are not accepted.
When we run `A` through the generated lexer, it will report a `Capital_letters` token, because `Capital_letters` is specified above `All_letters` and therefore has a higher priority.
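
The priority rule can be illustrated with a short Python sketch that classifies a single string by trying the example's regexes from top to bottom. This only demonstrates the ordering and is not how the generated lexer works. It uses Python's `re` module, assumes the example's character classes behave the same there as in Lexesis, and checks whole strings instead of splitting a longer input into tokens. The names `RULES` and `classify` are hypothetical.

```python
import re

# Token definitions from the example file, in file order (highest priority first).
RULES = [
    ("Capital_letters", "[A-Z]"),
    ("Numbers", "[0-9]"),
    ("All_letters", "[a-zA-Z]"),
]


def classify(text, rules=RULES):
    """Return the name of the highest-priority rule whose regex matches
    the whole of `text`, or None if no rule matches."""
    for name, regex in rules:
        # Rules are tried top to bottom, so the first full match is the
        # one with the highest priority.
        if re.fullmatch(regex, text):
            return name
    return None


print(classify("A"))  # Capital_letters (All_letters also matches, but ranks lower)
print(classify("a"))  # All_letters
print(classify("7"))  # Numbers
print(classify("!"))  # None
```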

### Regular expressions