Added section about input files
parent
bb855836f6
commit
eb93293551
23
README.md
@@ -65,6 +65,29 @@ in the project root, and watch the results.
## Getting started

Now that Lexesis is built and your terminal is in the `build` folder, you can generate a lexer from your input file.

### The input file

Input files for Lexesis use the `.lxs` extension and follow a few simple rules:
- Each line defines a new token type. Priority follows the order in the file: the first line has the highest priority and the last line the lowest.
- If the input matches more than one regex in the file, the generated lexer chooses the token with the highest priority.
- A line starts with the name of the token type, followed by a `=` and then the regex that matches tokens of that type.
- To add a comment, start the line with a `#`; Lexesis ignores that line.

Consider the following example:

```
Capital_letters = [A-Z]
Numbers = [0-9]

# This is a comment
All_letters = [a-zA-Z]
```

Here we have three token types: `Capital_letters`, `Numbers` and `All_letters`.
Note that token names may only consist of uppercase letters, lowercase letters and underscores; other characters are not accepted.
When we run `A` through the generated lexer, it reports a `Capital_letters` token, since `Capital_letters` is specified higher in the file than `All_letters`.
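
To make the priority rule concrete, here is a minimal Python sketch of the behaviour described above. It is not how Lexesis is implemented, and it does not use any Lexesis API: the names `parse_rules` and `classify` are hypothetical, Python's `re` module stands in for Lexesis's own regex dialect, and tolerating whitespace around `=` and before `#` is an assumption.

```python
import re

# Token names: letters and underscores only, as stated in the rules above.
# Allowing optional whitespace around "=" is an assumption of this sketch.
RULE = re.compile(r"^([A-Za-z_]+)\s*=\s*(.+)$")


def parse_rules(text):
    """Return (name, regex) pairs in file order, highest priority first."""
    rules = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (leading whitespace is tolerated here).
        if not stripped or stripped.startswith("#"):
            continue
        match = RULE.match(stripped)
        if match is None:
            raise ValueError(f"invalid rule: {line!r}")
        rules.append((match.group(1), match.group(2)))
    return rules


def classify(rules, text):
    """Return the name of the first rule whose regex matches all of text."""
    for name, pattern in rules:
        if re.fullmatch(pattern, text):
            return name
    return None


EXAMPLE = """\
Capital_letters = [A-Z]
Numbers = [0-9]

# This is a comment
All_letters = [a-zA-Z]
"""

rules = parse_rules(EXAMPLE)
print(classify(rules, "A"))  # Capital_letters
print(classify(rules, "a"))  # All_letters
print(classify(rules, "7"))  # Numbers
```

Because the rules are kept in file order and the first match wins, `A` is reported as `Capital_letters` even though `All_letters` also matches it.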
### Regular expressions