Added section about input files

Thomas Ave 2016-05-29 18:57:55 +02:00
parent bb855836f6
commit eb93293551
1 changed files with 23 additions and 0 deletions

@@ -65,6 +65,29 @@ in the project root, and watch the results.
## Getting started
Now that Lexesis is successfully built and your terminal is in the `build` folder, it's time to generate the lexer based on your input file.
### The input file
Input files for Lexesis have a `.lxs` extension and follow a few simple rules:
Each line specifies a new type of token. Priority follows the order of the lines: the token at the top of the file has the highest priority and the one at the bottom the lowest.
If your input matches more than one of the regexes in your input file, the generated lexer will choose the token with the highest priority.
Each line begins with the name of the new token type, followed by a `=` and finally the regex used to match tokens of that type.
If you want to add a comment to the file, make sure the line starts with a `#` and Lexesis will ignore that line.
Consider the following example:
```
Capital_letters = [A-Z]
Numbers = [0-9]
# This is a comment
All_letters = [a-zA-Z]
```
Here we have three different token types: `Capital_letters`, `Numbers` and `All_letters`.
Note that token names may consist only of capital letters, small letters and underscores; other characters are not accepted.
When we run `A` through the generated lexer, it reports `Capital_letters`, since that rule appears higher in the file than `All_letters`, which also matches.
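To make the rules above concrete, here is a minimal Python sketch of how such a file could be read and how priority decides between overlapping rules. It is only an illustration, not how Lexesis itself works: the helper names `parse_lxs` and `classify` are hypothetical, Python's `re` module stands in for Lexesis' own regex syntax, and skipping blank lines and trimming whitespace around `=` are assumptions.

```python
import re

# Token names: capital letters, small letters and underscores only.
NAME_RE = re.compile(r"[A-Za-z_]+")


def parse_lxs(text):
    """Return (name, regex) pairs in file order, i.e. highest priority first."""
    rules = []
    for line in text.splitlines():
        # Assumption: blank lines are skipped; lines starting with '#' are comments.
        if not line.strip() or line.startswith("#"):
            continue
        # Split at the first '=' so the regex itself may contain '='.
        name, sep, regex = line.partition("=")
        name, regex = name.strip(), regex.strip()
        if not sep or not NAME_RE.fullmatch(name):
            raise ValueError(f"invalid rule: {line!r}")
        rules.append((name, regex))
    return rules


def classify(rules, text):
    """Return the name of the first (highest-priority) rule matching all of text."""
    for name, regex in rules:
        if re.fullmatch(regex, text):
            return name
    return None


EXAMPLE = """\
Capital_letters = [A-Z]
Numbers = [0-9]
# This is a comment
All_letters = [a-zA-Z]
"""

rules = parse_lxs(EXAMPLE)
print(classify(rules, "A"))  # Capital_letters (also matches All_letters, but is listed first)
print(classify(rules, "a"))  # All_letters
print(classify(rules, "7"))  # Numbers
```

Because the rules are kept in file order, picking the first match is the same as picking the highest-priority token. Note that this sketch only classifies a single, complete string; it does not split a longer input into a stream of tokens the way a generated lexer does.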
### Regular expressions