Added section about input files

2016-05-29 18:57:55 +02:00 · 2016-05-29 18:57:55 +02:00 · eb93293551
parent bb855836f6
commit eb93293551
1 changed files with 23 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -65,6 +65,29 @@ in the project root, and watch the results.

 ## Getting started

+Now that Lexesis is successfully built and your terminal is in the `build` folder, it's time to generate the lexer based on your input file.
+
+### The input file
+
+Input files for Lexesis have a `.lxs` extension and follow a few simple rules:
+Each line specifies a new token type. Priority follows file order: the first line has the highest priority and the last line the lowest.
+If your input matches more than one of the regexes in your input file, the generated lexer chooses the token with the highest priority.
+Each line begins with the name of the token type, followed by a `=` and then the regex used to match tokens of that type.
+To add a comment to the file, start the line with a `#` and Lexesis will ignore that line.
+
+Consider the following example:
+
+```
+Capital_letters = [A-Z]
+Numbers = [0-9]
+
+# This is a comment
+All_letters = [a-zA-Z]
+```
+
+Here we have three different tokens: `Capital_letters`, `Numbers` and `All_letters`.
+Note that token names may consist only of uppercase letters, lowercase letters and underscores; other characters are not accepted.
+When we run `A` through the generated lexer, it reports a `Capital_letters` token, since that rule is specified above `All_letters`.
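
The priority rule described above can be illustrated with a small Python sketch. This is not how Lexesis itself works internally, and the helper names `parse_rules` and `classify` are hypothetical. It assumes that blank lines are skipped (as in the example file) and uses Python's `re` module, whose regex syntax may differ from the dialect Lexesis accepts.

```python
import re

# Hypothetical helpers that mimic the rules described above;
# this is not Lexesis's real implementation.
NAME_PATTERN = re.compile(r"[A-Za-z_]+")

def parse_rules(text):
    """Return (name, compiled_regex) pairs in priority order (top = highest)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, sep, regex = line.partition("=")
        name, regex = name.strip(), regex.strip()
        # Names may only contain letters and underscores.
        if not sep or not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid rule: {line!r}")
        rules.append((name, re.compile(regex)))
    return rules

def classify(rules, text):
    """Return the name of the highest-priority rule that matches all of `text`."""
    for name, pattern in rules:
        if pattern.fullmatch(text):
            return name
    return None  # no rule matched

EXAMPLE = """\
Capital_letters = [A-Z]
Numbers = [0-9]

# This is a comment
All_letters = [a-zA-Z]
"""

rules = parse_rules(EXAMPLE)
print(classify(rules, "A"))  # Capital_letters (listed above All_letters)
print(classify(rules, "a"))  # All_letters
print(classify(rules, "7"))  # Numbers
```

Because the rules are tried from top to bottom, moving `All_letters` to the first line of the file would make `A` come out as `All_letters` instead.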

 ### Regular expressions