Parsodus/README.md

Parsodus
=======
A language agnostic parser generator

## Table Of Contents
* [Introduction](#introduction)
* [Requirements](#requirements)
* [Used dependencies](#used-dependencies)
* [Building](#building)
* [Getting started](#getting-started)
* [More examples](#more-examples)
* [Authors](#authors)
* [License](#license)

## Introduction
Parsodus is a language agnostic parser generator. Which means that it uses a description of a grammar in a kind of BNF notation, and outputs source files for a parser (which can be in any language for which a backend has been built, currently only c++), using a specified parsing algorithm (currently 4 kinds of LR parser have been implemented). This parser can then be augmented with rule handling to build an abstract structure representing the data, doing computations immediately, or anything else you can imagine.
It's principle is very similar to the well known tools such as [yacc](https://en.wikipedia.org/wiki/Yacc) or [bison](https://www.gnu.org/software/bison/), which the difference that Parsodus has a simpler input format, and does not depend on language specific actions to be specified in the configuration file. It uses a programming language independent description of the grammar, along with optional naming for the rules, in order to allow a bigger reusability across different programming languages of the same parser specification.

This project came into existence as an application exercise in a course on languages and turing machines for the University of Antwerp, and can be considered a continuation of Lexesis (a lexical analyser generator).

## Requirements
* git
* CMake 3.2.2+
* Boost variant header library (needed for mstch)
* Doxygen (optional, needed for building documentation)

For those still on *Ubuntu Trusty*, the default cmake version is still 2.8.12, so there is a ppa available with a more up-to-date version.

Run

    sudo apt-get update && sudo apt-get -y install software-properties-common; \
    sudo add-apt-repository -y ppa:george-edison55/cmake-3.x; \
    sudo apt-get update && sudo apt-get install -y cmake

to get this newer version

Boost variant can be installed on an ubuntu machine by running

    sudo apt-get update && sudo apt-get -y install libboost-dev

## Used dependencies
The following dependencies will be automatically downloaded with git while building

* [mstch](https://github.com/no1msd/mstch)
* [optparse](https://github.com/myint/optparse)
* [g3log](https://github.com/KjellKod/g3log)
* Lexesis

## Building
Get your terminal in the source tree and run the following commands:

    mkdir build
    cd build
    cmake ..
    make
    sudo make install

You can now simply run `Parsodus` with the arguments you like (see below and in the man pages for an overview).

If you want to build the documentation as well, simply run

    make doc

The output should be located in `build/doc`, with the main *html* page in `build/doc/html/index.html`.

### Running tests


First, build Parsodus in debug mode. The first difference with the normal building, is the line where you call cmake. That line should read

    cmake .. -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=.

instead. Afterwards, after calling make install (which now installs locally in the build folder, so you don't need the sudo), build all the examples as well with

    cmake .
    make examples

since they are used to test the functionality of the generated parsers.

To run the unit tests: simply run `make test`
To run the tests for the generated parser: build the examples and run `python3 ./run_tests.py` in the project root

### Building examples

To build the examples: after running `make install`, run `cmake . && make examples`
You will now find the examples built in the *example* subdirectory of the build folder

## Getting started

Now that Parsodus is successfully built and your terminal is in the `build` folder, it's time to generate the parser based on your input file.

### The input file

Input files for Parsodus have a `.pds` extension and have a set of some very simple rules:
Variables in the grammar follow the regular expression `<[a-zA-Z_][a-zA-Z0-9_]*>`, and terminals use the same scheme, except using double quotes instead of angular brackets.

Furthermore, Parsodus uses a couple of key-value associations, including

- *parser*: the parsing algorithm to use
- *terminals*: a whitespace separated list of terminals
- *lexesis* (optional): a reference to a lexesis specification file. If given, terminals will be read from the lexesis file, and should as such not be specified separately in this file.
- *precedence* (optional): a whitespace separated list of `left`, `right`, or `nonassoc` followed by terminals, higher up is a higher precedence
- start: a variable to use as the start symbol
- grammar: a list of rules (see below)

A grammar rule is a variable followed by `::=` followed by a `|`-separated list of rule tails ended with a semicolon. A rule tail is a list of variables and terminals followed by an optional rule name of the form `[name]`. These annotations can additionally contain a specific precedence for this rule (a common example for this is the unary minus in mathematical expressions) in the form of a precedence specifier (`left`, `right` or `nonassoc`) followed by a non-negative integer. This integer specifies the precedence level for that rule (higher means higher precedence). Otherwise, the precedence for a rule is determined by the precedence of its last terminal. The lowest precedence in the *precedence*  section of the file has precedence level 0.

    parser: lalr(1)
    terminals:
        "A"
    start: <s>
    grammar:
        <s> ::= "A" [single]
              | "A" "A" [double]
              ;

We are building an LALR(1) parser, with replacement rules, both starting from the start-symbol `<s>`, named appropriately `single` and `double`.

Conventionally, terminals are all caps, while variables are lowercase.

Comments are from a `#` to the end of the line.

### Using the parser

Of course, how you use the generated parser highly depends on which backend you used to generate it. For the default c++ backend however, the easiest way of getting to know the parser is probably having a look at the class definition in the generated header file, usually named *&lt;Parsername&gt;.h*.
In general, there should be some way to run the parser, along with user defined actions, and get back the generated structure or abstract syntax tree.

### Debugging the parser

Nobody is perfect, so it is unlikely a grammar will always be completely conflict free from the very first time. It will happen on several occasions that Parsodus will notify you of a problem with your and you're unable to simply see where things go wrong. Or maybe the parser is behaving differently than you would expect. That's why there is a command line option to Parsodus that allows you to inspect the decisions made by the generator.

For the *LR* based parsers, that means that the information about the action and goto table is written, along with information about the configuration sets leading to the actual parser states. Conflicting actions are indicated with a leading exclamation mark, to be able to easily find the exact source of your troubles.

`Parsodus --debug` emits a *debug.log* file to the same output directory as the generated parser.

### Error recovery

If you provide a rule with `<error>` somewhere inside the body, this can be used to recover from errors. The generated parser will discard states until it encounters somewhere it can use such a rule. Afterwards, it will start discarding tokens until it can proceed in parsing. If it's impossible to recover from the error (cannot find a recovery rule or cannot find a synchronizing token before EOF), there will generally be an error (depending on the backend). Usually, some sort of reporting mechanism should also be in place by the generated parser.

`<error>` should never appear as the head of a grammar rule.

### Bash completion

A file that provides bash completion is also provided for your convenience, to use it, do:

for a single session: `source Parsodus-completion.bash`
If you want it always, copy the file to */etc/bash_completion.d* or your distro's equivalent location, and start a new shell.

## More examples
More examples can be found in the *examples* subdirectory, go ahead an have a look at them.
Feel free to play around and experiment with them.

## Authors
* Thomas Avé
* Robin Jadoul
* Kobe Wullaert

## License
This program is distributed under the terms of the MIT license. The generated code falls under the permissive zlib/libpng license.
First README version 2017-01-16 14:42:40 +01:00			`Parsodus`
			`=======`
			`A language agnostic parser generator`

			`## Table Of Contents`
			`* [Introduction](#introduction)`
			`* [Requirements](#requirements)`
Updated table of contents 2017-01-30 16:54:29 +01:00			`* [Used dependencies](#used-dependencies)`
First README version 2017-01-16 14:42:40 +01:00			`* [Building](#building)`
			`* [Getting started](#getting-started)`
			`* [More examples](#more-examples)`
			`* [Authors](#authors)`
Updated table of contents 2017-01-30 16:54:29 +01:00			`* [License](#license)`
First README version 2017-01-16 14:42:40 +01:00
			`## Introduction`
README fix 2017-01-18 11:51:11 +01:00			Parsodus is a language agnostic parser generator. Which means that it uses a description of a grammar in a kind of BNF notation, and outputs source files for a parser (which can be in any language for which a backend has been built, currently only c++), using a specified parsing algorithm (currently 4 kinds of LR parser have been implemented). This parser can then be augmented with rule handling to build an abstract structure representing the data, doing computations immediately, or anything else you can imagine.
First README version 2017-01-16 14:42:40 +01:00			It's principle is very similar to the well known tools such as [yacc](https://en.wikipedia.org/wiki/Yacc) or [bison](https://www.gnu.org/software/bison/), which the difference that Parsodus has a simpler input format, and does not depend on language specific actions to be specified in the configuration file. It uses a programming language independent description of the grammar, along with optional naming for the rules, in order to allow a bigger reusability across different programming languages of the same parser specification.

			`This project came into existence as an application exercise in a course on languages and turing machines for the University of Antwerp, and can be considered a continuation of Lexesis (a lexical analyser generator).`

			`## Requirements`
			`* git`
			`* CMake 3.2.2+`
			`* Boost variant header library (needed for mstch)`
			`* Doxygen (optional, needed for building documentation)`

			`For those still on Ubuntu Trusty, the default cmake version is still 2.8.12, so there is a ppa available with a more up-to-date version.`

			`Run`

			`sudo apt-get update && sudo apt-get -y install software-properties-common; \`
			`sudo add-apt-repository -y ppa:george-edison55/cmake-3.x; \`
			`sudo apt-get update && sudo apt-get install -y cmake`

			`to get this newer version`

			`Boost variant can be installed on an ubuntu machine by running`

			`sudo apt-get update && sudo apt-get -y install libboost-dev`

			`## Used dependencies`
			`The following dependencies will be automatically downloaded with git while building`

			`* [mstch](https://github.com/no1msd/mstch)`
			`* [optparse](https://github.com/myint/optparse)`
G3log from CMake 2017-01-26 20:59:17 +01:00			`* [g3log](https://github.com/KjellKod/g3log)`
Readme update 2017-01-26 16:12:41 +01:00			`* Lexesis`
First README version 2017-01-16 14:42:40 +01:00
			`## Building`
			`Get your terminal in the source tree and run the following commands:`

			`mkdir build`
			`cd build`
			`cmake ..`
			`make`
Install step 2017-01-30 16:21:31 +01:00			`sudo make install`
First README version 2017-01-16 14:42:40 +01:00
Install step 2017-01-30 16:21:31 +01:00			You can now simply run `Parsodus` with the arguments you like (see below and in the man pages for an overview).
First README version 2017-01-16 14:42:40 +01:00
			`If you want to build the documentation as well, simply run`

			`make doc`

			The output should be located in `build/doc`, with the main html page in `build/doc/html/index.html`.

			`### Running tests`

Install step 2017-01-30 16:21:31 +01:00
			`First, build Parsodus in debug mode. The first difference with the normal building, is the line where you call cmake. That line should read`

			`cmake .. -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=.`

			`instead. Afterwards, after calling make install (which now installs locally in the build folder, so you don't need the sudo), build all the examples as well with`

			`cmake .`
			`make examples`

			`since they are used to test the functionality of the generated parsers.`

Readme update 2017-01-26 16:12:41 +01:00			To run the unit tests: simply run `make test`
			To run the tests for the generated parser: build the examples and run `python3 ./run_tests.py` in the project root

			`### Building examples`

			To build the examples: after running `make install`, run `cmake . && make examples`
			`You will now find the examples built in the example subdirectory of the build folder`

First README version 2017-01-16 14:42:40 +01:00			`## Getting started`

			Now that Parsodus is successfully built and your terminal is in the `build` folder, it's time to generate the parser based on your input file.

			`### The input file`

README fix 2017-01-18 11:51:11 +01:00			Input files for Parsodus have a `.pds` extension and have a set of some very simple rules:
First README version 2017-01-16 14:42:40 +01:00			Variables in the grammar follow the regular expression `<[a-zA-Z_][a-zA-Z0-9_]*>`, and terminals use the same scheme, except using double quotes instead of angular brackets.

			`Furthermore, Parsodus uses a couple of key-value associations, including`

			`- parser: the parsing algorithm to use`
			`- terminals: a whitespace separated list of terminals`
			`- lexesis (optional): a reference to a lexesis specification file. If given, terminals will be read from the lexesis file, and should as such not be specified separately in this file.`
			- precedence (optional): a whitespace separated list of `left`, `right`, or `nonassoc` followed by terminals, higher up is a higher precedence
			`- start: a variable to use as the start symbol`
			`- grammar: a list of rules (see below)`

Document rule precedence + slight clarification for error recovery 2017-01-30 20:57:18 +01:00			A grammar rule is a variable followed by `::=` followed by a `\|`-separated list of rule tails ended with a semicolon. A rule tail is a list of variables and terminals followed by an optional rule name of the form `[name]`. These annotations can additionally contain a specific precedence for this rule (a common example for this is the unary minus in mathematical expressions) in the form of a precedence specifier (`left`, `right` or `nonassoc`) followed by a non-negative integer. This integer specifies the precedence level for that rule (higher means higher precedence). Otherwise, the precedence for a rule is determined by the precedence of its last terminal. The lowest precedence in the precedence section of the file has precedence level 0.
First README version 2017-01-16 14:42:40 +01:00
			`parser: lalr(1)`
			`terminals:`
			`"A"`
			`start: <s>`
			`grammar:`
			`<s> ::= "A" [single]`
			`\| "A" "A" [double]`
			`;`

			We are building an LALR(1) parser, with replacement rules, both starting from the start-symbol `<s>`, named appropriately `single` and `double`.

			`Conventionally, terminals are all caps, while variables are lowercase.`

Add comments to the input file 2017-01-28 11:56:27 +01:00			Comments are from a `#` to the end of the line.

First README version 2017-01-16 14:42:40 +01:00			`### Using the parser`

Readme update 2017-01-26 16:12:41 +01:00			`Of course, how you use the generated parser highly depends on which backend you used to generate it. For the default c++ backend however, the easiest way of getting to know the parser is probably having a look at the class definition in the generated header file, usually named <Parsername>.h.`
First README version 2017-01-16 14:42:40 +01:00			`In general, there should be some way to run the parser, along with user defined actions, and get back the generated structure or abstract syntax tree.`

Document debugging 2017-01-31 13:48:19 +01:00			`### Debugging the parser`

			`Nobody is perfect, so it is unlikely a grammar will always be completely conflict free from the very first time. It will happen on several occasions that Parsodus will notify you of a problem with your and you're unable to simply see where things go wrong. Or maybe the parser is behaving differently than you would expect. That's why there is a command line option to Parsodus that allows you to inspect the decisions made by the generator.`

			`For the LR based parsers, that means that the information about the action and goto table is written, along with information about the configuration sets leading to the actual parser states. Conflicting actions are indicated with a leading exclamation mark, to be able to easily find the exact source of your troubles.`

			`Parsodus --debug` emits a debug.log file to the same output directory as the generated parser.

Document error recovery 2017-01-30 17:49:00 +01:00			`### Error recovery`

			If you provide a rule with `<error>` somewhere inside the body, this can be used to recover from errors. The generated parser will discard states until it encounters somewhere it can use such a rule. Afterwards, it will start discarding tokens until it can proceed in parsing. If it's impossible to recover from the error (cannot find a recovery rule or cannot find a synchronizing token before EOF), there will generally be an error (depending on the backend). Usually, some sort of reporting mechanism should also be in place by the generated parser.

Document rule precedence + slight clarification for error recovery 2017-01-30 20:57:18 +01:00			`<error>` should never appear as the head of a grammar rule.

Document bash completion 2017-01-30 17:43:04 +01:00			`### Bash completion`

			`A file that provides bash completion is also provided for your convenience, to use it, do:`

			for a single session: `source Parsodus-completion.bash`
			`If you want it always, copy the file to /etc/bash_completion.d or your distro's equivalent location, and start a new shell.`

First README version 2017-01-16 14:42:40 +01:00			`## More examples`
			`More examples can be found in the examples subdirectory, go ahead an have a look at them.`
			`Feel free to play around and experiment with them.`

			`## Authors`
			`* Thomas Avé`
			`* Robin Jadoul`
			`* Kobe Wullaert`
Add license information 2017-01-30 16:34:56 +01:00
			`## License`
			`This program is distributed under the terms of the MIT license. The generated code falls under the permissive zlib/libpng license.`