Grammar Syntax
Complete reference for the Galore grammar DSL
Galore uses an extended BNF syntax for defining grammars. This page is the complete syntax reference.
Basic Rule Syntax
A grammar rule defines a non-terminal and its possible productions:
NonTerminal -> production1 | production2 | ... ;
Key points:
- Rules end with a semicolon (;)
- Productions are separated by | (pipe)
- You can also use : instead of ->
Example
Expr -> Expr "+" Term | Term ;
Term -> Term "*" Factor | Factor ;
Factor -> "(" Expr ")" | NUMBER ;
Symbols
Non-Terminals
Non-terminals are symbols that have productions. By convention, they start with an uppercase letter:
Expr, Term, Statement, Program
Any identifier that appears on the left side of a rule (->) is automatically recognized as a non-terminal.
Terminals (Tokens)
Terminals can be defined in several ways:
| Type | Syntax | Example |
|---|---|---|
| String literal | "..." or '...' | "if", "+", 'while' |
| Regex pattern | /pattern/flags | /[0-9]+/, /[a-zA-Z_]\w*/ |
| Named token | %token NAME /pattern/ | %token NUMBER /[0-9]+/ |
Inline Tokens
String literals and regex patterns used directly in rules are automatically added to the lexer:
// These create tokens automatically
Stmt -> "if" "(" Expr ")" Stmt ;
Number -> /[0-9]+/ ;
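To make "automatically added to the lexer" concrete, here is a minimal standalone Python sketch that scans a rule body for inline string literals and regex patterns. It is an illustration only, not Galore's implementation: the function name `inline_terminals` and the scanning patterns are assumptions made for this example.

```python
import re

# Hypothetical scanner patterns (not Galore's own lexer rules).
DOUBLE = r'"(?:[^"\\]|\\.)*"'          # "..." string literal
SINGLE = r"'(?:[^'\\]|\\.)*'"          # '...' string literal
REGEX = r"/(?:[^/\\\n]|\\.)+/[a-z]*"   # /pattern/flags
TERMINAL = re.compile("|".join([DOUBLE, SINGLE, REGEX]))

def inline_terminals(rule_body: str) -> list[str]:
    """Return the distinct inline terminals in a rule body, in first-seen order."""
    seen: list[str] = []
    for match in TERMINAL.finditer(rule_body):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen

print(inline_terminals('Stmt -> "if" "(" Expr ")" Stmt ;'))
# prints ['"if"', '"("', '")"']
```

Each distinct literal or pattern found this way would correspond to one token the lexer has to recognize, which is why repeated uses of the same literal do not create duplicate tokens in this sketch.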
Directives
Directives configure the grammar and lexer. They start with %.
%token - Named Token Definition
Defines a named terminal with a pattern:
%token NUMBER /[0-9]+/
%token STRING /"([^"\\]|\\.)*"/
%token IDENT /[a-zA-Z_][a-zA-Z0-9_]*/
The pattern can be a regex (/pattern/) or a string literal ("literal").
%skip - Skipped Patterns
Defines patterns to skip (whitespace, comments):
%skip /[ \t\n\r]+/ // Whitespace
%skip /\/\/[^\n]*/ // Line comments
%skip /\/\*[\s\S]*?\*\// // Block comments
Skipped patterns are consumed but don't produce tokens.
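The difference between "consumed" and "produces a token" can be sketched with a small standalone tokenizer in Python. This is a simplified illustration, not Galore's lexer: the `TOKENS` and `SKIPS` tables and the `tokenize` function are hypothetical, and the sketch takes the first pattern that matches rather than the longest match.

```python
import re

# Hypothetical tables mirroring %token and %skip directives.
TOKENS = [("NUMBER", r"[0-9]+"), ("IDENT", r"[a-zA-Z_][a-zA-Z0-9_]*"), ("PLUS", r"\+")]
SKIPS = [r"[ \t\n\r]+", r"//[^\n]*"]

def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (name, lexeme) pairs; skipped matches only advance the position."""
    pos, out = 0, []
    while pos < len(text):
        for pattern in SKIPS:
            m = re.compile(pattern).match(text, pos)
            if m:
                pos = m.end()          # consumed, but nothing is emitted
                break
        else:
            for name, pattern in TOKENS:
                m = re.compile(pattern).match(text, pos)
                if m:
                    out.append((name, m.group(0)))
                    pos = m.end()
                    break
            else:
                raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
    return out

print(tokenize("x + 42 // sum\n7"))
# prints [('IDENT', 'x'), ('PLUS', '+'), ('NUMBER', '42'), ('NUMBER', '7')]
```

The whitespace and the line comment advance the input position but never appear in the result, so the parser only ever sees the four real tokens.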
%define - Reusable Patterns
Defines a named pattern that can be referenced in other patterns (using TLEX syntax):
%define DIGIT [0-9]
%define ALPHA [a-zA-Z]
%token NUMBER /{DIGIT}+/
%token IDENT /{ALPHA}({ALPHA}|{DIGIT})*/
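One way to picture how {NAME} references resolve is textual substitution before the pattern is compiled. The Python sketch below illustrates that idea under stated assumptions: `expand_defines` is a hypothetical helper, wrapping each definition in a non-capturing group is a choice made here, and the actual expansion performed by Galore or TLEX may differ.

```python
import re

def expand_defines(pattern: str, defines: dict[str, str]) -> str:
    """Replace each {NAME} reference with its definition in a non-capturing group."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in defines:
            return match.group(0)      # unknown names are left untouched
        # Definitions may reference other definitions, so expand recursively.
        return "(?:" + expand_defines(defines[name], defines) + ")"
    # Only identifier-like names match, so quantifiers such as {2,3} are ignored.
    return re.sub(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", substitute, pattern)

defines = {"DIGIT": "[0-9]", "ALPHA": "[a-zA-Z]"}
number = expand_defines("{DIGIT}+", defines)
ident = expand_defines("{ALPHA}({ALPHA}|{DIGIT})*", defines)

print(number)
# prints (?:[0-9])+
```

Grouping each substituted definition keeps a following quantifier, such as the + after {DIGIT}, applied to the whole definition rather than to its last character. The sketch does not guard against definitions that reference themselves.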
%start - Start Symbol
Explicitly sets the start symbol (default is the first non-terminal):
%start Program
Program -> Statement* ;
%resyntax - Regex Syntax
Chooses between regex syntaxes:
%resyntax js // JavaScript regex (default)
%resyntax flex // Flex-style patterns
With flex syntax, patterns don't need delimiters and extend to end of line:
%resyntax flex
%token NUMBER [0-9]+
%tokenflex STRING \"([^\"\\]|\\.)*\"
EBNF Extensions
Galore supports EBNF notation for common patterns:
| Operator | Meaning | Example |
|---|---|---|
| * | Zero or more | Statement* |
| + | One or more | Expr+ |
| ? | Optional (zero or one) | ElseClause? |
| ( ) | Grouping | (Expr ("," Expr)*) |
| [ ] | Optional group | ["else" Stmt] |
Examples
// Zero or more statements
Program -> Statement* ;
// One or more expressions separated by commas
ExprList -> Expr ("," Expr)* ;
// Optional else clause
IfStmt -> "if" "(" Expr ")" Stmt ["else" Stmt] ;
// Alternation within grouping
BinOp -> ("+" | "-" | "*" | "/") ;
Implementation Note
EBNF operators are expanded to auxiliary non-terminals internally. For example:
// This:
List -> Item* ;
// Becomes (with left recursion):
$List_star -> $List_star Item | ;
List -> $List_star ;
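The expansion above can be sketched as a small rewriting pass. The following Python is an illustration under assumptions, not Galore's code: only the `$List_star` naming appears in this reference, so the `_plus` and `_opt` suffixes, the numeric suffix for repeated operators, and the `expand_ebnf` function itself are hypothetical. It handles postfix operators on single symbols only, not groups.

```python
SUFFIXES = {"*": "star", "+": "plus", "?": "opt"}   # _plus and _opt are assumed names

def expand_ebnf(name: str, symbols: list[str]) -> dict[str, list[list[str]]]:
    """Rewrite postfix *, +, ? on single symbols into auxiliary rules.

    Each rule maps to a list of productions; a production is a list of
    symbols, and [] stands for the empty production.
    """
    rules: dict[str, list[list[str]]] = {}
    body: list[str] = []
    for symbol in symbols:
        op = symbol[-1]
        if op not in SUFFIXES or len(symbol) == 1:
            body.append(symbol)                     # plain symbol, keep as-is
            continue
        item = symbol[:-1]
        aux = f"${name}_{SUFFIXES[op]}"
        if aux in rules:                            # keep names unique within one rule
            aux = f"{aux}{len(rules)}"
        if op == "*":
            rules[aux] = [[aux, item], []]          # left-recursive, or empty
        elif op == "+":
            rules[aux] = [[aux, item], [item]]      # left-recursive, at least one
        else:
            rules[aux] = [[item], []]               # one, or empty
        body.append(aux)
    rules[name] = [body]
    return rules

print(expand_ebnf("List", ["Item*"]))
# prints {'$List_star': [['$List_star', 'Item'], []], 'List': [['$List_star']]}
```

The output for `List -> Item* ;` matches the two rules shown above: an auxiliary left-recursive rule with an empty alternative, and the original rule rewritten to reference it.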
Semantic Actions
Attach handlers to rules for building ASTs or evaluating expressions:
Expr -> Expr "+" Term { add }
| Term { $1 }
;
Action Syntax
- { $N } - Return the Nth child (1-indexed)
- { handlerName } - Call a named handler function
See Semantic Actions for details on implementing handlers.
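As a rough mental model of the two action forms, the Python sketch below applies an action body to the values of a production's children. Everything in it is an assumption made for illustration: `run_action` is a hypothetical helper, and the handler signature (a single list of child values) is not taken from Galore's handler API, which is described on the Semantic Actions page.

```python
from typing import Any, Callable

Handler = Callable[[list[Any]], Any]   # assumed signature: receives the child values

def run_action(action: str, children: list[Any], handlers: dict[str, Handler]) -> Any:
    """Apply an action body such as "$2" or "add" to a production's child values."""
    action = action.strip()
    if action.startswith("$"):
        return children[int(action[1:]) - 1]   # $N is 1-indexed
    return handlers[action](children)          # named handler lookup

# Children of `Expr "+" Term` are [left value, "+", right value].
handlers: dict[str, Handler] = {"add": lambda c: c[0] + c[2]}

print(run_action("add", [1, "+", 2], handlers))   # prints 3
print(run_action("$1", [5], handlers))            # prints 5
```

In this model, `{ $1 }` on `Expr -> Term` simply passes the value of Term upward, while `{ add }` combines the first and third children and ignores the operator token in between.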
Comments
Galore supports C-style comments:
// Line comment
/*
Block comment
can span multiple lines
*/
Expr -> Term ; // Inline comment
Operator Precedence
Note: Directive-based precedence (%left, %right, %nonassoc) is not yet implemented.
Use grammar structure to encode precedence instead. Rules further from the start symbol bind more tightly, so operators introduced in deeper rules have higher precedence:
// Precedence from lowest to highest:
// 1. Addition/Subtraction (lowest)
// 2. Multiplication/Division
// 3. Parentheses (highest)
Expr -> Expr ("+" | "-") Term | Term ;
Term -> Term ("*" | "/") Factor | Factor ;
Factor -> "(" Expr ")" | NUMBER ;
In this grammar, * and / bind tighter than + and - because they're resolved at a deeper level of the grammar hierarchy.
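To see the layering at work, here is a small standalone evaluator in Python that mirrors the Expr, Term, and Factor rules. It is a hand-written recursive-descent sketch, not output from Galore: the left-recursive rules are rewritten as loops (which preserves left associativity), and the `evaluate` function and its tokenizer are assumptions for this example.

```python
import re

def evaluate(source: str) -> float:
    """Evaluate an arithmetic expression using one function per grammar rule."""
    tokens = re.findall(r"[0-9]+|[-+*/()]", source)   # whitespace is ignored
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():                      # Expr -> Expr ("+" | "-") Term | Term
        value = term()
        while peek() in ("+", "-"):
            op, rhs = take(), term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():                      # Term -> Term ("*" | "/") Factor | Factor
        value = factor()
        while peek() in ("*", "/"):
            op, rhs = take(), factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor():                    # Factor -> "(" Expr ")" | NUMBER
        if peek() == "(":
            take()                   # consume "("
            value = expr()
            take()                   # consume ")"
            return value
        return int(take())

    return expr()

print(evaluate("1 + 2 * 3"))     # prints 7
print(evaluate("(1 + 2) * 3"))   # prints 9
```

Because `expr` can only combine values that `term` has already finished computing, the multiplication in `1 + 2 * 3` is resolved before the addition, with no precedence table involved.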
Complete Example
// Calculator grammar combining tokens, skip patterns, EBNF operators, and semantic actions
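// Unsigned: negative values go through the unary "-" Factor rule below,
// so that input like 1-2 is not lexed as NUMBER(1) NUMBER(-2)
%token NUMBER /[0-9]+(\.[0-9]+)?/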
%token ID /[a-zA-Z_][a-zA-Z0-9_]*/
%skip /[ \t\n\r]+/
%skip /\/\/.*/
%start Program
Program -> Statement* ;
Statement -> ID "=" Expr ";"
| Expr ";"
;
Expr -> Expr ("+" | "-") Term { binop }
| Term { $1 }
;
Term -> Term ("*" | "/") Factor { binop }
| Factor { $1 }
;
Factor -> "(" Expr ")" { $2 }
| "-" Factor { negate }
| NUMBER { num }
| ID { ident }
;