Grammar Syntax

Complete reference for the Galore grammar DSL

Galore uses an extended BNF syntax for defining grammars. This page is the complete syntax reference.

Basic Rule Syntax

A grammar rule defines a non-terminal and its possible productions:

NonTerminal -> production1 | production2 | ... ;

Key points:

  • Rules end with a semicolon ;
  • Productions are separated by | (pipe)
  • You can also use : instead of ->

Example

Expr -> Expr "+" Term | Term ;
Term -> Term "*" Factor | Factor ;
Factor -> "(" Expr ")" | NUMBER ;

Symbols

Non-Terminals

Non-terminals are symbols that have productions. By convention, they start with an uppercase letter:

Expr, Term, Statement, Program

Any identifier that appears on the left side of a rule (->) is automatically recognized as a non-terminal.
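The recognition rule above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the `classify` function are hypothetical and do not reflect how Galore stores grammars internally.

```python
# Illustrative grammar model: {left-hand side: [productions]}.
# Each production is a list of symbol strings. Not Galore's internal format.
grammar = {
    "Expr": [["Expr", '"+"', "Term"], ["Term"]],
    "Term": [["Term", '"*"', "Factor"], ["Factor"]],
    "Factor": [['"("', "Expr", '")"'], ["NUMBER"]],
}


def classify(grammar):
    """Non-terminals are the left-hand sides; every other symbol is a terminal."""
    nonterminals = set(grammar)
    terminals = {
        symbol
        for productions in grammar.values()
        for production in productions
        for symbol in production
        if symbol not in nonterminals
    }
    return nonterminals, terminals


nonterminals, terminals = classify(grammar)
print(sorted(nonterminals))
print(sorted(terminals))
```

Here NUMBER ends up among the terminals only because no rule defines it, which matches the uppercase-token convention used in the examples.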

Terminals (Tokens)

Terminals can be defined in several ways:

Type            Syntax                  Example
String literal  "..." or '...'          "if", "+", 'while'
Regex pattern   /pattern/flags          /[0-9]+/, /[a-zA-Z_]\w*/
Named token     %token NAME /pattern/   %token NUMBER /[0-9]+/

Inline Tokens

String literals and regex patterns used directly in rules are automatically added to the lexer:

// These create tokens automatically
Stmt -> "if" "(" Expr ")" Stmt ;
Number -> /[0-9]+/ ;

Directives

Directives configure the grammar and lexer. They start with %.

%token - Named Token Definition

Defines a named terminal with a pattern:

%token NUMBER /[0-9]+/
%token STRING /"([^"\\]|\\.)*"/
%token IDENT /[a-zA-Z_][a-zA-Z0-9_]*/

The pattern can be a regex (/pattern/) or a string literal ("literal").

%skip - Skipped Patterns

Defines patterns to skip (whitespace, comments):

%skip /[ \t\n\r]+/           // Whitespace
%skip /\/\/[^\n]*/           // Line comments
%skip /\/\*[\s\S]*?\*\//     // Block comments

Skipped patterns are consumed but don't produce tokens.
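To make "consumed but don't produce tokens" concrete, here is a minimal Python sketch of a lexer loop with a skip table. The `TOKENS` and `SKIPS` tables and the `tokenize` function are hypothetical, and the matching order (skips first, then the first token pattern that matches) is an assumption for the sketch rather than a description of Galore's lexer.

```python
import re

# Hypothetical tables mirroring the %token and %skip directives above.
TOKENS = [
    ("NUMBER", re.compile(r"[0-9]+")),
    ("IDENT", re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")),
]
SKIPS = [
    re.compile(r"[ \t\n\r]+"),      # whitespace
    re.compile(r"//[^\n]*"),        # line comments
    re.compile(r"/\*[\s\S]*?\*/"),  # block comments
]


def tokenize(text):
    """Return (name, lexeme) pairs; skipped input advances the position only."""
    pos, tokens = 0, []
    while pos < len(text):
        for pattern in SKIPS:
            match = pattern.match(text, pos)
            if match:
                pos = match.end()  # consume, emit nothing
                break
        else:
            for name, pattern in TOKENS:
                match = pattern.match(text, pos)
                if match:
                    tokens.append((name, match.group()))
                    pos = match.end()
                    break
            else:
                raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens


print(tokenize("x 42 // note\n/* block */ y7"))
```

The comments and whitespace in the input advance the position but never appear in the returned list.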

%define - Reusable Patterns

Defines a named pattern that can be referenced in other patterns (using TLEX syntax):

%define DIGIT [0-9]
%define ALPHA [a-zA-Z]
%token NUMBER /{DIGIT}+/
%token IDENT /{ALPHA}({ALPHA}|{DIGIT})*/
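Conceptually, a {NAME} reference behaves like textual substitution. The Python sketch below shows one way such an expansion could work; `expand_defines` is a hypothetical helper, and wrapping each definition in a non-capturing group is an assumption made here to keep operator precedence intact, not a statement about how TLEX resolves references.

```python
import re

# Matches {NAME} references only; counted repetitions such as {2,3} are left alone.
REFERENCE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_defines(pattern, defines):
    """Replace each {NAME} with its (recursively expanded) definition."""
    def substitute(match):
        name = match.group(1)
        if name not in defines:
            raise KeyError(f"undefined pattern {name!r}")
        # Assumes definitions are not cyclic; a cycle would recurse forever.
        return "(?:" + expand_defines(defines[name], defines) + ")"

    return REFERENCE.sub(substitute, pattern)


defines = {"DIGIT": "[0-9]", "ALPHA": "[a-zA-Z]"}
number = expand_defines("{DIGIT}+", defines)
ident = expand_defines("{ALPHA}({ALPHA}|{DIGIT})*", defines)
print(number)
print(ident)
```

The expanded IDENT pattern accepts x1y2 and rejects 1x, as the original definition intends.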

%start - Start Symbol

Explicitly sets the start symbol (default is the first non-terminal):

%start Program

Program -> Statement* ;

%resyntax - Regex Syntax

Chooses between regex syntaxes:

%resyntax js     // JavaScript regex (default)
%resyntax flex   // Flex-style patterns

With flex syntax, patterns don't need delimiters and extend to end of line:

%resyntax flex
%token NUMBER [0-9]+
%tokenflex STRING \"([^\"\\]|\\.)*\"

EBNF Extensions

Galore supports EBNF notation for common patterns:

Operator  Meaning                  Example
*         Zero or more             Statement*
+         One or more              Expr+
?         Optional (zero or one)   ElseClause?
( )       Grouping                 (Expr ("," Expr)*)
[ ]       Optional group           ["else" Stmt]

Examples

// Zero or more statements
Program -> Statement* ;

// One or more expressions separated by commas
ExprList -> Expr ("," Expr)* ;

// Optional else clause
IfStmt -> "if" "(" Expr ")" Stmt ["else" Stmt] ;

// Alternation within grouping
BinOp -> ("+" | "-" | "*" | "/") ;

Implementation Note

EBNF operators are expanded to auxiliary non-terminals internally. For example:

// This:
List -> Item* ;

// Becomes (with left recursion):
$List_star -> $List_star Item | ;
List -> $List_star ;
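The expansion can be sketched in Python for the three postfix operators applied to a single symbol. Only the $List_star name and the left-recursive shape of the star rule come from the example above; the _plus and _opt names, the data layout, and the `expand_ebnf` function are assumptions for illustration. The sketch ignores groups and would overwrite the auxiliary rule if one rule used the same operator twice.

```python
SUFFIX = {"*": "star", "+": "plus", "?": "opt"}


def expand_ebnf(rules):
    """rules: {lhs: [production]}, each production a list of symbol strings.

    A symbol may end in *, + or ?. Quoted literals such as '"+"' are left untouched.
    An empty list stands for the empty production.
    """
    expanded = {}
    for lhs, productions in rules.items():
        rewritten = []
        for production in productions:
            new_production = []
            for symbol in production:
                op = symbol[-1]
                if symbol.startswith('"') or len(symbol) == 1 or op not in SUFFIX:
                    new_production.append(symbol)
                    continue
                item = symbol[:-1]
                aux = f"${lhs}_{SUFFIX[op]}"
                if op == "*":
                    expanded[aux] = [[aux, item], []]      # aux -> aux Item | (empty)
                elif op == "+":
                    expanded[aux] = [[aux, item], [item]]  # aux -> aux Item | Item
                else:
                    expanded[aux] = [[item], []]           # aux -> Item | (empty)
                new_production.append(aux)
            rewritten.append(new_production)
        expanded[lhs] = rewritten
    return expanded


result = expand_ebnf({"List": [["Item*"]]})
for lhs, productions in result.items():
    print(lhs, "->", " | ".join(" ".join(p) for p in productions), ";")
```

Run on List -> Item*, the sketch yields the same two rules shown above, with the auxiliary rule emitted first.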

Semantic Actions

Attach handlers to rules for building ASTs or evaluating expressions:

Expr -> Expr "+" Term { add }
      | Term { $1 }
      ;

Action Syntax

  • { $N } - Return the Nth child (1-indexed)
  • { handlerName } - Call a named handler function

See Semantic Actions for details on implementing handlers.
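As a rough mental model, the two action forms can be sketched in Python as follows. The `run_action` function, the handler table, and the convention that a handler receives the list of child values are all assumptions made for this sketch; the actual handler signature is described on the Semantic Actions page.

```python
def run_action(action, children, handlers):
    """Apply an action to the already-evaluated children of one production."""
    if action.startswith("$"):
        index = int(action[1:])       # $N is 1-indexed
        return children[index - 1]
    return handlers[action](children)  # named handler (hypothetical signature)


# Hypothetical handler for: Expr -> Expr "+" Term { add }
handlers = {"add": lambda children: children[0] + children[2]}

print(run_action("add", [1, "+", 2], handlers))
print(run_action("$1", [7], handlers))
print(run_action("$2", ["(", 5, ")"], handlers))
```

In this model { $1 } simply passes the single child's value up the tree, which is why it suits pass-through productions such as Expr -> Term.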

Comments

Galore supports C-style comments:

// Line comment

/*
   Block comment
   can span multiple lines
*/

Expr -> Term ;  // Inline comment

Operator Precedence

Note: Directive-based precedence (%left, %right, %nonassoc) is not yet implemented.

Use grammar structure to encode precedence instead. Rules further from the start symbol bind tighter, so operators introduced deeper in the grammar have higher precedence:

// Precedence from lowest to highest:
// 1. Addition/Subtraction (lowest)
// 2. Multiplication/Division
// 3. Parentheses (highest)

Expr   -> Expr ("+" | "-") Term | Term ;
Term   -> Term ("*" | "/") Factor | Factor ;
Factor -> "(" Expr ")" | NUMBER ;

In this grammar, * and / bind tighter than + and - because they're resolved at a deeper level of the grammar hierarchy.
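The effect of this layering can be seen in a small hand-written evaluator that mirrors the three rules, one Python function per non-terminal. This is a recursive-descent sketch, not the parser Galore generates: the left-recursive rules are rewritten as loops, which keeps the operators left-associative, and the `evaluate` function and its regex tokenizer are hypothetical (characters outside digits, operators, and parentheses are silently ignored).

```python
import re


def evaluate(text):
    """Evaluate + - * / and parentheses using one function per grammar level."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():    # Expr -> Expr ("+" | "-") Term | Term
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():    # Term -> Term ("*" | "/") Factor | Factor
        value = factor()
        while peek() in ("*", "/"):
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor():  # Factor -> "(" Expr ")" | NUMBER
        token = peek()
        if token == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return int(advance())

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("2 + 3 * 4"))    # 14: the multiplication is grouped first
print(evaluate("(2 + 3) * 4"))  # 20: parentheses restart at Expr
print(evaluate("10 - 4 - 3"))   # 3: left-associative
```

Because expr only ever sees complete term values, a "+" can never split a multiplication, which is the same guarantee the grammar structure gives.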

Complete Example

// Calculator grammar combining named tokens, skip patterns, EBNF operators, and semantic actions

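// NUMBER is unsigned: negation is handled by the "-" Factor production,
// and a signed pattern could make "1-2" lex as the two numbers 1 and -2
%token NUMBER /[0-9]+(\.[0-9]+)?/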
%token ID /[a-zA-Z_][a-zA-Z0-9_]*/
%skip /[ \t\n\r]+/
%skip /\/\/.*/

%start Program

Program -> Statement* ;

Statement -> ID "=" Expr ";"
           | Expr ";"
           ;

Expr -> Expr ("+" | "-") Term { binop }
      | Term { $1 }
      ;

Term -> Term ("*" | "/") Factor { binop }
      | Factor { $1 }
      ;

Factor -> "(" Expr ")" { $2 }
        | "-" Factor { negate }
        | NUMBER { num }
        | ID { ident }
        ;