
Bison & Flex

Bison and Flex form a traditional parser generator toolchain widely used in C/C++ environments. They are suitable for scenarios where you need custom language parsing, including SQL-like expressions, DSLs, or expression evaluation.


What It Is

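  • Flex (lexical analyzer generator): Generates a scanner that splits input text into tokens using regular-expression rules.
  • Bison (parser generator): Generates a parser (LALR(1) by default, optionally GLR) from grammar rules; semantic actions attached to the rules build the parse tree or AST.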
  • Together, they allow building a full compiler-like pipeline: Input Text -> Lexical Tokens -> AST/Parse Tree -> Execution/Translation.

When to Use

  • Small to medium DSLs in C/C++ projects.
  • Offline language compilation or code generation, where control over parsing and AST is important.
  • Real-time SQL or expression parsing in C++ backend systems, where Bison remains one of the most established options for complex grammars.

Typical business applications rarely need this level of control over parsing or the intermediate representation (IR).


Advantages

  • Mature and stable in C/C++ ecosystems.
  • Fine-grained control of grammar, parsing, and AST generation.
  • Integrates with C++ code directly.
  • Many real-world examples, including production SQL parsers such as those of PostgreSQL and MySQL.

Considerations

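  • C mode is recommended for the broadest ecosystem compatibility; the C++ modes (in particular Flex's C++ scanner class) are less widely used and less mature.
  • Integration is more complex than PEGTL or DSL-based proto parsers, because two code generators must run as part of the build.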
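  • Bison's default LALR(1) parsers do not backtrack and run in time linear in the input length, so performance depends mainly on grammar design and on the cost of semantic actions (for example, allocating AST nodes). GLR mode (%glr-parser) can be slower on ambiguous grammars because it explores several parses at once.
  • Well suited to offline compilation pipelines. High-throughput online use needs extra care: default scanners and parsers keep their state in globals (yyin, yylval, and others) and are not reentrant, so concurrent services should use Flex's %option reentrant and Bison's %define api.pure full.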

Minimal Example

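This example shows a complete Flex -> Bison -> C++ pipeline that parses queries of the form SELECT a, b FROM t WHERE a > 5 (the WHERE clause is optional) into a small Query structure.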

Lexer (lexer.l)

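%{
#include <cstdlib>        // atoi
#include <cstring>        // strdup
#include "parser.tab.h"   // token codes and yylval, generated by Bison
%}

/* Scan a single input; without this option Flex expects yywrap() from -lfl */
%option noyywrap

%%
SELECT                   { return SELECT; }
FROM                     { return FROM; }
WHERE                    { return WHERE; }
[0-9]+                   { yylval.ival = atoi(yytext); return NUMBER; }
[a-zA-Z_][a-zA-Z0-9_]*   { yylval.sval = strdup(yytext); return IDENTIFIER; }
[ \t\n]+                 { /* skip whitespace */ }
.                        { return *yytext; }
%%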

Parser (parser.y)

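%{
#include <cstdlib>   // free
#include <cstring>   // strdup
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// yylex() comes from the Flex scanner; yyerror() is defined at the end of this file
int yylex();
void yyerror(const char* msg);

struct Column { string name; };
struct Table { string name; };
struct Condition { string expr; };
struct Query {
    vector<Column> columns;
    Table table;
    Condition condition;
};
Query parsedQuery;
%}

%union {
    int ival;
    char* sval;
}

%token SELECT FROM WHERE
%token <sval> IDENTIFIER
%token <ival> NUMBER
%type <sval> condition

/* Free heap strings of symbols discarded during error recovery */
%destructor { free($$); } <sval>

%%

query:
    SELECT select_list FROM table_name where_clause
    {
        // Print the result once the whole query has been reduced
        cout << "table: " << parsedQuery.table.name << "\ncolumns:";
        for (const Column& c : parsedQuery.columns) cout << ' ' << c.name;
        cout << "\ncondition: " << parsedQuery.condition.expr << endl;
    }
    ;

select_list:
    IDENTIFIER                 { parsedQuery.columns.push_back({$1}); free($1); }
  | select_list ',' IDENTIFIER { parsedQuery.columns.push_back({$3}); free($3); }
    ;

table_name:
    IDENTIFIER { parsedQuery.table.name = $1; free($1); }
    ;

where_clause:
    WHERE condition { parsedQuery.condition.expr = $2; free($2); }
  | /* empty */     { parsedQuery.condition.expr = ""; }
    ;

condition:
    IDENTIFIER '>' NUMBER
    {
        string cond = string($1) + ">" + to_string($3);
        $$ = strdup(cond.c_str());   // caller (where_clause) frees it
        free($1);
    }
    ;
%%

void yyerror(const char* msg) {
    cerr << "Parse error: " << msg << endl;
}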

C++ Integration (main.cpp)

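#include <cstdio>
#include <iostream>

// parser.tab.cpp and lex.yy.cpp are compiled as C++ in this build, so these
// symbols have C++ linkage; declaring them extern "C" would fail to link.
int yyparse();
extern std::FILE* yyin;

int main(int argc, char* argv[]) {
    const char* path = argc > 1 ? argv[1] : "example.sql";
    std::FILE* f = std::fopen(path, "r");
    if (!f) {
        std::cerr << "Cannot open file: " << path << std::endl;
        return 1;
    }
    yyin = f;                  // the scanner reads from yyin instead of stdin
    int status = yyparse();    // 0 on success, nonzero on failure
    std::fclose(f);
    return status == 0 ? 0 : 1;
}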

CMake Build Example

cmake_minimum_required(VERSION 3.10)
project(sql_parser)

find_package(FLEX REQUIRED)
find_package(BISON REQUIRED)

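# Name the generated header parser.tab.h (the default for a .cpp output would be
# parser.tab.hpp) so that it matches the #include in lexer.l
BISON_TARGET(Parser parser.y ${CMAKE_CURRENT_BINARY_DIR}/parser.tab.cpp
             DEFINES_FILE ${CMAKE_CURRENT_BINARY_DIR}/parser.tab.h)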
FLEX_TARGET(Lexer lexer.l ${CMAKE_CURRENT_BINARY_DIR}/lex.yy.cpp)
ADD_FLEX_BISON_DEPENDENCY(Lexer Parser)

add_executable(sql_parser main.cpp ${BISON_Parser_OUTPUTS} ${FLEX_Lexer_OUTPUTS})


Summary

  • Integration complexity: Medium–High (requires C/C++ linkage, memory management).
  • Performance: Good for offline or limited online parsing; depends on grammar complexity.
  • Best use cases: Offline compilation, expression evaluation, SQL parsing in backend systems.

The minimal example covers the complete pipeline (lexer, parser, C++ driver, and build configuration) rather than isolated fragments, which is what practical integration requires.