C++ Code Generation
Warning: This BETA API is not final and is subject to change before release.

1. Quickstart Guide

DBToaster generates C++ code for incrementally maintaining the results of a given set of queries when cpp is specified as the output language (the -l cpp command-line option). In this case, DBToaster produces a C++ header file containing the type definitions (tlq_t, data_t and Program) required for executing the SQL program.

Let's consider the following SQL query:

```
$> cat test/queries/simple/rs_example1.sql
CREATE TABLE R(A int, B int)
  FROM FILE '../../experiments/data/tiny/r.dat'
  LINE DELIMITED CSV (fields := ',');

CREATE STREAM S(B int, C int)
  FROM FILE '../../experiments/data/tiny/s.dat'
  LINE DELIMITED CSV (fields := ',');

SELECT SUM(r.A*s.C) as RESULT
FROM R r, S s
WHERE r.B = s.B;
```
The corresponding C++ header file can be obtained by running:
```
$> bin/dbtoaster test/queries/simple/rs_example1.sql -l cpp -o rs_example1.hpp
```

Alternatively, DBToaster can build a standalone binary (if the -c [binary name] flag is present) by compiling the generated header file against lib/dbt_c++/main.cpp, which provides code for executing the SQL program and printing the results.

Requirements: The generated code uses Boost, so the Boost header files and the following library binaries must be present on the system: boost_program_options, boost_serialization, boost_system, boost_filesystem, boost_chrono and boost_thread. If they cannot be found in the paths that g++ searches by default, their location has to be provided to DBToaster explicitly, either through the environment variables:

  • DBT_HDR which should contain the path to Boost's include folder;
  • DBT_LIB which should contain the path to Boost's lib folder.
```
$> export DBT_HDR=path-to-boost-include-dir
$> export DBT_LIB=path-to-boost-lib-dir
$> bin/dbtoaster test/queries/simple/rs_example1.sql -l cpp -c rs_example1
```
or through the -I and -L command line flags:
```
$> bin/dbtoaster test/queries/simple/rs_example1.sql -l cpp -c rs_example1 -I path-to-boost-include-dir -L path-to-boost-lib-dir
```

Running the compiled binary will result in the following output:

```
$> ./rs_example1
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="9">
Initializing program:
Running program:
Printing final result:
<snap class_id="0" tracking_level="0" version="0">
    <RESULT>156</RESULT>
</snap>
</boost_serialization>
```
If the generated binary is run with the --async flag, it also prints intermediate results as often as possible while the SQL program runs in a separate thread.
```
$> ./rs_example1 --async
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="9">
Initializing program:
Running program:
<snap class_id="0" tracking_level="0" version="0">
    <RESULT>0</RESULT>
</snap>
<snap>
    <RESULT>0</RESULT>
</snap>
<snap>
    <RESULT>0</RESULT>
</snap>
<snap>
    <RESULT>0</RESULT>
</snap>
<snap>
    <RESULT>9</RESULT>
</snap>
<snap>
    <RESULT>74</RESULT>
</snap>
<snap>
    <RESULT>141</RESULT>
</snap>
Printing final result:
<snap>
    <RESULT>156</RESULT>
</snap>
</boost_serialization>
```

2. C++ API Guide

The DBToaster C++ code generator produces a header file containing three main type definitions in the dbtoaster namespace: tlq_t, data_t and Program. Additionally, snapshot_t is predefined as a garbage-collected pointer to tlq_t. What follows is a brief description of these types; a more detailed presentation can be found in the Reference section.

tlq_t encapsulates the materialized views directly needed for computing the results and offers functions for retrieving them.

data_t extends tlq_t with auxiliary materialized views needed for maintaining the results and offers trigger functions for incrementally updating them.

Program represents the execution engine of the SQL program. It encapsulates a data_t object and implements a set of abstract functions of the IProgram class that are used for running the program. Default implementations of some of these functions are inherited from the ProgramBase class, while others are generated from the tlq_t and data_t types defined above.

2.1. Executing the Program

The execution of a program can be controlled through the functions: IProgram::init(), IProgram::run(), IProgram::is_finished(), IProgram::process_streams() and IProgram::process_stream_event().

virtual void IProgram::init()
Loads the tuples of static tables and initializes the materialized views based on that data. The definition of this function is generated as part of the Program class.
void IProgram::run( bool async = false )
Executes the program by invoking the Program::process_streams() function. If parameter async is set to true the execution takes place in a separate thread. This is a standard function defined by the IProgram class.
bool IProgram::is_finished()
Tests whether the program has finished. This is especially relevant when the program runs in asynchronous mode. This is a standard function defined by the IProgram class.
virtual void IProgram::process_streams()
Reads stream events from various sources and invokes IProgram::process_stream_event() on each event. The default implementation of this function (ProgramBase::process_streams()) reads events from the sources specified in the SQL program.
virtual void IProgram::process_stream_event(event_t& ev)
Processes each stream event passing through the system. The default implementation of this function (ProgramBase::process_stream_event()) does the incremental maintenance work by invoking the trigger function corresponding to the event type ev.type for stream ev.id, with the arguments contained in ev.data.
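The control flow described above (run() calling process_streams(), which feeds each event to process_stream_event(), which in turn dispatches to a trigger) can be illustrated with a small standalone sketch. This is not DBToaster's ProgramBase: the names MiniProgram, Event, register_trigger and wait are hypothetical, events are assumed to carry only long arguments, and asynchronous execution is modelled with std::thread plus an atomic completion flag.

```cpp
#include <atomic>
#include <functional>
#include <map>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for DBToaster's event type enumeration and event_t.
enum EventType { insert_tuple, delete_tuple };

struct Event {
    EventType type;          // kind of update
    int relation_id;         // stream the tuple belongs to
    std::vector<long> data;  // tuple fields (assumed to be longs here)
};

// Hypothetical skeleton mirroring run()/is_finished()/
// process_streams()/process_stream_event().
class MiniProgram {
public:
    typedef std::function<void(const std::vector<long>&)> Trigger;

    explicit MiniProgram(std::vector<Event> events)
        : events_(std::move(events)), finished_(false) {}

    // Safety net only: with derived classes, call wait() before destruction.
    virtual ~MiniProgram() { if (worker_.joinable()) worker_.join(); }

    // Registers the trigger that handles events of (type, relation).
    void register_trigger(EventType t, int rel, Trigger f) {
        triggers_[std::make_pair(t, rel)] = f;
    }

    // Runs process_streams() inline, or on a separate thread if async.
    void run(bool async = false) {
        if (async)
            worker_ = std::thread([this] { process_streams(); });
        else
            process_streams();
    }

    bool is_finished() const { return finished_.load(); }

    // Blocks until an asynchronous run has completed.
    void wait() { if (worker_.joinable()) worker_.join(); }

protected:
    // Feeds every event to process_stream_event(), then marks completion.
    virtual void process_streams() {
        for (const Event& ev : events_) process_stream_event(ev);
        finished_.store(true);
    }

    // Dispatches the event to the trigger registered for (type, relation).
    virtual void process_stream_event(const Event& ev) {
        auto it = triggers_.find(std::make_pair(ev.type, ev.relation_id));
        if (it != triggers_.end()) it->second(ev.data);
    }

private:
    std::vector<Event> events_;
    std::map<std::pair<EventType, int>, Trigger> triggers_;
    std::atomic<bool> finished_;
    std::thread worker_;
};
```

In this sketch the trigger table is filled before run() is called, so the worker thread only reads it; the generated code performs the equivalent registration in data_t::register_data().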

2.2. Retrieving the Results

The snapshot_t IProgram::get_snapshot() function returns a snapshot of the program's results. The query results can then be obtained by calling the appropriate get_TLQ_NAME() function on the snapshot object, as described in the reference for tlq_t. If the program is running in asynchronous mode, the snapshot is guaranteed to be consistent.

Currently, the snapshot mechanism is trivial: a snapshot is a full copy of the tlq_t object associated with the program. Consequently, the time required to obtain a snapshot is linear in the size of the result set.
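The copy-based snapshot can be illustrated with a self-contained sketch that mirrors the structure described in this guide: a tlq_t base holding the results, a data_t subclass adding auxiliary views, and a snapshot that copies only the tlq_t part. The member names, the SnapshottingProgram class, the mutex and the use of std::shared_ptr are assumptions made for illustration; the generated code and ProgramBase may achieve consistency differently.

```cpp
#include <map>
#include <memory>
#include <mutex>

// Hypothetical result holder: only what is needed to answer queries.
struct tlq_t {
    std::map<long, long> RESULT;  // e.g. group key -> aggregate
    long get_RESULT(long key) const {
        auto it = RESULT.find(key);
        return it == RESULT.end() ? 0 : it->second;
    }
};

// Hypothetical maintenance state: results plus auxiliary views.
struct data_t : tlq_t {
    std::map<long, long> aux;  // auxiliary view, never copied into snapshots
    void on_insert(long key, long value) {
        aux[key] += 1;
        RESULT[key] += value;
    }
};

typedef std::shared_ptr<tlq_t> snapshot_t;

class SnapshottingProgram {
public:
    // Applies an update while holding the lock, so a snapshot never
    // observes a half-applied update.
    void apply(long key, long value) {
        std::lock_guard<std::mutex> guard(lock_);
        data_.on_insert(key, value);
    }

    // Copies only the tlq_t part of data_ (linear in the result size)
    // under the same lock, yielding a consistent, independent snapshot.
    snapshot_t get_snapshot() {
        std::lock_guard<std::mutex> guard(lock_);
        return snapshot_t(new tlq_t(static_cast<const tlq_t&>(data_)));
    }

private:
    std::mutex lock_;
    data_t data_;
};
```

Because each snapshot is an independent copy, later updates do not affect it; the price is the linear copying cost on every call.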

2.3. Basic Example

As an example, we use the C++ code generated for the rs_example1.sql program introduced above. For clarity, some implementation details are omitted.

```
$> bin/dbtoaster test/queries/simple/rs_example1.sql -l cpp -o rs_example1.hpp
```

```cpp
#include <lib/dbt_c++/program_base.hpp>

namespace dbtoaster {

/* Definitions of auxiliary maps for storing materialized views. */
...
...
...

/* Type definition providing a way to access the results of the sql */
/* program */
struct tlq_t {
    tlq_t() {}
    ...
    /* Functions returning / computing the results of top level */
    /* queries */
    long get_RESULT() {
        ...
    }

protected:
    /* Data structures used for storing/computing top level queries */
    ...
};

/* Type definition providing a way to incrementally maintain the */
/* results of the sql program */
struct data_t : tlq_t {
    data_t() {}

    /* Registering relations and trigger functions */
    void register_data(ProgramBase<tlq_t>& pb) {
        ...
    }

    /* Trigger functions for table relations */
    void on_insert_R(long R_A, long R_B) {
        ...
    }

    /* Trigger functions for stream relations */
    void on_insert_S(long S_B, long S_C) {
        ...
    }
    void on_delete_S(long S_B, long S_C) {
        ...
    }

    void on_system_ready_event() {
        ...
    }

private:
    /* Data structures used for storing materialized views */
    ...
};

/* Type definition providing a way to execute the sql program */
class Program : public ProgramBase<tlq_t> {
public:
    Program(int argc = 0, char* argv[] = 0) : ProgramBase<tlq_t>(argc, argv) {
        data.register_data(*this);
        /* Specifying data sources */
        ...
    }

    /* Imports data for static tables and performs view */
    /* initialization based on it. */
    void init() {
        process_tables();
        data.on_system_ready_event();
    }

    /* Saves a snapshot of the data required to obtain the results */
    /* of top level queries. */
    snapshot_t take_snapshot() {
        return snapshot_t( new tlq_t((tlq_t&)data) );
    }

private:
    data_t data;
};

}
```

Below is an example of how the API can be used to execute the SQL program and print its results:

```cpp
#include <cstring>
#include <iostream>
#include "rs_example1.hpp"

using namespace std;

int main(int argc, char* argv[]) {
    bool async = argc > 1 && !strcmp(argv[1], "--async");
    dbtoaster::Program p;
    dbtoaster::Program::snapshot_t snap;

    cout << "Initializing program:" << endl;
    p.init();

    cout << "Running program:" << endl;
    p.run( async );

    // In asynchronous mode, poll intermediate results until processing ends.
    // In synchronous mode run() only returns once all events are processed,
    // so this loop does not execute.
    while( !p.is_finished() ) {
        snap = p.get_snapshot();
        cout << "RESULT: " << snap->get_RESULT() << endl;
    }

    cout << "Printing final result:" << endl;
    snap = p.get_snapshot();
    cout << "RESULT: " << snap->get_RESULT() << endl;
    return 0;
}
```

2.4. Custom Execution

Custom processing can be performed on each stream event by overriding the virtual function void IProgram::process_stream_event(event_t& ev), while still delegating the basic processing of the event to Program::process_stream_event().

Example: Custom event processing.

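```cpp
#include <iostream>
#include "rs_example1.hpp"

namespace dbtoaster {
class CustomProgram_1 : public Program {
public:
    // Logs each event as on_<event>_<relation>(<data>) before delegating
    // the incremental maintenance work to the generated Program.
    void process_stream_event(event_t& ev) {
        std::cout << "on_" << event_name[ev.type] << "_";
        std::cout << get_relation_name(ev.id) << "(" << ev.data << ")" << std::endl;
        Program::process_stream_event(ev);
    }
};
}
```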

Stream events can be manually read from custom sources and fed into the system by overriding the virtual function void IProgram::process_streams() and calling process_stream_event() for each event read.

Example: Custom event sourcing.

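```cpp
#include "rs_example1.hpp"

namespace dbtoaster {
class CustomProgram_2 : public Program {
public:
    // Instead of reading the sources declared in the SQL program, feed
    // ten insertions into stream S: S(1,11), S(2,12), ..., S(10,20).
    void process_streams() {
        for( long i = 1; i <= 10; i++ ) {
            event_args_t ev_args;
            ev_args.push_back(i);
            ev_args.push_back(i+10);
            event_t ev( insert_tuple, get_relation_id("S"), ev_args );
            process_stream_event(ev);
        }
    }
};
}
```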

3. C++ Generated Code Reference

3.1. struct tlq_t

tlq_t contains all the data structures relevant for computing the results of the SQL program, also called the top level queries. It provides a set of functions named get_TLQ_NAME that return the result of the top level query labeled TLQ_NAME. In our example, the generated tlq_t has a function named get_RESULT that returns the result of SELECT SUM(r.A*s.C) as RESULT ... in rs_example1.sql.

3.1.1. Queries computing collections

In the example above, the result is a single value. If, however, the query has a GROUP BY clause, its result is a collection, and the corresponding get_RESULT function returns either a boost::multi_index_container or a std::map.

Let's consider the following example:

```
$> cat test/queries/simple/rs_example2.sql
CREATE STREAM R(A int, B int)
  FROM FILE '../../experiments/data/tiny/r.dat'
  LINE DELIMITED CSV (fields := ',');

CREATE STREAM S(B int, C int)
  FROM FILE '../../experiments/data/tiny/s.dat'
  LINE DELIMITED CSV (fields := ',');

SELECT r.B, SUM(r.A*s.C) as RESULT_1, SUM(r.A+s.C) as RESULT_2
FROM R r, S s
WHERE r.B = s.B
GROUP BY r.B;
```
The generated code defines two collection types, RESULT_1_map and RESULT_2_map, and two corresponding entry types, RESULT_1_entry and RESULT_2_entry. The entry structures have a set of key fields corresponding to the GROUP BY clause (in our case R_B) and an additional value field, __av, that stores the aggregated value of the top level query for each key in the collection. Finally, tlq_t contains two functions, get_RESULT_1 and get_RESULT_2, that return the top level query results as RESULT_1_map and RESULT_2_map objects.
```cpp
/* Definitions of auxiliary maps for storing materialized views. */
struct RESULT_1_entry {
    long R_B;
    long __av;
    ...
};
typedef multi_index_container<RESULT_1_entry, ... > RESULT_1_map;
...
struct RESULT_2_entry {
    long R_B;
    long __av;
    ...
};
typedef multi_index_container<RESULT_2_entry, ... > RESULT_2_map;
...

/* Type definition providing a way to access the results of the sql program */
struct tlq_t {
    tlq_t() {}

    /* Serialization Code */
    ...

    /* Functions returning / computing the results of top level queries */
    RESULT_1_map& get_RESULT_1() {
        ...
    }
    RESULT_2_map& get_RESULT_2() {
        ...
    }

protected:
    /* Data structures used for storing / computing top level queries */
    RESULT_1_map RESULT_1;
    RESULT_2_map RESULT_2;
};
```
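To make the contents of RESULT_1_map and RESULT_2_map concrete, the sketch below recomputes the rs_example2.sql query from scratch over in-memory relations. It uses std::map<long, long> (key R_B, value __av) as a stand-in for the generated multi_index_container types; the function compute_results and the Relation representation are hypothetical. DBToaster itself maintains these maps incrementally rather than recomputing them.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical in-memory relations: R as (A, B) pairs, S as (B, C) pairs.
typedef std::vector<std::pair<long, long> > Relation;

// Stand-in for RESULT_1_map / RESULT_2_map: R_B -> __av.
typedef std::map<long, long> ResultMap;

// Recomputes
//   SELECT r.B, SUM(r.A*s.C) AS RESULT_1, SUM(r.A+s.C) AS RESULT_2
//   FROM R r, S s WHERE r.B = s.B GROUP BY r.B
// with a nested-loop join followed by per-group summation.
void compute_results(const Relation& R, const Relation& S,
                     ResultMap& result_1, ResultMap& result_2) {
    result_1.clear();
    result_2.clear();
    for (const auto& r : R) {                   // r = (A, B)
        for (const auto& s : S) {               // s = (B, C)
            if (r.second != s.first) continue;  // join condition r.B = s.B
            result_1[r.second] += r.first * s.second;
            result_2[r.second] += r.first + s.second;
        }
    }
}
```

For example, with R = {(1,10), (2,10), (3,20)} and S = {(10,5), (20,7), (30,1)}, the group R_B = 10 gets RESULT_1 = 1*5 + 2*5 = 15 and RESULT_2 = (1+5) + (2+5) = 13, while S's tuple with B = 30 joins with nothing and produces no entry.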
If the given query has no aggregates, the COUNT(*) aggregate is computed by default; consequently, the resulting collections are guaranteed not to contain duplicate keys.
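Conceptually, a query without aggregates such as SELECT r.A FROM R r is then answered as a map from each distinct result tuple to its multiplicity. The sketch below illustrates this with a hypothetical CountView class keyed by a single column; erasing entries whose count drops to zero is a choice made for this sketch, not a documented property of the generated maps.

```cpp
#include <map>

// Hypothetical view for "SELECT r.A FROM R r" evaluated with an implicit
// COUNT(*): each distinct A maps to the number of tuples carrying it, so
// duplicate input tuples become larger counts instead of repeated keys.
class CountView {
public:
    void on_insert(long a) { ++counts_[a]; }

    // Decrements the multiplicity and drops the key once it reaches zero.
    // Deleting a tuple that was never inserted leaves a negative count;
    // this sketch does not guard against that.
    void on_delete(long a) {
        if (--counts_[a] == 0) counts_.erase(a);
    }

    const std::map<long, long>& get_COUNT() const { return counts_; }

private:
    std::map<long, long> counts_;
};
```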

3.1.2. Partial Materialization

Some of the work involved in maintaining the results of a query can be saved by performing partial materialization and only computing the final results when invoking tlq_t's get_TLQ_NAME functions. This behaviour is especially desirable when the rate of querying the results is lower than the rate of updates, and can be enabled through the -F EXPRESSIVE-TLQS command line flag.
Below is an example of a query where partial materialization is indeed beneficial.

```
$> cat test/queries/simple/r_lift_of_count.sql
CREATE STREAM R(A int, B int)
  FROM FILE '../../experiments/data/tiny/r.dat'
  LINE DELIMITED csv ();

SELECT r2.C
FROM (
    SELECT r1.A, COUNT(*) AS C
    FROM R r1
    GROUP BY r1.A
) r2;
```
Generated tlq_t without -F EXPRESSIVE-TLQS: We can see that get_COUNT() simply returns the materialized view of the results.
```
$> bin/dbtoaster test/queries/simple/r_lift_of_count.sql -l cpp
```

```cpp
...
/* Type definition providing a way to access the results of the sql program */
struct tlq_t {
    tlq_t() {}
    ...
    /* Functions returning / computing the results of top level queries */
    COUNT_map& get_COUNT() {
        COUNT_map& __v_1 = COUNT;
        return __v_1;
    }

protected:
    /* Data structures used for storing / computing top level queries */
    COUNT_map COUNT;
};
...
```
Generated tlq_t with -F EXPRESSIVE-TLQS: get_COUNT() now performs some final computation, constructing the end result in a temporary std::map before returning it. Note that tlq_t no longer contains the full materialized view of the results (COUNT_map COUNT;) but a partial materialization (COUNT_1_E1_1_map COUNT_1_E1_1;) that get_COUNT() uses to compute the final query result.
```
$> bin/dbtoaster test/queries/simple/r_lift_of_count.sql -l cpp -F EXPRESSIVE-TLQS
```

```cpp
...
/* Type definition providing a way to access the results of the sql program */
struct tlq_t {
    tlq_t() {}
    ...
    /* Functions returning / computing the results of top level queries */
    map<long,long> get_COUNT() {
        map<long,long> __v_41;
        /* Result computation based on COUNT_1_E1_1 */
        return __v_41;
    }

protected:
    /* Data structures used for storing / computing top level queries */
    COUNT_1_E1_1_map COUNT_1_E1_1;
};
...
```
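The elided result computation can be sketched under one assumption that the excerpt does not show: that the partial view COUNT_1_E1_1 maps each value of r1.A to the number of R tuples with that value. Since the outer query has no aggregate, the result is keyed by the inner count C and, through the implicit COUNT(*), records how many A groups share that count. The standalone function and its std::map input are hypothetical stand-ins for the generated types.

```cpp
#include <map>

// Hypothetical stand-in for COUNT_1_E1_1_map: r1.A -> COUNT(*) of that group.
typedef std::map<long, long> PartialView;

// Computes the final result on demand: for each inner count C, the number
// of rows of r2 (that is, of A groups) that have that count.
std::map<long, long> get_COUNT(const PartialView& count_1_e1_1) {
    std::map<long, long> result;
    for (const auto& group : count_1_e1_1) {
        if (group.second != 0)  // an empty group contributes no row to r2
            result[group.second] += 1;
    }
    return result;
}
```

With this split, each update only adjusts one entry of the partial view, while every call to get_COUNT() costs time linear in the number of groups, which pays off when results are read less often than updates arrive.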

3.2. struct data_t

data_t contains all the data structures and trigger functions relevant for incrementally maintaining the results of the SQL program.

For each stream-based relation STREAM_X in the SQL program, it provides a pair of trigger functions named on_insert_STREAM_X() and on_delete_STREAM_X() that incrementally maintain the query results when a tuple is inserted into or deleted from STREAM_X. For the query presented above (rs_example1.sql), the generated data_t has the trigger functions void on_insert_S(long S_B, long S_C) and void on_delete_S(long S_B, long S_C).

For relations based on static tables, only the insertion trigger is required; it is called when the static tables are processed during the initialization phase of the program.
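For rs_example1.sql, the idea behind these triggers can be sketched by hand. The struct below keeps two auxiliary aggregates, the sum of A per B over R and the sum of C per B over S, so that every trigger updates RESULT = SUM(r.A*s.C) in constant time. This is a hand-written illustration of incremental maintenance using our own names (rs_example1_sketch, mR, mS), not the code DBToaster actually emits; the trigger signatures follow those quoted above, and R, being a static table, only gets an insertion trigger.

```cpp
#include <map>

// Hand-written sketch (not generated code) for
//   SELECT SUM(r.A*s.C) AS RESULT FROM R r, S s WHERE r.B = s.B
struct rs_example1_sketch {
    long RESULT = 0;
    std::map<long, long> mR;  // B -> SUM(A) over R tuples with that B
    std::map<long, long> mS;  // B -> SUM(C) over S tuples with that B

    // Table R: insertion trigger only (called while loading static tables).
    void on_insert_R(long R_A, long R_B) {
        RESULT += R_A * mS[R_B];  // join the new tuple with the current S
        mR[R_B] += R_A;
    }

    // Stream S: insertion and deletion triggers.
    void on_insert_S(long S_B, long S_C) {
        RESULT += mR[S_B] * S_C;  // join the new tuple with the current R
        mS[S_B] += S_C;
    }

    void on_delete_S(long S_B, long S_C) {
        RESULT -= mR[S_B] * S_C;  // retract the deleted tuple's contribution
        mS[S_B] -= S_C;
    }
};
```

For example, after inserting R(2,1), R(3,1) and R(4,2), inserting S(1,10) adds (2+3)*10 = 50 to RESULT without rescanning R.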

3.3. class Program

Finally, Program is a class that implements the IProgram interface and provides the basic functionality for reading static table tuples and stream events from their sources, initializing the relevant data structures, running the SQL program and retrieving its results.