Note: Compiling and running queries with the Scala backend requires the Scala compiler to be installed. Please refer to Getting Started for details.
1. Compiling and running a query
DBToaster generates a JAR file for a query when invoked with the -l scala and -c <file> switches:
The command above compiles the query to test.jar. It can now be run as follows:
After processing all insertions and deletions, the final result is printed.
Important note for MS Windows users: we have received reports of problems when running the compiled Scala programs under Cygwin. One workaround is to run the compiled programs from MS Windows itself rather than via Cygwin. Another, since most of these problems are caused by the classpath under Cygwin, is to pass the classpath explicitly with a command similar to the one below:
2. Scala API Guide
In the previous example, we used the standard main function to test the query. To use the query in a real application, however, it has to be run from within that application.
The following listing shows a simple example application that communicates with the query class. The communication between the application and the query class is handled using Akka.
This example first creates an ActorSystem and then launches the query actor. The events are sent to the query using TupleEvent messages with the following structure:
Argument | Comment |
---|---|
ord: Int | Order number of the event. |
op: TupleOp | TupleInsert for an insertion, TupleDelete for a deletion. |
stream: String | Name of the stream as it appears in the SQL file. |
data: List[Any] | The values of the tuple that was inserted into or deleted from the stream. |
To retrieve the final result, an EndOfStream message is sent to the query actor. Alternatively, the intermediate result of a query can be retrieved using a GetSnapshot message with the following structure:
Argument | Comment |
---|---|
view: List[Int] | List of the maps to take a snapshot of. |
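To make the message shapes in the two tables concrete, the sketch below models them with hand-written stand-ins. These are not the library's definitions: the real TupleOp, TupleEvent, GetSnapshot and EndOfStream come from the DBToaster Scala library and may be declared differently. The helper eventsFor, the object MessageSketch and the choice to start ord at 0 are assumptions made only for illustration.

```scala
// Hypothetical stand-ins mirroring the tables above; the real types are
// provided by the DBToaster Scala library and may be defined differently.
sealed trait TupleOp
case object TupleInsert extends TupleOp
case object TupleDelete extends TupleOp

final case class TupleEvent(ord: Int, op: TupleOp, stream: String, data: List[Any])
final case class GetSnapshot(view: List[Int])
case object EndOfStream

object MessageSketch {
  // Illustrative helper (not part of DBToaster): numbers the given changes
  // in order, starting at 0, and wraps them as events on a single stream.
  def eventsFor(stream: String, changes: List[(TupleOp, List[Any])]): List[TupleEvent] =
    changes.zipWithIndex.map { case ((op, data), ord) => TupleEvent(ord, op, stream, data) }

  // Insert (1, 10) and (2, 20) into stream R, then delete (1, 10) again.
  val sample: List[TupleEvent] = eventsFor("R", List(
    (TupleInsert, List(1L, 10L)),
    (TupleInsert, List(2L, 20L)),
    (TupleDelete, List(1L, 10L))
  ))
}
```

In an application, events like those in MessageSketch.sample would be sent to the query actor one by one, followed by either a GetSnapshot or an EndOfStream message.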
Assuming that the example code has been saved as example.scala, it can be compiled with:
It can then be launched with the following command:
3. Generated Code Reference
The Scala code generator generates a single file containing an object and an actor for a query.
Both are called Query unless another name has been specified using the -n switch.
The code generated for the previous example looks as follows:
3.1. The query object
The query object contains the code used by the standalone binary to execute the query. Its execute method reads tuples from the input streams specified in the query file and sends them to the query actor. The main method calls execute and prints the result once all tuples have been processed.
3.2. The query actor
The actual query processor lives in the query actor. Events such as tuple insertions and deletions are communicated to the actor using actor messages, as described previously. The receive method routes events to the appropriate trigger method. For every stream R, there is an insertion trigger onAddR and a deletion trigger onDelR. These trigger methods are responsible for updating the intermediate result. The map and singleton data structures at the top of the actor hold the intermediate result.
The onSystemReady trigger is responsible for loading static information (CREATE TABLE statements in the query file) before the actual processing begins.
The EndOfStream message is sent by the event source when it is exhausted. The query actor replies to this message with the current processing statistics (processing time, number of tuples processed, number of tuples skipped) and one or more query results.
The GetSnapshot message can be used by an application to access the intermediate result. The query actor replies to this message with the current processing statistics and the results that the message asks for.
The whole process is guarded by a timeout. If the timeout is reached, the actor stops processing tuples.
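The routing described in this section can be modelled in a few lines. The following is a hand-written sketch, not DBToaster output: it uses a plain class instead of an Akka actor, so receive returns the reply rather than sending it, and it omits processing time and the timeout. The message types, the Reply class, the example query and the reading of view as a list of map indices are all assumptions made for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for the library's message types.
sealed trait TupleOp
case object TupleInsert extends TupleOp
case object TupleDelete extends TupleOp
final case class TupleEvent(ord: Int, op: TupleOp, stream: String, data: List[Any])
final case class GetSnapshot(view: List[Int])
case object EndOfStream

// Reply shape invented for this sketch: statistics plus the requested maps.
final case class Reply(processed: Long, skipped: Long, results: List[Map[Long, Long]])

// Plain-class model of a query actor for
//   SELECT A, COUNT(*) FROM R GROUP BY A
class QuerySketch {
  // Intermediate result: number of R tuples per value of A (map index 0).
  private val countByA = mutable.Map.empty[Long, Long]
  private var processed = 0L
  private var skipped = 0L

  // Insertion trigger for stream R; the count does not depend on b.
  def onAddR(a: Long, b: Long): Unit =
    countByA(a) = countByA.getOrElse(a, 0L) + 1

  // Deletion trigger for stream R; assumes the tuple was inserted earlier.
  def onDelR(a: Long, b: Long): Unit = {
    val next = countByA.getOrElse(a, 0L) - 1
    if (next == 0L) countByA.remove(a) else countByA.update(a, next)
  }

  private def snapshot(view: List[Int]): Reply =
    Reply(processed, skipped, view.collect { case 0 => countByA.toMap })

  // Routes each message to the matching trigger or builds a reply.
  def receive(msg: Any): Option[Reply] = msg match {
    case TupleEvent(_, TupleInsert, "R", List(a: Long, b: Long)) =>
      onAddR(a, b)
      processed += 1
      None
    case TupleEvent(_, TupleDelete, "R", List(a: Long, b: Long)) =>
      onDelR(a, b)
      processed += 1
      None
    case _: TupleEvent =>
      skipped += 1 // unknown stream or malformed tuple
      None
    case GetSnapshot(view) => Some(snapshot(view))
    case EndOfStream       => Some(snapshot(List(0)))
    case _                 => None
  }
}
```

Calling receive with a sequence of TupleEvent values and then with EndOfStream yields a Reply holding the statistics and the current counts. In the generated code the same roles are played by the actor's receive method and its reply to the sender.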
3.3. Partial materialization
Some of the work involved in maintaining the results of a query can be saved by performing partial materialization, that is, by materializing the final result only when it is requested (for example, when all tuples have been processed). This behaviour is especially desirable when the rate of querying the results is lower than the rate of updates, and can be enabled through the -F EXPRESSIVE-TLQS command line flag.
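To make the trade-off concrete, here is a hand-written sketch of the idea; it is not code produced by DBToaster, and the class name ShareByA and the example query are assumptions. The result is the share of tuples per value of A. Keeping that result fully materialized would mean rewriting every entry on each update, because the total changes. Keeping only the per-value counts and the total makes each update constant-time, and the result is assembled by a function when it is asked for.

```scala
import scala.collection.mutable

// Hand-written model of partial materialization, not DBToaster output.
// Result: for each value of A, COUNT of tuples with that A divided by
// the total COUNT of tuples in R.
class ShareByA {
  // Intermediate views, maintained on every update in constant time.
  private val countByA = mutable.Map.empty[Long, Long]
  private var total = 0L

  // Insertion trigger: touches one map entry and the total.
  def onAddR(a: Long): Unit = {
    countByA(a) = countByA.getOrElse(a, 0L) + 1
    total += 1
  }

  // Deletion trigger; assumes the tuple was inserted earlier.
  def onDelR(a: Long): Unit = {
    val next = countByA.getOrElse(a, 0L) - 1
    if (next == 0L) countByA.remove(a) else countByA.update(a, next)
    total -= 1
  }

  // Top-level result as a function: computed from the intermediate views
  // only when a caller requests it.
  def result(): Map[Long, Double] =
    countByA.iterator.map { case (a, c) => a -> c.toDouble / total }.toMap
}
```

The cost of result() grows with the number of distinct values of A, so this layout pays off when results are requested much less often than tuples arrive.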
Below is an example of a query where partial materialization is indeed beneficial (this query can be found as examples/queries/simple/r_lif_of_count.sql in the DBToaster download).
When compiling this query with -F EXPRESSIVE-TLQS, the generated code now has functions representing top-level results:
4. Using queries in Java programs
Note: The DBToaster library currently does not offer a convenient interface for Java applications. We plan to add one once the Scala interface is stable.
Since Scala is compatible with Java, it is possible to use the generated queries in Java applications. In order to use a query in Java, the Java application has to reference the libraries present in the lib/dbt_scala folder as well as the Scala library.
The following Java code is equivalent to the Scala code presented in the example above:
If this code has been saved to ExampleApp.java and the Scala library has been copied to lib/dbt_scala, it can be compiled as follows:
The resulting exampleapp.jar can be launched using the following command: