asfernandes
This blog has a new home: https://asfernandes.github.io

Integrating the JVM into Firebird


Some time ago I "discovered" Oracle's server-side Java integration, and immediately thought: Firebird needs this too.

I knew about an earlier initiative by other developers to do the same thing, and the idea was to make that project take off. For several reasons that seemed impossible, so I decided to start an alternative implementation (FB/Java), quite different internally and with more features (more even than Oracle's).

I used this project as my Computer Science undergraduate thesis (TCC), and I'm making it available for download.

I start by explaining the Firebird and Java concepts and features relevant to the subject. Then I explain how the integration between the two technologies works.


Ubuntu 9.10 review


I've upgraded Ubuntu 9.04 to 9.10, and here is what I found.

At the start of the upgrade, it asked about the keyboard. Surprisingly, there was nothing about Brazilian keyboards, so I chose to keep "kernel keymaps" without knowing exactly what that is or why that keyboard popup appeared.

During the upgrade, the flash_plugin and package_hook updates crashed. The NVIDIA driver, as always, failed too. So after restarting in safe mode, I changed the nvidia driver to nv to be able to enter graphical mode.

Then, to fix the system:
    sudo aptitude safe-upgrade
    sudo aptitude safe-upgrade
    sudo aptitude dist-upgrade

And reverted to nvidia driver.

They claim to have a new, beautiful and faster boot. Lies. The new boot screen is horrible, especially when running fsck. And it appears to be slower than in 9.04.

The "fast user switch" gnome applet do not exist as before. It can't show your IM status anymore, nor can be compact as before. I changed it by the shutdown applet to have the same functionality that I used to have.

As in Ubuntu 9.04, Firefox 3.5 remains as slow as 9.04's Shiretoko on my system. And the Flash plugin continues crashing. I use the AMD64 version, which requires a wrapper for the 32-bit Flash plugin, because Adobe hasn't released a final AMD64 version yet.

Summing up: you decide. I want an updated system anyway. :-)


The buggy const


In C/C++, there are constants and pointers to constants. I'm talking here about constants (i.e., const int var1; int* const var2) and not pointers to constants (i.e., const int* var3).
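
To make the distinction concrete, here is a minimal example (variable names are just for illustration):

int main()
{
    int x = 1, y = 2;

    const int var1 = 10;   // constant: var1 can never be reassigned
    int* const var2 = &x;  // constant pointer: *var2 may change, var2 may not
    const int* var3 = &x;  // pointer to constant: var3 may change, *var3 may not

    //var1 = 11;           // error: assignment of read-only variable
    *var2 = 3;             // OK
    //var2 = &y;           // error: assignment of read-only variable
    var3 = &y;             // OK
    //*var3 = 4;           // error: assignment of read-only location

    return 0;
}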

In 90% of the cases I see, the usage of constants for local "variables" and parameters is "wrong". People seem to use const for a variable that merely happens not to change, while, in my opinion, it should rather be used for variables that should never change.

Anyway, I always thought that constness of parameters was not part of the function signature, because a const parameter has nothing to do with the caller; it is an implementation detail of the routine. Indeed, this code compiles, links and runs with G++ and MSVC++, and is also validated by the Comeau online validator:


#include <stdio.h>

class Test
{
public:
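    // m1: declared with a const parameter, defined without it (see below)
    // m2: declared without const, defined with it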
    void m1(const int n);
    void m2(int n);
};

int main()
{
    Test t;
    t.m1(10);
    t.m2(10);
    return 0;
}

void Test::m1(int n)
{
    printf("%d\n", n);
}

void Test::m2(const int n)
{
    printf("%d\n", n);
}

But not all compilers agree. Some patches fixing these inconsistencies had to be sent so that Firebird could be compiled with the Sun Studio compiler on Solaris. This usage causes problems there because the compiler treats the declaration and the definition as different signatures.

In Java, final can be used for the same thing. Let's see what Java thinks about it:

interface Int
{
    public void m1(final int n);
    public void m2(int n);
}

class Test implements Int
{
    public void m1(int n)
    {
    }

    public void m2(final int n)
    {
    }
}

Java also accepts this code: a final on a parameter is purely an implementation detail and is not part of the method signature.


Here I'm going to explain how Firebird DDL commands work in the current architecture, why that stops innovation, and how it is supposed to work in Firebird 3.0.

DDL in FB works more or less like DML, so first a brief explanation of how DML works. When a DML command is prepared, it starts in the parser, which constructs a tree of nodes. Those nodes all share a single pointer type, used for all node types and for other purposes (like storing constants). A node has a list of child nodes.
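
A rough picture of such a node (a hypothetical sketch, not the actual declaration in the Firebird sources):

// One struct serves every kind of node: a type tag plus a variable-length
// argument list, where an entry may point to a child node or hold a
// constant disguised as a pointer.
struct Node
{
    int type;        // which construct this node represents
    unsigned count;  // number of entries in arg
    Node* arg[1];    // children (or constants), allocated past the struct
};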

After parsing, it enters the semantics phase, where things are verified and adjusted with execution in mind. After that, it enters the generation phase, which outputs BLR bytes and stores them in the statement.

The DSQL API is a layer around the engine API. The engine API only knows how to execute BLR, hence the DSQL layer passes the stored BLR to those lower-level (engine API) functions.

The engine API has no knowledge of DSQL nodes or of the processing done in the semantics phase. The engine parses the BLR, reconstructs another tree and compiles it. That is almost what DSQL does, except that DSQL works on SQL code while the engine works on BLR bytes.

When a statement is executed, the BLR tree is traversed and everything runs.

So back to DDL... DDL is not described as BLR, but in an equivalent binary format: DYN. Some DYN verbs have BLR bytes embedded (like the body of a stored procedure), but that is not relevant for generic DDL handling.

As in the DML case, the engine knows nothing about the DDL nodes constructed in DSQL. The engine only knows how to run DYN, through the isc_ddl API. In DSQL, everything works as in DML, but the generation phase emits DYN bytes, and statement execution is layered around isc_ddl.

So you might ask, what's wrong with DDL execution if it's so consistent with DML?

A lot of things, I'd say:

  • DYN codes are very badly structured. Some codes that should be private to specific verbs share the "number space" of all verbs. That makes things visibly wrong.
  • Related to the above, we're running out of space. In v2.5 we're at number 247. We can only go up to 254, and then use 255 to open a second number space, which would make things even worse.
  • While some may say that BLR is also bad, that model was not invented in InterBase/Firebird: it is how compilers work (assembly, byte codes, etc.). But the model has no practical benefit for DDL.
  • Too much code is used to convert DSQL structures to bytes and back to engine structures again.
  • There is a lot of code duplication to handle similar things, like the CREATE / ALTER commands.
  • DYN execution is totally unnatural for an RDBMS. A CREATE PROCEDURE command is more or less defined this way:

isc_dyn_def_procedure, <name>,
isc_dyn_prc_source, <source>,
...,
// for each parameter - begin
isc_dyn_def_parameter, <name>,
isc_dyn_prm_number, <number>,
isc_dyn_prm_type, <type>,
...,
isc_dyn_end,
// for each parameter - end
isc_dyn_prc_blr, <bytes>,
isc_dyn_end

And it is executed this way (pseudo-code):

function DYN_execute()
{
    while ((dynVerb = getDynByte()) != isc_dyn_end)
    {
        switch (dynVerb)
        {
            case isc_dyn_def_procedure:
                DYN_define_procedure();
                break;
            case isc_dyn_def_parameter:
                DYN_define_parameter();
                break;
            ...
        }
    }
}

function DYN_define_procedure()
{
    procName = getString();

    while ((dynVerb = getDynByte()) != isc_dyn_end)
    {
        switch (dynVerb)
        {
            case isc_dyn_prc_source:
                procSource = getString();
                break;
            case isc_dyn_prc_blr:
                procBlr = getBytes();
                break;
            ...
            default:
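                // nested verbs (e.g., isc_dyn_def_parameter) are dispatched recursively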
                DYN_execute();
        }
    }

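    // note: the procedure row is inserted only after its parameters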
    INSERT INTO RDB$PROCEDURES
}

function DYN_define_parameter()
{
    paramName = getString();

    while ((dynVerb = getDynByte()) != isc_dyn_end)
    {
        switch (dynVerb)
        {
            case isc_dyn_prm_number:
                paramNumber = getNumber();
                break;
            case isc_dyn_prm_type:
                paramType = getParamType();
                break;
            ...
        }
    }

    INSERT INTO RDB$PROCEDURE_PARAMETERS
}

Did you see what is wrong? Have you ever asked why the Firebird system tables don't have primary and foreign keys?

The answer is simple: there is no way to have an FK from RDB$PROCEDURE_PARAMETERS to RDB$PROCEDURES if the records in RDB$PROCEDURE_PARAMETERS are inserted first.

With these problems identified, I started an alternative DDL path in v2.5, with the ALTER CHARACTER SET command. This command does not emit DYN; when asked to execute, it executes directly in DSQL through a new C++-friendly node (also used in the parser).
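
As a rough sketch of the idea (all names here are hypothetical, not the actual Firebird classes):

#include <cstdio>
#include <string>

// Hypothetical sketch of the new scheme: the parser builds a typed C++
// node and DSQL executes it directly, with no DYN bytes in between.
struct Transaction {};  // stand-in for the engine transaction handle

class DdlNode
{
public:
    virtual ~DdlNode() {}
    virtual void execute(Transaction& tra) = 0;  // runs directly in DSQL
};

class AlterCharSetNode : public DdlNode
{
public:
    AlterCharSetNode(const std::string& cs, const std::string& coll)
        : charSet(cs), defaultCollation(coll)
    {
    }

    virtual void execute(Transaction& /*tra*/)
    {
        // the real node would update the system tables here, in an order
        // that could satisfy primary and foreign keys
        std::printf("altering character set %s (default collation %s)\n",
            charSet.c_str(), defaultCollation.c_str());
    }

private:
    std::string charSet;
    std::string defaultCollation;
};

int main()
{
    Transaction tra;
    AlterCharSetNode node("WIN1252", "WIN_PTBR");
    node.execute(tra);  // direct execution, no DYN and no isc_ddl
    return 0;
}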

This was not sufficient, though. To get the benefits, the existing commands must also be migrated to the new scheme, and DYN had better be totally eliminated. The problem is that, documented or not, DYN and isc_ddl are part of the API and are used by client tools. The GDEF utility is totally based on them, as are the DDL commands of GPRE.

These archaic utilities are the reason things were defined this way and why it never changed. Former InterBase versions didn't natively understand text commands. DDL and DML were part of the client utilities, which compiled them and sent BLR/DYN to the server.

Fortunately, the Firebird team agreed to deprecate these things in v2.5 (in the sense of warning users not to use them anymore), so they can be eliminated in v3.0.

In my branch for v3.0, I started to implement this conversion. Procedures and triggers have been refactored and their DYN handling eliminated. The new external function support was also done in the new scheme. With nice C++ classes, the architecture of PACKAGES became smooth. Also, the implementation of these commands became shorter, self-contained and much more readable.

Yesterday I went a step further and unified some of the EXECUTE BLOCK (DML) code with the procedure and trigger (DDL) code. A prototype of sub-procedures is on the way...


CVSNT client is slow


For years I used the CVSNT client to work with Firebird and other CVS repositories. I started using it because I needed a client capable of handling the "sserver" protocol method, which the "standard" CVS can't.

The first thing I noticed was that during updates it usually refetches the Firebird ChangeLog file completely (or just crashes trying), due to some "corruption" (supposedly because the file is larger than 2MB). But that was OK with me; I just pressed Ctrl+Break and continued working.

For some reason, I had to use Cygwin's CVS to check out the whole firebird2 tree. It was much faster. So I switched to the non-NT client, and everything is much faster with it.

I've just started a merge of HEAD into my branch. There are a lot of conflicts to resolve, most of them related to cleanups and formatting changes. Unfortunately, CVS can't make that any faster. :-)

BTW1, to open all conflicted files in gedit, I'm using:

grep -RlZ "<<<<<<<" builds/ src/ | xargs -0 gedit

BTW2, CVS is dead. Who cares?

BTW3, Git seems excellent, but is somewhat difficult to grasp.


People often say that Firebird performs badly on networks with high latency. I did some tests to see how much the new protocol improved over the old one, and how it could be improved even more. My tests only care about latency, not bandwidth.

I simulated network latency on Linux using Netem. With the tc utility, we can add a delay to outbound traffic. Since I did it on the lo interface, it effectively applies to both inbound and outbound traffic. I set the delay to 100ms (more or less the value I found doing this over the internet with my horrible Telefonica Speedy connection), using this command:

$ sudo tc qdisc add dev lo root handle 1:0 netem delay 100ms

And here is the result of a ping:
$ ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=200 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=200 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=200 ms
64 bytes from localhost (127.0.0.1): icmp_seq=4 ttl=64 time=200 ms

To later remove the delay, the command is:
$ sudo tc qdisc del dev lo root

The test uses the Firebird library from my (as yet unknown) CppSys project (more on CppSys in a future post). The CppSys database library has interfaces and semantics based on JDBC. The relevant test code is here:
println("Connecting to the database");
AutoPtr<FbDatabase> database(client->openDatabase(
    String("firebird:") + databaseName));


println("Starting the transaction");
AutoPtr<FbTransaction> transaction(database->startTransaction(
    TransactionIsolation::SNAPSHOT));

println("Preparing stmt1");
AutoPtr<FbPreparedStatement> stmt1(database->prepareStatement(
    transaction,

    "select rdb$relation_name from rdb$relations"
    "    where rdb$system_flag = 1"
    "    order by rdb$relation_name"));

println("Preparing stmt2");
AutoPtr<FbPreparedStatement> stmt2(database->prepareStatement(
    transaction,

    "select rdb$field_name from rdb$relation_fields"
    "    where rdb$relation_name = ?"
    "    order by rdb$field_name"));

println("Preparing stmt3");
AutoPtr<FbPreparedStatement> stmt3(database->prepareStatement(
    transaction,

    "select rdb$trigger_name from rdb$triggers"
    "    where rdb$relation_name = ?"
    "    order by rdb$trigger_name"));

println("Executing stmt1");
AutoPtr<FbResultSet> rs1(stmt1->executeQuery());

println("Starting fetch rs1");

while (rs1->fetch())
{
    println("rs1: %s", rs1->getString(1).getPtr());

    println("\tExecuting stmt2");
    stmt2->setString(1, rs1->getString(1));
    AutoPtr<FbResultSet> rs2(stmt2->executeQuery());

    println("\tExecuting stmt3");
    stmt3->setString(1, rs1->getString(1));
    AutoPtr<FbResultSet> rs3(stmt3->executeQuery());

    println("\tStarting fetch rs2");

    while (rs2->fetch())
        println("\trs2: %s", rs2->getString(1).getPtr());

    println("\tStarting fetch rs3");

    while (rs3->fetch())
        println("\trs3: %s", rs3->getString(1).getPtr());
}

The println function buffers the text and prints it on its next call, prefixed by the current time (in milliseconds) minus the time when the text was buffered. Here is its code:
#include <stdio.h>
#include <stdarg.h>
#include <sys/timeb.h>

typedef long long int64;  // assumption: int64 comes from CppSys

static void println(const char* format, ...)
{
    static char lineBuffer[1024];
    static int64 lastTime = 0;

    timeb tb;
    ftime(&tb);

    int64 thisTime = int64(tb.time) * 1000 + tb.millitm;

    if (lastTime == 0)
        lastTime = thisTime;
    else
    {
        printf("%06d: %s\n", (int) (thisTime - lastTime),
            lineBuffer);

    }

    va_list va;
    va_start(va, format);

    vsnprintf(lineBuffer, sizeof(lineBuffer), format, va);  // bounded; vsprintf could overflow

    va_end(va);

    lastTime = thisTime;
}

As you can see, the test does what every client/server database developer should know not to do: a main query with nested queries for each record. If you've ever run ISQL over the internet, you've seen how slowly it performs, and that is the reason: ISQL does just that. It does that because it's written using embedded SQL with GPRE, and sometimes this is needed to access new columns that may not be present in a database with an older ODS. And due to the nature of embedded SQL, "details" like where and when to prepare statements are simply ignored.

Here are the results.

Result for the old protocol, with 2.0 client and 2.5 server:
000665: Connecting to the database
000201: Starting the transaction
001234: Preparing stmt1
001204: Preparing stmt2
001204: Preparing stmt3
000200: Executing stmt1
000202: Starting fetch rs1
000000: rs1: MON$ATTACHMENTS
000200:     Executing stmt2
000201:     Executing stmt3
000201:     Starting fetch rs2
000000:     rs2: MON$ATTACHMENT_ID
000000:     rs2: MON$ATTACHMENT_NAME
000000:     rs2: MON$CHARACTER_SET_ID
000001:     rs2: MON$GARBAGE_COLLECTION
000000:     rs2: MON$REMOTE_ADDRESS
000000:     rs2: MON$REMOTE_PID
000000:     rs2: MON$REMOTE_PROCESS
000001:     rs2: MON$REMOTE_PROTOCOL
000000:     rs2: MON$ROLE
000000:     rs2: MON$SERVER_PID
000000:     rs2: MON$STATE
000001:     rs2: MON$STAT_ID
000000:     rs2: MON$TIMESTAMP
000000:     rs2: MON$USER
000601:     Starting fetch rs3
000000: rs1: MON$CALL_STACK
000200:     Executing stmt2
000201:     Executing stmt3
000201:     Starting fetch rs2
000000:     rs2: MON$CALLER_ID
000000:     rs2: MON$CALL_ID
000000:     rs2: MON$OBJECT_NAME
000000:     rs2: MON$OBJECT_TYPE
000000:     rs2: MON$SOURCE_COLUMN
000000:     rs2: MON$SOURCE_LINE
000000:     rs2: MON$STATEMENT_ID
000001:     rs2: MON$STAT_ID
000000:     rs2: MON$TIMESTAMP
000601:     Starting fetch rs3
<...>
000000: rs1: RDB$CHECK_CONSTRAINTS
000200:     Executing stmt2
000200:     Executing stmt3
000200:     Starting fetch rs2
000001:     rs2: RDB$CONSTRAINT_NAME
000000:     rs2: RDB$TRIGGER_NAME
000200:     Starting fetch rs3
000000:     rs3: RDB$TRIGGER_14
000000:     rs3: RDB$TRIGGER_15
000001:     rs3: RDB$TRIGGER_16
000400:     rs3: RDB$TRIGGER_35
000000: rs1: RDB$COLLATIONS
<...>

Result for the new protocol, with 2.5 client and 2.5 server:
000670: Connecting to the database
000200: Starting the transaction
000229: Preparing stmt1
000203: Preparing stmt2
000203: Preparing stmt3
000000: Executing stmt1
000202: Starting fetch rs1
000000: rs1: MON$ATTACHMENTS
000000:     Executing stmt2
000000:     Executing stmt3
000202:     Starting fetch rs2
000000:     rs2: MON$ATTACHMENT_ID
000000:     rs2: MON$ATTACHMENT_NAME
000000:     rs2: MON$CHARACTER_SET_ID
000000:     rs2: MON$GARBAGE_COLLECTION
000000:     rs2: MON$REMOTE_ADDRESS
000000:     rs2: MON$REMOTE_PID
000000:     rs2: MON$REMOTE_PROCESS
000000:     rs2: MON$REMOTE_PROTOCOL
000000:     rs2: MON$ROLE
000000:     rs2: MON$SERVER_PID
000000:     rs2: MON$STATE
000000:     rs2: MON$STAT_ID
000000:     rs2: MON$TIMESTAMP
000000:     rs2: MON$USER
000200:     Starting fetch rs3
000000: rs1: MON$CALL_STACK
000001:     Executing stmt2
000000:     Executing stmt3
000200:     Starting fetch rs2
000000:     rs2: MON$CALLER_ID
000000:     rs2: MON$CALL_ID
000000:     rs2: MON$OBJECT_NAME
000001:     rs2: MON$OBJECT_TYPE
000000:     rs2: MON$SOURCE_COLUMN
000000:     rs2: MON$SOURCE_LINE
000000:     rs2: MON$STATEMENT_ID
000000:     rs2: MON$STAT_ID
000000:     rs2: MON$TIMESTAMP
000200:     Starting fetch rs3
<...>
000000: rs1: RDB$CHECK_CONSTRAINTS
000000:     Executing stmt2
000000:     Executing stmt3
000201:     Starting fetch rs2
000000:     rs2: RDB$CONSTRAINT_NAME
000000:     rs2: RDB$TRIGGER_NAME
000200:     Starting fetch rs3
000000:     rs3: RDB$TRIGGER_14
000000:     rs3: RDB$TRIGGER_15
000000:     rs3: RDB$TRIGGER_16
000000:     rs3: RDB$TRIGGER_35
000000: rs1: RDB$COLLATIONS
<...>

FWIW, I also tested the old protocol on a real network with similar latency. The results were very similar.

The total times are:

    Old protocol:                                55.5s
    New protocol:                                18.5s
    New protocol minus round-trip times of rs3:  10.1s

So we can see that:
1) The old protocol is crap.
2) The new protocol does a good job removing some "unnecessary" round-trips.
3) It can still be improved.

The first interesting problem is the series of statement preparations. Preparing a statement may result in an error, so it necessarily involves a round-trip to get the status from the server. But does it really need one? With the Oracle JDBC driver, a prepareStatement call with a wrong statement does not throw an exception; the error is deferred. In fact, this is a contract, and changing a contract would require some way for client applications to enable or disable the new behavior.

The second interesting thing is that executing a statement that needs a fetch does not involve a round-trip. Does that also happen when selecting from a stored procedure? Wouldn't that be a contract change as well, since selectable SPs can write to the database? I didn't do further tests... Anyway, that's what allows the optimizations I talk about below.

Then we start the fetches. The first fetch of rs1 necessarily involves a round-trip and buffers some records in the client. Subsequent fetches are resolved locally.

The third interesting thing is about the nested queries. Since the queries are executed, a round-trip is inevitable, but the client does one for each query. A more intelligent approach would be to also fetch rs3 when asked to fetch rs2, since it has already been executed. In this test, that would run ~8s faster. And a much more intelligent approach would do it in the background (transmitting asynchronously), without putting overhead on the rs2 call.

With a super intelligent server/client/protocol, the client and the server could cache prepared statements, and the server could asynchronously send statement invalidations to the client. Then preparing a statement that had already been prepared once would involve no round-trip, while still guaranteeing that the statement is not wrong.

So what can you do now?
1) Don't talk to the server in this poor (and common) way. Use selectable SPs to run multiple queries.
2) Don't prepare/close/prepare your common statements. Cache them in your application, as sketched below.
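
To illustrate point 2, here is a minimal sketch of a statement cache. The Db and Stmt types are placeholders for whatever your access layer provides (for instance, something in the style of the CppSys API used above), and the prepare method is an assumption for the sketch, not a real API:

#include <map>
#include <string>

// Prepare each distinct SQL string once and reuse it, instead of
// prepare/close/prepare paying a server round-trip every time.
template <typename Db, typename Stmt>
class StatementCache
{
public:
    explicit StatementCache(Db* aDb)
        : db(aDb)
    {
    }

    Stmt* get(const std::string& sql)
    {
        typename std::map<std::string, Stmt*>::iterator it = cache.find(sql);

        if (it != cache.end())
            return it->second;  // already prepared: no server round-trip

        Stmt* stmt = db->prepare(sql);  // one round-trip, paid only once
        cache[sql] = stmt;
        return stmt;
    }

private:
    Db* db;
    std::map<std::string, Stmt*> cache;
};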