Friday, November 23, 2012

RfC: Improving Maven's Performance

I typically work on projects that are relatively complex, say one parent project and 20 modules or so. To handle the complexity, I have learned to use and appreciate Maven. OTOH, after 8 years or so with Maven, I still miss some aspects of Ant builds, in particular the speed. Maven does a good job of making build scripts understandable (the biggest problem of Ant), but it can be painfully slow. Why is that? I could name several reasons, but the most obvious seems to be that Maven always builds the whole project, whereas Ant allows you to implement logic like

   if (!module.isUpToDate()) {
     // Build it
   } else {
     // Ignore it
   }

Of course, Ant's syntax is completely different, but that's not the point, unless you are a fanatic XML hater and really believe that a Groovy or JSON syntax is faster by definition. (If so, stop reading, you picked the wrong posting!)
The absence of such an up-to-date check isn't necessarily a problem. Most Maven plugins nowadays implement an up-to-date check themselves. OTOH, if every plugin does its own check and the module is possibly made up of other modules itself, then it adds up.
Apart from that, up-to-date checks can be unnecessarily slow. Consider the following situation, which I encounter quite frequently:
A module contains an XML schema. JAXB is used to create Java classes from the schema. If the schema is complex, then the module might easily have several thousand Java source files.
This means that the Compiler plugin needs to check the timestamps of several thousand Java and .class files before it can detect that it is up to date. Likewise, the Jar plugin will check the same thousands of .class files and compare them against the jar file before building it.
That's sad, because we could have a very easy and quick up-to-date check by comparing the timestamps of the XML schema and the pom file (it does affect the build, doesn't it?) with that of the jar file. If we notice that the jar file is up to date with regard to the other two, then we might ignore the module altogether: ignoring it would mean completely removing it from the reactor and not invoking the Compiler or Jar plugins at all. Okay, that would help, but how do we achieve that without breaking the complete logic of Maven? Well, here's my proposal:
  1. Introduce a new lifecycle phase into Maven, which comes before everything else. Let's call it "init". In other words, a typical Maven lifecycle would be "init, validate, compile, test, package, integration-test, verify, install, deploy". (See this document if you need to learn about these phases.)
  2. Create a new project property called "uptodate" with a default value of false (upwards compatibility).
  3. Create a new Maven plugin called "maven-init-plugin" with a configuration like
       groupId: org.apache.maven.plugins
       artifactId: maven-init-plugin
       configuration:
         sourceResources:
           sourceResource:
             directory: src/main/schema
             includes:
               include: **/*.xsd
           sourceResource:
             directory: .
             includes:
               include: pom.xml
         targetResources:
           targetResource:
             directory: ${project.build.directory}
             includes:
               include: *.jar
     (Excuse the crude syntax, I have no idea how to display XML on blogspot.com!
      I hope you get the idea, though.)
     The plugin's purpose would be to perform an up-to-date check by comparing source
     and target resources and to set the "uptodate" flag accordingly.
      


  4. Modify the Maven core as follows: After the "init" phase, search for modules with isUptodate() == true and remove those modules from the reactor. Then run the other lifecycle phases.

That's it. Perfectly upwards compatible. Moderate changes. Much faster builds. How about that?
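
To make the proposed check a bit more concrete, here is a minimal sketch in Java of the timestamp comparison such a plugin might perform. It is only an illustration under my own assumptions: the class UpToDateCheck and its methods findFiles and isUpToDate are hypothetical names, not part of Maven or of any existing plugin, and plain file name suffixes stand in for the include patterns.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper: not part of Maven or of any existing plugin. */
final class UpToDateCheck {

    /** Collects all regular files below dir whose name ends with suffix. */
    static List<Path> findFiles(Path dir, String suffix) throws IOException {
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return result; // A missing directory simply contributes no files.
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(suffix))
                .forEach(result::add);
        }
        return result;
    }

    /**
     * Returns true only if at least one target exists and no source is
     * newer than the oldest target. Anything missing counts as "not up to date".
     */
    static boolean isUpToDate(List<Path> sources, List<Path> targets) throws IOException {
        if (targets.isEmpty()) {
            return false; // Nothing has been built yet.
        }
        FileTime oldestTarget = null;
        for (Path target : targets) {
            if (!Files.isRegularFile(target)) {
                return false;
            }
            FileTime t = Files.getLastModifiedTime(target);
            if (oldestTarget == null || t.compareTo(oldestTarget) < 0) {
                oldestTarget = t;
            }
        }
        for (Path source : sources) {
            if (!Files.isRegularFile(source)) {
                return false; // Be conservative: rebuild if a source vanished.
            }
            if (Files.getLastModifiedTime(source).compareTo(oldestTarget) > 0) {
                return false; // This source changed after the target was built.
            }
        }
        return true;
    }
}
```

For the JAXB module described above, one would pass the .xsd files below src/main/schema plus pom.xml as sources and the jar files below ${project.build.directory} as targets. That is a handful of timestamps instead of several thousand, and a module for which isUpToDate returns true could then be dropped from the reactor.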

    Friday, November 16, 2012

    DB2 Weirdness

    In the year 2012, what serious database might require code like this:
    private ResultSet getColumns(DatabaseMetaData pMetaData,
                                 String pCat,
                                 String pSchema,
                                 String pTableName)
        throws SQLException {
     if (pMetaData.getDatabaseProductName().startsWith("DB2")) {
       final String q = "SELECT null, TABSCHEMA, TABNAME, COLNAME," 
      + " CASE TYPENAME"
      + " WHEN 'BIGINT' THEN -5"
      + " WHEN 'BLOB' THEN 2004"
      + " WHEN 'CHARACTER' THEN 1"
      + " WHEN 'DATE' THEN 91"
      + " WHEN 'INTEGER' THEN 4"
      + " WHEN 'SMALLINT' THEN 5"
      + " WHEN 'TIMESTAMP' THEN 93"
      + " WHEN 'VARCHAR' THEN 12"
      + " WHEN 'XML' THEN -1"
      + " ELSE NULL"
      + " END, TYPENAME, LENGTH FROM SYSCAT.COLUMNS"
      + " WHERE TABSCHEMA=? AND TABNAME=?";
       final PreparedStatement stmt =
         pMetaData.getConnection().prepareStatement(q);
       stmt.setString(1, pSchema);
       stmt.setString(2, pTableName);
       return stmt.executeQuery();
     } else {
       return pMetaData.getColumns(pCat, pSchema, pTableName, null);
     }
    }
    
    or this:
      private ResultSet getExportedKeys(DatabaseMetaData pMetaData)
         throws SQLException {
        if (pMetaData.getDatabaseProductName().startsWith("DB2")) {
          final String q = "SELECT null, TABSCHEMA, TABNAME,"
          +  " PK_COLNAMES, null, REFTABSCHEMA, REFTABNAME,"
          +  " FK_COLNAMES, COLCOUNT FROM SYSCAT.REFERENCES"
          +  " WHERE TABSCHEMA=? OR REFTABSCHEMA=?";
          final PreparedStatement stmt =
            pMetaData.getConnection().prepareStatement(q);
          stmt.setString(1, "EKFADM");
          stmt.setString(2, "EKFADM");
          return stmt.executeQuery();   
        } else {
          return pMetaData.getExportedKeys(null, "EKFADM", null);
        }
      }
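
    The numeric literals in the first query are java.sql.Types codes (where INTEGER is 4 and SMALLINT is 5). As a small sketch of how the magic numbers could be avoided, the CASE expression might be generated from the constants themselves. The class Db2TypeCodes and its method caseExpression are hypothetical names of mine, and mapping XML to LONGVARCHAR (-1) simply mirrors the query above:

```java
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds the CASE expression from java.sql.Types constants. */
final class Db2TypeCodes {

    /** DB2 TYPENAME values mapped to JDBC type codes, in insertion order. */
    static final Map<String, Integer> CODES = new LinkedHashMap<>();
    static {
        CODES.put("BIGINT", Types.BIGINT);
        CODES.put("BLOB", Types.BLOB);
        CODES.put("CHARACTER", Types.CHAR);
        CODES.put("DATE", Types.DATE);
        CODES.put("INTEGER", Types.INTEGER);
        CODES.put("SMALLINT", Types.SMALLINT);
        CODES.put("TIMESTAMP", Types.TIMESTAMP);
        CODES.put("VARCHAR", Types.VARCHAR);
        CODES.put("XML", Types.LONGVARCHAR); // -1, mirroring the query above
    }

    /** Returns a SQL CASE expression that translates TYPENAME into a JDBC type code. */
    static String caseExpression() {
        StringBuilder sb = new StringBuilder("CASE TYPENAME");
        for (Map.Entry<String, Integer> e : CODES.entrySet()) {
            sb.append(" WHEN '").append(e.getKey())
              .append("' THEN ").append(e.getValue());
        }
        return sb.append(" ELSE NULL END").toString();
    }
}
```

    The result of caseExpression() could be concatenated into the SELECT in place of the handwritten CASE block; the type names are fixed strings here, so no user input ends up in the SQL.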