Saturday, September 4, 2010

Using Groovy script for ETL, and more about the ERP

After 7 months of hard work, we have successfully migrated off a primitive console base system. The console base system consist of just 2 tables:
  • Order + Line Items (this is one table!)
  • Products (It was called Inventory, but really it just keep track of the product code, description and the Stock on Hand)
As a result of the primitive construct, little could be derived from the data. In phase one, we were able to make use of the data in the old system and merge it with 3 years of purchasing record stored in some excel documents. The ETL exercise itself was quite laborious. Merging 15 years of sales record and 3 years of purchase record was no small feat! Groovy scripts proved to be very handy for this ETL exercise, and Groovy's Java heritage, there was shortage of connectors to different data systems. For we have two sources: FoxPro DBF file and Excel XLS file, and the target system is a MS Access accdb file.  Here is an example to demonstrate how concise the connection code is, which free us to concentrate on the transformation logic:

def src= Sql.newInstance("jdbc:odbc:Driver={Microsoft Excel Driver (*.xls)};DBQ=<path to xls file>;DriverID=22;READONLY=true", "", "", "sun.jdbc.odbc.JdbcOdbcDriver");

def dest = Sql.newInstance("jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=<path to accdb destination file>;pwd=<accdb file password>", "", "", "sun.jdbc.odbc.JdbcOdbcDriver");

src.eachRow("SELECT [PRODUCT ID], [PRODUCT CODE], ... WHERE ...", {
  // function body to process each row obtain from your source using the SQL statement
  // it.<field name> to get the value: for example:
  def productId = it."[Product ID]"

  // ... more code to get stuff from source

  // ... probably logic to run different insert statements base on different value from the source and business logic

  // an example to insert data into your target
  dest.execute("INSERT INTO [Products] ([Product ID], [Product Code], ...) VALUE (?,?, ...)", [productId, productCode, ...])

The new system went live after 3 weeks of hard work and we ran the old and new systems in parallel for 1 quarter and a bit. As of August, I am happy we are comfortable enough with the new system and since retired the old FoxPro system. During the last 1 quarter and a bit, development of the new system has continue to evolved. The functionality of the system and integration with the company is depicted in the following diagram:

There are two main problems with the above ecosystem:
  1. Salesman, Inventory Controller, Shipper and Couriers all rely on the clerk to access the system.  
  2. The analysis and data mining activity by the managers are stretching the limit of Access.
In the next phase of the evolution we have plan on addressing the above issues and add on more features. In order to prepare for the next phase, I have begun migrating the data in Access to SQL Server Express. This proved to be quite a challenging task, migrating a live system while trying to minimize the interruption to the business. I am in the middle of the process and this whole migration could be a post of its own.