Saturday, August 28, 2010

Environment Monitoring Probe Optimization and the "-Xnoclassgc" Java parameter

By integrating existing monitoring software with some wireless reader and sensor technology, we quickly came up with prototype software for environment monitoring, as mentioned in another post. We were able to deploy a POC with a client, and the feedback was positive.  However, the client did notice some lingering Java processes on the server where the software was deployed.  This is okay for a POC, but probably not okay for a live site with a lot of probes.  Luckily, our software lets us monitor the response time of these probes and chart it.  In our test bed of more than 100 sensors, I picked 50 sensors and charted their response times, which are shown in the chart below.  The time frame annotated with (1) shows the response time of the first-generation probes.


Initially, I was hoping for a quick fix with some optimization parameters. Since this is a short-lived Java program, I used the "-Xnoclassgc" parameter and boom! The response time dropped by more than a third.  The result is marked as (2) in the chart.  If you do a quick Google search for the "-Xnoclassgc" parameter, you will get a lot of warnings about using it, for example the article titled "Java’s -Xnoclassgc considered harmful".  The parameter turns off garbage collection of classes, which is risky for a long-running server that keeps loading new classes.  A short-lived program, however, exits long before that becomes a problem, so this is one of the situations that warrants using the parameter.
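For anyone who wants to confirm the flag actually reached the JVM, here is a minimal sketch using the standard java.lang.management API. It is not part of the probe code; the class name NoClassGcCheck and the helper hasFlag are made up for illustration.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, not taken from the probe software.
public class NoClassGcCheck {

    // Returns true if the given JVM argument list contains the exact flag.
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to main), e.g. -Xnoclassgc.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();

        System.out.println("-Xnoclassgc set: " + hasFlag(jvmArgs, "-Xnoclassgc"));
        System.out.println("classes loaded now: " + classes.getLoadedClassCount());
        System.out.println("classes unloaded so far: " + classes.getUnloadedClassCount());
    }
}
```

Started as "java -Xnoclassgc NoClassGcCheck", the first line should report true. The loaded and unloaded class counts give a rough idea of how much class unloading the program would otherwise do.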

Even with the -Xnoclassgc fix, the probes still take almost 2 seconds to respond, whereas a network ping takes roughly 100~200 ms. To address this, a re-architecture of the probes was necessary.  For the initial development I kept a quick-to-market approach in mind; now it is time to take it to the next level for real production usage.
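To make the comparison with ping concrete, this is roughly how a probe's response time can be measured from inside a Java program. It is only a sketch: ProbeTimer, timeMillis, and the sleeping fakeProbe are placeholders I made up, and the real probe call would go where the sleep is.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timing helper, not taken from the probe software.
public class ProbeTimer {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder for a real probe query; here it just sleeps for 150 ms.
        Runnable fakeProbe = () -> {
            try {
                Thread.sleep(150);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("probe response time: " + timeMillis(fakeProbe) + " ms");
    }
}
```

Note that timing inside the JVM leaves out JVM startup. For a short-lived probe that is launched as a new process each time, the startup cost is part of what the monitoring software sees, so the number measured from outside will be higher.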

To ensure the long-term viability of this solution, I have also set up proper source control and redundant repositories so that the code is not lost.  With the new version and its streamlined code base, the solution has been optimized in both size and speed.  The response time is now comparable to a ping command, between 100~200 ms, as indicated in section (3) of the chart.  The deployment package also shrank from over 1.1 MB to 21 KB.  With this improvement, there are no longer any lingering Java processes on the server!