However I was wrong - and had the opportunity to correct that misconception. The holiday season in December gave me some time to attend to a nagging latency problem with an application called OpenTSDB (or simply TSDB). It's a simple, yet very effective, scalable and highly performant time-series datastore used primarily for monitoring and metrics data. Our use case was nontrivial: 550+ million metrics per day being injected into the datastore, and 20-30 concurrent users monitoring and displaying graphs that would fetch up to a few million metrics per graph and refresh periodically. With so much activity going on in the system, it was not realistic to get sub-second graph refreshes, and I deemed it acceptable to have some latency when refreshing a graph that fetches, say, 2 million data points on each refresh. In TSDB, once a graph is loaded in the browser, it does not disappear when the next refresh is scheduled; it is replaced only when the next graph is ready. Thus the user never sees a blank screen, only a graph that is swapped out the moment the new one is available.
Anyway, I decided to give this issue some attention. This required poring over the TSDB logs and turning on garbage collection logging. As a result, I found that users were experiencing two kinds of latencies from TSDB. The first kind was long GC pauses that occurred quite frequently within TSDB. The second was delays from HBase, the underlying NoSQL datastore. This article focuses on the former.
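For reference, GC logging on a pre-Java 9 JVM can be turned on with flags along these lines; the log path below is only a placeholder, not the actual path we used:
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/log/tsdb/gc.log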
With the frequent long GC pauses, it was obvious that the TSDB daemon's JVM was working very hard to manage memory reclamation. Memory churn (object creation and collection) was probably more a measure of the activity in the system and not something that I could easily and directly influence. Since reclamation was triggered when the fraction of free heap memory fell below a threshold, it seemed that the JVM was crossing that used-memory threshold very frequently. There were two easy options to help dampen it - increase heap memory, or lower the free heap memory threshold (i.e. increase the occupancy threshold). I increased the TSDB memory limit from 6GB to 12GB in 2GB increments and saw a noticeable reduction in the frequency and duration of garbage collection.
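The heap limit itself is set with the standard -Xmx flag; the final size looked like the line below (whether to also pin the initial size with a matching -Xms is a separate choice I am not prescribing here):
-Xmx12g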
By this time I had given GC a good look-over and realized there were still a number of tuning knobs. But first, given the Parallel, CMS and G1 garbage collectors, I wanted to find the one that would go a long way - one that would give a decent improvement without any explicit tuning. And that work led to the revelations in this presentation.
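For anyone who wants to repeat the comparison, the three collectors are selected (one at a time) with the standard JVM flags:
-XX:+UseParallelGC
-XX:+UseConcMarkSweepGC
-XX:+UseG1GC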
For the TSDB use case, I discovered that G1 performed best out of the box. However, I did see occasional multi-second GC pauses: while 90% of the GC pauses were reasonable (within 200ms), the remaining 10% were long. Even so, G1 had the lowest GC frequency and the least amount of total GC time. The only further tuning I had to do was to add the following JVM flags, and things became smooth.
-XX:InitiatingHeapOccupancyPercent=70
-XX:MaxGCPauseMillis=100
-XX:+ParallelRefProcEnabled
InitiatingHeapOccupancyPercent changes the heap occupancy level (in %) at which G1 will start a concurrent garbage collection cycle. The default value of 45 kicked in a little too early for this workload, so each cycle did not (or at least did not here) reclaim much garbage.
As the name indicates, MaxGCPauseMillis is a pause-time goal for the garbage collector to try to adhere to. Essentially, G1 uses heuristics from previous cycles to estimate how much space it can collect in each cycle while still meeting the pause goal.
And ParallelRefProcEnabled lets the collector use multiple threads, instead of a single thread, to process Reference objects (soft, weak, phantom and final references) during collection pauses.
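Putting it all together, the G1-related portion of our TSDB JVM options ended up looking roughly like this (the 12GB heap comes from the earlier sizing exercise; exact values will of course depend on your own workload):
-Xmx12g
-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=70
-XX:MaxGCPauseMillis=100
-XX:+ParallelRefProcEnabled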
These changes, along with a few changes at the HBase level (related to compaction and HStore tuning), gave us happy TSDB users again.