Thursday, December 6, 2012

HBase: finding balance

The disadvantage of abstractions

That's the interesting thing about abstractions. It's good to build independent parts of a system, and it's fine as long as everything works as you expect. But suddenly it breaks, and you start to realize you should dive into the problem to formulate your expectations precisely. Because, just out of the blue, you have to split your problem into "micro-expectations": expectations of a lower level than just "make this world happy". Sometimes abstractions just don't work. Sometimes you have to unfold this black box and start hurling bricks.

Hey, this is not a lecture on epistemology! It's just a discourse about Java. And about Hadoop. Adherents of Java always try to create abstract things with the statement "it should just work". But when it breaks, you are left with huge expectations and no idea how to deal with them. And the Java style makes the situation worse: it's harder to unfold this black box, because its creator was so sophisticated. All the details are carefully hidden; otherwise the books would not be so clear, and the idea itself would not look so clean.

Still, this is not a lecture :-) Just look at the following problem in HBase.

Data locality in HBase

The whole idea of map-reduce (in terms of performance) is data locality and independence. Each job works with its own data. You gain maximum performance when your data is spread evenly across the cluster and each task works with local data, because access to a local HDD is much cheaper than a remote HDD plus a network transfer.

The strong side of abstraction here is that HBase itself is just an idea built on top of HDFS. And therefore it has to play by HDFS' rules.

When HBase starts, its regions are balanced across the region servers (RS). But how does data locality work in this case? A region is just a couple of files in HDFS, and HBase has no secret interfaces to HDFS. Locality simply emerges from the rules HDFS follows when creating new blocks:

  • The first replica is placed on the node that requested the write (here, the region server itself)
  • The second replica is placed on a node in a different rack, for fault tolerance
  • The third replica is placed on another node of that same remote rack
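
You can observe these rules yourself: fsck with the -locations flag prints the datanodes holding every block of a table's files. The /hbase/tablename path here assumes the pre-0.96 directory layout; adjust it for your version:

  hadoop fsck /hbase/tablename -files -blocks -locations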

And it really works. But then you restart your HBase cluster: because of an error, or simply as a precaution. Either way, the cluster starts working slower than before. Why? There is a "law of Windows": everything works perfectly right after a restart or re-install. Why doesn't portable Java follow this rule?!

The problem is that, by default, HBase doesn't store the block map. It simply starts with a completely different distribution of regions, and the region servers have no idea about the previous state. You can see the increased network load in your monitoring: Hadoop slowly rearranges the blocks. So slowly, in fact, that it's faster to rewrite them all and recreate data locality artificially.
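
One way to force that rewrite (a suggestion on my part, not the only option) is a major compaction: each region server rewrites the store files of the regions it currently hosts, and by the first placement rule above every new block lands on that very server. For example:

  echo "major_compact 'tablename'" | hbase shell

Keep in mind that this rewrites the entire table, so it is a heavy operation; better run it off-peak.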

I really don't know how to force HBase to memorize this state. But here is a simple script to measure locality. Just launch:

./count_locality.sh tablename

Its output is the data locality of each RS in your cluster. A locality of 98-100% is perfect; anything below 50% is certainly bad.
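
The script itself is not reproduced here, but a minimal sketch of its HDFS half might look like the following. It parses hadoop fsck output to count how many of the table's block replicas sit on each datanode. The bracketed location format and the /hbase path are assumptions for the Hadoop of that era, and a complete locality figure would also need the region-to-server assignment from .META.:

  #!/usr/bin/env bash
  # Count block replicas of an HBase table per datanode, from fsck output.
  # This is only half of the locality computation: it shows where the
  # blocks are, not which region server needs them.
  TABLE=${1:?usage: $0 tablename}

  hadoop fsck "/hbase/$TABLE" -files -blocks -locations 2>/dev/null |
    grep -o '\[[^]]*\]' |        # keep the [host:port, host:port, ...] lists
    tr -d '[]' | tr ',' '\n' |   # one host:port per line
    cut -d: -f1 | tr -d ' ' |    # drop the datanode port and stray spaces
    sort | uniq -c | sort -rn    # replica count per host, busiest first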
