Apache HBase is arguably the leader within this rapidly evolving market, and
numerous best practices have emerged from the open-source software ecosystem
surrounding it. Many of these practices target specific strengths of HBase,
and some accommodate its weaknesses, such as
limited support for ACID transactions. In HBase, ACID transactions are
supported only within a single row, not across multiple rows or tables.
A common best practice is therefore to group potentially large segments of
data, for example an entire user profile, within a single HBase row, so that
updates to that data are applied atomically. Other critical best practices
involve the use of column families and, in particular, the format and design
of composite row and column keys. Composite row key design involves critical
decisions that affect the current and future query capabilities of a table
and, more generally, its performance and the even distribution of table data
across regions in a cluster.
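
To make the row key trade-off concrete, the following is a minimal sketch of
one common composite row key layout: a short hash-derived "salt" prefix,
followed by delimited key fields. It is an illustration only, not the key
format that CloudGraph or HBase generates. The class name CompositeRowKey, the
bucket count of 16, and the '|' delimiter are all assumptions chosen for the
example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration; not part of HBase or CloudGraph.
final class CompositeRowKey {
    // Assumed number of salt buckets; in practice this would be tuned
    // to the number of regions in the table.
    static final int BUCKETS = 16;
    // Assumed field delimiter; fields must not contain it.
    static final char DELIM = '|';

    private CompositeRowKey() {}

    /** Deterministic salt bucket in [0, BUCKETS) derived from the leading field. */
    static int bucket(String leadingField) {
        return Math.floorMod(leadingField.hashCode(), BUCKETS);
    }

    /** Builds a key of the form "<2-digit bucket>|<field1>|<field2>|...". */
    static String build(String... fields) {
        if (fields.length == 0) {
            throw new IllegalArgumentException("at least one field is required");
        }
        StringBuilder sb = new StringBuilder();
        // The salt spreads otherwise adjacent keys across the key space.
        sb.append(String.format(Locale.ROOT, "%02d", bucket(fields[0])));
        for (String field : fields) {
            if (field.indexOf(DELIM) >= 0) {
                throw new IllegalArgumentException("field contains delimiter: " + field);
            }
            sb.append(DELIM).append(field);
        }
        return sb.toString();
    }

    /** Encodes the key as the byte array form that HBase row keys take. */
    static byte[] toBytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, CompositeRowKey.build("user42", "2024-01-15") yields a key that
ends in "|user42|2024-01-15" and starts with a two-digit bucket. Because the
salt is derived from the leading field alone, all rows for one user share a
prefix and can still be read with a single prefix scan. The cost is that a
range scan across many users has to be issued once per bucket, which is the
kind of query capability decision described above.
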
The CloudGraph™ implementation encapsulates many HBase best practices in each
of these areas and provides a framework within which to incorporate future
best practices as they evolve. The complexities of generating terse, efficient
physical row and column keys are completely hidden, and the client user is
provided with rich configuration capability and a generated, standards-based
API derived from one or more domain-specific business models.