Big Data Workshop, April 23, 2010
Title: HBase – What? Why? Where does it fit?
Convener: Jon Gray Notes-taker: Mason Ng
- Jon’s background – his startup used Postgres and ran into problems. He then started to use HBase, and later joined Facebook.
- List of things to discuss: What is HBase – a distributed, column-oriented, scalable database
- Coming from relational world perspective
- HBase does not use local filesystems directly; it only uses HDFS
- Cassandra uses local filesystems and then replicates itself; one read could be pulling from 3 nodes. AP (Available, Partition-tolerant) system
- HBase uses HDFS replication. CA (Consistent, Available) system
- No transactions across rows; not fully ACID.
- Within a single row, operations are atomic.
- ACID properties hold at the row level: row → column family – c1→[versions…], c2→[versions…]
- Each column family has its own file(s)
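The logical layout of a single row described above can be modeled as nested sorted maps. This is a conceptual sketch only (plain Java collections, not the HBase client API); the class and method names are illustrative:

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Conceptual model of one HBase row: each column family maps columns
// to timestamped versions, newest timestamp first.
public class RowSketch {
    // family -> column -> (timestamp, descending -> value)
    static NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> row = new TreeMap<>();

    static void put(String family, String column, long ts, String value) {
        row.computeIfAbsent(family, f -> new TreeMap<>())
           .computeIfAbsent(column, c -> new TreeMap<>(Collections.reverseOrder()))
           .put(ts, value);
    }

    static String getLatest(String family, String column) {
        // newest version sorts first because timestamps are in reverse order
        return row.get(family).get(column).firstEntry().getValue();
    }

    public static void main(String[] args) {
        put("c1", "title", 100L, "old title");
        put("c1", "title", 200L, "new title");   // a second version, not an overwrite
        put("c2", "body",  150L, "text");
        System.out.println(getLatest("c1", "title"));  // prints "new title"
    }
}
```

The key point is that a "cell" is really a list of timestamped versions, which is what makes the update-by-versioning model below possible.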
- Relational databases do random reads; HBase/HDFS access is sequential writes/reads.
- Buffering writes: inserts buffer in memory and are then batch-written (flushed) to HDFS. Tables break into shards; each shard lives on only one node.
- e.g., rowA–rowD accumulate in memory, then flush 64MB to disk
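The write-buffering idea above can be sketched as follows. This is a toy model, not HBase internals: the threshold stands in for the real 64MB flush size, and the names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of write buffering: puts land in a sorted in-memory buffer;
// when the buffer reaches the flush threshold it is written out as one
// immutable sorted file (HBase flushes ~64MB at a time).
public class MemstoreSketch {
    static final int FLUSH_THRESHOLD = 3;  // stand-in for 64MB
    static TreeMap<String, String> memstore = new TreeMap<>();
    static List<SortedMap<String, String>> flushedFiles = new ArrayList<>();

    static void put(String rowKey, String value) {
        memstore.put(rowKey, value);
        if (memstore.size() >= FLUSH_THRESHOLD) {
            // one big sequential write of already-sorted data
            flushedFiles.add(new TreeMap<>(memstore));
            memstore.clear();
        }
    }

    public static void main(String[] args) {
        for (String k : new String[]{"rowC", "rowA", "rowB", "rowD"}) put(k, "v");
        System.out.println(flushedFiles.size() + " file(s), memstore holds " + memstore.size());
        // -> 1 file(s), memstore holds 1
    }
}
```

Random-order inserts become sorted, sequential disk writes, which is why HBase gets away with HDFS's sequential-only access pattern.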
- Updates happen by versioning; HBase does not actually delete/update in place.
- How Bigtable and HBase differ: HBase is in Java, Bigtable in C++. There is a lot of cruft to manage with Java, and therefore in HBase. Uses ZooKeeper.
- Supports random and sequential access. Supports RDF.
- HBase background tasks: compaction – takes lots of 64MB shards and compacts them into one big chunk (about 3 shards at a time); split – redistributes a–b and b–d to different shards
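Compaction ties together the flush and versioning points above: the many small sorted files get merged into one, at which point superseded versions and delete markers can finally be discarded. A minimal illustrative model (not HBase internals; tombstones are modeled as null values here):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a major compaction: several small sorted files merge into
// one; the newest version of each row wins, and rows whose newest version
// is a delete marker (tombstone) disappear entirely.
public class CompactionSketch {
    // Each flushed file maps rowKey -> (timestamp, value); null value = tombstone.
    static List<TreeMap<String, Map.Entry<Long, String>>> files = new ArrayList<>();

    static int newFile() { files.add(new TreeMap<>()); return files.size() - 1; }

    static void write(int file, String rowKey, long ts, String value) {
        files.get(file).put(rowKey, new AbstractMap.SimpleEntry<>(ts, value));
    }

    static SortedMap<String, String> compact() {
        TreeMap<String, Map.Entry<Long, String>> merged = new TreeMap<>();
        for (TreeMap<String, Map.Entry<Long, String>> file : files)
            file.forEach((row, cell) -> merged.merge(row, cell,
                    (oldCell, newCell) -> newCell.getKey() > oldCell.getKey() ? newCell : oldCell));
        TreeMap<String, String> result = new TreeMap<>();
        merged.forEach((row, cell) -> {
            if (cell.getValue() != null) result.put(row, cell.getValue());  // drop tombstones
        });
        return result;
    }

    public static void main(String[] args) {
        int f1 = newFile(), f2 = newFile();
        write(f1, "rowA", 1L, "v1");
        write(f2, "rowA", 2L, "v2");   // newer version of rowA
        write(f1, "rowB", 1L, "v1");
        write(f2, "rowB", 2L, null);   // delete marker for rowB
        System.out.println(compact()); // prints {rowA=v2}
    }
}
```

This is why "delete" in the versioning model is cheap at write time: the actual reclamation is deferred to compaction.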
- Data model
- Processing with Hadoop; needed random access; could not scale the relational database.
- Example: blog / RSS aggregators. Fields: source, stamp, itemid
- Sources a, b, c insert in random order, which ends up as a large B-tree. Merging 2000 sources, each with 10000 items, a relational database delegates to the query engine.
- Want: the list of items for source A – column-oriented data. Items are sorted/mapped: item stamp → id. Source (key) A becomes one large row across sequential (~3) 64MB chunks.
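The row-key design described above can be sketched with a sorted map: a composite key of source + timestamp + itemid keeps each source's items physically contiguous and time-sorted, so "all items for source A" is one sequential scan instead of a relational merge. The key format here is an illustrative assumption, not a prescribed HBase layout:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of the aggregator's row-key design: composite keys make
// per-source reads a single contiguous range scan.
public class FeedKeySketch {
    static TreeMap<String, String> table = new TreeMap<>();

    static String key(String source, long stamp, String itemId) {
        // zero-pad the stamp so lexicographic order matches numeric order
        return String.format("%s/%013d/%s", source, stamp, itemId);
    }

    static SortedMap<String, String> itemsFor(String source) {
        // half-open range covering exactly this source's keys
        // ('0' is the character immediately after '/')
        return table.subMap(source + "/", source + "0");
    }

    public static void main(String[] args) {
        table.put(key("sourceB", 5L, "item9"), "...");
        table.put(key("sourceA", 7L, "item2"), "...");
        table.put(key("sourceA", 3L, "item1"), "...");
        // sourceA's items, oldest first, without touching sourceB's rows
        System.out.println(itemsFor("sourceA").keySet());
    }
}
```

Note the trade-off the next bullet raises: with the timestamp last (or buried) in the key, fetching only the latest items forces the scan to skip over older entries.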
- The schema is not fixed; it is delegated to the application to create/extend schemas.
- Time is last in the key. Wanting only the latest copies may not be efficient, because the read needs to skip over older entries.
- FB is not using HBase yet, but is evaluating it.
- @FB: a Hadoop shop with HDFS committers, using Hive. Would use HBase for incremental-update needs.
- HBase vs. Cassandra – if you need strong consistency, use HBase. Log-shipping replication for cross-colo replication. HBase provides audit trails for slave replication.
- HBase is slower than Cassandra on random access due to the HDFS layer.
- HBase and Cassandra should converge over time: Cassandra will get better at range scans, and HBase will get better at random access.
- ~100 nodes per cluster, with segmenting. StumbleUpon has one cluster for the website and copies data to another, offline cluster for MapReduce processing.
- A special table lives on one node; if there is a hot row, that special table could become a bottleneck. ZooKeeper is used for the special table, since ZooKeeper's read replication distributes the reads.
- The Hadoop NameNode is a single point of failure. GFS2 uses Bigtable to store the metadata: GFS1 → Bigtable → GFS2 → bigger table. Open question: how to scale the NameNode and make it highly available.
- HBase vs. Hypertable: Hypertable is written in C++; HBase is backed by a strong Apache community.
- Hadoop/HBase are subject to GC performance because of the Java dependency; total throughput depends on the GC. The G1 garbage collector is coming and will defragment memory.
- HBase does not need to run with Hadoop; it can run on a local filesystem on one node.
- Until the next HBase release there can be data loss, due to an HDFS limitation with appending writes to the log file: if the writer dies before close(), data loss could occur.
- @FB: Hive and HBase integration. Also looking at HBQL and an ORM built on top of HBase. The Hadoop + HBase stack can provide rollup archives.