Session: HBase – What? Why? Where does it fit?

Big Data Workshop, April 23, 2010
Session 2C
Title: HBase – What? Why? Where does it fit?

Convener: Jon Gray    Notes-taker: Mason Ng

  • Jon’s background – his startup used Postgres and ran into problems, so he started to use HBase. He later joined Facebook.
  • List of things to discuss: What is HBase? A distributed, column-oriented, scalable database.
  • Coming from relational world perspective
  • HBase does not use the local filesystem; it uses only HDFS.
  • Cassandra uses local filesystems and replicates itself; one read could pull from 3 nodes. An AP (Available, Partition-tolerant) system.
  • HBase uses HDFS replication. A CA (Consistent, Available) system.
  • No transactions across rows; no full ACID.
  • Within a row, operations are atomic.
  • ACID scope: row → column family – c1 → [versions…], c2 → [versions…]
  • Column family has its own file(s)
  • Relational databases do random reads; HBase/HDFS do sequential writes/reads.
  • Buffered writes: inserts are buffered in memory and then batch-written (flushed) to HDFS. A table breaks into shards; each shard lives on only one node.
  • e.g., rowA–rowD held in memory, flushed to disk in 64MB chunks.
  • Updates are done by versioning; HBase does not actually delete/update in place.
  • How Bigtable and HBase differ: HBase is in Java, Bigtable in C++. There is lots of crap to manage with Java, and therefore with HBase. Uses ZooKeeper.
  • Supports random and sequential access. Supports RDF.
  • HBase background tasks: compaction – takes lots of 64MB shards and compacts them into one big chunk (about 3 shards at a time); split – redistributes a–b and b–d to different shards.
  • Data model
  • Hadoop processing needs random access; relational databases could not scale.
  • Blog / RSS aggregators. Source, stamp, itemid
  • Sources a, b, c insert in random order and end up as a large B-tree. Merging 2000 sources, each with 10000 items, a relational database delegates to the query engine.
  • Want: source A has this list of items – column-oriented data. Items are sorted/mapped, item stamp → id. Source (key) A is a large row across sequential (3) 64MB chunks.
  • Schema is not fixed but delegates to the application to create/extend the schemas.
  • Time is last. Wants to get latest copies. May not be efficient because needs to skip.
  • Facebook is not using HBase yet, but is evaluating it.
  • FB is a Hadoop shop with HDFS committers, using Hive. HBase would serve the need for incremental updates.
  • HBase vs. Cassandra – if you need consistency rather than eventual consistency, use HBase. Log-shipping replication for cross-colo replication; HBase provides audit trails for slave replication.
  • HBase is slower than Cassandra on random access due to the HDFS layer.
  • Hbase and Cassandra should converge over time. Cassandra would get better at range scan. Hbase would get better at random access.
  • ~100 nodes per cluster, with segmenting. StumbleUpon has one cluster for the website and copies data to another, offline cluster for MapReduce processing.
  • A special (meta) table lives on one node; if there is a hot row, that table could become a bottleneck. ZooKeeper is used for the special table because ZooKeeper’s read replication distributes the reads.
  • The Hadoop NameNode is a single point of failure. GFS2 uses Bigtable to store the metadata: GFS1 → Bigtable → GFS2 → bigger table. Open question: how to scale the NameNode and make it highly available.
  • HBase vs. Hypertable: Hypertable is written in C++; HBase is backed by a strong Apache community.
  • Hadoop/HBase depend on Java and are therefore subject to GC performance; total throughput depends on it. The G1 garbage collector is coming, which defragments memory.
  • HBase does not need to run with Hadoop; it can run on a local filesystem on a single node.
  • Until the next HBase release there is a data-loss window, due to HDFS limitations such as appending writes to the log file: if a writer dies before close(), data can be lost.
  • @FB: Hive and HBase integration. Also looking at HBQL and an ORM built on top of HBase.
  • The Hadoop + HBase stack can provide a rollup archive.
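The buffered-write, flush, and compaction behavior described in the bullets above can be sketched as a toy log-structured store. This is a minimal illustration, not HBase's API: the class, method names, and the tiny flush threshold (standing in for the 64MB mentioned above) are all invented.

```python
# Toy sketch of the write path: inserts are buffered in memory (the
# "memstore"), flushed as immutable sorted files when a threshold is hit,
# and periodically compacted into one larger file.

FLUSH_THRESHOLD = 4  # stand-in for the 64MB flush size from the notes

class ToyStore:
    def __init__(self):
        self.memstore = {}   # row key -> value, sorted on flush
        self.flushed = []    # list of immutable sorted "files" on disk

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # Write the buffered edits out as one sorted, immutable file.
        self.flushed.append(sorted(self.memstore.items()))
        self.memstore = {}

    def compact(self):
        # Merge all flushed files into one; later files win on duplicate
        # keys, which is how versioned updates take effect without any
        # in-place delete/update.
        merged = {}
        for f in self.flushed:
            merged.update(dict(f))
        self.flushed = [sorted(merged.items())]

store = ToyStore()
for k in ["rowC", "rowA", "rowB", "rowD", "rowA"]:
    store.put(k, f"v-{k}")
store.flush()
store.compact()
```

After compaction the second write to rowA has superseded the first, matching the "update by versioning, no in-place delete" bullet.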

Session: Introduction to Hypertable + Q&A

Big Data Workshop, April 23, 2010
Session 3F

Title: Introduction to Hypertable + Q&A
Convener: Doug Judd
Notes-taker: Matthew Gonzales


  • Hypertable is an open-source implementation of Google’s Bigtable
  • Hyperspace is equivalent to Google’s Chubby System
  • Hypertable performance features include: Implemented in C++, Query Cache, Block Cache, Bloom Filter
  • Hypertable has been around for 3 years; a 1.0 release is planned for July.
  • Hypertable Large deployments include – Baidu and Rediff
  • How does Hypertable compare to HBase?  Ans: C++ vs. Java – Hypertable chose C++ for performance reasons.
  • Considering offering Hypertable as a hosted solution in the cloud.
  • Does Hypertable simulate nodes, or do you have to have multiple servers?  Ans: You can run it on a laptop.

Session: mongoDB

Big Data Workshop, April 23, 2010
Session 4F

Convener: Aaron Staple
Notes-taker: Matthew Gonzales


  • NoSQL really means: non-relational, next-generation operational data stores and databases
  • Index any combination of fields and subfields
  • MongoDB 1.6 release in July 2010 – Focus on strong consistency using sharding and replica sets
  • Best Use Cases – High Volume, Scaling Out, Caching
  • Less Good At – highly transactional, ad-hoc business intelligence, problems that require SQL
  • Production examples – foursquare, Etsy, NYTimes
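The sharding mentioned in the 1.6 bullet above can be sketched as range-based routing: each shard owns a contiguous range of the shard key, and a router maps a key to exactly one shard. This is a minimal illustration under invented split points and shard names; in real MongoDB the mongos router and config servers do this work.

```python
import bisect

# Range-based shard routing: split points divide the key space into
# contiguous ranges, one per shard. Split points and shard names here
# are made up for illustration.
SPLIT_POINTS = ["g", "p"]            # ranges: [min,"g"), ["g","p"), ["p",max)
SHARDS = ["shard0", "shard1", "shard2"]

def route(shard_key: str) -> str:
    """Return the shard responsible for a given shard key."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, shard_key)]

# Every key lands on exactly one shard, so writes scale out while each
# document still has a single authoritative home.
assignments = {k: route(k) for k in ["alice", "mallory", "zoe"]}
```

Replica sets then address the consistency half of that bullet: each range-owning shard is itself a small replicated cluster with one primary accepting writes.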

Session: GLUE – Using Multiple Databases Together Efficiently

Big Data Workshop, April 23, 2010
Session 2I
GLUE – Using Multiple Databases Together Efficiently

Convener: Josh
Notes-taker(s): Chris Bunch

Key Questions:

1) How does one grow from a SQL-database to a NoSQL solution?

2) How does one use multiple DBs in the same environment? Sometimes input formats and replication strategies differ in a non-trivial way.

Can support transactions across DBs, but at what cost to scalability?
Fundamentally: What is the value your business places on consistency?
Glue is commonly used in Service Oriented Architectures and connecting message queues between application servers and databases

Strong desire to stop reinventing the wheel: either patterns or open-source solutions should emerge.

Session: New Apps Enabled by Scalable Database

Big Data Workshop,  April 23, 2010
Session - 1G
Title: New Apps Enabled by Scalable Database
Convener: Doug Judd / Andy Lee
Notes-taker: Matthew Gonzales
  • Social apps are most popular with those using App Engine
  • Observation – Loud in room G session 1 with construction noise in the background
  • Geographical location based games are enabled and popular with Scalable Databases
  • Gov’t, medical… want a decision engine and are not as interested in storing data
  • What does it mean to say “big data”? What size is considered big?
  • Observation – Hard to know who has what experience while the discussion is going; should have started with introductions first.
  • Pluto is no longer a planet


Big Data Questions

When people registered we asked them to share the questions they have about Big Data; here they are below. Hopefully these questions will be addressed by the attendees gathered at the Big Data Workshop next Friday.

  • Nevermind big data, what about big metadata?
  • How can the enterprise data management industry better serve the web’s data management problems?
  • Interested in developments in the area of big data: what applications have been developed; migration of large structured RDBMS to NoSQL; whether transaction-based processing is a consideration. This is coming from the perspective of working for a large investment firm and wanting to determine the applicability of these technologies to “traditional” RDBMS environments, and what other opportunities exist for possible deployment.
  • How can we best evaluate the design tradeoffs underlying different nosql technologies?
  • Will DataWarehouse and Transactional converge?
  • What are people buying today for large DBs?
  • What are current best practices for bioinformatics and sequence analysis?
  • What new data technologies are emerging, being adopted or not? And more importantly, why?
  • Interested in hearing how other people cope with large quantities of data, especially with respect to storage on Amazon Web Services (S3), doing real-time analytics and such.

We also asked topics people wanted to present about their own work related to Big Data:

  • mongoDB.
  • CouchDB
  • JavaScript
  • Creator of Hypertable, a high-performance, open source implementation of Google’s Bigtable. Will highlight the differences between Hypertable and the other scalable database alternatives.
  • A super efficient, scalable, database with excellent map reduce properties.
  • Biology: Now Under Moore’s Law: informatics is now the bottleneck for High-Throughput Sequencing (HTS)
  • HBase
  • How we are implementing data cubes for traffic data and using it for real-time analytics
  • HBase committer, so can present on that. Also work with big data at Facebook, which I can speak about in general but not specifically what I work on.
  • Semantic Web technologies and Data Integration
  • Redis via Java client, and its integration with other frameworks in this area
  • upcoming features in Cassandra 0.7 and 0.8
  • interested to see what other people are doing in the field

NoSQL at SXSWi 2010

The Beyond LAMP: Scaling Websites Past MySQL session on 3/14 at 9:30 was one of the most popular at SXSW, with panelists from Twitter, Imgur, Facebook, and Reddit. Notes on this session: tweets (William Hertling, Infochimps, Sara Davies).

That evening Infochimps sponsored a Data Cluster Meetup featuring a NON-RELATIONAL DATABASE SMACKDOWN between Cassandra core team committer Stu Hood (Rackspace), CouchDB core team committer Jan Lehnardt (Apache), and MongoDB evangelist Wynn Netherland (Orrka, TweetCongress). Tweets on the session; postmortem by Toby Jungen. Wynn and Adam recorded it (Audio: NoSQL Smackdown); below are my notes from it, which are still a work in progress.

Everyone’s claiming they are NoSQL, nobody knows what it means, it means many things, scala

We’re all document stores, right? 5:00 Data model. What do you think of huge documents? Define huge.

As big as the machine can fit.

Jan: Guess it differs in Mongo and Couch

Wynn: I believe MongoDB has 4G row limit. All these other competitors use JSON and are untyped. I don’t like that they are untyped. Because you can do massively interesting things with typed data. Might be sorted, you can slice little pieces out of it. If it grows large you might want just a piece of it out. Mongo uses the BSON spec so it’s pseudotyped at the filesystem level. It’s not just strings in the db, has Ints, other types, files etc.

Jan: CouchDB uses JSON, has bunch of data types, the nice thing about JSON is it’s the lowest common denominator of programming languages, can use it easily in any language with little code. JSON is really good for data that changes.

Heckler: Slow? Jan: There’s a compiled Python module that is actually fast, so shut up. (laughter)

Wynn: So, I believe we disagree on types, and that’s all right.
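The typed-vs-untyped argument above comes down to what a JSON round trip preserves. A quick standard-library illustration (the document contents here are invented):

```python
import json
from datetime import datetime

# JSON's type set is small and universal (strings, numbers, booleans, null,
# arrays, objects), which is Jan's "lowest common denominator" point;
# anything richer, like a datetime, has to be encoded by the application.
doc = {"count": 3, "ratio": 0.5, "tags": ["a", "b"], "active": True}
round_tripped = json.loads(json.dumps(doc))  # basic types survive intact

try:
    json.dumps({"ts": datetime(2010, 4, 23)})
    serialized = True
except TypeError:
    # Richer types don't serialize out of the box; BSON defines extra types
    # at the encoding level, which is Wynn's "pseudotyped" point.
    serialized = False
```

The disagreement is whether that extra typing belongs in the database encoding (BSON) or in application code on top of plain JSON.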

Jan: The web is not really typed. Most of the people who use the web are not computer scientists. It enables everyone to share data. Having to teach them about datatypes is an arcane artifact of programming, they should be able to just stuff whatever they have in a database. Everyone who is interested in writing apps should not be restricted to that computer science approach.

Stu: I hope people developing web apps are computer scientists! But maybe not, I dunno.

Jan: iPhone App Store has >100k apps on it. It’s a different magnitude of scale, if everyone participated on the open web. The amateurs really have no clue…

WV: That’s a load of crap. How many languages are actually typeless (besides Perl)? Everyone developing actually has types; suddenly you go to the DB and all your types disappear.

Jan: That’s not true. JSON defines types, you can do that, but you don’t necessarily have to worry about them, you don’t have to go up front and define them.

WV: You force everybody to rewrite all their programs with these things in mind. That’s a lot of work. There seem to be quite a lot of programs out there… It’s all about compatibility stuff.

Stu: Also, Hadoop is completely unstructured by default. Could do something similar with CouchDB. Dunno why I’m defending CouchDB! Type doesn’t always win.

WV: Traditional applications use a very different model, you have to rewrite your applications according to your model whether it’s type or consistency model, forces you to rethink, could be a good thing.

Wynn: Let’s talk about consistency really quickly. Cassandra has the peer-to-peer model from Werner’s brainchild Dynamo, where any node can accept a write, and then if enough nodes have accepted it, it succeeds, otherwise not, and at read time you resolve all that. Dunno how I feel about the Couch and Mongo models. Mongo hasn’t actually figured out that part, right?

That’s right, there is a 2-second delay, is that what you’re talking about? Wynn: No, Mongo is master-slave replication.

I must admit that I’m not a core committer for these projects – I’m a Mongo fanboy,  just an end user.

So if you have a data center in Washington and one in California, you can do a write in one of them, and even if the other is down, depending on your tunables, you can still succeed that write, because none of those nodes is solely responsible; no one node is dedicated to a particular key.

That’s an advantage of Cassandra, but in most applications that’s not needed.

WV: Actually the whole consistency model doesn’t come from what you want at the application level. It’s an artifact of implementation abstractions, leaking up. Either you do it for fault tolerance or for more concurrency, so you can get better read throughput or better write throughput. For those two reasons you have to replicate and guarantee writes to all replicas such that my reads are always consistent. That comes at huge cost: if you cannot get your quorum, you may have to fail your writes. That may not be acceptable for some apps. These are things that are leaking up from the implementations, through the APIs. If everybody could get a choice, everyone would want strong consistency, ja? But strong consistency means you have to take a lot of other tradeoffs. The main one is not being able to get much write throughput; the other is that there are a number of failure scenarios in which you’ll be dead in the water.
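Werner's tradeoff between failed writes and consistent reads is the usual R + W > N quorum rule. A generic sketch, not any one system's API:

```python
# Quorum arithmetic: with N replicas, requiring W acks per write and R
# replies per read gives strongly consistent reads exactly when R + W > N,
# because any read set must then overlap any write set.

def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Read and write sets always overlap when r + w > n."""
    return r + w > n

def write_succeeds(n: int, w: int, nodes_down: int) -> bool:
    """A write can still gather W acks if enough replicas are reachable."""
    return n - nodes_down >= w

# N=3 with quorum reads/writes: consistent, and writes survive one node down.
assert is_strongly_consistent(3, 2, 2) and write_succeeds(3, 2, 1)
# Demanding acks from all replicas (W=N) also gives consistency with R=1,
# but then any single node failure fails your writes -- Werner's point.
assert not write_succeeds(3, 3, 1)
```

Dropping W (and R) below quorum buys write availability at the price of eventual consistency, which is the knob the Dynamo-style systems expose.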

Are you saying Dynamo wasn’t user friendly? WV: No, absolutely not. No, actually, so, there’s a range of things. Dynamo predates; we weren’t the first. The consistency model is explicit. It’s not that we were the first to provide eventual consistency. In fact most RDBMS give you eventual consistency, you just don’t know it! If you use a conventional database there’s a delay while the logs are being shipped, and if you read from the slave you do not get consistency. Ja, there is always a window. So why wasn’t Dynamo user friendly? Not only for the consistency level, but also you have to have the key, which normally comes from somewhere else; there’s no way to do a list, to figure out what are my ??; you have to have the key, for example from a customer database. So when we developed Dynamo it was to support shopping carts, that was one of the use cases, so it made you wade through the database, .. storage system, you really had a key. That’s why SimpleDB is a user-friendly key-value storage system, Dynamo not so much. With SimpleDB you can do lists, you can do prefix lists on one of my keys and then find things out. That stuff is not in… what’s the name of that one? (laughter)

Heckler: Isn’t S3 built on Dynamo?

No comment! (confusion)

So the answer is no because if you would be an engineer. you would know that if you have to do a list operator on top of this, that’s a completely different internal architecture.   all of these systems … We have to get enormous scale.   All these things consist of modules that are reused. It’s more the principles that matter not the specific implementations.

Stu: I would say Cassandra is more user friendly because in that case we’re not using hashing to determine where a key lives. You can do those list operations, treat it like you would BigTable from Google, and get a list of all your keys. I imagine you can do that with the competitors, but Cassandra’s implementation is.. better. (laughter)

Jan: You guys focus on the big data problem, the massive scans, all the websites that have that problem, which are like 7? CouchDB is more like the personal DB that you can use for whatever you want to do. It doesn’t force you to think in these, to have these big thoughts, but lets you start small and grow gradually with whatever usage pattern you have. These guys are building Ferraris and dragsters; we (Couch) are building the Honda Accord of databases that everyone can use and get along with for a long long time.

Absolutely, but there’s a reason why Couch rhymes with ouch! (laughter) Anyone who’s used Mongo coming from CouchDB, it’s like night and day in the ease of use: getting set up, getting the servers installed, wrappers for your language of choice, and suddenly I don’t have to know what I’m gonna ask for up front. (Seinfeld reference) It reminds me of when Kramer is doing the Moviefone thing, and he says: why don’t you just tell me the movies you want to watch? It’s the same thing where you have to materialize your views up front.

Jan: Do indexes magically appear with no performance hit?

Well indexes are one thing but users are completely different.

Realistically… can get around it if I have a low edge case

Anything between dynamic and Couch is full of water.

17:45 you should try Neo then

18:00 WV: Let me tell you why these guys suck. (laughter) You should not run your own database any more. That time has passed. These guys force you to run your own database, to manage replication, to manage all of that.

Jan: What do you do if your DSL provider craps out? You’re dead in the water with a great cloud no one can reach. WV: You go to a bar, get a few beers… Jan: And your customers leave you right and left while you’re offline.

19:00 WV: If you aggregate all these customers we have… You’re wasting your time. I love building this database stuff. I could build 10 more Dynamos, it’s really cool, but I’m not solving your customers’ problems, because I’m forcing them to have a lot of operational skills.

19:30 Jan:  Part of what we’re doing is abstracting the database away. It’s just there, you can just use it. My mom should be able to run a CouchDB server without knowing it.

Don’t you all want to *be* one of the 7 biggest sites? So why not build for it?

WV: Actually I want to argue against it only being the 7 biggest sites. Big Data: That’s why we’re here. How many of you are not from the 7 biggest sites? Most of you. Everybody has petabyte datasets now!

Jan: Big providers like Apple, Facebook, they own all your data, all the URLs. People should be able to put their own data under the URLs they control. Privacy laws in Germany, you can’t…

WV: Yes you can! The new Sept 1 privacy law has a definition of a data processor. With SimpleDB you can use it as a data processor.

Jan: I’m thinking of a specific policy that you have to prove a user’s data was deleted on request, if it’s in the cloud you can’t do that.

WV: We comply with safe harbor rules. Data protection directorate of the EU has very explicit rules on what you have to do: have to be allowed to retrieve your data before moving it..

22:10 Werner, the rest of us agree against you in the sense that we’re all open source.

WV: You should be building better value for your customers, not better databases.

Jan: That’s what we do with the local databases. We give Salesforce as an example. You have a local version of Salesforce, if your connection is down..

Heckler: How often does Salesforce go down?

Jan: Oh, it does happen.

Wynn: Are you saying Amazon should let you download the whole database and shop locally?

Heckler: Yes!

I hear NoSQL often. I see posts. Like Web 2.0 a couple of years ago, it’s not defined. How many people think it means big and scaling? How many think nonrelational schemas? We need to agree on terms so we can have these smackdowns.

Jan: Very fast login => look at Memcache, Redis, Mongo; P2P replication => CouchDB; …..? => S3 and the stuff Amazon and others are doing; 100,000s of servers I need to keep busy => Hadoop or Cassandra

Stu: I would like to point out that this is a Big Data meetup. SimpleDB has a 10 gig limit?

WV: You have to do your own partitioning. When I think about NoSQL: for any data storage, the default application or service was a relational database, because that was the only choice. What drove us to build other DBs was that if you look closer at what your processing is, you can decompose it; different steps have different requirements, and for each you can find a solution that is very fast and very reliable. It’s technology developed in the ’80s that we’re stretching to meet 2000s requirements. If you dump all requirements in one bucket, it’s impossible to meet them.

26:00 WV: Not that I think SimpleDB is… it’s a whole bucket of solutions.

Stu: You say impossible, I say just not discovered yet.

WV: For example if you want to do internet everything, if you want inner transactions, multilevel views… if we had built infinite…

Stu: CAP theorem, we’ve all heard of it. None of us have transactions, so skip that. .. has transactions?

WV: Conditional puts, which are actually in line with eventual consistency. Under the covers, a SQL DB is still an eventually consistent system; there’s just a layer on top so you can use both.

Stu: I’d just like to point out we have Cassandra users with multiple terabytes per node: Twitter, Digg, Reddit, Facebook.

Jan: Couch supports that: BBC,

How many sites started out supporting that scale?

If they had seen the future they would have started on Cassandra!

WV: Think about anybody who builds a Facebook game today. You can go from 0 to 25 million users in a month. Imagine all the logging, the objects you have to keep around. You run the marketing campaign on the web, it’s not just… it’s social gaming… terabytes of data quickly.

So let’s talk about scenarios. Can Couch or Mongo update… ? Can Cassandra update documents incrementally?

Stu: Cassandra can update incrementally. We have very large rows. People build indexes within a single row

Updating a key in a hash?

Jan: You write a Javascript function for that. We have a standard library, but people haven’t asked for that a lot.
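Stu's "indexes within a single row" point above can be sketched as a wide row whose sorted column names support incremental inserts and range slices. A toy model in Python; the class, timestamps, and item ids are invented for illustration:

```python
import bisect

# A wide row keeps its column names sorted, so one row can act as an index:
# new entries are inserted incrementally, without rewriting the rest of the
# row, and a range of columns can be read as a contiguous slice.

class WideRow:
    def __init__(self):
        self.columns = []  # sorted list of (column_name, value) pairs

    def insert(self, name, value):
        # Incremental update: only this one column is touched.
        bisect.insort(self.columns, (name, value))

    def slice(self, start, end):
        """Columns with start <= name < end, like a column slice query."""
        return [(n, v) for n, v in self.columns if start <= n < end]

row = WideRow()
for ts, item in [("2010-04-23T10:00", "i1"), ("2010-04-21T09:00", "i2"),
                 ("2010-04-22T18:30", "i3")]:
    row.insert(ts, item)

recent = row.slice("2010-04-22", "2010-04-24")
```

Because the column names stay sorted, "everything since yesterday" is a contiguous slice rather than a scan, which is the wide-row-as-index trick.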

WV: You guys are open source. So if you put out a release, do your customers have to take the database down?

Jan: No. CouchDB has a very robust storage model. Same file format, hasn’t changed for several versions. On top of that, CouchDB is written in Erlang, which lets you update versions at runtime; live upgrades are built in.

Stu: Cassandra is changing file format soon, you will have to restart the cluster. Never say never.

WV: How, with 10000 nodes? Stu: Rolling restart. WV: How long does that take?

WV: You should not be worried about this stuff. This is so old fashioned, so 1990s.

Stu: I disagree. With Cassandra you can run a single node, and you can get another node running easily. We have 45-node installs: Twitter running on 45, FB on 150. It’s easy enough to grow your cluster. It may be easier than using EC2!

Q: I’d like to bring it up a notch. I’m a developer, I write Erlang, but it’s your data. Replication means any copy of the data. None of these guys, we’re zigging, you’re zagging. I want to share photos of grandma, don’t want to ask Zuckerberg any favors.

WV: … key-value, just map-addressable stuff, that’s the way to go.

Stu: Does Grandma know how to use curl? I assume you have to develop an app for her…

In terms of performance? We don’t even need to talk about it because Cassandra has you guys topped.

Jan: The properties it comes with… Mongo… it supports tens of thousands of connections without falling over.

Wynn: I would argue Mongo is fast enough. It’s in C.

Jan: You don’t have a concurrency story. It doesn’t scale concurrently.

WV: How easy is it to hook up caching all over the world? SimpleDB doing 35000 transactions per second?

Stu: Cassandra can do 25000 requests per second per node!

Q: Talk about transactions. Are transactions fixed?

Jan: He’s asking about transactions. Who needs transactions, raise your hands? So that’s your answer. (laughter)

35:00 WV: Transactions have nothing to do with relational databases. You get some ACID guarantees, which are unrelated to relational.

Stu: Also, NoSQL is about using the right tool for the job. Build transactions with a tool like ZooKeeper.

Ecosystems: Cassandra has a few big installs – Cloudkick, Twitter, Facebook.

Couch: BBC, Canonical. Not as big as you can get, but probably a few more than you have.

Mongo: Sourceforge

WV: Someone tweeted a remark that I have gone too far. You protect yourself on multiple different levels. .. There are really techniques you can use to protect yourself from these kinds of failures. Just fancy…

Stu: So how about wide-area replication? People are geographically distributed. Cassandra supports it natively.

Jan: Couch has multi-master replication built in. We just have that.

Mongo: I believe master replication is coming.

But it never works with your data model!

WV: Metadata will never leave EU

You get geographical.. reuation   other things as well

Stu: BigTable recently had an outage, I think App Engine. I love that Google is very open about the cause: out of sync between data centers. Is Mongo planning to break that?

WV: I mean there are advantages on all sides. As always the CAP theorem is .. you get to do the tradeoffs. One of the exercises in Dynamo was we were giving the hands of the developers. Plus I think the .. innovation is. the choice do you want to   consistency model…. can always write to …. it always works.

Stu: I don’t actually have a response for that. I guess it’s possible in CouchDB just because no node knows whether it’s responsible for something

Q: Why not just buy a huge machine and scale up not out? Who needs NoSQL?

Eventually that machine goes down, a sticky situation; you have to use patches to MySQL or have your ops team implement them.

Back to the big data vs. schemaless question. If you compare Mongo to ..   highly productive. Let’s face it, a lot of the data you use is not in-house; you are consuming data from other places, JSON hashes. With NoSQL you can just stash the hash.

Jan: When I see people writing a Ruby or Java app with a huge middle layer, it’s a huge waste of time. We just have .. and a jQuery guy. Having an HTTP-based DB means

WV: Sometimes existing software still needs relational databases.  However there are a ton of applications where if you use ActiveRecord or any standard ORM, it requires MySQL! Developers don’t care. But as soon as you scale or reliability becomes an issue, it becomes .. have all these … that will kill you.

Jan: ActiveRecord is fine, but it’s about 25000 lines of Ruby code! CouchDB is built with simplicity in mind. Our DB is smaller than their wrapper! Bloated middleware is boring, slow, and just plain sucks.

Wynn: The problem with Couch is you have to do everything yourself! You have to drop down to MapReduce and Javascript to do anything. A lot of folks I work with feel you have to have hazmat gloves to handle Javascript.


You’re going to give a web designer a CouchDB database?

Q: Data models

There are names for what we do. In relational you want to normalize, in nonrelational you want to denormalize. Just really that simple. Duplicate, that’s what we say.
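The normalize-vs-denormalize contrast above, in miniature (the table and field names here are invented for illustration):

```python
# Relational style keeps one copy of each fact and joins at read time;
# the non-relational style duplicates data into each document so a
# single lookup answers the query.

# Normalized: two "tables", joined at query time.
users = {1: {"name": "ada"}}
posts = [{"user_id": 1, "title": "hello"}]
joined = [{"author": users[p["user_id"]]["name"], "title": p["title"]}
          for p in posts]

# Denormalized: the author's name is duplicated into every post document,
# so reads are one lookup -- at the cost of updating every copy on change.
post_docs = [{"author": "ada", "title": "hello"}]

assert joined == post_docs  # same answer, opposite storage layouts
```

Duplication trades write-time work (keeping copies in sync) for read-time simplicity, which is exactly the "duplicate, that's what we say" line above.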

Closing question: If you couldn’t use your own product, what would you use?

I think I’d use Persevere, which is actually written in Javascript.

Because all the other languages are boooooring! But does it scale?

Stu: Riak’s interesting but closed source. Voldemort doesn’t have ordered keys, and I love ordered keys.

Wynn: If not Mongo, maybe Couch; depends on the scenario, dynamic. Check out Redis or other systems that should be here. Hope I didn’t deter anyone from Mongo.

WV: One DB that’s left out – Neo4j is different from the others. It stores data as graphs. Take any social application, multiple relationships, multiple connections: Neo4j just rocks. But how do you partition? Wynn: It’s a CS problem.

Thank you, the Nonrelational Database Smackdown! Woo hoo!

