Big Data Workshop, April 23, 2010
Title: Migrating From Small Data to Large – What Grows Well and How?
Convener: Rob Brackett
Notes-taker: Ashley Frank
Session was formed out of the desire of one to discuss possibilities of migration from MySQL with heavy analysis to Big Data technology.
View was expressed and supported that some kind of standards need to evolve to have some interoperability between Big Data implementations but it was realized that because there is not much consistency in the contracts and expectations so these standards would be tough to form.
Comparisons between Amazon’s EC2 and Google’s App Engine, App Engine supports Java and Python, EC2 is a virtualized environment for full operating systems.
Comparisons between Microsoft’s Azure and EC2 were discussed. Azure has 2 storage options, Azure Storage (big data style) with REST interface and Azure SQL Server.
Starting on new projects it is recommended to start with the new technology rather than prototyping on old relational technology and migrating.
Why use the new technology?
If you have elastic demand pricing is better. If you will need to sale large it is easy. It solves some licensing and hardware headaches.
Data needs to be near the applications that use it if the data transfer is large or data transfer costs rise.
If you have ever tried to implement the data warehousing strategy for relational databases where you denormalize your fact tables and horizontally partition them, you soon realize that you have broken your ability to join or use indexes in the way you used to and have pretty much abandoned the query optimizer. Your fact tables can only be queried one way and any analytics or other joining must be done after the only selective key that works (the leading column(s) of your partitions) is used. You may end up with redundant data not just in a single view but multiple fact tables to express different views. The result is that you are moving toward this new approach we are calling Big Data with all the baggage of Relational but none of the innovation of Big Data solutions. However it seems that the dimension support and analysis is done by ‘other’ tools or custom solutions in the new Big Data world. Relational vendors could document the path large data requires in the evolution toward denormalization and partitioning of the fact table and at a point, provide the option to migrate the fact table to a Big Data technology and provide the glue for interacting with the Big Data with relational vocabulary and dimensional support as well as other infrastructure for using Big Data new school.