Session: MongoDB

Big Data Workshop, April 23, 2010
Session 4F

Convener: Aaron Stable
Notes-taker: Matthew Gonzales

Notes:

  • NoSQL really means: non-relational, next-generation operational data stores and databases
  • Index any combination of fields and subfields
  • MongoDB 1.6 release in July 2010 – Focus on strong consistency using sharding and replica sets
  • Best Use Cases – High Volume, Scaling Out, Caching
  • Less Good At – highly transactional, ad-hoc business intelligence, problems that require SQL
  • Production Examples – Justin.tv, foursquare, etsy, nytimes

Session: GLUE – Using Multiple Databases Together Efficiently

Big Data Workshop, April 23, 2010
Session 2I
GLUE – Using Multiple Databases Together Efficiently

Convener: Josh
Notes-taker(s): Chris Bunch

Key Questions:

1) How does one grow from a SQL database to a NoSQL solution?

2) How does one use multiple DBs in the same environment? Sometimes input formats and replication strategies differ in non-trivial ways.

Transactions can be supported across DBs, but at what cost to scalability?
Fundamentally: What is the value your business places on consistency?
Glue is commonly used in Service-Oriented Architectures and for connecting message queues between application servers and databases.

Strong desire to stop reinventing the wheel: either patterns or open-source solutions should emerge.

Session: New Apps Enabled by Scalable Database

Big Data Workshop, April 23, 2010
Session 1G
Title: New Apps Enabled by Scalable Database
Convener: Doug Judd / Andy Lee
Notes-taker: Matthew Gonzales
Notes:
  • Social apps are most popular among those using App Engine
  • Observation – Room G was loud during session 1, with construction noise in the background
  • Geographic location-based games are popular and are enabled by scalable databases
  • Government and medical users want a decision engine and are less interested in storing data
  • What does it mean to say “big data”? What size is considered big?
  • Observation – It was hard to know who had what experience while the discussion was going. The session should have started with introductions.
  • Pluto is no longer a planet

Big Data Questions

When people registered, we asked them to share their questions about Big Data; here they are. Hopefully these questions will be addressed by the attendees gathered at the Big Data Workshop next Friday.

  • Never mind big data, what about big metadata?
  • How can the enterprise data management industry better serve the web’s data management problems?
  • Interested in developments in the area of big data: what applications have been developed, migration of large structured RDBMS to NoSQL, and whether transaction-based processing is a consideration. This is coming from the perspective of working for a large investment firm and wanting to determine the applicability of these technologies to “traditional” RDBMS environments and what other opportunities exist for possible deployment.
  • How can we best evaluate the design tradeoffs underlying different NoSQL technologies?
  • Will data warehousing and transactional systems converge?
  • What are people buying today for large DBs?
  • What are current best practices for bioinformatics and sequence analysis?
  • What new data technologies are emerging, and are they being adopted or not? More importantly, why?
  • Interested in hearing how other people cope with large quantities of data, especially with respect to storage on Amazon Web Services (S3), doing real-time analytics and such.

We also asked people which topics they wanted to present about their own work related to Big Data:

  • MongoDB
  • CouchDB
  • JavaScript
  • Creator of Hypertable, a high performance, open source implementation of Google’s Bigtable. Will highlight the difference between Hypertable and the other scalable database alternatives.
  • A super efficient, scalable, database with excellent map reduce properties.
  • Biology: Now Under Moore’s Law – Informatics is now the bottleneck for High-Throughput Sequencing (HTS)
  • HBase
  • How we are implementing data cubes for traffic data and using it for real-time analytics
  • HBase committer, so I can present on that. I also work with big data at Facebook, which I can speak about in general but not specifically what I work on.
  • Semantic Web technologies and Data Integration
  • Redis via Java client, and its integration with other frameworks in this area
  • upcoming features in Cassandra 0.7 and 0.8
  • interested to see what other people are doing in the field
