Facebook has just open-sourced Presto, its SQL query engine for Big Data stored in Hadoop.
Interesting features of this system:
- It doesn't use the MapReduce paradigm.
- It's about an order of magnitude faster than Hive: "Presto is 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook."
- It isn't limited to HDFS and HBase as data sources. Other sources can be plugged in by implementing a specific API for the given data source.
- The system already seems to be of production quality: "The system is actively used by over a thousand employees, who run more than 30,000 queries processing one petabyte daily."
- Overall, it seems to be a direct counterpart of Google's Dremel/BigQuery tool, which we discussed at one of our journal club meetings.
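
To make the point about pluggable data sources more concrete, here is a minimal Python sketch of the general idea: the query engine talks to an abstract interface, and each data source supplies its own implementation. Everything in it (`Connector`, `InMemoryConnector`, `select`, the method names) is hypothetical and made up for illustration; it is not Presto's actual API, which is written in Java and is certainly richer than this.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class Connector(ABC):
    """Hypothetical data-source API; NOT Presto's real interface."""

    @abstractmethod
    def list_tables(self) -> List[str]:
        """Return the names of the tables this source exposes."""

    @abstractmethod
    def scan(self, table: str) -> Iterator[Row]:
        """Yield the rows of one table as column-name -> value dicts."""


class InMemoryConnector(Connector):
    """Toy source backed by a dict, standing in for HDFS, HBase, etc."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def list_tables(self) -> List[str]:
        return sorted(self._tables)

    def scan(self, table: str) -> Iterator[Row]:
        return iter(self._tables[table])


def select(connector: Connector, table: str, columns: List[str],
           where: Callable[[Row], bool] = lambda row: True) -> List[Row]:
    """Engine side: filter and project rows without knowing the source."""
    return [{c: row[c] for c in columns}
            for row in connector.scan(table) if where(row)]


# Roughly: SELECT name FROM users WHERE age >= 18
source = InMemoryConnector({
    "users": [
        {"id": 1, "name": "ann", "age": 31},
        {"id": 2, "name": "bob", "age": 17},
    ]
})
adults = select(source, "users", ["name"], where=lambda r: r["age"] >= 18)
print(adults)  # [{'name': 'ann'}]
```

Under this assumed design, supporting a new data source means writing one more `Connector` subclass; the `select` function on the engine side stays unchanged.
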
Sources:
- A general description in Computerworld
- A more detailed one on Facebook's engineering blog