Facebook Open Sources Presto SQL Query Engine

November 12, 2013

In June 2013, at the Analytics @ WebScale conference, Facebook announced Presto, which it was using internally to process petabytes of data. According to a recent post by Facebook Engineering, Presto has now been made open source.

So what is Presto?

Hive, which was initially developed by Facebook, translates a query into a chain of MapReduce jobs. Presto is different: it does not use MapReduce and, according to Facebook, is 10 times faster than Hive for most queries. Presto allows querying data where it lives, including Hive, HBase, relational databases, or even proprietary data stores. You can issue SQL-like queries on Presto that include left/right outer joins, subqueries, and common aggregate functions. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.
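To give a feel for the kind of query described above, here is a minimal sketch that combines a left outer join, a subquery, and an aggregate function. It does not talk to Presto: it runs against Python's built-in sqlite3 module as a stand-in, and the users and page_views tables, their columns, and the run_demo function are all made up for illustration. Presto's own SQL dialect and client setup may differ.

```python
import sqlite3


def run_demo():
    """Run a join + subquery + aggregate query on an in-memory SQLite database.

    SQLite is only a stand-in here; the tables and data are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE page_views (user_id INTEGER, url TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO page_views VALUES (1, '/home'), (1, '/about'), (2, '/home');
    """)

    query = """
        SELECT u.name, COUNT(v.url) AS views        -- aggregate function
        FROM users u
        LEFT OUTER JOIN page_views v                -- keeps users with no views
               ON v.user_id = u.id
        WHERE u.id IN (SELECT id FROM users         -- subquery
                       WHERE name <> 'bob')
        GROUP BY u.name
        ORDER BY u.name
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, views in run_demo():
        print(name, views)
```

Because of the left outer join, a user with no page views still appears in the result with a count of zero instead of being dropped.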

Presto Architecture Diagram (source: Presto Website)

Inside Facebook, about 1,000 employees use Presto to interactively query over a petabyte of data, running more than 30,000 queries a day. It is also being used by leading internet companies, including Airbnb and Dropbox.

 

You can find out more about Presto here:

Presto Website
Facebook Blog about Presto
Gigaom Story

