Getting Started With PrestoDB
With a diverse and ever-expanding data warehouse or data lake spanning various data sources, it can become difficult to derive useful insights in a timely manner. PrestoDB is a distributed SQL query engine for big data that aims to tackle these problems in a federated way. It offers many connectors to different data sources, brings its own query optimizer, and can be scaled out to any number of machines to parallelize processing.
PrestoDB was born at Facebook in 2012 out of frustration with slow queries running on Apache Hive. The problem with Hive was that it stored the intermediate results of its MapReduce queries on disk, which caused significant I/O overhead. PrestoDB, with its new architecture and in-memory engine, significantly reduced latencies and improved query speed, which in turn allows for much more interactive queries. In 2015, Netflix showed that PrestoDB was, in fact, 10 times faster than Hive, and even faster in some cases.
The use cases for PrestoDB range from interactive ad hoc queries to long-running batch ETL pipelines, making it flexible for a large variety of data-driven use cases and applications.
PrestoDB resembles a massively parallel processing (MPP) system: it separates storage from compute and lets you scale computing power horizontally by adding more servers. This works well with existing data storage systems because it does not require migrating data out of the sources to run queries on them. PrestoDB is written in Java and uses the ANSI SQL specification for its queries, supporting most features of this dialect.
In PrestoDB, you have one coordinator node and multiple worker nodes. In the case of single-node PrestoDB setups, the single node is both coordinator and worker. The coordinator node is responsible for the whole orchestration by accepting, parsing, planning, and optimizing queries. Worker nodes are responsible only for the query processing. The workers and coordinator communicate via an HTTP-based protocol on a single port.
Additionally, PrestoDB uses a discovery service that is running on the coordinator, where each worker can register and periodically send its heartbeat. This runs on the same HTTP server — including the same port.
In the following figure, you can see the full architecture in a simplified manner.
Figure 1: PrestoDB architecture
Image source: Facebook Research, “Presto: SQL on Everything”
More on PrestoDB architecture: https://research.fb.com/publications/presto-sql-on-everything/
PrestoDB uses connectors to connect to the various data sources. Each connector needs to implement four service provider interfaces (SPIs), which cover metadata, data location, data sources, and data sinks (as described in the Presto paper).
These SPIs provide the interface with which PrestoDB can communicate and query data on the various data sources. The data sources are separated into connector, catalog, schema, and table concepts with the following description:
Connector | Adapts PrestoDB to a data source such as Hive, PostgreSQL, or any other available data source. |
Catalog | Contains schemas and references to a data source via a connector. |
Schema | Organizes a set of tables. In RDBMSs like PostgreSQL and MySQL, this translates to the same concept in the database. |
Table | Collection of data in terms of rows, columns, and the associated data types. |
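These concepts combine into fully qualified table names of the form catalog.schema.table. As a minimal illustration, assuming the tpch catalog that is configured later in this Refcard, the following query reads a few rows from the customer table in its tiny schema:

SELECT name, mktsegment
FROM tpch.tiny.customer
LIMIT 5;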
When PrestoDB executes SQL statements, they are turned into queries and then planned and executed with its query execution model using the following terminology:
Statement | Statements are defined in the ANSI SQL standard, consisting of clauses, expressions, and predicates. |
Query | The previous SQL statement is parsed into a query and creates a distributed query plan consisting of a series of interconnected stages that contain all of the below elements. |
Stage | The execution is structured in a hierarchy of stages that resembles a tree. They model the distributed query plan but are not executed by the worker nodes. |
Task | Each stage consists of a series of tasks that are distributed over the PrestoDB worker nodes. Tasks contain one or more parallel drivers. |
Split | Tasks operate on splits, which are sections of a larger data set. |
Driver | Drivers work with the data and combine operators to produce output that is aggregated by a task and delivered to another task in another stage. Each driver has one input and one output. |
Operator | An operator consumes, transforms, and produces data. |
Exchange | Exchanges transfer data between PrestoDB nodes for different stages in a query. |
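To see how a statement is turned into such a hierarchy of stages, you can ask PrestoDB for its distributed plan. Below is a small sketch, assuming the tpch catalog configured later in this Refcard; each fragment in the output corresponds to a stage:

EXPLAIN (TYPE DISTRIBUTED)
SELECT mktsegment, count(*)
FROM tpch.tiny.customer
GROUP BY mktsegment;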
More on PrestoDB concepts: https://prestodb.io/docs/current/overview/concepts.html
To run PrestoDB, make sure to fulfill the following prerequisites for your system:
Current Release | 0.256 |
Downloads Page | prestodb.io/download |
JDK Version | 1.8 or higher |
Python | 2.6/3.5 or higher |
Development Version | github.com/prestodb/presto |
Make sure to have Java 8 or 11 installed and make python execute python3 for the PrestoDB launcher:
sudo apt update
sudo apt install \
    python-is-python3 \
    openjdk-8-jdk-headless \
    openjdk-8-jre-headless \
    openjdk-8-jre
Download and extract the latest PrestoDB version:
wget "https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.256/presto-server-0.256.tar.gz"
tar -xzvf presto-server-0.256.tar.gz
sudo mv presto-server-0.256 /opt/presto-server
sudo chown -R $USER:$USER /opt/presto-server
Add the following lines to the .bashrc file:

export PRESTO_HOME=/opt/presto-server
export PATH=$PATH:$PRESTO_HOME/bin

Then, run source ~/.bashrc for the changes to take effect.
PrestoDB can run either as a single-node cluster or as a cluster with multiple nodes. For a single-node cluster, the coordinator and worker run on the same machine/instance. First, you need to create all of the configuration files for PrestoDB by creating the following folder structure:

mkdir -p /opt/presto-server/etc/catalog
The required files for a minimal setup:
etc/node.properties
etc/config.properties
etc/jvm.config
etc/log.properties
Let's have a look at each of the files above. Starting with etc/node.properties, this is its minimal configuration:

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/usr/local/presto/data
plugin.dir=/opt/presto-server/plugin
Next, the configuration properties for etc/config.properties:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
The Java settings and flags can be configured in etc/jvm.config with the following minimal configuration:

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
Finally, the logging configuration can be set in etc/log.properties:

com.facebook.presto=DEBUG

The available log levels are DEBUG, INFO, WARN, and ERROR.
For a cluster with multiple nodes, the difference lies in etc/config.properties, which is where you define the coordinator and the discovery URL for the nodes. Additionally, you need to specify a different node.id for each server in etc/node.properties. The configuration in etc/config.properties for the coordinator node:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://presto-coordinator:8080
The configuration for the worker nodes:

coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://presto-coordinator:8080
Troubleshooting:
Make sure that port 8080, or whatever port you have specified for http-server.http.port, is reachable from each server to the coordinator node.
The launcher needs python. Make sure to have python-is-python3 installed (sudo apt install python-is-python3).
Additional guidance: https://prestodb.io/docs/current/installation/deployment.html
For tuning your PrestoDB deployment, review the various properties: https://prestodb.io/docs/current/admin/properties.html
To start PrestoDB, run launcher start on each server. Below is a list of commands for the launcher:
launcher start | Start PrestoDB and return PID of running PrestoDB instance |
launcher status | Check status of running PrestoDB and return PID of running PrestoDB instance |
launcher stop | Stop PrestoDB |
Once it is running, you can open http://localhost:8080 and see the cluster overview UI.
Figure 2: Presto UI
The log files can be found in data/var/log/, which contains:

data/var/log/launcher.log – Contains the stdout and stderr streams of the server during initialization of the launcher.
data/var/log/server.log – Main log file with the relevant information if the server fails during initialization. It is automatically rotated and compressed.
data/var/log/http-request.log – HTTP request log containing all HTTP requests received by the server. It is also automatically rotated and compressed.

Download the PrestoDB CLI and make it executable:
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.256/presto-cli-0.256-executable.jar
mv presto-cli-0.256-executable.jar /opt/presto-server/bin/presto
chmod +x /opt/presto-server/bin/presto
Connect to PrestoDB:

presto \
  --server localhost:8080 \
  --catalog hive \
  --schema default
It is also possible to just run presto with no further settings specified. This connects to localhost:8080 with no schema or catalog specified.
More on the CLI: https://prestodb.io/docs/current/installation/cli.html
The PrestoDB UI offers a large set of functionalities and insights to explore and further optimize queries. Previously, you saw the cluster-level details view, including the query list. Each query can have one of the following query states:
QUEUED | Query has been accepted and is awaiting execution. |
PLANNING | Query is being planned. |
STARTING | Query execution is being started. |
RUNNING | Query has at least one running task. |
BLOCKED | Query is blocked and is waiting for resources. |
FINISHING | Query is finishing. |
FINISHED | Query has finished executing and all output has been consumed. |
FAILED | Query execution failed. |
You can open further details on each query by clicking the top left corner of each query item, which opens the Query Details view.
Figure 3: PrestoDB Query Details
This view includes the following sections:
Next, you can look at the Live Plan view, which shows the stages involved in the query.
Figure 4: PrestoDB Live Plan
You can further drill down into each stage by exploring the Stage Performance tab for each stage.
Figure 5: PrestoDB Stage Performance
You can also see the parallelism and how the execution was done over time by checking the Splits tab.
Figure 6: PrestoDB Splits
Finally, the JSON tab will return all of the query details in JSON format. This is handy for automated evaluation of queries.
To follow along with a PrestoDB instance and a local S3 storage running on Docker, feel free to use https://github.com/njanakiev/presto-minio-docker, which covers a standalone Docker setup.
PrestoDB uses a variety of connectors to access data from different data sources. In this Refcard, you will see how to access data on HDFS, S3, and PostgreSQL. More on other connectors: https://prestodb.io/docs/current/connector.html.
TPCH (http://www.tpc.org/tpch/) and TPCDS (http://www.tpc.org/tpcds/) connectors provide a set of schemas for the TPC Benchmark™ to measure the performance of complex decision support databases. They are generally popular when running benchmarks to evaluate big data systems.
To add the tpcds connector, create the file etc/catalog/tpcds.properties:

connector.name=tpcds

For the tpch connector, create the file etc/catalog/tpch.properties:

connector.name=tpch
Those connectors generate the data automatically and offer various schemas with different numbers of rows for each table, which makes them great for testing queries. TPCH will be used in some of the example queries later in the Refcard.
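As a quick smoke test once the catalogs are in place, you can list the generated schemas and run a small aggregation; a minimal sketch against the tiny schema of the tpch catalog:

SHOW SCHEMAS FROM tpch;

SELECT orderpriority, count(*) AS num_orders
FROM tpch.tiny.orders
GROUP BY orderpriority;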
More on these connectors: https://prestodb.io/docs/current/connector/tpch.html and https://prestodb.io/docs/current/connector/tpcds.html
For the Hive Connector, you need a Hive metastore running to connect either to Hadoop HDFS or an S3 storage, which is beyond the scope of this Refcard (see Additional Resources to learn more). When using HDFS, you need to configure tables with Hive. With S3, you can also specify tables for certain file types like Parquet with PrestoDB alone without creating an additional table in Hive.
The following file types are supported for the Hive Connector: ORC, Parquet, Avro, RCFile, SequenceFile, JSON, CSV, and text files.
Create the file etc/catalog/hive.properties:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://HIVE_METASTORE_ENDPOINT:9083
hive.s3.path-style-access=true
hive.s3.endpoint=S3_ENDPOINT
hive.s3.aws-access-key=S3_ACCESS_KEY
hive.s3.aws-secret-key=S3_SECRET_KEY
hive.non-managed-table-writes-enabled=true
hive.s3select-pushdown.enabled=true
hive.storage-format=PARQUET
Replace S3_ENDPOINT, S3_ACCESS_KEY, and S3_SECRET_KEY with the values you specified in the metastore-site.xml or hive-site.xml configuration, as they must be identical. Then, change HIVE_METASTORE_ENDPOINT to the endpoint of the Hive metastore.
Enable S3 Select Pushdown to allow predicate pushdown by adding the property hive.s3select-pushdown.enabled=true. If you are using multiple Hive clusters, you can add those by creating a new configuration file in etc/catalog for each endpoint. This enables you to mix HDFS, AWS Glue, other S3 external object stores, and even local S3 object stores.
Create a schema pointing to a specific bucket:

CREATE SCHEMA hive.data
WITH (location = 's3://data/')

Note that this bucket must exist beforehand. You can create it in the S3 UI or with a tool like s3cmd (https://s3tools.org/s3cmd) by running:

s3cmd mb s3://data
Now, create a new Hive table:

CREATE TABLE hive.data.example_data (
  created_at timestamp,
  user_id bigint,
  name varchar,
  country varchar
)
WITH (
  format = 'PARQUET'
);
Query the table:

SELECT created_at, user_id
FROM hive.data.example_data
LIMIT 10;
Drop the metadata of the table (the data itself is not deleted):

DROP TABLE hive.data.example_data;
Drop a schema:

DROP SCHEMA hive.data;
More on configuration: https://prestodb.io/docs/current/connector/hive.html
PostgreSQL is a popular and highly advanced, open-source RDBMS that can often be suitable for various use cases. This PrestoDB connector allows you to query and create tables in an external PostgreSQL database.
The configuration for this connector must be added to the file etc/catalog/postgresql.properties:

connector.name=postgresql
connection-url=jdbc:postgresql://postgres-endpoint:5432/database
connection-user=username
connection-password=password
To see all schemas in PostgreSQL:

SHOW SCHEMAS FROM postgresql;
To see all tables in a schema in PostgreSQL:

SHOW TABLES FROM postgresql.public;
Further, to describe the columns in a table:

DESCRIBE postgresql.public.table_name;
To query this table:

SELECT column_1, column_2
FROM postgresql.public.table_name
WHERE column_1 IS NOT NULL
LIMIT 10;
An example of how to create a table from another table in PostgreSQL:

CREATE TABLE postgresql.public.item AS
SELECT i_item_id, i_item_desc
FROM tpcds.tiny.item;
Note that CREATE TABLE by itself, without AS, is not supported. The same goes for the following SQL statements: DELETE, ALTER TABLE, GRANT/REVOKE, and SHOW GRANTS/SHOW ROLES/SHOW ROLE GRANTS.
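Appending rows to an existing PostgreSQL table through the connector generally works with INSERT INTO ... SELECT. The sketch below assumes the item table created above and should be checked against the connector documentation for your PrestoDB version:

INSERT INTO postgresql.public.item
SELECT i_item_id, i_item_desc
FROM tpcds.tiny.item;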
More information: https://prestodb.io/docs/current/connector/postgresql.html.
Explore catalogs, schemas, and tables with:
SHOW CATALOGS; | List all catalogs |
SHOW SCHEMAS IN catalog_name; | List all schemas in a catalog |
SHOW TABLES IN catalog_name.schema_name; | List all tables in a schema |
DESCRIBE catalog_name.schema_name.table_name; | List the columns in a table along with their data type and other attributes (alias for SHOW COLUMNS FROM table) |
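For instance, with the tpch catalog from earlier, a quick exploration session could look like this:

SHOW CATALOGS;
SHOW SCHEMAS IN tpch;
SHOW TABLES IN tpch.tiny;
DESCRIBE tpch.tiny.customer;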
PrestoDB offers a wide variety of functions and operators. You can apply logical operators with AND, OR, and NOT, as well as comparisons with <, >, and =, among others. There is a large set of commonly used mathematical functions and operators like abs(x), ceiling(x), floor(x), sqrt(x), sin(x), cos(x), tan(x), and random(), among many others. For more functions: https://prestodb.io/docs/current/functions/math.html.
The same goes for common string and date functions and operators that you might want to use. Find the full list at https://prestodb.io/docs/current/functions/string.html and https://prestodb.io/docs/current/functions/datetime.html, respectively.
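As a small sketch of how such scalar functions can be combined (using the TPCH customer table from this Refcard; the column aliases and formatting are illustrative only), the following query transforms a string, a numeric, and a date value in one statement:

SELECT
  upper(name) AS name_upper,
  round(abs(acctbal), 1) AS abs_balance,
  date_format(now(), '%Y-%m-%d') AS query_date
FROM tpch.tiny.customer
LIMIT 5;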
When working with analytical queries, a common use case is to run aggregations over groups or whole columns. PrestoDB has many such functions that cover almost any use case. The following simple example returns the average account balance for each market segment (mktsegment) in the TPCH data set:

SELECT
  mktsegment, avg(acctbal)
FROM
  tpch.tiny.customer
GROUP BY
  mktsegment
ORDER BY 2 DESC;
Common aggregation functions are count(), avg(), sum(x), min(x), max(x), and stddev(x), among many more advanced aggregations and statistical functions. For more: https://prestodb.io/docs/current/functions/aggregate.html
PrestoDB also offers the well-known functionality for window functions using the OVER clause. A window function uses values from one or multiple rows in a table to return a value for each row. A common example query for a rolling average over seven days:

SELECT
  orderdate,
  avg(totalprice) OVER (ORDER BY orderdate ASC ROWS 7 PRECEDING) AS rolling_average
FROM
  tpch.tiny.orders;
Another example is to calculate a sum over order priority for each row:

SELECT
  orderdate,
  totalprice,
  orderpriority,
  sum(totalprice) OVER (PARTITION BY orderpriority)
    AS total_price_per_priority
FROM
  tpch.tiny.orders
ORDER BY 1 ASC;
Read more about window functions: https://prestodb.io/docs/current/functions/window.html
Also, PrestoDB includes a large set of geospatial functions and aggregations that should be familiar to geospatial professionals coming from PostgreSQL and its PostGIS extension. An example of how to aggregate data for a specific region:

SELECT
  points.event_code,
  COUNT(points.event_id) AS cnt
FROM
  events AS points,
  natural_earth AS countries
WHERE
  countries.iso_a2 = 'DE'
  AND ST_Contains(
    ST_GeomFromBinary(countries.geometry),
    ST_Point(points.lon, points.lat))
GROUP BY
  points.event_code;
A full list for the geospatial functionality: https://prestodb.io/docs/current/functions/geospatial.html
PrestoDB is quite extensive in its functionalities, and this section should serve as a quick overview of what is available. Other functions and operators include regular expressions, JSON functions, URL functions, array and map functions, lambda expressions, and conditional expressions.
The complete list of functions and operators: https://prestodb.io/docs/current/functions.html
For your query to be executed on the various data sources, it takes a few steps to get from the initial SQL statement to the resulting query plan, which is where the query optimizer comes into play. After parsing the SQL statement into a syntax tree and later into a logical plan, the query optimizer takes care of creating an efficient execution strategy, chosen among many possible strategies.
PrestoDB uses two optimizers. The Rule-Based Optimizer (RBO) applies filters to prune irrelevant data and uses hash joins to avoid full Cartesian joins; this includes strategies such as predicate pushdown, limit pushdown, column pruning, and decorrelation. The Cost-Based Optimizer (CBO) then continues from the previous optimization and uses statistics of the table (e.g., number of distinct values, number of null values, distributions of column data) to optimize queries and reduce I/O and network overhead.
You can see available statistics in your tables using these commands:
SHOW STATS FOR table_name; | Approximated statistics for the named table |
SHOW STATS FOR ( SELECT ... ); | Approximated statistics for the query result |
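For example, against the tpch catalog used earlier, you can inspect statistics for a whole table or for a filtered subset (the returned columns depend on the connector):

SHOW STATS FOR tpch.tiny.customer;

SHOW STATS FOR (SELECT * FROM tpch.tiny.customer WHERE mktsegment = 'BUILDING');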
To see the cost-based analysis of a query, you can use the EXPLAIN and the EXPLAIN ANALYZE keywords:
EXPLAIN [VERBOSE] SELECT ... | Show the distributed execution plan of a statement with the estimated cost of each operation (the statement is not executed). |
EXPLAIN ANALYZE [VERBOSE] SELECT ... | Execute the statement and show the distributed execution plan with the cost and duration of each operation. |
An example of using EXPLAIN on a simple SELECT statement with the LIMIT clause, showing the calculated cost and expected number of rows:

presto> EXPLAIN SELECT mktsegment, acctbal FROM tpch.tiny.customer LIMIT 5;
- Output[mktsegment, acctbal] => [mktsegment:varchar(10), acctbal:double]
Estimates: {rows: 5 (70B), cpu: 21185.30, memory: 0.00, network: 70.15}
- Limit[5] => [acctbal:double, mktsegment:varchar(10)]
Estimates: {rows: 5 (70B), cpu: 21185.30, memory: 0.00, network: 70.15}
- LocalExchange[SINGLE] () => [acctbal:double, mktsegment:varchar(10)]
Estimates: {rows: 5 (70B), cpu: 21115.15, memory: 0.00, network: 70.15}
- RemoteStreamingExchange[GATHER] => [acctbal:double, mktsegment:varchar(10)]
Estimates: {rows: 5 (70B), cpu: 21115.15, memory: 0.00, network: 70.15}
- LimitPartial[5] => [acctbal:double, mktsegment:varchar(10)]
Estimates: {rows: 5 (70B), cpu: 21115.15, memory: 0.00, network: 0.00}
- TableScan[TableHandle {connectorId='tpch', connectorHandle='customer:sf0.01',
layout='Optional[customer:sf0.01]'}] => [acctbal:double, mktsegment:varchar(10)]
Estimates: {rows: 1500 (20.55kB), cpu: 21045.00, memory: 0.00, network: 0.00}
acctbal := tpch:acctbal
mktsegment := tpch:mktsegment
The same query with EXPLAIN ANALYZE, showing the distributed execution plan, including the duration and cost for each stage:

presto> EXPLAIN ANALYZE SELECT mktsegment, acctbal FROM tpch.tiny.customer LIMIT 5;
Fragment 1 [SINGLE]
CPU: 2.69ms, Scheduled: 13.73ms, Input: 20 rows (461B); per task: avg.: 20.00 std.dev.: 0.00, Output: 5 rows (116B)
Output layout: [acctbal, mktsegment]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
- Limit[5] => [acctbal:double, mktsegment:varchar(10)]
CPU: 0.00ns (0.00%), Scheduled: 11.00ms (5.39%), Output: 5 rows (116B)
Input avg.: 5.00 rows, Input std.dev.: 0.00%
- LocalExchange[SINGLE] () => [acctbal:double, mktsegment:varchar(10)]
CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 5 rows (116B)
Input avg.: 1.25 rows, Input std.dev.: 387.30%
- RemoteSource[2] => [acctbal:double, mktsegment:varchar(10)]
CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 20 rows (461B)
Input avg.: 1.25 rows, Input std.dev.: 387.30%
Fragment 2 [SOURCE]
CPU: 20.40ms, Scheduled: 297.41ms, Input: 1500 rows (0B); per task: avg.: 1500.00 std.dev.: 0.00, Output: 20 rows (461B)
Output layout: [acctbal, mktsegment]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
- LimitPartial[5] => [acctbal:double, mktsegment:varchar(10)]
CPU: 1.00ms (5.26%), Scheduled: 160.00ms (78.43%), Output: 20 rows (461B)
Input avg.: 375.00 rows, Input std.dev.: 0.00%
- TableScan[TableHandle {connectorId='tpch', connectorHandle='customer:sf0.01',
layout='Optional[customer:sf0.01]'}, grouped = false] => [acctbal:double, mktsegment:varchar(10)]
CPU: 18.00ms (94.74%), Scheduled: 33.00ms (16.18%), Output: 1500 rows (33.66kB)
Input avg.: 375.00 rows, Input std.dev.: 0.00%
acctbal := tpch:acctbal
mktsegment := tpch:mktsegment
Input: 1500 rows (0B), Filtered: 0.00%
You can also add the VERBOSE option to get more detailed information and low-level statistics. For more on cost in EXPLAIN and cost-based optimizations, visit https://prestodb.io/docs/current/optimizer/cost-in-explain.html and https://prestodb.io/docs/current/optimizer/cost-based-optimizations.html, respectively.
PrestoDB is a powerful federated query engine that can bridge multiple diverse data sources and gain timely insights in an optimized way. Many companies have adopted PrestoDB in their data operations. In this Refcard, you learned about PrestoDB architecture, setup on single- and multi-node machines, and the various functionality and optimizations available. For more resources, see the following links and those throughout this Refcard.
Source: https://dzone.com/refcardz/getting-started-with-prestodb