Getting Started With PrestoDB - International Burch University

Getting Started With PrestoDB

Presto is an open-source distributed SQL (on everything) query engine that is designed and built from the ground up for running interactive analytic queries against a large number of data sources. PrestoDB’s ability to query data at its source — across multiple sources — in a federated way, plus its in-memory pipelining architecture, make for greater scalability, speed, and performance.

This Refcard provides everything to get you up and running with PrestoDB: You will learn about concepts and terminology; installation, setup, and launch; key techniques (with examples); and more!

About PrestoDB

With a diverse and ever-expanding set of data sources across a data warehouse or data lake, it can become difficult to derive useful insights in a timely manner. PrestoDB is a distributed SQL query engine for big data that aims to tackle these problems in a federated way. It offers connectors to many different data sources, performs its own query optimization, and can be scaled out to any number of machines to distribute load.

PrestoDB was born out of Facebook in 2012 after engineers struggled with slow queries running on Apache Hive. The problem with Hive was that it stored the intermediate results of its MapReduce queries on disk, which caused significant disk I/O overhead. PrestoDB, with its new architecture and in-memory engine, significantly reduced query latencies, which in turn allows for much more interactive queries. In 2015, Netflix showed that PrestoDB was about 10 times faster than Hive, and even faster in some cases.

The use cases for PrestoDB range from interactive ad hoc queries to long-running batch ETL pipelines, making it flexible for a large variety of data-driven use cases and applications.

PrestoDB Architecture

PrestoDB resembles a massively parallel processing (MPP) system: It separates storage from compute and lets you scale computing power horizontally by adding more servers. This works well with existing data storage systems and does not require migrating data to run queries on it. PrestoDB is written in Java and implements most features of the ANSI SQL specification for running queries.

In PrestoDB, you have one coordinator node and multiple worker nodes. In the case of single-node PrestoDB setups, the single node is both coordinator and worker. The coordinator node is responsible for the whole orchestration by accepting, parsing, planning, and optimizing queries. Worker nodes are responsible only for the query processing. The workers and coordinator communicate via an HTTP-based protocol on a single port.

Additionally, PrestoDB uses a discovery service that is running on the coordinator, where each worker can register and periodically send its heartbeat. This runs on the same HTTP server — including the same port. 

In the following figure, you can see the full architecture in a simplified manner.

Figure 1: PrestoDB architecture


PrestoDB Concepts

PrestoDB uses connectors to connect to the various data sources. Each connector needs to implement four service provider interfaces (SPIs):

  • Metadata SPI
  • Data Location SPI
  • Data Statistics SPI
  • Data Source SPI

These SPIs provide the interface through which PrestoDB can communicate with and query data on the various data sources. The data sources are organized into connector, catalog, schema, and table concepts with the following descriptions:

Connector – Implementation of PrestoDB's interface to a data source such as Hive, PostgreSQL, or any other available data source.
Catalog – Contains schemas and references a data source via a connector.
Schema – Organizes a set of tables. In RDBMSs like PostgreSQL and MySQL, this translates to the same concept in the database.
Table – Collection of data in terms of rows, columns, and the associated data types.

When PrestoDB executes SQL statements, they are turned into queries and then planned and executed with its query execution model using the following terminology:

Statement – Statements are defined in the ANSI SQL standard and consist of clauses, expressions, and predicates.
Query – A SQL statement is parsed into a query, which creates a distributed query plan consisting of a series of interconnected stages that contain all of the elements below.
Stage – The execution is structured in a hierarchy of stages that resembles a tree. Stages model the distributed query plan but are not themselves executed by the worker nodes.
Task – Each stage consists of a series of tasks that are distributed over the PrestoDB worker nodes. Tasks contain one or more parallel drivers.
Split – Tasks operate on splits, which are sections of a larger data set.
Driver – Drivers work on the data and combine operators to produce output that is aggregated by a task and delivered to a task in another stage. Each driver has one input and one output.
Operator – An operator consumes, transforms, and produces data.
Exchange – Exchanges transfer data between PrestoDB nodes for the different stages of a query.
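As a loose analogy in plain Python (not PrestoDB code), the way a query fans out over splits and parallel tasks before a final stage gathers the partial results can be sketched like this:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "table": one numeric column with 100 rows.
data = list(range(1, 101))

# The scheduler cuts the data set into splits...
splits = [data[i:i + 25] for i in range(0, len(data), 25)]

# ...and tasks on the workers aggregate their split in parallel.
def task(split):
    return sum(split)

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(task, splits))

# A final stage gathers the partial results into the query output.
total = sum(partials)
print(total)  # 5050, the same as aggregating the whole table at once
```

The real engine streams data between stages through exchanges rather than collecting lists, but the divide-aggregate-gather shape is the same.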

Getting Started With PrestoDB

Installation and Configuration

To run PrestoDB, make sure to fulfill the following prerequisites for your system:

Current Release – 0.256
Downloads Page – prestodb.io/download
JDK Version – 1.8 or higher
Python Version – 2.6/3.5 or higher
Development Version – github.com/prestodb/presto

Make sure to have Java 8 or 11 installed and that python executes python3 for the PrestoDB launcher:

sudo apt update
sudo apt install \
    python-is-python3 \
    openjdk-8-jdk-headless \
    openjdk-8-jre-headless \
    openjdk-8-jre

Download and extract the latest PrestoDB version:

wget "https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.256/presto-server-0.256.tar.gz"
tar -xzvf presto-server-0.256.tar.gz
sudo mv presto-server-0.256 /opt/presto-server
sudo chown -R $USER:$USER /opt/presto-server

Add the following lines to the .bashrc file:

export PRESTO_HOME=/opt/presto-server
export PATH=$PATH:$PRESTO_HOME/bin

Then, run source ~/.bashrc for the changes to take effect. 

Single Node Configuration

PrestoDB can run either as a single-node cluster or as a cluster with multiple nodes. For a single-node cluster, the coordinator and worker run on the same machine/instance. First, create the configuration files for PrestoDB by creating the following folder structure:

mkdir -p /opt/presto-server/etc/catalog

The required files for a minimal setup:

  • etc/node.properties
  • etc/config.properties
  • etc/jvm.config
  • etc/log.properties

Let’s have a look at each of the files above. Starting with etc/node.properties, this is its minimal configuration:

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/usr/local/presto/data
plugin.dir=/opt/presto-server/plugin

Next, the configuration properties for etc/config.properties:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080

The Java settings and flags can be configured in etc/jvm.config with the following minimal configuration:

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

Finally, the logging configuration can be set in etc/log.properties:

com.facebook.presto=DEBUG

The available log levels are DEBUG, INFO, WARN, and ERROR.

Multiple Nodes Configuration

The difference in a multi-node setup lies in etc/config.properties, which is where you define the coordinator and the discovery URI for the nodes. Additionally, you need to specify a different node.id on each server in etc/node.properties. The configuration in etc/config.properties for the coordinator node:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://presto-master:8080

The configuration for the worker nodes:

coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://presto-master:8080

Troubleshooting:

  • Check that each server is reachable from the other servers. The most reliable method is to list the servers with their IPs in the /etc/hosts file.
  • Check that the firewall is configured properly. Port 8080, or whatever port you specified for http-server.http.port, must be reachable on the coordinator node from each server.
  • The launcher requires Python 2.6+ or 3.5+ and invokes it as python. Make sure to have python-is-python3 installed (sudo apt install python-is-python3).

Additional guidance: https://prestodb.io/docs/current/installation/deployment.html

For tuning your PrestoDB deployment, review the various properties: https://prestodb.io/docs/current/admin/properties.html

Launch PrestoDB

To start PrestoDB, run launcher start on each server. Below is a list of commands for the launcher:

launcher start – Start PrestoDB and return the PID of the running PrestoDB instance
launcher status – Check the status of PrestoDB and return the PID of the running instance
launcher stop – Stop PrestoDB

Once it is running, you can open http://localhost:8080 and see the cluster overview UI.

Figure 2: Presto UI


The log files can be found in data/var/log/:

  • data/var/log/launcher.log – Contains stdout and stderr streams of the server during initialization of the launcher.
  • data/var/log/server.log – Main log file with the relevant information if the server fails during initialization. It is automatically rotated and compressed.
  • data/var/log/http-request.log – HTTP request log containing all HTTP requests received by the server. It is also automatically rotated and compressed.

PrestoDB Command Line Interface

Download the PrestoDB CLI and make it executable:

wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.256/presto-cli-0.256-executable.jar
mv presto-cli-0.256-executable.jar /opt/presto-server/bin/presto
chmod +x /opt/presto-server/bin/presto

Connect to PrestoDB:

presto \
    --server localhost:8080 \
    --catalog hive \
    --schema default

It is also possible to run presto with no further settings specified; this connects to localhost:8080 with no catalog or schema selected.

More on the CLI: https://prestodb.io/docs/current/installation/cli.html

PrestoDB UI

The PrestoDB UI offers a large set of functionalities and insights to explore and further optimize queries. Previously, you saw the cluster-level details view, including the query list. Each query can have one of the following query states:

QUEUED – Query has been accepted and is awaiting execution.
PLANNING – Query is being planned.
STARTING – Query execution is being started.
RUNNING – Query has at least one running task.
BLOCKED – Query is blocked while waiting for resources.
FINISHING – Query is finishing.
FINISHED – Query has finished executing and all output has been consumed.
FAILED – Query execution failed.

You can open further details on each query by clicking at the top left corner of each query item, which opens the Query Details view.

Figure 3: PrestoDB Query Details


This view includes the following sections:

  • Session
  • Execution
  • Resource Utilization Summary
  • Timeline
  • Query
  • Prepare Query
  • Stages
  • Tasks

Next, you can look at the Live Plan view, which shows the stages involved in the query.

Figure 4: PrestoDB Live Plan

You can further drill down into each stage by exploring the Stage Performance tab for each stage. 

Figure 5: PrestoDB Stage Performance


You can also see the parallelism and how the execution was done over time by checking the Splits tab.

Figure 6: PrestoDB Splits


Finally, the JSON tab returns all of the query details as JSON, which is handy for the automated evaluation of queries.

Key Techniques in PrestoDB

To follow along with a PrestoDB instance and a local S3 storage running in Docker, feel free to use https://github.com/njanakiev/presto-minio-docker, which covers a standalone Docker setup.

Connectors

PrestoDB uses a variety of connectors to access data from different data sources. In this Refcard, you will see how to access data on HDFS, S3, and PostgreSQL. More on other connectors: https://prestodb.io/docs/current/connector.html.

TPC-H and TPC-DS Connectors

The TPC-H (http://www.tpc.org/tpch/) and TPC-DS (http://www.tpc.org/tpcds/) connectors provide sets of schemas for the TPC Benchmark™ suites, which measure the performance of complex decision-support systems. They are popular for running benchmarks to evaluate big data systems.

To add the tpcds connector, create the file etc/catalog/tpcds.properties:

connector.name=tpcds

For the tpch connector, create the file etc/catalog/tpch.properties:

connector.name=tpch

These connectors generate their data on the fly and offer several schemas with different numbers of rows per table, which makes them great for testing queries. TPC-H will be used in some of the example queries later in this Refcard.

More on these connectors: https://prestodb.io/docs/current/connector/tpch.html and https://prestodb.io/docs/current/connector/tpcds.html

Hive Connector

For the Hive Connector, you need a Hive metastore running to connect either to Hadoop HDFS or an S3 storage, which is beyond the scope of this Refcard (see Additional Resources to learn more). When using HDFS, you need to configure tables with Hive. With S3, you can also specify tables for certain file types like Parquet with PrestoDB alone without creating an additional table in Hive.

The following file types are supported for the Hive Connector:

  • ORC
  • Parquet
  • Avro
  • RCFile
  • SequenceFile
  • JSON
  • Text

Create the file etc/catalog/hive.properties:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://HIVE_METASTORE_ENDPOINT:9083
hive.s3.path-style-access=true
hive.s3.endpoint=S3_ENDPOINT
hive.s3.aws-access-key=S3_ACCESS_KEY
hive.s3.aws-secret-key=S3_SECRET_KEY
hive.non-managed-table-writes-enabled=true
hive.s3select-pushdown.enabled=true
hive.storage-format=PARQUET

Replace S3_ENDPOINT, S3_ACCESS_KEY, and S3_SECRET_KEY with the values you specified in the metastore-site.xml or hive-site.xml configuration, as they must be identical. Then, change HIVE_METASTORE_ENDPOINT to the endpoint of the Hive metastore.

Enable S3 Select Pushdown to allow predicate pushdown by adding the property, hive.s3select-pushdown.enabled=true. If you are using multiple Hive clusters, you can add those by creating new configuration files in etc/catalog for each endpoint. This enables you to mix HDFS, AWS Glue, other S3 external object stores, and even local S3 object stores.
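As an illustration of adding another catalog (the file name, hostnames, and credentials here are hypothetical), a second S3-compatible object store could be registered as etc/catalog/minio.properties:

```properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://minio-metastore:9083
hive.s3.path-style-access=true
hive.s3.endpoint=http://minio:9000
hive.s3.aws-access-key=MINIO_ACCESS_KEY
hive.s3.aws-secret-key=MINIO_SECRET_KEY
```

Queries can then address this catalog by its file name, e.g., SHOW SCHEMAS FROM minio;.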

Create a schema for a specific bucket:

CREATE SCHEMA hive.data
WITH (location = 's3://data/')

Note that this bucket must exist beforehand. You can create it in the S3 UI or with a tool like s3cmd (https://s3tools.org/s3cmd) by running:

s3cmd mb s3://data

Now, create a new Hive table:

CREATE TABLE hive.data.example_data (
  created_at timestamp,
  user_id bigint,
  name varchar,
  country varchar
)
WITH (
  format = 'PARQUET'
);

Query the table:

SELECT created_at, user_id
FROM hive.data.example_data
LIMIT 10;

Drop the metadata of the table (the data itself is not deleted):

DROP TABLE hive.data.example_data;

Drop a schema:

DROP SCHEMA hive.data;

More on configuration: https://prestodb.io/docs/current/connector/hive.html

PostgreSQL Connector

PostgreSQL is a popular and highly advanced, open-source RDBMS that can often be suitable for various use cases. This PrestoDB connector allows you to query and create tables in an external PostgreSQL database.

The configuration for this connector must be added to the file etc/catalog/postgresql.properties:

connector.name=postgresql
connection-url=jdbc:postgresql://postgres-endpoint:5432/database
connection-user=username
connection-password=password

To see all schemas in PostgreSQL:

SHOW SCHEMAS FROM postgresql;

To see all tables in a schema in PostgreSQL:

SHOW TABLES FROM postgresql.public;

Further, to describe the columns in a table:

DESCRIBE postgresql.public.table_name;

To query this table:

SELECT column_1, column_2
FROM postgresql.public.table_name
WHERE column_1 IS NOT NULL
LIMIT 10;

An example of how to create a table from another table in PostgreSQL:

CREATE TABLE postgresql.public.item AS
  SELECT i_item_id, i_item_desc
  FROM tpcds.tiny.item;

Note that CREATE TABLE on its own, without AS, is not supported. The same applies to these SQL statements:

  • DELETE
  • ALTER TABLE
  • GRANT/REVOKE
  • SHOW GRANTS/SHOW ROLES/SHOW ROLE GRANTS

More information: https://prestodb.io/docs/current/connector/postgresql.html.

Functions and Operators

Explore catalogs, schemas, and tables with:

SHOW CATALOGS; – List all catalogs
SHOW SCHEMAS IN catalog_name; – List all schemas in a catalog
SHOW TABLES IN catalog_name.schema_name; – List all tables in a schema
DESCRIBE catalog_name.schema_name.table_name; – List the columns in a table along with their data type and other attributes (alias for SHOW COLUMNS FROM table)

PrestoDB offers a wide variety of functions and operators. You can apply logical operators with AND, OR, and NOT, as well as comparisons with <, >, and =, among others. There is a large set of commonly used mathematical functions and operators like abs(x), ceiling(x), floor(x), sqrt(x), sin(x), cos(x), tan(x), and random(), among many others. For more functions: https://prestodb.io/docs/current/functions/math.html.

The same goes for common string and date functions and operators that you might want to use. Find the full list at https://prestodb.io/docs/current/functions/string.html and https://prestodb.io/docs/current/functions/datetime.html, respectively.

When working with analytical queries, a common use case is to run aggregations over groups or whole columns. PrestoDB has many such functions that cover almost any use case. The following simple example returns the average account balance for each market segment in the TPC-H data set:

SELECT
  mktsegment, avg(acctbal)
FROM
  tpch.tiny.customer
GROUP BY
  mktsegment
ORDER BY 2 DESC;

Common aggregation functions are count(), avg(), sum(x), min(x), max(x), and stddev(x), among many more advanced aggregation and statistical functions. For more: https://prestodb.io/docs/current/functions/aggregate.html
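To make the GROUP BY semantics concrete, here is the same per-segment average expressed in plain Python (the sample rows are made up; the column names follow the TPC-H customer table):

```python
from collections import defaultdict

# Hypothetical sample of (mktsegment, acctbal) rows.
customers = [
    ("BUILDING", 711.56), ("AUTOMOBILE", 121.65),
    ("BUILDING", 7498.12), ("MACHINERY", 4681.03),
]

# GROUP BY mktsegment: collect the balances per segment...
balances = defaultdict(list)
for mktsegment, acctbal in customers:
    balances[mktsegment].append(acctbal)

# ...then avg(acctbal) within each group.
averages = {seg: vals and sum(vals) / len(vals) for seg, vals in balances.items()}
print(averages["BUILDING"])  # (711.56 + 7498.12) / 2 = 4104.84
```

In PrestoDB, this grouping happens in partial and final aggregation stages distributed over the workers.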

PrestoDB also offers the well-known functionality of window functions using the OVER clause. A window function uses values from one or multiple rows in a table to return a value for each row. A common example is a rolling average over the current row and the seven preceding rows:

SELECT
  orderdate,
  avg(totalprice) OVER (ORDER BY orderdate ASC ROWS 7 PRECEDING) AS rolling_average
FROM
  tpch.tiny.orders;
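The frame ROWS 7 PRECEDING means each output row averages the current value together with up to seven earlier rows (fewer near the start of the window). A plain-Python sketch of that frame logic, using made-up prices:

```python
# Hypothetical totalprice values, already ordered by orderdate.
prices = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]

rolling = []
for i in range(len(prices)):
    frame = prices[max(0, i - 7): i + 1]  # current row + up to 7 preceding
    rolling.append(sum(frame) / len(frame))

print(rolling[0], rolling[8])  # 10.0 (frame of one row) and 55.0
```

Unlike GROUP BY, every input row produces an output row; only the averaged frame slides.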

Another example is to calculate a sum over the order priority for each row:

SELECT
  orderdate,
  totalprice,
  orderpriority,
  sum(totalprice) OVER (PARTITION BY orderpriority)
    AS total_price_per_priority
FROM
  tpch.tiny.orders
ORDER BY 1 ASC;

Read more about window functions: https://prestodb.io/docs/current/functions/window.html

PrestoDB also includes a large set of geospatial functions and aggregations that will be familiar to geospatial professionals coming from PostgreSQL's PostGIS extension. An example of how to aggregate data for a specific region:

SELECT
  points.event_code,
  COUNT(points.event_id) AS cnt
FROM
  events AS points,
  natural_earth AS countries
WHERE
  countries.iso_a2 = 'DE'
  AND ST_Contains(
    ST_GeomFromBinary(countries.geometry),
    ST_Point(points.lon, points.lat))
GROUP BY
  points.event_code;

A full list for the geospatial functionality: https://prestodb.io/docs/current/functions/geospatial.html

PrestoDB is quite extensive in its functionalities, and this section should serve as a quick overview of what is available. Other functions and operators include:

  • Bitwise Functions
  • Binary Functions and Operators
  • Regular Expression Functions
  • JSON Functions and Operators
  • URL, IP, and HyperLogLog Functions
  • Lambda Expressions
  • Array and Map Functions and Operators

The complete list of functions and operators: https://prestodb.io/docs/current/functions.html

Query Optimization

For your query to be executed on the various data sources, a few steps are required to get from the initial SQL statement to the resulting query plan, and this is where the query optimizer comes into play. After parsing the SQL statement into a syntax tree and then into a logical plan, the query optimizer takes care of creating an efficient execution strategy, chosen among many possible strategies.

PrestoDB uses two optimizers. The rule-based optimizer (RBO) applies filters to prune irrelevant data and uses hash joins to avoid full Cartesian products. This includes strategies such as predicate pushdown, limit pushdown, column pruning, and decorrelation. The cost-based optimizer (CBO) then continues from the previous optimization. It uses statistics of the tables (e.g., number of distinct values, number of null values, distribution of column data) to optimize queries and reduce I/O and network overhead.
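As a conceptual illustration of predicate pushdown (plain Python on made-up rows, not the optimizer itself): applying a filter close to the data source means the join only ever sees rows that can match, shrinking the intermediate results that later stages must process:

```python
# Hypothetical rows: customers as (custkey, country), orders as (custkey, totalprice).
customers = [(i, "DE" if i % 4 == 0 else "US") for i in range(100)]
orders = [(i % 100, i * 10.0) for i in range(400)]

# Pushed-down predicate: filter customers at the "source" first...
german_keys = {key for key, country in customers if country == "DE"}

# ...so the join probes only 25 candidate keys instead of all 100 customers.
joined = [(key, price) for key, price in orders if key in german_keys]

print(len(german_keys), len(joined))  # 25 customers, 100 matching orders
```

Without the pushdown, all 400 order rows would be joined against all 100 customers before the country filter discards three quarters of the work.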

You can see available statistics in your tables using these commands:

SHOW STATS FOR table_name; – Approximated statistics for the named table
SHOW STATS FOR ( SELECT ... ); – Approximated statistics for the query result

To see the cost-based analysis of a query, you can use the EXPLAIN and the EXPLAIN ANALYZE keywords:

EXPLAIN [VERBOSE] SELECT ... – Show the distributed execution plan for the statement with the estimated cost of each operation, without executing it.
EXPLAIN ANALYZE [VERBOSE] SELECT ... – Execute the statement and show the distributed execution plan with the cost and duration of each operation.

An example of using EXPLAIN on a simple SELECT statement with the LIMIT clause, showing the calculated cost and expected number of rows:

presto> EXPLAIN SELECT mktsegment, acctbal FROM tpch.tiny.customer LIMIT 5;
 - Output[mktsegment, acctbal] => [mktsegment:varchar(10), acctbal:double]
         Estimates: {rows: 5 (70B), cpu: 21185.30, memory: 0.00, network: 70.15}
     - Limit[5] => [acctbal:double, mktsegment:varchar(10)]
             Estimates: {rows: 5 (70B), cpu: 21185.30, memory: 0.00, network: 70.15}
         - LocalExchange[SINGLE] () => [acctbal:double, mktsegment:varchar(10)]
                 Estimates: {rows: 5 (70B), cpu: 21115.15, memory: 0.00, network: 70.15}
             - RemoteStreamingExchange[GATHER] => [acctbal:double, mktsegment:varchar(10)]
                      Estimates: {rows: 5 (70B), cpu: 21115.15, memory: 0.00, network: 70.15}
                 - LimitPartial[5] => [acctbal:double, mktsegment:varchar(10)]
                         Estimates: {rows: 5 (70B), cpu: 21115.15, memory: 0.00, network: 0.00}
                     - TableScan[TableHandle {connectorId='tpch', connectorHandle='customer:sf0.01',
                          layout='Optional[customer:sf0.01]'}] => [acctbal:double, mktsegment:varchar(10)]
                              Estimates: {rows: 1500 (20.55kB), cpu: 21045.00, memory: 0.00, network: 0.00}
                              acctbal := tpch:acctbal
                              mktsegment := tpch:mktsegment

The same query with EXPLAIN ANALYZE, showing the distributed execution plan, including the duration and cost for each stage:

presto> EXPLAIN ANALYZE SELECT mktsegment, acctbal FROM tpch.tiny.customer LIMIT 5;
 Fragment 1 [SINGLE]
     CPU: 2.69ms, Scheduled: 13.73ms, Input: 20 rows (461B); per task: avg.: 20.00 std.dev.: 0.00, Output: 5 rows (116B)
     Output layout: [acctbal, mktsegment]
     Output partitioning: SINGLE []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     - Limit[5] => [acctbal:double, mktsegment:varchar(10)]
             CPU: 0.00ns (0.00%), Scheduled: 11.00ms (5.39%), Output: 5 rows (116B)
             Input avg.: 5.00 rows, Input std.dev.: 0.00%
         - LocalExchange[SINGLE] () => [acctbal:double, mktsegment:varchar(10)]
                 CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 5 rows (116B)
                 Input avg.: 1.25 rows, Input std.dev.: 387.30%
             - RemoteSource[2] => [acctbal:double, mktsegment:varchar(10)]
                     CPU: 0.00ns (0.00%), Scheduled: 0.00ns (0.00%), Output: 20 rows (461B)
                     Input avg.: 1.25 rows, Input std.dev.: 387.30%
 Fragment 2 [SOURCE]
     CPU: 20.40ms, Scheduled: 297.41ms, Input: 1500 rows (0B); per task: avg.: 1500.00 std.dev.: 0.00, Output: 20 rows (461B)
     Output layout: [acctbal, mktsegment]
     Output partitioning: SINGLE []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     - LimitPartial[5] => [acctbal:double, mktsegment:varchar(10)]
             CPU: 1.00ms (5.26%), Scheduled: 160.00ms (78.43%), Output: 20 rows (461B)
             Input avg.: 375.00 rows, Input std.dev.: 0.00%
         - TableScan[TableHandle {connectorId='tpch', connectorHandle='customer:sf0.01',
             layout='Optional[customer:sf0.01]'}, grouped = false] => [acctbal:double, mktsegment:varchar(10)]
                 CPU: 18.00ms (94.74%), Scheduled: 33.00ms (16.18%), Output: 1500 rows (33.66kB)
                 Input avg.: 375.00 rows, Input std.dev.: 0.00%
                 acctbal := tpch:acctbal
                 mktsegment := tpch:mktsegment
                 Input: 1500 rows (0B), Filtered: 0.00%

You can also add the VERBOSE option to get more detailed information and low-level statistics. For more on cost in EXPLAIN and cost-based optimizations, visit https://prestodb.io/docs/current/optimizer/cost-in-explain.html and https://prestodb.io/docs/current/optimizer/cost-based-optimizations.html, respectively.

Conclusion

PrestoDB is a powerful federated query engine that can bridge multiple diverse data sources and deliver timely insights in an optimized way. Many companies have adopted PrestoDB in their data operations. In this Refcard, you learned about PrestoDB's architecture, its setup on single- and multi-node clusters, and the various functionality and optimizations available. For more resources, see the following links and those throughout this Refcard.

Source: https://dzone.com/refcardz/getting-started-with-prestodb
