Scale up your SQL queries

Run your SQL queries distributed and fast, with the lightweight, cross-platform and easy-to-install dask-sql package.

if __name__ == "__main__":
    # Create a dask cluster
    from dask.distributed import Client
    client = Client()

    # Load the data with dask
    import dask.datasets
    df = dask.datasets.timeseries()

    # Register the data in a dask-sql context
    from dask_sql import Context
    c = Context()
    c.create_table("timeseries", df)

    # Query the data utilizing your dask cluster with standard SQL
    result = c.sql("SELECT name, SUM(x) FROM timeseries GROUP BY name").compute()
    print(result)

Features

DASK AND DASK-SQL

Python and SQL

Combine the dask API with standard SQL syntax to query your data, without the need for a database.

USE THE FULL POWER OF YOUR CLUSTER

Infinite Scaling

You write normal SQL, while in the background your queries are distributed over your cluster and use its full computing power.

MIX SQL WITH CUSTOM FUNCTIONS

Your data - your queries

Combine SQL functions with your own Python functions, without any performance drawbacks or rewriting.
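As a minimal sketch of mixing SQL with a custom function, the example below assumes dask-sql's `Context.register_function`, which takes the function, its SQL name, a list of (parameter name, type) pairs and a return type. The names `squared` and `run_example` are illustrative only and not part of dask-sql.

```python
def squared(x):
    """Plain Python function; works on scalars and on dataframe columns."""
    return x ** 2


def run_example():
    # Imports live here so that `squared` stays usable without dask-sql installed.
    import dask.datasets
    import numpy as np
    from dask_sql import Context

    c = Context()
    c.create_table("timeseries", dask.datasets.timeseries())

    # Tell dask-sql the SQL name, the parameter types and the return type.
    c.register_function(squared, "squared", [("x", np.float64)], np.float64)

    # The registered name can now be used like any other SQL function.
    return c.sql("SELECT name, squared(x) AS x2 FROM timeseries LIMIT 5").compute()
```

Calling `run_example()` on a machine with dask-sql installed should return a small dataframe with the columns `name` and `x2`.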

Computing Infrastructure

Because dask-sql builds on dask, you can use any of the computing infrastructures that dask supports: cloud providers, YARN, Kubernetes (k8s), batch systems, ...

dask Cluster

Connect to your infrastructure and deploy a dask scheduler to control your computation. There are many ways to do so. By default, a local cluster is spawned for you.

dask-sql Library

dask-sql will connect to your dask cluster and will translate your SQL queries into dask API calls. A large fraction of the SQL standard is already understood.

dask-sql SQL server

It is also possible to deploy a dask-sql service, e.g. with a Docker image, which lets you send SQL queries via a Presto-compatible REST interface.
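A rough sketch of how a client might talk to such a server, using only the Python standard library. It assumes the server listens on `localhost:8080` (the port published by the Docker command on this page) and accepts the query text as the body of a POST to the Presto-style `/v1/statement` endpoint. The helper names `build_statement_request` and `run_query` are hypothetical.

```python
import json
import urllib.request


def build_statement_request(sql, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request carrying the SQL text."""
    return urllib.request.Request(
        url=f"{base_url}/v1/statement",  # assumed Presto-style endpoint
        data=sql.encode("utf-8"),
        method="POST",
    )


def run_query(sql, base_url="http://localhost:8080"):
    """Send the query to a running server and return the decoded JSON reply."""
    with urllib.request.urlopen(build_statement_request(sql, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `run_query("SELECT 1 + 1")` would return the parsed JSON reply. In the Presto protocol a reply may include a follow-up URI to poll for results, so a real client may need more than one request.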

Notebooks and Code

You can import dask-sql into your own scripts and mix SQL code with your normal dataframe operations. It works particularly well with the interactive notebook format.

External Applications, e.g. BI Tools

Using a well-known protocol and the standardized SQL language, dask-sql helps you to query your data from wherever you want.

Install dask-sql via conda or pip

$ conda install -c conda-forge dask-sql
# or (needs Java pre-installed)
$ pip install -U dask-sql
# or run the SQL server via docker
$ docker run --rm -it -p 8080:8080 nbraun/dask-sql