Add export capabilities to MSQ with SQL syntax #15689

Merged: cryptoe merged 53 commits into apache:master from adarshsanjeev:export-syntax on Feb 7, 2024
Contributor

adarshsanjeev commented on Jan 16, 2024

Problem

Druid currently does not allow tables to be exported programmatically. While it is possible to download results from a SELECT query, this relies on writing the results to a single query report, which cannot support large datasets. An export syntax that writes the results in a desired format directly to an external location (such as S3 or HDFS) would be useful. This PR adds such a syntax:

(INSERT/REPLACE) INTO
EXTERN(<external source function>)
AS <format>
[OVERWRITE ALL]
<select query>

For example, a statement to export all rows from a table into S3 as CSV files would look like:

REPLACE INTO 
EXTERN(s3(bucket='bucket1', prefix='export/', tempDir='/var/temp'))
AS CSV
OVERWRITE ALL
SELECT * FROM wikipedia

Initially, only CSV is supported as an export format, but this can be expanded to support other formats easily.
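
As a rough illustration of how such a statement might be submitted programmatically, the sketch below assembles the export statement from the example above and wraps it in a request to the MSQ task endpoint (`/druid/v2/sql/task`). The helper names `build_export_statement` and `build_task_request`, and the router address `http://localhost:8888`, are assumptions made for illustration and are not part of this PR. The sketch does no quoting or escaping of the parameter values.

```python
import json
import urllib.request


def build_export_statement(bucket: str, prefix: str, temp_dir: str, select_query: str) -> str:
    """Assemble a REPLACE ... EXTERN export statement in the shape shown above.

    Hypothetical helper: values are interpolated as-is, with no escaping.
    """
    return (
        "REPLACE INTO\n"
        f"EXTERN(s3(bucket='{bucket}', prefix='{prefix}', tempDir='{temp_dir}'))\n"
        "AS CSV\n"
        "OVERWRITE ALL\n"
        f"{select_query}"
    )


def build_task_request(base_url: str, statement: str) -> urllib.request.Request:
    """Wrap the statement in a POST request for the MSQ task endpoint.

    Hypothetical helper: base_url is an assumed router address.
    """
    body = json.dumps({"query": statement}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/druid/v2/sql/task",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


statement = build_export_statement(
    bucket="bucket1",
    prefix="export/",
    temp_dir="/var/temp",
    select_query="SELECT * FROM wikipedia",
)
request = build_task_request("http://localhost:8888", statement)

# To actually submit the task against a running cluster (not executed here):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the statement easy to inspect before anything is written to the export destination.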

Release note

  • Adds export statements to MSQ, as a part of INSERT and REPLACE statements. This allows the results of a query to be written to a destination in a configurable format.

Key changed/added classes in this PR
  • sql/src/main/codegen/includes/common.ftl
  • sql/src/main/codegen/includes/replace.ftl
  • IngestHandler

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Labels

  • Area - Batch Ingestion
  • Area - Documentation
  • Area - Ingestion
  • Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262)
  • Area - Querying
  • Needs web console change (Backend API changes that would benefit from frontend support in the web console)
