feat(plugins): import data from CSV and TSV files into a table (#1568) #1578

Open
datlechin wants to merge 3 commits into main from feat/1568-csv-import


@datlechin
Member

Closes #1568.

Adds CSV and TSV import into a database table for any SQL target.

What it does

Pick File > Import > From CSV and choose a .csv or .tsv file. The row import sheet opens with CSV parsing options. Map columns to an existing table, or create a new table with inferred, editable types.

Options

  • Delimiter: auto-detect, comma, semicolon, tab, pipe
  • Quote character: double or single
  • Encoding: auto-detect, UTF-8, ISO Latin 1, Windows-1252
  • First row is a header
  • Trim leading and trailing spaces
  • Treat empty values as NULL, plus an optional NULL token
  • On-error behavior, Wrap in transaction, and Delete existing rows (shared with SQL and JSON import)

Changing any dialect option re-reads the file so the column mapping reflects it.

How it works

  • Reuses the existing CSV parser. CSVStreamingParser, CSVDialect, and CSVTypeInferrer move from CSVInspectorPlugin into TableProPluginKit as public types, so the importer and the inspector share one RFC 4180 tokenizer (quoted commas and newlines, doubled quotes, BOM, delimiter and encoding detection). The PluginKit ABI change is additive, so no version bump is needed.
  • New CSVImportPlugin bundle memory-maps the file, indexes rows, and inserts in 500-row parameterized batches. Memory is bounded by the row-range index, not the row data. Cancellable per batch, wrapped in a transaction.
  • JSONImportSheet becomes RowImportSheet, shared by JSON and CSV. The format plugin supplies the icon, name, and options view. JSONImportTypeMapper becomes ImportTypeMapper. A new fieldDetectionSignature hook (additive, defaults to empty) drives live re-detection when an option changes.
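
The batching and parameterization described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: the helper names `batches(of:size:)` and `insertStatement(table:columns:rowCount:)` are hypothetical, the `?` placeholder style and double-quoted identifiers are assumptions that vary by database driver, and only the 500-row batch size comes from the description.

```swift
/// Hypothetical helper: splits rows into fixed-size batches (the PR uses 500).
func batches<T>(of rows: [T], size: Int) -> [[T]] {
    precondition(size > 0, "batch size must be positive")
    return stride(from: 0, to: rows.count, by: size).map { start in
        Array(rows[start..<min(start + size, rows.count)])
    }
}

/// Hypothetical helper: builds one multi-row INSERT that contains only
/// placeholders. The CSV values are passed separately as bind parameters,
/// so they never become part of the SQL text.
/// Assumption: `?` placeholders and double-quoted identifiers.
func insertStatement(table: String, columns: [String], rowCount: Int) -> String {
    let tuple = "(" + Array(repeating: "?", count: columns.count).joined(separator: ", ") + ")"
    let tuples = Array(repeating: tuple, count: rowCount).joined(separator: ", ")
    let columnList = columns.map { "\"\($0)\"" }.joined(separator: ", ")
    return "INSERT INTO \"\(table)\" (\(columnList)) VALUES \(tuples)"
}
```

With 1,201 rows and a batch size of 500 this yields batches of 500, 500, and 201 rows, and a cancellation check can run between batches.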

Tests

  • New CSVImportPluginTests: dialect resolution, header and header-less naming with dedup, NULL/empty/trim handling, ragged rows, type mapping, quoted/embedded/doubled-quote parsing, semicolon auto-detect.
  • The moved CSV parser suites and the JSON import suites still pass after the move. 116 cases green.

Notes

  • Inserts are parameterized, so CSV values are never concatenated into SQL.
  • Docs updated in docs/features/import-export.mdx; CHANGELOG entry under Unreleased.

@mintlify

mintlify Bot commented Jun 4, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
TablePro 🟢 Ready View Preview Jun 4, 2026, 5:05 AM

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd014d0c1f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +75 to +79
if settings.deleteExistingRows {
    try await sink.deleteAllRowsFromTargetTable()
}
if useTransaction {
    try await sink.beginTransaction()

P1: Start the transaction before deleting target rows

When a CSV import is run with both Delete existing rows and the default transactional rollback mode, the target table is cleared before beginTransaction() is called, so a later parse/insert error rolls back only the inserts and leaves the pre-existing data deleted. This makes the rollback option unsafe for the exact scenario where users are replacing a table from a CSV; put the delete inside the transaction when useTransaction is true.
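
One way to order the calls so the delete is covered by the rollback is sketched below. It is an illustration under stated assumptions: `RecordingSink` is a hypothetical in-memory stand-in for the PR's sink, it is synchronous for brevity (the real sink methods are `async`), and `commit`, `rollback`, and `insert` are assumed names; only `beginTransaction` and `deleteAllRowsFromTargetTable` appear in the diff.

```swift
enum ImportError: Error { case badRow }

// Hypothetical in-memory sink; a snapshot stands in for a real transaction.
final class RecordingSink {
    private(set) var rows = ["old-1", "old-2"]
    private var snapshot: [String]?

    func beginTransaction() { snapshot = rows }
    func deleteAllRowsFromTargetTable() { rows.removeAll() }
    func insert(_ row: String) throws {
        if row.isEmpty { throw ImportError.badRow }  // simulate a bad CSV row
        rows.append(row)
    }
    func commit() { snapshot = nil }
    func rollback() {
        if let saved = snapshot { rows = saved }
        snapshot = nil
    }
}

func runImport(_ newRows: [String], into sink: RecordingSink,
               deleteExistingRows: Bool, useTransaction: Bool) throws {
    // Begin first, so the delete below is part of the same transaction.
    if useTransaction { sink.beginTransaction() }
    do {
        if deleteExistingRows { sink.deleteAllRowsFromTargetTable() }
        for row in newRows { try sink.insert(row) }
        if useTransaction { sink.commit() }
    } catch {
        // A rollback now restores the deleted rows as well as undoing inserts.
        if useTransaction { sink.rollback() }
        throw error
    }
}
```

In this sketch a failing row leaves the sink with its original rows when `useTransaction` is true, which is the behavior the comment asks for.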

    progress.incrementStatement(by: batch.count)
} catch {
    switch settings.errorHandling {
    case .stopAndRollback, .stopAndCommit:

P2: Honor Stop and Commit for transactional CSV imports

If the user selects Stop and Commit with Wrap in transaction enabled, the first failed batch takes this combined case and throws to the outer handler, which always rolls back while useTransaction is true. That makes Stop and Commit behave like Stop and Rollback for CSV imports instead of preserving the successfully inserted batches.
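
A possible shape for the fix is to handle the two stop modes separately and let the mode decide how the transaction ends. The sketch below is hypothetical: `runBatches`, `TransactionEnd`, and `BatchFailure` are invented for illustration, and the `.skipAndContinue` case is an assumed third mode; only `.stopAndRollback` and `.stopAndCommit` appear in the diff.

```swift
enum ErrorHandling { case stopAndRollback, stopAndCommit, skipAndContinue }
enum TransactionEnd: Equatable { case commit, rollback }

struct BatchFailure: Error {}

/// Runs each batch closure in order and reports how the surrounding
/// transaction should end, plus how many batches are kept.
func runBatches(_ batches: [() throws -> Void],
                mode: ErrorHandling) -> (end: TransactionEnd, inserted: Int) {
    var inserted = 0
    for batch in batches {
        do {
            try batch()
            inserted += 1
        } catch {
            switch mode {
            case .stopAndRollback:
                return (.rollback, 0)        // discard everything
            case .stopAndCommit:
                return (.commit, inserted)   // keep the batches that succeeded
            case .skipAndContinue:
                continue                     // drop this batch, keep going
            }
        }
    }
    return (.commit, inserted)
}
```

The point of the design is that the outer handler no longer has a single "error means rollback" path while a transaction is open.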

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 856b29dfb5

Comment on lines +37 to +44
var used = Set<String>()
for index in 0..<columnCount {
    let raw = header.flatMap { index < $0.count ? $0[index] : nil } ?? ""
    let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    let base = trimmed.isEmpty ? defaultColumnName(index) : trimmed
    var unique = base
    var suffix = 2
    while !used.insert(unique).inserted {

P2: Deduplicate CSV headers case-insensitively

When a CSV header contains names that differ only by case (for example id,ID) and the user imports into an existing table, this case-sensitive used set leaves both source fields distinct in the mapping UI, but ImportDataSinkAdapter later lowercases source keys when building columnMapping. One mapping then overwrites the other, and both row fields can be routed to the same target column or one source column can be silently lost; make the generated CSV field names unique under the same case-folding used by the sink.
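
A minimal sketch of case-insensitive deduplication, modeled on the loop quoted above. The function name `uniqueFieldNames`, the `_2` suffix format, and the inlined `Column N` default are assumptions for illustration; the diff does not show how the suffix is appended or what `defaultColumnName` returns. Lowercasing is used as the fold only because the comment says the sink lowercases source keys.

```swift
import Foundation

/// Hypothetical variant of the header-naming loop: names are compared by
/// their lowercased form, so "id" and "ID" cannot both survive unchanged.
func uniqueFieldNames(header: [String]?, columnCount: Int) -> [String] {
    var used = Set<String>()   // holds lowercased names only
    var names: [String] = []
    for index in 0..<columnCount {
        let raw = header.flatMap { index < $0.count ? $0[index] : nil } ?? ""
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        // Assumption: default names look like "Column 1", "Column 2", ...
        let base = trimmed.isEmpty ? "Column \(index + 1)" : trimmed
        var unique = base
        var suffix = 2
        // Fold case before the uniqueness check; keep the display name as typed.
        while !used.insert(unique.lowercased()).inserted {
            unique = "\(base)_\(suffix)"   // assumed suffix format
            suffix += 1
        }
        names.append(unique)
    }
    return names
}
```

For a header of `id,ID` this produces `id` and `ID_2`, which stay distinct after the sink lowercases them.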

Comment on lines +17 to +27

static func resolveDialect(in data: Data, options: CSVImportOptions) -> CSVDialect {
    var dialect = CSVDialect.detect(from: data)
    if let byte = options.delimiter.byte {
        dialect.delimiter = byte
    }
    dialect.quoteChar = options.quoteCharacter.byte
    if let forced = options.encoding.stringEncoding {
        dialect.encoding = forced
    }
    return dialect

P2: Avoid returning UTF-16 dialects to the byte parser

For a BOM-marked UTF-16 CSV, CSVDialect.detect(from:) sets .utf16LittleEndian/.utf16BigEndian, but CSVStreamingParser still scans delimiters and newlines as single bytes and advances only one byte past them. A UTF-16LE file such as a,b\n1,2 therefore leaves the delimiter's trailing NUL at the start of the next field, producing corrupted headers/values during detection and import; either transcode/reject UTF-16 before parsing or keep auto-detection to encodings the byte parser can tokenize correctly.
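
The "transcode before parsing" option could look roughly like this. It is a sketch only: `transcodedToUTF8IfUTF16` is a hypothetical helper, not part of the PR, and it treats any input starting with `FF FE` or `FE FF` as UTF-16, which would misread a UTF-32LE file whose BOM also begins with `FF FE`.

```swift
import Foundation

/// Hypothetical pre-pass: if the data starts with a UTF-16 BOM, re-encode
/// it as UTF-8 so a byte-oriented parser sees single-byte delimiters.
/// Returns nil if the bytes are not valid UTF-16.
func transcodedToUTF8IfUTF16(_ data: Data) -> Data? {
    let encoding: String.Encoding
    if data.starts(with: [0xFF, 0xFE]) {
        encoding = .utf16LittleEndian
    } else if data.starts(with: [0xFE, 0xFF]) {
        encoding = .utf16BigEndian
    } else {
        return data  // no UTF-16 BOM: hand the bytes to the parser unchanged
    }
    // Drop the 2-byte BOM, then decode with the explicit byte order.
    guard let text = String(data: data.dropFirst(2), encoding: encoding) else {
        return nil
    }
    return Data(text.utf8)
}
```

Note that this decodes the whole file into memory, which works against the memory-mapped design described in the PR; rejecting UTF-16 with a clear error is the cheaper alternative the comment also mentions.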

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 006fb24118

Comment on lines +128 to +130
let value = column < fields.count ? fields[column] : ""
guard !value.isEmpty else { continue }
samples[column].append(value)

P2: Apply CSV null and trim options before inferring types

When the user sets CSV options such as NULL text or Trim leading and trailing spaces, imports apply those conversions in cellValue, but field detection still feeds the raw token into samples. For a new-table import with values like 1,\N,2 after configuring NULL text = \N (or numeric values padded with spaces while trim is enabled), the inferred type becomes text even though the rows will be inserted as integers/nulls, so the generated table schema is wrong.
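
One way to keep detection and import consistent is to run every sampled token through the same normalization the import uses. The sketch below assumes a hypothetical `CellOptions` struct and `normalizedCell` function standing in for the PR's options type and `cellValue`, whose real signatures are not shown in the diff.

```swift
import Foundation

// Hypothetical stand-in for the CSV options that shape cell values.
struct CellOptions {
    var trimWhitespace = false
    var emptyIsNull = true
    var nullToken: String? = nil
}

/// Returns nil for values that the import would store as NULL.
func normalizedCell(_ token: String, options: CellOptions) -> String? {
    let value = options.trimWhitespace
        ? token.trimmingCharacters(in: .whitespaces)
        : token
    if options.emptyIsNull && value.isEmpty { return nil }
    if let nullToken = options.nullToken, value == nullToken { return nil }
    return value
}

/// Collects per-column samples for type inference, using normalized values
/// so NULL tokens and padding do not push a numeric column to text.
func collectSamples(rows: [[String]], columnCount: Int,
                    options: CellOptions) -> [[String]] {
    var samples = Array(repeating: [String](), count: columnCount)
    for fields in rows {
        for column in 0..<columnCount {
            let raw = column < fields.count ? fields[column] : ""
            guard let value = normalizedCell(raw, options: options),
                  !value.isEmpty else { continue }
            samples[column].append(value)
        }
    }
    return samples
}
```

With NULL text set to `\N` and trimming enabled, a column containing `1`, `\N`, and ` 2 ` contributes only `1` and `2` to the samples, so an integer type can still be inferred.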

Comment on lines +100 to +101
.onChange(of: currentPlugin?.fieldDetectionSignature) { _, _ in
    Task { await redetectFields() }

P2: Refresh new-table fields after inactive CSV option changes

This only reloads the currently selected destination when CSV detection options change, but .task has already populated newColumns and loadNewColumns() will later no-op while newColumnsLoaded is true. If a user changes a field-shaping option while on Existing table (for example disables First row is a header) and then switches to New table, the stale mapping from the old settings is used while the import runs with the new settings; in that header toggle case the new table is created with old header names but the imported rows are keyed as Column 1, Column 2, so the sink skips the values and the import can appear successful with no data inserted.
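
The stale-cache problem can be illustrated without SwiftUI. In the sketch below, `ImportFieldState` and `detectionOptionsChanged` are hypothetical; `newColumns`, `newColumnsLoaded`, and `loadNewColumns` borrow names from the comment, but the real view's state handling is not shown in the diff and may differ.

```swift
/// Hypothetical model of the sheet's new-table column cache.
final class ImportFieldState {
    private(set) var newColumns: [String] = []
    private(set) var newColumnsLoaded = false
    private let detect: () -> [String]

    init(detect: @escaping () -> [String]) {
        self.detect = detect
    }

    /// Loads the new-table columns once; later calls are no-ops
    /// until the cache is invalidated.
    func loadNewColumns() {
        guard !newColumnsLoaded else { return }
        newColumns = detect()
        newColumnsLoaded = true
    }

    /// Call whenever a detection option changes, regardless of which
    /// destination is currently selected, so the next load re-detects.
    func detectionOptionsChanged() {
        newColumnsLoaded = false
    }
}
```

Invalidating the flag on every option change means a later switch to New table re-runs detection with the current settings instead of reusing the old header names.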

Development

Successfully merging this pull request may close these issues.

Feature Request: Support import from csv

1 participant