feat(plugins): import data from CSV and TSV files into a table (#1568) #1578
datlechin wants to merge 3 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cd014d0c1f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if settings.deleteExistingRows {
        try await sink.deleteAllRowsFromTargetTable()
    }
    if useTransaction {
        try await sink.beginTransaction()
Start the transaction before deleting target rows
When a CSV import is run with both Delete existing rows and the default transactional rollback mode, the target table is cleared before beginTransaction() is called, so a later parse/insert error rolls back only the inserts and leaves the pre-existing data deleted. This makes the rollback option unsafe for the exact scenario where users are replacing a table from a CSV; put the delete inside the transaction when useTransaction is true.
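A minimal sketch of the suggested ordering, assuming a hypothetical in-memory `FakeSink` in place of the real sink and a synchronous call path for brevity (the PR's sink methods are async). Only the order of `beginTransaction()` and the delete is the point here; the snapshot-based rollback is an illustration, not how a database implements it.

```swift
// Hypothetical stand-in for the real sink; a snapshot plays the role of the transaction.
final class FakeSink {
    var rows: [String]
    private var snapshot: [String]?

    init(rows: [String]) { self.rows = rows }

    func beginTransaction() { snapshot = rows }
    func commit() { snapshot = nil }
    func rollback() {
        if let saved = snapshot { rows = saved }
        snapshot = nil
    }
    func deleteAllRowsFromTargetTable() { rows.removeAll() }
    func insert(_ row: String) throws {
        // An empty row simulates a parse or insert failure.
        if row.isEmpty { throw ImportError.badRow }
        rows.append(row)
    }
}

enum ImportError: Error { case badRow }

func runImport(into sink: FakeSink,
               newRows: [String],
               deleteExistingRows: Bool,
               useTransaction: Bool) throws {
    // Open the transaction first so the delete is covered by a later rollback.
    if useTransaction { sink.beginTransaction() }
    do {
        if deleteExistingRows { sink.deleteAllRowsFromTargetTable() }
        for row in newRows { try sink.insert(row) }
        if useTransaction { sink.commit() }
    } catch {
        if useTransaction { sink.rollback() }
        throw error
    }
}
```

With this ordering, a failed insert restores the rows that were in the table before the import started.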
        progress.incrementStatement(by: batch.count)
    } catch {
        switch settings.errorHandling {
        case .stopAndRollback, .stopAndCommit:
Honor Stop and Commit for transactional CSV imports
If the user selects Stop and Commit with Wrap in transaction enabled, the first failed batch takes this combined case and throws to the outer handler, which always rolls back while useTransaction is true. That makes Stop and Commit behave like Stop and Rollback for CSV imports instead of preserving the successfully inserted batches.
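One way to picture the difference is a small simulation, assuming a hypothetical `importBatches` helper that tracks rows pending inside the open transaction separately from rows already committed. The enum mirrors only the two cases visible in the diff; everything else is illustrative.

```swift
enum ErrorHandling { case stopAndRollback, stopAndCommit }

/// Returns the rows that remain persisted once the import stops.
/// `failingBatch` is the index of the batch that fails, or nil for a clean run.
func importBatches(_ batches: [[Int]],
                   failingBatch: Int?,
                   handling: ErrorHandling) -> [Int] {
    var committed: [Int] = []   // rows visible after the transaction ends
    var pending: [Int] = []     // rows inserted inside the open transaction

    for (index, batch) in batches.enumerated() {
        if index == failingBatch {
            switch handling {
            case .stopAndRollback:
                // Discard everything inserted so far.
                pending.removeAll()
            case .stopAndCommit:
                // Keep the batches that succeeded before the failure.
                committed += pending
                pending.removeAll()
            }
            return committed
        }
        pending += batch
    }

    committed += pending
    return committed
}
```

Splitting the combined `case` this way lets the commit path run before the error reaches an outer handler that would otherwise roll back.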
# Conflicts: # CHANGELOG.md
💡 Codex Review
Reviewed commit: 856b29dfb5
    var used = Set<String>()
    for index in 0..<columnCount {
        let raw = header.flatMap { index < $0.count ? $0[index] : nil } ?? ""
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        let base = trimmed.isEmpty ? defaultColumnName(index) : trimmed
        var unique = base
        var suffix = 2
        while !used.insert(unique).inserted {
Deduplicate CSV headers case-insensitively
When a CSV header contains names that differ only by case (for example id,ID) and the user imports into an existing table, this case-sensitive used set leaves both source fields distinct in the mapping UI, but ImportDataSinkAdapter later lowercases source keys when building columnMapping. One mapping then overwrites the other, and both row fields can be routed to the same target column or one source column can be silently lost; make the generated CSV field names unique under the same case-folding used by the sink.
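A sketch of the suggested change, assuming `lowercased()` matches the sink's case folding. The `uniqueFieldNames` wrapper, the `_2` suffix format, and the `Column N` default name are assumptions for illustration; the diff does not show the loop body or `defaultColumnName`.

```swift
import Foundation

/// Builds field names that stay unique even after lowercasing.
func uniqueFieldNames(header: [String]?, columnCount: Int) -> [String] {
    var used = Set<String>()      // holds lowercased names only
    var names: [String] = []

    for index in 0..<columnCount {
        let raw = header.flatMap { index < $0.count ? $0[index] : nil } ?? ""
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        // Assumed default naming for empty or missing header cells.
        let base = trimmed.isEmpty ? "Column \(index + 1)" : trimmed

        var unique = base
        var suffix = 2
        // Compare under case folding, but keep the original casing in the result.
        while !used.insert(unique.lowercased()).inserted {
            unique = "\(base)_\(suffix)"
            suffix += 1
        }
        names.append(unique)
    }
    return names
}
```

For a header of `id,ID`, the second field gets a suffix, so the two source keys no longer collide once lowercased.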
    static func resolveDialect(in data: Data, options: CSVImportOptions) -> CSVDialect {
        var dialect = CSVDialect.detect(from: data)
        if let byte = options.delimiter.byte {
            dialect.delimiter = byte
        }
        dialect.quoteChar = options.quoteCharacter.byte
        if let forced = options.encoding.stringEncoding {
            dialect.encoding = forced
        }
        return dialect
Avoid returning UTF-16 dialects to the byte parser
For a BOM-marked UTF-16 CSV, CSVDialect.detect(from:) sets .utf16LittleEndian/.utf16BigEndian, but CSVStreamingParser still scans delimiters and newlines as single bytes and advances only one byte past them. A UTF-16LE file such as a,b\n1,2 therefore leaves the delimiter's trailing NUL at the start of the next field, producing corrupted headers/values during detection and import; either transcode/reject UTF-16 before parsing or keep auto-detection to encodings the byte parser can tokenize correctly.
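The stray NUL can be reproduced in a few lines, and the transcoding option can be sketched alongside it. `splitBytes` and `transcodeToUTF8` are hypothetical helpers, not part of `CSVStreamingParser`; BOM stripping is left out, so the sample input carries no BOM.

```swift
import Foundation

/// Splits on a single delimiter byte, the way a byte-oriented tokenizer would.
func splitBytes(_ data: Data, delimiter: UInt8) -> [[UInt8]] {
    data.split(separator: delimiter, omittingEmptySubsequences: false)
        .map { Array($0) }
}

/// Re-encodes text from a known encoding as UTF-8 so a byte parser can
/// tokenize it. Returns nil when the bytes are not valid in that encoding.
func transcodeToUTF8(_ data: Data, from encoding: String.Encoding) -> Data? {
    guard let text = String(data: data, encoding: encoding) else { return nil }
    return Data(text.utf8)
}
```

Splitting the UTF-16LE bytes of `a,b` on `0x2C` yields a second field that starts with `0x00`; after `transcodeToUTF8` the same split yields clean single-byte fields.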
💡 Codex Review
Reviewed commit: 006fb24118
    let value = column < fields.count ? fields[column] : ""
    guard !value.isEmpty else { continue }
    samples[column].append(value)
Apply CSV null and trim options before inferring types
When the user sets CSV options such as NULL text or Trim leading and trailing spaces, imports apply those conversions in cellValue, but field detection still feeds the raw token into samples. For a new-table import with values like 1,\N,2 after configuring NULL text = \N (or numeric values padded with spaces while trim is enabled), the inferred type becomes text even though the rows will be inserted as integers/nulls, so the generated table schema is wrong.
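A sketch of normalizing each token before it is sampled, assuming a hypothetical `SampleOptions` struct and that trimming is applied before the NULL comparison (the diff does not show the order `cellValue` uses). `looksLikeInteger` is a stand-in for the real type inferrer.

```swift
import Foundation

// Hypothetical subset of the CSV options that affect type inference.
struct SampleOptions {
    var nullText: String
    var trimWhitespace: Bool
}

/// Returns the value type inference should see, or nil for NULL/empty cells.
func normalizedSample(_ raw: String, options: SampleOptions) -> String? {
    let value = options.trimWhitespace
        ? raw.trimmingCharacters(in: .whitespaces)
        : raw
    if value.isEmpty { return nil }
    if !options.nullText.isEmpty && value == options.nullText { return nil }
    return value
}

/// Stand-in for the inferrer: true when every sample parses as an integer.
func looksLikeInteger(_ samples: [String]) -> Bool {
    !samples.isEmpty && samples.allSatisfy { Int($0) != nil }
}
```

Feeding `compactMap { normalizedSample($0, options: options) }` into the sampler keeps NULL markers and padded numbers from forcing a text column.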
    .onChange(of: currentPlugin?.fieldDetectionSignature) { _, _ in
        Task { await redetectFields() }
Refresh new-table fields after inactive CSV option changes
This only reloads the currently selected destination when CSV detection options change, but .task has already populated newColumns and loadNewColumns() will later no-op while newColumnsLoaded is true. If a user changes a field-shaping option while on Existing table (for example disables First row is a header) and then switches to New table, the stale mapping from the old settings is used while the import runs with the new settings; in that header toggle case the new table is created with old header names but the imported rows are keyed as Column 1, Column 2, so the sink skips the values and the import can appear successful with no data inserted.
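One possible shape for the fix is to key the cached columns on the signature they were detected with, rather than on a boolean loaded flag. `NewColumnsCache` is hypothetical, and the signature is assumed to be a `String`; the view code in the PR may need a different structure.

```swift
/// Hypothetical cache that remembers which option signature produced its columns.
struct NewColumnsCache {
    private(set) var columns: [String] = []
    private var loadedSignature: String?

    /// Re-runs detection only when nothing is loaded yet or the signature changed.
    mutating func load(signature: String, detect: () -> [String]) {
        guard loadedSignature != signature else { return }
        columns = detect()
        loadedSignature = signature
    }
}
```

Calling `load` when the user switches to New table then picks up any option change made while Existing table was selected, instead of no-oping on stale data.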
Closes #1568.
Adds CSV and TSV import into a database table for any SQL target.
What it does
Pick File > Import > From CSV and choose a .csv or .tsv file. The row import sheet opens with CSV parsing options. Map columns to an existing table, or create a new table with inferred, editable types.
Options
Changing any dialect option re-reads the file so the column mapping reflects it.
How it works
- CSVStreamingParser, CSVDialect, and CSVTypeInferrer move from CSVInspectorPlugin into TableProPluginKit as public types, so the importer and the inspector share one RFC 4180 tokenizer (quoted commas and newlines, doubled quotes, BOM, delimiter and encoding detection). Additive PluginKit ABI, no version bump.
- CSVImportPlugin bundle memory-maps the file, indexes rows, and inserts in 500-row parameterized batches. Memory is bounded by the row-range index, not the row data. Cancellable per batch, wrapped in a transaction.
- JSONImportSheet becomes RowImportSheet, shared by JSON and CSV. The format plugin supplies the icon, name, and options view. JSONImportTypeMapper becomes ImportTypeMapper. A new fieldDetectionSignature hook (additive, defaults to empty) drives live re-detection when an option changes.
Tests
CSVImportPluginTests: dialect resolution, header and header-less naming with dedup, NULL/empty/trim handling, ragged rows, type mapping, quoted/embedded/doubled-quote parsing, semicolon auto-detect.
Notes
docs/features/import-export.mdx; CHANGELOG entry under Unreleased.
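
The 500-row parameterized batching described under How it works could look roughly like the sketch below. Both helpers are hypothetical, the `?` placeholder syntax varies by driver, and identifier quoting is omitted; the plugin may build its statements differently.

```swift
/// Splits a row index into half-open ranges of at most `batchSize` rows.
func batchRanges(rowCount: Int, batchSize: Int = 500) -> [Range<Int>] {
    precondition(batchSize > 0, "batchSize must be positive")
    return stride(from: 0, to: rowCount, by: batchSize).map { start in
        start..<min(start + batchSize, rowCount)
    }
}

/// Builds a multi-row INSERT with one "?" placeholder per value.
/// Assumption: table and column names are already safe to embed.
func insertStatement(table: String, columns: [String], rowCount: Int) -> String {
    let row = "(" + Array(repeating: "?", count: columns.count)
        .joined(separator: ", ") + ")"
    let values = Array(repeating: row, count: rowCount).joined(separator: ", ")
    return "INSERT INTO \(table) (\(columns.joined(separator: ", "))) VALUES \(values)"
}
```

Because only the ranges are held between batches, memory stays proportional to the index rather than to the row data, and a cancellation check fits naturally between ranges.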