bulk from file (json, json array, parquet, delta lake) #27

Merged: fupelaqu merged 11 commits into main from feature/bulkFromSourceFile on Dec 6, 2025


fupelaqu commented Dec 5, 2025


Add support for bulk indexing from multiple data sources:

Data Sources

| Source Type | Format | Description |
|---|---|---|
| In-Memory | Scala objects | Direct streaming from collections |
| JSON | Text | Newline-delimited JSON (NDJSON) |
| JSON Array | Text | JSON array with nested structures |
| Parquet | Binary | Columnar storage format |
| Delta Lake | Directory | ACID transactional data lake |
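
To illustrate why the JSON (NDJSON) and JSON Array sources are listed separately, here is a minimal sketch of NDJSON handling that uses only the Scala standard library. It assumes one JSON document per non-blank line. `NdjsonSketch` and `ndjsonDocs` are hypothetical names introduced for illustration and are not part of this PR's API. A JSON array file, by contrast, cannot be split on newlines and needs an actual JSON parser.

```scala
import scala.io.Source

object NdjsonSketch {
  // Hypothetical helper (not from the PR): yields one raw JSON document
  // per non-blank line, which is the NDJSON convention.
  def ndjsonDocs(lines: Iterator[String]): Iterator[String] =
    lines.map(_.trim).filter(_.nonEmpty)

  // Two documents separated by a blank line, which the helper skips.
  val sample: String =
    """{"id":"1","name":"keyboard"}
      |
      |{"id":"2","name":"mouse"}
      |""".stripMargin

  // Lines are consumed lazily, so a large file would not need to fit in memory.
  val docs: List[String] = ndjsonDocs(Source.fromString(sample).getLines()).toList
}
```

Because each line is an independent document, this format lends itself to streaming; the sketch does not validate that each line is well-formed JSON.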

Examples:

// High-performance file indexing
implicit val options: BulkOptions = BulkOptions(
  defaultIndex = "products",
  maxBulkSize = 10000,
  balance = 16,
  disableRefresh = true
)

// Hadoop configuration, provided implicitly
import org.apache.hadoop.conf.Configuration
implicit val hadoopConf: Configuration = new Configuration()

// Load from Parquet
client.bulkFromFile(
  filePath = "/data/products.parquet",
  format = Parquet,
  idKey = Some("id")
).foreach { result =>
  // disableRefresh = true above, so refresh the touched indices explicitly
  result.indices.foreach(client.refresh)
  println(s"Indexed ${result.successCount} docs at ${result.metrics.throughput} docs/sec")
}

// Load from Delta Lake
client.bulkFromFile(
  filePath = "/data/delta-products",
  format = Delta,
  idKey = Some("id"),
  update = Some(true)
).foreach { result =>
  println(s"Updated ${result.successCount} products from Delta Lake")
}

// Load JSON Array with nested objects
client.bulkFromFile(
  filePath = "/data/persons.json",
  format = JsonArray,
  idKey = Some("uuid")
).foreach { result =>
  println(s"Indexed ${result.successCount} persons with nested structures")
}
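
The examples above set `maxBulkSize` and print a throughput figure. The PR description does not say how either is implemented, so the following is only a sketch under stated assumptions: that `maxBulkSize` caps the number of documents per bulk request, and that throughput is documents divided by elapsed seconds. `BulkBatchingSketch`, `SketchOptions`, `batches`, and `throughput` are hypothetical names, not the library's types.

```scala
object BulkBatchingSketch {
  // Hypothetical stand-in for the one setting used here;
  // this is NOT the library's BulkOptions.
  final case class SketchOptions(maxBulkSize: Int)

  // Split a document stream into batches of at most maxBulkSize items.
  // The last batch may be smaller than the others.
  def batches[A](docs: Iterator[A], options: SketchOptions): Iterator[Seq[A]] =
    docs.grouped(options.maxBulkSize).map(_.toSeq)

  // Documents per second; returns 0.0 when no time has elapsed.
  def throughput(docCount: Long, elapsedMillis: Long): Double =
    if (elapsedMillis <= 0) 0.0 else docCount * 1000.0 / elapsedMillis
}
```

With 25 documents and a batch size of 10, `batches` yields groups of 10, 10, and 5. Larger batches mean fewer requests but more memory per request, which is presumably the trade-off the setting is meant to tune.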

@fupelaqu fupelaqu marked this pull request as ready for review December 5, 2025 20:39
@fupelaqu fupelaqu merged commit e4b0e40 into main Dec 6, 2025
2 checks passed
@fupelaqu fupelaqu deleted the feature/bulkFromSourceFile branch December 9, 2025 11:05