Merged
Conversation
adamsitnik
commented
Oct 31, 2025
src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipelineOptions.cs
Outdated
Show resolved
Hide resolved
Contributor
There was a problem hiding this comment.
Pull Request Overview
This PR adds telemetry, logging, and error handling capabilities to the data ingestion pipeline. The key enhancement is a new IngestionPipeline class that orchestrates the document reading, chunking, and writing processes with comprehensive observability support.
- Introduces
IngestionPipeline<T>with Activity tracing and logging support - Adds configurable error handling through
IngestionPipelineOptions - Reorganizes test utilities and namespaces for better organization
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| IngestionPipeline.cs | New pipeline orchestrator with telemetry and error handling |
| IngestionPipelineOptions.cs | Configuration options for pipeline error handling and activity source naming |
| Log.cs | Logging message definitions using source generator pattern |
| DiagnosticsConstants.cs | Activity names and tag constants for telemetry |
| Microsoft.Extensions.DataIngestion.csproj | Adds logging and diagnostics dependencies |
| IngestionPipelineTests.cs | Comprehensive tests for pipeline processing and error scenarios |
| TestReader.cs | New test helper for simulating custom reader behavior |
| TestEmbeddingGenerator.cs | Moved to base namespace for reuse across test namespaces |
| VectorStoreWriterTests.cs | Namespace changed to align with folder structure |
| SqliteVectorStoreWriterTests.cs | Namespace changed to align with folder structure |
| InMemoryVectorStoreWriterTests.cs | Namespace changed to align with folder structure |
| Microsoft.Extensions.DataIngestion.Tests.csproj | Adds OpenTelemetry dependency for telemetry testing |
Comments suppressed due to low confidence (1)
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs:1
- The condition uses
>=but should use>when comparing withMaximumErrorsPerProcessing. With the current logic, ifMaximumErrorsPerProcessingis set to 3, processing will stop after 3 errors (indices 0,1,2), but the documentation states 'maximum number of ingestions that are allowed to fail', which should allow 3 failures before throwing on the 4th.
// Licensed to the .NET Foundation under one or more agreements.
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs
Outdated
Show resolved
Hide resolved
stephentoub
reviewed
Oct 31, 2025
src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs
Outdated
Show resolved
Hide resolved
# Conflicts: # src/Libraries/Microsoft.Extensions.DataIngestion/Microsoft.Extensions.DataIngestion.csproj
src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs
Outdated
Show resolved
Hide resolved
stephentoub
approved these changes
Nov 4, 2025
src/Libraries/Microsoft.Extensions.DataIngestion/DiagnosticsConstants.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs
Outdated
Show resolved
Hide resolved
…g a single DocumentProcessors make Pipeline return a collection of the results, so the users can handle everything as they want to
adamsitnik
commented
Nov 4, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The Pipeline itself is rather simple. The tricky part is exception handling: should the pipeline fail when a processing of a single file has failed? Initially I wanted to introduce something like
ContinueOnError, but when I search the repo and foundMaximumConsecutiveErrorsPerRequestand got inspired with that idea (for better or worse).Microsoft Reviewers: Open in CodeFlow