Skip to content

Comments

fix: catch pydantic ValidationError in VectorStoreQueryOutputParser#20450

Merged
logan-markewich merged 6 commits intorun-llama:mainfrom
majiayu000:fix/vector-auto-retriever-validation-error
Feb 6, 2026
Merged

fix: catch pydantic ValidationError in VectorStoreQueryOutputParser#20450
logan-markewich merged 6 commits intorun-llama:mainfrom
majiayu000:fix/vector-auto-retriever-validation-error

Conversation

@majiayu000
Copy link
Contributor

Description

Wrap pydantic ValidationError as OutputParserException in VectorStoreQueryOutputParser.parse() to ensure consistent exception handling in VectorIndexAutoRetriever._parse_generated_spec.

Previously, when the LLM returned malformed JSON that passed JSON parsing but failed Pydantic validation, an uncaught ValidationError would be raised instead of the expected OutputParserException.

Fixes #19410

New Package?

  • No

Version Bump?

  • No

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • I added new unit tests to cover this change

Added two test cases:

  • test_output_parser_invalid_schema_raises_output_parser_exception: Tests that missing required fields raise OutputParserException
  • test_output_parser_invalid_type_raises_output_parser_exception: Tests that invalid types raise OutputParserException

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

Wrap pydantic ValidationError as OutputParserException to ensure
consistent exception handling in VectorIndexAutoRetriever.

Fixes run-llama#19410

Signed-off-by: majiayu000 <1835304752@qq.com>
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jan 5, 2026
Comment on lines +16 to +19
except ValidationError as e:
raise OutputParserException(
f"Failed to validate query spec. Error: {e}. Got JSON dict: {json_dict}"
) from e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you notice a better UX here by throwing the OutputParserException instead of the ValidationError that would be normally thrown without this extra step?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Farther up the stack the OutputParserException is caught

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(see the linked issue for more details)

Signed-off-by: majiayu000 <1835304752@qq.com>
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jan 5, 2026
Signed-off-by: majiayu000 <1835304752@qq.com>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these changes are unrelated

@majiayu000
Copy link
Contributor Author

@AstraBert Thanks for the suggestion. The parser now wraps the ValidationError and raises OutputParserException with context (see llama-index-core/llama_index/core/indices/vector_store/retrievers/auto_retriever/output_parser.py).

@logan-markewich The schema changes are part of the MediaResource hash fix; happy to split if you prefer.

@AstraBert
Copy link
Member

@majiayu000 I see you have another PR open for the MediaResource hashing changes, we can follow up there with those and I suggest you remove them from here

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jan 13, 2026
@majiayu000
Copy link
Contributor Author

Thanks @AstraBert! You are right.

I have updated this PR and removed the MediaResource hashing changes. Those changes are indeed handled in PR #20451.

This PR is now focused solely on catching the ValidationError in VectorStoreQueryOutputParser.

@logan-markewich logan-markewich merged commit 70d949a into run-llama:main Feb 6, 2026
7 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: VectorIndexAutoRetriever._parse_generated_spec may raise uncaught pydantic.ValidationError

3 participants