Add document platform CRUD models, service, and API endpoints by jfrench9 · Pull Request #575 · RoboFinSystems/robosystems

jfrench9 · 2026-04-02T23:55:45Z

Summary

Introduces full CRUD functionality for documents within the platform, spanning the database migration, domain models, service layer, API router, and comprehensive test coverage.

Key Accomplishments

Database & Models

New migration (366a6659343d): Adds the documents table to the platform database schema.
Document domain model (models/core/document/document.py): Defines the core document entity with 190 lines of model logic, including validation and data transformations.
Search model updates: Extended models/api/search.py to support document-related search parameters and responses.

Service Layer

Document service (operations/documents/service.py): Implements create, read, update, delete, and list operations for documents, encapsulating all business logic and data access.

API Endpoints

Enhanced documents router (routers/graphs/documents.py): Significantly expanded the existing documents router (~200 lines of changes) with full CRUD endpoints, proper request/response schemas, and integration with the new service layer.

Dependencies

Updated robosystems-client dependency from previous version to 0.2.42 to align with new document API contracts.

Breaking Changes

The documents router has been substantially reworked. Any existing clients consuming the previous document endpoints should verify compatibility with the updated request/response shapes.
The new migration must be applied before deploying this change.

Testing

Comprehensive test suite added across all layers:

Model tests (tests/models/core/document/test_document.py): 275 lines covering document model validation, serialization, and edge cases.
Service tests (tests/operations/documents/test_document_service.py): 232 lines testing all CRUD operations, error handling, and business logic.
Router/integration tests (tests/routers/graphs/test_documents.py): 321 lines covering endpoint behavior, authorization, input validation, and error responses.
Updated tests/conftest.py with necessary fixtures for the new test modules.

Infrastructure Considerations

A new database migration is included and must be executed against the platform database prior to deployment.
The robosystems-client dependency version bump (0.2.42) requires a lock file update; ensure CI/CD pipelines pull the updated dependencies.

🤖 Generated with Claude Code

Branch Info:

Source: feature/document-platform-crud
Target: main
Type: feature

Co-Authored-By: Claude noreply@anthropic.com

…cument models and API endpoints - Updated the robosystems-client version in pyproject.toml and uv.lock files for improved functionality. - Added new fields to DocumentUploadResponse and DocumentListItem models for better metadata handling. - Introduced DocumentDetailResponse and DocumentUpdateRequest models to support detailed document operations. - Refactored document-related API endpoints to utilize PostgreSQL for document management and improved error handling.

jfrench9 · 2026-04-02T23:55:47Z

@claude please review this PR

claude · 2026-04-02T23:56:13Z

Claude finished @jfrench9's task in 1m 51s —— View job

PR Review: Add document platform CRUD models, service, and API endpoints

Overall this is a solid, well-structured PR. The layering (migration → model → service → router → tests) is clean and consistent with the rest of the codebase. A few issues worth addressing before merge.

Bugs / Correctness

1. Fragile 404 detection in update_document router (routers/graphs/documents.py:230)

status = 404 if "not found" in str(e).lower() else 422

This string-match on error messages is fragile. If the message ever changes, or a different ValueError with "not found" in it surfaces (e.g., from the OpenSearch sync path), the wrong status code will be returned. Better to define a custom DocumentNotFoundError or use a KeyError/LookupError for the not-found case that the router can catch separately. Fix this →

2. Upsert path doesn't propagate user_id (operations/documents/service.py:55-62)

When an external_id match triggers an upsert, update_document is called but user_id is not updated. If a different user uploads a document with the same external_id, the original owner is silently preserved. This may be intentional (ownership doesn't transfer on upsert), but it's worth an explicit comment since it's a security-relevant decision.

3. OpenSearch document ID is derived inconsistently (operations/documents/service.py:176-179, 207)

In _sync_to_opensearch, the OpenSearch request is built with external_id=doc.id (the PG ULID). Then _delete_from_opensearch constructs f"doc_{graph_id}_{document_id}". These two formats need to stay perfectly aligned — if the search service changes how it constructs the document ID from external_id, deletes will miss. The ID derivation logic is split across the service and the search service. A comment or constant would help, or better, the search service should return the OS doc ID on upload so delete can use it directly.

Design Concerns

4. Two database sessions per write request (routers/graphs/documents.py:44-60, 145-161)

_resolve_tier opens and closes its own SessionFactory() session, then the endpoint immediately opens a second session for the document operation. That's two DB connections per upload_document and upload_documents_bulk call. Consider passing the existing session into _resolve_tier, or reading graph_tier within the DocumentService using the same session. Fix this →

5. No pagination on list_documents (routers/graphs/documents.py:96-114)

Document.get_by_graph returns all documents without any LIMIT/OFFSET. For graphs with hundreds or thousands of documents this will be a large response. Consider adding limit and offset query parameters to the endpoint now before this endpoint is in use.

6. Document.get_by_id is orphaned (models/core/document/document.py:105-107)

get_by_id is never called by the service layer — all access goes through get_by_id_and_graph (correct for security). This method will be confusing to future readers who wonder why it exists without a graph scope check. Either remove it or add a comment explaining it's only for internal/admin use.

Minor / Style

7. Untyped doc parameter in converter functions (routers/graphs/documents.py:63, 77)

def _document_to_list_item(doc) -> DocumentListItem:
def _document_to_detail(doc) -> DocumentDetailResponse:

Add doc: Document annotations. Helps type checkers and documents intent.

8. Timestamp fields use str instead of datetime (models/api/search.py:150-151, 176-177)

DocumentListItem.created_at: str and DocumentDetailResponse.created_at: str accept any string — datetime would give validation for free and let consumers use .isoformat() themselves. Fix this →

9. Redundant onupdate on updated_at (models/core/document/document.py:57-62)

The update() method already sets self.updated_at = datetime.now(UTC) manually before committing. The onupdate=lambda: datetime.now(UTC) column-level hook will also fire on ORM UPDATE statements, causing the field to be set twice (harmlessly, but confusingly). Pick one approach — if manual control is needed, drop onupdate.

Tests

Test coverage is thorough across all three layers. A couple of gaps worth noting:

TestBulkUpload is missing from the router tests — the upload_documents_bulk endpoint has no test coverage.
test_upserts_on_external_id in TestCreateDocument patches both get_by_external_id and get_by_id_and_graph (via with patch.object) but the test structure makes the latter implicit. Worth making the dependency explicit.

Summary

Severity	Count
Bug / Correctness	3
Design	3
Minor / Style	3
Test gaps	1

The fragile error detection (#1) and two-session pattern (#4) are the ones I'd prioritize before merging. The rest are low-risk improvements that could be addressed in follow-up.

- Removed the onupdate attribute from the updated_at column in the Document model. - Eliminated the get_by_id method from the Document class to streamline document retrieval. - Updated error handling in DocumentService to raise KeyError instead of ValueError when a document is not found. - Modified the _resolve_tier function to accept an existing session, improving session management. - Enhanced type hints in document-related functions for better clarity and type safety.

github-code-quality Bot found potential problems Apr 2, 2026

View reviewed changes

Comment thread tests/operations/documents/test_document_service.py Fixed

Comment thread tests/operations/documents/test_document_service.py Fixed

jfrench9 merged commit b2dec89 into main Apr 3, 2026
7 checks passed

jfrench9 deleted the feature/document-platform-crud branch April 3, 2026 00:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add document platform CRUD models, service, and API endpoints#575

Add document platform CRUD models, service, and API endpoints#575
jfrench9 merged 2 commits into
mainfrom
feature/document-platform-crud

jfrench9 commented Apr 2, 2026

Uh oh!

jfrench9 commented Apr 2, 2026

Uh oh!

claude Bot commented Apr 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jfrench9 commented Apr 2, 2026

Summary

Key Accomplishments

Database & Models

Service Layer

API Endpoints

Dependencies

Breaking Changes

Testing

Infrastructure Considerations

Uh oh!

jfrench9 commented Apr 2, 2026

Uh oh!

claude Bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Add document platform CRUD models, service, and API endpoints

Bugs / Correctness

Design Concerns

Minor / Style

Tests

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Apr 2, 2026 •

edited

Loading