Skip to content

Add document platform CRUD models, service, and API endpoints#575

Merged
jfrench9 merged 2 commits into
mainfrom
feature/document-platform-crud
Apr 3, 2026
Merged

Add document platform CRUD models, service, and API endpoints#575
jfrench9 merged 2 commits into
mainfrom
feature/document-platform-crud

Conversation

@jfrench9
Copy link
Copy Markdown
Member

@jfrench9 jfrench9 commented Apr 2, 2026

Summary

Introduces full CRUD functionality for documents within the platform, spanning the database migration, domain models, service layer, API router, and comprehensive test coverage.

Key Accomplishments

Database & Models

  • New migration (366a6659343d): Adds the documents table to the platform database schema.
  • Document domain model (models/core/document/document.py): Defines the core document entity with 190 lines of model logic, including validation and data transformations.
  • Search model updates: Extended models/api/search.py to support document-related search parameters and responses.

Service Layer

  • Document service (operations/documents/service.py): Implements create, read, update, delete, and list operations for documents, encapsulating all business logic and data access.

API Endpoints

  • Enhanced documents router (routers/graphs/documents.py): Significantly expanded the existing documents router (~200 lines of changes) with full CRUD endpoints, proper request/response schemas, and integration with the new service layer.

Dependencies

  • Updated robosystems-client dependency from previous version to 0.2.42 to align with new document API contracts.

Breaking Changes

  • The documents router has been substantially reworked. Any existing clients consuming the previous document endpoints should verify compatibility with the updated request/response shapes.
  • The new migration must be applied before deploying this change.

Testing

Comprehensive test suite added across all layers:

  • Model tests (tests/models/core/document/test_document.py): 275 lines covering document model validation, serialization, and edge cases.
  • Service tests (tests/operations/documents/test_document_service.py): 232 lines testing all CRUD operations, error handling, and business logic.
  • Router/integration tests (tests/routers/graphs/test_documents.py): 321 lines covering endpoint behavior, authorization, input validation, and error responses.
  • Updated tests/conftest.py with necessary fixtures for the new test modules.

Infrastructure Considerations

  • A new database migration is included and must be executed against the platform database prior to deployment.
  • The robosystems-client dependency version bump (0.2.42) requires a lock file update; ensure CI/CD pipelines pull the updated dependencies.

🤖 Generated with Claude Code

Branch Info:

  • Source: feature/document-platform-crud
  • Target: main
  • Type: feature

Co-Authored-By: Claude noreply@anthropic.com

…cument models and API endpoints

- Updated the robosystems-client version in pyproject.toml and uv.lock files for improved functionality.
- Added new fields to DocumentUploadResponse and DocumentListItem models for better metadata handling.
- Introduced DocumentDetailResponse and DocumentUpdateRequest models to support detailed document operations.
- Refactored document-related API endpoints to utilize PostgreSQL for document management and improved error handling.
@jfrench9
Copy link
Copy Markdown
Member Author

jfrench9 commented Apr 2, 2026

@claude please review this PR

@claude
Copy link
Copy Markdown

claude Bot commented Apr 2, 2026

Claude finished @jfrench9's task in 1m 51s —— View job


PR Review: Add document platform CRUD models, service, and API endpoints

  • Gather context and understand the request
  • Read migration file
  • Read domain model
  • Read service layer
  • Read API router
  • Read tests
  • Review search model changes
  • Provide comprehensive review feedback

Overall this is a solid, well-structured PR. The layering (migration → model → service → router → tests) is clean and consistent with the rest of the codebase. A few issues worth addressing before merge.


Bugs / Correctness

1. Fragile 404 detection in update_document router (routers/graphs/documents.py:230)

status = 404 if "not found" in str(e).lower() else 422

This string-match on error messages is fragile. If the message ever changes, or a different ValueError with "not found" in it surfaces (e.g., from the OpenSearch sync path), the wrong status code will be returned. Better to define a custom DocumentNotFoundError or use a KeyError/LookupError for the not-found case that the router can catch separately. Fix this →

2. Upsert path doesn't propagate user_id (operations/documents/service.py:55-62)

When an external_id match triggers an upsert, update_document is called but user_id is not updated. If a different user uploads a document with the same external_id, the original owner is silently preserved. This may be intentional (ownership doesn't transfer on upsert), but it's worth an explicit comment since it's a security-relevant decision.

3. OpenSearch document ID is derived inconsistently (operations/documents/service.py:176-179, 207)

In _sync_to_opensearch, the OpenSearch request is built with external_id=doc.id (the PG ULID). Then _delete_from_opensearch constructs f"doc_{graph_id}_{document_id}". These two formats need to stay perfectly aligned — if the search service changes how it constructs the document ID from external_id, deletes will miss. The ID derivation logic is split across the service and the search service. A comment or constant would help, or better, the search service should return the OS doc ID on upload so delete can use it directly.


Design Concerns

4. Two database sessions per write request (routers/graphs/documents.py:44-60, 145-161)

_resolve_tier opens and closes its own SessionFactory() session, then the endpoint immediately opens a second session for the document operation. That's two DB connections per upload_document and upload_documents_bulk call. Consider passing the existing session into _resolve_tier, or reading graph_tier within the DocumentService using the same session. Fix this →

5. No pagination on list_documents (routers/graphs/documents.py:96-114)

Document.get_by_graph returns all documents without any LIMIT/OFFSET. For graphs with hundreds or thousands of documents this will be a large response. Consider adding limit and offset query parameters to the endpoint now before this endpoint is in use.

6. Document.get_by_id is orphaned (models/core/document/document.py:105-107)

get_by_id is never called by the service layer — all access goes through get_by_id_and_graph (correct for security). This method will be confusing to future readers who wonder why it exists without a graph scope check. Either remove it or add a comment explaining it's only for internal/admin use.


Minor / Style

7. Untyped doc parameter in converter functions (routers/graphs/documents.py:63, 77)

def _document_to_list_item(doc) -> DocumentListItem:
def _document_to_detail(doc) -> DocumentDetailResponse:

Add doc: Document annotations. Helps type checkers and documents intent.

8. Timestamp fields use str instead of datetime (models/api/search.py:150-151, 176-177)

DocumentListItem.created_at: str and DocumentDetailResponse.created_at: str accept any string — datetime would give validation for free and let consumers use .isoformat() themselves. Fix this →

9. Redundant onupdate on updated_at (models/core/document/document.py:57-62)

The update() method already sets self.updated_at = datetime.now(UTC) manually before committing. The onupdate=lambda: datetime.now(UTC) column-level hook will also fire on ORM UPDATE statements, causing the field to be set twice (harmlessly, but confusingly). Pick one approach — if manual control is needed, drop onupdate.


Tests

Test coverage is thorough across all three layers. A couple of gaps worth noting:

  • TestBulkUpload is missing from the router tests — the upload_documents_bulk endpoint has no test coverage.
  • test_upserts_on_external_id in TestCreateDocument patches both get_by_external_id and get_by_id_and_graph (via with patch.object) but the test structure makes the latter implicit. Worth making the dependency explicit.

Summary

Severity Count
Bug / Correctness 3
Design 3
Minor / Style 3
Test gaps 1

The fragile error detection (#1) and two-session pattern (#4) are the ones I'd prioritize before merging. The rest are low-risk improvements that could be addressed in follow-up.

Comment thread tests/operations/documents/test_document_service.py Fixed
Comment thread tests/operations/documents/test_document_service.py Fixed
- Removed the onupdate attribute from the updated_at column in the Document model.
- Eliminated the get_by_id method from the Document class to streamline document retrieval.
- Updated error handling in DocumentService to raise KeyError instead of ValueError when a document is not found.
- Modified the _resolve_tier function to accept an existing session, improving session management.
- Enhanced type hints in document-related functions for better clarity and type safety.
@jfrench9 jfrench9 merged commit b2dec89 into main Apr 3, 2026
7 checks passed
@jfrench9 jfrench9 deleted the feature/document-platform-crud branch April 3, 2026 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant