feat: Add comprehensive workspace and data operation tools for MCP#81
Conversation
- Added new tools for managing workspaces, including creation, deletion, listing, and switching between workspaces, allowing for isolated development environments. - Introduced data operation tools for building fact grids, ingesting files, mapping elements, querying staging tables, and materializing graphs, enhancing data handling capabilities. - Updated OpenAPI tags and documentation to reflect the new tools and their functionalities, improving API clarity and usability. - Refactored existing code to integrate the new tools into the MCP framework, ensuring seamless operation and user experience.
|
@claude please review this PR |
|
Claude finished @jfrench9's task —— View job Code Review CompleteReviewing PR #81: Add comprehensive workspace and data operation tools for MCP SummaryThis PR adds comprehensive workspace management and data operation tools for the MCP (Model Context Protocol) implementation. The implementation is generally solid with good test coverage, but there are several important issues that need to be addressed before merging. Critical Issues1. Database Session Management - Resource Leak Risk 🔴Location: The code uses db = next(get_db_session())
try:
# ... operations ...
finally:
db.close()Problem: This pattern doesn't properly consume the generator, so the cleanup code in Recommendation: Use the proper context manager pattern: from robosystems.database import get_db_session
async with get_db_session() as db:
# ... operations ...
# Session automatically cleaned upOr if async context manager isn't available, use: db_gen = get_db_session()
db = next(db_gen)
try:
# ... operations ...
finally:
try:
next(db_gen)
except StopIteration:
passThis affects:
2. Workspace ID Validation Logic Error 🔴Location: The validation logic prevents deletion of the primary graph, but has a flawed check: # Can't delete primary graph
if workspace_id == parent_graph_id:
return {
"error": "cannot_delete_primary",
...
}Problem: This check occurs AFTER extracting Recommendation: Either:
3. Inconsistent Error Handling Between Tools 🟡workspace.py vs data_tools.py
For example, compare:
Recommendation: Standardize on a consistent error response format across all tools. Suggest: {
"error": "error_code",
"message": "Human readable message",
"details": {...}, # Optional
"hint": "..." # Optional
}Major Issues4. Missing Input Validation 🟡Location: Multiple data_tools.py functions Several tools lack proper input validation:
SQL Injection Risk in # Add LIMIT if not present
if "LIMIT" not in sql.upper():
sql = f"{sql} LIMIT {limit}"This directly concatenates user input. While DuckDB has some protections, this is still risky. Recommendation:
5. Test Coverage Claims vs Reality 🟡PR Description claims: "782+ new test cases" Reality: The test file has 782 lines, but only ~40 actual test functions. This is misleading. The test coverage is actually:
Missing test coverage:
Minor Issues6. Incomplete Tool Implementations 🟡Location: data_tools.py:192-235, data_tools.py:324-338 Two tools return "requires client SDK" messages instead of actual implementations:
Question: Why include these in the MCP tool set if they can't be executed? Consider either:
7. Hardcoded Magic Values 🟡Location: data_tools.py:318 for assoc in mapping_response.structure.associations[:50]The limit of 50 associations is hardcoded without explanation. Recommendation: Extract to a named constant with documentation: MAX_ASSOCIATION_PREVIEW = 50 # Limit response size for MCP protocol8. Inconsistent Docstring Quality 🟡Some functions have excellent docstrings (e.g., 9. Import Organization 🟢Location: data_tools.py:13 import pandas as pdPandas is imported at module level but only used in one function ( def execute(self, arguments):
import pandas as pd
# ...This reduces import overhead when these tools aren't used. 10. README Documentation 🟢Location: README.md:179-270 The README additions are comprehensive and well-written. Great job documenting the new tools! Minor suggestion: Add a "Prerequisites" or "Limitations" section noting:
Positive Aspects ✅
Recommendations SummaryMust Fix Before Merge:
Should Fix Before Merge:
Consider for Follow-up:
Security Review
Files Reviewed
ConclusionThis is a substantial and well-architected addition to the MCP system. The code quality is generally good, but the database session management issue is critical and must be fixed before merge to prevent resource leaks in production. The SQL injection risk should also be addressed. Once the critical issues are resolved, this will be a valuable enhancement to the platform's MCP capabilities. Recommendation: Request changes for critical issues #1, #2, and #4. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # Build Cypher query to retrieve facts | ||
| element_filter = ", ".join([f"'{e}'" for e in elements]) | ||
| period_filter = ", ".join([f"'{p}'" for p in periods]) | ||
|
|
||
| query = f""" | ||
| MATCH (f:Fact)-[:FACT_HAS_ELEMENT]->(el:Element) | ||
| MATCH (f)-[:FACT_HAS_PERIOD]->(p:Period) | ||
| WHERE el.uri IN [{element_filter}] | ||
| AND p.period_end IN [{period_filter}] |
There was a problem hiding this comment.
Parameterize fact grid filters to avoid Cypher injection
The fact-grid tool builds its Cypher by interpolating elements and periods directly into the f-string (element_filter = ", ".join([f"'{e}'" for e in elements]) and the subsequent WHERE el.uri IN [...] AND p.period_end IN [...]). Any value containing quotes or Cypher syntax will break the query or allow injection against the target graph because the strings are not escaped or passed as parameters. A malicious or even just oddly formatted element URI would cause query errors or unintended query execution whenever build-fact-grid is invoked.
Useful? React with 👍 / 👎.
| try: | ||
| # Delete subgraph using SubgraphService (same as API routers) | ||
| service = SubgraphService() | ||
| await service.delete_subgraph_database( | ||
| subgraph_id=workspace_id, force=force, create_backup=False |
There was a problem hiding this comment.
Enforce parent authorization before deleting workspaces
DeleteWorkspaceTool invokes SubgraphService.delete_subgraph_database on whatever workspace_id is supplied without checking that the ID belongs to the current graph or that the caller has admin rights. With only a valid-looking ID (e.g., kg_othergraph_dev), a user tied to any graph can trigger deletion of another graph’s subgraph, unlike the REST endpoint that verifies GraphUser admin access. This lacks tenant isolation and could delete other customers’ workspaces.
Useful? React with 👍 / 👎.
- Introduced validation for rows and columns in the BuildFactGridTool to ensure correct data structure. - Updated Cypher query execution to use parameterized statements, improving security against injection attacks. - Added validation for SQL query limits in QueryStagingTool to enforce safe bounds. - Implemented table name format validation in MaterializeGraphTool to restrict to alphanumeric characters and underscores. - Enhanced error handling in DeleteWorkspaceTool to provide clearer messages for authorization and workspace existence checks. - Refactored database session management to ensure proper cleanup and prevent resource leaks.
|
@claude have we addressed all issues? |
|
Claude finished @jfrench9's task —— View job Issue Resolution Status - Complete ReviewAll critical and major issues from the previous review have been addressed! Here's the comprehensive status: ✅ All Critical Issues RESOLVED1. ✅ Database Session Management (Critical Issue #1)Status: FIXED The database session management has been properly fixed using the correct generator consumption pattern: db_gen = get_db_session()
db = next(db_gen)
try:
# ... operations ...
finally:
try:
next(db_gen)
except StopIteration:
passFixed in all locations:
2. ✅ Authorization Enforcement (P1 Security - Codex Review)Status: FIXED Excellent security implementation! The
This completely addresses the P1 security issue where any user could delete another tenant's workspace. 3. ✅ Cypher Injection Prevention (P1 Security - Codex Review)Status: FIXED Excellent security fix! The
This properly prevents Cypher injection by separating query structure from user data. ✅ All Major Issues RESOLVED4. ✅ Input Validation (Major Issue #4)Status: FIXED BuildFactGridTool validation (data_tools.py):
MaterializeGraphTool validation (data_tools.py):
QueryStagingTool validation (data_tools.py):
5.
|
…se structure - Updated the tool's description to clarify functionality and added a new parameter, `include_summary`, to optionally include aggregated statistics (count, total, avg, min, max) by element in the response. - Modified the Cypher query to include unit information and adjusted the response format to return structured data records. - Removed unused tools from the KuzuMCPTools class to streamline the codebase.
- Corrected the default value for the `include_summary` parameter in the BuildFactGridTool to use `False` instead of `false`. - Improved readability of the conditional statement for including summary statistics in the tool's output. - Updated test assertions in TestKuzuMCPTools to reflect the correct number of tools and adjusted error handling for missing elements in the build-fact-grid tool call.
Summary
This PR introduces a comprehensive set of workspace management and data operation tools for the Model Context Protocol (MCP) implementation, significantly expanding the system's capabilities for file management, data processing, and workspace organization.
Key Accomplishments
New Workspace Tools (
workspace.py)Enhanced Data Tools (
data_tools.py)Infrastructure Improvements
Testing Coverage
Breaking Changes
None. All changes are additive and maintain backward compatibility with existing MCP implementations.
Infrastructure Considerations
The implementation maintains the existing architectural patterns while significantly expanding the system's capabilities for workspace management and data operations.
🤖 Generated with Claude Code
Branch Info:
feature/subgraph-workspace-mcp-toolsmainCo-Authored-By: Claude noreply@anthropic.com