Pre-submission Checklist
- I have verified this would not be more appropriate as a feature request in a specific repository
- I have searched existing discussions to avoid duplicates
Your Idea
Summary
This proposal suggests a minimal, optional MCP extension that helps tool developers answer one practical question:
“Is my tool slowing down the conversation between the user and the LLM?”
The goal is not deep tracing or model telemetry.
The goal is simply to preserve the perceived instantaneity of conversational UX as external tools become more common.
Problem
As a tool developer, when an interaction feels slow to users, it is currently difficult to know whether the delay comes from:
- the tool backend,
- network transport,
- UI rendering, or
- factors outside the developer’s control.
Because MCP tools are inserted into live conversations, even small delays can break the feeling of instant interaction. Developers need a lightweight way to understand their contribution to total latency — without requiring insight into LLM internals.
Design Goals
- Help tool developers measure conversation-level latency.
- Avoid exposing internal LLM routing or orchestration.
- Keep the extension minimal and optional.
- Require no heavy observability stack.
Non-Goals:
- No OpenTelemetry replacement.
- No internal model performance data.
- No provider infrastructure exposure.
Proposed Minimal Concept
An optional object propagated during tool calls:
```yaml
conversation_timing:
  trace_id: string
  timestamps:
    user_intent_received: number
    tool_request_sent: number
    tool_response_received: number
    ui_render_start: number
```
Key principles:
- The LLM runtime MAY create a trace_id.
- MCP servers MAY append observable timestamps.
- Clients MAY ignore the field entirely.
- Only externally observable lifecycle events are included.
This is conceptually similar to HTTP “Server-Timing”, but scoped to conversational flows.
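As a sketch only (none of this is part of MCP today), the proposed object could be typed as follows; the field names mirror the YAML above, and the `toolShare` helper is purely illustrative:

```typescript
// Hypothetical shape of the proposed conversation_timing payload.
// All timestamps are assumed to be epoch milliseconds.
interface ConversationTiming {
  trace_id: string;
  timestamps: {
    user_intent_received: number;   // user's message observed by the runtime
    tool_request_sent: number;      // runtime dispatches the MCP tool call
    tool_response_received: number; // tool result arrives back
    ui_render_start: number;        // client begins rendering the response
  };
}

// Derive the two numbers a tool developer cares about:
// their tool's share of the total perceived latency.
function toolShare(t: ConversationTiming): { toolMs: number; totalMs: number } {
  const ts = t.timestamps;
  return {
    toolMs: ts.tool_response_received - ts.tool_request_sent,
    totalMs: ts.ui_render_start - ts.user_intent_received,
  };
}
```

Because only externally observable events appear, the runtime never has to reveal how it routed or orchestrated the request internally.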
Example Use Case
A developer builds a tool that adds a map UI.
Users report the conversation feels slower.
With minimal conversation timing, the developer could see:
- total conversation latency
- tool processing duration
- UI rendering delay
without needing any visibility into the LLM’s internal processing.
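A hypothetical server-side helper shows how an MCP server MAY append the two timestamps it can observe; `withTiming` and the pass-through `timing` object are sketches for this proposal, not any real SDK API:

```typescript
// Timestamps the server appends; other parties may add their own keys.
type Timestamps = Record<string, number>;

// Wrap a tool handler so its observable start/end times are stamped
// onto a pass-through conversation_timing object, even if it throws.
async function withTiming<T>(
  timing: { timestamps: Timestamps },
  handler: () => Promise<T>,
): Promise<T> {
  timing.timestamps.tool_request_sent = Date.now();
  try {
    return await handler();
  } finally {
    timing.timestamps.tool_response_received = Date.now();
  }
}
```

The map-tool developer could then subtract these from the client-stamped values to see whether the slowdown lives in the tool backend or in UI rendering.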
Benefits
- Helps developers ensure tools do not degrade conversational UX.
- Encourages performance-aware tool design.
- Complements (but does not replace) deeper tracing approaches such as OpenTelemetry.
- Low complexity and backward compatible.
Why This Matters
External tools are a recent addition to conversational systems.
Maintaining a fast, natural interaction loop is critical for adoption.
A minimal “conversation timing” signal could give developers actionable insight while respecting provider boundaries.
Open Questions
- Should timing propagation reuse existing trace context formats (e.g. W3C Trace Context)?
- Should timestamps be absolute (epoch ms) or relative durations?
- Would this live in MCP core or as a registry convention?
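On the first question, one candidate is the W3C Trace Context `traceparent` header, whose wire format is `version-traceid-parentid-flags`; a minimal parser (the `parseTraceparent` helper is illustrative, but the format itself is from the W3C spec) would look like:

```typescript
// Parse a W3C Trace Context "traceparent" header, e.g.
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
// Returns null for anything that does not match the spec's grammar.
function parseTraceparent(
  header: string,
): { traceId: string; parentId: string } | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return { traceId: m[2], parentId: m[3] };
}
```

Reusing this format would let a conversation-level `trace_id` interoperate with existing tracing backends without adopting a full observability stack.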
Thank you for considering this lightweight developer-focused idea.