Description
The litellm backend can return streaming responses from inference calls rather than only synchronous ones.

Let us implement an optional capability in chatPanel which (for the litellm backends only, for now) sends `stream=True` and streams responses back to users as they arrive.
It's important to note that for Qwen (not sure about all models), "thinking" is wrapped in a `<think> ... </think>` tag. Not sure if that is always true, but if a stream starts with `<think>`, let's show a circle spinner/accordion which users can open to watch the thinking stream as it comes in, but which stays closed by default until the end of thinking. A spinning or breathing animation would be nice so users are aware that work is happening in the background. Then stream the rest of the response to the output as usual.
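As a rough sketch of the spinner/accordion logic, the stream could be fed through a small state machine that splits incoming text into "thinking" and "answer" portions. The `ThinkSplitter` name and its behavior are hypothetical (not existing chatPanel code), and it assumes at most one `<think>` block at the very start of the stream, as described above for Qwen:

```python
OPEN, CLOSE = "<think>", "</think>"

class ThinkSplitter:
    """Hypothetical helper: incrementally split a streamed response into
    text inside <think>...</think> (drive the spinner/accordion) and
    visible answer text (stream to the output)."""

    def __init__(self):
        self.buf = ""
        self.state = "start"  # start -> thinking -> answer

    def feed(self, chunk):
        """Feed one streamed chunk; return (thinking_text, answer_text)."""
        thinking, answer = "", ""
        self.buf += chunk
        while self.buf:
            if self.state == "start":
                if self.buf.startswith(OPEN):
                    self.buf = self.buf[len(OPEN):]
                    self.state = "thinking"
                elif OPEN.startswith(self.buf):
                    break  # could still become "<think>"; wait for more
                else:
                    self.state = "answer"  # no think block at all
            elif self.state == "thinking":
                i = self.buf.find(CLOSE)
                if i != -1:
                    thinking += self.buf[:i]
                    self.buf = self.buf[i + len(CLOSE):]
                    self.state = "answer"
                else:
                    # flush all but a possible partial "</think>" suffix
                    keep = 0
                    for k in range(1, len(CLOSE)):
                        if self.buf.endswith(CLOSE[:k]):
                            keep = k
                    if keep:
                        thinking += self.buf[:-keep]
                        self.buf = self.buf[-keep:]
                    else:
                        thinking += self.buf
                        self.buf = ""
                    break
            else:  # answer
                answer += self.buf
                self.buf = ""
        return thinking, answer
```

The UI would append the first element of each tuple to the collapsed accordion and the second to the visible output; tag boundaries can fall anywhere across chunks, which is why the splitter buffers partial tags.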
This is working Python code that uses `requests` to call litellm and print the streamed response in real time:

```python
import json

import requests as r

chat2 = r.post(
    "<litellmurl>/v1/chat/completions",
    headers={...},
    json={
        "model": "qwen3-32b",
        "messages": [{"role": "user", "content": "prompt?"}],
        "stream": True,  # ask litellm for a streaming response
    },
    stream=True,  # tell requests not to buffer the whole body
)
for line in chat2.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: ") and decoded != "data: [DONE]":
            chunk = json.loads(decoded[6:])  # strip the "data: " prefix
            content = chunk["choices"][0]["delta"].get("content", "")
            print(content, end="", flush=True)
```
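For the chatPanel implementation it may be cleaner to factor the SSE parsing out of the print loop into a reusable generator. A minimal sketch (the `iter_deltas` name is ours, not an existing API; it accepts any iterable of byte lines such as `response.iter_lines()`):

```python
import json

def iter_deltas(lines):
    """Yield content deltas from an OpenAI-style SSE stream of byte lines.

    Sketch of factoring the streaming loop into a reusable generator;
    `lines` can be `response.iter_lines()` from a stream=True request.
    """
    for line in lines:
        if not line:
            continue  # skip SSE keep-alive blank lines
        decoded = line.decode("utf-8")
        if not decoded.startswith("data: "):
            continue  # ignore non-data SSE fields
        payload = decoded[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel sent by the server
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content", "")
        if delta:
            yield delta
```

The original loop then becomes `for content in iter_deltas(chat2.iter_lines()): print(content, end="", flush=True)`, and the same generator can feed the think-tag handling described above.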