This repository was archived by the owner on Jul 4, 2025. It is now read-only.

idea: Production Level Queue System #580

@dan-menlo

Objective

  • Do we need a queue system that scales to thousands of requests?

Motivation

Nullpointer Errors?

  • Currently, inference requests are handled in FIFO order
  • We are adopting an OpenAI API, which means that we will receive requests across Chat, Audio, Vision, etc.
  • Given that users are on laptops with limited RAM and VRAM, we are likely to have to switch models

Preparing for Cloud Native

  • Our long-term future is likely as an enterprise OpenAI alternative, which will be multi-user and have a queue system
  • Should we bake in this abstraction now, and use a local file-based queue (which is later swapped out for a more sophisticated queue)?
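
To make the "bake in the abstraction" idea concrete, here is a minimal sketch of what a swappable queue could look like. Everything in it is an assumption for illustration: the names `RequestQueue` and `FileQueue`, the one-JSON-file-per-request layout, and the request fields are hypothetical and not part of the existing codebase. It also assumes a single producer process and does no cross-process locking.

```python
import json
import os
import time
import uuid
from abc import ABC, abstractmethod
from typing import Optional


class RequestQueue(ABC):
    """Hypothetical interface; a more sophisticated backend could implement the same methods."""

    @abstractmethod
    def enqueue(self, request: dict) -> str:
        """Store a request and return its id."""

    @abstractmethod
    def dequeue(self) -> Optional[dict]:
        """Return the oldest request, or None if the queue is empty."""


class FileQueue(RequestQueue):
    """FIFO queue that stores each request as one JSON file in a directory."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        self._counter = 0  # tie-breaker when two requests share a timestamp
        os.makedirs(directory, exist_ok=True)

    def enqueue(self, request: dict) -> str:
        request_id = uuid.uuid4().hex
        self._counter += 1
        # Zero-padded timestamp and counter make file-name order match arrival order.
        name = f"{time.time_ns():020d}-{self._counter:06d}-{request_id}.json"
        final_path = os.path.join(self.directory, name)
        tmp_path = final_path + ".tmp"
        # Write to a temporary file, then rename, so a reader never sees a partial file.
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump({"id": request_id, "payload": request}, f)
        os.replace(tmp_path, final_path)
        return request_id

    def dequeue(self) -> Optional[dict]:
        names = sorted(n for n in os.listdir(self.directory) if n.endswith(".json"))
        if not names:
            return None
        path = os.path.join(self.directory, names[0])
        with open(path, encoding="utf-8") as f:
            item = json.load(f)
        os.remove(path)
        return item
```

Callers would depend only on `RequestQueue`, for example `queue.enqueue({"endpoint": "chat", "model": "some-model"})` followed by `queue.dequeue()` in the inference loop, so replacing `FileQueue` with a different backend later would not touch the request-handling code.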
