In fastchat/serve/model_worker.py:
def generate_gate(self, params):
    for x in self.generate_stream_gate(params):
        pass
    return json.loads(x[:-1].decode())
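If the generator finishes without yielding anything (e.g., it returns early, or an exception is caught inside it before the first yield), the loop body never runs, x is never bound, and the return statement raises UnboundLocalError. An exception that escapes the generator would instead propagate out of the for loop before the return is reached.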
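
A minimal sketch of the failure and one possible guard, not a patch for the actual file. The stand-in generators (empty_stream, one_chunk_stream), the trailing NUL delimiter, and the shape of the fallback dict (including error_code 1) are assumptions made for illustration; the real worker may use different conventions.

```python
import json


def empty_stream(params):
    """Hypothetical stand-in for generate_stream_gate that yields nothing."""
    return
    yield  # unreachable; its presence makes this function a generator


def one_chunk_stream(params):
    """Hypothetical stand-in that yields one JSON chunk plus a 1-byte delimiter."""
    # Assumption: the delimiter is a NUL byte, which x[:-1] strips off.
    yield json.dumps({"text": "hi", "error_code": 0}).encode() + b"\0"


def generate_gate_unguarded(stream, params):
    # Same shape as the reported code: x is bound only if the loop body runs.
    for x in stream(params):
        pass
    return json.loads(x[:-1].decode())


def generate_gate_guarded(stream, params):
    last = None  # sentinel: stays None if the generator yields nothing
    for last in stream(params):
        pass
    if last is None:
        # Hypothetical fallback payload; the real error format is not known here.
        return {"text": "generator produced no output", "error_code": 1}
    return json.loads(last[:-1].decode())


if __name__ == "__main__":
    try:
        generate_gate_unguarded(empty_stream, {})
    except UnboundLocalError as exc:
        print("unguarded:", type(exc).__name__)
    print("guarded:", generate_gate_guarded(empty_stream, {}))
    print("normal:", generate_gate_guarded(one_chunk_stream, {}))
```

Initializing the loop variable before the loop keeps the change small. Whether to return an error payload or raise a descriptive exception in the empty case is a design choice left to the maintainers.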