Fine-tune Qwen3-8B on your posts and auto-post to Bluesky.
This is the original Python implementation that inspired the Elixir version.
## Features

- Scrape your existing posts from Bluesky/Twitter
- Fine-tune Qwen3-8B-4bit using LoRA on Apple Silicon
- Generate new posts in your voice
- Post automatically to Bluesky on a schedule
## Requirements

- Apple Silicon Mac (M1/M2/M3/M4)
- Python 3.10+
- ~8GB RAM for training, ~5GB for inference
## Setup

```bash
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your Bluesky credentials
```

Get an app password: Bluesky → Settings → App Passwords → Add App Password
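For scripts that read these credentials, loading them from the environment might look like the sketch below. The variable names `BLUESKY_HANDLE` and `BLUESKY_APP_PASSWORD` are assumptions — match whatever `.env.example` actually defines.

```python
import os

def load_credentials():
    """Read Bluesky credentials from environment variables (loaded from .env)."""
    # BLUESKY_HANDLE / BLUESKY_APP_PASSWORD are assumed names, not the repo's.
    handle = os.getenv("BLUESKY_HANDLE")
    password = os.getenv("BLUESKY_APP_PASSWORD")
    if not handle or not password:
        raise RuntimeError("Missing BLUESKY_HANDLE or BLUESKY_APP_PASSWORD")
    return handle, password
```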
## Usage

```bash
# Fetch your Bluesky posts
python fetch_bluesky_posts.py @your-handle.bsky.social -o bluesky_posts.jsonl

# If you have Twitter data, combine them
python combine_data.py

# Train with LoRA (takes 1-2 hours on M1 Pro)
mlx_lm.lora --config qwen3_4bit_config.yaml

# Fuse LoRA weights into base model
python merge_lora.py

# Preview a generated post
python bluesky_bot.py generate

# Post to Bluesky
python bluesky_bot.py post

# Run on schedule (every 4 hours)
python bluesky_bot.py schedule --interval 4
```

## Key Files

| File | Purpose |
|---|---|
| `bluesky_bot.py` | Main bot - generate and post |
| `fetch_bluesky_posts.py` | Scrape posts from Bluesky |
| `combine_data.py` | Merge Twitter + Bluesky data |
| `merge_lora.py` | Fuse LoRA adapters into model |
| `qwen3_4bit_config.yaml` | Training configuration |
| `requirements.txt` | Python dependencies |
## Training Configuration

```yaml
model: lmstudio-community/Qwen3-8B-MLX-4bit
batch_size: 1
grad_accumulation_steps: 8  # Effective batch = 8
iters: 2000
learning_rate: 1e-5
max_seq_length: 256
lora_parameters:
  rank: 16
  dropout: 0.05
  scale: 32.0
```

Why these settings:

- 4-bit model: Fits in 5GB, trains on consumer Macs
- LoRA rank 16: Good quality without massive VRAM
- Learning rate 1e-5: Low to preserve base model knowledge
- max_seq 256: Posts are short, no need for long context
## Data Format

Training data is ChatML format in JSONL:

```json
{"messages": [
  {"role": "user", "content": "Write a tweet in your authentic voice."},
  {"role": "assistant", "content": "your actual post here"}
]}
```

The fetch script uses varied prompts to help the model generalize.
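A minimal sketch of what that conversion might look like. The prompt list here is illustrative — the actual fetch script uses its own set of prompts.

```python
import json
import random

# Hypothetical prompt variations; the real fetch script defines its own.
PROMPTS = [
    "Write a tweet in your authentic voice.",
    "Write a short post in your usual style.",
    "Share a thought the way you normally would.",
]

def to_chatml_jsonl(posts, seed=0):
    """Convert raw post texts into ChatML-formatted JSONL lines."""
    rng = random.Random(seed)
    lines = []
    for text in posts:
        record = {"messages": [
            {"role": "user", "content": rng.choice(PROMPTS)},
            {"role": "assistant", "content": text},
        ]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

Varying the user prompt per example keeps the model from memorizing a single trigger phrase and helps it generalize to any "write a post" instruction.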
## Running as a Service

Run the bot as a background service:

```bash
# Install LaunchAgent
./install.sh

# Check status
launchctl list | grep bluesky

# View logs
tail -f ~/Desktop/bskybot.log

# Uninstall
./uninstall.sh
```

## Generation Settings

```python
GENERATION_CONFIG = {
    "temp": 0.8,        # Higher = more random
    "top_p": 0.9,       # Nucleus sampling
    "max_tokens": 280,  # Bluesky limit is 300
}
```

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      Training Pipeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │     Your     │    │    ChatML    │    │   Combined   │   │
│  │    Posts     │ -> │    Format    │ -> │   Dataset    │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                 │           │
│                                                 ▼           │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │   Qwen3-8B   │    │     LoRA     │    │   Trained    │   │
│  │    4-bit     │ -> │   Training   │ -> │   Adapters   │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                 │           │
│                                                 ▼           │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │    Base +    │    │     Fuse     │    │    Fused     │   │
│  │   Adapters   │ -> │   Weights    │ -> │    Model     │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                     Inference Pipeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │    Prompt    │    │    MLX-LM    │    │  Generated   │   │
│  │   Template   │ -> │   Generate   │ -> │     Post     │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                 │           │
│                                                 ▼           │
│                                          ┌──────────────┐   │
│                                          │   Bluesky    │   │
│                                          │   API Post   │   │
│                                          └──────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
## Why MLX?

MLX is Apple's ML framework optimized for M-series chips:
- Unified Memory: CPU and GPU share RAM - no copying
- Lazy Evaluation: Builds compute graph, executes efficiently
- Native Quantization: 4-bit inference with minimal quality loss
## Why LoRA?

Instead of training all 8B parameters:
- Freeze the base model weights
- Add small "adapter" matrices (rank 16)
- Only train the adapters (~0.1% of params)
- Merge adapters back into model after training
This lets you fine-tune on a laptop in 1-2 hours.
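Rough arithmetic behind the "~0.1% of params" claim — a sketch with illustrative shapes (not Qwen3's exact config), assuming rank-16 adapters on the four attention projections per layer:

```python
def lora_params(d_in, d_out, rank):
    # Each adapted matrix gets two low-rank factors: A (d_in x rank), B (rank x d_out)
    return d_in * rank + rank * d_out

hidden = 4096    # assumed hidden size
n_layers = 36    # assumed layer count
# Four attention projections (q, k, v, o) adapted per layer, all treated
# as square here for simplicity; with GQA the k/v projections are smaller.
per_layer = 4 * lora_params(hidden, hidden, 16)
total_lora = n_layers * per_layer

base = 8_000_000_000
print(f"LoRA params: {total_lora:,} ({100 * total_lora / base:.2f}% of base)")
```

Under these assumptions the trainable adapters come to roughly 19M parameters — a fraction of a percent of the 8B base, which is why the training fits in laptop memory.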
## Inference Example

```python
from mlx_lm import load, generate

model, tokenizer = load("./fused_model")
response = generate(
    model,
    tokenizer,
    prompt="<|im_start|>user\nWrite a post<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=280,
    temp=0.8,
)
```

Note: the library has since updated. Use the new sampler API:

```python
from mlx_lm.sample_utils import make_sampler

sampler = make_sampler(temp=0.8, top_p=0.9)
```

## Troubleshooting

Out of memory during training:

- Reduce batch_size to 1
- Reduce max_seq_length to 128
- Close other applications
Generated posts too long:

- Reduce max_tokens to 200
- Add "keep it brief" to the prompt
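If a post still comes out too long, a hard cap as a post-processing step is a simple safety net. A sketch — the 300-character cap matches Bluesky's limit, but note Bluesky counts graphemes, so Python `len` is only an approximation, and the boundary heuristic here is an assumption:

```python
MAX_LEN = 300  # Bluesky's post length limit

def clamp_post(text, max_len=MAX_LEN):
    """Truncate a generated post, preferring a sentence or word boundary."""
    if len(text) <= max_len:
        return text
    cut = text[:max_len]
    # Prefer ending at the last complete sentence, if one ends past the midpoint.
    for sep in (". ", "! ", "? "):
        idx = cut.rfind(sep)
        if idx > max_len // 2:
            return cut[:idx + 1]
    # Otherwise drop the trailing partial word.
    return cut.rsplit(" ", 1)[0]
```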
## The Elixir Version

This Python version works great, but I wanted to run inference in Elixir. That required:
- Forking EMLX to add quantization NIFs
- Writing a safetensors parser
- Implementing Qwen3 architecture in Elixir
- Building a Phoenix app around it
See `bobby_posts` for the insane Elixir version.
## License

MIT