Fine-tune Qwen3-8B on your posts and auto-post to Bluesky.
This is the original Python implementation that inspired the Elixir version.
## Features

- Scrape your existing posts from Bluesky/Twitter
- Fine-tune Qwen3-8B-4bit using LoRA on Apple Silicon
- Generate new posts in your voice
- Post automatically to Bluesky on a schedule
## Requirements

- Apple Silicon Mac (M1/M2/M3/M4)
- Python 3.10+
- ~8GB RAM for training, ~5GB for inference
## Setup

```bash
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your Bluesky credentials
```

Get an app password: Bluesky → Settings → App Passwords → Add App Password
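For scripts that read these credentials, loading them from the environment might look like the sketch below. The variable names `BLUESKY_HANDLE` and `BLUESKY_APP_PASSWORD` are assumptions — match whatever `.env.example` actually defines.

```python
import os

def load_credentials():
    """Read Bluesky credentials from environment variables (loaded from .env)."""
    # BLUESKY_HANDLE / BLUESKY_APP_PASSWORD are assumed names, not the repo's.
    handle = os.getenv("BLUESKY_HANDLE")
    password = os.getenv("BLUESKY_APP_PASSWORD")
    if not handle or not password:
        raise RuntimeError("Missing BLUESKY_HANDLE or BLUESKY_APP_PASSWORD")
    return handle, password
```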
## Usage

```bash
# Fetch your Bluesky posts
python fetch_bluesky_posts.py @your-handle.bsky.social -o bluesky_posts.jsonl

# If you have Twitter data, combine them
python combine_data.py

# Train with LoRA (takes 1-2 hours on M1 Pro)
mlx_lm.lora --config qwen3_4bit_config.yaml

# Fuse LoRA weights into base model
python merge_lora.py

# Preview a generated post
python bluesky_bot.py generate

# Post to Bluesky
python bluesky_bot.py post

# Run on schedule (every 4 hours)
python bluesky_bot.py schedule --interval 4
```

## Key Files

| File | Purpose |
|---|---|
| `bluesky_bot.py` | Main bot - generate and post |
| `fetch_bluesky_posts.py` | Scrape posts from Bluesky |
| `combine_data.py` | Merge Twitter + Bluesky data |
| `merge_lora.py` | Fuse LoRA adapters into model |
| `qwen3_4bit_config.yaml` | Training configuration |
| `requirements.txt` | Python dependencies |
## Training Configuration

```yaml
model: lmstudio-community/Qwen3-8B-MLX-4bit
batch_size: 1
grad_accumulation_steps: 8  # Effective batch = 8
iters: 2000
learning_rate: 1e-5
max_seq_length: 256
lora_parameters:
  rank: 16
  dropout: 0.05
  scale: 32.0
```

Why these settings:

- 4-bit model: Fits in 5GB, trains on consumer Macs
- LoRA rank 16: Good quality without massive VRAM
- Learning rate 1e-5: Low to preserve base model knowledge
- max_seq 256: Posts are short, no need for long context
## Data Format

Training data is ChatML format in JSONL:

```json
{"messages": [
  {"role": "user", "content": "Write a tweet in your authentic voice."},
  {"role": "assistant", "content": "your actual post here"}
]}
```

The fetch script uses varied prompts to help the model generalize.
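A minimal sketch of what that conversion might look like. The prompt list here is illustrative — the actual fetch script uses its own set of prompts.

```python
import json
import random

# Hypothetical prompt variations; the real fetch script defines its own.
PROMPTS = [
    "Write a tweet in your authentic voice.",
    "Write a short post in your usual style.",
    "Share a thought the way you normally would.",
]

def to_chatml_jsonl(posts, seed=0):
    """Convert raw post texts into ChatML-formatted JSONL lines."""
    rng = random.Random(seed)
    lines = []
    for text in posts:
        record = {"messages": [
            {"role": "user", "content": rng.choice(PROMPTS)},
            {"role": "assistant", "content": text},
        ]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

Varying the user prompt per example keeps the model from memorizing a single trigger phrase and helps it generalize to any "write a post" instruction.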
## Running as a Service

Run the bot as a background service:

```bash
# Install LaunchAgent
./install.sh

# Check status
launchctl list | grep bluesky

# View logs
tail -f ~/Desktop/bskybot.log

# Uninstall
./uninstall.sh
```

## Generation Settings

```python
GENERATION_CONFIG = {
    "temp": 0.8,        # Higher = more random
    "top_p": 0.9,       # Nucleus sampling
    "max_tokens": 280,  # Bluesky limit is 300
}
```

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      Training Pipeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │     Your     │    │    ChatML    │    │   Combined   │   │
│  │    Posts     │ -> │    Format    │ -> │   Dataset    │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                 │           │
│                                                 ▼           │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │   Qwen3-8B   │    │     LoRA     │    │   Trained    │   │
│  │    4-bit     │ -> │   Training   │ -> │   Adapters   │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                 │           │
│                                                 ▼           │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │    Base +    │    │     Fuse     │    │    Fused     │   │
│  │   Adapters   │ -> │   Weights    │ -> │    Model     │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                     Inference Pipeline                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │    Prompt    │    │    MLX-LM    │    │  Generated   │   │
│  │   Template   │ -> │   Generate   │ -> │     Post     │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                 │           │
│                                                 ▼           │
│                                          ┌──────────────┐   │
│                                          │   Bluesky    │   │
│                                          │   API Post   │   │
│                                          └──────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
## Why MLX?

MLX is Apple's ML framework optimized for M-series chips:
- Unified Memory: CPU and GPU share RAM - no copying
- Lazy Evaluation: Builds compute graph, executes efficiently
- Native Quantization: 4-bit inference with minimal quality loss
## Why LoRA?

Instead of training all 8B parameters:
- Freeze the base model weights
- Add small "adapter" matrices (rank 16)
- Only train the adapters (~0.1% of params)
- Merge adapters back into model after training
This lets you fine-tune on a laptop in 1-2 hours.
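Rough arithmetic behind the "~0.1% of params" claim — a sketch with illustrative shapes (not Qwen3's exact config), assuming rank-16 adapters on the four attention projections per layer:

```python
def lora_params(d_in, d_out, rank):
    # Each adapted matrix gets two low-rank factors: A (d_in x rank), B (rank x d_out)
    return d_in * rank + rank * d_out

hidden = 4096    # assumed hidden size
n_layers = 36    # assumed layer count
# Four attention projections (q, k, v, o) adapted per layer, all treated
# as square here for simplicity; with GQA the k/v projections are smaller.
per_layer = 4 * lora_params(hidden, hidden, 16)
total_lora = n_layers * per_layer

base = 8_000_000_000
print(f"LoRA params: {total_lora:,} ({100 * total_lora / base:.2f}% of base)")
```

Under these assumptions the trainable adapters come to roughly 19M parameters — a fraction of a percent of the 8B base, which is why the training fits in laptop memory.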
## Inference Example

```python
from mlx_lm import load, generate

model, tokenizer = load("./fused_model")
response = generate(
    model,
    tokenizer,
    prompt="<|im_start|>user\nWrite a post<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=280,
    temp=0.8,
)
```

Note: the library has since updated. Use the new sampler API:

```python
from mlx_lm.sample_utils import make_sampler

sampler = make_sampler(temp=0.8, top_p=0.9)
```

## Troubleshooting

Out of memory during training:

- Reduce batch_size to 1
- Reduce max_seq_length to 128
- Close other applications
Generated posts too long:

- Reduce max_tokens to 200
- Add "keep it brief" to the prompt
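If a post still comes out too long, a hard cap as a post-processing step is a simple safety net. A sketch — the 300-character cap matches Bluesky's limit, but note Bluesky counts graphemes, so Python `len` is only an approximation, and the boundary heuristic here is an assumption:

```python
MAX_LEN = 300  # Bluesky's post length limit

def clamp_post(text, max_len=MAX_LEN):
    """Truncate a generated post, preferring a sentence or word boundary."""
    if len(text) <= max_len:
        return text
    cut = text[:max_len]
    # Prefer ending at the last complete sentence, if one ends past the midpoint.
    for sep in (". ", "! ", "? "):
        idx = cut.rfind(sep)
        if idx > max_len // 2:
            return cut[:idx + 1]
    # Otherwise drop the trailing partial word.
    return cut.rsplit(" ", 1)[0]
```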
## The Elixir Version

This Python version works great, but I wanted to run inference in Elixir. That required:
- Forking EMLX to add quantization NIFs
- Writing a safetensors parser
- Implementing Qwen3 architecture in Elixir
- Building a Phoenix app around it
See `bobby_posts` for the insane Elixir version.
## License

MIT