# Small Language Models
| Model | Parameters | Perplexity |
|---|---|---|
| LITHE TS | TBD | TBD |
The model is built on a modern, highly optimized decoder-only transformer architecture following the LLaMA design. It features:
- LLaMA-style SwiGLU Feed-Forward Networks (utilizing a gated SiLU activation for higher representation power).
- Rotary Positional Embeddings (RoPE) for robust sequence processing without absolute position vectors.
- RMSNorm for stable and efficient pre-normalization.
- Bias-free Linear Layers, maximizing parameter efficiency and reducing memory overhead.
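The SwiGLU feed-forward block above computes `FFN(x) = W_down(SiLU(W_gate·x) ⊙ (W_up·x))` with bias-free projections. The following is a minimal sketch of that computation on plain `f32` vectors (toy shapes and weights, purely illustrative; the actual model operates on candle tensors):

```rust
// Sketch of a SwiGLU feed-forward step: FFN(x) = W_down(SiLU(W_gate x) * (W_up x)).
// All linear layers are bias-free, matching the architecture described above.

fn silu(x: f32) -> f32 {
    x * (1.0 / (1.0 + (-x).exp())) // SiLU(x) = x * sigmoid(x)
}

/// Bias-free matrix-vector product; `w` is row-major [rows x cols].
fn matvec(w: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    w.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum())
        .collect()
}

fn swiglu_ffn(x: &[f32], w_gate: &[Vec<f32>], w_up: &[Vec<f32>], w_down: &[Vec<f32>]) -> Vec<f32> {
    let gate: Vec<f32> = matvec(w_gate, x).into_iter().map(silu).collect();
    let up = matvec(w_up, x);
    // Element-wise gating is what gives SwiGLU its extra representation power.
    let hidden: Vec<f32> = gate.iter().zip(&up).map(|(g, u)| g * u).collect();
    matvec(w_down, &hidden)
}

fn main() {
    // Toy 2-dim model with a 3-dim hidden layer (illustrative numbers only).
    let x = vec![1.0, -1.0];
    let w_gate = vec![vec![0.5, 0.0], vec![0.0, 0.5], vec![0.5, 0.5]];
    let w_up = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]];
    let w_down = vec![vec![1.0, 1.0, 1.0], vec![0.5, 0.5, 0.5]];
    println!("{:?}", swiglu_ffn(&x, &w_gate, &w_up, &w_down));
}
```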
The model utilizes a custom, compact tokenizer with a vocabulary size of 2048 tokens. This highly optimized vocabulary is designed to tightly fit its training corpus (focused on simplified language, such as TinyStories), ensuring maximum token efficiency and minimizing the memory footprint of the embedding layers.
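To see why the 2048-token vocabulary matters at this scale, a back-of-the-envelope count of embedding parameters (`vocab_size × hidden_dim`) is instructive. The hidden size of 256 below is a hypothetical figure for illustration, not the model's actual configuration:

```rust
// Embedding table cost: params = vocab_size * hidden_dim.
// A tied output head would roughly double this if untied.
fn embedding_params(vocab_size: usize, hidden_dim: usize) -> usize {
    vocab_size * hidden_dim
}

fn main() {
    let hidden_dim = 256; // hypothetical, for illustration only
    for vocab in [2048usize, 32000] {
        println!(
            "vocab {:>6}: {:>9} embedding parameters",
            vocab,
            embedding_params(vocab, hidden_dim)
        );
    }
}
```

At this hypothetical hidden size, a 2048-token vocabulary needs ~0.5M embedding parameters, while a LLaMA-sized 32k vocabulary would need ~8.2M, which dominates a tiny model's budget.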
The pretrained model weights have not been published yet; this is a work in progress.
```shell
# Build
cargo build --release

# Show model info (parameter count, config)
./target/release/lithe --info

# Basic generation
./target/release/lithe --prompt "Once upon a time"

# Focused / deterministic output (low temperature)
./target/release/lithe --prompt "The sky is" --temperature 0.3

# Creative / more random output
./target/release/lithe --prompt "In the year 2050" --temperature 1.2 --top-p 0.95

# Generate more tokens
./target/release/lithe --prompt "Hello" --max-tokens 500

# Help
./target/release/lithe --help
```

| Option | Default | Description |
|---|---|---|
| `--prompt` / `-p` | `"Once upon a time"` | Input prompt text |
| `--temperature` | `0.8` | Sampling temperature. Lower = focused, higher = creative |
| `--top-p` | `0.9` | Nucleus sampling threshold. `1.0` = disabled |
| `--max-tokens` | `200` | Maximum number of new tokens to generate |
| `--model` | `model/model.safetensors` | Path to model weights |
| `--config` | `model/config.json` | Path to model config |
| `--tokenizer` | `model/tokenizer.json` | Path to tokenizer |
| `--info` | — | Print model config and parameter count, then exit |
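The `--temperature` and `--top-p` flags interact: temperature rescales the logits before softmax, and nucleus sampling then keeps only the smallest set of tokens whose cumulative probability reaches `top_p`. A minimal sketch of this mechanism (not lithe's exact implementation, which may differ in tie-breaking and sampling details):

```rust
// Temperature scaling + nucleus (top-p) candidate selection over raw logits.

fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Return the token indices kept by nucleus sampling at threshold `top_p`,
/// after dividing logits by `temperature`.
fn nucleus(logits: &[f32], temperature: f32, top_p: f32) -> Vec<usize> {
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    let probs = softmax(&scaled);
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    // Sort token indices by descending probability.
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let (mut kept, mut cum) = (Vec::new(), 0.0f32);
    for i in idx {
        kept.push(i);
        cum += probs[i];
        if cum >= top_p {
            break; // nucleus reached: remaining tokens are excluded
        }
    }
    kept
}

fn main() {
    let logits = [2.0, 1.0, 0.2, -1.0];
    // Low temperature sharpens the distribution, so fewer tokens survive top-p;
    // high temperature flattens it, widening the candidate set.
    println!("T=0.3 -> {:?}", nucleus(&logits, 0.3, 0.9));
    println!("T=1.2 -> {:?}", nucleus(&logits, 1.2, 0.9));
}
```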
To load a different model version:

```shell
./target/release/lithe --model model-v2.safetensors --config config-v2.json --prompt "Hello"
```

Add to your `Cargo.toml`:

```toml
[dependencies]
lithe = { git = "https://github.com/radevgit/lithe.git" }
```

Basic usage:
```rust
use lithe::{Config, Generator, Model, SamplingParams};
use candle_core::Device;
use tokenizers::Tokenizer;

fn main() -> anyhow::Result<()> {
    // Load the model configuration, weights, and tokenizer.
    let cfg = Config::from_file("model/config.json")?;
    let device = Device::Cpu;
    let model = Model::load(&cfg, "model/model.safetensors", &device)?;
    let tokenizer = Tokenizer::from_file("model/tokenizer.json")
        .map_err(|e| anyhow::anyhow!("{}", e))?;

    let generator = Generator::new(model, tokenizer, device);
    let params = SamplingParams {
        temperature: 0.8,
        top_p: 0.9,
        max_new_tokens: 200,
    };

    let output = generator.generate("Once upon a time", &params)?;
    println!("{}", output);
    Ok(())
}
```