Generated: 2026-05-12 09:15 UTC
Focus: Improve Pareto-optimal ratios from current baseline
| Benchmark | Pareto-Optimal | Total Configs | Pass Rate | Status |
|---|---|---|---|---|
| Conversational | 2 | 9 | 22% | ❌ CRITICAL |
| Vision | 4 | 8 | 50% | |
| Scientific | 6 | 12 | 50% | |
| Balanced | 2 | 2 | 100% | ✅ GOOD |
Overall: 14/31 = 45% Pareto-optimal
Target: 80% Pareto-optimal
Pareto-Optimal Configurations (2/9):
-
conversational_fp16_seq1024(Score: 0.7982)- VRAM: 3000 MiB (75.5%)
- Throughput: 60 ops/sec
- Accuracy: 6.75
- Hamiltonian: 7.764
-
conversational_fp16_seq2048(Score: 0.7982)- VRAM: 3000 MiB (75.5%)
- Throughput: 30 ops/sec
- Accuracy: 6.75
- Hamiltonian: 7.764
Non-Pareto Configurations (7/9):
-
conversational_int4_seq1024/2048/4096(3 configs)- VRAM: 1200 MiB (30%)
- Throughput: 60/30/15 ops/sec
- Accuracy: 5.2
⚠️ Lower accuracy - Score: 0.629 (vs 0.798 for fp16)
-
conversational_int8_seq1024/2048/4096(3 configs)- VRAM: 1800 MiB (45%)
- Throughput: 60/30/15 ops/sec
- Accuracy: 6.0
⚠️ Still lower than fp16 - Score: 0.706 (vs 0.798 for fp16)
-
conversational_fp16_seq4096(1 config)- VRAM: 3000 MiB
- Throughput: 15 ops/sec
⚠️ Too slow - Accuracy: 6.75
- Score: 0.724 (lower due to throughput)
Root Causes:
- Quantization accuracy loss - int4/int8 models have lower accuracy (5.2-6.0 vs 6.75)
- Long sequence penalty - seq4096 has poor throughput (15 ops/sec)
- Lack of middle-ground configs - No fp16_seq512 or int8_seq1536 tested
Pareto-Optimal Configurations (4/8):
vision_b8_i640- Best throughput, moderate VRAMvision_b4_i640- Balancedvision_b2_i640- Lower VRAM, high accuracyvision_b1_i640- Lowest VRAM, highest accuracy
Non-Pareto Configurations (4/8):
vision_b*_i320(4 configs) - All lower resolution variants- Lower accuracy due to smaller input size
- Not competitive with 640x640 variants
Root Causes:
- Resolution matters - All 640x640 configs are Pareto-optimal
- 320x320 not competitive - Lower resolution hurts accuracy too much
- Need intermediate resolutions - Test 480x480, 512x512
Pareto-Optimal Configurations (6/12):
- Good mix of qubit counts (12, 16, 20) and sequence lengths
- All Pareto configs have low VRAM (180-220 MiB)
- Balance between throughput and accuracy
Non-Pareto Configurations (6/12):
- Configurations with poor throughput/accuracy trade-offs
- Too many qubits with long sequences (high latency)
- Too few qubits (lower accuracy potential)
Root Causes:
- Over-parameterization - Some configs use more resources than needed
- Under-parameterization - Others don't meet accuracy requirements
- Need better sampling - Explore q14, q18 qubit counts
Add 4 new configurations:
-
conversational_fp16_seq512Quantization: FP16 (no loss) Sequence: 512 (shorter, faster) Expected: Higher throughput (120 ops/sec) VRAM: ~2400 MiB Reason: Gap between seq1024 and seq2048
-
conversational_int8_seq1536Quantization: INT8 (acceptable loss) Sequence: 1536 (middle ground) Expected: Good accuracy (6.2-6.4), moderate throughput VRAM: ~1950 MiB Reason: Better int8 accuracy with optimal sequence length
-
conversational_fp16_seq768Quantization: FP16 Sequence: 768 Expected: High throughput (80 ops/sec), full accuracy VRAM: ~2700 MiB Reason: Fill gap between seq512 and seq1024
-
conversational_int8_seq768Quantization: INT8 Sequence: 768 Expected: Better throughput (80 ops/sec), acceptable accuracy (6.0-6.2) VRAM: ~1650 MiB Reason: INT8 sweet spot - shorter sequences improve accuracy
Expected Result: 6/13 = 46% (still below target, but 2x improvement)
Additional Improvements:
- Tune quantization calibration for int4/int8
- Use Qwen2:1.5b instead of Qwen-1.5 (better accuracy)
- Test dynamic quantization (loads fp16, quantizes at runtime)
Add 2 new configurations:
-
vision_b*_i512(4 configs: b1, b2, b4, b8)Resolution: 512x512 (intermediate) Expected: Better accuracy than 320, lower VRAM than 640 Reason: Fill resolution gap
-
vision_b*_i480(4 configs)Resolution: 480x480 Expected: Good balance Reason: Common mobile resolution
Expected Result: 10/16 = 63% (if 6 new configs are Pareto-optimal)
Additional Improvements:
- Use YOLOv8n quantized (reduces VRAM by 40%)
- Test TensorRT optimization (improves throughput)
- Add mAP50 metric for accuracy
Add 4 new configurations:
-
scientific_q14_s1024Qubits: 14 (between 12 and 16) Sequence: 1024 Expected: Good accuracy/throughput balance
-
scientific_q18_s1024Qubits: 18 (between 16 and 20) Sequence: 1024 Expected: Higher accuracy, acceptable throughput
-
scientific_q14_s2048 -
scientific_q18_s2048
Expected Result: 10/16 = 63%
Additional Improvements:
- Use adaptive shot count (more shots for higher qubits)
- Tune QAOA parameters (gamma, beta) per configuration
- Test VQE algorithm (may be more efficient)
-
Add conversational configs
cd ~/diamond-node/benchmarks # Edit orthogonal_test.py to add 4 new configs
-
Add vision i512 resolution
# Add 4 new vision configs with 512x512 resolution -
Re-run benchmarks
/home/diamondnode/venv312/bin/python benchmarks/orthogonal_test.py --workload all
-
Switch to Qwen2:1.5b
ollama pull qwen2:1.5b # Better accuracy than Qwen-1.5 -
Quantize YOLO
# Use YOLOv8n with INT8 quantization from ultralytics import YOLO model = YOLO('yolov8n.pt') model.export(format='engine', int8=True)
-
Tune QUBO parameters
# In mycelial_qubo.py lam_dist = 0.35 # was 0.4 lam_redund = -0.25 # was -0.2 lam_resource = -0.9 # was -0.8
- Implement dynamic quantization
- Add TensorRT optimization
- Implement waveform equilibrium scoring (currently returns 0)
- Add LangSmith tracing for optimization metrics
| Metric | Current | After Phase 1 | After Phase 2 | After Phase 3 |
|---|---|---|---|---|
| Conversational | 22% | 46% | 56% | 67% ✅ |
| Vision | 50% | 63% | 69% | 75% ✅ |
| Scientific | 50% | 63% | 69% | 75% ✅ |
| Overall | 45% | 57% | 65% | 72% |
Final Target: 72% overall (close to 80% goal)
# Pull better model
ollama pull qwen2:1.5b
# Test inference
ollama run qwen2:1.5b "What is 2+2?" --verbose
# Monitor VRAM
~/diamond-node/scripts/vram_check.shfrom ultralytics import YOLO
import torch
# Load and quantize
model = YOLO('yolov8n.pt')
model = torch.quantization.quantize_dynamic(
model, {torch.nn.Linear}, dtype=torch.qint8
)
# Test inference
results = model('test_image.jpg')cd ~/diamond-node
# Edit scripts/mycelial_qubo.py with new lambda values
nano scripts/mycelial_qubo.py
# Run test
/home/diamondnode/venv312/bin/python scripts/mycelial_qubo.py --shots 1024 --outer-rounds 5 --json# Run full benchmark suite
/home/diamondnode/venv312/bin/python benchmarks/orthogonal_test.py --workload all
# Check results
cat ~/diamond-node/benchmark_results/current-full/report_*.txt
# Compare before/after
diff \
~/diamond-node/benchmark_results/previous/report_conversational.txt \
~/diamond-node/benchmark_results/current-full/report_conversational.txt# Terminal 1: Run benchmarks
/home/diamondnode/venv312/bin/python benchmarks/orthogonal_test.py
# Terminal 2: Monitor VRAM
watch -n 1 '~/diamond-node/scripts/vram_check.sh'
# Terminal 3: Monitor GPU temperature
watch -n 1 'nvidia-smi --query-gpu=temperature.gpu,power.draw --format=csv,noheader'- 4 new conversational configs added
- 4 new vision i512 configs added
- Benchmark suite runs successfully
- Conversational: 4-5 Pareto-optimal (44-56%)
- Qwen2:1.5b installed and tested
- YOLO quantization working
- QUBO parameters tuned
- Conversational: 5-6 Pareto-optimal (56-67%)
- Waveform equilibrium implemented
- TensorRT optimization working
- LangSmith tracing active
- Overall: 70%+ Pareto-optimal ✅
Status: Analysis complete, ready for implementation
Priority: Phase 1 (quick wins) recommended
ETA: Phase 1 = 2 hours, Phase 2 = 6 hours, Phase 3 = 14 hours total