Agentic AI infra & Web3 engineer/architect. I build large-scale decentralised storage/compute systems (Filecoin, IPFS) and cloud-native platforms, across enterprise and high-growth environments.
- Agentic AI infra: open source for inference networks, GPU resource orchestration, tooling around LLM stacks.
- Distributed systems & platform architecture: high availability, observability, cost/perf.
- Decentralised storage & compute: Filecoin/IPFS service networks, multi-cluster operations.
- Cloud & infrastructure delivery: IDC build-out, capacity planning, supply chain, budgets.
- Ecosystem & solutions: partner enablement, regional growth, technical program leadership.
- Current project: https://github.com/lebitai/minipoetry
- Active repos (recent):
-
Qingning: AI-powered EdTech platform (full-stack)
- Cross-platform EdTech monorepo: Go/Gin/GORM API, Vue 3 admin console, Taro + React WeChat miniapp, official site and partner portal, backed by PostgreSQL 14+, Redis caching, Kafka learning-data sync, JWT/RBAC, and DDD-style handler/service/repository boundaries.
- AI learning/content pipeline: QWen/Tongyi LLM provider layer with prompt templates, retries and human-reviewable output for translation/annotation/analysis; TTS generation and admin-side AI content operations for knowledge books, poems, questions and audio.
- Production/SRE architecture: 236 SafeLine WAF edge, 239 Docker Compose app host, 213 PostgreSQL, 211 monitoring center with VictoriaMetrics, vmagent, Grafana, Loki, Tempo, vmalert, Alertmanager and Feishu alerts; K3s/Argo CD/GitOps planned as the next platform evolution.
- Agentic Coding design: Feishu feedback/task intake, admin approval, worker-driven LLM prompt extraction, structured JSON planning, queued Agent Runner jobs, sandboxed Codex/Claude Code CLI execution, test reports, PR/patch output, audit logs and human-gated merge flow.
-
Decentralized storage & compute platforms
- Filecoin/IPFS storage networks: 520 PB+ capacity onboarded across 10+ multi-region clusters.
- HA service redesign: automated failover, 85% unplanned-downtime reduction (annualized).
-
GPU inference resource networking
- Inference-oriented GPU resource network: multi-tenant scheduling across 4,000+ GPUs.
- Cluster orchestration with dynamic allocation, health monitoring, and usage-based billing.
-
NovaTranscoder: GPU-accelerated video transcoding engine
- Go Fiber API gateway + Rust worker consuming Redis queues; FFmpeg/NVENC pipelines for 8K/HDR transcoding.
- Distributed job scheduling with configurable GPU profiles and real-time progress tracking.
-
Operational delivery & automation
- Infra delivery pipeline: 700 NAS servers provisioned with standardised imaging and cabling workflows.
- Automated pre-deployment audit checks; fleet compliance raised to 95%+ against ops standards.
-
For more details, email me.
- Organized summits in Hong Kong and Singapore with 300+ builders and partners.
- Led cross-country collaboration across China/Singapore/Malaysia.
- System Design: distributed architecture, containerization, orchestration, storage/compute frameworks
- Infrastructure: Linux, self-hosted, CI/CD, Docker, Kubernetes, Terraform
- Web3: storage, DeFi, DApps, GPU-based blockchain
- Agentic AI: LangChain, vLLM, Ray, LlamaIndex
- Machine Learning: PyTorch, NumPy, Colab
- Languages: Go, Python, Rust, TypeScript
- Backend/Data: Node.js, PostgreSQL, YugabyteDB, MongoDB
- Hardware: NVIDIA, DELL, Supermicro, H3C
- Shanghai Jiao Tong University, Computer Science and Technology
Blog: https://xiaoli.dev


