An Intelligent AI Developer Workspace
A VS Code-style browser IDE powered by Pydantic-AI and the Model Context Protocol (MCP). It functions as an autonomous coding partner that can manage file systems, execute git operations, and generate code in real time.
⚡ Tech Stack & Architecture:
- Core AI: Python 3.11, Pydantic-AI, Google Gemini Models.
- Backend: FastAPI, MCP (Model Context Protocol) Server, Cloud Run.
- Frontend: React 19, Vite, Monaco Editor, XTerm.js.
- Architecture: Agent-based Microservices with tool registration and stateful session management.
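As a rough illustration of tool registration combined with stateful sessions, the sketch below uses plain Python only. It is not the Pydantic-AI or MCP API; `ToolRegistry`, `Session`, `write_file`, and `read_file` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Maps tool names to plain Python callables (hypothetical stand-in)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, func: Callable[..., Any]) -> Callable[..., Any]:
        # Used as a decorator; the function name becomes the tool name.
        self._tools[func.__name__] = func
        return func

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


@dataclass
class Session:
    """Per-user state that persists across tool calls."""
    session_id: str
    files: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


registry = ToolRegistry()


@registry.register
def write_file(session: Session, path: str, content: str) -> str:
    # An in-memory dict stands in for a real workspace file system.
    session.files[path] = content
    session.history.append(f"write {path}")
    return f"wrote {len(content)} characters to {path}"


@registry.register
def read_file(session: Session, path: str) -> str:
    session.history.append(f"read {path}")
    return session.files[path]
```

An agent loop would resolve a model-issued tool call with something like `registry.call("write_file", session=s, path="main.py", content="...")`, and the `Session` object carries the file state and call history between turns.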
2. Zenith
Real-Time Voice and Vision AI Consultant
The Zenith Pod is an interactive assistant powered by Gemini Live. It holds natural, fluid voice conversations while the user shares a live camera feed for real-time vision analysis. It can search the web, generate images, and analyze what it sees, all within a single session.
⚡ Tech Stack & Architecture:
- Core AI: Google Gemini Live (voice + vision), Gemini models for tool calling.
- Real-Time Engine: WebSockets for bidirectional audio & metadata streaming.
- Vision Layer: MediaDevices API + canvas frame extraction.
- Tools: Google Search API, Imagen API (text-to-image).
- Frontend: React 19, Tailwind CSS, Canvas WaveformVisualizer.
- Architecture: Event-driven services (LiveService, ToolService) with React state management.
3. Swara
Real-time AI Music Creation Studio
Swara lets users describe a vibe and interactively steer music generation live. It integrates Google Gemini 2.5 Flash for music planning and Google Lyria RealTime for continuous audio streaming, giving a fully interactive composition experience.
⚡ Tech Stack & Architecture:
- Core AI: Google Gemini 2.5 Flash (vibe analysis, song planning), Google Lyria RealTime (music generation and streaming).
- Backend: Next.js API routes for prompt generation, WebSockets for real-time audio.
- Frontend: React, Tailwind CSS, shadcn/ui components, Web Audio API for playback and visualization.
- Architecture: Component-based, with services (MusicService, audioUtils) handling WebSocket management and the audio pipeline, and React state management covering debounced updates and error handling.
4. Model Park
Self-hosted Multi-Model LLM Inference Platform
A production-grade LLM gateway providing access to 10+ open-source models (Gemma, Llama, DeepSeek, etc.) via a single OpenAI-compatible API. Hosted on Google Cloud Run with NVIDIA L4 GPUs, it features intelligent auto-routing, scale-to-zero cost optimization, and secure service-to-service authentication.
⚡ Tech Stack & Architecture:
- AI Models: Gemma 3, Llama 3.1/3.2, DeepSeek-R1, Qwen 2.5/3, Mistral, Phi-4, etc.
- Backend: Python 3.11, FastAPI, Ollama, OpenAI API Protocol.
- Infrastructure: Google Cloud Run (GPU-enabled), NVIDIA L4 GPUs, Artifact Registry.
- Architecture: Multi-model router with keyword-based auto-routing, API key middleware, and identity-based outbound authentication using Google Service Accounts.
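Keyword-based auto-routing can be sketched roughly as follows. The routing table, the keyword lists, the model tags, and the `"auto"` sentinel are all assumptions made for illustration; the actual rules used by Model Park are not described here.

```python
from typing import Dict, List

# Hypothetical routing table: keywords and model tags are illustrative only.
ROUTES: Dict[str, List[str]] = {
    "deepseek-r1": ["prove", "reason", "step by step"],
    "qwen2.5": ["code", "function", "bug"],
}
DEFAULT_MODEL = "gemma3"


def route(prompt: str, requested_model: str = "auto") -> str:
    """Pick a model tag for a prompt.

    An explicit model name always wins; "auto" triggers keyword matching.
    """
    if requested_model != "auto":
        return requested_model
    text = prompt.lower()
    # Dicts preserve insertion order, so earlier routes take priority.
    for model, keywords in ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return model
    return DEFAULT_MODEL
```

Plain substring matching is cheap but coarse (for example, "code" also matches "decode"), so a real router would likely need word-boundary matching or a small classifier on top.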
Enterprise MLOps & Computer Vision Pipeline
A comprehensive MLOps-IoT platform for automated video surveillance. It automates the training, deployment, and maintenance of computer vision models that track occupancy, demographics (gender, age), and workplace safety in real time.
⚡ Tech Stack & Architecture:
- ML Core: PyTorch 2.4, Ultralytics YOLOv8, OpenCV.
- Orchestration: Kubeflow Pipelines (KFP 2.0), Vertex AI Training.
- Data: BigQuery, Firestore, Google Cloud Storage.
- Architecture: Event-Driven MLOps Pipeline with automated retraining triggers and edge-compatible deployment.
6. Podcraftor
Text-to-Podcast Automation Engine
A full-stack application that converts plain text into complete podcast episodes. It uses Google TTS with SSML support to generate natural, human-like audio, automating podcast production end-to-end.
⚡ Tech Stack & Architecture:
- Backend: Python 3.12, FastAPI, LangChain (LLM Orchestration).
- Audio Processing: FFmpeg, Pydub, Google Cloud TTS (SSML).
- AI Models: Google Vertex AI, Gemini Pro.
- Architecture: Service-oriented architecture with Factory patterns for TTS providers and Strategy patterns for content generation.
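The combination of a Factory for TTS providers and a Strategy for content generation might look roughly like the sketch below. Every class and function name here is hypothetical, and `FakeCloudTTS` returns placeholder bytes instead of calling Google Cloud TTS.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type
from xml.sax.saxutils import escape


class TTSProvider(ABC):
    """Common interface that every TTS backend implements."""

    @abstractmethod
    def synthesize(self, ssml: str) -> bytes:
        ...


class FakeCloudTTS(TTSProvider):
    # Placeholder: a real provider would send the SSML to a TTS service.
    def synthesize(self, ssml: str) -> bytes:
        return b"FAKE-AUDIO:" + ssml.encode("utf-8")


class SilentTTS(TTSProvider):
    def synthesize(self, ssml: str) -> bytes:
        return b""


# Factory: providers are looked up by name, so callers never import classes.
_PROVIDERS: Dict[str, Type[TTSProvider]] = {
    "fake-cloud": FakeCloudTTS,
    "silent": SilentTTS,
}


def create_tts_provider(name: str) -> TTSProvider:
    try:
        return _PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown TTS provider: {name}") from None


# Strategy: each content style is an interchangeable text -> script function.
ScriptStrategy = Callable[[str], str]


def monologue(text: str) -> str:
    return text.strip()


def two_host_dialogue(text: str) -> str:
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    speakers = ("Host A", "Host B")
    return " ".join(f"{speakers[i % 2]}: {s}." for i, s in enumerate(sentences))


def build_episode(text: str, strategy: ScriptStrategy, provider: TTSProvider) -> bytes:
    script = strategy(text)
    # Escape XML special characters before wrapping the script in SSML.
    ssml = f"<speak>{escape(script)}</speak>"
    return provider.synthesize(ssml)
```

Swapping `monologue` for `two_host_dialogue`, or `"fake-cloud"` for another registered provider, changes the output without touching `build_episode`, which is the point of pairing the two patterns.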
Intelligent Document-to-Audio System
A full-stack solution that processes PDF and EPUB files, autonomously structures them into chapters, and generates high-quality audiobooks using Google TTS, surpassing the traditional audiobook features offered by platforms like ElevenLabs.
⚡ Tech Stack & Architecture:
- Backend: FastAPI, Pydantic, spaCy (NLP), EbookLib.
- Frontend: Next.js 14, React 18, Firebase Auth.
- Cloud: Cloud Run, Cloud Storage, Google Cloud Vision API.
- Architecture: Async-first microservices with background task scheduling (APScheduler) and stream processing.
Healthcare Fraud Detection System
A healthcare document analysis system that interprets diverse medical documents, including handwritten prescriptions and discharge summaries. It extracts KPIs, detects fraud, verifies document legitimacy, and generates contextual follow-up questions.
⚡ Tech Stack & Architecture:
- Core: Next.js 14, TypeScript, PDFKit, Sharp.
- AI: Gemini 2.0 Flash (Multimodal Analysis).
- Infrastructure: Cloud Run (Serverless), Cloud Build.
- Architecture: Serverless Multi-Step AI Pipeline (Classification → Analysis → Fraud Detection).
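The Classification → Analysis → Fraud Detection flow can be sketched as a chain of steps that each enrich a shared state object. This is only an illustration in Python: the project itself is built on Next.js and TypeScript, the rule-based stubs below stand in for the Gemini calls, and the field names and the 10000 threshold are assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DocumentState:
    """Accumulates results as the document moves through the pipeline."""
    text: str
    doc_type: str = "unknown"
    kpis: Dict[str, float] = field(default_factory=dict)
    fraud_flags: List[str] = field(default_factory=list)


def classify(state: DocumentState) -> DocumentState:
    # Stub classifier; a real step would ask a multimodal model.
    lowered = state.text.lower()
    if "prescription" in lowered:
        state.doc_type = "prescription"
    elif "discharge" in lowered:
        state.doc_type = "discharge_summary"
    return state


def analyze(state: DocumentState) -> DocumentState:
    # Stub KPI extraction: pull a billed total if one is present.
    match = re.search(r"total[:\s]+(\d+(?:\.\d+)?)", state.text, re.IGNORECASE)
    if match:
        state.kpis["total_amount"] = float(match.group(1))
    return state


def detect_fraud(state: DocumentState) -> DocumentState:
    # Illustrative rules only; the threshold is an arbitrary assumption.
    if state.doc_type == "unknown":
        state.fraud_flags.append("unrecognized document type")
    if state.kpis.get("total_amount", 0.0) > 10000:
        state.fraud_flags.append("unusually high total")
    return state


PIPELINE: List[Callable[[DocumentState], DocumentState]] = [
    classify,
    analyze,
    detect_fraud,
]


def run_pipeline(text: str) -> DocumentState:
    state = DocumentState(text=text)
    for step in PIPELINE:
        state = step(state)
    return state
```

Because later steps read what earlier ones wrote (fraud detection depends on both the document type and the extracted KPIs), the order of `PIPELINE` matters.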
AI-Powered Image and Video Generation Studio
A "Photoshop Agent" and video generation studio that unifies multiple generative models into a single creative workflow. It handles complex media operations such as video trimming and composition directly in the browser.
⚡ Tech Stack & Architecture:
- Frontend: Next.js 14, React 18, MediaRecorder API.
- AI Models: Gemini 1.5 Pro, Google Imagen 3, Google Veo (Video Generation).
- Architecture: Multi-Model AI Studio pattern with hybrid client-side video processing and server-side AI generation.
10. Live Voice Agent
Real-time 3D AI Interaction
An intelligent live voice agent built on a Gemini Live voice model. The agent uses function calling and tool use to take action on behalf of the user, and is visualized with a reactive 3D avatar.
⚡ Tech Stack & Architecture:
- Core: TypeScript, React, Three.js (WebGL), Web Audio API.
- AI: Gemini Live API (Real-time Streaming).
- Architecture: Client-side audio processing pipeline with bidirectional WebSocket communication.
11. Gifinity
AI Sprite Sheet & GIF Generator
A fun app that generates sprite sheet images using Gemini Nano/Pro models and converts them to animated GIFs client-side.
⚡ Tech Stack & Architecture:
- Core: Next.js, HTML5 Canvas API, Gifshot.js.
- AI: Gemini Pro Vision, Gemini Nano.
- Architecture: Client-Side Animation Pipeline where heavy frame processing is offloaded to the browser via Canvas API.
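The core of turning a sprite sheet into animation frames is computing the source rectangle of each cell. The sketch below shows that arithmetic in Python for a uniform grid; in the browser the same rectangles would be passed to the Canvas API when drawing each frame. The function name and the uniform-grid assumption are illustrative, not taken from the project.

```python
from typing import List, Tuple

# (x, y, width, height) of one frame inside the sheet, in pixels.
Rect = Tuple[int, int, int, int]


def frame_rects(sheet_width: int, sheet_height: int, cols: int, rows: int) -> List[Rect]:
    """Return frame rectangles in row-major order for a uniform grid."""
    if cols <= 0 or rows <= 0:
        raise ValueError("cols and rows must be positive")
    frame_w = sheet_width // cols
    frame_h = sheet_height // rows
    return [
        (col * frame_w, row * frame_h, frame_w, frame_h)
        for row in range(rows)
        for col in range(cols)
    ]
```

For a 128x64 sheet with 4 columns and 2 rows, this yields eight 32x32 frames, read left to right and top to bottom.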
Trend-to-Content Intelligence
An API-driven application that turns raw trend data into actionable podcast content. It combines insights from global Google Trends data with internal podcast analytics.
⚡ Tech Stack & Architecture:
- Backend: Python 3.12, FastAPI, Pydantic.
- Data: Google BigQuery (Trend Analysis), Elasticsearch 8.x (Analytics).
- Architecture: Async Microservices orchestration leveraging BigQuery for large-scale trend aggregation.
Social Media Campaign Tracker
A social media analytics tool that scrapes user comments from platforms such as Instagram, YouTube, and Facebook to analyze sentiment trends and track the effectiveness of influencer campaigns.
⚡ Tech Stack & Architecture:
- Core: Next.js 14, React 18, Tailwind CSS.
- AI: Gemini 1.5 Flash (Sentiment Classification).
- Infrastructure: Google Cloud Run, Cloud Build (CI/CD).
- Architecture: Serverless Microservices pattern deployed via Cloud Build.
