G Santosh Kumar gsantoshkumar1999

Hi there, I'm G Santosh Kumar 👋

Projects

1. Boxsand Coder Agent

An Intelligent AI Developer Workspace

A VS Code-style browser IDE powered by Pydantic-AI and the Model Context Protocol (MCP). It functions as an autonomous coding partner capable of managing file systems, executing Git operations, and generating code in real time.

⚡ Tech Stack & Architecture:

Core AI: Python 3.11, Pydantic-AI, Google Gemini Models.
Backend: FastAPI, MCP (Model Context Protocol) Server, Cloud Run.
Frontend: React 19, Vite, Monaco Editor, XTerm.js.
Architecture: Agent-based Microservices with tool registration and stateful session management.
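
To illustrate what "tool registration and stateful session management" can look like, here is a minimal sketch in plain Python. It does not use the Pydantic-AI or MCP APIs; the `tool` decorator, the `TOOLS` registry, the `Session` class, and both example tools are hypothetical names chosen for this example only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical registry: maps a tool name to the function that implements it.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Register a function under `name` so an agent can look it up by name."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = func
        return func
    return decorator


@dataclass
class Session:
    """Per-user state that survives across tool calls (an in-memory file map here)."""
    files: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)

    def call(self, name: str, **kwargs: str) -> str:
        """Dispatch a tool call by name and record it in the session history."""
        if name not in TOOLS:
            raise KeyError(f"unknown tool: {name}")
        self.history.append(name)
        return TOOLS[name](self, **kwargs)


@tool("write_file")
def write_file(session: Session, path: str, content: str) -> str:
    session.files[path] = content
    return f"wrote {len(content)} chars to {path}"


@tool("read_file")
def read_file(session: Session, path: str) -> str:
    return session.files[path]
```

A caller would create one `Session` per user and invoke tools by name, for example `Session().call("write_file", path="main.py", content="...")`.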

2. Zenith

Real-Time Voice and Vision AI Consultant

The Zenith Pod is an advanced interactive assistant powered by Gemini Live. It holds natural, fluid voice conversations while sharing a live camera feed for real-time vision analysis, and it can search the web, generate images, and analyze what it sees, all in one seamless experience.

⚡ Tech Stack & Architecture:

Core AI: Google Gemini Live (voice + vision), Gemini models for tool calling.
Real-Time Engine: WebSockets for bidirectional audio & metadata streaming.
Vision Layer: MediaDevices API + canvas frame extraction.
Tools: Google Search API, Imagen API (text-to-image).
Frontend: React 19, Tailwind CSS, Canvas WaveformVisualizer.
Architecture: Event-driven services (LiveService, ToolService) with React state management.

3. Swara

Real-time AI Music Creation Studio

Swara lets users describe a vibe and interactively steer music generation live. It integrates Google Gemini 2.5 Flash for music planning and Google Lyria RealTime for continuous audio streaming, providing a fully interactive music composition experience.

⚡ Tech Stack & Architecture:

Core AI: Google Gemini 2.5 Flash (vibe analysis, song planning), Google Lyria RealTime (music generation and streaming).
Backend: Next.js API routes for prompt generation, WebSockets for real-time audio.
Frontend: React, Tailwind CSS, shadcn/ui components, Web Audio API for playback and visualization.
Architecture: Component-based with services (MusicService, audioUtils) for WebSocket management and the audio pipeline, plus React state management with debounced updates and error handling.

4. Model Park

Self-hosted Multi-Model LLM Inference Platform

A production-grade LLM gateway providing access to 10+ open-source models (Gemma, Llama, DeepSeek, etc.) via a single OpenAI-compatible API. Hosted on Google Cloud Run with NVIDIA L4 GPUs, it features intelligent auto-routing, scale-to-zero cost optimization, and secure service-to-service authentication.

⚡ Tech Stack & Architecture:

AI Models: Gemma 3, Llama 3.1/3.2, DeepSeek-R1, Qwen 2.5/3, Mistral, Phi-4, etc.
Backend: Python 3.11, FastAPI, Ollama, OpenAI API Protocol.
Infrastructure: Google Cloud Run (GPU-enabled), NVIDIA L4 GPUs, Artifact Registry.
Architecture: Multi-model router with keyword-based auto-routing, API key middleware, and identity-based outbound authentication using Google Service Accounts.
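
As a rough illustration of keyword-based auto-routing, the sketch below picks a model tag from the prompt text unless the caller names a model explicitly. The routing table, the keywords, the model tags, and the `"auto"` sentinel are all assumptions made for this example, not the project's actual configuration.

```python
from typing import Tuple

# Hypothetical routing table: keywords that hint at a task, mapped to a model tag.
ROUTES: Tuple[Tuple[Tuple[str, ...], str], ...] = (
    (("code", "function", "bug", "python"), "qwen2.5"),
    (("prove", "reason", "math"), "deepseek-r1"),
    (("summarize", "translate"), "gemma3"),
)
DEFAULT_MODEL = "llama3.1"  # Assumed fallback when no keyword matches.


def route(prompt: str, requested_model: str = "auto") -> str:
    """Return the model tag to serve this prompt.

    An explicit model name always wins; "auto" triggers keyword matching.
    Matching is a plain substring test, so it is deliberately crude.
    """
    if requested_model != "auto":
        return requested_model
    text = prompt.lower()
    for keywords, model in ROUTES:
        if any(keyword in text for keyword in keywords):
            return model
    return DEFAULT_MODEL
```

Rules are checked in order, so the first matching group wins. A production router would likely need word-boundary matching or a classifier to avoid false hits such as "code" inside "decode".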

5. Occupancy Analytics & People Tracking

Enterprise MLOps & Computer Vision Pipeline

A comprehensive MLOps-IoT platform designed for automated video surveillance. It automates the training, deployment, and maintenance of computer vision models to track occupancy, demographics (gender, age), and workplace safety in real time.

⚡ Tech Stack & Architecture:

ML Core: PyTorch 2.4, Ultralytics YOLOv8, OpenCV.
Orchestration: Kubeflow Pipelines (KFP 2.0), Vertex AI Training.
Data: BigQuery, Firestore, Google Cloud Storage.
Architecture: Event-Driven MLOps Pipeline with automated retraining triggers and edge-compatible deployment.
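
One way an automated retraining trigger might work is to watch a rolling window of detection confidences and fire when the average drops. The sketch below assumes that approach; the `RetrainTrigger` class, the window size, and the threshold are hypothetical and do not describe the project's actual trigger logic.

```python
from collections import deque
from statistics import mean


class RetrainTrigger:
    """Fire when the rolling mean of detection confidence falls below a threshold."""

    def __init__(self, window: int = 100, min_mean_confidence: float = 0.6) -> None:
        self.scores: deque = deque(maxlen=window)  # Oldest scores drop off automatically.
        self.min_mean_confidence = min_mean_confidence

    def observe(self, confidence: float) -> bool:
        """Record one detection confidence; return True when retraining should start."""
        self.scores.append(confidence)
        window_full = len(self.scores) == self.scores.maxlen
        # Wait for a full window so a few early low scores do not trigger retraining.
        return window_full and mean(self.scores) < self.min_mean_confidence
```

In an event-driven setup, a `True` result would typically publish a message that starts the training pipeline rather than running training inline.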

6. Podcraftor

Text-to-Podcast Automation Engine

A full-stack application that converts plain text into complete podcast episodes. It uses Google TTS with SSML support to generate natural, human-like audio, automating podcast production end to end.

⚡ Tech Stack & Architecture:

Backend: Python 3.12, FastAPI, LangChain (LLM Orchestration).
Audio Processing: FFmpeg, Pydub, Google Cloud TTS (SSML).
AI Models: Google Vertex AI, Gemini Pro.
Architecture: Service-oriented architecture with Factory patterns for TTS providers and Strategy patterns for content generation.
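
The factory pattern for TTS providers could be sketched as below. This example only builds the markup that would be sent to a TTS service and never calls Google Cloud TTS; the class names, the `PROVIDERS` table, and the 400 ms pause are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type
from xml.sax.saxutils import escape


class TTSProvider(ABC):
    """Common interface so callers do not depend on a specific provider."""

    @abstractmethod
    def to_markup(self, text: str) -> str:
        """Return the payload that would be sent to the TTS service."""


class SSMLProvider(TTSProvider):
    def to_markup(self, text: str) -> str:
        # Naive sentence split on "."; each sentence goes in <s>, with a pause between.
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        body = '<break time="400ms"/>'.join(f"<s>{escape(s)}.</s>" for s in sentences)
        return f"<speak>{body}</speak>"


class PlainTextProvider(TTSProvider):
    def to_markup(self, text: str) -> str:
        return text


# Hypothetical provider table; a new provider is added by registering its class here.
PROVIDERS: Dict[str, Type[TTSProvider]] = {"ssml": SSMLProvider, "plain": PlainTextProvider}


def make_provider(name: str) -> TTSProvider:
    """Factory: build a provider from its configured name."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown TTS provider: {name}") from None
```

The benefit of the factory is that the provider can be switched through configuration, for example `make_provider("ssml")`, without touching the calling code.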

7. Audiobook Generator

Intelligent Document-to-Audio System

An intelligent full-stack solution that processes PDF and EPUB files, autonomously structures them into chapters, and generates high-quality audiobooks using Google TTS, surpassing traditional audiobook features offered by platforms like ElevenLabs.

⚡ Tech Stack & Architecture:

Backend: FastAPI, Pydantic, spaCy (NLP), EbookLib.
Frontend: Next.js 14, React 18, Firebase Auth.
Cloud: Cloud Run, Cloud Storage, Google Cloud Vision API.
Architecture: Async-first microservices with background task scheduling (APScheduler) and stream processing.
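
As a simplified picture of chapter structuring, the sketch below splits already-extracted text on lines that look like "Chapter N" headings. The real system presumably relies on richer signals (spaCy, EPUB metadata); the regular expression and the `split_chapters` helper are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed heading convention: a line starting with "Chapter <number>".
CHAPTER_HEADING = re.compile(r"^(chapter\s+\d+.*)$", re.IGNORECASE | re.MULTILINE)


def split_chapters(text: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs, one per detected chapter heading.

    Text that appears before the first heading is dropped in this sketch.
    """
    matches = list(CHAPTER_HEADING.finditer(text))
    if not matches:
        return [("Untitled", text.strip())]
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group(1).strip(), text[match.end():end].strip()))
    return chapters
```

Each chapter body could then be sent to the TTS step separately, which keeps individual synthesis requests small.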

8. Medical Docs Analyzer

Healthcare Fraud Detection System

A smart healthcare document analysis system capable of interpreting diverse medical documents, including handwritten prescriptions and discharge summaries. It extracts KPIs, detects fraud, verifies document legitimacy, and generates contextual follow-up questions.

⚡ Tech Stack & Architecture:

Core: Next.js 14, TypeScript, PDFKit, Sharp.
AI: Gemini 2.0 Flash (Multimodal Analysis).
Infrastructure: Cloud Run (Serverless), Cloud Build.
Architecture: Serverless Multi-Step AI Pipeline (Classification → Analysis → Fraud Detection).
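
The three-stage flow can be pictured as a chain of steps that each enrich a shared state. In the sketch below the step bodies are trivial placeholder rules standing in for the model calls; the function names, the keyword check, and the amount threshold are assumptions made purely for illustration.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], State]


def classify(state: State) -> State:
    # Placeholder rule; the real step would ask a multimodal model for the document type.
    doc_type = "prescription" if "rx" in state["text"].lower() else "discharge_summary"
    return {**state, "doc_type": doc_type}


def analyze(state: State) -> State:
    # Placeholder extraction: sum every numeric token found in the text.
    tokens = state["text"].split()
    amounts = [float(tok) for tok in tokens if tok.replace(".", "", 1).isdigit()]
    return {**state, "total_amount": sum(amounts)}


def detect_fraud(state: State) -> State:
    # Placeholder check with an arbitrary threshold.
    return {**state, "flagged": state["total_amount"] > 10_000}


def run_pipeline(text: str, steps: List[Step]) -> State:
    """Run the steps in order; each step sees everything earlier steps produced."""
    state: State = {"text": text}
    for step in steps:
        state = step(state)
    return state
```

Calling `run_pipeline(text, [classify, analyze, detect_fraud])` returns a single dictionary holding the results of all three stages, which makes it easy to log or inspect each stage's contribution.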

9. Pixora Studio

AI-Powered Image and Video Generation Studio

A "Photoshop Agent" and video generation studio that unifies multiple generative models into a single creative workflow. It handles complex media operations like video trimming and composition directly in the browser.

⚡ Tech Stack & Architecture:

Frontend: Next.js 14, React 18, MediaRecorder API.
AI Models: Gemini 1.5 Pro, Google Imagen 3, Google Veo (Video Generation).
Architecture: Multi-Model AI Studio pattern with hybrid client-side video processing and server-side AI generation.

10. Live Voice Agent

Real-time 3D AI Interaction

An intelligent live voice agent built on the Gemini-Live-Voice model. The agent uses function calling and tool use to take action on behalf of the user, and it is visualized with a reactive 3D avatar.

⚡ Tech Stack & Architecture:

Core: TypeScript, React, Three.js (WebGL), Web Audio API.
AI: Gemini Live API (Real-time Streaming).
Architecture: Client-side audio processing pipeline with bidirectional WebSocket communication.
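
A typical step in such an audio pipeline is converting floating-point samples (as produced by the Web Audio API) into 16-bit PCM before streaming them over the WebSocket. The sketch below shows that conversion in Python for clarity; the project itself does this in the browser, and both the target format and the `float_to_pcm16` helper are assumptions.

```python
import struct
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert samples in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    out = bytearray()
    for sample in samples:
        # Clip first so out-of-range input cannot overflow the 16-bit range.
        clipped = max(-1.0, min(1.0, sample))
        out += struct.pack("<h", int(clipped * 32767))
    return bytes(out)
```

Each sample becomes two bytes, so a buffer of `n` samples produces `2 * n` bytes ready to be sent as a binary WebSocket frame.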

11. Gifinity

AI Sprite Sheet & GIF Generator

A fun app that generates sprite sheet images using Gemini Nano/Pro models and converts them to animated GIFs client-side.

⚡ Tech Stack & Architecture:

Core: Next.js, HTML5 Canvas API, Gifshot.js.
AI: Gemini Pro Vision, Gemini Nano.
Architecture: Client-Side Animation Pipeline where heavy frame processing is offloaded to the browser via Canvas API.
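
Turning a sprite sheet into GIF frames starts with working out where each frame sits on the sheet. The sketch below computes those rectangles for a uniform grid; the grid assumption and the `frame_rects` helper are hypothetical, and the project performs the equivalent step in the browser with the Canvas API.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def frame_rects(sheet_width: int, sheet_height: int, columns: int, rows: int) -> List[Rect]:
    """Return one crop rectangle per frame, in row-major (reading) order."""
    if columns <= 0 or rows <= 0:
        raise ValueError("columns and rows must be positive")
    # Integer division: leftover pixels on the right and bottom edges are ignored.
    frame_w, frame_h = sheet_width // columns, sheet_height // rows
    return [
        (col * frame_w, row * frame_h, (col + 1) * frame_w, (row + 1) * frame_h)
        for row in range(rows)
        for col in range(columns)
    ]
```

Each rectangle would then be drawn onto its own canvas and handed to the GIF encoder as one animation frame.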

12. Podcast Idea Generator

Trend-to-Content Intelligence

A sophisticated API-driven application designed to bridge the gap between raw trend data and actionable podcast content. It combines insights from global Google Trends data and internal podcast analytics.

⚡ Tech Stack & Architecture:

Backend: Python 3.12, FastAPI, Pydantic.
Data: Google BigQuery (Trend Analysis), Elasticsearch 8.x (Analytics).
Architecture: Async Microservices orchestration leveraging BigQuery for large-scale trend aggregation.

13. Sentiment Analyzer

Social Media Campaign Tracker

A comprehensive social media analytics tool that scrapes user comments from platforms like Instagram, YouTube, and Facebook to analyze sentiment trends and track influencer campaign effectiveness.

⚡ Tech Stack & Architecture:

Core: Next.js 14, React 18, Tailwind CSS.
AI: Gemini 1.5 Flash (Sentiment Classification).
Infrastructure: Google Cloud Run, Cloud Build (CI/CD).
Architecture: Serverless Microservices pattern deployed via Cloud Build.
