gsantoshkumar1999/README.md

Hi there, I'm G Santosh Kumar 👋

Projects

1. Boxsand Coder Agent

An Intelligent AI Developer Workspace

A VS Code-style browser IDE powered by Pydantic-AI and the Model Context Protocol (MCP). It functions as an autonomous coding partner capable of managing file systems, executing git operations, and generating code in real time.

⚡ Tech Stack & Architecture:

  • Core AI: Python 3.11, Pydantic-AI, Google Gemini Models.
  • Backend: FastAPI, MCP (Model Context Protocol) Server, Cloud Run.
  • Frontend: React 19, Vite, Monaco Editor, XTerm.js.
  • Architecture: Agent-based Microservices with tool registration and stateful session management.
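
To make "tool registration and stateful session management" concrete, here is a minimal sketch in plain Python. It does not use the Pydantic-AI or MCP APIs; the `tool` decorator, `Session` class, and `call_tool` helper are hypothetical names introduced only for illustration, and the in-memory `files` dict stands in for a real file system.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical registry: maps a tool name to the function that implements it.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under `name` so an agent can call it by name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = func
        return func
    return decorator


@dataclass
class Session:
    """Per-user state that persists across tool calls (illustrative only)."""
    files: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


@tool("write_file")
def write_file(session: Session, path: str, content: str) -> str:
    session.files[path] = content
    return f"wrote {len(content)} chars to {path}"


@tool("read_file")
def read_file(session: Session, path: str) -> str:
    return session.files[path]


def call_tool(session: Session, name: str, **kwargs: Any) -> Any:
    """Dispatch a tool call by name and record it in the session history."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    session.history.append(name)
    return TOOLS[name](session, **kwargs)


session = Session()
call_tool(session, "write_file", path="main.py", content="print('hi')")
print(call_tool(session, "read_file", path="main.py"))
```

Keeping the registry separate from the session means new tools can be added without touching the dispatch logic, while each user's state stays isolated in its own `Session` object.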

Real-Time Voice and Vision AI Consultant

The Zenith Pod is an interactive assistant powered by Gemini Live. It holds natural, fluid voice conversations while the user shares a live camera feed for real-time vision analysis. It can search the web, generate images, and analyze what it sees, all in one session.

⚡ Tech Stack & Architecture:

  • Core AI: Google Gemini Live (voice + vision), Gemini models for tool calling.
  • Real-Time Engine: WebSockets for bidirectional audio & metadata streaming.
  • Vision Layer: MediaDevices API + canvas frame extraction.
  • Tools: Google Search API, Imagen API (text-to-image).
  • Frontend: React 19, Tailwind CSS, Canvas WaveformVisualizer.
  • Architecture: Event-driven services (LiveService, ToolService) with React state management.

Real-time AI Music Creation Studio

Swara is a real-time AI music creation studio that lets users describe a vibe and steer music generation interactively while it plays. It uses Google Gemini 2.5 Flash for music planning and Google Lyria RealTime for continuous audio streaming.

⚡ Tech Stack & Architecture:

  • Core AI: Google Gemini 2.5 Flash (vibe analysis, song planning), Google Lyria RealTime (music generation and streaming).
  • Backend: Next.js API routes for prompt generation, WebSockets for real-time audio.
  • Frontend: React, Tailwind CSS, shadcn/ui components, Web Audio API for playback and visualization.
  • Architecture: Component-based with services (MusicService, audioUtils) for WebSocket management, audio pipeline, and stateful React management including debounced updates and error handling.
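
The "debounced updates" mentioned above can be sketched as follows. This is an illustrative Python version, not the project's React code: the `Debouncer` class, its `delay` value, and the explicit `now` timestamps are assumptions chosen to keep the example deterministic.

```python
from typing import Callable, Optional


class Debouncer:
    """Forward only the latest value once no new value arrived for `delay` seconds."""

    def __init__(self, delay: float, send: Callable[[str], None]) -> None:
        self.delay = delay
        self.send = send
        self._pending: Optional[str] = None
        self._last_update = 0.0

    def update(self, value: str, now: float) -> None:
        # Each new value replaces the pending one and restarts the quiet period.
        self._pending = value
        self._last_update = now

    def tick(self, now: float) -> None:
        # Call periodically; flushes the pending value once the quiet period has elapsed.
        if self._pending is not None and now - self._last_update >= self.delay:
            self.send(self._pending)
            self._pending = None


sent = []
debouncer = Debouncer(delay=0.3, send=sent.append)
debouncer.update("lo-fi", now=0.0)
debouncer.update("lo-fi, rainy", now=0.1)
debouncer.tick(now=0.2)  # still inside the quiet period, nothing is sent
debouncer.tick(now=0.5)  # 0.4 s since the last update, the latest value is sent
print(sent)
```

With this pattern, rapid edits to a prompt collapse into a single message carrying the final value, which avoids flooding a streaming connection with intermediate states.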

Self-hosted Multi-Model LLM Inference Platform

A production-grade LLM gateway providing access to 10+ open-source models (Gemma, Llama, DeepSeek, etc.) via a single OpenAI-compatible API. Hosted on Google Cloud Run with NVIDIA L4 GPUs, it features intelligent auto-routing, scale-to-zero cost optimization, and secure service-to-service authentication.

⚡ Tech Stack & Architecture:

  • AI Models: Gemma 3, Llama 3.1/3.2, DeepSeek-R1, Qwen 2.5/3, Mistral, Phi-4, etc.
  • Backend: Python 3.11, FastAPI, Ollama, OpenAI API Protocol.
  • Infrastructure: Google Cloud Run (GPU-enabled), NVIDIA L4 GPUs, Artifact Registry.
  • Architecture: Multi-model router with keyword-based auto-routing, API key middleware, and identity-based outbound authentication using Google Service Accounts.
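
As a rough illustration of keyword-based auto-routing, the sketch below picks a model from the prompt text when the caller asks for `auto`. The routing table, keyword lists, and model tags (`coder-model`, `reasoning-model`, and so on) are placeholders, not the gateway's actual configuration.

```python
from typing import Tuple

# Hypothetical routing table: (keywords, model tag). All tags are placeholders.
ROUTES: Tuple[Tuple[Tuple[str, ...], str], ...] = (
    (("code", "function", "bug", "python"), "coder-model"),
    (("prove", "reason", "step by step", "math"), "reasoning-model"),
    (("translate", "summarize"), "small-model"),
)
DEFAULT_MODEL = "general-model"


def route(prompt: str, requested: str = "auto") -> str:
    """Return the model to use: an explicit request wins, else the first keyword match."""
    if requested != "auto":
        return requested
    text = prompt.lower()
    for keywords, model in ROUTES:
        if any(keyword in text for keyword in keywords):
            return model
    return DEFAULT_MODEL


print(route("Fix this bug in my function"))
print(route("Tell me a joke"))
```

Plain substring matching is cheap but coarse (for example, "decode" would match "code"), so a real router would likely tokenize the prompt or add a fallback classifier.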

Enterprise MLOps & Computer Vision Pipeline

A comprehensive MLOps-IoT platform designed for automated video surveillance. It automates the training, deployment, and maintenance of computer vision models that track occupancy, demographics (gender, age), and workplace safety in real time.

⚡ Tech Stack & Architecture:

  • ML Core: PyTorch 2.4, Ultralytics YOLOv8, OpenCV.
  • Orchestration: Kubeflow Pipelines (KFP 2.0), Vertex AI Training.
  • Data: BigQuery, Firestore, Google Cloud Storage.
  • Architecture: Event-Driven MLOps Pipeline with automated retraining triggers and edge-compatible deployment.

Text-to-Podcast Automation Engine

A full-stack application that converts plain text into complete podcast episodes. It uses Google TTS with SSML support to generate natural, human-like audio, automating podcast production end to end.

⚡ Tech Stack & Architecture:

  • Backend: Python 3.12, FastAPI, LangChain (LLM Orchestration).
  • Audio Processing: FFmpeg, Pydub, Google Cloud TTS (SSML).
  • AI Models: Google Vertex AI, Gemini Pro.
  • Architecture: Service-oriented architecture with Factory patterns for TTS providers and Strategy patterns for content generation.
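
The factory pattern for TTS providers, together with SSML generation, might look roughly like this. It is a sketch under assumptions: `TTSProvider`, `FakeTTS`, `PROVIDERS`, `create_tts`, and `to_ssml` are hypothetical names, and `FakeTTS` simply returns the SSML bytes so the example runs without cloud credentials. Only standard SSML elements (`<speak>`, `<p>`, `<break>`) are used.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type
from xml.sax.saxutils import escape


def to_ssml(paragraphs: List[str], pause_ms: int = 400) -> str:
    """Wrap paragraphs in SSML, escaping XML characters and pausing between them."""
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<speak>{body}</speak>"


class TTSProvider(ABC):
    """Common interface every provider implements."""

    @abstractmethod
    def synthesize(self, ssml: str) -> bytes:
        ...


class FakeTTS(TTSProvider):
    """Stand-in provider: returns the SSML as bytes instead of real audio."""

    def synthesize(self, ssml: str) -> bytes:
        return ssml.encode("utf-8")


# Hypothetical provider table; a real one would map names to cloud-backed classes.
PROVIDERS: Dict[str, Type[TTSProvider]] = {"fake": FakeTTS}


def create_tts(name: str) -> TTSProvider:
    """Factory: build a provider by name so callers never import concrete classes."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown TTS provider: {name}") from None


ssml = to_ssml(["Welcome to the show.", "Today: cats & dogs."])
audio = create_tts("fake").synthesize(ssml)
print(ssml)
```

Because callers depend only on `create_tts` and the `TTSProvider` interface, swapping one TTS backend for another becomes a one-line change to the provider table.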

Intelligent Document-to-Audio System

A full-stack solution that processes PDF and EPUB files, automatically structures them into chapters, and generates high-quality audiobooks using Google TTS, going beyond the audiobook features offered by platforms like ElevenLabs.

⚡ Tech Stack & Architecture:

  • Backend: FastAPI, Pydantic, spaCy (NLP), EbookLib.
  • Frontend: Next.js 14, React 18, Firebase Auth.
  • Cloud: Cloud Run, Cloud Storage, Google Cloud Vision API.
  • Architecture: Async-first microservices with background task scheduling (APScheduler) and stream processing.

Healthcare Fraud Detection System

A healthcare document analysis system capable of interpreting diverse medical documents, including handwritten prescriptions and discharge summaries. It extracts KPIs, detects fraud, verifies document legitimacy, and generates contextual follow-up questions.

⚡ Tech Stack & Architecture:

  • Core: Next.js 14, TypeScript, PDFKit, Sharp.
  • AI: Gemini 2.0 Flash (Multimodal Analysis).
  • Infrastructure: Cloud Run (Serverless), Cloud Build.
  • Architecture: Serverless Multi-Step AI Pipeline (Classification → Analysis → Fraud Detection).
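
The Classification → Analysis → Fraud Detection flow can be sketched as a chain of steps that each enrich a shared state object. Everything here is a stand-in: the project itself is written in TypeScript and calls Gemini at each stage, whereas the rules below (the `rx` check, the `key: value` parsing, the "missing prescriber" flag) are invented purely so the sketch runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DocState:
    """State passed from one pipeline step to the next."""
    text: str
    doc_type: str = "unknown"
    fields: Dict[str, str] = field(default_factory=dict)
    flags: List[str] = field(default_factory=list)


def classify(state: DocState) -> DocState:
    # Placeholder rule; a real step would ask a multimodal model for the type.
    state.doc_type = "prescription" if "rx" in state.text.lower() else "discharge_summary"
    return state


def analyze(state: DocState) -> DocState:
    # Placeholder extraction: treat "key: value" lines as structured fields.
    for line in state.text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            state.fields[key.strip().lower()] = value.strip()
    return state


def detect_fraud(state: DocState) -> DocState:
    # Placeholder check that depends on the outputs of both earlier steps.
    if state.doc_type == "prescription" and "doctor" not in state.fields:
        state.flags.append("missing prescriber")
    return state


PIPELINE: List[Callable[[DocState], DocState]] = [classify, analyze, detect_fraud]


def run(text: str) -> DocState:
    state = DocState(text)
    for step in PIPELINE:
        state = step(state)
    return state


result = run("Rx\nPatient: A. Kumar")
print(result.doc_type, result.fields, result.flags)
```

Ordering matters: the fraud step reads both the document type and the extracted fields, which is why it runs last.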

AI-Powered Image and Video Generation Studio

A "Photoshop Agent" and video generation studio that unifies multiple generative models into a single creative workflow. It handles complex media operations like video trimming and composition directly in the browser.

⚡ Tech Stack & Architecture:

  • Frontend: Next.js 14, React 18, MediaRecorder API.
  • AI Models: Gemini 1.5 Pro, Google Imagen 3, Google Veo (Video Generation).
  • Architecture: Multi-Model AI Studio pattern with hybrid client-side video processing and server-side AI generation.

Real-time 3D AI Interaction

An intelligent live voice agent built on the Gemini-Live-Voice model. The agent uses function calling and tool use to take action on behalf of the user, and is visualized with a reactive 3D avatar.

⚡ Tech Stack & Architecture:

  • Core: TypeScript, React, Three.js (WebGL), Web Audio API.
  • AI: Gemini Live API (Real-time Streaming).
  • Architecture: Client-side audio processing pipeline with bidirectional WebSocket communication.
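
One common step in an audio pipeline like this is converting floating-point samples (the range the Web Audio API works in, -1.0 to 1.0) into 16-bit little-endian PCM before sending them over a WebSocket. The sketch below shows that conversion in Python for illustration; the project does this client-side in TypeScript, and both the function name `float_to_pcm16` and the assumption that 16-bit PCM is the wire format are mine.

```python
import struct
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    out = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against out-of-range input
        out += struct.pack("<h", int(clipped * 32767))
    return bytes(out)


chunk = float_to_pcm16([0.0, 0.5, -0.5, 1.0])
print(len(chunk))  # 2 bytes per sample
```

Clipping before scaling prevents integer overflow when a sample slightly exceeds the nominal range, which can happen after gain or resampling.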

AI Sprite Sheet & GIF Generator

A fun app that generates sprite sheet images using Gemini Nano/Pro models and converts them to animated GIFs client-side.

⚡ Tech Stack & Architecture:

  • Core: Next.js, HTML5 Canvas API, Gifshot.js.
  • AI: Gemini Pro Vision, Gemini Nano.
  • Architecture: Client-Side Animation Pipeline where heavy frame processing is offloaded to the browser via Canvas API.
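
The frame-processing step boils down to slicing a sprite sheet into equal cells. The sketch below computes the source rectangle for each frame, assuming a uniform grid; the function name `frame_rects` and the grid layout are assumptions. Each tuple matches the `(sx, sy, sWidth, sHeight)` source arguments of the Canvas `drawImage` call.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def frame_rects(sheet_w: int, sheet_h: int, cols: int, rows: int) -> List[Rect]:
    """Return frame rectangles in row-major order for a uniform sprite grid."""
    frame_w, frame_h = sheet_w // cols, sheet_h // rows
    return [
        (col * frame_w, row * frame_h, frame_w, frame_h)
        for row in range(rows)
        for col in range(cols)
    ]


rects = frame_rects(256, 128, cols=4, rows=2)
print(len(rects), rects[0], rects[4])
```

Row-major order means the frames play left to right, then top to bottom, which is the usual reading order for sprite sheets.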

Animated GIFs

Trend-to-Content Intelligence

An API-driven application designed to bridge the gap between raw trend data and actionable podcast content. It combines insights from global Google Trends and internal podcast analytics.

⚡ Tech Stack & Architecture:

  • Backend: Python 3.12, FastAPI, Pydantic.
  • Data: Google BigQuery (Trend Analysis), Elasticsearch 8.x (Analytics).
  • Architecture: Async Microservices orchestration leveraging BigQuery for large-scale trend aggregation.

Social Media Campaign Tracker

A social media analytics tool that scrapes user comments from platforms like Instagram, YouTube, and Facebook to analyze sentiment trends and track influencer campaign effectiveness.

⚡ Tech Stack & Architecture:

  • Core: Next.js 14, React 18, Tailwind CSS.
  • AI: Gemini 1.5 Flash (Sentiment Classification).
  • Infrastructure: Google Cloud Run, Cloud Build (CI/CD).
  • Architecture: Serverless Microservices pattern deployed via Cloud Build.

Pinned

  1. google-adk (Public)

    Playground repo for building agents with Google ADK 🤖

    Python

  2. kokoro-test (Public)

    Python

  3. podcastfy-api (Public)

    API for the Podcastfy library

    HTML