Rafay is a researcher, AI enthusiast, software engineer, content creator, and advocate for AI EdTech.
- AdaptiveCloset: Reinforcement Learning in Personalized Clothing Recommendations
- Deciphering Faces: Enhancing Emotion Detection with Machine Learning Techniques
- Development of a Smart Shoe for Blind and Visually Impaired People
- Customized Learning for ADHD: An AI-Driven Assistive Study App
- Robotrolley: Customer Following Trolley (CFT)
[Open Source] Language, Decoded: Exploring the Impact of Native-Language Code on Multilingual Models
Legesher lets developers write code in their native language by replacing English Python keywords with translated equivalents in 52 languages. The premise: programming shouldn't require knowing English.
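The following is a minimal sketch of the keyword-swapping idea. It assumes a token-level approach built on Python's standard `tokenize` module and is not Legesher's implementation. The `ES_KEYWORDS` Spanish map is a hypothetical, partial vocabulary chosen only for illustration. Because the sketch works on tokens rather than raw text, identifiers such as `return_value`, string literals, and comments stay untouched.

```python
import io
import keyword
import tokenize

# Hypothetical, partial English -> Spanish keyword map for illustration only;
# these are NOT Legesher's actual translations.
ES_KEYWORDS = {
    "def": "definir",
    "return": "devolver",
    "if": "si",
    "else": "sino",
    "for": "para",
    "in": "en",
}


def translate_keywords(source: str, mapping: dict[str, str]) -> str:
    """Swap Python keywords for translated equivalents.

    Only NAME tokens that are real keywords are replaced, so identifiers,
    strings, and comments are left exactly as written.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    out = []
    for tok in tokens:
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            # Keywords missing from the map (e.g. None) are kept in English.
            tok = tok._replace(string=mapping.get(tok.string, tok.string))
        out.append(tok)
    # untokenize reuses the original token positions, preserving layout.
    return tokenize.untokenize(out)


print(translate_keywords("def f(xs):\n    for x in xs:\n        return x\n", ES_KEYWORDS))
```

`untokenize` spaces tokens using their original positions. As a result, indentation and spacing survive even when a translated keyword is longer than the English one.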
In this project, we ask whether exposing large language models to Legesher-style native-language code during fine-tuning improves their multilingual reasoning. Using CohereLabs/tiny-aya-base (3.35B parameters) with 4-bit QLoRA, we fine-tuned on datasets in which Python keywords were translated into Chinese, Urdu, and Spanish, then evaluated on the XNLI, CSQA, and MGSM benchmarks across languages.
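The 4-bit setting matters mainly for memory. Below is back-of-envelope arithmetic for the weights of a 3.35B-parameter model, plus the standard LoRA count of trainable parameters for one adapted matrix. This is a simplified sketch. It ignores block-wise quantization constants, layers kept in higher precision, activations, and optimizer state. The 2048 hidden size in the usage lines is a hypothetical value, not tiny-aya-base's actual configuration.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights alone, in GiB (2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA learns W + B @ A with A: (rank x d_in) and B: (d_out x rank),
    so one adapted matrix adds rank * (d_in + d_out) trainable parameters."""
    return rank * (d_in + d_out)


n = 3.35e9  # tiny-aya-base parameter count quoted in the text
print(f"16-bit weights: {weight_memory_gib(n, 16):.2f} GiB")  # 6.24 GiB
print(f" 4-bit weights: {weight_memory_gib(n, 4):.2f} GiB")   # 1.56 GiB
# Hypothetical 2048 x 2048 projection adapted with rank 16:
print(lora_trainable_params(2048, 2048, 16))  # 65536
```

The frozen base weights shrink roughly fourfold. The only trainable parameters are the small low-rank adapters.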
Produced as part of Cohere's Expedition Tiny Aya program (March 2026), in collaboration with Legesher and Grayhat.
Research replicating and extending simulation-based evaluation frameworks for Explainable AI (XAI). Implements the XAIsim2real pipeline (HCOMP/AAAI-22) to compare explanation properties (faithfulness, sparsity, and robustness) across synthetic human proxy models and target functions.
Extends the Chen et al. (NeurIPS 2021) framework with forward-simulation and data-bug detection experiments, benchmarking SHAP and LIME across multiple tabular datasets (UCI Adult, Credit Default, Diabetes, Bank Marketing). Introduces a novel LLM-as-Cognitive-Proxy experiment that replaces statistical agents with an LLM required to verbalize its causal reasoning, measuring not just prediction accuracy but also reasoning quality, causal correctness, and uncertainty calibration against SHAP ground truth.
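One plausible way to score an LLM proxy against SHAP ground truth is sketched below. It is an assumption, not the project's actual metrics. Causal correctness is measured as the Jaccard overlap between the features the LLM cites and SHAP's top-k features. Calibration is measured with the Brier score over the LLM's stated confidences. The function names are hypothetical, and the SHAP values are invented toy numbers, not results.

```python
def top_k_features(attributions: dict[str, float], k: int) -> set[str]:
    """Return the k features with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
    return set(ranked[:k])


def causal_agreement(cited: set[str], shap_values: dict[str, float], k: int) -> float:
    """Jaccard overlap between features the LLM cites and SHAP's top-k features."""
    truth = top_k_features(shap_values, k)
    union = cited | truth
    return len(cited & truth) / len(union) if union else 1.0


def brier_score(confidences: list[float], correct: list[bool]) -> float:
    """Mean squared gap between stated confidence and actual correctness; lower is better."""
    if len(confidences) != len(correct) or not correct:
        raise ValueError("need exactly one confidence per prediction")
    return sum((p - float(y)) ** 2 for p, y in zip(confidences, correct)) / len(correct)


# Toy SHAP values for one UCI Adult-style instance (invented for illustration).
shap_values = {"age": 0.30, "education_num": -0.25, "hours_per_week": 0.05, "capital_gain": 0.40}
print(causal_agreement({"capital_gain", "education_num"}, shap_values, k=2))  # 0.333...
print(brier_score([0.9, 0.6], [True, False]))  # ~0.185
```

Ranking by absolute value treats strong negative attributions as equally important causes. This fits a "which features drove the prediction" question but ignores direction.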
A voice-first AI system built on the LiveKit Agents framework that uses multiple LLM-backed agents within a shared real-time audio session. The architecture uses a two-stage agent pipeline: a ModeratorAgent (OpenAI gpt-4o-mini + Deepgram nova-3 STT) that onboards the presenter via natural conversation and triggers a tool call to hand off control, and a PresentationAgent (OpenAI Realtime API) that dynamically switches between an expert persona and a beginner persona using function tools and shared RunContext state.
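The two-stage handoff can be illustrated with a framework-free sketch in plain Python. It does not use or reproduce the LiveKit Agents API. `SessionState` stands in for the shared RunContext state, and the method names are hypothetical. The moderator's tool records what onboarding learned, then returns the next agent to signal the handoff. The presentation agent's tool flips the persona in shared state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionState:
    """Shared state visible to every agent in the session (hypothetical stand-in)."""
    presenter_name: Optional[str] = None
    topic: Optional[str] = None
    persona: str = "expert"


class PresentationAgent:
    PERSONAS = ("expert", "beginner")

    def __init__(self, state: SessionState) -> None:
        self.state = state

    def switch_persona(self, persona: str) -> str:
        """Would be exposed to the LLM as a function tool."""
        if persona not in self.PERSONAS:
            raise ValueError(f"unknown persona: {persona!r}")
        self.state.persona = persona
        return f"Switched to the {persona} persona."

    def instructions(self) -> str:
        """Prompt rebuilt from shared state, so a persona switch takes effect next turn."""
        return (f"Persona: {self.state.persona}. Topic: {self.state.topic}. "
                f"Presenter: {self.state.presenter_name}.")


class ModeratorAgent:
    def __init__(self, state: SessionState) -> None:
        self.state = state

    def start_presentation(self, presenter_name: str, topic: str) -> PresentationAgent:
        """Tool called once onboarding is done: store what was learned,
        then return the next agent to hand off control."""
        self.state.presenter_name = presenter_name
        self.state.topic = topic
        return PresentationAgent(self.state)


state = SessionState()
active = ModeratorAgent(state).start_presentation("Sara", "attention in transformers")
active.switch_persona("beginner")
print(active.instructions())
```

Both agents hold a reference to the same state object. Nothing the moderator learned has to be re-asked after the handoff.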
Benchmarked cultural competence of South Asian multilingual LLMs across 7 Indic languages using two evaluation frameworks:
- DOSA — surface artifact recall
- MILU — ~79K MCQs from Indian competitive exams
Compared three Tiny Aya variants (South Asian, African, and global post-training) to isolate the effect of regional post-training data on cultural reasoning. Found that South Asian-focused post-training (Fire) significantly outperformed both baselines on 5/7 languages (p < 0.05), and surpassed Aya-23-35B — a 4× larger model — on three low-resource languages, demonstrating that targeted post-training can offset scale disadvantages in culturally situated tasks.
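The text reports p < 0.05 without naming the statistical test. McNemar's test is one common choice for two models scored on the same multiple-choice items. The sketch below assumes that test, with a continuity correction, and should not be read as the test actually used. It counts only discordant items, where exactly one model is correct. The per-item data in the usage lines is hypothetical.

```python
import math


def mcnemar_p_value(a_correct: list[bool], b_correct: list[bool]) -> float:
    """Two-sided McNemar test (continuity-corrected) for two models answering
    the same items. Only items where the models disagree carry information."""
    if len(a_correct) != len(b_correct):
        raise ValueError("both models must be scored on the same items")
    only_a = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)
    only_b = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)
    if only_a + only_b == 0:
        return 1.0  # the models agree on every item
    chi2 = max(abs(only_a - only_b) - 1, 0) ** 2 / (only_a + only_b)
    # Upper tail of a chi-square with 1 degree of freedom: P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical per-item results on 90 MCQs for a regional variant (A) and a baseline (B).
a = [True] * 30 + [False] * 10 + [True] * 50
b = [False] * 30 + [True] * 10 + [True] * 50
print(round(mcnemar_p_value(a, b), 4))
```

The 50 items both models answer correctly do not affect the statistic. This is why a paired test suits same-item benchmark comparisons better than comparing two raw accuracies.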
- Community Lead, EdTech — Cohere For AI, a non-profit open science organization with 3,000+ members from 100+ countries
- Research Mentor — TopMate, helping students navigate their higher education journey
Language Models & Training
- LLM interpretability & explainability
- Pre-training and fine-tuning pipelines
- Low-resource NLP with cross-lingual transfer and data augmentation
Agents & Interaction
- Dialogue systems with intent classification, slot filling, and retrieval-augmented response generation
- LLM-based agent frameworks for multi-turn human-AI interaction
- Adaptive learning systems
Vision & Generation
- Object detection and image classification/segmentation
- Text-to-image generation using latent diffusion models

