Welcome to my GitHub profile! I am a Data Scientist and Machine Learning Engineer with over 10 years of experience in software development and more than 6 years in machine learning. I currently work at the Court of Justice, where I develop Generative AI (GenAI) and retrieval-augmented generation (RAG) systems for document summarization, question answering (QA), and legal text simplification.
- Programming Languages: Python, SQL
- Data Science & Machine Learning: pandas, NumPy, scikit-learn, TensorFlow, PyTorch, Keras, Hugging Face, spaCy
- NLP & LLMs: 6+ years of experience working with Natural Language Processing (NLP) and Large Language Models (LLMs), including fine-tuning models like Mistral and Llama for legal and business applications
- Cloud Platforms & DevOps: Google Cloud Platform (GCP), Vertex AI, BigQuery, Docker, Kubernetes
- MLOps & Deployment: Expertise in building and managing ML pipelines using MLOps principles, including model serving and orchestration with Kubernetes and Vertex AI
- Legal AI Systems: Building AI-driven tools to simplify judicial text, generate case summaries, and surface insights that support decision-making
I have extensive experience contributing to cutting-edge AI projects in the legal tech space, helping organizations build smarter, more efficient processes. I've led projects that involve everything from data preparation and annotation to model deployment and API integration, following best practices in MLOps.
Recent projects include:
- GenAI and RAG systems: Developing advanced generative models and retrieval-augmented generation systems to summarize legal documents, generate reports, and simplify complex legal language.
- Automated Legal Summarization Systems: Implementing tools to extract meaningful insights from court case documents using NLP techniques and LangChain.
- Fine-tuning LLMs: Continuously experimenting with Supervised Fine-Tuning (SFT) on models like Mistral and Llama to adapt them to specific legal and business contexts.
I'm always learning new tools and technologies to stay on top of the rapidly evolving data engineering landscape. Currently, I'm exploring LangGraph, CrewAI, and Ollama.
- LinkedIn: Roberto Aragy
- GitHub: @aragy
I'm always open to new challenges, so feel free to connect and explore collaboration opportunities!