An End-to-End Evaluation Framework for Entity Resolution Systems (Python, updated Dec 3, 2023)
Model Context Protocol Benchmark Runner
A metrics library for evaluating vision-language models within the PyTorch ecosystem.
An open-source Streamlit web app to generate beautiful confusion matrices for multi-class machine learning models. Supports numeric and string labels, CSV upload, manual label entry, custom color maps, and displays evaluation metrics like Accuracy, Precision, Recall, and F1-score. Users can download the confusion matrix as an image.
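As a rough illustration of the arithmetic behind such an app (the function names below are illustrative, not the app's API), a multi-class confusion matrix and the per-class metrics it displays can be computed in a few lines:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) label pairs into a labels x labels matrix."""
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m

def per_class_metrics(m, labels):
    """Precision, recall, and F1 per class, derived from the matrix."""
    out = {}
    for i, lab in enumerate(labels):
        tp = m[i][i]                                   # diagonal: correct predictions
        fp = sum(m[r][i] for r in range(len(labels))) - tp  # column minus diagonal
        fn = sum(m[i]) - tp                            # row minus diagonal
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[lab] = {"precision": prec, "recall": rec, "f1": f1}
    return out
```

Both numeric and string labels work unchanged, since labels are only used as dictionary keys.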
Safety-first legal NLP system with hierarchical long-document processing, deterministic inference, clause extraction, and rule-based risk engine — built for traceability and deployment constraints.
Enterprise-grade machine learning observability platform that detects data drift, concept drift, and performance degradation in production models. Features statistical drift detection (KS test, PSI), real-time alerting, Redis caching, and FastAPI backend.
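Of the two statistical drift tests mentioned, PSI is simple enough to sketch in plain Python. The binning scheme and the 0.1/0.25 thresholds in the comment are common conventions, not necessarily what this platform uses:

```python
import math

def psi(expected, actual, breakpoints):
    """Population Stability Index between a baseline and a production sample.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift.
    """
    def fractions(values):
        counts = [0] * (len(breakpoints) + 1)
        for v in values:
            counts[sum(1 for b in breakpoints if v > b)] += 1
        eps = 1e-6  # floor to avoid log(0) on empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical distributions score 0; a shifted production sample pushes the index well past the drift thresholds.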
A small, educational project showing how to build a **minimal RAG pipeline** with a **simple evaluation loop**.
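One way such an evaluation loop might look, with a toy word-overlap retriever and a top-k hit-rate metric (both purely illustrative, not taken from the project):

```python
def retrieve(query, docs, k=2):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def hit_rate(eval_set, docs, k=2):
    """Fraction of (query, gold_doc) pairs whose gold document is retrieved in the top k."""
    hits = sum(1 for query, gold in eval_set if gold in retrieve(query, docs, k))
    return hits / len(eval_set)
```

A real pipeline would swap in embedding-based retrieval and a generation step, but the loop structure — run each eval query through the pipeline, score it against a gold answer, aggregate — stays the same.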