A novel approach for an informed selection of suitable ML metrics for classification and regression tasks
In ML/DL, standard evaluation metrics are often selected for the training, evaluation, and/or monitoring of models. The choice of specific metrics is mostly ad hoc and neither questioned nor justified, so task- or dataset-specific requirements are not sufficiently taken into account. The choice of ML metrics not only shapes the conclusions drawn about a model's predictive performance; it consequently also affects the actual performance, following the principle: "You can only improve what you measure."
This is a research project with the goal of helping ML/DL practitioners to select metrics appropriate for their tasks or datasets.
```bash
# Clone the repository
git clone

# Install dependencies specified in pyproject.toml
pip install .
```

The simplest way to use the tool is via the command line:

```bash
python main.py --csv_path data/matrix.csv --out_path ./output
```

This will:
- Load your metric-property matrix with positive examples only from the CSV file
- Automatically generate training data by systematically flipping individual property values
- Save the complete training data to `output/training_data_*.csv`
- Train a base decision tree classifier on the generated training data
- Save the decision tree visualization as `base_decision_tree_*.pdf` in the output directory
- Generate the metric decision tree visualization as `metric_decision_tree_*.pdf` in the output directory
- `--csv_path` (required): Path to your metric-property matrix CSV file (containing positive examples only)
- `--out_path` (required): Output directory where the training data CSV and decision tree PDFs will be saved
- `--max_depth` (optional): Maximum depth of the decision tree (default: 8)
- `--min_samples_leaf` (optional): Minimum number of samples required at a leaf node (default: 4)
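These flags map onto a standard `argparse` interface. As a rough sketch of how they could be parsed (this is not the repository's actual `main.py`, only an illustration of the documented flags and defaults):

```python
import argparse

def build_parser():
    # Mirrors the documented CLI flags; defaults as stated in the README.
    p = argparse.ArgumentParser(description="Informed ML metric selection")
    p.add_argument("--csv_path", required=True,
                   help="Metric-property matrix CSV (positive examples only)")
    p.add_argument("--out_path", required=True,
                   help="Output directory for training data CSV and PDFs")
    p.add_argument("--max_depth", type=int, default=8,
                   help="Maximum depth of the decision tree")
    p.add_argument("--min_samples_leaf", type=int, default=4,
                   help="Minimum number of samples at a leaf node")
    return p

args = build_parser().parse_args(
    ["--csv_path", "data/matrix.csv", "--out_path", "./output"]
)
```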
The metric-property matrix CSV file should contain only positive examples (suitable metrics). Negative examples will be automatically generated.
- Separator: Semicolon (`;`)
- First row (header): Column names
  - First column: `metric`
  - Middle columns: Property names (e.g., `Multiclass_capable`, `Sensitive_to_outliers`, ...)
  - Last column: `suitable` (must always be `1` for positive examples)
- Data rows (positive examples only):
  - First column: `metric` - Name of the metric
  - Property columns: Binary values indicating metric properties
    - `1` = property applies to the metric
    - `0` = property does not apply to the metric
  - Last column: `suitable` - Must be `1` (these are positive examples)
```
metric;property1;property2;property3;property4;suitable
MAE;0;1;0;1;1
RMSE;1;1;0;1;1
MSE;1;1;0;1;1
MAPE;1;0;0;1;1
```

The tool automatically generates negative training examples using the following strategy:
- For each positive example (suitable metric), the tool creates multiple negative examples
- Each negative example is generated by flipping exactly one property value (0→1 or 1→0)
- If a metric has multiple positive examples (variants), negative examples are generated for each variant
- Duplicates are automatically removed (if flipping a property creates an existing positive example)
Example: If MAE has properties `[0,1,0,1]`, the tool generates the negative examples:

- `[1,1,0,1]` (first property flipped)
- `[0,0,0,1]` (second property flipped)
- `[0,1,1,1]` (third property flipped)
- `[0,1,0,0]` (fourth property flipped)
This systematic approach ensures comprehensive coverage of unsuitable metric configurations.
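The strategy above can be sketched in a few lines of pandas. This is an illustrative reimplementation, not the project's `generate_training_data`; the column names follow the example matrix:

```python
import io
import pandas as pd

# Two positive examples in the documented CSV format (semicolon-separated).
csv_text = """metric;property1;property2;property3;property4;suitable
MAE;0;1;0;1;1
RMSE;1;1;0;1;1
"""
positive = pd.read_csv(io.StringIO(csv_text), sep=";")

def flip_negatives(df):
    """For every positive example, flip each property value once; drop any
    flipped row whose properties coincide with an existing positive example."""
    props = list(df.columns[1:-1])
    seen = {tuple(r) for r in df[props].itertuples(index=False)}
    rows = []
    for _, row in df.iterrows():
        for p in props:
            neg = row.copy()
            neg[p] = 1 - neg[p]                # flip exactly one property
            if tuple(neg[props]) not in seen:  # skip existing positives
                neg["suitable"] = 0
                rows.append(neg)
    return pd.DataFrame(rows).drop_duplicates().reset_index(drop=True)

negatives = flip_negatives(positive)
# MAE's flip [1,1,0,1] collides with RMSE's positive row and is dropped,
# so each metric contributes 3 of its 4 flipped rows here.
```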
The code is organized into modular functions:
- `generate_training_data(data, save_csv, output_dir)`: Generates negative training examples
  - Reads the metric-property matrix from CSV or DataFrame
  - Systematically creates negative examples by flipping individual property values
  - Removes duplicates (negative examples that match existing positive examples)
  - Saves the complete training data (positive + negative) to CSV
  - Returns: Path to the saved CSV file (or a DataFrame if `save_csv=False`)
- `load_training_data(data)`: Loads and preprocesses the training data
  - Reads the complete training data CSV or DataFrame (generated by `generate_training_data`)
  - Extracts the target variable `suitable`
  - Removes the non-numeric column with metric names (`metric`)
  - Returns: Feature matrix `X` (property values) and target variable `y` (suitability)
- `train_decision_tree(X, y, max_depth, min_samples_leaf)`: Trains the decision tree classifier
  - Uses the entropy criterion for information gain
  - Applies balanced class weights to handle imbalanced data
  - Configurable tree depth and leaf size
  - Returns: Trained classifier object
- `generate_base_decision_tree(clf, feature_names, out_path)`: Exports the decision tree as a PDF vector graphic
  - Color-coded nodes for better readability
  - Shows property names at decision nodes
  - Displays class distribution at leaf nodes
- `generate_metric_decision_tree(clf, samples_df, feature_names)`: Performs inference for the metrics and their properties with the trained DT classifier and prunes empty nodes
  - Returns: Decision tree for metric recommendation as a Digraph object
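The training and export steps presumably wrap scikit-learn. A minimal, self-contained sketch of what `train_decision_tree` and `generate_base_decision_tree` are described to do (the toy data and `min_samples_leaf=1` are chosen for illustration only; the CLI defaults are 8 and 4):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Toy feature matrix X (property values) and target y (`suitable`),
# shaped like the output described for load_training_data.
data = pd.DataFrame({
    "Multiclass_capable":    [1, 1, 0, 0],
    "Sensitive_to_outliers": [1, 0, 1, 0],
    "suitable":              [1, 1, 0, 0],
})
X = data.drop(columns="suitable")
y = data["suitable"]

# Entropy criterion and balanced class weights, as described above.
clf = DecisionTreeClassifier(
    criterion="entropy",
    class_weight="balanced",
    max_depth=8,
    min_samples_leaf=1,  # CLI default is 4; 1 suits this tiny example
    random_state=0,
)
clf.fit(X, y)

# Export Graphviz DOT source; rendering it to PDF would yield a figure
# like base_decision_tree_*.pdf, with property names at decision nodes.
dot = export_graphviz(
    clf,
    out_file=None,
    feature_names=list(X.columns),
    class_names=["unsuitable", "suitable"],
    filled=True,
)
```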
The tool generates two main outputs:
The training data CSV contains the complete training dataset with:

- All positive examples from the input file (with `suitable=1`)
- All automatically generated negative examples (with `suitable=0`)
- Rows organized by metric in the original input order
The base decision tree visualization is a high-resolution PDF file showing:
- Decision nodes: Property-based splitting criteria (e.g., "Multiclass_capable <= 0.5")
- Leaf nodes: Final classification (suitable vs. unsuitable)
- Node information: Entropy, samples, and class distribution
- Color-coding: Visual distinction between suitable and unsuitable metrics
The decision tree helps you understand which combination of metric properties leads to suitable or unsuitable metrics for your specific use case.
The metric decision tree visualization shows the final decision tree for metric recommendation.
- Decision nodes: Ask about desired characteristics.
- Leaf nodes: Collection of suitable metrics for the chosen path.
This repository accompanies our research paper submitted to QualITA 2026. The paper will be referenced here upon publication.
We welcome contributions from the community! As discussed in the workshop paper, our evaluation naturally could not cover every metric and property one might think of, and it is currently limited to classification and regression tasks. If a metric or property you consider important is missing, you can contribute to the project as follows. New features, such as improved tree visualization methods, are also welcome.
- Fork the repository
- Create a feature branch, e.g., `git checkout -b feature/new-feature`
- Make your changes: for example, update the metric-property matrix and re-generate the decision trees
- Commit your changes: `git commit -m 'Add new metric and properties'`
- Push to the branch: `git push origin feature/new-feature`
- Open a Pull Request
If you use this software in your research, please cite our paper:
TBD
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.