Feature Election for NVIDIA FLARE

A plug-and-play horizontal federated feature selection framework for tabular datasets in NVIDIA FLARE.

Overview

This work originates from FLASH (A Framework for Federated Learning with Attribute Selection and Hyperparameter Optimization), presented at the IEEE FLTA 2025 conference, where it received the Best Student Paper Award.

Feature Election enables multiple clients with tabular datasets to collaboratively identify the most relevant features without sharing raw data. Each client runs a conventional feature selection algorithm on its own data, and the server combines the per-client results through weighted aggregation.
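
The weighted aggregation step can be illustrated with a minimal sketch. It assumes that each client reports one score per feature and that clients are weighted by their sample count; both the weighting scheme and the names `aggregate_scores`, `client_scores`, and `client_sizes` are assumptions made for illustration, not the library's API.

```python
from typing import Dict, List


def aggregate_scores(
    client_scores: Dict[str, List[float]],
    client_sizes: Dict[str, int],
) -> List[float]:
    """Average per-feature scores across clients, weighted by sample count."""
    total = sum(client_sizes.values())
    num_features = len(next(iter(client_scores.values())))
    aggregated = [0.0] * num_features
    for client, scores in client_scores.items():
        # Assumption: a client's influence is proportional to its data size.
        weight = client_sizes[client] / total
        for i, score in enumerate(scores):
            aggregated[i] += weight * score
    return aggregated


# Hypothetical scores for three features reported by two sites.
scores = {
    "site_a": [1.0, 0.0, 0.5],
    "site_b": [0.0, 1.0, 0.5],
}
sizes = {"site_a": 300, "site_b": 100}
global_scores = aggregate_scores(scores, sizes)
print(global_scores)
```

Only the score vectors and sample counts cross the client boundary in this sketch, which mirrors the idea that raw data never leaves a site.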

FLASH is available on GitHub

Citation

If you use Feature Election in your research, please cite the FLASH framework paper:

IEEE Style:

I. Christofilogiannis, G. Valavanis, A. Shevtsov, I. Lamprou and S. Ioannidis, "FLASH: A Framework for Federated Learning with Attribute Selection and Hyperparameter Optimization," 2025 3rd International Conference on Federated Learning Technologies and Applications (FLTA), Dubrovnik, Croatia, 2025, pp. 93-100, doi: 10.1109/FLTA67013.2025.11336571.

BibTeX:

@INPROCEEDINGS{11336571,
  author={Christofilogiannis, Ioannis and Valavanis, Georgios and Shevtsov, Alexander and Lamprou, Ioannis and Ioannidis, Sotiris},
  booktitle={2025 3rd International Conference on Federated Learning Technologies and Applications (FLTA)}, 
  title={FLASH: A Framework for Federated Learning with Attribute Selection and Hyperparameter Optimization}, 
  year={2025},
  pages={93-100},
  doi={10.1109/FLTA67013.2025.11336571}
}

Key Features

Easy Integration: Simple API for tabular datasets (pandas, numpy)
Multiple Feature Selection Methods: Lasso, Elastic Net, Mutual Information, Random Forest, PyImpetus, and more
Flexible Aggregation: Configurable freedom degree (0 = intersection, 1 = union, values in between = weighted voting)
Auto-tuning: Automatic optimization of freedom degree using hill-climbing
Multi-phase Workflow: Local FS → Feature Election with tuning → FL Aggregation
Privacy-Preserving: Only feature selections and scores are shared, not raw data
Production-Ready: Fully compatible with NVIDIA FLARE workflows
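
To make the freedom degree concrete, here is a minimal sketch of one plausible reading of the rule listed above. Features chosen by every client are always kept, and the freedom degree sets the fraction of the remaining "contested" features (chosen by some clients but not all) that are admitted, best aggregated score first. The function name `elect_features`, the use of a ceiling when counting admitted features, and the example data are assumptions for illustration; the actual election logic in the library may differ.

```python
import math
from typing import List


def elect_features(
    client_masks: List[List[bool]],
    global_scores: List[float],
    freedom_degree: float,
) -> List[bool]:
    """Return a global mask: intersection plus a share of contested features."""
    if not 0.0 <= freedom_degree <= 1.0:
        raise ValueError("freedom_degree must be in [0, 1]")
    num_features = len(global_scores)
    intersection = [all(m[i] for m in client_masks) for i in range(num_features)]
    union = [any(m[i] for m in client_masks) for i in range(num_features)]
    # Contested features were selected by at least one client, but not by all.
    contested = [i for i in range(num_features) if union[i] and not intersection[i]]
    # Rank contested features by aggregated score, highest first.
    contested.sort(key=lambda i: global_scores[i], reverse=True)
    num_extra = math.ceil(freedom_degree * len(contested))
    elected = list(intersection)
    for i in contested[:num_extra]:
        elected[i] = True
    return elected


# Hypothetical selections from two clients over four features.
masks = [
    [True, True, False, False],
    [True, False, True, False],
]
agg_scores = [0.9, 0.2, 0.6, 0.1]

mask_intersection = elect_features(masks, agg_scores, 0.0)
mask_half = elect_features(masks, agg_scores, 0.5)
mask_union = elect_features(masks, agg_scores, 1.0)
print(mask_intersection, mask_half, mask_union)
```

With a freedom degree of 0 the result is the strict intersection, with 1 it is the union, and intermediate values admit progressively more contested features.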

Optional Dependencies

scikit-learn ≥ 1.0 is required for most feature selection methods
→ automatically installed with pip install nvflare
PyImpetus ≥ 0.0.6 is optional (enables advanced permutation importance methods)
→ install manually if needed:

pip install PyImpetus

Quick Start

Basic Usage

from nvflare.app_opt.feature_election import quick_election
import pandas as pd

# Load your tabular dataset
df = pd.read_csv("your_data.csv")

# Run feature election (simulation mode)
selected_mask, stats = quick_election(
    df=df,
    target_col='target',
    num_clients=4,
    fs_method='lasso',
)

# Get selected features (the mask covers every column except the target)
selected_features = df.columns.drop('target')[selected_mask]
print(f"Selected {len(selected_features)} features: {list(selected_features)}")
print(f"Freedom degree: {stats['freedom_degree']}")

Custom Configuration

from nvflare.app_opt.feature_election import FeatureElection

# Initialize with custom parameters
fe = FeatureElection(
    freedom_degree=0.6,
    fs_method='elastic_net',
    aggregation_mode='weighted',
    auto_tune=True,
    tuning_rounds=5
)

# Prepare data splits for clients (df is the DataFrame loaded in Basic Usage)
client_data = fe.prepare_data_splits(
    df=df,
    target_col='target',
    num_clients=5,
    split_strategy='stratified'  # or 'random', 'sequential', 'dirichlet'
)

# Run simulation
stats = fe.simulate_election(client_data)

# Access selected features
selected_features = fe.selected_feature_names
print(f"Selected {stats['num_features_selected']} features")
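
The split strategies determine how the simulated clients differ from one another. As a rough illustration of a non-IID split, the sketch below partitions sample indices with per-class Dirichlet proportions, which is one common interpretation of a 'dirichlet' strategy. The function `dirichlet_split` and its `alpha` and `seed` parameters are hypothetical and are not taken from the library.

```python
import random
from typing import Dict, List, Sequence


def dirichlet_split(
    labels: Sequence[int],
    num_clients: int,
    alpha: float = 0.5,
    seed: int = 0,
) -> List[List[int]]:
    """Assign sample indices to clients using per-class Dirichlet proportions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    client_indices: List[List[int]] = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a set of independent Gamma draws, normalized.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        start = 0
        cumulative = 0.0
        for client, draw in enumerate(draws):
            cumulative += draw / total
            if client == num_clients - 1:
                end = len(indices)  # the last client takes the remainder
            else:
                end = round(cumulative * len(indices))
            client_indices[client].extend(indices[start:end])
            start = end
    return client_indices


# Hypothetical balanced binary labels for 100 samples.
example_labels = [0] * 50 + [1] * 50
parts = dirichlet_split(example_labels, num_clients=4)
print([len(p) for p in parts])
```

Smaller `alpha` values concentrate each class on fewer clients, which produces more skewed local label distributions; larger values approach an even split.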

Workflow Architecture

The Feature Election workflow consists of three phases:

┌─────────────────────────────────────────────────────────────────┐
│                    PHASE 1: Local Feature Selection             │
│  Clients perform local FS using configured method (lasso, etc.) │
│  → Each client sends: selected_features, feature_scores         │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│              PHASE 2: Tuning & Global Mask Generation           │
│  If auto_tune=True: Hill-climbing to find optimal freedom_degree│
│  → Aggregates selections using weighted voting                  │
│  → Distributes global feature mask to all clients               │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                 PHASE 3: FL Aggregation (Training)              │
│  Standard FedAvg training on reduced feature set                │
│  → num_rounds of federated training                             │
└─────────────────────────────────────────────────────────────────┘
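
The hill-climbing search in Phase 2 can be sketched as follows. This is a generic hill-climbing loop under stated assumptions, not the library's tuner: `evaluate` stands in for whatever score the federation computes for a candidate freedom degree, and the starting point, step size, and step-halving rule are illustrative choices.

```python
from typing import Callable, Tuple


def tune_freedom_degree(
    evaluate: Callable[[float], float],
    start: float = 0.5,
    step: float = 0.1,
    rounds: int = 5,
) -> Tuple[float, float]:
    """Hill-climb over freedom_degree in [0, 1], maximizing evaluate()."""
    best_fd = start
    best_score = evaluate(best_fd)
    for _ in range(rounds):
        improved = False
        # Try one step below and one step above the current best value.
        for candidate in (best_fd - step, best_fd + step):
            candidate = min(1.0, max(0.0, candidate))
            score = evaluate(candidate)
            if score > best_score:
                best_fd, best_score = candidate, score
                improved = True
        if not improved:
            step /= 2  # no neighbor was better, so search more finely
    return best_fd, best_score


# Hypothetical objective that peaks at a freedom degree of 0.7.
def toy_objective(fd: float) -> float:
    return -((fd - 0.7) ** 2)


best_fd, best_score = tune_freedom_degree(toy_objective)
print(round(best_fd, 3), best_score)
```

In a real deployment each call to `evaluate` would involve a round of communication with the clients, which is why the number of tuning rounds is kept small.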

NVIDIA FLARE Deployment

1. Generate Configuration Files

from nvflare.app_opt.feature_election import FeatureElection

fe = FeatureElection(
    freedom_degree=0.5,
    fs_method='lasso',
    aggregation_mode='weighted',
    auto_tune=True,
    tuning_rounds=4
)

# Generate FLARE job configuration
config_paths = fe.create_flare_job(
    job_name="feature_selection_job",
    output_dir="./jobs/feature_selection",
    min_clients=2,
    num_rounds=5,
    client_sites=['hospital_1', 'hospital_2', 'hospital_3']
)

2. Prepare Client Data

Each client loads its local data and passes it to the executor:

from nvflare.app_opt.feature_election import FeatureElectionExecutor

# In your client script
executor = FeatureElectionExecutor(
    fs_method='lasso',
    eval_metric='f1'
)

# Load and set client data.
# load_client_data() is a placeholder for your own loading logic; it should
# provide the training features, the labels, and the feature names.
X_train, y_train, feature_names = load_client_data()
executor.set_data(X_train, y_train, feature_names=feature_names)

3. Submit FLARE Job

nvflare job submit -j ./jobs/feature_selection

Feature Selection Methods

| Method | Description | Best For | Parameters |
|--------|-------------|----------|------------|
| `lasso` | L1 regularization | High-dimensional sparse data | `alpha`, `max_iter` |
| `elastic_net` | L1+L2 regularization | Correlated features | `alpha`, `l1_ratio`, `max_iter` |
| `random_forest` | Tree-based importance | Non-linear relationships | `n_estimators`, `max_depth` |
| `mutual_info` | Information gain | Any data type | `n_neighbors` |
| `pyimpetus` | Permutation importance | Robust feature selection | `p_val_thresh`, `num_sim` |
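
To give a feel for what a local scoring method produces, the sketch below computes mutual information between a discrete feature and a discrete target using only the standard library. It is an illustration of the information-gain idea behind `mutual_info`, not the estimator the library uses: the `n_neighbors` parameter in the table suggests a nearest-neighbor estimator that also handles continuous features, which this simplified version does not. The function name and example data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (xv, yv), c in count_xy.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y).
        mi += (c / n) * math.log(c * n / (count_x[xv] * count_y[yv]))
    return mi


# Hypothetical data: one feature copies the target, the other is unrelated.
target = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
noise = [0, 1, 0, 1]

mi_informative = mutual_information(informative, target)
mi_noise = mutual_information(noise, target)
print(mi_informative, mi_noise)
```

A client could turn such scores into a selection mask by keeping features above a threshold, and then send both the mask and the scores to the server.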

Parameters