# Feature Election for NVIDIA FLARE

A plug-and-play horizontal federated feature selection framework for tabular datasets in NVIDIA FLARE.

## Overview

This work originates from FLASH: A Framework for Federated Learning with Attribute Selection and Hyperparameter Optimization, presented at [FLTA IEEE 2025](https://flta-conference.org/flta-2025/), where it received the Best Student Paper Award.

Feature Election enables multiple clients with tabular datasets to collaboratively identify the most relevant features without sharing raw data. Each client runs a conventional feature selection algorithm locally, and the server performs a weighted aggregation of the clients' results.

FLASH is available on [GitHub](https://github.com/parasecurity/FLASH).
## Citation

If you use Feature Election in your research, please cite the FLASH framework paper:

**IEEE Style:**
> I. Christofilogiannis, G. Valavanis, A. Shevtsov, I. Lamprou and S. Ioannidis, "FLASH: A Framework for Federated Learning with Attribute Selection and Hyperparameter Optimization," 2025 3rd International Conference on Federated Learning Technologies and Applications (FLTA), Dubrovnik, Croatia, 2025, pp. 93-100, doi: 10.1109/FLTA67013.2025.11336571.

**BibTeX:**
```bibtex
@INPROCEEDINGS{11336571,
  author={Christofilogiannis, Ioannis and Valavanis, Georgios and Shevtsov, Alexander and Lamprou, Ioannis and Ioannidis, Sotiris},
  booktitle={2025 3rd International Conference on Federated Learning Technologies and Applications (FLTA)},
  title={FLASH: A Framework for Federated Learning with Attribute Selection and Hyperparameter Optimization},
  year={2025},
  pages={93-100},
  doi={10.1109/FLTA67013.2025.11336571}
}
```

### Key Features

- **Easy Integration**: Simple API for tabular datasets (pandas, NumPy)
- **Multiple Feature Selection Methods**: Lasso, Elastic Net, Mutual Information, Random Forest, PyImpetus, and more
- **Flexible Aggregation**: Configurable freedom degree (0 = intersection, 1 = union, values in between = weighted voting)
- **Auto-tuning**: Automatic optimization of the freedom degree using hill climbing
- **Multi-phase Workflow**: Local feature selection → Feature Election with tuning → FL aggregation
- **Privacy-Preserving**: Only feature selections and scores are shared, never raw data
- **Production-Ready**: Fully compatible with NVIDIA FLARE workflows

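The freedom degree above can be read as a vote threshold. The following is a self-contained toy sketch of that idea, not the library's actual aggregation code: with normalized client weights, a freedom degree of 0 keeps only unanimously selected features, while 1 keeps anything any client selected.

```python
import numpy as np

def elect_features(client_masks, client_weights, freedom_degree):
    """Toy weighted-vote aggregation.

    freedom_degree = 0 keeps only unanimous features (intersection),
    freedom_degree = 1 keeps any feature someone selected (union),
    values in between act as a weighted-vote threshold."""
    masks = np.asarray(client_masks, dtype=float)      # shape (clients, features)
    weights = np.asarray(client_weights, dtype=float)
    weights = weights / weights.sum()                  # normalize client votes
    vote_share = weights @ masks                       # per-feature share in [0, 1]
    threshold = 1.0 - freedom_degree                   # fd=0 -> 1.0, fd=1 -> 0.0
    eps = 1e-9                                         # tolerate float round-off
    return (vote_share >= threshold - eps) & (vote_share > eps)

# Three equally weighted clients voting on four features
masks = [[1, 1, 0, 0],
         [1, 0, 1, 0],
         [1, 1, 0, 0]]
print(elect_features(masks, [1, 1, 1], 0.0))  # intersection: only feature 0
print(elect_features(masks, [1, 1, 1], 1.0))  # union: features 0, 1, 2
```

Weighting clients by dataset size or local validation score (the `weighted` mode) changes `client_weights` but leaves the thresholding logic the same.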
### Optional Dependencies

- `scikit-learn` ≥ 1.0 is required for most feature selection methods
  → installed automatically with `pip install nvflare`
- `PyImpetus` ≥ 0.0.6 is optional (enables advanced permutation importance methods)
  → install manually if needed:

```bash
pip install PyImpetus
```

## Quick Start

### Basic Usage

```python
from nvflare.app_opt.feature_election import quick_election
import pandas as pd

# Load your tabular dataset
df = pd.read_csv("your_data.csv")

# Run feature election (simulation mode)
selected_mask, stats = quick_election(
    df=df,
    target_col='target',
    num_clients=4,
    fs_method='lasso',
)

# Get selected features (assumes the target is the last column)
selected_features = df.columns[:-1][selected_mask]
print(f"Selected {len(selected_features)} features: {list(selected_features)}")
print(f"Freedom degree: {stats['freedom_degree']}")
```

### Custom Configuration

```python
from nvflare.app_opt.feature_election import FeatureElection

# Initialize with custom parameters
fe = FeatureElection(
    freedom_degree=0.6,
    fs_method='elastic_net',
    aggregation_mode='weighted',
    auto_tune=True,
    tuning_rounds=5
)

# Prepare data splits for clients
client_data = fe.prepare_data_splits(
    df=df,
    target_col='target',
    num_clients=5,
    split_strategy='stratified'  # or 'random', 'sequential', 'dirichlet'
)

# Run simulation
stats = fe.simulate_election(client_data)

# Access selected features
selected_features = fe.selected_feature_names
print(f"Selected {stats['num_features_selected']} features")
```

## Workflow Architecture

The Feature Election workflow consists of three phases:

```
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 1: Local Feature Selection                                │
│ Clients perform local FS using configured method (lasso, etc.)  │
│ → Each client sends: selected_features, feature_scores          │
└─────────────────────────────────────────────────────────────────┘
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 2: Tuning & Global Mask Generation                        │
│ If auto_tune=True: hill climbing finds optimal freedom_degree   │
│ → Aggregates selections using weighted voting                   │
│ → Distributes global feature mask to all clients                │
└─────────────────────────────────────────────────────────────────┘
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 3: FL Aggregation (Training)                              │
│ Standard FedAvg training on reduced feature set                 │
│ → num_rounds of federated training                              │
└─────────────────────────────────────────────────────────────────┘
```

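The Phase 2 auto-tuning is a hill climb over the freedom degree. The sketch below is a rough standalone illustration of such a search, not the controller's actual code; in the real workflow the objective would be the clients' validation metric for the candidate mask, so `score_fn` here is a stand-in.

```python
def hill_climb_freedom_degree(score_fn, start=0.5, step=0.1, rounds=5):
    """Greedy 1-D hill climbing: each round, try moving the freedom
    degree up or down by `step` and keep the best-scoring candidate;
    shrink the step when neither neighbor improves."""
    best_fd, best_score = start, score_fn(start)
    for _ in range(rounds):
        candidates = [min(best_fd + step, 1.0), max(best_fd - step, 0.0)]
        improved = False
        for fd in candidates:
            s = score_fn(fd)
            if s > best_score:
                best_fd, best_score, improved = fd, s, True
        if not improved:
            step /= 2
    return best_fd, best_score

# Toy objective that peaks at freedom degree 0.7
best_fd, best_score = hill_climb_freedom_degree(lambda fd: -(fd - 0.7) ** 2, rounds=10)
print(best_fd)  # converges to 0.7
```

A handful of rounds (the `tuning_rounds` default is 5) is usually enough because the search space is a single bounded scalar.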
## NVIDIA FLARE Deployment

### 1. Generate Configuration Files

```python
from nvflare.app_opt.feature_election import FeatureElection

fe = FeatureElection(
    freedom_degree=0.5,
    fs_method='lasso',
    aggregation_mode='weighted',
    auto_tune=True,
    tuning_rounds=4
)

# Generate FLARE job configuration
config_paths = fe.create_flare_job(
    job_name="feature_selection_job",
    output_dir="./jobs/feature_selection",
    min_clients=2,
    num_rounds=5,
    client_sites=['hospital_1', 'hospital_2', 'hospital_3']
)
```

### 2. Prepare Client Data

Each client should prepare its data:

```python
from nvflare.app_opt.feature_election import FeatureElectionExecutor

# In your client script
executor = FeatureElectionExecutor(
    fs_method='lasso',
    eval_metric='f1'
)

# Load and set client data
X_train, y_train = load_client_data()  # your data loading logic
executor.set_data(X_train, y_train, feature_names=feature_names)  # feature_names: list of column names
```

### 3. Submit the FLARE Job

```bash
nvflare job submit -j ./jobs/feature_selection
```

## Feature Selection Methods

| Method | Description | Best For | Parameters |
|--------|-------------|----------|------------|
| `lasso` | L1 regularization | High-dimensional sparse data | `alpha`, `max_iter` |
| `elastic_net` | L1 + L2 regularization | Correlated features | `alpha`, `l1_ratio`, `max_iter` |
| `random_forest` | Tree-based importance | Non-linear relationships | `n_estimators`, `max_depth` |
| `mutual_info` | Information gain | Any data type | `n_neighbors` |
| `pyimpetus` | Permutation importance | Robust feature selection | `p_val_thresh`, `num_sim` |

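For intuition about what a client-side `lasso` pass produces, here is a minimal standalone scikit-learn example (illustrative only, not the library's internal code): coefficients driven to zero by the L1 penalty mark features to drop, and their absolute values can serve as the scores that are shared with the server.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only features 0 and 3 actually drive the target
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

# Scale first: the L1 penalty is sensitive to feature magnitudes
X_scaled = StandardScaler().fit_transform(X)
model = Lasso(alpha=0.1).fit(X_scaled, y)

selected_mask = model.coef_ != 0        # boolean mask over features
scores = np.abs(model.coef_)            # importance scores to share
print(np.flatnonzero(selected_mask))    # indices of the surviving features
```

Swapping `Lasso` for `ElasticNet`, `mutual_info_regression`, or a `RandomForestRegressor`'s `feature_importances_` yields the other methods in the table, with the same mask-plus-scores output shape.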
## Parameters

### FeatureElection

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `freedom_degree` | float | 0.5 | Controls feature inclusion (0 = intersection, 1 = union) |
| `fs_method` | str | "lasso" | Feature selection method |
| `aggregation_mode` | str | "weighted" | How to weight client votes ('weighted' or 'uniform') |
| `auto_tune` | bool | False | Enable automatic tuning of `freedom_degree` |
| `tuning_rounds` | int | 5 | Number of rounds for auto-tuning |

### FeatureElectionController

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `freedom_degree` | float | 0.5 | Initial freedom degree |
| `aggregation_mode` | str | "weighted" | Client vote weighting |
| `min_clients` | int | 2 | Minimum clients required |
| `num_rounds` | int | 5 | FL training rounds after feature selection |
| `auto_tune` | bool | False | Enable auto-tuning |
| `tuning_rounds` | int | 0 | Number of tuning rounds |
| `train_timeout` | int | 300 | Training phase timeout (seconds) |

### Data Splitting Strategies

- **stratified**: Maintains class distribution (recommended for classification)
- **random**: Random split
- **sequential**: Sequential split for ordered data
- **dirichlet**: Non-IID split with a Dirichlet distribution (alpha = 0.5)

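The `dirichlet` strategy can be sketched as follows; this is a standalone illustration assuming the alpha = 0.5 default noted above, not the library's implementation. For each class, client proportions are drawn from a Dirichlet distribution and that class's rows are dealt out accordingly, producing non-IID client shards.

```python
import numpy as np

def dirichlet_split(y, num_clients, alpha=0.5, seed=0):
    """Partition sample indices non-IID: per class, draw client
    proportions from Dirichlet(alpha) and slice the class's rows."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# 100 samples, two balanced classes, dealt to three clients
y = np.array([0] * 50 + [1] * 50)
parts = dirichlet_split(y, num_clients=3)
# Smaller alpha -> more skewed per-client class distributions
```

Lower alpha values make each client's class mix more lopsided, which is useful for stress-testing how robust the elected feature set is to heterogeneity.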
## API Reference

### Core Classes

#### FeatureElection

Main interface for feature election.

```python
class FeatureElection:
    def __init__(
        self,
        freedom_degree: float = 0.5,
        fs_method: str = "lasso",
        aggregation_mode: str = "weighted",
        auto_tune: bool = False,
        tuning_rounds: int = 5,
    )

    def prepare_data_splits(...) -> List[Tuple[pd.DataFrame, pd.Series]]
    def simulate_election(...) -> Dict
    def create_flare_job(...) -> Dict[str, str]
    def apply_mask(...) -> Union[pd.DataFrame, np.ndarray]
    def save_results(filepath: str)
    def load_results(filepath: str)
```

#### FeatureElectionController

Server-side controller for NVIDIA FLARE.

```python
class FeatureElectionController(Controller):
    def __init__(
        self,
        freedom_degree: float = 0.5,
        aggregation_mode: str = "weighted",
        min_clients: int = 2,
        num_rounds: int = 5,
        task_name: str = "feature_election",
        train_timeout: int = 300,
        auto_tune: bool = False,
        tuning_rounds: int = 0,
    )
```

#### FeatureElectionExecutor

Client-side executor for NVIDIA FLARE.

```python
class FeatureElectionExecutor(Executor):
    def __init__(
        self,
        fs_method: str = "lasso",
        fs_params: Optional[Dict] = None,
        eval_metric: str = "f1",
        task_name: str = "feature_election"
    )

    def set_data(X_train, y_train, X_val=None, y_val=None, feature_names=None)
    def evaluate_model(X_train, y_train, X_val, y_val) -> float
```

### Convenience Functions

```python
def quick_election(
    df: pd.DataFrame,
    target_col: str,
    num_clients: int = 3,
    freedom_degree: float = 0.5,
    fs_method: str = "lasso",
    split_strategy: str = "stratified",
    **kwargs
) -> Tuple[np.ndarray, Dict]

def load_election_results(filepath: str) -> Dict
```

## Troubleshooting

### Common Issues

1. **"No features selected"**
   - Increase `freedom_degree`
   - Try a different `fs_method`
   - Check feature scaling

2. **"No feature votes received"**
   - Ensure client data is loaded before execution
   - Check that `task_name` matches between controller and executor

3. **"Poor performance after selection"**
   - Enable `auto_tune` to find the optimal `freedom_degree`
   - Try the weighted aggregation mode

4. **"PyImpetus not available"**
   - Install it with `pip install PyImpetus`
   - The framework falls back to mutual information if it is unavailable

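On the feature-scaling point in issue 1: penalized selectors such as `lasso` and `elastic_net` compare coefficient magnitudes across columns, so columns on wildly different scales can be starved or inflated. A quick, hypothetical diagnostic sketch (the helper name and threshold are illustrative, not part of the library):

```python
import pandas as pd

def scaling_report(df):
    """Rank columns by how far their standard deviation sits from the
    median scale; large ratios suggest standardizing before lasso-style
    selection."""
    stats = df.describe().T[["mean", "std", "min", "max"]]
    stats["scale_ratio"] = stats["std"] / stats["std"].median()
    return stats.sort_values("scale_ratio", ascending=False)

df = pd.DataFrame({"age": [25, 40, 33],
                   "income": [30000.0, 82000.0, 51000.0]})
report = scaling_report(df)
# Columns with a large scale_ratio are candidates for standardization
```

Standardizing flagged columns (e.g. with `sklearn.preprocessing.StandardScaler`) before the local selection pass usually resolves both the "no features selected" and the "poor performance" symptoms when scale is the culprit.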
### Debug Mode

Enable detailed logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## Running Tests

```bash
pytest tests/unit_test/app_opt/feature_election/test.py -v
```

# Examples

## Quick Start
