Commit 4dd84f2: moved/merged readme as suggested (1 parent 41188db)

2 files changed: +338 −343 lines

examples/advanced/feature_election/README.md
Lines changed: 338 additions & 2 deletions
# Feature Election for NVIDIA FLARE

A plug-and-play horizontal federated feature selection framework for tabular datasets in NVIDIA FLARE.
## Overview

This work originates from FLASH: A Framework for Federated Learning with Attribute Selection and Hyperparameter Optimization, presented at [FLTA IEEE 2025](https://flta-conference.org/flta-2025/), where it received the Best Student Paper Award.

Feature Election enables multiple clients with tabular datasets to collaboratively identify the most relevant features without sharing raw data. It applies conventional feature selection algorithms on the client side and performs a weighted aggregation of their results.

FLASH is available on [GitHub](https://github.com/parasecurity/FLASH).
## Citation

If you use Feature Election in your research, please cite the FLASH framework paper:

**IEEE Style:**

> I. Christofilogiannis, G. Valavanis, A. Shevtsov, I. Lamprou and S. Ioannidis, "FLASH: A Framework for Federated Learning with Attribute Selection and Hyperparameter Optimization," 2025 3rd International Conference on Federated Learning Technologies and Applications (FLTA), Dubrovnik, Croatia, 2025, pp. 93-100, doi: 10.1109/FLTA67013.2025.11336571.

**BibTeX:**

```bibtex
@INPROCEEDINGS{11336571,
  author={Christofilogiannis, Ioannis and Valavanis, Georgios and Shevtsov, Alexander and Lamprou, Ioannis and Ioannidis, Sotiris},
  booktitle={2025 3rd International Conference on Federated Learning Technologies and Applications (FLTA)},
  title={FLASH: A Framework for Federated Learning with Attribute Selection and Hyperparameter Optimization},
  year={2025},
  pages={93-100},
  doi={10.1109/FLTA67013.2025.11336571}
}
```
### Key Features

- **Easy Integration**: Simple API for tabular datasets (pandas, numpy)
- **Multiple Feature Selection Methods**: Lasso, Elastic Net, Mutual Information, Random Forest, PyImpetus, and more
- **Flexible Aggregation**: Configurable freedom degree (0 = intersection, 1 = union, values in between = weighted voting)
- **Auto-tuning**: Automatic optimization of the freedom degree via hill climbing
- **Multi-phase Workflow**: Local FS → Feature Election with tuning → FL Aggregation
- **Privacy-Preserving**: Only feature selections and scores are shared, never raw data
- **Production-Ready**: Fully compatible with NVIDIA FLARE workflows
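To make the freedom-degree semantics concrete, here is a minimal, unweighted sketch of the vote aggregation. This is illustrative only: the actual aggregator also uses the shared feature scores, and `aggregate_votes` is a made-up helper name, not part of the package API.

```python
import numpy as np

def aggregate_votes(client_masks: np.ndarray, freedom_degree: float) -> np.ndarray:
    """Combine per-client boolean selection masks into a global mask.

    A feature survives if at least ceil((1 - freedom_degree) * n_clients)
    clients voted for it (never fewer than one vote), so freedom_degree=0
    yields the intersection and freedom_degree=1 the union of selections.
    """
    n_clients = client_masks.shape[0]
    votes = client_masks.sum(axis=0)
    threshold = max(1, int(np.ceil((1.0 - freedom_degree) * n_clients)))
    return votes >= threshold

masks = np.array([
    [True,  True,  False, True],   # client 1's selection
    [True,  False, False, True],   # client 2's selection
    [True,  True,  True,  False],  # client 3's selection
])
print(aggregate_votes(masks, 0.0))  # intersection: only feature 0 survives
print(aggregate_votes(masks, 1.0))  # union: every voted-for feature survives
```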
### Optional Dependencies

- `scikit-learn` ≥ 1.0 is required for most feature selection methods
  → installed automatically with `pip install nvflare`
- `PyImpetus` ≥ 0.0.6 is optional (enables advanced permutation importance methods)
  → install manually if needed:
  ```bash
  pip install PyImpetus
  ```
## Quick Start

### Basic Usage

```python
import pandas as pd

from nvflare.app_opt.feature_election import quick_election

# Load your tabular dataset
df = pd.read_csv("your_data.csv")

# Run feature election (simulation mode)
selected_mask, stats = quick_election(
    df=df,
    target_col='target',
    num_clients=4,
    fs_method='lasso',
)

# Get selected features (assumes the target is the last column)
selected_features = df.columns[:-1][selected_mask]
print(f"Selected {len(selected_features)} features: {list(selected_features)}")
print(f"Freedom degree: {stats['freedom_degree']}")
```
### Custom Configuration

```python
from nvflare.app_opt.feature_election import FeatureElection

# Initialize with custom parameters
fe = FeatureElection(
    freedom_degree=0.6,
    fs_method='elastic_net',
    aggregation_mode='weighted',
    auto_tune=True,
    tuning_rounds=5
)

# Prepare data splits for clients
client_data = fe.prepare_data_splits(
    df=df,
    target_col='target',
    num_clients=5,
    split_strategy='stratified'  # or 'random', 'sequential', 'dirichlet'
)

# Run simulation
stats = fe.simulate_election(client_data)

# Access selected features
selected_features = fe.selected_feature_names
print(f"Selected {stats['num_features_selected']} features")
```
## Workflow Architecture

The Feature Election workflow consists of three phases:

```
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 1: Local Feature Selection                                │
│ Clients perform local FS using configured method (lasso, etc.)  │
│ → Each client sends: selected_features, feature_scores          │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ PHASE 2: Tuning & Global Mask Generation                        │
│ If auto_tune=True: Hill-climbing to find optimal freedom_degree │
│ → Aggregates selections using weighted voting                   │
│ → Distributes global feature mask to all clients                │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ PHASE 3: FL Aggregation (Training)                              │
│ Standard FedAvg training on reduced feature set                 │
│ → num_rounds of federated training                              │
└─────────────────────────────────────────────────────────────────┘
```
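Phase 2's auto-tuning can be pictured as simple one-dimensional hill climbing over the freedom degree. This is an illustrative sketch, not the controller's exact algorithm; `score_fn` stands in for the federated validation score the server would obtain for a candidate freedom degree.

```python
def hill_climb_freedom_degree(score_fn, start=0.5, step=0.1, rounds=5):
    """Greedy 1-D hill climbing over the freedom degree in [0, 1]."""
    best_fd, best_score = start, score_fn(start)
    for _ in range(rounds):
        # Try one step in each direction and keep the better neighbour
        candidates = [min(1.0, best_fd + step), max(0.0, best_fd - step)]
        fd, score = max(((f, score_fn(f)) for f in candidates), key=lambda t: t[1])
        if score > best_score:
            best_fd, best_score = fd, score
        else:
            step /= 2  # no improvement: refine the step size
    return best_fd, best_score

# Toy objective with its optimum at freedom_degree = 0.7
best_fd, best_score = hill_climb_freedom_degree(lambda fd: -(fd - 0.7) ** 2)
print(round(best_fd, 2))  # → 0.7
```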
## NVIDIA FLARE Deployment

### 1. Generate Configuration Files

```python
from nvflare.app_opt.feature_election import FeatureElection

fe = FeatureElection(
    freedom_degree=0.5,
    fs_method='lasso',
    aggregation_mode='weighted',
    auto_tune=True,
    tuning_rounds=4
)

# Generate FLARE job configuration
config_paths = fe.create_flare_job(
    job_name="feature_selection_job",
    output_dir="./jobs/feature_selection",
    min_clients=2,
    num_rounds=5,
    client_sites=['hospital_1', 'hospital_2', 'hospital_3']
)
```
### 2. Prepare Client Data

Each client should prepare their data:

```python
import numpy as np

from nvflare.app_opt.feature_election import FeatureElectionExecutor

# In your client script
executor = FeatureElectionExecutor(
    fs_method='lasso',
    eval_metric='f1'
)

# Load and set client data
X_train, y_train, feature_names = load_client_data()  # your data loading logic
executor.set_data(X_train, y_train, feature_names=feature_names)
```
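`load_client_data` above is a placeholder for site-specific logic. A minimal hypothetical implementation, assuming each site stores one CSV file with the label in a `target` column:

```python
import io

import pandas as pd

def load_client_data(source, target_col: str = "target"):
    """Hypothetical loader: one CSV per client site, label in `target_col`."""
    df = pd.read_csv(source)
    X = df.drop(columns=[target_col]).to_numpy()
    y = df[target_col].to_numpy()
    feature_names = [c for c in df.columns if c != target_col]
    return X, y, feature_names

# Tiny inline CSV standing in for a real client file
csv = io.StringIO("age,bmi,target\n34,22.1,0\n51,30.4,1\n47,27.9,1\n")
X, y, names = load_client_data(csv)
print(names)  # → ['age', 'bmi']
```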
### 3. Submit FLARE Job

```bash
nvflare job submit -j ./jobs/feature_selection
```
## Feature Selection Methods

| Method | Description | Best For | Parameters |
|--------|-------------|----------|------------|
| `lasso` | L1 regularization | High-dimensional sparse data | `alpha`, `max_iter` |
| `elastic_net` | L1+L2 regularization | Correlated features | `alpha`, `l1_ratio`, `max_iter` |
| `random_forest` | Tree-based importance | Non-linear relationships | `n_estimators`, `max_depth` |
| `mutual_info` | Information gain | Any data type | `n_neighbors` |
| `pyimpetus` | Permutation importance | Robust feature selection | `p_val_thresh`, `num_sim` |
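For intuition, the client-side `lasso` method boils down to keeping features whose L1-regularized coefficients are non-zero. A standalone sketch using scikit-learn (not the executor's internal code; `lasso_select` is a made-up helper):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

def lasso_select(X, y, alpha=0.01, max_iter=5000):
    """Return (boolean mask, |coefficient| scores) from an L1-regularized fit."""
    X_std = StandardScaler().fit_transform(X)  # scale so penalties are comparable
    coef = Lasso(alpha=alpha, max_iter=max_iter).fit(X_std, y).coef_
    scores = np.abs(coef)
    return scores > 0, scores

# Synthetic data: only columns 0 and 3 drive the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=200)
mask, scores = lasso_select(X, y)
print(mask.nonzero()[0])  # indices of the selected features
```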
## Parameters

### FeatureElection

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `freedom_degree` | float | 0.5 | Controls feature inclusion (0 = intersection, 1 = union) |
| `fs_method` | str | "lasso" | Feature selection method |
| `aggregation_mode` | str | "weighted" | How to weight client votes ('weighted' or 'uniform') |
| `auto_tune` | bool | False | Enable automatic tuning of `freedom_degree` |
| `tuning_rounds` | int | 5 | Number of rounds for auto-tuning |

### FeatureElectionController

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `freedom_degree` | float | 0.5 | Initial freedom degree |
| `aggregation_mode` | str | "weighted" | Client vote weighting |
| `min_clients` | int | 2 | Minimum clients required |
| `num_rounds` | int | 5 | FL training rounds after feature selection |
| `auto_tune` | bool | False | Enable auto-tuning |
| `tuning_rounds` | int | 0 | Number of tuning rounds |
| `train_timeout` | int | 300 | Training phase timeout (seconds) |
### Data Splitting Strategies

- **stratified**: Maintains class distribution (recommended for classification)
- **random**: Random split
- **sequential**: Sequential split for ordered data
- **dirichlet**: Non-IID split with Dirichlet distribution (alpha=0.5)
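The `dirichlet` strategy can be sketched as follows: for each class, a Dirichlet(alpha) draw decides what fraction of that class each client receives, so small alpha yields strongly skewed, non-IID shards. This is an approximation of what `prepare_data_splits` might do; the actual implementation may differ.

```python
import numpy as np

def dirichlet_split(labels: np.ndarray, num_clients: int, alpha: float = 0.5, seed: int = 0):
    """Partition sample indices into non-IID client shards via per-class Dirichlet draws."""
    rng = np.random.default_rng(seed)
    shards = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.where(labels == cls)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for shard, part in zip(shards, np.split(idx, cuts)):
            shard.extend(part.tolist())
    return [np.array(s) for s in shards]

labels = np.array([0] * 50 + [1] * 50)
shards = dirichlet_split(labels, num_clients=4)
print([len(s) for s in shards])  # uneven shard sizes reflect the non-IID draw
```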
## API Reference

### Core Classes

#### FeatureElection

Main interface for feature election.

```python
class FeatureElection:
    def __init__(
        self,
        freedom_degree: float = 0.5,
        fs_method: str = "lasso",
        aggregation_mode: str = "weighted",
        auto_tune: bool = False,
        tuning_rounds: int = 5,
    )

    def prepare_data_splits(...) -> List[Tuple[pd.DataFrame, pd.Series]]
    def simulate_election(...) -> Dict
    def create_flare_job(...) -> Dict[str, str]
    def apply_mask(...) -> Union[pd.DataFrame, np.ndarray]
    def save_results(filepath: str)
    def load_results(filepath: str)
```
#### FeatureElectionController

Server-side controller for NVIDIA FLARE.

```python
class FeatureElectionController(Controller):
    def __init__(
        self,
        freedom_degree: float = 0.5,
        aggregation_mode: str = "weighted",
        min_clients: int = 2,
        num_rounds: int = 5,
        task_name: str = "feature_election",
        train_timeout: int = 300,
        auto_tune: bool = False,
        tuning_rounds: int = 0,
    )
```
#### FeatureElectionExecutor

Client-side executor for NVIDIA FLARE.

```python
class FeatureElectionExecutor(Executor):
    def __init__(
        self,
        fs_method: str = "lasso",
        fs_params: Optional[Dict] = None,
        eval_metric: str = "f1",
        task_name: str = "feature_election"
    )

    def set_data(X_train, y_train, X_val=None, y_val=None, feature_names=None)
    def evaluate_model(X_train, y_train, X_val, y_val) -> float
```
### Convenience Functions

```python
def quick_election(
    df: pd.DataFrame,
    target_col: str,
    num_clients: int = 3,
    freedom_degree: float = 0.5,
    fs_method: str = "lasso",
    split_strategy: str = "stratified",
    **kwargs
) -> Tuple[np.ndarray, Dict]

def load_election_results(filepath: str) -> Dict
```
## Troubleshooting

### Common Issues

1. **"No features selected"**
   - Increase `freedom_degree`
   - Try a different `fs_method`
   - Check feature scaling

2. **"No feature votes received"**
   - Ensure client data is loaded before execution
   - Check that `task_name` matches between controller and executor

3. **"Poor performance after selection"**
   - Enable `auto_tune` to find the optimal `freedom_degree`
   - Try the weighted aggregation mode

4. **"PyImpetus not available"**
   - Install with: `pip install PyImpetus`
   - Falls back to mutual information if unavailable
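The fallback in item 4 amounts to a guarded import. A sketch only: the class name `PPIMBC` and the flag/helper names are assumptions, and the module's real fallback logic may be structured differently.

```python
try:
    from PyImpetus import PPIMBC  # permutation-importance selector (if installed)
    HAS_PYIMPETUS = True
except ImportError:
    HAS_PYIMPETUS = False

def resolve_fs_method(requested: str) -> str:
    """Fall back to mutual information when PyImpetus is not available."""
    if requested == "pyimpetus" and not HAS_PYIMPETUS:
        return "mutual_info"
    return requested

print(resolve_fs_method("pyimpetus"))
```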
### Debug Mode

Enable detailed logging:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

## Running Tests

```bash
pytest tests/unit_test/app_opt/feature_election/test.py -v
```
# Examples

## Quick Start