168 changes: 168 additions & 0 deletions RecommenderSystems/deepfm/README.md
# DeepFM

[DeepFM](https://arxiv.org/abs/1703.04247) is a Factorization-Machine based Neural Network for CTR prediction; its model structure is shown below. Based on this structure, this project uses the OneFlow distributed deep learning framework to train the model in graph mode on the Criteo dataset.

<p align='center'>
<img width="539" alt="Screen Shot 2022-04-01 at 4 45 22 PM" src="https://user-images.githubusercontent.com/46690197/161228714-ae9410bb-56db-46b0-8f0b-cb8becb6ee03.png">
</p>


## Directory description

```txt
.
├── deepfm_train_eval.py        # OneFlow DeepFM train/val/test script with the OneEmbedding module
├── README.md                   # Documentation
├── tools
│   ├── deepfm_parquet.scala    # Read the Criteo Kaggle data and export it in parquet format
│   ├── launch_spark.sh         # Spark launching shell script
│   └── split_criteo_kaggle.py  # Split the Criteo Kaggle dataset into train/val/test sets
└── train_deepfm.sh             # DeepFM training shell script
```

## Arguments description

We use exactly the same default values as [the DeepFM_Criteo_x4_001 experiment](https://github.com/openbenchmark/BARS/tree/master/ctr_prediction/benchmarks/DeepFM/DeepFM_criteo_x4_001) in FuxiCTR.

| Argument Name | Argument Explanation | Default Value |
| -------------------------- | ------------------------------------------------------------ | ------------------------ |
| data_dir | the data file directory | *Required Argument* |
| num_train_samples | the number of train samples | *Required Argument* |
| num_val_samples | the number of validation samples | *Required Argument* |
| num_test_samples | the number of test samples | *Required Argument* |
| model_load_dir | model loading directory | None |
| model_save_dir | model saving directory | None |
| save_best_model | save best model or not | False |
| save_initial_model | save initial model parameters or not | False |
| save_model_after_each_eval | save model after each eval or not | False |
| embedding_vec_size | embedding vector size | 16 |
| dnn                        | hidden units of each DNN layer                               | 1000,1000,1000,1000,1000 |
| net_dropout                | dropout rate of the DNN                                      | 0.2                      |
| learning_rate | initial learning rate | 0.001 |
| batch_size | training/evaluation batch size | 10000 |
| train_batches | the maximum number of training batches | 75000 |
| loss_print_interval | interval of printing loss | 100 |
| patience | Number of epochs with no improvement after which learning rate will be reduced | 2 |
| min_delta | threshold for measuring the new optimum, to only focus on significant changes | 1.0e-6 |
| table_size_array | embedding table size array for sparse fields | *Required Argument* |
| persistent_path | path for persistent kv store of embedding | *Required Argument* |
| store_type                 | OneEmbedding persistent kv store type: `device_mem`, `cached_host_mem` or `cached_ssd` | `cached_host_mem` |
| cache_memory_budget_mb | size of cache memory budget on each device in megabytes when `store_type` is `cached_host_mem` or `cached_ssd` | 1024 |
| amp                        | enable Automatic Mixed Precision (AMP) training or not       | False                    |
| loss_scale_policy | loss scale policy for AMP training: `static` or `dynamic` | `static` |
| disable_early_stop | disable early stop or not | False |
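
The `loss_scale_policy` option can be illustrated with a minimal sketch of dynamic loss scaling in plain Python. This is a hypothetical illustration of the general technique, not the OneFlow implementation; the class name and defaults are made up.

```python
class DynamicLossScaler:
    """Minimal sketch of dynamic loss scaling for mixed-precision training.

    Illustration only; not the OneFlow implementation. The loss is
    multiplied by `scale` before backprop; the scale is halved when an
    overflow (inf/nan gradient) is detected and doubled again after a
    run of overflow-free steps.
    """

    def __init__(self, init_scale=2.0 ** 15, growth_interval=2000):
        self.scale = init_scale          # current loss-scale factor
        self.growth_interval = growth_interval
        self._good_steps = 0             # overflow-free steps since last change

    def update(self, found_overflow):
        if found_overflow:
            self.scale = max(self.scale / 2.0, 1.0)  # back off on overflow
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= 2.0        # grow again once training is stable
                self._good_steps = 0


scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
scaler.update(found_overflow=True)   # overflow: scale drops to 4.0
scaler.update(found_overflow=False)
scaler.update(found_overflow=False)  # two good steps: scale grows back to 8.0
print(scaler.scale)
```

A `static` policy simply keeps `scale` fixed for the whole run.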

#### Early Stop Schema

The model is evaluated at the end of every epoch, and if the early stopping criterion is met, training stops.

The early-stop monitor is `val_auc - val_log_loss` and the mode is `max`. You can tune `patience` and `min_delta` as needed.

To disable early stopping, simply add `--disable_early_stop` in the [train_deepfm.sh](https://github.com/Oneflow-Inc/models/blob/dev_deepfm_multicol_oneemb/RecommenderSystems/deepfm/train_deepfm.sh).
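
The schema above can be sketched in a few lines of plain Python. This is a hedged illustration of the stopping rule, not the actual training-script code; the function name is made up.

```python
def should_stop(history, patience=2, min_delta=1.0e-6):
    """Early stop in `max` mode on the monitor `val_auc - val_log_loss`.

    `history` holds one monitor value per epoch, newest last. Training
    stops when none of the last `patience` epochs beat the best earlier
    value by more than `min_delta`.
    """
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    recent_best = max(history[-patience:])
    return recent_best <= best_before + min_delta


monitors = [0.60, 0.71, 0.72, 0.7200001, 0.7199]  # per-epoch monitor values
print(should_stop(monitors))  # last 2 epochs did not improve on 0.72 by > 1e-6
```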

## Getting Started

A hands-on guide to train a DeepFM model.

### Environment

1. Install OneFlow by following the steps in [OneFlow Installation Guide](https://github.com/Oneflow-Inc/oneflow#install-oneflow) or use the command line below.

```shell
python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/master/cu102
```

2. Install all other dependencies listed below.

```txt
psutil
petastorm
pandas
scikit-learn
```

### Dataset

**Note**:

According to [the DeepFM paper](https://arxiv.org/abs/1703.04247), we treat both categorical and continuous features as sparse features.

> χ may include categorical fields (e.g., gender, location) and continuous fields (e.g., age). Each categorical field is represented as a vector of one-hot encoding, and each continuous field is represented as the value itself, or a vector of one-hot encoding after discretization.

1. Download the [Criteo Kaggle dataset](https://www.kaggle.com/c/criteo-display-ad-challenge) and then split it using [split_criteo_kaggle.py](https://github.com/Oneflow-Inc/models/blob/dev_deepfm_multicol_oneemb/RecommenderSystems/deepfm/tools/split_criteo_kaggle.py).

Note: Same as [the DeepFM_Criteo_x4_001 experiment](https://github.com/openbenchmark/BARS/tree/master/ctr_prediction/benchmarks/DeepFM/DeepFM_criteo_x4_001) in FuxiCTR, only train.txt is used. The dataset is randomly split into training, validation and test sets at an 8:1:1 ratio, using StratifiedKFold from sklearn.

```shell
python3 split_criteo_kaggle.py --input_dir=/path/to/your/criteo_kaggle --output_dir=/path/to/your/output/dir
```
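
The actual script relies on sklearn's StratifiedKFold; the standard-library sketch below only illustrates the idea behind an 8:1:1 stratified split (group row indices by label, shuffle within each label, then cut each group 8:1:1). The function name and seed are made up for the example.

```python
import random
from collections import defaultdict


def stratified_811_split(labels, seed=2018):
    """Illustrative 8:1:1 stratified split over row indices.

    Returns (train_idx, val_idx, test_idx); each label's rows are
    distributed roughly 8:1:1 across the three sets, so class balance
    is preserved in every split.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    train, val, test = [], [], []
    for rows in by_label.values():
        rng.shuffle(rows)
        n_val = n_test = len(rows) // 10          # one tenth each to val/test
        test.extend(rows[:n_test])
        val.extend(rows[n_test:n_test + n_val])
        train.extend(rows[n_test + n_val:])       # remaining ~8/10 to train
    return train, val, test


labels = [0] * 800 + [1] * 200                    # imbalanced toy labels
train, val, test = stratified_811_split(labels)
print(len(train), len(val), len(test))            # 800 100 100
```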

2. Download Spark from https://spark.apache.org/downloads.html and uncompress the tar file into the directory where you want to install Spark. Ensure the `SPARK_HOME` environment variable points to your Spark installation directory.

3. Launch a Spark shell using [launch_spark.sh](https://github.com/Oneflow-Inc/models/blob/dev_deepfm_multicol_oneemb/RecommenderSystems/deepfm/tools/launch_spark.sh).

- Modify `SPARK_LOCAL_DIRS` as needed

```shell
export SPARK_LOCAL_DIRS=/path/to/your/spark/
```

- Run `bash launch_spark.sh`

4. Load [deepfm_parquet.scala](https://github.com/Oneflow-Inc/models/blob/dev_deepfm_multicol_oneemb/RecommenderSystems/deepfm/tools/deepfm_parquet.scala) into your Spark shell with `:load deepfm_parquet.scala`.

5. Call the `makeDeepfmDataset(srcDir: String, dstDir: String)` function to generate the dataset.

```scala
makeDeepfmDataset("/path/to/your/src_dir", "/path/to/your/dst_dir")
```

After the parquet dataset is generated, the dataset information is also printed. It contains the number of samples and the table size array, both of which are needed for training.

```txt
train samples = 36672493
validation samples = 4584062
test samples = 4584062
table size array:
649,9364,14746,490,476707,11618,4142,1373,7275,13,169,407,1376
1460,583,10131227,2202608,305,24,12517,633,3,93145,5683,8351593,3194,27,14992,5461306,10,5652,2173,4,7046547,18,15,286181,105,142572
```
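
The two printed lines of the table size array correspond to Criteo Kaggle's 13 integer (continuous) features and 26 categorical features, 39 sparse fields in total. A quick sanity check like the one below (an illustrative snippet, not part of the project) can be handy before pasting the combined array into `--table_size_array`:

```python
# Table sizes as printed above: 13 continuous fields, then 26 categorical fields.
dense_sizes = "649,9364,14746,490,476707,11618,4142,1373,7275,13,169,407,1376"
sparse_sizes = ("1460,583,10131227,2202608,305,24,12517,633,3,93145,5683,"
                "8351593,3194,27,14992,5461306,10,5652,2173,4,7046547,18,15,"
                "286181,105,142572")

# Combined list in the order expected by --table_size_array.
table_size_array = [int(s) for s in f"{dense_sizes},{sparse_sizes}".split(",")]
print(len(table_size_array))  # 39 embedding tables: 13 continuous + 26 categorical
```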

### Start Training with OneFlow

1. Modify the [train_deepfm.sh](https://github.com/Oneflow-Inc/models/blob/dev_deepfm_multicol_oneemb/RecommenderSystems/deepfm/train_deepfm.sh) as needed.

```shell
#!/bin/bash
DEVICE_NUM_PER_NODE=1
DATA_DIR=/path/to/deepfm_parquet
PERSISTENT_PATH=/path/to/persistent
MODEL_SAVE_DIR=/path/to/model/save/dir

python3 -m oneflow.distributed.launch \
--nproc_per_node $DEVICE_NUM_PER_NODE \
--nnodes 1 \
--node_rank 0 \
--master_addr 127.0.0.1 \
deepfm_train_eval.py \
--data_dir $DATA_DIR \
--persistent_path $PERSISTENT_PATH \
--table_size_array "649,9364,14746,490,476707,11618,4142,1373,7275,13,169,407,1376,1460,583,10131227,2202608,305,24,12517,633,3,93145,5683,8351593,3194,27,14992,5461306,10,5652,2173,4,7046547,18,15,286181,105,142572" \
--store_type 'cached_host_mem' \
--cache_memory_budget_mb 1024 \
--batch_size 10000 \
--train_batches 75000 \
--loss_print_interval 100 \
--dnn "1000,1000,1000,1000,1000" \
--net_dropout 0.2 \
--learning_rate 0.001 \
--embedding_vec_size 16 \
--num_train_samples 36672493 \
--num_val_samples 4584062 \
--num_test_samples 4584062 \
--model_save_dir $MODEL_SAVE_DIR \
--save_best_model
```

2. Train a DeepFM model with `bash train_deepfm.sh`.
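
With the defaults above, the relation between `train_batches`, `batch_size`, and `num_train_samples` can be worked out directly. A small sanity check, assuming the last partial batch of each epoch is kept rather than dropped:

```python
import math

num_train_samples = 36672493   # from the dataset statistics above
batch_size = 10000
train_batches = 75000

# Steps needed to see every training sample once (last batch may be partial).
steps_per_epoch = math.ceil(num_train_samples / batch_size)
epochs = train_batches / steps_per_epoch
print(steps_per_epoch)         # 3668 steps per epoch
print(round(epochs, 1))        # roughly 20.4 epochs in 75000 batches
```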