This project is an Extract, Transform, Load (ETL) pipeline designed to extract metadata from the global version of the Yu-Gi-Oh! Duel Links game via reverse engineering, transform it into a structured format, and store the results in Parquet files, mainly to be used by the Floowandereeze & Modding tool.
This project does not extract the assets themselves (e.g. png files containing art of cards), it only gets the references to the bundles containing the data, which can be used for modding.
- Extracts game metadata by reverse engineering Unity game files. The data currently extracted includes:
- Card Arts.
- Sleeves.
- Playmats.
- Transforms raw game data into a structured format, and cleans bad data.
- Loads the processed data into Parquet files for efficient storage and querying.
- Python for scripting and data processing
- UnityPy for reverse engineering Unity assets
- pandas for data manipulation
- PIL for image manipulation
Ensure you have the following installed:
- Python 3.8+
- pip package manager
- Yu-Gi-Oh! Duel Links, with the in-game download complete
# Clone the repository
git clone https://github.com/Nauder/floowandereeze-and-modding-etl-dl.git
cd floowandereeze-and-modding-etl-dl
# Install dependencies
pip install -r requirements.txtReplace the empty string in config.json with your game path.
Run the extraction script to pull data from Unity game files:
python .\etl\main.pyAfter extracting the data, it will be available as Parquet files inside the data/ folder, as well as a version.txt file
containing the date of the last script run.
├── data/ # Final Parquet files
├── etl/
│ ├── services/ # Pipeline logic
│ ├── main.py # Main script
│ └── util.py # Utility functions
├── config.json # Configurable parameters
└── README.md # ThisThe data/ folder in the repository contains the game metadata as of it's last push, so if there was no game update
after the last push it should already be up to date.
Modify the values in config.json, the configurations are:
- game_path path to your Duel Links installation's user data, up to the LocalData folder.
- num_threads amount of threads to use when extracting data, performance varies by hardware.
- excluded_sleeves sleeve assets to be ignored when building the list of sleeves. The game names sleeve materials the same way as animated sleeve frames, so they are removed manually.
The only mandatory configuration is the game_path, the rest come with default values that should be appropriate for most cases.
This project uses GitHub Actions for continuous integration and code quality checks:
- Pylint: Runs on every push and pull request to the main branch to ensure code quality and catch potential errors. The check requires a minimum score of 9/10.
- Black: Runs on every push and pull request to the main branch to enforce consistent code formatting across the project.
- The card name is obtained using the YGOPro API, which is not guaranteed to be accurate or complete. A more reliable approach would be to decrypt the files containing the card names, but this is not implemented yet.
This project is licensed under the GNU General Public License.
- YGOPRODeck for their public API.