Small, dependency-light CLI for converting, validating, cleaning and batching operational data files.
Atlas Data Toolkit is the productized version of the earlier datatoolkit prototype. It is intentionally simple: one Python CLI that helps turn messy CSV/JSON/YAML/XML files into clean handoff artifacts for dashboards, automations and client data pipelines.
For Windows, Android (Termux), or any machine without pip:
- Download datatoolkit-portable-v0.1.0.zip
- Extract anywhere
- Run with python datatoolkit-portable.py <command>
# Windows
python datatoolkit-portable.py convert data.csv -o data.json
# Android (Termux)
python datatoolkit-portable.py validate data.csv
JSON and CSV work out of the box. YAML needs pip install pyyaml. XML is bundled.
See README-portable.md for full details.
- Convert client exports between CSV, JSON, YAML and XML.
- Validate row counts, columns, nulls and empty strings before delivery.
- Normalize whitespace and numeric values.
- Remove duplicate rows.
- Split large files into smaller batches for review or processing.
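The normalize and deduplicate steps listed above can be sketched in a few lines of standard-library Python. This is an illustration only, not the toolkit's actual code: the names `normalize_value` and `clean_rows` are hypothetical, and the rule for what counts as a "numeric value" (an optional minus sign, digits, optional decimal part) is an assumption.

```python
import re

# Assumption: a value is "numeric" if it looks like -12 or 9.50 once trimmed.
_NUMERIC = re.compile(r"^-?\d+(\.\d+)?$")


def normalize_value(value):
    """Collapse whitespace; turn numeric-looking strings into int or float."""
    if not isinstance(value, str):
        return value
    text = " ".join(value.split())  # trims ends and collapses inner runs
    if _NUMERIC.match(text):
        return float(text) if "." in text else int(text)
    return text


def clean_rows(rows, dedup=True):
    """Normalize every cell, then optionally drop exact duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        normalized = {key.strip(): normalize_value(val) for key, val in row.items()}
        # Sort by key only, so values of mixed types are never compared.
        fingerprint = tuple(sorted(normalized.items(), key=lambda item: item[0]))
        if dedup and fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned
```

Passing `dedup=False` keeps duplicate rows and only normalizes, which mirrors the idea behind the `--no-dedup` flag.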
pip install atlas-datatoolkit
curl -sL https://raw.githubusercontent.com/AtlasNexusTech/datatoolkit/master/install.bat | cmd
Or download datatoolkit.exe
curl -sL https://raw.githubusercontent.com/AtlasNexusTech/datatoolkit/master/install-android.sh | bash
Download datatoolkit-portable-v0.1.0.zip, extract, run:
python datatoolkit-portable.py convert data.csv -o data.json
# JSON → CSV
datatoolkit convert data.json -o data.csv
# CSV → JSON with cleanup
datatoolkit convert messy.csv -o clean.json --clean
# YAML → XML
datatoolkit convert config.yaml -f xml -o config.xml
datatoolkit validate data.csv
Example output:
{
  "total_rows": 342,
  "total_columns": 5,
  "columns": ["name", "price", "category", "url", "updated_at"],
  "null_counts": {
    "name": 0,
    "price": 2
  },
  "empty_strings": {
    "name": 0,
    "price": 2
  }
}
# Deduplicate + normalize numeric strings
datatoolkit clean messy.csv -o clean.csv
# Normalize only, keep duplicates
datatoolkit clean data.json -o normalized.json --no-dedup
datatoolkit batch big.csv 100 ./chunks/
Output:
chunks/chunk_001.csv
chunks/chunk_002.csv
...
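For readers who want to see how this kind of batching can work, here is a minimal sketch that writes files named like the output above. It is not the toolkit's implementation. The function names are hypothetical, and repeating the header row in every chunk is an assumption about the desired behavior.

```python
import csv
from pathlib import Path


def _write_chunk(out_path, index, header, rows):
    """Write one chunk file named chunk_001.csv, chunk_002.csv, ..."""
    target = out_path / f"chunk_{index:03d}.csv"
    with open(target, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)  # assumption: every chunk repeats the header
        writer.writerows(rows)
    return target


def batch_csv(source, rows_per_chunk, out_dir):
    """Split a CSV into numbered chunks of at most rows_per_chunk data rows."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    with open(source, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to split
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                written.append(_write_chunk(out_path, len(written) + 1, header, chunk))
                chunk = []
        if chunk:  # final, partially filled chunk
            written.append(_write_chunk(out_path, len(written) + 1, header, chunk))
    return written
```

Streaming rows instead of loading the whole file keeps memory use flat for large exports.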
| Format | Read | Write |
|---|---|---|
| JSON | yes | yes |
| CSV | yes | yes |
| YAML | yes | yes |
| XML | yes | yes |
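As a rough illustration of what a round trip between two of these formats involves, the sketch below converts CSV text to JSON and back using only the Python standard library. The function names are hypothetical and this is not the toolkit's own conversion code. It assumes the JSON is a flat list of objects and leaves every CSV cell as a string.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Parse CSV text into a JSON array of objects (all values stay strings)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2, ensure_ascii=False)


def json_to_csv(json_text):
    """Write a JSON array of flat objects as CSV text."""
    rows = json.loads(json_text)
    if not rows:
        return ""
    # Union of keys in first-seen order, so ragged records still fit one header.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a record become empty cells
    return buffer.getvalue()
```

Nested JSON has no direct CSV equivalent, so a real converter needs a flattening rule. This sketch does not attempt one.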
.
├── dtk.py # CLI implementation
├── requirements.txt # pyyaml + xmltodict
├── README.md # product documentation
└── LICENSE # MIT
- Boring is good — standard formats, simple CLI, no service dependency.
- Pipeline-friendly — commands can be used inside cron jobs, scripts and agent workflows.
- Client-delivery oriented — outputs are easy to inspect and hand off.
- Small surface area — suitable for quick customization during micro-builds.
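To make the pipeline-friendly point concrete, here is a hedged sketch of a script gating a delivery step on a validation report. It assumes that `datatoolkit validate` prints a JSON report to stdout with the `total_rows`, `null_counts` and `empty_strings` keys shown in the example output. The helper names and thresholds are hypothetical.

```python
import json
import subprocess


def run_validate(path):
    """Run the CLI and parse its report (assumes JSON is printed to stdout)."""
    completed = subprocess.run(
        ["datatoolkit", "validate", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


def report_is_clean(report, max_nulls=0, max_empty=0):
    """Return (ok, problems) for a validate-style report dictionary."""
    problems = []
    if report.get("total_rows", 0) == 0:
        problems.append("no rows")
    for column, count in report.get("null_counts", {}).items():
        if count > max_nulls:
            problems.append(f"{column}: {count} null values")
    for column, count in report.get("empty_strings", {}).items():
        if count > max_empty:
            problems.append(f"{column}: {count} empty strings")
    return (not problems, problems)
```

A cron job or agent workflow could call `report_is_clean(run_validate("data.csv"))` and stop before delivery when the first element is `False`.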
This repository supports Atlas Nexus data-operation offers:
- data cleanup;
- dashboard preparation;
- conversion of client exports;
- repeatable micro-pipelines;
- pre-validation before automation.
Main Atlas Nexus site:
https://atlasnexusops.github.io/
MIT — Atlas Nexus, 2026