Atlas Data Toolkit

Small, dependency-light CLI for converting, validating, cleaning and batching operational data files.

Atlas Data Toolkit is the productized version of the earlier datatoolkit prototype. It is intentionally simple: one Python CLI that helps turn messy CSV/JSON/YAML/XML files into clean handoff artifacts for dashboards, automations and client data pipelines.

Portable Edition (no pip needed)

For Windows, Android (Termux), or any machine without pip:

Download datatoolkit-portable-v0.1.0.zip
Extract anywhere
Run with python datatoolkit-portable.py <command>

# Windows
python datatoolkit-portable.py convert data.csv -o data.json

# Android (Termux)
python datatoolkit-portable.py validate data.csv

JSON, CSV and XML work out of the box (XML support is bundled). YAML additionally requires PyYAML: pip install pyyaml.

See README-portable.md for full details.

Use cases

Convert client exports between CSV, JSON, YAML and XML.
Validate row counts, columns, nulls and empty strings before delivery.
Normalize whitespace and numeric values.
Remove duplicate rows.
Split large files into smaller batches for review or processing.

Install

🐍 pip (any OS)

pip install atlas-datatoolkit

🪟 Windows .exe (zero dependencies — no Python needed)

curl -sL https://raw.githubusercontent.com/AtlasNexusTech/datatoolkit/master/install.bat | cmd

Or download datatoolkit.exe

📱 Android (Termux)

curl -sL https://raw.githubusercontent.com/AtlasNexusTech/datatoolkit/master/install-android.sh | bash

📦 Portable script (any OS with Python 3.9+)

Download datatoolkit-portable-v0.1.0.zip, extract, run:

python datatoolkit-portable.py convert data.csv -o data.json

CLI examples

Convert files

# JSON → CSV
datatoolkit convert data.json -o data.csv

# CSV → JSON with cleanup
datatoolkit convert messy.csv -o clean.json --clean

# YAML → XML
datatoolkit convert config.yaml -f xml -o config.xml
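
To make the conversion step concrete, here is a minimal sketch of a CSV → JSON conversion using only the Python standard library. It is not the toolkit's actual implementation; the function name csv_text_to_json is hypothetical, and the sketch assumes the CSV has a header row and that values should stay as strings.

```python
import csv
import io
import json


def csv_text_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    Hypothetical helper for illustration; not part of datatoolkit.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # each row becomes a dict keyed by the header names
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    sample = "name,price\nwidget,9.99\ngadget,12.50\n"
    print(csv_text_to_json(sample))
```

Every value stays a string in this sketch; numeric normalization is a separate step, which is presumably what the --clean flag is for.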

Validate a dataset

datatoolkit validate data.csv

Example output:

{
  "total_rows": 342,
  "total_columns": 5,
  "columns": ["name", "price", "category", "url", "updated_at"],
  "null_counts": {
    "name": 0,
    "price": 2
  },
  "empty_strings": {
    "name": 0,
    "price": 2
  }
}
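
A report with this shape could be computed roughly as follows. This is a sketch under stated assumptions, not the toolkit's code: the function name validate_rows is hypothetical, rows are assumed to be flat dictionaries, and null values and empty strings are counted separately (the real tool may count them differently).

```python
import json


def validate_rows(rows, columns):
    """Build a summary report shaped like the example output.

    Hypothetical helper for illustration; not part of datatoolkit.
    Assumption: None counts as null, whitespace-only strings count as empty.
    """
    null_counts = {col: 0 for col in columns}
    empty_strings = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)  # a missing key is treated as null
            if value is None:
                null_counts[col] += 1
            elif isinstance(value, str) and value.strip() == "":
                empty_strings[col] += 1
    return {
        "total_rows": len(rows),
        "total_columns": len(columns),
        "columns": list(columns),
        "null_counts": null_counts,
        "empty_strings": empty_strings,
    }


if __name__ == "__main__":
    demo_rows = [
        {"name": "widget", "price": "9.99"},
        {"name": "gadget", "price": ""},
        {"name": "gizmo", "price": None},
    ]
    print(json.dumps(validate_rows(demo_rows, ["name", "price"]), indent=2))
```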

Clean a file

# Deduplicate + normalize numeric strings
datatoolkit clean messy.csv -o clean.csv

# Normalize only, keep duplicates
datatoolkit clean data.json -o normalized.json --no-dedup
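
The cleaning steps described above (trim whitespace, normalize numeric strings, drop duplicates) can be sketched as below. This illustrates the idea and is not the toolkit's implementation: normalize_value and clean_rows are hypothetical names, and the sketch assumes flat rows whose values are strings, numbers or None.

```python
import re

# Plain integers and simple decimals only; anything else stays a string.
_INT_RE = re.compile(r"[+-]?\d+")
_FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")


def normalize_value(value):
    """Strip whitespace and convert numeric-looking strings to int or float."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    if _INT_RE.fullmatch(text):
        return int(text)
    if _FLOAT_RE.fullmatch(text):
        return float(text)
    return text


def clean_rows(rows, dedup=True):
    """Normalize every value, then optionally drop exact duplicate rows.

    Hypothetical helper; dedup=False mirrors the idea behind --no-dedup.
    """
    cleaned = []
    seen = set()
    for row in rows:
        normalized = {key: normalize_value(val) for key, val in row.items()}
        # Sort by key so column order does not affect duplicate detection.
        fingerprint = tuple(sorted(normalized.items(), key=lambda kv: kv[0]))
        if dedup and fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


if __name__ == "__main__":
    demo = [
        {"name": " widget ", "price": "9.99"},
        {"name": "widget", "price": " 9.99"},
        {"name": "gadget", "price": "12"},
    ]
    print(clean_rows(demo))
```

One design caution: a blanket numeric conversion like this also turns an identifier such as 007 into 7, so ID-like columns may need to be excluded.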

Split into batches

datatoolkit batch big.csv 100 ./chunks/

Output:

chunks/chunk_001.csv
chunks/chunk_002.csv
...
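
For reference, the batching behaviour can be approximated with the standard library as shown below. The sketch assumes the numeric argument is the number of rows per chunk and that each chunk repeats the header row; neither is confirmed here. The function name write_batches is hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_batches(rows, fieldnames, size, out_dir):
    """Write rows into chunk_001.csv, chunk_002.csv, ... with `size` rows each.

    Hypothetical helper for illustration; not part of datatoolkit.
    Returns the list of files written, in order.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, start in enumerate(range(0, len(rows), size), start=1):
        chunk_file = out_path / f"chunk_{index:03d}.csv"
        with chunk_file.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()  # assumption: every chunk carries the header
            writer.writerows(rows[start:start + size])
        written.append(chunk_file)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_rows = [{"id": str(i)} for i in range(5)]
        for path in write_batches(demo_rows, ["id"], 2, tmp):
            print(path.name)
```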

Supported formats

Format	Read	Write
JSON	yes	yes
CSV	yes	yes
YAML	yes	yes
XML	yes	yes

Repository structure

.
├── dtk.py              # CLI implementation
├── requirements.txt    # pyyaml + xmltodict
├── README.md           # product documentation
└── LICENSE             # MIT

Design principles

Boring is good — standard formats, simple CLI, no service dependency.
Pipeline-friendly — commands can be used inside cron jobs, scripts and agent workflows.
Client-delivery oriented — outputs are easy to inspect and hand off.
Small surface area — suitable for quick customization during micro-builds.
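
As an illustration of the pipeline-friendly principle, a script could gate a delivery step on the validate report. The sketch below assumes that datatoolkit validate prints its report to stdout as a single JSON object, as the example output suggests, and makes no assumption about exit codes beyond failing on a non-zero status. The helper names run_validate and columns_with_gaps are hypothetical.

```python
import json
import subprocess


def run_validate(path, command="datatoolkit"):
    """Run `datatoolkit validate <path>` and parse stdout as JSON.

    Assumption: the report is printed to stdout as one JSON object.
    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        [command, "validate", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def columns_with_gaps(report):
    """Return sorted column names with at least one null or empty string."""
    flagged = set()
    for section in ("null_counts", "empty_strings"):
        for column, count in report.get(section, {}).items():
            if count > 0:
                flagged.add(column)
    return sorted(flagged)


if __name__ == "__main__":
    # Uses a hard-coded report so the demo runs without the CLI installed.
    sample_report = {
        "null_counts": {"name": 0, "price": 2},
        "empty_strings": {"name": 0, "price": 2},
    }
    print(columns_with_gaps(sample_report))
```

In a cron job or agent workflow, the script would call run_validate on the real file and stop the pipeline when columns_with_gaps returns a non-empty list.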

Atlas Nexus context

This repository supports Atlas Nexus data-operation offers:

data cleanup;
dashboard preparation;
conversion of client exports;
repeatable micro-pipelines;
pre-validation before automation.

Main Atlas Nexus site:

https://atlasnexusops.github.io/

License

MIT — Atlas Nexus, 2026
