Atlas Data Toolkit

Small, dependency-light CLI for converting, validating, cleaning and batching operational data files.

Atlas Data Toolkit is the productized version of the earlier datatoolkit prototype. It is intentionally simple: one Python CLI that helps turn messy CSV/JSON/YAML/XML files into clean handoff artifacts for dashboards, automations and client data pipelines.

Portable Edition (no pip needed)

For Windows, Android (Termux), or any machine without pip:

  1. Download datatoolkit-portable-v0.1.0.zip
  2. Extract anywhere
  3. Run with python datatoolkit-portable.py <command>
# Windows
python datatoolkit-portable.py convert data.csv -o data.json

# Android (Termux)
python datatoolkit-portable.py validate data.csv

JSON and CSV work out of the box. YAML is the one exception: it needs PyYAML (pip install pyyaml). XML is bundled.
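Because YAML is the only format with an extra dependency, a script that must run without pip can treat PyYAML as an optional import and fail with an actionable message only when YAML is actually requested. The sketch below shows that pattern; `load_text` is a hypothetical helper for illustration, not part of the toolkit.

```python
import json

# Hypothetical pattern: PyYAML is optional, so a missing install is tolerated
# until someone actually asks for YAML.
try:
    import yaml  # provided by `pip install pyyaml`
except ImportError:
    yaml = None


def load_text(text: str, fmt: str):
    """Parse JSON (stdlib) or YAML (optional PyYAML) text into Python data."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "yaml":
        if yaml is None:
            raise RuntimeError("YAML support needs PyYAML: pip install pyyaml")
        return yaml.safe_load(text)
    raise ValueError(f"unsupported format: {fmt}")
```

JSON keeps working on a bare Python install; only the YAML branch depends on the extra package.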

See README-portable.md for full details.

Use cases

  • Convert client exports between CSV, JSON, YAML and XML.
  • Validate row counts, columns, nulls and empty strings before delivery.
  • Normalize whitespace and numeric values.
  • Remove duplicate rows.
  • Split large files into smaller batches for review or processing.

Install

🐍 pip (any OS)

pip install atlas-datatoolkit

🪟 Windows .exe (zero dependencies — no Python needed)

curl -sL https://raw.githubusercontent.com/AtlasNexusTech/datatoolkit/master/install.bat | cmd

Or download datatoolkit.exe

📱 Android (Termux)

curl -sL https://raw.githubusercontent.com/AtlasNexusTech/datatoolkit/master/install-android.sh | bash

📦 Portable script (any OS with Python 3.9+)

Download datatoolkit-portable-v0.1.0.zip, extract, run:

python datatoolkit-portable.py convert data.csv -o data.json

CLI examples

Convert files

# JSON → CSV
datatoolkit convert data.json -o data.csv

# CSV → JSON with cleanup
datatoolkit convert messy.csv -o clean.json --clean

# YAML → XML
datatoolkit convert config.yaml -f xml -o config.xml
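For readers curious about what a CSV ↔ JSON conversion involves, here is a minimal stdlib-only sketch. It is not the toolkit's code: `csv_to_json` and `json_to_csv` are hypothetical helper names, and options such as `--clean` are not modelled.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (header row + data rows) to a JSON array of objects.

    Values stay strings because CSV carries no type information.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects back to CSV text."""
    records = json.loads(json_text)
    # Collect every key in first-seen order so sparse records keep all columns.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    # Missing keys are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Collecting the union of keys matters for JSON input, where records do not have to share the same fields.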

Validate a dataset

datatoolkit validate data.csv

Example output:

{
  "total_rows": 342,
  "total_columns": 5,
  "columns": ["name", "price", "category", "url", "updated_at"],
  "null_counts": {
    "name": 0,
    "price": 2
  },
  "empty_strings": {
    "name": 0,
    "price": 2
  }
}
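The report fields above can be reproduced with a few lines of Python. This is a sketch under assumptions: `validate_records` is a hypothetical helper that takes already-parsed rows (a list of dicts), "null" is taken to mean a missing key or a `None` value, and "empty string" a string that is blank after stripping. The real CLI's definitions may differ.

```python
import json
from typing import Any


def validate_records(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a report shaped like the `datatoolkit validate` example output."""
    # Union of keys across all rows, in first-seen order.
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    null_counts = {column: 0 for column in columns}
    empty_strings = {column: 0 for column in columns}
    for record in records:
        for column in columns:
            value = record.get(column)  # a missing key also yields None
            if value is None:
                null_counts[column] += 1
            elif isinstance(value, str) and not value.strip():
                empty_strings[column] += 1

    return {
        "total_rows": len(records),
        "total_columns": len(columns),
        "columns": columns,
        "null_counts": null_counts,
        "empty_strings": empty_strings,
    }


if __name__ == "__main__":
    rows = [{"name": "Widget", "price": "9.99"}, {"name": "", "price": None}]
    print(json.dumps(validate_records(rows), indent=2))
```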

Clean a file

# Deduplicate + normalize numeric strings
datatoolkit clean messy.csv -o clean.csv

# Normalize only, keep duplicates
datatoolkit clean data.json -o normalized.json --no-dedup
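A minimal sketch of such a cleaning pass, assuming that "normalize" means collapsing whitespace and converting plain integer or decimal strings to numbers, and that duplicates are exact matches after normalization (first occurrence kept). `normalize_value` and `clean_records` are hypothetical names, and the `dedup` parameter mirrors `--no-dedup` only by assumption.

```python
import json
import re
from typing import Any

# Plain integers ("12", "-3") and decimals ("9.50", ".5"); no exponents, NaN or inf.
_INT = re.compile(r"[+-]?\d+")
_DECIMAL = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")


def normalize_value(value: Any) -> Any:
    """Collapse whitespace in strings and convert numeric strings to int/float."""
    if not isinstance(value, str):
        return value
    text = " ".join(value.split())  # trims both ends and collapses inner runs
    if _INT.fullmatch(text):
        return int(text)
    if _DECIMAL.fullmatch(text):
        return float(text)
    return text


def clean_records(records: list[dict[str, Any]], dedup: bool = True) -> list[dict[str, Any]]:
    """Normalize every value, then optionally drop exact duplicate rows."""
    cleaned = [{key: normalize_value(value) for key, value in record.items()} for record in records]
    if not dedup:
        return cleaned
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for record in cleaned:
        # A canonical JSON string is a hashable fingerprint, independent of key order.
        fingerprint = json.dumps(record, sort_keys=True)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique
```

Normalizing before deduplicating is what lets `"  Widget   A "` and `"Widget A"` count as the same row. One caveat of this approach: numeric conversion also strips leading zeros (`"007"` becomes `7`), so identifier columns such as postcodes may need to be excluded.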

Split into batches

datatoolkit batch big.csv 100 ./chunks/

Output:

chunks/chunk_001.csv
chunks/chunk_002.csv
...
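The batch step can be approximated as follows, assuming the second argument (`100`) is the number of data rows per chunk and that each chunk repeats the header row so it can be opened on its own. Both points are assumptions, as are the helper names `write_batches` and `_flush`.

```python
import csv
from pathlib import Path


def _flush(out_dir: Path, index: int, header: list[str], rows: list[list[str]]) -> Path:
    """Write one chunk file (header + rows) named chunk_001.csv, chunk_002.csv, ..."""
    path = out_dir / f"chunk_{index:03d}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def write_batches(src: Path, batch_size: int, out_dir: Path) -> list[Path]:
    """Split a CSV into files of at most `batch_size` data rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to split
        batch: list[list[str]] = []
        for row in reader:  # streams rows, so big files are never fully loaded
            batch.append(row)
            if len(batch) == batch_size:
                written.append(_flush(out_dir, len(written) + 1, header, batch))
                batch = []
        if batch:  # final, possibly shorter, chunk
            written.append(_flush(out_dir, len(written) + 1, header, batch))
    return written
```

Reading row by row keeps memory use proportional to one batch rather than the whole input file.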

Supported formats

Format   Read   Write
JSON     yes    yes
CSV      yes    yes
YAML     yes    yes
XML      yes    yes
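XML is the least uniform of the four formats, so any converter has to pick an element layout for tabular data. The sketch below assumes one hypothetical layout (`<rows><row><column>value</column></row></rows>`) and uses the standard library's `xml.etree.ElementTree`. The toolkit itself lists `xmltodict` in `requirements.txt`, so its actual layout may differ; `records_to_xml` and `xml_to_records` are illustrative names only.

```python
import xml.etree.ElementTree as ET
from typing import Any


def records_to_xml(records: list[dict[str, Any]], root_tag: str = "rows", row_tag: str = "row") -> str:
    """Serialize flat records as <rows><row><column>value</column>...</row></rows>."""
    root = ET.Element(root_tag)
    for record in records:
        row = ET.SubElement(root, row_tag)
        for key, value in record.items():
            # Each column becomes a child element; None is written as empty text.
            ET.SubElement(row, key).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text: str, row_tag: str = "row") -> list[dict[str, str]]:
    """Parse the layout above back into a list of dicts (values come back as str)."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text or "" for child in row} for row in root.iter(row_tag)]
```

Column names must be valid XML element names: ElementTree does not check tag names when serializing, so a header such as `unit price` would need renaming first.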

Repository structure

.
├── dtk.py              # CLI implementation
├── requirements.txt    # pyyaml + xmltodict
├── README.md           # product documentation
└── LICENSE             # MIT

Design principles

  • Boring is good — standard formats, simple CLI, no service dependency.
  • Pipeline-friendly — commands can be used inside cron jobs, scripts and agent workflows.
  • Client-delivery oriented — outputs are easy to inspect and hand off.
  • Small surface area — suitable for quick customization during micro-builds.

Atlas Nexus context

This repository supports Atlas Nexus data-operation offers:

  • data cleanup;
  • dashboard preparation;
  • conversion of client exports;
  • repeatable micro-pipelines;
  • pre-validation before automation.

Main Atlas Nexus site:

https://atlasnexusops.github.io/

License

MIT — Atlas Nexus, 2026
