wang-q/tva

tva: Tab-separated Values Assistant

Fast, reliable TSV processing toolkit in Rust.

Overview

tva (pronounced "Tee-Va") is a high-performance command-line toolkit written in Rust for processing tabular data. It brings the safety and speed of modern systems programming to the classic Unix philosophy.

Inspiration

  • eBay's tsv-utils (discontinued): The primary reference for functionality and performance.
  • GNU Datamash: Statistical operations.
  • R's tidyverse: Reshaping concepts and string manipulation.
  • xan: DSL and terminal-based plotting.

Use Cases

  • "Middle Data": Files too large for Excel/Pandas but too small to justify distributed systems (Spark/Hadoop).
  • Data Pipelines: Robust CLI-based ETL steps compatible with awk, sort, etc.
  • Exploration: Fast summary statistics, sampling, and filtering on raw data.

Design Principles

  • Single Binary: A standalone executable with no external runtime dependencies, easy to deploy.
  • Header Aware: Manipulate columns by name or index.
  • Fail-fast: Strict validation ensures data integrity (no silent truncation).
  • Streaming: Stateless processing designed for infinite streams and large files.
  • TSV-first: Prioritizes the reliability and simplicity of tab-separated values.
  • Performance: Single-pass execution with minimal memory overhead.
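The header-aware and fail-fast principles can be illustrated with a small Rust sketch. `resolve_field` is a hypothetical helper, not part of tva's API. It assumes 1-based numeric indices (as in `tva select -f 1`) and reports an error instead of guessing when a name or index does not exist.

```rust
/// Hypothetical helper: resolve a field spec (1-based index or header name)
/// to a 0-based column index, failing fast instead of guessing.
fn resolve_field(header: &str, spec: &str) -> Result<usize, String> {
    let names: Vec<&str> = header.split('\t').collect();
    if let Ok(n) = spec.parse::<usize>() {
        // Numeric specs are treated as 1-based indices; 0 is rejected.
        if n >= 1 && n <= names.len() {
            return Ok(n - 1);
        }
        return Err(format!("field index {} out of range (1..={})", n, names.len()));
    }
    // Otherwise look the name up in the header row.
    names
        .iter()
        .position(|&name| name == spec)
        .ok_or_else(|| format!("unknown field name: {}", spec))
}

fn main() {
    let header = "id\tname\tscore";
    let row = "7\talice\t93";
    for spec in ["score", "1", "age"] {
        match resolve_field(header, spec) {
            Ok(idx) => println!("{} -> {}", spec, row.split('\t').nth(idx).unwrap_or("")),
            // An unknown column is an error, never a silently empty result.
            Err(e) => eprintln!("error: {}", e),
        }
    }
}
```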

Read the documentation online

Installation

Current release: 0.3.1

# Clone the repository, then build and install with cargo
git clone https://github.com/wang-q/tva.git
cd tva
cargo install --force --path .

Or install the pre-compiled binary via the cross-platform package manager cbp (supports older Linux systems with glibc 2.17+):

cbp install tva

You can also download the pre-compiled binaries from the Releases page.

Running Examples

The examples in the documentation use sample data located in the docs/data/ directory. To run these examples yourself, we recommend cloning the repository:

git clone https://github.com/wang-q/tva.git
cd tva

Then you can run the commands exactly as shown in the docs (e.g., tva select -f 1 docs/data/input.csv).

Alternatively, you can download individual files from the docs/data directory on GitHub.

Commands

Select specific rows or columns from your data.

  • select: Select and reorder columns.
  • filter: Filter rows based on numeric, string, or regex conditions.
  • slice: Slice rows by index (keep or drop). Supports multiple ranges and header preservation.
  • sample: Randomly sample rows (Bernoulli, reservoir, weighted).
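Reservoir sampling draws a fixed-size random sample in a single streaming pass, without knowing the number of rows in advance. The sketch below is the textbook Algorithm R and assumes nothing about how `tva sample` is actually implemented. `XorShift64` and `reservoir_sample` are hypothetical names, and the toy PRNG only stands in for a real random number generator so the example has no dependencies.

```rust
/// Toy xorshift64 PRNG (seed must be non-zero); a real tool would use a
/// proper RNG such as the `rand` crate.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Integer in 0..bound (modulo bias ignored for brevity).
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Algorithm R: keep k rows from a stream of unknown length; after n rows,
/// each row is in the reservoir with probability k/n.
fn reservoir_sample<I: Iterator<Item = String>>(rows: I, k: usize, rng: &mut XorShift64) -> Vec<String> {
    let mut reservoir: Vec<String> = Vec::with_capacity(k);
    for (i, row) in rows.enumerate() {
        if i < k {
            // Fill the reservoir with the first k rows.
            reservoir.push(row);
        } else {
            // Row i (0-based) replaces a random slot with probability k / (i + 1).
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = row;
            }
        }
    }
    reservoir
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let rows = (1..=1000).map(|i| format!("row{}", i));
    println!("{:?}", reservoir_sample(rows, 5, &mut rng));
}
```

Memory use is bounded by k rows, which is why this approach fits the streaming design principle; Bernoulli sampling (keep each row independently with probability p) needs no buffer at all but does not fix the sample size.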

Transform the structure or values of your data.

  • longer: Reshape wide to long (unpivot). Requires a header row.
  • wider: Reshape long to wide (pivot). Supports aggregation via --op (sum, count, etc.).
  • fill: Fill missing values in selected columns (down/LOCF, const).
  • blank: Replace consecutive identical values in selected columns with empty strings (sparsify).
  • transpose: Swap rows and columns (matrix transposition).
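`fill` (down/LOCF) and `blank` are complementary: for a column with no originally empty cells, blanking repeats and then filling down restores the original values. A minimal sketch on a single column, assuming empty strings mark missing values; `fill_down` and `blank_repeats` are hypothetical helpers, not tva functions.

```rust
/// "Fill down" (LOCF): an empty cell takes the last non-empty value above it.
/// Leading empty cells stay empty because there is nothing to carry forward.
fn fill_down(column: &[&str]) -> Vec<String> {
    let mut last: Option<&str> = None;
    column
        .iter()
        .map(|&cell| {
            if cell.is_empty() {
                last.unwrap_or("").to_string()
            } else {
                last = Some(cell);
                cell.to_string()
            }
        })
        .collect()
}

/// "Blank" (sparsify): a cell identical to the one directly above it becomes
/// an empty string, so only the first value of each run is kept.
fn blank_repeats(column: &[&str]) -> Vec<String> {
    let mut prev: Option<&str> = None;
    column
        .iter()
        .map(|&cell| {
            let out = if prev == Some(cell) { String::new() } else { cell.to_string() };
            prev = Some(cell);
            out
        })
        .collect()
}

fn main() {
    let column = ["a", "a", "b", "b", "b", "c"];
    let sparse = blank_repeats(&column);
    let sparse_refs: Vec<&str> = sparse.iter().map(String::as_str).collect();
    let restored = fill_down(&sparse_refs);
    println!("sparse:   {:?}", sparse);
    println!("restored: {:?}", restored);
}
```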

Expression-based transformations for complex data manipulation.

  • expr: Evaluate expressions and output results.
  • extend: Add new columns to each row (alias for expr -m extend).
  • mutate: Modify existing column values (alias for expr -m mutate).

Organize and combine multiple datasets.

  • sort: Sort rows based on one or more key fields.
  • reverse: Reverse the order of lines (like tac), optionally keeping the header at the top.
  • join: Join two files based on common keys.
  • append: Concatenate multiple TSV files, handling headers correctly.
  • split: Split a file into multiple files (by size, key, or random).
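A common way to join two files on a key is a hash join: load one file into a hash map keyed on the join field, then stream the other file against it. This sketch assumes an inner join on a single 0-based key column and appends the whole matching row. `hash_join` is hypothetical and does not describe tva's actual options or output layout.

```rust
use std::collections::HashMap;

/// Inner hash join on one 0-based key column: index the `filter` rows by key,
/// then stream `data` and emit each data row joined with every matching filter row.
fn hash_join(filter: &[&str], data: &[&str], key: usize) -> Vec<String> {
    // Build phase: key -> all filter rows sharing that key.
    let mut index: HashMap<&str, Vec<&str>> = HashMap::new();
    for &row in filter {
        if let Some(k) = row.split('\t').nth(key) {
            index.entry(k).or_default().push(row);
        }
    }
    // Probe phase: one pass over the data rows, preserving their order.
    let mut out = Vec::new();
    for &row in data {
        if let Some(k) = row.split('\t').nth(key) {
            if let Some(matches) = index.get(k) {
                for m in matches {
                    out.push(format!("{}\t{}", row, m));
                }
            }
        }
    }
    out
}

fn main() {
    let colors = ["k1\tred", "k2\tblue"];
    let values = ["k2\t10", "k3\t20", "k1\t30"];
    for line in hash_join(&colors, &values, 0) {
        println!("{}", line);
    }
}
```

Only the indexed file has to fit in memory, so the larger file can be streamed; this is why the smaller input is the natural choice for the build side.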

Calculate statistics and summarize your data.

  • stats: Calculate summary statistics (sum, mean, median, min, max, etc.) with grouping.
  • bin: Discretize numeric values into bins (useful for histograms).
  • uniq: Deduplicate rows or count unique occurrences (supports equivalence classes).
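Fixed-width binning maps each value to the index `floor((x - origin) / width)`, and counting those indices gives a histogram. The sketch assumes half-open bins `[origin + k*width, origin + (k+1)*width)`; `bin_index` and `histogram` are hypothetical names, and tva's own bin boundaries and parameters may differ.

```rust
use std::collections::BTreeMap;

/// Index of the half-open bin containing x; negative values get negative indices.
fn bin_index(x: f64, width: f64, origin: f64) -> i64 {
    assert!(width > 0.0, "bin width must be positive");
    ((x - origin) / width).floor() as i64
}

/// Count values per bin; BTreeMap keeps bins in ascending order for printing.
fn histogram(values: &[f64], width: f64, origin: f64) -> BTreeMap<i64, usize> {
    let mut counts = BTreeMap::new();
    for &x in values {
        *counts.entry(bin_index(x, width, origin)).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let values = [1.2, 3.7, 3.9, 0.4, 5.0];
    let (width, origin) = (2.0, 0.0);
    for (idx, n) in histogram(&values, width, origin) {
        // Report each bin by its lower edge.
        let lower = origin + idx as f64 * width;
        println!("{}\t{}", lower, n);
    }
}
```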

Visualize your data in the terminal.

  • plot point: Draw scatter plots or line charts in the terminal.
  • plot box: Draw box plots (box-and-whisker plots) in the terminal.
  • plot bin2d: Draw 2D histograms/heatmaps in the terminal.

Format and validate your data.

  • check: Validate TSV file structure (column counts, encoding).
  • nl: Add line numbers to rows.
  • keep-header: Run a shell command on the body of a TSV file, preserving the header.
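A structural check along the lines described for `check` can be sketched as follows: confirm the input is valid UTF-8, then require every line to have as many tab-separated fields as the first line. `check_tsv` and `check_bytes` are hypothetical helpers, and the error message format is an assumption, not tva's output.

```rust
/// Require a consistent field count on every line; return the column count,
/// or an error naming the first offending line (1-based).
fn check_tsv(text: &str) -> Result<usize, String> {
    let mut expected: Option<usize> = None;
    for (lineno, line) in text.lines().enumerate() {
        let n = line.split('\t').count();
        match expected {
            None => expected = Some(n),
            Some(e) if e != n => {
                return Err(format!("line {}: expected {} fields, found {}", lineno + 1, e, n));
            }
            _ => {}
        }
    }
    Ok(expected.unwrap_or(0))
}

/// Validate the encoding first, then the structure.
fn check_bytes(bytes: &[u8]) -> Result<usize, String> {
    let text = std::str::from_utf8(bytes).map_err(|e| format!("invalid UTF-8: {}", e))?;
    check_tsv(text)
}

fn main() {
    let inputs: [&[u8]; 3] = [b"a\tb\n1\t2\n", b"a\tb\n1\t2\t3\n", &[0x61, 0xff]];
    for input in inputs {
        match check_bytes(input) {
            Ok(cols) => println!("ok: {} columns", cols),
            Err(e) => println!("error: {}", e),
        }
    }
}
```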

Import & Export

Convert data to and from TSV format.

  • from: Convert other formats to TSV (csv, xlsx, html).
  • to: Convert TSV to other formats (csv, xlsx, md).
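Converting TSV to CSV is mostly a matter of quoting. The sketch follows the RFC 4180 convention: quote fields that contain commas, double quotes, or line breaks, and double any embedded quotes. Whether `tva to csv` quotes exactly this way is an assumption, and `csv_field` and `tsv_line_to_csv` are hypothetical names.

```rust
/// Quote one field per RFC 4180 only when it needs quoting.
fn csv_field(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Convert one TSV line (no embedded tabs or newlines) to a CSV line.
fn tsv_line_to_csv(line: &str) -> String {
    line.split('\t').map(csv_field).collect::<Vec<_>>().join(",")
}

fn main() {
    for line in ["id\tcomment", "1\tplain", "2\tcommas, here", "3\tsay \"hi\""] {
        println!("{}", tsv_line_to_csv(line));
    }
}
```

The conversion in this direction is straightforward because a TSV field cannot contain a tab or a newline, so splitting on tabs is unambiguous; the reverse direction (CSV to TSV) has to parse quoted fields first.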

Author

Qiang Wang wang-q@outlook.com

License

MIT. Copyright by Qiang Wang.