Fast, reliable TSV processing toolkit in Rust.
tva (pronounced "Tee-Va") is a high-performance command-line toolkit written in Rust for
processing tabular data. It brings the safety and speed of modern systems programming to the classic
Unix philosophy.
Inspiration
- eBay's tsv-utils (discontinued): The primary reference for functionality and performance.
- GNU Datamash: Statistical operations.
- R's tidyverse: Reshaping concepts and string manipulation.
- xan: DSL and terminal-based plotting.
Use Cases
- "Middle Data": Files too large for Excel/Pandas but too small for distributed systems (Spark/Hadoop).
- Data Pipelines: Robust CLI-based ETL steps that compose with awk, sort, etc.
- Exploration: Fast summary statistics, sampling, and filtering on raw data.
Design Principles
- Single Binary: A standalone executable with no dependencies, easy to deploy.
- Header Aware: Manipulate columns by name or index.
- Fail-fast: Strict validation ensures data integrity (no silent truncation).
- Streaming: Stateless processing designed for infinite streams and large files.
- TSV-first: Prioritizes the reliability and simplicity of tab-separated values.
- Performance: Single-pass execution with minimal memory overhead.
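To make the Header Aware, Fail-fast, and Streaming principles concrete, here is a minimal Rust sketch of selecting columns by name or 1-based index from a TSV stream. It is an illustration under assumptions, not tva's code. `resolve_fields` and `select_stream` are hypothetical names. A spec that parses as a number is always treated as an index. A row too short for a requested field aborts with an error instead of being padded.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical resolver (not tva's parser): each spec is a 1-based column
/// number or a header name; numeric specs always win over same-named headers.
fn resolve_fields(header: &[&str], specs: &[&str]) -> Result<Vec<usize>, String> {
    specs
        .iter()
        .map(|spec| match spec.parse::<usize>() {
            Ok(n) if n >= 1 && n <= header.len() => Ok(n - 1),
            Ok(n) => Err(format!("field index {} out of range", n)),
            Err(_) => header
                .iter()
                .position(|h| h == spec)
                .ok_or_else(|| format!("field '{}' not found in header", spec)),
        })
        .collect()
}

/// Hypothetical streaming select: resolve columns once from the header, then
/// process the body line by line without buffering the whole input.
fn select_stream<R: BufRead, W: Write>(input: R, out: &mut W, specs: &[&str]) -> Result<(), String> {
    let mut lines = input.lines();
    let header = match lines.next() {
        Some(line) => line.map_err(|e| e.to_string())?,
        None => return Ok(()), // empty input: nothing to select
    };
    let names: Vec<&str> = header.split('\t').collect();
    let idx = resolve_fields(&names, specs)?;
    let picked: Vec<&str> = idx.iter().map(|&i| names[i]).collect();
    writeln!(out, "{}", picked.join("\t")).map_err(|e| e.to_string())?;
    for line in lines {
        let line = line.map_err(|e| e.to_string())?;
        let fields: Vec<&str> = line.split('\t').collect();
        let mut picked = Vec::with_capacity(idx.len());
        for &i in &idx {
            // Fail fast: a row too short for a requested field is an error,
            // not something to pad or truncate silently.
            let value = fields.get(i).ok_or_else(|| format!("row is missing field {}", i + 1))?;
            picked.push(*value);
        }
        writeln!(out, "{}", picked.join("\t")).map_err(|e| e.to_string())?;
    }
    Ok(())
}

fn main() {
    // Field specs come from the command line, e.g. `name 3`.
    let args: Vec<String> = std::env::args().skip(1).collect();
    let specs: Vec<&str> = args.iter().map(String::as_str).collect();
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut out = stdout.lock();
    if let Err(e) = select_stream(stdin.lock(), &mut out, &specs) {
        eprintln!("error: {}", e);
        std::process::exit(1);
    }
}
```

Because names are resolved once against the header, each body line costs only a split and a few index lookups, and memory use does not grow with the number of rows.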
Current release: 0.3.1
# Clone the repository and install via cargo
cargo install --force --path .
Or install the pre-compiled binary via the cross-platform package manager cbp (supports older Linux systems with glibc 2.17+):
cbp install tva
You can also download the pre-compiled binaries from the Releases page.
The examples in the documentation use sample data located in the docs/data/ directory. To run
these examples yourself, we recommend cloning the repository:
git clone https://github.com/wang-q/tva.git
cd tva
Then you can run the commands exactly as shown in the docs, for example:
tva select -f 1 docs/data/input.csv
Alternatively, you can download individual files from the docs/data directory on GitHub.
Select specific rows or columns from your data.
- select: Select and reorder columns.
- filter: Filter rows based on numeric, string, or regex conditions.
- slice: Slice rows by index (keep or drop). Supports multiple ranges and header preservation.
- sample: Randomly sample rows (Bernoulli, reservoir, weighted).
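`sample` lists reservoir sampling among its methods. The classic single-pass technique (Algorithm R) keeps a uniform random sample of k rows from a stream of unknown length using O(k) memory. The sketch below shows that general technique, not tva's implementation. The `XorShift64` generator is a stand-in used only to avoid external crates, and header handling is omitted.

```rust
use std::io::{self, BufRead};

/// Tiny xorshift64 PRNG, used only so this sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Integer in 0..=bound (slight modulo bias, acceptable for a sketch).
    fn up_to(&mut self, bound: u64) -> u64 {
        self.next_u64() % (bound + 1)
    }
}

/// Reservoir sampling (Algorithm R): one pass, O(k) memory.
fn reservoir_sample<I: Iterator<Item = String>>(rows: I, k: usize, rng: &mut XorShift64) -> Vec<String> {
    let mut reservoir = Vec::with_capacity(k);
    for (i, row) in rows.enumerate() {
        if i < k {
            // Fill the reservoir with the first k rows.
            reservoir.push(row);
        } else {
            // Row i (0-based) replaces a random slot with probability k / (i + 1).
            let j = rng.up_to(i as u64) as usize;
            if j < k {
                reservoir[j] = row;
            }
        }
    }
    reservoir
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15); // fixed non-zero seed
    let stdin = io::stdin();
    let rows = stdin.lock().lines().map_while(Result::ok);
    for row in reservoir_sample(rows, 3, &mut rng) {
        println!("{}", row);
    }
}
```

Only k rows are ever held, so memory does not depend on input length. Bernoulli or weighted sampling would need a different selection rule.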
Transform the structure or values of your data.
- longer: Reshape wide to long (unpivot). Requires a header row.
- wider: Reshape long to wide (pivot). Supports aggregation via --op (sum, count, etc.).
- fill: Fill missing values in selected columns (down/LOCF, const).
- blank: Replace consecutive identical values in selected columns with empty strings (sparsify).
- transpose: Swap rows and columns (matrix transposition).
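`blank` and `fill` (down/LOCF) are near inverses on a single column. Here is a minimal sketch, assuming rows are already split into `Vec<String>` fields and that the column has no genuinely empty cells; otherwise the round trip is not exact. `blank_column`, `fill_down_column`, and `rows_from` are hypothetical helpers, not tva functions.

```rust
/// Hypothetical helper: build rows of owned fields from string literals.
fn rows_from(data: Vec<Vec<&str>>) -> Vec<Vec<String>> {
    data.into_iter()
        .map(|r| r.into_iter().map(String::from).collect())
        .collect()
}

/// `blank`-style sparsification of one column: a cell equal to the cell
/// directly above it (in the original data) becomes empty.
fn blank_column(rows: &mut [Vec<String>], col: usize) {
    let mut prev: Option<String> = None;
    for row in rows.iter_mut() {
        let current = row[col].clone();
        if prev.as_deref() == Some(current.as_str()) {
            row[col].clear();
        }
        prev = Some(current);
    }
}

/// `fill`-style down filling (LOCF) of one column: an empty cell takes the
/// last non-empty value seen above it; leading empty cells stay empty.
fn fill_down_column(rows: &mut [Vec<String>], col: usize) {
    let mut last: Option<String> = None;
    for row in rows.iter_mut() {
        if row[col].is_empty() {
            if let Some(v) = &last {
                row[col] = v.clone();
            }
        } else {
            last = Some(row[col].clone());
        }
    }
}

fn main() {
    let mut rows = rows_from(vec![vec!["a", "1"], vec!["a", "2"], vec!["b", "3"]]);
    blank_column(&mut rows, 0);
    println!("{:?}", rows); // column 0 is now "a", "", "b"
    fill_down_column(&mut rows, 0);
    println!("{:?}", rows); // column 0 is restored to "a", "a", "b"
}
```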
Expression-based transformations for complex data manipulation.
- expr: Evaluate expressions and output results.
- extend: Add new columns to each row (alias for expr -m extend).
- mutate: Modify existing column values (alias for expr -m mutate).
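`extend` and `mutate` differ in where the computed value lands: `extend` adds a new trailing column, while `mutate` overwrites an existing column in place. The sketch below illustrates only that distinction. Rust closures stand in for tva's expression language, whose syntax is not shown here, and `extend`, `mutate`, and `num` are hypothetical function names.

```rust
/// Hypothetical `extend`: append one computed field to every row.
fn extend<F: Fn(&[String]) -> String>(rows: &mut [Vec<String>], f: F) {
    for row in rows.iter_mut() {
        let value = f(row.as_slice());
        row.push(value);
    }
}

/// Hypothetical `mutate`: overwrite field `col` of every row with a computed value.
fn mutate<F: Fn(&[String]) -> String>(rows: &mut [Vec<String>], col: usize, f: F) {
    for row in rows.iter_mut() {
        let value = f(row.as_slice());
        row[col] = value;
    }
}

/// Parse a field as an integer; panics on bad input to keep the sketch short.
fn num(field: &str) -> i64 {
    field.parse().expect("numeric field")
}

fn main() {
    let mut rows = vec![
        vec!["2".to_string(), "3".to_string()],
        vec!["4".to_string(), "5".to_string()],
    ];
    // Like adding a new column computed as col1 * col2.
    extend(&mut rows, |r| (num(&r[0]) * num(&r[1])).to_string());
    // Like replacing col1 with col1 + 1.
    mutate(&mut rows, 0, |r| (num(&r[0]) + 1).to_string());
    println!("{:?}", rows); // [["3", "3", "6"], ["5", "5", "20"]]
}
```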
Organize and combine multiple datasets.
- sort: Sort rows by one or more key fields.
- reverse: Reverse the order of lines (like tac), optionally keeping the header at the top.
- join: Join two files on common keys.
- append: Concatenate multiple TSV files, handling headers correctly.
- split: Split a file into multiple files (by size, by key, or at random).
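A common way to implement a key-based join is a hash join. One input is loaded into a hash map keyed on the join field, and the other input is then scanned once with a lookup per row. This is a hedged sketch of an inner join under that assumption. The README does not say which strategy or output layout tva's `join` uses, and `hash_join` and `row` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical helper: turn string literals into an owned row.
fn row(fields: &[&str]) -> Vec<String> {
    fields.iter().map(|f| f.to_string()).collect()
}

/// Inner hash join on one key column per side. `right` is indexed in memory,
/// `left` is scanned once; each output row is the left row followed by the
/// right row's non-key fields.
fn hash_join(left: &[Vec<String>], lkey: usize, right: &[Vec<String>], rkey: usize) -> Vec<Vec<String>> {
    // Build phase: key -> all right rows with that key (duplicates allowed).
    let mut index: HashMap<&str, Vec<&Vec<String>>> = HashMap::new();
    for r in right {
        index.entry(r[rkey].as_str()).or_default().push(r);
    }
    // Probe phase: look up each left row's key, preserving left order.
    let mut out = Vec::new();
    for l in left {
        if let Some(matches) = index.get(l[lkey].as_str()) {
            for r in matches {
                let mut joined = l.clone();
                joined.extend(
                    r.iter()
                        .enumerate()
                        .filter(|(i, _)| *i != rkey)
                        .map(|(_, v)| v.clone()),
                );
                out.push(joined);
            }
        }
    }
    out
}

fn main() {
    let left = vec![row(&["1", "apple"]), row(&["2", "pear"]), row(&["3", "fig"])];
    let right = vec![row(&["1", "red"]), row(&["3", "purple"])];
    for r in hash_join(&left, 0, &right, 0) {
        println!("{}", r.join("\t"));
    }
}
```

Indexing the smaller input keeps memory proportional to that input, while the larger one can be streamed row by row.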
Calculate statistics and summarize your data.
- stats: Calculate summary statistics (sum, mean, median, min, max, etc.) with grouping.
- bin: Discretize numeric values into bins (useful for histograms).
- uniq: Deduplicate rows or count unique occurrences (supports equivalence classes).
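Fixed-width binning, as used for histograms, maps a value x to the bin index floor((x - origin) / width). Here is a minimal sketch. The `origin` and `width` parameters and the lower-edge convention are assumptions for illustration, not tva's documented `bin` options.

```rust
use std::collections::BTreeMap;

/// Index of the fixed-width bin containing `x`: floor((x - origin) / width).
/// `origin` aligns the bin edges; both parameters are illustrative assumptions.
fn bin_index(x: f64, width: f64, origin: f64) -> i64 {
    assert!(width > 0.0, "bin width must be positive");
    ((x - origin) / width).floor() as i64
}

/// Lower edge of the bin containing `x`.
fn bin_lower_edge(x: f64, width: f64, origin: f64) -> f64 {
    bin_index(x, width, origin) as f64 * width + origin
}

/// Count values per bin, ordered by bin index (the data behind a histogram).
fn histogram(values: &[f64], width: f64, origin: f64) -> BTreeMap<i64, usize> {
    let mut counts = BTreeMap::new();
    for &x in values {
        *counts.entry(bin_index(x, width, origin)).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let values = [1.0, 2.0, 6.0, 11.0, 12.5];
    for (bin, count) in histogram(&values, 5.0, 0.0) {
        let lower = bin as f64 * 5.0;
        println!("[{}, {})\t{}", lower, lower + 5.0, count);
    }
}
```

Using `floor` rather than truncation keeps negative values in the correct bin: -0.5 falls in [-1, 0), not [0, 1).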
Visualize your data in the terminal.
- plot point: Draw scatter plots or line charts in the terminal.
- plot box: Draw box plots (box-and-whisker plots) in the terminal.
- plot bin2d: Draw 2D histograms/heatmaps in the terminal.
Format and validate your data.
- check: Validate TSV file structure (column counts, encoding).
- nl: Add line numbers to rows.
- keep-header: Run a shell command on the body of a TSV file, preserving the header.
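A structural check like `check` can be done in one pass. Read each line as raw bytes, verify that it decodes as UTF-8, and compare its field count with the first line's. This sketch stops at the first problem, in keeping with the fail-fast principle. The exact rules and messages of tva's `check` are not described here, so `check_tsv` and its error text are hypothetical.

```rust
use std::io::{self, BufRead};

/// Hypothetical structural check: every line must be valid UTF-8 and have as
/// many tab-separated fields as the first line. Returns the number of lines
/// checked, or the first problem found (fail-fast).
fn check_tsv<R: BufRead>(mut input: R) -> Result<usize, String> {
    let mut buf = Vec::new();
    let mut expected: Option<usize> = None;
    let mut line_no = 0;
    loop {
        buf.clear();
        let n = input.read_until(b'\n', &mut buf).map_err(|e| e.to_string())?;
        if n == 0 {
            break; // end of input
        }
        line_no += 1;
        // Strip the line terminator (LF or CRLF) before validating.
        if buf.last() == Some(&b'\n') {
            buf.pop();
        }
        if buf.last() == Some(&b'\r') {
            buf.pop();
        }
        let text = std::str::from_utf8(&buf)
            .map_err(|e| format!("line {}: invalid UTF-8 ({})", line_no, e))?;
        let fields = text.split('\t').count();
        // The first line sets the expected width for all later lines.
        let width = *expected.get_or_insert(fields);
        if fields != width {
            return Err(format!("line {}: expected {} fields, found {}", line_no, width, fields));
        }
    }
    Ok(line_no)
}

fn main() {
    let stdin = io::stdin();
    match check_tsv(stdin.lock()) {
        Ok(n) => println!("ok: {} lines", n),
        Err(e) => {
            eprintln!("error: {}", e);
            std::process::exit(1);
        }
    }
}
```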
Convert data to and from TSV format.
- from: Convert other formats to TSV (csv, xlsx, html).
- to: Convert TSV to other formats (csv, xlsx, md).
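Converting TSV to Markdown (`to md`) mostly means emitting a GitHub-style pipe table. The first row becomes the header and is followed by a separator row of `---`, then the body rows, with any `|` inside a cell escaped. Here is a minimal sketch, assuming the first line is a header. `tsv_to_markdown` is a hypothetical name, and tva's actual output formatting may differ.

```rust
/// Hypothetical TSV -> Markdown (GitHub-style pipe table) converter.
/// The first line is treated as the header; `|` in cells is escaped as `\|`.
fn tsv_to_markdown(tsv: &str) -> String {
    let mut out = String::new();
    for (i, line) in tsv.lines().enumerate() {
        let cells: Vec<String> = line.split('\t').map(|c| c.replace('|', "\\|")).collect();
        out.push_str(&format!("| {} |\n", cells.join(" | ")));
        if i == 0 {
            // Separator row: one `---` per header column.
            let sep = vec!["---"; cells.len()];
            out.push_str(&format!("| {} |\n", sep.join(" | ")));
        }
    }
    out
}

fn main() {
    let tsv = "name\tnote\nfoo\ta|b\nbar\tplain\n";
    print!("{}", tsv_to_markdown(tsv));
}
```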
Qiang Wang wang-q@outlook.com
MIT. Copyright by Qiang Wang.