A practical tool for translating dataset fields between languages at scale. It helps teams localize structured data reliably by automating field-level translation using a familiar web-based translation engine.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project translates selected fields across all items in a dataset from one language to another. It solves the repetitive and error-prone work of manual data translation and is designed for developers, data teams, and product owners handling multilingual datasets.
- Processes entire datasets item by item without manual intervention
- Supports translating one or multiple fields per record
- Preserves original data or stores translations separately
- Works well with large, structured datasets used in production systems
| Feature | Description |
|---|---|
| Multi-field translation | Translate one or several dataset fields in a single run. |
| Language flexibility | Convert content between any supported source and target languages. |
| Non-destructive mode | Keep original values and store translations in a separate object. |
| Replace mode | Optionally overwrite original field values with translated text. |
| Dataset-wide processing | Automatically iterates through every dataset item. |
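The two modes in the table can be sketched as follows. This is a minimal illustration, not the project's actual code: the contents of `translator.js` and `datasetProcessor.js` are not shown in this README, so `fakeTranslate`, `translateItem`, `translateDataset`, and the `replace` option name are all hypothetical, and only top-level string fields are handled.

```javascript
// Hypothetical stand-in for the real translation call. It only tags the text
// so the example runs offline.
async function fakeTranslate(text, sourceLanguage, targetLanguage) {
  return `[${sourceLanguage}->${targetLanguage}] ${text}`;
}

// Translate the selected top-level fields of one item.
// `replace` is an assumed option name for the overwrite behaviour.
async function translateItem(item, options, translate = fakeTranslate) {
  const { sourceLanguage, targetLanguage, pathsToFields, replace = false } = options;
  const result = { ...item }; // shallow copy so the input item is not mutated
  const translation = {};

  for (const field of pathsToFields) {
    const value = item[field];
    if (typeof value !== "string" || value.length === 0) continue; // skip non-text fields
    const translated = await translate(value, sourceLanguage, targetLanguage);
    if (replace) {
      result[field] = translated; // replace mode: overwrite the original value
    } else {
      translation[field] = translated; // non-destructive mode: store separately
    }
  }

  if (!replace) result.translation = translation;
  return result;
}

// Process every item in order, one at a time.
async function translateDataset(items, options, translate = fakeTranslate) {
  const output = [];
  for (const item of items) {
    output.push(await translateItem(item, options, translate));
  }
  return output;
}
```

Passing the translator in as a parameter keeps the iteration logic testable without network access; a real engine client would be supplied in its place.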
| Field Name | Field Description |
|---|---|
| sourceLanguage | Original language code of the dataset content. |
| targetLanguage | Desired language code for translation. |
| datasetId | Identifier of the dataset being processed. |
| pathsToFields | List of dataset fields selected for translation. |
| translation | Object holding translated field values. |
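Before a run starts, the input fields listed above could be checked along these lines. This is a sketch under assumptions: the README does not show what `inputValidator.js` does, so `validateInput` and the specific rules (non-empty strings, differing languages) are illustrative only.

```javascript
// Returns a list of human-readable problems; an empty list means the input looks usable.
// The rules below are assumptions, not the project's documented validation.
function validateInput(input) {
  if (input === null || typeof input !== "object") {
    return ["input must be an object"];
  }
  const errors = [];
  const isNonEmptyString = (v) => typeof v === "string" && v.trim().length > 0;

  for (const key of ["sourceLanguage", "targetLanguage", "datasetId"]) {
    if (!isNonEmptyString(input[key])) errors.push(`${key} must be a non-empty string`);
  }
  if (!Array.isArray(input.pathsToFields) || input.pathsToFields.length === 0) {
    errors.push("pathsToFields must be a non-empty array");
  } else if (!input.pathsToFields.every(isNonEmptyString)) {
    errors.push("pathsToFields entries must be non-empty strings");
  }
  if (isNonEmptyString(input.sourceLanguage) && input.sourceLanguage === input.targetLanguage) {
    errors.push("sourceLanguage and targetLanguage must differ");
  }
  return errors;
}

// Example input; the datasetId value is a placeholder.
const exampleInput = {
  sourceLanguage: "en",
  targetLanguage: "es",
  datasetId: "example-dataset-id",
  pathsToFields: ["description"],
};
console.log(validateInput(exampleInput));
```

Returning a list instead of throwing on the first problem lets a caller report every issue in one pass.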
[
  {
    "description": "Ranch condo with two bedrooms and two bathrooms on the main level.",
    "translation": {
      "description": "Condo rancho con dos habitaciones y dos baños en el nivel principal."
    }
  }
]
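The sample translates a top-level field. If entries in `pathsToFields` can also be dotted paths into nested objects (an assumption; this README does not say), reading and writing them could look roughly like this. `getByPath` and `setByPath` are hypothetical helpers, not part of the project.

```javascript
// Read a value at a dotted path such as "address.city".
// Returns undefined when any segment is missing.
function getByPath(obj, path) {
  return path.split(".").reduce(
    (node, key) => (node !== null && typeof node === "object" ? node[key] : undefined),
    obj
  );
}

// Write a value at a dotted path, creating intermediate objects as needed.
// Mutates and returns `obj`.
function setByPath(obj, path, value) {
  const keys = path.split(".");
  let node = obj;
  for (const key of keys.slice(0, -1)) {
    if (node[key] === null || typeof node[key] !== "object") node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return obj;
}
```

With helpers like these, a translated value could be stored under the same path inside the `translation` object, so its shape mirrors the source item.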
Google Dataset Items Translator/
├── src/
│   ├── translator.js
│   ├── datasetProcessor.js
│   ├── validators/
│   │   └── inputValidator.js
│   └── utils/
│       └── languageMapper.js
├── data/
│   ├── sample-input.json
│   └── sample-output.json
├── config/
│   └── default.settings.json
├── package.json
└── README.md
- Data engineers use it to localize product datasets so global teams can work with native-language content.
- Marketplace operators translate listing descriptions to reach users in new regions faster.
- Analytics teams prepare multilingual datasets so reports remain consistent across countries.
- SaaS platforms automate dataset localization to support international customers without manual workflows.
Can I translate multiple fields at once? Yes. You can provide an array of field paths, and each will be translated for every dataset item in one run.
Does this overwrite my original data? Only if you enable replacement. By default, translations are stored in a separate object, keeping the source data intact.
Is there a limit on text length? Each field supports text up to 5,000 characters, which aligns with common translation service limits.
What formats can I export the results in? After processing, datasets can be exported in structured formats such as JSON, CSV, or XML for easy reuse.
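The 5,000-character limit mentioned above can be checked before any translation call is made. The sketch below is illustrative: `findOversizedFields` is a hypothetical helper, and it measures length with JavaScript's `String.length`, which counts UTF-16 code units. Whether the real limit is counted the same way is not stated here.

```javascript
const MAX_FIELD_LENGTH = 5000; // per-field limit stated in the FAQ

// Return the selected top-level fields whose text exceeds the limit.
function findOversizedFields(item, pathsToFields, maxLength = MAX_FIELD_LENGTH) {
  return pathsToFields.filter((field) => {
    const value = item[field];
    return typeof value === "string" && value.length > maxLength;
  });
}
```

A caller might skip, truncate, or report the returned fields; which of these the project does is not documented in this README.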
Primary Metric: Processes hundreds of dataset items per minute, depending on field size and language pair.
Reliability Metric: Maintains a high success rate when input constraints are respected and fields stay within size limits.
Efficiency Metric: Optimized iteration minimizes repeated processing and unnecessary translation calls.
Quality Metric: Produces consistent, complete translations with clear field-to-field alignment across the dataset.
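One common way to avoid unnecessary translation calls is to cache results for repeated text. The README does not say how the project achieves this, so the following is only a sketch of that general technique; `withCache` is a hypothetical wrapper around any async translate function.

```javascript
// Wrap an async translate(text, sourceLanguage, targetLanguage) function so that
// identical requests are only sent once.
function withCache(translate) {
  const cache = new Map();
  return async function cachedTranslate(text, sourceLanguage, targetLanguage) {
    const key = JSON.stringify([sourceLanguage, targetLanguage, text]);
    if (!cache.has(key)) {
      // Cache the promise so concurrent calls for the same text share one request.
      // Failed requests are evicted so they can be retried later.
      const pending = Promise.resolve(translate(text, sourceLanguage, targetLanguage)).catch(
        (err) => {
          cache.delete(key);
          throw err;
        }
      );
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}
```

Datasets with many repeated values, such as category labels, benefit most from this. For very large datasets the cache would need a size bound, which is omitted here.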
