All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Ability to import a video as frames with the `video_frames` format and to split a video into frames with the `datum util split_video` command (open-edge-platform#555)
- `--subset` parameter in the `image_dir` format (open-edge-platform#555)
- `MediaManager` API to control loaded media resources at runtime (open-edge-platform#555)
- Command to detect the format of a dataset (open-edge-platform#576)
- More comfortable access to library API via `import datumaro` (open-edge-platform#630)
- CLI command-like free functions (`export`, `transform`, ...) (open-edge-platform#630)
- Reading specific annotation files for train dataset in Cityscapes (open-edge-platform#632)
- Random sampling transforms (`random_sampler`, `label_random_sampler`) to create smaller datasets from bigger ones (open-edge-platform#636, open-edge-platform#640)
- Support for downloading the ImageNetV2 and COCO datasets (open-edge-platform#653, open-edge-platform#659)
- A way for formats to signal that they don't support detection (open-edge-platform#665)
- Removal transforms to remove items/annotations/attributes from dataset (`remove_items`, `remove_annotations`, `remove_attributes`) (open-edge-platform#670)
- Allowed direct file paths in `datum import`. Such sources are imported as if the `rpath` parameter were specified; however, only the selected path is copied into the project (open-edge-platform#555)
- Improved `stats` performance, added new filtering parameters, image stats (`unique`, `repeated`) moved to the `dataset` section, removed `mean` and `std` from the `dataset` section (open-edge-platform#621)
- Allowed `Image` creation from just `size` info (open-edge-platform#634)
- Added image search in VOC XML-based subformats (open-edge-platform#634)
- Added image path equality checks in simple merge, when applicable (open-edge-platform#634)
- Supported saving box attributes when downloading the TFDS version of VOC (open-edge-platform#668)
- TBD
- Official support of Python 3.6 (due to its EOL) (open-edge-platform#617)
- Backward compatibility annotation symbols in `components.extractor` (open-edge-platform#630)
- Prohibited calling `add`, `import` and `export` commands without a project (open-edge-platform#555)
- Calling `make_dataset` on empty project tree now produces the error properly (open-edge-platform#555)
- Saving (overwriting) a dataset in a project when `rpath` is used (open-edge-platform#613)
- Output image extension preserving in the `Resize` transform (open-edge-platform#606)
- Memory overuse in the `Resize` transform (open-edge-platform#607)
- Invalid image pixels produced by the `Resize` transform (open-edge-platform#618)
- Numeric warnings that sometimes occurred in `stats` command (e.g. open-edge-platform#607) (open-edge-platform#621)
- Added missing item attribute merging in simple merge (open-edge-platform#634)
- Inability to disambiguate VOC from LabelMe in some cases (open-edge-platform#658)
- TBD
- Command to download public datasets (open-edge-platform#582)
- Extension autodetection in `ByteImage` (open-edge-platform#595)
- MPII Human Pose Dataset (import-only) (.mat and .json) (open-edge-platform#584)
- MARS format (import-only) (open-edge-platform#585)
- The `pycocotools` dependency lower bound is raised to `2.0.4`. (open-edge-platform#449)
- `smooth_line` from `datumaro.util.annotation_util` - the function is renamed to `approximate_line` and has updated interface (open-edge-platform#592)
- Python 3.6 support
- TBD
- Fails in multimerge when lines are not approximated and when there are no label categories (open-edge-platform#592)
- Cannot convert LabelMe dataset, that has no subsets (open-edge-platform#600)
- TBD
- Video reading API (open-edge-platform#521)
- Python API documentation (open-edge-platform#526)
- Mapillary Vistas dataset format (Import-only) (open-edge-platform#537)
- Datumaro can now be installed on Windows on Python 3.9 (open-edge-platform#547)
- Import for SYNTHIA dataset format (open-edge-platform#532)
- Support of `score` attribute in KITTI detection (open-edge-platform#571)
- Support for Accuracy Checker dataset meta files in formats (open-edge-platform#553, open-edge-platform#569, open-edge-platform#575)
- Import for VoTT dataset format (open-edge-platform#573)
- Image resizing transform (open-edge-platform#581)
- The following formats can now be detected unambiguously: `ade20k2017`, `ade20k2020`, `camvid`, `coco`, `cvat`, `datumaro`, `icdar_text_localization`, `icdar_text_segmentation`, `icdar_word_recognition`, `imagenet_txt`, `kitti_raw`, `label_me`, `lfw`, `mot_seq`, `open_images`, `vgg_face2`, `voc`, `widerface`, `yolo` (open-edge-platform#531, open-edge-platform#536, open-edge-platform#550, open-edge-platform#557, open-edge-platform#558)
- Allowed Pytest-native tests (open-edge-platform#563)
- Allowed export options in the `datum merge` command (open-edge-platform#545)
- Using `Image`, `ByteImage` from `datumaro.util.image` - these classes are moved to `datumaro.components.media` (open-edge-platform#538)
- Equality comparison support between `datumaro.components.media.Image` and `numpy.ndarray` (open-edge-platform#568)
- Bug #560: import issue with MOT dataset when using seqinfo.ini file (open-edge-platform#564)
- Empty lines in VOC subset lists are not ignored (open-edge-platform#587)
- TBD
- Import for CelebA dataset format. (open-edge-platform#484)
- File `people.txt` became optional in LFW (open-edge-platform#509)
- File `image_ids_and_rotation.csv` became optional in Open Images (open-edge-platform#509)
- Allowed underscores (`_`) in subset names in COCO (open-edge-platform#509)
- Allowed annotation files with arbitrary names in COCO (open-edge-platform#509)
- The `icdar_text_localization` format is no longer detected in every directory (open-edge-platform#531)
- Updated `pycocotools` version to 2.0.2 (open-edge-platform#534)
- TBD
- TBD
- Unhandled exception when a file is specified as the source for a COCO or MOTS dataset (open-edge-platform#530)
- Exporting dataset without `color` attribute into the `icdar_text_segmentation` format (open-edge-platform#556)
- TBD
- A new installation target: `pip install datumaro[default]`, which should be used by default. The simple `datumaro` is intended for library users. (open-edge-platform#238)
- Dataset and project versioning capabilities (Git-like) (open-edge-platform#238)
- "dataset revpath" concept in CLI, allowing to pass a dataset path with the dataset format in `diff`, `merge`, `explain` and `info` CLI commands (open-edge-platform#238)
- `import`, `remove`, `commit`, `checkout`, `log`, `status`, `info` CLI commands (open-edge-platform#238)
- `Coco*Extractor` classes now have an option to preserve label IDs from the original annotation file (open-edge-platform#453)
- `patch` CLI command to patch datasets (open-edge-platform#401)
- `ProjectLabels` transform to change dataset labels for merging etc. (open-edge-platform#401, open-edge-platform#478)
- Support for custom labels in the KITTI detection format (open-edge-platform#481)
- Type annotations and docs for Annotation classes (open-edge-platform#493)
- Options to control label loading behavior in `imagenet_txt` import (open-edge-platform#434, open-edge-platform#489)
- A project can contain and manage multiple datasets instead of a single one. CLI operations can be applied to the whole project, or to separate datasets. Datasets are modified inplace, by default (open-edge-platform#328)
- CLI help for builtin plugins doesn't require project (open-edge-platform#328)
- Annotation-related classes were moved into a new module, `datumaro.components.annotation` (open-edge-platform#439)
- Rollback utilities replaced with Scope utilities (open-edge-platform#444)
- The `Project` class from `datumaro.components` is changed completely (open-edge-platform#238)
- `diff` and `ediff` are joined into a single `diff` CLI command (open-edge-platform#238)
- Projects use new file layout, incompatible with old projects. An old project can be updated with `datum project migrate` (open-edge-platform#238)
- Inheriting `CliPlugin` is not required in plugin classes (open-edge-platform#238)
- `Importer`s do not create `Project`s anymore and just return a list of extractor configurations (open-edge-platform#238)
- TBD
- `import`, `project merge` CLI commands (open-edge-platform#238)
- Support for project hierarchies. A project cannot be a source anymore (open-edge-platform#238)
- Project cannot have independent internal dataset anymore. All the project data must be stored in the project data sources (open-edge-platform#238)
- `datumaro_project` format (open-edge-platform#238)
- Unused `path` field of `DatasetItem` (open-edge-platform#455)
- Deprecation warning in `open_images_format.py` (open-edge-platform#440)
- `lazy_image` returning unrelated data sometimes (open-edge-platform#409)
- Invalid call to `pycocotools.mask.iou` (open-edge-platform#450)
- Importing of Open Images datasets without image data (open-edge-platform#463)
- Return value type in `Dataset.is_modified` (open-edge-platform#401)
- Remapping of secondary categories in `RemapLabels` (open-edge-platform#401)
- VOC dataset patching for classification and segmentation tasks (open-edge-platform#478)
- Exported mask label ids in KITTI segmentation (open-edge-platform#481)
- Missing `label` for `Points` read in the LFW format (open-edge-platform#494)
- TBD
- The Open Images format now supports bounding box and segmentation mask annotations (open-edge-platform#352, open-edge-platform#388).
- Bounding boxes values decrement transform (open-edge-platform#366)
- Improved error reporting in `Dataset` (open-edge-platform#386)
- Support ADE20K format (import only) (open-edge-platform#400)
- Documentation website at https://openvinotoolkit.github.io/datumaro (open-edge-platform#420)
- Datumaro no longer depends on scikit-image (open-edge-platform#379)
- `Dataset` remembers export options on saving / exporting for the first time (open-edge-platform#386)
- TBD
- TBD
- Application of `remap_labels` to dataset categories of different length (open-edge-platform#314)
- Patching of datasets in formats (open-edge-platform#348)
- Improved Cityscapes export performance (open-edge-platform#367)
- Incorrect format of `*_labelIds.png` in Cityscapes export (open-edge-platform#325, open-edge-platform#342)
- Item id in ImageNet format (open-edge-platform#371)
- Double quotes for ICDAR Word Recognition (open-edge-platform#375)
- Wrong display of builtin formats in CLI (open-edge-platform#332)
- Non utf-8 encoding of annotation files in Market-1501 export (open-edge-platform#392)
- Import of ICDAR, PASCAL VOC and VGGFace2 images from subdirectories on Windows (open-edge-platform#392)
- Saving of images with Unicode paths on Windows (open-edge-platform#392)
- Calling `ProjectDataset.transform()` with a string argument (open-edge-platform#402)
- Attributes casting for CVAT format (open-edge-platform#403)
- Loading of custom project plugins (open-edge-platform#404)
- Reading, writing anno file and saving name of the subset for test subset (open-edge-platform#447)
- Fixed unsafe unpickling in CIFAR import (open-edge-platform#362)
- Support for import/export zip archives with images (open-edge-platform#273)
- Subformat importers for VOC and COCO (open-edge-platform#281)
- Support for KITTI dataset segmentation and detection format (open-edge-platform#282)
- Updated YOLO format user manual (open-edge-platform#295)
- `ItemTransform` class, which describes item-wise dataset `Transform`s (open-edge-platform#297)
- `keep-empty` export parameter in VOC format (open-edge-platform#297)
- A base class for dataset validation plugins (open-edge-platform#299)
- Partial support for the Open Images format; only images and image-level labels can be read/written (open-edge-platform#291, open-edge-platform#315).
- Support for Supervisely Point Cloud dataset format (open-edge-platform#245, open-edge-platform#353)
- Support for KITTI Raw / Velodyne Points dataset format (open-edge-platform#245)
- Support for CIFAR-100 and documentation for CIFAR-10/100 (open-edge-platform#301)
- Tensorflow AVX check is made optional in API and disabled by default (open-edge-platform#305)
- Extensions for images in ImageNet_txt are now mandatory (open-edge-platform#302)
- Several dependencies now have lower bounds (open-edge-platform#308)
- TBD
- TBD
- Incorrect image layout on saving and a problem with encoding on loading (open-edge-platform#284)
- An error when XPath filter is applied to the dataset or its subset (open-edge-platform#259)
- Tracking of `Dataset` changes done by transforms (open-edge-platform#297)
- Improved CLI startup time in several cases (open-edge-platform#306)
- Known issue: loading CIFAR can result in arbitrary code execution (open-edge-platform#327)
- Support for escaping in attribute values in LabelMe format (open-edge-platform#49)
- Support for Segmentation Splitting (open-edge-platform#223)
- Support for CIFAR-10/100 dataset format (open-edge-platform#225, open-edge-platform#243)
- Support for COCO panoptic and stuff format (open-edge-platform#210)
- Documentation file and integration tests for Pascal VOC format (open-edge-platform#228)
- Support for MNIST and MNIST in CSV dataset formats (open-edge-platform#234)
- Documentation file for COCO format (open-edge-platform#241)
- Documentation file and integration tests for YOLO format (open-edge-platform#246)
- Support for Cityscapes dataset format (open-edge-platform#249)
- Support for Validator configurable threshold (open-edge-platform#250)
- LabelMe format saves dataset items with their relative paths by subsets without changing names (open-edge-platform#200)
- Allowed arbitrary subset count and names in classification and detection splitters (open-edge-platform#207)
- Annotation-less dataset elements now participate in subset splitting (open-edge-platform#211)
- Classification task in LFW dataset format (open-edge-platform#222)
- Testing is now performed with pytest instead of unittest (open-edge-platform#248)
- TBD
- TBD
- Added support for auto-merging (joining) of datasets with no labels and having labels (open-edge-platform#200)
- Allowed explicit label removal in `remap_labels` transform (open-edge-platform#203)
- Image extension in CVAT format export (open-edge-platform#214)
- Added a label "face" for bounding boxes in Wider Face (open-edge-platform#215)
- Allowed adding "difficult", "truncated", "occluded" attributes when converting to Pascal VOC if these attributes are not present (open-edge-platform#216)
- Empty lines in YOLO annotations are ignored (open-edge-platform#221)
- Export in VOC format when no image info is available (open-edge-platform#239)
- Fixed saving attribute in WiderFace extractor (open-edge-platform#251)
- TBD
- TBD
- Added an option to allow undeclared annotation attributes in CVAT format export (open-edge-platform#192)
- COCO exports images in separate dirs by subsets. Added an option to control this (open-edge-platform#195)
- TBD
- TBD
- Instance masks of `background` class no longer introduce an instance (open-edge-platform#188)
- Added support for label attributes in Datumaro format (open-edge-platform#192)
- TBD
- OpenVINO plugin examples (open-edge-platform#159)
- Dataset validation for classification and detection datasets (open-edge-platform#160)
- Arbitrary image extensions in formats (import and export) (open-edge-platform#166)
- Ability to set a custom subset name for an imported dataset (open-edge-platform#166)
- CLI support for NDR (open-edge-platform#178)
- Common ICDAR format is split into 3 sub-formats (open-edge-platform#174)
- TBD
- TBD
- The ability to work with file names containing Cyrillic and spaces (open-edge-platform#148)
- Image reading and saving in ICDAR formats (open-edge-platform#174)
- Unnecessary image loading on dataset saving (open-edge-platform#176)
- Allowed spaces in ICDAR captions (open-edge-platform#182)
- Saving of masks in VOC when masks are not requested (open-edge-platform#184)
- TBD
- TBD
- TBD
- TBD
- TBD
- Images with no annotations are exported again in VOC formats (open-edge-platform#123)
- Inference result for only one output layer in OpenVINO launcher (open-edge-platform#125)
- TBD
- `Icdar13/15` dataset format (open-edge-platform#96)
- Laziness, source caching, tracking of changes and partial updating for `Dataset` (open-edge-platform#102)
- `Market-1501` dataset format (open-edge-platform#108)
- `LFW` dataset format (open-edge-platform#110)
- Support of polygons' and masks' confusion matrices and mismatching classes in `diff` command (open-edge-platform#117)
- Add near duplicate image removal plugin (open-edge-platform#113)
- Sampler Plugin that analyzes inference result from the given dataset and selects samples for annotation (open-edge-platform#115)
- OpenVINO model launcher is updated for OpenVINO r2021.1 (open-edge-platform#100)
- TBD
- TBD
- High memory consumption and low performance of mask import/export, #53 (open-edge-platform#101)
- Masks, covered by class 0 (background), should be exported with holes inside (open-edge-platform#104)
- `diff` command invocation problem with missing class methods (open-edge-platform#117)
- TBD
- `WiderFace` dataset format (open-edge-platform#65, open-edge-platform#90)
- Function to transform annotations to labels (open-edge-platform#66)
- Dataset splits for classification, detection and re-id tasks (open-edge-platform#68, open-edge-platform#81)
- `VGGFace2` dataset format (open-edge-platform#69, open-edge-platform#82)
- Unique image count statistic (open-edge-platform#87)
- Installation with pip by name `datumaro`
- `Dataset` class extended with new operations: `save`, `load`, `export`, `import_from`, `detect`, `run_model` (open-edge-platform#71)
- Allowed importing `Extractor`-only defined formats (in `Project.import_from`, `dataset.import_from` and CLI/project import) (open-edge-platform#71)
- `datum project ...` commands replaced with `datum ...` commands (open-edge-platform#84)
- Supported more image formats in `ImageNet` extractors (open-edge-platform#85)
- Allowed adding `Importer`-defined formats as project sources (`source add`) (open-edge-platform#86)
- Added max search depth in `ImageDir` format and importers (open-edge-platform#86)
- `datum project ...` CLI context (open-edge-platform#84)
- TBD
- Allow plugins inherited from `Extractor` (instead of only `SourceExtractor`) (open-edge-platform#70)
- Windows installation with `pip` for `pycocotools` (open-edge-platform#73)
- `YOLO` extractor path matching on Windows (open-edge-platform#73)
- Fixed inplace file copying when saving images (open-edge-platform#76)
- Fixed `labelmap` parameter type checking in `VOC` converter (open-edge-platform#76)
- Fixed model copying on addition in CLI (open-edge-platform#94)
- TBD
- `CamVid` dataset format (open-edge-platform#57)
- Ability to install `opencv-python-headless` dependency with `DATUMARO_HEADLESS=1` environment variable instead of `opencv-python` (open-edge-platform#62)
- Allow empty supercategory in COCO (open-edge-platform#54)
- Allow Pascal VOC to search in subdirectories (open-edge-platform#50)
- TBD
- TBD
- TBD
- TBD
- `ImageNet` and `ImageNetTxt` dataset formats (open-edge-platform#41)
- TBD
- TBD
- TBD
- Default `label-map` parameter value for VOC converter (open-edge-platform#34)
- Randomness of random split transform (open-edge-platform#38)
- `Transform.subsets()` method (open-edge-platform#38)
- Supported unknown image formats in TF Detection API converter (open-edge-platform#40)
- Supported empty attribute values in CVAT extractor (open-edge-platform#45)
- TBD
- `ByteImage` class to represent encoded images in memory and avoid recoding on save (open-edge-platform#27)
- Implementation of format plugins simplified (open-edge-platform#22)
- `default` is now a default subset name, instead of `None`. The values are interchangeable. (open-edge-platform#22)
- Improved performance of transforms (open-edge-platform#22)
- TBD
- `image/depth` value from VOC export (open-edge-platform#27)
- Zero division errors in dataset statistics (open-edge-platform#31)
- TBD
- `reindex` option in COCO and CVAT converters (open-edge-platform#18)
- Support for relative paths in LabelMe format (open-edge-platform#19)
- MOTS png mask format support (https://github.com/openvinotoolkit/datumaro/21)
- TBD
- TBD
- TBD
- TBD
- TBD
- Initial release
## [Unreleased]
### Added
- TBD
### Changed
- TBD
### Deprecated
- TBD
### Removed
- TBD
### Fixed
- TBD
### Security
- TBD