Commit 0bf5d43

Initial commit
Author: Maxim Zhiltsov
1 parent ff3f597, commit 0bf5d43

179 files changed, +20410 -0 lines changed

CONTRIBUTING.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Testing](#testing)
- [Design](#design-and-code-structure)

## Installation

### Prerequisites

- Python (3.5+)
- OpenVINO (optional)

``` bash
git clone https://github.com/opencv/cvat
```

Optionally, create a virtual environment:

``` bash
python -m pip install virtualenv
python -m virtualenv venv
. venv/bin/activate
```

Then install all dependencies:

``` bash
while read -r p; do pip install $p; done < requirements.txt
```

If you're working inside a CVAT environment:
``` bash
. .env/bin/activate
while read -r p; do pip install $p; done < datumaro/requirements.txt
```

## Usage

> The directory containing Datumaro should be in the `PYTHONPATH`
> environment variable or `cvat/datumaro/` should be the current directory.

``` bash
datum --help
python -m datumaro --help
python datumaro/ --help
python datum.py --help
```

``` python
import datumaro
```

## Testing

It is expected that all Datumaro functionality is covered and checked by
unit tests. Tests are located in the `tests/` directory.

To run the tests, use:

``` bash
python -m unittest discover -s tests
```

If you're working inside a CVAT environment, you can also use:

``` bash
python manage.py test datumaro/
```

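As an illustration of the kind of unit test expected in `tests/`, here is a minimal sketch based on the standard `unittest` module. The `merge_labels` helper and the test names are hypothetical stand-ins and are not part of Datumaro; a real test would import and exercise Datumaro code instead.

``` python
import unittest


def merge_labels(first, second):
    """Hypothetical helper: merge two label lists, keeping first-seen order."""
    merged = list(first)
    for label in second:
        if label not in merged:
            merged.append(label)
    return merged


class MergeLabelsTest(unittest.TestCase):
    def test_keeps_order_and_drops_duplicates(self):
        self.assertEqual(merge_labels(['cat', 'dog'], ['dog', 'bird']),
                         ['cat', 'dog', 'bird'])

    def test_empty_inputs(self):
        self.assertEqual(merge_labels([], []), [])
```

Saved under `tests/` with a name matching the default `test*.py` pattern (for example, a hypothetical `tests/test_merge_labels.py`), such a file would be picked up by `python -m unittest discover -s tests`.
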
## Design and code structure

- [Design document](docs/design.md)
- [Developer guide](docs/developer_guide.md)

LICENSE

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
MIT License

Copyright (C) 2019-2020 Intel Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom
the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.

README.md

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
# Dataset Management Framework (Datumaro)

A framework to build, transform, and analyze datasets.

<!--lint disable fenced-code-flag-->
```
CVAT annotations --                               ---> Annotation tool
                     \                          /
COCO-like dataset -----> Datumaro ---> dataset ------> Model training
                     /                          \
VOC-like dataset --                               ---> Publication etc.
```
<!--lint enable fenced-code-flag-->

## Contents

- [Documentation](#documentation)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Examples](#examples)
- [Contributing](#contributing)

## Documentation

- [User manual](docs/user_manual.md)
- [Design document](docs/design.md)
- [Contributing](CONTRIBUTING.md)

## Features

- Dataset format conversions:
  - COCO (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`*)
    - [Format specification](http://cocodataset.org/#format-data)
    - [Dataset example](tests/assets/coco_dataset)
    - `labels` are our extension - like `instances` with only `category_id`
  - PASCAL VOC (`classification`, `detection`, `segmentation` (class, instances), `action_classification`, `person_layout`)
    - [Format specification](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html)
    - [Dataset example](tests/assets/voc_dataset)
  - YOLO (`bboxes`)
    - [Format specification](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data)
    - [Dataset example](tests/assets/yolo_dataset)
  - TF Detection API (`bboxes`, `masks`)
    - Format specifications: [bboxes](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md), [masks](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/instance_segmentation.md)
    - [Dataset example](tests/assets/tf_detection_api_dataset)
  - MOT sequences
    - [Format specification](https://arxiv.org/pdf/1906.04567.pdf)
    - [Dataset example](tests/assets/mot_dataset)
  - CVAT
    - [Format specification](https://github.com/opencv/cvat/blob/develop/cvat/apps/documentation/xml_format.md)
    - [Dataset example](tests/assets/cvat_dataset)
  - LabelMe
    - [Format specification](http://labelme.csail.mit.edu/Release3.0)
    - [Dataset example](tests/assets/labelme_dataset)
- Dataset building operations:
  - Merging multiple datasets into one
  - Dataset filtering with custom conditions, for instance:
    - remove polygons of a certain class
    - remove images without a specific class
    - remove `occluded` annotations from images
    - keep only vertically-oriented images
    - remove small area bounding boxes from annotations
  - Annotation conversions, for instance:
    - polygons to instance masks and vice versa
    - apply a custom colormap for mask annotations
    - rename or remove dataset labels
- Dataset comparison
- Model integration:
  - Inference (OpenVINO and custom models)
  - Explainable AI ([RISE algorithm](https://arxiv.org/abs/1806.07421))

> Check the [design document](docs/design.md) for a full list of features

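The filtering conditions listed under Features can be pictured as predicates over dataset items and their annotations. The sketch below is plain Python and does not use Datumaro's API; the dictionary layout (`image_size`, `annotations`, `bbox`, `label`) and all function names are assumptions made only for illustration.

``` python
def bbox_area(bbox):
    """Area of an (x, y, width, height) box."""
    _, _, width, height = bbox
    return width * height


def is_vertical(item):
    """True when the image is taller than it is wide."""
    height, width = item['image_size']
    return height > width


def drop_small_boxes(item, min_area):
    """Return a copy of the item without boxes smaller than min_area."""
    kept = [a for a in item['annotations'] if bbox_area(a['bbox']) >= min_area]
    return dict(item, annotations=kept)


def has_label(item, label):
    """True when at least one annotation carries the given label."""
    return any(a['label'] == label for a in item['annotations'])


# Hypothetical in-memory dataset; image_size is (height, width).
dataset = [
    {'id': 'a', 'image_size': (640, 480), 'annotations': [
        {'label': 'cat', 'bbox': (0, 0, 100, 50)},
        {'label': 'dog', 'bbox': (5, 5, 3, 3)},
    ]},
    {'id': 'b', 'image_size': (480, 640), 'annotations': [
        {'label': 'dog', 'bbox': (10, 10, 40, 40)},
    ]},
]

# Keep only vertically-oriented images, then remove boxes under 100 px^2.
vertical = [drop_small_boxes(i, min_area=100) for i in dataset if is_vertical(i)]
# Keep only images that contain a specific class.
with_cats = [i for i in dataset if has_label(i, 'cat')]
```

In the command-line tool, conditions of this kind are passed through the `--filter` option, as in the Examples section.
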
## Installation

Optionally, create a virtual environment:

``` bash
python -m pip install virtualenv
python -m virtualenv venv
. venv/bin/activate
```

Install the Datumaro package:

``` bash
pip install 'git+https://github.com/opencv/cvat#egg=datumaro&subdirectory=datumaro'
```

## Usage

There are several options available:
- [A standalone command-line tool](#standalone-tool)
- [A Python module](#python-module)

### Standalone tool

<!--lint disable fenced-code-flag-->
```
       User
        |
        v
+------------------+
|       CVAT       |
+--------v---------+       +------------------+       +--------------+
| Datumaro module  | ----> | Datumaro project | <---> | Datumaro CLI | <--- User
+------------------+       +------------------+       +--------------+
```
<!--lint enable fenced-code-flag-->

``` bash
datum --help
python -m datumaro --help
```

### Python module

Datumaro can be used in custom scripts as a library in the following way:

``` python
from datumaro.components.project import Project # project-related things
import datumaro.components.extractor # annotations and high-level interfaces
# etc.
project = Project.load('directory')
```

## Examples

<!--lint disable list-item-indent-->
<!--lint disable list-item-bullet-indent-->

- Convert [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#data) to COCO, keeping only images that contain the `cat` class:
  ```bash
  # Download VOC dataset:
  # http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
  datum convert --input-format voc --input-path <path/to/voc> \
    --output-format coco --filter '/item[annotation/label="cat"]'
  ```

- Convert only non-occluded annotations from a CVAT-annotated project to TFrecord:
  ```bash
  # export Datumaro dataset in CVAT UI, extract somewhere, go to the project dir
  datum project extract --filter '/item/annotation[occluded="False"]' \
    --mode items+anno --output-dir not_occluded
  datum project export --project not_occluded \
    --format tf_detection_api -- --save-images
  ```

- Annotate COCO, extract image subset, re-annotate it in CVAT, update old dataset:
  ```bash
  # Download COCO dataset http://cocodataset.org/#download
  # Put images in coco/images/ and annotations in coco/annotations/
  datum project import --format coco --input-path <path/to/coco>
  datum project export --filter '/image[images_I_dont_like]' --format cvat \
    --output-dir reannotation
  # import dataset and images to CVAT, re-annotate
  # export Datumaro project, extract to 'reannotation-upd'
  datum project merge reannotation-upd
  datum project export --format coco
  ```

- Annotate instance polygons in CVAT, export as masks in COCO:
  ```bash
  datum convert --input-format cvat --input-path <path/to/cvat.xml> \
    --output-format coco -- --segmentation-mode masks
  ```

- Apply an OpenVINO detection model to some COCO-like dataset,
  then compare annotations with ground truth and visualize in TensorBoard:
  ```bash
  datum project import --format coco --input-path <path/to/coco>
  # create model results interpretation script
  datum model add mymodel openvino \
    --weights model.bin --description model.xml \
    --interpretation-script parse_results.py
  datum model run --model mymodel --output-dir mymodel_inference/
  datum project diff mymodel_inference/ --format tensorboard --output-dir diff
  ```

- Change colors in PASCAL VOC-like `.png` masks:
  ```bash
  datum project import --format voc --input-path <path/to/voc/dataset>

  # Create a color map file with desired colors:
  #
  # label : color_rgb : parts : actions
  # cat:0,0,255::
  # dog:255,0,0::
  #
  # Save as mycolormap.txt

  datum project export --format voc_segmentation -- --label-map mycolormap.txt
  # add "--apply-colormap=0" to save grayscale (indexed) masks
  # check "--help" option for more info
  # use "datum --loglevel debug" for extra conversion info
  ```

<!--lint enable list-item-bullet-indent-->
<!--lint enable list-item-indent-->

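To make the color map layout from the last example concrete, here is a small sketch that reads lines of the form `label : color_rgb : parts : actions`. It is not Datumaro's own parser: the function name, the returned dictionary shape, and the assumption that `parts` and `actions` are comma-separated lists are all illustrative.

``` python
def parse_label_map(text):
    """Parse 'label:r,g,b:parts:actions' lines into a dict keyed by label.

    Blank lines and lines starting with '#' are skipped. The comma-separated
    layout of parts and actions is an assumption, not a documented rule.
    """
    label_map = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith('#'):
            continue
        name, color, parts, actions = (f.strip() for f in line.split(':'))
        rgb = tuple(int(c) for c in color.split(',')) if color else None
        label_map[name] = {
            'color': rgb,
            'parts': [p for p in parts.split(',') if p],
            'actions': [a for a in actions.split(',') if a],
        }
    return label_map


example = """
# label : color_rgb : parts : actions
cat:0,0,255::
dog:255,0,0::
"""
colors = parse_label_map(example)
```

With the two lines from the example, `colors` maps `cat` to the color `(0, 0, 255)` and `dog` to `(255, 0, 0)`, each with empty `parts` and `actions` lists.
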
## Contributing

Feel free to [open an Issue](https://github.com/opencv/cvat/issues/new) if you
think something needs to be changed. You are welcome to participate in development;
instructions are available in our [developer manual](CONTRIBUTING.md).

datum.py

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
#!/usr/bin/env python
import sys

from datumaro.cli.__main__ import main


if __name__ == '__main__':
    sys.exit(main())

datumaro/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

datumaro/__main__.py

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@

# Copyright (C) 2019-2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

import sys

from datumaro.cli.__main__ import main


if __name__ == '__main__':
    sys.exit(main())

datumaro/cli/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@

# Copyright (C) 2019-2020 Intel Corporation
#
# SPDX-License-Identifier: MIT
