18 changes: 16 additions & 2 deletions variant/README.md
@@ -45,8 +45,22 @@ Each example consists of 2 files:

## Regenerating these files

The files were generated by running the [`regen.py`](regen.py) script that uses Apache Spark to
generate the files.
The files in this directory were initially generated by running the [`regen.py`](regen.py)
script, which used Apache Spark. Some files have since been modified where necessary
to ensure that they conform to the Parquet spec.

### Modification 1: Created metadata for `primitive_null` as 3 bytes (`0x01 0x00 0x00`)

Per <https://github.com/apache/parquet-testing/issues/81>, Spark did not generate
any metadata for `null` and left `primitive_null.metadata` empty.
The metadata for `primitive_null` should be the same 3 bytes as for other primitive types:
* header = `0x01`
* dictionary_size = `0x00`
* offsets = `dictionary_size + 1 = 1` one-byte value: `0x00`

```shell
cp primitive_int8.metadata primitive_null.metadata
```
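
As a rough cross-check of the byte layout listed above, the following Python sketch builds the metadata for an empty dictionary. The helper name `empty_variant_metadata` is hypothetical and is not part of `regen.py`. The bit layout of the header (version in the low 4 bits, `offset_size - 1` in the top 2 bits) is an assumption taken from a reading of the [Variant] encoding spec and should be checked against it.

```python
def empty_variant_metadata(version: int = 1, offset_size: int = 1) -> bytes:
    """Build Variant metadata for an empty string dictionary (hypothetical helper)."""
    # Assumed header layout: bits 0-3 = version, bits 6-7 = offset_size - 1.
    header = (version & 0x0F) | ((offset_size - 1) << 6)
    dictionary_size = 0
    # dictionary_size is stored in offset_size bytes, little-endian.
    size_bytes = dictionary_size.to_bytes(offset_size, "little")
    # There are dictionary_size + 1 offsets; with an empty dictionary, one zero offset.
    offsets = (0).to_bytes(offset_size, "little") * (dictionary_size + 1)
    return bytes([header]) + size_bytes + offsets


expected = empty_variant_metadata()
print(expected.hex(" "))  # prints "01 00 00"
```

To verify a checkout, the result can be compared with the file contents, for example `pathlib.Path("primitive_null.metadata").read_bytes() == expected`.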

[Variant]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
[primitive types listed in the spec]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#value-data-for-primitive-type-basic_type0
Binary file modified variant/primitive_null.metadata
Binary file not shown.