Commit bce24e6

doc: book sections on metadata
1 parent 244d3f6 commit bce24e6

6 files changed: +161 -0 lines changed

book/src/SUMMARY.md

Lines changed: 6 additions & 0 deletions
@@ -15,6 +15,12 @@
   - [Working with trees](./tree_sequence_tree.md)
   - [Miscellaneous operations](./tree_sequence_miscellaneous.md)
 
+* [Metadata](./metadata.md)
+  - [Defining metadata types in Rust](./metadata_derive.md)
+  - [Metadata and tables](./metadata_tables.md)
+  - [Metadata schema](./metadata_schema.md)
+
+
 [Crate prelude](./prelude.md)
 [Changelog](./changelog.md)
 [Migration Guide](./migration_guide.md)

book/src/metadata.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+# Metadata <img align="right" width="73" height="45" src="https://raw.githubusercontent.com/tskit-dev/administrative/main/logos/svg/tskit-rust/Tskit_rust_logo.eps.svg">

book/src/metadata_derive.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# Defining metadata types in Rust
+
+With the `tskit` cargo feature `derive` enabled, we can use procedural macros to define metadata types.
+Here, we define a metadata type for a mutation table:
+
+```rust, noplayground, ignore
+{{#include ../../tests/book_metadata.rs:metadata_derive}}
+```
+
+You must also specify the `serde` derive macros manually because the metadata API
+itself does not depend on `serde`.
+Rather, it expects raw bytes, and `serde` happens to be a good way to get them from your data types.
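To make "raw bytes" concrete, here is a minimal, dependency-free sketch of a manual round trip for the same two fields. The helpers `encode` and `decode` are hypothetical names, not part of the `tskit` API, and the fixed 16-byte little-endian layout is an assumption chosen only for illustration.

```rust
use std::convert::TryInto;

/// Same fields as the derived example, but without any derive macros.
#[derive(Debug, PartialEq)]
struct MutationMetadata {
    effect_size: f64,
    dominance: f64,
}

/// Hypothetical helper: pack both fields as little-endian f64 (16 bytes).
fn encode(md: &MutationMetadata) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(16);
    bytes.extend_from_slice(&md.effect_size.to_le_bytes());
    bytes.extend_from_slice(&md.dominance.to_le_bytes());
    bytes
}

/// Hypothetical helper: inverse of `encode`; rejects buffers of the wrong size.
fn decode(bytes: &[u8]) -> Result<MutationMetadata, String> {
    if bytes.len() != 16 {
        return Err(format!("expected 16 bytes, got {}", bytes.len()));
    }
    // The length check above guarantees both 8-byte slices convert cleanly.
    let effect_size = f64::from_le_bytes(bytes[0..8].try_into().unwrap());
    let dominance = f64::from_le_bytes(bytes[8..16].try_into().unwrap());
    Ok(MutationMetadata {
        effect_size: effect_size,
        dominance: dominance,
    })
}

fn main() {
    let md = MutationMetadata {
        effect_size: 1e-3,
        dominance: 1.0,
    };
    let bytes = encode(&md);
    assert_eq!(bytes.len(), 16);
    assert_eq!(decode(&bytes).unwrap(), md);
}
```

Hand-written packing like this gets tedious and error-prone as types grow, which is one reason a `serde` serializer is a convenient way to produce the bytes.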

book/src/metadata_schema.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Metadata schema
+
+For useful data interchange with `tskit-python`, we need to define a [metadata schema](https://tskit.dev/tskit/docs/stable/metadata.html).
+
+Several issues are currently slowing down the development of a Rust API for schemas:
+
+* It is not clear which `serde` formats are compatible with metadata on the Python side.
+* Experiments have shown that `serde_json` works with `tskit-python`.
+* Ideally, we would also like a binary format compatible with the Python `struct`
+  module.
+* However, we have not found a solution that eliminates the need to manually write the
+  schema as a string and add it to the tables.
+  Various crates that generate JSON schemas from Rust structs return schemas that are over-specified
+  and fail to validate in `tskit-python`.
+* We will also need to add some Python to our CI to prove to ourselves
+  that some reasonable tests can pass.
+
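As an illustration of "manually writing the schema as a string", the sketch below hand-writes a JSON-codec schema for a struct with two `f64` fields named `effect_size` and `dominance`. The constant name is hypothetical, and the schema has not been validated against `tskit-python` here; treat it as a starting point under the assumption that the `json` codec described in the linked documentation is used.

```rust
/// Hypothetical, hand-written schema for a metadata struct with two
/// floating-point fields. The "codec" key selects JSON encoding; the rest
/// is ordinary JSON Schema.
const MUTATION_METADATA_SCHEMA: &str = r#"{
    "codec": "json",
    "type": "object",
    "properties": {
        "effect_size": {"type": "number"},
        "dominance": {"type": "number"}
    },
    "required": ["effect_size", "dominance"]
}"#;

fn main() {
    // Attaching the string to a table is deliberately left out: that part
    // of the Rust API is what the list above describes as unresolved.
    println!("{}", MUTATION_METADATA_SCHEMA);
}
```

Keeping the schema next to the struct definition makes it easier to notice when the two drift apart, since nothing checks them against each other automatically.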

book/src/metadata_tables.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+# Metadata and tables
+
+Let us create a table and add a row with our mutation metadata:
+
+```rust, noplayground, ignore
+{{#include ../../tests/book_metadata.rs:add_mutation_table_row_with_metadata}}
+```
+
+Metadata is optional on a per-row basis:
+
+```rust, noplayground, ignore
+{{#include ../../tests/book_metadata.rs:add_mutation_table_row_without_metadata}}
+```
+
+We can confirm that we have one row with, and one without, metadata:
+
+```rust, noplayground, ignore
+{{#include ../../tests/book_metadata.rs:validate_metadata_row_contents}}
+```
+
+Fetching our metadata from the table requires specifying the metadata type.
+The result of a metadata retrieval is `Option<Result<T, TskitError>>`, where `T` is the requested metadata type.
+The `None` variant occurs if a row does not have metadata or if a row id does not exist.
+The error state occurs if decoding raw bytes into the metadata type fails.
+The details of the error variant are [here](https://docs.rs/tskit/latest/tskit/error/enum.TskitError.html#variant.MetadataError).
+The error type holds `Box<dyn Error>` because the API is very general.
+We assume nothing about the API used to encode/decode metadata.
+Therefore, the error could be anything.
+
+```rust, noplayground, ignore
+{{#include ../../tests/book_metadata.rs:metadata_retrieval}}
+```
+
+```rust, noplayground, ignore
+{{#include ../../tests/book_metadata.rs:metadata_retrieval_none}}
+```
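The three possible outcomes of a retrieval can be sketched without `tskit` at all. In the toy model below, `ToyMetadataColumn` and `DecodeMetadata` are hypothetical names invented for this illustration, and the boxed error is returned directly rather than wrapped in `TskitError`; the point is only to show why an optional, fallible result with a type-erased error fits a decoder-agnostic API.

```rust
use std::error::Error;

/// Hypothetical stand-in for a metadata column: each row holds optional raw bytes.
struct ToyMetadataColumn {
    rows: Vec<Option<Vec<u8>>>,
}

/// Hypothetical decoding trait. A decoder may fail in any way it likes,
/// hence the boxed, type-erased error.
trait DecodeMetadata: Sized {
    fn decode(bytes: &[u8]) -> Result<Self, Box<dyn Error>>;
}

impl DecodeMetadata for f64 {
    fn decode(bytes: &[u8]) -> Result<Self, Box<dyn Error>> {
        // Two unrelated error types are both converted into Box<dyn Error> by `?`.
        let text = std::str::from_utf8(bytes)?;
        Ok(text.trim().parse::<f64>()?)
    }
}

impl ToyMetadataColumn {
    /// `None`: the row does not exist or has no metadata.
    /// `Some(Err(_))`: bytes are present but decoding failed.
    /// `Some(Ok(_))`: bytes are present and decoded cleanly.
    fn metadata<T: DecodeMetadata>(&self, row: usize) -> Option<Result<T, Box<dyn Error>>> {
        let bytes = self.rows.get(row)?.as_ref()?;
        Some(T::decode(bytes))
    }
}

fn main() {
    let column = ToyMetadataColumn {
        rows: vec![Some(b"0.001".to_vec()), None, Some(b"oops".to_vec())],
    };
    assert_eq!(column.metadata::<f64>(0).unwrap().unwrap(), 0.001);
    assert!(column.metadata::<f64>(1).is_none()); // row exists, no metadata
    assert!(column.metadata::<f64>(2).unwrap().is_err()); // decoding failed
    assert!(column.metadata::<f64>(3).is_none()); // row does not exist
}
```

Callers that do not care why a value is missing can collapse the result with `and_then(|r| r.ok())`, at the cost of hiding decoding failures.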

tests/book_metadata.rs

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+#[cfg(feature = "derive")]
+#[test]
+fn book_mutation_metadata() {
+    // ANCHOR: metadata_derive
+    #[derive(serde::Serialize, serde::Deserialize, tskit::metadata::MutationMetadata)]
+    #[serializer("serde_json")]
+    struct MutationMetadata {
+        effect_size: f64,
+        dominance: f64,
+    }
+    // ANCHOR_END: metadata_derive
+
+    // ANCHOR: add_mutation_table_row_with_metadata
+    let mut tables = tskit::TableCollection::new(50.0).unwrap();
+
+    let md = MutationMetadata {
+        effect_size: 1e-3,
+        dominance: 1.0,
+    };
+
+    tables
+        .add_mutation_with_metadata(
+            0,    // site id
+            0,    // node id
+            -1,   // mutation parent id
+            0.0,  // time
+            None, // derived state is Option<&[u8]>
+            &md,  // metadata for this row
+        )
+        .unwrap();
+    // ANCHOR_END: add_mutation_table_row_with_metadata
+
+    // ANCHOR: add_mutation_table_row_without_metadata
+    tables
+        .add_mutation(
+            0,    // site id
+            0,    // node id
+            -1,   // mutation parent id
+            0.0,  // time
+            None, // derived state is Option<&[u8]>
+        )
+        .unwrap();
+    // ANCHOR_END: add_mutation_table_row_without_metadata
+
+    // ANCHOR: validate_metadata_row_contents
+    assert_eq!(
+        tables
+            .mutations_iter()
+            .filter(|m| m.metadata.is_some())
+            .count(),
+        1
+    );
+    assert_eq!(
+        tables
+            .mutations_iter()
+            .filter(|m| m.metadata.is_none())
+            .count(),
+        1
+    );
+    // ANCHOR_END: validate_metadata_row_contents
+
+    // ANCHOR: metadata_retrieval
+    let fetched_md = match tables.mutations().metadata::<MutationMetadata>(0.into()) {
+        Some(Ok(m)) => m,
+        Some(Err(e)) => panic!("metadata decoding failed: {:?}", e),
+        None => panic!("hmmm...row 0 should have been a valid row with metadata..."),
+    };
+
+    assert_eq!(md.effect_size, fetched_md.effect_size);
+    assert_eq!(md.dominance, fetched_md.dominance);
+    // ANCHOR_END: metadata_retrieval
+
+    // ANCHOR: metadata_retrieval_none
+    // There is no metadata at row 1, so
+    // you get None back
+    assert!(tables
+        .mutations()
+        .metadata::<MutationMetadata>(1.into())
+        .is_none());
+
+    // There is also no metadata at row 2,
+    // because that row does not exist, so
+    // you get None back
+    assert!(tables
+        .mutations()
+        .metadata::<MutationMetadata>(2.into())
+        .is_none());
+    // ANCHOR_END: metadata_retrieval_none
+}
