-
Notifications
You must be signed in to change notification settings - Fork 83
Description
When a schema is loaded in MetadataSchema it is passed through codec.modify_schema. This is a noop except for in StructCodec where the requirement that properties that do not have a default value are required is enforced by making a required list if there is none. (This needs to be enforced for struct as we can't encode a missing property. )
In 0.6.0 metadata.schema returns a deep copy of the unmodified schema, this was changed in 0.6.1 to a deepcopy of the modified schema as otherwise the numpy_dtype function would get the wrong schema. (It accesses the internal ._schema to not get a copy). This did not feel like a breaking change as the modified schema should behave identically to the original. Diff is here: 0.6.0...0.6.1#diff-4e0d1d371651402389605bb0c53f9d9c60726d255c5929d4084c13e29f50eecfR804
Here is how this is effecting @molpopgen:
He makes a schema:
MutationMetadata = tskit.metadata.MetadataSchema(
{
"codec": "struct",
"type": ["object", "null"],
"name": "Mutation metadata",
"properties": {
"s": {"type": "number", "binaryFormat": "d"},
"h": {"type": "number", "binaryFormat": "d"},
"origin": {"type": "number", "binaryFormat": "i"},
"neutral": {"type": "number", "binaryFormat": "?"},
"label": {"type": "number", "binaryFormat": "H"},
"key": {"type": "number", "binaryFormat": "Q"},
},
"additionalProperties": False,
}
)He then modifies the schema via:
_MutationMetaWithVectorsDict = copy.deepcopy(MutationMetadata.schema)
_MutationMetaWithVectorsDict["name"] = "Mutation metadata with vectors"
_MutationMetaWithVectorsDict["properties"]["esizes"] = {
"type": "array",
"items": {"type": "number", "binaryFormat": "d"},
}
_MutationMetaWithVectorsDict["properties"]["heffects"] = {
"type": "array",
"items": {"type": "number", "binaryFormat": "d"},
}
MutationMetadataWithVectors = tskit.metadata.MetadataSchema(
_MutationMetaWithVectorsDict
)(Note that the deepcopy is redundant.) When the original schema was generated, it had the required array added to it by modify_schema:
{'additionalProperties': False,
'codec': 'struct',
'name': 'Mutation metadata with vectors',
'properties': OrderedDict([('h', {'binaryFormat': 'd', 'type': 'number'}),
('key', {'binaryFormat': 'Q', 'type': 'number'}),
('label', {'binaryFormat': 'H', 'type': 'number'}),
('neutral',
{'binaryFormat': '?', 'type': 'number'}),
('origin', {'binaryFormat': 'i', 'type': 'number'}),
('s', {'binaryFormat': 'd', 'type': 'number'}),
('esizes',
{'items': {'binaryFormat': 'd', 'type': 'number'},
'type': 'array'}),
('heffects',
{'items': {'binaryFormat': 'd', 'type': 'number'},
'type': 'array'})]),
'required': ['s', 'h', 'origin', 'neutral', 'label', 'key'],
'type': ['object', 'null']}
Thus when the new heffects and esizes are added they are missing from the required list, modify_schema doesn't add a required list as it already sees one and the schema fails validation with tskit.exceptions.MetadataSchemaValidationError: Optional property 'esizes' must have a default value.
Kevin's solution of
MutationMetadataWithVectors = tskit.metadata.MetadataSchema(
{
"codec": "struct",
"type": ["object", "null"],
"name": "Mutation metadata with vectors",
"properties": copy.deepcopy(MutationMetadata.schema["properties"])
| {
"esizes": {
"type": "array",
"items": {"type": "number", "binaryFormat": "d"},
},
"heffects": {
"type": "array",
"items": {"type": "number", "binaryFormat": "d"},
},
},
"additionalProperties": False,
}
)works because the schema now has no required list and modify_schema inserts one.
The metadata tests are pretty comprehensive, but "make a struct schema, then get it back out again and add a property without a default value" was not in the tested behaviours. Apologies!
One option to return to the original behavior is to have MetadataSchema save the original schema and return that when MetadataSchema.schema is called. I think that is probably the desired behaviour here.