Improve deserialization of JSON primitives into JsonElement #116419
PranavSenthilnathan wants to merge 5 commits into dotnet:main
Pull Request Overview
This PR improves deserialization of JSON primitives into JsonElement by caching immutable MetadataDb instances for common literal values, strings, and numbers. Key changes include updating Parse methods to use a ref Utf8JsonReader and introducing token-specific caching in MetadataDb, along with a minor adjustment in the metadata buffer sizing logic.
- Updated Parse logic to pass reader by reference and select caching based on token type.
- Introduced new MetadataDb creation methods (for literal, string, and number values) and a locked cache for small primitives.
- Adjusted the condition for enlarging the MetadataDb buffer.
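The token-based selection summarized in the list above can be sketched roughly as follows. This is an illustration only, not the PR's code: `PrimitiveCachePolicy`, `IsCacheable`, and the two constants are hypothetical names. The 8-byte and 16-byte limits are the ones quoted in the PR description; treating them as inclusive, and applying them to whatever payload length the caller passes in, are assumptions.

```csharp
using System.Text.Json;

// Hypothetical helper for illustration; it is not part of System.Text.Json.
public static class PrimitiveCachePolicy
{
    // Limits quoted in the PR description. Inclusive comparison is an assumption.
    public const int MaxCachedNumberLength = 8;
    public const int MaxCachedStringLength = 16;

    // Decides whether a primitive token could share a cached, read-only database.
    public static bool IsCacheable(JsonTokenType tokenType, int payloadLength) =>
        tokenType switch
        {
            // true, false, and null have a fixed shape, so they can always be shared.
            JsonTokenType.True or JsonTokenType.False or JsonTokenType.Null => true,
            JsonTokenType.Number => payloadLength <= MaxCachedNumberLength,
            JsonTokenType.String => payloadLength <= MaxCachedStringLength,
            // Objects, arrays, and other tokens are parsed into their own database.
            _ => false,
        };
}
```

Under these assumptions, `IsCacheable(JsonTokenType.Number, 5)` is true while `IsCacheable(JsonTokenType.String, 40)` is false, so long strings would still allocate their own database.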
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| JsonDocument.Parse.cs | Adjusted parsing logic to use ref Utf8JsonReader and to call new CreateLockedFor* methods based on token type. |
| JsonDocument.MetadataDb.cs | Introduced new caching methods for literals, strings, and numbers, and modified the buffer enlargement check. |
Comments suppressed due to low confidence (2)
src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.MetadataDb.cs:266
- Changing the condition from '>=' to '>' alters when the buffer is enlarged. Please verify that the new check correctly prevents buffer overflows when appending new rows.
if (Length > _data.Length - DbRow.Size)
src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.Parse.cs:771
- Subtracting 2 from the payload length assumes that the JSON string always includes both starting and ending quotes. Please confirm that this logic safely handles all edge cases.
MetadataDb database = MetadataDb.CreateLockedForString(utf8Json.Length - 2, reader.ValueIsEscaped);
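Both suppressed comments come down to boundary arithmetic, which a small standalone sketch can make concrete. `ReviewChecks`, `NeedsEnlarge`, and `Measure` are hypothetical names introduced here; `rowSize` stands in for `DbRow.Size`, and `Measure` assumes the input is exactly one JSON string token with no surrounding whitespace. Only `Utf8JsonReader`, `ValueSpan`, and `ValueIsEscaped` are real System.Text.Json APIs.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helpers for illustration; they are not part of the PR.
public static class ReviewChecks
{
    // The PR's form of the check: enlarge only when one more row would not fit.
    // When length == capacity - rowSize, the row fits exactly and no growth is needed.
    public static bool NeedsEnlarge(int length, int capacity, int rowSize) =>
        length > capacity - rowSize;

    // Compares "raw token length - 2" with the reader's own view of the string content.
    public static (int RawMinusQuotes, int ValueSpanLength, bool Escaped) Measure(string json)
    {
        byte[] utf8Json = Encoding.UTF8.GetBytes(json);
        var reader = new Utf8JsonReader(utf8Json);
        reader.Read();
        if (reader.TokenType != JsonTokenType.String)
        {
            throw new ArgumentException("Expected a single JSON string token.", nameof(json));
        }

        // For string tokens, ValueSpan excludes the quotes but keeps escape sequences as written.
        return (utf8Json.Length - 2, reader.ValueSpan.Length, reader.ValueIsEscaped);
    }
}
```

With an illustrative capacity of 36 bytes and a row size of 12, `NeedsEnlarge(24, 36, 12)` is false because the third row fits exactly, whereas a `>=` comparison would have grown the buffer one row early. For string tokens, including the empty string `""` and escaped values, the two lengths returned by `Measure` agree, which is the property the `Length - 2` expression relies on.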
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
When creating a JsonElement, there is extra overhead from creating and storing the MetadataDb in addition to the required UTF-8 payload. We can reduce this overhead by caching read-only databases for primitives of small length. This PR only affects deserialization of JsonElement when it is part of a larger deserialization, such as extension data and dictionaries (if the value is object, JsonElement, or JsonNode). This should cover most places where a JsonElement of a primitive is created, but nothing prevents us from extending it to top-level JsonElement deserialization as well.
Caching is based on the length in bytes of the UTF-8 JSON payload. The thresholds were chosen arbitrarily: 8 bytes for numbers and 16 bytes for strings.
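For readers unfamiliar with the affected paths, the following sketch shows two places where the serializer produces a primitive JsonElement as part of a larger deserialization: extension data and a dictionary with object values. The `Payload` and `AffectedPaths` types are made up for this example; the serializer calls are standard System.Text.Json APIs with default options.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical model type for illustration.
public class Payload
{
    public int Id { get; set; }

    // Properties with no matching member are collected here as JsonElement values.
    [JsonExtensionData]
    public Dictionary<string, JsonElement> Extra { get; set; } = new();
}

public static class AffectedPaths
{
    // "flag" has no matching property, so it lands in Extra as a primitive JsonElement.
    public static JsonValueKind ExtensionDataKind()
    {
        Payload payload = JsonSerializer.Deserialize<Payload>("{\"Id\":1,\"flag\":true}")!;
        return payload.Extra["flag"].ValueKind;
    }

    // With default options, values typed as object are materialized as JsonElement.
    public static JsonValueKind ObjectValueKind()
    {
        var values = JsonSerializer.Deserialize<Dictionary<string, object>>("{\"n\":42}")!;
        return ((JsonElement)values["n"]).ValueKind;
    }
}
```

Each of these small values is a JsonElement backed by its own document, which is the per-element overhead the caching aims to reduce.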
The perf results show up to ~20% improvement in some cases.
Benchmarks
Benchmarking code