# EventPipe (File) Format

EventPipe is the name of the logging mechanism used by the .NET Core
runtime to log events in an OS-independent way. It is meant to serve roughly the same
niche as ETW does on Windows, but works equally well on Linux.

By convention, files in this format are called `*.netperf` files, and the format can be thought
of as the NetPerf file format. However, the format is more flexible than that.

The format was designed to take advantage of the facilities of the FastSerialization
library used by TraceEvent. However, the format can be understood on its own, and here
we describe everything you need to know to use it.

Fundamentally, the data can be thought of as a serialization of objects. We want the
format to be simple and extensible (it can tolerate multiple versions), and we want
to make it as easy as possible to be both backward compatible (new readers can read old data versions)
and forward compatible (old readers can read new data versions). We also want it to be efficient
and STREAMABLE (no need for seek; you can do most operations with just 'read').

## Assumptions of the Format

We assume the following:

* Primitive Types: The format assumes you can emit the primitive data types
  (byte, short, int, long). All values are little endian (least significant byte first).
* Strings: A string is emitted as an int BYTE count followed by its
  UTF8 encoding.
* StreamLabels: The format assumes you know the start of the stream (0) and
  keep track of your position. The format currently assumes this is
  a 32 bit number, which limits references using StreamLabels to 4GB.
  This may change, but changing it is a format change.
* Compression: The format does not try to be particularly smart about compression.
  Compression is VERY likely to be done best by compressing
  the stream as a whole, so it is not that important that we do 'smart' things
  like variable length integers. Instead, the format is tuned to
  make it easy for the memory to be used 'in place', and it assumes that compression
  will be done on the stream outside of serialization/deserialization.
* Alignment: By default the stream is only assumed to be byte aligned. However,
  as you will see, particular objects have a lot of flexibility in their encoding
  and may choose to align their data. This is valuable because it allows
  efficient 'in place' use of the data stream. However, it is more the exception
  than the rule.
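
The assumptions above are enough to write a tiny reader. The following Python sketch reads little-endian primitives and length-prefixed UTF8 strings. It reports the current StreamLabel as the byte offset from the start of the stream. The language and the `StreamReader` name are our own choices for illustration and are not part of TraceEvent.

```python
import io
import struct


class StreamReader:
    """Minimal little-endian reader that tracks its StreamLabel (byte offset)."""

    def __init__(self, data: bytes):
        self._stream = io.BytesIO(data)

    @property
    def label(self) -> int:
        # Position relative to the start of the stream (0); the format limits this to 32 bits.
        return self._stream.tell()

    def read_bytes(self, count: int) -> bytes:
        if count < 0:
            raise ValueError(f"negative length {count}")
        data = self._stream.read(count)
        if len(data) != count:
            raise EOFError(f"wanted {count} bytes, got {len(data)}")
        return data

    def read_byte(self) -> int:
        return self.read_bytes(1)[0]

    def read_int16(self) -> int:
        return struct.unpack("<h", self.read_bytes(2))[0]

    def read_int32(self) -> int:
        return struct.unpack("<i", self.read_bytes(4))[0]

    def read_int64(self) -> int:
        return struct.unpack("<q", self.read_bytes(8))[0]

    def read_string(self) -> str:
        # A string is a 4-byte BYTE count followed by that many UTF8 bytes.
        count = self.read_int32()
        return self.read_bytes(count).decode("utf-8")
```

For example, the 7 bytes `03 00 00 00 68 c3 a9` decode to the string "hé". The count is 3 because "é" takes two UTF8 bytes, which is why the prefix counts bytes rather than characters.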

## First Bytes: The Stream Header

The beginning of the format is always the stream header. This header's only purpose
is to quickly identify the format of this stream (file) as a whole, and to indicate
exactly which version of the basic Stream library should be used. It is exactly
one length-prefixed UTF8 string with the value "!FastSerialization.1", which declares
that the rest of the file uses the FastSerialization version 1 conventions.

Thus the first 24 bytes of the file will be:

* 4 bytes: the little endian number 20 (the number of bytes in "!FastSerialization.1")
* 20 bytes: the UTF8 encoding of "!FastSerialization.1"
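
As a quick check of the arithmetic above, this sketch builds the header and validates it on read. `encode_stream_header` and `check_stream_header` are hypothetical helper names.

```python
import struct

MAGIC = "!FastSerialization.1"


def encode_stream_header() -> bytes:
    # 4-byte little-endian byte count, then the UTF8 bytes of the magic string.
    body = MAGIC.encode("utf-8")
    return struct.pack("<i", len(body)) + body


def check_stream_header(data: bytes) -> int:
    """Validate the stream header and return the offset of the first object."""
    if len(data) < 4:
        raise ValueError("stream too short for a header")
    (length,) = struct.unpack_from("<i", data, 0)
    magic = data[4:4 + length].decode("utf-8", errors="replace")
    if magic != MAGIC:
        raise ValueError(f"not a FastSerialization.1 stream: {magic!r}")
    return 4 + length
```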

After the header comes a list of objects.

## Objects

The format has the concept of an object. Indeed, the stream can be thought of as
simply the serialization of a list of objects.

Tags: The format uses a number of byte-sized tags in the serialization of objects
and in references to them. In particular, there are BeginObject and EndObject, which
are used to define a new object, as well as a few others (discussed below) that
allow you to refer to objects.
There are only a handful of them; see the Tags enum for a complete list.

Object Types: Every object has a type. At a minimum, a type represents:

1. The name of the type, which allows the serializer and deserializer to agree on what
   is being transmitted.
2. The version number for the data being sent.
3. A minimum reader version number. A new format MAY be compatible with old readers;
   this version indicates the oldest reader that can read this format.

An object's structure is

* BeginObject Tag
* SERIALIZED TYPE
* SERIALIZED DATA
* EndObject Tag

A type is itself just another object. But then it would need a type of its own,
which leads to infinite recursion. Thus the type of a type is always simply
a special tag called the NullReference, which represents null.
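
Here is a minimal sketch of reading a type object. It assumes the tag values shown, which are taken from FastSerialization's Tags enum as we recall it; check them against the enum itself. Because the type of a type is always NullReference, the reader never recurses. The field order (Version, MinimumReaderVersion, FullName) follows the EventTrace layout described in the next section.

```python
import io
import struct
from dataclasses import dataclass

# Assumed tag values (from FastSerialization's Tags enum as we recall it; verify before use).
NULL_REFERENCE = 1
BEGIN_OBJECT = 4
END_OBJECT = 6


@dataclass
class SerializationType:
    version: int
    minimum_reader_version: int
    full_name: str


def expect_tag(stream: io.BytesIO, tag: int) -> None:
    got = stream.read(1)
    if got != bytes([tag]):
        raise ValueError(f"expected tag {tag}, got {got!r} at offset {stream.tell() - len(got)}")


def read_int32(stream: io.BytesIO) -> int:
    return struct.unpack("<i", stream.read(4))[0]


def read_type(stream: io.BytesIO) -> SerializationType:
    """Read a type object, whose own type is always the NullReference tag."""
    expect_tag(stream, BEGIN_OBJECT)
    expect_tag(stream, NULL_REFERENCE)  # the type of a type is null, ending the recursion
    version = read_int32(stream)
    minimum_reader_version = read_int32(stream)
    full_name = stream.read(read_int32(stream)).decode("utf-8")
    expect_tag(stream, END_OBJECT)
    return SerializationType(version, minimum_reader_version, full_name)
```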

## The First Object: The EventTrace Object

After the stream header comes the EventTrace object, which represents all the data
about the trace as a whole.

* BeginObject Tag (begins the EventTrace Object)
* BeginObject Tag (begins the Type Object for EventTrace)
* NullReference Tag (represents the type of type, which is by convention null)
* 4 byte integer Version field for type
* 4 byte integer MinimumReaderVersion field for type
* SERIALIZED STRING for FullName Field for type (4 byte length + UTF8 bytes)
* EndObject Tag (ends Type Object)
* DATA FIELDS FOR EVENTTRACE OBJECT
* EndObject Tag (for EventTrace object)
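
A small writer can produce the layout above. This is a sketch. The tag values are assumptions, so check them against the Tags enum. The type name and field bytes passed in are hypothetical placeholders, since the real EventTrace fields are defined by EventPipeEventSource.

```python
import struct

# Assumed tag values from the FastSerialization Tags enum; verify before use.
NULL_REFERENCE = 1
BEGIN_OBJECT = 4
END_OBJECT = 6


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return struct.pack("<i", len(data)) + data


def encode_object(type_name: str, version: int, min_reader_version: int, fields: bytes) -> bytes:
    """Lay out one object: BeginObject, type object, data fields, EndObject."""
    type_object = (
        bytes([BEGIN_OBJECT, NULL_REFERENCE])  # type object; its own type is null
        + struct.pack("<ii", version, min_reader_version)
        + encode_string(type_name)
        + bytes([END_OBJECT])
    )
    return bytes([BEGIN_OBJECT]) + type_object + fields + bytes([END_OBJECT])


# Hypothetical type name and a single made-up 8-byte field, for illustration only.
example = encode_object("ExampleTrace", 1, 0, struct.pack("<q", 0))
```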

The data fields of an object are deserialized by the 'FromStream' method of
the class that deserializes that object. EventPipeEventSource is the class
that deserializes the EventTrace object, so you can see its fields there.
These fields include the time the trace was collected, the
units of the event timestamps, and other things that apply to all events.

## Next Objects: The EventBlock Object

After the EventTrace object there are zero or more EventBlock objects.
They have the same layout as the EventTrace object, although their
data fields are different:

* BeginObject Tag (begins the EventBlock Object)
* BeginObject Tag (begins the Type Object for EventBlock)
* NullReference Tag (represents the type of type, which is by convention null)
* 4 byte integer Version field for type
* 4 byte integer MinimumReaderVersion field for type
* SERIALIZED STRING for FullName Field for type (4 byte length + UTF8 bytes)
* EndObject Tag (ends Type Object)
* DATA FIELDS FOR EVENTBLOCK OBJECT (size of blob + event bytes blob)
* EndObject Tag (for EventBlock object)

The data in an EventBlock is simply an integer giving the size of the data blob
(in bytes, not including the size int itself), followed by the event
data blob itself.
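
Because the size comes first, a reader can extract or skip the whole event blob without understanding individual events. Here is a sketch; `read_event_block_fields` is a hypothetical helper name.

```python
import io
import struct


def read_event_block_fields(stream: io.BytesIO) -> bytes:
    """Read EventBlock data fields: a 4-byte size, then that many bytes of event data."""
    (size,) = struct.unpack("<i", stream.read(4))  # the size does not count itself
    if size < 0:
        raise ValueError(f"negative EventBlock size {size}")
    blob = stream.read(size)
    if len(blob) != size:
        raise EOFError(f"EventBlock declared {size} bytes but only {len(blob)} remain")
    return blob
```

After this call the stream is positioned at the EventBlock's EndObject tag.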

The event blob itself is simply a list of 'event' blobs. Each blob has
a header (defined by EventPipeEventHeader), followed by some number of
bytes of payload data, followed by the byte size and the bytes of the stack
associated with the event. See EventPipeEventHeader for details.

Some events are not true data events but instead represent metadata
about an event. This metadata includes the name of the event, the name
of the event's provider, and the names and types of all the fields
of the event. Each piece of metadata is given a small integer numeric ID,
starting at 1 and growing incrementally.

One of the fields of an event is this metadata ID. An event with
a metadata ID of 0 is expected to be a metadata event itself.
See the constructor of EventPipeEventMetaData for details of the
format of this event.
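
The ID scheme amounts to a lookup table that is built up as the stream is read. This sketch deliberately leaves out decoding of a metadata event's payload, whose real format is defined by EventPipeEventMetaData. It assumes that payload has already been decoded into a hypothetical `(new_id, EventMetadata)` pair, and that metadata always appears before the events that use it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class EventMetadata:
    provider_name: str
    event_name: str


@dataclass
class MetadataTable:
    """Maps metadata IDs (1, 2, 3, ...) to the metadata they describe."""
    entries: Dict[int, EventMetadata] = field(default_factory=dict)

    def handle(self, metadata_id: int, decoded) -> Optional[EventMetadata]:
        if metadata_id == 0:
            # A metadata event. `decoded` is assumed to be an already-decoded
            # (new_id, EventMetadata) pair; the real payload format is not shown here.
            new_id, metadata = decoded
            self.entries[new_id] = metadata
            return None
        # A data event: look up the metadata it refers to (assumed to have been seen already).
        if metadata_id not in self.entries:
            raise ValueError(f"event refers to unknown metadata ID {metadata_id}")
        return self.entries[metadata_id]
```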

## Ending the stream: The NullReference Tag

After the last EventBlock is emitted, the stream is ended by
emitting a NullReference Tag, which indicates that there are no
more objects in the stream to read.
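
Putting the pieces together, the top-level read loop keeps consuming objects until it sees the terminating NullReference tag. This sketch makes several assumptions. The tag values are as shown. The stream is already positioned just after the stream header. Each object's data fields are parsed by a caller-supplied function keyed by type name (`field_readers` is a hypothetical parameter).

```python
import io
import struct
from typing import Callable, Dict, List, Tuple

# Assumed tag values (from FastSerialization's Tags enum as we recall it; verify before use).
NULL_REFERENCE = 1
BEGIN_OBJECT = 4
END_OBJECT = 6


def read_int32(stream: io.BytesIO) -> int:
    return struct.unpack("<i", stream.read(4))[0]


def read_all_objects(
    stream: io.BytesIO,
    field_readers: Dict[str, Callable[[io.BytesIO, int], object]],
) -> List[Tuple[str, object]]:
    """Read top-level objects until the NullReference tag that ends the stream."""
    results = []
    while True:
        tag = stream.read(1)
        if tag == bytes([NULL_REFERENCE]):
            return results  # end of stream: no more objects
        if tag != bytes([BEGIN_OBJECT]):
            raise ValueError(f"expected BeginObject or NullReference, got {tag!r}")
        # Type object: BeginObject, NullReference, Version, MinimumReaderVersion, FullName, EndObject.
        if stream.read(2) != bytes([BEGIN_OBJECT, NULL_REFERENCE]):
            raise ValueError("malformed type object")
        version, _minimum_reader_version = struct.unpack("<ii", stream.read(8))
        type_name = stream.read(read_int32(stream)).decode("utf-8")
        if stream.read(1) != bytes([END_OBJECT]):
            raise ValueError("type object is missing its EndObject tag")
        # The caller-supplied reader must consume exactly this object's data fields
        # (an unknown type name raises KeyError here).
        results.append((type_name, field_readers[type_name](stream, version)))
        if stream.read(1) != bytes([END_OBJECT]):
            raise ValueError(f"{type_name} object is missing its EndObject tag")
```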

## Versioning the Format While Maintaining Compatibility

### Backward compatibility

It is a relatively straightforward exercise to update the file format
to add more information while maintaining backward compatibility (that is,
new readers can read data from old writers). To do so:

1. For the type being changed (for example, the EventTrace type), increment the Version number
   and set the MinimumReaderVersion number to this same value.
2. Update the reader for the changed type to look at the Version
   number of the type. If it is less than the new version, do
   what you did before; if it is the new version, read the new format
   for that object.

By doing (1), we ensure that an OLD reader does not simply
crash misinterpreting data. Instead it will clearly notice that it does
not support this new version (because the reader's version is less
than the MinimumReaderVersion value), and it can issue a clean error
that is useful to the user.

Doing (2) is also straightforward, but it does mean keeping the old
reading code. This is the price of compatibility.
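
In a reader, steps (1) and (2) might look like the sketch below. Everything here is hypothetical and invented for illustration: the reader version, the exception name, and the change itself (version 2 appending one Int32 field after an Int64 start time).

```python
import io
import struct

READER_VERSION = 2  # hypothetical: the newest version of this type that this reader understands


class UnsupportedVersionError(Exception):
    pass


def read_trace_fields(stream: io.BytesIO, version: int, minimum_reader_version: int) -> dict:
    # Step (1): a newer writer raises MinimumReaderVersion, so an old reader stops cleanly here.
    if READER_VERSION < minimum_reader_version:
        raise UnsupportedVersionError(
            f"data requires reader version {minimum_reader_version}, "
            f"but this reader only supports up to {READER_VERSION}")
    start_time = struct.unpack("<q", stream.read(8))[0]
    # Step (2): keep the old code path for data written before version 2.
    if version < 2:
        return {"start_time": start_time, "new_field": None}
    new_field = struct.unpack("<i", stream.read(4))[0]
    return {"start_time": start_time, "new_field": new_field}


# Example: hypothetical version-1 data containing only the start time.
old_fields = read_trace_fields(io.BytesIO(struct.pack("<q", 123)), version=1, minimum_reader_version=0)
```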

### Forward compatibility

Making changes that preserve FORWARD compatibility (old readers
can read data from new writers) is more constraining, because old readers have
to at least know how to 'skip' things they don't understand.

There are, however, several ways to do this. The simplest way is to

* Add tagged values to an object.

Every object has a begin tag, a type, data fields, and an end tag.
One feature of the FastSerialization library is that it has a tag
for each of the different data types (bool, byte, short, int, long, string, blob).
It also has logic that, after parsing the data area, 'looks' for
the end tag (so we know the data is at least partially sane). If
it finds other tags during this search, it knows how to skip them.
Thus if you place tagged data after the 'known version 0' data fields,
ANY reader will know how to skip it (it skips all tagged things
until it finds an EndObject tag).

This allows you to add new fields to an object in a way that OLD
readers can still parse (at least enough to skip them).
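
Here is a sketch of the skipping logic, under stated assumptions. The tag numbers are our reading of FastSerialization, and so is the payload encoding of each tagged kind: fixed-size integers, with String and Blob as a 4-byte length plus bytes. Verify both against its source. The sketch rejects any other tag rather than guessing at its encoding.

```python
import io
import struct

# Assumed values from the FastSerialization Tags enum; verify before use.
END_OBJECT = 6
FIXED_SIZE_TAGS = {8: 1, 9: 2, 10: 4, 11: 8}  # Byte, Int16, Int32, Int64 -> payload size
LENGTH_PREFIXED_TAGS = {13, 14}               # String, Blob: 4-byte length + bytes (assumed)


def skip_tagged_values(stream: io.BytesIO) -> int:
    """Skip tagged fields until EndObject; return how many were skipped."""
    skipped = 0
    while True:
        tag = stream.read(1)
        if not tag:
            raise EOFError("stream ended before EndObject")
        t = tag[0]
        if t == END_OBJECT:
            return skipped
        if t in FIXED_SIZE_TAGS:
            size = FIXED_SIZE_TAGS[t]
        elif t in LENGTH_PREFIXED_TAGS:
            size = struct.unpack("<i", stream.read(4))[0]
            if size < 0:
                raise ValueError(f"negative length {size} for tag {t}")
        else:
            raise ValueError(f"don't know how to skip tag {t}")
        if len(stream.read(size)) != size:
            raise EOFError("truncated tagged value")
        skipped += 1
```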

Another way to add new data to the file is to

* Add new objects (and object types) to the list of objects.

The format is basically a list of objects, and there are only very loose
requirements on the order or number of these objects.
Thus you can create a new object type and insert that object in the
stream. That object must have only tagged fields, but a tagged
blob can do almost anything. This allows whole new objects to be
added to the file format without breaking existing readers.

#### Version Numbers and Forward Compatibility

There is no STRONG reason to update the version number when you make
changes to the format that are both forward and backward compatible.
However, updating the version number can be useful because it allows
readers to quickly determine the set of things they can 'count on', and
therefore what user interface can be supported. Thus it can be useful
to update the version number when a non-trivial amount of new functionality
is added.

To do this, update the Version number but KEEP the MinimumReaderVersion
unchanged. Readers then quickly know what they can count on,
but old readers can still read the new format.
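
The two numbers answer different questions. MinimumReaderVersion decides whether a reader may proceed at all, while Version tells it which features it can count on. This small sketch shows the difference; `TypeVersion`, `can_read`, and `has_feature` are hypothetical names, and the version numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeVersion:
    version: int
    minimum_reader_version: int


def can_read(reader_version: int, written: TypeVersion) -> bool:
    # Only MinimumReaderVersion gates whether a reader may proceed.
    return reader_version >= written.minimum_reader_version


def has_feature(written: TypeVersion, introduced_in: int) -> bool:
    # Version tells the reader which (forward-compatible) additions it can count on.
    return written.version >= introduced_in


# Version bumped to 3 for new tagged fields; MinimumReaderVersion left at 2.
new_file = TypeVersion(version=3, minimum_reader_version=2)
```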

## Support for Random Access Streams

So far the file format uses only the simplest features. In particular,
an object never directly 'points' at another, and the stream can be
processed usefully without needing information from later in the file.

But we pay a price for this: you have to read all the data in the
file even if you only care about a small fraction of it. If, however,
your stream supports random access (seeking), as a file does,
you can overcome this.

The serialization library allows this by supporting a table of pointers
to objects and placing this table at the end of the stream (when you
know the stream locations of all objects). This would allow you to
seek to any particular object and only read what you need.
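
The following sketch illustrates the idea only, using an invented trailer layout: a count, the StreamLabels of the objects, and finally the StreamLabel of the table itself. This is NOT FastSerialization's actual on-disk format. It just shows why placing the table at the end works: by then the writer knows every position, and a reader can seek to the last 4 bytes to find the table.

```python
import io
import struct
from typing import List


def append_label_table(body: bytes, labels: List[int]) -> bytes:
    """Hypothetical trailer: count, one 32-bit StreamLabel per object, then the table's own label."""
    table_label = len(body)  # the table starts right after the objects
    table = struct.pack("<i", len(labels))
    table += b"".join(struct.pack("<I", label) for label in labels)
    return body + table + struct.pack("<I", table_label)


def read_label_table(stream: io.BytesIO) -> List[int]:
    stream.seek(-4, io.SEEK_END)  # the last 4 bytes hold the table's StreamLabel
    (table_label,) = struct.unpack("<I", stream.read(4))
    stream.seek(table_label)
    (count,) = struct.unpack("<i", stream.read(4))
    return list(struct.unpack(f"<{count}I", stream.read(4 * count)))
```

A reader can then `seek` to `labels[i]` and parse only object `i`, skipping everything before it.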

The FastSerialization library supports this, but the need for this kind
of 'random access' is not clear at this time (mostly the data needs
to be processed again, and thus you need to read it all anyway). For
now it is enough to know that this capability exists if we need it.