
Commit ed9cc8d

adamsitnik authored and Vance Morrison committed

Event blocks (#546)
* Changes for the next version of the EventPipe file format
* Added versioning support, documentation
* Update V3 format for event metadata. This simplifies the V3 format. I also split out the code for the old format so it would be easier to remove it later.
* nits
* making it work with the latest format
* padding support
* get ProviderId based on ProviderName, assume Metadata is mandatory and its length does not come first
* update the tests
* update test file after merge of my fork with CoreCLR/master
* update C# version for all projects
* improvements after code review
* improvements after code review
* aligning change: after the block size comes eventual padding
1 parent: 37df253

File tree: 8 files changed, +544 −204 lines


src/Directory.Build.props

Lines changed: 1 addition & 1 deletion

```diff
@@ -6,7 +6,7 @@
   </PropertyGroup>

   <PropertyGroup>
-    <LangVersion>6</LangVersion>
+    <LangVersion>7</LangVersion>
     <Features>strict</Features>
   </PropertyGroup>
```

src/FastSerialization/FastSerialization.cs

Lines changed: 1 addition & 1 deletion

```diff
@@ -2359,7 +2359,7 @@ internal enum Tags : byte
     Int64,
     SkipRegion,
     String,     // Size of string (in bytes) followed by UTF8 bytes.
-    Blob,       // Size of bytes followed by bytes.
+    Blob,
     Limit,      // Just past the last valid tag, used for asserts.
 }
 #endregion
```

src/TraceEvent/EventPipe/EventPipeEventSource.cs

Lines changed: 269 additions & 177 deletions
Large diffs are not rendered by default.

Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@
# EventPipe (File) Format

EventPipe is the name of the logging mechanism used by the .NET Core
runtime to log events in an OS-independent way. It is meant to serve roughly the same
niche as ETW does on Windows, but it works equally well on Linux.

By convention, files in this format are called *.netperf files, and the format can be thought
of as the NetPerf file format. However, the format is more flexible than that.

The format was designed to take advantage of the facilities of the FastSerialization
library used by TraceEvent. However, the format can be understood on its own, and here
we describe everything you need to know to use it.

Fundamentally, the data can be thought of as a serialization of objects. We want the
format to be Simple and Extensible (it can tolerate multiple versions), and to
make it as easy as possible to be both backward compatible (new readers can read old data versions)
and forward compatible (old readers can read new data versions). We also want it to be efficient
and STREAMABLE (no need for seek; you can do most operations with just 'read').
Assumptions of the Format:

We assume the following:

* Primitive Types: The format assumes you can emit the primitive data types
  (byte, short, int, long). They are in little-endian order (least significant byte first).
* Strings: Strings are emitted as an int BYTE count followed by the
  UTF8 encoding.
* StreamLabels: The format assumes you know the start of the stream (0) and that
  you keep track of your position. The format currently assumes this position is
  a 32-bit number (thus limiting references using StreamLabels to 4GB).
  This may change, but doing so is a format change.
* Compression: The format does not try to be particularly smart about compression.
  The idea is that compression is VERY likely to be best done by compressing
  the stream as a whole, so it is not that important that we do 'smart' things
  like using variable-length integers. Instead the format is tuned for
  making it easy for the memory to be used 'in place', and it assumes that compression
  will be done on the stream outside of the serialization/deserialization.
* Alignment: By default the stream is only assumed to be byte aligned. However,
  as you will see, particular objects have a lot of flexibility in their encoding,
  and they may choose to align their data. This is valuable because it allows
  efficient 'in place' use of the data stream; however it is more the exception
  than the rule.
## First Bytes: The Stream Header

The beginning of the format is always the stream header. This header's only purpose
is to quickly identify the format of this stream (file) as a whole, and to indicate
exactly which version of the basic stream library should be used. It is exactly
one length-prefixed UTF8 string with the value "!FastSerialization.1". This declares
that the rest of the file uses the FastSerialization version 1 conventions.

Thus the first 24 bytes of the file will be:

* 4 bytes: the little-endian number 20 (the number of bytes in "!FastSerialization.1")
* 20 bytes: the UTF8 encoding of "!FastSerialization.1"

After the stream header comes a list of objects.
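The stream header check above can be sketched in a few lines. This is a minimal Python illustration (the shipped reader is C#, in EventPipeEventSource.cs); the function name is ours, not part of the library:

```python
import struct

def read_stream_header(f):
    """Read and validate the EventPipe stream header.

    Per the format description: a 4-byte little-endian length (20)
    followed by the UTF8 bytes of "!FastSerialization.1".
    """
    (length,) = struct.unpack("<i", f.read(4))   # little-endian int32
    magic = f.read(length).decode("utf-8")
    if magic != "!FastSerialization.1":
        raise ValueError(f"not a FastSerialization v1 stream: {magic!r}")
    return magic
```

For example, feeding it the first 24 bytes of a *.netperf file (or an equivalent in-memory buffer) returns the magic string and leaves the stream positioned at the first object.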
## Objects

The format has the concept of an object. Indeed the stream can be thought of as
simply the serialization of a list of objects.

Tags: The format uses a number of byte-sized tags in the serialization
and use of objects. In particular there are BeginObject and EndObject tags, which
are used to define a new object, as well as a few others (discussed below) which
allow you to refer to objects.
There are only a handful of them; see the Tags enum for a complete list.

Object Types: Every object has a type. A type at a minimum represents:

1. The name of the type (which allows the serializer and deserializer to agree on what
   is being transmitted).
2. The version number for the data being sent.
3. A minimum reader version number. A new format MAY be compatible with old readers;
   this version indicates the oldest reader that can read this format.

An object's structure is:

* BeginObject Tag
* SERIALIZED TYPE
* SERIALIZED DATA
* EndObject Tag

As mentioned, a type is just another object, but if that held without exception it
would lead to infinite recursion. Thus the type of a type is always simply
a special tag called NullReference that represents null.
## The First Object: The EventTrace Object

After the stream header comes the EventTrace object, which represents all the data
about the trace as a whole.

* BeginObject Tag (begins the EventTrace object)
* BeginObject Tag (begins the type object for EventTrace)
* NullReference Tag (represents the type of a type, which is by convention null)
* 4-byte integer Version field for the type
* 4-byte integer MinimumReaderVersion field for the type
* SERIALIZED STRING for the FullName field of the type (4-byte length + UTF8 bytes)
* EndObject Tag (ends the type object)
* DATA FIELDS FOR THE EVENTTRACE OBJECT
* EndObject Tag (for the EventTrace object)

The data fields for an object are deserialized in the 'FromStream' method of
the class that deserializes the object. EventPipeEventSource is the class
that deserializes the EventTrace object, so you can see its fields there.
These fields are things like the time the trace was collected, the
units of the event timestamps, and other things that apply to all events.
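The serialized-type preamble inside an object (NullReference tag, two version ints, length-prefixed name) can be sketched as follows. This is an illustrative Python fragment, not the real reader; it assumes the two BeginObject tags have already been consumed, and it does not check the tag's numeric value, since the actual values live in the Tags enum in FastSerialization.cs:

```python
import struct

def read_serialized_type(f):
    """Sketch: read the serialized type that follows a BeginObject tag.

    Layout per the format description: NullReference tag (1 byte),
    Version (int32), MinimumReaderVersion (int32), then a
    length-prefixed UTF8 FullName string.
    """
    tag = f.read(1)[0]            # should be the NullReference tag (type of a type)
    version, min_reader = struct.unpack("<ii", f.read(8))
    (name_len,) = struct.unpack("<i", f.read(4))
    full_name = f.read(name_len).decode("utf-8")
    return tag, version, min_reader, full_name
```

A reader would dispatch on `full_name` to pick the deserializer class and on `version` to pick the parsing code path.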
## Next Objects: The EventBlock Object

After the EventTrace object come zero or more EventBlock objects.
Their layout looks very much like the EventTrace object's; only the
data fields are different.

* BeginObject Tag (begins the EventBlock object)
* BeginObject Tag (begins the type object for EventBlock)
* NullReference Tag (represents the type of a type, which is by convention null)
* 4-byte integer Version field for the type
* 4-byte integer MinimumReaderVersion field for the type
* SERIALIZED STRING for the FullName field of the type (4-byte length + UTF8 bytes)
* EndObject Tag (ends the type object)
* DATA FIELDS FOR THE EVENTBLOCK OBJECT (size of blob + event bytes blob)
* EndObject Tag (for the EventBlock object)

The data in an EventBlock is simply an integer representing the size (in
bytes, not including the size int itself) of the data blob, followed by the
event data blob itself.

The event blob itself is simply a list of 'event' blobs. Each blob has
a header (defined by EventPipeEventHeader), followed by some number of
bytes of payload data, followed by the byte size and bytes of the stack
associated with the event. See EventPipeEventHeader for details.

Some events are not actually true data events but instead represent meta-data
about an event. This data includes the name of the event, the name
of the provider of the event, and the names and types of all the fields
of the event. This meta-data is given a small integer numeric ID
(starting at 1 and growing incrementally).

One of the fields of an event is this meta-data ID. An event with
a meta-data ID of 0 is expected to be a meta-data event itself.
See the constructor of EventPipeEventMetaData for details of the
format of this event.
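The size-prefixed data field of an EventBlock can be sketched like this (an illustrative Python fragment; parsing the individual event blobs inside the returned buffer, via EventPipeEventHeader, is not shown):

```python
import struct

def read_event_block_data(f):
    """Sketch: read the data fields of an EventBlock.

    Per the format description, the data is a 4-byte little-endian size
    (excluding the size int itself) followed by that many bytes of
    event blob data.
    """
    (size,) = struct.unpack("<i", f.read(4))
    blob = f.read(size)
    if len(blob) != size:
        raise EOFError("truncated EventBlock data")
    return blob
```

Because the blob is returned whole, a reader can hand it to an event parser 'in place' without further copying, which is exactly the efficiency goal stated in the assumptions above.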
## Ending the Stream: The NullReference Tag

After the last EventBlock is emitted, the stream is ended by
emitting a NullReference tag, which indicates that there are no
more objects in the stream to read.
## Versioning the Format While Maintaining Compatibility

### Backward compatibility

It is a relatively straightforward exercise to update the file format
to add more information while maintaining backward compatibility (that is,
new readers can read old writers). What is necessary is to:

1. For the EventTrace type, increment the Version number
   and set the MinimumReaderVersion number to this same value.
2. Update the reader for the changed type to look at the Version
   number of the type: if it is less than the new version, do
   what you did before, and if it is the new version, read the new format
   for that object.

By doing (1) we make it so that every OLD reader does not simply
crash misinterpreting data, but will clearly notice that it does
not support this new version (because the reader's version is less
than the MinimumReaderVersion value), and can issue a clean error
that is useful to the user.

Doing (2) is also straightforward, but it does mean keeping the old
reading code. This is the price of compatibility.
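The two steps above amount to a simple gate at read time. A minimal sketch, with invented names (this is not the TraceEvent API):

```python
def check_reader_compatibility(reader_version, type_version, minimum_reader_version):
    """Sketch of the versioning rule described above.

    A reader may parse an object only if the reader's own version is
    at least the object's MinimumReaderVersion; otherwise it should
    fail with a clean, user-facing error instead of misparsing bytes.
    """
    if reader_version < minimum_reader_version:
        raise ValueError(
            f"reader (version {reader_version}) is too old for this object "
            f"(minimum reader version {minimum_reader_version})")
    # The reader is new enough; it still dispatches on type_version to
    # choose between the old and new parsing code paths (step 2 above).
    return type_version
```

This is why step (1) sets MinimumReaderVersion: it converts a silent misparse by old readers into an explicit, recoverable error.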
### Forward compatibility

Making changes so that we preserve FORWARD compatibility (old readers
can read new writers) is more constraining, because old readers have
to at least know how to 'skip' things they don't understand.

There are, however, several ways to do this. The simplest way is to:

* Add tagged values to an object.

Every object has a begin tag, a type, data objects, and an end tag.
One feature of the FastSerialization library is that it has a tag
for each of the different data types (bool, byte, short, int, long, string, blob).
It also has logic that, after parsing the data area, 'looks' for
the end tag (so we know the data is at least partially sane). However,
if it finds other tags during this search, it knows how to skip them.
Thus if after the 'known version 0' data objects you place tagged
data, ANY reader will know how to skip it (it skips all tagged things
until it finds an EndObject tag).

This allows you to add new fields to an object in a way that OLD
readers can still parse (at least enough to skip them).

Another way to add new data to the file is to:

* Add new objects (and object types) to the list of objects.

The format is basically a list of objects, and there are only very loose
requirements on the order or number of these objects.
Thus you can create a new object type and insert that object in the
stream (that object must have only tagged fields, but a tagged
blob can hold almost anything). This allows whole new objects to be
added to the file format without breaking existing readers.
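The skip-until-EndObject behavior described above can be sketched as follows. This is an illustrative Python fragment: the tag numbers are placeholders (the real values are in the Tags enum in FastSerialization.cs), and the real library handles more tag kinds:

```python
import struct

# Placeholder tag numbers for illustration only; the authoritative
# values are in the Tags enum in FastSerialization.cs.
(TAG_BOOL, TAG_BYTE, TAG_SHORT, TAG_INT,
 TAG_LONG, TAG_STRING, TAG_BLOB, TAG_END_OBJECT) = range(8)

FIXED_SIZES = {TAG_BOOL: 1, TAG_BYTE: 1, TAG_SHORT: 2, TAG_INT: 4, TAG_LONG: 8}

def skip_tagged_values(f):
    """Skip tagged values until an EndObject tag is found.

    Because every trailing value carries a type tag, an old reader can
    step over fields it does not understand: fixed-size tags tell it
    how many bytes follow, and string/blob tags carry their own length.
    """
    while True:
        tag = f.read(1)[0]
        if tag == TAG_END_OBJECT:
            return
        if tag in FIXED_SIZES:
            f.read(FIXED_SIZES[tag])
        elif tag in (TAG_STRING, TAG_BLOB):
            (size,) = struct.unpack("<i", f.read(4))
            f.read(size)
        else:
            raise ValueError(f"unexpected tag {tag}")
```

After the loop returns, the stream is positioned just past the object's EndObject tag, ready for the next object, which is all an old reader needs.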
#### Version Numbers and forward compatibility

There is no STRONG reason to update the version number when you make
changes to the format that are both forward and backward compatible.
However, it can be useful to update the file version because it allows
readers to quickly determine the set of things they can 'count on' and
therefore what user interface can be supported. Thus it can be useful
to update the version number when a non-trivial amount of new functionality
is added.

To do this, you can update the Version number but KEEP the MinimumReaderVersion
unchanged. Thus readers quickly know what they can count on,
but old readers can still read the new format.
## Support for Random Access Streams

So far the features used in the file format are the simplest. In particular,
no object ever directly 'points' at another, and the stream can be
processed usefully without needing information later in the file.

But we pay a price for this: namely, you have to read all the data in the
file even if you only care about a small fraction of it. If, however,
you have random access (seeking) for your stream (that is, it is a file),
you can overcome this.

The serialization library allows this by supporting a table of pointers
to objects and placing this table at the end of the stream (when
the stream locations of all objects are known). This would allow you to
seek to any particular object and read only what you need.

The FastSerialization library supports this, but the need for this kind
of 'random access' is not clear at this time (mostly the data needs
to be processed again and thus you need to read it all anyway). For
now it is enough to know that this capability exists if we need it.

src/TraceEvent/TraceEvent.Tests/EventPipeParsing.cs

Lines changed: 4 additions & 4 deletions

```diff
@@ -100,13 +100,13 @@ public void CanParseHeaderOfV3EventPipeFile()
     using (var eventPipeSource = new EventPipeEventSource(eventPipeFilePath))
     {
         Assert.Equal(4, eventPipeSource.PointerSize);
-        Assert.Equal(11376, eventPipeSource._processId);
+        Assert.Equal(3312, eventPipeSource._processId);
         Assert.Equal(4, eventPipeSource.NumberOfProcessors);
         Assert.Equal(1000000, eventPipeSource._expectedCPUSamplingRate);

-        Assert.Equal(636522350205880000, eventPipeSource._syncTimeUTC.Ticks);
-        Assert.Equal(44518740604, eventPipeSource._syncTimeQPC);
-        Assert.Equal(2533308, eventPipeSource._QPCFreq);
+        Assert.Equal(636531024984420000, eventPipeSource._syncTimeUTC.Ticks);
+        Assert.Equal(20461004832, eventPipeSource._syncTimeQPC);
+        Assert.Equal(2533315, eventPipeSource._QPCFreq);

         Assert.Equal(10, eventPipeSource.CpuSpeedMHz);
     }
```
