Skip to content

Minimize unmarshaling allocations #555

@CAFxX

Description

@CAFxX

Take the following message generated by gogofaster:

type Period struct {
	Start *timestamp.Timestamp `protobuf:"bytes,1,opt,name=start,proto3" json:"start,omitempty"`
	End   *timestamp.Timestamp `protobuf:"bytes,2,opt,name=end,proto3" json:"end,omitempty"`
}

Unmarshaling the corresponding message currently will allocate the two Timestamp objects separately. IIUC this is to allow to distinguish between the fields being unset or having the default value (there's the nullable option that can be used if we don't care about the distinction).

I hope I'm not missing anything but I think there should be a way to avoid the allocations while keeping nullability. A couple of options that come to mind:

Using internal pointers

If the generated struct for the same message was instead:

type Period struct {
	Start *timestamp.Timestamp `protobuf:"bytes,1,opt,name=start,proto3" json:"start,omitempty"`
	End   *timestamp.Timestamp `protobuf:"bytes,2,opt,name=end,proto3" json:"end,omitempty"`
	// co-located storage
	xxxStart timestamp.Timestamp `json:"-"` 
	xxxEnd   timestamp.Timestamp `json:"-"` 
}

and during unmarshaling instead of doing, e.g.:

if m.Start == nil {
	m.Start = &timestamp.Timestamp{}
}

we did:

if m.Start == nil {
	m.Start = &m.xxxStart // yay no allocation
}

everything AFAICT should keep working and we should avoid the allocations.

pros: lowers the number of allocations, unobtrusive change (no change to APIs)
cons: does not lower the amount of allocated memory or the pointer chasing, increases memory usage when an object is created programmatically instead of being unmarshalled

Using flags

If the generated struct for the same message was instead:

type Period struct {
	// not pointers
	Start timestamp.Timestamp `protobuf:"bytes,1,opt,name=start,proto3" json:"start,omitempty"`
	End   timestamp.Timestamp `protobuf:"bytes,2,opt,name=end,proto3" json:"end,omitempty"`
	// flags signaling whether the fields above are set or not (i.e. not null or null)
	xxxStartIsSet bool `json:"-"`
	xxxEndIsSet   bool `json:"-"`
}

where xxxFieldIsSet is a boolean flag that signals whether the corresponding Field is set or not (potentially these could be combined in a single bitfield, so that each flag uses a single bit instead of one byte). This is basically the same idea of #49.

  • Getters would check the flag and, if set, would return a pointer to the field (e.g. &m.Start).
  • Setters would set the flag and write the value to field
  • During unmarshaling no special initialization is required, as all fields default to the zero value and the flags default to "unset". When a field is encountered on the wire, the corresponding xxxFieldIsSet flag would be set to true. Nested messages can be unmarshaled by taking the address of the inner structure and invoking Unmarshal on that.

pros: lowers the number of allocations and the amount of allocated memory, lowers the amount of pointer chasing
cons: invasive change, requires setters

Note that the first solution is basically the same as the second one, just we're using pointer-sized flags. The second one would obviously help beyond unmarshaling: e.g. when marshaling a new object, as in the first case the xxxFields are hidden, so they can't be used by callers.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions