Adding Seq.traverse & sequence functions#277
Conversation
TheAngryByrd
left a comment
There was a problem hiding this comment.
Currently, my function is different in that it doesn't exit the sequence early upon finding an error. If this is a concern for performance
I think we do need to address not iterating the whole sequence as this might be a behavior people are relying on.
I have two tests below, one calling the old implementation and one calling the new. As you can see the new one iterates a whole list.
ftestCase "sequenceResultMv1 with few invalid data with long data set"
<| fun _ ->
let mutable lastValue = null
let mutable callCount = 0
let longerData =
seq {
for i in 1..1000 do
lastValue <- string i
yield string i
}
let tweets =
seq {
""
"Hello"
aLongerInvalidTweet
yield! longerData
}
let tryCreate tweet =
callCount <-
callCount
+ 1
match tweet with
| x when String.IsNullOrEmpty x -> Error "Tweet shouldn't be empty"
| x when x.Length > 280 -> Error "Tweet shouldn't contain more than 280 characters"
| x -> Ok(x)
let actual = sequenceResultMv1 (Seq.map tryCreate tweets)
Expect.equal callCount 1 "Should have called the function only 3 times"
Expect.equal lastValue null ""
Expect.equal
actual
(Error emptyTweetErrMsg)
"traverse the sequence and return the first error"
ftestCase "sequenceResultM with few invalid data with long data set"
<| fun _ ->
let mutable lastValue = null
let mutable callCount = 0
let longerData =
seq {
for i in 1..1000 do
lastValue <- string i
yield string i
}
let tweets =
seq {
""
"Hello"
aLongerInvalidTweet
yield! longerData
}
let tryCreate tweet =
callCount <-
callCount
+ 1
match tweet with
| x when String.IsNullOrEmpty x -> Error "Tweet shouldn't be empty"
| x when x.Length > 280 -> Error "Tweet shouldn't contain more than 280 characters"
| x -> Ok(x)
let actual = Seq.sequenceResultM (Seq.map tryCreate tweets)
Expect.equal callCount 3 "Should have called the function only 3 times"
Expect.equal lastValue null ""
Expect.equal
actual
(Error emptyTweetErrMsg)
"traverse the sequence and return the first error"In terms of performance, this may look fine for lists of numbers but it does become a concern when it's items that take time to generate when iterating the sequence. Creating a seq { } with a Thread.Sleep for instance would show the impact of iterating a whole list.
| "version": "0.2.0", | ||
| "configurations": [ | ||
| { | ||
| "name": "benchmarks", |
| "version": "2.0.0", | ||
| "tasks": [ | ||
| { | ||
| "label": "build release", |
|
No, that makes sense! I can implement the mutable implementation and add the benchmarks accordingly 🫡 |
|
I think this is ready for re-review @TheAngryByrd. Here are the benchmarks I got. I went with inlining since it showed slightly faster results. |
| let mutable state = state | ||
| let enumerator = xs.GetEnumerator() | ||
|
|
||
| while Result.isOk state | ||
| && enumerator.MoveNext() do | ||
| match state, f enumerator.Current with | ||
| | Error e, _ -> state <- Error e | ||
| | Ok oks, Ok ok -> | ||
| state <- | ||
| Seq.singleton ok | ||
| |> Seq.append oks | ||
| |> Ok | ||
| | Ok _, Error e -> state <- Error e | ||
|
|
||
| state |
There was a problem hiding this comment.
-
Should these technically not use
useon the enumerator?use enumerator = xs.GetEnumerator()
-
Using
seq { yield ok; yield! oks }and callingSeq.revat the end is significantly (3–4×) faster on its own thanSeq.singleton+Seq.append. The difference is much greater when you actually iterate the resulting sequences, where using a sequence expression is up to ~333× as fast and allocates as little as 1⁄250th as much. -
Seq.singleton+Seq.appendas used here, orseq { yield! oks; yield ok }, can result in a stack overflow on iteration if the input sequence is long enough andfis defined in another assembly.
| let mutable state = state | |
| let enumerator = xs.GetEnumerator() | |
| while Result.isOk state | |
| && enumerator.MoveNext() do | |
| match state, f enumerator.Current with | |
| | Error e, _ -> state <- Error e | |
| | Ok oks, Ok ok -> | |
| state <- | |
| Seq.singleton ok | |
| |> Seq.append oks | |
| |> Ok | |
| | Ok _, Error e -> state <- Error e | |
| state | |
| match state with | |
| | Error _ -> state | |
| | Ok oks -> | |
| use enumerator = xs.GetEnumerator() | |
| let rec loop oks = | |
| if enumerator.MoveNext() then | |
| match f enumerator.Current with | |
| | Ok ok -> loop (seq { yield ok; yield! oks }) | |
| | Error e -> Error e | |
| else | |
| Ok(Seq.rev oks) | |
| loop oks |
Benchmarks
Source: https://gist.github.com/brianrourkeboll/6febebc4849708071eb08511491019b1
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---------------------- |---------------:|--------------:|--------------:|------:|--------:|-----------:|-----------:|----------:|-------------:|------------:|
| SeqFuncs | 16.938 us | 0.2691 us | 0.2385 us | 1.00 | 0.02 | 10.1929 | 5.0354 | - | 125.05 KB | 1.00 |
| SeqExpr | 5.261 us | 0.0831 us | 0.0737 us | 0.31 | 0.01 | 4.4708 | 1.1749 | - | 54.81 KB | 0.44 |
| | | | | | | | | | | |
| SeqFuncs_Million | 336,470.000 us | 6,425.8669 us | 6,598.8922 us | 1.00 | 0.03 | 12500.0000 | 12000.0000 | 2500.0000 | 125000.84 KB | 1.00 |
| SeqExpr_Million | 74,218.799 us | 1,441.9837 us | 1,716.5792 us | 0.22 | 0.01 | 5428.5714 | 5285.7143 | 1142.8571 | 54687.9 KB | 0.44 |
| | | | | | | | | | | |
| SeqFuncs_Iter_Million | 338,091.889 us | 4,229.8594 us | 3,749.6602 us | 1.00 | 0.02 | 12500.0000 | 12000.0000 | 2500.0000 | 125000.86 KB | 1.00 |
| SeqExpr_Iter_Million | 73,353.986 us | 1,465.0880 us | 1,744.0832 us | 0.22 | 0.01 | 5428.5714 | 5285.7143 | 1142.8571 | 54687.92 KB | 0.44 |
| | | | | | | | | | | |
| SeqFuncs_Iter | 9,771.981 us | 104.2526 us | 97.5180 us | 1.000 | 0.01 | 2578.1250 | 359.3750 | 140.6250 | 31718.95 KB | 1.000 |
| SeqExpr_Iter | 26.320 us | 0.5006 us | 0.6148 us | 0.003 | 0.00 | 9.9182 | 1.5564 | - | 121.75 KB | 0.004 |There was a problem hiding this comment.
Thanks for reviewing this! I learned a lot with just a few comments.
- I never knew the enumerator incorporated IDisposable since I actually have never had a need for it until now 😅
- Stupid question but why does iterating after the sequence is made make such a difference between the function and expression? Is there any reading material you could by chance point to, if it exists?
- Since it sounds like stack overflow is inevitable here, do you think adding a xml comment warning about this would be the only thing we can do to help here? Or maybe you have some additional suggestions/ideas?
There was a problem hiding this comment.
Actually ignore point 2, it just clicked for me 🤦
There was a problem hiding this comment.
Since it sounds like stack overflow is inevitable here
The version I posted (using seq { yield ok; yield! oks } and Seq.rev at the end) doesn't blow the stack; using seq { yield! oks; yield ok } instead (note the order of the yield! and yield), would, though.
… reviewer suggestions
|
Here are the updated benchmarks after making the changes @brianrourkeboll recommended. I'm hoping to not make it too cluttered, but I felt it was good information to keep historical record of if people come looking back at this later. I'm open to recommendations on styling/organization. |
Paket Lock Diff ReportThis report was generated via Paket Lock Diff Additions - (1)
Removals - (6)
Version Upgrades - (15)
Version Downgrades - (0) |
TheAngryByrd
left a comment
There was a problem hiding this comment.
Thank you so much for this! Excellent work :)
- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder
- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder
- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder
|
Thank you both for the patience and reviews! 😄 |



Proposed Changes
Currently we have one implementation for
Seq.sequenceResultMhowever, I believe theSeq.cacheis just writing the sequence into memory unnecessarily. If we use aSeq.foldoperation, we can significantly reduce the memory overhead of applications which have excessively large quantities of data i.e. hundreds of thousands or more (assuming my rough draft benchmark is written correctly)v1 - current implementation
v2 - proposed implementation
Currently, my function is different in that it doesn't exit the sequence early upon finding an error. If this is a concern for performance, we can always discuss having a
memoizeinternal to thetraverseXM'functions.Types of changes
What types of changes does your code introduce to FsToolkit.ErrorHandling?
Checklist
Put an
xin the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.Further comments
I'm open for discussion as this was just something I was playing around with