Adding Seq.traverse & sequence functions by 1eyewonder · Pull Request #277 · demystifyfp/FsToolkit.ErrorHandling

1eyewonder · 2024-09-17T22:31:55Z

Proposed Changes

Currently we have one implementation for Seq.sequenceResultM however, I believe the Seq.cache is just writing the sequence into memory unnecessarily. If we use a Seq.fold operation, we can significantly reduce the memory overhead of applications which have excessively large quantities of data i.e. hundreds of thousands or more (assuming my rough draft benchmark is written correctly)

v1 - current implementation
v2 - proposed implementation

Currently, my function is different in that it doesn't exit the sequence early upon finding an error. If this is a concern for performance, we can always discuss having a memoize internal to the traverseXM' functions.

Types of changes

What types of changes does your code introduce to FsToolkit.ErrorHandling?

New feature (non-breaking change which adds functionality)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

Build and tests pass locally
I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have added necessary documentation (if appropriate)

Further comments

I'm open for discussion as this was just something I was playing around with

1eyewonder · 2024-09-18T16:13:53Z

Added more benchmarks to test making the function mutable for an early escape, however between GetEnumerator and Seq.fold the difference is fairly minimal

TheAngryByrd

Currently, my function is different in that it doesn't exit the sequence early upon finding an error. If this is a concern for performance

I think we do need to address not iterating the whole sequence as this might be a behavior people are relying on.

I have two tests below, one calling the old implementation and one calling the new. As you can see the new one iterates a whole list.

        ftestCase "sequenceResultMv1 with few invalid data with long data set"
        <| fun _ ->

            let mutable lastValue = null
            let mutable callCount = 0

            let longerData =
                seq {
                    for i in 1..1000 do
                        lastValue <- string i
                        yield string i
                }

            let tweets =
                seq {
                    ""
                    "Hello"
                    aLongerInvalidTweet
                    yield! longerData
                }

            let tryCreate tweet =
                callCount <-
                    callCount
                    + 1

                match tweet with
                | x when String.IsNullOrEmpty x -> Error "Tweet shouldn't be empty"
                | x when x.Length > 280 -> Error "Tweet shouldn't contain more than 280 characters"
                | x -> Ok(x)

            let actual = sequenceResultMv1 (Seq.map tryCreate tweets)

            Expect.equal callCount 1 "Should have called the function only 3 times"
            Expect.equal lastValue null ""

            Expect.equal
                actual
                (Error emptyTweetErrMsg)
                "traverse the sequence and return the first error"

        ftestCase "sequenceResultM with few invalid data with long data set"
        <| fun _ ->

            let mutable lastValue = null
            let mutable callCount = 0

            let longerData =
                seq {
                    for i in 1..1000 do
                        lastValue <- string i
                        yield string i
                }

            let tweets =
                seq {
                    ""
                    "Hello"
                    aLongerInvalidTweet
                    yield! longerData
                }

            let tryCreate tweet =
                callCount <-
                    callCount
                    + 1

                match tweet with
                | x when String.IsNullOrEmpty x -> Error "Tweet shouldn't be empty"
                | x when x.Length > 280 -> Error "Tweet shouldn't contain more than 280 characters"
                | x -> Ok(x)

            let actual = Seq.sequenceResultM (Seq.map tryCreate tweets)

            Expect.equal callCount 3 "Should have called the function only 3 times"
            Expect.equal lastValue null ""

            Expect.equal
                actual
                (Error emptyTweetErrMsg)
                "traverse the sequence and return the first error"

In terms of performance, this may look fine for lists of numbers but it does become a concern when it's items that take time to generate when iterating the sequence. Creating a seq { } with a Thread.Sleep for instance would show the impact of iterating a whole list.

TheAngryByrd · 2024-09-21T19:31:16Z

+    "version": "0.2.0",
+    "configurations": [
+        {
+            "name": "benchmarks",


This is nice ❤️

TheAngryByrd · 2024-09-21T19:31:27Z

+    "version": "2.0.0",
+    "tasks": [
+        {
+            "label": "build release",


1eyewonder · 2024-09-21T22:28:36Z

No, that makes sense! I can implement the mutable implementation and add the benchmarks accordingly 🫡

1eyewonder · 2024-09-22T16:45:12Z

I think this is ready for re-review @TheAngryByrd. Here are the benchmarks I got. I went with inlining since it showed slightly faster results.

brianrourkeboll · 2024-09-22T20:58:53Z

+        let mutable state = state
+        let enumerator = xs.GetEnumerator()
+
+        while Result.isOk state
+              && enumerator.MoveNext() do
+            match state, f enumerator.Current with
+            | Error e, _ -> state <- Error e
+            | Ok oks, Ok ok ->
+                state <-
+                    Seq.singleton ok
+                    |> Seq.append oks
+                    |> Ok
+            | Ok _, Error e -> state <- Error e
+
+        state


Should these technically not use use on the enumerator?

use enumerator = xs.GetEnumerator()

Using seq { yield ok; yield! oks } and calling Seq.rev at the end is significantly (3–4×) faster on its own than Seq.singleton + Seq.append. The difference is much greater when you actually iterate the resulting sequences, where using a sequence expression is up to ~333× as fast and allocates as little as 1⁄250^th as much.

Seq.singleton + Seq.append as used here, or seq { yield! oks; yield ok }, can result in a stack overflow on iteration if the input sequence is long enough and f is defined in another assembly.

Suggested change

let mutable state = state

let enumerator = xs.GetEnumerator()

while Result.isOk state

&& enumerator.MoveNext() do

match state, f enumerator.Current with

| Error e, _ -> state <- Error e

| Ok oks, Ok ok ->

state <-

Seq.singleton ok

|> Seq.append oks

|> Ok

| Ok _, Error e -> state <- Error e

state

match state with

| Error _ -> state

| Ok oks ->

use enumerator = xs.GetEnumerator()

let rec loop oks =

if enumerator.MoveNext() then

match f enumerator.Current with

| Ok ok -> loop (seq { yield ok; yield! oks })

| Error e -> Error e

else

Ok(Seq.rev oks)

loop oks

Benchmarks

Source: https://gist.github.com/brianrourkeboll/6febebc4849708071eb08511491019b1

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio | |---------------------- |---------------:|--------------:|--------------:|------:|--------:|-----------:|-----------:|----------:|-------------:|------------:| | SeqFuncs | 16.938 us | 0.2691 us | 0.2385 us | 1.00 | 0.02 | 10.1929 | 5.0354 | - | 125.05 KB | 1.00 | | SeqExpr | 5.261 us | 0.0831 us | 0.0737 us | 0.31 | 0.01 | 4.4708 | 1.1749 | - | 54.81 KB | 0.44 | | | | | | | | | | | | | | SeqFuncs_Million | 336,470.000 us | 6,425.8669 us | 6,598.8922 us | 1.00 | 0.03 | 12500.0000 | 12000.0000 | 2500.0000 | 125000.84 KB | 1.00 | | SeqExpr_Million | 74,218.799 us | 1,441.9837 us | 1,716.5792 us | 0.22 | 0.01 | 5428.5714 | 5285.7143 | 1142.8571 | 54687.9 KB | 0.44 | | | | | | | | | | | | | | SeqFuncs_Iter_Million | 338,091.889 us | 4,229.8594 us | 3,749.6602 us | 1.00 | 0.02 | 12500.0000 | 12000.0000 | 2500.0000 | 125000.86 KB | 1.00 | | SeqExpr_Iter_Million | 73,353.986 us | 1,465.0880 us | 1,744.0832 us | 0.22 | 0.01 | 5428.5714 | 5285.7143 | 1142.8571 | 54687.92 KB | 0.44 | | | | | | | | | | | | | | SeqFuncs_Iter | 9,771.981 us | 104.2526 us | 97.5180 us | 1.000 | 0.01 | 2578.1250 | 359.3750 | 140.6250 | 31718.95 KB | 1.000 | | SeqExpr_Iter | 26.320 us | 0.5006 us | 0.6148 us | 0.003 | 0.00 | 9.9182 | 1.5564 | - | 121.75 KB | 0.004 |

Thanks for reviewing this! I learned a lot with just a few comments.

I never knew the enumerator incorporated IDisposable since I actually have never had a need for it until now 😅

Stupid question but why does iterating after the sequence is made make such a difference between the function and expression? Is there any reading material you could by chance point to, if it exists?

Since it sounds like stack overflow is inevitable here, do you think adding a xml comment warning about this would be the only thing we can do to help here? Or maybe you have some additional suggestions/ideas?

Actually ignore point 2, it just clicked for me 🤦

Since it sounds like stack overflow is inevitable here

The version I posted (using seq { yield ok; yield! oks } and Seq.rev at the end) doesn't blow the stack; using seq { yield! oks; yield ok } instead (note the order of the yield! and yield), would, though.

… reviewer suggestions

1eyewonder · 2024-09-22T22:38:19Z

Here are the updated benchmarks after making the changes @brianrourkeboll recommended. I'm hoping to not make it too cluttered, but I felt it was good information to keep historical record of if people come looking back at this later. I'm open to recommendations on styling/organization.

TheAngryByrd · 2024-09-23T13:47:23Z

Paket Lock Diff Report

This report was generated via Paket Lock Diff

Additions - (1)

Benchmarks - (1)
- Gee.External.Capstone - 2.3.0

Removals - (6)

Benchmarks - (6)

Version Upgrades - (15)

Benchmarks - (15)

Version Downgrades - (0)

TheAngryByrd

Thank you so much for this! Excellent work :)

@1eyewonder

- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder

@1eyewonder

- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder

@1eyewonder

- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder

1eyewonder · 2024-09-23T14:23:50Z

Thank you both for the patience and reviews! 😄

1eyewonder added 5 commits September 17, 2024 17:21

Added POC functions

3442e14

benchmark updates

6875038

Refactored functions & added unit tests

d53d87f

Added documentation

5ae5f49

Added more benchmarks

c9a1152

Updated unit tests to fix transpiling issues

a7753f0

1eyewonder marked this pull request as ready for review September 18, 2024 17:14

1eyewonder marked this pull request as draft September 18, 2024 17:16

1eyewonder marked this pull request as ready for review September 18, 2024 17:17

1eyewonder changed the title ~~POC - Adding Seq.traverse & sequence functions~~ Adding Seq.traverse & sequence functions Sep 18, 2024

TheAngryByrd requested changes Sep 21, 2024

View reviewed changes

1eyewonder added 3 commits September 22, 2024 11:21

Updated Seq.fooM functions to have an early exit condition

31212f4

Updated benchmark functions to be more accurate of actual behavior

d46f132

Inlined base traverse functions

7821136

brianrourkeboll reviewed Sep 22, 2024

View reviewed changes

Updated seq functions to seq expressions for improved performance per…

3e401ae

… reviewer suggestions

TheAngryByrd approved these changes Sep 23, 2024

View reviewed changes

TheAngryByrd merged commit a48180d into demystifyfp:master Sep 23, 2024

TheAngryByrd added a commit that referenced this pull request Sep 23, 2024

Bump version to 4.17.0-beta001

ad8e314

- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder

TheAngryByrd added a commit that referenced this pull request Sep 23, 2024

Bump version to 4.17.0-beta002

9be8004

- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder

TheAngryByrd added a commit that referenced this pull request Sep 23, 2024

Bump version to 4.17.0

a456aaa

- [Adding Seq.traverse & sequence functions](#277) Credits @1eyewonder

1eyewonder mentioned this pull request Dec 15, 2024

[v5] Updated Uses of Seq.Append #290

Merged

4 tasks

1eyewonder deleted the seq-additions branch January 28, 2025 16:21

1eyewonder mentioned this pull request Feb 17, 2025

feat(Seq.traverse/sequence*)!: Yield arrays #310

Merged

3 tasks

Conversation

1eyewonder commented Sep 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed Changes

Types of changes

Checklist

Further comments

Uh oh!

1eyewonder commented Sep 18, 2024

Uh oh!

TheAngryByrd left a comment

Choose a reason for hiding this comment

Uh oh!

TheAngryByrd Sep 21, 2024

Choose a reason for hiding this comment

Uh oh!

TheAngryByrd Sep 21, 2024

Choose a reason for hiding this comment

Uh oh!

1eyewonder commented Sep 21, 2024

Uh oh!

1eyewonder commented Sep 22, 2024

Uh oh!

brianrourkeboll Sep 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Benchmarks

Uh oh!

1eyewonder Sep 22, 2024

Choose a reason for hiding this comment

Uh oh!

1eyewonder Sep 22, 2024

Choose a reason for hiding this comment

Uh oh!

brianrourkeboll Sep 22, 2024

Choose a reason for hiding this comment

Uh oh!

1eyewonder commented Sep 22, 2024

Uh oh!

TheAngryByrd commented Sep 23, 2024

Paket Lock Diff Report

Additions - (1)

Removals - (6)

Version Upgrades - (15)

Version Downgrades - (0)

Uh oh!

TheAngryByrd left a comment

Choose a reason for hiding this comment

Uh oh!

1eyewonder commented Sep 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

1eyewonder commented Sep 17, 2024 •

edited

Loading

brianrourkeboll Sep 22, 2024 •

edited

Loading