From 7a27ae4232b87157c46399baab538e5a589639b9 Mon Sep 17 00:00:00 2001 From: Alexander Falzberger Date: Tue, 31 Mar 2020 16:33:08 +0200 Subject: [PATCH 1/3] Fixing some typos in mdbook and readme files --- mdbook/src/chapter_0/chapter_0_1.md | 2 +- mdbook/src/chapter_1/chapter_1_2.md | 2 +- mdbook/src/chapter_2/chapter_2_7.md | 4 ++-- mdbook/src/chapter_4/chapter_4_1.md | 2 +- mdbook/src/introduction.md | 2 +- server/README.md | 2 +- 6 files changed, 7 insertions(+), 7 deletions(-) diff --git a/mdbook/src/chapter_0/chapter_0_1.md b/mdbook/src/chapter_0/chapter_0_1.md index 69aec0711..999d24d3f 100644 --- a/mdbook/src/chapter_0/chapter_0_1.md +++ b/mdbook/src/chapter_0/chapter_0_1.md @@ -1,6 +1,6 @@ ## Step 1: Write a program. -You write differential dataflow programs against apparently static input collections, with operations that look a bit like database (SQL) or big data (MapReduce) idioms. This is actually a bit of a trick, because you will have the ablity to change the input data, but we'll pretend we don't know that yet. +You write differential dataflow programs against apparently static input collections, with operations that look a bit like database (SQL) or big data (MapReduce) idioms. This is actually a bit of a trick, because you will have the ability to change the input data, but we'll pretend we don't know that yet. Let's write a program with one input: a collection `manages` of pairs `(manager, person)` describing people and their direct reports. Our program will determine for each person their manager's manager (where the boss manages the boss's own self). If you are familiar with SQL, this is an "equijoin", and we will write exactly that in differential dataflow. 
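Reviewer note: the skip-level computation described in the context above can be sketched without any dataflow machinery. This is a plain-Rust illustration, not differential dataflow code. The function name `skip_level` is hypothetical, and the input rule `(person / 2, person)` is assumed from the loading loop that appears later in this series. It pairs each `(m2, m1)` with each `(m1, p)` and emits `(m1, (m2, p))`.

```rust
/// Hypothetical helper: a naive nested-loop equijoin of `manages` with itself.
/// For every (m2, m1) and (m1, p) it emits (m1, (m2, p)).
fn skip_level(manages: &[(usize, usize)]) -> Vec<(usize, (usize, usize))> {
    let mut out = Vec::new();
    for &(m2, m1) in manages {
        for &(manager, p) in manages {
            if manager == m1 {
                out.push((m1, (m2, p)));
            }
        }
    }
    // Sort only so the printed order is stable.
    out.sort();
    out
}

fn main() {
    // Assumed input rule: person / 2 manages person.
    let manages: Vec<(usize, usize)> = (0..10).map(|p| (p / 2, p)).collect();
    for record in skip_level(&manages) {
        println!("{:?}", record);
    }
}
```

For ten people this yields ten records, from `(0, (0, 0))` through `(4, (2, 9))`, which are the data fields of the triples in the updated sample output in patch 2/3.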
diff --git a/mdbook/src/chapter_1/chapter_1_2.md b/mdbook/src/chapter_1/chapter_1_2.md index ff3a6835d..6dcea40f4 100644 --- a/mdbook/src/chapter_1/chapter_1_2.md +++ b/mdbook/src/chapter_1/chapter_1_2.md @@ -92,6 +92,6 @@ The `reduce` operator applies to one input collection whose records have the for There are some subtle details here, ones that will likely trip you up (as they trip up me): -The second and third arguments (the input and output, here `packages` and `duplicates`) contain pairs `(val, count)`. This is great when we want to count things that occur many times (in that `("word", 1000000)` is more succint than one million copies of `"word"), but in casual use we need to remember that even when we expect the numbers to be mostly one, we need to use them. +The second and third arguments (the input and output, here `packages` and `duplicates`) contain pairs `(val, count)`. This is great when we want to count things that occur many times (in that `("word", 1000000)` is more succinct than one million copies of `"word"), but in casual use we need to remember that even when we expect the numbers to be mostly one, we need to use them. In actual fact the input (`packages`) contains pairs of type `(&Val, Count)`, which in Rust-isms mean that you only get to view the associated value, you do not get to take ownership of it. This means that if we want to reproduce it in the output we need to do something like `.clone()` to get a new copy. If it were a string, or had other allocated data behind it, our read-only access to that data means we need to spend the time to create new copies for the output. \ No newline at end of file diff --git a/mdbook/src/chapter_2/chapter_2_7.md b/mdbook/src/chapter_2/chapter_2_7.md index 2cfaa2e63..4ab8b47c2 100644 --- a/mdbook/src/chapter_2/chapter_2_7.md +++ b/mdbook/src/chapter_2/chapter_2_7.md @@ -2,7 +2,7 @@ The `iterate` operator takes a starting input collection and a closure to repeatedly apply to this input collection. 
The output of the iterate operator is the collection that results from an unbounded number of applications of this closure to the input. Ideally this process converges, as otherwise the computation will run forever! -As an example, we can take our `manages` relation and determine for all employees all managers above them in the organizational chat. To do this, we start from the `manages` relation and write a closure that extends any transitive management pairs by "one hop" along the management relation, usig a join operation. +As an example, we can take our `manages` relation and determine for all employees all managers above them in the organizational chat. To do this, we start from the `manages` relation and write a closure that extends any transitive management pairs by "one hop" along the management relation, using a join operation. ```rust,no_run manages // transitive contains (manager, person) for many hops. @@ -26,7 +26,7 @@ Although the first three lines of the closure may look like our skip-level manag ### Enter -The `enter` operator is a helpful method that brings collections outside a loop into the loop, unchanging as the iterations procede. +The `enter` operator is a helpful method that brings collections outside a loop into the loop, unchanging as the iterations proceed. In the example above, we could rewrite diff --git a/mdbook/src/chapter_4/chapter_4_1.md b/mdbook/src/chapter_4/chapter_4_1.md index 140edde57..738aa54d1 100644 --- a/mdbook/src/chapter_4/chapter_4_1.md +++ b/mdbook/src/chapter_4/chapter_4_1.md @@ -4,7 +4,7 @@ Graph computation covers a lot of ground, and we will pick just one example here Imagine you have a collection containing pairs `(source, target)` of graph edges, and you would like to determine which nodes can reach which other nodes along graph edges (using either direction). 
-One algorithm for this graph connectively is "label propagation", in which each graph node maintains a label (initially its own name) and all nodes repeatedly exchange labels and maintain the smallest label they have yet seen. This process converges to a limit where each node has the smallest label in its connected component. +One algorithm for this graph connectivity is "label propagation", in which each graph node maintains a label (initially its own name) and all nodes repeatedly exchange labels and maintain the smallest label they have yet seen. This process converges to a limit where each node has the smallest label in its connected component. Let's write this computation starting from a collection `edges`, using differential dataflow. diff --git a/mdbook/src/introduction.md b/mdbook/src/introduction.md index c73832a53..6daf96810 100644 --- a/mdbook/src/introduction.md +++ b/mdbook/src/introduction.md @@ -8,4 +8,4 @@ This relatively simple set-up, write programs and then change inputs, leads to a --- -Differential dataflow arose from [work at Microsoft Research](https://www.microsoft.com/en-us/research/wp-content/uploads/2013/11/naiad_sosp2013.pdf), where we aimed to build a high-level framework that could both compute and incrementally maintain non-trivial algoithms. \ No newline at end of file +Differential dataflow arose from [work at Microsoft Research](https://www.microsoft.com/en-us/research/wp-content/uploads/2013/11/naiad_sosp2013.pdf), where we aimed to build a high-level framework that could both compute and incrementally maintain non-trivial algorithms. 
\ No newline at end of file diff --git a/server/README.md b/server/README.md index da5d99cdb..7072c63dc 100644 --- a/server/README.md +++ b/server/README.md @@ -65,7 +65,7 @@ The first line you'll see may look like so (it will depend on the performance of delays: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 6, 16, 18, 64, 66, 0, 9, 0, 0, 0, 0, 0, 0, 0] -These counts report the number of observed latencies for each power-of-two number of microseconds. It seems that the lowest latency here is `(1 << 17)` microsecends, or roughly 131 milliseconds. That is a large number, but what is going on here is that the first line is the `degr_dist` computation catching up on historical data. Subsequent lines should look better: +These counts report the number of observed latencies for each power-of-two number of microseconds. It seems that the lowest latency here is `(1 << 17)` microseconds, or roughly 131 milliseconds. That is a large number, but what is going on here is that the first line is the `degr_dist` computation catching up on historical data. 
Subsequent lines should look better: delays: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] delays: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] From a5cf5fb4a63196a05ead31e48e85fb3f5dd5eb62 Mon Sep 17 00:00:00 2001 From: Alexander Falzberger Date: Wed, 1 Apr 2020 11:24:45 +0200 Subject: [PATCH 2/3] Fixing introductory examples in mdbook --- mdbook/src/chapter_0/chapter_0_1.md | 20 +++---- mdbook/src/chapter_0/chapter_0_2.md | 90 ++++++++++++++--------------- mdbook/src/chapter_a/chapter_a_2.md | 10 ++-- mdbook/src/chapter_a/chapter_a_3.md | 12 ++-- 4 files changed, 66 insertions(+), 66 deletions(-) diff --git a/mdbook/src/chapter_0/chapter_0_1.md b/mdbook/src/chapter_0/chapter_0_1.md index 999d24d3f..81a9f84dd 100644 --- a/mdbook/src/chapter_0/chapter_0_1.md +++ b/mdbook/src/chapter_0/chapter_0_1.md @@ -59,16 +59,16 @@ When we execute this program we get to see the skip-level reports for the small Echidnatron% cargo run -- 10 Running `target/debug/my_project` - ((0, 0, 0), (Root, 0), 1) - ((0, 0, 1), (Root, 0), 1) - ((1, 0, 2), (Root, 0), 1) - ((1, 0, 3), (Root, 0), 1) - ((2, 1, 4), (Root, 0), 1) - ((2, 1, 5), (Root, 0), 1) - ((3, 1, 6), (Root, 0), 1) - ((3, 1, 7), (Root, 0), 1) - ((4, 2, 8), (Root, 0), 1) - ((4, 2, 9), (Root, 0), 1) + ((0, (0, 0)), 0, 1) + ((0, (0, 1)), 0, 1) + ((1, (0, 2)), 0, 1) + ((1, (0, 3)), 0, 1) + ((2, (1, 4)), 0, 1) + ((2, (1, 5)), 0, 1) + ((3, (1, 6)), 0, 1) + ((3, (1, 7)), 0, 1) + ((4, (2, 8)), 0, 1) + ((4, (2, 9)), 0, 1) Echidnatron% This is a bit crazy, but what we are seeing is many triples of the form diff --git a/mdbook/src/chapter_0/chapter_0_2.md b/mdbook/src/chapter_0/chapter_0_2.md index ac9a910b3..2fb18f8c2 100644 --- a/mdbook/src/chapter_0/chapter_0_2.md +++ b/mdbook/src/chapter_0/chapter_0_2.md @@ -35,65 +35,65 @@ We do this for each of the non-boss employees and get to see a bunch of outputs. 
Echidnatron% cargo run -- 10 Running `target/debug/my_project` - ((0, 0, 0), (Root, 0), 1) - ((0, 0, 1), (Root, 0), 1) - ((0, 0, 2), (Root, 2), 1) - ((1, 0, 2), (Root, 0), 1) - ((1, 0, 2), (Root, 2), -1) - ((1, 0, 3), (Root, 0), 1) - ((1, 0, 4), (Root, 4), 1) - ((1, 0, 5), (Root, 5), 1) - ((2, 0, 4), (Root, 2), 1) - ((2, 0, 4), (Root, 4), -1) - ((2, 0, 5), (Root, 2), 1) - ((2, 0, 5), (Root, 5), -1) - ((2, 0, 6), (Root, 6), 1) - ((2, 0, 7), (Root, 7), 1) - ((2, 0, 8), (Root, 8), 1) - ((2, 1, 4), (Root, 0), 1) - ((2, 1, 4), (Root, 2), -1) - ((2, 1, 5), (Root, 0), 1) - ((2, 1, 5), (Root, 2), -1) - ((3, 1, 6), (Root, 0), 1) - ((3, 1, 6), (Root, 6), -1) - ((3, 1, 7), (Root, 0), 1) - ((3, 1, 7), (Root, 7), -1) - ((3, 1, 9), (Root, 9), 1) - ((4, 1, 8), (Root, 4), 1) - ((4, 1, 8), (Root, 8), -1) - ((4, 1, 9), (Root, 4), 1) - ((4, 1, 9), (Root, 9), -1) - ((4, 2, 8), (Root, 0), 1) - ((4, 2, 8), (Root, 4), -1) - ((4, 2, 9), (Root, 0), 1) - ((4, 2, 9), (Root, 4), -1) + ((0, (0, 0)), 0, 1) + ((0, (0, 1)), 0, 1) + ((0, (0, 2)), 2, 1) + ((1, (0, 2)), 0, 1) + ((1, (0, 2)), 2, -1) + ((1, (0, 3)), 0, 1) + ((1, (0, 4)), 4, 1) + ((1, (0, 5)), 5, 1) + ((2, (0, 4)), 2, 1) + ((2, (0, 4)), 4, -1) + ((2, (0, 5)), 2, 1) + ((2, (0, 5)), 5, -1) + ((2, (0, 6)), 6, 1) + ((2, (0, 7)), 7, 1) + ((2, (0, 8)), 8, 1) + ((2, (1, 4)), 0, 1) + ((2, (1, 4)), 2, -1) + ((2, (1, 5)), 0, 1) + ((2, (1, 5)), 2, -1) + ((3, (1, 6)), 0, 1) + ((3, (1, 6)), 6, -1) + ((3, (1, 7)), 0, 1) + ((3, (1, 7)), 7, -1) + ((3, (1, 9)), 9, 1) + ((4, (1, 8)), 4, 1) + ((4, (1, 8)), 8, -1) + ((4, (1, 9)), 4, 1) + ((4, (1, 9)), 9, -1) + ((4, (2, 8)), 0, 1) + ((4, (2, 8)), 4, -1) + ((4, (2, 9)), 0, 1) + ((4, (2, 9)), 4, -1) Echidnatron% Gaaaaaaah! What in the !#$!? -It turns out our input changes result in output changes. Let's try and break this down and make some sense. If we group the columns by time, those `(Root, _)` fields, we see a bit more structure. +It turns out our input changes result in output changes. 
Let's try and break this down and make some sense. If we group the columns by time, the second element of the tuples, we see a bit more structure. -1. The `(Root, 0)` entries are exactly the same as for our prior computation, where we just loaded the data. +1. The entries with time `0` are exactly the same as for our prior computation, where we just loaded the data. -2. There aren't any `(Root, 1)` entries (go check). That is because the input didn't change in our first step, because 1/2 == 1/3 == 0. Since the input didn't change, the output doesn't change. +2. There aren't any entries at time `1` (go check). That is because the input didn't change in our first step, because 1/2 == 1/3 == 0. Since the input didn't change, the output doesn't change. 3. The other times are more complicated. -Let's look at times `(Root, 4)`. +Let's look at the entries for time `4`. - ((1, 0, 4), (Root, 4), 1) - ((2, 0, 4), (Root, 4), -1) - ((4, 1, 8), (Root, 4), 1) - ((4, 1, 9), (Root, 4), 1) - ((4, 2, 8), (Root, 4), -1) - ((4, 2, 9), (Root, 4), -1) + ((1, (0, 4)), 4, 1) + ((2, (0, 4)), 4, -1) + ((4, (1, 8)), 4, 1) + ((4, (1, 9)), 4, 1) + ((4, (2, 8)), 4, -1) + ((4, (2, 9)), 4, -1) There is a bit going on here. Four's manager changed from two to one, and while their skip-level manager remained zero the explanation changed. The first two lines record this change. The next four lines record the change in the skip-level manager of four's reports, eight and nine. -At the end, `(Root, 9)`, things are a bit simpler because we have reached the employees with no reports, and so the only changes are their skip-level manager, without any implications for other people. +At the end, time `9`, things are a bit simpler because we have reached the employees with no reports, and so the only changes are their skip-level manager, without any implications for other people. - ((3, 1, 9), (Root, 9), 1) - ((4, 1, 9), (Root, 9), -1) + ((3, (1, 9)), 9, 1) + ((4, (1, 9)), 9, -1) Oof. 
Well, we probably *could* have figured these things out by hand, right? diff --git a/mdbook/src/chapter_a/chapter_a_2.md b/mdbook/src/chapter_a/chapter_a_2.md index bfe7f6ba1..fab3500b1 100644 --- a/mdbook/src/chapter_a/chapter_a_2.md +++ b/mdbook/src/chapter_a/chapter_a_2.md @@ -5,7 +5,7 @@ Differential dataflow works great using multiple threads and computers. It even For this to work out, we'll want to ask each worker to load up a fraction of the input. If we just run the same code with multiple workers, then each of the workers will run ```rust,ignore - for person in 0 .. people { + for person in 0 .. size { input.insert((person/2, person)); } ``` @@ -16,7 +16,7 @@ Instead, each timely dataflow worker has methods `index()` and `peers()`, which ```rust,ignore let mut person = worker.index(); - while person < people { + while person < size { input.insert((person/2, person)); person += worker.peers(); } @@ -25,12 +25,12 @@ Instead, each timely dataflow worker has methods `index()` and `peers()`, which We can also make the same changes to the code that supplies the change, where each worker is responsible for those people whose number equals `worker.index()` modulo `worker.peers()`. 
```rust,ignore - let mut person = index; - while person < people { + let mut person = worker.index(); + while person < size { input.remove((person/2, person)); input.insert((person/3, person)); input.advance_to(person); - person += peers; + person += worker.peers(); } ``` diff --git a/mdbook/src/chapter_a/chapter_a_3.md b/mdbook/src/chapter_a/chapter_a_3.md index 6ee756440..6b15735d9 100644 --- a/mdbook/src/chapter_a/chapter_a_3.md +++ b/mdbook/src/chapter_a/chapter_a_3.md @@ -22,7 +22,7 @@ We can then use this probe to limit the introduction of new data, by waiting for ```rust,ignore let mut person = worker.index(); - while person < people { + while person < size { input.insert((person/2, person)); person += worker.peers(); } @@ -31,7 +31,7 @@ We can then use this probe to limit the introduction of new data, by waiting for input.advance_to(1); input.flush(); while probe.less_than(&input.time()) { worker.step(); } - println!("{:?}\tdata loaded", timer.elapsed()); + println!("{:?}\tdata loaded", worker.timer().elapsed()); ``` These four new lines are each important, especially the one that prints things out. The other three do a bit of magic that get timely dataflow to work for us until we are certain that inputs have been completely processed. @@ -40,15 +40,15 @@ We can make the same changes for the interactive loading, but we'll synchronize ```rust,ignore // make changes, but await completion. 
- let mut person = 1 + index; - while person < people { + let mut person = 1 + worker.index(); + while person < size { input.remove((person/2, person)); input.insert((person/3, person)); input.advance_to(person); input.flush(); while probe.less_than(&input.time()) { worker.step(); } - println!("{:?}\tstep {} complete", timer.elapsed(), person); - person += peers; + println!("{:?}\tstep {} complete", worker.timer().elapsed(), person); + person += worker.peers(); } ``` From e2fb9d5f35500b8b32d4cdb1e4ca300204de4561 Mon Sep 17 00:00:00 2001 From: Alexander Falzberger Date: Thu, 2 Apr 2020 11:56:00 +0200 Subject: [PATCH 3/3] Updating mdbook to timely 0.11.1 and differential 0.11.0 --- mdbook/src/chapter_0/chapter_0_0.md | 4 ++-- mdbook/src/chapter_0/chapter_0_1.md | 4 ++-- mdbook/src/chapter_0/chapter_0_3.md | 2 +- mdbook/src/chapter_2/chapter_2_3.md | 4 ++-- mdbook/src/chapter_2/chapter_2_4.md | 6 +++--- mdbook/src/chapter_2/chapter_2_5.md | 4 ++-- mdbook/src/chapter_2/chapter_2_6.md | 2 +- mdbook/src/chapter_2/chapter_2_7.md | 6 +++--- mdbook/src/chapter_3/chapter_3_2.md | 2 +- mdbook/src/chapter_4/chapter_4_1.md | 2 +- mdbook/src/chapter_5/chapter_5_3.md | 2 -- mdbook/src/chapter_a/chapter_a_3.md | 2 +- 12 files changed, 19 insertions(+), 21 deletions(-) diff --git a/mdbook/src/chapter_0/chapter_0_0.md b/mdbook/src/chapter_0/chapter_0_0.md index f4fc71315..a8ea8c523 100644 --- a/mdbook/src/chapter_0/chapter_0_0.md +++ b/mdbook/src/chapter_0/chapter_0_0.md @@ -21,8 +21,8 @@ Instead, edit your `Cargo.toml` file, which tells Rust about your dependencies, authors = ["Your Name "] [dependencies] - timely = "0.7" - differential-dataflow = "0.7" + timely = "0.11.1" + differential-dataflow = "0.11.0" Echidnatron% You should only need to add those last two lines there, which bring in dependencies on both [timely dataflow](https://github.com/TimelyDataflow/timely-dataflow) and [differential dataflow](https://github.com/TimelyDataflow/differential-dataflow). 
We will be using both of those. diff --git a/mdbook/src/chapter_0/chapter_0_1.md b/mdbook/src/chapter_0/chapter_0_1.md index 81a9f84dd..60b16e51a 100644 --- a/mdbook/src/chapter_0/chapter_0_1.md +++ b/mdbook/src/chapter_0/chapter_0_1.md @@ -27,7 +27,7 @@ If you are following along at home, put this in your `src/main.rs` file. // create a new collection from our input. let manages = input.to_collection(scope); - // if (m2, m1) and (m1, p), then output (m1, m2, p) + // if (m2, m1) and (m1, p), then output (m1, (m2, p)) manages .map(|(m2, m1)| (m1, m2)) .join(&manages) @@ -50,7 +50,7 @@ If you are following along at home, put this in your `src/main.rs` file. This program has a bit of boilerplate, but at its heart it defines a new input `manages` and then joins it with itself, once the fields have been re-ordered. The intent is as stated in the comment: ```rust,no_run - // if (m2, m1) and (m1, p), then output (m1, m2, p) + // if (m2, m1) and (m1, p), then output (m1, (m2, p)) ``` We want to report each pair `(m2, p)`, and we happen to also produce as evidence the `m1` connecting them. diff --git a/mdbook/src/chapter_0/chapter_0_3.md b/mdbook/src/chapter_0/chapter_0_3.md index 68fe1f216..c8999e5fb 100644 --- a/mdbook/src/chapter_0/chapter_0_3.md +++ b/mdbook/src/chapter_0/chapter_0_3.md @@ -68,7 +68,7 @@ Instead of loading all of our changes and only waiting for the result, we can lo // create a new collection from an input session. 
let manages = input.to_collection(scope); - // if (m2, m1) and (m1, p), then output (m1, m2, p) + // if (m2, m1) and (m1, p), then output (m1, (m2, p)) manages .map(|(m2, m1)| (m1, m2)) .join(&manages) diff --git a/mdbook/src/chapter_2/chapter_2_3.md b/mdbook/src/chapter_2/chapter_2_3.md index 7bb921002..0a7782b66 100644 --- a/mdbook/src/chapter_2/chapter_2_3.md +++ b/mdbook/src/chapter_2/chapter_2_3.md @@ -14,7 +14,7 @@ This collection likely has at most one copy of each record, unless perhaps any m Importantly, `concat` doesn't do the hard work of ensuring that there is only one physical of each element. If we inspect the output of the `concat` above, we might see - ((0,0), (Root, 0), 1) - ((0,0), (Root, 0), 1) + ((0, 0), 0, 1) + ((0, 0), 0, 1) Although these are two updates to the same element at the same time, `concat` is a bit lazy (read: efficient) and doesn't do the hard work until we ask it. For that, we'll need the `consolidate` operator. \ No newline at end of file diff --git a/mdbook/src/chapter_2/chapter_2_4.md b/mdbook/src/chapter_2/chapter_2_4.md index a7d1d9782..f61cebce6 100644 --- a/mdbook/src/chapter_2/chapter_2_4.md +++ b/mdbook/src/chapter_2/chapter_2_4.md @@ -15,8 +15,8 @@ As an example, if we were to inspect we might see two copies of the same element: - ((0,0), (Root, 0), 1) - ((0,0), (Root, 0), 1) + ((0, 0), 0, 1) + ((0, 0), 0, 1) However, by introducing `consolidate` @@ -30,6 +30,6 @@ However, by introducing `consolidate` we are guaranteed to see at most one `(0,0)` update at each time: - ((0,0), (Root, 0), 2) + ((0, 0), 0, 2) The `consolidate` operator is mostly useful before `inspect`ing data, but it can also be important for efficiency; knowing when to spend the additional computation to consolidate the representation of your data is an advanced topic! 
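Reviewer note: the effect of `consolidate` shown in the hunk above can be modelled in plain Rust. This is only a sketch of the observable behaviour, not the operator's implementation. The name `consolidate_updates` and the concrete integer types are assumptions made for illustration. Updates are `(data, time, diff)` triples, matching the new output format in this patch.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for what consolidation achieves: sum the diffs of
/// updates that share the same (data, time), and drop sums that reach zero.
fn consolidate_updates(updates: &[((u32, u32), u32, i64)]) -> Vec<((u32, u32), u32, i64)> {
    let mut sums: BTreeMap<((u32, u32), u32), i64> = BTreeMap::new();
    for &(data, time, diff) in updates {
        *sums.entry((data, time)).or_insert(0) += diff;
    }
    sums.into_iter()
        .filter(|&(_, diff)| diff != 0)
        .map(|((data, time), diff)| (data, time, diff))
        .collect()
}

fn main() {
    // Two physical copies of the same update, as `concat` might produce.
    let updates: Vec<((u32, u32), u32, i64)> = vec![((0, 0), 0, 1), ((0, 0), 0, 1)];
    println!("{:?}", consolidate_updates(&updates)); // prints [((0, 0), 0, 2)]
}
```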
diff --git a/mdbook/src/chapter_2/chapter_2_5.md b/mdbook/src/chapter_2/chapter_2_5.md index 2b513637c..48a45ba79 100644 --- a/mdbook/src/chapter_2/chapter_2_5.md +++ b/mdbook/src/chapter_2/chapter_2_5.md @@ -1,6 +1,6 @@ ## The Join Operator -The `join` operator takes two input collections, each of which must have records with a `(key, value)` structure, and must have the same type of `key`. For each pair of elements with matching key, one from each input, the join operator produces the output `(key, value1, value2)`. +The `join` operator takes two input collections, each of which must have records with a `(key, value)` structure, and must have the same type of `key`. For each pair of elements with matching key, one from each input, the join operator produces the output `(key, (value1, value2))`. Our example from earlier uses a join to match up pairs `(m2, m1)` and `(m1, p)` when the `m1` is in common. To do this, we first have to switch the records in the first collection around, so that they are keyed by `m1` instead of `m2`. @@ -11,4 +11,4 @@ Our example from earlier uses a join to match up pairs `(m2, m1)` and `(m1, p)` .inspect(|x| println!("{:?}", x)); ``` -The join operator multiplies frequencies, so if a record `(key, val1)` has multiplicity five, and a matching record `(key, val2)` has multiplicity three, the output result will be `(key, val1, val2)` with multiplicity fifteen. +The join operator multiplies frequencies, so if a record `(key, val1)` has multiplicity five, and a matching record `(key, val2)` has multiplicity three, the output result will be `(key, (val1, val2))` with multiplicity fifteen. 
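Reviewer note: the nested output shape `(key, (val1, val2))` and the multiplication of frequencies described in the hunk above can be checked with a small plain-Rust model. This is a sketch under assumptions: `Weighted` and `join_weighted` are hypothetical names, and a collection is modelled as a list of `(record, multiplicity)` pairs rather than as a differential dataflow collection.

```rust
/// Hypothetical model of a collection as (record, multiplicity) pairs.
type Weighted<T> = Vec<(T, i64)>;

/// Naive join: for matching keys, emit (key, (val1, val2)) and multiply
/// the two multiplicities.
fn join_weighted(
    left: &Weighted<(u32, &'static str)>,
    right: &Weighted<(u32, &'static str)>,
) -> Weighted<(u32, (&'static str, &'static str))> {
    let mut out = Vec::new();
    for &((k1, v1), c1) in left {
        for &((k2, v2), c2) in right {
            if k1 == k2 {
                out.push(((k1, (v1, v2)), c1 * c2));
            }
        }
    }
    out
}

fn main() {
    // Multiplicity five on the left, three on the right.
    let left = vec![((7, "val1"), 5)];
    let right = vec![((7, "val2"), 3)];
    println!("{:?}", join_weighted(&left, &right)); // prints [((7, ("val1", "val2")), 15)]
}
```

Closures downstream of a join therefore destructure a nested tuple, as in the `|(m1, (mk, p))|` patterns introduced elsewhere in this patch.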
diff --git a/mdbook/src/chapter_2/chapter_2_6.md b/mdbook/src/chapter_2/chapter_2_6.md index bc3310443..ec8fbb0e5 100644 --- a/mdbook/src/chapter_2/chapter_2_6.md +++ b/mdbook/src/chapter_2/chapter_2_6.md @@ -11,7 +11,7 @@ For example, to produce for each manager their managee with the lowest identifie // Each element of input is a `(&Value, Count)` for index in 1 .. input.len() { - if input[min_index] > input[index].0 { + if input[min_index].0 > input[index].0 { min_index = index; } } diff --git a/mdbook/src/chapter_2/chapter_2_7.md b/mdbook/src/chapter_2/chapter_2_7.md index 4ab8b47c2..f593e1ff7 100644 --- a/mdbook/src/chapter_2/chapter_2_7.md +++ b/mdbook/src/chapter_2/chapter_2_7.md @@ -10,7 +10,7 @@ As an example, we can take our `manages` relation and determine for all employee transitive .map(|(mk, m1)| (m1, mk)) .join(&transitive) - .map(|(m1, mk, p)| (mk, p)) + .map(|(m1, (mk, p))| (mk, p)) .concat(&transitive) .distinct() }); @@ -34,12 +34,12 @@ In the example above, we could rewrite manages // transitive contains (manager, person) for many hops. .iterate(|transitive| { - let manages = manages.enter(transivite.scope()); + let manages = manages.enter(transitive.scope()); transitive .map(|(mk, m1)| (m1, mk)) .join(&manages) - .map(|(m1, mk, p)| (mk, p)) + .map(|(m1, (mk, p))| (mk, p)) .concat(&manages) .distinct() }); diff --git a/mdbook/src/chapter_3/chapter_3_2.md b/mdbook/src/chapter_3/chapter_3_2.md index 109637800..f71fcef36 100644 --- a/mdbook/src/chapter_3/chapter_3_2.md +++ b/mdbook/src/chapter_3/chapter_3_2.md @@ -13,7 +13,7 @@ For example, recall our example of interacting with our management computation, // create a new collection from an input session. 
let manages = input.to_collection(scope); - // if (m2, m1) and (m1, p), then output (m1, m2, p) + // if (m2, m1) and (m1, p), then output (m1, (m2, p)) manages .map(|(m2, m1)| (m1, m2)) .join(&manages) diff --git a/mdbook/src/chapter_4/chapter_4_1.md b/mdbook/src/chapter_4/chapter_4_1.md index 738aa54d1..fadce2b49 100644 --- a/mdbook/src/chapter_4/chapter_4_1.md +++ b/mdbook/src/chapter_4/chapter_4_1.md @@ -18,7 +18,7 @@ Let's write this computation starting from a collection `edges`, using different let labels = labels.enter(inner.scope()); let edges = edges.enter(inner.scope()); inner.join(&edges) - .map(|(_src,lbl,dst)| (dst,lbl)) + .map(|(_src,(lbl,dst))| (dst,lbl)) .concat(&labels) .reduce(|_dst, lbls, out| { let min_lbl = diff --git a/mdbook/src/chapter_5/chapter_5_3.md b/mdbook/src/chapter_5/chapter_5_3.md index 27c5602de..7d41bbb0c 100644 --- a/mdbook/src/chapter_5/chapter_5_3.md +++ b/mdbook/src/chapter_5/chapter_5_3.md @@ -2,8 +2,6 @@ Arrangements have the additional appealing property that they can be shared not only within a dataflow, but *across* dataflows. -Imagine we take our `knows` collection from before, and want to make it available for others to use - Imagine we want to build and maintain a relatively large and continually changing collection. But we want to do this in a way that allows an arbitrary number of subsequent queries to access the collection at almost no additional cost. The following example demonstrates going from an interactive input session (`input`) to an arrangement (`trace`) returned from the dataflow and available for use by others. diff --git a/mdbook/src/chapter_a/chapter_a_3.md b/mdbook/src/chapter_a/chapter_a_3.md index 6b15735d9..b87d2e55f 100644 --- a/mdbook/src/chapter_a/chapter_a_3.md +++ b/mdbook/src/chapter_a/chapter_a_3.md @@ -9,7 +9,7 @@ Instead of loading all of our changes and only waiting for the result, we can lo // create a new collection from an input session. 
let manages = input.to_collection(scope); - // if (m2, m1) and (m1, p), then output (m1, m2, p) + // if (m2, m1) and (m1, p), then output (m1, (m2, p)) manages .map(|(m2, m1)| (m1, m2)) .join(&manages)