Commit bd5eacd

author
Matthew Forrester
committed
Initial Commit
0 parents  commit bd5eacd

File tree

83 files changed

+24627
-0
lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
/target
**/*.rs.bk

Cargo.lock

Lines changed: 420 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
[package]
name = "wv_linewise"
version = "0.1.0"
authors = ["Matthew Forrester <matthew.forrester@speechmarks.com>"]
edition = "2018"

[dependencies]
web-view = { version = "0.6.2", features = ["edge"] }
serde_json = { version = "1.0", default-features = false, features = ["alloc"] }
serde_derive = "1.0.105"
serde = { version = "1.0", features = ["derive"] }
clap = "2.33.0"

README.md

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
# WV Linewise

## The Potential User

You're a software developer and UNIX system administrator.

You loooove the command line.

You have probably looked at gnuplot because it can draw graphs on the terminal, but feel its output isn't suitable for you.

You also write your SQL in Vim and have it integrated with tmux to send your queries to `psql`, and you get mildly annoyed by the lack of horizontal scrolling.

Maybe you use i3...

You write software for a living, and sometimes you write great code. Your code is sometimes deployed into big integrated projects, but sometimes it's only about getting data from A to B as a one-off. While it works, you wonder if `BLAH BLAH | SOMETHING | X | jq BLAH | Y | Z` might have worked, and might have been faster, if only you could add some interactivity to X.

You need one foot in both worlds... how...

## What it does

[Boscop's Web-View](https://github.com/Boscop/web-view) has provided us with a lightweight library for showing HTML/CSS/JS within a window, using the OS's standard browser. If you know HTML/CSS/JS but shudder at the weight of Electron, this may be your thing. As a bonus, it's written in Rust too!

This project provides some Rust functions for streaming STDIN or files into Boscop's Web-View and a higher-level TypeScript / JavaScript API to pull them in as a stream of lines. You can also write to STDOUT using this API, enabling you to put something like a web page right in the middle of a UNIX pipeline. I find this quite exciting.

Can you think of anything you do with UNIX pipes that you would like to add some interactivity or graphics to? I can think of many...

## Example

This program could be invoked like the following:


    # Prepare lookups.csv
    echo '"First Name","Telephone Number"' > lookups.csv
    echo 'Alice,"01922 123 456"' >> lookups.csv
    echo 'Ben,"0800 100 232"' >> lookups.csv
    echo 'Jack,"01882 556216"' >> lookups.csv

    # Write a file that we will pipe to STDIN
    echo 'Name,Age' > stdin.csv
    echo 'Ben,21' >> stdin.csv
    echo 'Alice,31' >> stdin.csv
    echo 'Jane,27' >> stdin.csv

    cat stdin.csv | wv-linewise \
        --code tables \
        --stream this_is_stdin=- \
        --stream lookups=lookups.csv \
        --param this_is_stdin='["Name"]' \
        --param lookups='["First Name"]' \
        --param request_count=2


The data input above would cause the following messages to be sent between WV Linewise and the embedded web page.

    < { "msg": "params" }
    > { "type": "params", "params": [{ "name": "this_is_stdin", "value": "[\"Name\"]" }, { "name": "lookups", "value": "[\"First Name\"]" }] }
    < {"msg":"streamList"}
    > {"streams":["this_is_stdin","lookups"],"type":"streamList"}
    < { "msg": "out", "descriptor": 1, "data": "Line Number,Name,Age,Telephone Number" }
    < { "msg": "streamStart", "name": "this_is_stdin", "count": 2 }
    < { "msg": "streamStart", "name": "lookups", "count": 2 }
    > { "type": "details", "name": "this_is_stdin", "details": { "rewindable": false } }
    > { "type": "details", "name": "lookups", "details": { "rewindable": false } }
    > { "type": "line", "name": "this_is_stdin", "data": "Name,Age" }
    > { "type": "line", "name": "this_is_stdin", "data": "Ben,21" }
    > { "type": "paused", "name": "this_is_stdin" }
    > { "type": "line", "name": "lookups", "data": "\"First Name\",\"Telephone Number\"" }
    < { "msg": "streamContinue", "name": "this_is_stdin" }
    > { "type": "line", "name": "lookups", "data": "Alice,\"01922 123 456\"" }
    > { "type": "paused", "name": "lookups" }
    > { "type": "line", "name": "this_is_stdin", "data": "Alice,31" }
    < { "msg": "out", "descriptor": 1, "data": "2,Alice,31,\"01922 123 456\"" }
    > { "type": "line", "name": "this_is_stdin", "data": "Jane,27" }
    > { "type": "finished", "name": "this_is_stdin" }
    < { "msg": "streamContinue", "name": "lookups" }
    > { "type": "line", "name": "lookups", "data": "Ben,\"0800 100 232\"" }
    < { "msg": "out", "descriptor": 1, "data": "1,Ben,21,\"0800 100 232\"" }
    > { "type": "line", "name": "lookups", "data": "Jack,\"01882 556216\"" }
    > { "type": "finished", "name": "lookups" }
    < { "msg": "exit", "status": 0 }

NOTE: Lines starting with `<` are messages from TypeScript / JavaScript to WV Linewise; lines starting with `>` are the responses.

WV Linewise will then exit with a status code of 0 and the following data will have already been written to STDOUT:

    Line Number,Name,Age,Telephone Number
    2,Alice,31,"01922 123 456"
    1,Ben,21,"0800 100 232"


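To make the shape of that conversation easier to see, here is a minimal sketch of the messages as TypeScript discriminated unions. The type names `WvRequest` and `WvResponse` and the `reply` function are hypothetical, invented for this illustration; only the field names are taken from the transcript above. The reply policy is deliberately simplified (resume every paused stream immediately, exit once every stream has finished) and is not what the `tables` page in the example does.

```typescript
// Hypothetical type names; the fields mirror the "<" messages in the transcript.
type WvRequest =
  | { msg: "params" }
  | { msg: "streamList" }
  | { msg: "streamStart"; name: string; count: number }
  | { msg: "streamContinue"; name: string }
  | { msg: "out"; descriptor: number; data: string }
  | { msg: "exit"; status: number };

// Hypothetical type names; the fields mirror the ">" messages in the transcript.
type WvResponse =
  | { type: "params"; params: { name: string; value: string }[] }
  | { type: "streamList"; streams: string[] }
  | { type: "details"; name: string; details: { rewindable: boolean } }
  | { type: "line"; name: string; data: string }
  | { type: "paused"; name: string }
  | { type: "finished"; name: string };

// Simplified policy (an assumption for illustration): continue a paused stream
// straight away and ask to exit once no stream remains open.
function reply(resp: WvResponse, openStreams: Set<string>): WvRequest | null {
  switch (resp.type) {
    case "paused":
      return { msg: "streamContinue", name: resp.name };
    case "finished":
      openStreams.delete(resp.name);
      return openStreams.size === 0 ? { msg: "exit", status: 0 } : null;
    default:
      return null;
  }
}
```

Because each union is keyed on `msg` or `type`, the compiler narrows the message inside each `case`, which is the kind of type safety over raw messages that the APIs below aim for.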
## APIs

There are two TypeScript / JavaScript APIs I created to control the sending and receiving of messages. These are listed below:

### The original Light Wrapper API

If for whatever reason you don't want to use the Buffer API, the original API is a lightweight message / event based API. In writing it I was merely trying to add some type safety over the raw messages. See the example below.

```typescript
import { RawWvLinewise, WvLinewise, RESPONSE_TYPE, MessageErrorResponse, ErrorResponse, ParamsResponse, LineResponse, PausedResponse } from "wv-linewise-js-lib";

async function processLightWeight() {

    let lineCount = 0;
    const wvl: WvLinewise = new WvLinewise(new RawWvLinewise(external as any));

    // Upon error, just raise it so it's caught by the global error handler.
    wvl.on(RESPONSE_TYPE.MESSAGE_ERROR, (msg: MessageErrorResponse) => {
        throw new Error(`MSG ERROR: ${JSON.stringify(msg)}`)
    });

    // Upon error, just raise it so it's caught by the global error handler.
    wvl.on(RESPONSE_TYPE.ERROR, (msg: ErrorResponse) => {
        throw new Error(`MSG ERROR: ${JSON.stringify(msg)}`)
    });

    // Request the parameters the user passed in on the command line
    function getParams(wvl: WvLinewise): Promise<ParamsResponse> {
        return new Promise((resolve) => {
            let f = (resp: ParamsResponse) => {
                resolve(resp);
            };
            wvl.once(RESPONSE_TYPE.PARAMS, f);
            wvl.requestParams();
        });
    }

    function getRequestQuantity(paramsResponse: ParamsResponse): number {
        for (let p of paramsResponse.params) {
            if (p.name == "quantity") {
                return parseInt(p.value, 10);
            }
        }
        return 1000;
    }

    // Because all of our code is blocking (we're not waiting for animations etc)
    // we're going to have processed the data immediately, so when WV Linewise
    // pauses we can just start it right up again.
    wvl.on(RESPONSE_TYPE.PAUSED, (resp: PausedResponse) => {
        if (resp.name == "in") {
            wvl.streamContinue("in");
        }
    });

    // This function will get fired on every line, with the line that came from
    // the "in" stream, which could be STDIN or a file.
    wvl.on(RESPONSE_TYPE.LINE, (resp: LineResponse) => {
        if (resp.name == "in") {
            lineCount = lineCount + 1;
        }
        document.body.innerText = `The file has ${lineCount} lines`
    });


    // Start WV Linewise processing lines
    wvl.streamStart("in", getRequestQuantity(await getParams(wvl)));

}

processLightWeight();
```

### The buffer API

The WvLinewiseBuffer allows you to disregard the messages when reading the streams. It uses a low watermark and a quantity to request (the 3rd and 4th parameters) to try to make sure there are always lines available. Because it's built upon Promises, it can handle situations where the buffer is empty without failing.

```typescript
// Assumes WvLinewiseBuffer is exported from the same package as the classes
// imported in the previous example.
import { RawWvLinewise, WvLinewise, WvLinewiseBuffer } from "wv-linewise-js-lib";

async function processBuffer(wvl: WvLinewise) {

    let buffer = new WvLinewiseBuffer(wvl, "in", 100, 200);
    let line: string|null = "";
    let lineCount = 0;

    // shift() resolves to null once the stream is exhausted.
    while (line !== null) {
        line = await buffer.shift();
        if (line === null) {
            continue;
        }
        lineCount = lineCount + 1;
        document.body.innerText = `The file has ${lineCount} lines and the last line was ${line}`;
    }
}

const wvl: WvLinewise = new WvLinewise(new RawWvLinewise(external as any));
processBuffer(wvl);
```

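The low watermark idea can be illustrated with a small stand-alone sketch. The class below is hypothetical and is not the library's `WvLinewiseBuffer`; the names `LowWatermarkBuffer`, `requestMore`, `paused` and `finish` are assumptions made for this illustration. It only shows the technique described above: ask for `quantity` more lines whenever fewer than `lowWatermark` are held, and hand out Promises so that reading from an empty buffer waits instead of failing.

```typescript
// Hypothetical stand-in for illustration; NOT the library's WvLinewiseBuffer.
class LowWatermarkBuffer {
  private lines: string[] = [];
  private waiting: Array<(line: string | null) => void> = [];
  private requested = false; // true while a request for more lines is outstanding
  private finished = false;
  private requestMore: (quantity: number) => void;
  private lowWatermark: number;
  private quantity: number;

  constructor(requestMore: (quantity: number) => void, lowWatermark: number, quantity: number) {
    this.requestMore = requestMore;
    this.lowWatermark = lowWatermark;
    this.quantity = quantity;
  }

  get size(): number {
    return this.lines.length;
  }

  // Called by the source for every line that arrives.
  push(line: string): void {
    const waiter = this.waiting.shift();
    if (waiter) {
      waiter(line); // a reader was already waiting, hand the line over directly
    } else {
      this.lines.push(line);
    }
  }

  // Called when the source pauses after delivering the requested quantity.
  paused(): void {
    this.requested = false;
    this.topUp();
  }

  // Called when the source has no more lines; pending readers receive null.
  finish(): void {
    this.finished = true;
    for (const waiter of this.waiting.splice(0)) {
      waiter(null);
    }
  }

  shift(): Promise<string | null> {
    const line = this.lines.shift();
    this.topUp();
    if (line !== undefined) return Promise.resolve(line);
    if (this.finished) return Promise.resolve(null);
    return new Promise<string | null>((resolve) => {
      this.waiting.push(resolve);
    });
  }

  // Request more lines when below the low watermark and nothing is in flight.
  private topUp(): void {
    if (!this.finished && !this.requested && this.lines.length < this.lowWatermark) {
      this.requested = true;
      this.requestMore(this.quantity);
    }
  }
}
```

In a real page, `requestMore` would presumably send a `streamStart` or `streamContinue` message, and `push`, `paused` and `finish` would be wired to the `line`, `paused` and `finished` responses.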
## Example Applications

### [Discover Types](./examples/discover-types)

#### About
DiscoverTypes is an application for identifying the types of fields within a CSV file. It does this by comparing every cell against every regular expression the user has supplied. Because the CSV file could be HUGE it does not load the whole thing into memory but inspects it line by line as it passes through.

#### Screenshot

![What it looks like](./img/screenshot.png)

examples/discover-types/.gitignore

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.

# dependencies
/node_modules
/.pnp
.pnp.js

# testing
/coverage

# production
/build

# misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*

examples/discover-types/README.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# Discover Types

## About

DiscoverTypes is an application for identifying the types of fields within a CSV file. It does this by comparing every cell against every regular expression the user has supplied. Because the CSV file could be HUGE it does not load the whole thing into memory but inspects it line by line as it passes through.

## Screenshot

![What it looks like](./img/screenshot.png)

## The Output

The following is an example of the CSV output:

| Column      | Types                                    | Count | Total Record Count |
|-------------|------------------------------------------|-------|--------------------|
| salesPerson | ["String"]                               | 13    | 13                 |
| date        | ["String","Full US Date","Full UK Date"] | 8     | 13                 |
| date        | ["NULL"]                                 | 1     | 13                 |
| date        | ["String","Full UK Date"]                | 4     | 13                 |
| orderId     | ["Integer","String"]                     | 13    | 13                 |
| item        | ["String"]                               | 13    | 13                 |
| cost        | ["String"]                               | 13    | 13                 |
| price       | ["Integer","String"]                     | 10    | 13                 |
| price       | ["String"]                               | 3     | 13                 |
| profit      | ["String"]                               | 13    | 13                 |

Every cell within a CSV could be interpreted as zero or more different types. Generally speaking I would use the regular expression `^$` (a zero length string) to mean `NULL` and `.` (at least one character in length) to mean a String. This means that `4` might be an integer (`^-?[0-9]+`) but could also be a string. This is completely correct, as you might have a string field in which users tend to write integers. An example of this may be local phone numbers, which are often just digits but sometimes include other characters, so are forced to be strings.

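As a rough sketch of that idea, the snippet below tests one cell against every regular expression and returns the names of all the types that match. The names `TypeDef`, `typeDefs` and `matchingTypes` are hypothetical and not taken from the Discover Types source; the three regular expressions are the ones from the "types" stream table in this README.

```typescript
// Hypothetical helper for illustration, not the Discover Types implementation.
interface TypeDef {
  name: string;
  regexp: RegExp;
}

// The same three definitions as the example "types" stream table.
const typeDefs: TypeDef[] = [
  { name: "Integer", regexp: /^-?[0-9]+/ },
  { name: "String", regexp: /./ },
  { name: "NULL", regexp: /^$/ },
];

// Return the name of every type whose regular expression matches the cell.
function matchingTypes(cell: string, defs: TypeDef[]): string[] {
  return defs.filter((d) => d.regexp.test(cell)).map((d) => d.name);
}
```

With these definitions `matchingTypes("4", typeDefs)` gives both `Integer` and `String`, while an empty cell gives only `NULL`, which is the pattern visible in the Types column of the output table.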
### Why we don't prefer one type over another

Imagine you had 14 million lines, one column was full of dates, and we preferred the US date format over the UK date format. The software would tell us we had 14 million (minus one) US dates and 1 UK date, and you may conclude that the column is US dates with one error... but if all the dates were before the 12th of the month except __that__ one, you'd probably be wrong.

This situation is actually shown in a much simpler form above. You can see that the date column must be one of the following:

* A "Full US Date" and therefore has 5 records in error.
* A "String" and therefore has 1 record in error.
* A "Full UK Date" and therefore has 1 record in error.
* **A "String" with null allowed and therefore has 0 records in error.**
* **A "Full UK Date" with null allowed and therefore has 0 records in error.**

Given the output above it should be relatively trivial to write software to figure out that the date column is actually a UK Date which allows nulls. The caveat is that such software needs to know to prefer "Full UK Date" over "String", which this software does not.

I may well add code and columns to:

* List out what the zero error candidates are. This would get you down to just "String" and "Full UK Date" in the above example.
* Add at least one example for each row in the table above.

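The error counts listed above can be reproduced from the output table with a few lines of code. This is only a sketch under stated assumptions: `TypeRow`, `errorCounts` and `zeroErrorCandidates` are hypothetical names, the rows are the three `date` rows from the example output, and a row containing `NULL` is assumed to match no other type.

```typescript
// Hypothetical helpers for illustration; not part of Discover Types.
interface TypeRow {
  types: string[];
  count: number;
}

// The three "date" rows from the example output table.
const dateRows: TypeRow[] = [
  { types: ["String", "Full US Date", "Full UK Date"], count: 8 },
  { types: ["NULL"], count: 1 },
  { types: ["String", "Full UK Date"], count: 4 },
];

// For each non-NULL type, count the records that do NOT match it.
// When allowNull is true, NULL records are not counted as errors.
function errorCounts(rows: TypeRow[], allowNull: boolean): Map<string, number> {
  const total = rows.reduce((sum, r) => sum + r.count, 0);
  const matched = new Map<string, number>();
  let nulls = 0;
  for (const row of rows) {
    if (row.types.indexOf("NULL") !== -1) nulls += row.count;
    for (const t of row.types) {
      if (t !== "NULL") matched.set(t, (matched.get(t) ?? 0) + row.count);
    }
  }
  const errors = new Map<string, number>();
  matched.forEach((n, t) => {
    errors.set(t, total - n - (allowNull ? nulls : 0));
  });
  return errors;
}

// The types that explain every record without error.
function zeroErrorCandidates(rows: TypeRow[], allowNull: boolean): string[] {
  const result: string[] = [];
  errorCounts(rows, allowNull).forEach((n, t) => {
    if (n === 0) result.push(t);
  });
  return result;
}
```

For the `date` column this gives 5 errors for "Full US Date" and 1 each for "String" and "Full UK Date" when nulls are not allowed, and leaves "String" and "Full UK Date" as the zero error candidates when they are. Choosing between those two still needs the preference rule that this software deliberately does not have.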
## Usage

```shell
cat test-data/burger-shop.csv | wv-linewise --code index.html \
    --stream in=- \
    --stream types=types.csv \
    --param 'Full US Date'='^0*[0-9][0-2]?/[0-9]+/[0-9]{4}$' \
    --param 'Full UK Date'='^[0-9]+/0*[0-9][0-2]?/[0-9]{4}$' \
    --param mode='continuous'
```

Because DiscoverTypes is built upon WV Linewise, its command line interface is the interface of WV Linewise. I could wrap the program in a BASH/BAT file to make it more slick, but DiscoverTypes is only an example application for WV Linewise, so I will not.

### The "in" stream (required)

The "in" stream is where the actual CSV you want to inspect comes from. It could be huge...

If you use the special value `-` WV Linewise will read STDIN, otherwise this will be taken to be a filename and that file will be read.

### The "types" stream (optional)

The "types" stream is a CSV with the following structure.

| name    | regexp    |
|---------|-----------|
| Integer | ^-?[0-9]+ |
| String  | .         |
| NULL    | ^$        |

The `name` is a type's name and `regexp` is a JavaScript regular expression without the enclosing forward slashes.

### The "type" parameters (optional)

The type parameters are added to the types from the "types" stream above, with the `name` and the `regexp` taking the same form. WV Linewise has a single `--param` command line argument, written in the form `--param 'name=value'`, so each type is passed as its own `--param`. This is a little verbose but at least quite specific.

The name must not conflict with any other parameter name; if it does, the type must be specified in full, for example `--param 'type=Full UK Date=^[0-9]+/0*[0-9][0-2]?/[0-9]{4}$'`. Type parameter names also cannot include the `=` symbol.

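One way to read those two forms is sketched below. This is an interpretation, not the Discover Types source: `Param`, `TypeParam`, `RESERVED` and `typeFromParam` are hypothetical names, the parameters are assumed to arrive as the `name` / `value` pairs shown in the WV Linewise `params` response, and `mode` is assumed to be the only parameter that is not a type.

```typescript
// Hypothetical sketch; an interpretation of the rules above, not the real code.
interface Param {
  name: string;
  value: string;
}

interface TypeParam {
  name: string;
  regexp: RegExp;
}

// Assumption: "mode" is the only non-type parameter described in this README.
const RESERVED = ["mode"];

function typeFromParam(p: Param): TypeParam | null {
  if (RESERVED.indexOf(p.name) !== -1) {
    return null; // not a type definition
  }
  if (p.name === "type") {
    // Full form: the value is "Type Name=regexp". Type names cannot contain
    // "=", so the first "=" separates the name from the regular expression.
    const i = p.value.indexOf("=");
    if (i < 1) {
      throw new Error(`Expected 'type=Name=regexp' but got 'type=${p.value}'`);
    }
    return { name: p.value.slice(0, i), regexp: new RegExp(p.value.slice(i + 1)) };
  }
  // Short form: the parameter name is the type name.
  return { name: p.name, regexp: new RegExp(p.value) };
}
```

Splitting on the first `=` is what makes the restriction on type names necessary: the regular expression itself is free to contain `=`, but the name is not.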
### The "mode" parameter (optional)

I think the ease of adding interactive elements within UNIX pipelines is one of the nicest parts of WV Linewise, but equally I would find it mildly annoying to page through over a million rows.

To deal with this I have introduced a "mode" to Discover Types which can take one of three values:

* `manual` This is the default mode. You will have to press "More" on the user interface to page through the data.
* `continuous` This mode will page automatically and will stop at the end, waiting for you to press the "Exit" button.
* `continuous-and-exit` This is the same as `continuous` except that it exits automatically at the end.

examples/discover-types/build.js

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
var fs = require("fs")
var inlineAssets = require("inline-assets")
var content = fs.readFileSync("build/index.html", "utf8")
content = inlineAssets("index.html", "build/index.html", content, {
  verbose: false,
  htmlmin: false,
  cssmin: false,
  jsmin: false,
  pattern: [ ".+" ],
  purge: false
})
fs.writeFileSync("index.html", content, "utf8")

examples/discover-types/build.sh

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
#!/bin/bash
set -euo pipefail
IFS=$'\n\t'

rm -rf build
npm run-script build && sed -i 'sJ="/J="Jg' build/index.html && node build.js


examples/discover-types/index.html

Lines changed: 5 additions & 0 deletions
Large diffs are not rendered by default.
