From 33f2ecdd1d848da22f2a670434ffb249ad7a93b6 Mon Sep 17 00:00:00 2001 From: Ian Cook Date: Thu, 28 Nov 2024 16:25:57 -0500 Subject: [PATCH 1/6] Add range request examples --- http/get_range/README.md | 2 +- http/get_range/curl/README.md | 22 ++++++++++++ http/get_range/curl/client/client.sh | 50 ++++++++++++++++++++++++++++ http/get_range/js/.gitignore | 20 +++++++++++ http/get_range/js/server/README.md | 34 +++++++++++++++++++ 5 files changed, 127 insertions(+), 1 deletion(-) create mode 100644 http/get_range/curl/README.md create mode 100644 http/get_range/curl/client/client.sh create mode 100644 http/get_range/js/.gitignore create mode 100644 http/get_range/js/server/README.md diff --git a/http/get_range/README.md b/http/get_range/README.md index 0902500..956dc46 100644 --- a/http/get_range/README.md +++ b/http/get_range/README.md @@ -19,4 +19,4 @@ # HTTP GET Arrow Data: Range Request Examples -This directory contains examples of HTTP servers/clients that send/receive data of known size (`Content-Length`) in the Arrow IPC streaming format and support range requests (`Accept-Range: bytes`). +This directory contains examples of HTTP servers/clients that send/receive data of known size (`Content-Length`) in the Arrow IPC streaming format and support range requests (`Accept-Ranges: bytes`). diff --git a/http/get_range/curl/README.md b/http/get_range/curl/README.md new file mode 100644 index 0000000..ab035c0 --- /dev/null +++ b/http/get_range/curl/README.md @@ -0,0 +1,22 @@ + + +# HTTP GET Arrow Data: Range Request curl Client Example + +This directory contains examples of `curl` commands that send HTTP GET requests with the `Range` request header. 
diff --git a/http/get_range/curl/client/client.sh b/http/get_range/curl/client/client.sh new file mode 100644 index 0000000..c198531 --- /dev/null +++ b/http/get_range/curl/client/client.sh @@ -0,0 +1,50 @@ +#!/bin/sh + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ + +### Use range requests to download an Arrow IPC stream file in two parts + +# Get the length of the file `random.arrows` in bytes +curl -I localhost:8008/random.arrows +# Content-Length: 13550776 + +# Download the first half of the file to `random-part-1.arrows` +curl -r 0-6775388 localhost:8008/random.arrows -o random-part-1.arrows + +# Download the second half of the file to `random-part-2.arrows` +curl -r 6775389-13550776 localhost:8008/random.arrows -o random-part-2.arrows + +# Combine the two separate files into one file `random.arrows` then delete them +cat random-part-1.arrows random-part-2.arrows > random.arrows +rm random-part-1.arrows random-part-2.arrows + +# Clean up +rm random.arrows + + +### Simulate an interrupted download over a slow connection + +# Begin downloading the file at 1M/s but interrupt after five seconds +timeout 5s curl --limit-rate 1M localhost:8008/random.arrows -o random.arrows + +# Resume the download at 1M/s +curl -C - --limit-rate 1M localhost:8008/random.arrows -o random.arrows + +# Clean up +rm random.arrows diff --git a/http/get_range/js/.gitignore b/http/get_range/js/.gitignore new file mode 100644 index 0000000..a5c0552 --- /dev/null +++ b/http/get_range/js/.gitignore @@ -0,0 +1,20 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. + +/**/node_modules +package-lock.json +package.json diff --git a/http/get_range/js/server/README.md b/http/get_range/js/server/README.md new file mode 100644 index 0000000..c1c92c7 --- /dev/null +++ b/http/get_range/js/server/README.md @@ -0,0 +1,34 @@ + + +# HTTP GET Arrow Data: Range Request JavaScript Server Example + +The example in this directory shows how to use the Node.js package [`serve`](https://www.npmjs.com/package/serve) (which supports range requests) to serve a static Arrow IPC stream file over HTTP. + +To run this example, copy the file `random.arrows` from the directory `data/rand-many-types/` in this repository into this directory: + +```sh +cp ../../../../data/rand-many-types/random.arrows . +``` + +Then start the HTTP server to serve this file: + +```sh +npx --yes serve -l 8008 +``` From bc9d7c427b919e3ecbcbd95f0c485135fe6e4ad0 Mon Sep 17 00:00:00 2001 From: Ian Cook Date: Thu, 28 Nov 2024 18:09:46 -0500 Subject: [PATCH 2/6] Use '.part' file extension --- http/get_range/curl/client/client.sh | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/http/get_range/curl/client/client.sh b/http/get_range/curl/client/client.sh index c198531..c9c481a 100644 --- a/http/get_range/curl/client/client.sh +++ b/http/get_range/curl/client/client.sh @@ -24,15 +24,15 @@ curl -I localhost:8008/random.arrows # Content-Length: 13550776 -# Download the first half of the file to `random-part-1.arrows` -curl -r 0-6775388 localhost:8008/random.arrows -o random-part-1.arrows +# Download the first half of the file to `random-1.arrows.part` +curl -r 0-6775388 localhost:8008/random.arrows -o random-1.arrows.part -# Download the second half of the file to `random-part-2.arrows` -curl -r 6775389-13550776 localhost:8008/random.arrows -o random-part-2.arrows +# Download the second half of the file to `random--2.arrows.part` +curl -r 6775389-13550776 
localhost:8008/random.arrows -o random-2.arrows.part # Combine the two separate files into one file `random.arrows` then delete them -cat random-part-1.arrows random-part-2.arrows > random.arrows -rm random-part-1.arrows random-part-2.arrows +cat random-1.arrows.part random-2.arrows.part > random.arrows +rm random-1.arrows.part random-2.arrows.part # Clean up rm random.arrows From 8bf338fd5b53895edab03b7f0dc3e32bf6327a66 Mon Sep 17 00:00:00 2001 From: Ian Cook Date: Fri, 29 Nov 2024 08:35:14 -0500 Subject: [PATCH 3/6] Improve JS server readme --- http/get_range/js/server/README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/http/get_range/js/server/README.md b/http/get_range/js/server/README.md index c1c92c7..e829cf6 100644 --- a/http/get_range/js/server/README.md +++ b/http/get_range/js/server/README.md @@ -21,7 +21,7 @@ The example in this directory shows how to use the Node.js package [`serve`](https://www.npmjs.com/package/serve) (which supports range requests) to serve a static Arrow IPC stream file over HTTP. -To run this example, copy the file `random.arrows` from the directory `data/rand-many-types/` in this repository into this directory: +To run this example, copy the file `random.arrows` from the directory `data/rand-many-types/` into the current directory: ```sh cp ../../../../data/rand-many-types/random.arrows . @@ -32,3 +32,6 @@ Then start the HTTP server to serve this file: ```sh npx --yes serve -l 8008 ``` + +> [!NOTE] +> The npm package `serve` _should_ automatically set the `Content-Type` header to `application/vnd.apache.arrow.stream` when serving a file with extension `.arrows`, because [the Arrow IPC stream format is officially registered with IANA](https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream). 
Most web servers including `serve` use registration data from IANA to determine the media type of a file based on its file extension and set the `Content-Type` header to that media type when serving a file with that extension. However, this is not working with `.arrows` files in the `serve` package, seemingly because of a problem with the npm package [`mimedb`](https://github.com/jshttp/mime-db) which `serve` depends on. From b62a3d68b06103e89365f54396416a78f91d2116 Mon Sep 17 00:00:00 2001 From: Ian Cook Date: Fri, 29 Nov 2024 08:46:25 -0500 Subject: [PATCH 4/6] Improve JS server --- http/get_range/js/.gitignore | 1 + http/get_range/js/server/README.md | 2 +- http/get_range/js/server/serve.json | 11 +++++++++++ 3 files changed, 13 insertions(+), 1 deletion(-) create mode 100644 http/get_range/js/server/serve.json diff --git a/http/get_range/js/.gitignore b/http/get_range/js/.gitignore index a5c0552..8e6bd3e 100644 --- a/http/get_range/js/.gitignore +++ b/http/get_range/js/.gitignore @@ -18,3 +18,4 @@ /**/node_modules package-lock.json package.json +*.arrows diff --git a/http/get_range/js/server/README.md b/http/get_range/js/server/README.md index e829cf6..5dd9f91 100644 --- a/http/get_range/js/server/README.md +++ b/http/get_range/js/server/README.md @@ -34,4 +34,4 @@ npx --yes serve -l 8008 ``` > [!NOTE] -> The npm package `serve` _should_ automatically set the `Content-Type` header to `application/vnd.apache.arrow.stream` when serving a file with extension `.arrows`, because [the Arrow IPC stream format is officially registered with IANA](https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream). Most web servers including `serve` use registration data from IANA to determine the media type of a file based on its file extension and set the `Content-Type` header to that media type when serving a file with that extension. 
However, this is not working with `.arrows` files in the `serve` package, seemingly because of a problem with the npm package [`mimedb`](https://github.com/jshttp/mime-db) which `serve` depends on. +> The npm package `serve` _should_ automatically set the `Content-Type` header to `application/vnd.apache.arrow.stream` when serving a file with extension `.arrows`, because [the Arrow IPC stream format is officially registered with IANA](https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream). Most web servers including `serve` use registration data from IANA to determine the media type of a file based on its file extension and set the `Content-Type` header to that media type when serving a file with that extension. However, this is not working with `.arrows` files in the `serve` package, seemingly because of a problem with the npm package [`mimedb`](https://github.com/jshttp/mime-db) which `serve` depends on. So the file `serve.json` is used to set the `Content-Type` header correctly when serving `.arrows` files. 
diff --git a/http/get_range/js/server/serve.json b/http/get_range/js/server/serve.json new file mode 100644 index 0000000..e3016d8 --- /dev/null +++ b/http/get_range/js/server/serve.json @@ -0,0 +1,11 @@ +{ + "headers": [ + { + "source" : "**/*.arrows", + "headers" : [{ + "key" : "Content-Type", + "value" : "application/vnd.apache.arrow.stream" + }] + } + ] +} From 403e174db02d6b01c5e185c9c7312bc25793e1ca Mon Sep 17 00:00:00 2001 From: Ian Cook Date: Fri, 29 Nov 2024 08:58:58 -0500 Subject: [PATCH 5/6] Improve examples --- http/get_range/curl/.gitignore | 19 +++++++++++++++++++ http/get_range/curl/{ => client}/README.md | 2 ++ http/get_range/curl/client/client.sh | 12 ++++++------ http/get_range/js/server/README.md | 4 ++-- 4 files changed, 29 insertions(+), 8 deletions(-) create mode 100644 http/get_range/curl/.gitignore rename http/get_range/curl/{ => client}/README.md (87%) diff --git a/http/get_range/curl/.gitignore b/http/get_range/curl/.gitignore new file mode 100644 index 0000000..f00067f --- /dev/null +++ b/http/get_range/curl/.gitignore @@ -0,0 +1,19 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +*.arrows +*.arrows.part* diff --git a/http/get_range/curl/README.md b/http/get_range/curl/client/README.md similarity index 87% rename from http/get_range/curl/README.md rename to http/get_range/curl/client/README.md index ab035c0..33fd7e7 100644 --- a/http/get_range/curl/README.md +++ b/http/get_range/curl/client/README.md @@ -20,3 +20,5 @@ # HTTP GET Arrow Data: Range Request curl Client Example This directory contains examples of `curl` commands that send HTTP GET requests with the `Range` request header. + +To run this example, first start one of the range request server examples in the parent directory, then run the shell commands in `client.sh`. diff --git a/http/get_range/curl/client/client.sh b/http/get_range/curl/client/client.sh index c9c481a..e11bc8e 100644 --- a/http/get_range/curl/client/client.sh +++ b/http/get_range/curl/client/client.sh @@ -24,15 +24,15 @@ curl -I localhost:8008/random.arrows # Content-Length: 13550776 -# Download the first half of the file to `random-1.arrows.part` -curl -r 0-6775388 localhost:8008/random.arrows -o random-1.arrows.part +# Download the first half of the file to `random.arrows.part1` +curl -r 0-6775388 localhost:8008/random.arrows -o random.arrows.part1 -# Download the second half of the file to `random--2.arrows.part` -curl -r 6775389-13550776 localhost:8008/random.arrows -o random-2.arrows.part +# Download the second half of the file to `random.arrows.part2` +curl -r 6775389-13550776 localhost:8008/random.arrows -o random.arrows.part2 # Combine the two separate files into one file `random.arrows` then delete them -cat random-1.arrows.part random-2.arrows.part > random.arrows -rm random-1.arrows.part random-2.arrows.part +cat random.arrows.part1 random.arrows.part2 > random.arrows +rm random.arrows.part1 random.arrows.part2 # Clean up rm random.arrows diff --git a/http/get_range/js/server/README.md b/http/get_range/js/server/README.md index 5dd9f91..57cb010 100644 --- a/http/get_range/js/server/README.md +++ 
b/http/get_range/js/server/README.md @@ -17,7 +17,7 @@ under the License. --> -# HTTP GET Arrow Data: Range Request JavaScript Server Example +# HTTP GET Arrow Data: Range Request Node.js Server Example The example in this directory shows how to use the Node.js package [`serve`](https://www.npmjs.com/package/serve) (which supports range requests) to serve a static Arrow IPC stream file over HTTP. @@ -34,4 +34,4 @@ npx --yes serve -l 8008 ``` > [!NOTE] -> The npm package `serve` _should_ automatically set the `Content-Type` header to `application/vnd.apache.arrow.stream` when serving a file with extension `.arrows`, because [the Arrow IPC stream format is officially registered with IANA](https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream). Most web servers including `serve` use registration data from IANA to determine the media type of a file based on its file extension and set the `Content-Type` header to that media type when serving a file with that extension. However, this is not working with `.arrows` files in the `serve` package, seemingly because of a problem with the npm package [`mimedb`](https://github.com/jshttp/mime-db) which `serve` depends on. So the file `serve.json` is used to set the `Content-Type` header correctly when serving `.arrows` files. +> The npm package `serve` _should_ automatically set the `Content-Type` header to `application/vnd.apache.arrow.stream` when serving a file with extension `.arrows`, because [the Arrow IPC stream format is officially registered with IANA](https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream) and most web servers including `serve` use registration data from IANA to determine the media type of a file based on its file extension and set the `Content-Type` header to that media type when serving a file with that extension. 
However, this is not working with `.arrows` files in the `serve` package, seemingly because of a problem with the npm package [`mimedb`](https://github.com/jshttp/mime-db) which `serve` depends on. So the file `serve.json` is used to set the `Content-Type` header correctly when serving `.arrows` files. From 2004b0621f1d0e9066ac16c710513497e1df274e Mon Sep 17 00:00:00 2001 From: Ian Cook Date: Fri, 29 Nov 2024 09:03:09 -0500 Subject: [PATCH 6/6] Improve server example --- http/get_range/js/server/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/http/get_range/js/server/README.md b/http/get_range/js/server/README.md index 57cb010..6d03159 100644 --- a/http/get_range/js/server/README.md +++ b/http/get_range/js/server/README.md @@ -30,7 +30,7 @@ cp ../../../../data/rand-many-types/random.arrows . Then start the HTTP server to serve this file: ```sh -npx --yes serve -l 8008 +npx serve -l 8008 ``` > [!NOTE]
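
The byte offsets hard-coded in `client.sh` can also be derived from the `Content-Length` value instead of computed by hand. The following is a minimal sketch, not part of this patch series: the function name `byte_ranges` and its two arguments are hypothetical. It relies only on the fact that HTTP byte ranges are zero-based and inclusive, so the last valid offset of a 13550776-byte file is 13550775. A range such as `6775389-13550776` still works because a server treats a last-byte position at or beyond the end of the file as "through the end".

```sh
#!/bin/sh

# Hypothetical helper (an assumption, not part of client.sh):
# print PARTS inclusive byte ranges that together cover LENGTH bytes.
byte_ranges() {
  length=$1
  parts=$2
  # Ceiling division, so the ranges always reach the end of the file
  chunk=$(( (length + parts - 1) / parts ))
  start=0
  while [ "$start" -lt "$length" ]; do
    end=$(( start + chunk - 1 ))
    # Offsets are zero-based, so the last valid offset is length - 1
    if [ "$end" -ge "$length" ]; then
      end=$(( length - 1 ))
    fi
    echo "${start}-${end}"
    start=$(( end + 1 ))
  done
}

# Ranges for the 13550776-byte file reported by `curl -I`
byte_ranges 13550776 2
```

Each printed range can be passed to `curl -r` in the same way as the literal ranges in `client.sh`, for example `curl -r 0-6775387 localhost:8008/random.arrows -o random.arrows.part1`. The split point differs by one byte from the one used in `client.sh`, which does not matter as long as the ranges are contiguous and the parts are concatenated in order.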