diff --git a/CHANGELOG.md b/CHANGELOG.md index 35aabce..a4c0a87 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,62 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [1.1.0] - 2026-04-10 + +### Added + +- **`fio.ensure_file(path)`**: creates a file if it does not exist; no-op if it + already does. Useful for initialising config or lock files idempotently. + +- **`fio.copy_if_newer(src, dest)`**: copies `src` to `dest` only when `src` + has a newer `mtime` than `dest` (or when `dest` is absent). Returns + `Ok(True)` when a copy was performed, `Ok(False)` when it was skipped. + +- **`fio.read_fold(path, chunk_size, initial, f)`**: reads a file in chunks + and folds each chunk into an accumulator. Lets you process arbitrarily large + files without loading them fully into memory. + +- **`handle.fold_chunks(handle, chunk_size, initial, f)`**: same fold primitive + as `read_fold` but operates on an already-open `FileHandle`. Used internally + by `read_fold` and available for callers that manage their own handle + lifecycle. + +- **`fio/json` module**: thin I/O wrappers that compose with any + encoder/decoder function. + - `read_json(path, decoder)`: reads the file and passes the content to + `decoder`. Returns `Error(IoError(_))` on I/O failure and + `Error(ParseError(_))` when the decoder rejects the content. + - `write_json_atomic(path, value, encoder)`: encodes `value` and writes + atomically via `fio.write_atomic`. + - `JsonError(e)` type with `IoError(FioError)` and `ParseError(e)` variants. + +- **`fio/observer` module**: structured, extensible observability primitives. + - `Event` type: carries `op`, `path`, `outcome: Result(Nil, FioError)`, and + optional `bytes: Option(Int)`. Designed to be consumed by external packages + without knowing fio internals. 
+ - `Sink` type alias (`fn(Event) -> Nil`): any package can implement a sink + (structured logger, metrics counter, test recorder, OpenTelemetry span, …). + - `emit(result, op, path, bytes, sink)`: core primitive — emits an `Event` + then returns `result` unchanged. + - `trace(result, op, path, sink)`: convenience wrapper without byte count. + - `trace_bytes(result, op, path, sink)`: automatically infers `bytes` from + a `BitArray` result (e.g. after `fio.read_bits`). + - `format(event)`: formats an `Event` as a human-readable string for simple + logging sinks. + - `fan_out(sink1, sink2)`: combines two sinks into one; both receive every + event. Enables log-to-stdout AND record-in-test simultaneously. + - `noop_sink`: discards all events; useful as a default/no-op argument when + observability is optional. + +### Fixed + +- **`error.describe` for `Unknown`**: the `context` field is now included in + the description string when present. Previously it was silently discarded. + +- **`recursive.gleam`**: extracted the four copies of the inode-key building + expression into a single private `inode_key(info, fallback)` helper, removing + the risk of the four copies diverging in future. + ## [1.0.0] - 2026-03-18 ### Added diff --git a/README.md b/README.md index 20482ff..3c3dfe2 100644 --- a/README.md +++ b/README.md @@ -20,23 +20,22 @@ All functionality is available via **`import fio`** (no need for submodule impor ## Features -- **Unified API** — One `import fio` for all file operations. No juggling multiple packages. -- **Cross-platform** — Works identically on Erlang, Node.js, Deno, and Bun. -- **Rich errors** — POSIX-style error codes + semantic types like `NotUtf8(path)`. Pattern match precisely. -- **Atomic writes** — `write_atomic` / `write_bits_atomic` guarantee readers never see partial content. Temporary files are cleaned up even if the rename fails. -- **Random-access file handles** — `seek` and `tell` let you jump to arbitrary byte offsets. 
-- **File handles** — `fio/handle` exposes `open`, `close`, `read_chunk`, `write` for large-file and - streaming scenarios; `with` helper prevents leaks. -- **Type-safe permissions** — `FilePermissions` with `Set(Permission)`, not magic integers. -- **Path operations** — `join`, `split`, `expand`, `safe_relative`, and more — built in. -- **Symlinks & hard links** — Create, detect, read link targets. -- **Symlink loop safety** — Recursive operations track `(dev, inode)` pairs; circular symlinks - are listed but never descended into. On Windows, where `inode` may be zero, the - full path string is used as a fallback key. -- **FFI safety** — Erlang bindings map hash‑algorithm strings with a closed set, - preventing atom table exhaustion. -- **Touch** — Create files or update timestamps, like Unix `touch`. -- **Idempotent deletes** — `delete_all` succeeds silently on non-existent paths. +- **Unified API**: One `import fio` for all file operations. No juggling multiple packages. +- **Cross-platform**: Works identically on Erlang, Node.js, Deno, and Bun. +- **Rich errors**: POSIX-style error codes plus semantic types like `NotUtf8(path)`. Pattern match precisely. +- **Atomic writes**: `write_atomic` / `write_bits_atomic` guarantee readers never see partial content. Temporary files are cleaned up even if the rename fails. +- **Streaming**: `read_fold` and `handle.fold_chunks` let you process files chunk by chunk without loading them fully into memory. +- **Random-access file handles**: `seek` and `tell` let you jump to arbitrary byte offsets. +- **File handles**: `fio/handle` exposes `open`, `close`, `read_chunk`, `write` for large-file and streaming scenarios. The `with` helper prevents handle leaks. +- **High-level helpers**: `ensure_file`, `copy_if_newer`, and `fio/json` for common workflows. +- **Type-safe permissions**: `FilePermissions` with `Set(Permission)`, not magic integers. +- **Path operations**: `join`, `split`, `expand`, `safe_relative`, and more, built in. 
+- **Symlinks and hard links**: Create, detect, read link targets. +- **Symlink loop safety**: Recursive operations track `(dev, inode)` pairs. Circular symlinks are listed but never descended into. On Windows, where `inode` may be zero, the full path string is used as a fallback key. +- **FFI safety**: Erlang bindings map hash algorithm strings with a closed set, preventing atom table exhaustion. +- **Touch**: Create files or update timestamps, like Unix `touch`. +- **Idempotent deletes**: `delete_all` succeeds silently on non-existent paths. +- **Observability**: `fio/observer` provides structured event sinks and transparent wrappers to instrument any fio call without restructuring your code. ## Installation @@ -65,13 +64,13 @@ pub fn main() { // Path safety (via the same `fio` facade) let safe = fio.safe_relative("../../../etc/passwd") - // safe == Error(Nil) — blocked! + // safe == Error(Nil) -- blocked! } ``` ## API Overview -### Reading & Writing +### Reading and Writing | Function | Description | |---|---| @@ -84,6 +83,14 @@ pub fn main() { | `fio.append(path, content)` | Append string | | `fio.append_bits(path, bytes)` | Append bytes | +### High-level Helpers + +| Function | Description | +|---|---| +| `fio.ensure_file(path)` | Create file if it does not exist; no-op otherwise | +| `fio.copy_if_newer(src, dest)` | Copy only when `src` is newer than `dest`; returns `Bool` | +| `fio.read_fold(path, chunk_size, acc, f)` | Fold over file chunks without loading it all into memory | + ### File Operations | Function | Description | @@ -95,9 +102,9 @@ pub fn main() { | `fio.delete_directory(path)` | Delete an empty directory | | `fio.delete_all(path)` | Delete recursively (idempotent) | | `fio.touch(path)` | Create file or update modification time | +| `fio.list_recursive(path)` | List all files in a directory recursively | > **Note:** `delete_all` does **not** follow directory symlinks. A symlink itself is deleted but its target is left untouched. 
-| `fio.list_recursive(path)` | List all files in a directory recursively | ### Querying @@ -110,7 +117,7 @@ pub fn main() { | `fio.file_info(path)` | Get file metadata (follows symlinks) | | `fio.link_info(path)` | Get metadata without following symlinks | -### Symlinks & Links +### Symlinks and Links | Function | Description | |---|---| @@ -140,51 +147,26 @@ pub fn main() { | `fio.current_directory()` | Get working directory | | `fio.tmp_dir()` | Get system temp directory | -## Cross-platform behavior notes - -Some behavior differs between BEAM (Erlang/OTP) and JavaScript runtimes (Node/Deno/Bun). The library aims to keep the API consistent, but underlying platform differences can affect: - -- **Synchronous I/O**: The JS implementation uses synchronous filesystem calls (`fs.readFileSync`, `fs.writeFileSync`, etc.). This is appropriate for many Gleam apps, but it blocks the event loop. If you target Deno/Bun, the runtime may still work (they provide Node compatibility layers) but the operations remain blocking. -- **Permissions**: POSIX-style `chmod`/`stat` behavior is only meaningful on Unix-like platforms. On Windows, permissions queries/changes may be no-ops or behave differently, and `set_permissions` may return `Eperm`/`Enotsup`. -- **Symlink creation**: Some platforms (notably Windows) require elevated privileges to create symlinks; when symlink creation fails, the library surfaces the OS error. -- **Path normalization**: The `fio/path` module delegates to `filepath` (BEAM) or Node’s `path` (JS). Windows paths may use backslashes (`\\`) and drive letters; `safe_relative` normalizes backslashes to forward slashes to ensure consistent behavior. - - For example, on Node.js (macOS host) the output of `path.join("C:\\foo", "bar")` is `C:\foo/bar`, while `path.win32.join` yields `C:\foo\\bar`. On BEAM, `fio/path.join` currently yields `C:\foo/bar` (mixing separators) and `path.split("C:\\foo\\bar")` returns a single segment `"C:\\foo\\bar"`. 
- - You can inspect runtime behavior across targets using: - - ```sh - node dev/path_behavior.js - # (or deno run dev/path_behavior.js, bun dev/path_behavior.js if available) - ``` - -- **File handles**: On Node.js, append mode is enforced by the OS only when write calls use a `null` position; `fio/handle` tracks position and forces `null` when in append mode to preserve POSIX semantics. - -> Tip: If you rely on strict POSIX behavior (permissions, symlink semantics, dev/inode metadata), prefer running on Erlang/OTP where those semantics are stable. - -### File Handles (`fio/handle`) +## File Handles (`fio/handle`) For large files or streaming scenarios where loading the entire content into memory is not acceptable, use the `fio/handle` module: ```gleam import fio/handle -import gleam/result // Read a large log file chunk by chunk (64 KiB at a time) pub fn count_bytes(path: String) -> Result(Int, error.FioError) { - use h <- result.try(handle.open(path, handle.ReadOnly)) - let assert Ok(bits) = handle.read_all_bits(h) - let _ = handle.close(h) - Ok(bit_array.byte_size(bits)) + use h <- handle.with(path, handle.ReadOnly) + handle.fold_chunks(h, 65_536, 0, fn(acc, chunk) { + acc + bit_array.byte_size(chunk) + }) } // Write to a file with explicit lifecycle control pub fn write_lines(path: String, lines: List(String)) -> Result(Nil, error.FioError) { - use h <- result.try(handle.open(path, handle.WriteOnly)) - let result = list.try_each(lines, fn(line) { handle.write(h, line <> "\n") }) - let _ = handle.close(h) - result + use h <- handle.with(path, handle.WriteOnly) + list.try_each(lines, fn(line) { handle.write(h, line <> "\n") }) } ``` @@ -192,16 +174,81 @@ pub fn write_lines(path: String, lines: List(String)) -> Result(Nil, error.FioEr |---|---| | `handle.open(path, mode)` | Open a file (`ReadOnly`, `WriteOnly`, `AppendOnly`) | | `handle.close(handle)` | Close the handle, release the OS file descriptor | +| `handle.with(path, mode, callback)` | Open, run callback, 
always close (recommended) | | `handle.read_chunk(handle, size)` | Read up to `size` bytes; `Ok(None)` at EOF | | `handle.read_all_bits(handle)` | Read all remaining bytes as `BitArray` | | `handle.read_all(handle)` | Read all remaining content as UTF-8 `String` | +| `handle.fold_chunks(handle, size, acc, f)` | Fold over all remaining chunks | | `handle.write(handle, content)` | Write a UTF-8 string | | `handle.write_bits(handle, bytes)` | Write raw bytes | +| `handle.seek(handle, offset)` | Move cursor to byte offset from start | +| `handle.tell(handle)` | Return current byte offset | + +> **Note**: `FileHandle` is intentionally opaque. Always call `close` when done, or use `handle.with` which closes automatically. + +## JSON Helpers (`fio/json`) + +`fio/json` provides I/O wrappers that compose cleanly with any encoder/decoder +function. It does not bundle a JSON parser; bring your own (e.g. `gleam_json`). + +```gleam +import fio/json as fjson +import gleam_json + +// Read and decode +case fjson.read_json("config.json", gleam_json.decode_string) { + Ok(config) -> use_config(config) + Error(fjson.IoError(e)) -> io.println("I/O failed: " <> error.describe(e)) + Error(fjson.ParseError(e)) -> io.println("Bad JSON: " <> e) +} + +// Encode and write atomically +fjson.write_json_atomic("config.json", my_value, encode_fn) +``` + +| Function | Description | +|---|---| +| `fjson.read_json(path, decoder)` | Read file and run `decoder` on contents | +| `fjson.write_json_atomic(path, value, encoder)` | Encode and write atomically | + +The `JsonError(e)` type has two variants: `IoError(FioError)` and `ParseError(e)`. + +## Observability (`fio/observer`) + +Instrument any fio call without restructuring your code. + +```gleam +import fio +import fio/observer +import gleam/io + +fn log_sink(event: observer.Event) -> Nil { + io.println(observer.format(event)) +} -> **Note**: `FileHandle` is intentionally opaque. 
Always call `close` when done — -> the OS file descriptor is not automatically released. +pub fn main() { + fio.read("config.json") + |> observer.trace("read", "config.json", log_sink) +} +``` -### Path Operations (`fio/path`) +For byte-oriented operations such as `read_bits`, use `trace_bytes` to infer the byte count automatically. + +```gleam +fio.read_bits("archive.bin") +|> observer.trace_bytes("read_bits", "archive.bin", log_sink) +``` + +| Function | Description | +|---|---| +| `observer.emit(result, op, path, bytes, sink)` | Emit a structured `Event` and return `result` unchanged | +| `observer.trace(result, op, path, sink)` | Emit an event with `bytes = None` | +| `observer.trace_bytes(result, op, path, sink)` | Emit an event and infer `bytes` from `BitArray` results | +| `observer.format(event)` | Format an event as a human-readable string | +| `observer.fan_out(first, second)` | Combine two sinks so both receive every event | +| `observer.noop_sink` | Sink that discards all events | + +## Path Operations (`fio/path`) | Function | Description | |---|---| @@ -216,7 +263,7 @@ pub fn write_lines(path: String, lines: List(String)) -> Result(Nil, error.FioEr | `path.strip_extension(path)` | Remove extension | | `path.is_absolute(path)` | Check if path is absolute | | `path.expand(path)` | Normalize `.` and `..` segments | -| `path.safe_relative(path)` | Validate path doesn't escape via `..` | +| `path.safe_relative(path)` | Validate path does not escape via `..` | ## Atomic Writes @@ -244,7 +291,7 @@ partial writes are acceptable. ## Error Handling -fio uses `FioError`: 39 POSIX-style error constructors plus 7 semantic variants; each error has a human-readable description available via `error.describe`: +fio uses `FioError`: 39 POSIX-style error constructors plus 7 semantic variants. 
Each error has a human-readable description via `error.describe`: ```gleam import fio @@ -255,7 +302,7 @@ case fio.read("data.bin") { Error(Enoent) -> io.println("Not found") Error(Eacces) -> io.println("Permission denied") Error(NotUtf8(path)) -> { - // File exists but isn't valid UTF-8 — use read_bits instead + // File exists but is not valid UTF-8 -- use read_bits instead let assert Ok(bytes) = fio.read_bits(path) use_bytes(bytes) } @@ -263,8 +310,6 @@ case fio.read("data.bin") { } ``` -Every error has a human-readable description via `error.describe`. - ## Type-Safe Permissions ```gleam @@ -283,8 +328,6 @@ fio.set_permissions("script.sh", perms) ## Platform Support -### Development - Run the complete test suite locally across targets with the helper script: ```sh @@ -293,64 +336,40 @@ Run the complete test suite locally across targets with the helper script: ./bin/test javascript # Node.js only ``` -This mirrors the CI matrix without needing to publish the package. - -## Platform Support - | Target | Runtime | Status | |---|---|---| | Erlang | OTP | Full support | | JavaScript | Node.js | Full support | | JavaScript | Deno | Full support | | JavaScript | Bun | Full support | -### Platform Notes & Limitations + +### Platform Notes and Limitations Some behaviours vary by OS or filesystem. The library strives for consistency but there are edge cases you should be aware of: -* **Windows differences** – permission‑setting functions are no‑ops and - `%o` octal permissions are ignored by the OS. Atomic rename may fail if the - destination already exists (a Windows API restriction); a failure returns - `AtomicFailed("rename", reason)` and the temp file is removed. Recursive - traversal uses inode numbers when available; on Windows `ino` is typically - zero, so the code falls back to a visited path string. Tests for Windows - behaviour run conditionally and the README makes these caveats explicit. 
- -* **Atomic write caveats** – `write_atomic`/`write_bits_atomic` implement - write‑to‑temp‑then‑rename. This guarantees readers never see a partial file - on POSIX filesystems, but does *not* protect you from: - - crashes that occur **between** the temp write and the rename (a `.tmp` - sibling may be left behind), - - non‑POSIX mounts (SMB, NFS with strange semantics) where rename may not be - atomic. Always clean up temp files periodically if you run on untrusted - filesystems. - -* **Recursive read/write** – `handle.read_all_bits` now uses an iterative loop - to avoid stack overflow on extremely large files. The previous recursive - implementation worked but could blow the call stack for multi‑gigabyte reads. - -* **Path utilities** – `path.join_all([])` returns `"."` (previously `""`) - which better matches user expectations. `path.safe_relative` detects and - blocks Windows drive letters as well as Unix absolute paths; it still simply - normalises `..` segments, so be cautious when operating on network shares. - -* **Error mapping** – the FFI bridge maps all known POSIX errors; if a new - platform error is received it becomes `Unknown(inner, _)`. Add new cases to - `fio_ffi_bridge` when extending the error set. - -* **No async/watch support** – all APIs are synchronous. Reading very large - files will block the BEAM scheduler or the JavaScript event loop; use - `fio/handle` with small chunks or move heavy I/O off the main thread. - -These notes are intentionally broad; see the module docs for more details on -individual functions. -### Cross-Platform Notes +- **Windows differences**: permission-setting functions are no-ops and octal permissions are ignored by the OS. Atomic rename may fail if the destination already exists (a Windows API restriction); a failure returns `AtomicFailed("rename", reason)` and the temp file is removed. 
Recursive traversal uses inode numbers when available; on Windows `ino` is typically zero, so the code falls back to a visited path string. Tests for Windows behaviour run conditionally. + +- **Synchronous I/O**: The JS implementation uses synchronous filesystem calls (`fs.readFileSync`, `fs.writeFileSync`, etc.). This blocks the event loop. If you target Deno/Bun, the runtime may still work but the operations remain blocking. -- **`NotUtf8` detection** is consistent across Erlang and JavaScript. -- **`delete_all`** is idempotent: succeeds silently if the path doesn't exist. -- **Symlink** functions may require elevated privileges on Windows. -- **Permissions** functions (`set_permissions`, `set_permissions_octal`) have no effect on Windows. +- **Permissions**: POSIX-style `chmod`/`stat` behaviour is only meaningful on Unix-like platforms. On Windows, permissions queries/changes may be no-ops or behave differently. + +- **Symlink creation**: Some platforms (notably Windows) require elevated privileges to create symlinks. + +- **Atomic write caveats**: `write_atomic`/`write_bits_atomic` guarantee readers never see a partial file on POSIX filesystems, but do not protect against crashes between the temp write and the rename, or non-POSIX mounts (SMB, NFS) where rename may not be atomic. + +- **Recursive read/write**: `handle.read_all_bits` and `handle.fold_chunks` use iterative loops to avoid stack overflow on extremely large files. + +- **Path utilities**: `path.join_all([])` returns `"."`. `path.safe_relative` detects and blocks Windows drive letters as well as Unix absolute paths. + +- **Error mapping**: the FFI bridge maps all known POSIX errors. Unknown platform errors become `Unknown(inner, context)`. Add new cases to `fio_ffi_bridge` when extending the error set. + +### Cross-Platform Notes +- `NotUtf8` detection is consistent across Erlang and JavaScript. +- `delete_all` is idempotent: succeeds silently if the path does not exist. 
+- Symlink functions may require elevated privileges on Windows. +- Permissions functions (`set_permissions`, `set_permissions_octal`) have no effect on Windows. ## License diff --git a/gleam.toml b/gleam.toml index e70df4b..fd10a0f 100644 --- a/gleam.toml +++ b/gleam.toml @@ -1,5 +1,5 @@ name = "fio" -version = "1.0.0" +version = "1.1.0" gleam = ">= 1.14.0" description = "Complete, safe, ergonomic file operations for all Gleam targets" diff --git a/src/fio.gleam b/src/fio.gleam index 04ef79b..8420143 100644 --- a/src/fio.gleam +++ b/src/fio.gleam @@ -1,4 +1,5 @@ -import fio/error.{type FioError} +import fio/error.{type FioError, Enoent} +import fio/handle import fio/internal/io as internal import fio/path import fio/recursive @@ -230,6 +231,63 @@ pub fn copy_directory(src: String, dest: String) -> Result(Nil, FioError) { recursive.copy_directory(src, dest) } +// --- High-level helpers --- + +/// Create a file if it does not already exist. +/// If the file already exists this is a no-op and returns `Ok(Nil)`. +pub fn ensure_file(path: String) -> Result(Nil, FioError) { + case internal.exists(path) { + True -> Ok(Nil) + False -> internal.write(path, "") + } +} + +/// Copy `src` to `dest` only when `src` is newer than `dest`. +/// +/// If `dest` does not exist the copy always happens. +/// Returns `Ok(True)` when a copy was performed, `Ok(False)` when skipped. +pub fn copy_if_newer(src: String, dest: String) -> Result(Bool, FioError) { + use src_info <- result.try(internal.file_info(src)) + case internal.file_info(dest) { + Error(Enoent) -> { + use _ <- result.try(internal.copy_file(src, dest)) + Ok(True) + } + Error(e) -> Error(e) + Ok(dest_info) -> + case src_info.mtime_seconds > dest_info.mtime_seconds { + False -> Ok(False) + True -> { + use _ <- result.try(internal.copy_file(src, dest)) + Ok(True) + } + } + } +} + +// --- Streaming --- + +/// Read a file in chunks, folding each chunk into an accumulator. 
+/// +/// Opens the file, reads it in `chunk_size`-byte pieces, and calls `f` on each +/// chunk until EOF. The file handle is always closed before returning. +/// +/// ```gleam +/// // Count bytes without loading the whole file into memory +/// fio.read_fold("big.bin", 65_536, 0, fn(acc, chunk) { +/// acc + bit_array.byte_size(chunk) +/// }) +/// ``` +pub fn read_fold( + path: String, + chunk_size: Int, + initial: acc, + f: fn(acc, BitArray) -> acc, +) -> Result(acc, FioError) { + use h <- handle.with(path, handle.ReadOnly) + handle.fold_chunks(h, chunk_size, initial, f) +} + // --- Checksums --- /// Compute a file checksum using the specified algorithm. diff --git a/src/fio/error.gleam b/src/fio/error.gleam index 87529ee..910f69e 100644 --- a/src/fio/error.gleam +++ b/src/fio/error.gleam @@ -150,6 +150,10 @@ pub fn describe(error: FioError) -> String { InvalidPath(path, reason) -> "Invalid path " <> path <> ": " <> reason AtomicFailed(op, reason) -> "Atomic " <> op <> " failed: " <> reason TempFailed(reason) -> "Temp file operation failed: " <> reason - Unknown(inner, _) -> "Unknown error: " <> inner + Unknown(inner, context) -> + case context { + option.None -> "Unknown error: " <> inner + option.Some(ctx) -> "Unknown error: " <> inner <> " (" <> ctx <> ")" + } } } diff --git a/src/fio/handle.gleam b/src/fio/handle.gleam index 1d6ed85..da10472 100644 --- a/src/fio/handle.gleam +++ b/src/fio/handle.gleam @@ -264,3 +264,42 @@ pub fn seek(handle: FileHandle, position: Int) -> Result(Nil, FioError) { pub fn tell(handle: FileHandle) -> Result(Int, FioError) { io.tell(handle.inner) } + +// --------------------------------------------------------------------------- +// Folding / streaming +// --------------------------------------------------------------------------- + +/// Fold over all remaining chunks of the file, accumulating a result. 
+/// +/// Reads `chunk_size` bytes at a time from the **current cursor position** +/// until EOF, calling `f(acc, chunk)` for each chunk. Returns `Ok(final_acc)` +/// when EOF is reached cleanly, or `Error(FioError)` on the first read failure. +/// +/// ```gleam +/// // Count bytes in a large file without loading it all into memory +/// use h <- handle.with(path, handle.ReadOnly) +/// handle.fold_chunks(h, 65_536, 0, fn(acc, chunk) { +/// acc + bit_array.byte_size(chunk) +/// }) +/// ``` +pub fn fold_chunks( + handle: FileHandle, + chunk_size: Int, + initial: acc, + f: fn(acc, BitArray) -> acc, +) -> Result(acc, FioError) { + do_fold_chunks(handle, chunk_size, initial, f) +} + +fn do_fold_chunks( + handle: FileHandle, + chunk_size: Int, + acc: acc, + f: fn(acc, BitArray) -> acc, +) -> Result(acc, FioError) { + case read_chunk(handle, chunk_size) { + Error(e) -> Error(e) + Ok(None) -> Ok(acc) + Ok(Some(data)) -> do_fold_chunks(handle, chunk_size, f(acc, data), f) + } +} diff --git a/src/fio/json.gleam b/src/fio/json.gleam new file mode 100644 index 0000000..5c07c7b --- /dev/null +++ b/src/fio/json.gleam @@ -0,0 +1,77 @@ +/// Convenience helpers for reading and writing JSON files. +/// +/// This module does not include a JSON parser or encoder; encoding and +/// decoding are the caller's responsibility (use `gleam_json` or any other +/// library). What this module provides is ergonomic I/O wrappers that compose +/// cleanly with decoder/encoder functions. 
+/// +/// ## Example +/// +/// ```gleam +/// import fio/json as fjson +/// import gleam_json +/// +/// // Read and decode +/// let result = fjson.read_json("config.json", gleam_json.decode_string) +/// +/// // Encode and write atomically +/// let assert Ok(_) = fjson.write_json_atomic("config.json", my_value, encode_fn) +/// ``` +import fio +import fio/error.{type FioError} + +// --------------------------------------------------------------------------- +// Error type +// --------------------------------------------------------------------------- + +/// Error returned by `read_json` when either the file I/O fails or the +/// caller-supplied decoder rejects the content. +pub type JsonError(decode_err) { + /// The file could not be read (e.g. missing, permission denied). + IoError(error: FioError) + /// The file was read successfully but the decoder rejected the content. + ParseError(error: decode_err) +} + +// --------------------------------------------------------------------------- +// API +// --------------------------------------------------------------------------- + +/// Read a file and decode its content with `decoder`. +/// +/// Returns `Error(IoError(_))` on I/O failure and +/// `Error(ParseError(_))` when the decoder rejects the content. +/// +/// ```gleam +/// fjson.read_json("settings.json", my_decoder) +/// // Ok(settings) | Error(IoError(Enoent)) | Error(ParseError(...)) +/// ``` +pub fn read_json( + path: String, + decoder: fn(String) -> Result(a, e), +) -> Result(a, JsonError(e)) { + case fio.read(path) { + Error(e) -> Error(IoError(e)) + Ok(content) -> + case decoder(content) { + Error(e) -> Error(ParseError(e)) + Ok(value) -> Ok(value) + } + } +} + +/// Encode `value` with `encoder` and write the result atomically to `path`. +/// +/// Uses `fio.write_atomic` under the hood, so readers never observe +/// partial content. 
+/// +/// ```gleam +/// fjson.write_json_atomic("settings.json", settings, my_encoder) +/// ``` +pub fn write_json_atomic( + path: String, + value: a, + encoder: fn(a) -> String, +) -> Result(Nil, FioError) { + fio.write_atomic(path, encoder(value)) +} diff --git a/src/fio/observer.gleam b/src/fio/observer.gleam new file mode 100644 index 0000000..76c41b9 --- /dev/null +++ b/src/fio/observer.gleam @@ -0,0 +1,205 @@ +/// Observability primitives for fio operations. +/// +/// The design is intentionally open: a `Sink` is just a function +/// `fn(Event) -> Nil`, so any package can implement its own sink +/// (structured logger, metrics counter, OpenTelemetry span, test recorder, …) +/// without depending on fio internals. +/// +/// ## How it works +/// +/// Call `emit` (or the convenience wrapper `trace`) at the end of any fio +/// operation. The result is returned unchanged — the call is fully transparent +/// in a pipeline. +/// +/// ```gleam +/// import fio +/// import fio/observer.{type Event, type Sink} +/// +/// fn my_sink(event: Event) -> Nil { +/// io.println(observer.format(event)) +/// } +/// +/// pub fn main() { +/// fio.write("hello.txt", "world") +/// |> observer.trace("write", "hello.txt", my_sink) +/// } +/// ``` +/// +/// ## Extending with bytes +/// +/// Use `emit` directly when you know the byte count: +/// +/// ```gleam +/// fio.read_bits("data.bin") +/// |> observer.emit("read_bits", "data.bin", option.Some(expected_size), my_sink) +/// ``` +/// +/// ## Writing a reusable sink +/// +/// Because `Sink` is just `fn(Event) -> Nil`, you can build sinks as closures +/// that capture external state (a logger handle, a metrics counter, etc.): +/// +/// ```gleam +/// pub fn make_prefix_sink(prefix: String) -> observer.Sink { +/// fn(event: observer.Event) { +/// io.println(prefix <> observer.format(event)) +/// } +/// } +/// ``` +import fio/error.{type FioError} +import gleam/bit_array +import gleam/int +import gleam/option.{type Option, None, Some} 
+import gleam/result + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +/// A single fio operation event emitted to a sink. +pub type Event { + Event( + /// Short operation name: "read", "write", "copy", "delete", etc. + op: String, + /// Primary path argument of the operation. + path: String, + /// `Ok(Nil)` on success, `Error(e)` on failure. + /// The actual return value (file content, etc.) is excluded to keep the + /// event type generic and independent of the operation's return type. + outcome: Result(Nil, FioError), + /// Byte count involved, when known. + /// For read operations: bytes returned. + /// For write operations: bytes written. + /// `None` when not applicable (e.g. delete, rename). + bytes: Option(Int), + ) +} + +/// A sink consumes events produced by observed fio operations. +/// +/// Implement this type to integrate with any observability backend: +/// structured loggers, metrics systems, test recorders, or custom hooks. +/// A sink receives events synchronously on the same thread/process as the caller. +pub type Sink = + fn(Event) -> Nil + +// --------------------------------------------------------------------------- +// Core primitive +// --------------------------------------------------------------------------- + +/// Emit an `Event` to `sink` after `result` is produced, then return `result` +/// unchanged. +/// +/// This is the lowest-level building block. Use `trace` when you do not have +/// a byte count to report. 
+/// +/// ```gleam +/// fio.read_bits("data.bin") +/// |> observer.emit("read_bits", "data.bin", option.None, my_sink) +/// ``` +pub fn emit( + result: Result(a, FioError), + op: String, + path: String, + bytes: Option(Int), + sink: Sink, +) -> Result(a, FioError) { + let outcome = result.map(result, fn(_) { Nil }) + sink(Event(op:, path:, outcome:, bytes:)) + result +} + +// --------------------------------------------------------------------------- +// Convenience wrappers +// --------------------------------------------------------------------------- + +/// Like `emit` but without a byte count (`bytes` is always `None`). +/// +/// Use this for operations where byte count is not meaningful (delete, rename, +/// touch, create_directory, etc.): +/// +/// ```gleam +/// fio.delete("old.txt") +/// |> observer.trace("delete", "old.txt", my_sink) +/// ``` +pub fn trace( + result: Result(a, FioError), + op: String, + path: String, + sink: Sink, +) -> Result(a, FioError) { + emit(result, op, path, None, sink) +} + +/// Like `trace` but automatically infers the byte count from the result when +/// the operation returns a `BitArray` (e.g. `fio.read_bits`). +/// +/// ```gleam +/// fio.read_bits("archive.tar.gz") +/// |> observer.trace_bytes("read_bits", "archive.tar.gz", my_sink) +/// ``` +pub fn trace_bytes( + result: Result(BitArray, FioError), + op: String, + path: String, + sink: Sink, +) -> Result(BitArray, FioError) { + let bytes = case result { + Ok(data) -> Some(bit_array.byte_size(data)) + Error(_) -> None + } + emit(result, op, path, bytes, sink) +} + +// --------------------------------------------------------------------------- +// Sink utilities +// --------------------------------------------------------------------------- + +/// Format an `Event` as a human-readable string. +/// Useful when building simple logging sinks. 
+/// +/// ```gleam +/// fn log_sink(event: Event) -> Nil { +/// io.println(observer.format(event)) +/// } +/// ``` +pub fn format(event: Event) -> String { + let status = case event.outcome { + Ok(_) -> "ok" + Error(e) -> "err(" <> error.describe(e) <> ")" + } + let bytes_str = case event.bytes { + None -> "" + Some(n) -> " bytes=" <> int.to_string(n) + } + "[fio] " <> event.op <> " " <> event.path <> " -> " <> status <> bytes_str +} + +/// Combine two sinks into one: both receive every event in order. +/// +/// Useful for fan-out — log to stdout AND record in a test buffer: +/// +/// ```gleam +/// let combined = observer.fan_out(log_sink, test_recorder_sink) +/// fio.write("out.txt", data) |> observer.trace("write", "out.txt", combined) +/// ``` +pub fn fan_out(first: Sink, second: Sink) -> Sink { + fn(event: Event) { + first(event) + second(event) + } +} + +/// A no-op sink that discards all events. +/// Useful as a default/placeholder argument when observability is optional. +/// +/// ```gleam +/// pub fn copy(src, dest, sink: observer.Sink) { +/// fio.copy(src, dest) |> observer.trace("copy", src, sink) +/// } +/// // caller passes observer.noop_sink when not interested +/// copy("a.txt", "b.txt", observer.noop_sink) +/// ``` +pub fn noop_sink(_event: Event) -> Nil { + Nil +} diff --git a/src/fio/recursive.gleam b/src/fio/recursive.gleam index 91436ec..d099675 100644 --- a/src/fio/recursive.gleam +++ b/src/fio/recursive.gleam @@ -1,11 +1,26 @@ import fio/error.{type FioError, Enotdir} import fio/internal/io import fio/path.{join} +import fio/types.{type FileInfo} import gleam/int import gleam/list import gleam/result import gleam/set.{type Set} +// Build a stable visited-set key from a FileInfo. +// On Windows stat returns inode 0 for all files; fall back to the full path +// string so we still detect cycles without relying on inode numbers. 
+fn inode_key(info: FileInfo, fallback_path: String) -> String { + case info.inode { + 0 -> fallback_path + _ -> + "dev:" + <> int.to_string(info.dev) + <> ";ino:" + <> int.to_string(info.inode) + } +} + /// Recursively list files and directories (paths relative to `path`). /// /// Uses a flat string accumulator for O(n) traversal. @@ -13,25 +28,13 @@ import gleam/set.{type Set} /// **Symlink loop safety**: before descending into any directory, its real /// `(dev, inode)` pair (obtained via `stat`, which follows symlinks) is /// checked against the `visited` set. If already seen, the entry is listed -/// but not descended into — breaking any A→B→A or deeper circular chains. +/// but not descended into, breaking any A->B->A or deeper circular chains. pub fn list_recursive(path: String) -> Result(List(String), FioError) { use is_dir <- result.try(io.is_directory(path)) case is_dir { True -> { - // Seed the visited set with the root's real (dev, inode) so we - // never re-enter it via a symlink. We store the key as a string - // because later we may fall back to using the path when inodes are - // unreliable (e.g. Windows). use root_info <- result.try(io.file_info(path)) - let root_key = case root_info.inode { - 0 -> path - _ -> - "dev:" - <> int.to_string(root_info.dev) - <> ";ino:" - <> int.to_string(root_info.inode) - } - let visited = set.from_list([root_key]) + let visited = set.from_list([inode_key(root_info, path)]) use acc <- result.try(do_list_recursive(path, "", visited, [])) Ok(list.reverse(acc)) } @@ -39,9 +42,9 @@ pub fn list_recursive(path: String) -> Result(List(String), FioError) { } } -// `visited` — set of (dev, inode) pairs already entered, prevents loops. -// `current_rel` — relative path of the directory being scanned. -// `acc` — reverse accumulator; reversed once at the call site. +// `visited` - set of inode keys already entered, prevents loops. +// `current_rel` - relative path of the directory being scanned. 
+// `acc` - reverse accumulator; reversed once at the call site. fn do_list_recursive( root: String, current_rel: String, @@ -67,29 +70,16 @@ fn do_list_recursive( case is_dir { False -> Ok([item_rel, ..inner_acc]) True -> { - // Resolve the real inode (stat follows symlinks) to detect loops. use info <- result.try(io.file_info(full_path)) - // Windows/stat may return inode 0 for all files; fall back to using - // the (full) path string in that case so we still avoid infinite - // recursion. We store everything as strings for simplicity. - let key = case info.inode { - 0 -> full_path - _ -> - "dev:" - <> int.to_string(info.dev) - <> ";ino:" - <> int.to_string(info.inode) - } + let key = inode_key(info, full_path) case set.contains(visited, key) { // Already visited: list the entry but do not descend. True -> Ok([item_rel, ..inner_acc]) - False -> { - let new_visited = set.insert(visited, key) - do_list_recursive(root, item_rel, new_visited, [ + False -> + do_list_recursive(root, item_rel, set.insert(visited, key), [ item_rel, ..inner_acc ]) - } } } } @@ -106,15 +96,7 @@ pub fn copy_directory(src: String, dest: String) -> Result(Nil, FioError) { False -> Error(Enotdir) True -> { use root_info <- result.try(io.file_info(src)) - let root_key = case root_info.inode { - 0 -> src - _ -> - "dev:" - <> int.to_string(root_info.dev) - <> ";ino:" - <> int.to_string(root_info.inode) - } - let visited = set.from_list([root_key]) + let visited = set.from_list([inode_key(root_info, src)]) do_copy_directory(src, dest, visited) } } @@ -137,14 +119,7 @@ fn do_copy_directory( False -> io.copy_file(src_path, dest_path) True -> { use info <- result.try(io.file_info(src_path)) - let key = case info.inode { - 0 -> src_path - _ -> - "dev:" - <> int.to_string(info.dev) - <> ";ino:" - <> int.to_string(info.inode) - } + let key = inode_key(info, src_path) case set.contains(visited, key) { True -> Ok(Nil) False -> diff --git a/test/fio_test.gleam b/test/fio_test.gleam index 5d03975..c7c1000 
100644 --- a/test/fio_test.gleam +++ b/test/fio_test.gleam @@ -1,8 +1,11 @@ import fio import fio/error.{Enoent, NotUtf8} import fio/handle +import fio/json as fjson +import fio/observer import fio/path import fio/types +import gleam/bit_array import gleam/list import gleam/option.{None, Some} import gleam/order @@ -734,3 +737,287 @@ pub fn handle_tell_test() { let assert Ok(_) = handle.close(h) let assert Ok(_) = fio.delete(p) } + +// ============================================================================ +// ensure_file +// ============================================================================ + +pub fn ensure_file_creates_when_missing_test() { + let p = "_test_ensure_file_new.txt" + let assert False = fio.exists(p) + let assert Ok(Nil) = fio.ensure_file(p) + fio.exists(p) |> should.equal(True) + fio.read(p) |> should.equal(Ok("")) + let assert Ok(_) = fio.delete(p) +} + +pub fn ensure_file_noop_when_exists_test() { + let p = "_test_ensure_file_existing.txt" + let assert Ok(_) = fio.write(p, "preserve me") + let assert Ok(Nil) = fio.ensure_file(p) + fio.read(p) |> should.equal(Ok("preserve me")) + let assert Ok(_) = fio.delete(p) +} + +// ============================================================================ +// copy_if_newer +// ============================================================================ + +pub fn copy_if_newer_copies_when_dest_missing_test() { + let src = "_test_cin_src.txt" + let dest = "_test_cin_dest.txt" + let assert Ok(_) = fio.write(src, "source") + let assert False = fio.exists(dest) + fio.copy_if_newer(src, dest) |> should.equal(Ok(True)) + fio.read(dest) |> should.equal(Ok("source")) + let assert Ok(_) = fio.delete(src) + let assert Ok(_) = fio.delete(dest) +} + +pub fn copy_if_newer_no_error_when_same_mtime_test() { + let src = "_test_cin_same_src.txt" + let dest = "_test_cin_same_dest.txt" + let assert Ok(_) = fio.write(src, "original") + let assert Ok(_) = fio.write(dest, "destination") + let assert Ok(_) = 
fio.touch(dest) + case fio.copy_if_newer(src, dest) { + Ok(_) -> Nil + Error(_) -> should.fail() + } + let assert Ok(_) = fio.delete(src) + let assert Ok(_) = fio.delete(dest) +} + +// ============================================================================ +// read_fold (streaming) +// ============================================================================ + +pub fn read_fold_counts_bytes_test() { + let p = "_test_read_fold.bin" + let assert Ok(_) = + fio.write_bits(p, <<0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11>>) + let result = + fio.read_fold(p, 4, 0, fn(acc, chunk) { + acc + bit_array.byte_size(chunk) + }) + result |> should.equal(Ok(12)) + let assert Ok(_) = fio.delete(p) +} + +pub fn read_fold_collects_chunks_test() { + let p = "_test_read_fold_collect.txt" + let assert Ok(_) = fio.write(p, "abcdefgh") + let result = fio.read_fold(p, 2, [], fn(acc, chunk) { [chunk, ..acc] }) + case result { + Ok(chunks) -> list.length(chunks) |> should.equal(4) + Error(_) -> should.fail() + } + let assert Ok(_) = fio.delete(p) +} + +// ============================================================================ +// handle.fold_chunks +// ============================================================================ + +pub fn handle_fold_chunks_test() { + let p = "_test_fold_chunks.bin" + let assert Ok(_) = fio.write_bits(p, <<10, 20, 30, 40, 50, 60>>) + let assert Ok(h) = handle.open(p, handle.ReadOnly) + let result = + handle.fold_chunks(h, 3, 0, fn(acc, chunk) { + acc + bit_array.byte_size(chunk) + }) + let assert Ok(_) = handle.close(h) + result |> should.equal(Ok(6)) + let assert Ok(_) = fio.delete(p) +} + +// ============================================================================ +// fio/json helpers +// ============================================================================ + +pub fn json_read_json_ok_test() { + let p = "_test_json_read.json" + let assert Ok(_) = fio.write(p, "{\"key\":\"value\"}") + let result = fjson.read_json(p, fn(s) { Ok(s) }) + result 
|> should.equal(Ok("{\"key\":\"value\"}")) + let assert Ok(_) = fio.delete(p) +} + +pub fn json_read_json_io_error_test() { + let result = fjson.read_json("_nonexistent_json_fio.json", fn(s) { Ok(s) }) + case result { + Error(fjson.IoError(Enoent)) -> Nil + _ -> should.fail() + } +} + +pub fn json_read_json_parse_error_test() { + let p = "_test_json_parse_err.json" + let assert Ok(_) = fio.write(p, "not json") + let result = fjson.read_json(p, fn(_s) { Error("invalid json") }) + case result { + Error(fjson.ParseError("invalid json")) -> Nil + _ -> should.fail() + } + let assert Ok(_) = fio.delete(p) +} + +pub fn json_write_json_atomic_test() { + let p = "_test_json_write_atomic.json" + let assert Ok(_) = + fjson.write_json_atomic(p, "hello", fn(s) { "\"" <> s <> "\"" }) + fio.read(p) |> should.equal(Ok("\"hello\"")) + let assert Ok(_) = fio.delete(p) +} + +// ============================================================================ +// fio/observer helpers +// ============================================================================ + +pub fn observer_trace_ok_test() { + let p = "_test_observer_trace.txt" + let assert Ok(_) = fio.write(p, "observed") + let seen = "_test_observer_flag.txt" + fio.read(p) + |> observer.trace("read", p, fn(event) { + case event.outcome { + Ok(_) -> { + let assert Ok(_) = fio.write(seen, "ok") + Nil + } + Error(_) -> Nil + } + }) + |> should.equal(Ok("observed")) + fio.exists(seen) |> should.equal(True) + let assert Ok(_) = fio.delete(p) + let assert Ok(_) = fio.delete(seen) +} + +pub fn observer_trace_error_propagates_test() { + let flag = "_test_observer_err_flag.txt" + fio.read("_nonexistent_observer_test.txt") + |> observer.trace("read", "_nonexistent_observer_test.txt", fn(event) { + case event.outcome { + Error(_) -> { + let assert Ok(_) = fio.write(flag, "error_seen") + Nil + } + Ok(_) -> Nil + } + }) + |> should.equal(Error(error.Enoent)) + fio.read(flag) |> should.equal(Ok("error_seen")) + let assert Ok(_) = 
fio.delete(flag) +} + +pub fn observer_emit_with_bytes_test() { + let p = "_test_observer_bytes.bin" + let assert Ok(_) = fio.write_bits(p, <<1, 2, 3, 4>>) + let recorded = "_test_observer_bytes_flag.txt" + fio.read_bits(p) + |> observer.emit("read_bits", p, option.Some(4), fn(event) { + case event.bytes { + option.Some(n) -> { + let assert Ok(_) = fio.write(recorded, "bytes=" <> string.inspect(n)) + Nil + } + option.None -> Nil + } + }) + |> should.equal(Ok(<<1, 2, 3, 4>>)) + fio.read(recorded) |> should.equal(Ok("bytes=4")) + let assert Ok(_) = fio.delete(p) + let assert Ok(_) = fio.delete(recorded) +} + +pub fn observer_trace_bytes_infers_size_test() { + let p = "_test_observer_trace_bytes.bin" + let assert Ok(_) = fio.write_bits(p, <<10, 20, 30>>) + let recorded = "_test_obs_tb_flag.txt" + fio.read_bits(p) + |> observer.trace_bytes("read_bits", p, fn(event) { + case event.bytes { + option.Some(n) -> { + let assert Ok(_) = fio.write(recorded, string.inspect(n)) + Nil + } + option.None -> Nil + } + }) + |> should.equal(Ok(<<10, 20, 30>>)) + fio.read(recorded) |> should.equal(Ok("3")) + let assert Ok(_) = fio.delete(p) + let assert Ok(_) = fio.delete(recorded) +} + +pub fn observer_format_ok_test() { + let event = + observer.Event( + op: "write", + path: "foo.txt", + outcome: Ok(Nil), + bytes: option.None, + ) + observer.format(event) + |> should.equal("[fio] write foo.txt -> ok") +} + +pub fn observer_format_error_with_bytes_test() { + let event = + observer.Event( + op: "read", + path: "bar.txt", + outcome: Error(error.Enoent), + bytes: option.Some(512), + ) + let desc = observer.format(event) + string.contains(desc, "err(") |> should.equal(True) + string.contains(desc, "bytes=512") |> should.equal(True) +} + +pub fn observer_fan_out_test() { + let flag1 = "_test_fanout_1.txt" + let flag2 = "_test_fanout_2.txt" + let sink1 = fn(_e: observer.Event) { + let assert Ok(_) = fio.write(flag1, "s1") + Nil + } + let sink2 = fn(_e: observer.Event) { + let assert Ok(_) = 
fio.write(flag2, "s2") + Nil + } + let combined = observer.fan_out(sink1, sink2) + fio.write("_test_fanout_src.txt", "x") + |> observer.trace("write", "_test_fanout_src.txt", combined) + |> should.equal(Ok(Nil)) + fio.read(flag1) |> should.equal(Ok("s1")) + fio.read(flag2) |> should.equal(Ok("s2")) + let assert Ok(_) = fio.delete(flag1) + let assert Ok(_) = fio.delete(flag2) + let assert Ok(_) = fio.delete("_test_fanout_src.txt") +} + +pub fn observer_noop_sink_test() { + // noop_sink must not raise or alter the result + fio.write("_test_noop_sink.txt", "data") + |> observer.trace("write", "_test_noop_sink.txt", observer.noop_sink) + |> should.equal(Ok(Nil)) + let assert Ok(_) = fio.delete("_test_noop_sink.txt") +} + +// ============================================================================ +// error.describe Unknown with context +// ============================================================================ + +pub fn error_describe_unknown_with_context_test() { + let e = error.Unknown("raw_error", option.Some("extra context")) + error.describe(e) + |> should.equal("Unknown error: raw_error (extra context)") +} + +pub fn error_describe_unknown_no_context_test() { + let e = error.Unknown("raw_error", option.None) + error.describe(e) |> should.equal("Unknown error: raw_error") +}
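The `fan_out` combinator added in this diff takes exactly two sinks. The following is a minimal sketch of how a caller might combine an arbitrary list of sinks, assuming only the `fan_out` and `noop_sink` signatures shown in `src/fio/observer.gleam` above. `fan_out_all` is a hypothetical helper name and is not part of fio.

```gleam
import fio/observer.{type Sink}
import gleam/list

/// Hypothetical helper, not part of fio: folds a list of sinks into one.
/// Starting from `noop_sink`, each step wraps the accumulated sink and the
/// next sink with `fan_out`, so every sink receives each event in list order.
pub fn fan_out_all(sinks: List(Sink)) -> Sink {
  list.fold(sinks, observer.noop_sink, observer.fan_out)
}
```

With an empty list the fold returns `noop_sink` itself, so callers should not need a special case when no observers are registered.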