85 commits
e249867
Add AzureBlobFileSystem placeholder, verify devtools::document() beha…
Collinbrown95 Mar 3, 2026
a9f85c3
add simple test function to work through codegen.R
Collinbrown95 Mar 5, 2026
00dfa1c
temporarily "force" build DARROW_R_WITH_AZUREFS build flag.
Collinbrown95 Mar 5, 2026
cb77017
Added c++ stub
marberts Mar 5, 2026
8b811aa
Updated codegen
marberts Mar 5, 2026
0e9d2b9
Added a comment
marberts Mar 5, 2026
1bc3e42
cleanup azurefs test function code
Collinbrown95 Mar 5, 2026
cb8ed62
document instructions to start local azurite container
Collinbrown95 Mar 5, 2026
0966367
add arrow_with_azure helper following convention for s3/gcp
Collinbrown95 Mar 5, 2026
e843a54
add ARROW_AZURE flag to nixlibs.R
Collinbrown95 Mar 5, 2026
a73ad16
debug first argument check
Collinbrown95 Mar 5, 2026
4b650d0
Renamed R6 class correctly
marberts Mar 11, 2026
0329839
Added endpoint + key, token, and default authentication
marberts Mar 11, 2026
b1e5fec
Finished logical for AzureFileSystem to match pyarrow
marberts Mar 11, 2026
765f89f
standardize on ARROW_R_WITH_AZURE instead of ARROW_R_WITH_AZUREFS
Collinbrown95 Mar 13, 2026
8c3efaf
standardize on ARROW_R_WITH_AZURE
Collinbrown95 Mar 13, 2026
868e653
Turn on ARROW_AZURE flag in nixlibs.R
Collinbrown95 Mar 13, 2026
8f8862c
drop temporary arrow env var hack
Collinbrown95 Mar 13, 2026
163a76a
temporary documentation of what I've tried so far
Collinbrown95 Mar 13, 2026
2b0e367
Add TODO note in configure script to remove hard-coded link flags
Collinbrown95 Mar 13, 2026
919edc3
initial filesystem tests
Collinbrown95 Mar 14, 2026
421ca19
uncomment line 256 of filesystem.cpp
Collinbrown95 Mar 15, 2026
b9e97a7
checkpoint: resolved segfault error
Collinbrown95 Mar 15, 2026
4e61485
skip test_filesystem tests that rely on being able to connect directl…
Collinbrown95 Mar 15, 2026
130cb3f
Add most test cases from test_filesystem and recreate a couple that w…
Collinbrown95 Mar 15, 2026
8b5bcbd
rename az_bucket to az_container
Collinbrown95 Mar 15, 2026
3c3e081
check that azurite is installed as precondition for test-azure.R script.
Collinbrown95 Mar 15, 2026
33e614f
add setup code to start azurite from the test-azure.R script, then ki…
Collinbrown95 Mar 15, 2026
f7615a7
run air formatter
Collinbrown95 Mar 15, 2026
8aeb29a
add documentation to az_container.
Collinbrown95 Mar 15, 2026
73e2355
docs: Updated documentation for AzureFileSystem and updated vignette …
marberts Mar 17, 2026
9034ec7
Updated installation vignettes to include Azure
marberts Mar 18, 2026
8bd4737
Updated install scripts
marberts Mar 18, 2026
e2cba0a
add tests for valid and invalid combinations of options to AzureFileS…
Collinbrown95 Mar 18, 2026
5260ba8
Ran pre-commit hooks
marberts Mar 19, 2026
7d52cbc
Removed tmp.md
marberts Mar 19, 2026
32f8422
wrap credential configuration methods with StopIfNotOk
Collinbrown95 Mar 19, 2026
f6107e2
move link flags to arrow_built_with ARROW_AZURE block in configure sc…
Collinbrown95 Mar 19, 2026
9f246a2
fix error message to check in test for empty call to AzureFileSystem$…
Collinbrown95 Mar 19, 2026
d1627e3
Updated docs to include libxml2
marberts Mar 24, 2026
46fc6dc
add simple test function to work through codegen.R
Collinbrown95 Mar 5, 2026
0d790f6
cleanup azurefs test function code
Collinbrown95 Mar 5, 2026
09c8089
Updated filesystem.cpp
marberts Mar 25, 2026
2ffa6ad
Updated build scripts
marberts Mar 25, 2026
fb58522
Update configure.win: fix typo
Collinbrown95 Mar 25, 2026
81d563b
Updated CI setup scripts for windows
marberts Mar 27, 2026
b3de919
Trying PKGBUILD and windows build script again
marberts Mar 27, 2026
fa17e21
Updated PKGBUILD
marberts Mar 27, 2026
b911bca
debug ci: add azure dependencies that CI tries to build from source.
Collinbrown95 Mar 29, 2026
3a1cc33
Removed tmp.md
marberts Apr 1, 2026
5ae1105
add install script to build azure-sdk-for-cpp dependencies from source.
Apr 4, 2026
1630574
add simple test function to work through codegen.R
Collinbrown95 Mar 5, 2026
51593bc
temporarily "force" build DARROW_R_WITH_AZUREFS build flag.
Collinbrown95 Mar 5, 2026
2ad013c
Added c++ stub
marberts Mar 5, 2026
c8a73e5
cleanup azurefs test function code
Collinbrown95 Mar 5, 2026
2ee6eb6
document instructions to start local azurite container
Collinbrown95 Mar 5, 2026
a47bb9e
debug first argument check
Collinbrown95 Mar 5, 2026
add7652
Added endpoint + key, token, and default authentication
marberts Mar 11, 2026
020fca2
Finished logical for AzureFileSystem to match pyarrow
marberts Mar 11, 2026
156a945
standardize on ARROW_R_WITH_AZURE
Collinbrown95 Mar 13, 2026
96ac7e9
temporary documentation of what I've tried so far
Collinbrown95 Mar 13, 2026
ddf80e3
Add TODO note in configure script to remove hard-coded link flags
Collinbrown95 Mar 13, 2026
bb5d887
uncomment line 256 of filesystem.cpp
Collinbrown95 Mar 15, 2026
d3d9b9e
checkpoint: resolved segfault error
Collinbrown95 Mar 15, 2026
9068886
Add most test cases from test_filesystem and recreate a couple that w…
Collinbrown95 Mar 15, 2026
8a8267d
check that azurite is installed as precondition for test-azure.R script.
Collinbrown95 Mar 15, 2026
93e1cdf
Ran pre-commit hooks
marberts Mar 19, 2026
6ef0963
Removed tmp.md
marberts Mar 19, 2026
f8e787d
wrap credential configuration methods with StopIfNotOk
Collinbrown95 Mar 19, 2026
a1d72ab
move link flags to arrow_built_with ARROW_AZURE block in configure sc…
Collinbrown95 Mar 19, 2026
82a5d37
add simple test function to work through codegen.R
Collinbrown95 Mar 5, 2026
246ff27
cleanup azurefs test function code
Collinbrown95 Mar 5, 2026
1bdf2ab
Trying PKGBUILD and windows build script again
marberts Mar 27, 2026
7148238
Updated PKGBUILD
marberts Mar 27, 2026
d1c73bb
debug ci: add azure dependencies that CI tries to build from source.
Collinbrown95 Mar 29, 2026
8839eb2
Removed tmp.md
marberts Apr 1, 2026
9639d0f
Removed windows build and added macos
marberts Apr 27, 2026
cb1cb9f
azure off for windows ci
marberts Apr 27, 2026
b367140
Disable azure in configure.win
marberts Apr 28, 2026
c4db209
Add placeholder comment to flag present issues with MinGW and the Azu…
Collinbrown95 May 9, 2026
ce2e058
remove debugging script to install azure sdk from source
Collinbrown95 May 10, 2026
1279b02
Updated vignettes to reflect Azure off on Windows
marberts May 12, 2026
c728602
Update github.packages.yml
marberts Jun 23, 2026
4c370ca
Fixing lint in filesystem.cpp
marberts Jun 27, 2026
df23828
Fixed warning in cpp code
marberts Jun 27, 2026
1 change: 1 addition & 0 deletions ci/scripts/PKGBUILD
@@ -122,6 +122,7 @@ build() {
-DARROW_PACKAGE_PREFIX="${MINGW_PREFIX}" \
-DARROW_PARQUET=ON \
-DARROW_S3=ON \
+-DARROW_AZURE=OFF \
-DARROW_SNAPPY_USE_SHARED=OFF \
-DARROW_USE_GLOG=OFF \
-DARROW_UTF8PROC_USE_SHARED=OFF \
3 changes: 2 additions & 1 deletion r/DESCRIPTION
@@ -28,7 +28,8 @@ URL: https://github.com/apache/arrow/, https://arrow.apache.org/docs/r/
BugReports: https://github.com/apache/arrow/issues
Encoding: UTF-8
Language: en-US
-SystemRequirements: C++20; for AWS S3 support on Linux, libcurl and openssl (optional);
+SystemRequirements: C++20; for AWS S3 and Azure support on Linux, libcurl and
+openssl, plus libxml2 for Azure (optional);
cmake >= 3.26 (build-time only, and only for full source build)
Biarch: true
Imports:
3 changes: 3 additions & 0 deletions r/NAMESPACE
@@ -183,6 +183,7 @@ S3method(vec_ptype_full,arrow_fixed_size_list)
S3method(vec_ptype_full,arrow_large_list)
S3method(vec_ptype_full,arrow_list)
export(Array)
+export(AzureFileSystem)
export(Buffer)
export(BufferOutputStream)
export(BufferReader)
@@ -282,6 +283,7 @@ export(arrow_available)
export(arrow_info)
export(arrow_table)
export(arrow_with_acero)
+export(arrow_with_azure)
export(arrow_with_dataset)
export(arrow_with_gcs)
export(arrow_with_json)
@@ -295,6 +297,7 @@ export(as_data_type)
export(as_record_batch)
export(as_record_batch_reader)
export(as_schema)
+export(az_container)
export(binary)
export(bool)
export(boolean)
10 changes: 10 additions & 0 deletions r/R/arrow-info.R
@@ -46,6 +46,7 @@ arrow_info <- function() {
json = arrow_with_json(),
s3 = arrow_with_s3(),
gcs = arrow_with_gcs(),
+azure = arrow_with_azure(),
utf8proc = "utf8_upper" %in% compute_funcs,
re2 = "replace_substring_regex" %in% compute_funcs,
vapply(tolower(names(CompressionType)[-1]), codec_is_available, logical(1))
@@ -128,6 +129,15 @@ arrow_with_gcs <- function() {
})
}

+#' @rdname arrow_info
+#' @export
+arrow_with_azure <- function() {
+tryCatch(.Call(`_azure_available`), error = function(e) {
+return(FALSE)
+})
+}
+
+
#' @rdname arrow_info
#' @export
arrow_with_json <- function() {
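`arrow_with_azure()` follows the same pattern as the S3 and GCS helpers: it calls a native symbol through `.Call()` and treats any error as "not available". The base-R sketch below isolates that fallback; `feature_available()` is a hypothetical helper, not an arrow function. `.Call()` on a name that no loaded DLL registers raises an error, so the helper returns `FALSE`. Unlike the original, the sketch also wraps the result in `isTRUE()`, which is an assumption added here for illustration.

```r
# Hypothetical helper (not part of arrow): the tryCatch() fallback used by
# arrow_with_azure(), generalized to any native symbol name.
feature_available <- function(symbol) {
  tryCatch(
    # isTRUE() is an addition in this sketch: anything other than a single
    # TRUE is reported as "not available".
    isTRUE(.Call(symbol)),
    # .Call() errors when the symbol is not registered in any loaded DLL,
    # e.g. when the package's shared library was built without it.
    error = function(e) FALSE
  )
}

# No loaded DLL registers this made-up symbol, so the helper returns FALSE.
missing_result <- feature_available("_hypothetical_missing_symbol_available")
print(missing_result)
```

Catching the error instead of letting it propagate keeps `arrow_info()` usable on builds where the Azure bindings were never compiled.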
4 changes: 4 additions & 0 deletions r/R/arrowExports.R

Some generated files are not rendered by default.

118 changes: 118 additions & 0 deletions r/R/filesystem.R
@@ -189,6 +189,31 @@ FileSelector$create <- function(base_dir, allow_not_found = FALSE, recursive = F
#' - `default_metadata`: default metadata to write in new objects.
#' - `project_id`: the project to use for creating buckets.
#'
+#' `AzureFileSystem$create()` takes the following required argument:
+#'
+#' - `account_name`: Azure Blob Storage account name.
+#'
+#' `AzureFileSystem$create()` takes the following optional arguments:
+#'
+#' - `account_key`: Account key of the storage account. Cannot be used with
+#' `sas_token`.
+#' - `blob_storage_authority`: Hostname of the blob service, defaulting to
+#' `"blob.core.windows.net"`.
+#' - `blob_storage_scheme`: Either `"http"` or `"https"` (the default).
+#' - `client_id`: The client/application ID for Azure Active Directory
+#' authentication. If used with `client_secret` and `tenant_id`, then it is the
+#' application ID for a registered Azure AD application. Otherwise, it is the
+#' client ID of a user-assigned managed identity.
+#' - `client_secret`: Client secret for Azure Active Directory authentication.
+#' Must be provided with both `client_id` and `tenant_id`.
+#' - `dfs_storage_authority`: Hostname of the Data Lake Storage Gen2 service,
+#' defaulting to `"dfs.core.windows.net"`.
+#' - `dfs_storage_scheme`: Either `"http"` or `"https"` (the default).
+#' - `sas_token`: Shared access signature (SAS) token for the storage account.
+#' Cannot be used with `account_key`.
+#' - `tenant_id`: Tenant ID for Azure Active Directory authentication. Must
+#' be provided with both `client_id` and `client_secret`.
+#'
#' @section Methods:
#'
#' - `path(x)`: Create a `SubTreeFileSystem` from the current `FileSystem`
@@ -253,6 +278,10 @@ FileSelector$create <- function(base_dir, allow_not_found = FALSE, recursive = F
#' (the default), 'ERROR', 'WARN', 'INFO', 'DEBUG' (recommended), 'TRACE', and
#' 'OFF'.
#'
+#' On `AzureFileSystem`, if no authentication arguments are given, the
+#' `DefaultAzureCredential` is used, which tries several authentication
+#' methods in turn until one succeeds.
+#'
#' @usage NULL
#' @format NULL
#' @docType class
@@ -645,6 +674,95 @@ GcsFileSystem$create <- function(anonymous = FALSE, retry_limit_seconds = 15, ..
fs___GcsFileSystem__Make(anonymous, options)
}

+#' @usage NULL
+#' @format NULL
+#' @rdname FileSystem
+#' @importFrom utils modifyList
+#' @export
+AzureFileSystem <- R6Class(
+"AzureFileSystem",
+inherit = FileSystem
+)
+
+AzureFileSystem$create <- function(account_name, ...) {
+options <- list(...)
+valid_opts <- c(
+"account_key",
+"blob_storage_authority",
+"blob_storage_scheme",
+"client_id",
+"client_secret",
+"dfs_storage_authority",
+"dfs_storage_scheme",
+"sas_token",
+"tenant_id"
+)
+
+invalid_opts <- setdiff(names(options), valid_opts)
+if (length(invalid_opts)) {
+stop(
+"Invalid options for AzureFileSystem: ",
+oxford_paste(invalid_opts),
+call. = FALSE
+)
+}
+# The C++ code assumes that the combination of authentication options has
+# already been validated in this function.
+if (!is.null(options$tenant_id) || !is.null(options$client_id) || !is.null(options$client_secret)) {
+if (is.null(options$client_id)) {
+stop(
+"`client_id` must be given with `tenant_id` and `client_secret`",
+call. = FALSE
+)
+}
+if (sum(is.null(options$tenant_id), is.null(options$client_secret)) == 1) {
+stop(
+"Provide only `client_id` to authenticate with ",
+"Managed Identity Credential, or provide `client_id`, `tenant_id`, ",
+"and `client_secret` to authenticate with Client Secret Credential",
+call. = FALSE
+)
+}
+} else if (!is.null(options$account_key) && !is.null(options$sas_token)) {
+stop(
+"Cannot specify both `account_key` and `sas_token`",
+call. = FALSE
+)
+}
+
+fs___AzureFileSystem__Make(c(account_name = account_name, options))
+}
+
+#' Connect to an Azure Blob Storage container
+#'
+#' `az_container()` is a convenience function that creates an `AzureFileSystem`
+#' and wraps it in a `SubTreeFileSystem` rooted at a blob storage container in
+#' an Azure Storage account.
+#'
+#' @param container_path string Container name or path.
+#' @param ... Additional connection options, passed to `AzureFileSystem$create()`.
+#'
+#' @return A `SubTreeFileSystem` containing an `AzureFileSystem` and the container's
+#' relative path. Note that this function's success does not guarantee that you
+#' are authorized to access the container's contents.
+#' @examplesIf FALSE
+#' container_fs <- az_container(
+#' container_path = "arrow-datasets",
+#' account_name = azurite_account_name,
+#' account_key = azurite_account_key,
+#' blob_storage_authority = azurite_blob_storage_authority,
+#' blob_storage_scheme = azurite_blob_storage_scheme
+#' )
+#' @export
+az_container <- function(container_path, ...) {
+assert_that(is.string(container_path))
+args <- list2(...)
+
+fs <- exec(AzureFileSystem$create, !!!args)
+
+SubTreeFileSystem$create(container_path, fs)
+}
+
#' @usage NULL
#' @format NULL
#' @rdname FileSystem
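To make the authentication rules in `AzureFileSystem$create()` concrete, here is a minimal base-R sketch that applies the same option checks and returns a label for the credential type the documentation says each valid combination selects. The helper `azure_credential_type()` and its return labels are hypothetical and not part of arrow; the real function passes the validated options to `fs___AzureFileSystem__Make()` instead of returning a label.

```r
# Hypothetical helper (not part of arrow): applies the same option checks as
# AzureFileSystem$create() and labels the credential each combination selects.
azure_credential_type <- function(options) {
  valid_opts <- c(
    "account_key", "blob_storage_authority", "blob_storage_scheme",
    "client_id", "client_secret", "dfs_storage_authority",
    "dfs_storage_scheme", "sas_token", "tenant_id"
  )
  invalid_opts <- setdiff(names(options), valid_opts)
  if (length(invalid_opts)) {
    stop("Invalid options for AzureFileSystem: ",
         paste(invalid_opts, collapse = ", "), call. = FALSE)
  }

  # `[[` matches names exactly, unlike `$`, which allows partial matching.
  has <- function(name) !is.null(options[[name]])

  if (has("tenant_id") || has("client_id") || has("client_secret")) {
    if (!has("client_id")) {
      stop("`client_id` must be given with `tenant_id` and `client_secret`",
           call. = FALSE)
    }
    n_missing <- sum(!has("tenant_id"), !has("client_secret"))
    if (n_missing == 1) {
      stop("Provide only `client_id`, or all of `client_id`, `tenant_id`, ",
           "and `client_secret`", call. = FALSE)
    }
    # As in the original validation, account_key/sas_token are not checked in
    # this branch; the client options alone decide the label here.
    if (n_missing == 2) "managed_identity" else "client_secret"
  } else if (has("account_key") && has("sas_token")) {
    stop("Cannot specify both `account_key` and `sas_token`", call. = FALSE)
  } else if (has("account_key")) {
    "account_key"
  } else if (has("sas_token")) {
    "sas_token"
  } else {
    "default" # DefaultAzureCredential
  }
}

azure_credential_type(list(client_id = "my-client-id"))
```

For example, passing only `client_id` yields `"managed_identity"`, all three of `client_id`, `tenant_id`, and `client_secret` yield `"client_secret"`, and passing only `tenant_id` raises the same kind of error as `AzureFileSystem$create()`.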
3 changes: 2 additions & 1 deletion r/_pkgdown.yml
@@ -261,10 +261,11 @@ reference:

- title: File systems
desc: >
-Functions for working with files on S3 and GCS
+Functions for working with files on S3, GCS, and Azure
contents:
- s3_bucket
- gs_bucket
+- az_container
- copy_files

- title: Flight
6 changes: 5 additions & 1 deletion r/configure
@@ -359,10 +359,14 @@ add_feature_flags () {
if arrow_built_with ARROW_S3; then
PKG_CFLAGS_FEATURES="$PKG_CFLAGS_FEATURES -DARROW_R_WITH_S3"
fi
+if arrow_built_with ARROW_AZURE; then
+PKG_CFLAGS_FEATURES="$PKG_CFLAGS_FEATURES -DARROW_R_WITH_AZURE"
+PKG_LIBS_FEATURES="$PKG_LIBS_FEATURES -lxml2"
+fi
if arrow_built_with ARROW_GCS; then
PKG_CFLAGS_FEATURES="$PKG_CFLAGS_FEATURES -DARROW_R_WITH_GCS"
fi
-if arrow_built_with ARROW_GCS || arrow_built_with ARROW_S3; then
+if arrow_built_with ARROW_GCS || arrow_built_with ARROW_S3 || arrow_built_with ARROW_AZURE; then
# If pkg-config is available it will handle this for us automatically
SSL_LIBS_WITHOUT_PC="-lcurl -lssl -lcrypto"
fi
13 changes: 11 additions & 2 deletions r/configure.win
@@ -67,6 +67,7 @@ function configure_binaries() {
# pkg-config --libs libcurl
GCS_LIBS="-lcurl -lnormaliz -lssh2 -lgdi32 -lssl -lcrypto -lcrypt32 -lwldap32 \
-lz -lws2_32 -lnghttp2 -ldbghelp"
+# AZURE_LIBS="-lcurl -lssl -lxml2"

# Set the right flags to point to and enable arrow/parquet
if [ -d "windows/r-libarrow-windows-x86_64-$VERSION" ]; then
@@ -94,8 +95,8 @@ function configure_binaries() {
# S3, GCS, and re2 support only for Rtools40 (i.e. R >= 4.0)
"${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" -e 'R.version$major >= 4' | grep TRUE >/dev/null 2>&1
if [ $? -eq 0 ]; then
-PKG_CFLAGS="${PKG_CFLAGS} -DARROW_R_WITH_S3 -DARROW_R_WITH_GCS"
-PKG_LIBS="${PKG_LIBS} -lre2 ${AWS_LIBS} ${GCS_LIBS}"
+PKG_CFLAGS="${PKG_CFLAGS} -DARROW_R_WITH_S3 -DARROW_R_WITH_GCS" # -DARROW_R_WITH_AZURE
+PKG_LIBS="${PKG_LIBS} -lre2 ${AWS_LIBS} ${GCS_LIBS}" # ${AZURE_LIBS}
else
# It seems that order matters
PKG_LIBS="${PKG_LIBS} -lws2_32"
@@ -187,6 +188,10 @@ add_feature_flags () {
if arrow_built_with ARROW_S3; then
PKG_CFLAGS_FEATURES="$PKG_CFLAGS_FEATURES -DARROW_R_WITH_S3"
fi
+# if arrow_built_with ARROW_AZURE; then
+# PKG_CFLAGS_FEATURES="$PKG_CFLAGS_FEATURES -DARROW_R_WITH_AZURE"
+# PKG_LIBS_FEATURES="$PKG_LIBS_FEATURES -lxml2"
+# fi
if arrow_built_with ARROW_GCS; then
PKG_CFLAGS_FEATURES="$PKG_CFLAGS_FEATURES -DARROW_R_WITH_GCS"
fi
@@ -292,6 +297,10 @@ function configure_dev() {
PKG_CFLAGS="$PKG_CFLAGS -DARROW_R_WITH_GCS"
fi

+# if [ $(cmake_option ARROW_AZURE) -eq 1 ]; then
+# PKG_CFLAGS="$PKG_CFLAGS -DARROW_R_WITH_AZURE"
+# fi
+
if [ $(cmake_option ARROW_JSON) -eq 1 ]; then
PKG_CFLAGS="$PKG_CFLAGS -DARROW_R_WITH_JSON"
fi
2 changes: 1 addition & 1 deletion r/data-raw/codegen.R
@@ -30,7 +30,7 @@
# Ensure that all machines are sorting the same way
invisible(Sys.setlocale("LC_COLLATE", "C"))

-features <- c("acero", "dataset", "substrait", "parquet", "s3", "gcs", "json")
+features <- c("acero", "dataset", "substrait", "parquet", "s3", "gcs", "azure", "json")

suppressPackageStartupMessages({
library(decor)
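Adding `"azure"` to `features` matters because code generation uses each entry to emit a `_<feature>_available()` export, which is what `arrow_with_azure()` calls via `.Call()`. As far as we can tell from the generated exports, each stub returns `TRUE` only when the matching `ARROW_R_WITH_<FEATURE>` macro is defined, which would explain why the build flags were standardized on `ARROW_R_WITH_AZURE` rather than `ARROW_R_WITH_AZUREFS`. The base-R sketch below is a simplified, hypothetical reconstruction of that naming pattern; `availability_stub()` is not a real codegen function and the exact template in `codegen.R` may differ.

```r
# Simplified, hypothetical sketch of the availability stub generated for each
# entry in `features`. It only illustrates the naming convention: the C symbol
# is `_<feature>_available` and the guard macro is ARROW_R_WITH_<FEATURE>.
availability_stub <- function(feature) {
  template <- paste(
    'extern "C" SEXP _%s_available() {',
    "return Rf_ScalarLogical(",
    "#if defined(ARROW_R_WITH_%s)",
    "  TRUE",
    "#else",
    "  FALSE",
    "#endif",
    ");",
    "}",
    sep = "\n"
  )
  sprintf(template, feature, toupper(feature))
}

features <- c("acero", "dataset", "substrait", "parquet", "s3", "gcs", "azure", "json")
# vapply() keeps the feature names, so each stub can be looked up by name.
stubs <- vapply(features, availability_stub, character(1))
cat(stubs[["azure"]], "\n")
```

Under this convention, a feature name and its build macro must agree after upper-casing, so `-DARROW_R_WITH_AZUREFS` would never switch `_azure_available()` to `TRUE`.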
1 change: 1 addition & 0 deletions r/inst/build_arrow_static.sh
@@ -84,6 +84,7 @@ ${CMAKE_WRAPPER} ${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \
-Dlz4_SOURCE=${lz4_SOURCE:-} \
-DARROW_FILESYSTEM=ON \
-DARROW_GCS=${ARROW_GCS:-OFF} \
+-DARROW_AZURE=${ARROW_AZURE:-$ARROW_DEFAULT_PARAM} \
-DARROW_JEMALLOC=${ARROW_JEMALLOC:-$ARROW_DEFAULT_PARAM} \
-DARROW_MIMALLOC=${ARROW_MIMALLOC:-ON} \
-DARROW_JSON=${ARROW_JSON:-ON} \
35 changes: 35 additions & 0 deletions r/man/FileSystem.Rd

Some generated files are not rendered by default.

3 changes: 3 additions & 0 deletions r/man/arrow_info.Rd

Some generated files are not rendered by default.