Skip to content

log: add log_location parameter to allow log redirection to file#11858

Open
philippfriese wants to merge 3 commits intoofiwg:mainfrom
philippfriese:core_log_location
Open

log: add log_location parameter to allow log redirection to file#11858
philippfriese wants to merge 3 commits intoofiwg:mainfrom
philippfriese:core_log_location

Conversation

@philippfriese
Copy link
Contributor

This PR adds the parameter log_location / the environment variabel FI_LOG_LOCATION to the standard logging provider in to allow control of the output location of the libfabric log. Possible options are: stderr (default), stdout, as well as a path to a file or to a directory. If a directory is provided, a log file with format ofi_<pid>.log is used. Log files are created with the file mode specified via log_location_mode / FI_LOG_LOCATION_MODE, defaulting to 0600, and must not exist prior to launching libfabric[1].

The use-case of this parameter concerns multi-process application cases, such as MPI applications. By default, libfabric writes to stderr, which is usually unbuffered. Even when stderr of an MPI application is redirected to per-process files, for example using Open MPIs --output-filename option, then the libfabric log still contains interleaved lines, resulting in one MPI process' log containing a log line from another MPI process.
When modifying stderr to be line-buffered, then libfabric log entries are no longer interleaved, but may be out-of-order, which may render certain provider outputs impossible to interpret correctly. An example is the output of the ofi_hook_profile provider, which relies on strict line ordering for its output sematic.

While libfabric allows the logging provider to be extended and overwritten programmatically, this approach is not feasible for applications using another communication library on-top of libfabric, such as MPI applications.
The parameter introduced in this PR addresses this.

[1] This is to prevent libfabric from overwriting/modifying files via an accidentally or intentionally mis-crafted log location path.

Usage Example

$ FI_LOG_LEVEL=debug FI_LOG_LOCATION="./" fi_info -l
ofi_rxm:
    version: 201.0
ofi_rxd:
[...]
$ ls ./
ofi_545157.log
$ cat ofi_545157.log
libfabric:545157:1770045780::core:core:fi_param_get_():372<info> variable perf_cntr=<not set>
libfabric:545157:1770045780::core:core:fi_param_get_():372<info> variable hook=<not set>
[...]

@shefty
Copy link
Member

shefty commented Feb 4, 2026

The line indentation is off -- using spaces instead of tabs. A library should never print without explicitly being enabled to do so. For all the library knows, stderr could have been closed and redirected to an application file. Use only logging functions to print errors, not direct calls to fprintf.

@philippfriese
Copy link
Contributor Author

Thank you for the feedback @shefty. I have pushed an updated commit which moves the fprintf calls to the OFI logging macros.
I also moved the handling of log_location{_mode} to after the handling of the other log_* settings in fi_log_init and added a check for log_location != NULL to ofi_log_enabled. Without this check, a small amount of logging calls, emitted by the fi_param_get/define functions in fi_log_init after setting log_mask and before calling fi_log_location_init, would be logged to stderr even if FI_LOG_LOCATION has been set to a valid, non-stderr target, breaking the semantics of that setting.
As a consequence, these logging calls do not get emitted at all. This matches behaviour of the calls to fi_param* before setting log_mask, which also do not emit logs, even prior to this PR.

As for the line indentation: I adjusted some indentations but overall could not find places where my changes use spaces instead of tabs. Is this an issue on my end or did I miss spots where I did in fact use spaces for indentation?

src/log.c Outdated
char path_buffer[PATH_MAX];
int len;

if (strncmp(locationstr, "stderr", 6) == 0 || *locationstr == '\0') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use strcmp instead. Reason: (1) for exact match; (2) the second string is constant with known length so the comparison will stop after 6 characters any way. Same for L126.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Changed in the updated commit.

@j-xiong
Copy link
Contributor

j-xiong commented Feb 23, 2026

Windows build failure is real:

  C:\projects\libfabric\src\log.c(43,1): fatal error C1083: Cannot open include file: 'libgen.h': No such file or directory

@j-xiong
Copy link
Contributor

j-xiong commented Feb 23, 2026

The CodeQL warning needs to be addressed as well.

@philippfriese philippfriese force-pushed the core_log_location branch 4 times, most recently from 0a43015 to bc3bd19 Compare February 24, 2026 17:16
@philippfriese
Copy link
Contributor Author

Thanks for the feedback @j-xiong! The updated commit moves away from fopen + chmod and to open with appropriate flags + fdopen, which resolves the CodeQL TOCTOU warnings. This also removed the requirement of dirname/libgen.h, clearing the previous Windows build issues.
In addition, I added a brief #ifdef _WIN32 which defines: PATH_MAX as MAX_PATH, mode_t as an uint32_t, as well as {fd}open as _{fd}open. These are not present/defined when using the Windows toolchain. Please let me know whether this is an appropriate way to address these compilation issues.

src/log.c Outdated
Comment on lines +52 to +57
#ifdef _WIN32
#define PATH_MAX MAX_PATH
typedef uint32_t mode_t;
#define open _open
#define fdopen _fdopen
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes should go into include/windows/osd.h

Copy link
Contributor Author

@philippfriese philippfriese Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, moved in updated commit. Also added _close redefinition needed below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding #define close _close to include/windows/osd.h breaks several OFI components which rely on the close keyword (see https://ci.appveyor.com/project/ofiwg/libfabric/builds/53592941). I presume something similar could happen with open as well.
Would it make sense to keep the PATH_MAX and mode_t definitions in osd.h and keep a separate #ifdef _WIN32 in log.c for the respective file-descriptor calls? This would prevent this rather generic redefinition from spreading to undesired places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining open and close as static inline functions should work.

goto error_log_location;
log_location = fdopen(fd, "r+");
if (log_location == NULL)
goto error_log_location;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should close the opened fd when fdopen fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks! Addressed in the updated commit.

@philippfriese philippfriese force-pushed the core_log_location branch 3 times, most recently from a3177a3 to 243cc67 Compare February 25, 2026 16:17
return posix_memalign(memptr, alignment, size);
}

static inline int ofi_osd_close(int fd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use ofi_close and ofi_open.

@philippfriese philippfriese force-pushed the core_log_location branch 2 times, most recently from d53801e to 2efab06 Compare February 26, 2026 12:57
@philippfriese
Copy link
Contributor Author

@j-xiong Thank you for the prompt feedback! I moved to ofi_{open,close}.
The implementation of ofi_open is somewhat more involved than expected: open takes a variadic argument mode if certain flags are set. However, it does not take a va_list. The added code checks, for each OS, whether mode is expected (for Linux: O_CREAT or O_TMPFILE are set, the rest: O_CREAT is set), and if so extracts it from the va_list and calls open with it.

Comment on lines +80 to +99
static inline int ofi_open(const char* path, int flags, ...)
{
// open takes a variadic argument 'mode_t mode', but not a va_list
// extract and pass mode if required, as indicated by flag O_CREAT
va_list args;
int fd;
mode_t mode;

if ((flags & O_CREAT) != 0) {
va_start(args, flags);
mode = (mode_t)va_arg(args, int);
fd = open(path, flags, mode);
va_end(args);
} else {
fd = open(path, flags);
}

return fd;
}

Copy link
Contributor

@j-xiong j-xiong Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything wrong with the following?

static inline int ofi_open(const char *path, int flags, ...)
{
        int ret; 
        va_list args;

        va_start(args, flags);
        ret = open(path, flags, args);
        va_end(args);

        return ret;
}

At least it works fine on Linux.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using this program to test the ofi_open implementation with va_list, please let me know what the errors you see on your platforms.

#include <stdio.h>
#include <stdarg.h>
#include <fcntl.h>
#include <unistd.h>

static inline int ofi_open(const char *path, int flags, ...)
{
        int ret;
        va_list args;

        va_start(args, flags);
        ret = open(path, flags, args);
        va_end(args);

        return ret;
}

int main()
{
        int fd;

        fd = ofi_open("test.txt", O_CREAT, 0666);
        if (fd < 0) {
                perror("open new");
                return 1;
        }
        printf("File created successfully with fd: %d\n", fd);
        close(fd);

        fd = ofi_open("test.txt", O_RDONLY);
        if (fd < 0) {
                perror("open existing");
                return 1;
        }
        printf("File opened successfully with fd: %d\n", fd);
        close(fd);

        return 0;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On an arm64-based macOS 24, I can both compile and successfully execute this snippet, on an x86-based Ubuntu 24.04 I can compile, but not successfully execute due to a "Permission denied" on the second ofi_open call. In both cases however, the provided mode 0666 is invalid:

macOS arm64:

% ./main.o
File created successfully with fd: 3
File opened successfully with fd: 3
% ls -lah test.txt
-rwx------  1 pfriese  wheel     0B 28 Feb 08:15 test.txt

Ubuntu x86:

% ./main.out
File created successfully with fd: 3
open existing: Permission denied
% ls -lah test.txt
---x--S---  1 friese tumuser    0 Feb 28 08:11 test.txt

Repeated invocation on Ubuntu yields random modes on the created file. I believe that the reason for this is that open expects a proper variadic argument, and not a va_list.
If my understanding is correct, then the involved va_arg call in the open code path extracts the variable argument from the stack (or from registers containing the function arguments). In the above case (and my initial implementation), that contains the va_list structure, which contains pointers to where mode is, but not mode itself.
I assume the consistent (but still wrong) result on ARM/Darwin, as opposed to the seemingly random behaviour on x86/Linux, is due to different behaviour of va_arg / different function argument passing on these platforms.

I toyed with always extracting the mode parameter in ofi_open and always passing it on, but ultimately decided to use the commited implementation given that va_arg is UB if no variable argument is given:

va_arg()
[...] If there is no next argument [...], random errors will occur. [man va_arg]

glibc handles this by only extracting mode if the __OPEN_MEEDS_MODE macro succeeds, which checks against O_CREAT and, if available, against O_TMPFILE. This way, "random errors" will only occur if the function has been called incorrectly. The man-page for open acknowledges this behaviour on incorrect usage:

The mode argument must be supplied if O_CREAT or O_TMPFILE is specified in flags; if it is not supplied, some arbitrary bytes from the stack will be applied as the file mode. [man 2 open]

Checking the glib-implementation of open, I noticed that it always calls open with mode, which is either zero-initialised or populated via va_arg. I will push a new commit with an updated implementation that matches this approach.

Copy link
Contributor

@j-xiong j-xiong Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation.

Thinking it more, how useful is setting the permissions of the log files? If only taking the default permissions, we can get away with using fopen and fclose only, which don't need wrappers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about doing that. The default permission is 0666 & umask, which equates to 0644 on most platforms. I would assume that not everyone wants their libfabric logs to be globally readable by default, especially on multi-tenant systems.
Users can of course adjust their umask prior to launching libfabric, but given that we are a library, this would imply that users automatically change the behaviour of the entire application, which might be prohibitive or at least inconvenient in some cases.

I also thought about exposing control of the umask of just the libfabric log via something like log_location_umask, but that makes for a rather awkward interface and I would expect users to rightfully ask why they cannot set the mode directly.

One way to avoid the open wrappers is to statically set the umask to 0077 while creating the log file so that it gets created with a mode of 0600. This would mean that users would have to manually chmod their log files in a second step if they want something else. The implementation would be simpler, but users would have less options.
Let me know how to proceed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

Could you do a rebase so that AWS CI can pass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed commits with updated implementation of wrappers (now matches the glibc implementation linked above) and rebased against current main.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have the user create and restrict a directory and not bother with file level permissions?

@j-xiong
Copy link
Contributor

j-xiong commented Feb 27, 2026

bot:aws:retest

@shijin-aws
Copy link
Contributor

bot:aws:retest

@j-xiong it is the same problem as the hook PR, this PR needs to be rebased to resolve AWS CI failure

@j-xiong
Copy link
Contributor

j-xiong commented Feb 28, 2026

@shijin-aws Thanks!

Comment on lines +93 to +94
fd = open(path, flags, mode);
return fd;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use return open(path, flags, mode); and eliminate fd. Same for other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks! Changed in updated commit.

Windows deprecated the functions 'open', 'fdopen', and 'close' and
emits a C4996 compilation warning when used. Libfabric is built in SDL
mode, turning this warning into an error.

This commit adds 'ofi_{open,fdopen,close}', which on Unix wrap the
original libc functions, and on Windows wrap the Windows-variants
'_{open,fdopen,close}'.

Signed-off-by: Philipp Friese <philipp.friese@cit.tum.de>
Signed-off-by: Philipp Friese <philipp.friese@cit.tum.de>
This commit introduces the parameter 'log_location', which can be set to
alter the default log output to stderr (default), stdout, a file, or a
directory. 'log_location_mode' is introduced to control the file mode
of the log file.

Signed-off-by: Philipp Friese <philipp.friese@cit.tum.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants