log: add log_location parameter to allow log redirection to file#11858
log: add log_location parameter to allow log redirection to file#11858philippfriese wants to merge 3 commits intoofiwg:mainfrom
Conversation
|
The line indentation is off -- using spaces instead of tabs. A library should never print without explicitly being enabled to do so. For all the library knows, stderr could have been closed and redirected to an application file. Use only logging functions to print errors, not direct calls to fprintf. |
2da5ff9 to
c1d1924
Compare
|
Thank you for the feedback @shefty. I have pushed an updated commit which moves the fprintf calls to the OFI logging macros. As for the line indentation: I adjusted some indentations but overall could not find places where my changes use spaces instead of tabs. Is this an issue on my end or did I miss spots where I did in fact use spaces for indentation? |
src/log.c
Outdated
| char path_buffer[PATH_MAX]; | ||
| int len; | ||
|
|
||
| if (strncmp(locationstr, "stderr", 6) == 0 || *locationstr == '\0') { |
There was a problem hiding this comment.
Better to use strcmp instead. Reason: (1) for exact match; (2) the second string is constant with known length so the comparison will stop after 6 characters any way. Same for L126.
There was a problem hiding this comment.
Thanks! Changed in the updated commit.
c1d1924 to
5565f71
Compare
|
Windows build failure is real: |
|
The CodeQL warning needs to be addressed as well. |
0a43015 to
bc3bd19
Compare
|
Thanks for the feedback @j-xiong! The updated commit moves away from |
src/log.c
Outdated
| #ifdef _WIN32 | ||
| #define PATH_MAX MAX_PATH | ||
| typedef uint32_t mode_t; | ||
| #define open _open | ||
| #define fdopen _fdopen | ||
| #endif |
There was a problem hiding this comment.
These changes should go into include/windows/osd.h
There was a problem hiding this comment.
Thanks, moved in updated commit. Also added _close redefinition needed below.
There was a problem hiding this comment.
Adding #define close _close to include/windows/osd.h breaks several OFI components which rely on the close keyword (see https://ci.appveyor.com/project/ofiwg/libfabric/builds/53592941). I presume something similar could happen with open as well.
Would it make sense to keep the PATH_MAX and mode_t definitions in osd.h and keep a separate #ifdef _WIN32 in log.c for the respective file-descriptor calls? This would prevent this rather generic redefinition from spreading to undesired places.
There was a problem hiding this comment.
Defining open and close as static inline functions should work.
| goto error_log_location; | ||
| log_location = fdopen(fd, "r+"); | ||
| if (log_location == NULL) | ||
| goto error_log_location; |
There was a problem hiding this comment.
Should close the opened fd when fdopen fails.
There was a problem hiding this comment.
Good catch, thanks! Addressed in the updated commit.
a3177a3 to
243cc67
Compare
include/unix/osd.h
Outdated
| return posix_memalign(memptr, alignment, size); | ||
| } | ||
|
|
||
| static inline int ofi_osd_close(int fd) |
There was a problem hiding this comment.
Just use ofi_close and ofi_open.
d53801e to
2efab06
Compare
|
@j-xiong Thank you for the prompt feedback! I moved to |
| static inline int ofi_open(const char* path, int flags, ...) | ||
| { | ||
| // open takes a variadic argument 'mode_t mode', but not a va_list | ||
| // extract and pass mode if required, as indicated by flag O_CREAT | ||
| va_list args; | ||
| int fd; | ||
| mode_t mode; | ||
|
|
||
| if ((flags & O_CREAT) != 0) { | ||
| va_start(args, flags); | ||
| mode = (mode_t)va_arg(args, int); | ||
| fd = open(path, flags, mode); | ||
| va_end(args); | ||
| } else { | ||
| fd = open(path, flags); | ||
| } | ||
|
|
||
| return fd; | ||
| } | ||
|
|
There was a problem hiding this comment.
Anything wrong with the following?
static inline int ofi_open(const char *path, int flags, ...)
{
int ret;
va_list args;
va_start(args, flags);
ret = open(path, flags, args);
va_end(args);
return ret;
}
At least it works fine on Linux.
There was a problem hiding this comment.
I am using this program to test the ofi_open implementation with va_list, please let me know what the errors you see on your platforms.
#include <stdio.h>
#include <stdarg.h>
#include <fcntl.h>
#include <unistd.h>
static inline int ofi_open(const char *path, int flags, ...)
{
int ret;
va_list args;
va_start(args, flags);
ret = open(path, flags, args);
va_end(args);
return ret;
}
int main()
{
int fd;
fd = ofi_open("test.txt", O_CREAT, 0666);
if (fd < 0) {
perror("open new");
return 1;
}
printf("File created successfully with fd: %d\n", fd);
close(fd);
fd = ofi_open("test.txt", O_RDONLY);
if (fd < 0) {
perror("open existing");
return 1;
}
printf("File opened successfully with fd: %d\n", fd);
close(fd);
return 0;
}
There was a problem hiding this comment.
On an arm64-based macOS 24, I can both compile and successfully execute this snippet, on an x86-based Ubuntu 24.04 I can compile, but not successfully execute due to a "Permission denied" on the second ofi_open call. In both cases however, the provided mode 0666 is invalid:
macOS arm64:
% ./main.o
File created successfully with fd: 3
File opened successfully with fd: 3
% ls -lah test.txt
-rwx------ 1 pfriese wheel 0B 28 Feb 08:15 test.txtUbuntu x86:
% ./main.out
File created successfully with fd: 3
open existing: Permission denied
% ls -lah test.txt
---x--S--- 1 friese tumuser 0 Feb 28 08:11 test.txt
Repeated invocation on Ubuntu yields random modes on the created file. I believe that the reason for this is that open expects a proper variadic argument, and not a va_list.
If my understanding is correct, then the involved va_arg call in the open code path extracts the variable argument from the stack (or from registers containing the function arguments). In the above case (and my initial implementation), that contains the va_list structure, which contains pointers to where mode is, but not mode itself.
I assume the consistent (but still wrong) result on ARM/Darwin, as opposed to the seemingly random behaviour on x86/Linux, is due to different behaviour of va_arg / different function argument passing on these platforms.
I toyed with always extracting the mode parameter in ofi_open and always passing it on, but ultimately decided to use the commited implementation given that va_arg is UB if no variable argument is given:
va_arg()
[...] If there is no next argument [...], random errors will occur. [man va_arg]
glibc handles this by only extracting mode if the __OPEN_MEEDS_MODE macro succeeds, which checks against O_CREAT and, if available, against O_TMPFILE. This way, "random errors" will only occur if the function has been called incorrectly. The man-page for open acknowledges this behaviour on incorrect usage:
The mode argument must be supplied if O_CREAT or O_TMPFILE is specified in flags; if it is not supplied, some arbitrary bytes from the stack will be applied as the file mode. [man 2 open]
Checking the glib-implementation of open, I noticed that it always calls open with mode, which is either zero-initialised or populated via va_arg. I will push a new commit with an updated implementation that matches this approach.
There was a problem hiding this comment.
Thanks for the explanation.
Thinking it more, how useful is setting the permissions of the log files? If only taking the default permissions, we can get away with using fopen and fclose only, which don't need wrappers.
There was a problem hiding this comment.
I also thought about doing that. The default permission is 0666 & umask, which equates to 0644 on most platforms. I would assume that not everyone wants their libfabric logs to be globally readable by default, especially on multi-tenant systems.
Users can of course adjust their umask prior to launching libfabric, but given that we are a library, this would imply that users automatically change the behaviour of the entire application, which might be prohibitive or at least inconvenient in some cases.
I also thought about exposing control of the umask of just the libfabric log via something like log_location_umask, but that makes for a rather awkward interface and I would expect users to rightfully ask why they cannot set the mode directly.
One way to avoid the open wrappers is to statically set the umask to 0077 while creating the log file so that it gets created with a mode of 0600. This would mean that users would have to manually chmod their log files in a second step if they want something else. The implementation would be simpler, but users would have less options.
Let me know how to proceed.
There was a problem hiding this comment.
Fair enough.
Could you do a rebase so that AWS CI can pass?
There was a problem hiding this comment.
Pushed commits with updated implementation of wrappers (now matches the glibc implementation linked above) and rebased against current main.
There was a problem hiding this comment.
Why not have the user create and restrict a directory and not bother with file level permissions?
|
bot:aws:retest |
@j-xiong it is the same problem as the hook PR, this PR needs to be rebased to resolve AWS CI failure |
|
@shijin-aws Thanks! |
2efab06 to
3969c4b
Compare
include/freebsd/osd.h
Outdated
| fd = open(path, flags, mode); | ||
| return fd; |
There was a problem hiding this comment.
Can use return open(path, flags, mode); and eliminate fd. Same for other places.
There was a problem hiding this comment.
Good catch, thanks! Changed in updated commit.
Windows deprecated the functions 'open', 'fdopen', and 'close' and
emits a C4996 compilation warning when used. Libfabric is built in SDL
mode, turning this warning into an error.
This commit adds 'ofi_{open,fdopen,close}', which on Unix wrap the
original libc functions, and on Windows wrap the Windows-variants
'_{open,fdopen,close}'.
Signed-off-by: Philipp Friese <philipp.friese@cit.tum.de>
Signed-off-by: Philipp Friese <philipp.friese@cit.tum.de>
This commit introduces the parameter 'log_location', which can be set to alter the default log output to stderr (default), stdout, a file, or a directory. 'log_location_mode' is introduced to control the file mode of the log file. Signed-off-by: Philipp Friese <philipp.friese@cit.tum.de>
3969c4b to
4ecd601
Compare
This PR adds the parameter
log_location/ the environment variabelFI_LOG_LOCATIONto the standard logging provider in to allow control of the output location of the libfabric log. Possible options are: stderr (default), stdout, as well as a path to a file or to a directory. If a directory is provided, a log file with formatofi_<pid>.logis used. Log files are created with the file mode specified vialog_location_mode/FI_LOG_LOCATION_MODE, defaulting to0600, and must not exist prior to launching libfabric[1].The use-case of this parameter concerns multi-process application cases, such as MPI applications. By default, libfabric writes to stderr, which is usually unbuffered. Even when stderr of an MPI application is redirected to per-process files, for example using Open MPIs
--output-filenameoption, then the libfabric log still contains interleaved lines, resulting in one MPI process' log containing a log line from another MPI process.When modifying stderr to be line-buffered, then libfabric log entries are no longer interleaved, but may be out-of-order, which may render certain provider outputs impossible to interpret correctly. An example is the output of the
ofi_hook_profileprovider, which relies on strict line ordering for its output sematic.While libfabric allows the logging provider to be extended and overwritten programmatically, this approach is not feasible for applications using another communication library on-top of libfabric, such as MPI applications.
The parameter introduced in this PR addresses this.
[1] This is to prevent libfabric from overwriting/modifying files via an accidentally or intentionally mis-crafted log location path.
Usage Example