Add file buffering capability to TFileCacheRead #146
@pcanal - This is another one ready to review.
bool TFileBufferRead::cache(off_t start, off_t end) {
Function names need to be camel case.
All functions must have Doxygen documentation.
Can you replace off_t by a ROOT type?
This is also the focus of the TTreeCache. We must extend the documentation to explain in which use cases this yet-another caching/prefetching layer is beneficial, compared to the other ones: the TTreeCache, the one the remote protocol might provide (xrootd), or even the XNet.ReadCacheSize or the one behind TFile::OpenFromCache.
Hi Philippe, I think this technique is useful in the following cases (both assuming the file is larger than the TTC):
(Forgot to mention - I have unit tests ready to be posted once this is merged.) Brian
 *
 * On error, -1 is returned and errno is set appropriately.
 */
ssize_t pread(char *into, size_t n, off_t pos);
Could you update the file for the naming convention (camel case for function names) and spacing (3-space indentation)?
Thanks.
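The `pread()` declaration in the diff documents the usual POSIX convention: return the number of bytes read, or -1 with `errno` set. As a minimal sketch of a helper honoring that contract on top of POSIX `::pread` (the name `ReadFully` and the retry policy are assumptions for illustration, not code from this PR), a caller can loop over short reads and retry on `EINTR`:

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: read up to n bytes at offset pos from fd into `into`.
// Returns the number of bytes read (fewer than n only at end of file),
// or -1 with errno set on error; a partial count is discarded on error.
ssize_t ReadFully(int fd, char *into, size_t n, off_t pos)
{
   size_t total = 0;
   while (total < n) {
      ssize_t got = ::pread(fd, into + total, n - total, pos + static_cast<off_t>(total));
      if (got < 0) {
         if (errno == EINTR)
            continue; // interrupted by a signal: retry the same request
         return -1;   // errno already set by ::pread
      }
      if (got == 0)
         break; // end of file
      total += static_cast<size_t>(got);
   }
   return static_cast<ssize_t>(total);
}
```

Looping matters because POSIX allows `pread` to transfer fewer bytes than requested before end of file, for example when a signal arrives after a partial transfer.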
bool TFileBufferRead::tmpfile(const std::string &tmpdir) {
std::string pattern(tmpdir);
Why not use TSystem::TempFileName?
size_t len = std::min(static_cast<off_t>(fSize - start), static_cast<off_t>(CHUNK_SIZE));
if (!fPresent[index]) {
void *window = mmap(0, len, PROT_READ | PROT_WRITE, MAP_SHARED, fFd, start);
This won't compile with the Microsoft compiler.
See interpreter/llvm/src/lib/Support/Unix/Memory.inc vs. interpreter/llvm/src/lib/Support/Windows/Memory.inc for inspiration (maybe).
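One portable alternative to `mmap()` for filling a chunk of the local buffer file is a positioned write through standard C++ file streams, which builds with MSVC as well as GCC and Clang. This sidesteps memory mapping entirely rather than reproducing the LLVM `Memory.inc` approach. A minimal sketch, assuming C++17 `<filesystem>` is available and that the buffer file is pre-sized to the remote file's length; `CreateBuffer`, `WriteChunk` and `ReadChunk` are hypothetical names, not from the PR, and a real implementation would keep the stream open instead of reopening it per chunk:

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: create (or truncate) the local buffer file and size it to
// `size` bytes. Extending a file with resize_file fills the new bytes with zeros.
bool CreateBuffer(const std::string &path, std::uintmax_t size)
{
   {
      std::ofstream create(path, std::ios::binary | std::ios::trunc);
      if (!create)
         return false;
   } // stream closed here, before resizing
   std::error_code ec;
   std::filesystem::resize_file(path, size, ec);
   return !ec;
}

// Hypothetical helper: copy `data` into the buffer file at byte offset `start`.
// Opening with in|out requires the file to exist already (see CreateBuffer).
bool WriteChunk(const std::string &path, std::size_t start, const std::vector<char> &data)
{
   std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
   if (!f)
      return false;
   f.seekp(static_cast<std::streamoff>(start), std::ios::beg);
   f.write(data.data(), static_cast<std::streamsize>(data.size()));
   f.flush();
   return static_cast<bool>(f);
}

// Hypothetical helper: read up to `len` bytes starting at `start`; the result is
// shorter than `len` if the file ends first.
std::vector<char> ReadChunk(const std::string &path, std::size_t start, std::size_t len)
{
   std::vector<char> out(len);
   std::ifstream f(path, std::ios::binary);
   f.seekg(static_cast<std::streamoff>(start), std::ios::beg);
   f.read(out.data(), static_cast<std::streamsize>(len));
   out.resize(static_cast<std::size_t>(f.gcount()));
   return out;
}
```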
@bbockelm, could we add a gtest for this feature?
@vgvassilev - is there a short guide on writing gtests within ROOT? I just barely got caught up with doing
How does this relate to and/or supersede the local caching from TFilePrefetch?
@pcanal - the local caching mechanism in This is a buffer: it uses the system memory (i.e., the page cache and, eventually, the filesystem) to pull the data locally in large, storage-friendly reads. This allows us to do latency hiding for the duration of the running process. It does not cache the data beyond the lifetime of the running process. It is much simpler than the Overall, it's a pretty useful improvement (a modest improvement, admittedly) that we've used for many years in CMS.
This sounds like it functionally overlaps with the TTreeCache. How is it different? How do they play together?
This captures a wider set of use cases than
Bool_t fAsyncReading;
Bool_t fEnablePrefetching;///< reading by prefetching asynchronously
bool fEnableBuffering {false}; ///< buffer remote files on local disk
Keep the initialization style consistent.
if (pattern.empty()) {
   pattern = "/tmp";
}
pattern += "/cmssw-shadow-XXXXXX";
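The `XXXXXX` suffix is the template format expected by POSIX `mkstemp()`, which replaces it in place and creates the file atomically. Inside ROOT, `TSystem::TempFileName` (as suggested in the review) would be the portable route; outside ROOT, a minimal POSIX sketch of the whole step could look like the following. It assumes, which the PR does not state, that the automatic cleanup is done by unlinking the file right after creation, so its space is reclaimed once the descriptor is closed, even if the process crashes. `OpenShadowFile` is a hypothetical name:

```cpp
#include <cstdlib>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical helper: create a private scratch file under `tmpdir` (falling back
// to /tmp, as in the diff) and return its descriptor, or -1 with errno set.
int OpenShadowFile(const std::string &tmpdir)
{
   std::string pattern(tmpdir.empty() ? "/tmp" : tmpdir);
   pattern += "/cmssw-shadow-XXXXXX";

   // mkstemp() rewrites the XXXXXX in place, so it needs a writable,
   // NUL-terminated buffer rather than std::string::c_str().
   std::vector<char> name(pattern.begin(), pattern.end());
   name.push_back('\0');

   int fd = ::mkstemp(name.data());
   if (fd < 0)
      return -1;

   // Assumption: drop the directory entry immediately; the data stays reachable
   // through fd and is released by the kernel when the last descriptor closes.
   ::unlink(name.data());
   return fd;
}
```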
//initialise the prefetch object and set the cache directory
// start the thread only if the file is not local
fEnablePrefetching = gEnv->GetValue("TFile.AsyncPrefetching", 0);
fEnableBuffering = gEnv->GetValue("TFile.LocalBuffer", 0);
Please add documentation and an example in config/rootrc.in.
@bbockelm Hi, Brian. This should be an easy error to fix :) Or is this ongoing work? I will tag it as donotmerge if so.
Hi @bbockelm, do you still want to go for this? |
Hi @bbockelm, do you want to close this one, or is it ongoing work?
TFileBufferRead is meant to provide a local-disk buffer for a remote file. It aims to provide protection against low performance on high-latency links while being simpler than various caching and prefetching approaches. This version is a WIP as it is not yet integrated with TFileCacheRead.
Now, when TFile.LocalBuffer is set in the ROOT environment, we will buffer the remote file on local disk.
Hi @bbockelm! I'm closing this PR because of inactivity. Please feel free to reopen if this is still a relevant development for you, solve the merge conflicts, and move the test from root-project/roottest#243 to this main PR, since the roottest repository has become a subdirectory of the main repo. If you reopen this PR and do these updates, we'll make sure we'll wrap it up quickly. Thanks for your understanding!
This pull request adds an intermediate buffering mode between "normal ROOT IO" and the prefetching system. When enabled, it will cache a remote file to the local disk (uses the same logic as prefetching to determine what is "remote") for as long as it is opened and automatically cleans up afterward.
This is useful when you want to hide the effects of network latency (for use cases that work poorly with TTreeCache, such as when an unpredictable set of branches is read, or non-sequential scans) but do not want to set aside a directory for a persistent cache, or have a cache-unfriendly workflow. The approach has been ported from CMSSW (where it is called lazy-download), where it has been in use for several years.
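The lazy-download idea can be illustrated independently of ROOT: split the file into fixed-size chunks, keep one "present" flag per chunk (like `fPresent` and `CHUNK_SIZE` in the diff), and on each read fetch only the chunks that are still missing before serving the bytes locally. A minimal sketch under stated assumptions: the remote file is modeled by a caller-supplied fetch callback, the local buffer is an in-memory vector standing in for the temporary file, and `LazyBuffer` and its members are hypothetical names, not the PR's `TFileBufferRead`:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical lazy-download buffer: each chunk is fetched from the remote source
// on first access and served from the local copy afterwards.
class LazyBuffer {
public:
   // fetch(offset, length, dest) must copy `length` bytes of the remote file,
   // starting at `offset`, into `dest`.
   using Fetch = std::function<void(std::size_t, std::size_t, char *)>;

   // chunkSize must be greater than zero.
   LazyBuffer(std::size_t size, std::size_t chunkSize, Fetch fetch)
      : fSize(size), fChunkSize(chunkSize), fFetch(std::move(fetch)), fData(size),
        fPresent((size + chunkSize - 1) / chunkSize, false)
   {
   }

   // Copy up to n bytes starting at pos into `into`; returns the number of bytes copied.
   std::size_t Read(char *into, std::size_t n, std::size_t pos)
   {
      if (pos >= fSize || n == 0)
         return 0;
      n = std::min(n, fSize - pos);
      const std::size_t first = pos / fChunkSize;
      const std::size_t last = (pos + n - 1) / fChunkSize;
      for (std::size_t index = first; index <= last; ++index) {
         if (!fPresent[index]) {
            const std::size_t start = index * fChunkSize;
            // The final chunk may be shorter than fChunkSize.
            const std::size_t len = std::min(fSize - start, fChunkSize);
            fFetch(start, len, fData.data() + start);
            fPresent[index] = true;
         }
      }
      const auto begin = fData.begin() + static_cast<std::ptrdiff_t>(pos);
      std::copy(begin, begin + static_cast<std::ptrdiff_t>(n), into);
      return n;
   }

private:
   std::size_t fSize;
   std::size_t fChunkSize;
   Fetch fFetch;
   std::vector<char> fData;    // stands in for the local temporary file
   std::vector<bool> fPresent; // one flag per chunk
};
```

Repeated reads of already-fetched chunks never go back to the remote source, regardless of the order in which offsets are requested.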