Skip to content

Add file buffering capability to TFileCacheRead#146

Closed
bbockelm wants to merge 8 commits into
root-project:masterfrom
bbockelm:filebuffer
Closed

Add file buffering capability to TFileCacheRead#146
bbockelm wants to merge 8 commits into
root-project:masterfrom
bbockelm:filebuffer

Conversation

@bbockelm
Copy link
Copy Markdown
Contributor

This pull request adds an intermediate buffering mode between "normal ROOT IO" and the prefetching system. When enabled, it will cache a remote file to the local disk (uses the same logic as prefetching to determine what is "remote") for as long as it is opened and automatically cleans up afterward.

This is useful in cases where you want to hide the effects of network latency (for various use cases which work poorly with TTreeCache, such as when an unpredictable set of branches are used or non-sequential scans) but do not want to set aside a directory to use as a persistent cache or have a cache-unfriendly workflow.

The approach has been ported from CMSSW (there, it is called lazy-download) where it has been in use for several years.

@bbockelm
Copy link
Copy Markdown
Contributor Author

@pcanal - This is another one ready to review.

Comment thread io/io/src/TFileBufferRead.cxx Outdated
}


bool TFileBufferRead::cache(off_t start, off_t end) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function name need to be camel case.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all functions must have a doxygen documentation

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace off_t by a ROOT type?

@pcanal
Copy link
Copy Markdown
Member

pcanal commented Sep 12, 2016

Rather, the focus of this buffer is to hide the effects of network latency.

This is also the focus of the TTreeCache, we must extent the documentation to explain in which use case this is yet-another-caching/prefetch layer is beneficial (compare to the other ones include the TTreeCache, the one the remote protocol might provide (xrootd) or even the XNet.ReadCacheSize or the one behind TFile::OpenFromCache).

@bbockelm
Copy link
Copy Markdown
Contributor Author

Hi Philippe,

I think this technique is useful in the following cases (both assuming the file is larger than the TTC):

  1. We have a job reading the file's events out-of-order (i.e., randomly selecting events) and reading many events. This occurs when we have two open files and have to match events in file A with events in file B (and B's event order is very different than A).
  2. Cases where there is high latency and the events are read more than once.

(Forgot to mention - have unit tests ready to be posted once this is merged.)

Brian

Comment thread io/io/inc/TFileBufferRead.h Outdated
*
* On error, -1 is returned and errno is set appropriately.
*/
ssize_t pread(char *into, size_t n, off_t pos);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the file for naming convention (camel case for funciton name) spacing (indention 3 spaces)?

Thanks.

Comment thread io/io/inc/TFileBufferRead.h
Comment thread io/io/src/TFileBufferRead.cxx Outdated


bool TFileBufferRead::tmpfile(const std::string &tmpdir) {
std::string pattern(tmpdir);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using TSystem::TempFileName ?

Comment thread io/io/src/TFileBufferRead.cxx Outdated
size_t len = std::min(static_cast<off_t>(fSize - start), static_cast<off_t>(CHUNK_SIZE));
if (!fPresent[index]) {

void *window = mmap(0, len, PROT_READ | PROT_WRITE, MAP_SHARED, fFd, start);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wont compiles using the Microsoft compiles.
See interpreter/llvm/src/lib/Support/Unix/Memory.inc
vs interpreter/llvm/src/lib/Support/Windows/Memory.inc

For inspiration (maybe).

@vgvassilev
Copy link
Copy Markdown
Member

@bbockelm, could we add a gtest for this feature?

@bbockelm
Copy link
Copy Markdown
Contributor Author

bbockelm commented Jul 3, 2017

@vgvassilev - is there a short guide on writing gtests within ROOT? I just barely got caught up with doing roottest stuff...

@vgvassilev
Copy link
Copy Markdown
Member

@bbockelm, it looks like your question was snowed under by our bot.

You could look at io/io/test/. It is just adding a file. A test I recently added could be found here

@pcanal
Copy link
Copy Markdown
Member

pcanal commented Aug 23, 2017

How does this relates and/or superseed the local caching from TFilePrefetch?

@bbockelm
Copy link
Copy Markdown
Contributor Author

bbockelm commented Sep 6, 2017

@pcanal - the local caching mechanism in TFilePrefetch is permanent cache -- it lives on the local disk beyond the lifetime of the actual process.

This is a buffer -- it uses the system memory (i.e., the page cache and, eventually, the filesystem) to pull the data locally in large, storage-friendly reads. This allows us to do latency hiding for the duration of the running process.

It does not cache the data beyond the lifetime of the running process. It is much simpler than the TFilePrefetch mechanism as the buffering is only done on demand.

Overall, it's a pretty useful improvement - a modest improvement, admittedly - that we've used for many years in CMS.

@pcanal
Copy link
Copy Markdown
Member

pcanal commented Sep 10, 2017

This is a buffer -- it uses the system memory (i.e., the page cache and, eventually, the filesystem) to pull the data locally in large, storage-friendly reads.

This sounds like functionally it overlaps with the TTreeCache. How it is different? How are they playing along?

@bbockelm
Copy link
Copy Markdown
Contributor Author

This captures a wider set of use cases than TTreeCache, but doesn't perform as well as TTreeCache for the cases TTC is designed for.

  • TTC works best when most of the IO is walking sequentially through the tree, reading from non-local storage, with approximately the same branches read for each event. TFileCacheRead will handle this case OK - but not as well as TTC.
  • TFileCacheRead will perform better when the event access is non-sequential, when branch access patterns are not predictable, or when non-TTree data is used.

Comment thread io/io/inc/TFileCacheRead.h Outdated

Bool_t fAsyncReading;
Bool_t fEnablePrefetching;///< reading by prefetching asynchronously
bool fEnableBuffering {false}; ///< buffer remote files on local disk
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the initialization style consistent.

Comment thread io/io/src/TFileBufferRead.cxx Outdated
if (pattern.empty()) {
pattern = "/tmp";
}
pattern += "/cmssw-shadow-XXXXXX";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmssw? :)

//initialise the prefetch object and set the cache directory
// start the thread only if the file is not local
fEnablePrefetching = gEnv->GetValue("TFile.AsyncPrefetching", 0);
fEnableBuffering = gEnv->GetValue("TFile.LocalBuffer", 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add doc/example in config/rootrc.in

@xvallspl
Copy link
Copy Markdown
Contributor

xvallspl commented Dec 7, 2017

@bbockelm Hi, Brian. This should be an easy error to fix :)

Or is this ongoing work? I will tag it as donotmerge if so.

@etejedor
Copy link
Copy Markdown
Contributor

Hi @bbockelm, do you still want to go for this?

@etejedor
Copy link
Copy Markdown
Contributor

Hi @bbockelm , do you want to close this one or is it ongoing work?

TFileBufferRead is meant to provide a local-disk buffer for a remote
file.  It aims to provide protection against low performance on
high-latency links while being simpler than various caching and
prefetching approaches.

This version is a WIP as it is not yet integrated with TFileCacheRead.
Now, when TFile.LocalBuffer is set in the root environment, we
will buffer the remote file on local disk.
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@root-project root-project deleted a comment from phsft-bot May 29, 2025
@guitargeek guitargeek marked this pull request as ready for review May 29, 2025 07:10
@guitargeek guitargeek marked this pull request as draft May 29, 2025 07:11
@guitargeek
Copy link
Copy Markdown
Contributor

Hi @bbockelm! I'm closing this PR because of inactivity. Please feel free to reopen if this is still a relevant development for you, solve the merge conflicts, and move the test from root-project/roottest#243 to this main PR, since the roottest repository has become a subdirectory of the main repo.

If you reopen this PR and do these updates, we'll make sure we'll wrap it up quickly. Thanks for your understanding!

@guitargeek guitargeek closed this May 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.