feat: Added batching to feature server /push to offline store (#5683) #5729
Conversation
```python
from feast.repo_config import FeastConfigBaseModel


class OfflinePushBatchingConfig(FeastConfigBaseModel):
```
I think having a config is fine, but do we actually need it? We probably could have just passed these as optional args, no?
You're right, I just noticed the FeatureLoggingConfig with 5 fields and decided that for 3 fields it would be justified to also add a config. Do you want me to refactor it to use optional args?
> I think having a config is fine, but do we actually need it? We probably could have just passed these as optional args, no?
I refactored it as you wanted, so that there is no config. @franciscojavierarceo
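For illustration, a minimal sketch of what that refactor might look like, with the batching parameters passed as optional constructor arguments instead of a dedicated config model (the argument names and defaults here are assumptions, not the merged API):

```python
class OfflineWriteBatcher:
    """Buffers offline pushes and flushes them by size or interval (sketch only)."""

    def __init__(
        self,
        store,
        batch_size: int = 100,
        batch_interval_seconds: int = 60,
    ):
        # Previously these fields lived on an OfflinePushBatchingConfig model;
        # after the refactor they are plain optional arguments.
        self._store = store
        self._batch_size = batch_size
        self._batch_interval_seconds = batch_interval_seconds
        self._buffers = {}
```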
…dev#5683)
Signed-off-by: Jacob Weinhold <29459386+jfw-ppi@users.noreply.github.com>

fix: formatting, lint errors (feast-dev#5683)
Signed-off-by: Jacob Weinhold <29459386+jfw-ppi@users.noreply.github.com>
…push
Signed-off-by: Jacob Weinhold <29459386+jfw-ppi@users.noreply.github.com>
sdk/python/feast/feature_server.py (Outdated)
```python
return

batch_df = pd.concat(dfs, ignore_index=True)
self._buffers[key].clear()
```
Does it make sense to move the clear inside `try: self._store.push(...)` so that the buffer gets cleared only after the write succeeds?
Totally makes sense, it's done. Thanks for seeing that.
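A minimal sketch of the change discussed above, assuming a buffer-per-key batcher (the class shape and the push call are simplified, not the merged code): the buffer is cleared inside the `try` block only after the push succeeds, so a failed write keeps the rows around for the next flush attempt.

```python
import logging

import pandas as pd


class _OfflineWriteBatcherSketch:
    """Illustrative only; shows the clear-after-success ordering."""

    def __init__(self, store):
        self._store = store
        self._buffers: dict = {}

    def _flush(self, key, push_source_name: str) -> None:
        dfs = self._buffers.get(key, [])
        if not dfs:
            return

        batch_df = pd.concat(dfs, ignore_index=True)
        try:
            # Offline writes are synchronous, so call push directly.
            self._store.push(push_source_name, batch_df)
            # Clear only after the write succeeded; on failure the buffered
            # frames stay in place and can be retried on the next flush.
            dfs.clear()
        except Exception:
            logging.exception("Offline batch push failed; keeping buffered rows")
```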
sdk/python/feast/feature_server.py (Outdated)
```python
# NOTE: offline writes are currently synchronous only, so we call directly
try:
    self._store.push(
```
What about splitting `_flush_locked` into two methods: one that extracts the data (with the lock held) and one that does the I/O (without the lock)?
Something like:
- Extracting the batch data while holding the lock
- Releasing the lock before doing I/O
- Re-enqueueing the data if the write fails
Thanks for the suggestion!
I split `_flush_locked` into `_drain_locked` (extract under lock) and `_flush` (I/O without lock). I also added `_inflight` to prevent concurrent flushes per key. On failure the drained batch is re-enqueued so we don't drop data.
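A rough sketch of that split, with the lock held only while draining and the I/O done outside it (the method names mirror the comment above; the buffer shape, lock type, and re-enqueue details are assumptions):

```python
import logging
import threading

import pandas as pd


class _BatcherSketch:
    """Illustrative drain-under-lock / flush-without-lock split."""

    def __init__(self, store):
        self._store = store
        self._lock = threading.Lock()
        self._buffers: dict = {}
        self._inflight: set = set()

    def _drain_locked(self, key):
        # Caller must hold self._lock: take the buffered frames and mark the
        # key as in flight so concurrent flushes for the same key are skipped.
        if key in self._inflight or not self._buffers.get(key):
            return None
        dfs = self._buffers[key]
        self._buffers[key] = []
        self._inflight.add(key)
        return dfs

    def _flush(self, key, push_source_name: str) -> None:
        with self._lock:
            dfs = self._drain_locked(key)
        if dfs is None:
            return
        try:
            # I/O happens outside the lock so other requests can keep buffering.
            self._store.push(push_source_name, pd.concat(dfs, ignore_index=True))
        except Exception:
            logging.exception("Offline batch push failed; re-enqueueing batch")
            with self._lock:
                # Put the drained frames back so no data is dropped.
                self._buffers.setdefault(key, []).extend(dfs)
        finally:
            with self._lock:
                self._inflight.discard(key)
```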
```python
# use a multi-row payload to ensure we test non-trivial dfs
resp = client.post("/push", json=push_body_many(push_mode, count=2, id_start=100))
assert resp.status_code == 200
```
Optional, but I think it's good to return 202 when batching is enabled and offline writes are involved.
Nice catch! It's done.
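For reference, a tiny sketch of how the status code might be chosen (FastAPI status constants; the helper name and condition are assumptions). 202 Accepted signals that the offline rows were buffered for a later write rather than persisted synchronously.

```python
from fastapi import status


def _push_status_code(batching_enabled: bool, writes_offline: bool) -> int:
    # 202 Accepted: the offline write was buffered, not yet persisted.
    # 200 OK: the write completed synchronously within the request.
    if batching_enabled and writes_offline:
        return status.HTTP_202_ACCEPTED
    return status.HTTP_200_OK
```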
Pull request overview
This PR adds configurable batching support for offline writes to the feature server's /push endpoint. The batching mechanism buffers offline writes and flushes them based on either a size threshold or time interval, improving throughput for high-volume offline push operations.
Key Changes:
- Introduced `OfflineWriteBatcher` class that manages buffered writes in a background thread
- Added configuration options for batch size and interval in `BaseFeatureServerConfig`
- Modified `/push` endpoint logic to separate online and offline writes when batching is enabled
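As a rough sketch of that online/offline split (not the merged code; the `enqueue` method and control flow are assumptions), the handler might write online rows synchronously and hand offline rows to the batcher:

```python
import pandas as pd

from feast.data_source import PushMode


def handle_push_sketch(
    store, batcher, df: pd.DataFrame, push_source_name: str, push_mode: PushMode
):
    # Hypothetical helper: illustrates the online/offline split only.
    if push_mode in (PushMode.ONLINE, PushMode.ONLINE_AND_OFFLINE):
        # Online writes stay synchronous on the request path.
        store.push(push_source_name, df, to=PushMode.ONLINE)
    if push_mode in (PushMode.OFFLINE, PushMode.ONLINE_AND_OFFLINE):
        # Offline writes are buffered and flushed by size or interval.
        batcher.enqueue(push_source_name, df)
```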
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/python/feast/feature_server.py | Implemented OfflineWriteBatcher class and integrated batching logic into the /push endpoint |
| sdk/python/feast/infra/feature_servers/base_config.py | Added three new configuration fields for offline push batching |
| sdk/python/tests/unit/test_feature_server.py | Added comprehensive test coverage for batching behavior across different push modes and configurations |
| docs/reference/feature-store-yaml.md | Documented the new feature_server configuration block with batching options |
| docs/reference/feature-servers/python-feature-server.md | Added user-facing documentation explaining offline write batching functionality |
```python
allow_registry_cache = request.allow_registry_cache
transform_on_write = request.transform_on_write

# Async currently only applies to online store writes (ONLINE / ONLINE_AND_OFFLINE paths) as theres no async for offline store
```
Copilot AI (Dec 31, 2025)
Corrected spelling: 'theres' should be 'there's'.
```diff
- # Async currently only applies to online store writes (ONLINE / ONLINE_AND_OFFLINE paths) as theres no async for offline store
+ # Async currently only applies to online store writes (ONLINE / ONLINE_AND_OFFLINE paths) as there's no async for offline store
```
```python
    fs, enabled: bool = True, batch_size: int = 1, batch_interval_seconds: int = 60
):
    """
      Attach a minimal feature_server.offline_push_batching config
```
Copilot AI (Dec 31, 2025)
The docstring has inconsistent indentation. The closing triple quotes and the description line should be aligned with the opening triple quotes for standard formatting.
```suggestion
    Attach a minimal feature_server.offline_push_batching config
```
…-dev#5683](feast-dev#5683))
Signed-off-by: Jacob Weinhold <29459386+jfw-ppi@users.noreply.github.com>
ntkathole left a comment:
lgtm
What this PR does / why we need it:
Added batching configuration for the feature server's /push endpoint for offline store writes.
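As a usage sketch (the port, push source name, and payload values are illustrative, and a real payload must match your push source's schema), a client triggering an offline write against the Python feature server might look like:

```python
import requests

# Hypothetical example values; adjust to your push source and schema.
resp = requests.post(
    "http://localhost:6566/push",
    json={
        "push_source_name": "driver_stats_push_source",
        "df": {
            "driver_id": [1001, 1002],
            "conv_rate": [0.87, 0.91],
            "event_timestamp": ["2025-01-01T00:00:00", "2025-01-01T00:00:00"],
        },
        "to": "offline",
    },
    timeout=10,
)
# With offline batching enabled, the server may return 202 Accepted because
# the rows are buffered and written to the offline store later.
print(resp.status_code)
```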
Which issue(s) this PR fixes:
Fixes #5683