feat: add sync_max_age_days to email adapter#547
Conversation
When connecting an email account for the first time, every unread email in the inbox gets imported and treated as new. This floods the agent with stale messages it shouldn't respond to. Add a `sync_max_age_days` config option (default: 0 / no limit) that combines IMAP's UNSEEN flag with a SINCE date filter so only recent unread emails are imported. The SINCE query is evaluated server-side by the IMAP server, so it's efficient even on large mailboxes. Supported on both the default email config and per-instance configs.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
WalkthroughA new sync_max_age_days parameter is added to TOML schema and runtime config types, mapped during config load, threaded into EmailAdapter/EmailPollConfig, and used to optionally add IMAP SINCE filtering; tests and docs were updated accordingly. ChangesEmail sync config and polling
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Adds a sync_max_age_days configuration option to the email adapter to limit IMAP polling to recent unread emails, preventing first-connect inbox floods from importing long-stale UNSEEN messages.
Changes:
- Thread
sync_max_age_daysthrough TOML schema → loaded config types → adapter runtime config. - Update IMAP polling to use
UNSEEN SINCE <date>whensync_max_age_days > 0. - Update config test fixture to include the new field.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/messaging/email.rs |
Adds sync_max_age_days to polling config and builds UNSEEN + SINCE IMAP search queries during polling. |
src/config/types.rs |
Adds sync_max_age_days to EmailConfig and EmailInstanceConfig. |
src/config/toml_schema.rs |
Adds sync_max_age_days to TOML deserialization structs with default behavior. |
src/config/load.rs |
Threads sync_max_age_days through config loading for default and instance email configs. |
src/config.rs |
Updates config fixture to set sync_max_age_days: 0. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/messaging/email.rs`:
- Around line 723-726: The cast config.sync_max_age_days as i64 can wrap for
large u64 values; validate or safely convert before calling
ChronoDuration::days() to avoid producing a future SINCE date. Replace the blind
cast in the search_query construction by using
i64::try_from(config.sync_max_age_days) (or clamp to a sensible max) and handle
the Err by returning/config error or defaulting; ensure the conversion happens
before calling ChronoDuration::days() and reference config.sync_max_age_days,
ChronoDuration::days(), and the search_query/Utc::now() usage when locating the
change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b6b793cc-691d-47c5-85c2-9a862a5372dd
📒 Files selected for processing (5)
src/config.rssrc/config/load.rssrc/config/toml_schema.rssrc/config/types.rssrc/messaging/email.rs
Extract the UNSEEN SINCE <date> construction into build_since_date() and use it from both poll_inbox_once and build_imap_search_criterion. The helper clamps day counts to MAX_SINCE_DAYS (1_000_000) and returns None for values past chrono::TimeDelta's internal bounds, replacing the original 'u64 as i64' cast in poll_inbox_once that would wrap to a negative ChronoDuration for inputs past i64::MAX and produce a future SINCE date that silently excluded every message. Also adds 3 unit tests for build_since_date and updates the email-setup docs to document sync_max_age_days on the default config and per-instance configs.
|
@jamiepine — could you take a look when you get a chance? I ran a front-to-back adversarial review before opening this up. The design is sound; the only blocking issue was a
Two pre-existing things I noticed but left out of scope — happy to follow up either as separate PRs or fold them in if you'd rather keep them in this slice:
|
… semantics Addresses the remaining review comments on PR spacedriveapp#547: - Extract build_poll_search_query() so the UNSEEN / UNSEEN SINCE <date> assembly is unit-testable independently of the IMAP session. Add three tests covering zero, non-zero, and absurd input. - Add sync_max_age_days to the Debug impls for both EmailConfig and EmailInstanceConfig. The field is non-sensitive and omitting it made config dumps misleading when diagnosing polling behavior. - Update docs and PR description to describe the actual IMAP SINCE semantics precisely: inclusive midnight boundary, not a rolling 24h window. Treat the value as a backfill cap.
|
|
||
| Default is `0` (no limit), which preserves the original behavior. | ||
|
|
||
| **Semantics.** IMAP `SINCE` (RFC 3501 §6.4.4) is inclusive and matches against whole dates at midnight in the server's local time, not rolling 24-hour windows. `sync_max_age_days = 1` therefore bounds the *oldest* mail the poller will import to "received on or after yesterday's date", which can include mail up to ~48 hours old depending on the current time and the server's timezone. Treat the value as a backfill cap, not a literal "last N hours" window — pick a value that's comfortably larger than your real cutoff if you need to be strict (e.g. use 2 to approximate "last 24h"). |
There was a problem hiding this comment.
[🟡 Medium] [🔵 Bug]
The new semantics note says users should pick a value "comfortably larger" than their real cutoff to be strict, but larger sync_max_age_days values include older messages and therefore make the cutoff less strict; this can cause operators to ingest more stale unread mail than intended when following the docs. Update the guidance to describe choosing the oldest acceptable date (or explicitly call out the extra slack) without implying larger values are stricter.
// docs/content/docs/(messaging)/email-setup.mdx
Default is `0` (no limit), which preserves the original behavior.
**Semantics.** IMAP `SINCE` (RFC 3501 §6.4.4) is inclusive and matches against whole dates at midnight in the server's local time, not rolling 24-hour windows. `sync_max_age_days = 1` therefore bounds the *oldest* mail the poller will import to "received on or after yesterday's date", which can include mail up to ~48 hours old depending on the current time and the server's timezone. Treat the value as a backfill cap, not a literal "last N hours" window — pick a value that's comfortably larger than your real cutoff if you need to be strict (e.g. use 2 to approximate "last 24h").
## Verify it's working
Summary
Bound first-connect email backfill to a configurable recency window. Without this, the first poll after an email account is added imports every unread message in the configured folders, and the agent tries to respond to years of stale mail.
sync_max_age_daysdefaults to0(no limit) so existing setups are unchanged. When set, the poll query becomesUNSEEN SINCE <date>instead of justUNSEEN. TheSINCEfilter is evaluated server-side by the IMAP server, so the check is cheap on large mailboxes and no messages are fetched only to be discarded locally.Works on the default email config and on per-instance configs.
Problem
Connecting an email account to a mailbox that has accumulated hundreds of unread emails from months (or years) ago floods the agent with stale messages. Each one looks like a new conversation, and the agent responds to mail it has no business responding to.
The existing
email_searchtool already supports asince_daysfilter (seebuild_imap_search_criterioninsrc/messaging/email.rs), so the IMAP path is well-trodden. The poll path was the gap.Solution
Mirror the
SINCEclause construction inpoll_inbox_onceand gate it on a newsync_max_age_daysconfig field. Thread the field through the TOML schema, the loaded config types, and the adapter runtime config in the same way as the existing email knobs.Semantics
IMAP
SINCE(RFC 3501 §6.4.4) is inclusive and matches against whole dates at midnight in the server's local time, not rolling 24-hour windows.sync_max_age_days = 1therefore bounds the oldest mail the poller will import to "received on or after yesterday's date" — which can include mail up to ~48 hours old depending on the current time and the server's timezone. Treat the value as a backfill cap, not a literal "last N hours" window; if you need a strict rolling window, set a value comfortably larger than your real cutoff (e.g.2to approximate "last 24h").Implementation notes
build_since_date()helper used by bothpoll_inbox_onceandbuild_imap_search_criterion. Single source of truth for thedd-MMM-YYYYformat string and the overflow guard.build_poll_search_query()helper that assembles theUNSEENvsUNSEEN SINCE <date>query string. The poll-time assembly is unit-tested in isolation from the IMAP session.MAX_SINCE_DAYSclamp (1_000_000) insidebuild_since_date. Without it, a config value past chrono's internalTimeDeltabound would panic the poll task; a value pasti64::MAXwould wrap to a negativeChronoDurationand produce a futureSINCEdate, silently excluding every message and re-introducing the exact bug the feature is supposed to prevent. Both failure modes are now unreachable: oversized inputs degrade to "no SINCE clause" and the poller behaves as ifsync_max_age_days = 0.build_imap_search_criterionescaping behavior.sync_max_age_daysis included in theDebugimpls for bothEmailConfigandEmailInstanceConfigso config dumps reflect the actual polling state during debugging.Files changed
src/messaging/email.rsbuild_since_dateandbuild_poll_search_queryhelpers, plumbsync_max_age_daysthroughEmailPollConfig/EmailAdapter, use the helpers inpoll_inbox_once, add 3 new unit testssrc/config/toml_schema.rssync_max_age_days: u64toTomlEmailConfigandTomlEmailInstanceConfigwith#[serde(default)]src/config/types.rsEmailConfigandEmailInstanceConfig; include in bothDebugimplssrc/config/load.rssrc/config.rssync_max_age_days: 0docs/content/docs/(messaging)/email-setup.mdxSINCEsemanticsTest plan
cargo check --all-targets— clean compilecargo test --lib— 881 tests passing (875 baseline + 6 new)cargo clippy --all-targets -- -D warnings— no warningscargo fmt --all -- --check— cleanjust gate-pr-ci-fast— all gate checks passReviewer notes
_unusedMAX_SINCE_DAYScould be tightened to whatever value the team prefers — 1_000_000 days (~2739 years) was chosen as obviously-past-anything-sane without being so small it surprises users who think in decades.SINCEsemantics are documented precisely in both the in-code doc comment onbuild_poll_search_queryand the docs page. If the team would rather rename the field (e.g.backfill_max_age_days) or add a separatesince_window_hoursfor users who need rolling windows, that's a small follow-up.Out of scope (known follow-ups)
SINCEfilter narrows the scope of this bug but doesn't fix it. Worth a follow-up PR with a per-UID retry budget.from_instance_configrebuilds a temporaryEmailConfigto callSelf::build. Every new email config field has to be copied in this temp struct. Consider aFrom<&EmailInstanceConfig> for EmailConfigimpl.search_mailboxin the same file hand-builds anEmailPollConfigliteral; the new field is propagated there, but the broader divergence betweensearch_mailbox's literal andbuild()'s initialization is a pre-existing maintenance hazard.