30 changes: 20 additions & 10 deletions API-INTERNAL.md
@@ -79,12 +79,17 @@ If the requested key is a collection, it will return an object with all the coll
<dd><p>Remove a key from Onyx and update the subscribers</p>
</dd>
<dt><a href="#retryOperation">retryOperation()</a></dt>
<dd><p>Handles storage operation failures based on the error type:</p>
<dd><p>Handles storage operation failures based on the error class (see lib/storage/errors.ts).
The connection layer (createStore) owns connection/transport recovery; this operation layer owns
capacity recovery (eviction) so that a given failure is retried by exactly one layer:</p>
<ul>
<li>Storage capacity errors: evicts data and retries the operation</li>
<li>Invalid data errors: logs an alert and throws an error</li>
<li>Non-retriable errors: logs an alert and resolves without retrying</li>
<li>Other errors: retries the operation</li>
<li>INVALID_DATA: logs an alert and throws (the same data will always fail).</li>
<li>TRANSIENT / FATAL: the connection layer already retried (transient) or exhausted its heal budget
and alerted (fatal). Retrying here would only re-amplify, so we skip the write quietly.</li>
<li>CAPACITY: evicts the least recently accessed evictable key and retries, under a session-level
circuit breaker (see lib/StorageCircuitBreaker.ts) that halts the loop once eviction stops making
progress or failures storm — the per-operation budget alone cannot stop a session-wide storm.</li>
<li>UNKNOWN: bounded retry.</li>
</ul>
</dd>
<dt><a href="#broadcastUpdate">broadcastUpdate()</a></dt>
@@ -318,11 +323,16 @@ Remove a key from Onyx and update the subscribers
<a name="retryOperation"></a>

## retryOperation()
Handles storage operation failures based on the error type:
- Storage capacity errors: evicts data and retries the operation
- Invalid data errors: logs an alert and throws an error
- Non-retriable errors: logs an alert and resolves without retrying
- Other errors: retries the operation
Handles storage operation failures based on the error class (see lib/storage/errors.ts).
The connection layer (createStore) owns connection/transport recovery; this operation layer owns
capacity recovery (eviction) so that a given failure is retried by exactly one layer:
- INVALID_DATA: logs an alert and throws (the same data will always fail).
- TRANSIENT / FATAL: the connection layer already retried (transient) or exhausted its heal budget
and alerted (fatal). Retrying here would only re-amplify, so we skip the write quietly.
- CAPACITY: evicts the least recently accessed evictable key and retries, under a session-level
circuit breaker (see lib/StorageCircuitBreaker.ts) that halts the loop once eviction stops making
progress or failures storm — the per-operation budget alone cannot stop a session-wide storm.
- UNKNOWN: bounded retry.

**Kind**: global function
<a name="broadcastUpdate"></a>
138 changes: 89 additions & 49 deletions lib/OnyxUtils.ts
@@ -6,8 +6,9 @@ import * as Logger from './Logger';
import type Onyx from './Onyx';
import cache, {TASK} from './OnyxCache';
import OnyxKeys from './OnyxKeys';
import * as Str from './Str';
import StorageCircuitBreaker from './StorageCircuitBreaker';
import Storage from './storage';
import {StorageErrorClass, classifyStorageError} from './storage/errors';
import type {
CollectionKeyBase,
ConnectOptions,
@@ -49,26 +50,6 @@ const METHOD = {
CLEAR: 'clear',
} as const;

// IndexedDB errors that indicate storage capacity issues where eviction can help
const IDB_STORAGE_ERRORS = [
'quotaexceedederror', // Browser storage quota exceeded
] as const;

// SQLite errors that indicate storage capacity issues where eviction can help
const SQLITE_STORAGE_ERRORS = [
'database or disk is full', // Device storage is full
] as const;

const STORAGE_ERRORS = [...IDB_STORAGE_ERRORS, ...SQLITE_STORAGE_ERRORS];

// IndexedDB errors where retrying is futile because the underlying connection/store is broken.
// The healing path (separate from retryOperation) is responsible for recovery.
const IDB_NON_RETRIABLE_ERRORS = [
'internal error opening backing store', // LevelDB backing store is broken at the filesystem level
] as const;

const NON_RETRIABLE_ERRORS = [...IDB_NON_RETRIABLE_ERRORS];

// Max number of retries for failed storage operations
const MAX_STORAGE_OPERATION_RETRY_ATTEMPTS = 5;

@@ -425,7 +406,9 @@ function multiGet<TKey extends OnyxKey>(keys: CollectionKeyBase[]): Promise<Map<
* Note: just using `.map`, you'd end up with `Array<OnyxCollection<Report>|OnyxEntry<string>>`, which is not what we want. This preserves the order of the keys provided.
*/
function tupleGet<Keys extends readonly OnyxKey[]>(keys: Keys): Promise<{[Index in keyof Keys]: OnyxValue<Keys[Index]>}> {
return Promise.all(keys.map((key) => get(key))) as Promise<{[Index in keyof Keys]: OnyxValue<Keys[Index]>}>;
return Promise.all(keys.map((key) => get(key))) as Promise<{
[Index in keyof Keys]: OnyxValue<Keys[Index]>;
}>;
}

/**
@@ -597,7 +580,10 @@ function keysChanged<TKey extends CollectionKeyBase>(
}

try {
lastConnectionCallbackData.set(subscriber.subscriptionID, {value: cachedCollection, matchedKey: subscriber.key});
lastConnectionCallbackData.set(subscriber.subscriptionID, {
value: cachedCollection,
matchedKey: subscriber.key,
});

if (subscriber.waitForCollectionCallback) {
subscriber.callback(cachedCollection, subscriber.key, partialCollection);
@@ -638,7 +624,10 @@ function keysChanged<TKey extends CollectionKeyBase>(
try {
const subscriberCallback = subscriber.callback as DefaultConnectCallback<TKey>;
subscriberCallback(cachedCollection[subscriber.key], subscriber.key as TKey);
lastConnectionCallbackData.set(subscriber.subscriptionID, {value: cachedCollection[subscriber.key], matchedKey: subscriber.key});
lastConnectionCallbackData.set(subscriber.subscriptionID, {
value: cachedCollection[subscriber.key],
matchedKey: subscriber.key,
});
} catch (error) {
Logger.logAlert(`[OnyxUtils.keysChanged] Subscriber callback threw an error for key '${collectionKey}': ${error}`);
}
@@ -709,15 +698,23 @@ function keyChanged<TKey extends OnyxKey>(
cachedCollection = getCachedCollection(subscriber.key);
cachedCollections[subscriber.key] = cachedCollection;
}
lastConnectionCallbackData.set(subscriber.subscriptionID, {value: cachedCollection, matchedKey: subscriber.key});
subscriber.callback(cachedCollection, subscriber.key, {[key]: value});
lastConnectionCallbackData.set(subscriber.subscriptionID, {
value: cachedCollection,
matchedKey: subscriber.key,
});
subscriber.callback(cachedCollection, subscriber.key, {
[key]: value,
});
continue;
}

const subscriberCallback = subscriber.callback as DefaultConnectCallback<TKey>;
subscriberCallback(value, key);

lastConnectionCallbackData.set(subscriber.subscriptionID, {value, matchedKey: key});
lastConnectionCallbackData.set(subscriber.subscriptionID, {
value,
matchedKey: key,
});
continue;
} catch (error) {
Logger.logAlert(`[OnyxUtils.keyChanged] Subscriber callback threw an error for key '${key}': ${error}`);
@@ -791,8 +788,11 @@ function remove<TKey extends OnyxKey>(key: TKey, isProcessingCollectionUpdate?:
function reportStorageQuota(error?: Error): Promise<void> {
return Storage.getDatabaseSize()
.then(({bytesUsed, bytesRemaining, usageDetails}) => {
// `bytesRemaining` comes from navigator.storage.estimate() and is an ORIGIN-WIDE estimate,
// not headroom for this database. The browser allocates IndexedDB storage dynamically, so a
// QuotaExceededError can legitimately occur even when this number still looks large.
Logger.logInfo(
`Storage Quota Check -- bytesUsed: ${bytesUsed} bytesRemaining: ${bytesRemaining}${
`Storage Quota Check -- bytesUsed: ${bytesUsed} originWideBytesRemaining (estimate, not per-DB headroom): ${bytesRemaining}${
usageDetails ? ` usageDetails: ${JSON.stringify(usageDetails)}` : ''
}. Original error: ${error}`,
);
@@ -803,11 +803,16 @@ function reportStorageQuota(error?: Error): Promise<void> {
}

/**
* Handles storage operation failures based on the error type:
* - Storage capacity errors: evicts data and retries the operation
* - Invalid data errors: logs an alert and throws an error
* - Non-retriable errors: logs an alert and resolves without retrying
* - Other errors: retries the operation
* Handles storage operation failures based on the error class (see lib/storage/errors.ts).
* The connection layer (createStore) owns connection/transport recovery; this operation layer owns
* capacity recovery (eviction) so that a given failure is retried by exactly one layer:
* - INVALID_DATA: logs an alert and throws (the same data will always fail).
* - TRANSIENT / FATAL: the connection layer already retried (transient) or exhausted its heal budget
* and alerted (fatal). Retrying here would only re-amplify, so we skip the write quietly.
* - CAPACITY: evicts the least recently accessed evictable key and retries, under a session-level
* circuit breaker (see lib/StorageCircuitBreaker.ts) that halts the loop once eviction stops making
* progress or failures storm — the per-operation budget alone cannot stop a session-wide storm.
* - UNKNOWN: bounded retry.
*/
function retryOperation<TMethod extends RetriableOnyxOperation>(
error: Error,
@@ -818,34 +823,50 @@ function retryOperation<TMethod extends RetriableOnyxOperation>(
): Promise<void> {
const currentRetryAttempt = retryAttempt ?? 0;
const nextRetryAttempt = currentRetryAttempt + 1;
const errorClass = classifyStorageError(error);

Logger.logInfo(`Failed to save to storage. Error: ${error}. onyxMethod: ${onyxMethod.name}. retryAttempt: ${currentRetryAttempt}/${MAX_STORAGE_OPERATION_RETRY_ATTEMPTS}`);
// Once the breaker is open, every capacity write is going to fail the same way. Drop it silently —
// the breaker already emitted its single alert, and logging per failed write is exactly the storm
// we are suppressing. (We return before the log line below on purpose.)
if (errorClass === StorageErrorClass.CAPACITY && StorageCircuitBreaker.isTripped()) {
return Promise.resolve();
}

if (error && Str.startsWith(error.message, "Failed to execute 'put' on 'IDBObjectStore'")) {
Logger.logInfo(
`Failed to save to storage. Error: ${error}. class: ${errorClass}. onyxMethod: ${onyxMethod.name}. retryAttempt: ${currentRetryAttempt}/${MAX_STORAGE_OPERATION_RETRY_ATTEMPTS}`,
);

if (errorClass === StorageErrorClass.INVALID_DATA) {
Logger.logAlert(`Attempted to set invalid data set in Onyx. Please ensure all data is serializable. Error: ${error}`);
throw error;
}

const errorMessage = error?.message?.toLowerCase?.();
const errorName = error?.name?.toLowerCase?.();
const isStorageCapacityError = STORAGE_ERRORS.some((storageError) => errorName?.includes(storageError) || errorMessage?.includes(storageError));
const isNonRetriableError = NON_RETRIABLE_ERRORS.some((nonRetriableError) => errorName?.includes(nonRetriableError) || errorMessage?.includes(nonRetriableError));

if (isNonRetriableError) {
Logger.logAlert(`Storage operation skipped retry for non-retriable error. Error: ${error}. onyxMethod: ${onyxMethod.name}.`);
if (errorClass === StorageErrorClass.TRANSIENT || errorClass === StorageErrorClass.FATAL) {
Logger.logInfo(`Storage operation skipped retry; ${errorClass} errors are handled by the connection layer. Error: ${error}. onyxMethod: ${onyxMethod.name}.`);
return Promise.resolve();
}

if (nextRetryAttempt > MAX_STORAGE_OPERATION_RETRY_ATTEMPTS) {
Logger.logAlert(`Storage operation failed after 5 retries. Error: ${error}. onyxMethod: ${onyxMethod.name}.`);
Logger.logAlert(`Storage operation failed after ${MAX_STORAGE_OPERATION_RETRY_ATTEMPTS} retries. Error: ${error}. onyxMethod: ${onyxMethod.name}.`);
return Promise.resolve();
}

if (!isStorageCapacityError) {
if (errorClass !== StorageErrorClass.CAPACITY) {
// UNKNOWN error — bounded retry without eviction.
// @ts-expect-error No overload matches this call.
return onyxMethod(defaultParams, nextRetryAttempt);
}

// CAPACITY: feed the session-level circuit breaker before evicting. The per-operation budget above
// cannot stop a session-wide storm — each evicted key triggers an OnyxDerived recompute that spawns
// a fresh write with its own budget — so the breaker is what actually halts the meltdown. (The
// already-open case returned silently at the top of this function.)
StorageCircuitBreaker.recordCapacityFailure();
if (StorageCircuitBreaker.isTripped()) {
// This failure tripped the breaker; it already emitted its single alert. Stop here.
return Promise.resolve();
}

// Find the least recently accessed evictable key that we can remove. Never evict an in-flight
// key — its cache value is the merge base this retry depends on, so dropping it would truncate
// the write to just the delta and diverge cache from storage.
@@ -858,9 +879,11 @@ function retryOperation<TMethod extends RetriableOnyxOperation>(
return reportStorageQuota(error);
}

// Remove the least recently accessed key and retry.
// Remove the least recently accessed key and retry. Tell the breaker we evicted so that, if the
// retry comes back as another capacity failure, it counts as a no-progress cycle.
Logger.logInfo(`Out of storage. Evicting least recently accessed key (${keyForRemoval}) and retrying. Error: ${error}`);
reportStorageQuota(error);
StorageCircuitBreaker.recordEviction();

P1: Clear pending eviction after a successful retry

When a capacity failure evicts a key and the subsequent retry succeeds, this flag is never cleared. The next quota error within the rolling window is therefore counted as a “no-progress” eviction, even though the previous eviction did make progress. Under intermittent quota pressure, where each eviction frees enough space, five such successful cycles will still trip the breaker, which then silently skips later storage writes for 60s. The pending result should be cleared on a successful retry, not only on the next capacity failure.
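
To make the suggestion concrete, here is a minimal sketch of a breaker that clears the pending-eviction flag on success. It is not the code in lib/StorageCircuitBreaker.ts: the class name, the `recordSuccess()` method, the injected clock, and the window size are assumptions for illustration. Only `isTripped()`, `recordCapacityFailure()`, `recordEviction()`, the count of five cycles, and the 60s pause come from the diff and the comment above.

```typescript
// Hypothetical sketch. The class name, recordSuccess(), the injected clock and the
// window size are assumptions; they are not taken from lib/StorageCircuitBreaker.ts.
class CircuitBreakerSketch {
    private pendingEviction = false;

    private noProgressTimestamps: number[] = [];

    private trippedUntil = 0;

    private readonly now: () => number;

    private readonly maxNoProgressCycles: number;

    private readonly windowMs: number;

    private readonly cooldownMs: number;

    constructor(now: () => number = Date.now, maxNoProgressCycles = 5, windowMs = 60000, cooldownMs = 60000) {
        this.now = now;
        this.maxNoProgressCycles = maxNoProgressCycles;
        this.windowMs = windowMs;
        this.cooldownMs = cooldownMs;
    }

    /** True while the breaker is open and capacity writes should be skipped. */
    isTripped(): boolean {
        return this.now() < this.trippedUntil;
    }

    /** Called right after a key was evicted and before the write is retried. */
    recordEviction(): void {
        this.pendingEviction = true;
    }

    /** Assumed hook: called when the retried write succeeds, so the eviction counts as progress. */
    recordSuccess(): void {
        this.pendingEviction = false;
    }

    /** Called on every capacity failure. Only a failure that follows an eviction is a no-progress cycle. */
    recordCapacityFailure(): void {
        if (!this.pendingEviction) {
            return;
        }
        this.pendingEviction = false;

        const timestamp = this.now();
        // Keep only the no-progress cycles that are still inside the rolling window.
        this.noProgressTimestamps = this.noProgressTimestamps.filter((previous) => timestamp - previous < this.windowMs);
        this.noProgressTimestamps.push(timestamp);

        if (this.noProgressTimestamps.length >= this.maxNoProgressCycles) {
            this.trippedUntil = timestamp + this.cooldownMs;
            this.noProgressTimestamps = [];
        }
    }
}
```

In this sketch the caller would invoke `recordSuccess()` from the success path of the retried write, for example in a `.then()` on the `onyxMethod(defaultParams, nextRetryAttempt)` promise. Where that hook belongs in the real code depends on how the retry promise is chained, which this excerpt does not show in full.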



// @ts-expect-error No overload matches this call.
return remove(keyForRemoval).then(() => onyxMethod(defaultParams, nextRetryAttempt));
@@ -1249,7 +1272,10 @@ function updateSnapshots<TKey extends OnyxKey>(data: Array<OnyxUpdate<TKey>>, me
const keysToCopy = new Set([...snapshotExistingKeys, ...allowedNewKeys]);
const newValue = typeof value === 'object' && value !== null ? utils.pick(value as Record<string, unknown>, [...keysToCopy]) : {};

updatedData = {...updatedData, [key]: Object.assign(oldValue, newValue)};
updatedData = {
...updatedData,
[key]: Object.assign(oldValue, newValue),
};
}

// Skip the update if there's no data to be merged
@@ -1387,7 +1413,13 @@ function multiSetWithRetry(data: OnyxMultiSetInput, retryAttempt?: number): Prom
// via a single batched keysChanged() call instead of one keyChanged() per member. For each
// collection, `partial` holds the new values being set and `previous` holds the cached values
// from before the set, which keysChanged() uses to skip subscribers whose value didn't change.
const collectionBatches = new Map<string, {partial: Record<string, OnyxValue<OnyxKey>>; previous: Record<string, OnyxValue<OnyxKey>>}>();
const collectionBatches = new Map<
string,
{
partial: Record<string, OnyxValue<OnyxKey>>;
previous: Record<string, OnyxValue<OnyxKey>>;
}
>();

for (const [key, value] of keyValuePairsToSet) {
// When we use multiSet to set a key we want to clear the current delta changes from Onyx.merge that were queued
@@ -1630,7 +1662,10 @@ function mergeCollectionWithPatches<TKey extends CollectionKeyBase>(
const keyValuePairsForNewCollection = prepareKeyValuePairsForStorage(newCollection, true);

// finalMergedCollection contains all the keys that were merged, without the keys of incompatible updates
const finalMergedCollection = {...existingKeyCollection, ...newCollection};
const finalMergedCollection = {
...existingKeyCollection,
...newCollection,
};

// Pre-warm cache for cache-miss existingKeys so cache.merge() merges the new delta into
// the real previous storage value. Fast path (all warm) skips the pre-warm to preserve
@@ -1679,7 +1714,12 @@ function mergeCollectionWithPatches<TKey extends CollectionKeyBase>(
retryOperation(
error,
mergeCollectionWithPatches,
{collectionKey, collection: resultCollection as OnyxMergeCollectionInput<TKey>, mergeReplaceNullPatches, isProcessingCollectionUpdate},
{
collectionKey,
collection: resultCollection as OnyxMergeCollectionInput<TKey>,
mergeReplaceNullPatches,
isProcessingCollectionUpdate,
},
retryAttempt,
inFlightKeys,
),