feat: CATALYST-676 add generic otel setup by chanceaclark · Pull Request #2674 · bigcommerce/catalyst

chanceaclark · 2025-11-07T20:30:52Z

What/Why?

Adds OpenTelemetry instrumentation for Catalyst, enabling the collection of spans for Catalyst storefronts.

Testing

Confirmed that spans populate within Sentry traces.

Migration

This change adds new code only, so migrating is a matter of copying over core/instrumentation.ts and core/lib/otel/tracers.ts.

@chris-nowicki I took a stab at whipping up some documentation for you here:

Documentation

OpenTelemetry in Catalyst

Overview

Catalyst includes built-in OpenTelemetry instrumentation for monitoring your storefront's performance and behavior. OpenTelemetry provides distributed tracing, which helps you understand how requests flow through your application, identify performance bottlenecks, and debug issues in production.

This guide covers practical usage of OpenTelemetry within Catalyst, focusing on how to use traces to improve your storefront's performance.

What is OpenTelemetry?

OpenTelemetry is an open-source observability framework that provides a standardized way to collect telemetry data (traces, metrics, and logs) from your applications. It's platform-agnostic, meaning you can change your observability provider without changing your code.

Key concepts:

Trace: A complete journey of a request through your system
Span: A single unit of work within a trace (e.g., a database query, API call, or function execution)
Attributes: Key-value pairs that provide context about a span (e.g., route name, HTTP method)

Quick Setup

Catalyst uses the @vercel/otel package, which simplifies OpenTelemetry configuration. The setup is already complete in your project:

Dependencies installed: OpenTelemetry packages are in package.json
Instrumentation configured: See instrumentation.ts in the root of the core directory
Tracer available: Import from ~/lib/otel/tracer to create custom spans

For detailed setup instructions and collector configuration, refer to the official Next.js OpenTelemetry guide.

Environment Variables

Configure OpenTelemetry using environment variables:

# Optional: Set a custom service name (defaults to 'next-app')
OTEL_SERVICE_NAME=catalyst-storefront

# Optional: Enable verbose tracing to see all spans
NEXT_OTEL_VERBOSE=1

# Optional: Disable automatic fetch instrumentation
NEXT_OTEL_FETCH_DISABLED=1

Add these to your .env.local file or deployment environment.

Configuring Exporters

Exporters send telemetry data to observability backends where you can analyze traces and monitor performance. Configure exporters using standard OpenTelemetry environment variables:

# OTLP endpoint for your observability backend
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-backend.example.com

# Authentication headers (format depends on your provider)
OTEL_EXPORTER_OTLP_HEADERS=api-key=your_api_key

Add these to your .env.local file for local development or set them in your deployment platform's environment settings for production.
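
The header value is a comma-separated list of key=value pairs. As a rough sketch for sanity-checking a value before deploying it, the hypothetical parseOtlpHeaders helper below (not part of any OpenTelemetry package) splits such a string. The second header, x-team, is invented for illustration. The exporter does its own parsing, and this sketch does not handle URL-encoded values.

```typescript
// Hypothetical helper: parses the comma-separated key=value format used by
// OTEL_EXPORTER_OTLP_HEADERS. Not part of any OpenTelemetry package.
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) return headers;

  for (const pair of raw.split(',')) {
    const separatorIndex = pair.indexOf('=');
    // Skip malformed entries that have no key or no "=".
    if (separatorIndex <= 0) continue;

    const key = pair.slice(0, separatorIndex).trim();
    // Everything after the first "=" is the value, so values may contain "=".
    const value = pair.slice(separatorIndex + 1).trim();
    if (key) headers[key] = value;
  }
  return headers;
}

// 'x-team' is an invented second header, used only to show the list format.
const parsedHeaders = parseOtlpHeaders('api-key=your_api_key,x-team=storefront');
console.log(parsedHeaders);
```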

For detailed configuration options, protocol settings, and provider-specific examples, refer to the OpenTelemetry Exporter documentation.

What's Instrumented by Default

Next.js automatically instruments several key operations:

HTTP Requests: All incoming requests with route, method, and status code
Route Rendering: App Router page and layout rendering
API Routes: Route handler execution
Fetch Requests: External API calls made with fetch()
Metadata Generation: generateMetadata function execution

Each span includes helpful attributes like:

http.method: The HTTP method (GET, POST, etc.)
http.route: The route pattern (e.g., /[locale]/(default)/cart)
http.status_code: Response status code
next.route: Next.js route identifier

Viewing and Analyzing Traces

To view traces, you need an OpenTelemetry collector and observability backend. Common options:

Vercel: Built-in observability (see Vercel docs)
Local development: Use the OpenTelemetry dev environment
Third-party tools: Jaeger, Honeycomb, Datadog, New Relic, Grafana Tempo

Trace Hierarchy

Traces are organized hierarchically:

GET /cart                              [Root span - entire request]
├─ render route (app) /[locale]/(default)/cart   [Page rendering]
│  ├─ fetch POST https://api.bigcommerce.com/...  [GraphQL: Get cart data]
│  ├─ fetch POST https://api.bigcommerce.com/...  [GraphQL: Get checkout data]
│  └─ generateMetadata                            [Metadata generation]
└─ start response                                 [First byte sent]
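
Backends typically receive spans as a flat list in which each span points at its parent. A minimal sketch of how the hierarchy above could be rebuilt from such a list follows. The FlatSpan shape, the IDs, and the shortened span names are assumptions for illustration, not the format of any particular exporter.

```typescript
// Hypothetical flat span shape; real exporters carry far more fields.
interface FlatSpan {
  id: string;
  parentId: string | null;
  name: string;
}

interface SpanNode extends FlatSpan {
  children: SpanNode[];
}

// Links each span to its parent and returns the root spans.
function buildSpanTree(spans: FlatSpan[]): SpanNode[] {
  const nodesById: Record<string, SpanNode> = {};
  for (const span of spans) {
    nodesById[span.id] = { ...span, children: [] };
  }

  const roots: SpanNode[] = [];
  for (const span of spans) {
    const node = nodesById[span.id];
    const parent = span.parentId === null ? undefined : nodesById[span.parentId];
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node); // No known parent: treat it as a root span.
    }
  }
  return roots;
}

const spanTree = buildSpanTree([
  { id: '1', parentId: null, name: 'GET /cart' },
  { id: '2', parentId: '1', name: 'render route (app) /[locale]/(default)/cart' },
  { id: '3', parentId: '2', name: 'fetch POST (cart data)' },
  { id: '4', parentId: '2', name: 'fetch POST (checkout data)' },
  { id: '5', parentId: '2', name: 'generateMetadata' },
  { id: '6', parentId: '1', name: 'start response' },
]);
console.log(JSON.stringify(spanTree, null, 2));
```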

Using Traces to Debug Performance

Analyzing the Cart Page

The cart page is a critical conversion point in your storefront. Let's use traces to understand what impacts its load speed.

What to look for:

Total request duration: Time from request start to response completion
Fetch operations: GraphQL queries to BigCommerce API
Rendering time: How long it takes to render the page
Waterfall pattern: Are requests sequential (slow) or parallel (fast)?

Example analysis:

Span                                Duration    Notes
─────────────────────────────────────────────────────────────────
GET /cart                           1,240ms     Total page load
├─ render route                     1,180ms     Most time spent here
│  ├─ fetch (getCart)                 450ms     First API call
│  ├─ fetch (getShippingCountries)    380ms     Second API call
│  └─ generateMetadata                 15ms     Quick metadata
└─ start response                       0ms     Marker span

Findings:

Two sequential API calls account for 830ms (~67% of total time)
Opportunity: Make these calls in parallel or cache shipping countries

Common performance patterns:

Sequential fetches (slow):

// ❌ Sequential - total time = sum of both
const cart = await getCart({ cartId });
const countries = await getShippingCountries();

Parallel fetches (fast):

// ✅ Parallel - total time = slowest of the two
const [cart, countries] = await Promise.all([
  getCart({ cartId }),
  getShippingCountries(),
]);
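
To make the difference concrete, here is a small sketch that applies the two rules (sum versus slowest) to the fetch durations from the example analysis. estimateTotalMs is a hypothetical helper, and it assumes the calls are independent of each other.

```typescript
// Hypothetical helper: estimates wall-clock time for a set of independent calls.
function estimateTotalMs(durationsMs: number[], mode: 'sequential' | 'parallel'): number {
  if (durationsMs.length === 0) return 0;
  return mode === 'sequential'
    ? durationsMs.reduce((sum, ms) => sum + ms, 0) // one after another: durations add up
    : Math.max(...durationsMs); // all at once: bounded by the slowest call
}

const fetchDurationsMs = [450, 380]; // getCart, getShippingCountries
const sequentialMs = estimateTotalMs(fetchDurationsMs, 'sequential'); // 830
const parallelMs = estimateTotalMs(fetchDurationsMs, 'parallel'); // 450
console.log({ sequentialMs, parallelMs, savedMs: sequentialMs - parallelMs });
```

In this example, running the two calls in parallel would save roughly 380ms, the duration of the faster call.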

Identifying N+1 Query Problems

Look for repeated fetch spans with similar patterns:

├─ fetch (getProduct) - 45ms
├─ fetch (getProduct) - 43ms
├─ fetch (getProduct) - 44ms
├─ fetch (getProduct) - 46ms
└─ fetch (getProduct) - 45ms

This indicates an N+1 query problem. Consider batching these requests or using a different API endpoint.
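
If you export span data for offline analysis, this pattern can be flagged automatically. The sketch below counts sibling spans by name and reports any name that repeats at least a given number of times. The SpanSummary shape and the default threshold of 3 are assumptions, not values from any tool.

```typescript
// Hypothetical summary of a child span under one parent.
interface SpanSummary {
  name: string;
  durationMs: number;
}

// Returns span names that occur at least `threshold` times, with their counts.
function findRepeatedSpans(spans: SpanSummary[], threshold = 3): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const span of spans) {
    counts[span.name] = (counts[span.name] || 0) + 1;
  }

  const repeated: Record<string, number> = {};
  for (const name of Object.keys(counts)) {
    if (counts[name] >= threshold) repeated[name] = counts[name];
  }
  return repeated;
}

const suspects = findRepeatedSpans([
  { name: 'fetch (getProduct)', durationMs: 45 },
  { name: 'fetch (getProduct)', durationMs: 43 },
  { name: 'fetch (getProduct)', durationMs: 44 },
  { name: 'fetch (getProduct)', durationMs: 46 },
  { name: 'fetch (getProduct)', durationMs: 45 },
  { name: 'generateMetadata', durationMs: 15 },
]);
console.log(suspects); // { 'fetch (getProduct)': 5 }
```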

Monitoring Server Actions

Server actions (like applyCouponCode, updateLineItem) also appear in traces:

POST /cart (Server Action)          580ms
├─ applyCouponCode                  560ms
│  └─ fetch (ApplyCheckoutCoupon)   540ms
└─ revalidateTag                     18ms

Use this to understand:

How long mutations take
Whether revalidation is causing delays
If optimistic UI updates would improve perceived performance

Creating Custom Spans

Add custom spans to instrument important operations in your codebase.

Basic usage:

import { tracer } from '~/lib/otel/tracer';

export async function complexOperation() {
  return await tracer.startActiveSpan('complexOperation', async (span) => {
    try {
      // Your operation here
      const result = await doSomething();
      
      // Optionally add attributes
      span.setAttribute('result.count', result.length);
      
      return result;
    } finally {
      // Always end the span
      span.end();
    }
  });
}

Practical examples:

1. Instrument data transformations

import { tracer } from '~/lib/otel/tracer';

export async function transformCartData(rawCart: RawCart) {
  return await tracer.startActiveSpan('transformCartData', async (span) => {
    try {
      const startedAt = Date.now();
      span.setAttribute('cart.itemCount', rawCart.lineItems.length);
      
      const transformed = {
        items: rawCart.lineItems.map(transformLineItem),
        total: calculateTotal(rawCart),
      };
      
      // Record elapsed milliseconds rather than the current timestamp
      span.setAttribute('transformation.durationMs', Date.now() - startedAt);
      
      return transformed;
    } finally {
      span.end();
    }
  });
}

2. Instrument external API calls

import { SpanStatusCode } from '@opentelemetry/api';

import { tracer } from '~/lib/otel/tracer';

export async function fetchRecommendations(productId: string) {
  return await tracer.startActiveSpan('fetchRecommendations', async (span) => {
    try {
      span.setAttribute('product.id', productId);
      
      const response = await fetch(
        `https://api.example.com/recommendations/${productId}`,
      );
      
      span.setAttribute('http.status_code', response.status);
      
      if (!response.ok) {
        // Mark the span as failed so backends flag it as an error
        span.setStatus({ code: SpanStatusCode.ERROR });
        throw new Error('Failed to fetch recommendations');
      }
      
      return await response.json();
    } finally {
      span.end();
    }
  });
}

3. Instrument critical business logic

import { tracer } from '~/lib/otel/tracer';

export async function calculateShipping(cart: Cart, address: Address) {
  return await tracer.startActiveSpan('calculateShipping', async (span) => {
    try {
      span.setAttribute('cart.weight', calculateWeight(cart));
      span.setAttribute('shipping.country', address.countryCode);
      
      const rates = await getShippingRates(cart, address);
      
      span.setAttribute('shipping.optionsAvailable', rates.length);
      
      return rates;
    } finally {
      span.end();
    }
  });
}
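
The three examples above repeat the same try/finally scaffolding. One way to factor it out is a small wrapper, sketched here. withSpan, SpanLike, TracerLike, and fakeTracer are all hypothetical names. The two interfaces only mirror the subset of the @opentelemetry/api Span and Tracer used in this guide so that the sketch runs without dependencies; in a real project you would import those types from @opentelemetry/api and pass in the tracer from ~/lib/otel/tracer.

```typescript
// Minimal structural stand-ins for the parts of Span and Tracer used below.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): unknown;
  recordException(error: Error): void;
  setStatus(status: { code: number; message?: string }): unknown;
  end(): void;
}

interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

const STATUS_ERROR = 2; // Same numeric value as SpanStatusCode.ERROR

// Hypothetical helper: runs `work` inside a span, marks failures, always ends the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  work: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await work(span);
    } catch (error) {
      const err = error instanceof Error ? error : new Error(String(error));
      span.recordException(err);
      span.setStatus({ code: STATUS_ERROR, message: err.message });
      throw error;
    } finally {
      span.end();
    }
  });
}

// In-memory stand-in tracer so the sketch can run without an OpenTelemetry SDK.
const spanEvents: string[] = [];
const fakeTracer: TracerLike = {
  startActiveSpan(name, fn) {
    spanEvents.push(`start ${name}`);
    return fn({
      setAttribute: (key, value) => spanEvents.push(`attr ${key}=${value}`),
      recordException: (error) => spanEvents.push(`exception ${error.message}`),
      setStatus: (status) => spanEvents.push(`status ${status.code}`),
      end: () => spanEvents.push(`end ${name}`),
    });
  },
};

const demoResult = withSpan(fakeTracer, 'cart.calculateDiscounts', async (span) => {
  span.setAttribute('cart.itemCount', 3);
  return 42;
});
demoResult.then((value) => console.log(value, spanEvents));
```

With a helper like this, each instrumented function shrinks to its span name, its attributes, and the work itself.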

When to add custom spans:

✅ Operations that might be slow (> 100ms)
✅ Critical business logic (checkout, pricing, inventory checks)
✅ Data transformations with variable performance
✅ Third-party API integrations
❌ Simple utility functions
❌ Operations that are already instrumented (plain fetch calls that need no extra attributes)
❌ Trivial operations (< 10ms)

Best Practices for Instrumentation

1. Use descriptive span names

// ❌ Not descriptive
tracer.startActiveSpan('operation', ...)

// ✅ Clear and specific
tracer.startActiveSpan('cart.calculateDiscounts', ...)

2. Add meaningful attributes

span.setAttribute('cart.itemCount', lineItems.length);
span.setAttribute('user.type', isGuest ? 'guest' : 'registered');
span.setAttribute('cache.hit', cacheHit);

3. Always end spans in finally blocks

try {
  return await doWork();
} finally {
  span.end(); // Ensures span ends even if error occurs
}

4. Mark errors appropriately

import { SpanStatusCode } from '@opentelemetry/api';

try {
  return await riskyOperation();
} catch (error) {
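  // Caught values are typed as unknown, so narrow before recording
  span.recordException(error instanceof Error ? error : String(error));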
  span.setStatus({ code: SpanStatusCode.ERROR });
  throw error;
} finally {
  span.end();
}

5. Use hierarchical naming

Group related spans with dot notation:

cart.validate
cart.addItem
cart.calculateTotals
checkout.validate
checkout.submitOrder

This creates logical grouping in your observability dashboard.
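
As an illustration of why the prefix is useful, the following sketch groups span names by their first dot-separated segment, much as a dashboard filter might. groupSpanNames is a hypothetical helper, not part of OpenTelemetry or Catalyst.

```typescript
// Hypothetical helper: groups dot-notation span names by their first segment.
function groupSpanNames(names: string[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const name of names) {
    const dotIndex = name.indexOf('.');
    // Names without a dot form their own group.
    const prefix = dotIndex === -1 ? name : name.slice(0, dotIndex);
    if (!groups[prefix]) groups[prefix] = [];
    groups[prefix].push(name);
  }
  return groups;
}

const spanGroups = groupSpanNames([
  'cart.validate',
  'cart.addItem',
  'cart.calculateTotals',
  'checkout.validate',
  'checkout.submitOrder',
]);
console.log(spanGroups);
```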

Interpreting Cart Page Traces

Key metrics to monitor:

Total page load time: Target < 1.5 seconds
Time to first byte (TTFB): Target < 500ms
GraphQL query time: Target < 300ms per query
Rendering time: Target < 200ms
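
These targets can be encoded as a simple check against measured values. In the sketch below, the metric keys, the findMissedTargets helper, and the sample measurements are assumptions for illustration; only the target values come from the list above.

```typescript
// Targets from the list above, in milliseconds. The key names are invented.
const TARGETS_MS = {
  totalPageLoad: 1500,
  timeToFirstByte: 500,
  graphqlQuery: 300,
  rendering: 200,
};

type MetricName = keyof typeof TARGETS_MS;

// Returns the metrics whose measured value is not below its target.
function findMissedTargets(measured: Record<MetricName, number>): MetricName[] {
  const missed: MetricName[] = [];
  for (const name of Object.keys(TARGETS_MS) as MetricName[]) {
    if (measured[name] >= TARGETS_MS[name]) missed.push(name);
  }
  return missed;
}

// Illustrative measurements, not real data.
const missedTargets = findMissedTargets({
  totalPageLoad: 1240,
  timeToFirstByte: 420,
  graphqlQuery: 450,
  rendering: 180,
});
console.log(missedTargets); // [ 'graphqlQuery' ]
```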

Common issues and solutions:

Issue	Trace pattern	Solution
Slow API calls	Fetch spans > 500ms	Add caching, use CDN, or upgrade API tier
Sequential fetches	Waterfall pattern	Make calls parallel with `Promise.all()`
Heavy rendering	Long render spans	Optimize components, reduce data transformations
Too many fetches	10+ fetch spans	Batch requests or use different API endpoints
Slow metadata generation	`generateMetadata` > 100ms	Cache metadata or simplify generation

Real-world example:

Before optimization:

GET /cart - 2,340ms
├─ render route - 2,280ms
│  ├─ fetch getCart - 520ms
│  ├─ fetch getCheckout - 480ms
│  ├─ fetch getShippingCountries - 420ms
│  ├─ fetch getCustomer - 380ms
│  └─ transformCartData - 340ms

After optimization:

GET /cart - 920ms
├─ render route - 860ms
│  ├─ fetch getCart (parallel) - 510ms
│  ├─ fetch getCheckout (parallel) - 490ms
│  ├─ fetch getShippingCountries (cached) - 5ms
│  ├─ fetch getCustomer (parallel) - 370ms
│  └─ transformCartData (optimized) - 45ms

Changes made:

Parallelized GraphQL queries (-870ms: 1,380ms sequential vs. 510ms for the slowest parallel call)
Cached shipping countries (-415ms)
Optimized cart data transformation (-295ms)
Total improvement: 1,420ms (61% faster)

Advanced Patterns

Propagating context across async boundaries

import { context } from '@opentelemetry/api';

// Capture the active context, which carries the current span
const currentContext = context.active();

// Pass context to async work
setTimeout(() => {
  context.with(currentContext, () => {
    // This work is associated with the original span
    doAsyncWork();
  });
}, 1000);

Sampling for high-traffic routes

Configure sampling in instrumentation.ts to reduce overhead on high-traffic routes (refer to the Next.js OpenTelemetry guide for collector configuration).
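
To illustrate the idea behind ratio-based sampling, the sketch below derives a keep-or-drop decision from the trace ID, so every span in a trace gets the same answer. shouldSample is a hypothetical function written for this guide; it is not the OpenTelemetry SDK's sampler, and real sampling should be configured through your SDK or collector.

```typescript
// Illustrative only: keeps roughly `ratio` of traces, decided from the trace ID
// so that all spans sharing a trace ID are kept or dropped together.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Treat the last 8 hex characters (32 bits) of the ID as a pseudo-random number.
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < ratio * 0x100000000;
}

const exampleTraceId = '4bf92f3577b34da6a3ce929d0e0e4736';
console.log(shouldSample(exampleTraceId, 0.1)); // true
console.log(shouldSample(exampleTraceId, 0.05)); // false
```

Because the decision depends only on the trace ID, separate services that see the same ID would reach the same decision without coordinating.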

Troubleshooting

Spans not appearing

Verify instrumentation.ts exists in the root of the core directory
Check that registerOTel() is being called
Set OTEL_SERVICE_NAME so your traces are easy to find (optional but recommended)
Verify your collector is running and accessible

Missing fetch spans

Set NEXT_OTEL_FETCH_DISABLED=0 or remove this environment variable.

Too much data

Set NEXT_OTEL_VERBOSE=0 to reduce span volume
Configure sampling in your collector
Filter spans by route or operation name

Spans appear out of order

This is usually a visualization issue. Check the span timestamps in your observability tool.

Resources

Next Steps

Set up a collector: Use the local dev environment or deploy to Vercel
Baseline your app: Capture traces for key pages (home, product, cart, checkout)
Identify bottlenecks: Look for long spans and sequential operations
Add custom spans: Instrument critical business logic
Monitor over time: Track performance improvements and regressions

Screenshot Recommendations

To complete this documentation, add the following screenshots:

[Screenshot: OpenTelemetry trace view]: A complete trace showing the hierarchy of spans for a cart page request
[Screenshot: Cart page trace analysis]: A detailed view highlighting slow operations and their durations
Consider adding: Waterfall view comparing before/after optimization
Consider adding: Dashboard view showing trace metrics over time
Consider adding: Example of custom span with attributes in the UI

These screenshots should come from your actual observability tool (Vercel, Jaeger, Honeycomb, etc.) showing real Catalyst traces.

changeset-bot · 2025-11-07T20:30:56Z

🦋 Changeset detected

Latest commit: d8f5ce6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@bigcommerce/catalyst-core	Minor


vercel · 2025-11-07T20:30:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
catalyst-b2b	Ready	Preview	Comment	Nov 7, 2025 8:53pm
catalyst-canary	Ready	Preview	Comment	Nov 7, 2025 8:53pm

1 Skipped Deployment

Project	Deployment	Preview	Comments	Updated (UTC)
catalyst	Ignored			Nov 7, 2025 8:53pm

This reverts commit a48bcf2.

Revert "test(reviews): add e2e tests for reviews form (#2686)". This reverts commit d1d1249.
Revert "feat: CATALYST-676 add generic otel setup (#2674)". This reverts commit a48bcf2.
Revert "refactor(auth): separate first and last name fields on user session (#2684)". This reverts commit edbd202.
Revert "feat(reviews): add reviews form enabling shoppers to submit reviews (#2676)". This reverts commit b7ba003.

vercel Bot deployed to Preview – catalyst-canary November 7, 2025 20:31 View deployment

vercel Bot deployed to Preview – catalyst-b2b November 7, 2025 20:31 View deployment

chanceaclark mentioned this pull request Nov 7, 2025

chore: CATALYST-676 add sentry to demo site #2669

Closed

chanceaclark marked this pull request as ready for review November 7, 2025 20:32

chanceaclark requested a review from a team November 7, 2025 20:32

sentry Bot reviewed Nov 7, 2025


Comment thread core/instrumentation.ts

feat: CATALYST-676 add generic otel setup

d8f5ce6

chanceaclark force-pushed the feat/add-otel branch from c3e45dc to d8f5ce6 Compare November 7, 2025 20:52

vercel Bot deployed to Preview – catalyst-canary November 7, 2025 20:53 View deployment

vercel Bot deployed to Preview – catalyst-b2b November 7, 2025 20:53 View deployment

jorgemoya approved these changes Nov 11, 2025


chanceaclark added this pull request to the merge queue Nov 13, 2025

Merged via the queue into canary with commit a48bcf2 Nov 13, 2025
9 of 11 checks passed

chanceaclark deleted the feat/add-otel branch November 13, 2025 16:14

github-actions Bot mentioned this pull request Nov 13, 2025

Version Packages (canary) #2689

Merged

jordanarldt restored the feat/add-otel branch November 14, 2025 21:51

jordanarldt added a commit that referenced this pull request Nov 14, 2025

Revert "feat: CATALYST-676 add generic otel setup (#2674)"

0a317f7

This reverts commit a48bcf2.

jordanarldt mentioned this pull request Nov 14, 2025

Revert "feat: CATALYST-676 add generic otel setup" #2698

Closed

jordanarldt added a commit that referenced this pull request Nov 14, 2025

Revert "feat: CATALYST-676 add generic otel setup (#2674)"

01814ef

This reverts commit a48bcf2.

jordanarldt mentioned this pull request Nov 14, 2025

Revert minor changes #2701

Merged

This was referenced Nov 14, 2025

feat: CATALYST-676 add generic otel setup #2705

Merged

feat: CATALYST-676 add generic otel setup #2708

Merged

jamesqquick pushed a commit that referenced this pull request Feb 11, 2026

feat: CATALYST-676 add generic otel setup (#2674)

83b854c

chanceaclark added a commit that referenced this pull request Apr 27, 2026

feat: CATALYST-676 add generic otel setup (#2674)

89bde10

chanceaclark merged 1 commit into canary from feat/add-otel
