
feat: CATALYST-676 add generic otel setup #2674

Merged
chanceaclark merged 1 commit into canary from feat/add-otel on Nov 13, 2025

Conversation

@chanceaclark
Contributor

What/Why?

Adds OpenTelemetry instrumentation for Catalyst, enabling the collection of spans for Catalyst storefronts.

Testing

Spans populating within Sentry traces.
Screenshot 2025-11-07 at 13 23 59
Screenshot 2025-11-07 at 13 22 40

Migration

The change is new code only, so just copy over core/instrumentation.ts and core/lib/otel/tracers.ts.


@chris-nowicki I took a stab at whipping up some documentation for you here:

Documentation

OpenTelemetry in Catalyst

Overview

Catalyst includes built-in OpenTelemetry instrumentation for monitoring your storefront's performance and behavior. OpenTelemetry provides distributed tracing, which helps you understand how requests flow through your application, identify performance bottlenecks, and debug issues in production.

This guide covers practical usage of OpenTelemetry within Catalyst, focusing on how to use traces to improve your storefront's performance.

What is OpenTelemetry?

OpenTelemetry is an open-source observability framework that provides a standardized way to collect telemetry data (traces, metrics, and logs) from your applications. It's platform-agnostic, meaning you can change your observability provider without changing your code.

Key concepts:

  • Trace: A complete journey of a request through your system
  • Span: A single unit of work within a trace (e.g., a database query, API call, or function execution)
  • Attributes: Key-value pairs that provide context about a span (e.g., route name, HTTP method)

Quick Setup

Catalyst uses the @vercel/otel package, which simplifies OpenTelemetry configuration. The setup is already complete in your project:

  1. Dependencies installed: OpenTelemetry packages are in package.json
  2. Instrumentation configured: See instrumentation.ts in the root of the core directory
  3. Tracer available: Import from ~/lib/otel/tracer to create custom spans

For detailed setup instructions and collector configuration, refer to the official Next.js OpenTelemetry guide.

Environment Variables

Configure OpenTelemetry using environment variables:

# Optional: Set a custom service name (defaults to 'next-app')
OTEL_SERVICE_NAME=catalyst-storefront

# Optional: Enable verbose tracing to see all spans
NEXT_OTEL_VERBOSE=1

# Optional: Disable automatic fetch instrumentation
NEXT_OTEL_FETCH_DISABLED=1

Add these to your .env.local file or deployment environment.

Configuring Exporters

Exporters send telemetry data to observability backends where you can analyze traces and monitor performance. Configure exporters using standard OpenTelemetry environment variables:

# OTLP endpoint for your observability backend
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-backend.example.com

# Authentication headers (format depends on your provider)
OTEL_EXPORTER_OTLP_HEADERS=api-key=your_api_key

Add these to your .env.local file for local development or set them in your deployment platform's environment settings for production.
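
The headers variable holds comma-separated key=value pairs. As a rough illustration of that format only (this is not the parser any SDK actually uses, and `parseOtlpHeaders` is a hypothetical name), such a string could be split like this:

```typescript
// Hypothetical helper: splits an OTEL_EXPORTER_OTLP_HEADERS-style string
// ("key1=value1,key2=value2") into a plain object.
function parseOtlpHeaders(raw: string): Record<string, string> {
  const headers: Record<string, string> = {};

  for (const pair of raw.split(',')) {
    const separator = pair.indexOf('=');

    // Skip empty or malformed entries that have no key before '='.
    if (separator <= 0) continue;

    const key = pair.slice(0, separator).trim();
    // Split on the first '=' only, so values may themselves contain '='.
    const value = pair.slice(separator + 1).trim();

    if (key) headers[key] = value;
  }

  return headers;
}

// The second header name is made up for the example.
const exampleHeaders = parseOtlpHeaders('api-key=your_api_key,x-team=storefront');
```

If a backend rejects your spans, checking that the variable splits into the header names you expect is a quick first step.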

For detailed configuration options, protocol settings, and provider-specific examples, refer to the OpenTelemetry Exporter documentation.

What's Instrumented by Default

Next.js automatically instruments several key operations:

  • HTTP Requests: All incoming requests with route, method, and status code
  • Route Rendering: App Router page and layout rendering
  • API Routes: Route handler execution
  • Fetch Requests: External API calls made with fetch()
  • Metadata Generation: generateMetadata function execution

Each span includes helpful attributes like:

  • http.method: The HTTP method (GET, POST, etc.)
  • http.route: The route pattern (e.g., /[locale]/(default)/cart)
  • http.status_code: Response status code
  • next.route: Next.js route identifier

Viewing and Analyzing Traces

To view traces, you need an OpenTelemetry collector and an observability backend. Common options include Sentry, Vercel, Jaeger, and Honeycomb.

Screenshot 2025-11-07 at 13 22 40

Trace Hierarchy

Traces are organized hierarchically:

GET /cart                              [Root span - entire request]
├─ render route (app) /[locale]/(default)/cart   [Page rendering]
│  ├─ fetch POST https://api.bigcommerce.com/...  [GraphQL: Get cart data]
│  ├─ fetch POST https://api.bigcommerce.com/...  [GraphQL: Get checkout data]
│  └─ generateMetadata                            [Metadata generation]
└─ start response                                 [First byte sent]
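
Backends build this hierarchy from a flat list of spans, each of which points at its parent. The sketch below shows the idea under simplified assumptions: the `FlatSpan` shape, the span IDs, and the helper names are made up for illustration, and real exported spans carry many more fields.

```typescript
// Minimal span shape for illustration only.
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

interface SpanNode extends FlatSpan {
  children: SpanNode[];
}

// Rebuilds the parent/child hierarchy from a flat list of spans.
function buildSpanTree(spans: FlatSpan[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  spans.forEach((span) => nodes.set(span.spanId, { ...span, children: [] }));

  const roots: SpanNode[] = [];
  nodes.forEach((node) => {
    const parent = node.parentSpanId ? nodes.get(node.parentSpanId) : undefined;
    // Spans without a known parent are treated as roots.
    if (parent) parent.children.push(node);
    else roots.push(node);
  });

  return roots;
}

// Renders the tree as indented lines, one span per line.
function renderTree(nodes: SpanNode[], depth = 0): string[] {
  const lines: string[] = [];
  nodes.forEach((node) => {
    lines.push('  '.repeat(depth) + node.name);
    lines.push(...renderTree(node.children, depth + 1));
  });
  return lines;
}

const tree = buildSpanTree([
  { spanId: 'a', name: 'GET /cart' },
  { spanId: 'b', parentSpanId: 'a', name: 'render route (app) /[locale]/(default)/cart' },
  { spanId: 'c', parentSpanId: 'b', name: 'generateMetadata' },
  { spanId: 'd', parentSpanId: 'a', name: 'start response' },
]);
```

Calling `renderTree(tree)` yields one indented line per span, mirroring the outline above.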

Using Traces to Debug Performance

Analyzing the Cart Page

The cart page is a critical conversion point in your storefront. Let's use traces to understand what impacts its load speed.

Screenshot 2025-11-07 at 13 23 59

What to look for:

  1. Total request duration: Time from request start to response completion
  2. Fetch operations: GraphQL queries to BigCommerce API
  3. Rendering time: How long it takes to render the page
  4. Waterfall pattern: Are requests sequential (slow) or parallel (fast)?

Example analysis:

Span                                Duration    Notes
─────────────────────────────────────────────────────────────────
GET /cart                           1,240ms     Total page load
├─ render route                     1,180ms     Most time spent here
│  ├─ fetch (getCart)                 450ms     First API call
│  ├─ fetch (getShippingCountries)    380ms     Second API call
│  └─ generateMetadata                 15ms     Quick metadata
└─ start response                       0ms     Marker span

Findings:

  • Two sequential API calls account for 830ms (~67% of total time)
  • Opportunity: Make these calls in parallel or cache shipping countries
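
The same reading can be done programmatically on exported span data. The following is a minimal sketch: the `TimedSpan` shape and the start offsets are assumptions, and only the durations come from the example analysis.

```typescript
// Hypothetical timing record; offsets are milliseconds from the start of the request.
interface TimedSpan {
  name: string;
  startMs: number;
  durationMs: number;
}

// True when no span starts before the previous one has finished.
function areSequential(spans: TimedSpan[]): boolean {
  const ordered = [...spans].sort((a, b) => a.startMs - b.startMs);

  for (let i = 1; i < ordered.length; i++) {
    const previousEnd = ordered[i - 1].startMs + ordered[i - 1].durationMs;
    if (ordered[i].startMs < previousEnd) return false;
  }

  return true;
}

// Share of the total request time spent in the given spans, as a rounded percentage.
function shareOfTotal(spans: TimedSpan[], totalMs: number): number {
  const spent = spans.reduce((sum, span) => sum + span.durationMs, 0);
  return Math.round((spent / totalMs) * 100);
}

// Durations from the example analysis; the start offsets are assumed.
const cartFetches: TimedSpan[] = [
  { name: 'fetch (getCart)', startMs: 20, durationMs: 450 },
  { name: 'fetch (getShippingCountries)', startMs: 470, durationMs: 380 },
];

const fetchesAreSequential = areSequential(cartFetches);
const fetchShare = shareOfTotal(cartFetches, 1240);
```

With these inputs the two fetches do not overlap, and together they make up about two thirds of the 1,240ms request.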

Common performance patterns:

Sequential fetches (slow):

// ❌ Sequential - total time = sum of both
const cart = await getCart({ cartId });
const countries = await getShippingCountries();

Parallel fetches (fast):

// ✅ Parallel - total time = slowest of the two
const [cart, countries] = await Promise.all([
  getCart({ cartId }),
  getShippingCountries(),
]);

Identifying N+1 Query Problems

Look for repeated fetch spans with similar patterns:

├─ fetch (getProduct) - 45ms
├─ fetch (getProduct) - 43ms
├─ fetch (getProduct) - 44ms
├─ fetch (getProduct) - 46ms
└─ fetch (getProduct) - 45ms

This indicates an N+1 query problem. Consider batching these requests or using a different API endpoint.
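
If you export span names for a trace, this pattern is easy to flag automatically. A minimal sketch, assuming a hypothetical helper name and an arbitrary threshold of three repeats:

```typescript
// Counts spans by name and returns the names that repeat at least `threshold` times.
function findRepeatedSpans(spanNames: string[], threshold = 3): string[] {
  const counts = new Map<string, number>();

  for (const name of spanNames) {
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }

  const repeated: string[] = [];
  counts.forEach((count, name) => {
    if (count >= threshold) repeated.push(name);
  });

  return repeated;
}

const suspects = findRepeatedSpans([
  'fetch (getProduct)',
  'fetch (getProduct)',
  'fetch (getProduct)',
  'fetch (getProduct)',
  'fetch (getProduct)',
  'generateMetadata',
]);
```

A repeated name is only a hint. Confirm in the trace view that the calls come from a loop over items before restructuring the query.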

Monitoring Server Actions

Server actions (like applyCouponCode, updateLineItem) also appear in traces:

POST /cart (Server Action)          580ms
├─ applyCouponCode                  560ms
│  └─ fetch (ApplyCheckoutCoupon)   540ms
└─ revalidateTag                     18ms

Use this to understand:

  • How long mutations take
  • Whether revalidation is causing delays
  • If optimistic UI updates would improve perceived performance

Creating Custom Spans

Add custom spans to instrument important operations in your codebase.

Basic usage:

import { tracer } from '~/lib/otel/tracer';

export async function complexOperation() {
  return await tracer.startActiveSpan('complexOperation', async (span) => {
    try {
      // Your operation here
      const result = await doSomething();
      
      // Optionally add attributes
      span.setAttribute('result.count', result.length);
      
      return result;
    } finally {
      // Always end the span
      span.end();
    }
  });
}

Practical examples:

1. Instrument data transformations

import { tracer } from '~/lib/otel/tracer';

export async function transformCartData(rawCart: RawCart) {
  return await tracer.startActiveSpan('transformCartData', async (span) => {
    const startedAt = Date.now();

    try {
      span.setAttribute('cart.itemCount', rawCart.lineItems.length);
      
      const transformed = {
        items: rawCart.lineItems.map(transformLineItem),
        total: calculateTotal(rawCart),
      };
      
      // Record elapsed milliseconds, not the current timestamp
      span.setAttribute('transformation.duration', Date.now() - startedAt);
      
      return transformed;
    } finally {
      span.end();
    }
  });
}

2. Instrument external API calls

import { SpanStatusCode } from '@opentelemetry/api';

import { tracer } from '~/lib/otel/tracer';

export async function fetchRecommendations(productId: string) {
  return await tracer.startActiveSpan('fetchRecommendations', async (span) => {
    try {
      span.setAttribute('product.id', productId);
      
      const response = await fetch(
        `https://api.example.com/recommendations/${productId}`,
      );
      
      span.setAttribute('http.status_code', response.status);
      
      if (!response.ok) {
        // Mark the span as failed so it is flagged in your observability tool
        span.setStatus({ code: SpanStatusCode.ERROR });
        throw new Error(`Failed to fetch recommendations (status ${response.status})`);
      }
      
      return await response.json();
    } finally {
      span.end();
    }
  });
}

3. Instrument critical business logic

import { tracer } from '~/lib/otel/tracer';

export async function calculateShipping(cart: Cart, address: Address) {
  return await tracer.startActiveSpan('calculateShipping', async (span) => {
    try {
      span.setAttribute('cart.weight', calculateWeight(cart));
      span.setAttribute('shipping.country', address.countryCode);
      
      const rates = await getShippingRates(cart, address);
      
      span.setAttribute('shipping.optionsAvailable', rates.length);
      
      return rates;
    } finally {
      span.end();
    }
  });
}

When to add custom spans:

  • ✅ Operations that might be slow (> 100ms)
  • ✅ Critical business logic (checkout, pricing, inventory checks)
  • ✅ Data transformations with variable performance
  • ✅ Third-party API integrations
  • ❌ Simple utility functions
  • ❌ Operations that are already instrumented (fetch calls)
  • ❌ Trivial operations (< 10ms)

Best Practices for Instrumentation

1. Use descriptive span names

// ❌ Not descriptive
tracer.startActiveSpan('operation', ...)

// ✅ Clear and specific
tracer.startActiveSpan('cart.calculateDiscounts', ...)

2. Add meaningful attributes

span.setAttribute('cart.itemCount', lineItems.length);
span.setAttribute('user.type', isGuest ? 'guest' : 'registered');
span.setAttribute('cache.hit', cacheHit);

3. Always end spans in finally blocks

try {
  return await doWork();
} finally {
  span.end(); // Ensures span ends even if error occurs
}

4. Mark errors appropriately

import { SpanStatusCode } from '@opentelemetry/api';

try {
  return await riskyOperation();
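} catch (error) {
  // A caught value is not guaranteed to be an Error, so normalize it first
  span.recordException(error instanceof Error ? error : String(error));
  span.setStatus({ code: SpanStatusCode.ERROR });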
  throw error;
} finally {
  span.end();
}
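
Practices 3 and 4 can be combined in one small wrapper so that call sites do not repeat the try/catch/finally boilerplate. The sketch below is one possible shape, not something shipped with Catalyst: `withSpan`, `SpanLike`, and `TracerLike` are hypothetical names, and the two interfaces are minimal local stand-ins for the `Span` and `Tracer` types from `@opentelemetry/api` so that the example is self-contained.

```typescript
// Minimal structural stand-ins for Span and Tracer from '@opentelemetry/api'.
interface SpanLike {
  recordException(exception: Error | string): void;
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}

interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// Mirrors the numeric value of SpanStatusCode.ERROR in '@opentelemetry/api'.
const STATUS_ERROR = 2;

// Hypothetical helper: runs `work` inside a span, marks failures,
// and always ends the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  work: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await work(span);
    } catch (error) {
      // A caught value is not guaranteed to be an Error, so normalize it first.
      span.recordException(error instanceof Error ? error : String(error));
      span.setStatus({ code: STATUS_ERROR });
      throw error;
    } finally {
      span.end();
    }
  });
}
```

In a real project you would import `Span`, `Tracer`, and `SpanStatusCode` from `@opentelemetry/api` instead of declaring these stand-ins, and call the helper along the lines of `withSpan(tracer, 'cart.calculateDiscounts', async (span) => { ... })`.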

5. Use hierarchical naming

Group related spans with dot notation:

cart.validate
cart.addItem
cart.calculateTotals
checkout.validate
checkout.submitOrder

This creates logical grouping in your observability dashboard.

Interpreting Cart Page Traces

Key metrics to monitor:

  1. Total page load time: Target < 1.5 seconds
  2. Time to first byte (TTFB): Target < 500ms
  3. GraphQL query time: Target < 300ms per query
  4. Rendering time: Target < 200ms

Common issues and solutions:

Issue                      Trace pattern              Solution
──────────────────────────────────────────────────────────────────────────────────────────────────
Slow API calls             Fetch spans > 500ms        Add caching, use CDN, or upgrade API tier
Sequential fetches         Waterfall pattern          Make calls parallel with Promise.all()
Heavy rendering            Long render spans          Optimize components, reduce data transformations
Too many fetches           10+ fetch spans            Batch requests or use different API endpoints
Slow metadata generation   generateMetadata > 100ms   Cache metadata or simplify generation

Real-world example:

Before optimization:

GET /cart - 2,340ms
├─ render route - 2,280ms
│  ├─ fetch getCart - 520ms
│  ├─ fetch getCheckout - 480ms
│  ├─ fetch getShippingCountries - 420ms
│  ├─ fetch getCustomer - 380ms
│  └─ transformCartData - 340ms

After optimization:

GET /cart - 920ms
├─ render route - 860ms
│  ├─ fetch getCart (parallel) - 510ms
│  ├─ fetch getCheckout (parallel) - 490ms
│  ├─ fetch getShippingCountries (cached) - 5ms
│  ├─ fetch getCustomer (parallel) - 370ms
│  └─ transformCartData (optimized) - 45ms

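Changes made:

  1. Parallelized GraphQL queries (-870ms: 1,380ms when run sequentially vs. 510ms for the slowest parallel call)
  2. Cached shipping countries (-415ms)
  3. Optimized cart data transformation (-295ms)

Net improvement at the root span: 1,420ms (2,340ms down to 920ms, 61% faster)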
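
The overall figure follows directly from the two root span durations (2,340ms before, 920ms after). A small sketch of that arithmetic, using a hypothetical helper name:

```typescript
// Computes the absolute and relative improvement between two durations.
function improvement(beforeMs: number, afterMs: number): { savedMs: number; percentFaster: number } {
  const savedMs = beforeMs - afterMs;
  const percentFaster = Math.round((savedMs / beforeMs) * 100);
  return { savedMs, percentFaster };
}

// Root span durations from the before/after traces.
const cartImprovement = improvement(2340, 920);
```

Comparing root spans rather than adding up per-change savings avoids double counting when spans overlap.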

Advanced Patterns

Propagating context across async boundaries

import { context } from '@opentelemetry/api';

// Capture the context that holds the currently active span
const currentContext = context.active();

// Pass context to async work
setTimeout(() => {
  context.with(currentContext, () => {
    // This work is associated with the original span
    doAsyncWork();
  });
}, 1000);

Sampling for high-traffic routes

Configure sampling in instrumentation.ts to reduce overhead on high-traffic routes (refer to the Next.js OpenTelemetry guide for collector configuration).
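
How sampling is actually wired up depends on your SDK and collector, which this guide does not show. To illustrate the underlying idea only, the sketch below makes a deterministic keep/drop decision from the trace ID. It is a simplified stand-in, not the SDK's sampler, and the function names, route patterns, and ratios are all hypothetical.

```typescript
// Simplified head-sampling sketch: keeps roughly `ratio` of traces and always
// makes the same decision for the same trace ID.
function shouldSample(traceId: string, ratio: number): boolean {
  // Interpret the first 8 hex characters of the trace ID as a number in [0, 1).
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
  return bucket < ratio;
}

// Hypothetical per-route ratios; routes that are not listed are always sampled.
const ROUTE_RATIOS: Record<string, number> = {
  '/[locale]/(default)/cart': 1,
  '/[locale]/(default)/product/[slug]': 0.1,
};

function shouldSampleRoute(route: string, traceId: string): boolean {
  return shouldSample(traceId, ROUTE_RATIOS[route] ?? 1);
}
```

Deciding from the trace ID rather than a random number means every service that sees the same trace reaches the same decision, so traces are kept or dropped as a whole.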

Troubleshooting

Spans not appearing

  1. Verify instrumentation.ts exists in the root of the core directory
  2. Check that registerOTel() is being called
  3. Ensure OTEL_SERVICE_NAME is set (optional but recommended)
  4. Verify your collector is running and accessible

Missing fetch spans

Set NEXT_OTEL_FETCH_DISABLED=0 or remove this environment variable.

Too much data

  1. Set NEXT_OTEL_VERBOSE=0 to reduce span volume
  2. Configure sampling in your collector
  3. Filter spans by route or operation name

Spans appear out of order

This is usually a visualization issue. Check the span timestamps in your observability tool.

Resources

Next Steps

  1. Set up a collector: Use the local dev environment or deploy to Vercel
  2. Baseline your app: Capture traces for key pages (home, product, cart, checkout)
  3. Identify bottlenecks: Look for long spans and sequential operations
  4. Add custom spans: Instrument critical business logic
  5. Monitor over time: Track performance improvements and regressions

Screenshot Recommendations

To complete this documentation, add the following screenshots:

  1. [Screenshot: OpenTelemetry trace view]: A complete trace showing the hierarchy of spans for a cart page request
  2. [Screenshot: Cart page trace analysis]: A detailed view highlighting slow operations and their durations
  3. Consider adding: Waterfall view comparing before/after optimization
  4. Consider adding: Dashboard view showing trace metrics over time
  5. Consider adding: Example of custom span with attributes in the UI

These screenshots should come from your actual observability tool (Vercel, Jaeger, Honeycomb, etc.) showing real Catalyst traces.

@changeset-bot

changeset-bot Bot commented Nov 7, 2025

🦋 Changeset detected

Latest commit: d8f5ce6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@bigcommerce/catalyst-core Minor


@vercel

vercel Bot commented Nov 7, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
catalyst-b2b Ready Ready Preview Comment Nov 7, 2025 8:53pm
catalyst-canary Ready Ready Preview Comment Nov 7, 2025 8:53pm
1 Skipped Deployment
Project Deployment Preview Comments Updated (UTC)
catalyst Ignored Ignored Nov 7, 2025 8:53pm

Comment thread core/instrumentation.ts
@chanceaclark chanceaclark added this pull request to the merge queue Nov 13, 2025
Merged via the queue into canary with commit a48bcf2 Nov 13, 2025
9 of 11 checks passed
@chanceaclark chanceaclark deleted the feat/add-otel branch November 13, 2025 16:14
@jordanarldt jordanarldt restored the feat/add-otel branch November 14, 2025 21:51
jordanarldt added a commit that referenced this pull request Nov 14, 2025
jordanarldt added a commit that referenced this pull request Nov 14, 2025
github-merge-queue Bot pushed a commit that referenced this pull request Nov 14, 2025
* Revert "test(reviews): add e2e tests for reviews form (#2686)"

This reverts commit d1d1249.

* Revert "feat: CATALYST-676 add generic otel setup (#2674)"

This reverts commit a48bcf2.

* Revert "refactor(auth): separate first and last name fields on user session (#2684)"

This reverts commit edbd202.

* Revert "feat(reviews): add reviews form enabling shoppers to submit reviews (#2676)"

This reverts commit b7ba003.
jamesqquick pushed a commit that referenced this pull request Feb 11, 2026
chanceaclark pushed a commit that referenced this pull request Apr 27, 2026