Skip to content

⚡ Bolt: Optimize JSON-LD parsing in MovieScraper#86

Closed
bartholomej wants to merge 1 commit intomasterfrom
bolt-json-ld-parsing-opt-7258766754463568208
Closed

⚡ Bolt: Optimize JSON-LD parsing in MovieScraper#86
bartholomej wants to merge 1 commit intomasterfrom
bolt-json-ld-parsing-opt-7258766754463568208

Conversation

@bartholomej
Copy link
Copy Markdown
Owner

@bartholomej bartholomej commented Feb 19, 2026

💡 What: Optimized JSON-LD parsing in MovieScraper.
🎯 Why: JSON.parse was called multiple times for the same large JSON-LD string in helper functions (getMovieYear, getMovieDuration).
📊 Impact: Reduces synchronous parsing overhead by parsing once per movie request.
🔬 Measurement: Verified with unit tests in tests/movie.test.ts and integration test csfd.movie(10135) in tests/fetchers.test.ts.


PR created automatically by Jules for task 7258766754463568208 started by @bartholomej

Summary by CodeRabbit

Release Notes

  • Refactor

    • Enhanced movie information extraction with improved data structure and parsing for movie details, including better error handling and fallback mechanisms to ensure reliable metadata retrieval
  • Tests

    • Updated and expanded test coverage to validate movie metadata processing

Parse JSON-LD string once in `MovieScraper.movie()` and pass the object to helpers, avoiding redundant `JSON.parse` calls in `getMovieYear` and `getMovieDuration`. This reduces the overhead of parsing large JSON strings multiple times per movie request.

- Define `MovieJsonLd` interface.
- Update `getMovieYear` and `getMovieDuration` to accept `MovieJsonLd` object.
- Update `MovieScraper.movie` to parse JSON-LD once.
- Update tests to reflect signature changes.

Co-authored-by: bartholomej <5861310+bartholomej@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 19, 2026

📝 Walkthrough

Walkthrough

This PR introduces type-safe JSON-LD handling by creating a MovieJsonLd interface and refactoring movie parsing functions to accept structured typed objects instead of raw strings. The service layer now parses JSON-LD with error handling, and helper functions receive the typed object.

Changes

Cohort / File(s) Summary
Type Definition
src/dto/movie.ts
Added new MovieJsonLd interface with dateCreated and duration string fields, plus an index signature for arbitrary additional properties.
Helper Functions
src/helpers/movie.helper.ts
Updated getMovieYear and getMovieDuration function signatures to accept typed MovieJsonLd | null instead of string/element parameters. Added try-catch wrapping around duration parsing with DOM fallback logic.
Service Layer
src/services/movie.service.ts
Added JSON-LD parsing logic with try-catch error handling to convert raw script content into typed MovieJsonLd | null objects. Updated buildMovie signature to accept the typed object.
Tests
tests/movie.test.ts
Updated test imports and helper functions to work with the new MovieJsonLd type, replacing string-based JSON-LD handling with typed object handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • ESM + CommonJS build 🎉  #18 — Directly overlaps with edits to src/dto/movie.ts, src/helpers/movie.helper.ts, and src/services/movie.service.ts regarding JSON-LD typing and function signature changes.

Suggested labels

enhancement

Poem

🐰 A rabbit hops through typed land,
Where JSON-LD follows our command,
No more raw strings, just safety true,
Type-safe parsing through and through! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description lacks the required template sections (Type of change, Related Issues, Checklist) and instead uses an informal format with emojis and external task references. Restructure the description to follow the template: add 'Type of change' section with appropriate checkbox, include 'Related Issues' section, and complete the standard 'Checklist' items.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main optimization: reducing JSON-LD parsing overhead in MovieScraper by parsing once instead of multiple times.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bolt-json-ld-parsing-opt-7258766754463568208

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/dto/movie.ts (1)

186-188: dateCreated and duration should be optional; prefer unknown over any in the index signature.

Two separate points on the interface shape:

  1. Both fields are declared as required string, but a live CSFD page that lacks dateCreated in its JSON-LD will cause getMovieYear to silently return NaN (because +undefined === NaN) rather than null, even though jsonLd is non-null. Making the known fields optional matches the real data contract and lets TypeScript surface the missing-field check.

  2. [key: string]: any disables type-checking for all index-accessed properties. unknown is a drop-in replacement that preserves the open shape while forcing callers to narrow before use.

✏️ Proposed fix
 export interface MovieJsonLd {
-  dateCreated: string;
-  duration: string;
-  [key: string]: any;
+  dateCreated?: string;
+  duration?: string;
+  [key: string]: unknown;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dto/movie.ts` around lines 186 - 188, Update the movie DTO so known
optional fields are optional and the index signature uses unknown: change
dateCreated and duration from required strings to optional (dateCreated?:
string; duration?: string;) and replace [key: string]: any with [key: string]:
unknown; then update any callers (e.g., getMovieYear) to properly narrow/check
dateCreated before converting so it can return null instead of producing NaN.
src/helpers/movie.helper.ts (1)

127-132: Guard jsonLd.dateCreated before converting — links to optional-field suggestion in MovieJsonLd.

If dateCreated is made optional in MovieJsonLd (per the dto comment above), the existing if (jsonLd) guard is no longer sufficient and +undefined = NaN will be silently returned. Adding the field guard now is low-cost and defensive:

✏️ Proposed fix
 export const getMovieYear = (jsonLd: MovieJsonLd | null): number => {
-  if (jsonLd) {
+  if (jsonLd?.dateCreated) {
     return +jsonLd.dateCreated;
   }
   return null;
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/helpers/movie.helper.ts` around lines 127 - 132, The getMovieYear
function currently only guards for jsonLd but not jsonLd.dateCreated, which can
lead to NaN if dateCreated is undefined; update getMovieYear to check both
jsonLd and jsonLd.dateCreated (or use a nullish check) before converting, and
return null when dateCreated is missing; reference the getMovieYear function and
the MovieJsonLd type so you add a guard like "if (!jsonLd || jsonLd.dateCreated
== null) return null" and only then return Number(jsonLd.dateCreated).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/services/movie.service.ts`:
- Around line 44-50: The code reads
movieHtml.querySelector('script[type="application/ld+json"]').innerText without
guarding for a missing element, which can throw before the JSON.parse try/catch;
update the extraction in the movie() flow to first retrieve the element (use
optional chaining or a null check on the result of movieHtml.querySelector),
only read innerText when the element exists, and if absent leave jsonLd as null
(and optionally log a controlled message) so JSON.parse is never invoked on
undefined; refer to movieHtml, jsonLdString, and MovieJsonLd to locate and fix
the logic.

In `@tests/movie.test.ts`:
- Around line 52-59: getJsonLd currently falls back to parsing the string '{}'
which produces a truthy empty object and breaks consumers like getMovieYear;
change getJsonLd to detect absence of the script element and return null
immediately instead of using '?? "{}"', i.e., capture the result of
node.querySelector('script[type="application/ld+json"]') into a variable, if
it's null return null, otherwise read its innerText and JSON.parse inside the
try/catch to return MovieJsonLd or null; update references to getJsonLd and
ensure getMovieYear receives null when no JSON‑LD is present.

---

Nitpick comments:
In `@src/dto/movie.ts`:
- Around line 186-188: Update the movie DTO so known optional fields are
optional and the index signature uses unknown: change dateCreated and duration
from required strings to optional (dateCreated?: string; duration?: string;) and
replace [key: string]: any with [key: string]: unknown; then update any callers
(e.g., getMovieYear) to properly narrow/check dateCreated before converting so
it can return null instead of producing NaN.

In `@src/helpers/movie.helper.ts`:
- Around line 127-132: The getMovieYear function currently only guards for
jsonLd but not jsonLd.dateCreated, which can lead to NaN if dateCreated is
undefined; update getMovieYear to check both jsonLd and jsonLd.dateCreated (or
use a nullish check) before converting, and return null when dateCreated is
missing; reference the getMovieYear function and the MovieJsonLd type so you add
a guard like "if (!jsonLd || jsonLd.dateCreated == null) return null" and only
then return Number(jsonLd.dateCreated).

Comment on lines +44 to +50
const jsonLdString = movieHtml.querySelector('script[type="application/ld+json"]').innerText;
let jsonLd: MovieJsonLd | null = null;
try {
jsonLd = JSON.parse(jsonLdString);
} catch (e) {
console.error('node-csfd-api: Error parsing JSON-LD', e);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

NPE if the <script type="application/ld+json"> tag is absent.

querySelector returns null if no matching element is found. On line 44, .innerText is accessed directly on the return value without optional chaining. When the tag is missing, this throws TypeError: Cannot read properties of null (reading 'innerText') before reaching the try/catch on line 46, crashing the public movie() method entirely. The test helper at tests/movie.test.ts line 53 correctly uses ?.innerText, making the discrepancy between production and test defensive behavior visible.

🐛 Proposed fix
-    const jsonLdString = movieHtml.querySelector('script[type="application/ld+json"]').innerText;
-    let jsonLd: MovieJsonLd | null = null;
-    try {
-      jsonLd = JSON.parse(jsonLdString);
-    } catch (e) {
-      console.error('node-csfd-api: Error parsing JSON-LD', e);
-    }
+    const jsonLdString = movieHtml.querySelector('script[type="application/ld+json"]')?.innerText;
+    let jsonLd: MovieJsonLd | null = null;
+    if (jsonLdString) {
+      try {
+        jsonLd = JSON.parse(jsonLdString) as MovieJsonLd;
+      } catch (e) {
+        console.error('node-csfd-api: Error parsing JSON-LD', e);
+      }
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/movie.service.ts` around lines 44 - 50, The code reads
movieHtml.querySelector('script[type="application/ld+json"]').innerText without
guarding for a missing element, which can throw before the JSON.parse try/catch;
update the extraction in the movie() flow to first retrieve the element (use
optional chaining or a null check on the result of movieHtml.querySelector),
only read innerText when the element exists, and if absent leave jsonLd as null
(and optionally log a controlled message) so JSON.parse is never invoked on
undefined; refer to movieHtml, jsonLdString, and MovieJsonLd to locate and fix
the logic.

Comment thread tests/movie.test.ts
Comment on lines +52 to 59
const getJsonLd = (node: HTMLElement): MovieJsonLd | null => {
const text = node.querySelector('script[type="application/ld+json"]')?.innerText ?? '{}';
try {
return JSON.parse(text);
} catch (e) {
return null;
}
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

?? '{}' fallback returns a truthy empty object, breaking getMovieYear for pages without a JSON-LD tag.

When script[type="application/ld+json"] is absent, innerText is undefined, text becomes '{}', and JSON.parse('{}') returns {} — a truthy object with no dateCreated or duration. Any subsequent call to getMovieYear({}) will execute the if (jsonLd) branch and return +undefined (i.e., NaN) instead of null.

Return null directly when the tag is missing to stay consistent with the MovieJsonLd | null contract:

✏️ Proposed fix
 const getJsonLd = (node: HTMLElement): MovieJsonLd | null => {
-  const text = node.querySelector('script[type="application/ld+json"]')?.innerText ?? '{}';
-  try {
-    return JSON.parse(text);
-  } catch (e) {
-    return null;
-  }
+  const text = node.querySelector('script[type="application/ld+json"]')?.innerText;
+  if (!text) return null;
+  try {
+    return JSON.parse(text) as MovieJsonLd;
+  } catch (e) {
+    return null;
+  }
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const getJsonLd = (node: HTMLElement): MovieJsonLd | null => {
const text = node.querySelector('script[type="application/ld+json"]')?.innerText ?? '{}';
try {
return JSON.parse(text);
} catch (e) {
return null;
}
};
const getJsonLd = (node: HTMLElement): MovieJsonLd | null => {
const text = node.querySelector('script[type="application/ld+json"]')?.innerText;
if (!text) return null;
try {
return JSON.parse(text) as MovieJsonLd;
} catch (e) {
return null;
}
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/movie.test.ts` around lines 52 - 59, getJsonLd currently falls back to
parsing the string '{}' which produces a truthy empty object and breaks
consumers like getMovieYear; change getJsonLd to detect absence of the script
element and return null immediately instead of using '?? "{}"', i.e., capture
the result of node.querySelector('script[type="application/ld+json"]') into a
variable, if it's null return null, otherwise read its innerText and JSON.parse
inside the try/catch to return MovieJsonLd or null; update references to
getJsonLd and ensure getMovieYear receives null when no JSON‑LD is present.

@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.20%. Comparing base (25a700c) to head (21af607).

Files with missing lines Patch % Lines
src/helpers/movie.helper.ts 88.88% 1 Missing ⚠️
src/services/movie.service.ts 80.00% 1 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #86      +/-   ##
==========================================
- Coverage   99.52%   99.20%   -0.32%     
==========================================
  Files          28       28              
  Lines         629      631       +2     
  Branches      143      145       +2     
==========================================
  Hits          626      626              
- Misses          3        5       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bartholomej bartholomej deleted the bolt-json-ld-parsing-opt-7258766754463568208 branch March 20, 2026 22:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants