Optimize XUI/XML parsing: LLXMLNode, LLStringTable, LLInitParam, LLXUIParser#313
Optimize XUI/XML parsing: LLXMLNode, LLStringTable, LLInitParam, LLXUIParser#313RyeMutt wants to merge 9 commits into
Conversation
Replace the triplicated constructor init lists with default member initializers in the header; the constructors now only set what differs per overload. This also fixes a latent -Wreorder in the copy constructor (mParser/mParent were initialized out of declaration order). Drop the unused <lldir.h> include (nothing references LLDir), and give escapeXML a reserve() plus a range-for. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The character-data callback (XMLData) read the node's entire accumulated
value out via getValue() and wrote it back via setValue() on every chunk,
making multi-chunk text (anything over expat's buffer, entity refs, etc.)
quadratic. Add appendValue(string_view) and a setValue(string&&) overload
and append in place instead - linear in the text length.
StartXMLNode now inspects attribute names as string_view (the expat atts
array is already NUL-terminated), moves each attribute value into its node
once, and drops the dead per-attribute dedup lookup (expat already rejects
duplicate attribute names) plus an unused local. EndXMLNode no longer
copies the value to test it for whitespace.
setBoolValue built its string with llformat("%s ...", val.c_str(), ...),
re-formatting the whole accumulator each element; switch to append. All
output is byte-for-byte identical.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
characterData built a std::string for every character-data chunk even when not dumping, then appended that copy to the node contents. Take a (const char*, len) overload of appendContents and append the chunk directly; only materialize a string in the (rarely taken) dump branch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stand up the previously-empty llxml unit-test target with TUT coverage for the parse path that the preceding commits touched: attribute/value round-trip, escaped-string unescaping, multi-chunk text accumulation (via parseStream's 1KB chunks), typed float arrays, bool serialization, escapeXML, and deepCopy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
STRING_TABLE_HASH_MAP is only ever defined under _MSC_VER 1300-1399 (VS2002/2003), and its branches use std::hash_multimap / __gnu_cxx:: hash_multimap, neither of which exists in modern stdlib. Remove the dead branches from the header and from check/add/removeStringEntry and the destructor, leaving the live list-bucket implementation. Also drop the #if 0 alternate hash in hash_my_string, value-initialize the bucket array, inline the now-dead ret_val locals, and switch NULL -> nullptr. LLStringTableEntry owns a raw char* with a user-declared destructor but had an implicit copy ctor (shallow copy -> double free). It is never copied (always heap-allocated and held by pointer), so delete the copy ctor/assignment to make the single-owner invariant explicit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
readXUI() dereferenced node (node->getName()) before its isNull() check,
and the warning branch further dereferenced a null/stale mCurReadNode;
check for null first and bail out. ScopedFile's destructor called
fclose() even when the file failed to open (fclose(NULL) is UB); guard
it.
readXUIImpl() called getSanitizedValue() up to twice per node; compute it
once and reuse. Use find('.') instead of find(".") for the single-char
scan, and emplace_back name-stack/output-stack entries instead of
push_back(make_pair(...)).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the hand-rolled new/delete and manual rule-of-three with a std::unique_ptr<T> member. Deep-copy value semantics and lazy heap allocation are preserved -- heap storage (rather than std::optional) keeps a Lazy<> param small when T is a large, usually-absent block -- and move ctor/assignment are added (the type was previously copy-only). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Defects: - Multiple::isValid() used strict < while the registered validate() uses <=, so a Multiple at exactly its min/max count (e.g. AtLeast<1> with a single element) reported not-provided. Make both bounds inclusive. - getPossibleValues() appended to a function-static vector with no guard, so the inspect path (XSD/RNG writers) grew it unboundedly with duplicate entries. Populate it once. Copy/alloc reductions: - getValueName()/calcValueName() return const std::string& instead of by value, removing string copies in the serialize loops. - serializeBlock()/inspectBlock() build the unnamed-handle set once for O(1) duplicate checks instead of an O(named*unnamed) rescan per param. - Non-block Multiple deserialize parses directly into a freshly-added element (popping on failure) instead of into a local that is then copied in, matching how block-Multiple already works. Moves: - emplace_back over push_back(make_pair(...)); mValues = value over std::copy + back_inserter; add() uses emplace_back; rvalue set(container_t&&) plus matching Multiple assignment/call operators. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughSummary by CodeRabbitRelease Notes
WalkthroughThe PR modernizes three core areas of the codebase: ChangesLLStringTable Simplification
LLInitParam Modernization
LLXMLNode, LLXmlTree, and LLXUIParser Modernization
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
indra/llxml/llxmlnode.cpp (1)
368-385:⚠️ Potential issue | 🟠 MajorFix
sscanfformat specifiers for unsigned targets.These calls parse into
U32(which isunsigned int), but use%d(which expectssigned int*). That format/type mismatch is undefined behavior. Use%uforU32.💡 Proposed fix
- if (sscanf(atts[pos + 1], "%d.%d", &version_major, &version_minor) > 0) + if (sscanf(atts[pos + 1], "%u.%u", &version_major, &version_minor) > 0) @@ - if (sscanf(atts[pos + 1], "%d", &length) > 0) + if (sscanf(atts[pos + 1], "%u", &length) > 0) @@ - if (sscanf(atts[pos + 1], "%d", &precision) > 0) + if (sscanf(atts[pos + 1], "%u", &precision) > 0)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@indra/llxml/llxmlnode.cpp` around lines 368 - 385, The sscanf format specifiers use %d (signed int) but parse into U32 (unsigned int) variables, creating undefined behavior. Replace all %d format specifiers with %u in the three sscanf calls: the version_major and version_minor parsing, the length variable parsing, and the precision variable parsing. This ensures the format specifiers match the target variable types.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@indra/llxml/llxmltree.cpp`:
- Around line 624-627: The characterData() method dereferences mCurrent without
a NULL check before calling appendContents(), and endElement() has the same
vulnerability when accessing mCurrent->mContents.empty(). Expat can invoke these
callbacks for text before the root element exists or after it closes, leaving
mCurrent as NULL. Add a guard check to verify that mCurrent is not NULL before
dereferencing it in both the characterData() method (where the appendContents
call occurs) and the endElement() method (where mContents.empty() is accessed),
ensuring the code only proceeds if mCurrent points to a valid element.
---
Outside diff comments:
In `@indra/llxml/llxmlnode.cpp`:
- Around line 368-385: The sscanf format specifiers use %d (signed int) but
parse into U32 (unsigned int) variables, creating undefined behavior. Replace
all %d format specifiers with %u in the three sscanf calls: the version_major
and version_minor parsing, the length variable parsing, and the precision
variable parsing. This ensures the format specifiers match the target variable
types.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6452996f-f9b8-4aa9-b5d2-bf477cb30912
📒 Files selected for processing (11)
indra/llcommon/llinitparam.cppindra/llcommon/llinitparam.hindra/llcommon/llstringtable.cppindra/llcommon/llstringtable.hindra/llui/llxuiparser.cppindra/llxml/CMakeLists.txtindra/llxml/llxmlnode.cppindra/llxml/llxmlnode.hindra/llxml/llxmltree.cppindra/llxml/llxmltree.hindra/llxml/tests/llxmlnode_test.cpp
CodeRabbit review on #313: - LLXmlTreeParser::characterData/endElement dereferenced mCurrent without a null check. It is structurally non-null for well-formed input (expat routes prolog/epilog text to the default handler, and endElement only fires for an opened element), but guard both as cheap defensive hardening. - StartXMLNode parsed version/length/precision into U32 with %d (expects int*); switch to %u to match the target type. Output is unchanged for the non-negative values these attributes hold. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Description
A focused performance and modernization pass over the XUI/XML parsing path and the
LLInitParamparam-block system that backs it. No user-facing behavior change — the same XUI/XML parses to the same widget trees and values; these are allocation/copy reductions, a handful of latent-defect fixes, and C++ modernization. 8 commits, grouped by area:XML node parsing (
llxml)getValue()) and back (setValue()) on every chunk, so any text delivered in multiple chunks (large bodies, entity refs, expat buffer splits) was quadratic. Now appends in place via a newappendValue(string_view)/setValue(string&&).StartXMLNode— attribute names inspected asstring_view(the expatattsarray is already NUL-terminated), values moved into the node once, dead per-attribute dedup lookup removed (expat already rejects duplicate attribute names).EndXMLNodeno longer copies the value to test it for whitespace;setBoolValue's quadraticllformataccumulator switched to append. Output is byte-for-byte identical.LLXMLNodeconstruction — default member initializers replace three duplicated init lists (also fixes a latent-Wreorderin the copy ctor); drop an unused<lldir.h>include; tidyescapeXML.LLXmlTreeParser::characterData— append the char range directly instead of building a per-chunkstd::string.llxmltest target with 7 TUT cases covering the touched parse paths.String table (
llcommon/llstringtable)STRING_TABLE_HASH_MAPbranches (only ever active on VS2002/2003 via long-gonehash_multimap), modernize the surviving list-bucket implementation, and deleteLLStringTableEntry's implicit copy ctor — it owns a rawchar*, so a shallow copy is a double-free hazard (it's never copied in practice).Param blocks (
llcommon/llinitparam)Multiple::isValid()now uses inclusive bounds to matchvalidate()(anAtLeast<1>with exactly one element no longer reports not-provided);getPossibleValues()no longer grows a function-static vector unboundedly on the inspect path.getValueName()/calcValueName()return by const ref; the unnamed-handle set is built once for O(1) dup checks; non-blockMultipledeserializes directly into the freshly-added element.LazyValuereimplemented overstd::unique_ptr(replaces hand-rolled rule-of-three, adds move support); assortedemplace_back/move adoptions.XUI parser (
llui/llxuiparser)readXUI()dereferencednodebefore its null check and could deref a stalemCurReadNode;ScopedFilecalledfclose(NULL)on open failure. Read hot path: computegetSanitizedValue()once per node,find('.')overfind("."),emplace_back.Related Issues
No tracking issue — internal optimization/cleanup pass on the XUI/XML load path.
Issue Link: N/A
Checklist
Additional Notes
llxml,llcommon, andlluibuild clean (RelWithDebInfo, VS2026); the newllxmlunit tests pass 7/7. The parser changes are behavior-preserving (byte-for-byte output, including the escaped-string edge cases) — full in-viewer runtime verification is still recommended before merge.llxmlcommits are behavior-preserving refactors; thellcommon/lluicommits additionally fix the latent defects called out above (param-block min/max validity, unbounded inspect-path vector, null-deref inreadXUI,fclose(NULL), string-table double-free).