[MINOR][CI] Serialize federated monitoring/multitenant tests #2517

Merged: Baunsgaard merged 1 commit into apache:main from Baunsgaard:ci-serialize-federated on Jun 26, 2026

@Baunsgaard

Contributor

The **.functions.federated.monitoring.**,**.functions.federated.multitenant.** job was the only federated test group running at the default surefire parallelism (parallel=classes, threadCount=2). These tests spawn worker JVMs on fixed ports, run Spark, and share the static /tmp/systemds working directory, so two classes running concurrently in one fork race on those resources.
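
The port contention can be reproduced in isolation. The following is a minimal Java sketch, not SystemDS code: the class and method names are hypothetical, and it uses an OS-assigned port in place of the fixed worker ports so that it runs on any machine. It only shows that a second listener cannot bind a port that is still in use, which is presumably what a worker started by a concurrently running test class runs into.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

// Illustrative sketch only; names are hypothetical and not part of SystemDS.
final class FixedPortConflictDemo {

    // Opens a listener, then tries to open a second one on the same port.
    // Returns true if the second bind is rejected while the first is still open.
    static boolean secondBindFails() {
        try (ServerSocket first = new ServerSocket(0)) { // port 0: the OS picks a free port
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // both listeners bound, no conflict observed
            } catch (BindException e) {
                return true; // the port is already in use by `first`
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind fails: " + secondBindFails());
    }
}
```

In the real test setup the two listeners would live in separate worker JVMs, so the failure shows up as a worker that exits before it is ready rather than as an exception in the test class itself.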

Symptoms

  • Failed to create non-existing local working directory: /tmp/systemds
  • Federated worker processes on port N died before becoming ready
  • All tests finish, then a leaked worker/Spark thread keeps the fork JVM alive until the 30m job cap cancels it.
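
The first symptom is consistent with a check-then-create race on the shared directory. The sketch below is an assumption about the mechanism, not the SystemDS implementation: the class, method names, and temporary paths are hypothetical. It replays the losing interleaving on a single thread to show that `File.mkdirs()` returns false when another party has already created the directory, and that `Files.createDirectories` tolerates an existing directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only; names are hypothetical and not part of SystemDS.
final class WorkDirRaceDemo {

    // Replays the interleaving in which another class creates the directory
    // between this class's existence check and its own mkdirs() call.
    static boolean loserReportsFailure(File dir) {
        boolean missingAtCheck = !dir.exists(); // loser: directory is not there yet
        boolean winnerCreated = dir.mkdirs();   // winner: creates it first
        boolean loserCreated = dir.mkdirs();    // loser: false, it already exists
        return missingAtCheck && winnerCreated && !loserCreated;
    }

    // Files.createDirectories does not fail if the directory already exists.
    static boolean idempotentCreateSucceeds(Path dir) {
        try {
            Files.createDirectories(dir);
            Files.createDirectories(dir); // second call is a no-op
            return Files.isDirectory(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a private temporary root so the demo never touches /tmp/systemds.
    static Path freshTempRoot() {
        try {
            return Files.createTempDirectory("workdir-demo");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = freshTempRoot();
        System.out.println("loser sees failure: " + loserReportsFailure(root.resolve("a").toFile()));
        System.out.println("idempotent create ok: " + idempotentCreateSucceeds(root.resolve("b")));
    }
}
```

Code that treats a false return from `mkdirs()` as "failed to create" would report an error in this interleaving even though the directory exists afterwards.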

Change

  • Run the group with -Dtest-threadCount=1 -Dtest-forkCount=1, matching every other federated group (the federated.primitives.part1-5 groups already use this), so the classes execute serially and no longer contend for ports and the shared working directory.
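
As a rough analogy for the effect of a thread count of 1, the sketch below runs several stand-in "test classes" on a single-threaded executor. It is not Surefire and not SystemDS code: the class, the methods, and the 50 ms hold time are hypothetical, and the executor only mimics serial class execution. Each task binds the same port, holds it briefly, and releases it, so with one thread no task ever sees the port in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.Collections;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only; names are hypothetical and not part of SystemDS.
final class SerialExecutionDemo {

    // Finds a currently free port to play the role of the fixed worker port.
    static int freePort() {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs `classes` stand-in test classes on `threadCount` threads and
    // returns how many of them managed to bind the shared port.
    static int countSuccessfulClasses(int threadCount, int classes) {
        int port = freePort();
        Callable<Boolean> testClass = () -> {
            try (ServerSocket worker = new ServerSocket(port)) {
                Thread.sleep(50); // hold the port while the "tests" run
                return true;
            } catch (IOException e) {
                return false; // port was taken by a concurrently running class
            }
        };
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            int ok = 0;
            for (Future<Boolean> f : pool.invokeAll(Collections.nCopies(classes, testClass))) {
                if (f.get()) ok++;
            }
            return ok;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("threadCount=1: " + countSuccessfulClasses(1, 3) + " of 3 succeeded");
        System.out.println("threadCount=2: " + countSuccessfulClasses(2, 3) + " of 3 succeeded");
    }
}
```

With two threads the outcome depends on scheduling, which mirrors the intermittent nature of the CI failures; with one thread the tasks cannot overlap.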

The **.functions.federated.monitoring.**,**.functions.federated.multitenant.**
job was the only federated test group running with the default surefire
parallelism (parallel=classes, threadCount=2). Unlike pure-CP tests, these
federated tests spawn worker JVMs on fixed ports, run Spark, and share the
static /tmp/systemds working directory, so two classes running concurrently in
one fork race on those resources. Observed symptoms include "Failed to create
non-existing local working directory: /tmp/systemds" and "Federated worker
processes on port N died before becoming ready", followed by a leaked
worker/Spark thread keeping the fork JVM alive until the 30m job cap cancels it.

Run this group with -Dtest-threadCount=1 -Dtest-forkCount=1, matching every
other federated group (the federated.primitives.part1-5 groups already use
this), so the classes execute serially and no longer contend for ports and the
shared working directory.
@codecov

codecov Bot commented Jun 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.54%. Comparing base (d4e7def) to head (feb36c9).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2517      +/-   ##
============================================
- Coverage     71.56%   71.54%   -0.03%     
+ Complexity    49125    49101      -24     
============================================
  Files          1575     1575              
  Lines        189784   189784              
  Branches      37232    37232              
============================================
- Hits         135823   135772      -51     
- Misses        43470    43517      +47     
- Partials      10491    10495       +4     

@Baunsgaard merged commit b51bde1 into apache:main on Jun 26, 2026
50 checks passed
@github-project-automation (bot) moved this from In Progress to Done in SystemDS PR Queue on Jun 26, 2026