
The ACPATH Software Metric#25

Open
sebastianbergmann wants to merge 3 commits into main from issue-4/acpath


@sebastianbergmann sebastianbergmann commented May 22, 2026

This implements the ACPATH complexity metric described in

Bagnara, Roberto & Bagnara, Abramo & Benedetti, Alessandro & Hill, Patricia. (2016). The ACPATH Metric: Precise Estimation of the Number of Acyclic Paths in C-like Languages.

PDF

New/updated paper:

Bagnara, Roberto & Bagnara, Abramo & Benedetti, Alessandro & Hill, Patricia. (2024). The ACPATH Structural Complexity Metric.

PDF

This software metric answers the question: How many acyclic execution paths exist through a function?

The number of acyclic paths through a function is a direct measure of its testability: since each acyclic path represents a distinct execution scenario, a thorough test suite should exercise all (or a significant fraction) of them. A function with 200 acyclic paths is fundamentally harder to test than one with 4.

ACPATH improves on two predecessors:

  • Cyclomatic complexity (McCabe) does not distinguish between different control-flow structures (e.g., conditionals vs. loops, sequential vs. nested), and therefore correlates poorly with testing effort.
  • NPATH (Nejmeh, 1988) was intended to count acyclic paths but its definition fails to do so, even for simple programs. NPATH can both underestimate and overestimate the true count. For example, NPATH does not account for short-circuit evaluation of &&/||, does not handle the backward jump in while loops, and does not model switch fall-through correctly.

ACPATH is proven (Theorem 2 in the paper) to yield the exact number of acyclic paths for all controlled function bodies, that is, function bodies that contain no backward gotos and no jumps into a loop from outside. In practice this covers virtually all real-world code.

How Is ACPATH Calculated?

ACPATH works by structural induction over the abstract syntax tree. It performs a single traversal of the function body, propagating a set of path counts through each statement and expression.

Core Concepts

Path Counters for Expressions (Table 3 in the paper)

For every expression E, three functions are defined:

| Symbol | Meaning |
|--------|---------|
| t(E) | Number of execution paths through E that may evaluate to true |
| f(E) | Number of execution paths through E that may evaluate to false |
| p(E) | Total number of execution paths through E when it is evaluated for its value; in general this is not t + f (a leaf has t = f = p = 1) |

The key insight is that short-circuit operators (&&, ||) introduce branching within an expression:

  • E1 && E2: E2 is only evaluated when E1 is true. So t = t(E1) * t(E2), f = f(E1) + t(E1) * f(E2).
  • E1 || E2: E2 is only evaluated when E1 is false. So t = t(E1) + f(E1) * t(E2), f = f(E1) * f(E2).
  • !E1: swaps t and f.
  • E1 ? E2 : E3 (ternary): t = t(E1)*t(E2) + f(E1)*t(E3), f = t(E1)*f(E2) + f(E1)*f(E3).
  • Leaf expressions (variables, function calls, etc.): t = f = p = 1.

For non-boolean operators (arithmetic, comparison, assignment, etc.), t = f = p since we cannot determine the boolean outcome statically.
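The rules above can be tried out with a minimal sketch. The function names (`pathsLeaf()`, `pathsAnd()`, and so on) and the array layout are illustrative assumptions, not the names used in `AcpathCalculator`; the `p` formulas follow the expression table in the implementation section.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers: each expression is summarised as ['t' => int, 'f' => int, 'p' => int].

/** Leaf expression (variable, literal, call): t = f = p = 1. */
function pathsLeaf(): array
{
    return ['t' => 1, 'f' => 1, 'p' => 1];
}

/** E1 && E2: E2 is only reached on the true-paths of E1. */
function pathsAnd(array $a, array $b): array
{
    return [
        't' => $a['t'] * $b['t'],
        'f' => $a['f'] + $a['t'] * $b['f'],
        'p' => $a['f'] + $a['t'] * $b['p'],
    ];
}

/** E1 || E2: E2 is only reached on the false-paths of E1. */
function pathsOr(array $a, array $b): array
{
    return [
        't' => $a['t'] + $a['f'] * $b['t'],
        'f' => $a['f'] * $b['f'],
        'p' => $a['t'] + $a['f'] * $b['p'],
    ];
}

/** !E1: swaps t and f, p is unchanged. */
function pathsNot(array $a): array
{
    return ['t' => $a['f'], 'f' => $a['t'], 'p' => $a['p']];
}

/** E1 ? E2 : E3 */
function pathsTernary(array $c, array $a, array $b): array
{
    return [
        't' => $c['t'] * $a['t'] + $c['f'] * $b['t'],
        'f' => $c['t'] * $a['f'] + $c['f'] * $b['f'],
        'p' => $c['t'] * $a['p'] + $c['f'] * $b['p'],
    ];
}

// $a && ($b || $c)
$e = pathsAnd(pathsLeaf(), pathsOr(pathsLeaf(), pathsLeaf()));
// $e is ['t' => 2, 'f' => 2, 'p' => 3]
```

The two true-paths are "a true, b true" and "a true, b false, c true"; the two false-paths are "a false" and "a true, b false, c false".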

Double-Traversal Functions (Tables 4-7 in the paper)

While-loop conditions can be traversed twice in a single acyclic path: once when entering the loop (evaluating to true) and once when exiting (evaluating to false). To correctly count paths through a while loop, four additional functions are needed for the condition expression:

| Symbol | Meaning |
|--------|---------|
| tt(E) | Ways E can be traversed twice, both times evaluating to true, on non-overlapping arcs |
| tf(E) | Ways E can be traversed twice, first true then false (or vice versa), on non-overlapping arcs |
| ff(E) | Ways E can be traversed twice, both times evaluating to false, on non-overlapping arcs |
| pp(E) | Total ways E can be traversed twice on non-overlapping arcs |

For a simple leaf expression: tt = 0, tf = 1, ff = 0, pp = 0 (there is exactly one arc, and it can be used once for true and once for false).

These compose for &&, ||, !, and ternary in the same way single-traversal functions do, following the structure of the expression's control-flow graph.

Path Counters for Statements (Definitions 6-7, equations 37-53)

Each statement is analyzed by computing a tuple of five values:

| Symbol | Meaning |
|--------|---------|
| ft (fall-through) | Number of acyclic paths that "fall through" the statement and continue to the next |
| bp (break paths) | Cumulative paths that reach a break (not inside a nested switch/loop) |
| cp (continue paths) | Cumulative paths that reach a continue (not inside a nested loop) |
| rp (return paths) | Cumulative paths that reach a return statement |
| gt (goto paths) | Partial function mapping label identifiers to path counts (not tracked by this implementation) |

The ft for incoming paths is threaded through the statement sequence: each statement receives the ft produced by its predecessor. The final ACPATH value for a function body is ft_out + rp (equation 53): all paths that fall off the end of the function plus all paths that exit via return.

The key formulas for statements are:

  • Expression statement E;: ft_out = p(E) * ft. No branching, just multiply by expression paths.
  • Sequential composition S1 S2: process S1 with incoming ft, then process S2 with S1's ft_out. bp, cp, rp accumulate.
  • return: ft_out = 0, rp = ft. All incoming paths divert to return.
  • return E: ft_out = 0, rp = p(E) * ft.
  • if (E) S1 else S2: S1 receives t(E) * ft paths, S2 receives f(E) * ft paths. ft_out = ft1 + ft2.
  • if (E) S1 (no else): S1 receives t(E) * ft paths. ft_out = ft1 + f(E) * ft.
  • while (E) S: the body S receives t(E) * ft incoming paths, and ft_out = f(E) * ft + bp_S + (ft_S + cp_S) * tf(E) / t(E). The three terms are: (1) paths that skip the loop because the condition is false on its first evaluation (f(E) * ft); (2) paths that leave the loop via break (bp_S); (3) paths that complete the body, or reach a continue, and then pass through the condition a second time to exit, which requires the double-traversal count tf(E). Because ft_S and cp_S already carry the factor t(E) from entering the body, dividing by t(E) normalizes back to "per incoming path" so that the first traversal of the condition is not counted twice. The implementation arranges this computation differently (see the processWhile() section).
  • do S while (E): ft_out = f(E) * ft_S + bp_S. The body always executes once; only the false-exit paths from the condition leave the loop.
  • for (E1; E2; E3) S: desugared to E1; while (E2) { S; E3; }.
  • break: ft_out = 0, bp = ft.
  • continue: ft_out = 0, cp = ft.
  • switch (E) S: Each case label adds st (switch-to) incoming paths. Fall-through between cases is handled by sequential processing. bp from the switch body becomes ft_out (break exits the switch). If there is no default, an additional p(E) * ft paths pass through without matching.
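How ft is threaded through a sequence while bp, cp and rp accumulate can be sketched on a toy statement model. The tagged arrays and the names `sketchSequence()` and `sketchStatement()` are assumptions made for this illustration; the actual implementation works on PHP-Parser nodes and also handles loops and switch.

```php
<?php
declare(strict_types=1);

/**
 * Threads ft through a statement list and sums bp, cp and rp.
 * Statements are hypothetical tagged arrays:
 *   ['expr', p], ['return'], ['break'], ['continue'],
 *   ['if', t, f, thenStmts, elseStmts or null]
 */
function sketchSequence(array $stmts, int $ft): array
{
    $acc = ['ft' => $ft, 'bp' => 0, 'cp' => 0, 'rp' => 0];

    foreach ($stmts as $stmt) {
        // Each statement receives the fall-through count of its predecessor.
        $r   = sketchStatement($stmt, $acc['ft']);
        $acc = [
            'ft' => $r['ft'],
            'bp' => $acc['bp'] + $r['bp'],
            'cp' => $acc['cp'] + $r['cp'],
            'rp' => $acc['rp'] + $r['rp'],
        ];
    }

    return $acc;
}

function sketchStatement(array $stmt, int $ft): array
{
    $zero = ['ft' => 0, 'bp' => 0, 'cp' => 0, 'rp' => 0];

    switch ($stmt[0]) {
        case 'expr':
            return ['ft' => $stmt[1] * $ft] + $zero;
        case 'return':
            return ['rp' => $ft] + $zero;
        case 'break':
            return ['bp' => $ft] + $zero;
        case 'continue':
            return ['cp' => $ft] + $zero;
        case 'if':
            [, $t, $f, $then, $else] = $stmt;
            $a = sketchSequence($then, $t * $ft);
            $b = $else === null
                ? ['ft' => $f * $ft] + $zero       // no else: false-paths fall through
                : sketchSequence($else, $f * $ft);

            return [
                'ft' => $a['ft'] + $b['ft'],
                'bp' => $a['bp'] + $b['bp'],
                'cp' => $a['cp'] + $b['cp'],
                'rp' => $a['rp'] + $b['rp'],
            ];
    }

    throw new InvalidArgumentException('unknown statement kind');
}

// if ($a) { return; }  if ($b && $c) { work(); }
$body = [
    ['if', 1, 1, [['return']], null],
    ['if', 1, 2, [['expr', 1]], null],   // $b && $c has t = 1, f = 2
];
$r      = sketchSequence($body, 1);
$acpath = $r['ft'] + $r['rp'];           // 3 fall-through paths + 1 return path = 4
```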

Implementation in src/Visitor/AcpathCalculator.php

The implementation is a PHP class that takes a list of PHP-Parser Stmt nodes (the body of a function/method) and returns the ACPATH count as an integer. It implements the paper's formulas from Section 4, adapted to PHP's syntax.

Entry Point: calculate()

```php
public function calculate(array $statements): int
{
    ['ft' => $ft, 'bp' => $bp, 'cp' => $cp, 'rp' => $rp] =
        $this->statements($statements, 1, 0);

    return max(1, $ft + $rp);
}
```

Corresponds to equation (53): apc_i^b[B] := ft_out + rp. The initial ft is 1 (one path enters the function) and the initial st is 0 (no switch-to paths). The max(1, ...) ensures that even an empty function returns at least 1. The gt (goto) component from the paper is not implemented: PHP does have a goto statement, but it is rare in practice and outside this tool's scope.

Statement Sequence: statements()

```php
private function statements(array $statements, int $ft, int $st): array
```

Implements equation (38): sequential composition. Iterates through statements, threading ft from one to the next. Accumulates bp, cp, rp by summation. The st parameter carries the switch-to path count for switch statements.

Individual Statement Dispatch: statement()

Routes each statement type to its handler. Implements the paper's equations as follows:

| Statement | Paper Eq. | Implementation |
|-----------|-----------|----------------|
| Expression (expr stmt) | (37) | ft_out = p(E) * ft |
| Return_ (no expr) | (39) | ft_out = 0, rp = ft |
| Return_ (with expr) | (40) | ft_out = 0, rp = p(E) * ft |
| If_ | (41)/(42) | processIf() |
| Switch_ | (43) | processSwitch() |
| While_ | (44) | processWhile() |
| Do_ | (45) | processDo() |
| For_ | (46) | processFor() |
| Foreach_ | n/a | processForeach() |
| Break_ | (47) | ft_out = 0, bp = ft |
| Continue_ | (48) | ft_out = 0, cp = ft |
| TryCatch | n/a | processTryCatch() |
| Block | (51) | delegates to statements() |
| Other (echo, noop, etc.) | (52) | ft unchanged |

processIf(): Conditional Statements

Implements equations (41) and (42).

  1. Computes t, f, p for the condition expression.
  2. Elseif chains: desugared into nested if/else. The first elseif is extracted, a new If_ node is constructed with remaining elseifs and the else clause, and the else branch is processed as this synthetic inner if. This mirrors how the paper treats elseif as syntactic sugar.
  3. If/else: the then-branch receives t * ft incoming paths, the else-branch receives f * ft incoming paths. ft_out = ft1 + ft2.
  4. If without else: the then-branch receives t * ft paths, and f * ft paths fall through directly. ft_out = ft1 + f * ft.

bp, cp, rp from both branches accumulate by addition.
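The elseif desugaring in step 2 can be illustrated on plain arrays. This is only a sketch: the array shape and the name `desugarElseifs()` are hypothetical, whereas the implementation builds a synthetic PHP-Parser `If_` node.

```php
<?php
declare(strict_types=1);

/**
 * Rewrites if / elseif... / else into nested if / else.
 * Hypothetical node shape:
 *   ['cond' => mixed, 'then' => array, 'elseifs' => list of ['cond', 'then'], 'else' => array|null]
 */
function desugarElseifs(array $if): array
{
    if ($if['elseifs'] === []) {
        return $if;
    }

    // Extract the first elseif; it becomes the condition of the inner if.
    $first = array_shift($if['elseifs']);

    // The inner if inherits the remaining elseifs and the original else clause.
    $inner = desugarElseifs([
        'cond'    => $first['cond'],
        'then'    => $first['then'],
        'elseifs' => $if['elseifs'],
        'else'    => $if['else'],
    ]);

    return [
        'cond'    => $if['cond'],
        'then'    => $if['then'],
        'elseifs' => [],
        'else'    => [$inner],
    ];
}

// if (a) A elseif (b) B elseif (c) C else D
$nested = desugarElseifs([
    'cond'    => 'a',
    'then'    => ['A'],
    'elseifs' => [
        ['cond' => 'b', 'then' => ['B']],
        ['cond' => 'c', 'then' => ['C']],
    ],
    'else'    => ['D'],
]);
// $nested is: if (a) A else { if (b) B else { if (c) C else D } }
```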

processSwitch(): Switch Statements

Implements equation (43).

  1. Computes p for the switch condition expression.
  2. Sets switchSt = p * ft; this is the "switch-to" count: the number of incoming paths that each case label contributes.
  3. Delegates to processSwitchBody() which processes cases sequentially.
  4. Each case label adds st to the current ft (modeling "switch-to" entry). Case body statements are processed normally, allowing fall-through between cases (ft flows from one case to the next unless interrupted by break).
  5. After processing: ftOut = ftS + bpS (fall-through plus break paths). If there is no default case, adds p * ft paths for the "no match" case.
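A simplified sketch of these steps, assuming every case body is straight-line code that either ends in `break` or falls through. The function name and the `default`/`breaks` flags are illustrative assumptions, not part of the implementation.

```php
<?php
declare(strict_types=1);

/**
 * Fall-through count of a switch whose case bodies are straight-line code.
 * Each case is ['default' => bool, 'breaks' => bool] (hypothetical shape).
 */
function switchFallThrough(int $ft, int $p, array $cases): int
{
    $st         = $p * $ft; // "switch-to" count contributed by every case label
    $current    = 0;        // fall-through inside the switch body
    $bp         = 0;        // paths that left a case via break
    $hasDefault = false;

    foreach ($cases as $case) {
        $hasDefault = $hasDefault || $case['default'];
        $current   += $st;  // entering at this label, plus fall-through from above

        if ($case['breaks']) {
            $bp     += $current;
            $current = 0;
        }
    }

    $out = $current + $bp;

    // Without a default there is one extra "no case matched" contribution.
    return $hasDefault ? $out : $out + $st;
}

// switch ($x) { case 1: a(); break;  case 2: b();  case 3: c(); break; }
$paths = switchFallThrough(1, 1, [
    ['default' => false, 'breaks' => true],
    ['default' => false, 'breaks' => false],
    ['default' => false, 'breaks' => true],
]);
// $paths is 4: x = 1, x = 2 (falls into case 3), x = 3, and no match
```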

processWhile(): While Loops

Implements equation (44).

```php
$ftOut = $f * $ft + $bpS * $t + ($ftS + $cpS) * $tf;
```

However, note an important difference from the paper's formula. The paper states:

ft_out = f(E) + bp_S + (ft_S + cp_S) * tf(E) / t(E)

where the body is entered with t(E) * ft incoming paths. The implementation instead passes the un-multiplied $ft to the body (line 261: $this->statements($stmt->stmts, $ft, $st)), not $t * $ft. It then compensates by multiplying the break term by $t, dropping the division by $t in the loop-back term, and writing the skip-loop term as $f * $ft. This is algebraically equivalent to the paper's formula once the incoming ft is made explicit:

  • Paper: body gets t*ft paths, so ft_S is proportional to t*ft. Then (ft_S + cp_S) * tf / t normalizes out one factor of t.
  • Implementation: body gets ft paths (no multiplication by t). Then (ft_S + cp_S) * tf is already correct because ft_S is proportional to ft (not t*ft), and multiplying by tf (which already accounts for one true-then-false traversal) gives the right count.

The bp from the body is multiplied by $t because break paths must have entered the loop (condition was true), accounting for the condition's true-paths. Return paths pass through unchanged.
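The equivalence argument can be checked numerically. The sketch below assumes that the body's counters scale linearly with its incoming ft and that the paper-style form is written with the incoming ft made explicit; the function names and the closure-based body are hypothetical.

```php
<?php
declare(strict_types=1);

/** Implementation-style form: the body receives ft, break paths are scaled by t. */
function whileImplForm(int $ft, array $cond, callable $body): int
{
    ['ft' => $ftS, 'bp' => $bpS, 'cp' => $cpS] = $body($ft);

    return $cond['f'] * $ft + $bpS * $cond['t'] + ($ftS + $cpS) * $cond['tf'];
}

/** Paper-style form (ft made explicit): the body receives t * ft, loop-back is divided by t. */
function whilePaperForm(int $ft, array $cond, callable $body): int
{
    ['ft' => $ftS, 'bp' => $bpS, 'cp' => $cpS] = $body($cond['t'] * $ft);

    // The division is exact because ftS + cpS carries the factor t.
    return $cond['f'] * $ft + $bpS + intdiv(($ftS + $cpS) * $cond['tf'], $cond['t']);
}

// Condition $a || $b with leaf operands: t = 2, f = 1, tf = 1.
$cond = ['t' => 2, 'f' => 1, 'tf' => 1];

// A body whose counters are linear in the incoming ft (illustrative numbers).
$body = fn (int $in): array => ['ft' => 2 * $in, 'bp' => $in, 'cp' => 0];

$impl  = whileImplForm(3, $cond, $body);   // 1*3 + 3*2 + 6*1   = 15
$paper = whilePaperForm(3, $cond, $body);  // 1*3 + 6   + 12*1/2 = 15

// for (;;) { break; } has a leaf condition and a body that only breaks.
$leaf      = ['t' => 1, 'f' => 1, 'tf' => 1];
$onlyBreak = fn (int $in): array => ['ft' => 0, 'bp' => $in, 'cp' => 0];
$infinite  = whileImplForm(1, $leaf, $onlyBreak); // 2
```

The last value matches the `for (;;) { break; }` result of 2 that the design notes describe as locked in by the test suite.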

processDo(): Do-While Loops

Implements equation (45).

```php
$ftOut = $f * $ftS + $bpS;
```

The body always executes once (receives $ft paths directly). Then:

  • $f * $ftS: paths that complete the body and exit via the condition evaluating to false.
  • $bpS: paths that break out of the loop.

Note: the implementation does not use double-traversal functions for do-while. The paper's equation (45) is ft_out = f(E) * ft_S + bp_S, which is simpler than the while formula. In a do-while loop the backward arc goes from the condition back to the body entry, and in the reference CFG an acyclic path cannot traverse this arc (it would revisit a node). So only one traversal of the condition is needed.
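As a small worked example of the do-while formula, here is a sketch in which the loop body is a closure that returns its own counters. The function name and the closure convention are assumptions made for the illustration.

```php
<?php
declare(strict_types=1);

/** ft_out = f(E) * ft_S + bp_S, with the body entered by all incoming paths. */
function doWhileFallThrough(int $ft, int $f, callable $body): int
{
    ['ft' => $ftS, 'bp' => $bpS] = $body($ft);

    return $f * $ftS + $bpS;
}

// do { if ($x) { break; } } while ($c);
// Body counters: the true-branch breaks, the false-branch falls through.
$body = fn (int $in): array => ['ft' => $in, 'bp' => $in];

$doPaths = doWhileFallThrough(1, 1, $body);
// $doPaths is 2: "x true, break" and "x false, c false"
```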

processFor(): For Loops

Implements equation (46) by desugaring to E1; while(E2) { S; E3; }:

  1. Processes init expressions, multiplying ft by each expression's p.
  2. Combines condition expressions with BooleanAnd (or uses true if empty).
  3. Appends loop expressions as Expression statements to the body.
  4. Applies the while-loop formula.

processForeach(): Foreach Loops

Not in the paper (PHP-specific). Treated as a while loop with a leaf condition (t=1, f=1, tf=1), giving:

```php
$ftOut = 1 * $ft + $bpS * 1 + ($ftS + $cpS) * 1;
```

In other words, the loop is modelled as running either zero times or once: both the "skip" path and the "iterate once" paths are counted.
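A worked instance of the foreach formula, again with the body given as a closure over its incoming ft (a convention assumed for this sketch, as is the function name):

```php
<?php
declare(strict_types=1);

/** Foreach as a while loop with a leaf condition: t = f = tf = 1. */
function foreachFallThrough(int $ft, callable $body): int
{
    ['ft' => $ftS, 'bp' => $bpS, 'cp' => $cpS] = $body($ft);

    return $ft + $bpS + $ftS + $cpS;
}

// foreach ($items as $item) { if ($item) { continue; } handle($item); }
// Body counters: one path reaches continue, one falls through after handle().
$body = fn (int $in): array => ['ft' => $in, 'bp' => 0, 'cp' => $in];

$foreachPaths = foreachFallThrough(1, $body);
// $foreachPaths is 3: skip the loop, iterate once via continue, iterate once via handle()
```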

processTryCatch(): Try/Catch/Finally

Not in the paper (the paper covers C, which has no exceptions). The implementation treats each catch block as an alternative path: it receives the same ft as the try block (modeling that an exception could occur at the start of the try block). All ft values are summed. A finally block, if present, is threaded after the combined try+catch ft (it always executes).
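The fall-through part of this rule can be sketched as follows. Only ft is modelled here; how bp, cp and rp from the individual blocks are combined is not described above and is left out. The function name and the closure convention are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Fall-through count for try/catch/finally: every catch block starts from the
 * same incoming ft as the try block, the results are summed, and an optional
 * finally block is threaded after the sum.
 * Each block is a closure mapping incoming ft to outgoing ft (hypothetical).
 */
function tryCatchFallThrough(int $ft, callable $try, array $catches, ?callable $finally = null): int
{
    $sum = $try($ft);

    foreach ($catches as $catch) {
        $sum += $catch($ft);
    }

    return $finally === null ? $sum : $finally($sum);
}

$straight = fn (int $in): int => $in;      // straight-line block
$oneIf    = fn (int $in): int => 2 * $in;  // block containing one if without else

// try { a(); } catch (A $e) { b(); } catch (B $e) { c(); }
$withoutFinally = tryCatchFallThrough(1, $straight, [$straight, $straight]);        // 3

// Same, plus: finally { if ($x) { d(); } }
$withFinally = tryCatchFallThrough(1, $straight, [$straight, $straight], $oneIf);   // 6
```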

Expression Path Counting: expressionPaths()

Returns {t, f, p} for an expression. Implements Table 3 from the paper:

| Expression Type | t | f | p |
|-----------------|---|---|---|
| BooleanNot (!E) | f(E) | t(E) | p(E) |
| BooleanAnd (&&) | t1*t2 | f1 + t1*f2 | f1 + t1*p2 |
| BooleanOr (\|\|) | t1 + f1*t2 | f1*f2 | t1 + f1*p2 |
| Ternary (E1?E2:E3) | t1*t2 + f1*t3 | t1*f2 + f1*f3 | t1*p2 + f1*p3 |
| Ternary (E1?:E2, elvis) | t1 + f1*t2 | f1*f2 | t1 + f1*p2 |
| Coalesce (??) | same as \|\| | same as \|\| | same as \|\| |
| Match_ | sum of all arm paths | same | same |
| Assign/AssignOp | p(var)*p(expr) | same | same |
| BinaryOp (non-boolean) | p1*p2 | same | same |
| Cast, UnaryMinus/Plus | p(E) | same | same |
| Leaf (variable, literal, call) | 1 | 1 | 1 |

PHP-specific additions beyond the paper:

  • LogicalAnd/LogicalOr (PHP's and/or keywords): treated identically to &&/||.
  • Coalesce (??): treated as || (short-circuit on non-null).
  • Match_: sums up paths from all arm conditions and bodies.

Double-Traversal: expressionPathsDouble()

Returns {tt, tf, ff, pp}. Implements Tables 4-7 from the paper. Used only for while/for loop conditions.

The key formulas for &&:

```
tt = tt1 * tt2
tf = tf1 * t2 + tt1 * tf2
ff = ff1 + 2*tf1*f2 + tt1*ff2
pp = ff1 + 2*tf1*p2 + tt1*pp2
```

For ||:

```
tt = tt1 + 2*tf1*t2 + ff1*tt2
tf = tf1*f2 + ff1*tf2
ff = ff1 * ff2
pp = tt1 + 2*tf1*p2 + ff1*pp2
```

For !E: swaps tt/ff, tf stays the same.

For non-short-circuit expressions (comparisons, arithmetic, etc.), the base case is: tt=0, tf=p, ff=0, pp=0. This means the expression has p independent arcs, each of which can be used once for true and once for false.
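The double-traversal formulas for && and || can be evaluated with a short sketch that carries the single-traversal counters alongside. The function names and the combined array layout are assumptions for illustration, not the shape returned by `expressionPathsDouble()`.

```php
<?php
declare(strict_types=1);

// Hypothetical layout: one array holding t, f, p and tt, tf, ff, pp together.

function dblLeaf(): array
{
    return ['t' => 1, 'f' => 1, 'p' => 1, 'tt' => 0, 'tf' => 1, 'ff' => 0, 'pp' => 0];
}

function dblAnd(array $a, array $b): array
{
    return [
        't'  => $a['t'] * $b['t'],
        'f'  => $a['f'] + $a['t'] * $b['f'],
        'p'  => $a['f'] + $a['t'] * $b['p'],
        'tt' => $a['tt'] * $b['tt'],
        'tf' => $a['tf'] * $b['t'] + $a['tt'] * $b['tf'],
        'ff' => $a['ff'] + 2 * $a['tf'] * $b['f'] + $a['tt'] * $b['ff'],
        'pp' => $a['ff'] + 2 * $a['tf'] * $b['p'] + $a['tt'] * $b['pp'],
    ];
}

function dblOr(array $a, array $b): array
{
    return [
        't'  => $a['t'] + $a['f'] * $b['t'],
        'f'  => $a['f'] * $b['f'],
        'p'  => $a['t'] + $a['f'] * $b['p'],
        'tt' => $a['tt'] + 2 * $a['tf'] * $b['t'] + $a['ff'] * $b['tt'],
        'tf' => $a['tf'] * $b['f'] + $a['ff'] * $b['tf'],
        'ff' => $a['ff'] * $b['ff'],
        'pp' => $a['tt'] + 2 * $a['tf'] * $b['p'] + $a['ff'] * $b['pp'],
    ];
}

$and = dblAnd(dblLeaf(), dblLeaf()); // $a && $b: tt = 0, tf = 1, ff = 2, pp = 2
$or  = dblOr(dblLeaf(), dblLeaf());  // $a || $b: tt = 2, tf = 1, ff = 0, pp = 2

// while ($a && $b) { work(); } with one incoming path and a straight-line body
// (ft_S = 1, bp_S = cp_S = 0), using the implementation's while formula:
$whilePaths = $and['f'] * 1 + 0 * $and['t'] + (1 + 0) * $and['tf']; // 2 + 0 + 1 = 3
```

For `$a && $b`, tf = 1 because the only true-path (a true, b true) shares the "a true" arc with the false-path "a true, b false", so it can only be paired with the false-path "a false".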

Notable Design Decisions in the Implementation

  1. No gt (goto) tracking: the paper tracks goto target paths via a partial function gt. The implementation omits this entirely, which is appropriate for PHP where gotos in function bodies are extremely rare and outside this tool's scope.

  2. max(1, ...): the result is clamped to a minimum of 1. An empty function body or a function consisting only of unreachable code still reports ACPATH = 1.

  3. Elseif desugaring: rather than implementing a separate elseif rule, the code constructs a synthetic nested If_ node. This is mathematically equivalent and reduces code duplication.

  4. While-loop formula variant: the implementation passes un-multiplied ft to the loop body rather than t(E) * ft as the paper does, then adjusts the combining formula accordingly. See the processWhile() section above for the algebraic equivalence argument.

  5. Foreach as while with leaf condition: a pragmatic choice. The iteration variable binding is not modeled as a branching expression.

  6. Try/catch as alternative paths: each catch block is treated as a parallel branch with the same incoming ft as the try block. This is a reasonable extension for exception-handling semantics not covered by the original paper. Exceptions are modelled as if they always fire at try-entry; the analysis does not track which statements within the try block can throw, nor does it propagate uncaught exceptions out of the function.

  7. Constant-true loop conditions are not specialised: a while (true) or for (;;) loop is treated as having a leaf condition (f = 1), so the formulas always count one "skip the loop" path even though the condition can never evaluate to false. This produces a small over-count for infinite loops that exit only via break/return/throw. The behaviour is intentional and locked in by the test suite (e.g., for (;;) { break; } returns 2). Eliminating the phantom path would require constant folding on the condition expression and is outside the current scope.

  8. continue inside switch follows C semantics: the implementation propagates cp (continue paths) through switch statements, treating a bare continue inside a case as targeting the enclosing loop. This matches the paper's C-language model. PHP differs: at level 1, continue inside a switch behaves like break (and emits an E_WARNING since PHP 7.3). Code that uses bare continue directly inside a switch will therefore be modelled as continuing an outer loop rather than breaking out of the switch. This is an edge case; in idiomatic PHP, continue 2 is used instead, which is not handled by the current dispatcher either.

@sebastianbergmann sebastianbergmann changed the title Issue 4/acpath The ACPATH Software Metric May 22, 2026
@sebastianbergmann sebastianbergmann added the enhancement New feature or request label May 22, 2026
@github-actions

github-actions Bot commented May 22, 2026

API Surface Changes

If any of the additions below are not intended as public API, mark them with @internal in the docblock.

New API Surface

Classes

Methods

Modified API Surface

Methods

@codecov

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 99.84721% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.30%. Comparing base (b164899) to head (ffb2126).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/Visitor/AcpathPathEnumerationDotVisitor.php | 98.34% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main      #25      +/-   ##
============================================
+ Coverage     93.93%   99.30%   +5.36%     
- Complexity       57      315     +258     
============================================
  Files             6       12       +6     
  Lines           132     1440    +1308     
============================================
+ Hits            124     1430    +1306     
- Misses            8       10       +2     

