The ACPATH Software Metric #25
Open
sebastianbergmann wants to merge 3 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #25 +/- ##
============================================
+ Coverage 93.93% 99.30% +5.36%
- Complexity 57 315 +258
============================================
Files 6 12 +6
Lines 132 1440 +1308
============================================
+ Hits 124 1430 +1306
- Misses 8 10 +2
This implements the ACPATH complexity metric described in the paper: PDF
New/updated paper: PDF
This software metric answers the question: How many acyclic execution paths exist through a function?
The number of acyclic paths through a function is a direct measure of its testability: since each acyclic path represents a distinct execution scenario, a thorough test suite should exercise all (or a significant fraction) of them. A function with 200 acyclic paths is fundamentally harder to test than one with 4.
ACPATH improves on two predecessors:
`&&`/`||`, does not handle the backward jump in `while` loops, and does not model `switch` fall-through correctly.
ACPATH is proven (Theorem 2 in the paper) to yield the exact number of acyclic paths for all controlled function bodies, that is, functions that contain no backward gotos and no jumps into a loop from outside. In practice this covers virtually all real-world code.
How Is ACPATH Calculated?
ACPATH works by structural induction over the abstract syntax tree. It performs a single traversal of the function body, propagating a set of path counts through each statement and expression.
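To make the structural induction concrete, here is a minimal Python sketch of the true/false path counters for expressions, using the `&&`, `||`, `!`, ternary and leaf rules listed under Core Concepts. The tuple encoding of expression nodes and the name `expr_paths` are assumptions made for illustration; this is not the PHP implementation, and the total counter `p` is left out.

```python
# Expression nodes are plain tuples (an encoding assumed for this sketch):
#   ("leaf",)              variable, constant, call: no short-circuit logic
#   ("not", e)             !e
#   ("and", e1, e2)        e1 && e2
#   ("or", e1, e2)         e1 || e2
#   ("ternary", c, a, b)   c ? a : b

def expr_paths(node):
    """Return (t, f): paths on which the expression may be true / false."""
    kind = node[0]
    if kind == "leaf":
        return (1, 1)
    if kind == "not":
        t, f = expr_paths(node[1])
        return (f, t)  # negation swaps the two counters
    if kind == "and":
        t1, f1 = expr_paths(node[1])
        t2, f2 = expr_paths(node[2])
        # e2 is only reached when e1 is true
        return (t1 * t2, f1 + t1 * f2)
    if kind == "or":
        t1, f1 = expr_paths(node[1])
        t2, f2 = expr_paths(node[2])
        # e2 is only reached when e1 is false
        return (t1 + f1 * t2, f1 * f2)
    if kind == "ternary":
        tc, fc = expr_paths(node[1])
        ta, fa = expr_paths(node[2])
        tb, fb = expr_paths(node[3])
        return (tc * ta + fc * tb, tc * fa + fc * fb)
    raise ValueError(f"unknown node kind: {kind}")


LEAF = ("leaf",)
# a && (b || c)
print(expr_paths(("and", LEAF, ("or", LEAF, LEAF))))  # (2, 2)
```

Each node's counters depend only on the counters of its children, which is why a single bottom-up traversal is enough.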
Core Concepts
Path Counters for Expressions (Table 3 in the paper)
For every expression `E`, three functions are defined:
- t(E): paths through `E` that may evaluate to true
- f(E): paths through `E` that may evaluate to false
- p(E): paths through `E` (= t + f for boolean exprs)
The key insight is that short-circuit operators (`&&`, `||`) introduce branching within an expression:
- `E1 && E2`: `E2` is only evaluated when `E1` is true. So t = t(E1) * t(E2), f = f(E1) + t(E1) * f(E2).
- `E1 || E2`: `E2` is only evaluated when `E1` is false. So t = t(E1) + f(E1) * t(E2), f = f(E1) * f(E2).
- `!E1`: swaps `t` and `f`.
- `E1 ? E2 : E3` (ternary): t = t(E1)*t(E2) + f(E1)*t(E3), f = t(E1)*f(E2) + f(E1)*f(E3).
- Leaf expression: t = f = p = 1.
For non-boolean operators (arithmetic, comparison, assignment, etc.), t = f = p since we cannot determine the boolean outcome statically.
Double-Traversal Functions (Tables 4-7 in the paper)
While-loop conditions can be traversed twice in a single acyclic path: once when entering the loop (evaluating to true) and once when exiting (evaluating to false). To correctly count paths through a while loop, four additional functions are needed for the condition expression:
- tt(E): ways `E` can be traversed twice, both times evaluating to true, on non-overlapping arcs
- tf(E): ways `E` can be traversed twice, first true then false (or vice versa), on non-overlapping arcs
- ff(E): ways `E` can be traversed twice, both times evaluating to false, on non-overlapping arcs
- pp(E): ways `E` can be traversed twice on non-overlapping arcs
For a simple leaf expression: tt = 0, tf = 1, ff = 0, pp = 0 (there is exactly one arc, and it can be used once for true and once for false).
These compose for `&&`, `||`, `!`, and ternary in the same way single-traversal functions do, following the structure of the expression's control-flow graph.
Path Counters for Statements (Definitions 6-7, equations 37-53)
Each statement is analyzed by computing a tuple of five values:
- bp: paths that leave via `break` (not inside a nested switch/loop)
- cp: paths that leave via `continue` (not inside a nested loop)
- rp: paths that leave via a `return` statement
The ft for incoming paths is threaded through the statement sequence: each statement receives the ft produced by its predecessor. The final ACPATH value for a function body is ft_out + rp (equation 53): all paths that fall off the end of the function plus all paths that exit via return.
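The threading described above can be sketched in a few lines of Python. The sketch covers only the simplest statement kinds (expression statement, `return`, `break`, `continue`); the `(kind, p)` encoding and the names `run_sequence` and `acpath` are hypothetical, and the clamp to 1 follows the `max(1, ...)` mentioned for `calculate()`.

```python
def run_sequence(stmts, ft):
    """Thread ft through a list of (kind, p) statements.

    p is the path count of the statement's expression (1 if it has none).
    Returns (ft_out, bp, cp, rp).
    """
    bp = cp = rp = 0
    for kind, p in stmts:
        if kind == "expr":
            ft = p * ft          # no branching between statements
        elif kind == "return":
            rp += p * ft         # all incoming paths divert to return
            ft = 0
        elif kind == "break":
            bp += ft
            ft = 0
        elif kind == "continue":
            cp += ft
            ft = 0
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return ft, bp, cp, rp


def acpath(stmts):
    """ft_out + rp for a function body entered on exactly one path."""
    ft_out, _bp, _cp, rp = run_sequence(stmts, 1)
    return max(1, ft_out + rp)


# An expression with two paths, then a return, then unreachable code.
print(acpath([("expr", 2), ("return", 1), ("expr", 1)]))  # 2
```

Statements after the `return` receive ft = 0, so unreachable code contributes no paths.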
The key formulas for statements are:
- `E;`: ft_out = p(E) * ft. No branching, just multiply by expression paths.
- `S1 S2`: process `S1` with incoming ft, then process `S2` with `S1`'s ft_out. bp, cp, rp accumulate.
- `return`: ft_out = 0, rp = ft. All incoming paths divert to return.
- `return E`: ft_out = 0, rp = p(E) * ft.
- `if (E) S1 else S2`: S1 receives t(E) * ft paths, S2 receives f(E) * ft paths. ft_out = ft1 + ft2.
- `if (E) S1` (no else): S1 receives t(E) * ft paths. ft_out = ft1 + f(E) * ft.
- `while (E) S`: ft_out = f(E) * ft + bp_S * t(E) + (ft_S + cp_S) * tf(E) / t(E). This accounts for: (1) paths that skip the loop (f * ft), (2) paths that break out of the loop, which must first have passed the condition as true (bp_S * t), (3) paths that complete the body and loop back through the condition, requiring double traversal (tf). The paper writes the last term as (ft_S + cp_S) * tf(E) / t(E): since the body was entered with t(E) * ft incoming paths, dividing by t(E) normalizes back to "per incoming path". In the implementation this is computed differently (see the `processWhile()` section).
- `do S while (E)`: ft_out = f(E) * ft_S + bp_S. The body always executes once; only the false-exit paths from the condition leave the loop.
- `for (E1; E2; E3) S`: desugared to `E1; while (E2) { S; E3; }`.
- `break`: ft_out = 0, bp = ft.
- `continue`: ft_out = 0, cp = ft.
- `switch (E) S`: Each case label adds `st` (switch-to) incoming paths. Fall-through between cases is handled by sequential processing. bp from the switch body becomes ft_out (break exits the switch). If there is no default, an additional p(E) * ft paths pass through without matching.
Implementation in `src/Visitor/AcpathCalculator.php`
The implementation is a PHP class that takes a list of PHP-Parser `Stmt` nodes (the body of a function/method) and returns the ACPATH count as an integer. It implements the paper's formulas from Section 4, adapted to PHP's syntax.
Entry Point: `calculate()`
Corresponds to equation (53): `apc_i^b[B] := ft_out + rp`. Initial ft = 1 (one path enters the function), initial st = 0 (no switch-to paths). The `max(1, ...)` ensures even an empty function returns at least 1. The `gt` (goto) component from the paper is not implemented, since `goto` is rarely used in PHP and is outside this tool's scope.
Statement Sequence: `statements()`
Implements equation (38): sequential composition. Iterates through statements, threading `ft` from one to the next. Accumulates `bp`, `cp`, `rp` by summation. The `st` parameter carries the switch-to path count for switch statements.
Individual Statement Dispatch: `statement()`
Routes each statement type to its handler. Implements the paper's equations as follows:
- `Expression` (expr stmt)
- `Return_` (no expr)
- `Return_` (with expr)
- `If_`: `processIf()`
- `Switch_`: `processSwitch()`
- `While_`: `processWhile()`
- `Do_`: `processDo()`
- `For_`: `processFor()`
- `Foreach_`: `processForeach()`
- `Break_`
- `Continue_`
- `TryCatch`: `processTryCatch()`
- `Block`: `statements()`
`processIf()`: Conditional Statements
Implements equations (41) and (42).
- Computes `t`, `f`, `p` for the condition expression.
- If there are elseif clauses, the first `elseif` is extracted, a new `If_` node is constructed with remaining elseifs and the else clause, and the else branch is processed as this synthetic inner if. This mirrors how the paper treats elseif as syntactic sugar.
- With an else branch: the then-branch receives `t * ft` incoming paths, the else-branch receives `f * ft` incoming paths. ft_out = ft1 + ft2.
- Without an else branch: the then-branch receives `t * ft` paths, and `f * ft` paths fall through directly. ft_out = ft1 + f * ft.
bp, cp, rp from both branches accumulate by addition.
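The two `if` rules can be sketched as one Python function. Branches are modelled as callables that take their incoming ft and return `(ft_out, bp, cp, rp)`; this callable interface and the names `combine_if`, `passthrough` and `returns` are assumptions made for the sketch, not part of the PHP class.

```python
def passthrough(ft):
    """A branch with one straight-line path, e.g. a single expression statement."""
    return (ft, 0, 0, 0)


def returns(ft):
    """A branch that consists of a bare `return;`."""
    return (0, 0, 0, ft)


def combine_if(t, f, ft, then_branch, else_branch=None):
    """Apply the if / if-else rule for a condition with counters t and f."""
    ft1, bp1, cp1, rp1 = then_branch(t * ft)
    if else_branch is None:
        # No else: the f * ft false paths fall through directly.
        ft2, bp2, cp2, rp2 = f * ft, 0, 0, 0
    else:
        ft2, bp2, cp2, rp2 = else_branch(f * ft)
    return (ft1 + ft2, bp1 + bp2, cp1 + cp2, rp1 + rp2)


# if ($a && $b) { return; } else { $x; }   with t = 1, f = 2 for `$a && $b`
print(combine_if(1, 2, 1, returns, passthrough))  # (2, 0, 0, 1)
```

In this example ft_out + rp = 3: one path returns from the then-branch and two paths (`$a` false, or `$a` true and `$b` false) run the else-branch.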
`processSwitch()`: Switch Statements
Implements equation (43).
- Computes `p` for the switch condition expression.
- `switchSt = p * ft`; this is the "switch-to" count: the number of incoming paths that each case label contributes.
- Delegates to `processSwitchBody()`, which processes cases sequentially.
- Each case label adds `st` to the current ft (modeling "switch-to" entry). Case body statements are processed normally, allowing fall-through between cases (ft flows from one case to the next unless interrupted by break).
- `ftOut = ftS + bpS` (fall-through plus break paths). If there is no `default` case, adds `p * ft` paths for the "no match" case.
`processWhile()`: While Loops
Implements equation (44).
However, note an important difference from the paper's formula. The paper's formula, given above for `while (E) S`, assumes that the body is entered with t(E) * ft incoming paths. The implementation instead passes the un-multiplied `$ft` to the body (line 261: `$this->statements($stmt->stmts, $ft, $st)`), not `$t * $ft`. It then compensates by multiplying the break term by `$t` and multiplying the skip-loop term by `$f * $ft` instead of `$f`. This is algebraically equivalent when the paper's formula is expanded with the incoming `ft`:
- Paper: the body receives `t*ft` paths, so `ft_S` is proportional to `t*ft`. Then `(ft_S + cp_S) * tf / t` normalizes out one factor of `t`.
- Implementation: the body receives `ft` paths (no multiplication by `t`). Then `(ft_S + cp_S) * tf` is already correct because `ft_S` is proportional to `ft` (not `t*ft`), and multiplying by `tf` (which already accounts for one true-then-false traversal) gives the right count.
The `bp` from the body is multiplied by `$t` because break paths must have entered the loop (condition was true), accounting for the condition's true-paths. Return paths pass through unchanged.
`processDo()`: Do-While Loops
Implements equation (45).
The body always executes once (receives `$ft` paths directly). Then:
- `$f * $ftS`: paths that complete the body and exit via the condition evaluating to false.
- `$bpS`: paths that break out of the loop.
Note: the implementation does not use double-traversal functions for do-while. The paper's equation (45) is ft_out = f(E) * ft_S + bp_S, which is simpler than the while formula because the backward arc in a do-while loop goes from the condition back to the body entry, and in the reference CFG this backward arc cannot be traversed in an acyclic path (it would revisit a node). So only one traversal of the condition is needed.
`processFor()`: For Loops
Implements equation (46) by desugaring to `E1; while(E2) { S; E3; }`:
- Init expressions (`E1`): ft is multiplied by their `p`.
- Condition expressions (`E2`): combined with `BooleanAnd` (or uses `true` if empty).
- Loop expressions (`E3`): appended as `Expression` statements to the body.
`processForeach()`: Foreach Loops
Not in the paper (PHP-specific). Treated as a while loop with a leaf condition (t=1, f=1, tf=1).
This means: the loop may execute zero or one additional iteration, with both the "skip" and "iterate-once" paths counted.
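As a rough check of the foreach rule, the Python sketch below substitutes the leaf values into the while-loop combination. It assumes the combination is ft_out = f * ft + t * bp_S + (ft_S + cp_S) * tf with the body receiving the un-multiplied ft, which is how the `processWhile()` section describes the implementation; the function names and the callable body interface are hypothetical.

```python
def while_loop(t, f, tf, ft, body):
    """Combine a loop body with its condition counters.

    body(ft) -> (ft_s, bp_s, cp_s, rp_s); returns (ft_out, rp).
    """
    ft_s, bp_s, cp_s, rp_s = body(ft)   # body gets the un-multiplied ft
    ft_out = f * ft + t * bp_s + (ft_s + cp_s) * tf
    return ft_out, rp_s                 # return paths pass through unchanged


def foreach_loop(ft, body):
    """Foreach modelled as a while loop with a leaf condition (t = f = tf = 1)."""
    return while_loop(1, 1, 1, ft, body)


def plain_body(ft):
    """Body with a single straight-line path."""
    return (ft, 0, 0, 0)


def break_body(ft):
    """Body that is just `break;`."""
    return (0, ft, 0, 0)


print(foreach_loop(1, plain_body))  # (2, 0)
print(foreach_loop(1, break_body))  # (2, 0)
```

With the leaf values the combination collapses to ft + bp_S + ft_S + cp_S: one term for skipping the loop and one for each way of leaving it after entering the body.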
`processTryCatch()`: Try/Catch/Finally
Not in the paper (the paper covers C, which has no exceptions). The implementation treats each catch block as an alternative path: it receives the same `ft` as the try block (modeling that an exception could occur at the start of the try block). All ft values are summed. A finally block, if present, is threaded after the combined try+catch ft (it always executes).
Expression Path Counting: `expressionPaths()`
Returns `{t, f, p}` for an expression. Implements Table 3 from the paper:
- `BooleanNot` (`!E`)
- `BooleanAnd` (`&&`)
- `BooleanOr` (`||`)
- `Ternary` (`E1?E2:E3`)
- `Ternary` (`E1?:E2`, elvis)
- `Coalesce` (`??`)
- `Match_`
- `Assign` / `AssignOp`
- `BinaryOp` (non-boolean)
- `Cast`, `UnaryMinus` / `UnaryPlus`
PHP-specific additions beyond the paper:
- `LogicalAnd` / `LogicalOr` (PHP's `and` / `or` keywords): treated identically to `&&` / `||`.
- `Coalesce` (`??`): treated as `||` (short-circuit on non-null).
- `Match_`: sums up paths from all arm conditions and bodies.
Double-Traversal: `expressionPathsDouble()`
Returns `{tt, tf, ff, pp}`. Implements Tables 4-7 from the paper. Used only for while/for loop conditions.
The key formulas for `&&`:
For `||`:
For `!E`: swaps tt/ff, tf stays the same.
For non-short-circuit expressions (comparisons, arithmetic, etc.), the base case is: tt=0, tf=p, ff=0, pp=0. This means the expression has `p` independent arcs, each of which can be used once for true and once for false.
Notable Design Decisions in the Implementation
No `gt` (goto) tracking: the paper tracks goto target paths via a partial function `gt`. The implementation omits this entirely, which is appropriate for PHP where gotos in function bodies are extremely rare and outside this tool's scope.
`max(1, ...)`: the result is clamped to a minimum of 1. An empty function body or a function consisting only of unreachable code still reports ACPATH = 1.
Elseif desugaring: rather than implementing a separate elseif rule, the code constructs a synthetic nested `If_` node. This is mathematically equivalent and reduces code duplication.
While-loop formula variant: the implementation passes un-multiplied `ft` to the loop body rather than `t(E) * ft` as the paper does, then adjusts the combining formula accordingly. See the `processWhile()` section above for the algebraic equivalence argument.
Foreach as while with leaf condition: a pragmatic choice. The iteration variable binding is not modeled as a branching expression.
Try/catch as alternative paths: each catch block is treated as a parallel branch with the same incoming ft as the try block. This is a reasonable extension for exception-handling semantics not covered by the original paper. Exceptions are modelled as if they always fire at try-entry; the analysis does not track which statements within the try block can throw, nor does it propagate uncaught exceptions out of the function.
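A minimal Python sketch of this try/catch treatment follows. It tracks only the fall-through count ft; how break, continue and return paths interact with a finally block is not described here and is left out. The callable block interface and the name `try_catch_ft` are assumptions made for illustration.

```python
def try_catch_ft(ft, try_block, catch_blocks, finally_block=None):
    """Fall-through paths of try/catch/finally.

    Every block is a callable mapping its incoming ft to its ft_out.
    """
    # The try block and every catch block start from the same incoming ft,
    # as if an exception could only fire at try-entry.
    combined = try_block(ft) + sum(catch(ft) for catch in catch_blocks)
    if finally_block is not None:
        # finally always executes, so it is threaded after the combined count
        combined = finally_block(combined)
    return combined


def two_way(ft):
    """A block containing one if/else with a leaf condition."""
    return 2 * ft


def straight(ft):
    """A block with a single straight-line path."""
    return ft


# try { if (...) {...} else {...} } catch (A) {...} catch (B) {...} finally {...}
print(try_catch_ft(1, two_way, [straight, straight], straight))  # 4
```

Because every catch block starts from the try block's incoming ft, adding a catch clause adds paths regardless of what the try block contains.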
Constant-true loop conditions are not specialised: a `while (true)` or `for (;;)` loop is treated as having a leaf condition (f = 1), so the formulas always count one "skip the loop" path even though the condition can never evaluate to false. This produces a small over-count for infinite loops that exit only via `break`/`return`/`throw`. The behaviour is intentional and locked in by the test suite (e.g., `for (;;) { break; }` returns 2). Eliminating the phantom path would require constant folding on the condition expression and is outside the current scope.
`continue` inside `switch` follows C semantics: the implementation propagates `cp` (continue paths) through `switch` statements, treating a bare `continue` inside a `case` as targeting the enclosing loop. This matches the paper's C-language model. PHP differs: at level 1, `continue` inside a `switch` behaves like `break` (and emits an `E_WARNING` since PHP 7.3). Code that uses bare `continue` directly inside a `switch` will therefore be modelled as continuing an outer loop rather than breaking out of the switch. This is an edge case; in idiomatic PHP, `continue 2` is used instead, which is not handled by the current dispatcher either.