This repository contains a toy semantic merge checker for simple C programs. It looks for semantic merge conflicts that escape syntax-based merge tools and compiler errors, and it can synthesize a semantic merge in some cases.
The implementation uses LLVM bitcode, the custom KLEE fork in
third_party/klee, and Z3. Benchmarks now live in benchmarks/klee, so the
repository no longer depends on a separate ~/merge-benchmark checkout.
benchmarks/klee/: safe, unsafe, and buggy benchmark casesthird_party/klee/: custom KLEE fork as a git submodulethird_party/klee-build/: recommended out-of-tree KLEE build directorychecker/: SMT-based equivalence, conflict, and summary logicmergeChecker.py: CLI for semantic conflict checkingsemanticMerger.py: CLI for merge synthesiseqRunner.py: CLI for semantic equivalence checking
- Python 3.12 and
uv - CMake and Ninja
- LLVM/Clang 13
- Z3 headers and libraries
- zlib, libxml2, and sqlite3 development libraries
The verified KLEE build in this repository uses LLVM 13 explicitly. The build
commands below assume an LLVM 13 installation rooted at ~/.local/llvm-13:
~/.local/llvm-13/bin/clang~/.local/llvm-13/bin/clang++~/.local/llvm-13/lib/cmake/llvm
If your LLVM 13 installation lives elsewhere, set LLVM13_ROOT accordingly or
pass explicit paths on the CMake command line.
- Initialize the submodule:
git submodule update --init --recursive- Build KLEE. The repository defaults to
third_party/klee-build/bin/klee. The following minimal build was verified for this repository and is enough formergeChecker.py,semanticMerger.py, and the smoke tests:
LLVM13_ROOT=${LLVM13_ROOT:-$HOME/.local/llvm-13}
cmake -S third_party/klee -B third_party/klee-build -G Ninja \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_C_COMPILER="$LLVM13_ROOT/bin/clang" \
-DCMAKE_CXX_COMPILER="$LLVM13_ROOT/bin/clang++" \
-DLLVMCC="$LLVM13_ROOT/bin/clang" \
-DLLVMCXX="$LLVM13_ROOT/bin/clang++" \
-DLLVM_DIR="$LLVM13_ROOT/lib/cmake/llvm" \
-DENABLE_SOLVER_Z3=ON \
-DENABLE_SOLVER_STP=OFF \
-DENABLE_SOLVER_METASMT=OFF \
-DENABLE_POSIX_RUNTIME=OFF \
-DENABLE_UNIT_TESTS=OFF \
-DENABLE_SYSTEM_TESTS=OFF \
-DENABLE_TCMALLOC=OFF \
-DENABLE_DOCS=OFF \
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON
cmake --build third_party/klee-build -j"$(nproc)"If you want KLEE's unit tests, system tests, or POSIX runtime, you will also
need additional dependencies such as googletest and klee-uclibc, and you
should reconfigure with the corresponding flags enabled.
- Create the Python environment:
uv sync --devThe Python entry points now prefer repo-local paths and command-line configuration over hardcoded home-directory locations.
- Benchmarks:
benchmarks/klee - KLEE source:
third_party/klee - KLEE build:
third_party/klee-build - Clang: discovered from
--clang,SEMANTIC_MERGER_CLANG,CLANG, or PATH
Supported environment overrides:
SEMANTIC_MERGER_BENCHMARK_ROOTSEMANTIC_MERGER_KLEE_SOURCESEMANTIC_MERGER_KLEE_BUILD_DIRSEMANTIC_MERGER_KLEE_EXESEMANTIC_MERGER_CLANG
Check whether a benchmark merge candidate is safe:
uv run python mergeChecker.py safe/1Generate a merge result for a case:
uv run python semanticMerger.py unsafe/4Check equivalence between two source files:
uv run python eqRunner.py benchmarks/klee/safe/1/A.c benchmarks/klee/safe/1/B.cRun the smoke tests:
uv run pytest
uv run python -m scripts.smoke_test --clang "$LLVM13_ROOT/bin/clang"In a 3-way merge scenario there're 4 versions of programs in the play: the two versions on the HEAD commit on two branches to be merge (referred to as A and B, resp.), the version at the LCA (Lowest Common Acestor) commit of the two branches (the base program, referred to as O), and the merge candidate (referred to as M).
The definition of semantic merge conflict is introduced in the OOPSLA'18 paper Verified three-way program merge.
Formally, semantic conflict-freedom requires the observable outputs of the 4-tuple of program versions
if for all valuations
the following condition hold for all
- If
$\sigma_{O}[(out, i)] \neq \sigma_{A}[(out, i)]$ , then$\sigma_{M}[(out, i)] = \sigma_{B}[(out, i)]$ - If
$\sigma_{O}[(out, i)] \neq \sigma_{B}[(out, i)]$ , then$\sigma_{M}[(out, i)] = \sigma_{A}[(out, i)]$ - Otherwise,
$\sigma_{O}[(out, i)] = \sigma_{A}[(out, i)] = \sigma_{B}[(out, i)] = \sigma_{M}[(out, i)]$
Where
In short, semantic conflict-freedom requires the 4 versions of the programs to reach some sort of agreement of the program semantic, rather than syntax.