Drop-In-JTAG
A drop-in IEEE 1149.1 JTAG test infrastructure for RTL designs. The project
provides a complete TAP controller, instruction register, boundary scan
register (BSR) chain, and debug clock-gating FSM that can be wrapped around
any synchronous DUT — with no changes required to the DUT itself.
========================================
What It Does
Drop-In-JTAG sits between your DUT and the outside world. It intercepts the
DUT's input and output pins through a chain of boundary scan cells, and
exposes a standard JTAG interface (TCK, TMS, TDI, TDO, and the optional TRST) to a
test controller. A debug FSM running on the system clock lets you halt,
single-step, and resume the DUT without modifying its RTL.
```
             JTAG pins
     TCK  TMS  TDI  TDO  TRST
                 │
       ┌─────────▼──────────┐
       │   TAP Controller   │
       │  Instruction Reg   │
       │  JTAG Test Logic   │
       │  Debug Clock FSM   │
       └─────────┬──────────┘
                 │ dbgclk (gated sys_clk)
       ┌─────────▼──────────┐
       │     BSR Chain      │ ← scan inputs & outputs
       └──────┬──────┬──────┘
              │      │
       ┌──────▼──────▼──────┐
       │        DUT         │
       └────────────────────┘
```
========================================
Repository Structure
```
Drop-In-JTAG/
├── JTAG-HDL/               # Core JTAG infrastructure (shared by all targets)
│   ├── jtag_test_logic.sv  # Top-level JTAG module
│   ├── tap_controller.sv   # IEEE 1149.1 TAP FSM
│   ├── instruction_register.sv
│   ├── bsr.sv              # Boundary scan register (parameterised width)
│   ├── bsr_cell.sv         # Single BSR cell (IEEE 1149.1 §8.5.1)
│   ├── bypass_register.sv
│   ├── device_identification_register.sv
│   ├── synchronizer.sv     # 2-FF CDC synchronizer
│   ├── top.sv              # RISC-V DUT top-level wrapper
│   ├── defines.sv          # Instruction opcodes and constants
│   └── ...
│
├── FPGA-ArtyA7/            # Arty A7-100T FPGA implementation
│   ├── Makefile
│   ├── swap_top.sh         # Switch between sim and FPGA top-level
│   ├── top_fpga.sv         # FPGA top (MMCM clock generation)
│   ├── top_orig.sv         # Simulation top (plain clock)
│   ├── clk_gen.sv          # MMCM wrapper for Arty A7-100T
│   └── build_arty_project.tcl
│
├── accum/                  # 8-bit accumulator demo (no RISC-V required)
│   ├── top_accum.sv        # Accumulator DUT + BSR chain
│   ├── tb_top_accum.sv     # QuestaSim testbench
│   └── top.do              # Simulation run script
│
└── testbenches/
    └── tb_top.sv           # RISC-V testbench
```
========================================
JTAG Instructions
The instruction register is 4 bits wide. The following instructions are
supported:
| Instruction    | Opcode  | Description                              |
|----------------|---------|------------------------------------------|
| IDCODE         | 4'b0001 | Shift out 32-bit device ID               |
| SAMPLE_PRELOAD | 4'b0010 | Capture live pin values into BSR         |
| EXTEST         | 4'b0011 | Drive BSR outputs onto DUT inputs        |
| INTEST         | 4'b0100 | Internal logic test                      |
| CLAMP          | 4'b0101 | Hold BSR outputs; select the bypass DR   |
| HALT           | 4'b0110 | Gate dbgclk, freezing DUT execution      |
| STEP           | 4'b0111 | Single-step DUT one dbgclk cycle         |
| RESUME         | 4'b1000 | Resume dbgclk, unfreezing DUT            |
| RESET          | 4'b1001 | Assert DUT reset via JTAG                |
| BYPASS         | 4'b1111 | Single-bit bypass DR                     |
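The sketch below illustrates how the table could be used as a decode step. It is written in Python, and `Instr`, `decode` and `bsr_enable` are hypothetical names, not code from defines.sv. It maps each opcode to its instruction and derives the BSR clock enable from SAMPLE_PRELOAD, EXTEST, INTEST and CLAMP, mirroring the `bsr_enable` assignment in the How bsr_clk Works section.

Treating unassigned opcodes as BYPASS is an assumption. It follows IEEE 1149.1's rule for unused instruction codes, but the RTL may decode them differently.

```python
from enum import IntEnum


class Instr(IntEnum):
    """4-bit opcodes from the instruction table."""
    IDCODE = 0b0001
    SAMPLE_PRELOAD = 0b0010
    EXTEST = 0b0011
    INTEST = 0b0100
    CLAMP = 0b0101
    HALT = 0b0110
    STEP = 0b0111
    RESUME = 0b1000
    RESET = 0b1001
    BYPASS = 0b1111


# Instructions that open the BSR clock gate (mirrors the bsr_enable assignment).
BSR_INSTRUCTIONS = {Instr.SAMPLE_PRELOAD, Instr.EXTEST, Instr.INTEST, Instr.CLAMP}


def decode(ir: int) -> Instr:
    """Decode a 4-bit IR value; unassigned codes fall back to BYPASS (assumption)."""
    if not 0 <= ir <= 0xF:
        raise ValueError(f"IR value {ir} does not fit in 4 bits")
    try:
        return Instr(ir)
    except ValueError:
        return Instr.BYPASS


def bsr_enable(ir: int) -> bool:
    """True when the decoded instruction is one of the four BSR instructions."""
    return decode(ir) in BSR_INSTRUCTIONS


if __name__ == "__main__":
    for code in range(16):
        print(f"4'b{code:04b}  {decode(code).name:<15} bsr_enable={int(bsr_enable(code))}")
```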
IR Scan Vector Format
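Each IR scan is driven as a pair of 12-bit vectors, one on TMS and one on TDI.
Both are applied MSB-first (bit 11 first) on each falling edge of TCK, so the TAP
samples them on the following rising edge. The scan starts in Test-Logic-Reset,
and the TMS vector is 12'b011000001101 for every instruction. The opcode occupies
TDI bits [6:3], shifted LSB-first: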
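| Bit | TMS | TDI       | TAP transition                                   |
|-----|-----|-----------|--------------------------------------------------|
| 11  | 0   | x         | Test-Logic-Reset → Run-Test/Idle                 |
| 10  | 1   | x         | Run-Test/Idle → Select-DR-Scan                   |
| 9   | 1   | x         | Select-DR-Scan → Select-IR-Scan                  |
| 8   | 0   | x         | Select-IR-Scan → Capture-IR                      |
| 7   | 0   | x         | Capture-IR → Shift-IR (IR captures; TDI ignored) |
| 6   | 0   | opcode[0] | Shift-IR → Shift-IR                              |
| 5   | 0   | opcode[1] | Shift-IR → Shift-IR                              |
| 4   | 0   | opcode[2] | Shift-IR → Shift-IR                              |
| 3   | 1   | opcode[3] | Shift-IR → Exit1-IR                              |
| 2   | 1   | x         | Exit1-IR → Update-IR                             |
| 1   | 0   | x         | Update-IR → Run-Test/Idle                        |
| 0   | 1   | x         | Run-Test/Idle → Select-DR-Scan (TAP rests here)  |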
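The TMS column can be checked against the standard IEEE 1149.1 TAP state diagram. This Python sketch models the TAP as a next-state table and replays a 12-bit TMS/TDI pair, bit 11 first, starting in Test-Logic-Reset. It reports the state the TAP ends in and the value latched into a 4-bit IR.

The following are assumptions for illustration, not taken from tap_controller.sv:
- the state abbreviations;
- the `run_ir_scan` helper;
- the modelling of the Update-IR latch as happening on entry to that state.

```python
# IEEE 1149.1 TAP next-state table: state -> (next if TMS=0, next if TMS=1).
TAP_NEXT = {
    "TLR": ("RTI", "TLR"),
    "RTI": ("RTI", "SelDR"),
    "SelDR": ("CapDR", "SelIR"),
    "CapDR": ("ShDR", "Ex1DR"),
    "ShDR": ("ShDR", "Ex1DR"),
    "Ex1DR": ("PauseDR", "UpdDR"),
    "PauseDR": ("PauseDR", "Ex2DR"),
    "Ex2DR": ("ShDR", "UpdDR"),
    "UpdDR": ("RTI", "SelDR"),
    "SelIR": ("CapIR", "TLR"),
    "CapIR": ("ShIR", "Ex1IR"),
    "ShIR": ("ShIR", "Ex1IR"),
    "Ex1IR": ("PauseIR", "UpdIR"),
    "PauseIR": ("PauseIR", "Ex2IR"),
    "Ex2IR": ("ShIR", "UpdIR"),
    "UpdIR": ("RTI", "SelDR"),
}

IR_SCAN_TMS = 0b011000001101  # TMS column of the IR scan table, bit 11 first


def run_ir_scan(tms_vec, tdi_vec, width=12, ir_bits=4):
    """Replay one scan from Test-Logic-Reset; return (final state, IR value, state trace)."""
    state, shift_reg, ir = "TLR", 0, None
    trace = []
    for bit in range(width - 1, -1, -1):  # bit 11 is driven first
        tms = (tms_vec >> bit) & 1
        tdi = (tdi_vec >> bit) & 1
        # Work done by the current state on this rising TCK edge.
        if state == "CapIR":
            shift_reg = 0b01  # IEEE 1149.1 requires the IR to capture ...01
        elif state == "ShIR":
            # TDI enters at the MSB end, so the first bit shifted ends up in bit 0.
            shift_reg = (shift_reg >> 1) | (tdi << (ir_bits - 1))
        state = TAP_NEXT[state][tms]
        if state == "UpdIR":
            ir = shift_reg  # real hardware latches this on the next falling edge
        trace.append(state)
    return state, ir, trace


if __name__ == "__main__":
    final_state, ir, trace = run_ir_scan(IR_SCAN_TMS, 0b000000110000)  # HALT
    print(final_state, f"{ir:04b}")  # SelDR 0110
    print(" -> ".join(trace))
```

Because bit 7 falls in Capture-IR, a 1 placed there is never shifted into the IR. Only TDI bits [6:3] reach the opcode.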
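Quick-reference TDI vectors for common instructions, each derived from its
opcode using the bits [6:3], LSB-first rule:

| Instruction    | Opcode  | TDI vector       |
|----------------|---------|------------------|
| HALT           | 4'b0110 | 12'b000000110000 |
| SAMPLE_PRELOAD | 4'b0010 | 12'b000000100000 |
| EXTEST         | 4'b0011 | 12'b000001100000 |
| RESUME         | 4'b1000 | 12'b000000001000 |
| RESET          | 4'b1001 | 12'b000001001000 |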
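The TDI vectors follow mechanically from the scan format: bit i of the opcode goes to TDI bit 6 - i. Generating them avoids hand-encoding mistakes when instructions are added.

This small Python helper regenerates the vectors as SystemVerilog literals for every opcode in the instruction table. `ir_scan_tdi` and `sv_literal` are hypothetical names used only for this sketch.

```python
# Opcodes from the instruction table (4-bit IR).
OPCODES = {
    "IDCODE": 0b0001,
    "SAMPLE_PRELOAD": 0b0010,
    "EXTEST": 0b0011,
    "INTEST": 0b0100,
    "CLAMP": 0b0101,
    "HALT": 0b0110,
    "STEP": 0b0111,
    "RESUME": 0b1000,
    "RESET": 0b1001,
    "BYPASS": 0b1111,
}


def ir_scan_tdi(opcode: int) -> int:
    """Place a 4-bit opcode in TDI bits [6:3], LSB-first (opcode bit 0 -> TDI bit 6)."""
    if not 0 <= opcode <= 0xF:
        raise ValueError(f"opcode {opcode} does not fit in the 4-bit IR")
    tdi = 0
    for i in range(4):
        tdi |= ((opcode >> i) & 1) << (6 - i)
    return tdi


def sv_literal(value: int, width: int = 12) -> str:
    """Format an integer as a sized SystemVerilog binary literal, e.g. 12'b000000110000."""
    return f"{width}'b{value:0{width}b}"


if __name__ == "__main__":
    for name, opcode in OPCODES.items():
        print(f"{name:<15} {sv_literal(ir_scan_tdi(opcode))}")
```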
========================================
Clock Domain Crossing
The DUT runs on dbgclk (a gated version of sys_clk). All JTAG control
signals originate in the TCK domain and cross into sys_clk through
two-flip-flop synchronizers before driving the debug FSM:
```systemverilog
// Clock-domain crossing: TCK -> sys_clk
// TCK is asynchronous to sys_clk. Without synchronization, any signal
// crossing this boundary is susceptible to metastability. Each synchronizer
// adds two sys_clk cycles of latency, which is acceptable here because JTAG
// control signals change only on UpdateIR/UpdateDR boundaries, orders of
// magnitude less often than sys_clk toggles.
synchronizer logicrst (.clk(sys_clk), .d(logic_reset), .q(dm_reset));
synchronizer dbgrst (.clk(sys_clk), .d(~trst | ~reset), .q(dbg_rst));
synchronizer dbghalt (.clk(sys_clk), .d(halt), .q(dbg_halt));
synchronizer dbgstep (.clk(sys_clk), .d(step && updateIR), .q(dbg_step));
synchronizer dbgresume(.clk(sys_clk), .d(resume), .q(dbg_resume));
```
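The latency and pulse-stretching behaviour of these synchronizers can be reasoned about with a cycle-level model. This Python sketch assumes synchronizer.sv is two back-to-back flip-flops clocked by sys_clk, as the comment states. It does not model metastability itself, and `TwoFlopSync` and `synchronize_pulse` are hypothetical helpers.

The model shows two things:
- A level change on d appears on q at the second sys_clk edge.
- A one-TCK-period pulse such as `step && updateIR` stays high on the sys_clk side for roughly (TCK period / sys_clk period) cycles.

```python
class TwoFlopSync:
    """Cycle-level model of a two-flip-flop synchronizer (metastability not modelled)."""

    def __init__(self, reset_value: int = 0):
        self.ff1 = reset_value
        self.ff2 = reset_value

    def clock(self, d: int) -> int:
        # Both flops update on the same sys_clk rising edge: ff2 takes ff1's old value.
        self.ff1, self.ff2 = d, self.ff1
        return self.ff2


def synchronize_pulse(pulse_start_ns: int, pulse_len_ns: int,
                      clk_period_ns: int, n_cycles: int) -> list:
    """Sample a TCK-domain pulse at each sys_clk edge and return the synchronized q."""
    sync = TwoFlopSync()
    q = []
    for k in range(1, n_cycles + 1):
        t = k * clk_period_ns  # time of the k-th sys_clk rising edge
        d = int(pulse_start_ns <= t < pulse_start_ns + pulse_len_ns)
        q.append(sync.clock(d))
    return q


if __name__ == "__main__":
    s = TwoFlopSync()
    print([s.clock(1) for _ in range(3)])  # [0, 1, 1]: q rises on the second edge
    # A 100 ns pulse (one 10 MHz TCK period) sampled by a 100 MHz sys_clk
    print(synchronize_pulse(25, 100, 10, 15))
```

In that example, the synchronized pulse stays high for 10 consecutive sys_clk cycles. Logic that should act once per STEP would therefore need to detect the rising edge of dbg_step rather than its level. Whether the debug FSM does this is not covered here.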
========================================
How bsr_clk Works
```systemverilog
assign bsr_clk = (tck & clk_dr) | ~bsr_enable;
assign bsr_enable = sample_preload | extest | intest | clamp;
```
When no BSR instruction is active, ~bsr_enable holds bsr_clk permanently
HIGH. Since bsr_cell clocks on posedge bsr_clk, a stuck-high signal
produces no rising edges and the shift registers never move.
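The gating can be checked with a tiny waveform model. This Python sketch evaluates the `bsr_clk` assignment sample by sample and counts rising edges.

`bsr_clk`, `count_posedges` and the idealised waveforms are illustrative. clk_dr is simply held high, because the exact window in which jtag_test_logic asserts it is not described here.

```python
def bsr_clk(tck: int, clk_dr: int, bsr_enable: int) -> int:
    """1-bit model of: assign bsr_clk = (tck & clk_dr) | ~bsr_enable;"""
    return (tck & clk_dr) | (1 - bsr_enable)  # 1 - x is the 1-bit inverse of x


def count_posedges(wave: list) -> int:
    """Count 0 -> 1 transitions between consecutive samples."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


tck_wave = [0, 1] * 4  # four TCK periods, two samples per period
clk_dr = 1             # assumed asserted for the whole window

idle = [bsr_clk(t, clk_dr, bsr_enable=0) for t in tck_wave]
armed = [bsr_clk(t, clk_dr, bsr_enable=1) for t in tck_wave]

if __name__ == "__main__":
    print(idle, count_posedges(idle))    # stuck at 1, 0 rising edges
    print(armed, count_posedges(armed))  # follows TCK, 4 rising edges
```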
A BSR instruction (SAMPLE_PRELOAD or EXTEST) must be loaded before the TAP
enters Capture-DR. Otherwise bsr_clk stays HIGH and the capture edge never
reaches the cells. For a frozen snapshot, load HALT first and then
SAMPLE_PRELOAD:
- HALT IR scan — gates dbgclk, freezes DUT
- TLR flush (tms=1×5) — clean TAP state
- SAMPLE_PRELOAD IR — arms bsr_enable
- Capture-DR — BSR latches frozen pin values
- Shift-DR — clock out chain serially
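The five steps above translate into one continuous TMS stream. This Python sketch strings together four pieces:
- the HALT IR scan;
- the five-cycle TLR flush;
- the SAMPLE_PRELOAD IR scan;
- a DR scan of a chain of `chain_len` cells.

It then walks the stream through the standard TAP state machine to confirm that exactly `chain_len` rising edges occur in Shift-DR. It assumes the TAP starts in Test-Logic-Reset and that the DR scan finishes in Run-Test/Idle. The helper names are hypothetical, and TDO sampling is left out.

```python
# IEEE 1149.1 TAP next-state table: state -> (next if TMS=0, next if TMS=1).
TAP_NEXT = {
    "TLR": ("RTI", "TLR"), "RTI": ("RTI", "SelDR"),
    "SelDR": ("CapDR", "SelIR"), "CapDR": ("ShDR", "Ex1DR"),
    "ShDR": ("ShDR", "Ex1DR"), "Ex1DR": ("PauseDR", "UpdDR"),
    "PauseDR": ("PauseDR", "Ex2DR"), "Ex2DR": ("ShDR", "UpdDR"),
    "UpdDR": ("RTI", "SelDR"), "SelIR": ("CapIR", "TLR"),
    "CapIR": ("ShIR", "Ex1IR"), "ShIR": ("ShIR", "Ex1IR"),
    "Ex1IR": ("PauseIR", "UpdIR"), "PauseIR": ("PauseIR", "Ex2IR"),
    "Ex2IR": ("ShIR", "UpdIR"), "UpdIR": ("RTI", "SelDR"),
}

IR_SCAN_TMS = [0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1]  # bit 11 first; ends in Select-DR-Scan


def ir_scan(opcode):
    """TMS/TDI lists (drive order) for one IR scan starting in Test-Logic-Reset."""
    tdi = [0] * 12
    for i in range(4):
        tdi[5 + i] = (opcode >> i) & 1  # list index 5 + i is vector bit 6 - i
    return list(IR_SCAN_TMS), tdi


def tlr_flush():
    """Five TMS=1 cycles reach Test-Logic-Reset from any state."""
    return [1] * 5, [0] * 5


def dr_scan(n_bits):
    """From Select-DR-Scan: capture, shift n_bits, then Exit1 -> Update -> Run-Test/Idle."""
    if n_bits < 1:
        raise ValueError("the chain needs at least one cell")
    tms = [0, 0] + [0] * (n_bits - 1) + [1, 1, 0]
    return tms, [0] * len(tms)


def snapshot_sequence(chain_len, halt=0b0110, sample_preload=0b0010):
    """Concatenate HALT IR scan, TLR flush, SAMPLE_PRELOAD IR scan and one DR scan."""
    tms, tdi = [], []
    for part_tms, part_tdi in (ir_scan(halt), tlr_flush(),
                               ir_scan(sample_preload), dr_scan(chain_len)):
        tms += part_tms
        tdi += part_tdi
    return tms, tdi


def walk(tms_bits, state="TLR"):
    """Return the final TAP state and how many rising edges occurred in Shift-DR."""
    shift_dr_edges = 0
    for tms in tms_bits:
        if state == "ShDR":
            shift_dr_edges += 1
        state = TAP_NEXT[state][tms]
    return state, shift_dr_edges


if __name__ == "__main__":
    tms, tdi = snapshot_sequence(chain_len=16)
    print(len(tms), walk(tms))  # 49 ('RTI', 16)
```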
========================================
FPGA Implementation — Arty A7-100T
See FPGA-ArtyA7/README.md for full build
instructions. The key difference from simulation is that the FPGA top-level
uses an MMCM to generate sys_clk from the board's 100 MHz oscillator. The
swap_top.sh utility in FPGA-ArtyA7/ manages switching between the
simulation and FPGA top-level modules:
```bash
./swap_top.sh --fpga    # install FPGA top (top_fpga.sv + clk_gen.sv)
./swap_top.sh --orig    # restore simulation top
./swap_top.sh --status  # show active version
```
Always run ./swap_top.sh --fpga before running make, and ./swap_top.sh --orig
before returning to simulation.
========================================
Accumulator Demo
The accum/ directory contains a self-contained example that replaces the
RISC-V core with a simple 8-bit accumulator/ALU. It requires no memory
initialization file and is a good starting point for understanding how to
wrap a new DUT with the BSR chain.
See accum/README.md for a full walkthrough, including the
BSR chain layout, correct JTAG scan sequence, and expected simulation output.
```bash
cd accum
vsim -do top.do -c   # batch
vsim -do top.do      # GUI
```
========================================
Wrapping a New DUT
To connect a new DUT to Drop-In-JTAG:
- Instantiate jtag_test_logic and connect sys_clk, dbg_clk,
  bsr_tdi/tdo/clk/update/shift/mode, and the JTAG pins.
- Clock your DUT from dbg_clk so the debug FSM can halt and step it.
- Add a bsr instance for each DUT port you want to observe or control.
  Chain them: bsr_chain0 = bsr_tdi, each BSR's tdo feeds the next BSR's
  tdi, and the last BSR's tdo drives bsr_tdo.
- Wire input BSRs with parallel_in = DUT input signal and
  parallel_out feeding the DUT (used in EXTEST to force values).
- Wire output BSRs with parallel_in = DUT output signal.
  parallel_out is observe-only and need not be fed back.
- Update bsr_tdo to point to the last BSR in the chain.
The total scan chain length is the sum of all BSR widths. Update your
testbench tdovector width and decode slices accordingly.
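Keeping the testbench decode slices in step with the chain is bookkeeping that is easy to script. This Python sketch takes the BSRs in chain order (bsr_tdi to bsr_tdo). It returns the total length plus an inclusive (lo, hi) index range per BSR in the captured TDO stream.

The sketch assumes two things:
- Each BSR shifts toward its tdo end.
- The testbench stores the first bit shifted out at index 0, so the BSR nearest bsr_tdo occupies the lowest indices.

The segment names in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BsrSegment:
    name: str   # hypothetical label for one bsr instance
    width: int  # number of cells in that instance


def chain_layout(segments):
    """Return (total_length, {name: (lo, hi)}) for BSRs listed from bsr_tdi to bsr_tdo."""
    if any(s.width <= 0 for s in segments):
        raise ValueError("every BSR needs a positive width")
    total = sum(s.width for s in segments)
    slices = {}
    pos = 0
    for seg in reversed(segments):  # the BSR driving bsr_tdo is shifted out first
        slices[seg.name] = (pos, pos + seg.width - 1)
        pos += seg.width
    return total, slices


if __name__ == "__main__":
    chain = [BsrSegment("a_in", 8), BsrSegment("b_in", 8), BsrSegment("result_out", 8)]
    total, slices = chain_layout(chain)
    print(total)  # 24
    for name, (lo, hi) in slices.items():
        print(f"{name}: tdovector[{hi}:{lo}]")
```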
========================================
License
Licensed under the Solderpad Hardware License v2.1.
See LICENSE for details.
Copyright (C) 2021-25 Harvey Mudd College & Oklahoma State University