Custom Strategy VM

Write arbitrary Prisoner's Dilemma strategies as compact bytecode programs, interpreted on-chain within the match execution pipeline.

Overview

The 9 built-in strategies cover the classic approaches, but each one's decision logic is fixed. A player who wants “cooperate for 5 rounds, then play Tit-for-Tat, but always defect if the opponent has defected more than 60% of the time” cannot express that with the built-ins alone.

The Custom Strategy VM lets players compose arbitrary decision logic as compact bytecode programs (max 64 bytes). Programs are interpreted on-chain during match execution, so every run is deterministic, verifiable, and reproducible. Custom is strategy variant index 9; the built-in strategies keep indices 0–8.

Architecture Flow

Write bytecode (max 64 bytes) → SHA256(hash) → commitment → validate() → stored on-chain → execute_bytecode() per round

Built-in strategies remain as native optimized code paths — zero performance regression for existing players.

The VM lives in the match-logic crate and compiles to both native (on-chain contract) and WASM (frontend replay).

Machine Model

Property	Value
Stack depth	8 elements
Value type	u8 (0–255)
Max program size	64 bytes
Fuel limit	128 instructions per round
Default on error	Cooperate
Jump model	Forward-only (guarantees termination)

Inputs Available Per Round

Input	Source
Opponent’s move history	Slice, grows each round
Own move history	Slice, grows each round
Round number	u8, 0-indexed
Deterministic RNG	SeededRng, unique per player per round

Error Handling

The VM never panics. Every anomalous condition falls back to Cooperate:

Stack underflow — halt, Cooperate
Stack overflow — halt, Cooperate
Out-of-bounds history — returns 0 (Cooperate)
Unknown opcode — immediate halt, Cooperate
Fuel exhaustion — Cooperate
Program falls off end — Cooperate

This “fail-safe to cooperation” penalizes broken programs without crashing the match.
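The fallback rule can be sketched as a bounded stack whose operations return `Option` rather than panicking, with any failure mapped to Cooperate. This is an illustration only: `Move`, `Stack`, and `return_top` are hypothetical names, not the types used in the match-logic crate.

```rust
/// Illustrative move type; the real crate may represent moves differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Move {
    Cooperate,
    Defect,
}

/// Fixed-capacity stack matching the documented depth of 8 `u8` values.
pub struct Stack {
    data: [u8; 8],
    len: usize,
}

impl Stack {
    pub fn new() -> Self {
        Stack { data: [0; 8], len: 0 }
    }

    /// Returns `None` on overflow instead of panicking.
    pub fn push(&mut self, v: u8) -> Option<()> {
        if self.len == self.data.len() {
            return None;
        }
        self.data[self.len] = v;
        self.len += 1;
        Some(())
    }

    /// Returns `None` on underflow instead of panicking.
    pub fn pop(&mut self) -> Option<u8> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        Some(self.data[self.len])
    }
}

/// RETURN semantics with the fail-safe applied:
/// 0 or an empty stack yields Cooperate, any nonzero value yields Defect.
pub fn return_top(stack: &mut Stack) -> Move {
    match stack.pop() {
        Some(0) | None => Move::Cooperate,
        Some(_) => Move::Defect,
    }
}
```

Because every fallible step yields `None`, an interpreter built this way can treat each failure as a single early return of Cooperate.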

Instruction Set (25 opcodes)

Terminals

Hex	Mnemonic	Bytes	Stack	Description
00	COOP	1	→ halt	Return Cooperate immediately
16	DEFECT	1	→ halt	Return Defect immediately
18	RETURN	1	[v] → halt	Pop top; 0 = Cooperate, nonzero = Defect

Literals & Input

Hex	Mnemonic	Bytes	Stack	Description
01	PUSH imm8	2	→ [imm]	Push literal byte
02	OPP_LAST	1	→ [0|1]	Opponent’s last move (0 if round 0)
03	MY_LAST	1	→ [0|1]	My last move (0 if round 0)
04	OPP_N	1	[n] → [0|1]	Opponent’s move n rounds ago
05	MY_N	1	[n] → [0|1]	My move n rounds ago
06	OPP_DEFECTS	1	→ [count]	Total opponent defections (cap 255)
07	MY_DEFECTS	1	→ [count]	Total my defections (cap 255)
08	ROUND	1	→ [n]	Current round number (0-indexed)
09	RAND	1	→ [0..99]	Deterministic random 0–99
17	SCORE_LAST	1	→ [0..5]	My payoff from last round (3 if round 0)

Arithmetic (saturating)

Hex	Mnemonic	Stack	Description
0A	ADD	[a, b] → [a+b]	Capped at 255
0B	SUB	[a, b] → [a−b]	Floored at 0
0C	MUL	[a, b] → [a×b]	Capped at 255
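The saturating behaviour in the table corresponds to Rust's built-in saturating operations on `u8`. The wrapper names below (`vm_add`, `vm_sub`, `vm_mul`) are hypothetical and only mirror the documented semantics.

```rust
/// ADD: a + b, capped at 255.
pub fn vm_add(a: u8, b: u8) -> u8 {
    a.saturating_add(b)
}

/// SUB: a - b, floored at 0.
pub fn vm_sub(a: u8, b: u8) -> u8 {
    a.saturating_sub(b)
}

/// MUL: a * b, capped at 255.
pub fn vm_mul(a: u8, b: u8) -> u8 {
    a.saturating_mul(b)
}
```

One practical consequence: scaling a count by 100 saturates as soon as the count reaches 3 (3 × 100 = 300 > 255), so percentage-style comparisons need smaller multipliers to stay within range.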

Comparison & Logic

Hex	Mnemonic	Stack	Description
0D	GT	[a, b] → [0|1]	1 if a > b
0E	LT	[a, b] → [0|1]	1 if a < b
0F	EQ	[a, b] → [0|1]	1 if a == b
10	NOT	[a] → [0|1]	0 → 1, nonzero → 0
11	AND	[a, b] → [0|1]	Both nonzero → 1
12	OR	[a, b] → [0|1]	Either nonzero → 1

Stack & Control Flow

Hex	Mnemonic	Bytes	Stack	Description
13	DUP	1	[a] → [a, a]	Duplicate top
14	JMP_FWD off	2	—	Jump forward off bytes (unconditional)
15	JMP_FWD_IF off	2	[cond] → —	Pop; if nonzero, jump forward off bytes
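To make the execution model concrete, here is a minimal interpreter sketch covering only eight of the opcodes (COOP, PUSH, OPP_LAST, OPP_DEFECTS, GT, JMP_FWD_IF, DEFECT, RETURN). It is not the crate's `execute_bytecode()`. The function name `execute_subset`, the `Move` type, and the encoding of history as a slice of 0/1 bytes are assumptions, and the jump offset is taken to be counted from the byte after the operand, which is what the example programs on this page suggest.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Move {
    Cooperate,
    Defect,
}

const STACK_DEPTH: usize = 8;
const FUEL: usize = 128;

/// Runs a program using a subset of the instruction set.
/// `opp_history` holds the opponent's moves so far (0 = cooperate, 1 = defect).
pub fn execute_subset(code: &[u8], opp_history: &[u8]) -> Move {
    let mut stack: Vec<u8> = Vec::with_capacity(STACK_DEPTH);
    let mut pc = 0usize;

    for _ in 0..FUEL {
        // Falling off the end of the program yields Cooperate.
        let op = match code.get(pc) {
            Some(&b) => b,
            None => return Move::Cooperate,
        };
        pc += 1;

        // Value to push, if this instruction produces one.
        let pushed: Option<u8> = match op {
            0x00 => return Move::Cooperate, // COOP
            0x16 => return Move::Defect,    // DEFECT
            0x18 => {
                // RETURN: 0 or underflow means Cooperate.
                return match stack.pop() {
                    Some(v) if v != 0 => Move::Defect,
                    _ => Move::Cooperate,
                }
            }
            0x01 => match code.get(pc) {
                // PUSH imm8
                Some(&imm) => {
                    pc += 1;
                    Some(imm)
                }
                None => return Move::Cooperate,
            },
            0x02 => Some(opp_history.last().copied().unwrap_or(0)), // OPP_LAST
            0x06 => Some(opp_history.iter().filter(|&&m| m != 0).count().min(255) as u8),
            0x0D => match (stack.pop(), stack.pop()) {
                // GT: the first pop is b (top of stack), the second is a.
                (Some(b), Some(a)) => Some((a > b) as u8),
                _ => return Move::Cooperate,
            },
            0x15 => {
                // JMP_FWD_IF off
                let off = match code.get(pc) {
                    Some(&o) => o as usize,
                    None => return Move::Cooperate,
                };
                pc += 1;
                match stack.pop() {
                    Some(cond) => {
                        if cond != 0 {
                            pc += off;
                        }
                        None
                    }
                    None => return Move::Cooperate,
                }
            }
            _ => return Move::Cooperate, // unknown or unimplemented opcode
        };

        if let Some(v) = pushed {
            if stack.len() == STACK_DEPTH {
                return Move::Cooperate; // stack overflow
            }
            stack.push(v);
        }
    }
    Move::Cooperate // fuel exhausted
}
```

For example, `execute_subset(&[0x02, 0x18], &[1])` mirrors an opponent's defection, while the same program with an empty history cooperates.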

Example Programs

Classic strategies re-implemented as bytecode. These demonstrate how the VM's small instruction set can express complex decision logic.

TitForTat

2 bytes

Copy opponent's last move. Round 0: opponent history empty → 0 → Cooperate.

02 18       OPP_LAST RETURN

AlwaysDefect

1 byte

Defect unconditionally.

16          DEFECT

GrimTrigger

8 bytes

Cooperate until the opponent defects once, then defect forever.

06          OPP_DEFECTS         ; [count]
01 00       PUSH 0              ; [count, 0]
0D          GT                  ; [count > 0]
15 01       JMP_FWD_IF 1        ; if true, skip to DEFECT
00          COOP
16          DEFECT

Pavlov

10 bytes

Win-stay, lose-switch: repeat last move if payoff ≥ 3, otherwise switch.

17          SCORE_LAST          ; [score]
01 03       PUSH 3              ; [score, 3]
0E          LT                  ; [bad?]  1 if score < 3
03          MY_LAST             ; [bad?, my_d]
0F          EQ                  ; [should_coop]  bad==my_d → cooperate
15 01       JMP_FWD_IF 1        ; if true → COOP
16          DEFECT
00          COOP

TitForTwoTats

9 bytes

Only retaliate after two consecutive opponent defections.

02          OPP_LAST            ; [last]
01 01       PUSH 1              ; [last, 1]
04          OPP_N               ; [last, second_last]
11          AND                 ; [both_defected]
15 01       JMP_FWD_IF 1        ; if true → DEFECT
00          COOP
16          DEFECT

Forgiving Detective

23 bytes

Cooperate in rounds 0–2, then defect in round 3 as a probe. From round 4 on: if the opponent has never defected, exploit them (AlwaysDefect); otherwise play TitForTat. This combination cannot be expressed with the 9 built-in strategies.

08          ROUND               ; [round]
01 03       PUSH 3              ; [round, 3]
0D          GT                  ; [past_opening?]
15 08       JMP_FWD_IF 8        ; if past opening → analysis
08          ROUND               ; [round]
01 03       PUSH 3              ; [round, 3]
0F          EQ                  ; [is_round_3?]
15 01       JMP_FWD_IF 1        ; if round 3 → defect
00          COOP                ; rounds 0-2: cooperate
16          DEFECT              ; round 3: probe defect
; -- analysis (round > 3) --
06          OPP_DEFECTS         ; [opp_d]
01 00       PUSH 0              ; [opp_d, 0]
0F          EQ                  ; [naive?]
15 02       JMP_FWD_IF 2        ; if never defected → exploit
02          OPP_LAST            ; [opp_last]
18          RETURN              ; TFT: mirror opponent
16          DEFECT              ; exploit naive opponent
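When hand-assembling programs like the ones above, jump offsets are easy to get wrong. The helper below is a hypothetical sketch (`jump_targets` is not part of the crate) that walks a program and reports where each jump lands, assuming the offset is counted from the byte after the jump's operand, as the GrimTrigger and TitForTwoTats listings imply.

```rust
/// Returns (offset of the jump instruction, absolute jump target)
/// for every JMP_FWD / JMP_FWD_IF in `code`.
pub fn jump_targets(code: &[u8]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        match code[pc] {
            // Jumps are 2 bytes; the target is measured from the next instruction.
            0x14 | 0x15 if pc + 1 < code.len() => {
                out.push((pc, pc + 2 + code[pc + 1] as usize));
                pc += 2;
            }
            // PUSH imm8 (and a truncated jump) also occupy an operand slot.
            0x01 | 0x14 | 0x15 => pc += 2,
            // Every other opcode is a single byte.
            _ => pc += 1,
        }
    }
    out
}
```

For GrimTrigger this reports a single jump at byte 4 landing on byte 7, which is the final DEFECT. Comparing each reported target with the byte you intended to reach is a quick check before validating a program.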

Try it in the Strategy Lab

Write assembly, get instant WASM validation, and preview your custom strategy against all 9 built-ins — right in the browser.

Commit-Reveal for Custom Strategies

Custom strategies use a two-level hashing scheme to keep the commitment preimage fixed-length while allowing variable-length bytecode.

Strategy Type	Commitment Hash
Built-in	SHA256(strategy_u8 || salt[16])
Custom	SHA256(9u8 || SHA256(bytecode[0..len]) || salt[16])

The inner SHA256(bytecode) hash produces a fixed 32-byte digest regardless of program length, keeping the outer preimage at a fixed 49 bytes (1 + 32 + 16). The bytecode hash can also be displayed independently as a program fingerprint.
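The byte layout of the two preimages can be sketched as follows. The functions only assemble the bytes to be hashed; computing the SHA-256 digests is left to whichever hashing library you use, and the names `custom_preimage` and `builtin_preimage` are hypothetical.

```rust
/// Outer preimage for a custom strategy:
/// variant index 9 (1 byte) || SHA256(bytecode) (32 bytes) || salt (16 bytes).
pub fn custom_preimage(bytecode_digest: &[u8; 32], salt: &[u8; 16]) -> [u8; 49] {
    let mut out = [0u8; 49];
    out[0] = 9;
    out[1..33].copy_from_slice(bytecode_digest);
    out[33..49].copy_from_slice(salt);
    out
}

/// Preimage for a built-in strategy:
/// strategy index (1 byte) || salt (16 bytes).
pub fn builtin_preimage(strategy: u8, salt: &[u8; 16]) -> [u8; 17] {
    let mut out = [0u8; 17];
    out[0] = strategy;
    out[1..].copy_from_slice(salt);
    out
}
```

The commitment is then the SHA-256 digest of the returned array. Because the inner digest is always 32 bytes, a 1-byte program and a 64-byte program produce preimages of the same length.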

Forfeit handling: The forfeit mechanism uses on-chain SlotHashes sysvar data to deterministically assign a built-in strategy (index 0–8) — forfeited players never receive Custom.

Bytecode Validation

Six checks are performed on-chain during the reveal phase to reject malformed programs before they enter the match pipeline:

1. Non-empty: program length must be > 0

2. Length limit: program length must be ≤ 64 bytes

3. Valid opcodes: every instruction's opcode byte must be a known opcode (0x00–0x18)

4. Complete immediates: PUSH, JMP_FWD, and JMP_FWD_IF must have their operand byte present

5. Jump bounds: pc + offset ≤ bytecode.len() for all jumps

6. Has terminal: at least one COOP, DEFECT, or RETURN instruction must exist

Stack depth is not validated statically; underflow and overflow are handled at runtime by falling back to Cooperate (see Error Handling under Machine Model).
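The six checks can be sketched in a single pass over the program. This is an approximation for local experimentation, not the on-chain implementation: `validate_sketch` is a hypothetical name, the local `BytecodeError` enum only mirrors the variant names listed under Key Functions, and the jump check assumes `pc` refers to the byte after the jump's operand.

```rust
/// Local stand-in for the crate's error type.
#[derive(Debug, PartialEq, Eq)]
pub enum BytecodeError {
    Empty,
    TooLong,
    UnknownOpcode,
    TruncatedImmediate,
    JumpOutOfBounds,
    NoTerminal,
}

pub fn validate_sketch(code: &[u8]) -> Result<(), BytecodeError> {
    // Checks 1 and 2: length bounds.
    if code.is_empty() {
        return Err(BytecodeError::Empty);
    }
    if code.len() > 64 {
        return Err(BytecodeError::TooLong);
    }

    let mut pc = 0;
    let mut has_terminal = false;
    while pc < code.len() {
        let op = code[pc];
        // Check 3: opcode range.
        if op > 0x18 {
            return Err(BytecodeError::UnknownOpcode);
        }
        // PUSH, JMP_FWD, and JMP_FWD_IF carry one operand byte.
        let has_imm = matches!(op, 0x01 | 0x14 | 0x15);
        if has_imm {
            // Check 4: the operand must be present.
            if pc + 1 >= code.len() {
                return Err(BytecodeError::TruncatedImmediate);
            }
            // Check 5: a jump may land at most one past the last byte.
            if op != 0x01 && pc + 2 + code[pc + 1] as usize > code.len() {
                return Err(BytecodeError::JumpOutOfBounds);
            }
        }
        if matches!(op, 0x00 | 0x16 | 0x18) {
            has_terminal = true;
        }
        // Skip the operand so it is never misread as an opcode.
        pc += if has_imm { 2 } else { 1 };
    }

    // Check 6: at least one terminal instruction.
    if has_terminal {
        Ok(())
    } else {
        Err(BytecodeError::NoTerminal)
    }
}
```

Note that the operand of `PUSH 0` is skipped rather than counted as a COOP terminal, so `[0x01, 0x00]` fails with `NoTerminal` in this sketch.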

Testing Locally

The match-logic crate provides everything you need to validate and test custom bytecode programs locally before submitting them on-chain.

Key Functions

Function	Description
validate_bytecode(bytecode: &[u8])	Runs all 6 validation checks. Returns Ok(()) or a BytecodeError variant (Empty, TooLong, UnknownOpcode, TruncatedImmediate, JumpOutOfBounds, NoTerminal).
run_match(strategy_a, strategy_b, seed, match_index, participant_count)	Executes a full match between two PlayerStrategy values. Returns a MatchResult with round-by-round history and total scores.
replay_match(...) (WASM)	Browser-compatible binding. Accepts JSON-serialized strategies, returns JSON MatchResult. Use for frontend testing.

Example: Validate & Run a Custom Strategy

use match_logic::{validate_bytecode, run_match, PlayerStrategy, Strategy, StrategyBase};

fn main() {
    // TitForTat as bytecode: OPP_LAST RETURN
    let bytecode = vec![0x02, 0x18];

    // Validate before submitting on-chain
    validate_bytecode(&bytecode).expect("invalid program");

    // Test against AlwaysDefect
    let custom = PlayerStrategy::Custom(bytecode);
    let defector = PlayerStrategy::Builtin(Strategy::new(StrategyBase::AlwaysDefect));

    let seed = [0u8; 32];
    let result = run_match(&custom, &defector, &seed, 0, 8);

    println!("Custom: {} | Defector: {}",
        result.total_score_a, result.total_score_b);
    println!("Rounds played: {}", result.round_count);
    for r in &result.rounds {
        println!("  R{}: {:?} vs {:?} → {}-{}",
            r.round, r.move_a, r.move_b, r.score_a, r.score_b);
    }
}

Add match-logic as a dependency in your Cargo.toml to test locally with cargo run. The same code that runs on-chain will execute on your machine — results are deterministic given the same seed.

For browser-based testing, the WASM replay_match() binding accepts JSON strategies like {"Custom": [2, 24]} and returns a full JSON match result.
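The JSON form lists bytes in decimal, so 0x18 appears as 24. A small helper such as the one below can produce that shape from a byte slice; `custom_json` is a hypothetical name, and the output simply matches the `{"Custom": [2, 24]}` example above rather than a confirmed schema for `replay_match()`.

```rust
/// Formats bytecode as a JSON object of the form {"Custom": [b0, b1, ...]},
/// with each byte written in decimal.
pub fn custom_json(bytecode: &[u8]) -> String {
    let items: Vec<String> = bytecode.iter().map(|b| b.to_string()).collect();
    format!("{{\"Custom\": [{}]}}", items.join(", "))
}
```

Passing the TitForTat program `[0x02, 0x18]` yields the string shown in the example above.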