MEV Sandwich Detection at Scale: Implementation Guide + Pitfalls
MEV sandwich detection at scale is not a single heuristic, and it is not a dashboard that flags a block as “bad”. It is an end-to-end engineering problem: ingesting blocks and traces reliably, reconstructing swap intent, labeling attacker-victim-attacker patterns, measuring confidence, and doing all of it fast enough to keep up with chain throughput without drowning in false positives. This guide lays out a practical blueprint for detection systems that operate at production scale, including the failure modes that quietly break accuracy.
TL;DR
- A sandwich is a specific three-part structure: attacker front-run, victim swap, attacker back-run, typically in the same block and around the same pool path.
- At scale, the hard part is not spotting three swaps. The hard part is reconstructing pool state, routing, and intent with incomplete data (private tx, missing traces, MEV relays).
- Start with a deterministic baseline detector (pool-centric, trace-aware), then add a probabilistic scoring layer to handle edge cases.
- False positives usually come from arbitrage, liquidations, multi-route aggregators, rebase style fee-on-transfer tokens, and reused routers.
- Production pipeline needs: block ingestion, log decoding, optional traces, pool state snapshots, candidate generation, labeling, scoring, storage, and monitoring.
- For a strong anomaly mindset and evaluation harness patterns, treat Building a market anomaly detector as prerequisite reading.
- For AI assist features and prompt templates that make the workflow repeatable, use AI Learning Hub and Prompt Libraries.
- If you want updates and new playbooks, you can Subscribe.
Sandwich detection behaves like any serious anomaly system: you need clean ingestion, stable feature definitions, evaluation sets, and drift monitoring. If you have not built a detection loop before, start with Building a market anomaly detector. It will help you think in terms of ground truth, thresholds, and failure analysis, which is exactly what scale MEV detection requires.
What a sandwich is, in precise on-chain terms
A sandwich attack is a coordinated sequence of swaps that extracts value from a victim trade by manipulating price around it. The classic shape is attacker buy (front-run) then victim buy then attacker sell (back-run). The attacker profits because the victim’s swap moves price further in the direction the attacker already pushed, and the attacker unwinds after the victim at a better price.
In practice, the details vary. Sometimes the victim is a sell and the attacker sells first to push price down, then buys back after. Sometimes the trade route touches multiple pools, and the attacker mirrors only the sensitive pool or one hop in the route. Sometimes the attacker uses multiple addresses or a bundle, and the victim is included by a builder relay rather than mempool ordering.
If you are building detection, do not define a sandwich as “three swaps in one block”. That definition breaks immediately when you encounter: multi-hop routes, aggregator routers that emit many swap logs, internal calls, partial fills, and routed trades that touch multiple pools. A useful definition is pool-centric and state-centric: if the attacker’s first action shifts pool price (or effective price) and the victim executes at a worse price because of it, then the attacker unwinds to realize profit.
Why sandwich detection matters for builders and traders
Sandwich attacks are not only an academic MEV topic. They are a user harm problem. If you build wallets, swaps, bots, risk dashboards, or analytics, you need to quantify how often users are being sandwiched, which pools are most exploited, and which routing patterns create predictable victims.
Detection matters for three concrete reasons:
- User protection: you can warn users before they trade into a vulnerable path, or route through safer venues.
- Market integrity analytics: you can separate organic volume from volume that is being “taxed” by MEV.
- Protocol hardening: you can identify which pool designs, fee tiers, or router behaviors are consistently exploited.
There is also a hidden reason: without detection, you cannot measure whether mitigations help. MEV protection is often discussed in marketing terms. A detection system lets you test outcomes, pool by pool, time window by time window, and see whether user harm actually drops.
What “at scale” really means for sandwich detection
In small prototypes, you can scan a few blocks, decode a couple of swap events, and label patterns manually. At scale, you face constraints: throughput, chain differences, missing data, probabilistic routing, and adversaries who adapt.
The mental shift is this: you are building a streaming system and a labeling system. Your objective is not “find sandwiches”. Your objective is “produce stable, explainable labels with known error bars”. That is why the anomaly-detector prerequisite reading matters: detection systems are judged by their mistakes, not their demos.
The minimum data model you need
A robust detector can be built with only receipts and logs for many AMMs, but it becomes much better if you also have execution traces. At scale, you should design your schema so you can operate in “logs only” mode, then upgrade to “trace aware” mode when available.
| Entity | Fields you must store | Why it matters | Common pitfall |
|---|---|---|---|
| Block | number, hash, timestamp, baseFee, proposer or builder metadata (if available) | Time windows, gas economics, ordering context | Assuming timestamp implies inclusion order |
| Transaction | hash, from, to, nonce, gasUsed, effectiveGasPrice, index, value | Ordering, gas cost, cluster linking | Ignoring internal calls and router nesting |
| Receipt and logs | status, logs, topics, address, data | Swap reconstruction and token transfers | Not handling multiple swaps per tx |
| Decoded swap | pool, tokenIn, tokenOut, amountIn, amountOut, sqrtPrice or reserves (if known), hop index | Pool-centric detection and victim harm estimation | Confusing router amounts with pool amounts |
| Trace (optional) | call tree, internal transfers, contract creation, revert reasons (if available) | Aggregator paths, precise amounts, intent inference | Trace API throttling and missing nodes |
| Pool state snapshot | reserves or price state at block boundaries (or computed from swap deltas) | Slippage and price impact estimation | Assuming static price within a block |
How sandwich detection works, step by step
The most reliable approach is pool-centric candidate generation followed by victim harm and attacker profit verification. You do not start by hunting “attacker addresses”. You start by understanding how a pool state changes inside a block.
1) Ingest blocks and decode swaps into a canonical format
Your detector cannot reason about “a Uniswap swap” unless you normalize swaps into a canonical record. Different AMMs emit different events. Routers wrap swaps. Aggregators batch multiple swaps. Your canonical record should answer these questions per swap:
- Which pool or venue executed the swap?
- Which token went in, which token came out?
- What were the pool amounts (not router amounts)?
- What is the swap direction relative to the pool pair?
- What is the tx index and log index so ordering is deterministic?
```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CanonicalSwap:
    chain_id: int
    block_number: int
    block_timestamp: int
    tx_hash: str
    tx_index: int
    log_index: int
    trader: str               # tx.from (use traces to map to the end user when possible)
    router: Optional[str]     # tx.to or detected router in trace
    pool: str                 # AMM pool contract
    amm: str                  # "uni_v2", "uni_v3", "curve", "balancer", "unknown"
    token_in: str
    token_out: str
    amount_in: int
    amount_out: int
    recipient: Optional[str]  # pool output recipient if known
    hop: int = 0              # order within route

# Tip: store amounts as raw integers plus token decimals in a separate token registry.
```
At scale, decoding is a major cost center. Cache ABIs, decode by topic signature, and store swap records in a columnar-friendly schema. Most analytics pipelines eventually land in a data store optimized for large scans and group-by operations.
2) Build a pool timeline for each block
A sandwich is local to a pool or a route segment. So create a per-block, per-pool ordered list of swaps. Ordering should be (tx_index, log_index) because a single tx can produce multiple swap logs. Once you have a pool timeline, candidate generation becomes mechanical: look for opposite-direction swaps that bracket another swap.
Candidate generation rule of thumb:
- Front-leg: swap direction A (token0 to token1 or token1 to token0)
- Victim swap: same direction A, occurs after the front-leg
- Back-leg: opposite direction B, occurs after victim swap
- Front-leg and back-leg are linked by address cluster or repeated router patterns
Arbitrage often looks like buy then buy then sell, or sell then sell then buy, especially when multiple users trade in the same direction. If you only rely on ordering and direction, you will overlabel. Your system must prove two things: (1) the middle trade is harmed by the outer trades, and (2) the outer trades realize net profit after fees and gas.
3) Link attacker legs using clustering, not just a single address
At small scale you can assume the attacker uses one address. At real scale, attackers use: multiple EOAs, smart contracts, and relays. You need a flexible linking strategy:
- Direct same sender: tx.from matches for front-leg and back-leg.
- Contract executor: same contract address is used as the router or executor across both legs.
- Funding linkage: both EOAs funded from the same hot wallet, or share coinbase payments (advanced).
- Bundle linkage: legs appear as a tight bundle with near-zero gaps and predictable gas strategy (advanced).
Start simple. In production, a “same sender” linkage catches a lot. Then add “same executor contract”. If you want intelligence-grade clustering, you need additional datasets. That is where on-chain intel platforms can help your enrichment layer. If that fits your workflow, you can explore Nansen.
4) Estimate victim harm using pool state and counterfactual pricing
The center of sandwich detection is counterfactual analysis: what would the victim have received if the attacker’s front-run had not happened? You do not need perfect math for every AMM to get value, but you need a consistent approximation.
For constant product pools (Uniswap v2 style), counterfactual is straightforward if you know reserves before the victim trade. For concentrated liquidity pools (Uniswap v3 style), you need either: swap event fields that include sqrtPrice changes, or a v3 math engine that simulates ticks. At scale, many pipelines use approximations: compare effective price before and after, and compute deltas based on observed swap amounts.
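For the constant-product case, the counterfactual is simple enough to sketch directly. The following is a minimal illustration (function names are ours, and it assumes the standard 0.3% fee taken on input, with reserves known both before and after the attacker's front-leg):

```python
def v2_amount_out(amount_in: int, reserve_in: int, reserve_out: int, fee_bps: int = 30) -> int:
    """Constant-product output with the fee taken on input (Uniswap v2 style math)."""
    amount_in_with_fee = amount_in * (10_000 - fee_bps)
    return (amount_in_with_fee * reserve_out) // (reserve_in * 10_000 + amount_in_with_fee)

def victim_harm_v2(victim_amount_in, reserves_before_front, reserves_after_front):
    """Counterfactual harm: what the victim would have received without the
    front-leg, minus what they actually received. Reserves are
    (reserve_in, reserve_out) from the victim's perspective."""
    counterfactual = v2_amount_out(victim_amount_in, *reserves_before_front)
    actual = v2_amount_out(victim_amount_in, *reserves_after_front)
    return counterfactual - actual
```

A positive result means the front-leg moved the pool against the victim; a result near zero suggests the bracketing pattern was coincidental.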
Victim harm proof checklist
- Confirm victim swap touches the same pool as attacker legs (or a shared critical hop).
- Compute victim’s effective price and compare to a baseline price just before front-leg.
- Estimate extra slippage attributable to attacker’s front-leg (not general volatility).
- Check victim’s minOut and deadline. Many victims “allow” slippage, but harm still exists within that window.
- Flag “uncertain” when router batching or multi-hop route makes counterfactual ambiguous without traces.
5) Verify attacker profit net of fees and gas
Profit verification is the second half of truth. Many patterns that look like sandwiches are actually failed attempts or neutral trades. Profit must account for:
- AMM fees paid on both attacker swaps
- Gas cost (effectiveGasPrice times gasUsed), plus any coinbase or builder tips if visible
- Bridging costs if the attacker moves funds across venues (rare for same-block sandwiches, more common in generalized MEV)
At scale, you can start with a simplified profit model: compute net token delta for attacker address cluster and convert to a base asset using a price feed at block time. Then subtract gas cost. If profit is positive and victim harm is positive, you have a high confidence label.
A production architecture for detection at scale
The pipeline below assumes you want continuous detection and a queryable database, not just a one-off scan. There are many tech stacks that work. The architecture matters more than the exact tools.
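One minimal shape for that pipeline, sketched as stage functions wired together (the stage names are illustrative; in production each stage would be a separate worker reading from a queue):

```python
def run_block(block_number, fetch, decode, snapshot, detect, score, store):
    """Illustrative stage wiring for one block; stages are injected as callables."""
    receipts = fetch(block_number)                   # block ingestion
    swaps = decode(receipts)                         # log decoding -> canonical swap records
    pools = snapshot(swaps)                          # pool state at block boundaries
    candidates = detect(swaps, pools)                # candidate generation
    labels = [score(c, pools) for c in candidates]   # verification + scoring
    store(block_number, swaps, labels)               # raw facts and derived labels, stored separately
```

The value of this shape is that every stage is replaceable: you can swap a logs-only decoder for a trace-aware one, or re-run scoring over stored swaps, without touching ingestion.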
Pitfalls that break sandwich detection in production
The pitfalls below are the ones that show up after you ship. They are not theoretical. They are what makes a system look accurate in a test window and then degrade quietly once it processes thousands of pools across weeks.
| Pitfall | What it looks like | Why it hurts accuracy | Mitigation |
|---|---|---|---|
| Aggregator multi-hop complexity | One tx emits many swap logs across pools | You mis-assign victim pool or misread amounts | Use traces when possible, or route parsing by known router ABIs |
| Private order flow and bundles | Victim appears without mempool presence | Mempool-only heuristics fail; ordering still occurs in block | Focus on in-block pool timelines and profit verification |
| Arbitrage confusion | Attacker legs look like standard arb | You label arb as sandwich without victim harm | Require victim harm and bracketing around a specific tx |
| Fee-on-transfer or rebasing tokens | Amounts in/out do not match simple AMM math | Profit and harm estimates become wrong | Detect token behaviors and downgrade confidence or simulate with transfers |
| Trace gaps and provider throttling | Some blocks have missing call traces | Detector becomes inconsistent across time | Design logs-only fallback and record data availability flags |
| Pool upgrades and forks | Same address changes semantics | ABI decode silently fails or misdecodes | ABI versioning, signature-based decoding, and contract bytecode checks |
| Gas accounting blind spots | Tips or coinbase payments hidden | Profit calculation is overstated | Compute conservative profit, store “min profit” bounds, track known tip patterns |
Candidate generation patterns that scale
Candidate generation decides whether your system scales. A naive approach compares every swap to every other swap. That explodes. You need narrow indexing and rules that reduce search space.
Pool-centric scan (fast baseline)
For each block, group swaps by pool. For each pool timeline, scan with a sliding window: when you see a swap, look ahead for an opposite-direction swap that could be a back-leg. Between them, find victim swaps that share direction with the first leg. This is O(n) per pool timeline with small constant factors if implemented carefully.
```python
from collections import defaultdict

def direction(swap):
    # normalize direction to an ordered token pair
    return (swap.token_in, swap.token_out)

def opposite(dir_a, dir_b):
    return dir_a[0] == dir_b[1] and dir_a[1] == dir_b[0]

def generate_candidates(swaps_for_block, max_gap=24):
    # swaps_for_block: list[CanonicalSwap]
    by_pool = defaultdict(list)
    for s in swaps_for_block:
        by_pool[s.pool].append(s)
    candidates = []
    for pool, swaps in by_pool.items():
        swaps.sort(key=lambda x: (x.tx_index, x.log_index))
        n = len(swaps)
        for i in range(n):
            a1 = swaps[i]
            dir_a = direction(a1)
            # look for a back-leg within a bounded window
            for j in range(i + 2, min(n, i + max_gap)):
                a2 = swaps[j]
                if not opposite(dir_a, direction(a2)):
                    continue
                # victim candidates sit between i and j and share dir_a
                for k in range(i + 1, j):
                    v = swaps[k]
                    if direction(v) != dir_a:
                        continue
                    # lightweight linkage hint (same trader for attacker legs)
                    candidates.append((a1, v, a2))
    return candidates

# Tip: add early pruning:
# - require a1 and a2 to share the same tx.from OR the same router/executor fingerprint
# - require amounts to be plausible (a2 amount_out relates to a1 amount_in)
```
This baseline produces many candidates. That is expected. The verification layer is where you narrow to true positives. If you want to scale further, you can implement attacker linkage before victim search: only look for back-legs that share attacker fingerprint.
Attacker fingerprinting that reduces compute
A production trick is to build a compact “fingerprint” for each swap:
- sender address (tx.from)
- executor contract address (trace-derived or tx.to)
- gas strategy bucket (priority fee style, if available)
- pool id
- direction
Then for a given front-leg, you search only for back-legs with the same fingerprint but opposite direction, within a short window. That can cut candidates dramatically without missing most sandwiches.
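A minimal sketch of that lookup, assuming swaps carry CanonicalSwap-like fields (the gas strategy bucket is omitted here for brevity; all names are illustrative):

```python
from collections import defaultdict

def fingerprint(swap):
    # compact key: (pool, sender, direction); extend with executor and gas bucket
    return (swap["pool"], swap["trader"], (swap["token_in"], swap["token_out"]))

def index_by_fingerprint(swaps):
    idx = defaultdict(list)
    for s in swaps:
        idx[fingerprint(s)].append(s)
    return idx

def back_leg_candidates(front, idx):
    """Look up only swaps with the same pool and sender, opposite direction,
    occurring after the front-leg."""
    pool, trader, (tin, tout) = fingerprint(front)
    key = (pool, trader, (tout, tin))
    return [s for s in idx.get(key, []) if s["tx_index"] > front["tx_index"]]
```

The search per front-leg becomes a dictionary lookup instead of a scan, which is what keeps candidate generation linear as pool counts grow.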
Verification: turning candidates into high confidence labels
Verification is where you separate signal from noise. You want two scores: victim harm confidence and attacker profit confidence. Then combine them into a final label.
Victim harm estimation: a practical approach for different AMMs
The cleanest harm estimate is counterfactual output: what would the victim’s amountOut have been if the front-leg did not exist? For constant product pools, this is doable if you know reserves before the victim swap. If you do not store reserves, you can reconstruct them by replaying swaps inside the block, starting from a snapshot at block start. That is heavier, but still feasible if you limit replay to pools with candidates.
For concentrated liquidity pools, exact replay is hard because you need tick data. At scale, many teams use a practical compromise: treat the victim harm as a function of observed effective prices:
- Compute attacker front-leg effective price in the pool.
- Compute victim effective price.
- Compute attacker back-leg effective price.
- If victim price is materially worse than a baseline and the bracketing legs align, count it as harm, then weight by confidence.
This does not give perfect harm in units, but it gives a reliable detection signal when combined with profit verification.
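The effective-price comparison above can be sketched in a few lines. This assumes amounts are already normalized to a common decimal basis, and the ratio threshold is an illustrative starting point, not a recommendation:

```python
def effective_price(amount_in: float, amount_out: float) -> float:
    # price paid per unit of output; higher is worse for the trader
    return amount_in / amount_out

def harm_signal(front_leg: dict, victim: dict, min_ratio: float = 1.001) -> bool:
    """Flag harm when the victim's effective price is materially worse than the
    attacker's front-leg price in the same pool and direction."""
    p_front = effective_price(front_leg["amount_in"], front_leg["amount_out"])
    p_victim = effective_price(victim["amount_in"], victim["amount_out"])
    return p_victim / p_front >= min_ratio
```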
Attacker profit: compute conservative lower bounds
Profit is often overstated when you ignore gas and hidden payments. A safe practice is to compute a lower-bound profit estimate, then label only when that lower bound is positive and above a small threshold. That makes your detector conservative, which reduces false positives.
```python
def token_delta(swaps, attacker_addrs):
    # swaps: list[CanonicalSwap] belonging to attacker legs across pools
    # attacker_addrs: set of addresses clustered as the attacker
    # Simplified: a real implementation should use transfer logs and trace-derived recipients.
    delta = {}
    for s in swaps:
        if s.trader not in attacker_addrs:
            continue
        delta[s.token_in] = delta.get(s.token_in, 0) - s.amount_in
        delta[s.token_out] = delta.get(s.token_out, 0) + s.amount_out
    return delta

def gas_cost_native(tx_receipts, attacker_txs, effective_gas_prices):
    total = 0
    for txh in attacker_txs:
        gas_used = tx_receipts[txh]["gasUsed"]
        egp = effective_gas_prices[txh]
        total += gas_used * egp
    return total

# Convert token deltas to native or USD with a price oracle at block time.
# Conservative choice: use the worst-case conversion within a short window.
```
A scoring model that is explainable and versioned
You do not need a black box model to get great results. You need a score that is: stable, explainable, and versioned so you can improve without breaking dashboards.
A practical scoring breakdown:
- Structure score: bracketing pattern exists in pool timeline and swap directions align.
- Link score: attacker legs share sender or executor fingerprints.
- Harm score: victim effective price is worse than baseline beyond a threshold.
- Profit score: attacker lower bound profit is positive after conservative gas.
- Data quality score: traces available, pool state replay available, token behavior known.
Then output a label like: sandwich_high, sandwich_medium, sandwich_low, uncertain. Store reasons in a compact list so a UI can explain why the system flagged it.
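One way to combine the component scores into those labels, with reason codes attached. The weights, thresholds, and names below are illustrative placeholders you would tune against your evaluation set:

```python
DETECTOR_VERSION = "v1"  # hypothetical version tag, stored with every label

def label(structure: float, link: float, harm: float, profit: float,
          data_quality: float):
    """Each input is a 0..1 component score. Returns (label, reasons)."""
    if structure < 0.5:
        return "not_sandwich", ["no_bracketing"]
    if harm < 0.3 or profit < 0.3:
        # harm and profit are hard requirements, not just weighted inputs
        return "uncertain", ["weak_harm_or_profit"]
    score = (0.25 * structure + 0.2 * link + 0.25 * harm
             + 0.2 * profit + 0.1 * data_quality)
    if score >= 0.8:
        return "sandwich_high", ["all_checks_passed"]
    if score >= 0.6:
        return "sandwich_medium", ["passed_with_moderate_score"]
    return "sandwich_low", ["passed_with_low_score"]
```

Treating harm and profit as gates rather than pure weights is deliberate: a high structure score should never compensate for missing harm evidence.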
A visual intuition: why thresholds matter as volume grows
At scale, your detector sees more events. If your harm threshold is too low, false positives grow faster than true positives. If your profit threshold is too high, you miss smaller sandwiches that still harm users. You tune thresholds using evaluation sets and drift monitoring.
Step-by-step checks you should run before you trust your labels
This section is a pre-flight checklist for the system itself. Run these checks when you deploy a new decoder, a new chain, a new AMM integration, or when provider reliability changes.
System checks for sandwich detection at scale
- Ordering sanity: for random blocks, confirm swaps sorted by (tx_index, log_index) match explorer ordering.
- Decoder coverage: verify top pools and routers in your window decode successfully; track unknown signatures.
- Trace availability: record a per-block flag for trace success; do not mix trace and non-trace labels without marking it.
- Token registry: decimals and symbols must be correct; wrong decimals can ruin profit and harm estimates.
- Gas accounting: confirm effective gas price calculation across EIP-1559 and legacy tx types.
- Profit bounds: compare profit estimates with conservative lower bounds and ensure negative profits are not labeled high confidence.
- Evaluation slice: maintain a curated set of known sandwiches and known non-sandwich patterns for regression testing.
- Drift monitoring: track label rates by pool and by router, because a new aggregator can change patterns overnight.
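The gas-accounting check in the list above reduces to one formula. For EIP-1559 (type 2) transactions the effective gas price is min(maxFeePerGas, baseFee + maxPriorityFeePerGas); legacy transactions pay their declared gasPrice. A minimal sketch, assuming transaction fields are exposed as a dict:

```python
def effective_gas_price(tx: dict, base_fee: int) -> int:
    """EIP-1559: min(maxFeePerGas, baseFee + maxPriorityFeePerGas).
    Legacy (type 0/1) transactions pay their gasPrice directly."""
    if tx.get("type", 0) == 2:
        return min(tx["maxFeePerGas"], base_fee + tx["maxPriorityFeePerGas"])
    return tx["gasPrice"]
```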
Implementation details that make or break performance
Backfill without missing blocks or duplicating work
Real systems need both a live stream and a backfill loop. Live streams drop occasionally. Nodes restart. Providers rate-limit. The stable design is:
- Live subscriber writes block numbers to a durable queue.
- Workers pull block numbers, fetch receipts and logs, and write canonical swap records.
- A backfill job compares last processed block to chain head and fills gaps.
- Idempotency is enforced by primary keys like (chain_id, tx_hash, log_index).
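The backfill comparison step is small enough to sketch directly. This assumes you track processed block numbers in a queryable set; idempotent writes make it safe to reprocess a block that was only partially handled:

```python
def find_gaps(processed_blocks, start, head):
    """Return block numbers in [start, head] that were never processed.
    Safe to re-run: downstream writes are idempotent on (chain_id, tx_hash, log_index)."""
    return [b for b in range(start, head + 1) if b not in processed_blocks]
```

In production you would run this over bounded windows rather than the full history, and enqueue the gaps back into the same worker queue the live subscriber feeds.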
Storage strategy: raw facts separate from derived labels
Store raw decoded swaps separately from derived candidates and final labels. This matters because:
- You can re-run detection when your logic improves without re-ingesting the chain.
- You can audit why a label happened by inspecting source swaps.
- You can export training data for ML models without parsing logs again.
A common pattern is: raw swaps in a columnar store, labels in a relational store or an analytics store, with a stable sandwich_id that references involved swaps.
Distributed compute: keep the heavy math scoped
At scale, you do not want to simulate every pool every block. Scope heavy work to candidate pools. Replay only pools that produce candidates. Store pool state snapshots for those pools. That keeps costs bounded even when chain activity spikes.
Compute environments for batch experiments and evaluations
Sandwich detection benefits from periodic batch evaluation: run your detector on a large historical window, compare label distribution, and measure drift. That work can be compute heavy, especially if you add trace parsing or pool simulation. If you need isolated GPU or CPU instances for experiments, you can explore Runpod as an optional compute layer. The key is isolation: do not run heavy eval loads on the same machines as live ingestion.
Where AI fits, and where it does not
A surprising number of teams start with AI, then realize they still need deterministic labeling and explainability. AI helps most in these areas:
- Log signature triage: quickly classify unknown events and suggest ABIs or decoding hints.
- Alert summarization: generate human readable explanations from structured label reasons.
- Rule drafting: turn observed failure modes into new heuristics quickly, then validate empirically.
- Analyst workflows: speed up investigations, clustering notes, and report writing.
AI is not a substitute for core math: harm estimation, profit verification, and pool state logic. Those must be deterministic enough to audit. If you want prompts that help you build explainable detection workflows, use Prompt Libraries and deepen your foundation through AI Learning Hub.
A safety-first workflow for MEV sandwich detection teams
Whether you are building an internal analytics system or a user-facing dashboard, the workflow below keeps you honest and prevents common pitfalls.
Phase 1: baseline detector with strict confidence
- Implement pool-centric candidate generation for a single AMM family first (for example, constant product pools).
- Require both victim harm and attacker profit, and label only when confidence is high.
- Store reasons and version your rules so you can compare changes over time.
- Build a small hand-curated evaluation set: known sandwiches and known non-sandwich scenarios.
Phase 2: broaden coverage carefully
- Add router and aggregator support. Expect false positives to increase until you refine route parsing.
- Add trace enrichment as optional, never as a hard dependency, because trace availability varies.
- Add token behavior detection: fee-on-transfer, rebasing, and weird transfer hooks.
Phase 3: monitoring and drift
- Track label rate per pool, per router, per day.
- Alert on sudden changes. Often it is a new router version or a new pool design.
- Regularly sample uncertain cases for analyst review, then feed findings back into deterministic logic.
Turn sandwich detection into a repeatable product feature
If you are building detection tooling, pair deterministic labeling with clean analyst workflows. Use the AI Learning Hub and Prompt Libraries to standardize how you investigate, explain, and ship improvements. For updates and new playbooks, subscribe to TokenToolHub.
Hard cases: patterns that trick detectors
These cases are where most teams lose time. The best approach is to label them explicitly as a distinct category or lower confidence, instead of forcing them into “sandwich or not”.
Case: arbitrage around a large user trade
A large user swap moves price. Arbitrage bots trade immediately after to restore prices across venues. This can look like a sandwich, especially if there is also a trade before. The key difference is bracketing and intent: in a true sandwich, the attacker’s first leg is designed to worsen the victim’s price, and the attacker unwinds after the victim. In arbitrage, the bot usually reacts to price differences, and the victim trade is not necessarily between two legs from the same actor.
Practical rule: require that both attacker legs are linked and that the victim trade is between them in the same pool timeline. Then require that the attacker’s first leg increases the victim’s price impact beyond baseline. That eliminates many arb mislabels.
Case: liquidation cascades
Liquidations can trigger swaps in DEX pools, and bots may trade around them. These sequences can produce bracketing patterns. Treat liquidation-related swaps as a separate cluster and label them with separate semantics unless you can prove targeted victim harm.
Case: multi-route aggregation with partial fills
Aggregators can split a trade across routes. If you only look at one pool hop, you may misinterpret the victim’s effective slippage. When traces are available, you can map exact route segments. When traces are not available, downgrade confidence and avoid overstated harm calculations.
What to export for dashboards, research, and user warnings
Once you detect sandwiches, the next question is what you store and export so your product can actually use the data. A good export format is compact, explainable, and stable across detector versions.
| Export field | What it enables | Notes |
|---|---|---|
| sandwich_id + detector_version | Stable references and regression comparisons | Never reuse ids across versions without mapping |
| front_tx, victim_tx, back_tx | Audit trails and UI linking | Include tx_index and log_index for deterministic replay |
| pool, amm, token_in, token_out | Pool-level analytics and routing insights | Normalize token addresses and decimals |
| estimated_victim_harm | User warnings and harm aggregation | Store units and confidence separately |
| estimated_attacker_profit_lower_bound | Profit analysis and ranking | Conservative by design |
| confidence + reasons[] | Explainability and debugging | Short reason codes are better than long text |
| data_flags | Quality control | trace_available, token_behavior_unknown, pool_state_replayed |
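As a concrete shape for that export, here is one possible serializer matching the table. Field names mirror the table; everything else (argument order, JSON as the wire format) is an assumption:

```python
import json

def export_record(sandwich_id, detector_version, front_tx, victim_tx, back_tx,
                  pool, amm, harm, profit_lower_bound, confidence, reasons, data_flags):
    """Serialize one labeled sandwich as a stable, explainable export row."""
    return json.dumps({
        "sandwich_id": sandwich_id,
        "detector_version": detector_version,
        "front_tx": front_tx, "victim_tx": victim_tx, "back_tx": back_tx,
        "pool": pool, "amm": amm,
        "estimated_victim_harm": harm,
        "estimated_attacker_profit_lower_bound": profit_lower_bound,
        "confidence": confidence,
        "reasons": reasons,          # short reason codes, not prose
        "data_flags": data_flags,    # e.g. trace_available, pool_state_replayed
    }, sort_keys=True)
```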
Tools and workflow inside TokenToolHub
If you are building MEV analytics products, you usually need two parallel tracks: engineering (pipelines, decoders, storage) and analyst workflows (investigation, reporting, playbooks). TokenToolHub is built to support both sides of that loop:
- AI Learning Hub for structured knowledge on how AI can support detection, evaluation, and reporting workflows.
- Prompt Libraries for reusable prompts that turn raw labeled events into readable incident summaries, weekly reports, and investigation checklists.
- AI Crypto Tools for tooling discovery and workflow assembly when you are integrating data sources, analysis stacks, and monitoring.
- Subscribe to get updates as detection patterns evolve and new playbooks are published.
Conclusion: make your detector honest, then make it fast
MEV sandwich detection at scale is a test of engineering discipline. The best systems do not start with complicated ML. They start with clear definitions, pool-centric timelines, deterministic candidate generation, and verification that proves victim harm and attacker profit. Then they layer explainable scoring, versioning, and monitoring to survive changing market structure.
If you build this as a product feature, your biggest wins come from honesty: record data availability flags, downgrade confidence when traces are missing, and avoid overstated harm estimates. Your users will trust a conservative detector more than a noisy one.
Finally, keep the workflow repeatable. Revisit the prerequisite reading Building a market anomaly detector when you tune thresholds and evaluate drift, because the mechanics of detection quality are the same across domains. For ongoing playbooks and AI-assisted analyst workflows, use AI Learning Hub and Prompt Libraries, and you can Subscribe for updates.
FAQs
What is the most reliable signal for a true sandwich?
The combination of (1) bracketing structure in a pool timeline, (2) linkage between attacker legs, (3) measurable victim harm attributable to the front-leg, and (4) attacker profit after conservative gas and fee accounting. If any of these is missing, downgrade confidence.
Can you detect sandwiches without mempool data?
Yes. Many sandwiches can be detected purely from in-block ordering and pool-centric analysis. Mempool data helps explain how it happened, but it is not required to label the harm if you can verify structure, harm, and profit.
Why do false positives spike when you add aggregator support?
Aggregators produce multiple swaps in one transaction, often across several pools, and amounts can be transformed across hops. If you treat swap logs as independent trades without route context, you can misassign victim pools and misunderstand harm. Use traces when possible, and otherwise downgrade confidence.
How do you handle fee-on-transfer tokens?
Detect token behaviors and treat amount deltas from swap logs as insufficient. Prefer transfer logs and trace-derived recipients to compute real deltas. If you cannot validate deltas reliably, mark the case as uncertain or low confidence rather than forcing a high confidence label.
Should I use ML for sandwich detection?
Use deterministic logic first. It gives explainability and stable labels. ML is useful later for scoring borderline cases, clustering attacker behavior, or prioritizing investigations, but it should not replace harm and profit verification.
What should a user-facing warning say?
Keep it specific: identify the pool or route segment that appears heavily sandwiched, show recent occurrences, and suggest mitigations like reducing slippage, splitting trades, using protected routing when available, or trading during lower contention windows.
Where can I learn to build the evaluation loop for this detector?
Start with Building a market anomaly detector because it teaches the core evaluation mindset. Then use AI Learning Hub and Prompt Libraries to standardize your investigation and reporting playbooks.
References
Official documentation and reputable sources to deepen implementation details:
- Ethereum developer documentation
- Ethereum Improvement Proposals (EIPs)
- Uniswap documentation
- Flashbots documentation
- TokenToolHub: AI Learning Hub
- TokenToolHub: Prompt Libraries
Note: MEV behavior evolves quickly. If you ship detection to users, treat labels as probabilistic, version your rules, and monitor drift continuously.
