MEV Sandwich Detection at Scale: Implementation Guide + Pitfalls
MEV sandwich detection at scale is not a single heuristic, and it is not a dashboard that flags a block as “bad”. It is an end-to-end engineering problem: ingesting blocks and traces reliably, reconstructing swap intent, labeling attacker-victim-attacker patterns, measuring confidence, and doing all of it fast enough to keep up with chain throughput without drowning in false positives. This guide lays out a practical blueprint for detection systems that operate at production scale, including the failure modes that quietly break accuracy.
TL;DR
- A sandwich is a specific three-part structure: attacker front-run, victim swap, attacker back-run, typically in the same block and around the same pool path.
- At scale, the hard part is not spotting three swaps. The hard part is reconstructing pool state, routing, and intent with incomplete data (private tx, missing traces, MEV relays).
- Start with a deterministic baseline detector (pool-centric, trace-aware), then add a probabilistic scoring layer to handle edge cases.
- False positives usually come from arbitrage, liquidations, multi-route aggregators, rebase style fee-on-transfer tokens, and reused routers.
- Production pipeline needs: block ingestion, log decoding, optional traces, pool state snapshots, candidate generation, labeling, scoring, storage, and monitoring.
- For a strong anomaly mindset and evaluation harness patterns, treat Building a market anomaly detector as prerequisite reading.
- For AI assist features and prompt templates that make the workflow repeatable, use AI Learning Hub and Prompt Libraries.
- If you want updates and new playbooks, you can Subscribe.
Sandwich detection behaves like any serious anomaly system: you need clean ingestion, stable feature definitions, evaluation sets, and drift monitoring. If you have not built a detection loop before, start with Building a market anomaly detector. It will help you think in terms of ground truth, thresholds, and failure analysis, which is exactly what scale MEV detection requires.
What a sandwich is, in precise on-chain terms
A sandwich attack is a coordinated sequence of swaps that extracts value from a victim trade by manipulating price around it. The classic shape is attacker buy (front-run) then victim buy then attacker sell (back-run). The attacker profits because the victim’s swap moves price further in the direction the attacker already pushed, and the attacker unwinds after the victim at a better price.
In practice, the details vary. Sometimes the victim is a sell and the attacker sells first to push price down, then buys back after. Sometimes the trade route touches multiple pools, and the attacker mirrors only the sensitive pool or one hop in the route. Sometimes the attacker uses multiple addresses or a bundle, and the victim is included by a builder relay rather than mempool ordering.
If you are building detection, do not define a sandwich as “three swaps in one block”. That definition breaks immediately when you encounter: multi-hop routes, aggregator routers that emit many swap logs, internal calls, partial fills, and routed trades that touch multiple pools. A useful definition is pool-centric and state-centric: if the attacker’s first action shifts pool price (or effective price) and the victim executes at a worse price because of it, then the attacker unwinds to realize profit.
Why sandwich detection matters for builders and traders
Sandwich attacks are not only an academic MEV topic. They are a user harm problem. If you build wallets, swaps, bots, risk dashboards, or analytics, you need to quantify how often users are being sandwiched, which pools are most exploited, and which routing patterns create predictable victims.
Detection matters for three concrete reasons:
- User protection: you can warn users before they trade into a vulnerable path, or route through safer venues.
- Market integrity analytics: you can separate organic volume from volume that is being “taxed” by MEV.
- Protocol hardening: you can identify which pool designs, fee tiers, or router behaviors are consistently exploited.
There is also a hidden reason: without detection, you cannot measure whether mitigations help. MEV protection is often discussed in marketing terms. A detection system lets you test outcomes, pool by pool, time window by time window, and see whether user harm actually drops.
What “at scale” really means for sandwich detection
In small prototypes, you can scan a few blocks, decode a couple of swap events, and label patterns manually. At scale, you face constraints: throughput, chain differences, missing data, probabilistic routing, and adversaries who adapt.
The mental shift is this: you are building a streaming system and a labeling system. Your objective is not “find sandwiches”. Your objective is “produce stable, explainable labels with known error bars”. That is why the anomaly-detector prerequisite reading matters: detection systems are judged by their mistakes, not their demos.
The minimum data model you need
A robust detector can be built with only receipts and logs for many AMMs, but it becomes much better if you also have execution traces. At scale, you should design your schema so you can operate in “logs only” mode, then upgrade to “trace aware” mode when available.
| Entity | Fields you must store | Why it matters | Common pitfall |
|---|---|---|---|
| Block | number, hash, timestamp, baseFee, proposer or builder metadata (if available) | Time windows, gas economics, ordering context | Assuming timestamp implies inclusion order |
| Transaction | hash, from, to, nonce, gasUsed, effectiveGasPrice, index, value | Ordering, gas cost, cluster linking | Ignoring internal calls and router nesting |
| Receipt and logs | status, logs, topics, address, data | Swap reconstruction and token transfers | Not handling multiple swaps per tx |
| Decoded swap | pool, tokenIn, tokenOut, amountIn, amountOut, sqrtPrice or reserves (if known), hop index | Pool-centric detection and victim harm estimation | Confusing router amounts with pool amounts |
| Trace (optional) | call tree, internal transfers, contract creation, revert reasons (if available) | Aggregator paths, precise amounts, intent inference | Trace API throttling and missing nodes |
| Pool state snapshot | reserves or price state at block boundaries (or computed from swap deltas) | Slippage and price impact estimation | Assuming static price within a block |
How sandwich detection works, step by step
The most reliable approach is pool-centric candidate generation followed by victim harm and attacker profit verification. You do not start by hunting “attacker addresses”. You start by understanding how a pool state changes inside a block.
1) Ingest blocks and decode swaps into a canonical format
Your detector cannot reason about “a Uniswap swap” unless you normalize swaps into a canonical record. Different AMMs emit different events. Routers wrap swaps. Aggregators batch multiple swaps. Your canonical record should answer these questions per swap:
- Which pool or venue executed the swap?
- Which token went in, which token came out?
- What were the pool amounts (not router amounts)?
- What is the swap direction relative to the pool pair?
- What is the tx index and log index so ordering is deterministic?
```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CanonicalSwap:
    chain_id: int
    block_number: int
    block_timestamp: int
    tx_hash: str
    tx_index: int
    log_index: int
    trader: str               # tx.from (use traces to map to the end user when possible)
    router: Optional[str]     # tx.to or detected router in trace
    pool: str                 # AMM pool contract
    amm: str                  # "uni_v2", "uni_v3", "curve", "balancer", "unknown"
    token_in: str
    token_out: str
    amount_in: int
    amount_out: int
    recipient: Optional[str]  # pool output recipient if known
    hop: int = 0              # order within route

# Tip: store amounts as raw integers plus token decimals in a separate token registry.
```
At scale, decoding is a major cost center. Cache ABIs, decode by topic signature, and store swap records in a columnar-friendly schema. Most analytics pipelines eventually land in a data store optimized for large scans and group-by operations.
2) Build a pool timeline for each block
A sandwich is local to a pool or a route segment. So create a per-block, per-pool ordered list of swaps. Ordering should be (tx_index, log_index) because a single tx can produce multiple swap logs. Once you have a pool timeline, candidate generation becomes mechanical: look for opposite-direction swaps that bracket another swap.
Candidate generation rule of thumb:
- Front-leg: swap direction A (token0 to token1 or token1 to token0)
- Victim swap: same direction A, occurs after the front-leg
- Back-leg: opposite direction B, occurs after victim swap
- Front-leg and back-leg are linked by address cluster or repeated router patterns
Arbitrage often looks like buy then buy then sell, or sell then sell then buy, especially when multiple users trade in the same direction. If you only rely on ordering and direction, you will overlabel. Your system must prove two things: (1) the middle trade is harmed by the outer trades, and (2) the outer trades realize net profit after fees and gas.
3) Link attacker legs using clustering, not just a single address
At small scale you can assume the attacker uses one address. At real scale, attackers use: multiple EOAs, smart contracts, and relays. You need a flexible linking strategy:
- Direct same sender: tx.from matches for front-leg and back-leg.
- Contract executor: same contract address is used as the router or executor across both legs.
- Funding linkage: both EOAs funded from the same hot wallet, or share coinbase payments (advanced).
- Bundle linkage: legs appear as a tight bundle with near-zero gaps and predictable gas strategy (advanced).
Start simple. In production, a “same sender” linkage catches a lot. Then add “same executor contract”. If you want intelligence-grade clustering, you need additional datasets. That is where on-chain intel platforms can help your enrichment layer. If that fits your workflow, you can explore Nansen.
4) Estimate victim harm using pool state and counterfactual pricing
The center of sandwich detection is counterfactual analysis: what would the victim have received if the attacker’s front-run had not happened? You do not need perfect math for every AMM to get value, but you need a consistent approximation.
For constant product pools (Uniswap v2 style), counterfactual is straightforward if you know reserves before the victim trade. For concentrated liquidity pools (Uniswap v3 style), you need either: swap event fields that include sqrtPrice changes, or a v3 math engine that simulates ticks. At scale, many pipelines use approximations: compare effective price before and after, and compute deltas based on observed swap amounts.
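For the constant-product case, the counterfactual is simple enough to sketch directly. The following is a minimal illustration (function names are ours, and it assumes the standard 0.3% fee taken on input, with reserves known both before and after the attacker's front-leg):

```python
def v2_amount_out(amount_in: int, reserve_in: int, reserve_out: int, fee_bps: int = 30) -> int:
    """Constant-product output with the fee taken on input (Uniswap v2 style math)."""
    amount_in_with_fee = amount_in * (10_000 - fee_bps)
    return (amount_in_with_fee * reserve_out) // (reserve_in * 10_000 + amount_in_with_fee)

def victim_harm_v2(victim_amount_in, reserves_before_front, reserves_after_front):
    """Counterfactual harm: what the victim would have received without the
    front-leg, minus what they actually received. Reserves are
    (reserve_in, reserve_out) from the victim's perspective."""
    counterfactual = v2_amount_out(victim_amount_in, *reserves_before_front)
    actual = v2_amount_out(victim_amount_in, *reserves_after_front)
    return counterfactual - actual
```

A positive result means the front-leg moved the pool against the victim; a result near zero suggests the bracketing pattern was coincidental.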
Victim harm proof checklist
- Confirm victim swap touches the same pool as attacker legs (or a shared critical hop).
- Compute victim’s effective price and compare to a baseline price just before front-leg.
- Estimate extra slippage attributable to attacker’s front-leg (not general volatility).
- Check victim’s minOut and deadline. Many victims “allow” slippage, but harm still exists within that window.
- Flag “uncertain” when router batching or multi-hop route makes counterfactual ambiguous without traces.
5) Verify attacker profit net of fees and gas
Profit verification is the second half of truth. Many patterns that look like sandwiches are actually failed attempts or neutral trades. Profit must account for:
- AMM fees paid on both attacker swaps
- Gas cost (effectiveGasPrice times gasUsed), plus any coinbase or builder tips if visible
- Bridging costs if the attacker moves funds across venues (rare for same-block sandwiches, more common in generalized MEV)
At scale, you can start with a simplified profit model: compute net token delta for attacker address cluster and convert to a base asset using a price feed at block time. Then subtract gas cost. If profit is positive and victim harm is positive, you have a high confidence label.
A production architecture for detection at scale
The pipeline below assumes you want continuous detection and a queryable database, not just a one-off scan. There are many tech stacks that work. The architecture matters more than the exact tools.
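One minimal shape for that pipeline, sketched as stage functions wired together (the stage names are illustrative; in production each stage would be a separate worker reading from a queue):

```python
def run_block(block_number, fetch, decode, snapshot, detect, score, store):
    """Illustrative stage wiring for one block; stages are injected as callables."""
    receipts = fetch(block_number)                   # block ingestion
    swaps = decode(receipts)                         # log decoding -> canonical swap records
    pools = snapshot(swaps)                          # pool state at block boundaries
    candidates = detect(swaps, pools)                # candidate generation
    labels = [score(c, pools) for c in candidates]   # verification + scoring
    store(block_number, swaps, labels)               # raw facts and derived labels, stored separately
```

The value of this shape is that every stage is replaceable: you can swap a logs-only decoder for a trace-aware one, or re-run scoring over stored swaps, without touching ingestion.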
Pitfalls that break sandwich detection in production
The pitfalls below are the ones that show up after you ship. They are not theoretical. They are what makes a system look accurate in a test window and then degrade quietly once it processes thousands of pools across weeks.
| Pitfall | What it looks like | Why it hurts accuracy | Mitigation |
|---|---|---|---|
| Aggregator multi-hop complexity | One tx emits many swap logs across pools | You mis-assign victim pool or misread amounts | Use traces when possible, or route parsing by known router ABIs |
| Private order flow and bundles | Victim appears without mempool presence | Mempool-only heuristics fail; ordering still occurs in block | Focus on in-block pool timelines and profit verification |
| Arbitrage confusion | Attacker legs look like standard arb | You label arb as sandwich without victim harm | Require victim harm and bracketing around a specific tx |
| Fee-on-transfer or rebasing tokens | Amounts in/out do not match simple AMM math | Profit and harm estimates become wrong | Detect token behaviors and downgrade confidence or simulate with transfers |
| Trace gaps and provider throttling | Some blocks have missing call traces | Detector becomes inconsistent across time | Design logs-only fallback and record data availability flags |
| Pool upgrades and forks | Same address changes semantics | ABI decode silently fails or misdecodes | ABI versioning, signature-based decoding, and contract bytecode checks |
| Gas accounting blind spots | Tips or coinbase payments hidden | Profit calculation is overstated | Compute conservative profit, store “min profit” bounds, track known tip patterns |
Candidate generation patterns that scale
Candidate generation decides whether your system scales. A naive approach compares every swap to every other swap. That explodes. You need narrow indexing and rules that reduce search space.
Pool-centric scan (fast baseline)
For each block, group swaps by pool. For each pool timeline, scan with a sliding window: when you see a swap, look ahead for an opposite-direction swap that could be a back-leg. Between them, find victim swaps that share direction with the first leg. This is O(n) per pool timeline with small constant factors if implemented carefully.
```python
from collections import defaultdict

def direction(swap):
    # normalize direction to an ordered token pair
    return (swap.token_in, swap.token_out)

def opposite(dir_a, dir_b):
    return dir_a[0] == dir_b[1] and dir_a[1] == dir_b[0]

def generate_candidates(swaps_for_block, max_gap=24):
    # swaps_for_block: list[CanonicalSwap]
    by_pool = defaultdict(list)
    for s in swaps_for_block:
        by_pool[s.pool].append(s)
    candidates = []
    for pool, swaps in by_pool.items():
        swaps.sort(key=lambda x: (x.tx_index, x.log_index))
        n = len(swaps)
        for i in range(n):
            a1 = swaps[i]
            dir_a = direction(a1)
            # look for a back-leg within a bounded window
            for j in range(i + 2, min(n, i + max_gap)):
                a2 = swaps[j]
                if not opposite(dir_a, direction(a2)):
                    continue
                # victim candidates sit between i and j and share dir_a
                for k in range(i + 1, j):
                    v = swaps[k]
                    if direction(v) != dir_a:
                        continue
                    # lightweight linkage hint (same trader for attacker legs)
                    candidates.append((a1, v, a2))
    return candidates

# Tip: add early pruning:
# - require a1 and a2 to share the same tx.from OR the same router/executor fingerprint
# - require amounts to be plausible (a2 amount_out relates to a1 amount_in)
```
This baseline produces many candidates. That is expected. The verification layer is where you narrow to true positives. If you want to scale further, you can implement attacker linkage before victim search: only look for back-legs that share attacker fingerprint.
Attacker fingerprinting that reduces compute
A production trick is to build a compact “fingerprint” for each swap:
- sender address (tx.from)
- executor contract address (trace-derived or tx.to)
- gas strategy bucket (priority fee style, if available)
- pool id
- direction
Then for a given front-leg, you search only for back-legs with the same fingerprint but opposite direction, within a short window. That can cut candidates dramatically without missing most sandwiches.
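A minimal sketch of that lookup, assuming swaps carry CanonicalSwap-like fields (the gas strategy bucket is omitted here for brevity; all names are illustrative):

```python
from collections import defaultdict

def fingerprint(swap):
    # compact key: (pool, sender, direction); extend with executor and gas bucket
    return (swap["pool"], swap["trader"], (swap["token_in"], swap["token_out"]))

def index_by_fingerprint(swaps):
    idx = defaultdict(list)
    for s in swaps:
        idx[fingerprint(s)].append(s)
    return idx

def back_leg_candidates(front, idx):
    """Look up only swaps with the same pool and sender, opposite direction,
    occurring after the front-leg."""
    pool, trader, (tin, tout) = fingerprint(front)
    key = (pool, trader, (tout, tin))
    return [s for s in idx.get(key, []) if s["tx_index"] > front["tx_index"]]
```

The search per front-leg becomes a dictionary lookup instead of a scan, which is what keeps candidate generation linear as pool counts grow.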
Verification: turning candidates into high confidence labels
Verification is where you separate signal from noise. You want two scores: victim harm confidence and attacker profit confidence. Then combine them into a final label.
Victim harm estimation: a practical approach for different AMMs
The cleanest harm estimate is counterfactual output: what would the victim’s amountOut have been if the front-leg did not exist? For constant product pools, this is doable if you know reserves before the victim swap. If you do not store reserves, you can reconstruct them by replaying swaps inside the block, starting from a snapshot at block start. That is heavier, but still feasible if you limit replay to pools with candidates.
For concentrated liquidity pools, exact replay is hard because you need tick data. At scale, many teams use a practical compromise: treat the victim harm as a function of observed effective prices:
- Compute attacker front-leg effective price in the pool.
- Compute victim effective price.
- Compute attacker back-leg effective price.
- If victim price is materially worse than a baseline and the bracketing legs align, count it as harm, then weight by confidence.
This does not give perfect harm in units, but it gives a reliable detection signal when combined with profit verification.
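The effective-price comparison above can be sketched in a few lines. This assumes amounts are already normalized to a common decimal basis, and the ratio threshold is an illustrative starting point, not a recommendation:

```python
def effective_price(amount_in: float, amount_out: float) -> float:
    # price paid per unit of output; higher is worse for the trader
    return amount_in / amount_out

def harm_signal(front_leg: dict, victim: dict, min_ratio: float = 1.001) -> bool:
    """Flag harm when the victim's effective price is materially worse than the
    attacker's front-leg price in the same pool and direction."""
    p_front = effective_price(front_leg["amount_in"], front_leg["amount_out"])
    p_victim = effective_price(victim["amount_in"], victim["amount_out"])
    return p_victim / p_front >= min_ratio
```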
Attacker profit: compute conservative lower bounds
Profit is often overstated when you ignore gas and hidden payments. A safe practice is to compute a lower-bound profit estimate, then label only when that lower bound is positive and above a small threshold. That makes your detector conservative, which reduces false positives.
```python
def token_delta(swaps, attacker_addrs):
    # swaps: list[CanonicalSwap] belonging to attacker legs across pools
    # attacker_addrs: set of addresses clustered as the attacker
    # Simplified: a real implementation should use transfer logs and trace-derived recipients.
    delta = {}
    for s in swaps:
        if s.trader not in attacker_addrs:
            continue
        delta[s.token_in] = delta.get(s.token_in, 0) - s.amount_in
        delta[s.token_out] = delta.get(s.token_out, 0) + s.amount_out
    return delta

def gas_cost_native(tx_receipts, attacker_txs, effective_gas_prices):
    total = 0
    for txh in attacker_txs:
        gas_used = tx_receipts[txh]["gasUsed"]
        egp = effective_gas_prices[txh]
        total += gas_used * egp
    return total

# Convert token deltas to native or USD with a price oracle at block time.
# Conservative choice: use the worst-case conversion within a short window.
```
A scoring model that is explainable and versioned
You do not need a black box model to get great results. You need a score that is: stable, explainable, and versioned so you can improve without breaking dashboards.
A practical scoring breakdown:
- Structure score: bracketing pattern exists in pool timeline and swap directions align.
- Link score: attacker legs share sender or executor fingerprints.
- Harm score: victim effective price is worse than baseline beyond a threshold.
- Profit score: attacker lower bound profit is positive after conservative gas.
- Data quality score: traces available, pool state replay available, token behavior known.
Then output a label like: sandwich_high, sandwich_medium, sandwich_low, uncertain. Store reasons in a compact list so a UI can explain why the system flagged it.
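One way to combine the component scores into those labels, with reason codes attached. The weights, thresholds, and names below are illustrative placeholders you would tune against your evaluation set:

```python
DETECTOR_VERSION = "v1"  # hypothetical version tag, stored with every label

def label(structure: float, link: float, harm: float, profit: float,
          data_quality: float):
    """Each input is a 0..1 component score. Returns (label, reasons)."""
    if structure < 0.5:
        return "not_sandwich", ["no_bracketing"]
    if harm < 0.3 or profit < 0.3:
        # harm and profit are hard requirements, not just weighted inputs
        return "uncertain", ["weak_harm_or_profit"]
    score = (0.25 * structure + 0.2 * link + 0.25 * harm
             + 0.2 * profit + 0.1 * data_quality)
    if score >= 0.8:
        return "sandwich_high", ["all_checks_passed"]
    if score >= 0.6:
        return "sandwich_medium", ["passed_with_moderate_score"]
    return "sandwich_low", ["passed_with_low_score"]
```

Treating harm and profit as gates rather than pure weights is deliberate: a high structure score should never compensate for missing harm evidence.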
A visual intuition: why thresholds matter as volume grows
At scale, your detector sees more events. If your harm threshold is too low, false positives grow faster than true positives. If your profit threshold is too high, you miss smaller sandwiches that still harm users. You tune thresholds using evaluation sets and drift monitoring.
Step-by-step checks you should run before you trust your labels
This section is a pre-flight checklist for the system itself. Run these checks when you deploy a new decoder, a new chain, a new AMM integration, or when provider reliability changes.
System checks for sandwich detection at scale
- Ordering sanity: for random blocks, confirm swaps sorted by (tx_index, log_index) match explorer ordering.
- Decoder coverage: verify top pools and routers in your window decode successfully; track unknown signatures.
- Trace availability: record a per-block flag for trace success; do not mix trace and non-trace labels without marking it.
- Token registry: decimals and symbols must be correct; wrong decimals can ruin profit and harm estimates.
- Gas accounting: confirm effective gas price calculation across EIP-1559 and legacy tx types.
- Profit bounds: compare profit estimates with conservative lower bounds and ensure negative profits are not labeled high confidence.
- Evaluation slice: maintain a curated set of known sandwiches and known non-sandwich patterns for regression testing.
- Drift monitoring: track label rates by pool and by router, because a new aggregator can change patterns overnight.
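The gas-accounting check in the list above reduces to one formula. For EIP-1559 (type 2) transactions the effective gas price is min(maxFeePerGas, baseFee + maxPriorityFeePerGas); legacy transactions pay their declared gasPrice. A minimal sketch, assuming transaction fields are exposed as a dict:

```python
def effective_gas_price(tx: dict, base_fee: int) -> int:
    """EIP-1559: min(maxFeePerGas, baseFee + maxPriorityFeePerGas).
    Legacy (type 0/1) transactions pay their gasPrice directly."""
    if tx.get("type", 0) == 2:
        return min(tx["maxFeePerGas"], base_fee + tx["maxPriorityFeePerGas"])
    return tx["gasPrice"]
```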
Implementation details that make or break performance
Backfill without missing blocks or duplicating work
Real systems need both a live stream and a backfill loop. Live streams drop occasionally. Nodes restart. Providers rate-limit. The stable design is:
- Live subscriber writes block numbers to a durable queue.
- Workers pull block numbers, fetch receipts and logs, and write canonical swap records.
- A backfill job compares last processed block to chain head and fills gaps.
- Idempotency is enforced by primary keys like (chain_id, tx_hash, log_index).
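The backfill comparison step is small enough to sketch directly. This assumes you track processed block numbers in a queryable set; idempotent writes make it safe to reprocess a block that was only partially handled:

```python
def find_gaps(processed_blocks, start, head):
    """Return block numbers in [start, head] that were never processed.
    Safe to re-run: downstream writes are idempotent on (chain_id, tx_hash, log_index)."""
    return [b for b in range(start, head + 1) if b not in processed_blocks]
```

In production you would run this over bounded windows rather than the full history, and enqueue the gaps back into the same worker queue the live subscriber feeds.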
Storage strategy: raw facts separate from derived labels
Store raw decoded swaps separately from derived candidates and final labels. This matters because:
- You can re-run detection when your logic improves without re-ingesting the chain.
- You can audit why a label happened by inspecting source swaps.
- You can export training data for ML models without parsing logs again.
A common pattern is: raw swaps in a columnar store, labels in a relational store or an analytics store, with a stable sandwich_id that references involved swaps.
Distributed compute: keep the heavy math scoped
At scale, you do not want to simulate every pool every block. Scope heavy work to candidate pools. Replay only pools that produce candidates. Store pool state snapshots for those pools. That keeps costs bounded even when chain activity spikes.
Compute environments for batch experiments and evaluations
Sandwich detection benefits from periodic batch evaluation: run your detector on a large historical window, compare label distribution, and measure drift. That work can be compute heavy, especially if you add trace parsing or pool simulation. If you need isolated GPU or CPU instances for experiments, you can explore Runpod as an optional compute layer. The key is isolation: do not run heavy eval loads on the same machines as live ingestion.
Where AI fits, and where it does not
A surprising number of teams start with AI, then realize they still need deterministic labeling and explainability. AI helps most in these areas:
- Log signature triage: quickly classify unknown events and suggest ABIs or decoding hints.
- Alert summarization: generate human readable explanations from structured label reasons.
- Rule drafting: turn observed failure modes into new heuristics quickly, then validate empirically.
- Analyst workflows: speed up investigations, clustering notes, and report writing.
AI is not a substitute for core math: harm estimation, profit verification, and pool state logic. Those must be deterministic enough to audit. If you want prompts that help you build explainable detection workflows, use Prompt Libraries and deepen your foundation through AI Learning Hub.
A safety-first workflow for MEV sandwich detection teams
Whether you are building an internal analytics system or a user-facing dashboard, the workflow below keeps you honest and prevents common pitfalls.
Phase 1: baseline detector with strict confidence
- Implement pool-centric candidate generation for a single AMM family first (for example, constant product pools).
- Require both victim harm and attacker profit, and label only when confidence is high.
- Store reasons and version your rules so you can compare changes over time.
- Build a small hand-curated evaluation set: known sandwiches and known non-sandwich scenarios.
Phase 2: broaden coverage carefully
- Add router and aggregator support. Expect false positives to increase until you refine route parsing.
- Add trace enrichment as optional, never as a hard dependency, because trace availability varies.
- Add token behavior detection: fee-on-transfer, rebasing, and weird transfer hooks.
Phase 3: monitoring and drift
- Track label rate per pool, per router, per day.
- Alert on sudden changes. Often it is a new router version or a new pool design.
- Regularly sample uncertain cases for analyst review, then feed findings back into deterministic logic.
Turn sandwich detection into a repeatable product feature
If you are building detection tooling, pair deterministic labeling with clean analyst workflows. Use the AI Learning Hub and Prompt Libraries to standardize how you investigate, explain, and ship improvements. For updates and new playbooks, subscribe to TokenToolHub.
Hard cases: patterns that trick detectors
These cases are where most teams lose time. The best approach is to label them explicitly as a distinct category or lower confidence, instead of forcing them into “sandwich or not”.
Case: arbitrage around a large user trade
A large user swap moves price. Arbitrage bots trade immediately after to restore prices across venues. This can look like a sandwich, especially if there is also a trade before. The key difference is bracketing and intent: in a true sandwich, the attacker’s first leg is designed to worsen the victim’s price, and the attacker unwinds after the victim. In arbitrage, the bot usually reacts to price differences, and the victim trade is not necessarily between two legs from the same actor.
Practical rule: require that both attacker legs are linked and that the victim trade is between them in the same pool timeline. Then require that the attacker’s first leg increases the victim’s price impact beyond baseline. That eliminates many arb mislabels.
Case: liquidation cascades
Liquidations can trigger swaps in DEX pools, and bots may trade around them. These sequences can produce bracketing patterns. Treat liquidation-related swaps as a separate cluster and label them with separate semantics unless you can prove targeted victim harm.
Case: multi-route aggregation with partial fills
Aggregators can split a trade across routes. If you only look at one pool hop, you may misinterpret the victim’s effective slippage. When traces are available, you can map exact route segments. When traces are not available, downgrade confidence and avoid overstated harm calculations.
What to export for dashboards, research, and user warnings
Once you detect sandwiches, the next question is what you store and export so your product can actually use the data. A good export format is compact, explainable, and stable across detector versions.
| Export field | What it enables | Notes |
|---|---|---|
| sandwich_id + detector_version | Stable references and regression comparisons | Never reuse ids across versions without mapping |
| front_tx, victim_tx, back_tx | Audit trails and UI linking | Include tx_index and log_index for deterministic replay |
| pool, amm, token_in, token_out | Pool-level analytics and routing insights | Normalize token addresses and decimals |
| estimated_victim_harm | User warnings and harm aggregation | Store units and confidence separately |
| estimated_attacker_profit_lower_bound | Profit analysis and ranking | Conservative by design |
| confidence + reasons[] | Explainability and debugging | Short reason codes are better than long text |
| data_flags | Quality control | trace_available, token_behavior_unknown, pool_state_replayed |
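As a concrete shape for that export, here is one possible serializer matching the table. Field names mirror the table; everything else (argument order, JSON as the wire format) is an assumption:

```python
import json

def export_record(sandwich_id, detector_version, front_tx, victim_tx, back_tx,
                  pool, amm, harm, profit_lower_bound, confidence, reasons, data_flags):
    """Serialize one labeled sandwich as a stable, explainable export row."""
    return json.dumps({
        "sandwich_id": sandwich_id,
        "detector_version": detector_version,
        "front_tx": front_tx, "victim_tx": victim_tx, "back_tx": back_tx,
        "pool": pool, "amm": amm,
        "estimated_victim_harm": harm,
        "estimated_attacker_profit_lower_bound": profit_lower_bound,
        "confidence": confidence,
        "reasons": reasons,          # short reason codes, not prose
        "data_flags": data_flags,    # e.g. trace_available, pool_state_replayed
    }, sort_keys=True)
```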
Tools and workflow inside TokenToolHub
If you are building MEV analytics products, you usually need two parallel tracks: engineering (pipelines, decoders, storage) and analyst workflows (investigation, reporting, playbooks). TokenToolHub is built to support both sides of that loop:
- AI Learning Hub for structured knowledge on how AI can support detection, evaluation, and reporting workflows.
- Prompt Libraries for reusable prompts that turn raw labeled events into readable incident summaries, weekly reports, and investigation checklists.
- AI Crypto Tools for tooling discovery and workflow assembly when you are integrating data sources, analysis stacks, and monitoring.
- Subscribe to get updates as detection patterns evolve and new playbooks are published.
Conclusion: make your detector honest, then make it fast
MEV sandwich detection at scale is a test of engineering discipline. The best systems do not start with complicated ML. They start with clear definitions, pool-centric timelines, deterministic candidate generation, and verification that proves victim harm and attacker profit. Then they layer explainable scoring, versioning, and monitoring to survive changing market structure.
If you build this as a product feature, your biggest wins come from honesty: record data availability flags, downgrade confidence when traces are missing, and avoid overstated harm estimates. Your users will trust a conservative detector more than a noisy one.
Finally, keep the workflow repeatable. Revisit the prerequisite reading Building a market anomaly detector when you tune thresholds and evaluate drift, because the mechanics of detection quality are the same across domains. For ongoing playbooks and AI-assisted analyst workflows, use AI Learning Hub and Prompt Libraries, and you can Subscribe for updates.
FAQs
What is the most reliable signal for a true sandwich?
The combination of (1) bracketing structure in a pool timeline, (2) linkage between attacker legs, (3) measurable victim harm attributable to the front-leg, and (4) attacker profit after conservative gas and fee accounting. If any of these is missing, downgrade confidence.
Can you detect sandwiches without mempool data?
Yes. Many sandwiches can be detected purely from in-block ordering and pool-centric analysis. Mempool data helps explain how it happened, but it is not required to label the harm if you can verify structure, harm, and profit.
Why do false positives spike when you add aggregator support?
Aggregators produce multiple swaps in one transaction, often across several pools, and amounts can be transformed across hops. If you treat swap logs as independent trades without route context, you can misassign victim pools and misunderstand harm. Use traces when possible, and otherwise downgrade confidence.
How do you handle fee-on-transfer tokens?
Detect token behaviors and treat amount deltas from swap logs as insufficient. Prefer transfer logs and trace-derived recipients to compute real deltas. If you cannot validate deltas reliably, mark the case as uncertain or low confidence rather than forcing a high confidence label.
Should I use ML for sandwich detection?
Use deterministic logic first. It gives explainability and stable labels. ML is useful later for scoring borderline cases, clustering attacker behavior, or prioritizing investigations, but it should not replace harm and profit verification.
What should a user-facing warning say?
Keep it specific: identify the pool or route segment that appears heavily sandwiched, show recent occurrences, and suggest mitigations like reducing slippage, splitting trades, using protected routing when available, or trading during lower contention windows.
Where can I learn to build the evaluation loop for this detector?
Start with Building a market anomaly detector because it teaches the core evaluation mindset. Then use AI Learning Hub and Prompt Libraries to standardize your investigation and reporting playbooks.
References
Official documentation and reputable sources to deepen implementation details:
- Ethereum developer documentation
- Ethereum Improvement Proposals (EIPs)
- Uniswap documentation
- Flashbots documentation
- TokenToolHub: AI Learning Hub
- TokenToolHub: Prompt Libraries
Note: MEV behavior evolves quickly. If you ship detection to users, treat labels as probabilistic, version your rules, and monitor drift continuously.
