Indexing and Querying: The Graph, Subgraphs and GraphQL
From raw on chain events to queryable APIs: subgraph schemas, mappings, indexing economics, and production grade query patterns.
The Graph turns contract events and call results into a GraphQL API using a subgraph that declares a schema, a manifest of what to watch, and mapping functions that transform chain data into entities.
You can deploy to the decentralized network where indexers, curators, and delegators coordinate with GRT, or you can begin on hosted services while prototyping.
Scalable subgraphs are deterministic, reorg aware, and well modeled so queries stay fast and predictable.
1) What is a Subgraph?
A subgraph is a small indexing program that watches one or more smart contracts and converts their events and selected call results into a structured dataset that you query with GraphQL.
It runs inside The Graph node, which streams blocks from a chain, executes your mapping handlers when matching events appear, and materializes entities in a store.
Those entities are exposed through a GraphQL schema you control.
The execution model is event sourced: you never mutate chain state in place. Instead, your mapping receives a replayable event and updates entities accordingly.
If the network reorgs, the indexer reverts affected entities back to the last safe block and replays the canonical blocks. This is why handlers must be idempotent and deterministic.
- Deterministic means the same input always yields the same output; no external randomness or time based branching is allowed (see the sketch after this list).
- Reorg aware means you link every entity update to the block and the log index so rollbacks are safe and predictable.
- Well modeled means your entities match the questions your product needs to answer, with fields and indices that keep hot queries fast.
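To make these properties concrete, here is a minimal handler sketch. The `Transfer` event and `TransferRecord` entity bindings are hypothetical stand ins, not part of the schema built later in this section.

```typescript
// Minimal sketch of a deterministic, reorg-safe handler.
// The Transfer event and TransferRecord entity are hypothetical bindings.
import { Transfer } from "../../types/Token/Token" // hypothetical generated binding
import { TransferRecord } from "../../types/schema" // hypothetical entity

export function handleTransfer(e: Transfer): void {
  // OK: id derived from replayable event data, so a reorg replay
  // overwrites the same row instead of duplicating it
  let id = e.transaction.hash.toHex() + "-" + e.logIndex.toString()
  let rec = new TransferRecord(id)
  rec.from = e.params.from         // OK: event parameter
  rec.blockNumber = e.block.number // OK: block metadata ties the write to a block
  // NOT OK: Math.random(), Date.now(), HTTP fetches. None exist in the
  // mapping sandbox, and any such branching would break determinism.
  rec.save()
}
```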
2) Folder Layout and Core Files
A typical subgraph repository follows a simple and portable structure. You can scaffold it with the Graph CLI or assemble it by hand.
```
my-protocol-subgraph/
├─ abis/
│  ├─ ERC20.json
│  ├─ Factory.json
│  └─ Pool.json
├─ schema.graphql
├─ subgraph.yaml
├─ src/
│  ├─ mappings/
│  │  ├─ pool.ts
│  │  └─ factory.ts
│  └─ utils/
│     ├─ math.ts
│     └─ ids.ts
├─ tests/            # Matchstick tests
│  ├─ pool.test.ts
│  └─ factory.test.ts
├─ package.json
└─ tsconfig.json
```
The three most important files are the schema, the manifest, and the mappings.
The schema declares the entities your API will serve. The manifest declares which networks and contracts to index and which handlers to run.
The mappings are AssemblyScript functions that transform events into entity writes.
3) Schema Design and Entity Modeling
Your `schema.graphql` defines the types that clients can query. Each type annotated with `@entity` becomes a table in the store with an `id` primary key and a set of fields. Relations are expressed by referencing other entity ids. You can also mark reverse relations with `@derivedFrom` to let the indexer compute them automatically.
```graphql
# schema.graphql
# Entities for a simple swap protocol with pools, accounts, and trades

type Account @entity {
  id: ID! # lowercased address string
  createdAt: BigInt!
  tradesCount: Int!
  trades: [Trade!]! @derivedFrom(field: "trader")
  positions: [Position!]! @derivedFrom(field: "account")
}

type Pool @entity {
  id: ID! # pool address
  token0: Bytes!
  token1: Bytes!
  feeBps: Int!
  createdAtBlock: BigInt!
  createdAtTimestamp: BigInt!
  txCount: BigInt!
  volumeToken0: BigDecimal!
  volumeToken1: BigDecimal!
  liquidityToken0: BigDecimal!
  liquidityToken1: BigDecimal!
  swaps: [Trade!]! @derivedFrom(field: "pool")
}

type Trade @entity {
  id: ID! # txHash-logIndex
  pool: Pool!
  trader: Account!
  amountInRaw: BigInt!
  amountOutRaw: BigInt!
  amountIn: BigDecimal!  # scaled by decimals
  amountOut: BigDecimal! # scaled by decimals
  tokenIn: Bytes!
  tokenOut: Bytes!
  price: BigDecimal! # tokenOut per tokenIn
  blockNumber: BigInt!
  timestamp: BigInt!
  txHash: Bytes!
}

type Position @entity {
  id: ID! # account-pool
  account: Account!
  pool: Pool!
  shares: BigDecimal!
  deposited0: BigDecimal!
  deposited1: BigDecimal!
  withdrawn0: BigDecimal!
  withdrawn1: BigDecimal!
  updatedAt: BigInt!
}

# Snapshot for charts (sharded by day)
type PoolDayData @entity {
  id: ID! # poolId-<epoch day>, matching the day bucket built in the mapping
  pool: Pool!
  date: Int!
  volumeToken0: BigDecimal!
  volumeToken1: BigDecimal!
  txCount: BigInt!
  liquidityToken0: BigDecimal!
  liquidityToken1: BigDecimal!
}
```
A few design rules help keep schemas robust:
- Deterministic identifiers. Make every id derivable from the event payload. For trades, concatenate the transaction hash and the log index. For accounts, use the lowercased hex address. For composite entities like a position, join `account` and `pool` with a delimiter (see the helper sketch after this list).
- Block metadata. Persist `blockNumber`, `timestamp`, and `txHash` on state change entities. That makes time range filtering and audit trails straightforward.
- Raw and scaled values. Store raw on chain integers as `BigInt` and scaled decimals as `BigDecimal` (serialized as strings) so you can render human friendly numbers without repeated conversion in the client.
- Snapshots for charts. Real time dashboards should not aggregate thousands of trades per view. Write hourly and daily snapshots in your mapping and query them directly.
- Denormalize hot fields. Copy two or three frequently read attributes onto leaf entities to avoid deep joins. For example, cache `feeBps` or an asset symbol if you need it on every row in a table.
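The folder layout above reserves `src/utils/ids.ts` for exactly this job. A minimal sketch of what such helpers could look like, mirroring the id conventions just described (the helper names are illustrative):

```typescript
// src/utils/ids.ts -- a minimal sketch of deterministic id helpers.
// Names are illustrative; they mirror the conventions described above.
import { Address, BigInt, Bytes } from "@graphprotocol/graph-ts"

/** Lowercased hex address, used for Account and Pool ids */
export function addressId(addr: Address): string {
  return addr.toHex().toLowerCase()
}

/** Event-scoped id: <txHash>-<logIndex>, unique and stable across reorg replays */
export function eventId(txHash: Bytes, logIndex: BigInt): string {
  return txHash.toHex() + "-" + logIndex.toString()
}

/** Composite id for a position: <account>-<pool> */
export function positionId(account: Address, pool: Address): string {
  return addressId(account) + "-" + addressId(pool)
}
```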
4) Mappings and Deterministic Transforms
Mappings are AssemblyScript functions that the indexer calls when an event you declared in the manifest fires.
The functions transform the event into entity writes using the entity API. The environment exposes block metadata, transaction hash, and event parameters.
Mappings must be deterministic. They cannot generate random numbers, fetch over HTTP, or read the host clock. They can make read only contract calls through generated ABI bindings; those calls execute against the block being indexed, so the results stay replayable.
Manifest with event handlers and templates
```yaml
# subgraph.yaml
specVersion: 0.0.6
schema:
  file: ./schema.graphql
dataSources:
  - kind: ethereum
    name: Factory
    network: mainnet
    source:
      address: "0xFactoryAddress..."
      abi: Factory
      startBlock: 17350000
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.7
      language: wasm/assemblyscript
      abis:
        - name: Factory
          file: ./abis/Factory.json
        - name: Pool
          file: ./abis/Pool.json
        - name: ERC20
          file: ./abis/ERC20.json
      entities:
        - Pool
      eventHandlers:
        - event: PoolCreated(indexed address,indexed address,indexed address,uint24)
          handler: handlePoolCreated
      file: ./src/mappings/factory.ts
templates:
  - name: PoolTemplate
    kind: ethereum
    network: mainnet
    source:
      abi: Pool
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.7
      language: wasm/assemblyscript
      abis:
        - name: Pool
          file: ./abis/Pool.json
        - name: ERC20
          file: ./abis/ERC20.json
      entities:
        - Trade
        - PoolDayData
        - Position
      eventHandlers:
        # trader, tokenIn, tokenOut, amountIn, amountOut -- matches the params read in handleSwap
        - event: Swap(indexed address,indexed address,indexed address,uint256,uint256)
          handler: handleSwap
        - event: Mint(indexed address,uint256,uint256)
          handler: handleMint
        - event: Burn(indexed address,uint256,uint256)
          handler: handleBurn
      file: ./src/mappings/pool.ts
```
Mapping helpers
Collect shared math and id helpers in a `src/utils` folder. The `@graphprotocol/graph-ts` library provides `BigInt` and `BigDecimal` for safe integer and fixed point decimal math.
```typescript
// src/utils/math.ts
import { BigDecimal, BigInt } from "@graphprotocol/graph-ts"

export const ZERO_BI = BigInt.zero()
export const ZERO_BD = BigDecimal.fromString("0")
export const ONE_BI = BigInt.fromI32(1)

/** Scale a raw integer amount by token decimals into BigDecimal */
export function scale(amount: BigInt, decimals: i32): BigDecimal {
  let scaleFactor = BigDecimal.fromString("1" + "0".repeat(decimals))
  return amount.toBigDecimal().div(scaleFactor)
}

/** Safe division with zero guard */
export function div(a: BigDecimal, b: BigDecimal): BigDecimal {
  return b.equals(ZERO_BD) ? ZERO_BD : a.div(b)
}
```
Factory mapping with dynamic data source creation
```typescript
// src/mappings/factory.ts
import { BigInt } from "@graphprotocol/graph-ts"
import { PoolCreated } from "../../types/Factory/Factory"
import { Pool as PoolEntity } from "../../types/schema"
import { PoolTemplate } from "../../types/templates"
import { ZERO_BD } from "../utils/math"

export function handlePoolCreated(e: PoolCreated): void {
  let id = e.params.pool.toHex().toLowerCase()
  let pool = new PoolEntity(id)
  pool.token0 = e.params.token0
  pool.token1 = e.params.token1
  pool.feeBps = e.params.fee // uint24 params are generated as i32
  pool.createdAtBlock = e.block.number
  pool.createdAtTimestamp = e.block.timestamp
  pool.txCount = BigInt.zero()
  pool.volumeToken0 = ZERO_BD
  pool.volumeToken1 = ZERO_BD
  pool.liquidityToken0 = ZERO_BD
  pool.liquidityToken1 = ZERO_BD
  pool.save()

  // Start indexing this new pool from this block
  PoolTemplate.create(e.params.pool)
}
```
Pool mapping with swap handler and snapshots
```typescript
// src/mappings/pool.ts
import { BigInt, Bytes } from "@graphprotocol/graph-ts"
import { Swap, Mint, Burn } from "../../types/Pool/Pool"
import { Trade, Account, Pool, PoolDayData, Position } from "../../types/schema"
import { ZERO_BD, scale, div } from "../utils/math"

/** Build an id like <txHash>-<logIndex> */
function eventId(tx: Bytes, logIndex: BigInt): string {
  return tx.toHex() + "-" + logIndex.toString()
}

/** Build a day bucket id like <poolId>-<epoch day> */
function poolDayId(poolId: string, timestamp: BigInt): string {
  let day = timestamp.toI32() / 86400
  return poolId + "-" + day.toString()
}

export function handleSwap(e: Swap): void {
  let poolId = e.address.toHex().toLowerCase()
  let id = eventId(e.transaction.hash, e.logIndex)

  let pool = Pool.load(poolId)
  if (pool == null) return

  // Account
  let acctId = e.params.trader.toHex().toLowerCase()
  let acct = Account.load(acctId)
  if (acct == null) {
    acct = new Account(acctId)
    acct.createdAt = e.block.timestamp
    acct.tradesCount = 0
  }

  // Compute scaled amounts and price
  // Example assumes 18 decimals for both tokens; in a real subgraph, query decimals once and cache
  let amountIn = scale(e.params.amountIn, 18)
  let amountOut = scale(e.params.amountOut, 18)
  let price = div(amountOut, amountIn)

  // Trade entity
  let t = new Trade(id)
  t.pool = poolId
  t.trader = acct.id
  t.amountInRaw = e.params.amountIn
  t.amountOutRaw = e.params.amountOut
  t.amountIn = amountIn
  t.amountOut = amountOut
  t.tokenIn = e.params.tokenIn
  t.tokenOut = e.params.tokenOut
  t.price = price
  t.blockNumber = e.block.number
  t.timestamp = e.block.timestamp
  t.txHash = e.transaction.hash
  t.save()

  // Update pool aggregates
  pool.txCount = pool.txCount.plus(BigInt.fromI32(1))
  let inIsToken0 = e.params.tokenIn == pool.token0
  if (inIsToken0) {
    pool.volumeToken0 = pool.volumeToken0.plus(amountIn)
    pool.volumeToken1 = pool.volumeToken1.plus(amountOut)
  } else {
    pool.volumeToken1 = pool.volumeToken1.plus(amountIn)
    pool.volumeToken0 = pool.volumeToken0.plus(amountOut)
  }
  pool.save()

  // Snapshot: accumulate per-day volume rather than copying lifetime totals
  let dayId = poolDayId(poolId, e.block.timestamp)
  let snap = PoolDayData.load(dayId)
  if (snap == null) {
    snap = new PoolDayData(dayId)
    snap.pool = poolId
    snap.date = e.block.timestamp.toI32() / 86400
    snap.txCount = BigInt.zero()
    snap.volumeToken0 = ZERO_BD
    snap.volumeToken1 = ZERO_BD
  }
  snap.txCount = snap.txCount.plus(BigInt.fromI32(1))
  if (inIsToken0) {
    snap.volumeToken0 = snap.volumeToken0.plus(amountIn)
    snap.volumeToken1 = snap.volumeToken1.plus(amountOut)
  } else {
    snap.volumeToken1 = snap.volumeToken1.plus(amountIn)
    snap.volumeToken0 = snap.volumeToken0.plus(amountOut)
  }
  snap.liquidityToken0 = pool.liquidityToken0
  snap.liquidityToken1 = pool.liquidityToken1
  snap.save()

  // Account counter
  acct.tradesCount = acct.tradesCount + 1
  acct.save()
}

export function handleMint(e: Mint): void {
  // Update position and pool liquidity
  let poolId = e.address.toHex().toLowerCase()
  let acctId = e.params.owner.toHex().toLowerCase()
  let posId = acctId + "-" + poolId

  let pos = Position.load(posId)
  if (pos == null) {
    pos = new Position(posId)
    pos.account = acctId
    pos.pool = poolId
    pos.shares = ZERO_BD
    pos.deposited0 = ZERO_BD
    pos.deposited1 = ZERO_BD
    pos.withdrawn0 = ZERO_BD
    pos.withdrawn1 = ZERO_BD
  }
  pos.deposited0 = pos.deposited0.plus(scale(e.params.amount0, 18))
  pos.deposited1 = pos.deposited1.plus(scale(e.params.amount1, 18))
  pos.updatedAt = e.block.timestamp
  pos.save()

  let pool = Pool.load(poolId)
  if (pool != null) {
    pool.liquidityToken0 = pool.liquidityToken0.plus(scale(e.params.amount0, 18))
    pool.liquidityToken1 = pool.liquidityToken1.plus(scale(e.params.amount1, 18))
    pool.save()
  }
}

export function handleBurn(e: Burn): void {
  let poolId = e.address.toHex().toLowerCase()
  let acctId = e.params.owner.toHex().toLowerCase()
  let posId = acctId + "-" + poolId

  let pos = Position.load(posId)
  if (pos != null) {
    pos.withdrawn0 = pos.withdrawn0.plus(scale(e.params.amount0, 18))
    pos.withdrawn1 = pos.withdrawn1.plus(scale(e.params.amount1, 18))
    pos.updatedAt = e.block.timestamp
    pos.save()
  }

  let pool = Pool.load(poolId)
  if (pool != null) {
    pool.liquidityToken0 = pool.liquidityToken0.minus(scale(e.params.amount0, 18))
    pool.liquidityToken1 = pool.liquidityToken1.minus(scale(e.params.amount1, 18))
    pool.save()
  }
}
```
The pattern above covers the majority of decentralized exchange or lending subgraphs: a factory emits a creation event, you index each new instance with a template, and you create entities for each business event.
5) Decentralized Network Roles and Costs
The Graph decentralized network matches consumers who want queries with indexers who run Graph Node at scale.
Three roles coordinate with the GRT token:
- Indexers stake GRT, select subgraphs to index, and serve queries. Their rewards and fees depend on performance and stake.
- Curators signal which subgraphs are valuable by depositing GRT on bonding curves, guiding indexer attention and earning a portion of query fees.
- Delegators delegate GRT to indexers and share rewards without running infrastructure.
As a developer you have two main paths. During early development, you may deploy to a hosted service or a private Graph Node to iterate quickly.
Once the schema and mappings stabilize, publish to the decentralized network so multiple indexers can serve your traffic. You pay per query through a gateway or directly to indexers.
Performance improves with a diversity of providers, and users keep availability even if one indexer has issues.
6) Query Patterns and Performance
GraphQL lets you ask for exactly the fields you need. Performance depends on query shape, filters, and pagination strategy.
Below are patterns that keep things fast and safe in production.
Pagination strategies
For small lists, use `first` and `skip`. For long histories, avoid large `skip` values, because queries slow down as the offset grows. Instead, paginate by a cursor field you control, such as `id`, `timestamp`, or `blockNumber`, with `_gt` filters.
```graphql
# Cursor pagination by timestamp and id
query TradesAfter($pool: String!, $ts: BigInt!, $lastId: String!) {
  trades(
    where: { pool: $pool, timestamp_gte: $ts, id_gt: $lastId }
    orderBy: id
    orderDirection: asc
    first: 1000
  ) {
    id
    amountIn
    amountOut
    timestamp
  }
}
```
Server side filtering
Use the `where` filter to narrow results as much as possible; avoid fetching thousands of rows to filter in the client. Many filter operators are available: `_in`, `_not_in`, `_contains`, `_not_contains`, `_gt`, `_lt`, and friends.
```graphql
# All positions for an account in a set of pools
query($acct: String!, $pools: [String!]!) {
  positions(
    where: { account: $acct, pool_in: $pools }
    orderBy: updatedAt
    orderDirection: desc
  ) {
    id
    pool { id }
    shares
    updatedAt
  }
}
```
Historical reads with the block argument
The `block` argument lets you query entity state as of a specific block number. This is essential for fair user interfaces and time travel analytics.
```graphql
# Past state at a specific block height
query($block: Int!, $pool: String!) {
  pool(id: $pool, block: { number: $block }) {
    id
    volumeToken0
    volumeToken1
    txCount
  }
}
```
Counters and snapshots instead of ad hoc aggregation
Subgraphs do not offer SQL aggregates. To serve totals and charts, maintain running counters and time bucketed snapshots in mappings.
Query those entities directly rather than aggregating thousands of leaf rows per request.
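For example, a 30 day chart can be served by one shallow query against `PoolDayData` rather than thousands of `Trade` rows. A minimal fetch sketch, assuming the gateway endpoint placeholder used elsewhere in this section:

```typescript
// Read 30 daily snapshots for a pool chart: one shallow query, no client-side aggregation.
// The endpoint placeholder is illustrative; substitute your gateway URL and key.
const endpoint = "https://gateway.thegraph.com/api/<key>/subgraphs/id/<subgraph-id>"

const POOL_DAYS = `
  query PoolDays($pool: String!, $fromDay: Int!) {
    poolDayDatas(
      where: { pool: $pool, date_gte: $fromDay }
      orderBy: date
      orderDirection: asc
      first: 30
    ) { date volumeToken0 volumeToken1 txCount }
  }
`

async function fetchPoolChart(pool: string): Promise<any[]> {
  // Same epoch-day bucketing as the mapping: floor(unix seconds / 86400)
  const fromDay = Math.floor(Date.now() / 1000 / 86400) - 30
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ query: POOL_DAYS, variables: { pool, fromDay } }),
  })
  const json = await res.json()
  return json.data.poolDayDatas
}
```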
Client side code examples
Any GraphQL client works. Here is a small example with fetch and with Apollo Client to illustrate pagination and retries.
```typescript
// Simple fetch with pagination by id
const endpoint = "https://gateway.thegraph.com/api/<key>/subgraphs/id/<subgraph-id>"

const Q = `
  query TradesAfter($pool: String!, $lastId: String!) {
    trades(where: { pool: $pool, id_gt: $lastId }, orderBy: id, orderDirection: asc, first: 1000) {
      id
      amountIn
      amountOut
      timestamp
    }
  }
`

async function* streamTrades(pool) {
  let last = ""
  while (true) {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ query: Q, variables: { pool, lastId: last } }),
    }).then(r => r.json())
    const rows = res.data.trades
    if (rows.length === 0) return
    for (const t of rows) yield t
    last = rows[rows.length - 1].id
  }
}

// Apollo Client with a real retry link
import { ApolloClient, InMemoryCache, HttpLink, ApolloLink } from "@apollo/client"
import { RetryLink } from "@apollo/client/link/retry"

const retry = new RetryLink({ attempts: { max: 3 } })
const client = new ApolloClient({
  link: ApolloLink.from([retry, new HttpLink({ uri: endpoint })]),
  cache: new InMemoryCache(),
})
```
7) Advanced Features and Multi Chain
As your subgraph matures, you may need to index multiple networks, read immutable config through call handlers, or snapshot at block intervals.
The following techniques cover those needs.
Call handlers for immutable config
Call handlers let you index the results of specific contract calls. They are most valuable for immutable configuration such as token decimals or pool parameters. Avoid using call handlers to read volatile state in tight loops.
```yaml
# subgraph.yaml (snippet)
...
mapping:
  ...
  callHandlers:
    - function: initialize(address,address,uint24)
      handler: handleInitialize
```
```typescript
// src/mappings/pool.ts
import { InitializeCall } from "../../types/Pool/Pool"
import { Pool } from "../../types/schema"

export function handleInitialize(call: InitializeCall): void {
  let poolId = call.to.toHex().toLowerCase()
  let pool = Pool.load(poolId)
  if (pool == null) return
  pool.feeBps = call.inputs.fee // uint24 inputs are generated as i32
  pool.save()
}
```
Block handlers for periodic tasks
If you need periodic tasks such as rolling snapshots or sanity checks, a block handler can run every block or at a defined interval.
Use sparingly since it increases indexing cost.
```yaml
# subgraph.yaml (snippet)
...
mapping:
  ...
  blockHandlers:
    - handler: handleBlock
      filter:
        kind: polling
        every: 250 # every 250 blocks
```
```typescript
// src/mappings/pool.ts
import { ethereum } from "@graphprotocol/graph-ts"

export function handleBlock(block: ethereum.Block): void {
  // Example: no op heartbeats, or rotate a small cache
}
```
Multi chain deployments
For multi chain apps you can deploy one subgraph per network or include multiple data sources in one manifest.
For a unified view across chains, run a tiny aggregation service that fans out queries per chain and merges results by a common key like account id.
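A hedged sketch of such a fan out service, assuming one deployment of the same schema per chain and illustrative endpoint placeholders:

```typescript
// Fan one query out to per-chain deployments and merge rows by a common key.
// Endpoints are placeholders; assumes each deployment shares the schema above.
const endpoints: Record<string, string> = {
  mainnet: "https://gateway.thegraph.com/api/<key>/subgraphs/id/<mainnet-id>",
  arbitrum: "https://gateway.thegraph.com/api/<key>/subgraphs/id/<arbitrum-id>",
}

const POSITIONS = `
  query($acct: String!) {
    positions(where: { account: $acct }) { id pool { id } shares updatedAt }
  }
`

async function positionsAcrossChains(acct: string) {
  const results = await Promise.all(
    Object.entries(endpoints).map(async ([chain, url]) => {
      const res = await fetch(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ query: POSITIONS, variables: { acct } }),
      })
      const json = await res.json()
      // Tag each row with its chain so merged ids stay unambiguous
      return json.data.positions.map((p: any) => ({ ...p, chain }))
    })
  )
  return results.flat() // unified view keyed by (chain, id)
}
```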
8) Observability, Testing, and Versioning
Treat your subgraph like a service with releases, monitoring, and tests. Stability here translates directly into a better product and fewer support escalations.
- Metrics to track: head block height, lag versus the chain head, average mapping duration, handler error counts, and query error rates at your gateway.
- Alerts: notify when lag exceeds your tolerance, when mapping exceptions spike, or when a data source stalls at a block for more than a short window (a lag check sketch follows this list).
- Semantic versions: publish versions such as `v1.3.0` with a changelog. Pinning exact contract addresses and ABIs keeps deployments reproducible.
- Breaking changes: run the old and the new schema in parallel during client migration. If you need a one time backfill, write a short script to reprocess blocks or add a migration handler guarded by a block filter.
- Backfills and reindexing: when logic changes, you may need to reindex from scratch. Keep manifests tight, with `startBlock` near contract creation, so reindexing stays fast.
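As one concrete example of the lag alert above, Graph Node exposes an indexing status API on its index node endpoint. A sketch of a periodic lag check, assuming a self hosted node whose status endpoint listens on port 8030; the names are placeholders:

```typescript
// Sync-lag alert against Graph Node's indexing status API.
// Assumes a self-hosted index node (status endpoint on port 8030); names are placeholders.
const STATUS_ENDPOINT = "http://graph-node.internal:8030/graphql"

const STATUS_QUERY = `
  {
    indexingStatusForCurrentVersion(subgraphName: "my-org/my-protocol-subgraph") {
      synced
      health
      chains { network chainHeadBlock { number } latestBlock { number } }
    }
  }
`

async function checkLag(maxLagBlocks: number): Promise<void> {
  const res = await fetch(STATUS_ENDPOINT, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ query: STATUS_QUERY }),
  })
  const status = (await res.json()).data.indexingStatusForCurrentVersion
  const chain = status.chains[0]
  const lag = Number(chain.chainHeadBlock.number) - Number(chain.latestBlock.number)
  if (lag > maxLagBlocks || status.health !== "healthy") {
    console.error(`ALERT: lag=${lag} blocks, health=${status.health}`) // wire to your pager
  }
}
```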
Matchstick tests
Unit tests for mappings improve confidence and speed up iteration. The Matchstick framework lets you construct fake events, call handlers, and assert entity outcomes.
```typescript
// tests/pool.test.ts (concept)
import { test, assert, newMockEvent } from "matchstick-as/assembly/index"
import { handleSwap } from "../src/mappings/pool"
import { Swap } from "../generated/Pool/Pool"
import { BigInt, Address, ethereum, Bytes } from "@graphprotocol/graph-ts"

test("swap creates trade and updates counters", () => {
  let e = newMockEvent()
  e.address = Address.fromString("0xPool...")
  e.block.number = BigInt.fromI32(100)
  e.block.timestamp = BigInt.fromI32(1700000000)
  e.transaction.hash = Bytes.fromHexString("0xabc...") as Bytes
  e.logIndex = BigInt.fromI32(0)

  let swap = new Swap(
    e.address,
    e.logIndex,
    e.transactionLogIndex,
    e.logType,
    e.block,
    e.transaction,
    e.parameters
  )
  // set Swap params as needed...

  handleSwap(swap)

  // assert entities persisted...
})
```
9) Reorgs, Determinism, and Safety
Chains reorg. Contracts upgrade. Data providers hiccup. Robust subgraphs handle these realities gracefully.
- Reorg handling: Always base entity identity on transaction hash and log index for event driven entities. Avoid external sequence numbers that could drift during replays.
- Block finality windows: If you represent live values like TVL on a landing page, consider a small delay to allow the indexer to settle on a stable head during volatile periods.
- Decimals and overflow: Use integer scaling for prices and amounts. Avoid casting large integers to 64 bit numbers in AssemblyScript. Prefer `BigInt` and `BigDecimal` everywhere.
- Call handler safety: Keep call handlers for immutable config. If you must read view functions that can change, cache the value once per pool instead of re-reading it in tight loops.
- Derived relations: `@derivedFrom` fields are computed by the store. Do not write them from mappings or you may create inconsistent rows.
- Entity churn control: For high volume contracts, avoid creating a new entity for every micro event that is not relevant to queries. Collapse micro events into a single aggregated row per block if possible (a sketch follows this list).
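For the last point, one workable shape is a single aggregate row keyed by contract address and block number. The `BlockAggregate` entity here is hypothetical and not part of the schema above:

```typescript
// Collapse micro events into one aggregated row per contract per block.
// BlockAggregate is a hypothetical entity (id, blockNumber, count), not in the schema above.
import { BigInt, ethereum } from "@graphprotocol/graph-ts"
import { BlockAggregate } from "../../types/schema"

export function recordMicroEvent(e: ethereum.Event): void {
  // One row per (contract, block): replay-safe and bounded in cardinality
  let id = e.address.toHex() + "-" + e.block.number.toString()
  let agg = BlockAggregate.load(id)
  if (agg == null) {
    agg = new BlockAggregate(id)
    agg.blockNumber = e.block.number
    agg.count = BigInt.zero()
  }
  agg.count = agg.count.plus(BigInt.fromI32(1))
  agg.save()
}
```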
10) Production Playbook and Migration Tips
The last step is turning a working subgraph into a reliable service. The following checklists capture what experienced teams do before launch and during operations.
Pre launch checklist
- Confirm that the schema answers every screen and report with one or two shallow queries. If a dashboard needs three or more deep joins, adjust the model.
- Tighten `startBlock` values to just after contract deployment. This shortens initial sync and any reindex cycles.
- Cache token decimals and symbols once per address to avoid repeated reads (see the cache sketch after this checklist). Expose both raw and scaled amounts on entities that the front end renders often.
- Backfill PoolDayData and similar snapshot entities as your mappings observe events. Ensure you do not rebuild old days on every new event.
- Create browser and server side examples demonstrating cursor pagination, historical reads with the block argument, and a total counter read for TVL.
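A sketch of the decimals and symbol cache from the checklist above. The `Token` entity and the generated binding path are illustrative; the `try_` call pattern is how generated bindings surface reverts without aborting the mapping:

```typescript
// Per-address token metadata cache. Token is a hypothetical entity
// (id, decimals, symbol), not part of the schema above.
import { Address } from "@graphprotocol/graph-ts"
import { ERC20 } from "../../types/Pool/ERC20" // generated binding; path illustrative
import { Token } from "../../types/schema"     // hypothetical entity

export function getOrCreateToken(addr: Address): Token {
  let id = addr.toHex().toLowerCase()
  let token = Token.load(id)
  if (token == null) {
    token = new Token(id)
    let erc20 = ERC20.bind(addr)
    // try_ variants return a reverted flag instead of aborting the mapping
    let dec = erc20.try_decimals()
    token.decimals = dec.reverted ? 18 : dec.value
    let sym = erc20.try_symbol()
    token.symbol = sym.reverted ? "UNKNOWN" : sym.value
    token.save() // subsequent lookups hit the store, not the chain
  }
  return token
}
```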
Operational runbooks
- Stuck sync: If the indexer stalls at a block, check node connectivity, ABI mismatches, and mapping exceptions. Deploy a quick patch and reindex if necessary.
- Reorg storm: If a chain is unstable, you may temporarily display values as of a slightly older block and label the page with the last synchronized height.
- Contract upgrade: When a protocol deploys a new factory or pool version, add a new data source with a higher start block and keep both live until flows migrate.
- Breaking schema change: Publish
v2
, keepv1
for a defined window, and add a banner to client apps prompting users to upgrade. Archivev1
after adoption. - Cost spikes: If gateway fees rise due to a traffic event, cache hot queries with a reverse proxy and aggressively limit query fields to what you render.
Client side ergonomics
Adopt persisted queries or a thin server that validates incoming GraphQL and forwards only approved shapes to indexers. This protects you from accidental heavy queries and simplifies caching by making payloads stable.
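A minimal sketch of the allow list idea, assuming a thin server in front of your gateway; the endpoint and query ids are illustrative:

```typescript
// Thin allow-list proxy: clients send a query id, the server forwards only known shapes.
// Endpoint and query ids are illustrative; wire this into your HTTP framework of choice.
const UPSTREAM = "https://gateway.thegraph.com/api/<key>/subgraphs/id/<subgraph-id>"

// The only shapes this proxy will ever forward upstream
const APPROVED: Record<string, string> = {
  tradesAfter: `query($pool: String!, $lastId: String!) {
    trades(where: { pool: $pool, id_gt: $lastId }, orderBy: id, orderDirection: asc, first: 1000) {
      id amountIn amountOut timestamp
    }
  }`,
}

export async function forward(queryId: string, variables: unknown): Promise<unknown> {
  const query = APPROVED[queryId]
  if (!query) throw new Error("unknown query id") // reject ad hoc shapes outright
  const res = await fetch(UPSTREAM, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ query, variables }),
  })
  return res.json() // stable payloads make caching by query id straightforward
}
```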
Quick check
- What are the three core parts of a subgraph and what does each do?
- Why are deterministic ids like `txHash-logIndex` preferred for event driven entities?
- How can you query a past state of an entity, and why would you do it?
- When would you choose snapshots over client side aggregation, and why?
- What are two ways to paginate long histories without using large `skip` values?
Show answers
- Schema defines entities and relations, manifest declares contracts, events, and handlers with start blocks, and mappings transform events and calls into entity writes.
- They ensure uniqueness and stability across replays and reorgs so you never duplicate or miss a record when blocks reorganize.
- Use the GraphQL `block` argument, for example `positions(block: { number: N })`, to reproduce the state at a point in time for fair comparisons or historical analytics.
- Choose snapshots when your UI needs totals or charts over many events. Snapshots reduce query cost and avoid downloading massive result sets to sum in the browser.
- Paginate by cursor on a monotonic field such as `id` or `timestamp` using `_gt` filters, or paginate by block ranges if you store block numbers on entities.
Go deeper
- Design lectures: event driven ETL, snapshot strategies, denormalization trade offs, and idempotent mapping patterns.
- Operations lectures: indexer selection and query budgeting, cache keys for GraphQL, multi region gateways, and sync lag service level objectives.
- Ecosystem lectures: alternatives and complements such as custom ETL, Subsquid, Dune, and Reservoir. When to combine subgraphs with direct node reads and how to stitch multi chain data.
- Testing labs: write Matchstick tests that simulate swaps, mints, and burns, verify counters and snapshot updates, and assert determinism under replays.
Next: decentralized storage with IPFS, Arweave, and Filecoin.