Decentralized Storage: IPFS, Arweave, and Filecoin
Content addressing, permanence versus leases, gateways, pinning, proofs, and production patterns for NFTs, dapps, analytics, and archives.
IPFS is a content-addressed peer-to-peer layer that gives you CIDs and global retrieval; availability depends on pinning and gateways.
Arweave offers “permanent” storage by funding long-term replication up front; content is addressed by transaction IDs and lives on the Permaweb.
Filecoin is a storage market with cryptographic proofs (replication and spacetime) where providers seal data and get paid over time; retrieval is a separate market.
In practice: publish assets to IPFS, back them with pinning and Filecoin deals, and mirror critical artifacts to Arweave for long-term guarantees. Verify hashes on chain, never rely solely on a single HTTP gateway, and plan for naming, encryption, and compliance.
1) IPFS: Content Addressing and Pinning
IPFS (InterPlanetary File System) is a distributed file system built on content addressing. You do not ask “give me the file at https://host/path”; you ask “give me the block whose hash is Qm... or bafy...”.
- CID (Content ID): a self-describing identifier that includes the hash function and digest (plus codec). CIDv0 is Base58 with SHA-256; CIDv1 is multibase (often Base32) with multicodec and multihash. The content itself determines the address.
- DAG and UnixFS: files are chunked into blocks that form a Merkle DAG. UnixFS defines how to represent files and directories on that DAG (with chunking, links, and metadata).
- IPLD: a data model for Merkle-linked data across codecs (DAG-CBOR, DAG-JSON). It lets you treat many data structures as linkable graphs under the same addressing scheme.
- Retrieval: nodes locate providers through a DHT and exchange blocks over protocols such as Bitswap and GraphSync. Any peer that has the block can serve it; gateways just speak HTTP on top.
- Pinning: content addressing proves integrity, not availability. If no peers pin your CID, the network will eventually forget it. Pinning tells nodes to retain blocks; you can run your own pinning node or use a service.
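To make the CID layout concrete, here is a minimal sketch that builds and decodes a CIDv1 for a single raw block using only the Python standard library. It assumes the raw codec (0x55) and SHA-256 (0x12), and it relies on every prefix value being below 128 so each varint fits in one byte. Real IPFS imports chunk larger files into a DAG, so this will not reproduce the CID of a multi-block file; the function names are illustrative.

```python
import base64
import hashlib

RAW_CODEC = 0x55   # multicodec code for raw bytes
SHA2_256 = 0x12    # multihash code for SHA-256


def cid_v1_raw(data: bytes) -> str:
    """Build a CIDv1 (raw codec, SHA-256) for a single block of bytes."""
    digest = hashlib.sha256(data).digest()
    # Layout: version | codec | hash function | digest length | digest.
    # All four prefix values are < 128, so each varint is a single byte.
    cid_bytes = bytes([0x01, RAW_CODEC, SHA2_256, len(digest)]) + digest
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" is the multibase prefix for lowercase Base32


def decode_cid_v1(cid: str) -> dict:
    """Split a Base32 CIDv1 with single-byte prefixes into its parts."""
    if not cid.startswith("b"):
        raise ValueError("expected a Base32 (multibase 'b') CIDv1")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore padding for the stdlib decoder
    raw = base64.b32decode(body)
    version, codec, hash_fn, length = raw[0], raw[1], raw[2], raw[3]
    digest = raw[4:]
    if version != 1 or length != len(digest):
        raise ValueError("unsupported or malformed CID")
    return {"version": version, "codec": codec, "hash_fn": hash_fn, "digest": digest}
```

CIDs produced this way start with bafkrei; a dag-pb root (codec 0x70) encoded with the same scheme starts with bafybei, which is why those two prefixes are so common.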
CAR files and deterministic imports
CAR (Content Addressable aRchive) bundles blocks in a portable file. Importing a CAR reproduces the exact CIDs, which is perfect for reproducible builds, NFT mints, and audits.
Pinning strategies
- Self-hosted node: full control, best when you already run infra. Pair with IPFS Cluster for replication and sharding across nodes.
- Pinning services: third-party providers that keep your pins available and often back them with Filecoin deals. Useful for teams that prefer managed storage.
- Hybrid: pin critical CIDs on your nodes and also via a provider. Replicate popular content closer to users through caching gateways or CDNs.
2) Arweave: Permaweb and Pay-Once
Arweave targets permanent storage: you pay up front to fund long-term replication. Data is addressed by a transaction ID and served through the Permaweb (content over HTTP with immutable back-ends).
- Economic model: an endowment-like mechanism funds ongoing storage costs while technology trends reduce per-byte costs. Users pay once; miners are incentivized to keep data available.
- Proofs: miners provide random access proofs that they can retrieve chunks from random past blocks. This ties rewards to persistent data availability, not just “having space.”
- Transactions and bundles: each upload is a transaction. Bundlers let you batch many small files into one on-chain commit for lower overhead and better throughput.
- Permaweb apps: web apps can live entirely on Arweave: HTML, JS, CSS, and assets are all immutable. You can version by uploading new transactions and linking forward pointers in metadata.
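The pay-once intuition can be illustrated with a geometric series. The sketch below is not Arweave's actual pricing formula; it only assumes that the yearly cost of storing a fixed amount of data falls by a constant fraction each year. The function names and the decline rate are hypothetical.

```python
def endowment_for_horizon(first_year_cost: float, annual_decline: float, years: int) -> float:
    """Sum of yearly storage costs when cost falls by a fixed fraction each year."""
    if not 0 < annual_decline < 1:
        raise ValueError("annual_decline must be between 0 and 1")
    return sum(first_year_cost * (1 - annual_decline) ** t for t in range(years))


def endowment_perpetual(first_year_cost: float, annual_decline: float) -> float:
    """Limit of the series above as the horizon grows without bound."""
    if not 0 < annual_decline < 1:
        raise ValueError("annual_decline must be between 0 and 1")
    return first_year_cost / annual_decline
```

Under these assumptions the total stays finite: with a 50% yearly decline, storing data forever costs twice the first year's bill. A slower decline raises the multiple, which is why the assumed rate matters so much to any endowment design.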
Arweave IDs versus CIDs
Arweave uses transaction IDs (base64url-like strings) rather than multihash CIDs. Both are content-addressed in spirit, but you should not mix formats. If you mirror IPFS content to Arweave, store a manifest that maps CID to Arweave ID so clients can fail over safely.
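A mirror manifest of this kind could be sketched as follows. The JSON layout ("version", "mirrors") is an assumption made for illustration, not a standard, and the regular expressions only check the general shape of each identifier (43 base64url characters for an Arweave ID; CIDv0 or Base32 CIDv1 for a CID) rather than fully validating it.

```python
import json
import re

# Assumed identifier shapes, used only as a sanity check.
ARWEAVE_ID = re.compile(r"^[A-Za-z0-9_-]{43}$")
CID = re.compile(r"^(Qm[1-9A-HJ-NP-Za-km-z]{44}|b[a-z2-7]{20,})$")


def build_manifest(pairs: dict) -> str:
    """Validate CID -> Arweave ID pairs and serialize them deterministically."""
    for cid, ar_id in pairs.items():
        if not CID.match(cid):
            raise ValueError(f"not a recognized CID: {cid}")
        if not ARWEAVE_ID.match(ar_id):
            raise ValueError(f"not a recognized Arweave ID: {ar_id}")
    return json.dumps({"version": 1, "mirrors": pairs}, sort_keys=True, indent=2)


def arweave_id_for(manifest_json: str, cid: str):
    """Return the mirrored Arweave ID for a CID, or None if it is not mirrored."""
    return json.loads(manifest_json)["mirrors"].get(cid)
```

Keeping the two identifier types in separate, validated fields prevents a client from handing an Arweave ID to an IPFS gateway or the reverse.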
Costs and latency
Uploads cost more up front than pinning, but you avoid ongoing invoices. Retrieval is generally fast from public gateways, but heavy traffic should use multiple gateways or caches. Treat Permaweb URLs like immutable asset URLs.
3) Filecoin: Storage and Retrieval Markets
Filecoin is a decentralized market where storage providers make deals to store your data and prove it continuously. It complements IPFS (which focuses on addressing and retrieval) with a durable economic layer.
- Deals: you negotiate price, duration, and replication factor with providers. Data is sealed into sectors; providers post Proof-of-Replication initially and Proof-of-Spacetime periodically to earn FIL.
- Retrieval: separate from storage. Retrieval providers specialize in serving data quickly for a fee; alternatively you serve from your own IPFS pins or HTTP caches for hot paths.
- Filecoin Plus: a data-cap program that gives verified clients better economics for public-good datasets, encouraging providers to accept them.
- FVM: the Filecoin Virtual Machine enables on-chain programmability for storage markets: automated renewals, escrowed deals, verification, and data DAOs.
- Bridged services: many IPFS pinning services write backups as Filecoin deals under the hood. This gives you hot IPFS retrieval plus provable cold storage.
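Deal budgeting is mostly arithmetic over epochs. The sketch below assumes a price quoted per GiB per 30-second epoch, in attoFIL; the helper names, the example price, and the 14-day renewal lead time are illustrative choices, not values taken from any provider.

```python
EPOCH_SECONDS = 30                               # Filecoin epoch length
EPOCHS_PER_DAY = 24 * 60 * 60 // EPOCH_SECONDS   # 2880


def deal_cost_attofil(size_gib: int, days: int, price_per_gib_epoch: int, replicas: int = 1) -> int:
    """Total storage cost for `replicas` identical deals, in attoFIL."""
    epochs = days * EPOCHS_PER_DAY
    return size_gib * epochs * price_per_gib_epoch * replicas


def needs_renewal(current_epoch: int, end_epoch: int, lead_days: int = 14) -> bool:
    """True when a deal expires within the renewal lead time."""
    return end_epoch - current_epoch <= lead_days * EPOCHS_PER_DAY
```

For example, three replicas of a 32 GiB dataset for 180 days at a hypothetical 1000 attoFIL per GiB per epoch come to deal_cost_attofil(32, 180, 1000, replicas=3) attoFIL. Integer arithmetic avoids rounding drift when amounts are this large.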
4) Compare and Combine: Which Tool for Which Job
- Availability model: IPFS needs pinning; Arweave bakes persistence into economics; Filecoin enforces ongoing proofs and payments.
- Addressing: IPFS uses CIDs (multihash, multicodec); Arweave uses transaction IDs; Filecoin references payload CIDs for deals but retrieval often goes through IPFS.
- Cost profile: IPFS pinning is subscription-like; Arweave is pay-once; Filecoin is a contract for a fixed duration with market pricing.
- Latency: IPFS hot content via gateways is fast; Arweave is good via gateways and caches; Filecoin retrieval may be slower unless combined with IPFS or a retrieval provider.
- Use cases: IPFS for general web3 assets and mutable dev cycles; Arweave for permanent artifacts; Filecoin for verifiable archival and large datasets with market-based replication.
Real-world architecture: publish to IPFS → pin across multiple providers and your own node → back with Filecoin deals for verifiable durability → mirror crucial CIDs to Arweave and store a manifest linking both. Clients try IPFS first, then fall back to Arweave if needed.
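The client-side fallback in that flow could be sketched like this. Each source is a label plus a zero-argument callable that returns bytes or raises; those callables are hypothetical stand-ins for whatever HTTP or IPFS client you use. The expected SHA-256 of the file bytes is assumed to come from your manifest or contract.

```python
import hashlib
from typing import Callable, Iterable, Tuple

Fetcher = Callable[[], bytes]  # hypothetical: returns the bytes or raises


def fetch_verified(sources: Iterable[Tuple[str, Fetcher]], expected_sha256: bytes) -> Tuple[str, bytes]:
    """Try each (label, fetcher) in order; return the first result whose hash matches."""
    errors = []
    for label, fetch in sources:
        try:
            data = fetch()
        except Exception as exc:  # network errors, timeouts, HTTP failures
            errors.append(f"{label}: {exc}")
            continue
        if hashlib.sha256(data).digest() == expected_sha256:
            return label, data
        errors.append(f"{label}: digest mismatch")
    raise LookupError("all sources failed: " + "; ".join(errors))
```

Because every response is checked against the commitment, a compromised or stale gateway is treated the same as one that is down, and the client simply moves on to the next source.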
5) Builder Patterns (NFTs, Metadata, Datasets)
NFTs
- Metadata JSON: store name, description, and image, with image given as a CID-based URL (ipfs://bafy...) or an Arweave link. Freeze metadata by committing to a final CID and disabling further updates in the contract if that is part of your promise.
- Media files: images, audio, and video should be content addressed. Avoid single HTTP hosts. Use CAR imports for deterministic CID creation.
- On-chain commitment: store a hash or CID digest per token. Even if you later serve through a CDN, the on-chain reference anchors the canonical content.
- Mirroring: mirror the metadata and media to Arweave and publish a JSON manifest that maps IPFS CIDs to Arweave transaction IDs.
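Deterministic metadata starts with deterministic bytes. As a minimal sketch, the helper below serializes the three fields with sorted keys and fixed separators so that the same inputs always hash to the same digest; the function names and the placeholder CID in the usage note are illustrative.

```python
import hashlib
import json


def build_metadata(name: str, description: str, image_cid: str) -> bytes:
    """Serialize NFT metadata to canonical bytes so the digest is reproducible."""
    doc = {"name": name, "description": description, "image": f"ipfs://{image_cid}"}
    # Sorted keys and fixed separators keep the byte output stable across runs.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def metadata_digest(metadata: bytes) -> str:
    """Hex SHA-256 of the serialized metadata, sized for an on-chain bytes32."""
    return "0x" + hashlib.sha256(metadata).hexdigest()
```

Calling build_metadata("Token #1", "Example", "bafy-placeholder") twice yields identical bytes, so the digest you commit on chain can be recomputed by anyone from the published file.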
Dynamic metadata and reveals
- Use IPNS or an ENS contenthash to point to the current CID during a reveal window; later, pin the final CID and update the pointer once.
- For trait randomization, reveal maps can live on Arweave to ensure a verifiable, immutable draw after the event.
Datasets and analytics
- Sharding: split large datasets into fixed-size CARs with a manifest that lists part CIDs, byte ranges, and checksums.
- Versioning: publish per-epoch snapshots (for example daily) with a monotonic version id. Keep a signed index file that maps version to root CID or Arweave ID.
- Compute adjacency: if consumers run compute, provide Parquet for columnar reads or DAG-CBOR for structured records, and link the parts via IPLD for chunk-friendly streaming.
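The sharding manifest described above can be sketched without any IPFS tooling. This example splits a byte string into fixed-size parts and records byte ranges and SHA-256 checksums; it does not write CAR files, and the part CIDs would be added by whatever import step you run afterwards. Field names are assumptions.

```python
import hashlib


def shard_manifest(data: bytes, part_size: int) -> dict:
    """Split data into fixed-size parts and record byte ranges plus checksums."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts = []
    for index, start in enumerate(range(0, len(data), part_size)):
        chunk = data[start:start + part_size]
        parts.append({
            "index": index,
            "start": start,
            "end": start + len(chunk),  # exclusive upper bound
            "sha256": hashlib.sha256(chunk).hexdigest(),
        })
    return {"total_bytes": len(data), "part_size": part_size, "parts": parts}


def verify_part(part_bytes: bytes, entry: dict) -> bool:
    """Check a downloaded part against its manifest entry."""
    return (
        len(part_bytes) == entry["end"] - entry["start"]
        and hashlib.sha256(part_bytes).hexdigest() == entry["sha256"]
    )
```

Consumers can then fetch only the parts that cover the byte range they need and verify each one independently.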
6) Naming and Addressing (CIDs, IPNS, ENS, Arweave IDs)
- CID v1, Base32: prefer Base32 CIDs so they are subdomain-gateway friendly (for example https://<cid>.ipfs.your-gateway.io/).
- IPNS: a mutable name that can point to different CIDs over time. Useful for “latest” pointers. Back it with signatures and publish through multiple nodes to reduce propagation lag.
- ENS contenthash: bind a human name to a content address. You can point ENS records at IPFS CIDs or Arweave IDs. Clients resolve to the correct gateway automatically.
- Arweave transaction IDs: immutable identifiers. Use a small, signed manifest to map human names to these IDs for friendliness.
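Resolving a protocol URL against several gateways is a small string transformation. The sketch below expands an ipfs:// URI into subdomain-style HTTPS URLs; the gateway hostnames are placeholders, and the lowercase check reflects the fact that DNS labels are case-insensitive, so only Base32-style CIDs are safe in a subdomain.

```python
from urllib.parse import urlparse


def gateway_urls(uri: str, gateways: list) -> list:
    """Expand an ipfs:// URI into subdomain-style URLs, one per gateway host."""
    parsed = urlparse(uri)
    if parsed.scheme != "ipfs" or not parsed.netloc:
        raise ValueError("expected ipfs://<cid>[/path]")
    cid, path = parsed.netloc, parsed.path or "/"
    if cid != cid.lower():
        raise ValueError("subdomain gateways need a case-insensitive (Base32) CID")
    return [f"https://{cid}.ipfs.{host}{path}" for host in gateways]
```

A client can store only the ipfs:// form and generate the candidate list at request time, which keeps gateway choice out of persisted data.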
Prefer protocol URLs (ipfs://) or name records (ENS) that your client can resolve to any gateway.
7) Gateways, Performance, and Reliability
Gateways bridge content addressing to everyday HTTP. In production you likely use several:
- Public gateways: easy to start, but rate limited. Good as backups, not as your only source.
- Dedicated gateways: run or rent a gateway that talks to your own nodes. Add CDN caching with immutable caching headers since CIDs never change.
- Subdomain isolation: serve each CID at its own subdomain so that each one gets a separate browser origin, which avoids content-type sniffing issues and enables stricter browser security policies.
- Edge caching: pre-warm popular CIDs and set Cache-Control: public, max-age=31536000, immutable. For Arweave, use multiple mirrors and return the first that succeeds.
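One way to "return the first that succeeds" is to query mirrors in parallel. This is a minimal sketch using the standard library thread pool; each fetcher is a hypothetical zero-argument callable that returns the response body or raises, and the mirror labels are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_success(fetchers: dict, timeout: float = 10.0):
    """Run all fetchers in parallel; return (label, result) from the first success."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(fetchers)))
    try:
        futures = {pool.submit(fn): label for label, fn in fetchers.items()}
        for future in as_completed(futures, timeout=timeout):
            if future.exception() is None:
                return futures[future], future.result()
        raise LookupError("every mirror failed")
    finally:
        pool.shutdown(wait=False)  # do not block on slower mirrors
```

Racing mirrors trades extra requests for lower tail latency, so it suits small, critical assets better than large downloads. The winning response should still be checked against a known digest before use.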
8) On-Chain Verification and Integrity Checks
You do not need to store whole files on chain; store commitments to them. Common approaches:
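- Store the digest only: extract the multihash digest from a CID (for example SHA-256) and store it as bytes32. Off chain, you reconstruct the CID by prefixing the digest with the CID version, multicodec, and multihash header, then base encoding. The digest covers the root block of the DAG, so it equals the SHA-256 of the raw file bytes only when the CID is a raw-codec CID for a single block.
- Merkle batching: for collections, compute a Merkle root of item hashes. Store the root on chain and publish proofs for each file in a separate registry.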
- Arweave integrity: store the Arweave transaction ID and a separate SHA-256 hash in your contract. A verifier fetches the transaction and checks the hash matches before accepting it.
pragma solidity ^0.8.0;

// Solidity: commit to a file's SHA-256 digest (e.g., from a CID's multihash)
contract ContentRegistry {
    mapping(uint256 => bytes32) public sha256Of; // tokenId => sha256 digest

    // Sketch only: no access control, so the first caller sets the digest.
    function commit(uint256 id, bytes32 digest) external {
        require(sha256Of[id] == bytes32(0), "already set");
        sha256Of[id] = digest;
    }

    // Recomputes SHA-256 over the supplied bytes and compares it with the commitment.
    function verify(uint256 id, bytes memory data) external view returns (bool) {
        return sha256(data) == sha256Of[id]; // EVM exposes sha256 precompile
    }
}
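For the Merkle batching approach, the off-chain side might look like the sketch below. It assumes SHA-256, position-ordered pairs, and duplication of the last node on odd levels; these are illustrative choices, and an on-chain verifier would have to apply exactly the same hash and pairing rule.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Root over leaf digests; an odd node is paired with itself."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list, index: int) -> list:
    """Sibling digests from leaf to root, each tagged True if the sibling is on the right."""
    proof, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling > index))
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list, root: bytes) -> bool:
    """Fold the proof into the leaf and compare the result with the root."""
    node = leaf
    for sibling, sibling_is_right in proof:
        node = _h(node + sibling) if sibling_is_right else _h(sibling + node)
    return node == root
```

Only the 32-byte root needs to be stored on chain; each file's proof grows with the logarithm of the collection size.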
9) Encryption, Access Control, and Compliance
Public networks replicate widely. If you need confidentiality, encrypt before upload and manage keys separately.
- Envelope encryption: generate a random content key to encrypt the file; encrypt that key for each authorized recipient. Publish the ciphertext publicly; share the small wrapped keys privately.
- Attribute-based encryption: encrypt to a policy (for example “has a credential from issuer X”). A gateway or client checks a ZK proof or a verifiable credential before releasing keys.
- Revocation and rotation: you cannot delete public ciphertext easily. Plan for key rotation and short-lived access tokens. For GDPR-style erasure, keep sensitive data off public storage and store only hashes or irreversible commitments on chain.
- PII caution: do not upload personal data in the clear. If you must handle it, encrypt and keep keys in a separate trust domain with audit trails.
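When only a commitment goes on chain, a bare hash of low-entropy data (an email address, a date of birth) can be guessed by brute force. A minimal sketch of a salted commitment, assuming a random 32-byte salt that is kept off chain alongside the data; the function names are illustrative.

```python
import hashlib
import hmac
import secrets


def commit(data: bytes):
    """Return (salt, commitment); publish only the commitment."""
    salt = secrets.token_bytes(32)
    return salt, hashlib.sha256(salt + data).digest()


def open_commitment(data: bytes, salt: bytes, commitment: bytes) -> bool:
    """Check that data and salt reproduce the published commitment."""
    candidate = hashlib.sha256(salt + data).digest()
    return hmac.compare_digest(candidate, commitment)
```

Anyone holding the data and salt can prove what was committed, while the commitment alone reveals nothing practical about the data.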
10) Operational Hygiene: Pinsets, Deals, and Monitoring
- Pinset management: maintain manifests of all production CIDs with labels, replication counts, and last verification date. Periodically fetch and checksum to catch silent bit-rot or gateway regressions.
- Multi-provider replication: pin with at least two providers and one of your own nodes. For Arweave, keep two gateway mirrors in different regions.
- Filecoin deal cadence: renew deals before expiry. For hot content, maintain both a Filecoin deal and active IPFS pins. Track sealing and proof health.
- Cost dashboards: watch upload sizes, gateway egress, and storage duration. For bundles, prefer compression-friendly formats and deduplicate assets across collections.
- Disaster playbooks: if a gateway goes down, switch DNS or client resolver to another. If a pinning provider fails, bring a cold backup online and rehydrate pins from CAR archives.
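A pinset audit along these lines could be sketched as follows. The manifest entry fields (cid, providers, last_verified) and the thresholds are assumptions; the default of three replicas mirrors the "two providers and one of your own nodes" guideline above.

```python
from datetime import date, timedelta


def audit_pinset(entries: list, today: date, min_replicas: int = 3, max_age_days: int = 30) -> list:
    """Return (cid, problem) pairs for under-replicated or stale pins."""
    problems = []
    for entry in entries:
        if len(set(entry["providers"])) < min_replicas:
            problems.append((entry["cid"], "under-replicated"))
        if today - entry["last_verified"] > timedelta(days=max_age_days):
            problems.append((entry["cid"], "verification overdue"))
    return problems
```

Running this on a schedule and alerting on a non-empty result turns the manifest into an enforceable policy rather than documentation.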
Quick check
- What problem does pinning solve on IPFS, and why does content addressing alone not guarantee availability?
- When would you prefer Arweave over IPFS plus Filecoin, and why?
- Name two reasons you should not hardcode a single HTTP gateway in production clients.
- How do you commit to NFT media integrity on chain without storing the file itself?
- What extra steps are needed if you must store sensitive personal data using these networks?
Answers
- Pinning keeps blocks retained by specific peers so they remain retrievable. Content addressing only verifies integrity; if no peers pin the data, it can disappear from the network.
- For artifacts that must be immutable and available for decades with one-time payment and simple URLs (for example compliance archives, canonical NFT media promises). Arweave’s economics target permanence rather than ongoing leases.
- Public gateways throttle and can fail; relying on one creates a central point of failure and a censorable choke point. Using protocol URLs or name records lets clients choose any healthy gateway.
- Store and verify a cryptographic commitment such as the SHA-256 digest (or a Merkle root of many digests) in your smart contract. The off-chain file is referenced by CID or Arweave ID whose hash matches that digest.
- Encrypt before upload, manage keys separately (envelope or attribute-based), plan for key rotation, and avoid posting raw PII. Keep only irreversible hashes on chain for auditability.
Go deeper
- Concept lectures: multihash and multicodec, UnixFS internals, IPLD schemas, CAR format and reproducible imports.
- Arweave lectures: economic endowment, random access proofs, bundlers and permaweb app versioning.
- Filecoin lectures: deal lifecycle, replication and spacetime proofs, retrieval market design, FVM automation patterns.
- Security lectures: content-type isolation at gateways, subdomain routing, digest extraction from CIDs, Merkle manifests.
- Builder labs: mint an NFT with deterministic metadata CAR, back the CID with a Filecoin deal, mirror to Arweave, and ship a manifest that maps CID to Arweave ID. Add a contract that stores the digest and a test that verifies integrity.