

A modern open source data stack for blockchain
1. The challenge for a modern blockchain data stack
There are several challenges that a modern blockchain indexing startup may face, including:
- Massive amounts of data. As the amount of data on the blockchain increases, the data index needs to scale up to handle the load and keep access to the data efficient. Without that, growth leads to higher storage costs, slow metrics calculation, and increased load on the database server.
- Complex data processing pipeline. Blockchain technology is complex, and building a comprehensive and reliable data index requires a deep understanding of the underlying data structures and algorithms. The diversity of blockchain implementations adds to this complexity. For example, NFTs on Ethereum are usually created within smart contracts following the ERC721 and ERC1155 standards. In contrast, NFTs on Polkadot are usually built directly within the blockchain runtime. Both should be recognized as NFTs and stored as such.
- Integration capabilities. To provide maximum value to users, a blockchain indexing solution may need to integrate its data index with other systems, such as analytics platforms or APIs. This is challenging and requires significant effort placed into the architectural design.
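The NFT example above can be sketched as a normalization step that maps both kinds of source into one record type. This is a minimal illustration, not Footprint Analytics' actual schema: `NftRecord`, its fields, and both mapping functions are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NftRecord:
    """Chain-agnostic NFT row (hypothetical schema, for illustration only)."""
    chain: str
    collection_id: str  # contract address on EVM chains, collection id elsewhere
    token_id: str
    standard: str       # "ERC721", "ERC1155", or "runtime"


def from_evm_contract(chain: str, contract_address: str,
                      token_id: int, standard: str) -> NftRecord:
    """Map an NFT minted by an ERC721/ERC1155 smart contract to the shared schema."""
    if standard not in ("ERC721", "ERC1155"):
        raise ValueError(f"unsupported EVM NFT standard: {standard}")
    return NftRecord(chain, contract_address.lower(), str(token_id), standard)


def from_runtime_nft(chain: str, collection: int, item: int) -> NftRecord:
    """Map an NFT defined in the chain runtime (no smart contract) to the shared schema."""
    return NftRecord(chain, str(collection), str(item), "runtime")


# Made-up sample values: one contract-based NFT and one runtime-based NFT.
records = [
    from_evm_contract("ethereum", "0xABCDEF0000000000000000000000000000000001", 42, "ERC721"),
    from_runtime_nft("polkadot", 7, 3),
]
```

Whatever the source chain, downstream metrics then only need to deal with one record shape.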
As blockchain technology has become more widespread, the amount of data stored on the blockchain has increased. This is because more people are using the technology, and each transaction adds new data to the blockchain. Additionally, blockchain technology has evolved from simple money-transferring applications, such as those involving the use of Bitcoin, to more complex applications involving the implementation of business logic within smart contracts. These smart contracts can generate large amounts of data, contributing to the increased complexity and size of the blockchain. Over time, this has led to a larger and more complex blockchain.
In this article, we review the evolution of Footprint Analytics’ technology architecture in stages as a case study to explore how the Iceberg-Trino technology stack addresses the challenges of on-chain data.
Footprint Analytics has indexed data from about 22 public blockchains, 17 NFT marketplaces, 1,900 GameFi projects, and over 100,000 NFT collections into a semantic abstraction data layer. It’s the most comprehensive blockchain data warehouse solution in the world.
Blockchain data, which includes over 20 billion rows of financial transaction records that data analysts frequently query, is different from the ingress logs in traditional data warehouses.
We have gone through 3 major upgrades in the past several months to meet growing business requirements:
2. Architecture 1.0 BigQuery
At the beginning of Footprint Analytics, we used Google BigQuery as our storage and query engine. BigQuery is a great product: it is blazingly fast, easy to use, and provides dynamic computing power and a flexible UDF syntax that helped us get the job done quickly.
However, BigQuery also has several problems.
- Data is not compressed, resulting in high costs, especially when storing the raw data of the more than 22 blockchains that Footprint Analytics indexes.
- Insufficient concurrency: BigQuery only supports 100 simultaneous queries, which is unsuitable for Footprint Analytics’ high-concurrency scenarios when serving many analysts and users.
- Vendor lock-in: Google BigQuery is a closed-source product.
So we decided to explore other alternative architectures.
3. Architecture 2.0 OLAP
We were very interested in some of the OLAP products that had become very popular. The most attractive advantage of OLAP is its query response time: it typically returns results over massive amounts of data in under a second, and it can also support thousands of concurrent queries.
We picked one of the best OLAP databases, Doris, to give it a try. This engine performs well. However, we soon ran into some other issues:
- Data types such as Array or JSON are not yet supported (Nov, 2022). Arrays are a common data type in some blockchains, for example the topic field in EVM logs. Being unable to compute on Array directly affects our ability to compute many business metrics.
- Limited support for DBT and for merge statements. These are common requirements for data engineers in ETL/ELT scenarios where we need to update some newly indexed data.
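To make the Array limitation concrete, here is a small sketch of the kind of computation that needs direct access to the topic array of an EVM log. It decodes the sender and receiver of a standard `Transfer` event. The log row is made up for illustration, and `decode_transfer` and `topic_to_address` are hypothetical helper names; only the topic hash is a fixed, well-known value.

```python
# keccak256("Transfer(address,address,uint256)"): the standard Transfer event topic
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    """An indexed address is left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:].lower()


def decode_transfer(log: dict):
    """Return (sender, receiver) for a Transfer log, or None for any other event."""
    topics = log["topics"]
    if len(topics) < 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return topic_to_address(topics[1]), topic_to_address(topics[2])


# Made-up log row: topics[0] is the event signature, topics[1..2] are indexed args.
log = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,
        "0x" + "00" * 12 + "22" * 20,
    ]
}
sender, receiver = decode_transfer(log)
```

Without native array support, every such lookup by position in the topic list has to be worked around before the data reaches the engine.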
That being said, we couldn’t use Doris for our whole data pipeline in production, so we tried to use Doris as an OLAP database to solve part of our problem in the data production pipeline, acting as a query engine and providing fast and highly concurrent query capabilities.
Unfortunately, we could not replace BigQuery with Doris, so we had to periodically synchronize data from BigQuery to Doris in order to use Doris as a query engine. This synchronization process had several issues, one of which was that update writes piled up quickly when the OLAP engine was busy serving queries to the front-end clients. As a result, writes slowed down, and synchronization took much longer and sometimes could not finish at all.
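The update writes described above amount to an upsert: newly indexed or corrected rows replace older versions with the same key, and unseen keys are inserted. The following is a minimal sketch of that merge semantics in plain Python, not the actual synchronization job. The key `(tx_hash, log_index)` and the function name `merge_rows` are assumptions made for this example.

```python
def merge_rows(target: dict, updates: list) -> dict:
    """Upsert `updates` into `target`, keyed by (tx_hash, log_index).

    Mirrors the effect of a SQL MERGE: rows with a matching key are
    updated, rows with a new key are inserted. Returns counts per outcome.
    """
    stats = {"inserted": 0, "updated": 0}
    for row in updates:
        key = (row["tx_hash"], row["log_index"])
        stats["updated" if key in target else "inserted"] += 1
        target[key] = row
    return stats


# Made-up data: one existing row, one correction to it, and one new row.
table = {("0xaa", 0): {"tx_hash": "0xaa", "log_index": 0, "amount": 1}}
batch = [
    {"tx_hash": "0xaa", "log_index": 0, "amount": 5},
    {"tx_hash": "0xbb", "log_index": 1, "amount": 7},
]
stats = merge_rows(table, batch)
```

Each synchronized batch triggers this kind of keyed write, which is why a backlog builds up when the same engine is also busy serving queries.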
We realized that OLAP could solve several of the issues we were facing but could not become the turnkey solution for Footprint Analytics, especially for the data processing pipeline. Our problem is bigger and more complex, and OLAP as a query engine alone was not enough for us.
4. Architecture 3.0 Iceberg + Trino
Welcome to Footprint Analytics architecture 3.0, a complete overhaul of the underlying architecture. Taking lessons from the two earlier architectures of Footprint Analytics and learning from the experience of other successful big data projects, such as those at Uber, Netflix, and Databricks, we have redesigned the entire architecture from the ground up to separate the storage, computation, and query of data into three different pieces.
4.1. Introduction to the data lake
We first turned our attention to the data lake, a new type of data storage for both structured and unstructured data. A data lake is a good fit for on-chain data storage, as the formats of on-chain data range widely from unstructured raw data to the structured abstraction data Footprint Analytics is well known for. We expected to use a data lake to solve the problem of data storage, and ideally it would also support mainstream compute engines such as Spark and Flink, so that it wouldn’t be a pain to integrate with different types of processing engines as Footprint Analytics evolves.
Apache Iceberg, the open table format we adopted for the data lake, integrates very well with Spark, Flink, Trino, and other computational engines, so we can choose the most appropriate engine for each of our metrics. For example:
- For those requiring complex computational logic, Spark will be the choice.
- Flink for real-time computation.
- For simple ETL tasks that can be performed using SQL, we use Trino.
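The selection rule in the list above can be written down as a small routing function. This is only a sketch of the decision logic as described; the `Engine` enum and the two job attributes (`realtime`, `sql_only`) are hypothetical names, not part of any Footprint Analytics API.

```python
from enum import Enum


class Engine(Enum):
    SPARK = "spark"
    FLINK = "flink"
    TRINO = "trino"


def choose_engine(realtime: bool, sql_only: bool) -> Engine:
    """Pick a compute engine for a job, following the three rules above."""
    if realtime:
        return Engine.FLINK   # real-time computation
    if sql_only:
        return Engine.TRINO   # simple ETL expressible in SQL
    return Engine.SPARK       # complex computational logic


# Example: a batch job that needs more than plain SQL goes to Spark.
engine = choose_engine(realtime=False, sql_only=False)
```

Because all three engines read and write the same Iceberg tables, switching a job from one engine to another does not require moving the data.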
4.2. Query engine
With Iceberg solving the storage and computation problems, we had to think about choosing a query engine. There are not many options available. The alternatives we considered were
The most important thing we considered before going deeper was that the future query engine had to be compatible with our current architecture.
- To support BigQuery as a data source
- To support DBT, on which we rely for many metrics to be produced
- To support the BI tool Metabase
Based on the above, we chose Trino, which has very good support for Iceberg. The Trino team was also very responsive: we reported a bug, and it was fixed the next day and released in the latest version the following week. This made Trino the best choice for the Footprint team, which also needs fast turnaround on implementation.
4.3. Performance testing
Once we had decided on our direction, we ran a performance test on the Trino + Iceberg combination to see if it could meet our needs. To our surprise, the queries were incredibly fast.
Given that Presto + Hive has for years been the slowest baseline in all the OLAP hype, the performance of the Trino + Iceberg combination completely blew our minds.
Here are the results of our tests.
Case 1: join a large dataset
An 800 GB table1 joins another 50 GB table2 and does complex business calculations
Case 2: run a distinct query on a single large table
Test SQL: select distinct(address) from the table group by day
The Trino + Iceberg combination is about 3 times faster than Doris in the same configuration.
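For readers who want to see the shape of the case 2 workload, here is a tiny stand-in that runs on SQLite through Python's standard library. It is one plausible reading of the test SQL (counting distinct addresses per day), not the exact production query, and the table name `transfers` and its rows are made up for illustration.

```python
import sqlite3

# In-memory database with a made-up (day, address) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (day TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?)",
    [
        ("2022-11-01", "0xa"),
        ("2022-11-01", "0xa"),  # duplicate address on the same day
        ("2022-11-01", "0xb"),
        ("2022-11-02", "0xa"),
    ],
)

# Distinct addresses per day: the engine must deduplicate within each group.
rows = conn.execute(
    "SELECT day, COUNT(DISTINCT address) "
    "FROM transfers GROUP BY day ORDER BY day"
).fetchall()
conn.close()
```

On a table with billions of rows, the per-group deduplication is what makes this kind of query a demanding benchmark.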
In addition, there was another surprise: because Iceberg can use data formats such as Parquet and ORC, which compress the stored data, Iceberg’s table storage takes only about 1/5 of the space of the other data warehouses. The storage size of the same table in the three databases is as follows:
Note: The above tests are examples we have encountered in actual production and are for reference only.
4.4. Upgrade effect
The performance test results gave us enough confidence to proceed, and it took our team about 2 months to complete the migration. This is a diagram of our architecture after the upgrade.
- Multiple compute engines match our various needs.
- Trino supports DBT, and can query Iceberg directly, so we no longer have to deal with data synchronization.
- The amazing performance of Trino + Iceberg allows us to open up all Bronze data (raw data) to our users.
5. Summary
Since its launch in August 2021, the Footprint Analytics team has completed three architectural upgrades in less than a year and a half, thanks to its strong desire and determination to bring the benefits of the best database technology to its crypto users and solid execution on implementing and upgrading its underlying infrastructure and architecture.
The Footprint Analytics architecture upgrade 3.0 has brought a new experience to its users, allowing users from different backgrounds to get insights into more diverse usage and applications:
- Built with the Metabase BI tool, Footprint helps analysts gain access to decoded on-chain data, explore with complete freedom of choice of tools (no-code or hard-code), query the entire history, and cross-examine datasets to get insights in no time.
- Integrate both on-chain and off-chain data to analyze across web2 + web3
- By building / querying metrics on top of Footprint’s business abstraction, analysts or developers save time on 80% of repetitive data processing work and focus on meaningful metrics, research, and product solutions based on their business.
- Seamless experience from Footprint Web to REST API calls, all based on SQL
- Real-time alerts and actionable notifications on key signals to support investment decisions