[Diagram: the unified data plane. An operational SOR (SAP / Oracle / custom) for live transactions, a data lake on Apache Iceberg (open table format, time-travel, portable), and a customer-choice warehouse (StarRocks, BigQuery, Microsoft Fabric, or Snowflake) presented as one queryable surface. Sub-second queries, time-travel, no data-gravity penalty. AI Agents reason across all three sources, structured operational data plus unstructured documents plus historical records, in one query. Open standards and customer choice: no vendor prison at L4+.]
Legacy Modernisation Series · May 2026 · 11 min read
Post 7 of 10 · Data Architecture

Modernising the Data Layer: Iceberg, StarRocks, and Why Your Warehouse Choice Matters

Most enterprises have a “data warehouse strategy” that locked them into a vendor a decade ago. AI Agents need to query operational, lake, and warehouse data as one. The data layer choice now decides whether you can ever reach Levels 3, 4, and 5.

Editorial — VoltusWave
Platform & Data Architecture

For most of the last decade, the data warehouse decision was an architectural one made by a chief data officer with a multi-year horizon. Pick a vendor. Migrate the existing reporting estate. Build the analytics roadmap. Train the team. Renew at year three. Renew again at year six. The decision was important but slow-moving, and the vendor lock-in was rarely felt as a daily constraint.

That has changed. AI Agents need to read operational data, historical lake data, and warehouse data in a single query. The data layer is no longer the substrate that BI tools sit on top of. It is the substrate that the workforce reasons across. Whether your data architecture can serve the next ten years now turns on two specific decisions you may not have realised you were making.

This post walks through both of them: open table formats (Apache Iceberg) and customer-choice warehouse (StarRocks alongside BigQuery, Microsoft Fabric, or Snowflake). What they buy you, what closed alternatives cost you, and why the data layer choice now determines whether the modernisation programme can ever reach Levels 4 and 5.

The data warehouse lock-in problem

The first generation of cloud data warehouses delivered real value: elastic compute, separation of storage from compute, columnar performance at scale. The trade-off was that your data lived inside the warehouse vendor's storage format, accessible only through the warehouse vendor's compute. To get the data out, you exported it; to query it from another engine, you copied it; to migrate to another vendor, you began a multi-year programme.

This was tolerable while the warehouse was a static analytics destination. It is intolerable when the warehouse is one of three places an agent needs to read from in real time. The warehouse cannot be a silo if agent reasoning crosses systems.

Where the lock-in actually bites

The lock-in is felt in three concrete ways once agents arrive.

  • Cross-engine queries become impossible. An agent cannot reason across the operational SOR and the warehouse if the warehouse format is readable only by the warehouse vendor's compute. The data has to move; the round-trip kills agent latency.
  • The historical estate is not portable. When the architecture has to evolve — and it always does — migrating the historical estate to the next platform is a programme in its own right. The data has gravity, and the gravity is denominated in vendor-specific files.
  • Pricing is captive. When your historical data is in a proprietary format, the vendor knows it. Renewal negotiations reflect that. CFOs notice this slowly and then all at once.

Apache Iceberg — what it actually buys you

Apache Iceberg is an open table format for data lakes. The technical description is dry; the strategic implication is significant. Iceberg files sit in your object storage (S3, Azure Blob, GCS) in a format that any compatible compute engine can read. StarRocks, BigQuery, Snowflake, Trino, Spark, Flink, Athena — all read Iceberg natively. Your historical estate stops being captive to whichever query engine wrote it.
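
The decoupling Iceberg achieves can be seen in a deliberately tiny, stdlib-only sketch. This is not the real Iceberg spec (which uses manifest lists, manifests, and Parquet data files); it only models the core idea: a table is data files in object storage plus an open metadata file that lists them, so any engine that can parse the metadata can read the table.

```python
import json
import tempfile
from pathlib import Path

# Toy model (NOT the real Iceberg layout): a "table" is data files plus a
# metadata file listing them. No single vendor owns access to the files.
store = Path(tempfile.mkdtemp())

# A writer engine lays down two data files and an open metadata pointer.
(store / "data-0.json").write_text(json.dumps([{"id": 1, "amt": 100}]))
(store / "data-1.json").write_text(json.dumps([{"id": 2, "amt": 250}]))
(store / "metadata.json").write_text(json.dumps(
    {"schema": ["id", "amt"], "files": ["data-0.json", "data-1.json"]}))

def scan(table_root: Path) -> list[dict]:
    """Any 'engine' reads the same table by walking the open metadata."""
    meta = json.loads((table_root / "metadata.json").read_text())
    rows = []
    for f in meta["files"]:
        rows.extend(json.loads((table_root / f).read_text()))
    return rows

# Two independent "engines" (think StarRocks and Trino) see identical data,
# because access is defined by the open format, not by who wrote the files.
engine_a = scan(store)
engine_b = scan(store)
assert engine_a == engine_b == [{"id": 1, "amt": 100}, {"id": 2, "amt": 250}]
```

The strategic point is in the last three lines: the reader never negotiates with the writer. That is what "stops being captive to whichever query engine wrote it" means mechanically.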

Three properties that matter at L4 and beyond

Open standards. The data is portable across compute engines and clouds. Today's choice does not become tomorrow's prison. The customer's data architecture is theirs, not the warehouse vendor's.

Time-travel queries. Iceberg snapshots let you query the data as of any point in the past. For an agent workforce, this is critical — Decision Traces can replay against the state of the world at decision time, not the state of the world today. Audit and reproducibility become structural properties.
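
The snapshot mechanism behind time-travel can be sketched in a few lines. This is an illustrative toy, not the Iceberg metadata model: each commit appends an immutable snapshot, and reading "as of" an earlier timestamp replays the table exactly as it stood then.

```python
# Toy sketch of snapshot-based time travel (illustrative only; real Iceberg
# stores snapshots as manifest lists in table metadata).
class Table:
    def __init__(self):
        self.snapshots = []  # list of (timestamp, rows) pairs, append-only

    def commit(self, rows, ts):
        # Snapshots are immutable: store a copy, never mutate history.
        self.snapshots.append((ts, list(rows)))

    def read_as_of(self, ts):
        # Latest snapshot at or before ts: the state at decision time.
        visible = [rows for t, rows in self.snapshots if t <= ts]
        return visible[-1] if visible else []

suppliers = Table()
suppliers.commit([{"supplier": "Acme", "risk": "low"}], ts=1)
suppliers.commit([{"supplier": "Acme", "risk": "high"}], ts=2)

# A Decision Trace made at t=1 replays against the world as it was then,
# even though the current state says otherwise.
assert suppliers.read_as_of(1)[0]["risk"] == "low"
assert suppliers.read_as_of(2)[0]["risk"] == "high"
```

Because snapshots are append-only, reproducibility is a property of the storage layout rather than of application discipline.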

Schema evolution without rewrites. Add columns, drop columns, change types — without rewriting the historical files. The data layer evolves as the business evolves; legacy schema decisions do not become permanent constraints.
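
Why adding a column does not touch historical files is worth one small sketch. Again a toy, not the real mechanism (Iceberg tracks columns by field ID in metadata): the schema change is metadata-only, and readers project old rows through the current schema.

```python
# Toy sketch of metadata-only schema evolution (illustrative; real Iceberg
# resolves columns by field ID, so renames and drops are also rewrite-free).
old_files = [  # historical data written under schema v1 -- never rewritten
    {"id": 1, "amt": 100},
    {"id": 2, "amt": 250},
]
schema_v2 = ["id", "amt", "currency"]  # column added by a metadata change

def project(row, schema):
    """Read an old row through the current schema; missing fields are NULL."""
    return {col: row.get(col) for col in schema}

rows = [project(r, schema_v2) for r in old_files]
assert rows[0] == {"id": 1, "amt": 100, "currency": None}
```

The historical estate stays byte-for-byte intact; only the metadata and the read path change.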

💡The strategic principle. Architect one level above where you are. An enterprise at Level 2 with an open data architecture will outpace an enterprise at Level 3 locked into a proprietary one. Iceberg is the structural protection of the next decade of optionality.

StarRocks — the high-performance compute layer

StarRocks is the analytics engine in the VoltusWave-native stack. It runs sub-second queries against operational and historical data at enterprise scale. The critical point is not that StarRocks is fast (many engines are); it is that the AI experiences at L2 and L3 — embedded analytics on the SOR, conversational AI Mode against the ERP — cannot run on a substrate that takes minutes to return a query. Agent-speed questioning needs warehouse-speed answers.

StarRocks plus Iceberg gives you both: open data, fast queries. The warehouse is no longer a separate analytics destination — it is part of the unified data plane the agents reason across.

Customer-choice warehouse — the realistic strategy

Most enterprises did not start with Iceberg. They started with a vendor warehouse: BigQuery, Microsoft Fabric, Snowflake, Redshift, or another. They have years of investment in the existing warehouse, hundreds of dashboards built on it, training and recruiting infrastructure around it. Telling them to migrate to a new warehouse as part of the modernisation programme is impractical.

The realistic strategy is customer-choice warehouse — the platform respects the customer's existing data investments and operates with whichever warehouse the customer has already chosen. VoltusWave runs natively on StarRocks plus Iceberg, but it also connects to BigQuery, Fabric, Snowflake, and other warehouses where the customer has already committed.

The platform follows the customer's data strategy, rather than dictating it. The lock-in conversation has shifted permanently — from “which warehouse should we pick?” to “how do we keep our options open as the platform evolves?”

This matters because most modernisation programmes do not need to migrate the warehouse to capture the value. The warehouse can stay where it is. The modernisation happens in the operational layer (the SOR), the lake layer (Iceberg), and the agent layer. The warehouse becomes one source the unified data plane queries across, rather than the gravitational centre of the architecture.

What the unified data plane actually delivers

Once operational SOR, Iceberg lake, and customer-choice warehouse are presented as one unified data plane, three things become operationally possible that were not before.

Cross-system queries in seconds, not days

The class of question that historically required a quarterly data engineering project — “how does our supplier risk correlate with our SLA performance over the last two years, segmented by region and adjusted for currency moves?” — becomes a question an agent answers in seconds against the unified data plane. The friction that previously limited these questions to once a quarter rather than once a day is gone.

Operational and historical context in one query

Agents reason across the live operational state and the historical context simultaneously. The AP agent processing today's invoice queries the supplier's payment history from the warehouse and the supplier's open POs from the operational SOR in one decision. There is no batch lag, no analytics delay, no out-of-date dashboard.
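
The shape of that single decision can be sketched with hypothetical data and function names (a real unified data plane would federate these sources behind one query rather than two dictionaries):

```python
# Toy sketch of one agent decision over both planes. All names and the
# approval policy are hypothetical, for illustration only.
warehouse_payment_history = {  # historical: supplier payments (warehouse)
    "Acme": [{"invoice": "INV-9", "days_late": 0},
             {"invoice": "INV-12", "days_late": 31}],
}
operational_open_pos = {       # live: open POs (operational SOR)
    "Acme": [{"po": "PO-77", "amount": 12_000}],
}

def approve_invoice(supplier, amount):
    """One decision over live and historical context -- no batch lag."""
    history = warehouse_payment_history.get(supplier, [])
    open_pos = operational_open_pos.get(supplier, [])
    late = sum(1 for p in history if p["days_late"] > 30)
    exposure = amount + sum(p["amount"] for p in open_pos)
    # Hypothetical policy: escalate on late-payment history or high exposure.
    return "escalate" if late or exposure > 50_000 else "approve"

assert approve_invoice("Acme", 5_000) == "escalate"  # one late payment found
```

The point is not the policy, which is invented here, but that history and live state are read in the same decision step, with no dashboard or batch export in between.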

Structured plus unstructured in one reasoning step

Operational data, lake data, and the document estate (PDFs, emails, contracts) are addressable through one query plane. Agents reason across structured operational data and unstructured documents in a single step. Cross-system insights stop being a quarterly analytics initiative and become a property of every agent decision.

What this means for the modernisation business case

Three concrete implications for the data layer line in the modernisation business case.

1. The warehouse migration is no longer required

If the existing warehouse works, keep it. Connect it to the unified data plane. Modernise around it, not through it. The capex saved on warehouse migration is significant; the strategic optionality of customer-choice warehouse is even more significant.

2. The lake layer becomes strategic, not tactical

Iceberg in the lake is no longer just a data engineering choice. It is the protection of the next decade of optionality. Most modernisation programmes underweight this; CIOs and CDOs who set the architecture in 2026 will see the dividends through 2030 and beyond.

3. The agent workforce depends on it

This is the strategic point. Without a unified data plane, the agent workforce cannot reach L4 and L5. The data layer choice is not a separate decision from the agent workforce decision. It is the precondition for it.

What to do this quarter

📋Step 1: Audit your current data layer. Where is the operational data? Where is the historical estate? What format? Which compute engine has access?

Step 2: Evaluate the Iceberg adoption case for the lake layer. Most enterprises should be on Iceberg by 2027 regardless of what they buy from anyone.

Step 3: Resist warehouse migration unless there is a specific operational reason. The customer-choice warehouse strategy means most enterprises do not need to migrate the existing warehouse to deploy an agent workforce.

Step 4: Insist on the unified data plane in any modernisation evaluation. Operational, lake, and warehouse must be queryable as one. If the platform cannot deliver this, agent-driven outcomes will not be reachable on it.

Step 5: Watch the lock-in. Every closed table format, every proprietary export, every “data leaves only through us” provision is a future modernisation programme that will be more expensive than the one you are planning today.

Closing

The data layer used to be a multi-year decision with slow consequences. It is now a quarterly decision with compounding consequences. The enterprises that pick open standards and customer-choice warehouse architecture in 2026 will reach L4 and L5 in a way the enterprises locked into proprietary table formats simply cannot.

The next post in this series moves up to the workflow layer — where BPM and RPA are now legacy and Agentic Process Orchestration is what replaces both.

Pick open standards. Keep your options open. Build for L5 from the data layer up.

Open Data Foundation

VoltusWave runs natively on StarRocks plus Apache Iceberg — or connects to your existing BigQuery, Microsoft Fabric, or Snowflake investment. The unified data plane lets agents reason across operational, lake, and warehouse data in one query. No vendor lock-in. No forced migration.

Continue the series

Legacy Modernisation · 10 posts

POST 01
Legacy Modernisation Is Dead. AI-Native Has Replaced It.
The pillar that establishes the new definition of modernisation in the AI era.
POST 02
The Two Paths of Modernisation: Build, or Layer
Every modernisation decision in 2026 reduces to two paths.
POST 03
Why Rip-and-Replace Is the Wrong Question in the AI Era
The question that made sense in 2015 is now a category error.
POST 04
Modernising SAP Without Touching SAP
Path B in detail: the deadline that doesn't apply to you.
POST 05
From Legacy Custom-Built ERP to AI-Native SOR in 90 Days
Path A in detail: the metadata-driven shortcut to a new SOR.
POST 06
The Integration Tax: Why iHub Replaces 18 Months of Connector Work
The line item that quietly consumes 40–60% of every modernisation budget.
POST 08
Modernising Workflows with Agentic Process Orchestration
BPM is legacy. RPA is legacy. Agentic Process Orchestration replaces both.
POST 09
The CFO Case: AaaS Modernisation vs CapEx Modernisation
Modernisation budget is moving from capex to opex.
POST 10
Governed Modernisation: Boards and Regulators in the AI Era
Governance is now the single biggest determinant of modernisation success.