Trusted Execution Environment (TEE) in Polyhedra

Polyhedra is adding an additional security layer to our bridge, oracles, and verifiable AI marketplace, leveraging Google Confidential Computing, a trusted execution environment (TEE). We have conducted an extensive study of existing TEE solutions, implemented our TEE security layer with Google Confidential Space, and verified our ZK-TEE proofs—a novel approach to verifying computation on Google Cloud on EVM chains.

This new security layer is gradually rolling out to Polyhedra's ZK-based products, across our cross-chain interoperability systems on multiple chains, and Polyhedra will also bring TEE proofs and TEE-secured AI applications to our EXPchain through a native precompile.

What is TEE?

TEE stands for trusted execution environment. It is a CPU technology that enables the CPU to perform computation on encrypted and integrity-protected memory, so that even the owners of the machines (such as Google Cloud), the operating system, and other virtual machines running on the same hypervisor cannot see any of that data.

TEE is prevalent today. Apple products have full-disk encryption by default, called “Data Protection”, which relies on the TEE in Apple chips. Until a user successfully unlocks the device, such as with a fingerprint or password, all data on the device remains encrypted, including the passwords and passkeys that the user stores on it. Recent versions of Microsoft Windows also offer TEE-protected full-disk encryption in BitLocker, so that the disk is only accessible when the operating system and boot sequence remain secure.

Our vision with TEE

Since last year, Polyhedra has been focusing on the security, safety, and verifiability of cross-chain interoperability and AI. There are several products in the works, a few of which have been announced. In general, Polyhedra has been focusing on three key areas:

  • Cross-chain bridges
  • ZKML and verifiable AI agents
  • Verifiable AI marketplace, including MCP servers

Security is Polyhedra's top priority and the very reason the founding team came together to create Polyhedra Network. ZK proofs that verify the underlying consensus—including the full consensus of Ethereum—have been achieved through deVirgo, and many of the chains that Polyhedra zkBridge supports, which use BFT-style PoS, are even easier to handle.

However, we see the need to add TEE to our product line to offer a better user experience: lower cost, faster finality, non-blockchain interoperability, and privacy.

Lower cost

Polyhedra has been working on various approaches to lower the cost of bridges. This cost arises from proving and from verifying the ZK proof on the destination chain, which varies by chain—Ethereum tends to be the most expensive. The chief approach that Polyhedra has used in production to lower this cost is batching, in which “block synchronization”, a fundamental step in our ZK bridge, is performed every few blocks, with the batch size dynamically adjusted based on demand.
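As a back-of-the-envelope illustration of why batching helps (all numbers here are hypothetical, not Polyhedra's actual fee schedule), the fixed on-chain verification cost is amortized across every transfer in a batch:

```python
# Hypothetical illustration: amortizing a fixed proof-verification cost
# across the transfers in a batch. All numbers are made up.

def per_transfer_cost(verify_cost_usd: float, batch_size: int) -> float:
    """Fixed verification cost split evenly across a batch of transfers."""
    if batch_size < 1:
        raise ValueError("batch must contain at least one transfer")
    return verify_cost_usd / batch_size

# A busy hour: 50 transfers share one proof's verification cost.
busy = per_transfer_cost(20.0, 50)   # $0.40 per transfer

# An "idle hour": a single user bears the whole cost.
idle = per_transfer_cost(20.0, 1)    # $20.00 per transfer

print(f"busy: ${busy:.2f}, idle: ${idle:.2f}")
```

The gap between the two cases is exactly the idle-hour problem discussed next.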

However, there are always “idle hours” on certain chains, when a user is the only one sending tokens during a given five-to-ten-minute window. To make sure that the user doesn't wait forever, zkBridge must consider the block ready, produce its proof, and get it verified on the other chain. This cost is either charged directly to this user's transaction or amortized across other users' cross-chain transaction fees.

For larger transfers across chains, this cost may be unavoidable for security. But it can seem unnecessary for small transfers, where the bridge operator (here, Polyhedra) can simply take the risk and front the capital from Polyhedra's own liquidity, up to a certain limit.

Beyond zkBridge, Polyhedra has also been working to lower the cost of zkML, on both the proving and the verifying side. The Expander library—our flagship proving stack, also used by other teams for ZK machine learning—has made substantial progress, including vectorized and parallel computation and GPU acceleration. The proof system has also received multiple improvements that reduce proving cost. EXPchain is deploying a precompile for verifying zkML proofs, expected in the next testnet, so proofs can be efficiently verified on EXPchain (and many other chains in the future).

However, although Polyhedra can prove Llama with 8 billion parameters, proof generation is not “instant”. When a user uses a larger model with even more parameters, or models for image or video generation, the proof can still be generated, but it takes fairly long. We have therefore been focusing on AI agents that run models in an optimistic manner: a user who suffers a loss due to malicious execution can use a ZKML proof to slash the operator for compensation covering the loss, and the proof is not required unless the user challenges. The proving cost is acceptable in this model. The cost falls mostly on the operator, who needs to lock up enough capital to back the insurance—capital that could otherwise be deployed elsewhere, such as in investments.

Therefore, it is useful to give users who run very large models, need additional security assurances, or simply want lower cost an option to enjoy the safety benefits of on-chain AI agents, including trading bots. Having another layer of security, such as TEE, can also reduce the cost of the insurance, since breaking the system then also requires breaking TEE.

Fast finality

Another area Polyhedra has been working on is fast finality. This mostly concerns rollups that take a long time to settle on Ethereum L1, while zkBridge's consensus proof has to “see” things on Ethereum L1 to inherit Ethereum's security guarantees.

This immediately becomes an issue for optimistic rollups, since they typically have a withdrawal period of 7 days, which is unacceptable for most users. For zkRollups it can also be an issue, as many rollups settle transactions to L1 only every 30 minutes to 10–12 hours.

To tackle this problem, Polyhedra uses a state committee, aggregated using ZK proofs, for cross-chain interoperability with Arbitrum and Optimism. The same technology has also been deployed for opBNB. This approach uses several machines, each running a node for these rollups, mainly to acquire the latest blocks from the rollups' official RPC APIs; RPC diversity is used where available for security and liveness. Each machine produces a signature on the “events” from the bridge contract that should be sent to other chains, and the signatures are aggregated into one single proof to be verified on-chain. Signature aggregation was chosen to support a large number of nodes.
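To illustrate the structure (this is not Polyhedra's actual implementation), a state committee check boils down to verifying that a quorum of committee members signed the same event digest. The sketch below uses HMAC as a stand-in for real committee signatures, and the step that aggregates the signatures into one ZK proof is elided:

```python
import hashlib
import hmac

# Toy m-of-n committee check. HMAC stands in for real signatures,
# and the ZK aggregation step is omitted -- structural sketch only.

COMMITTEE_KEYS = {f"node{i}": f"secret{i}".encode() for i in range(5)}
QUORUM = 3  # hypothetical threshold

def sign(node: str, event: bytes) -> bytes:
    return hmac.new(COMMITTEE_KEYS[node], event, hashlib.sha256).digest()

def quorum_reached(event: bytes, sigs: dict) -> bool:
    """Count valid signatures from known committee members."""
    valid = sum(
        1 for node, sig in sigs.items()
        if node in COMMITTEE_KEYS
        and hmac.compare_digest(sig, sign(node, event))
    )
    return valid >= QUORUM

event = hashlib.sha256(b"bridge event: transfer #42").digest()
sigs = {n: sign(n, event) for n in ["node0", "node1", "node2"]}
print(quorum_reached(event, sigs))  # True: 3 valid signatures
```

In production the on-chain verifier sees only one aggregated proof rather than individual signatures, which is what makes a large committee affordable.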

The state committee has been in production for about one year. However, the state committee—ZK-aggregated signatures—doesn't provide the same level of security guarantees as a ZK proof of full consensus, and therefore its fast finality mechanism has to be restricted to small-amount token transfers. For larger amounts, we generally advise users to use the official L2-to-L1 bridge.

Fast finality for ZKML is also desirable, especially when instant execution is required, as with AI trading bots. This is why we are considering TEE as a solution for our verifiable AI stack, where the AI inference comes from a machine with TEE. We can either leverage the model garden in Google's Vertex AI for generative models, proving that the model output is the direct result of a Vertex AI API call, or use TEE to prove that the model output is the result of the official ChatGPT or DeepSeek API services. This requires trusting Google, OpenAI, or DeepSeek, but we feel this is a mild assumption to make.

If someone wants to bring their own model, we can run it on a TEE-enabled Nvidia GPU instance, recently supported on Google Cloud. As mentioned, this can be layered on top of ZKML proofs, which are generated either when challenged or with a delay. For example, if we have an insurance policy for AI trading bots or AI agents, the operator can generate the ZKML proofs before the policy reaches its cap to “free up the security deposit”, thereby allowing the AI trading bots to process more transactions within the policy limit.

Non-blockchain interoperability 

Polyhedra has constantly been exploring ZK technologies for non-blockchain applications, such as proof of reserve, which produces a proof over the database of a centralized exchange (CEX) with privacy guarantees. We have also been looking into interoperability between blockchain and non-blockchain systems—for example, price oracles for stocks, gold, and silver, which can be useful for AI trading bots and Real-World Assets (RWA), and on-chain identities from social logins, such as Google sign-in and Auth0 sign-in.

There are two types of verifiable data. One is JSON Web Token (JWT) authenticated data, which can either be directly verified on EVM (with a somewhat high gas cost) or wrapped in a ZK proof—the approach that Polyhedra adopts. The other is data transmitted over TLS connections. We can use ZK-TLS to prove data in such TLS connections, but it introduces the assumption that one trusts the MPC nodes sharing the secrets of that TLS connection, and ZK-TLS performance is currently only suitable for simple web data such as API call responses; the cost grows for complex HTML websites and PDF files.
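For intuition, a JWT is just three base64url segments (header, payload, signature). The sketch below decodes the claims using only the standard library and deliberately skips signature verification, which a real verifier (on EVM, or inside a ZK circuit) must of course perform against the issuer's public key:

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    # base64url without padding, as JWTs use (RFC 7515)
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def decode_jwt_claims(token: str) -> dict:
    """Parse a JWT's payload. NOTE: does NOT verify the signature."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))

# Build a toy token to decode (fake signature, for illustration only).
toy = ".".join([
    b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps({"iss": "https://accounts.google.com",
                              "sub": "user-123"}).encode()),
    b64url_encode(b"signature-bytes-here"),
])
print(decode_jwt_claims(toy)["iss"])  # https://accounts.google.com
```

The structured, compact format of a JWT is what makes both on-chain verification and ZK wrapping tractable, compared to arbitrary TLS payloads.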

This is where TEE—and more specifically, our ZK-TEE proofs—can weigh in. We can run a TLS client inside TEE, perform the operations, produce a TEE proof through Google Confidential Computing, and finally convert it into a ZK-TEE proof to be posted on-chain.

Since this TLS client is general-purpose and runs efficiently, it can handle almost arbitrary TLS connections. For example, it can:

  • Access Nasdaq websites for stock prices
  • Operate, on behalf of the user, a trading account to buy and sell stocks
  • Transfer fiat currency through online banking, effectively “bridge with your bank account”
  • Search and book flights and hotels for a user
  • Get real-time crypto prices from multiple CEX and DEX

Non-blockchain interoperability also has significant value for AI applications. Today, LLMs not only use inputs from users, but also use search engines and, more generally, LangGraph and the Model Context Protocol (MCP). TEE can enable us to verify that these external data sources are trustworthy. For example, if an AI agent needs to solve a math problem, it can run applications (such as Wolfram Mathematica) or invoke a remote API service (such as the Wolfram Alpha APIs).

Privacy

Currently, zkBridge doesn’t connect to a privacy chain, and the ZK proofs in zkBridge are primarily for security. Privacy, however, is quickly becoming an important feature for AI applications, including on-chain ones like AI agents and AI trading bots. There are several example use cases that we consider.

A key use case for ZKML is proving correct inference on private models. These models keep their parameters private, and users are not expected to know them. Sometimes the layout or structure of the model, which is usually a business secret, also remains hidden. Private models are very common: ChatGPT from OpenAI is private, Claude from Anthropic is private, and Gemini from Google is private. There are good reasons why many of the most advanced models are closed source for now—they need to earn revenue to cover training and research costs—and we expect this to remain the case for at least a few more years.

However, while private models have a reason to remain private, users may demand stronger traceability and verifiability guarantees, especially in a fully automated environment where the model output is immediately used for on-chain actions, such as buying or selling tokens, and especially if such on-chain actions involve large amounts of capital.

ZKML solves this problem by allowing private models to prove that the same model was used for benchmarking and for each inference. This is fundamental and particularly useful for AI trading bots, where users pick a model by looking at its performance over historical data (“backtesting”) and expect the same model to be used later, without needing to know the model parameters. ZKML can provide such model privacy.
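One way to picture the “same model” guarantee: both the backtesting proof and each inference proof are bound to a commitment to the model parameters, so a verifier can compare commitments without ever seeing the parameters. A minimal sketch, with a plain hash standing in for the commitment scheme a real ZKML system would use:

```python
import hashlib

# A plain hash stands in for the commitment a real ZKML proof is
# bound to. (Real schemes add randomness so the commitment is hiding
# even for low-entropy inputs.)

def commit_model(params: bytes) -> str:
    return hashlib.sha256(params).hexdigest()

model_params = b"weights of the private trading model"  # never published

backtest_commitment = commit_model(model_params)   # published with backtest
inference_commitment = commit_model(model_params)  # carried by each proof

# The verifier compares only commitments -- never the parameters.
assert backtest_commitment == inference_commitment
print("same model used for backtesting and inference")
```

Any change to the weights after backtesting would change the commitment and be caught by the verifier.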

We explore TEE because it can provide another level of privacy—user input privacy—that ZK alone cannot. ZKML requires the prover to have all the information needed to create the proof, including the model parameters and the user inputs. It is not impossible to combine ZK with MPC, but doing so for large models is prohibitively slow, as the model inference as well as the entire prover need to run inside MPC, and MPC isn't perfect either—one needs to trust that the MPC nodes don't collude. This, however, can be solved with TEE.

TEE also helps with privacy on MCP servers. The verifiable MCP marketplace, which Polyhedra is actively working on, will showcase a list of MCP servers that provide verifiability, traceability, and safety through either ZKP or TEE. When the model is evaluated in Proof Cloud with TEE, and the model uses only the verifiable MCP services in Polyhedra's marketplace that carry a “privacy” tag, we can guarantee that the user inputs always remain in TEE and are never revealed.

How does TEE work?

We have already discussed the vision of Polyhedra and how TEE can play an important role, together with zero-knowledge proofs, in our line of product offerings. Now, we dive into more details about TEE. 

TEE builds an “enclave” where computation and data are fully protected. This, however, is just the first step. The most remarkable property of TEE is public verifiability, through a mechanism called remote attestation.

Remote attestation works as follows. When an enclave is created, the CPU takes a measurement of the enclave, which includes the binary code of the executable programs in the enclave. The CPU can later generate a publicly verifiable proof, through the AMD Key Distribution Service (KDS) or Intel's Attestation Service (IAS).

This publicly verifiable proof contains a signature and a chain of certificates, where the root certificate is AMD’s or Intel’s. If the chain validates against the root certificate, we can be certain that the computation was done in such an enclave on AMD or Intel chips with TEE technology. We can then check the content of the signed report, which contains information about the program as well as other data (such as model outputs). By verifying this proof, one can check whether the program running inside the enclave is the one we expect.
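Schematically, verification involves two checks: the certificate chain roots in the vendor's key, and the measurement inside the signed report equals the hash of the binary we expect. The sketch below mocks both steps (no real X.509 parsing or vendor signatures) purely to show the control flow:

```python
import hashlib

# Structural sketch of attestation verification. The certificate-chain
# check is mocked; a real verifier parses X.509 certificates and checks
# each signature up to AMD's or Intel's root certificate.

TRUSTED_ROOTS = {"amd-root", "intel-root"}  # placeholder names

def chain_is_valid(cert_chain: list) -> bool:
    # Mock: a real check validates every signature in the chain.
    return bool(cert_chain) and cert_chain[-1] in TRUSTED_ROOTS

def verify_attestation(report: dict, cert_chain: list,
                       expected_binary: bytes) -> bool:
    if not chain_is_valid(cert_chain):
        return False
    expected_measurement = hashlib.sha256(expected_binary).hexdigest()
    return report["measurement"] == expected_measurement

binary = b"enclave program bytes"
report = {"measurement": hashlib.sha256(binary).hexdigest(),
          "user_data": "e.g., a digest of the model output"}
print(verify_attestation(report, ["leaf", "amd-root"], binary))  # True
```

A report over a different binary, or a chain that doesn't reach a trusted root, fails this check.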

This proof, most importantly, can be verified on-chain through our ZK-TEE proofs, by wrapping the TEE proofs into a smaller proof that can be efficiently verified. We will soon show how we use it to secure multiple products in Polyhedra, using zkBridge as an example.

SGX, SEV, and TDX

Polyhedra did an extensive study of the different TEE technologies available today. We focus on three: SGX, SEV, and TDX.

We now compare these technologies and share our thoughts on how to choose.

SGX. SGX has been available the longest, but among Google Cloud, Azure, and AWS, only Azure supports SGX, while all three support SEV and TDX.

In SGX, the Intel CPU works with the enclave—a memory area—directly. The CPU protects this memory and certifies the correctness of the binary code running inside it through REPORT, which measures the enclave. It is low-level: the CPU operates directly on this memory area, and a developer needs to make sure that the code and data (1) are in the same, reproducible state when the enclave is created and (2) serve as the root of trust and don't incorporate untrusted external data or code without verification.

This low-level nature has been unfriendly to developers for almost a decade. In the early days, enclave programs were almost exclusively written in C/C++. Not all operating system features, such as multithreading, could be supported, and significant code changes (including changes to a program's dependencies) were usually needed. As a result, in recent years, applications on SGX are often executed on a “virtualized operating system”, e.g., Gramine, but this still requires a lot of care and modification, especially if one wants better performance. Even today, Gramine sometimes requires developers to be careful because certain dependencies, including widely used Linux libraries, can break things, since Gramine isn't an exact replica of a regular OS.

Nevertheless, there is an alternative, using SEV or TDX, that avoids the implementation challenges of SGX altogether.

SEV and TDX. In contrast to SGX, which protects a small piece of memory called an enclave, SEV and TDX secure an entire virtual machine running on an untrusted host. A rationale for this design is that modern cloud services, such as Google Cloud, run (bare-metal) hypervisors that spawn compute nodes from different users on the same machine. These hypervisors extensively use CPU hardware virtualization technologies, such as Intel Virtualization Technology (Intel VT-x) or AMD Virtualization (AMD-V); software-only virtualization has long been deprecated due to poor performance.

This means that, in cloud computing, the CPU is already aware of the hypervisor and the VMs running on it. It is the CPU that provides the isolation preventing one VM from accessing another VM's data, ensures each VM gets its fair share of compute resources, and separates the network and disk accesses of different VMs. To some extent, hypervisors are increasingly becoming a software frontend, while the CPU does the actual work of managing the virtual machines.

As a result, adding an enclave over these virtual machines spawned on the cloud becomes natural and straightforward, and this is exactly what SEV and TDX are designed for.

  • They add memory encryption with integrity protection, so that even if the hypervisor is malicious, the hypervisor doesn’t see or modify the data in the virtual machines.
  • They add remote attestation to the virtual machines, specifically through the Trusted Platform Module (TPM), which measures the boot state of the virtual machine and can generate signatures attesting to such state.
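The TPM's measurements work by “extending” Platform Configuration Registers (PCRs): each boot stage folds its hash into a register as PCR ← H(PCR ‖ H(stage)), so the final value commits to the entire ordered boot sequence. A minimal sketch of this extend operation:

```python
import hashlib

# PCR extend: the TPM never overwrites a register; it folds each new
# measurement into the old value, committing to the ordered sequence.

def extend(pcr: bytes, event: bytes) -> bytes:
    return hashlib.sha256(pcr + hashlib.sha256(event).digest()).digest()

pcr = bytes(32)  # PCRs start zeroed at reset
for stage in [b"firmware", b"bootloader", b"kernel"]:
    pcr = extend(pcr, stage)

# Reordering (or changing) any stage yields a different final value.
alt = bytes(32)
for stage in [b"bootloader", b"firmware", b"kernel"]:
    alt = extend(alt, stage)
print(pcr != alt)  # True
```

This is why a signed PCR value attests to how the VM booted, but says nothing about applications launched after boot, which is exactly the pitfall discussed next.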

There is, however, an important challenge in using SEV and TDX correctly, and it has been a very common pitfall in real-world deployments: the TPM, by default, only measures the boot sequence of the virtual machine's operating system, and it doesn't capture anything about the applications running on that operating system.

There are two ways to make sure the attestation covers every application that could be executed on this operating system: (1) hardcode the application into the OS, or (2) use Google Confidential Space.

The first approach, hardcoding the application into the OS, works by making sure that the SEV or TDX VM's boot sequence boots into a security-enhanced, hardened OS that can only run the application we desire—and, importantly, nothing else. There are many ways to implement this, but we recommend Microsoft's method using dm-verity, in which, during boot, the operating system mounts a read-only disk authenticated by a fixed, public disk image hash, and mounts nothing else. Therefore, the only programs that can be executed are those on this read-only disk, and a user can verify—through AMD KDS or Intel IAS—that the OS was programmed and configured this way.
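dm-verity authenticates the disk with a hash tree over fixed-size blocks: the kernel checks every block it reads against the tree, whose root hash is baked into the measured boot configuration. A simplified sketch of computing such a root, assuming 4 KiB blocks and ignoring dm-verity's actual on-disk tree layout:

```python
import hashlib

BLOCK = 4096  # dm-verity's default block size

def merkle_root(image: bytes) -> bytes:
    """Root of a binary hash tree over the image's 4 KiB blocks
    (layout simplified relative to dm-verity's real format)."""
    level = [hashlib.sha256(image[i:i + BLOCK]).digest()
             for i in range(0, max(len(image), 1), BLOCK)]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

image = b"\x00" * (4 * BLOCK)
root = merkle_root(image)

# Flipping a single byte anywhere changes the root, so any tampering
# with the read-only disk is detected at read time.
tampered = image[:100] + b"\x01" + image[101:]
print(merkle_root(tampered) != root)  # True
```

Publishing this root hash (and measuring it into the boot state) is what lets an external verifier pin down exactly which programs the VM can run.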

The complexity of the first approach lies in structuring or refactoring your programs so that they can live on a read-only disk image; if they need temporary writable storage, they can use in-memory storage or external storage with encryption and integrity protection. Then, a clean disk image with this application, as well as the OS image, should be packaged together with the OS kernel, following the structure of a Unified Kernel Image (UKI). This is doable, but it can be a bit involved.

The second approach, which we focus on, is Google Confidential Space, a managed solution that implements the same idea as the first approach, but with a streamlined implementation from Google: developers only need to create a Docker container image. In a future blog post, we will share more details on our solution using Confidential Space, including key management.

Conclusion

Now, how does Polyhedra use TEE in our products?

For bridges, Polyhedra will run extra security checks in addition to the existing ZK proofs or state committee, either by running a light client (where available) or by checking with several authoritative RPC API services for the corresponding chains.

For ZKML, Polyhedra may run a TEE proxy that calls the Google Vertex AI API or external AI API services for inference and certifies that the model output comes from the API without modification, or directly run the AI model (if it is not in Google's model garden) using Confidential Computing with an Nvidia GPU. Note that privacy is a byproduct of this approach: we can easily hide the model parameters, inputs, and outputs.

For the verifiable AI marketplace, including MCP servers, we take a similar approach: either running a TEE proxy or running the application directly (where possible). For example, if we want a math-solving MCP service, we can either build a TEE proxy that connects to Wolfram Alpha or run a local copy of Mathematica. There are situations where one has to use a TEE proxy, such as interacting with flight ticket booking systems, Slack, or search engines. Note that TEE can also turn a non-MCP-compliant service (e.g., any web2 API) into an MCP-compliant one by translating the schema and format inside the proxy.

We are moving quickly on these products, starting with bridges, and we expect the introduction of TEE into Polyhedra's tech stack to reduce user cost, speed up finality, interoperate with more ecosystems, and provide new privacy features for users.