The Solana blockchain’s 17-hour outage, which left the network unable to validate transactions, exposes critical vulnerabilities in the way Solana counts and handles transactions, Douglas Horn, Chief Architect of Telos Blockchain, said on Tuesday.
Solana stated on its official website that although the network was offline for 17 hours, no funds were lost and the network returned to full functionality in under 24 hours. It added that Solana is designed for adversarial conditions.
“The cause of the network stall was, in effect, a denial of service attack. At 12:00 UTC, Grape Protocol launched their IDO on Raydium, and bots generated transactions that flooded the network. These transactions created a memory overflow, which caused many validators to crash forcing the network to slow down and eventually stall. The network went offline when the validator network could not come to an agreement on the current state of the blockchain, which prevented the network from confirming new blocks,” Solana stated.
Solana added that at 12:11 UTC the validator community noticed the transaction spike and network slowdown and took steps to help the network recover, but those efforts were unsuccessful. “These transactions flooded a system known as the forwarder queue, causing the memory used by this queue to grow without limits. The transactions that were encoded into blocks were resource-heavy to process. The combination of the unbounded growth of the forwarder queues and resource-heavy blocks caused block producers to automatically propose a number of forks. The validator processes started to run out of memory and crash, and upon restart, the validators were unable to process all the proposed forks in time to catch back up with the rest of the network,” Solana stated.
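The failure mode Solana describes, a queue whose memory grows without limit under a flood, can be illustrated with a minimal sketch. This is not Solana's code: the class name `BoundedForwarderQueue`, the `capacity` parameter, and the drop-on-full policy are all assumptions made for illustration. The sketch shows one common mitigation, which is to cap the queue and reject new items once the cap is reached, so that memory use stays bounded no matter how many transactions arrive.

```python
from collections import deque


class BoundedForwarderQueue:
    """Hypothetical bounded queue (illustrative only, not Solana's implementation).

    Once `capacity` items are waiting, new items are rejected and counted
    instead of being stored, so memory use cannot grow without limit.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = deque()
        self.dropped = 0  # number of transactions rejected because the queue was full

    def offer(self, tx):
        """Try to enqueue a transaction; return False if the queue is full."""
        if len(self._items) >= self.capacity:
            self.dropped += 1
            return False
        self._items.append(tx)
        return True

    def poll(self):
        """Remove and return the oldest transaction, or None if empty."""
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


def simulate_flood(queue, incoming):
    """Offer `incoming` synthetic transactions; return (queued, dropped)."""
    for i in range(incoming):
        queue.offer(f"tx-{i}")
    return len(queue), queue.dropped


if __name__ == "__main__":
    q = BoundedForwarderQueue(capacity=1000)
    queued, dropped = simulate_flood(q, incoming=5000)
    print(f"queued={queued} dropped={dropped}")
```

Dropping excess transactions is only one possible policy; a real system might instead apply back-pressure to senders or evict the lowest-value entries.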
Reacting to Solana’s statement, Horn said that well-designed blockchains should never stall, and that a closer look at Solana’s claimed transactions per second (TPS) exposes the key flaws in its design.
“A great portion of Solana transactions is not the user or smart contract transactions that networks like Ethereum (15 TPS) and Telos (10,000 TPS) process but instead include Solana’s thousands of critical consensus messages required by the chain. These processes are typically handled separately from on-chain transactions via a distinct communications channel--for good reason. This differing design results in seemingly amazing scalability claims, which are entirely misleading. It's this design that forms a large part of why the Solana chain was locked up for over 12 hrs,” Horn said.
He added that mixing critical consensus messaging with regular transactions not only inflates TPS numbers but, more critically, exposes a large attack surface on the blockchain.
“Also, the chain's lack of prioritization capabilities means when the Solana transaction queue becomes flooded, critical consensus message processing is displaced causing a lack of syncing between nodes and eventually the forking that resulted in a stalled network. The best avoidance of this and other potential issues would be not to have these transactions mixed together. Nonetheless, prioritization of transactions was a factor yesterday that could have helped avoid Solana stalling from this specific exploit,” Horn said.
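Horn's point about prioritization can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not a description of how Solana or Telos works: the message classes `CONSENSUS` and `USER`, the `PrioritizedQueue` class, and the per-slot `budget` are hypothetical names. It shows how a priority queue lets consensus messages be processed first even when they arrive behind a flood of ordinary transactions. Horn's preferred remedy, carrying consensus traffic on a separate channel, is a different design that this sketch does not cover.

```python
import heapq
import itertools

# Hypothetical message classes: a lower number means higher priority.
CONSENSUS = 0
USER = 1


class PrioritizedQueue:
    """Illustrative priority queue: consensus messages are always served first.

    Within one class, messages are served in arrival order (FIFO).
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, kind, payload):
        heapq.heappush(self._heap, (kind, next(self._counter), payload))

    def pop(self):
        kind, _, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


def process_slot(queue, budget):
    """Process at most `budget` messages and return them in processing order."""
    processed = []
    while budget > 0 and len(queue) > 0:
        processed.append(queue.pop())
        budget -= 1
    return processed


if __name__ == "__main__":
    q = PrioritizedQueue()
    # A flood of user transactions arrives first...
    for i in range(100):
        q.push(USER, f"tx-{i}")
    # ...followed by a few consensus messages.
    for i in range(3):
        q.push(CONSENSUS, f"vote-{i}")
    # With a small processing budget, the consensus messages still go first.
    for kind, payload in process_slot(q, budget=5):
        print(kind, payload)
```

With a plain first-in, first-out queue, the three consensus messages in this scenario would wait behind all 100 user transactions.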
Horn further said that Solana’s fee model appears to be “immature in design”, making it possible for someone to cheaply flood the chain with transactions and block critical consensus messaging from occurring in time. “Again, separation of concerns is the most secure way to avoid this but allowing so many transactions to pass through without significant cost was a huge contributor,” he said.