Summary
Many blockchain systems adopt BFT consensus protocols for their much improved transaction throughput, latency, and finality. In particular, permissioned blockchains enjoy the full benefits of BFT consensus among a small group (e.g., a few tens or hundreds) of distributed nodes. One remaining problem, though, is the limited scalability of BFT consensus protocols: many existing protocols, such as PBFT and Istanbul BFT, have message complexity that is quadratic in the number of replicas, which makes it hard for blockchains to scale beyond a few tens of consensus nodes. Some newer proposals attempt to reduce the message complexity via committee selection, threshold signatures, and trusted hardware.
In this proposal, we ask whether the current network infrastructure is properly used by the BFT consensus protocols in permissioned blockchains, and we investigate whether emerging network primitives can further improve their throughput. First, we plan to study whether the current Internet infrastructure is properly used for scalable operation of the Istanbul BFT consensus protocol in real large-scale blockchain networks like Klaytn. In particular, we will perform an in-depth evaluation of how the underlying Internet architecture of commercial blockchains affects their throughput performance. Furthermore, we aim to utilize emerging hardware-based networking primitives, such as programmable switches, to further speed up message propagation in the Istanbul BFT consensus protocol.
Key Deliverables - Technical report
For our KIR milestone #5, we submit this technical report on the progress of our evaluation of the network message complexity in the current Klaytn network. We have found that the current Klaytn implementation sends many redundant consensus messages in every round, regardless of whether round changes occur. We believe that this behavior unnecessarily increases the message complexity in Klaytn from O(n^2) to O(n^3), and we provide a detailed analysis of this problem in this report. To clearly demonstrate the problem, we empirically confirm the redundant message transfers in a controlled environment in which we decrypt and decompress all consensus messages exchanged between the consensus nodes. We close with observations and suggestions for the remaining tasks.
1 Investigating Message Broadcasts in Klaytn
1.1 Review of IBFT message complexity
Let us first review the IBFT protocol and its intended message complexity [1].
Figure 1. Intended message complexity of IBFT
Figure 1 illustrates a possible scenario of message transfers between four consensus nodes (CNs). CN1 takes the role of the proposer in this example and initiates a round with a Preprepare message. Once the Preprepare message is broadcast to CN2-CN4, Prepare messages are sent from all CNs (except the proposer) to all other CNs. Commit and RoundChange messages are sent from all CNs to all other CNs. Overall, the Preprepare message type incurs O(n) message complexity, while all other message types incur O(n^2) message complexity in IBFT.
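These counts can be tabulated directly. The sketch below is our own back-of-the-envelope model of the message flow in Figure 1 (the function name intendedCounts and the sample sizes are illustrative, not from the Klaytn codebase):

```go
package main

import "fmt"

// intendedCounts models the per-round totals implied by Figure 1: the
// proposer sends one Preprepare to each peer, every non-proposer sends
// a Prepare to every other node, and every node sends a Commit to every
// other node. (Our arithmetic model, not measured data.)
func intendedCounts(n int) (preprepare, prepare, commit int) {
	preprepare = n - 1          // proposer -> all others: O(n)
	prepare = (n - 1) * (n - 1) // non-proposers -> all others: O(n^2)
	commit = n * (n - 1)        // all nodes -> all others: O(n^2)
	return
}

func main() {
	for _, n := range []int{4, 7, 100} {
		pp, p, c := intendedCounts(n)
		fmt.Printf("n=%3d: Preprepare=%5d Prepare=%5d Commit=%5d\n", n, pp, p, c)
	}
}
```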
1.2 Empirical evidence of O(n^3) message complexity in Klaytn
To measure the message complexity in Klaytn, we set up seven CNs in a controlled environment and count the number of Prepare messages received in each round.
Figure 2. Number of received Prepare messages per round
Figure 2 shows the number of Prepare messages the CN1 node receives in each round. Based on the model in Figure 1, we expect six Prepare messages at the CN1 node per round. However, we observe that the CN1 node receives about 36 Prepare messages in most rounds. This significant gap is hard to explain with the IBFT message transfer model shown in Figure 1.
Figure 3. Distribution of redundant message events
To better understand the main cause of this huge gap between theory and practice, we take a closer look at the detailed patterns of message receptions at the CN1 node. Our conjecture is that many redundant consensus message exchanges occur between CNs, even though the original IBFT protocol [1] has no redundant messages, as shown in Figure 1 (e.g., within a single round, CN1 should send only one Prepare message to CN2). Figure 3 shows the number of cases for each number of redundant copies. Interestingly, out of the 755 unique messages we observe during one experiment with seven CN nodes, only 29 are received without any redundant copies. For the other 726 messages, the CN1 node receives one or more redundant copies. Surprisingly, in the vast majority of cases (528 out of 755), the CN1 node receives five redundant copies of an already received consensus message.
Figure 4. Actual message complexity in Klaytn’s IBFT
From the experiments above, we conclude that Klaytn's IBFT implementation somehow generates redundant consensus messages. Based on the observed number of copies (i.e., six receptions, or five redundant copies, per unique message in most cases with seven CN nodes), we further conjecture the message exchange pattern between CN nodes in Klaytn's IBFT, as shown in Figure 4. The best explanation of the behavior shown in Figures 2 and 3 is that each CN node, upon receiving a consensus message for the first time, re-broadcasts it to all other nodes that it believes have not yet received it. Figure 4 visualizes this re-broadcast of redundant messages in red. This implementation model explains our observations in Figures 2 and 3. Moreover, the redundant messages caused by this broadcast implementation easily outnumber the originally intended IBFT consensus messages.
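Under this conjectured model, the reception counts follow directly. The sketch below encodes our model of Figure 4 (assuming n-1 distinct Prepare senders per round; copiesPerMessage is an illustrative name of ours, not Klaytn code):

```go
package main

import "fmt"

// copiesPerMessage encodes our conjecture from Figure 4: a node receives
// one direct copy of each unique consensus message from its original
// sender, plus one re-broadcast copy from each of the other n-2 nodes
// that also received it for the first time. (A model, not Klaytn's code.)
func copiesPerMessage(n int) int {
	return 1 + (n - 2)
}

func main() {
	n := 7                        // CN nodes in our testbed
	copies := copiesPerMessage(n) // 6 copies per unique message
	fmt.Println("redundant copies per message:", copies-1)     // 5, the mode in Figure 3
	fmt.Println("Prepare receptions per round:", (n-1)*copies) // ~36, as in Figure 2
}
```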
1.3 SDN-based test environment
Figure 5. Overview of our SDN-based Klaytn network analysis framework
Let us briefly describe the setup for our controlled experiment with seven CNs. The seven CN nodes run on separate virtual machines (VMs), and all of their peer-to-peer sessions traverse a common software switch, Open vSwitch (OvS). The software switch is connected to a single POX controller.
The POX controller fetches all the session keys established between CNs. Whenever CNs roll over their session keys, the key information at the POX controller is updated accordingly. With the current session keys, the POX controller decrypts the consensus messages.
1.4 Decrypting RLPx messages
Figure 6. RLPx message structure
To accurately count the number of redundant messages of the same type, we decrypt and decode the received RLPx messages. Figure 6 shows the RLPx message structure. We decrypt message bodies with AES in counter (CTR) mode and decompress the resulting payloads with Snappy.
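As a concrete illustration, the following minimal sketch shows this payload path under our assumptions: the AES secret and CTR counter state are known (our POX controller tracks them), and the RLPx header and frame MACs have already been parsed and stripped. decryptFrameBody and the round-trip sample in main are hypothetical helpers, not Klaytn API:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"

	"github.com/golang/snappy"
)

// decryptFrameBody recovers a consensus message payload from an RLPx
// frame body: AES-CTR decryption followed by Snappy decompression.
func decryptFrameBody(secret, counter, body []byte) ([]byte, error) {
	block, err := aes.NewCipher(secret)
	if err != nil {
		return nil, err
	}
	plain := make([]byte, len(body))
	cipher.NewCTR(block, counter).XORKeyStream(plain, body)
	// Post-handshake devp2p payloads are Snappy-compressed.
	return snappy.Decode(nil, plain)
}

func main() {
	secret := make([]byte, 32)  // placeholder; the real key comes from the RLPx handshake
	counter := make([]byte, 16) // the RLPx CTR stream starts from a zero counter
	// Fabricate a sample ciphertext by running the same CTR stream forward.
	compressed := snappy.Encode(nil, []byte("istanbul consensus message"))
	block, _ := aes.NewCipher(secret)
	body := make([]byte, len(compressed))
	cipher.NewCTR(block, counter).XORKeyStream(body, compressed)

	plain, err := decryptFrameBody(secret, counter, body)
	fmt.Println(string(plain), err) // "istanbul consensus message" <nil>
}
```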
1.5 Code review
To better understand the broadcasting behavior in Klaytn's IBFT, we review the current Klaytn code. All messages handled in HandleMsg are forwarded to GossipSubPeer unless an error occurs. As a result, each node receives the same consensus message between one and six times, following the flowcharts in Figure 7 and Figure 8; a runnable paraphrase of this path is given after the figures.
Figure 7. HandleMsg function in klaytn/consensus/istanbul/backend/handler.go (version 1.8.3)
(Some details are omitted for clarity)
Figure 8. GossipSubPeer function in klaytn/consensus/istanbul/backend/backend.go (version 1.8.3)
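To make the flow concrete, here is a runnable paraphrase of this gossip path based on Figures 7 and 8. This is our reconstruction, not the verbatim Klaytn code: handleMsg stands in for HandleMsg, gossip for GossipSubPeer, and a FIFO queue for asynchronous network delivery. With seven nodes it reproduces the six copies per unique message that we measure:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type msgHash = [32]byte

type event struct {
	from, to int
	payload  []byte
}

type node struct {
	id       int
	n        int
	known    map[msgHash]bool         // messages this node has handled
	peerSeen map[int]map[msgHash]bool // messages each peer is believed to know
	received int                      // reception counter for the experiment
}

func newNode(id, n int) *node {
	return &node{id: id, n: n, known: map[msgHash]bool{},
		peerSeen: map[int]map[msgHash]bool{}}
}

func (nd *node) mark(peer int, h msgHash) {
	if nd.peerSeen[peer] == nil {
		nd.peerSeen[peer] = map[msgHash]bool{}
	}
	nd.peerSeen[peer][h] = true
}

// handleMsg mirrors HandleMsg: every accepted message is passed on to
// gossip unless it has been handled before.
func (nd *node) handleMsg(e event) []event {
	nd.received++
	h := sha256.Sum256(e.payload)
	nd.mark(e.from, h) // the sending peer evidently knows this message
	if nd.known[h] {
		return nil // duplicate: no further gossip
	}
	nd.known[h] = true
	return nd.gossip(h, e.payload)
}

// gossip mirrors GossipSubPeer: forward to every peer this node does not
// believe to have seen the message. A node cannot know what its peers
// received from third parties, which is where the redundancy comes from.
func (nd *node) gossip(h msgHash, payload []byte) []event {
	var out []event
	for p := 0; p < nd.n; p++ {
		if p == nd.id || nd.peerSeen[p][h] {
			continue
		}
		nd.mark(p, h)
		out = append(out, event{from: nd.id, to: p, payload: payload})
	}
	return out
}

func main() {
	const n = 7
	nodes := make([]*node, n)
	for i := range nodes {
		nodes[i] = newNode(i, n)
	}
	// CN2 (index 1) originates one Prepare message.
	payload := []byte("prepare")
	h := sha256.Sum256(payload)
	nodes[1].known[h] = true
	queue := nodes[1].gossip(h, payload)
	for len(queue) > 0 {
		e := queue[0]
		queue = append(queue[1:], nodes[e.to].handleMsg(e)...)
	}
	fmt.Printf("CN1 received %d copies of one message\n", nodes[0].received)
}
```

Running this prints six receptions at CN1 for a single logical message, i.e., five redundant copies, matching the mode in Figure 3.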
To investigate where this redundant-broadcast logic originated, we examined several open-source projects and found that it comes from Quorum 2.0, which contains HandleMsg (quorum/consensus/istanbul/backend/handler.go) and Gossip (quorum/consensus/istanbul/backend/backend.go) functions. We discover that these two functions are almost identical to the ones in the current Klaytn code.
1.6 Impact analysis
It is, in fact, unclear why this broadcasting behavior was first introduced into the IBFT codebase, since neither the Klaytn nor the Quorum source code contains any explanation for it. One possible reason we can guess is better resistance to message drops at the network layer: because every message is redundantly re-broadcast to all other nodes, the extra copies provide another layer of message propagation that makes the consensus protocol highly robust against drops.
However, the added robustness to message drops does not fully justify the cost incurred by this broadcasting behavior: the additional broadcast inflates the message complexity from O(n^2) to O(n^3). We believe that a more in-depth analysis of the tradeoff between message complexity and robustness to message drops should be conducted to fully understand the pros and cons of the consensus message broadcast behavior in Klaytn.
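The sketch below makes this inflation concrete under a rough model of per-round traffic. Our assumptions, not measured data: one Preprepare, n-1 Prepare, and n Commit broadcasts per round, each delivered n-1 times without the gossip and about (n-1)^2 times with it, as our measurements suggest:

```go
package main

import "fmt"

// perRound models one round of consensus traffic: without gossip each
// broadcast costs n-1 deliveries; with the re-broadcast each costs about
// (n-1)^2 deliveries, so the inflation factor is roughly n-1.
func perRound(n int) (intended, withGossip int) {
	broadcasts := 1 + (n - 1) + n // Preprepare + Prepare + Commit originators
	intended = broadcasts * (n - 1)
	withGossip = broadcasts * (n - 1) * (n - 1)
	return
}

func main() {
	for _, n := range []int{7, 25, 100} {
		i, g := perRound(n)
		fmt.Printf("n=%3d: intended=%8d with-gossip=%10d inflation=%3dx\n",
			n, i, g, g/i)
	}
}
```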
1.7 Remaining tasks
- Evaluate the benefit of having broadcasts
- Evaluate network bandwidth increase due to the broadcast
- Evaluate the effect of the broadcast for larger networks
References
[1] Istanbul Byzantine Fault Tolerance, ethereum/EIPs Issue #650, GitHub. https://github.com/ethereum/EIPs/issues/650