Simulator
How we simulate protocol executions.
Having discussed our model for virtual protocol executions and how we specify protocols, we can now describe the network simulator. As before, we use Python as pseudocode to convey the important details. The code snippets are optimized for readability, not speed. The real simulator is implemented in a compiled language to achieve good performance.
Inputs
The simulator takes as input the following assumptions about the network and participating nodes.
n: int
defines the number of nodes; range(n) addresses the individual nodes.

neighbours(i: int) -> int list
defines the outgoing network connections of each node. If node 42 decides to share a block, the simulator will send this block to the nodes returned by neighbours(42).

message_delay(src: int, dst: int) -> float
defines the message propagation delays. If node 42 sends a block to node 7 at time t, the simulator will deliver the block to node 7 at time t + message_delay(42, 7). The function does not have to be deterministic; it can model probability distributions. If the function returns float('inf'), the message will not be delivered.

roots and validity
define the structure of the blockchain as discussed in the section on protocol specifications. The simulator rejects invalid blocks.

node(i: int) -> tuple[init, update, mining]
defines the behaviour of nodes as discussed in the section on protocol specifications. In the simplest case, when all nodes are honest, node(i) returns the three functions init, update, and mining as defined in the protocol specification for all i. In attack scenarios, node(i) returns nonconforming functions for the malicious nodes.

mining_delay() -> float and select_miner() -> int
model proof-of-work difficulty and the hash-rates of the nodes. mining_delay() defines the time between consecutive mining events. It is typically a random function returning independent and exponentially distributed values. select_miner() defines which node is successful at a particular mining event. It is typically a random function which selects nodes based on their relative hash-rate.
import nakamoto  # protocol specification
from numpy import random

n = 7

def mining_delay():
    return random.exponential(scale=600)

def select_miner():
    hash_rates = range(1, n + 1)
    # normalize the hash-rates into a probability distribution
    p = [h / sum(hash_rates) for h in hash_rates]
    return random.choice(range(n), p=p)

def neighbours(i):
    return [x for x in range(n) if x != i]

def message_delay(src, dst):
    return 6

def roots():
    return nakamoto.roots()

def validity(block):
    return nakamoto.validity(block)

def node(i):
    return (nakamoto.init, nakamoto.update, nakamoto.mining)
n = 7 nodes following the Nakamoto consensus protocol. All messages are delayed for 6 seconds. The block interval will be about 10 minutes. Hash-rates are chosen arbitrarily.

Concurrent Programming
We describe the simulator as a concurrent program. The program does not use parallelism—all computations happen sequentially in a single thread. We introduce two primitives to delay function evaluation.
delay(n, fun, *args)
schedules the evaluation of fun(*args) in n seconds. While waiting, other computations may happen concurrently.

delay_until(prop, fun, *args)
schedules the evaluation of fun(*args) as soon as prop() returns True. While prop() returns False, other computations may take place concurrently.
Consider the following example program which prints the letters a to f in order. We invite the curious reader to take a look at the complete program, including an implementation of the two primitives delay and delay_until using Python's asyncio.
import time

def f(x):
    f.count += 1
    t = time.time() - f.start
    print(f"at second {t:.0f}: {x}")

f.start = time.time()
f.count = 0

f("a")
delay(2, f, "d")
delay(0, f, "c")
delay(1, delay, 2, f, "f")
delay_until(lambda: f.count >= 4, f, "e")
f("b")

## expected output:
# at second 0: a
# at second 0: b
# at second 0: c
# at second 2: d
# at second 2: e
# at second 3: f
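The complete program linked above is not reproduced here. Purely for illustration, and as an assumption about one possible implementation rather than the actual one, the two primitives could be built on Python's asyncio roughly like this; the _tasks list, the run helper, and the 0.01 second polling interval are our own choices.

import asyncio

_tasks = []  # futures for all scheduled calls

def delay(seconds, fun, *args):
    # evaluate fun(*args) after the given delay
    async def task():
        await asyncio.sleep(seconds)
        fun(*args)
    _tasks.append(asyncio.ensure_future(task()))

def delay_until(prop, fun, *args):
    # poll prop() and evaluate fun(*args) once it returns True
    async def task():
        while not prop():
            await asyncio.sleep(0.01)
        fun(*args)
    _tasks.append(asyncio.ensure_future(task()))

async def run(example):
    example()
    # drain scheduled calls, including those scheduled by other calls
    while _tasks:
        pending = list(_tasks)
        _tasks.clear()
        await asyncio.gather(*pending)

Wrapping the example above in a function example() and calling asyncio.run(run(example)) would then reproduce the expected output, with each delayed call spending real wall-clock time waiting.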
Block DAG and Visibility
As discussed before, the simulator maintains a DAG of all blocks mined or appended by any node. Individual nodes, however, have only a partial view of this DAG. In our program, we set local views as follows.
dag = DAG()

...
# in this context dag represents the global view on _all_ blocks

with dag.set_view(i):
    ...
    # in this context visibility of parents and children is restricted
    # according to the local view of node i

...
# in this context dag represents the global view on _all_ blocks
Local views are read-only. The node functions init, mining, and update cannot append blocks to the DAG. Instead, they return requests to append blocks to the simulator. We will describe below how the simulator handles these requests.

The local view also makes the node's own id accessible to it by setting my_id = i within the context.
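The DAG container itself is not defined in this section. Purely as a sketch of our own, assuming a global block list with per-node visibility sets, its interface could look roughly as follows; the filtering of parents and children inside set_view is only hinted at here.

from contextlib import contextmanager

my_id = None  # exposed to the node while a local view is active

class Block:
    def __init__(self, draft):
        self.draft = draft
        self.has_pow = False

    def parents(self):
        # simplification: assume the draft lists its parent blocks
        return getattr(self.draft, "parents", [])

class DAG:
    def __init__(self):
        self.blocks = []
        self.visibility = {}  # block -> set of node ids that can see it

    def add(self, draft):
        block = Block(draft)
        self.blocks.append(block)
        self.visibility[block] = set()
        return block

    def remove(self, block):
        self.blocks.remove(block)
        del self.visibility[block]

    def make_visible(self, block, i):
        self.visibility[block].add(i)

    def is_visible(self, block, i):
        return i in self.visibility[block]

    @contextmanager
    def set_view(self, i):
        # in the real simulator, parent and child accessors are filtered to
        # the blocks visible to node i while this context is active
        global my_id
        my_id = i
        try:
            yield self
        finally:
            my_id = None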
Initialization
We start each simulation with the initialization of the block DAG,
the protocol’s root
blocks, and the nodes’ state.
# initialize empty block DAG
dag = DAG()

# obtain drafts for the protocol's root blocks, convert them into
# actual blocks, and make them visible to all nodes.
root_blocks = [dag.add(b) for b in roots()]
for b in root_blocks:
    for i in range(n):
        dag.make_visible(b, i)

# nodes' state
state = [None for _ in range(n)]
for i in range(n):
    init, _update, _mining = node(i)
    with dag.set_view(i):
        state[i] = init(root_blocks)
Proof-of-Work
After initialization, the simulator starts a concurrent loop which models proof-of-work and mining. The loop repeatedly samples and waits for a mining delay, selects a random node as the successful miner, obtains a block draft from this miner, converts the draft into a block, checks block validity, and informs the miner about the freshly mined block.
def proof_of_work():
    # select random miner
    i = select_miner()
    # obtain block draft from miner
    _init, _update, mining = node(i)
    with dag.set_view(i):
        draft = mining()
    # convert draft to block w/ proof-of-work
    block = dag.add(draft)
    block.has_pow = True
    # enforce validity of all appended blocks
    if validity(block):
        # inform miner about freshly mined block
        deliver(block, i, "proof-of-work")
    else:
        dag.remove(block)
    # continue proof-of-work loop after random delay
    delay(mining_delay(), proof_of_work)

# enter proof-of-work loop
delay(mining_delay(), proof_of_work)
Block Delivery
Block delivery is what we call informing a node about a new block. From the perspective of a node, new blocks might be freshly mined locally, just appended locally, or received from the network.
We handle these deliveries in the deliver
function. The function makes
the block visible to the node, applies the node’s update
function, and
handles the returned instructions to share existing or append new
blocks.
def deliver(block, i, event):
    # update local visibility
    dag.make_visible(block, i)
    # simulate node i's action
    _init, update, _mining = node(i)
    with dag.set_view(i):
        ret = update(state[i], block, event)
    # store updated state
    state[i] = ret.state
    # handle communication
    for msg in ret.share:
        broadcast(i, msg)
    # handle appends w/o proof-of-work
    for draft in ret.append:
        append(i, draft)
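The exact shape of the value returned by update is fixed in the protocol specification; here we only rely on its state, share, and append fields. Purely as an assumption matching that usage, it could be a simple namedtuple:

from collections import namedtuple

# hypothetical container matching the fields accessed in deliver()
UpdateResult = namedtuple("UpdateResult", ["state", "share", "append"])

# an honest node might, for instance, return
# UpdateResult(state=new_state, share=[block], append=[])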
The protocol specification
assumes that blocks are delivered in order, that is, that the parents of
a delivered block have been delivered before. For our previous use of
deliver
in the proof-of-work loop, this invariant is trivially true.
The following wrapper ensures in-order delivery for messages received
from the network.
def deliver_in_order(block, i, event):
    def deliver_once(block, i, event):
        if not dag.is_visible(block, i):
            deliver(block, i, event)

    def all_parents_visible():
        for parent in block.parents():
            if not dag.is_visible(parent, i):
                return False
        return True

    delay_until(all_parents_visible, deliver_once, block, i, event)
Communication
Nodes may request block broadcasts from their update function. The simulator's deliver function forwards these requests to the broadcast function below. For each of the sending node's (src) neighbours, this function samples a message delay (t) and, after waiting for that delay, delivers the block to the receiver (dst).
def broadcast(src, block):
    for dst in neighbours(src):
        t = message_delay(src, dst)
        delay(t, deliver_in_order, block, dst, "network")
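As noted in the inputs section, message_delay does not have to be deterministic. Purely as an illustrative assumption (this is not the configuration used in the example above), a randomized latency could be modelled like this:

from numpy import random

def message_delay(src, dst):
    # exponentially distributed latency with a mean of 6 seconds;
    # returning float('inf') instead would drop the message entirely
    return random.exponential(scale=6)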
Non-PoW Blocks
Nodes may request block appends from their update function. The simulator's deliver function forwards these requests to the append function below. Without delay, the function appends the block draft to the DAG, ensures validity, and delivers the new block to the requesting node. Blocks originating from this mechanism do not carry a proof-of-work.
def append(i, draft):
    block = dag.add(draft)
    block.has_pow = False
    if validity(block):
        deliver(block, i, "append")
    else:
        dag.remove(block)
Discrete Event Simulation
Above, we suggested using concurrent programming to implement the scheduling primitives delay and delay_until. If implemented naively, the simulator would spend most of its time waiting for delays to pass or checking whether properties have become true. A delay of n seconds would actually take n seconds to simulate. Simulations would run in real-time, but real-time is inconveniently slow in the proof-of-work blockchain context.
We speed up the simulation by skipping the delayed time instead of waiting for it to pass. The approach is called discrete-event simulation. The trick is to maintain a time-ordered queue of future events and define a loop that handles the scheduled events one after another until none are left. In the context of this simulator, future events are delayed function calls. Handling an event is as simple as evaluating the call. Each call may schedule further delayed calls. The time-ordering of the queue ensures that the calls happen in the same order as they would with naive waiting.
from itertools import count
from queue import PriorityQueue

time = 0
queue = PriorityQueue()
props = []
tie_breaker = count()  # avoids comparing functions when events share a time

def delay(seconds, fun, *args):
    queue.put((time + seconds, next(tie_breaker), fun, args))

def delay_until(prop, fun, *args):
    props.append((prop, fun, args))

# simulator code goes here
...

while not queue.empty():
    # dequeue and handle next event
    time, _, fun, args = queue.get()
    fun(*args)
    # re-evaluate delay_until properties
    next_props = []
    for prop, fun, args in props:
        if prop():
            fun(*args)
        else:
            next_props.append((prop, fun, args))
    props = next_props
Note that using delay_until
can cause significant overhead. We
introduced this scheduling primitive to improve readability of the
simulator pseudo-code. We completely avoid this kind of scheduling in
our optimized simulator implementation.