
The power of ... tc

How to simulate network latency and packet loss without purchasing expensive network hardware.

Greg Smith


This is a short and hopefully useful blog post on adding network latency and packet loss to nodes in a small local test environment, so that you can observe their effect. The post assumes that you already have a set of nodes running in a test environment and focuses on introducing simulated latency and other network phenomena using the Unix ‘tc’ tool.

Check out my other posts on Ansible for some great playbooks that I have created for spinning up infrastructure and databases in AWS.

The ‘tc’ utility

For simplicity I am using Ubuntu for the following steps, but the same approach will work on your distro of choice. The ‘tc’ utility is extremely powerful, but also quite daunting. In this section I will scratch the surface of what is possible with some examples that I use to simulate typical network issues. At a high level, tc (Traffic Control) lets you attach various kinds of throttles and scheduling policies to the kernel packet scheduler.
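Before adding anything, it is worth checking what queueing discipline is currently attached to the interface. On a fresh instance this will typically just show the distribution default.

tc qdisc show dev eth0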

Simulating network latency

Most instances only have a single network interface (eth0), and these first examples operate on that interface. The following commands are all executed as root (thanks to sudo).

The following command will add 250ms of latency with 40ms of jitter in a normal distribution on the eth0 interface. The last 3 arguments are optional, but demonstrate some of the power of the tool.

tc qdisc add dev eth0 root netem delay 250ms 40ms distribution normal
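A quick way to confirm the delay is in place is to ping another host from the shaped node and compare round-trip times before and after; they should now include roughly the extra 250ms. The hostname below is just a placeholder for another node in your environment.

ping -c 5 other-node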

Once we have some latency it would be nice to change the traffic control and add in some packet loss. The following command will add 8% packet loss.

tc qdisc change dev eth0 root netem loss 8%

Note that the change took effect immediately, but we no longer have any latency. To have both packet loss and latency we need to ‘change’ both at once.

tc qdisc change dev eth0 root netem delay 250ms 40ms distribution normal loss 8%
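To verify that netem is really dropping packets, you can ask tc for statistics with the -s flag, which prints sent and dropped packet counters for the qdisc.

tc -s qdisc show dev eth0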

Obviously, this affects all traffic to and from this node, so be careful: you could cut off your own connection to it. Once we have added the qdisc we can keep changing its settings, and we can remove all the traffic shaping by deleting it.

tc qdisc del dev eth0 root netem

I sometimes find it useful to put a set of commands with appropriate sleeps in a shell script, ‘nohup’ that script (see the invocation after the script below), and then work through a playbook containing my failure scenario.

#!/bin/bash
# Add a delay and let the communication stabilise for 5 mins

tc qdisc add dev eth0 root netem delay 200ms 10ms distribution normal
sleep 300

# start a network issue (bit of latency and jitter, with packet loss)

tc qdisc change dev eth0 root netem delay 230ms 40ms distribution normal loss 1%
sleep 60

tc qdisc change dev eth0 root netem delay 230ms 40ms distribution normal loss 10%
sleep 60

tc qdisc change dev eth0 root netem delay 230ms 40ms distribution normal loss 90%
sleep 300

tc qdisc change dev eth0 root netem delay 230ms 40ms distribution normal loss 70%
sleep 60

# End the network issue and see how the communication stabilises

tc qdisc change dev eth0 root netem delay 200ms 10ms distribution normal loss 1%
sleep 60

# Remove any traffic shaping
tc qdisc del dev eth0 root netem
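Assuming the script above is saved as something like failure_scenario.sh (the name is just an example), it can be started in the background as root and left to run while you observe how the cluster behaves.

chmod +x failure_scenario.sh
nohup ./failure_scenario.sh > failure_scenario.log 2>&1 &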

The limitation of this approach is that, because we are using eth0, we are applying the delay to traffic to every other node in the cluster. To simulate a true WAN-level failure we need good connectivity between some nodes and bad connectivity to others. For that we need some more advanced features of ‘tc’ to create virtual network adapters, which I will discuss in a subsequent post, when time permits.
