In my job, I have to learn a lot about many areas of IT. In the virtualization market especially, people are usually well prepared when it comes to infrastructure, servers and storage, but I have found that the weakest point is often networking. Some topics are obscure for many, like Layer 2 vs Layer 3, BGP or Spanning Tree, but there is one topic that is really important even for infrastructure guys, yet almost unknown to them: latency.
What is latency?
As fast as it is, even the speed of light is not infinite. It takes time for a photon to go from point A to point B; for example, the light generated by the Sun takes about 8 minutes to reach the Earth. Our computer networks are not even close to that speed. First of all, signals in a cable are slower than light in a vacuum: in optical fiber they travel at roughly two thirds of that speed, and many of our connections are not fiber at all but copper cables. But also, there are many appliances between source and destination that need to process the packet: the source host, switches, routers, firewalls, the target host. Each “hop” adds time to the total time that a packet needs to reach its destination. We can picture this situation like this:
(source: https://stackoverflow.com/questions/8682702/how-to-calculate-packet-time-from-latency-and-bandwidth )
So, simply said, latency is the time it takes for a packet to go from source to destination.
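To put numbers on the definition above, here is a minimal sketch that computes the propagation part of latency and then adds per-hop delays on top. The fiber velocity factor (about two thirds of the speed of light), the 1000 km path and the per-device delays are illustrative assumptions, not measurements.

```python
# Propagation delay: how long a signal needs just to cover the distance.
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_VELOCITY_FACTOR = 0.67        # assumption: typical optical fiber, about 2/3 of c


def propagation_delay_ms(distance_km: float, velocity_factor: float = 1.0) -> float:
    """One-way propagation delay in milliseconds over distance_km."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * velocity_factor) * 1000


def total_latency_ms(propagation_ms: float, hop_delays_ms: list[float]) -> float:
    """Total one-way latency: propagation plus the delay added by every hop."""
    return propagation_ms + sum(hop_delays_ms)


# Sunlight: about 149.6 million km through vacuum.
sun_minutes = propagation_delay_ms(149_600_000) / 1000 / 60
print(f"Sun to Earth: {sun_minutes:.1f} minutes")

# Hypothetical 1000 km fiber path crossing three devices.
fiber_ms = propagation_delay_ms(1000, FIBER_VELOCITY_FACTOR)
hops = [0.5, 1.0, 0.5]  # made-up per-device delays in ms
print(f"1000 km of fiber: {fiber_ms:.2f} ms propagation, "
      f"{total_latency_ms(fiber_ms, hops):.2f} ms with hops")
```

Even with no devices in the path, distance alone sets a floor on latency that no amount of bandwidth can remove.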
In a local network it’s easy to approximate and declare that the processing delay is 0, and that the multiple switches between two hosts don’t add any latency. After all, when we run a ping command to check if a host is connected, latency is always “below 1 ms”:
For this reason, it’s not uncommon to calculate the maximum transfer speed by simply looking at the available bandwidth:
– 1 Gbps Ethernet link: 1000 Mbps divided by 8 gives 125 MBps
– 10 Gbps Ethernet link: 1250 MBps
and so on. And if my bandwidth is 125 MBps, it means I can transfer 125 MB every second.
Why is latency important?
Because ignoring it, together with other parameters, leads to false results!
Look at the previous example. Even on a local network, where latency is close to zero, there are other parameters that may impact the final speed. One above all: TCP window size. I’m not going to repeat what others have already explained perfectly, so if you want to learn more, read this post by Brad Hedlund. What’s the catch? The link speed we usually talk about is the pure cable link speed. But on top of it we need to run multiple protocols, one above the other, like TCP over IP. TCP splits data into segments, and the TCP window size dictates how much data can be sent before an acknowledgment has to come back: the bigger this value, the more data can be in flight at once. Then, each payload is wrapped inside a packet, so there are additional header bytes for each packet that have to be transmitted even though they do not carry any data (there is also some overhead for the underlying Ethernet frame; think about all the discussion around Jumbo Frames). Finally, latency plays its part: once a full window has been sent, the sender has to wait for acknowledgments before it can transmit more, and that wait lasts at least one round trip.
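The interaction between window size and latency can be sketched with one well-known relationship: a sender can have at most one window of data in flight per round trip, so throughput cannot exceed window size divided by round-trip time. The snippet below is a minimal illustration. The 65,535-byte window (the largest value possible without TCP window scaling) and the round-trip times are assumed example values, and the function names are made up for this sketch.

```python
def window_limited_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on TCP throughput: one full window per round trip."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1_000_000


def window_needed_bytes(link_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_mbps * 1_000_000 * (rtt_ms / 1000) / 8


# Illustrative values: a 65,535-byte window at different round-trip times.
for rtt in (1, 40, 150):
    print(f"RTT {rtt:>3} ms: at most "
          f"{window_limited_throughput_mbps(65_535, rtt):.1f} Mbps")

print(f"Window to fill 100 Mbps at 40 ms: {window_needed_bytes(100, 40):,.0f} bytes")
```

With the same window, the ceiling falls as the round-trip time grows, which is why a value that is harmless on a LAN becomes the bottleneck over a long-distance link.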
Latency and window size become paramount when we move from local networks to public networks. Here, the <1 ms value goes away, and we have higher values to take into account. The higher the latency, the lower the maximum “real” bandwidth I will see. Let’s take an easy example: a customer has a 100 Mbps link to the Internet, and needs to transfer a 1 TB file to his service provider.
The usual theoretical calculations would be simple:
100 Mbps = 12.5 MBps
1 TB = 1000 GB = 1000000 MB
1000000 MB / 12.5 MBps = 80000 seconds = 1333.33 minutes = 22.22 hours or (d:h:m:s): 22h:13m:20s
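The same bandwidth-only arithmetic, written out as a small sketch. The function names are made up for illustration; the calculation deliberately ignores latency, window size and packet loss, exactly like the figures above.

```python
def naive_transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Transfer time using bandwidth only: size divided by link speed in MB/s."""
    link_mbytes_per_s = link_mbps / 8  # 8 bits per byte
    return size_mb / link_mbytes_per_s


def as_hms(seconds: float) -> str:
    """Format a duration in seconds as h:m:s."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h:{minutes:02d}m:{secs:02d}s"


# 1 TB = 1,000,000 MB over a 100 Mbps link.
seconds = naive_transfer_seconds(1_000_000, 100)
print(seconds, as_hms(seconds))
```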
But if you try to send this file to your provider, it will NEVER complete this quickly, unless you and your service provider are connected to the same Ethernet link, which means you are not using the Internet at all!
How to properly calculate transfer speed
I calculated the previous value down to the exact seconds using this nice tool:
http://wintelguy.com/transfertimecalc.pl
If you look at it however, you will see the same “approximation error” I talked about previously: only size and bandwidth are taken into account. No window size, and no latency. But the WintelGuy website has more amazing tools, and one is exactly what we need:
http://wintelguy.com/wanperf.pl
In this one, you can see that every important parameter is listed and used for the calculations. Let’s repeat the same calculation we did before, but now with some new information:
We added 40 ms of latency, and accepted the other two default values (packet loss and MTU). We have no chance to change the MTU over an Internet link, as there are many appliances between us and our service provider that are not under our control. This is one of the reasons why many telcos offer their customers private MPLS links instead of VPN links over the public Internet: the connection settings can be controlled and tuned by the provider (well, there’s a bit of lock-in too, but that is another story…). Here, you can see that TCP overhead and latency are already affecting the real maximum throughput, which falls to 94.9 Mbps. I’d say however that this is a really good situation, and it could be worse: let’s keep every other parameter as before, and increase the latency to 150 ms:
Throughput drops to 77.8 Mbps, a loss of 22% of the theoretical speed. And it can be even worse: an ADSL connection, for example, has more packet loss, so if we keep 150 ms of latency but increase packet loss by 10 times, we obtain this (ignore the fact that no ADSL line can reach 100 Mbps; it’s done to keep the same speed across all the examples):
Packet loss has been increased from 0.0001 (1 lost packet for every 10000 transmitted) to 0.001 (1 lost packet for every 1000 transmitted), and this value alone has decreased our maximum speed by 75%!!!
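The three results above can be approximated with a short sketch. It combines two well-known estimates: the share of an Ethernet frame that is actual TCP payload, and the Mathis et al. approximation for loss-limited TCP throughput, MSS / (RTT × √p). Whether the calculator uses exactly these formulas is an assumption. So are the constant in the Mathis formula (taken as 1 here), the absence of TCP options and VLAN tags, and the reading of the loss values as percentages (0.0001% is a loss probability of 0.000001), which is the reading under which this sketch lands close to the quoted figures.

```python
import math

ETHERNET_OVERHEAD = 38  # bytes: preamble 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40     # bytes: 20 IPv4 + 20 TCP, assuming no options


def max_throughput_mbps(link_mbps: float, rtt_ms: float,
                        loss_percent: float, mtu: int = 1500) -> float:
    """Estimated TCP throughput ceiling: the lower of the link cap and the loss cap."""
    mss = mtu - IP_TCP_HEADERS  # payload bytes carried by each packet
    # Cap 1: link speed reduced by per-packet header overhead.
    link_cap = link_mbps * mss / (mtu + ETHERNET_OVERHEAD)
    # Cap 2: Mathis et al. approximation, with the constant taken as 1 (assumption).
    loss_probability = loss_percent / 100
    loss_cap = (mss * 8) / ((rtt_ms / 1000) * math.sqrt(loss_probability)) / 1_000_000
    return min(link_cap, loss_cap)


# The three scenarios discussed above, on a 100 Mbps link.
for rtt_ms, loss_percent in [(40, 0.0001), (150, 0.0001), (150, 0.001)]:
    estimate = max_throughput_mbps(100, rtt_ms, loss_percent)
    print(f"RTT {rtt_ms} ms, loss {loss_percent}%: about {estimate:.1f} Mbps")
```

In this model latency and loss multiply: the loss cap is inversely proportional to the round-trip time and to the square root of the loss rate, so a tenfold increase in loss divides the ceiling by roughly 3.2.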
So, next time you see your shiny new Internet line not performing as expected, before blaming your service provider or the software you are using to transfer the data, take a closer look at your network. You may find that 25 Mbps is the fastest speed you can get.