Checking Latencies

The primary tool for checking latencies is ping. This program is a standard part of all Linux systems, and its basic use is fairly straightforward: Type the program name followed by a hostname or IP address. You can also use a number of parameters, such as -c, which sets the number of packets the program sends. The result is a series of checks of the remote system (which you terminate by pressing Ctrl+C if you didn't use -c):

$ ping -c 3 www.linux.org

PING www.linux.org (198.182.196.56): 56 data bytes

64 bytes from www.linux.org (198.182.196.56): icmp_seq=1 ttl=43 time=99.1 ms

64 bytes from www.linux.org (198.182.196.56): icmp_seq=2 ttl=43 time=84.4 ms

64 bytes from www.linux.org (198.182.196.56): icmp_seq=3 ttl=43 time=91.8 ms

--- www.linux.org ping statistics ---

3 packets transmitted, 3 received, 0% packet loss, time 2020ms

rtt min/avg/max/mdev = 84.431/91.801/99.144/6.006 ms

This test shows latencies in the form of ping times of between 84.4 and 99.1 milliseconds (ms). Unlike throughput, in which higher values are better, lower latencies are superior. As with throughput, latencies can vary substantially from one site to another. Some of the causes of this variation include:

Physical Distance Information cannot travel faster than the speed of light (186,282 miles per second). As the Earth's circumference is 24,902 miles, this means that the theoretical minimum round-trip ping time between two computers on opposite sides of the planet is 134ms. By contrast, the theoretical minimum ping time over a 6-foot Ethernet cable is so small as to be negligible. In practice, distance has a more profound effect than a simple examination of physics and the speed of light would seem to suggest. This is partially because greater distance is associated with a larger number of intervening routers (the next factor in this list), and in part because Internet paths seldom take the shortest physical route.

Number of Intervening Routers Every router that processes a packet takes a certain amount of time to do so. The result is that pinging systems that are separated from you by many routers can take a long time, even if those systems are physically nearby.

Load on Intervening Routers Computers tend to do things more slowly when they're asked to do a lot of work. Routers are nothing more than computers configured to direct network traffic. Many of the routers through which your Internet traffic travels are dedicated to the task of routing—they do almost nothing else. As a result, when Internet traffic increases, the routers slow down. The result can be increased latencies or even lost packets.

Link Types Some connection technologies produce higher latencies than others. The worst technology in this regard is satellite-based Internet access, which uses satellites hovering at 22,282 miles above the Earth. A packet traveling from a computer to that satellite, back down to the receiving station, and then the reply taking the reverse path must traverse at least 89,128 miles, for a theoretical minimum ping time of 478ms. In practice, PPP links over telephone lines also tend to produce quite high latencies. Most dedicated Internet connections (cable modems, DSL, T1 lines, and so on) produce latencies of just a few milliseconds to the nearest router.
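The light-speed figures quoted in this list can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constants are the ones given in the text, and the function name min_rtt_ms is made up for illustration.

```python
# Light-speed lower bounds on ping times, using the figures from the text.
SPEED_OF_LIGHT_MPS = 186_282      # miles per second
EARTH_CIRCUMFERENCE_MI = 24_902   # miles
SATELLITE_ALTITUDE_MI = 22_282    # miles, as quoted in the text
FEET_PER_MILE = 5_280


def min_rtt_ms(one_way_miles: float) -> float:
    """Return the minimum possible round-trip time in milliseconds."""
    round_trip_miles = 2 * one_way_miles
    return round_trip_miles / SPEED_OF_LIGHT_MPS * 1000


# Opposite sides of the planet: one way is half the circumference.
antipodal_ms = min_rtt_ms(EARTH_CIRCUMFERENCE_MI / 2)

# Satellite link: one way is up to the satellite and back down again.
satellite_ms = min_rtt_ms(2 * SATELLITE_ALTITUDE_MI)

# A 6-foot Ethernet cable, converted to miles.
cable_ms = min_rtt_ms(6 / FEET_PER_MILE)

print(f"antipodal: {antipodal_ms:.0f} ms")   # prints "antipodal: 134 ms"
print(f"satellite: {satellite_ms:.0f} ms")   # prints "satellite: 478 ms"
print(f"6-foot cable: {cable_ms * 1e6:.1f} ns")
```

The cable figure comes out in nanoseconds, which is why the text treats it as negligible. Real signals travel somewhat slower than light in a vacuum, so these numbers are floors, not predictions.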

To get a feel for the latencies your network and Internet connection produce, try pinging a few systems, such as a computer on your LAN, the first router beyond your Internet connection (that is, the router provided by your ISP), some computers you know are operated in your geographic area, and some more distant systems. Computers on your LAN are likely to produce ping times of 1ms or less. My own ISP's routers yield ping times in the 10 to 20ms range. Computers geographically near me (in Rhode Island) are closer to 50ms. Those on the west coast of the United States or in England yield ping times of about 100ms. Australia and New Zealand take just under 300ms to ping. Hong Kong, Israel, and Russia all produce ping times of a bit over 300ms.

One complication can involve routing paths. For instance, my own ISP routes all my packets through Georgia; therefore, my own latencies to sites in Georgia are about 40ms, compared to 50ms to sites a few miles from me. Also, sites may not be hosted where you believe them to be. For instance, my own personal website (http://www.rodsbooks.com) is hosted by a company in California. Most colleges and universities run their own web servers, so you can try pinging nearby schools' web servers as nearby systems.

Once you're familiar with typical ping times for your connection, you'll be able to spot aberrant performers. If the issue is important enough, you may want to try looking for routers that aren't performing as they should.
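If you ping many hosts, a short script can pull the times out of ping's output so that you can compare them side by side. The following Python sketch assumes output in the format shown earlier (lines containing time=... ms); the function names are hypothetical, and the ping helper simply runs the system's ping command with the -c option.

```python
import re
import statistics
import subprocess

# Matches the "time=99.1 ms" field on each reply line.
TIME_RE = re.compile(r"time=([\d.]+) ms")


def parse_ping_times(output: str) -> list[float]:
    """Extract the per-packet round-trip times (in ms) from ping output."""
    return [float(value) for value in TIME_RE.findall(output)]


def summarize(times: list[float]) -> dict[str, float]:
    """Return the minimum, average, and maximum of a non-empty list."""
    return {
        "min": min(times),
        "avg": statistics.mean(times),
        "max": max(times),
    }


def ping(host: str, count: int = 3) -> str:
    """Run the system ping command and return whatever it printed."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# The reply lines from the sample run shown earlier.
SAMPLE_PING = """\
64 bytes from www.linux.org (198.182.196.56): icmp_seq=1 ttl=43 time=99.1 ms
64 bytes from www.linux.org (198.182.196.56): icmp_seq=2 ttl=43 time=84.4 ms
64 bytes from www.linux.org (198.182.196.56): icmp_seq=3 ttl=43 time=91.8 ms
"""

print(summarize(parse_ping_times(SAMPLE_PING)))
```

To test a live host, you could call summarize(parse_ping_times(ping("www.linux.org"))). A host that doesn't answer yields an empty list, which summarize doesn't handle, so check for that case first.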

Note Some sites are configured to not respond at all to pings. Therefore, if you ping a site and it doesn't respond, you can't be sure if the site is down or if it's deliberately ignoring your pings. For purposes of learning about typical ping times, move on to another site. If you're trying to diagnose a problem with that particular site, though, a failure to respond to pings deprives you of some potentially useful information.

Locating Flaky Routers

Many Internet connectivity problems can be traced to difficulties experienced by just one router in the path between you and your target site. Such problems may affect all sites, some sites, or even just one site. Using ping, you can identify which sites have latency problems, but ping doesn't tell you where the problem lies in the network. To do that, you need another tool: traceroute. This program traces the route that packets take going to a remote site and reports three latencies for each router along that route. For instance, here's a sample run of this program:

$ traceroute -n www.whitehouse.gov

traceroute: Warning: www.whitehouse.gov has multiple addresses; using 63.209.213.16

traceroute to a1289.g.akamai.net (63.209.213.16), 30 hops max, 38 byte packets

9 209.247.9.173 28.867 ms 26.455 ms 23.185 ms

10 64.159.1.45 41.636 ms 46.747 ms 43.022 ms

11 64.159.3.70 39.151 ms 36.240 ms 41.194 ms

12 63.209.213.16 37.947 ms 46.459 ms 41.738 ms

Tip The -n parameter causes traceroute to report computers' IP addresses rather than their hostnames. Although the hostnames can be informative, the DNS lookups can slow down the process.

In this example, the first router (192.168.1.254) is my own LAN's router, and the second (10.1.88.1) is the router to which my router connects. The block of routers whose IP addresses start with 68 also belongs to my ISP. Subsequent addresses correspond to routers later along the path, until the final entry, which is the target system.

This example traceroute output shows no unusual problems. Most hops reveal a modest increase in latencies, although there are a few larger jumps. The increase in latency moving off of my LAN is quite substantial, for instance, as is the increase from the ninth to the tenth hop. Neither of these increases is truly aberrant, though. If you see an increase of more than 20 or 30 milliseconds, the later router could be overloaded. Another possibility is that the hop spanned an unusually great physical distance. For instance, when I traced a route to a Linux FTP site in Australia, I saw two big jumps in latencies: one increase of about 60ms from a router in Washington, D.C. to one in Los Angeles, and another increase of about 160ms from Los Angeles to Sydney.

Severe problems may show up in traceroute output as asterisks (*) in place of times. Each asterisk denotes a probe packet for which no reply arrived, which usually means packet loss. If a router loses many packets, the result is likely to be a severe degradation in throughput and unreliable operation. Although TCP/IP was designed to tolerate a certain amount of packet loss, the sender must retransmit any packet that isn't acknowledged within a period of time (the timeout period). This process takes time, and so degrades throughput.
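The two checks described above, looking for large latency jumps between hops and looking for asterisks, can be sketched in Python. This is a simplified parser that assumes traceroute -n output with one address per hop, as in the sample run; the function names and the default 20ms threshold are illustrative choices, not part of traceroute.

```python
import re

HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(.*)$")   # hop number, address, rest
TIME_RE = re.compile(r"([\d.]+)\s*ms")


def parse_hops(output):
    """Parse traceroute -n output into (hop, address, times, lost) tuples."""
    hops = []
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header, warning, or blank line
        hop, address, rest = match.groups()
        lost = rest.count("*")
        if address == "*":  # first probe got no reply, so no address printed
            lost += 1
            address = None
        times = [float(value) for value in TIME_RE.findall(rest)]
        hops.append((int(hop), address, times, lost))
    return hops


def flag_jumps(hops, threshold_ms=20.0):
    """Return hops whose average time exceeds the previous hop's by more
    than threshold_ms."""
    flagged = []
    previous = None
    for hop, _address, times, _lost in hops:
        if not times:
            continue  # every probe was lost; nothing to compare
        current = sum(times) / len(times)
        if previous is not None and current - previous > threshold_ms:
            flagged.append(hop)
        previous = current
    return flagged


def lossy_hops(hops):
    """Return the hop numbers that show at least one asterisk."""
    return [hop for hop, _address, _times, lost in hops if lost]


# The hop lines from the sample run shown earlier.
SAMPLE_TRACEROUTE = """\
 9 209.247.9.173 28.867 ms 26.455 ms 23.185 ms
10 64.159.1.45 41.636 ms 46.747 ms 43.022 ms
11 64.159.3.70 39.151 ms 36.240 ms 41.194 ms
12 63.209.213.16 37.947 ms 46.459 ms 41.738 ms
"""

hops = parse_hops(SAMPLE_TRACEROUTE)
print(flag_jumps(hops))    # prints [] (no jump exceeds 20ms)
print(lossy_hops(hops))    # prints [] (no asterisks)
```

With the threshold lowered to 15ms, the jump from the ninth to the tenth hop (about 18ms on average) would be flagged. A flagged hop is only a hint, since the jump could reflect physical distance rather than an overloaded router.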

In most cases, locating a flaky router will not accomplish much. If you happen to control the router, of course, you can investigate further—confirm that it's powerful enough to handle the load it receives, check that it's not suffering from some other problem such as runaway processes or a denial-of-service (DoS) attack, and so on. If you don't control the router, the best you can do is to contact whoever does control it. You can learn who this is by using the whois command:

$ whois 209.247.9.173

This command returns a lot of information, including contacts for whoever controls the IP address. (You can also use a hostname.) Sometimes whois returns two or more entry summaries instead of contact information:

Cox Communications Inc. COX-ATLANTA (NET-68-0-0-0-1)

68.0.0.0-68.15.255.255

Cox Communications Inc. NETBLK-ATRDC-68-1-0-0 (NET-68-1-0-0-1)

68.1.0.0-68.1.127.255

If you see something like this, select one of the network blocks (typically the more specific one, such as NETBLK-ATRDC-68-1-0-0 in this example) and perform a whois lookup on it. If you contact the router's operator, use a technical contact address in the whois output and include your traceroute output. I recommend contacting router operators only if a problem is persistent; many problems are transient and are well known to the router's operators when they occur, so sending them e-mail about such problems won't do any good.
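Choosing the more specific network block amounts to picking the entry that covers the fewest addresses. The Python sketch below does this for a summary in the format shown above. The layout of whois output varies between registries and over time, so treat the regular expression as an assumption tied to this example; the function name is hypothetical.

```python
import ipaddress
import re

# Block name, handle in parentheses, then the first and last address.
ENTRY_RE = re.compile(
    r"(\S+)\s+\((NET-[\d-]+)\)\s+"
    r"(\d+\.\d+\.\d+\.\d+)\s*-\s*(\d+\.\d+\.\d+\.\d+)"
)


def most_specific_block(summary: str) -> tuple[str, str]:
    """Return (name, handle) of the entry covering the fewest addresses."""
    best = None
    for name, handle, first, last in ENTRY_RE.findall(summary):
        size = (int(ipaddress.IPv4Address(last))
                - int(ipaddress.IPv4Address(first)) + 1)
        if best is None or size < best[0]:
            best = (size, name, handle)
    if best is None:
        raise ValueError("no network blocks found")
    return best[1], best[2]


# The two-entry summary from the example above.
SAMPLE_WHOIS = """\
Cox Communications Inc. COX-ATLANTA (NET-68-0-0-0-1)
68.0.0.0-68.15.255.255
Cox Communications Inc. NETBLK-ATRDC-68-1-0-0 (NET-68-1-0-0-1)
68.1.0.0-68.1.127.255
"""

print(most_specific_block(SAMPLE_WHOIS))
# prints ('NETBLK-ATRDC-68-1-0-0', 'NET-68-1-0-0-1')
```

The second entry spans 32,768 addresses and the first spans 1,048,576, so the second is the narrower block.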

One additional tool can sometimes be useful, or at least educational: Xtraceroute (http://www.dtek.chalmers.se/~d3august/xt/). This program is an X-based tool that displays your network traffic's route on a map in a window, as shown in Figure 19.3. To launch the program, type xtraceroute after installing the package. Unfortunately, information on router locations in Xtraceroute's database is sometimes wrong, so you may see network paths taking bizarre hops around the world.

Figure 19.3: If you want to see how your data travels the globe, use Xtraceroute.

Diagnosing DNS Problems

The cry goes out around the office, "The network's down!" E-mail correspondence piles up in the mail server and employees aren't able to browse the Web or transfer files with co-workers elsewhere. This scenario is common, but it's also common for the problem to have an embarrassingly local cause: DNS failures. If your local DNS servers go down, the entire network will appear to be down—if you type an address into a web browser, no page will appear, to name just one symptom.

DNS failures leave certain clues. For instance, when you use ping or traceroute, these tools report back the IP address of the site you are trying to contact before any other output:

$ ping trex.pangaea.edu

PING trex.pangaea.edu (192.168.1.1): 56 data bytes

If that line is present, name resolution is working, at least for the target site. (Remember that /etc/hosts can provide name resolution, although this file normally only holds a handful of names and IP addresses.) If the program hangs before reporting an IP address, chances are good that your DNS server has failed. You can verify the matter by trying to use host, which returns an IP address when given a hostname or a hostname when given an IP address:

$ host trex.pangaea.edu

trex.pangaea.edu has address 192.168.1.1

If this command returns an error message, you can be certain there's something wrong with name resolution. This problem can take several forms:

Broken Local DNS Configuration The computer at which you're typing the command may have an incorrect DNS configuration. Review your /etc/resolv.conf file, as described earlier in "Files for Setting Network Options," and verify that it lists valid DNS servers on its nameserver lines.

Broken Network Connection to the DNS Server It's possible that the DNS server is functioning but that a critical network link between your system and the DNS server has gone down. The problem could be as simple as a network wire that's come loose, or it could be a damaged switch, router, or some other problem.

Crashed or Malfunctioning DNS Server The DNS server itself might be misbehaving. If you run it yourself, verify that it's working locally. If somebody else operates the DNS server, and if the problem is persistent, contact the operator to report the problem. You may be able to work around the problem temporarily by specifying another DNS server in /etc/resolv.conf. You can also use IP addresses rather than hostnames with many network protocols to bypass the DNS server, but this solution isn't workable for most Internet accesses.
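The first two checks in this list can be partly automated. The Python sketch below lists the servers named on the nameserver lines of resolv.conf text and tests whether a hostname resolves. The function names are hypothetical. The resolve helper uses the system resolver through socket.gethostbyname by default, so entries in /etc/hosts count as successful resolution, just as they do for ping.

```python
import socket


def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the addresses on the nameserver lines of resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with "#" or ";" and never match "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def resolve(hostname: str, lookup=socket.gethostbyname):
    """Return the IP address for hostname, or None if resolution fails."""
    try:
        return lookup(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


# Made-up file contents for illustration only.
SAMPLE_RESOLV_CONF = """\
# example configuration
search pangaea.edu
nameserver 192.168.1.1
nameserver 10.0.0.2
"""

print(nameservers(SAMPLE_RESOLV_CONF))   # prints ['192.168.1.1', '10.0.0.2']
```

On a real system you might pass the contents of /etc/resolv.conf to nameservers, ping each address it returns to test the network path, and then call resolve on a hostname you know should exist. A result of None from resolve corresponds to the error message from host described above.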

In addition to completely failing, DNS servers can also become sluggish. This can happen because of overloaded DNS servers or because of problems on the Internet generally. For instance, if the DNS server's Internet connection is sluggish, its own outside queries will become slow. DNS servers cache recent accesses, so these sluggish accesses will mainly affect new queries for sites that are seldom visited. The result is that popular sites may come up very quickly, whereas less popular sites may take many seconds to connect, but will respond quickly thereafter.
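One way to observe this caching effect is to time two consecutive lookups of the same name. The Python sketch below is a rough illustration with hypothetical function names. It uses the system resolver by default, and any cache on your own computer will also influence the second measurement.

```python
import socket
import time


def timed_lookup(hostname, lookup=socket.gethostbyname):
    """Resolve hostname and return (address, elapsed_seconds)."""
    start = time.perf_counter()
    address = lookup(hostname)
    return address, time.perf_counter() - start


def first_vs_repeat(hostname, lookup=socket.gethostbyname):
    """Time two consecutive lookups of the same name.

    Returns (first_seconds, repeat_seconds). A first lookup that is much
    slower than the repeat suggests the answer was not yet cached.
    """
    _, first = timed_lookup(hostname, lookup)
    _, repeat = timed_lookup(hostname, lookup)
    return first, repeat
```

Calling first_vs_repeat on a seldom-visited hostname and then on a popular one gives a feel for how much of a delay comes from the DNS server's outside queries. The default lookup raises an exception if the name can't be resolved at all, so this sketch is only useful when DNS is slow, not when it's down.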

Tip If you rely on an ISP's DNS servers, and if they don't perform well, you can run a caching-only DNS server locally, as described in Chapter 27. Of course, if your ISP can't manage to run a reliable DNS server, the ISP probably isn't very good, so you might prefer to switch ISPs!
