Routing Data Structures
The most important function of the IP layer consists of ensuring that packets originated by the host or received by the network interface cards are forwarded toward their final destinations. As you might easily guess, this task is really crucial because the routing algorithm should be fast enough to keep up with the highest network loads.
The IP routing mechanism is fairly simple. Each 32-bit integer representing an IP address encodes both a network address, which specifies the network the host is in, and a host identifier, which specifies the host inside the network. To properly interpret the IP address, the kernel must know the network mask of a given IP address, that is, which bits of the IP address encode the network address. For instance, suppose the network mask of an IP address of the form x.y.z.110 is 255.255.255.0; then x.y.z.0 represents the network address, while 110 identifies the host inside its network. Nowadays, the network address is almost always stored in the most significant bits of the IP address, so each network mask can also be represented by the number of bits set to 1 (24 in our example).
The key property of IP routing is that any host in the internetwork needs only to know the address of a computer inside its local area network (a so-called router), which is able to forward the packets to the destination network.
For instance, consider the following routing table shown by the netstat -rn system command:
This computer is linked to two networks. One of them has a netmask of 24 bits and is served by the Network Interface Card (NIC) associated with the network device eth1. The other has a netmask of 16 bits and is served by the NIC associated with eth0.
Suppose that a packet must be sent to a host that belongs to the local area network served by eth1. The kernel examines the static routing table starting with the highest entry (the one with the greatest number of bits set to 1 in the netmask). For each entry, it performs a logical AND between the destination host's IP address and the netmask; if the result equals the entry's network destination address, the kernel uses the entry to route the packet. In our case, the first entry wins and the packet is sent to the eth1 network device.
In this case, the "gateway" field of the static routing table entry is null ("0.0.0.0"). This means the destination is on a network directly linked to the sender, so the computer sends packets directly to hosts in that network: it encapsulates the packet in a frame carrying the Ethernet address of the destination host. The frame is physically broadcast to all hosts in the network, but each NIC automatically ignores frames carrying Ethernet addresses different from its own.
Suppose now that a packet must be sent to a host on a remote network (one not directly linked to our computer). The last entry in the table is a catch-all entry, since the logical AND with the netmask 0.0.0.0 always yields the network address 0.0.0.0. Thus, any IP address not resolved by higher entries is sent through the eth0 network device to the default router named in that entry's gateway field, which hopefully knows how to forward the packet toward its final destination. The packet is encapsulated in a frame carrying the Ethernet address of the default router.
18.1.6.1 The Forwarding Information Base (FIB)
The Forwarding Information Base (FIB), or static routing table, is the ultimate reference used by the kernel to determine how to forward packets to their destinations. As a matter of fact, if the destination network of a packet is not included in the FIB, then the kernel cannot transmit that packet. As mentioned previously, however, the FIB usually includes a default entry that catches any IP address not resolved by the other entries.
The kernel data structures that implement the FIB are quite sophisticated. In fact, routers might include several hundred entries, most of which refer to the same network devices or to the same gateway. Figure 18-1 illustrates a simplified view of the FIB's data structures when the table includes the four entries of the routing table just shown. You can get a low-level view of the data included in the FIB data structures by reading the /proc/net/route file.
Figure 18-1. FIB's main data structures
The main_table global variable points to a fib_table object that represents the static routing table of the IPS architecture. Actually, it is possible to define secondary routing tables, but the table referenced by main_table is the most important one. The fib_table object includes the addresses of some methods that operate on the FIB, and stores the pointer to a fn_hash data structure.
The fn_hash data structure is essentially an array of 33 pointers, one for every FIB zone. A zone includes routing information for destination networks that have a given number of bits in the network mask. For instance, zone 24 includes entries for networks that have the mask 255.255.255.0.
Each zone is represented by a fn_zone descriptor. It references, through a hash table, the set of entries of the routing table that have the given netmask. For instance, in Figure 18-1, zone 16 references the two entries whose netmask is 255.255.0.0.
The data relative to each routing table entry is stored in a fib_node descriptor. A router might have many entries, but it usually has very few network devices. Thus, to avoid wasting space, the fib_node descriptor does not include information about the network interface, but rather a pointer to a fib_info descriptor shared by several entries.
18.1.6.2 The routing cache
Looking up a route in the static routing table is quite a slow task: the kernel has to walk the various zones in the FIB and, for each entry in a zone, check whether the logical AND between the host destination address and the entry's netmask yields the entry's exact network address. To speed up routing, the kernel keeps the most recently discovered routes in a routing cache. Typically, the cache includes several hundred entries; they are sorted so that more frequently used routes are retrieved more quickly. You can easily get the contents of the cache by reading the /proc/net/rt_cache file.
The main data structure of the routing cache is the rt_hash_table hash table; its hash function combines the destination host's IP address with other information, like the source address of the packet and the type of service required. In fact, the Linux networking code allows you to fine-tune the routing process so that a packet can, for instance, be routed along several paths according to where the packet came from and what kind of data it is carrying.
Each entry of the cache is represented by a rtable data structure, which stores several pieces of information; among them:
• The source and destination IP addresses
• The gateway IP address, if any
• Data relative to the route identified by the entry, stored in a dst_entry embedded in the rtable data structure (see Section 18.1.5)
18.1.6.3 The neighbor cache
Another core component of the networking code is the so-called "neighbor cache," which includes information relative to hosts that belong to the networks directly linked to the computer.
We know that IP addresses are the main host identifiers of the network layer; unfortunately, they are meaningless for the lower data-link layer, whose protocols are essentially hardware-dependent. In practice, when the kernel has to transmit a packet by means of a given network card device, it must encapsulate the data in a frame carrying, among other things, the hardware-dependent identifiers of the source and destination network card devices.
Most local area networks are based on the IEEE 802 standards, and in particular on the 802.3 standard, which is commercially known as "Ethernet." (Actually, Ethernet local area networks sprang up before IEEE published its standards; unfortunately, Ethernet and IEEE standards disagree in small but nevertheless crucial details, for instance in the format of the data link packets. Every host in the Internet is able to operate with both standards, though.)
The network card identifiers of the 802 standards are 48-bit numbers, which are usually written as 6 hexadecimal bytes separated by colons (such as "00:50:DA:61:A7:83"). No two network card devices share the same identifier (although it would be sufficient to ensure that all network card devices in the same local area network have different identifiers).
How can the kernel know the identifier of a remote device? It uses an IPS protocol named Address Resolution Protocol (ARP). Basically, the kernel sends a broadcast packet into the local area network carrying the question: "What is the identifier of the network card device associated with IP address X?" In response, the host identified by the specified IP address sends an answer packet carrying its network card device identifier.
It is a waste of time and bandwidth to repeat the whole process for every packet to be sent. Thus, the kernel keeps the network card device identifier, together with other precious data concerning the physical connection to the remote device, in the neighbor cache (often also called the ARP cache). You can get the contents of this cache by reading the /proc/net/arp file. System administrators may also explicitly set the entries of this cache by means of the arp command.
Each entry of the neighbor cache is an object of type neighbour; the most important field is certainly ha, which stores the network card device identifier. The entry also stores a pointer to an hh_cache object belonging to the hardware header cache; since all packets sent to the same remote network card device are encapsulated in frames having the same header (essentially carrying the source and destination device identifiers), the kernel keeps a copy of the header in memory to avoid having to reconstruct it from scratch for every packet.
18.1.7 The Socket Buffer
Each single packet transmitted through a network device is composed of several pieces of information. Besides the payload — that is, the data whose transmission caused the creation of the packet itself — all network layers, starting from the data link layer and ending at the transport layer, add some control information. The format of a packet handled by a network card device is shown in Figure 18-2.
Figure 18-2. The packet format
The whole packet is built by different functions in several stages. For instance, the UDP/TCP header and the IP header are built by functions belonging, respectively, to the transport layer and the network layer of the IPS architecture, while the hardware header and trailer, which form the frame encapsulating the IP datagram, are written by a suitable method specific to the network card device.
The Linux networking code keeps each packet in a large memory area called a socket buffer. Each socket buffer is associated with a descriptor, which is a data structure of type sk_buff that stores, among other things, pointers to the following data structures:
• The socket buffer
• The payload — that is, the user data (inside the socket buffer)
• The data link trailer (inside the socket buffer)
• The INET socket (sock object)
• The network device's net_device object
• A descriptor of the transport layer header
• A descriptor of the network layer header
• A descriptor of the data link layer header
• The destination cache entry (dst_entry object)
The sk_buff data structure includes many other fields, like an identifier of the network protocol used for transmitting the packet, a checksum field, and the arrival time for received packets.
As a general rule, the kernel avoids copying data and simply passes the sk_buff descriptor pointer, and thus the socket buffer, to each networking layer in turn. For instance, when preparing a packet to send, the transport layer starts by copying the payload from the User Mode buffer into the higher portion of the socket buffer; then it adds its TCP or UDP header before the payload. Next, control is transferred to the network layer, which receives the socket buffer descriptor and adds the IP header before the transport header. Eventually, the data link layer adds its header and trailer, and enqueues the packet for transmission.