Linux TCP/IP Stack Networking

Destination Cache Utility Functions

There are a few important utility functions associated with the destination cache. Several of these functions are called directly by users of the destination cache, and some of them are generic functions for manipulating entries that are initialized in the destination cache entry when it is created. The functions are defined in file linux/net/core/dst.c. Destination cache entries are allocated with dst_alloc: void *dst_alloc(struct dst_ops *ops). In this function, we allocate an instance of a...

What is a Socket

One definition of the socket interface is that it is the interface between the transport layer protocols in the TCP/IP stack and all protocols above. However, the socket interface is also the interface between the kernel and the application layer for all network programming functions. All data and control functions for the TCP/IP stack pass through the socket interface. As we saw in the introductory chapters in this book, the TCP/IP stack itself does not include any protocols above the...

Receiving Multicast and Broadcast Packets in UDP

Multicast and broadcast packets are sent to multiple destinations. In fact, there may be multiple destinations in the same machine. When UDP receives a multicast or broadcast packet, it checks to see if there are multiple open sockets that should receive the packet. When the routing table entry for the incoming packet has the multicast or broadcast flags set, the UDP receive function, udp_rcv, calls the UDP multicast receive function, udp_v4_mcast_deliver, in file linux/net/ipv4/udp.c, to...

Specific Requirements for Embedded OSs

The later chapters of this book illustrate the Linux implementation of TCP/IP in detail. An important focus of this book is embedded systems; therefore, we should take the time to discuss the requirements for a generic embedded system that seeks to host a TCP/IP stack. We can compare these requirements with specific Linux capabilities to see how Linux compares with its competitors as a choice for an embedded OS, and how well it is suited to support an embedded system's networking needs. The TCP...

The Socket Buffer Structure sk_buff

The socket buffer structure, sk_buff, is defined in file linux/include/linux/skbuff.h. The following two fields in the socket buffer must be first. As we will see later in the book, sometimes the socket buffer lists are overloaded as different types. The field next points to the next buffer on the list, and prev points to the previous buffer. The following field, list, points to the head of the list of socket buffers. The field after that points to a sock structure, and for transmitted packets, it points to the...

Sending Data From a Socket via TCP

The best way to describe the implementation of a complex protocol like TCP is to follow the data as it flows through the protocol. For the remaining part of this chapter, the send-side processing of TCP will be examined. It is important to keep in mind that TCP is probably the most complex part of the TCP/IP protocol suite. Because TCP provides a connection-oriented service at the transport layer, it is far more complicated than UDP. TCP must manage the relationship between the local host and...

TCP Receive Handler Function tcp_v4_rcv

In this section, we will examine the TCP input segment handling and the registered handler function for the TCP protocol in the AF_INET protocol family. Figure 10.2 shows the TCP receive packet flow. Figure 10.2 TCP receive packet flow. As is the case with all other member protocols in the AF_INET family, TCP associates a handler function with the protocol field value, IPPROTO_TCP (the value six), by initializing an instance of the inet_protocol structure....

Tasklets, SoftIRQs, and Timers

Newer versions of Linux include a multithreading facility called tasklets. Bottom halves still exist, but they have been converted into tasklets, which are SMP aware. In addition to tasklets, the Linux kernel also provides timers for use by device drivers and protocols. The discussion on kernel threads in this book is limited to the threading model used internally in the TCP/IP stack and the network interface drivers. However, Bovet and Cesati include a complete discussion of Linux kernel...

Route Cache Garbage Collection

The route cache includes a Garbage Collection (GC) facility. The GC is based on the generic garbage collection in the generic destination cache. The route GC sets a specific function in the gc field of the destination cache operations structure. This is to override the function in the generic destination cache garbage collection. The route GC also has an expiration timer that acts as a fallback to completely delete routes from the cache after they age out. In addition to the GC timer, there is...

Handling IGMP Queries

IGMP queries are sent from routers to the all-hosts group. Hosts respond to the queries with membership reports. The function igmp_heard_query processes incoming IGMP queries. static void igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb, int len) { struct igmphdr *ih = skb->h.igmph; struct igmpv3_query *ih3 = (struct igmpv3_query *)ih; struct ip_mc_list *im; Group is the multicasting group (class D) address. u32 group = ih->group; int max_delay; int mark = 0; The first thing we have to do is figure out if the incoming...

The Sock Structure

In earlier versions of the kernel, the sock structure was much more complex. However, in 2.6, it has been greatly simplified in two ways. The structure is preceded with a common part that is generic to all protocol families. The other way it is different is that in Linux 2.6, instances of the sock structure are allocated from protocol-specific slab caches instead of a generic cache. In addition, following the structure there is an IPv4 and IPv6 specific part that contains the prot_info...

Creation of an AF_INET Socket

The create function for the TCP/IP protocol family, AF_INET, is inet_create: static int inet_create(struct socket *sock, int protocol). The function inet_create is defined as static because it is not called directly. Instead, it is called through the create field in the net_proto_family structure for the AF_INET protocol family. The function is called from sys_socket when family is set to AF_INET. In inet_create, we create a new sock structure called sk and initialize a few more fields. The...

FIB Table Hash Functions

Under the covers, the FIB table is implemented as a multizone hash table. Each location in the hash table is held by a fib_info structure. The fib_info structure is explained in Section 9.6.1. The fib_info structures are allocated from a slab cache called the fib node cache, which is also defined in file linux/net/ipv4/fib_hash.c. Next, we will look at the hash functions defined for the fib_table. As we know from earlier discussions, the FIB is a generic routing database. The fib_table...

Neighbor Discovery

The Neighbor Discovery (ND) protocol is used by hosts and routers for mutual discovery on locally connected nets (RFC 2461). This protocol replaces two protocols in IPv4. One of these protocols, no longer needed in IPv6, is ARP. In IPv4, ARP was used to map an IP address to a link layer address. As we saw in Section 11.3, this is no longer necessary because IPv6 addresses include a link-local address type in which the link-layer addresses are built into the IPv6 address itself. Another facility...

TCP Processing Data Segments in Established State

Obviously, the purpose of TCP is to transfer data reliably and rapidly. Data is transferred between the peers while the connection is in the ESTABLISHED state. Once the socket is in the ESTABLISHED state, the function of the connection is to transfer data between the two sides of the connection as fast as the network and the peer will permit. The tcp_v4_do_rcv function covered earlier in the text checks to see if the socket is in the ESTABLISHED state while processing incoming packets. If so,...

The RPDB, the FIB, and the FIB Rules

The Routing Policy Database (RPDB) consists of the Forwarding Information Base (FIB) and the FIB rules. The FIB is the core of Linux routing. Linux can be configured to support multiple routing tables, but as a default it is configured with only two tables. In many common uses of the Linux kernel there is no need for policy-based routing. This would be the case with most embedded systems. Therefore, multiple tables need not be configured into the kernel. If multiple tables are not configured,...

Multicast and IGMP

The Internet Group Management Protocol (IGMP) is for exchanging messages to manage multicast routing and message transmission. Essentially, the purpose of IGMP is to associate a group of unicast addresses with a specific class D multicast address. The protocol exchanges information about these groups between hosts and routers. There are three versions of IGMP: version 1 (RFC 1112), version 2, and version 3 (RFC 3376). All three versions are interoperable. Version 1 specifies two types of messages, a...
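The class D range mentioned above can be recognized with a single mask test. The following is a minimal userspace sketch, not kernel code; the function name is_class_d is an assumption made for illustration.

```c
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper (not from the kernel sources).
 * Returns 1 if addr_net, an IPv4 address in network byte order, is a
 * class D (multicast) address, meaning its top four bits are 1110.
 * That covers 224.0.0.0 through 239.255.255.255.
 */
static int is_class_d(uint32_t addr_net)
{
    return (ntohl(addr_net) & 0xF0000000u) == 0xE0000000u;
}
```

For example, is_class_d(inet_addr("224.0.0.1")) is true for the all-hosts group, while an ordinary unicast address such as 192.168.1.1 fails the test.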

Network Device Initialization

Network device initialization begins when the Linux kernel discovers a device of a certain type by calling a probe function looking for a match of a particular hardware interface with its associated driver. Once a match is discovered, the driver's specific initialization function is called. This function is generally typed __devinit or __init, both of which are defined in file linux/include/linux/init.h. When the network interface driver's initialization or probe function is called, the first...

Packet Handler Initialization and Registration

According to the OSI layered network model, a network layer protocol deals with the semantics of network addressing and packet routing. However, some OSs and TCP/IP stack implementations define a network layer protocol as anything that receives incoming packets from the network interface drivers. Linux defines all protocols that receive packets from network interface drivers as packet handlers. The protocol management and registration facility provided by Linux is called the Packet Handler...

The Main Output Route Resolving Function

If the fast path route failed, we continue with the slow path routing. If the fast path routing function can't match the new route in the route cache, it calls the function ip_route_output_slow to search the FIB: int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp). The variable tos is built from the tos field in the flowi structure pointed to by the parameter oldflp. It mostly contains the IP ToS field bits, which are used as part of the criteria to determine the route....

The Socket Application Programming Interface

The socket Application Programming Interface (API) functions are described in this section. Sockets are the fundamental basis of client-server programming. Generally, socket programming follows the client-server model. At the risk of over-simplifying, we will define the server as the machine that accepts connections. In contrast, a client is the machine that initiates connections. This book won't pretend to duplicate the work of other authors on network application programming; instead, refer to...
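The server and client roles described above can be illustrated with the standard BSD socket calls. This is a minimal userspace sketch over the loopback address; the helper names make_listener, listener_port, and connect_loopback are assumptions made for illustration and do not come from the book.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Server side: create a TCP socket that accepts connections on 127.0.0.1.
 * Port 0 asks the kernel to pick a free port. Returns the descriptor or -1. */
static int make_listener(void)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Report the port the kernel assigned to a bound socket, or -1 on error. */
static int listener_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}

/* Client side: initiate a connection to 127.0.0.1 on the given port.
 * Returns the connected descriptor or -1. */
static int connect_loopback(int port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons((unsigned short)port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would call make_listener once and then loop on accept, while each client calls connect_loopback with the port reported by listener_port.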

Multicasting and the Multicast Listener Discovery (MLD) Protocol for IPv6

In this section, we discuss IPv6 multicast group management. Unfortunately, multicast routing is not supported by IPv6 at the time of this writing; only the host side is fully supported. IPv4 used the Internet Group Management Protocol (IGMP) for multicast routing. In IPv4, IGMP associates a multicast address with a series of unicast addresses called a group. If a host wants to receive packets sent to a particular multicast destination address, it joins the group by sending a special IGMP...

The Socket Structure

The socket structure is the general structure that holds control and state information for the socket layer. It supports the BSD-type socket interface. The first field contains the state of the socket, which is one of the socket state values shown later in Table 5.8. The socket flags are in the next field and hold the socket wait buffer state containing values such as SOCK_ASYNC_NOSPACE. The ops field points to the protocol-specific operations for the socket. This data structure is shown later in this...

Delayed Acknowledgment Timer

The purpose of delayed acknowledgment is to minimize the number of separate ACKs that are sent. The receiver does not send an ACK as soon as it can. Instead, it holds on to the ACK in the hopes of piggybacking the ACK on an outgoing data packet. The delayed acknowledgment timer is set to the amount of time to hold the ACK waiting for outgoing data to be ready. The function tcp_delack_timer, defined in file linux/net/ipv4/tcp_timer.c, is called when the delayed acknowledgment timer expires,...

The Socket Multiplexor

The purpose of the socket multiplexor is to unravel the socket system calls. This mechanism is implemented in the file linux/net/socket.c and consists largely of the function sys_socketcall. This function maps the addresses in the arguments from the user-level socket function to kernel space and calls the correct kernel call for the specified protocol. asmlinkage long sys_socketcall(int call, unsigned long __user *args) The first thing it does is map each address from user space to kernel space....

The Network Device Structure net_device

The net_device structure, defined in file linux/include/linux/netdevice.h, is the data structure that defines an instance of a network interface. It tracks the state information of all the network interface devices attached to the TCP/IP stack. The structure might seem longer and more complex than is necessary, but it contains fields for implementing devices that have been added after the more generic network interfaces were originally defined. In our description of the net_device structure, the...

in_device Structure for IPv4 Address Assignment, Multicast, and Configuration

Any TCP/IP implementation has configurable options for IP multicasting and multiple network interface address assignment. The structure also contains most of the IPv4 tunable parameters. As shown in Chapter 4, most of the net_device structure is protocol independent. However, network interface addresses for IPv4 must be kept somewhere, and they are stored in the in_device structure. Fields can be set in this structure via sysctl, setsockopt, or rtnetlink. The address information...

IPv6 Packet Format

Many of the fields that were required in the IPv4 header are optional in the IPv6 header. These fields are now in optional headers called extension headers. There is a next header field in the main IP header and in all the extension headers. The next header field contains a specific value indicating the type of the next extension header if it exists. If there is no next extension header, the next header field contains the type for the first upper layer header, or...
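The next header chain described above can be followed with a short loop. The sketch below is a simplified userspace illustration, not the kernel's parser. It assumes the header type values and length encoding from RFC 2460, handles only the hop-by-hop, routing, fragment, and destination options headers, and treats every other value (including authentication and ESP headers) as the upper layer. The name ipv6_find_upper is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Next header values assumed from RFC 2460 for this sketch. */
enum {
    NH_HOPOPTS  = 0,
    NH_ROUTING  = 43,
    NH_FRAGMENT = 44,
    NH_DSTOPTS  = 60
};

#define IPV6_FIXED_HDR_LEN 40

/*
 * Walk the extension headers of the IPv6 packet in pkt[0..len).
 * On success, returns the first next header value that is not one of the
 * extension headers handled here and stores the offset of that header in
 * *offset. Returns -1 if the packet is too short.
 */
static int ipv6_find_upper(const uint8_t *pkt, size_t len, size_t *offset)
{
    size_t off = IPV6_FIXED_HDR_LEN;
    uint8_t nh;

    if (len < IPV6_FIXED_HDR_LEN)
        return -1;
    nh = pkt[6];                /* next header field of the main header */

    for (;;) {
        size_t ext_len;

        switch (nh) {
        case NH_HOPOPTS:
        case NH_ROUTING:
        case NH_DSTOPTS:
            if (len - off < 2)
                return -1;
            /* Length is in 8-octet units, not counting the first 8. */
            ext_len = ((size_t)pkt[off + 1] + 1) * 8;
            break;
        case NH_FRAGMENT:
            ext_len = 8;        /* the fragment header has a fixed size */
            break;
        default:
            *offset = off;
            return nh;
        }
        if (len - off < ext_len)
            return -1;
        nh = pkt[off];          /* first octet of each extension header */
        off += ext_len;
    }
}
```

For a packet with no extension headers, the loop exits immediately and reports offset 40 with the upper layer protocol taken from the main header.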

ICMP Packet Processing

When the AF_INET family is initialized, the initialization function in linux/net/ipv4/af_inet.c calls inet_add_protocol to add ICMP to the list of protocols that will receive IPv4 packets after IP is done. The handler function for ICMP is icmp_rcv, defined in file linux/net/ipv4/icmp.c. The main purpose of icmp_rcv is to dispatch separate handler functions for all the ICMP message types shown in Table 9.8. It will receive all incoming ICMP packets once they are stripped of their IPv4 headers....

TCP Retransmit Timer

The tcp_retransmit_timer function is called when the retransmit timer expires, indicating that an expected acknowledgment was not received. The function is invoked from the general TCP write timer function, tcp_write_timer, which was discussed earlier in Section 8.9.1. static void tcp_retransmit_timer(struct sock *sk) { struct tcp_opt *tp = tcp_sk(sk); if (tp->packets_out == 0) goto out; BUG_TRAP(!skb_queue_empty(&sk->sk_write_queue)); Next, we check to see if the connection should be timed out or if the...

Socket Options for TCP

In general, TCP is very configurable. The discussion of the internals of the TCP protocol later in this chapter and in Chapter 10 refers to various options and how they affect the performance or operation of the protocol. Section 8.5.2 shows the TCP options structure that holds the values of many of the socket options. However, in this section, the TCP socket options and ioctl configuration options are gathered together in one place. Although most of these are covered in some fashion in the...
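From user space, TCP socket options are set and read with setsockopt and getsockopt at the IPPROTO_TCP level. The sketch below uses TCP_NODELAY only as a familiar example; the excerpt above does not name specific options, and the wrapper names are assumptions made for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical wrapper: enable (on != 0) or disable the TCP_NODELAY option.
 * Returns 0 on success or -1 on error, as setsockopt does. */
static int set_tcp_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Hypothetical wrapper: returns 1 if TCP_NODELAY is set, 0 if it is
 * clear, or -1 on error. */
static int get_tcp_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

Other TCP-level options follow the same pattern, differing only in the option name and the type of the value passed.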

Destination Cache Garbage Collection

In this section, we will discuss how the destination cache Garbage Collection (GC) works. In addition to the default GC, the higher-level users of the destination cache may also have a separate garbage collection capability with a GC function reached through the gc field of the dst_ops structure. This is how the facility for aging out cached routes works; explicit garbage collection of cached routes is covered in Chapter 9. Here we discuss generic low-level destination garbage collection,...

Note about the Socket API and IPv6

Chapter 11 covers the Linux IPv6 protocol, how it is implemented, and how it compares to the IPv4 implementation. However, in this section, we will mention the address family used with IPv6 and a few changes that were made to the 2.6 kernel socket API to accommodate the protocol suite. IPv6 introduces a new address family, AF_INET6, which is defined in linux/include/linux/socket.h along with the other address families. In addition, there is a new socket address type for IPv6, sockaddr_in6,...
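As a userspace illustration of the AF_INET6 family and the sockaddr_in6 address type, the sketch below fills in a socket address from a textual IPv6 address. The helper name make_sockaddr_in6 is an assumption made for illustration.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill *sa from a textual IPv6 address and a port
 * in host byte order. Returns 0 on success, or -1 if text is not a
 * valid IPv6 address.
 */
static int make_sockaddr_in6(struct sockaddr_in6 *sa, const char *text,
                             unsigned short port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;
    sa->sin6_port = htons(port);
    return inet_pton(AF_INET6, text, &sa->sin6_addr) == 1 ? 0 : -1;
}
```

The resulting structure can be passed to bind or connect on a socket created with socket(AF_INET6, SOCK_STREAM, 0), cast to struct sockaddr * in the usual way.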

Notifier Chains and Network Interface Device Status Notification

Linux provides a mechanism for device status change notification called notifier chains. It is based on a generic notifier facility that can be used by modules anywhere in the kernel for various purposes. In the TCP/IP stack, the notifier chains are used to pass changes in network device status or other events to any protocols, modules, or devices that register themselves with the facility. The notifier chains are for passing changes in status to any function that is registered with the...
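The idea of a notifier chain can be modeled in user space as a priority-ordered linked list of callbacks. The sketch below is a simplified illustration under that assumption, not the kernel's implementation; the structure and function names (notifier, chain_register, chain_unregister, chain_call, count_event) are hypothetical, and locking is omitted.

```c
#include <stddef.h>

/* One registered listener: a callback, a link, and a priority. */
struct notifier {
    int (*call)(struct notifier *self, unsigned long event, void *data);
    struct notifier *next;
    int priority;
};

/* Insert n so that higher-priority entries are called first. */
static void chain_register(struct notifier **head, struct notifier *n)
{
    while (*head && (*head)->priority >= n->priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}

/* Remove n from the chain. Returns 0 if found, -1 otherwise. */
static int chain_unregister(struct notifier **head, struct notifier *n)
{
    for (; *head; head = &(*head)->next) {
        if (*head == n) {
            *head = n->next;
            return 0;
        }
    }
    return -1;
}

/* Pass an event to every registered callback; returns how many ran. */
static int chain_call(struct notifier *head, unsigned long event, void *data)
{
    int calls = 0;

    for (; head; head = head->next) {
        head->call(head, event, data);
        calls++;
    }
    return calls;
}

/* Example callback: increments the int counter that data points to. */
static int count_event(struct notifier *self, unsigned long event, void *data)
{
    (void)self;
    (void)event;
    ++*(int *)data;
    return 0;
}
```

A protocol that cares about device status would register one notifier with its own callback, and the device layer would call chain_call with an event code whenever an interface changes state.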