Quick summary ↬ In this article, we explore the basic networking concepts in UNIX based operating systems. We start with sockets, IPv4, IPv6.
Introduction
Socket API is one of the most important ones in network programming that must be known with greater understanding to develop efficient distributed networking applications. A socket address structure is defined for every socket by which we can communicate with the corresponding socket and also receive information from it. This address structure carries information such as IP address, port number, and family. We also discuss many address conversion functions that convert from text representation to binary structure that fits into the socket address structure.
There are many address conversion functions that do this independent of the type of the address: IPv4 or IPv6. In the next section, let’s have a deep look at the socket address structure.
Socket Address Structures
Most functions require a pointer to the socket address as an argument to access the data in it. The names of the socket address always starts with the sockaddr_ and end with a suffix unique to the protocol suite.
Let’s look at how the address structure looks for both IPv4 and IPv6.
IPv4 Socket Address Structure
IPv4 socket address structure is also called the Internet socket address and is named as sockaddr_in. This address structure is included by adding the <netinet/in.h> header. POSIX definition of the socket address is defined below:
|
|
# | Socket address attribute | Memory needed |
---|---|---|
1 | sin_len | 1 byte(uint8_t) |
2 | sin_family | 1 byte(uint8_t) |
3 | sin_port | 2 bytes |
4 | sin_addr | 4 bytes |
5 | sin_zero | 8 bytes |
Based on the above socket address structure, we can learn many important points on how the computer communication happens in the network of the computers. In this article, we consider everything based on the POSIX specification.
The sin_len field won’t be used in the definition of the socket address. It is only useful to handle the variable-length socket address structures. We never need it or never set it unless if we deal with the routing sockets.
The POSIX specification requires only three members in the structure: sin_family, sin_port, sin_addr. All the implementations declare the sin_zero variable so that the socket address by minimum can be 16 bytes in size.
IPv4 address and the TCP or UDP port numbers were stored in the structure in the network byte order. This raises the question, why it’s not in the host byte order? What is the difference between the host byte order and the network byte order? Let’s answer all the above questions later and let’s concentrate on the address structure.
The 32-bit IPv4 address can be accessed in two different ways. serv.sin_addr references the 32-bit IPv4 address in the in_addr structure, while serv.sin_addr.s_addr references the same IPv4 address in the in_addr_t. We need to be careful to know how we access the address when we pass an address to the function.
The sin_zero member is unused. We always set the sin_zero to 0. By rule, we must set the whole address structure to be initiated to 0 before using it. Socket address structure was used only on a given host. The structure is not used to communicate with the different host but just the IP address and the port number for communicating with the other devices.
Generic Socket Address Structure
Whenever we pass the socket address structure as a reference to the function, the function should be able to deal with the conversion of the socket address structure based on the protocol family upon which the address structure is created. To recieve the address in the function, we need a generic address structure that can be casted to protocol specific address inside the function. In ANSI C version just declaring void* is enough to get the socket address structure. But, the socket functions predate the ANSI C, generic socket address structure is a must.
|
|
The above generic socket address structure resides in <sys/socket.h>.
Let’s consider a function that gets a socket address structure as an argument as shown below:
|
|
The second argument of the above function requires socket address structure and it’s declared and called below:
|
|
In the above declaration, an IPv4 specific socket address structure is converted to the generic socket address structure. If this conversion is not explicitly performed by the programmer, then the compiler generates the warning “warning: passing arg3 of ‘bind’ from incompatible pointer type.
Inside the bind function implementation, sa_family value is checked to find what socket address structure is passed to it. In application programmer’s point of view, generic socket address structure is to cast and send arguments to functions. In the kernel’s point of view, generic socket address structure is recieved as an argument and it finds the sa_family value before casting it to the appropriate socket address structure.
IPv6 Socket Address Structure
The IPv6 socket address is defined in <netinet/in.h> header. The below code shows the socket address structure for IPv6:
|
|
In this blog post, we won’t go deep into the IPv6 address space but it’s always good to know that it is there. A new generic socket address structure is defined as part of IPv6 API and it has the storage to accept any socket address structure values that the system supports.
The below code shows the struct declared for the sockaddr_storage.
|
|
Conclusion
To conclude this article, I have posted a diagram from the book, “UNIX Network Programming, Volume 1” by Richard Stevens.