3-Sockets Introduction

Please indicate the source: http://blog.csdn.net/gaoxiangnumber1

Welcome to my github: https://github.com/gaoxiangnumber1

3.1 Introduction

  • Socket address structures can be passed in two directions: process -> kernel, and kernel -> process. The address conversion functions convert between a text representation of an address and the binary value that goes into a socket address structure.

3.2 Socket Address Structures

  • Each protocol suite defines its own socket address structure. The names of these structures begin with sockaddr_ and end with a unique suffix for each protocol suite.

IPv4 Socket Address Structure

  • An IPv4 socket address structure, commonly called an “Internet socket address structure”, is named sockaddr_in. Figure 3.1 shows its definition.

**Figure 3.1 The Internet (IPv4) socket address structure: sockaddr_in.**

struct in_addr
{
    in_addr_t   s_addr;         // 32-bit IPv4 address, network byte ordered
};

struct sockaddr_in
{
    uint8_t         sin_len;        // length of structure (16)
    sa_family_t     sin_family;     // AF_INET
    in_port_t       sin_port;       // 16-bit TCP or UDP port number, network byte ordered
    struct in_addr  sin_addr;       // 32-bit IPv4 address, network byte ordered
    char            sin_zero[8];    // unused
};
  • The sin_zero member is unused, but we always set it to 0 when filling in one of these structures. Many implementations add this member so that all socket address structures are at least 16 bytes in size.
  • in_addr_t: unsigned integer of at least 32 bits,
    in_port_t: unsigned integer of at least 16 bits,
    sa_family_t: normally an 8-bit unsigned integer if sin_len exists, otherwise a 16-bit unsigned integer.
  • Socket address structures are used only on a given host: The structure itself is not communicated between different hosts, although certain fields (e.g., the IP address and port) are used for communication.
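As a minimal sketch of the points above, the following helper zeroes a sockaddr_in (which also clears sin_zero) and then fills in the family, port, and address. The name fill_ipv4_addr is hypothetical and is not part of the original notes. The sketch does not touch sin_len, because that member does not exist on every system.

```c
#include <assert.h>
#include <arpa/inet.h>   /* inet_pton, htons */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdint.h>
#include <string.h>      /* memset */
#include <sys/socket.h>  /* AF_INET */

/* Hypothetical helper: fill *out from a dotted-decimal string and a
 * host-order port. Returns 1 on success, 0 if ip is not a valid address. */
int fill_ipv4_addr(struct sockaddr_in *out, const char *ip, uint16_t port)
{
    assert(out != NULL && ip != NULL);
    memset(out, 0, sizeof(*out));       /* also clears sin_zero */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);        /* stored in network byte order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1;
}
```

A caller would typically pass the filled structure to bind or connect, together with sizeof(struct sockaddr_in).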

Generic Socket Address Structure

  • Figure 3.3 shows the generic socket address structure, sockaddr.

**Figure 3.3 The generic socket address structure: sockaddr.**

struct sockaddr
{
    uint8_t     sa_len;
    sa_family_t sa_family;      // address family: AF_xxx value
    char            sa_data[14];        // protocol-specific address
};
  • The socket functions are defined as taking a pointer to the generic socket address structure. Any call to these functions must therefore cast the pointer to the protocol-specific socket address structure to a pointer to the generic socket address structure.

IPv6 Socket Address Structure

  • Figure 3.4 shows the IPv6 socket address structure, sockaddr_in6.

**Figure 3.4 IPv6 socket address structure: sockaddr_in6.**

struct in6_addr
{
    uint8_t         s6_addr[16];        // 128-bit IPv6 address, network byte ordered
};

#define SIN6_LEN // required for compile-time tests

struct sockaddr_in6
{
    uint8_t         sin6_len;       // length of this struct (28)
    sa_family_t     sin6_family;        // AF_INET6
    in_port_t       sin6_port;      // transport layer port, network byte ordered
    uint32_t            sin6_flowinfo;  // flow information, undefined
    struct in6_addr sin6_addr;      // IPv6 address, network byte ordered
    uint32_t            sin6_scope_id;  // set of interfaces for a scope
};
  • The SIN6_LEN constant must be defined if the system supports the length member for socket address structures.
  • The IPv6 family is AF_INET6, the IPv4 family is AF_INET.
  • The sin6_flowinfo member is divided into two fields:
    1. The low-order 20 bits are the flow label.
    2. The high-order 12 bits are reserved.
  • The members are ordered so that if the sockaddr_in6 structure is 64-bit aligned, so is the 128-bit sin6_addr member.
  • The sin6_scope_id identifies the scope zone in which a scoped address is meaningful, most commonly an interface index for a link-local address (Section A.5).

New Generic Socket Address Structure

  • A new generic socket address structure, sockaddr_storage, is large enough to hold any socket address type supported by the system. Figure 3.5 shows its definition.

**Figure 3.5 The storage socket address structure: sockaddr_storage.**

struct sockaddr_storage
{
    uint8_t     ss_len;     // length of this struct (implementation dependent)
    sa_family_t ss_family;  // address family: AF_xxx value
    /* implementation-dependent elements to provide:
     * a) alignment sufficient to fulfill the alignment requirements of
     *    all socket address types that the system supports.
     * b) enough storage to hold any type of socket address that the
     *    system supports.
     */
};
  • sockaddr_storage differs from sockaddr in two ways:
    1. If any socket address structures that the system supports have alignment requirements, the sockaddr_storage provides the strictest alignment requirement.
    2. The sockaddr_storage is large enough to contain any socket address structure that the system supports.
  • The fields of the sockaddr_storage structure are opaque to the user, except for ss_family and ss_len (if present). To access any other fields, the sockaddr_storage must be cast or copied to the socket address structure that matches the address family given in ss_family.
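The "cast or copy" rule can be sketched as follows. The helper storage_port is a hypothetical name chosen for this illustration. It inspects ss_family, copies the storage into the matching structure, and only then reads the port.

```c
#include <assert.h>
#include <arpa/inet.h>   /* ntohs */
#include <netinet/in.h>  /* struct sockaddr_in, struct sockaddr_in6 */
#include <string.h>      /* memcpy */
#include <sys/socket.h>  /* struct sockaddr_storage */

/* Hypothetical helper: return the host-order port held in *ss, or -1 if
 * the address family is neither AF_INET nor AF_INET6. */
int storage_port(const struct sockaddr_storage *ss)
{
    assert(ss != NULL);
    switch (ss->ss_family) {
    case AF_INET: {
        struct sockaddr_in in4;
        memcpy(&in4, ss, sizeof(in4));   /* copy out, then read fields */
        return ntohs(in4.sin_port);
    }
    case AF_INET6: {
        struct sockaddr_in6 in6;
        memcpy(&in6, ss, sizeof(in6));
        return ntohs(in6.sin6_port);
    }
    default:
        return -1;                       /* family this sketch does not handle */
    }
}
```

Copying instead of casting avoids any question about pointer aliasing, at the cost of a small memcpy.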

Comparison of Socket Address Structures

  • Figure 3.6 shows a comparison of the five socket address structures: IPv4, IPv6, Unix domain (Figure 15.1), datalink (Figure 18.1), and storage. It assumes that the socket address structures all contain a one-byte length field, that the family field occupies one byte, and that any field that must be at least some number of bits is exactly that number of bits.
  • IPv4 and IPv6 socket address structures are fixed-length, while the Unix domain structure and the datalink structure are variable-length. To handle variable-length structures, whenever we pass a pointer to a socket address structure as an argument to one of the socket functions, we pass its length as another argument.

3.3 Value-Result Arguments

  • Four functions (accept, recvfrom, getsockname, and getpeername) pass a socket address structure from the kernel to the process. Two of the arguments to these functions are a pointer to the socket address structure and a pointer to an integer containing the size of the structure:
struct sockaddr_un cli; /* Unix domain */
socklen_t len = sizeof(cli); /* len is a value */
getpeername(unixfd, (SA *) &cli, &len); /* len may have changed */
  • The size argument is a pointer to an integer rather than an integer because the size is both a value when the function is called (it tells the kernel the size of the structure so that the kernel does not write past the end of the structure when filling it in) and a result when the function returns (it tells the process how much information the kernel actually stored in the structure).
  • This type of argument is called a value-result argument (Figure 3.8).
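The contrast between a length passed by value and a value-result length can be sketched with bind and getsockname. The helper name ephemeral_port is hypothetical. The sketch binds a UDP socket to the wildcard address with port 0, then asks the kernel which port it assigned. Both calls also show the cast to the generic sockaddr pointer.

```c
#include <assert.h>
#include <arpa/inet.h>   /* htons, htonl, ntohs */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_ANY */
#include <string.h>      /* memset */
#include <sys/socket.h>  /* socket, bind, getsockname */
#include <unistd.h>      /* close */

/* Hypothetical helper: return the port the kernel picked (host order),
 * or -1 on error. *len_out receives the length stored by getsockname. */
int ephemeral_port(socklen_t *len_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);       /* value: size of our structure */
    int port = -1;
    int fd;

    assert(len_out != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);           /* 0 lets the kernel choose a port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    /* process -> kernel: the length is passed by value */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        /* kernel -> process: the length is passed by pointer */
        if (getsockname(fd, (struct sockaddr *)&addr, &len) == 0) {
            port = ntohs(addr.sin_port);
            *len_out = len;             /* result: bytes the kernel stored */
        }
    }
    close(fd);
    return port;
}
```

For a fixed-length structure such as sockaddr_in the returned length equals the size passed in; the result only differs for variable-length structures such as the Unix domain one.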

3.4 Byte Ordering Functions

  • Consider a 16-bit integer that is made up of 2 bytes. There are two ways to store the two bytes in memory: with the low-order byte at the starting address, known as little-endian byte order, or with the high-order byte at the starting address, known as big-endian byte order.

  • Most Significant Bit: MSB; Least Significant Bit: LSB.
  • The terms “little-endian” and “big-endian” indicate which end of the multibyte value, the little end or the big end, is stored at the starting address of the value.
  • There is no standard between these two byte orderings and we encounter systems that use both formats. We refer to the byte ordering used by a given system as the host byte order. The program in Figure 3.10 prints the host byte order.

#include <stdio.h>
#include <stdlib.h>

enum { kShortSize = sizeof(short) }; /* integer constant, so it is valid as an array size in C */

int main(int argc, char **argv)
{
    union
    {
        short s;
        char c[kShortSize];
    } un;

    un.s = 0x0102;
    if(kShortSize == 2)
    {
        if(un.c[0] == 1 && un.c[1] == 2)
        {
            printf("big-endian\n");
        }
        else if(un.c[0] == 2 && un.c[1] == 1)
        {
            printf("little-endian\n");
        }
        else
        {
            printf("Unknown\n");
        }
    }
    else
    {
        printf("sizeof(short) = %d\n", kShortSize);
    }

    exit(0);
}
  • The Internet protocols use big-endian byte ordering.
#include <netinet/in.h>
uint16_t htons(uint16_t host16bitvalue);
uint32_t htonl(uint32_t host32bitvalue);
Both return: value in network byte order
uint16_t ntohs(uint16_t net16bitvalue);
uint32_t ntohl(uint32_t net32bitvalue);
Both return: value in host byte order
  • h: host,
    n: network,
    s: short(a 16-bit value, such as a TCP/UDP port number),
    l: long (a 32-bit value, such as an IPv4 address).
  • On systems that have the same byte ordering as the Internet protocols (big-endian), these four functions are usually defined as null macros.
  • The First 32 Bits of the IPv4 Header:

  • This represents four bytes in the order in which they appear on the wire; the leftmost bit is the most significant.
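A short sketch of what the conversion functions guarantee: whatever the host byte order, the value returned by htons has its high-order byte first in memory. The helper name port_wire_bytes is hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>   /* htons */
#include <stdint.h>
#include <string.h>      /* memcpy */

/* Hypothetical helper: store the two bytes of htons(port) into out[0..1]
 * in the order they occupy in memory. */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net;

    assert(out != NULL);
    net = htons(port);                  /* host order -> network order */
    memcpy(out, &net, sizeof(net));     /* inspect the raw bytes */
}
```

For port 8080 (0x1F90) this stores 0x1F in out[0] and 0x90 in out[1] on both little-endian and big-endian hosts, and ntohs(htons(x)) always gives back x.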

3.5 Byte Manipulation Functions

#include <strings.h>
void bzero(void *dest, size_t nbytes);
void bcopy(const void *src, void *dest, size_t nbytes);
int bcmp(const void *ptr1, const void *ptr2, size_t nbytes);
Returns: 0 if equal, nonzero if unequal
  • bzero sets the specified number of bytes to 0 in the destination.
  • bcopy moves the specified number of bytes from the source to the destination.
  • bcmp compares two arbitrary byte strings. The return value is zero if the two byte strings are identical; otherwise, it is nonzero.
#include <string.h>
void *memset(void *dest, int c, size_t len);
void *memcpy(void *dest, const void *src, size_t nbytes);
int memcmp(const void *ptr1, const void *ptr2, size_t nbytes);
Returns: 0(first = second); > 0(first > second); < 0(first < second).
  • memset sets the specified number of bytes to the value c in the destination.
  • memcpy is similar to bcopy, but the order of the two pointer arguments is swapped. bcopy correctly handles overlapping fields, while the behavior of memcpy is undefined if the source and destination overlap; the memmove function must be used when the fields overlap.
  • memcmp compares two arbitrary byte strings and returns 0 if they are identical. > 0: first > second; < 0: first < second. The comparison is done assuming the two unequal bytes are unsigned chars.
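Two of the points above lend themselves to a small sketch: an in-place shift, where source and destination overlap and memmove is therefore required, and the fact that memcmp compares bytes as unsigned chars. Both helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>      /* memmove, memcmp */

/* Hypothetical helper: shift the first n bytes of buf one position to the
 * right, in place. The regions overlap, so memcpy would be undefined.
 * The caller must make sure buf has room for n + 1 bytes. */
void shift_right_one(char *buf, size_t n)
{
    assert(buf != NULL);
    memmove(buf + 1, buf, n);
}

/* Hypothetical helper: reduce the result of memcmp to -1, 0, or 1. */
int memcmp_sign(const void *a, const void *b, size_t n)
{
    int r;

    assert(a != NULL && b != NULL);
    r = memcmp(a, b, n);
    return (r > 0) - (r < 0);
}
```

Because the comparison is unsigned, the byte 0x80 compares greater than 0x7F, even on systems where plain char is signed.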

3.6 ‘inet_aton’, ‘inet_addr’, and ‘inet_ntoa’ Functions

  1. inet_aton, inet_addr, and inet_ntoa convert an IPv4 address between a dotted-decimal string (e.g., “206.168.112.96”) and its 32-bit network byte ordered binary value.
  2. inet_pton and inet_ntop handle both IPv4 and IPv6 addresses.
#include <arpa/inet.h>
int inet_aton(const char *strptr, struct in_addr *addrptr);
Returns: 1 if string was valid, 0 on error
in_addr_t inet_addr(const char *strptr);
Returns: 32-bit binary network byte ordered IPv4 address; INADDR_NONE(usually -1) if error
char *inet_ntoa(struct in_addr inaddr);
Returns: pointer to dotted-decimal string
  • inet_aton converts the C character string pointed to by strptr into its 32-bit binary network byte ordered value, which is stored through the pointer addrptr. If successful, 1 is returned; otherwise, 0 is returned. If addrptr is a null pointer, the function still performs its validation of the input string but does not store any result.
  • inet_addr does the same conversion, returning the 32-bit binary network byte ordered value. It returns the constant INADDR_NONE (typically 32 one-bits, usually -1) on an error. This means “255.255.255.255” (the IPv4 limited broadcast address, Section 20.2) cannot be handled by this function, since its binary value appears to indicate failure of the function.
  • The inet_ntoa function converts a 32-bit binary network byte ordered IPv4 address into its corresponding dotted-decimal string. The string pointed to by the return value resides in static memory, which means the function is not reentrant (Section 11.18).
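A minimal sketch of the two directions, assuming a hypothetical helper named ipv4_round_trip. It parses with inet_aton and formats with inet_ntoa, copying the result out of the static buffer straight away. The _DEFAULT_SOURCE line is a glibc-specific assumption; other systems may not need it.

```c
#define _DEFAULT_SOURCE  /* glibc: expose inet_aton in strict ISO C modes */
#include <assert.h>
#include <arpa/inet.h>   /* inet_aton, inet_ntoa, inet_addr */
#include <netinet/in.h>  /* struct in_addr, INADDR_NONE */
#include <string.h>      /* strlen, strcpy */

/* Hypothetical helper: parse a dotted-decimal string and format it back
 * into out. Returns 1 on success, 0 on invalid input or a short buffer. */
int ipv4_round_trip(const char *text, char *out, size_t outlen)
{
    struct in_addr addr;
    const char *s;

    assert(text != NULL && out != NULL);
    if (inet_aton(text, &addr) == 0)
        return 0;                       /* not a valid IPv4 address */
    s = inet_ntoa(addr);                /* points into static storage */
    if (strlen(s) >= outlen)
        return 0;
    strcpy(out, s);                     /* copy before the next inet_ntoa call */
    return 1;
}
```

Note that inet_ntoa takes the in_addr structure itself, not a pointer to it.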

3.7 ‘inet_pton’ and ‘inet_ntop’ Functions

  • These two functions work with both IPv4 and IPv6 addresses. p: presentation; n: numeric.
#include <arpa/inet.h>
int inet_pton(int family, const char *strptr, void *addrptr);
Returns: 1 if OK, 0 if input not a valid presentation format, -1 on error
const char *inet_ntop(int family, const void *addrptr, char *strptr, size_t len);
Returns: pointer to result if OK, NULL on error
  • family: AF_INET or AF_INET6. If family is not supported, both functions return an error with errno set to EAFNOSUPPORT.
  • inet_pton converts the string pointed to by strptr, storing the binary result through the pointer addrptr.
  • inet_ntop converts from numeric (addrptr) to presentation (strptr). The len argument is the size of the destination, to prevent the function from overflowing the caller’s buffer. To help specify this size, the following two constants are defined:
#define INET_ADDRSTRLEN 16 /* for IPv4 dotted-decimal */
#define INET6_ADDRSTRLEN 46 /* for IPv6 hex string */ 
  • If len is too small to hold the resulting presentation format, including the terminating null, a null pointer is returned and errno is set to ENOSPC.
  • The strptr argument to inet_ntop cannot be a null pointer. The caller must allocate memory for the destination and specify its size. On success, this pointer is the return value of the function.

Example

  • Replace
    foo.sin_addr.s_addr = inet_addr(cp);
    with
    inet_pton(AF_INET, cp, &foo.sin_addr);
    Replace
    ptr = inet_ntoa(foo.sin_addr);
    with
    char str[INET_ADDRSTRLEN];
    ptr = inet_ntop(AF_INET, &foo.sin_addr, str, sizeof(str));
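The return values described above can be exercised with a small round trip. The helper addr_round_trip is a hypothetical name. It converts a presentation string to binary with inet_pton and back with inet_ntop, for either family.

```c
#include <assert.h>
#include <arpa/inet.h>   /* inet_pton, inet_ntop */
#include <netinet/in.h>  /* struct in6_addr, INET6_ADDRSTRLEN */
#include <string.h>      /* memset */
#include <sys/socket.h>  /* AF_INET, AF_INET6, socklen_t */

/* Hypothetical helper: returns 1 on success, 0 if text is not valid for
 * the given family, -1 on error (bad family or buffer too small). */
int addr_round_trip(int family, const char *text, char *out, socklen_t outlen)
{
    struct in6_addr bin;                /* 16 bytes: also holds an IPv4 address */
    int rc;

    assert(text != NULL && out != NULL);
    memset(&bin, 0, sizeof(bin));
    rc = inet_pton(family, text, &bin); /* presentation -> numeric */
    if (rc != 1)
        return rc;
    /* numeric -> presentation; NULL means the buffer was too small */
    return inet_ntop(family, &bin, out, outlen) != NULL ? 1 : -1;
}
```

Sizing the output buffer with INET6_ADDRSTRLEN makes it large enough for both families.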

  • A problem with inet_ntop is that it requires the caller to pass a pointer to a binary address. This address is contained in a socket address structure, requiring the caller to know the format of the structure and the address family. To use it, we must write code of the form
struct sockaddr_in addr;
inet_ntop(AF_INET, &addr.sin_addr, str, sizeof(str));

for IPv4;

struct sockaddr_in6 addr6;
inet_ntop(AF_INET6, &addr6.sin6_addr, str, sizeof(str));

for IPv6. This makes our code protocol-dependent.
  • To solve this, we write sock_ntop that takes a pointer to a socket address structure, looks inside the structure, and calls the appropriate function to return the presentation format of the address.

#include "unp.h"
char *sock_ntop(const struct sockaddr *sockaddr, socklen_t addrlen);
Returns: non-null pointer if OK, NULL on error
  • sockaddr points to a socket address structure whose length is addrlen. The function uses its own static buffer to hold the result and a pointer to this buffer is the return value.
  • Using static storage for the result prevents the function from being re-entrant or thread-safe.
  • The presentation format is the dotted-decimal form of an IPv4 address or the hex string form of an IPv6 address surrounded by brackets, followed by a terminator (we use a colon), followed by the decimal port number, followed by a null character. Hence, the buffer size must be at least INET_ADDRSTRLEN plus 6 bytes for IPv4 (16 + 6 = 22), or INET6_ADDRSTRLEN plus 8 bytes for IPv6 (46 + 8 = 54).
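The idea behind sock_ntop can be sketched as below. This is not the implementation from unp.h: the name sock_ntop_r and its signature are assumptions made for this example, and it writes into a caller-supplied buffer instead of a static one so that it stays reentrant.

```c
#include <assert.h>
#include <arpa/inet.h>   /* inet_ntop, ntohs */
#include <netinet/in.h>  /* struct sockaddr_in, struct sockaddr_in6 */
#include <stdio.h>       /* snprintf */
#include <string.h>      /* memcpy */
#include <sys/socket.h>  /* struct sockaddr, socklen_t */

/* Hypothetical reentrant variant of sock_ntop. Produces "a.b.c.d:port"
 * for IPv4 and "[addr]:port" for IPv6. Returns out, or NULL on error. */
char *sock_ntop_r(const struct sockaddr *sa, socklen_t salen,
                  char *out, size_t outlen)
{
    char host[INET6_ADDRSTRLEN];
    int n;

    assert(sa != NULL && out != NULL);
    if (sa->sa_family == AF_INET && salen >= sizeof(struct sockaddr_in)) {
        struct sockaddr_in sin;
        memcpy(&sin, sa, sizeof(sin));
        if (inet_ntop(AF_INET, &sin.sin_addr, host, sizeof(host)) == NULL)
            return NULL;
        n = snprintf(out, outlen, "%s:%d", host, ntohs(sin.sin_port));
    } else if (sa->sa_family == AF_INET6 &&
               salen >= sizeof(struct sockaddr_in6)) {
        struct sockaddr_in6 sin6;
        memcpy(&sin6, sa, sizeof(sin6));
        if (inet_ntop(AF_INET6, &sin6.sin6_addr, host, sizeof(host)) == NULL)
            return NULL;
        n = snprintf(out, outlen, "[%s]:%d", host, ntohs(sin6.sin6_port));
    } else {
        return NULL;                    /* family not handled by this sketch */
    }
    return (n < 0 || (size_t)n >= outlen) ? NULL : out;
}
```

The caller no longer needs to know which family the structure holds, which is the protocol independence the text is after.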

  • Other functions that we define to operate on socket address structures to simplify the portability of our code between IPv4 and IPv6.
#include "unp.h"
int sock_bind_wild(int sockfd, int family);
Returns: 0 if OK, -1 on error

int sock_cmp_addr(const struct sockaddr *sockaddr1, const struct sockaddr *sockaddr2, socklen_t addrlen);
Returns: 0 if addresses are of the same family and addresses are equal, else nonzero

int sock_cmp_port(const struct sockaddr *sockaddr1, const struct sockaddr *sockaddr2, socklen_t addrlen);
Returns: 0 if addresses are of the same family and ports are equal, else nonzero

int sock_get_port(const struct sockaddr *sockaddr, socklen_t addrlen);
Returns: non-negative port number for IPv4 or IPv6 address, else -1

char *sock_ntop_host(const struct sockaddr *sockaddr, socklen_t addrlen);
Returns: non-null pointer if OK, NULL on error

void sock_set_addr(const struct sockaddr *sockaddr, socklen_t addrlen, void *ptr);
void sock_set_port(const struct sockaddr *sockaddr, socklen_t addrlen, int port);
void sock_set_wild(struct sockaddr *sockaddr, socklen_t addrlen);
  • sock_bind_wild binds the wildcard address and an ephemeral port to a socket.
  • sock_cmp_addr compares the address portion of two socket address structures, and
  • sock_cmp_port compares the port number of two socket address structures.
  • sock_get_port returns just the port number.
  • sock_ntop_host converts just the host portion of a socket address structure to presentation format (not the port number).
  • sock_set_addr sets just the address portion of a socket address structure to the value pointed to by ptr.
  • sock_set_port sets just the port number of a socket address structure.
  • sock_set_wild sets the address portion of a socket address structure to the wildcard.
  • We provide a wrapper function whose name begins with “S” for all of these functions that return values other than void and normally call the wrapper function from our programs.

3.9 ‘readn’, ‘writen’, and ‘readline’ Functions

  • Stream sockets (e.g., TCP sockets) exhibit a behavior with the read and write functions that differs from normal file I/O. A read or write on a stream socket might input or output fewer bytes than requested, but this is not an error condition. The reason is that buffer limits might be reached for the socket in the kernel. All that is required to input or output the remaining bytes is for the caller to invoke the read or write function again. This scenario is always a possibility on a stream socket with read, but is normally seen with write only if the socket is nonblocking.
  • We provide the following three functions that we use whenever we read from or write to a stream socket:
#include "unp.h"
ssize_t readn(int filedes, void *buff, size_t nbytes);
ssize_t writen(int filedes, const void *buff, size_t nbytes);
ssize_t readline(int filedes, void *buff, size_t maxlen);
All return: number of bytes read or written, -1 on error

  • Our three functions look for the error EINTR and continue reading or writing if the error occurs. We handle the error here, instead of forcing the caller to call readn or writen again.
  • Our readline function is very inefficient since it calls the system’s read function once for every byte of data. When faced with the desire to read lines from a socket, it is tempting to turn to the standard I/O library(“stdio”). We will discuss this approach in Section 14.8, but it is dangerous: The same stdio buffering that solves this performance problem creates numerous logistical problems that can lead to well-hidden bugs in your application.
  • The reason is that the state of the stdio buffers is not exposed. Consider a line-based protocol between a client and a server, where several clients and servers using that protocol may be implemented over time. Good “defensive programming” techniques require these programs to not only expect their counterparts to follow the network protocol, but to check for unexpected network traffic as well. Such protocol violations should be reported as errors so that bugs are noticed and fixed (and malicious attempts are detected as well), and also so that network applications can recover from problem traffic and continue working if possible. Using stdio to buffer data for performance flies in the face of these goals since the application has no way to tell if unexpected data is being held in the stdio buffers at any given time.
  • There are many line-based network protocols, such as SMTP, HTTP, the FTP control connection protocol, and finger, so the desire to operate on lines is strong. Our advice is to think in terms of buffers and not lines: read buffers of data, and if a line is expected, check the buffer to see if it contains that line.
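Returning to readn and writen described earlier in this section, the retry loop they need can be sketched as follows. The names readn_sketch and writen_sketch are hypothetical, chosen so they do not clash with the versions declared in unp.h. Each loop repeats the call until all requested bytes are transferred, and restarts when the call is interrupted with EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>      /* read, write, ssize_t */

/* Sketch: read n bytes unless EOF or an error intervenes. Returns the
 * number of bytes read (less than n only at EOF), or -1 on error. */
ssize_t readn_sketch(int fd, void *buf, size_t n)
{
    char *p = buf;                      /* char * so it can be advanced */
    size_t left = n;

    assert(buf != NULL);
    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: call read again */
            return -1;
        }
        if (r == 0)
            break;                      /* EOF */
        p += r;
        left -= (size_t)r;
    }
    return (ssize_t)(n - left);
}

/* Sketch: write all n bytes. Returns n, or -1 on error. */
ssize_t writen_sketch(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    assert(buf != NULL);
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w <= 0) {
            if (w < 0 && errno == EINTR)
                continue;               /* interrupted: call write again */
            return -1;
        }
        p += w;
        left -= (size_t)w;
    }
    return (ssize_t)n;
}
```

The sketch works on any descriptor, so it can be tried on a pipe before being used on a socket.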

  • Figure 3.18 shows a faster version of the readline function, which uses its own buffering rather than stdio buffering. The state of readline’s internal buffer is exposed, so callers have visibility into exactly what has been received.
  • Lines 2–21 of Figure 3.18: The internal function my_read reads up to MAXLINE characters at a time and then returns them, one at a time.
  • Line 29: The only change to the readline function itself is to call my_read instead of read.
  • Lines 42–48: A new function, readlinebuf, exposes the internal buffer state so that callers can check and see if more data was received beyond a single line.
  • By using static variables in readline.c to maintain the state information across successive calls, the functions are not re-entrant or thread-safe. We will discuss this in Sections 11.18 and 26.5. We will develop a thread-safe version using thread-specific data in Figure 26.11.
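The buffering idea can be sketched without static variables. This is not the code of Figure 3.18: the names rl_state, rl_getc, and readline_sketch, and the buffer size, are all assumptions made for this example. The buffer state lives in a caller-owned structure, so the caller can see how much unread data remains, and separate descriptors do not share state.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>      /* read, ssize_t */

#define RL_BUFSIZE 4096  /* assumed buffer size for this sketch */

/* Caller-owned buffer state; must be zero-initialized before first use. */
struct rl_state {
    size_t cnt;          /* unread bytes left in buf */
    char *ptr;           /* next unread byte */
    char buf[RL_BUFSIZE];
};

/* Hand out one buffered byte, refilling the buffer with a single read
 * when it is empty. Returns 1 on success, 0 on EOF, -1 on error. */
static int rl_getc(struct rl_state *st, int fd, char *c)
{
    while (st->cnt == 0) {
        ssize_t r = read(fd, st->buf, sizeof(st->buf));
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            return 0;
        st->cnt = (size_t)r;
        st->ptr = st->buf;
    }
    st->cnt--;
    *c = *st->ptr++;
    return 1;
}

/* Read at most maxlen - 1 bytes, stopping after a newline, and null
 * terminate. Returns bytes stored, 0 on EOF with no data, -1 on error. */
ssize_t readline_sketch(struct rl_state *st, int fd, char *line, size_t maxlen)
{
    size_t n = 0;
    char c;

    assert(st != NULL && line != NULL && maxlen > 0);
    while (n + 1 < maxlen) {
        int rc = rl_getc(st, fd, &c);
        if (rc < 0)
            return -1;
        if (rc == 0)
            break;                      /* EOF: return what we have */
        line[n++] = c;
        if (c == '\n')
            break;
    }
    line[n] = '\0';
    return (ssize_t)n;
}
```

After a call returns, st->cnt tells the caller how many bytes beyond the line have already been received, which is the visibility that stdio buffering does not offer.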

3.10 Summary

Exercise 3.1

Why must value-result arguments such as the length of a socket address structure be passed by reference?

  • In C, a function cannot change the value of an argument that is passed by value. For a called function to modify a value passed by the caller requires that the caller pass a pointer to the value to be modified.

Exercise 3.2

Why do both the readn and writen functions copy the void* pointer into a char* pointer?

  • The pointer must be incremented by the number of bytes read or written, but C does not allow a void pointer to be incremented since the compiler does not know the datatype pointed to.
