Linux learning -- socket network programming foundation

Hierarchical model

OSI seven layer model

1. Physical layer: it mainly defines physical equipment standards, such as the interface types of network cables and optical fibers and the transmission rates of the various transmission media. Its main function is to transmit the bit stream: the 1s and 0s are converted into signal levels (for example, current strength) for transmission and converted back into 1s and 0s at the destination, which is often described as digital-to-analog and analog-to-digital conversion. The unit of data at this layer is the bit.

2. Data link layer: it defines how data is formatted into frames for transmission and how access to the physical medium is controlled. This layer usually also provides error detection and correction to make transmission over the link reliable. For example: the 115200, 8, N, 1 framing parameters used in serial communication.

3. Network layer: it provides connectivity and path selection between two host systems that may be located in different geographical locations. With the growth of the Internet, the number of users accessing information from sites all over the world has increased greatly, and the network layer is the layer that manages these connections.

4. Transport layer: it defines protocols and port numbers for data transmission (for example, WWW uses port 80). The main protocols are TCP (Transmission Control Protocol: lower transmission efficiency but strong reliability, used for data with high reliability requirements and large volume) and UDP (User Datagram Protocol: the opposite characteristics, used for data with low reliability requirements and small volume; QQ chat data, for example, is transmitted this way). Its main job is to split the data received from the upper layer into segments for transmission and to reassemble them at the destination. The unit of data at this layer is usually called a segment.

5. Session layer: it establishes the data transmission path through the transport layer (port numbers: the sending port and the receiving port). It is mainly used to initiate a session or accept a session request between systems (the devices need to identify each other, whether by IP address, MAC address, or host name).

6. Presentation layer: it ensures that information sent by the application layer of one system can be read by the application layer of another system. For example, a program on a PC communicates with another computer, where one side uses the Extended Binary Coded Decimal Interchange Code (EBCDIC) and the other uses the American Standard Code for Information Interchange (ASCII) to represent the same characters. If necessary, the presentation layer uses a common format to convert between the different data formats.

7. Application layer: the OSI layer closest to the user. This layer provides network services for users' applications, such as email, file transfer and terminal emulation.

Four layer model of TCP/IP
TCP/IP Network protocol stack is divided into Application layer, Transport layer, Network layer and Link layer. As shown in the figure below:

Generally, in the process of application development, the most discussed is the TCP/IP model.

Communication process

The process of communication between the two computers through TCP/IP protocol is as follows:

The above figure corresponds to the situation that two computers are in the same network segment. If two computers are in different network segments, the data will pass through one or more routers in the process of transmission from one computer to another, as shown in the following figure:

There are Ethernet, token ring and other standards in the link layer. The link layer is responsible for driving the network card, frame synchronization (that is, deciding which signal detected on the cable counts as the start of a new frame), collision detection (if a collision is detected, the frame is automatically resent), data error checking and other work. The switch is a network device working in the link layer. It can forward data frames between different link layer networks (for example, between 10 Mbit/s Ethernet and 100 Mbit/s Ethernet, or between Ethernet and a token ring network). Because the frame formats of different link layers differ, the switch has to strip the link layer header from each incoming packet and re-encapsulate the packet before forwarding it.

The IP protocol of the network layer is the foundation of the Internet. Hosts on the Internet are identified by IP addresses. A large number of routers on the Internet are responsible for choosing a suitable path and forwarding packets according to the IP address. A packet travelling from a source host to a destination host often passes through more than ten routers. The router is a network device working at the third layer, and it also has the function of a switch: it can forward data packets between different link layer interfaces. The router therefore has to strip both the link layer header and the network layer header from incoming packets and re-encapsulate them. The IP protocol does not guarantee reliable transmission, and packets may be lost in transit. Reliability can be provided by an upper layer protocol or by the application.

The network layer is responsible for point-to-point transmission (here a "point" is a host or a router), while the transport layer is responsible for end-to-end transmission (here an "end" is the source host or the destination host). The transport layer can use either TCP or UDP.
TCP is a connection-oriented, reliable protocol. It is a bit like making a phone call: after the two sides pick up the phone and identify themselves, a connection is established and they talk. Whatever one side says is guaranteed to be heard by the other, in the order it was spoken. When they have finished, they hang up and the connection is closed. In other words, the two sides of a TCP transmission must first establish a connection, after which TCP guarantees reliable sending and receiving: lost packets are retransmitted automatically, the upper-layer application always receives a reliable data stream, and the connection is closed when communication ends.

UDP is a connectionless transport protocol that does not guarantee reliability. It is a bit like sending a letter: once a letter is written and dropped in a mailbox, there is no guarantee that it will not be lost in the post, nor that letters arrive in the order in which they were sent. Applications using UDP have to implement packet-loss retransmission, message ordering and similar work themselves.

After the destination host receives the data packet, how to pass through the protocol stack of each layer and finally reach the application program? The process is as follows:

The Ethernet driver first determines whether the payload of the frame is an IP, ARP or RARP datagram according to the "upper protocol" field in the Ethernet header, and hands it to the corresponding protocol for processing. If it is an IP datagram, the IP protocol determines whether the payload is TCP, UDP, ICMP or IGMP according to the "upper protocol" field in the IP header, and hands it to the corresponding protocol. If it is a TCP segment or a UDP segment, TCP or UDP then decides which user process the application layer data should be handed to according to the "port number" field of the TCP or UDP header. An IP address identifies a host in the network, and a port number identifies a process on that host. The combination of IP address and port number uniquely identifies a process in the network.

Although IP, ARP and RARP datagrams all need the Ethernet driver to encapsulate them into frames, functionally ARP and RARP belong to the link layer and IP belongs to the network layer. Likewise, although ICMP, IGMP, TCP and UDP data all need the IP protocol to encapsulate them into datagrams, functionally ICMP and IGMP belong to the network layer together with IP, while TCP and UDP belong to the transport layer.
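
The layer-by-layer dispatch described above can be sketched as two lookup tables. This is only an illustration: the function name demultiplex is hypothetical, and the numeric values are the commonly assigned EtherType and IP protocol numbers, not something taken from a figure in this article.

```python
# Protocol identifiers used for demultiplexing (commonly assigned values).
ETHERTYPES = {0x0800: "IP", 0x0806: "ARP", 0x8035: "RARP"}
IP_PROTOCOLS = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP"}

def demultiplex(ethertype, ip_protocol=None, dst_port=None):
    """Return the chain of handlers a received frame would pass through."""
    path = [ETHERTYPES.get(ethertype, "unknown")]
    if path[0] != "IP":
        return path                      # ARP/RARP stop at the link layer
    path.append(IP_PROTOCOLS.get(ip_protocol, "unknown"))
    if path[1] in ("TCP", "UDP"):
        path.append(f"port {dst_port}")  # the port selects the user process
    return path

print(demultiplex(0x0800, 17, 69))  # an IP/UDP frame addressed to port 69
print(demultiplex(0x0806))          # an ARP frame
```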

Protocol format

Packet encapsulation

The transport layer and the layers below it are provided by the kernel, while the application layer is provided by the user process (how to write applications with the socket API will be described later). The application interprets the meaning of the communication data, while the transport layer and the layers below handle the details of communication and deliver the data from one computer to another along some path. When application layer data is sent to the network through the protocol stack, each protocol layer adds its own header, which is called encapsulation, as shown in the following figure:

Different protocol layers have different names for packets: segment in the transport layer, datagram in the network layer and frame in the link layer. The data is encapsulated into frames and sent onto the transmission medium. After it arrives at the destination host, each protocol layer strips off its own header, and finally the application layer data is handed to the application for processing.
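
As a rough sketch of how encapsulation grows a packet, the following adds the header sizes layer by layer. The helper name encapsulate_sizes is hypothetical, UDP is chosen as the transport protocol, and the IP and Ethernet header sizes are the option-free 20 bytes and 14 bytes mentioned elsewhere in this article; the trailing CRC is not counted.

```python
# Header sizes in bytes: UDP is fixed, IP uses the option-free minimum.
LAYERS = [("UDP header", 8), ("IP header", 20), ("Ethernet header", 14)]
UNIT_NAMES = ["segment", "datagram", "frame"]

def encapsulate_sizes(app_len, layers=LAYERS):
    """Return (unit name, length) after each layer has prepended its header."""
    sizes = [("application data", app_len)]
    total = app_len
    for (_, header_len), unit in zip(layers, UNIT_NAMES):
        total += header_len
        sizes.append((unit, total))
    return sizes

# 55 bytes is the size of the TFTP request analysed later in this article.
for unit, length in encapsulate_sizes(55):
    print(f"{unit:16s} {length} bytes")
```

With a 55-byte payload the lengths come out as 63, 83 and 97 bytes, which matches the UDP length, IP total length and frame length in the TFTP example.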

Ethernet frame format

The frame format of Ethernet is as follows:

The source address and destination address are the hardware addresses (also called MAC addresses) of the network cards. A hardware address is 48 bits long and is fixed when the network card leaves the factory. Run the ifconfig command in the shell and look for the part that reads HWaddr 00:15:F2:14:9E:3F; that is the hardware address. The protocol field takes one of three values here, corresponding to IP, ARP and RARP. The frame ends with a CRC check code.

The data carried in an Ethernet frame has a minimum length of 46 bytes and a maximum length of 1500 bytes. ARP and RARP packets are shorter than 46 bytes, so padding must be appended after them. The maximum value of 1500 is called the maximum transmission unit (MTU) of Ethernet. Different network types have different MTUs. If a packet is routed from Ethernet onto a dial-up link and the packet is longer than the MTU of the dial-up link, the packet has to be fragmented. The ifconfig output also contains "MTU:1500". Note that the MTU refers to the maximum length of the payload in the frame and does not include the frame header.
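
A minimal sketch of these rules, assuming the 14-byte header layout described above (6-byte destination, 6-byte source, 2-byte type). The function names are hypothetical, and the sample header bytes are taken from the ARP request shown later in this article.

```python
import struct

ETH_HEADER_LEN = 14   # 6 + 6 + 2 bytes
MIN_PAYLOAD = 46      # minimum Ethernet payload
MTU = 1500            # maximum Ethernet payload

def parse_ethernet_header(frame):
    """Split the 14-byte header into destination MAC, source MAC and type."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:ETH_HEADER_LEN])
    as_text = lambda mac: ":".join(f"{b:02x}" for b in mac)
    return as_text(dst), as_text(src), ethertype

def padding_needed(payload_len):
    """Bytes of padding required to reach the 46-byte minimum."""
    if payload_len > MTU:
        raise ValueError("payload exceeds the MTU and must be fragmented")
    return max(0, MIN_PAYLOAD - payload_len)

header = bytes.fromhex("ffffffffffff 00055d6158a8 0806")
print(parse_ethernet_header(header))
print(padding_needed(28))   # a 28-byte ARP packet needs padding
```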

ARP datagram format

In network communication, the application on the source host knows the IP address and port number of the destination host, but not its hardware address. A packet is first received by the network card and only then handed to the upper-layer protocols; if the destination hardware address of a received packet does not match that of the local card, the packet is discarded immediately. The hardware address of the destination host must therefore be obtained before communication can start, and this is the job of the ARP protocol. The source host sends an ARP request asking "what is the hardware address of the host whose IP address is 192.168.0.1" and broadcasts it to the local network segment (the destination hardware address in the Ethernet header is filled with FF:FF:FF:FF:FF:FF, which means broadcast). The destination host receives the broadcast ARP request, finds that the IP address in it matches its own, and sends an ARP response packet to the source host with its own hardware address filled in.

Each host maintains an ARP cache table, which can be viewed with the arp -a command. Entries in the cache table have an expiration time (generally 20 minutes). If an entry is not used again within 20 minutes, it expires, and the next time an ARP request has to be sent to obtain the hardware address of the destination host. Think about it: why should entries have an expiration time instead of remaining valid forever?

The format of ARP datagram is as follows:

The source MAC address and destination MAC address each appear once in the Ethernet header and once in the ARP packet. This is redundant when the link layer is Ethernet, but it may be necessary when the link layer is another type of network. The hardware type is the network type of the link layer, where 1 means Ethernet. The protocol type is the type of address to be resolved, where 0x0800 means IP address. The two address lengths that follow are 6 and 4 (bytes) for Ethernet addresses and IP addresses respectively. An op field of 1 indicates an ARP request, and an op field of 2 indicates an ARP response.

Look at a specific example.

The request frame is as follows (in order to be clear, a byte count is added at the front of each line, 16 bytes for each line):
Ethernet header (14 bytes)
0000: ff ff ff ff ff ff 00 05 5d 61 58 a8 08 06
ARP frame (28 bytes)
0000: 00 01
0010: 08 00 06 04 00 01 00 05 5d 61 58 a8 c0 a8 00 37
0020: 00 00 00 00 00 00 c0 a8 00 02
Fill bit (18 bytes)
0020: 00 77 31 d2 50 10
0030: fd 78 41 d3 00 00 00 00 00 00 00 00
Ethernet header: the destination address is the broadcast address, the MAC address of the source host is 00:05:5d:61:58:a8, and the upper protocol type 0x0806 indicates ARP.
ARP frame: hardware type 0x0001 indicates Ethernet, protocol type 0x0800 indicates the IP protocol, the hardware address (MAC address) length is 6, the protocol address (IP address) length is 4, op 0x0001 indicates a request for the MAC address of the destination host, the source host MAC address is 00:05:5d:61:58:a8, the source host IP address is c0 a8 00 37 (192.168.0.55), the MAC address of the destination host is all zeros because it is still to be filled in, and the IP address of the destination host is c0 a8 00 02 (192.168.0.2).
Because Ethernet specifies a minimum data length of 46 bytes and the ARP frame is only 28 bytes long, 18 bytes of padding follow. The contents of the padding are undefined and depend on the implementation.
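
The field-by-field reading above can be checked with a short sketch that unpacks the 28 ARP bytes of the request frame. The helper parse_arp and its dictionary keys are hypothetical names chosen for illustration; the byte layout follows the description in this section for Ethernet and IPv4.

```python
import socket
import struct

# hardware type, protocol type, hw len, proto len, op,
# sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"

def mac_text(raw):
    return ":".join(f"{b:02x}" for b in raw)

def parse_arp(packet):
    """Decode a 28-byte ARP packet for Ethernet and IPv4."""
    (hw_type, proto_type, hw_len, proto_len, op,
     sender_mac, sender_ip, target_mac, target_ip) = struct.unpack(
        ARP_FORMAT, packet[:28])
    return {
        "hw_type": hw_type,
        "proto_type": proto_type,
        "hw_len": hw_len,
        "proto_len": proto_len,
        "op": {1: "request", 2: "reply"}.get(op, op),
        "sender_mac": mac_text(sender_mac),
        "sender_ip": socket.inet_ntoa(sender_ip),
        "target_mac": mac_text(target_mac),
        "target_ip": socket.inet_ntoa(target_ip),
    }

# The ARP part of the request frame shown above.
request = bytes.fromhex(
    "0001 0800 06 04 0001 00055d6158a8 c0a80037 000000000000 c0a80002")
for key, value in parse_arp(request).items():
    print(key, value)
```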

The response frame is as follows:

Ethernet header
0000: 00 05 5d 61 58 a8 00 05 5d a1 b8 40 08 06
ARP frame
0000: 00 01
0010: 08 00 06 04 00 02 00 05 5d a1 b8 40 c0 a8 00 02
0020: 00 05 5d 61 58 a8 c0 a8 00 37
Fill bit
0020: 00 77 31 d2 50 10
0030: fd 78 41 d3 00 00 00 00 00 00 00 00
Ethernet header: the MAC address of the destination host is 00:05:5d:61:58:a8, the MAC address of the source host is 00:05:5d:a1:b8:40, and the upper protocol type 0x0806 indicates ARP.

ARP frame: hardware type 0x0001 indicates Ethernet, protocol type 0x0800 indicates the IP protocol, the hardware address (MAC address) length is 6, the protocol address (IP address) length is 4, op 0x0002 indicates a response, the source host MAC address is 00:05:5d:a1:b8:40, the source host IP address is c0 a8 00 02 (192.168.0.2), the destination host MAC address is 00:05:5d:61:58:a8, and the destination host IP address is c0 a8 00 37 (192.168.0.55).

Question: if the source host and the destination host are not in the same network segment, the broadcast frame of the ARP request cannot pass through the router. How, then, can the source host communicate with the destination host?

IP datagram format

The header length and data length of an IP datagram are both variable, and the header is always an integer multiple of 4 bytes. For IPv4, the 4-bit version field is 4. The 4-bit header length is expressed in units of 4 bytes. Its minimum value is 5, so the shortest header is 4x5=20 bytes, which is an IP header without any options. The largest value that 4 bits can hold is 15, so the longest header is 60 bytes. In the 8-bit TOS field, three bits specify the priority of the IP datagram (now obsolete), four bits indicate the optional service type (minimum delay, maximum throughput, maximum reliability and minimum cost), and one bit is always 0. The total length is the number of bytes in the whole datagram (IP header plus IP layer payload). The 16-bit identification is incremented by 1 for every IP datagram sent and is used when fragmenting and reassembling datagrams. The 3-bit flags and the 13-bit fragment offset are used for fragmentation. TTL (time to live) works as follows: the source host sets a lifetime for the packet, such as 64, and every router the packet passes through decrements it by 1. If it reaches 0, the route is too long and the network of the destination host still has not been reached, so the packet is discarded. The unit of this lifetime is therefore not seconds but hops. The protocol field indicates whether the upper protocol is TCP, UDP, ICMP or IGMP. Next comes the checksum, which covers only the IP header; checking the data is the responsibility of the higher-level protocol. An IPv4 address is 32 bits long.

Think about it. As mentioned earlier, the minimum data length in an Ethernet frame is 46 bytes, and anything shorter has to be filled with padding bytes. How, then, can the receiver tell how many of those 46 bytes belong to the IP, ARP or RARP datagram and how many are padding?

UDP Datagram Format

Next, we analyze a frame of TFTP protocol based on UDP.
Ethernet header
0000: 00 05 5d 67 d0 b1 00 05 5d 61 58 a8 08 00
IP header
0000: 45 00
0010: 00 53 93 25 00 00 80 11 25 ec c0 a8 00 37 c0 a8
0020: 00 01
UDP header
0020: 05 d4 00 45 00 3f ac 40
TFTP protocol
0020: 00 01 'c'':''\''q'
0030: 'w''e''r''q''.''q''w''e'00 'n''e''t''a''s''c''i'
0040: 'i'00 'b''l''k''s''i''z''e'00 '5''1''2'00 't''i'
0050: 'm''e''o''u''t'00 '1''0'00 't''s''i''z''e'00 '0'
0060: 00

Ethernet header: the source MAC address is 00:05:5d:61:58:a8, the destination MAC address is 00:05:5d:67:d0:b1, and the upper layer protocol type 0x0800 indicates IP.

IP header: the first byte, 0x45, contains the 4-bit version number and the 4-bit header length. The version number is 4, i.e. IPv4, and the header length is 5, indicating that the IP header has no option field. The service type is 0, meaning no special service is requested. The 16-bit total length field (IP header plus IP layer payload) is 0x0053, i.e. 83 bytes. Adding the 14 bytes of the Ethernet header, the whole frame is 97 bytes long. The IP datagram ID is 0x9325. The flag field and fragment offset field together are 0x0000, that is, DF=0 (fragmentation is allowed), MF=0 (no more fragments follow this datagram) and a fragment offset of 0. TTL is 0x80, which is 128. Upper protocol 0x11 indicates UDP. The IP header checksum is 0x25ec, the source host IP is c0 a8 00 37 (192.168.0.55), and the destination host IP is c0 a8 00 01 (192.168.0.1).
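
The decoding of this IP header can be reproduced with a small sketch. The helper names are hypothetical, and the checksum routine is the usual one's-complement sum over 16-bit words, included here on the assumption that the header uses the standard IPv4 checksum.

```python
import socket
import struct

def internet_checksum(data):
    """One's-complement sum of 16-bit words, folded to 16 bits and inverted."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def parse_ipv4_header(raw):
    """Decode the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,   # field counts 4-byte units
        "tos": tos,
        "total_len": total_len,
        "id": ident,
        "df": bool(flags_frag & 0x4000),
        "mf": bool(flags_frag & 0x2000),
        "frag_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# The 20 IP header bytes of the TFTP frame above.
ip_header = bytes.fromhex("4500 0053 9325 0000 8011 25ec c0a8 0037 c0a8 0001")
print(parse_ipv4_header(ip_header))
# Summing a valid header, checksum field included, gives 0.
print(internet_checksum(ip_header))
```

Recomputing the checksum with the checksum field zeroed yields 0x25ec, the value carried in the frame.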

UDP header: the source port number 0x05d4 (1492) is the port number of the client, and the destination port number 0x0045 (69) is the well-known port number of the TFTP service. The UDP length is 0x003f, that is, 63 bytes, covering the UDP header and the UDP layer payload. The checksum over the UDP header and the UDP layer payload is 0xac40.

TFTP is a text-based protocol in which the fields are separated by a zero byte. The leading 00 01 indicates a request to read a file. The fields that follow are:
c:\qwerq.qwe
netascii
blksize 512
timeout 10
tsize 0
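
The UDP header and the TFTP fields listed above can be decoded with the following sketch. The function names are hypothetical, and the payload is rebuilt from the fields shown in the dump rather than captured from a live network.

```python
import struct

def parse_udp_header(raw):
    """Return (source port, destination port, length, checksum)."""
    return struct.unpack("!HHHH", raw[:8])

def parse_tftp_request(payload):
    """Split a TFTP request into opcode, file name, mode and options."""
    (opcode,) = struct.unpack("!H", payload[:2])
    parts = payload[2:].split(b"\x00")[:-1]   # drop the piece after the last 0
    filename, mode, *rest = [p.decode("ascii") for p in parts]
    options = dict(zip(rest[0::2], rest[1::2]))
    return opcode, filename, mode, options

udp_header = bytes.fromhex("05d4 0045 003f ac40")
fields = [b"c:\\qwerq.qwe", b"netascii", b"blksize", b"512",
          b"timeout", b"10", b"tsize", b"0"]
tftp_payload = b"\x00\x01" + b"".join(f + b"\x00" for f in fields)

print(parse_udp_header(udp_header))
print(parse_tftp_request(tftp_payload))
# The UDP length field covers the 8-byte header plus the payload.
print(len(udp_header) + len(tftp_payload))
```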

In general, network communication works like the TFTP protocol: the two sides are the client and the server. The client actively initiates the request (the example above is a request frame sent by the client), while the server waits passively, receives the request and responds to it. The IP address and port number of the client uniquely identify the TFTP client process on its host, and the IP address and port number of the server uniquely identify the TFTP service process on its host. Since the client is the side that initiates the request, it must know the IP address of the server and the port number of the TFTP service process. For this reason, common network protocols have default server ports. For example, the HTTP service defaults to TCP port 80, the FTP service defaults to TCP port 21, and the TFTP service defaults to UDP port 69 (as in the example above). When using a client program, you must specify the host name or IP address of the server. If you do not explicitly specify a port number, the default port is used. Refer to the man pages of ftp, tftp and similar programs to learn how to specify the port number. All well-known service ports and the corresponding transport layer protocols are listed in /etc/services. They are assigned by IANA (Internet Assigned Numbers Authority). Some of these services can use either TCP or UDP; for clarity, IANA specifies that such a service uses the same default port number for TCP and UDP. For some other port numbers, however, the same number corresponds to different services under TCP and under UDP.

Many services have well-known port numbers, but the port number of a client program does not need to be well known. Usually the system automatically assigns an unused port number each time the client program runs and releases it when the program finishes. Such a port is called an ephemeral port number. Think about why this is done.
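
One way to observe ephemeral port assignment is to bind a socket to port 0 and ask which port the kernel picked. This is a minimal sketch using the standard Python socket module; the function name ephemeral_port is hypothetical, and the port printed will differ from run to run.

```python
import socket

def ephemeral_port():
    """Bind a UDP socket to port 0 and report the port the kernel chose."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("127.0.0.1", 0))    # port 0 asks the system to pick one
        return s.getsockname()[1]   # the port is released when s is closed

print(ephemeral_port())
```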

As mentioned earlier, UDP protocol is not connection oriented and does not guarantee the reliability of transmission, for example:
The UDP protocol layer at the sender only encapsulates the data from the application layer into segments and hands them to the IP protocol layer, at which point its job is done. If a segment cannot reach the other side because of a network failure, the UDP protocol layer does not return any error to the application layer.

The UDP protocol layer at the receiver only hands the received data to the corresponding application according to the port number, at which point its job is done. If the sender sends several packets and they take different routes through the network, they may arrive at the receiver out of order, and the UDP protocol layer does not guarantee that the data is delivered to the application layer in the order in which it was sent.

Usually, the UDP protocol layer of the receiver places the received data in a fixed size buffer waiting for the application program to extract and process. If the application program extracts and processes very slowly, and the sender sends very fast, the packet will be lost. The UDP protocol layer does not report this error.

Therefore, applications using UDP must take these possible problems into account and implement appropriate solutions, such as waiting for a response, retransmission on timeout, packet numbering and flow control. Applications that use UDP are generally relatively simple and only send messages with low reliability requirements rather than large amounts of data. For example, the UDP-based TFTP protocol is generally only used to transfer small files (which is why it is called trivial FTP), while the TCP-based FTP protocol is suitable for transferring all kinds of files. Next we look at how TCP uses a connection-oriented service to solve the problem of transmission reliability on behalf of the application.
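
Waiting for a response and retransmitting on timeout can be sketched on the loopback interface as follows. Everything here is illustrative: lossy_echo_server deliberately ignores the first datagram to imitate packet loss, and the names, the 0.2-second timeout and the message format are assumptions, not part of any real protocol.

```python
import socket
import threading

def lossy_echo_server(sock, drop_first=1):
    """Echo datagrams back, silently ignoring the first `drop_first` ones."""
    seen = 0
    while True:
        data, addr = sock.recvfrom(2048)
        if data == b"quit":
            return
        seen += 1
        if seen > drop_first:
            sock.sendto(data, addr)

def request_with_retry(server_addr, message, timeout=0.2, retries=5):
    """Send one datagram and wait for the echo, resending on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for attempt in range(1, retries + 1):
            s.sendto(message, server_addr)
            try:
                reply, _ = s.recvfrom(2048)
                return reply, attempt
            except socket.timeout:
                continue            # no answer in time: send it again
    raise TimeoutError(f"no reply after {retries} attempts")

def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    addr = server.getsockname()
    worker = threading.Thread(target=lossy_echo_server, args=(server,),
                              daemon=True)
    worker.start()
    try:
        return request_with_retry(addr, b"seq=1 hello")
    finally:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.sendto(b"quit", addr)   # tell the server thread to stop
        worker.join(timeout=1)
        server.close()

print(demo())
```

The message carries its own number ("seq=1") because a real application would also need it to match replies to requests and to discard duplicates.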

TCP segment format

Like UDP, TCP has a source port number and a destination port number, and the two sides of the communication are identified by IP address and port number. The 32-bit sequence number, the 32-bit acknowledgment number and the window size will be explained in detail later. The 4-bit header length is similar to that of the IP header: it gives the length of the TCP header in units of 4 bytes, so the TCP header can be at most 4x15=60 bytes long, and without an option field it has its minimum length of 20 bytes. URG, ACK, PSH, RST, SYN and FIN are six control bits. The four bits SYN, ACK, FIN and RST are explained later in this section, and the other bits are not covered. The 16-bit checksum covers the TCP header and data. The urgent pointer and the various options are not explained here.

TCP protocol

TCP communication sequence

The following figure is the sequence diagram of a TCP communication, showing how the connection is established and closed. It includes the well-known three-way handshake and four-way teardown.

In this example, the client first initiates the connection and sends a request, then the server responds to the request, and finally the client actively closes the connection. The two vertical lines represent the two ends of the communication, and time runs from top to bottom. Note that data takes time to travel from one end of the network to the other, so the arrows in the figure are slanted. The segments sent by the two sides are numbered 1-10 in chronological order, and the main information in each segment is marked on its arrow. For example, the arrow of segment 2 is marked SYN, 8000(0), ACK 1001, <mss 1024>, which means that the SYN bit in the segment is set to 1, the 32-bit sequence number is 8000, the segment carries no payload (0 data bytes), the ACK bit is set to 1, the 32-bit acknowledgment number is 1001, and the segment carries an mss (Maximum Segment Size) option with the value 1024.

The process of establishing a connection (three handshakes):

1. The client sends a TCP segment with the SYN flag to the server. This is segment 1 of the three-way handshake.
The client sends segment 1, in which the SYN bit indicates a connection request. The sequence number is 1000. The sequence number marks the position of the data in the communication, and it increases by 1 for every data byte sent. In this way the receiver can put the packets into the correct order according to their sequence numbers and can detect packet loss. In addition, the SYN bit and the FIN bit each occupy one sequence number. Although no data is sent this time, the SYN bit is, so the next transmission should use sequence number 1001. mss is the maximum segment size. If a segment is so large that, once encapsulated into a frame, it exceeds the maximum frame length of the link layer, it has to be fragmented at the IP layer. To avoid this, the client declares its maximum segment size and advises the server not to send segments longer than that.

2. The server responds to the client with a segment that carries both the ACK and the SYN flags. This is segment 2 of the three-way handshake. It acknowledges the SYN just sent by the client and at the same time sends a SYN to the client, asking whether the client is ready for data communication.
The server sends segment 2, which also carries the SYN bit and has the ACK bit set to indicate acknowledgment. The acknowledgment number is 1001, which means "I have received sequence number 1000 and everything before it; please send the segment with sequence number 1001 next". In other words, the server answers the connection request of the client and also sends a connection request of its own, declaring a maximum segment size of 1024.

3. The client must in turn respond to the server with an ACK segment. This is segment 3.
The client sends segment 3 to answer the connection request of the server, with acknowledgment number 8001. In this process the client and the server each send a connection request to the other side and answer the other side's request. The server's response and request are sent in a single segment, so three segments in total are used to establish the connection, which is why it is called the "three-way handshake". While establishing the connection, the two sides also negotiate some information, such as the initial sequence numbers used by each side and the maximum segment size.

In TCP communication, if one side receives a segment from the other, reads the destination port number, and finds that no process on the local machine is using that port, it replies with a segment in which the RST bit is set. For example, suppose no process on the server is using port 8080, but we use a telnet client to connect to it. When the server receives the SYN segment sent by the client, it responds with an RST segment. After receiving the RST segment, the telnet program on the client reports the error Connection refused:
$ telnet 192.168.0.200 8080
Trying 192.168.0.200...
telnet: Unable to connect to remote host: Connection refused
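
The same behaviour can be reproduced locally without telnet. In this sketch, closed_port and try_connect are hypothetical helper names, and the "closed" port is found by binding to port 0 and closing the socket again, which makes it very likely but not strictly guaranteed that nothing is listening there.

```python
import socket

def closed_port():
    """Find a local TCP port that is very likely unused: bind, read, close."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def try_connect(host, port):
    """Attempt a TCP connection and report how the peer reacted."""
    try:
        with socket.create_connection((host, port), timeout=2):
            return "connected"
    except ConnectionRefusedError:
        # The peer answered the SYN with an RST segment.
        return "refused"

print(try_connect("127.0.0.1", closed_port()))
```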

Data transmission process:

1. The client sends segment 4, which contains 20 bytes of data starting at sequence number 1001.
2. The server sends segment 5 with acknowledgment number 1021, which confirms receipt of the data with sequence numbers 1001-1020 and requests the data starting at sequence number 1021. Along with this acknowledgment, the server also sends the client 10 bytes of data starting at sequence number 8001, which is called piggybacking.
3. The client sends segment 6, which confirms receipt of the data with sequence numbers 8001-8010 sent by the server and requests the data starting at sequence number 8011.
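
The sequence and acknowledgment numbers in the handshake and in segments 4-6 follow one simple rule, sketched below. The function name next_seq is hypothetical; the wrap-around at 2^32 reflects the 32-bit width of the field.

```python
def next_seq(seq, payload_len=0, syn=False, fin=False):
    """Sequence number that follows a segment; the peer acknowledges this value.

    Each data byte consumes one sequence number, and SYN and FIN consume
    one each as well. The field is 32 bits wide, so it wraps around.
    """
    return (seq + payload_len + int(syn) + int(fin)) % 2**32

# Handshake: client initial sequence number 1000, server 8000.
print(next_seq(1000, syn=True))   # server acknowledges this in segment 2
print(next_seq(8000, syn=True))   # client acknowledges this in segment 3

# Data transfer: 20 bytes from 1001, then 10 bytes from 8001.
print(next_seq(1001, payload_len=20))   # acknowledgment number in segment 5
print(next_seq(8001, payload_len=10))   # acknowledgment number in segment 6
```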

During data transmission, ACK and the acknowledgment number are very important. Data handed to TCP by the application is stored temporarily in the send buffer of the TCP layer. After a packet has been sent, only when the ACK segment from the other side arrives does the sender know that the packet has really been delivered and can be released from the send buffer. If the packet is lost because of a network failure, or the ACK segment sent back by the other side is lost, TCP automatically retransmits the packets in the send buffer after a timeout.

The process of closing the connection (four-way teardown):

Since TCP connections are full duplex, each direction must be closed separately. This principle is that when a party completes its data transmission task, it can send a FIN to terminate the connection in this direction. Receiving a FIN only means that there is no data flow in this direction. A TCP connection can still send data after receiving a FIN. The first party to perform the shutdown will perform the active shutdown, while the other party will perform the passive shutdown.

1. The client sends segment 7. The FIN bit indicates the request to close the connection.
2. The server sends segment 8 to answer the client's close connection request.
3. The server sends segment 9, which also contains the FIN bit, to send a close connection request to the client.
4. The client sends segment 10 to answer the server's close connection request.

Establishing a connection is a three-way handshake, while closing a connection usually requires four segments, because the server's acknowledgment and its own close request are usually not combined into one segment. The reason is that a connection can be half closed: after the client closes its side, it can no longer send data to the server, but the server can still send data to the client until the server also closes the connection.
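
The half-closed state can be observed with a loopback TCP connection. In this sketch the function names are hypothetical; shutdown(SHUT_WR) from the standard Python socket module closes only the sending direction, after which the other side can keep sending.

```python
import socket

def read_until_eof(sock):
    """Collect data until the peer closes its sending direction."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:            # an empty read means the peer sent FIN
            return b"".join(chunks)
        chunks.append(chunk)

def half_close_demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        client = socket.create_connection(listener.getsockname())
        server, _ = listener.accept()
    with client, server:
        client.sendall(b"request")
        client.shutdown(socket.SHUT_WR)    # client side is now closed for sending
        request = read_until_eof(server)   # server sees the data, then EOF
        server.sendall(b"late reply")      # the other direction still works
        server.shutdown(socket.SHUT_WR)
        reply = read_until_eof(client)
    return request, reply

print(half_close_demo())
```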

Sliding window (TCP flow control)

When introducing UDP, we describe the following problems: if the sender sends faster and the receiver receives data at a slower speed, and the size of the receiving buffer is fixed, the data will be lost. TCP protocol solves this problem through "Sliding Window" mechanism. See the communication process below:

1. The sender initiates the connection, declaring a maximum segment size of 1460, an initial sequence number of 0 and a window size of 4K, which means "my receive buffer has 4K bytes free, so do not send me more than 4K of data". The receiver responds to the connection request, declaring a maximum segment size of 1024, an initial sequence number of 8000 and a window size of 6K. The sender acknowledges, and the three-way handshake ends.

2. The sender sends out segments 4-9 with 1K data in each segment. The sender knows that the buffer of the receiver is full according to the window size, so it stops sending data.

3. The receiving end's application program picks up 2K data, and the receiving buffer has 2K free again. The receiving end sends segment 10, and declares that the window size is 2K when it answers that 6K data has been received.

4. The application program at the receiving end picks up 2K data again. 4K of the receiving buffer is free. The receiving end sends segment 11 and redeclares the window size as 4K.

5. The sender sends segments 12-13, each carrying 2K of data; segment 13 also has the FIN bit set.

6. The receiver acknowledges the 2K of data it received (6145-8192). The FIN bit occupies one sequence number, 8193, so the acknowledgment number is 8194. The connection is now half closed, and the receiver also declares a window size of 2K.

7. The receiving end's application program picks up 2K data, and the receiving end redeclares the window size as 4K.

8. The receiver's application takes the remaining 2K of data, the receive buffer is completely empty, and the receiver redeclares the window size as 6K.

9. After the receiver's application has taken all the data, it decides to close the connection and sends segment 17 with the FIN bit set. The sender acknowledges it, and the connection is completely closed.

In the above figure, the receiving end uses small squares to represent 1K data, the solid small squares to represent the received data, and the dotted box to represent the receiving buffer. Therefore, the hollow small squares embedded in the dotted box represent the window size. As can be seen from the figure, the dotted box slides to the right as the application advances the data, so it is called the sliding window.

This example also shows that the sender sends data 1K at a time, while the receiver's application takes data 2K at a time. It could just as well take 3K or 6K at a time, or only a few bytes. In other words, the application sees the data as a whole, as a stream. The underlying communication may split the data into many packets, but the application cannot see how many bytes each packet carries, which is why TCP is called a stream-oriented protocol. UDP, by contrast, is a message-oriented protocol: every UDP datagram is one message, and the application must extract data one whole message at a time rather than an arbitrary number of bytes. This is very different from TCP.
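Because TCP delivers a byte stream, a single read() may return fewer bytes than the application asked for. A common way to cope is a loop that keeps reading until the requested count has arrived. The following is a minimal sketch of that idea; read_fully is a hypothetical helper name chosen for illustration, not part of the socket API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly n bytes from fd unless EOF or an
 * error occurs first. Returns the number of bytes read (less than n
 * only at EOF), or -1 on error with errno set by read(). */
ssize_t read_fully(int fd, void *buf, size_t n)
{
    assert(buf != NULL);
    char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t got = read(fd, p, left);
        if (got < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (got == 0)
            break;              /* peer closed: no more data */
        p += got;
        left -= (size_t)got;
    }
    return (ssize_t)(n - left);
}
```

A caller that expects a fixed-size record can call read_fully(connfd, buf, record_size) and treat a short return value as end of stream. With UDP no such loop is needed, because each receive call returns one whole datagram.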

TCP state transition

As many people know, the TCP state transition diagram is very helpful for locating and eliminating network or system faults, but how do you fix it firmly in memory? You need a thorough understanding of each state in the diagram and of the transitions between them, not a half understanding. The 11 states in the diagram are analyzed in detail below. Before reading on, recall the three-way handshake that establishes a TCP connection and the four-way handshake that closes it.

CLOSED: indicates the initial state.

LISTEN: this state indicates that a SOCKET on the server side is in listening state and can accept connections.

SYN_SENT: this state is the counterpart of SYN_RCVD. When the client socket calls connect(), it first sends a SYN segment, then enters the SYN_SENT state and waits for the second segment of the three-way handshake from the server. SYN_SENT indicates that the client has sent its SYN.

SYN_RCVD: this state indicates that a SYN segment has been received. Normally it is a very short intermediate state of the server socket during the three-way handshake that establishes the connection. When the ACK from the client arrives in this state, the socket enters the ESTABLISHED state.

ESTABLISHED: indicates that the connection has been ESTABLISHED.

FIN_WAIT_1 and FIN_WAIT_2: both states mean waiting for the other party's FIN segment. The difference is as follows.

FIN_WAIT_1: when a socket in the ESTABLISHED state actively closes the connection and sends a FIN segment to the other party, it enters the FIN_WAIT_1 state. When the other party responds with an ACK, the socket enters the FIN_WAIT_2 state. Normally the other party acknowledges immediately, so FIN_WAIT_1 is rarely observed, while FIN_WAIT_2 can often be seen with netstat.

FIN_WAIT_2: the party that actively closed the connection enters this state after its FIN has been acknowledged. The connection is then half closed: a socket in this state can still receive data but can no longer send any.

TIME_WAIT: indicates that the other party's FIN segment has been received and an ACK has been sent. After 2MSL the socket returns to the CLOSED state. If a segment carrying both the FIN and ACK flags arrives in the FIN_WAIT_1 state, the socket enters TIME_WAIT directly without passing through FIN_WAIT_2.

CLOSING: this state is rare. Normally, after you send a FIN segment you first receive the other party's ACK and then the other party's FIN. CLOSING means that after sending your FIN you received the other party's FIN before receiving its ACK. When does this happen? If both parties close the socket at almost the same time, both send FIN segments simultaneously, and each enters the CLOSING state, which indicates that both sides are closing the connection.

CLOSE_WAIT: this state indicates that the socket is waiting to be closed. When the other party closes its socket and sends a FIN segment, the local system responds with an ACK and enters the CLOSE_WAIT state. The application should then check whether it still has data to send to the other party; if not, it can close the socket, which sends a FIN segment to the other party. In other words, what remains to be done in the CLOSE_WAIT state is to close the connection.

LAST_ACK: the party that closed passively has sent its FIN segment and is waiting for the other party's ACK. Once the ACK arrives, the socket enters the CLOSED state.

Half closed

When A sends a FIN to close a TCP connection and B responds with an ACK (A enters the FIN_WAIT_2 state) but B does not immediately send its own FIN, A is in the half-closed state: A can still receive data sent by B, but A can no longer send data to B.
From a program's point of view, the half-closed state can be produced with the following API.

#include <sys/socket.h>
int shutdown(int sockfd, int how);
sockfd: the descriptor of the socket to shut down
how: selects how the connection is shut down:
SHUT_RD (0): shut down the read half of sockfd. No more data can be read from the socket.
The socket no longer accepts data, and any data currently in the socket receive buffer is silently discarded.

SHUT_WR (1): shut down the write half of sockfd. The process can no longer write to this socket.

SHUT_RDWR (2): shut down both the read half and the write half of sockfd. This is equivalent to calling shutdown twice: first with SHUT_RD, then with SHUT_WR.

close also terminates a connection, but it only decrements the reference count of the descriptor and does not necessarily close the connection. The connection is closed only when the reference count drops to 0.

shutdown acts on the connection directly, regardless of the descriptor's reference count. It also lets you shut down only one direction of the connection, reading or writing. Note that shutdown does not release the descriptor itself; close is still needed for that.

Note:

1. If several processes share a socket, each call to close decrements the count by 1. The socket is released only when the count reaches 0, that is, when every process using it has called close.

2. If one process calls shutdown(sfd, SHUT_RDWR), the other processes can no longer communicate over the socket. If one process merely calls close(sfd), the other processes are not affected.
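The half-closed state described above can be produced by sending the last piece of data and then shutting down only the write direction. The following is a minimal sketch; send_then_half_close is a hypothetical helper name, and treating a short write as a failure is a simplification made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send a final message, then half-close the
 * connection. After shutdown(SHUT_WR) the peer's read() eventually
 * sees EOF, but this side can still read whatever the peer sends. */
int send_then_half_close(int sockfd, const void *msg, size_t len)
{
    assert(msg != NULL);
    /* Simplification: a short write is reported as an error. */
    if (write(sockfd, msg, len) != (ssize_t)len)
        return -1;
    return shutdown(sockfd, SHUT_WR);   /* close only the write direction */
}
```

After the call returns 0, the caller keeps reading from sockfd until read() returns 0, and only then calls close(sockfd) to release the descriptor.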

2MSL

The TIME_WAIT state, which lasts 2MSL (twice the maximum segment lifetime), exists for two reasons:

(1) To make the four-way close more reliable. The last ACK of the four-way close is sent by the party that closed actively. If that ACK is lost, the passive party retransmits its FIN. If the active party stays in TIME_WAIT for 2MSL, it is much more likely to be able to resend the lost ACK.

(2) To prevent lost duplicates from corrupting later, unrelated connections. Lost duplicates are common in real networks. Often a router fails and the routes do not converge, so a packet bounces between routers A, B, and C as if in an endless loop. The TTL field in the IP header limits the maximum number of hops a packet may make, so such a packet has two possible fates: either its TTL reaches 0 and it disappears from the network, or the routes converge before the TTL reaches 0 and the packet finally arrives at its destination with its remaining hops. Meanwhile, TCP's timeout-and-retransmission mechanism has already sent an identical packet that reached the destination first, so the late packet is doomed to be discarded by the TCP protocol stack.

A related concept is the incarnation connection: a new connection that uses the same socket pair as a previous connection is called an incarnation of the previous connection. A lost duplicate combined with an incarnation connection can cause fatal transmission errors.

TCP is a stream protocol: packets may arrive out of order, and the TCP protocol stack reassembles them by sequence number. Suppose an incarnation connection is expecting seq=1000 and a lost duplicate arrives with seq=1000, len=1000. TCP considers the lost duplicate legitimate and places it in the receive buffer, which corrupts the transmission. Staying in TIME_WAIT for 2MSL ensures that all lost duplicates have disappeared, so they cannot cause errors on a new connection.

Why is this state assigned to the party that closes actively?
(1) The last ACK is sent by the party that closes actively.
(2) As long as one party stays in TIME_WAIT, an incarnation connection cannot be established within 2MSL, so there is no need for both parties to do so.

How should the 2MSL TIME_WAIT state be handled correctly?

The RFC requires that while a socket pair is in TIME_WAIT, no incarnation connection using that socket pair may be started. Most TCP implementations impose a stricter restriction: during the 2MSL wait, the local port used by the socket cannot be reused at all by default.
Suppose A (10.234.5.5:1234) and B (10.55.55.60:6666) establish a connection and A closes actively. Then, as long as A's port is 1234, the service is not allowed to restart, whatever the peer's IP address and port are. This is stricter than the RFC, which only requires that the socket pair differ; in these implementations, no connection is allowed as long as the port is in TIME_WAIT. The restriction does not matter to the party that opened actively, because it generally uses an ephemeral port. For the party that opened passively, usually the server, it is painful, because a server normally uses a well-known port such as 80 for HTTP, and it is unacceptable for the service to be unable to start for 2MSL.

The solution is to set the SO_REUSEADDR option on the server socket, so that the service can be started on a well-known port even while that port is in TIME_WAIT. Even with SO_REUSEADDR, the socket pair restriction still applies. In the example above, A can listen on port 1234 again thanks to SO_REUSEADDR, but if B connects again from port 6666, the connection fails with "Address already in use".

RFC 793 defines MSL as 2 minutes; in practice, 30 seconds, 1 minute, and 2 minutes are all commonly used.

RFC (Request for Comments) is a series of numbered documents that collect information about the Internet, including its protocols and the software of the UNIX and Internet communities.

Problems in programming

As a test, first start the server, then start the client, terminate the server with Ctrl-C, and immediately run the server again. The result is as follows:

itcast$ ./server
bind error: Address already in use

This is because, although the server application has terminated, the connection at the TCP layer has not been completely closed, so listening on the same server port again is not allowed. Let's check with the netstat command:

itcast$ netstat -apn |grep 6666
tcp 1 0 192.168.1.11:38103 192.168.1.11:6666 CLOSE_WAIT 3525/client
tcp 0 0 192.168.1.11:6666 192.168.1.11:38103 FIN_WAIT2 -

When the server terminates, its socket descriptor is closed automatically and a FIN segment is sent to the client. The client enters the CLOSE_WAIT state after receiving the FIN, but because the client has neither terminated nor closed its socket descriptor, it does not send a FIN to the server, so the server's TCP connection stays in the FIN_WAIT2 state.

Now terminate the client with Ctrl-C and observe the result:

itcast$ netstat -apn |grep 6666
tcp 0 0 192.168.1.11:6666 192.168.1.11:38104 TIME_WAIT -
itcast$ ./server
bind error: Address already in use

When the client terminates, its socket descriptor is closed automatically, and the server's TCP connection enters the TIME_WAIT state after receiving the FIN segment sent by the client.

The TCP protocol stipulates that the party that actively closes the connection must stay in the TIME_WAIT state for two MSL (maximum segment lifetime) before returning to the CLOSED state. Because we terminated the server first with Ctrl-C, the server is the party that closed actively. During the TIME_WAIT period, the same server port cannot be listened on again.

MSL is specified as two minutes in RFC 1122, but operating systems implement it differently. On Linux, the server can generally be started again after about half a minute. See Section 2.7 of UNP for why the TIME_WAIT state is specified.

Port multiplexing

It is unreasonable to forbid listening again until the server's TCP connection is completely closed. The connection that is not yet closed is connfd (127.0.0.1:6666), whereas what we want to listen on again is listenfd (0.0.0.0:6666). Although they use the same port, their IP addresses differ: connfd corresponds to a specific IP address used to communicate with one client, while listenfd corresponds to the wildcard address. The solution is to use setsockopt() to set the SO_REUSEADDR option of the socket descriptor to 1, which allows multiple socket descriptors with the same port number but different IP addresses to be created.

Insert the following code between the socket() and bind() calls of the server code:

int opt = 1;
setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

For other options that setsockopt can set, refer to Chapter 7 of the UNP.
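To show where the two lines above sit in a complete setup sequence, here is a minimal sketch of a listening-socket constructor. The function name make_listener and the backlog value 128 are illustrative assumptions, not taken from the server code this article refers to.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on the wildcard
 * address with SO_REUSEADDR set. Returns the descriptor, or -1. */
int make_listener(int port)
{
    assert(port >= 0 && port <= 65535);

    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0)
        return -1;

    int opt = 1;
    /* Set the option after socket() and before bind(). */
    if (setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0) {
        close(listenfd);
        return -1;
    }

    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons((unsigned short)port);

    if (bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0 ||
        listen(listenfd, 128) < 0) {   /* 128: arbitrary backlog for the sketch */
        close(listenfd);
        return -1;
    }
    return listenfd;
}
```

With this in place, a server started as make_listener(6666) should be able to bind again right after a restart even while an old connection on port 6666 is still in TIME_WAIT.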

TCP abnormal disconnection

Heartbeat detection mechanism

In TCP network communication, abnormal disconnections between client and server happen often, so the state of the link needs to be checked in real time. The common solution is to add a heartbeat mechanism to the program.

Heart beat thread

This is the most common and simplest method. Alongside the normal sending and receiving of data, a dedicated daemon thread sends a heartbeat packet at regular intervals. On receiving the packet, the client or server immediately returns a corresponding reply, so each side can check in real time whether the other is still online.

The advantage of this method is that it works everywhere; the disadvantage is that it changes the existing communication protocol. Heartbeats are usually handled at the business layer, mainly because that is flexible and controllable.

UNIX Network Programming does not recommend using SO_KEEPALIVE for heartbeat detection. It is better to detect with heartbeat packets at the business layer, which is also easier to control.

Set TCP properties
SO_KEEPALIVE keeps the connection alive by checking whether the peer host has crashed, so that the server does not block forever on input from a TCP connection. With this option set, if no data is exchanged in either direction on the socket for 2 hours, TCP automatically sends a keepalive probe to the peer. This is a TCP segment to which the peer must respond. One of three things then happens:
1. The peer is running normally: it responds with the expected ACK. After another 2 hours of inactivity, TCP sends another probe.
2. The peer has crashed and rebooted: it responds with RST. The socket's pending error is set to ECONNRESET, and the socket is closed.
3. The peer does not respond: Berkeley-derived TCP sends 8 more probes, 75 seconds apart, trying to elicit a response. If there is still no response 11 minutes and 15 seconds after the first probe, TCP gives up. The socket's pending error is set to ETIMEDOUT, and the socket is closed. If an ICMP error such as "host unreachable" is returned instead, the peer host has not necessarily crashed but cannot be reached, and the pending error is set to EHOSTUNREACH.

From the description above, setting the SO_KEEPALIVE option lets us find out, after 2 hours, whether the peer's TCP connection still exists when the peer has disconnected ungracefully.

int keepAlive = 1;
setsockopt(listenfd, SOL_SOCKET, SO_KEEPALIVE, (void*)&keepAlive, sizeof(keepAlive));

If we can't accept such a long waiting time, we can know from TCP-Keepalive-HOWTO that there are two ways to set, one is to modify the configuration parameters of the kernel on the network, the other is the three options of the SOL_TCP field: TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT.

1. The tcp_keepidle parameter specifies the interval of inactivity that causes TCP to generate a KEEPALIVE transmission for an application that requests them. tcp_keepidle defaults to 14400 (two hours).
/* idle time before the first keepalive probe */
2. The tcp_keepintvl parameter specifies the interval between the nine retries that are attempted if a KEEPALIVE transmission is not acknowledged. tcp_keepintvl defaults to 150 (75 seconds).
/* interval between two keepalive probes */
3. The tcp_keepcnt option specifies the maximum number of keepalive probes to be sent. The value of TCP_KEEPCNT is an integer value between 1 and n, where n is the value of the systemwide tcp_keepcnt parameter.
/* number of keepalive probes sent before the connection is dropped */

The defaults quoted above are counted in half-second units. On Linux, TCP_KEEPIDLE and TCP_KEEPINTVL take values in seconds.

int keepIdle = 1000;
int keepInterval = 10;
int keepCount = 10;

setsockopt(listenfd, SOL_TCP, TCP_KEEPIDLE, (void *)&keepIdle, sizeof(keepIdle));
setsockopt(listenfd, SOL_TCP, TCP_KEEPINTVL, (void *)&keepInterval, sizeof(keepInterval));
setsockopt(listenfd, SOL_TCP, TCP_KEEPCNT, (void *)&keepCount, sizeof(keepCount));

By default, SO_KEEPALIVE sends a keepalive probe only after the connection has been idle for 2 hours, so it cannot guarantee real-time detection. It takes too long to notice a broken network, which makes it unsuitable for programs that need to respond promptly.

Of course, you can also change the system-wide time interval parameters, but that affects every socket that has this option enabled. A socket associated with a completion port may ignore this socket option.
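The per-socket options can be wrapped in one function that checks every return value. The sketch below assumes Linux, where TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT are available from netinet/tcp.h; it uses the level IPPROTO_TCP, which has the same value as SOL_TCP on Linux. The name enable_keepalive is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux option names): enable keepalive on one
 * socket and override the system-wide timers for that socket only.
 * idle and intvl are in seconds; cnt is the number of probes. */
int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
    assert(idle > 0 && intvl > 0 && cnt > 0);
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```

For example, enable_keepalive(connfd, 60, 10, 5) asks for the first probe after 60 idle seconds and then up to 5 probes 10 seconds apart, so a dead peer would be reported after roughly 110 seconds instead of more than 2 hours.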

Socket programming

Socket concept

The word "socket" originally means an electrical socket, something you plug into. In the Linux environment, it is a special file type used for network communication between processes. In essence, it is a pseudo file that the kernel builds on top of buffers.

Since it is a file, we can of course refer to a socket through a file descriptor. As with a pipe, Linux wraps it as a file in order to unify the interface, so that reading and writing a socket works the same way as reading and writing a file. The difference is that pipes are mainly used for communication between local processes, while sockets are mostly used for data transfer between processes across a network.

The kernel implementation of sockets is complex, so it is not worth studying in depth at an early stage of learning.

In the TCP/IP protocol suite, "IP address + TCP or UDP port number" uniquely identifies a process in network communication, and "IP address + port number" corresponds to a socket. Each of the two processes that establish a connection has a socket that identifies it, and the socket pair formed by these two sockets uniquely identifies the connection. A socket can therefore be used to describe the one-to-one relationship of a network connection.

The socket communication principle is shown in the following figure:


In network communication, sockets must appear in pairs. The send buffer at one end corresponds to the receive buffer at the other end. We use the same file descriptor to index the send buffer and the receive buffer.

TCP/IP protocol was first implemented on BSD UNIX. The application layer programming interface designed for TCP/IP protocol is called socket API. The main content of this chapter is socket API, which mainly introduces the function interface of TCP protocol, and finally UDP protocol and UNIX Domain Socket.


Preparatory knowledge

Network byte order

We already know that multibyte data in memory is either big-endian or little-endian with respect to memory addresses, and that multibyte data in a disk file is either big-endian or little-endian with respect to file offsets. A network data stream also has a byte order, so how are addresses defined for it? The sending host usually sends the data in its send buffer in order of memory address, from low to high. The receiving host stores the bytes received from the network in its receive buffer one after another, also from low addresses to high. Therefore, addresses in the network data stream are defined as follows: data sent earlier is at the lower address, and data sent later is at the higher address.

The TCP/IP protocols specify that the network data stream uses big-endian byte order, that is, the high-order byte is at the low address. For example, in the UDP segment format of the previous section, addresses 0-1 hold the 16-bit source port number. If the port number is 1000 (0x3e8), address 0 holds 0x03 and address 1 holds 0xe8; that is, 0x03 is sent first and 0xe8 second. In the sending host's buffer, these 16 bits must likewise be 0x03 at the low address and 0xe8 at the high address. However, if the sending host is little-endian, it interprets these 16 bits as 0xe803 rather than 1000. The sending host therefore has to convert the byte order before storing 1000 in the send buffer. Likewise, if the receiving host is little-endian, it has to convert the byte order of the 16-bit source port number it receives. If a host is big-endian, no conversion is needed for sending or receiving. The same consideration of network byte order versus host byte order applies to 32-bit IP addresses.

To make network programs portable, so that the same C code runs correctly when compiled on both big-endian and little-endian machines, the following library functions can be called to convert between network byte order and host byte order.

#include <arpa/inet.h>

uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);

h stands for host, n for network, l for a 32-bit long integer, and s for a 16-bit short integer.

If the host is little-endian, these functions convert their argument to the other byte order and return the result. If the host is big-endian, they perform no conversion and return the argument unchanged.
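The port 1000 example can be checked directly by looking at the bytes htons() produces. The following is a minimal sketch; port_to_wire and port_from_wire are hypothetical helper names introduced only for this illustration.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a port number into a 2-byte buffer exactly
 * as it would appear on the wire (network byte order). */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    assert(out != NULL);
    uint16_t net = htons(port);     /* host order -> network order */
    memcpy(out, &net, sizeof(net)); /* copy the bytes in memory order */
}

/* Inverse: rebuild the host-order port from two wire bytes. */
uint16_t port_from_wire(const unsigned char in[2])
{
    assert(in != NULL);
    uint16_t net;
    memcpy(&net, in, sizeof(net));
    return ntohs(net);              /* network order -> host order */
}
```

Calling port_to_wire(1000, buf) leaves 0x03 in buf[0] and 0xe8 in buf[1] on both little-endian and big-endian hosts, which is exactly the layout described for the UDP source port above.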

IP address translation function

Early:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int inet_aton(const char *cp, struct in_addr *inp);
in_addr_t inet_addr(const char *cp);
char *inet_ntoa(struct in_addr in);

These functions handle IPv4 addresses only
Not reentrant (inet_ntoa returns a pointer to a static buffer)
Note that the parameter type is struct in_addr

Now:

#include <arpa/inet.h>
int inet_pton(int af, const char *src, void *dst);
const char *inet_ntop(int af, const void *src, char *dst, socklen_t size);

Support both IPv4 and IPv6
Reentrant
inet_pton and inet_ntop can convert not only the in_addr of IPv4 but also the in6_addr of IPv6,
so the address parameter of the interface is a void * pointer.
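A short round trip shows how the two functions are typically called for IPv4. This is a minimal sketch; parse_ipv4 and format_ipv4 are hypothetical wrapper names, and the same pattern works for IPv6 with AF_INET6, struct in6_addr, and a buffer of INET6_ADDRSTRLEN bytes.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: parse dotted-decimal text into an in_addr.
 * Returns 0 on success, -1 if the text is not a valid IPv4 address. */
int parse_ipv4(const char *text, struct in_addr *out)
{
    assert(text != NULL && out != NULL);
    /* inet_pton returns 1 on success, 0 for malformed text, -1 on error. */
    return inet_pton(AF_INET, text, out) == 1 ? 0 : -1;
}

/* Hypothetical helper: format an in_addr back into text. buf should
 * hold at least INET_ADDRSTRLEN bytes. Returns buf, or NULL on error. */
const char *format_ipv4(const struct in_addr *addr, char *buf, socklen_t len)
{
    assert(addr != NULL && buf != NULL);
    return inet_ntop(AF_INET, addr, buf, len);
}
```

The address written by parse_ipv4 is already in network byte order, so it can be assigned to sin_addr of a sockaddr_in without calling htonl.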

sockaddr data structure

Many network programming functions were designed around the generic struct sockaddr, which was the structure in use when they were created. For backward compatibility, struct sockaddr has now degenerated into the role of (void *): it merely passes an address to the function. Whether that address is really a sockaddr_in or a sockaddr_in6 is determined by the address family, and the function then casts it internally to the required address type.

struct sockaddr {
	sa_family_t sa_family; 		/* address family, AF_xxx */
	char sa_data[14];			/* 14 bytes of protocol address */
};

Use the command sudo grep -r "struct sockaddr_in {" /usr to find the definition of struct sockaddr_in. By default it is generally located in the file /usr/include/linux/in.h.

struct sockaddr_in {
	__kernel_sa_family_t sin_family;	/* Address family (address structure type) */
	__be16 sin_port;			/* Port number */
	struct in_addr sin_addr;		/* Internet address (IP address) */
	/* Pad to size of `struct sockaddr'. */
	unsigned char __pad[__SOCK_SIZE__ - sizeof(short int) -
	sizeof(unsigned short int) - sizeof(struct in_addr)];
};
struct in_addr {						/* Internet address. */
	__be32 s_addr;
};

struct sockaddr_in6 {
	unsigned short int sin6_family; 		/* AF_INET6 */
	__be16 sin6_port; 					/* Transport layer port # */
	__be32 sin6_flowinfo; 				/* IPv6 flow information */
	struct in6_addr sin6_addr;			/* IPv6 address */
	__u32 sin6_scope_id; 				/* scope id (new in RFC2553) */
};

struct in6_addr {
	union {
		__u8 u6_addr8[16];
		__be16 u6_addr16[8];
		__be32 u6_addr32[4];
	} in6_u;
	#define s6_addr 		in6_u.u6_addr8
	#define s6_addr16 	in6_u.u6_addr16
	#define s6_addr32	 	in6_u.u6_addr32
};

#define UNIX_PATH_MAX 108
struct sockaddr_un {
	__kernel_sa_family_t sun_family; 	/* AF_UNIX */
	char sun_path[UNIX_PATH_MAX]; 	/* pathname */
};

The IPv4 and IPv6 address formats are defined in netinet/in.h. An IPv4 address is represented by the sockaddr_in structure, which holds a 16-bit port number and a 32-bit IP address. An IPv6 address is represented by the sockaddr_in6 structure, which holds a 16-bit port number, a 128-bit IP address, and some control fields. The address format of UNIX domain sockets is defined in sys/un.h and is represented by the sockaddr_un structure. All socket address structures begin the same way, with a field that gives the address type (some UNIX implementations put a length field for the whole structure in front of it; Linux does not). The address types of IPv4, IPv6, and UNIX domain sockets are defined as the constants AF_INET, AF_INET6, and AF_UNIX respectively. So, given just the starting address of some sockaddr structure, you can determine what the structure contains from its address type field without knowing in advance which kind of sockaddr structure it is. This is why socket API functions such as bind, accept, and connect can accept pointers to any kind of sockaddr structure as parameters. Ideally these parameters would be declared as void * so that they accept any pointer type, but the socket API was implemented before ANSI C was standardized, when there was no void * type. The parameters are therefore declared as struct sockaddr *, and the pointer has to be cast before it is passed, for example:

struct sockaddr_in servaddr;
/* initialize servaddr */
bind(listen_fd, (struct sockaddr *)&servaddr, sizeof(servaddr));
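The same idea works in the other direction: code that receives only a generic struct sockaddr pointer can look at the address type field and cast back to the concrete structure. The following is a minimal sketch of such a dispatch; describe_sockaddr is a hypothetical helper, and it deliberately handles only AF_INET and AF_INET6.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: given only a generic struct sockaddr pointer,
 * inspect sa_family to decide which concrete structure it really is,
 * then write "address:port" text into buf. Returns 0, or -1 if the
 * family is not handled or formatting fails. */
int describe_sockaddr(const struct sockaddr *sa, char *buf, size_t len)
{
    assert(sa != NULL && buf != NULL);
    char ip[INET6_ADDRSTRLEN];

    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        if (inet_ntop(AF_INET, &in4->sin_addr, ip, sizeof(ip)) == NULL)
            return -1;
        snprintf(buf, len, "%s:%u", ip, (unsigned)ntohs(in4->sin_port));
        return 0;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        if (inet_ntop(AF_INET6, &in6->sin6_addr, ip, sizeof(ip)) == NULL)
            return -1;
        snprintf(buf, len, "[%s]:%u", ip, (unsigned)ntohs(in6->sin6_port));
        return 0;
    }
    return -1;  /* e.g. AF_UNIX: not covered by this sketch */
}
```

Socket functions such as bind and connect perform a comparable check internally, which is why one parameter type can serve every address family.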

Network socket function

socket model creation flowchart

socket function

#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
domain:
AF_INET: the protocol family used for most sockets. It uses TCP or UDP for transmission with IPv4 addresses.
AF_INET6: similar to the above, but used with IPv6 addresses.
AF_UNIX: local protocol, used on UNIX and Linux systems, generally when the client and the server are on the same machine.

type:
SOCK_STREAM: a sequenced, reliable, connection-based byte stream with data integrity. This is the most commonly used socket type; it uses TCP for transmission.
SOCK_DGRAM: a connectionless service that transfers datagrams of a fixed maximum length. It is unreliable and uses UDP.
SOCK_SEQPACKET: a two-way, reliable connection that transmits fixed-length packets. A packet must be received completely before it can be read.
SOCK_RAW: provides raw network protocol access. This socket type is used with protocols such as ICMP (ping and traceroute use it).
SOCK_RDM: rarely used and not implemented on most operating systems. It is provided for the data link layer and does not guarantee the order of packets.

protocol:
Passing 0 means using the default protocol.

Return value:
On success, returns a file descriptor referring to the newly created socket; on failure, returns -1 and sets errno.

socket() opens a network communication port. On success it returns a file descriptor, just as open() does, and the application can then send and receive data over the network with read/write as if reading and writing a file. If the socket() call fails, it returns -1. For IPv4, the domain parameter is AF_INET. For TCP, the type parameter is SOCK_STREAM, which denotes a stream-oriented transport protocol. For UDP, the type parameter is SOCK_DGRAM, which denotes a datagram-oriented transport protocol. The protocol parameter is not covered here; simply specify 0.
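As an illustration of the parameters just described, the sketch below opens an IPv4 socket of either type and reports a failure through errno. The wrapper name open_inet_socket is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: open an IPv4 socket of the given type
 * (SOCK_STREAM for TCP, SOCK_DGRAM for UDP). Passing 0 as the protocol
 * selects the default protocol for that type. Returns the descriptor,
 * or -1 after reporting the error. */
int open_inet_socket(int type)
{
    assert(type == SOCK_STREAM || type == SOCK_DGRAM);
    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        perror("socket");   /* errno explains the failure */
    return fd;
}
```

open_inet_socket(SOCK_STREAM) yields a descriptor suitable for the TCP calls that follow (bind, listen, accept, connect), while open_inet_socket(SOCK_DGRAM) yields one for UDP.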

bind function

#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

sockfd:
socket file descriptor

addr:
Address structure holding the IP address and port number

addrlen:
Size of the address structure in bytes, for example sizeof(servaddr)

Return value:
Returns 0 on success; on failure, returns -1 and sets errno

A server program usually listens on a fixed network address and port number, so that a client program that knows them can initiate a connection to the server. The server therefore needs to call bind to bind a fixed network address and port number.

bind() binds sockfd to addr, so that sockfd, a file descriptor used for network communication, listens on the address and port number described by addr. As mentioned earlier, struct sockaddr * is a generic pointer type: the addr parameter can actually accept the sockaddr structures of various protocols, whose lengths differ, so the third parameter, addrlen, is needed to specify the length of the structure. For example:

struct sockaddr_in servaddr;
bzero(&servaddr, sizeof(servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
servaddr.sin_port = htons(6666);

First the whole structure is cleared. Then the address type is set to AF_INET and the network address to INADDR_ANY. This macro stands for any local IP address: a server may have several network cards, and each card may be bound to several IP addresses, so this setting lets the server listen on all of them; which IP address is actually used is not determined until a connection with a client is established. The port number is set to 6666.

listen function

#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int listen(int sockfd, int backlog);

sockfd:
socket file descriptor

backlog:
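Historically described as the sum of the connections still completing the three-way handshake and the connections that have just completed it. On Linux since kernel 2.2, it is the maximum length of the queue of fully established connections waiting to be accepted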

View the system limit on the queue of incomplete (SYN) connections
cat /proc/sys/net/ipv4/tcp_max_syn_backlog

A typical server program can serve multiple clients at the same time. When a client initiates a connection, the accept() called by the server returns and accepts it. If a large number of clients initiate connections and the server cannot process them in time, the clients that have not yet been accepted wait in the pending-connection state. listen() puts sockfd into the listening state and allows at most backlog clients to wait in that state; further connection requests are ignored. listen() returns 0 on success and -1 on failure.
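A short sketch of the listening state, assuming Linux (the helper names start_listening and is_listening are hypothetical). The SO_ACCEPTCONN socket option reports whether a socket has been put into the listening state:

```c
#include <sys/socket.h>

/* Hypothetical helper: put a socket into the listening state.
 * Returns 0 on success, -1 on failure. */
int start_listening(int fd, int backlog)
{
	return listen(fd, backlog);
}

/* Hypothetical helper: 1 if fd is listening, 0 if not, -1 on error. */
int is_listening(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) == -1)
		return -1;
	return val != 0;
}
```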

accept function

#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

sockfd:
socket file descriptor

addr:
Output parameter; returns the address information of the connecting client, including its IP address and port number

addrlen:
Value-result parameter; on input, the size of the buffer that addr points to (e.g. sizeof(struct sockaddr_in)); on return, the size of the address structure actually stored

Return value:
On success, returns a new socket file descriptor used to communicate with the client; on failure, returns -1 and sets errno

After the three-way handshake is completed, the server calls accept() to accept the connection. If there is no pending connection request when the server calls accept(), it blocks until a client connects. addr is an output parameter: when accept() returns, it holds the client's address and port number. addrlen is a value-result argument: on input it is the length of the caller-supplied buffer addr, which prevents buffer overflow; on output it is the actual length of the client address structure (which may not fill the caller's buffer). If NULL is passed for addr, the caller is not interested in the client's address.

Our server program structure is as follows:

while (1) {
	cliaddr_len = sizeof(cliaddr);
	connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddr_len);
	n = read(connfd, buf, MAXLINE);
	......
	close(connfd);
}

The whole process is a while loop that handles one client connection per iteration. Because cliaddr_len is a value-result parameter, it must be reset to its initial value before each call to accept(). The listenfd argument of accept() is the listening file descriptor created earlier, while the return value of accept() is a different file descriptor, connfd. The server then communicates with the client through connfd and finally closes connfd to end the connection, without closing listenfd. It then returns to the top of the loop, where listenfd is again passed to accept(). accept() returns a file descriptor on success and -1 on error.
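To make the value-result behaviour of addrlen concrete, here is a hedged sketch of one accept() step with its return value checked. The helper name accept_and_describe and the "ip:port" output format are assumptions made for this illustration.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: accept one connection and format the peer as
 * "ip:port" into out. Returns the connected descriptor, or -1 on failure. */
int accept_and_describe(int listenfd, char *out, size_t outlen)
{
	struct sockaddr_in cliaddr;
	socklen_t len = sizeof(cliaddr);	/* in: buffer size; out: bytes stored */
	char ip[INET_ADDRSTRLEN];
	int connfd;

	memset(&cliaddr, 0, sizeof(cliaddr));
	connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &len);
	if (connfd == -1)
		return -1;

	if (inet_ntop(AF_INET, &cliaddr.sin_addr, ip, sizeof(ip)) == NULL)
		ip[0] = '\0';
	snprintf(out, outlen, "%s:%d", ip, ntohs(cliaddr.sin_port));
	return connfd;
}
```

Because len is a fresh local variable on every call, it is reset to the buffer size before each accept(), which is exactly what the loop above does with cliaddr_len.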

connect function

#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

sockfd:
socket file descriptor

addr:
Input parameter; specifies the server's address information, including its IP address and port number

addrlen:
Input parameter; the size of the address structure that addr points to, e.g. sizeof(struct sockaddr_in)

Return value:
Returns 0 on success; returns -1 on failure and sets errno

The client calls connect() to connect to the server. connect() takes the same parameters as bind(); the difference is that bind() is given the caller's own address, while connect() is given the address of the peer. connect() returns 0 on success and -1 on error.
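A minimal client-side sketch with every return value checked (connect_ipv4 is a hypothetical helper name, not part of the socket API):

```c
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: open a TCP connection to ip:port (port in host
 * byte order). Returns the connected descriptor, or -1 on failure. */
int connect_ipv4(const char *ip, uint16_t port)
{
	struct sockaddr_in servaddr;
	int fd;

	memset(&servaddr, 0, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &servaddr.sin_addr) != 1)
		return -1;			/* not a valid dotted-decimal address */

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd == -1)
		return -1;
	if (connect(fd, (struct sockaddr *)&servaddr, sizeof(servaddr)) == -1) {
		close(fd);			/* do not leak the descriptor */
		return -1;
	}
	return fd;
}
```

For the server used in this article, the call would be connect_ipv4("127.0.0.1", 6666).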

C/S model TCP

The following figure shows the general process of client / server program based on TCP protocol:

After the server calls socket(), bind(), and listen() to complete initialization, it calls accept() and blocks, listening on the port. After the client calls socket() to initialize, it calls connect(), which sends a SYN segment and blocks waiting for the server's reply. The server replies with a SYN-ACK segment. When the client receives it, connect() returns and the client replies with an ACK segment. When the server receives that ACK, accept() returns.

Data transmission process:

After the connection is established, TCP provides a full-duplex communication service, but the usual pattern of a client/server program is that the client actively initiates requests and the server passively handles them, in question-and-answer fashion. Therefore, after the server returns from accept(), it immediately calls read() on the socket, just as it would read a pipe, and blocks if no data has arrived. The client then calls write() to send a request to the server. When the server receives it, read() returns and the server processes the request, while the client calls read() and blocks waiting for the response. The server calls write() to send the result back to the client and then calls read() again to block until the next request. The client returns from read() once it receives the response, sends its next request, and so on.

If the client has no more requests, it calls close() to close the connection, much as when the write end of a pipe is closed. The server's read() then returns 0, so the server knows that the client has closed the connection and calls close() as well. Note that after either party calls close(), both directions of the connection are closed and that party can no longer send or receive data. If one party calls shutdown() instead, the connection is half-closed, and data sent by the other party can still be received.
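The half-closed state can be observed in a few lines. In this sketch a local socketpair() stands in for a TCP connection (an assumption made so that the example needs no server), and half_close_demo is a hypothetical name:

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* One end sends a request, half-closes with shutdown(SHUT_WR), and can
 * still read the reply. Returns 0 if the exchange behaved that way,
 * -1 otherwise. */
int half_close_demo(void)
{
	int sv[2];
	char buf[16];
	ssize_t n;
	int ok = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;

	if (write(sv[0], "ping", 4) != 4)
		goto out;
	if (shutdown(sv[0], SHUT_WR) == -1)	/* sv[0] will send nothing more */
		goto out;

	n = read(sv[1], buf, sizeof(buf));	/* the peer gets the data ... */
	if (n != 4 || memcmp(buf, "ping", 4) != 0)
		goto out;
	if (read(sv[1], buf, sizeof(buf)) != 0)	/* ... and then end-of-file */
		goto out;

	if (write(sv[1], "pong", 4) != 4)	/* the other direction still works */
		goto out;
	n = read(sv[0], buf, sizeof(buf));
	if (n == 4 && memcmp(buf, "pong", 4) == 0)
		ok = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```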

When learning the socket API, pay attention to how the application interacts with the TCP protocol layer. What does the TCP layer do when the application calls a socket function? For example, calling connect() sends a SYN segment. How does the application learn of state changes in the TCP layer? For example, returning from a blocked socket function indicates that TCP has received certain segments, and read() returning 0 indicates that a FIN segment has been received.

server

Let's learn about the socket API through the simplest example of a client / server program.
The role of server.c is to read characters from the client, then convert each character to uppercase and send back to the client.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAXLINE 80
#define SERV_PORT 6666

int main(void)
{
	struct sockaddr_in servaddr, cliaddr;
	socklen_t cliaddr_len;
	int listenfd, connfd;
	char buf[MAXLINE];
	char str[INET_ADDRSTRLEN];
	int i, n;

	listenfd = socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servaddr.sin_port = htons(SERV_PORT);

	bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
	listen(listenfd, 20);

	printf("Accepting connections ...\n");
	while (1) {
		cliaddr_len = sizeof(cliaddr);
		connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddr_len);
		n = read(connfd, buf, MAXLINE);
		printf("received from %s at PORT %d\n",
				inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
				ntohs(cliaddr.sin_port));
		for (i = 0; i < n; i++)
			buf[i] = toupper(buf[i]);
		write(connfd, buf, n);
		close(connfd);
	}
	return 0;
}

client

The function of client.c is to get a string from the command line parameter and send it to the server, then receive the string returned by the server and print it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAXLINE 80
#define SERV_PORT 6666

int main(int argc, char *argv[])
{
	struct sockaddr_in servaddr;
	char buf[MAXLINE];
	int sockfd, n;
	char *str;

	if (argc != 2) {
		fputs("usage: ./client message\n", stderr);
		exit(1);
	}
	str = argv[1];

	sockfd = socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
	servaddr.sin_port = htons(SERV_PORT);

	connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	write(sockfd, str, strlen(str));

	n = read(sockfd, buf, MAXLINE);
	printf("Response from server:\n");
	write(STDOUT_FILENO, buf, n);
	close(sockfd);

	return 0;
}

Since the client does not need a fixed port number, it does not have to call bind(); the kernel assigns the client's port number automatically. Note that the client is allowed to call bind(); it simply has no need to fix its port number. Likewise, the server is not strictly required to call bind(), but if it does not, the kernel automatically assigns a listening port, the port number differs each time the server starts, and clients will have trouble connecting to it.
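The kernel-assigned port can be inspected with getsockname(). The sketch below assumes Linux behaviour, where calling listen() on an unbound TCP socket makes the kernel pick a port; local_port is a hypothetical helper name.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: return the local port (host byte order) currently
 * associated with fd, 0 if none has been assigned yet, or -1 on failure. */
int local_port(int fd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	memset(&addr, 0, sizeof(addr));
	if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
		return -1;
	return ntohs(addr.sin_port);
}
```

A client can call local_port(sockfd) after connect() in the same way to see which port the kernel chose for it.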

After the client and server start, you can use the netstat command to view the connection status:
netstat -apn|grep 6666

Error handling wrapper function

The example above is simple in function, and so simple that it has almost no error handling. System calls are not guaranteed to succeed every time, so errors must be handled: this keeps the program's logic correct and also makes fault information available quickly.

To keep error handling from hurting the readability of the main program, we wrap the socket-related system functions, together with their error handling, in new functions and collect them in a module, wrap.c:

wrap.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
void perr_exit(const char *s)
{
	perror(s);
	exit(1);
}
int Accept(int fd, struct sockaddr *sa, socklen_t *salenptr)
{
	int n;
	again:
	if ( (n = accept(fd, sa, salenptr)) < 0) {
		if ((errno == ECONNABORTED) || (errno == EINTR))
			goto again;
		else
			perr_exit("accept error");
	}
	return n;
}
int Bind(int fd, const struct sockaddr *sa, socklen_t salen)
{
	int n;
	if ((n = bind(fd, sa, salen)) < 0)
		perr_exit("bind error");
	return n;
}
int Connect(int fd, const struct sockaddr *sa, socklen_t salen)
{
	int n;
	if ((n = connect(fd, sa, salen)) < 0)
		perr_exit("connect error");
	return n;
}
int Listen(int fd, int backlog)
{
	int n;
	if ((n = listen(fd, backlog)) < 0)
		perr_exit("listen error");
	return n;
}
int Socket(int family, int type, int protocol)
{
	int n;
	if ( (n = socket(family, type, protocol)) < 0)
		perr_exit("socket error");
	return n;
}
ssize_t Read(int fd, void *ptr, size_t nbytes)
{
	ssize_t n;
again:
	if ( (n = read(fd, ptr, nbytes)) == -1) {
		if (errno == EINTR)
			goto again;
		else
			return -1;
	}
	return n;
}
ssize_t Write(int fd, const void *ptr, size_t nbytes)
{
	ssize_t n;
again:
	if ( (n = write(fd, ptr, nbytes)) == -1) {
		if (errno == EINTR)
			goto again;
		else
			return -1;
	}
	return n;
}
int Close(int fd)
{
	int n;
	if ((n = close(fd)) == -1)
		perr_exit("close error");
	return n;
}
ssize_t Readn(int fd, void *vptr, size_t n)
{
	size_t nleft;
	ssize_t nread;
	char *ptr;

	ptr = vptr;
	nleft = n;

	while (nleft > 0) {
		if ( (nread = read(fd, ptr, nleft)) < 0) {
			if (errno == EINTR)
				nread = 0;
			else
				return -1;
		} else if (nread == 0)
			break;
		nleft -= nread;
		ptr += nread;
	}
	return n - nleft;
}

ssize_t Writen(int fd, const void *vptr, size_t n)
{
	size_t nleft;
	ssize_t nwritten;
	const char *ptr;

	ptr = vptr;
	nleft = n;

	while (nleft > 0) {
		if ( (nwritten = write(fd, ptr, nleft)) <= 0) {
			if (nwritten < 0 && errno == EINTR)
				nwritten = 0;
			else
				return -1;
		}
		nleft -= nwritten;
		ptr += nwritten;
	}
	return n;
}

static ssize_t my_read(int fd, char *ptr)
{
	static int read_cnt;
	static char *read_ptr;
	static char read_buf[100];

	if (read_cnt <= 0) {
again:
		if ((read_cnt = read(fd, read_buf, sizeof(read_buf))) < 0) {
			if (errno == EINTR)
				goto again;
			return -1;	
		} else if (read_cnt == 0)
			return 0;
		read_ptr = read_buf;
	}
	read_cnt--;
	*ptr = *read_ptr++;
	return 1;
}

ssize_t Readline(int fd, void *vptr, size_t maxlen)
{
	ssize_t n, rc;
	char c, *ptr;
	ptr = vptr;

	for (n = 1; n < maxlen; n++) {
		if ( (rc = my_read(fd, &c)) == 1) {
			*ptr++ = c;
			if (c == '\n')
				break;
		} else if (rc == 0) {
			*ptr = 0;
			return n - 1;
		} else
			return -1;
	}
	*ptr = 0;
	return n;
}

wrap.h

#ifndef __WRAP_H_
#define __WRAP_H_
#include <sys/types.h>		/* ssize_t, size_t */
#include <sys/socket.h>		/* struct sockaddr, socklen_t */
void perr_exit(const char *s);
int Accept(int fd, struct sockaddr *sa, socklen_t *salenptr);
int Bind(int fd, const struct sockaddr *sa, socklen_t salen);
int Connect(int fd, const struct sockaddr *sa, socklen_t salen);
int Listen(int fd, int backlog);
int Socket(int family, int type, int protocol);
ssize_t Read(int fd, void *ptr, size_t nbytes);
ssize_t Write(int fd, const void *ptr, size_t nbytes);
int Close(int fd);
ssize_t Readn(int fd, void *vptr, size_t n);
ssize_t Writen(int fd, const void *vptr, size_t n);
/* my_read() is static in wrap.c, so it is not declared here */
ssize_t Readline(int fd, void *vptr, size_t maxlen);
#endif
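To see how a Readn-style loop behaves, here is a standalone sketch of the same idea under a hypothetical name, read_full. It keeps reading until the requested number of bytes has arrived, and returns a short count only when end-of-file is reached first.

```c
#include <errno.h>
#include <unistd.h>

/* Keep reading until n bytes have arrived, end-of-file is reached, or a
 * real error occurs. Returns the number of bytes read (less than n only
 * at end-of-file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t n)
{
	char *p = buf;
	size_t left = n;

	while (left > 0) {
		ssize_t r = read(fd, p, left);

		if (r == -1) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;
		}
		if (r == 0)
			break;			/* end-of-file */
		p += r;
		left -= (size_t)r;
	}
	return (ssize_t)(n - left);
}
```

This matters for TCP because a single read() may return fewer bytes than were requested, even though the peer has sent or will send the rest.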

High concurrency server

Multiprocess concurrent server

Consider the following when using a multiprocess concurrent server:
1. The maximum number of file descriptors of the parent process (the parent process must close() each new file descriptor returned by accept())
2. The number of processes the system can create (related to memory size)
3. Whether creating too many processes reduces overall service performance (process scheduling)

server

/* server.c */
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/types.h>
#include "wrap.h"

#define MAXLINE 80
#define SERV_PORT 6666

void do_sigchild(int num)
{
	while (waitpid(0, NULL, WNOHANG) > 0)
		;
}
int main(void)
{
	struct sockaddr_in servaddr, cliaddr;
	socklen_t cliaddr_len;
	int listenfd, connfd;
	char buf[MAXLINE];
	char str[INET_ADDRSTRLEN];
	int i, n;
	pid_t pid;

	struct sigaction newact;
	newact.sa_handler = do_sigchild;
	sigemptyset(&newact.sa_mask);
	newact.sa_flags = 0;
	sigaction(SIGCHLD, &newact, NULL);

	listenfd = Socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servaddr.sin_port = htons(SERV_PORT);

	Bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	Listen(listenfd, 20);

	printf("Accepting connections ...\n");
	while (1) {
		cliaddr_len = sizeof(cliaddr);
		connfd = Accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddr_len);

		pid = fork();
		if (pid == 0) {
			Close(listenfd);
			while (1) {
				n = Read(connfd, buf, MAXLINE);
				if (n == 0) {
					printf("the other side has been closed.\n");
					break;
				}
				printf("received from %s at PORT %d\n",
						inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
						ntohs(cliaddr.sin_port));
				for (i = 0; i < n; i++)
					buf[i] = toupper(buf[i]);
				Write(connfd, buf, n);
			}
			Close(connfd);
			return 0;
		} else if (pid > 0) {
			Close(connfd);
		} else
			perr_exit("fork");
	}
	Close(listenfd);
	return 0;
}

client

/* client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "wrap.h"

#define MAXLINE 80
#define SERV_PORT 6666

int main(int argc, char *argv[])
{
	struct sockaddr_in servaddr;
	char buf[MAXLINE];
	int sockfd, n;

	sockfd = Socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
	servaddr.sin_port = htons(SERV_PORT);

	Connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	while (fgets(buf, MAXLINE, stdin) != NULL) {
		Write(sockfd, buf, strlen(buf));
		n = Read(sockfd, buf, MAXLINE);
		if (n == 0) {
			printf("the other side has been closed.\n");
			break;
		} else
			Write(STDOUT_FILENO, buf, n);
	}
	Close(sockfd);
	return 0;
}

Multithreaded concurrent server

The following issues need to be considered when developing a server with the thread model:
1. Raise the maximum number of file descriptors for the process
2. If the threads share data, consider thread synchronization
3. Handle thread exit for each client-serving thread (exit value, detached state)
4. System load: as the number of connected clients grows, other threads may not get CPU time promptly

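Point 3 can be sketched as follows: the thread is created already detached through a thread attribute, and each thread gets its own heap-allocated argument so that no data is shared by accident. The names conn_arg, upper_once, and start_detached_worker are hypothetical. On older glibc versions the program must be linked with -pthread.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/socket.h>

struct conn_arg {			/* hypothetical per-connection data */
	int connfd;
};

/* Worker: uppercase one message, send it back, then clean up. */
static void *upper_once(void *p)
{
	struct conn_arg *arg = p;
	char buf[80];
	ssize_t i, n = read(arg->connfd, buf, sizeof(buf));

	for (i = 0; i < n; i++)
		buf[i] = (char)toupper((unsigned char)buf[i]);
	if (n > 0 && write(arg->connfd, buf, (size_t)n) == -1)
		perror("write");
	close(arg->connfd);
	free(arg);			/* the worker owns its argument */
	return NULL;
}

/* Start a worker that is detached from the moment it is created.
 * Returns 0 on success or an error number, like pthread_create(). */
int start_detached_worker(int connfd)
{
	pthread_attr_t attr;
	pthread_t tid;
	struct conn_arg *arg = malloc(sizeof(*arg));
	int err;

	if (arg == NULL)
		return ENOMEM;
	arg->connfd = connfd;

	pthread_attr_init(&attr);
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	err = pthread_create(&tid, &attr, upper_once, arg);
	pthread_attr_destroy(&attr);
	if (err != 0)
		free(arg);		/* the thread never started */
	return err;
}
```

A detached thread releases its resources when it returns, so the main thread never has to call pthread_join() for it.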
server

/* server.c */
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <pthread.h>

#include "wrap.h"
#define MAXLINE 80
#define SERV_PORT 6666

struct s_info {
	struct sockaddr_in cliaddr;
	int connfd;
};
void *do_work(void *arg)
{
	int n,i;
	struct s_info *ts = (struct s_info*)arg;
	char buf[MAXLINE];
	char str[INET_ADDRSTRLEN];
	/* Alternatively, the thread could be created with the detached attribute already set. Which is more efficient? */
	pthread_detach(pthread_self());
	while (1) {
		n = Read(ts->connfd, buf, MAXLINE);
		if (n == 0) {
			printf("the other side has been closed.\n");
			break;
		}
		printf("received from %s at PORT %d\n",
				inet_ntop(AF_INET, &(*ts).cliaddr.sin_addr, str, sizeof(str)),
				ntohs((*ts).cliaddr.sin_port));
		for (i = 0; i < n; i++)
			buf[i] = toupper(buf[i]);
		Write(ts->connfd, buf, n);
	}
	Close(ts->connfd);
	return NULL;
}

int main(void)
{
	struct sockaddr_in servaddr, cliaddr;
	socklen_t cliaddr_len;
	int listenfd, connfd;
	int i = 0;
	pthread_t tid;
	struct s_info ts[256];

	listenfd = Socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servaddr.sin_port = htons(SERV_PORT);

	Bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
	Listen(listenfd, 20);

	printf("Accepting connections ...\n");
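	while (1) {
		cliaddr_len = sizeof(cliaddr);
		connfd = Accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddr_len);
		if (i >= 256) {		/* ts[] is full (slots are never reused in this simple example) */
			fputs("too many clients\n", stderr);
			Close(connfd);
			continue;
		}
		ts[i].cliaddr = cliaddr;
		ts[i].connfd = connfd;
		/* pthread_create can fail once the thread limit is reached; handling the error keeps the server stable */
		if (pthread_create(&tid, NULL, do_work, (void*)&ts[i]) != 0) {
			fputs("pthread_create error\n", stderr);
			Close(connfd);
			continue;
		}
		i++;
	}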
	return 0;
}

client

/* client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "wrap.h"
#define MAXLINE 80
#define SERV_PORT 6666
int main(int argc, char *argv[])
{
	struct sockaddr_in servaddr;
	char buf[MAXLINE];
	int sockfd, n;

	sockfd = Socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
	servaddr.sin_port = htons(SERV_PORT);

	Connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	while (fgets(buf, MAXLINE, stdin) != NULL) {
		Write(sockfd, buf, strlen(buf));
		n = Read(sockfd, buf, MAXLINE);
		if (n == 0) {
			printf("the other side has been closed.\n");
			break;
		} else
			Write(STDOUT_FILENO, buf, n);
	}
	Close(sockfd);
	return 0;
}

I/O multiplexing server

An I/O multiplexing server is also called a multitasking I/O server. The main idea is that, instead of the application itself monitoring client connections, the kernel monitors the file descriptors on the application's behalf.

There are three main mechanisms: select, poll, and epoll.

select
1. The number of file descriptors that select can monitor is limited by FD_SETSIZE, generally 1024. Simply raising the number of file descriptors a process may open does not change the number that select can monitor

2. select is well suited to serving fewer than 1024 clients. With too many connected clients, however, select's polling model greatly reduces the server's response efficiency, so it is not worth investing much effort in select

#include <sys/select.h>
/* According to earlier standards */
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int select(int nfds, fd_set *readfds, fd_set *writefds,
			fd_set *exceptfds, struct timeval *timeout);

	nfds: 		The largest file descriptor in the monitored sets plus 1; this tells the kernel how many descriptors, counting from 0, to check
	readfds: 	Set of file descriptors monitored for readable data; value-result parameter
	writefds: 	Set of file descriptors monitored for writability; value-result parameter
	exceptfds: 	Set of file descriptors monitored for exceptional conditions, such as the arrival of out-of-band data; value-result parameter
	timeout: 	How long to block while monitoring; three cases
				1. NULL: wait forever
				2. A timeval with a nonzero time: wait for that fixed time
				3. A timeval with the time set to 0: check the descriptors and return immediately (polling)
	struct timeval {
		long tv_sec; /* seconds */
		long tv_usec; /* microseconds */
	};
	void FD_CLR(int fd, fd_set *set); 	//Clear the bit for fd in the file descriptor set
	int FD_ISSET(int fd, fd_set *set); 	//Test whether the bit for fd is set in the file descriptor set
	void FD_SET(int fd, fd_set *set); 	//Set the bit for fd in the file descriptor set
	void FD_ZERO(fd_set *set); 			//Clear all bits in the file descriptor set
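A minimal sketch of these calls working together, waiting on a single descriptor with a timeout (wait_readable is a hypothetical helper name):

```c
#include <unistd.h>
#include <sys/select.h>

/* Wait until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
	fd_set rset;
	struct timeval tv;
	int nready;

	if (fd < 0 || fd >= FD_SETSIZE)
		return -1;		/* select() cannot monitor this descriptor */

	FD_ZERO(&rset);
	FD_SET(fd, &rset);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	nready = select(fd + 1, &rset, NULL, NULL, &tv);	/* nfds = largest fd + 1 */
	if (nready <= 0)
		return nready;		/* 0: timeout, -1: error */
	return FD_ISSET(fd, &rset) ? 1 : 0;
}
```

Because select() overwrites the sets it is given, rset is rebuilt on every call here, which is the same reason a server copies its master set before each select().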

server

/* server.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "wrap.h"

#define MAXLINE 80
#define SERV_PORT 6666

int main(int argc, char *argv[])
{
	int i, maxi, maxfd, listenfd, connfd, sockfd;
	int nready, client[FD_SETSIZE]; 	/* FD_SETSIZE Default is 1024 */
	ssize_t n;
	fd_set rset, allset;
	char buf[MAXLINE];
	char str[INET_ADDRSTRLEN]; 			/* #define INET_ADDRSTRLEN 16 */
	socklen_t cliaddr_len;
	struct sockaddr_in cliaddr, servaddr;

	listenfd = Socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servaddr.sin_port = htons(SERV_PORT);

	Bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	Listen(listenfd, 20); 		/* Default maximum 128 */

	maxfd = listenfd; 			/* Initialization */
	maxi = -1;					/* Largest index in use in client[] */

	for (i = 0; i < FD_SETSIZE; i++)
		client[i] = -1; 		/* Initialize client[] with -1 */

	FD_ZERO(&allset);
	FD_SET(listenfd, &allset); 	/* Build the descriptor set monitored by select */

	for ( ; ; ) {
		rset = allset; 			/* select modifies its sets, so reset rset before every call */
		nready = select(maxfd+1, &rset, NULL, NULL, NULL);

		if (nready < 0)
			perr_exit("select error");
		if (FD_ISSET(listenfd, &rset)) { /* new client connection */
			cliaddr_len = sizeof(cliaddr);
			connfd = Accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddr_len);
			printf("received from %s at PORT %d\n",
					inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
					ntohs(cliaddr.sin_port));
			for (i = 0; i < FD_SETSIZE; i++) {
				if (client[i] < 0) {
					client[i] = connfd; /* Save the file descriptor returned by accept in client[] */
					break;
				}
			}
			/* select can monitor at most 1024 file descriptors */
			if (i == FD_SETSIZE) {
				fputs("too many clients\n", stderr);
				exit(1);
			}

			FD_SET(connfd, &allset); 	/* Add the new file descriptor to the monitored set */
			if (connfd > maxfd)
				maxfd = connfd; 		/* Needed for the first argument of select */
			if (i > maxi)
				maxi = i; 				/* Update the largest index in use in client[] */

			if (--nready == 0)
				continue; 				/* No other descriptors are ready: go back to select;
											otherwise fall through to handle the ready clients */
		}
		for (i = 0; i <= maxi; i++) { 	/* Check which clients have data ready */
			if ( (sockfd = client[i]) < 0)
				continue;
			if (FD_ISSET(sockfd, &rset)) {
				if ( (n = Read(sockfd, buf, MAXLINE)) <= 0) {
					Close(sockfd);		/* The client closed the connection (0) or the read failed (-1) */
					FD_CLR(sockfd, &allset); /* Stop monitoring this file descriptor */
					client[i] = -1;
				} else {
					int j;
					for (j = 0; j < n; j++)
						buf[j] = toupper(buf[j]);
					Write(sockfd, buf, n);
				}
				if (--nready == 0)
					break;
			}
		}
	}
	Close(listenfd);
	return 0;
}

client

/* client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "wrap.h"

#define MAXLINE 80
#define SERV_PORT 6666

int main(int argc, char *argv[])
{
	struct sockaddr_in servaddr;
	char buf[MAXLINE];
	int sockfd, n;

	sockfd = Socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
	servaddr.sin_port = htons(SERV_PORT);

	Connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	while (fgets(buf, MAXLINE, stdin) != NULL) {
		Write(sockfd, buf, strlen(buf));
		n = Read(sockfd, buf, MAXLINE);
		if (n == 0) {
			printf("the other side has been closed.\n");
			break;
		} else
			Write(STDOUT_FILENO, buf, n);
	}
	Close(sockfd);
	return 0;
}

pselect

The pselect prototype is as follows. This model is rarely applied.

#include <sys/select.h>
int pselect(int nfds, fd_set *readfds, fd_set *writefds,
			fd_set *exceptfds, const struct timespec *timeout,
			const sigset_t *sigmask);
	struct timespec {
		long tv_sec; /* seconds */
		long tv_nsec; /* nanoseconds */
	};

pselect replaces the process's blocked-signal mask with sigmask for the duration of the call and restores the original mask when the call returns

poll

#include <poll.h>
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
struct pollfd {
	int fd; /* file descriptor */
	short events; /* events to monitor */
	short revents; /* events that occurred, filled in on return */
};
POLLIN: normal or priority-band data can be read, i.e. POLLRDNORM | POLLRDBAND
POLLRDNORM: normal data can be read
POLLRDBAND: priority-band data can be read
POLLPRI: high-priority data can be read
POLLOUT: normal or priority-band data can be written
POLLWRNORM: normal data can be written
POLLWRBAND: priority-band data can be written
POLLERR: an error occurred
POLLHUP: a hang-up occurred
POLLNVAL: the descriptor is not an open file

nfds: the number of entries in the fds array to monitor

timeout: how long to wait, in milliseconds
-1: block indefinitely (some systems provide #define INFTIM -1 for this; Linux does not define the INFTIM macro)
0: return immediately without blocking the process
>0: wait for the specified number of milliseconds. If the system clock granularity is coarser than one millisecond, the value is rounded up

If you no longer want to monitor a file descriptor, set fd to -1 in its pollfd entry. poll then ignores that entry and sets its revents to 0 on return.
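The effect of setting fd to -1 can be sketched with two descriptors (count_readable is a hypothetical helper name):

```c
#include <poll.h>
#include <unistd.h>

/* Poll two descriptors for readable data and return how many are ready.
 * Passing -1 for either descriptor makes poll() skip that entry.
 * Returns -1 on error. */
int count_readable(int fd_a, int fd_b, int timeout_ms)
{
	struct pollfd fds[2];
	int i, count = 0;

	fds[0].fd = fd_a;
	fds[0].events = POLLIN;
	fds[0].revents = 0;
	fds[1].fd = fd_b;
	fds[1].events = POLLIN;
	fds[1].revents = 0;

	if (poll(fds, 2, timeout_ms) == -1)
		return -1;
	for (i = 0; i < 2; i++)
		if (fds[i].revents & POLLIN)	/* revents stays 0 for skipped entries */
			count++;
	return count;
}
```

Unlike select(), the events field is left untouched by poll(), so the same array can be passed again without being rebuilt.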

server

/* server.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <poll.h>
#include <errno.h>
#include "wrap.h"

#define MAXLINE 80
#define SERV_PORT 6666
#define OPEN_MAX 1024

int main(int argc, char *argv[])
{
	int i, j, maxi, listenfd, connfd, sockfd;
	int nready;
	ssize_t n;
	char buf[MAXLINE], str[INET_ADDRSTRLEN];
	socklen_t clilen;
	struct pollfd client[OPEN_MAX];
	struct sockaddr_in cliaddr, servaddr;

	listenfd = Socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servaddr.sin_port = htons(SERV_PORT);

	Bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	Listen(listenfd, 20);

	client[0].fd = listenfd;
	client[0].events = POLLRDNORM; 					/* listenfd Listening to common read events */

	for (i = 1; i < OPEN_MAX; i++)
		client[i].fd = -1; 							/* Initialize the remaining elements in client [] with - 1 */
	maxi = 0; 										/* client[]The maximum element subscript in the valid elements of an array */

	for ( ; ; ) {
		nready = poll(client, maxi+1, -1); 			/* block */
		if (client[0].revents & POLLRDNORM) { 		/* There are client link requests */
			clilen = sizeof(cliaddr);
			connfd = Accept(listenfd, (struct sockaddr *)&cliaddr, &clilen);
			printf("received from %s at PORT %d\n",
					inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
					ntohs(cliaddr.sin_port));
			for (i = 1; i < OPEN_MAX; i++) {
				if (client[i].fd < 0) {
					client[i].fd = connfd; 	/* Find the empty space of client [] and store the connfd returned from accept */
					break;
				}
			}

			if (i == OPEN_MAX)
				perr_exit("too many clients");

			client[i].events = POLLRDNORM; 		/* Set the connfd just returned to monitor the read event */
			if (i > maxi)
				maxi = i; 						/* Update the maximum element subscript in client [] */
			if (--nready <= 0)
				continue; 						/* When there are no more ready events, continue to return to poll blocking */
		}
		for (i = 1; i <= maxi; i++) { 			/* Detect client [] */
			if ((sockfd = client[i].fd) < 0)
				continue;
			if (client[i].revents & (POLLRDNORM | POLLERR)) {
				if ((n = Read(sockfd, buf, MAXLINE)) < 0) {
					if (errno == ECONNRESET) { /* When RST flag is received */
						/* connection reset by client */
						printf("client[%d] aborted connection\n", i);
						Close(sockfd);
						client[i].fd = -1;
					} else {
						perr_exit("read error");
					}
				} else if (n == 0) {
					/* connection closed by client */
					printf("client[%d] closed connection\n", i);
					Close(sockfd);
					client[i].fd = -1;
				} else {
					for (j = 0; j < n; j++)
						buf[j] = toupper(buf[j]);
					Writen(sockfd, buf, n);
				}
				if (--nready <= 0)
					break; 				/* no more readable descriptors */
			}
		}
	}
	return 0;
}

client

/* client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "wrap.h"

#define MAXLINE 80
#define SERV_PORT 6666

int main(int argc, char *argv[])
{
	struct sockaddr_in servaddr;
	char buf[MAXLINE];
	int sockfd, n;

	sockfd = Socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
	servaddr.sin_port = htons(SERV_PORT);

	Connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	while (fgets(buf, MAXLINE, stdin) != NULL) {
		Write(sockfd, buf, strlen(buf));
		n = Read(sockfd, buf, MAXLINE);
		if (n == 0) {
			printf("the other side has been closed.\n");
			break;
		} else
			Write(STDOUT_FILENO, buf, n);
	}
	Close(sockfd);
	return 0;
}

ppoll

GNU defines ppoll (historically not part of the POSIX standard), a variant of poll that additionally accepts a signal mask.

#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <poll.h>
int ppoll(struct pollfd *fds, nfds_t nfds,
			const struct timespec *timeout_ts, const sigset_t *sigmask);

epoll

epoll is an enhanced version of select/poll, the I/O multiplexing interface under Linux. It can significantly improve the system's CPU efficiency when only a few of a large number of concurrent connections are active. One reason is that it reuses the set of registered file descriptors to deliver results, so the developer does not have to rebuild the set of descriptors to monitor before each wait. Another is that, when retrieving events, it does not need to traverse the entire set of monitored descriptors; it only traverses the descriptors that kernel I/O events have asynchronously woken up and added to the ready queue.

At present, epoll is the popular model of choice for large-scale concurrent network programs on Linux.

In addition to the level-triggered I/O events of select/poll, epoll also provides edge-triggered I/O events, which let a user-space program cache I/O state, reduce the number of epoll_wait/epoll_pwait calls, and improve application efficiency.
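The difference between the two modes can be observed with a pipe. In this sketch (count_notifications is a hypothetical name) one byte is written and never read, and epoll_wait() is then called three times without blocking:

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/epoll.h>

/* Write one byte to a pipe, never read it, and count how many of three
 * consecutive epoll_wait() calls report the read end as ready.
 * Pass 0 for level-triggered mode or EPOLLET for edge-triggered mode.
 * Returns the count, or -1 on error. */
int count_notifications(uint32_t mode)
{
	int p[2], epfd, i, count = 0;
	struct epoll_event ev, out;

	if (pipe(p) == -1)
		return -1;
	epfd = epoll_create(1);
	if (epfd == -1) {
		close(p[0]);
		close(p[1]);
		return -1;
	}

	ev.events = EPOLLIN | mode;
	ev.data.fd = p[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1 ||
	    write(p[1], "x", 1) != 1) {
		count = -1;
		goto out;
	}
	for (i = 0; i < 3; i++)
		if (epoll_wait(epfd, &out, 1, 0) == 1)	/* timeout 0: do not block */
			count++;
out:
	close(epfd);
	close(p[0]);
	close(p[1]);
	return count;
}
```

In level-triggered mode every call reports the descriptor as long as unread data remains; in edge-triggered mode only the first call does, so an edge-triggered program must read until the data is exhausted.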

You can use the cat command to see the system-wide maximum number of open file handles. The per-process limit on open descriptors (sockets included) is shown by ulimit -n.
cat /proc/sys/fs/file-max
ulimit -n

If necessary, you can raise the per-process limit by editing the configuration file.
sudo vi /etc/security/limits.conf
Append the soft limit and the hard limit to the end of the file:
* soft nofile 65536
* hard nofile 100000
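
A process can also inspect and adjust its own descriptor limit at run time. The sketch below, assuming a hypothetical helper named raise_nofile_limit, raises the soft limit to the hard limit with getrlimit/setrlimit; going beyond the hard limit still requires the configuration change above or extra privileges.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_NOFILE limit of the calling
 * process to its hard limit. On success returns 0 and stores the new soft
 * limit in *new_soft; returns -1 on error. */
int raise_nofile_limit(rlim_t *new_soft)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
		return -1;
	rl.rlim_cur = rl.rlim_max;      /* the soft limit may go up to the hard limit */
	if (setrlimit(RLIMIT_NOFILE, &rl) == -1)
		return -1;
	*new_soft = rl.rlim_cur;
	return 0;
}
```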

Foundation API

1. Create an epoll handle. The size parameter hints to the kernel how many file descriptors will be monitored. Since Linux 2.6.8 the value is ignored, but it must still be greater than zero.
#include <sys/epoll.h>
int epoll_create(int size);		size: expected number of monitored descriptors

2. Control the events monitored on a file descriptor: register, modify, or delete.
#include <sys/epoll.h>
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
epfd: the handle returned by epoll_create
op: the action to perform, given by one of three macros:
EPOLL_CTL_ADD (register a new fd with epfd),
EPOLL_CTL_MOD (modify the monitored events of a registered fd),
EPOLL_CTL_DEL (remove an fd from epfd);
event: tells the kernel which events to monitor

struct epoll_event {
			__uint32_t events; /* Epoll events */
			epoll_data_t data; /* User data variable */
};
typedef union epoll_data {
			void *ptr;
			int fd;
			uint32_t u32;
			uint64_t u64;
} epoll_data_t;

EPOLLIN: the file descriptor is readable (this includes an orderly shutdown by the peer socket)
EPOLLOUT: the file descriptor is writable
EPOLLPRI: the file descriptor has urgent data to read (that is, out-of-band data has arrived)
EPOLLERR: an error occurred on the file descriptor
EPOLLHUP: the file descriptor was hung up
EPOLLET: put the descriptor in edge-triggered mode, as opposed to the default level-triggered mode
EPOLLONESHOT: report the event only once. After the event is delivered, the descriptor stays registered but disabled; to keep monitoring it, re-arm it with epoll_ctl (EPOLL_CTL_MOD)
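
The one-shot behaviour can be sketched as a pair of small helpers. The names add_oneshot and rearm_oneshot are hypothetical; the point is that re-arming goes through EPOLL_CTL_MOD, because the descriptor is still registered after its event has been delivered.

```c
#include <sys/epoll.h>

/* Hypothetical helper: register fd for a single readability notification. */
int add_oneshot(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT };

	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: re-arm fd after its event was delivered. The fd is
 * still registered (only disabled), so EPOLL_CTL_MOD is used here. */
int rearm_oneshot(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT };

	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```

One-shot registration is typically useful when several threads wait on the same epoll instance and only one of them should handle a given descriptor at a time.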

3. Wait for an event to be generated on the monitored file descriptor, similar to the select() call.

#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout)
events: the array into which the kernel stores the ready events,
maxevents: the capacity of the events array; it must be greater than zero,
timeout: the timeout
-1: block indefinitely
0: return immediately (non-blocking)
>0: wait for the given number of milliseconds

Return value: the number of ready file descriptors on success, 0 if the timeout expired, and -1 on error
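
Putting the three calls together, a minimal sketch might look like the following. The helper epoll_wait_readable is a made-up name for illustration; it creates a throwaway epoll instance, registers one descriptor, and waits for it once.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable using
 * a throwaway epoll instance. Returns 1 if ready, 0 on timeout, -1 on error. */
int epoll_wait_readable(int fd, int timeout_ms)
{
	struct epoll_event ev, out;
	int epfd, n;

	epfd = epoll_create(1);                 /* size is only a hint, must be > 0 */
	if (epfd == -1)
		return -1;

	ev.events = EPOLLIN;
	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
		close(epfd);
		return -1;
	}

	n = epoll_wait(epfd, &out, 1, timeout_ms);   /* room for one event */
	close(epfd);
	if (n <= 0)
		return n;                       /* 0: timeout, -1: error */
	return (out.data.fd == fd && (out.events & EPOLLIN)) ? 1 : -1;
}
```

A real server keeps the epoll instance open for its whole lifetime instead of recreating it per call; the short-lived instance here only keeps the example compact.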

server

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/epoll.h>
#include <errno.h>
#include <ctype.h>
#include <unistd.h>
#include "wrap.h"

#define MAXLINE 80
#define SERV_PORT 6666
#define OPEN_MAX 1024

int main(int argc, char *argv[])
{
	int i, j, maxi, listenfd, connfd, sockfd;
	int nready, efd, res;
	ssize_t n;
	char buf[MAXLINE], str[INET_ADDRSTRLEN];
	socklen_t clilen;
	int client[OPEN_MAX];
	struct sockaddr_in cliaddr, servaddr;
	struct epoll_event tep, ep[OPEN_MAX];

	listenfd = Socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servaddr.sin_port = htons(SERV_PORT);

	Bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr));

	Listen(listenfd, 20);

	for (i = 0; i < OPEN_MAX; i++)
		client[i] = -1;
	maxi = -1;

	efd = epoll_create(OPEN_MAX);
	if (efd == -1)
		perr_exit("epoll_create");

	tep.events = EPOLLIN; tep.data.fd = listenfd;

	res = epoll_ctl(efd, EPOLL_CTL_ADD, listenfd, &tep);
	if (res == -1)
		perr_exit("epoll_ctl");

	while (1) {
		nready = epoll_wait(efd, ep, OPEN_MAX, -1); /* Blocking monitoring */
		if (nready == -1)
			perr_exit("epoll_wait");

		for (i = 0; i < nready; i++) {
			if (!(ep[i].events & EPOLLIN))
				continue;
			if (ep[i].data.fd == listenfd) {
				clilen = sizeof(cliaddr);
				connfd = Accept(listenfd, (struct sockaddr *)&cliaddr, &clilen);
				printf("received from %s at PORT %d\n", 
						inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)), 
						ntohs(cliaddr.sin_port));
				for (j = 0; j < OPEN_MAX; j++) {
					if (client[j] < 0) {
						client[j] = connfd; /* save descriptor */
						break;
					}
				}

				if (j == OPEN_MAX)
					perr_exit("too many clients");
				if (j > maxi)
					maxi = j; 		/* max index in client[] array */

				tep.events = EPOLLIN; 
				tep.data.fd = connfd;
				res = epoll_ctl(efd, EPOLL_CTL_ADD, connfd, &tep);
				if (res == -1)
					perr_exit("epoll_ctl");
			} else {
				sockfd = ep[i].data.fd;
				n = Read(sockfd, buf, MAXLINE);
				if (n == 0) {
					for (j = 0; j <= maxi; j++) {
						if (client[j] == sockfd) {
							client[j] = -1;
							break;
						}
					}
					res = epoll_ctl(efd, EPOLL_CTL_DEL, sockfd, NULL);
					if (res == -1)
						perr_exit("epoll_ctl");

					Close(sockfd);
					printf("client[%d] closed connection\n", j);
				} else {
					for (j = 0; j < n; j++)
						buf[j] = toupper(buf[j]);
					Writen(sockfd, buf, n);
				}
			}
		}
	}
	close(listenfd);
	close(efd);
	return 0;
}

client

/* client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include "wrap.h"

#define MAXLINE 80
#define SERV_PORT 6666

int main(int argc, char *argv[])
{
	struct sockaddr_in servaddr;
	char buf[MAXLINE];
	int sockfd, n;

	sockfd = Socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
	servaddr.sin_port = htons(SERV_PORT);

	Connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	while (fgets(buf, MAXLINE, stdin) != NULL) {
		Write(sockfd, buf, strlen(buf));
		n = Read(sockfd, buf, MAXLINE);
		if (n == 0)
			printf("the other side has been closed.\n");
		else
			Write(STDOUT_FILENO, buf, n);
	}

	Close(sockfd);
	return 0;
}

epoll advanced

event model

EPOLL events have two models:
Edge Triggered (ET): an event is reported only when new data arrives, regardless of whether unread data remains in the buffer.
Level Triggered (LT): an event is reported as long as unread data remains.

Consider the following steps:
1. Suppose we have added a file descriptor (RFD) that reads from a pipe to the epoll descriptor.
2. 2KB of data is written to the other end of the pipe.
3. Call epoll_wait; it returns RFD, indicating that it is ready to read.
4. Read 1KB of data.
5. Call epoll_wait.

Two modes of operation are possible in this process:
ET mode
ET mode is Edge Triggered mode.

If the EPOLLET flag was used when RFD was added to the epoll descriptor in step 1, the epoll_wait call in step 5 may hang even though the remaining data is still in the file's input buffer, while the sender may be waiting for a reply to the data it has already sent. ET mode reports an event only when something changes on the monitored file descriptor, so in step 5 the caller may end up waiting for data that is already sitting in the input buffer. When epoll works in ET mode, non-blocking descriptors must be used, so that a blocking read or write on one descriptor does not starve the task that handles the other descriptors. The ET interface is best used in the following way; the possible pitfalls are covered later.

1) Use non-blocking file descriptors.
2) Suspend and wait only after read or write returns EAGAIN (non-blocking read, no data for the moment). This does not mean that every read must loop until EAGAIN before the event counts as handled: on a stream-oriented descriptor, when read returns fewer bytes than were requested, the buffer is known to be empty and the read event can be considered handled.
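
The two rules above can be sketched as follows. The helper names set_nonblocking and drain_fd are hypothetical, and the sketch takes the conservative route of always reading until EAGAIN rather than stopping at the first short read.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read everything currently available on a non-blocking fd.
 * Returns the number of bytes consumed, or -1 on a real error.
 * *eof is set to 1 if the peer closed its end, otherwise 0. */
long drain_fd(int fd, int *eof)
{
	char buf[512];
	long total = 0;

	*eof = 0;
	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			total += n;     /* a real program would process buf here */
			continue;
		}
		if (n == 0) {
			*eof = 1;       /* peer closed */
			return total;
		}
		if (errno == EINTR)
			continue;       /* interrupted by a signal, retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return total;   /* buffer empty: the event is fully handled */
		return -1;
	}
}
```

After an edge-triggered epoll_wait reports a descriptor, calling a routine like drain_fd before waiting again avoids the hang described in step 5.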

LT mode

LT mode is the Level Triggered working mode.

Unlike ET mode, an epoll interface used in LT mode behaves like a faster poll, whether or not the data is consumed afterwards.

LT (level triggered): LT is the default working mode and supports both blocking and non-blocking sockets. The kernel tells you whether a file descriptor is ready, and you can then perform I/O on the ready fd. If you do nothing, the kernel keeps notifying you, so programming in this mode is less error-prone. Traditional select/poll are representatives of this model.

ET (edge triggered): ET is the high-speed working mode and should only be used with non-blocking sockets. In this mode, the kernel notifies you through epoll when a descriptor changes from not ready to ready. It then assumes you know the descriptor is ready and sends no further readiness notifications for it until you do something that makes it not ready again (for example, reading until EAGAIN). Note that if no I/O is performed on the fd, so that it never becomes not ready again, the kernel sends no more notifications (only the one).
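
The difference between the two modes can be observed with a small experiment. In this sketch, count_ready_reports is a hypothetical helper: it writes one byte into a pipe, never reads it, and counts how many consecutive epoll_wait calls report the read end as ready.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical experiment: write one byte into a pipe, never read it, and
 * count how many of `rounds` epoll_wait() calls report the read end as ready.
 * Pass EPOLLET in extra_flags for edge-triggered mode, 0 for level-triggered.
 * Returns the count, or -1 if the setup fails. */
int count_ready_reports(unsigned int extra_flags, int rounds)
{
	int pfd[2], epfd, count = 0;
	struct epoll_event ev, out;

	if (pipe(pfd) == -1)
		return -1;
	epfd = epoll_create(1);
	if (epfd == -1) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}

	ev.events = EPOLLIN | extra_flags;
	ev.data.fd = pfd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1 ||
	    write(pfd[1], "x", 1) != 1) {
		count = -1;
	} else {
		for (int i = 0; i < rounds; i++)
			if (epoll_wait(epfd, &out, 1, 0) == 1)  /* timeout 0: just poll */
				count++;
	}
	close(epfd);
	close(pfd[0]);
	close(pfd[1]);
	return count;
}
```

With level triggering every round reports the descriptor, because the byte is still unread; with EPOLLET only the first round does, since no new data arrives afterwards.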

Example 1:
epoll ET trigger mode on a pipe

#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <errno.h>
#include <unistd.h>

#define MAXLINE 10

int main(int argc, char *argv[])
{
	int efd, i;
	int pfd[2];
	pid_t pid;
	char buf[MAXLINE], ch = 'a';

	pipe(pfd);
	pid = fork();
	if (pid == 0) {
		close(pfd[0]);
		while (1) {
			for (i = 0; i < MAXLINE/2; i++)
				buf[i] = ch;
			buf[i-1] = '\n';
			ch++;

			for (; i < MAXLINE; i++)
				buf[i] = ch;
			buf[i-1] = '\n';
			ch++;

			write(pfd[1], buf, sizeof(buf));
			sleep(2);
		}
		close(pfd[1]);
	} else if (pid > 0) {
		struct epoll_event event;
		struct epoll_event resevent[10];
		int res, len;
		close(pfd[1]);

		efd = epoll_create(10);
		/* event.events = EPOLLIN; */
		event.events = EPOLLIN | EPOLLET;		/* ET (edge triggered); the default is level triggered (LT) */
		event.data.fd = pfd[0];
		epoll_ctl(efd, EPOLL_CTL_ADD, pfd[0], &event);

		while (1) {
			res = epoll_wait(efd, resevent, 10, -1);
			printf("res %d\n", res);
			if (resevent[0].data.fd == pfd[0]) {
				len = read(pfd[0], buf, MAXLINE/2);
				write(STDOUT_FILENO, buf, len);
			}
		}
		close(pfd[0]);
		close(efd);
	} else {
		perror("fork");
		exit(-1);
	}
	return 0;
}

Example two:
epoll ET trigger mode based on network C/S model

server

/* server.c */
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAXLINE 10
#define SERV_PORT 8080

int main(void)
{
	struct sockaddr_in servaddr, cliaddr;
	socklen_t cliaddr_len;
	int listenfd, connfd;
	char buf[MAXLINE];
	char str[INET_ADDRSTRLEN];
	int i, efd;

	listenfd = socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servaddr.sin_port = htons(SERV_PORT);

	bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	listen(listenfd, 20);

	struct epoll_event event;
	struct epoll_event resevent[10];
	int res, len;
	efd = epoll_create(10);
	event.events = EPOLLIN | EPOLLET;		/* ET (edge triggered); the default is level triggered (LT) */

	printf("Accepting connections ...\n");
	cliaddr_len = sizeof(cliaddr);
	connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddr_len);
	printf("received from %s at PORT %d\n",
			inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
			ntohs(cliaddr.sin_port));

	event.data.fd = connfd;
	epoll_ctl(efd, EPOLL_CTL_ADD, connfd, &event);

	while (1) {
		res = epoll_wait(efd, resevent, 10, -1);
		printf("res %d\n", res);
		if (resevent[0].data.fd == connfd) {
			len = read(connfd, buf, MAXLINE/2);
			write(STDOUT_FILENO, buf, len);
		}
	}
	return 0;
}

client

/* client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>

#define MAXLINE 10
#define SERV_PORT 8080

int main(int argc, char *argv[])
{
	struct sockaddr_in servaddr;
	char buf[MAXLINE];
	int sockfd, i;
	char ch = 'a';

	sockfd = socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
	servaddr.sin_port = htons(SERV_PORT);

	connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	while (1) {
		for (i = 0; i < MAXLINE/2; i++)
			buf[i] = ch;
		buf[i-1] = '\n';
		ch++;

		for (; i < MAXLINE; i++)
			buf[i] = ch;
		buf[i-1] = '\n';
		ch++;

		write(sockfd, buf, sizeof(buf));
		sleep(10);
	}
	close(sockfd);
	return 0;
}

Example three:
epoll ET triggering mode based on network C/S non blocking model

server

/* server.c */
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <sys/epoll.h>
#include <unistd.h>
#include <fcntl.h>

#define MAXLINE 10
#define SERV_PORT 8080

int main(void)
{
	struct sockaddr_in servaddr, cliaddr;
	socklen_t cliaddr_len;
	int listenfd, connfd;
	char buf[MAXLINE];
	char str[INET_ADDRSTRLEN];
	int i, efd, flag;

	listenfd = socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servaddr.sin_port = htons(SERV_PORT);

	bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	listen(listenfd, 20);

	struct epoll_event event;
	struct epoll_event resevent[10];
	int res, len;
	efd = epoll_create(10);
	/* event.events = EPOLLIN; */
	event.events = EPOLLIN | EPOLLET;		/* ET (edge triggered); the default is level triggered (LT) */

	printf("Accepting connections ...\n");
	cliaddr_len = sizeof(cliaddr);
	connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddr_len);
	printf("received from %s at PORT %d\n",
			inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
			ntohs(cliaddr.sin_port));

	flag = fcntl(connfd, F_GETFL);
	flag |= O_NONBLOCK;
	fcntl(connfd, F_SETFL, flag);
	event.data.fd = connfd;
	epoll_ctl(efd, EPOLL_CTL_ADD, connfd, &event);

	while (1) {
		printf("epoll_wait begin\n");
		res = epoll_wait(efd, resevent, 10, -1);
		printf("epoll_wait end res %d\n", res);

		if (resevent[0].data.fd == connfd) {
			while ((len = read(connfd, buf, MAXLINE/2)) > 0)
				write(STDOUT_FILENO, buf, len);
		}
	}
	return 0;
}

client

/* client.c */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>

#define MAXLINE 10
#define SERV_PORT 8080

int main(int argc, char *argv[])
{
	struct sockaddr_in servaddr;
	char buf[MAXLINE];
	int sockfd, i;
	char ch = 'a';

	sockfd = socket(AF_INET, SOCK_STREAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
	servaddr.sin_port = htons(SERV_PORT);

	connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));

	while (1) {
		for (i = 0; i < MAXLINE/2; i++)
			buf[i] = ch;
		buf[i-1] = '\n';
		ch++;

		for (; i < MAXLINE; i++)
			buf[i] = ch;
		buf[i-1] = '\n';
		ch++;

		write(sockfd, buf, sizeof(buf));
		sleep(10);
	}
	close(sockfd);
	return 0;
}

Thread pool concurrent server

1. Create the threads in advance and let each of them block in accept, using a mutex to protect the accept call
2. Create the threads in advance and let the main thread call accept, then hand each connection to a worker thread

UDP server

The transport layer has two main protocols: TCP and UDP. TCP plays the leading role in network communication, and most network traffic uses TCP for data transmission. But UDP is also an indispensable means of network communication.

Compared with TCP, UDP communication is more like sending a text message. There is no need to establish and maintain a connection before transmitting data; the sender just sends the data. Skipping the three-way handshake makes communication much faster, but the stability and accuracy of the communication cannot be guaranteed. Therefore, UDP is described as "connectionless, unreliable datagram delivery".

So what are the advantages and disadvantages of UDP compared with the well-known TCP? Because no connection has to be created, UDP has low overhead, fast transmission, and strong real-time performance. It is mainly used where real-time requirements are high, such as video conferencing and conference calls. The price is unreliable transmission: the accuracy, ordering, and flow of the data cannot be controlled or guaranteed. Therefore, when UDP is used for data transmission, an auxiliary verification protocol is usually added at the application layer to make up for what UDP lacks and achieve reliable transmission.

UDP can lose datagrams when they arrive after the receive buffer has filled up. Because it has no sliding-window mechanism like TCP, this is usually addressed in one of two ways:

1) Add flow control in the server's application layer to limit the rate at which data is sent.
2) Enlarge the receive buffer with the setsockopt function. For example:

#include <sys/socket.h>
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);
int n = 220 * 1024;
setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &n, sizeof(n));
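
The kernel may not apply exactly the requested size: on Linux the value is doubled to leave room for bookkeeping overhead and is capped by net.core.rmem_max. A hedged sketch that reads the result back, using a hypothetical helper named set_rcvbuf:

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` and return the
 * size the kernel actually applied, or -1 on error. */
int set_rcvbuf(int sockfd, int bytes)
{
	int actual = 0;
	socklen_t len = sizeof(actual);

	if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
		return -1;
	if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
		return -1;
	return actual;
}
```

Comparing the returned value with the requested one shows whether the system-wide cap limited the request.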

C/S model UDP

Because UDP does not need to maintain the connection, the program logic is much simpler, but UDP protocol is unreliable, and the mechanism to ensure the reliability of communication needs to be implemented in the application layer.

Compile and run the server, then open a client in each of several terminals and interact with the server to see whether it can serve them concurrently. Use Ctrl+C to stop the server, start it again, and check whether the clients can still reach it. Comparing this with the earlier TCP program shows what connectionless means.

server

#include <string.h>
#include <netinet/in.h>
#include <stdio.h>
#include <unistd.h>
#include <strings.h>
#include <arpa/inet.h>
#include <ctype.h>

#define MAXLINE 80
#define SERV_PORT 6666

int main(void)
{
	struct sockaddr_in servaddr, cliaddr;
	socklen_t cliaddr_len;
	int sockfd;
	char buf[MAXLINE];
	char str[INET_ADDRSTRLEN];
	int i, n;

	sockfd = socket(AF_INET, SOCK_DGRAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
	servaddr.sin_port = htons(SERV_PORT);

	bind(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
	printf("Accepting connections ...\n");

	while (1) {
		cliaddr_len = sizeof(cliaddr);
		n = recvfrom(sockfd, buf, MAXLINE,0, (struct sockaddr *)&cliaddr, &cliaddr_len);
		if (n == -1) {
			perror("recvfrom error");
			continue;	/* nothing was received, so there is nothing to echo */
		}
		printf("received from %s at PORT %d\n", 
				inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
				ntohs(cliaddr.sin_port));
		for (i = 0; i < n; i++)
			buf[i] = toupper(buf[i]);

		n = sendto(sockfd, buf, n, 0, (struct sockaddr *)&cliaddr, sizeof(cliaddr));
		if (n == -1)
			perror("sendto error");
	}
	close(sockfd);
	return 0;
}

client

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <strings.h>
#include <ctype.h>

#define MAXLINE 80
#define SERV_PORT 6666

int main(int argc, char *argv[])
{
	struct sockaddr_in servaddr;
	int sockfd, n;
	char buf[MAXLINE];

	sockfd = socket(AF_INET, SOCK_DGRAM, 0);

	bzero(&servaddr, sizeof(servaddr));
	servaddr.sin_family = AF_INET;
	inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
	servaddr.sin_port = htons(SERV_PORT);

	while (fgets(buf, MAXLINE, stdin) != NULL) {
		n = sendto(sockfd, buf, strlen(buf), 0, (struct sockaddr *)&servaddr, sizeof(servaddr));
		if (n == -1)
			perror("sendto error");
		n = recvfrom(sockfd, buf, MAXLINE, 0, NULL, 0);
		if (n == -1)
			perror("recvfrom error");
		write(STDOUT_FILENO, buf, n);
	}
	close(sockfd);
	return 0;
}

Multicast

Multicast groups can be permanent or temporary. Some multicast group addresses are officially assigned; these are called permanent multicast groups. What stays fixed in a permanent multicast group is its IP address; its membership can change, and the number of members can be anything, even zero. IP multicast addresses that are not reserved for permanent groups can be used by temporary multicast groups.

224.0.0.0 - 224.0.0.255 are reserved multicast addresses (permanent group addresses); 224.0.0.0 itself is reserved and not assigned, and the other addresses are used by routing protocols;
224.0.1.0 - 224.0.1.255 are public multicast addresses that can be used on the Internet; using one requires applying for it;
224.0.2.0 - 238.255.255.255 are multicast addresses available to users (temporary group addresses), valid across the whole network;
239.0.0.0 - 239.255.255.255 are locally administered multicast addresses, valid only within a specific local scope.
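
The four ranges listed above can be checked programmatically. The following sketch assumes a hypothetical enum mcast_range and function classify_multicast; the boundaries are exactly the ones given in the list.

```c
#include <arpa/inet.h>
#include <stdint.h>

enum mcast_range {
	MCAST_NOT_MULTICAST,   /* outside 224.0.0.0 - 239.255.255.255, or unparsable */
	MCAST_RESERVED,        /* 224.0.0.0 - 224.0.0.255 */
	MCAST_PUBLIC,          /* 224.0.1.0 - 224.0.1.255 */
	MCAST_TEMPORARY,       /* 224.0.2.0 - 238.255.255.255 */
	MCAST_LOCAL_ADMIN      /* 239.0.0.0 - 239.255.255.255 */
};

/* Hypothetical helper: classify a dotted-decimal IPv4 address. */
enum mcast_range classify_multicast(const char *dotted)
{
	struct in_addr a;
	uint32_t ip;

	if (inet_pton(AF_INET, dotted, &a) != 1)
		return MCAST_NOT_MULTICAST;
	ip = ntohl(a.s_addr);           /* host byte order for comparisons */

	if (ip < 0xE0000000u || ip > 0xEFFFFFFFu)
		return MCAST_NOT_MULTICAST;
	if (ip <= 0xE00000FFu)
		return MCAST_RESERVED;
	if (ip <= 0xE00001FFu)
		return MCAST_PUBLIC;
	if (ip <= 0xEEFFFFFFu)
		return MCAST_TEMPORARY;
	return MCAST_LOCAL_ADMIN;
}
```

The group address 239.0.0.2 used by the example programs in this section falls in the locally administered range.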

You can use the ip ad command to view the index of each network card (the number at the start of each entry), for example:
itcast$ ip ad

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
link/ether 00:0c:29:0a:c4:f4 brd ff:ff:ff:ff:ff:ff
inet6 fe80::20c:29ff:fe0a:c4f4/64 scope link
valid_lft forever preferred_lft forever
The if_nametoindex function returns the index of a network card given its name.
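
A minimal sketch of that lookup follows. The wrapper show_ifindex is a hypothetical name; if_nametoindex itself returns 0 and sets errno when no interface has the given name.

```c
#include <net/if.h>
#include <stdio.h>

/* Hypothetical helper: print and return the index of a named interface.
 * Returns 0 if no such interface exists. */
unsigned int show_ifindex(const char *name)
{
	unsigned int idx = if_nametoindex(name);

	if (idx == 0)
		perror(name);
	else
		printf("%s has index %u\n", name, idx);
	return idx;
}
```

Calling show_ifindex("eth0") on the machine from the listing above would be expected to report the index shown in front of eth0 in the ip ad output.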

server

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <net/if.h>

#define SERVER_PORT 6666
#define CLIENT_PORT 9000
#define MAXLINE 1500
#define GROUP "239.0.0.2"

int main(void)
{
	int sockfd, i ;
	struct sockaddr_in serveraddr, clientaddr;
	char buf[MAXLINE] = "itcast\n";
	char ipstr[INET_ADDRSTRLEN]; /* 16 Bytes */
	socklen_t clientlen;
	ssize_t len;
	struct ip_mreqn group;

	/* Construct socket for UDP communication */
	sockfd = socket(AF_INET, SOCK_DGRAM, 0);

	bzero(&serveraddr, sizeof(serveraddr));
	serveraddr.sin_family = AF_INET; /* IPv4 */
	serveraddr.sin_addr.s_addr = htonl(INADDR_ANY); /* Local any IP INADDR_ANY = 0 */
	serveraddr.sin_port = htons(SERVER_PORT);

	bind(sockfd, (struct sockaddr *)&serveraddr, sizeof(serveraddr));

	/*Set group address*/
	inet_pton(AF_INET, GROUP, &group.imr_multiaddr);
	/*Local any IP*/
	inet_pton(AF_INET, "0.0.0.0", &group.imr_address);
	/* eth0 --> Numbering command: ip ad */
	group.imr_ifindex = if_nametoindex("eth0");
	setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_IF, &group, sizeof(group));

	/*Construct client address IP + port */
	bzero(&clientaddr, sizeof(clientaddr));
	clientaddr.sin_family = AF_INET; /* IPv4 */
	inet_pton(AF_INET, GROUP, &clientaddr.sin_addr.s_addr);
	clientaddr.sin_port = htons(CLIENT_PORT);

	while (1) {
		//fgets(buf, sizeof(buf), stdin);
		sendto(sockfd, buf, strlen(buf), 0, (struct sockaddr *)&clientaddr, sizeof(clientaddr));
		sleep(1);
	}
	close(sockfd);
	return 0;
}

client

#include <netinet/in.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <net/if.h>

#define SERVER_PORT 6666
#define MAXLINE 4096
#define CLIENT_PORT 9000
#define GROUP "239.0.0.2"

int main(int argc, char *argv[])
{
	struct sockaddr_in serveraddr, localaddr;
	int confd;
	ssize_t len;
	char buf[MAXLINE];

	/* Define multicast architecture */
	struct ip_mreqn group;
	confd = socket(AF_INET, SOCK_DGRAM, 0);

	//Initialize local address
	bzero(&localaddr, sizeof(localaddr));
	localaddr.sin_family = AF_INET;
	inet_pton(AF_INET, "0.0.0.0" , &localaddr.sin_addr.s_addr);
	localaddr.sin_port = htons(CLIENT_PORT);

	bind(confd, (struct sockaddr *)&localaddr, sizeof(localaddr));

	/*Set group address*/
	inet_pton(AF_INET, GROUP, &group.imr_multiaddr);
	/*Local any IP*/
	inet_pton(AF_INET, "0.0.0.0", &group.imr_address);
	/* eth0 --> Numbering command: ip ad */
	group.imr_ifindex = if_nametoindex("eth0");
	/*Set client to join multicast group */
	setsockopt(confd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &group, sizeof(group));

	while (1) {
		len = recvfrom(confd, buf, sizeof(buf), 0, NULL, 0);
		write(STDOUT_FILENO, buf, len);
	}
	close(confd);
	return 0;
}

socket IPC (local socket domain)

The socket API was originally designed for network communication, but an IPC mechanism was later built on the same framework: the UNIX Domain Socket. Although a network socket can also be used for interprocess communication on the same host (through the loopback address 127.0.0.1), a UNIX Domain Socket is more efficient for IPC: it does not go through the network protocol stack and does not need to pack and unpack packets, compute checksums, or maintain sequence numbers and acknowledgments; it simply copies application-layer data from one process to another. This is possible because IPC is inherently reliable, whereas network protocols are designed for unreliable communication. UNIX Domain Sockets also provide stream-oriented and datagram-oriented APIs, similar to TCP and UDP, but datagram-oriented UNIX Domain Sockets are reliable too: messages are neither lost nor reordered.

UNIX Domain Sockets are full duplex and have rich API semantics, which gives them clear advantages over other IPC mechanisms. They have become one of the most widely used IPC mechanisms; for example, the X Window server and GUI programs communicate through a UNIX Domain Socket.

Using a UNIX Domain Socket is very similar to using a network socket. First call socket() to create a socket file descriptor, with the address family set to AF_UNIX, the type set to SOCK_DGRAM or SOCK_STREAM, and the protocol parameter still 0.

The most obvious difference between UNIX Domain Socket and network socket programming is the address format, which is described by the structure sockaddr_un. A network socket address is an IP address plus a port number, while a UNIX Domain Socket address is the path of a socket-type file in the file system. This socket file is created by the bind() call; if the file already exists when bind() is called, bind() fails and returns an error.

Compare network socket address structure with local socket address structure:

struct sockaddr_in {
	__kernel_sa_family_t sin_family;	/* Address family (address structure type) */
	__be16 sin_port;			/* Port number */
	struct in_addr sin_addr;		/* Internet (IP) address */
};
struct sockaddr_un {
	__kernel_sa_family_t sun_family;	/* AF_UNIX (address structure type) */
	char sun_path[UNIX_PATH_MAX];		/* Pathname: socket file name, including the path */
};

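The following program binds the UNIX Domain Socket to an address. The address length passed to bind() is the offset of sun_path within the structure plus the length of the path string; the offsetof macro is provided by <stddef.h> and is conceptually defined as shown:

size = offsetof(struct sockaddr_un, sun_path) + strlen(un.sun_path);
#define offsetof(type, member) ((int)&((type *)0)->member)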
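
The length calculation can be wrapped in a small helper that also guards against paths too long for sun_path. This is only a sketch; the name unix_addr_init is an assumption made for this example.

```c
#include <stddef.h>     /* offsetof */
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fill *un for the given path and return the address
 * length to pass to bind()/connect(), or 0 if the path does not fit. */
socklen_t unix_addr_init(struct sockaddr_un *un, const char *path)
{
	size_t n = strlen(path);

	if (n >= sizeof(un->sun_path))      /* leave room for the trailing '\0' */
		return 0;
	memset(un, 0, sizeof(*un));
	un->sun_family = AF_UNIX;
	memcpy(un->sun_path, path, n + 1);
	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + n);
}
```

Unlike a bare strcpy into sun_path, the length check prevents an over-long name from overflowing the structure.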

server

#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <ctype.h>

#define QLEN 10
/*
* Create a server endpoint of a connection.
* Returns fd if all OK, <0 on error.
*/
int serv_listen(const char *name)
{
	int fd, len, err, rval;
	struct sockaddr_un un;

	/* create a UNIX domain stream socket */
	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
		return(-1);
	/* in case it already exists */
	unlink(name); 			

	/* fill in socket address structure */
	memset(&un, 0, sizeof(un));
	un.sun_family = AF_UNIX;
	strcpy(un.sun_path, name);
	len = offsetof(struct sockaddr_un, sun_path) + strlen(name);

	/* bind the name to the descriptor */
	if (bind(fd, (struct sockaddr *)&un, len) < 0) {
		rval = -2;
		goto errout;
	}
	if (listen(fd, QLEN) < 0) { /* tell kernel we're a server */
		rval = -3;
		goto errout;
	}
	return(fd);

errout:
	err = errno;
	close(fd);
	errno = err;
	return(rval);
}
int serv_accept(int listenfd, uid_t *uidptr)
{
	int clifd, err, rval;
	socklen_t len;		/* accept() expects a socklen_t * */
	time_t staletime;
	struct sockaddr_un un;
	struct stat statbuf;

	len = sizeof(un);
	if ((clifd = accept(listenfd, (struct sockaddr *)&un, &len)) < 0)
		return(-1); /* often errno=EINTR, if signal caught */

	/* obtain the client's uid from its calling address */
	len -= offsetof(struct sockaddr_un, sun_path); /* len of pathname */
	un.sun_path[len] = 0; /* null terminate */

	if (stat(un.sun_path, &statbuf) < 0) {
		rval = -2;
		goto errout;
	}
	if (S_ISSOCK(statbuf.st_mode) == 0) {
		rval = -3; /* not a socket */
		goto errout;
	}
	if (uidptr != NULL)
		*uidptr = statbuf.st_uid; /* return uid of caller */
	/* we're done with pathname now */
	unlink(un.sun_path); 
	return(clifd);

errout:
	err = errno;
	close(clifd);
	errno = err;
	return(rval);
}
int main(void)
{
	int lfd, cfd, n, i;
	uid_t cuid;
	char buf[1024];
	lfd = serv_listen("foo.socket");

	if (lfd < 0) {
		switch (lfd) {
			case -3:perror("listen"); break;
			case -2:perror("bind"); break;
			case -1:perror("socket"); break;
		}
		exit(-1);
	}
	cfd = serv_accept(lfd, &cuid);
	if (cfd < 0) {
		switch (cfd) {
			case -3:perror("not a socket"); break;
			case -2:perror("a bad filename"); break;
			case -1:perror("accept"); break;
		}
		exit(-1);
	}
	while (1) {
r_again:
		n = read(cfd, buf, 1024);
		if (n == -1) {
			if (errno == EINTR)
				goto r_again;	/* interrupted by a signal, retry */
			perror("read");
			break;
		} else if (n == 0) {
			printf("the other side has been closed.\n");
			break;
		}
		for (i = 0; i < n; i++)
			buf[i] = toupper(buf[i]);
		write(cfd, buf, n);
	}
	close(cfd);
	close(lfd);
	return 0;
}

client

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>
#include <string.h>

#define CLI_PATH "/var/tmp/" /* +5 for pid = 14 chars */
/*
* Create a client endpoint and connect to a server.
* Returns fd if all OK, <0 on error.
*/
int cli_conn(const char *name)
{
	int fd, len, err, rval;
	struct sockaddr_un un;

	/* create a UNIX domain stream socket */
	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
		return(-1);

	/* fill socket address structure with our address */
	memset(&un, 0, sizeof(un));
	un.sun_family = AF_UNIX;
	sprintf(un.sun_path, "%s%05d", CLI_PATH, getpid());
	len = offsetof(struct sockaddr_un, sun_path) + strlen(un.sun_path);

	/* in case it already exists */
	unlink(un.sun_path); 
	if (bind(fd, (struct sockaddr *)&un, len) < 0) {
		rval = -2;
		goto errout;
	}

	/* fill socket address structure with server's address */
	memset(&un, 0, sizeof(un));
	un.sun_family = AF_UNIX;
	strcpy(un.sun_path, name);
	len = offsetof(struct sockaddr_un, sun_path) + strlen(name);
	if (connect(fd, (struct sockaddr *)&un, len) < 0) {
		rval = -4;
		goto errout;
	}
	return(fd);

errout:
	err = errno;
	close(fd);
	errno = err;
	return(rval);
}
int main(void)
{
	int fd, n;
	char buf[1024];

	fd = cli_conn("foo.socket");
	if (fd < 0) {
		switch (fd) {
			case -4:perror("connect"); break;
			case -3:perror("listen"); break;
			case -2:perror("bind"); break;
			case -1:perror("socket"); break;
		}
		exit(-1);
	}
	while (fgets(buf, sizeof(buf), stdin) != NULL) {
		write(fd, buf, strlen(buf));
		n = read(fd, buf, sizeof(buf));
		write(STDOUT_FILENO, buf, n);
	}
	close(fd);
	return 0;
}

Other common functions

Name and address translation

gethostbyname returns host information for a given host name.
It is obsolete, supports only IPv4, and is not thread safe.

#include <stdio.h>
#include <netdb.h>
#include <arpa/inet.h>

extern int h_errno;

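int main(int argc, char *argv[])
{
	struct hostent *host;
	char **p;
	char str[128];

	if (argc != 2) {
		fprintf(stderr, "usage: %s hostname\n", argv[0]);
		return 1;
	}
	host = gethostbyname(argv[1]);
	if (host == NULL) {	/* lookup failed; h_errno holds the reason */
		herror("gethostbyname");
		return 1;
	}
	printf("%s\n", host->h_name);

	/* walk the lists with a local pointer instead of modifying *host */
	for (p = host->h_aliases; *p != NULL; p++)
		printf("%s\n", *p);

	switch (host->h_addrtype) {
		case AF_INET:
			for (p = host->h_addr_list; *p != NULL; p++)
				printf("%s\n", inet_ntop(AF_INET, *p, str, sizeof(str)));
			break;
		default:
			printf("unknown address type\n");
			break;
	}
	return 0;
}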

gethostbyaddr function

This function can only return a name for an IP address that the domain name server is able to resolve back to a name, or that is registered in /etc/hosts.

#include <stdio.h>
#include <netdb.h>
#include <arpa/inet.h>

extern int h_errno;

int main(int argc, char *argv[])
{
	struct hostent *host;
	char **p;
	char str[128];
	struct in_addr addr;

	if (argc != 2 || inet_pton(AF_INET, argv[1], &addr) != 1) {
		fprintf(stderr, "usage: %s ipv4-address\n", argv[0]);
		return 1;
	}
	host = gethostbyaddr(&addr, sizeof(addr), AF_INET);
	if (host == NULL) {	/* no name is known for this address */
		herror("gethostbyaddr");
		return 1;
	}
	printf("%s\n", host->h_name);

	/* walk the lists with a local pointer instead of modifying *host */
	for (p = host->h_aliases; *p != NULL; p++)
		printf("%s\n", *p);

	switch (host->h_addrtype) {
		case AF_INET:
			for (p = host->h_addr_list; *p != NULL; p++)
				printf("%s\n", inet_ntop(AF_INET, *p, str, sizeof(str)));
			break;
		default:
			printf("unknown address type\n");
			break;
	}
	return 0;
}

getservbyname
getservbyport
Look up a service entry by service name or by port number. These functions are rarely used.
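
As a rough illustration of the two lookups, here is a minimal sketch. The helper names service_port and service_name are made up for this example and are not part of any library; the results depend on the local services database (usually /etc/services), so a lookup may fail on a minimal system.

```c
#include <stdio.h>
#include <netdb.h>
#include <arpa/inet.h>

/* Hypothetical helper: port (host byte order) for a service, or -1 if unknown. */
int service_port(const char *name, const char *proto)
{
	struct servent *sv = getservbyname(name, proto);

	if (sv == NULL)
		return -1;
	return ntohs(sv->s_port);	/* s_port is stored in network byte order */
}

/* Hypothetical helper: copy the service name for a port into out.
 * Returns 0 on success, -1 if no service is registered for that port. */
int service_name(int port, const char *proto, char *out, size_t outlen)
{
	/* getservbyport expects the port in network byte order */
	struct servent *sv = getservbyport(htons(port), proto);

	if (sv == NULL)
		return -1;
	snprintf(out, outlen, "%s", sv->s_name);
	return 0;
}
```

A caller would typically try service_port("http", "tcp") and fall back to a hard-coded port when it returns -1. Both library functions return a pointer to static storage, so they are not thread safe either.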

getaddrinfo
getnameinfo
freeaddrinfo
These functions handle both IPv4 and IPv6 and are thread safe.
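
To show how the three calls fit together, here is a minimal sketch. The helper name resolve_first is made up for this example; it only reports the first result, while a real client would normally loop over the whole list and try each address in turn.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper: resolve host/service and write the numeric
 * address of the first result into out.
 * Returns 0 on success, or the getaddrinfo/getnameinfo error code.
 */
int resolve_first(const char *host, const char *service, char *out, size_t outlen)
{
	struct addrinfo hints, *res;
	int rc;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;		/* accept IPv4 or IPv6 */
	hints.ai_socktype = SOCK_STREAM;	/* TCP results only */

	rc = getaddrinfo(host, service, &hints, &res);
	if (rc != 0) {
		fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
		return rc;
	}

	/* NI_NUMERICHOST: format the address as text, no reverse lookup */
	rc = getnameinfo(res->ai_addr, res->ai_addrlen, out, outlen,
	                 NULL, 0, NI_NUMERICHOST);
	freeaddrinfo(res);	/* the result list is allocated by getaddrinfo */
	return rc;
}
```

Calling resolve_first with a host name and a port string such as "80" fills the buffer with a printable address. Because the address family comes from the result itself, the same code works for IPv4 and IPv6 without changes.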

Socket and address association

getsockname
Given the sockfd returned by accept, get the local address of the connection, including the temporary port number.

getpeername
Given the sockfd returned by accept, get the address and port number of the remote end of the connection. A program started by exec can use it to recover the client information from the descriptor it inherited.
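
The two calls can be sketched as follows. The helper names local_port and peer_string are made up for this example, and the sketch assumes an IPv4 (AF_INET) socket; an IPv6 socket would need struct sockaddr_storage instead.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: port of the local end of an IPv4 socket, or -1 on error. */
int local_port(int sockfd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	if (getsockname(sockfd, (struct sockaddr *)&addr, &len) < 0)
		return -1;
	return ntohs(addr.sin_port);
}

/* Hypothetical helper: write "ip:port" of the remote end into out.
 * Returns 0 on success, -1 on error. */
int peer_string(int sockfd, char *out, size_t outlen)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	char ip[INET_ADDRSTRLEN];

	if (getpeername(sockfd, (struct sockaddr *)&addr, &len) < 0)
		return -1;	/* fails if the socket is not connected */
	if (inet_ntop(AF_INET, &addr.sin_addr, ip, sizeof(ip)) == NULL)
		return -1;
	snprintf(out, outlen, "%s:%d", ip, ntohs(addr.sin_port));
	return 0;
}
```

local_port is also handy after binding to port 0, when the kernel picks the port. peer_string is the kind of call a server child would make after exec, when only the descriptor number has survived.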

Tags: socket network Unix Mac

Posted on Wed, 11 Mar 2020 04:03:09 -0700 by RockyShark