Bo2SS

7 Basics of Socket Programming

Course Content#

What is a socket? What does network programming do?

  • Understand the TCP/IP five-layer model and the OSI seven-layer model
  • Analogy
    • Socket — Courier
    • Transport Layer — Courier Company: TCP and UDP are like two different courier companies
    • Transportation Road — Internet
    • Communication Address — IP

——Transport Layer Protocol——#

Analogous to a courier company
Developers can only choose between TCP and UDP and adjust the chosen protocol's parameters

TCP#

Transmission Control Protocol; connection-oriented, reliable data transmission protocol

  • Connection: Three-way handshake [See Additional Knowledge Points for details]
  • Nature of reliability: Acknowledgment and retransmission [Requires sequence number]
    • If lost, it will be retransmitted
    • [PS] Both parties save some variables describing their states
  • Header Format
  • Image
  • Source Port Number: From which port it is sent; Destination Port Number: Which port it is sent to
    • Different ports correspond to different applications
    • If the computer is compared to a building, the port number is the room number in the building
    • The IP address is provided by the IP layer
  • Sequence Number: Byte-stream number of the first data byte in this segment; Acknowledgment Number: The next sequence number expected from the other party
  • Header Length: Measured in 32-bit words [4 bytes each], so a 20-byte header without options has the value 5
  • Function Bit Fields [Focus on the highlighted parts]
    • ACK: Acknowledgment
    • RST: Reset connection [Aborts a connection or rejects a connection attempt]
    • SYN: Connection request [Used in the first two handshakes of the three-way handshake]
    • FIN: Close connection [Used in the first and third waves of the four-way handshake; the segment can also carry data, see Additional Knowledge Points for details]
  • Window Size: Tells the other party how much more data it may send, used to limit the other party's sending rate
  • Checksum: Verifies that the data arrived intact. A segment that fails the check is silently discarded; because no acknowledgment is sent, the sender retransmits it after a timeout
  • [PS]
    • Most of this design exists to provide reliability
    • Real-world courier companies cannot offer the same reliability because the items they transport are unique and cannot be resent
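
The header fields described above can be read straight out of the raw header bytes. The following is a minimal sketch, assuming the standard TCP header layout (ports in bytes 0 to 3, header length in the upper 4 bits of byte 12, flag bits in byte 13); the function and constant names are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Flag bits in byte 13 of the TCP header. */
enum {
    TCP_FIN = 0x01,
    TCP_SYN = 0x02,
    TCP_RST = 0x04,
    TCP_ACK = 0x10
};

/* Header length in bytes: the upper 4 bits of byte 12 count 32-bit words. */
size_t tcp_header_len(const uint8_t *hdr)
{
    return (size_t)(hdr[12] >> 4) * 4;
}

/* Source and destination ports are the first two 16-bit fields, big-endian. */
uint16_t tcp_src_port(const uint8_t *hdr)
{
    return (uint16_t)((hdr[0] << 8) | hdr[1]);
}

uint16_t tcp_dst_port(const uint8_t *hdr)
{
    return (uint16_t)((hdr[2] << 8) | hdr[3]);
}

/* Non-zero when every bit in mask is set in the flags byte. */
int tcp_has_flags(const uint8_t *hdr, uint8_t mask)
{
    return (hdr[13] & mask) == mask;
}
```

For a header whose byte 12 is 0x50 and byte 13 is 0x12, `tcp_header_len` gives 20 and `tcp_has_flags(hdr, TCP_SYN | TCP_ACK)` is true, which is the second packet of the three-way handshake.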

UDP#

User Datagram Protocol; connectionless, unreliable data transmission protocol

  • Connectionless: No handshake required
  • Unreliable: The sender does not check whether the other party received the data
  • Advantages: Flexible, low cost
  • Header Format
  • Image
  • Much simpler compared to TCP

——Socket——#

Analogous to a courier, but each one serves only a single task
A socket is the interface between a process and the transport layer; the process hands its network data to the socket, which passes it on to the transport layer

【Life and Death】#

Socket: Create Socket#

  • int socket(int domain, int type, int protocol);
  • domain: Address family
    • AF_INET, corresponds to IPv4 [commonly used]
    • AF_INET6, corresponds to IPv6
  • type: Socket type
    • SOCK_STREAM, corresponds to byte stream [TCP]
    • SOCK_DGRAM, corresponds to datagram [UDP]
  • protocol: Transport protocol
    • The domain and type often determine the protocol uniquely, e.g., AF_INET with SOCK_STREAM determines IPPROTO_TCP
    • [PS] When only one protocol is possible, you can pass 0 instead
  • Return Value: File descriptor
    • Returns -1 on error
    • A socket is also a file: everything is a file
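
As a small illustration of the call described above, here is a sketch that creates a TCP/IPv4 socket and checks the return value. The helper name `open_tcp_socket` is hypothetical.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Returns a TCP/IPv4 socket descriptor, or -1 after printing the error. */
int open_tcp_socket(void)
{
    /* protocol 0: AF_INET + SOCK_STREAM already implies IPPROTO_TCP */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    return fd;
}
```

The returned value is an ordinary file descriptor, so it is released with `close(fd)` like any other file.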

Close: Close Connection#

  • int close(int fd);
  • Four-way handshake [See Additional Knowledge Points for details]
  • Both ends need to call close; the end that calls close sends a FIN, and recv on the other end then returns 0
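
The "recv returns 0" behavior can be observed without a network. The sketch below assumes a connected UNIX-domain stream socket pair as a stand-in for a TCP connection (no real FIN segment is involved, but the end-of-stream result seen by recv is the same); the function name is made up for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Closes one end of a connected stream socket pair and returns what recv()
 * reports on the other end: 0 means the peer closed, -1 means an error.
 */
ssize_t recv_after_peer_close(void)
{
    int sv[2];
    char buf[16];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    close(sv[0]);                                  /* one side closes */
    ssize_t n = recv(sv[1], buf, sizeof(buf), 0);  /* peer sees end of stream */
    close(sv[1]);                                  /* the other side closes too */
    return n;
}
```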

【Service】#

Bind: Bind IP and Port#

Usually needed only by the side that receives data [the server]

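  • int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
  • sockfd: File descriptor
  • addr: IP address and port
    • Binding IP: Receive only data addressed to that IP address [an address of the local machine]
      • If left unspecified [INADDR_ANY], data addressed to any local IP address is received
      • Binding to a single address is useful on a machine at the junction of internal and external networks, where it acts like a firewall
    • Binding Port: Which port it serves [A total of $2^{16}=65536$ ports]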
  • addrlen: Address length
  • Return Value: Success, 0; otherwise, -1

sockaddr

  • Image
  • sa_family: Address family, generally AF_INET, corresponding to IPv4
  • sa_data: Contains both the IP address and the port
  • ❗ Inconvenient to fill in directly; use the friendlier sockaddr_in below 👇, then cast it with (struct sockaddr *)

sockaddr_in

  • Image
  • sin_port: Port number [Requires network byte order, see below]
  • sin_addr: IP address
  • sin_addr is itself a structure, in_addr
    • Image
    • Stores a 32-bit unsigned integer [s_addr]; the inet_addr function is generally used to convert a dotted-decimal string into this 32-bit value in network byte order:
      • Image
      • Dotted-decimal notation [string form] is more convenient to write
      • inet_ntoa does the reverse
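
Putting sockaddr_in, inet_addr, and the port field together, a fill-in helper might look like the sketch below. The name `fill_ipv4_addr` is hypothetical, and the INADDR_NONE check is one possible way to reject bad input (it also rejects the valid address 255.255.255.255, a known limitation of inet_addr).

```c
#include <string.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Fills *addr for IPv4. Returns 0 on success, -1 if ip is not a valid
 * dotted-decimal string.
 */
int fill_ipv4_addr(struct sockaddr_in *addr, const char *ip, uint16_t port)
{
    in_addr_t raw = inet_addr(ip);   /* result is already in network byte order */
    if (raw == INADDR_NONE)          /* note: also rejects 255.255.255.255 */
        return -1;

    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);    /* the port must be in network byte order */
    addr->sin_addr.s_addr = raw;
    return 0;
}
```

The filled structure is then passed to bind or connect as `(struct sockaddr *)&addr` together with `sizeof(addr)`.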

+ Host Byte Order & Network Byte Order#

  • Host Byte Order: Big-endian or little-endian, depending on the machine
    • Little-endian machines are the most common: the low-order byte is stored at the lowest memory address
  • Network Byte Order: Big-endian; for a 4-byte (32-bit) value, the most significant byte is transmitted first and the least significant byte last
  • Functions for converting integer byte order
    • Image
    • htonl: Converts 32-bit host byte order to network byte order
    • htons: Converts 16-bit host byte order to network byte order
    • ntohl, ntohs, the reverse
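
To make the byte-order conversion concrete, the sketch below inspects the in-memory bytes of a converted value. Both helper names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    uint8_t first;
    memcpy(&first, &probe, 1);       /* byte stored at the lowest address */
    return first == 0x02;
}

/* Copies the in-memory bytes of htons(value) into out[0] and out[1]. */
void network_bytes16(uint16_t value, uint8_t out[2])
{
    uint16_t net = htons(value);
    memcpy(out, &net, sizeof(net));
}
```

Whatever the host order is, `network_bytes16(0x1234, out)` leaves 0x12 in `out[0]` and 0x34 in `out[1]`, which is the order in which the bytes go onto the wire.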

Listen: Set to Listening State#

Switch the socket from active (default) to passive [First need to bind the port]

  • int listen(int sockfd, int backlog);
  • Note: On Linux, the second parameter [backlog] is the length of the completed-connection queue
    • ① The TCP connection process has two queues
      • Incomplete queue: The client has sent a SYN and the server has replied with SYN+ACK; the server is in the SYN_RECV state and the connection sits in the incomplete queue
      • Completed queue: After the client replies with ACK, both sides are in the ESTABLISHED state and the connection moves from the incomplete queue to the completed queue
      • 👉 When the server calls accept, a connection is removed from the completed queue
    • ② Note: Set an appropriate backlog; the server should accept new connections as soon as possible

Accept: Accept Connection#

Generates a new courier [the listening socket can go on to accept more connections]

  • int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
  • ① sockfd must already have gone through socket(), bind(), and listen()
  • ② addr is an output parameter that receives the client address
  • Return Value
    • On success, returns a new sockfd; the original sockfd can still be used to accept
    • On failure, returns -1
  • [PS] Generally, the new sockfd is closed after use, while the listening socket stays open

【Client】#

Connect: Establish Connection#

An active socket; it can be connected to at most one peer

  • int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
  • Unlike accept:
    • sockfd does not need to go through bind() or listen()
    • It does not return a new socket

⭐ Connect and accept are a pair, executed on the client and server respectively, during which the three-way handshake is completed
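
The pairing of connect and accept can be tried inside a single process on the loopback interface. The sketch below is only an illustration under a few assumptions: it binds to 127.0.0.1 with port 0 so the kernel picks a free port, uses a backlog of 8, and relies on the kernel finishing the handshake for a listening socket before accept is called. The function name is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Runs listener, client, and accept in one process on 127.0.0.1.
 * Returns 0 if the connection was established, -1 on any failure.
 */
int loopback_handshake(void)
{
    int rc = -1, lfd = -1, cfd = -1, afd = -1;
    struct sockaddr_in addr, peer;
    socklen_t len = sizeof(addr), plen = sizeof(peer);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                      /* 0: kernel picks a free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    if (listen(lfd, 8) < 0) goto out;
    /* learn which port the kernel assigned */
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) goto out;

    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto out;
    /* the kernel completes the handshake; the connection waits in the queue */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;

    /* accept() takes the established connection and returns a new descriptor */
    if ((afd = accept(lfd, (struct sockaddr *)&peer, &plen)) < 0) goto out;
    rc = 0;
out:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return rc;
}
```

Note that three descriptors exist at the end: the listening socket, the client's connected socket, and the new socket returned by accept.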

【Transmission】#

Send: Send Data#

With flags set to 0, essentially the same as write

  • ssize_t send(int sockfd, const void *buf, size_t len, int flags);
  • ❗ sendto additionally takes dest_addr and addrlen, and is used for UDP
    • Because no connection is established, the destination IP and port must be specified on every call
  • flags is generally set to 0

Recv: Receive Data#

With flags set to 0, essentially the same as read

  • ssize_t recv(int sockfd, void *buf, size_t len, int flags);
  • When the other party closes the connection, the return value is 0
  • ❗ recvfrom additionally takes src_addr and addrlen, and is used for UDP
    • src_addr receives the address of the sender
  • Blocking by default
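
A minimal send/recv round trip might look like the sketch below. It assumes a UNIX-domain stream socket pair in place of a real TCP connection so that it runs in one process; the function name is hypothetical, and an empty message is rejected because recv would otherwise block waiting for data.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Sends msg over a connected stream socket pair and receives it into out.
 * Returns the number of bytes received, or -1 on error.
 */
ssize_t send_then_recv(const char *msg, char *out, size_t outsize)
{
    int sv[2];
    ssize_t n = -1;
    size_t len = strlen(msg);

    if (len == 0 || outsize == 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* send exactly the bytes of the message: its length, not the buffer size */
    if (send(sv[0], msg, len, 0) == (ssize_t)len) {
        /* receive up to the buffer capacity, leaving room for '\0' */
        n = recv(sv[1], out, outsize - 1, 0);
        if (n >= 0)
            out[n] = '\0';
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

On a real TCP connection, send may accept fewer bytes than requested and recv may return only part of a message, so production code usually loops on both.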

——Additional——#

Kill#

Send a signal to a process

  • man 2 kill
  • Prototype
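    • int kill(pid_t pid, int sig);
    • Takes a process ID and a signal number
  • Description
    • Image
    • pid can take several forms [a single process, a process group, or every process the caller may signal]
    • If sig is 0, no signal is sent, but the existence and permission checks are still performed
  • Return Value
    • Image
    • 0, success; -1, error
  • kill -l to view the signal list
    • Image
    • On Linux, signal numbers go up to 64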
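
One common use of the existence and permission checks is probing whether a process is alive by sending signal 0. The sketch below shows that idea; the helper name `process_exists` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* expose kill() under strict compiler modes */
#include <sys/types.h>
#include <signal.h>
#include <errno.h>
#include <unistd.h>

/*
 * Returns 1 if a process with this pid exists, 0 if not, -1 on other errors.
 * Signal 0 delivers nothing; only the existence and permission checks run.
 */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)      /* exists, but we are not allowed to signal it */
        return 1;
    if (errno == ESRCH)      /* no such process */
        return 0;
    return -1;
}
```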

Signal#

Signal handling method

  • man signal
  • Prototype
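    • sighandler_t signal(int signum, sighandler_t handler);
    • The handler must be a function of type sighandler_t [void (*)(int)]
  • Description
    • Image
    • Its behavior varies across UNIX versions
    • The handler can be one of three kinds: ignore [SIG_IGN], default [SIG_DFL], or a custom function
    • A custom handler works like a mousetrap: after it catches one signal, the next one may be missed
      • On systems that restore the default action after each delivery, the handler needs to be installed again
  • Return Value
    • Image
    • Returns the previous handler, or SIG_ERR on error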
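
Because the behavior of signal differs between UNIX versions, portable code often uses sigaction instead. The sketch below swaps in sigaction (it is not the signal call described above) so that the handler stays installed after each delivery; the helper names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* expose sigaction under strict compiler modes */
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;

/* Handler: only sets a flag, which is safe inside a signal handler. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Installs on_signal for signo; returns 0 on success, -1 on error. */
int install_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;   /* custom handler, not SIG_IGN or SIG_DFL */
    sigemptyset(&sa.sa_mask);    /* block no extra signals while it runs */
    return sigaction(signo, &sa, NULL);
}

/* Returns 1 once a signal has been caught, 0 before that. */
int signal_was_caught(void)
{
    return got_signal;
}
```

After `install_handler(SIGUSR1)`, calling `raise(SIGUSR1)` twice in a row runs the handler both times; with a one-shot handler the second signal would trigger the default action and terminate the process.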

Code Demonstration#

Server#

tcp_server.h

  • Image
  • Create a courier in a listening state on the specified port

tcp_server.c

  • Image
  • Read the code in the order of its numbered steps
  • Note: Add socket-related header files in head.h, which can be found in the man manual, not elaborated here
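
The original listing is only available as an image, so the following is a sketch of what such a helper might look like, not the author's code. The name `socket_create`, the backlog of 20, and binding to all local interfaces are assumptions.

```c
#include <string.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Returns a listening TCP socket bound to the given port on all local
 * interfaces, or -1 on error.
 */
int socket_create(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);               /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 20) < 0) {
        close(fd);                             /* do not leak the descriptor */
        return -1;
    }
    return fd;
}
```

A server would call something like `socket_create(8888)` once and then pass the returned descriptor to accept in a loop.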

1.server.c

  • Image
  • Image
  • Accept can obtain the client's address information
  • Create a child process dedicated to data transmission
  • Pay attention to error detection at every step
    • Handling of disconnection (FIN, recv returns 0)
  • Sending and receiving strategies differ
    • Send exactly as much as there is; receive as much as possible
    • send uses strlen [the length of the message], recv uses sizeof [the capacity of the buffer]

Client#

tcp_client.h

  • Image
  • Actively connect to the specified IP [dotted decimal IPv4 string] and port

tcp_client.c

  • Image
  • Fill in the server address structure from the arguments passed in
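
Since the original listing is an image, here is a sketch of what a connect helper of this kind might look like, not the author's code. The name `socket_connect` is an assumption, and inet_pton is used in place of inet_addr because its return value distinguishes invalid input unambiguously.

```c
#include <string.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Connects to ip:port over TCP/IPv4. Returns the connected socket,
 * or -1 if the address is invalid or the connection fails.
 */
int socket_connect(const char *ip, uint16_t port)
{
    struct sockaddr_in server;
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &server.sin_addr) != 1)
        return -1;                   /* not a dotted-decimal IPv4 string */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* no bind() or listen() needed on the client side */
    if (connect(fd, (struct sockaddr *)&server, sizeof(server)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```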

1.client.c

  • Image
  • Image
  • Added signal catching
  • Use of bzero to initialize buff variable

Effect Display#

  • Image
  • Left: Server, Right: Client [Can be multi-user]
  • Establish connection, address capture, data transmission, disconnection
  • Use netstat to view the listening status of the port
    • Image
    • Add -alnt option
  • [PS] Ports need to be opened in the security group of the cloud host console — port 8888

Additional Knowledge Points#

  • IP: A public addressing service that makes a best effort to deliver packets. In other words, it is unreliable [packets may be lost along the way]

Three-way Handshake, Four-way Handshake#

  • Image
  • Three-way handshake [SYN, ACK]
    • Image
    • First handshake: The client sends a SYN packet to the server [the client enters SYN_SENT state, waiting for server confirmation]
    • Second handshake: The server receives the SYN and must acknowledge it, so it sets ACK and also sets its own SYN, i.e., a SYN+ACK packet [the server transitions from LISTEN to SYN_RECV state]
    • Third handshake: The client receives the server's SYN+ACK packet and sends an ACK confirmation packet to the server. After sending, the client enters ESTABLISHED state, and the server also enters ESTABLISHED state after receiving the ACK
    • Note: Each acknowledgment number is the sequence number of the packet being confirmed plus one
  • Four-way handshake [FIN, ACK]
    • Image
    • First wave: Assuming the client wants to close the connection, the client sends a FIN packet, indicating that it has no more data to send [can still receive data at this time] [the client enters FIN_WAIT_1 state]
    • Second wave: The server replies with an ACK packet, indicating that it has received the client's request to close the connection, but it may still have data to send before it closes [the server enters CLOSE_WAIT state]
      • After receiving this ACK, the client enters FIN_WAIT_2 state, waiting for the server to close the connection
    • Third wave: When the server is ready to close the connection, it sends a FIN to the client [the server enters LAST_ACK state, waiting for the client's confirmation]
    • Fourth wave: The client receives the close request from the server and sends an ACK packet [the client enters TIME_WAIT state, waiting for the possible timeout retransmission of the FIN packet for 2 MSL time]
      • After the server receives this ACK, it closes the connection and enters CLOSED state
      • If the client receives no retransmitted FIN from the server within 2 MSL, it concludes that the server has closed the connection normally, so it also closes the connection and enters CLOSED state; otherwise, it sends the ACK again
  • Reference Three-way Handshake and Four-way Handshake — Blog [Note: In the fourth wave, the client waits for the timeout retransmission of the FIN rather than ACK]

Additional: Meaning of 2 MSL#

How is TIME_WAIT triggered, what role does it play, what drawbacks does it cause in programming, and how can they be solved?

  • Cause: During the TCP four-way handshake, the client [the side that closes first] receives the server's FIN in the third wave, sends the final ACK as the fourth wave, and then enters the TIME_WAIT state
    • The client must then wait for two maximum segment lifetimes (MSL) before it can enter the CLOSED state
  • Reasons for existence
    • Prevent delayed segments
      • Each TCP segment carries a sequence number, which TCP relies on for reliability
      • To ensure that segments of a new TCP connection are not confused with segments of an old connection that are still in transit, the connection must wait at least as long as a stray segment can survive in the network, which is one MSL
      • This prevents delayed segments from being accepted by a later TCP connection that uses the same source address, source port, destination address, and destination port
    • Ensure connection closure
      • If the client does not wait long enough and reconnects to the server before the server has received the ACK, the following happens:
        • The server, not having received the ACK, still considers the old connection valid
        • When the client sends a new SYN to request a handshake, it receives an RST from the server, and the connection attempt is aborted
      • Therefore, the client must make sure the remote end closes correctly, that is, wait until the passive closer has received the ACK for its FIN
  • Programming impact
    • In high-concurrency scenarios, connections in TIME_WAIT accumulate easily
    • MSL is often taken to be 60s [the exact value depends on the implementation], which can be unacceptable: a TCP connection may communicate for only a few seconds, yet TIME_WAIT then lasts 2 minutes
  • Solutions
    • Rely on TCP timestamps, which record the time a packet was sent and the time the most recent packet was received
    • Then combine them with two Linux kernel parameters
      • reuse [net.ipv4.tcp_tw_reuse]: Allows the side that actively closed a connection to reuse a TIME_WAIT connection when it initiates a new connection
      • recycle [net.ipv4.tcp_tw_recycle]: The kernel quickly reclaims connections in TIME_WAIT, waiting only one RTO [retransmission timeout]; note that this parameter was removed in Linux 4.12

Socket Programming in C Language#

  • Server: socket, sockaddr[_in], bind, listen; accept, send/recv; close
  • Client: socket, sockaddr[_in], connect; send/recv; close
  • Image
  • Stream socket based on TCP, datagram socket based on UDP
    • UDP server also needs to bind IP and port, but does not need to listen, using sendto, recvfrom to send and receive information
  • sockaddr[_in]: Structure that saves socket information, use [_in] to fill in information, then convert to sockaddr
  • The server needs two sockets: one for listening, and one [returned by accept] for communicating with the client that called connect
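
For comparison with the TCP flow, the UDP path (bind without listen, then sendto and recvfrom) can be sketched as follows. This assumes loopback delivery within one process, with the kernel choosing the port; the function name is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Sends msg from one UDP socket to another on 127.0.0.1 and stores it in out.
 * Returns the datagram length, or -1 on error.
 */
ssize_t udp_loopback(const char *msg, char *out, size_t outsize)
{
    ssize_t n = -1;
    int rfd = -1, sfd = -1;
    struct sockaddr_in addr, from;
    socklen_t len = sizeof(addr), flen = sizeof(from);

    if (outsize == 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                      /* 0: kernel picks a free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if ((rfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) goto out;
    /* the receiver binds, but never calls listen() or accept() */
    if (bind(rfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    if (getsockname(rfd, (struct sockaddr *)&addr, &len) < 0) goto out;

    if ((sfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) goto out;
    /* no connection exists, so the destination is given on every call */
    if (sendto(sfd, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;

    /* from receives the sender's address */
    n = recvfrom(rfd, out, outsize - 1, 0, (struct sockaddr *)&from, &flen);
    if (n >= 0)
        out[n] = '\0';
out:
    if (sfd >= 0) close(sfd);
    if (rfd >= 0) close(rfd);
    return n;
}
```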

Enter kaikeba.com and press Enter#

-> With a TCP connection to be established, what happens from the moment the local machine sends the first request packet until it receives the first response packet?

  • [Macro level] DNS 👉 TCP connection [Application layer, Transport layer, Network layer, Data link layer] 👉 Server processes request 👉 Returns response result
  • DNS
    • Local hosts, local DNS resolver cache
    • Local DNS
    • Iterative/Recursive: Root DNS server, Top-level DNS, Authoritative DNS
    • Until the IP address corresponding to the domain name is found
  • TCP Connection
    • Application layer: Send HTTP request — request method, URL, HTTP version
    • Transport layer: Perform three-way handshake with the server
    • Network layer: The ARP protocol looks up the MAC address corresponding to an IP. If the destination is within the local area network, send directly to its MAC address; otherwise, use the routing table to find the next-hop address, then resolve the MAC address of that next hop
    • Data link layer: Ethernet protocol
    • Broadcast: Send requests to all machines in a local area network, comparing MAC addresses
  • Web Server
    • Parses user requests, knows which resource files need to be scheduled, and calls database information to return to the browser client
  • Return response result
    • Generally, there will be an HTTP status code, such as 200, 301, 404, etc. Through this status code, we can know whether the server-side processing is normal and understand the specific error
  • ⭐ Recommended video: TCP-IP Explained (2000) — Youtube
    • [Mainly unfolds from the IP layer]
    • Involved objects: TCP packets, ICMP Ping packets, UDP packets, Ping of Death, routers, router switches...
    • General process
      • Local: Encapsulate packets, local transmission, local router selection, switch selection, proxy check, firewall check, local transmission, router selection
      • ——> Network transmission ——
      • Response end: Firewall check [supervising port], proxy checks request packets, returns corresponding information to the request end, same local process [encapsulate packets, ..., router selection]

Can one port be bound to different services simultaneously?

  • Yes. Incoming data is matched to a socket by the five-tuple {Transport Protocol, Source IP, Source Port, Destination IP, Destination Port}
  • For example:
    • A TCP socket and a UDP socket can listen on the same port; their incoming data does not interfere, so there is no conflict
    • Similarly, each socket returned by accept still uses the same local port
      • Many different sockets are created; the destination IP and port they contain stay the same, and only the source IP and port vary [Port reuse]
  • [PS] A TCP socket only exchanges data with another TCP socket

Socket Relationship Between Parent and Child Processes#

The relationship between a socket in a child process forked from the parent and the same socket in the parent process

  • They are the same socket and refer to the same open file
  • When data arrives, whichever of the two processes reads first gets that data, and the other process keeps waiting
  • Therefore, a child process should generally not keep resources it does not need; for example, the child can call close on a socket inherited from the parent that it will not use
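
The habit of closing inherited descriptors can be sketched with fork and a socket pair. This is an illustration only: a UNIX-domain socket pair stands in for the listening and connected sockets of a real server, and the function name is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that sends msg over one end of a socket pair while the
 * parent reads from the other end. Returns bytes read, or -1 on error.
 */
ssize_t child_sends_parent_reads(const char *msg, char *out, size_t outsize)
{
    int sv[2];
    if (outsize == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();              /* both processes now hold sv[0] and sv[1] */
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);                /* child does not need the parent's end */
        send(sv[1], msg, strlen(msg), 0);
        close(sv[1]);
        _exit(0);
    }

    close(sv[1]);                    /* parent does not need the child's end */
    ssize_t n = recv(sv[0], out, outsize - 1, 0);
    if (n >= 0)
        out[n] = '\0';
    close(sv[0]);
    waitpid(pid, NULL, 0);           /* reap the child */
    return n;
}
```

In a fork-per-connection TCP server the same pattern applies: the child closes the listening socket, and the parent closes the connected socket returned by accept.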

Tips#

  • System/network programming should consider all possible error points
  • Signal knowledge expansion: Implement your own sleep function
  • Remember to consider all related source files [*.c] during compilation
