Bo2SS

Network Communication | The Things About HTTP(S)

Hello everyone, this is Bo2SS~ Time flies, it's been a year since graduation, and the company has injected fresh blood. There is a large front-end newcomer training in the department, and I boldly signed up to share some knowledge related to HTTP. In fact, I had not systematically studied HTTP before, so I prepared for this sharing for two months in advance. After the sharing last week, according to the feedback from the questionnaire, everyone gave it a five-star 🌟 rating, so I’ll record it here~

Preface#

As the title suggests, today we will talk about HTTP and HTTPS in network communication.

Q: Why share this topic?

  1. 🫡 Common in life. HTTP is everywhere on the internet; we use it when we watch dramas, scroll through short videos, or do "Google-oriented programming". As developers, we have an obligation to understand it in depth.

  2. 🤔 Common in work. In our work, we often encounter related issues. For example, during the front-end and back-end interface debugging, if we encounter unexpected situations, the first thing we need to pay attention to is some information in the HTTP request. We should be familiar with its structure and some specifications.

  3. 📖 Ideas to reference. HTTP has developed for more than 30 years and has three major versions. Many of its design ideas are worth referencing in our development.

Q: What materials were referenced?

I did a lot of preparatory work before this sharing, mainly referencing:

  1. The course “Perspective on HTTP Protocol” on Geek Time. I might have listened to this series about ten times. If anyone wants to delve deeper into more details, you can check it out. The author also provides a practical learning repository chronolaw/http_study, through which you can easily set up a web server and access resources inside it via a browser to understand HTTP.

  2. Xiao Lin Coding. This is a public account I have been following for a long time, which contains many illustrated articles about computer fundamentals. It now also has a website version.

  3. Bo2SS. This is my own public account, right here👋. I have previously written two articles related to HTTPS:

    1. Information Security | How to Establish Trust in the Internet Age?

    2. Information Security | (Extra) How to Establish Trust in the Internet Age?

Additionally, some third-party images are referenced in the text, but I won't list the sources one by one. If there are any issues, please feel free to contact me.

Q: What are the goals of this sharing?

  1. 🔍 Quickly locate HTTP issues. As mentioned earlier, during front-end and back-end debugging, we may often encounter situations where the results do not meet expectations. We should first be able to quickly locate whether the issue is on the back-end or front-end through the status code.

  2. 🥣 Familiarize with common header fields in HTTP messages. By familiarizing ourselves with common header fields, we can not only grasp the basic functions of HTTP but also learn many design ideas of HTTP. Where can we see this message? Each end generally has packet capture tools to view it.

  3. 🔐 Understand basic encryption knowledge. In the internet age, user privacy and business confidentiality are very important.

🏁Ultimate Goal: After reading this article, you will have the ability to independently delve into learning HTTP, such as using WireShark, Chrome, Telnet tools, or even looking at RFC documents, which contain almost all important information related to networks.

➕ Some materials: User guides for various tools (WireShark, Chrome, Telnet) and an RFC document summary.

Q: What content will be shared this time?

This is the outline of this sharing:

In simple terms, today we will discuss what HTTP and HTTPS are, and how they have developed.


Alright, let's get to the main topic:

What is HTTP?#

What is HTTP, and what is it not?#

image

HTTP stands for Hypertext Transfer Protocol, which breaks down into three words: hypertext, transfer, and protocol. Let's explain them from back to front:

  1. Protocol. What is a protocol? We can relate it to a rental agreement or a tripartite agreement; they all mean the same thing. In the Chinese word for protocol (协议), the first character (协) implies that there are two or more participants, while the second (议) implies an agreement and specification that stipulates what you can and cannot do.

  2. Transfer. Then there is transfer. We can relate it to express delivery, which is specifically for transferring between two points. The key points are two: the first is bidirectionality; we can send and receive packages; the second is that the transfer process can have intermediaries. For example, when we send a package, it goes through the delivery person, the courier company, the logistics warehouse, etc., before finally reaching the recipient, and all these intermediaries also abide by the protocol.

  3. Hypertext. Finally, regarding hypertext, it refers to text that goes beyond ordinary text. Here I want to ask everyone a question: besides text, images, audio, and video formats, what is the most crucial format of hypertext? That's right, it's hyperlinks. Hyperlinks allow us to jump from one "hypertext" to another, transforming our text from a linear structure to a non-linear web structure.

In summary: “HTTP is an agreement and specification for transferring hypertext data such as text, images, audio, and video between two points in the computer world.”

image

This image shows the participants in our basic HTTP communication process. From this image, we can clarify what HTTP is not, thus gaining a clearer understanding of HTTP.

  • HTTP is not an entity, such as the web browser on the left (the sender) and the web server on the right (the receiver).

  • HTTP is not the internet; the hypertext resources transmitted by HTTP are just a part of internet resources.

  • HTTP is not a programming language, but HTTP supports various programming languages for implementation.

  • HTTP is not HTML; HTTP can transmit HTML, which is a common format of hypertext.

  • HTTP is not an isolated protocol. Typically, there are some underlying protocols supporting HTTP, such as TCP, IP, DNS, etc.; above HTTP, there are also some protocols that depend on HTTP, such as WebSocket, HTTPDNS, etc. These protocols are interwoven, forming a network of protocols, with HTTP at the center.

Overview of the HTTP World#

Let's take a look at the overall picture of the HTTP world, mainly divided into application-related and theory-related aspects. With a more macro understanding of the HTTP communication link, we can more clearly identify which link might cause the problem.

image

Looking from right to left:

  • Internet - WWW: The internet is the global network that stores all kinds of information resources. The WWW (World Wide Web) is a subset of the internet built on HTTP, so it stores hypertext resources, which account for roughly 90% of the resources on the internet.

  • Web Browser: The web browser is the requester in the HTTP communication process and can display the requested resources.

  • Web Server: The web server is the responder in the HTTP communication process, managing network resources. It is generally divided into hardware and software; hardware refers to physical servers, cloud servers, etc., while software refers to applications like Nginx, Apache, etc.

  • CDN: Content Delivery Network, which acts as an intermediary in the HTTP transfer process, serving as a network proxy. It can cache server resources, speed up network responses, and provide load balancing, security protection, etc.

  • Crawler: This refers to web crawlers. Similar to web browsers, it can also be understood as a type of user agent, generally used for automatically scraping data for major search engines and storing it in databases.

  • Others: Other components include HTML, web services, and WAF. Web services can be understood as specific services or service development specifications running on web servers. WAF stands for Web Application Firewall, which is also a type of proxy.

image

Again, looking from right to left, the right side shows HTTP/1.1, HTTPS, HTTP/2, and HTTP/3, which are the main protocols we will discuss today. On the left:

  • TCP/IP: It actually represents a protocol stack that contains many network communication protocols.

image

Here we draw an analogy between the HTTP communication process based on TCP/IP and express delivery:

  1. Hypertext => MAC: In the left image, the column on the left represents the hypertext to be transmitted from the application layer to the link layer. Each layer adds the corresponding header, such as HTTP header, TCP header, IP header, MAC header, just like the packaging process of a package;

  2. MAC => Hypertext: In the right column of the left image, the data being transmitted has a header removed at each layer, just like the unpacking process of a package.


  • URI - URL: URI (I - Identifier) is a Uniform Resource Identifier, which is divided into two forms: URL and URN. However, since the latter is not commonly used in the internet world, URI generally refers to URL.
  1. URL (L - Locator): The address at the top of our browser is the URL.

image

Its basic components are shown in the image above; we can first focus on the red box part:

  1. scheme: The leftmost scheme represents the protocol, such as http, https, ftp, etc. Note that the :// symbol immediately following the protocol is fixed and necessary.

  2. host: The middle host is the hostname, also called the domain name, which will be elaborated on when discussing DNS.

  3. path: The last part, the path, represents the resource path.

Q: Here’s a question: Is the domain name in the example URL www.creatorseo.com/?

The answer is no; the trailing slash / belongs to the path, representing the root directory of the accessed host. Since most computers on the early internet were UNIX systems, the path format here adopts the file path style of UNIX.

Below is another image showing the complete format of a URL.

image

This image includes three additional components:

  1. user@: We can fill in user password information in the URL, but it is no longer recommended for security reasons.

  2. ?query: This part can add some additional requirements for the resource, starting with ? and consisting of multiple key-value pairs k=v, each pair connected by &.

  3. #fragment: It represents a fragment identifier, which can be understood as an anchor within the resource, intended for client use and not sent to the server. When we read some blogs (like clicking on a title in a floating directory to jump to my blog page), this part is used.

💡 Here are two small reminders~ One is that the host can also specify the port through :port. The other is that when discussing URLs, we often mention escape and encode concepts, because without them, the server may not be able to process the URL correctly. Think about it: if the path also contains a ? symbol, how would the server parse the starting position of the query?

  1. escape - escaping: For special characters, we generally escape them by converting them directly into their ASCII code's hexadecimal representation and adding a % prefix, such as SPACE corresponding to %20, and ? corresponding to %3F;

  2. encode - encoding: For Chinese characters and other languages, we generally perform UTF-8 encoding first, then escape. If you don't believe it, try copying and pasting a URL containing Chinese characters into WeChat (like clicking on "Read the original" to jump to my blog page).

  2. URN (N - Name): Now back to the second form of URI, URN, which marks resources in the form of a namespace plus a specific identifier, such as urn:<NAMESPACE-IDENTIFIER>:<NAMESPACE-SPECIFIC-STRING>. It is not commonly used when we go online, but if you buy a book, you might find a string of characters in the barcode position of each book, such as ISBN xxx-x-xx-xxxx, which is actually a use of URN.

image
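As a minimal sketch of the URL components and the escape/encode rules described above, Python's standard urllib.parse module can split a URL and percent-encode text. The URL below is made up purely for illustration, and the helper names split_url and escape are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs, quote, unquote

# Hypothetical URL, invented for this example only.
URL = "https://user@www.example.com:8443/docs/index.html?k1=v1&k2=v2#part-2"


def split_url(url):
    """Return the URL components discussed above as a plain dict."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parse_qs(parts.query),   # k=v pairs joined by &
        "fragment": parts.fragment,       # kept by the client, not sent to the server
    }


def escape(text):
    """UTF-8 encode the text, then turn every unsafe byte into a %XX escape."""
    return quote(text, safe="")


if __name__ == "__main__":
    print(split_url(URL))
    print(escape(" "), escape("?"))
    print(escape("中文"), unquote("%E4%B8%AD%E6%96%87"))
```

Note that the fragment is available to the client after parsing, which matches the point that it is never sent to the server.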


  • DNS: The full name is Domain Name System, which is an application layer protocol used for domain name resolution, converting domain names into IP addresses.

Let's first look at the structure of a domain name; the red box part here is the domain name, which is a hierarchical structure separated by ., with the rightmost part being the highest level. From right to left, they are the top-level domain, second-level domain, third-level domain, and so on.

image

Next, let's look at the types of DNS and the steps for DNS to resolve domain names, as shown in the image below:

image

  1. DNS Types. DNS servers are divided into root DNS, top-level DNS, authoritative DNS, and non-authoritative DNS. There are 13 groups of root DNS servers distributed globally; they hand a query over to the corresponding top-level DNS based on the requested top-level domain. The top-level DNS then points to the authoritative DNS based on the second-level domain, and so on until the IP address for the domain name is resolved. Some large companies also run their own DNS servers, known as non-authoritative DNS, which are more widely distributed. Well-known examples include Google's 8.8.8.8, Level 3's 4.2.2.1, and Cloudflare's 1.1.1.1.

  2. Steps for DNS to resolve domain names. The actual resolution process is divided into four steps: the system first looks for DNS cache, which may be in the browser or the system; if not found, it checks the hosts file, which contains our custom domain-IP mapping rules, with the hosts file path on Mac being /etc/hosts; if still not matched, it queries the non-authoritative DNS, generally defaulting to the one specified by our network operator; if still unresolved, the root DNS resolution process must be followed~

💡 Here are some commonly used commands related to domain name resolution (dig, host, nslookup); if you're interested, you can try them in the terminal~

1. DNS addressing process: dig www.baidu.com +trace @8.8.8.8
2. domain name <=> IP: host www.baidu.com
3. domain => IP: nslookup www.baidu.com

If you know how to use WireShark, you can also filter out DNS resolution-related packets using filter: port 53.
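The four-step lookup order described above can be sketched as a toy model. This is not how an operating system resolver is implemented; the dictionaries stand in for the real cache and DNS servers, and every name and address in it is an assumption made for illustration.

```python
def parse_hosts(text):
    """Parse hosts-file text (the /etc/hosts format) into {domain: ip}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name] = ip
    return table


def resolve(domain, cache, hosts, local_dns, root_lookup):
    """Try each source in the order described above; return (ip, source)."""
    if domain in cache:                 # step 1: browser or system cache
        return cache[domain], "cache"
    if domain in hosts:                 # step 2: hosts file
        return hosts[domain], "hosts"
    if domain in local_dns:             # step 3: non-authoritative DNS
        return local_dns[domain], "non-authoritative DNS"
    return root_lookup(domain), "root DNS"   # step 4: walk down from the root


# Hypothetical hosts-file content for the example.
HOSTS_TEXT = """
127.0.0.1   localhost
10.0.0.5    dev.example.test   # custom mapping
"""

if __name__ == "__main__":
    hosts = parse_hosts(HOSTS_TEXT)
    print(resolve("dev.example.test", {}, hosts, {}, lambda d: "192.0.2.1"))
```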


  • Proxy: Proxies are generally divided into forward proxies and reverse proxies. Forward proxies sit closer to the client, while reverse proxies sit closer to the server. The CDN mentioned earlier is a reverse proxy, while the VPN we use to access external networks is a forward proxy.

HTTP Messages#

After all this groundwork (which is indeed worthwhile), we finally arrive at the most important part of HTTP!

The so-called HTTP, Hypertext Transfer Protocol, has its most important part in the last word "protocol," which stipulates the format and usage of HTTP messages.

Basic Format#

Let's first look at the basic format of HTTP messages, which can be simply divided into header and body:

image

  1. Header: In the broad sense, the header consists of the start line and the header fields. The image below shows the structure of the request header and the response header:

image

  1. Request Header

    1. The Start line consists of request method, URI, HTTP version, a space separator, and the final newline character.

    2. The Header consists of individual key:value pairs and the final newline character. Note that there should not be extra spaces before the :; if you don't believe it, try using the telnet command (on Mac, you can install telnet using brew install telnet, and it is recommended to use it with the practical repository provided by Geek Time chronolaw/http_study).

  2. Response Header

    1. The Start line consists of HTTP version, status code, status code explanation, a space separator, and the final newline character.

    2. The Header structure is the same as the request header.

  2. Body: Generally, the specific content of the body is agreed upon based on the business; it is optional.
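To make the message layout concrete, here is a minimal sketch that builds a raw request and splits a raw response into start line, header fields, and body. The function names and the example host are assumptions for illustration; a real client would use a library such as http.client.

```python
def build_request(method, target, headers, body=b""):
    """Assemble start line + header fields + blank line + optional body."""
    lines = [f"{method} {target} HTTP/1.1"]            # start line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"             # blank line ends the header
    return head.encode("ascii") + body


def parse_response(raw):
    """Split a raw response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = start_line.split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")            # no space allowed before ':'
        headers[name.lower()] = value.strip()
    return version, status, reason, headers, body


if __name__ == "__main__":
    print(build_request("GET", "/", {"Host": "www.example.com"}))
    print(parse_response(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"))
```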

Next, let's take a look at the specific request methods and status codes in the request line—

Request Methods#

image

HTTP/1.1 specifies eight request methods, which are divided into commonly used and less commonly used categories. Additionally, there is a category of extended request methods. Note that these methods must be in uppercase.

Here are the commonly used request methods:

  1. GET and HEAD, used to retrieve server resources. The difference between the two is that HEAD only retrieves header information, while GET retrieves the complete header and body information. So if you just want to confirm whether a resource exists or only need header information, you can use the HEAD request to reduce the transmission volume.

  2. POST and PUT, used to send resources to the server. The difference between the two is that the former creates resources on the server, similar to the CREATE operation in a database, while the latter modifies resources on the server, similar to the UPDATE operation. The two are quite similar, but PUT is used less frequently in practice.

💡 Speaking of request methods, two concepts are generally mentioned: safety and idempotence.

  1. Safety: Refers to not making substantial modifications to server resources, so the aforementioned GET and HEAD are safe.

  2. Idempotence: Refers to whether the result remains the same after executing the same operation multiple times, so the aforementioned GET, HEAD, and PUT are all idempotent.

Response Status Codes (5 Categories)#

Now we arrive at our first goal: 🔍 Quickly locate HTTP issues through status codes.

image

Status codes are generally divided into 5 major categories:

1xx: Informational. Generally represents an intermediate state of a request, which is relatively rare.

2xx: Success. This indicates that the request meets expectations, which is what we most want to see.

3xx: Redirection. The resource's location has changed, and the client usually needs to resend the request to a new URL.

4xx: Client error. If you see this, consider whether the request message is filled out incorrectly.

5xx: Server error. If you see this, you need to confirm the problem with the server-side colleagues.

Common specific status codes can be referenced in the image below:

image

  • 301: Permanent redirection, you can change the requested URL.

  • 302: Temporary redirection.

  • 304: The server resource has not changed, so the client is told to use its local cache.

  • 401: Unauthorized error, generally related to authentication and login.

  • 403: Access denied, possibly accessing sensitive information.

  • 404: Resource not found, possibly due to a wrong resource path or lack of permission to access (the error code is set by the server, 404 is generally used to indicate that the resource is not found, but it can also extend its usage to obscure specific reasons).

  • 405: Request method not allowed.

  • 502: Error code returned by the gateway or proxy, generally indicating an error accessing the server behind the gateway or proxy.

  • 503: The server is temporarily unavailable, please try again later.

💡 For 400 and 500, they are relatively vague error codes, sometimes returned as fallback error codes indicating an unknown error has occurred; sometimes, it is because the server does not want to expose too many details. In any case, the server can customize status codes as long as it adheres to public understanding as much as possible.
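As a small sketch of the "locate the problem by status code" idea, the first digit of the code is enough to pick the category, and Python's standard http.HTTPStatus enum supplies the standard reason phrases. The hint texts and function names are my own wording, not part of any specification.

```python
from http import HTTPStatus

# Where to look first, keyed by the first digit of the status code.
HINTS = {
    1: "informational: intermediate state, keep waiting",
    2: "success: the request met expectations",
    3: "redirection: follow Location or use the local cache",
    4: "client error: check the request message first",
    5: "server error: check with the server side",
}


def classify(code):
    """Return the troubleshooting hint for a status code."""
    return HINTS.get(code // 100, "unknown: not a standard category")


def describe(code):
    """Return 'code phrase -> hint'; custom codes get a placeholder phrase."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:                      # code not defined in the enum
        phrase = "(custom code)"
    return f"{code} {phrase} -> {classify(code)}"


if __name__ == "__main__":
    for code in (200, 304, 404, 502, 799):
        print(describe(code))
```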

Common Header Fields (8 Types)#

Quickly, we arrive at the second goal of this article: 🥣 Familiarize with common header fields in HTTP messages.

First, we divide the header fields into three main categories: Request, Response, and Universal. The Universal category also includes the Entity subclass. Request header fields are used by the requester, Response header fields are used by the responder, and Universal header fields can be used by both the requester and responder. Entity header fields are generally used to describe body attributes.

image

The image above lists common header fields. It may look overwhelming, but don't worry; we will explain them by function. Note: Header fields with the same fill color are generally used together or are related.

Next, we will explain the header fields mainly divided into 8 types by function.


  1. Body: Related to body attributes, which can describe either the request message or the response message.

When mentioning the type of body, we first need to understand what MIME (Multipurpose Internet Mail Extensions) type is. It originated from the email system and is now used to describe the type of body. Here is a summary of MIME types; you will find them familiar, such as application/json, text/html, text/javascript, etc. The first half represents a broad category, and the second half represents a specific format.

  • Accept indicates the types of body that the requester can accept, which may include more than one.

  • Content-Type indicates the actual type of body being transmitted.

To reduce the size of the body, we generally compress it. Common compression formats include gzip, deflate, and br, which work well for text.

  • Accept-Encoding indicates the compression formats that the requester can support, which may include more than one.

  • Content-Encoding indicates the actual compression format used for the transmitted body.
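A quick sketch with Python's standard gzip module illustrates why compression pays off mainly for text. The repeated HTML snippet and the random bytes (standing in for already-compressed media such as images or video) are assumptions chosen for the demonstration.

```python
import gzip
import os


def compression_ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)


# Repetitive text compresses very well.
TEXT_BODY = b"<p>Hello, HTTP!</p>\n" * 200
# Random bytes behave like already-compressed media: almost no gain.
MEDIA_LIKE_BODY = os.urandom(len(TEXT_BODY))

if __name__ == "__main__":
    print("text ratio:", round(compression_ratio(TEXT_BODY), 3))
    print("media-like ratio:", round(compression_ratio(MEDIA_LIKE_BODY), 3))
```

A server would only apply gzip when the request's Accept-Encoding lists it, and would then announce it with Content-Encoding: gzip.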

Since the above compression methods generally have good compression rates only for text, for multimedia formats like images, audio, and video that do not compress well, there is another way to solve the problem of large files, which is chunked transfer.

  • Transfer-Encoding: chunked: This indicates that the data is being transferred in chunks.

For video-type bodies, such as when we watch videos on Bilibili, this video cannot be requested all at once; we can request the video in segments.

  • Accept-Ranges: bytes: We generally first use a HEAD request to ask the server whether it supports range requests. If it supports byte range requests, the server will return this.

  • Range: bytes=x-y: Under the server's support, the requester can specify that it wants to request the content from byte x to y.

  • Content-Range: bytes x-y/length: This indicates that the body returned by the server is the content from byte x to y, with a total length of length.
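The range exchange above can be sketched from the server's point of view. This toy handler supports only a single bytes=x-y range; the status codes 206 (Partial Content) and 416 (Range Not Satisfiable) are not mentioned in the text above but are the standard responses for this case. The function name is hypothetical.

```python
import re


def serve_range(data, range_header):
    """Return (status, headers, body) for a single 'bytes=x-y' request."""
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header)
    if not match:
        # Unsupported or missing range: fall back to the whole body.
        return 200, {"Accept-Ranges": "bytes"}, data
    start, end = int(match.group(1)), int(match.group(2))
    if start > end or start >= len(data):
        return 416, {"Content-Range": f"bytes */{len(data)}"}, b""
    end = min(end, len(data) - 1)           # both ends are inclusive
    headers = {"Content-Range": f"bytes {start}-{end}/{len(data)}"}
    return 206, headers, data[start:end + 1]


if __name__ == "__main__":
    print(serve_range(b"0123456789", "bytes=2-5"))
```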

In terms of internationalization, we can also set language requirements.

  • Accept-Language indicates the languages that the requester can understand, which may include more than one.

  • Content-Language indicates the actual language of the transmitted body.

Here’s a specific example: Accept-Language: en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7. There are two details we need to pay attention to:

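  1. In the HTTP specification, the comma is a stronger separator than the semicolon, which is contrary to the syntax of most programming languages: commas split the list into items, while a semicolon attaches a parameter to the item before it. Therefore, en;q=0.9 above is a single item.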

  2. What does the q represent? It actually indicates a weight, with a default value of 1. The responder will try to return content in the language with the highest weight.
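The two details above can be sketched as a small parser: split on commas first, then on semicolons, and default q to 1 when it is absent. The function name is hypothetical, and a real server would also validate that q lies between 0 and 1.

```python
def parse_accept_language(value):
    """Return [(language, weight), ...] sorted from most to least preferred."""
    items = []
    for part in value.split(","):              # ',' separates the items
        fields = part.strip().split(";")       # ';' attaches parameters to an item
        language = fields[0].strip()
        weight = 1.0                            # default when q is absent
        for param in fields[1:]:
            key, _, val = param.strip().partition("=")
            if key == "q":
                weight = float(val)
        items.append((language, weight))
    # sorted() is stable, so equal weights keep their original order.
    return sorted(items, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(parse_accept_language("en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7"))
```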


  2. Connection: Related to long connections.

Before HTTP/1.1, the client needed to establish a new connection every time it communicated with the server. If communication occurred frequently, it would repeatedly establish and close TCP connections, as shown on the left side of the image, which is a short connection:

image

So if we could keep a TCP connection open a bit longer, we could communicate multiple times without having to establish and close the connection each time, as shown on the right side of the image, which is a long connection. HTTP/1.1 supports this.

  • Connection: keep-alive: This indicates the use of a long connection, which is enabled by default in HTTP/1.1.

  • Connection: close: Actively closes the long connection, generally initiated by the client.

For the server, it can also set the timing for disconnecting long connections, which is configured in the web server. For example, in Nginx, keepalive_timeout represents the timeout for long connections; if there is no data sent or received for a long time, it will actively disconnect; keepalive_requests represents the maximum number of requests that can be received during the long connection.
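As a rough sketch of a long connection in action, the following starts a throwaway local server with Python's standard http.server and sends two requests over one http.client connection. If the TCP connection is reused, the client-side port stays the same for both requests. The handler and function names are assumptions for illustration, and the example needs permission to open a loopback socket.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"      # long connections are the default here

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length lets the client know where the body ends
        # without the server having to close the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                            # keep the demo output quiet


def client_ports_for_two_requests():
    """Send two GETs on one connection; return the client port used for each."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    ports = []
    try:
        for _ in range(2):
            conn.request("GET", "/")
            conn.getresponse().read()
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


if __name__ == "__main__":
    print(client_ports_for_two_requests())
```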

Because of long connections, the client can also initiate multiple requests simultaneously without having to wait for the result of the first request before sending the second request, which is called pipelined communication.

However, whether it is a short connection or a long connection, there will still be a head-of-line blocking (HoL blocking) problem. This is because the "request-response" model of HTTP stipulates that messages must be "one sent, one received," as shown in the image below:

image

In any case, the receiver must finish processing the red line request before it can handle subsequent requests, even if the latter requests arrive first, meaning "the first sent must be processed first."

To alleviate this problem, one solution is concurrent connections, which means opening multiple long connections to a single domain, with each long connection being independent. However, maintaining long connections consumes server resources and may also be abused in malicious attacks, so browsers generally cap the number of concurrent long connections per domain at 6 to 8. If that is still not enough, another workaround is domain sharding: if several domains point to the same server, the effective upper limit is multiplied by the number of domains.


  3. Redirection: Related to redirection.

When discussing status codes, we mentioned 301 (permanent) and 302 (temporary). I wonder if you remember their meanings. If such status codes are returned, the response header will definitely indicate the location of the redirection.

  • Location is the redirection target, which generally takes one of two forms: an absolute URL or a relative URL. An absolute URL follows the basic URL format, while a relative URL omits the scheme and host, which default to those of the URL in the original request.

There are three more status codes related to redirection: 303 is similar to 302, but the redirected request must use GET; 307 and 308 correspond to 302 (temporary) and 301 (permanent) respectively, in the reverse numeric order, and they do not allow the request method or body to change after redirection.
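How a client might act on a redirect can be sketched with the standard urljoin function, which resolves a relative Location against the original request URL. The method rule for 301/302 reflects what many clients do for historical reasons (turning POST into GET) and is an assumption of this sketch rather than something stated above; the URLs and function names are made up.

```python
from urllib.parse import urljoin


def redirect_target(request_url, location):
    """Resolve the Location value against the URL of the original request."""
    return urljoin(request_url, location)


def method_after_redirect(status, method):
    """Pick the method for the follow-up request."""
    if status == 303:
        return "GET"                       # 303: always re-request with GET
    if status in (307, 308):
        return method                      # 307/308: method must not change
    if status in (301, 302) and method == "POST":
        return "GET"                       # common historical client behaviour
    return method


if __name__ == "__main__":
    base = "https://www.example.com/a/b?x=1"
    print(redirect_target(base, "/login"))
    print(redirect_target(base, "c"))
    print(method_after_redirect(307, "POST"))
```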


  4. Cookie: Solving the problem brought by the stateless nature of HTTP.

First, we need to clarify: which side is stateless? That's right, it is the server, meaning the server does not know the relationship between the current request and the previous request. This can complicate the server's handling of some special scenarios, such as shopping.

So here, cookies are used to solve this problem. In simple terms, it is a small note given by the server to the client, marking the identity of a certain client. This client brings this note with each request, proving its identity.

  • Set-Cookie: a=xxx, Set-Cookie: b=yyy: This is returned by the server. A cookie is essentially a key-value pair, and each cookie is separate.

  • Cookie: a=xxx; b=yyy: This is what the client sends with the request, which are the cookies returned by the server previously, combined together.

Note that after the client receives these cookies, it will save them on the client side, and we can check them in the Chrome browser.

image

Oh? Besides Name and Value, why are there so many attributes for cookies? In fact, cookies returned by the server generally look like this: Set-Cookie: a=xxx; Domain=xx; Path=xx; Max-Age=xx; Expires=xx; HttpOnly; Secure; SameSite=xx....

  1. Domain, Path: The cookie will only be sent if the URL requested by the client matches them.

  2. Max-Age, Expires: Represents the expiration time of the cookie. The Cache also has similar attributes; it is important to note that Max-Age takes precedence over Expires.

  3. HttpOnly: When present, this cookie can only be transmitted via the HTTP(S) protocol, and access through other means is prohibited. For example, it can no longer be read using document.cookie in JS, which helps prevent script attacks.

  4. Secure: When present, this cookie will only be sent over secure HTTPS requests.

  5. SameSite=xxx: Setting SameSite=Strict strictly forbids sending the cookie across sites; SameSite=Lax is slightly more lenient, allowing the cookie to be sent with safe cross-site requests such as GET/HEAD, but not with cross-site POST.
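The cookie round trip can be sketched with Python's standard http.cookies module: the server side builds a Set-Cookie value with attributes, and the Cookie header sent back by the client is parsed into key-value pairs. The function names and attribute values are arbitrary examples, and the samesite attribute assumes Python 3.8 or later.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, max_age):
    """Build the value of one Set-Cookie header, with common attributes."""
    jar = SimpleCookie()
    jar[name] = value
    jar[name]["path"] = "/"
    jar[name]["max-age"] = max_age      # lifetime in seconds
    jar[name]["httponly"] = True        # hidden from document.cookie
    jar[name]["secure"] = True          # only sent over HTTPS
    jar[name]["samesite"] = "Lax"
    return jar[name].OutputString()


def parse_cookie_header(header):
    """Parse a 'Cookie: a=xxx; b=yyy' value into a dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {key: morsel.value for key, morsel in jar.items()}


if __name__ == "__main__":
    print("Set-Cookie:", make_set_cookie("a", "xxx", 3600))
    print(parse_cookie_header("a=xxx; b=yyy"))
```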


  5. Cache: Related to caching.

Caching is truly ubiquitous, and HTTP requests are no exception. The caching mentioned here is stored on the client side, aimed at minimizing network requests or the size of returned data to improve network transmission efficiency.

  • Cache-Control

    • The attributes that the server can return include: max-age=10/no-store/no-cache/must-revalidate.

      • max-age is measured in seconds, starting from the moment it is returned;

      • no-store indicates that the client is not allowed to cache;

      • no-cache indicates that the client must verify with the server before using the cache;

      • must-revalidate indicates that the cache must be verified after it becomes invalid.

    • The attributes that the client can send include: max-age=0; no-cache.

      • Generally, cmd + R to refresh the page will carry max-age=0, meaning that data that has existed for 0 seconds will not use local cache but will request a newly generated message from the server; cmd + shift + R to force refresh the page will carry no-cache, which is basically the same, depending on how the server handles it.

      • So when will the cache take effect? Generally, during browser forward, backward, or redirection, requests initiated by the client will not carry the above two attributes.

Additionally, to increase the flexibility of cache control, there are some conditional fields—

  • The server can return:

    • Last-Modified represents the last modification time of the file.

    • ETag, which stands for Entity Tag, represents the unique identifier of the resource. It is designed to solve the problem of modification time not accurately distinguishing file changes. For example, a file may be modified many times within a second, while the minimum unit of modification time is seconds; or a file may have its time attribute changed without any change to its content. ETag is also divided into strong ETag and weak ETag:

      • The former remains unchanged under the condition that the resource is unchanged at the byte level.

      • The latter remains unchanged under the condition that the resource is unchanged semantically, such as adding a few spaces. Additionally, the weak ETag's value will be prefixed with a W/ marker.

  • The corresponding client request uses:

    • If-Modified-Since, which contains the Last-Modified returned by the server during the last request. If the server resource has not been updated since this time, it will return 304, indicating that the client can use the cache.

    • If-None-Match, which contains the ETag returned by the server during the last request. If the ETag of the server resource has not changed, the server will also return 304.
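The ETag handshake can be sketched as follows. Deriving a strong ETag from a content hash is an assumption of this sketch (servers are free to generate ETags however they like), and the function names and the max-age value are illustrative only.

```python
import hashlib


def strong_etag(body):
    """Derive a strong ETag from the exact bytes of the resource."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body, if_none_match=None):
    """Return (status, headers, body), honouring If-None-Match."""
    etag = strong_etag(body)
    if if_none_match == etag:
        # Unchanged: no body is sent, the client reuses its cached copy.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Cache-Control": "max-age=10"}, body


if __name__ == "__main__":
    status, headers, _ = conditional_get(b"version 1")
    print(status, headers)
    print(conditional_get(b"version 1", headers["ETag"])[0])   # revalidation
    print(conditional_get(b"version 2", headers["ETag"])[0])   # resource changed
```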


  6. Proxy: Related to proxies.

Proxies have dual identities because, from the client's perspective, they are the server, while from the server's perspective, they are the client.

As mentioned earlier, proxies are generally divided into forward proxies and reverse proxies. Reverse proxies are generally used for load balancing (reasonably distributing tasks, deciding which server behind will respond to the request), security protection, encryption offloading (not encrypting communication within the internal network, reducing encryption and decryption costs), content caching (temporarily storing server responses, which will be discussed later, i.e., proxy caching), etc.

image

In the above scenario involving proxy servers, the header fields involved are:

  • Via: The proxy server will append its hostname and port information to this field when sending a request.

However, the server generally needs to know the real IP information of the client to facilitate access control, user profiling, statistical analysis, etc. Therefore, outside the HTTP standard, the following header fields are also stipulated:

  • X-Forwarded-For: Similar to the Via addition, but the content appended is the requester's IP address.

  • X-Real-IP: Only records the client's IP address, which is a bit more concise.
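
As a rough sketch of what each hop does to these fields, the hypothetical helper below copies the request headers and appends the proxy's own information. The name `forward_headers` and the rule "keep the first X-Real-IP" are assumptions for illustration; since these two X- fields are not standardized, real proxies differ in how they fill them.

```python
def forward_headers(headers: dict, client_ip: str, proxy_name: str) -> dict:
    """Return a copy of the request headers as a proxy might forward them."""
    out = dict(headers)

    # Via: append this hop (protocol version + proxy name) to any existing chain.
    hop = f"1.1 {proxy_name}"
    out["Via"] = f"{out['Via']}, {hop}" if "Via" in out else hop

    # X-Forwarded-For: append the address of whoever connected to this proxy.
    if "X-Forwarded-For" in out:
        out["X-Forwarded-For"] = f"{out['X-Forwarded-For']}, {client_ip}"
    else:
        out["X-Forwarded-For"] = client_ip

    # X-Real-IP (assumed policy): keep the value set by the first proxy.
    out.setdefault("X-Real-IP", client_ip)
    return out
```

Passing a request through two proxies in turn yields a Via chain with two entries and an X-Forwarded-For list whose leftmost entry is the original client.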

However, the above methods have a significant drawback: performance loss! Each time, the proxy server has to parse the HTTP message header and modify the message data, and in some cases the message cannot be modified at all because it is encrypted. Therefore, a dedicated protocol, the PROXY protocol defined by HAProxy, was later introduced. It also sits outside the HTTP standard.

Based on this protocol, the proxy server only needs to add a line of text before the HTTP message. For example:

PROXY TCP4 1.1.1.1 2.2.2.2 55555 80\r\n
GET / HTTP/1.1\r\n
Host: www.xxx.com\r\n
\r\n
  • The beginning is the five uppercase letters PROXY;

  • Then is the type of the client's IP address, such as TCP4 or TCP6;

  • After that come the requester's and responder's addresses, followed by the requester's and responder's ports;

  • Finally, it ends with a carriage return and newline.
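
The text format is simple enough that the receiving side can peel it off before handing the rest to the HTTP parser. Below is a minimal sketch covering only the TCP4/TCP6 text line shown above; `ProxyInfo` and `split_proxy_header` are illustrative names, not part of any library.

```python
from typing import NamedTuple


class ProxyInfo(NamedTuple):
    family: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def split_proxy_header(data: bytes) -> tuple[ProxyInfo, bytes]:
    """Split off the leading PROXY line; return it with the remaining HTTP bytes."""
    line, sep, rest = data.partition(b"\r\n")
    if not sep:
        raise ValueError("PROXY line is not terminated by CRLF")
    fields = line.decode("ascii").split(" ")
    if len(fields) != 6 or fields[0] != "PROXY":
        raise ValueError("not a PROXY text line with addresses")
    _, family, src_ip, dst_ip, src_port, dst_port = fields
    if family not in ("TCP4", "TCP6"):
        raise ValueError(f"unsupported address family: {family}")
    return ProxyInfo(family, src_ip, dst_ip, int(src_port), int(dst_port)), rest
```

Fed the example message above, it returns the client address 1.1.1.1 with port 55555, and the remaining bytes start at `GET / HTTP/1.1`.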


  1. Proxy Cache: Related to proxy caching.

image

Clients can cache, and intermediary proxy servers can certainly cache as well. However, because proxies have a dual identity, Cache-Control gains some extra directives for proxy caching:

  • From the server to the proxy server:

    • private indicates that the data can only be stored on the client side and cannot be cached on the proxy for sharing with others, such as private user data.
    • public indicates that the data is completely open and can be cached by anyone.
    • s-maxage indicates the lifespan of the cache on the proxy server.
    • no-transform indicates that the proxy server is prohibited from performing any transformation operations on the data, as some proxies may preemptively convert the data format for easier processing of subsequent requests.
  • From the client to the proxy server:

    • max-stale indicates acceptance of cache that has expired for a period of time.

    • min-fresh is the opposite of the above, indicating that the cache must still have a period of shelf life.

    • only-if-cached indicates that the client only accepts proxy cache. If there is no matching cache on the proxy, the client does not want the proxy to request the server again.
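
The sketch below shows how a proxy could apply these directives. It is a simplification under stated assumptions: the function names are made up, only the directives listed above (plus max-age and no-store) are considered, and a response without an explicit lifetime is simply not cached.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict:
    """Parse 'a, b=1' into {'a': None, 'b': '1'} with lowercased names."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def proxy_lifetime(response_cc: str) -> Optional[int]:
    """Seconds a proxy may keep the response, or None if it should not store it."""
    cc = parse_cache_control(response_cc)
    if "private" in cc or "no-store" in cc:
        return None
    # s-maxage applies only to shared caches and takes priority over max-age.
    for name in ("s-maxage", "max-age"):
        if cc.get(name) is not None:
            return int(cc[name])
    return None


def proxy_may_serve(age: int, lifetime: int, request_cc: str) -> bool:
    """Check a cached entry against the client's max-stale / min-fresh wishes."""
    cc = parse_cache_control(request_cc)
    remaining = lifetime - age  # zero or negative once the entry has expired
    if "min-fresh" in cc:
        return remaining >= int(cc["min-fresh"])
    if "max-stale" in cc:
        # A bare max-stale (no value) accepts any amount of staleness.
        return cc["max-stale"] is None or -remaining <= int(cc["max-stale"])
    return remaining > 0
```

With `public, max-age=60, s-maxage=10` the proxy keeps the entry for 10 seconds, while the browser may keep its own copy for 60. For only-if-cached, a proxy that has no usable entry answers with 504 Gateway Timeout instead of contacting the origin server.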


  1. Others

Let's take another look at this common header field image. Do you now understand the meaning and usage of each header field?

image

Wait, there are still some header field explanations missing above; I will summarize them here:

  • Host is the hostname being requested. It is mandatory in HTTP/1.1 and tells the server which host should handle the request. This is necessary when several virtual hosts share one machine; without it, the server generally will not process the request. Most network frameworks derive a default Host value from the URL as a fallback, so you usually do not have to fill it in manually.

  • User-Agent is the user agent, used to describe the identity of the requester. The server can use it to return an appropriate page layout or data. However, due to historical reasons, its usage has become somewhat chaotic, as each browser claims to be "Mozilla Chrome Safari," etc.

  • Date represents the time the message was created, generally appearing in the response header.

  • Server displays the name and version number of the software providing web services, but exposing part of the server's information may pose security risks, so sometimes this field is omitted from the response or only a vague piece of information is provided.

  • Content-Length is the length of the message body in bytes. If this field is absent, there will generally be another field, Transfer-Encoding: chunked, which we mentioned earlier.

Thus, we have completed the discussion on what HTTP is. You can try using Chrome Developer Tools or WireShark to capture packets to deepen your understanding.

What is HTTPS?#

HTTPS adds an S to HTTP, which represents the SSL/TLS protocol.

Now we come to our third goal: 🔐 Understand basic encryption knowledge.

This section will be brief as I have previously written related articles; you can refer to the links below:

  1. Information Security | How to Establish Trust in the Internet Age?: Three common cryptographic algorithms, digital signatures, digital certificates.

  2. Information Security | (Extra) How to Establish Trust in the Internet Age?: SSL/TLS, SSH, iOS signing, OpenSSL, WireShark practice.


One more point worth adding: the mainstream ECDHE-based TLS handshake versus the traditional RSA-based TLS handshake.

The key difference between the two lies in the generation method of the third random number Pre-Master during the communication key generation process:

  • The former: Each side first generates a random, temporary public/private key pair and sends its public key to the other side as a parameter (the server's is signed). Both sides then combine their own private key with the other side's public key to compute Pre-Master using the ECDHE algorithm;

  • The latter: The client directly generates the random number Pre-Master, encrypts it with the public key of the server's certificate, and sends it to the server.

Because the former's key pairs are randomly generated for each handshake, a private key that is leaked or cracked affects only that one communication. The latter's key pair is fixed: if the private key is leaked or cracked, all previously recorded encrypted traffic can be decrypted, since patient hackers may have been collecting packets for a long time, waiting for this day (it is said that the Snowden Prism incident exploited this point).

In other words, the former has "one-time keys," providing forward secrecy; while the latter has the risk of "today's interception, tomorrow's decryption," lacking forward secrecy.
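
The "one-time key" idea can be sketched with a toy key exchange. Note the assumptions: this uses classic finite-field Diffie-Hellman with a small illustrative modulus instead of elliptic curves, omits the signature, and hashes the result with SHA-256 purely for display. It is not how TLS derives keys and must not be used for real security.

```python
import hashlib
import secrets

# Toy parameters for illustration only: a Mersenne prime modulus and a small base.
P = 2**127 - 1
G = 3


def ephemeral_keypair() -> tuple[int, int]:
    """Generate a fresh (private, public) pair for a single handshake."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)


def pre_master(own_private: int, peer_public: int) -> bytes:
    """Derive the shared secret from our private key and the peer's public key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def handshake() -> tuple[bytes, bytes]:
    """Run one exchange; return the secret as computed by client and by server."""
    client_priv, client_pub = ephemeral_keypair()
    server_priv, server_pub = ephemeral_keypair()
    # Only the public values cross the network; each side combines them locally.
    return pre_master(client_priv, server_pub), pre_master(server_priv, client_pub)
```

Each call to `handshake()` produces matching secrets on both sides, and the private values are thrown away when the function returns. An attacker who later obtains the server's long-term certificate key learns nothing about them, which is the essence of forward secrecy.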

For more details, you can refer to the lesson on TLS 1.2 Connection Process Analysis in the course “Perspective on HTTP Protocol,” or try capturing packets using WireShark yourself~

From the perspective of packet capture, the main differences between the two are:

  • The former has an additional "Server Key Exchange" message compared to the latter.

  • The former allows the client to begin encrypted communication without waiting for the connection to be fully established, meaning the client does not have to wait for the server to send back the "Finished" confirmation of the handshake, which is called "TLS False Start."

The Development of HTTP#

The table below summarizes how HTTP has developed, to give an overall picture:

| Time | Version | Main change | Note |
| --- | --- | --- | --- |
| 1989 | 3 key technologies | HTML, URI, HTTP | Paper from Tim Berners-Lee. |
| 1991 | HTTP/0.9 | 1. Request way: GET. 2. Data: HTML. | No RFC. |
| 1996 | HTTP/1.0 | 1. +Request way: HEAD, POST. 2. +Data: img, audio. 3. +Other: HTTP Head, status code, protocol version. | RFC-1945 (1996). Not a formal standard. |
| 1999 | HTTP/1.1 | 1. +Request way: PUT, DELETE. 2. +Cache-control. 3. +Keep-Alive. 4. +Pipeline transmission (Content-Length), chunked transmission. 5. +Host head (Required). | +Google, Sina, Sohu, Netease, Tencent. RFC-2616 (1999). +Facebook, Twitter, Taobao, JD. Divided to RFC-7230~7235 (2014). RFC-9112 (2022). |
| 2015 | HTTP/2 | 1. Transmission data format: text → binary data. 2. +Concurrent requests (use stream, abandon pipeline transmission). 3. +Header Compression. 4. +Allow the server to push. 5. +Combined with TLS 1.2+. | Based on SPDY in Chrome browser (2009). RFC-7540 (2015). RFC-9113 (2022). |
| 2022 | HTTP/3 | 1. Transport layer protocol: TCP → QUIC (based on UDP, including TLS 1.3, IP → connection ID). 2. Header Compression: HPACK → QPACK. | Based on QUIC in Chrome browser (2012). RFC-9114 (2022). |

  • Since HTTP/1.0, HTTP has been written into RFC documents (RFC document summary).

  • HTTP/1.1 is the first formal standard of HTTP, and most of the functions were introduced in the common header fields section. During this early stage, companies like Google, Sina, Sohu, Netease, and Tencent were founded, and later Facebook, Twitter, Taobao, JD, and others emerged.

  • HTTP/1.1 was relatively complete in various aspects, but there was still significant room for optimization in performance and security. Therefore, HTTP/2 and HTTP/3 mainly optimized the performance of HTTP.


  • HTTP/2 is based on the SPDY protocol, which was developed by Google and promoted through Chrome. The main changes include:
    • The data transmission format changed from text to binary, greatly facilitating parsing by computers.

    • Based on the concept of virtual streams, it achieved multiplexing capability, replacing the pipeline function in HTTP/1.1.

    • Headers are now compressed with the HPACK algorithm; previously, compression applied only to the body.

    • Allowed the server to create "streams" to actively push messages. For example, when the browser requests HTML, it can proactively send potentially needed JS and CSS files to the client.

    • Security was also strengthened. The encrypted version of HTTP/2 requires the underlying protocol to be at least TLS 1.2 (earlier versions had many vulnerabilities) and to support forward secrecy and SNI (Server Name Indication, a TLS extension that lets the client tell the server which hostname it is connecting to at the start of the handshake). It also places hundreds of weak cipher suites on a "blacklist."

PS: Compared to the concurrent connection method in HTTP/1.1, the concept of virtual streams elegantly solves the head-of-line blocking problem in HTTP.
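
To picture what "virtual streams" means, here is a toy model in which each message is cut into frames tagged with a stream ID, the frames are interleaved on one connection, and the receiver reassembles them by ID. The function names and the tuple-based "frame" are assumptions for illustration; real HTTP/2 frames are binary and carry a type, flags, and a length.

```python
from collections import defaultdict
from itertools import zip_longest


def to_frames(stream_id: int, payload: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut one message into (stream_id, chunk) frames of at most `size` bytes."""
    return [(stream_id, payload[i:i + size]) for i in range(0, len(payload), size)]


def interleave(*streams: list[tuple[int, bytes]]) -> list[tuple[int, bytes]]:
    """Send frames round-robin so no stream waits for another to finish."""
    wire = []
    for group in zip_longest(*streams):
        wire.extend(frame for frame in group if frame is not None)
    return wire


def reassemble(wire: list[tuple[int, bytes]]) -> dict[int, bytes]:
    """Receiver side: group frames by stream ID and join them in arrival order."""
    messages = defaultdict(bytes)
    for stream_id, chunk in wire:
        messages[stream_id] += chunk
    return dict(messages)
```

Because every frame names its stream, a slow response on stream 1 no longer holds up stream 3 at the HTTP level, even though both share a single TCP connection.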

image


  • HTTP/3 is based on the QUIC protocol, which was also developed by Google and promoted through Chrome.
    • First, let's look at QUIC:

      • It implements reliable transmission based on UDP and introduces a stream concept similar to HTTP/2.

      • It includes TLS 1.3, speeding up connection establishment.

      • Connections use "opaque" connection IDs to mark both ends, rather than being bound by IP addresses and ports, thus supporting seamless connection migration for users.

    • Returning to HTTP/3:

      • Its biggest change is switching the underlying transport from TCP to QUIC, which completely solves TCP's head-of-line blocking problem (note that this refers to TCP, not HTTP) and performs better in weak network environments. Since QUIC itself already provides encryption, streams, and multiplexing, the workload for HTTP/3 is significantly reduced.

      • The header compression algorithm has been upgraded from HPACK to QPACK.

      • On June 6, 2022, HTTP/3 was officially written into the RFC document, and HTTP/1.1 and HTTP/2 also updated their RFC documents.

PS: TCP has a special "packet loss retransmission" mechanism to ensure reliable transmission, meaning that lost packets must wait for retransmission confirmation. Other packets, even if they have been received, can only be placed in the buffer (kernel) and cannot be accessed by the upper-layer application (user). Refer to the image below: the red square request is the key to blocking TCP.

image
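
The blocking described in this PS can be reproduced with a few lines of simulation. This is only a sketch of the in-order delivery rule, with segments numbered from 1 and the made-up name `deliver_in_order`; it does not model acknowledgements, timers, or windows.

```python
def deliver_in_order(arrivals: list[int]) -> list[list[int]]:
    """For each arriving segment, list what the application can read afterwards."""
    next_expected = 1
    buffered = set()
    readable_after_each = []
    for seq in arrivals:
        buffered.add(seq)
        released = []
        # Only a contiguous run starting at next_expected may leave the buffer.
        while next_expected in buffered:
            buffered.remove(next_expected)
            released.append(next_expected)
            next_expected += 1
        readable_after_each.append(released)
    return readable_after_each
```

If segment 2 is lost and retransmitted, the arrival order becomes 1, 3, 4, 2 and the result is `[[1], [], [], [2, 3, 4]]`: segments 3 and 4 sit in the buffer, unreadable, until 2 finally arrives.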

(Actually, there is a bit of confusion here: does this mean that what was solved before HTTP/3 was only the blocking problem from kernel to user? Specifically, it was only the blocking problem after TCP, what other blocking problems exist at this stage? 🤔? Experts are welcome to clarify~)

The Development of HTTPS#

This part discusses the development from TLS 1.2 to TLS 1.3. Previous versions have been deprecated due to various security issues, which can be learned from the article Information Security | (Extra) How to Establish Trust in the Internet Age?.

TLS 1.3 has three main optimization goals:

  1. Compatibility with TLS 1.2. To ensure that older devices can upgrade the protocol more easily, TLS 1.3 maintains the original record format and uses extension protocols to add some "extension fields" at the end of the original records to add new functions. Older versions of TLS can directly ignore them, achieving "backward compatibility."

  2. More secure. TLS 1.3 has slimmed down the supported cipher suites for security reasons, leaving only five cipher suites. The traditional handshake method of TLS based on RSA mentioned earlier has been abolished.

  3. Higher performance. The process of establishing an HTTPS connection involves both TCP handshake and TLS handshake. In TLS 1.2, the TLS handshake requires 2-RTT, while this time has been optimized to 1-RTT in TLS 1.3. How is this achieved?

    1. The answer lies in the previous point: because so few cipher suites remain, the client can send the key exchange parameters for the suites it supports directly in the ClientHello message. The server picks one and can immediately generate the communication key for encrypted communication. The client no longer has to wait for the server to confirm the cipher suite before sending its parameters, as it did in TLS 1.2.

    2. Beyond the standard 1-RTT handshake, TLS 1.3 can achieve a 0-RTT handshake when an earlier connection has left cached server parameters on the client. However, the 0-RTT data lacks forward secrecy and is exposed to replay attacks, so users need to weigh the trade-offs.

    3. Below, I will include a comparison diagram of the communication process from the course “Perspective on HTTP Protocol,” so you can see the differences between them.

image
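
The RTT counts above translate into waiting time before the first HTTP request can be sent. The small calculation below is an idealized sketch: it assumes one round trip for the TCP handshake, the TLS round trips stated in this section, and no other optimizations. The constant and function names are made up for illustration.

```python
TCP_HANDSHAKE_RTT = 1  # SYN / SYN-ACK before any TLS bytes can be sent

TLS_HANDSHAKE_RTT = {
    "TLS 1.2": 2,
    "TLS 1.3": 1,
    "TLS 1.3 0-RTT": 0,
}


def ms_before_first_request(mode: str, rtt_ms: float) -> float:
    """Time spent on handshakes before the first HTTP request leaves the client."""
    return (TCP_HANDSHAKE_RTT + TLS_HANDSHAKE_RTT[mode]) * rtt_ms
```

On a link with a 50 ms round trip this gives 150 ms for TLS 1.2, 100 ms for TLS 1.3, and 50 ms when a 0-RTT handshake is possible.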


Conclusion (Good News)#

Alright, that's all for today. Let's return to our goals: can you recall the specific knowledge behind each one?

  1. 🔍 Quickly locate HTTP issues.

  2. 🥣 Familiarize with common header fields in HTTP messages.

  3. 🔐 Understand basic encryption knowledge.

Of course, don't forget our ultimate goal🏁: If you are interested in HTTP, try to independently delve into learning HTTP through WireShark, Chrome, Telnet tools, and RFC documents~


This is Bo2SS, see you next time!

Breaking news: After more than a year of casual operation, on August 14, 2022, at 10:45, the fan base of Bo2SS quietly surpassed 500🎉! Everyone, please help think of ways to celebrate (or take advantage of) Bo2SS? Welcome to vote or leave a message below!

Vote:

A. Send red envelopes for good luck

B. Give away books to gain knowledge

C. Create a group chat to promote friendship

D. Stop playing and continue writing articles

E. All of the above, don’t refuse anything

F. None of the above, I’ll leave a message instead
