Information Security | In the Internet Age, How to Build Trust?

Last week, I gave a technical presentation in the frontend group. Since the company is promoting "full English in the office," this was also a fully English presentation. Speaking entirely in English for an hour and a half was a challenge for me, but I would rate my performance as more courageous than skillful 🐶.

In the internet era, how do we establish trust❓ The foundation of building trust is of course to ensure that information transmission is secure, so that users are willing to communicate, shop, and make payments online... Today, let's take a step-by-step look at the development of information security in the internet era!

「Outline of this article」 is as follows:

「Keywords」 Cryptography, Symmetric Key Systems, Asymmetric Key Systems, Hashing, Digital Signatures, Digital Certificates, SSL/TLS, SSH, iOS Signing, OpenSSL, WireShark

💡: Don't worry, we won't discuss complex mathematical operations here.

Introduction#

We have a lot of code related to information security in our projects, such as RSA, AES, HMAC... Every time I encounter them, I feel confused or even overwhelmed, so I want to understand what they actually do.
I previously stumbled upon issues with iOS signing, which led to my last article (iOS | Illustrated Principles Behind iOS Signing, feel free to check it out if you're interested). This time, since it is a presentation aimed at the frontend group, it serves as an extension of the previous article.

These reasons led to the creation of this article, and I hope that you, a fellow inhabitant of the internet era, find it interesting.

Objectives of this Article#

To answer and understand the following questions:

🥣 Why is information transmission generally done using symmetric encryption + asymmetric encryption? Can't we just use one of them?
🥣 Why is a digital signature needed for information security?
🥣 Why is hashing required before signing?
🥣 Why is a digital certificate necessary for information security?

Ultimate Goal: When we encounter cryptography-related issues, we will no longer feel fear or confusion.

What is Information Security?#

This is a relatively broad question. Here I want to answer it through the three elements of information security, commonly known as the CIA triad.

Source: comtact

The components of CIA are as follows:

Confidentiality: Refers to the protection of information from being disclosed to unauthorized users or entities during storage, transmission, and use.
Integrity: Refers to the prevention of unauthorized users from altering information during storage, transmission, and use, or preventing authorized users from making inappropriate alterations to the information.
Availability: Refers to ensuring that authorized users or entities can normally use information resources without being denied access, allowing them to reliably and timely access information resources.
- ➕Authentication: Also understood as Non-Repudiation, it refers to both parties in network communication being assured of the authenticity of the participants and the information provided, meaning that all participants cannot deny their true identity or the authenticity of the information provided and the operations and commitments made.
- ➕Controllability: Refers to the degree of control over the network system and information within the range of transmission and storage.

Reference: 5 Security Features of Information Security——51know

💡 Here are two points that need clarification:

Two additional elements, Authentication and Controllability, are listed under Availability. I believe these two serve Availability, so they are grouped with it.
The focus of this article is on the three elements: Confidentiality, Integrity, and Authentication. We can consider them as three requirements, and the primary task of this article is to fulfill them. Additionally, the explanations above are somewhat technical, so let me simplify them:
1. ❗️Confidentiality: A sends a message to B and does not want C to see the content of the message.
2. ❗️Integrity: A sends a message to B and does not want C to modify the content of the message.
3. ❗️Authentication: A sends a message to B, and B can confirm that the sender's identity is A, not C.

Now, let's continue with our three requirements.

Why is Information Security Needed?#

Source: pixabay

Before discussing how to achieve it, let's first consider the reasons for needing information security. I can summarize it simply as "Users need it, and companies must meet that need." To elaborate, it can be divided into three points:

The need for information security is a common understanding, especially in fields involving money and user privacy, such as banking and e-commerce.
Analyzing from the perspective of internet users, if their information security cannot be guaranteed, how can they dare to shop, make payments, take loans, or input account passwords and other private information online?
For a company, if it cannot guarantee the information security of its users, it will lose the trust of its users, which is equivalent to losing users. What kind of development can such a company expect?

Therefore, in the internet era, information security is extremely necessary. Now, let's see how to achieve it!

How to Achieve Information Security#

❗️ Remember our three requirements: Confidentiality, Integrity, Authentication.

Source: electronicdesign

When discussing information security, we cannot overlook the role of cryptography, because the basic security goals of cryptography (Confidentiality, Integrity, Authentication) map directly onto the three requirements mentioned above.

First, we can learn about the history of cryptography from the following videos:

Brief Overview: The History of Cryptography｜Explained For Beginners——Binance Academy, Youtube
Detailed Overview:
- Secret Codes: A History of Cryptography (Part 1)——The Generalist Papers, Youtube
- More Secret Codes: A History of Cryptography (Part 2)——The Generalist Papers, Youtube

Now you should have a general understanding of cryptography. Next, let's look at the three common algorithms in cryptography.

Three Common Cryptographic Algorithms#

They are Symmetric Key Algorithms, Asymmetric Key Algorithms, and Hash Algorithms.

Symmetric Encryption Algorithm#

Source: preyproject

As shown in the yellow box in the image, the process of symmetric encryption is that the sender encrypts plaintext using a key (Secret Key), resulting in ciphertext, which is then sent to the receiver. The receiver decrypts the ciphertext using the same key to obtain the plaintext.

💡 The characteristic of symmetric encryption: The key used for encryption/decryption is the same (Same Key).
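The "same key on both sides" idea can be sketched in a few lines. This is a toy stream cipher built only from the Python standard library, not AES or any algorithm from the list below; the names `keystream`, `xor_cipher`, and `secret_key` are made up for illustration, and the construction should not be used to protect real data.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the same keystream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

secret_key = b"shared-secret-key"                 # known to sender and receiver only
plaintext = b"Hello, B!"
ciphertext = xor_cipher(secret_key, plaintext)    # sender encrypts
recovered = xor_cipher(secret_key, ciphertext)    # receiver decrypts with the same key
```

Calling `xor_cipher` with any other key produces unreadable bytes, which is exactly why the key has to reach the receiver safely.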

Q1: Can you think of some common symmetric encryption algorithms?

DES, 3DES, AES, IDEA, SM1, SM4, RC2, RC4.

Among these, RC4 is a stream cipher (it encrypts/decrypts one bit or byte at a time); the others are block ciphers (they split the data into fixed-size blocks, encrypt each block, and then join the results in order).

The purpose of listing these algorithm names is to help you understand what they do when you encounter these terms, so you can explore the mathematical principles if you're interested.

Reference: Cryptography Basics (Part 1) Common Encryption Algorithm Classification——Blog

Q2: Looking at the image above, you might wonder: How should the key be sent to the receiver? The receiver needs it to decrypt the ciphertext.

That's a good question. In the real world, we could meet in secret to exchange the key, but in the internet world, hackers can easily intercept your communication. Therefore, the biggest challenge of symmetric encryption algorithms is the key distribution problem.

How do we solve this? This leads us to the following asymmetric encryption algorithm.

Asymmetric Encryption Algorithm#

Source: preyproject

As shown in the yellow box in the image, the process of asymmetric encryption is generally similar to that of symmetric encryption.

💡 The only difference is the characteristic of asymmetric encryption: The keys used for encryption/decryption are different (Different Key).

Just like the symmetric key in symmetric encryption, the private key (Secret Key / Private Key) in asymmetric encryption is very private and important and should not be shared casually, while the public key (Public Key) can be distributed freely.

If you want to communicate securely with the sender, you can send the public key to the sender for them to use for encryption. Once you (the receiver) receive the ciphertext, you can decrypt it using your private key.
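To make the public/private split concrete, here is a minimal sketch of textbook RSA with deliberately tiny primes. It assumes Python 3.8+ (for `pow(e, -1, phi)`), uses no padding, and is for illustration only; real keys are thousands of bits long and real libraries add padding schemes. The function names are hypothetical.

```python
# Toy key generation with tiny primes (real keys use far larger primes).
p, q = 61, 53
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: modular inverse of e

public_key = (e, n)            # can be handed to anyone
private_key = (d, n)           # stays with the receiver

def encrypt(message: int, key) -> int:
    """Sender side: encrypt a small integer with the receiver's public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext: int, key) -> int:
    """Receiver side: decrypt with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)

ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
```

Note that the message here is just an integer smaller than `n`; that size limit is one reason asymmetric encryption is rarely applied to bulk data directly.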

Q1: Can you list some common asymmetric encryption algorithms?

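RSA, ECC, DSA, ECDSA, SM2. (Strictly speaking, DSA and ECDSA are signature-only algorithms: they use a public/private key pair but cannot encrypt data.)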

Q2: Since asymmetric encryption solves the key distribution problem, does that mean symmetric encryption is no longer needed?

Not quite. Asymmetric encryption has its own drawback: it is much slower than symmetric encryption (symmetric encryption is essentially bitwise operations, while asymmetric encryption relies on exponentiation and modular arithmetic).

Therefore, this leads us to the first target question of this article: Why is information transmission generally done using symmetric encryption + asymmetric encryption? Can't we just use one of them?

We will discuss this further in the section on "Encryption Methods for Information Transmission." Next, let's introduce the last common cryptographic algorithm—hash algorithms.

Hash Algorithm#

Source: hackmd.io

As shown in the image, a hash algorithm can convert any data into a fixed-length code, which we generally refer to as a hash value or digest.

💡 Characteristics of hash algorithms:

"Unique" Identification: The same input will always produce the same output; different inputs will most likely produce different outputs. (The term "most likely" is used because it is not guaranteed to be unique.)
Irreversibility: It is computationally infeasible to deduce the input from the output.

We can think of the hash value as the "fingerprint" of the original data. At a crime scene, while we cannot deduce what the corresponding person looks like from a fingerprint (irreversibility), we can compare the fingerprints from the scene and the suspect (or fingerprint database) to identify the perpetrator!
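The two characteristics are easy to observe with Python's standard `hashlib`. The sketch below uses SHA-256 as an example algorithm; the sample messages are invented.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 fingerprint of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

same_1 = digest(b"transfer 100 to Bob")
same_2 = digest(b"transfer 100 to Bob")     # same input, same fingerprint
changed = digest(b"transfer 900 to Bob")    # one character differs
large = digest(b"x" * 1_000_000)            # a much larger input

# Every digest has the same length no matter how big the input was.
lengths = {len(same_1), len(changed), len(large)}
```

Nothing in `hashlib` offers a function to go from the digest back to the input, which mirrors the irreversibility property described above.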

💡 Combining the two characteristics above, hash algorithms generally have two uses:

Verifying whether data has been modified

When we download certain software, we often see a hash value (MD5) attached near the download link. What is its purpose?

This hash value acts like a "fingerprint" of the original download package A. If the package we downloaded is altered during the download process and becomes a fake package B, we can calculate the hash value B-hash (MD5) of the fake package B and compare it with the original package A's hash value A-hash. If B-hash and A-hash do not match, then we can conclude that package B "is suspicious."
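A check like this could be scripted roughly as follows. The helper names `file_digest` and `matches_published` are hypothetical, and SHA-256 is used as the default here instead of MD5; pass `algorithm="md5"` if the download page publishes an MD5 value.

```python
import hashlib
import hmac

def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str,
                      algorithm: str = "sha256") -> bool:
    """Compare the downloaded file's hash (B-hash) with the published one (A-hash)."""
    return hmac.compare_digest(file_digest(path, algorithm),
                               published_hex.strip().lower())
```

If `matches_published` returns `False`, the package on disk is not the one the publisher hashed and should be treated as suspicious.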

Additionally, the digital signature technology mentioned later will also use hash functions, which we will discuss shortly.

Storing User Privacy

A platform's database needs to store users' account names and passwords. If passwords are stored in plaintext, it poses a risk; if the database is compromised, all passwords could be leaked.

Therefore, the database stores hash values of passwords. When users enter their passwords to log in, the platform only needs to compare the stored hash value with the hash value of the entered password. This is why, when we forget our password on a platform, we can only reset it, as the platform does not know our original password!

At this point, let's look at two small Q&As.

Q1: Can you list some common hash algorithms?

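MD5, SHA-1, SHA-2, SHA-3, SM3. (HMAC is often listed alongside them, but it is a keyed message authentication code built on top of a hash function rather than a hash algorithm itself.)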

Q2: The SM series algorithms are recognized and published by our country. Do you know their full name?

It is precisely because they are Chinese national algorithms that SM comes from pinyin: it abbreviates ShangMi (商密), short for 商用密码, meaning "commercial cryptography".

Returning to the second use of hash algorithms (storing user privacy), there are some risks, namely 🌈 rainbow table attacks.

Source: ckd3

As shown in the rainbow table, for some common passwords, their corresponding hash values (MD5) are already well-known. If hackers obtain these hash values, it is equivalent to obtaining the original passwords, so we need additional measures to reduce the risk of rainbow table attacks. There is a website based on rainbow tables called cmd5 that you can check out if you're interested.
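The core of the attack is a precomputed mapping from hash value back to password. The sketch below uses a plain dictionary for that mapping; real rainbow tables use chains of hashes to save space, so treat this as a simplification. The password list and the function name `crack` are invented for the example.

```python
import hashlib

# A tiny stand-in for the huge lists of common passwords attackers use.
common_passwords = ["123456", "password", "qwerty", "letmein"]

# Computed once, reusable against every leaked database of unsalted MD5 hashes.
lookup = {hashlib.md5(p.encode()).hexdigest(): p for p in common_passwords}

def crack(leaked_hash: str):
    """Return the original password if its hash is in the table, else None."""
    return lookup.get(leaked_hash)

leaked = hashlib.md5(b"qwerty").hexdigest()     # what the database stored
found = crack(leaked)                           # recovered without "reversing" MD5
not_found = crack(hashlib.md5(b"x9!Lq#72vTz").hexdigest())
```

The hash function was never inverted; the attacker simply hashed the guesses in advance, which is why common passwords are the ones at risk.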

Two common measures are hash salting and HMAC. The former adds a random number (salt) to the plaintext before calculating the hash value; the latter is even more secure, as it combines a key (a pre-shared symmetric key) with the plaintext before calculating the hash value.
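Both measures are available in Python's standard library. The sketch below stores a salted password hash using PBKDF2 (a salted, deliberately slow construction, swapped in here for the bare "salt + hash" described above) and computes an HMAC tag with a shared key. The function names and the iteration count are illustrative assumptions, not a recommendation.

```python
import hashlib
import hmac
import secrets

def hash_password(password: str, salt=None):
    """Return (salt, digest); a fresh random salt is created when none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def check_password(entered: str, salt: bytes, stored_digest: bytes) -> bool:
    """Login check: hash the entered password with the stored salt and compare."""
    _, candidate = hash_password(entered, salt)
    return hmac.compare_digest(candidate, stored_digest)

def hmac_tag(shared_key: bytes, message: bytes) -> bytes:
    """HMAC: a hash that can only be reproduced by someone holding the key."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()

salt_1, stored_1 = hash_password("qwerty")
salt_2, stored_2 = hash_password("qwerty")   # same password, different salt
```

Because every user gets a different salt, one precomputed table no longer matches the stored values, even for users who picked the same password.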


Now, we have covered the three common cryptographic algorithms. Next, based on these foundational algorithms, let's discuss how to achieve our three requirements. Do you remember? Confidentiality, Integrity, Authentication.

Encryption Methods for Information Transmission#

❗️ To achieve Requirement 1: Confidentiality.

🥣 Answering Target Question 1: Why is information transmission generally done using symmetric encryption + asymmetric encryption? Can't we just use one of them?

First, to answer Question 1, we have learned that both encryption methods have their shortcomings, but they can complement each other.

🥣 Therefore, we combine the advantage of symmetric encryption (speed) with the advantage of asymmetric encryption (easy key distribution): use asymmetric encryption to transmit the symmetric key, and then use the symmetric key for all subsequent communication. This is the best approach currently known.
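The combined flow can be sketched end to end with toy building blocks: textbook RSA (no padding, built from two small Mersenne primes) wraps a random session key, and a toy XOR stream cipher stands in for a real symmetric algorithm such as AES. Everything here is an assumption for illustration and is insecure as written; it requires Python 3.8+.

```python
import hashlib
import secrets

# Receiver's toy RSA key pair (far too small for real use).
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537                    # public key: (e, n)
d = pow(e, -1, (p - 1) * (q - 1))      # private key: (d, n)

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 based keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Sender: slow asymmetric step on 16 bytes, fast symmetric step on the bulk data.
original = b"a long message that would be slow to encrypt with RSA " * 200
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)
ciphertext = xor_cipher(session_key, original)

# Receiver: unwrap the session key with the private key, then decrypt the data.
unwrapped_key = pow(wrapped_key, d, n).to_bytes(16, "big")
message = xor_cipher(unwrapped_key, ciphertext)
```

Only the 16-byte session key ever passes through the asymmetric algorithm; the message itself, however large, uses the fast symmetric path.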

PS: Regarding key exchange methods, in addition to the above method based on asymmetric encryption, there are actually two other ways to exchange keys.

Dedicated key exchange algorithms, such as DH(E), ECDH(E);
Pre-deployment methods, such as PSK, SRP.
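As an illustration of the first option, a bare-bones Diffie-Hellman exchange fits in a few lines. The modulus and base below are toy values assumed for the example (real deployments use standardized groups or elliptic curves), and the variable names are invented.

```python
import secrets

p = 2**127 - 1      # public prime modulus (toy-sized Mersenne prime)
g = 3               # public base, assumed here for illustration

# Each side picks a private value and publishes only g^x mod p.
a_private = secrets.randbelow(p - 2) + 1
b_private = secrets.randbelow(p - 2) + 1
a_public = pow(g, a_private, p)
b_public = pow(g, b_private, p)

# Each side combines its own private value with the other's public value.
a_shared = pow(b_public, a_private, p)
b_shared = pow(a_public, b_private, p)
```

Both sides arrive at the same number without ever sending it, and that number can then be turned into the symmetric key.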

In summary, remember one thing: because symmetric encryption performs well, the bulk of frequent communication data is encrypted with a symmetric key, and the key being exchanged in all of the methods above is that symmetric key.

❗️ With encryption methods in place, the confidentiality of information transmission is guaranteed, thus fulfilling our first requirement!

Now, how do we ensure the integrity of the information? This is where digital signatures come into play.

Digital Signatures#

❗️ To achieve Requirement 2: Integrity.

🥣 Answering Target Question 2: Why is a digital signature needed for information security?

🥣 Answering Target Question 3: Why is hashing required before signing?

Digital signatures are generally included with the data to be transmitted to prevent data from being tampered with. Let's discuss how this is achieved.

First, its underlying core is the hash algorithm and asymmetric encryption algorithm mentioned earlier.

Generation (Signing)#

The signature is generated by the sender in the communication. The sender first hashes the data to be transmitted to obtain a data digest (the hash value), and then encrypts the digest with its private key; the result is the signature for the data.

Verification (Signature Verification)#

Upon receiving the data and its signature, the receiver performs the following actions:

Data: Uses the same hash algorithm as the sender to compute the actual data digest A;
Signature: Uses the public key corresponding to the sender's private key to compute the original data digest B from the signature.

By comparing digest A and digest B, if they are equal, it indicates that the actual data has not been tampered with; otherwise, there is an issue with the data (isn't this somewhat similar to using hashing to verify data integrity? However, the security level of signatures is clearly higher, as it involves the protection of the private key).
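The signing and verification steps can be sketched with textbook RSA over a truncated SHA-256 digest. The key is built from two small Mersenne primes and there is no padding, so this is a toy under stated assumptions rather than a real signature scheme; it requires Python 3.8+, and all names are illustrative.

```python
import hashlib

# Sender's toy RSA key pair (far too small for real use).
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537                    # public key, known to the receiver
d = pow(e, -1, (p - 1) * (q - 1))      # private key, known to the sender only

def digest_int(data: bytes) -> int:
    """SHA-256 digest truncated to 16 bytes so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")

def sign(data: bytes) -> int:
    """Sender: hash the data, then apply the private key to the digest."""
    return pow(digest_int(data), d, n)

def verify(data: bytes, signature: int) -> bool:
    """Receiver: compare the recomputed digest A with digest B from the signature."""
    digest_a = digest_int(data)
    digest_b = pow(signature, e, n)
    return digest_a == digest_b

data = b"pay 100 to Bob"
signature = sign(data)
ok = verify(data, signature)
tampered_ok = verify(b"pay 900 to Bob", signature)
```

Changing even one byte of the data changes digest A, so the comparison fails unless the attacker can also produce a new signature, which requires the private key.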

The entire process of generating and verifying the signature is as follows. Now, returning to Target Questions 2 and 3, do you have the answers?

🥣 2: Why is a digital signature needed for information security?

Simply put, it is to ensure the integrity of the information (❗️ thus fulfilling Requirement 2).

🥣 3: Why is hashing required before signing?

First, consider two questions:

What is the purpose of hashing? It converts any data into a fixed-length code.
The essence of signing is asymmetric encryption; does it have any shortcomings? Yes, its performance is low.

Therefore, signing a large piece of data is slower than signing a small one, and a hash conveniently provides a short, fixed-length value that still identifies the original data almost uniquely.

However, 1) speeding up the signing process is only part of the answer. If hashing is not performed before signing, there will also be 2) security risks:

Reordering. If the message to be transmitted is too long and exceeds the maximum length supported by the asymmetric encryption algorithm, the system can only perform segmented signing on the message, resulting in multiple signatures. The receiver will then need to verify each signature individually. However, if this is the case, the order of the segmented messages cannot be guaranteed to remain unaltered, as each signature can still pass verification (one might think of integrating multiple signatures and then generating a single signature, but this could again be limited by the maximum length supported by the asymmetric encryption algorithm).
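Message Forgery. Without hashing, a hacker can pick any value as a "signature" and use the public key (which is easily obtainable) to compute the plaintext message that this signature verifies against, producing a valid message/signature pair that the sender never signed and that can be assembled and used later.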
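The forgery risk can be demonstrated against a toy scheme that signs the message itself with textbook RSA. The attacker below uses only the public key; the modulus is a toy value built from two Mersenne primes, and the names are hypothetical.

```python
import secrets

# Public key of the victim; the attacker never learns the private exponent.
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537

def verify_raw(message: int, signature: int) -> bool:
    """Verification for a scheme that signs the message directly, without hashing."""
    return pow(signature, e, n) == message

# Attacker: choose any signature first, then derive the message it matches.
forged_signature = secrets.randbelow(n - 2) + 2
forged_message = pow(forged_signature, e, n)

accepted = verify_raw(forged_message, forged_signature)
```

The forged message is random-looking, but it passes verification. When the digest is signed instead, the attacker would additionally need real data whose hash equals that derived value, which the irreversibility of the hash prevents.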

Reference: Why hash the message before signing it with RSA?——StackExchange

At this point, we have only one requirement left❗️ and one target question 🥣 to answer!

Now, let's think about a question:

Q: How does the receiver obtain the public key used for signature verification?

Please continue reading.

Digital Certificates#

💡 To achieve Requirement 3: Authentication.

🥣 Answering Target Question 4: Why is a digital certificate necessary for information security?

Before understanding digital certificates, regarding the question "How does the receiver obtain the public key used for signature verification?" we might immediately think that the sender can simply attach the public key when sending the data and signature, and everything would be in order～ As shown in the image:

However, this introduces a new question Q: How can the receiver confirm that the public key has not been maliciously replaced by someone else? In other words, the identity of the public key is unclear.

🎉 Ding ding ding～ Now it's time for the digital certificate to come into play.

Components#

Let's first look at the components of a digital certificate. It consists of the public key, identity information of the public key, and their signature (another signature).

Note:

The private key used to sign the public key data does not belong to the sender: it belongs to a separate public-private key pair owned by a trusted Certificate Authority (CA), which we certainly do not have access to.
How should we understand a CA? Think of how our ID cards are issued by the government.

Public Key Wrapped in a Digital Certificate#

Now, the content we send has changed slightly: public key → certificate.

That is, the original data + signature + public key has changed to data + signature + certificate.

The certificate contains the public key needed by the receiver to verify signature A, as well as identity information of the public key and signature B:

The existence of identity information eliminates the risk of unclear public key identity (❗️ thus fulfilling Requirement 3: Authentication, 🥣 answering Target Question 4: Why is a digital certificate necessary for information security?);
Signature B guarantees the integrity of the public key and its identity information.

However, because of signature B, we also need to verify signature B when retrieving the public key:

The public key used to verify signature B belongs to the CA and is contained in the CA certificate.

(Here, we can review the components of the certificate: public key + identity information of the public key + their signature.)
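A toy model of these three components might look like the following. The `Certificate` class, the `example.com` subject, and the textbook RSA keys built from Mersenne primes are all assumptions for illustration; real certificates follow the X.509 format and use padded signatures. Python 3.8+ is assumed.

```python
import hashlib
from dataclasses import dataclass

def make_keypair(p: int, q: int, e: int = 65537):
    """Return (public_key, private_key) for toy textbook RSA."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)

def digest_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")

def sign(data: bytes, private_key) -> int:
    d, n = private_key
    return pow(digest_int(data), d, n)

def verify(data: bytes, signature: int, public_key) -> bool:
    e, n = public_key
    return pow(signature, e, n) == digest_int(data)

@dataclass
class Certificate:
    subject: str            # identity information of the public key
    public_key: tuple       # the sender's public key
    signature: int = 0      # signature B, produced by the CA

    def signed_content(self) -> bytes:
        return f"{self.subject}|{self.public_key}".encode()

ca_public, ca_private = make_keypair(2**89 - 1, 2**107 - 1)
site_public, site_private = make_keypair(2**61 - 1, 2**89 - 1)

# The CA signs the public key together with its identity information.
cert = Certificate("example.com", site_public)
cert.signature = sign(cert.signed_content(), ca_private)

def certificate_is_valid(certificate: Certificate, ca_public_key) -> bool:
    """Receiver: verify signature B before trusting the embedded public key."""
    return verify(certificate.signed_content(), certificate.signature, ca_public_key)
```

If an attacker swaps in a different public key but keeps the old signature, `certificate_is_valid` returns `False`, because the signed content no longer matches.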

♻️ You might have another question Q: How does the receiver obtain this CA certificate? Even if they have the CA certificate, how do they verify the signature of the CA certificate? It seems to enter an infinite loop.

In fact, CA certificates are generally built into the system/software during installation, so we should trust them, right?

Next, let's learn about the trust chain of certificates.

Certificate Trust Chain#

Based on the position of the certificate in the trust chain, certificates can be divided into three types:

Root Certificate
Intermediate Certificate
Leaf Certificate

🌰 Example: My developer certificate A (Apple Development) is issued by intermediate certificate B (Apple Worldwide Developer Relations Certification Authority, built-in when installing Xcode), and intermediate certificate B is issued by root certificate C (Apple Root CA, built into the system), while root certificate C is self-signed, that is, issued by its own CA, as C is at the top of the trust chain and has the final say.
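Walking such a chain can be sketched as a loop: each certificate is checked with the public key of the one above it, and the top one must match a root that is already built in. The certificate layout (plain dictionaries), the generic names, and the toy textbook RSA keys are assumptions for illustration and do not reflect how Apple or any real platform implements this. Python 3.8+ is assumed.

```python
import hashlib

def make_keypair(p, q, e=65537):
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)

def digest_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")

def body_of(cert) -> bytes:
    return f"{cert['subject']}|{cert['issuer']}|{cert['public_key']}".encode()

def issue(subject, subject_public, issuer, issuer_private):
    """Create a certificate for `subject`, signed with the issuer's private key."""
    cert = {"subject": subject, "issuer": issuer, "public_key": subject_public}
    d, n = issuer_private
    cert["signature"] = pow(digest_int(body_of(cert)), d, n)
    return cert

def signed_by(cert, issuer_public) -> bool:
    e, n = issuer_public
    return pow(cert["signature"], e, n) == digest_int(body_of(cert))

root_pub, root_priv = make_keypair(2**107 - 1, 2**127 - 1)
mid_pub, mid_priv = make_keypair(2**89 - 1, 2**107 - 1)
leaf_pub, leaf_priv = make_keypair(2**61 - 1, 2**89 - 1)

root = issue("Root CA", root_pub, "Root CA", root_priv)            # self-signed
mid = issue("Intermediate CA", mid_pub, "Root CA", root_priv)
leaf = issue("Developer", leaf_pub, "Intermediate CA", mid_priv)

trusted_roots = {"Root CA": root_pub}     # stands in for the system's built-in roots

def chain_is_trusted(chain) -> bool:
    """`chain` is ordered leaf first, root last."""
    for cert, issuer_cert in zip(chain, chain[1:]):
        if cert["issuer"] != issuer_cert["subject"]:
            return False
        if not signed_by(cert, issuer_cert["public_key"]):
            return False
    top = chain[-1]
    return (trusted_roots.get(top["subject"]) == top["public_key"]
            and signed_by(top, top["public_key"]))

trusted = chain_is_trusted([leaf, mid, root])
```

The loop ends at the root because there is nothing above it to ask; trust in the root comes from it being pre-installed, not from another signature.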

Now, you can also check your Mac > Keychain Access > Certificates for a deeper understanding.

There is also an interesting question Q: Is it possible for a certificate to be tampered with?

Direct modification? First, consider that directly modifying the certificate's content (public key and identity information) is certainly not feasible. Hackers do not have the CA's private key, so they cannot re-sign the certificate content.
Direct replacement? This is also not possible. The certificate contains the sender's identity information. For example, if I access a website (sender) through a browser (receiver), the certificate of the website will contain domain information, which the browser can directly compare with the requested domain to determine whether the certificate has been swapped for someone else's.

Reference: Thoroughly Understand the Encryption Principles of HTTPS——Zhihu

Extension (Standards, Systems, Standards): Cryptography Basics (Part 2) Digital Certificates, Key Basics——Blog

At this point, we have fulfilled all three requirements❗️ and answered all four target questions 🥣! Let's take a moment to relax～

Finally, let's discuss some additional content:

Related technologies in information security (SSL/TLS, SSH, iOS Signing)
Some practical applications (OpenSSL, WireShark).

Due to the length of this article, please refer to another article: Additional Content: Information Security | How to Build Trust in the Internet Era?.

Returning to the Initial Goals#

If you can only remember a little from this presentation🤏, then try to understand the answers to the questions below!

Why is information transmission generally done using symmetric encryption + asymmetric encryption? Can't we just use one of them?
Why is a digital signature needed?
Why is hashing required before signing?
Why is a digital certificate necessary?

Additionally, have you achieved the ultimate goal of this article? I look forward to your feedback!

Ultimate Goal: When we encounter cryptography-related issues, we will no longer feel fear or confusion.

——Written on a day with a red rainstorm warning⛈️, and I decided to take a day off, in Shenzhen