Course Content#
File Operations#
- The previously learned cp, mv, cat commands all involve reading and writing files
- cp: read → write
 - mv: read → write → delete [when moving across file systems; within one file system it is only a rename]
 - cat: read → write
 
 - How are these steps implemented?
 
【Low-level operations, based on file descriptors】
open#
Open or create a file [related calls: openat, creat]
- man 2 open【Focus on function prototype and description】
 - Prototype: int open(const char *pathname, int flags); int open(const char *pathname, int flags, mode_t mode);

- Return value int: file descriptor on success, or -1 on error

- Common file descriptors: 0-stdin, 1-stdout, 2-stderr
 - -1: an error occurred and errno is set [perror can print it, see Code Demonstration]
 
 - flags: how the file is opened [access mode plus optional flags]
 - [PS] No need to memorize the header files; the man page lists them
 
 - Description

- System call: asks the kernel to perform an operation that user-space code is not allowed to perform directly
 - If the file passed to open does not exist, open can create it [when O_CREAT is set in flags]
- O_CREAT
- One of the flags of the open function
 - By C convention, an all-uppercase name indicates a macro
 - The underlying type is int, used as a bitmask
- With a 32-bit int there are 32 bits, and each bit can represent one on/off option
 - Options are combined with bitwise OR, tested with bitwise AND, and toggled with XOR
 
 - File descriptor
- A small non-negative integer, used by subsequent system calls [read, write...]
 - The return value is always the lowest-numbered descriptor not currently open in the process
- Can be used to estimate the number of open files [if open returns 1000, descriptors 0-999 are already in use]
 
 
 - After open, the file offset is at the beginning of the file by default
 - Open file description
- Each call to open creates a new open file description, an entry in the system-wide table of open files
- It records the file offset and the file status flags
 
 - [PS] A file descriptor is a reference to an open file description and is not affected if the pathname is later removed or changed
 
 - ⭐flags
- Must include exactly one of O_RDONLY, O_WRONLY, O_RDWR
 - Additional flags are combined using bitwise OR
 - O_CREAT create the file if it does not exist
 - O_TRUNC truncate an existing file to length 0
 - O_DIRECT direct IO

- Direct IO: tries to bypass the kernel page cache so that data moves directly between the user buffer and the device [it makes an effort to transfer data synchronously, but without the guarantees of O_SYNC]
 - Buffered IO
- The buffer is flushed when: ① enough data has accumulated; ② a fixed time has passed
- Writing a single character 'a' does not immediately reach the disk, which reduces the cost of many small writes
 - But data still in the buffer can be lost in a power outage
 - [PS]
- The smallest unit a file system reads or writes is a block, commonly 4 KiB
- Therefore, a disk is also called a block device
 
 - Similar to the conditions under which printf output reaches stdout [line buffering, when stdout is a terminal]
- On a newline or at normal program exit, the buffer is flushed automatically
 - When the buffer is full, it is flushed automatically
 - fflush function, manual flush
 
 - O_NONBLOCK non-blocking IO
- Blocking
- For example: scanf has to wait for input on standard input before the program can continue
 - Disadvantage: the caller can do nothing else while it waits
 
 - Non-blocking
- The call returns immediately instead of waiting
 - Disadvantages
- Requires frequent checking [polling], which also wastes resources
 - Or requires a monitoring mechanism [such as select / poll / epoll], which adds complexity
 
 - O_TMPFILE create an unnamed temporary file
- The file has no name in the file system and is lost when the last file descriptor referring to it is closed [including at process exit], unless it is given a name first
 - Similar in purpose to the system's temporary folder /tmp
 
 - ❗ Before reading and writing files at a low level, you need to call the open function to obtain the file descriptor
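
The points above about flags, error reporting, and descriptor numbering can be sketched in C as follows. This is a minimal sketch, not the code from the course: the helper names `open_for_write`, `has_flag`, and `reopen` are hypothetical, and the permission value 0644 is an assumption.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open path for writing, creating it if needed and emptying it if it exists.
 * Returns the file descriptor, or -1 after printing the reason with perror. */
int open_for_write(const char *path) {
    /* One access mode plus optional flags, combined with bitwise OR. */
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    /* The third argument (permissions) is required when O_CREAT is used. */
    int fd = open(path, flags, 0644);
    if (fd == -1) {
        perror("open");
    }
    return fd;
}

/* Test whether a single flag bit is set in a flags value (bitwise AND). */
int has_flag(int flags, int flag) {
    return (flags & flag) != 0;
}

/* Close fd and open path again read-only. Because open returns the lowest
 * free descriptor number, the new descriptor normally reuses the old number. */
int reopen(const char *path, int fd) {
    close(fd);
    return open(path, O_RDONLY);
}
```

Calling `open_for_write` on a path inside a directory that does not exist returns -1, and perror prints a message such as "open: No such file or directory" on stderr.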
 
read#
Read data through the file descriptor
- man 2 read
 - Prototype: ssize_t read(int fd, void *buf, size_t count);

- Return value ssize_t: number of bytes read, or -1 on error
- A name ending in _t is generally a typedef
 - Guess: it is an alias for one of the basic types, possibly long long, possibly int
 - Find the underlying type step by step through ctags: ctrl + ] to jump, ctrl + o to go back
 - Answer: int [on a 32-bit system]; long int [on a 64-bit system]
 - [PS] On a typical 32-bit system, long int has the same size as int
 
 - buf, count: read at most count bytes of data into buf each time
 
 - Description + Return value

- Attempts to read at most count bytes into the buffer
- Cases where fewer than count bytes are read: the call was interrupted by a signal; fewer than count bytes are currently available
 
 - After each successful read of num [≤ count] bytes, the file offset [like a pointer] automatically advances by num
- If the file offset is at EOF [no data to read], the function returns 0
 
 - count
- If set to 0, read may still detect errors; if no error is detected, it returns 0
 - If greater than SSIZE_MAX [the maximum value of ssize_t], the result is implementation-defined [POSIX.1 standard]
 
 - Return value
- ≤ count on success
 - Returns -1 on error, and errno is set
 
 
 - [PS] ERRORS
- EAGAIN

- The file descriptor [including a socket] has been set to O_NONBLOCK and the read would block, so read returns -1 instead of waiting
 
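
Because read may return fewer bytes than requested, callers usually loop. The following is a minimal sketch of such a loop; the helper name `read_full` is hypothetical, and retrying on EINTR is one common policy rather than the only correct one.

```c
#include <errno.h>
#include <unistd.h>

/* Read up to count bytes from fd into buf, retrying after short reads.
 * Returns the number of bytes actually read (less than count only at EOF),
 * or -1 on error with errno set by read. */
ssize_t read_full(int fd, void *buf, size_t count) {
    size_t total = 0;
    char *p = buf;
    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n == 0) {
            break;              /* EOF: nothing more to read */
        }
        if (n == -1) {
            if (errno == EINTR) {
                continue;       /* interrupted by a signal: try again */
            }
            return -1;          /* includes EAGAIN on a non-blocking fd */
        }
        total += (size_t)n;     /* the file offset advanced by n as well */
    }
    return (ssize_t)total;
}
```

A second call after all data has been consumed returns 0, which is how the caller recognizes EOF.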
 
write#
Write data through the file descriptor
- man 2 write
 - Prototype: ssize_t write(int fd, const void *buf, size_t count);

- Very similar to read
 
 - Description + Return value

- Very similar to read
 - Cases where fewer than count bytes are written: insufficient space on the device; a resource limit was reached; the call was interrupted by a signal
 - If O_APPEND [append] was set when opening the file
- The file offset is moved to the end of the file before each write, so the write appends
 - Otherwise, the write starts at the current file offset [the beginning of the file right after open] and overwrites existing data there
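
A short write can be handled with the same kind of loop as a short read, and O_APPEND can be seen by writing to the same file twice. This is a sketch under assumptions: the helper names `write_all` and `append_line` are hypothetical, and 0644 is an arbitrary permission choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write all count bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t count) {
    const char *p = buf;
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        count -= (size_t)n;
    }
    return 0;
}

/* Append one line of text to path, creating the file if necessary. */
int append_line(const char *path, const char *line) {
    /* O_APPEND: every write goes to the current end of the file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) return -1;
    int rc = write_all(fd, line, strlen(line));
    if (rc == 0) rc = write_all(fd, "\n", 1);
    if (close(fd) == -1) rc = -1;
    return rc;
}
```

Calling `append_line` twice on the same path leaves both lines in the file; without O_APPEND the second call would overwrite the first from offset 0.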
 
 
 
close#
Close a file descriptor
- man 2 close
 
- Mainly just closes the file descriptor so that its number can be reused
 - [PS]
- Record locks held on the file by the process are removed
 - Special cases
- If this was the last file descriptor referring to the open file description, the resources of that description are released
 - If the file has also been removed with unlink and this was its last reference, the file is deleted
 
 - Do not worry about what the kernel specifically does for now
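
The special case about deletion can be observed directly: after unlink the name disappears, yet the data stays readable through the descriptor until close. A minimal sketch, with a hypothetical helper name `unlink_then_read`; it uses lseek, which these notes do not cover, only to move the offset back to the start.

```c
#include <fcntl.h>
#include <unistd.h>

/* Create path, remove its name with unlink, then read the data back through
 * the still-open descriptor. Returns the number of bytes read, or -1 on error. */
ssize_t unlink_then_read(const char *path, char *out, size_t outsz) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) return -1;
    if (write(fd, "data", 4) != 4 || unlink(path) == -1) {
        close(fd);
        return -1;
    }
    /* The name is gone, but the open file description still refers to the file. */
    if (lseek(fd, 0, SEEK_SET) == -1) {
        close(fd);
        return -1;
    }
    ssize_t n = read(fd, out, outsz);
    /* Closing the last descriptor lets the kernel release the file's storage. */
    if (close(fd) == -1) return -1;
    return n;
}
```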
 
 
【Standard file operations, based on file pointers】
<stdio.h>
fopen#
Open a file through a stream
- man fopen
 - Prototype: FILE *fopen(const char *pathname, const char *mode);

- Return value FILE *: file pointer
- FILE was originally a macro definition; the uppercase name is kept for compatibility
 
 - mode
- Type is const char * [a string], not int
 
 - Description

- Opens the file and associates a stream with it
- [PS] "Stream" is also used for data sent over the network [byte stream]; here it is a file stream <type: FILE *>
 
 - mode
- r / r+: read / read-write
- Stream is positioned at the beginning of the file
 
 - w / w+: write / read-write
- Stream is positioned at the beginning of the file
 - If the file exists, it is truncated [original data is cleared upon opening]
 - If the file does not exist, it is created
 
 - a / a+: append / read and append
- Writes always go to the end of the file; with a+ in glibc, reading initially starts at the beginning of the file
 - If the file does not exist, it is created
 
 - +: read and write
 - [PS]
- b: can be at the end of the mode string or between two characters; marks a binary file, but has no effect on Linux
 
- ❓ The permissions of any created file are modified by the process's umask value
 
 
 - Return value

- On success, returns the file pointer
 - On error, returns NULL and sets errno
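
The difference between the w and a modes shows up in the file size after each call. The sketch below uses two hypothetical helpers, `put_text` and `file_size`; fseek and ftell are used only to measure the file.

```c
#include <stdio.h>

/* Open path with the given mode, write text, and close.
 * Returns 0 on success, -1 on failure (after perror). */
int put_text(const char *path, const char *mode, const char *text) {
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        perror("fopen");
        return -1;
    }
    int rc = (fputs(text, fp) == EOF) ? -1 : 0;
    if (fclose(fp) == EOF) rc = -1;
    return rc;
}

/* Return the size of the file in bytes, or -1 if it cannot be opened. */
long file_size(const char *path) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return -1;
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0) size = ftell(fp);
    fclose(fp);
    return size;
}
```

Writing "hello" with mode "w" and then " world" with mode "a" leaves 11 bytes; writing again with "w" truncates the file before the new text is stored.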
 
 
fread, fwrite#
Binary stream IO
- man fread / fwrite
 
- fread: read nmemb items [size bytes each] from stream into ptr
 - fwrite: write nmemb items [size bytes each] from ptr to stream
 - Return value size_t: number of items read / written [success]
- [size_t is unsigned; ssize_t is its signed counterpart]
 - On error or when EOF is reached early 👉 0 ≤ return value < nmemb
- ❗ Therefore, fread cannot distinguish between EOF and error through the return value; use feof, ferror to confirm
 
 - [PS] When size is 1, the return value equals the number of bytes transferred
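
A typical copy loop with fread and fwrite, checking ferror once fread returns 0, might look like the sketch below. The function name `copy_stream` and the 512-byte buffer are assumptions made for illustration.

```c
#include <stdio.h>

/* Copy everything from in to out in chunks.
 * Returns the number of bytes copied, or -1 if a read or write error occurred. */
long copy_stream(FILE *in, FILE *out) {
    char buf[512];
    long total = 0;
    size_t n;
    /* size is 1, so the return value is a byte count. */
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) return -1;
        total += (long)n;
    }
    /* fread returned 0: the return value alone cannot say why, so ask ferror. */
    if (ferror(in)) return -1;
    return total;   /* otherwise the loop ended because of EOF */
}
```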
 
 
fclose#
Flush the stream and close the underlying file descriptor
- man fclose
 
- Flushing the stream is equivalent to calling fflush
 - Return value
- 0 [success]
 - EOF [usually -1], and sets errno [failure]
 - Undefined behavior [if an invalid pointer or a stream that has already been fclose'd is passed]
 
 - ⭐ Standard IO is buffered IO [stderr is the exception: it is unbuffered by default]
- The library collects data in a user-space buffer and hands it to the kernel with write when the buffer is flushed
 
 - ❓ Standard IO is more suitable for text [user-facing], while low-level IO is more suitable for binary files
 
Directory Operations#
Directories are essentially files too [early systems allowed them to be opened and read directly]
opendir#
- man opendir
 
- Return value DIR *: directory stream pointer or NULL
- The directory stream is by default placed at the first entry of the directory
 - Returns NULL and sets errno on error
 
 
readdir#
- man readdir
 
- Return value struct dirent *: directory entry or NULL
- Pointer to the next directory entry [structure] in the directory stream
- Main fields of the structure: d_ino, d_name
 - [PS]
- Each call returns the next entry
 - d_off: same as the value returned by telldir, analogous to ftell()
- This offset is an opaque position in the directory stream, not a byte count as in the general meaning
 - ftell() gets the current value of a file stream's position indicator
 
 - NULL [at the end of the directory stream or when an error occurs; errno is changed only on error]
 
 
closedir#
- man closedir: closes the directory stream; returns 0 on success, -1 on error
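
The three calls are normally used together as open, loop, close. A minimal sketch follows; the helper name `count_entries` is hypothetical, and skipping "." and ".." is a choice made for this example.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Count the entries in a directory, skipping "." and "..".
 * Returns the count, or -1 on error. */
int count_entries(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror("opendir");
        return -1;
    }
    int count = 0;
    struct dirent *entry;
    /* readdir returns NULL both at the end of the stream and on error,
     * so clear errno first and inspect it after the loop. */
    errno = 0;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        count++;
    }
    int failed = (errno != 0);
    closedir(dir);
    return failed ? -1 : count;
}
```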
 
Basic Idea of Implementing ls -al#
- ls -al effect
 - The required information includes: file permissions, link count, username, group name, file size, modification time, file name
 
 - Idea
- readdir()
- man readdir
 - Read each entry in the directory
 - Can obtain the file name
 
 - stat(), lstat()
- man 2 stat
 - Obtain file information based on the file path: stat structure
 - Can obtain file type and permissions, hard link count, uid, gid, file size, modification time
 - Refer to the EXAMPLE in the man page: lstat
 - Difference between lstat() and stat()

- lstat() returns information about a symbolic link itself, without following it to the file it points to
 
 
 - getpwuid()
- man getpwuid
 - Obtain the passwd structure based on uid
 - Can obtain the corresponding username
 
 - getgrgid()
- man getgrgid
 - Obtain the group structure based on gid
 - Can obtain the corresponding group name
 
 - If implementing it yourself → read the file, split each line on ':'
- User information: /etc/passwd
 - Group information: /etc/group
 
 - Other Details
- Color
 - Sorting
 - The number of columns in plain ls output changes with the terminal width
- Get terminal size
 - Refer to ioctl, man ioctl
 
 - The column widths can be determined by brute force, binary search, or gradual approximation
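
Putting lstat, getpwuid, and getgrgid together, one line of ls -l style output for a single path could be produced roughly as below. This is a simplified sketch and not the real ls: the helper names `mode_string` and `long_listing` are hypothetical, only directories and symbolic links get a type letter, special permission bits are ignored, and an unknown owner or group is shown as "?".

```c
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Fill out (at least 11 bytes) with a string such as "drwxr-xr-x". */
void mode_string(mode_t mode, char *out) {
    const char *rwx = "rwxrwxrwx";
    out[0] = S_ISDIR(mode) ? 'd' : S_ISLNK(mode) ? 'l' : '-';
    for (int i = 0; i < 9; i++) {
        /* Bit 8 is owner-read, bit 0 is other-execute. */
        out[i + 1] = (mode & (1u << (8 - i))) ? rwx[i] : '-';
    }
    out[10] = '\0';
}

/* Format one long-listing line for path into line. Returns 0, or -1 on error. */
int long_listing(const char *path, char *line, size_t size) {
    struct stat st;
    if (lstat(path, &st) == -1) {   /* lstat: do not follow symbolic links */
        perror("lstat");
        return -1;
    }
    char mode[11], when[32];
    mode_string(st.st_mode, mode);
    struct passwd *pw = getpwuid(st.st_uid);   /* uid -> username */
    struct group *gr = getgrgid(st.st_gid);    /* gid -> group name */
    struct tm *tm = localtime(&st.st_mtime);
    if (tm == NULL || strftime(when, sizeof(when), "%b %e %H:%M", tm) == 0) {
        when[0] = '\0';
    }
    snprintf(line, size, "%s %lu %s %s %lld %s %s", mode,
             (unsigned long)st.st_nlink, pw ? pw->pw_name : "?",
             gr ? gr->gr_name : "?", (long long)st.st_size, when, path);
    return 0;
}
```

A full ls -al would call `long_listing` for each name returned by readdir, joined with the directory path.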
 
 
Code Demonstration#
Low-level File Operations#
 - ⭐ See comments for details, focus on usage
 - ❗ Avoid garbled characters
- Leave one position at the end of the string buffer for '\0'
- Read at most sizeof(buff) - 1 bytes
 
 - A final read of fewer than 512 bytes must not print leftover bytes from the previous read
- Method ①: manually memset(buff, 0, sizeof(buff)) before each read
 - Method ②: always terminate the data with '\0', buff[nread] = '\0'
 
 - [PS] When learning system-level commands, there is no need to focus too much on these
 
 - perror prints a system error message
- man 3 perror
 - Prototype: void perror(const char *s);

- open, fopen and others set errno when an error occurs
 
 - Description

- Outputs on stderr the error message corresponding to the current value of errno [set by the last failed call]
 - s usually contains the name of the function
 
 
 - Create a common header file folder to store commonly used header files
- head.h
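
The original demonstration code is not reproduced in these notes. A minimal stand-in that follows the points listed above (512-byte buffer, one slot kept for '\0', method ②, perror on failure) could look like this; the function name `cat_file` is hypothetical, and printing with "%s" assumes a text file without embedded NUL bytes.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Print the contents of path to stdout, at most 511 bytes per read.
 * Returns the total number of bytes read, or -1 on error. */
long cat_file(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    char buff[512];
    long total = 0;
    ssize_t nread;
    /* Ask for sizeof(buff) - 1 bytes so one slot is left for '\0'. */
    while ((nread = read(fd, buff, sizeof(buff) - 1)) > 0) {
        buff[nread] = '\0';   /* method ②: terminate exactly after the data */
        printf("%s", buff);
        total += nread;
    }
    if (nread == -1) {
        perror("read");
    }
    close(fd);
    return nread == -1 ? -1 : total;
}
```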
 
 
Standard File Operations#
 - With the buffer declared inside the loop, it is re-initialized on each iteration
 - nread is non-negative, so it cannot distinguish between EOF and error
 
Standard IO is Buffered IO#
 - The first "Hello world" is output immediately, because stderr is not buffered
 - The second "Hello world" goes to stdout and would normally not appear until sleep ends, but it can be output immediately through 👇
- Manually flushing the buffer: fflush
 - Outputting a newline [when stdout is a terminal]
 
 - The sleep function is declared in unistd.h
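
The original demonstration is not reproduced in these notes; the sketch below reconstructs its idea and adds a second function that makes the buffering measurable. All function names are hypothetical. `kernel_size` uses fileno and fstat to ask the kernel how large the file is, which shows that data written with fputs has not left the user-space buffer until fflush is called.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Print one message to stderr and one to stdout, then sleep.
 * If flush_first is non-zero, stdout is flushed before sleeping. */
void hello_then_sleep(unsigned seconds, int flush_first) {
    fprintf(stderr, "Hello world");   /* unbuffered: appears immediately */
    printf("Hello world");            /* no newline: stays in the stdout buffer */
    if (flush_first) {
        fflush(stdout);               /* push the buffered text out now */
    }
    sleep(seconds);
    printf("\n");
}

/* Size of the file behind fp as the kernel sees it, or -1 on error. */
long kernel_size(FILE *fp) {
    struct stat st;
    if (fstat(fileno(fp), &st) == -1) return -1;
    return (long)st.st_size;
}

/* Write text to fp and report the kernel-visible size before and after fflush. */
int buffered_write_demo(FILE *fp, const char *text, long *before, long *after) {
    if (fputs(text, fp) == EOF) return -1;
    *before = kernel_size(fp);   /* text is still in the user-space buffer */
    if (fflush(fp) == EOF) return -1;
    *after = kernel_size(fp);    /* fflush handed the data to the kernel */
    return 0;
}
```

Running `hello_then_sleep(3, 0)` in a terminal shows only the stderr text during the sleep; with `flush_first` set to 1 both messages appear at once.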
 
Additional Knowledge Points#
- ulimit -a can view the upper limit of the number of files that can be opened
 - The default soft limit on open files per process is commonly 1024
- Exceeding it does not crash the system: open fails with -1 and errno is set to EMFILE
 - [PS]
- Leaked descriptors and leaked memory both exhaust resources over time
 - Be a responsible program: manually close / free, output error logs
 
 - When connected to a terminal, stdout is line-buffered; stderr is unbuffered, and streams on regular files are fully buffered
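
The same limit can be read from inside a program with getrlimit. A small sketch, assuming a hypothetical helper name `open_file_limit`:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Store the soft limit on open file descriptors for this process in *soft
 * (the value shown by `ulimit -n`). Returns 0 on success, -1 on error. */
int open_file_limit(rlim_t *soft) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1) {
        perror("getrlimit");
        return -1;
    }
    *soft = rl.rlim_cur;   /* rl.rlim_max holds the hard limit */
    return 0;
}
```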
 
Points for Consideration#
- ❓ Does saving a file immediately write to disk?
- Refer to "Are file edits in Linux directly saved into disk?" on StackExchange
 
 
Tips#
- In vim, Shift + K opens the man page for the word under the cursor
 - Recommended copy and translate software: CopyTranslator
 - Online documentation for man pages: die.net