Course Content#
File Operations#
- The cp, mv, and cat commands covered earlier all read and write files
- cp: read → write
- mv: read → write → delete [when moving across filesystems; within one filesystem mv just renames the file]
- cat: read → write [to standard output]
- How are these steps implemented?
【Low-level operations, based on file descriptors】
open#
Open or create a file [related calls: openat, creat]
- man 2 open【Focus on function prototype and description】
- Prototype
- Return value int: file descriptor or -1
- Common file descriptors: 0-stdin, 1-stdout, 2-stderr
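- -1: an error occurred, and errno is set [can be reported with perror, see Code Demonstration]
- flags: how the file is opened [access mode plus option bits]
- mode: permission bits for a newly created file, required when O_CREAT is in flags [modified by the process's umask]
- [PS] No need to memorize the header files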
- Description
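- System call [system call]: a request to the kernel to do something on the process's behalf that user-space code is not allowed to do directly [e.g., access the disk]
- If the file named in open does not exist, it is created when O_CREAT is specified in flags [otherwise open fails with ENOENT]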
- O_CREAT
- A flag of the open function
- In C, all-uppercase names conventionally indicate macros
- Each flag is an int constant with particular bits set, so the flags argument is a bitmask
- A 32-bit int can hold up to 32 independent on/off options, one per bit
- Bits are combined with OR, tested with AND, and toggled with XOR
- File descriptor [file descriptor]
- A small, non-negative integer used as an argument to subsequent system calls [read, write, ...]
- open returns the lowest-numbered file descriptor not currently open in the process
- So the value hints at how many files are open [if it returns 1000, descriptors 0 through 999 are all in use, so the process has at least 1001 open descriptors including the new one]
- After open, the file offset is at the beginning of the file
- File description [file description]
- Each call to open creates a new open file description, an entry in the system-wide table of open files
- Records the file offset and the file status flags
- [PS] A file descriptor is a reference to an open file description; the reference is unaffected if the pathname is later removed or changed to refer to a different file
- ⭐flags
- Must include exactly one of the access modes O_RDONLY, O_WRONLY, O_RDWR
- Flags are combined using bitwise OR
- O_CREAT create
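- O_TRUNC truncate [if the file already exists, is a regular file, and is opened for writing, its length is cut to 0]
- O_DIRECT direct IO
- Direct IO: data moves directly between the user buffer and the device, bypassing the kernel's page cache; it tries to transfer synchronously but does not give the guarantees of O_SYNC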
- Buffered IO [the default]
- The buffer is flushed when: ① enough data has accumulated; ② a fixed time has passed
- Writing a single character 'a' does not reach the disk immediately, which reduces the cost of many small writes
- But data that has not been flushed yet can be lost in a power outage
- [PS]
- A disk is read and written in whole blocks [filesystem blocks are commonly 4 KiB], not single bytes
- Therefore, a disk is called a block device
- Similar to when printf output to stdout appears [stdout is line-buffered when it refers to a terminal]
- On a newline or normal program exit, the buffer is flushed automatically
- When the buffer is full, it is flushed automatically
- fflush flushes the buffer manually
- O_NONBLOCK non-blocking IO
- Blocking
- For example: scanf waits until input arrives on the standard input stream before the program continues
- Disadvantage: the process sits idle while waiting and cannot do other work
- Non-blocking
- Calls return immediately instead of waiting [read fails with EAGAIN if no data is ready]
- Disadvantages
- The program must check again and again [polling], which wastes CPU time
- Or it needs a monitoring mechanism [e.g., select, poll, epoll], which adds complexity
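- O_TMPFILE create an unnamed temporary file
- The pathname passed to open names a directory; the file gets no name there, and its contents are discarded when the last file descriptor referring to it is closed [unless it is later given a name]
- Similar in purpose to files in the system's temporary folder /tmp, but nothing has to be cleaned up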
- ❗ Before reading and writing files at a low level, you need to call the open function to obtain the file descriptor
read#
Read data through the file descriptor
- man 2 read
- Prototype
- Return value ssize_t: number of bytes read or -1
- The _t suffix conventionally marks a type defined with typedef in the system headers
- Guess: an alias for one of the basic integer types, possibly long long, possibly int
- Find the actual type step by step with ctags: Ctrl + ] jumps to the definition, Ctrl + O jumps back
- Answer [glibc]: int on a 32-bit system; long int on a 64-bit system
- [PS] On a typical 32-bit system, long int is the same size as int [both 32 bits]
- buf, count: each call reads at most count bytes into buf
- Description + Return value
- Attempts to read up to count bytes into the buffer
- Fewer than count bytes may be read: the call was interrupted by a signal; fewer bytes were available [e.g., near EOF, or reading from a pipe or terminal]
- After each successful read of num [≤ count] bytes, the file offset [like a position pointer] advances by num
- If the file offset is at or past EOF [no data to read], read returns 0
- count
- If count is 0, read may detect errors; if no error is detected, it returns 0 and has no other effect
- If count is greater than SSIZE_MAX [the maximum value of ssize_t], the result is implementation-defined [POSIX.1]
- Return value
- On success, the number of bytes read [0 ≤ n ≤ count; 0 means EOF]
- On error, -1 is returned and errno is set
- [PS] ERRORS
- EAGAIN
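- The file descriptor refers to a file other than a socket, has been marked non-blocking [O_NONBLOCK], and the read would block; instead of waiting, read returns -1 with errno set to EAGAIN [for sockets: EAGAIN or EWOULDBLOCK]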
write#
Write data through the file descriptor
- man 2 write
- Prototype
- Very similar to read
- Description + Return value
- Very similar to read
- Fewer than count bytes may be written: insufficient space on the device; a resource limit [e.g., RLIMIT_FSIZE]; interrupted by a signal
- If O_APPEND [append] was set when the file was opened
- Before each write, the file offset [offset] is moved to the end of the file, so the write appends
- Otherwise, the write starts at the current file offset [the beginning, right after open] and overwrites the data there
close#
Close a file descriptor
- man 2 close
- Closes the file descriptor so it no longer refers to any file and can be reused
- [PS]
- Record locks held by the process on the file are removed [regardless of which descriptor was used to obtain them]
- Special cases
- If the descriptor was the last one referring to its open file description, the resources of that description are freed
- If the descriptor was the last reference to a file that has already been removed with unlink, the file is deleted
- No need to worry about what the kernel does in detail for now
【Standard file operations, based on file pointers】
<stdio.h>
fopen#
Open a file and associate a stream with it
- man fopen
- Prototype
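- Return value FILE *: file pointer
- FILE is now a typedef for a structure; the uppercase name is historical [early stdio defined it as a macro] and kept for compatibility
- mode
- Type is const char * [a string such as "r"], not an int bitmask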
- Description
- Associates a stream [stream] with the file
- [PS] Data sent over a network is a byte stream; a file stream has type FILE *
- mode
- r / r+: read / read-write
- Stream at the beginning of the file
- w / w+: write / read-write
- Stream at the beginning of the file
- If the file exists, truncate the file [original data will be cleared upon opening]
- If the file does not exist, create the file
- a / a+: append / read and append
- Writes always go to the end of the file; for a+, reading starts at the beginning of the file [glibc]
- If the file does not exist, it will be created
- +: open for both reading and writing
- [PS]
- b: can appear at the end of the mode string or between the two characters; it marks binary files on some other systems but is ignored on all POSIX systems, including Linux
- ❓ Any created file gets mode 0666 [rw-rw-rw-] modified by the process's umask [e.g., umask 022: 0666 & ~022 = 0644]
- Return value
- On success, returns the file pointer
- On error, returns NULL and sets errno
fread, fwrite#
Binary stream IO
- man fread / fwrite
- fread: reads nmemb items of data, each size bytes long, from stream into ptr
- fwrite: writes nmemb items of data, each size bytes long, from ptr to stream
- Return value size_t: number of items successfully read / written
- [size_t is unsigned; ssize_t is its signed counterpart]
- On error or an early EOF, a short item count is returned 👉 0 ≤ return value < nmemb
- ❗ Therefore, the return value cannot distinguish EOF from an error; use feof and ferror to find out
- [PS] When size is 1, the return value equals the number of bytes transferred
fclose#
Flush the stream and close the underlying file descriptor
- man fclose
- The stream is flushed with fflush
- Return value
- 0 [success]
- EOF [usually -1], and sets errno [failure]
- Undefined behavior [if an invalid pointer, or one already passed to fclose, is passed]
- ⭐ Standard IO is buffered IO by default [stderr is the exception: it is unbuffered]
- The library cannot write to the device by itself: when it flushes, it calls write, and the kernel performs the actual IO
- ❓ Standard IO is more suitable for text [user-facing data], while low-level IO is more suitable for binary files
Directory Operations#
Directories are essentially files too [early versions allowed reading them directly; today their entries are read through the functions below]
opendir#
- man opendir
- Return value DIR *: directory stream pointer or NULL
- The directory stream is by default placed at the first entry of the directory
- Returns NULL and sets errno on error
readdir#
- man readdir
- Return value struct dirent *: directory entry or NULL
- A pointer to a structure describing the next entry in the directory stream
- Main fields of the structure: d_ino [inode number], d_name [file name]
- [PS]
- Each call returns the next entry
- d_off: the same value telldir would return at the current position, similar in spirit to ftell()
- Despite its name, it is seldom a byte offset on modern filesystems [entries differ in size]; treat it as an opaque value
- ftell() returns the current value of a stream's file position indicator
- NULL [at the end of the directory stream, or on error, in which case errno is set]
closedir#
- Close the directory stream [this also closes its underlying file descriptor]
Basic Idea of Implementing ls -al#
- ls -al effect
- The required information includes: file type and permissions, hard link count, user name, group name, file size, modification time, file name
- Idea
- readdir()
- man readdir
- Read each file in the directory
- Can obtain the file name
- stat(), lstat()
- man 2 stat
- Obtains file information from a path, filling in a struct stat
- Provides file type and permissions, hard link count, uid, gid, file size, modification time
- See the EXAMPLE section of the man page, which uses lstat
- Difference between lstat() and stat()
- If the path is a symbolic link, lstat() returns information about the link itself instead of following it to the file it points to
- getpwuid()
- man getpwuid
- Obtain passwd structure based on uid
- Can obtain the corresponding username
- getgrgid
- man getgrgid
- Obtain group structure based on gid
- Can obtain the corresponding group name
- If implementing it yourself → read the file and split each line on ':'
- User information: /etc/passwd
- Group information: /etc/group
- Other Details
- Color
- Sorting
- The number of columns in plain ls output changes with the terminal width
- Get terminal size
- Refer to ioctl [request TIOCGWINSZ], man ioctl
- The number of columns can be found by brute force, binary search, or gradual approximation
Code Demonstration#
Low-level File Operations#
- ⭐ See comments for details, focus on usage
- ❗ Avoid garbled output
- Leave one position at the end of the string buffer for '\0'
- Read at most sizeof(buff) - 1 bytes
- The last read may return fewer than 512 bytes, so bytes left over from the previous read must not be printed
- Method ①: clear the buffer manually with memset(buff, 0, sizeof(buff)) before each read
- Method ②: always terminate the new data with '\0': buff[nread] = '\0'
- [PS] When learning system-level calls, there is no need to dwell on these details
- perror prints a system error message
- man 3 perror
- Prototype
- fopen and other functions set errno when an error occurs
- Description
- Prints a message on stderr describing the last error recorded in errno
- The argument s usually contains the name of the function that failed; the output has the form "s: error message"
- Create a common header folder to hold frequently used header files
- head.h
Standard File Operations#
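- The buffer is declared inside the loop, so if it has an initializer [e.g., = {0}] it is re-initialized on every iteration
- nread [the size_t returned by fread] is never negative, so it cannot distinguish EOF from an error; check feof / ferror instead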
Standard IO is Buffered IO#
- The first "Hello world" goes to stderr, which is unbuffered, so it appears immediately
- The second "Hello world" goes to stdout; without a newline it would stay in the buffer until sleep ends and the program exits, but it can be output immediately by 👇
- Flushing the buffer manually: fflush
- Ending the output with a newline [stdout on a terminal is line-buffered]
- The sleep function is declared in unistd.h
Additional Knowledge Points#
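- ulimit -a shows the resource limits, including the maximum number of open files [ulimit -n shows just that one]
- The default limit on files opened per process is commonly 1024 [the actual value varies between systems]
- Exceeding it does not crash the system: open fails with errno EMFILE, and a program that does not check for this may misbehave
- [PS]
- Resource exhaustion also includes memory [e.g., leaks]
- Be a responsible program: close / free resources manually, output error logs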
- stdout is line-buffered only when it refers to a terminal [fully buffered when redirected to a file or pipe]; stderr is unbuffered
Points for Consideration#
- ❓ Does saving a file write it to disk immediately?
- Refer to Are file edits in Linux directly saved into disk? [StackExchange]
Tips#
- In vim, Shift + K opens the man page for the word under the cursor
- Recommended copy and translate software: CopyTranslator
- Online documentation for man manuals: man page [die.net]