1 File and Directory Operations and the Implementation Approach of ls

Course Content#

File Operations#

The previously learned cp, mv, cat commands all involve reading and writing files
- cp: read → write
- mv: read → write → delete [when moving across file systems; within one file system it is only a rename]
- cat: read → write
How are these steps implemented?

【Low-level operations, based on file descriptors】

open#

Open or create a file [related calls: openat, creat]

man 2 open【Focus on function prototype and description】
Prototype
- Return value int: file descriptor or -1
  - Common file descriptors: 0-stdin, 1-stdout, 2-stderr
  - -1: an error occurred, and errno will be set [available for perror, see code demonstration]
- flags: file opening method
- [PS] No need to specifically remember header files
Description
- System call: a request to the kernel to do, on the program's behalf, things the program has no permission to do itself
- If the file passed to open does not exist, open can create it [when O_CREAT is set in flags]
  - O_CREAT
    - A flag of the open function
    - By C convention, an all-uppercase name indicates a macro definition
    - The flags argument is an int used as a bitmask
      - With 32 bits it can hold up to 32 independent on/off states, one per bit
      - Bits are set, tested, and toggled with bitwise OR, AND, and XOR
- File descriptor
  - A small, non-negative integer used by subsequent system calls [read, write...]
  - The return value is always the lowest-numbered descriptor not currently open in the process
    - Can be used to estimate the number of open files [if it returns 1000, descriptors 0 to 999 are all in use, so the process already has at least 1000 open]
- After opening the file, the file offset is by default at the beginning of the file
- Open file description
  - Each call to open creates a new open file description, which is an entry in the system-wide table of open files
    - Records the file offset and the file status flags
  - [PS] The file descriptor is a reference to an open file description and is not affected by later changes to the pathname
- ⭐flags
  - Must include one of O_RDONLY, O_WRONLY, O_RDWR
  - Flags are combined using bitwise OR
  - O_CREAT create
  - O_TRUNC truncate
  - O_DIRECT direct IO
    - Direct IO: data moves between the user buffer and the device, bypassing the kernel page cache as far as possible [for synchronous writes that return only after the data reaches the disk, see O_SYNC]
    - Buffered IO
      - The buffer is flushed when: ① enough data has accumulated; ② a fixed time has elapsed
      - Writing a single character 'a' does not go to disk immediately, which reduces the number of costly disk operations
      - But buffered data can be lost in a power outage
      - [PS]
        - The smallest unit a file system reads or writes on a disk is a block, commonly 4 KiB
        - Therefore, the disk is also called a block device
        - Similar to how printf output to stdout is flushed [line buffering]
          - On a newline or at normal program exit, the buffer is flushed automatically
          - When the buffer is full, it is flushed automatically
          - The fflush function flushes it manually
  - O_NONBLOCK non-blocking IO
    - Blocking
      - For example: scanf has to wait for data on standard input before the program can continue
      - Disadvantage: the process sits idle while it waits
    - Non-blocking
      - Does not wait; the call returns immediately [with EAGAIN if nothing is ready]
      - Disadvantages
        - Requires frequent checking [polling], which also wastes resources
        - Or requires some monitoring mechanism, which adds technical cost
  - O_TMPFILE create an unnamed temporary file
    - The file has no name in the file system and is deleted when its last file descriptor is closed [including at process exit], unless it has been given a name in the meantime
    - Similar in purpose to the system's temporary folder /tmp
❗ Before reading and writing files at a low level, you need to call the open function to obtain the file descriptor

read#

Read data through the file descriptor

man 2 read
Prototype
- Return value ssize_t: number of bytes read or -1
  - A name ending in _t is generally a typedef rather than a built-in type
  - Guess: also one of the basic types, possibly long long, possibly int
  - Find the specific type step by step through ctags: ctrl + ] , ctrl + o
    - Answer: int [on a 32-bit system]; long int [on a 64-bit system]
    - [PS] Logically, on a 32-bit system, the size of long int is equivalent to int
- buf, count: read at most count bytes of data into buf each time
Description + Return value
- Attempts to read at most count bytes into the buffer
  - Cases where fewer than count bytes are read: interrupted by a signal; fewer than count bytes are currently available
- After each successful read of num [≤ count] bytes, the file offset [like a pointer] automatically advances by num
  - If the file offset is at EOF [no data to read], the function returns 0
- count
  - If set to 0, read may still detect errors; if no error is detected, it returns 0
  - If greater than SSIZE_MAX [the maximum value of ssize_t], the result is implementation-defined [POSIX.1 standard]
- Return value
  - ≤ count
  - Returns -1 on error, and errno will be set
[PS] ERRORS
- EAGAIN
  - The file descriptor [including a socket] has been marked non-blocking with O_NONBLOCK and the read would block; instead of waiting, read returns -1 with errno set to EAGAIN

write#

Write data through the file descriptor

man 2 write
Prototype
- Very similar to read
Description + Return value
- Very similar to read
- Cases where fewer than count bytes are written: insufficient space on the device; system resource limits; interrupted by a signal
- If O_APPEND [append] is set when opening the file
  - The file offset is moved to the end of the file before each write, so every write appends
  - Otherwise, the offset starts at the beginning of the file, and writes overwrite existing data at the current offset

close#

Close a file descriptor

man 2 close
Mainly just close the file descriptor
[PS]
- Record locks held by the process on the file are removed
- Special cases
  - If the last file descriptor referring to the open file description is closed, the resources of that file description are released
  - If the file has already been removed with unlink and this was the last descriptor referring to it, the file is deleted
- Do not worry about what the kernel specifically does for now

【Standard file operations, based on file pointers】

<stdio.h>

fopen#

Open a file through a stream

man fopen
Prototype
- Return value FILE *: file pointer
  - Originally a macro definition, here uppercase is for compatibility
- mode
  - Type is const char * [a string], not int
Description
- Associates a stream with the opened file
  - [PS] Data sent over the network is a byte stream; a file stream has type FILE *
- mode
  - r / r+: read / read-write
    - Stream at the beginning of the file
  - w / w+: write / read-write
    - Stream at the beginning of the file
    - If the file exists, truncate the file [original data will be cleared upon opening]
    - If the file does not exist, create the file
  - a / a+: append / read and append
    - Writes always append at EOF; the initial read position is the beginning of the file in glibc [POSIX does not specify it]
    - If the file does not exist, it will be created
  - +: read and write
  - [PS]
    - b: can be at the end of the mode string or between two characters; it marks binary files but has no effect on Linux and other POSIX systems
    - ❓ The permissions of any created file are modified by the process's umask value
Return value
- On success, returns the file pointer
- On error, returns NULL and sets errno

fread, fwrite#

Binary stream IO

man fread / fwrite
fread: read nmemb items [size bytes each] from stream into ptr
fwrite: write nmemb items [size bytes each] from ptr to stream
Return value size_t: number of items read / written [success]
- [size_t is the unsigned counterpart of ssize_t]
- On error or when encountering EOF early 👉 0 ≤ return value < nmemb
  - ❗ Therefore, cannot distinguish between EOF and error through return value, need to use feof, ferror to confirm
- [PS] When size is 1, the return value equals the number of bytes transferred

fclose#

Flush the stream and close the file descriptor

man fclose
Flushing the stream actually calls fflush
Return value
- 0 [success]
- -1(EOF), and sets errno [failure]
- Undefined behavior [if an illegal pointer or one that has already been fclose'd is passed]
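⭐ Standard IO is buffered IO by default [stderr is the exception: it is unbuffered]
- The library collects data in a user-space buffer and cannot write to the device itself; the actual write is done by the kernel [write system call] when the buffer is flushed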
❓ Standard IO is more suitable for text [user], while low-level IO is more suitable for binary files

Directory Operations#

Directories are essentially files too [early Unix versions allowed opening and reading them directly]

opendir#

man opendir
Return value DIR *: directory stream pointer or NULL
- The directory stream is initially positioned at the first entry of the directory
- Returns NULL and sets errno on error

readdir#

man readdir
Return value struct dirent *: directory entry or NULL
- Pointer to the next directory entry [structure] in the directory stream
  - Main fields of the structure: d_ino, d_name
  - [PS]
    - Each call returns the next entry in the directory
    - d_off: same as the value returned by telldir, similar to ftell()
      - This offset is an opaque position in the directory stream, not a byte count like an ordinary file offset
      - ftell() gets the value of the current file's position indicator
- NULL [when reaching the end of the directory stream or an error occurs]

closedir#

Close the directory stream; returns 0 on success, or -1 with errno set on error

Basic Idea of Implementing ls -al#

ls -al effect
- The required information includes: file permissions, link count, username, group name, file size, modification time, file name
Idea
- readdir()
  - man readdir
  - Read each file in the directory
  - Can obtain the file name
- stat(), lstat()
  - man 2 stat
  - Obtain file information based on file path: stat structure
  - Can obtain file permissions, hard link count, uid, gid, file size, modification time
  - Refer to the EXAMPLE inside: lstat
  - Difference between lstat() and stat()
    - lstat() reports on a symbolic link itself, while stat() follows the link and reports on the file it points to
- getpwuid()
  - man getpwuid
  - Obtain passwd structure based on uid
  - Can obtain the corresponding username
- getgrgid()
  - man getgrgid
  - Obtain group structure based on gid
  - Can obtain the corresponding group name
- To implement the lookups yourself → read the file and split each line on ':'
  - User information: /etc/passwd
  - Group information: /etc/group
Other Details
- Color
- Sorting
- The number of columns in plain ls output changes with the terminal width
  - Get the terminal size
  - Refer to ioctl, man ioctl
- The column widths can be found by brute force, binary search, or successive approximation

Code Demonstration#

Low-level File Operations#

⭐ See comments for details, focus on usage
❗ Avoid garbled characters
- Leave one position at the end of the string buffer for '\0'
  - sizeof(buff) - 1
- When the last read returns fewer than 512 bytes, stale bytes left in the buffer from the previous read must not be printed
  - Method ①: call memset(buff, 0, sizeof(buff)) before each read
  - Method ②: always terminate the data right after the bytes just read, buff[nread] = '\0'
- [PS] When learning system-level commands, do not need to focus too much on these
perror prints a system error message
- man 3 perror
- Prototype
  - void perror(const char *s)
  - open, fopen, and others set errno when an error occurs
- Description
  - Prints s, a colon, and the message for the current errno value on stderr
  - s usually contains the name of the function that failed
Create a common header file folder to store commonly used header files
- head.h

Standard File Operations#

The buffer is declared inside the loop, so it is initialized on every iteration
nread is non-negative [size_t] and cannot distinguish between EOF and error; use feof / ferror

Standard IO is Buffered IO#

The first "Hello world" appears immediately because stderr is unbuffered
The second "Hello world" goes to stdout and would normally not appear until sleep ends, but it can be made to appear immediately by either 👇
- Manually flush the buffer: fflush
- Output a newline
The sleep function is in unistd.h

Additional Knowledge Points#

ulimit -a shows the resource limits, including the maximum number of open files [ulimit -n]
- The default soft limit on open files per process is commonly 1024
  - Once the limit is reached, open fails with -1 and errno set to EMFILE; the system itself does not crash
  - [PS]
    - Resource exhaustion also needs to be considered for memory
    - Be a responsible program: manually close / free, output error logs
stdout is line-buffered only when it refers to a terminal; otherwise it is fully buffered, and stderr is unbuffered

Points for Consideration#

❓ Does saving a file immediately write to disk?
- Refer to "Are file edits in Linux directly saved into disk?" (StackExchange)

Tips#

In vim, Shift + K opens the man page for the word under the cursor
Recommended copy and translate software: CopyTranslator
Online documentation for man pages: man page (die.net)