Bo2SS

1 File and Directory Operations and the Implementation Approach of ls

Course Content#

File Operations#

  • The previously learned cp, mv, cat commands all involve reading and writing files
    • cp: read → write
    • mv: read → write → delete
    • cat: read → write
  • How are these steps implemented?

【Low-level operations, based on file descriptors】

open#

Open or create a file [related calls: openat, creat]

  • man 2 open【Focus on function prototype and description】
  • Prototype
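    • int open(const char *pathname, int flags); / int open(const char *pathname, int flags, mode_t mode);
    • Return value int: file descriptor or -1
      • Common file descriptors: 0-stdin, 1-stdout, 2-stderr
      • -1: an error occurred, and errno will be set [available for perror, see code demonstration]
    • flags: how the file is opened [one access mode plus optional flags]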
    • [PS] No need to specifically remember header files
  • Description
    • System call: a request to the kernel to do, on your behalf, what user-space code is not permitted to do itself
    • If the file opened by open does not exist, open can create it [when O_CREAT is set in flags]
      • O_CREAT
        • A flag of the open function
        • By C convention, an all-uppercase name usually indicates a macro
        • flags is an int used as a bitmask
          • 32 bits, so it can hold up to 32 independent on/off options, one per bit
          • Options are combined with OR, tested with AND, and toggled with XOR
    • File descriptor
      • A small non-negative integer that subsequent system calls [read, write...] use to refer to the open file
      • The return value is always the lowest-numbered descriptor not currently open in the process
        • Can be used to estimate the number of open files [if open returns 1000, descriptors 0 to 999 are all in use, so at least 1000 files were already open]
    • After opening the file, the file offset is by default at the beginning of the file
    • Open file description
      • Each call to open creates a new open file description, which is an entry in the system-wide table of open files
        • Records the file offset and the file status flags
      • [PS] The file descriptor is a reference to an open file description and is not affected if the pathname is later removed or changed to refer to a different file
    • ⭐flags
      • Must include exactly one of O_RDONLY, O_WRONLY, O_RDWR [the access mode]
      • Flags are combined using bitwise OR
      • O_CREAT create
      • O_TRUNC truncate
      • O_DIRECT direct IO
        • Direct IO: data moves directly between the user-space buffer and the device, bypassing the kernel's page cache
          • It makes an effort to transfer data synchronously, but without the guarantees of O_SYNC
        • Buffered IO
          • Buffered data is written back when ① enough data has accumulated or ② a fixed interval has elapsed
            • Writing a single character 'a' does not go to disk immediately; batching small writes reduces the cost
            • But data still in the buffer is lost if the power fails
            • [PS]
              • A disk is read and written in whole blocks, typically 4 KiB each
                • Therefore, the disk is also called a block device
              • Similar to how printf output to stdout is flushed [line buffering, when stdout is a terminal]
                • On a newline or at normal program exit, stdio flushes the buffer automatically
                • When the buffer is full, it is flushed automatically
                • fflush function, manual flush
      • O_NONBLOCK non-blocking IO
        • Blocking
          • For example: scanf waits until input arrives on standard input before the program continues
          • Disadvantage: the process can do nothing else while it waits
        • Non-blocking
          • Will not wait
          • Disadvantages
            • Requires frequent checking, which also wastes resources
            • Requires some mechanism to monitor, incurring technical costs
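      • O_TMPFILE create a temporary file [Linux-specific]
        • Creates an unnamed regular file in the directory given by pathname; it is discarded when the last file descriptor referring to it is closed, unless it has been given a name
        • Comparable in purpose to scratch files placed in the system's temporary folder /tmp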
  • ❗ Before reading and writing files at a low level, you need to call the open function to obtain the file descriptor
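
The flag rules above can be sketched in a few lines of C. This is a minimal illustration, not the course's own demo code: the helper names `open_for_rewrite` and `has_flag` and the permission value 0644 are assumptions chosen for the example.

```c
#include <fcntl.h>   /* open, O_* flags */
#include <stdio.h>   /* perror */
#include <unistd.h>  /* close */

/* Hypothetical helper: open `path` for writing, creating it if needed and
 * discarding any old contents. Returns a file descriptor, or -1 on error. */
int open_for_rewrite(const char *path) {
    /* Exactly one access mode (O_WRONLY) OR-ed with optional flags. */
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    /* The third argument (permission bits) is needed because O_CREAT is set. */
    int fd = open(path, flags, 0644);
    if (fd == -1) {
        perror("open");  /* prints "open: " followed by the reason taken from errno */
    }
    return fd;
}

/* Hypothetical helper: test whether one option bit is set, using bitwise AND. */
int has_flag(int flags, int flag) {
    return (flags & flag) != 0;
}
```

Because open returns the lowest unused descriptor, closing the returned descriptor and calling the helper again in a single-threaded program normally yields the same number.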

read#

Read data through the file descriptor

  • man 2 read
  • Prototype
    • ssize_t read(int fd, void *buf, size_t count);
    • Return value ssize_t: number of bytes read or -1
      • A name ending in _t is generally a typedef
      • Guess: also one of the basic types, possibly long long, possibly int
      • Find the specific type step by step through ctags: ctrl + ] , ctrl + o
        • Answer: int [on a 32-bit system]; long int [on a 64-bit system]
        • [PS] On a 32-bit system, long int and int are both 4 bytes
    • buf, count: read at most count bytes of data into buf each time
  • Description + Return value
    • Attempts to read at most count bytes into the buffer
      • The number of bytes read can fall short of count when the call is interrupted by a signal or when fewer than count bytes are available
    • After each successful read of num [≤ count] bytes, the file offset [like a pointer] automatically advances by num
      • If the file offset is at EOF [no data to read], the function returns 0
    • count
      • If set to 0, read may still detect errors; if no error is detected, it returns 0 with no other effect
      • If greater than SSIZE_MAX [the maximum value of ssize_t], the result is implementation-defined [POSIX.1 standard]
    • Return value
      • ≤ count
      • Returns -1 on error, and errno will be set
  • [PS] ERRORS
    • EAGAIN
      • The file descriptor refers to a file [or socket] marked O_NONBLOCK and the read would block; instead of waiting, read returns -1 immediately with errno set to EAGAIN
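
The return-value cases above (a positive count, 0 at EOF, -1 on error) map naturally onto a read loop. The sketch below is only an illustration; the helper name `count_bytes` and the 512-byte buffer are assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: count the bytes in `path` by reading until EOF.
 * Returns the byte count, or -1 on error. */
long count_bytes(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    char buf[512];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;        /* a short read is normal: add it and keep going */
        } else if (n == 0) {
            break;             /* the file offset has reached EOF */
        } else if (errno == EINTR) {
            continue;          /* interrupted by a signal before any data: retry */
        } else {
            close(fd);
            return -1;         /* a real error */
        }
    }
    close(fd);
    return total;
}
```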

write#

Write data through the file descriptor

  • man 2 write
  • Prototype
    • ssize_t write(int fd, const void *buf, size_t count);
    • Very similar to read
  • Description + Return value
    • Very similar to read
    • Cases where the number of bytes written does not reach count: insufficient space on the physical medium; system resource limits; interrupted by a signal
    • If O_APPEND [append] is set when opening the file
      • The file offset is moved to the end of the file before each write, so every write appends
      • Otherwise, writing starts at the current offset [the beginning of the file right after open] and overwrites existing data
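
Since write may transfer fewer bytes than requested, callers usually loop until everything is out. A minimal sketch follows, with hypothetical helper names `write_all` and `append_text` and an assumed permission value of 0644.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `len` bytes are written.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *data, size_t len) {
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n == -1) {
            if (errno == EINTR) continue;  /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;                 /* partial write: advance and loop */
    }
    return 0;
}

/* Hypothetical helper: append `text` to `path`, creating the file if needed. */
int append_text(const char *path, const char *text) {
    /* O_APPEND moves the offset to the end of the file before every write. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) return -1;
    int rc = write_all(fd, text, strlen(text));
    if (close(fd) == -1) rc = -1;
    return rc;
}
```

Calling `append_text` twice on the same path leaves both pieces of text in the file, one after the other, whereas opening without O_APPEND would start writing over the first bytes.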

close#

Close a file descriptor

  • man 2 close
  • Mainly just closes the file descriptor, so that it no longer refers to any file and can be reused
  • [PS]
    • Record locks that the process holds on the file are removed
    • Special cases
      • If the last file descriptor referring to an open file description is closed, the resources of that description are released
      • If the descriptor was the last reference to a file that has already been removed with unlink, the file is deleted
    • Do not worry about what the kernel specifically does for now
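
Putting open, read, write and close together gives the "read → write" step behind cp mentioned at the start of this section. The following is a simplified sketch, not the real cp: the helper name `copy_file`, the 4096-byte buffer and the 0644 mode are assumptions, and it does not retry after EINTR or preserve permissions.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper sketching cp: read from `src`, write to `dst`.
 * Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in == -1) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }

    char buf[4096];
    ssize_t n;
    int rc = 0;
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {  /* a single write may be partial */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w == -1) {
                rc = -1;
                break;
            }
            off += w;
        }
        if (rc == -1) break;
    }
    if (n == -1) rc = -1;           /* read failed */
    if (close(in) == -1) rc = -1;
    if (close(out) == -1) rc = -1;  /* close can also report an error */
    return rc;
}
```

A mv across file systems would add one more step after a successful copy: deleting the source with unlink.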

【Standard file operations, based on file pointers】

<stdio.h>

fopen#

Open a file through a stream

  • man fopen
  • Prototype
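    • FILE *fopen(const char *pathname, const char *mode);
    • Return value FILE *: file pointer
      • FILE was originally a macro definition; the uppercase name is kept for compatibility
    • mode
      • Type is const char * [a string], not an int bitmask as in open
  • Description
    • Opens the file and associates a stream with it
      • [PS] A stream is a sequence of bytes: data sent over a network is a byte stream, and a file stream has type FILE *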
    • mode
      • r / r+: read / read-write
        • Stream at the beginning of the file
      • w / w+: write / read-write
        • Stream at the beginning of the file
        • If the file exists, truncate the file [original data will be cleared upon opening]
        • If the file does not exist, create the file
      • a / a+: append / read and append
        • Writes always go to the end of the file; with a+, glibc starts reading at the beginning of the file [POSIX does not specify the initial read position]
        • If the file does not exist, it will be created
      • +: read and write
      • [PS]
        • b: can be at the end of the mode string or between two characters, used for handling binary files, but has no effect on Linux and other POSIX systems
        • ❓ Any created file gets mode 0666 [rw-rw-rw-], as modified by the process's umask value
  • Return value
    • On success, returns the file pointer
    • On error, returns NULL and sets errno
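
The difference between the w and a modes is easy to check by looking at the file size after each call. In this sketch the helper names `put_text` and `stream_size` are hypothetical, and fseek / ftell are used only to measure the file.

```c
#include <stdio.h>

/* Hypothetical helper: open `path` with `mode`, write `text`, close the stream.
 * Returns 0 on success, -1 on error. */
int put_text(const char *path, const char *mode, const char *text) {
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        perror("fopen");  /* fopen returned NULL and set errno */
        return -1;
    }
    int rc = (fputs(text, fp) == EOF) ? -1 : 0;
    if (fclose(fp) == EOF) rc = -1;
    return rc;
}

/* Hypothetical helper: size of `path` in bytes, or -1 if it cannot be opened. */
long stream_size(const char *path) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0) size = ftell(fp);
    fclose(fp);
    return size;
}
```

Writing "hello" with mode "w" and then " world" with mode "a" leaves 11 bytes in the file; a further call with mode "w" truncates it before writing.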

fread, fwrite#

Binary stream IO

  • man fread / fwrite
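  • size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream); / size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
  • fread: read nmemb items of data [size bytes each] from stream into ptr
  • fwrite: write nmemb items of data [size bytes each] from ptr to stream
  • Return value size_t: number of items read / written [success]
    • [size_t is unsigned; ssize_t is its signed counterpart]
    • On error or when encountering EOF early 👉 0 ≤ return value < nmemb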
      • ❗ Therefore, cannot distinguish between EOF and error through return value, need to use feof, ferror to confirm
    • [PS] When size is 1, the return value equals the number of bytes transferred
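
Because a short item count can mean either EOF or an error, the caller has to ask the stream which one it was. A minimal sketch, assuming hypothetical helpers `save_ints` and `load_ints` that store an array of int in binary form:

```c
#include <stdio.h>

/* Hypothetical helper: write `n` ints to `path`. Returns the number of items written. */
size_t save_ints(const char *path, const int *v, size_t n) {
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return 0;
    size_t written = fwrite(v, sizeof(int), n, fp);  /* counts items, not bytes */
    if (fclose(fp) == EOF) return 0;
    return written;
}

/* Hypothetical helper: read up to `cap` ints from `path` into `v`.
 * Returns the number of items read; *hit_error is set to 1 only when the
 * short count was caused by an error rather than by reaching EOF. */
size_t load_ints(const char *path, int *v, size_t cap, int *hit_error) {
    *hit_error = 0;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        *hit_error = 1;
        return 0;
    }
    size_t got = fread(v, sizeof(int), cap, fp);
    if (got < cap && ferror(fp)) {
        *hit_error = 1;  /* otherwise feof(fp) is set: the data simply ran out */
    }
    fclose(fp);
    return got;
}
```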

fclose#

Flush the stream and close the file descriptor

  • man fclose
  • Flushing the stream actually calls fflush
  • Return value
    • 0 [success]
    • EOF [-1], and sets errno [failure]
    • Undefined behavior [if an illegal pointer or one that has already been fclose'd is passed]
  • ⭐ Standard IO is buffered IO by default [stderr is the exception]
    • The library collects data in a user-space buffer; it cannot write to the file itself and hands the data to the kernel [write system call] when the buffer is flushed
  • ❓ Standard IO is more suitable for text [user], while low-level IO is more suitable for binary files

Directory Operations#

Directories are essentially files too [early systems allowed opening and reading them directly]

opendir#

  • man opendir
  • Return value DIR *: directory stream pointer or NULL
    • The directory stream is positioned at the first entry of the directory
    • Returns NULL and sets errno on error

readdir#

  • man 3 readdir
  • Return value struct dirent *: directory entry or NULL
    • Pointer to the next directory entry [structure] in the directory stream
      • Main fields of the structure: d_ino [inode number], d_name [file name]
      • [PS]
        • Each call returns the next entry, one at a time
        • d_off: same as the value returned by telldir, similar in purpose to ftell()
          • Unlike a file offset, it is an opaque position value, not a count of bytes [directory entries differ in size]
          • ftell() gets the value of the current file's position indicator
    • NULL [when reaching the end of the directory stream or when an error occurs]
      • At the end of the stream errno is left unchanged; on error errno is set, so set errno to 0 before the call to tell the two apart

closedir#

  • Close the directory
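
The three directory calls are normally used together as open, loop, close. The sketch below assumes a hypothetical helper `list_dir` that prints every entry name (including "." and "..", if the file system reports them) and returns how many entries it saw.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: print the name of every entry in `path`.
 * Returns the number of entries, or -1 on error. */
int list_dir(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror("opendir");
        return -1;
    }

    int count = 0;
    struct dirent *entry;
    errno = 0;  /* lets us tell "end of directory" from "error" afterwards */
    while ((entry = readdir(dir)) != NULL) {
        printf("%s\n", entry->d_name);
        count++;
        errno = 0;  /* reset before the next readdir call */
    }
    int failed = (errno != 0);  /* NULL with errno set means readdir failed */
    closedir(dir);
    return failed ? -1 : count;
}
```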

Basic Idea of Implementing ls -al#

  • ls -al effect
    • The required information includes: file permissions, link count, username, group name, file size, modification time, file name
  • Idea
    • readdir()
      • man 3 readdir
      • Read each entry in the directory
      • Can obtain the file name
    • stat(), lstat()
      • man 2 stat
      • Obtain file information based on file path: stat structure
      • Can obtain file permissions, hard link count, uid, gid, file size, modification time
      • Refer to the EXAMPLE inside: lstat
      • Difference between lstat() and stat()
        • For a symbolic link, lstat() returns information about the link itself, while stat() follows the link and describes the file it points to
    • getpwuid()
      • man getpwuid
      • Obtain passwd structure based on uid
      • Can obtain the corresponding username
    • getgrgid()
      • man getgrgid
      • Obtain group structure based on gid
      • Can obtain the corresponding group name
    • To implement the lookups yourself → read the file and split each line into fields
      • User information: /etc/passwd
      • Group information: /etc/group
  • Other Details
    • Color
    • Sorting
    • The number of columns in plain ls output changes with the terminal width
      • Get terminal size
      • Refer to ioctl, man ioctl
    • The column widths can be found by brute force, binary search, or successive approximation

Code Demonstration#

Low-level File Operations#

  • Image
  • ⭐ See comments for details, focus on usage
  • ❗ Avoid printing garbage characters
    • Leave one position at the end of the string buffer for '\0'
      • Read at most sizeof(buff) - 1 bytes
    • A final read of fewer than 512 bytes [the buffer size in the demo] leaves stale bytes from the previous read after the new data; these must not be printed
      • Method ①: clear the buffer before each read, memset(buff, 0, sizeof(buff))
      • Method ②: always terminate the data right after the bytes just read, buff[nread] = '\0'
    • [PS] When learning system-level commands, do not need to focus too much on these
  • perror prints a system error message
    • man 3 perror
    • Prototype
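      • void perror(const char *s);
      • open, fopen and others set errno when an error occurs
    • Description
      • Prints to stderr a message describing the last error [errno] from a system call or library function
      • s is printed first, followed by a colon; it usually contains the name of the function that failed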
  • Create a common header file folder to store commonly used header files
    • head.h
    • Image

Standard File Operations#

  • Image
  • The buffer is declared inside the loop, so it is initialized again on each iteration
  • nread is unsigned [size_t], so the return value alone cannot distinguish EOF from an error; check feof / ferror

Standard IO is Buffered IO#

  • Image
  • The first "Hello world" appears immediately, because stderr is not buffered
  • The second "Hello world" goes to stdout, which is line-buffered on a terminal, so without a newline it would not appear until after sleep returns; it can be output immediately through 👇
    • Manually flush the buffer: fflush
    • Output a newline
  • The sleep function is in unistd.h
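
The same buffering effect can be measured on a regular file instead of watched on a terminal. The sketch below is an illustration under stated assumptions, not the demo from the screenshot: the helper names `kernel_size` and `buffered_write_demo` are hypothetical, and it assumes the text is shorter than the stdio buffer (BUFSIZ), so nothing reaches the kernel before fflush.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size of the file behind `fp` as the kernel sees it, or -1. */
long kernel_size(FILE *fp) {
    struct stat st;
    if (fstat(fileno(fp), &st) == -1) return -1;
    return (long)st.st_size;
}

/* Hypothetical helper: write `text` to a fully buffered temporary file and
 * report the file size before and after fflush. Returns 0 on success. */
int buffered_write_demo(const char *text, long *before, long *after) {
    FILE *fp = tmpfile();
    if (fp == NULL) return -1;
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);  /* ask for full buffering explicitly */

    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    *before = kernel_size(fp);  /* the text is still in the user-space buffer */

    if (fflush(fp) == EOF) {
        fclose(fp);
        return -1;
    }
    *after = kernel_size(fp);   /* fflush handed the text to the kernel */

    fclose(fp);
    return 0;
}
```

For a short string the size reported before fflush is expected to be 0 and the size afterwards equals the length of the string.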

Additional Knowledge Points#

  • ulimit -a can view the upper limit of the number of files that can be opened
    • The default upper limit of files open per process is commonly 1024 [open files, ulimit -n]
      • Once the limit is reached, further open calls fail with -1 and errno set to EMFILE
      • [PS]
        • Memory is another resource that can be exhausted
        • Be a responsible program: manually close / free, output error logs
  • stdout is line-buffered only when it refers to a terminal; otherwise it is fully buffered, and stderr is unbuffered

Points for Consideration#

Tips#

  • In vim, Shift + K jumps to the man page for the word under the cursor
  • Recommended copy and translate software: CopyTranslator
  • Online documentation for man pages: die.net
