8 Files and Directories, AWK

File Reading, Permission Control, AWK Text Processing

Course Content#


  • cd: Change working directory
    • No argument: Return to your home directory. [PS] C does not support default function arguments, but they can be emulated with macros
    • Argument ~: Return to your home directory; [~username] refers to the home directory of that user
    • Argument -: Return to the previous directory, handy for switching between two long paths
  • pwd
    • -L Logical working directory
    • -P Physical working directory [real]
    • The two differ only when the path goes through a soft link
      • ln -s [target to link to] [directory to place the link in, or path/name of the link]
      • The link and its target refer to the same physical location, so their physical directories are identical
      • [Delete] For a soft link test that points to a folder
        • [rm test] only deletes the link itself
        • [rm -r test/] deletes everything inside the linked folder, dangerous!
          • rm then reports that test/ is not a directory, but by that point the target folder has already been emptied
          • If the link points to a file instead of a folder, the original content is not affected
      • [PS] Hard links cannot be used on directories
  • mkdir
    • -p Automatically create parent directories when creating multi-level directories
    • -m Set permissions
    • Example: mkdir -p -m 700 ./test/abc/x
  • rmdir: Remove empty directories; rm -r can be used instead
  • Absolute path: Starts from the root directory /
  • Relative path: Starts from the current directory . or the upper directory ..
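
The pwd -L / pwd -P difference described above can be sketched as follows. This is a minimal example assuming a POSIX shell with mktemp; the directory names real, sub, and link are made up for illustration.

```shell
# Work in a throwaway directory; resolve it first so the comparison is exact.
base=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$base/real/sub"          # -p creates the missing parent "real"
ln -s "$base/real" "$base/link"    # soft link pointing at a directory

cd "$base/link/sub"
logical=$(pwd -L)     # path as typed, keeps the link name
physical=$(pwd -P)    # path with soft links resolved

echo "logical : $logical"
echo "physical: $physical"

cd - >/dev/null       # "cd -" returns to the previous directory
rm -rf "$base"
```

The logical path ends in link/sub while the physical one ends in real/sub, which is the only situation where the two options disagree.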

Management of Files and Directories#

  • ls: Display file and directory information
  • cp: Copy
    • -i Prompt before overwriting if the destination file already exists
    • -r Recursive
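    • -a ≈ -pdr [GNU cp documents it as -dR --preserve=all]
      • p Preserve file attributes [mode, ownership, timestamps]
      • d Copy soft links themselves instead of the files they point to
      • r Recursive
    • -u Copy only when the source is newer than the destination or the destination is missing, suitable for large backups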
    • -s Copy as a soft link; -l Copy as a hard link
    • When cp/mv is given multiple source files, the last argument must be a directory
    • ⭐cp's implementation logic
      • Open file → Read file → Write file
      • [If the source of cp is a pipe file]
        • cp has to wait for data written at the other end before it can read and then write
        • The result of the copy is a normal file that stores the data received from the pipe
  • rm: Delete
    • -i Interactive mode, will ask
    • -r Recursive
    • -f Force
  • mv: Move
    • Across file systems it is essentially cp + rm; within one file system it only renames the directory entry
    • -i, -f, -u, similar to cp
  • basename: Get file name; dirname: Get directory name
    • Pure string operations: they do not check whether the file actually exists
    • basename /home/xxx/abc → abc
    • dirname /home/xxx/abc → /home/xxx
    • Generally used in shell scripts
  • rename: Batch-rename files; can be replaced by mv (or cp)
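
Since basename and dirname are mostly used inside scripts, here is a small sketch of that usage. The path is hypothetical and does not need to exist, because both commands only manipulate the string.

```shell
# Hypothetical path; nothing is read from disk.
path=/home/xxx/abc.tar.gz

name=$(basename "$path")            # file name part
dir=$(dirname "$path")              # directory part
stem=$(basename "$path" .tar.gz)    # optional second argument strips a suffix

echo "$dir | $name | $stem"
```

A typical script would use dir to decide where to write output and stem to build the output file name.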

Viewing File Contents#

  • cat: Read continuously in the forward direction [reverse: tac]
    • -A = -vET
      • -v Show non-printing characters in a visible notation
      • -E Show the end of each line as $ [like :set list / :set nolist in vim]
      • -T Show TAB as ^I
      • [PS] If the compiler reports a stray \0... error (for example after copying someone else's code), cat -A will reveal the special characters
    • -b Number non-empty lines only; -n Number all lines, including empty ones
  • nl: Display a file with line numbers
    • Options [option arguments] 👈 Each argument must directly follow its option
    • -b [a/t] Which lines to number: a = all lines, t = non-empty lines only → Corresponds to [cat -n / cat -b]
    • -n [ln/rn/rz] Line-number format: left-aligned / right-aligned / right-aligned with leading zeros
    • -w [num] Width of the line-number field in characters
    • Generally used for special texts
  • more, less, head, tail
    • Difference between more and less
      • more can only scroll down, search has no highlighting
      • less is more flexible, can scroll up and down, search has highlighting
    • Get lines 21 - 40: man ls | head -n 40 | tail -n 20 [cut with head first, then with tail; sed can also do this]
  • od: Dump file contents in octal (or hexadecimal and other formats); not needed for now, see the course notes for specific operations
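
The "lines 21 - 40" extraction above can be checked with a small sketch. seq stands in for man ls here so that the result is easy to verify; the awk variant is an extra alternative, not something the notes prescribe.

```shell
# seq 1 100 produces one number per line, so line N contains N.
by_head_tail=$(seq 1 100 | head -n 40 | tail -n 20)   # keep the first 40, then the last 20 of those
by_sed=$(seq 1 100 | sed -n '21,40p')                 # print only lines 21 to 40
by_awk=$(seq 1 100 | awk 'NR >= 21 && NR <= 40')      # NR is the current line number

[ "$by_head_tail" = "$by_sed" ] && [ "$by_sed" = "$by_awk" ] && echo "all three agree"
echo "$by_sed" | head -n 1    # first selected line
```

The head/tail form needs the arithmetic 40 - 20 done by hand, while sed and awk state the range directly.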

Modifying File Time and Creating Files#

  • Three times
    • mtime [modify]: Modification time [default display]
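    • ctime [change]: Status change time, updated when permissions, ownership, or other metadata change (and also when the content changes)
    • atime [access]: Access time (expensive to record on every read)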
    • [PS] Mac has more time types, incompatible with Linux
  • touch file
    • Mainly used to modify time
    • If it does not exist, it will be created automatically
    • Special usage is rare, see course notes for details
    • [PS] Can change the file time back to avoid detection
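
A minimal sketch of how mtime and ctime diverge, assuming GNU coreutils (stat -c and touch -d are GNU options); demo.txt is a made-up file name.

```shell
tmp=$(mktemp -d)
f="$tmp/demo.txt"
echo hello > "$f"

touch -d '2020-01-01 00:00:00' "$f"    # push mtime (and atime) into the past
mtime_before=$(stat -c %Y "$f")        # mtime as seconds since the epoch

chmod 600 "$f"                         # metadata change only, content untouched
mtime_after=$(stat -c %Y "$f")
ctime_after=$(stat -c %Z "$f")         # ctime as seconds since the epoch

echo "mtime unchanged by chmod: $([ "$mtime_before" = "$mtime_after" ] && echo yes || echo no)"
echo "ctime newer than mtime:   $([ "$ctime_after" -gt "$mtime_after" ] && echo yes || echo no)"
rm -rf "$tmp"
```

touch can move mtime and atime back, but ctime is set by the kernel on every metadata change, so it still records when the file was last touched.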

File Hidden Attributes#

Can be changed through chattr [+/-options]

  • A: Do not modify atime [can improve efficiency, disk lifespan]
  • S: Synchronous writing
    • Refers to IO synchronization, avoiding data loss during power outages [data in memory disappears directly after power loss]
  • a: Append only [suits logs]
    • Existing content cannot be modified or deleted, even with sudo, while the attribute is set
    • Cannot be modified in vim; trying to save [without sudo] leaves a backup file that does not carry the a attribute
  • i: Immutable: cannot be deleted, modified, or linked to [link defaults to hard link], equivalent to [solidification]
  • s: When the file is deleted, its data is erased from the disk
    • Normally, deleting a file only breaks the association between the file and its disk blocks; the content stays on the disk, similar to free() in C
    • With the s attribute, deleting the file overwrites all of its data on the disk with zeros, truly clearing it
    • This applies to mechanical hard drives; solid-state drives are another matter. The chattr man page also notes that ext2/ext3/ext4 do not honor this attribute
  • [lsattr] View file hidden attributes
    • [PS] Soft links seem to have no hidden attributes

Special File Permissions#

| Permission | Permission Placeholder | Target Object | Effect |
| --- | --- | --- | --- |
| set_uid | s | Binary program files, non-scripts | Gain program owner's permissions when executing this program |
| set_gid | s | Binary program files, directories | In this directory, the effective group becomes the directory's group |
| sticky bit | t | Directory | In this directory, users can only delete content they created |

Can be changed through chmod [u/g/empty |+/-|s/t]

  • set_uid
    • ⭐chmod u+s, for owner user permissions
    • Occupies the x position of the owner user permissions: if x exists, it becomes s; if not, it becomes S
    • Example: Ordinary users change their passwords through the passwd command
      • Ordinary users have no permissions on /etc/shadow, the file that stores passwords
      • But the passwd program has the s permission, so any user allowed to execute it runs it with the program owner's permissions and can therefore update /etc/shadow
  • set_gid
    • ⭐chmod g+s, for group permissions
    • Occupies the x position of the group permissions: if x exists, it becomes s; if not, it becomes S
    • Files created in this directory take the [directory's group] instead of the creator's own group
    • Generally applied to directories
      • For binary program files it works much like set_uid, except that the permissions gained are those of the file's [group]
    • Facilitates collaborative work: everything under the directory is managed by one group
  • sticky bit
    • ⭐chmod +t, see the Wikipedia article on the sticky bit
    • Occupies the x position of other users' permissions: if x exists, it becomes t; if not, it becomes T
    • Example: /tmp directory and some directories under it
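
The s/S and t/T placeholders above can be observed directly. This is a sketch assuming GNU stat (stat -c %A prints the mode string); prog, shared, and drop are made-up names inside a throwaway directory.

```shell
tmp=$(mktemp -d)
touch "$tmp/prog"; mkdir "$tmp/shared" "$tmp/drop"

chmod 644 "$tmp/prog"; chmod u+s "$tmp/prog"
suid_no_x=$(stat -c %A "$tmp/prog")       # owner has no x, so the slot shows S
chmod u+x "$tmp/prog"
suid_with_x=$(stat -c %A "$tmp/prog")     # owner has x, so the slot shows s

chmod 755 "$tmp/shared"; chmod g+s "$tmp/shared"
sgid_dir=$(stat -c %A "$tmp/shared")      # group x slot shows s

chmod 777 "$tmp/drop"; chmod +t "$tmp/drop"
sticky_with_x=$(stat -c %A "$tmp/drop")   # others have x, so the slot shows t
chmod o-x "$tmp/drop"
sticky_no_x=$(stat -c %A "$tmp/drop")     # others have no x, so the slot shows T

printf '%s\n' "$suid_no_x" "$suid_with_x" "$sgid_dir" "$sticky_with_x" "$sticky_no_x"
rm -rf "$tmp"
```

An uppercase S or T is usually a sign of a misconfiguration, because the special bit is set on something that cannot be executed or entered by that class of user.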

Command and File Query#

  • which: Find executable files
    • Search in the PATH
    • In zsh, ls is an alias for ls --color=tty, so which ls prints the alias definition; fed into a second ls through command substitution, that output is misread as several paths
      • Check the options of which: which -p ls performs a [path search] only for ls
        • Learn to use command substitution
  • whereis: Find more types of files, can specify specific types
    • -b Only search for binary files
    • -m Only search for files in the man manual
    • -s Only search for source files
    • -u Only show commands with unusual entries, i.e., those that do not have exactly one entry of each requested type
  • locate: Fuzzy locate
    • -i Ignore case
    • Based on a database, not updated in real-time, can use [sudo] updatedb to update immediately
  • find: Search for files by condition; the tests and actions below are all find options

【Time】


  • -mtime, -ctime, -atime [unit is days]
    • +n | n | -n: more than n days ago [not including day n] | during the one day that ended n days ago | within the last n days
    • Minutes can be viewed in the man manual: -mmin, -cmin, -amin
  • ⭐Creating a file will modify both the file and the current directory's mtime
    • Because each directory has a file table that records file names and inodes
    • When a file is added to the directory, the file table gains a new file name and inode
    • Thus, the current directory's mtime will also change
    • ❗ If you only modify an existing file, the file table does not change, so the current directory's mtime stays the same
  • -newer file Use file time as a time node
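
The claim that creating a file changes the directory's mtime while modifying a file does not can be verified with a short sketch. It assumes GNU coreutils (stat -c %Y, touch -d); a.txt and b.txt are made-up names.

```shell
dir=$(mktemp -d)
echo one > "$dir/a.txt"

touch -d '2020-01-01 00:00:00' "$dir"   # reset the directory's mtime to a known value
base=$(stat -c %Y "$dir")

echo two >> "$dir/a.txt"                # modify an existing file: no entry is added or removed
after_modify=$(stat -c %Y "$dir")

touch "$dir/b.txt"                      # create a file: a new name/inode entry is added
after_create=$(stat -c %Y "$dir")

echo "after modify: $([ "$after_modify" = "$base" ] && echo unchanged || echo changed)"
echo "after create: $([ "$after_create" = "$base" ] && echo unchanged || echo changed)"
rm -rf "$dir"
```

Resetting the directory time first avoids having to sleep between steps to see a difference.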

【User, User Group】

  • -user Specify username
    • Find the process ID [PID] of user hz active within 10 minutes
    • Process files are under /proc, all are virtual, file [directory] size is 0
    • Application: Kill process
    • [PS] Process ID and inode are not the same, cannot kill inode
  • -uid, -gid, -group are used the same way as -user; -nouser and -nogroup take no argument

【File, File Permissions】

  • -name Match file name
    • \( ... -o ... \) Logical OR; note that the parentheses must be escaped with \ and separated from their contents by spaces
    • xargs: turns find's results into arguments for another command, so there is no need to put find in a command substitution at the front
  • -size [+greater than -less than]
    • Can find empty files for cleanup, but be careful with some special files [.py, etc.]
  • -type
    • 7 types of files: f b c d l s p
  • -perm 775/-775
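    • The latter includes the former: 775 matches exactly mode 775, while -775 matches any mode in which all bits of 775 are set, e.g., 775 and 777 [but not 776, which lacks the x bit for others]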
  • -exec: find's built-in way to run a command on each result
    • Example: count the lines of the c, cpp, and sh code you have written
    • -exec: start of the command
    • {}: replaced by each result of find
    • \;: end of the command [the semicolon must be escaped from the shell]
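
The -name, logical OR, and -exec pieces above can be combined as in this sketch, which counts lines of c, cpp, and sh files. The tree and file names are made up, and the files are created in a throwaway directory so the total is predictable.

```shell
src=$(mktemp -d)
printf 'int main(void) {\n  return 0;\n}\n' > "$src/a.c"      # 3 lines
printf '#include <cstdio>\nint main() {}\n'  > "$src/b.cpp"    # 2 lines
printf 'echo hi\n'                            > "$src/c.sh"     # 1 line
printf 'not code\n'                           > "$src/notes.txt"

# \( ... \) groups the -o alternatives; {} is each match; \; ends the -exec command.
find "$src" -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.sh' \) -exec wc -l {} \;

# Same selection, summed: cat every match and count the lines once.
total=$(find "$src" -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.sh' \) -exec cat {} \; | wc -l)
echo "total lines: $total"
rm -rf "$src"
```

Without the escaped parentheses, -type f would bind only to the first -name test, because the implicit AND has higher precedence than -o.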

➕AWK Text Data Processing#

awk [-F fs] [-v var=value] [-f scriptfile] 'BEGIN { commands } pattern { commands } END { commands }' file
  • Common options

    • -F fs: Specify the field delimiter fs, which can be a character or a regular expression; the default is whitespace [runs of spaces and tabs]
    • -v var=value: Pass external variable var to awk
    • -f scriptfile: Read awk commands from script file scriptfile
  • BEGIN{}: Execute once at the start

  • pattern{}: Key point ⭐

    • Runs once for every line awk reads [that matches the pattern]
    • If the pattern is omitted, the commands run for every line; if the { commands } block is omitted, it defaults to { print }
  • END{}: Execute once at the end

  • [PS]

    • Data can be read from the file file or passed in through a pipe "|"
    • commands support C-style formatted output with printf, in addition to print
  • Example: Count the total duration of recent logins

    • First preprocess the data
    • Use awk to extract the required duration data
    • Use awk for the calculation
    • Common built-in variables: $n [n-th field], NF [number of fields], NR [current line number]
  • [PS]

    • Reference document: The AWK Programming Language
    • No need to memorize, just know its functionality
    • It is more specialized; in fact, it is a language with its own syntax structure
    • There are many variants
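
The login-duration example above might look roughly like the sketch below. The sample lines are made up in the general shape of last output (real columns differ between systems), and the variable names user, total, and n are illustrative only.

```shell
# Made-up sample data; the last field holds the session length as (HH:MM).
sample='hz   pts/0 10.0.0.1 Mon Dec 14 09:00 - 10:30 (01:30)
root tty1  -        Mon Dec 14 11:00 - 11:10 (00:10)
hz   pts/1 10.0.0.1 Tue Dec 15 20:15 - 20:45 (00:30)
hz   pts/0 10.0.0.1 Wed Dec 16 08:00 - 12:05 (04:05)'

summary=$(printf '%s\n' "$sample" | awk -v user=hz '
  BEGIN { total = 0; n = 0 }            # runs once before any input
  $1 == user {                          # pattern: only lines for this user
    gsub(/[()]/, "", $NF)               # $NF is the last field, e.g. (01:30)
    split($NF, t, ":")                  # t[1] = hours, t[2] = minutes
    total += t[1] * 60 + t[2]
    n++
  }
  END { printf "%d logins, %d minutes\n", n, total }   # runs once after the last line
')
echo "$summary"
```

The -v option passes the user name in from the shell, the pattern filters the lines, and the END block prints the accumulated result with printf.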

Additional Knowledge Points#

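  • For the run directory under /var [a soft link to /run on many systems]: ls -al run/, ls -ald run/, ls -ald run
    • The three give different results: the contents of the target directory, the target directory itself, and the link itself; ls -al run gives the same result as ls -ald run
    • Note whether the trailing slash / is present
  • The PATH variable belongs to the shell process; after disconnecting SSH and reconnecting, changes made in the old session are gone and PATH is back to its original state
    • Strictly speaking, it lives in the process's memory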
  • The difference between pipe files [created by mkfifo] and pipes [| in commands]
    • Essentially there is no difference: one end writes, the other reads, and a reader blocks while there is no input
    • However, | implicitly creates a new process for the command after it; that process handles the data, and control then returns to the original shell
  • ⭐In Linux, the basic unit of text processing: line
    • cat, tac, scanf, printf
  • 【Hard link】
    • Equivalent to a file having another alias, deleting one alias does not affect the file
    • For files a and b that have hard links
      • Use ls -i to view inode [file node number]
      • You will find that the inode of files a and b is the same, and the file link count is 2
  • chmod -R ...: Recursive, modify permissions for the directory and all files below it
  • Compression and archiving: gzip/gunzip; tar -c create an archive / -x extract / -v verbose, list the files processed / -z filter through gzip
    • tar cannot unpack a plain .gz file that is not a tar archive; use gunzip for that
  • ⭐The default size of a newly created directory is 4096 bytes = 4K [on typical ext file systems]
    • It corresponds to one block on the disk, which stores the file table
    • ❗ Such a directory never has size 0; the size-0 directories under /proc belong to a virtual file system and are not real directories
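
The hard link observations above can be reproduced with a short sketch, assuming GNU stat; a and b are the two file names used in the notes, created here in a throwaway directory.

```shell
tmp=$(mktemp -d)
echo data > "$tmp/a"
ln "$tmp/a" "$tmp/b"                 # hard link: a second name for the same inode

inode_a=$(stat -c %i "$tmp/a")
inode_b=$(stat -c %i "$tmp/b")
links=$(stat -c %h "$tmp/a")         # link count, the number shown by ls -l
ls -li "$tmp"                        # -i adds the inode number as the first column

rm "$tmp/a"                          # removing one name leaves the data reachable
survivor=$(cat "$tmp/b")
echo "b still contains: $survivor"
rm -rf "$tmp"
```

The data is only released once the link count drops to zero, which is why deleting one alias does not affect the other.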

Points for Thought#

  • A C source file is compiled into a.out and run as ./a.out; what does ./ mean here?
    • Nothing special, just the [current directory]: the program is run through a relative path
    • It can also be run through its absolute path
    • ❓ What about typing a.out directly?
      • a.out is treated as a command name, resulting in a command not found error
      • For a bare word, the shell assumes a command and looks for a builtin or searches the directories in PATH [the current directory is normally not in PATH]
        • It defaults to a command rather than a file because commands are used far more often
  • Details of output redirection
    • Method ①: Output is queued and written in order
    • Method ②: Both streams are written at the same time, so the data ends up out of order
    • &1 = file descriptor 1 [standard output]
    • Additionally, method ① is order-sensitive
      • If 2>&1 is written before the redirection to xxx.log, file descriptor 2 is not sent to wherever file descriptor 1 ends up
      • The order matters here: the xxx.log file must be opened first!
  • When find finds nothing, passing its result to ls is misleading
    • With an empty result the command is equivalent to ls with no arguments, i.e., it lists the current directory
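
The ordering rule for 2>&1 can be sketched as follows. emit is a made-up helper that writes one line to each stream, and a temporary file stands in for xxx.log.

```shell
emit() { echo "to stdout"; echo "to stderr" >&2; }
log=$(mktemp)

# stdout is opened on the log first, then 2>&1 makes stderr a copy of it.
emit > "$log" 2>&1
both=$(cat "$log")

# Reversed: 2>&1 copies the current stdout (not the log), then only stdout moves to the log.
leaked=$( { emit 2>&1 > "$log"; } )
only_stdout=$(cat "$log")

echo "log after '> log 2>&1' : $(echo "$both" | tr '\n' ' ')"
echo "log after '2>&1 > log' : $only_stdout (stderr went elsewhere: $leaked)"
rm -f "$log"
```

Redirections are processed left to right, and 2>&1 duplicates whatever descriptor 1 points to at that moment, which is why the log has to be opened first.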


  • 【Avoid dangerous rm operations】
    • Write scripts that regularly [crontab] upload code to GitHub or Gitee
    • Wrap rm as mv: define a recycle bin for yourself and write a script that regularly cleans out files older than 3 days
  • 【Small trick】 In the shell, type the first part of a command and press up/down to cycle through earlier commands that match it
    • [Others] Start with an exclamation mark !
      • !mk runs the most recent command in history that starts with mk
      • !9812 runs entry 9812 in history
  • Generally, -r or -R means recursive; try it yourself. The uppercase form is used when the lowercase letter is already taken
  • The word link by itself means hard link by default
  • When searching for an option in the man manual, use "/-z," with a comma at the end to get fewer matches
  • $() can be used for command substitution in place of backticks ``
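
The "wrap rm into mv" idea above could be sketched like this. TRASH_DIR, trash, and empty_old_trash are made-up names, the demo uses temporary directories instead of a real recycle bin, and it assumes GNU find and touch (-mindepth, -maxdepth, and touch -d are not POSIX).

```shell
TRASH_DIR=$(mktemp -d)     # for the demo only; a real setup might use a fixed directory

trash() {
  mkdir -p "$TRASH_DIR"
  mv -- "$@" "$TRASH_DIR"/             # move instead of delete
}

empty_old_trash() {
  # -mtime +3 matches entries last modified more than 3 full days ago.
  find "$TRASH_DIR" -mindepth 1 -maxdepth 1 -mtime +3 -exec rm -rf {} +
}

work=$(mktemp -d)
echo keep  > "$work/new.txt"
echo stale > "$work/old.txt"
trash "$work/new.txt" "$work/old.txt"

touch -d '10 days ago' "$TRASH_DIR/old.txt"   # pretend this one has been in the bin for a while
empty_old_trash
remaining=$(ls "$TRASH_DIR")
echo "still in trash: $remaining"
rm -rf "$TRASH_DIR" "$work"
```

Note that mv keeps the file's original mtime, so this sketch ages files by their last modification rather than by the moment they were trashed; a crontab entry could call the cleanup function on a schedule.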

Course Notes#

  • [Images: course notes]