File Reading, Permission Control, AWK Text Processing
Course Content
Directory
- cd: Change working directory
- No argument: returns to your home directory. [PS] C does not support default arguments, but they can be emulated with macros
- Argument ~: returns to your home directory; [~username] goes to that user's home
- Argument -: returns to the previous directory (stored in $OLDPWD), handy for switching back and forth between two long paths
- pwd
- -L Logical working directory
- -P Physical working directory [real]
- The difference only shows up when the path passes through a soft link
- ln -s [target] [link path, or an existing directory in which to create a link named after the target]
- Entering the link or the target lands in the same physical directory, so pwd -P prints the same path for both
- [Delete] For a soft link test that points to a folder
- [rm test] deletes only the link file itself
- [rm -r test/] follows the trailing slash and deletes everything inside the linked folder. Dangerous!
- The command then complains that test/ is not a directory, but by that point the target folder has already been emptied
- If the link points to a file rather than a folder, deleting the link never affects the original content
- [PS] Hard links cannot be used on directories
- mkdir
- -p Automatically create parent directories when creating multi-level directories
- -m Set permissions
- Example: mkdir -p -m 700 ./test/abc/x
- rmdir: Removes only empty directories; can be replaced by rm -r
- Absolute path: Starts from the root directory /
- Relative path: Starts from the current directory . or the upper directory ..
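A minimal sketch of the mkdir, soft-link and pwd behaviour above, run inside a throwaway directory from mktemp so nothing real is touched. It assumes GNU coreutils on Linux (stat -c is GNU-specific); test, abc, x and lnk are placeholder names. It deliberately stops at rm lnk and never runs the dangerous rm -r test/.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

mkdir -p -m 700 ./test/abc/x     # -p creates test/ and abc/ as needed; -m applies to x
stat -c '%a %n' ./test/abc/x     # 700 ./test/abc/x  (GNU stat)

ln -s "$work/test/abc" lnk       # soft link lnk -> test/abc
cd lnk || exit 1
logical=$(pwd -L)                # .../lnk       (the path as typed, through the link)
physical=$(pwd -P)               # .../test/abc  (the real location)
echo "$logical"
echo "$physical"
cd - >/dev/null                  # back to the previous directory ($OLDPWD)

rm lnk                           # removes only the link file
ls -d test/abc/x                 # the target is untouched
```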
Management of Files and Directories
- ls: Display file and directory information
- cp: Copy
- -i Ask before overwriting an existing destination file
- -r Recursive
- -a = -pdr
- p Copy along with file attributes
- d Copy link files instead of what they point to
- r Recursive
- -u Only copy if the source file is newer than the destination file, suitable for large backups
- -s Copy as a soft link; -l Copy as a hard link
- When cp/mv is given multiple source files, the last argument must be a directory
- ⭐cp's implementation logic
- Open file → Read file → Write file
- [If the source is a named pipe (FIFO)]
- cp blocks until another process writes into the pipe, then reads that data and writes it out
- The copy is a regular file containing the data that passed through the pipe
- rm: Delete
- -i Interactive mode, will ask
- -r Recursive
- -f Force
- mv: Move
- Across file systems it is essentially cp + rm; within one file system it is just a rename (the inode does not change)
- -i, -f, -u, similar to cp
- basename: Get file name; dirname: Get directory name
- Pure string operations: they do not check whether the file exists
- basename /home/xxx/abc → abc
- dirname /home/xxx/abc → /home/xxx
- Generally used in shell scripts
- rename: Can be replaced by cp, mv
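A hedged sketch of the cp options and basename/dirname, again in a scratch directory. It assumes GNU coreutils (touch -d and stat -c are GNU extensions); all file names are placeholders. Comparing modification times is the simplest way to see what -a preserves, because a plain cp stamps the copy with the current time.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

printf 'v1\n' > src.txt
touch -d '2020-01-01 00:00' src.txt   # back-date the source's mtime (GNU touch)

cp src.txt plain.txt        # plain copy: the new file gets the current time as mtime
cp -a src.txt archive.txt   # archive copy: mtime, mode and other attributes are kept
stat -c '%y  %n' src.txt plain.txt archive.txt

# -u: copy only when the source is newer than the destination
printf 'edited later\n' > dest.txt    # dest.txt is newer than src.txt
cp -u src.txt dest.txt                # skipped, so dest.txt keeps its content
cat dest.txt                          # edited later

# Several sources: the last argument must be a directory
mkdir backup
cp src.txt plain.txt backup/

# Pure string operations: the path does not need to exist
basename /home/xxx/abc    # abc
dirname /home/xxx/abc     # /home/xxx
```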
Viewing File Contents
- cat: Prints the file continuously from the first line to the last [reverse order: tac]
- -A = -vET
- -v Show non-printing characters
- -E Show each line ending as $ [like vim's :set list / :set nolist]
- -T Display TAB as ^I
- [PS] If compiling code copied from someone else fails with a stray '\0...' error, cat -A reveals the hidden special characters
- -b Number non-empty lines only; -n Number all lines, including empty ones
- nl: Print a file with line numbers
- Options [option parameters]👈 They have a specific order
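- -b [a/t] Which lines to number: a = all lines, t = non-empty lines only (default) → corresponds to [cat -n / cat -b]
- -n [ln/rn/rz] Number format: ln = left-aligned, rn = right-aligned (default), rz = right-aligned with leading zeros
- -w [num] Width of the line-number field, in characters (default 6)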
- Generally used for special texts
- more, less, head, tail
- Difference between more and less
- more can only scroll down, search has no highlighting
- less is more flexible, can scroll up and down, search has highlighting
- Get lines 21 - 40: man ls | head -n 40 | tail -n -20 [tail -n +21 | head -n 20, or sed -n '21,40p', also works]
- od View file contents in binary mode, temporarily not needed, see course notes for specific operations
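The cat, nl and line-range commands can be tried on a tiny file. This sketch assumes GNU coreutils and uses seq 100 as a stand-in for man ls, so the output is predictable.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
printf 'a\tb\n\nc \n' > demo.txt   # a TAB, an empty line, a trailing space

cat -A demo.txt    # a^Ib$ / $ / c $   (TAB shown as ^I, each line end as $)
cat -n demo.txt    # numbers all 3 lines
cat -b demo.txt    # numbers only the 2 non-empty lines

nl -b a -n rz -w 3 demo.txt   # 001, 002, 003: every line, zero-padded, width 3

# Lines 21-40, three equivalent ways
seq 100 | head -n 40 | tail -n 20
seq 100 | tail -n +21 | head -n 20
seq 100 | sed -n '21,40p'
```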
Modifying File Time and Creating Files
- Three times
- mtime [modify]: Modification time [default display]
- ctime [change]: Status change time, updated whenever the inode changes (permissions, owner, link count, and also content writes)
- atime [access]: Access time (recording cost is high)
- [PS] Mac has more time types, incompatible with Linux
- touch file
- Mainly used to modify time
- If it does not exist, it will be created automatically
- Special usage is rare, see course notes for details
- [PS] touch can set atime/mtime back to an earlier time to cover tracks, but ctime is always set to the current time and cannot be faked this way
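A sketch of the three timestamps, assuming GNU touch/stat (the -d date strings and the stat -c formats are GNU extensions). It also shows why back-dating with touch is not a perfect cover: ctime always records the moment of the last inode change.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

touch new.txt                           # does not exist yet, so it is created
touch -m -d '2021-06-01 12:00' new.txt  # -m: set only mtime
touch -a -d '2021-06-02 08:00' new.txt  # -a: set only atime

stat -c 'atime: %x' new.txt
stat -c 'mtime: %y' new.txt
stat -c 'ctime: %z' new.txt   # "now": each touch above changed the inode

chmod 600 new.txt             # updates ctime again; mtime stays in 2021
```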
File Hidden Attributes
Can be changed through chattr [+/-options]
- A: Do not modify atime [can improve efficiency, disk lifespan]
- S: Synchronous writing
- Changes are written to disk synchronously instead of lingering in the memory cache, so a power outage does not lose them [data held only in memory vanishes when power is lost]
- a: Can only add data [logs]
- Cannot modify or delete, even with sudo
- Cannot modify in vim, but modifying [without sudo] will produce a backup file without the a attribute
- i: Cannot be deleted, modified, renamed, or linked to [link here means hard link], effectively [frozen]
- s: When a file is deleted, it is directly deleted from the disk
- Normally, deleting a file only breaks the relationship between the file and the corresponding disk location, the content is still on the disk, similar to C language's free()
- After adding the s attribute, deleting a file will set all data on the disk to zero, truly rewriting and clearing
- This mainly applies to mechanical hard drives; solid-state drives are another matter. Also, per the chattr man page, ext2/ext3/ext4 do not honor the s attribute
- [lsattr] View file hidden attributes
- [PS] Soft links seem to have no hidden attributes
Special File Permissions
Permission | Permission Placeholder | Target Object | Effect |
---|---|---|---|
set_uid | s | Binary program files, non-scripts | Gain program owner's permissions when executing this program |
set_gid | s | Binary program files, directories | For a program: run with the file's group permissions. For a directory: files created inside inherit the directory's group |
sticky bit | t | Directory | In this directory, a file can be deleted or renamed only by its owner, the directory's owner, or root |
Set with chmod [u/g/(none)][+/-][s/t], or numerically with a leading fourth octal digit: 4 = set_uid, 2 = set_gid, 1 = sticky bit (e.g. chmod 4755)
- set_uid
- ⭐chmod u+s, for owner user permissions
- Occupies the x position of the owner user permissions: if x exists, it becomes s; if not, it becomes S
- Example: ordinary users change their own password with the passwd command
- Ordinary users have no write permission on /etc/shadow, the file that stores passwords
- But passwd is owned by root and carries the s bit, so any user allowed to execute it runs it with root's effective UID and can therefore update /etc/shadow
- set_gid
- ⭐chmod g+s, for group permissions
- Occupies the x position of the group permissions: if x exists, it becomes s; if not, it becomes S
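- New files and subdirectories created in this directory belong to the [directory's group] instead of the creator's primary group (new subdirectories also inherit the set_gid bit)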
- Generally applies to directories
- For binary program files, it is somewhat similar to set_uid, but the permissions gained are from the [belonging group]
- Facilitates collaborative work, managing all users under the directory by group
- sticky bit
- ⭐chmod +t, can refer to Sticky Bit — Wikipedia
- Occupies the x position of other users' permissions: if x exists, it becomes t; if not, it becomes T
- Example: /tmp directory and some directories under it
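A sketch of how the special bits appear in ls -l style output, assuming GNU stat; the file names are placeholders. The /usr/bin/passwd path varies by distro, so that line only prints a fallback message if the file is missing.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

touch prog && chmod 755 prog
chmod u+s prog              # owner x present -> lowercase s
stat -c '%A %a %n' prog     # -rwsr-xr-x 4755 prog

touch data && chmod 644 data
chmod u+s data              # owner x absent -> uppercase S
stat -c '%A %n' data        # -rwSr--r-- data

mkdir pub
chmod 1777 pub              # leading 1 = sticky bit (like chmod +t on a 777 directory)
stat -c '%A %n' pub         # drwxrwxrwt pub

mkdir team
chmod 2775 team             # leading 2 = set_gid: new files inside get team's group
stat -c '%A %n' team

ls -ld /tmp                 # usually drwxrwxrwt
ls -l /usr/bin/passwd 2>/dev/null || echo "passwd not at /usr/bin on this system"
```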
Command and File Query
- which: Find executable files
- Search in the PATH
- In zsh, ls is an alias for ls --color=tty, so which ls prints the alias definition instead of a path; feeding that to a second command, as in ls -l $(which ls), makes ls treat every word as a separate path
- zsh's which accepts -p: which -p ls performs a [path search] for ls only, ignoring the alias
- Learn to use command substitution
- whereis: Find more types of files, can specify specific types
- -b Only search for binary files
- -m Only search for files in the man manual
- -s Only search for source files
- -u Only show commands with unusual entries (not exactly one entry of each requested type)
- locate: Fuzzy locate
- -i Ignore case
- Based on a database, not updated in real-time, can use [sudo] updatedb to update immediately
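A small command-substitution sketch. Because which is not installed everywhere and behaves differently per shell (see the zsh note), it uses the POSIX builtin command -v instead; that swap is an assumption of this sketch, not the course's choice.

```shell
# In a script (no aliases), command -v prints the path a bare name resolves to
ls_path=$(command -v ls)
echo "$ls_path"

ls -l "$ls_path"              # use the substituted path as an argument
ls -l "`command -v ls`"       # legacy backtick form, same result

# $(...) nests without extra escaping: the grandparent directory of ls
dirname "$(dirname "$(command -v ls)")"
```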
find: Advanced Search⭐
【Time】
- -mtime, -ctime, -atime [unit is days]
- +n: modified more than n days ago (n itself not included) | n: modified between n and n+1 days ago | -n: modified within the last n days
- Minutes can be viewed in the man manual: -mmin, -cmin, -amin
- Same +n / n / -n meaning, counted in minutes
- ⭐Creating a file will modify both the file and the current directory's mtime
- Because each directory has a file table that records file names and inodes
- When a file is added to the directory, the file table gains a new file name and inode
- Thus, the current directory's mtime will also change
- ❗ Modifying an existing file, by contrast, does not change the directory's mtime, because the directory's file table is untouched
- -newer file: Match files modified more recently than file (its mtime serves as the reference point)
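A sketch of the +n / n / -n rules and of the directory-mtime effect, assuming GNU touch/stat for the relative dates; the file names are placeholders. The sleep 1 calls are there because %Y has one-second resolution.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch -d '10 days ago' old.txt
touch -d '2 days ago' recent.txt
touch today.txt

find . -type f -mtime +7           # more than 7 days  -> ./old.txt
find . -type f -mtime -3 | sort    # within 3 days     -> ./recent.txt ./today.txt
find . -type f -newer recent.txt   # newer than recent.txt -> ./today.txt

# Creating a file adds an entry to the directory's file table -> directory mtime moves
before=$(stat -c %Y .)
sleep 1
touch brand_new.txt
after_create=$(stat -c %Y .)

# Rewriting an existing file leaves the directory's mtime alone
sleep 1
echo more >> today.txt
after_edit=$(stat -c %Y .)
echo "$before $after_create $after_edit"
```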
【User, User Group】
- -user Specify username
- Find the process ID [PID] of user hz active within 10 minutes
- Process files are under /proc, all are virtual, file [directory] size is 0
- Application: Kill process
- [PS] Process ID and inode are not the same, cannot kill inode
- -uid, -gid, -group, -nouser, -nogroup are used the same way as -user
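A hedged sketch of searching /proc by owner, with the current user from id -un standing in for hz. The kill is left commented out on purpose. Note that the kernel sets the timestamps on /proc/<PID> entries itself and they are not guaranteed to equal the process start time, so -mmin -10 is only an approximation of "active within 10 minutes".

```shell
me=$(id -un)

# Every process has a directory /proc/<PID>, owned by the user the process runs as
find /proc -maxdepth 1 -type d -name '[0-9]*' -user "$me" 2>/dev/null | sed 's|^/proc/||' | sort -n

# Narrow down to entries whose timestamp falls within the last 10 minutes
find /proc -maxdepth 1 -type d -name '[0-9]*' -user "$me" -mmin -10 2>/dev/null

# Kill one of them deliberately (placeholder PID, left commented out):
# kill <PID>
```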
【File, File Permissions】
- -name Match file name
- \( ... -o ... \) Logical OR; the parentheses must be escaped with \ and separated from their contents by spaces
- xargs: Reads find's output from stdin and appends it as arguments to another command, so there is no need to wrap find in command substitution and move it to the front
- -size [+greater than -less than]
- Can find empty files for cleanup, but be careful with files that are intentionally empty [e.g. an empty __init__.py still marks a Python package]
- -type
- 7 file types: f regular file, b block device, c character device, d directory, l symbolic link, s socket, p named pipe (FIFO)
- -perm 775/-775
- The latter includes the former: -perm -775 matches any file with at least all the bits of 775 set, e.g. 775 and 777, but not 776 (it lacks the others-x bit)
- -exec find's built-in execution tool
- Count the number of lines of written c, cpp, sh code
- -exec — start of the command
- {} — result of find
- \; marks the end of the command (escaped so the shell passes the ; through to find)
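A sketch of the name, size, perm, -exec and xargs options on a few made-up files in a scratch directory; the total of 7 lines follows from the three small source files it creates. It assumes GNU find.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir src
printf 'int main(void) {\n    return 0;\n}\n' > src/a.c    # 3 lines
printf '#include <cstdio>\nint main() {}\n' > src/b.cpp    # 2 lines
printf '#!/bin/sh\necho hi\n' > src/run.sh                 # 2 lines
: > src/empty.txt                                          # 0 bytes

# Escaped parentheses with spaces; -o is logical OR
find src \( -name '*.c' -o -name '*.cpp' -o -name '*.sh' \) -exec cat {} \; | wc -l   # 7

# Same count via xargs (-print0 / -0 keep names containing spaces intact)
find src \( -name '*.c' -o -name '*.cpp' -o -name '*.sh' \) -print0 | xargs -0 cat | wc -l

# -size 0: zero 512-byte blocks, i.e. empty files
find src -type f -size 0              # src/empty.txt

# Exact -perm vs -perm -mode ("at least these bits")
touch p775 p776 p777
chmod 775 p775; chmod 776 p776; chmod 777 p777
find . -maxdepth 1 -name 'p7*' -perm 775          # ./p775
find . -maxdepth 1 -name 'p7*' -perm -775 | sort  # ./p775 ./p777
```

Writing -exec cmd {} + (a plus instead of \;) hands many file names to a single cmd invocation, much like xargs.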
➕AWK Text Data Processing
awk [-Ffv] 'BEGIN { commands } pattern { commands } END { commands }' file
- Common options
- -F fs: Specify the field separator fs, a character or regular expression; by default fields are separated by runs of spaces and tabs
- -v var=value: Pass an external value into the awk variable var
- -f scriptfile: Read the awk program from the script file scriptfile
- BEGIN{}: Executed once, before any input is read
- pattern{}: Key point ⭐
- Executed once for every input line that matches pattern; with no pattern, every line matches
- If the { commands } action is omitted, it defaults to { print }
- END{}: Executed once, after all input has been read
- [PS]
- Data can be read from file, or piped in with "|"
- commands support print as well as a C-style printf()
- Example: Count the total duration of recent logins
- First preprocess the data
- Use awk to extract the required duration data
- Use awk for the calculation
- Common built-in variables: $n (the n-th field; $0 is the whole line), NF (number of fields in the current line), NR (number of lines read so far)
- [PS]
- Reference document: The AWK Programming Language
- No need to memorize it, just know what it can do
- It is quite specialized; in fact, it is a language with its own syntax
- There are many variants
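The login-duration example can be sketched as follows. The course applies it to real last output; here a few hypothetical lines in a similar format stand in for it, and since the layout of last varies by system, treat the field positions as assumptions.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical sample lines in the style of `last` output (not real data)
cat > logins.txt <<'EOF'
hz  pts/0  10.0.0.5  Mon Mar  4 09:00 - 10:30  (01:30)
hz  pts/1  10.0.0.5  Mon Mar  4 13:15 - 13:40  (00:25)
hz  pts/0  10.0.0.7  Tue Mar  5 20:00 - 22:05  (02:05)
EOF

# Step 1 (preprocess): keep only the last field, e.g. "(01:30)"
awk '{ print $NF }' logins.txt

# Steps 2-3: split "(hh:mm)" on "(", ":" and ")" so $2 = hh and $3 = mm, then sum
summary=$(awk '{ print $NF }' logins.txt | awk -F'[():]' '
  BEGIN { total = 0 }                  # once, before any input
  { total += $2 * 60 + $3 }            # once per line
  END { printf "%d sessions, %d min (%02d:%02d)\n", NR, total, total / 60, total % 60 }')
echo "$summary"    # 3 sessions, 240 min (04:00)

# -v passes a shell value in: count sessions longer than one hour
awk '{ print $NF }' logins.txt |
  awk -v limit=60 -F'[():]' '$2 * 60 + $3 > limit { n++ } END { print n + 0 }'   # 2
```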
Additional Knowledge Points
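- For the run directory under /var (on many modern distros a soft link to /run): ls -al run/, ls -ald run/, ls -ald run
- The three give different results: run/ lists the target's contents, -d run/ shows the target directory itself, and -d run shows the link itself (ls -al run also shows the link, because -l does not follow a link named on the command line)
- Note the presence of the slash /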
- PATH lives in the environment of the shell process, so a change made with export is lost when the SSH session ends; a new login rebuilds PATH from the startup files
- Strictly speaking, it is held in memory
- The difference between pipe files [created by mkfifo] and pipes [in commands |]
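- Essentially, there is no difference: one end writes, the other reads, and a reader blocks until there is input
- The difference: | creates an anonymous pipe that connects the processes of a single pipeline (the shell runs each command in its own process), while mkfifo creates a named pipe with a path in the file system that unrelated processes can open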
- ⭐In Linux, the basic unit of text processing: line
- cat, tac, scanf, printf
- 【Hard link】
- Equivalent to a file having another alias, deleting one alias does not affect the file
- For files a and b that have hard links
- Use ls -i to view inode [file node number]
- You will find that the inode of files a and b is the same, and the file link count is 2
- chmod -R ...: Recursive, modify permissions for the directory and all files below it
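- Compression: gzip/gunzip compress/decompress a single file; tar -c create an archive / -x extract / -v list the files processed / -z filter through gzip
- tar handles .tar.gz archives, but a plain .gz file (a single compressed file, not a tar archive) must be decompressed with gunzip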
- ⭐On ext4 with the usual 4 KiB block size, a newly created directory shows a size of 4096 bytes = 4K
- This corresponds to one disk block, which stores the directory's file table
- ❗ A real directory never has size 0; the size-0 directories under /proc belong to a virtual file system, not to the disk
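A sketch of hard links, a named pipe, and tar/gzip in a scratch directory, assuming GNU coreutils, tar and gzip; all names are placeholders.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Hard link: a second name for the same inode
printf 'data\n' > a
ln a b
ls -li a b                       # same inode number, link count 2
inode_a=$(stat -c %i a); inode_b=$(stat -c %i b)
links=$(stat -c %h a)            # hard-link count: 2
rm a
cat b                            # data: content survives until the last name is gone

# Named pipe: one in, one out, blocking, but with a path in the file system
mkfifo chan
printf 'hello\n' > chan &        # the writer blocks until a reader opens chan
read -r msg < chan               # opening for reading unblocks the writer
wait
echo "$msg"                      # hello

# tar archive vs plain .gz file
mkdir proj && printf 'x\n' > proj/f.txt
tar -czvf proj.tar.gz proj       # -c create, -z gzip, -v list files, -f archive name
mkdir out && tar -xzf proj.tar.gz -C out   # -x extract into out/
printf 'notes\n' > notes.txt
gzip notes.txt                   # replaces notes.txt with notes.txt.gz
gunzip notes.txt.gz              # a single .gz file: decompress with gunzip
```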
Points for Thought
- After compiling a C source file into a.out, we run it with ./a.out. What does ./ mean here?
- No special meaning, just [current path], using relative path execution
- Can also be executed using absolute path
- ❓ What about directly a.out?
- a.out will be considered a command, resulting in a command not found error
- For a bare name without a slash, the shell assumes a command: it checks aliases, functions and builtins, then searches the directories listed in PATH
- Bare names default to commands rather than files because commands are used far more often (and the current directory is normally not in PATH)
- Details of output redirection
- Method ①: Output is queued and written in order
- Method ②: Both streams write to the file at the same time, so the data ends up out of order
- &1 means file descriptor 1
- Method ① also depends on the order of the redirections
- If fd 2 is pointed at &1 before fd 1 points at the file, fd 2 copies the terminal instead, so stderr is not written into the file
- The order here is important: the xxx.log file must be opened first!
- When find matches nothing, passing its result to ls (e.g. ls $(find ...)) easily creates an illusion
- ls then receives no arguments and simply lists the current directory
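The course's own redirection commands are not reproduced here; the sketch below is one common illustration of the same ideas, using a hypothetical function both that writes one line to stdout and one to stderr. It also shows ./ as a plain relative path and the find-then-ls trap.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# ./ is just a relative path to the current directory
printf '#!/bin/sh\necho hi\n' > hello
chmod +x hello
./hello                      # hi  (relative path)
"$work/hello"                # hi  (absolute path)
hello 2>/dev/null || echo "bare name: looked up as a command in PATH, not found"

# Redirection order (both is a hypothetical helper)
both() { echo out; echo err >&2; }
both > good.log 2>&1         # fd 1 -> file first, then fd 2 copies fd 1: both lines, in order
both 2>&1 > bad.log          # fd 2 copies the terminal first: "err" stays on screen
both > racy.log 2> racy.log  # two separate opens, two offsets: "err" overwrites "out"

# find matching nothing, then ls
ls $(find . -name 'no-such-file')               # ls with no arguments lists "."
find . -name 'no-such-file' -exec ls -l {} +    # prints nothing
```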
Tips
- 【Avoid dangerous rm operations】
- Write scripts that regularly [crontab] upload code to GitHub, Gitee
- Wrap rm into mv, define a recycle bin for yourself, write scripts to regularly clean files older than 3 days
- 【Small trick】 In the shell, type the beginning of a command and press Up/Down to cycle through previously entered commands that match it
- [Others] History expansion with a leading exclamation mark !
- !mk recalls the most recent command in history that starts with mk
- !9812 recalls entry number 9812 in history
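- Generally -r or -R means recursive; when the lowercase letter is already taken by another option, the uppercase one is used (e.g. ls -r reverses the order, ls -R recurses)
- When people say "link" without qualification, they mean a hard link (ln without -s also creates a hard link)
- When searching for an option in a man page, search for "/-z,": the trailing comma cuts down the number of matches
- $() is equivalent to the backtick command substitution ``, and it nests more easily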
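A minimal recycle-bin sketch for the rm tip. trash, purge_trash and TRASH_DIR are hypothetical names. Each call moves the files into a fresh time-stamped sub-directory whose own mtime records the deletion time. This matters because mv keeps a file's original mtime, so running find -mtime +3 on the files themselves would purge anything that was last edited long ago.

```shell
# Hypothetical trash location; override TRASH_DIR before calling if you like
TRASH_DIR="${TRASH_DIR:-$HOME/.trash}"

# Move files into a new sub-directory of the trash instead of deleting them
trash() {
  mkdir -p "$TRASH_DIR" || return 1
  # One fresh slot per call: its mtime is the moment of "deletion"
  slot=$(mktemp -d "$TRASH_DIR/$(date +%Y%m%d-%H%M%S).XXXXXX") || return 1
  mv -- "$@" "$slot/"
}

# Permanently remove slots that have been in the trash for more than 3 days
purge_trash() {
  find "$TRASH_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +3 -exec rm -rf -- {} +
}

# Example crontab -e entry running the purge daily at 03:00:
# 0 3 * * * find "$HOME/.trash" -mindepth 1 -maxdepth 1 -type d -mtime +3 -exec rm -rf -- {} +
```

Wrapping rm itself (for example with an alias rm=trash) is a separate choice; keeping a distinct name leaves the real rm available when you truly mean it.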