Bo2SS

3 Multiprocessing

Course Content#

What is a Process#

  • A process is the image of a program in memory: a running program, an instance of that program, and a complex entity
    • It includes allocated memory space, user information, group information, permissions, resources in use, running code, open files, etc.
  • Correspondingly
    • ① What is a program
      • A program is a compiled executable binary file, stored on disk
        • It is an ordinary file with x permission
      • A collection of programs is an application
    • ② What is a thread
      • A thread represents a series of ordered instructions that need to be executed by the CPU
      • A process consists of one or more threads, which can execute instructions concurrently
    • [PS] A process is the basic unit of resource allocation by the operating system, while a thread is the basic unit of CPU scheduling

fork#

Create a child process [Process Interface]

  • man fork
  • Prototype + Description
    • Image
    • Return value [Type: pid_t]: Process ID
    • A new process [child process] is created by copying the process that called fork [parent process], with the parent and child processes running in independent memory spaces
      • When fork completes, both have the same content, and subsequent memory writes and file mappings do not affect each other
      • A true copy occurs only when memory changes [copy-on-write concept]
        • Otherwise, they share the same memory space
    • Image
    • The main differences between parent and child are as follows:
      • The child has its own unique PID, which does not match any existing PID
      • The child's parent PID [getppid] is the PID of the process that called fork
      • The child does not inherit the parent's memory locks
      • The child's resource usage and CPU time counters are reset to 0
      • The child does not inherit pending signals, semaphore adjustments, record locks, timers, or outstanding asynchronous I/O operations
  • Return value
    • Image
    • Success: returns the child's PID in the parent process, returns 0 in the child process
      • The parent cannot obtain the child's PID by other means, while the child can obtain the parent's PID via getppid
    • Failure: returns -1 and sets errno [child process not created]
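
The return-value convention and the independent memory spaces described above can be sketched as follows. This is a minimal illustration rather than the course's demo code; the function name fork_copies_memory and the variable counter are made up for the example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks once, lets the child modify its own copy of
 * `counter`, and returns the parent's value afterwards (-1 on error). */
int fork_copies_memory(void) {
    int counter = 100;

    fflush(stdout);                 /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid < 0) {                  /* failure: no child was created */
        perror("fork");
        return -1;
    }
    if (pid == 0) {                 /* child: fork returned 0 */
        counter += 1;               /* touches only the child's copy */
        printf("child  %d (parent %d): counter = %d\n",
               (int)getpid(), (int)getppid(), counter);
        fflush(stdout);
        _exit(0);
    }
    /* parent: fork returned the child's PID */
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    printf("parent %d (child %d): counter = %d\n",
           (int)getpid(), (int)pid, counter);
    return counter;
}
```

Called from main, the child reports counter = 101 while the parent still sees 100, because the write in the child triggers a private copy of that memory.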

wait#

Wait for process state changes

  • man wait
  • Prototype
    • Image
    • wstatus [int *]: returns the child's state
      • Such as the return or exit value in the child process
      • Must be parsed with macros, such as WIFEXITED(wstatus); see example 2 under wait in Code Demonstration
  • Description
    • Image
    • Wait object: the child of the calling process
    • State change situations: the child terminated, was stopped by a signal, or was resumed by a signal
    • When there is a terminated child,
      • Calling wait lets the system release the resources related to the child
      • Otherwise [if wait is not called], the terminated child process will become a zombie process [👇]
        • The dead child is not detected by the parent process, and its resources are not released
        • Can be viewed using top
        • Image
        • zombie refers to a zombie process
    • If a child has already changed state, wait returns immediately
      • Otherwise, it blocks until a child changes state or a signal handler interrupts the call
  • Return value
    • image
    • Returns the PID of the terminated child or -1 [error, and sets errno]
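
As a rough sketch of how the status from wait might be decoded, the helper below runs a child that either exits normally or is killed by a signal. The name child_outcome, the exit code 7, and the -1000 error value are assumptions for this example; WIFSIGNALED and WTERMSIG are not covered in the notes above but are listed in man wait next to WIFEXITED.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs a child that either exits with code 7 or kills
 * itself with SIGKILL, then reports what wait() observed.
 * Returns the exit code, the negated signal number, or -1000 on error. */
int child_outcome(int kill_self) {
    pid_t pid = fork();
    if (pid < 0)
        return -1000;
    if (pid == 0) {
        if (kill_self)
            kill(getpid(), SIGKILL);   /* terminated by a signal */
        _exit(7);                      /* normal termination */
    }

    int wstatus;
    pid_t done = wait(&wstatus);       /* blocks until a child changes state */
    if (done != pid)
        return -1000;
    if (WIFEXITED(wstatus))            /* child exited normally */
        return WEXITSTATUS(wstatus);
    if (WIFSIGNALED(wstatus))          /* child was killed by a signal */
        return -WTERMSIG(wstatus);
    return -1000;
}
```

Because the parent calls wait, the child is collected in both cases and never lingers as a zombie.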

exec family#

Execute a file [everything is a file]

  • man exec
  • Prototype
    • Image
    • There are many siblings
  • Description
    • Image
    • Image
    • A new process image will replace the current process image
      • [Let the child have a brand new world]
    • The first parameter is always the name of the file to be executed
      • path: full path
      • file: can be a command in the PATH environment variable or a full path
    • The entire family can be summarized as: "exec + l/v + optional p/e" [execl, execlp, execle, execv, execvp, execvpe]
      • arg: the arguments passed to the program named by the preceding path or file parameter
      • l-list, the arguments are passed one by one as separate string parameters [parameter passing method]
        • By convention, arg0 should be related to the name of the file to be executed
        • Must end with (char *) NULL
      • v-vector, all arguments are placed in an array of strings [parameter passing method]
        • Must end with a null pointer
      • p-path, the executable file search range includes the PATH environment variable
        • It replicates the shell command lookup process
      • e-env, allows specifying environment variables
        • Passed as an array of "NAME=value" strings, ending with a null pointer
  • Return value
    • img
    • Returns only when an error occurs; the return value is then -1 and errno is set
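
To make the l and v calling styles concrete, here is a small sketch that runs a shell script in a child process. The helper name run_shell and its mode parameter are invented for illustration, and it assumes that /bin/sh exists and that sh can be found through PATH.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `sh -c script` in a child and returns its exit
 * code. mode 'l' uses execl (full path, argument list); any other mode uses
 * execvp (PATH lookup, argument vector). Returns -1 on error. */
int run_shell(char mode, const char *script) {
    fflush(stdout);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (mode == 'l') {
            /* l: arguments listed one by one, ended by (char *)NULL */
            execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        } else {
            /* v: arguments in an array, ended by a null pointer */
            char *argv[] = {"sh", "-c", (char *)script, NULL};
            execvp("sh", argv);
        }
        perror("exec");             /* reached only if exec failed */
        _exit(127);
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

For example, run_shell('l', "exit 5") returns 5: the child's image is replaced by the shell, and the parent reads the shell's exit code through waitpid.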

flock#

Operate advisory locks on open files

[Essentially to protect data]

  • man 2 flock
  • Prototype + Description
    • Image
    • Operated through file descriptor fd
    • Mainly three types of operations
      • LOCK_SH: shared lock
      • LOCK_EX: exclusive lock
        • Exclusive lock: while one process holds it, no other process can obtain a lock on the file
        • Example: many people sharing one restroom, one at a time
      • LOCK_UN: unlock
  • Return value
    • Image
    • 0, success; -1, failure
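
The effect of an exclusive lock can be sketched inside a single process by opening the same file twice, since flock locks belong to the open file rather than to the process. The helper name second_lock_is_refused is an assumption for this example; LOCK_NB is not covered in the notes above but is documented in man 2 flock and makes the call fail with EWOULDBLOCK instead of blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a second descriptor is refused the lock
 * while the first one holds LOCK_EX, 0 if it was granted, -1 on error. */
int second_lock_is_refused(const char *path) {
    int fd1 = open(path, O_RDWR | O_CREAT, 0644);
    if (fd1 < 0)
        return -1;
    int fd2 = open(path, O_RDWR);      /* separate open of the same file */
    if (fd2 < 0) {
        close(fd1);
        return -1;
    }

    int refused = -1;
    if (flock(fd1, LOCK_EX) == 0) {    /* first holder takes the lock */
        if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
            refused = 0;               /* unexpected: lock was granted */
        else if (errno == EWOULDBLOCK)
            refused = 1;               /* someone else holds the lock */
        flock(fd1, LOCK_UN);           /* release explicitly */
    }
    close(fd2);
    close(fd1);
    return refused;
}
```

In a real multiprocess program each process would open the file itself and call flock(fd, LOCK_EX) without LOCK_NB, so that it simply waits for its turn.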

Code Demonstration#

fork#

1. Copying Buffer, Line Buffer

  • Image
  • Output result
    • Image
    • ❗ Why does entering suyelus print suyelus twice when no output function is called after fork?
    • 【Fact】 Although the child receives a copy of the whole program, it only executes the code after fork
    • 【Key】 The buffer was copied, and it still contained suyelus
      • The printf has no newline character and standard output is line-buffered when attached to a terminal, so the buffer is not flushed after line 13 executes
      • The buffer is flushed only when the program ends, which happens once in the parent and once in the child
    • [PS] Under zsh, it may only output suyelus once, possibly due to zsh optimization? There are two under bash
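
The copied-buffer effect can be reproduced in a form that is easy to measure. The sketch below is not the demo from the image: it swaps standard output for a temporary file stream from tmpfile (fully buffered rather than line-buffered), but the mechanism is the same, namely that unflushed stdio data is duplicated by fork. The helper name bytes_written_after_fork is made up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: writes "suyelus" to a buffered stream, forks, and
 * returns how many bytes reached the file once both processes flushed. */
long bytes_written_after_fork(int flush_before_fork) {
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    fflush(stdout);                 /* keep stdout out of the experiment */
    fputs("suyelus", fp);           /* 7 bytes, still in the stdio buffer */
    if (flush_before_fork)
        fflush(fp);                 /* empty the buffer before copying it */

    pid_t pid = fork();
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0)
        exit(0);                    /* exit() flushes the child's copy */

    waitpid(pid, NULL, 0);
    fflush(fp);                     /* flush the parent's copy */
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    fclose(fp);
    return size;
}
```

Without the flush the file ends up with 14 bytes (the word twice); with fflush before fork it holds 7, which is the usual fix for the duplicated output.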

2. Parent and Child Processes are Independent

  • Image
  • Output result
    • Image
    • ❗ Does the parent process always execute first?
      • Not necessarily, parent and child processes are completely independent and unrelated; essentially, who executes first is determined by kernel scheduling
      • However, the parent process very likely executes first because each process has a running time assigned by the kernel, and the parent process's running time has not yet elapsed after it spawned the child
  • [PS]
    • Process 1 <pid 1 process> is the init process, and all other user-space processes descend from it
    • Contrary to the human world, the first process in the computer world remains alive, waiting to collect the remains of child processes

3. Create 10 child processes and print their own serial numbers

10 child processes are full siblings

  • Image
  • If line 18's break is not added
    • It will produce 2^10 processes: 1 -> 2 -> 4 -> 8 -> 16 -> 32 -> ... -> 2^10
    • Count the number of running parent and child processes: ps -ef | grep -v grep | grep Ten | wc -l
      • [Ten is the executable program name]
  • Sleep duration does not accumulate
    • When a process sleeps, the system schedules other processes to run, so the total wait time is only about 10s
  • The i variable becomes independent after being taken by the child process and will not change due to changes in the i variable in the parent process
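
A possible shape for this demo is sketched below; it is a reconstruction under assumptions, not the code from the image. The helper name spawn_siblings and the 1-second sleep are invented. The important part is the break in the child, which keeps every child from forking further.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks n sibling children, each printing its own
 * serial number, and returns how many children the parent collected. */
int spawn_siblings(int n) {
    int i;
    pid_t pid = -1;

    fflush(stdout);                 /* avoid duplicating buffered output */
    for (i = 0; i < n; i++) {
        pid = fork();
        if (pid <= 0)
            break;                  /* child (0) or error (-1): stop forking */
    }
    if (pid == 0) {
        sleep(1);                   /* all children sleep at the same time */
        printf("I am child number %d\n", i);   /* i is the child's own copy */
        fflush(stdout);
        _exit(0);
    }

    int reaped = 0;
    while (wait(NULL) > 0)          /* collect every child */
        reaped++;
    return reaped;
}
```

spawn_siblings(10) takes roughly one second rather than ten, since the children sleep concurrently, and each child prints the value i had at the moment it was forked.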

wait#

1. Create Zombie Processes

  • Image
  • Not using wait to perceive the termination of the child process will create zombie processes
  • Users have various ways to view zombie processes [let the program run in the background: ./a.out &]
    • Based on ps, check for processes with defunct or Z markers
    • Image
    • Based on top
    • Image
    • Using pstree can show the lineage of zombie processes
    • Image
  • [PS] A zombie process cannot be killed directly; kill its parent process instead. Once the parent exits, its own parent [zsh here] collects it, while the orphaned zombie children are adopted by the init process, which collects their remains

2. Perceive the Return Status of Child Processes

  • Image
  • After running for about 2s, the program outputs as follows:
  • Image
  • ❗ Why does the child process return 1, but the parent process wait gets a status of 256?
    • Only the low 16 bits of the int status are used; 256 is 1 << 8, so bit 8 [counting from 0] is 1 and all other bits are 0, and the exit value is stored in bits 8 to 15
    • Refer to the following image [Linux-UNIX System Programming Manual (Volume 1), Section 26.1.3], the problem has an answer
    • Image
    • In fact, the man manual mentions that macros can be used to check the status
    • Image
    • WEXITSTATUS(wstatus) can parse the exit status
    • In the source code, each macro corresponds to the following bit operations
    • Image
    • Therefore, when printing the status, process it with the macro as needed
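
The 1 versus 256 puzzle can be checked with a short sketch. The helper names are invented, and the manual shift in exit_code_from_status assumes the Linux status layout discussed above; portable code should stick to WEXITSTATUS.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the raw, unparsed status that waitpid
 * reports for a child exiting with `exit_code` (-1 on error). */
int raw_wait_status(int exit_code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return wstatus;                 /* not yet parsed by any macro */
}

/* Hypothetical helper: manual decoding, assuming the exit code sits in
 * bits 8 to 15 as on Linux. */
int exit_code_from_status(int wstatus) {
    return (wstatus >> 8) & 0xff;
}
```

On Linux, raw_wait_status(1) yields 256, and both WEXITSTATUS and exit_code_from_status turn that back into 1.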

exec family#

【Replace with a brand new process】

  • Image
  • The child process is replaced by a brand new process [vim] on line 17, and the subsequent code will never be executed
    • exec directly after fork is cheap: fork does not copy the parent's memory space up front, and exec then replaces the still-shared image [copy-on-write concept: a true copy occurs only when memory changes]
  • wait(NULL) is responsible for collecting the remains
  • The second parameter of execlp can be arbitrary, but it is more meaningful if related to the first parameter
    • This parameter's significance can be reflected in some aspects below
    • If the exec call on line 17 is replaced with the code below, the second parameter is given an arbitrary name
    • Image
    • The source file test.c for generating the executable file Test is as follows:
      • Image
      • Output the value of argv[0]
    • The results of executing the upper and lower pieces of code are as follows:
      • Image
      • It can be seen that the second parameter is reflected in the argv[0] variable
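
The same observation can be made without a separate test program by asking a shell to print its own $0. This sketch is an assumption-laden stand-in for the Test executable above: it assumes sh is a standalone shell such as dash or bash (a multi-call binary like BusyBox picks its applet from argv[0] and would reject a made-up name), and the helper name argv0_seen_by_child is invented.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: execs sh with `arg0` as the second execlp parameter
 * and copies what the shell reports as $0 into `out`. Returns 0 or -1. */
int argv0_seen_by_child(const char *arg0, char *out, size_t outlen) {
    int fds[2];
    if (outlen == 0 || pipe(fds) < 0)
        return -1;

    fflush(stdout);
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);    /* child's stdout goes to the pipe */
        close(fds[1]);
        /* first parameter: file to run; second parameter: becomes argv[0] */
        execlp("sh", arg0, "-c", "printf %s \"$0\"", (char *)NULL);
        _exit(127);                     /* reached only if exec failed */
    }

    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used + 1 < outlen &&
           (n = read(fds[0], out + used, outlen - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);              /* collect the child */
    return 0;
}
```

Passing an arbitrary string as arg0 shows up unchanged in the child's $0, which is why a meaningful value, usually the program name, is the better choice.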

Additional Knowledge Points#

  • Using while(1){} with sleep in the loop body is more CPU-friendly
    • Otherwise the loop spins without doing useful work, which may cause CPU utilization to spike and the machine to overheat
  • pstree can conveniently show the inheritance relationship of processes, -p can display pid
  • Viewing zombie processes: ps, ps -aux, ps -ef, top can all be used
  • Deadlock: two or more processing units each wait for another to release a system resource, and none of them backs off first
  • Synchronization in computers is different from that in life
    • It is not about performing the same operation
    • But the order of events is determined and has a causal relationship

Points for Reflection#

Tips#

  • du [-h]: View the size of the current directory and all subdirectories [human-readable]
  • For multiprocessing output, using more will display the output of different processes independently
  • Recommended movie: "Her" 2013
    • Image
    • A love story between a silicon-based life and many carbon-based lives, involving high concurrency concepts
    • Douban; Baidu Cloud, extraction code: 8pic
