Bo2SS

Bo2SS

5 Arrays and Preprocessing Commands

Course Content#

Definition and Initialization of Arrays#

  • Definition: A sequential collection of the same type of elements of a fixed size
  • Initialization
    • int a[100] = {2, 3}; // a[0] = 2, a[1] = 3
    • int a[100] = {0}; // Array clearing operation: initializes every element to 0
      • Actually only defines the 0th element as 0, but causes other elements to be automatically initialized to 0
  • Others
    • Starts from index 0
    • Array space size: total size of all variables
    • Storage space is contiguous
      • int a[3]
Address0123456789[10][11]
Elementa[0]a[1]a[2]
    • The arrays defined above are all static arrays
      • Not recommended operation: int n; scanf("%d", &n); int arr(n);
    • Index-value mapping
      • Given the index, the time efficiency of accessing the value is O(1)

Prime Sieve#

Core Idea: Use primes to eliminate all composite numbers

  • Learn as an algorithm framework, with many variations
  • Prime: has only two factors, 1 and itself
    • 1 is neither prime nor composite
    • 2 is the smallest prime
    • A prime must be an odd number (except for 2), but an odd number is not necessarily a prime
  • Utilizing array structure. An array can represent three types of information:
    • Index
    • Value mapped by the index
    • Sign of the value
  • Algorithm measurement standards and performance
    • Space complexity: O(N), an array of size N is created for N numbers
    • Time complexity: O(N * logN), O(N * loglogN)--optimized version, approaching O(N)
      • Not O(N) due to duplicate marking
      • The base of the logarithm does not matter, base 2 is already quite small
      • How to calculate: amortization idea, calculate expectation based on time complexity and probability...
  • Marking: Prime, 0; Composite, 1
    • Utilizing two states of the array: index → number, value → mark
  • Prime sieve extension
    • Find the minimum and maximum prime factors of all numbers within a range
    • For example, for 12: 2, 3

Linear Sieve#

img

  • Start with Euler's 7th problem
    • Find the 10001st prime number
    • Can use enumeration method O(N * sqrt(N)), prime sieve
    • Estimate the size of the 10001st prime number, what is the upper limit?
      • Rule: The range of primes ranked within 200,000 will not exceed 20 times the rank
  • Thoughts from the prime sieve
    • How many times is 6 marked? 2 times: 2, 3
    • How many times is 30 marked? 3 times: 2, 3, 5
    • If a number N has m different primes in its prime factorization, how many times is N marked? m times
    • Can it be marked only once? 👇
  • Linear sieve
    • Both space and time are O(n)
    • Use an integer M to mark composite number N
    • The relationship between M and N satisfies four properties
      1. The smallest prime factor of N is p
      2. N = p * M
      3. ❗ p must be less than or equal to the smallest prime factor in M
        • Similar to the strategy of capturing Meng Huo, only mark on the last occasion
      4. Use M * P' (the set of all primes not greater than [the smallest prime of M]) to mark N1, N2, ...

[i and iii have a somewhat equivalent meaning]

  • Code
    • img
    • Core idea: N = M (↑ as large as possible) * p (↓ as small as possible)
      • Enumerate M, gradually increase p until p meets the termination condition: property iii
      • Compared to M, p is easier to control!
    • The key to avoiding duplication is property iii
    • Note: Although there are two nested for loops, the time complexity is O(N) because the break in the second for loop means the entire array is traversed only once
    • The difference between optimizing the initial condition of the prime sieve to i * i is? The prime sieve still has duplicate marking situations

Binary Search Method#

  • Sequential search → binary search: O(N) → O(logN)

  • Key premise: The search sequence is monotonic

  • Code

    • img
    • img
    • Mainly demonstrates three parts: ① Binary search in an array ② Binary search in a function ③

    • For the implementation part of binary search in functions

      • Should be able to unify binary_search_f_i and binary_search_f_d into one function
      • Focus on later learning, can you identify the return type of the passed function pointer?
      • But there are differences in implementation, so the significance should not be great
    • The commented code in the main function tests the results of binary search in arrays and functions, as follows:

    • img

      • Using functions for binary search can be more flexible
    • Double type equality: use the difference less than EPSL to judge

      • Remember to #undef EPSL
    • Details see comments

      • Learning order: binary_search 👉 binary_search_f_i 👉 binary_search_f_d
  • Recursive version code

    • img
    • The parameters of the function differ from the iterative method, and the search boundaries need to be determined

Newton's Iteration Method#

  • img
  • Purpose: To solve for the value of x when f(x) = 0

  • Conditions: The function is differentiable; convergent

  • Iteration formula: x2 = x1 - f(x1) / f'(x1)

  • Core idea: Each iteration brings x2 closer to the required value of x, until the absolute value of f(x) is less than EPSL

  • Code

    • img
    • ⭐ Remember to take the absolute value for the loop termination condition!

    • Is the initial value x = n / 2.0 fixed?

      • No
      • This value is an imagined x that is closest to the solution, which is fine on both sides of the solution
      • However, it cannot be 0, as the derivative would be 0, leading to a division by 0
    • Wrap the Newton function to get the my_sqrt function

    • This is not binary search! Binary search requires monotonicity~

Preprocessing Commands#

  • Commands starting with #
    • #include <stdio.h> → During preprocessing, the stdio.h file is copied to the current location unchanged
  • Macro Definitions
  1. Define symbolic constants: constant → symbol.
#define PI 3.1415926
#define MAX_N 1000
  • Convenient for batch modification of values
  1. Foolish expressions
#define MAX(a, b) (a) > (b) ? (a) : (b)
#define S(a, b) a * b
  • Foolish eg. #define S(a, b) a * b
    • S(2 + 3, 4 + 5) → Simple replacement → 2 + 3 * 4 + 5 is equivalent to 2 + (3 * 4) + 5
    • No calculations are done during the replacement!
  1. Define code segments (advanced)
#define P(a) { \
    printf("%d\n", a); \
}
  • Macros can only accept one line of definition; to break lines, use a backslash '' (continuation character)

  • Predefined macros (system encapsulated)

    • img
    • Macros are surrounded by two underscores

    • DATA, TIME: the date and time at preprocessing, which do not change if not preprocessed

      • Can achieve functionality similar to fingerprint recognition? Pre-stored relationships?
    • Non-standard macros: Some environments may not define them, and different environments may have different standards, so portability is poor

      • How to handle this? Conditional compilation can be used 👇
  • Conditional Compilation

    • #ifdef DEBUG
      • Description: Whether the DEBUG macro is defined
      • Note: DEBUG must be a macro, not a variable!
        • Macros take effect after preprocessing, while variables take effect after compilation
    • #ifndef DEBUG Whether not defined
    • #if MAX_N == 5 Whether macro MAX_N equals 5
      • Can be used in games to determine the user's mobile version
    • #elif MAX_N == 4
    • #else
    • Must end with #endif**!!!
      • 【Coding Standards】
      • Similar to va_start, va_end
  • Simplified Compilation Process

    • img
    • C Source .c

    👇 Preprocessing stage (pre-compilation) (gcc -E): includes header files, macro replacements, and other preprocessing commands, replacing all # commands

    • Compile Source .c

    👇 Syntax check, compile (gcc –S)

    • Assembly Source .s, .asm

    👇 Assembly (gcc -c)

    • Object File .o (on Linux) or .obj (on Windows)

    👇 Linking (gcc)

    • Executable Program a.out (on Linux) or a.exe (on Windows) (binary file)

    PS: Finally, there is a loading process, mainly establishing the mapping relationship between the executable file, virtual address, and physical address

Strings#

  • Character Arrays
    • Definition: char str[size];
    • Initialization
char str[] = "hello world";
char str[size] = {'h', 'e', 'l', 'l', 'o'};
  • Line1

    • The size in [] can be omitted, the system will calculate automatically
    • The system will automatically add a terminator '\0', the size of the character array: 11+1
  • Line2

    • Size must be at least 6, considering the terminator '\0' takes one space
      • Extra spaces are filled with '\0'
    • If size is not added, '\0' must be added at the end of the array, the system will not add it automatically
  • '\0'

    • A termination marker
    • Corresponds to 0 at the lower level: false
    • '\0' can serve as a termination condition, e.g., for loops reading strings
  • Is there a '\0' after characters? No, it is a characteristic of strings

  • Related Operations

    • img
    • Header File: <string.h>

    • strlen(str)

      • Returns the index of character '\0', length does not include '\0'
      • PS: sizeof() considers the actual memory occupied by the variable, including '\0', measured in bytes
    • strcmp(str1, str2);

      • Compares ASCII values in dictionary order
        • Starts from the first character
      • Return value is the difference (0 if equal)
    • strcpy(dest, src);

    • The end markers for comparison and copying: look for character '\0', what if '\0' is lost? Hence there are

    • Safer comparisons and copies: strncmp, strncpy

      • For cases where '\0' might be accidentally lost
      • Compares or copies at most n bytes
    • Memory Related

      • memset(str, c, n)

      • Not limited to string operations, str corresponds to an address

        • Here is an example of operations on arrays

        • memset(arr, 0, sizeof(arr))

          • Initializes, clears to 0
        • memset(arr, 1, sizeof(arr))

          • This does not set every position to 1, because it operates on each byte, and int occupies 4 bytes
          • 0000...01000...001000....0001...
        • memset(arr, -1, sizeof(arr))

          • It is indeed -1, because -1 in one byte is 11111111
  • img
  • Header File: <stdio.h>

  • The combination of both can do format concatenation: one for input, one for output

  • The str1 in scanf is analogous to terminal input, which is replaced with string input

In-Class Exercise#

Implement a BUG-Free MAX Macro#

  • img
  • Idea: ① Calculate the correct output manually 👉② First write the simplest implementation with BUGs and have friendly output display 👉③ Find and solve problems one by one for incorrect outputs

  • ① Correct output as shown

  • ② First write the one with BUGs and implement friendly output display

#define MAX(a, b) a > b ? a : b
#define P(func) { \
    printf("%s = %d\n", #func, func); \
}
  • Line2-4: Friendly output display
    • #func directly outputs its string representation
    • Without #, it calls its value
  • ③ Find and solve problems one by one for incorrect outputs
    • (The order of solving BUGs is also important, but the order of BUGs here is good)

    • At this point, look at the error situation

    • img
    • Errors caused by macros can be seen in the preprocessed file

      • gcc -E *.c outputs the code after preprocessing with macros replaced
    • First correction

      • img
      • Solution: Can use parentheses in the red box to increase precedence
#define MAX(a, b) (a > b ? a : b)
    • Second correction
      • img
      • The fourth result is 2, reasoning as shown in the image

      • Note the conditional operator '?:'

        • Has a very low precedence: lower than '>'

        • When multiple '?:' exist, the associativity direction is right to left

        • img
      • Solution: Use parentheses for each variable to increase precedence

#define MAX(a, b) ((a) > (b) ? (a) : (b))
    • Third correction
      • img
      • a++ ran twice

        • It should first take a for comparison, then perform a+1
      • Solution: Assign the pre-increment value of a to a variable; define it as an expression generated by a code segment

#define MAX(a, b) ({\
    __typeof(a) _a = (a);\
    __typeof(b) _b = (b);\
    _a > _b ? _a : _b;\
})
  • __typeof(a++) does a++ execute?

    • When used in an expression, it does not execute, it only retrieves its type
  • ({}) expression

    • The value is the value of the last expression in parentheses, and the data type is that of the last expression

    • Refer to Using parentheses and braces in C language combined with && compound statements-CSDN

    • img
    • Removing {} or () will not work!

      • () can only contain comma expressions
        • Comma expressions take the value of the last expression
      • {} is a code segment, which has no return value
        • If defined directly as a macro function
        • Directly printf output will lose the essence of MAX returning the maximum value
        • ❓ Use return to return value
          • Needs declaration
          • But using an expression to represent the return value is smarter
  • Final Version Code

  • img

​ Result is all correct!

  • img

Implement LOG Macro Printing#

  • img
  • img
  • Code

    • img
    • When delivering work, to omit log information, conditional compilation can be used

      • Use "gcc -DDEBUG" during compilation to pass in the DEBUG macro definition
      • Define an empty macro replacement after #else
    • Variadic Macros

      • Variadic '...' just needs a name. eg. args...
    • Using "#" and "##" in Macros

      • "#" : Take as string
      • "##" : Concatenate
        • Solve the situation where the log macro input is an empty character: If empty, the "," after frm will be automatically removed, refer to the preprocessing result 👇

        • img
        • Refer to Text Replacement Macros-cppreference

        • img
        • The difference between ab and a##b

          • ab will not substitute a and b before concatenation
          • The parameters a and b that need to be ## concatenated can be undefined
            • The preprocessor will concatenate these undefined variable names, and the defined variable names after concatenation can be defined

Highlighted Notes#

Code Demonstration#

Prime Sieve#

Code

  • img
  • Dual logic, control indentation

    • if continue
  • The front part of the prime array can simultaneously count and store (used to mark)

    • Initially, prime[0] and prime[1] are not used, remaining idle
    • There will be no overtaking issue: even numbers are definitely composite, and primes appear at least one apart
  • The initial condition of the second for loop can be optimized

    • Initial Condition: i * 2 → i * i
    • Time efficiency: O(Nlogn)→O(Nlognlogn)
    • Because composite numbers smaller than i * i must have been marked by numbers smaller than i
      • This reduces the number of duplicate markings in the earlier part, but not completely
      • To completely avoid duplicate marking, a linear sieve must be used
    • Danger point: i * i grows exponentially, more likely to overflow the maximum value that int can store
      • Can a check be added in advance? But it may lead to greater time consumption
      • Change to a data type with a higher upper limit to store i * i

Arrays#

Code

  • img
  • Output

    • img
    • Details 👇

Declaration and Initialization#

  • Programming Tip: Open 5 extra positions to avoid overflow, which can reduce the bug rate

#define max_n 100
int arr[max_n + 5];

  • The variable space in the function is defined in the stack area
    • The stack area can be understood as a badminton tube, last in first out
    • Default stack size: 8MB≈8000000Byte (bytes)
  • Arrays declared within a function may not be clean and need manual initialization
    • = {0};
    • scanf input

for (int i = 0; i < max_n; i++) {
scanf("%d",arr + i);
}

    • arr + i is equivalent to &arr[i]& cannot be omitted!
  • Arrays declared outside the function
    • The system will automatically clear them to 0, similar to the = {0} operation
    • The size that can be allocated is limited by the computer's memory, basically unlimited
  • Arrays defined with malloc, calloc, realloc are in the heap area, even if defined within a function

Space Occupation, Address#

  • The control character corresponding to sizeof() is %lu, unsigned long integer
  • The array name arr represents the address of the first element of the array, i.e., &arr[0]

printf("%p %p\n", arr, &arr[0]); // Address values are the same

  • Address + 1 offsets how many bytes depend on the size of the element represented by the address

Parameter Passing#

  • Condition: The external and internal parameter representations must be consistent
    • One-dimensional array: int arr[100];
      • void func(int *num)
    • Two-dimensional array: int arr2[100][100];
      • void func2(int **num) will warn
        • num + 1 and arr + 1 have inconsistent representations
        • num is a pointer to a pointer to int
        • The address difference between num and num + 1 is the size of one pointer, which is 8 bytes in a 64-bit operating system
        • The address difference between arr and arr + 1 is the size of one one-dimensional array, which is 4 * 100 bytes
      • void func2(int (*num)[100]) is valid
    • Three-dimensional array: int arr3[100][100][32]
      • void func3(int (*num)[100][32]) is valid
      • (*num)[100][32] can be written as num[][100][32]
        • (*num) and num[] have the same representation, but they are essentially different
  • ❓ Is q pointing to nil? That is 0x0

Additional Knowledge Points#

  • An array can represent three types of information
  • int arr[100]; // When arr is passed as a parameter to a function, it is the address of the first element of the array
  • Is 1 negated to -2? Let's derive it
    • 1: 0000...001
    • Negation: 1111...110
    • The sign bit remains unchanged, negating and adding one gives: 1000...010
    • That is -2
    • Conclusion: The bitwise negation of all positive integers is their own +1 negative number
  • When including <math.h>, add -lm during compilation, using gcc *.c -lm
  • Coding habit: No more than 80 characters per line
  • Control character "%X": Uppercase hexadecimal

Points for Thought#

  • Which is more powerful, functions or macros?
    • Macros can generate functions; aren't macros the father of functions?

Tips#

  • vim automatically completes the format of header files: add spaces
    • Enter the ~/.vimrc file and modify the corresponding format in SetTitle()
  • Explore macro expansion and nesting yourself
  • Refer to the reference book chapters 8.1, 8.4, and chapter 15

Course Notes#

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.