5 Arrays and Preprocessing Commands

Course Content#

Definition and Initialization of Arrays#

Definition: A sequential collection of the same type of elements of a fixed size
Initialization
- int a[100] = {2, 3}; // a[0] = 2, a[1] = 3
- int a[100] = {0}; // Array clearing operation: initializes every element to 0
  - Actually only defines the 0th element as 0, but causes other elements to be automatically initialized to 0
Others
- Starts from index 0
- Array space size: total size of all variables
- Storage space is contiguous
  - int a[3]

Address	0123	4567	89[10][11]
Element	a[0]	a[1]	a[2]

- The arrays defined above are all static arrays
  - Not recommended operation: int n; scanf("%d", &n); int arr(n);
- Index-value mapping
  - Given the index, the time efficiency of accessing the value is O(1)

Prime Sieve#

Core Idea: Use primes to eliminate all composite numbers

Learn as an algorithm framework, with many variations
Prime: has only two factors, 1 and itself
- 1 is neither prime nor composite
- 2 is the smallest prime
- A prime must be an odd number (except for 2), but an odd number is not necessarily a prime
Utilizing array structure. An array can represent three types of information:
- Index
- Value mapped by the index
- Sign of the value
Algorithm measurement standards and performance
- Space complexity: O(N), an array of size N is created for N numbers
- Time complexity: O(N * logN), O(N * loglogN)--optimized version, approaching O(N)
  - Not O(N) due to duplicate marking
  - The base of the logarithm does not matter, base 2 is already quite small
  - How to calculate: amortization idea, calculate expectation based on time complexity and probability...
Marking: Prime, 0; Composite, 1
- Utilizing two states of the array: index → number, value → mark
Prime sieve extension
- Find the minimum and maximum prime factors of all numbers within a range
- For example, for 12: 2, 3

Linear Sieve#

Start with Euler's 7th problem
- Find the 10001st prime number
- Can use enumeration method O(N * sqrt(N)), prime sieve
- Estimate the size of the 10001st prime number, what is the upper limit?
  - Rule: The range of primes ranked within 200,000 will not exceed 20 times the rank
Thoughts from the prime sieve
- How many times is 6 marked? 2 times: 2, 3
- How many times is 30 marked? 3 times: 2, 3, 5
- If a number N has m different primes in its prime factorization, how many times is N marked? m times
- Can it be marked only once? 👇
Linear sieve
- Both space and time are O(n)
- Use an integer M to mark composite number N
- The relationship between M and N satisfies four properties
  1. The smallest prime factor of N is p
  2. N = p * M
  3. ❗ p must be less than or equal to the smallest prime factor in M
    - Similar to the strategy of capturing Meng Huo, only mark on the last occasion
  4. Use M * P' (the set of all primes not greater than [the smallest prime of M]) to mark N1, N2, ...

[i and iii have a somewhat equivalent meaning]

Code
- Core idea: N = M (↑ as large as possible) * p (↓ as small as possible)
  - Enumerate M, gradually increase p until p meets the termination condition: property iii
  - Compared to M, p is easier to control!
- The key to avoiding duplication is property iii
- Note: Although there are two nested for loops, the time complexity is O(N) because the break in the second for loop means the entire array is traversed only once
- The difference between optimizing the initial condition of the prime sieve to i * i is? The prime sieve still has duplicate marking situations

Binary Search Method#

Sequential search → binary search: O(N) → O(logN)
Key premise: The search sequence is monotonic
Code
- Mainly demonstrates three parts: ① Binary search in an array ② Binary search in a function ③
- For the implementation part of binary search in functions
  - Should be able to unify binary_search_f_i and binary_search_f_d into one function
  - Focus on later learning, can you identify the return type of the passed function pointer?
  - But there are differences in implementation, so the significance should not be great
- The commented code in the main function tests the results of binary search in arrays and functions, as follows:
- - Using functions for binary search can be more flexible
- Double type equality: use the difference less than EPSL to judge
  - Remember to #undef EPSL
- Details see comments
  - Learning order: binary_search 👉 binary_search_f_i 👉 binary_search_f_d
Recursive version code
- The parameters of the function differ from the iterative method, and the search boundaries need to be determined

Newton's Iteration Method#

Purpose: To solve for the value of x when f(x) = 0
Conditions: The function is differentiable; convergent
Iteration formula: x2 = x1 - f(x1) / f'(x1)
Core idea: Each iteration brings x2 closer to the required value of x, until the absolute value of f(x) is less than EPSL
Code
- ⭐ Remember to take the absolute value for the loop termination condition!
- Is the initial value x = n / 2.0 fixed?
  - No
  - This value is an imagined x that is closest to the solution, which is fine on both sides of the solution
  - However, it cannot be 0, as the derivative would be 0, leading to a division by 0
- Wrap the Newton function to get the my_sqrt function
- This is not binary search! Binary search requires monotonicity~

Preprocessing Commands#

Commands starting with #
- #include <stdio.h> → During preprocessing, the stdio.h file is copied to the current location unchanged
Macro Definitions

Define symbolic constants: constant → symbol.

#define PI 3.1415926
#define MAX_N 1000

Convenient for batch modification of values

Foolish expressions

#define MAX(a, b) (a) > (b) ? (a) : (b)
#define S(a, b) a * b

Foolish eg. #define S(a, b) a * b
- S(2 + 3, 4 + 5) → Simple replacement → 2 + 3 * 4 + 5 is equivalent to 2 + (3 * 4) + 5
- No calculations are done during the replacement!

Define code segments (advanced)

#define P(a) { \
    printf("%d\n", a); \
}

Macros can only accept one line of definition; to break lines, use a backslash '' (continuation character)
Predefined macros (system encapsulated)
- Macros are surrounded by two underscores
- DATA, TIME: the date and time at preprocessing, which do not change if not preprocessed
  - Can achieve functionality similar to fingerprint recognition? Pre-stored relationships?
- Non-standard macros: Some environments may not define them, and different environments may have different standards, so portability is poor
  - How to handle this? Conditional compilation can be used 👇
Conditional Compilation
- #ifdef DEBUG
  - Description: Whether the DEBUG macro is defined
  - Note: DEBUG must be a macro, not a variable!
    - Macros take effect after preprocessing, while variables take effect after compilation
- #ifndef DEBUG Whether not defined
- #if MAX_N == 5 Whether macro MAX_N equals 5
  - Can be used in games to determine the user's mobile version
- #elif MAX_N == 4
- #else
- Must end with #endif**!!!
  - 【Coding Standards】
  - Similar to va_start, va_end
Simplified Compilation Process
- C Source .c
👇 Preprocessing stage (pre-compilation) (gcc -E): includes header files, macro replacements, and other preprocessing commands, replacing all # commands
- Compile Source .c
👇 Syntax check, compile (gcc –S)
- Assembly Source .s, .asm
👇 Assembly (gcc -c)
- Object File .o (on Linux) or .obj (on Windows)
👇 Linking (gcc)
- Executable Program a.out (on Linux) or a.exe (on Windows) (binary file)
PS: Finally, there is a loading process, mainly establishing the mapping relationship between the executable file, virtual address, and physical address

Strings#

Character Arrays
- Definition: char str[size];
- Initialization

char str[] = "hello world";
char str[size] = {'h', 'e', 'l', 'l', 'o'};

Line1
- The size in [] can be omitted, the system will calculate automatically
- The system will automatically add a terminator '\0', the size of the character array: 11+1
Line2
- Size must be at least 6, considering the terminator '\0' takes one space
  - Extra spaces are filled with '\0'
- If size is not added, '\0' must be added at the end of the array, the system will not add it automatically
'\0'
- A termination marker
- Corresponds to 0 at the lower level: false
- '\0' can serve as a termination condition, e.g., for loops reading strings
Is there a '\0' after characters? No, it is a characteristic of strings
Related Operations
- Header File: <string.h>
- strlen(str)
  - Returns the index of character '\0', length does not include '\0'
  - PS: sizeof() considers the actual memory occupied by the variable, including '\0', measured in bytes
- strcmp(str1, str2);
  - Compares ASCII values in dictionary order
    - Starts from the first character
  - Return value is the difference (0 if equal)
- strcpy(dest, src);
- The end markers for comparison and copying: look for character '\0', what if '\0' is lost? Hence there are
- Safer comparisons and copies: strncmp, strncpy
  - For cases where '\0' might be accidentally lost
  - Compares or copies at most n bytes
- Memory Related
  - memset(str, c, n)
  - Not limited to string operations, str corresponds to an address
    - Here is an example of operations on arrays
    - memset(arr, 0, sizeof(arr))
      - Initializes, clears to 0
    - memset(arr, 1, sizeof(arr))
      - This does not set every position to 1, because it operates on each byte, and int occupies 4 bytes
      - 0000...01000...001000....0001...
    - memset(arr, -1, sizeof(arr))
      - It is indeed -1, because -1 in one byte is 11111111
Header File: <stdio.h>
The combination of both can do format concatenation: one for input, one for output
The str1 in scanf is analogous to terminal input, which is replaced with string input

In-Class Exercise#

Implement a BUG-Free MAX Macro#

Idea: ① Calculate the correct output manually 👉② First write the simplest implementation with BUGs and have friendly output display 👉③ Find and solve problems one by one for incorrect outputs
① Correct output as shown
② First write the one with BUGs and implement friendly output display

#define MAX(a, b) a > b ? a : b
#define P(func) { \
    printf("%s = %d\n", #func, func); \
}

Line2-4: Friendly output display
- #func directly outputs its string representation
- Without #, it calls its value
③ Find and solve problems one by one for incorrect outputs
- (The order of solving BUGs is also important, but the order of BUGs here is good)
- At this point, look at the error situation
- Errors caused by macros can be seen in the preprocessed file
  - gcc -E *.c outputs the code after preprocessing with macros replaced
- First correction
  - Solution: Can use parentheses in the red box to increase precedence

#define MAX(a, b) (a > b ? a : b)

- Second correction
  - The fourth result is 2, reasoning as shown in the image
  - Note the conditional operator '?:'
    - Has a very low precedence: lower than '>'
    - When multiple '?:' exist, the associativity direction is right to left
  - Solution: Use parentheses for each variable to increase precedence

#define MAX(a, b) ((a) > (b) ? (a) : (b))

- Third correction
  - a++ ran twice
    - It should first take a for comparison, then perform a+1
  - Solution: Assign the pre-increment value of a to a variable; define it as an expression generated by a code segment

#define MAX(a, b) ({\
    __typeof(a) _a = (a);\
    __typeof(b) _b = (b);\
    _a > _b ? _a : _b;\
})

__typeof(a++) does a++ execute?
- When used in an expression, it does not execute, it only retrieves its type
({}) expression
- The value is the value of the last expression in parentheses, and the data type is that of the last expression
- Refer to Using parentheses and braces in C language combined with && compound statements-CSDN
- Removing {} or () will not work!
  - () can only contain comma expressions
    - Comma expressions take the value of the last expression
  - {} is a code segment, which has no return value
    - If defined directly as a macro function
    - Directly printf output will lose the essence of MAX returning the maximum value
    - ❓ Use return to return value
      - Needs declaration
      - But using an expression to represent the return value is smarter
Final Version Code

Result is all correct!

Implement LOG Macro Printing#

Code
- When delivering work, to omit log information, conditional compilation can be used
  - Use "gcc -DDEBUG" during compilation to pass in the DEBUG macro definition
  - Define an empty macro replacement after #else
- Variadic Macros
  - Variadic '...' just needs a name. eg. args...
- Using "#" and "##" in Macros
  - "#" : Take as string
  - "##" : Concatenate
    - Solve the situation where the log macro input is an empty character: If empty, the "," after frm will be automatically removed, refer to the preprocessing result 👇
    - Refer to Text Replacement Macros-cppreference
    - The difference between ab and a##b
      - ab will not substitute a and b before concatenation
      - The parameters a and b that need to be ## concatenated can be undefined
        
        The preprocessor will concatenate these undefined variable names, and the defined variable names after concatenation can be defined

Highlighted Notes#

Code Demonstration#

Prime Sieve#

Code

Dual logic, control indentation
- if continue
The front part of the prime array can simultaneously count and store (used to mark)
- Initially, prime[0] and prime[1] are not used, remaining idle
- There will be no overtaking issue: even numbers are definitely composite, and primes appear at least one apart
The initial condition of the second for loop can be optimized
- Initial Condition: i * 2 → i * i
- Time efficiency: O(Nlogn)→O(Nlognlogn)
- Because composite numbers smaller than i * i must have been marked by numbers smaller than i
  - This reduces the number of duplicate markings in the earlier part, but not completely
  - To completely avoid duplicate marking, a linear sieve must be used
- Danger point: i * i grows exponentially, more likely to overflow the maximum value that int can store
  - Can a check be added in advance? But it may lead to greater time consumption
  - Change to a data type with a higher upper limit to store i * i

Arrays#

Code

Output
- Details 👇

Declaration and Initialization#

Programming Tip: Open 5 extra positions to avoid overflow, which can reduce the bug rate

#define max_n 100
int arr[max_n + 5];

The variable space in the function is defined in the stack area
- The stack area can be understood as a badminton tube, last in first out
- Default stack size: 8MB≈8000000Byte (bytes)
Arrays declared within a function may not be clean and need manual initialization
- = {0};
- scanf input

for (int i = 0; i < max_n; i++) {
scanf("%d",arr + i);
}

- arr + i is equivalent to &arr[i]& cannot be omitted!
Arrays declared outside the function
- The system will automatically clear them to 0, similar to the = {0} operation
- The size that can be allocated is limited by the computer's memory, basically unlimited
Arrays defined with malloc, calloc, realloc are in the heap area, even if defined within a function

Space Occupation, Address#

The control character corresponding to sizeof() is %lu, unsigned long integer
The array name arr represents the address of the first element of the array, i.e., &arr[0]

printf("%p %p\n", arr, &arr[0]); // Address values are the same

Address + 1 offsets how many bytes depend on the size of the element represented by the address

Parameter Passing#

Condition: The external and internal parameter representations must be consistent
- One-dimensional array: int arr[100];
  - void func(int *num)
- Two-dimensional array: int arr2[100][100];
  - void func2(int **num) will warn
    - num + 1 and arr + 1 have inconsistent representations
    - num is a pointer to a pointer to int
    - The address difference between num and num + 1 is the size of one pointer, which is 8 bytes in a 64-bit operating system
    - The address difference between arr and arr + 1 is the size of one one-dimensional array, which is 4 * 100 bytes
  - void func2(int (*num)[100]) is valid
- Three-dimensional array: int arr3[100][100][32]
  - void func3(int (*num)[100][32]) is valid
  - (*num)[100][32] can be written as num[][100][32]
    - (*num) and num[] have the same representation, but they are essentially different
❓ Is q pointing to nil? That is 0x0

Additional Knowledge Points#

An array can represent three types of information
int arr[100]; // When arr is passed as a parameter to a function, it is the address of the first element of the array
Is 1 negated to -2? Let's derive it
- 1: 0000...001
- Negation: 1111...110
- The sign bit remains unchanged, negating and adding one gives: 1000...010
- That is -2
- Conclusion: The bitwise negation of all positive integers is their own +1 negative number
When including <math.h>, add -lm during compilation, using gcc *.c -lm
Coding habit: No more than 80 characters per line
Control character "%X": Uppercase hexadecimal

Points for Thought#

Which is more powerful, functions or macros?
- Macros can generate functions; aren't macros the father of functions?

Tips#

vim automatically completes the format of header files: add spaces
- Enter the ~/.vimrc file and modify the corresponding format in SetTitle()
Explore macro expansion and nesting yourself
Refer to the reference book chapters 8.1, 8.4, and chapter 15

Bo2SS

5 Arrays and Preprocessing Commands

Course Content#

Definition and Initialization of Arrays#

Prime Sieve#

Linear Sieve#

Binary Search Method#

Newton's Iteration Method#

Preprocessing Commands#

Strings#

In-Class Exercise#

Implement a BUG-Free MAX Macro#

Implement LOG Macro Printing#

Highlighted Notes#

Code Demonstration#

Prime Sieve#

Arrays#

Declaration and Initialization#

Space Occupation, Address#

Parameter Passing#

Additional Knowledge Points#

Points for Thought#

Tips#

Course Notes#