1 Binary Search Tree, AVL Tree

Data structure is to define a property and maintain that property.

BS Tree 👉 AVL Tree

Course Content#

Binary Search Tree#

[Also known as binary search tree, binary lookup tree]

Basic Knowledge#

Property: For any triplet, left subtree < root node < right subtree
Purpose: To solve retrieval needs related to searching and ranking
【Why is the entry point for tree structure searches always the root node?】
- First, understand the meaning of points and edges: sets and relations
- The root node represents the whole set
Structural operations: insert, delete, search
- Insert
  - Continuously compare with the root node of the subtree, recursively, until the subtree has no child nodes
  - ⭐ The new node will definitely become a leaf node
- Delete [Node]
  - Degree 0: delete directly
  - Degree 1: attach the orphan subtree directly to its parent node
  - Degree 2: [A bit troublesome]
    - Find the predecessor or successor to replace the original node
      - Predecessor node: the node corresponding to the maximum value in its left subtree [This node definitely has no right subtree, degree at most 1]
      - Successor node: the node corresponding to the minimum value in its right subtree [This node definitely has no left subtree, degree at most 1]
      - 【Only for nodes of degree 2, otherwise need to consider searching up to the parent node, to what extent?】
    - ⭐ Thus converting to the problem of deleting a node of degree 0 or 1 [predecessor or successor]
    - [Can also be understood as the deletion operation of a one-dimensional array]
- Search: Just search recursively based on the property
The order of insertion will affect the tree structure, corresponding to different average search efficiencies
- The same sequence, but different insertion orders
- Average search efficiency: the expected value of the number of node searches. That is, average search times: total search times / number of nodes
  - Assuming each node is searched with equal probability
In-order traversal → an ordered sequence from smallest to largest
Refer to 《Basic Data Structures》——3 Trees and Binary Trees——Binary Search Trees

Extended Content#

[See specific code in code demonstration——Binary Search Tree]

Delete operation optimization [Code demonstration——Binary Search Tree]
- The delete operation for nodes of degree 0 and degree 1 can share the deletion operation code for degree 1
- Code for degree 0
- Code for degree 1
- When the degree is 0, temp points to NULL, also destroy root, then return NULL
——Find the element ranked k
- ⭐ Add a size field to record the number of nodes in each subtree [including the root node]
  - The essence of data structure, modifying structure definition for specific problems
  - Update size during insertion and deletion
- Search conditions
  - ① k = LS + 1, the root node is the value we are looking for
  - ② k < LS + 1, the element ranked k is in the left subtree, continue searching in the left subtree
  - ③ k > LS + 1, the element ranked k is in the right subtree, search for the element ranked k - LS - 1 in the right subtree
  - [PS] LS is the number of nodes in the left subtree
  - Output at boundary conditions [-1 or key]
——Find the elements ranked top k [all elements ranked <= k]
- 【Based on the ranking problem】
- Search conditions
  - ① k = LS + 1, output all values in the left subtree
  - ② k < LS + 1, all top k elements are in the left subtree
  - ③ k > LS + 1, the root node and elements in the left subtree all belong to the top k elements
  - Equivalent to continuously reducing the size of the tree, thus transforming into the first case
  - There are actually many solutions [such as heaps], there is no standard answer
The relationship between binary search trees and quicksort
- 【Binary search trees are the data structure used by quicksort at the conceptual level】
- The Partition operation in quicksort
  - Treat the pivot value as the root node of the binary search tree
  - After each Partition operation, the relationship between the pivot value and the left and right sides is actually 👇
  - The relationship between the root node and the left and right subtrees
- ⭐ Understanding the following two thoughts can clarify the relationship between the two
  - Thought 1: The relationship between the time complexity of quicksort and the tree construction time complexity of binary search trees
    - They are essentially the same, the tree construction process is similar to multiple Partition processes
  - Thought 2: The relationship between the quickselect algorithm and binary search trees
    - The essence of both is the same, the former manifests as an algorithm, the latter as a data structure
  - [Personal understanding]
    - Quicksort is a bunch of numbers doing a Partition once, then continuing to split
    - The tree construction process is one number at a time doing a Partition, fixing the position of that number before moving to the next

Balanced Binary Search Tree—AVL Tree#

Solving the degeneration problem in binary search trees

Basic Knowledge#

Background
- Inventors: AV, L, with AV contributing more
- Year: 1962
- [Neural networks also emerged in this era]
Properties
- Essentially also a binary search tree, possessing all properties of binary search trees
- Additional property: The height difference between the left and right subtrees does not exceed 1 [Left and right subtrees are roughly equal]
⭐ Balance condition, balance adjustment

[Further] Thoughts#

What is the range of the number of nodes contained in a BS, AVL tree of height H?

[General upper limit for binary trees: 2 ^ H - 1]
<BS> H ~ 2 ^ H - 1
<AVL> low(H - 2) + low(H - 1) + 1 ~ 2 ^ H - 1
- That is, 1.5 ^ H ~ 2 ^ H - 1, low(H) represents the minimum number of nodes in an AVL tree of height H
- The left boundary is a Fibonacci sequence
  - Its growth rate is 1.618 ^ n ≈ 1.5 ^ n 【Can be used for estimation】
- 👉 The relationship between the number of nodes and height is logarithmic, so search efficiency is O(logN) level
❓ [Further] Is the search efficiency of a regular binary search tree always worse than that of an AVL tree?
- Not necessarily, it depends on the order of the insertion sequence, but the lower limit is high for AVL trees
- 【Extension】
  - Tree height = lifespan, number of nodes = wealth of life, different algorithms yield different results
  - Education raises the baseline, not the upper limit; the upper limit depends on ability and luck
    - The upper limit largely depends on an uncontrollable insertion sequence, left to fate

Adjustment ⭐#

Insertion/deletion + adjustment: The insertion operation is consistent with binary search trees, the key lies in adjustment [⭐ Which node to rotate]

Rotation#

[Basic operation used when unbalanced]

Left Rotation
- Grasp the root node and rotate left around the center point
  - K3 becomes the root node
  - K1 becomes the left subtree of K3
  - K3's original left subtree A becomes the right subtree of K1 [because K3 > A > K1]
- "Leaf node" depth changes: A remains unchanged, B - 1, K2 + 1
  - 👉 Subtree height changes: left subtree + 1, right subtree - 1
Right Rotation
- Grasp the root node and rotate right around the center point
  - K2 becomes the root node
  - K1 becomes the right subtree of K2
  - K2's original right subtree B becomes the left subtree of K1 [because K2 < B < K1]
- "Leaf node" depth changes: A - 1, B remains unchanged, K3 + 1
  - 👉 Subtree height changes: left subtree - 1, right subtree + 1
Left and right rotations are symmetric operations, essentially affecting the height of the subtrees

Types of Imbalance && Adjustment Strategies#

Judging scenario: During the backtracking process, adjust as soon as the first imbalance is detected
Symmetrical situations: LL ——RR , LR——RL

① LL

⭐ Key point ⭐: Analyze the relationship between h1, h2, h3, h4 [Write equations]
- Analysis
  - Insert one by one, the newly inserted node must cause an LL-type imbalance in subtree A [Deletion also occurs one by one]
  - Before insertion, it must be balanced
  - After insertion, K1's left subtree is 2 higher than the right subtree
    - That is, h(K2) = h(D) + 2, while
      - h(K2) = h1 + 1
      - h(D) = max(h3, h4) + 1
- Equation relationship=>
  - h1 = max(h3, h4) + 2 = h2 + 1
  - [PS] |h3 - h4| <= 1
<Adjustment Strategy> Big right rotation. Result see above, balance analysis see below:
- ① K1: left subtree height h2= right subtree height max(h3, h4) + 1=h1 -1
- ② K2: left subtree height h1= right subtree height h2 + 1
- Already balanced, and can be said to be terrifyingly balanced

② LR

⭐ Key point ⭐: Analyze the relationship between h1, h2, h3, h4 [Write equations]
- Analysis
  - Insert one by one, the newly inserted node must cause an LR-type imbalance in subtree B / C
  - Before insertion, it must be balanced
  - After insertion, K1's left subtree is 2 higher than the right subtree
    - That is, h(K2) = h(D) + 2, while
      - h(K2) = max(h2, h3) + 2
      - h(D) = h4
- Equation relationship=>
  - max(h2, h3) = h4 = h1
  - [PS] |h2 - h3| <= 1

❗ Remember again! The judgment always occurs at the first imbalance during the backtracking process, so all previous subtrees are balanced

<Adjustment Strategy> Small left rotation + big right rotation. Result see below:
- First perform a small left rotation to convert to LL type, then a big right rotation [LL adjustment strategy]
- Balance analysis
  - It is obvious that the heights of the left and right subtrees depend on h1, h2, h3, h4
  - And the height difference between them does not exceed 1 [one of h2 or h3 may be smaller by 1]

Summary of Adjustment Strategies#

Occurs at the first unbalanced node during the backtracking phase
Key to adjustment: the height relationships of the four subtrees ABCD in the four cases
Four cases
- LL, LR: both ultimately require a big right rotation
  - LL, big right rotation
  - LR, small left rotation first, then big right rotation
- RL, RR: both ultimately require a big left rotation
  - RL, small right rotation first, then big left rotation
  - RR, big left rotation

[Balanced Binary Search Tree—SBTree]#

AVL balances through tree height, while SBTree balances through the number of nodes
Refer to the learning order of AVL: balance condition + balance adjustment [insertion, deletion]

In-Class Exercises#

The binary search tree formed by the second sequence degenerates into a linked list, and the insertion order will affect the structure
When judging imbalance, backtrack from the inserted node step by step, and adjust immediately upon finding an imbalance

Code Demonstration#

Binary Search Tree#

5 basic operations: initialize, destroy, search, insert, delete
Add sorting problem [add size field], Top-k problem solution [based on sorting problem]
[PS]
- Finding predecessor has a bug: the predecessor of nodes of degree 1 or 0 may not exist in the subtree, but this scenario does not need to be considered
- Recursion must consider boundary conditions
- Learn to use macros to make the code more beautiful
- Analyzing binary trees generally involves discussing three situations: left, root, right
Partial results

【Appendix】 Code for randomly generating input operations

Generate non-proportional operation types

AVL Tree#

After insertion and deletion, remember to recalculate the tree height 【Always remember to maintain the structure definition】
- During insertion / deletion of nodes + during left rotation / right rotation adjustments
⭐ Introduced NIL nodes to replace NULL🆒[Red-black trees also need to use this]
- NULL is inaccessible
- NIL is an actual node that can be accessed
- 【Convenient for code implementation】
LL, LR, and RL, RR last operations are big right rotation and big left rotation respectively

Additional Knowledge Points#

When analyzing binary tree situations, it is generally divided into three cases: left, root, right

Points for Thought#

Is the search efficiency of a regular binary search tree always worse than that of an AVL tree?
- Not necessarily, it depends on the order of the insertion sequence, but the lower limit is high for AVL trees
- 【Extension】
  - Tree height = lifespan, number of nodes = wealth of life, different algorithms yield different results
  - Education raises the baseline, not the upper limit; the upper limit depends on ability and luck
    - The upper limit largely depends on an uncontrollable insertion sequence, left to fate

Tips#

Prerequisites for learning C++: system programming, network knowledge, algorithm structure, C language
The so-called ability to design and analyze algorithms: the ability to classify discussions [divide into several cases] and summarize [combine multiple cases into one]
- Program = Algorithm + Data Structure
- Those who improve this ability by practicing problems are gifted individuals
Recommended Geek Column——Programming Introductory Course That Everyone Can Learn——A challenge to traditional learning methods