6 Forests and Disjoint Sets

Course Content#

Union-Find: Union sets [establishing connectivity] and find connectivity in sets [determining whether two points are connected]

Connectivity Problem#

Rule: Points that are already connected cannot be connected again
Divide all points into 2 sets: ① and ②
[PS] The relationship between points (merging, determining connectivity) can also be the relationship between sets

Quick-Find Algorithm#

Core idea: Coloring
- One color corresponds to one category
- Initialization: Each individual is independent and has its own index, belonging to an independent set
- ⭐ Change the color of all points connected to itself to the color to be dyed
Time complexity
- Connectivity check: O(1) - fast lookup, hence the name Quick-Find
- Union operation: O(N) - need to traverse all points to determine whether to merge
🆒 Thought-provoking: Why is it called a forest?
- The key lies in the merge operation, merging several points with several points, merging sets with sets
- Similar to the merge between subtrees or making one tree a subtree of another tree
- All sets are put together, similar to putting all trees together, which is a forest
So, are there other ways besides coloring?
- Think: It doesn't matter what color it is dyed, just need to know which points have the same color, that is, connected

Quick-Union Algorithm#

Inspiration from life: Find the big brother, find the leader
Core idea: Find representative elements [root node]
- The stored value is its representative element
- Initialization: Each individual is independent and has its own index, belonging to an independent set
- ⭐ Find the representative elements of the two points and modify the value of one representative element to the value of the other representative element
- Representative element: The value inside is itself
- Association: When the big brother or his little brother betrays, they all start with the big brother and betray to the big brother of another little brother
  - The result is the same as Quick-Find, merging a whole subtree into another subtree, but it is achieved through representative elements
  - Can only be root node pointing to root node
Time complexity
- Both need to find the root node, related to the height of the tree
- Connectivity operation: O(tree height)
- Union operation: O(tree height)
- ❗ 2 cases
  - Normal state → both are O(logN)
  - Degenerate into a linked list → both are O(N)
How to solve the degenerate case? 👇

Weighted Quick-Union Algorithm#

[Merge by Rank]

How to avoid degeneration? → Ensure lush branches
- Merge basis 1: Tree height, hang the short tree under the tall tree [pairwise combination]
  - The tree with height h requires at least N = 2 ^ (h - 1) nodes
  - That is, tree height h = log[2]N + 1 ≈ log[2]N
  - [PS] Only when two trees of the same height are merged will the height increase
- Merge basis 2: Number of nodes, hang the tree with fewer nodes under the tree with more nodes
- Both optimization methods can achieve O(logN), but merge basis 2 [number of nodes] is better
⭐ Why is merge basis 2 better
- [Example] What is average search times
  - As shown in the figure below, the average search times of trees A and B are calculated
  - The depth of the node is the number of search times of the node, and the average search times = total search times / total number of nodes
  - In this example, the search operation of tree B is faster
- [Derivation] Merge basis 2 directly determines the average search times
  - For trees A and B with SA and SB nodes, their total search times LA and LB are respectively:
    - Where li represents the depth of the i-th node
  - At this time, when performing merge operations, calculate the average search times of ①A→B and ②B→A
    - ① When the A tree is merged into the B tree as a subtree, it is
      - All nodes in the A tree need to be searched one more time
    - ② When the B tree is merged into the A tree as a subtree, it is
      - All nodes in the B tree need to be searched one more time
  - ❗ [Comparison of average search times of the two methods]
    - The numerator of the average search times is not directly related to the tree height [LA, LB], but the number of nodes in the numerator [SA, SB] directly determines the number of search times, the smaller the better
    - 👉Whoever has fewer nodes will be merged as a subtree
    - ❓ Think: Does the above derivation prove that height cannot be used as a merge basis?
      - ❌No, height indirectly affects the number of nodes, and in general, the lower the height, the fewer the number of nodes
      - However, for special cases where the A tree is higher than the B tree, but the number of nodes in the A tree is less than the B tree, still use [number of nodes] as the merge basis and merge the A tree as a subtree into the B tree
- So using the number of nodes as the merge basis is better! The merge idea is as follows
When merging two subtrees
- If the number of nodes is the same, follow the idea of regular Quick-Union
- If not, the root node of the subtree with fewer nodes is attached to the root node of the subtree with more nodes
[PS] In other words
- When changing the big brother
- If the number of little brothers is the same, follow the idea of regular Quick-Union
- If not, the big brother with fewer little brothers has to follow the big brother with more little brothers

+ Path Compression#

Refer to the visualization result of weighted Quick-Union in in-class exercise 2
- The position of 0 is a bit awkward
- ❗ Whether the representative element of 0 is 1 or 3 doesn't matter, directly attach 0 to 3 can reduce the height of the tree
- 【Method】Make each non-root node directly point to the root node, flatten the structure

Efficiency comparison of the above algorithms

In-Class Exercise#

Quick-Find vs. Quick-Union#

【Key】Understanding Quick-Union
- 0->1->2->4->5, 3->4->5; 8->9->7->6
- Boundary for lookup and union: Stop when the representative element is itself

Quick-Union vs. Weighted Quick-Union#

【Key】Understanding weighted
- When the number of elements in two sets is different
- The value of the representative element of the set with fewer elements 👉 the value of the representative element of the set with more elements
- The big brother with fewer little brothers has to follow the big brother with more little brothers
Result visualization
- Obviously, the tree obtained by the weighted method is shorter, and the efficiency of merging and searching is higher

Code Demonstration#

HZOJ-71: Friend Circle#

Sample Input

Sample Output

No
Yes

Idea
- Use data structure - Union-Find
- 1 means union operation, 2 means find operation
- Test the efficiency of Quick-Find, Quick-Union [+weighted, +path compression, -weighted] algorithms
Code

Quick-Find#

Lookup and union strategy: Based on coloring

Quick-Union#

Based on Quick-Find
- Modify the color in the structure definition to the representative element father
- Modify the find operation: need to find the bottom
- Modify the union operation: also need to find the bottoms of the two elements before merging
Degeneration into a linked list is prone to occur, below is the optimization

+ Weighted#

Based on Quick-Union
Add the size attribute to record the number of nodes in the set where the node is located, and use it as the merge strategy

+ Path Compression#

Perform path compression every time a find operation is performed, make each non-root node point directly to the root node [root node]

- Weighted#

Erase all size-related operations
With path compression, not merging by rank [removing the size attribute] can still achieve good results

Time used for the problem on HZOJ platform

Method	Quick-Find	Quick-Union	+ Weighted	+ Path Compression	- Weighted
Time (ms)	744	2020	164	172	188

Quick-Union has degeneration problem
After adding path compression, not merging by rank can still achieve good results
🆒The last three methods all have good performance!

Points to Ponder#

In the weighted Quick-Union algorithm, does the derivation that the number of nodes is more optimal prove that height cannot be used as a merge basis?
- ❌No, height indirectly affects the number of nodes, and in general, the lower the height, the fewer the number of nodes
- However, for special cases where the A tree is higher than the B tree, but the number of nodes in the A tree is less than the B tree, still use [number of nodes] as the merge basis and merge the A tree as a subtree into the B tree

Tips#

《C++ Primer》 is recommended to be used as a reference book