Persistent Disjoint Set
I was randomly solving problems when I bumped into this interesting problem New Roads Queries. I initially felt that this problem will be completely out of reach, but in the end I came up with an interesting solution for the same and learnt a lot in this process. The problem statement goes like:
There are N cities in Byteland but no roads between them. However, each day, a new road will be built. There will be a total of M roads. Your task is to process q queries of the form: "after how many days can we travel from city a to city b for the first time?"
A small detour ๐ฃ๏ธ
Let's start with a simple and familiar problem:
Given a graph of N nodes and queries of the form (A, B) check whether the 2 nodes A and B are in the same connected component or not.
Although this particular problem can be done using a simple DFS or BFS on the graph, it can be done easily using the data structure of Disjoint Set. Let's see how.
Disjoint Set gives us a way to do two operations very efficiently on N disjoint sets:
Check whether two nodes are part of the same set
Merge the sets containing given two nodes
So if we add all the edges to the Disjoint Set one by one using the merge operation, in the end we would have a disjoint sets of nodes. Then all we need to do for each query is to answer whether each node is part of the same set or not.
With path-compression and light-to-heavy merging, we can answer each query in amortised log* N
, where N is the number of nodes in the graph and log* N
is the Inverse Ackermann function.
The code for DSU is also fairly simple and below in an example of the same in C++.
NOTE: All Disjoint Set uses to store things are two arrays/vectors.
Coming back ๐
Now that we know what Disjoint Set is, let's try to apply this to the problem we have at hand. One notices that the queries we need to answer are time โฐ based, which means we need to have a snapshot of the graph or the Disjoint Set (representing the graph) at each point in time. Let's number these Disjoint Sets representing the graphs at each time by D0, D1, ... DM. But we need to do this smartly without copying the entire Disjoint Set each time.
For the time being, let's assume that we have some efficient way to get the copy of the Disjoint Set at any time t, still how do we solve the problem at hand ??
๐ก On careful observation we can see that if two nodes are connected at time ti, then they will remain connected in all time tj where j >= i. That means the function of whether two nodes are connected is monotonic over time.
This brings us to the idea of using Binary Search to find the least time t where any given two nodes are connected. This will take us O(log M)
time where M is the number of edges in the graph. Formally, we are trying to find a Disjoint Set Di such that A and B have the same root in Di and i is minimum.
We seem to be getting somewhere now....
Now how do we solve the problem of persistence ??
We notice that our Disjoint Set just uses arrays. We can replace the arrays in the DSU with a Persistent Array to create a Persistent Disjoint Set data structure. This will allows us to query the Disjoint Set at any point in time!!
Show me some code ๐ป
First we create a generic Persistent Array. Persistent Array allows us to look at the state of the array by going back in time.
NOTE: Actually, the below array is only Partially Persistent which is good enough for the requirements of this problem.
Using the Persistent Array we just created, we create the Persistent Disjoint Set.
NOTE: The Persistent Disjoint Set. is almost the same as Disjoint Set, just a dimension of time has been added to it.
With these two data structures, we can easily solve the question at hand.
Here is the link to the full solution.
Other Solutions ๐งช
There are other interesting ways to solve this same problem which requires knowledge of some concepts like Heavy Light Decomposition etc. The rough outline of the solution is given here.
Last updated