Persistent Disjoint Set
Last updated
Last updated
I was randomly solving problems when I bumped into this interesting problem . I initially felt that this problem will be completely out of reach, but in the end I came up with an interesting solution for the same and learnt a lot in this process. The problem statement goes like:
There are N cities in Byteland but no roads between them. However, each day, a new road will be built. There will be a total of M roads. Your task is to process q queries of the form: "after how many days can we travel from city a to city b for the first time?"
Let's start with a simple and familiar problem:
Given a graph of N nodes and queries of the form (A, B) check whether the 2 nodes A and B are in the same connected component or not.
Although this particular problem can be done using a simple or on the graph, it can be done easily using the data structure of . Let's see how.
gives us a way to do two operations very efficiently on N disjoint sets:
Check whether two nodes are part of the same set
Merge the sets containing given two nodes
So if we add all the edges to the Disjoint Set one by one using the merge operation, in the end we would have a disjoint sets of nodes. Then all we need to do for each query is to answer whether each node is part of the same set or not.
With path-compression and light-to-heavy merging, we can answer each query in amortised log* N
, where N is the number of nodes in the graph and log* N
is the .
The code for DSU is also fairly simple and below in an example of the same in C++.
For the time being, let's assume that we have some efficient way to get the copy of the Disjoint Set at any time t, still how do we solve the problem at hand ??
๐ก On careful observation we can see that if two nodes are connected at time ti, then they will remain connected in all time tj where j >= i. That means the function of whether two nodes are connected is monotonic over time.
We seem to be getting somewhere now....
Now how do we solve the problem of persistence ??
NOTE: Actually, the below array is only Partially Persistent which is good enough for the requirements of this problem.
With these two data structures, we can easily solve the question at hand.
NOTE: All uses to store things are two arrays/vectors.
Now that we know what is, let's try to apply this to the problem we have at hand. One notices that the queries we need to answer are time โฐ based, which means we need to have a snapshot of the graph or the Disjoint Set (representing the graph) at each point in time. Let's number these Disjoint Sets representing the graphs at each time by D0, D1, ... DM. But we need to do this smartly without copying the entire Disjoint Set each time.
This brings us to the idea of using to find the least time t where any given two nodes are connected. This will take us O(log M)
time where M is the number of edges in the graph. Formally, we are trying to find a Disjoint Set Di such that A and B have the same root in Di and i is minimum.
We notice that our just uses arrays. We can replace the arrays in the DSU with a to create a Persistent Disjoint Set data structure. This will allows us to query the Disjoint Set at any point in time!!
First we create a generic . allows us to look at the state of the array by going back in time.
Using the we just created, we create the Persistent Disjoint Set.
NOTE: The Persistent Disjoint Set. is almost the same as , just a dimension of time has been added to it.
Here is the link to the .
There are other interesting ways to solve this same problem which requires knowledge of some concepts like etc. The rough outline of the solution is given .