Persistent Disjoint Set

Last updated 3 years ago

I was solving problems at random when I bumped into this interesting one. I initially felt that it would be completely out of reach, but in the end I came up with an interesting solution and learnt a lot in the process. The problem statement goes like this:

There are N cities in Byteland but no roads between them. However, each day, a new road will be built. There will be a total of M roads. Your task is to process q queries of the form: "after how many days can we travel from city a to city b for the first time?"

A small detour 🛣️

Let's start with a simple and familiar problem:

Given a graph of N nodes and queries of the form (A, B), check whether the two nodes A and B are in the same connected component.

Although this particular problem can be solved with a simple BFS or DFS on the graph, it can also be solved easily using the Disjoint Set Union (DSU) data structure. Let's see how.

A DSU gives us a way to do two operations very efficiently on N disjoint sets:

Check whether two nodes are part of the same set
Merge the sets containing given two nodes

So if we add all the edges to the Disjoint Set one by one using the merge operation, in the end we have disjoint sets of nodes, one per connected component. Then all we need to do for each query is check whether the two nodes are part of the same set.

With path compression and light-to-heavy merging, we can answer each query in amortised O(log* N) time, where N is the number of nodes in the graph and log* N is the iterated logarithm.

The code for DSU is also fairly simple, and below is an example of it in C++.

#include <vector>
using namespace std;

class DisjointSet {
  public:
    vector<int> parent;
    vector<int> size;

    DisjointSet(int n) {
      parent.resize(n);
      size.resize(n, 1);
      /** Initially each element belongs to one set which is itself */
      for (int i = 0; i < n; i++) parent[i] = i;
    }

    /** Find the root of any element */
    int root(int a) {
      while (parent[a] != a) {
        parent[a] = parent[parent[a]]; /** Path compression */
        a = parent[a];
      }
      return a;
    }

    /** Find whether two elements belong to the same set */
    bool find(int a, int b) { return root(a) == root(b); }

    /** Merge two sets which contain element a, b */
    void merge(int a, int b) {
      int root_a = root(a); int root_b = root(b);
      /** Light to Heavy merging */
      if (root_a != root_b) {
        if (size[root_a] < size[root_b]) {
          parent[root_a] = parent[root_b];
          size[root_b] += size[root_a];
        } else {
          parent[root_b] = parent[root_a];
          size[root_a] += size[root_b];
        }
      }
    }
};
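
To make the detour concrete, here is a minimal sketch of how the connectivity queries could be answered: merge every edge first, then compare roots. The function name `answer_connectivity` and the 0-indexed inputs are assumptions made for illustration, and the sketch inlines a compact DSU (same path compression and light-to-heavy merging as above) so that it compiles on its own.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: for each query (a, b), reports whether a and b are
// connected once all edges have been added. Nodes are assumed 0-indexed.
std::vector<bool> answer_connectivity(
    int n,
    const std::vector<std::pair<int, int>>& edges,
    const std::vector<std::pair<int, int>>& queries) {
  std::vector<int> parent(n), size(n, 1);
  std::iota(parent.begin(), parent.end(), 0);  // every node starts as its own root

  // Find the root of a, shortening the path on the way up.
  auto root = [&](int a) {
    assert(0 <= a && a < n);
    while (parent[a] != a) {
      parent[a] = parent[parent[a]];
      a = parent[a];
    }
    return a;
  };

  // Merge phase: attach the smaller tree under the larger one.
  for (auto [a, b] : edges) {
    int ra = root(a), rb = root(b);
    if (ra == rb) continue;
    if (size[ra] < size[rb]) std::swap(ra, rb);
    parent[rb] = ra;
    size[ra] += size[rb];
  }

  // Query phase: two nodes are connected iff they share a root.
  std::vector<bool> result;
  for (auto [a, b] : queries) result.push_back(root(a) == root(b));
  return result;
}
```

For example, with 5 nodes and edges (0,1), (1,2), (3,4), nodes 0 and 2 end up in the same set while nodes 0 and 3 do not.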

Coming back 🔙

For the time being, let's assume that we have some efficient way to get a copy of the Disjoint Set as it was at any time t. How do we then solve the problem at hand?

💡 On careful observation we can see that if two nodes are connected at time t_i, then they remain connected at every later time t_j with j >= i. In other words, whether two nodes are connected is a monotonic function of time, so we can binary search over time for the first day on which they are connected.

We seem to be getting somewhere now....
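
Because connectivity is monotonic over time, the first day on which two cities are connected can be found with a standard "first true" binary search. Here is a minimal sketch of that search on its own; the helper name `first_true` and the use of `std::function` for the predicate are assumptions made for illustration, not part of the original solution.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: smallest t in [lo, hi] for which pred(t) is true,
// or -1 if pred is false on the whole range.
// Assumes pred is monotonic: once it becomes true it stays true.
int first_true(int lo, int hi, const std::function<bool(int)>& pred) {
  assert(lo <= hi);
  int answer = -1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (pred(mid)) {
      answer = mid;  // mid works, but an earlier time might work too
      hi = mid - 1;
    } else {
      lo = mid + 1;  // not connected yet, look at later times
    }
  }
  return answer;
}
```

In this problem the predicate would be "are cities a and b connected at time t?", which is exactly the query a Disjoint Set that remembers its past versions can answer.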

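Now, how do we solve the problem of persistence? The code below keeps, for every array index, a list of (time, value) writes, and answers a read at time t by binary searching that list for the latest write made at or before t.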

Show me some code 💻

NOTE: The array below is only partially persistent (any past version can be read, but only the latest version can be modified), which is good enough for the requirements of this problem.

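#include <algorithm>
#include <iterator>
#include <limits>
#include <utility>
#include <vector>
using namespace std;

template <typename T>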
class PersistentArray {
  public:
    vector<vector<pair<int, T>>> arr;

    PersistentArray() {}

    PersistentArray(int n) { arr.resize(n); }

    PersistentArray(vector<T>& initial_array) {
      arr.resize(initial_array.size());
      for (int i = 0; i < initial_array.size(); i++) {
        arr[i].push_back({ 0, initial_array[i] });
      }
    }

    void resize(int n) { arr.resize(n); }

    void resize(int n, T a) {
      arr.resize(n);
      for (int i = 0; i < n; i++) {
        arr[i].push_back({ 0, a });
      }
    }

    void clear() { arr.clear(); }

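    /** Record that arr[index] takes the value item from the given time onwards.
        Times must be pushed in non-decreasing order for each index. */
    void push_back(T item, int index, int time) {
      arr[index].push_back({ time, item });
    }

    /** Value of arr[index] at the given time: the last write made at or before it.
        Assumes at least one write to this index exists at or before that time. */
    T at(int index, int time) {
      return prev(
        upper_bound(
            arr[index].begin(),
            arr[index].end(),
            make_pair(time, numeric_limits<T>::max())
        )
      )->second;
    }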
};
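
To see the lookup in isolation, here is a minimal single-cell sketch of the same idea: every write is appended together with its timestamp, and a read at time t returns the latest write made at or before t. The struct name `VersionedInt` and its `set`/`get` methods are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical one-cell version of the persistent array above.
struct VersionedInt {
  std::vector<std::pair<int, int>> history;  // (time, value), times non-decreasing

  // Append a new version; older versions are never modified.
  void set(int time, int value) {
    assert(history.empty() || history.back().first <= time);
    history.push_back({time, value});
  }

  // Value as it was at `time`: the last write with timestamp <= time.
  int get(int time) const {
    // First entry whose timestamp is strictly greater than `time`.
    auto it = std::upper_bound(
        history.begin(), history.end(), time,
        [](int t, const std::pair<int, int>& entry) { return t < entry.first; });
    assert(it != history.begin());  // there must be a write at or before `time`
    return std::prev(it)->second;
  }
};
```

Each read costs one binary search over the history of that cell, which is why every `at` call in the persistent version carries an extra logarithmic factor compared with a plain array access.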

class PersistentDisjointSet {
  public:
    PersistentArray<int> parent;
    PersistentArray<int> size;

    PersistentDisjointSet(int n) {
      parent.resize(n);
      size.resize(n, 1);
      for (int i = 0; i < n; i++) parent.push_back(i, i, 0);
    }

    int root(int a, int t) {
      /** Note we are not doing path compression here, so reads never modify the
          stored history. Light-to-heavy merging alone keeps the trees O(log N) deep. */
      while (parent.at(a, t) != a) {
        a = parent.at(a, t);
      }
      return a;
    }

    bool find(int a, int b, int t) {
      return root(a, t) == root(b, t);
    }

    void merge(int a, int b, int t) {
      int root_a = root(a, t); int root_b = root(b, t);
      if (root_a != root_b) {
        if (size.at(root_a, t) < size.at(root_b, t)) {
          parent.push_back(parent.at(root_b, t), root_a, t);
          size.push_back(size.at(root_b, t) + size.at(root_a, t), root_b, t);
        } else {
          parent.push_back(parent.at(root_a, t), root_b, t);
          size.push_back(size.at(root_a, t) + size.at(root_b, t), root_a, t);
        }
      }
    }
};

With these two data structures, we can easily solve the question at hand.

/** Assumes N, M, edges and queries have already been read from the input */
void solve() {
  PersistentDisjointSet pds(N);
  for (int i = 0; i < edges.size(); i++) {
    /** Merge every two nodes in the context of the time */
    pds.merge(edges[i].first - 1, edges[i].second - 1, i + 1);
  }
  for (auto [from, to]: queries) {
    /** Time 0 is the initial state with no roads; answer stays -1 if never connected */
    int low = 0, high = M + 1;
    int answer = -1;
    /** Binary searching over time */
    while (low <= high) {
      int mid = (high + low) / 2;
      if (pds.find(from - 1, to - 1, mid)) {
        high = mid - 1;
        answer = mid;
      } else {
        low = mid + 1;
      }
    }
    cout << answer << "\n";
  }
}
