Check If an Array is a Subset of Another Array
Which technique works better? HashMaps or Sorting+Two-pointer
Today, I solved a basic array problem, but it gave me valuable insights into how my thought process works when approaching a problem. Simple questions like these help in analyzing problem-solving patterns. Thanks to my DSA interviewer, I was able to evaluate whether I was on the right track.
Problem Statement:
Given two arrays, a[]
and b[]
, determine whether b[]
is a subset of a[]
.
Expected Complexity:
Time Complexity: O(m+n)
Space Complexity: O(n)
The image below shows an example provided by GeeksForGeeks:
My Thought Process in coming up with an approach:
What is a subset? When all elements of
b[]
are present ina[]
, in any order.Considering duplicates. Since both
a[]
andb[]
can contain duplicates, we need to track the frequency of each element ina[]
.Data structure: A HashMap is ideal for storing elements and their counts efficiently.
Implementation Steps:
Traverse
a[]
and store each element along with its count in a HashMap.Traverse
b[]
and check if each element exists in the HashMap:If present, decrease its count by 1.
If an element is missing or its count drops below zero,
b[]
is not a subset.
Considering Edge cases. Before processing, a quick check:
If
b[]
has more elements thana[]
, it cannot be a subset.In this case, we return false immediately, saving computation time.
Outcome. This approach successfully passed all test cases.
There is an alternative method that can also be used to solve this problem: Sorting & Two-pointer technique
Below is a comparison table explaining when to use which method based on different scenarios.
Sorting + Two-pointer | HashMap |
Time complexity: O(nlogn + mlogm), where m, and n are lengths of two arrays. Although sorting modifies the original array, space complexity is O(1). | Time complexity (in this question): O(n + m), and space complexity is O(n), which may create concerns in very large datasets. |
If you need to check multiple subsets against the same array, sorting it once and using two-pointers is better. | Using a hashmap may result in rebuilding the hashmap every time, which may be inefficient in repeated queries. |
Sorting avoids using any memory by working directly on the arrays. | If a[] and b[] contain large numbers (e.g., values up to 10^9), storing them in a HashMap can lead to higher memory consumption. |
Code of the problem using HashMap:
class Solution {
public boolean isSubset(int a[], int b[]) {
// edge case optimization
if (b.length > a.length) {
return false;
}
// create a hashmap, and loop through first array and store elements and
// their occurances
HashMap<Integer, Integer> map = new HashMap<>();
for(int i = 0; i < a.length; i++) {
map.put(a[i], map.getOrDefault(a[i], 0) + 1);
}
// loop through b[] to check for each element if it's present in map or not
for(int i = 0; i < b.length; i++) {
if (map.containsKey(b[i])) {
if (map.get(b[i]) == 0) {
return false;
}
else {
map.put(b[i], map.get(b[i]) - 1);
}
}
else {
return false;
}
}
return true;
}
}
Code of the problem using Sorting and Two-pointer:
class Solution {
public boolean isSubset(int a[], int b[]) {
if (b.length > a.length) {
return false;
}
// sort both arrays
Arrays.sort(a);
Arrays.sort(b);
int p1 = 0, p2 = 0;
while (p1 < a.length && p2 < b.length) {
// if values at both indexes are equal, move forward
if (a[p1] == b[p2]) {
p1++; p2++;
}
// if the value at index p1 is less than p2, then bigger value might lie
// further in a[], so move forward
else if (a[p1] < b[p2]) {
p1++;
}
// if value at index p2 is greater than value at index p1, means here is
// violation, so we can directly return false
else if (a[p1] > b[p2]) {
return false;
}
}
return true;
}
}