Last Updated on November 18, 2022 by Sumit Kumar
Find the minimum size of the substring of string
S
which contains all the character from a given stringT
. Print the substring with minimum length.
See original problem statement here
Test Case
Input:
1
prepbytes
pyte
ccbabcab
bbab
Output:
pbyte
babcab
Explanation:
Case 1: All characters in string 'pyte' are present in 'pbyte' and this is the smallest substring.
Case 2: All characters in string 'bbab' are present in strings 'babcab' and this is the smallest substring containing all characters.
Bruteforce Approach to Solve the Minimum Window Substring Problem –
1) We can generate all substrings of string
S
.
2) For each substring we check if it contains all character from the stringT
.
3) We print the string with minimum length.
4) You can see this approach is not efficient and takes O(T\times{N^{2}}
) time as it takes O(N^{2}
) time for calculating all substrings and takes O(T
) time for checking if all characters fromT
exists or not in the substring.
5) Do you have any other approach which is efficient than this approach? Try that out, if that one goes beyond the time limit, proceed to the following efficient approach.
In the above approach, we found all substring containing all characters from our pattern string, we can also observe that all discarded substring contain or minimum window substring, and the largest such substring starts from 0. Hence we can find a substring from the start of the string containing all characters from the pattern, and then discard characters from beginning one by one, until we reach a condition that violates our requirements. We discuss this approach our efficient approach
Efficient Approach to Solve the Minimum Window Substring Problem:
1) First check if the length of string
S
is less than the length of the given stringT
, if yes printNot Found
.
2) If no, then store the occurrences of characters from stringT
in a hash table.
3) Start matching the characters of stringT
with the characters of string i.e. increment count if a character matches.
4) Check if the count is equal to the length of stringT
, if yes we have found a window containing all characters from the stringT
.
5) If such a window is found, try to minimize it by removing extra characters from the beginning of the current window.
6) Update the minimum length on each character removal.
7) Print the window with minimum length when no character can be removed.
Example
- Let us take an example and understand above approach, let string
S
be ‘qabaaab
‘ and patternP
be ‘baaab
‘.- We maintain a hash table
hash_str
to store characters from string. We also maintain a hash tablehash_pat
to store characters from pattern string.- We find all appearances of characters in
hash_pat
, so finalhash_pat
for pattern is[3,2]
where index0
corresponds for charactera
and index1
corresponds for characterb
.- We now find the largest window containing all character from pattern string, this can be found by starting window from
0
and increasing end of window one by one until whole window contains all chracters from pattern string.
Hash table for string is initialised with 0, so default hash table is[0,0,0]
where index0
corresponds to'a'
, index1
corresponds to'b'
and index2
corresponds to'q'
.
Finding Window :
We also maintain a counter
count
, which when equal to length of pattern, we exit because that is last point of window.
For end = 0:
Current character is
'q'
, hencehash_str
becomes[0,0,1]
, ‘q’ does not appear in pattern, so we don’t increasecount
.
For end = 1:
Current character is
'a'
, hencehash_str
becomes[1,0,1]
, hash_str value is less than hash_pat value of ‘a’, so we increase counter by 1, socount=1
.
For end = 2:
Current character is
'b'
, hencehash_str
becomes[1,1,1]
, hash_str value is less than hash_pat value of ‘a’, so we increase counter by 1, socount=2
.
For end = 3:
Current character is
'a'
, hencehash_str
becomes[2,1,1]
, hash_str value is less than hash_pat value of ‘a’, so we increase counter by 1, socount=3
.
For end = 4:
Current character is
'a'
, hencehash_str
becomes[3,1,1]
, hash_str value is equal to hash_pat value of ‘a’, so we increase counter by 1, socount=4
.
For end = 5:
Current character is
'a'
, hencehash_str
becomes[4,1,1]
, hash_str value is greater than hash_pat value of ‘a’, so we dont increase counter, socount=4
.
For end = 6:
Current character is
'b'
, hencehash_str
becomes[4,2,1]
, hash_str value is less than hash_pat value of ‘a’, so we increase counter by 1, socount=5
.
Hence count is now equal to length of pattern, so we have found the window of maximum length, now we need to short this window for minimum size of window.
Minimizing The Window :
Here, we move start pointer of window to minimize the size, we decrease the count of hash_str value of current character if hash_str value is greater than hash_pat value of current character. If hash_str value becomes becomes less to hash_pat value of current character, we stop iterating, current window is the minimum window.
For start = 0:
Current character is
'q'
, hencehash_str
becomes[4,2,0]
, hash_str value is not in pattern so we move further.
For start = 1:
Current character is
'a'
, hencehash_str
becomes[3,2,0]
, hash_str value of ‘a’ is greater than hash_pat value, so we move window further.
For start = 2:
Current character is
'b'
, hencehash_str
becomes[2,2,0]
, hash_str value of ‘a’ is equal to hash_pat value, so we have minimized the window.
We print the final window starting from start to end pointer. Hence we print 'baaab'
.
Program of the Solution of Minimum Window Substring Problem
#include<stdio.h> #include<string.h> int main() { int test; scanf("%d",&test); while(test--){ char str[1000000], pat[1000000]; scanf("%s%s",str,pat); int len1 = strlen(str); int len2 = strlen(pat); if (len1 < len2) { printf("Not Found\n"); }else{ int hash_pat[256] = {0}; int hash_str[256] = {0}; for (int i = 0; i < len2; i++) hash_pat[pat[i]]++; int start = 0, start_index = -1, min_len = 9999999; int count = 0; for (int j = 0; j < len1 ; j++) { hash_str[str[j]]++; if (hash_pat[str[j]] != 0 && hash_str[str[j]] <= hash_pat[str[j]] ) count++; if (count == len2) { while ( hash_str[str[start]] > hash_pat[str[start]] || hash_pat[str[start]] == 0) { if (hash_str[str[start]] > hash_pat[str[start]]) hash_str[str[start]]--; start++; } int len_window = j - start + 1; if (min_len > len_window) { min_len = len_window; start_index = start; } } } if (start_index == -1) printf("Not Found\n"); else{ for(int i=start_index; i<min_len+start_index; i++) printf("%c",str[i]); printf("\n"); } } } }
#include<bits/stdc++.h> using namespace std; const int no_of_chars = 256; int main() { int test; cin>>test; while(test--){ string str, pat; cin>>str>>pat; int len1 = str.length(); int len2 = pat.length(); if (len1 < len2) { cout << "Not Found\n"; }else{ int hash_pat[no_of_chars] = {0}; int hash_str[no_of_chars] = {0}; for (int i = 0; i < len2; i++) hash_pat[pat[i]]++; int start = 0, start_index = -1, min_len = INT_MAX; int count = 0; for (int j = 0; j < len1 ; j++) { hash_str[str[j]]++; if (hash_pat[str[j]] != 0 && hash_str[str[j]] <= hash_pat[str[j]] ) count++; if (count == len2) { while ( hash_str[str[start]] > hash_pat[str[start]] || hash_pat[str[start]] == 0) { if (hash_str[str[start]] > hash_pat[str[start]]) hash_str[str[start]]--; start++; } int len_window = j - start + 1; if (min_len > len_window) { min_len = len_window; start_index = start; } } } if (start_index == -1) cout<<"Not Found\n"; else{ for(int i=start_index; i<min_len+start_index; i++) cout<<str[i]; cout<<endl; } } } }
import java.util.*; import java.io.*; import java.lang.Math; public class Main { public static void main(String args[]) throws IOException { Scanner sc= new Scanner(System.in); int test = sc.nextInt(); while(test != 0){ String str = sc.next(); String pat= sc.next(); int len1 = str.length(); int len2 = pat.length(); if (len1 < len2) { System.out.println("Not Found"); }else{ int hash_pat[] = new int[256]; int hash_str[] = new int[256]; for (int i = 0; i < len2; i++) hash_pat[pat.charAt(i) - 'a']++; int start = 0, start_index = -1, min_len = 9999999; int count = 0; for (int j = 0; j < len1 ; j++) { hash_str[str.charAt(j) - 'a']++; if (hash_pat[str.charAt(j) - 'a'] != 0 && hash_str[str.charAt(j) - 'a'] <= hash_pat[str.charAt(j) - 'a'] ) count++; if (count == len2) { while ( hash_str[str.charAt(start) - 'a'] > hash_pat[str.charAt(start) - 'a'] || hash_pat[str.charAt(start) - 'a'] == 0) { if (hash_str[str.charAt(start) - 'a'] > hash_pat[str.charAt(start) - 'a']) hash_str[str.charAt(start) - 'a']--; start++; } int len_window = j - start + 1; if (min_len > len_window) { min_len = len_window; start_index = start; } } } if (start_index == -1) System.out.println("Not Found"); else{ for(int i=start_index; i<min_len+start_index; i++) System.out.print(str.charAt(i)); System.out.println(); } } test--; } } }
no_of_chars = 256 for _ in range(int(input())): s, pat = input().split() len1, len2 = len(s), len(pat) if len1 < len2: print("Not Found") else: hash_pat = [0] * no_of_chars hash_s = [0] * no_of_chars for i in range(len2): hash_pat[ord(pat[i])] += 1 start = 0 start_index = -1 min_len = float("inf") count = 0 for j in range(len1): hash_s[ord(s[j])] += 1 if (hash_pat[ord(s[j])] != 0 and hash_s[ord(s[j])] <= hash_pat[ord(s[j])] ): count += 1 if (count == len2): while ( hash_s[ord(s[start])] > hash_pat[ord(s[start])] or hash_pat[ord(s[start])] == 0): if (hash_s[ord(s[start])] > hash_pat[ord(s[start])]): hash_s[ord(s[start])] -= 1 start += 1 len_window = j - start + 1; if (min_len > len_window): min_len = len_window start_index = start if (start_index == -1): print("Not Found") else: for i in range(start_index, min_len+start_index): print(s[i], end="") print()
Space Complexity of this approach would be
O(1).
[forminator_quiz id="1349"]
This article tried to discuss the concept of Strings, Hashing. Hope this blog helps you understand and solve the problem. To practice more problems on Strings, Hashing you can check out MYCODE | Competitive Programming.