Find the minimum size of the substring of string
Swhich contains all the character from a given stringT. Print the substring with minimum length.
Test Case
Input:
1
prepbytes
pyte
ccbabcab
bbab
Output:
pbyte
babcab
Explanation:
Case 1: All characters in string 'pyte' are present in 'pbyte' and this is the smallest substring.
Case 2: All characters in string 'bbab' are present in strings 'babcab' and this is the smallest substring containing all characters.
Bruteforce Approach to Solve the Minimum Window Substring Problem –
1) We can generate all substrings of string
S.
2) For each substring we check if it contains all character from the stringT.
3) We print the string with minimum length.
4) You can see this approach is not efficient and takes O(T\times{N^{2}}) time as it takes O(N^{2}) time for calculating all substrings and takes O(T) time for checking if all characters fromTexists or not in the substring.
5) Do you have any other approach which is efficient than this approach? Try that out, if that one goes beyond the time limit, proceed to the following efficient approach.
In the above approach, we found all substring containing all characters from our pattern string, we can also observe that all discarded substring contain or minimum window substring, and the largest such substring starts from 0. Hence we can find a substring from the start of the string containing all characters from the pattern, and then discard characters from beginning one by one, until we reach a condition that violates our requirements. We discuss this approach our efficient approach
Efficient Approach to Solve the Minimum Window Substring Problem:
1) First check if the length of string
Sis less than the length of the given stringT, if yes printNot Found.
2) If no, then store the occurrences of characters from stringTin a hash table.
3) Start matching the characters of stringTwith the characters of string i.e. increment count if a character matches.
4) Check if the count is equal to the length of stringT, if yes we have found a window containing all characters from the stringT.
5) If such a window is found, try to minimize it by removing extra characters from the beginning of the current window.
6) Update the minimum length on each character removal.
7) Print the window with minimum length when no character can be removed.
Example
- Let us take an example and understand above approach, let string
Sbe ‘qabaaab‘ and patternPbe ‘baaab‘.- We maintain a hash table
hash_strto store characters from string. We also maintain a hash tablehash_patto store characters from pattern string.- We find all appearances of characters in
hash_pat, so finalhash_patfor pattern is[3,2]where index0corresponds for characteraand index1corresponds for characterb.- We now find the largest window containing all character from pattern string, this can be found by starting window from
0and increasing end of window one by one until whole window contains all chracters from pattern string.
Hash table for string is initialised with 0, so default hash table is[0,0,0]where index0corresponds to'a', index1corresponds to'b'and index2corresponds to'q'.
Finding Window :
We also maintain a counter
count, which when equal to length of pattern, we exit because that is last point of window.
For end = 0:
Current character is
'q', hencehash_strbecomes[0,0,1], ‘q’ does not appear in pattern, so we don’t increasecount.
For end = 1:
Current character is
'a', hencehash_strbecomes[1,0,1], hash_str value is less than hash_pat value of ‘a’, so we increase counter by 1, socount=1.
For end = 2:
Current character is
'b', hencehash_strbecomes[1,1,1], hash_str value is less than hash_pat value of ‘a’, so we increase counter by 1, socount=2.
For end = 3:
Current character is
'a', hencehash_strbecomes[2,1,1], hash_str value is less than hash_pat value of ‘a’, so we increase counter by 1, socount=3.
For end = 4:
Current character is
'a', hencehash_strbecomes[3,1,1], hash_str value is equal to hash_pat value of ‘a’, so we increase counter by 1, socount=4.
For end = 5:
Current character is
'a', hencehash_strbecomes[4,1,1], hash_str value is greater than hash_pat value of ‘a’, so we dont increase counter, socount=4.
For end = 6:
Current character is
'b', hencehash_strbecomes[4,2,1], hash_str value is less than hash_pat value of ‘a’, so we increase counter by 1, socount=5.
Hence count is now equal to length of pattern, so we have found the window of maximum length, now we need to short this window for minimum size of window.
Minimizing The Window :
Here, we move start pointer of window to minimize the size, we decrease the count of hash_str value of current character if hash_str value is greater than hash_pat value of current character. If hash_str value becomes becomes less to hash_pat value of current character, we stop iterating, current window is the minimum window.
For start = 0:
Current character is
'q', hencehash_strbecomes[4,2,0], hash_str value is not in pattern so we move further.
For start = 1:
Current character is
'a', hencehash_strbecomes[3,2,0], hash_str value of ‘a’ is greater than hash_pat value, so we move window further.
For start = 2:
Current character is
'b', hencehash_strbecomes[2,2,0], hash_str value of ‘a’ is equal to hash_pat value, so we have minimized the window.
We print the final window starting from start to end pointer. Hence we print 'baaab'.
Program of the Solution of Minimum Window Substring Problem
#include<stdio.h>
#include<string.h>
int main()
{
int test;
scanf("%d",&test);
while(test--){
char str[1000000], pat[1000000];
scanf("%s%s",str,pat);
int len1 = strlen(str);
int len2 = strlen(pat);
if (len1 < len2)
{
printf("Not Found\n");
}else{
int hash_pat[256] = {0};
int hash_str[256] = {0};
for (int i = 0; i < len2; i++)
hash_pat[pat[i]]++;
int start = 0, start_index = -1, min_len = 9999999;
int count = 0;
for (int j = 0; j < len1 ; j++)
{
hash_str[str[j]]++;
if (hash_pat[str[j]] != 0 &&
hash_str[str[j]] <= hash_pat[str[j]] )
count++;
if (count == len2)
{
while ( hash_str[str[start]] > hash_pat[str[start]]
|| hash_pat[str[start]] == 0)
{
if (hash_str[str[start]] > hash_pat[str[start]])
hash_str[str[start]]--;
start++;
}
int len_window = j - start + 1;
if (min_len > len_window)
{
min_len = len_window;
start_index = start;
}
}
}
if (start_index == -1)
printf("Not Found\n");
else{
for(int i=start_index; i<min_len+start_index; i++)
printf("%c",str[i]);
printf("\n");
}
}
}
}
#include<bits/stdc++.h>
using namespace std;
const int no_of_chars = 256;
int main()
{
int test;
cin>>test;
while(test--){
string str, pat;
cin>>str>>pat;
int len1 = str.length();
int len2 = pat.length();
if (len1 < len2)
{
cout << "Not Found\n";
}else{
int hash_pat[no_of_chars] = {0};
int hash_str[no_of_chars] = {0};
for (int i = 0; i < len2; i++)
hash_pat[pat[i]]++;
int start = 0, start_index = -1, min_len = INT_MAX;
int count = 0;
for (int j = 0; j < len1 ; j++)
{
hash_str[str[j]]++;
if (hash_pat[str[j]] != 0 &&
hash_str[str[j]] <= hash_pat[str[j]] )
count++;
if (count == len2)
{
while ( hash_str[str[start]] > hash_pat[str[start]]
|| hash_pat[str[start]] == 0)
{
if (hash_str[str[start]] > hash_pat[str[start]])
hash_str[str[start]]--;
start++;
}
int len_window = j - start + 1;
if (min_len > len_window)
{
min_len = len_window;
start_index = start;
}
}
}
if (start_index == -1)
cout<<"Not Found\n";
else{
for(int i=start_index; i<min_len+start_index; i++)
cout<<str[i];
cout<<endl;
}
}
}
}
import java.util.*;
import java.io.*;
import java.lang.Math;
public class Main {
public static void main(String args[]) throws IOException {
Scanner sc= new Scanner(System.in);
int test = sc.nextInt();
while(test != 0){
String str = sc.next();
String pat= sc.next();
int len1 = str.length();
int len2 = pat.length();
if (len1 < len2)
{
System.out.println("Not Found");
}else{
int hash_pat[] = new int[256];
int hash_str[] = new int[256];
for (int i = 0; i < len2; i++)
hash_pat[pat.charAt(i) - 'a']++;
int start = 0, start_index = -1, min_len = 9999999;
int count = 0;
for (int j = 0; j < len1 ; j++)
{
hash_str[str.charAt(j) - 'a']++;
if (hash_pat[str.charAt(j) - 'a'] != 0 &&
hash_str[str.charAt(j) - 'a'] <= hash_pat[str.charAt(j) - 'a'] )
count++;
if (count == len2)
{
while ( hash_str[str.charAt(start) - 'a'] > hash_pat[str.charAt(start) - 'a']
|| hash_pat[str.charAt(start) - 'a'] == 0)
{
if (hash_str[str.charAt(start) - 'a'] > hash_pat[str.charAt(start) - 'a'])
hash_str[str.charAt(start) - 'a']--;
start++;
}
int len_window = j - start + 1;
if (min_len > len_window)
{
min_len = len_window;
start_index = start;
}
}
}
if (start_index == -1)
System.out.println("Not Found");
else{
for(int i=start_index; i<min_len+start_index; i++)
System.out.print(str.charAt(i));
System.out.println();
}
}
test--;
}
}
}
no_of_chars = 256
for _ in range(int(input())):
s, pat = input().split()
len1, len2 = len(s), len(pat)
if len1 < len2:
print("Not Found")
else:
hash_pat = [0] * no_of_chars
hash_s = [0] * no_of_chars
for i in range(len2):
hash_pat[ord(pat[i])] += 1
start = 0
start_index = -1
min_len = float("inf")
count = 0
for j in range(len1):
hash_s[ord(s[j])] += 1
if (hash_pat[ord(s[j])] != 0 and hash_s[ord(s[j])] <= hash_pat[ord(s[j])] ):
count += 1
if (count == len2):
while ( hash_s[ord(s[start])] > hash_pat[ord(s[start])] or hash_pat[ord(s[start])] == 0):
if (hash_s[ord(s[start])] > hash_pat[ord(s[start])]):
hash_s[ord(s[start])] -= 1
start += 1
len_window = j - start + 1;
if (min_len > len_window):
min_len = len_window
start_index = start
if (start_index == -1):
print("Not Found")
else:
for i in range(start_index, min_len+start_index):
print(s[i], end="")
print()
Space Complexity of this approach would be
O(1).
[forminator_quiz id="1349"]
This article tried to discuss the concept of Strings, Hashing. Hope this blog helps you understand and solve the problem. To practice more problems on Strings, Hashing you can check out .
