Suffix Arrays for Competitive Programming
Last Updated : 12 Mar, 2024
A suffix array is a sorted array of all suffixes of a given string. More formally, given a string 'S' of length n, the suffix array contains the indices 0 to n-1, ordered so that the suffixes starting at these indices are sorted lexicographically.

Example:
Input: banana
0 banana                          5 a
1 anana     Sort the Suffixes     3 ana
2 nana      ---------------->     1 anana
3 ana        alphabetically       0 banana
4 na                              4 na
5 a                               2 nana
So the suffix array for "banana" is {5, 3, 1, 0, 4, 2}
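The definition above translates almost directly into code. The following is a minimal sketch of the naive construction, assuming Python and a hypothetical helper name `build_suffix_array`. It sorts the starting indices by the suffix that begins at each one, which costs O(n^2 * log(n)) in the worst case because every comparison may scan a whole suffix.

```python
def build_suffix_array(s):
    """Return the starting indices of all suffixes of s in sorted order.

    Naive approach: sort the indices 0..n-1, comparing the suffixes
    themselves. Fine for small inputs, too slow for large ones.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


print(build_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```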
Construction of Suffix Arrays:
- Naive way to construct suffix array
- Using Radix Sort to construct suffix array in O(n * Log(n))
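As a rough illustration of the faster construction, here is a sketch of the prefix doubling idea. Note the assumption: it uses Python's built-in comparison sort in each round instead of radix sort, so it runs in O(n * log^2(n)) rather than O(n * log(n)). Replacing the sort with a radix sort on the rank pairs gives the O(n * log(n)) bound. The function name `suffix_array_doubling` is hypothetical.

```python
def suffix_array_doubling(s):
    """Build the suffix array by prefix doubling.

    After the round with step k, rank[i] orders suffixes by their
    first 2*k characters. Each round sorts by the pair
    (rank of first k chars, rank of next k chars).
    """
    n = len(s)
    sa = list(range(n))
    rank = [ord(c) for c in s]  # initial ranks: order by first character
    k = 1
    while True:
        def key(i):
            # -1 marks "past the end", which sorts before any real rank
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for idx in range(1, n):
            # equal keys share a rank, a larger key gets the next rank
            new_rank[sa[idx]] = new_rank[sa[idx - 1]] + (key(sa[idx]) != key(sa[idx - 1]))
        rank = new_rank
        if n == 0 or rank[sa[-1]] == n - 1:
            break  # all ranks are distinct, so the order is final
        k *= 2
    return sa


print(suffix_array_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```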
Use Cases of Suffix Array:
1. Searching a Substring in a string:
Problem: Given a string 'S' and a string 'T', determine whether T is a substring of S. If so, return an index at which T occurs in S.
Example:
Input: S = "bannana" , T = "nan"
Output: 3
Naive Solution: In O(|S| * |T|) we can iterate over each index of 'S' and compare whether the substring of length |T| starting at that index matches 'T' or not.
Solution using Suffix Array: We can notice that any substring is a prefix of some suffix. If we cut each suffix in the suffix array of 'S' down to its first |T| characters, we get all the substrings of length at most |T| in sorted order. In order to find T we can simply apply binary search and compare the truncated suffix at mid to string T.
- If the mid string of the suffix array is lexicographically smaller than 'T' then binary search on the right half.
- If the mid string of the suffix array is lexicographically greater than 'T' then binary search on the left half.
- If both strings match, return the starting index stored at that position of the suffix array as our result.
Time Complexity: O(|S| * log(|S|) + |T| * log(|S|) ), where O(|S| * log(|S|)) is to construct suffix array for string S and O(|T| * log(|S|)) is to search and compare string T.
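The binary search described above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the suffix array is built naively to keep the block short, the names `build_suffix_array` and `find_substring` are hypothetical, and the search is written in lower-bound form, so it stops at the first suffix in sorted order whose truncated prefix is not smaller than T.

```python
def build_suffix_array(s):
    # Naive construction, used here only to keep the example self-contained.
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_substring(s, t, sa):
    """Return an index where t occurs in s, or -1 if it does not occur."""
    m = len(t)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        # Compare only the first |T| characters of the suffix at mid.
        if s[sa[mid]:sa[mid] + m] < t:
            lo = mid + 1  # mid string is smaller: search the right half
        else:
            hi = mid      # mid string is greater or equal: search the left half
    if lo < len(sa) and s[sa[lo]:sa[lo] + m] == t:
        return sa[lo]
    return -1


s = "bannana"
sa = build_suffix_array(s)
print(find_substring(s, "nan", sa))  # 3
print(find_substring(s, "xyz", sa))  # -1
```

When T occurs several times, this returns the occurrence whose suffix comes first in sorted order, which is not necessarily the leftmost position in S.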
2. Finding Longest Common Prefix (LCP):
Problem: Given a string 'S' and Q queries of the form {i, j}. Find the LCP(i, j) i.e. length of the Longest Common Prefix(LCP) for the suffixes starting at index i and j.
Example:
Input: S = "banana" , Query = {{0, 5}, {4, 2}, {1, 3}}
Output: 0 2 3
Explanation: Query[0] = {0, 5} = LCP (banana, a) = '' (empty) = 0
Query[1] = {4, 2} = LCP (na, nana) = 'na' = 2
Query[2] = {1, 3} = LCP (anana, ana) = 'ana' = 3
Naive Solution: For each query we can compare both the suffixes starting from i and j character by character in O(|S|), giving a total time complexity of O(Q * |S|).
Solution using Suffix Array: Let our suffix array be Suffix[]. In order to solve the problem, let us construct an array lcp[] such that lcp[i] = LCP(Suffix[i], Suffix[i+1]). In simple language, the lcp[] array stores the longest common prefix of adjacent suffixes in the suffix array. For S = "banana", Suffix[] = {5, 3, 1, 0, 4, 2} and lcp[] = {1, 3, 0, 0, 2}.

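Now in order to calculate LCP(i, j), find the positions of i and j in the suffix array, say pos[i] and pos[j] with pos[i] < pos[j] (swap them otherwise), and calculate the minimum value in the range lcp[pos[i]] to lcp[pos[j] - 1]. If i = j, the answer is simply the length of the suffix, |S| - i.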

Proof: Let LCP(i, j) = k. Since the suffixes are sorted in lexicographical order, every suffix lying between the positions of i and j in the suffix array starts with the same k characters. So every lcp value on this segment is not less than k, and therefore the minimum on this segment is not less than k. On the other hand, the minimum cannot be greater than k: that would mean each adjacent pair of suffixes on the segment has more than k common characters, and chaining these pairs together, the suffixes i and j would also have more than k common characters.
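The range-minimum idea can be sketched directly. In this minimal illustration the suffix array and lcp[] are built naively, and the names `build_suffix_array`, `build_lcp`, `lcp_query` and `pos` are hypothetical. Each query scans the range with `min`, so it costs O(|S|) in the worst case; the point is only to show which range is inspected.

```python
def build_suffix_array(s):
    # Naive construction, enough for a small demonstration.
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_len(a, b):
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def build_lcp(s, sa):
    # lcp[k] = LCP of the suffixes at positions k and k+1 of the suffix array
    return [common_prefix_len(s[sa[k]:], s[sa[k + 1]:]) for k in range(len(sa) - 1)]


def lcp_query(i, j, n, pos, lcp):
    """LCP of the suffixes starting at i and j, via a minimum over lcp[]."""
    if i == j:
        return n - i  # a suffix shares its whole length with itself
    lo, hi = sorted((pos[i], pos[j]))
    return min(lcp[lo:hi])  # covers lcp[lo] .. lcp[hi - 1]


s = "banana"
sa = build_suffix_array(s)
pos = [0] * len(s)
for k, start in enumerate(sa):
    pos[start] = k  # pos[i] = position of suffix i in the suffix array
lcp = build_lcp(s, sa)
print([lcp_query(i, j, len(s), pos, lcp) for i, j in [(0, 5), (4, 2), (1, 3)]])
# [0, 2, 3]
```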
Note: We can build a sparse table over lcp[] to answer each range-minimum query, and hence each LCP query, in O(1).
How to construct the lcp[] array in O(N)
Time Complexity: O((|S| * log|S|) + Q)
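A sketch of the O(1)-per-query variant is shown below. Two assumptions are made here: the linear-time lcp[] construction shown is Kasai's algorithm, which is one well-known way to do it (the article itself does not name a method), and the suffix array is still built naively for brevity. All function names are hypothetical.

```python
def build_suffix_array(s):
    # Naive construction; swap in a faster one for large inputs.
    return sorted(range(len(s)), key=lambda i: s[i:])


def kasai_lcp(s, sa):
    """lcp[k] = LCP(suffix sa[k], suffix sa[k+1]), built in O(n) (Kasai)."""
    n = len(s)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * max(n - 1, 0)
    h = 0
    for i in range(n):  # visit suffixes in text order
        if rank[i] == n - 1:
            h = 0       # last suffix in sorted order has no successor
            continue
        j = sa[rank[i] + 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1      # the next suffix keeps at least h - 1 matches
    return lcp


def build_sparse_table(values):
    # table[level][k] = min of values[k : k + 2**level]
    table = [list(values)]
    width = 1
    while 2 * width <= len(values):
        prev = table[-1]
        table.append([min(prev[k], prev[k + width])
                      for k in range(len(values) - 2 * width + 1)])
        width *= 2
    return table


def range_min(table, lo, hi):
    # Minimum of values[lo:hi], assuming lo < hi; two overlapping windows.
    level = (hi - lo).bit_length() - 1
    return min(table[level][lo], table[level][hi - (1 << level)])


def make_lcp_query(s):
    n = len(s)
    sa = build_suffix_array(s)
    pos = [0] * n
    for k, start in enumerate(sa):
        pos[start] = k
    table = build_sparse_table(kasai_lcp(s, sa))

    def query(i, j):
        if i == j:
            return n - i
        lo, hi = sorted((pos[i], pos[j]))
        return range_min(table, lo, hi)

    return query


query = make_lcp_query("banana")
print([query(0, 5), query(4, 2), query(1, 3)])  # [0, 2, 3]
```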
3. Number of Different Substrings:
Problem: Given a string 'S', the task is to find the total number of unique substrings of S.
Example:
Input: S='abab'
Output: 7
Explanation: Unique substrings of "abab" = {"abab","aba","ab","a","bab","ba","b"}
Solution using Suffix array: As we know, any substring is a prefix of some suffix. In order to calculate the total number of distinct substrings we can iterate over the suffix array (where suffixes are sorted). Each suffix contributes as many prefixes as its length. In order to exclude the prefixes that have already occurred in previous suffixes, we just need to subtract the LCP of this suffix with the previous one, because any prefix shared with an earlier suffix is also shared with the immediately preceding one. The answer is therefore the sum of (|S| - Suffix[i]) over all i, minus the sum of all values in lcp[].
For the string "banana", the sorted suffixes have lengths 1, 3, 5, 6, 2, 4 (21 prefixes in total) and lcp[] = {1, 3, 0, 0, 2} (6 repeated prefixes), so the number of distinct substrings is 21 - 6 = 15.
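The counting rule can be sketched in a few lines. This is a minimal illustration with naive construction of both arrays, and the function names are hypothetical. It uses the fact that the suffix lengths always add up to n * (n + 1) / 2.

```python
def build_suffix_array(s):
    # Naive construction, enough for a small demonstration.
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_len(a, b):
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def count_distinct_substrings(s):
    """Total prefixes of all suffixes minus those repeated by a neighbour."""
    n = len(s)
    sa = build_suffix_array(s)
    repeated = sum(common_prefix_len(s[sa[k]:], s[sa[k + 1]:])
                   for k in range(n - 1))
    return n * (n + 1) // 2 - repeated


print(count_distinct_substrings("abab"))    # 7
print(count_distinct_substrings("banana"))  # 15
```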
