Thursday, May 3, 2012

A naive algorithm for finding maximal palindromes

The previous blog post discusses why it is sufficient to calculate the length of the maximal palindrome around each center position in a string if I want to find palindromes in a string. In this blog post I will describe a naive algorithm for finding the length of all maximal palindromes in a string. This algorithm is slightly less naive than the naive algorithm for finding palindromes given in an earlier blog post. The latter algorithm requires a supercomputer for a couple of days to find palindromes in a long string, the algorithm I will develop in this blogpost requires several days of computing power of the laptop on which I am typing this post. In numbers, this algorithm is about a milliard times faster on such a long string.

Given a string as input, I want to find the list of lengths of maximal palindromes around all centers of the string. I will use the function maximalPalindromes for this purpose. It has the following type

< maximalPalindromes :: String -> [Int]

I want to find the length of the maximal palindrome around each center in a string. I will do this by trying to extend the trivial palindromes consisting of either a single letter (for odd centers) or of the empty string (for even centers) around each center. To extend a palindrome, I have to compare the characters before and after the current palindrome. It would be helpful if I had random access into the string, so that looking up the character at a particular position in a string can be done in constant time. Since an array allows for constant time lookup, I change the input type of the maximalPalindromes function to an array. I have to import the module Data.Array to use arrays.

> import Data.Array
> maximalPalindromes :: Array Int Char -> [Int]

If I change my input type from strings to arrays, I have to convert an input string into an array. The Data.Array module contains a function just for this purpose: the function listArray creates an array from a string together with a pair of indices for the first and last position in the string. Function maximalPalindromes calculates the following lengths of maximal palindromes in the string "abb":

?maximalPalindromes (listArray (0,2) "abb")

Function maximalPalindromes calculates the length of maximal palindromes by first calculating all center positions of an input array, and then the length of the maximal palindrome around each of these centers. The center positions of an array with first and last as bounds are the elements in the list [0 .. 2*(last-first+1)]. I will always use arrays that start at index 0, which implies that the length of an array is last+1.

> maximalPalindromes a  =   
>   let (first,last)  =  bounds a
>       centers       =  [0 .. 2*(last-first+1)]
>   in  map (lengthPalindromeAround a) centers

Function lengthPalindromeAround takes an array and a center position, and calculates the length of the longest palindrome around that position. Center positions do not exactly correspond to indices in the array, since they not only describe locations of characters, but also locations in between characters. To obtain an array index from a center position I divide it by two. If the center position is even, it is in between two characters, and if I divide it by two, I get an array index that points at the character after the center position. I compare the character pointed to by the index with the character before it to find out if the (empty) palindrome can be extended. If it can, I subtract one from the starting position, and add one to the end position, and try to extend the palindrome with the new indices. If the palindrome cannot be extended, because the elements around it are different, or the current palindrome starts at the start or ends at the end of the array, I return the length of the maximal palindrome found, which is the difference between the end and the start position minus one. For odd center positions I can start with the indices before and after the character pointed to by the index obtained by dividing the center position by two, rounding down.

> lengthPalindromeAround  ::  Array Int Char -> Int -> Int
> lengthPalindromeAround a center 
>   | even center = 
>       lengthPalindrome (first+c-1) (first+c) 
>   | odd  center = 
>       lengthPalindrome (first+c-1) (first+c+1) 
>   where  c             =  div center 2
>          (first,last)  =  bounds a
>          lengthPalindrome start end  = 
>             if   start < 0 
>               || end > last-first 
>               || a!start /= a!end
>             then end-start-1
>             else lengthPalindrome (start-1) (end+1) 

For each position, this function may take an amount of steps linear in the length of the array, so this is a quadratic-time algorithm.

No comments:

Post a Comment