I’ve learned about the Levenshtein distance in two different courses on computer algorithms. I spent a lot of time writing up a detailed explanation of a common method used to solve the problem with dynamic programming.

So, when I saw a practice problem on TopCoder called ShortPalindromes, I got excited about the similarity to the “minimum edit distance.”

Because the problem statement is owned by TopCoder, I can’t post the whole thing here. However, I do own my solution (available on GitHub), which is a nice variant of the Levenshtein distance solution I posted on my projects page.

I think I’m OK legally to summarize the problem in my own words, in order to frame my solution.

ShortPalindromes problem summary

The idea is to take a string as input, and turn the string into a palindrome if it is not already a palindrome. The output string doesn’t have to be an actual word, but it must be a palindrome.

For example, if I pass in ab then I should get aba. The tricky part of the problem spec, however, is that you can insert characters in front of the string, at the end, or at any point in between.

That means that ab could also be bab, or for a more complex example, raecar should become racecar (the second c gets inserted not at the front nor the back).

the min-ed-dis similarity

I initially thought I could use a tree to represent the possibilities, and use an A* search to get the best result for the lowest number of insertions. However, this approach would involve a lot of branches at each node, and could get huge for larger strings.

Then I realized that this is basically a “minimum edit distance” problem, where we are trying to make a forward version of the string just like a reversed version of the string.

Take massaman (my favorite curry dish) for example. Below is the chart we’d generate to turn massaman (forward input) into namassam (backward input):

namassam
012345678
m112234567
a221223456
s332232345
s443333234
a554434323
m665444432
a776545543
n877655654

I built this with the calculator at the bottom of my Levenshtein distance project page.

the difference

There are a few problems with this chart.

For one, we can’t “delete” or “substitute” any characters from the input string. This means we’ll have to redefine what it means to move through the chart.

Second, it doesn’t make sense to move all the way down to the bottom right cell of the chart. We don’t want to reverse our input, we just want the make the front half match the second half. This means that our solutions actually lie on the minor diagonal of the chart.

min-ed-dis modification

First, we need to address the differences in what it means to move from one cell to another.

Here’s what we’ll do… In my previous post I used a demo chart that was changing cast to bat – here let’s say my input is tab and I’m trying to get a ShortPalindrome.

our new chart labels

Let’s start by redefining what the column and row labels mean.

Imagine that we have 2 pointers. One points to the front of the input, and the other points to the end.

t a b
 

Initial string solution: tab (same as input)

As we move from the first row in our chart to the last, think of the front pointer advancing down the string. As we move from the first column in our chart to the last, imagine the back pointer moving toward the front of the string.

our new chart actions

  1. Insert the character that the back pointer points to in front of the front pointer, and advance the back pointer toward the front.
    t a b
     
    Modified string: btab

  2. Insert the character that the front pointer points to after the back pointer, and advance the forward pointer toward the back.
    t a b
     
    Modified string: tabt

It’s a little tricky to see in this example, since we’re only using 3 characters, but this means we allow ourselves to insert characters in between the start and end of the string, because we are moving the pointers inward from the ends.

our new chart motions

Now let’s put these actions into the context of movement through the chart.

  1. The characters differ, so we insert the character the front pointer points to to the back of the back pointer, and advance the front pointer forward. Since we've advanced the front pointer, we move down in the chart. But the back pointer remains pointing to the same character, even though the solution string is now longer. This means we stay in the same column until we address the back pointer.
    bat
    +1
    t
    a
    b

    Yellow cell (b,t) = dark blue score + 1
    Examine (b,a) next

  2. The characters differ, so we insert the character the back pointer points to in front of the front pointer, and advance the back pointer toward the front. Similar to our other option, we stay in the same row since the front pointer is not yet addressed.
    bat
    t+1
    a
    b

    Yellow cell (b,t) = dark blue score + 1
    Examine (a,t) next

  3. The characters are the same, we do nothing to the string we are modifying, march the front pointer toward the back, and the back pointer toward the front.
    bat
    +0
    t
    a
    b

    Yellow cell (b,r) = dark blue score + 0
    Examine (a,a) next

Note that there are only 3 options in this chart. In min-ed-dis we have 2 options on the diagonal move. The second option not here is the “substitute” operation. We cannot substitute in this chart, because that wouldn’t help us get any closer to a palindrome.

This also means that the diagonal move is only allowed if we have matching characters.

our new solution

The last difference is in how we read the chart’s solution. In the “minimum edit distance” problem, the solution lies in the bottom right of the chart.

In the case of ShortPaindrome, our solution is found when our front pointer and our back pointer are pointing to the same location. This means that we’ve matched the front half of the string with the second half.

This happens to occur along the minor diagonal of our matrix.

So, once we have filled up a triangular matrix with the scores associated with inserting characters near our pointers, we have a range of solutions along the minor diagonal.

All we have to do is grab the minimum score on the diagonal.

working demo

I wrote this in C++, so I don’t have a nice JavaScript animation of the output like I do for the “minimum edit distance” problem.

Check out my GitHub repo for an example of the output this program generates, and a working C++ implementation.