An Alternative Traceback Method for Nussinov's RNA Folding Algorithm

Nussinov's algorithm takes a given sequence of RNA and determines the the most stable secondary structure for that sequence based on the assumption that the more base pairs a structure has, the more stable the structure will be[1]. The algorithm can be divided into two steps:

Though there are various enhancements that can be made to bring the algorithm more in line with actual structures by modifying how the substructures are scored, the focus of this report is on how an alternate traceback algorithm can be used to obtain better results from Nussinov's algorithm that are normally discarded. Specifically, the new algorithm incorporates the following assumptions:

Generally speaking, the algorithm attempts to choose a traceback path that will extend opened loops and extend stems when possible while still acting within the constraints of the original algorithm. In pseudocode, the new traceback algorithm can be described as:

last_direction = NONE
while tb_stack is not empty:
    pop(tb_stack, i,j)
    if i >= j:
        continue
    if γ(i+1,j-1) + δi,j == γ(i,j):
        push(candidate_stack, DIAGONAL, i+1,j-1)
    if γ(i,j-1) == γ(i,j):
        push(candidate_stack, LEFT, i,j-1)
    if γ(i+1,j) == γ(i,j):
        push(candidate_stack, DOWN, i+1,j)
    if candidate_stack is empty:
        last_direction = NONE
        for k from i < k < j:
            if γ(i,k) + γ(k+1,j) == γ(i,j):
                push(tb_stack, k+1,j)
                push(tb_stack, i,k)
                break
    else:
        while candidate_stack is not empty:
            pop(candidate_stack, d, m,n)
            if d == last_direction:
                push(tb_stack, m,n)
                if d == DIAGONAL:
                    record(i,j)
                break
        else if nothing pushed onto tb_stack:
            push(tb_stack, m,n)

In the canonical implementation of the traceback step, whenever there are multiple structures that are equivalent in terms of number of base-pairs the first structure that works is chosen because the algorithm does not care about anything besides the number of base-pairs, so any structure with the same number of base pairs as the optimal one will do. However, this ignores important information that can lead it to choose an unstable structure over a more stable one.

Below in figure 2 are three possible structures as predicted by Nussinov's algorithm. All three meet the constraints of the algorithm and have the maximal three base pairs. However, these structures are not equally stable. Figure 2a is decidedly the most stable of the three. It forms a three base pair long stem and a three base hairpin loop (which is more stable than a shorter hairpin loop because the angle is less sharp). Figure 2b is rather unstable and improbable because it requires two adjacent bases to be paired, which would require bending the bases in a nonsensical way[1]. Figure 2c is not particularly stable, but is reasonable; it is nearly identical to the structure in figure 2a, except that it introduces a small bulge loop into the stem, which would make it less stable.

Figure 1: a diagram showing possible tracebacks
Figure 2a: a possible structure predicted by Nussinov's algorithm
Figure 2b: a possible structure predicted by Nussinov's algorithm
Figure 2c: a possible structure predicted by Nussinov's algorithm

As noted on this web page, the structure and traceback in figure 10.9 of Biological Sequence Analysis (reproduced here as figure 1 and figure 2a respectively) are inconsistent with the pseudocode description of the standard traceback algorithm given. The structure is arguably the most stable of the possible foldings that Nussinov's algorithm can predict, but is different from the one that the traceback algorithm described in the book would choose, which is shown above in figure 2b.[3]. This alternate version of the traceback algorithm yields the correct traceback corresponding to the optimal structure in figure 2a.

Figure 3: a diagram comparing the two tracebacks

References

  1. Durbin R, Eddy S, Krogh A, Mitchinson G. Biological Sequence Analysis. New York: Cambridge University Press. 1998.
  2. Wikipedia contributors. RNA structure. Wikipedia, The Free Encyclopedia. May 9, 2008, 06:22 UTC. Available at: http://en.wikipedia.org/w/index.php?title=RNA_structure&oldid=211205665. Accessed August 2, 2008
  3. Sonderegger B. Short Discussion of an Error on Page 271. Bernhard Sonderegger's Masters Program Page. Available at: http://ludwig-sun2.unil.ch/~bsondere/nussinov/Correction.htm. Accessed September 13th, 2006.