Diff algorithm

Author(s): Isabel Garcia-Contreras, Jose F. Morales.

This module implements Eugene Myers' Greedy Difference Algorithm described in 'An O(ND) Difference Algorithm and Its Variations' [Mye86].

The implementation is parametric in the comparison predicate. This predicate has to succeed if two elements are considered equal.

Complexity

This 'greedy' algorithm that computes the number differences has complexity O((N+M)D) in both time and space. With M and N the lenghts of the input lists and D the number of changes.

Pseudocode

This pseudocode is copied from the original article, where:

  • a and b are the input lists of lengths N and M,
  • k = x - y and is used to label diagonals,
  • V[k] contains the row index of the endpoint of a furthest reaching path in diagonal k.

Constant MAX [0,M+N]

Var V: Array [-MAX.. MAX] of Integer

V[1] = 0

For D = 0 to MAX Do
   For k = -D to D in steps of 2 Do
      If k = -D or (k = D and V[k-1] < V[k+1] Then
	x = V[k+1]
      Else
	x = V[k-1] + 1
	y = x - k
      While (x < N, y < M and a(x+1) = b(y+1) Do
	(x,y) = (x+1,y+1)
      V[k] = x
      If (x >= N and y >= M) Then
	    Length of an SES is D
	    Stop

Length of an SES is greater than MAX

Example

?- diff([a,a,b,c], [b,c,d], '=', Diff).

Diff = [del(0,a),del(0,a),ins(2,d)] ?

yes
?-

Patch

The patch operation applies a list of changes expressed as diff_item/1 to obtain a new list. Note that given two lists L1, L2, if their Diff is applied to L1, it will be obtained L2:

?- L1 = [a,a,b,c], L2 = [b,c,d],
   diff(L1, L2, '=', Diff),
   patch(L1, Diff, L2).


Diff = [del(0,a),del(0,a),ins(2,d)],
L1 = [a,a,b,c],
L2 = [b,c,d] ?
yes
?-

Usage and interface

Documentation on exports

PREDICATEdiff/4

Usage:diff(Ls1,Ls2,Compare,Diff)

Diff are the changes needed to transform Ls1 into Ls2.

Meta-predicate with arguments: diff(?,?,pred(2),?).

PREDICATEpatch/3

Usage:patch(Ls,Diff,LNew)

Apply a list of changes (Diff) onto a Ls.

diff_item(X)

X is an insertion denoted with ins(Pos, Elem) meaning that Elem has to be inserted in position Pos, or del(Pos, Elem) meaning that Elem has to be removed from position Pos. It is defined as:

diff_item(ins(P,_1)) :-
        nnegint(P).
diff_item(del(P,_1)) :-
        nnegint(P).

Usage:diff_item(X)

X is a single edition in a sequence (insertion or deletion)

    Documentation on imports

    This module has the following direct dependencies:

    Known bugs and planned improvements

    • A match/3 predicate could be implemented to compute the longest common sequence between two lists (using a computed diff).