XmlDiff API
To use this package as a library, you need the Python modules
described below.
This module provides functions for Longest Common Subsequence (LCS) calculation.
lcs2(X, Y, equal):
apply the greedy LCS/SES algorithm to the sequences X and Y
(each may be any Python sequence)
equal is a function that compares an element of X with an element
of Y; it must return 0 (or any Python false value) if they differ
and 1 (or any Python true value) if they are identical
return a list of matched pairs as tuples
lcsl(X, Y, equal):
same as lcs2, but return only the length of the LCS
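The contract of the equal callback can be illustrated with a small standalone sketch. It is not the package's greedy LCS/SES implementation: it uses the textbook dynamic-programming LCS, and the names lcs_pairs, lcs_length and same_ignoring_case are hypothetical. It also assumes that the matched pairs are pairs of elements, which the description above does not state explicitly.

```python
def lcs_pairs(X, Y, equal):
    """Return the matched (x, y) pairs of one longest common subsequence.

    Textbook dynamic programming, used here only to illustrate the
    `equal` callback; the package itself uses a greedy LCS/SES algorithm.
    """
    n, m = len(X), len(Y)
    # table[i][j] holds the LCS length of X[i:] and Y[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if equal(X[i], Y[j]):
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the matched pairs.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if equal(X[i], Y[j]):
            pairs.append((X[i], Y[j]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def lcs_length(X, Y, equal):
    """Length-only variant, in the spirit of lcsl."""
    return len(lcs_pairs(X, Y, equal))


def same_ignoring_case(a, b):
    # Any true value means "identical", any false value means "different".
    return a.lower() == b.lower()


print(lcs_length("ABCBDAB", "bdcaba", same_ignoring_case))  # 4
```

Because the comparison is delegated to equal, the same routine works on strings, lists of tokens or lists of tree nodes, as long as the callback knows how to compare two elements.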
quick_ratio(a, b):
optimized version of quick_ratio from the standard difflib module
(a plain function, without junk handling and without a class)
return an upper bound on ratio() relatively quickly
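For reference, the standard-library behaviour that quick_ratio is modelled on can be reproduced as a plain function. The sketch below is an assumption about the general idea, not the package's code: quick_ratio_sketch is a hypothetical name, and it counts the elements the two sequences share regardless of order, as difflib.SequenceMatcher.quick_ratio does.

```python
from collections import Counter
from difflib import SequenceMatcher


def quick_ratio_sketch(a, b):
    """Upper bound on the similarity ratio of a and b, ignoring element order."""
    total = len(a) + len(b)
    if not total:
        return 1.0
    # Multiset intersection: how many elements the sequences have in common.
    matches = sum((Counter(a) & Counter(b)).values())
    return 2.0 * matches / total


a = "xmldiff computes tree differences"
b = "xmldiff computes tree distances"

matcher = SequenceMatcher(None, a, b)
# The quick estimate is never lower than the exact ratio.
print(quick_ratio_sketch(a, b) >= matcher.ratio())  # True
```

An upper bound is useful as a cheap filter: if even the bound falls below a similarity threshold, the more expensive exact comparison can be skipped.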
This module provides functions for converting a DOM tree or an XML file
into the internal tree processed by the xmldiff functions.
tree_from_stream(stream, norm_sp=1, ext_ges=0, ext_pes=0, include_comment=1, encoding='UTF-8'):
create and return an internal tree from an XML stream (an open file
or a StringIO object)
if norm_sp = 1, normalize spaces and newlines
if ext_ges = 1, include all external general (text) entities
if ext_pes = 1, include all external parameter entities, including the external DTD subset
if include_comment = 1, include comment nodes
encoding specifies the encoding to use
tree_from_dom(root):
create and return an internal tree from a DOM subtree
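The two kinds of input accepted above, a stream and a DOM subtree, can both be prepared with the standard library. The sketch below only builds those inputs and does not call xmldiff. The helper normalize_space is hypothetical and shows the kind of whitespace collapsing that norm_sp = 1 presumably performs; the exact rules used by the package are an assumption.

```python
import io
from xml.dom import minidom

XML = b"<doc>\n  <title>  Hello   world </title>\n</doc>"

# A file-like object: the kind of value a `stream` argument expects.
stream = io.BytesIO(XML)

# A DOM subtree: the kind of value a `root` argument expects.
dom_root = minidom.parse(stream).documentElement


def normalize_space(text):
    """Collapse runs of spaces and newlines and strip both ends (assumed behaviour)."""
    return " ".join(text.split())


title = dom_root.getElementsByTagName("title")[0]
print(repr(normalize_space(title.firstChild.data)))  # 'Hello world'
```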
Fast Match / Edit Script (fmes) algorithm (not guaranteed to find the
minimum edit cost, but able to handle big documents).
Warning: the process(oldtree, newtree) function has a side effect:
after calling it, oldtree == newtree.
class FmesCorrector(self, formatter, f=0.6, t=0.5):
class implementing the fmes algorithm
formatter is a class instance that handles the edit script
formatting (see format.py)
f and t are algorithm parameters, with 0 < f < 1 and 0.5 < t < 1
in xmldiff, f = 0.59 and t = 0.5
FmesCorrector.process_trees(self, tree1, tree2):
run the diff between the internal trees tree1 (old XML tree) and
tree2 (new XML tree)
return a list of actions
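Because of the side effect mentioned in the warning for this algorithm, a caller that still needs the old tree afterwards may want to copy it first. The sketch below shows that pattern only. The function mutate_into is a hypothetical stand-in for an in-place diff, and the nested-list trees are made up for illustration; the real internal tree layout is not described here.

```python
import copy


def mutate_into(old_tree, new_tree):
    """Hypothetical stand-in for an in-place diff.

    Like the side effect described in the warning, it leaves old_tree
    equal to new_tree. A real corrector would return the actions list.
    """
    old_tree[:] = copy.deepcopy(new_tree)
    return []


old_tree = ["doc", ["title", "Hello"]]
new_tree = ["doc", ["title", "Goodbye"]]

# Keep an independent copy before running the in-place operation.
snapshot = copy.deepcopy(old_tree)
mutate_into(old_tree, new_tree)

print(old_tree == new_tree)  # True
print(snapshot)              # ['doc', ['title', 'Hello']]
```

A deep copy is used rather than a shallow one so that nested nodes of the snapshot are not shared with the tree that gets modified.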
Extended Zhang and Shasha (ezs) algorithm (provides the minimum edit
cost, but is too expensive to use on big documents).
class EzsCorrector(self):
class implementing the ezs algorithm
EzsCorrector.process_trees(self, tree1, tree2):
run the diff between the internal trees tree1 (old XML tree) and
tree2 (new XML tree)
return a list of actions
This module provides classes for converting the output of the xmldiff
algorithms to a DOM tree, or for printing it in the native format or
in the XML XUpdate format. The formatter interface is the following:
class AbstractFormatter:
abstract class designed to be overridden by concrete
formatters
AbstractFormatter.init(self):
method called before the beginning of the tree-to-tree
correction
AbstractFormatter.add_action(self, action):
method called when an action is added to the edit script
AbstractFormatter.format_action(self, action):
method called by end() to format each action in the edit
script
at least this method should be overridden
AbstractFormatter.end(self):
method called at the end of the tree-to-tree correction
The concrete classes are InternalPrinter, XUpdatePrinter and
DOMXUpdateFormatter.
See xmldiff.py for a usage example.
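The call sequence of the formatter interface (init, then add_action for each action, then end, which calls format_action) can be sketched without the package. The AbstractFormatter below is a stand-in that mirrors only the four documented methods, not the real class. ListFormatter, the edit_s attribute and the tuple layout of the actions are hypothetical.

```python
class AbstractFormatter:
    """Stand-in mirroring the documented interface, not the real class."""

    def init(self):
        # Called before the tree-to-tree correction starts.
        self.edit_s = []

    def add_action(self, action):
        # Called each time an action is added to the edit script.
        self.edit_s.append(action)

    def format_action(self, action):
        # The one method a concrete formatter is expected to override.
        raise NotImplementedError

    def end(self):
        # Called once the correction is finished: format every action.
        for action in self.edit_s:
            self.format_action(action)


class ListFormatter(AbstractFormatter):
    """Hypothetical concrete formatter that collects one text line per action."""

    def init(self):
        super().init()
        self.lines = []

    def format_action(self, action):
        self.lines.append(" ".join(str(part) for part in action))


formatter = ListFormatter()
formatter.init()
formatter.add_action(("remove", "/doc/title[1]"))
formatter.add_action(("rename", "/doc/body[1]", "section"))
formatter.end()
print(formatter.lines)
# ['remove /doc/title[1]', 'rename /doc/body[1] section']
```

Deferring all formatting to end() means a formatter sees the complete edit script before producing output, which is convenient for formats that need a wrapping document around the individual actions.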