XmlDiff API
1. Contents
  • mydifflib.py
  • input.py
  • fmes.py
  • ezs.py ** DEPRECATED **
  • format.py
To use this package as a library, you need the Python modules described below.
2. mydifflib.py
Provides functions for Longest Common Subsequence (LCS) calculation.
lcs2(X, Y, equal):
Applies the greedy LCS/SES algorithm to the sequences X and Y (any Python sequences). equal is a comparison function that takes an element of X and an element of Y; it must return 0 (or any Python false value) if they are different, and 1 (or any Python true value) if they are identical. Returns a list of matched pairs as tuples.
lcsl(X, Y, equal):
Same as lcs2, but returns only the length of the LCS.
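As an illustration of the calling convention of lcs2 and lcsl, here is a minimal sketch. It is not the code in mydifflib.py: it uses the textbook dynamic-programming LCS instead of the greedy LCS/SES algorithm, and the names lcs_pairs and lcs_length are hypothetical. It also assumes that each matched pair holds the two matched elements; the real functions may return something else, such as positions.

```python
def lcs_pairs(X, Y, equal):
    """Return a longest common subsequence of X and Y as (x_item, y_item) tuples.

    Hypothetical helper: textbook dynamic programming, not the greedy
    LCS/SES algorithm used by mydifflib.py.
    """
    n, m = len(X), len(Y)
    # table[i][j] holds the LCS length of X[i:] and Y[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if equal(X[i], Y[j]):
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to collect the matched pairs
    pairs = []
    i = j = 0
    while i < n and j < m:
        if equal(X[i], Y[j]):
            pairs.append((X[i], Y[j]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def lcs_length(X, Y, equal):
    """Return only the length of the LCS (hypothetical counterpart of lcsl)."""
    return len(lcs_pairs(X, Y, equal))


# equal may be any callable returning a true or false value;
# here the comparison ignores case.
def same_letter(a, b):
    return a.lower() == b.lower()

print(lcs_pairs("abc", "ABC", same_letter))
print(lcs_length("ABCBDAB", "BDCABA", lambda a, b: a == b))
```

Passing the comparison as a function lets the caller decide what "identical" means, for example ignoring case as above.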
quick_ratio(a,b):
Optimized version of quick_ratio from the standard difflib.py (without junk handling and without the class). Returns an upper bound on ratio() relatively quickly.
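For reference, the upper bound computed by the standard difflib quick_ratio can be sketched as follows. This is not the optimized code in mydifflib.py; the name quick_ratio_sketch is hypothetical, and the sketch assumes that the elements of a and b are hashable. It counts the elements the two sequences have in common regardless of order, which can only overestimate the number of ordered matches used by ratio().

```python
from collections import Counter
from difflib import SequenceMatcher


def quick_ratio_sketch(a, b):
    """Upper bound on the similarity ratio of a and b (hypothetical helper)."""
    total = len(a) + len(b)
    if not total:
        # Two empty sequences are treated as identical, as difflib does
        return 1.0
    # Size of the multiset intersection: order of elements is ignored
    matches = sum((Counter(a) & Counter(b)).values())
    return 2.0 * matches / total


a, b = "abcd", "dcba"
matcher = SequenceMatcher(None, a, b)
print(quick_ratio_sketch(a, b))  # same value as matcher.quick_ratio()
print(matcher.ratio())           # never larger than the bound above
```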
3. input.py
Provides functions for converting a DOM tree or an XML file into the internal tree that the xmldiff functions process.
tree_from_stream(stream, norm_sp=1, ext_ges=0, ext_pes=0, include_comment=1, encoding='UTF-8'):
Creates and returns an internal tree from an XML stream (an open file or a StringIO object).
  • norm_sp: if 1, normalize spaces and newlines.
  • ext_ges: if 1, include all external general (text) entities.
  • ext_pes: if 1, include all external parameter entities, including the external DTD subset.
  • include_comment: if 1, include comment nodes.
  • encoding: the encoding to use.
tree_from_dom(root):
Creates and returns an internal tree from a DOM subtree.
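The layout of the internal tree is not described here, so the following is only a rough sketch of what such a conversion involves, written with the standard xml.dom.minidom module. The function toy_tree_from_dom and its (kind, value, children) tuples are hypothetical, and the handling of norm_sp (collapse runs of whitespace, drop text that becomes empty) is an assumption about what space normalization means.

```python
from xml.dom import Node, minidom


def toy_tree_from_dom(node, norm_sp=1, include_comment=1):
    """Convert a DOM subtree into nested (kind, value, children) tuples.

    Hypothetical layout for illustration; not the xmldiff internal tree.
    Returns None for nodes that are skipped.
    """
    if node.nodeType == Node.TEXT_NODE:
        text = node.data
        if norm_sp:
            # Assumed normalization: collapse whitespace runs, strip the ends
            text = " ".join(text.split())
        return ("text", text, []) if text else None
    if node.nodeType == Node.COMMENT_NODE:
        return ("comment", node.data, []) if include_comment else None
    if node.nodeType == Node.ELEMENT_NODE:
        children = []
        for child in node.childNodes:
            converted = toy_tree_from_dom(child, norm_sp, include_comment)
            if converted is not None:
                children.append(converted)
        return ("element", node.tagName, children)
    # Other node types are ignored in this sketch
    return None


doc = minidom.parseString("<a> <b>hello   world</b><!-- note --></a>")
with_comments = toy_tree_from_dom(doc.documentElement)
without_comments = toy_tree_from_dom(doc.documentElement, include_comment=0)
print(with_comments)
print(without_comments)
```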
4. fmes.py
Fast Match / Edit Script algorithm (not guaranteed to find the minimum edit cost, but able to handle large documents).
Warning: the process(oldtree, newtree) function has a side effect: after calling it, oldtree == newtree.
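If the old tree is still needed after the diff, one option is to copy it beforehand. The sketch below shows the idea with a stand-in: stand_in_process is hypothetical and only mimics the side effect described in the warning, and plain nested lists stand in for internal trees. Whether real internal trees can be copied with copy.deepcopy is an assumption.

```python
import copy


def stand_in_process(oldtree, newtree):
    """Hypothetical stand-in that only mimics the documented side effect:
    after the call, oldtree compares equal to newtree."""
    oldtree[:] = copy.deepcopy(newtree)
    return []  # the real function would produce edit actions


old = ["a", ["b"]]
new = ["a", ["b", "c"]]

# Keep an independent copy before the tree is modified in place
old_backup = copy.deepcopy(old)
stand_in_process(old, new)

print(old == new)
print(old_backup)
```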
class FmesCorrector(self, formatter, f=0.6, t=0.5):
Class that contains the fmes algorithm. formatter is a class instance that handles the edit script formatting (see format.py). f and t are algorithm parameters, with 0 < f < 1 and 0.5 < t < 1. In xmldiff, f = 0.59 and t = 0.5.
FmesCorrector.process_trees(self, tree1, tree2):
Launches the diff between the internal trees tree1 (old XML tree) and tree2 (new XML tree). Returns a list of actions.
5. ezs.py ** DEPRECATED **
Extended Zhang and Shasha algorithm (provides the minimum edit cost, but is too complex to be used with large documents).
class EzsCorrector(self):
Class that contains the ezs algorithm.
EzsCorrector.process_trees(self, tree1, tree2):
Launches the diff between the internal trees tree1 (old XML tree) and tree2 (new XML tree). Returns a list of actions.
6. format.py
Provides classes for converting the output of the xmldiff algorithms to a DOM tree, or for printing it in native format or in XML XUpdate format. The formatter interface is the following:
class AbstractFormatter:
Abstract class designed to be overridden by concrete formatters.
AbstractFormatter.init(self):
Method called before the beginning of the tree-to-tree correction.
AbstractFormatter.add_action(self, action):
Method called when an action is added to the edit script.
AbstractFormatter.format_action(self, action):
Method called by end() to format each action in the edit script. At least this method should be overridden.
AbstractFormatter.end(self):
Method called at the end of the tree-to-tree correction.
The concrete classes are InternalPrinter, XUpdatePrinter and DOMXUpdateFormatter.
See xmldiff.py for a usage example.
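To make the formatter call sequence concrete, here is a small self-contained sketch. The AbstractFormatter below is a stand-in written from the interface description above, not the class from format.py; ListFormatter, the lines attribute and the tuple-shaped actions are all hypothetical, since the real structure of an action is not described here.

```python
class AbstractFormatter:
    """Stand-in mirroring the documented interface (not the format.py class)."""

    def init(self):
        # Called before the beginning of the tree-to-tree correction
        self.edit_script = []

    def add_action(self, action):
        # Called when an action is added to the edit script
        self.edit_script.append(action)

    def format_action(self, action):
        # Concrete formatters should override at least this method
        raise NotImplementedError

    def end(self):
        # Called at the end of the correction: format every stored action
        for action in self.edit_script:
            self.format_action(action)


class ListFormatter(AbstractFormatter):
    """Hypothetical concrete formatter that collects one text line per action."""

    def init(self):
        super().init()
        self.lines = []

    def format_action(self, action):
        self.lines.append(" ".join(str(part) for part in action))


formatter = ListFormatter()
formatter.init()
formatter.add_action(("rename", "/a/b", "c"))  # made-up action tuples
formatter.add_action(("remove", "/a/d"))
formatter.end()
print(formatter.lines)
```

Overriding only format_action keeps the bookkeeping (collecting actions, walking the edit script) in the base class.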