harbour-core/contrib/hbct/doc/en/strdiff.txt
vszakats 9687850865 2013-03-16 02:10 UTC+0100 Viktor Szakats (harbour syenar.net)
* (all files)
    * stripped svn header
    * minor cleanups
    ; use following command to find out the history of files:
       git log
       git log --follow
       git blame
       git annotate
2013-03-16 02:11:42 +01:00

/* $DOC$
$NAME$
StrDiff()
$CATEGORY$
CT3 string functions
$ONELINER$
Evaluate the "Edit (Levenshtein) Distance" of two strings
$SYNTAX$
StrDiff( <cString1>, <cString2>, [<nReplacementPenalty>], [<nDeletionPenalty>],
[<nInsertionPenalty>] ) -> <nDistance>
$ARGUMENTS$
<cString1> string at the "starting point" of the transformation process, default is ""
<cString2> string at the "end point" of the transformation process, default is ""
<nReplacementPenalty> penalty points for a replacement of one character, default is 3
<nDeletionPenalty> penalty points for a deletion of one character, default is 6
<nInsertionPenalty> penalty points for an insertion of one character, default is 1
$RETURNS$
<nDistance> penalty point sum of all operations needed to transform <cString1> to <cString2>
$DESCRIPTION$
The StrDiff() function calculates the so-called "Edit" or "Levenshtein" distance of two strings.
This distance is a measure of the single-character replace/insert/delete operations (so-called
"point mutations") required to transform <cString1> into <cString2>. Its value is the smallest possible sum of
the penalty points of the required operations.
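As an illustration of the minimal penalty sum described above, the following C sketch computes a weighted
edit distance with a full matrix. It is not the hbct implementation: the names weighted_edit_distance() and
min3() and all parameter names are hypothetical, and SetAtLike() wildcard handling is left out.

```c
#include <stdlib.h>
#include <string.h>

// Smallest of three ints.
static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

// Hypothetical helper (not the hbct source): smallest penalty sum needed
// to transform s1 into s2. rep/del/ins are the penalties for replacing,
// deleting and inserting one character. Returns -1 if allocation fails.
// No overflow checks are done on the matrix size or on the penalty sum.
int weighted_edit_distance(const char *s1, const char *s2,
                           int rep, int del, int ins)
{
    size_t n = strlen(s1), m = strlen(s2);
    size_t cols = m + 1;
    size_t i, j;
    int result;
    int *d = malloc((n + 1) * cols * sizeof(int));

    if (d == NULL)
        return -1;

    // First column: delete the first i characters of s1.
    d[0] = 0;
    for (i = 1; i <= n; i++)
        d[i * cols] = d[(i - 1) * cols] + del;

    // First row: insert the first j characters of s2.
    for (j = 1; j <= m; j++)
        d[j] = d[j - 1] + ins;

    for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
            int cost = (s1[i - 1] == s2[j - 1]) ? 0 : rep;

            d[i * cols + j] = min3(d[(i - 1) * cols + (j - 1)] + cost, // keep or replace
                                   d[(i - 1) * cols + j] + del,        // delete from s1
                                   d[i * cols + (j - 1)] + ins);       // insert from s2
        }
    }

    result = d[n * cols + m];
    free(d);
    return result;
}
```

With the default penalties, weighted_edit_distance( "AXBC", "ADC", 3, 6, 1 ) yields 9: one deletion (6) plus
one replacement (3).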
Be aware that this function is costly in both time, O(Len(cString1)*Len(cString2)), and memory,
O((Len(cString1)+1)*(Len(cString2)+1)*sizeof(int)), so keep the strings as short as possible.
E.g., on common 32-bit systems (sizeof(int) == 4), calling StrDiff() with two strings of 1024 bytes
in length will consume about 4 MB of memory. To avoid imposing unneeded restrictions, the function only checks whether
(Len(cString1)+1)*(Len(cString2)+1)*sizeof(int) <= UINT_MAX, although allocating UINT_MAX bytes will not
work on most systems. If this simple check fails, -1 is returned.
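The size check described above can be sketched in C as follows. This is an assumption about how such a check
might be written without overflowing, not the actual hbct code; the function name matrix_bytes() and its
parameters are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

// Hypothetical helper: computes the matrix size in bytes for two string
// lengths. Returns 1 and stores the size in *bytes if
// (len1+1)*(len2+1)*sizeof(int) <= UINT_MAX, otherwise returns 0
// (the case in which StrDiff() is documented to return -1).
int matrix_bytes(size_t len1, size_t len2, size_t *bytes)
{
    size_t rows = len1 + 1;
    size_t cols = len2 + 1;
    size_t max_cells = (size_t) UINT_MAX / sizeof(int);

    // len + 1 wrapped around to zero: certainly too large.
    if (rows == 0 || cols == 0)
        return 0;

    // Division instead of multiplication so the test itself cannot overflow.
    if (cols > max_cells / rows)
        return 0;

    *bytes = rows * cols * sizeof(int);
    return 1;
}
```

For two strings of 1024 bytes this gives 1025*1025*sizeof(int) bytes, i.e. 4202500 bytes when sizeof(int) == 4.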
Also, be aware that an overflow can occur when the penalty points are summed up: assuming that the
number of transformation operations is in the order of Max(Len(cString1),Len(cString2)), the penalty point
sum, which is stored internally in an "int" variable, is in the order of
Max(Len(cString1),Len(cString2))*Max(nReplacementPenalty,nDeletionPenalty,nInsertionPenalty).
StrDiff() does not perform an overflow check for performance reasons. Future versions of StrDiff()
could use a type other than "int" to store the penalty point sum to save memory or to avoid overflows.
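Since StrDiff() itself does not check for overflow, a caller could apply the estimate above before calling it.
The following C sketch is a rough heuristic under that estimate only; the function name
penalty_sum_may_overflow() and its parameters are hypothetical and not part of hbct.

```c
#include <limits.h>
#include <stddef.h>

// Hypothetical pre-check: returns 1 if the estimated penalty sum
// Max(len1, len2) * Max(rep, del, ins) would exceed INT_MAX, else 0.
// Only an order-of-magnitude guard, following the estimate in the text.
int penalty_sum_may_overflow(size_t len1, size_t len2,
                             int rep, int del, int ins)
{
    size_t max_len = len1 > len2 ? len1 : len2;
    int max_pen = rep > del ? rep : del;

    if (ins > max_pen)
        max_pen = ins;

    // Nothing to sum up, or no positive penalties.
    if (max_len == 0 || max_pen <= 0)
        return 0;

    // Division instead of multiplication so the test cannot overflow.
    return max_len > (size_t) INT_MAX / (size_t) max_pen;
}
```

With the default penalties (3, 6, 1) and 1024-byte strings the estimate is 6144, far below INT_MAX.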
The function respects the setting made by SetAtLike(): when wildcard matching is enabled, the wildcard character
is considered equal to ALL characters.
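The effect of the wildcard setting on a single character comparison can be sketched in C as follows. This is an
assumption derived from the description above, not the hbct source: the name chars_match() and its parameters
are hypothetical, and the sketch assumes the wildcard may appear in either string.

```c
// Hypothetical comparison helper. wildcard_mode is nonzero when wildcard
// matching is enabled; wildcard is the configured wildcard character.
// Returns 1 if the two characters count as equal (no replacement penalty).
int chars_match(char a, char b, int wildcard_mode, char wildcard)
{
    if (a == b)
        return 1;

    // Assumption: the wildcard matches any character, whichever string it is in.
    return wildcard_mode && (a == wildcard || b == wildcard);
}
```

Under this assumption, a position holding the wildcard character never contributes a replacement penalty while
wildcard matching is enabled.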
$EXAMPLES$
? StrDiff( "ABC", "ADC" ) // 3, one character replaced
? StrDiff( "ABC", "AEC" ) // 3, ditto
? StrDiff( "CBA", "ABC" ) // 6, two characters replaced
? StrDiff( "ABC", "AXBC" ) // 1, one character inserted
? StrDiff( "AXBC", "ABC" ) // 6, one character removed
? StrDiff( "AXBC", "ADC" ) // 9, one character removed and one replaced
$STATUS$
Ready
$COMPLIANCE$
StrDiff() is compatible with CT3's StrDiff().
$PLATFORMS$
All
$FILES$
Library is hbct.
$SEEALSO$
SetAtLike()
$END$
*/