* Makefile
* config/*
* contrib/*
* doc/*
* extras/*
* include/*
* lib/*
* package/*
* src/*
* tests/*
* utils/*
* removed empty lines left after removed '$' + 'Id' + '$' identifiers
61 lines
3.2 KiB
Plaintext
61 lines
3.2 KiB
Plaintext
/* $DOC$
|
|
$NAME$
|
|
StrDiff()
|
|
$CATEGORY$
|
|
CT3 string functions
|
|
$ONELINER$
|
|
Evaluate the "Edit (Levensthein) Distance" of two strings
|
|
$SYNTAX$
|
|
StrDiff( <cString1>, <cString2>, [<nReplacementPenalty>], [<nDeletionPenalty>],
|
|
[<nInsertionPenalty>] ) -> <nDistance>
|
|
$ARGUMENTS$
|
|
<cString1> string at the "starting point" of the transformation process, default is ""
|
|
<cString2> string at the "end point" of the transformation process, default is ""
|
|
<nReplacementPenalty> penalty points for a replacement of one character, default is 3
|
|
<nDeletionPenalty> penalty points for a deletion of one character, default is 6
|
|
<nInsertionPenalty> penalty points for an insertion of one character, default is 1
|
|
$RETURNS$
|
|
<nDistance> penalty point sum of all operations needed to transform <cString1> to <cString2>
|
|
$DESCRIPTION$
|
|
The StrDiff() functions calculates the so called "Edit" or "Levensthein" distance of two strings.
|
|
This distance is a measure for the number of single character replace/insert/delete operations (so called
|
|
"point mutations") required to transform <cString1> into <cString2> and its value will be the smallest sum of
|
|
the penalty points of the required operations.
|
|
|
|
Be aware that this function is both quite time - O(Len(cString1)*Len(cString2)) - and memory consuming -
|
|
O((Len(cString1)+1)*(Len(cString2)+1)*sizeof(int)) - so keep the strings as short as possible.
|
|
E.g., on common 32 bit systems (sizeof(int) == 4), calling StrDiff() with two strings of 1024 bytes
|
|
in length will consume 4 MB of memory. To not impose unneeded restrictions, the function will only check if
|
|
(Len(cString1)+1)*(Len(cString2)+1)*sizeof(int) <= UINT_MAX, although allocing UINT_MAX bytes will not
|
|
work on most systems. If this simple check fails, -1 is returned.
|
|
|
|
Also, be aware that there can be an overflow when the penalty points are summed up: Assuming that the
|
|
number of transformation operations is in the order of Max(Len(cString1), Len(cString2)), the penalty point
|
|
sum, that is internally stored in an "int" variable, is in the order of
|
|
(Max(Len(cString1), Len(cString2))*Max(nReplacementPenalty, nDeletionPenalty, nInsertionPentaly).
|
|
The StrDiff() does not do an overflow check due to time performance reasons. Future versions of StrDiff()
|
|
could use a type different to "int" to store the penalty point sum to save memory or to avoid overflows.
|
|
|
|
The function is aware of the settings done by SetAtLike(), that means that the wildchar character
|
|
is considered equal to ALL characters.
|
|
|
|
$EXAMPLES$
|
|
? StrDiff( "ABC", "ADC" ) // 3, one character replaced
|
|
? StrDiff( "ABC", "AEC" ) // 3, dito
|
|
? StrDiff( "CBA", "ABC" ) // 6, two characters replaced
|
|
? StrDiff( "ABC", "AXBC" ) // 1, one character inserted
|
|
? StrDiff( "AXBC", "ABC" ) // 6, one character removed
|
|
? StrDiff( "AXBC", "ADC" ) // 9, one character removed and one replaced
|
|
$STATUS$
|
|
Ready
|
|
$COMPLIANCE$
|
|
StrDiff() is compatible with CT3's StrDiff().
|
|
$PLATFORMS$
|
|
All
|
|
$FILES$
|
|
Library is hbct.
|
|
$SEEALSO$
|
|
SetAtLike()
|
|
$END$
|
|
*/
|