SimMetrics is an open source, extensible library of similarity and distance metrics, e.g. Levenshtein distance, L2 distance, cosine similarity and Jaccard similarity. It provides float-based similarity measures between strings, as well as each metric's typical unnormalised output.
It is intended for researchers in information integration (II) and related fields. It includes a range of similarity measures from a variety of communities, including statistics, DNA analysis, artificial intelligence, information retrieval and databases.
Further details on the individual string similarity metrics are available at http://www.dcs.shef.ac.uk/~sam/stringmetrics.html.
SimMetrics can be downloaded from SourceForge.
This library has been developed to provide a consistent interface layer to similarity measures that behave in a normalised manner, allowing comparison and composition of metrics while still allowing use of each basic algorithm's original output.
At the simplest level, every metric takes two strings and returns a similarity measure from 0.0 to 1.0, where 0.0 means entirely different and 1.0 means identical.
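As an illustration of this two-strings-in, score-out contract, here is a minimal Python sketch of a normalised Levenshtein similarity. It is not the SimMetrics API: the function names are hypothetical, and dividing the edit distance by the length of the longer string is one common normalisation, assumed here for the example.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Unnormalised edit distance with unit insert/delete/substitute costs."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Similarity in [0.0, 1.0]; assumed normalisation: 1 - distance / longest."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))
    print(levenshtein_similarity("kitten", "sitting"))
```

Keeping both functions mirrors the idea of exposing the normalised score alongside the algorithm's raw output.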
The metrics have been optimised for fast processing and include methods that provide timing estimates.
Any metric that uses a cost function allows that cost function to be replaced or modified, so custom metrics can be developed (cost functions are detailed in the descriptions of the various string metrics).
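The idea of a swappable cost function can be sketched as follows. This is an illustrative Python example under stated assumptions, not the SimMetrics interface: `weighted_edit_distance`, `unit_cost`, `case_insensitive_cost` and the `gap_cost` parameter are hypothetical names chosen for the example.

```python
from typing import Callable

# A cost function maps a pair of characters to a substitution cost.
CostFunction = Callable[[str, str], float]


def unit_cost(x: str, y: str) -> float:
    """Classic Levenshtein substitution cost."""
    return 0.0 if x == y else 1.0


def case_insensitive_cost(x: str, y: str) -> float:
    """Custom cost: a change of case is half as expensive as a real change."""
    if x == y:
        return 0.0
    if x.lower() == y.lower():
        return 0.5
    return 1.0


def weighted_edit_distance(
    a: str,
    b: str,
    substitution_cost: CostFunction = unit_cost,
    gap_cost: float = 1.0,
) -> float:
    """Edit distance whose substitution cost is supplied by the caller."""
    previous = [j * gap_cost for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        current = [i * gap_cost]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + gap_cost,                             # delete
                current[j - 1] + gap_cost,                          # insert
                previous[j - 1] + substitution_cost(char_a, char_b),
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(weighted_edit_distance("Sam", "sam"))
    print(weighted_edit_distance("Sam", "sam", case_insensitive_cost))
```

Passing a different `substitution_cost` changes the metric's behaviour without touching the dynamic-programming core, which is the kind of customisation described above.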
This standardised, interface-based approach allows techniques to be combined, rather than relying on inconsistent strategies that do not 'map' onto one another.
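To show why a shared 0.0 to 1.0 range makes composition straightforward, here is a small Python sketch that blends two normalised measures with a weighted average. The helper names (`token_jaccard`, `sequence_ratio`, `combine`) and the weights are assumptions made for illustration, and `difflib.SequenceMatcher` from the Python standard library stands in for a second metric; none of this is taken from SimMetrics itself.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Tuple

Metric = Callable[[str, str], float]  # every metric returns a value in [0.0, 1.0]


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lower-cased whitespace tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def sequence_ratio(a: str, b: str) -> float:
    """Character-level similarity from the standard library, already in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def combine(weighted_metrics: Iterable[Tuple[Metric, float]]) -> Metric:
    """Build a new metric as the weighted average of normalised metrics."""
    pairs = list(weighted_metrics)
    total_weight = sum(weight for _, weight in pairs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    def combined(a: str, b: str) -> float:
        return sum(weight * metric(a, b) for metric, weight in pairs) / total_weight

    return combined


if __name__ == "__main__":
    blended = combine([(token_jaccard, 1.0), (sequence_ratio, 1.0)])
    print(blended("sam chapman", "samuel chapman"))
```

Because each input already lies between 0.0 and 1.0, the weighted average does too, so the combined measure can itself be compared with, or fed into, other metrics.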
Similar projects: SecondString (http://sourceforge.net/projects/secondstring/) provides a large collection of string metrics, but their outputs are unnormalised, which makes composing metrics harder.
SimMetrics was developed by Sam Chapman of the Natural Language Processing Group at Sheffield University.
This work was carried out within the AKT project (http://www.aktors.org), sponsored by the UK Engineering and Physical Sciences Research Council (grant GR/N15764/01), and the Dot.Kom project, sponsored by the EU IST as part of Framework V (grant IST-2001-34038).
This work is now released to the open source community and has benefited from the work of various developers and researchers.
I would welcome collaboration and outside development on this open source project. If you want to help, or simply leave a comment, please email me at reverendsam@users.sourceforge.net or sam@dcs.shef.ac.uk.