Test page for TAXAMATCH functions

using Tony Rees' reference application at CMAR, Hobart
Input string 1:   
example: (taxon name): Pseudousipalia microptera
          (authority): Lohse, 1990

Input string 2:   
example: (taxon name): Pseudosipalia micropterus
          (authority): (Lose, 1991)

Test Type:              Block limit (for MDLD test only):
                                     
Notes:

1. Edit Distance test results are returned on a 0 to n (difference) scale, where n is the number of character-based edits by which the input strings differ (0 = identical, 1 = one character different, etc.); the higher the value, the more dissimilar the input strings. By contrast, n-gram and authority comparisons are returned on a 1 to 0 (similarity) scale, where 1 = identical, 0.5 = 50% similarity, and 0 = no similarity.

2. The Edit Distance (LD/DLD/MDLD) and n-gram based tests are case sensitive, i.e. no similarity will be detected between the strings "tony" and "TONY" (giving ED 4 [all 4 chars different] or n-gram similarity 0 in this example). To remove this effect, convert both strings to either all uppercase or all lowercase before testing.
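The 0 to n scale and the case sensitivity described in notes 1 and 2 can be illustrated with a minimal sketch of plain Levenshtein distance (LD). This is a generic textbook implementation written for illustration, not the code behind the test page; the function name `levenshtein` is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete, substitute, each at cost 1."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1  # comparison is case sensitive
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


print(levenshtein("tony", "TONY"))                  # 4: every character differs
print(levenshtein("tony".lower(), "TONY".lower()))  # 0: identical after lowercasing
print(levenshtein("Pseudousipalia microptera", "Pseudosipalia micropterus"))
```

The last call uses the example taxon names from the input boxes above; lowercasing both inputs first is the simple fix suggested in note 2.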

3. Block limit sets the maximum length (in characters) of a transposed block that the MDLD algorithm will search for. Larger settings will find longer blocks but will also slow the algorithm down. In present TAXAMATCH operation, a block limit of 3 is typically used. If the block limit is set to 1, the MDLD effectively functions as a DLD algorithm, i.e. only single character transpositions are accepted (at cost 1). Transpositions that exceed the selected block limit are costed at twice the number of transposed characters, rather than at the number of transposed characters.
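The effect of the block limit can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the reference MDLD code: it assumes that two adjacent blocks of k characters that have swapped places are costed at k, and that k is tried from 1 up to the block limit. The function name `mdld` and its parameters are hypothetical.

```python
def mdld(a: str, b: str, block_limit: int = 3) -> int:
    """Edit distance with adjacent block transpositions (illustrative sketch).

    Assumption: swapping two adjacent blocks of k characters costs k,
    for any k from 1 up to block_limit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(d[i - 1][j] + 1,         # deletion
                       d[i][j - 1] + 1,         # insertion
                       d[i - 1][j - 1] + cost)  # substitution or match
            # Look for swapped adjacent blocks of length k ending at (i, j).
            max_block = min(block_limit, i // 2, j // 2)
            for k in range(1, max_block + 1):
                if (a[i - 2 * k:i - k] == b[j - k:j]
                        and a[i - k:i] == b[j - 2 * k:j - k]):
                    best = min(best, d[i - 2 * k][j - 2 * k] + k)
            d[i][j] = best
    return d[-1][-1]


print(mdld("abcd", "cdab", block_limit=2))  # 2: "ab" and "cd" swapped as blocks
print(mdld("abcd", "cdab", block_limit=1))  # 4: block too long, twice the cost
print(mdld("ab", "ba", block_limit=1))      # 1: single character transposition
```

With `block_limit=1` the sketch only recognises single character transpositions, which matches the DLD behaviour described in the note; the inner loop over k is also why larger limits run more slowly.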

4. The authority comparison function is not case sensitive. It normalizes the input in several respects, automatically expands known author abbreviations, and then computes a blended similarity weighted 2/3 on bigrams and 1/3 on trigrams, with a 50% reduced weighting for differences caused by diacritical marks on the same characters.
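The 2/3 bigram plus 1/3 trigram blend in note 4 can be sketched roughly as below. Several things here are assumptions: the n-gram similarity is taken to be a Dice coefficient over n-gram counts, normalization is reduced to lowercasing and whitespace collapsing, and abbreviation expansion and the diacritic weighting are omitted entirely. The sketch is therefore not expected to reproduce the similarity values quoted on this page, and all function names are hypothetical.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count the overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dice_similarity(a: str, b: str, n: int) -> float:
    """Dice coefficient over n-gram counts (assumed measure, 1 = identical)."""
    grams_a, grams_b = ngram_counts(a, n), ngram_counts(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:  # both strings are shorter than n
        return 1.0 if a == b else 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * shared / total


def normalize(authority: str) -> str:
    """Simplified normalization: lowercase and collapse whitespace only."""
    return " ".join(authority.lower().split())


def authority_similarity(a: str, b: str) -> float:
    """Blend of 2/3 bigram and 1/3 trigram similarity."""
    a, b = normalize(a), normalize(b)
    return (2 * dice_similarity(a, b, 2) + dice_similarity(a, b, 3)) / 3


print(authority_similarity("Linnaeus, 1758", "LINNAEUS, 1758"))  # 1.0
print(authority_similarity("Faubel & Kolasa, 1978", "Kolasa & Faubel, 1978"))
```

Because n-grams are compared as unordered counts, swapping the order of two author names only disturbs the few n-grams at the word boundaries, which is why a measure of this kind is much less sensitive to word order than an edit distance.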

Some additional comments on expectations / performance of the "authority comparison" function are as follows:

    - Many/most "standard" abbreviated author surnames (as per Botanical usage) will be expanded to their full form before comparison, thus "Aarons." will match "Aaronsohn", and so on

    - Omission of a year in one name tested against the same with a year will affect the quoted similarity, but the result will typically still be above 0.4-0.5, for example "Linnaeus" vs. "Linnaeus, 1758" gives a computed similarity of 0.7617

    - Minor differences in cited year will produce some computed difference, which can be considered significant or not according to the user's requirements, for example "Linnaeus, 1759" vs. "Linnaeus, 1758" gives a computed similarity of 0.8381; "Linnaeus, 1760" vs. "Linnaeus, 1758" gives a computed similarity of 0.7682

    - Significant differences in cited authority style may produce a significant computed difference, but the resulting similarity is often above (say) 0.4, for example "Borutzky, 1991" vs. "Borutzky in Borutsky, Stepanova & Kos, 1991" gives a computed similarity of 0.4857; "West & West, 1896" vs. "W. & G.S. West, 1896" gives a computed similarity of 0.6561; "Holovachov et al., 2001" vs. "Holovachov, Bostrom & Susulovsky, 2001" gives a computed similarity of 0.4866.

    - Word order is deliberately less critical for the authority matching algorithm, thus "Faubel & Kolasa, 1978" vs. "Kolasa & Faubel, 1978" are considered highly similar (similarity=0.9394) using this function, whereas the edit distance (e.g. LD / DLD / MDLD) between these strings is very high (12 chars different).

How is the above useful? Mainly as a way to add authority ranking (most to least similar) in situations where a taxon name is either an exact or a near match to a candidate name on (e.g.) another list, the near match having been found using one of the "edit distance" tests.
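A ranking step of this kind could be sketched as follows. The similarity measure is pluggable; as a stand-in, the sketch defaults to `difflib.SequenceMatcher` from the Python standard library, which is not the authority comparison function described on this page. The function name `rank_by_authority` and the candidate list are illustrative assumptions (the third candidate is a made-up variant).

```python
from difflib import SequenceMatcher


def default_similarity(a: str, b: str) -> float:
    """Stand-in similarity (NOT the TAXAMATCH authority function)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_by_authority(query_authority, candidates, similarity=default_similarity):
    """Sort (name, authority) candidates from most to least similar authority.

    The candidates are assumed to have already passed a name match,
    e.g. an exact match or a low edit distance on the taxon name.
    """
    scored = [(similarity(query_authority, authority), name, authority)
              for name, authority in candidates]
    return sorted(scored, key=lambda row: row[0], reverse=True)


candidates = [
    ("Taxonus", "Hartig, 1837"),
    ("Taxonus", "Dahlbom, 1835"),
    ("Taxonus", "Dahlbom 1835"),  # illustrative variant without the comma
]
for score, name, authority in rank_by_authority("Dahlbom, 1835", candidates):
    print(f"{score:.4f}  {name} {authority}")
```

Any function that returns a 1 to 0 similarity can be passed as `similarity`, so the same ranking logic applies whichever authority measure is used.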

Below are some examples of the "authority match" in practice, in this instance where the relevant genus names are already an exact match (note that when testing, e.g. using the interface above, only the authority portions should be compared):

IRMNG_GENNAME | GBIF_GENNAME | AUTH_SIMILARITY

Guadalgenus Stark & Gonzalez del Tanago, 1986 | Guadalgenus Stark & Gonzales del Tanago, 1986 | 0.9302
Cornigerus Mordukhai-Boltovskoi, 1967 | Cornigerus Mordukai-Boltovskoi, 1967 | 0.9294
Pausia Kuznetzov & Livshitz, 1972 | Pausia Kuznetzov & Livhitz, 1972 | 0.9294
Liometoxenus Kistner, Jensen & Jacobson, 2001 | Liometoxenus Kistner, Jensen and Jacobson, 2002 | 0.928
Sacculozetes Behan-Pelletier & Ryabinin, 1991 | Sacculozetes Behan-Pelletier & Rjabinin, 1991 | 0.928
Oxypyrgula Logvinenko & Starobogatov, 1968 | Oxypyrgula Logvinenko & Starobogatov, 1969 | 0.9258
Pyrgula Cristofori & Jan, 1832 | Pyrgula De Cristofori & Jan, 1832 | 0.9235
Pachylina Medvedev & Chernov, 1969 | Pachylina Medvedev & Cernov, 1969 | 0.9235
Baezia Alonso-Zarazaga & Garcia, 1999 | Baezia Alonso-Zarazaga & García, 1999 | 0.9233
Planoristes Iturrondobeitia & Subias, 1978 | Planoristes Iturrondobeitia & Subías, 1978 | 0.9233
Ovipleistophora M. Pekkarinen, J. Lom & F. Nilsen, 2002 | Ovipleistophora M. Pekkarinen, J. Lom & F. Nilsen | 0.9232

Gnathogastrura Diaz & Najt, 1983 | Gnathogastrura Díaz & Najt, 1983 | 0.8187
Anthodromius Redtenbacher, 1849 | Anthodromius Redtenbacher, 1850 | 0.8187
Azygonyx Gingerich, 1989 | Azygonyx P. D. Gingerich, 1989 | 0.8187
Octomicrus Schaufuss, 1877 | Octomicrus L. W. Schaufuss, 1877 | 0.8187
Onychocassis Spaeth & Reitter, 1926 | Onychocassis Spaeth in Spaeth & Reitter, 1926 | 0.817
Vasaces Champion, 1890 | Vasaces Champion, 1889 | 0.8158
Paraleirides St.-Claire Deville, 1906 | Paraleirides Sainte-Claire Deville, 1906 | 0.8132
Unicobelba Mahunka & Mahunka-Papp, 2000 | Unicobelba Mahunka & Mahunka-Papp, 1999 | 0.812
Brachyophonus Sciaky, 1986 | Brachyophonus Sciaky, 1987 | 0.8119
Argocetus Gloger, 1841 | Argocetus Gloger, 1842 | 0.8119

Ulrike Aspock, 1968 | Ulrike H. Aspöck & U. Aspöck, 1968 | 0.4882
Occidodiaptomus Borutzky, 1991 | Occidodiaptomus Borutzky in Borutsky, Stepanova & Kos, 1991 | 0.4857
Pseudorthygia Csiki, 1940 | Pseudorthygia Csiki in Heikertinger and Csiki, 1940 | 0.4855
Unilaterilecithum Oshmarin, 1952 | Unilaterilecithum Oschmarin in Skrjabin & Evranova, 1952 | 0.4815
Cyrtodelphis Abel, 1900 | Cyrtodelphis O. Abel, 1899 | 0.4811
Haemobaphoides Scott, 1912 | Haemobaphoides Scott T. & Scott A., 1913 | 0.4737
Geotrigona Moure, 1943 | Geotrigona Camargo e Moure, 1996 | 0.4706
Geotrigona Moure, 1943 | Geotrigona Moure e Camargo, 1991 | 0.4706
Plagioglypta Pilsbry, 1898 | Plagioglypta Pilsbry in Pilsbry & Sharp, 1897 | 0.4678
Arostrilepis Mas-Coma & Tenora, 1997 | Arostrilepis Mas Coma, 1982 | 0.4611
Multidentatus Clifford, Sonenshine, Keirans & Kohls, 1973 | Multidentatus Clifford & al., 1973 | 0.4611

Pulchroppia Hammer, 1979 | Pulchroppia Subías, 1989 | 0.2179
Epicnaptera Rambur, 1866 | Epicnaptera Hübner, 1820 | 0.2179
Foroculum Berthold, 1827 | Foroculum Thompson J., 1843 | 0.2125
Viguierella Maupas, 1899 | Viguierella Perrier, 1893 | 0.2094
Hyalesthes Amyot, 1847 | Hyalesthes Signoret, 1865 | 0.2094
Taxonus Dahlbom, 1835 | Taxonus Hartig, 1837 | 0.2094
Amphitectus Dahlbom, 1842 | Amphitectus Hartig, 1840 | 0.2094
Cheporus Dahl, 1823 | Cheporus Latreille, 1829 | 0.2094
Tibicina Amyot, 1847 | Tibicina Kolenati, 1857 | 0.2094
Raiboscelis Seidlitz, 1896 | Raiboscelis Allard, 1876 | 0.2015
Euryopicoris Puton, 1899 | Euryopicoris Reuter, 1875 | 0.2006