Biodiversity Informatics is the application of IT tools and technology to biodiversity information, principally at the organism level. It covers, in particular, the handling and reconciliation of species-level identifiers (names, codes, etc.; see, for example, this recent paper) and information on the distribution of organisms in space and time.
CSIRO Marine and Atmospheric Research (CMAR) and its predecessors in the biological domain (the CSIRO Division of Fisheries and the CSIRO Division of Fisheries Research) have used numeric coding systems for species information for many years. This work began with the national (Australian) "FISHLIST" 6-digit codes developed in the 1970s, which were later extended to form the basis of the present "CAAB" (Codes for Australian Aquatic Biota) 8-digit coding system (Yearsley et al., 1997; Rees, Yearsley & Gowlett-Holmes, 1999-current). More recently, the CMAR Data Centre has been closely involved in designing systems to display modelled marine species distributions for Australia, in the design of the international OBIS and AquaMaps systems, in developing "fuzzy match" algorithms for species scientific names, and in creating IRMNG, the Interim Register of Marine and Nonmarine Genera.
CAAB is a coding system and database for marine and freshwater species in the Australian region. Every species (and, where appropriate, some species groups) is allocated a unique 8-digit code that stays with the species (that is, with the species concept or OTU, operational taxonomic unit) regardless of any changes to its name. Data stored against that numeric code can therefore be reconciled at any later date with the then-current scientific name for the species, even if relevant taxonomic experts have changed that name in the meantime. CAAB can also be used to generate species lists for a particular aquatic genus or family, to search on old names or superseded codes, and to display species-related information such as images, point data, or maps, where available. CAAB content is maintained by experts in the relevant taxonomic groups and aims to be continually updated as new information becomes available.
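The value of a stable code is that observations never need rewriting when a name changes; only the code-to-name lookup does. A minimal sketch of that idea follows, assuming a simple in-memory register. The codes, names, and the helpers `code_for` and `reconcile` are hypothetical and do not reflect real CAAB entries or the actual CAAB database design.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical codes and names for illustration only; these are NOT real CAAB entries.
CURRENT_NAME: Dict[str, str] = {
    "99000001": "Newgenus exemplaris",  # this species was formerly "Oldgenus exemplaris"
    "99000002": "Othergenus sampleri",
}
SUPERSEDED: Dict[str, str] = {"Oldgenus exemplaris": "99000001"}


def code_for(name: str) -> Optional[str]:
    """Resolve a current or superseded scientific name to its stable code."""
    for code, current in CURRENT_NAME.items():
        if current == name:
            return code
    return SUPERSEDED.get(name)


def reconcile(records: List[Tuple[str, int]]) -> List[Tuple[str, str, int]]:
    """Attach today's name to (code, value) records captured at any earlier date."""
    return [(code, CURRENT_NAME[code], value) for code, value in records]


# Counts recorded when the first species was still called "Oldgenus exemplaris",
# but stored against its code rather than its name.
old_records = [(code_for("Oldgenus exemplaris"), 12), ("99000002", 3)]
print(reconcile(old_records))
# [('99000001', 'Newgenus exemplaris', 12), ('99000002', 'Othergenus sampleri', 3)]
```

Searching on an old name (the `SUPERSEDED` lookup) and reporting under the current name are the two halves of the same mechanism.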
The CMAR Data Centre has been assisting the Fish Taxonomy group at CMAR by devising systems to enter, hold, update and visualise modelled marine species distribution data, which are used in preparing marine bioregionalisations of interest to environmental managers and others (e.g. see this report). The data are essentially held as limits around the coast, and by depth, within which a given species occurs. These limits can then be used to generate a map, for example of the 0.5 x 0.5 degree squares in the Australian region that satisfy those criteria and in which the species may be encountered. At present most of these maps are available for use by CMAR researchers only; however, subsets will progressively be made available to a range of public websites, including the planned "Fishes of Australia Online", "Oceans Portal" and "Atlas of Living Australia" sites.
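The step from stored limits to a map of squares can be sketched as follows. This assumes, as a simplification not stated in the text, that a species' coastal limits can be reduced to a latitude/longitude box and that each 0.5-degree cell has a known shallowest and deepest water depth. `SpeciesLimits`, `half_degree_cells`, `occupied_cells` and the depth table are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

Cell = Tuple[float, float]  # (lat, lon) of a cell's south-west corner


@dataclass(frozen=True)
class SpeciesLimits:
    # Simplification: the coastal limits are approximated here by a lat/lon box.
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    depth_min: float  # metres
    depth_max: float  # metres


def half_degree_cells(lim: SpeciesLimits) -> Iterator[Cell]:
    """Yield the south-west corners of all 0.5-degree cells overlapping the box."""
    lat = math.floor(lim.lat_min * 2) / 2  # snap down to the 0.5-degree grid
    while lat < lim.lat_max:
        lon = math.floor(lim.lon_min * 2) / 2
        while lon < lim.lon_max:
            yield (lat, lon)
            lon += 0.5  # multiples of 0.5 are exact in binary floating point
        lat += 0.5


def occupied_cells(lim: SpeciesLimits,
                   cell_depths: Dict[Cell, Tuple[float, float]]) -> List[Cell]:
    """Cells in the box whose depth range overlaps the species' depth range.

    cell_depths maps a cell to (shallowest_m, deepest_m); cells absent from it
    (e.g. land) are skipped.
    """
    out = []
    for cell in half_degree_cells(lim):
        depths = cell_depths.get(cell)
        if depths is None:
            continue
        shallow, deep = depths
        if shallow <= lim.depth_max and deep >= lim.depth_min:
            out.append(cell)
    return out


# Hypothetical limits and depths, for illustration only.
lim = SpeciesLimits(-35.0, -34.0, 150.0, 151.0, 0.0, 200.0)
depths = {(-35.0, 150.0): (0.0, 50.0), (-35.0, 150.5): (100.0, 1000.0),
          (-34.5, 150.0): (500.0, 2000.0)}
print(occupied_cells(lim, depths))  # [(-35.0, 150.0), (-35.0, 150.5)]
```

The cell at (-34.5, 150.0) is excluded because its shallowest water is deeper than the species' deepest limit.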
OBIS, the Ocean Biogeographic Information System, is an online portal to marine species distribution records held in more than 200 databases at institutions around the world. CMAR Data Centre staff designed the prototype and production systems for the OBIS Portal at Rutgers University, USA, including: the centralised name and spatial indexing of harvested data; the name and spatial search functions and the custom taxonomic hierarchy that OBIS uses; and the maps generated on demand from the spatial index using the c-squares mapper developed at CMAR.
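A spatial index of this kind works by converting each record's position into a grid-square code that can be stored and searched like any other text key. The sketch below encodes a point into its 0.5-degree c-square code, following our reading of the published c-squares notation: a global quadrant digit (1 = NE, 3 = SE, 5 = SW, 7 = NW), then the 10-degree, 1-degree and 0.5-degree components separated by colons. It is a hedged illustration, not the CMAR implementation. Points exactly on the equator, the meridian, or at ±90/±180 are handled only naively, so check the official specification before relying on it.

```python
from decimal import Decimal


def _quadrant(lat_digit: int, lon_digit: int) -> int:
    """Intermediate quadrant: 1 = lat 0-4/lon 0-4, 2 = 0-4/5-9, 3 = 5-9/0-4, 4 = 5-9/5-9."""
    return 2 * (lat_digit >= 5) + (lon_digit >= 5) + 1


def csquare_half_degree(lat: float, lon: float) -> str:
    """Return the 0.5-degree c-square code containing (lat, lon)."""
    if lat >= 0:
        global_q = 1 if lon >= 0 else 7
    else:
        global_q = 3 if lon >= 0 else 5
    # Decimal avoids binary rounding surprises when extracting the tenths digit.
    a_lat = Decimal(str(abs(lat)))
    a_lon = Decimal(str(abs(lon)))
    lat_i, lon_i = int(a_lat), int(a_lon)  # whole degrees (truncated)
    lat_tenths = int(a_lat * 10) % 10
    lon_tenths = int(a_lon * 10) % 10
    ten_deg = f"{global_q}{lat_i // 10}{lon_i // 10:02d}"
    one_deg = f"{_quadrant(lat_i % 10, lon_i % 10)}{lat_i % 10}{lon_i % 10}"
    half_deg = _quadrant(lat_tenths, lon_tenths)
    return f"{ten_deg}:{one_deg}:{half_deg}"


print(csquare_half_degree(-42.9, 147.3))  # 3414:227:3
```

Because a longer code always begins with the code of the larger square that contains it, coarser-resolution searches can be done with simple prefix matching on the stored strings.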
AquaMaps is a system for modelling the distribution of marine species according to their preferred habitat conditions and to empirical observations of other restrictions on their occurrence (such as latitudinal or longitudinal range, or presence/absence in designated FAO areas). CMAR Data Centre staff are contributing to the database design and on-demand mapping capabilities for this project. As with the Australian modelled species distributions described above, the predicted distributions are presented by 0.5 degree square, but here each square carries a probability value computed from stored values for a range of environmental variables. AquaMaps are already available for more than 7,000 fish species and for marine mammals, and the number is increasing continuously. Users can also click on a square of ocean to generate a list of all species with computed modelled distributions whose range encompasses that square.
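One common way to turn stored environmental values into a per-square probability is an environmental envelope: each variable scores 1 inside the species' preferred range and falls linearly to 0 at its absolute limits, and the scores are multiplied. The sketch below shows that approach, which AquaMaps is generally described as using. It is a simplified illustration rather than AquaMaps code, and the variable names (`sst_c`, `depth_m`) and envelope values are hypothetical.

```python
from typing import Dict, Tuple

# (absolute min, preferred min, preferred max, absolute max) for one environmental variable
Envelope = Tuple[float, float, float, float]


def trapezoid(value: float, env: Envelope) -> float:
    """Score one variable: 1 inside the preferred range, falling linearly to 0 at the limits."""
    lo, pref_lo, pref_hi, hi = env
    if value < lo or value > hi:
        return 0.0
    if pref_lo <= value <= pref_hi:
        return 1.0
    if value < pref_lo:
        return (value - lo) / (pref_lo - lo)
    return (hi - value) / (hi - pref_hi)


def cell_probability(cell: Dict[str, float], species: Dict[str, Envelope]) -> float:
    """Combine the per-variable scores for one 0.5-degree cell by multiplication."""
    p = 1.0
    for variable, env in species.items():
        p *= trapezoid(cell[variable], env)
    return p


# Hypothetical envelope and cell values, for illustration only.
species = {"sst_c": (10.0, 15.0, 22.0, 28.0), "depth_m": (0.0, 0.0, 200.0, 500.0)}
cell = {"sst_c": 12.5, "depth_m": 350.0}
print(cell_probability(cell, species))  # 0.25 (0.5 for temperature x 0.5 for depth)
```

Multiplying the scores means that a single unsuitable variable drives the square's probability to zero, which matches the intuition that a species needs all of its habitat requirements met at once.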
Use of the CAAB search interface by CMAR staff and external users showed that it is desirable to retrieve species-related information even when a scientific name is misspelled (e.g. "Peneus" for "Penaeus", or "Lujtanus" for "Lutjanus"). Accordingly, an initial "fuzzy match" algorithm designed specifically for matching species scientific names was developed and implemented in CAAB in 2001. It was subsequently deployed in the 15 or more taxonomic databases hosted at VLIZ in Belgium (including the European Register of Marine Species, the world list of Porifera, and many others) in 2003, on the OBIS web site in 2004, and on the Euro+Med Plantbase portal in 2007. Further consideration of the design and desired performance of this algorithm has led to the development of TAXAMATCH, a more extensive algorithm designed to cope with both phonetic and non-phonetic (keystroke) errors, for example matching "Hombo sapient" to Homo sapiens, within an acceptable time (typically one to a few seconds, and less in many cases). The same techniques can also be applied to detecting near-duplicate names in cross-database queries, and to de-duplicating existing holdings where undetected near-duplicate names may be present.
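Keystroke errors such as a dropped letter ("Peneus") or swapped letters ("Lujtanus") are usually measured with an edit distance. The sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance, which counts insertions, deletions, substitutions and adjacent transpositions, and compares genus and species epithets separately. It is a generic illustration, not TAXAMATCH itself: it omits TAXAMATCH's phonetic handling, and the fixed per-word threshold `max_per_word` is an arbitrary assumption.

```python
from typing import List


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (edits plus adjacent transpositions)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def fuzzy_lookup(query: str, names: List[str], max_per_word: int = 2) -> List[str]:
    """Names with the same number of words, each within max_per_word edits; closest first."""
    q = query.lower().split()
    hits = []
    for name in names:
        parts = name.lower().split()
        if len(parts) != len(q):
            continue
        dists = [osa_distance(x, y) for x, y in zip(q, parts)]
        if all(dist <= max_per_word for dist in dists):
            hits.append((sum(dists), name))
    return [name for _, name in sorted(hits)]


names = ["Penaeus", "Lutjanus", "Homo sapiens", "Homo erectus"]
print(fuzzy_lookup("Lujtanus", names))       # ['Lutjanus']
print(fuzzy_lookup("Hombo sapient", names))  # ['Homo sapiens']
```

Comparing each word separately keeps a badly misspelled epithet from being masked by a correct genus. Run pairwise over two name lists, the same distance function would flag candidate near-duplicates for de-duplication.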
IRMNG, the Interim Register of Marine and Nonmarine Genera, is under development as a tool for distinguishing marine from nonmarine taxa, and extant from fossil taxa, initially at the genus level, although species names are also held where available. The present version of IRMNG (v 1.0, January 2008) is constructed as a superset of the Catalogue of Life (2006 version) and holds some 238,000 genus names and over 1.4 million species names, most of them flagged for marine/nonmarine and extant/fossil status. In addition to content from the Catalogue of Life (which holds only genera that have subsidiary species information, and does not include fossils), IRMNG contains a large number of additional genus names from a wide variety of sources, covering both marine and fossil taxa. IRMNG development is continuing on an intermittent basis, and its creation has been assisted by funding from OBIS Australia.
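Since only most names carry flags, a register like this needs a third "not yet assigned" state alongside true and false. The sketch below models that with optional booleans and shows the two obvious queries: filtering by status, and listing genera that still need curation. The record layout and the helpers `select` and `unflagged` are assumptions for illustration, not IRMNG's actual schema. The sample genera are illustrative only (not drawn from IRMNG), and "Examplegenus" is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GenusRecord:
    name: str
    is_marine: Optional[bool]  # None = habitat flag not yet assigned
    is_extant: Optional[bool]  # None = extant/fossil flag not yet assigned


def select(records: List[GenusRecord],
           marine: Optional[bool] = None,
           extant: Optional[bool] = None) -> List[str]:
    """Genus names matching the requested flags; None means 'do not filter on this flag'."""
    return [r.name for r in records
            if (marine is None or r.is_marine is marine)
            and (extant is None or r.is_extant is extant)]


def unflagged(records: List[GenusRecord]) -> List[str]:
    """Genera still missing at least one flag, i.e. candidates for further curation."""
    return [r.name for r in records if r.is_marine is None or r.is_extant is None]


records = [
    GenusRecord("Penaeus", True, True),
    GenusRecord("Lutjanus", True, True),
    GenusRecord("Homo", False, True),
    GenusRecord("Tyrannosaurus", False, False),
    GenusRecord("Examplegenus", None, True),
]
print(select(records, marine=True, extant=True))  # ['Penaeus', 'Lutjanus']
print(select(records, extant=False))              # ['Tyrannosaurus']
print(unflagged(records))                         # ['Examplegenus']
```

An unflagged genus is excluded from both the marine and the nonmarine results rather than being silently treated as one or the other.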
The CMAR Data Centre Biodiversity Resources page (formerly: "Taxonomic Resources") is now being maintained once again. It lists a wide range of web resources in classification and taxonomy of the full suite of animals, plants, fungi, protists, prokaryotes and viruses, as a service to the general enquirer.
CMAR Data Centre staff also collaborate with, or contribute to, current national and international biodiversity informatics developments, including SeaLifeBase, Fishes of Australia Online (FOAO), the Atlas of Living Australia project, the World Register of Marine Species, and the species names management tools for the Encyclopedia of Life.
This page maintained by Tony Rees (Tony.Rees@csiro.au)
© Copyright CSIRO Australia, 2008