SciELO - Scientific Electronic Library Online



South African Journal of Science

On-line version ISSN 1996-7489
Print version ISSN 0038-2353

S. Afr. j. sci. vol.111 no.11-12 Pretoria Nov./Dec. 2015



Putting fossils on the map: Applying a geographical information system to heritage resources



Merrill van der Walt (I); Antony K. Cooper (II); Inge Netterberg (III); Bruce S. Rubidge (I)

(I) Evolutionary Studies Institute, School of Geosciences, University of the Witwatersrand, Johannesburg, South Africa
(II) Built Environment Unit, Council for Scientific and Industrial Research, Pretoria, South Africa
(III) Netterberg Consulting, Leuven, Belgium





A geographical information system (GIS) database was compiled of Permo-Triassic tetrapod fossils from the Karoo Supergroup in South African museum collections. This database is the first of its kind and has great applicability for understanding tetrapod biodiversity change through time more than 200 million years ago. Because the museum catalogues all differed in recorded information and were not compliant with field capture requirements, this information had to be standardised to a format that could be utilised for archival and research application. Our paper focuses on the processes involved in building the GIS project, capturing metadata on fossil collections and formulating future best practices. The result is a multi-layered GIS database of the tetrapod fossil record of the Beaufort Group of South Africa for use as an accurate research tool in palaeo- and geoscience research with applications for ecology, ecosystems, stratigraphy and basin development.

Keywords: Spatial data; Permo-Triassic; database; Karoo Supergroup




The fossil record of the Karoo Supergroup, which comprises a largely unbroken temporal record of tetrapod evolution from the Middle Permian to the Middle Jurassic,1,2 provides a unique opportunity to set up a GIS database of fossil occurrences which can be utilised to answer questions relating to ecological and biodiversity change through time. The Karoo fossil record is the best preserved ecological assemblage of pre-mammalian terrestrial tetrapods documenting the stem lineages of both mammals and dinosaurs.3,4

We geocoded palaeontological data for use in a geographical information system (GIS) for palaeoscience research to explore issues relating to the biodiversity of Permian and Triassic tetrapod faunas. This was the first time a GIS had been applied to the fossil records of the Karoo Supergroup. With the cooperation of seven South African museums and institutes (Council for Geoscience, Pretoria; Ditsong Museum, Pretoria; Evolutionary Studies Institute, University of the Witwatersrand, Johannesburg; National Museum, Bloemfontein; Albany Museum, Grahamstown; Rubidge Collection, Wellwood, Graaff-Reinet; Iziko South African Museum, Cape Town) that curate collections of Karoo tetrapod fossils, a GIS incorporating the South African databases of fossil records collected from the Beaufort Group, Karoo Supergroup has been compiled.

The hundreds of thousands of fossil artefacts stored and accessioned in museum collections are the foundation of our knowledge on past biodiversity. Great strides have been made in biodiversity informatics in providing digital access to extinct biodiversity data, for integration, interpretation, reconstruction and application objectives. Models for community data access are evident in abundant projects, such as:

  • The Revealing Human Origins Initiative (RHOI)5 Specimen Database, a collaboration of paleoanthropological and related projects studying Late Miocene (and Pliocene) hominins and other faunas in context, with the database including digital imagery and metadata that covers age, geology, collection elements and taxonomy;

  • The digital@rchive of Fossil Hominoids6, for which the primary mandate is to facilitate morphological investigations in the field of human evolution by providing digital data for the international scientific community;

  • The Darwin Core metadata standard7, a uniform standard designed to expedite the exchange of information about the geographic occurrence of species and specimen records in collections, with extensions for palaeontology.

These information systems, driven by distributed data retrieval, data capture and person-facilitated geospatial referencing, have enabled the investigation of novel research questions around ecological reconstruction, extinct biodiversity trends and predictive modelling.

Historically, details of fossils collected were recorded as hand-written descriptions on index cards or in catalogues (Figure 1). Such documentation included both data (e.g. species and location) and metadata (information about the record), such as who collected, prepared and/or identified the fossil, where the fossil is stored and who wrote up the index card.

There are a variety of standards available for metadata, such as the Dublin Core (ISO 15836:2009),8 developed primarily for describing resources for discovery, and ISO 19115:2003,9 for describing geographical data, of which the South African profile (subset) is SANS 1878-1:2005.10 Dublin Core is primarily text-based, making it easy to enter information for its 15 metadata elements, while ISO 19115 makes extensive use of encoding, which facilitates automated processing and presenting the metadata in multiple languages. Metadata can be converted from one standard to another using an ontology or a cross-walk.11 As ISO 19115 has encoded metadata and more detailed metadata elements, it is easy to convert its metadata to Dublin Core through a cross-walk (conversion table), but the reverse is difficult because of the need to subdivide metadata elements, text processing and, invariably, use human expertise. Hence, it would be better to use a metadata standard such as ISO 19115 for palaeontological records.
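A cross-walk of this kind can be pictured as a lookup table. The sketch below is illustrative only: the ISO 19115-style element paths are simplified, the mapping is a hypothetical subset and not an official crosswalk, and the sample record is invented.

```python
# Hypothetical, simplified crosswalk from ISO 19115-style element paths
# to Dublin Core element names. This is not an official mapping.
ISO_TO_DC = {
    "identificationInfo.citation.title": "title",
    "identificationInfo.abstract": "description",
    "identificationInfo.citation.date": "date",
    "identificationInfo.pointOfContact.individualName": "creator",
    "identificationInfo.extent.geographicElement": "coverage",
    "identificationInfo.extent.temporalElement": "coverage",
    "identificationInfo.language": "language",
}


def iso_to_dublin_core(iso_record):
    """Collapse detailed ISO-style elements into coarser Dublin Core elements."""
    dc = {}
    for path, value in iso_record.items():
        element = ISO_TO_DC.get(path)
        if element is None:
            continue  # no Dublin Core counterpart in this sketch
        dc.setdefault(element, []).append(str(value))
    return dc


# Invented sample record, for illustration only.
sample = {
    "identificationInfo.citation.title": "Example fossil locality record",
    "identificationInfo.extent.geographicElement": "W 23.75, S -32.00, E 24.00, N -31.75",
    "identificationInfo.extent.temporalElement": "Permian",
    "identificationInfo.language": "eng",
}

dublin_core = iso_to_dublin_core(sample)
```

In this sketch the separate geographic and temporal extents both land in the single Dublin Core `coverage` element. Going the other way would require splitting that free text back into its parts, which is why the reverse conversion is the difficult direction.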

South African fossil-find field notes for the Beaufort Group (to be eventually reconfigured into museum index cards) were written up over a span of 150 years, beginning in 1845,12,13 and do not conform to any particular standard. The main disadvantage is that some records contain inadequate or ambiguous data, particularly relating to the precise location of the fossil provenance.

This paper focuses on the processes involved in establishing a GIS for tetrapod fossils from the Beaufort Group. It highlights the key challenges encountered during database establishment, and describes its main applications and future best practices for use as an accurate research tool in palaeontological research. This unique database is curated at the Evolutionary Studies Institute (ESI) at the University of the Witwatersrand and is available as a research tool to all bona fide scientists.


Creating a reliable product

Extensive fossil collections have been amassed from the rocks of the time-extensive Permo-Triassic Beaufort Group and curated in different museum collections in South Africa, providing a unique opportunity to incorporate these collections into a GIS. Ultimately, this database will be expanded to include fossils from the Beaufort Group which are housed in overseas institutions such as the Natural History Museum, London; Smithsonian Institution, Washington DC; and the Field Museum, Chicago. This GIS will enhance their utility in research relating to changing biodiversity patterns, both temporally and geographically, as well as stratigraphic and basin development modelling.

Problems that had to be overcome in setting up the GIS database related largely to a lack of consistency in the data, ambiguous locality data and outdated taxonomic records, requiring rigorous standardisation and updating.

While all the original data were provided in digital format, they had been set up from manual records. This is the main drawback encountered when having to apply human interpretation versus the structured logic of the computer. The establishment of the GIS database highlighted the value of structuring data to suit GIS and other digital applications. The migration of paper records to useful electronic records could not simply be carried out verbatim, as many of the data obtained from the contributing South African museums needed to be restructured to facilitate analysis through electronic means.



Mapping palaeontological specimens

As this database has been set up as a research tool to be used by palaeontologists, it is important to explain the methodology in detail so that users can fully understand why the GIS was created in this particular way.

The broad-spectrum processes were divided into three stages (Table 1): Stage 1: Acquisition and processing of original data; Stage 2: Establishing a GIS management system; Stage 3: Reconciliation.

More detailed processes involved in spatially mapping the fossils were subdivided into two phases (Table 2): Phase 1: accessing and processing of data; Phase 2: development of a spatial model.

Alphanumeric data was converted to spatial data because, for most of the records, the location was specified using geographical identifiers,14 particularly farm names, rather than coordinates. Converting the data required rigorous 'cleaning', correction of spelling errors and standardisation of content to permit queries. Farm names with their corresponding farm numbers were aligned with the names registered with the Registrar of Deeds and the Surveyors General.
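The cleaning step can be sketched as a small normalisation routine. This is a minimal illustration and not the procedure actually used for the database: the function name, the normalisation rules and the misspelling in the correction table are all assumptions made for the example.

```python
import re
import unicodedata

# Hypothetical correction table: the misspelling below is invented for illustration.
KNOWN_MISSPELLINGS = {"WELWOOD": "WELLWOOD"}


def normalise_farm_name(raw, corrections=KNOWN_MISSPELLINGS):
    """Return a canonical key for a farm name so that variants compare equal."""
    # Decompose accented letters and drop the combining accent marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Upper-case, then turn punctuation and other symbols into spaces.
    text = re.sub(r"[^A-Z0-9 ]+", " ", text.upper())
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Apply any known spelling correction to the cleaned key.
    return corrections.get(text, text)
```

With these rules, `normalise_farm_name("  wellwood ")` and `normalise_farm_name("Welwood.")` both yield `WELLWOOD`, so the two catalogue entries would match the same registered name in a query.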

Once cleaning of data had been accomplished, selection of data fields applicable for spatial mapping was undertaken. Geospatial coordinates used for mapping species location and distribution are crucial for a reliable spatial system.15 Access to geospatially referenced data from fossils provides a quantitative basis for biodiversity analyses over time and predictive niche modelling for determining sampling densities of various sites.

Providing locality coordinates proved a significant challenge. Most of the recorded specimens were associated with a georeference, but this reference was, in most instances, a worded description of the localities from where they were discovered with few records having geographic coordinates (Table 3).



To get to the point where data could be represented on a spatial map, two approaches were adopted. The first involved selecting records that qualified for automatic import into the system. The second approach involved records that could only be entered onto the system manually.

Automated data entry procedure

Records with locality coordinates from a Global Positioning System (GPS) could be entered automatically. However, as the majority of records had only a farm name for the locality, a spatial database had to be created to allow records to be imported automatically to specific localities referenced as farm centroids. A farm centroid is the calculated gravitational centre of a polygon (farm boundaries are polygons). This centroid is calculated using the ArcMap field calculator which automatically sets a field value for a single record, or even all records.
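For readers unfamiliar with the calculation, the centroid of a simple polygon can be computed with the standard shoelace formula. The sketch below is a plain Python illustration of that formula, assuming planar (projected) coordinates; it is not a description of how the ArcMap field calculator works internally.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) tuples.

    Assumes planar coordinates and a non-self-intersecting boundary.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Invented L-shaped 'farm' boundary in arbitrary planar units.
l_shaped_farm = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
centroid = polygon_centroid(l_shaped_farm)
```

For an irregular boundary such as the L-shape above, the centroid differs from the simple average of the corner coordinates, and for strongly concave farms it can even fall outside the boundary. This is one source of the positional error that farm centroids introduce.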

Forcing such localities into a single point at the gravitational centre of the farm introduces error and inaccuracy into the data, but remains the best option to utilise locality data for the majority of fossils found prior to GPS usage. Current locality data were accurately captured by GPS.

To allow data to be imported automatically, certain tasks had to be completed (Table 4). A geodatabase was created to house the spatial data of the farm, administrative, district and magisterial boundaries and local authorities' databases.16 Various map layers (including Landsat 7 ETM+ Satellite imagery) were necessary as backdrop data to interpret the distribution patterns of fossil taxa.

Because most of the specimens in older collections lacked geographic coordinates for their place of discovery, the most accurate locality information in the majority of the databases was simply a farm and district name. To represent this locality information on the GIS, farm locality data was received in .FEA format from the Surveyor General and converted into shape file format. Alphanumeric data were exported as a point file and joined to the polygon data using a spatial join. The cadastre received from the Surveyor General contained farm boundaries and their farm numbers, but very few farm names. This lack of names posed a problem as localities for most of the specimens in the museum catalogues were given as locality names, which were assumed to correspond to the farm names. As such, the farm names were essential for the geocoding of the localities and thus the specimens.

To solve the number versus name problem, Environmental Potential Atlas (ENPAT 200417) farm cadastre data was used as the new spatial layer to identify localities. For each farm, centroids were generated and used to geocode the specimens by linking the specimen locality names to farm names. Additional backdrop map layers included Surveyor General data for magisterial districts and provinces. These data were used to identify further localities, as farm names are not unique across the country. Digitised geological maps covering the extent of the Beaufort Group were included as additional backdrop data.
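Because farm names repeat across districts, the link from a specimen to a centroid needs at least the name and the district together. The sketch below illustrates that idea under stated assumptions: the record layout, field names, farm names, districts and coordinates are all invented, and ambiguous or unmatched records are simply left for manual entry.

```python
from collections import defaultdict


def make_key(name, district):
    """Build a case-insensitive lookup key from farm name and district."""
    return (name.strip().upper(), district.strip().upper())


def build_centroid_index(farms):
    """Map (farm name, district) to the list of centroids registered under it."""
    index = defaultdict(list)
    for farm in farms:
        index[make_key(farm["name"], farm["district"])].append(farm["centroid"])
    return index


def geocode(specimen, index):
    """Return a centroid only when exactly one farm matches, otherwise None."""
    matches = index.get(make_key(specimen["locality"], specimen["district"]), [])
    if len(matches) == 1:
        return matches[0]
    return None  # unmatched or ambiguous: leave for manual entry


# Invented farms: the same name occurs in two hypothetical districts.
farms = [
    {"name": "Example Farm", "district": "District One", "centroid": (24.10, -32.20)},
    {"name": "Example Farm", "district": "District Two", "centroid": (26.50, -30.90)},
]
index = build_centroid_index(farms)
located = geocode({"locality": "example farm", "district": "District Two"}, index)
```

Matching on the name alone would return two candidate centroids for `Example Farm`; adding the district resolves the specimen to a single point.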

The Evolutionary Studies Institute (ESI) collection database was selected as the test case because of the high resolution of farm locality and map sheet data, to determine whether automated entry of palaeontological records was a feasible option. Unique localities were split into those with coordinates and those without, as the process for identifying the location of these two groups of localities was different.16

Those localities with grid coordinates were extracted and all coordinate data converted to decimal degrees, imported into ArcGIS® as an event theme, converted to a shape file, and each specimen was located as a point in the spatial data file.
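The original coordinate format is not specified here. Assuming, for illustration, that the recorded coordinates were in degrees, minutes and seconds with a hemisphere letter, the conversion to decimal degrees could be sketched as follows (the function name and the sample values are invented).

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes and seconds to signed decimal degrees.

    South and west are returned as negative values.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E or W")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


# Invented example coordinate pair.
latitude = dms_to_decimal(32, 15, 0, "S")
longitude = dms_to_decimal(24, 30, 36, "E")
```

The resulting signed values (negative latitudes for the southern hemisphere) are the form a GIS typically expects when a table of points is imported.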

Localities that lacked coordinates were identified by districts, farm names and map sheet indices. As farm names are not unique and can be repeated for several districts, the map sheet index was used in addition to the farm names and districts as identifiers for the location of the farm localities. As an index shape file of the 1:50 000 map sheet series does not exist, a map sheet index shape file was created by digitising the sheets.
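As an alternative illustration to digitising, the extent of a sheet can be derived from its code. The sketch below assumes the usual quarter-degree convention for South African 1:50 000 sheet references: two digits of latitude south, two digits of longitude east, then two letters that each select a quadrant (A north-west, B north-east, C south-west, D south-east). It is not the method used to build the index shape file, and the function names are invented.

```python
def sheet_bounds(code):
    """Return (west, south, east, north) in decimal degrees for a sheet code.

    Assumes codes of the form 'DDDDLL', for example '3123DD'.
    """
    code = code.strip().upper()
    valid = (
        len(code) == 6
        and code[:4].isdigit()
        and all(letter in "ABCD" for letter in code[4:])
    )
    if not valid:
        raise ValueError("expected a code such as '3123DD'")
    north = -float(int(code[:2]))  # northern edge of the degree square (south is negative)
    west = float(int(code[2:4]))   # western edge of the degree square
    size = 1.0
    for letter in code[4:]:
        size /= 2.0                # each letter halves the cell size
        if letter in "BD":         # eastern half
            west += size
        if letter in "CD":         # southern half
            north -= size
    return (west, north - size, west + size, north)


def sheet_contains(code, longitude, latitude):
    """True when the point lies inside (or on the edge of) the sheet."""
    west, south, east, north = sheet_bounds(code)
    return west <= longitude <= east and south <= latitude <= north
```

Under this assumption, `sheet_bounds("3123DD")` covers longitudes 23.75 to 24.00 and latitudes -32.00 to -31.75, a quarter-degree cell that can be tested against a point or intersected with farm polygons.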

As a test run to determine how to automate the linkage of the locality name provided by the ESI data to farm records listed in the Surveyor General data, 13 distinct localities in the district of Beaufort West were selected (Table 5). According to the alphanumeric data, all these localities fall on the same map sheet except for the Winterberg (Gryskop) locality. Of the 13 localities, only seven were matched to the spatial data and of these only two localities fell on the correct map sheet,16 indicating it would be difficult to automate the linking process.



Another test was run to determine if 'selection by map sheet' could be used as a method to link alphanumeric data to spatial data. Map sheet 3123DD was randomly selected and alphanumeric records of the ESI collection located on this map sheet were selected, returning 41 records. These records were then queried such that only distinct localities would be returned, which resulted in 38 localities. As locality name should correspond to farm name, it follows that there should be 38 farms which intersect with this map sheet. A query was performed to select all the farms which lie wholly or partly on this map sheet and resulted in 16 farms, fewer than half the number of distinct localities. As the method of using locality name, farm name and farm centroid was not effective (because neither locality name nor farm name matched the government farm name), an alternative linkage solution needed to be created.

The database of Iziko South African Museum, which contains both locality names and formal government farm names, was used as the linkage mechanism. For each collection, a query for distinct records of specimens was run. Results from this query were input int