BugFlow

a community resource for entomological collection digitization

Module 2C: In-house data cleanup

Module Purpose:

This module is for a digitization manager reviewing specimen data that have already been captured. These tasks are best performed either as or before data are released to the public, or after a particular digitization project wraps up. The module produces a list of digitization records and errors to fix and, ideally, several metrics on the types of data issues found, which can be used to refine training materials or data capture workflows.

Module Keywords:

data management, edits, publication, quality assurance, review

| TaskID | Task Name | Explanations and Comments | Resources |
| --- | --- | --- | --- |
| T1 | Identify and gather data to check | Gather specimen records (via spreadsheets or within the collection database), images, and any other digitization metadata needing review. | |
| T2 | Verify locality/collection event data | Using manual inspection, custom scripts, or built-in tools, examine locality information for missing fields, spelling errors, and geographic mismatches (e.g., the listed county is not part of the listed state), and check GPS data to make sure each point falls within the stated geography. Verify date formats (e.g., date collected), distance units (e.g., elevation in meters), and any other data standardizations. Produce a detailed list of data errors and specimen records requiring edits. | |
| T3 | Assign or fix specimen data issues | Apply individual or batch edits to any issues discovered. If an issue cannot be fixed from the data alone, flag the record and assign it to a person for further editing after reviewing specimens or other data. | |
| T4 | Verify image metadata | Check that images have all metadata prescribed by the imaging tasks in Module 4. In particular, verify that catalog numbers or other identifiers used to link image files to specimen records are correct and intact. If applicable, verify that image formats are acceptable for upload to the data repository. | See Module 4 |
| T5 | Assign or fix image data issues | Apply individual or batch edits to any issues discovered. If an issue cannot be fixed from the data alone, flag the record and assign it to a person for further editing after reviewing specimens or other data. | |
| T6 | Verify identification/taxonomic data | Using manual inspection, custom scripts, or built-in tools, examine the determination information for errors. Be sure that the information matches both the curatorial and collection location details and the taxonomic authorities for the group. This step may use a taxonomic thesaurus or database to verify taxonomic names. Verify consistency of data (e.g., scientific name authorship with year, or following a particular taxonomic treatment for a group). | |
| T7 | Assign or fix taxonomic issues | Apply individual or batch edits to any issues discovered. If an issue cannot be fixed from the data alone, flag the record and assign it to a person for further editing after reviewing specimens or other data. | |
| T8 | Update and track processing status | Specimen records and images should have their processing status appropriately flagged within the database or whatever management software is being used. This may include terms such as “Reviewed” or “Needs review”. Most collection management software has a specific field for this, but status can be tracked manually via spreadsheets or other mechanisms if needed. | |
| T9 | Assess categories and sources of data errors | Errors should not only be fixed but also examined for ways they can be avoided in the future. Some key items to consider are: (1) updating or reexamining training materials for/with digitization personnel, (2) amending upstream digitization workflows, and (3) amending physical collection curation to facilitate digitization. | |
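
The checks described in T2 could be scripted along the following lines. This is a minimal sketch, not a BugFlow-prescribed tool: the field names loosely follow Darwin Core terms, and the `COUNTIES_BY_STATE` and `STATE_BOUNDS` lookups are tiny hypothetical stand-ins for a real gazetteer. The bounding box values are rough and illustrative only.

```python
from datetime import date

# Hypothetical reference data; a real review would use a full gazetteer.
COUNTIES_BY_STATE = {"Arizona": {"Pima", "Cochise"}}
# Rough bounding boxes as (min_lat, max_lat, min_lon, max_lon); illustrative only.
STATE_BOUNDS = {"Arizona": (31.3, 37.0, -114.9, -109.0)}
# Assumed field names, loosely following Darwin Core terms.
REQUIRED_FIELDS = ("catalogNumber", "stateProvince", "county", "eventDate")


def check_record(rec):
    """Return a list of (catalogNumber, category, message) tuples for one record."""
    cat_num = rec.get("catalogNumber", "")
    errors = []

    # Missing or blank required fields.
    for field in REQUIRED_FIELDS:
        if not str(rec.get(field, "")).strip():
            errors.append((cat_num, "missing_field", f"{field} is empty"))

    # Listed county should belong to the listed state.
    state, county = rec.get("stateProvince", ""), rec.get("county", "")
    if state in COUNTIES_BY_STATE and county and county not in COUNTIES_BY_STATE[state]:
        errors.append((cat_num, "geographic_mismatch", f"{county} not in {state}"))

    # Collection date should parse as an ISO 8601 calendar date.
    event_date = str(rec.get("eventDate", "")).strip()
    if event_date:
        try:
            date.fromisoformat(event_date)
        except ValueError:
            errors.append((cat_num, "date_format", f"{event_date!r} is not an ISO 8601 date"))

    # Coordinates should fall inside the stated geography.
    lat, lon = rec.get("decimalLatitude"), rec.get("decimalLongitude")
    if state in STATE_BOUNDS and lat not in (None, "") and lon not in (None, ""):
        try:
            lat_f, lon_f = float(lat), float(lon)
        except ValueError:
            errors.append((cat_num, "coordinate_format", f"({lat}, {lon}) is not numeric"))
        else:
            min_lat, max_lat, min_lon, max_lon = STATE_BOUNDS[state]
            if not (min_lat <= lat_f <= max_lat and min_lon <= lon_f <= max_lon):
                errors.append((cat_num, "coordinates_outside_state",
                               f"({lat}, {lon}) outside {state}"))

    return errors


# Made-up example records, for illustration only.
records = [
    {"catalogNumber": "BF-0001", "stateProvince": "Arizona", "county": "Pima",
     "eventDate": "2019-07-04", "decimalLatitude": 32.2, "decimalLongitude": -110.9},
    {"catalogNumber": "BF-0002", "stateProvince": "Arizona", "county": "Travis",
     "eventDate": "07/04/2019", "decimalLatitude": 30.3, "decimalLongitude": -97.7},
]
error_list = [err for rec in records for err in check_record(rec)]
```

Each entry in `error_list` names the record, an error category, and a short message, so the same list can drive the edits in T3 and the error assessment in T9. A bounding box is only a coarse test; a point-in-polygon check against real boundaries would be more precise.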

Essential Training:

Define metrics that can be measured to assess success of workflows using this module

Module Metrics, Costing, and Reporting:

Define metrics that can be measured to assess success of workflows using this module (reference specific TaskIDs).
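
As one possible starting point for such metrics, the errors found in T2, T4, and T6 could be tallied by category to support the assessment in T9. The sketch below assumes the error list is kept as (catalog number, category, message) tuples. That format, the function name `summarize_errors`, and the sample values are illustrative assumptions rather than anything defined by BugFlow.

```python
from collections import Counter


def summarize_errors(error_list, records_reviewed):
    """Tally errors by category and compute the share of reviewed records flagged."""
    by_category = Counter(category for _, category, _ in error_list)
    # A record counts once, however many errors it has.
    flagged = {cat_num for cat_num, _, _ in error_list}
    rate = len(flagged) / records_reviewed if records_reviewed else 0.0
    return {
        "errors_by_category": dict(by_category),
        "records_flagged": len(flagged),
        "flagged_rate": rate,
    }


# Made-up sample errors, for illustration only.
sample_errors = [
    ("BF-0002", "geographic_mismatch", "Travis not in Arizona"),
    ("BF-0002", "date_format", "'07/04/2019' is not an ISO 8601 date"),
    ("BF-0007", "date_format", "'4 July 2019' is not an ISO 8601 date"),
]
summary = summarize_errors(sample_errors, records_reviewed=50)
```

Comparing these counts between digitization projects, or before and after a change to training materials, may indicate whether an upstream fix reduced a given type of error.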

Outreach Opportunities:

List outreach opportunities that arise in workflows using this module (reference specific TaskIDs).

Exemplar Workflows:

List of exemplar workflows organized by database type.

Discussion:

No discussion yet. Open an issue and reference this module to start discussion.
