BugFlow


a community resource for entomological collection digitization


Module 1D: Proactive Data Capture: In Museum

Module Purpose:

This module outlines which data to capture from specimens (and associated labels or field books) in a museum, including metadata not explicitly included on labels.

Module Keywords:

pre-digitization, transcription, metadata

| TaskID | Task Name | Explanations and Comments | Resources |
| --- | --- | --- | --- |
| T1 | Gather representative specimens | Include specimens that range from very little data to very large amounts of data. Typically this means very old specimens (data-poor) and very new specimens (data-rich). | Specimen handling; Label handling |
| T2 | Gather associated para-collection materials | Find and include in your analysis any para-collection materials, including field books, accession books, non-digital photographs, and legacy databases. | Collection-specific historical & legacy organization system(s) |
| T3 | Label field discrimination | Identify typical groups of fields that are included on labels associated with specimens or para-collections. Generally, these fall into “Collecting event” (DWC Occurrence, DWC Locality, and DWC Event), “Identification”, and “Other” (fields that do not fit DWC terms or otherwise need to be recorded). | Darwin Core; Label handling |
| T4 | Collection field discrimination | Identify typical fields that need to be recorded that are implied by the specimen or para-collections but are not included on the label. This includes fields such as how the specimen is prepared (DWC “preparations”), the number of specimens (DWC “organismQuantity” or “individualCount”), DWC “disposition”, etc. | |
| T5 | Field naming and controlled vocabulary | Decide on the format, names, and controlled vocabulary of these fields. This includes deciding whether to include verbatim transcriptions as separate fields, whether to split certain fields such as date and time into multiple fields, etc. When in doubt, follow the recommendations of DWC or do what someone else has done. | Darwin Core |
| T6 | Image capture decision | Decide whether images will be captured from specimens or labels. If images are not going to be captured, skip to T8. | Imaging |
| T7 | Image metadata capture decision | Decide what metadata will be captured related to images. Consider following best practices for image metadata. Generally this includes who captured the images, who edited them, when they were captured, and what project they are related to. | Dublin Core |
| T8 | Field assessment | Do T1-T7 for a variety of specimens of various data quality and check whether any fields have been missed. | |
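
The field assessment in T8 can be run as a small tally over trial transcriptions. The sketch below is only an illustration: the draft field list, the `assess_fields` helper, and the two sample records are hypothetical, and only the field names themselves are taken from Darwin Core terms. It counts how often each draft field is filled and reports any field that a transcriber recorded but that is absent from the draft list.

```python
# Hypothetical draft field list. The names are Darwin Core terms,
# but which ones a collection adopts is a local decision.
DRAFT_FIELDS = [
    "catalogNumber",
    "verbatimEventDate",  # date exactly as written on the label
    "eventDate",          # interpreted date in ISO 8601 form
    "verbatimLocality",
    "recordedBy",
    "scientificName",
    "preparations",       # implied by the specimen, not on the label
    "individualCount",
    "disposition",
]


def assess_fields(records, draft_fields=DRAFT_FIELDS):
    """Tally filled draft fields and fields missing from the draft list.

    records: list of dicts, one per trial-transcribed specimen.
    Returns (fill_counts, missed), both dicts mapping field name to count.
    """
    fill_counts = {name: 0 for name in draft_fields}
    missed = {}
    for record in records:
        for name, value in record.items():
            if value in (None, ""):
                continue  # empty values do not count as captured data
            if name in fill_counts:
                fill_counts[name] += 1
            else:
                missed[name] = missed.get(name, 0) + 1
    return fill_counts, missed


# Invented sample records: one data-poor, one data-rich.
trial_records = [
    {
        "catalogNumber": "EX-0001",
        "verbatimLocality": "Ariz.",
        "preparations": "pinned",
        "individualCount": 1,
    },
    {
        "catalogNumber": "EX-0002",
        "verbatimEventDate": "12.vii.2015",
        "eventDate": "2015-07-12",
        "verbatimLocality": "Arizona, Cochise Co., 5 mi W Portal",
        "recordedBy": "A. Collector",
        "scientificName": "Apis mellifera",
        "preparations": "pinned",
        "individualCount": 1,
        "disposition": "in collection",
        "samplingProtocol": "Malaise trap",  # not in the draft list
    },
]

fill_counts, missed = assess_fields(trial_records)
print("Filled:", fill_counts)
print("Missing from draft list:", missed)
```

In this sample, `missed` contains `samplingProtocol`, which suggests adding it to the draft list before digitization starts. Draft fields that stay at zero across many representative specimens may be candidates for removal.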

Essential Training:

Specimen handling
Label handling
Data recording best practices
General collections organization
Collection-specific historical & legacy organization system(s)
Collection oddities, quirks
Darwin Core
Dublin Core

Module Metrics, Costing, and Reporting:

Number of records per hour
Number of images per hour
Number of specimens damaged during digitization
Number of field book pages captured per hour
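
The hourly metrics above can be derived from a simple session log. The following is a minimal sketch, assuming each session records counts and elapsed minutes; the `rate_per_hour` helper, the log keys, and the numbers are all hypothetical.

```python
def rate_per_hour(count, minutes):
    """Convert a count collected over `minutes` into a per-hour rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 60 / minutes


# Invented log for one 90-minute digitization session.
session = {
    "records": 45,
    "images": 90,
    "field_book_pages": 12,
    "specimens_damaged": 0,
    "minutes": 90,
}

report = {
    "records_per_hour": rate_per_hour(session["records"], session["minutes"]),
    "images_per_hour": rate_per_hour(session["images"], session["minutes"]),
    "field_book_pages_per_hour": rate_per_hour(
        session["field_book_pages"], session["minutes"]
    ),
    # Damage is reported as a raw count, not a rate.
    "specimens_damaged": session["specimens_damaged"],
}

for metric, value in report.items():
    print(f"{metric}: {value}")
```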

Outreach Opportunities:

T2: Transcription and images of para-collection materials can be crowdsourced
T2: Gathering individual “projects” together can lead to a great story
T6: Images can be used in outreach

Exemplar Workflows:

See the BugFlow workflows folder.

