Creating and Publishing Metadata in Support of Geospatial One-Stop and the NSDI
Metadata has long been promoted by the Federal Geographic Data Committee (FGDC), state GIS coordinators, and project managers as a means to:
- preserve organizational data investments
- instill data accountability and liability, and
- facilitate data sharing.
With the implementation of the Geospatial One-Stop (GOS) federal E-gov initiative, metadata is established as the official language for national data development and exchange. To maximize the value of your metadata investment and become an active participant in the GOS Geodata.gov portal, consider the following:
Metadata Content
Going Beyond the Minimum
Minimal metadata is minimally useful. If you limit your metadata to the mandatory elements of the Content Standard for Digital Geospatial Metadata (CSDGM), you have limited it to the elements common to all data types and have missed the opportunity to capture what is unique to the data set. Metadata producers must determine all metadata elements necessary to adequately characterize the data set and provide complete, current information for each element. When you create metadata that goes beyond the minimum, you create a data management resource that serves both your community and your own data management efforts.
Multiple Online_Linkage Values
As a ‘repeatable’ element, Online_Linkage (Citation Information) is used to provide access to a variety of data download, data clearinghouse, and web-mapping services. Use this field to fully represent your geospatial data access and distribution capabilities by providing complete URLs and necessary information to indicate the nature of the weblink using the following style guidance:
- OGC Web Map Service (WMS) links should include, at a minimum, a ‘getmap’ request with a layer name, version, preferred image format, and preferred SRS: http://server/service?REQUEST=getmap&VERSION=1.1.0&LAYERS=roads&FORMAT=image/gif&SRS=EPSG:4326
- ArcIMS “Image” services use a URL-like request of the form http://<server>/image/<service_name>. If you paste this request into a browser, you will not connect to an ArcIMS server, because ArcIMS does not accept this style of request; however, the request contains enough information for geodata.gov to connect to the ArcIMS service. Such a link will be assigned as a Live Map (ArcXML Image service), with <server> as the server URL and <service_name> as the ArcIMS service name. The sub-path “/image/” must be present in the URL.
- direct download sites include URLs that start with either ftp:// or http:// and point to filenames with .zip, .tar, .tgz, .gz, .dxf, or .e00 extensions.
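The style guidance above is specific enough to check mechanically before you publish. The sketch below is hypothetical: its rules and type labels are inferred only from the three link styles described here and are not geodata.gov's actual Data Type Assignment logic. Matching query parameter names case-insensitively is also an assumption.

```python
from urllib.parse import urlparse, parse_qs

# File extensions the guidance lists for direct-download links.
DOWNLOAD_EXTENSIONS = (".zip", ".tar", ".tgz", ".gz", ".dxf", ".e00")
# Parameters a WMS 'getmap' link should carry at a minimum.
WMS_REQUIRED = ("LAYERS", "VERSION", "FORMAT", "SRS")


def classify_online_linkage(url):
    """Return an illustrative link type for one Online_Linkage URL."""
    parts = urlparse(url)
    # Upper-case parameter names so REQUEST=getmap and request=GetMap both match.
    query = {name.upper(): values for name, values in parse_qs(parts.query).items()}

    if query.get("REQUEST", [""])[0].lower() == "getmap":
        missing = [name for name in WMS_REQUIRED if name not in query]
        if missing:
            return "wms (missing: " + ", ".join(missing) + ")"
        return "wms"
    if parts.scheme == "http" and "/image/" in parts.path:
        return "arcims image service"
    if parts.scheme in ("ftp", "http") and parts.path.lower().endswith(DOWNLOAD_EXTENSIONS):
        return "direct download"
    return "unclassified"
```

Running each Online_Linkage value in a record through a check like this flags WMS links that lack one of the minimum parameters and links that match none of the three styles.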
Theme_Keywords Using ISO Topic Categories
The more robust your theme keyword list, the more likely your data set can be located by others (and by you). Data discovery is further enabled by the use of standardized keyword lists and vocabularies (Theme_Keyword_Thesaurus). The FGDC and GOS use the International Organization for Standardization Metadata Standard (ISO 19115) Topic Categories to organize data and services into data categories. A list of Topic Categories with definitions and examples is provided in Appendix A. When creating new metadata records, include one or more Topic Categories as Theme_Keywords and cite the Theme_Keyword_Thesaurus as ‘ISO 19115 Topic Category’. Use the exact format of the Topic Category values (e.g., “biota”) unless your metadata creation tool provides a pick list of ISO-based themes.
For existing metadata records, develop a strategy for incorporating one or more Topic Categories into each metadata record. For homogeneous metadata collections (those whose records all relate to a single Topic Category term), a simple script can be written to insert the Topic Category term into the Theme_Keyword element of each record. For heterogeneous metadata collections (those whose records relate to a variety of subjects), the task is more challenging. For collections with few metadata records, the Topic Category terms can be inserted manually into the Theme_Keyword element. For more extensive collections, all existing Theme_Keywords can be output to a listing and a lookup table developed that relates each set of Theme_Keywords to the applicable Topic Categories. Once the lookup table is developed, a script can be written to insert all applicable Topic Categories into the Theme_Keyword element of each record.
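The lookup-table approach for heterogeneous collections can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions: the `LOOKUP` table is hypothetical (its rows are adapted from the examples in Appendix A), records are plain CSDGM XML without namespaces, and new Topic Categories go into a separate theme block whose thesaurus is ‘ISO 19115 Topic Category’. For a homogeneous collection, the lookup could simply be replaced by one fixed category.

```python
import xml.etree.ElementTree as ET

THESAURUS = "ISO 19115 Topic Category"

# Hypothetical lookup table relating local Theme_Keywords to ISO Topic Categories.
LOOKUP = {
    "business districts": {"boundaries", "economy"},
    "toxic release inventory": {"environment", "health"},
    "soil fertility": {"geoscientificInformation", "farming"},
    "wetlands": {"biota", "inlandWaters"},
}


def add_topic_categories(xml_text, lookup=LOOKUP):
    """Add the ISO Topic Categories implied by a record's existing themekeys."""
    root = ET.fromstring(xml_text)
    keywords = root.find("idinfo/keywords")
    if keywords is None:
        raise ValueError("record has no idinfo/keywords element")

    # Every category implied by the record's current theme keywords.
    wanted = set()
    for key in keywords.iterfind("theme/themekey"):
        wanted |= lookup.get((key.text or "").strip().lower(), set())

    # Reuse an existing ISO Topic Category theme block if there is one.
    iso_theme = next(
        (t for t in keywords.findall("theme")
         if (t.findtext("themekt") or "").strip() == THESAURUS),
        None,
    )
    present = set()
    if iso_theme is not None:
        present = {(k.text or "").strip() for k in iso_theme.findall("themekey")}
    missing = sorted(wanted - present)

    if missing and iso_theme is None:
        iso_theme = ET.Element("theme")
        ET.SubElement(iso_theme, "themekt").text = THESAURUS
        # CSDGM lists Theme blocks before Place, Stratum, and Temporal,
        # so insert the new block right after the last existing <theme>.
        tags = [child.tag for child in keywords]
        last_theme = max((i for i, tag in enumerate(tags) if tag == "theme"), default=-1)
        keywords.insert(last_theme + 1, iso_theme)
    for category in missing:
        ET.SubElement(iso_theme, "themekey").text = category
    return ET.tostring(root, encoding="unicode")
```

Because categories already present are skipped, running the script twice over the same collection leaves the records unchanged the second time.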
If you are registering your metadata collection with geodata.gov so that your metadata can be harvested into the geodata.gov portal, the metadata publisher registration process allows you to edit or associate a lookup table that automatically assigns an ISO Topic Category value when the validator encounters an equivalent term in your Theme_Keyword field. However, if you use this geodata.gov publisher registration option, the Topic Category term will not be added to your metadata record; it is simply used to direct your information to the geodata.gov channel dedicated to the associated Topic Category. While this enables your metadata to be better utilized within geodata.gov, you lose the additional benefit of preparing your records for easy translation when the international (ISO) metadata standard is eventually adopted.
Metadata Creation Strategies
Create and Use Templates
Organizations are encouraged to create metadata templates that establish core content for all organizationally produced metadata. Templates should:
- outline all metadata elements deemed mandatory by the organization
- provide standardized language for access, use, and liability statements
- provide definitions and domains for standardized data layers
- establish standards and guidelines for metadata production and publishing.
In addition to documenting existing data resources, NSDI participants are encouraged to use the CSDGM to document planned data acquisitions.
When creating metadata for planned data:
- indicate the Status element as ‘planned’
- provide a robust Abstract, including data type (vector, image, raster…), geographic location, and specifications (scale, film type, bands…)
- include a rich set of Theme / Place_Keywords, including the ISO Topic Categories as Theme Keyword values, as described above
The documentation of planned data acquisitions enables developers to leverage data development investments via partnerships. In FY05, all federal agencies are required to create metadata for data acquisition plans estimated at $500K or greater.
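The planned-data guidance above can be turned into a quick pre-publication check. This is a hypothetical sketch: it uses the CSDGM tags listed in Appendix B (Status is checked through its Progress element), it can only confirm that an Abstract exists rather than that it is robust, and it assumes ‘ISO 19115 Topic Category’ as the thesaurus name, following Appendix A.

```python
import xml.etree.ElementTree as ET

ISO_THESAURUS = "ISO 19115 Topic Category"


def check_planned_record(xml_text):
    """Return a list of ways a planned-data record departs from the guidance."""
    root = ET.fromstring(xml_text)
    problems = []
    # Status is recorded through its Progress element (see Appendix B).
    if (root.findtext("idinfo/status/progress") or "").strip().lower() != "planned":
        problems.append("Progress is not 'planned'")
    # Only presence can be checked; robustness of the Abstract needs a human reader.
    if not (root.findtext("idinfo/descript/abstract") or "").strip():
        problems.append("Abstract is empty")
    iso_keys = [
        key.text
        for theme in root.iterfind("idinfo/keywords/theme")
        if (theme.findtext("themekt") or "").strip() == ISO_THESAURUS
        for key in theme.iterfind("themekey")
        if (key.text or "").strip()
    ]
    if not iso_keys:
        problems.append("no ISO 19115 Topic Category theme keyword")
    if not (root.findtext("idinfo/keywords/place/placekey") or "").strip():
        problems.append("no Place_Keyword")
    return problems
```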
Publish Metadata via geodata.gov
To make your metadata records available via Geodata.gov you must either:
- publish your metadata collection to a metadata distribution server from which Geodata.gov can harvest,
- directly upload your XML formatted metadata to geodata.gov, or
- create your metadata online using the geodata.gov metadata publication tool.
Metadata Harvesting
What is metadata harvesting?
Metadata harvesting is an automated, scheduled process for collecting new and updated metadata from a wide variety of GIS metadata sources. Harvesting allows geodata.gov to synchronize its metadata with publishers' metadata. If you participate in metadata harvesting, make any updates to your metadata in your own metadata repository; geodata.gov will obtain the updates through harvesting.
If you have registered on geodata.gov as a publisher and would like to participate in harvesting, you need to update your publisher information.
Currently geodata.gov can harvest FGDC-compliant metadata using four types of harvesting protocols: (1) Z39.50 metadata clearinghouse node, (2) ArcIMS metadata service, (3) Web Accessible Folder (WAF), and (4) Open Archives Initiative (OAI) metadata service.
Requirements
For a metadata record to be successfully harvested by geodata.gov, the following must be present in some form:
- Document unique ID - A document unique ID in each metadata record is required to determine whether a document is new to geodata.gov. If your metadata clearinghouse is a Z39.50 type, you need to verify that a document unique ID has been implemented in each metadata document. For Isite, check the Isite distribution to obtain the new release of Isite (Isite Vers. 2.10), which implements document unique ID and update date.
Document unique ID is not an issue if your repository is a WAF, an OAI metadata service, or an ArcIMS metadata service, because these services already handle and expose a unique identifier for each document in the collection.
- Update date - Once your repository has been harvested, the next harvest will only look for metadata documents that have been updated since the last harvest date. In all cases the update date should be reflected in the “Metadata Date” field of CSDGM metadata.
- Keywords - Keywords are used to correctly categorize your metadata. geodata.gov uses standard theme keywords as specified in ISO 19115. Without standard keywords, your metadata will still be published and searchable in the geodata.gov repository, but it will not be categorized in one of the data categories.
You can submit standard keywords using one of the following methods:
- Insert a standard keyword as a theme keyword in the metadata
- Provide a lookup table in the harvesting registration process that translates your localized keywords into standard keywords.
- Register on geodata.gov - Before your metadata repository can be harvested, you need to register as a publisher, read and accept the publisher disclaimer, specify the type of harvesting in the publishing registration form, and provide keywords.
Metadata harvesting in geodata.gov is performed in three steps (harvesting, validation, and publishing), with data type assignment taking place during validation:
- Harvesting – based on information provided during registration, geodata.gov will connect to your metadata repository and retrieve all metadata records on the first harvest, or only the records updated since the last harvest date on later harvests. You need to verify that the date of creation or last update is stored in your metadata (Metadata Date).
- Validation – during validation, each metadata record will be examined to determine whether it meets the minimum requirements (see Appendix B). The validation function recognizes only FGDC tags. You can access your validation report via the harvesting history function.
- Publishing – during publishing, all successfully validated metadata will be published in geodata.gov. If the same document (as indicated by its document unique ID) already exists, the existing document will be updated; otherwise the document will be inserted in geodata.gov as a new document. Once the metadata is published, it will be searchable from geodata.gov.
- Data Type Assignment – during validation, metadata records that include Online_Linkage values will be automatically assigned to a specific ‘Data Type’ based upon the URL provided.
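The harvesting, validation, and publishing steps can be pictured with a small in-memory simulation. This is a hypothetical sketch, not geodata.gov's implementation: the dictionaries stand in for your repository and the geodata.gov catalog, their keys stand in for document unique IDs, and `is_valid` is a placeholder for the Appendix B checks. One assumption worth flagging: a Metadata Date recorded only to the year or month (YYYY or YYYYMM) is padded with 9s, so such records are conservatively re-harvested rather than skipped.

```python
import xml.etree.ElementTree as ET


def padded_metadata_date(root):
    """Return Metadata Date (metainfo/metd) as an 8-character string.

    YYYY or YYYYMM values are padded with '9's so they never compare as
    older than a full YYYYMMDD date in the same period. A missing date
    becomes '99999999', so the record is always harvested.
    """
    metd = (root.findtext("metainfo/metd") or "").strip()
    return metd.ljust(8, "9")


def harvest_cycle(repository, catalog, last_harvest, is_valid):
    """Run one harvest against in-memory stand-ins.

    repository:   {unique_id: xml_text} exposed by the publisher
    catalog:      {unique_id: xml_text} of published records, updated in place
    last_harvest: 'YYYYMMDD' of the previous harvest, or None the first time
    is_valid:     callable taking the parsed root, standing in for validation
    """
    inserted, updated, rejected = [], [], []
    for doc_id, xml_text in repository.items():
        root = ET.fromstring(xml_text)
        # Step 1, harvesting: after the first run, skip records whose
        # Metadata Date is older than the last harvest date.
        if last_harvest is not None and padded_metadata_date(root) < last_harvest:
            continue
        # Step 2, validation: rejected records are reported, not published.
        if not is_valid(root):
            rejected.append(doc_id)
            continue
        # Step 3, publishing: an existing unique ID is updated, a new one inserted.
        (updated if doc_id in catalog else inserted).append(doc_id)
        catalog[doc_id] = xml_text
    return inserted, updated, rejected
```

The unique ID decides between update and insert, which is why a repository without document unique IDs cannot be harvested reliably.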
Types of harvesting protocols
geodata.gov supports four types of harvesting protocol:
- Z39.50 Metadata Clearinghouse (http://www.fgdc.gov/clearinghouse/clearinghouse.html). If your repository is a Z39.50 metadata clearinghouse, you need to verify whether a document unique ID and update date are implemented in each metadata document. For Isite, check the Isite distribution location (http://clearinghouse4.fgdc.gov/ftp) to get the Version 2.10 release of Isite, which implements document unique ID and update date. For SMMS GeoConnect, Blue Angel, Compusult MetaManager, and other Z39.50 software providers, contact your distributor for information regarding unique ID and update date capabilities. To register your Z39.50 node to be harvested, you need to provide the URL, port number, and database name.
Note: If your collection contains fewer than 200 records, consider establishing a Web Accessible Folder (described below) as an alternative to implementing the unique ID and update date features.
- ArcIMS Metadata Service (http://www.esri.com/software/arcims/overview.html). If you currently maintain and serve your metadata using an ArcIMS metadata service, you will need to specify the URL, service name, and, if applicable, the username and password to browse metadata.
- Web Accessible Folder (WAF). You can participate in geodata.gov harvesting simply by placing your metadata in XML files in a WAF. A WAF is a directory on the Web whose contents a Web browser can browse. It must not contain a default.html or index.htm file. To register your WAF to be harvested, you only need to register the URL. It is recommended, but not required, that you also include HTML versions of the metadata records within the WAF to support discovery by search engines such as Google.
- Open Archives Initiative (OAI) Metadata Service http://www.openarchives.org/. If you maintain and serve your metadata using the OAI Protocol for Metadata Harvesting, you need to provide the URL, set name, and metadata prefix.
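Two of these protocols are plain HTTP and easy to sketch. The helpers below are illustrative only, not geodata.gov's harvester. `waf_xml_urls` shows why a WAF must expose a browsable listing: a harvester can only discover the XML files through the links in that listing. `oai_list_records_url` builds a standard OAI-PMH ListRecords request from the URL, set name, and metadata prefix you register. The example host names, the set name `coastal`, and the prefix `fgdc` are hypothetical; use the values your OAI provider actually advertises.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def waf_xml_urls(waf_url, listing_html):
    """Return absolute URLs of the .xml files linked from a WAF directory listing.

    waf_url should end with '/' so relative file names resolve inside the folder.
    HTML copies of records (recommended for search engines) are skipped.
    """
    collector = _LinkCollector()
    collector.feed(listing_html)
    urls = []
    for href in collector.hrefs:
        path = href.split("?", 1)[0].split("#", 1)[0]
        if path.lower().endswith(".xml"):
            url = urljoin(waf_url, href)
            if url not in urls:
                urls.append(url)
    return urls


def oai_list_records_url(base_url, metadata_prefix, set_name=None, from_date=None,
                         resumption_token=None):
    """Build an OAI-PMH ListRecords request from the registered URL, set, and prefix."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_name:
            params["set"] = set_name
        if from_date:
            # OAI-PMH dates are YYYY-MM-DD, unlike the CSDGM YYYYMMDD format.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)
```

A WAF with a default.html or index.htm file would return that page instead of the listing, and `waf_xml_urls` would then find no records to harvest.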
Metadata Upload
If you do not have access to any form of metadata distribution server as described above, you can upload your XML formatted metadata records directly to geodata.gov. You must first register at geodata.gov then select the ‘Upload Metadata’ option. Uploads can be done individually by record or as a group in batch mode. Please note that if you utilize the upload option, your published metadata is not linked to your resident metadata in any manner and updates to the metadata record must be uploaded manually as they cannot be automatically updated by geodata.gov.
Metadata Direct Entry
If you do not have access to metadata creation software or an editor, or you have very few records to contribute to geodata.gov, you can use the online metadata creation tool provided at geodata.gov. As with the other metadata publishing options, you must first register as a metadata publisher at the geodata.gov site and then select the ‘Create Metadata’ option. Note that the metadata will then be stored at geodata.gov and all updates must be made via geodata.gov. Also, the geodata.gov online metadata creation tool is intended as an easy-to-use interface for collecting the metadata elements necessary for data discovery. As such, metadata records created using the online tool will be of limited use as data archive and data management resources.
Appendix A
Preparing for the international metadata standard:
Theme Keywords and the ISO Topic Categories
The International Organization for Standardization (ISO) metadata standard (ISO 19115) provides a set of Core metadata elements that must occur in every national profile/implementation. Most of these elements either map to existing CSDGM metadata elements or represent properties of the data that can be determined and populated using a data-integrated metadata tool. Topic Category is the only mandatory element of the ISO core metadata set that requires new information that cannot be directly captured from the data. The following 19 subject headings represent the domain for the Topic Category element.
If your metadata creation software provides a pick list of Topic Category related terms, simply select the pick list terms that apply and the software will insert the related Topic Category Name and/or Code. If creating metadata using the Geodata.gov metadata publisher, you will be asked to select a Primary Theme. The Primary Theme options are based upon the ISO Topic Categories below, but the names have been altered to provide greater context; e.g., the Geodata.gov Primary Theme ‘Cultural, Society, and Demographic’ will be captured in the Theme_Keyword metadata element as ISO Topic Category Name ‘Society’.
If your metadata creation software does not provide a list of subject headings based upon the ISO 19115 Topic Category, include the Topic Category Names (as presented below) as Theme_Keywords and cite your related Theme_Keyword_Thesaurus as ‘ISO 19115 Topic Category’. The FGDC intends to develop CSDGM to ISO translation software that will insert the Topic Category Code when the Topic Category Name is found; however, those wishing to include the Topic Category Code as a Theme_Keyword can do so using the same Theme_Keyword_Thesaurus: ‘ISO 19115 Topic Category’.
Include all pertinent Topic Category Names, e.g.:
business districts = boundaries and economy
toxic release inventory = environment and health
soil fertility = geoscientificInformation and farming
ISO Topic Category
Name Code Definition
farming 001 rearing of animals and/or cultivation of plants
Examples: agriculture, irrigation, aquaculture, plantations, herding, pests and diseases affecting crops and livestock
biota 002 flora and/or fauna in natural environment
Examples: wildlife, vegetation, biological sciences, ecology, wilderness, sea life, wetlands, habitat, biological resources
boundaries 003 legal land descriptions
Examples: political and administrative boundaries, governmental units, marine boundaries, voting districts, school districts, international boundaries
climatologyMeteorologyAtmosphere 004 processes and phenomena of the atmosphere
Examples: cloud cover, weather, climate, atmospheric conditions, climate change, precipitation
economy 005 economic activities, conditions, and employment
Examples: production, labor, revenue, business, commerce, industry, tourism and ecotourism, forestry, fisheries, commercial or subsistence hunting, exploration and exploitation of resources such as minerals, oil and gas
elevation 006 height above or below sea level
Examples: altitude, bathymetry, digital elevation models, slope, derived products, DEMs, TINs
environment 007 environmental resources, protection and conservation
Examples: environmental pollution, waste storage and treatment, environmental impact assessment, monitoring environmental risk, nature reserves, landscape, water quality, air quality, environmental modeling
geoscientificInformation 008 information pertaining to earth sciences
Examples: geophysical features and processes, geology, minerals, sciences dealing with the composition, structure and origin of the earth’s rocks, risks of earthquakes, volcanic activity, landslides, gravity information, soils, permafrost, hydrogeology, groundwater, erosion
health 009 health, health services, human ecology, and safety
Examples: disease and illness, factors affecting health, hygiene, substance abuse, mental and physical health, health services, health care providers, public health
imageryBaseMapsEarthCover 010 base maps
Examples: land/earth cover, topographic maps, imagery, unclassified images, annotations, digital ortho imagery
intelligenceMilitary 011 military bases, structures, activities
Examples: barracks, training grounds, military transportation, information collection
inlandWaters 012 inland water features, drainage systems and characteristics
Examples: rivers and glaciers, salt lakes, water utilization plans, dams, currents, floods and flood hazards, water quality, hydrographic charts, watersheds, wetlands, hydrography
location 013 positional information and services
Examples: addresses, geodetic networks, geodetic control points, postal zones and services, place names, geographic names
oceans 014 features and characteristics of salt water bodies (excluding inland waters)
Examples: tides, tidal waves, coastal information, reefs, maritime, outer continental shelf submerged lands, shoreline
planningCadastre 015 information used for appropriate actions for future use of the land
Examples: land use maps, zoning maps, cadastral surveys, land ownership, parcels, easements, tax maps, federal land ownership status, public land conveyance records
society 016 characteristics of society and culture
Examples: settlements, housing, anthropology, archaeology, education, traditional beliefs, manners and customs, demographic data, tourism, recreational areas and activities, parks, recreational trails, historical sites, cultural resources, social impact assessments, crime and justice, law enforcement, census information, immigration, ethnicity
structure 017 man-made construction
Examples: buildings, museums, churches, factories, housing, monuments, shops, towers, building footprints, architectural and structural plans
transportation 018 means and aids for conveying persons and/or goods
Examples: roads, airports/airstrips, shipping routes, tunnels, nautical charts, vehicle or vessel location, aeronautical charts, railways
utilitiesCommunication 019 energy, water and waste systems and communications infrastructure and services
Examples: hydroelectricity, geothermal, solar and nuclear sources of energy, water purification and distribution, sewage collection and disposal, electricity and gas distribution, data communication, telecommunication, radio, communication networks
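Translating Topic Category Names into Codes, as described earlier in this appendix, reduces to a lookup on the table above. The sketch below is a hypothetical illustration, not the FGDC's planned translation software. It matches names exactly, as the guidance requires, so a variant such as ‘Biota’ or a local term such as ‘wetlands’ is not translated.

```python
# ISO 19115 Topic Category Names and Codes, as listed in the table above.
TOPIC_CATEGORY_CODES = {
    "farming": "001",
    "biota": "002",
    "boundaries": "003",
    "climatologyMeteorologyAtmosphere": "004",
    "economy": "005",
    "elevation": "006",
    "environment": "007",
    "geoscientificInformation": "008",
    "health": "009",
    "imageryBaseMapsEarthCover": "010",
    "intelligenceMilitary": "011",
    "inlandWaters": "012",
    "location": "013",
    "oceans": "014",
    "planningCadastre": "015",
    "society": "016",
    "structure": "017",
    "transportation": "018",
    "utilitiesCommunication": "019",
}


def topic_category_codes(theme_keywords):
    """Return codes for Theme_Keywords that exactly match a Topic Category Name.

    Matching is case-sensitive; duplicates are dropped while the order of
    first appearance is kept.
    """
    codes = []
    for keyword in theme_keywords:
        code = TOPIC_CATEGORY_CODES.get(keyword.strip())
        if code is not None and code not in codes:
            codes.append(code)
    return codes
```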
Appendix B
Required FGDC XML Tags and Validation Rules
- Data Originator
Tag: /metadata/idinfo/citation/citeinfo/origin/
Rule: not null
Domain: “unknown” or free text
- Data Title
Tag: /metadata/idinfo/citation/citeinfo/title/
Rule: not null
Domain: free text
- Abstract
Tag: /metadata/idinfo/descript/abstract/
Rule: not null
Domain: free text
- Progress
Tag: /metadata/idinfo/status/progress
Rule: not null
Domain: “complete”, “in work”, “planned”
- West Bounding Coordinate
Tag: /metadata/idinfo/spdom/bounding/westbc/
Rule: not null
Domain: number between (-180.00) and (180.00)
- East Bounding Coordinate
Tag: /metadata/idinfo/spdom/bounding/eastbc/
Rule: not null
Domain: number between (-180.00) and (180.00)
- North Bounding Coordinate
Tag: /metadata/idinfo/spdom/bounding/northbc/
Rule: not null
Domain: number between (-90.00) and (90.00)
- South Bounding Coordinate
Tag: /metadata/idinfo/spdom/bounding/southbc/
Rule: not null
Domain: number between (-90.00) and (90.00)
- Theme Keyword
Tag: /metadata/idinfo/keywords/theme/themekey/
Rule: not null
Domain: free text
- Metadata Contact Organization
Tag: /metadata/metainfo/metc/cntinfo/cntorgp/cntorg/
Rule: not null if Metadata Contact Person is null
Domain: free text
- Metadata Contact Person
Tag: /metadata/metainfo/metc/cntinfo/cntperp/cntper/
Rule: not null if Metadata Contact Organization is null
Domain: free text
- Metadata Contact Address City
Tag: /metadata/metainfo/metc/cntinfo/cntaddr/city/
Rule: not null
Domain: free text
- Metadata Contact Address State or Province
Tag: /metadata/metainfo/metc/cntinfo/cntaddr/state/
Rule: not null
Domain: free text
- Metadata Contact Address Postal Code
Tag: /metadata/metainfo/metc/cntinfo/cntaddr/postal/
Rule: not null
Domain: free text
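The required-element rules above translate directly into checks on a parsed record. This sketch is illustrative, not geodata.gov's validator. Paths are written relative to the root `<metadata>` element without trailing slashes, and the contact address paths use `metc`, the CSDGM Metadata Contact element. Comparing the Progress domain case-insensitively is an assumption (the Appendix C sample uses ‘Complete’), and only the first Theme Keyword is tested for content.

```python
import xml.etree.ElementTree as ET

# "not null" rules from the list above; paths are relative to <metadata>.
NOT_NULL = {
    "Data Originator": "idinfo/citation/citeinfo/origin",
    "Data Title": "idinfo/citation/citeinfo/title",
    "Abstract": "idinfo/descript/abstract",
    "Progress": "idinfo/status/progress",
    "Theme Keyword": "idinfo/keywords/theme/themekey",
    "Metadata Contact Address City": "metainfo/metc/cntinfo/cntaddr/city",
    "Metadata Contact Address State or Province": "metainfo/metc/cntinfo/cntaddr/state",
    "Metadata Contact Address Postal Code": "metainfo/metc/cntinfo/cntaddr/postal",
}
# Bounding coordinates: (path, lowest allowed value, highest allowed value).
BOUNDS = {
    "West Bounding Coordinate": ("idinfo/spdom/bounding/westbc", -180.0, 180.0),
    "East Bounding Coordinate": ("idinfo/spdom/bounding/eastbc", -180.0, 180.0),
    "North Bounding Coordinate": ("idinfo/spdom/bounding/northbc", -90.0, 90.0),
    "South Bounding Coordinate": ("idinfo/spdom/bounding/southbc", -90.0, 90.0),
}
PROGRESS_DOMAIN = {"complete", "in work", "planned"}


def _text(root, path):
    return (root.findtext(path) or "").strip()


def validate_required(xml_text):
    """Return a list of rule violations; an empty list means the record passes."""
    root = ET.fromstring(xml_text)
    errors = [f"{name} is null" for name, path in NOT_NULL.items() if not _text(root, path)]

    progress = _text(root, NOT_NULL["Progress"])
    if progress and progress.lower() not in PROGRESS_DOMAIN:
        errors.append(f"Progress '{progress}' is outside its domain")

    for name, (path, low, high) in BOUNDS.items():
        value = _text(root, path)
        if not value:
            errors.append(f"{name} is null")
            continue
        try:
            number = float(value)
        except ValueError:
            errors.append(f"{name} '{value}' is not a number")
            continue
        if not low <= number <= high:
            errors.append(f"{name} {number} is outside [{low}, {high}]")

    # Organization and Person are each optional, but not both at once.
    org = _text(root, "metainfo/metc/cntinfo/cntorgp/cntorg")
    person = _text(root, "metainfo/metc/cntinfo/cntperp/cntper")
    if not org and not person:
        errors.append("Metadata Contact Organization and Person are both null")
    return errors
```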
Insertions
- Publication Date
Tag: /metadata/idinfo/citation/citeinfo/pubdate/
Rule: if null, insert ‘unknown’
Domain: “unknown”, “unpublished material”, or free date
Date Format: YYYYMMDD (YYYY minimum)
- Purpose
Tag: /metadata/idinfo/descript/purpose/
Rule: if null, insert ‘none provided’
Domain: free text
- Time Period of Content: Single Date
Tag: /metadata/idinfo/timeperd/timeinfo/sngdate/caldate
Rule: if null and if Range of Dates and Multiple Dates are null, insert ‘unknown’
Domain: “unknown” or free date
Date Format: YYYYMMDD (YYYY minimum)
- Time Period of Content: Range of Dates, Beginning Date
Tag: /metadata/idinfo/timeperd/timeinfo/rngdates/begdate/
Rule: if null and Ending Date is not null, insert ‘unknown’
Domain: “unknown” or free date
Date Format: YYYYMMDD (YYYY minimum)
- Time Period of Content: Range of Dates, Ending Date
Tag: /metadata/idinfo/timeperd/timeinfo/rngdates/enddate/
Rule: if null and Beginning Date is not null, insert ‘unknown’
Domain: “unknown” or free date
Date Format: YYYYMMDD (YYYY minimum)
- Currentness Reference
Tag: /metadata/idinfo/timeperd/current/
Rule: if null, insert ‘unknown’
Domain: free text
- Maintenance and Update Frequency
Tag: /metadata/idinfo/status/update
Rule: if null, insert ‘unknown’
Domain: free text
- Theme Keyword Thesaurus
Tag: /metadata/idinfo/keywords/theme/themekt/
Rule: if null, insert ‘none’
Domain: free text
- Access Constraints
Tag: /metadata/idinfo/accconst/
Rule: if null, insert ‘unknown’
Domain: free text
- Use Constraints
Tag: /metadata/idinfo/useconst/
Rule: if null, insert ‘unknown’
Domain: free text
- Metadata Contact Address Type
Tag: /metadata/metainfo/metc/cntinfo/cntaddr/addrtype/
Rule: if null, insert ‘unknown’
Domain: free text
- Metadata Contact Phone Number
Tag: /metadata/metainfo/metc/cntinfo/cntvoice/
Rule: if null, insert ‘unknown’
Domain: free text
- Metadata Date
Tag: /metadata/metainfo/metd
Rule: if null, insert harvest date
Domain: free date
Date Format: YYYYMMDD (YYYY minimum)
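The insertion rules can be sketched as a pass that fills blank or missing elements with their defaults. This is a hypothetical sketch, not geodata.gov's code. It appends missing elements at the end of their parent (except `themekt`, which it places first in its theme block), so a real tool would also need to restore full CSDGM element order. The `harvest_date` parameter is an assumption standing in for the date of the harvest.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Simple "if null, insert" rules from the list above (paths relative to <metadata>).
DEFAULTS = {
    "idinfo/citation/citeinfo/pubdate": "unknown",
    "idinfo/descript/purpose": "none provided",
    "idinfo/timeperd/current": "unknown",
    "idinfo/status/update": "unknown",
    "idinfo/accconst": "unknown",
    "idinfo/useconst": "unknown",
    "metainfo/metc/cntinfo/cntaddr/addrtype": "unknown",
    "metainfo/metc/cntinfo/cntvoice": "unknown",
}


def _element(parent, path):
    """Return the element at path, appending any missing steps as new children."""
    node = parent
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node


def _fill(parent, path, value):
    """Set the element's text to value if it is missing or blank."""
    element = _element(parent, path)
    if not (element.text or "").strip():
        element.text = value


def apply_insertions(xml_text, harvest_date=None):
    root = ET.fromstring(xml_text)
    for path, value in DEFAULTS.items():
        _fill(root, path, value)

    # Theme Keyword Thesaurus: CSDGM puts themekt first in each theme block.
    for theme in root.findall("idinfo/keywords/theme"):
        thesaurus = theme.find("themekt")
        if thesaurus is None:
            thesaurus = ET.Element("themekt")
            theme.insert(0, thesaurus)
        if not (thesaurus.text or "").strip():
            thesaurus.text = "none"

    # Time Period of Content.
    timeinfo = _element(root, "idinfo/timeperd/timeinfo")
    begin = (timeinfo.findtext("rngdates/begdate") or "").strip()
    end = (timeinfo.findtext("rngdates/enddate") or "").strip()
    if begin or end:
        # One end of the range is known: fill in the other as 'unknown'.
        _fill(timeinfo, "rngdates/begdate", "unknown")
        _fill(timeinfo, "rngdates/enddate", "unknown")
    elif timeinfo.find("rngdates") is None and timeinfo.find("mdattim") is None:
        _fill(timeinfo, "sngdate/caldate", "unknown")

    # Metadata Date: fall back to the harvest date (today if none given).
    _fill(root, "metainfo/metd", harvest_date or date.today().strftime("%Y%m%d"))
    return ET.tostring(root, encoding="unicode")
```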
Appendix C
Sample XML Metadata Record with FGDC Essential Elements
<?xml version="1.0" encoding="ISO-8859-1" ?>
<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Louisiana State University Coastal Studies Institute</origin>
        <pubdate>20010907</pubdate>
        <title>Geomorphology and Processes of Land Loss in Coastal Louisiana, 1932 – 1990</title>
      </citeinfo>
    </citation>
    <descript>
      <abstract>A raster GIS file that identifies the land loss process and geomorphology associated with each 12.5 meter pixel of land loss between 1932 and 1990. Land loss processes are organized into a hierarchical classification system that includes subclasses for erosion, submergence, direct removal, and undetermined. Land loss geomorphology is organized into a hierarchical classification system that includes subclasses for both shoreline and interior loss.</abstract>
      <purpose>The objective of the study was to determine the land loss geomorphologies associated with specific processes of land loss in coastal Louisiana.</purpose>
    </descript>
    <timeperd>
      <timeinfo>
        <rngdates>
          <begdate>1932</begdate>
          <enddate>1990</enddate>
        </rngdates>
      </timeinfo>
      <current>ground condition</current>
    </timeperd>
    <status>
      <progress>Complete</progress>
      <update>None planned</update>
    </status>
    <spdom>
      <bounding>
        <westbc>-92.000057</westbc>
        <eastbc>-88.81416</eastbc>
        <northbc>30.498417</northbc>
        <southbc>28.914905</southbc>
      </bounding>
    </spdom>
    <keywords>
      <theme>
        <themekt>ISO 19115 Topic Category</themekt>
        <themekey>biota</themekey>
      </theme>
      <theme>
        <themekt>none</themekt>
        <themekey>land loss</themekey>
        <themekey>wetlands</themekey>
        <themekey>geomorphology</themekey>
        <themekey>landscape ecology</themekey>
      </theme>
    </keywords>
    <accconst>none</accconst>
    <useconst>The metadata should be read completely prior to use of the data set. Data were collected and compiled as 12.5 meter pixels and should not be extended beyond the reasonable limits of the resolution. This is not a survey data product and should not be utilized as such.</useconst>
  </idinfo>
  <metainfo>
    <metd>20010907</metd>
    <metc>
      <cntinfo>
        <cntorgp>
          <cntorg>Louisiana State University Coastal Studies Institute</cntorg>
        </cntorgp>
        <cntaddr>
          <addrtype>mailing and physical address</addrtype>
          <city>Baton Rouge</city>
          <state>LA</state>
          <postal>70803</postal>
        </cntaddr>
        <cntvoice>(225) 578-2395</cntvoice>
      </cntinfo>
    </metc>
    <metstdn>FGDC Content Standards for Digital Geospatial Metadata</metstdn>
    <metstdv>FGDC-STD-001-1998</metstdv>
  </metainfo>
</metadata>