National Oceanographic Data Center

The National Oceanographic Data Center (NODC) manages the acquisition, ingest processing, quality control and long-term preservation of oceanographic data. Digital oceanographic data are sorted, categorized and assigned unique identification numbers at ingest. Non-digital data and information are normally incorporated in the NOAA Library, but are always considered for future conversion to digital form to support modern access and retrieval methods. The data are scanned for computer viruses, and cryptographic checksums are generated and stored with the original data files so that data integrity can be monitored and verified over extended time periods and across generations of storage technologies. A copy of the data is written to removable media for off-site storage. A number of data products are derived from the NODC data holdings to extend the utility of the data and information.

History

Established in 1961, the NODC was originally an interagency facility administered by the U.S. Naval Hydrographic (later Oceanographic) Office. The NODC was transferred to NOAA in 1970 when NOAA was created by Executive Order.

Mission

In the words of its charter, the NODC serves to "acquire, process, preserve, and disseminate oceanographic data." Its primary mission is to ensure that global oceanographic data, collected at great cost, are maintained in a permanent archive that is easily accessible to the world science community and to other users.

More specifically, the NODC serves the U.S. with data and information for understanding the ocean. NODC archives and provides public access to oceanographic observational data and products, provides scientific oceanographic data services, and conducts assessments of the ocean environment. NODC also operates NOAA's Central and Regional Libraries. These libraries provide environmental reference services that support NOAA research, offer other technical information retrieval services to NOAA staff, and maintain the official archives for NOAA documents. Internationally, NODC hosts the World Data Center for Oceanography, Silver Spring, under the auspices of the International Council of Scientific Unions and the U.S. National Academy of Sciences.

Infrastructure

Storage

The NODC archive holdings include all acquired data in their original form, as well as project and product files of data extractions. Every data acquisition is assigned a unique identification number that serves as a lifetime reference to that data. All data are passed through context verification prior to entry into the NODC data holdings. In addition, checksum and byte count values are computed and attached to the data, so that continuous validation and verification processes can maintain the integrity of the data. Metadata describing each acquisition are appended to new data for internal record management. Each unique data set referenced in the NOAA and NNDC server systems contains a metadata description to aid in search and discovery processes.
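The checksum and byte count tagging described above can be illustrated with a minimal Python sketch. The source does not say which checksum algorithm NODC uses, so SHA-256 is an assumption here, and the names `fixity_record` and `is_intact` are hypothetical helpers chosen for illustration only.

```python
import hashlib


def fixity_record(path, chunk_size=1 << 20):
    """Return the SHA-256 digest and byte count of a file, read in chunks."""
    digest = hashlib.sha256()  # assumed algorithm; not stated in the source
    byte_count = 0
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            byte_count += len(chunk)
    return {"sha256": digest.hexdigest(), "bytes": byte_count}


def is_intact(path, record):
    """Recompute the fixity values and compare them with the stored record."""
    return fixity_record(path) == record
```

A record produced at ingest would be stored alongside the file, and `is_intact` could then be called at any later time, or after a copy to new media, to confirm that neither the length nor the content has changed.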

A copy of all NODC digital data holdings is maintained in an online storage system for unattended access. A backup copy is preserved off-site for disaster recovery purposes. Another periodically refreshed backup copy is maintained for internal system recovery. Periodic data migration is required to mitigate system and media form-factor obsolescence.

Network infrastructure

The network infrastructure at NODC is based on Transmission Control Protocol/Internet Protocol (TCP/IP) and currently operates at 10/100/1000 Mbit/s internally. Connections from NODC to the campus network, and to destinations beyond the campus or on the Internet, currently pass through a 100 Mbit/s network switch. The NODC network is segregated into a number of public and private subnets by packet-filtering firewalls that also provide Network Address Translation (NAT). Additionally, various automated intrusion detection systems monitor and report unauthorized connection attempts to NODC systems. NODC provides public access servers that support standard Internet protocols, including Simple Mail Transfer Protocol (SMTP), Hypertext Transfer Protocol (HTTP) and File Transfer Protocol (FTP). NODC also develops and provides tools to support public submission of data for archival storage at NODC, and public online access to certain archived datasets.

NODC operates many desktop systems running operating systems including Unix variants (Solaris, Linux, Irix), Microsoft Windows variants (98, XP, 2000) and Mac OS. These workstations are used for day-to-day operations in data ingest, maintenance, quality assurance, customer services, application development, web publishing, and for personnel productivity.

NODC currently operates a number of Digital Linear Tape (DLT) and IBM 3590 tape media systems for backup, archive and off-line storage. NODC operates automated systems that perform regular virus scans of stored data, both during ingest and periodically thereafter, and that generate cryptographic checksums of all digital archive data so that its integrity can be verified at any time. This gives a high degree of confidence that any data corruption caused by media failure, or by accidental or intentional destruction, can be easily detected, so that the data can be recovered from off-line backup media. It also allows data to be migrated across future generations of storage media and systems with a high degree of confidence in their integrity.
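As an illustration of how periodic verification can flag files that need recovery from backup, the following Python sketch audits a directory against a stored manifest of checksums. It is not NODC's actual tooling: the manifest layout (relative path mapped to digest), the use of SHA-256, and the function names `sha256_of` and `audit` are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file (algorithm is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root, manifest):
    """Compare files under root with a manifest of {relative path: digest}.

    Returns a dict listing 'missing' and 'corrupt' paths in sorted order.
    """
    root = Path(root)
    report = {"missing": [], "corrupt": []}
    for relative_path, expected in sorted(manifest.items()):
        target = root / relative_path
        if not target.is_file():
            report["missing"].append(relative_path)
        elif sha256_of(target) != expected:
            report["corrupt"].append(relative_path)
    return report
```

Any path reported as missing or corrupt is a candidate for restoration from a backup copy. Running the same audit against a freshly migrated copy is one way to check that a move to new media preserved every file.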

NODC also operates a number of high-capacity RAID disk storage systems to support data ingest, working storage, online products and database search and retrieval systems.

Tape jukebox systems are maintained at NODC to provide automated and manual backup facilities supporting the NODC workstations and servers as well as near-line mass storage jukeboxes (the backup copy results in a third copy of the original data). These systems currently use DLT technology and run under the control of Legato NetWorker backup software. Tape media systems are maintained to support data retrieval from legacy tape formats such as 9-track, and to copy data to IBM 3590 tape media for off-site, deep-storage archive. Other tape systems are provided for backup and restoration of databases and of critical data and information servers.

NODC data access systems

Various projects at NODC also provide public access to certain data sets and products, online via FTP and on CD-ROM or other media. Access to data and information, including both metadata and browsing programs, is generally available at no cost. Customized data products are available on a cost recovery basis. Electronic commerce capabilities have been implemented to support unattended around-the-clock access, retrieval and delivery of data and information. In addition, staff are available during normal business hours to help users understand NODC databases, respond to requests for custom products, and navigate NODC's vast data holdings. Science discipline specialists are available to promote expanded acquisition and utilization of data, and to offer interpretive assistance.

Access is the key element in the data archive paradigm, as data are only valuable if they can be acquired. As the number of data sets and the data volumes increase dramatically, as anticipated with projected future programs and technologies, the adaptability of the search and discovery processes that support access will determine how useful the data are. Most NODC data can be accessed through the Internet, either directly through the NODC home page or via pointers served by the NOAA and NNDC Server systems. New capabilities have been installed that enable users to find and access the NODC data holdings, download desired records or files, and pay the associated cost of recovery through an online e-commerce transaction.

NODC scientists participate in research projects to develop and maintain a comprehensive understanding of the data and information contained within the archive record. Center scientists also develop analyses and content assessment products, data visualization tools, and data description tools to expand the utility of the data and information holdings to a larger, more diverse community (now connected through the Internet). The results of assessment processes that measure integrity and continuity are, where appropriate, appended to the data throughout its life. These processes are part of the accountability required for archiving data. Data visualization and description tools are outreach processes. In a commercial environment where the centers must recover the cost of reproduction and marketing, these efforts are leading to data mining techniques that extract event data (or categorize data) to aid users in data discovery. This is especially necessary as data volumes and the diversity of data types continue to expand. The accountability of information processed from the original data is included in the metadata maintained on the NODC servers. Visualization and descriptive products are used for a variety of purposes, from publications and journal articles to web page viewing tools and the marketing of data availability.

Quality controlled data products

The National Oceanographic Data Center holds global physical, chemical, and biological oceanographic data sets that are used by researchers worldwide. Specifically, NODC's Ocean Climate Laboratory is investigating interannual-to-decadal ocean climate variability using historical oceanographic data, and building scientifically quality-controlled global oceanographic databases in its products known as the World Ocean Atlas and World Ocean Database.

Ocean Climate Laboratory

The Ocean Climate Laboratory (OCL) is a division of the National Oceanographic Data Center (NODC). The primary objectives of the OCL are to:

  • improve the quality of the NODC's oceanographic data archives by using the data to perform scientific analyses
  • develop improved ocean climatologies for annual, seasonal, and monthly compositing periods
  • investigate interannual-to-decadal ocean climate variability using historical oceanographic data
  • build scientifically quality-controlled global oceanographic databases
  • facilitate international exchange of oceanographic data.

The OCL includes the International Data Exchange Group, which conducts programs related to international affairs and oceanographic data exchange and also supports the World Data Center for Oceanography, Silver Spring (WDC). Operated under the auspices of the U.S. National Academy of Sciences, WDC is one of the U.S. discipline subcenters within the World Data Center system. There are two other World Data Centers for Oceanography: World Data Center-B for Oceanography, Obninsk, Russia, and World Data Center-D for Oceanography, Tianjin, People's Republic of China.

The OCL directs the international Global Oceanographic Data Archaeology and Rescue (GODAR) Project. Initiated by the NODC and WDC, this project was subsequently endorsed by the Intergovernmental Oceanographic Commission. The GODAR Project has resulted in the addition of over six million historical ocean temperature profiles, 140,000 chlorophyll profiles and 1,400,000 plankton observations, as well as many other data.

Operational ocean observing systems

NODC has several ocean-data Web applications that can be accessed through the Internet, either directly through the NODC home page or via pointers served by the NNDC Server system. Some of these observing systems contain near-real-time data, while others are long-term archives for a particular project. NODC has the following observing systems:
