Shapefile

From Wikipedia, the free encyclopedia

The files of an ESRI shapefile shown in Windows Explorer

The ESRI Shapefile is a popular geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by ESRI as a (mostly) open specification for data interoperability among ESRI and other software products. A "shapefile" commonly refers to a collection of files with ".shp", ".shx", ".dbf", and other extensions that share a common prefix (e.g., "lakes.*"). Strictly speaking, the shapefile is the file with the ".shp" extension; however, that file alone is incomplete for distribution, because it depends on the other supporting files.

Shapefiles spatially describe points, polygons and polylines, which could represent, for example, water wells, lakes and rivers, respectively. Each item may also have attributes that describe it, such as a name or a temperature.


Overview

A shapefile is a digital vector storage format for storing geometric locations and associated attribute information. The format cannot store topological information. It was introduced with ArcView GIS version 2 in the early 1990s, and shapefiles can now be read and written by a variety of free and non-free programs.

Shapefiles are simple because they store only primitive geometric data types: points, lines and polygons. On their own, these primitives are of little use, because nothing indicates what they represent. A table of records therefore stores the properties (attributes) of each shape in the shapefile. Combining shapes with attributes makes it possible to represent a very wide range of geographic data, and that representation is what allows accurate computations and analysis to be performed on the data.

File components

Although a shapefile must be treated as a whole, a "shapefile" is actually a set of files. Three individual files are mandatory; they store the core data. Several further files are optional, and most of these store index data to improve performance. Each individual file should conform to the MS-DOS 8.3 naming convention (an 8-character filename prefix, a full stop and a 3-character extension, such as shapefil.shp) for compatibility with older applications that handle shapefiles. All of the files should also be kept in the same folder, because programs locate the components by their shared prefix.

Mandatory files:

  • .shp - the file that stores the feature geometry
  • .shx - the file that stores the index of the feature geometry
  • .dbf - the database of attributes
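To illustrate how the .shx index is used, here is a minimal Python sketch. It assumes the layout given in the ESRI Shapefile Technical Description: a 100-byte header of the same form as the .shp header, followed by one 8-byte entry per record holding that record's offset in the .shp file and its content length, both big-endian integers counted in 16-bit words. The function name `shx_lookup` is illustrative only.

```python
import struct

HEADER_SIZE = 100       # the .shx file starts with a 100-byte header like the .shp
INDEX_ENTRY_SIZE = 8    # each entry: offset (4 bytes) + content length (4 bytes)


def shx_lookup(shx_bytes: bytes, record_index: int) -> tuple:
    """Return (byte_offset, content_length_in_bytes) for a .shp record.

    record_index is 0-based. Both stored values are big-endian integers
    counted in 16-bit words, so they are doubled to get byte counts. The
    offset points at the record header in the .shp file; the content length
    excludes that 8-byte record header.
    """
    entry_count = (len(shx_bytes) - HEADER_SIZE) // INDEX_ENTRY_SIZE
    if not 0 <= record_index < entry_count:
        raise IndexError(f"record {record_index} outside 0..{entry_count - 1}")
    pos = HEADER_SIZE + record_index * INDEX_ENTRY_SIZE
    offset_words, length_words = struct.unpack_from(">2i", shx_bytes, pos)
    return offset_words * 2, length_words * 2
```

Because every index entry has the same size, a program can seek directly to any record in the .shp file without reading the records before it.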

Optional files:

  • .sbn and .sbx - store the spatial index of the features
  • .fbn and .fbx - store the spatial index of the features for shapefiles that are read-only
  • .ain and .aih - store the attribute index of the active fields in a table or a theme's attribute table
  • .prj - the file that stores the coordinate system information, using well-known text
  • .shp.xml - metadata for the shapefile
  • .atx - attribute index for the .dbf file in the form of <shapefile>.<columnname>.atx (ArcGIS 8 and later)
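The naming rules above can be checked mechanically. The sketch below is a hypothetical helper, not part of any ESRI tool: given a prefix and the names of the files in a folder, it reports which components are present, which mandatory ones are missing, any per-column .atx indexes, and whether the prefix fits the legacy 8-character limit. It assumes lowercase extensions, which not every system enforces.

```python
from pathlib import Path

MANDATORY = (".shp", ".shx", ".dbf")
OPTIONAL = (".sbn", ".sbx", ".fbn", ".fbx", ".ain", ".aih", ".prj", ".shp.xml")


def classify_components(prefix: str, filenames) -> dict:
    """Classify the files in one folder that belong to the shapefile `prefix`."""
    names = set(filenames)
    present = [ext for ext in MANDATORY + OPTIONAL if prefix + ext in names]
    missing = [ext for ext in MANDATORY if prefix + ext not in names]
    # Attribute indexes are named <prefix>.<columnname>.atx
    atx = sorted(
        n for n in names
        if n.startswith(prefix + ".") and n.endswith(".atx")
        and len(n) > len(prefix) + len(".") + len(".atx")
    )
    return {
        "present": present,
        "missing": missing,
        "atx": atx,
        "prefix_fits_8_3": len(prefix) <= 8,  # legacy MS-DOS limit on the prefix
    }


def shapefile_components(shp_path) -> dict:
    """Inspect the folder that contains a .shp file (lowercase extensions assumed)."""
    shp = Path(shp_path)
    return classify_components(shp.stem, (p.name for p in shp.parent.iterdir()))
```

Separating the pure classification from the directory listing keeps the rules easy to test without touching the file system.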

Shapefile format (.shp)

The main file (.shp) contains the primary reference data in the shapefile. The file consists of a single fixed-length header followed by one or more variable-length records. Each variable-length record consists of a record header and the record contents. A detailed description of the file format is given in the ESRI Shapefile Technical Description. [1]

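The main file header is fixed at 100 bytes in length and contains 17 fields (nine 4-byte integer fields and eight 8-byte double-precision fields). Bytes 0-27 are stored in big-endian byte order; the rest of the header is little-endian:

bytes 0-3: File code (always hex value 0000270A, i.e. 9994)
bytes 4-23: (Unused)
bytes 24-27: File length, in 16-bit words, including the header
bytes 28-31: Version (1000)
bytes 32-35: Shape type (see below)
bytes 36-99: Bounding box, as Xmin, Ymin, Xmax, Ymax, Zmin, Zmax, Mmin, Mmax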
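As a sketch of how a reader might decode this header, the Python function below uses the standard `struct` module. It assumes, per the ESRI Shapefile Technical Description, that bytes 0-27 are big-endian integers and bytes 28-99 are little-endian (two integers followed by eight doubles in the order Xmin, Ymin, Xmax, Ymax, Zmin, Zmax, Mmin, Mmax). `ShpHeader` and `parse_main_header` are illustrative names.

```python
import struct
from typing import NamedTuple, Tuple

FILE_CODE = 0x0000270A  # 9994


class ShpHeader(NamedTuple):
    file_length_bytes: int
    version: int
    shape_type: int
    bbox: Tuple[float, ...]  # Xmin, Ymin, Xmax, Ymax, Zmin, Zmax, Mmin, Mmax


def parse_main_header(data: bytes) -> ShpHeader:
    """Decode the fixed 100-byte header at the start of a .shp (or .shx) file."""
    if len(data) < 100:
        raise ValueError("the main file header is 100 bytes long")
    # Bytes 0-27: seven big-endian integers (file code, five unused, file length).
    file_code, *_unused, length_words = struct.unpack_from(">7i", data, 0)
    if file_code != FILE_CODE:
        raise ValueError(f"unexpected file code {file_code:#x}")
    # Bytes 28-99: little-endian version and shape type, then eight doubles.
    version, shape_type, *bbox = struct.unpack_from("<2i8d", data, 28)
    return ShpHeader(length_words * 2, version, shape_type, tuple(bbox))
```

The file length is stored in 16-bit words, so it is doubled to obtain a byte count.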

Each record header is fixed at 8 bytes in length and contains two big-endian integer fields: the record number (starting at 1) and the content length, measured in 16-bit words and excluding the 8-byte record header itself.
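The record structure can be walked with a short loop. This minimal sketch assumes the whole .shp file is already in memory as bytes, that records start immediately after the 100-byte main header, and that each record's contents begin with a little-endian shape type integer, as in the ESRI Shapefile Technical Description; `iter_records` is a hypothetical helper name.

```python
import struct


def iter_records(shp: bytes):
    """Yield (record_number, shape_type, content) for each record in a .shp file."""
    # File length (bytes 24-27) is a big-endian count of 16-bit words.
    file_length = struct.unpack_from(">i", shp, 24)[0] * 2
    end = min(file_length, len(shp))
    pos = 100  # records follow the fixed-length main header
    while pos + 8 <= end:
        # 8-byte record header: record number and content length (in 16-bit
        # words, excluding this header), both big-endian.
        number, length_words = struct.unpack_from(">2i", shp, pos)
        start = pos + 8
        content = shp[start:start + length_words * 2]
        # Every record's contents begin with a little-endian shape type.
        (shape_type,) = struct.unpack_from("<i", content, 0)
        yield number, shape_type, content
        pos = start + length_words * 2
```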

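The contents of each record depend entirely on its shape type, with a one-to-one correspondence between shape type and record layout. Each record's contents begin with the shape type, and all non-null shapes in a file must be of the type given in the main file header.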

Shape type    Value  Fields
Null Shape    0      Shape Type
Point         1      Shape Type, X, Y
Polyline      3      Shape Type, Box, NumParts, NumPoints, Parts, Points
Polygon       5      Shape Type, Box, NumParts, NumPoints, Parts, Points
MultiPoint    8      Shape Type, Box, NumPoints, Points
PointZ        11     Shape Type, X, Y, Z, M
PolylineZ     13     Mandatory: Shape Type, Box, NumParts, NumPoints, Parts, Points, Z range, Z array
                     Optional: M range, M array
PolygonZ      15     Mandatory: Shape Type, Box, NumParts, NumPoints, Parts, Points, Z range, Z array
                     Optional: M range, M array
MultiPointZ   18     Mandatory: Shape Type, Box, NumPoints, Points, Z range, Z array
                     Optional: M range, M array
PointM        21     Shape Type, X, Y, M
PolylineM     23     Mandatory: Shape Type, Box, NumParts, NumPoints, Parts, Points
                     Optional: M range, M array
PolygonM      25     Mandatory: Shape Type, Box, NumParts, NumPoints, Parts, Points
                     Optional: M range, M array
MultiPointM   28     Mandatory: Shape Type, Box, NumPoints, Points
                     Optional: M range, M array
MultiPatch    31     Mandatory: Shape Type, Box, NumParts, NumPoints, Parts, PartTypes, Points, Z range, Z array
                     Optional: M range, M array
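To show how the Parts and Points fields fit together, the sketch below decodes a Polyline or Polygon record. It assumes the little-endian layout from the ESRI Shapefile Technical Description: shape type at byte 0, the four-double Box at bytes 4-35, NumParts and NumPoints at bytes 36 and 40, then the Parts array of starting indices, then the X, Y pairs. The lookup table restates the values listed above; `decode_parts` is an illustrative name.

```python
import struct

SHAPE_TYPES = {
    0: "Null Shape", 1: "Point", 3: "Polyline", 5: "Polygon", 8: "MultiPoint",
    11: "PointZ", 13: "PolylineZ", 15: "PolygonZ", 18: "MultiPointZ",
    21: "PointM", 23: "PolylineM", 25: "PolygonM", 28: "MultiPointM",
    31: "MultiPatch",
}


def decode_parts(content: bytes):
    """Split the contents of a Polyline (3) or Polygon (5) record into parts.

    Returns a list of parts, each a list of (x, y) tuples.
    """
    (shape_type,) = struct.unpack_from("<i", content, 0)
    if shape_type not in (3, 5):
        name = SHAPE_TYPES.get(shape_type, f"unknown type {shape_type}")
        raise ValueError(f"expected Polyline or Polygon, got {name}")
    # Bytes 4-35 hold the Box (Xmin, Ymin, Xmax, Ymax); it is skipped here.
    num_parts, num_points = struct.unpack_from("<2i", content, 36)
    starts = struct.unpack_from(f"<{num_parts}i", content, 44)
    coords = struct.unpack_from(f"<{2 * num_points}d", content, 44 + 4 * num_parts)
    points = list(zip(coords[0::2], coords[1::2]))
    # Each Parts entry is the index of a part's first point; the last part
    # runs to the end of the Points array.
    bounds = list(starts) + [num_points]
    return [points[bounds[i]:bounds[i + 1]] for i in range(num_parts)]
```

A Polygon record uses the same layout; its parts are rings, each closed by repeating its first point at the end.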

Attribute file format (.dbf)

Attributes for each shape are stored in the xBase (dBase) format, which has an open specification. Shapes and attribute records correspond one to one: the first shape in the .shp file corresponds to the first record in the .dbf file, and so on.
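Because of this correspondence, the attributes of the n-th shape can be read straight from the n-th record of the .dbf file. The sketch below assumes the common dBase III layout: record count, header length and record length as little-endian integers at bytes 4-11; 32-byte field descriptors from byte 32, terminated by 0x0D; and a one-byte deletion flag at the start of each record. It also assumes a Latin-1 text encoding and returns raw, whitespace-stripped strings rather than typed values. Both function names are hypothetical.

```python
import struct


def read_dbf_header(dbf: bytes):
    """Return (record_count, header_length, record_length, fields) for a .dbf file.

    Each field is a (name, type_code, length, decimal_count) tuple.
    """
    # Bytes 4-11: record count (uint32), header and record lengths (uint16).
    record_count, header_len, record_len = struct.unpack_from("<IHH", dbf, 4)
    fields = []
    pos = 32  # 32-byte field descriptors follow the 32-byte file header
    while pos < header_len - 1 and dbf[pos] != 0x0D:  # 0x0D ends the descriptors
        desc = dbf[pos:pos + 32]
        name = desc[:11].split(b"\x00", 1)[0].decode("ascii", errors="replace")
        fields.append((name, chr(desc[11]), desc[16], desc[17]))
        pos += 32
    return record_count, header_len, record_len, fields


def read_dbf_record(dbf: bytes, index: int) -> dict:
    """Attribute values of the index-th record (0-based), i.e. of the index-th shape."""
    record_count, header_len, record_len, fields = read_dbf_header(dbf)
    if not 0 <= index < record_count:
        raise IndexError(f"record {index} outside 0..{record_count - 1}")
    pos = header_len + index * record_len + 1  # skip the deletion flag byte
    values = {}
    for name, _type_code, length, _decimals in fields:
        # Latin-1 is an assumption; the real encoding varies between files.
        values[name] = dbf[pos:pos + length].decode("latin-1").strip()
        pos += length
    return values
```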

Spatial index file format (.sbn)

The .sbn file is part of ArcView's spatial index. If this file is out of date, ArcView will not display the shapefile correctly: many features may appear to have been deleted. To recreate the spatial index in ArcView, do the following:

  • Open the theme's attribute table
  • Select the Shape field
  • Select Field->Remove Index from the menu
  • Select Field->Create Index from the menu

To recreate the spatial index in ArcCatalog, do the following:

  • Right-click the shapefile and choose Properties
  • Click the Indexes tab
  • At the bottom, choose Delete to remove the index
  • At the bottom, choose Add to recreate the index

Limitations

Topology and shapefiles

Shapefiles do not have the ability to store topological information. ArcInfo coverages and Personal/Enterprise Geodatabases do have the ability to store feature topology.

Spatial representation

The edges of a polyline or polygon are defined by points joined by straight segments, which can give curved features a jagged edge. Smoother shapes require additional points, which take up more storage space. Bézier curves, by contrast, can capture complex shapes as smooth curves using far fewer points. Currently, none of the shapefile types supports Bézier curves.
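To give a sense of how many points a smooth shape costs, the worked sketch below estimates how many straight segments are needed before a circle's approximation error stays under a chosen tolerance. It relies on the sagitta of a chord: a segment spanning an angle of 2*pi/n deviates from a circle of radius r by at most r(1 - cos(pi/n)). The function names are illustrative.

```python
import math


def segments_for_circle(radius: float, tolerance: float) -> int:
    """Fewest straight segments that keep a circle within `tolerance` of the true curve.

    A chord spanning 2*pi/n radians strays from the arc by at most its sagitta,
    radius * (1 - cos(pi / n)); solving for n gives n >= pi / acos(1 - tolerance / radius).
    """
    if not 0 < tolerance < radius:
        raise ValueError("expected 0 < tolerance < radius")
    n = math.ceil(math.pi / math.acos(1 - tolerance / radius))
    return max(n, 3)  # a polygon needs at least three distinct vertices


def max_deviation(radius: float, segments: int) -> float:
    """Largest gap between a regular inscribed polygon and its circle."""
    return radius * (1 - math.cos(math.pi / segments))
```

For example, a circle of radius 1000 m approximated to within 1 m needs 71 segments; since a shapefile ring repeats its first vertex at the end, that ring stores 72 points.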

Data storage

Unlike most databases, the xBase format cannot store null values in its fields, which makes the storage of attribute data less flexible. In ArcGIS products, values that should be null are instead replaced with 0 (without warning), which can make the data misleading. ArcGIS products address this problem with geodatabases; the personal geodatabase, for example, is based on Microsoft Access.

External links