

The Extensible n-Dimensional-Data Format

The most common structure for data that are not instrument-specific is what has become known as the bulk-data frame. To avoid confusion with the Interim Environment's BDF, the new Starlink standard for storing bulk-data frames is called the Extensible n-Dimensional-Data Format (NDF for short). It has no specific HDS NAME, because a container file may hold several <NDF> structures at a given level. It has an optional TYPE of <NDF>; general-purpose applications will not test this TYPE, but its use is recommended because it helps human readers recognise the structure in listings. NDFs may be structured recursively; see the Polarimetry Example below.

It was not, in fact, possible to keep strictly to the rules in Creating New Structures when designing the NDF structure; compromises were necessary in order to allow old Asterix and Wright-Giddings-format data, of which there is a great deal, to be processed by the new general-purpose applications. The NDF structure comprises a title, a data array and its associated objects (a [DATA_ARRAY] structure in the Wright-Giddings terminology), axis information, history, and one or more registered named objects containing application-specific components. Everything at the top level is intended to be under Starlink control. Although general-purpose applications will (for an initial period) tolerate non-standard components at this level, such rogue objects will not be processed beyond being copied to the same place within an output structure. This restriction to Starlink components does not preclude extensibility, which is provided through the [MORE] structure, and it simplifies the job of applications by relieving them of the responsibility of keeping track of arbitrary numbers of extra objects. It is recommended that an application that detects a rogue object display a warning message, alerting the user to take some action (for example, to run the appropriate format-conversion utility).
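
The rogue-object rule above can be sketched as follows. This is a minimal illustration only: it models an NDF as a Python dictionary keyed by component name rather than going through HDS, and the function names are hypothetical. It checks the top-level names against those in the table of components below, warns about any others, and copies them unchanged to the output.

```python
# Minimal sketch: an NDF is modelled as a dict of top-level component
# names; this is an illustration, not the HDS interface.
import copy
import warnings

# Standard top-level component names, from the table of NDF components.
STANDARD_COMPONENTS = {
    "VARIANT", "TITLE", "DATA_ARRAY", "LABEL", "UNITS", "VARIANCE",
    "BAD_PIXEL", "QUALITY", "AXIS", "HISTORY", "MORE",
}


def find_rogue_objects(ndf):
    """Return, sorted, the top-level names that are not standard components."""
    return sorted(name for name in ndf if name not in STANDARD_COMPONENTS)


def copy_rogue_objects(ndf_in, ndf_out):
    """Warn about each rogue object and copy it, unprocessed, to the output."""
    rogues = find_rogue_objects(ndf_in)
    for name in rogues:
        warnings.warn(f"non-standard component {name} found; consider "
                      "running the appropriate format-conversion utility")
        ndf_out[name] = copy.deepcopy(ndf_in[name])
    return rogues
```

For example, calling copy_rogue_objects on an input holding TITLE, DATA_ARRAY and a non-standard FLUX_CAL would warn about FLUX_CAL and place an unaltered copy of it in the output.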


Table: Components of the Extensible n-Dimensional-Data structure

Component Name   TYPE          Brief Description
---------------------------------------------------------------------
[VARIANT]        <_CHAR>       variant of the <NDF> type
[TITLE]          <_CHAR>       title of <NDF>
[DATA_ARRAY]     <various>     NAXIS-dimensional data array
[LABEL]          <_CHAR>       label describing the data array
[UNITS]          <_CHAR>       units of the data array
[VARIANCE]       <s_array>     variance of the data array
[BAD_PIXEL]      <_LOGICAL>    bad-pixel flag
[QUALITY]        <various>     quality of the data array
[AXIS(NAXIS)]    <AXIS>        axis values, labels, units and errors
[HISTORY]        <HISTORY>     history structure
[MORE]           <EXT>         extension structure

[VARIANT]
Specifies which variant of the <NDF> structure this is. The variant must be one of the registered strings, of which only 'SIMPLE' is currently available.
[TITLE]
A title for the data which may be used to annotate plots and listings, and which will help identify the NDF. (A single line of text will obviously be too brief to describe the contents of a dataset in detail, but will be useful for display purposes.)
[DATA_ARRAY]
This is the primary n-dimensional array of data values. It is the ONLY obligatory structure element. The [DATA_ARRAY] can be present in one of these forms:
  1. A <narray>. This primitive form, i.e. just a <numeric> array, is provided for compatibility with the old Wright-Giddings proposals. Although its use is discouraged for new applications, it is recommended that general-purpose applications propagate an input <narray> in that same form, rather than convert it to an <ARRAY> structure.
  2. A <c_array>.
[LABEL]
This is a textual description of the kind of quantity stored in the [DATA_ARRAY].
[UNITS]
This is a textual description of the units in which the data values are given. If more than one NDF is being processed, their [UNITS] strings may be tested for equality. Should they prove unequal, the application must inform the user, who may then be given the opportunity to allow processing to continue; in that case, however, [UNITS] would not be propagated to the output NDF, if there is one.
[VARIANCE]
This is used to store the variance of the errors associated with [DATA_ARRAY]. It is used for computing symmetric error bars. The array dimensions must correspond to those of the [DATA_ARRAY] component. If all values in the data array have the same error, this can be represented by the scalar option. Other, more complex, forms of error representation (e.g. asymmetric errors) can be stored in specialised extensions, yet to be defined.
[BAD_PIXEL]
If this is false, applications may assume that [DATA_ARRAY] and [VARIANCE] contain no magic-value pixels. If it is true or absent, applications must either test for magic-value pixels or, if incapable of performing bad-pixel processing, give up.
[QUALITY]
The data-quality values for the corresponding elements of [DATA_ARRAY]. Its TYPE is either <narray> or <QUALITY>. The array can be stored in a sparse variant <ARRAY> structure; however, the dimensions of the sparse array must correspond to those of the [DATA_ARRAY] component, and the actual or equivalent primitive type of the data-quality values must be <_UBYTE>. The <narray> option is provided for compatibility with existing data in Wright-Giddings format. There was no [BADBITS] flag in the Wright-Giddings format, so when such existing data are processed by general-purpose applications, non-zero quality values will not be interpreted as bad pixels, as they would have been formerly.
NAXIS
The dimensionality of the [DATA_ARRAY], and hence the number of <AXIS> structures in the [AXIS] array.
[AXIS]
This is an array of <AXIS> structures, where [AXIS(n)] corresponds to the nth dimension of the [DATA_ARRAY]. If [AXIS] is not present, the pixel index is used, starting from the associated value(s) of [DATA_ARRAY.ORIGIN], or from 1 if the origin data object does not exist. If a simple pixel index is required, [AXIS] should be omitted from the <NDF>.
[MORE]
This is a wrapper containing extensions; it is not itself an extension. The extensions and their components are outside the scope of the NDF definition; they will be defined separately, and in many cases will belong to specific applications packages. Each extension must have a unique NAME, by which it is recognised. Its TYPE may be any one of the Starlink-defined standard TYPEs, or a new one defined according to the rules in Creating New Structures. Each extension (with its NAME, TYPE and variants) must be registered with the Starlink Head of Applications. Further NDFs may be located within these structures, and these may in turn contain extensions. To reduce the task of registration and to minimise the risk of clashes, it is strongly recommended that one structure per applications package be used rather than multiple minor items. It is also recommended that hierarchical structuring be used within extensions (rather than just 'flat' lists of components) so as to group related data objects, e.g. by processing or instrument.
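
The [BAD_PIXEL], [QUALITY] and [VARIANCE] rules above can be combined when deciding which pixels to use and what error bars to draw. The sketch below is hypothetical throughout: it holds one-dimensional data in Python lists, uses an invented magic value BAD_VALUE in place of the real Starlink bad-value constants, models a <QUALITY> structure as a dictionary with QUALITY and BADBITS entries, and assumes that a pixel is bad when the bitwise AND of its quality value and [BADBITS] is non-zero.

```python
# Minimal sketch: 1-D data in lists; BAD_VALUE and the QUALITY layout are
# assumptions for illustration, not the Starlink definitions.
import math

BAD_VALUE = -3.4e38  # hypothetical magic value


def good_pixel_mask(ndf):
    """True for each [DATA_ARRAY] pixel that is usable."""
    data = ndf["DATA_ARRAY"]
    # [BAD_PIXEL] false: magic values are guaranteed absent, so skip the test.
    # True or absent: the data must be tested for magic values.
    if ndf.get("BAD_PIXEL", True):
        mask = [value != BAD_VALUE for value in data]
    else:
        mask = [True] * len(data)
    quality = ndf.get("QUALITY")
    if isinstance(quality, dict):
        # <QUALITY> structure: bad where (quality AND BADBITS) is non-zero.
        badbits = quality.get("BADBITS", 0)
        mask = [ok and not (q & badbits)
                for ok, q in zip(mask, quality["QUALITY"])]
    # A primitive <narray> QUALITY (a plain list) has no BADBITS, so its
    # values never mark pixels as bad.
    return mask


def error_bars(ndf):
    """Symmetric 1-sigma errors from [VARIANCE]; None where unavailable."""
    n = len(ndf["DATA_ARRAY"])
    variance = ndf.get("VARIANCE")
    if variance is None:
        return [None] * n
    if not isinstance(variance, list):
        variance = [variance] * n  # scalar option: same error everywhere
    test_bad = ndf.get("BAD_PIXEL", True)
    return [None if test_bad and v == BAD_VALUE else math.sqrt(v)
            for v in variance]
```

With DATA_ARRAY [1.0, BAD_VALUE, 3.0, 4.0], a scalar VARIANCE of 4.0 and a <QUALITY> structure holding QUALITY [0, 0, 2, 1] and BADBITS 2, good_pixel_mask returns [True, False, False, True] and error_bars returns [2.0, 2.0, 2.0, 2.0].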

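The default-axis rule for [AXIS] amounts to the following. In this sketch the layout is assumed: [DATA_ARRAY] is modelled as a dictionary carrying the dimension sizes under a hypothetical SHAPE key and, optionally, the lower bounds under ORIGIN, and each <AXIS> structure is assumed to hold its coordinate values under a DATA_ARRAY key. Dimensions are numbered from 0 in the code, unlike the 1-based [AXIS(n)] of the text.

```python
# Minimal sketch with an assumed layout (see lead-in); dims are 0-based here.

def axis_coordinates(ndf, dim):
    """Coordinates along dimension `dim` of an NDF modelled as a dict."""
    axes = ndf.get("AXIS")
    if axes is not None:
        # Explicit axis values take precedence.
        return list(axes[dim]["DATA_ARRAY"])
    array = ndf["DATA_ARRAY"]
    size = array["SHAPE"][dim]
    origin = array.get("ORIGIN")
    # No [AXIS]: use the pixel index, starting at ORIGIN if present, else at 1.
    start = origin[dim] if origin is not None else 1
    return list(range(start, start + size))
```

For an array of shape (3, 2) with ORIGIN (-1, 10) and no [AXIS], this gives [-1, 0, 1] along the first dimension and [10, 11] along the second; without ORIGIN, the first dimension would be [1, 2, 3].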
Notes:

  1. Locating the data array.

    General-purpose applications expecting an <NDF> structure should be prepared to process the data array of Wright-Giddings formats as well. Also, in either case it should not matter whether the user supplies the name of the structure containing the data array or the name of the data array itself. However, because of the no-tree-walking rule, other data objects in the NDF can be processed only when the name of the <NDF> structure is given. An outline algorithm to achieve the required functionality is:

    
    Given name of object

    Find its type
    if (type not primitive) then
       if (type not <c_array>) then
          if (type not <NDF>) then
             issue warning but proceed
          endif
          look for [DATA_ARRAY]
          if ([DATA_ARRAY] not found) then
             No data processed
             Exit
          else
             Search for other required items
          endif
       endif
    endif
    Process [DATA_ARRAY].

  2. Accessing part of a [DATA_ARRAY]

    Some general-purpose applications will need to be able to access subsets of a data array. The problem is twofold: first, the method of implementation needs to be specified, and second, the representation of each axis must be identified. An example is a general image-display routine that expects a two-dimensional image but is instead given a three-dimensional data cube. Such an application must have a means to select the whole or part of a slice from the cube. One method is simply to run two applications in succession: first run MANIC (a KAPPA application) on the input data array to create a new dataset containing the required data, and then run the required processing application on the extracted data. However, this means extra work for the user and extra scratch space, and for frequently used applications it will be more natural to provide the necessary 'slicing' capability directly. In these cases, applications will be able to exploit MANIC's component subroutines, which first obtain the parameter values that specify the required data subset, and then extract the subset efficiently and store it in internal workspace ready for processing. Through each application's interface file, it will be possible to set up default parameter values tailored to that application. When the axes are being selected (for example, when specifying the direction of a 2-D cut through a 3-D data cube), the application should display the axis labels (if present) to help the user identify them.

  3. Higher-level structures

    Various specialised data objects and structures may be packaged around the NDF structure, using the NDF as a building block. One common requirement is for a series of related spectra or pictures; this could be implemented simply as a sequence of NDFs as follows:

    name              special_type
       [name1]           <NDF>
          ...               ...
       [name2]           <NDF>
          ...               ...
       [name3]           <NDF>
          ...               ...

    Another approach would be to use an HDS array, each element of which is an NDF.

  4. Merging two or more <NDF> structures

    The merging of history records has been discussed in <HISTORY> Structure, and the same approach is followed for other data objects within an <NDF>. Thus cases are divided into (i) those with a principal data array, where only the components of its <NDF> structure are processed or copied to an output array, and (ii) those where the data arrays have equal importance, in which case the application, by convention, assumes that the first <NDF> supplied contains the principal data array. Where this is not satisfactory, there will be an HDS editor and <NDF> "dressing/undressing" utilities. It is suggested that a common ADAM parameter name be assigned to this 'principal' <NDF>, e.g. MAIN_ARRAY.
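
The outline algorithm in note 1 translates fairly directly into code. The sketch below is an illustration, not an HDS or NDF library interface: primitive arrays are modelled as Python lists, structures as dictionaries with a TYPE entry, and the <c_array> case is recognised by the literal tag "c_array", which is an assumption standing in for whichever concrete types that class covers. A return value of None corresponds to "No data processed; Exit".

```python
# Minimal sketch of the note 1 outline. Structures are dicts with a "TYPE"
# entry; primitive arrays are lists. Names here are hypothetical.
import warnings


def locate_data_array(obj, required=()):
    """Return (data_array, ndf, found_items), or None if no data were found.

    `ndf` is None when `obj` was itself the data array, in which case other
    NDF components cannot be reached (the no-tree-walking rule).
    """
    # A primitive array, or a <c_array> structure, is itself the data array.
    if not isinstance(obj, dict) or obj.get("TYPE") == "c_array":
        return obj, None, {}
    if obj.get("TYPE") != "NDF":
        # e.g. a Wright-Giddings structure: warn, but carry on.
        warnings.warn(f"type {obj.get('TYPE')!r} is not NDF; proceeding")
    data = obj.get("DATA_ARRAY")
    if data is None:
        return None  # no data processed
    # Search for the other items this application needs.
    found = {name: obj[name] for name in required if name in obj}
    return data, obj, found
```

Returning the enclosing structure alongside the data array lets the caller reach other components only when an <NDF> (or similar) structure was named, mirroring the restriction described in note 1.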

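The slicing capability described in note 2 can be illustrated with a small function that cuts a 2-D plane out of a 3-D cube by fixing one axis, and returns the labels of the two remaining axes so that they can be shown to the user. The function and its parameters are hypothetical and do not reproduce MANIC's interface; the cube is held as nested Python lists indexed cube[i][j][k] for axes 1, 2 and 3, the index along the fixed axis is 0-based, and only the whole-plane case is shown.

```python
# Minimal sketch: cut a 2-D plane from a 3-D cube of nested lists.
# Hypothetical helper; not MANIC's interface.

def extract_plane(cube, fixed_axis, index, labels=None):
    """Fix axis `fixed_axis` (1, 2 or 3) at 0-based `index`.

    Returns (plane, kept_labels); kept_labels is None if no labels given.
    """
    n1, n2, n3 = len(cube), len(cube[0]), len(cube[0][0])
    if fixed_axis == 1:
        plane = [[cube[index][j][k] for k in range(n3)] for j in range(n2)]
    elif fixed_axis == 2:
        plane = [[cube[i][index][k] for k in range(n3)] for i in range(n1)]
    elif fixed_axis == 3:
        plane = [[cube[i][j][index] for j in range(n2)] for i in range(n1)]
    else:
        raise ValueError("fixed_axis must be 1, 2 or 3")
    kept = None
    if labels is not None:
        # Labels of the two remaining axes, to help the user identify them.
        kept = [label for axis, label in enumerate(labels, start=1)
                if axis != fixed_axis]
    return plane, kept
```

For a cube whose axes are labelled RA, Dec and Velocity, fixing axis 2 returns an RA-by-Velocity plane together with the labels ['RA', 'Velocity'].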

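Convention (ii) of note 4 can be sketched for a binary operation such as addition. In this hypothetical example the first NDF is treated as principal: its variant, title, label, axis, history and extensions are copied to the output, while the second NDF contributes only its data and variance. The sketch also applies the [UNITS] rule, assuming the user has chosen to continue when the units differ, assumes no bad pixels are present, and assumes independent errors so that the variances of a sum add.

```python
# Minimal sketch of merging convention (ii): the first NDF is principal.
# NDFs are modelled as dicts; bad-pixel handling is deliberately omitted.
import copy
import warnings


def _as_list(value, n):
    """Expand a scalar variance (the scalar option) to n elements."""
    return list(value) if isinstance(value, list) else [value] * n


def add_ndfs(main, other):
    """Return an output NDF holding main + other, dressed from `main`."""
    a, b = main["DATA_ARRAY"], other["DATA_ARRAY"]
    if len(a) != len(b):
        raise ValueError("data arrays differ in size")
    out = {"DATA_ARRAY": [x + y for x, y in zip(a, b)]}

    # Ancillary components are taken from the principal NDF only.
    for name in ("VARIANT", "TITLE", "LABEL", "AXIS", "HISTORY", "MORE"):
        if name in main:
            out[name] = copy.deepcopy(main[name])

    # [UNITS] propagate only when the inputs agree; otherwise warn and
    # carry on without them (assuming the user allowed processing to continue).
    if main.get("UNITS") == other.get("UNITS"):
        if "UNITS" in main:
            out["UNITS"] = main["UNITS"]
    else:
        warnings.warn("input [UNITS] differ; [UNITS] not propagated")

    # For a sum of independent values the variances add.
    if "VARIANCE" in main and "VARIANCE" in other:
        va = _as_list(main["VARIANCE"], len(a))
        vb = _as_list(other["VARIANCE"], len(b))
        out["VARIANCE"] = [x + y for x, y in zip(va, vb)]
    return out
```

Swapping the two arguments changes only which NDF supplies the ancillary components, which is why a conventional parameter such as MAIN_ARRAY for the principal <NDF> is useful.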


Starlink Standard Data Structures
Starlink General Paper 38
Malcolm J Currie, P T Wallace & R F Warren-Smith
1989 January 20
E-mail: ussc@star.rl.ac.uk

Copyright © 2008 Science and Technology Facilities Council