Statistics can be computed for one or more individual columns. They can be computed from either all the rows in the catalogue or just the subset of rows comprising a selection which has been created previously. Obviously, only non-null rows are used in the calculations. Statistics can be displayed for columns of any data type, though for CHARACTER and LOGICAL columns the only quantity which can be determined is the number of non-null rows.
For each chosen column its name, data type and the number of non-null rows
(that is, the number of rows used in the calculation) are displayed and the
statistics listed in Table
are computed. Though all these
quantities are standard statistics, there is a remarkable amount of muddle
and confusion over their definitions, with textbooks giving
differing formulæ. For completeness, and to avoid any possible
ambiguity, the definitions used in xcatview and catview are
given below. These formulæ follow the CRC Standard Mathematical
Tables[4] except for the definition of skewness which is taken
from Wall[30].
In the following the set of rows for which statistics are computed is
called the `current selection' and it contains $n$ non-null rows.
Here $x_i$ is the value of the column for the $i$th non-null row in the
current selection. The definitions of the various statistics are then as
follows.
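To illustrate the notation, here is a minimal Python sketch that extracts the non-null values of a column and counts them. It assumes that null values are represented by `None`; that convention, and the name `non_null_values`, are chosen here purely for illustration and are not how xcatview or catview handle nulls internally.

```python
def non_null_values(column):
    """Return the non-null values of a column, preserving row order.

    Assumption for this sketch: a null is represented by None.
    """
    return [value for value in column if value is not None]


# A hypothetical column from the current selection, with two null rows.
column = [4.0, None, 7.5, 1.5, None, 3.0]

x = non_null_values(column)  # the values used in the calculations
n = len(x)                   # the number of non-null rows
```

Only `x` and `n` would then enter the statistics; the null rows play no part.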
The interquartile range is simply the positive difference between
the first quartile and the third quartile.
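As a rough sketch, the interquartile range can be computed in Python as follows. The quartiles here come from the standard library's `statistics.quantiles` with the `inclusive` method; this is an assumption made for the example, and xcatview and catview may estimate the quartiles differently.

```python
import statistics


def interquartile_range(values):
    """Positive difference between the first and third quartiles.

    Assumption for this sketch: quartiles are estimated with the
    'inclusive' method of statistics.quantiles.
    """
    first_quartile, _median, third_quartile = statistics.quantiles(
        values, n=4, method="inclusive"
    )
    return abs(third_quartile - first_quartile)


# The input need not be sorted; quantiles() sorts a copy internally.
iqr = interquartile_range([9, 1, 8, 2, 7, 3, 6, 4, 5])
```

Different quartile estimators can give slightly different results for small samples, which is one source of the discrepancies between textbooks.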
The value computed for the mode is not exact. Indeed it is not obvious that the mode is defined for ungrouped data. Rather, the value given is computed from the empirical relation:
\[ \mathrm{mode} = \mathrm{mean} - 3\,(\mathrm{mean} - \mathrm{median}) \qquad (1) \]
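A minimal Python sketch of this estimate is given below. It assumes the empirical relation is the usual one, mode = mean - 3 (mean - median); the function name `approximate_mode` is hypothetical and is not part of CURSA.

```python
import statistics


def approximate_mode(values):
    """Estimate the mode from the mean and median.

    Assumption for this sketch: the empirical relation
    mode = mean - 3 * (mean - median).
    """
    mean = statistics.fmean(values)
    median = statistics.median(values)
    return mean - 3.0 * (mean - median)


# Mean is 4.0 and median is 3, so the estimate is 4.0 - 3.0 * 1.0 = 1.0.
mode_estimate = approximate_mode([1, 2, 3, 4, 10])
```

For a symmetric sample the mean and median coincide, so the estimate reduces to the mean.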
[equation (2)]
[equation (3)]
then
[equation (4)]
and
[equation (5)]
The expected values for the skewness and kurtosis are:
CURSA Catalogue and Table Manipulation Applications