A data
definition defines a brand new type, which is different from
every primitive type and every other type defined using a data
definition, even if they look structurally similar. The new type defined
by a data
definition is a "sum of products", or a "union of products".
topDefn ::= data typeId {tyVarId } = {summand | }[ derive ]
summand ::= conId {type }
summand ::= conId { { fieldDef ; }}
derive ::= deriving ( { classId , })
fieldDef ::= fieldId :: type
The typeId is the name of this new type. If the tyVarId's exist, they are type parameters, thereby making this new type polymorphic. In each summand, the conId is called a "constructor". You can think of them as unique tag's that identify each summand. Each conId is followed by a specification for the fields involved in that summand (i.e., the fields are the "product" within the summand). In the first way of specifying a summand, the fields are just identified by position, hence we only specify the types of the fields. In the second way of specifying a summand, the fields are named, hence we specify the field names (fieldId's) and their types.
The same constructor name may occur in more than one type. The same field name can occur in more than one type. The same field name can occur in more than one summand within the same type, but the type of the field must be the same in each summand.
The optional derive clause is used as a shorthand to make this new
type an instance of the classId's, instead of using a separate,
full-blown instance
declaration. This can only be done for certain
predefined classId's: Bits
, Eq
, and Bounded
. The compiler
automatically derives the operations corresponding to those classes
(such as pack
and unpack
for the Bits
class). Type classes,
instances, and deriving
are described in more detail in sections
2.1, 4.5 and
4.6.
To construct a value corresponding to some data
definition $T$, one
simply applies the constructor to the appropriate number of arguments
(see section 5.3{reference-type="ref"
reference="sec-exprs-constrs"}); the values of those arguments become
the components/fields of the data structure.
To extract a component/field from such a value, one uses pattern matching (see section 6{reference-type="ref" reference="sec-patterns"}).
Example:
data Bool = False | True
This is a "trivial" case of a data
definition. The type is not
polymorphic (no type parameters); there are two summands with
constructors False
and True
, and neither constructor has any fields.
It is a 2-way sum of empty products. A value of type Bool
is either
the value False
or the value True
Definitions like these correspond
to an "enum" definition in C.
Example:
data Operand = Register (Bit 5)
| Literal (Bit 22)
| Indexed (Bit 5) (Bit 5)
Here, the first two summands have one field each; the third has two
fields. The fields are positional (no field names). The field of a
Register
value must have type Bit 5. A value of type Operand
is
either a Register
containing a 5-bit value, or a Literal
containing
a 22-bit value, or an Indexed
containing two 5-bit values.
Example:
data Maybe a = Nothing | Just a
This is a very useful and commonly used type. Consider a function that,
given a key, looks up a table and returns some value associated with
that key. Such a function can return either Nothing
, if the table does
not contain an entry for the given key, of Just
$v$, if the table
contains $v$ associated with the key. The type is polymorphic (type
parameter "a
") because it may be used with lookup functions for
integer tables, string tables, IP address tables, etc., i.e., we do
not want here to over-specify the type of the value $v$ at which it may
be used.
Example:
data Instruction = Immediate { op::Op; rs::Reg; rt::CPUReg; imm::UInt16; }
| Jump { op::Op; target::UInt26; }
An Instruction
is either an Immediate
or a Jump
. In the former
case, it contains a field called op
containing a value of type Op
, a
field called rs
containing a value of type Reg
, a field called rt
containing a value of type CPUReg
, and a field called imm
containing
a value of type UInt16
. In the latter case, it contains a field called
op
containing a value of type Op
, and a field called target
containing a value of type UInt26
.
NOTE:
Error messages involving data type definitions sometimes show traces of how they are handled internally. Data type definitions are translated into a data type where each constructor has exactly one argument. The types above translate to:
data Bool = False PrimUnit | True PrimUnit data Operand = Register (Bit 5) | Literal (Bit 22) | Indexed Operand_$Indexed struct Operand_$Indexed = { _1 :: Reg 5; _2 :: Reg 5 } data Maybe a = Nothing PrimUnit | Just a data Instruction = Immediate Instruction_$Immediate | Register Instruction_$Register struct Instruction_$Immediate = { op::Op; rs::Reg; rt::CPUReg; imm::UInt16; } struct Instruction_$Register = { op::Op; target::UInt26; }