`data`

A data definition defines a brand new type, which is different from every primitive type and every other type defined using a data definition, even if they look structurally similar. The new type defined by a data definition is a "sum of products", or a "union of products".

topDefn  ::= data typeId {tyVarId } = {summand | }[ derive ]
summand  ::= conId {type }
summand  ::= conId { { fieldDef ; }}
derive   ::= deriving ( { classId , })
fieldDef ::= fieldId :: type

The typeId is the name of this new type. If tyVarId's are present, they are type parameters, which make the new type polymorphic. In each summand, the conId is called a "constructor". You can think of constructors as unique tags that identify each summand. Each conId is followed by a specification of the fields in that summand (i.e., the fields are the "product" within the summand). In the first form of summand, the fields are identified only by position, so we specify just the types of the fields. In the second form, the fields are named, so we specify the field names (fieldId's) and their types.

The same constructor name may occur in more than one type. The same field name can occur in more than one type. The same field name can occur in more than one summand within the same type, but the type of the field must be the same in each summand.

The optional derive clause is used as a shorthand to make this new type an instance of the classId's, instead of using a separate, full-blown instance declaration. This can only be done for certain predefined classId's: Bits, Eq, and Bounded. The compiler automatically derives the operations corresponding to those classes (such as pack and unpack for the Bits class). Type classes, instances, and deriving are described in more detail in sections 2.1, 4.5 and 4.6.

To construct a value corresponding to some data definition $T$, one simply applies the constructor to the appropriate number of arguments (see section 5.3); the values of those arguments become the components/fields of the data structure.

To extract a component/field from such a value, one uses pattern matching (see section 6).

Example:

data Bool = False | True

This is a "trivial" case of a data definition. The type is not polymorphic (no type parameters); there are two summands with constructors False and True, and neither constructor has any fields. It is a 2-way sum of empty products. A value of type Bool is either the value False or the value True. Definitions like these correspond to an "enum" definition in C.

Example:

data Operand = Register (Bit 5)
             | Literal (Bit 22)
             | Indexed (Bit 5) (Bit 5)

Here, the first two summands have one field each; the third has two fields. The fields are positional (no field names). The field of a Register value must have type Bit 5. A value of type Operand is either a Register containing a 5-bit value, or a Literal containing a 22-bit value, or an Indexed containing two 5-bit values.
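
The following sketch shows construction and pattern matching for this type. It assumes the Operand definition above is in scope; the names r3, ix, and usesRegister are hypothetical, chosen only for illustration, and the code has not been checked against a compiler.

```
-- Construction: apply the constructor to one argument per positional field.
r3 :: Operand
r3 = Register 3

ix :: Operand
ix = Indexed 1 2

-- Extraction: one clause per summand; "_" ignores a field.
-- (usesRegister is a hypothetical helper, not a library function.)
usesRegister :: Operand -> Bool
usesRegister (Register _)  = True
usesRegister (Literal _)   = False
usesRegister (Indexed _ _) = True
```

Because the fields are positional, each pattern must supply exactly as many sub-patterns as the constructor has fields.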

Example:

data Maybe a = Nothing | Just a
               deriving (Eq, Bits)

This is a very useful and commonly used type. Consider a function that, given a key, looks up a table and returns some value associated with that key. Such a function can return either Nothing, if the table does not contain an entry for the given key, or Just $v$, if the table contains $v$ associated with the key. The type is polymorphic (type parameter "a") because it may be used with lookup functions for integer tables, string tables, IP address tables, etc.; that is, we do not want to over-specify here the type of the value $v$ at which it may be used.
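
As a rough illustration of the lookup scenario just described, the sketch below uses a tiny fixed table in place of a real one. The names lookupTable and valueOr, the key and value widths, and the table contents are all assumptions made for this example; the code has not been checked against a compiler.

```
-- Hypothetical lookup: only keys 0 and 1 have entries.
lookupTable :: Bit 2 -> Maybe (Bit 8)
lookupTable key = if key == 0 then Just 10
                  else if key == 1 then Just 20
                  else Nothing

-- Hypothetical consumer: pattern matching distinguishes the two summands
-- and falls back to a default when the key is absent.
valueOr :: Bit 8 -> Maybe (Bit 8) -> Bit 8
valueOr _    (Just v) = v
valueOr dflt Nothing  = dflt
```

The caller cannot use the looked-up value without first handling the Nothing case, which is the main benefit of returning a Maybe rather than a reserved "not found" value.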

Example:

data Instruction = Immediate { op::Op; rs::Reg; rt::CPUReg; imm::UInt16; }
                 | Jump { op::Op; target::UInt26; }

An Instruction is either an Immediate or a Jump. In the former case, it contains a field called op containing a value of type Op, a field called rs containing a value of type Reg, a field called rt containing a value of type CPUReg, and a field called imm containing a value of type UInt16. In the latter case, it contains a field called op containing a value of type Op, and a field called target containing a value of type UInt26.
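
A minimal sketch of how such a value might be built and taken apart, assuming the Instruction definition above and the types Op and UInt26 are in scope. The names mkJump and opOf are hypothetical, and the named-field syntax shown follows the constructor and pattern forms described in sections 5.3 and 6; it has not been checked against a compiler.

```
-- Construction with named fields: each field is given by name.
mkJump :: Op -> UInt26 -> Instruction
mkJump o t = Jump { op = o; target = t; }

-- Pattern matching with named fields: only the fields of interest
-- need to be mentioned in the pattern.
opOf :: Instruction -> Op
opOf (Immediate { op = o; }) = o
opOf (Jump { op = o; })      = o
```

The field op can appear in both summands because it has the same type, Op, in each.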

NOTE:

Error messages involving data type definitions sometimes show traces of how they are handled internally. Data type definitions are translated into a data type where each constructor has exactly one argument. The types above translate to:

 data Bool = False PrimUnit | True PrimUnit

 data Operand = Register (Bit 5)
              | Literal (Bit 22)
              | Indexed Operand_$Indexed
 struct Operand_$Indexed = { _1 :: Bit 5; _2 :: Bit 5 }

 data Maybe a = Nothing PrimUnit | Just a

 data Instruction = Immediate Instruction_$Immediate
                  | Jump Instruction_$Jump

 struct Instruction_$Immediate = { op::Op; rs::Reg; rt::CPUReg; imm::UInt16; }
 struct Instruction_$Jump = { op::Op; target::UInt26; }