'What does data Vector :: * -> Nat -> * where mean in Haskell?

I'm looking at https://wiki.haskell.org/GHC/Kinds and I found this:

data Nat = Zero | Succ Nat
data Vector :: * -> Nat -> * where
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)

I tried looking about what does the where does in Haskell but I could not find info about it together with data. I have an idea of what a kind is. A constructor that takes no parameters has kind * and a constructor that takes one parameter has kind * -> *, it's the 'type' of the constructor. I think it's defining the kind of Vector as being * -> Nat -> * which means something that can take anything, a natural number and return a Vector?

And more important: why would someone use this stuff for?

haskell types

Solution 1:^[1]

The data ... where ... syntax is GADT syntax. (The wiki you were looking at also has a page about GADTs, linked from the page you were reading)

GADT syntax is enabled with the GADTs language extension. It gives you an alternative to the standard data declaration syntax, where instead of listing the data constructors as if they were applied to the types of their fields (thus implicitly defining the overall type of the constructor), you instead write an explicit type signature for each constructor.

For example, the standard Maybe type is defined like this in traditional syntax:

-- Maybe type constructor takes an argument a;
-- all data constructors implicitly return Maybe a
data Maybe a
  = Just a      -- Just data constructor takes an argument **of type** a, not a itself
  | Nothing     -- Nothing data constructor takes no arguments

And like this in GADT syntax:

-- Maybe type constructor takes an argument a
data Maybe a where
  Just :: a -> Maybe a   -- Just takes an argument of type a to return a Maybe a
  Nothing :: Maybe a     -- Nothing is simply of type Maybe a

This syntax is more verbose. In complex cases it can arguably be clearer, since we explicitly write the type of the constructors in ordinary type expression syntax, rather than defining them in a weird special-purpose syntax where we pseudo-apply term-level data constructors to the types of their arguments.

But more importantly, GADT syntax¹ opens up the door to new features that simply cannot be expressed in the original syntax. That is happening in your example.

Primarily the new features come from control over the return type of each constructor. If we wanted to try to define Vector using the traditional data syntax, we would have to do something like this:

data Vector a n
  = VNil
  | VCons a (Vector a n)

The n parameter is supposed to represent the vector's number of elements at the type level. The idea is so that we can do things like zip two vectors together with the compiler enforcing that the two vectors have the same length, rather than doing something like Data.List.zip where it simply gives up and silently discards any remaining elements from one list when the other runs out of elements.

But what we've written above can't do that. VNil always returns a value of type Vector a n, and it doesn't have any fields of type n (or a) so any VNil can be used with any n type parameter at all; n isn't going to say anything about the size of the vector! And similarly VCons has a field with a Vector a n in it, but is also going to end up constructing a value of type Vector a n, where the n parameter is the same, when we want it to indicate that the size is one larger than the tail-vector's size.

GADT syntax allows us to fix those problems:

{-# LANGUAGE GADTs #-}

data Vector a n where
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)

Now we can explicitly say that the VNil constructor is a vector of size Zero; it's not like Nothing :: Maybe a where it is always polymorphic in that type variable. (Our VNil is still polymorphic in the type variable a, of course, since having no elements means we shouldn't constrain what the element type will be). And VCons takes an a and a Vector a n to produce a Vector a (Succ n)²; specifically a vector that is "one item larger" than the vector inside it.

We missed a step though, which is actually defining Zero and Succ anywhere. The way to do that is simply:

data Nat = Zero | Succ Nat

This says: A natural number² is either Zero, or it is the Succ of some other Nat. We can pattern match a Nat repeatedly and we'll eventually³ hit Zero; if we wanted to convert this to a "normal" number we'd do that and count the Succ constructors.

But that has only given us Zero and Succ as data constructors. We were wanting to use them in the types of our VNil and VCons. For that we'll need another extension: DataKinds. It just allows use to use any data declaration both to define a (type-level) type constructor and its associated (term-level) data constructors and to define a (kind-level) "kind constructor" and its associated type constructors (which in turn do not contain any values at the term-level; they're purely for type manipulation).

Note that the wiki page you were looking at appears to be describing a not-yet-implemented language extension called Kinds, which is clearly the same as what is now DataKinds. As such, that wiki page is extremely old, so you should probably just ignore it. If you were looking for reading material about kinds in general (rather than the DataKinds extension specifically), you probably need to continue your search.

But using DataKinds we can do this (actually compiles now):

{-# LANGUAGE GADTs, DataKinds #-}

data Nat = Zero | Succ Nat

data Vector a n where
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)

Here we defined Nat (at the type level) and Zero & Succ (at the term level) as normal. But then in the definition of Vector we use Zero and Succ at the type level. This is enough for GHC to infer that in Vector a n the n must be of kind Nat (because it is used with type constructors in that kind). But we can also use yet-another extension KindSignatures and be more explicit about the kind of the type constructor Vector, like so:

{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

data Nat = Zero | Succ Nat

data Vector :: * -> Nat -> * where   -- this line is where the difference is
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)

Before we were just saying data Vector a n and trusting both the compiler and the reader to figure out from the usage below that a must be of kind * (it's used as the type of an actual value in the VCons constructor) and n is of kind Nat (because that position in the Vector type constructor is filled by type-level Zero and Succ n in the constructors). Now we can explicitly give the compiler and the reader more information up front with data Vector :: * -> Nat -> *; Vector takes a type parameter of kind *, another type parameter of kind Nat, and results in a type of kind * (which means it's a type that can actually have values at the term level).

It can be a little confusing with DataKinds whether a token like Zero refers to the term-level Zero (of type Nat, which is of kind *) or the type-level Zero (of kind Nat). Most of the time the compiler can perfectly tell which is which because it keeps very strong track of whether a given expression is a term expression or a type expression. But DataKinds gives you a way of being more explicit; if you prepend a type constructor with a single quote (like 'Zero) then it definitively means the data constructor promoted to type level. Some people consider it good style to always use this explicit marking.⁴

So that is pretty much everything in that is going on in your example.

As for what it's all for... at the most basic level it allows you do things like keeping track of the length of lists⁵, and then have the compiler enforce that multiple lists have the same size (or have sizes that have some specific other relationship, like being larger, or smaller, or twice as large, etc). People use that capability for a huge number of things that I can't possibly sum up in a single post (and not just with sizes of things; there are countless ways to use DataKinds and GADTs to reflect information at the type level so that the compiler can enforce things for you); it's a bit like asking "what are functions for". But here's a couple of example functions that do interesting things with the type-level length:

vzipWith :: (a -> b -> c) -> Vector a n -> Vector b n -> Vector c n
vzipWith _ VNil VNil = VNil
vzipWith f (a `VCons` as) (b `VCons` bs)
  = f a b `VCons` vzipWith f as bs

The zip that enforces the vectors have the same length, as I mentioned earlier. Guarantees that you can't have a bug because one list was accidentally shorter and you simply ignored elements of the other, instead the compiler will complain. Here's vzipWith at work in GHCI:

? vzipWith replicate (1 `VCons` (2 `VCons` VNil)) ('a' `VCons` ('b' `VCons` VNil))
VCons "a" (VCons "bb" VNil)
it :: Vector [Char] ('Succ ('Succ 'Zero))

With a bit more work we can define how to add type level Nats (using even more extensions, which I won't explain in detail here). Then we can append vectors while keeping track of the combined length:

type Plus :: Nat -> Nat -> Nat
type family Plus n m
  where Zero `Plus` n = n
        Succ n `Plus` m = Succ (n `Plus` m)

vappend :: Vector a n -> Vector a m -> Vector a (n `Plus` m)
vappend VNil ys = ys
vappend (x `VCons` xs) ys = x `VCons` vappend xs ys

And at work:

? vappend (1 `VCons` (2 `VCons` VNil)) (3 `VCons` (4 `VCons` (5 `VCons` VNil)))
VCons 1 (VCons 2 (VCons 3 (VCons 4 (VCons 5 VNil))))
it ::
  Num a => Vector a ('Succ ('Succ ('Succ ('Succ ('Succ 'Zero)))))

If you want to play around with it in GHCI, put all of this in a file and load it:

{-# LANGUAGE DataKinds, GADTs, KindSignatures, StandaloneDeriving, TypeFamilies, TypeOperators, StandaloneKindSignatures #-}

data Nat = Zero | Succ Nat

data Vector :: * -> Nat -> * where
  VNil :: Vector a 'Zero
  VCons :: a -> Vector a n -> Vector a ('Succ n)

deriving instance Show a => Show (Vector a n)


vzipWith :: (a -> b -> c) -> Vector a n -> Vector b n -> Vector c n
vzipWith _ VNil VNil = VNil
vzipWith f (a `VCons` as) (b `VCons` bs)
  = f a b `VCons` vzipWith f as bs


type Plus :: Nat -> Nat -> Nat
type family Plus n m
  where Zero `Plus` n = n
        Succ n `Plus` m = Succ (n `Plus` m)


vappend :: Vector a n -> Vector a m -> Vector a (n `Plus` m)
vappend VNil ys = ys
vappend (x `VCons` xs) ys = x `VCons` vappend xs ys

Here I have also added yet another extension StandaloneDeriving so we can derive a Show instance for Vector, so you can play around in the interpreter and see what you get.

¹ There is in fact an extension GADTSyntax that enables just the new syntax, but doesn't allow you to define any types that you couldn't have defined in the old syntax. Hardly anyone uses this extension as far as I know; GADTs are well-regarded and anyone who bothers to learn the new syntax only does so in the context of learning about GADTs, so if they like the new syntax and want to use it everywhere they're probably just enabling GADTs everywhere.

² "Succ" is short for "successor". The standard way of defining natural numbers from first principles is to assume that there exists the first natural number (zero), and that for any natural number has a successor which is a different natural number (and is not the successor of any other number). It's a fancy way of saying you can start at zero and count up from there as far as you want to go. But this inductive structure of natural numbers happens to map very nicely into Haskell's type logic, making it very easy to count things at type level using numbers defined this way, which is why it's being used here.

³ Unless it's infinite, which means our attempt to "count the succs" will never terminate, which is another way to produce bottom/undefined in Haskell.

⁴ This "tick" syntax is necessary in some cases, for disambiguation. For example, with DataKinds we can have type level lists of types, like [Bool, Char, Maybe Integer], because we can promote lists to operate at type-level instead of term-level. [Bool, Char, Maybe Integer] is a type-level list of kind [*] (a list of things of kind *, i.e. a list of types). The problem arises when we consider things like [Bool]. This is definitely a type expression, but is it the list type constructor applied to Bool (meaning, the type of terms that are a list of boolean values), or is it a singleton list-of-types (i.e. the list data constructors : and [], promoted to type level), whose single value happens to be the type Bool? One is of kind *, and the other is of kind [*]. There's no way to tell the programmer's intent from looking at it, so the rules of DataKinds say we prioritise the pre-DataKinds interpretation; [Bool] is definitely the type of lists of boolean values. We can use the tick mark to explicitly choose the other interpretation: '[Bool] is a singleton list of types.

⁵ Lists whose type reflects the length of the list are traditionally called vectors, to distinguish them from ordinary lists whose type says nothing about the length. Perhaps this is not ideal, since there are several other things that are also called "vectors" to distinguish them from ordinary lists. But it's established terminology.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1

'What does data Vector :: * -> Nat -> * where mean in Haskell?

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]