Type-Driven Development with Idris - Review

2017-10-23

Summary

Type-Driven Development with Idris introduces a software development approach that uses types as the primary tool for building softwares. In this paradigm, types are given much more responsibilities than their traditional role of checking data validity. For example, types can be used to represent the input and output states of functions or various contracts that functions and datas must fulfill. Such descriptions and contracts are enforced at compile time, providing much stronger guarantee of software correctness at compile time than other approaches can.

Idris programming language is used in the book to teach type-driven development (TDD). Idris is a general purpose pure functional programming language created by Edwin Brady, who is also the author of this book. Idris is inspired by Haskell and ML and sports a state of the art type system suited for TDD. You will learn basics of Idris language alongside TDD in this book.

Who Is It For

I recommend this book to those who are interested in using statically typed system to build programs with strong guarantee of correctness. You should be familiar with basic concepts of functional programming such as immutability, closure, or higher-order function. It would be very helpful if you are already familiar with functional programming languages like Haskell, OCaml, or Scala. The book introduces a lot of new concepts, so you’d rather avoid having to learn about functors while trying to understand those other new concepts introduced in the book.

I do not recommend this book if you are not interested in statically typed system or if you are a pragmatist interested in concepts and languages that can be used in production right away. In addition, if you have little experience with functional programming, this book might feel overwhelming because of the sheer amount of prerequisite knowledge.

What Makes It Special

What I appreciate the most is that the author maintains clear focus on his stated goal of introducing TDD. Far too many books, especially the ones that deal with topics that their authors are passionate about, have unncessary chapters that digress from the topic of the book. In contrast, this entire book is written and organized for the introduction of TDD with a clear sense of progression between each chapter. The author begins by introducing basic concepts and setting up development environment; then he provides increasingly advanced examples of TDD; in the final five chapters, he presents somewhat more realistic examples like file IO or concurrent programming.

Another thing I like about this book is that the author teaches the concepts systematically, which is a somewhat rare skill among technical writers. At the beginning of the book, he introduces an iterative process for TDD: “type, define, refine: writing a type, implementing a function to satisfy that type, and refining the type or definition as we learn more about the problem.” This process is used throughout the book, each time in a slightly different context to demonstrate its application. Such repetition of identical process under different circumstances is one of the most effective method for teaching.

The author also introduces new concepts through tutorial-like examples. He first provides some code to serve as a starting point, and then guides the readers through each step, all the while adhering to the “type, define, refine” process. Here is a simple example with step-by-step progression from Chapter 3:

-- For example, to write the exclusive OR operator, you could follow these steps:

-- 1. Type — Start by giving a type:
xor : Bool -> Bool -> Bool

-- 2. Define — Press Ctrl-Alt-A with the cursor over xor to add a skeleton definition:
xor : Bool -> Bool -> Bool
xor x y = ?xor_rhs

-- 3. Define—Press Ctrl-Alt-C over the x to give the two possible cases for x :
xor : Bool -> Bool -> Bool
xor False y = ?xor_rhs_1
xor True y = ?xor_rhs_2

-- 4. Refine—Complete the definition by filling in the right sides:
xor : Bool -> Bool -> Bool
xor False y = y
xor True y = not y

Lastly, the author also provides nice exercises for each chapter. Whereas some books provide difficult exercises that can only be solved when you thoroughly understand new concepts, the exercises in this book are relatively simple ones aimed at getting you more familiar with them. I think that this kind of exercises is more appropriate when you are trying to learn something on your own through books.

Type-Driven Development (TDD)

In functional programming paradigm, a program is seen as a process of transforming data from one form to another. Such data forms or structures are represented as types, and the transformation process is represented as a function with input and output. TDD puts emphasis on using types and functions to depict the overall plan and structure of a program. The main process in TDD is “type, define, refine”, as previously mentioned.

As its name suggests, in TDD we write down types first. These types are used to explicitly represent the conceptual model of the program that we intend to write. Let’s say we want to represent the states of matter. We could start by defining data Matter = Solid | Liquid | Gas. Then we can represent a phase transition from one state to another as a function phaseTransition : Matter -> Matter.

Once we define initial types and functions, we keep refining these definitions so that the software would more precisely match our conceptual model. Let’s say we’d like to add the plasma state to our program. Then we can refine our type like data Matter = Solid | Liquid | Gas | Plasma. We repeat this “type, define, refine” process until we reach a satisfactory iteration.

I believe that rather than being a completely novel approach, TDD is more of an organized description of a software development approach that organically arose in communities that use statically typed functional languages like Haskell. Whenever I read or write Haskell programs, I find myself focusing foremost on type definitions and function type signatures to gain overall understanding of the program. TDD felt like an improved and formalized version of such an approach.

Idris Programming Language

Having said that, TDD as presented in this book is way more powerful than what little I’ve seen in Haskell. It is because of additional features offered by Idris programming language. Let’s look at some of its notable characteristics.

First-class Type

Idris has first-class type and can pass around types as arguments to other functions. If you’ve experienced the paradigm shift that occurred when first-class function was introduced, you could guess how much this changes the game. Like first-class function, first-class type opens up a completely new way to express our intents through code.

As a simple example of first-class type, in Idris List type is defined as data List : (elem : Type) -> Type. It’s just a regular type that takes another type as an argument, without special interface-like syntax provided by the language. Unfortunately, the full implications of first-class type is hard to demonstrate through such trivial example. In Idris, this feature is used mostly in conjunction with dependent type, which we will look at next.

Dependent Types

Dependent type is one of the defining features of Idris. It allows us to define types that can be calculated from other values. Personally, I found it easiest to think about it as a function at type level.

For example, we can think of Vector 4 String type, which represents a list of String with exactly 4 elements. We can keep creating similar types like Vector 0 String, Vector 1 String, Vector 2 String, and so on by entering different values as their lengths. This can be generalized as Vector n String type, which is essentially a function that takes n as input and returns types as outputs. The type of output returned by this function is “dependent” on the value of n, so this kind of function is called a dependent function. The type of this function, in turn, is called dependent function type or shortened to dependent type.

Dependent type gives us a new dimension of code expressiveness, enabling us to incorporate more information about program into the type system. I found two use cases most interesting. First, information about data that was available only at runtime can be made available at compile time. For example, Integer type tells us that a value of its type will be an integer. But whether a variable of that type would be a positive or negative number, or zero can only be determined at runtime. With dependent type, we can define more precise types such as “positive integers that are multiples of three” or “odd integers greater than -50 and less than 9953”. We’ve already seen the Vect n String type, which represents “lists of n String items”. Because such information is guaranteed by the type system at compile time, we never have to worry about unexpected values.

Second, description about what constitutes a valid program can be represented in the type system. For example, in the context of bank transactions, we can define Withdrawl dependent type, which takes not just the amount of money to withdraw, but also “valid credentials” and “bank account with balance greater than withdrawl amount” as arguments. Assuming that we defined the Withdrawl type properly, any value of Withdrawl type would always represent a valid bank transaction.

Traditionally, such information has been described in separate documents such as UML, comments of the source file, or worse, in the head of the developer. In Idris, it can be directly represented as part of the source code and made available to the compiler through dependent types, ensuring that a program’s description and implementation can programatically refer to each other. This means that any change to software requirements will be immediately reflected in the description of the program, which the compiler will use to check whether rest of the program follows the new requirements. There would be no more need to continuously go back and forth between documents and source code to ensure that new requirements are reflected.

In the book, the author demonstrates several use cases of dependent type. I’ll extend the state of matter example to showcase how to represent input and output states of operations in a dependent type.

data Matter = Solid | Liquid | Gas

data PhaseTransition : Type -> Matter -> Matter -> Type where
    Melt : PhaseTransition () Solid Liquid
    Vaporize : PhaseTransition () Liquid Gas
    Condense : PhaseTransition () Gas Liquid
    Freeze : PhaseTransition () Liquid Solid
    Sublime : PhaseTransition () Solid Gas
    Deposit : PhaseTransition () Gas Solid

    Pure : ty -> PhaseTransition ty state state
    (>>=) : PhaseTransition a state1 state2 ->
            (a -> PhaseTransition b state2 state3) ->
            PhaseTransition b state1 state3

PhaseTransition dependent type describes potential phase transitions among states of matter, where last two arguments refer to input and output states. For example, a value of PhaseTransition dependent type constructed with Melt constructor requires that input and output states must be Solid and Liquid, respectively. We can use these operations to write a sequence of phase transitions like following:

vaporizeIce : PhaseTransition () Solid Gas
vaporizeIce = do Melt
                 Vaporize

vaporizeIceThenFreezeAgain : PhaseTransition () Solid Gas
vaporizeIceThenFreezeAgain = do Melt
                                Vaporize
                                Deposit

freezeSteam : PhaseTransition () Gas Solid
freezeSteam = do Freeze
                 Freeze

We can see that the last function does not make sense. To make a gas into solid, we need to condense and then freeze it. Instead, freezeSteam function conducts freeze operation twice. Compiler will also notice that the sequence of phase transitions in freezeSteam is invalid and and report an error at compile time, which is made possible through dependent type.

Powerful REPL

Idris has one of the most powerful REPLs I have seen. Let’s take a look at the example I’ve already given above.

-- For example, to write the exclusive OR operator, you could follow these
-- steps:

-- 1. Type — Start by giving a type:
xor : Bool -> Bool -> Bool

-- 2. Define — Press Ctrl-Alt-A with the cursor over xor to add a skeleton definition:
xor : Bool -> Bool -> Bool
xor x y = ?xor_rhs

-- 3. Define—Press Ctrl-Alt-C over the x to give the two possible cases for x :
xor : Bool -> Bool -> Bool
xor False y = ?xor_rhs_1
xor True y = ?xor_rhs_2

-- 4. Refine—Complete the definition by filling in the right sides:
xor : Bool -> Bool -> Bool
xor False y = y
xor True y = not y

Something that starts with ?, in this case ?xor_rhs, ?xor_rhs_1, and ?xor_rhs_2, is a hole. It stands in for incomplete parts of the program, and Idris REPL provides amazing commands to work with holes. The instructions in the above code refer to some basic commands, such as automatic case splitting or displaying which type the hole stands for. In the following gif, you can see me actually following these instructions. Working in Idris REPL is an incredibly pleasant experience. It feels like having a meticulous and helpful assistant, not a nagging manager.

Other Musings

Overall reading the book was an eye-opening experience. This was by far the most difficult programming language book that I’ve read, as it introduced a lot of unique concepts. For example, I was dumbstruck when I learned that I had to convince the Idris compiler that 1 + k and k + 1 are equivalent types to pass the type check. I’ve never seen anything like that in other languages.

Idris is a visionary language, which tries to address some inherent flaws of current models of programming. I like it as a language, but I doubt it will ever become a mainstream one. First of all, it requires even more background knowledge than Haskell to use it. On the other hand, its type system is too powerful for typical tasks. Mainstream is all about being “just good enough”, but Idris is more about “above and beyond”. For typical shopping mall inventory management system, Idris is a bit overqualified. Moreover, I have a hunch that in collaborative environment it would be hamstrung by its excellent expressiveness. There would be so many ways to express something that each team member would end up writing programs in different styles, which causes significant problems for collaboration. There’s a reason that Google keeps Go language so plain despite continued criticisms.

Having said that, if I had to write software for nuclear reactors, I would definitely pick Idris, because it gives me unparalleled confidence in my code for such a critical system.