(Haskell in Haskell) 1. Setup

This is the first “real” post in the Haskell in Haskell series.

In this post we’ll go over setting up a basic project in Haskell. We’ll be building our compiler on top of this foundation in the next posts.

(The code for this part is available here)

By the end of this series, we’ll have written a Haskell program that reads in Haskell source code, and then spits out C code.

We could write all of our code in a big Haskell file, compile that, and then call it a day, but this isn’t a manageable solution. This is why we want to make a project, which will allow us to easily combine multiple source files into a single program.

Haskell Tools

Haskell, unlike other languages, doesn’t come with “out of the box” support for building projects; at least, not adequate support for what we’d like to do. We’re going to need some additional tooling to help us.

Compiler

I’ve been saying “Haskell”, when talking about the compiler, but what I really mean is GHC. In theory, there’s a Haskell specification, so you could use a different compiler. In practice, GHC is so good, and so entrenched, that nobody bothers to use anything different. Almost every single project depends on details specific to how GHC works, including its numerous extensions, so you couldn’t even use another compiler if you wanted to.

We could’ve limited ourselves to a small subset of Haskell, so that our compiler could process itself, but this is too much of a limitation. We’ll be making use of GHC’s features to make our work easier.

Build Tools

Next, we need a build tool, outside of the compiler itself, that allows us to manage the source files in our project. This tool will also be responsible for collecting the libraries that our project depends on, making sure the versions are compatible. We won’t be relying on very many libraries. We just need a handful, belonging to the “standard canon” of Haskell anyway.

At the time of writing, there are basically two build tools for Haskell, Cabal, and Stack.

The difference between Cabal and Stack is mainly historical. In practice, they’re more or less the same nowadays. The main difference between the two is that Cabal will use Hackage as its source of packages, while Stack will use Stackage. Stackage pulls its packages from Hackage, but a group of maintainers make sure that the packages build together. This delays the accessibility of packages, but provides more stability for programmers.

If you’re familiar with Linux distributions, Hackage is a bit like Arch, with community provided packages, and minimal vetting, whereas Stackage is a bit like Ubuntu, where you have maintainers curating the versions of packages in the repository, and providing stable LTS releases.

Back in the day, Cabal would install packages globally, unless you explicitly created a Sandbox for each project. Stack was created to address this concern, as well as to use Stackage which was seen as providing advantages over using Hackage directly. Stack only installed packages locally for each project, allowing different projects you have to use different versions of packages. Eventually, Cabal created the new-build, new-run, etc. commands to provide this behavior. Since these commands are now the default, Cabal essentially acts like Stack nowadays.

All this to say that these tools are basically equivalent, but if you like having the LTS snapshots that Stackage provides, you may want to use Stack.

For our purposes, we’ll be using Cabal. In practice involving Stack involves installing Cabal anyways, and ultimately Cabal works very well nowadays, and I personally find Stack to be somewhat superfluous.

ghcup

We know that we’ll be needing a compiler (GHC), and a build tool (Cabal), but what’s the easiest way to install them? On Linux and macOS, at least, the answer is probably ghcup.

This isn’t just an installer for GHC and Cabal, but rather a tool that you install once, and that then lets you update your version of GHC and Cabal as often as you want. This way you can just run this setup process a single time, and then have an easy way to follow the new releases of GHC as they come out.

The website for ghcup has very simple installation instructions for Linux and macOS, which just involve fetching and running a bash script. Unfortunately, for Windows this tool doesn’t seem to be available, but the website also provides instructions towards setting up the Haskell Platform, which seems to be the preferred method of installation here.

Regardless, if you’d like to install GHC and Cabal through other means, go right ahead. As long as you have recent versions of both of these tools, there shouldn’t be any problems following along with this project.

Cabal Project

All right, now that we have the tools of our craft, it’s time to start putting them to good use!

The first thing we want to do is to create a project using Cabal. A project will allow us to easily work across multiple source files, and use a handful of useful libraries in our program. Furthermore, by providing this series through a project, it should hopefully work in the future, because the dependencies will be pinned to specific versions known to work. Even if the packages release new versions, the ones we use for this series will still be there.

First thing’s first, create a directory for your project. Inside of this directory, run:

cabal init

This will create a project in that directory, sgenerating some files automatically. You can go ahead and delete CHANGELOG.md, if you’d like. The .cabal file contains information about our project. This includes where our source files are, what libraries we depend on, etc.

After removing some of the superfluous comments, and other irrelevant stuff, you should have something like:

cabal-version:       >=1.10

name:                haskell-in-haskell
version:             0.1.0.0
author:              <Your Name>
maintainer:          <Your Email>
build-type:          Simple

executable haskell-in-haskell
  main-is:             Main.hs
  build-depends:       base >=4.13 && <5
  default-language:    Haskell2010

Most of this is pretty straightforward metadata. We then have an executable named haskell-in-haskell. When we run cabal build this builds that executable, with the same name. Running cabal install will build the executable, and then put it in your path, so you can run haskell-in-haskell in your terminal to access the compiler you’ve written.

Finally, we can do cabal run, or cabal run haskell-in-haskell to run this program. After running this command, you should see a bunch of stuff as Cabal builds it for the first time, and then, finally:

Hello, Haskell!

Hello Haskell indeed.

In the executable section. You can see that we’ve specified the libraries we’re depending on. In this case, just the standard library, and a main source file for our program: Main.hs.

Looking at Main.hs, we see:

module Main where

main :: IO ()
main = putStrLn "Hello, Haskell!"

The main function in this file is what gets run by our executable, which explains why we got that nice message earlier. If you modify the message printed here, and run the command again, you’ll see the output changed to match.

Library

So, right now we do have a program, but it doesn’t do much besides print a little message. We’re going to eventually turn this program into a full fledged compiler. We could do this by adding more and more code to Main.hs, but this isn’t exactly very pretty nor modular. We could add more source files to our executable section, telling Cabal to use them to build our program as well. This would work, but a more common practice is to create a separate library section, exporting a package containing the main functions your program needs, and then using that library inside of a small Main.hs to create an actual program.

The idea behind this separation is that it makes it easier to expose the functionality behind a program to other Haskell users, since you can provide a library doing the same thing as your command line program. Furthermore, by forcing yourself to write code in the form of a library, you have to think a bit more about modularity and good code structure.

Let’s go ahead and add a library section to our Cabal file:

library
  build-depends:       base >=4.13 && <5
                     , containers >=0.6 && <0.7
                     , mtl >=2.2 && <2.3
  default-language:    Haskell2010
  exposed-modules:     Example
  ghc-options:         -Wall
  hs-source-dirs:      src

We’ve included some other libraries besides the standard library this time. containers will provide us with a Map, and a Set data structure, which are going to be very useful at different points in the compiler. mtl, or “Monad Transformer Library” provides us with, well, Monad Transformers. You don’t need to understand Transformers in all of their glory to follow this project. What we’ll be doing is setting up little “computational contexts” at different points in the compiler, mainly to throw exceptions, but also to read from some store of information we’ve gathered previously, or to keep track of a little bit of state as we do our compiler work.

hs-source-dirs says that we’ll have our source files in the src directory. We also have a single module exported by our library: Example. Because src is our source directory, Cabal expects to find this module in src/Example.hs. Let’s go ahead and add some simple code there:

module Example (example) where

example :: String
example = "This is an example"

This module does nothing more than export the string example. With this, our library is “complete”, even thought it doesn’t really do anything yet. Later we’ll expand this library section until it has a complete compiler!

Connecting Library to Executable

Now we want to connect our library with our executable, so that our main function can use this example value. Later on, our main function will be calling the functionality of our compiler, so it makes sense to get started on this right away.

We need to modify our Cabal file’s executable section to depend on our library:

executable haskell-in-haskell
  main-is:             Main.hs
  build-depends:       base >=4.13 && <4.14
                     , haskell-in-haskell
                     , pretty-simple >=4.0 && <4.1
  default-language:    Haskell2010

We’ve added a dependency on haskell-in-haskell, the library we’ve just created, as well as a dependency on pretty-simple. This is a library that allows us to pretty print data structures without any extra code! This is very useful, because we want to allow users of our compiler to stop it midway, and print out the intermediate results. This library allows us to print those data structures in a nice way, without having to add the boilerplate for formatting them.

We don’t need this dependency right away, but we might as well set this up, so that we won’t even need to touch the Cabal file in the next parts of this series.

We can now use the library in Main.hs:

module Main where

import Example (example)

main :: IO ()
main = putStrLn example

Running the whole project with cabal run haskell-in-haskell, you should now see:

This is an example

Custom Prelude

So far, we’ve set a basic project up, but none of the code we have is actually useful. We’re going to scrap all of this toy code with actual compiler stuff soon enough.

We are going to add one piece of useful code in this part: a Custom Prelude!

The Haskell Prelude consists of a subset of the standard library that is imported into every file, and available by default in the REPL. So if you fire up ghci, and type out sum [1, 2, 3], this works because sum is in the module Prelude.

A custom Prelude allows us to provide our own module to replace the standard one. This is a piece of code that we can write now, at the beginning of the project, and never really touch again.

Why a Custom Prelude

“Why are we even bothering to replace the standard Prelude?”, you might ask. Ultimately, this is a subjective concern, but I personally feel like some of the advantages of using a custom Prelude are very useful for this project.

Haskell’s Prelude hasn’t changed much over the years. Because of this, it has accumulated some cruft to it. One of the biggest problems is the pervasive use of String over Text. String provides a linked list of characters (literally type String = [Char]), but Text provides a much more efficient byte-array based representation. Basically, you should always use Text if you can.

Unfortunately, because Haskell’s Prelude makes this choice a bit of a pain, in practice you end up having to go through String to do a lot of things. In fact, for this project, we’ll be using String throughout, rather than Text, because this is ultimately simpler, and avoids a good amount of boilerplate. Furthermore, we’re not working on a production compiler, and we’re cutting corners in plenty of places already. This project is about learning a simple, but complete approach to implementing a reasonable subset of Haskell, so I feel ok papering over things like this.

Many commonly used functions are also missing from the standard Prelude. For example, foldl has the wrong strictness, and performs much worse compared to the strict foldl'. Unfortunately foldl is in the Prelude, while you need to import foldl' from Data.List. This kind of thing is pervasive, and the line between what does and doesn’t get included seems quite arbitrary at times.

There are plenty of libraries that provide alternative Preludes: Protolude, Relude, among many others. For this project, I’ve decided to just write our own, with a handful of extra functions.

In practice, I’d recommend you find an alternative Prelude you like, as a library, and then just stick with that (Relude would be my personal choice). For this project, I don’t want to depends on the idiosyncracies of any particular Prelude though. Choosing one would work, but it might alienate people not familiar with how it works.

Instead, I’m just introducing a few simple operators that make code much more readable, and consistent, in my opinion, and a few utility functions that we use often enough to warrant being in our Prelude.

Creating Ourlude

So, we’ll be creating a file src/Ourlude.hs to contain our custom Prelude. Let’s go ahead and add this to our Cabal file:

library
  ...
  exposed-modules:     Ourlude
                     , Example

And now let’s create the src/Ourlude.hs file:

module Ourlude
  ( module Prelude,
    (|>),
    (<|),
    (>>>),
    (<<<),
    foldMapM,
    mapLeft,
    first, second
  )
where

import Data.Bifunctor (first, second)

The first thing of note is that we’re exporting the entire standard Prelude! As much as I’ve ragged on about it, ultimately, all our custom Prelude does is add a few things to the standard Prelude.

What we’re including

first and second come from Data.Bifunctor.

What they do is map over the first and second parts of what’s called a “Bifunctor”:

first :: Bifunctor f => (a -> a') -> f a b -> f a' b
second :: Bifunctor f => (b -> b') -> f a b -> f a b'

You can think of a Bifunctor as being like a Functor, but with two slots you can map over, instead of just a single one. Examples include Either a b, and tuples (a, b). Concretely, we’ll be using first and second for their effect on tuples. In this case, the functions work like this:

first f (a, b) = (f a, b)
second f (a, b) = (a, f b)

These are just used often enough throughout the compiler to warrant including.

We’ll be implementing the rest of the functions ourselves, so make sure to add them to Ourlude.hs if you’re following along.

First we have: foldMapM:

foldMapM :: (Monad m, Monoid b) => (a -> m b) -> [a] -> m b
foldMapM f = mapM f >>> fmap mconcat

This might seem a bit weird to include, but it’s actually surprisingly useful. First, remember foldMap:

foldMap :: Monoid b => (a -> b) -> [a] -> b

This function takes a list [x1, x2, ..., xN] and gives us f x1 <> f x2 <> ... <> f xN, where <> is the monoidal operation. For example, you could use this to show a bunch of integers, and then concatenate the digits:

foldMap show [1, 2, 3]

This is useful in a compiler, because very often you’re traversing lists of stuff, and producing some kind of output that you want to merge together. For example, if you want to find the set of names used in an list of expressions, you’d use foldMap, given a function that finds the set of names used in a single expression.

foldMapM is useful, because you’re doing this kind of traversal inside of a monadic context. For example, while squashing this output, you might need to throw an error, or lookup some information that you have access to in this context.

Forward Composition

The remaining functions come as an ensemble really:

infixl 1 |>
(|>) :: a -> (a -> b) -> b
x |> f = f x
{-# INLINE (|>) #-}

infixr 0 <|
(<|) :: (a -> b) -> a -> b
(<|) = ($)
{-# INLINE (<|) #-}

infixr 9 <<<
(<<<) :: (b -> c) -> (a -> b) -> (a -> c)
g <<< f = g . f
{-# INLINE (<<<) #-}

infixl 9 >>>
(>>>) :: (a -> b) -> (b -> c) -> (a -> c)
f >>> g = g . f
{-# INLINE (>>>) #-}

Two of these operations should be well known to you. <<< is actually . in disguise, and <| is $ as well. The reason I include these is strictly for consistenty with the other two.

The other two provide forward composition. So f >>> g is nothing other than g <<< f, which is g . f, in more familiar terms. Similarly x |> f is just f $ x, and just f x, really. (The advantage of $ comes with multiple functions, or complicated expressions like f $ 2 + 2).

The reason I include forward composition operators is because I find forward composition better in most cases, compared to backwards composition. What I mean by “forward”, is that if we take an expression like f (g (x)), i.e. f $ g $ x, or f <| g <| x, we have x first “entering” g, and then “entering” f. So the data moves from right to left, which is backwards relative to the left to right order you’re reading this text in.

By contrast x |> g |> f has the data moving forwards, or left to right.

Similarly \x -> f (g x) becomes f <<< g, or g >>> f, and the same remarks apply.

I think forward composition is better in most cases, simply because it matches the standard reading order. For example, take this expression:

Matrix . map stripHead . filter isDefault $ rows

Now, when I read this, I have to first move my attention all the way to the end of the line, and then crane going backwards from right to left, in order to see how rows is transformed into the final result.

This wouldn’t look all that strange to many Haskellers, since we’re so used to reading things in this order. But if you use <|, as we will, this does look a bit odd:

Matrix <| map stripHead <| filter isDefault <| rows

It looks odd, because it is. You’re taking a piece of data, and transforming it, but you’re doing so by describing transformations in the opposite order in which they happen. It’s much more natural (at least in English) to say

“take the rows, then filter using isDefault, then map stripHead, then wrap that in Matrix”

versus

“wrap in Matrix before mapping with striphead, before filtering using isDefault, over the rows”

Let’s have some catharsis:

rows |> filter isDefault |> map stripHead |> Matrix

This looks much more straightforward to me, and matches the natural reading order.

That’s not to say that the other direction isn’t useful. It is, just not for longer chains like this. Where the other direction is useful is when it reminds us of “normal” function application.

For example, a large expression like this clearly benefits from using |>

transpose (swap index (transpose pats))

pats |> transpose |> swap index |> transpose

On the other hand:

\stuff -> map fst (filter okay stuff)

doesn’t have enough nested computation to warranted using |>, it’s readable as is.

If wanted to eta-reduce this, i.e. remove the lambda, it makes sense to do it like this:

map fst <<< filter okay

Here f <<< g is supposed to be immediately evocative of f (g x). In general, I prefer using <<< when I’m trying to draw a parallel with direct application.

<|, is used in the same way as $, as a kind of syntactic trick:

Matrix <| rows1 ++ rows2

return <| case x of

map <| \case

runMonadM <| do

...

What we’re doing here is making use of the low precedence of <| to neatly avoid wrapping things in parentheses. This is especially important in front of block expressions, because wrapping a block in parentheses is a bit awkward:

f (case x of
  Nothing -> 0
  Just y -> y
)

Making Ourlude Default

So, now we have a module Ourlude, which contains the functions we want in our custom Prelude, but we haven’t actually made it the Prelude. In fact, Prelude is still imported by default in every file, and this will conflict with Ourlude when we try to import it as well.

To amend this, we need to set things up so that Prelude is not imported by default. Then we can manually import Ourlude in every file, and things will work out.

Thankfully, Haskell provides an extension to make the Prelude not implicitly imported, called, well NoImplicitPrelude. We could add:

{-# LANGUAGE NoImplicitPrelude #-}

at the top of every file, but it’s easier to just add this as a default extension in our Cabal file. This makes it so that it’s included as an extension in every file.

executable haskell-in-haskell
  ...
  default-extensions:  NoImplicitPrelude

library
  ...
  default-extensions:  NoImplicitPrelude
  ...

Now, this breaks all of our code, because we assumed the Prelude was imported before.

We need to import the Prelude in Ourlude.hs, to be able to reexport things:

-- ...
import Prelude
-- ...

Let’s change Main.hs to import Ourlude now:

module Main where

import Ourlude

main :: IO ()
main = putStrLn <| "Hello Again!"

Here we use the <| operator from our custom Prelude, just as a little check that everything’s working right.

Let’s go ahead and fire cabal run haskell-in-haskell again. You should see:

Hello Again!

Conclusion

And that’s all for this part!

Even though we’re not really doing any compiler stuff yet, we’ve gotten a project set up, and a lot of the boilerplate necessary for the project out of the way.

The code for this part is available here.

Stay tuned for the next part of this series, where we’ll finally get into compiler goodness, by writing our own lexer for Haskell, and all of its whitespace trickery!