Databricks System Tables Quickstart

[Views and opinions are my own.]

Databricks released System Tables, which allow a user with sufficient GRANTs to inspect system-level data about their account. I write this following a post by one of my colleagues that provides a comprehensive overview of:

  1. how these tables work,

  2. what one may find within these tables, and

  3. how to operationalize their use.

This post does not rehash those already well-covered topics. Instead, it serves as a Quickstart guide for those who are technically savvy, up against a deadline, and just want to jump right in. I have written a great deal of SQL against these tables for a number of use cases, and can reasonably state that the specific uses shown below are extremely common and can form the foundation of a dashboard of operational insights. We divide it into two sections: Billing and Informational.

We take some variables as given:

  • tag - Some queries search and organize information by the custom tags set on the securable object.

  • interval_hours - The number of hours to look back. Of course, this can be changed by the reader to fit the need.

  • discount - If any discount exists, we want the accurate discounted price, not the list price.

The queries also make plenty of use of SQL functions, which can be read about at this link. The code below can be found on GitHub here.

Billing

Here there are two main tables:

  1. system.billing.usage

  2. system.billing.list_prices

DBUs

These are the basic unit for measuring usage; please read the content behind this link if you do not understand them. Without an understanding of how DBUs are emitted, no billing dashboard would be possible. The usage table can be queried for a great deal of information. Since this is a Quickstart, we’ll just show how to aggregate DBUs by the common features: cluster, tag, and SKU. Aggregating over other features is a simple find-and-replace through these queries.

DBUs by cluster:

SELECT usage_metadata.cluster_id, SUM(usage_quantity) AS DBUs
FROM system.billing.usage
WHERE (usage_start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
   OR usage_end_time >= current_timestamp() - make_dt_interval(0, :interval_hours))
   AND usage_metadata.cluster_id IS NOT NULL
   AND usage_unit = 'DBU'
GROUP BY (usage_metadata.cluster_id)
ORDER BY 2 DESC

DBUs by tag:

SELECT tag, SUM(DBUs) AS DBUs
FROM (
  SELECT CONCAT(:tag, '=', IFNULL(custom_tags[:tag], 'null')) AS tag, usage_quantity AS DBUs
  FROM system.billing.usage
  WHERE (usage_start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
    OR usage_end_time >= current_timestamp() - make_dt_interval(0, :interval_hours))
    AND usage_metadata.cluster_id IS NOT NULL
    AND usage_unit = 'DBU')
GROUP BY (tag)
ORDER BY 2 DESC

DBUs by SKU for Classic Compute:

SELECT sku_name, SUM(usage_quantity) AS DBUs
FROM system.billing.usage
WHERE (usage_start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
   OR usage_end_time >= current_timestamp() - make_dt_interval(0, :interval_hours))
   AND usage_metadata.cluster_id IS NOT NULL
   AND usage_unit = 'DBU'
GROUP BY (sku_name)
ORDER BY 2 DESC

DBUs by SKU for Serverless SQL:

SELECT sku_name, SUM(usage_quantity) AS DBUs
FROM system.billing.usage
WHERE (usage_start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
   OR usage_end_time >= current_timestamp() - make_dt_interval(0, :interval_hours))
  AND usage_unit = 'DBU'
  AND sku_name LIKE '%SERVERLESS_SQL%'
GROUP BY (sku_name)
ORDER BY 2 DESC

Cost

To find cost, we need to understand usage as above and multiply by the price. This requires a join between the usage and pricing tables. Further, since pricing can change, we need to ensure we split the usage amounts and times across price-change boundaries. To keep things simple, we employ CTEs.

WITH current_pricing AS (
  SELECT
    sku_name, price_start_time, price_end_time,
    (
      pricing.effective_list.default
      * (1.000 - (:discount / 100.000))
    ) AS price_with_discount
  FROM system.billing.list_prices
  WHERE usage_unit='DBU'
    AND currency_code='USD'
    AND (price_start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
      OR price_end_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
      OR price_end_time IS NULL)
),
current_usage AS (
  SELECT sku_name, usage_start_time, usage_end_time, usage_quantity AS DBUs
  FROM system.billing.usage
  WHERE (usage_start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
    OR usage_end_time >= current_timestamp() - make_dt_interval(0, :interval_hours))
    AND usage_metadata.cluster_id IS NOT NULL
    AND usage_unit = 'DBU'
),
current_usage_with_pricing AS (
  SELECT
    p.sku_name,
    UNIX_MILLIS(p.price_start_time) AS price_start_time,
    UNIX_MILLIS(p.price_end_time) AS price_end_time,
    UNIX_MILLIS(u.usage_start_time) AS usage_start_time,
    UNIX_MILLIS(u.usage_end_time) AS usage_end_time,
    u.DBUs,
    p.price_with_discount
  FROM   current_pricing p
    JOIN current_usage u
    ON p.sku_name = u.sku_name
      AND (CASE
        WHEN p.price_end_time IS NOT NULL THEN
          (u.usage_start_time BETWEEN p.price_start_time AND p.price_end_time
          OR u.usage_end_time BETWEEN p.price_start_time AND p.price_end_time)
        ELSE
          (u.usage_start_time >= p.price_start_time
          OR u.usage_end_time >= p.price_start_time)
        END
      )
)
SELECT SUM(
  (
    (ARRAY_MIN(ARRAY(usage_end_time, price_end_time))
     - ARRAY_MAX(ARRAY(usage_start_time, price_start_time))
    )
    / (usage_end_time-usage_start_time)
  ) * DBUs * price_with_discount
) AS cost
FROM current_usage_with_pricing

Here, we get the relevant pricing as current_pricing and the relevant usage as current_usage, join the two together, and then aggregate the values. This should produce a value in dollars for the cost of clusters within the time interval. Note the usage_metadata.cluster_id IS NOT NULL in the filter, which restricts the calculation to classic clusters.

Just for good measure, we show a few more common uses.

%DBUs by SKU

WITH usage AS (
  SELECT sku_name, usage_quantity
  FROM system.billing.usage
  WHERE usage_unit = 'DBU'
    AND (usage_start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
      OR usage_end_time >= current_timestamp() - make_dt_interval(0, :interval_hours))
),
total_usage AS (
  -- OK to cross-join against this, it is a singleton dataset
  SELECT SUM(usage_quantity) AS total
  FROM usage
),
by_sku AS (
  SELECT sku_name, SUM(usage_quantity) AS DBUs
  FROM usage, total_usage
  GROUP BY sku_name
)
SELECT sku_name, DBUs/total AS ratio
FROM by_sku, total_usage
ORDER BY 2 DESC

DLT Maintenance Usage in DBUs

SELECT SUM(usage_quantity) dlt_maintenance_dbus
FROM system.billing.usage
WHERE usage_unit = 'DBU'
  AND (usage_start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
    OR usage_end_time >= current_timestamp() - make_dt_interval(0, :interval_hours))
  AND usage_metadata.dlt_maintenance_id IS NOT NULL

Warehouse Usage in DBUs

SELECT usage_metadata.warehouse_id, SUM(usage_quantity) warehouse_dbus
FROM system.billing.usage
WHERE usage_unit = 'DBU'
  AND (usage_start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
    OR usage_end_time >= current_timestamp() - make_dt_interval(0, :interval_hours))
  AND usage_metadata.warehouse_id IS NOT NULL
GROUP BY usage_metadata.warehouse_id
ORDER BY warehouse_dbus DESC

Informational

Operational complexity eventually rises to a level that no human can reasonably crunch down into a report. Always. Bet on it. We need the machine to do this work for us.

We add in more tables:

  • system.query.history

  • system.compute.node_types

  • system.compute.clusters

  • system.access.audit

Node Types that match certain criteria

This can help a developer set up a suite of benchmarking tests for nodes within certain minimum and maximum size characteristics. A test tool can easily resubmit jobs with different node types and collect results using some of the queries above for review once complete.

SELECT
  node_type,
  memory_mb/1024 AS memory_gb,
  core_count,
  gpu_count
FROM system.compute.node_types
WHERE
  -- Memory from 64GB to 512GB
  memory_mb/1024 BETWEEN 64 AND 512
  AND core_count > 16
ORDER BY
  -- Some metric to track approx cost
  core_count * memory_mb * (1+gpu_count) DESC

Largest Clusters and their Owners

This enumerates recent clusters, orders by size, and shows the most significant first. This can help track either the processing leader or waste leader, depending upon the circumstances.

WITH clusters AS (
  SELECT *
  FROM system.compute.clusters
  WHERE delete_time IS NULL -- current
    OR create_time >= (current_timestamp() - make_dt_interval(0, :interval_hours)) -- recent
  QUALIFY 1 = (ROW_NUMBER() OVER (
    PARTITION BY cluster_id
    ORDER BY
      change_time DESC
  ))
),
drivers AS (
  SELECT c.owned_by, c.cluster_id, nt.core_count, nt.memory_mb/1024 AS memory_gb, nt.gpu_count
  FROM system.compute.node_types nt,
       clusters c
  WHERE nt.node_type = c.driver_node_type
),
workers AS (
  SELECT c.owned_by, c.cluster_id, nt.core_count, nt.memory_mb/1024 AS memory_gb, nt.gpu_count, COALESCE(c.worker_count, c.max_autoscale_workers) AS qty
  FROM system.compute.node_types nt,
       clusters c
  WHERE nt.node_type = c.worker_node_type
)
SELECT DISTINCT
  drivers.owned_by,
  drivers.cluster_id,
  drivers.core_count + (workers.core_count * workers.qty) AS total_cores,
  drivers.memory_gb + (workers.memory_gb * workers.qty) AS total_memory_gb,
  drivers.gpu_count + (workers.gpu_count * workers.qty) AS total_gpus
FROM drivers, workers
WHERE drivers.cluster_id = workers.cluster_id
ORDER BY total_cores*total_memory_gb*(1+total_gpus) DESC

Minimum DBR in use and by whom

This chooses the minimum DBR version currently being used by any cluster and which users own those low-version clusters. Useful during an upgrade cycle.

WITH dbrs AS (
  SELECT regexp_extract(dbr_version, '(\\d+\\.\\d+\\.(\\d|\\w)+)') AS dbr
  FROM system.compute.clusters
  WHERE delete_time IS NULL -- current
    OR create_time >= (current_timestamp() - make_dt_interval(0, :interval_hours)) -- recent
),
min_dbr AS(
  SELECT MIN(dbr) AS dbr
  FROM dbrs
  WHERE dbr IS NOT NULL
    AND '' <> dbr
)
SELECT DISTINCT owned_by, dbr
FROM system.compute.clusters, min_dbr
WHERE dbr_version LIKE CONCAT('%',dbr,'%')

Relative Query rates by Status

Provides the percentage of total queries that finished successfully, were cancelled, or failed.

WITH totals AS (
  SELECT execution_status, COUNT(*) qty
  FROM system.query.history
  WHERE execution_status IN ('FINISHED','CANCELED','FAILED') -- Completed only
    AND (start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
      OR end_time >= current_timestamp() - make_dt_interval(0, :interval_hours))
  GROUP BY execution_status
),
total_runs AS (
  SELECT SUM(qty) AS total
  FROM totals
)
SELECT execution_status, qty/total AS ratio
FROM totals, total_runs

Why Queries Fail

A high failure rate above means extra cost: cluster time is consumed without returning results. It is best to find the most common failure reasons and educate the team about them.

WITH errors AS (
  SELECT
    COALESCE(
      NULLIF(REGEXP_EXTRACT(error_message, '\\[(.*?)\\]'), ''),
      NULLIF(REGEXP_EXTRACT(error_message, '(\\w+((Exception)|(Error)):)'), ''),
      error_message
    ) AS error_type
  FROM system.query.history
  WHERE error_message IS NOT NULL
    AND (start_time >= current_timestamp() - make_dt_interval(0, :interval_hours)
      OR end_time >= current_timestamp() - make_dt_interval(0, :interval_hours))
)
SELECT error_type, COUNT(*) AS qty
FROM errors
GROUP BY error_type
ORDER BY qty DESC

Clusters that do not bear the owner’s name

It is common to enforce a naming convention. Queries like this can help find holes in your policies.

SELECT cluster_id, cluster_name, owned_by
FROM system.compute.clusters
WHERE cluster_name IS NOT NULL -- clusters with names
  AND owned_by IS NOT NULL -- clusters with Owners
  AND NOT REGEXP(LOWER(cluster_name), 'job-\\d+-run-\\d+.*') -- ignore jobs
  AND NOT REGEXP(
    REPLACE(LOWER(cluster_name), ' '),
    REPLACE(SUBSTR(LOWER(owned_by) FROM 1 FOR INSTR(owned_by, '@')-1), '.', '.?')
  )

Job Triggers by creator and type

Who is triggering jobs, through which means, and who is triggering the most?

SELECT
  request_params['runCreatorUserName'] AS creator,
  request_params['jobTriggerType'] AS trigger,
  COUNT(*) AS qty
FROM system.access.audit
WHERE service_name = 'jobs'
  AND action_name = 'runTriggered'
  AND event_time >= (current_timestamp() - make_dt_interval(0, :interval_hours))
GROUP BY all
ORDER BY qty DESC

Active Users

How widespread is adoption?

SELECT COUNT(DISTINCT user_identity.email) AS active_users
FROM system.access.audit
WHERE event_time >= (current_timestamp() - make_dt_interval(0, :interval_hours))
  AND user_identity.email LIKE '%@%'

Failure Leaders

Which usernames are causing the most failure responses in the audit trail? This could be indicative of networking issues or malicious intent.

SELECT user_identity.email AS email, COUNT(*) AS qty
FROM system.access.audit
WHERE event_time >= (current_timestamp() - make_dt_interval(0, :interval_hours))
  AND user_identity.email LIKE '%@%'
  AND NOT (response.status_code BETWEEN 200 AND 299)
GROUP BY ALL
ORDER BY qty DESC

Successful DB SQL Downloads by Hour

A DB SQL exfiltration rate tracker. Exfiltration (data egress) can be extremely costly depending upon cloud vendor terms; this is a hidden direct cost that is often overlooked until it is too late.

SELECT DATE_FORMAT(event_time, 'yyyyMMddHH') AS the_hour, COUNT(*) AS qty
FROM system.access.audit
WHERE event_time >= (current_timestamp() - make_dt_interval(0, :interval_hours))
  AND service_name = 'databrickssql'
  AND action_name = 'downloadQueryResult'
  AND response.status_code = 200
GROUP BY ALL
ORDER BY the_hour DESC

Scala: Stable Identifier Required

This error message seems to be the source of a lot of confusion and pain across the Scala ecosystem. Here, we will attempt to make it less scary.

Take the following:

val b = 2
def simpleMatch(a: Int): String = a match {
  case `b` => "OK"
  case _ => a + " NOK"
}
(0 to 5).map(simpleMatch)

The result is:

Vector(0 NOK, 1 NOK, OK, 3 NOK, 4 NOK, 5 NOK)

When we do something as seemingly benign as changing the val to a var, our work life is immediately made worse:

var b = 2
def simpleMatch(a: Int): String = a match {
  case `b` => "OK"
  case _ => a + " NOK"
}
(0 to 5).map(simpleMatch)

And the compiler attempts to help us by kindly letting us know: error: stable identifier required, but this.b found. What?!?!?!

The Scala Language Specification

When an error message is too terse for understanding, a few great places to start are Stack Overflow, Gitter, Twitter, or any other messaging or questioning service where a lot of Scala developers gather. Assuming you've already exhausted these options, we dig into the Scala Language Specification (SLS) itself.

Section 3.1 of the SLS states very clearly:

A stable identifier is a path which ends in an identifier.

Ok. Not super helpful immediately, but it does lead us to further reading; we need to understand both paths and identifiers.

Identifiers

Section 1.1 defines identifiers and Chapter 2 describes them. It is very dry, yet illuminating, reading. Identifiers are the names of things. The word name here is applied broadly; operators (such as *, +, =, ++=) are also names.

Paths

Section 3.1 of the SLS is on Paths. It gives four cases:

  • The empty path ε
  • C.this, where C references a class. The path this is taken as a shorthand for C.this where C is the name of the class directly enclosing the reference.
  • p.x where p is a path and x is a stable member of p. Stable members are packages or members introduced by object definitions or by value definitions of non-volatile types.
  • C.super.x or C.super[M].x where C references a class and x references a stable member of the super class or designated parent class M of C. The prefix super is taken as a shorthand for C.super where C is the name of the class directly enclosing the reference.

The third case helps us here. Our error states error: stable identifier required, but this.b found (this.b looks like p.x). The difference is our b is not a package, object definition or val; it is a var. This makes some semantic sense; the word stable can hardly be used to describe something like a var which can change at any time.

Basically, a stable identifier is simply a name which is bound statically to a value. They are required for certain tasks (like pattern matching) so the compiler can make sense of the code it is generating and the types it is inferring. Next time you see this error, just check that your types are well defined and that you are not shadowing any stable names with unstable ones. A quick workaround is, instead of matching on the unstable identifier, to assign it to a stable one:

val definitelyStable = b
def simpleMatch(a: Int): String = a match {
  case `definitelyStable` => "OK"
  case _ => a + " NOK"
}
(0 to 5).map(simpleMatch)

Introductory TyDD in Scala: Reading Type Level Functions

This is shapeless' HList:

sealed trait HList extends Product with Serializable

final case class ::[+H, +T <: HList](head : H, tail : T) extends HList {
  override def toString = head match {
    case _: ::[_, _] => "("+head+") :: "+tail.toString
    case _ => head+" :: "+tail.toString
  }
}

sealed trait HNil extends HList {
  def ::[H](h : H) = shapeless.::(h, this)
  override def toString = "HNil"
}

case object HNil extends HNil

The important thing to note is that it is very similar to a standard Scala List, just at the type level. It has a head and tail as type parameters, and the tail is itself an HList. The implementation details are not important for our purposes here. All we need to take from this is that an HList is a recursive list (head, tail) which is either empty (HNil) or whose elements can be of different types.
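
For reference, here is what a small HList and its type look like at the term level (a minimal sketch using the shapeless import):

import shapeless._

// each element's type is preserved in the type of the whole list
val example: Int :: String :: Double :: HNil = 1 :: "one" :: 2.0 :: HNil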

A Type Level Operation on HList

This is a function defined in shapeless which helps client code work with HLists

trait Mapped[L <: HList, F[_]] extends Serializable {
  type Out <: HList
}
object Mapped {
  …
  type Aux[L <: HList, F[_], Out0 <: HList] =
    Mapped[L, F] { type Out = Out0 }
 implicit def hnilMapped[F[_]]: Aux[HNil, F, HNil] =
    new Mapped[HNil, F] { type Out = HNil }
  …
  implicit def hlistMapped1[
  H, T <: HList, F[_], OutM <: HList](implicit
  mt: Mapped.Aux[T, F, OutM]): Aux[H :: T, F, F[H] :: OutM] =
    new Mapped[H :: T, F] { type Out = F[H] :: OutM }
}

We can see all the parts we mentioned in the first post of this series. Recall:

  1. Keywords
  2. Input types
  3. Output types
  4. Body

Note there are multiple implicit def declarations. The modifiers implicit def are similar to case statements in value function bodies. Let's take this piece by piece.

The First Case

implicit def hnilMapped[F[_]]: Aux[HNil, F, HNil] =
new Mapped[HNil, F] { type Out = HNil }

When focusing on the types, the intent becomes clear

implicit def hnilMapped[F[_]]: Aux[HNil, F, HNil] =
  new Mapped[HNil, F] { type Out = HNil }

This reads something like: given a type constructor, yield the empty HList. Simple enough! This stuff is comprehensible after all!
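
As a quick check, we can ask the compiler for this base-case instance directly. A small sketch (Mapped lives in shapeless.ops.hlist):

import shapeless._
import shapeless.ops.hlist.Mapped

// mapping any type constructor over the empty HList yields the empty HList
val base = implicitly[Mapped.Aux[HNil, Option, HNil]]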

The Second Case

implicit def hlistMapped1[
H, T <: HList, F[_], OutM <: HList](implicit
mt: Mapped.Aux[T, F, OutM]): Aux[H :: T, F, F[H] :: OutM] =
  new Mapped[H :: T, F] { type Out = F[H] :: OutM }

This is admittedly more complex. Let's take it in stride, again focusing on the types.

implicit def hlistMapped1[
H, T <: HList, F[_], OutM <: HList](implicit
mt: Mapped.Aux[T, F, OutM]): Aux[H :: T, F, F[H] :: OutM] =
  new Mapped[H :: T, F] { type Out = F[H] :: OutM }

  • Given a head, a tail which is an HList, a type constructor, and another HList
  • Also, given proof that this type class holds for our tail
  • We can generate an HList where our type constructor is applied to our head and the tail follows (see the sketch below)
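
Putting the two cases together, the compiler can build an instance for any HList, element by element. A small sketch of summoning one (again assuming the shapeless.ops.hlist import):

import shapeless._
import shapeless.ops.hlist.Mapped

// the inductive case applies Option to each element type in turn
val mapped = implicitly[Mapped.Aux[Int :: String :: HNil, Option, Option[Int] :: Option[String] :: HNil]]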

A brief Word on Induction

Many of the type level functions encountered in the wild are of this type, where there is at the very least:

  1. a trivial case which produces some terminal element
  2. a more complex case which, given a type of arity n, can produce a type of arity n+1

In other words, recursive types can be generated inductively assuming you have an instance of the recursive type and all the necessary parts to wrap that instance within the target type.

In the next post we'll create some type level code using these methods.

Introductory TyDD in Scala: Basic Type Class Development

(A more complete treatment of type classes and higher kinds.)

A simple Type Class

Take the following

trait Mapping[A, B]{
  def map(a: A): B
}

and an instance for it

val mapping: Mapping[List[Int], List[String]] =
  new Mapping[List[Int], List[String]]{
    override def map(a: List[Int]): List[String] =
      a.map(_.toString)
  }

This instance is super restrictive. It only works for taking List[Int] into List[String]. We want to map a List of any type. Since we know what our type parameters are, we can achieve our goal by passing in a function:

trait ListMapping[A, B]{
  def map(list: List[A])(f: A => B): List[B]
}

So, given a List[A] and a function, A => B, we can get a List[B]. And by taking the type parameters from the trait definition and placing them onto the function definition, we can squeeze out a bit more freedom.

trait ListMapping{
  def map[A, B](list: List[A])(f: A => B): List[B]
}
val mapping: ListMapping =
  new ListMapping{
    override def map[A, B](a: List[A])(f: A => B): List[B] =
      a.map(f)
  }

Now, why would anyone ever do this? The List type provides a map function which does exactly this. With this approach one may provide any number of methods for mapping a list. Take this alternative:

val reverseMapping: ListMapping =
  new ListMapping{
    override def map[A, B](a: List[A])(f: A => B): List[B] =
      a.reverse.map(f)
  }

Through type classes, we can define new functionality for old data structures on the fly. Similar code can be written for sort order, string formatting or just about anything else.
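
For example, here is a hedged sketch of the same pattern applied to string formatting (ListShow is a made-up name, not something from a library):

trait ListShow{
  def show[A](list: List[A])(f: A => String): String
}

// one instance renders a comma separated line, another wraps it in brackets
val commaSeparated: ListShow = new ListShow{
  override def show[A](list: List[A])(f: A => String): String =
    list.map(f).mkString(", ")
}
val bracketed: ListShow = new ListShow{
  override def show[A](list: List[A])(f: A => String): String =
    list.map(f).mkString("[", ", ", "]")
}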

Making things even more general

While the ability to map Lists of any type in any number of ways is fairly abstract, it is not abstract enough for our purposes. What if we want to map a different data structure such as an Option, a Stream, or a Spark Dataset?

Luckily, Scala has a language feature which can help us out here.

trait WithMap[F[_]]{
  def map[A, B](m: F[A])(f: A => B): F[B]
}

The type parameter, F[_], has a type parameter of _; this tells the compiler that our type parameter itself requires a type parameter. Notice that in our definition all mention of List has been replaced by our parameter, F. This just says that given a type, F, which itself takes a type parameter, we can change the inner type of F without changing F. We can do this with any parameterized type of arity 1.

implicit val listWithMap = new WithMap[List]{
  override def map[A, B](m: List[A])(f: A => B): List[B] =
    m.map(f)
}
implicit val optionWithMap = new WithMap[Option]{
  override def map[A, B](m: Option[A])(f: A => B): Option[B] =
    m.map(f)
}
implicit val streamWithMap = new WithMap[Stream]{
  override def map[A, B](m: Stream[A])(f: A => B): Stream[B] =
    m.map(f)
}
val reverseListWithMap = new WithMap[List]{
  override def map[A, B](m: List[A])(f: A => B): List[B] =
    m.reverse.map(f)
}


With these techniques we can define super polymorphic functions. Take this prettyString function:

def prettyString[F[_]: WithMap, A](m: F[A])(f: A => String): String = {
  implicitly[WithMap[F]].map(m)(f).toString
}

This takes two type parameters, F[_]: WithMap and A. The `:` character in the first type parameter tells the compiler that it needs an implicit instance of WithMap defined for our type F.
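
A quick usage sketch, relying on the listWithMap and optionWithMap instances defined above:

prettyString(List(1, 2, 3))(i => (i * 2).toString) // "List(2, 4, 6)"
prettyString(Option(42))(_.toString)               // "Some(42)"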

And here is a data processor defined in the same way

def processData[F[_]: WithMap, A, B, C, D](
  m1: F[A])(f1: A => B)(f2: B => C)(f3: C => D): F[D] = {
  val F = implicitly[WithMap[F]] 
  val m2 = F.map(m1)(f1)
  val m3 = F.map(m2)(f2)
  F.map(m3)(f3)
}

We have taken an implementation detail (the map function on List, Option, etc...) and brought it outside the type. This has given us the ability to talk about data which has a sensible map function without knowing what that data necessarily looks like.
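
A usage sketch of processData, again leaning on the implicit WithMap instances above:

// the same pipeline works for any F with a WithMap instance
val doubled: List[String] = processData(List(1, 2, 3))(_ + 1)(_ * 2)(_.toString)           // List("4", "6", "8")
val counted: Option[Int]  = processData(Option("hello"))(_.toUpperCase)(_ + "!")(_.length) // Some(6)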

Next we'll learn how to read some of the Type Level functions that exist in the shapeless library.

Introductory TyDD in Scala: Anatomy of a Type Level Function

Value Level Functions

A value level function typically looks like

def f(a: Int, b: Int): String = {
  a.toString + b.toString
}

The key parts are

  1. A keyword: def
  2. Inputs: a: Int, b: Int
  3. Outputs: String
  4. Body: a.toString + b.toString

When you see these 4 parts, you know you are reading a value level function. Nothing surprising here. Now, let's see what a similar definition looks like at the type level.

Type Level Functions

A type level function looks like

trait MyTrait[A, B]{type Out}
object MyTrait{
  def apply[A, B, C](): MyTrait[A, B]{type Out = C} =
    new MyTrait[A, B]{override type Out = C}
}

In this definition there is a type refinement, MyTrait[A, B]{type Out = C}. These are undesirable artifacts of type level development. To simplify these definitions we use the Aux alias (there is a document on this). Aux helps us remove type refinements from our logic.

With Aux, a type level function looks like:

trait MyTrait[A, B]{type Out}
object MyTrait{
  type Aux[A, B, C] = MyTrait[A, B]{type Out = C}
  def apply[A, B, C](): Aux[A, B, C] =
    new MyTrait[A, B]{override type Out = C}
}

The type refinement from the previous example is replaced by the nicer (more readable, fewer braces, less code) Aux type.

Type level functions have the same 4 key parts as value level functions

  1. Keywords: trait, object
  2. Inputs: A, B
  3. Outputs: type Out
  4. Body: def apply[A, B, C](): Aux[A, B, C]

Here the inputs are type parameters and outputs are type members. This is so the output types are not erased and can be referenced later in business logic. This is similar to value level functions as the result of a value level function does not expose the inputs required by the function.
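
A small sketch of what "referenced later" means; because Out is a type member, the compiler still knows its concrete value after apply is called:

val mt = MyTrait[Int, String, Double]() // MyTrait.Aux[Int, String, Double]
// the compiler can prove that mt.Out is Double
implicitly[mt.Out =:= Double]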

Bodies of type level functions are value level functions. They are typically only a few lines long. Their purpose is to present the compiler with a way to construct a new type from the types provided. This is what these blog posts will focus on.

Whenever you see a set of definitions which have these 4 qualities, you know you are looking at a type level function.

The type class is the fundamental element of this style of type driven development. The next post will give an overview of this concept.

Simple Generic Derivation with Shapeless

(Code on GitHub)

Shapeless is a library which makes generic programming in Scala easier. Generic programming gives you the ability to build abstractions that the language does not directly support.

First things first:

import shapeless._

For the purposes of this post, this is the only import needed.

HList

The HList type provides a way to maintain a collection of items whose types may be different. For example, say you have an Int, a String and a Double that you want inside a list. Using a standard Scala List one has:

scala> List(1, "1", 1.0)
res0: List[Any]

The problem is the type information is lost, so when we use this data, we need to cast the appropriate index to the appropriate type. HList is a list which holds type information for each element separately. With the shapeless import, HLists are constructed using the :: operator

scala> 1 :: "1" :: 1.0 :: HNil//very similar syntax to standard List
res1: shapeless.::[Int,shapeless.::[String,shapeless.::[Double,shapeless.HNil]]]

We can see the type information for Int, String and Double are kept as part of the overall type. This implies we can produce concrete types for lists of arbitrary elements. This is the cornerstone of Generic Derivation with Shapeless.

Note: Yes, Scala has perfectly reasonable Product types with tuples and case classes and whatnot; however, for reasons that will become clear later in this post, these types are insufficient for our purposes here.

Typeclass Boilerplate

Let's take a typeclass

trait Foo[Type]{
  def bar(t: Type): String
}

Not super exciting, but it will get the point across. Say we create instances for Int, String and Double:

implicit def fooInt = new Foo[Int]{
  def bar(t: Int): String = t + ": Int"
}
implicit def fooString = new Foo[String]{
  def bar(t: String): String = t + ": String"
}
implicit def fooDouble = new Foo[Double]{
  def bar(t: Double): String = t + ": Double"
}

Now we have instances we can use for all things that are Int or String or Double, and we are happy with this for a time. Then we get a request for a Foo instance that can be used for both Int and String, so we produce one using our new friend the HList:

implicit def fooIntString = new Foo[Int :: String :: HNil]{
  def bar(t: Int :: String :: HNil): String = {
    val i :: s :: HNil = t //very similar syntax to standard List
    fooInt.bar(i) + ", " + fooString.bar(s)
  }
}

But this is coming in from user input on a web service, and users are notoriously inconsistent with the ordering of their arguments. We need one for String then Int as well:

implicit def fooStringInt = new Foo[String :: Int :: HNil]{
  def bar(t: String :: Int :: HNil): String = {
    val s :: i :: HNil = t
    fooString.bar(s) + ", " + fooInt.bar(i)
  }
}

Great! But what about combinations with Double? And what about when we need to support Long or List[String] or any other type? The combinations here blow up quite quickly in any reasonable application. Luckily, Scala and Shapeless give us a few tricks to get rid of all the tedious boilerplate present in any typeclass polymorphic system.

Implicits to the Rescue

All of the typeclass definitions thus far have been declared implicit. There is a fantastic reason for this: Implicits get the Scala compiler to produce boilerplate declarations for us at compile time. We can greatly reduce the amount of boilerplate needed to create typeclass instances for combinations of types. All we have to do is tell the compiler what to do.

Recall, all our HList declarations end with HNil. We need an instance for HNil

implicit def fooNil = new Foo[HNil]{def bar(t: HNil): String = ""}

Recall too, what the type of our HList looked like

scala> 1 :: "1" :: 1.0 :: HNil
res1: shapeless.::[Int,shapeless.::[String,shapeless.::[Double,shapeless.HNil]]]

removing the noise, we can see a pattern of nesting: [Int,[String,[Double,HNil]]]. We know from math that when we have nested syntax, we evaluate from the inside out; this applies for Scala code as well. So logically, we start with an HNil, prepend a Double, prepend a String then prepend an Int.

We have given the compiler implicit typeclass instances for each of these types; all we need is to give it an implicit operation for prepending an instance onto another instance:

implicit def fooPrepend[Head, Tail<:HList](implicit
  head: Foo[Head],
  tail: Foo[Tail]): Foo[Head :: Tail] = new Foo[Head :: Tail]{
  def bar(t: Head :: Tail): String = {
    val hd :: tl = t
    head.bar(hd) + ", " + tail.bar(tl)
  }
}

This says, Given a Foo instance for some type, Head, and some HList, Tail, we can produce a new Foo instance for the HList, Head :: Tail. We can see this work in the REPL:

scala> :paste
// Entering paste mode (ctrl-D to finish)

val a: Foo[Int :: HNil] = implicitly
val b: Foo[String :: HNil] = implicitly
val c: Foo[String :: Int :: HNil] = implicitly
val d: Foo[Double :: String :: Int :: HNil] = implicitly
val e: Foo[Double :: String :: Int :: String :: HNil] = implicitly

// Exiting paste mode, now interpreting.

a: Foo[shapeless.::[Int,shapeless.HNil]]
b: Foo[shapeless.::[String,shapeless.HNil]]
c: Foo[shapeless.::[String,shapeless.::[Int,shapeless.HNil]]]
d: Foo[shapeless.::[Double,shapeless.::[String,shapeless.::[Int,shapeless.HNil]]]]
e: Foo[shapeless.::[Double,shapeless.::[String,shapeless.::[Int,shapeless.::[String,shapeless.HNil]]]]]

There was no need to define by hand Foo instances for any HList type that is a combination of types for which implicit Foo instances exist in scope.

The Reason for HList

Why not just use any old Product type? HList, unlike Product, has a way to take any HList and combine it with another type to produce another HList. There is no such capability baked into Product.
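
A quick sketch of what that combination looks like in practice:

import shapeless._

val smaller: Int :: HNil = 1 :: HNil
// prepending another type yields another HList with a fully known type;
// tuples have no analogous generic operation
val larger: String :: Int :: HNil = "1" :: smaller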

Recall, the last thing we did was build a Foo instance for a larger HList out of a Foo instance for a smaller HList. This process started with the smallest possible HList, HNil, and built larger and larger HLists until the required type was produced.

This talk of Products leads us to our next step. Most applications have a lot of Product types and need typeclass instances for these Products. Given that an HList is just a really fancy Product, can we generically derive instances for tuples and case classes?

Note: There is an excellent talk on how to do this with nested Tuples.

Generic

Shapeless provides a typeclass, Generic, which helps convert between a Product and its similar (isomorphic) HList. It is fairly straightforward to produce a Generic instance for a Product and get an HList:

scala> :paste
// Entering paste mode (ctrl-D to finish)

val gen = Generic[(Int, String)]
val asH = gen.to((1, "1"))

// Exiting paste mode, now interpreting.

gen: shapeless.Generic[(Int, String)]
asH: gen.Repr = 1 :: 1 :: HNil

With this, we can take any Product and create a Foo instance for an HList that is similar:

scala> :paste
// Entering paste mode (ctrl-D to finish)

val genIS = Generic[(Int, String)]
val genDS = Generic[(Double, String)]
val a = implicitly[Foo[genIS.Repr]].bar(genIS.to(1, "1"))
val b = implicitly[Foo[genDS.Repr]].bar(genDS.to(1.0, "1"))

// Exiting paste mode, now interpreting.

genIS: shapeless.Generic[(Int, String)]
genDS: shapeless.Generic[(Double, String)]
a: String = 1: Int, 1: String
b: String = "1.0: Double, 1: String, "

This implies given:

  1. a Product, P
  2. a Generic instance for P, gen
  3. and an implicit Foo instance for the HList, gen.Repr

We can produce a result that would be valid for an instance of Foo[P]. All we need to do is find a way to delegate the work Foo[P] needs to do to an instance of Foo[gen.Repr].

Generic Derivation

Like before, we'll use an implicit def so the compiler helps us along.

implicit def fooProduct[P <: Product, H <: HList](implicit
  gen: Generic[P]{type Repr = H}, //compiler needs to know Generic converts P to H
  foo: Foo[H]): Foo[P] = {
  new Foo[P]{
    def bar(p: P): String = foo.bar(gen.to(p))
  }
}

This states, Given a Product and an HList, P and H, a Generic instance, gen, which can convert between P and H and a Foo instance for H, we can construct a Foo instance for P. There are two things to note:

  1. We do not need to write implicit instances for Generic by hand like we had to for Foo
  2. The implicit Generic needs a structural bound

The shapeless library provides, out of the box, automatic Generic instances for any Product type and brings these into implicit scope. The problem with Generic is that the type Repr is a member (dependent) type, not a type parameter. In order to make sure the compiler can prove Repr is indeed our parameter H, we need to state it explicitly. And with this, we have our generic derivation for Foo:

scala> :paste
// Entering paste mode (ctrl-D to finish)

val f: Foo[(String, Int)] = implicitly
val g: Foo[(Double, String, Int)] = implicitly
val h: Foo[(Double, String, Int, String)] = implicitly

case class A(i1: Int, i2: Int, s: String)
case class B(d: Double, s: String)
case class C(i1: Int, d1: Double, s: String, d2: Double, i2: Int)
val i: Foo[A] = implicitly
val j: Foo[B] = implicitly
val k: Foo[C] = implicitly

// Exiting paste mode, now interpreting.

f: Foo[(String, Int)]
g: Foo[(Double, String, Int)]
h: Foo[(Double, String, Int, String)]
defined class A
defined class B
defined class C
i: Foo[A]
j: Foo[B]
k: Foo[C]

And that's that! We can derive instances of our type class for any Product, given that each individual type within the Product has an implicit instance defined.

In Sum

Shapeless provides a framework for Generic Programming which can help us remove boilerplate from our applications and derive instances for types that would be too tedious to write out ourselves. To take advantage of the derivation one needs a few parts:

  1. Implicit instances for basic types like Int, String, Double, etc...
  2. An implicit instance for HNil
  3. An implicit def which when given implicit instances for a type and an HList, Head and Tail, can produce an implicit instance for the HList, Head :: Tail.
  4. An implicit def which for a Product, P, when given implicit instances for an HList, H, and a Generic which can convert between P and H, can produce an implicit instance for P

What the Hell is an "Effect Type"?

I was reading through the fs2 documentation and user guide and thought to myself, this is really straightforward! And, to their credit, it is. Anyone who is used to FP and Scalaz or Haskell will take to the documentation with little friction. However, when an FP novice or someone from a language with less powerful FP implementations (F#, C#, Java) encounters this documentation, it's a pain to trudge through.

Through many attempts at explaining fs2, I have found the main topic of concern is the "effect type". On its surface, this term seems rather benign:

  1. Everyone knows FP strives to have no side effects
  2. We all know certain things (IO for example) are fundamentally effectful
  3. In order to encode these effects into FP style, we build abstractions
  4. These abstractions for effects are our effect types

So, taking the canonical example from the fs2 documentation

import fs2.{io, text, Task}
import java.nio.file.Paths

def fahrenheitToCelsius(f: Double): Double =
  (f - 32.0) * (5.0/9.0)

val converter: Task[Unit] =
  io.file.readAll[Task](Paths.get("testdata/fahrenheit.txt"), 4096)
    .through(text.utf8Decode)
    .through(text.lines)
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .intersperse("\n")
    .through(text.utf8Encode)
    .through(io.file.writeAll(Paths.get("testdata/celsius.txt")))
    .run

// at the end of the universe...
val u: Unit = converter.unsafeRun()

We see a file read, a parse, a transformation, then a file write (reading and writing files are side effect heavy). The effect type is Task and the result is Unit. This can be read as:

The converter exists to create a Task which reads a file as bytes, converts those bytes to utf-8 Strings, transforms those Strings and writes them back to disk in a separate file, returning no result. The converter is purely effectful.

One can do this with a standard scala.collection.immutable.Stream as well.

//imports needed for the standard library version
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.util.Try

val path = Paths.get("testdata/fahrenheit.txt")
val out = Paths.get("testdata/celsiusStream.txt")
val readerT =
  Try(Files.newBufferedReader(path, StandardCharsets.UTF_8))
val writerT =
  Try(Files.newBufferedWriter(out, StandardCharsets.UTF_8))
val result = for{
  reader <- readerT
  writer <- writerT
}yield{
  Stream.continually{reader.readLine}
    .takeWhile(null != _)
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .flatMap{Stream(_, "\n")}
    .foreach(writer.write)
}
readerT.foreach(_.close())
writerT.foreach(_.close())

So, other than the obvious fs2 io convenience functions, to most of the FP-unindoctrinated it seems the standard Stream version is about as useful and as safe as the fs2 Stream version. However, the fs2 Stream is much better FP practice.

Function Parameters should be Declared

Upon Stream creation, there is a big difference between fs2 and standard. With fs2, the creation mechanism is a pure function

io.file.readAll[Task](Paths.get("testdata/fahrenheit.txt"), 4096)

It takes all of its necessary data as parameters and returns a Stream with effect type Task. On the other hand, the standard Stream requires a closure to be initialized

Stream.continually{reader.readLine}

It requires a by-name parameter; the by-name returns a different result upon each invocation, and its body depends on the closure within which the function is called. This line of code is impossible to understand by itself; taken out of context, it is meaningless. In other words, the function lacks referential transparency.

Function Duties should be Declared

Lines from a file in fs2 are produced with

scala> io.file.readAll[Task](Paths.get("testdata/fahrenheit.txt"), 4096).
     |       through(text.utf8Decode).
     |       through(text.lines)
res3: fs2.Stream[fs2.Task,String] = evalScope(Scope(Free)).flatMap(<function1>)

and standard Stream we have

scala> val path = Paths.get("testdata/fahrenheit.txt")
path: java.nio.file.Path = testdata\fahrenheit.txt

scala> val reader = Files.newBufferedReader(path,
 | java.nio.charset.StandardCharsets.UTF_8)
reader: java.io.BufferedReader = java.io.BufferedReader@3aeed31e

scala> Stream.continually{reader.readLine}
res4: scala.collection.immutable.Stream[String] = Stream(120, ?)

There are two important differences here. The first is evaluation: even though standard Streams are considered lazy, they evaluate the head value eagerly. We can see fs2 gives us a computation where the standard Stream gives us a value. The second difference is the type.

In fs2 we have a type of Stream[Task, String]; standard gives us Stream[String]. The fs2 Stream explicitly expresses the intention for effectful computation, whereas the standard Stream hides this implementation detail. Another way of looking at this is, standard Streams hide their effects where fs2 Streams surface their effects. In fact, fs2 Streams (if effectful) only allow the developer to find its result through the effect type. This gives the developer a sort of heads up about what's going on in the application behind the scenes.

What the Effect Type Represents

The effect type in any such functional library represents the intent to perform an operation outside the scope of the return type. This is very common for IO (as we've seen) and other effectful operations like Logging or showing the user a pop up. The computation usually produces some sort of a result but, before returning the result writes it to disk, logs it or tells the user about a completion state. In FP, we like to express all data in a function through the function definition and effects are just weird data.
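
To make that concrete, here is a minimal sketch using the same fs2 Task type as above (assuming the fs2 version used in this post): the value only describes the effect, and nothing happens until it is explicitly run.

import fs2.Task

val greet: Task[Unit] = Task.delay(println("hello"))
// nothing has printed yet; the effect runs only when we ask for it
greet.unsafeRun()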

Crashing F#

There's been a lot of buzz about using F# on the CLR for functional programming. This week, I decided to take the language for a spin.

The Basics

F# has a wonderfully light and succinct syntax. It uses the same keyword, let, for both values and functions.

let str = "Hello"
let func a = a + str 
let another = func " World!"
printfn "%s" another

Also, discriminated unions (sum types) are super simple

type SumType =
  |First
  |Second
  |Third of string * int
  |Fourth of a:string * b:int * c:double

Anytime I try out a new language I try to explore five things:

  1. Tail recursive functions
  2. Functor
  3. Monad
  4. Free Monad
  5. Pattern Matching

F#, as far as I can tell after using it for a total of about 10 hours, can be used for 1-3 and 5 but not 4. Here is a brief synopsis of my trials.

Tail Recursion

This was extremely simple in F#. It's two steps:

  1. Mark a function as recursive using the rec keyword
  2. Put the recursive step as the last operation

module Recursion
    //need a keyword for recursion, otherwise boilerplate free
    //very nice
    let rec check many acc =
        if(0 = many) then acc
        else
            let a = many - 1
            let b = acc + many
            check a b

The Recursion.check function uses the accumulator pattern to add int values from 1 to many to the initial accumulator value acc.

Functor & Monad

This took some work! F# lacks higher kinded polymorphism. My first attempt at this was:

type Functor<'A> =
    abstract map: 'A<'B> -> ('B -> 'C) -> 'A<'C>

Which produced a compile time error: "Type parameter cannot be used as type constructor." Ouch!!!

Luckily, F# type inference is good enough that we don't really need type classes; a little boilerplate gets us by. Here is a Functor & Monad example:

//cannot do higher kinds so no type classes
//do not need higher kinds; type syntax takes care of it
module CategoryList

type Functor<'Type>() =
    member this.map(l: list<'Type>)(f: 'Type -> 'Return) =
        List.map f l

type Monad<'Type>(func: Functor<'Type>) =
    member this.point(t: 'Type) = [t]
    member this.flatMap(l: list<'Type>)(f: 'Type -> list<'Return>) =
        List.concat(func.map l f)

Defining a Functor and Monad instance is simple. We don't need a type hierarchy here; we only need 3 functions.

module Category

let check() =
    //instantiate monad
    let func = CategoryList.Functor<int>()
    let monad = CategoryList.Monad(func)

    //define monadic chain functions
    let flatMap l f = monad.flatMap f l
    let map l f = func.map f l

    //define helpers
    let mapper (x: int) = [1 .. x]
    let mapperStr (x: int) = x.ToString()

    let hundred = [1 .. 100] //data

    //(|>) language level monad support
    let hundredMore = (hundred
                       |> flatMap mapper)
    printfn "%A" hundredMore

    let bigString = (hundredMore
                     |> flatMap mapper
                     |> map mapperStr)
    printfn "%A" bigString

The key here is the |> operator (also known as pipe). By piping a list into a flatMap into another flatMap and so on, we can get language level monad support.

Free Monad

Here I have failed. The closest I've gotten is:

module Free

type FreeMonad<'F, 'A> =
    |Pure of 'A
    |Suspend of 'F

type FreeMonad<'F, 'A> with
    member this.fold fpure fsuspend =
        match this with
        |Pure(a) -> fpure(a)
        |Suspend(free) -> fsuspend(free)

let inline liftF(t: 'F)(func: 'T): FreeMonad<'F, 'A>
    when ^T: (member map: (^F -> (^A -> ^AA) -> ^FF)) = Suspend(t)
let inline point(a: 'A)(func: 'T): FreeMonad<'F, 'A>
    when ^T: (member map: (^F -> (^A -> ^AA) -> ^FF)) = Pure(a)

let check(): Unit =
    let func = CategoryList.Functor<int>()
    let lst: list<int> = [1..10]
    let monad: FreeMonad<list<int>, int> = liftF(lst)(func)
    ()

I could never get the liftF function to work out. The combination of type parameters never type checks against the parameterized types AA and FF. The compiler just cannot understand the second layer of type inference that needs to happen for the map method to be recognized.

Pattern Matching

Pattern matching is pretty straightforward. You can match directly:

let func a =
  match a with
  |1 -> "one"
  |2 -> "two"
  |3 | 5 | 10 | 30 -> "large"
  |_ -> "else"

with guards

let func a =
    match a with
    |value when 0 > value -> "negative"
    |value when 0 < value -> "positive"
    |_ -> "zero"

type matching

let func a =
    match a with
    |(str: string) -> "string"
    |_ -> "not"

discriminated unions

type Union =
    |Simple
    |Pair of int * int
    |Triple of int * int * int
let func a =
    match a with
    |Simple -> "simple"
    |Pair(a, b) -> "(" + a.ToString() + "," + b.ToString() + ")"
    |Triple(a, b, c) -> "(" + a.ToString() + "," + b.ToString() + c.ToString() + ")"

That's most of pattern matching; by combining these patterns you can get some pretty serious control flow moving through your application.

Cats (Introducing Typelevel Scala into an OO Environment)

(Examples can be found on GitHub)

Last time, we discovered Type Classes and how they give us more power to build extensible software than subclass polymorphic trees can. Here we introduce the Cats library, a production quality alternative to the Adder and Chainer traits from the previous post.

We'll need the following imports from the cats library:

import cats.Monoid
import cats.Monad
import cats.implicits._

and to recall our Team definition:

case class Team[Type](members: List[Type])

The Monoid Type Class

Recall the Adder trait described the process of adding two containers of the same type together to create a new container of that type. This is the basic idea for a structure in category theory called a Monoid. A full treatment of Monoids is beyond the scope of this post; the important thing here is that they describe semantics for adding members of other types together.

In the cats library, a Monoid for our unstructured Team data is defined thus:

implicit def adder[Arg]: Monoid[Team[Arg]] =
  new Monoid[Team[Arg]]{
    override def empty: Team[Arg] = Team(Nil)
    override def combine(
        left: Team[Arg], right: Team[Arg]): Team[Arg] = {
      val newMembers = left.members ++ right.members
      Team(newMembers)
    }
  }

A Monoid has two operations, empty and combine, where empty is the identity element under the combine operation. In other words, empty is a value, e, that when combined with any other value, v, returns v. Monoid replaces our Adder trait from the previous post. Our structured Team would have this Monoid:

implicit def adder[Arg]: Monoid[Team[Arg]] =
  new Monoid[Team[Arg]]{
    override def empty: Team[Arg] = Team(Nil)
    override def combine(
        left: Team[Arg], right: Team[Arg]): Team[Arg] = {
      val (lead1, indi1) = left.members.splitAt(2)
      val (lead2, indi2) = right.members.splitAt(2)
      val newMembers = lead1 ++ lead2 ++ indi1 ++ indi2
      Team(newMembers)
    }
  }

Note, the empty value is the same for both cases. This is often the case for multiple Monoids over the same data structure; no matter how you combine elements, the identity is trivially applied.
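
A small usage sketch with the unstructured adder in scope and some example members; the cats syntax import gives us the |+| operator for any type with a Monoid:

import cats.implicits._

val left = Team(List("alice", "bob"))
val right = Team(List("carol"))
val merged = left |+| right           // Team(List(alice, bob, carol))
val zero = Monoid[Team[String]].empty // Team(List())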

The Monad Type Class

Now, we shift our focus to the Chainer trait from the previous post. This trait described how to flatten a nesting of containers, for instance a Team[Team[_]] into a Team[_]. This is the basic operation behind the Monad. Again, we're not interested in figuring out Monads in detail; we're just trying to use a library in our work.

With Cats we'll have:

implicit def chainer: Monad[Team] = new Monad[Team]{
  //cats' Monad also requires pure; wrapping a single member in a Team suffices
  override def pure[Arg](arg: Arg): Team[Arg] = Team(List(arg))
  override def flatMap[Arg, Ret](
      team: Team[Arg])(f: Arg => Team[Ret]): Team[Ret] = {
    val newMembers = team.members.flatMap(f(_).members)
    Team(newMembers)
  }
}

for our unstructured Monad, and our structured version would look like:

implicit def chainer: Monad[Team] = new Monad[Team]{
  //pure is required by cats' Monad here as well
  override def pure[Arg](arg: Arg): Team[Arg] = Team(List(arg))
  override def flatMap[Arg, Ret](
      team: Team[Arg])(f: Arg => Team[Ret]): Team[Ret] = {
    val (leaders, individuals) = team.members.map{member =>
      val mems = f(member).members
      mems.splitAt(2)
    }.unzip
    Team(
        leaders.flatMap {x=>x} ++
        individuals.flatMap{x=>x})
  }
}

Monads add the flatMap (also called bind or >>= in some circles) operation to a data type. flatMap describes how to take a value of Team and a function which maps a member to a Team to produce another Team. It is a flattening operation. These are important as they describe data flow and functional composition through a system. To get the individual contributors from their Directors, one could:

case class Director(name: String)
case class Manager(name: String)
case class Individual(name: String)
val directors: Team[Director] = ???
def managers(director: Director): Team[Manager] = ???
def individualContributors(manager: Manager): Team[Individual] = ???
val individuals = directors >>= (managers) >>= (individualContributors)

Then swapping out different functionality is simply recombining your function calls around your Monadic chain.

Monoids and Monads are simple to use. They describe operations to combine and process data as simple, type safe functional chains.

Typeclasses over Subclasses (Introducing Typelevel Scala into an OO Environment)

(Examples can be found on GitHub)

In the previous post, we introduced the Argonaut library to convert between values and JSON strings. The important part of this conversion is there was no superclass or interface to implement in order to get the benefit of JSON across classes. All we needed to do was define values of type CodecJson for each of the types we wanted to convert. We added the functionality to the class without changing the class itself. 

Argonaut allowed us to call toJson on classes with a codec and decodeOption on Strings to produce values of classes with a codec defined. This type of polymorphism, where a function's implementation depends on its inputs, is called ad-hoc polymorphism. Furthermore, when we define a type, T, which defines functionality across classes to be used in ad-hoc polymorphic functions, we call T a Type Class. Type Class polymorphism is a specific flavor of ad-hoc polymorphism.

Type Class polymorphism is a tool for expressing context-based functionality that is far more powerful than subclass polymorphism. As a well-known example, take the Java interfaces Comparable and Comparator. If some data is defined in a class which implements Comparable, it can be sorted one way and needs an entire second class definition to be sorted with a different method. On the other hand, using Comparator the data is defined with a single class and each sort method gets its own Comparator. Comparator is a Type Class and allows the developer to determine in which contexts which sorting method should be used.

Subclass Method

Take the following traits:

trait Adder[Type]{
  def add(other: Type): Type
}
trait Chainer[Arg, Type[Arg]]{
  def chain[Res](f: Arg => Type[Res]): Type[Res]
}

Adder describes how to add two values of some Type together. Chainer describes how to chain operations over a parameterized type.

We'll use the idea of a team to illustrate. For the sake of simplicity we say a team consists of people of a certain profession. So we can have a team of engineers or a team of doctors or a team of cashiers or ...

Teams can (trivially) grow by hiring but, they can also grow by combining with other teams. Teams can be added.

Teams can have members who are themselves team leads. At times, the members of a lead's team must join the team the lead belongs to. This implies an operation which develops teams out of the members of teams. Teams can be chained.

Here is our implementation of Team given this functionality:

case class Team[Type](members: List[Type])
  extends Adder[Team[Type]]
  with Chainer[Type, Team]{
  override def add(other: Team[Type]): Team[Type] = {
    Team(members ++ other.members)
  }
  override def chain[Res](
      f: Type => Team[Res]): Team[Res] = {
    val list = members.flatMap(member => f(member).members)
    Team(list)
  }
}

Simple enough, but this doesn't account for an organization of structured teams. For an organization that develops teams which each have one product lead and one technical lead, simple concatenation won't maintain a soft ranking of individuals within the new team. We need a new Team definition which accounts for this.

case class TeamStructured[Type](members: List[Type])
  extends Adder[TeamStructured[Type]]
  with Chainer[Type, TeamStructured]{
  override def add(
      other: TeamStructured[Type]): TeamStructured[Type] = {
    val (lead1, indi1) = members.splitAt(2)
    val (lead2, indi2) = other.members.splitAt(2)
    TeamStructured(lead1 ++ lead2 ++ indi1 ++ indi2)
  }
  override def chain[Res](
      f: Type => TeamStructured[Res]): TeamStructured[Res] = {
    val (leaders, individuals) = members.map{member =>
      val mems = f(member).members
      mems.splitAt(2)
    }.unzip
    TeamStructured(
        leaders.flatMap {x=>x} ++
        individuals.flatMap{x=>x})
  }
}

Now we have two definitions for the same data that differ only by functionality. We have a triple coupling here:

  1. Data Definition
  2. Addition Description
  3. Chaining Description

If the data needs to change (from List to Set is a good place to start) the change needs to be made in two places. Each function which accepts a Team for the purpose of team composition and combination needs to know which style of team it needs at development time. These problems get worse with each new possibility for combining and chaining teams (maybe a round robin or reverse algorithm would fit in certain situations). Type Classes solve these issues.

Type Class Method

Our traits become:

//The underscore here implies we need a parameterized type.
trait Adder[Type[_]]{
  def add[Item](
      left: Type[Item], right: Type[Item]): Type[Item]
}
trait Chainer[Type[_]]{
  def chain[Item, Res](
      arg: Type[Item], f: Item => Type[Res]): Type[Res]
}

These have the same uses as their counterparts above. However, we now have a single definition of the Team type:

case class Team[Type](members: List[Type])

The data is defined in a single place. Each piece of software which requires a Team has a consistent idea about what a Team is and means. The two versions of functionality are defined by:

object unstructured{
  implicit def adder: Adder[Team] = new Adder[Team]{
    override def add[Item](
        left: Team[Item], right: Team[Item]): Team[Item] = {
      Team(left.members ++ right.members)
    }
  }

  implicit def chainer: Chainer[Team] = new Chainer[Team]{
    override def chain[Item, Res](
        arg: Team[Item], f: Item => Team[Res]): Team[Res] = {
      val list = arg.members.flatMap(
          member => f(member).members)
      Team(list)
    }
  }
}

object structured{
  implicit def adder: Adder[Team] = new Adder[Team]{
    override def add[Item](
        left: Team[Item], right: Team[Item]): Team[Item] = {
      val (lead1, indi1) = left.members.splitAt(2)
      val (lead2, indi2) = right.members.splitAt(2)
      Team(lead1 ++ lead2 ++ indi1 ++ indi2)
    }
  }

  implicit def chainer: Chainer[Team] = new Chainer[Team]{
    override def chain[Item, Res](
        arg: Team[Item], f: Item => Team[Res]): Team[Res] = {
      val (leaders, individuals) = arg.members.map{member =>
        val mems = f(member).members
        mems.splitAt(2)
      }.unzip
      Team(
          leaders.flatMap {x=>x} ++
          individuals.flatMap{x=>x})
    }
  }
}

Now, each function which accepts a Team will also accept an Adder or a Chainer (or both) as needed; the data and the functionality are wholly decoupled. The downside is that each call to such a function requires at least one extra argument compared with the subclass versions. Scala has a fix for this limitation.
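As a sketch of what such a function looks like before implicits enter the picture, the caller passes the Adder instance by hand (combineTeams is a hypothetical helper, not part of the original example):

def combineTeams[Item](
    left: Team[Item], right: Team[Item])(
    adder: Adder[Team]): Team[Item] =
  adder.add(left, right)

import structured._
val merged = combineTeams(
    Team(List("pLead1", "tLead1", "dev1")),
    Team(List("pLead2", "tLead2", "dev2")))(adder)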

Implicits

The implicit keyword before a definition is an important part of making Type Class polymorphism beneficial to the developer. The word implicit, according to Oxford Dictionaries, means "implied though not plainly expressed". In Scala it means we can mark an argument list with the implicit keyword and, as long as a valid value is in scope, omit it at the call site. For example:

def chainTeams[Type, Result](
  team: Team[Type])(
  func: Type => Team[Result])(
  implicit chain: Chainer[Team]): Team[Result] = {
  chain.chain(team, func)
}

This has three arguments: the team to operate on, the operation to perform, and the chainer to apply. However, since the final argument is implicit, if we bring a valid implicit value into scope, there is no need to pass it in directly.

import structured._
val team: Team[Person] = Team(List(???))
val func: Person => Team[Person] = {(p: Person) => ???}
val newTeam: Team[Person] = chainTeams(team)(func)//valid

Since we don't need to explicitly state the Chainer, boilerplate stays minimal. A nice effect of implicit resolution is that if two separate valid values for the implicit argument are in scope, the compiler will complain. If you find yourself with multiple valid implicits in scope, the suggestion is to decouple your code functionally; no single scope should use more than one implicit of the same type, and needing to is a code smell. A corollary is that one should not explicitly provide implicit arguments; let the compiler do its work.
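A quick hypothetical sketch of that ambiguity: pulling both instance sets into one scope leaves the compiler with two equally valid Chainer[Team] values, and it refuses to choose.

import structured._
import unstructured._
//chainTeams(team)(func) //does not compile: ambiguous implicit values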

In the final post of this series, we will introduce another library, Cats.

Monocle & Argonaut (Introducing Typelevel Scala into an OO Environment)

(Examples can be found on GitHub)

In the last post, we put to rest our use of mutable objects for good. Here we learn how to put our newfound Functional Programming powers to use in the real world.

Every application with a sufficiently large user base requires a persistent settings store, and the more users one has, the more styles one is responsible for accommodating. In my experience, JSON has been the most useful format for small-scale persistence: it is widely understood, works on the web and is plain text. We'll use Monocle and Argonaut to implement a persistent settings store.

The first question here is "Why not circe?". I found circe to be a bit too ethereal for most Java developers to wrap their heads around. Implicit scope (especially when it's as magical as circe's auto derivation) is an alarming feature for people who come from a language with no developer-defined implicit semantics (C++ developers, by contrast, are quite comfortable with this notion).

Monocle

The Monocle library provides semantics for defining simple accessors and combinators on nested data. Monocle is especially well suited to handling nested Case Classes, which is what we'll focus on. Take the following data definition:

case class Color(r: Byte, g: Byte, b: Byte)
case class FishTank(liters: Int, color: Color, fish: List[Fish])

We have nested Case Classes as well as a nested collection. To cover the changes that can occur here we would need to define, by hand:

  • eighteen operations in total,
  • six of which are compositions from a FishTank into a Color,
  • one of which reaches inside a List structure.

Monocle makes this simple:

val (tankLiters, tankColor, tankFish) = {
  val gen = GenLens[FishTank]
  (gen(_.liters), gen(_.color), gen(_.fish))
}
val (colorR, colorG, colorB) = {
  val gen = GenLens[Color]
  (gen(_.r), gen(_.g), gen(_.b))
}
val (tankColorR, tankColorG, tankColorB) = (
  tankColor.composeLens(colorR),
  tankColor.composeLens(colorG),
  tankColor.composeLens(colorB)
)

Defining your data using Case Classes provides Monocle with the information it needs to generate lenses (nested views) into your data structures. Lens composition in Monocle is a single, straightforward call. We can get into and out of our data with very little boilerplate.
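A small usage sketch of the lenses defined above (the values are made up for illustration):

val tank = FishTank(20, Color(0, 0, 0), List(OneFish))

tankColorR.get(tank)                      //read a nested field: 0
val brighter = tankColorB.set(127)(tank)  //copy of tank with color.b replaced
val bigger   = tankLiters.modify(_ + 10)(tank)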

Now that we can define settings and alter them, we need a way to persist them and communicate them to other parts of the system. We'll use JSON as our data format and Argonaut as our transcoder.

Argonaut

Argonaut provides semantics for converting between classes and JSON strings. Like Monocle, it's easiest to use with Case Classes. Taking the same classes as above, we would have:

//ignore the implicit keyword.
//I promise we'll get to it in the next post!
implicit def codecTank: CodecJson[FishTank] =
  casecodec3(
      FishTank.apply, FishTank.unapply
  )("liters", "color", "fish")
implicit def codecColor: CodecJson[Color] =
  casecodec3(
      Color.apply, Color.unapply
  )("r", "g", "b")
implicit def codecFish: CodecJson[Fish] =
  CodecJson(
    (f: Fish) =>
      ("name" := f.name) ->:
      ("color" := f.color) ->:
      jEmptyObject,
    (c: HCursor) => for{
      name <- (c --\ "name").as[String]
      color <- (c --\ "color").as[String]
    }yield{(name, color) match{
      case ("One Fish", "Red Fish") => OneFish
      case ("Two Fish", "Blue Fish") => TwoFish
      case _ => NotFish
    }}
  )

There are quite a few operators here. This could make things tricky for Java developers at first, but there are few of them, so it's not a big deal. For Case Classes, Argonaut has next to no boilerplate: one passes in the apply and unapply functions and names each field. Composition in Argonaut is also implicit. Things get a little trickier with non Case Classes, but not by much. At most, Argonaut requires two functions: one from the class to JSON, the other from JSON to the class.

One more thing to note is that codecs for simple standard collections are implicit. The codec for List[Fish] is implicitly defined by the codec for Fish.
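Assuming the codecs above are in scope, a minimal round trip might look like:

import argonaut._, Argonaut._

val tank = FishTank(20, Color(1, 2, 3), List(OneFish, TwoFish))
val json: String = tank.asJson.nospaces
val back: Option[FishTank] = json.decodeOption[FishTank]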

Putting it all together

A settings object would look something like:

object settings{
  private val settings: mutable.Map[String, FishTank] =
    mutable.Map()
  
  def apply(key: String): Option[FishTank] = settings.get(key)
  def update(key: String, byte: Byte): Unit = {
    settings(key) = settings.get(key) match{
      case Some(tank) =>
        tankColor.modify { _ => Color(byte, byte, byte) }(tank)
      case None =>
        FishTank(0, Color(byte, byte, byte), Nil)
    }
  }
  def update(key: String, size: Int): Unit = {
    settings(key) = settings.get(key) match{
      case Some(tank) =>
        tankLiters.modify(_ => size)(tank)
      case _ =>
        FishTank(size, Color(0,0,0), Nil)
    }
  }
  def update(key: String, fish:List[Fish]): Unit = {
    settings(key) = settings.get(key) match{
      case Some(tank) =>
        tankFish.modify(_ => fish)(tank)
      case _ =>
        FishTank(1, Color(0,0,0), fish)
    }
  }
  
  def persist(): Unit = {
    val jsonRaw = settings.toList.asJson
    val json = jsonRaw.nospaces
    putOnWire(json)
    writeToDisk(json)
  }
  def recall(): Unit = {
    val str = getFromDisk()
    val opt = str.decodeOption[List[(String, FishTank)]]
    opt.foreach{list =>
      settings ++= list.toMap
    }
  }
}

But, of course, this is not thread-safe and it uses a mutable collection to perform its work. A different, thread-safe implementation could look something like:

import scala.concurrent.{ExecutionContext, Future}
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.pattern.ask

val actorSystem: ActorSystem = ???
implicit val timeout: akka.util.Timeout = ???
implicit val ec: ExecutionContext = ???
object asyncSettings{
  private sealed trait Message
  private case class Get(key: String)
      extends Message
  private case class SetGrey(key: String, hue: Byte)
      extends Message
      
  private class Perform extends Actor{
    override val receive: Receive = step(Map())
    def step(map: Map[String, FishTank]): Receive = {
        case Get(key) => sender ! map.get(key) //reply with an Option so missing keys don't throw
        case SetGrey(key, value) =>
          val newTank: FishTank = ???
          val newMap = map + (key -> newTank)
          context.become(step(newMap))
    }
    override def preStart(): Unit = ???//recall
    override def postStop(): Unit = ???//persist
  }
  
  val actor: ActorRef = actorSystem.actorOf{
    Props(new Perform())
  }
  def apply(key: String): Future[FishTank] =
    (actor ? Get(key)).collect{
      case Some(t: FishTank) => t
    }
  def update(key: String, hue: Byte) = 
    actor ! SetGrey(key, hue)
}

Here we use akka for asynchronous operations. One could also employ scalaz Task or simple Future composition or really any other asynchronous library.

Next we'll cover Type Classes to further decouple our data from functionality.

 

4. Objects are not Coroutines (Introducing Typelevel Scala into an OO Environment)

In the previous posts, we went over how to introduce immutability, combinators and case classes to move toward functional programming. Together, these three points are the basis for the idea described in this post: Objects are not Coroutines.

If you are unfamiliar with coroutines, Wikipedia has a basic description of them.

In Java, the usual application runs a little like this:

  1. Initialize an object
  2. Perform an operation
  3. Mutate the object
  4. Perform an Operation
  5. ...

This habit breaks all of the FP ideas we have developed so far.

When introducing Typelevel Scala, it is important to note we are not simply adding a library to an already existing system (the JVM); we are trying to change how people do their day-to-day work. Some of them have been writing OO Imperative software products for decades, which makes a change of paradigm difficult. Keeping to simple language is key to our goal of shifting an organization's workflow.

The workflow shift we are suggesting here is:

  1. Define
  2. Apply Combinator
  3. ...

We will be moving from a paradigm based in mutability and state to one built on immutability and functions.

OO Imperative Style

We will be defining a school of fish and how that school of fish grows.

class BadSchool(){
  private var name: String = null
  private var depth: Depth = null
  private var location: Location = null
  private var fish: mutable.Buffer[Fish] = null
  def setName(newName: String): Unit = {
    name = newName
  }
  def getName(): String = name
  def setDepth(newDepth: Depth): Unit = {
    depth = newDepth
  }
  def getDepth(): Depth = depth
  def setLocation(newLocation: Location): Unit = {
    location = newLocation
  }
  def getLocation(): Location = location
  def setFish(newFish: mutable.Buffer[Fish]): Unit = {
    fish = newFish
  }
  def removeFish(aFish: Fish): Unit = {
    fish -= aFish
  }
  def addFish(aFish: Fish): Unit = {
    fish += aFish
  }
  def getFish(): mutable.Buffer[Fish] = fish
  override def toString(): String= {
    s"School(\n\t$name,\n\t$depth,\n\t$location,\n\t$fish)"
  }
}

In my experience, this is the type of code commonly written by people who have just made the jump from an OO language into Scala. Here we initialize the object's members to null and use mutable containers to maintain and augment state. A typical use case would look like:

def asCoroutine(): Unit = {
  val coroutine = new BadSchool()
  val (name, depth, location, fish) = someInit()
  coroutine.setName(name)
  coroutine.setDepth(depth)
  coroutine.setLocation(location)
  coroutine.setFish(fish)
  convertToJsonAndPutOnTheWire(coroutine)
  var newFish: Fish = null
  for(i <- (0 to 10)){
    newFish = nextFish(coroutine)
    coroutine.addFish(newFish)
    convertToJsonAndPutOnTheWire(coroutine)
  }
}

This application is difficult to follow and very cluttered. In order to create a new valid school of fish, one must initialize the object and call four set methods. When growing a school of fish, the developer needs to destroy the previous school of fish forcing any interaction with the object to be synchronous.

This is like a very messy coroutine. The addFish method is like a yield and there is no way to get back to the previously returned yield state.

Typelevel Style

The above impure code can be fairly simply converted to a more FP style. First we define our school of fish.

case class School(
    name: String,
    depth: Depth,
    location: Location,
    fish: immutable.Queue[Fish])

This is short and to the point. The intent of the code is clear and there are no messy methods defined for maintaining, getting and mutating state. The same use case would be implemented functionally like:

def aBetterWay(): Unit = {
  @annotation.tailrec
  def perform(qty: Int, acc: List[School]): List[School] = {
    if(qty > 0 && acc.nonEmpty){
      val head :: tail = acc
      val currentFish = head.fish.last
      val next = nextFish(currentFish)
      val result = head.copy(fish = head.fish.enqueue(next))
      perform(qty - 1, result :: acc)
    }else acc
  }
  val school = School(
      "Bikini Bottom",
      Deep,
      South,
      immutable.Queue(OneFish))
  val result = perform(10, List(school))
  result.foreach(convertToJsonAndPutOnTheWire)
}

There are two main ideas:

  1. In lieu of initializing state we define data. 
  2. Once data is defined, it cannot be redefined.

The function perform builds new schools of fish from a provided school of fish. State is created, never destroyed, and there is no initialization step. Moreover, the webservice call can be made asynchronous without worrying about synchronization or heap issues.

Next, we'll introduce our first libraries: Monocle and Argonaut.

 

3. Case Classes and Auto-Encapsulation (Introducing Typelevel Scala into an OO Environment)

(Examples can be found on GitHub)

We have decoupled data from its usage by replacing more complicated control flow ideas (like loops, null and throw) with combinators, passing functions as parameters. This allows us to think differently about how we write programs.

Instead of thinking about software as objects acting on themselves; we can start to think of it as two parts:

  1. Data
  2. Functions which define Data flow

To illustrate, we'll redefine our color function as data.

sealed trait Fish{
  val name: String
  val color: String
}
case object OneFish extends Fish{
  override final val name: String = "One Fish"
  override final val color: String = "Red Fish"
}
case object TwoFish extends Fish{
  override final val name: String = "Two Fish"
  override final val color: String = "Blue Fish"
}
case object NotFish extends Fish{
  override final val name: String = "Ahab"
  override final val color: String = "White Whale"
}

Here, we model our fish data in three points: two good states and one bad state. Instead of modeling a two-point set of data with Strings, we model it as three discrete states using a sealed trait and case objects.

Case Classes are automatically encapsulated and immutable by default and, as the states are encoded into the types, there is no need to use null or throw.
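A tiny sketch of what that buys us (Pond is a hypothetical example, not part of the series' data model):

case class Pond(name: String, fishCount: Int)

val p1 = Pond("Reef", 3)
val p2 = p1.copy(fishCount = p1.fishCount + 1) //new value; p1 is untouched
p1 == Pond("Reef", 3)                          //true: equality by value, no boilerplate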

At this point, we have built up a nice base for functional programming. We can define robust data structures which model our problem space very closely and functional constructs which allow us to transform that data. This brings us to the salient feature of this process: Objects are not Coroutines.

2. Combinators over loops, null & throw (Introducing Typelevel Scala into an OO Environment)

(examples can be found on GitHub)

In the previous section we found ways to transform mutable thinking into the immutable. Here we will take this one step further and introduce combinators into the organization.

A combinator is a method on a data structure. This method takes a function as argument and maps values of the data structure into other values of the data structure. This fits well with our immutable approach. There are four main ideas here:

  1. Functions produce new state; they do not destroy old state.
  2. Methods on structures are functions.
  3. Data and usage should be separate.
  4. Bad state should be handled as early as possible.
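As a minimal illustration of these ideas with a standard library combinator:

//map takes a function and produces new state; the original List is untouched.
val fish = List("One Fish", "Two Fish")
val loud = fish.map(_.toUpperCase)           //List("ONE FISH", "TWO FISH")
//headOption surfaces the bad state (an empty list) as early as possible.
val firstLength = fish.headOption.map(_.length)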

OO Imperative Style

When developers first get into Scala, it is pretty typical to see this kind of code:

class BadFish(
    private var m_name: String,
    private var m_color: String
){
  def this() = this(null, null)
  def getName(): String = m_name
  def getColor(): String = m_color
  def setName(name: String){
    m_name = name
  }
  def setColor(color: String){
    m_color = color
  }
  def isValid(): Boolean = try{
    check()
    true
  }catch{
    case _: IllegalArgumentException => false
  }
  def check(): Unit = {
    check(m_name, m_color)
  }
  def check(newName: String, newColor: String){
    if(!color(newName).equals(newColor))
      throw new IllegalArgumentException(
          "Fish color and name do not match"
      )
  }
}

The use case for this data is very tightly coupled to the data itself. In order to create a new value based on an existing value, one must make at least five calls: two get calls, a new, and two set calls:

val current: BadFish = ...
val (name, color) = (current.getName(), current.getColor())
val next = new BadFish()
next.setName(name)
next.setColor(color)

This puts a burden on any user of this class. One must know the entire structure of the class and the precise method calls for access in order to use it.

The set methods destroy old state to provide new state, and the state of the object is not validated until a check method is invoked. When any one reference to this object has a set method called on it, the object could be made invalid for all references without any indication. It is very difficult to follow the logical flow of data through an application full of these kinds of classes.

This class also initializes values to null and provides semantics for overwriting its own data. A null value is not a value of any type; it's essentially the type system lying to the developer.

Typelevel Style

The same functionality can be provided by the following class:

class Fish(val fishName: String){
  val fishColor: String = color(fishName)
  def spawnFish(f: String => String): Fish = {
    new Fish(f(fishName))
  }
}

First things first: it's much shorter than the other snippet. The brevity affords a great deal of clarity about the class and its intended usage.

It has a single input rather than two; the second value is dependent upon the first for a valid state. Bad state is caught at construction time, so there is no longer any need to throw an exception in the class based on the class's own data. Everything is handled up front.

The single function doesn't destroy the state of the object, it only creates a new object based on the data in the current object. This allows us to make the data immutable as well as give the user of the class all the flexibility she needs for instantiation and processing of objects.
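A brief usage sketch, relying on the color helper the class already assumes:

val one  = new Fish("One Fish")
val next = one.spawnFish(_ => "Two Fish")
next.fishColor //"Blue Fish", computed once at construction time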

Next we'll use Case Classes to encapsulate our data and define it as immutable by default.

1. Immutability as Default (Introducing Typelevel Scala into an OO Environment)

(examples can be found on GitHub)

The first point is using Immutability as Default.

What does "as default" mean? We are not barring mutability from applications wholesale; there are practical reasons for using mutability. For instance,

  • performance
  • global settings.

However, the mutable code should be bounded by its defining scope. This idea can be captured with four rules of thumb:

  1. Function inputs are immutable.
  2. Function outputs are immutable.
  3. var and collection.mutable values are local and temporary.
  4. Function return values are placed into a val.
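Here is a small sketch of these rules in practice: the mutable Buffer never escapes its defining scope, and callers only ever see an immutable List (firstWords is a hypothetical helper).

import scala.collection.mutable

def firstWords(lines: List[String]): List[String] = {
  val acc = mutable.Buffer[String]() //local and temporary (rule 3)
  lines.foreach(line => acc.append(line.takeWhile(_ != ' ')))
  acc.toList //the output is immutable (rule 2)
}

val words = firstWords(List("One Fish", "Two Fish")) //placed into a val (rule 4)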

OO Imperative Style

First, we have a helper function for our examples.

def color(str: String): String = {
  str match{
    case "One Fish" => "Red Fish"
    case "Two Fish" => "Blue Fish"
  }
}

This snippet defines an imperative-style One Fish Two Fish Red Fish Blue Fish. Note that we have two valid data points:

  1. ("One Fish", "Red Fish")
  2. ("Two Fish", "Blue Fish")

Yet, we model it with a function from String to String, a space much larger than two points. Also, it throws an exception on bad input, while the function's return type of String is a lie: a thrown exception is a very different result from a String. I found code like this to be very common in the OO imperative world. In the coming posts we will seek to replace this with something better.

Some typical code which breaks our general rules looks a little like:

def bad(): mutable.Buffer[String] = {
  val fish = mutable.Buffer[String]()
  var one = "One Fish"
  var two = "Two Fish"

  fish.append(one)
  fish.append(two)

  one = color(one)
  two = color(two)

  fish.append(one)
  fish.append(two)
  
  fish
}

Here, mutable inputs or outputs can very easily poison threaded data in an application. This makes it difficult to follow the data through an application and reason about control flow.

Also, the vars one and two are used first to hold constant data and then to hold the results of function calls. Reassigning a var has similar negative cognitive effects to mutable arguments and return values; it needlessly complicates control flow.

Typelevel Style

The typelevel way to write the same functionality would be close to:

def better(): List[String] = {
  val one = "One Fish"
  val two = "Two Fish"

  List(
      one,
      two,
      color(one),
      color(two)
  )
}

The function is self-contained. No mutable state inside the function escapes its defining scope, and all function calls are nested readably within other function calls. Most importantly, the intent of the function is clear. Also, any threading performed around it is safe from accidental data poisoning; no messy synchronization calls are necessary.

Now that we have a foundation of immutability, we'll add combinators to our toolset.

Introducing Typelevel Scala into an OO Environment

These posts seek to describe a process by which a Scala dev can reasonably (responsibly) introduce Functional Programming (FP) in Scala into an Object Oriented (OO) Java development atmosphere. It can also serve as an entry point for an OO dev into the Scala community. They are based on a talk I gave at the Typelevel Summit in Philadelphia in 2016. Here are the slides and index cards from that talk. A video is available on YouTube.

We'll transform from OO paradigms to FP paradigms through five key ideas:

  1. Immutability as Default
  2. Combinators over loops, null & throw
  3. Case Classes for auto-encapsulation
  4. Objects are not Coroutines
  5. Type Classes over Subclasses

And provide practical examples using 3 libraries:

  1. Monocle
  2. Argonaut
  3. Cats

Each of these ideas builds upon the last and by the end we'll have a solid foundation for introducing FP and Category Theory without even saying any of the "M-words".

Before starting it is important to note that many of the people who develop software do not have Computer Science (CS) degrees. They will sometimes have a degree in another science or engineering discipline. Often, they will have no degree at all. An extension of this idea is you don't need a degree to define flatMap.

This fact is important to us because the vernacular of CS is different from the vernacular of other disciplines. On the other hand, we are assuming an audience of OO developers, so some CS terminology is fine and will indeed be present throughout. Wherever OO words fail us, we will find "plain old English" ways of describing idioms and ideas. We will be coaching usage, strategies and tactics, not terms and vocabulary.

Typelevel Summit Philadelphia

I just spoke at the Typelevel Summit in Philadelphia and it was fantastic!

I thank all the people who were involved in planning it as well as the kind people who approached me after I did my bit to share their kind words with me. It really made me feel like I was not only part of a community but also welcomed and valued as a member. I am truly touched.

This was my first talk ever. I spoke about introducing Typelevel Scala into an OO environment. Here are the slides. Here are the Index Cards.

The talk seemed to have gone over well and I got a lot of good feedback from attendees and fellow speakers that I can use to make any future talks I might do better.

In the coming weeks, I'll write a series of posts that further detail my process for bringing people into the ecosystem.

Type Class over Coupled Monads

Scala has support for built-in, language-level Monads. They go a little something like this:

case class MyMonad[Type](item: Type){
  def map[Ret](f: Type => Ret): MyMonad[Ret] = ???
  def flatMap[Ret](f: Type => MyMonad[Ret]): MyMonad[Ret] = ??? 
}

The issue here is that the Monadic flatMap (bind, >>=, whatever you want to call it) is coupled to the data structure: a single operation defines the Monadic functionality across all the code which uses that data structure. For instance, Buffer's flatMap flattens a Buffer[Buffer] into a Buffer by concatenating the first with the second with the third and so on. This is the only flattening style Buffer provides for Scala's built-in, for-comprehension Monadic chaining, which is very limiting.
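A quick sketch of that built-in behavior:

import scala.collection.mutable.Buffer

val b = Buffer(1, 2, 3)
b.flatMap(i => Buffer(i, i * 10)) //Buffer(1, 10, 2, 20, 3, 30)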

If one wanted a different kind of flattening, such as a round robin (taking the first element from each Buffer, then the second from each, and so on), one would need to implement a new Buffer structure and bake into that new Buffer a flatMap operation detailing the round robin. Now there are two separate Buffer data structures presenting two separate methods of Monadic chaining. Each function which accepts a Buffer needs to enforce which Buffer type it requires, to ensure the flatMap provides the necessary functionality. Type Classes provide a solution to this issue.

Cats to the rescue

The Typelevel Cats library defines a Monad Type Class and implicits for syntax. Keeping to Buffer for our example: we would have two Monads over Buffer which look like:

import scala.collection.mutable.Buffer
import cats.Monad
import cats.implicits._

//Note: newer versions of Cats also require tailRecM to be implemented
//on Monad instances; it is omitted here for brevity.
val mMonad = new Monad[Buffer]{
  override def pure[Type](items: Type): Buffer[Type] = pure(Seq(items))
  def pure[Type](items: Seq[Type]): Buffer[Type] = Buffer(items: _*)
  override def flatMap[Arg, Ret](c: Buffer[Arg])(f: Arg => Buffer[Ret]): Buffer[Ret] = {
    c.flatMap(f)
  }
}
val rrMonad = new Monad[Buffer]{
  override def pure[Type](items: Type): Buffer[Type] = pure(Seq(items))
  def pure[Type](items: Seq[Type]): Buffer[Type] = Buffer(items: _*)
  override def flatMap[Arg, Ret](c: Buffer[Arg])(f: Arg => Buffer[Ret]): Buffer[Ret] = {
    val mapped = c.map(f)
    val tupled = mapped.map(i => (i, i.size))
    val max = tupled.unzip._2.max
    val seq = (0 to max).flatMap{index =>
      tupled.collect{
        case (seq, size) if size > index => seq(index)
      }
    }
    seq.toBuffer
  }
}

Then to use them:

object simple{
  implicit val monad = mMonad
  def apply(values: Seq[Int]){
    val mMulti = mMonad.pure(values) >>={ int =>
      mMonad.pure((0 to int))
    }
    println(mMulti)
  }
}
object roundRobin{
  implicit val monad = rrMonad
  
  def apply(values: Seq[Int]){
    val rrMulti = rrMonad.pure(values) >>={int =>
      rrMonad.pure((0 to int))
    }
    println(rrMulti)
  }
}
val values = Seq(1,2,3,4,5)
simple(values)
roundRobin(values)

And that's it! Ad-hoc Monadic chaining any way you'd like without redefining classes or parameterized methods.

D3js defines a Functor over the DOM

D3js is a wonderful library for parsing datasets and manipulating the DOM based on what was parsed. With the right frame of mind, even complex data-driven manipulations of the DOM become simple enough to implement in short order. The code here is written in JavaScript.

Those unfamiliar with the library can visit the D3js website for a set of very nice tutorials to get started.

Those unfamiliar with Category theory and Functors can find myriad resources on the internet which describe the concepts. In my opinion, the best treatment of Category Theory is in the book Categories for the Working Mathematician written by Saunders Mac Lane. The first few chapters are all that's needed here.

The basic idea behind D3js is you select one or many DOM elements and manipulate the set of them without cluttering your code with loops and control statements. Take:

<ul id='list'>
  <li id='idx1'>first</li>
  <li id='idx2'>second</li>
  <li id='idx3'>third</li>
  <li id='idx4'>fourth</li>
</ul>

This represents the array ['first','second','third','fourth']. This is the example we will use to show how D3js can be seen as a Functor.

A Category over the DOM

We will define a Category over the DOM by taking the DOM elements as the objects and containment as the arrows.

Given an arrow, f, the domain is the parent DOM element and the codomain is the child DOM element. So, an arrow f: a -> b shows that b is a child of a. Trivially, the identity arrow maps DOM elements to themselves, f: a -> a.

These arrows are composable. Given an arrow, f: a -> b, and an arrow, g: b -> c, an arrow, h: a -> c, can be produced by first applying f to a then applying g to b = f(a); h = g(f). This would show a grandchild relationship within the DOM.

This composition is associative. Take arrows, f: a -> b, g: b -> c, h: c -> d. Let z = h(g(f)), y = g(f), w = h(g). z = h(y) and z = w(f). The grandchild's child is the same as the child's grandchild. This is the great-grandchild relationship within the DOM.

This composition obeys the left and right unit laws. Take f: a -> b, f(id(a)) = f(a) = b. id(f(a)) = id(b) = b.

d3.select is a Functor over the DOM

The first part to discovering a Functor is to define a unit function which maps an element of the DOM, a, to the Functor, F(a). In D3js this function is d3.select.

var element = document.getElementById('list');
var selection = d3.select(element);

d3.select returns a d3 selection object which wraps the DOM element passed as its argument. It "lifts" the DOM element into the d3 selection.

Next we need a function which maps arrows in the DOM, f: a -> b, to arrows in the Functor, F(a) -> F(b). We took our arrows to refer to containment; getElementsByTagName will serve as our f.

var element = document.getElementById('list');
var selection = d3.select(element);

var childElement = element.getElementsByTagName('li')[0];
var childText = childElement.textContent;
var child = selection.select(function(){
  var current = this.getElementsByTagName('li')[0];
  return current;
});

var same = childText === child.text();
if(!same){
  alert('It didn\'t work!');
}
alert(same);

Higher Kinded Types

A higher kinded type is simply an abstraction over types. The examples on this page use the Scala language.

A proper type is something such as String. You can only instantiate values of proper types directly. These are represented as kind *.

val str: String = "seven"

In the above snippet, a String can simply be instantiated. No bells, whistles or hoops. (Some proper types cannot be instantiated, these are usually abstract types.)

On the other hand, a higher kinded type is a type which needs another type provided to it in order to produce a proper type; it is a type constructor. Values cannot be instantiated for types of these kinds. These have myriad kind representations, the simplest of which is kind * -> *, which says: given a type of kind *, you get a type of kind *. List is a common example.

var list: List = List()//compile error
var list: List[String] = List("sev", "en")

In the above example a List cannot simply be instantiated. It must be provided with a proper type before the compiler can reason about it.

An extension of this is that a type like Map, which takes two type parameters, is kinded * -> * -> *. This says given two types of kind * a type of kind * is produced (currying for types).

Furthermore, a type constructor like Monad is of kind (* -> *) -> *, which implies that, given a type of kind * -> * (such as List), a type of kind * (such as Monad[List]) is produced.
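As a closing sketch, here is a hypothetical Container type class of kind (* -> *) -> *: it must be given a one-hole type constructor, such as List or Option, before it yields a proper type.

//Container abstracts over a type constructor of kind * -> *.
trait Container[F[_]]{
  def empty[Item]: F[Item]
}

val listContainer: Container[List] = new Container[List]{
  override def empty[Item]: List[Item] = List.empty[Item]
}
val optionContainer: Container[Option] = new Container[Option]{
  override def empty[Item]: Option[Item] = None
}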