I Hate Python

[Views and opinions are my own.]

First and foremost, I love Python. I believe Python is one of the best choices the mainstream development community provides for doing Functional Programming. It provides all of the bells and whistles necessary to do solid, well-thought-out (dare I say lawful?) FP.

One could say the hallmark features of FP are

  • functions as values

  • free monoid with a fold

  • lambdas

  • strong types

Python does a wonderful job of providing all of these to a Developer. From these building blocks, we can mostly get to Type Classes in Python.

“But Hey! Wait just a moment! I thought you hated Python!!!” I do. Well… not properly, but I hate how Python is typically used. It appears someone out there is teaching a large subset of the Python community to write Python like it is a worse version of Java (Yes, things can get worse than Java, so count your blessings!). Just a few points:

  • Some data can be stored outside of a class

  • Some classes can be alone in the world with no Parent or Child classes defined

  • No one ever said Functions/Methods must be coupled to data

All too often I find myself in a sea of poorly conceived class hierarchies wherein “inheritance” is misappropriated to restrict rather than to extend capabilities and behaviors. This is the type of Python I hope to avoid in the future but know I will be faced with in short order.

Python is Good for FP

Yes. It is.

Python has Functions as Values

def add(a,b):
    return a + b   
compose = add
print(f"compose is add: {compose(1, 2) == add(1, 2)}")
def same(f, g, a, b):
    print(f"{f(a, b) == g(a, b)} for {a} and {b}")
same(add, compose, 1, 2)

This displays

compose is add: True
True for 1 and 2

Python has a Free Monoid with a Fold

from functools import reduce
free = [1,2,3,4,5,6]
def c1(a, b): return a + b
def c2(a, b): return a * b
def c3(a, b): return str(a) + str(b)
print(reduce(c1, free, 0))
print(reduce(c2, free, 1))
print(reduce(c3, free, ""))

This displays

21
720
123456

Python has Lambdas

from functools import reduce
free = [1,2,3,4,5,6]
print(reduce(lambda a,b: a + b, free))
print(reduce(lambda a,b: a * b, free))
print(reduce(lambda a,b: str(a) + str(b), free))

This displays

21
720
123456

Python has Strong Types

Strong types are important for FP. Static types are not.

Strong types ensure the data one function writes is the data another function reads. They ensure the type of a value is static through the lifetime of that value.

Static types ensure the type of a name is static through the lifetime of that name. Static vs Dynamic typing is only important in applications where there is use of the assignment operator outside of declaration. As functional programmers, we do not do this, even if it is allowed. A “limitation” or “inconvenience” caused by dynamic typing can simply be overcome by discipline.

a = '1'
b = 1
print(f"a={a} is a {type(a)}; b={b} is a {type(b)}; a == b is {a == b}")
print(f"a+a gives {a+a}")
print(f"b+b gives {b+b}")
print(f"a+b error")
print(a+b)

This displays

a=1 is a <class 'str'>; b=1 is a <class 'int'>; a == b is False
a+a gives 11
b+b gives 2
a+b error
Traceback (most recent call last):
  File "C:\Users\dread\OneDrive\DreadedSoftware\Talks\y2022\hatepython\04.py", line 12, in <module>
    print(a+b)
TypeError: can only concatenate str (not "int") to str

In Python, Everything is Mutable

This, like Dynamic Typing, is not a problem for a disciplined functional programmer.

a = [1,2,3,4,5,6,7]
print(a)
for i in range(len(a)):
    if 0 == i % 2:
        a[i] = -a[i]
print(a)

This displays

[1, 2, 3, 4, 5, 6, 7]
[-1, 2, -3, 4, -5, 6, -7]

If we avoid the = operator outside of declarations, things work out great.
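
For instance, a comprehension builds a new list and leaves the original untouched (a minimal sketch of the same negation as above):

a = [1,2,3,4,5,6,7]
b = [-x if 0 == i % 2 else x for i, x in enumerate(a)]
print(a) # [1, 2, 3, 4, 5, 6, 7] -- unchanged
print(b) # [-1, 2, -3, 4, -5, 6, -7]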

Applying the Basics

Say we have two computations which apply an operation across a list of numbers to produce a new number.

from functools import reduce
a = [1,2,3,4]
def black_box_extreme_computation_1(sequence):
    return reduce(lambda a,b: a+b, sequence)
def black_box_extreme_computation_2(sequence):
    return reduce(lambda a,b: a*b, sequence)
print(f"add: {black_box_extreme_computation_1(a)}")
print(f"multiply: {black_box_extreme_computation_2(a)}")

This displays

add: 10
multiply: 24

The problem is the functions differ only by the operation applied. This is very WET code, and we like DRY. Luckily, we can pass functions as values in Python.

from functools import reduce
a = [1,2,3,4]
def add(a,b):
    return a + b
def multiply(a,b):
    return a*b
def black_box_extreme_computation(sequence):
    def inner(f):
        return reduce(f, sequence)
    return inner
print(f"add: {black_box_extreme_computation(a)(add)}")
print(f"multiply: {black_box_extreme_computation(a)(multiply)}")

This displays

add: 10
multiply: 24

We have gotten rid of the repetition, but that is not good enough. Why? Code is meant to be read. A function which takes a simple functional value f is not good enough. We want to provide enough information to the reader so that the reader does not need to read the entire function to understand how and why the f parameter is used.

Further, gone are the days of "I write code so that a competent engineer can understand it." There are a lot of boot camp developers out there. These boot camps are like 3 months long. These people have literally been coding for 3 months. Even if someone with 3 months of experience reads most functions, they will likely misunderstand something, misinterpret something or miss something important. We need to do some work up front to ensure a "competent engineer" (whatever that means) can understand the function just by reading the prototype AND a fresh-out-of-camp colleague will either understand the function just by the prototype or immediately ask someone "What do I do here?"

More succinctly, we need to ensure we put enough information into the function prototype so that someone who does not "just know" what things mean, will know to ask a question.

For example, let's require our f to be Associative. That information is not easily ascertainable, and someone, at some point in time, will try to shove something non-associative into the mix. Like maybe:

def alternate():
    def switcher():
        n = True
        while True:
            n = not n
            yield n
    gen = switcher()
    def inner(a, b):
        if next(gen): return add(a,b)
        else: return multiply(a,b)
    return inner

Any application which requires associativity will produce odd stuff in this case, and we want to avoid these types of bugs because they are incredibly difficult to find and fix in a language with no compiler.

To solve this, we will use a class and type annotations. Even though types are not enforced in Python, annotations make it easy to tell future developers (including yourself 3 weeks from now) what your intention was in writing the code.

from abc import ABC, abstractmethod
class AssociativeOperator(ABC):
    @abstractmethod
    def combine(self, a, b): pass
from functools import reduce
a = [1,2,3,4]
class Add(AssociativeOperator):
    def combine(self, a,b):
        return a+b
class Multiply(AssociativeOperator):
    def combine(self, a,b):
        return a*b
def black_box_extreme_computation(sequence):
    def inner(f: AssociativeOperator):
        return reduce(f.combine, sequence)
    return inner

Now, it is super clear that we want an associative operation. For good measure, let’s do something similar with a Monoid.

from abc import ABC, abstractmethod
class Monoid(ABC):
    a = None
    @abstractmethod
    def combine(self, a, b): pass
from functools import reduce
a = [1,2,3,4]
class Add(Monoid):
    a = 0
    def combine(self, a,b):
        return a+b
class Multiply(Monoid):
    a = 1
    def combine(self, a,b):
        return a*b
def black_box_extreme_computation(sequence):
    def inner(f: Monoid):
        return reduce(f.combine, sequence, f.a)
    return inner

A simple change, and we went from requiring just a closed Associative Operator (Semigroup), to requiring both a closed Associative Operator and an identity item.
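
With those definitions in scope, a caller still has to supply the instance explicitly; semi-automatic selection comes next:

print(black_box_extreme_computation(a)(Add()))      # 10
print(black_box_extreme_computation(a)(Multiply())) # 24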

These very much are type classes; however, this encoding still leaves something to be desired. In programming languages that support "real functional programming", a function that looks like f(input)(typeclass_instance) can nearly automatically discover which typeclass_instance to choose.

For instance, in Scala, arguments can be automagically inferred by using the implicit keyword. In Haskell, typeclasses are well-supported at the compiler level and just declaring a type class instance in a function prototype delivers the instance.

What is needed is a semi-automatic way to pass in a function which is enhanced in some way by another function.

Enter the Decorator

def noDecorator():
    def log(f, message):
        print(f(message))
    def formatter(message):
        import datetime
        delim = {
            'begin': '~** ',
            'end'  : ' **~'
        }
        return f"[{datetime.datetime.now()}] {delim['begin']}{message}{delim['end']}"
    log(formatter, "first")
    log(formatter, "second")
    log(formatter, "third")
noDecorator()
def withDecorator():
    def logger(f):
        def inner(*args):
            print(f(*args))
        return inner
    @logger
    def log(message):
        import datetime
        delim = {
            'begin': '~** ',
            'end'  : ' **~'
        }
        return f"[{datetime.datetime.now()}] {delim['begin']}{message}{delim['end']}"
    log("first")
    log("second")
    log("third")
withDecorator()

This displays

[2022-07-17 02:38:08.810432] ~** first **~
[2022-07-17 02:38:08.811466] ~** second **~
[2022-07-17 02:38:08.811466] ~** third **~
[2022-07-17 02:38:08.811466] ~** first **~
[2022-07-17 02:38:08.812467] ~** second **~
[2022-07-17 02:38:08.812467] ~** third **~

Since we are discussing Typeclasses, it is important to note we can stack decorators to apply multiple enhancements to the same function.

def truncate(f):
    def inner(message):
        return f(message[:3])
    return inner
def logger(f):
    def inner(message):
        print(f(message))
    return inner
@logger
@truncate
def log(message):
    import datetime
    delim = {
        'begin': '~** ',
        'end'  : ' **~'
    }
    return f"[{datetime.datetime.now()}] {delim['begin']}{message}{delim['end']}"
log("first")

This displays

[2022-07-17 02:41:15.011042] ~** fir **~

Let’s enhance our Monoid typeclass example with decorators so we can (almost) automatically pass in our instances.

Semi-Automatic Choice

from functools import reduce
from abc import ABC, abstractmethod
import random
a = [1,2,3,4]
class Monoid(ABC):
    def combine(a, b): pass
class Add(Monoid):
    def combine(a,b):
        return a+b
class Multiply(Monoid):
    def combine(a,b):
        return a*b
def monoid(F: Monoid):
    def decorator(f):
        def inner(*args, **kwargs):
            return f(*args, **kwargs)(F)
        return inner
    return decorator
def black_box_extreme_computation(sequence):
    def inner(f: Monoid):
        return reduce(f.combine, sequence)
    return inner
@monoid(Multiply)
def pi(sequence):
    return black_box_extreme_computation(sequence)
@monoid(Add)
def sigma(sequence):
    return black_box_extreme_computation(sequence)
print(f"pi(a) =\n   {pi(a)}")
print(f"pi(random.sample(a, len(a))) =\n   {pi(random.sample(a, len(a)))}")
print(f"pi([*a, 5]) =\n   {pi([*a, 5])}")
print(f"sigma(a) =\n   {sigma(a)}")
print(f"sigma(random.sample(a, len(a))) =\n   {sigma(random.sample(a, len(a)))}")
print(f"sigma([*a, 5]) =\n   {sigma([*a, 5])}")

This displays

pi(a) =
   24
pi(random.sample(a, len(a))) =
   24
pi([*a, 5]) =
   120
sigma(a) =
   10
sigma(random.sample(a, len(a))) =
   10
sigma([*a, 5]) =
   15

To get rid of the need to explicitly pass the typeclass instance in as an argument, we created a function which takes a typeclass instance as an argument and returns a decorator. Decorators are simply functions, and functions are values, so why not return a decorator as the result of a function? Our decorator is, in effect, a currying operator.

This gets us to semi-automatic typeclass use. There is a fair bit of pageantry and ceremony in defining typeclasses, there is no auto derivation, there is no compiler help; however, at least we have marked functions as requiring well-known typeclasses, and we can write Python code that looks at least a little bit like it was written in a language which enforces the types of discipline we take for granted in other environments.

Monads

Setup

from functools import reduce
from abc import ABC
a = [1,2,3,4]

A Functor is just a function mapping an item in our source category (types) into an item in our target category (types inside of some other type). We will make one for List and for “Optional”. Note: in Python everything is Optional so this is a fancy version of the identity, but it still makes for a nice example.

class Functor(ABC):
    def lift(a): pass
    def map(f): pass
class ListFunctor(Functor):
    def lift(a): return [a]
    def map(f):
        def inner(a):
            return map(f, a)
        return inner
class AnyFunctor(Functor):
    def lift(a): return a
    def map(f):
        def inner(a):
            if a != None: return f(a)
            else: return a
        return inner

A Functor and some special Natural Transformations make up a Monad. So let’s do it for List and Optional.

class Monad(ABC):
    F: Functor = None
    def bind(a, f): pass
class ListMonad(Monad):
    F = ListFunctor
    def bind(self, a, f):
        listList = self.F.map(f)(a)
        return [b for a in listList for b in a]
class AnyMonad(Monad):
    F = AnyFunctor
    def bind(self, a, f):
        optOpt = self.F.map(f)(a)
        return optOpt
def monad(F: Monad):
    def decorator(f):
        def inner(*args, **kwargs):
            return f(*args, **kwargs)(F)
        return inner
    return decorator

Just for completeness, we need an operation to chain monadic computations easily. Every “real functional language” provides such a mechanism (Haskell: >>=; Scala: for), so let’s do it here too.

def chain(a, *args):
    def inner(F: Monad):
        return reduce(lambda a,b: F.bind(F, a, b), args, a)
    return inner
@monad(ListMonad)
def flatMap(a, *args):
    return chain(a, *args)
@monad(AnyMonad)
def noneCheck(a, *args):
    return chain(a, *args)

Some functions to chain together

def toRange(a): return list(range(a))
def repeater(a): return [a] * 2
def doubler(a): return [a*2]
def stringToInt(a):
    try:
        return int(a)
    except:
        return None
def intToList(a):
    return list(range(a))

And we have…

print(f"flatMap(a, doubler, repeater, toRange) =\n   {flatMap(a, doubler, repeater, toRange)}")
print(f"noneCheck('a', stringToInt, intToList) =\n   {noneCheck('a', stringToInt, intToList)}")
print(f"noneCheck('7', stringToInt, intToList) =\n   {noneCheck('7', stringToInt, intToList)}")
print(f"flatMap(noneCheck('7', stringToInt, intToList), intToList) =\n   {flatMap(noneCheck('7', stringToInt, intToList), intToList)}")

Which displays

flatMap(a, doubler, repeater, toRange) =
   [0, 1, 0, 1, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7]
noneCheck('a', stringToInt, intToList) =
   None
noneCheck('7', stringToInt, intToList) =
   [0, 1, 2, 3, 4, 5, 6]
flatMap(noneCheck('7', stringToInt, intToList), intToList) =
   [0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5]

Conclusion

We have typeclass-based, Monadic operations in Python, which is a dynamically typed language. This requires strong typing and functions as values.

We use classes to model Typeclasses.

We use the facts that functions are values and decorators are functions to give us a currying function which we then use to give us semi-automatic typeclass instance choice.

We have shown this works for Semigroups, Monoids, Functors and Monads. There is no reason to believe it would fail for Groupoids, Groups, Applicatives or any other Typeclasses that are representable in Python.

Introductory TyDD in Scala: Anatomy of a Type Level Function

Value Level Functions

A value level function typically looks like

def f(a: Int, b: Int): String = {
  a.toString + b.toString
}

The key parts are

  1. A keyword: def
  2. Inputs: a: Int, b: Int
  3. Outputs: String
  4. Body: a.toString + b.toString

When you see these 4 parts, you know you are reading a value level function. Nothing surprising here. Now, let's see what a similar definition looks like at the type level.

Type Level Functions

A type level function looks like

trait MyTrait[A, B]{type Out}
object MyTrait{
  def apply[A, B, C](): MyTrait[A, B]{type Out = C} =
    new MyTrait[A, B]{override type Out = C}
}

In this definition there is a type refinement, MyTrait[A, B]{type Out = C}. These are undesirable artifacts of type level development. To simplify these definitions we use the Aux alias, a well-documented pattern in type level libraries. Aux helps us remove type refinements from our logic.

With the Aux alias, the same type level function looks like

trait MyTrait[A, B]{type Out}
object MyTrait{
  type Aux[A, B, C] = MyTrait[A, B]{type Out = C}
  def apply[A, B, C](): Aux[A, B, C] =
    new MyTrait[A, B]{override type Out = C}
}

The type refinement from the previous example is replaced by the nicer (more readable, fewer braces, less code) Aux type.

Type level functions have the same 4 key parts as value level functions

  1. Keywords: trait, object
  2. Inputs: A, B
  3. Outputs: type Out
  4. Body: def apply[A, B, C](): Aux[A, B, C]

Here the inputs are type parameters and outputs are type members. This is so the output types are not erased and can be referenced later in business logic. This is similar to value level functions as the result of a value level function does not expose the inputs required by the function.
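
A quick sketch of what that buys us (hypothetical values, using the MyTrait definition above): because the output is a type member, it survives in the value's type and can be referenced later.

val f = MyTrait[Int, String, Boolean]()
def g(b: f.Out): f.Out = b //f.Out is Boolean here
g(true)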

Bodies of type level functions are value level functions. They are typically only a few lines long. Their purpose is to present the compiler with a way to construct a new type from the types provided. This is what these blog posts will focus on.

Whenever you see a set of definitions which have these 4 qualities, you know you are looking at a type level function.

The type class is the fundamental element of this style of type driven development. The next post will give an overview of this concept.

Abstraction in F[_]: Lift Implementation Details Outside the Class

We now know that we can lift implementation details outside the class to gain some freedom in the client code; we have gained abstraction through decoupling. Why not apply a similar abstraction to the rest of the code?

We had:

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI)(implicit F: Functor[F]): F[Unit] = {
    val in = read(uri)
    val computed = F.map(in)(computation)
    F.map(computed)(write)
  }
  def read(uri: URI): F[A]
  def computation(in: A): B
  def write(in: B): Unit
}

And we can abstract it into:

trait Read[F[_], A] extends Function1[URI, F[A]]
trait Computation[A, B] extends Function1[A, B]
trait Write[B] extends Function1[B, Unit]

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI)(implicit
    F: Functor[F],
    read: Read[F, A],
    computation: Computation[A, B],
    write: Write[B]): F[Unit] = {
    val in = read(uri)
    val computed = F.map(in)(computation)
    F.map(computed)(write)
  }
}

While we're at it, let's abstract out the apply method as well.

sealed trait Pipeline[F[_], A, B]{
  def apply(uri: URI): F[Unit]
}
object Pipeline{
  final def apply[F[_]: Functor, A, B](implicit
    read: Read[F, A],
    computation: Computation[A, B],
    write: Write[B]) = {
    val F: Functor[F] = implicitly
    new Pipeline[F, A, B]{
      override def apply(uri: URI): F[Unit] = {
        val in = read(uri)
        val computed = F.map(in)(computation)
        F.map(computed)(write)
      }
    }
  }
}

Now our trait is a simple expression of inputs to outputs. But why would we ever perform this kind of abstraction? It seems like we are complicating the problem, not making it simpler.

 

The benefits here are more subtle and only appear in certain use cases. Say you have two pipelines:

val p1: Pipeline[Stream, Int, String] = ???
val p2: Pipeline[Stream, String, Array[Char]] = ???

And you wanted to combine them into a single

val pipeline: Pipeline[Stream, Int, Array[Char]] = ???

Taking two Pipeline instances and composing them is not a readily understood idea. However, taking two Function1 instances and composing them is very well understood. Notice we brought the functions read, computation and write outside the class as simple Function1 instances. Abstracting the Pipeline functions outside the trait provides the developer who is writing the client code with a clear and well understood method for composing multiple Pipelines.
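
A rough sketch of that composition (the values here are hypothetical): since Computation[A, B] is just a Function1[A, B], the middle steps of two pipelines compose with andThen, and the result can be wrapped back up as a Computation.

val c1: Computation[Int, String] = ???
val c2: Computation[String, Array[Char]] = ???
val c3: Computation[Int, Array[Char]] = new Computation[Int, Array[Char]]{
  override def apply(in: Int): Array[Char] = (c1 andThen c2)(in)
}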

This is still an incomplete implementation. We can see a path forward for composing any number of Pipelines whose computations can be composed but, how do we compose Pipelines who accept different inputs?

A simple switching mechanism

Say we have three Pipeline instances which require separate inputs.

val p1: Pipeline[Stream, ...] = ...
val p2: Pipeline[Stream, ...] = ...
val p3: Pipeline[Stream, ...] = ...

Our application would need to accept a URI and choose which pipeline (if any) should run it.

def perform(uri: URI): Stream[Unit] = {
  if(uri.toString.contains("p1")) p1(uri)
  else if(uri.toString.contains("p2")) p2(uri)
  else if(uri.toString.contains("p3")) p3(uri)
  else Stream()
}

This is a lot of boilerplate. Especially when you consider the number of Pipelines (for any successful business) is expected to increase. Let's unpack what we have and see if we can't abstract it into our Pipeline definition.

  1. Uniform Input URI is the input to ALL Pipeline instances
  2. Guards checking a URI against some constant value
  3. Constant value defining a Pipeline for use in a Guard
  4. Default case in case the input matches no Pipeline

Our uniform input means we don't have to worry about which Pipeline can take what Types of values. This is already abstract enough.

We'll build a typeclass to model Guards and Constants associated with each pipeline.

trait Guard[-T]{
  def name: String
}
sealed trait Pipeline[-T, A, B]{
  def apply(uri: URI): F[Unit]
}
object Pipeline{
  final def apply[T: Guard, F[_]: Functor, A, B](implicit
    read: Read[F, A],
    computation: Computation[A, B],
    write: Write[B]): Default[T, F, A, B] = {
    val G: Guard[T] = implicitly
    val F: Functor[F] = implicitly
    new Pipeline[T, A, B]{
      override def apply(uri: URI): F[Unit] = ???
    }
  }
}

We have an issue here. The last else case of our function returns an empty Stream. In the Pipeline object we don't know what our effect type is. We cannot return an empty version thereof. This problem takes me back to a talk given by Runar Bjarnason last year wherein he describes how when we liberate our types, we constrain our implementation and when we constrain our types we liberate our implementation. We have liberated all of our types here (except URI) leaving ourselves no room to implement what we need. So, we need to constrain a type that we may regain our ability to implement our function. Let's constrain our output type.

 
trait Guard[-T]{
  def name: String
}
sealed trait Pipeline[-T, A, B]{
  type Out
  def apply(uri: URI): Out
}
object Pipeline{
  final def apply[T: Guard, F[_]: Functor, A, B](implicit
    read: Read[F, A],
    computation: Computation[A, B],
    write: Write[B]): Default[T, F, A, B] = {
    val G: Guard[T] = implicitly
    val F: Functor[F] = implicitly
    new Pipeline[T, A, B]{
      type Out = Either[Unit, F[Unit]]
      override def apply(uri: URI): Out = {
        val from = uri.toString
        if(from.contains(G.name)) Right{
          val in = read(uri)
          val computed = F.map(in)(computation)
          F.map(computed)(write)
        } else Left(())
      }
    }
  }
}

So our client code becomes

trait P1
trait P2
trait P3
implicit def guardP1 = new Guard[P1]{
  final override def name: String = "p1"
}
implicit def guardP2 = new Guard[P2]{
  final override def name: String = "p2"
}
implicit def guardP3 = new Guard[P3]{
  final override def name: String = "p3"
}
implicit def p1: Pipeline[P1, Stream, ...]
implicit def p2: Pipeline[P2, Stream, ...]
implicit def p3: Pipeline[P3, Stream, ...]
def perform(uri: URI): Either[Either[Either[Unit, Stream[Unit]], Stream[Unit]], Stream[Unit]] = {
  p1(uri).fold(
    _ => Left(p2(uri).fold(
      _ => Left(p3(uri).fold(
        _ => Left(()),
        a => Right(a)
      )),
      a => Right(a)
    )),
    a => Right(a)
  )
}

This has made things much worse. There is even more boilerplate and the nesting will become unreasonable in short order. But we have something we can easily reason about at the type level:

  • Given a known set of Pipeline instances
  • We created a computation which applies at most one Pipeline
  • The result is a nested data structure

These characteristics indicate we can take an inductive approach to building our Pipeline library. Enter Shapeless.

 

Abstraction in F[_]: Abstract Your Functions

We have a reasonably abstract pipeline in

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI): F[Unit] =
    write(computation(read(uri)))
  def read(uri: URI): F[A]
  def computation(in: F[A]): F[B]
  def write(in: F[B]): F[Unit]
}

Recognizing Higher-Kinded Duplication

Taking a close look at the trait, we see the computation and write functions are the same aside from their type variables. In fact, if we rename them to have the same name, the compiler complains

scala> :paste
// Entering paste mode (ctrl-D to finish)
trait Pipeline[F[_], A, B]{
  def perform(in: F[A]): F[B]
  def perform(in: F[B]): F[Unit]
}
// Exiting paste mode, now interpreting.
<console>:9: error: double definition:
method perform:(in: F[B])F[Unit] and
method perform:(in: F[A])F[B] at line 8
have same type after erasure: (in: Object)Object
         def perform(in: F[B]): F[Unit]

Since these are the same, we can build an abstraction to simplify our API even further.

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI): F[Unit] = {
    val in = read(uri)
    val computed = convert(in)(computation)
    convert(computed)(write)
  }
  def read(uri: URI): F[A]
  def computation(in: A): B
  def write(in: B): Unit
  def convert[U, V](in: F[U], f: U => V): F[V]
}

We've removed the need for the developer to understand the effect type in order to reason about a computation or write step. Now, let's focus on this new function

def convert[U, V](in: F[U], f: U => V): F[V]

This is super abstract. Like so abstract it is meaningless without context. I am reminded of this video in which Rob Norris explains how he continued to abstract his database code until some mathematical principles sort of arose from the work. In this talk, he points out that anytime he writes something sufficiently abstract he checks a library for it, as he probably has not himself discovered some new basic principle of mathematics. We do the same here.

Inside the cats library we find the following def within the Functor class

def map[A, B](fa: F[A])(f: A => B): F[B]

This is the same as if we wrote our convert function curried rather than with multiple arguments. We replace our custom function with one from a library; the chance is greater that a developer is well-informed on cats than on our internal library. (See the post on implicits and type classes.)

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI)(implicit F: Functor[F]): F[Unit] = {
    val in = read(uri)
    val computed = F.map(in)(computation)
    F.map(computed)(write)
  }
  def read(uri: URI): F[A]
  def computation(in: A): B
  def write(in: B): Unit
}

Here we were able to replace an internal (thus constrained) implementation detail with an external (thus liberated) argument. In other words, we have lifted an implementation detail outside our class giving the client code freedom to use the same instantiated Pipeline in a multitude of contexts.
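
For example (a hypothetical instance; this assumes cats' Functor[List] is brought into implicit scope with the usual cats imports), the same Pipeline value now runs wherever a Functor for its effect exists:

import java.net.URI
import cats.instances.list._

val lengths: Pipeline[List, String, Int] = new Pipeline[List, String, Int]{
  override def read(uri: URI): List[String] = List(uri.toString)
  override def computation(in: String): Int = in.length
  override def write(in: Int): Unit = println(in)
}
lengths(new URI("file:///tmp/data")) //Functor[List] is supplied implicitly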

Abstraction in F[_]: Abstract your Types

A Pipeline of Stream from BigInt to String

Say we have a data pipeline:

trait Pipeline{
  final def apply(uri: URI): Stream[Unit] =
    write(computation(read(uri)))
  def read(uri: URI): Stream[BigInt]
  def computation(in: Stream[BigInt]): Stream[String]
  def write(in: Stream[String]): Stream[Unit]
}

The limitations here are many. The most important limitation is this only works for data pipelines your team can model as streams of an input to a BigInt to a String to an output. This is not very useful. The first step is abstracting over your computation types.

A Pipeline of Stream

Removing the constraint on BigInt and String requires type parameters on our Pipeline trait:

trait Pipeline[A, B]{
  final def apply(uri: URI): Stream[Unit] =
    write(computation(read(uri)))
  def read(uri: URI): Stream[A]
  def computation(in: Stream[A]): Stream[B]
  def write(in: Stream[B]): Stream[Unit]
}

We have gained a bit of freedom in implementation. We can now write Pipelines that can be modeled as streams of an input to an A to a B to an output, given any A and B. Now instead of being constrained to BigInt and String, we have gained some liberty through our abstraction.

Still, we are constrained to the scala Stream type. This, too, is a nuisance: what if we require Pipelines that effect through fs2 Stream or spark Dataset or any other suitable effect? Similar to how we abstracted away from BigInt and String by making them type parameters A and B, we can do the same with our Stream.

A Pipeline

Using a higher-kinded type parameter, we can abstract over any effect assuming the effect has a single type parameter.

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI): F[Unit] =
    write(computation(read(uri)))
  def read(uri: URI): F[A]
  def computation(in: F[A]): F[B]
  def write(in: F[B]): F[Unit]
}

Now, we can make a data Pipeline using any such types! We can have our original Pipeline of Stream from BigInt to String

val pipeline: Pipeline[Stream, BigInt, String] = ???

We can have a Pipeline of fs2 Stream with some type construction:

type MyStream[A] = fs2.Stream[fs2.Task, A]
val pipeline: Pipeline[MyStream, BigInt, String] = ???

We can even do this with spark

val pipeline: Pipeline[Dataset, BigInt, String] = ???

Any Pipeline your team can model as a read effect, a computation and a write effect can be defined with Pipeline defined this way.

Abstraction in F[_]

I gave a talk at typelevel today about abstracting data pipelines and some ways to ease the use of Spark in purely functional applications. It seemed to have gone pretty well; here you will find the video, deck (pptx, pdf) and code.

In the coming weeks I'll be posting the long-form version of the talk.

  1. Abstract Your Types
  2. Abstract Your Functions
  3. Lift Implementation Details Outside the Class
  4. Implicits and Induction

Asynchronicity

(video, slides, code)

I gave a talk at PHASE (PHiladelphia Area Scala Enthusiasts) about Asynchronous programming. Specifically, we covered how to transform a basic linear and sequential computation (factorial) into one which could be processed in parallel. Moreover, we added an extra layer of complexity onto the computation (n choose k) to show how Actors and Futures in Scala can work together to help a developer build an asynchronous system.

What the Hell is an "Effect Type"?

I was reading through the fs2 documentation and user guide and thought to myself, this is really straightforward! And, to their credit, it is. Anyone who is used to FP and Scalaz or Haskell will take to the documentation with little friction. However, when an FP novice or someone from a language with less powerful FP implementations (F#, C#, Java) encounters this documentation, it's a pain to trudge through.

Through many attempts at explaining fs2, I have found the main topic of concern is the "effect type". On its surface, this term seems rather benign:

  1. Everyone knows FP strives to have no side effects
  2. We all know certain things (IO for example) are fundamentally effectful
  3. In order to encode these effects into FP style, we build abstractions
  4. These abstractions for effects are our effect types

So, taking the canonical example from the fs2 documentation

import fs2.{io, text, Task}
import java.nio.file.Paths

def fahrenheitToCelsius(f: Double): Double =
  (f - 32.0) * (5.0/9.0)

val converter: Task[Unit] =
  io.file.readAll[Task](Paths.get("testdata/fahrenheit.txt"), 4096)
    .through(text.utf8Decode)
    .through(text.lines)
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .intersperse("\n")
    .through(text.utf8Encode)
    .through(io.file.writeAll(Paths.get("testdata/celsius.txt")))
    .run

// at the end of the universe...
val u: Unit = converter.unsafeRun()

We see a file read, a parse, a transformation then a file write (reading and writing files are side effect heavy). The effect type is Task and the result is Unit. This can be read as

The converter exists to create a Task which reads a file as bytes, converts those bytes to utf-8 Strings, transforms those Strings and writes them back to disk in a separate file, returning no result. The converter is purely effectful.

One can do this with a standard scala.collection.immutable.Stream as well.

val path = Paths.get("testdata/fahrenheit.txt")
val out = Paths.get("testdata/celsiusStream.txt")
val readerT =
  Try(Files.newBufferedReader(path, StandardCharsets.UTF_8))
val writerT =
  Try(Files.newBufferedWriter(out, StandardCharsets.UTF_8))
val result = for{
  reader <- readerT
  writer <- writerT
}yield{
  Stream.continually{reader.readLine}
    .takeWhile(null != _)
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .flatMap{Stream(_, "\n")}
    .foreach(writer.write)
}
readerT.foreach(_.close())
writerT.foreach(_.close())

So, other than the obvious fs2 io convenience functions, to most of the FP unindoctrinated it seems the standard Stream version is about as useful and as safe as the fs2 Stream version. However, the fs2 Stream is much better FP practice.

Function Parameters should be Declared

Upon Stream creation, there is a big difference between fs2 and standard. With fs2, the creation mechanism is a pure function

io.file.readAll[Task](Paths.get("testdata/fahrenheit.txt"), 4096)

It takes all of its necessary data as parameters and returns a Stream with effect type Task. On the other hand, the standard Stream requires a closure to be initialized

Stream.continually{reader.readLine}

It requires a by-name parameter, the by-name returns a different result upon each invocation and the body of the by-name depends on the closure within which the function is called. This line of code is impossible to understand by itself; taken out of context, it is meaningless. In other words the function lacks referential transparency.

Function Duties should be Declared

Lines from a file in fs2 are produced with

scala> io.file.readAll[Task](Paths.get("testdata/fahrenheit.txt"), 4096).
     |       through(text.utf8Decode).
     |       through(text.lines)
res3: fs2.Stream[fs2.Task,String] = evalScope(Scope(Free)).flatMap(<function1>)

and standard Stream we have

scala> val path = Paths.get("testdata/fahrenheit.txt")
path: java.nio.file.Path = testdata\fahrenheit.txt

scala> val reader = Files.newBufferedReader(path,
 | java.nio.charset.StandardCharsets.UTF_8)
reader: java.io.BufferedReader = java.io.BufferedReader@3aeed31e

scala> Stream.continually{reader.readLine}
res4: scala.collection.immutable.Stream[String] = Stream(120, ?)

There are two important differences here. The first is evaluation; even though standard streams are considered lazy, they evaluate the head value eagerly. We can see fs2 gives us a computation where standard Streams give us a value. The second difference is the type.

In fs2 we have a type of Stream[Task, String]; standard gives us Stream[String]. The fs2 Stream explicitly expresses the intention for effectful computation, whereas the standard Stream hides this implementation detail. Another way of looking at this is, standard Streams hide their effects where fs2 Streams surface their effects. In fact, fs2 Streams (if effectful) only allow the developer to find their results through the effect type. This gives the developer a sort of heads-up about what's going on in the application behind the scenes.

What the Effect Type Represents

The effect type in any such functional library represents the intent to perform an operation outside the scope of the return type. This is very common for IO (as we've seen) and other effectful operations like Logging or showing the user a pop up. The computation usually produces some sort of result but, before returning the result, writes it to disk, logs it or tells the user about a completion state. In FP, we like to express all data in a function through the function definition and effects are just weird data.

Type Variance - Why it Matters

The company I work for has a robust Intern program; as a result, I work with a lot of young engineers and computer scientists. To date:

  1. 100% of their resumes mention they have a tight grasp on Object Oriented Programming
  2. 100% of them fail to understand the finer points of subtyping and furthermore subclassing

I have given more explanations of variance than I have given explanations of anything else on the job. So, in an effort to practice the DRY principle in all my affairs, I decided to put it into documentation I can point to.

Note: Type Variance has A LOT of math (type theory & category theory) behind it. This post will focus on its usage in the Scala language not on the math.

Sub Classes

class Foo[T]
def check[A, B](a:A, b:B)(implicit ev: A <:< B): Unit = {}

Here the class Foo is parameterized by T and the check function simply checks if the type of its first argument is a subclass of the type of its second argument. Don't worry about the check function; its implementation details are beyond the scope of this post, but it's a nifty trick!

val str = ""
val obj = new Object()
check(str, obj)//compiles

So, the check here compiles which tells us String is a subclass of Object.

val fStr = new Foo[String]
val fObj = new Foo[Object]
check(fStr, fObj)//error: Cannot prove that zzz.simple.Foo[String] <:< zzz.simple.Foo[Object].

This doesn't compile because the type parameter of Foo allows for no variation in its relationship; Foo is invariant in T.

We will begin by briefly describing variance.

Variance

Variance is a huge part of programming with types. It is an intrinsic property of class hierarchies and can be witnessed as such in languages like C++ and Scala which have compiler errors along the lines of:

  • covariant whatever in contravariant position
  • whatever is in covariant position but is not a subclass of whatever

In short, type variance describes the types that may be substituted in place of another type.

Covariance: We say a substitution is covariant with a type, Foo, if Foo or any other class with a subclass relationship to Foo is valid.
Contravariance: We say a substitution is contravariant with a type, Bar, if Bar or any other class with a superclass relationship to Bar is valid.
Invariance: We say a substitution is invariant with a type, Foo, if only types that are exactly Foo are valid.

Variance in Practice

We already saw invariance above. In Scala covariance and contravariance are denoted by using the + and - symbols respectively.

Covariance

case class Foo[+T]//covariant in T

Redeclaring Foo in this way makes it covariant so our test now validates

val str = ""
val obj = new Object()
check(str, obj)//compiles
val fStr = new Foo[String]
val fObj = new Foo[Object]
check(fStr, fObj)//compiles

Declaring the type variable with a + (as covariant) tells the compiler that the subclass relationship between type parameters gives rise to a direct subclass relationship in Foo. So any def, val or var requiring a Foo[Object] can take a Foo[String] as an argument in its place.
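
Concretely, this means a method written against Foo[Object] now accepts a Foo[String] (a small sketch):

def describe(foo: Foo[Object]): String = foo.toString
describe(new Foo[String])//compiles because Foo is covariant in T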

Contravariance

case class Foo[-T]//contravariant in T

This redeclaration makes Foo contravariant and breaks our test again

val str = ""
val obj = new Object()
check(str, obj)//compiles
val fStr = new Foo[String]
val fObj = new Foo[Object]
check(fStr, fObj)//error: Cannot prove that zzz.simple.Foo[String] <:< zzz.simple.Foo[Object].

This is what we expect! Contravariance implies a superclass relationship not a subclass relationship. We can fix this by reversing our input arguments

check(fObj, fStr)//compiles

This declaration is a hint to the compiler that the subclass relationship between type parameters gives rise to a superclass relationship in Foo. So any def, val or var requiring a Foo[String] can take a Foo[Object].

How to use Variance

Where covariance preserves the subclass relationship from the type parameter into the type, contravariance reverses this relationship.

Covariance is used a lot in Scala by the collections library. Most of the immutable collections are covariant. This makes working with your data types inside the collection the same as working with them outside the collection when writing interfaces.

Contravariance is less prominent. I use contravariance for typeclasses a lot.

trait Bar[-T]{
  def bar(t:T): Unit
}
implicit val bar = new Bar[Object]{def bar(o:Object): Unit = ()}
def procBar[T:Bar](t: T){
  implicitly[Bar[T]].bar(t)
}
procBar(obj)//compiles
procBar(str)//pulled the superclass instance in

If a class does not have a typeclass instance in implicit scope for its type it can use a contravariant instance if one is in scope.

 

 

Crashing F#

There's been a lot of buzz about using F# on the CLR for functional programming. This week, I decided to take the language for a spin.

The Basics

F# has a wonderfully light and succinct syntax. It uses the same keyword, let, for both values and functions.

let str = "Hello"
let func a = a + str 
let another = func " World!"
printfn "%s" another

Also, discriminated unions (sum types) are super simple

type SumType =
  |First
  |Second
  |Third of string * int
  |Fourth of a:string * b:int * c:double

Anytime I try out a new language I try to explore five things:

 
  1. Tail recursive functions
  2. Functor
  3. Monad
  4. Free Monad
  5. Pattern Matching

F#, as far as I can tell after using it for a total of about 10 hours, can be used for 1-3 and 5 but not 4. Here is a brief synopsis of my trials.

Tail Recursion

This was extremely simple in F#. It's two steps:

  1. Mark a function as recursive using the rec keyword
  2. Put the recursive step as the last operation

module Recursion
    //need a keyword for recursion, otherwise boilerplate free
    //very nice
    let rec check many acc =
        if(0 = many) then acc
        else
            let a = many - 1
            let b = acc + many
            check a b

The Recursion.check function uses the accumulator pattern to add int values from 1 to many to the initial accumulator value acc.
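
For example, summing 1 through 10 with a starting accumulator of 0:

printfn "%d" (Recursion.check 10 0)//displays 55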

Functor & Monad

This took some work! F# lacks higher kinded polymorphism. My first attempt at this was:

type Functor<'A> =
    abstract map: 'A<'B> -> ('B -> 'C) -> 'A<'C>

Which produced a compile time error: "Type parameter cannot be used as type constructor." Ouch!!!

Luckily, F# type inference is good enough that we don't really need type classes; a little boilerplate will do. Here is a Functor & Monad example:

//cannot do higher kinds so no type classes
//do not need higher kinds; type syntax takes care of it
module CategoryList

type Functor<'Type>() =
    member this.map(l: list<'Type>)(f: 'Type -> 'Return) =
        List.map f l

type Monad<'Type>(func: Functor<'Type>) =
    member this.point(t: 'Type) = [t]
    member this.flatMap(l: list<'Type>)(f: 'Type -> list<'Return>) =
        List.concat(func.map l f)

Defining a Functor and Monad instance is simple. We don't need a type hierarchy here; we only need 3 functions.

module Category

let check() =
    //instantiate monad
    let func = CategoryList.Functor<int>()
    let monad = CategoryList.Monad(func)

    //define monadic chain functions
    let flatMap l f = monad.flatMap f l
    let map l f = func.map f l

    //define helpers
    let mapper (x:int) = [1 .. x]
    let mapperStr (x:int) = x.ToString()

    let hundred = [1 .. 100]//data

    //(|>) language level monad support
    let hundredMore = (hundred
                       |> flatMap mapper)
    printfn "%A" hundredMore

    let bigString = (hundredMore
                     |> flatMap mapper
                     |> map mapperStr)
    printfn "%A" bigString

The key here is the |> operator (also known as pipe). By piping a list into a flatMap into another flatMap and so on, we can get language level monad support.

Free Monad

Here I have failed. The closest I've gotten is:

module Free

type FreeMonad<'F, 'A> =
    |Pure of 'A
    |Suspend of 'F

type FreeMonad<'F, 'A> with
    member this.fold fpure fsuspend =
        match this with
        |Pure(a) -> fpure(a)
        |Suspend(free) -> fsuspend(free)

let inline liftF(t:'F)(func: 'T): FreeMonad<'F, 'A>
    when ^T: (member map: (^F -> (^A -> ^AA) -> ^FF)) = Suspend(t)
let inline point(a: 'A)(func: 'T): FreeMonad<'F, 'A>
    when ^T: (member map: (^F -> (^A -> ^AA) -> ^FF)) = Pure(a)

let check(): Unit =
    let func = CategoryList.Functor<int>()
    let lst: list<int> = [1..10]
    let monad:FreeMonad<list<int>, int> = liftF(lst)(func)
    ()

I could never get the liftF function to work out. The combination of type parameters never type checks against the parameterized types AA and FF. The compiler just cannot understand the second layer of type inference that needs to happen for the map method to be recognized.

Pattern Matching

Pattern matching is pretty straightforward. You can do direct

let func a =
  match a with
  |1 -> "one"
  |2 -> "two"
  |3 | 5 | 10 | 30 -> "large"
  |_ -> "else"

with guards

let func a =
    match a with
    |value when 0 > value -> "negative"
    |value when 0 < value -> "positive"
    |_ -> "zero"

type matching

let func a =
    match a with
    |(str: string) -> "string"
    |_ -> "not"

discriminated unions

type Union =
    |Simple
    |Pair of int * int
    |Triple of int * int * int
let func a =
    match a with
    |Simple -> "simple"
    |Pair(a, b) -> "(" + a.ToString() + "," + b.ToString() + ")"
    |Triple(a, b, c) -> "(" + a.ToString() + "," + b.ToString() + c.ToString() + ")"

That's most of pattern matching; by combining these patterns you can get some pretty serious control flow moving through your application.

 
 
 

5. Typeclasses over Subclasses (Introducing Typelevel Scala into an OO Environment)

(Examples can be found on GitHub)

In the previous post, we introduced the Argonaut library to convert between values and JSON strings. The important part of this conversion is there was no superclass or interface to implement in order to get the benefit of JSON across classes. All we needed to do was define values of type CodecJson for each of the types we wanted to convert. We added the functionality to the class without changing the class itself. 

Argonaut allowed us to call toJson on classes with a codec and decodeOption on Strings to produce values of classes with a codec defined. This type of polymorphism, where a function's implementation depends on its inputs, is called ad-hoc polymorphism. Furthermore, when we define a type, T, which defines functionality across classes to be used in ad-hoc polymorphic functions, we call T a Type Class. Type Class polymorphism is a specific flavor of ad-hoc polymorphism.

Type Class polymorphism is a powerful tool for expressing context-based functionality, far more powerful than subclass polymorphism. As a well-known example take the Java interfaces Comparable and Comparator. If some data is defined in a class which implements Comparable, it can be sorted one way and needs an entire second class definition to be sorted with a different method. On the other hand, using Comparator the data is defined with a single class and each sort method gets its own Comparator. Comparator is a Type Class and allows the developer to determine in which contexts which sorting method should be used.
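
Scala's Ordering plays the same role as Comparator. As a small sketch (the Person type and orderings here are hypothetical), the data is defined once and the calling context picks the sort:

case class Person(name: String, age: Int)
val byAge: Ordering[Person] = Ordering.by((p: Person) => p.age)
val byName: Ordering[Person] = Ordering.by((p: Person) => p.name)
val people = List(Person("Ada", 50), Person("Grace", 36))
people.sorted(byAge) //Grace first
people.sorted(byName)//Ada first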

Subclass Method

Take the following traits:

trait Adder[Type]{
  def add(other: Type): Type
}
trait Chainer[Arg, Type[Arg]]{
  def chain[Res](f: Arg => Type[Res]): Type[Res]
}

Adder describes how to add two values of some Type together. Chainer describes how to chain operations over a parameterized type.

We'll use the idea of a team to illustrate. For the sake of simplicity we say a team consists of people of a certain profession. So we can have a team of engineers or a team of doctors or a team of cashiers or ...

Teams can (trivially) grow by hiring but, they can also grow by combining with other teams. Teams can be added.

Teams can have members who are themselves team leads. At times, the members of a lead's team must join the team the lead belongs to. This implies an operation which develops teams out of the members of teams. Teams can be chained.

Here is our implementation of Team given this functionality:

case class Team[Type](members: List[Type])
  extends Adder[Team[Type]]
  with Chainer[Type, Team]{
  override def add(other: Team[Type]): Team[Type] = {
    Team(members ++ other.members)
  }
  override def chain[Res](
      f: Type => Team[Res]): Team[Res] = {
    val list = members.flatMap(member => f(member).members)
    Team(list)
  }
}

Simple enough, but this doesn't account for an organization of structured teams. For an organization that develops teams that each have one product lead and one technical lead, simple concatenation won't maintain a soft ranking of individuals within the new team. We need a new Team definition which accounts for this.

case class TeamStructured[Type](members: List[Type])
  extends Adder[TeamStructured[Type]]
  with Chainer[Type, TeamStructured]{
  override def add(
      other: TeamStructured[Type]): TeamStructured[Type] = {
    val (lead1, indi1) = members.splitAt(2)
    val (lead2, indi2) = other.members.splitAt(2)
    TeamStructured(lead1 ++ lead2 ++ indi1 ++ indi2)
  }
  override def chain[Res](
      f: Type => TeamStructured[Res]): TeamStructured[Res] = {
    val (leaders, individuals) = members.map{member =>
      val mems = f(member).members
      mems.splitAt(2)
    }.unzip
    TeamStructured(
        leaders.flatMap {x=>x} ++
        individuals.flatMap{x=>x})
  }
}

Now we have two definitions for the same data that differ only by functionality. We have a triple coupling here:

  1. Data Definition
  2. Addition Description
  3. Chaining Description

If the data needs to change (from List to Set is a good place to start), the change needs to be made in two places. Each function which accepts a Team for the purpose of team composition and combination needs to know which style of team it needs at development time. These problems get worse with each new possibility for combining and chaining teams (maybe a round robin or reverse algorithm would fit in certain situations). Type Classes solve these issues.

Type Class Method

Our traits become:

//The underscore here implies we need a parameterized type.
trait Adder[Type[_]]{
  def add[Item](
      left: Type[Item], right: Type[Item]): Type[Item]
}
trait Chainer[Type[_]]{
  def chain[Item, Res](
      arg: Type[Item], f: Item => Type[Res]): Type[Res]
}

These have the same uses as their counterparts above. However, we have a single definition of the Team type:

case class Team[Type](members: List[Type])

The data is defined in a single place. Each piece of software which requires a Team has a consistent idea about what a Team is and means. The two versions of functionality are defined by:

object unstructured{
  implicit def adder: Adder[Team] = new Adder[Team]{
    override def add[Item](
        left: Team[Item], right: Team[Item]): Team[Item] = {
      Team(left.members ++ right.members)
    }
  }

  implicit def chainer: Chainer[Team] = new Chainer[Team]{
    override def chain[Item, Res](
        arg: Team[Item], f: Item => Team[Res]): Team[Res] = {
      val list = arg.members.flatMap(
          member => f(member).members)
      Team(list)
    }
  }
}

object structured{
  implicit def adder: Adder[Team] = new Adder[Team]{
    override def add[Item](
        left: Team[Item], right: Team[Item]): Team[Item] = {
      val (lead1, indi1) = left.members.splitAt(2)
      val (lead2, indi2) = right.members.splitAt(2)
      Team(lead1 ++ lead2 ++ indi1 ++ indi2)
    }
  }

  implicit def chainer: Chainer[Team] = new Chainer[Team]{
    override def chain[Item, Res](
        arg: Team[Item], f: Item => Team[Res]): Team[Res] = {
      val (leaders, individuals) = arg.members.map{member =>
        val mems = f(member).members
        mems.splitAt(2)
      }.unzip
      Team(
          leaders.flatMap {x=>x} ++
          individuals.flatMap{x=>x})
    }
  }
}

Now, each function which accepts a team, if needed, will also accept an adder or chainer or both (wholly decoupled). The downside here is each call to such a function requires at least one extra argument compared to the subclass versions. Scala has a fix for this limitation.

Implicits

The implicit keyword before a definition is an important part of making Type Class polymorphism beneficial to the developer. The word implicit, according to Oxford Dictionaries, means "implied though not plainly expressed". In Scala it means we can prepend the implicit keyword to an argument list and not explicitly produce the value in code, assuming a valid value is in scope. For example:

def chainTeams[Type, Result](
  team: Team[Type])(
  func: Type => Team[Result])(
  implicit chain: Chainer[Team]): Team[Result] = {
  chain.chain(team, func)
}

This has three arguments, the team to operate on, the operation to perform, and the chainer for application. However, since the final argument is implicit, if we bring a valid implicit value into scope, there is no need to pass it in directly.

import structured._
val team: Team[Person] = Team(List(???))
val func: Person => Team[Person] = {(p: Person) => ???}
val newTeam: Team[Person] = chainTeams(team)(func)//valid

Since we don't need to explicitly state the Chainer, boilerplate stays clean. A nice effect of implicit resolution is that if you have scoped two separate valid values for the implicit argument, the compiler will complain. The suggestion if you have multiple valid implicits in scope is to decouple your code functionally. No single scope should have use of more than one implicit of the same type; this is a code smell. A corollary to this is one should not explicitly provide implicit arguments; let the compiler do its work.

In the final post of this series, we will introduce another library, Cats.

Monocle & Argonaut (Introducing Typelevel Scala into an OO Environment)

(Examples can be found on GitHub)

In the last post, we put to rest our use of mutable objects for good. Here we learn how to make use of our newfound Functional Programming powers in the real world.

Every application of sufficient user base requires a persistent settings store. The more users one has, the more styles one is responsible for accommodating. In my experience, JSON has been the most useful format for small-scale persistence. JSON is widely understood, works on the web and is plaintext. We'll use Monocle and Argonaut to implement a persistent settings store.

The first question here is "Why not circe?". I found circe to be a bit too ethereal for most Java developers to wrap their heads around. Implicit scope (especially when it's as magical as circe's auto) is an alarming feature for people who come from a language with no developer-defined implicit semantics (C++ developers are quite comfortable with this notion).

Monocle

The Monocle library provides semantics for defining simple accessors and combinators on nested data. Monocle is especially well suited for handling nested Case Classes which is what we'll focus on. Take the following data definition

case class Color(r: Byte, g: Byte, b: Byte)
case class FishTank(liters: Int, color: Color, fish: List[Fish])

We have nested Case Classes as well as a nested collection. To cover the changes that can occur here we would need to define eighteen operations, six of which are compositions from a FishTank into a Color, and one of which is nested within a List structure.

Monocle makes this simple:

val (tankLiters, tankColor, tankFish) = {
  val gen = GenLens[FishTank]
  (gen(_.liters), gen(_.color), gen(_.fish))
}
val (colorR, colorG, colorB) = {
  val gen = GenLens[Color]
  (gen(_.r), gen(_.g), gen(_.b))
}
val (tankColorR, tankColorG, tankColorB) = (
  tankColor.composeLens(colorR),
  tankColor.composeLens(colorG),
  tankColor.composeLens(colorB)
)

Defining your data using Case Classes provides Monocle with the information it needs in order to generate lenses (nested views) into your data structures. Lens composition in Monocle is a single straightforward call. We can get into and out of our data with very little boilerplate.
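
For example (a quick sketch; modify is the same call the settings code further down uses), reading and updating nested fields goes through the generated lenses, each update returning a new FishTank:

val tank = FishTank(20, Color(0, 0, 0), Nil)
tankLiters.get(tank) //20
val grey = tankColorR.set(127.toByte)(tank) //new FishTank with r = 127
tankLiters.modify(_ + 5)(grey) //new FishTank with 25 liters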

Now that we can define settings and alter them, we need a way to persist them and communicate them to other parts of the system. We'll use JSON as our data format and Argonaut as our transcoder.

Argonaut

Argonaut provides semantics for converting between classes and JSON strings. Like Monocle, it's easiest to use with Case Classes. Taking the same classes as above, we would have:

//ignore the implicit keyword.
//I promise we'll get to it in the next post!
implicit def codecTank: CodecJson[FishTank] =
  casecodec3(
      FishTank.apply, FishTank.unapply
  )("liters", "color", "fish")
implicit def codecColor: CodecJson[Color] =
  casecodec3(
      Color.apply, Color.unapply
  )("r", "g", "b")
implicit def codecFish: CodecJson[Fish] =
  CodecJson(
    (f: Fish) =>
      ("name" := f.name) ->:
      ("color" := f.color) ->:
      jEmptyObject,
    (c: HCursor) => for{
      name <- (c --\ "name").as[String]
      color <- (c --\ "color").as[String]
    }yield{(name, color) match{
      case ("One Fish", "Red Fish") => OneFish
      case ("Two Fish", "Blue Fish") => TwoFish
      case _ => NotFish
    }}
  )

There are quite a few operators here, which could make things tricky for Java developers at first; but there are only a handful of them, so it's no big deal. For Case Classes, Argonaut has next to no boilerplate: one passes in the apply and unapply functions and names the fields. Composition in Argonaut is also implicit. It gets a little trickier with non Case Classes, but it's still not much work. At most, Argonaut requires two functions: one from the class to JSON, the other from JSON to the class.

One more thing to note is that codecs for simple standard collections are implicit. The codec for List[Fish] is implicitly defined by the codec for Fish.
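
As a quick sanity check, here is a round-trip sketch (hypothetical values) using the codecs above; the List[Fish] codec comes along for free.

import argonaut._, Argonaut._

val tank = FishTank(75, Color(0, 0, 0), List(OneFish, TwoFish))
val json: String = tank.asJson.nospaces                       //encode to a compact JSON string
val roundTrip: Option[FishTank] = json.decodeOption[FishTank] //decode back; None on bad input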

Putting it all together

A settings object would look something like:

import scala.collection.mutable
import argonaut._, Argonaut._

//putOnWire, writeToDisk and getFromDisk are assumed stubs for the transport
//and storage layers; their implementations are not important here.
object settings{
  private val settings: mutable.Map[String, FishTank] =
    mutable.Map()
  
  def apply(key: String): Option[FishTank] = settings.get(key)
  def update(key: String, byte: Byte): Unit = {
    settings(key) = settings.get(key) match{
      case Some(tank) =>
        tankColor.modify { _ => Color(byte, byte, byte) }(tank)
      case None =>
        FishTank(0, Color(byte, byte, byte), Nil)
    }
  }
  def update(key: String, size: Int): Unit = {
    settings(key) = settings.get(key) match{
      case Some(tank) =>
        tankLiters.modify(_ => size)(tank)
      case _ =>
        FishTank(size, Color(0,0,0), Nil)
    }
  }
  def update(key: String, fish:List[Fish]): Unit = {
    settings(key) = settings.get(key) match{
      case Some(tank) =>
        tankFish.modify(_ => fish)(tank)
      case _ =>
        FishTank(1, Color(0,0,0), fish)
    }
  }
  
  def persist(): Unit = {
    val jsonRaw = settings.toList.asJson
    val json = jsonRaw.nospaces
    putOnWire(json)
    writeToDisk(json)
  }
  def recall(): Unit = {
    val str = getFromDisk()
    val opt = str.decodeOption[List[(String, FishTank)]]
    opt.foreach{list =>
      settings ++= list.toMap
    }
  }
}

But of course, this is not threadsafe, and it uses a mutable collection to perform its work. A thread-safe implementation could look something like:

import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.pattern.ask
import scala.concurrent.{ExecutionContext, Future}

val actorSystem: ActorSystem = ???
implicit val timeout: akka.util.Timeout = ???
implicit val ec: ExecutionContext = ???
object asyncSettings{
  private sealed trait Message
  private case class Get(key: String)
      extends Message
  private case class SetGrey(key: String, hue: Byte)
      extends Message
      
  private class Perform extends Actor{
    override val receive: Receive = step(Map())
    def step(map: Map[String, FishTank]): Receive = {
        case Get(key) => sender() ! map.get(key)
        case SetGrey(key, value) =>
          val newTank: FishTank = ???
          val newMap = map + (key -> newTank)
          context.become(step(newMap))
    }
    override def preStart(): Unit = ???//recall
    override def postStop(): Unit = ???//persist
  }
  
  val actor: ActorRef = actorSystem.actorOf{
    Props(new Perform())
  }
  def apply(key: String): Future[FishTank] =
    (actor ? Get(key)).collect{
      case Some(t: FishTank) => t
    }
  def update(key: String, hue: Byte) = 
    actor ! SetGrey(key, hue)
}

Here we use Akka for asynchronous operations. One could also employ scalaz Task, simple Future composition, or really any other asynchronous library.
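
Usage of the actor-backed store looks something like the following sketch (the key and hue are hypothetical): reads come back as Futures, while writes are fire-and-forget.

asyncSettings.update("office-tank", 42.toByte)             //asynchronous write
val tank: Future[FishTank] = asyncSettings("office-tank")  //asynchronous read
tank.foreach(t => println(t.color))                        //runs on the implicit ExecutionContext above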

Next we'll cover Type Classes to further decouple our data from functionality.

4. Objects are not Coroutines (Introducing Typelevel Scala into an OO Environment)

In the previous posts, we went over how to introduce immutability, combinators and case classes to move toward functional programming. Together, these three ideas form the basis for the point described in this post: Objects are not Coroutines.

If you are unfamiliar with coroutines, Wikipedia has a basic description of them.

In Java, the usual application runs a little like this:

  1. Initialize an object
  2. Perform an operation
  3. Mutate the object
  4. Perform an Operation
  5. ...

This habit breaks all of the FP ideas we have developed so far.

When introducing Typelevel Scala, it is important to note we are not simply adding a library to an already existing system (the JVM). We are trying to change how people do their day-to-day work. Some of them have been writing OO imperative software for decades, making a change of paradigm difficult. Keeping to simple language is key to our goal of shifting an organization's workflow.

The workflow shift we are suggesting here is:

  1. Define
  2. Apply Combinator
  3. ...

We will be moving from a paradigm based in mutability and state to one built on immutability and functions.

OO Imperative Style

We will be defining a school of fish and how that school of fish grows.

class BadSchool(){
  private var name: String = null
  private var depth: Depth = null
  private var location: Location = null
  private var fish: mutable.Buffer[Fish] = null
  def setName(newName: String): Unit = {
    name = newName
  }
  def getName(): String = name
  def setDepth(newDepth: Depth): Unit = {
    depth = newDepth
  }
  def getDepth(): Depth = depth
  def setLocation(newLocation: Location): Unit = {
    location = newLocation
  }
  def getLocation(): Location = location
  def setFish(newFish: mutable.Buffer[Fish]): Unit = {
    fish = newFish
  }
  def removeFish(aFish: Fish): Unit = {
    fish -= aFish
  }
  def addFish(aFish: Fish): Unit = {
    fish += aFish
  }
  def getFish(): mutable.Buffer[Fish] = fish
  override def toString(): String= {
    s"School(\n\t$name,\n\t$depth,\n\t$location,\n\t$fish)"
  }
}

In my experience, this is the type of code commonly written by people who have just made the jump from an OO language into Scala. Here we initialize the object's members to null and use mutable containers to maintain and augment state. A typical use case would look like:

def asCoroutine(): Unit = {
  val coroutine = new BadSchool()
  val (name, depth, location, fish) = someInit()
  coroutine.setName(name)
  coroutine.setDepth(depth)
  coroutine.setLocation(location)
  coroutine.setFish(fish)
  convertToJsonAndPutOnTheWire(coroutine)
  var newFish: Fish = null
  for(i <- (0 to 10)){
    newFish = nextFish(coroutine)
    coroutine.addFish(newFish)
    convertToJsonAndPutOnTheWire(coroutine)
  }
}

This application is difficult to follow and very cluttered. In order to create a new valid school of fish, one must initialize the object and call four set methods. When growing a school of fish, the developer must destroy the previous school of fish, forcing any interaction with the object to be synchronous.

This is like a very messy coroutine. The addFish method acts like a yield, and there is no way to get back to a previously yielded state.

Typelevel Style

The above impure code can be fairly simply converted to a more FP style. First we define our school of fish.

case class School(
    name: String,
    depth: Depth,
    location: Location,
    fish: immutable.Queue[Fish])

This is short and to the point. The intent of the code is clear and there are no messy methods defined for maintaining, getting and mutating state. The same use case would be implemented functionally like:

def aBetterWay(): Unit = {
  @annotation.tailrec
  def perform(qty: Int, acc: List[School]): List[School] = {
    if(qty > 0 && acc.nonEmpty){
      val head :: tail = acc
      val currentFish = head.fish.last
      val next = nextFish(currentFish)
      val result = head.copy(fish = head.fish.enqueue(next))
      perform(qty - 1, result :: acc)
    }else acc
  }
  val school = School(
      "Bikini Bottom",
      Deep,
      South,
      immutable.Queue(OneFish))
  val result = perform(10, List(school))
  result.foreach(convertToJsonAndPutOnTheWire)
}

There are two main ideas:

  1. In lieu of initializing state we define data. 
  2. Once data is defined, it cannot be redefined.

The inner function perform builds a new school of fish from the provided school of fish. State is created, never destroyed, and there is no initialization step. Moreover, the webservice call can be made asynchronous without worry for synchronization or heap issues.
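
For example, because School is immutable, pushing the results onto the wire can be wrapped in a Future with no locks. This sketch assumes an implicit ExecutionContext and reuses the convertToJsonAndPutOnTheWire stub from above; putOnWireAsync is a hypothetical helper, not from the original post.

import scala.concurrent.{ExecutionContext, Future}

def putOnWireAsync(schools: List[School])(
    implicit ec: ExecutionContext): Future[Unit] =
  Future(schools.foreach(convertToJsonAndPutOnTheWire)) //no shared mutable state to protect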

Next, we'll introduce our first libraries: Monocle and Argonaut.

 

3. Case Classes and Auto-Encapsulation (Introducing Typelevel Scala into an OO Environment)

(Examples can be found on GitHub)

We have decoupled data from its usage by replacing more complicated control flow ideas (like loops, null and throw) with combinators, passing functions as parameters. This allows us to think differently about how we write programs.

Instead of thinking about software as objects acting on themselves; we can start to think of it as two parts:

  1. Data
  2. Functions which define Data flow

To illustrate, we'll redefine our color function as data.

sealed trait Fish{
  val name: String
  val color: String
}
case object OneFish extends Fish{
  override final val name: String = "One Fish"
  override final val color: String = "Red Fish"
}
case object TwoFish extends Fish{
  override final val name: String = "Two Fish"
  override final val color: String = "Blue Fish"
}
case object NotFish extends Fish{
  override final val name: String = "Ahab"
  override final val color: String = "White Whale"
}

Here, we model our fish data in three points: two good states and a bad state. Instead of modeling a two-point set of data with Strings, we model it as three discrete states using case classes.

Case Classes are automatically encapsulated and immutable by default and, as the states are encoded into the types, there is no need to use null or throw.
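
As a small sketch of what this buys us, a function over Fish can pattern match on the three states; because the trait is sealed, the compiler checks exhaustiveness, so no null and no throw are needed (describe is a hypothetical helper, not from the original post).

def describe(f: Fish): String = f match {
  case OneFish | TwoFish => s"${f.name} is ${f.color}"
  case NotFish           => s"${f.name} is no fish at all"
}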

At this point, we have built up a nice base for functional programming. We can define robust data structures which model our problem space very closely and functional constructs which allow us to transform that data. This brings us to the salient feature of this process: Objects are not Coroutines.

2. Combinators over loops, null & throw (Introducing Typelevel Scala into an OO Environment)

(examples can be found on GitHub)

In the previous section we found ways to transform mutable thinking into the immutable. Here we will take this one step further and introduce combinators into the organization.

A combinator is a method on a data structure. This method takes a function as an argument and maps values of the data structure into other values of the data structure. This fits well with our immutable approach (a short sketch follows the list below). There are four main ideas here:

  1. Functions produce new state; they do not destroy old state.
  2. Methods on structures are functions.
  3. Data and usage should be separate.
  4. Bad state should be handled as early as possible.
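
Here is that sketch: a minimal illustration (hypothetical values) of combinators standing in for a loop, a null check and a throw, reusing the color function from the previous section.

val fishNames: List[String] = List("One Fish", "Two Fish", "Ahab")

val colors: List[String] =
  fishNames
    .filter(name => name == "One Fish" || name == "Two Fish") //handle bad state as early as possible
    .map(color)                                                //no loop, no mutation

val firstColor: Option[String] = colors.headOption             //no null, no exception on empty input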

OO Imperative Style

When developers first get into Scala, it is pretty typical to see this kind of code:

class BadFish(
    private var m_name: String,
    private var m_color: String
){
  def this() = this(null, null)
  def getName(): String = m_name
  def getColor(): String = m_color
  def setName(name: String){
    m_name = name
  }
  def setColor(color: String){
    m_color = color
  }
  def isValid(): Boolean = try{
    check()
    true
  }catch{
    case _: IllegalArgumentException => false
  }
  def check(): Unit = {
    check(m_name, m_color)
  }
  def check(newName: String, newColor: String){
    if(!color(newName).equals(newColor))
      throw new IllegalArgumentException(
          "Fish color and name do not match"
      )
  }
}

The use case for this data is very tightly coupled to the data itself. In order to create a new value based on this value, one must make at least 5 calls: 2 get calls, a new and 2 set calls:

val current: BadFish = ...
val (name, color) = (current.getName(), current.getColor())
val next = new BadFish()
next.setName(name)
next.setColor(color)

This puts a burden on any user of this class. One must know the entire structure of the class and the precise method calls for access in order to use it.

The set methods destroy old state to provide new state, and the state of the object is not validated until a check method is invoked. When any one reference to this object has a set method called on it, the object could be made invalid for all references without any indication. It is very difficult to follow the logical flow of data through an application full of these kinds of classes.

This class also initializes values to null and provides semantics for its own data being overwritten. A null value is not a value of any type; it's essentially the type system lying to the developer.

Typelevel Style

The same functionality can be provided by the following class:

class Fish(val fishName: String){
  val fishColor: String = color(fishName)
  def spawnFish(f: String => String): Fish = {
    new Fish(f(fishName))
  }
}

First thing's first: it's much shorter than the other snippet. The brevity affords a great deal of clarity about the class and its intended usage.

It has a single input rather than two; the second value is dependent upon the first for a valid state. Bad state is found at construction time, so there is no longer any need to throw an exception in the class based on the class's own data. Everything is handled up front.

The single function doesn't destroy the state of the object, it only creates a new object based on the data in the current object. This allows us to make the data immutable as well as give the user of the class all the flexibility she needs for instantiation and processing of objects.
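
A quick usage sketch (hypothetical values): each call to spawnFish yields a brand new Fish, and the original is never touched.

val one = new Fish("One Fish")              //fishColor is computed up front: "Red Fish"
val two = one.spawnFish(_ => "Two Fish")    //a new Fish("Two Fish") with fishColor "Blue Fish"
//one is unchanged; both values can be shared freely across threads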

Next we'll use Case Classes to encapsulate our data and define it as immutable by default.

1. Immutability as Default (Introducing Typelevel Scala into an OO Environment)

(examples can be found on GitHub)

The first point is using Immutability as Default.

What does "as default" mean? We are not barring mutability from applications wholesale; there are practical reasons for using mutability. For instance,

  • performance
  • global settings.

However, the mutable code should be bounded by its defining scope. This idea can be captured with four rules of thumb:

  1. Function inputs are immutable.
  2. Function outputs are immutable.
  3. var and collection.mutable values are local and temporary.
  4. Function return values are placed into a val.

OO Imperative Style

First, we have a helper function for our examples.

def color(str: String): String = {
  str match{
    case "One Fish" => "Red Fish"
    case "Two Fish" => "Blue Fish"
  }
}

This snippet defines an imperative-style One Fish Two Fish Red Fish Blue Fish. Note that we have two valid data points:

  1. ("One Fish", "Red Fish")
  2. ("Two Fish", "Blue Fish")

Yet, we model it with a function from String to String, which is a space much larger than two points. Also, it throws an exception on bad input; the function has a return type of String, which is a lie, since a thrown exception is a very different result from a String. I have found code like this to be very common in the OO imperative world. In the coming posts we will seek to replace this with something better.
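
To see the lie in the signature, consider this small sketch: the first call behaves, the second blows up at runtime even though the type says a String is always returned.

color("One Fish")   //"Red Fish"
color("Red Fish")   //throws scala.MatchError; no String is ever produced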

Some typical code which breaks our general rules looks a little like:

def bad(): mutable.Buffer[String] = {
  val fish = mutable.Buffer[String]()
  var one = "One Fish"
  var two = "Two Fish"

  fish.append(one)
  fish.append(two)

  one = color(one)
  two = color(two)

  fish.append(one)
  fish.append(two)
  
  fish
}

Here, mutable inputs or outputs can very easily poison threaded data in an application. This makes it difficult to follow the data through the system and reason about control flow.

Also, the vars one and two are used first to hold constant data and then to hold the result of a function call. Reassigning a var with the result of a function call has similar negative cognitive effects to mutable arguments and return values; it needlessly complicates control flow.

Typelevel Style

The typelevel way to write the same functionality would be close to:

def better(): List[String] = {
  val one = "One Fish"
  val two = "Two Fish"

  List(
      one,
      two,
      color(one),
      color(two)
  )
}

The function is self-contained. No mutable state inside the function escapes its defining scope, and all function calls are nested readably within other function calls. The important part is that the intent of the function is clear. Also, any threading performed around it is safe from accidental data poisoning; no messy synchronization calls are necessary.

Now that we have a foundation of immutability, we'll add combinators to our toolset.

Introducing Typelevel Scala into an OO Environment

These posts seek to describe a process by which a Scala dev can reasonably (responsibly) introduce Functional Programming (FP) in Scala into an Object Oriented (OO) Java development atmosphere. It can also serve as an entry point for an OO dev into the Scala community. They are based on a talk I gave at the Typelevel Summit in Philadelphia in 2016. Here are the slides and index cards from that talk. A video is available on YouTube.

We'll transform from OO paradigms to FP paradigms through five key ideas:

  1. Immutability as Default
  2. Combinators over loops, null & throw
  3. Case Classes for auto-encapsulation
  4. Objects are not Coroutines
  5. Type Classes over Subclasses

And we'll provide practical examples using three libraries:

  1. Monocle
  2. Argonaut
  3. Cats

Each of these ideas builds upon the last and by the end we'll have a solid foundation for introducing FP and Category Theory without even saying any of the "M-words".

Before starting, it is important to note that many of the people who develop software do not have Computer Science (CS) degrees. They will sometimes have a degree in another science or engineering discipline. Often, they will have no degree at all. An extension of this idea is that you don't need a degree to define flatMap.

This fact is important to us because the vernacular of CS is different from the vernacular of other disciplines. On the other hand, we are assuming an audience of OO developers, so some CS terminology is OK and indeed will be present throughout. Wherever OO words fail us, we will find "plain old English" ways of describing idioms and ideas. We will be coaching usage, strategies and tactics, not terms and vocabulary.