Feb '17
08
Ad-Hoc Polymorphism in Scala
Data processing in large part can be boiled down to transformations of one piece of data into another. In Scala, it’s easy to be productive quickly by chaining .map
, .filter
, .reduceLeft
, and friends together. But when the number of types and transformations increases, composability and reusability become more and more important.
Scala’s type system presents an opportunity to make these transitions expressive and composable without losing compile time guarantees. This is especially important when considering long-running data processes, where a runtime error several hours into the process is massively frustrating.
Of course, there is a fine line between expressive and inexplicable. But careful application of generic types and implicit parameters can make for a powerful and flexible data pipeline.
I have tried to make this article as approachable as possible, but I'm going to assume you have a basic understanding of Scala syntax, generics, and implicit parameters.
:paste
before you paste. After you paste, hit CTRL+D to see the result.Union Types
Union types, or type disjunctions, are types that may be one type or another (or more, even), such as the pseudo-type String | Int
— a String
or an Int
.
One way to create type disjunctions in Scala is by creating a generic type, and in its companion object defining an implicit witness for each type in the union.
This example is adapted from a StackOverflow answer and shows a basic type disjunction implementation. The idea is to create a type that represents either a String
or an Int
.
class StringOrInt[T] object StringOrInt { implicit object IntWitness extends StringOrInt[Int] implicit object StringWitness extends StringOrInt[String] }
Next, we declare a function that accepts the type disjunction as a type parameter.
object Bar { def foo[T: StringOrInt](x: T) = x match { case _: String => println("str") case _: Int => println("int") } }
The Bar.foo
method's type parameter [T: StringOrInt]
seems a little counter-intuitive because in our case, T
will actually be an Int
or String
, but the type parameter provides what is called a context bound. By qualifying the type parameter with : StringOrInt
, the compiler ensures that an implicit StringOrInt
witness is in scope for the corresponding type T
.
Now, when calling Bar.foo(123)
or Bar.foo("a string")
the implicit witness will be found for the corresponding type and the compiler will be able to perform type checks, telling you when you try to pass an unsupported type into a method.
Bar.foo(123) // Outputs: int Bar.foo("test") // Outputs: str Bar.foo(true) // error: could not find implicit value for evidence parameter of type StringOrInt[Boolean]
StringOrInt[T]
a sealed trait (or sealed abstract class). By sealing the trait, you effectively prevent ‘abuse’ by disallowing the creation of witnesses for other types. If StringOrInt[T]
were sealed, you could not define a StringOrInt[Boolean]
witness outside of the file that StringOrInt
is defined in.An Alternate Syntax
Scala is a language where there are at least ten ways to do something, and context bounds are no different. The above Bar.foo
example can be rewritten as:
object Bar { def foo[T](x: T)(implicit ev: StringOrInt[T]) = x match { case _: String => println("str") case _: Int => println("int") } }
This syntax accomplishes the same end result, but removes the context bound, opting instead for what is known as “implicit evidence” for type StringOrInt[T]
. This is a little more verbose, but incredibly more useful, as we’ll see in the next section.
Beyond Unions
Context bounds and implicit evidence in Scala enable the creation of what are commonly known as type classes in Haskell. Type classes, in turn, enable ad-hoc polymorphism. Type classes are similar to implicit conversions, except they are much more explicit and flexible.
One use for type classes is to create relationships between unrelated types by creating a trait that accepts two type parameters. You can then define implicit witnesses for each relationship between types. Finally, create types that take these type classes as type parameters.
To show how this is useful, consider the following example.
// Given two unrelated types, perhaps representing Apache log data: case class ApacheLog(date: String, url: String) // And the aggregate count of that data: case class URLCounts(url: String, count: Int) object URLCounts { // A helper method to convert logs into counts. def fromLogs(logs: Seq[ApacheLog]): Seq[URLCounts] = { logs.map(line => URLCounts(line.url, 1)) .groupBy(_.url) .map{case (url, counts) => URLCounts(url, counts.map(_.count).reduceLeft(_ + _))}.toSeq } } val input = Seq( ApacheLog("2017-02-08 19:45:22", "http://localhost/some-url"), ApacheLog("2017-02-08 19:46:04", "http://localhost/some-url"), ApacheLog("2017-02-08 19:46:53", "http://localhost/some-url"), ApacheLog("2017-02-08 19:46:57", "http://localhost/some-other-url")) URLCounts.fromLogs(input).foreach(c => println(s"${c.url}: ${c.count}")) // Outputs: // http://localhost/some-other-url: 1 // http://localhost/some-url: 3
This, in my opinion, is a great way to encapsulate transformations from one type to another. However, we must explicitly know exactly how the transformations must occur to make them happen. That is, we must know that we need to call URLCounts.fromLogs(input)
to make the solution work.
Instead, what if we could embed this knowledge in the type system? We could then write code that doesn’t care how two things relate— only that they do relate, by way of implicit evidence. Read on.
// A Relation represents the relationship between two types, namely a way to convert from one to the other. trait Relation[T, R] { def convert(t: Seq[T]): Seq[R] } object Relation { // Now define the relationship between ApacheLog and URLCounts. implicit object CountsFromLogs extends Relation[ApacheLog, URLCounts] { def convert(t: Seq[ApacheLog]): Seq[URLCounts] = URLCounts.fromLogs(t) } } // A Converter is a generic converter that can convert sequences of one type to another. class Converter[T, R](implicit relation: Relation[T, R]) { def convert(seq: Seq[T]): Seq[R] = relation.convert(seq) } val input = Seq( ApacheLog("2017-02-08 19:45:22", "http://localhost/some-url"), ApacheLog("2017-02-08 19:46:04", "http://localhost/some-url"), ApacheLog("2017-02-08 19:46:53", "http://localhost/some-url"), ApacheLog("2017-02-08 19:46:57", "http://localhost/some-other-url")) val converter = new Converter[ApacheLog, URLCounts] converter.convert(input).foreach(c => println(s"${c.url}: ${c.count}")) // Outputs: // http://localhost/some-other-url: 1 // http://localhost/some-url: 3
Initially, it seems like a lot of boilerplate for little gain, but we’ve created an arguably clean and expressive public API, the Converter
especially, by making use of implicit evidence for a Relation[T, R]
. The converter operates on this Relation[T, R]
; but it has no idea what its converting.
In this way, we can create a relationship between two otherwise unrelated types. This is ad-hoc polymorphism, and it allows for the flexibility of dynamic typing while providing compile time guarantees that the types are sound. This can help you reduce runtime errors that might occur hours into a massive data pipeline process. And to top it off, the user doesn’t need to know about how the transformation happened, only that it exists.
Relation[T, R]
type above is not sealed— anyone can define new relationships at any point. This can very quickly turn into a giant mess, so it’s probably best to define type classes on a package level and seal them so that they can’t be tampered with. This does have some ramifications, namely you can no longer make one off, custom transformations using your API, but the benefit is you limit the possible side effects of having witnesses littered all over your codebase.When modeling data transformations, these relationships can help you gain a clear picture of the process without getting bogged down in how those transformations occur.
A Step Further
At this point, you might be skeptical about the benefit of all this. Let us take a look at where type classes really start to shine.
You may have noticed that the examples from the previous section called the type URLCounts
, not ApacheCounts
. With good reason!
Let’s extend the previous examples, making a new type class HasURL[T]
and changing URLCounts.fromLogs
to accept this type class as implicit evidence.
// HasURL is a generalized type representing something with a URL. trait HasURL[T] { def url(t: T): String } object HasURL { // Define how to get a URL out of an ApacheLog. implicit object ApacheLogsHaveURL extends HasURL[ApacheLog] { def url(t: ApacheLog): String = t.url } } case class URLCounts(url: String, count: Int) object URLCounts { // We use the implicit evidence syntax here, allowing us to use // methods defined on HasURL[T]. def fromLogs[T](logs: Seq[T])(implicit ev: HasURL[T]): Seq[URLCounts] = { logs.map(line => URLCounts(ev.url(line), 1)) .groupBy(_.url) .map{case (url, counts) => URLCounts(url, counts.map(_.count).reduceLeft(_ + _))}.toSeq } } URLCounts.fromLogs(input).foreach(c => println(s"${c.url}: ${c.count}")) // Outputs: // http://localhost/some-other-url: 1 // http://localhost/some-url: 3
Notice how the above example actually uses the evidence ev
, calling the url
method on HasURL[T]
to actually retrieve the URL from the type T
.
If the implication isn’t clear, we’ve now decoupled URLCounts
from knowing anything about the type it operates on! This means we can now count any type T
that has implicit evidence for HasURL[T]
. In other words, we can now count anything that has a URL. Note I did not make the trait HasURL[T]
sealed, nor did I make Relation[T, R]
sealed— we can add more witnesses as needed. Let’s try it out.
// A Conversion represents a sale that occurred on a merchant's website. case class Conversion(url: String, orderNumber: String, purchaseAmount: Int) // The syntax here is a little different because we are in the top-most scope; this // could (and probably should) be defined in the HasURL companion object. implicit def conversionsHaveURL: HasURL[Conversion] = new HasURL[Conversion] { def url(t: Conversion): String = t.url } val input = Seq( Conversion("http://buycool.stuff/order-confirm", "2134", 4999), // Prices are in cents, btw. Conversion("http://buycool.stuff/order-confirm", "1235", 3563), Conversion("http://lotsofneat.stuff/thank-you", "523a1", 6799)) URLCounts.fromLogs(input).foreach(c => println(s"${c.url}: ${c.count}")) // Outputs: // http://lotsofneat.stuff/thank-you: 1 // http://buycool.stuff/order-confirm: 2
Amazing! Since URLCounts.fromLogs
is now operating on HasURL[T]
, we only need to define an implicit HasURL[T]
and it can do its work.
We aren’t done yet! Let’s make a new Relation[T, R]
. But this relationship is not between ApacheLog
s and URLCounts
, nor Conversion
s and URLCounts
. We’re going to make a relationship between HasURL[T]
and URLCounts
.
// In this example, we use context bounds instead of implicit evidence. We // don't need the evidence, so there's no point in having it passed as an // implicit parameter. implicit def hasURLToCounts[T: HasURL]: Relation[T, URLCounts] = new Relation[T, URLCounts] { def convert(t: Seq[T]): Seq[URLCounts] = URLCounts.fromLogs(t) } val converter = new Converter[Conversion, URLCounts] converter.convert(input).foreach(c => println(s"${c.url}: ${c.count}")) // Outputs: // http://lotsofneat.stuff/thank-you: 1 // http://buycool.stuff/order-confirm: 2
Holy cow! Notice that we did not ever define an explicit relationship between Conversion
and URLCounts
. Instead, we defined a relationship between HasURL[T]
and URLCounts
. Now any sequence of type T
that has implicit evidence for HasURL[T]
can be converted into a sequence of URLCount
s! This is the true power of ad-hoc polymorphism.
Finally
There is a lot going on in this article, and a lot of stuff that could be expanded and explained more thoroughly. However, I hope this serves as a practical overview of the power of type classes and the Scala type system in general.
Comments
Search
Archive
- November 2022
- Incremental Progress
- August 2021
- Self-Hosting for Fun and Personal Freedom
- July 2019
- Closing Channels Twice in Go
- May 2019
- On Life, Legacy, and JavaScript
- March 2018
- Refactoring, Now With Generics!
- November 2017
- Packages 3.2 released!
- September 2017
- Introducing the MOTKI CLI
- July 2017
- Decoupling Yourself From Dependencies
- May 2017
- Model Rocketry Update
- April 2017
- Dynamic DNS with homedns