A hand drill may not always be the right tool

Why am I writing this?

Clients, recruiters and fellow software engineers usually ask what makes Scala "better" than the incumbent industry language - Java. I get a lot of questions from people who perhaps have not encountered Scala before, asking why they would use it over something that most people (for some definition of most that probably includes Computer Science graduates) are already familiar with. To make the (hopefully only) analogy - if you have a hand drill that works, why would you want a power tool? One answer could be scale and convenience - you can build a fine piece of furniture with a hand drill, but if you need to make many holes in a concrete wall, it is much more practical to use a power drill with a hammer function.

This article will attempt to explain why I prefer Scala over Java for most tasks where one could pick either. The usual disclaimer applies - it's just one (albeit very powerful) set of tools in the programmer's toolbox, and will not help you be better at anything if you use it incorrectly (such as drilling in the wrong place or with the wrong technique. Ok, analogy_count++). I will start at a very basic technical level and increase the level of detail as the article continues.

TL;DR

Scala can be a very productive language to develop in - a business domain can be modelled concisely (using case classes), complex problems can be broken down into simpler pieces (using function composition and pattern matching) and some classes of errors can even be eliminated at compile time (using typeclass derivation as an example). It is indeed a more complicated language than Java, but overall I think the gains are worth the initial ramp-up time.

Language background

As an introduction - Java is a general-purpose object-oriented programming language first released in 1995, with a static type system that includes parametric polymorphism and inheritance, targeting a runtime system with garbage collection and a bytecode VM (the Java Virtual Machine or JVM). Scala is a multi-paradigm general-purpose programming language (object-oriented and functional) first released in 2004, with a static type system (that has all the features of Java's and more), targeting primarily the JVM, but also the web (with Scala.js) and native code (with Scala Native). It is telling that IDEs offer support for automatic conversion of Java to Scala but not the other way round.

The above paragraph contained quite a bit of jargon - let me try to define some of the terms:

functional:

ability to treat functions as a first-class value; also the general term for programming techniques grouped around that idea, such as: immutability, parametricity, currying

inheritance:

ability to get the behaviour and data (methods and fields) of a parent class (superclass) by a single syntactic keyword (usually extends)

garbage collection:

memory is allocated dynamically at runtime and is freed by collection at some later point dynamically determined at runtime (in contrast with statically allocated memory where both the lifetime and sizes are known at compile time)

general-purpose:

equally applicable to all domains (i.e. nothing specific to make video games vs. text editors for example)

object-oriented:

able to associate data (fields) with code (methods), usually using a convenient syntax

parametric polymorphism:

able to write a function that uses the same code to perform an operation across different types (like reversing a list regardless of the type of its elements)

statically typed:

every value has a specific type known at compile-time (e.g. you can differentiate between an integer and a piece of text)
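
A few of the terms above can be made concrete in a handful of lines of Scala. This is only an illustrative sketch: the names GlossaryDemo, twice, addCurried and firstOrElse are made up for this example.

```scala
object GlossaryDemo {
  // functional: a function is an ordinary value that can be stored and passed around
  val increment: Int => Int = _ + 1

  // a higher-order function: takes a function and returns a new function
  def twice(f: Int => Int): Int => Int = x => f(f(x))

  // currying: arguments are supplied one list at a time,
  // so fixing the first argument yields a new function
  def addCurried(a: Int)(b: Int): Int = a + b
  val addTen: Int => Int = addCurried(10)

  // parametric polymorphism: the same code works for any element type A
  def firstOrElse[A](items: List[A], default: A): A =
    items.headOption.getOrElse(default)

  def main(args: Array[String]): Unit = {
    println(twice(increment)(40))             // 42
    println(addTen(32))                       // 42
    println(firstOrElse(List("a", "b"), "z")) // a
    println(firstOrElse(List.empty[Int], 0))  // 0
  }
}
```

Static typing shows up here as well: calling firstOrElse(List(1, 2), 0) gives back an Int, not a value of some unknown type, and the compiler knows it.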

Terms that will be used later are not going to be explained but only linked to definitions or examples elsewhere (to reduce the scope of the article).

What features in Scala help the most?

As long as two languages are Turing Complete, any program written in one language can be written in the other. However, this is not very useful - it may be far more appropriate to write a program in C vs. a program in JavaScript depending on factors like tooling, ecosystem support, the environment where the program is going to run, the amount of resources available to the program etc.

Both languages run on the JVM, and both are used for backend development in small to large companies. To usefully compare Scala to Java we need to talk about specific features, my selection of which is:

  • case classes
  • pattern matching
  • implicits (given in Scala 3)
  • macros (mostly library usages though)

Case classes

and Algebraic Data Types

A case class is a container for immutable fields and (optionally) other behavior. (In Java terminology, they are value-based classes). For example, suppose you have a bus position you want to represent that contains the following information:

  • bus identifier (and maybe line, initial departure time, vehicle id)
  • coordinates of bus
  • timestamp

You might initially model everything using String (example in JSON):

{
    "bus": "42",
    "coordinates": "0.0,0.0",
    "timestamp": "2019-01-01T12:23:22.813Z"
}

And in Scala:

final case class BusPosition(bus: String, coordinates: String, timestamp: String)

val bus = BusPosition("0.0,0.0", "42", "2019-01-01T12:23:22.813Z")

println(bus.bus)
println(bus.coordinates)

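At first, it seems fine to use String everywhere, but there's a bug in the above code: the coordinates and the bus identifier are passed in the wrong order, and because every field is a String the compiler cannot notice. (The mistake is known as stringly-typed code and is not specific to either Scala or Java). Let's try again: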

import java.time.Instant

final case class TransportLine(name: String) extends AnyVal
final case class VehicleId(id: String) extends AnyVal
final case class Coordinates(latitude: Double, longitude: Double)
final case class BusVehicle(id: VehicleId, line: TransportLine, initialDeparture: Instant)
final case class BusPosition(bus: BusVehicle, coordinates: Coordinates, timestamp: Instant)

val bus = BusPosition(
    // java.time.Instant has no `of` factory method; parse ISO-8601 timestamps instead
    BusVehicle(VehicleId("01fa3-a35"), TransportLine("42"), Instant.parse("2019-01-01T12:15:00Z")),
    Coordinates(0.0, 0.0),
    Instant.parse("2019-01-01T12:23:22.813Z")
)

The above:

  • is very concise (all definitions in one place)
  • has a different type for every field in the BusPosition class (which also makes the type definition easier to read without having to read the names)
  • defines by virtue of being a case class: .equals, .hashCode and .toString as well as .copy which allows you to make a copy with only specific fields changed.
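
To make the last point concrete, here is a small sketch of what the compiler generates for a case class. The Stop class is a hypothetical, simplified model introduced only for this example, so that it stays self-contained.

```scala
object CaseClassDemo {
  final case class Coordinates(latitude: Double, longitude: Double)
  // Stop is a made-up example type, not part of the bus model above
  final case class Stop(name: String, coordinates: Coordinates)

  val a: Stop = Stop("Central", Coordinates(1.2, 3.4))
  val b: Stop = Stop("Central", Coordinates(1.2, 3.4))

  // copy creates a new value with one field changed; `a` itself is never modified
  val renamed: Stop = a.copy(name = "Central Station")

  def main(args: Array[String]): Unit = {
    println(a == b)                   // true: equality compares field by field
    println(a.hashCode == b.hashCode) // true: hashCode is consistent with equals
    println(a)                        // Stop(Central,Coordinates(1.2,3.4))
    println(renamed.coordinates == a.coordinates) // true: untouched fields are carried over
  }
}
```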

A class that extends AnyVal is known as a value class. It lets you define a class with a single field that, in most cases, is not actually allocated at runtime. It allows us to clearly say what we mean when we write "42" for example - it is a transport line, but without any overhead except to declare the type and use it.

How to represent this in Java? There are still at least two camps about how to define an equivalent in Java - either with a series of classes with getters and setters, or with immutable fields as above.

With immutable fields, it would look like (split across several files):

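import java.time.Instant;

// final fields must be assigned in a constructor, otherwise the classes do not compile

final class TransportLine {
    public final String name;
    TransportLine(String name) { this.name = name; }
}

final class VehicleId {
    public final String id;
    VehicleId(String id) { this.id = id; }
}

final class Coordinates {
    public final double latitude;
    public final double longitude;
    Coordinates(double latitude, double longitude) { this.latitude = latitude; this.longitude = longitude; }
}

final class BusVehicle {
    public final VehicleId id;
    public final TransportLine line;
    public final Instant initialDeparture;
    BusVehicle(VehicleId id, TransportLine line, Instant initialDeparture) { this.id = id; this.line = line; this.initialDeparture = initialDeparture; }
}

final class BusPosition {
    public final BusVehicle bus;
    public final Coordinates coordinates;
    public final Instant timestamp;
    BusPosition(BusVehicle bus, Coordinates coordinates, Instant timestamp) { this.bus = bus; this.coordinates = coordinates; this.timestamp = timestamp; }
}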

This is far less concise than Scala, and we still haven't defined object equality (so you can't use this representation if you want to compare two BusPosition values in tests without defining equals yourself or getting the IDE to generate it for you). If you want to use the getters/setters style, then you have to write even more code, and lose immutability as a result. (Immutability is not just important for reasoning about the code, but also for safe publication in the presence of concurrency). You can also use Project Lombok, which generates the boilerplate at compile time through annotation processing, but this doesn't come as a built-in language feature.

Scala field accesses are actually using compiler-generated getters for uniform field/getter access, which means that if you define:

final case class Coordinates(latitude: Double, longitude: Double)
final case class BusPositionLatLon(lat: Double, lon: Double)
final case class BusPositionCoordinates(coordinates: Coordinates) {
    def lat: Double = coordinates.latitude
    def lon: Double = coordinates.longitude
}

val posA = BusPositionLatLon(1.2, 3.4)
val posB = BusPositionCoordinates(Coordinates(1.2, 3.4))

posA.lat == posB.lat
posA.lon == posB.lon

Then you have a uniform access syntax to both fields and methods defined without parens (the convention is to typically leave out the parentheses if the method is not performing any side effects).

But this is just the basics (pattern matching is covered in the next section). Typically you want to model a situation where there is a common type but several specific cases. Let's take the simplest (though perhaps not the most illuminating) example, Optional, where you have either:

  • something containing a value
  • nothing without any value

In Java, this is modelled as a single class Optional. In Scala, this is modelled with the abstract class scala.Option which has a subclass Some for the case with a value, and a subclass None without. Actually the definition is a little more complex:

package scala

sealed abstract class Option[+A] extends Product with Serializable { ... }

final case class Some[+A](value: A) extends Option[A]
case object None extends Option[Nothing]

In the above:

  • + is a variance annotation that implies Option is covariant
  • sealed keyword tells the compiler that all the subclasses of Option are known at compile-time
  • case object None defines a singleton object that is an Option[Nothing]. This can be a singleton because Option is covariant, and Nothing is the bottom type (a subtype of every other type)

Why is the definition like it is and why is it better than Java's Optional?

We now have two concrete types: Some and None, and there are no other subtypes of Option. This means Option meets the definition of an Algebraic Data Type (or ADT for short), specifically a coproduct. Because Option is sealed, the compiler can check that a pattern match covers every case. This gives a degree of type safety that simply isn't available in Java with instanceof. ADTs also give us a uniform way of treating any type that is defined with sealed, which is a win for being able to reason about code.

With Optional, we cannot determine whether a value exists without calling the .isPresent or .isEmpty method, and certainly not by examining the types. But the example above is just one example - there are many instances where we would like to have compile-time certainty about handling all the cases of something. It is possible to use ADTs in Java with libraries like dataenum.
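
As a minimal sketch of what exhaustive matching on Option looks like in practice (the names OptionDemo, describe and describeWithFold are made up for illustration):

```scala
object OptionDemo {
  // Both cases of the ADT are handled. If one of the two `case` lines is removed,
  // the compiler warns that the match may not be exhaustive.
  def describe(maybeLine: Option[String]): String = maybeLine match {
    case Some(line) => s"line $line"
    case None       => "no line assigned"
  }

  // The same logic written with a combinator instead of an explicit match
  def describeWithFold(maybeLine: Option[String]): String =
    maybeLine.fold("no line assigned")(line => s"line $line")

  def main(args: Array[String]): Unit = {
    println(describe(Some("42")))   // line 42
    println(describe(None))         // no line assigned
    println(describeWithFold(None)) // no line assigned
  }
}
```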

Pattern matching

Perhaps the case for case classes wouldn't be so strong if Scala were lacking another critical feature: pattern matching. Pattern matching is not just a safer version of instanceof in Java; it also allows you to match on fields inside case classes, to bind names to them, to match on literal values and to write guards for specific conditions. From the excellent Neophyte's Guide to Scala, patterns everywhere chapter:

final case class Player(name: String, score: Int)

def printMessage(player: Player) = player match {
  case Player(_, 0) => println("Try harder ;)")
  case Player(_, score) if score > 100000 => println("Get a job, dude!")
  case Player(name, _) => println(s"Hey $name, nice to see you again!")
}

In the above example:

  • the case class Player is destructured by binding names to the fields we need, and using _ (underscore) for the ones we don't
  • the literal value 0 is used to distinguish the case where the score is zero
  • a pattern guard is used to check for high scores

This could have been written in Java as follows:

public void printMessage(Player player) {
    if (player.score == 0) {
        System.out.println("Try harder ;)");
    } else if (player.score > 100000) {
        System.out.println("Get a job, dude!");
    } else {
        System.out.println("Hey " + player.name + ", nice to see you again!");
    }
}

And given the above was a pretty simple example, the code looks very similar.

But what about the case when ADTs are involved?

For example, if we want to model the bus location problem again, but now we have other types of transport:

sealed trait TransportVehicle {
    def location: Coordinates
}

final case class Bus(name: String, location: Coordinates, sittingCapacity: Int, standingCapacity: Int) extends TransportVehicle
final case class Car(licensePlate: String, location: Coordinates, numSeats: Int) extends TransportVehicle
final case class RidesharingVehicle(company: String, id: Long, location: Coordinates, capacity: Int) extends TransportVehicle
final case class Tram(name: String, location:  Coordinates, sittingCapacity: Int, standingCapacity: Int) extends TransportVehicle

def findVehicleForThreePeople(vehicles: List[TransportVehicle]): Option[TransportVehicle] = vehicles.find {
    case bus: Bus               => bus.sittingCapacity + bus.standingCapacity >= 3
    case car: Car               => car.numSeats >= 3
    case rs: RidesharingVehicle => rs.capacity >= 3
    case tram: Tram             => tram.sittingCapacity + tram.standingCapacity >= 3
}

The above function is still a little simplistic (hard-coded constraints, no interface for capacity calculation) but it is still:

  • Clear and concise - the business logic (seats >= 3) is encoded in one place without any unnecessary detail
  • Exhaustive - if you add a new vehicle type, the compiler will warn that you have a new case to handle

In Java, this would look like:

public Optional<TransportVehicle> findVehicleForThreePeople(List<TransportVehicle> vehicles) {
    return vehicles.stream().filter(vehicle -> {
        if (vehicle instanceof Bus) {
            Bus bus = (Bus) vehicle;
            return bus.sittingCapacity + bus.standingCapacity >= 3;
        } else if (vehicle instanceof Car) {
            Car car = (Car) vehicle;
            return car.numSeats >= 3;
        } else if (vehicle instanceof Tram) {
            Tram tram = (Tram) vehicle;
            return tram.sittingCapacity + tram.standingCapacity >= 3;
        }
        // the lambda must return on every path, so unknown vehicle types are silently rejected
        return false;
    }).findAny();
}

Did you spot the missing case above? The compiler didn't. At least there is a proposal for pattern matching in Java now.

Update Feb 2021: Pattern matching for instanceof is now going to be released in Java SE 16.

Implicits/given

given is a new mechanism coming in Scala 3 that improves upon the shortcomings of implicits in Scala 2.

Implicits are, I think, the feature in Scala that polarizes opinions the most. You either love them for the boilerplate reduction or hate them because of the perceived complexity and arcane rules of resolution (or because someone in your old project went wild with implicit conversions and made a mess).

There are 3 uses of implicit in Scala 2 at the moment:

  • Implicit classes, which allow you to declare extension methods on classes that you either didn't define yourself or are not able to change for backwards compatibility reasons
  • Implicit conversions, which allow the compiler to auto-magically convert a value from one type to another
  • Implicit parameters, which allow the compiler to inject an instance of a given class at compile-time, based purely on the imports and the implicit scoping rules
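
The three uses can be sketched side by side in Scala 2 syntax. Everything in this example (Meters, DoubleOps, intToMeters, Show, render) is a hypothetical name chosen for illustration, not an API from any library.

```scala
import scala.language.implicitConversions

object ImplicitsDemo {
  final case class Meters(value: Double)

  // 1. Implicit class: adds the extension method `meters` to Double
  implicit class DoubleOps(private val d: Double) {
    def meters: Meters = Meters(d)
  }

  // 2. Implicit conversion: an Int is converted wherever a Meters is expected
  implicit def intToMeters(i: Int): Meters = Meters(i.toDouble)

  // 3. Implicit parameter: the compiler looks up a Show[A] instance at compile time
  trait Show[A] { def show(a: A): String }
  implicit val showMeters: Show[Meters] = new Show[Meters] {
    def show(m: Meters): String = s"${m.value} m"
  }
  def render[A](a: A)(implicit s: Show[A]): String = s.show(a)

  val fromExtension: Meters = 2.5.meters // uses DoubleOps
  val fromConversion: Meters = 3         // uses intToMeters
  val rendered: String = render(fromExtension) // uses showMeters

  def main(args: Array[String]): Unit = {
    println(fromExtension)
    println(fromConversion)
    println(rendered)
  }
}
```

Calling render on a type without a Show instance in scope (for example render(42)) is rejected by the compiler, which is the property the rest of this section relies on.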

The details have been widely covered: Implicit Design Patterns in Scala or Implicits, type classes, and extension methods to give just two good posts.

For me, implicits are a great way to change run-time problems into compile-time problems (eliminating errors earlier) - especially when doing dependency injection or encoder/decoder definition.

Let's take the serialization example for JSON. In Java, you typically have a runtime mechanism for that using a library like Jackson that is reflection-based. If you look at some common Jackson exceptions, you can see things like:

  • JsonMappingException: Can not construct instance of
  • JsonMappingException: No suitable constructor
  • JsonMappingException: No serializer found for class

These can disappear if instead you use a library like circe which can do the work at compile-time:

// model definition
sealed trait SiteUser
final case object Anonymous extends SiteUser
final case class LoggedInUser(id: Int, name: String) extends SiteUser

// json protocol definition

import io.circe.Encoder
import io.circe.generic.extras.Configuration
import io.circe.generic.extras.semiauto._

implicit val configuration: Configuration = Configuration.default.withSnakeCaseMemberNames.withDiscriminator("type")

implicit val siteUserEncoder: Encoder[SiteUser] = deriveEncoder

// usage
import io.circe.syntax._

val anon: SiteUser = Anonymous
val loggedIn: SiteUser = LoggedInUser(42, "bob")

println(anon.asJson)
println(loggedIn.asJson)

The above will print:

{
  "type" : "Anonymous"
}
{
  "id" : 42,
  "name" : "bob",
  "type" : "LoggedInUser"
}

If you add more cases to SiteUser or more fields, the derived encoder keeps compiling as long as an encoder exists for every field type; if one is missing, compilation fails instead of the program failing at runtime. The resolution is done at compile-time because implicit is the mechanism under the hood. On the decoding side, the remaining failure mode is input that is invalid or does not match the model, and that is handled by the fallible conversions returning an Either[Error, A] ADT rather than throwing an exception.
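
The idea of a fallible conversion that returns an Either can be shown without any library. The sketch below is not circe's API; parseCoordinates and ParseError are hypothetical names, and it assumes Scala 2.13 or later for toDoubleOption.

```scala
object EitherDemo {
  final case class Coordinates(latitude: Double, longitude: Double)

  // Hypothetical error type for this sketch
  final case class ParseError(message: String)

  // Turns a "lat,lon" string into Coordinates, reporting failures as values
  def parseCoordinates(raw: String): Either[ParseError, Coordinates] =
    raw.split(",").toList match {
      case lat :: lon :: Nil =>
        for {
          la <- lat.trim.toDoubleOption.toRight(ParseError(s"bad latitude: $lat"))
          lo <- lon.trim.toDoubleOption.toRight(ParseError(s"bad longitude: $lon"))
        } yield Coordinates(la, lo)
      case _ =>
        Left(ParseError(s"expected 'lat,lon' but got: $raw"))
    }

  def main(args: Array[String]): Unit = {
    println(parseCoordinates("0.0,0.0"))
    println(parseCoordinates("1.0,x"))
    println(parseCoordinates("nonsense"))
  }
}
```

Because the result is an ADT with two cases, the caller has to decide what to do with a Left before getting at the Coordinates, and no exception needs to be caught.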

Macros

Another advanced mechanism that Scala offers for generating code at compile-time is macros (scalameta). You typically won't use the feature directly unless you're a library author, so why is it relevant here?

Anecdote: on a recent project I had a case where the JSON structure of an upstream system and the BigQuery database structure were identical (and were guaranteed to stay identical). To avoid duplicating code (and writing lots of error-prone manual mapping), I decided to reuse the case classes from the JSON world in the database mapping. BigQuery naturally only returned Map[String, Any] so I had to find a way to automatically populate the relevant case class. This was a perfect case for generic derivation (see type classes and generic derivation for a more complete example).

I wanted to have a trait (typeclass) that looked like:

trait Decoder[A] {
    def decode(bigQueryResult: BQ): Either[Error, A]
}

But not have to write it manually for each case class I wanted to query. Luckily, the magnolia library made it fairly straightforward to create a derivation for that, and I had a working example I could use within two days (for a fairly complex data type with lots of nested lists, maps, etc.). The initial time investment probably was similar to writing the manual mapping by hand, but since the investment is generic over any case class type, no mapping would need to be done manually in the future, saving time and reducing errors.
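
To show what the derivation saves, here is a hand-written instance of such a typeclass for a single case class. This is a sketch under stated assumptions, not the code from the project and not magnolia's API: BQ is assumed to be a plain Map[String, Any], Error is a made-up case class, and field and decodeAs are hypothetical helpers.

```scala
object DecoderDemo {
  // Assumptions for this sketch only
  type BQ = Map[String, Any]
  final case class Error(message: String)

  trait Decoder[A] {
    def decode(bigQueryResult: BQ): Either[Error, A]
  }

  final case class Coordinates(latitude: Double, longitude: Double)

  // Reads one field and checks its runtime type using a partial function
  private def field[T](row: BQ, name: String)(pf: PartialFunction[Any, T]): Either[Error, T] =
    row.get(name) match {
      case Some(value) if pf.isDefinedAt(value) => Right(pf(value))
      case Some(other) => Left(Error(s"field '$name' has unexpected value: $other"))
      case None        => Left(Error(s"field '$name' is missing"))
    }

  // The kind of instance that would otherwise be written by hand for every case class
  implicit val coordinatesDecoder: Decoder[Coordinates] = new Decoder[Coordinates] {
    def decode(bigQueryResult: BQ): Either[Error, Coordinates] =
      for {
        lat <- field(bigQueryResult, "latitude") { case d: Double => d }
        lon <- field(bigQueryResult, "longitude") { case d: Double => d }
      } yield Coordinates(lat, lon)
  }

  // The instance is picked by the compiler based on the requested type A
  def decodeAs[A](row: BQ)(implicit decoder: Decoder[A]): Either[Error, A] =
    decoder.decode(row)
}
```

A generic derivation produces the equivalent of coordinatesDecoder for any case class, which is why the one-off investment pays back as the number of case classes grows.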

There is no direct equivalent of a macro library for Java - there is asm which does bytecode manipulation, but that is a lot lower level, and magnolia supports coproducts and case classes, which do not exist as high-level concepts in Java.

Summary

Scala, for me, offers both simple and advanced mechanisms to manage complexity (via ADTs, case classes and pattern matching), to eliminate run-time errors by pushing checks to compile-time, and to keep code concise. The reduction in code is significant because it both reduces bugs (no bugs in code that isn't written) and allows you to get up-to-speed with a new codebase quicker (there is less to read). The functional style of programming and the typelevel ecosystem of libraries make constructing software safer (at compile-time even before tests are involved) and more productive (due to the high level of genericity and low level of boilerplate). Functional style code typically allows for better local reasoning which is easier on the brains of developers and code reviewers.

I also omitted a lot of smaller features that Scala has such as for comprehensions, companion objects, type aliases, as well as patterns used when writing Scala code like typeclasses.

One disadvantage is that it will not take one day or even a week to get to know Scala well. I also feel that prototyping a new service when requirements are changing quickly is quite difficult due to the static typing - sometimes it is easier to let everything be a string or a map, in which case a more dynamic language like Python or Clojure may be quicker to use initially.

Please leave comments or take a look at my example-pure-todomvc for a more concrete example of what a Scala microservice that implements TodoMVC looks like.

Update Feb 2021: Have a look at the excellent bootzooka project, which offers an up-to-date and comprehensive template to start building a microservice from.