Why Scala?
A hand drill may not always be the right tool
Why am I writing this?
Clients, recruiters and fellow software engineers usually ask what makes Scala "better" than the incumbent industry language - Java. I get a lot of questions from people who perhaps have not encountered Scala before, asking why they would use it over something that most people (for some definition of "most" that probably includes Computer Science graduates) are already familiar with. To make the (hopefully only) analogy - if you have a hand drill that works, why would you want a power tool? One answer could be scale and convenience - you can build a fine piece of furniture with a hand drill, but if you need to make many holes in a concrete wall, it is much more practical to use a power drill with a hammer function.
This article will attempt to explain why I prefer Scala over Java for most tasks where one could pick either. The usual disclaimer applies - it's just one (albeit very powerful) set of tools in the programmer's toolbox, and will not help you be better at anything if you use it incorrectly (such as drilling in the wrong place or with the wrong technique. OK, analogy_count++). I will start at a very basic technical level and increase the level of detail as the article continues.
TL;DR
Scala can be a very productive language to develop in - a business domain can be modelled concisely (using case classes), complex problems can be broken down into simpler pieces (using function composition and pattern matching) and some classes of errors can even be eliminated at compile time (using typeclass derivation, as an example). It is indeed a more complicated language than Java, but overall I think the gains are worth the initial ramp-up time.
Language background
As an introduction - Java is a general-purpose object-oriented programming language first released in 1995, with a static type system that includes parametric polymorphism and inheritance, targeting a runtime system with garbage collection and a bytecode VM (the Java Virtual Machine or JVM). Scala is a multi-paradigm general-purpose programming language (object-oriented and functional) first released in 2004, with a static type system (that has all the features of Java's and more), targeting primarily the JVM, but also the web (with Scala.js) and native code (with Scala Native). It is telling that IDEs offer support for automatic conversion of Java to Scala but not the other way round.
The above paragraph contained quite a bit of jargon - let me try to define some of the terms:
functional:
ability to treat functions as a first-class value; also the general term for programming techniques grouped around that idea, such as: immutability, parametricity, currying
inheritance:
ability to get the behaviour and data (methods and fields) of a parent class (superclass) via a single syntactic keyword (usually extends)
garbage collection:
memory is allocated dynamically at runtime and is freed by collection at some later point dynamically determined at runtime (in contrast with statically allocated memory where both the lifetime and sizes are known at compile time)
general-purpose:
equally applicable to all domains (i.e. nothing specific to make video games vs. text editors for example)
object-oriented:
able to associate data (fields) with code (methods), usually using a convenient syntax
parametric polymorphism:
able to write a function that can use the same code to perform an operation across different types (like an addition operator of a semigroup e.g. the natural numbers)
statically typed:
every value has a specific type known at compile-time (e.g. you can differentiate between an integer and a piece of text) - see the small sketch below
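To make two of these definitions concrete (parametric polymorphism and static typing), here is a tiny Scala sketch; the function name is made up purely for illustration:
// One definition, reusable for any element type A (parametric polymorphism)
def firstOrElse[A](xs: List[A], fallback: A): A =
  xs.headOption.getOrElse(fallback)

val i: Int    = firstOrElse(List(1, 2, 3), 0)    // A inferred as Int
val s: String = firstOrElse(List("a", "b"), "c") // A inferred as String
// val bad: Int = firstOrElse(List("a"), "b")    // rejected at compile time: a String is not an Int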
Terms that will be used later are not going to be explained but only linked to definitions or examples elsewhere (to reduce the scope of the article).
What features in Scala help the most?
As long as two languages are Turing complete, any program written in one can be written in the other. However, this is not very useful - it may be far more appropriate to write a program in C vs. a program in JavaScript depending on factors like tooling, ecosystem support, the environment where the program is going to run, the amount of resources available to the program, etc.
Both languages run on the JVM, and both are used for backend development in small to large companies. To usefully compare Scala to Java we need to talk about specific features, my selection of which is:
- case classes
- pattern matching
- implicits (given in Scala 3)
- macros (mostly used through libraries though)
Case classes and Algebraic Data Types
A case class is a container for immutable fields and (optionally) other behavior. (In Java terminology, they are value-based classes). For example, suppose you have a bus position you want to represent that contains the following information:
- bus identifier (and maybe line, initial departure time, vehicle id)
- coordinates of bus
- timestamp
You might initially model everything using String
(example in JSON):
{
"bus": "42",
"coordinates": "0.0,0.0",
"timestamp": "2019-01-01T12:23:22.813Z"
}
And in Scala:
final case class BusPosition(bus: String, coordinates: String, timestamp: String)
val bus = BusPosition("0.0,0.0", "42", "2019-01-01T12:23:22.813Z")
println(bus.bus)
println(bus.coordinates)
At first, it seems fine to use String
everywhere, but there's a bug
in the above code. (The mistake is known as stringly-typed code and is
not specific to either Scala or Java). Let's try again:
import java.time.Instant
final case class TransportLine(name: String) extends AnyVal
final case class VehicleId(id: String) extends AnyVal
final case class Coordinates(latitude: Double, longitude: Double)
final case class BusVehicle(id: VehicleId, line: TransportLine, initialDeparture: Instant)
final case class BusPosition(bus: BusVehicle, coordinates: Coordinates, timestamp: Instant)
val bus = BusPosition(
  BusVehicle(VehicleId("01fa3-a35"), TransportLine("42"), Instant.parse("2019-01-01T12:15:00Z")),
  Coordinates(0.0, 0.0),
  Instant.parse("2019-01-01T12:23:22.813Z")
)
The above:
- is very concise (all definitions in one place)
- has a different type for every field in the BusPosition class (which also makes the type definition easier to read without having to read the names)
- defines, by virtue of being a case class: .equals, .hashCode and .toString, as well as .copy, which allows you to make a copy with only specific fields changed (see the example below)
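For example, the generated members in action (a minimal sketch reusing the Coordinates class from above):
val origin = Coordinates(0.0, 0.0)
val moved  = origin.copy(latitude = 1.5)  // copy with only one field changed

origin == Coordinates(0.0, 0.0)           // true: structural equality from the generated equals
println(moved)                            // prints Coordinates(1.5,0.0) via the generated toString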
The extends AnyVal makes these value classes, which let us define a class with a single field that (in most cases) incurs no extra allocation at runtime. It allows us to clearly say what we mean when we write "42" for example - it is a transport line - but without any overhead beyond declaring the type and using it.
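A small sketch of what these extra types buy us (the describe function is hypothetical, reusing the definitions above):
def describe(line: TransportLine): String = s"line ${line.name}"

describe(TransportLine("42"))       // compiles fine
// describe(VehicleId("01fa3-a35")) // does not compile: a VehicleId is not a TransportLine
// describe("42")                   // does not compile either: a bare String is no longer accepted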
How to represent this in Java? There are still at least two camps about how to define an equivalent in Java - either with a series of classes with getters and setters, or with immutable fields as above.
With immutable fields, it would look like (split across several files):
import java.time.Instant;
final class TransportLine {
    public final String name;
    TransportLine(String name) { this.name = name; }
}
final class VehicleId {
    public final String id;
    VehicleId(String id) { this.id = id; }
}
final class Coordinates {
    public final double latitude;
    public final double longitude;
    Coordinates(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }
}
final class BusVehicle {
    public final VehicleId id;
    public final TransportLine line;
    public final Instant initialDeparture;
    BusVehicle(VehicleId id, TransportLine line, Instant initialDeparture) {
        this.id = id;
        this.line = line;
        this.initialDeparture = initialDeparture;
    }
}
final class BusPosition {
    public final BusVehicle bus;
    public final Coordinates coordinates;
    public final Instant timestamp;
    BusPosition(BusVehicle bus, Coordinates coordinates, Instant timestamp) {
        this.bus = bus;
        this.coordinates = coordinates;
        this.timestamp = timestamp;
    }
}
This is far less concise than Scala, and we still haven't defined object equality (so you can't use this representation if you want to compare two BusPosition instances in tests without defining equals yourself or getting the IDE to generate it for you). If you want to use the getters/setters style, then you have to write even more code and lose immutability as a result. (Immutability is not just important for reasoning about the code, but also for safe publication in the presence of concurrency.) You can also use Project Lombok, which generates this boilerplate at compile time via annotation processing, but that is an external tool rather than a built-in language feature.
Scala field accesses actually go through compiler-generated getters, giving uniform field/getter access, which means that if you define:
final case class Coordinates(latitude: Double, longitude: Double)
final case class BusPositionLatLon(lat: Double, lon: Double)
final case class BusPositionCoordinates(coordinates: Coordinates) {
def lat: Double = coordinates.latitude
def lon: Double = coordinates.longitude
}
val posA = BusPositionLatLon(1.2, 3.4)
val posB = BusPositionCoordinates(Coordinates(1.2, 3.4))
posA.lat == posB.lat
posA.lon == posB.lon
Then you have uniform access syntax for both fields and parameterless methods (the convention is to leave out the parentheses when a method performs no side effects).
But this is just the basics (pattern matching is covered in the next section). Typically you want to model a common type with several distinct alternatives. Let's take the simplest (perhaps not the most illuminating) example, Optional, where you have either:
- something containing a value
- nothing, without any value
In Java, this is modelled as a single class
Optional.
In Scala, this is modelled with the abstract class
scala.Option
which has a subclass Some
for the case with a value, and a subclass
None
without. Actually the definition is a little more complex:
package scala
sealed abstract class Option[+A] extends Product with Serializable { ... }
final case class Some[+A](value: A) extends Option[A]
case object None extends Option[Nothing]
In the above:
- + is a variance annotation that implies Option is covariant
- the sealed keyword tells the compiler that all the subclasses of Option are known at compile-time
- case object None defines a singleton object that is an Option[Nothing]. This can be a singleton because Option is covariant and Nothing is the bottom type (a subtype of every other type)
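The covariance and Nothing points are what make the following work - a single None value serves as the empty case for every element type (small sketch):
val present: Option[String] = Some("hello")
val absentString: Option[String] = None // None is an Option[Nothing], which widens to Option[String]
val absentInt: Option[Int] = None       // ...and to Option[Int], because Nothing is a subtype of both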
Why is the definition like it is and why is it better than Java's
Optional
?
We now have two concrete types: Some
and
None
, and there are no other subtypes of
Option
. This means Option
meets the
definition of an Algebraic Data Type (or ADT for
short), specifically a coproduct. The fact that Option
is sealed allows us to do pattern matching exhaustively. This gives
enormous type safety that simply isn't available in Java with
instanceof
. With ADTs we also have a uniform way of
treating any type that is defined with sealed
uniformly,
which is a win for being able to reason about code.
With Java's Optional, we cannot determine whether a value exists without calling the .isPresent or .isEmpty method, and certainly not by examining the types. But Option is just one example - there are many instances where we would like compile-time certainty that all the cases of something are handled (a small preview is sketched below). It is possible to use ADTs in Java with libraries like dataenum.
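As a preview of the next section, handling an Option exhaustively takes one pattern match, and the compiler checks that no case is forgotten (a minimal sketch):
def greeting(user: Option[String]): String = user match {
  case Some(name) => s"welcome back, $name"
  case None       => "hello, guest"
}
// Removing either case makes the compiler warn that the match may not be exhaustive -
// something that instanceof checks or Optional.isPresent in Java cannot give you.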
Pattern matching
Perhaps the case for case classes wouldn't be so strong if Scala were lacking another critical feature: pattern matching.
Pattern matching is not just a safer version of instanceof in Java: it also allows matching on fields inside case classes, binding names, matching on literal values, and writing guards for specific conditions. From the excellent
Neophyte's Guide to Scala,
patterns everywhere
chapter:
final case class Player(name: String, score: Int)
def printMessage(player: Player) = player match {
case Player(_, 0) => println("Try harder ;)")
case Player(_, score) if score > 100000 => println("Get a job, dude!")
case Player(name, _) => println(s"Hey $name, nice to see you again!")
}
In the above example, we:
- destructured the case class Player, binding names to its fields (or using the _ underscore wildcard where a field is not needed)
- used the literal value 0 to distinguish the case where the score was zero
- used a pattern guard to check for high scores
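As a quick check of how each case fires, a short usage sketch (expected output shown in comments):
printMessage(Player("alice", 0))      // Try harder ;)
printMessage(Player("bob", 500000))   // Get a job, dude!
printMessage(Player("carol", 1234))   // Hey carol, nice to see you again!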
This could have been written in Java as follows:
public void printMessage(Player player) {
    if (player.score == 0) {
        System.out.println("Try harder ;)");
    } else if (player.score > 100000) {
        System.out.println("Get a job, dude!");
    } else {
        System.out.println("Hey " + player.name + ", nice to see you again!");
    }
}
And given the above was a pretty simple example, the code looks very similar.
But what about the case when ADTs are involved?
For example, if we want to model the bus location problem again, but now we have other types of transport:
sealed trait TransportVehicle {
def location: Coordinates
}
final case class Bus(name: String, location: Coordinates, sittingCapacity: Int, standingCapacity: Int) extends TransportVehicle
final case class Car(licensePlate: String, location: Coordinates, numSeats: Int) extends TransportVehicle
final case class RidesharingVehicle(company: String, id: Long, location: Coordinates, capacity: Int) extends TransportVehicle
final case class Tram(name: String, location: Coordinates, sittingCapacity: Int, standingCapacity: Int) extends TransportVehicle
def findVehicleForThreePeople(vehicles: List[TransportVehicle]): Option[TransportVehicle] = vehicles.find {
case bus: Bus => bus.sittingCapacity + bus.standingCapacity >= 3
case car: Car => car.numSeats >= 3
case rs: RidesharingVehicle => rs.capacity >= 3
case tram: Tram => tram.sittingCapacity + tram.standingCapacity >= 3
}
The above function is a little simplistic (hard-coded constraints, no interface for capacity calculation) but it is still:
- Clear and concise - the business logic (seats >= 3) is encoded in one place without any unnecessary detail
- Exhaustive - if you add a new vehicle type, the compiler will warn that you have a new case to handle
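For example, if a hypothetical Ferry type is added to the sealed hierarchy but findVehicleForThreePeople is not updated, the compiler flags the gap (the exact wording of the warning varies by compiler version):
// Hypothetical new subtype of the sealed TransportVehicle trait above
final case class Ferry(name: String, location: Coordinates, capacity: Int) extends TransportVehicle
// Recompiling findVehicleForThreePeople now produces a warning along the lines of:
//   match may not be exhaustive. It would fail on the following input: Ferry(_, _, _)
// and with -Xfatal-warnings enabled this becomes a compile error.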
In Java, this would look like:
public Optional<TransportVehicle> findVehicleForThreePeople(List<TransportVehicle> vehicles) {
    return vehicles.stream().filter(vehicle -> {
        if (vehicle instanceof Bus) {
            Bus bus = (Bus) vehicle;
            return bus.sittingCapacity + bus.standingCapacity >= 3;
        } else if (vehicle instanceof Car) {
            Car car = (Car) vehicle;
            return car.numSeats >= 3;
        } else if (vehicle instanceof Tram) {
            Tram tram = (Tram) vehicle;
            return tram.sittingCapacity + tram.standingCapacity >= 3;
        }
        return false;
    }).findAny();
}
Did you spot the missing case above? The compiler didn't. At least there is a proposal for pattern matching in Java now.
Update Feb 2021: And now pattern matching for instanceof is being released in Java SE 16
Implicits/given
given is a new mechanism coming in Scala 3 that improves upon the shortcomings of implicits in Scala 2.
Implicits are, I think, the feature in Scala that polarizes opinions the most. You either love them for the boilerplate reduction or hate them because of the perceived complexity and arcane resolution rules (or because someone on your old project went wild with implicit conversions and made a mess).
There are 3 uses of implicit
in Scala 2 at the moment:
- Implicit classes, which allow you to declare extension methods on classes that you either didn't define yourself or are not able to change for backwards compatibility reasons
- Implicit conversions, which allow the compiler to auto-magically convert a value from one type to another
- Implicit parameters, which allow the compiler to inject an instance of a given class at compile-time, based purely on the imports and the implicit scoping rules
The details have been widely covered: Implicit Design Patterns in Scala or Implicits, type classes, and extension methods, to name just two good posts. A small sketch of the three uses follows.
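To make the three uses above concrete, here is a minimal sketch (all the names - StringRepeatOps, Meters, RequestId, log - are made up for illustration):
// 1. Implicit class: adds an extension method to String without modifying String itself
implicit class StringRepeatOps(s: String) {
  def times(n: Int): String = s * n
}
"ab".times(3) // "ababab"

// 2. Implicit conversion (use sparingly): the compiler converts one type to another automatically
import scala.language.implicitConversions
final case class Meters(value: Double)
implicit def doubleToMeters(d: Double): Meters = Meters(d)
val distance: Meters = 1.5 // the compiler inserts doubleToMeters(1.5)

// 3. Implicit parameter: the compiler supplies the argument from the implicit scope
final case class RequestId(value: String)
def log(message: String)(implicit requestId: RequestId): Unit =
  println(s"[${requestId.value}] $message")
implicit val currentRequest: RequestId = RequestId("req-1")
log("handling request") // prints "[req-1] handling request"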
For me, implicits are a great way to change run-time problems into compile-time problems (eliminating errors earlier) - especially when doing dependency injection or encoder/decoder definition.
Let's take the JSON serialization example. In Java, you typically have a runtime mechanism for that, using a reflection-based library like Jackson. If you look at some common Jackson exceptions, you can see things like:
- JsonMappingException: Can not construct instance of
- JsonMappingException: No suitable constructor
- JsonMappingException: No serializer found for class
These can disappear if instead you use a library like circe which can do the work at compile-time:
// model definition
sealed trait SiteUser
final case object Anonymous extends SiteUser
final case class LoggedInUser(id: Int, name: String) extends SiteUser
// json protocol definition
import io.circe.Encoder
import io.circe.generic.extras.Configuration
import io.circe.generic.extras.semiauto._
implicit val configuration: Configuration = Configuration.default.withSnakeCaseMemberNames.withDiscriminator("type")
implicit val siteUserEncoder: Encoder[SiteUser] = deriveEncoder
// usage
import io.circe.syntax._
val anon: SiteUser = Anonymous
val loggedIn: SiteUser = LoggedInUser(42, "bob")
println(anon.asJson)
println(loggedIn.asJson)
The above will print:
{
"type" : "Anonymous"
}
{
"id" : 42,
"name" : "bob",
"type" : "LoggedInUser"
}
If you add more cases to SiteUser or more fields, the derived encoder will keep compiling as long as every field type already has an encoder defined for it; otherwise the code stops compiling instead of failing at runtime. The resolution is done at compile-time because implicit is the mechanism under the hood. This means you will never get a runtime error unless the JSON itself is invalid, and even that case is handled by the fallible conversions returning an Either[Error, A] ADT.
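Decoding works the same way, with failures surfacing as values instead of exceptions. A minimal sketch, assuming the SiteUser model, configuration and semiauto import above are in scope (the exact derivation method name depends on the circe-generic-extras version):
import io.circe.Decoder
import io.circe.parser.decode

implicit val siteUserDecoder: Decoder[SiteUser] = deriveDecoder

decode[SiteUser]("""{ "type" : "Anonymous" }""")               // Right(Anonymous)
decode[SiteUser]("""{ "type" : "LoggedInUser", "id" : 42 }""") // Left(DecodingFailure(...)) - "name" is missing
decode[SiteUser]("not json at all")                            // Left(ParsingFailure(...))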
Macros
Another advanced mechanism that Scala offers for generating code at compile-time is macros (scalameta). You typically won't use the feature directly unless you're a library author, so why is it relevant here?
Anecdote: on a recent project I had a case where the JSON structure of an upstream system and the BigQuery database structure were identical (and were guaranteed to stay identical). To avoid duplicating
code (and writing lots of error-prone manual mapping), I decided to
reuse the case classes from the JSON world in the database mapping.
BigQuery naturally only returned Map[String, Any]
so I had to find a
way to automatically populate the relevant case class. This was a
perfect case for generic derivation (see
type classes and generic derivation
for a more complete example).
I wanted to have a trait (typeclass) that looked like:
// BQ stands for the raw row shape returned by the BigQuery client (a Map[String, Any] here)
// and Error is a small application-specific error type
trait Decoder[A] {
  def decode(bigQueryResult: BQ): Either[Error, A]
}
But not have to write it manually for each case class I wanted to query. Luckily, the magnolia library made it fairly straightforward to create a derivation for that, and I had a working example I could use within two days (for a fairly complex data type with lots of nested lists, maps, etc.). The initial time investment probably was similar to writing the manual mapping by hand, but since the investment is generic over any case class type, no mapping would need to be done manually in the future, saving time and reducing errors.
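For contrast, here is roughly what one hand-written instance looks like for a small case class, assuming BQ is an alias for Map[String, Any] and Error is a minimal error type (both assumptions made for this sketch); generic derivation produces the equivalent of this for every case class in the model:
type BQ = Map[String, Any]
final case class Error(message: String)
final case class Coordinates(latitude: Double, longitude: Double)

implicit val coordinatesDecoder: Decoder[Coordinates] = new Decoder[Coordinates] {
  // Pull a single Double field out of the raw row, failing with a descriptive error
  private def double(row: BQ, name: String): Either[Error, Double] =
    row.get(name) match {
      case Some(d: Double) => Right(d)
      case _               => Left(Error(s"missing or mistyped field: $name"))
    }

  def decode(row: BQ): Either[Error, Coordinates] =
    for {
      lat <- double(row, "latitude")
      lon <- double(row, "longitude")
    } yield Coordinates(lat, lon)
}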
There is no direct equivalent of such a macro library for Java - there is asm, which does bytecode manipulation, but that is much lower-level, and magnolia supports coproducts (sealed traits) and case classes, which do not exist as high-level concepts in Java.
Summary
For me, Scala offers both simple and advanced mechanisms to manage complexity (via ADTs, case classes and pattern matching), to eliminate run-time errors by pushing checks to compile-time, and to stay concise. The reduction in code is significant because it both reduces bugs (no bugs in code that isn't written) and allows you to get up to speed with a new codebase quicker (there is less to read). The functional style of programming and the typelevel ecosystem of libraries make constructing software safer (at compile-time, even before tests are involved) and more productive (due to the high level of genericity and low level of boilerplate). Functional-style code typically allows for better local reasoning, which is easier on the brains of developers and code reviewers.
I also omitted a lot of smaller features that Scala has, such as for comprehensions, companion objects and type aliases, as well as patterns used when writing Scala code, like typeclasses.
One disadvantage is that it will not take one day or even a week to get to know Scala well. I also feel that prototyping a new service while requirements are changing quickly is quite difficult due to the static typing - sometimes it is easier to let everything be a string or a map, in which case a more dynamic language like Python or Clojure may be quicker to use initially.
Please leave comments or take a look at my example-pure-todomvc for a more concrete example of what a Scala microservice implementing TodoMVC looks like.
Update Feb 2021: Have a look at the excellent bootzooka project, which offers an up-to-date and comprehensive template to start building a microservice from.