Where does my validation live?

This is a question I've received over and over again, a question that does not have a single answer. Sometimes when I explain how I approach this I get surprised reactions, so I figure I might as well share it in a blog post and see what other people think.

What makes something valid?

When talking about validation we cannot get around validity itself. What makes something valid? What makes something invalid? Well, it kind of depends on the context in which you view something. When I send malformed HTTP to a web server like NGINX, it will respond with a 400 Bad Request. When I send an HTTP request with a malformed JSON body to that same web server, it will happily forward it to the application layer. While the application will reject the malformed JSON, the web server deemed the HTTP message valid.

Making these distinctions about what is valid in each context is useful: it helps keep responsibilities where they belong. The same dynamics apply to application code. As the application responds to incoming requests, different parts of the code are responsible for different kinds of validation. But what belongs where?

I'll be sharing what works for me, and what I've seen work in larger codebases. As with many things in software engineering, some practices only work well when coupled with other practices. I'll dig into some of these supporting practices for the different styles of validation that I'll be covering.

Validating user input in HTTP calls

I'll be focusing on validation of user input in HTTP calls, since those are the most common cases for me (and I expect they are common for you too). We'll begin our journey at the application layer.

I'll be working under the assumption that you have some sort of layering in your architecture. This post will assume the following layers: 1) the Controller Layer, 2) the Service Layer, and 3) the Domain Layer. Each of these layers has a distinct type of validation. Let's dive in!

Controller validation: shapes and types

Once you hit the controller, the user data is often represented as a request object, or exposed through framework helpers. At this point I limit the validation to shapes and types. To illustrate, let's use an example.

Let's say I'm building an application for a payment service provider. To create a payment, I expect the following JSON body to be posted:

let body = {
    "currency": "EUR",
    "amount": 1000,
    "options": {
        "locale": "nl_NL"
    }
}

In my controller layer, I care about the following aspects:

The provided payload is valid JSON.
The body of the HTTP request should contain a string that can be parsed as JSON.
The currency field is a string.
The currency field should be of type string. It could be FOO or BAR; at this point we don't care about that.
The amount field is an integer.
As anybody working in finance knows, amounts should not be represented as floats, because floating-point arithmetic is not reliable for calculations. So, the amount field MUST be an integer.
The options field is an object.
The options field should be an object with a locale property, which must have a string value.

We now know all the properties, the property types, and the overall structure (shape) of the request. But what happens when the incoming data is not valid?

Whenever controller data validation fails, you should respond with a 400 Bad Request response. The body of the response may contain links to documentation or even provide feedback to the client explaining which fields were incorrectly supplied.

// isValidOptions and BadRequestException are assumed to exist elsewhere.
if (typeof request.currency !== "string"
    || !Number.isInteger(request.amount)
    || !isValidOptions(request.options)) {
    throw new BadRequestException("This is a bad request! No bueno!");
}
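If you want to tell the client which fields were wrong, the check can collect problems instead of stopping at the first one. Here's a minimal sketch of that idea. The function names, the message format, and the response shape are all made up for illustration; your framework will have its own way of building responses.

```typescript
// Hypothetical helper: collects one message per problem instead of stopping at the first.
function findShapeProblems(rawBody: string): string[] {
    let parsed: unknown;
    try {
        parsed = JSON.parse(rawBody);
    } catch {
        return ['body: not valid JSON'];
    }

    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
        return ['body: expected a JSON object'];
    }

    const body = parsed as Record<string, unknown>;
    const problems: string[] = [];

    if (typeof body.currency !== 'string') {
        problems.push('currency: expected a string');
    }
    if (typeof body.amount !== 'number' || !Number.isInteger(body.amount)) {
        problems.push('amount: expected an integer');
    }

    const options = body.options;
    if (typeof options !== 'object' || options === null || Array.isArray(options)) {
        problems.push('options: expected an object');
    } else if (typeof (options as Record<string, unknown>).locale !== 'string') {
        problems.push('options.locale: expected a string');
    }

    return problems;
}

// Assumed response shape; a real framework would have its own response type.
function respondToCreatePayment(rawBody: string): { status: number; errors: string[] } {
    const problems = findShapeProblems(rawBody);

    return problems.length > 0
        ? { status: 400, errors: problems }
        : { status: 201, errors: [] };
}
```

Note that nothing here asks whether EUR is a supported currency. That question belongs to a later layer.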

Supporting practice: map incoming requests to DTOs

Data Transfer Objects (DTOs) are objects with the sole purpose of transferring information. Being an object, a DTO has a shape, and each of its properties has a type. Its responsibilities align perfectly with those of controller validation.

In your controller layer, convert your raw request to a DTO to signal to the rest of the application that the shape and types are now known (and can be trusted). In languages like Java and PHP, DTOs can be represented as classes, which give language-level safety that the DTOs indeed contain validly typed data. In languages like JS or TypeScript, types are not checked at runtime, so wrongly typed data can slip through unnoticed. It is therefore important to apply discipline to ensure only valid data is forwarded to the next layer in your application architecture. Regardless of what tools (or limitations) your language of choice has, distinguishing between unvalidated input and valid input helps reduce complexity and increases clarity for code readers.
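In TypeScript, one way to apply that discipline is to give the DTO a private constructor and a single factory that does the checking. This is a sketch under my own assumptions: the CreatePaymentDto name, the fromUnknown factory, and the flattened locale property are all hypothetical.

```typescript
// Hypothetical DTO. The private constructor means instances are created
// through fromUnknown, which checks shape and types first.
class CreatePaymentDto {
    private constructor(
        readonly currency: string,
        readonly amount: number,
        readonly locale: string,
    ) {}

    static fromUnknown(input: unknown): CreatePaymentDto {
        if (typeof input !== 'object' || input === null) {
            throw new Error('Bad request: expected an object');
        }

        const { currency, amount, options } = input as Record<string, unknown>;

        if (typeof currency !== 'string') {
            throw new Error('Bad request: currency must be a string');
        }
        if (typeof amount !== 'number' || !Number.isInteger(amount)) {
            throw new Error('Bad request: amount must be an integer');
        }
        if (typeof options !== 'object' || options === null) {
            throw new Error('Bad request: options must be an object');
        }

        const { locale } = options as Record<string, unknown>;
        if (typeof locale !== 'string') {
            throw new Error('Bad request: options.locale must be a string');
        }

        return new CreatePaymentDto(currency, amount, locale);
    }
}

// A downstream function can now ask for the DTO instead of `unknown`.
function describePayment(dto: CreatePaymentDto): string {
    return `${dto.amount} ${dto.currency} (${dto.locale})`;
}
```

The parsed request body stays typed as unknown until the factory has looked at it, which makes the boundary between unvalidated and validated input visible in the function signatures.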

Service validation: from types to concepts

At the Service Layer, DTOs are converted into instructions for the Domain Layer. My preferred style of instructions for the Domain Layer is in the form of Value Objects (VOs). VOs represent values in a richer way than scalar types do: they are more expressive and give meaning to the raw data.

VOs convey intent through their names. A string can be a million things, but a PaymentMethod has a pretty specific meaning. As a part of representing the values, VOs have the responsibility to be valid from construction. This is important because it lets the rest of the application trust the value. For this, there needs to be validation.

In languages like Java and PHP, static constructors can be used to turn possibly invalid input into valid value objects. In other languages like JS or TypeScript, some more discipline is required. Still, adding value objects adds value to projects as they give clarity over what data is valid and what the intended use of it is.

class PaymentMethod {
    private constructor(readonly name: string) {}
    public static fromString(name: string): PaymentMethod {
        const valid = ['ideal', 'creditcard', 'banktransfer'];
        
        if ( ! valid.includes(name)) {
            throw new SorryPaymentMethodIsNotValid(name);
        }
        
        return new PaymentMethod(name);
    }
}

In this step we've gone from a valid type to a valid domain concept by using value objects. This step added a level of strictness and a level of clarity to our code, making it easier to grasp what our application is doing.
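To show where this conversion could sit, here's a sketch of a service function that takes a DTO and hands back value objects. The Currency and PaymentLocale classes, the list of supported currencies, and the locale pattern are assumptions for the sake of the example.

```typescript
// Shape produced by the controller layer (assumed field names).
interface CreatePaymentDto {
    readonly currency: string;
    readonly amount: number;
    readonly locale: string;
}

class Currency {
    private constructor(readonly code: string) {}

    static fromString(code: string): Currency {
        // Assumption: the currencies this illustrative application supports.
        const supported = ['EUR', 'USD', 'GBP'];

        if (!supported.includes(code)) {
            throw new Error(`Unsupported currency: ${code}`);
        }

        return new Currency(code);
    }
}

class PaymentLocale {
    private constructor(readonly tag: string) {}

    static fromString(tag: string): PaymentLocale {
        // Assumption: locales look like "nl_NL" (language_COUNTRY).
        if (!/^[a-z]{2}_[A-Z]{2}$/.test(tag)) {
            throw new Error(`Malformed locale: ${tag}`);
        }

        return new PaymentLocale(tag);
    }
}

// Hypothetical service function: scalars in, domain concepts out.
function toDomainInstruction(
    dto: CreatePaymentDto,
): { currency: Currency; locale: PaymentLocale; amount: number } {
    return {
        currency: Currency.fromString(dto.currency),
        locale: PaymentLocale.fromString(dto.locale),
        amount: dto.amount,
    };
}
```

A currency of FOO passes the controller (it is a string) and is rejected here, because this is the first place that knows what a currency is.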

Supporting practice: use Value Objects

This one will come to you as no surprise. I just wanted to re-emphasise that value objects give great value when it comes to modelling with validity of data in mind. If you haven't before, learn about them and introduce them into your codebase.

How about only accepting value objects?

In the wild I've seen two schools of thought regarding service layer validation. One is to push this responsibility to the application layer, accepting only value objects (VOs) in the service layer. This decreases the responsibility of your service and the number of edge cases it needs to deal with. It also increases the responsibility of the calling code, which is not always a good thing. By increasing the external responsibility you decrease the leverage, or value, that a layer provides.

The approach I've described converts DTOs into value objects. The previously anonymous strings and integers are converted into more meaningful concepts represented as code. I personally like this approach the most because it designs for leverage: the further you push down complexity, the more parts of your system benefit from it. That said, the further you push it down, the more translation of error messages you'll need to do to inform the upper layers of what is going on. It's not a silver bullet approach, but it generally works well for me.
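The translation cost looks roughly like this. The sketch assumes a service that accepts a plain string and a controller that maps the resulting exception onto a response. The function names, the response shape, and the choice of a 422 status are my assumptions, not a rule.

```typescript
class SorryPaymentMethodIsNotValid extends Error {
    constructor(readonly paymentMethod: string) {
        super(`Payment method "${paymentMethod}" is not valid`);
    }
}

class PaymentMethod {
    private constructor(readonly name: string) {}

    static fromString(name: string): PaymentMethod {
        const valid = ['ideal', 'creditcard', 'banktransfer'];

        if (!valid.includes(name)) {
            throw new SorryPaymentMethodIsNotValid(name);
        }

        return new PaymentMethod(name);
    }
}

// Hypothetical service: accepts a scalar and builds the value object itself.
function choosePaymentMethod(name: string): PaymentMethod {
    return PaymentMethod.fromString(name);
}

// Hypothetical controller: translates the domain error into client feedback.
// Using 422 here is an assumption; some teams would pick 400 instead.
function handleChoosePaymentMethod(name: string): { status: number; message: string } {
    try {
        const method = choosePaymentMethod(name);

        return { status: 200, message: `Selected ${method.name}` };
    } catch (error) {
        if (error instanceof SorryPaymentMethodIsNotValid) {
            return {
                status: 422,
                message: `paymentMethod: "${error.paymentMethod}" is not supported`,
            };
        }

        throw error; // unknown failures are not ours to translate
    }
}
```

The service stays easy to call from anywhere, and the price is the catch block in the controller.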

Domain Layer: guarding invariants

In the domain layer we're all about the business rules and guarding so-called invariants. Invariants are states or conditions that always hold true. It's a bit of a weird word, so I often describe it as enforcing business rules. These business rules are domain specific. For example, a digital wallet may want to protect against overspending, while a package delivery service may want to ensure a package is scheduled within a certain timeframe. For the application to respect the rules of the business it needs to enforce them by doing checks.

By now all of our input has been converted into meaningful value objects that are valid. In the domain code we add another layer of validity: contextual validity. To illustrate, let's explore the use-case of making a purchase with a bank account.

Somewhere in the process of making a payment using a bank account (through iDeal, for example) the bank receives an instruction to pay a certain amount. The amount can be represented as a numeric value, combined with a currency.

class Amount
{
    constructor(
        readonly value: number,
        readonly currency: Currency,
    ) {}
}
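As written, that constructor accepts any number. Here's a sketch of how it could check its input at construction time. I'm assuming the value is expressed in minor units (cents), that negative amounts make no sense here, and that Currency is a small union type; none of that is prescribed by the class above.

```typescript
// Assumption: currencies modelled as a small union for this sketch.
type Currency = 'EUR' | 'USD';

class Amount {
    constructor(
        readonly value: number,
        readonly currency: Currency,
    ) {
        // Assumption: value is in minor units (e.g. cents), so it must be a whole number.
        if (!Number.isInteger(value)) {
            throw new RangeError(`Amount must be an integer, got ${value}`);
        }

        // Assumption: negative amounts are not meaningful for a payment instruction.
        if (value < 0) {
            throw new RangeError(`Amount must not be negative, got ${value}`);
        }
    }
}
```

With checks like these, any Amount that exists has already passed them, so code further down does not need to repeat them.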

The Amount value object ensures that the amount is valid, but that says nothing about the customer's balance. This is where the domain layer upholds constraints. It validates that the action can be taken while keeping the business rules happy.

Our bank account domain model ensures we do not allow overspending:

class BankAccount
{
    constructor(private balance: number) {}

    public spend(amount: Amount): boolean
    {
        if (this.balance < amount.value) {
            return false;
        }

        this.balance -= amount.value;

        return true;
    }
}

Supporting practice: guard at the start

Always enforce the business rules before changing any data. This ensures your entities are always in a consistent state. Encapsulate the checks as guard methods so they can be re-used. Prefixing their names with guardAgainst can help you articulate which conditions you're trying to prevent.

class BankAccount
{
    constructor(private balance: number) {}

    public spend(amount: Amount): void
    {
        // guard first
        this.guardAgainstOverspending(amount);

        // change data later
        this.balance -= amount.value;
    }

    private guardAgainstOverspending(amount: Amount): void
    {
        if (this.balance < amount.value) {
            throw new SorryNotEnoughBalanceForSpending(amount);
        }
    }
}
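To see what guarding first buys you, here's a self-contained sketch in which the account tracks a second piece of state. The spendCount field and the snapshot method are hypothetical additions, and Amount is simplified to a bare value.

```typescript
// Simplified stand-in for the Amount value object (currency left out).
class Amount {
    constructor(readonly value: number) {}
}

class SorryNotEnoughBalanceForSpending extends Error {
    constructor(readonly amount: Amount) {
        super(`Not enough balance to spend ${amount.value}`);
    }
}

class BankAccount {
    // Hypothetical second piece of state, to show that nothing changes on failure.
    private spendCount = 0;

    constructor(private balance: number) {}

    spend(amount: Amount): void {
        this.guardAgainstOverspending(amount); // guard first

        this.balance -= amount.value; // change data later
        this.spendCount += 1;
    }

    snapshot(): { balance: number; spendCount: number } {
        return { balance: this.balance, spendCount: this.spendCount };
    }

    private guardAgainstOverspending(amount: Amount): void {
        if (this.balance < amount.value) {
            throw new SorryNotEnoughBalanceForSpending(amount);
        }
    }
}
```

Because the guard runs before either field is touched, a rejected spend leaves both the balance and the counter exactly as they were.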

Where to draw the line?

Sometimes it's not super clear whether certain validation belongs in the controller layer or somewhere else. Examples of this are the validation of enums, UUIDs, and unsigned integers. A UUID can be represented as a string or as a UUID object, but which is right? Languages that support unsigned integers prevent DTO construction when the input number is below zero. How should other languages treat this? In these cases, there's not really a right or wrong answer. It's mostly about preferences and consistency. I usually prefer to push complexity down to increase the leverage; others may validate more strictly at the edge to reduce the number of error scenarios in the core domain. When using a language that supports rich object construction, validating at the edge may even give you the most leverage. There's really no hard line you can enforce. Even the programming language you choose influences where the optimal line is drawn.

In many of my applications I've observed the following convention to be the most pleasant:

Pass domain information to the service layer using scalar types.
When receiving domain-specific information, such as an identifier of a payment method, pass it down to the next layer as a string.
Pass system information to the service layer using value objects.
When receiving application or system information, such as a UUID, pass it down as a VO.
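In code, this convention could look like the sketch below: the service method receives the payment identifier as a value object and the payment method as a plain string. The Uuid class, the PaymentService, and its method name are hypothetical, and the UUID check only looks at the textual layout.

```typescript
class Uuid {
    private constructor(readonly value: string) {}

    static fromString(value: string): Uuid {
        // Checks only the canonical 8-4-4-4-12 hex layout, not version or variant bits.
        const layout = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

        if (!layout.test(value)) {
            throw new Error(`Not a UUID: ${value}`);
        }

        return new Uuid(value.toLowerCase());
    }
}

// Hypothetical service: system information arrives as a value object,
// domain information arrives as a scalar and is validated here.
class PaymentService {
    private readonly supportedMethods = ['ideal', 'creditcard', 'banktransfer'];

    selectPaymentMethod(paymentId: Uuid, paymentMethod: string): string {
        if (!this.supportedMethods.includes(paymentMethod)) {
            throw new Error(`Unsupported payment method: ${paymentMethod}`);
        }

        return `payment ${paymentId.value} will use ${paymentMethod}`;
    }
}
```

The controller would call Uuid.fromString before invoking the service, so a malformed identifier never reaches it, while the question of which payment methods exist stays with the service.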

Another category of questionable cases is enums. These are concrete types that represent a finite set of options. You could represent a currency as an enum, as there is not an infinite number of currencies (although crypto enthusiasts may beg to differ). Enum validation can be seen as type validation or as value validation.
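In TypeScript, a union of string literals with a type guard can play the role of an enum. The sketch below assumes a deliberately short list of currencies. Whether you call the guard in the controller (type validation) or in the service (value validation) is exactly the judgment call described above.

```typescript
// Assumption: an illustrative, deliberately short list of currencies.
const CURRENCIES = ['EUR', 'USD', 'GBP'] as const;

// The union type 'EUR' | 'USD' | 'GBP', derived from the list above.
type Currency = (typeof CURRENCIES)[number];

// Type guard: once it returns true, the compiler treats `value` as a Currency.
function isCurrency(value: unknown): value is Currency {
    return typeof value === 'string'
        && (CURRENCIES as readonly string[]).includes(value);
}

// Hypothetical use: the same guard works at whichever layer you pick.
function requireCurrency(value: unknown): Currency {
    if (!isCurrency(value)) {
        throw new Error(`Not a supported currency: ${String(value)}`);
    }

    return value;
}
```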

Command and Query objects

The use of command and query objects influences where and how validation is performed. These objects, used to interact with services, care more about the data semantics than DTOs do. There are generally two schools of thought: one dictates that these objects should only contain scalar values, the other praises the use of value objects. In the latter case you elevate the Service Layer validation (from types to values) to the controller layer. This can reduce the number of places where validation needs to happen.

In this setup, you still push the complexity down, but not in the same direction as the domain complexities. Rich domain commands benefit from standardised mapping from input to commands. A robust feedback mechanism for hydration errors is an absolute must for a pleasant user experience. Setting up a structure like that deserves a blog post of its own.
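To give a rough idea of what such hydration could look like, here's a small sketch that builds a command out of value objects and collects one error per field instead of failing on the first. Every name in it (CreatePaymentCommand, hydrateCreatePayment, the result shape) is hypothetical, and a real setup would need far more than this.

```typescript
class Currency {
    private constructor(readonly code: string) {}

    static fromString(code: string): Currency {
        // Assumption: illustrative list of supported currencies.
        if (!['EUR', 'USD'].includes(code)) {
            throw new Error(`Unsupported currency: ${code}`);
        }
        return new Currency(code);
    }
}

class PaymentMethod {
    private constructor(readonly name: string) {}

    static fromString(name: string): PaymentMethod {
        if (!['ideal', 'creditcard', 'banktransfer'].includes(name)) {
            throw new Error(`Unsupported payment method: ${name}`);
        }
        return new PaymentMethod(name);
    }
}

// A rich command: it can only be built from valid value objects.
class CreatePaymentCommand {
    constructor(
        readonly currency: Currency,
        readonly paymentMethod: PaymentMethod,
    ) {}
}

type HydrationResult =
    | { ok: true; command: CreatePaymentCommand }
    | { ok: false; errors: Record<string, string> };

// Hypothetical hydrator: tries every field and reports all failures at once.
function hydrateCreatePayment(input: { currency: string; paymentMethod: string }): HydrationResult {
    const errors: Record<string, string> = {};
    let currency: Currency | undefined;
    let paymentMethod: PaymentMethod | undefined;

    try {
        currency = Currency.fromString(input.currency);
    } catch (error) {
        errors.currency = error instanceof Error ? error.message : 'invalid';
    }

    try {
        paymentMethod = PaymentMethod.fromString(input.paymentMethod);
    } catch (error) {
        errors.paymentMethod = error instanceof Error ? error.message : 'invalid';
    }

    if (currency === undefined || paymentMethod === undefined) {
        return { ok: false, errors };
    }

    return { ok: true, command: new CreatePaymentCommand(currency, paymentMethod) };
}
```

The controller can turn the errors map straight into per-field feedback, while the service only ever sees a fully valid command.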

Conclusion

In short, validation takes on many shapes and forms, from coarse validation of shape and types to domain-specific and contextual business conditions. In this post I've shared my approach to input validation at the different points in a layered architecture. Try to use DTOs to represent data which has a valid shape and type. Try to use VOs to represent data which is valid for the domain. Use guards to ensure business rules are respected in the domain layer.