Dynamic Multi Tenancy with Spring Boot, Hibernate and Liquibase Part 1

19 September 2020 // Björn Beskow

Multi Tenancy usually plays an important role in the business case for SaaS solutions. Spring Boot and Hibernate provide out-of-the-box support for different Multi Tenancy strategies. Configuration, however, becomes more complicated, and the available code examples are limited. In this first part of the blog series, we’ll explore the Multi Tenancy concept and three different architectural patterns for multi tenant data isolation. In the forthcoming episodes, we’ll dive deep into the details of implementing the different patterns using Spring Boot, Spring Data and Liquibase.


What is Multi tenancy?

By letting one single software solution serve many different customers, a vendor can build an offering that is scalable, elastic, agile and cost effective. A software architecture in which a (logically) single instance of the software serves multiple tenants is frequently called a multi-tenancy architecture. A tenant is a group of users who share common access, with specific privileges, to the software instance. Everything should be shared except the different customers’ data, which should be properly separated. Although they share resources, tenants aren’t aware of each other, and their data is kept totally separate.


Conflicting requirements

As usual with architectural patterns, a multi-tenant architecture has to balance two partly conflicting needs or forces: On one hand, we would like to share as much as possible, in order to achieve:

Better use of resources: One machine reserved for one tenant isn’t efficient, as that one tenant is not likely to use all of the machine’s computing power. By sharing machines among multiple tenants, use of available resources is maximized.
Lower costs: With multiple customers sharing resources, a vendor can offer their services to many customers at a much lower cost than if each customer required their own dedicated infrastructure.
Elasticity and Agility: With a shared infrastructure, onboarding new tenants can be much easier, quicker and cost efficient.

On the other hand, we would like to have a fool-proof separation between tenants, in order to guarantee the privacy, confidentiality and consistency of each tenant’s data. We also have to avoid the problem of “noisy neighbors”, where a misbehaving tenant can disturb its neighboring tenants.

Multi tenancy patterns

As we can see, the challenge lies in separating the data for each tenant while still sharing as much as possible of the other resources. Three principal architectural patterns for Multi Tenancy can be identified, which differ in the degree of (physical) separation of the tenants’ data.

Database per tenant


The most obvious way to separate the data owned by different tenants is to use a separate database per tenant. Using this pattern, the data is physically isolated per tenant, and hence the privacy and confidentiality of the data can easily be guaranteed (including administrative housekeeping such as backups and cleansing). The tradeoff is equally obvious, since the database infrastructure as well as database connection pools cannot be shared between tenants.
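To make the routing idea concrete, here is a minimal sketch in plain Java of how an application might look up the database belonging to a tenant. The class name `TenantDatabaseRegistry`, the tenant IDs and the JDBC URLs are hypothetical and chosen for illustration only; a real application would typically hold one connection pool per tenant rather than a bare URL, which is exactly the cost this pattern incurs.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical registry for the Database-per-tenant pattern:
 * every tenant ID maps to the JDBC URL of its own database.
 */
final class TenantDatabaseRegistry {

    private final Map<String, String> jdbcUrlByTenant = new ConcurrentHashMap<>();

    /** Registers (or replaces) the database used by the given tenant. */
    void register(String tenantId, String jdbcUrl) {
        Objects.requireNonNull(tenantId, "tenantId");
        Objects.requireNonNull(jdbcUrl, "jdbcUrl");
        jdbcUrlByTenant.put(tenantId, jdbcUrl);
    }

    /** Returns the tenant's JDBC URL, failing fast for unknown tenants. */
    String jdbcUrlFor(String tenantId) {
        Objects.requireNonNull(tenantId, "tenantId");
        String jdbcUrl = jdbcUrlByTenant.get(tenantId);
        if (jdbcUrl == null) {
            // Never fall back to another tenant's database.
            throw new IllegalArgumentException("Unknown tenant: " + tenantId);
        }
        return jdbcUrl;
    }
}
```

Failing fast on an unknown tenant, instead of falling back to some default database, is a deliberate choice in this sketch: a silent fallback would undermine the isolation that motivates the pattern.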

Schema per tenant


A slight variation is to use a separate database schema per tenant, while sharing the database instance. The data for each tenant is logically isolated by the semantics of separate schemas as provided by the database engine. If each schema is owned by a separate database user per tenant, the database engine’s security mechanisms further guarantee the privacy and confidentiality of the data (note however that in such a case, the database connection pool cannot be reused by the data access layer).
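As a rough illustration of schema switching on a shared connection, the sketch below derives a schema name from a tenant ID and applies it with the standard JDBC method `Connection.setSchema`. The class name `TenantSchemas`, the `tenant_` prefix and the naming rule are assumptions made for this example, not something prescribed by the pattern. Note that a JDBC driver without schema support may silently ignore `setSchema`.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.regex.Pattern;

/** Hypothetical helper for the Schema-per-tenant pattern. */
final class TenantSchemas {

    // Assumed naming rule: lowercase letter first, then letters, digits or underscores.
    private static final Pattern SAFE_TENANT_ID = Pattern.compile("[a-z][a-z0-9_]{0,30}");

    private TenantSchemas() {
    }

    /** Maps a tenant ID to its schema name, rejecting anything that is not a plain identifier. */
    static String schemaFor(String tenantId) {
        if (tenantId == null || !SAFE_TENANT_ID.matcher(tenantId).matches()) {
            throw new IllegalArgumentException("Invalid tenant id: " + tenantId);
        }
        return "tenant_" + tenantId;
    }

    /** Points an already opened (possibly pooled) connection at the tenant's schema. */
    static void useTenantSchema(Connection connection, String tenantId) throws SQLException {
        connection.setSchema(schemaFor(tenantId));
    }
}
```

Validating the tenant ID before it becomes part of a schema name matters because the ID often originates from a request (a header, a subdomain or a token) and should not be trusted blindly.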

Shared database, using a Discriminator Column


The final pattern uses a fully shared database, in which the data for all tenants is stored in the same table(s). An additional discriminator column is added to each table, and it must be included in an additional WHERE clause in each and every query. This pattern provides the least data separation (leaving it to the application to guarantee privacy and confidentiality for the tenants’ data) but the maximum sharing of resources. From an infrastructure perspective, it is conceptually the simplest solution, whereas the complexity is pushed into the application. Since data is not separated at the database level, administrative housekeeping such as backups per tenant becomes more difficult.
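The extra predicate can be sketched with plain JDBC as follows. The table `product`, its columns and the discriminator column name `tenant_id` are hypothetical, as is the class name; the point is only that the tenant ID is bound as a parameter in every query, alongside the query's own conditions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical data access code for the shared-database pattern. */
final class TenantScopedProductQueries {

    // The discriminator predicate (tenant_id = ?) accompanies the business predicate.
    static final String FIND_NAMES_BY_CATEGORY =
            "SELECT name FROM product WHERE tenant_id = ? AND category = ?";

    private TenantScopedProductQueries() {
    }

    /** Returns the product names in a category, restricted to the given tenant's rows. */
    static List<String> findNamesByCategory(Connection connection, String tenantId, String category)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(FIND_NAMES_BY_CATEGORY)) {
            statement.setString(1, tenantId);
            statement.setString(2, category);
            try (ResultSet rows = statement.executeQuery()) {
                List<String> names = new ArrayList<>();
                while (rows.next()) {
                    names.add(rows.getString("name"));
                }
                return names;
            }
        }
    }
}
```

Writing the predicate by hand in every query is error prone, since a single forgotten condition leaks data across tenants. That is why this pattern is usually paired with some mechanism that adds the predicate automatically.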

Choosing a Multi Tenancy pattern

The three patterns above thus come with different pros and cons, and the choice between them will be governed by the requirements of a particular solution. Database-per-tenant provides very strong data isolation between tenants, but requires more infrastructural resources and administrative work in setting up new tenants and performing database migrations. Hence there is an upper limit on the scalability of the Database-per-tenant pattern, both in size and in the time required to onboard new tenants. Shared-database-with-Discriminator-column provides maximal sharing of infrastructural resources and hence excellent scalability, but with data isolation between tenants only guaranteed by the application layer.

If you have a smaller number of tenants (< 1000) and require strong guarantees for tenant data isolation, Database-per-tenant and Schema-per-tenant are the most frequent choices. Among them, Schema-per-tenant is usually a good balance between data separation and resource sharing. If you have a large number of tenants, Shared-database-with-Discriminator-column might be the only viable solution.

Sometimes, the most pragmatic approach is a mixed model, supporting different customer segments using different models.
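A mixed model could, for instance, be represented as a default pattern plus per-tenant overrides. The sketch below is only one possible shape; the type names `IsolationPattern` and `TenantIsolationPolicy` and the tenant IDs in the usage note are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** The three isolation patterns discussed in this post. */
enum IsolationPattern {
    DATABASE_PER_TENANT,
    SCHEMA_PER_TENANT,
    SHARED_DATABASE_WITH_DISCRIMINATOR
}

/** Hypothetical policy: a default pattern, with overrides for selected tenants. */
final class TenantIsolationPolicy {

    private final IsolationPattern defaultPattern;
    private final Map<String, IsolationPattern> overrides = new ConcurrentHashMap<>();

    TenantIsolationPolicy(IsolationPattern defaultPattern) {
        this.defaultPattern = Objects.requireNonNull(defaultPattern, "defaultPattern");
    }

    /** Assigns a specific pattern to one tenant, for example a customer needing stronger isolation. */
    void assign(String tenantId, IsolationPattern pattern) {
        overrides.put(
                Objects.requireNonNull(tenantId, "tenantId"),
                Objects.requireNonNull(pattern, "pattern"));
    }

    /** Returns the tenant's assigned pattern, or the default when none is assigned. */
    IsolationPattern patternFor(String tenantId) {
        return overrides.getOrDefault(Objects.requireNonNull(tenantId, "tenantId"), defaultPattern);
    }
}
```

With such a policy, the bulk of small tenants could default to `SHARED_DATABASE_WITH_DISCRIMINATOR`, while a call like `policy.assign("big-bank", IsolationPattern.DATABASE_PER_TENANT)` moves a single demanding customer to its own database.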

Summary

In this blog post, we have explored the Multi Tenancy concept and discussed three different architectural patterns for multi tenant data isolation. In the next part, we’ll dive into an implementation strategy for Multi Tenant Data Access using Spring Boot, Spring Data, Hibernate and Liquibase that allows us to implement the different multi tenant patterns transparently and efficiently.