Blog
Here you will find technical articles, presentations, and news about architecture and system development. Stay up to date, follow us on LinkedIn
With Spring Boot 3.2 and Spring Framework 6.1, we get support for Coordinated Restore at Checkpoint (CRaC), a mechanism that enables Java applications to start up faster. With Spring Boot, we can use CRaC in a simplified way, known as Automatic Checkpoint/Restore at startup. Even though this approach is not as powerful as the standard way of using CRaC, this blog post will show an example where the startup time of Spring Boot applications is reduced by 90%. The sample applications are based on chapter 6 in my book on building microservices with Spring Boot.
The blog post is divided into the following sections:
Let’s start learning about CRaC and its benefits and challenges.
Coordinated Restore at Checkpoint (CRaC) is a feature in OpenJDK, initially developed by Azul, to enhance the startup performance of Java applications by allowing them to restore to a previously saved state quickly. CRaC enables Java applications to save their state at a specific point in time (checkpoint) and then restore from that state at a later time. This is particularly useful for scenarios where fast startup times are crucial, such as serverless environments, microservices, and, in general, applications that must be able to scale up their instances quickly and also support scale-to-zero when not being used.
This introduction will first explain a bit about how CRaC works, then discuss some of the challenges and considerations associated with it, and finally, describe how Spring Boot 3.2 integrates with it. The introduction is divided into the following subsections:
Checkpoint Creation: At a chosen point during the application’s execution, a checkpoint is created. This involves capturing the entire state of the Java application, including the heap, stack, and all active threads. The state is then serialized and saved to the file system. During the checkpoint process, the application is typically paused to ensure a consistent state is captured. This pause is coordinated to minimize disruption and ensure the application can resume correctly.
Before taking the checkpoint, some requests are usually sent to the application to ensure that it is warmed up, i.e., all relevant classes are loaded, and the JVM HotSpot engine has had a chance to optimize the bytecode according to how it is being used in runtime.
Commands to perform a checkpoint:
java -XX:CRaCCheckpointTo=<some-folder> -jar my_app.jar
# Make calls to the app to warm up the JVM...
jcmd my_app.jar JDK.checkpoint
State Restoration: When the application is started from the checkpoint, the previously saved state is deserialized from the file system and loaded back into memory. The application then continues execution from the exact point where the checkpoint was taken, bypassing the usual startup sequence.
Command to restore from a checkpoint:
java -XX:CRaCRestoreFrom=<some-folder>
Restoring from a checkpoint allows applications to skip the initial startup process, including class loading, warmup initialization, and other startup routines, significantly reducing startup times.
For more information, see Azul’s documentation: What is CRaC?.
As with any new technology, CRaC comes with a new set of challenges and considerations:
State Management: Open files and connections to external resources, such as databases, must be closed before the checkpoint is taken. After the restore, they must be reopened.
CRaC exposes a Java lifecycle interface that applications can use to handle this, org.crac.Resource, with the callback methods beforeCheckpoint and afterRestore.
Sensitive information: Credentials and secrets stored in the JVM’s memory will be serialized into the files created by the checkpoint. Therefore, these files need to be protected. An alternative is to run the checkpoint command against a temporary environment that uses other credentials and replace the credentials on restore.
Linux dependency: The checkpoint technique is based on a Linux feature called CRIU, “Checkpoint/Restore In Userspace”. This feature only works on Linux, so the easiest way to test CRaC on a Mac or a Windows PC is to package the application into a Linux Docker image.
Linux privileges required: CRIU requires special Linux privileges. As a result, the Docker commands that build the Docker images and create the Docker containers also need these privileges to be able to run.
Storage Overhead: Storing and managing checkpoint data requires additional storage resources, and the checkpoint size can impact the restoration time. The original jar file is also required to be able to restart a Java application from a checkpoint.
I will describe how to handle these challenges in the section on creating Docker images.
Spring Boot 3.2 (and the underlying Spring Framework) helps with closing and reopening connections to external resources. Before the creation of the checkpoint, Spring stops all running beans, giving them a chance to close resources if needed. After a restore, the same beans are restarted, allowing them to reopen connections to the resources.
The only thing that needs to be added to a Spring Boot 3.2-based application is a dependency on the crac library. Using Gradle, it looks like the following in the build.gradle file:
dependencies {
    implementation 'org.crac:crac'
}
Note: The normal Spring Boot BOM mechanism takes care of versioning the crac dependency.
The automatic closing and reopening of connections handled by Spring Boot usually works. Unfortunately, when this blog post was written, some Spring modules lacked this support. To track the state of CRaC support in the Spring ecosystem, a dedicated test project, Spring Lifecycle Smoke Tests, has been created. The current state can be found on the project’s status page.
If required, an application can register callback methods to be called before a checkpoint and after a restore by implementing the above-mentioned Resource interface. The microservices used in this blog post have been extended to register callback methods to demonstrate how they can be used. The code looks like this:
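import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyApplication implements Resource {

    // Assumes SLF4J, which Spring Boot applications use by default
    private static final Logger LOG = LoggerFactory.getLogger(MyApplication.class);

    public MyApplication() {
        // Register this instance so that CRaC invokes the callbacks below
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) {
        LOG.info("CRaC's beforeCheckpoint callback method called...");
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) {
        LOG.info("CRaC's afterRestore callback method called...");
    }
}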
Spring Boot 3.2 provides a simplified alternative for taking a checkpoint compared to the default on-demand alternative described above. It is called automatic checkpoint/restore at startup. It is triggered by adding the JVM system property -Dspring.context.checkpoint=onRefresh to the java -jar command. When set, a checkpoint is created automatically when the application is started. The checkpoint is created after Spring beans have been created but not started, i.e., after most of the initialization work but before the application starts. For details, see Spring Boot docs and Spring Framework docs.
With an automatic checkpoint, we don’t get a fully warmed-up application, and the runtime configuration must be specified at build time. This means that the resulting Docker images will be runtime-specific and contain sensitive information from the configuration, like credentials and secrets. Therefore, the Docker images must be stored in a private and protected container registry.
Note: If this doesn’t meet your requirements, you can opt for the on-demand checkpoint, which I will describe in the next blog post.
With CRaC and Spring Boot 3.2’s support for CRaC covered, let’s see how we can create Docker images for Spring Boot applications that use CRaC.
While learning how to use CRaC, I studied several blog posts on using CRaC with Spring Boot 3.2 applications. They all use rather complex bash scripts (depending on your bash experience) using Docker commands like docker run, docker exec, and docker commit. Even though they work, it seems like an unnecessarily complex solution compared to producing a Docker image using a Dockerfile.
So, I decided to develop a Dockerfile that runs the checkpoint command as a RUN command in the Dockerfile. It turned out to have its own challenges, as described below. I will begin by describing my initial attempt and then explain the problems I stumbled into and how I solved them, one by one, until I reach a fully working solution. The walkthrough is divided into the following subsections:
Let’s start with a first attempt and see where it leads us.
My initial plan was to create a Dockerfile based on a multi-stage build, where the first stage creates the checkpoint using a JDK-based base image, and the second stage uses a JRE-based base image for runtime. However, while writing this blog post, I failed to find a base image for a Java 21 JRE supporting CRaC. So I decided to use a regular single-stage Dockerfile instead, using a base image from Azul: azul/zulu-openjdk:21.0.3-21.34-jdk-crac
Note: Bellsoft also provides base images for CRaC; see Liberica JDK with CRaC Support as an alternative to Azul.
The first version of the Dockerfile looks like this:
FROM azul/zulu-openjdk:21.0.3-21.34-jdk-crac
ADD build/libs/*.jar app.jar
RUN java -Dspring.context.checkpoint=onRefresh -XX:CRaCCheckpointTo=checkpoint -jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-XX:CRaCRestoreFrom=checkpoint"]
Unfortunately, this Dockerfile cannot be used as is, since taking a CRaC checkpoint requires the build to run privileged commands.
As mentioned in the section 1.2. Challenges and Considerations, CRIU, which CRaC is based on, requires special Linux privileges to perform a checkpoint. The standard docker build command doesn’t allow privileged builds, so it can’t be used to build Docker images using the above Dockerfile.
Note: The --privileged flag that can be used in docker run commands is not supported by docker build.
Fortunately, Docker provides an improved builder backend called BuildKit. Using BuildKit, we can create a custom builder that is insecure, meaning it allows a Dockerfile to run privileged commands. To communicate with BuildKit, we can use Docker’s CLI tool buildx.
The following command can be used to create an insecure builder named insecure-builder:
docker buildx create --name insecure-builder --buildkitd-flags '--allow-insecure-entitlement security.insecure'
Note: The builder runs in isolation within a Docker container created by the docker buildx create command. You can run a docker ps command to reveal the container. When the builder is no longer required, it can be removed with the command: docker buildx rm insecure-builder.
The insecure builder can be used to build a Docker image with a command like:
docker buildx --builder insecure-builder build --allow security.insecure --load .
Note: The --load flag loads the built image into the regular local Docker image cache. Since the builder runs in an isolated container, its result will not end up in the regular local Docker image cache by default.
RUN commands in a Dockerfile that require privileges must specify the --security=insecure flag directly after the RUN keyword. The --security flag is only in preview and must therefore be enabled in the Dockerfile by adding the following line as the first line in the Dockerfile:
# syntax=docker/dockerfile:1.3-labs
For more details on BuildKit and docker buildx, see Docker Build architecture.
We can now perform the build; however, the way CRaC is implemented stops the build, as we will learn in the next section.
On a successful checkpoint, the java -Dspring.context.checkpoint=onRefresh -XX:CRaCCheckpointTo... command is terminated forcefully (like using kill -9) and returns the exit status 137 instead of 0, causing the Docker build command to fail.
To prevent the build from stopping, the java command is extended with a test that verifies that 137 is returned and, if so, returns 0 instead. The following is added to the java command: || if [ $? -eq 137 ]; then return 0; else return 1; fi.
Note: || means that the command following it will be executed if the first command fails.
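The exit-status handling described above can also be sketched as a small shell function. This is a minimal sketch and not the code used in the Dockerfile: the function name run_expecting_kill is hypothetical, and unlike the one-liner it passes any other non-zero status through unchanged instead of mapping it to 1. It relies on the shell convention that a process killed by signal N reports the status 128 + N, so SIGKILL (9) shows up as 137.

```shell
#!/bin/sh
# Hypothetical helper (not from the original Dockerfile): run a command and
# treat exit status 137 (128 + SIGKILL) as success.
run_expecting_kill() {
  status=0
  "$@" || status=$?          # capture the status without aborting under `set -e`
  if [ "$status" -eq 137 ]; then
    return 0                 # the process was killed, as expected after a checkpoint
  fi
  return "$status"           # 0 stays 0; any other failure is passed through
}
```

For example, run_expecting_kill sh -c 'kill -9 $$' succeeds, while run_expecting_kill sh -c 'exit 3' still fails with status 3.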
With CRaC working in a Dockerfile, let’s move on and learn about the challenges with runtime configuration and how to handle them.
Using Spring Boot’s automatic checkpoint/restore at startup, there is no way to specify runtime configuration on restore; at least, I haven’t found a way to do it. This means that the runtime configuration has to be specified at build time. Sensitive information from the runtime configuration, such as credentials used for connecting to a database, will be written to the checkpoint files. Since the Docker images will contain these checkpoint files, they also need to be handled in a secure way.
The Spring Framework documentation contains a warning about this, copied from the section Automatic checkpoint/restore at startup:
As mentioned above, and especially in use cases where the CRaC files are shipped as part of a deployable artifact (a container image, for example), operate with the assumption that any sensitive data “seen” by the JVM ends up in the CRaC files, and assess carefully the related security implications.
So, let’s assume that we can protect the Docker images, for example, in a private registry with proper authorization in place and that we can specify the runtime configuration at build time.
In Chapter 6 of the book, the source code specifies the runtime configuration in the configuration files, application.yml, in a Spring profile named docker.
The RUN command, which performs the checkpoint, has been extended to include an environment variable that declares what Spring profile to use: SPRING_PROFILES_ACTIVE=docker.
Note: If you have the runtime configuration in a separate file, you can add the file to the Docker image and point it out using an environment variable like SPRING_CONFIG_LOCATION=file:runtime-configuration.yml.
With the challenges of proper runtime configuration covered, we have only one problem left to handle: Spring Data JPA’s lack of support for CRaC without some extra work.
Spring Data JPA does not work out of the box with CRaC, as documented in the Smoke Tests project; see the section about Prevent early database interaction. This means that auto-creation of database tables when starting up the application is not possible when using CRaC. Instead, the creation has to be performed outside of the application startup process.
Note: This restriction does not apply to embedded SQL databases. For example, the Spring PetClinic application works with CRaC without any modifications since it uses an embedded SQL database by default.
To address these deficiencies, the following changes have been made in the source code of Chapter 6:
Manual creation of a SQL DDL script, create-tables.sql
Since we can no longer rely on the application to create the required database tables, a SQL DDL script has been created. To enable the application to create the script file, a Spring profile create-ddl-script has been added in the review microservice’s configuration file, microservices/review-service/src/main/resources/application.yml. It looks like:
spring.config.activate.on-profile: create-ddl-script
spring.jpa.properties.jakarta.persistence.schema-generation:
  create-source: metadata
  scripts:
    action: create
    create-target: crac/sql-scripts/create-tables.sql
The SQL DDL file has been created by starting the MySQL database and, next, the application with the new Spring profile. Once connected to the database, the application and database are shut down. Sample commands:
docker compose up -d mysql
SPRING_PROFILES_ACTIVE=create-ddl-script java -jar microservices/review-service/build/libs/review-service-1.0.0-SNAPSHOT.jar
# CTRL/C once "Connected to MySQL: jdbc:mysql://localhost/review-db" is written to the log output
docker compose down
The resulting SQL DDL script, crac/sql-scripts/create-tables.sql, has been added to Chapter 6’s source code.
The Docker Compose file configures MySQL to execute the SQL DDL script at startup.
A CRaC-specific version of the Docker Compose file has been created, crac/docker-compose-crac.yml. To create the tables when the database is starting up, the SQL DDL script is used as an init script. The SQL DDL script is mapped into the init folder /docker-entrypoint-initdb.d with the following volume mapping in the Docker Compose file:
volumes:
  - "./sql-scripts/create-tables.sql:/docker-entrypoint-initdb.d/create-tables.sql"
Added a runtime-specific Spring profile in the review microservice’s configuration file.
The guidelines in the Smoke Tests project’s JPA section have been followed by adding an extra Spring profile named crac. It looks like the following in the review microservice’s configuration file:
spring.config.activate.on-profile: crac
spring.jpa.database-platform: org.hibernate.dialect.MySQLDialect
spring.jpa.properties.hibernate.temp.use_jdbc_metadata_defaults: false
spring.jpa.hibernate.ddl-auto: none
spring.sql.init.mode: never
spring.datasource.hikari.allow-pool-suspension: true
Finally, the Spring profile crac is added to the RUN command in the Dockerfile to activate the configuration when the checkpoint is performed.
Finally, we are done with handling the problems resulting from using a Dockerfile to build a Spring Boot application that can restore quickly using CRaC in a Docker image.
The resulting Dockerfile, crac/Dockerfile-crac-automatic, looks like:
# syntax=docker/dockerfile:1.3-labs
FROM azul/zulu-openjdk:21.0.3-21.34-jdk-crac
ADD build/libs/*.jar app.jar
RUN --security=insecure \
SPRING_PROFILES_ACTIVE=docker,crac \
java -Dspring.context.checkpoint=onRefresh \
-XX:CRaCCheckpointTo=checkpoint -jar app.jar \
|| if [ $? -eq 137 ]; then return 0; else return 1; fi
EXPOSE 8080
ENTRYPOINT ["java", "-XX:CRaCRestoreFrom=checkpoint"]
Note: One and the same Dockerfile is used by all microservices to create CRaC versions of their Docker images.
We are now ready to try it out!
To try out CRaC, we will use the microservice system landscape used in Chapter 6 of my book. If you are not familiar with the system landscape, it looks like the following:
Chapter 6 uses Docker Compose to manage (build, start, and stop) the system landscape.
Note: If you don’t have all the tools used in this blog post installed in your environment, you can look into Chapters 21 and 22 for installation instructions.
To try out CRaC, we need to get the source code from GitHub, compile it, and create the Docker images for each microservice using a custom insecure Docker builder. Next, we can use Docker Compose to start up the system landscape and run the end-to-end validation script that comes with the book to ensure that everything works as expected. We will wrap up the try-out section by comparing the startup times of the microservices when they start with and without using CRaC.
We will go through each step in the following subsections:
Run the following commands to get the source code from GitHub, jump into the Chapter06 folder, check out the branch used in this blog post, SB3.2-crac-automatic, and ensure that a Java 21 JDK is used (Eclipse Temurin is used here):
git clone https://github.com/PacktPublishing/Microservices-with-Spring-Boot-and-Spring-Cloud-Third-Edition.git
cd Microservices-with-Spring-Boot-and-Spring-Cloud-Third-Edition/Chapter06
git checkout SB3.2-crac-automatic
sdk use java 21.0.3-tem
Start with compiling the microservices source code:
./gradlew build
If not already created, create the insecure builder with the command:
docker buildx create --name insecure-builder --buildkitd-flags '--allow-insecure-entitlement security.insecure'
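The "if not already created" check can be scripted. The following is a minimal sketch, assuming that docker buildx inspect exits with a non-zero status when the named builder does not exist; the function name ensure_insecure_builder is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create the insecure builder only if it is missing.
# Assumption: `docker buildx inspect <name>` fails when the builder does not exist.
ensure_insecure_builder() {
  name=${1:-insecure-builder}
  if docker buildx inspect "$name" >/dev/null 2>&1; then
    echo "builder '$name' already exists"
  else
    docker buildx create --name "$name" \
      --buildkitd-flags '--allow-insecure-entitlement security.insecure'
  fi
}
```

Calling ensure_insecure_builder without arguments uses the builder name from this blog post, insecure-builder.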
Now we can build a Docker image, where the build performs a CRaC checkpoint for each of the microservices with the commands:
docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t product-composite-crac --load microservices/product-composite-service
docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t product-crac --load microservices/product-service
docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t recommendation-crac --load microservices/recommendation-service
docker buildx --builder insecure-builder build --allow security.insecure -f crac/Dockerfile-crac-automatic -t review-crac --load microservices/review-service
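Since the four commands differ only in the service name, they can also be expressed as a loop. This is a sketch under the assumption that the image tag and the folder name keep following the name-crac and microservices/name-service pattern shown above; the function name build_crac_images is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the CRaC image for each microservice in turn,
# using the same flags as the individual commands above.
build_crac_images() {
  for svc in product-composite product recommendation review; do
    docker buildx --builder insecure-builder build \
      --allow security.insecure \
      -f crac/Dockerfile-crac-automatic \
      -t "${svc}-crac" \
      --load "microservices/${svc}-service" || return 1   # stop at the first failed build
  done
}
```

The function is meant to be run from the Chapter06 folder, just like the individual commands.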
To start up the system landscape, we will use Docker Compose. Since CRaC requires special Linux privileges, a CRaC-specific docker-compose file comes with the source code, crac/docker-compose-crac.yml. Each microservice is given the required privilege, CHECKPOINT_RESTORE, by specifying:
cap_add:
  - CHECKPOINT_RESTORE
Note: Several blog posts on CRaC suggest using privileged containers, i.e., starting them with docker run --privileged or adding privileged: true in the Docker Compose file. This is a really bad idea since an attacker who gets control over such a container can easily take control of the host that runs Docker. For more information, see Docker’s documentation on Runtime privilege and Linux capabilities.
The final addition to the CRaC-specific Docker Compose file is the volume mapping for MySQL to add the init file described above in the section 2.5. Problem #4, Spring Data JPA:
volumes:
  - "./sql-scripts/create-tables.sql:/docker-entrypoint-initdb.d/create-tables.sql"
Using this Docker Compose file, we can start up the system landscape with the following commands, before verifying it with the end-to-end verification script:
export COMPOSE_FILE=crac/docker-compose-crac.yml
docker compose up -d
Let’s start with verifying that the CRaC afterRestore callback methods were called:
docker compose logs | grep "CRaC's afterRestore callback method called..."
Expect something like:
...ReviewServiceApplication : CRaC's afterRestore callback method called...
...RecommendationServiceApplication : CRaC's afterRestore callback method called...
...ProductServiceApplication : CRaC's afterRestore callback method called...
...ProductCompositeServiceApplication : CRaC's afterRestore callback method called...
Now, run the end-to-end verification script:
./test-em-all.bash
If the script ends with a log output similar to:
End, all tests OK: Fri Jun 28 17:40:43 CEST 2024
…it means all tests ran OK, and the microservices behave as expected.
Bring the system landscape down with the commands:
docker compose down
unset COMPOSE_FILE
After verifying that the microservices behave correctly when started from a CRaC checkpoint, we can compare their startup times with microservices started without using CRaC.
Now over to the most interesting part: How much faster do the microservices start up when performing a restore from a checkpoint compared to a regular cold start?
The tests have been run on a MacBook Pro M1 with 64 GB memory.
Let’s start with measuring startup times without using CRaC.
To start the microservices without CRaC, we will use the default Docker Compose file. So, we must ensure that the COMPOSE_FILE environment variable is unset before we build the Docker images for the microservices. After that, we can start the database services, MongoDB and MySQL:
unset COMPOSE_FILE
docker compose build
docker compose up -d mongodb mysql
Verify that the databases are reporting healthy with the command: docker compose ps. Repeat the command until both report they are healthy. Expect a response like this:
NAME ... STATUS ...
chapter06-mongodb-1 ... Up 13 seconds (healthy) ...
chapter06-mysql-1 ... Up 13 seconds (healthy) ...
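Repeating the command by hand can be replaced by a small polling loop. The following is a sketch only: retry_until is a hypothetical helper, and the health check in the usage comment assumes that docker compose ps prints (healthy) in the STATUS column, as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: retry_until <max_attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0               # the command succeeded
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1               # give up after the last attempt
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (assumed output format): wait up to about 60 seconds for MySQL.
# retry_until 30 2 sh -c 'docker compose ps mysql | grep -q "(healthy)"'
```

The same helper can be pointed at the MongoDB service, or at the log checks used later to find the startup times.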
Next, start the microservices and look in the logs for the startup time (searching for the word Started). Repeat the logs command until logs are shown for all four microservices:
docker compose up -d
docker compose logs | grep Started
Look for a response like:
...Started ProductCompositeServiceApplication in 1.659 seconds
...Started ProductServiceApplication in 2.219 seconds
...Started RecommendationServiceApplication in 2.203 seconds
...Started ReviewServiceApplication in 3.476 seconds
Finally, bring down the system landscape:
docker compose down
First, declare that we will use the CRaC-specific Docker Compose file and start the database services, MongoDB and MySQL:
export COMPOSE_FILE=crac/docker-compose-crac.yml
docker compose up -d mongodb mysql
Verify that the databases are reporting healthy with the command: docker compose ps. Repeat the command until both report they are healthy. Expect a response like this:
NAME ... STATUS ...
crac-mongodb-1 ... Up 10 seconds (healthy) ...
crac-mysql-1 ... Up 10 seconds (healthy) ...
Next, start the microservices and look in the logs for the startup time (this time searching for the word Restored). Repeat the logs command until logs are shown for all four microservices:
docker compose up -d
docker compose logs | grep Restored
Look for a response like:
...Restored ProductCompositeServiceApplication in 0.131 seconds
...Restored ProductServiceApplication in 0.225 seconds
...Restored RecommendationServiceApplication in 0.236 seconds
...Restored ReviewServiceApplication in 0.154 seconds
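To collect the figures for a comparison, the relevant fields can be pulled out of the log lines. This is a minimal sketch, assuming the log lines have the shape shown in the sample outputs in this post (the word Started or Restored, the application class name, and the time in seconds); extract_startup_times is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print "<application> <seconds>"
# for every "Started ... in N seconds" or "Restored ... in N seconds" line.
extract_startup_times() {
  sed -n -E 's/.*(Started|Restored) ([A-Za-z0-9_]+) in ([0-9.]+) seconds.*/\2 \3/p'
}

# Usage: docker compose logs | extract_startup_times
```

Lines that do not match the pattern are dropped, so the output contains one line per microservice.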
Finally, bring down the system landscape:
docker compose down
unset COMPOSE_FILE
Now, we can compare the startup times!
Here is a summary of the startup times, along with calculations of how many times faster the CRaC-enabled microservice starts and the reduction of startup times in percentage:
| Microservice | Without CRaC (s) | With CRaC (s) | CRaC times faster | CRaC reduced startup time |
|---|---|---|---|---|
| product-composite | 1.659 | 0.131 | 12.7 | 92% |
| product | 2.219 | 0.225 | 9.9 | 90% |
| recommendation | 2.203 | 0.236 | 9.3 | 89% |
| review | 3.476 | 0.154 | 22.6 | 96% |
Generally, we can see a 10-fold performance improvement in startup times or 90% shorter startup time; that’s a lot!
Note: The improvement in the Review microservice is even better since it no longer handles the creation of database tables. However, this improvement is irrelevant when comparing improvements using CRaC, so let’s discard the figures for the Review microservice.
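The two derived columns in the table follow directly from the measured times: the speed-up factor is the cold start time divided by the restore time, and the reduction is one minus the inverse ratio. Here is a small sketch that reproduces the arithmetic with awk; the function name speedup_and_reduction is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: speedup_and_reduction <seconds_without_crac> <seconds_with_crac>
# Prints the speed-up factor (one decimal) and the reduction in percent.
speedup_and_reduction() {
  LC_ALL=C awk -v cold="$1" -v restored="$2" \
    'BEGIN { printf "%.1f %.0f%%\n", cold / restored, (1 - restored / cold) * 100 }'
}

speedup_and_reduction 1.659 0.131   # product-composite: prints "12.7 92%"
```

Running it with the values for the other microservices gives the remaining rows of the table.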
Coordinated Restore at Checkpoint (CRaC) is a powerful feature in OpenJDK that improves the startup performance of Java applications by allowing them to resume from a previously saved state, a.k.a. a checkpoint. With Spring Boot 3.2, we also get a simplified way of creating a checkpoint using CRaC, known as automatic checkpoint/restore at startup.
The tests in this blog post indicate a 10-fold improvement in startup performance, i.e., a 90% reduction in startup time when using automatic checkpoint/restore at startup.
The blog post also explained how Docker images using CRaC can be built using a Dockerfile instead of the complex bash scripts suggested by most blog posts on the subject. This, however, comes with some challenges of its own, like using custom Docker builders for privileged builds, as explained in the blog post.
Using Docker images created using automatic checkpoint/restore at startup comes with a price. The Docker images will contain runtime-specific and sensitive information, such as credentials to connect to a database at runtime. Therefore, they must be protected from unauthorized use.
The Spring Boot support for CRaC does not fully cover all modules in Spring’s ecosystem, forcing some workarounds to be applied, e.g., when using Spring Data JPA.
Also, when using automatic checkpoint/restore at startup, the JVM HotSpot engine cannot be warmed up before the checkpoint. If optimal execution time for the first requests being processed is important, automatic checkpoint/restore at startup is probably not the way to go.
In the next blog post, I will show you how to use regular on-demand checkpoints to solve some of the considerations with automatic checkpoint/restore at startup: specifically, the problems with specifying the runtime configuration at build time and storing sensitive runtime configuration in the Docker images, and how the Java VM can be warmed up before performing the checkpoint.