Blogg
Här finns tekniska artiklar, presentationer och nyheter om arkitektur och systemutveckling. Håll dig uppdaterad, följ oss på LinkedIn
Här finns tekniska artiklar, presentationer och nyheter om arkitektur och systemutveckling. Håll dig uppdaterad, följ oss på LinkedIn
This post explores an automated approach to creating high-quality photo collages from large image collections, a journey that began out of frustration with existing tools. The solutions I found, whether manual or automated, often created repetitive patterns, ugly image overlaps, and offered only a fixed set of layouts. This led me to explore a new approach, leveraging a Genetic Algorithm (GA), to efficiently generate large collage layouts. I’ll present an overview of the algorithm and a custom implementation written in Kotlin.
The journey began when I tried to arrange a few hundred photos in an image editing program. This quickly proved to be an endless, tedious task, so I rather quickly started experimenting with scripting a combination of ImageMagick plugins, aiming to automate the process. Although this kind of worked, the results were far from ideal: repetitive patterns, ugly image overlaps, and noticeable gaps between pictures. I also explored various commercial alternatives, but none seemed capable of handling hundreds of images with satisfactory results. Almost all also relied on rigid, fixed layouts that frequently cropped images to fit predefined slots – a limitation I absolutely wanted to avoid. My goal was to just beeing able to select a directory of images and get a good result with minimal effort.
It was during this search for a better solution that I discovered a promising approach outlined in a paper by Jian Fan, which proposed using a genetic algorithm for collage creation. Although I had a basic conceptual understanding of genetic algorithms, I had never applied them to an actual practical problem. This felt like the perfect learning opportunity!
Let’s first briefly delve into what genetic algorithms are and how they function. Then, we’ll connect this theory to the rather complex challenge of generating effective photo collage layouts, and specifically how a GA can provide an elegant solution. Finally, we’ll detail a custom implementation of the algorithm that builds upon Fan’s original algorithm, incorporating additional enhancements such as a fitness metric for improved centering of featured images.
Genetic algorithms (GAs) are a class of optimization algorithms inspired by the process of natural selection. They are particularly well-suited for complex optimization problems characterized by vast search spaces or non-linear relationships, where traditional deterministic methods often struggle to find optimal (or even good) solutions efficiently. GAs excel at discovering “good enough” solutions for real-world problems that might lack a precise mathematical model. Their beauty lies in their simplicity and general applicability – the core requirements are merely defining a robust fitness function and appropriate genetic operators tailored to your specific problem domain.
The underlying principle of genetic algorithms is to mimic the evolutionary process through a series of iterative steps:
This cyclical process is repeated for multiple generations. Over time, just as natural selection leads to better-adapted organisms, genetic algorithms tend to produce increasingly better solutions to the problem at hand.
Creating an appealing image collage involves arranging multiple images on a rectangular canvas, seemingly simple at first glance. However, the problem quickly escalates in complexity when you introduce real-world constraints:
As Jian Fan states in his paper, this quickly becomes an
“NP-complete combinatorial optimization problem. An exhaustive search is impractical even for a modest number of photos.”
This inherent complexity, combined with the need for a “good enough” solution rather than a computationally expensive perfect one, makes it an ideal candidate for a genetic algorithm approach.
In the algorithm, each individual in the population represents a complete collage layout of all source images. The layout is structured as a binary tree consisting of two types of nodes:
LayoutNode
): These represent divisions of the canvas, which can be either horizontal (left / right) or vertical (top / bottom)ImageNode
): These represent the actual images placed within the collage.Such a tree, where each node has either zero or two child nodes, is commonly known as a “full binary tree”. This structure inherently helps manage the spatial relationships between images. Below is a debug text rendering using the BinaryTreePrinter the for a 10 image collage solution from a testcase in the project that illustrates the tree structure:
Binary Tree:
V: 1823x1080
├──H: 1103x1080
│ ├──V: 1103x815
│ │ ├──H: 288x815
│ │ │ ├──H: 288x599
│ │ │ │ ├──Image-7 Size: 288x216 ≈ 36% of 800x600 (weight: 1) @1,333
│ │ │ │ └──Image-8 Size: 288x384 ≈ 20% of 1440x1920 (weight: 1) @0,750
│ │ │ └──Image-0 Size: 288x216 ≈ 9% of 3072x2304 (weight: 1) @1,333
│ │ └──Image-3 Size: 815x815 ≈ 68% of 1200x1200 (weight: 10) @1,000
│ └──V: 1103x265
│ ├──V: 551x265
│ │ ├──Image-4 Size: 353x265 ≈ 22% of 1600x1200 (weight: 1) @1,333
│ │ └──Image-9 Size: 198x265 ≈ 12% of 1712x2288 (weight: 1) @0,748
│ └──V: 551x265
│ ├──Image-5 Size: 198x265 ≈ 12% of 1712x2288 (weight: 1) @0,748
│ └──Image-6 Size: 353x265 ≈ 44% of 800x600 (weight: 1) @1,333
└──H: 720x1080
├──Image-1 Size: 720x540 ≈ 28% of 2592x1944 (weight: 3) @1,333
└──Image-2 Size: 720x540 ≈ 23% of 3072x2304 (weight: 3) @1,333
The initial population is created by generating a diverse set of random individuals. For each individual:
Finally, the dimensions and positions of all nodes in the tree are calculated based on these assignments and divisions.
This process ensures a highly varied starting population. While these random layouts will be far from optimal – often exhibiting many gaps and images downsized to minuscule sizes (e.g., 8x8 pixels) – they serve as the raw material for evolution. The only strict requirement is that they adhere to the basic constraints of a valid, non-overlapping layout.
The fitness (or score) of an individual is the crucial metric that determines how “good” a particular collage layout is. Our goal is to minimize this score, so a lower fitness indicates a better layout. The fitness score is determined by combining three main factors:
These factors are combined using configurable weights, allowing us to fine-tune the importance of each scoring component relative to the others. For example, by increasing the weight of the “centered feature” factor, the algorithm will prioritize placing featured images near the collage’s center. It’s crucial to remember that increasing all weights simultaneously is pointless, as they are competing metrics.
The selection process determines which individuals from a generation will become parents for the next. In this implementation, I use a simple, elitistic selection strategy: only individuals from the top 25% of the scored population are chosen. This ensures that only the most promising layouts contribute to the next generation, driving the evolution towards better solutions.
The crossover operation combines genetic material from two parent layouts to create a new offspring layout, introducing beneficial traits from both. In this implementation, this is achieved by:
This approach effectively preserves much of the parents’ successful structural elements while introducing variation, leading to novel, potentially improved layouts.
Mutation introduces random changes to the offspring’s genetic material, crucial for maintaining diversity within the population and enabling the algorithm to explore new regions of the solution space. This helps prevent premature convergence and avoids getting trapped in “local maxima” – suboptimal solutions that the algorithm might otherwise settle for. In this implementation, mutation randomly performs one of two operations:
LayoutNodes
within the tree, orImageNodes
.These random perturbations allow the algorithm to discover novel combinations and push beyond incremental improvements.
The entire evolutionary process runs for a specified number of generations. In each generation:
To illustrate how the solution progresses, heres are two images of a collage created using the sample images included in the repository.
The 800 x 600 output above exemplifies the typical state of the genetic algorithm’s initial population (population: 1, generation: 1) - essentially what the algorithm’s randomized V/H slicing logic will create without any iterative evolutionary guidance. None of the desired metrics is reflected in the output.
Same example but after 500 generations (population: 1000, generation: 500, with an image of a meerkat as a featured image with some boosted centering)
For the best collage results, I recommend experimenting with the parameters, particularly the ones described here. Different image sets often benefit from specific tuning to achieve optimal aesthetics.
By carefully experimenting with these parameters, you can fine-tune the genetic algorithm to produce collages that meet your specific aesthetic and functional requirements. I had a lot of fun creating this - I hope it may be useful for others!