Project 1: Colorization

Images of the Russian Empire -- Colorizing the Prokudin-Gorskii Photo Collection.

In this project, we colorize the Prokudin-Gorskii photo collection. We accomplish this by taking the RGB glass plate negatives and aligning them such that we produce a coherent, full-color image. The negatives are split into B, G, and R channels arranged vertically, and we align the green (G) and red (R) channels to the blue (B) channel to reconstruct the original color photograph.

For the alignment, we apply different techniques based on the image resolution: a naive single-scale algorithm is used for smaller JPG images, while a pyramid multi-scale approach is employed for larger TIF images. Beyond these two approaches, I also implemented a single-scale algorithm that utilizes Fourier transforms and phase correlation that works on all images.

Naive Single-Scale: Small Images

  • Max normalized cross-correlation (over min Euclidean distance)
  • 3.2326s on average

For implementing the naive single-scale search, I used the max normalized cross-correlation to find the best offset between the green (G) and red (R) channels to align to the blue (B) channel. Given that this algorithm is computationally expensive and uses exhaustive search, the naive single-scale algorithm is used for smaller JPG images. I initially tried to find the most optimal (dx, dy) displacement by finding the minimum Euclidean distance between the two channels that are being compared. The Euclidean distance, aka the L2 norm, has the following form:

L2(image1, image2) = sqrt(sum(sum((image1-image2).^2)))
This, however, kept on giving me artifacts in the output images, so I decided to use the max normalized cross-correlation instead. The max normalized cross-correlation, aka the NCC, has the following form (the dot product of the normalized vector representation of the images):
NCC(image1, image2) = (image1/||image1||) · (image2/||image2||)
In summary, I found the most optimal (dx, dy) displacement by finding the maximum value of the NCC between the two channels that are being compared, after realizing that finding the minimum L2 distance was not as effective (for some reason).

monastery.jpg

Middle (G) offset: (2, -3)
Bottom (R) offset: (2, 3)

Input: ../data/monastery.jpg
Alignment function: naive_ss
Show image: True
Middle (G) offset: (2, -3)
Bottom (R) offset: (2, 3)
Output: ../images/naive_ss_monastery.jpg
Time taken: 3.0769 seconds.

tobolsk.jpg

Middle (G) offset: (3, 3)
Bottom (R) offset: (3, 6)

Input: ../data/tobolsk.jpg
Alignment function: naive_ss
Show image: True
Middle (G) offset: (3, 3)
Bottom (R) offset: (3, 6)
Output: ../images/naive_ss_tobolsk.jpg
Time taken: 3.2929 seconds.

cathedral.jpg

Middle (G) offset: (2, 5)
Bottom (R) offset: (3, 12)

Input: ../data/cathedral.jpg
Alignment function: naive_ss
Show image: True
Middle (G) offset: (2, 5)
Bottom (R) offset: (3, 12)
Output: ../images/naive_ss_cathedral.jpg
Time taken: 3.3281 seconds.

Pyramid Multi-Scale: Large Images

  • Max normalized cross-correlation, recursively/iteratively
  • 21.5253s on average

When we try to align larger images, naive search simply isn't enough anymore because it's too computationally expensive. To address this, we use the pyramid multi-scale algorithm to align the green (G) and red (R) channels to the blue (B) channel for larger TIF images. In my implementation, I (1) created an image pyramid with the smallest image being at least 256x256 pixels, (2) iteratively aligned the green (G) and red (R) channels to the blue (B) channel, and (3) used the max normalized cross-correlation to find the best offset between the two channels. In terms of pseudocode, the pyramid multi-scale algorithm has the following form:

pyramid_t, template_i := the image pyramid for template and alignee, respectively
best_offset := (0, 0)
for i in range(len(pyramid_t)):
	best_offset = process_level(pyramid_t[i], pyramid_i[i], i)
return best_offset
The method process_level is essentially a modified version of my naive single-scale algorithm. The only difference is that I'm iterating through the image pyramid and iteratively aligning the green (G) and red (R) channels to the blue (B) channel, as we scale back up to the original image size. This worked perfectly for all of the .tif images except for emir.tif—we do resolve this by using the Fourier single-scale algorithm, which will be discussed in the next section. On the whole, we can see that the runtime has increased significantly, but this is still much faster than the naive single-scale algorithm if we were to apply it to these larger images.

emir.tif

Middle (G) offset: (24, 48)
Bottom (R) offset: (42, 72)

Input: ../data/emir.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (24, 48)
Bottom (R) offset: (42, 72)
Output: ../images/pyramid_ms_emir.jpg
Time taken: 23.2480 seconds.

church.tif

Middle (G) offset: (4, 24)
Bottom (R) offset: (-4, 58)

Input: ../data/church.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (4, 24)
Bottom (R) offset: (-4, 58)
Output: ../images/pyramid_ms_church.jpg
Time taken: 24.8634 seconds.

three_generations.tif

Middle (G) offset: (14, 52)
Bottom (R) offset: (10, 112)

Input: ../data/three_generations.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (14, 52)
Bottom (R) offset: (10, 112)
Output: ../images/pyramid_ms_three_generations.jpg
Time taken: 25.8074 seconds.

melons.tif

Middle (G) offset: (10, 82)
Bottom (R) offset: (12, 178)

Input: ../data/melons.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (10, 82)
Bottom (R) offset: (12, 178)
Output: ../images/pyramid_ms_melons.jpg
Time taken: 27.0712 seconds.

onion_church.tif

Middle (G) offset: (26, 52)
Bottom (R) offset: (36, 108)

Input: ../data/onion_church.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (26, 52)
Bottom (R) offset: (36, 108)
Output: ../images/pyramid_ms_onion_church.jpg
Time taken: 23.4848 seconds.

train.tif

Middle (G) offset: (6, 42)
Bottom (R) offset: (32, 86)

Input: ../data/train.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (6, 42)
Bottom (R) offset: (32, 86)
Output: ../images/pyramid_ms_train.jpg
Time taken: 23.6746 seconds.

icon.tif

Middle (G) offset: (16, 40)
Bottom (R) offset: (22, 90)

Input: ../data/icon.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (16, 40)
Bottom (R) offset: (22, 90)
Output: ../images/pyramid_ms_icon.jpg
Time taken: 22.5223 seconds.

self_portrait.tif

Middle (G) offset: (28, 78)
Bottom (R) offset: (36, 176)

Input: ../data/self_portrait.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (28, 78)
Bottom (R) offset: (36, 176)
Output: ../images/pyramid_ms_self_portrait.jpg
Time taken: 24.4477 seconds.

harvesters.tif

Middle (G) offset: (16, 60)
Bottom (R) offset: (14, 124)

Input: ../data/harvesters.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (16, 60)
Bottom (R) offset: (14, 124)
Output: ../images/pyramid_ms_harvesters.jpg
Time taken: 13.8459 seconds.

sculpture.tif

Middle (G) offset: (-10, 34)
Bottom (R) offset: (-26, 140)

Input: ../data/sculpture.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (-10, 34)
Bottom (R) offset: (-26, 140)
Output: ../images/pyramid_ms_sculpture.jpg
Time taken: 15.3781 seconds.

lady.tif

Middle (G) offset: (8, 52)
Bottom (R) offset: (12, 112)

Input: ../data/lady.tif
Alignment function: pyramid_ms
Show image: True
Middle (G) offset: (8, 52)
Bottom (R) offset: (12, 112)
Output: ../images/pyramid_ms_lady.jpg
Time taken: 12.4353 seconds.

(Bells & Whistles) Fourier Single-Scale: All Images

  • Fourier transforms and phase correlation
  • 0.0232s on average for jpg
    1.3789s on average for tif

Finally, for my bells & whistles, I implemented a single-scale algorithm that utilizes Fourier transforms and phase correlation to align the green (G) and red (R) channels to the blue (B) channel for all images. To do this, I first converted the images to the frequency domain using the Fourier transform, and then computed the cross-power spectrum to find the phase correlation. In pseudocode, the Fourier single-scale algorithm has the following form:

template_fft, image_fft := the Fourier transforms of the template and image
cross_power_spectrum := template_fft * conj(image_fft)
normalized_cps := the normalized cross-power spectrum
dx, dy := the offset that maximizes the normalized_cps
This method ended up being the most effective, as it was able to align all of the images without any artifacts and ran much faster than the other two methods as evidenced by the average times for both the jpg and the tif files.

monastery.jpg

Middle (G) offset: (2, -3)
Bottom (R) offset: (2, 3)

Input: ../data/monastery.jpg
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (2, -3)
Bottom (R) offset: (2, 3)
Output: ../images/fourier_ss_monastery.jpg
Time taken: 0.0188 seconds.

tobolsk.jpg

Middle (G) offset: (2, 3)
Bottom (R) offset: (3, 6)

Input: ../data/tobolsk.jpg
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (2, 3)
Bottom (R) offset: (3, 6)
Output: ../images/fourier_ss_tobolsk.jpg
Time taken: 0.0360 seconds.

cathedral.jpg

Middle (G) offset: (2, 5)
Bottom (R) offset: (3, 12)

Input: ../data/cathedral.jpg
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (2, 5)
Bottom (R) offset: (3, 12)
Output: ../images/fourier_ss_cathedral.jpg
Time taken: 0.0149 seconds.

emir.tif

Middle (G) offset: (24, 49)
Bottom (R) offset: (41, 106)

Input: ../data/emir.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (24, 49)
Bottom (R) offset: (41, 106)
Output: ../images/fourier_ss_emir.jpg
Time taken: 1.7839 seconds.

church.tif

Middle (G) offset: (3, 25)
Bottom (R) offset: (-4, 58)

Input: ../data/church.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (3, 25)
Bottom (R) offset: (-4, 58)
Output: ../images/fourier_ss_church.jpg
Time taken: 1.9641 seconds.

three_generations.tif

Middle (G) offset: (12, 55)
Bottom (R) offset: (8, 111)

Input: ../data/three_generations.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (12, 55)
Bottom (R) offset: (8, 111)
Output: ../images/fourier_ss_three_generations.jpg
Time taken: 1.8595 seconds.

melons.tif

Middle (G) offset: (10, 80)
Bottom (R) offset: (14, 176)

Input: ../data/melons.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (10, 80)
Bottom (R) offset: (14, 176)
Output: ../images/fourier_ss_melons.jpg
Time taken: 1.1858 seconds.

onion_church.tif

Middle (G) offset: (26, 55)
Bottom (R) offset: (34, 107)

Input: ../data/onion_church.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (26, 55)
Bottom (R) offset: (34, 107)
Output: ../images/fourier_ss_onion_church.jpg
Time taken: 1.0703 seconds.

train.tif

Middle (G) offset: (0, 40)
Bottom (R) offset: (28, 85)

Input: ../data/train.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (0, 40)
Bottom (R) offset: (28, 85)
Output: ../images/fourier_ss_train.jpg
Time taken: 1.1062 seconds.

icon.tif

Middle (G) offset: (16, 39)
Bottom (R) offset: (23, 88)

Input: ../data/icon.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (16, 39)
Bottom (R) offset: (23, 88)
Output: ../images/fourier_ss_icon.jpg
Time taken: 1.0008 seconds.

self_portrait.tif

Middle (G) offset: (29, 77)
Bottom (R) offset: (37, 175)

Input: ../data/self_portrait.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (29, 77)
Bottom (R) offset: (37, 175)
Output: ../images/fourier_ss_self_portrait.jpg
Time taken: 1.5025 seconds.

harvesters.tif

Middle (G) offset: (18, 60)
Bottom (R) offset: (11, 118)

Input: ../data/harvesters.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (18, 60)
Bottom (R) offset: (11, 118)
Output: ../images/fourier_ss_harvesters.jpg
Time taken: 1.5221 seconds.

sculpture.tif

Middle (G) offset: (-11, 33)
Bottom (R) offset: (-27, 140)

Input: ../data/sculpture.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (-11, 33)
Bottom (R) offset: (-27, 140)
Output: ../images/fourier_ss_sculpture.jpg
Time taken: 1.2267 seconds.

lady.tif

Middle (G) offset: (9, 57)
Bottom (R) offset: (13, 120)

Input: ../data/lady.tif
Alignment function: fourier_ss
Show image: True
Middle (G) offset: (9, 57)
Bottom (R) offset: (13, 120)
Output: ../images/fourier_ss_lady.jpg
Time taken: 0.9465 seconds.

Acknowledgements

This project is part of the Fall 2024 offering of CS180: Intro to Computer Vision and Computational Photography, at UC Berkeley. This website template is modified from HTML5 UP, and the images are from the Prokudin-Gorskii photo collection.