CS180: Introduction to Computer Vision & Computational Photography

Project 1

Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection

Clara Hung

Project Overview

Sergei Mikhailovich Prokudin-Gorskii (1863-1944), a Russian photographer, was a pioneer in color photography, and much of his work documented the Russian Empire. His idea was a precursor to our modern-day RGB color model. He took three black-and-white photographs of the same scene on glass plates, each through a different color filter (Red, Green, and Blue). To recover the color image, each of the three filtered negatives was projected onto the same screen through its corresponding color filter. Due to limitations of the equipment at the time, the images were not perfectly aligned, yet the results were still impressive to a population that had only known black-and-white photographs.

Using modern image-processing techniques, we can do better! The goal of this project is to take digitized versions of the original glass plates and utilize our digital image-processing toolkit to combine and align the filtered images into a full color photo with minimal artifacts.


Alignment

0. the most naive approach

Perhaps the most naive approach to colorizing the Prokudin-Gorskii photo collection is to simply stack the three images on top of each other. This is a simple and quick way to get a color image. We are given a single .tiff image with all three color channels stacked on top of each other in BGR order. The key assumption here is that each color channel has the same dimensions, so we can split the stacked image into three separate images and then stack the split channels depth-wise to get a color image. However, this approach still produces color artifacts, since the plates are not perfectly aligned with each other. This misalignment is a result of the original photography process: in general, the equipment at the time didn't allow simultaneous captures of each color channel from the same visual perspective. Whether using beam splitters, three stacked cameras, or sequential exposures, real-life imaging conditions made misaligned captures common.
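To make the stacking step concrete, here is a minimal sketch (the function name and input file name are my own, illustrative choices; I assume the plate is one tall grayscale image with the channels stacked vertically in B, G, R order, as described above):

```python
import numpy as np
import skimage.io as skio

def naive_colorize(path):
    plate = skio.imread(path)   # one tall grayscale image: B on top, then G, then R
    h = plate.shape[0] // 3     # assume each channel occupies exactly one third
    b = plate[:h]
    g = plate[h:2*h]
    r = plate[2*h:3*h]
    return np.dstack([r, g, b])  # restack in RGB order for display

rgb = naive_colorize("cathedral.tif")  # hypothetical input file
```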


1. naive approach

Now, the naive approach is to do what the project spec tells you to do :p. To improve the alignment of each color channel, we first exhaustively search over a small window of possible displacements ([-15, 15] pixels in each direction), score each displacement with an image-matching metric, then take the displacement with the best score. Our reason for searching over a small window of displacements is the assumption that each color channel was captured under similar imaging conditions with some small delta in perspective. Thus, we're essentially searching by hand for the translation (a very restricted special case of a homography) between the images. In this case, we use the Blue channel as our reference. For the metric, I used Normalized Cross-Correlation (NCC), which is just a dot product between the two normalized image vectors: \( \frac{\text{Image1}}{\| \text{Image1} \|} \cdot \frac{\text{Image2}}{\| \text{Image2} \|} \). I found that first center-cropping the images down to 90% of their original size led to better NCC evaluation, since edge effects are then excluded from the coefficient.
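A minimal sketch of this exhaustive search (the function names, the use of np.roll for shifting, and the default parameters are my own choices, not from the spec):

```python
import numpy as np

def ncc(im1, im2):
    # Normalized cross-correlation: dot product of the unit-norm flattened images.
    v1 = im1.ravel() / np.linalg.norm(im1)
    v2 = im2.ravel() / np.linalg.norm(im2)
    return v1 @ v2

def center_crop(im, frac=0.9):
    # Keep the central `frac` of the image so edge effects don't pollute the score.
    h, w = im.shape
    dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
    return im[dh:h - dh, dw:w - dw]

def align_exhaustive(channel, ref, window=15):
    # Score every displacement in [-window, window]^2 and keep the best one.
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(center_crop(shifted), center_crop(ref))
            if score > best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift  # (dy, dx) to roll `channel` by so it matches `ref`
```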


2. fourier, baby!

One limitation of the previous method is that, for extremely large images, the runtime is SLOW! The exhaustive search scales with both the number of pixels and the area of the search window. As computer scientists, we love optimization and want to reduce the runtime. So, could we take advantage of intrinsic properties of images to reduce our computation?

Images are naturally a type of information easily expressed in frequency space: most of a natural image's energy sits in a small set of low frequencies, so an image that is dense in the time domain is approximately sparse in the frequency domain. Recall that our goal is to estimate the relative offset between the color channels in the time domain. One useful property here is the time-shifting property of the Fourier transform, which states that a shift in time is a phase delay in frequency:

\[ \mathcal{F}\{f(t - t_0)\} = F(\omega) e^{-j \omega t_0} \]

Additionally, we know that the Fourier transform of a shifted delta function is a complex exponential, or equivalently, that the inverse Fourier transform of a complex exponential is a shifted delta:

\[ \mathcal{F}\{\delta(t - t_0)\} = e^{-j \omega t_0} \quad \Longleftrightarrow \quad \mathcal{F}^{-1}\{e^{-j \omega t_0}\} = \delta(t - t_0) \]

Thus, we can use the following procedure, which follows the Wikipedia page for phase correlation, to identify the time shift between two images. First, we take the 2D Discrete Fourier Transform (DFT) of each image. Then, we calculate the normalized cross-power spectrum (NCPS) of the two DFTs. As proved on the Wikipedia page, the NCPS is exactly the 2D phase shift between the two images, i.e. a complex exponential. Since the iDFT of a complex exponential is a shifted delta, computing the iDFT of the NCPS yields a peak at the 2D shift between the two images in time. Not only does this recover the exact shift between color channels, it also saves compute: a DFT/iDFT pair computed with the FFT is much faster than exhaustively scoring NCC over a search window.
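Concretely, if \( F_1 \) and \( F_2 \) are the Fourier transforms of two images related by a shift, i.e. \( f_1(t) = f_2(t - t_0) \), then

\[ R(\omega) = \frac{F_1(\omega)\, \overline{F_2(\omega)}}{\left| F_1(\omega)\, \overline{F_2(\omega)} \right|} = e^{-j \omega t_0}, \qquad \mathcal{F}^{-1}\{R\}(t) = \delta(t - t_0), \]

so the location of the peak of the iDFT is exactly the shift. Below is a minimal numpy sketch of this procedure; the function name, the epsilon guard, and the wrap-around correction are my own choices, not from the spec.

```python
import numpy as np

def phase_correlate(channel, ref):
    # Returns the (dy, dx) to roll `channel` by so it lines up with `ref`.
    F_ref = np.fft.fft2(ref)
    F_ch = np.fft.fft2(channel)
    # Normalized cross-power spectrum: unit magnitude, phase encodes the shift.
    cross = F_ref * np.conj(F_ch)
    ncps = cross / (np.abs(cross) + 1e-12)  # epsilon avoids division by zero
    # The iDFT of a pure phase ramp is a shifted delta; its peak is the shift.
    corr = np.abs(np.fft.ifft2(ncps))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint wrap around and correspond to negative shifts.
    h, w = corr.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return dy, dx
```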

Shoutout to my DSP bestie Gopala for drilling this in our heads.


3. comparison

Emir (Reference Image, no alignment)
Emir (NCC). Displacement: G (14, 14); R (14, 14)
Emir (Phase Correlation). Displacement: G (49, 24); R (106, 41)

4. pyramid scheme

Alas, despite my Fourier method being both speedy and precise, the project spec expects us to also complete image alignment using a pyramid scheme. The idea is as follows: align the images at multiple resolutions, starting from the coarsest representation and working toward the finest. Natural images have more low-frequency content than high-frequency content, and downsampling an image reduces its sampling rate and its size. Thus, a downsampled image still captures most of the image's low frequencies while also reducing algorithm runtime, which lets the exhaustive search from part (1) run over a much smaller image. Once we've narrowed down the "ballpark" shift, we slowly upsample, increasing the resolution, to fine-tune the shift against higher-frequency information.

The general implementation is recursive. To build the pyramid, we start from the bottom (the finest representation), apply an anti-aliasing low-pass filter (in this case, a Gaussian smoothing filter), then downsample the image by a factor of 2; we continue filtering and downsampling until we reach the desired depth at the top of the pyramid (the coarsest representation). At the coarsest level, we perform an exhaustive search using NCC to find the ballpark displacement. Then, at each finer (upsampled) level, we run NCC over a much smaller search area; since we only downsample by 2 at each layer, a search window of [-2, 2] pixels suffices, as in the sketch below.
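A sketch of this recursion (the depth, sigma, and function names are my own choices; align_exhaustive is the NCC search sketched in part (1)):

```python
import numpy as np
from skimage.filters import gaussian
from skimage.transform import rescale

def align_pyramid(channel, ref, depth=4, coarse_window=15):
    # Base case: coarsest level, do the full "ballpark" exhaustive search.
    if depth == 0:
        return align_exhaustive(channel, ref, window=coarse_window)
    # Anti-alias with a Gaussian low-pass, then downsample by a factor of 2.
    ch_small = rescale(gaussian(channel, sigma=1), 0.5)
    ref_small = rescale(gaussian(ref, sigma=1), 0.5)
    dy, dx = align_pyramid(ch_small, ref_small, depth - 1, coarse_window)
    dy, dx = 2 * dy, 2 * dx  # scale the coarse estimate back to this level
    # Refine: after doubling, the residual error is only a couple of pixels.
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    rdy, rdx = align_exhaustive(shifted, ref, window=2)
    return dy + rdy, dx + rdx
```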

At first, the image pyramid with the NCC coefficient did not produce images as accurate as the phase correlation method, so I swapped the per-level comparison metric to phase correlation instead.

Lady (Reference Image, no alignment)
Lady (Image Pyramid). Displacement: G (57, 9); R (120, 13)

5. the nice-to-haves

5.1. auto-contrasting

Oftentimes, perceived image quality can be improved by rescaling pixel intensities. The most basic rescaling maps the image to [0, 1], so the smallest (darkest) pixel value becomes 0 and the largest (brightest) pixel value becomes 1. Histogram equalization is another method: it evens out the intensity histogram of an image and generally increases its global contrast. For this, I used scikit-image's exposure.equalize_hist and exposure.equalize_adapthist methods. The former performs conventional histogram equalization applied to each RGB color component; however, this may produce unnatural-looking images, since the relative distribution of the color channels, and hence the color balance, may change. The latter performs local contrast enhancement: it first converts the image to the HSV (Hue, Saturation, Value) color space, runs the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm on the value (V) channel, then converts the image back to RGB space. In my opinion, the latter produces the most natural-looking images and improves contrast in regions of the image that have uneven brightness.

Comparison of different auto contrast mechanisms
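As a rough sketch, the three options above can be compared like this (the variable names and the clip_limit value are my own choices; rgb is assumed to be the aligned color image as a float array in [0, 1]):

```python
import numpy as np
from skimage import exposure

# (a) Basic min-max stretch to [0, 1].
stretched = (rgb - rgb.min()) / (rgb.max() - rgb.min())

# (b) Conventional histogram equalization, applied to each RGB channel
#     independently (this is what may upset the color balance).
equalized = np.dstack([exposure.equalize_hist(rgb[..., c]) for c in range(3)])

# (c) CLAHE via equalize_adapthist (color-space conversion handled internally).
adaptive = exposure.equalize_adapthist(rgb, clip_limit=0.01)
```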

6. results

Below are the final image results, processed with a pipeline of phase correlation (with image pyramiding for large images), cropping (to remove mismatched edges), and auto-contrasting using adaptive histogram equalization.
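Stitched together, the pipeline might look like the following sketch, reusing the functions from the earlier sections (the input file name and the 5% crop margin are my own, illustrative choices):

```python
import numpy as np
import skimage.io as skio
from skimage import exposure, img_as_float

plate = img_as_float(skio.imread("lady.tif"))  # hypothetical input plate
h = plate.shape[0] // 3
b, g, r = plate[:h], plate[h:2*h], plate[2*h:3*h]

# Align G and R to the B reference (swap in align_pyramid for very large plates).
g = np.roll(g, phase_correlate(g, b), axis=(0, 1))
r = np.roll(r, phase_correlate(r, b), axis=(0, 1))

rgb = np.dstack([r, g, b])
# Crop 5% off each side to remove mismatched edges, then auto-contrast.
w = rgb.shape[1]
rgb = rgb[int(0.05 * h):int(0.95 * h), int(0.05 * w):int(0.95 * w)]
out = exposure.equalize_adapthist(rgb, clip_limit=0.01)
```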

Cathedral. Displacement: G (5, 2); R (12, 3)
Church. Displacement: G (25, 3); R (58, -4)
Train. Displacement: G (38, -41); R (85, 28)
Icon. Displacement: G (39, 16); R (88, 23)
Lady. Displacement: G (69, 9); R (120, 13)
Melons. Displacement: G (79, 8); R (176, 14)
Monastery. Displacement: G (338, 2); R (3, 2)
Onion Church. Displacement: G (51, 19); R (107, 34)
Sculpture. Displacement: G (33, 11); R (140, 7)
Self Portrait. Displacement: G (25, 3); R (58, -4)
Three Generations. Displacement: G (14, 14); R (14, 14)
Tobolsk. Displacement: G (49, 24); R (106, 41)

7. references

Phase correlation. Wikipedia. https://en.wikipedia.org/wiki/Phase_correlation