CS180: Introduction to Computer Vision & Computation Photography

Project 2

Fun with Filters and Frequencies

Clara Hung

Project Overview

Project 2 is a big mish-mosh of different sub-projects that explores how to manipulate image frequencies. By the end of the project, we should be able to have some fun with multi-resolution blending!

Part 1: Fun with Filters

1.1 Finite Difference Operator

Approach

One way to implement an edge-detector, or a high-pass filter (HPF), is to take the derivative of an image. However, how does one take a derivative (a continuous operation) over a discrete space that is an image? Here comes the Finite Difference Operator (FDO) to the rescume! The FDO is a simple filter that approximates the gradient of an image, and we define a FDO in each direction in the image (x, y):

$$D_x = \begin{bmatrix} -1 & 1 \end{bmatrix}$$

$$D_y = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

To obtain the partial derivative of the image (I) in both the x and y directions, we convolve the image with the FDOs. (Why can't we take a multi-dimensional derivative simultaneously? Beats me -- ask the math department!)

$$\frac{\partial I}{\partial x} = I \ast D_x$$

$$\frac{\partial I}{\partial y} = I \ast D_y$$

The magnitude of the gradient can be calculated by taking the square root of the sum of the squares of the partial derivatives:

$$\text{Gradient Magnitude} = \| \nabla f \| = \sqrt{(\frac{\partial I}{\partial x})^2 + (\frac{\partial I}{\partial y})^2}$$

To obtain an edge image, we can binarize the gradient magnitude by choosing a threshold and setting all values below the threshold to be zero and above to be 1. In the images below, I set the threshold to be 0.1, chosen after plotting a histogram of the gradient magnitude's pixel values. Most of the values will be 0 or noise. Notice how, even with the 0.1 threshold, the edge image is still quite noisy! We can still see some specs on the grass, and we don't want that!

Results

Finite Difference Operator in x-direction

Finite Difference Operator in y-direction

Gradient Magnitude

Magnitude of Finite Difference Operator Thresholded

Gradient Magnitude Thresholded

1.2 Derivative of a Gaussian (DoG) Filter

So, how should we get rid of noise? Since noise is a high-frequency component, let's low-pass filter (LPF) it first with a Gaussian! For this, we can use the associativity and commutivity of the convolution to have two implementations of the same filter: taking the derivative of a gaussian-filtered image is the same as convolving the image with a derivative-of-gaussian (DoG) filter!

$$\frac{\partial}{\partial x} (I \ast G) = I \ast (\frac{\partial}{\partial x} G)$$

For my implementation, I used cv2.getGaussianKernel with kernel_size = 10 and sigma=kernel_size/6. This took a little bit of fine-tuning to get right as sometimes, when the kernel size is too small, the DoG doesn't filter properly. Here are some results of the gradient magnitude, normalized to [0, 1]:

Filter -> Derivative

Filter -> Derivative, Binarized

Derivative of Gaussian -> Filtered

DoG -> Filtered, Binarized

For these images, using the same threshold of 0.1, we can clearly see that the edges are significantly cleaner than the FDO without the LPF. Additionally, there are no longer (or if there are, very few) high frequency componenets on the grassy area. The lines are thicker, but that's likely because the LPF smooths the edges out. When we compare the images from the two methods, it appears that they are nearly identical, sans some intensity differences that may due to implementation and bit-depth inaccuracies. For reference, here are my DoG filters! They look super blurry because they're blown up from a small 10x10 kernel.

DoG in x-direction

DoG in y-direction

Part 2: Fun with Frequencies

2.1 Image "Sharpening"

Image sharpening is a technique that enhances the edges of an image by accentuating the high-frequency components. One way to do this is to subtract the low-pass filtered image from the original image. This is equivalent to adding the high-pass filtered image to the original image. Mathematically, we define it as follows:

$$\text{Sharpened Image} = \text{Original Image} + \alpha(\text{Original Image} - \text{Low-pass Filtered Image})$$

Where $\alpha$ is a multiplier that defines how much the image should be sharpened. Below, the $\sigma, k, \alpha$ values of each image are included next to the sharpened image.

Original Taj

Sharpened Taj: $\sigma=1, k=3, \alpha=2$

Taj Details

I took this film photo myself!

Original Taipei

Sharpened Taipei: $\sigma=1, k=3, \alpha=2$

Taipei Details

For evaluation, let's test a sharp image, blur it, then try to sharpen it again.

Original Macarons

Sharpened Macarons: $\sigma=5, k=30, \alpha=4$

Macarons Details

Blurred Macarons: $\sigma=6, k=20$

Blurred Macarons sharpened with low-passed details: $\sigma=5, k=30, \alpha=15$

Blurred Macarons sharpened with original details: $\sigma=5, k=30, \alpha=15$

From the images above, it is clear that when we attempt to sharpen our blurred macarons with its own details, the images become slightly sharper compared to the original blurred macarons, but not by much, whereas when we sharpened the blurred image with details from the original filter, we do get a sharper image back if we turn up the $\alpha$ factor. This is likely because when we first LPF the macarons to get the blurred image, a lot of the high frequency content is already filtered out. Thus, we cannot add more high frequency content back (remember Babak's core tenant of LTI systems: no new frequencies!!!). However, we can manually add the high frequency content back by using the mask of the original image. This poses an interesting problem for deblurring applications: can we deblur without priors?

2.2 Hybrid Images

Next, we can try hybridizing images! For this section, we will superimpose two images together: one that is LPF-ed, leaving only LF components; one that is HPF-ed, leaving only HF components. The idea here is that when we look at the image from far away, our eyes are accustomed to seeing LF content, whereas if we look up close, we are used to seeing HF content. For Derek and Nutmeg, I aligned using the sparkle in their eye. I used a larger gaussian kernel (15) and a larger sigma for greater smoothing (10). I saved these images as grayscale since, when you choose images that have different intensities and color profiles, the hybrid images look a bit wacky. Images worked best when I had a larger kernel, with a greater \(\sigma\( for the HPF and smaller for the LPF. As a side note, I had to make sure my type conversions were correct (clip 0-255,then normalize) for when saving as gray scale, the images must be in uint8.

Derek (Low)

Nutmeg (High)

Hybrid

Here are a couple other I tried! This has great applications for biomimicry!

Seashell (Low)

Spiral Staircase (High)

Hybrid

Moth (Low)

Owl (High)

Hybrid

Van Gogh Self Portrait (Low)

The Girl with the Pearl Earring (High)

Hybrid

Here's a Fourier analysis of the paintings. The difference in sizing is a result of the alignment. You can notice there are some sidelobes in the Filtered Van Gogh. That may be a result of a windowing operation, perhaps in the alignment, or that our LPF was far too tight.

Van Gogh FFT

LPF-ed Van Gogh FFT

Pearl FFT

HPF-ed Pearl FFT

Hybrid FFT

Failure cases! Sometimes this hybrid imaging doesn't really work! I think it is because Taylor and Travis don't really have comparable face shapes, and if they do, Travis's beard gets in the way. His mouth is also a bit large, so I don't think the results look nearly as good as the previous ones. In this case, I also think I need to increase the $\sigma$ of Travis since I think his HF components need to be accentuated. You can see Taylor more clearly than Travis.

Taylor Swift (Low)

Travis Kelce (High)

Taylor and Travis Hybrid

Multi-Resolution Blending and the Oraple Journey

2.3 Gaussian and Laplacian Stacks

Here are the results of my Laplacian stacks with a binary half-half mask for creating the Oraple! Level 0 is the highest frequency layer, Level 2 medium frequencies, and Level 4 the lowest frequency layer! Additionally for the blending, I found that it was helpful to pass my mask through a LPF first to smooth out the sharp transitions. For all of the following blends, my parameters were as follows: $\text{depth} = 5, \sigma = 16, k = 6\sigma + 1$. Why the formula for the kernel size? I didn't want to manually guess and check kernel sizes, so I looked online for general heuristics on how kernel size should be determined.

Apple Level 0

Oraple Level 0

Orange Level 0

Apple Level 2

Oraple Level 2

Orange Level 2

Apple Level 4

Oraple Level 4

Orange Level 4

Apple Final

Oraple Final

Orange Final

2.4 Multi-Resolution Blending: the Oraple's final Journey!

People often say that Apple Park (AP) looks like a doughnut... does it? For this, I had to use the alignment starter code to align the donut and AP. It was hard to find images that had similar lighting.

Donut Level 0

Donut-AP Level 0

AP Level 0

Donut Level 2

Donut-AP Level 2

AP Level 2

Donut Level 4

Donut-AP Level 4

AP Level 4

Donut Final

Donut-AP Final

AP Final

An irregular mask, you say? This was inspired by my friend Viraj who wanted a catowl in the spirit of the catowl reddit page. I first aligned the cat and owl using the given starter code. Then, I used the aligned images to create my mask by positioning a circle on keynote to cover the owl's head. I took a screenshot of this and loaded it in as a numpy array. (Someone should teach me how to use Photoshop.)

Cat Level 0

Catowl Level 0

Owl Level 0

Cat Level 2

Catowl Level 2

Owl Level 2

Cat Level 4

Catowl Level 4

Owl Level 4

Cat Final

Catowl Final

Owl Final