
Face Recognition using 2D Fast Fourier Transform

From Nokia Developer Wiki

This article explains how to implement a simple face recognition system based on the analysis of facial images through their Fourier spectrum. Recognition is done by finding the closest match between feature vectors containing the Fourier coefficients at selected frequencies. The method compares well with other competing approaches.
Although the topic is certainly complex, the approach used and the tools already implemented make the process easy to follow for all users. So don't be frightened by appearances.

Article Metadata
Tested with
Devices(s): Nokia Lumia
Platform(s): Windows Phone
Windows Phone 8
Keywords: Face Recognition, Image Processing
Created: galazzo (12 Nov 2012)
Last edited: galazzo (29 Nov 2012)



The face is one of several features which can be used to uniquely identify a person, and it is the characteristic that we most commonly use to recognize others. No two human faces are identical, which makes them well suited for use in identification.
Besides being a challenging problem in itself, the importance of face recognition systems lies in their potential applications, such as access control, passport verification, and so on.
The obvious advantage of a face recognition system compared to competing methods is its low level of intrusion: it only requires looking into a camera.
Automated face recognition systems have generally evolved along two main routes: either the analysis of grey-level information (often called template-based methods) or the extraction of mainly geometrical features such as shape, profile or hair colour.
The work presented here is a template-based approach that, considering its simple algorithm, compares very well with other more complex methods in common use, such as Hidden Markov Models or back-propagation neural networks.

Humans are thought to view faces primarily in a holistic manner, and experiments suggest that holistic approaches are superior to geometrical recognition systems.
The technique presented is based on the Fourier spectrum of facial images, so it relies on a global transformation: every pixel in the image contributes to each value in the spectrum.
The Fourier spectrum is a plot of energy against spatial frequency, where spatial frequencies relate to the spatial arrangement of intensities in the image. In our case this translates to distances between areas of particular brightness, such as the overall area of the head or the distance between the eyes.
Higher frequencies describe finer details and, contrary to what you might expect, they turn out to be less useful for identification, just as humans can recognise a face from a brief look without focusing on small details.
Faces are recognised by finding the closest match (the smallest difference, or distance) between the newly presented face and all the faces known to the system. The distances are calculated between feature vectors whose entries are the Fourier transform values at specially chosen frequencies. As few as 30 frequencies yield excellent results.
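The matching scheme just described can be sketched in a few lines. This is a minimal NumPy illustration, not the article's Windows Phone code: the feature selection (here simply the first coefficients in row-major order) and the face database are hypothetical placeholders.

```python
import numpy as np

def feature_vector(image, n_freq=30):
    """Magnitudes of a fixed set of Fourier coefficients (here simply the
    first n_freq in row-major order, for illustration)."""
    spectrum = np.abs(np.fft.fft2(image))
    return spectrum.flatten()[:n_freq]

def closest_match(new_face, known_faces):
    """Return the name of the known face whose feature vector is closest
    (smallest mean squared distance) to the new face."""
    target = feature_vector(new_face)
    distances = {name: float(np.mean((feature_vector(img) - target) ** 2))
                 for name, img in known_faces.items()}
    return min(distances, key=distances.get)
```

Given a dictionary of known faces, `closest_match` simply returns the entry with the smallest distance to the presented face.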

Fast Fourier Transform

The Fourier Transform is an important image processing tool which is used to decompose an image into its sine and cosine components. The output of the transformation represents the image in the Fourier or frequency domain, while the input image is the spatial domain equivalent. In the Fourier domain image, each point represents a particular frequency contained in the spatial domain image.

The Fourier Transform is used in a wide range of applications, such as image analysis, image filtering, image reconstruction, image compression and text orientation finding, which will (hopefully) be covered in further articles.

The Fast Fourier Transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. The discrete Fourier transform (DFT) transforms one function into another, which is called the frequency domain representation of the original function. The DFT requires an input function that is discrete. Such inputs are often created by sampling a continuous function, such as a person's voice. The discrete input function must also have a limited (finite) duration, such as one period of a periodic sequence or a windowed segment of a longer sequence.

The DFT is the sampled Fourier Transform and therefore does not contain all frequencies forming an image, but only a set of samples which is large enough to fully describe the spatial domain image. The number of frequencies corresponds to the number of pixels in the spatial domain image, i.e. the image in the spatial and Fourier domain are of the same size.
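This one-to-one correspondence between pixels and frequency samples is easy to check with a quick sketch (NumPy used here purely for illustration):

```python
import numpy as np

image = np.arange(64.0).reshape(8, 8)   # a small 8x8 "image"
spectrum = np.fft.fft2(image)

# The Fourier-domain image has exactly as many samples as the spatial image
assert spectrum.shape == image.shape

# ...and those samples fully describe it: the inverse transform recovers it
restored = np.fft.ifft2(spectrum).real
assert np.allclose(restored, image)
```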

For a square image of size N×N, the two-dimensional DFT is given by:

    F(k,l) = Σ_{a=0}^{N-1} Σ_{b=0}^{N-1} f(a,b) · e^{-i2π(ka/N + lb/N)}

where f(a,b) is the image in the spatial domain and the exponential term is the basis function corresponding to each point F(k,l) in the Fourier space. The equation can be interpreted as: the value of each point F(k,l) is obtained by multiplying the spatial image with the corresponding base function and summing the result.
The Fourier Transform produces a complex number valued output image which can be displayed with two images, either with the real and imaginary part or with magnitude and phase. In image processing, often only the magnitude of the Fourier Transform is displayed, as it contains most of the information of the geometric structure of the spatial domain image. However, if we want to re-transform the Fourier image into the correct spatial domain after some processing in the frequency domain, we must make sure to preserve both magnitude and phase of the Fourier image.
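The point about preserving both parts of the complex output can be verified directly (a NumPy sketch, not the article's device code):

```python
import numpy as np

image = np.random.default_rng(1).random((8, 8))
F = np.fft.fft2(image)

magnitude = np.abs(F)   # what is usually displayed
phase = np.angle(F)

# Keeping both magnitude and phase preserves all information:
recombined = magnitude * np.exp(1j * phase)
restored = np.fft.ifft2(recombined).real
assert np.allclose(restored, image)
```

Discarding the phase, by contrast, would make the spatial image unrecoverable.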

The Fourier domain image has a much greater range of values than the image in the spatial domain. Hence, to be sufficiently accurate, its values are usually calculated and stored as floating-point numbers.
The Fourier Transform is used if we want to access the geometric characteristics of a spatial domain image. Because the image in the Fourier domain is decomposed into its sinusoidal components, it is easy to examine or process certain frequencies of the image, thus influencing the geometric structure in the spatial domain.

In most implementations the Fourier image is shifted in such a way that the DC-value (i.e. the image mean) F(0,0) is displayed in the center of the image. The further away from the center an image point is, the higher is its corresponding frequency.
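The shift of the DC value to the centre can be illustrated with NumPy's `fftshift` (an illustration only; the indices below assume a 8×8 example):

```python
import numpy as np

image = np.random.default_rng(2).random((8, 8))
F = np.fft.fft2(image)

# Unshifted, the DC value F(0,0) -- the image mean times N*N -- sits at the corner
assert np.isclose(F[0, 0].real, image.sum())

# fftshift moves the DC value to the centre of the spectrum
shifted = np.fft.fftshift(F)
assert np.isclose(shifted[4, 4].real, image.sum())
```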

For a slightly deeper view, see Sound pattern matching using Fast Fourier Transform in Windows Phone.

Selecting Frequencies

From the spectrum it can be seen that almost all the information is contained near the centre, within the low frequencies. It is therefore reasonable to expect that these frequencies also provide the best ground for the recognition process. The valuable frequencies do not lie in a circle around the origin but rather in a rhombus-shaped region.
Because the input image is real-valued, the second half of the FFT carries only duplicated (conjugate-symmetric) information, so we can halve the data to process. As the 2D FFT is built as two passes of 1D FFTs, this means we can focus on just one quadrant, reducing the data to process even further.
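Extracting a triangular low-frequency region from one quadrant can be sketched as follows (a NumPy illustration, similar in spirit to the triangularExtraction helper used later, not its actual implementation):

```python
import numpy as np

def triangular_extraction(spectrum, n):
    """Collect the coefficients (k, l) of the first quadrant with k + l < n,
    i.e. the low-frequency corner of the rhombus around the origin."""
    return np.array([spectrum[k, l]
                     for k in range(n)
                     for l in range(n - k)])

image = np.random.default_rng(3).random((16, 16))
magnitude = np.abs(np.fft.fft2(image))
features = triangular_extraction(magnitude, 5)

# k + l < 5 gives 5 + 4 + 3 + 2 + 1 = 15 coefficients
assert features.size == 15
```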

Working with 2D FFT in Windows Phone

  • Download DSP.cs and add it to your project. DSP.cs provides a namespace called DSP and a class FourierTransform containing a set of functions to compute the FFT.

Note.pngNote: It is not possible to use the camera while the Zune software is connected. To avoid this problem, close Zune and launch WPConnect.exe

Don't forget to include the namespace DSP

using DSP;
using Microsoft.Phone.Tasks; // Needed for the camera tasks
using Microsoft.Phone;       // Needed for PictureDecoder
using System.Windows.Media.Imaging;

namespace Face_Recognition
{
    public partial class MainPage : PhoneApplicationPage
    {
        CameraCaptureTask ctask;
        PhotoChooserTask photo;
        public static WriteableBitmap CapturedImage;

        private static int W = 256;
        private static int H = 256;
        private static int matchSamples = 25;

        private double[] compareSignal = new Double[matchSamples];
        private Double[] pRealIn   = new Double[W * H];
        private Double[] pImagIn   = new Double[W * H];
        private Double[] pRealOut  = new Double[W * H];
        private Double[] pImagOut  = new Double[W * H];
        private Double[] pRealOut2 = new Double[W * H];
        private Double[] pImagOut2 = new Double[W * H];

        public MainPage()
        {
            InitializeComponent();

            // Create new instances of the capture and chooser tasks
            ctask = new CameraCaptureTask();
            photo = new PhotoChooserTask();

            // Register the event handler called when a photo is captured or chosen
            ctask.Completed += new EventHandler<PhotoResult>(ctask_Completed);
            photo.Completed += new EventHandler<PhotoResult>(ctask_Completed);

            photo.PixelHeight = H;
            photo.PixelWidth = W;
        }

        private void ApplicationBarIconButton_Click(object sender, EventArgs e)
        {
            // Show the camera
            ctask.Show();
        }

        private void ApplicationBarIconButton_Click_1(object sender, EventArgs e)
        {
            photo.ShowCamera = true;
            photo.Show();
        }

Converting a pixel to grayscale

Here is a useful function to convert a coloured pixel into grayscale. This operation lets us reduce the amount of computation:

        internal int ColorToGray(int color)
        {
            int a = (color >> 24) & 0xFF;
            int r = (color & 0x00ff0000) >> 16;
            int g = (color & 0x0000ff00) >> 8;
            int b = (color & 0x000000ff);

            // Already gray: return the pixel unchanged
            if ((r == g) && (g == b))
                return color;

            // Approximate the standard luminance I = 0.299*R + 0.587*G + 0.114*B
            // using integer arithmetic: (19*R + 38*G + 7*B + 32) / 64
            int i = (19 * r + 38 * g + 7 * b + 32) >> 6;

            return (a << 24) | ((i & 0xFF) << 16) | ((i & 0xFF) << 8) | (i & 0xFF);
        }


        void ctask_Completed(object sender, PhotoResult e)
        {
            if (e.TaskResult == TaskResult.OK && e.ChosenPhoto != null)
            {
                // Take the JPEG stream and decode it into a WriteableBitmap object
                MainPage.CapturedImage = PictureDecoder.DecodeJpeg(e.ChosenPhoto, W, H);
                int[] pixel = MainPage.CapturedImage.Pixels;

                // Extract each pixel from the original image and convert it into a gray-scaled value
                for (int y = 0; y < MainPage.CapturedImage.PixelHeight; y++)
                {
                    for (int x = 0; x < MainPage.CapturedImage.PixelWidth; x++)
                    {
                        pRealIn[x + (y * MainPage.CapturedImage.PixelWidth)] =
                            ColorToGray(pixel[x + (y * MainPage.CapturedImage.PixelWidth)]) & 0xFF;
                    }
                }

                // Compute the 2D FFT
                DSP.FourierTransform.Compute2D((uint)W, (uint)H, ref pRealIn, null, ref pRealOut, ref pImagOut, false);

                // Extract the quarter of the rhombus containing the most valuable frequencies
                Double[] match = DSP.Utilities.triangularExtraction(ref pRealOut, (uint)W, (uint)H, (uint)matchSamples, 0);

                // Compute the difference between the captured image and the image to compare
                thresholdText.Text = "" + DSP.Utilities.MSE(ref compareSignal, ref match, (int)matchSamples);

                // Restore the original image by computing the inverse 2D FFT
                DSP.FourierTransform.Compute2D((uint)W, (uint)H, ref pRealOut, pImagOut, ref pRealOut2, ref pImagOut2, true);

                int color = 0;
                for (int y = 0; y < MainPage.CapturedImage.PixelHeight; y++)
                {
                    for (int x = 0; x < MainPage.CapturedImage.PixelWidth; x++)
                    {
                        color = (int)Math.Floor(pRealOut2[x + (y * MainPage.CapturedImage.PixelWidth)]);
                        // Clamp to the valid 0..255 range
                        color = (color > 0) ? ((color > 255) ? 255 : color) : 0;
                        MainPage.CapturedImage.Pixels[x + (y * MainPage.CapturedImage.PixelWidth)] =
                            (0xFF << 24) | (color << 16) | (color << 8) | color;
                    }
                }

                // Populate the image control with the WriteableBitmap object
                MainImage.Source = MainPage.CapturedImage;
            }
            // else: the user decided not to take a picture
        }


Assuming the threshold is 50:

(Table: Image A | Image B | Result — the example comparison images are not reproduced here.)
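Such a thresholded comparison can be sketched as follows (a NumPy illustration; this mse mirrors the MSE helper used above only in spirit, and the feature vectors are made-up values):

```python
import numpy as np

def mse(a, b):
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def same_face(features_a, features_b, threshold=50.0):
    """Accept the match when the MSE between the feature vectors
    stays below the chosen threshold."""
    return mse(features_a, features_b) < threshold

a = np.array([10.0, 20.0, 30.0])
assert same_face(a, a + 1.0)        # MSE = 1.0, below the threshold
assert not same_face(a, a + 100.0)  # MSE = 10000.0, above the threshold
```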
