Revision as of 03:42, 5 September 2013 by hamishwillee

Neural Network based Image Processing

From Nokia Developer Wiki

This article explains how to create an image processing application in which neural networks and the Fast Fourier Transform are used to predict the imaging filters a user is likely to choose.

Article Metadata
Code Example
Tested with
SDK: Windows Phone 8.0 SDK, Nokia Imaging SDK
Device(s): Nokia Lumia 925, Nokia Lumia 920
Compatibility
Platform(s): Windows Phone 8
Article
Created: galazzo (02 Aug 2013)
Last edited: hamishwillee (05 Sep 2013)


Introduction

The key concept behind this article is to show how to create an app that can learn from your behaviour when you take pictures - and after training autonomously decide what imaging filters or other settings to apply on your behalf.

The app will use a neural network to make the decisions, which you will train using information about the image - including features extracted using the Fast Fourier Transform (see Face Recognition using 2D Fast Fourier Transform for libraries and key concepts) and other values like GPS position, date, time, compass direction, etc. We'll also use the Nokia Imaging SDK for some of the processing (to apply a cartoon filter in order to normalise colours).

As in my previous articles, most of the heavy lifting has been done for you, in the form of code for a neural network library. The article also provides a (much simplified) overview of neural networks, which is well worth reading if you want to use the network in other applications - in particular, the sections on how to train your network can make a huge difference to your results.

Tip.pngTip: The neural network library shown in this article is generic and simple to use. It has been applied here to helping make decisions about image processing, but it can be used without change to solve any other problem that is susceptible to neural network based decision making.

App workflow

The application workflow is as follows:

  1. Take a new photo
  2. Apply the cartoon filter in order to normalize colours
  3. Apply the Fast Fourier Transform to extract key image features
  4. Feed the most important features into the neural network, along with other values such as GPS position, date/time, compass direction, and any other parameter you find useful

The selection of parameters assumes that your filter choice will be based on things like whether you're at home or on holiday (GPS position), whether it is summer or winter (date - I like my polarized colours in summer and different filters in winter!), whether it's morning or night (time), and the direction you're facing (compass).

Tip.pngTip: The selection is somewhat arbitrary/proof of concept. It might be that you're an expert photographer and your use case is to train your camera based on ISO settings, F-stop, etc and you want the camera to present your preferred options first. Use parameters that make sense to your app!

The neural network, based on all the information you provide and trained by your previous choices, will apply your preferred filter, categorize the image and upload it to your preferred SkyDrive directory.

Nokia Lumia devices have more than enough power for the needed computations.

Neural Networks

Neural networks are mathematical models, inspired by the human central nervous system, that are capable of machine learning and pattern recognition (i.e. one of the "tools" of artificial intelligence). They are usually presented as systems of interconnected neurons that compute values from inputs by feeding information through the network.

For example, in a neural network for handwriting recognition, a set of input neurons may be activated by the pixels of an input image representing a letter or digit. The activations of these neurons are then passed on, weighted and transformed by some function determined by the network's designer, to other neurons, etc., until finally an output neuron is activated that determines which character was read.

Like other machine learning methods, neural networks have been used to solve a wide variety of tasks that are hard to solve using ordinary rule-based programming, including computer vision and speech recognition.
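The computation performed by a single neuron - a weighted sum of its inputs passed through an activation function - can be sketched in a few lines. This is a language-neutral Python illustration of the concept, not part of the C# library used later in this article:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs, squashed into (0, 1) by a sigmoid activation."""
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# A neuron (with hand-chosen weights) that activates when both inputs are high
print(neuron([1.0, 1.0], [4.0, 4.0], -6.0))  # high activation
print(neuron([0.0, 0.0], [4.0, 4.0], -6.0))  # low activation
```

A full network simply wires many such neurons together, layer by layer, with the outputs of one layer becoming the inputs of the next.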

Here is an example of a neural network:

Back-propagation Neural Network

Note the presence of the "hidden layer". A neural network can have a number of intermediate hidden layers. Increasing the number of layers may (or may not) increase the quality of the result, but comes at the cost of increased processing time and (often more importantly) increased training time.

Tip.pngTip: While more complex problems may require many hidden layers, often "real-world" problems can be resolved with only one. For example, it is possible to autonomously drive a car with only one hidden layer [1].

Choosing the best number of neurons in a hidden layer is an "art" that depends heavily on the experience of the neural network designer. As a rule of thumb, on a single-hidden-layer network it is a good idea to start with a number that is similar to the number of input neurons.

Hidden layers are only required when the problem is not linearly separable (described in the next section).

Linear Separability

Consider two-input patterns (X1, X2) that can be classified into two groups, as shown in the figure below.

Each point, marked with either an x or an o, represents a pattern with a set of values (X1, X2). Each pattern is classified into one of two classes. Notice that these classes can be separated by a single line L; such patterns are known as linearly separable.

Linear separability refers to the fact that classes of patterns described by an n-dimensional vector x = (x1, x2, ..., xn) can be separated with a single decision surface. In the case above, the line L represents the decision surface.

LinearSeparability-NeuralNetwork-2.png

A single-layer neural network can categorize a set of patterns into two classes only when a linear threshold function can separate them. In other words, the two classes must be linearly separable for a single-layer neural network to function correctly. Indeed, this is the main limitation of a single-layer neural network.

The classic example of a linearly inseparable pattern is the logical exclusive-OR (XOR) function. The figure below illustrates XOR: the two classes, 0 (black dots) and 1 (white dots), cannot be separated with a single line. The patterns (X1, X2) can, however, be classified with two lines, L1 and L2.

LinearSeparability-NeuralNetwork.png

For the image processing use-case we're discussing in this article, the problem is not "linearly separable" - we can't strongly classify our choices and so we can't solve the problem with a single layer neural network. This means that we will need at least one hidden layer.
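To make the XOR example concrete, here is a minimal Python sketch of a network with one hidden layer, where two threshold units play the role of the two separating lines. The weights are hand-chosen for illustration rather than learned:

```python
def step(x):
    """Simple linear threshold unit."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    """Two hidden threshold units implement the two separating lines;
    the output unit fires only when both agree. No single line can do this."""
    h1 = step(x1 + x2 - 0.5)    # fires when at least one input is 1
    h2 = step(-x1 - x2 + 1.5)   # fires when at most one input is 1
    return step(h1 + h2 - 1.5)  # fires only when both hidden units fire

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # reproduces the XOR truth table
```

Removing the hidden layer leaves a single threshold unit, which can never reproduce this truth table - exactly the limitation described above.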


Working with the library

Create your own Windows Phone project in Visual Studio and import this library: Media:BPNeuralNetwork.zip

Inside you will find a file named BPNeuralNetwork.cs; add it to your project as an existing item and include the namespace:

using ArtificialIntelligence;

Initialise the Neural Network

The library was originally written in C++ and has since been ported to PHP and, of course, C#. We will use just the C# version. Here are the steps to initialise the net:

BPNeuralNetwork NeuralNetwork = new BPNeuralNetwork();
 
ulong[] layers = new ulong[3];
layers[0] = 25;
layers[1] = 60;
layers[2] = 45;
 
NeuralNetwork.SetData(0.1, layers, 3);
 
NeuralNetwork.RandomizeWB();

The function SetData() initializes the structure and behaviour of our neural network. Here is the signature:

public int SetData(double learning_rate, ulong[] layers, ulong tot_layers)
  • learning_rate: The learning rate is a very important parameter. Higher values decrease the training time (the number of training inputs needed) but are less accurate; lower values increase the training time but ultimately give better results. For our project this is a balance - we want accuracy, but we don't want to take a huge number of photos to train the net.
  • layers: An array defining the morphology/structure of the net. In our example we have the input layer (layers[0]) with 25 input neurons (the criteria we're basing our decision on), an intermediate "hidden" layer with 60 neurons, and an output layer with 45 neurons (one for each of the filters we might select).
  • tot_layers: The number of layers. This parameter exists for compatibility with the C++ version of the library and is not used in C#, where arrays expose the Length property.

The function RandomizeWB() initialises the net with random weights and biases, which the training process will then adjust to shape the net to your needs.

Storage Operations

A particularly interesting feature of this library is that it allows you to load and save your net in JSON format. This opens up opportunities to share your net for use on other devices or platforms - perhaps to share optimised or "pre-trained" nets with other users. There are plenty of other scenarios worth exploring!

We can modify our code as follows:

string NeuralNetworkName = "neural-network.json";
 
IsolatedStorageFile storage = IsolatedStorageFile.GetUserStoreForApplication();
if (storage.FileExists(NeuralNetworkName))
{
    NeuralNetwork.Load(NeuralNetworkName);
}
else
{
    ulong[] layers = new ulong[3];
    layers[0] = 25;
    layers[1] = 60;
    layers[2] = 45;
 
    NeuralNetwork.SetData(0.1, layers, 3);
    NeuralNetwork.RandomizeWB();
    NeuralNetwork.Save(NeuralNetworkName);
}

Training the Neural Network

As explained in the theory section, to train our neural network we need an input array and an output array containing the expected (correct) outputs:

double[] input = new double[25];
double[] output = new double[45];
 
// Do your init operation to fill properly the two arrays
NeuralNetwork.Train(input, output);

That's it! You can repeat this operation multiple times; each repetition reinforces the association and further shapes the net.
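The effect of repeated training can be illustrated with a single-neuron Python sketch. This uses the simple delta rule rather than the library's full back-propagation, and all values here are hypothetical; the point is only that each repetition nudges the weights so the output moves closer to the target:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w, b, x, target, rate):
    """One delta-rule update on a single sigmoid neuron."""
    out = sigmoid(w * x + b)
    error = target - out
    grad = error * out * (1 - out)   # error scaled by the sigmoid's derivative
    return w + rate * grad * x, b + rate * grad

w, b = 0.0, 0.0                      # untrained: output starts at 0.5
for _ in range(1000):
    w, b = train_step(w, b, 1.0, 1.0, 0.5)  # teach: input 1.0 -> output 1.0
print(sigmoid(w * 1.0 + b))          # approaches 1.0 as training repeats
```

Notice the role of the learning rate (0.5 here): a larger rate moves the output faster per repetition, at the cost of accuracy on more complex training sets - the same trade-off described for SetData() above.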

Using the Neural Network

This is the most interesting section: at this point we can "ask" the net something and get an "answer".

double[] input = new double[25];
 
// Init input vector as needed
 
NeuralNetwork.SetInputs(input);
double[] output = NeuralNetwork.GetOutput();

First, let's explain what this means. A common approach is to have an input element for each "fact" assumed during network modelling, and to represent each element as a value in the range 0 to 1. For example, the first entry (input[0]) might represent the fact that you are far from home, with a value of one for true and zero if you are close. The second entry might represent that it's spring, the third that it's summer, and so on.

  1. Far from home
  2. Spring
  3. Summer
  4. Winter
  5. Autumn
  6. Time (6 - 12) Morning
  7. Time (12 - 18) Afternoon
  8. Time (18 - 22) Evening
  9. Time (22 - 6) Night
  10. Pointing ( almost ) to North (Compass)
  11. Pointing to West (Compass)
  12. Pointing to East (Compass)
  13. Pointing to South (Compass)
  14. Picture luminosity
  15. Phone in Landscape mode

The parameters above are my suggestions and might not work perfectly in practice, but they serve to explain the format of our input array. So if you are far from home, in summer, in the late afternoon, pointing north and in landscape mode, you will have an input array as shown below:

[1],[0],[1],[0],[0],[0],[1],[0],[0],[1],[0],[0],[0],[0.7],[1]
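For illustration, assembling such a vector from the facts above might look like the following Python sketch. The helper and its argument names are hypothetical, not part of the library:

```python
def encode(far_from_home, season, hour, compass, luminosity, landscape):
    """Pack the 15 facts from the list above into an input vector.
    season in {'spring','summer','winter','autumn'}; compass in {'N','W','E','S'}."""
    v = [1.0 if far_from_home else 0.0]
    v += [1.0 if season == s else 0.0 for s in ('spring', 'summer', 'winter', 'autumn')]
    v += [1.0 if lo <= hour < hi else 0.0 for lo, hi in ((6, 12), (12, 18), (18, 22))]
    v += [1.0 if hour >= 22 or hour < 6 else 0.0]    # night wraps around midnight
    v += [1.0 if compass == c else 0.0 for c in ('N', 'W', 'E', 'S')]
    v += [luminosity, 1.0 if landscape else 0.0]     # luminosity already in 0..1
    return v

# Far from home, summer, late afternoon, pointing north, landscape:
print(encode(True, 'summer', 17, 'N', 0.7, True))
```

This reproduces the example array shown above; in the real app the remaining input neurons are filled with the FFT fingerprint values.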

Our output could be the Nokia Imaging SDK filter to apply:

  1. Cartoon
  2. Sepia
  3. Antique
  4. Lomo
  5. etc.

So if our output filter is the "Lomo" filter, then the output array would be:

[0],[0],[0],[1] ....


The net will "answer" with an array as shown below.

[0.023569],[0.15],[0.05366],[0.9348788] ....

The winning neuron is the one with the highest value - in this case the fourth neuron wins, which corresponds to the "Lomo" filter. The threshold at which the selection can be trusted depends on the use case and on how well trained the network is; as a rule of thumb, you would hope for more than 0.75 for the winning neuron on a trained network.
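Selecting the winner with a confidence threshold can be sketched as follows (an illustrative Python snippet, not the library's API):

```python
def winning_filter(output, threshold=0.75):
    """Return the index of the strongest output neuron, or None if no
    neuron is confident enough for the selection to be trusted."""
    index = max(range(len(output)), key=lambda i: output[i])
    return index if output[index] >= threshold else None

print(winning_filter([0.023569, 0.15, 0.05366, 0.9348788]))  # 3 -> "Lomo"
print(winning_filter([0.2, 0.3, 0.25, 0.31]))                # None: not trusted
```

When no neuron clears the threshold, a sensible fallback is to leave the choice to the user - which also produces a fresh training sample.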

Note.pngNote: As this article is about image processing, we focus on that domain. This is just an example, however - the library can be used in any domain; this tutorial simply shows the potential of the net.

Nokia Imaging SDK

There are a lot of other good wiki articles and tutorials on how to use the Nokia Imaging SDK and apply its filters. To avoid unnecessary repetition I will assume that you are already comfortable with the Nokia Imaging SDK and focus on how to use it, driven by a neural network.

In my project, I manage photo capture in MainPage, and I created a UserControl named PreviewComponent that is opened each time a photo is captured.

<UserControl x:Class="GeniusCam.Preview"
             xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
             xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
             xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
             xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
             mc:Ignorable="d"
             FontFamily="{StaticResource PhoneFontFamilyNormal}"
             FontSize="{StaticResource PhoneFontSizeNormal}"
             Foreground="{StaticResource PhoneForegroundBrush}"
             xmlns:telerikPrimitives="clr-namespace:Telerik.Windows.Controls;assembly=Telerik.Windows.Controls.Primitives"
             d:DesignHeight="480" d:DesignWidth="800">
 
    <Grid x:Name="LayoutRoot" Background="{StaticResource PhoneChromeBrush}">
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="640"/>
            <ColumnDefinition Width="160"/>
        </Grid.ColumnDefinitions>
 
        <Image x:Name="PreviewImage" Grid.Column="0" Height="480" Width="640" />
 
        <Grid Grid.Column="1">
            <Grid.RowDefinitions>
                <RowDefinition Height="400"/>
                <RowDefinition Height="80"/>
            </Grid.RowDefinitions>
            <telerikPrimitives:RadLoopingList x:Name="filterLoopingList"
                                              Height="480"
                                              ItemWidth="160"
                                              ItemHeight="60"
                                              ItemSpacing="3"
                                              Orientation="Vertical"
                                              IsCentered="True"
                                              Grid.Row="0"
                                              HorizontalAlignment="Center"
                                              VerticalAlignment="Center"
                                              SelectedIndexChanged="filterLoopingList_SelectedIndexChanged">
            </telerikPrimitives:RadLoopingList>
            <Button Grid.Row="1" Content="Apply" Click="Button_Click"></Button>
        </Grid>
    </Grid>
</UserControl>
The code-behind for the Preview control declares the neural network and sets it up in the constructor:

public event EventHandler FilterApplied;
 
private LoopingListDataSource src = null;
 
public int SelectedFilter { get; set; }
private WriteableBitmap CapturedImage { get; set; }
public WriteableBitmap OriginalImage { get; set; }
 
public string[] filters = new string[45];
 
private static int matchSamples = 50;
 
private Double[] pRealIn = null;
private Double[] pImagIn = null;
private Double[] pRealOut = null;
private Double[] pImagOut = null;
private Double[] pRealOut2 = null;
private Double[] pImagOut2 = null;
private Double[] fingerprint = null;
 
BPNeuralNetwork NeuralNetwork = new BPNeuralNetwork();
string NeuralNetworkName = "neural-network.json";
 
private Thread NeuralNetworkThread;
 
public Preview()
{
    InitializeComponent();
 
    IsolatedStorageFile storage = IsolatedStorageFile.GetUserStoreForApplication();
    if (storage.FileExists(NeuralNetworkName))
    {
        NeuralNetwork.Load(NeuralNetworkName);
    }
    else
    {
        ulong[] layers = new ulong[3];
        layers[0] = (ulong) matchSamples;
        layers[1] = (ulong)(matchSamples * 1.5);
        layers[2] = 45;
        NeuralNetwork.SetData(0.1, layers, 3);
        NeuralNetwork.RandomizeWB();
        NeuralNetwork.Save(NeuralNetworkName);
    }
 
    CapturedImage = new WriteableBitmap(PreviewImage, null);
 
    filters[0] = "Original";
    filters[1] = "Antique";
    filters[2] = "Auto Enhance";
 
    // .............
 
    filters[43] = "Watercolor";
    filters[44] = "White Balance";
 
    src = new LoopingListDataSource(45);
    src.ItemNeeded += (sender, args) =>
    {
        args.Item = new LoopingListDataItem(filters[args.Index]);
    };
    src.ItemUpdated += (sender, args) =>
    {
        args.Item.Text = filters[args.Index];
    };
 
    filterLoopingList.DataSource = src;
    filterLoopingList.SelectedIndex = 0;
}

The following portion of code manages filter selection:

private async void filterLoopingList_SelectedIndexChanged(object sender, EventArgs e)
{
    SelectedFilter = (sender as RadLoopingList).SelectedIndex;
 
    using (EditingSession editsession = new EditingSession(OriginalImage.AsBitmap()))
    {
        switch (SelectedFilter)
        {
            case 1: editsession.AddFilter(FilterFactory.CreateAntiqueFilter()); break;
            case 2:
            {
                AutoEnhanceConfiguration config = new AutoEnhanceConfiguration();
                editsession.AddFilter(FilterFactory.CreateAutoEnhanceFilter(config));
                break;
            }
            case 3: editsession.AddFilter(FilterFactory.CreateAutoLevelsFilter()); break;
            case 4: editsession.AddFilter(FilterFactory.CreateBlurFilter(BlurLevel.Blur3)); break;
 
            // ............
 
            case 42: editsession.AddFilter(FilterFactory.CreateWarpFilter(WarpEffect.Alien, (float) 0.5)); break;
            case 43: editsession.AddFilter(FilterFactory.CreateWatercolorFilter(0.5, 0.5)); break;
            case 44: editsession.AddFilter(FilterFactory.CreateWhiteBalanceFilter(WhiteBalanceMode.Mean, 128, 128, 128)); break;
 
            default: break;
        }
        await editsession.RenderToWriteableBitmapAsync(CapturedImage, OutputOption.PreserveAspectRatio);
        PreviewImage.Source = CapturedImage;
        CapturedImage.Invalidate();
    }
    GC.Collect();
}
public void SetFilterFromNeuralNetwork()
{
    int index = 0;
    double max = 0;
 
    double[] input = new double[matchSamples];
 
    ComputeFingerprint();
 
    // This step is needed because the triangular extraction can return
    // an array larger than the size requested
    Array.Copy(fingerprint, input, matchSamples);
 
    NeuralNetwork.SetInputs(input);
    double[] output = NeuralNetwork.GetOutput();
 
    for (int i = 0; i < output.Length; i++)
    {
        if (output[i] > max)
        {
            max = output[i];
            index = i;
        }
    }
 
    SelectedFilter = index;
    filterLoopingList.SelectedIndex = index;
}

We extract the key features of the acquired image (the fingerprint), pass it as input to the neural network and get the result. The winning neuron identifies the filter the net decides to use - or, more precisely, the index of that filter in the filter array. To select the winning neuron we simply look for the one with the maximum score.

Using the Preview component

void cam_CaptureThumbnailAvailable(object sender, Microsoft.Devices.ContentReadyEventArgs e)
{
    Deployment.Current.Dispatcher.BeginInvoke(delegate()
    {
        PreviewComponent.OriginalImage = PictureDecoder.DecodeJpeg(e.ImageStream);
        PreviewComponent.SetFilterFromNeuralNetwork();
        PreviewComponent.Visibility = System.Windows.Visibility.Visible;
        viewfinderCanvas.Visibility = System.Windows.Visibility.Collapsed;
    });
}

The idea is to extract the image features using the Fast Fourier Transform and to pass those values, together with the other parameters, to the neural network in order to train it and get answers from it.

By calling SetFilterFromNeuralNetwork we ask the net to select the appropriate filter to apply. Whatever the net's choice turns out to be, it will help to reinforce learning in our network.

To understand the FFT, please refer to the article Face Recognition using 2D Fast Fourier Transform.

Note.pngNote: For our purposes we don't need all the detail in the image - it is better not to complicate our neural net by providing detail that is not needed. As a result, even though we have the computational power to work with high resolution images, we instead use the preview image at a resolution of 640 x 480. We also apply the Cartoon filter to normalise the image (further removing unnecessary information). Only then do we extract the most useful information using the FFT.

Fast Fourier Transform

Download the DSP library and add it to your project as an existing item: Media:DSP.zip

using DSP;
 
private void ComputeFingerprint()
{
    int width = OriginalImage.PixelWidth;
    int height = OriginalImage.PixelHeight;
 
    int np2Width = (int)DSP.FourierTransform.NextPowerOfTwo((uint)width);
    int np2Height = (int)DSP.FourierTransform.NextPowerOfTwo((uint)height);
    int[] pixel = OriginalImage.Pixels;
 
    pRealIn = new Double[np2Width * np2Height];
    pImagIn = new Double[np2Width * np2Height];
    pRealOut = new Double[np2Width * np2Height];
    pImagOut = new Double[np2Width * np2Height];
    pRealOut2 = new Double[np2Width * np2Height];
    pImagOut2 = new Double[np2Width * np2Height];
 
    int color = -1;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            color = pixel[x + (y * width)];
            pRealIn[x + (y * np2Width)] = DSP.Utilities.ColorToGray(color) & 0xFF;
        }
    }
 
    DSP.FourierTransform.Compute2D((uint)np2Width, (uint)np2Height, ref pRealIn, null, ref pRealOut, ref pImagOut, false);
    fingerprint = DSP.Utilities.TriangularExtraction(ref pRealOut, (uint)np2Width, (uint)np2Height, (uint)matchSamples);
}

This function extracts the key features of our image and stores the most important parameters in an array called fingerprint, which becomes our net's input.
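The idea behind ComputeFingerprint can be sketched in Python: transform the grayscale image into the frequency domain, then keep only the low-frequency coefficients near the origin (the triangular region), where most of the image's energy lives. This is a naive, illustrative sketch - the article's C# DSP library uses a fast FFT instead of the slow direct transform shown here:

```python
import cmath

def dft2_magnitude(img):
    """Naive 2D DFT magnitude of a small grayscale image (for illustration)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = sum(img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                    for y in range(h) for x in range(w))
            out[u][v] = abs(s)
    return out

def triangular_extraction(spectrum, samples):
    """Walk the low-frequency anti-diagonals from (0,0) and keep `samples`
    values, mirroring the idea behind DSP.Utilities.TriangularExtraction."""
    feats = []
    d = 0
    while len(feats) < samples:
        for x in range(d + 1):
            feats.append(spectrum[d - x][x])
        d += 1
    return feats[:samples]

img = [[(x + y) % 4 for x in range(8)] for y in range(8)]  # any small test image
fp = triangular_extraction(dft2_magnitude(img), 10)
print(len(fp))  # 10
print(fp[0])    # the DC component: the sum of all pixel values
```

The first coefficient is the DC component (overall brightness); the following ones capture progressively finer structure, which is why a small triangular slice makes a compact fingerprint.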

Training the Net based on our choices

Finally, once the applied filter has finished all its operations, we use the user's choice to train our neural network. When the Apply button is clicked (Button_Click) we start a thread that executes the net's training code.

private void Button_Click(object sender, RoutedEventArgs e)
{
    NeuralNetworkThread = new System.Threading.Thread(TrainNeuralNetwork);
    NeuralNetworkThread.Start();
 
    EventHandler handler = FilterApplied;
    if (handler != null)
    {
        handler(this, e);
    }
}
 
private void TrainNeuralNetwork()
{
    ComputeFingerprint();
 
    double[] output = new double[filters.Length];
    for (int i = 0; i < output.Length; i++)
    {
        output[i] = (i == SelectedFilter) ? 1 : 0;
    }
 
    NeuralNetwork.Train(fingerprint, output);
 
    // Save the neural network after training
    NeuralNetwork.Save(NeuralNetworkName);
}

As we saw earlier, to train the neural net we need an input (the image fingerprint) and an output array that corresponds to the choice made. The output is created simply by initializing an array of double whose size matches the last layer of our net, setting all elements to zero except the one that corresponds to the selected filter.

Summary

A lot of people think of artificial intelligence as a "dark art" that requires acres of data rooms and super-computer-like performance - and in some cases that is true! However as you can see from this article, the modern smartphone has enough power to compute complex neural networks for solving real-world tasks.

I hope this article opens up opportunities for developers to create more "intelligent" applications.
