Nokia Imaging SDK in native code
This article explains how to use the Nokia Imaging SDK in native code using C++/CX. It also explains how to use ARM NEON intrinsic instructions to optimize performance.

Note: This is an entry in the Nokia Imaging and Big UI Wiki Competition 2013Q4.

Article Metadata
Tested with: Nokia Imaging SDK 1.0
Device(s): Nokia Lumia 1020, Nokia Lumia 925, Nokia Lumia 820
Platform(s): Windows Phone 8
Created: galazzo (13 Nov 2013)
Last edited: hamishwillee (09 Dec 2013)


Introduction

This article shows how to work with the Nokia Imaging SDK in native code using C++/CX component extensions. This gives us the opportunity to use optimization techniques offered by the platform, like parallelization, and also to use mathematical tricks to speed up computationally expensive operations (for example: division). Lastly, it allows us to use ARM NEON intrinsics, a Single Instruction, Multiple Data (SIMD) instruction set, to leverage the ability of modern processors to operate on multiple data items simultaneously. Together with the highly optimized Nokia Imaging SDK, this will give the final boost to our image processing applications.

The article first describes how to create a C++ Runtime Component, setting up the camera and basic operations in C++, followed by an explanation of how to use the Imaging SDK. It closes with sections introducing the optimization techniques described above.

Some techniques, especially ARM NEON, will appear difficult at first (and second) glance. They are not easy, and this time I can't promise they will become easy to use soon, but I'll guide you step by step so that you understand how they work and can use them yourself.

Setup the project

Developing in native code doesn't mean we forget C# and XAML. C++/CX is used for the heavy computations and the most performance-critical operations, but this comes at a cost in ease of use, so the application core remains in C# and XAML and we set up the project with techniques we already know.

Create your project, remove all items inside the default grid, and set the orientation to LandscapeLeft mode:

    ...
SupportedOrientations="Landscape" Orientation="LandscapeLeft"
shell:SystemTray.IsVisible="False">
<Grid x:Name="LayoutRoot" Background="Transparent">
 
</Grid>
</phone:PhoneApplicationPage>

Prepare the Canvas

Add the container that will show the stream coming from the camera

<Grid x:Name="LayoutRoot" Background="Transparent">
<Canvas x:Name="viewfinderCanvas" Width="640" Height="480" Tap="OnTakePhotoNativeClicked" >
<Canvas.Background>
<VideoBrush x:Name="video" Stretch="UniformToFill">
<VideoBrush.RelativeTransform>
<CompositeTransform x:Name="previewTransform" CenterX=".5" CenterY=".5" />
</VideoBrush.RelativeTransform>
</VideoBrush>
</Canvas.Background>
</Canvas>
</Grid>

RelativeTransform will be used to rotate the canvas displaying the stream when the device is rotated. Now it's time to set up the camera using the PhotoCaptureDevice object, which means it's already time to dive into native code. To do that we first need to create a C++ Windows Phone Runtime component.

Creating a C++ Windows Phone Runtime component

A C++ Windows Phone Runtime component is a library that can be called from C# managed code, with the added advantage that once developed it can be shared with other applications. You have two choices: create a new project outside your app project, or create one included in it. If you do not plan to distribute the library, my suggestion is to create it inside the main project, as that makes everything easier to manage together.

Creating and adding to main project a C++ Windows Phone Runtime component

Name it NativeCamera; if everything went well it should appear like this:

ImagingNativeCode02.png

NativeCamera.h and NativeCamera.cpp are the main files to work on. Open NativeCamera.h. I suggest renaming the main class to something shorter and more understandable, from:

namespace NativeCamera
{
public ref class WindowsPhoneRuntimeComponent sealed
{
public:
WindowsPhoneRuntimeComponent();
};
}

to something like this:

namespace NativeCamera
{
public ref class CameraComponent sealed
{
public:
CameraComponent();
};
}

Don't forget to also change the name in the constructor inside NativeCamera.cpp:

CameraComponent::CameraComponent()
{
}

Include the following headers and namespace

#include <Windows.Phone.Media.Capture.h>
#include <Windows.Phone.Media.Capture.Native.h>
 
using namespace Windows::Phone::Media::Capture;

Now we can define the camera object and some properties:

namespace NativeCamera
{
public ref class CameraComponent sealed
{
public:
CameraComponent();
 
property PhotoCaptureDevice^ CaptureDevice;
property Windows::Foundation::Size CaptureResolution;
 
property bool IsOpened
{
bool get()
{
return m_opened;
}
}
private:
bool m_opened;
};
}

We can immediately notice that properties can be defined with getters and setters as in C#, and that these public properties will be accessible from C# code. Also, those familiar with C++ will note the strange "^" (hat) next to the object type, used like a pointer.

Objects

WinRT objects are created, or activated, using ref new and assigned to variables declared with the ^ (hat) notation inherited from C++/CLI.

Foo^ foo = ref new Foo();

Reference counting

A WinRT object is reference counted and thus behaves similarly to an ordinary C++ object held in a shared_ptr. An object is deleted when no references to it remain. There is no garbage collection involved.
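For example, a minimal sketch of how the reference count behaves (the counts in the comments are illustrative):

Foo^ foo = ref new Foo(); // one reference to the new object
Foo^ other = foo;         // two references to the same object
foo = nullptr;            // back to one; the object stays alive through 'other'
other = nullptr;          // zero references; the object is destroyed deterministically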

Init Camera

In NativeCamera.h define the following functions:

  • InitCapture() to init the camera
  • GetSequence() to take the shot
  • ThrowIfFailed(HRESULT hr) to manage exceptions


and the following fields

  • m_cameraCaptureSequence, a CameraCaptureSequence object used to perform the capture operation
  • pNativeFrame, an ICameraCaptureFrameNative object to manage the captured frame
  • m_camera_sensor_location, a CameraSensorLocation object to select the back or front camera at init
  • pBuffer, a byte* pointer to the raw captured data
  • m_bufferSize, to store the size of the raw captured buffer


public ref class CameraComponent sealed
{
public:
CameraComponent();
 
property PhotoCaptureDevice^ CaptureDevice;
property Windows::Foundation::Size CaptureResolution;
 
public:
IAsyncAction^ InitCapture();
IAsyncAction^ GetSequence();
 
private:
bool m_opened;
 
CameraCaptureSequence^ m_cameraCaptureSequence;
ICameraCaptureFrameNative *pNativeFrame;
 
CameraSensorLocation m_camera_sensor_location;
 
DWORD m_bufferSize;
byte* pBuffer;
 
private:
static void ThrowIfFailed(HRESULT hr);
};

In NativeCamera.cpp add the following code:

#include "pch.h"
#include "NativeCamera.h"
 
#include <ppltasks.h>
 
using namespace NativeCamera;
using namespace Platform;
 
using namespace concurrency;
 
CameraComponent::CameraComponent()
{
}
 
IAsyncAction^ CameraComponent::InitCapture()
{
return create_async([this]
{
if( m_opened == true )
{
delete CaptureDevice;
m_opened = false;
}
 
Windows::Foundation::Collections::IVectorView<Size> ^availableSizes = PhotoCaptureDevice::GetAvailableCaptureResolutions(m_camera_sensor_location);
Windows::Foundation::Collections::IIterator<Windows::Foundation::Size> ^availableSizesIterator = availableSizes->First();
 
IAsyncOperation<PhotoCaptureDevice^> ^openOperation = nullptr;
if (availableSizesIterator->HasCurrent)
{
CaptureResolution = availableSizesIterator->Current;
 
int size = (int) (CaptureResolution.Width * CaptureResolution.Height*4);
pBuffer = new byte[size];
 
memset(pBuffer, 0, size);
 
openOperation = PhotoCaptureDevice::OpenAsync(m_camera_sensor_location, availableSizesIterator->Current);
m_opened = true;
} else
{
throw ref new FailureException("Can't open the camera");
}
 
return create_task(openOperation).then([this](PhotoCaptureDevice^ photoCaptureDevice)
{
::OutputDebugString(L"+[WindowsPhoneRuntimeComponent::InitCapture] => OpenAsync Completed\n");
this->CaptureDevice = photoCaptureDevice;
 
m_cameraCaptureSequence = CaptureDevice->CreateCaptureSequence(1);
 
return CaptureDevice->PrepareCaptureSequenceAsync(m_cameraCaptureSequence);
 
}).then([]()
{
::OutputDebugString(L"+[WindowsPhoneRuntimeComponent::InitCapture] => PrepareAsync Completed\n");
});
});
}
 
IAsyncAction^ CameraComponent::GetSequence()
{
return concurrency::create_async([this]()
{
if (pNativeFrame != NULL)
{
pNativeFrame->UnmapBuffer();
}
 
create_task( CaptureDevice->FocusAsync() ).wait();
 
create_task( m_cameraCaptureSequence->StartCaptureAsync() ).then([this]()
{
::OutputDebugString(L"+[HDR RuntimeComponent::CaptureImage] => Capture Sub exposed frame Completed \n");
 
CameraCaptureFrame^ frame = m_cameraCaptureSequence->Frames->GetAt(0);
 
HRESULT hr = reinterpret_cast<IUnknown*>(frame)->QueryInterface(__uuidof(ICameraCaptureFrameNative ), (void**) &pNativeFrame);
 
if (NULL == pNativeFrame || FAILED(hr))
{
throw ref new FailureException("Unable to QI ICameraCaptureFrameNative");
}
 
m_bufferSize=0;
byte* pixelBuffer=NULL;
pNativeFrame->MapBuffer(&m_bufferSize, &pixelBuffer); // Map the frame; pixels are in pixelBuffer.
memcpy(pBuffer, pixelBuffer, m_bufferSize);
 
}, task_continuation_context::use_current()).wait();
 
});
}
 
inline void CameraComponent::ThrowIfFailed(HRESULT hr)
{
if (FAILED(hr))
{
// Set a breakpoint on this line to catch Win32 API errors.
throw Platform::Exception::CreateException(hr);
}
}

Adding reference to the main project

Once the component is created, it must be added to the reference list in order to be used by the managed main project. Compile NativeCamera by right clicking on the project name -> Build. Do not forget that the target must be ARM. Then add a reference to NokiaWikiCompetition2013Q4 and browse to the inner directory:

..\NokiaWikiCompetition2013Q4\ARM\Release\NativeCamera

select NativeCamera.winmd

Init native component in managed code

In order to use the component it must be instantiated from managed code. I put it into App.xaml.cs for convenience, but of course it can be instantiated wherever you need it or think is better. Wherever you put it, this is the code:

public static NativeCamera.CameraComponent Camera = new NativeCamera.CameraComponent();

Let's import the namespaces:

using Windows.Phone.Media.Capture;   // For advanced capture APIs
using Microsoft.Xna.Framework.Media; // For the media library
using System.IO; // For the memory stream
 
using NativeCamera;

Create a PhotoCaptureDevice to get access to the camera object created in native code:

private PhotoCaptureDevice ManagedCamera;
protected override async void OnNavigatedTo(NavigationEventArgs e)
{
await App.Camera.InitCapture();
ManagedCamera = App.Camera.CaptureDevice;
}

Imaging SDK

The first step is to install the Nokia imaging SDK by following the instructions in the Lumia Developers' Library: Download and add the libraries to the project.

The proposed approach implements an interface that allows the user to create the filter chain from managed code and then pass the list to the native component, to be processed in C++.

First we need to create a quick UI to easily manage filters. We will create a list of ToggleSwitch controls to allow the user to select the desired filters.

Let's install the Windows Phone Toolkit and add the following assembly reference:

xmlns:toolkit="clr-namespace:Microsoft.Phone.Controls;assembly=Microsoft.Phone.Controls.Toolkit"

Now you can add the following code to your XAML:

<ScrollViewer VerticalAlignment="Top" HorizontalAlignment="Left" Width="213" Height="460" Margin="0,10,0,0">
<Grid Width="180">
<Grid.RowDefinitions>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
<RowDefinition Height="Auto"/>
</Grid.RowDefinitions>
<toolkit:ToggleSwitch Grid.Row="1" Name="RadRedColorAdjust" Header="Red Adjust Color" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="2" Name="RadColorBoost" Header="Color Boost" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="3" Name="RadCartoon" Header="Cartoon" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="4" Name="RadSketch" Header="Sketch" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="5" Name="RadBlur" Header="Blur" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="6" Name="RadLomo" Header="Lomo" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="7" Name="RadSepia" Header="Sepia" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="8" Name="RadSolarize" Header="Solarize" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="9" Name="RadStamp" Header="Stamp" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="10" Name="RadNegative" Header="Negative" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="11" Name="RadContrast" Header="Contrast" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="12" Name="RadBrightness" Header="Brightness" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="13" Name="RadTemperature" Header="Temperature" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
<toolkit:ToggleSwitch Grid.Row="14" Name="RadTint" Header="Tint" Width="240" HorizontalAlignment="Left" IsChecked="False"/>
</Grid>
</ScrollViewer>

Adding the Imaging SDK reference to the native component

This is one of the most important steps: adding the reference to the Nokia Imaging SDK in the native component. Right click on the NativeCamera project and click on References...

ImagingNativeCode03.png

A window like the following will appear:

ImagingNativeCode04.png

Click on 'Add new reference' and browse to the '....\NokiaWikiCompetition2013Q4\packages\NokiaImagingSDK.x.x.xxx.x\lib\wp8\ARM' directory inside your project.

ImagingNativeCode05.png

Select both files and add them to the project.

ImagingNativeCode06.png

Now we can finally add the following namespaces:

using namespace Nokia::Graphics::Imaging;
using namespace Nokia::InteropServices::WindowsRuntime;

C++ Interface

It's now time to create an interface in the native component to pass the filter chain. The approach used is to create an IVector<IFilter^>^ that will be seen in managed code as an IList<IFilter>.

In NativeCamera.h let's add the following lines:

using namespace Windows::Foundation::Collections;
property IVector<IFilter^>^ Filters;
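Note that an IVector<IFilter^>^ property must be backed by a concrete vector before managed code can add filters to it. A minimal sketch, assuming we initialize it in the CameraComponent constructor with Platform::Collections::Vector from <collection.h>:

#include <collection.h>
 
CameraComponent::CameraComponent()
{
// Back the Filters property with a concrete vector so that managed
// code can call Filters.Add(...) and Filters.Clear().
Filters = ref new Platform::Collections::Vector<IFilter^>();
}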

The next step is to create the Process() function, responsible for all image manipulation operations, such as applying the filters and the final rendering:

IAsyncActionWithProgress<int>^ CameraComponent::Process()
{
return concurrency::create_async([this] (progress_reporter<int> reporter, cancellation_token ct)
{
cancellation_token_source cts; auto token = cts.get_token();
 
BitmapImageSource^ bis = nullptr;
 
Bitmap^ BitmapToProcess = AsBitmapNV12(pBuffer, (unsigned int) CaptureResolution.Width, (unsigned int) CaptureResolution.Height );
bis = ref new BitmapImageSource(BitmapToProcess);
 
reporter.report(20);
 
FilterEffect^ fe = ref new FilterEffect(bis);
if( Filters->Size > 0 ) fe->Filters = Filters;
BitmapRenderer^ editSession = ref new BitmapRenderer(fe, BitmapToProcess);
 
reporter.report(80);
 
create_task( editSession->RenderAsync() ).then( [this](Bitmap^ _result)
{
Result = _result;
}, token).wait();
 
reporter.report(100);
});
}

concurrency::create_async together with IAsyncActionWithProgress<int>^ make it possible to use the async / await pattern in managed code. IAsyncActionWithProgress<int>^ is particularly interesting as it allows us to implement the very useful pattern of reporting the progress of our operations. progress_reporter<int> is the variable to set. In this example we used an int type, but being a template you can use other types like string or double.

The method to set the value is report, e.g. reporter.report(80). Of course values from 0 to 100 are not mandatory; you can set whatever values you want. You might stop at reporter.report(80), for example, if processing continues in managed code for some reason.
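For instance, a minimal sketch (a hypothetical function, not part of the article's component) of an action that reports its progress as a double:

IAsyncActionWithProgress<double>^ ExampleWithProgress()
{
return concurrency::create_async([](concurrency::progress_reporter<double> reporter)
{
reporter.report(0.5); // half way through
reporter.report(1.0); // done
});
}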

The rest of the code should be familiar and already documented, except AsBitmapNV12. When a shot is taken the raw buffer is not in RGB format or JPEG compressed, but in NV12 format. AsBitmapNV12 converts the raw buffer to a Nokia::Graphics::Imaging::Bitmap. Before explaining how it works we must first understand the NV12 format.

NV12 Format

In the NV12 format, all of the Y samples appear first in memory as an array of unsigned char values with an even number of lines. The Y plane is followed immediately by an array of unsigned char values that contains packed U (Cb) and V (Cr) samples, half the size of the Y plane. When the combined U-V array is addressed as an array of little-endian WORD values, the LSBs contain the U values and the MSBs contain the V values. The following illustration shows the Y plane and the array that contains the packed U and V samples.

ImagingNativeCode07.png

Since the human eye is more responsive to brightness than it is to colour, many lossy image compression formats throw away half or more of the samples in the chroma channels to reduce the amount of data to deal with, without severely destroying the image quality.

In this way each UV sample is shared by the four corresponding pixels in Y.

ImagingNativeCode08.png

The first approach can be annoying if you are accustomed to the more intuitive, closer-to-real-life RGB format, and it's normal that your first temptation is to think about converting the buffer to RGB; but the advantages in terms of performance and optimization are worth all the initial effort of learning to work directly in NV12.

One cool aspect of YUV (or NV12) is that you can throw out the U and V components to get a grey-scale image, and run almost any algorithm on the Y component alone, speeding it up by up to 3x compared to RGB.
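To make the layout concrete, here is a minimal sketch (my own helper functions, not SDK code) of how to address the Y and UV planes for the pixel at (x, y):

// Y plane: one byte per pixel, width x height bytes in total.
inline unsigned char GetY(const unsigned char* nv12, int width, int x, int y)
{
return nv12[y * width + x];
}
 
// UV plane: starts right after the Y plane; one interleaved U,V pair
// is shared by each 2x2 block of Y samples, so the row stride is still 'width'.
inline void GetUV(const unsigned char* nv12, int width, int height, int x, int y, unsigned char &u, unsigned char &v)
{
const unsigned char* uvPlane = nv12 + width * height;
int offset = (y / 2) * width + (x / 2) * 2;
u = uvPlane[offset];
v = uvPlane[offset + 1];
}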

Blending YUV colors

Blending between colours in YUV form is very easy, and doesn't require any conversion to other colorspaces. In fact, blending in YUV is the same as blending in RGB: just interpolate between the components.

For example, to mix two colours in equal parts, the result will be:

(Y1+Y2)/2, (U1+U2)/2, (V1+V2)/2
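As a minimal sketch, mixing two samples in equal parts is just a per-component average (the right shift by 1 is the division by 2 described in the optimization section below):

// Mix two Y, U or V components in equal parts: (a + b) / 2.
inline unsigned char Mix(unsigned char a, unsigned char b)
{
return (unsigned char)(((int) a + (int) b) >> 1);
}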

YUV - RGB Conversion

If despite all this you prefer to work in RGB format, you can convert using the following formulas, assuming U and V are unsigned bytes:

R = Y + 1.4075 * (V - 128)
G = Y - 0.3455 * (U - 128) - (0.7169 * (V - 128))
B = Y + 1.7790 * (U - 128)
 
Y = R * .299000 + G * .587000 + B * .114000
U = R * -.168736 + G * -.331264 + B * .500000 + 128
V = R * .500000 + G * -.418688 + B * -.081312 + 128

These aren't perfect inverses of each other and you can find slightly different implementations around the web, but this is basically the approach; which one to use depends on how much precision you need. In my opinion a value of 135 instead of 136 is not a big problem if it brings performance improvements.
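For illustration, a straightforward (not optimized) sketch of the YUV to RGB direction, with the results clamped to the valid byte range:

static inline unsigned char Clamp255(int value)
{
return (unsigned char)(value < 0 ? 0 : (value > 255 ? 255 : value));
}
 
// Convert one YUV sample to RGB using the formulas above.
void YuvToRgb(unsigned char y, unsigned char u, unsigned char v, unsigned char &r, unsigned char &g, unsigned char &b)
{
r = Clamp255((int)(y + 1.4075f * (v - 128)));
g = Clamp255((int)(y - 0.3455f * (u - 128) - 0.7169f * (v - 128)));
b = Clamp255((int)(y + 1.7790f * (u - 128)));
}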

Wrapping buffers into a Bitmap

Now we know how NV12 works, but to use it with the Imaging SDK we must convert the char* raw buffer into a Nokia::Graphics::Imaging::Bitmap. Finding a solution for this was far from easy, but starting from the article How to wrap a char* buffer in a WinRT IBuffer in C++ I made some customizations so it could be used with the Imaging SDK. Basically we first need to implement the following class:

// Headers needed by NativeBuffer: WRL and IBufferByteAccess.
#include <wrl.h>
#include <wrl/implements.h>
#include <robuffer.h>
#include <windows.storage.streams.h>
 
class NativeBuffer : public Microsoft::WRL::RuntimeClass<Microsoft::WRL::RuntimeClassFlags<Microsoft::WRL::RuntimeClassType::WinRtClassicComMix>,
ABI::Windows::Storage::Streams::IBuffer,
Windows::Storage::Streams::IBufferByteAccess>
{
public:
virtual ~NativeBuffer()
{
}
 
STDMETHODIMP RuntimeClassInitialize(byte *buffer, UINT totalSize)
{
m_length = totalSize;
m_buffer = buffer;
return S_OK;
}
 
STDMETHODIMP Buffer(byte **value)
{
*value = m_buffer;
return S_OK;
}
 
STDMETHODIMP get_Capacity(UINT32 *value)
{
*value = m_length;
return S_OK;
}
 
STDMETHODIMP get_Length(UINT32 *value)
{
*value = m_length;
return S_OK;
}
 
STDMETHODIMP put_Length(UINT32 value)
{
m_length = value;
return S_OK;
}
 
private:
UINT32 m_length;
byte *m_buffer;
};

The NativeBuffer class is responsible for helping us with the conversion. Admittedly this is not easy and somewhat of a trick, but it reflects the state of things at the time of writing; the complexity is OS dependent and doesn't come from the Imaging SDK.

NV12 to bitmap

Bitmap^ CameraComponent::AsBitmapNV12(unsigned char* source, unsigned int width, unsigned int height)
{
int totalDimensionLength = width * height;
 
//Y buffer will be having a length of Width x Height
int yBufferLength = totalDimensionLength;
 
//UV Buffer will be Width/2 and Height/2 and each will take 2 bytes
int UVLength = (int) ((double) totalDimensionLength / 2);
 
int size = yBufferLength + UVLength; //NV12 buffer will be returned from the camera.
 
ComPtr<NativeBuffer> nativeBuffer;
MakeAndInitialize<NativeBuffer>(&nativeBuffer, (byte *) source, yBufferLength);
auto iinspectable = (IInspectable *)reinterpret_cast<IInspectable *>(nativeBuffer.Get());
IBuffer ^bufferY = reinterpret_cast<IBuffer ^>(iinspectable);
 
MakeAndInitialize<NativeBuffer>(&nativeBuffer, (byte *) source+yBufferLength,UVLength);
iinspectable = (IInspectable *)reinterpret_cast<IInspectable *>(nativeBuffer.Get());
IBuffer ^bufferUV = reinterpret_cast<IBuffer ^>(iinspectable);
 
nativeBuffer = nullptr;
 
Platform::Array<unsigned int, 1U>^ inputScanlines = ref new Platform::Array<unsigned int>(2); // for NV12 2 planes Y and UV.
Platform::Array<IBuffer^, 1U>^ inputBuffers = ref new Platform::Array<IBuffer^>(2);
 
//setting the input Buffers according to NV12 format
inputBuffers[0] = bufferY ;
inputBuffers[1] = bufferUV;
 
inputScanlines[0] = (unsigned int) width; // YBuffer, w items of 1 byte long
inputScanlines[1] = (unsigned int) width; // UVBuffer, Each UV is 2 bytes long, and there are w/2 of them.
 
return ref new Bitmap(Windows::Foundation::Size((float) width, (float) height), ColorMode::Yuv420Sp, inputScanlines, inputBuffers);
}

In my opinion this function, together with the customized ARM NEON memcpy, is the most useful and important, as it opens the door to using the Imaging SDK.

Here are other useful functions to convert WinRT objects to byte* and vice versa.

Native byte array to ARGB Bitmap

// Encode an ARGB byte array into a Bitmap image.
Bitmap^ CameraComponent::AsBitmapARGB(unsigned char* source, unsigned int width, unsigned int height, unsigned int pixelSize)
{
int length = (int) (width * height) * pixelSize; //PIXELSIZEINBYTES;
 
Buffer^ bitmapBuffer = ref new Buffer(length);
 
Object^ obj = bitmapBuffer;
ComPtr<IInspectable> insp(reinterpret_cast<IInspectable*>(obj));
ComPtr<IBufferByteAccess> bufferByteAccess;
ThrowIfFailed(insp.As(&bufferByteAccess));
unsigned char* pixels = nullptr;
ThrowIfFailed(bufferByteAccess->Buffer( &pixels ));
memcpy(pixels, source, length);
 
Windows::Foundation::Size size((float)width, (float)height);
Bitmap^ bitmap = ref new Bitmap( size, ColorMode::Argb8888, width * pixelSize, bitmapBuffer);
 
return bitmap;
}

Buffer to native byte array

// Get data from Buffer
unsigned char* CameraComponent::AsArray(Buffer^ source)
{
Object^ obj = source;
ComPtr<IInspectable> insp(reinterpret_cast<IInspectable*>(obj));
ComPtr<IBufferByteAccess> bufferByteAccess;
ThrowIfFailed(insp.As(&bufferByteAccess));
unsigned char* pixels = nullptr;
ThrowIfFailed(bufferByteAccess->Buffer( &pixels ));
 
return pixels;
}

IBuffer to native byte array

unsigned char* CameraComponent::FromIBuffer(Windows::Storage::Streams::IBuffer^ outputBuffer)
{
// Com magic to retrieve the pointer to the pixel buffer.
Object^ obj = outputBuffer;
ComPtr<IInspectable> insp(reinterpret_cast<IInspectable*>(obj));
ComPtr<IBufferByteAccess> bufferByteAccess;
ThrowIfFailed(insp.As(&bufferByteAccess));
unsigned char* pixels = nullptr;
ThrowIfFailed(bufferByteAccess->Buffer(&pixels));
 
return pixels;
}

How to apply Imaging SDK filters in C++

At this point we have all the tools and knowledge to finally start playing with Imaging SDK filters in native code. The syntax is not far from what we use in managed code, and is intuitive.

// Convert the captured char* raw buffer into Bitmap
Bitmap^ ImageToProcess = AsBitmapNV12(pBuffer, (unsigned int) CaptureResolution.Width, (unsigned int) CaptureResolution.Height );
 
// Create a BitmapImageSource as needed by Imaging SDK
BitmapImageSource^ bis = ref new BitmapImageSource(ImageToProcess);
 
// Create the filter manager object
FilterEffect^ fe = ref new FilterEffect(bis);
 
// Assign the filter list to apply to filter manager
if( Filters->Size > 0 ) fe->Filters = Filters;
 
// Final rendering into a Bitmap
BitmapRenderer^ editSession = ref new BitmapRenderer(fe, ImageToProcess);
create_task( editSession->RenderAsync() ).then( [this](Bitmap^ _result)
{
Result = _result;
}, token).wait();


Using native component from managed code

The component has been designed to execute all computations in C++/CX, exposing an interface to set the filters to apply. The following code shows how to do that:

private async void OnTakePhotoNativeClicked(object sender, RoutedEventArgs e)
{
// Init the MemoryStream buffer to save the result
resultStream = null;
resultStream = new MemoryStream();
 
// Clean filter list
App.Camera.Filters.Clear();
 
// Add selected filters to the list
if (RadColorBoost.IsChecked == true) App.Camera.Filters.Add(new ColorBoostFilter(1.0));
if (RadCartoon.IsChecked == true) App.Camera.Filters.Add(new CartoonFilter(true));
if (RadLomo.IsChecked == true) App.Camera.Filters.Add(new LomoFilter(0.5, 0.5, LomoVignetting.Medium, LomoStyle.Blue));
if (RadSepia.IsChecked == true) App.Camera.Filters.Add(new SepiaFilter());
if (RadStamp.IsChecked == true) App.Camera.Filters.Add(new StampFilter(3, 0.5));
if (RadNegative.IsChecked == true) App.Camera.Filters.Add(new NegativeFilter());
if (RadTemperature.IsChecked == true) App.Camera.Filters.Add(new TemperatureAndTintFilter(0.5, 0));
if (RadRedColorAdjust.IsChecked == true) App.Camera.Filters.Add(new ColorAdjustFilter(0, 0, 0.1));
if (RadContrast.IsChecked == true) App.Camera.Filters.Add(new ContrastFilter(0.6));
if (RadBrightness.IsChecked == true) App.Camera.Filters.Add(new BrightnessFilter(0.6));
if (RadBlur.IsChecked == true) App.Camera.Filters.Add(new BlurFilter(15));
 
Dispatcher.BeginInvoke(() => MessagesToUser.Visibility = System.Windows.Visibility.Visible);
 
// Process the chain
var asyncProcessAction = App.Camera.Process();
 
// Manage progress. Values are set inside the native code
asyncProcessAction.Progress = new AsyncActionProgressHandler<int>((action, progress) =>
{
Deployment.Current.Dispatcher.BeginInvoke(delegate()
{
pbar.Value = progress;
switch (progress)
{
case 20: MessagesToUser.Text = "Processing image " + progress + " %"; break;
case 30: MessagesToUser.Text = "Processing image " + progress + " %"; break;
case 40: MessagesToUser.Text = "Processing image " + progress + " %"; break;
default: MessagesToUser.Text = "Rendering " + progress + " %"; break;
}
});
});
await asyncProcessAction;
 
// Render the result
WriteableBitmap result = new WriteableBitmap((int)App.Camera.CaptureResolution.Width, (int)App.Camera.CaptureResolution.Height);
BitmapImageSource btmSrc = new BitmapImageSource(App.Camera.Result);
WriteableBitmapRenderer wbRender = new WriteableBitmapRenderer(btmSrc, result);
await wbRender.RenderAsync();
 
Dispatcher.BeginInvoke(() => MessagesToUser.Visibility = System.Windows.Visibility.Collapsed);
PreviewImage.Source = result; result.Invalidate();
}

AsyncActionProgressHandler represents a method that handles the progress update events of an asynchronous action. To use it we should install Microsoft.Bcl.Async, which can be done from NuGet.

Right click on the project name and select Manage NuGet Packages...

ImagingNativeCode12.png

Search for Microsoft Bcl Async and install it.

ImagingNativeCode11.png

Optimizations

So far our optimization has focused on switching from C# to C++/CX, and this certainly gives us a tangible improvement, but to get an impressive boost C++ alone is not enough. We need to take into account some techniques to use together with C++, such as:

  • Using so-called magic numbers to speed up divisions
  • Using math properties to speed up expensive operations (e.g. sqrt)
  • Using SIMD intrinsic ARM NEON instructions

Magic Numbers

In image processing, division is one of the most used operations and unfortunately one of the most expensive. Surely the most common division is by 3; just think of calculating the RGB average ((R+G+B)/3). Division is thus a cause of performance drops, and it would be great to have a way to speed that operation up. The solution comes from leveraging the facts that CPUs perform bit shifts extremely fast, that multiplication is much faster than division, and from the so-called magic numbers.

Magic numbers are constants with the property that multiplying our number by them and right shifting by some constant number of bits performs our division. Of course there is a 'but', otherwise there would be no reason why this approach isn't used everywhere: accuracy.

Divisions performed this way do not have the same accuracy as the floating point divisions we are accustomed to. This could be a problem in scientific calculations, where an accuracy of 0.0000000001 can in some cases make the difference, but in our case even if the result of a division is 34 instead of 35.02378 (which would be rounded to 35 anyway) it is not a problem at all compared to the benefits.

Just to give you an idea: on a Snapdragon™ S4, in my HDR I - Implementing High Dynamic Range filters using Nokia Imaging SDK project, which has to process and blend three shots, I saved more than 3 seconds compared to the non-optimized C# version.

Division by 3

To perform this operation we use the following values:

  • magic number - 0xAAAB
  • right shift - 17

Suppose we compute the average of the RGB values (137, 78, 246). The result is ((137 + 78 + 246) / 3) = (461 / 3) = 153. Using our solution the result is ((461 * 0xAAAB) >> 17) = ((461 * 43691) >> 17) = (20141551 >> 17) = 153.

This can be put inside a macro.
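For example, a minimal sketch of such a macro (the name is mine):

// Integer division by 3 via magic number: accurate for the 16-bit
// operand range used here (e.g. sums of three 8-bit channels).
#define DIV3(x) (((unsigned int)(x) * 0xAAAB) >> 17)
 
// Usage: the RGB average from the example above.
int average = DIV3(137 + 78 + 246); // 153, same as 461 / 3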

Of course for a few operations you will not see any difference compared to the standard solution, but multiply it by the millions of pixels in an image and you will certainly see an improvement.

0xAAAB is not the only magic number to divide by three. It is the smallest one that gives very good accuracy, but in some circumstances you may want a smaller number, for example when using SIMD ARM NEON instructions, to be sure to stay inside the register size.

In that case you can use the following values

  • magic number - 0x55
  • right shift - 8

As explained, these values are easier to handle but bring lower accuracy.

Consider the simple division 30 / 3 = 10; with this magic number we get ((30*0x55)>>8) = ((30*85)>>8) = 9.

In many circumstances that gap is unacceptable, but in my opinion, in a pixel context, there is no real difference between 9 and 10. It's up to you to decide which magic number is more suitable: the bigger but accurate 0xAAAB, or the smaller but less accurate 0x55.

Division by 2

To perform a division by 2 you don't need any magic number; simply right shift your value by 1:

(255/2) = 127

with our approach

( 255 >> 1 ) = 127

Each time you shift by 1 you perform a division by 2, so if for example you want to divide by 4, shift by 2:

(255 / 4) = 63

using shifts

(255 >> 2) = 63

And so on.

Around the web you can find other magic numbers for other operations. We focused on the ones useful for us.

Fast Square Root

Another operation always present in computer graphics is the square root (sqrt in C++). It is useless to dwell on how expensive this operation is, so since it's so important for us let's see how to improve it. We will assume the float is in IEEE 754 single precision floating point format, and the approach will be to treat it as an int, leveraging its bit layout.

I leave it to you to deepen the math behind this solution with further reading.

The first approach is based on the reciprocal square root method, that is, computing x^(-1/2) = 1/sqrt(x).

The algorithm accepts a 32-bit floating point number as input and stores a halved value for later use. Then, treating the bits of the floating point number as a 32-bit integer, it performs a logical shift right by one bit and subtracts the result from the magic number 0x5f3759df. This is the first approximation of the inverse square root of the input. Treating the bits again as floating point, it runs one iteration of Newton's method to return a more precise approximation. This computes an approximation of the inverse square root of a floating point number approximately four times faster than floating point division. Note that the function below returns 1/sqrt(x); to obtain the square root itself, multiply the result by x, since sqrt(x) = x * x^(-1/2).

float fast_inv_sqrt( float number ) // returns an approximation of 1 / sqrt(number)
{
long i;
float x2, y;
const float threehalfs = 1.5F;
 
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y;
i = 0x5f3759df - ( i >> 1 );
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
 
return y;
}

You can add as many Newton's method iterations as you want to increase accuracy (each one repeats the line y = y * ( threehalfs - ( x2 * y * y ) )), but of course the more accuracy you want, the more performance you lose.

Another method is not based on the reciprocal or magic numbers, but the approach is very similar; it implements the following identity:

((((val_int / 2^m) - b) / 2) + b) * 2^m = ((val_int - 2^m) / 2) + (((b + 1) / 2) * 2^m)

where b is the exponent bias and m is the number of mantissa bits. Here is the code implementing it:

float fast_sqrt(float z) // approximates sqrt(z) by halving the exponent
{
int val_int = *(int*)&z; // Same bits, but as an int
 
val_int -= 1 << 23; // Subtract 2^m
val_int >>= 1; // Divide by 2
val_int += 1 << 29; // Add ((b + 1) / 2) * 2^m
 
return *(float*)&val_int; // Interpret again as float
}

Both methods improve performance. The first offers the possibility of more accuracy but performs a few more operations than the second, which in turn should be faster. It all depends on how many operations you expect to perform. That said, the difference between the two methods is not so great, and both bring dramatic performance improvements; if your custom algorithm needs to perform millions of square root operations, in my opinion it is reasonable to sacrifice accuracy and use the second one.

Again, in my opinion these solutions are simply awesome! The dark math world sometimes produces miracles, and the reduced accuracy is worth all the performance gains. Anyway, purists and pragmatists have different points of view; in the end the final decision is up to you.

ARM NEON memcpy

The following function is a drop-in replacement for memcpy: it copies 16 bytes per iteration using the NEON vld1q_u8/vst1q_u8 load and store intrinsics, prefetching the next block, and copies any remaining tail (fewer than 16 bytes) with the standard memcpy.

#if defined(_M_ARM)
#include <arm_neon.h>
#endif
void ARM::memcpy(void* Dest, void* Source, int length)
{
int arm_length = length / 16;
 
uint8 * src = (uint8 *) Source;
uint8 * dest = (uint8 *) Dest;
uint8x16_t buffer;
 
for( int i = 0 ; i < arm_length; i++ )
{
buffer= vld1q_u8(src);
vst1q_u8(dest, buffer);
 
src +=16;
dest +=16;
 
__prefetch(src);
}
 
int gap = length - (arm_length*16);
if( gap > 0 )
::memcpy((byte*) Dest + arm_length*16, (byte*) Source + arm_length*16, gap); // copy the remaining tail with the standard memcpy
}

ARM NEON - Convert to grayscale

This function converts an ARGB buffer to grayscale eight pixels at a time: the weighted sum 77*R + 151*G + 28*B (the classic 0.299/0.587/0.114 luma weights scaled by 256) is accumulated in 16-bit lanes and shifted right by 8, and the resulting luma is written to all three color channels with a fixed alpha.

void Utilities::ConvertToGrayNeon( unsigned char* inputBuffer, unsigned char* outputBuffer, int length)
{
uint8 * src = (uint8 *) inputBuffer;
uint8 * dest = (uint8 *) outputBuffer;
 
int n = length;
 
uint8x8_t rfac = vdup_n_u8 (77);
uint8x8_t gfac = vdup_n_u8 (151);
uint8x8_t bfac = vdup_n_u8 (28);
n/=8;
 
uint8x8x4_t interleaved;
interleaved.val[0] = vdup_n_u8 (0xFF); //Alpha value
 
for (int i=0; i < n; i++)
{
uint16x8_t temp;
uint8x8x4_t rgb = vld4_u8 (src);
 
temp = vmull_u8 (rgb.val[1], rfac);
temp = vmlal_u8 (temp,rgb.val[2], gfac);
temp = vmlal_u8 (temp,rgb.val[3], bfac);
 
interleaved.val[1] = vshrn_n_u16 (temp, 8);
interleaved.val[2] = interleaved.val[1]; // copy the luma into the remaining color channels
interleaved.val[3] = interleaved.val[1];
 
vst4_u8 (dest, interleaved);
src += 8*4;
dest += 8*4;
}
}

ARM NEON - Blending images

Finally, blending. BlendArmNeon averages three Y (luma) planes eight pixels at a time, using the 0x55 magic number to divide by three; BlendARGBArmNeon does the same for ARGB buffers, channel by channel. The displacement parameters shift the second and third images by a pixel offset (for example, to compensate for movement between shots), and rows are processed in parallel with concurrency::parallel_for.

byte* HDR::BlendArmNeon(unsigned char* image1,
unsigned char* image2,
unsigned char* image3,
DSP::CartesianPoint displacement1,
DSP::CartesianPoint displacement2)
{
int _width = (int) CaptureResolution.Width;
int _height = (int) CaptureResolution.Height;
int image_size = _width * _height;
 
int displacement1_X=0, displacement1_Y=0, displacement2_X=0, displacement2_Y=0;
double y1=0, y2=0, y3=0;
double totalLumaWeights = 0;
int r=0, g=0, b=0;
 
cancellation_token_source cts;
auto token = cts.get_token();
unsigned char *result = new unsigned char[_width*_height];
 
concurrency::parallel_for(0, _height, [this, image1, image2, image3, &result, _height, _width, &displacement1, &displacement2](int y)
{
int line = y*_width;
 
int size = _width * _height;
 
int m_displacement_1 = (int) (displacement1.X() + (displacement1.Y() * _width));
int m_displacement_2 = (int) (displacement2.X() + (displacement2.Y() * _width));
 
if( (m_displacement_1 < 0) || (line + _width + m_displacement_1 >= size) ) m_displacement_1 = 0;
if( (m_displacement_2 < 0) || (line + _width + m_displacement_2 >= size) ) m_displacement_2 = 0;
 
int arm_neon_length = _width / 8;
 
//uint8x8_t divideby3 = vdup_n_u8 (0xAAAb);
uint16x8_t divideby3 = vdupq_n_u16 (0x55);
 
uint8 * m_image1 = (uint8 *) image1 + line;
uint8 * m_image2 = (uint8 *) image2 + line + m_displacement_1;
uint8 * m_image3 = (uint8 *) image3 + line + m_displacement_2;
uint8 * dest = (uint8 *) result + line;
 
uint8x8_t y_image1;
uint8x8_t y_image2;
uint8x8_t y_image3;
 
uint16x8_t y_temp;
uint8x8_t result_temp;
 
for (int x = 0; x < arm_neon_length ; x++)
{
y_image1 = vld1_u8 (m_image1);
y_image2 = vld1_u8 (m_image2);
y_image3 = vld1_u8 (m_image3);
 
y_temp = vaddl_u8(y_image1, y_image2);
y_temp = vaddq_u16(y_temp, vmovl_u8(y_image3));
y_temp = vmulq_u16(y_temp, divideby3);
result_temp = vshrn_n_u16 (y_temp, 8);
 
vst1_u8(dest, result_temp);
 
m_image1 += 8;
m_image2 += 8;
m_image3 += 8;
dest += 8;
}
}
);
 
return result;
}
byte* HDR::BlendARGBArmNeon(Buffer^ ne_bitmapBuffer,
Buffer^ se_bitmapBuffer,
Buffer^ oe_bitmapBuffer,
DSP::CartesianPoint displacement1,
DSP::CartesianPoint displacement2)
{
int _width = (int) CaptureResolution.Width;
int _height = (int) CaptureResolution.Height;
int image_size = _width * _height;
int scanline = _width * PIXELSIZEINBYTES;
 
int displacement1_X=0, displacement1_Y=0, displacement2_X=0, displacement2_Y=0;
double y1=0, y2=0, y3=0;
double totalLumaWeights = 0;
int r=0, g=0, b=0;
 
cancellation_token_source cts;
auto token = cts.get_token();
 
unsigned char* image1 = AsArray(ne_bitmapBuffer);
unsigned char* image2 = AsArray(se_bitmapBuffer);
unsigned char* image3 = AsArray(oe_bitmapBuffer);
unsigned char* result = new unsigned char[scanline*_height];
 
concurrency::parallel_for(0, _height, [this, image1, image2, image3, &result, _height,_width, scanline, &displacement1, &displacement2](int y)
{
int line = y*scanline;
int size = scanline * _height;
 
int m_displacement_1 = (int) (displacement1.X()*4 + (displacement1.Y() * scanline));
int m_displacement_2 = (int) (displacement2.X()*4 + (displacement2.Y() * scanline));
 
if( (m_displacement_1 < 0) || (line + scanline + m_displacement_1 >= size) ) m_displacement_1 = 0;
if( (m_displacement_2 < 0) || (line + scanline + m_displacement_2 >= size) ) m_displacement_2 = 0;
 
int arm_neon_length = _width / 8;
 
//uint8x8_t divideby3 = vdup_n_u8 (0xAAAb);
uint16x8_t divideby3 = vdupq_n_u16 (0x55);
 
uint8 * m_image1 = (uint8 *) image1 + line;
uint8 * m_image2 = (uint8 *) image2 + line + m_displacement_1;
uint8 * m_image3 = (uint8 *) image3 + line + m_displacement_2;
uint8 * dest = (uint8 *) result + line;
 
uint8x8x4_t interleaved;
interleaved.val[0] = vdup_n_u8 (0xFF); //Alpha value
 
for (int x = 0; x < arm_neon_length ; x++)
{
uint16x8_t temp_r;
uint16x8_t temp_g;
uint16x8_t temp_b;
 
uint8x8x4_t rgb_image1 = vld4_u8 (m_image1);
uint8x8x4_t rgb_image2 = vld4_u8 (m_image2);
uint8x8x4_t rgb_image3 = vld4_u8 (m_image3);
 
temp_r = vaddl_u8(rgb_image1.val[1], rgb_image2.val[1]);
temp_g = vaddl_u8(rgb_image1.val[2], rgb_image2.val[2]);
temp_b = vaddl_u8(rgb_image1.val[3], rgb_image2.val[3]);
 
temp_r = vaddw_u8(temp_r, rgb_image3.val[1]);
temp_g = vaddw_u8(temp_g, rgb_image3.val[2]);
temp_b = vaddw_u8(temp_b, rgb_image3.val[3]);
 
temp_r = vmulq_u16(temp_r, divideby3);
temp_g = vmulq_u16(temp_g, divideby3);
temp_b = vmulq_u16(temp_b, divideby3);
 
interleaved.val[1] = vshrn_n_u16 (temp_r, 8);
interleaved.val[2] = vshrn_n_u16 (temp_g, 8);
interleaved.val[3] = vshrn_n_u16 (temp_b, 8);
 
vst4_u8 (dest, interleaved);
 
m_image1 += 8*4;
m_image2 += 8*4;
m_image3 += 8*4;
dest += 8*4;
}
}
);
 
return result;
}
For completeness, here is the DSP::CartesianPoint helper class used by the blending functions above:
 
class CartesianPoint
{
public:
CartesianPoint()
{
_x=0;
_y=0;
}
 
CartesianPoint(double X, double Y)
{
this->_x=X;
this->_y=Y;
}
 
CartesianPoint(const CartesianPoint& other) : _x(other._x), _y(other._y) {}
virtual ~CartesianPoint(void){}
 
CartesianPoint& operator=(const CartesianPoint& p1)
{
_x = p1._x ;
_y = p1._y ;
return *this ;
}
 
double X() { return _x; }
double Y() { return _y; }
 
void setX(double value){_x=value;}
void setY(double value){_y=value;}
 
protected:
double _x;
double _y;
};
