Symbian C++ Performance Tips

From Nokia Developer Wiki
Article Metadata
Created: hugovk (08 May 2011)
Last edited: hamishwillee (23 Jul 2013)

In today’s world of ever-increasing bandwidth and ‘cool’ features to use it, we find ever-increasing demands on smartphone software to deliver throughput and processing capabilities. In order to deliver on these demands, the software has to be constructed to be as efficient as possible. The problem may not be as critical as ‘every cycle counts’, but there are a number of simple things to look out for and bear in mind.


What is performance

Performance is a number of measurable characteristics that a device can display, be it boot time, ROM size, RAM usage, viewing a picture or battery life. The usage and features of a device can often dictate desirable values for these characteristics. In order to satisfy these the software has to be designed and implemented accordingly.

Why it matters

Often when performance is important, the standard solution is to increase CPU speed, or dedicate large amounts of RAM to caching solutions. Neither of these options is really open to mobile phone manufacturers, as devices have to be built with battery and cost limitations in mind.

Performance killers

Most of the performance problems seen on smartphones fall into one of a small set of categories. It is this subset that we will explore in this article.

Too much code, not enough data

Very often during development, an application has some parameters that affect its behavior. These are often stored in a file which is processed during startup.

The problem arises when this configuration becomes static, either through a finalization of requirements or settling on a set of default values. At this point the parameters could be hard-coded, but often that can mean refactoring an application:

void SomeCode(void) {
    // Open file
    // New ReadStream
    // Read some data members into a struct
    // Close file, etc.
}

It can also occur when a data structure is created on the heap, which could have been generated at build-time:

void ConstructL(void) {
    TFuncTable *fns = new (ELeave) TFuncTable;
    fns->iFunc1 = ExampleFunc1;
    fns->iFunc2 = ExampleFunc2;
    fns->iFunc3 = ExampleFunc3;
    iFuncTable = fns;
}

In this example it would be better to have a const copy of the initialized TFuncTable, and either have iFuncTable point to it or, better still, use the const version in place of iFuncTable. This manifestation of the problem is often hidden when the function table takes the form of an interface class.
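
As a minimal sketch of the const-table approach, written in portable C++ rather than against the Symbian headers: TFuncTable and the ExampleFunc names come from the listing above, but the function signature, the function bodies and the CExample class are assumptions made purely for illustration.

```cpp
#include <cassert>

// Hypothetical signature; the article does not show what the functions do.
typedef int (*TExampleFunc)(int);

static int ExampleFunc1(int aValue) { return aValue + 1; }
static int ExampleFunc2(int aValue) { return aValue * 2; }
static int ExampleFunc3(int aValue) { return -aValue; }

struct TFuncTable {
    TExampleFunc iFunc1;
    TExampleFunc iFunc2;
    TExampleFunc iFunc3;
};

// Initialized by the compiler at build time: no heap cell is allocated and
// no construction code runs at start-up.
static const TFuncTable KFuncTable = { ExampleFunc1, ExampleFunc2, ExampleFunc3 };

// Hypothetical owner class standing in for the one that has ConstructL().
class CExample {
public:
    CExample() : iFuncTable(&KFuncTable) {}

    // Calls the three functions in turn, feeding each result to the next.
    int Run(int aValue) const {
        assert(iFuncTable != 0);
        return iFuncTable->iFunc3(iFuncTable->iFunc2(iFuncTable->iFunc1(aValue)));
    }

private:
    const TFuncTable* iFuncTable;  // points at the const table, never owned
};
```

Because the table is const and never owned, there is also nothing to delete and no allocation that can fail during construction.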

In some cases it can be difficult to express the intentions in easy-to-understand C++ terms: a data structure that is read from a file could contain sub-structures. These can often become difficult to present as easily-read const struct or const class declarations.

A developer may even choose to have a ‘generated code’ solution that takes a human-readable data file and converts it into C++ code ready for the compiler.

Repeated code within loops

Redundant calculations very often occur in tandem with the construction of a complex type. Consider the following example:

void ExampleClass::SimpleOperation(SimpleType a, SimpleType b) {
    // creation of a complex type: this is unnecessary, see text
    ComplexType c = b.MakeComplex();
    // some other code
}

SimpleType a, b;
for (...) {   // loop bounds are not shown in the source
    // some code
    SimpleOperation(a, b);
    // some other code that does not change variable b
}

In the code above, the SimpleOperation() method is called within a loop. In each iteration, the same complex type is created from a SimpleType, but it is not modified by the code that follows. This repeated creation of a ComplexType is unnecessary: it wastes resources and could seriously affect performance. If a ComplexType could be passed to SimpleOperation() instead, its repeated creation could be removed:

void ExampleClass::SimpleOperation(SimpleType a, ComplexType& b) {
    // some code
}

SimpleType a;
ComplexType b;   // create b as a complex type from the outset
for (...) {      // loop bounds are not shown in the source
    // pass b as a ComplexType instead of a SimpleType
    SimpleOperation(a, b);
    // some code
}

To summarize, care has to be taken to ensure repeated calculations or processing does not get done in heavily used loops.
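
The same idea can be sketched in portable C++. Everything here is an assumption for illustration: SimpleType is modelled as an int, ComplexType as a small struct holding a string, and gConversions exists only to show how often the expensive conversion runs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the article's SimpleType and ComplexType.
typedef int SimpleType;
struct ComplexType {
    std::string iText;
};

static int gConversions = 0;  // instrumentation for the sketch only

// The expensive conversion that should not be repeated inside the loop.
static ComplexType MakeComplex(SimpleType aValue) {
    ++gConversions;
    ComplexType c;
    c.iText = std::to_string(aValue);
    return c;
}

// Takes the already-built ComplexType by const reference.
static std::size_t SimpleOperation(SimpleType a, const ComplexType& b) {
    assert(a >= 0);
    return static_cast<std::size_t>(a) + b.iText.size();
}

static std::size_t ProcessAll(const std::vector<SimpleType>& aItems, SimpleType aB) {
    const ComplexType b = MakeComplex(aB);  // built once, outside the loop
    std::size_t total = 0;
    for (std::size_t i = 0; i < aItems.size(); ++i) {
        total += SimpleOperation(aItems[i], b);
    }
    return total;
}
```

However many items are processed, the conversion runs once per call to ProcessAll() rather than once per iteration.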

Inefficient heap usage

Often on embedded systems, the heap has to be used in place of the stack. Without care, this can lead to excessive heap calls, as stack space is usually used for temporary variables.

void LoopWithHeap(void) {
    for (...) {                     // loop bounds are not shown in the source
        CData *temp = new CData;    // allocated on every iteration
        delete temp;
    }
}

Where possible, any temporary variables should be reused:

void LoopWithHeap(void) {
    CData *temp = new CData;        // allocated once, before the loop
    for (...) { /* reuse temp on each iteration */ }
    delete temp;
}

Another cause of this problem is the use of segmented data structures with granularity that is too fine for the amount of data being processed.

Another possible cause of poor heap usage can be over-reliance on realloc. Poorly thought-out design can mean that heap cells are required to increase in size, and this usually involves an alloc, free and memcpy call.
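
One way to avoid repeated growth is to reserve the final size before filling a buffer. The sketch below uses std::vector as a stand-in for a resizable Symbian buffer or array, which is an assumption; AppendBytes() and CountGrowths() are hypothetical helpers, the second existing only to make the effect visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends aCount copies of aByte. Reserving the final size first means the
// storage grows at most once instead of being reallocated repeatedly.
static void AppendBytes(std::vector<unsigned char>& aOut, std::size_t aCount,
                        unsigned char aByte) {
    aOut.reserve(aOut.size() + aCount);
    for (std::size_t i = 0; i < aCount; ++i) {
        aOut.push_back(aByte);
    }
}

// Counts how many times the capacity changes while aCount bytes are pushed,
// with or without reserving the space up front.
static std::size_t CountGrowths(std::size_t aCount, bool aReserveFirst) {
    std::vector<unsigned char> buffer;
    if (aReserveFirst) {
        buffer.reserve(aCount);
    }
    std::size_t growths = 0;
    std::size_t lastCapacity = buffer.capacity();
    for (std::size_t i = 0; i < aCount; ++i) {
        buffer.push_back(0);
        if (buffer.capacity() != lastCapacity) {
            ++growths;
            lastCapacity = buffer.capacity();
        }
    }
    return growths;
}
```

Each capacity change typically corresponds to the alloc, memcpy and free sequence described above, so CountGrowths() gives a rough feel for how much heap traffic the reservation saves.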

Limited understanding of library

API documentation rarely includes any in-depth implementation notes. Coding to an inappropriate/poorly understood API can lead to problems such as duplicated or unnecessary processing and data transformation.

Consider a class that provides access to an array and implements bounds checking on the SetElement method:

void ArrayClass::SetSize(int aSize) {
    iMaxLength = aSize;
}
void ArrayClass::SetElement(int aPos, unsigned char aChar) {
    if (aPos >= 0 && aPos < iMaxLength)
        iRawArray[aPos] = aChar;
}

Now consider a program written to use this class - it needs to add a number of elements to the array:

void ExampleClass::FillArray() {
    // some code
    for (currentPos = 0; currentPos < bytesToProcess; currentPos++)
        myArray.SetElement(currentPos, aByte);
}

The inefficiency here stems from the fact that the bounds checking done by SetElement is unnecessary – the calling function is in a loop that has already set the upper limit of the array.

This problem may have arisen for many reasons. Perhaps the developer of ExampleClass doesn’t realize that ArrayClass does bounds checking, or doesn’t know about another API on ArrayClass that may be more appropriate. Or perhaps the implementer of ArrayClass hasn’t provided such an API, and doesn’t expect the class to be used in such a manner.
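
If the class author does want to support this usage, one option is a bulk operation that validates the whole range once. The sketch below is an assumption, not part of the original ArrayClass: Fill(), At() and the fixed KCapacity are hypothetical additions, shown in portable C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class ArrayClass {
public:
    enum { KCapacity = 64 };  // hypothetical fixed capacity for the sketch

    ArrayClass() : iMaxLength(0) { std::memset(iRawArray, 0, sizeof(iRawArray)); }

    void SetSize(int aSize) {
        assert(aSize >= 0 && aSize <= KCapacity);
        iMaxLength = aSize;
    }

    // Checked, per-element access as in the article.
    void SetElement(int aPos, unsigned char aChar) {
        if (aPos >= 0 && aPos < iMaxLength)
            iRawArray[aPos] = aChar;
    }

    // Hypothetical bulk API: validates the range once, then fills it without
    // a per-element check. Returns false if the range does not fit.
    bool Fill(int aPos, int aCount, unsigned char aChar) {
        if (aPos < 0 || aCount < 0 || aCount > iMaxLength - aPos)
            return false;
        std::memset(iRawArray + aPos, aChar, static_cast<std::size_t>(aCount));
        return true;
    }

    unsigned char At(int aPos) const {
        assert(aPos >= 0 && aPos < iMaxLength);
        return iRawArray[aPos];
    }

private:
    int iMaxLength;
    unsigned char iRawArray[KCapacity];
};
```

A caller such as FillArray() could then replace its loop with a single Fill(0, bytesToProcess, aByte) call and check the result once.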

Type coercion

Where a bad ‘data design’ has been used, processor time is needlessly wasted transforming data from one type to another. This often only involves a static transformation, and is usually done in preparation for passing the data to an external API.

Consider the following code, an example of ‘Chasing the data type’:

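TInt intDrive;
TChar ch = ((*drives)[i])[0];
RFs::CharToDrive(ch, intDrive);   // drive letter to drive number (step absent from the source listing)
TDriveUnit curDrive(intDrive);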

The code needs to use a TDriveUnit type, but what it has stored is a string of the drive name, and so it has to process this data three times before it is in a usable form. If this function is at the heart of a heavily used loop, this processing could become a significant part of the time taken to execute the function. It may be worth storing a TDriveUnit along with or instead of the drive name, which could then be used directly.
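
A rough sketch of storing the converted value alongside the name, in portable C++: TDriveEntry, AddDrive() and CharToDriveNumber() are hypothetical names, and the mapping of 'A' to 'Z' onto 0 to 25 is an assumption standing in for the real conversion.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the drive letter conversion: 'A'..'Z' (either case) map to
// 0..25. Returns false for anything that is not a drive letter.
static bool CharToDriveNumber(char aChar, int& aDrive) {
    if (aChar >= 'a' && aChar <= 'z')
        aChar = static_cast<char>(aChar - 'a' + 'A');
    if (aChar < 'A' || aChar > 'Z')
        return false;
    aDrive = aChar - 'A';
    return true;
}

struct TDriveEntry {
    std::string iName;  // kept for display
    int iDrive;         // converted once, when the entry is created
};

// Converts the name a single time and stores the result with the entry.
static bool AddDrive(std::vector<TDriveEntry>& aDrives, const std::string& aName) {
    int drive = 0;
    if (aName.empty() || !CharToDriveNumber(aName[0], drive))
        return false;
    TDriveEntry entry;
    entry.iName = aName;
    entry.iDrive = drive;
    aDrives.push_back(entry);
    return true;
}
```

Code in the hot loop can then read iDrive directly instead of repeating the character extraction and conversion on every pass.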

In some cases this problem can cause dummy data objects to be constructed on the stack, purely for the purpose of changing the interface to a particular data object.

Consider this example. Here we see the creation of an object just to get at a method call:


This example shows two problems: The primary problem is that the type coercion carried out by the Des() call constructs a temporary object, and in this instance this is being done five times, when the result could easily have been stored locally and reused:

TPtr des = iDllEntry.iName.Des();

A less obvious problem stems from the compiler options that are being used. The implementers of Des() have marked it as inline, but because the compiler has been told to optimize for code size, and because the function is used often, the compiler will decide not to honor the inline qualifier.

Inefficient file usage

This category covers a number of problems, and some of them apply not only to file misuse, but more widely to any data source that doesn’t offer the same instantaneous access as RAM does, for instance, hardware and network sources.

Inefficient use of the file system can arise when software uses it as if it were a database, where directory structure and filename format are used to define a database structure.

Another problem can come from ‘synchronous’ designs that read and process data from a file or other source in blocks, but do so serially. This can mean that processing of data is held up whilst waiting for the entire block to be read.

In the example below we find another common problem, that of reading files in multiple, small reads.

EXPORT_C CColorList* ColorUtils::CreateSystemColorListL(RFs& aFs) {
    CDirectFileStore* store;
    ... KGulColorSchemeFileName, EFileRead | EFileShareReadersOnly));
    RStoreReadStream stream;
    CColorList* colorList = CColorList::NewLC();
    ...
    return colorList;
}

The problem lies hidden in the implementation of the overloaded C++ >> operator, a small section of the InternalizeL function that is called is shown below:

const TInt count(card);
TRgb rgb;
for (TInt ii = 0; ii < count; ii++) {
    // body elided in the source: each iteration reads one TRgb through operator>>
}

We can see that it calls further overloaded >> operator functions for each embedded class. Following through the function calls shows that the structures these functions create in memory are built in 32-bit blocks, with each block causing a new read from the file. Even if the File Server has a read-ahead cache for the file, the depth of function calls for each read will cause a performance problem.

Consider something like the following example instead:

const TInt count(card);
aStream->ReadL(iEikColors, count *sizeof(TRgb));

This will be faster. The trade-off is that care has to be taken to ensure the internal format of TRgb hasn’t changed.
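
The block-read idea, including a guard for the layout assumption, can be sketched with standard C++ streams. This is not the Symbian stream API: TRgbValue is a hypothetical stand-in for TRgb, and ReadColours() is an invented helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for TRgb: one 32-bit value per colour.
struct TRgbValue {
    std::uint32_t iValue;
};

// The bulk read relies on the in-memory layout matching the stored layout,
// so the build fails if the type ever changes size.
static_assert(sizeof(TRgbValue) == 4, "TRgbValue layout changed; bulk read is invalid");

// Reads aCount colours with a single stream read instead of one per colour.
// Returns false if the stream held fewer bytes than requested.
static bool ReadColours(std::istream& aStream, std::size_t aCount,
                        std::vector<TRgbValue>& aOut) {
    aOut.resize(aCount);
    if (aCount == 0)
        return true;
    const std::size_t bytes = aCount * sizeof(TRgbValue);
    aStream.read(reinterpret_cast<char*>(&aOut[0]),
                 static_cast<std::streamsize>(bytes));
    return static_cast<std::size_t>(aStream.gcount()) == bytes;
}
```

The static_assert catches a change in size at compile time, but not a change in field order or byte order, so a version marker in the file is still worth considering.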

Inefficient Database Usage

Closely coupled to bad file usage is the problem of inefficient usage of the Symbian OS database systems. Both of the database systems that Symbian OS supports make heavy usage of the underlying file system and therefore using the APIs provided at this level can lead to problems showing up in the file usage patterns.

The first point to note is in the usage of the Compact() API. For durability reasons the database sub-systems do not ‘edit in place’, that is to say, if you change the database structure in any way, the new parts of the structure are appended to the end of the database file and the various markers in the structure are set to point to these new areas. This clearly has an impact on file size, as a database that is constantly being updated will continue to grow in size, unchecked. It is for this reason that there is a Compact() API. This essentially rebuilds the database, removing the ‘dead’ areas as it progresses. Clearly then, this operation will be slow, especially for large and complex databases, so it is important to ensure it is only called when necessary.

Another important performance consideration when working with databases is the schema of the database itself. The database components make extensive use of caching, to limit the impact of file system usage. Better performance can be achieved if the schema used lends itself to caching by increasing the locality of often read items.

Consider a database that has a single table of records, each record having a few small fields and a large field. For a particular use case, it is the small fields that are accessed more frequently. The performance of the use case can be improved if the database is restructured so that it has two co-indexed tables; one containing just the frequently accessed fields, and the other containing the larger, infrequently accessed fields.

If the performance of your database application is critical, you may wish to measure the performance of alternative implementations. As already mentioned, database application performance is largely governed by the choice of schema and access algorithms so additional effort spent working out the best solution for you can pay large dividends. If you do decide to produce a performance test harness, note that database operations have some natural variation in execution time so a number of iterations of each measured event will be required in order to obtain a true picture.
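
When summarizing repeated measurements, the median is less sensitive to the occasional slow run than the mean. The helper below is a small, hypothetical sketch in portable C++; how the individual timings are collected is left to the test harness.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the median of a set of timings (any consistent unit).
// The vector is taken by value so the caller's samples stay in order.
static double Median(std::vector<double> aSamples) {
    assert(!aSamples.empty());
    std::sort(aSamples.begin(), aSamples.end());
    const std::size_t n = aSamples.size();
    if (n % 2 == 1)
        return aSamples[n / 2];
    return (aSamples[n / 2 - 1] + aSamples[n / 2]) / 2.0;
}
```

Reporting the minimum and maximum next to the median gives a quick indication of how much natural variation the operation shows.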

Bad use of Design Patterns

Design Patterns are a useful way of categorizing software problems into well-known classes. Common, tried and tested solutions to those problems can then be easily implemented. However, the use of Design Patterns should never be a substitute for actually thinking through a problem and deriving a design solution.

Further problems can arise even once a particular design pattern has been chosen. Design Patterns usually imply an implementation that is described in full Object Oriented terms. Such OO abstraction is not always appropriate. Blindly implementing the prescribed solution can often incur a cost to performance, or an increase in code complexity.

Consider the following code, where the design has suggested the use of the State Pattern. One way to implement this pattern is to derive a class for each state from a common base class. In order to initialize the state machine, the implementers have decided to use a Factory Pattern:

CExampleStateFactory* CExampleStateFactory::NewL() {
    CExampleStateFactory* factory = new (ELeave) CExampleStateFactory();
    // Create all the new states; each takes a pointer to the factory
    factory->iStates[EError] = new (ELeave) TExampleStateError(factory);
    factory->iStates[EStarted] = new (ELeave) TExampleStateStarted(factory);
    factory->iStates[EStopped] = new (ELeave) TExampleStateStopped(factory);
    // etc...
    return factory;
}

The state factory then owns each of the state classes, and they each take a pointer to the factory. However, if we look more closely at the state classes:

class TExampleStateBase {
protected:
    TExampleStateBase(CExampleStateFactory* aFactory);
    inline TExampleStateBase* GetState(TStateEnum aState) { return iFactory->GetState(aState); }
    CExampleStateFactory* iFactory;
};
TExampleStateBase::TExampleStateBase(CExampleStateFactory* aFactory) : iFactory(aFactory) {}
TExampleStateStarted::TExampleStateStarted(CExampleStateFactory* aFactory) : TExampleStateBase(aFactory) {}

We can see that the factory pointer that each state takes is only used for state switching; this means that the way the State Pattern has been implemented has added a great deal of complexity to the code. This can lead to performance problems if the factory is called repeatedly, or it can mean that much of the code only exists for initialization, which leads to an increase in ROM usage.

If the State Pattern had been implemented differently, with the state switching handled outside of the State Machine itself, then the complex construction could be avoided, and much of the Factory Pattern code could be handled by compiler intrinsics.

If each state is simple and contains no member data, then the state machine can be encapsulated entirely by the Virtual Function Pointer Table. In this case we can eliminate the need for the Factory Pattern completely, and simply change state by using casting, though this comes with the trade-off of a decrease in code legibility, and the higher risk of defects.
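
A more conservative variant than the casting trick, sketched here in portable C++ under stated assumptions: the states hold no data, so one const instance of each can be created at build time, and the machine switches state by indexing a const table. The TEvent enum, the transition rules and CExampleMachine are all hypothetical.

```cpp
#include <cassert>

enum TStateEnum { EError, EStarted, EStopped, EStateCount };
enum TEvent { EStart, EStop, EFail };  // hypothetical events

// Stateless interface: each state only decides which state follows.
class TExampleState {
public:
    virtual TStateEnum HandleEvent(TEvent aEvent) const = 0;
protected:
    ~TExampleState() {}  // states are never deleted through the base class
};

class TExampleStateError : public TExampleState {
public:
    TStateEnum HandleEvent(TEvent) const { return EError; }  // terminal state
};

class TExampleStateStarted : public TExampleState {
public:
    TStateEnum HandleEvent(TEvent aEvent) const {
        if (aEvent == EStop) return EStopped;
        return aEvent == EFail ? EError : EStarted;
    }
};

class TExampleStateStopped : public TExampleState {
public:
    TStateEnum HandleEvent(TEvent aEvent) const {
        if (aEvent == EStart) return EStarted;
        return aEvent == EFail ? EError : EStopped;
    }
};

// One const instance per state; no factory and no heap allocation.
static const TExampleStateError KStateError = TExampleStateError();
static const TExampleStateStarted KStateStarted = TExampleStateStarted();
static const TExampleStateStopped KStateStopped = TExampleStateStopped();

// Indexed by TStateEnum, so the order must match the enum.
static const TExampleState* const KStates[EStateCount] = {
    &KStateError, &KStateStarted, &KStateStopped
};

// The machine owns the switching; states never see a factory pointer.
class CExampleMachine {
public:
    CExampleMachine() : iCurrent(EStopped) {}
    void Handle(TEvent aEvent) {
        assert(iCurrent >= EError && iCurrent < EStateCount);
        iCurrent = KStates[iCurrent]->HandleEvent(aEvent);
    }
    TStateEnum Current() const { return iCurrent; }
private:
    TStateEnum iCurrent;
};
```

This keeps the virtual dispatch of the State Pattern while removing the factory, the per-state factory pointer and the run-time construction code.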

Generic and ‘future proofed’ code

This category of problems comes from an over-reliance on frameworks and plugins. It can also involve cases of trading ease-of-development for software performance.

Consider an application that stores some configuration parameters in a ‘.ini’ file. During development it may be that these parameters are being altered often, and so it makes sense for them to be in an easily edited form. However, once they become static, the overhead of loading and parsing the file can become a performance problem.

The ‘Framework and Plugins’ approach is one of the cornerstones of Symbian OS design, but all too often it can be used inappropriately.

The use of a framework carries with it an overhead in terms of code size, as you have to have administrative code to scan for plugins, check for changes in plugin availability, loading and linking of the plugin DLLs, and code for selecting a particular plugin. It also carries with it an overhead in terms of performance, as the framework usually provides a common client interface, which internally forwards requests to the plugin in use.

These are all acceptable tradeoffs when dynamic and in-the-field flexibility is required. However, when the set of plugins discovered and used becomes essentially static for the lifetime of a piece of software, these costs can become unacceptable.

Future proofing code can also be good practice, but again if the techniques are wrongly or inappropriately applied, there is a cost in code size and performance.

Developing and testing on the emulator

The Symbian OS emulator is provided as a development tool, as it allows a quick turnaround from writing code to getting it running. It also allows code to be debugged whilst it is running, without expensive hardware tools.

These facilities are invaluable. However, it is all too easy to forget about real hardware testing, especially when the pressures of release deadlines loom. Hardware testing can often end up only being done for proving functionality, or ensuring the code runs on hardware prior to a release. This can lead to a poor understanding of how code behaves on actual devices, and while this doesn’t lead directly to performance problems, it can often mean they aren’t detected until the final stages of development or after the product ships.

This can appear to customers as though insufficient testing was performed or inadequate quality control was upheld. Additionally, it limits the scope for any improvements that can be made, as there will be little time or agreement to make big architectural changes.

You, your compiler and your code

It is as important to understand the basics of the compiler you are using as it is to know the system and the language you are programming for. Many modern compilers contain a number of optimization phases that try to produce code for a set of desired qualities, be they small code size, or fastest execution path. It is important to understand which of these are being chosen in order to appreciate how that affects the code the compiler produces. It’s also important not to assume any ‘tricks’ done by one compiler will be done by another.

Don’t work against the compiler

Know your compiler: understand how the code it generates relates to the source you write. Don’t write code that is too prescriptive. This can force the compiler to produce code in a certain way, which may not be the most efficient code for the particular case.

Learn a little assembler

Assembler is often thought of as a ‘black art’ and is usually shied away from by most software developers. However, to fully understand how code will run on a particular platform, a passing understanding of assembler is incredibly useful.

Quick tips

Aside from the ‘killers’ mentioned in this article, there are a number of smaller things that should be remembered when writing code. A lot of these are good common sense, and are usually formalized in a Coding Standard. A few of them are worth mentioning here.

Store results of calls used in loops

Avoid function calls in a loop’s condition statement. Prefer instead to store the value returned from a function call in a local variable, unless the result changes after each iteration.
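
A minimal sketch of this tip in portable C++: Length() is a hypothetical stand-in for any call whose result does not change during the loop, and gLengthCalls exists only to show how often it runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

static int gLengthCalls = 0;  // instrumentation for the sketch only

// Stand-in for a non-trivial call with a loop-invariant result.
static std::size_t Length(const char* aText) {
    assert(aText != 0);
    ++gLengthCalls;
    return std::strlen(aText);
}

static std::size_t CountSpaces(const char* aText) {
    const std::size_t length = Length(aText);  // evaluated once, not per iteration
    std::size_t spaces = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (aText[i] == ' ')
            ++spaces;
    }
    return spaces;
}
```

Writing the condition as `i < Length(aText)` would call the function once per iteration unless the compiler can prove the result never changes.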

Use references or pointers where necessary

Passing parameters by reference is usually a good thing, but don’t pass references to integer types if they are only ever read from.

Don’t unroll loops

With modern compilers this type of optimization is no longer necessary, and can even be counterproductive. The compiler can perform this optimization where appropriate.

Avoid long if...else chains

It’s better to use a switch statement, as these can be implemented more efficiently. If the conditions do not test constant integers (for example, they compare strings), consider using a lookup table to map each value to an integer before a switch.
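
As an illustration of the lookup-table approach for string conditions, in portable C++: the command names, the TCommand enum and the return values of Dispatch() are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum TCommand { ECmdUnknown, ECmdOpen, ECmdClose, ECmdSave };

struct TCommandEntry {
    const char* iName;
    TCommand iCommand;
};

// Const table mapping each string to an integer constant.
static const TCommandEntry KCommands[] = {
    { "open", ECmdOpen },
    { "close", ECmdClose },
    { "save", ECmdSave }
};

static TCommand LookupCommand(const char* aName) {
    assert(aName != 0);
    const std::size_t count = sizeof(KCommands) / sizeof(KCommands[0]);
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(aName, KCommands[i].iName) == 0)
            return KCommands[i].iCommand;
    }
    return ECmdUnknown;
}

// The string is compared in one place; the switch works on integers.
static int Dispatch(const char* aName) {
    switch (LookupCommand(aName)) {
        case ECmdOpen:  return 1;
        case ECmdClose: return 2;
        case ECmdSave:  return 3;
        default:        return 0;
    }
}
```

For large tables the linear search could be replaced by a sorted table and a binary search without changing the callers.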

Use the const qualifier appropriately

Marking read-only variables as const allows the compiler to generate more efficient code.

Quick profiling using the Fast Counter

The user library provides access to a system clock that typically has a high resolution: User::FastCounter(). This can be used to measure the time taken for a particular piece of code to execute. The exact nature of the counter is device specific, but its attributes can be discovered using the HAL::Get() API: EFastCounterFrequency returns the frequency of the counter, and EFastCounterCountsUp indicates whether the counter counts up or down.
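
Turning two counter readings into a time needs both attributes. The helper below is a portable sketch, not Symbian code: ElapsedMicroseconds() is a hypothetical name, the counter is assumed to be 32 bits wide, and the frequency and direction are assumed to have been read from the HAL attributes named above.

```cpp
#include <cassert>
#include <cstdint>

// Converts two raw counter readings into elapsed microseconds.
// aFrequencyHz and aCountsUp are assumed to come from the
// EFastCounterFrequency and EFastCounterCountsUp attributes.
static std::uint64_t ElapsedMicroseconds(std::uint32_t aStart, std::uint32_t aEnd,
                                         std::uint32_t aFrequencyHz, bool aCountsUp) {
    assert(aFrequencyHz != 0);
    // Storing the difference in 32 bits makes it wrap modulo 2^32, so a
    // single counter wrap between the two readings is handled correctly.
    const std::uint32_t ticks = aCountsUp ? (aEnd - aStart) : (aStart - aEnd);
    // Widen before multiplying to avoid overflow.
    return (static_cast<std::uint64_t>(ticks) * 1000000u) / aFrequencyHz;
}
```

Intervals long enough for the counter to wrap more than once cannot be measured this way, so the technique suits short sections of code.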

Tools: the Sampling Profiler

This section is aimed at developers who have access to licensee prototypes and certain levels of SDK or are using reference boards.

The Sampling Profiler can be used to provide a rough, statistical view of software activity on a device. It does this by logging the thread ID and current Program Counter value at one millisecond intervals. It is accompanied by a command line-based program that can be used to analyse this data on a PC. This information can then be used to investigate performance problems, and inform code inspection of likely bottlenecks.

Build the ROM

To use the profiler, it first has to be added to the ROM. This can be done by adding ‘profiler.iby’ to the buildrom command line:

buildrom h4hrp techview profiler.iby

Start the profiler

The simplest way to control the profiler is from eshell, with a command line such as:

start profiler start

This starts another thread that the profiler application runs in so you can switch back to other tasks using the <Control><Alt><Shift><T> key combination.

Run the code you wish to profile

At this point the profiler will be running, and gathering samples. A short pause before starting the code to be analyzed can help the thread activity analysis phase by visually separating out the various chunks of processing shown.

Stop the profiler

After you have profiled what you need, switch back to the eshell and stop the profiler:

profiler stop

Then close the profiler data file:

profiler unload

Retrieve the profile data

You should have a file, profiler.dat, in the root of the C drive of the reference board. You can copy it to the MMC card and transfer it back to the build machine for analysis.

Analyze the data by activity

You should convert the data to a form suitable to be displayed in Excel in order to generate a graph, so you get an overall picture of the activity of the software you have profiled.

Copy the profile file

Copy profiler.dat to the ROM folder, as you need the symbol table there to extract the names. Create the activity format file by running the following command:

analyse -r h4hrp_001.techview.symbol profiler.dat -v -mx > profile.xls

Create the activity graph

Open the profile.xls file in Excel. To ensure the graph shows the thread names, delete the first six rows of the data. This is summary data and will distort the graph if it is included. Similarly, the time stamps in the first column would distort the graph, but you cannot delete them as they are needed to cross reference the areas of the graph that you are interested in to the actual times.

Select all of the data and then click on the ‘chart wizard’. This opens up a four-stage wizard:

  • select ‘Area’ from chart type and ‘Stacked’ from the sub-type, select ‘Next’
  • adjust the area to miss out the time stamp in the first column. Change the A to a B. E.g., =profile!$A$2:$V$941 gets changed to =profile!$B$2:$V$941, select ‘Next’
  • ignore the next pane and press ‘Next’, select ‘As new sheet’ and press ‘Finish’.

Select the active section and threads

By looking at the graph created, you should be able to work out what your program was doing and when, allowing you to locate the area you are interested in. You can hover over the data area with your mouse and a pop-up window will tell you which thread was running at that point. You can then use the row number of the point to find its timestamp by looking at the value in the first column of the same row number in the data sheet. Additionally, you can delete rows you are not interested in. Remember that Excel will renumber the rows so delete the end of the range first. The graph will be redrawn with the new data.

Create a listing by function

Once you know the range of timestamps and within which thread they occurred, you can create a list of the functions ranked in order of the activity. For example, if you were interested in what functions were called between the 51300 and 76900 timestamps in the EFile thread, you would use the following command:

analyse -r h4hrp_001.techview.symbol profiler.dat -bf -p -s51300-76900 -t EFile* > analysis.txt

Note that:

  • the sample range has no spaces after the -s or between the two numbers in the range and the hyphen separating them
  • the sample range is the timestamps that come from the first column of the datasheet used to create the activity graph
  • the thread name does have a space after the -t, and can include wildcards both at the beginning and the end of the name.

You can read the output file into a text editor (such as Notepad) where you will find the list of functions in the timestamp range selected. Usually, the top five or so functions will be of interest. You can then go to an IDE and inspect the relevant sections of code.

Original version

View this document on scribd.com

This page was last modified on 23 July 2013, at 14:09.