Category: C++
MFC Memory Leak Reporting
When using some MFC-linking DLLs in a non-MFC project, I stumbled on something quite annoying. According to various sources, a weird destructor calls _CrtDumpMemoryLeaks each time the MFC is unloaded. In my case, that meant every time a DLL was scanned via LoadLibrary/FreeLibrary. As a result, each and every block of allocated memory was reported as a leak for each call of FreeLibrary. This led to a runtime of several minutes in which Visual Studio was unusable.
Thankfully, further searching turned up a solution by Pieter Op de Beeck, posted in microsoft.public.vc.mfc ten years ago. Go figure.
In case you stumble onto this, you'll probably want the fix right away, so here it is: MemoryLeakDetector.zip
2010-06-29 13:06:18. Categories: programming, C++, MFC
Spectral Calculations with OpenMP
Since it is high time to get into multicore programming and all major compilers support the OpenMP standard, I chose that road for my first endeavor. One particularly lengthy operation in some of my projects is the calculation of various spectrograms. Since an FFT is calculated for every column of pixels independently, the work can be split straightforwardly into threads by the number of virtual processors. A couple of tests showed that it is indeed best to split as much data as possible, i.e. at the highest level available. I hacked together a slightly off use of the for directive to achieve what I wanted.
// divide width by number of virtual processors, min = 128, max = none
int slice = AP_MP::getSlice(width,128,-1);
int xStart;
#pragma omp parallel for private(xStart)
for (xStart=0;xStart<width;xStart+=slice) {
    int xEnd = xStart+slice;
    if (xEnd>width) xEnd=width;
    calcSlice(..,xStart,xEnd);
}
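The slicing idea can also be written with the loop running over slice indices, which is the more conventional shape for an OpenMP for directive. The following is only a sketch: sliceWidth is a hypothetical stand-in for AP_MP::getSlice (whose implementation is not shown here), and calcSlice is a placeholder that merely marks each column as processed instead of computing an FFT.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for AP_MP::getSlice: split width evenly across
// `workers`, but never use fewer than minSlice columns per slice.
int sliceWidth(int width, int workers, int minSlice) {
    assert(width > 0 && workers > 0 && minSlice > 0);
    int slice = (width + workers - 1) / workers;  // ceiling division
    return std::max(slice, minSlice);
}

// Placeholder for the per-slice work (one FFT per column in the post).
void calcSlice(std::vector<int>& columns, int xStart, int xEnd) {
    for (int x = xStart; x < xEnd; ++x) columns[x] += 1;
}

// Process all columns; each slice touches a disjoint range, so no locking
// is needed. Without OpenMP enabled the pragma is ignored and the loop
// simply runs serially.
std::vector<int> processColumns(int width, int workers, int minSlice) {
    std::vector<int> columns(width, 0);
    const int slice = sliceWidth(width, workers, minSlice);
    const int sliceCount = (width + slice - 1) / slice;
    #pragma omp parallel for
    for (int s = 0; s < sliceCount; ++s) {
        const int xStart = s * slice;
        const int xEnd = std::min(xStart + slice, width);
        calcSlice(columns, xStart, xEnd);
    }
    return columns;
}
```

Calling processColumns(1000, 2, 128) yields two slices of 500 columns each. With g++ the pragma only takes effect when compiling with -fopenmp.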
As I hoped, the calculations were indeed 1.3-1.8 times faster on a dual-core machine, which is quite nice for such limited effort.
The following table gives the min/median/max times in ms for a 2.8s audio file sampled with 16bit at 22050Hz.
Spectrum is a pure magnitude spectrum calculation, Spectrogram represents a rendering of a spectrogram image, the Auditory Spectrogram includes a rather lengthy masking calculation. g++ -O3 -ffast-math is up to four times faster for the core routines, but that lessens for larger code segments. The Auditory Spectrogram is slightly faster on Windows.
| | Linux, g++ -O3 -ffast-math | | | Windows, VS2008 | | |
| | 1 thread (min / median / max) | 2 threads (min / median / max) | speedup | 1 thread (min / median / max) | 2 threads (min / median / max) | speedup |
| Spectrum | 10.1 / 10.3 / 12.3 | 5.4 / 7.0 / 11.9 | 1.32 | 31.1 / 31.4 / 71.1 | 18.0 / 18.4 / 51.4 | 1.64 |
| Spectrogram | 80 / 89 / 95 | 60 / 67 / 70 | 1.30 | 138 / 141 / 188 | 78 / 82 / 95 | 1.75 |
| Auditory Spectrogram | 4463 / 4670 / 4763 | 2320 / 2464 / 2725 | 1.89 | 4283 / 4300 / 4339 | 2246 / 2322 / 2647 | 1.84 |
Dusting off my R, I was able to generate a nice box plot of the results:

2009-10-19 03:32:40. Categories: programming, C++
MulDiv64
Well, that turned out to be more effort than expected. At some point, I needed to multiply a 64 bit and a 32 bit value and divide the product by a 64 bit value. Given that I'm using a 64 bit CPU, I assumed there would be an instruction or at least a library function. But no such luck. So I had to take my dusty bit manipulation skills out of the closet and write it myself. Here goes.
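The calculation is performed by splitting the 64 bit multiplicand $n$ into two 32 bit halves, $n = n_h \cdot 2^{32} + n_l$. With the 32 bit multiplicand $m$ and the 64 bit divisor $d$ this gives
$$\frac{n \cdot m}{d} = \frac{n_l \cdot m}{d} + \frac{n_h \cdot m \cdot 2^{32}}{d}$$
So far, so obvious.
Where
$$\frac{n_h \cdot m \cdot 2^{32}}{d} = \left\lfloor \frac{n_h \cdot m}{d} \right\rfloor \cdot 2^{32} + \frac{r \cdot 2^{32}}{d}, \quad r = (n_h \cdot m) \bmod d$$
is split again into the division scaled by 2^32 and the remainder (modulus). The modulus part may still be 64 bit, so it may have to be calculated by canceling appropriate powers of two, with dividend and divisor shifted by
$$s = \max(\mathrm{msb}(r) + 32,\ \mathrm{msb}(d)) - 63$$
according to the MSBs.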
That last part took me a bit longer than anticipated, given that it has in fact been years since I saw bits at all while coding. The MSB determination originates in the astounding Bit Twiddling Hacks by Sean Eron Anderson. So, anyway, enjoy or abhor the following end result:
#include <stdint.h>
#include <algorithm> // std::max

/**
 * determine the msb of a value in O(log log n)
 * @author Sean Eron Anderson
 */
inline unsigned int msb(uint64_t value)
{
    const int MAX_LOGLOG = 6;
    const uint64_t BIT_LL[MAX_LOGLOG] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000, 0xFFFFFFFF00000000ULL};
    const unsigned int EXP_LL[MAX_LOGLOG] = {1, 2, 4, 8, 16, 32};
    unsigned int r = 0;
    for (int i = MAX_LOGLOG-1; i >= 0; i--) {
        if (value & BIT_LL[i]) {
            value >>= EXP_LL[i];
            r |= EXP_LL[i];
        }
    }
    return r;
}

/**
 * multiply a 64 and a 32 bit value and divide them by a 64 bit value.
 * result bits above 64 are ignored, so overflow flags are not set.
 * @author Axel Plinge
 * @param number 64 bit multiplicand
 * @param numerator 32 bit multiplicand
 * @param denominator 64 bit divisor
 * @return (number*numerator)/denominator (+/-) 1
 */
inline uint64_t muldiv(uint64_t number, uint32_t numerator, uint64_t denominator)
{
    // split the 64 bit multiplicand with shift and mask; unlike overlaying
    // a bit-field struct, this does not depend on the byte order
    uint64_t num_h = number >> 32;
    uint64_t num_l = number & 0xFFFFFFFFULL;
    uint64_t mul = numerator;
    uint64_t res;
    // lower 32 bit portions yield a 64 bit product
    // that can be divided directly, giving 64 bits of result
    res = (num_l * mul)/denominator;
    // upper 32 bit have to be shifted, calculate modulus 2^32
    uint64_t product_h = num_h*mul;
    uint64_t div_h = product_h/denominator;         // division main
    uint64_t mod_h = product_h - denominator*div_h; // modulus
    // upper bits
    res += div_h<<32;
    if (mod_h==0) {
        return res;
    }
    // remainder of division
    // if msb modulus < 32 we can be quick about it
    if ((mod_h>>32)==0) {
        res += (mod_h<<32)/denominator;
        return res;
    }
    // if we reach this point we have full 64 bit values, i.e. a 96 bit dividend
    // calculate an approximate result by shifting according to the msb set
    int msb_dividend = msb(mod_h)+32;
    int msb_denominator = msb(denominator);
    // named msb_max so that it does not hide the msb() function
    int msb_max = std::max(msb_dividend,msb_denominator);
    int shift = msb_max-63;
    res += (mod_h << (32-shift)) / (denominator>>shift);
    return res;
}
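For checking results, a reference value is handy. The sketch below assumes GCC or Clang, where unsigned __int128 is available as a compiler extension (it is not standard C++ and not offered by Visual C++). The name muldivReference is made up for this example. Like the routine above, it drops result bits above 64.

```cpp
#include <cassert>
#include <stdint.h>

// Reference implementation using a 128 bit intermediate product.
// Assumption: the compiler provides unsigned __int128 (GCC, Clang).
inline uint64_t muldivReference(uint64_t number, uint32_t numerator, uint64_t denominator)
{
    assert(denominator != 0);
    // 64 bit times 32 bit needs at most 96 bits, so 128 bits never overflow
    unsigned __int128 product = (unsigned __int128)number * numerator;
    // the quotient is exact (rounded down); bits above 64 are discarded
    return (uint64_t)(product / denominator);
}
```

Comparing muldiv against muldivReference over random inputs is a simple way to see how far the approximate last branch deviates in practice.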
2009-10-08 18:08:11. Categories: programming, C++
Avoiding virtual overloads with a template replacement
In most C++ Compilers virtual overloads work nowadays just as you'd expect.
Let us assume two classes A and B like
class A {
public:
    virtual void foo(int i) {
        cout << "A::foo(int " << i << ")" << endl;
    }
    virtual void foo(double d) {
        cout << "A::foo(double " << d << ")" << endl;
    }
};
class B : public A {
public:
    virtual void foo(int i) {
        cout << "B::foo(int " << i << ")" << endl;
    }
    virtual void foo(double d) {
        cout << "B::foo(double " << d << ")" << endl;
    }
};
When calling them, the correct one is identified since foo(int) and foo(double) get separate vtable entries. A quick test:
void callFoo(A& a) {
    a.foo(1.5);
    a.foo(3);
}
..
A a;
callFoo(a);
B b;
callFoo(b);
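A self-contained variant of the quick test might look like the following sketch. The names Base, Derived and label are made up for this example, and the methods return their message as a string instead of printing it, so the outcome can be checked.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Small helper to build messages such as "Base::foo(int 3)".
template <typename T>
std::string label(const std::string& prefix, T value) {
    std::ostringstream out;
    out << prefix << value << ")";
    return out.str();
}

struct Base {
    virtual ~Base() {}
    virtual std::string foo(int i) { return label("Base::foo(int ", i); }
    virtual std::string foo(double d) { return label("Base::foo(double ", d); }
};

struct Derived : Base {
    virtual std::string foo(int i) { return label("Derived::foo(int ", i); }
    virtual std::string foo(double d) { return label("Derived::foo(double ", d); }
};

// Overload resolution picks foo(double) for 1.5 and foo(int) for 3 at
// compile time; the virtual call then selects Base or Derived at run time.
std::string callFoo(Base& object) {
    return object.foo(1.5) + "\n" + object.foo(3);
}
```

For a Derived object, callFoo returns "Derived::foo(double 1.5)" followed by "Derived::foo(int 3)".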
However, things get confusing very easily, for both programmer and compiler. For example, when calling foo with an unsigned long, the call is ambiguous between foo(int) and foo(double), so compilers issue an error, forcing you to cast explicitly or add another overload. That overload has to be kept consistent over the whole class hierarchy, as does everything you do here. This may not seem too difficult, but it adds risk. When working with some code I last touched 10 years ago, I stumbled more than once. So, to avoid confusion, one of the two has to go. Let's keep inheritance since it's most vital:
class C {
public:
    virtual void fooInt(int i) {
        cout << "C::fooInt(" << i << ")" << endl;
    }
    virtual void fooDouble(double d) {
        cout << "C::fooDouble(" << d << ")" << endl;
    }
Now you, the programmer, have to choose the variant, not the compiler. A call to fooInt will convert any argument to int and that's that.
So, what if you REALLY need the compiler to choose, e.g. when using C in a template? The solution is a type-switch with a template method as proposed in the C++ FAQ Lite.
This is not especially beautiful, but leaves the above separation intact.
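    template <typename type>
    inline void fooTemplated(type arg);
};
// generic fallback for every type without a specialization
template <typename type>
inline void C::fooTemplated(type arg) {
    throw runtime_error("fooTemplated called with unforeseen type");
}
// explicit specializations are defined at namespace scope, after the class;
// inside the class body they are rejected by some compilers
template <>
inline void C::fooTemplated<int>(int arg) {
    cout << "templated ";
    fooInt(arg);
}
template <>
inline void C::fooTemplated<double>(double arg) {
    cout << "templated ";
    fooDouble(arg);
}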
Suffice it to say that the type-switch has to be defined inline, as shown above, to avoid visibility issues when compiling.
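To illustrate the use from a template, here is a compact sketch of the same construct. Handler and callGeneric are hypothetical names, and the methods return strings instead of printing so the dispatch can be verified. The explicit specializations are placed at namespace scope after the class, which is the portable spot for them.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

class Handler {
public:
    virtual ~Handler() {}
    virtual std::string fooInt(int i) {
        std::ostringstream out;
        out << "fooInt(" << i << ")";
        return out.str();
    }
    virtual std::string fooDouble(double d) {
        std::ostringstream out;
        out << "fooDouble(" << d << ")";
        return out.str();
    }
    // type-switch: the template argument selects the named variant
    template <typename T>
    std::string fooTemplated(T arg);
};

// fallback for any type that has no specialization
template <typename T>
inline std::string Handler::fooTemplated(T) {
    throw std::runtime_error("fooTemplated called with unforeseen type");
}

template <>
inline std::string Handler::fooTemplated<int>(int arg) {
    return "templated " + fooInt(arg);
}

template <>
inline std::string Handler::fooTemplated<double>(double arg) {
    return "templated " + fooDouble(arg);
}

// generic code: the compiler picks the variant from the deduced type T
template <typename T>
std::string callGeneric(Handler& handler, T value) {
    return handler.fooTemplated(value);
}
```

With this, callGeneric(handler, 3) goes to fooInt and callGeneric(handler, 1.5) goes to fooDouble, while an unsigned long argument ends up in the throwing fallback at run time. A compile-time error for unsupported types would arguably be nicer; the run-time exception simply mirrors the construct described above.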
I may also remark that I wrote C++ happily for 12 years before introducing this construct. ;-) Nevertheless, remember to beware of clever code. It may come back to haunt you...
2009-08-17 22:45:39. Categories: programming, C++
VC++ build time and Ramdisks
I had a particularly slow-compiling C++ project. It had lots of interdependent headers where any modification leads to a full rebuild... and due to Murphy's law, I always had to modify one of them that day. While waiting 6 minutes yet again for the build to complete, I decided to check whether I could speed it up with outside measures, namely by moving the source and object files to a faster device.
First off, I installed Gavotte Ramdisk with GUI, which I found thanks to this entry in My Digital Life. Looking at my desktop, I picked up the Corsair Flash Voyager as well and plugged it into the USB hub. I had planned to try out different combinations of locations for source, temp and object files. But... I stopped in my tracks when comparing the first run with everything on the hard disk - 5:32 - with the one where everything resided on the ramdisk - 5:34!

As you can see, NTFS caching in Windows XP is actually worth its money. Needless to say, defragmenting the disk did not have much effect either. I've heard people say that you need RAID for a development machine. While browsing code is probably faster, and hopefully IntelliSense updates as well, I begin to wonder if there really is a significant difference.
Regardless of the nonexistent speedup, I ended up using the ramdisk for TEMP and intermediate build files anyway. It helps to keep the disk clean. On a machine with 3 GB of RAM, I use one gig for the ramdisk, leaving 2 times one gig to start an instance of Visual Studio 2008. Of course, for really slow builds, it makes sense to keep the intermediate files on the disk for the next day.
2009-03-07 13:30:51. Categories: programming, C++