Categories: programming, C++, MFC, Ruby, TeX

MFC Memory Leak Reporting

When using some MFC-linked DLLs in a non-MFC project, I stumbled on something quite annoying. According to various sources, a weird destructor calls _CrtDumpMemoryLeaks each time MFC is unloaded. In my case, that meant every time a DLL was scanned via LoadLibrary/FreeLibrary. As a result, each and every block of allocated memory was reported as a leak on every call of FreeLibrary. This led to a runtime of several minutes during which Visual Studio was unusable.

Thankfully, further searching turned up a solution by Pieter Op de Beeck, posted in microsoft.public.vc.mfc ten years ago. Go figure.

In case you stumble onto this, you'll probably want the fix right away, so here it is: MemoryLeakDetector.zip

Tags: fix, mfc
by axel
2010-06-29. 13:06:18. 118 words, 2938 views. Categories: programming, C++, MFC

classicthesis with figures using full width

When using classicthesis, most of the defaults look good and are there for a typographical, aesthetic or arbitrary reason. Some people immediately want to turn on dottedtoc, for instance.

I had a very different concern. The package uses a rather wide margin in order to place nice marginpars throughout the text. That's fine by me; it also allows for a pleasantly narrow text block without resorting to multiple columns. I did not like this for figures, however, especially large figures spanning a whole page or its top half. So I set out to change this. After some searching I came across the neat changepage package, which above all allows for safe detection of even and odd pages. With it, the implementation was fairly straightforward. I chose to provide a widefigure environment to typeset full-width figures with [p] or [t].

% wider floats by use of changepage. 
% see changepage documentation for details on the underlying quirks
\usepackage[strict]{changepage}  
% calculate size of margin to use
\newlength\totalmargin
\setlength\totalmargin\marginparwidth
\addtolength\totalmargin\marginparsep
% layout using the full margin as well
\newenvironment{layoutfullwidth}{%
\checkoddpage%
\ifoddpage%
\begin{adjustwidth}{0pt}{-\totalmargin}
\else%
\begin{adjustwidth}{-\totalmargin}{0pt}
\fi%
% advance textwidth for use in float,
% e.g. includegraphics[width=\textwidth]
\advance\textwidth\totalmargin
}{%
\end{adjustwidth}
}
% new, wider figure to use where wanted.
% note that the caption will be laid out wide as well.
\newenvironment{widefigure}[1][!tpb]{%
\begin{figure}[#1]
\begin{layoutfullwidth}
}{%
\end{layoutfullwidth}
\end{figure}
}

This is a straightforward approach; changing the typesetting of all floats in general requires some really ugly plain TeX hacking. I got that to work out of playful ambition, but decided against using it in order to keep the code clean and understandable.

by axel
2010-02-01. 19:52:10. 286 words, 8485 views. Categories: programming, science, TeX

Spectral Calculations with OpenMP

Since it is high time to get into multicore programming, and all major compilers support the OpenMP standard, I chose that road for my first endeavor. One particularly lengthy operation in some of my projects is the calculation of various spectrograms. Since an FFT is calculated for every column of pixels independently, the columns can be split straightforwardly into as many slices as there are virtual processors. A couple of tests showed that it is indeed advisable to split as much data as possible at once, i.e. at the highest level available. I hacked together a slightly unconventional use of the for directive to achieve what I wanted.

// div width by number of virtual processors, min = 128, max = none
int slice = AP_MP::getSlice(width,128,-1); 
int xStart;
#pragma omp parallel for private(xStart)
for (xStart=0;xStart<width;xStart+=slice) {
  int xEnd = xStart+slice;
  if (xEnd>width) xEnd=width;
  calcSlice(..,xStart,xEnd); 
}
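AP_MP::getSlice and calcSlice are not shown in the post, so the following self-contained sketch uses hypothetical stand-ins: getSlice assumes std::thread::hardware_concurrency() as the number of virtual processors and enforces a minimum slice width, and calcSlice merely counts how often each column is visited, which makes it easy to check that the slices cover every column exactly once.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical stand-in for AP_MP::getSlice: one slice per hardware thread,
// at least minSlice columns wide; maxSlice < 0 means "no upper bound".
int getSlice(int width, int minSlice, int maxSlice) {
    int n = static_cast<int>(std::thread::hardware_concurrency());
    if (n <= 0) n = 1;                         // the count may be unknown
    int slice = (width + n - 1) / n;           // ceil(width / n)
    slice = std::max(slice, minSlice);
    if (maxSlice > 0) slice = std::min(slice, maxSlice);
    return std::max(slice, 1);                 // never loop with a step of 0
}

// Hypothetical per-slice worker: "processes" the columns [xStart, xEnd).
void calcSlice(std::vector<int>& visits, int xStart, int xEnd) {
    for (int x = xStart; x < xEnd; ++x) visits[x] += 1;
}

void processAllColumns(std::vector<int>& visits) {
    const int width = static_cast<int>(visits.size());
    const int slice = getSlice(width, 128, -1);
    // Each iteration handles one contiguous slice; the slices are disjoint,
    // so no synchronisation is needed. Without OpenMP the pragma is ignored
    // and the loop simply runs serially.
    #pragma omp parallel for
    for (int xStart = 0; xStart < width; xStart += slice) {
        const int xEnd = std::min(xStart + slice, width);  // clamp the last slice
        calcSlice(visits, xStart, xEnd);
    }
}
```

Compiled with -fopenmp (g++) or /openmp (Visual Studio), the slices run in parallel; the results are the same either way, only the timing differs.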

As I had hoped, the calculations were indeed 1.3 to 1.9 times faster on a dual-core machine, which is quite nice for such limited effort. The following table gives the min/median/max times in ms for a 2.8 s audio file with 16-bit samples at 22,050 Hz. Spectrum is a pure magnitude spectrum calculation, Spectrogram is the rendering of a spectrogram image, and Auditory Spectrogram includes a rather lengthy masking calculation. Compiled with g++ -O3 -ffast-math, the core routines run up to four times faster than with VS2008, but that advantage shrinks for larger code segments; the Auditory Spectrogram is even slightly faster on Windows.

Times in ms, given as min / median / max:

Linux, g++ -O3 -ffast-math
                      1 thread              2 threads             speedup
Spectrum              10.1 / 10.3 / 12.3    5.4 / 7.0 / 11.9      1.32
Spectrogram           80 / 89 / 95          60 / 67 / 70          1.30
Auditory Spectrogram  4463 / 4670 / 4763    2320 / 2464 / 2725    1.89

Windows, VS2008
                      1 thread              2 threads             speedup
Spectrum              31.1 / 31.4 / 71.1    18.0 / 18.4 / 51.4    1.64
Spectrogram           138 / 141 / 188       78 / 82 / 95          1.75
Auditory Spectrogram  4283 / 4300 / 4339    2246 / 2322 / 2647    1.84
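The post does not describe how the repeated runs were timed. A minimal sketch of collecting such min/median/max figures with std::chrono could look like this; Timing, summarize and measure are hypothetical names, and taking the mean of the two middle values as the median for an even number of runs is an assumption.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// min / median / max of a set of runs, in milliseconds
struct Timing {
    double min_ms, median_ms, max_ms;
};

// Summarize measured durations; `ms` must not be empty.
Timing summarize(std::vector<double> ms) {
    std::sort(ms.begin(), ms.end());
    const std::size_t n = ms.size();
    const double median = (n % 2 == 1) ? ms[n / 2]
                                       : 0.5 * (ms[n / 2 - 1] + ms[n / 2]);
    return {ms.front(), median, ms.back()};
}

// Run `work` `runs` times (runs > 0) and record each wall-clock duration.
template <typename F>
Timing measure(F work, int runs) {
    std::vector<double> ms;
    ms.reserve(runs);
    for (int r = 0; r < runs; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return summarize(ms);
}
```

Using steady_clock rather than system_clock keeps the measurements immune to wall-clock adjustments during a run.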

Dusting off my R, I was able to generate a nice box plot of the results:

by axel
2009-10-19. 03:32:40. 306 words, 2831 views. Categories: programming, C++

MulDiv64

Well, that turned out to be more effort than expected. At some point I needed to multiply a 64-bit and a 32-bit value and divide the product by a 64-bit value. Given that I'm using a 64-bit CPU, I assumed there would be an instruction or at least a library function. But no such luck. So I had to take my dusty bit-manipulation skills out of the closet and write it myself. Here goes.

The calculation is performed by splitting the 64-bit multiplicand into two 32-bit halves. So far, so obvious.

\begin{eqnarray*} result &=& \frac{ number \cdot numerator}{ denominator } \\ &=& \frac{ (number_l + number_h \cdot 2^{32}) \cdot numerator }{ denominator } \quad=\quad \underbrace{\frac{ number_l \cdot numerator }{ denominator }}_{result_l} + \underbrace{\frac{ number_h \cdot 2^{32} \cdot numerator }{ denominator }}_{result_h} \end{eqnarray*}
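The split itself is exact, but computing $result_l$ and $result_h$ with separate integer divisions floors each part on its own, so their sum can fall one short of the exact quotient. A small worked example, with values chosen purely for illustration: $number = 2^{32} + 2$ (so $number_l = 2$, $number_h = 1$), $numerator = 1$, $denominator = 3$.

\begin{eqnarray*} result_l &=& \left\lfloor \frac{2 \cdot 1}{3} \right\rfloor = 0 \\ result_h &=& \left\lfloor \frac{1 \cdot 2^{32} \cdot 1}{3} \right\rfloor = 1431655765 \\ \left\lfloor \frac{(2 + 1 \cdot 2^{32}) \cdot 1}{3} \right\rfloor &=& 1431655766 = result_l + result_h + 1 \end{eqnarray*}

In general $\lfloor a \rfloor + \lfloor b \rfloor$ equals either $\lfloor a + b \rfloor$ or one less, which accounts for part of the tolerance in the documented return value.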

Here $ result_h $ is split again into the integer quotient, scaled by $ 2^{32} $, and the remainder (modulus). The remainder may still occupy 64 bits, so its share may have to be calculated by cancelling appropriate powers of two, i.e. by shifting both operands by $ s $ according to their MSBs.

\begin{eqnarray*} result_h &=& \frac{ number_h \cdot 2^{32} \cdot numerator }{ denominator } = \underbrace{\left\lfloor \frac{ number_h \cdot numerator }{ denominator } \right\rfloor}_{div_h} \cdot 2^{32} + \frac{ \overbrace{((number_h \cdot numerator) \:\mathrm{mod}\: denominator)}^{mod_h} }{ denominator } \cdot 2^{32} \\ &=& {div_h} \cdot 2^{32} + \frac{ {mod_h} \cdot 2^{32-s} }{ denominator \cdot 2^{-s} } \qquad s = \max\{ \mathrm{MSB}(mod_h)+32 , \mathrm{MSB}(denominator) \} - 63 \end{eqnarray*}
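The shift is only needed when the remainder no longer fits into 32 bits. In that case the maximum in $s$ is always attained by the numerator term, because a remainder is smaller than its divisor. The following short derivation shows why neither shifted operand can overflow or become zero:

\begin{eqnarray*} mod_h \geq 2^{32} &\Rightarrow& \mathrm{MSB}(mod_h) + 32 \geq 64 > 63 \geq \mathrm{MSB}(denominator) \\ &\Rightarrow& s = \mathrm{MSB}(mod_h) - 31 \geq 1 \\ \mathrm{MSB}(mod_h \cdot 2^{32-s}) &=& \mathrm{MSB}(mod_h) + 32 - s = 63 \\ mod_h < denominator &\Rightarrow& \left\lfloor denominator \cdot 2^{-s} \right\rfloor \geq 2^{\mathrm{MSB}(mod_h) - s} = 2^{31} \end{eqnarray*}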

That last part took me a bit longer than anticipated, given that it had in fact been years since I last saw individual bits while coding. The MSB determination originates in the astounding Bit Twiddling Hacks by Sean Eron Anderson. So, anyway, enjoy or abhor the following end result:

#include <stdint.h>
#include <algorithm>

/**
 * determine the index of the most significant set bit of a value
 * in O(log log n) steps (returns 0 for value 0)
 * @author Sean Eron Anderson (Bit Twiddling Hacks)
 */
inline unsigned int msb(uint64_t value)
{
    const int MAX_LOGLOG = 6;
    const uint64_t BIT_LL[MAX_LOGLOG] = {0x2ULL, 0xCULL, 0xF0ULL, 0xFF00ULL, 0xFFFF0000ULL, 0xFFFFFFFF00000000ULL};
    const unsigned int EXP_LL[MAX_LOGLOG] = {1, 2, 4, 8, 16, 32};
    unsigned int r = 0;
    for (int i = MAX_LOGLOG-1; i >= 0; i--)  {
        if (value & BIT_LL[i])  {
            value >>= EXP_LL[i];
            r |= EXP_LL[i];
        }
    }
    return r;
}

/**
 * multiply a 64 and a 32 bit value and divide them by a 64 bit value.
 * result bits above 64 are ignored, so overflow flags are not set.
 * @author Axel Plinge
 * @param number       64 bit multiplicand
 * @param numerator    32 bit multiplicand
 * @param denominator  64 bit divisor (non-zero)
 * @return  (number*numerator)/denominator (+/-) 1
 */
inline uint64_t muldiv(uint64_t number, uint32_t numerator, uint64_t denominator)
{
    // split into 32 bit halves by shift and mask; unlike casting to a
    // struct of bitfields this is portable and free of aliasing issues
    uint64_t num_h = number >> 32;
    uint64_t num_l = number & 0xFFFFFFFFULL;
    uint64_t mul = numerator;
    uint64_t res;
    // lower 32 bit portion yields a 64 bit product
    // that can be divided directly, giving 64 bits of result
    res = (num_l * mul) / denominator;
    // upper 32 bit portion has to be shifted; calculate quotient and remainder
    uint64_t product_h = num_h * mul;
    uint64_t div_h = product_h / denominator;          // quotient
    uint64_t mod_h = product_h - denominator * div_h;  // remainder (modulus)
    // upper bits
    res += div_h << 32;
    if (mod_h == 0) {
        return res;
    }
    // remainder of division
    // if the remainder fits into 32 bits, mod_h * 2^32 still fits into 64 bits
    if ((mod_h >> 32) == 0) {
        res += (mod_h << 32) / denominator;
        return res;
    }
    // if we reach this point we have a full 64 bit remainder, i.e. a 96 bit dividend;
    // calculate an approximate result by shifting according to the msbs set
    int msb_numerator = msb(mod_h) + 32;
    int msb_denominator = msb(denominator);
    int msb_max = std::max(msb_numerator, msb_denominator);
    int shift = msb_max - 63;
    res += (mod_h << (32 - shift)) / (denominator >> shift);
    return res;
}
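To check the approximation, an exact reference is handy. The sketch below is not from the original post: muldiv_exact128 relies on the unsigned __int128 extension of GCC and Clang (not available in Visual Studio), while muldiv_longdiv is a portable but slower alternative that forms the 96-bit product in two 64-bit words and divides it by restoring binary long division, one bit at a time. Both keep only the low 64 bits of the quotient, matching the "bits above 64 are ignored" convention.

```cpp
#include <cstdint>

// Exact reference using the GCC/Clang unsigned __int128 extension.
inline uint64_t muldiv_exact128(uint64_t number, uint32_t numerator, uint64_t denominator)
{
    const unsigned __int128 product = static_cast<unsigned __int128>(number) * numerator;
    return static_cast<uint64_t>(product / denominator);  // bits above 64 are dropped
}

// Portable exact variant; denominator must be non-zero.
inline uint64_t muldiv_longdiv(uint64_t number, uint32_t numerator, uint64_t denominator)
{
    const uint64_t p_l = (number & 0xFFFFFFFFULL) * numerator;  // low half times numerator
    const uint64_t p_h = (number >> 32) * numerator;            // high half times numerator
    const uint64_t lo = p_l + (p_h << 32);                      // lower 64 bits of the product
    const uint64_t carry = (lo < p_l) ? 1 : 0;                  // overflow of that addition
    const uint64_t hi = (p_h >> 32) + carry;                    // upper 32 bits of the product

    uint64_t rem = 0;  // running remainder, always < denominator
    uint64_t quo = 0;  // quotient modulo 2^64
    for (int i = 95; i >= 0; --i) {
        const uint64_t bit = (i >= 64) ? (hi >> (i - 64)) & 1 : (lo >> i) & 1;
        const bool top = (rem >> 63) != 0;  // shifting rem left would lose this bit
        rem = (rem << 1) | bit;
        quo <<= 1;
        if (top || rem >= denominator) {
            rem -= denominator;  // wraps to the correct value when top was set
            quo |= 1;
        }
    }
    return quo;
}
```

Comparing muldiv against either reference on random inputs shows how far the approximation in its last branch actually strays.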
by axel
2009-10-08. 18:08:11. 412 words, 5032 views. Categories: programming, C++

avoiding virtual overloads with a template replacement

In most C++ compilers, virtual overloads nowadays work just as you'd expect. Let us assume two classes A and B like these:

class A {
public:
   virtual void foo(int i) {
      cout << "A::foo(int " << i << ")" << endl;
   }
   virtual void foo(double d) {
      cout << "A::foo(double " << d << ")" << endl;
   }
};

class B : public A {
public:
   virtual void foo(int i) {
      cout << "B::foo(int " << i << ")" << endl;
   }
   virtual void foo(double d) {
      cout << "B::foo(double " << d << ")" << endl;
   }
};

When calling them, the correct one is identified, since foo(int) and foo(double) get separate vtable entries. A quick test:

void callFoo(A& a) {
   a.foo(1.5);
   a.foo(3);
}
..
A a;	
callFoo(a);
B b;
callFoo(b);
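A self-contained variant of this test makes the two stages visible: overload resolution picks foo(int) or foo(double) at compile time from the static type A&, and only the choice between A:: and B:: is made at run time via the vtable. For testability this sketch assumes the classes append to a log member instead of printing to cout; it also shows the unsigned long case, which does not compile without a cast.

```cpp
#include <sstream>
#include <string>

class A {
public:
    virtual ~A() {}
    virtual void foo(int i)    { log << "A::foo(int " << i << ") "; }
    virtual void foo(double d) { log << "A::foo(double " << d << ") "; }
    std::ostringstream log;  // records calls instead of printing them
};

class B : public A {
public:
    void foo(int i) override    { log << "B::foo(int " << i << ") "; }
    void foo(double d) override { log << "B::foo(double " << d << ") "; }
};

void callFoo(A& a) {
    a.foo(1.5);  // resolves to foo(double), dispatched via the vtable
    a.foo(3);    // resolves to foo(int), dispatched via the vtable
    unsigned long ul = 7;
    // a.foo(ul);  // error: ambiguous, int and double are equally good conversions
    a.foo(static_cast<int>(ul));  // an explicit cast settles it
}
```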

However, things get confusing very easily, for programmer and compiler alike. For example, when calling foo with an unsigned long, the compiler rejects the call as ambiguous, since the conversions to int and to double are equally good, forcing you to cast explicitly or add another overload. That overload then has to be kept consistent over the whole class hierarchy, as does everything you do here. This may not seem too difficult, but it is added risk. When working with some code I last touched 10 years ago, I stumbled more than once. So, to avoid confusion, one of the two mechanisms, overloading or virtual methods, has to go. Let's keep inheritance, since it's the more vital one:

class C  {
public:
   virtual void fooInt(int i) {
      cout << "C::fooInt(" << i << ")" << endl;
   }
   virtual void fooDouble(double d) {
      cout << "C::fooDouble(" << d << ")" << endl;
   }

Now you, as the programmer, choose the variant, not the compiler. A call to fooInt converts any argument to int, and that's that.

So what if you REALLY need the compiler to choose, e.g. when using C in a template? The solution is a type-switch with a template method, as proposed in the C++ FAQ Lite. This is not especially beautiful, but it leaves the above separation intact.

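   template <typename type>
   inline void fooTemplated(type arg);

};

// primary template: only reached for types without a specialization below
template <typename type>
inline void C::fooTemplated(type arg) {
   throw runtime_error("fooTemplated called with unforeseen type");
}

// full specializations at namespace scope
template <>
inline void C::fooTemplated<int>(int arg) {
   cout << "templated ";
   fooInt(arg);
}

template <>
inline void C::fooTemplated<double>(double arg) {
   cout << "templated ";
   fooDouble(arg);
}

Note that the full specializations are placed outside the class: Visual C++ accepts explicit specializations inside the class body, but older C++ standards forbid them and GCC rejects them. They also have to be declared inline when the class lives in a header, since, unlike the primary template, a full specialization is an ordinary function and would otherwise be defined in every translation unit that includes it.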
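With a C++17 compiler, the same type-switch can be resolved entirely at compile time, so an unforeseen type becomes a compile error instead of a run-time exception. The sketch below is an alternative, not the construct used above; CompileTimeC and dependent_false are hypothetical names, and the class records calls in a log member instead of printing them.

```cpp
#include <sstream>
#include <type_traits>

// Always false, but dependent on T, so the static_assert below only fires
// for a type that actually reaches the final branch.
template <typename T>
inline constexpr bool dependent_false = false;

class CompileTimeC {
public:
    virtual ~CompileTimeC() = default;
    virtual void fooInt(int i)       { log << "fooInt(" << i << ") "; }
    virtual void fooDouble(double d) { log << "fooDouble(" << d << ") "; }

    // type-switch evaluated at compile time; dispatch stays virtual
    template <typename T>
    void fooTemplated(T arg) {
        if constexpr (std::is_same_v<T, int>) {
            fooInt(arg);
        } else if constexpr (std::is_same_v<T, double>) {
            fooDouble(arg);
        } else {
            static_assert(dependent_false<T>, "fooTemplated called with unforeseen type");
        }
    }

    std::ostringstream log;
};
```

A call such as c.fooTemplated(3UL) then fails to compile. This suits the use inside templates, where all argument types are known at compile time anyway.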

I may also remark that I wrote C++ happily for 12 years before introducing this construct. ;-) Nevertheless, beware of clever code. It may come back to haunt you.

by axel
2009-08-17. 22:45:39. 450 words, 4743 views. Categories: programming, C++
