MFC Memory Leak Reporting

While using some DLLs that link against MFC in a non-MFC project, I stumbled on something quite annoying. According to various sources, an odd destructor calls _CrtDumpMemoryLeaks each time MFC is unloaded. In my case, that meant every time a DLL was scanned via LoadLibrary/FreeLibrary. As a result, each and every allocated block of memory was reported as a leak on every call of FreeLibrary. This led to a runtime of several minutes during which Visual Studio was unusable.

Thankfully, further searching turned up a solution by Pieter Op de Beeck, posted to microsoft.public.vc.mfc ten years ago. Go figure.

In case you stumble onto this, you'll probably want the fix right away, so here it is: MemoryLeakDetector.zip

Tags: fix, mfc
by axel
2010-06-29. 13:06:18. 118 words, 3050 views. Categories: programming, C++, MFC

classicthesis with figures using full width

When using classicthesis, most of the defaults look good and exist for a typographical, aesthetic, or arbitrary reason. Some people immediately want to turn on dottedtoc, for instance.

I had a very different concern. The package uses a rather wide margin in order to place nice marginpars throughout the text. That's fine by me; it also allows for a nice, narrow text block without resorting to multiple columns. However, I did not like this for typesetting figures, especially large figures spanning a whole page or its top half. So I set out to change this. After some searching, I came across the neat changepage package, which, most importantly, allows for safe detection of even and odd pages. With this, the implementation was fairly straightforward. I chose to provide a widefigure environment to typeset full-width figures with [p] or [t].

% wider floats by use of changepage. 
% see changepage documentation for details on the underlying quirks
\usepackage[strict]{changepage}  
% calculate size of margin to use
\newlength\totalmargin
\setlength\totalmargin\marginparwidth
\addtolength\totalmargin\marginparsep
% layout using the full margin as well
\newenvironment{layoutfullwidth}{%
\checkoddpage%
\ifoddpage%
\begin{adjustwidth}{0pt}{-\totalmargin}
\else%
\begin{adjustwidth}{-\totalmargin}{0pt}
\fi%
% advance textwidth for use in float,
% e.g. includegraphics[width=\textwidth]
\advance\textwidth\totalmargin
}{%
\end{adjustwidth}
}
% new, wider figure to use where wanted.
% note that the caption will be laid out wide as well.
\newenvironment{widefigure}[1][!tpb]{%
\begin{figure}[#1]%
\begin{layoutfullwidth}%
}{%
\end{layoutfullwidth}%
\end{figure}%
}

This is a straightforward way of doing it. Changing the typesetting of all floats in general requires some really ugly plain TeX hacking. I got that to work out of playful ambition, but decided against using it in order to keep the code clean and understandable.

by axel
2010-02-01. 19:52:10. 286 words, 8605 views. Categories: programming, science, TeX

Setting up NVidia TwinView

Well, it was time to discard my old Xinerama xorg.conf in Kubuntu Jaunty. With a few friendly hints, I finally arrived at a simple procedure. (I had installed the manufacturer's driver a while ago.)

  1. Start X with only one monitor connected, using an empty configuration or one generated by sudo nvidia-xconfig.
  2. Connect the second monitor.
  3. Start nvidia-settings and add the second monitor. Save the configuration.
  4. If the resolution is not sufficient, add Monitor sections for all monitors and set the metamodes to nvidia-auto-select.

So, my xorg.conf now looks like this:

# twinview, both monitors set via section and nvidia-auto-select

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "ServerFlags"
    Option         "Xinerama" "0"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
  Identifier "Scott"
  Modelname "hr17c"
  VertRefresh 60-100
  HorizSync 42-80
  DisplaySize 325 243
  Modeline "1280x1024@75"  135.00  1280 1296 1440 1688  1024 1025 1028 1066 +hsync +vsync
  Option "DPMS"
EndSection

Section "Monitor"
  Identifier "Samsung"
  Modelname "226BW"
  VertRefresh 56-75
  HorizSync 30-80
  Option "DPMS"
  DisplaySize 474 296
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce 7600 GT"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Samsung"
    DefaultDepth    24
    Option         "TwinView" "1"
    Option         "TwinViewXineramaInfoOrder" "DFP-0"
    Option         "metamodes" "DFP: nvidia-auto-select +0+0, CRT: nvidia-auto-select +1680+0"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
Tags: linux, nvidia, x11
by axel
2009-12-02. 03:56:20. 217 words, 2531 views. Categories: linux

Spectral Calculations with OpenMP

Since it is high time to get into multicore programming, and all major compilers support the OpenMP standard, I chose that road for my first endeavor. One particularly lengthy operation in some of my projects is computing various spectrograms. Since an FFT is calculated for every column of pixels independently, the columns can be split straightforwardly among threads according to the number of virtual processors. A couple of tests showed that it is indeed best to split the work at the highest level available, i.e. to hand as much data as possible to each thread at once. I hacked together a slightly unusual use of the for directive to achieve this: each loop iteration processes a whole slice of columns rather than a single one.

// div width by number of virtual processors, min = 128, max = none
int slice = AP_MP::getSlice(width,128,-1); 
int xStart;
#pragma omp parallel for private(xStart)
for (xStart=0;xStart<width;xStart+=slice) {
  int xEnd = xStart+slice;
  if (xEnd>width) xEnd=width;
  calcSlice(..,xStart,xEnd); 
}
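The post does not show AP_MP::getSlice or calcSlice, so the following is only a minimal, self-contained sketch under stated assumptions. The getSlice here is a hypothetical stand-in that divides the width by std::thread::hardware_concurrency() and clamps the result. The calcSlice here is a placeholder that writes one value per column instead of computing an FFT. The sketch shows the slice-per-iteration pattern of the loop above.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for AP_MP::getSlice (not shown in the post): one slice
// per hardware thread, at least minSlice wide and, if maxSlice > 0, at most
// maxSlice wide. A non-positive maxSlice means "no upper limit".
int getSlice(int width, int minSlice, int maxSlice, unsigned threads)
{
    if (threads == 0) threads = 1;       // hardware_concurrency() may report 0
    const int n = static_cast<int>(threads);
    int slice = (width + n - 1) / n;     // ceil(width / n)
    slice = std::max(slice, minSlice);
    if (maxSlice > 0) slice = std::min(slice, maxSlice);
    return std::max(slice, 1);           // never return a zero step
}

int getSlice(int width, int minSlice, int maxSlice)
{
    return getSlice(width, minSlice, maxSlice, std::thread::hardware_concurrency());
}

// Placeholder for the per-column work (an FFT per pixel column in the post):
// here it only writes one value per column in [xStart, xEnd).
void calcSlice(std::vector<float>& out, int xStart, int xEnd)
{
    assert(0 <= xStart && xStart <= xEnd && xEnd <= static_cast<int>(out.size()));
    for (int x = xStart; x < xEnd; ++x)
        out[x] = static_cast<float>(x) * 0.5f;
}

// One loop iteration per slice of columns, mirroring the loop in the post.
void calcAll(std::vector<float>& out)
{
    const int width = static_cast<int>(out.size());
    const int slice = getSlice(width, 128, -1);
    #pragma omp parallel for
    for (int xStart = 0; xStart < width; xStart += slice) {
        const int xEnd = std::min(xStart + slice, width);
        calcSlice(out, xStart, xEnd);
    }
}
```

When built with g++ -fopenmp (or /openmp in Visual Studio), the iterations are distributed across threads. Without that flag, the pragma is ignored and the loop runs serially with the same result. Because every slice writes to a disjoint range of columns, no further synchronization is needed.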

As I had hoped, the calculations were indeed about 1.3 to 1.9 times faster on a dual-core machine, which is quite nice for such limited effort. The following table gives the min/median/max times in ms for a 2.8 s audio file sampled at 22050 Hz with 16 bit. "Spectrum" is a pure magnitude spectrum calculation, "Spectrogram" renders a spectrogram image, and "Auditory Spectrogram" includes a rather lengthy masking calculation. Code compiled with g++ -O3 -ffast-math is up to four times faster for the core routines, but the advantage shrinks for larger code segments; the Auditory Spectrogram is even slightly faster on Windows.

Linux, g++ -O3 -ffast-math (min / median / max in ms)
                        1 thread               2 threads             speedup
Spectrum                10.1 / 10.3 / 12.3     5.4 / 7.0 / 11.9      1.32
Spectrogram             80 / 89 / 95           60 / 67 / 70          1.30
Auditory Spectrogram    4463 / 4670 / 4763     2320 / 2464 / 2725    1.89

Windows, VS2008 (min / median / max in ms)
                        1 thread               2 threads             speedup
Spectrum                31.1 / 31.4 / 71.1     18.0 / 18.4 / 51.4    1.64
Spectrogram             138 / 141 / 188        78 / 82 / 95          1.75
Auditory Spectrogram    4283 / 4300 / 4339     2246 / 2322 / 2647    1.84
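The post does not state how the timings were taken or how the speedup column was computed; the listed speedups do not always match the ratio of the medians. As a hedged sketch, one way to collect min/median/max values over repeated runs is to use std::chrono::steady_clock. The names summarize, timeRuns and speedup are hypothetical, and defining speedup as a ratio of medians is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

struct TimingSummary { double min, median, max; };  // all in milliseconds

// Summarize run times: min, median (mean of the two middle values for an
// even count) and max.
TimingSummary summarize(std::vector<double> ms)
{
    assert(!ms.empty());
    std::sort(ms.begin(), ms.end());
    const std::size_t n = ms.size();
    const double median = (n % 2) ? ms[n / 2] : 0.5 * (ms[n / 2 - 1] + ms[n / 2]);
    return {ms.front(), median, ms.back()};
}

// Run `work` `runs` times and record each wall-clock duration in ms.
std::vector<double> timeRuns(const std::function<void()>& work, int runs)
{
    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return ms;
}

// Speedup as the ratio of medians (one possible definition, assumed here).
double speedup(const TimingSummary& oneThread, const TimingSummary& twoThreads)
{
    return oneThread.median / twoThreads.median;
}
```

Using the median rather than the mean keeps a single slow outlier run, such as the 71.1 ms maximum for Spectrum on Windows, from dominating the comparison.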

Dusting off my R, I was able to generate a nice box plot of the results:

[Figure: box plot of the timing results]

by axel
2009-10-19. 03:32:40. 306 words, 2840 views. Categories: programming, C++

MulDiv64

Well, that turned out to be more effort than expected. At some point, I needed to multiply a 64-bit and a 32-bit value and divide the product by a 64-bit value. The intermediate product may need up to 96 bits, so a plain 64-bit multiplication followed by a division does not work. Given that I'm using a 64-bit CPU, I assumed there would be an instruction or at least a library function for it. But no such luck. So I had to take my dusty bit-manipulation skills out of the closet and write it myself. Here goes.

The calculation is performed by splitting the 64-bit multiplicand into two 32-bit halves, $ number_l $ and $ number_h $. So far, so obvious.

\begin{eqnarray*} result &=& \frac{ number \cdot numerator}{ denominator } \\ &=& \frac{ (number_l + number_h \cdot 2^{32}) \cdot numerator }{ denominator } \quad=\quad \underbrace{\frac{ number_l \cdot numerator }{ denominator }}_{result_l} + \underbrace{\frac{ number_h \cdot 2^{32} \cdot numerator }{ denominator }}_{result_h} \end{eqnarray*}

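Here, $ result_h $ is split again into the integer quotient $ div_h $, scaled by $ 2^{32} $, and the remainder (modulus) $ mod_h $. The remainder may still need more than 32 bits, in which case $ mod_h \cdot 2^{32} $ no longer fits into 64 bits. Then both the dividend and the divisor are scaled down by $ 2^{s} $, where the shift $ s $ is chosen from the MSBs so that the shifted dividend just fits into 64 bits. This makes that part of the result approximate, because the low bits of the divisor are dropped.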

\begin{eqnarray*} result_h &=& \frac{ number_h \cdot 2^{32} \cdot numerator }{ denominator } = \underbrace{\left\lfloor \frac{ number_h \cdot numerator }{ denominator } \right\rfloor}_{div_h} \cdot 2^{32} + \frac{ \overbrace{(number_h \cdot numerator) \:\mathrm{mod}\: denominator}^{mod_h} }{ denominator } \cdot 2^{32} \\ &=& {div_h} \cdot 2^{32} + \frac{ {mod_h} \cdot 2^{32-s} }{ denominator \cdot 2^{-s} } \qquad s = \max\{ \mathrm{MSB}(mod_h)+32 , \mathrm{MSB}(denominator) \} - 63 \end{eqnarray*}

That last part took me a bit longer than anticipated, given that it had in fact been years since I had dealt with individual bits while coding. The MSB determination originates from the astounding Bit Twiddling Hacks by Sean Eron Anderson. So, anyway, enjoy or abhor the following end result:

#include <stdint.h>
#include <algorithm>

/**
 * determine the index of the most significant set bit in O(log log n)
 * (returns 0 for both 0 and 1)
 * @author Sean Eron Anderson
 */
inline unsigned int msb(uint64_t value)
{
    const int MAX_LOGLOG = 6;
    const uint64_t BIT_LL[MAX_LOGLOG] = {0x2ULL, 0xCULL, 0xF0ULL, 0xFF00ULL, 0xFFFF0000ULL, 0xFFFFFFFF00000000ULL};
    const unsigned int EXP_LL[MAX_LOGLOG] = {1, 2, 4, 8, 16, 32};
    unsigned int r = 0;
    for (int i = MAX_LOGLOG-1; i >= 0; i--) {
        if (value & BIT_LL[i]) {
            value >>= EXP_LL[i];
            r |= EXP_LL[i];
        }
    }
    return r;
}

/**
 * multiply a 64 and a 32 bit value and divide them by a 64 bit value.
 * result bits above 64 are ignored, so overflow flags are not set.
 * @author Axel Plinge
 * @param number       64 bit multiplicand
 * @param numerator    32 bit multiplicand
 * @param denominator  64 bit divisor (must not be 0)
 * @return  (number*numerator)/denominator (+/-) 1
 */
inline uint64_t muldiv(uint64_t number, uint32_t numerator, uint64_t denominator)
{
    // split the 64 bit multiplicand into its 32 bit halves with shifts;
    // casting &number to a struct of two uint32_t would depend on byte
    // order and violate strict aliasing
    uint64_t num_h = number >> 32;
    uint64_t num_l = number & 0xFFFFFFFFULL;
    uint64_t mul = numerator;
    uint64_t res;
    // lower 32 bit portions yield a 64 bit product
    // that can be divided directly, giving 64 bits of result
    res = (num_l * mul) / denominator;
    // upper 32 bits have to be shifted; calculate quotient and modulus
    uint64_t product_h = num_h * mul;
    uint64_t div_h = product_h / denominator;          // main part of the division
    uint64_t mod_h = product_h - denominator * div_h;  // modulus
    // upper bits
    res += div_h << 32;
    if (mod_h == 0) {
        return res;
    }
    // remainder of the division:
    // if msb(modulus) < 32, shifting it left by 32 cannot overflow
    if ((mod_h >> 32) == 0) {
        res += (mod_h << 32) / denominator;
        return res;
    }
    // if we reach this point we have full 64 bit values, i.e. a 96 bit dividend;
    // calculate an approximate result by shifting according to the msb set
    int msb_dividend = msb(mod_h) + 32;
    int msb_divisor = msb(denominator);
    int msb_max = std::max(msb_dividend, msb_divisor);
    int shift = msb_max - 63;
    res += (mod_h << (32 - shift)) / (denominator >> shift);
    return res;
}
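If an exact result is needed, a slower but portable alternative is restoring binary long division over the full 96-bit product, using only 64-bit arithmetic. This is a different technique from the shift approximation above and is shown only as a sketch. The name muldiv_longdiv is hypothetical. Like muldiv, it ignores result bits above 64, and it needs 128 loop iterations, so it is much slower. On GCC and Clang for 64-bit targets, unsigned __int128 offers an even simpler exact reference, but Visual Studio 2008 has no such type.

```cpp
#include <cassert>
#include <cstdint>

// Exact (number * numerator) / denominator, returning the low 64 bits of the
// quotient, via shift-and-subtract (restoring) long division.
uint64_t muldiv_longdiv(uint64_t number, uint32_t numerator, uint64_t denominator)
{
    assert(denominator != 0);
    // 64x32 -> 96 bit product as (hi, lo), from the two 32 bit halves
    const uint64_t p_l = (number & 0xFFFFFFFFULL) * numerator;  // < 2^64
    const uint64_t p_h = (number >> 32) * numerator;            // weight 2^32
    const uint64_t lo = p_l + (p_h << 32);
    const uint64_t hi = (p_h >> 32) + (lo < p_l ? 1 : 0);       // carry from lo

    uint64_t rem = 0;   // invariant: rem < denominator
    uint64_t quot = 0;  // quotient bits above 64 are shifted out
    for (int i = 127; i >= 0; --i) {
        const uint64_t bit = (i >= 64) ? (hi >> (i - 64)) & 1 : (lo >> i) & 1;
        const bool overflow = (rem >> 63) != 0;  // rem << 1 would drop the top bit
        rem = (rem << 1) | bit;
        quot <<= 1;
        // if the shifted-out top bit was set, the true remainder is >= 2^64 and
        // therefore >= denominator; the wrapped subtraction is still exact
        if (overflow || rem >= denominator) {
            rem -= denominator;
            quot |= 1;
        }
    }
    return quot;
}
```

Such a function can also serve as a test oracle for muldiv, checking how far the fast approximation deviates on random inputs.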
by axel
2009-10-08. 18:08:11. 412 words, 5066 views. Categories: programming, C++
