Eric Smith eric at
Wed Jun 13 01:00:51 PDT 2007

Juergen wrote:
> Also, the use of specific uint16_t for all the word sized variables and
> function arguments is certainly adequate documentation-wise, while it
> will cause the x86 to do many type cast conversions like "movzx" or "movsx",
> which are (or at least were) slower than plain 32bit transfers. The
> latter many times map to just register renames in the x86 core.

I wrote:
> I'd rather replace it with uint_fast16_t, then.  I think I'd be inclined
> to do this:
>   typedef uint_fast16_t uword_t;
> There are probably places in the code that will require explicit masking,
> since the uint_fast16_t will most likely compile to 32-bit or 64-bit
> instructions.

I changed to using the typedef above, and it seems to have improved
the performance, but by less than 1%.  I tried using uint32_t, and
that seemed the same.  I haven't benchmarked it enough to get
statistically significant results, and the current benchmark isn't
very good anyhow.

For a gain of less than 1%, I'd rather continue to use uint16_t and not
have to worry about masking.

Note that AMD improved the performance of 16-bit register operations
in one of the revisions of the Athlon.  I assume that this carried
over into the Athlon 64.  I am relatively unlikely to put in
optimizations that are specifically intended to improve performance
on older processors.


More information about the Altogether-devel mailing list