eric at brouhaha.com
Wed Jun 13 01:00:51 PDT 2007
> Also, the use of specific uint16_t for all the word sized variables and
> function arguments is certainly adequate documentation-wise, but it
> causes the x86 to do many type-cast conversions like "movzx" or "movsx",
> which are (or at least were) slower than plain 32-bit transfers. The
> latter often map to just register renames in the x86 core.
> I'd rather replace it with uint_fast16_t, then. I think I'd be inclined
> to do this:
> typedef uint_fast16_t uword_t;
> There are probably places in the code that will require explicit masking,
> since the uint_fast16_t will most likely compile to 32-bit or 64-bit
I changed to using the typedef above, and it seems to have improved
the performance, but by less than 1%. I tried using uint32_t, and
that seemed the same. I haven't benchmarked it enough to get
statistically significant results, and the current benchmark isn't
very good anyhow.
For less than 1%, I'd rather continue to use uint16_t, and not have to
worry about masking.
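
To illustrate the masking concern, here is a minimal sketch in C. Only the
uword_t typedef comes from this thread; add16_exact and add16_fast are
hypothetical helper names, and the 0xFFFF mask assumes the emulated words are
exactly 16 bits wide.

```c
#include <stdint.h>

/* The typedef proposed in the thread: at least 16 bits, possibly 32 or 64. */
typedef uint_fast16_t uword_t;

/* Hypothetical helper: with uint16_t, converting the result back to
 * 16 bits discards any carry, so wraparound needs no explicit mask. */
static uint16_t add16_exact(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Hypothetical helper: uword_t may be wider than 16 bits, so a carry
 * out of bit 15 would survive unless it is masked off explicitly. */
static uword_t add16_fast(uword_t a, uword_t b)
{
    return (uword_t)((a + b) & 0xFFFFu);
}
```

Both helpers return 0 for 0xFFFF + 1. Dropping the mask in add16_fast would
give 0x10000 on any platform where uint_fast16_t is wider than 16 bits, which
is the kind of bug the uint16_t version avoids by construction.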
Note that AMD improved the performance of 16-bit register operations
in one of the revisions of the Athlon. I assume that this carried
over into the Athlon 64. I am relatively unlikely to put in
optimizations that are specifically intended to improve performance
on older processors.