Juergen Buchmueller pullmoll at
Sat Jun 2 10:50:54 PDT 2007

Hi list,

my name is Juergen Buchmueller. I just subscribed to this list after I got
Eric's preliminary source compiled and running on my machine. Wow!

It took me some time to write my own Makefile, as I don't know anything
about SCons, the build tool that uses SConscript files. I also ripped out the
zip file loading in cpu.c in favour of simple functions that load plain files
found on a search path. After all, we don't expect huge archives of microcode,
or do we? Even the hard disk images would fit in an average mail attachment
nowadays ;-)
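A file-on-path loader of the kind described above might look like the minimal sketch below. The name load_file_on_path, the colon-separated search path and the caller-supplied buffer are assumptions for illustration; this is not the code from cpu.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try each directory of a colon-separated search
 * path in turn and read the first file called `name` that can be opened.
 * An empty path element means the current directory. Reads at most
 * bufsize bytes; returns the byte count, or -1 if the file is not found. */
long load_file_on_path(const char *path, const char *name,
                       unsigned char *buf, size_t bufsize)
{
    const char *p = path;

    for (;;) {
        size_t len = strcspn(p, ":");   /* length of this path element */
        char full[1024];
        int n = snprintf(full, sizeof full, "%.*s%s%s",
                         (int)len, p, len ? "/" : "", name);
        if (n > 0 && (size_t)n < sizeof full) {
            FILE *f = fopen(full, "rb");
            if (f != NULL) {
                size_t got = fread(buf, 1, bufsize, f);
                fclose(f);
                return (long)got;
            }
        }
        if (p[len] == '\0')
            break;                      /* no more path elements */
        p += len + 1;                   /* skip the ':' separator */
    }
    return -1;                          /* not found anywhere on the path */
}
```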

I've been looking at the Alto documentation for some time already, and I was
overwhelmed several times by the complexity and the many dependencies of that
design. The ancient notation can sometimes be confusing as well, e.g. the
"reverse" bit numbering, "xxxB" for octal, &C. &C.
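The two notational quirks can be made concrete with a small sketch. It assumes 16-bit words in which bit 0 is the most significant bit and bit 15 the least significant, and that a trailing "B" marks an octal constant (so 177777B is 0xFFFF). The helper names alto_bit, alto_field and parse_octal_b are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Mask for bit n of a 16-bit word in the "reverse" numbering (bit 0 = MSB). */
static inline uint16_t alto_bit(int n)
{
    return (uint16_t)(0x8000u >> n);
}

/* Extract the field spanning bits first..last (inclusive, first <= last)
 * in the same numbering; alto_field(w, 0, 3) yields the top nibble. */
static inline unsigned alto_field(uint16_t w, int first, int last)
{
    int width = last - first + 1;
    return (w >> (15 - last)) & ((1u << width) - 1u);
}

/* Parse a constant in "xxxB" octal notation, e.g. "177777B".
 * Returns -1 unless the string is octal digits followed by one 'B'.
 * (No overflow check; meant for word-sized constants.) */
long parse_octal_b(const char *s)
{
    size_t len = strlen(s);
    if (len < 2 || s[len - 1] != 'B')
        return -1;
    long value = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (s[i] < '0' || s[i] > '7')
            return -1;
        value = value * 8 + (s[i] - '0');
    }
    return value;
}
```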

After looking at the trace output of Eric's code, some things are now much
more obvious to me, such as the intended order of operations within a
microcode word. I think the most important detail to grasp is the
separation into rising and falling clock phases, i.e. the early and late
stages of a cycle.
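The early/late split can be illustrated with a deliberately simplified sketch of one microcycle. It is not the Alto's real data path: the register file, the ALU function codes and the names cpu_state, uinst and micro_step are hypothetical. It only shows the principle that the early phase computes everything from the old machine state, and the late phase writes results back afterwards.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified machine state. */
typedef struct {
    uint16_t r[32];   /* register file */
    uint16_t t;       /* T register */
    uint16_t l;       /* latched ALU output */
} cpu_state;

/* Hypothetical decoded microinstruction fields. */
typedef struct {
    int rsel;         /* which R register drives the bus */
    int aluf;         /* 0: bus, 1: bus + T, 2: bus & T */
    int load_t;       /* load T from the ALU result */
    int load_r;       /* load R[rsel] from the ALU result */
} uinst;

void micro_step(cpu_state *s, const uinst *u)
{
    /* Early phase: drive the bus and compute the ALU result,
     * reading only the state left by the previous cycle. */
    uint16_t bus = s->r[u->rsel];
    uint16_t alu;
    switch (u->aluf) {
    case 1:  alu = (uint16_t)(bus + s->t); break;
    case 2:  alu = bus & s->t;             break;
    default: alu = bus;                    break;
    }

    /* Late phase: commit results; none of the reads above see them. */
    s->l = alu;
    if (u->load_t)
        s->t = alu;
    if (u->load_r)
        s->r[u->rsel] = alu;
}
```

Because T and R[rsel] are both read before either is written, an instruction such as "R3 and T both receive R3 + T" behaves the same regardless of the order of the two stores in the late phase.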

Eric's work is definitely most impressive and a good start for anyone who
wants to tackle the task of emulating the Xerox Alto! This is not for the
fainthearted, which I believe I can judge, having written a couple of CPU
core emulations for MAME.

It seems to me that emulating the microcode at this level of detail will
require a fast host CPU to reach acceptable speed. With my (already patched
[*]) build I get a benchmark time of 8.174892340 for 10 seconds, running:
	./altogether -d -nw
	> benchmark 10
So, if I'm not totally wrong here, this means that I'm just a tad below
"real time" speed, and that is even without updating the display bitmap.

FYI: My box is an AMD Athlon at 1.4 GHz running NetBSD 2.1.

I'll take some more time to delve into the source and see if I can
contribute code or at least suggestions.

With kind regards,

[*] I patched the source by stripping out the ucode_t typedef, the
"ucode_t *current_uinst;" declaration and all the dereferences of that
pointer, replacing the struct pointer with a bunch of parallel arrays
	int ucode_rsel[UCODE_SIZE];
	int ucode_aluf[UCODE_SIZE];
	int ucode_next[UCODE_SIZE];
which are all indexed by the same global current_upc. I think indexing
static arrays with current_upc may be faster than dereferencing members
through a struct pointer, whether that pointer is global or passed into
functions. I may be wrong here, though.
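The two layouts being compared might look like the minimal sketch below. Apart from ucode_t, current_uinst, UCODE_SIZE, ucode_rsel, ucode_aluf, ucode_next and current_upc, which appear in the description above, everything is an assumption: the struct field names, the UCODE_SIZE value of 1024, and the helpers ucode_store, step_struct and step_arrays. Which form is faster depends on the compiler and cache behaviour, so it is worth measuring with the benchmark command rather than assuming.

```c
#define UCODE_SIZE 1024     /* assumed size of the microcode store */

/* Layout 1: array of structs, accessed through a pointer. */
typedef struct {
    int rsel;
    int aluf;
    int next;
} ucode_t;

static ucode_t ucode[UCODE_SIZE];
static ucode_t *current_uinst;

/* Layout 2: parallel arrays, all indexed by the same micro-PC. */
static int ucode_rsel[UCODE_SIZE];
static int ucode_aluf[UCODE_SIZE];
static int ucode_next[UCODE_SIZE];
static int current_upc;

/* Store one decoded microinstruction in both layouts. */
void ucode_store(int upc, int rsel, int aluf, int next)
{
    ucode[upc].rsel = rsel;
    ucode[upc].aluf = aluf;
    ucode[upc].next = next;
    ucode_rsel[upc] = rsel;
    ucode_aluf[upc] = aluf;
    ucode_next[upc] = next;
}

/* Follow the NEXT field once via the struct pointer. */
int step_struct(void)
{
    int next = current_uinst->next;
    current_uinst = &ucode[next];
    return next;
}

/* Follow the NEXT field once via the parallel arrays. */
int step_arrays(void)
{
    current_upc = ucode_next[current_upc];
    return current_upc;
}
```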

Also, using uint16_t for all the word-sized variables and function
arguments is certainly good documentation, but it makes the compiler emit
many zero- or sign-extending moves like "movzx" or "movsx" on x86, which
are (or at least were) slower than plain 32-bit moves. The latter often map
to mere register renames inside the x86 core.
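One way to keep the documentation value of the 16-bit type while letting the compiler work at native width is sketched below: store Alto words as uint16_t, but do the arithmetic in unsigned int and mask to 16 bits only where the word size matters, such as the carry out of an add. The helper names alu_add16 and add_to_word are hypothetical, the code assumes a 32-bit unsigned int, and whether it actually avoids movzx/movsx depends on the compiler and optimisation level, so the generated assembly should be checked.

```c
#include <stdint.h>

/* Hypothetical 16-bit add with carry-out, computed in native width.
 * a and b are expected to hold 16-bit values (0..0xFFFF); with a
 * 32-bit unsigned int the sum (at most 0x1FFFE) cannot overflow. */
static inline unsigned alu_add16(unsigned a, unsigned b, unsigned *carry)
{
    unsigned sum = a + b;
    *carry = (sum >> 16) & 1u;     /* bit 16 is the carry out */
    return sum & 0xFFFFu;          /* truncate to the Alto word size */
}

/* Words can still be stored compactly as uint16_t; the truncation
 * happens once, at the store. */
void add_to_word(uint16_t *dst, uint16_t src, unsigned *carry)
{
    *dst = (uint16_t)alu_add16(*dst, src, carry);
}
```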

I'll play around with the source and see if I can get better benchmarks by
changing some things, and if I do, I'll let you know.

More information about the Altogether-devel mailing list