Nicolet identified an opportunity in 1980: real-time infrared spectral analysis. The
spectrometer could produce spectra 10 times faster than the Nicolet 1280 could process them. The
bottleneck was the time to perform Fourier analysis. Significant resources were committed
because future FTIR business hinged on solving this problem. A
highly qualified team of 6 hardware and software engineers was assembled. They had their
own building (which happened to be green) and they had no other responsibilities. Robert
Schumann was the team leader. Naturally, they were dubbed "Gang Green." I was not in
the group and I played no part in the decision process.
Schumann's group set out to design a floating point computer implemented in TTL. A year
and a half later the project was shut down by Donald Haselhorst who was CEO at the time.
So Schumann left Nicolet and founded Dane Scientific. Gang Green stayed intact. They moved
to a facility in the suburbs of western Madison. They had trouble raising money, but they
continued working on their computer. The Dane Scientific computer project morphed into
something massive: a replacement for a Cray supercomputer. The intention was not to equal
the performance of the Cray, but rather to get 10 percent of the performance at 5 percent
of the price. Other companies were attempting to do the same thing. It was called the
Crayette business. The Cray-1 weighed 5 tons, consumed 250,000 watts, was cooled by
Freon refrigeration, ran at 80 megahertz and cost 5 million dollars. It was the fastest
computer in the world at the time. It was manufactured in Chippewa Falls, Wisconsin. It
was successful. 80 were sold. It was named after Seymour Cray—the only computer
designer to achieve celebrity status and not just among geeks.
Dane Scientific was eventually sold to Astronautics of
Milwaukee. The Crayette was completed about 1987, but none were sold. Astronautics is still
around.
Gang Green's original goal was to design a machine that would compute Fourier transforms for the infrared spectrometer and its computer. The Crayette went well beyond the original goal. I think that is the reason why Haselhorst shut down the project.
I had a short conversation with Dick Ferrier in 1980. He said: "Schumann's group is going to produce something too expensive to be useful. What is needed is a board that plugs into the 1280 motherboard. The digitizer would DMA into it, thus eliminating the transfer time. Then the board would do an FFT on its own." The clarity and the simplicity of the idea hit me right between the eyes. I said I would try to do it. I designed it and got it running in a year. It was a one man project.
The array processor was a Harvard architecture computer. Instructions resided in a
small, fast, 20 bit memory. Data resided in a large, slow, 24 bit memory. Both memories ran
concurrently. It was a partial attempt at "breaking the von Neumann
bottleneck." The array processor directly executed the Cooley-Tukey algorithm
in hardware. It was not a general purpose computer. A 64k real FFT could be performed in 4 seconds.
It was pipelined and programming it was tricky. It was the hardest thing I ever did.
There was no shirt pocket
instruction set summary. It was too complicated for that. The software manual had 110
pages.
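The Cooley-Tukey algorithm that the board executed in hardware can be sketched in software. Here is an illustrative Python rendition of the radix-2 decimation-in-time form; this is my sketch of the algorithm itself, not the board's microcode:

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time Cooley-Tukey FFT.
    len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform the even-indexed samples
    odd = fft(x[1::2])    # transform the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        t = w * odd[k]                         # the butterfly's complex multiply
        out[k] = even[k] + t                   # top output of the butterfly
        out[k + n // 2] = even[k] - t          # bottom output of the butterfly
    return out
```

The inner loop is the butterfly described below; the hardware walked the same structure iteratively over data memory rather than by recursion.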
The array processor was implemented in TTL and it fit on one 10 inch by 15 inch printed
circuit board that plugged into the 1280 motherboard.
If you look at the flow diagram to the right, you will see big and little butterflies.
Each butterfly is built around a complex multiply. It takes 4 multiplies and 6 adds to execute a butterfly. The
array processor executes a butterfly as a single instruction. The butterfly operates on 6
numbers and produces 4 numbers. So, 6 numbers are read from data memory and 4 are written
to data memory every time the butterfly executes. A butterfly execution and 10 memory
cycles take about the same time. The 10 memory cycles required for the next butterfly
occur while the current butterfly executes. This is called pipelining.
Data reading and writing, address calculation, instruction fetching, and arithmetic all
happened concurrently.
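The arithmetic of one butterfly can be written out explicitly. A minimal Python sketch, showing where the 4 multiplies and 6 adds come from (the function name and argument layout are mine, not the board's):

```python
def butterfly(ar, ai, br, bi, wr, wi):
    """One radix-2 butterfly on plain numbers.
    Six inputs: a = ar + i*ai, b = br + i*bi, twiddle w = wr + i*wi.
    Four outputs: a + w*b and a - w*b, as real/imaginary pairs.
    Exactly 4 multiplies and 6 adds/subtracts."""
    tr = br * wr - bi * wi            # 2 multiplies, 1 subtract
    ti = br * wi + bi * wr            # 2 multiplies, 1 add
    return (ar + tr, ai + ti,         # 2 adds
            ar - tr, ai - ti)         # 2 subtracts
```

The 6 inputs and 4 outputs here account for the 10 data-memory cycles per butterfly mentioned above.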
Accuracy was a design priority. The goal was to get as much accuracy as possible out of a
24 bit, fixed point machine. Arithmetic overflows were detected after they
occurred. The array was divided by 2 and the butterfly pipeline was reloaded before
continuing an array operation. You did not need to scan an array for potential overflows.
You did not need to scale an array to prevent overflows. This saved time and improved
accuracy. Rounding was also done very carefully.
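The detect-then-rescale scheme resembles what is now called block floating point. A rough Python sketch, assuming a hypothetical 24 bit signed range and a stage function standing in for one pass of the pipeline:

```python
WORD_BITS = 24
MAX_VAL = (1 << (WORD_BITS - 1)) - 1      # +8388607
MIN_VAL = -(1 << (WORD_BITS - 1))         # -8388608

def stage_with_rescue(array, stage):
    """Run one FFT stage; if any result overflows the 24 bit range,
    halve the whole array and redo the stage. Overflow is detected
    after the fact, so no pre-scan or defensive pre-scaling is needed.
    Returns the stage output and the number of halvings applied."""
    shifts = 0
    while True:
        result = stage(array)
        if all(MIN_VAL <= v <= MAX_VAL for v in result):
            return result, shifts
        array = [v >> 1 for v in array]   # divide the entire array by 2
        shifts += 1
```

Keeping a count of the halvings preserves the overall scale of the spectrum; scaling only when an overflow actually occurs keeps more significant bits than scaling every stage defensively.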
The array processor was said to be in warp mode when it disconnected itself from 1280
memory. This had the obvious benefit of allowing concurrent operation of the 1280 and the
array processor. But the concept has a less obvious advantage. It made it possible for two
array processors to work together efficiently. The first acquired data or dumped processed
data via DMA. The second performed Fourier analysis while in warp mode. Then they would
switch places.
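The two-processor handoff is a form of double buffering. A toy Python model of the role swap (the unit names and the sequential loop are mine; the real units ran concurrently):

```python
def ping_pong(scans, transform):
    """Toy model of the two-array-processor scheme: one unit handles
    I/O (DMA) while the other computes in warp mode, then they swap.
    Returns (unit, transformed scan) pairs showing the alternation."""
    io_unit, compute_unit = "AP-1", "AP-2"
    results = []
    pending = None                    # scan held by the compute unit
    for scan in scans:
        if pending is not None:
            # compute_unit transforms while io_unit acquires the next scan
            results.append((compute_unit, transform(pending)))
        pending = scan
        io_unit, compute_unit = compute_unit, io_unit   # switch places
    if pending is not None:
        results.append((compute_unit, transform(pending)))
    return results
```

In steady state neither unit is ever idle: one is always transferring while the other transforms.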
The warp mode idea was fully exploited in a system that had 2 Nicolet 1280s and 6 array
processors. The system had the computational power of a Crayette. That system is described in a patent
which may be viewed online: http://www.freepatentsonline.com/EP0350209.pdf
I was able to hire engineers and programmers only after the prototype array processor was
up and running. They were Boyd Bain, Chris Barnett and Jei Chow. They did a fantastic
job.
Nicolet's competitors could not match the speed of the array processor. It could be
manufactured for $1000 but it sold for $10,000.