The Nicolet 1280 needs to be ten times faster

Nicolet identified an opportunity in 1980: real-time infrared spectral analysis. The spectrometer could produce spectra 10 times faster than the Nicolet 1280 could process them. The bottleneck was the time needed to perform Fourier analysis. Significant resources were committed because future FTIR business hinged on solving this problem. A highly qualified team of 6 hardware and software engineers was assembled. They had their own building (which happened to be green) and they had no other responsibilities. Robert Schumann was the team leader. Naturally, they were dubbed "Gang Green." I was not in the group and I played no part in the decision process.

Schumann's group set out to design a floating point computer implemented in TTL. A year and a half later the project was shut down by Donald Haselhorst, who was CEO at the time. So Schumann left Nicolet and founded Dane Scientific. Gang Green stayed intact. They moved to a facility in the suburbs of western Madison. They had trouble raising money, but they continued working on their computer. The Dane Scientific computer project morphed into something massive: a replacement for a Cray supercomputer. The intention was not to equal the performance of the Cray, but rather to get 10 percent of the performance at 5 percent of the price. Other companies were attempting to do the same thing. It was called the Crayette business. The Cray-1 weighed 5 tons, consumed 250,000 watts, was cooled by immersion in freon, ran at 80 megahertz and cost 5 million dollars. It was the fastest computer in the world at the time. It was manufactured in Chippewa Falls, Wisconsin. It was successful. 80 were sold. It was named after Seymour Cray, the only computer designer to achieve celebrity status, and not just among geeks.

Dane Scientific was eventually sold to Astronautics of Milwaukee. The Crayette was completed about 1987, but none were sold. Astronautics is still around.

Gang Green's original goal was to design a machine that would compute Fourier transforms for the infrared spectrometer and its computer. The Crayette went well beyond the original goal. I think that is the reason why Haselhorst shut down the project.

The one minute conversation that sealed my fate
How the Nicolet 1280 got a 24 bit Fourier array processor

I had a short conversation with Dick Ferrier in 1980. He said: "Schumann's group is going to produce something too expensive to be useful. What is needed is a board that plugs into the 1280 motherboard. The digitizer would DMA into it, thus eliminating the transfer time. Then the board would do an FFT on its own." The clarity and the simplicity of the idea hit me right between the eyes. I said I would try to do it. I designed it and got it running in a year. It was a one man project.

How does the array processor work?

The array processor is a Harvard architecture computer. Instructions reside in a small, fast, 20 bit memory. Data reside in a large, slow, 24 bit memory. Both memories run concurrently. It was a partial attempt at "breaking the Von Neumann bottleneck." The array processor directly executes the Cooley-Tukey algorithm in hardware. It was not a general purpose computer. A 64k real FFT could be performed in 4 seconds. It was pipelined and programming it was tricky. It was the hardest thing I ever did. There was no shirt pocket instruction set summary. It was too complicated for that. The software manual had 110 pages.
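For readers who have not seen the Cooley-Tukey algorithm written out, here is a minimal software sketch of the radix-2 form in Python. It is only an illustration of the algorithm's structure (a bit-reversal shuffle followed by log2(n) passes of n/2 butterflies each). It uses floating point complex numbers rather than 24 bit fixed point, and the function name `fft_radix2` is made up for this example; it is not the array processor's microcode.

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 Cooley-Tukey FFT. Length must be a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    a = [complex(v) for v in x]

    # Bit-reversal permutation puts the inputs in butterfly order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) passes, each made of n/2 butterflies.
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)  # twiddle increment for this pass
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                t = w * a[start + k + half]      # the complex multiply
                u = a[start + k]
                a[start + k] = u + t             # butterfly sum
                a[start + k + half] = u - t      # butterfly difference
                w *= step
        size *= 2
    return a
```

Calling `fft_radix2([1, 2, 3, 4])` returns values close to `[10, -2+2j, -2, -2-2j]`, up to floating point rounding.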

The array processor was implemented in TTL and it fit on one 10 inch by 15 inch printed circuit board that plugged into the 1280 motherboard.

If you look at the flow diagram to the right, you will see big and little butterflies. These are complex multiplies. It takes 4 multiplies and 6 adds to execute a butterfly. The array processor executes a butterfly as a single instruction. The butterfly operates on 6 numbers (two complex data points and one complex coefficient) and produces 4 numbers (two complex results). So, 6 numbers are read from data memory and 4 are written to data memory every time the butterfly executes. One butterfly execution and 10 memory cycles take about the same time. The 10 memory cycles required for the next butterfly occur while the current butterfly executes. This is called pipelining.
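The operation count can be checked by writing one butterfly out in real arithmetic. The sketch below assumes the common decimation-in-time form (multiply one input by the coefficient, then take the sum and difference); whether the hardware used this exact form is an assumption, and the function and argument names are invented for the example.

```python
def butterfly(ar, ai, br, bi, wr, wi):
    """One radix-2 butterfly on real and imaginary parts.

    Reads 6 numbers (a, b and the coefficient w) and returns 4.
    """
    # Complex product t = w * b: 4 multiplies and 2 adds.
    tr = wr * br - wi * bi
    ti = wr * bi + wi * br
    # Complex sum and difference: 4 more adds, 6 in total.
    return ar + tr, ai + ti, ar - tr, ai - ti
```

For example, `butterfly(1, 2, 3, 4, 0, 1)` multiplies 3+4i by i to get -4+3i, then returns `(-3, 5, 5, -1)`, which is the pair (-3+5i, 5-i).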

Data reading and writing, address calculation, instruction fetching, and arithmetic all happened concurrently.

Accuracy was a design priority. The goal was to get as much accuracy as possible out of a 24 bit, fixed point machine. Arithmetic overflows were detected after they occurred. When one was detected, the array was divided by 2 and the butterfly pipeline was reloaded before the array operation continued. You did not need to scan an array for potential overflows. You did not need to scale an array to prevent overflows. This saved time and improved accuracy. Rounding was also done very carefully.
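The detect-then-rescale idea can be sketched in software. The Python below is a simplified illustration, not the hardware's logic: it uses add/subtract butterflies with the coefficient fixed at 1, it halves with a plain arithmetic shift (the real rounding was more careful), and the names `LIMIT`, `fits` and `add_sub_pass` are hypothetical. The array length is assumed to be a multiple of `2 * half`.

```python
LIMIT = 1 << 23  # 24 bit two's complement holds -2**23 .. 2**23 - 1

def fits(v):
    """True if v is representable as a 24 bit signed integer."""
    return -LIMIT <= v < LIMIT

def add_sub_pass(data, half):
    """Run one pass of simplified butterflies in place.

    Returns how many times the whole array was halved during the pass.
    """
    shifts = 0
    for start in range(0, len(data), 2 * half):
        for k in range(start, start + half):
            while True:
                s = data[k] + data[k + half]
                d = data[k] - data[k + half]
                if fits(s) and fits(d):
                    break
                # Overflow noticed after the fact: halve the whole array,
                # then redo this butterfly with the rescaled inputs.
                for i in range(len(data)):
                    data[i] >>= 1
                shifts += 1
            data[k], data[k + half] = s, d
    return shifts
```

Because every element is scaled by the same factor, the butterflies already completed in the pass stay consistent with the ones still to come. The caller only has to remember the total number of halvings as a shared exponent for the array.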

Warp mode

The array processor was said to be in warp mode when it disconnected itself from 1280 memory. This had the obvious benefit of allowing concurrent operation of the 1280 and the array processor. But the concept has a less obvious advantage. It made it possible for two array processors to work together efficiently. The first acquired data or dumped processed data via DMA. The second performed Fourier analysis while in warp mode. Then they would switch places.
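The alternation between two array processors is what is usually called ping-pong or double buffering. The Python sketch below only lays out who does what in each time step; the unit names `AP0` and `AP1` and the function `ping_pong_schedule` are invented for the illustration, and it assumes that acquiring a block and transforming a block take about the same time.

```python
def ping_pong_schedule(num_blocks):
    """List (step, dma_unit, block_acquired, warp_unit, block_transformed).

    The unit doing DMA is attached to 1280 memory; the other unit is in
    warp mode, transforming the block it acquired in the previous step.
    None marks an idle slot at the start or end of the run.
    """
    units = ["AP0", "AP1"]
    schedule = []
    for step in range(num_blocks + 1):
        dma_unit = units[step % 2]
        warp_unit = units[(step + 1) % 2]
        block_in = step if step < num_blocks else None
        block_fft = step - 1 if step >= 1 else None
        schedule.append((step, dma_unit, block_in, warp_unit, block_fft))
    return schedule
```

With two blocks, `ping_pong_schedule(2)` gives three steps: AP0 acquires block 0, then AP1 acquires block 1 while AP0 transforms block 0, then AP1 transforms block 1. In the steady state neither DMA nor Fourier analysis ever waits for the other.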

The warp mode idea was fully exploited in a system that had 2 Nicolet 1280s and 6 array processors. The system had the computational power of a Crayette. That system is described in a patent which may be viewed here or online.

I was able to hire engineers and programmers only after the prototype array processor was up and running. They were Boyd Bain, Chris Barnett and Jei Chow. They did a fantastic job.

Nicolet's competitors could not match the speed of the array processor. It could be manufactured for $1,000 but it sold for $10,000.