Analysis of massively distributed applications (W.M. Zuberek)

The SETI@home project, managed by a group of researchers at the Space Science Laboratory of the University of California at Berkeley, is arguably the most popular attempt to use large-scale distributed computing to perform a sensitive search for radio signals from extraterrestrial civilizations.

SETI@home uses a dedicated L band receiver at the National Astronomy and Ionospheric Center's 305-meter radiotelescope in Arecibo, Puerto Rico. The receiver provides a beam of 0.1 degree width which is digitized and converted to a 2.5 MHz-wide band centered at the 1,420 MHz hydrogen line. It is believed that this frequency is one of the most likely locations for deliberate extraterrestrial transmissions. Digitized data are recorded continuously on magnetic tapes, along with data on telescope coordinates, time and some additional information. A single tape (35 gigabytes) records 17 hours of observations. Tapes are mailed to Berkeley for analysis on a weekly basis. The complete survey requires 1,100 tapes to record the total of 39 terabytes of data. Observations began in October 1998.

At Berkeley, data are divided into small "work units" by first splitting the 2.5 MHz bandwidth into 256 sub-bands (by means of a 2048 point FFT and 256 eight point inverse transforms), an then dividing each 9,765 Hz sub-band into units of 220 samples (which corresponds to 107 seconds of recorded data). Subsequent work units overlap by 20 to 30 seconds to allow full analysis of signals that might be on the boundary of units.

Work units are sent over the Internet to the client programs around the world for the bulk of the data analysis. A database of work units, their processing status, and returned results is maintained in Berkeley.

SETI@home distributes client software for about 50 different combinations of processor and its operating system. For Microsoft Windows and Apple Macintosh computers, the software installs itself as a screen saver, processing data only when the computer is not used. For other platforms, it runs in the background and becomes inactive whenever the user executes his jobs.

The client software, after receiving a work unit, performs a baseline smoothing to remove any wideband features (this prevents the client from confusing fluctuations in broadband noise with intelligent signals), and then searches the data for signals with drift rates between -10 Hz/sec to +10 Hz/sec in steps of 0.0018 Hz/sec (to take into account the unknown accelaration of a rotating planet sending the signals). At each drift rate, the client searches for signals at one or more bandwidths between 0.075 and 1,221 Hz. The data are examined for signals that exceed 22 times the mean noise power. All potential signals are sent back to the central server for further processing.

Analysis of a single work unit requires 2.4 to 3.8 trillion floating-point operations (Tflops), and takes 5 to 6 hours of a 1 GHz processor. On average, a client reports 8 detected signals for each work unit.

The vast majority of detected signals are created by sources of narrow-band emissions at or near the observatory, such as equipment, aircraft, satellites, and other transmitters. As these signals are long-duration ones, they are rejected in post-processing phase.

Some of detected signals are caused by errors in computers or communication networks. Although the probability of such errors is very low, the amount of computations required by the SETI@home project (in the order of millions of computation-years) must take the existence of such errors into consideration. Therefore each work unit is processed 3 times, on different processors, and the obtained results are compared for consistency.

The number of "standard", 1 GHz processors, needed for on-line processing of the recorded data can be estimated as follows. Analysis of 80 seconds of data (i.e., 107 seconds with 20 to 30 second overlap) in a single sub-band requires 5 to 6 hours of processing, so on-line processing of a single sub-band requires:

3,600 * 5.5 / 80 = 247.5 processors.

Multiplying this number by 256 sub-bands, and by the replication factor 3, results in almost 200,000 "standard" processors working "full time" on the analysis of the recorded data. The actual number of computers registered for the SETI@home project is greater at least an order of magnitude (there are several millions of registered computers), but these computers are usually not available "full time" (only about 20 % of registered computers are reported as active), and not all have 1 GHz processors.

Let the execution time of a single work unit on a "standard" processor be denoted by Tu, and let Td denotes the time of sending a work unit to a client, while Tr - the time needed to return the results of data analysis. The speedup, S(N), of distributed processing on a system composed of N processors, is:

S(N) = (N * Tu) / (Td + Tu + Tr)

and, since Tu is much greater than Td and Tr:

S(N) = N

so, for this particular application, the practical speedup approximates the ideal speedup, and this makes the SETI@home project so spectacularly successful.

Research topics include:


Prev Page Up to Project Page

Copyright by W.M. Zuberek, All rights reserved.
Revised: 2004.07.30 :

Notice: Use of undefined constant COUNTNAME - assumed 'COUNTNAME' in /users/cs/faculty/wlodek/.www/research/proj-distr-comp-c.php on line 135

Notice: Use of undefined constant COUNTER - assumed 'COUNTER' in /users/cs/faculty/wlodek/.www/research/counter.php on line 21
1082