Synergy User Manual and Tutorial
Documenting the Synergy Project
Supervised by Dr. Yuan Shi
Compiled by Joe Jupin
syn·er·gy (sĭn′ər-jē) noun
plural syn·er·gies
1. The interaction of two or more agents or forces so that their combined effect is greater than the sum of their individual effects.
2. Cooperative interaction among groups, especially among the acquired subsidiaries or merged parts of a corporation, that creates an enhanced combined effect.
[From Greek sunergia, cooperation, from sunergos, working together.]

"For it is unworthy of excellent men to lose hours like slaves in the labour of calculation which could safely be relegated to anyone else if machines were used."
-Gottfried Wilhelm Leibniz
Table of Contents

Introduction
1. History and Limitations of Traditional Computing

Parallel Processing
1. What is parallel processing?
2. Why parallel processing?
3. History and Existing Tools for Parallel Processing
   a. History of Parallel Processing
   b. Linda
   c. Parallel Virtual Machine (PVM)
   d. Message Passing Interface (MPI)
4. Parallel Programming Concepts
   a. Symmetric MultiProcessor (SMP)
   b. Stateless Machine (SLM)
   c. Stateless Parallel Processing (SPP)
   d. Tuple Spaces
   e. Division of Labor (sharing workload between workers)
   f. Debugging Parallel Programs
5. Theory and Challenges of Parallel Programs and Performance Evaluation
   a. Temporal Logic
   b. Petri Net
   c. Amdahl's Law
   d. Gustafson's Law
   e. Performance Metrics
   f. Timing Models
      i. Gathering System Performance Data
      ii. Gathering Network Performance Data
   g. Optimal Load Balancing
   h. Availability

About Synergy
1. Introduction to The Synergy Project
   a. What is Synergy?
   b. Why Synergy?
   c. History
2. Major Components and Inner Workings of Synergy
   a. What is in Synergy? (Synergy Kernel with Explanation)
3. Comparisons with Other Systems
   a. Synergy vs. PVM/MPI
   b. Synergy vs. Linda
4. Parallel Programming and Processing in Synergy
5. Load Balance and Performance Optimization
6. Fault Tolerance

Installing and Configuring Synergy
1. Basic Requirements
2. Compiling
3. Setup
4. Configuring the Synergy Environment
5. Activating Synergy
6. Creating a Processor Pool

Using Synergy
1. The Synergy System
   a. The Command Specification Language (csl) File
   b. Synergy's Tuple Space Objects
   c. Synergy's Pipe Objects
   d. Synergy's File Objects
   e. Compiling Synergy Applications
   f. Running Synergy Applications
   g. Debugging Synergy Applications
2. Tuple Space Object Programming
   a. A simple application—Hello Synergy!
   b. Sending and Receiving Data—Hello Workers!—Hello Master!!!
   c. Sending and Receiving Data Types
   d. Getting Workers to Work
      i. Sum of First N Integers
      ii. Matrix Multiplication
   e. Work Distribution by Chunking
      i. Sum of First N Integers Chunking Example
      ii. Matrix Multiplication Chunking Example
   f. Optimized Programs
      i. Matrix Multiplication Optimized
3. Pipe Object Programming
4. File Object Programming

Parallel Meta-Language (PML)
1. Automated Parallel Code Generation

Future Directions

Function and Command Reference
1. Commands
2. Functions
3. Error Codes

References

Index
Introduction

Note: passages originally set in red text were copied from syng_man.ps by Dr. Shi.
The emergence of low-cost, high-performance uni-processors forces the enlargement of processing grains in all multi-processor systems. Consequently, individual parallel programs have increased in length and complexity. However, like reliability, parallel processing of multiple communicating sequential programs is not really a functional requirement.

Separating pure functional programming concerns from parallel processing and resource management concerns can greatly simplify conventional "parallel programming" tasks. For example, the use of dataflow principles can facilitate automatic task scheduling. Smart tools can automate resource management. As long as the application-dependent parallel structure is uncovered properly, we can even automatically assign processors to parallel programs in all cases.

Synergy V3.0 is an implementation of the above ideas. It supports parallel processing using multiple Unix computers mounted on multiple file systems (or clusters) using TCP/IP. It allows parallel processing of any application using mixed languages, including parallel programming languages. Synergy may be thought of as a successor to Linda 1, PVM 2 and Express 3.
Our need to store and process data has been continually increasing for thousands of years. This need has led to the development of complex storage, communication, numerical and processing systems. The information in this section was wholly obtained from sources freely available on the Internet, which are cited in the references section. Much of it was obtained from timelines, encyclopedias and academic Web pages. The accuracy of information collected from the Internet was checked by using multiple corroborating resources and eliminating contradictory information.
1 Linda is a tuple space parallel programming system led by Dr. David Gelernter, Yale University. Its commercial version is distributed by Scientific Computing Associates, New Haven, CT.
2 PVM is a message passing parallel programming system by Oak Ridge National Laboratory, the University of Tennessee and Emory University.
3 Express is a commercial message passing parallel programming system by ParaSoft, CA.
History and Limitations of Ancient and Traditional Computing
The first recognized use of a tool to record the results of transactions was a device called a tally stick. The oldest known artifact is a wolf bone with a series of fifty-five cuts in groups of five that dates from approximately 30,000 to 25,000 BC. The notches in the stick may refer to the number of coins or other items counted by some early form of bookkeeping. The earliest stock markets used tally sticks to record transactions. The word "stock" actually means a stout stick. During a transaction the "broker" would record the purchase of stock on a tally stick and then "break" the stick, keeping half and giving the other half to the investor. The two halves would be fit together at some later time to verify the investor's ownership of the shares of stock. In 1734 the English government ordered the cessation of the use of tally sticks, but they were not completely abolished until 1826. By 1834 the British Parliament had collected a very large number of tally sticks, which they decided to burn in the fireplace at the House of Lords. The fireplace was so engorged with tally sticks that the fire spread to the paneling and to the neighboring House of Commons, destroying both buildings, which took ten years to reconstruct. i Other primitive recording devices included clay tablets, knotted strings, pebbles in bags and parchments. In modern times, books or ledgers have been used to record commercial or financial data using more formal bookkeeping systems, such as the double-entry standard that is widely used today.
The first place-valued numerical system, in which both digit and position within the number determine value, and the abacus, the first actual calculating mechanism, are believed to have been invented by the Babylonians sometime between 3000 and 500 BC. Their number system is believed to have been developed based on astrological observations. It was a sexagesimal (base-60) system, which had the advantage of being wholly divisible by 2, 3, 4, 5, 6, 10, 15, 20 and 30. The first abacus was likely a stone covered with sand on which pebbles were moved across lines drawn in the sand. Later improvements were constructed from wood frames with either thin sticks or a tether material on which clay beads or pebbles were threaded. Sometime between
200 BC and the 14th century, the Chinese invented a more advanced abacus device. The typical Chinese swanpan (abacus) is approximately eight inches tall and of various widths and typically has more than seven rods, which hold beads usually made from hardwood. This device works as a 5-2-5-2 based number system, which is similar to the decimal system. Advanced swanpan techniques are not limited to simple addition and subtraction. Multiplication, division, square roots and cube roots can be calculated very efficiently. A variation of this device is still in use by shopkeepers in various Asian countries. ii There is direct evidence that the Chinese were using a positional number system by 1300 BC and were using a zero-valued digit by 800 AD.
Sometime after 200 BC, Eratosthenes of Cyrene (276-194 BC) developed the Sieve of Eratosthenes, a procedure for determining prime numbers. It is called a sieve because it strains or filters out all non-primes. The process is as follows:
1. Make a list of all integers greater than one and less than or equal to n.
2. Strike out the multiples of all primes less than or equal to the square root of n.
3. The numbers that are left are the primes.
The table below shows the integers from 2 to 50; after striking out multiples, the primes 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43 and 47 remain.
 2  3  4  5  6  7  8  9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
Eratosthenes is also credited with being the first person to accurately estimate the diameter of the Earth; he also served as the director of the famed Library of Alexandria. iii
[Figure: A postage stamp issued by the USSR in 1983 to commemorate the 1200th anniversary of Muhammad al-Khowarizmi. Scanned by Donald Knuth, one of the legends of computer science.]
The Sieve of Eratosthenes is one of the first well-documented uses of an efficient algorithmic solution to a complex problem. The word algorithm is derived from the Latinized form of Al-Khowarizmi's name. Muhammad ibn Musa al-Khwarizmi was an Arab mathematician of the court of Mamun in Baghdad, born before 800 AD in central Asia, in what is now Uzbekistan. Along with other Arabic mathematicians, he is responsible for the proliferation of the base-ten number system, which was developed in India. His book on the subject of Hindu numerals was later translated into the Latin text Liber Algorismi de numero Indorum. While a scholar at the House of Wisdom in Baghdad, he wrote Hisāb al-jabr wa'l-muqābala (from which the word "algebra" is derived). Loose translations of this title could be "the science of transposition and cancellation" or "the calculation of reduction and restoration." He devised a method to restore or transpose negative terms to the other side of an equation and reduce (cancel) or unite similar terms on either side of the equation. Transposition means that a quantity can be added or subtracted (multiplied or divided) from both sides of an equation, and cancellation means that if there are two equal terms on either side of an equation, they can be altogether cancelled. The following is a translation of a popular verse in Arab schools from over six hundred years ago:

   Cancel minus terms and then
   Restore to make your algebra;
   Combine your homogeneous terms
   And this is called muqabalah.

Robert of Chester translated this work into Latin in 1140 AD. Similar methods are still in use in modern algebraic manipulations, which came in the sixteenth century from Francois Viète. Al-Khowarizmi also claimed in his book Indorum (the book of Al-Khowarizmi) that any complex mathematical problem could be broken down into smaller, simpler sub-problems, whose results could be logically combined to solve the initial problem. This is the main concept of an algorithm. Latin translations of his work contributed to much of medieval Europe's knowledge of mathematics. In 1202, Leonardo of Pisa (better known by his nickname Fibonacci) (c. 1175-1250) wrote the
historic book Liber Abaci, or "The Book of Calculation", which was his interpretation of the Arabic-Hindu decimal number system that he learned while traveling with Arabs in North Africa. This book was the first to expose the general public, rather than academia, to the decimal number system, which quickly gained popularity because of its clear superiority over existing systems. iv
The Greek astronomer, geographer and mathematician Hipparchus (c. 190 BC - 120 BC) likely invented the navigational instrument called the astrolabe. This is a protractor-like device consisting of a degree-marked circle with a rotating arm attached at its center. When the zero-degree mark is aligned on the horizon and a celestial body is sighted along the movable arm, the celestial body's position can be read from the degree marks on the circle. The sextant eventually replaced this device because the sextant measured relative to the horizon and not the device itself, which allowed more accurate measurements of position for latitude.
Sometime between 1612 and 1614, John Napier (1550 - 1617), born at Merchiston Tower in Edinburgh, Scotland, developed the decimal point, logarithms and Napier's bones—an abacus for the calculation of products and quotients of numbers. Calculations performed by hand were made much easier by the use of logarithms, which made possible many later scientific advancements. His mathematical work, Mirifici Logarithmorum Canonis Descriptio, in English "Description of the Marvelous Canon of Logarithms", contained thirty-seven pages of explanatory matter and ninety pages of tables, which furthered advancements in astronomy, dynamics and physics. Based on Napier's logarithms, in 1622 William Oughtred (1574 - 1660) invented the circular slide rule for calculating multiplication and division. In 1632 he published Circles of Proportion and the Horizontal Instrument, which described slide rules and sundials. By 1650 the sliding-stick form of the slide rule had been developed. In
1624, Henry Briggs (1561 - 1630) published the first set of modern logarithms, and in 1628, Adrian Vlacq published the first complete set of modern logarithms.
In 1623, Wilhelm Schickard (1592 - 1635) invented what is believed to be the first mechanical calculating machine (left). This "calculating clock" used a gear-driven carry mechanism to calculate the multiplication of multi-digit numbers in higher-order positions. Between 1642 and 1643, at the age of 18, Blaise Pascal (1623 - 1662) created the "Pascaline" (right), a gear-driven adding machine, which was the first mechanical adding/subtracting machine. Pascal developed this machine to help his father, a tax collector, with his work. He discovered how to mechanically carry numbers to the next higher order by causing the higher-order gear to advance one tooth for a full rotation (ten teeth) of the next-lower-order gear. This method is similar to that of old pinball machines or gas pumps with rotating number counters. These devices were never placed into commercial service due to the high cost of manufacture. Approximately fifty Pascalines were constructed and could handle calculations with up to eight digits. v
In 1666 Sir Samuel Morland (1625-1695) invented a mechanical calculator that could add and subtract. This machine was designed for use with English currency but had no automatic carry mechanism. Auxiliary dials recorded numerical overflows, which had to be re-entered as addends. vi In 1673, Gottfried Wilhelm von Leibniz (1646 - 1716) designed a machine called the "Stepped Reckoner" that could mechanically perform all four mathematical operations using a stepped-cylinder gear, though the initial design gave some wrong answers. This machine was never mass-produced because the high level of precision needed to manufacture it was not yet available. vii In 1774 Philipp-Matthaus Hahn (1739 - 1790) constructed and sold a small number of mechanical calculators with twelve digits of precision.
The advent of the Industrial Revolution, just prior to the start of the nineteenth century, ushered in a massive increase in commercial activity. This created a great need for automatic and reliable calculation. Charles Xavier Thomas (1791 - 1871) of Colmar, France invented the first mass-produced calculating machine, called the Arithmometer (left), in 1820. His machine used Leibniz's stepped cylinder as a digital-value actuator. However, Thomas' automatic carry system worked in every possible case and was much
more robust than any predecessor. This machine was improved and produced for decades. Other models, designed by competitors, eventually entered the marketplace.
In 1786, J. H. Mueller, of the Hessian army, conceived the "Difference Engine" but could not raise the funds necessary for its construction. This was a special-purpose calculating device that, given the differences between certain values where the polynomial is uniquely specified, can tabulate the polynomial values. This calculator would be useful for functions that can be approximated polynomially over certain intervals. The Difference Engine's mechanical computer prototype design would not be realized until 1822, when Charles Babbage (1792 - 1871) conceived it. In 1832, Babbage and Joseph Clement built a scaled-down prototype that could perform operations on 6-digit numbers and 2nd-order or quadratic polynomials. A full-sized machine would be as big as a room and able to perform operations on 20-digit numbers and 6th-order polynomials. Babbage's Difference Engine project was eventually canceled due to cost overruns. In 1843, George Scheutz and his son Edvard Scheutz, of Stockholm, produced a 3rd-order engine with the ability to print its results. From 1989-91, a team at London's Science Museum built a fully functional Difference Engine based on Babbage's latest (1837), improved and simpler design, using modern construction materials and techniques. The machine could successfully operate on 31-digit numbers and 7th-order differences.
The Difference Engine uses Sir Isaac Newton's method of differences. It works as follows: Consider the polynomial p(x) = x² + 2x + 1 and tabulate the values for p(0), p(0.1), p(0.2), p(0.3), p(0.4). The table below contains the polynomial values in the first column, the differences of each consecutive pair of polynomial values in the second column, and the differences of each consecutive pair of values from the second column in the third column.

p(0)   = 1
                 1 - 1.21 = -0.21
p(0.1) = 1.21                          -0.21 - (-0.23) = 0.02
                 1.21 - 1.44 = -0.23
p(0.2) = 1.44                          -0.23 - (-0.25) = 0.02
                 1.44 - 1.69 = -0.25
p(0.3) = 1.69                          -0.25 - (-0.27) = 0.02
                 1.69 - 1.96 = -0.27
p(0.4) = 1.96

For a 2nd-order polynomial, the third column will always contain the same value. Likewise, for an nth-order polynomial, column n+1 will always have the same value. To find p(0.5), start from the right column with value 0.02 and subtract it from the last entry in the second column to get -0.29. Then subtract this value from the last entry in the first column to get 2.25, which is the solution to p(0.5). This can be continued incrementally for greater p(x), indefinitely, by updating the table and repeating the algorithm.

This device impresses a zinc block, which prints the results of calculations on paper. This could be considered the first standalone computer printer.
Babbage also invented the Analytical Engine, which was the first computing device designed to use read-only memory, in the form of punched cards, to store programs. This general-purpose mathematical device was very similar to the electronic processes used in early computers. Later designs of this machine would perform operations on 40-digit numbers. The machine had a processing unit called the "mill" that contained two main accumulators and some special-purpose auxiliary accumulators. It also had a memory area called the "store", which could hold approximately 100 more numbers. To accept data and program instructions, the Analytical Engine would be equipped with several punch card readers in which the cards were linked together to allow forward and reverse reading. These linked cards were first used in 1801 by Joseph-Marie Jacquard to control the weaving patterns of a loom. The machine could perform conditional
branching called "jumps", which allowed it to skip to a desired instruction. The device was capable of a form of microcoding, using the position of studs on a metal barrel called the "control barrel" to interpret instructions. This machine could calculate an addition or subtraction operation in about three seconds, and a multiplication or division operation in about three minutes.
In 1843, Augusta Ada Byron (1815 - 1852), Lady Lovelace, mathematician, scientist and daughter of the famed poet Lord Byron, translated an article from French about Babbage's Analytical Engine, adding her own notes. Ada composed a plan for the calculation of Bernoulli numbers, which is considered to be the first ever "computer program." Because the machine was never built, however, the algorithm was never run on the Analytical Engine. In 1979, the U.S. Department of Defense honored the world's first "computer programmer" by naming its own software development language "Ada." viii
George Boole (1815 - 1864) (right) wrote "An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities" in 1854. This article detailed Boole's new binary approach to logic, which processed only two objects at a time (in a yes-no, true-false, on-off, zero-one manner), by incorporating it into mathematics and reducing it to a simple algebra, which presented an analogy between symbols that represent logical forms and algebraic symbols. Three primary operations were defined based on those in Set Theory: AND (intersection), OR (union) and NOT (complement). This system was the beginning of the Boolean algebra that is the basis for many applications in modern electronic circuits and computation. ix Though his idea was either ignored or criticized by many of his peers, twelve years later an American, Charles Sanders Peirce, described it to the American Academy of Arts and Sciences. Peirce spent the next twenty years expanding and modifying the idea, eventually designing a basic electrical logic circuit.
Processing and storage were not the only advancements made prior to the 20th century. There were also great improvements in communications technology. Samuel
Morse (1791 - 1872) conceived the telegraph in 1832 and had built a working model by 1835. This was the first device to communicate through the use of electricity. The telegraph worked by tapping out a message on a sending device (right) in Morse code, a series of dots and dashes that represented letters, numbers, punctuation and other symbols. These dots and dashes were converted into electrical impulses and sent, on the wire, to a receiver (left). The receiver converted the electrical impulses to an audible sound that represented the original dots and dashes. In 1844, Morse sent a signal from Washington to Baltimore over this communication device. By 1854 there were 23,000 miles of telegraph wire in use within the United States. This provided a much more efficient form of communication that greatly affected national socio-economic development. x In 1858, a telegraph cable was run across the Atlantic Ocean, providing communication service between the U.S. and England for less than a month. By 1861 a transcontinental cable connected the East and West coasts of the U.S., and by 1880, 100,000 miles of undersea cable had been laid.
The next great advancement in communication was Alexander Graham Bell's (1847 - 1922) invention of the "electrical speech machine", or telephone, in 1876. This invention was developed from improvements that Bell made to the telegraph, which allowed more than one signal to be transmitted over a single set of telegraph wires simultaneously. Within two years, he had set up the first telephone exchange in New Haven, Connecticut. He had established long distance connections between Boston, Massachusetts and New York City by 1884. The telecommunication industry would eventually reach almost every locality in the country, then the world. Bell's original venture evolved into larger companies, and in 1881 American Bell Telephone Co. Inc. purchased Western Electric Manufacturing Company to manufacture equipment for Bell. In 1885, the American Telephone and Telegraph Company (AT&T) was formed to extend Bell system long lines across the U.S., and in 1899 AT&T became the parent company of Bell, assuming
all assets. The Western Electric Engineering Dept. was organized in 1907 <strong>and</strong> a research<br />
branch to do scientific research <strong>and</strong> development was organized in 1911. On December<br />
27, 1925, Bell Telephone Laboratories was created to consolidate the research labs from<br />
AT&T <strong>and</strong> Western Electric, which remained a wholly owned subsidiary of AT&T after<br />
the divestiture of the seven regional Bell companies. Bell Laboratories would eventually<br />
become one of the world’s premier communication and computer research centers. One<br />
of Bell Labs’ contributions to computing was the development of UNIX by Dennis<br />
Ritchie and Ken Thompson in 1970. In 1991, AT&T acquired NCR, formerly<br />
National Cash Register, which became AT&T Global Information Solutions. xi<br />
The explosion in population growth between 1880 <strong>and</strong> 1890,<br />
due to increased birth rates <strong>and</strong> immigration, created a great<br />
dilemma for the Census Bureau. During this time, Herman<br />
Hollerith (right) was a statistician for the Census Bureau <strong>and</strong><br />
was responsible for solving problems related to the processing<br />
of large amounts of data from the 1880 US census. He was<br />
attempting to find ways of manipulating data mechanically as<br />
was suggested to him by Dr. John Shaw Billings. In 1882,<br />
Hollerith joined MIT to teach mechanical engineering <strong>and</strong><br />
also started to experiment with Billings’ suggestion by<br />
studying the operation of the Jacquard loom. Though he<br />
found that the loom’s operation was not useful for processing data, he determined that the<br />
punched cards were very useful for storing data. In 1884, Hollerith devised a method to<br />
convert the data stored on the punched cards into electrical impulses using a card-reading<br />
device. He also developed a typewriter-like device to record the data on the punched<br />
cards, which changed very little in its design over the next 50 years. The card readers<br />
used pins that passed through the holes in the cards, creating electrical contacts; the<br />
impulses from these contacts activated mechanical counters to manipulate and tally<br />
the data. This system was successfully demonstrated in 1887 by tabulating mortality<br />
statistics <strong>and</strong> won the bid to be used to tabulate the 1890 Census data.<br />
Hollerith had Pratt and Whitney manufacture the<br />
punching devices and the Western Electric<br />
Company manufacture the counting devices. The<br />
Census Bureau’s new system was ready by 1890<br />
and was processing the first data by September of<br />
the same year. The count was completed by<br />
December 12, 1890, revealing the total population<br />
of the United States to be 62,622,250. The count<br />
was not only completed eight times faster than if it<br />
had been performed manually, it also allowed the gathering<br />
of more data about the country’s population than had been possible before, such as the<br />
number of children in a family. Hollerith founded the Tabulating Machine Company in 1896 to<br />
produce his improved counting machines <strong>and</strong> other inventions, one of which<br />
automatically fed the cards into the counting machines. His system was used again for<br />
the 1900 Census but, because Hollerith demanded more than the cost to count the data by<br />
hand, the Census Bureau was forced to develop its own system. In 1911, Hollerith’s<br />
company merged with other companies, becoming the Computing-Tabulating-Recording<br />
Company, but was nearly forced out of the counting machine market due to fierce<br />
competition from new entrants. Hollerith retired from his position as consulting engineer<br />
in 1921. Because of the efforts of Thomas J. Watson, who joined the company in 1918,<br />
the company had reestablished its position as a leader in the market by 1920. In 1924, the<br />
Computing-Tabulating-Recording Company was renamed International Business<br />
Machines Corporation (IBM). By 1928, punch card equipment would be attached to<br />
computers as output devices and would also be used by L. J. Comrie to calculate the<br />
motion of the moon. xii<br />
In 1895, Italian physicist <strong>and</strong> inventor<br />
Guglielmo Marconi sent the first<br />
wireless message. Prior to his first<br />
transmission, Marconi studied the works<br />
of Heinrich Hertz (1857-1894) <strong>and</strong> later<br />
started to experiment with Hertzian<br />
waves to transmit <strong>and</strong> receive messages<br />
over increasing distances without the use<br />
of wires. The messages were sent in<br />
Morse code. He patented his invention<br />
in 1896. After years of<br />
experimentation and improvement,<br />
especially with respect to distance,<br />
Marconi founded the Wireless Telegraph and Signal Company in 1897. After<br />
a series of takeovers <strong>and</strong> mergers, this company eventually became part of the General<br />
Electric Company (GEC), which was eventually renamed Marconi Corporation plc in<br />
2003. xiii In 1904, radio technology was improved by the<br />
invention of the two-electrode radio rectifier, which was<br />
the first electron tube, also called the oscillation valve or<br />
thermionic valve (left). It is credited to John Ambrose<br />
Fleming, a consultant to the Marconi Company. This<br />
device was much more sensitive to radio signals than its<br />
predecessor, the coherer. This invention inspired all<br />
subsequent developments in wireless transmission. In<br />
1906, Lee de Forest improved the thermionic valve by<br />
adding a third electrode <strong>and</strong> a grid to control <strong>and</strong> amplify<br />
signals, creating a new device called an Audion. This<br />
device was used to detect radio waves <strong>and</strong> convert the<br />
radio frequency (RF) to an audio frequency (AF), which<br />
could be amplified through a loudspeaker or headphones.<br />
By 1907 gramophone music was regularly broadcast from<br />
New York over radio waves. xiv In 1907, both A. A.<br />
Campbell-Swinton (left) and Boris Rosing<br />
independently suggested using cathode ray tubes to<br />
transmit images. Though intended for television, the<br />
cathode ray tube has made a valuable contribution to<br />
computing by providing a human-readable interface<br />
with computational devices. In a letter to Nature<br />
magazine, Swinton gave the first full description of<br />
an all-electronic television system:<br />
“Distant electric vision can probably be solved by the<br />
employment of two beams of kathode rays (one at the<br />
transmitting <strong>and</strong> one at the receiving station)<br />
synchronously deflected by the varying fields of two<br />
electromagnets placed at right angles to one another <strong>and</strong> energised by two alternating<br />
electric currents of widely different frequencies, so that the moving extremities of the two<br />
beams are caused to sweep synchronously over the whole of the required surfaces within<br />
the one-tenth of a second necessary to take advantage of visual persistence. Indeed, so<br />
far as the receiving apparatus is concerned, the moving kathode beam has only to be<br />
arranged to impinge on a suitably sensitive fluorescent screen, <strong>and</strong> given suitable<br />
variations in its intensity, to obtain the desired result.”<br />
In 1927, during a television demonstration, Herbert Hoover’s face was the first image<br />
broadcast in the U.S., using telephone wires for the voice transmission. Vladimir<br />
Zworykin invented the cathode ray tube (CRT) in 1928. It eventually became the first<br />
computer storage device. Color television signals were successfully transmitted in 1929<br />
<strong>and</strong> first broadcast in 1940.<br />
In 1911, while studying the effects of extremely cold temperatures on metals such as<br />
mercury <strong>and</strong> lead, physicist Heike Kamerlingh Onnes discovered that they lost all<br />
resistance at certain low temperatures just above absolute zero. This phenomenon is<br />
known as superconductivity. In 1915, another physicist, Manson Benedicks, discovered<br />
that alternating current could be converted to direct current by using a germanium crystal,<br />
which eventually led to the use of microchips. In 1919, British physicists William Henry<br />
Eccles (1875 - 1966) and F. W. Jordan invented the flip-flop, the first electronic<br />
switching circuit, which was critical to high-speed electronic counting systems.<br />
The flip-flop is a digital logic hardware circuit that can switch or toggle between two<br />
states controlled by its inputs, which is similar to a one-bit memory. The three common<br />
types of flip-flop are: the SR flip-flop, the JK flip-flop <strong>and</strong> the D-type flip-flop (shown<br />
below).<br />
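The hold, set, reset and toggle behavior described above is easy to model in software. The sketch below is an illustrative Python model of the three common flip-flop types (function names are ours, for illustration only, not hardware descriptions):

```python
# Minimal software models of the three common flip-flop types.
# Each function maps (current state q, inputs) -> next state.

def sr_flipflop(q, s, r):
    """SR flip-flop: Set forces Q=1, Reset forces Q=0; S=R=1 is invalid."""
    if s and r:
        raise ValueError("S=R=1 is an invalid input for an SR flip-flop")
    if s:
        return 1
    if r:
        return 0
    return q  # neither input asserted: hold the stored bit

def jk_flipflop(q, j, k):
    """JK flip-flop: like SR, but J=K=1 toggles the stored bit."""
    if j and k:
        return 1 - q  # toggle
    if j:
        return 1
    if k:
        return 0
    return q

def d_flipflop(q, d):
    """D-type flip-flop: on each clock edge, Q takes the value of input D."""
    return d
```

This one-bit-memory view is exactly why chains of flip-flops could serve as counters in early electronic machines.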
In 1925, Vannevar<br />
Bush (1890 - 1974)<br />
developed the first<br />
analog computer to<br />
solve differential<br />
equations. These<br />
analog computers<br />
were mechanical<br />
devices that used<br />
large gears <strong>and</strong> other<br />
mechanical parts to<br />
solve equations. The<br />
first working machine<br />
was completed in<br />
1931 (left). In 1945,<br />
he published an<br />
article in the Atlantic Monthly called "As We May Think," which described a theoretical<br />
device called a memex. This device used a microfilm search system, very similar to<br />
hypertext, based on a concept that he called associative trails. His description of<br />
the system is:<br />
"The owner of the memex, let us say, is interested in the<br />
origin <strong>and</strong> properties of the bow <strong>and</strong> arrow. Specifically he<br />
is studying why the short Turkish bow was apparently<br />
superior to the English long bow in the skirmishes of the<br />
Crusades. He has dozens of possibly pertinent books <strong>and</strong><br />
articles in his memex. First he runs through an<br />
encyclopedia, finds an interesting but sketchy article,<br />
leaves it projected. Next, in a history, he finds another<br />
pertinent item, <strong>and</strong> ties the two together. Thus he goes,<br />
building a trail of many items. Occasionally he inserts a<br />
comment of his own, either linking it into the main trail or<br />
joining it by a side trail to a particular item. When it<br />
becomes evident that the elastic properties of available<br />
materials had a great deal to do with the bow, he branches<br />
off on a side trail which takes him through textbooks on<br />
elasticity <strong>and</strong> physical constants. He inserts a page of longh<strong>and</strong> analysis of his own. Thus<br />
he builds a trail of his interest through the maze of materials available to him."<br />
In 1934, Konrad Zuse (1910 - 1995) was an engineer<br />
working for Henschel Aircraft Company, studying<br />
stresses caused by vibrations in aircraft wings. His<br />
work involved a great deal of mathematical calculation.<br />
To aid him in these calculations, he developed ideas on<br />
how machines should perform calculations. He<br />
determined that these machines should be freely<br />
programmable by reading a sequence of instructions<br />
from a punched tape <strong>and</strong> that the machine should make<br />
use of both the binary number system <strong>and</strong> a binary logic<br />
system to be capable of using binary switching<br />
elements. He designed a semi-logarithmic floating-point<br />
number representation, using an exponent and a<br />
mantissa, to calculate both very small <strong>and</strong> very large<br />
numbers. He developed a “high performance adder”,<br />
which included a one-step carry-ahead and precise<br />
handling of arithmetic exceptions. He also developed an addressable memory that could<br />
store arbitrary data. He devised a control unit to control all other devices within the<br />
machine along with input <strong>and</strong> output devices that convert numbers from binary to<br />
decimal <strong>and</strong> vice versa.<br />
By 1936 he completed the design for the Z1 computer (top next page), which he<br />
constructed in his parents’ living room by 1938. This was a completely mechanical unit<br />
based on his previous design.<br />
Though unreliable, it had the<br />
ability to store 64 words, each 22<br />
bits in length (8 bits for the<br />
exponent <strong>and</strong> sign, <strong>and</strong> 14 bits for<br />
the mantissa), in its memory,<br />
which consisted of layers of metal<br />
bars between layers of glass. Its<br />
arithmetic unit was constructed<br />
from a large number of mechanical<br />
switches <strong>and</strong> had two 22-bit<br />
registers. The machine was freely<br />
programmable with the use of a<br />
punched tape. The device also had<br />
the prescribed control unit <strong>and</strong><br />
addressable memory, making it the world’s first programmable binary computing<br />
machine, with a clock speed of 1 Hertz. The picture above is a topside view of the Z1,<br />
which is very similar in appearance to a silicon chip. At first the machine was not very<br />
reliable. However, it functioned reliably by 1939.<br />
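The 22-bit word layout described above (8 bits for the exponent and sign, 14 bits for the mantissa) can be illustrated with a small bit-packing sketch. The field order, and the functions themselves, are assumptions made for illustration; the Z1's actual encoding differed in detail:

```python
# Illustrative sketch of a Z1-style 22-bit floating-point word:
# 1 sign bit + 7 exponent bits (together the "8 bits for the exponent
# and sign") and a 14-bit mantissa. Field order is assumed.

def pack(sign, exponent, mantissa):
    """Pack the three fields into a single 22-bit integer word."""
    assert sign in (0, 1)
    assert 0 <= exponent < 1 << 7
    assert 0 <= mantissa < 1 << 14
    return (sign << 21) | (exponent << 14) | mantissa

def unpack(word):
    """Split a 22-bit word back into (sign, exponent, mantissa)."""
    return (word >> 21) & 1, (word >> 14) & 0x7F, word & 0x3FFF
```

Representing a value as mantissa times a power-of-two exponent is what let the machine handle both very small and very large numbers in a fixed word size.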
The Z2 was an experimental<br />
machine similar to the Z1 but<br />
used 800 relays for the<br />
arithmetic unit instead of<br />
mechanical switches. This<br />
machine proved that relays<br />
were reliable, which prompted<br />
Zuse to design <strong>and</strong> build the Z3<br />
using relays. The Z3 was<br />
constructed between 1938 <strong>and</strong><br />
1941 in Berlin. The Z3 used<br />
relays for the entire machine<br />
<strong>and</strong> had a 64-word memory,<br />
consisting of 22-bit floating-point<br />
numbers. The Z3 was the<br />
first reliable, fully functional, freely programmable computer based on the binary<br />
floating-point number <strong>and</strong> a switching system, which had the capability to perform<br />
complex arithmetic calculations. It had a clock speed of 5.33 Hertz <strong>and</strong> could perform a<br />
multiplication operation in 3 seconds. This machine contained all of the components of<br />
the machine described by von Neumann et al. in 1946, except the ability to store the<br />
program in memory together with the data. In 1998, Raul Rojas proved that the Z3 was<br />
a truly universal computer in the sense of a Turing machine. The picture above is Zuse<br />
along with his 1961 reconstruction of the Z3. Allied bombing during World War II<br />
destroyed the original Z3.<br />
An example Z3 program, from the Web site "The Life and Work of Konrad Zuse" by<br />
Horst Zuse (listed in the references section), calculates the polynomial<br />
((a4x + a3)x + a2)x + a1, where a4, a3, a2, and a1 are first loaded into<br />
memory cells 4, 3, 2, and 1:<br />
Lu To call the input device for the variable x<br />
Ps 5 To store variable x in memory word 5<br />
Pr 4 Load a4 in Register R1<br />
Pr 5 Load x in Register R2<br />
Lm Multiply: R1 := R1 x R2<br />
Pr 3 Load a3 in Register R2<br />
Ls1 Add: R1 := R1 + R2<br />
Pr 5 Load x in R2<br />
Lm Multiply: R1 := R1 x R2<br />
Pr 2 Load a2 in Register R2<br />
Ls1 Add: R1 := R1 + R2<br />
Pr 5 Load x in Register R2<br />
Lm Multiply: R1 := R1 x R2<br />
Pr 1 Load a1 in Register R2<br />
Ls1 Add: R1 := R1 + R2<br />
Ld Shows the result as a decimal number<br />
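The Z3 listing above evaluates the polynomial by Horner's scheme, accumulating the result in register R1. A hedged Python sketch of the same register-level steps follows (the function name and memory dictionary are ours, for illustration):

```python
# Sketch of the Horner-scheme evaluation performed by the Z3 program,
# with r1 and r2 mimicking the machine's two floating-point registers.
# Memory cells 1..5 hold a1..a4 and x, as in the listing above.

def evaluate_polynomial(a4, a3, a2, a1, x):
    """Compute ((a4*x + a3)*x + a2)*x + a1, step for step as on the Z3."""
    memory = {1: a1, 2: a2, 3: a3, 4: a4, 5: x}  # Lu/Ps 5 stored x in cell 5
    r1 = memory[4]   # Pr 4: load a4 into R1
    r2 = memory[5]   # Pr 5: load x into R2
    r1 = r1 * r2     # Lm
    r2 = memory[3]   # Pr 3: load a3
    r1 = r1 + r2     # Ls1
    r2 = memory[5]   # Pr 5
    r1 = r1 * r2     # Lm
    r2 = memory[2]   # Pr 2: load a2
    r1 = r1 + r2     # Ls1
    r2 = memory[5]   # Pr 5
    r1 = r1 * r2     # Lm
    r2 = memory[1]   # Pr 1: load a1
    r1 = r1 + r2     # Ls1
    return r1        # Ld: show the result

# evaluate_polynomial(1, 2, 3, 4, 2) == ((1*2 + 2)*2 + 3)*2 + 4 == 26
```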
The program above is very<br />
similar to the assembly code<br />
that is used in modern<br />
computers. From 1942 to<br />
1946, Zuse developed<br />
ways to program computers.<br />
To aid engineers <strong>and</strong><br />
scientists in the solution of<br />
complex problems, he<br />
developed the Plankalkül<br />
(plan calculus) programming<br />
language. This precursor to<br />
today’s algorithm-type<br />
languages was the world’s<br />
first programming language<br />
<strong>and</strong> was intended for a<br />
logical machine. A logical machine could do more than just the numerical calculations to<br />
which the algebraic machines (Z1, Z2, Z3 and Z4) that he had previously designed were<br />
limited. The picture on the left is the Z4 model, completed in 1945 and reconstructed in<br />
1950, which used a mechanical memory, similar to that in the Z1, <strong>and</strong> had 32-bit words.<br />
By 1955, this machine had the added abilities to call subprograms, through a secondary<br />
punch tape reader, <strong>and</strong> use a conditional branch instruction.<br />
In 1942, Zuse built the S1, a special purpose computer to measure the wing surface area<br />
of airplanes, with 600 relays <strong>and</strong> 12-bit binary words. This machine was destroyed in<br />
1944. Zuse improved this model with the construction of the S2. This machine used<br />
approximately 100 clock gauges to automatically scan the surface of wings. This<br />
computer was most likely the first machine to use the concept of a process. It was<br />
destroyed in 1945. In 1949, he founded Zuse KG, Germany’s first computer company.<br />
In 1952, Zuse KG constructed the Z5 for optical calculations, an improved version of the<br />
Z4, which was about six times faster. It had many punch card readers for data <strong>and</strong><br />
program input, a punch card writer to output data <strong>and</strong> could h<strong>and</strong>le 32-bit floating-point<br />
numbers. In 1957, Zuse KG constructed the Z22, which contained an 8192-word<br />
magnetic drum and was the company’s first stored-program computer. In 1961, Zuse KG<br />
built the Z23, which was based on the same logic as, and was three times faster than, the<br />
Z22, and was the company’s first transistor-based computer. In 1965, his company<br />
produced the Z43, which was the first<br />
modern transistor computer to use TTL logic. The TTL (transistor-transistor-logic) type<br />
digital integrated circuit (IC) uses transistor switches for logical operations. In 1967,<br />
Siemens AG purchased Zuse KG. xv<br />
In 1937, Howard Aiken (1900 - 1973) proposed to Harvard University a machine that<br />
could perform the four fundamental operations of arithmetic (addition, subtraction,<br />
multiplication and division) in a predetermined order; the proposal was forwarded to<br />
IBM. His<br />
research had led to a system of differential equations that could only be solved using a<br />
prohibitive amount of calculations using numerical techniques <strong>and</strong> which had no exact<br />
solutions. His report stated:<br />
“... whereas accounting machines h<strong>and</strong>le only positive<br />
numbers, scientific machines must be able to h<strong>and</strong>le negative<br />
ones as well; that scientific machines must be able to h<strong>and</strong>le<br />
such functions as logarithms, sines, cosines <strong>and</strong> a whole lot of<br />
other functions; the computer would be most useful for<br />
scientists if, once it was set in motion, it would work through<br />
the problem frequently for numerous numerical values without<br />
intervention until the calculation was finished; <strong>and</strong> that the<br />
machine should compute lines instead of columns, which is<br />
more in keeping with the sequence of mathematical events.”<br />
Aiken, working with IBM engineers, developed the ASCC computer (Automatic<br />
Sequence Controlled Calculator), which was capable of five operations, addition,<br />
subtraction, multiplication, division <strong>and</strong> reference to previous results. Though it ran on<br />
electricity <strong>and</strong> the major components were magnetically operated switches, this machine<br />
had a lot in common with Babbage's analytical engine. Construction of the machine<br />
started in 1939 at the IBM laboratories in Endicott and was completed in 1943. The<br />
machine weighed 35 tons, had more than 500 miles of wire, <strong>and</strong> used vacuum tubes <strong>and</strong><br />
relays to operate. The machine had 72 storage registers <strong>and</strong> could perform operations to<br />
23 significant figures. The machine instructions were entered on punched paper tapes,<br />
<strong>and</strong> punched cards were used to enter input data. The output was either in the form of<br />
punched cards or printed by means of an electric typewriter. The machine was moved to<br />
Harvard University, where it was renamed the Harvard Mark I, pictured above. The U.S.<br />
Navy used this machine in the Bureau of Ordnance’s Computation Project for gunnery<br />
and ballistics calculations, which were performed at Harvard. In 1947, Aiken completed<br />
the Harvard Mark II, a faster machine built from high-speed<br />
electromagnetic relays. He also worked on the Mark III (the first<br />
computer to contain a drum memory) <strong>and</strong> Mark IV<br />
computers, <strong>and</strong> made contributions in electronics <strong>and</strong><br />
switching theory. xvi<br />
In 1937, Claude Shannon (1916 - 2001) wrote his Master's<br />
thesis, “A Symbolic Analysis of Relay <strong>and</strong> Switching<br />
Circuits”, using symbolic logic and Boolean algebra to<br />
analyze <strong>and</strong> optimize relay-switching <strong>and</strong> computer circuits.<br />
It was published in A.I.E.E. Transactions in 1938. For this<br />
work, Shannon was awarded the Alfred Nobel Prize of the<br />
combined engineering societies of the United States in<br />
1940. In 1948, Shannon published his most important work<br />
on information theory <strong>and</strong> communication, “A<br />
Mathematical Theory of Communication”, where he demonstrated that all information<br />
sources have a “source rate” <strong>and</strong> all communication channels have a “capacity”, both<br />
measurable in bits-per-second, <strong>and</strong> that the information can be transmitted over the<br />
channel if <strong>and</strong> only if the capacity of the channel is not exceeded by the source rate. He<br />
also published works related to cryptography <strong>and</strong> the reliability of relay circuits, both<br />
with respect to transmission in noisy channels. xvii<br />
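Shannon's rate-versus-capacity condition can be illustrated numerically. The sketch below uses the standard Shannon–Hartley formula C = B·log2(1 + S/N), which is not quoted in this manual but is the textbook capacity formula for a noisy analog channel; the function names are ours, for illustration:

```python
import math

def channel_capacity(bandwidth_hz, signal_to_noise_ratio):
    """Shannon-Hartley capacity in bits per second: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + signal_to_noise_ratio)

def can_transmit(source_rate_bps, capacity_bps):
    """Shannon's condition: reliable transmission iff rate <= capacity."""
    return source_rate_bps <= capacity_bps

# Example: a 3 kHz line with a signal-to-noise ratio of 1000 (30 dB)
# has a capacity of roughly 3000 * log2(1001), about 29,900 bits/s,
# so a 20,000 bits/s source fits but a 40,000 bits/s source does not.
```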
In 1937, George Stibitz, a Bell Labs researcher, created the first electromechanical circuit<br />
that could perform binary addition, built from old relays, batteries, flashlight bulbs, wires<br />
and tin strips. He realized that Boolean logic could be used for electromechanical<br />
telephone relays. He incorporated this binary adder (pictured on the left with Stibitz)<br />
prototype in his Model K digital calculator. Over the next two years, Stibitz <strong>and</strong> his<br />
associates at Bell Labs devised a machine to perform all four basic math operations on<br />
complex numbers. It was initially called the Complex Number Calculator but was<br />
renamed the Bell Labs Model Relay Computer (also known as the Bell Labs Model 1) in<br />
1949. This machine is considered to be one of the world's first digital computers. Its<br />
electromechanical brain consisted of 450 telephone relays <strong>and</strong> 10 crossbar switches, <strong>and</strong><br />
three teletypewriters provided input to the machine. It could find the quotient of two<br />
eight-place complex numbers in about 30 seconds. In 1940, Stibitz brought one of the<br />
teletypewriters to an American Mathematical Association meeting at Dartmouth and<br />
performed the world's first demonstration of remote computing, using phone lines to<br />
communicate with the Complex Number Calculator, which was in New York. xviii<br />
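The binary addition that Stibitz built from Boolean relay logic can be sketched directly: each stage computes a sum bit and a carry bit, and chained stages add multi-bit numbers. The Python model below is an illustration of the principle, not a description of the Model K's actual wiring:

```python
# Binary addition from Boolean logic, the principle behind a relay adder:
# each stage is a full adder, and stages are chained by their carries.

def full_adder(a, b, carry_in):
    """One full-adder stage built from AND/OR/XOR logic."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def ripple_add(x_bits, y_bits):
    """Add two equal-length little-endian bit lists, stage by stage."""
    carry, result = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)  # final carry-out becomes the top bit
    return result

# 3 (= [1, 1] little-endian) + 1 (= [1, 0]) -> [0, 0, 1], i.e. 4
```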
In 1937, Alan Turing (1912 - 1954) published his<br />
paper “On Computable Numbers, with an<br />
application to the Entscheidungsproblem (decision<br />
problem)”. In this paper, he introduced the Turing<br />
Machine, which was an abstract machine capable of<br />
reading or writing symbols <strong>and</strong> moving between<br />
states, dependent upon the symbol read from a bidirectional,<br />
movable tape, using a set of finite rules<br />
listed in a finite table. This machine demonstrated<br />
that every method found for describing ‘well-defined<br />
procedures’, introduced by other<br />
mathematicians, could be reproduced on a Turing<br />
machine. This statement is known as the Church-<br />
Turing thesis <strong>and</strong> is a founding work of modern<br />
computer science, which defined computation <strong>and</strong><br />
its absolute limitation. His definition of computable<br />
was that a problem is ‘Calculable by finite means’.<br />
In his 1938 Ph.D. thesis, which was published as “Systems of Logic based on Ordinals”<br />
in 1939, Turing addressed uncomputable problems.<br />
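The Turing Machine just described, with its finite rule table, movable tape, and state transitions, can be sketched in a few lines. The bit-flipping machine below is our own toy example, chosen only to show the mechanics:

```python
# A minimal Turing-machine sketch: a finite rule table, a bidirectional
# movable tape, and a read/write head that moves between states.

def run_turing_machine(rules, tape, state="start", blank="_", steps=1000):
    tape = dict(enumerate(tape))  # sparse tape, extendable in both directions
    head = 0
    for _ in range(steps):
        symbol = tape.get(head, blank)
        if (state, symbol) not in rules:
            break  # no applicable rule: the machine halts
        write, move, state = rules[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape))

# Rule table for a toy machine: in state 'start', flip the bit, move right.
flip_rules = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
}
# run_turing_machine(flip_rules, "1011") -> "0100"
```

Any 'well-defined procedure' in Turing's sense can, in principle, be written as such a rule table; that is the content of the Church-Turing thesis.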
During World War II, Turing worked at Bletchley Park,<br />
the British government's wartime communications<br />
headquarters. His main task was to master the Enigma<br />
(pictured right), the German enciphering machine,<br />
which he was able to crack, providing the Allies with<br />
valuable intelligence. His contributions made him a<br />
chief scientific figure in the fields of computation <strong>and</strong><br />
cryptography. After the war, he was interested in the<br />
comparison of the power of computation <strong>and</strong> the power<br />
of the human brain. He proposed the possibility that a<br />
computer, if properly programmed, could rival the<br />
human mind. In 1950, Turing wrote his famous paper<br />
"Computing Machinery <strong>and</strong> Intelligence," which, along<br />
with his previous work, founded the study of ‘Artificial<br />
Intelligence’. This paper introduces ‘the imitation<br />
game’, which is a test to determine if a computer<br />
program has intelligence. This game is now referred to<br />
as the Turing Test. Turing describes the original<br />
imitation game as:<br />
“The new form of the problem can be described in terms of a game which we call the<br />
‘imitation game.’ It is played with three people, a man (A), a woman (B), <strong>and</strong> an<br />
interrogator (C) who may be of either sex. The interrogator stays in a room apart from<br />
the other two. The object of the game for the interrogator is to determine which of the<br />
other two is the man <strong>and</strong> which is the woman. He knows them by labels X <strong>and</strong> Y, <strong>and</strong> at<br />
the end of the game he says either "X is A <strong>and</strong> Y is B" or "X is B <strong>and</strong> Y is A." The<br />
interrogator is allowed to put questions to A <strong>and</strong> B.”<br />
The idea in the Turing Test is that the interrogator (C) is actually communicating with a<br />
human (A) and a machine (B). The interrogator asks the two candidates questions to<br />
decide their identities, as above with the man and woman. In order to prove that its<br />
program is intelligent, the machine must fool the interrogator into choosing it as the<br />
human. xix<br />
Between 1937 and 1938, John Vincent Atanasoff (far left) and Clifford Berry devised<br />
the principles for the ABC machine (right), an electronic digital machine that would lead<br />
to advances in digital computing machines. This non-programmable binary machine’s<br />
construction began in 1941<br />
but was stopped in 1942 due to World War II before<br />
becoming operational. This machine employed capacitors to<br />
store electrical charge that could correspond to numbers in<br />
the form of logical 0’s <strong>and</strong> 1’s. This was the first machine to<br />
demonstrate electronic techniques in calculation <strong>and</strong> to use<br />
regenerative memory. It contained 300 vacuum tubes in its<br />
arithmetic unit <strong>and</strong> 300 more in its control unit. The capacitors were affixed inside of 12-<br />
inch tall by 8-inch diameter rotating Bakelite (a thermosetting plastic) cylinders (shown<br />
below) with metal contact b<strong>and</strong>s on their outer surface. Each cylinder contained 1500<br />
capacitors <strong>and</strong> could store 30 binary numbers, 50 bits in length, which could be read from<br />
or written to the metal b<strong>and</strong>s of the rotating cylinder. The input data was loaded on<br />
punched cards. Intermediate data was also stored on punched cards by burning small<br />
spots onto the cards with electric sparks, which could be re-read by the computer at some<br />
later time by detecting the difference in electrical resistance of the carbonized burned<br />
spots. This machine could also convert from binary to decimal <strong>and</strong> vice versa. xx<br />
In 1943, the U.S. Army contracted with the Moore School of Electrical Engineering,<br />
University of Pennsylvania, for the production of the Electronic Numerical Integrator and<br />
Computer (ENIAC), designed by J. Presper Eckert (1919-1995) and John Mauchly<br />
(1907-1980), which would be used to calculate ballistic tables. The 30-ton<br />
machine, with approximately 18,000 vacuum tubes, was completed in 1946 and was<br />
contained in a 30’ by 50’ room.<br />
The ENIAC was a general-purpose digital electronic computer that could call<br />
subroutines. It could reliably perform 5,000 additions or 360 multiplications per second,<br />
which was between 100 <strong>and</strong> 1000 times faster than existing technology. At the time of<br />
its introduction, ENIAC was the world’s largest single electronic apparatus. This<br />
machine was separated into thirty autonomous units. Twenty of these were accumulators,<br />
which were ten-digit, high-speed adding machines with the ability to store results. These<br />
accumulators used electronic circuits called ring counters, a loop of bistable devices (flip-flops)<br />
interconnected in such a manner that only one of the devices may be in a specified<br />
state at one time, used to count each of its digits from 0 to 9 (a decimal arithmetic<br />
unit). The machine also had a multiplier and a divider-square rooter, which were special<br />
devices to accelerate their respective arithmetic operations.<br />
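The ring-counter scheme can be sketched in modern terms. The model below is purely illustrative (the class and its names are invented for this sketch, not ENIAC circuitry): ten bistable stages, exactly one of which is “on” at any time, with a carry signalled on each wrap from 9 back to 0.<br />

```python
# Illustrative model of a decimal ring counter: ten bistable stages,
# only one "on" at a time; advancing past 9 wraps to 0 and emits a carry.
class RingCounter:
    def __init__(self, stages=10):
        self.stages = stages
        self.position = 0  # index of the single stage currently "on"

    def pulse(self):
        """Advance one stage; return True when wrapping (a carry pulse)."""
        self.position = (self.position + 1) % self.stages
        return self.position == 0

counter = RingCounter()
carries = sum(counter.pulse() for _ in range(25))
# after 25 pulses the counter reads 5 and has carried twice (at pulses 10 and 20)
```

Chaining the carry pulse of one counter to the input of the next gives a multi-digit decimal register, which is essentially how an accumulator counted.<br />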
A “computer program” on ENIAC was entered by using wires to connect different units<br />
of the machine so as to perform operations in a required sequence. The picture on the<br />
left shows two women entering a program, which was a very difficult task. The machine<br />
was controlled by a sequence of electronic pulses, in which each unit of the machine<br />
could issue a pulse to cause one or more other units to perform a computation. The<br />
control and data signals on ENIAC were identical, typically 2-microsecond pulses placed<br />
at ten-microsecond intervals, which allowed the output<br />
of an accumulator to be attached to the input of a control line of another accumulator.<br />
This allowed data-sensitive operations, or operations based on data content. It also<br />
had a unit called the “Master Programmer”, which performed nested loops or iterations.<br />
ENIAC’s units could operate simultaneously, performing parallel calculations.<br />
Eventually this machine could perform IF-THEN conditional branches; it is likely that<br />
this was the first machine with this operation. xxi<br />
In 1944, because of suggested improvements from people involved with the project, the<br />
U.S. Army extended the ENIAC project to include research on the Electronic Discrete<br />
Variable Automatic Computer (EDVAC), a stored-program computer. At about this<br />
time, John von Neumann (1903-1957) visited the Moore School to take part in<br />
discussions regarding EDVAC’s design. He is best known for producing the<br />
best-recognized formal description of a modern computer, based on a stored-program<br />
design and known as the von Neumann architecture, in his 1945 paper “First Draft of a<br />
Report on the EDVAC”. The basic elements of this architecture are:<br />
• A memory, which contains both data and instructions and also allows both data<br />
and instruction locations to be read from, and written to, in any order.<br />
• A calculating unit, which can perform both arithmetic and logical operations on<br />
the data.<br />
• A control unit, which can interpret retrieved memory instructions and select<br />
alternative courses of action based on the results of previous operations.<br />
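The three elements above can be sketched as a toy stored-program machine. This is a hypothetical illustration only: the opcodes, the instruction layout and the program are invented for the sketch, not taken from any machine described in this chapter.<br />

```python
# Toy von Neumann cycle: one memory list holds both instructions (tuples)
# and data (plain numbers); the control loop fetches, decodes and executes.
def run(memory):
    acc, pc = 0, 0                     # accumulator and program counter
    while True:
        op, addr = memory[pc]          # fetch the next instruction
        pc += 1
        if op == "LOAD":               # decode and execute
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "JZ" and acc == 0:  # a conditional branch, per the control unit
            pc = addr
        elif op == "HALT":
            return memory

# cells 0-3 are instructions, cells 5-7 are data, all in the same memory
program = [("LOAD", 5), ("ADD", 6), ("STORE", 7), ("HALT", 0), None, 2, 3, 0]
result = run(program)                  # result[7] now holds 2 + 3 = 5
```

Because instructions live in the same readable, writable memory as data, a program can in principle modify itself, which is exactly what distinguishes this design from ENIAC’s plugboard wiring.<br />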
The EDVAC was a multipurpose binary computing machine with a memory capacity of<br />
1,000 words, more than any other computing device of its time. Its memory worked by<br />
using mercury delay lines, tubes of mercury in which electrical impulses were bounced<br />
back and forth, creating a two-state device for storing 0’s and 1’s, which could be<br />
assigned or retrieved at will. It used 12 of 16 possible 4-bit instructions, and each word<br />
in memory had 44 bits. The integer range was ±(1 − 2^-43), and the floating-point<br />
numbers had a 33-bit mantissa, a 10-bit exponent and 1 sign bit, with a range of<br />
±(1 − 2^-33)·2^511. It had approximately 10,000 crystal diodes and 4,000 vacuum<br />
tubes. Its average error-free up-time was about 8 hours. Its magnetic drum could hold<br />
4,608 words of 48 bits, with a block transfer length of between 1 and 384 words. It also<br />
had a magnetic tape storage system that could store 112 characters per inch on a<br />
magnetic wire between 1,250 and 2,500 feet long, with a variable block length of<br />
between 2 and 1,024 words, also 48 bits long. During searches of the tape the machine<br />
could be released for computation, and data read from the tape could be automatically<br />
re-recorded to the same place on the tape. EDVAC’s input devices consisted of a<br />
photoelectric tape reader that could read 78 words per second and an IBM card reader<br />
that could read 146 cards per minute at 8 words per card. The output devices were a 30<br />
word per minute paper tape perforator, a 30 word per minute teletypewriter and a 1,000<br />
word per minute cardpunch. This machine had a clock speed of 1 MHz and was a<br />
significant improvement over ENIAC. xxii<br />
Thomas Flowers and crew started construction on the Mark 1 COLOSSUS computer in<br />
1943 at the Dollis Hill Post Office Research Station in the U.K. Max Newman and<br />
associates at Bletchley Park (“Station X”), Buckinghamshire, designed this machine,<br />
which was primarily intended for<br />
cryptanalysis of the German Fish teleprinter ciphers used during World War II. This<br />
electromechanical attempt at a one-time pad was the German military’s most secure<br />
method of communication. Prior to knowledge of Zuse’s Z3, this was considered to be<br />
the first totally electronic computing device, using only vacuum tubes as opposed to the<br />
relays in the Z3. This special-purpose computer was equipped with very fast optical<br />
punched-tape readers for input. Nine of the improved Mark II machines were constructed<br />
and the original COLOSSUS Mark I was converted, for a total of ten machines. These<br />
machines were considered to be of the highest level of secrecy. After the end of the war,<br />
by direct orders from Churchill, all ten machines were destroyed—reduced into pieces no<br />
larger than a man’s hand. The COLOSSUS, the Heath Robinson (a precursor to the<br />
COLOSSUS) and the Bombe (a machine designed by Alan Turing) are all in the process<br />
of reconstruction to preserve these important achievements.<br />
The Universal Automatic Computer I (UNIVAC I) was designed by J. Presper Eckert and<br />
John Mauchly in 1947. The machine, constructed by the Eckert-Mauchly Computer<br />
Corporation, founded by Eckert and Mauchly in 1946 but later purchased by Sperry<br />
Rand, was delivered to the U.S. Census Bureau in 1951 at a cost of $159,000. By 1953,<br />
three UNIVACs were in operation, and by 1958 there were forty-six in the service of<br />
government departments and private organizations. Rand sold the later machines for<br />
more than $1,000,000 each.<br />
UNIVAC’s input consisted of a 12,800 character per second magnetic tape reader, a 240<br />
card per minute card-to-tape converter and a punched paper tape to magnetic tape<br />
converter. Its output consisted of a 12,800 character per second magnetic tape unit, a<br />
120 card per minute card-to-tape converter, a 10 character per second character printer, a<br />
Uniprinter (a 600 line per minute high-speed line printer developed by Earl Masterson in<br />
1954) and a 60 word per minute Rad Lab buffer. This was the first machine to use a<br />
buffered memory. It had 5,200 vacuum tubes, 18,000 crystal diodes and 300 relays, and<br />
contained a mercury delay line memory that could hold 1,000 words of 72 bits (11<br />
decimal digits plus sign). The 8-ton, 25 by 50 foot machine consumed 125,000 watts of<br />
power—over 300 times as much as a desktop computer (the average desktop consumes<br />
less than 400 watts). It could perform 1,900 additions, 465 multiplications or 256<br />
divisions per second. The machine also had a character set, similar to a typewriter<br />
keyboard, with capital letters. In 1956 a commercial UNIVAC computer was introduced<br />
that used transistors.<br />
In 1943, the Massachusetts Institute of Technology (MIT) started the Whirlwind Project,<br />
under the supervision of Jay Forrester, for the U.S. Navy, after determining that it was<br />
possible to produce a computer to run a flight simulator for training bomber crews.<br />
Initially, they attempted to use an analog machine but found that it was neither flexible<br />
nor accurate. Another problem was that the typical batch-mode computers of the day were<br />
not computationally sufficient for time-constrained processing, because they could not<br />
continually operate on continually changing input. Whirlwind also required much more<br />
speed than typical computational systems. The design of this high-speed stored-program<br />
computer was completed by 1947, and 175 people started construction in 1948. The<br />
system was completed in three years, when the U.S. Air Force picked it up because the<br />
Navy had lost interest, renaming it Project Claude. This machine was too slow, and<br />
improvements were implemented to increase performance. The initial machine used<br />
Williams tubes, cathode ray tubes used to store electronic data, which were unreliable<br />
and slow. Forrester expanded on the work of An Wang, who created the pulse<br />
transfer-controlling device in 1949. The product was magnetic core memory (upper left),<br />
which permanently stores binary data on tiny donut-shaped magnets strung together by a<br />
wire grid. This approximately doubled the memory speed of the new machine, completed<br />
in 1953. Whirlwind was the world’s first real-time computer and the first computer to<br />
use the cathode ray tube, which at this time was a large oscilloscope screen, as a video<br />
monitor for an output device.<br />
The new machine was used in the Semi-Automatic Ground Environment (SAGE), which<br />
was manufactured by IBM and became operational in 1958. The picture on the right<br />
shows a SAGE terminal. This system coordinated a complex system of radar, telephone<br />
lines, radio links, aircraft and ships. It could identify and detect aircraft when they<br />
entered U.S. airspace. SAGE was contained in a 40,000 square foot area for each<br />
two-system installation, had 30,000 vacuum tubes, had a 4K-word by 32-bit magnetic<br />
drum memory and used 3 megawatts of power. In 1958, the Whirlwind project was also<br />
extended to include an air traffic control system. The last Whirlwind-based SAGE<br />
computer was in service until 1983. xxiii<br />
In 1946, work started on the Electronic Delay Storage Automatic Calculator (EDSAC), a<br />
serial electronic calculating machine, at Cambridge. It was contained in a 5 by 4 meter<br />
room, had 3,000 valves, consumed 12,000 watts and could perform 650 instructions per<br />
second at 500 kHz. Its mercury ultrasonic delay line memory could hold 1,024 words of<br />
17 bits (35-bit “long” numbers could be contained by using two adjacent memory<br />
“tanks”), and it had an “Operating System” (called “initial orders”) that was stored in 31<br />
words of read-only memory. The input device was a 6⅔ character per second 5-track<br />
teleprinter paper tape reader, and output was performed on a 6⅔ character per second<br />
teleprinter. A commercial version of EDSAC, called LEO, which was manufactured by<br />
the Lyons Company, began service in 1953. Cambridge was the first university in the<br />
world to offer a Diploma in Computer Science, using EDSAC, which was initially a<br />
one-year postgraduate course called Numerical Analysis and Automatic Computing. xxiv<br />
In 1948, at the University of Manchester in England, the Small-Scale Experimental<br />
Machine, nicknamed the “Baby”, successfully executed its first program, becoming the<br />
world’s first stored-program electronic digital computer. Frederic C. Williams (1911-<br />
1977) and Tom Kilburn (1921-2001) built the machine to test the Williams-Kilburn<br />
Tube (a type of memory that stored each bit of information as an illuminated point on<br />
the screen of a cathode ray tube) for speed and reliability, and to demonstrate the<br />
feasibility of a stored-program computer. Its success prompted the development of the<br />
Manchester Mark I, a useable computer based on the same principles. The picture shows<br />
the “Baby” (replica), the shortest cabinet at the right, and the Mark I, the six taller<br />
cabinets.<br />
The picture on the left shows Williams and Kilburn at the console of the Manchester<br />
Mark I. It was built in 1949 and could store data in addressable “lines”, each holding<br />
one 40-bit number or two 20-bit instructions, and had two 20-bit address modifier<br />
registers, called “B-lines” (for modifying addresses in instructions), which functioned<br />
either as index registers or as base address registers. This Mark I was of historical<br />
significance because it was the first machine to include this index/base register in its<br />
architecture, which was a very important improvement. It was the first random access<br />
memory computer. It could perform serial 40-bit arithmetic, with hardware add, subtract<br />
and multiply (with an 80-bit double-length accumulator) and logical instructions. The<br />
average instruction time was 1.8 milliseconds (about 550 additions per second), with<br />
multiplication taking much longer. It had a single-address format order code with about<br />
30 function codes. The machine used two Williams tubes for its 128 words of memory.<br />
Each tube contained 64 rows with 40 points (bits) per row, which was two “pages” (a<br />
page was an array of 32 by 40 points). It also had a 128-page capacity drum backing<br />
store, 2 pages per track, with about 30 milliseconds revolution time, on 2 drums (each<br />
drum could hold up to 32<br />
tracks, i.e. 64 pages).<br />
The machine’s peripheral instructions included a “read” from a 5-hole paper tape reader,<br />
on which the code was normally entered, and a “transfer” of a page or track to or from a<br />
Williams-Kilburn Tube page or pair of pages in storage. It also had a bank of 40 (8 by 5)<br />
buttons that could be used to set the ones in a word in storage. There were also additional<br />
switches that controlled the operations of the Mark I. The current storage contents could<br />
be viewed on the machine’s display tube, shown on the left, which was organized into 8<br />
columns of 5-bit groups. There was a direct correspondence between the symbols, each<br />
made up of a 5-bit group, on the punched tape and the symbols on the display tube. The<br />
government awarded the contract to mass-produce Mark I computers to Ferranti Ltd.;<br />
the resulting Ferranti Mark I was the world’s first commercially available computer.<br />
Kilburn wrote the first electronically stored computer program for the Mark I and also<br />
established the world’s first university computer science department at Manchester. xxv<br />
There were substantial improvements in computer programming and user interface<br />
design as well as hardware architecture. In 1949, John Mauchly (of ENIAC and<br />
UNIVAC) developed Short Order Code, thought to be the first high-level language, for<br />
the Binary Automatic Computer (BINAC). The BINAC, completed in 1949, was<br />
designed for Northrop Aviation and was the first computer to use magnetic tape. In<br />
1951, David Wheeler, Maurice Wilkes and Stanley Gill introduced sub-programs and the<br />
“Wheeler jump” to implement them, by jumping to a different section of instructions and<br />
returning to the original section after the sub-program is finished. Maurice Wilkes also<br />
originated the concept of micro-programming, a technique for providing an orderly<br />
approach to designing the control section of a computer system.<br />
In 1951, while working with the UNIVAC I mainframe, Betty Holberton (left) created<br />
the sort-merge generator, a predecessor to the compiler and possibly the first useful<br />
program with the capability of generating other programs for the UNIVAC I, and<br />
developed the C-10 instruction code, which controlled its core functions. The C-10<br />
instruction code allowed UNIVAC to be controlled by control console (keyboard)<br />
commands instead of switches, dials and wires, which made the system much more<br />
useful and human-friendly. The code was designed to use mnemonic characters to input<br />
instructions, such as ‘a’ for add. She later was the chairperson for the<br />
committee that established the standards for the Common Business Oriented Language<br />
(COBOL). xxvi<br />
In 1952, Grace Murray Hopper developed A-0, believed to be the first real compiler:<br />
an intermediary program that converts symbolic mathematical code into a sequence of<br />
instructions that can be executed by a computer. This allowed the use of specific call<br />
numbers assigned to the collected programming routines stored on magnetic tape, which<br />
the computer could find and execute. In the same year she developed a compiler for<br />
business use, B-0 (later renamed FLOW-MATIC), that could translate English terms, and<br />
she wrote a paper describing the use of symbolic English notation to program computers,<br />
which is much easier to use than the machine code previously employed. While working<br />
on the UNIVAC I, she encouraged programmers to reuse common pieces of code that<br />
were known to work well, reducing programming errors. She was on the CODASYL<br />
Short Range Committee to define the basic COBOL language design, which appeared in<br />
1959 and was greatly influenced by FLOW-MATIC. COBOL was launched in 1960 and<br />
was the first standardized computer programming language for business applications.<br />
Various computer manufacturers and the Department of Defense supported development<br />
of the standard. It was intended to solve business problems, to be machine independent<br />
and to be updatable. COBOL has been updated and improved over the years, and is still<br />
used today. Hopper spent many years contributing to the standardization of compilers,<br />
which eventually led to international and national standards and validation facilities for<br />
many programming languages. xxvii<br />
In 1956, John Backus and his IBM team created FORTRAN (short for FORmula<br />
TRANslation). The initial compiler consisted of 25,000 lines of machine code, which<br />
could be stored on magnetic tape. Backus and his team wrote the paper “Preliminary<br />
Report, Specifications for the IBM Mathematical FORmula TRANslating System,<br />
FORTRAN” to communicate their work and to show that scientists and mathematicians<br />
could program without actually understanding how the machines worked and without<br />
knowing assembly language. FORTRAN works by using a software<br />
device called a translator, which contains a parser, to translate the high-level language<br />
that people can read into a binary language that the computer can execute. A later<br />
version of FORTRAN is still in use today, over 40 years later. Backus also developed a<br />
standard notation, Backus-Naur Form (BNF), to describe a computer language formally<br />
and unambiguously. BNF uses grammar-like rules to describe a language.<br />
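As an illustration of how such rules work (the grammar below is invented for this example, not drawn from Backus’s report), a signed integer can be defined in BNF and recognized by code that follows the rules directly:<br />

```python
# An example grammar in BNF notation (invented for illustration):
#   <signed>  ::= <integer> | "-" <integer>
#   <integer> ::= <digit> | <digit> <integer>
#   <digit>   ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
# Each function below implements exactly one rule of the grammar.
def is_digit(s):
    return len(s) == 1 and "0" <= s <= "9"

def is_integer(s):
    # a <digit> alone, or a <digit> followed by another <integer>
    return is_digit(s) or (len(s) > 1 and is_digit(s[0]) and is_integer(s[1:]))

def is_signed(s):
    return is_integer(s[1:]) if s.startswith("-") else is_integer(s)
```

The one-rule-per-function structure is the essence of recursive-descent parsing, which is why BNF became the standard way to specify programming languages.<br />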
In 1947, a major event occurred in electronics and computation. John Bardeen, Walter<br />
Brattain and William Shockley (pictured in order on the left) announced that they had<br />
developed the transistor, for which they were awarded the Nobel Prize in 1956. This<br />
invention ushered in a new era in computers. First-generation computers used vacuum<br />
tubes as their principal digital circuits. Vacuum tubes generated heat, consumed<br />
electrical power and quickly burned out, requiring frequent maintenance. They were<br />
also used in telecommunications to amplify long-distance phone calls, which was the<br />
reason for this team’s research. Transistors can switch and modulate electronic current,<br />
and are composed of a semiconductor that can both conduct and insulate, such as<br />
germanium or silicon. The transistor can act as a transmitter, by converting sound waves<br />
into electronic waves, and as a resistor, by controlling electrical current. In 1954, Texas<br />
Instruments lowered the cost of production by introducing silicon transistors. The<br />
transistor brought about the second generation of computers by replacing vacuum tubes<br />
with solid-state components, which began the semiconductor revolution. xxviii Philco<br />
Corporation engineers developed the surface-barrier transistor in 1954, the first transistor<br />
suitable for use in high-speed computers. In 1957, Philco completed the TRANSAC<br />
S-2000—the first large-scale, fully transistorized scientific computer to be offered as a<br />
manufactured product. xxix<br />
In 1957, the Burroughs Atlas computer, constructed at the Great Valley Research<br />
Laboratory outside of Philadelphia, was one of the first to use transistors. The machine<br />
was developed for the American air defense system deployed during the 1950’s and was<br />
the ground guidance computer for the Atlas intercontinental ballistic missile (ICBM).<br />
The first launch was in 1958. The system had two memory areas, one for data with 256<br />
24-bit words and one for instructions with 2,048 18-bit words. There were 18 Atlas<br />
computers constructed, costing $37 million. xxx<br />
After the launch of Sputnik (a NASA-recreated model is pictured on the left) by the<br />
U.S.S.R. in 1957, the U.S. government responded by forming the Advanced Research<br />
Projects Agency (ARPA) to ensure technological superiority by expanding new frontiers<br />
of technology beyond immediate requirements. Initially, ARPA’s mission concerned<br />
issues including space, ballistic missile defense and nuclear test detection. The major<br />
contribution that ARPA made to computer technology was the Advanced Research<br />
Projects Agency Network (ARPANET).<br />
In 1960, Paul Baran of the RAND Corporation published studies on secure<br />
communication technologies that would allow military communications to continue<br />
operating after a nuclear attack. He formulated two important ideas that outline the<br />
packet-switching principle for data communications:<br />
1. Use a decentralized network having multiple paths between any two points, which<br />
allows the system to recover automatically from single points of failure.<br />
2. Divide complete user messages into blocks before sending them into the network.<br />
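The second idea can be sketched directly. This is a simplified illustration (real packets carry far more than a sequence number): the message is cut into numbered blocks that may arrive in any order and are reassembled at the destination.<br />

```python
# Sketch of Baran's message blocking: number each fixed-size block so the
# destination can reassemble the message even if blocks arrive out of order.
import random

def to_blocks(message, size=8):
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(blocks):
    # sort by sequence number, then rejoin the data fields
    return "".join(data for _, data in sorted(blocks))

blocks = to_blocks("This message crosses the network in blocks.")
random.shuffle(blocks)            # blocks may take different paths and reorder
restored = reassemble(blocks)     # the destination recovers the original text
```

Combined with the first idea, each numbered block can also take a different path through the decentralized network, which is exactly what makes the scheme survivable.<br />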
In 1961, Leonard Kleinrock performed research on “store and forward” messaging,<br />
where messages are buffered completely on a switch or router, checksummed to find<br />
whether an error exists in the message, and then sent to the next location. In 1962,<br />
J.C.R. Licklider of MIT discussed the “Galactic Network” concept in a series of memos.<br />
These computer network ideas represent the same type of general communication<br />
system as is used in the Internet. The same year that he wrote these memos, Licklider<br />
was working at ARPA and was able to convince others that this was an important idea.<br />
In 1966, Lawrence G. Roberts of MIT was brought in to head the ARPANET project<br />
and build the network. Roberts’ “plan for the ARPANET” was introduced at a<br />
symposium in<br />
1967, which included a time-sharing scheme using smaller computers to facilitate<br />
communication between larger machines, as suggested by Wesley Clark. An updated<br />
plan was completed in 1968, which included packet switching. The contract to construct<br />
the network was awarded to Bolt, Beranek and Newman in early 1969. The first<br />
connected network consisted of four nodes at UCLA, the Stanford Research Institute,<br />
UCSB and the University of Utah. It was completed in December 1969. The<br />
ARPANET was the world’s first operational packet-switched network. Packet switching<br />
was a new concept that allowed more than one machine to access one channel to<br />
communicate with other machines. Previously these channels were switched and only<br />
allowed one machine to communicate with one other machine at a time. By 1973, the<br />
University College of London in England and the Royal Radar Establishment in Norway<br />
had connected to the ARPANET, making it an international network.<br />
With the advent of computer internetworking came new innovations to facilitate<br />
communication between machines. One innovation, formulated by Robert Kahn and<br />
Vint Cerf, was to make host computers responsible for reliability, instead of the network<br />
as was done in the initial ARPANET. This minimized the role of the network, which<br />
made it possible to connect networks and machines with different characteristics, and<br />
led to the development of the Transmission Control Protocol (TCP), to check, track and<br />
correct transmission errors, and the Internet Protocol (IP), to manage packet switching.<br />
The TCP/IP suite is arranged as a layered set of protocols, called the TCP/IP stack,<br />
which defines each layer’s responsibilities in the connectionless transmission of data<br />
and the interfaces that allow the passing of data between layers. Because the interfaces<br />
between layers are standardized and well defined, development of hardware and<br />
software is possible for different purposes and from different architectures. The TCP/IP<br />
protocols replaced the Network Control Protocol (NCP), the original ARPANET<br />
protocol, and the military part of ARPANET was separated, forming MILNET, in 1983.<br />
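The layering idea can be sketched as nested envelopes. This is a schematic illustration only: the layer names follow the usual TCP/IP model, but the bracketed header strings are invented (real headers are binary structures). Each layer wraps the data handed down to it, and the receiving stack unwraps in the opposite order.<br />

```python
# Schematic protocol layering: each layer adds (then later removes) only
# its own header, so layers can be designed and replaced independently.
LAYERS = ["application", "transport", "internet", "link"]

def send(payload):
    for layer in LAYERS:                 # wrap from the top layer down
        payload = f"[{layer}]{payload}"
    return payload                       # what travels "on the wire"

def receive(frame):
    for layer in reversed(LAYERS):       # unwrap from the outside in
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame

wire = send("hello")   # "[link][internet][transport][application]hello"
msg = receive(wire)    # the original payload, "hello"
```

Because each layer touches only its own header, a link layer for one medium can be swapped for another without changing TCP or IP above it, which is the property the text describes.<br />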
The initial network restricted commercial activities because it was government funded.<br />
In the early 1970’s, message exchanges that were initially available on mainframe<br />
systems became available across wide area networks. In 1972, Ray Tomlinson<br />
introduced the “name@computer” addressing scheme to simplify e-mail messaging,<br />
which is still in use today. In 1972, the Telnet standard for terminal emulation over<br />
TCP/IP networks, which allows users to log onto a remote computer, was introduced. It<br />
enables users to enter commands on offsite computers, executing them as if they were<br />
using the remote system’s own console. In 1973, the File Transfer Protocol (FTP) was<br />
developed to facilitate the long-distance transfer of files across computer networks. The<br />
Unix User Network (Usenet) was created in 1979 to facilitate the posting and sharing of<br />
messages, called “articles”, on network-distributed bulletin boards, called “newsgroups”.<br />
In the mid-1980’s the Domain Name System introduced Domain Name Servers to<br />
simplify machine identification. Instead of using a machine’s IP address, such as<br />
“10.192.20.128”,<br />
a user need only remember the machine’s domain name, such as “thismachine.net”. By<br />
1982, commercial e-mail service was available in 25 cities and the term “Internet” was<br />
designated to mean a “connected set of computer networks”. In 1983, the complete<br />
change to TCP/IP created a truly global “Internet”.<br />
The National Science Foundation (NSF) became involved in ARPANET in the mid<br />
1980’s. In 1986, the NSFNet backbone was started to connect and provide access to<br />
supercomputers. In the late 1980’s, the Department of Defense stopped funding for<br />
ARPANET, and the NSF assumed responsibility for long-haul connectivity in 1989.<br />
The first Internet Service Provider (ISP) companies also appeared, servicing regional<br />
research networks and providing access to e-mail and Usenet news for the public. The<br />
NSF initiated the connection of regional TCP/IP networks and the Internet began to<br />
emerge. In the 1990’s, commercial activity was allowed and the Internet grew rapidly.<br />
Eventually, this commercial activity created competition, and commercial regional<br />
providers, called Network Access Points (NAPs), took over backbones and<br />
interconnections, causing NSFNet to be dropped and all existing commercial restrictions<br />
to be removed.<br />
In 1989, Tim Berners-Lee invented the Uniform Resource Locator (URL) and Hypertext Markup Language (HTML), which were inspired by Vannevar Bush's "memex". The URL provides a simple way to find a specific document on the Internet by naming the machine, the document file and the protocol used to obtain and display the file. HTML is a method of formatting a document by embedding codes, which can also designate hypertext—text that can be "clicked" with a mouse pointer to cause some action or to retrieve another document. Eventually it became possible to place graphics and sound in documents, which started the World Wide Web (WWW) and many of the services now available on the Internet. By 1997, 150 countries and 15 million host computers were connected to the Internet, and 50 million people were using the World Wide Web. By 1990, approximately 9 million people sent over 2.3 billion e-mail messages. xxxi
In 1958, the ALGOrithmic Language (ALGOL) 58 high-level scientific programming language was formalized. Designed by an international committee to be a universal language, it was the first attempt at software portability through a machine-independent implementation. ALGOL is considered an important language because it influenced the development of future languages. Almost all languages since have been developed with "ALGOL-like" lexical and syntactic structures that have hierarchical, nested environments and control structures. ALGOL 60 had block structure for statements and the ability to call subprograms by name or by value. It also had if-then-else control statements, iteration and recursive ability. ALGOL has a small number of basic constructs, with non-restricted associated types, and rules to combine them into more complex constructs, some of which can produce values. ALGOL also had dynamic arrays with variable-specified subscript ranges, reserved words for key functions that could not be used as identifiers, and user-defined data types to fit particular problems. A sample ALGOL "Hello World!" program from the Web site referenced for this information, which runs on a Unisys A-series mainframe, is: xxxii
BEGIN
  FILE F (KIND=REMOTE);
  EBCDIC ARRAY E [0:11];
  REPLACE E BY "HELLO WORLD!";
  WHILE TRUE DO
    BEGIN
      WRITE (F, *, E);
    END;
END.
As of 1959, more than 200 programming languages had been created.
Between 1958 and 1959, both Texas Instruments and Fairchild Semiconductor Corporation were introducing integrated circuits (ICs). TI's Jack Kilby, an engineer with a background in transistor-based hearing aids, introduced the first IC (pictured left, from CNN), which was based on a germanium semiconductor. Soon after, one of Fairchild's founders and research engineers, Robert Noyce, produced a similar device based on a silicon semiconductor. The monolithic integrated circuit combined transistors, capacitors, resistors and all connective wiring on a single semiconductor crystal, or chip. Fairchild produced the first commercially available ICs in 1961. Integrated circuits quickly became the industry standard architecture for computers. Robert Noyce later founded Intel. Jack Kilby commented:

"What we didn't realize then was that the integrated circuit would reduce the cost of electronic functions by a factor of a million to one; nothing had ever done that for anything before." xxxiii
In 1960, Remington Rand UNIVAC delivered the Livermore Advanced Research Computer (LARC) to the University of California Radiation Laboratory, now called the Lawrence Livermore National Laboratory. This machine had four major cabinets that were approximately 20 feet long, 4 feet wide and 7 feet tall. One cabinet contained the I/O processor to route and control input and output, another held the computational unit, and the last two contained 16K of ferrite core memory. There were also twelve floating-head drums (rotating cylinders coated with a magnetic material, approximately 4 feet wide, 3 feet deep and 5 feet high) used as storage devices. Each drum could store 250,000 12-decimal-digit LARC words, for almost 3 megabytes across the 12 drums, served by two independent controllers for read and write operations. Eight tape units could each hold approximately 450,000 LARC words per tape reel, after deducting storage overhead. Its printer could print 600 lines per minute and had a 51-character alphanumeric set. There was a punch card reader and a control console with toggle switches to control the system (pictured above). The LARC performed decimal-mode arithmetic to 22 decimal digits and could perform 12-digit addition in 4 microseconds and 12-digit multiplication in 12 microseconds, with division taking a little longer. The machine used storage, shift and result registers to hold information during repetitive calculations. LARC's hardware was difficult to maintain due to its discrete nature, comprising individual transistors, resistors, capacitors and other electronic components. xxxiv
In November of 1960, Digital Equipment Corporation (DEC) started production of the world's first commercial interactive computer, the PDP-1 (left). The $120,000 machine's four cabinets measured approximately 8 feet in length. A DEC technical bulletin describes it as:

"...a compact, solid state general purpose computer with an internal instruction execution rate of 100,000 to 200,000 operations per second. PDP-1 is a single address, single construction, stored program machine with a word length of 18-bits operating in parallel on 1's complement binary numbers."

It had a memory of 4000 18-bit words. It was the first computer with a typewriter keyboard and a cathode-ray tube display monitor. It also had a light pen, which made it interactive, and a paper punch output device. Producing 50 of these machines made DEC the world's first mass producer of computers. xxxv
Between 1961 and 1962, Fernando Corbató of MIT developed the Compatible Time-Sharing System (CTSS) as part of Project MAC. It was one of the first time-sharing operating systems that allowed multiple users to share a single machine, the first system to have a text-formatting utility, and one of the first to have e-mail capabilities. Louis Pouzin developed RUNCOM for CTSS, the precursor of the UNIX shell script, which executed commands contained in a file and allowed parameter substitution. Multiplexed Information and Computing Service (Multics), the operating system that led to the development of UNIX, was also developed by Project MAC. This system was the successor to CTSS and was used for multiple-access computing. xxxvi
In 1962, the Telstar I communications satellite was launched and relayed the first transatlantic television signals. The black-and-white image of an American flag was relayed from a large antenna in Andover, Maine to the Radome in Pleumeur-Bodou, France. This was the first satellite built for active communications, and it demonstrated that a worldwide communication system was feasible. The satellite was launched by NASA from Cape Canaveral, Florida, weighed 171 pounds and was 34 inches in diameter. On the same day, Telstar I relayed the first long-distance satellite phone call. The satellite was in service until 1963. As of 2002, there were 260 active satellites in Earth's orbit.
In late 1962, the Atlas computer (left) entered service at the University of Manchester, England. This was the first machine to have pipelined instruction execution, virtual memory and paging, and separate fixed- and floating-point arithmetic units. At the time it was the world's most powerful computer, capable of about 200,000 FLOPS. It could perform the following arithmetic operations (approximate times):

• Fixed-point addition in 1.59 microseconds
• Floating-point addition in 1.61 microseconds
• Floating-point multiplication in 4.97 microseconds

The machine could timeshare between different peripheral and computing operations, was capable of multiprogramming, and had interleaved stores, V-stores to store images of memory, a one-level virtual store, autonomous transfer units and ROM stores. It had an operating system called the "Supervisor" to manage the computer's processing time and scheduling operations, and it could compile high-level languages. The machine had a 48-bit word size and a 24-bit address size. It could store 16K words in its main ferrite core memory, interleaving odd and even addresses. It had an additional 96K words of storage on its four magnetic drums, which were integrated with the main memory using virtual memory, or paging. It also accessed its peripheral devices through V-store addresses and extracode routines. xxxvii
In 1964, J. Kemeny and T. Kurtz, mathematics professors at Dartmouth College, developed the Beginner's All-purpose Symbolic Instruction Code (BASIC) as a simple-to-learn interpreted language that would help students move on to more complex and powerful languages, such as FORTRAN or ALGOL. xxxviii In the same year, IBM developed its Programming Language 1 (PL/1), formerly known as New Programming Language (NPL), which was the first attempt to develop a language that could be used for many application areas. Previously, programming languages were designed for a single purpose, such as mathematics or physics. PL/1 can be used for both business and scientific purposes. PL/1 is a freeform language with no reserved keywords; it has hardware-independent data types, is block oriented, contains control structures to conditionally allow logical operations, supports arrays, structures and unions (and complex combinations of the three), and provides storage classes. xxxix
In 1962, Doug Engelbart of the Stanford Research Institute published the paper "Augmenting Human Intellect: A Conceptual Framework". His ideas proposed a device that would allow a computer user to interact with an information display screen by moving a cursor on the screen—in other words, a mouse. The actual device, shown on the left, was invented in 1964. xl In the same year, the number of computers in the US grew to 18,000. In 1972, the Xerox Palo Alto Research Center (PARC) Learning Research Group developed Smalltalk. This forerunner of Mac OS and MS Windows was the first system with overlapping windows and opaque pop-up menus. In 1973, Alan Kay invented the "office computer", a forerunner of the PC and Mac. Its design was based on Smalltalk, with icons, graphics and a mouse. Kay stated at a 1971 meeting at PARC:

"Don't worry about what anybody else is going to do… The best way to predict the future is to invent it. Really smart people with reasonable funding can do just about anything that doesn't violate too many of Newton's Laws!" xli
In 1973, R. Metcalfe and researchers at Xerox PARC developed the experimental Alto PC, which incorporated a mouse, a graphical user interface and Ethernet. Within the same year, PARC's Charles Simonyi developed the Bravo text editor, the first "What You See Is What You Get" (WYSIWYG) application. Later in the year, Metcalfe wrote a memo describing Ethernet as a modified "Alohanet", titled "Ether Acquisition". By 1975, Metcalfe had developed the first Ethernet local area network (LAN). By 1979, Xerox, Intel and DEC had announced support for Ethernet. The Alto PC was officially introduced in 1981 with a mouse, built-in Ethernet and Smalltalk. The commercial version, available the same year, was named the Xerox Star and was the first commercially available workstation with a WYSIWYG, desktop-style graphical user interface (GUI).
In 1964, Control Data Corp. introduced the CDC 6600 (left). Designed by supercomputer guru Seymour Cray, it had 400,000 transistors and was capable of 350,000 FLOPS. The roughly 100 machines produced, priced at $7–10 million each, had over 100 miles of electrical wiring and a Freon refrigeration system to keep the system's electronics cool, and together they made the CDC 6600 the world's first commercially successful supercomputer. The machine was also the first to have an interactive display that showed graphical results of data as it was processed in real time.
Between 1964 and 1965, DEC introduced the PDP-8 (left)—the world's first minicomputer. It contained transistor-based circuitry modules and was mass-produced for the commercial market—the first computer sold as a retail product. At its initial price of $18,000, it was the smallest and least expensive parallel general-purpose computer available. By 1973, the PDP-8, described as the "Model T" of the computer industry, was the best-selling computer in the world. It had 12-bit words, usually with 4K words of memory, a robust instruction set, and could run at room temperature. xlii
In 1965, Maurice V. Wilkes proposed the use of cache memory: a smaller, faster, more expensive type of memory that holds a copy of part of main memory. Access to entries in cache memory is much faster than access to main memory, which leads to better system performance. The same year, Intel founder Gordon Moore proposed that the number of transistors on microchips would double every year. The prediction held and came to be known as Moore's Law. Consider that a 2.5 cm² chip in 1964 had ten components, while a chip of the same size in 1970 had about 1,000.
In 1967, Donald Knuth produced some of the work that would become "The Art of Computer Programming". He introduced the idea that a computer program's algorithms and data structures should be treated as entities distinct from the program itself, an idea that has greatly improved computer programming. Volume 1 of The Art of Computer Programming was published in 1968.
In 1967, Niklaus Wirth began to develop the Pascal structured programming language. The Pascal Standard (ISO 7185) states that it was intended to:

• "make available a language suitable for teaching programming as a systematic discipline based on fundamental concepts clearly and naturally reflected by the language"
• "define a language whose implementations could be both reliable and efficient on then-available computers" xliii
Pascal, based on ALGOL's block structure, was released in 1970. An example "Hello World!" program in Pascal is:

Program Hello (Input, Output);
Begin
  Writeln ('Hello World!');
End.
In 1968, Burroughs introduced the first computers that used integrated circuits, the B2500 and the B3500. The same year, Control Data built the CDC 7600 and NCR introduced its Century series computer, both using only integrated circuits.
In 1968, the Federal Information Processing Standard created "The Year 2000 Crisis" by encouraging the "YYMMDD" six-digit date format for information interchange. Also in 1968, the practice of structured programming started with Edsger Dijkstra's writings about the harm of the goto statement. This led to wide use of control structures, such as the while loop, to control iterative routines in programs. xliv Between 1968 and 1969, the NATO Science Committee held two conferences on Software Engineering, which are considered the start of the field. From the 1960s to the 1980s, there was a "software crisis" because many software projects ended badly. Software Engineering arose from the need to produce better software, on schedule and within the anticipated budget. Essentially, Software Engineering is a set of diverse practices and technologies used in the creation and maintenance of software for diverse purposes. xlv
In 1969, Bell Labs withdrew support from Project MAC and the Multics system to begin development of UNIX. Kenneth Thompson and Dennis Ritchie began designing UNIX in the same year. The operating system was initially named the Uniplexed Information and Computing System (UNICS), as a pun on Multics, but the name was later changed. In the beginning, UNIX received no financial support from Bell Labs; some support was granted to add text processing for use on the DEC PDP-11/20. The text processor was named runoff, which Bell Labs used to record patent information and which later evolved into troff, the world's first publishing program capable of full typesetting. In 1973, it was decided to rewrite UNIX in C, a high-level language, to make it easily modifiable and portable to other machines, which accelerated the development of UNIX. AT&T licensed the system to commercial, educational and government organizations.
In 1973, Dennis Ritchie developed the C programming language, a high-level language intended mainly for use with UNIX. A sample "Hello World!" program in C is:
#include <stdio.h>

int main() {
    printf("Hello World!\n");
    return 0;
}
Later, in 1983, Bjarne Stroustrup (right) added object orientation to C, creating C++ at AT&T Bell Labs. In 1995, Sun Microsystems released its object-oriented Java programming language, which was both platform independent and network compatible. Java is an extension of C++, and C++ is an extension of C.
By 1975, there were versions of UNIX using pipes for interprocess communication (IPC). AT&T released a commercial version, UNIX System III, in 1982. Later, System V was developed by combining features from other versions, including U.C. Berkeley's Berkeley Software Distribution (BSD), which contributed the vi editor and curses. Berkeley continued to work on BSD, the noncommercial version, and added the Transmission Control Protocol (TCP) and the Internet Protocol (IP), known together as the TCP/IP suite, to the UNIX kernel for network communication. Eventually AT&T produced UNIX System V by adding system administration, file locking for file-level security, job control, streams, the Remote File System and the Transport Layer Interface (TLI) as a network application programming interface (API). Between 1987 and 1989, AT&T merged System V and XENIX, Microsoft's x86 UNIX implementation, into UNIX System V Release 4 (SVR4).
Novell bought the rights to UNIX from AT&T in an attempt to challenge Microsoft's Windows NT, which caused its core markets to suffer. Novell sold the UNIX rights to X/OPEN, an industry consortium that defined a version of the UNIX standard, which later merged with the OSF/1 standards group to form the Open Group. The Open Group presently defines the UNIX operating system. xlvi
In 1969, the RS-232 standard, commonly referred to as the serial port, was established for serial binary data interchange between data terminal equipment (DTE) and data communication equipment (DCE). xlvii
In 1970, RCA developed metal-oxide semiconductor (MOS) technology for fabricating integrated circuits, which made chips smaller, cheaper and faster to produce. The first chips using large-scale integration (LSI) were produced in the same year, containing up to 15,000 transistors per chip. In 1971, Intel introduced the world's first mass-produced, single-chip, universal microprocessor, the Intel 4004 (left), invented by Federico Faggin, Ted Hoff, Stan Mazor and their engineering team. It was a dual inline package (DIP) processor, meaning that it had two rows of pins that were inserted into the motherboard. The microprocessor can be thought of as a "computer on a chip": all of the thinking parts of the computer, the central processing unit (CPU), memory, and input and output (I/O) controls, were miniaturized and condensed onto a single chip. The 4004, based on silicon-gate MOS technology, had more than 2,300 transistors in an area of 12 square millimeters, a 4-bit CPU that used 8-bit instructions, a command register, a decoder, decoding control, control monitoring of machine commands and an interim register. The chip ran at a speed of 108 kHz and could process 60,000 instructions per second, at a cost of $300. It had sixteen 4-bit general-purpose registers, usable in pairs as 8-bit registers, and a set of 45 instructions. It could address 1K of program memory and 4K of data memory. Later models had clock speeds of up to 740 kHz. The picture on the lower left shows the 4004 motherboard and the picture on the right shows the chip die. The Pioneer 10 spacecraft, launched on March 2, 1972, used a 4004 processor and became the first spacecraft (and microprocessor) to enter the Asteroid Belt. xlviii
In 1972, Intel offered the 8008 chip (left), the world's first 8-bit microprocessor. The 8008 had 3,300 transistors; although, at a clock speed of 800 kHz, it executed slightly fewer instructions per second than the 4004, its 8-bit design let it access more RAM and process data 3 to 4 times faster than the 4-bit chips. In 1974, Intel released the 8080 chip (left), which had a 16-bit address bus and an 8-bit data bus. It had a 16-bit stack pointer, a 16-bit program counter and seven 8-bit registers, some of which could be combined into 16-bit registers. It also had 256 I/O ports, to ensure that devices did not interfere with its memory address space. It had a clock speed of 2 MHz, 64 KB of addressable memory, 48 instructions and vectored multilevel interrupts.
In 1978, Intel introduced the 8086 chip (left)—the first 16-bit microprocessor. This chip had 29,000 transistors on a 3.0-micron die and 300 instructions. It had 16-bit bus compatibility for communication with peripherals. The chips were available in 5, 6, 8 and 10 MHz clock speeds and had a 20-bit memory address space that could address up to 1 MB of RAM. Though the 8086 was available, IBM chose to use the 8088, the 8-bit-bus version developed slightly later, because of the 8086's greater expense. xlix
The Intel 80186, released in 1980, had a 16-bit external bus, an initial clock speed of 6 MHz and a 1.0-micron die. This chip was Intel's first pin grid array (PGA) offering, meaning that the pins on the processor were arranged in a matrix-like array around the outside edge (upper right). This popular chip was mostly used in embedded systems and rarely in PCs. It required fewer external chips than its predecessors: it had an integrated system controller, a priority interrupt controller, two direct memory access (DMA) channels with controller, and timing circuitry (three timers). It replaced 22 separate VLSI and transistor-transistor logic (TTL) chips and was more cost efficient than the chips it replaced. In 1982, Intel developed the 80286 processor, which had 134,000 transistors and a 1.5-micron die and could address up to 16 megabytes of memory. This microprocessor was the first to introduce protected mode, which allowed the computer to multitask, running more than one program at a time by time-sharing the system's resources. Its initial models ran at 8, 10 and 12.5 MHz, but later models ran as fast as 20 MHz. The 80386 processor was released in 1985 with 275,000 transistors, a 1.0-micron die, 32-bit instructions and a 32-bit memory address space that could address up to four gigabytes of RAM, with the ability to address up to 64 terabytes of virtual memory. The initial clock speeds were 16, 20, 25 and 33 MHz. It also had a feature called instruction pipelining, which allowed the processor to begin the next instruction before finishing the previous one. It had a virtual real mode that allowed more than one real-mode program session to run at a time, a feature used in multitasking operating systems. This chip also had a system management mode (SMM), which could power down various hardware devices to decrease power use. In 1989, Intel introduced the 80486 line of processors with 1.2 million transistors, a 1.0-micron die, and the same instruction and memory address sizes as the 386. This was the first microprocessor to have an integrated floating-point unit (FPU); previously, CPUs needed an external FPU, called a math coprocessor, to speed up floating-point operations. It also had 8 kilobytes of on-die cache, which held predicted next instructions for pipelining. This saved accesses to main memory, which is much slower than cache memory. Later 486 models could operate at speeds greater than the maximum system bus speed: the 486DX2/66 was clock-doubled from 33 MHz to 66 MHz, and the 486DX4/100 clock-tripled from 33 MHz to 100 MHz.
In 1993, Intel released the Pentium processor with 3.21 million transistors and a 0.8-micron die. Clock speeds were available from 60 to 200 MHz, with a 60 MHz processor capable of 100 MIPS. It had the same 32-bit address space as the 386 and 486 but had a 64-bit external data bus and a superscalar architecture (able to process two instructions per clock cycle), which allowed it to process instructions and data about twice as fast as the 486. Internally, this chip was actually two 32-bit processors chained together, sharing the workload. It had two separate 8 KB caches (one data and one instruction cache) and a pipelined FPU, which could perform floating-point operations much faster than the 486. Later versions of the chip supported symmetric dual processing—the ability to have two processors in the same system.
In 1995, the Pentium Pro was released with 5.5 million<br />
transistors, a 0.6-micron die <strong>and</strong> a clock speed of up to 200<br />
MHz. It was a reduced instruction set computer (RISC)<br />
processor. RISC processors have a smaller set of instructions<br />
than complex instruction set computer processors. The first<br />
computers were of CISC design to bridge semantic differences<br />
or gaps between low-level machine code <strong>and</strong> high-level<br />
programming languages, which reduced the size of computer<br />
programs <strong>and</strong> calls to main memory but did not necessarily<br />
improve system performance. The main idea with RISC is to build more complex<br />
instructions using a sequence of smaller, simpler instructions. Complex instructions have<br />
greater time <strong>and</strong> space overhead while decoding instructions, especially when microcode<br />
is used to decode macroinstructions. In practice, programs execute a small set of<br />
simple instructions far more often than the complex ones. Limiting the number of<br />
instructions in a computer to a smaller, optimized set can therefore contribute to greater<br />
performance. The Pentium Pro could process three instructions per clock cycle <strong>and</strong> had<br />
decoupled decoding <strong>and</strong> execution, which allowed the processor to keep working on<br />
instructions in other pipelines if one of the pipelines stops to wait for an event. The<br />
st<strong>and</strong>ard Pentium would stop all pipelines until the event occurred. It also had up to 1<br />
MB of onboard level-2 cache, which was faster than having the cache on the<br />
motherboard.<br />
In 1997, Intel released the Pentium MMX series of processors<br />
with 4.5 million transistors, clock speeds up to 233 MHz <strong>and</strong> a<br />
0.35-micron die size. The MMX had 57 additional complex<br />
instructions that aided the CPU in performing multimedia <strong>and</strong><br />
gaming operations 10 to 20 percent faster than processors<br />
without the MMX instruction set. The processor also had dual<br />
16K level-1 cache <strong>and</strong> improved dynamic branch prediction, an<br />
additional instruction pipe <strong>and</strong> a pipelined FPU.<br />
In 1997, Intel released the Pentium II, which had 27.4<br />
million transistors <strong>and</strong> a 0.25-micron die. The<br />
Pentium II combined technology from both the<br />
Pentium Pro <strong>and</strong> the Pentium MMX. It had the Pro’s<br />
dynamic branch prediction, the MMX instructions,<br />
dual 16K level-1 cache <strong>and</strong> 512K of level-2 cache.<br />
The level-2 cache ran at ½-speed <strong>and</strong> was not<br />
attached directly to the processor, which yielded greater performance but not as much as<br />
if it were full-speed <strong>and</strong> attached. The most notable change was the single edge contact<br />
(SEC) package design, which plugged into “Slot 1” <strong>and</strong> resembled a card more than it did a<br />
processor. Initial chips had a 66 MHz bus speed but later models had a 100 MHz bus.<br />
The bus speed is the maximum speed that the processor uses to access data in main<br />
memory.<br />
In 1999, Intel released the Pentium III processor<br />
with 28 million transistors, a 0.18-micron die <strong>and</strong> a 450<br />
MHz clock speed. This processor had 70 additional<br />
instructions that were extensions of the MMX set,<br />
called the SSE instruction set (also known as the<br />
MMX2 instruction set), which improved the<br />
performance of 3D graphics applications. Later<br />
versions of the Pentium III increased the bus speed<br />
to 133 MHz <strong>and</strong> moved the level-2 cache off of the<br />
board <strong>and</strong> onto the CPU core. Though Intel halved the cache to 256K, there was still a<br />
benefit to performance.<br />
In late 2000, Intel introduced the Pentium 4 with 42<br />
million transistors, a 0.18-micron die <strong>and</strong> a new NetBurst<br />
architecture to support future increases in speed. NetBurst<br />
consists of the Hyper Pipelined Technology, the Rapid<br />
Execution Engine, the Execution Trace Cache <strong>and</strong> a<br />
400MHz system bus. The Hyper Pipelined Technology<br />
doubled the depth of the instruction pipeline from 10 to 20 stages,<br />
which decreased the amount of work per stage <strong>and</strong> allowed<br />
more instructions to be in flight at once. A negative consequence of<br />
deepening the pipeline is that it took longer to recover from<br />
branch mispredictions. A newer, more advanced branch predictor aided the chip in hedging against this<br />
propensity. The Rapid Execution Engine was the inclusion of two arithmetic logic units<br />
operating at double the speed of the processor, which was necessary to h<strong>and</strong>le the<br />
doubled data pipe. The Execution Trace Cache was a new kind of cache that could hold<br />
decoded instructions until they are ready for execution. The chip has less level-1 cache,<br />
8K, to decrease latency. l<br />
One of the ways Intel <strong>and</strong> other manufacturers have increased the speed <strong>and</strong> performance<br />
of CPUs was to decrease die size. This decreases the voltage needed to run the processor<br />
<strong>and</strong> increases clock speed. The functional part of a processor is actually a tiny chip with<br />
less than a third of a square inch of area within the external package shown in the<br />
preceding paragraphs. The chips are thinner than a dime <strong>and</strong> contain tens-of-millions of<br />
electronic circuits <strong>and</strong> switches. The chips are constructed from semiconductor<br />
materials, such as gallium arsenide or most commonly silicon, which require certain<br />
conditions to conduct electricity. In the case of silicon, it is grown into a large crystal<br />
<strong>and</strong> sliced by precision saws into sheets, called wafers, which can hold many individual<br />
chips. Layers of various materials treated with a photosensitive material are built up on<br />
the surface of the wafer to form the foundation of the transistors <strong>and</strong> data pathways. A<br />
process called photolithography is used to process these wafers by copying the circuitry<br />
onto the layered materials on the wafer using a separate mask for each layer. Light is<br />
accurately focused through the masks, transferring each mask’s image onto the wafer,<br />
which causes a chemical reaction on the photosensitive material, fixing the circuitry.<br />
Another chemical is used to wash away the excess material. After the<br />
photolithography process is complete, the wafer is cut into small rectangular chips. The<br />
chips are installed into the CPU package by soldering the appropriate contacts on the chip<br />
with other circuitry <strong>and</strong> the pins that create the interface with the computer’s<br />
motherboard. li<br />
In 1975, Bill Gates <strong>and</strong> Paul Allen developed Altair BASIC—the first<br />
programming language implemented for a microcomputer. In 1977, Microsoft, Gates <strong>and</strong> Allen’s newly founded company,<br />
released Altair BASIC for use on the Altair 8800. In 1980, Microsoft acquired the<br />
nonexclusive rights to an operating system, called 86-DOS, that was developed by<br />
Seattle Computer Products’ Tim Paterson. Microsoft had paid $100,000 to contract the<br />
rights from SCP to sell 86-DOS to an unnamed client. In 1980, IBM chose Microsoft’s<br />
PC-DOS as the operating system for its new personal computer line.<br />
The IBM PC became a mainstream corporate item when it<br />
was released in 1981. Microsoft bought all rights to 86-DOS<br />
in 1981, renaming it as MS-DOS. IBM’s 5150 had a 4.77<br />
MHz Intel 8088 CPU with 64K of RAM <strong>and</strong> 40K of ROM.<br />
It had a 5.25-inch, single-sided floppy drive, PC-DOS 1.0<br />
installed <strong>and</strong> sold for $3000. IBM’s new PC had an open<br />
architecture, which used off-the-shelf components. This was<br />
good for rapid <strong>and</strong> industry st<strong>and</strong>ard development but bad<br />
(for IBM) because other companies could obtain these<br />
components <strong>and</strong> build their own machines. In 1982,<br />
Columbia Data Products released the first IBM PC<br />
compatible “clone”, called the MPC <strong>and</strong> Microsoft released an IBM compatible version<br />
operating system—MS-DOS v1.25, which could support 360K double-sided floppy<br />
disks. The same year, Compaq introduced<br />
its first PC. The popularity of the PC<br />
caused sales to soar to 3,275,000 units in<br />
1982, more than ten times the number<br />
sold in 1981. The social impact of<br />
computers was so important that Time<br />
Magazine named the PC in place of its usual “Man of the<br />
Year”, publishing it on the cover of the<br />
January 1983 edition as the “Machine of the<br />
Year”. By 1990, more than 54 million<br />
computers were in use in the U.S. By 1996,<br />
approximately 66 percent of employees <strong>and</strong><br />
33 percent of homes had access to personal<br />
computers.<br />
The initial MS-DOS offerings did not support<br />
hard disks. Version 2.0 in 1983 supported up<br />
to 10 MB hard disks <strong>and</strong> tree-structured file<br />
systems. Version 3.0 in 1984 supported 1.2<br />
MB floppy disks <strong>and</strong> hard disks larger than 10 MB, <strong>and</strong> Version 3.1<br />
added Microsoft network support. Version 4.0 in 1988 had graphical user interface support,<br />
a shell menu interface <strong>and</strong> support for hard disks larger than 32 MB. Version 5.0 in 1991<br />
had a full-screen editor, undelete <strong>and</strong> unformat utilities, <strong>and</strong> task swapping. Version 6.0<br />
in 1993 had DoubleSpace disk compression utility <strong>and</strong> sold over a million copies in 40<br />
days. Version 7.0 of MS-DOS was included with Windows 95 in 1995. lii In 1985,<br />
Microsoft introduced Windows 1.0 with the promise of an easy-to-use graphical<br />
user interface, device-independent graphics <strong>and</strong> multitasking support. A<br />
limited set of available applications led to modest sales. Windows 2.0 was<br />
released in 1987 with two types available.<br />
One was for the 16-bit Intel 80286<br />
microprocessor, called Windows/286. It<br />
added icons <strong>and</strong> overlapping windows with<br />
independently running applications. The other<br />
was for Intel’s 32-bit line of 80386<br />
microprocessors, which had all the<br />
functionality of the Windows/286 system but<br />
also had the ability to run multiple DOS<br />
applications simultaneously. Windows 2.0<br />
had much better sales due to the availability of<br />
software applications, including Excel, Word,<br />
Corel Draw!, Ami, Aldus PageMaker <strong>and</strong><br />
Micrografx Designer. In 1990, Microsoft<br />
released Windows 3.0 with a completely<br />
new interface <strong>and</strong> the ability to address<br />
memory beyond 640K without secondary<br />
memory manager utilities. Many independent<br />
software developers produced software<br />
applications for this environment, boosting<br />
sales to over 10,000,000 copies.<br />
In 1993, Microsoft released Windows NT 3.1<br />
with an entirely new operating system kernel.<br />
This system was intended for high-end uses,<br />
such as network servers, workstations <strong>and</strong><br />
software development machines. Windows<br />
NT 4.0 was released later the same year <strong>and</strong><br />
was an object-oriented operating system. In<br />
1995, Microsoft introduced Windows 95<br />
which was a full 32-bit operating<br />
system. It had preemptive multitasking,<br />
multithreading, integrated networking <strong>and</strong> an advanced<br />
file system. Though it included DOS 7.0, the<br />
Windows 95 OS assumed full control of the<br />
system after booting. In 1998, Windows 98<br />
was released with enhanced Web support (the Internet Explorer browser was integrated<br />
with the OS), FAT32 for very large hard disk support, <strong>and</strong> multiple display support to use<br />
up to 8 video cards <strong>and</strong> monitors. It also had hardware support for DVD, Firewire,<br />
universal serial bus (USB) <strong>and</strong> accelerated graphics port (AGP). In 2000, Windows 2000<br />
(formerly NT 5.0) was released <strong>and</strong> included many of the features of Windows 98,<br />
including integrated Web support, <strong>and</strong> enhanced support for distributed file system. It<br />
also supported Internet, intranet <strong>and</strong> extranet platforms, active directory, virtual private<br />
networks, file <strong>and</strong> directory encryption, <strong>and</strong> installation of the W2K OS from a server<br />
located on the LAN.<br />
In 1976, Cray Research developed the Cray-1<br />
supercomputer with a vector architecture, which<br />
was installed at the Los Alamos National<br />
Laboratory. The $8.8 million machine could<br />
perform 160 megaflops (a world record at the time)<br />
<strong>and</strong> had an 8-megabyte (1 million words) main<br />
memory. The machine’s hardware contained no<br />
wires longer than four feet <strong>and</strong> had a “unique C-<br />
shape”, which allowed integrated circuits to be<br />
very close together. In 1982, Steve Chen <strong>and</strong><br />
his research group built the Cray X-MP by<br />
making architectural changes to the Cray-1,<br />
which contained two Cray-1 compatible<br />
pipelined processors <strong>and</strong> a shared memory<br />
(essentially two Cray-1 machines were linked<br />
together in parallel using a shared memory).<br />
This was the first use of shared-memory<br />
multiprocessing in vector supercomputing. The<br />
initial computational speedup of the two-processor<br />
X-MP over the Cray-1 was 300%—<br />
three times the computational speed by only<br />
doubling the number of processors. It was<br />
capable of 500 megaflops. This machine<br />
became the world’s most commercially successful<br />
parallel vector supercomputer. Chen<br />
commented that the X in X-MP stood for<br />
“extraordinary”. The X-MP ran on UNICOS,<br />
which was Cray’s first UNIX-like operating<br />
system. In 1985, the Cray-2 reached one<br />
billion FLOPS <strong>and</strong> had the world’s largest<br />
memory at 2048 megabytes. In 1988, Cray<br />
produced the Y-MP, which was the first<br />
supercomputer to “sustain” over one billion<br />
FLOPS on many of its applications. It had<br />
multiple 333 million FLOPS processors that<br />
could achieve 2.3 billion FLOPS. liii<br />
In 1977, DEC introduced the 32-bit<br />
VAX11/780 computer, which was<br />
used primarily for scientific <strong>and</strong> technical<br />
applications. The first machine was<br />
installed at Carnegie Mellon University<br />
with other units installed at CERN in<br />
Switzerl<strong>and</strong> <strong>and</strong> the Max Planck Institute<br />
in Germany. It could perform 1,000,000<br />
instructions per second <strong>and</strong> was the first<br />
commercially available 32-bit machine. liv<br />
In 1981, Motorola introduced one of the first<br />
32-bit instruction microprocessor offerings from<br />
their 68000 line of processors. The chip had 32-<br />
bit registers <strong>and</strong> a flat 32-bit address space,<br />
which could access a specific memory location,<br />
instead of segmented blocks of memory as on the 8086. It<br />
had a 16-bit ALU but had a 32-bit address adder<br />
for address arithmetic. It had eight general-purpose<br />
registers <strong>and</strong> eight address registers. It<br />
used the last address register as a stack pointer<br />
<strong>and</strong> had a separate status register. It was<br />
initially designed as an embedded processor for<br />
household products but found its way into<br />
Amiga <strong>and</strong> Atari home computers <strong>and</strong> arcade<br />
computer games as a controller. It was also used in Apple Macintosh, Sun Microsystems<br />
<strong>and</strong> Silicon Graphics machines. The architecture of this chip was very similar to PDP-11<br />
<strong>and</strong> VAX machines, which made it very compatible with programs written in the C<br />
language. The chip has been used by auto manufacturers as controllers as well as in<br />
medical hardware <strong>and</strong> computer printers because of its low cost. Updated models of the<br />
processor are still used today in personal digital assistants (PDAs) <strong>and</strong> Texas Instruments<br />
TI-89, TI-92 <strong>and</strong> Voyage 2000 calculators. In 1988, Motorola introduced the 88000<br />
series processors, which were RISC-based, had a true Harvard architecture (separate<br />
instruction <strong>and</strong> data busses) <strong>and</strong> could perform 17 MIPS. lv<br />
In 1985, Inmos introduced the transistor computer (transputer) with its concurrent parallel<br />
microprocessing architecture. Single transputer chips would have all the necessary<br />
circuitry to work by themselves or could be wired together to form more powerful<br />
devices from simple controllers to complex computers. Chips of varying power <strong>and</strong><br />
complexity were available to serve a wide array of tasks. A low power chip might be<br />
configured to be a hard disk controller <strong>and</strong> a few higher-powered chips might act as<br />
CPUs. These were the first general-purpose chips to be specifically designed for parallel<br />
computing.<br />
It was realized in the early 1980’s that conventional CPUs would reach a performance<br />
limit. Even though advances in technology had miniaturized processor circuitry, packing<br />
millions of transistors on chips smaller than the size of a fingernail <strong>and</strong> had drastically<br />
increased computational speed, there was still an impenetrable barrier to conventional<br />
processor performance—the speed of light. Light in a vacuum travels at approximately<br />
299,792,458 meters per second or approximately one foot in a nanosecond. This is the<br />
upper limit for the speed that electrons can travel within electrical equipment, which<br />
suggests that the clock speed limit for processors is about 10 GHz. We are almost<br />
halfway to this limit, <strong>and</strong> the speed of light remains a fundamental limiting factor in the design of<br />
CPUs. The best way to ensure continued progress in computational performance is parallel<br />
processing. lvi<br />
Parallel Processing<br />
What is parallel processing?<br />
Parallel processing is the concurrent execution of the same activity or task on multiple<br />
processors. The task is divided or specially prepared so that the work can be spread<br />
among many processors <strong>and</strong> yield the same result as if done on one processor but in less<br />
time. There is a variety of parallel processing systems. A parallel processing system can<br />
be a single machine with many processors or many machines connected by a network.<br />
The most powerful machines in the world are machines with hundreds or thous<strong>and</strong>s of<br />
processors <strong>and</strong> hundreds of gigabytes of memory. These machines are called massively<br />
parallel processors (MPP). Many individual machines can cooperate to perform the same<br />
task in distributed networks. A combination of lower-performance computers may<br />
exceed the power of a single high-performance computer when their combined<br />
computational resources are comparable. The computational power of MPPs has been combined using<br />
the distributed system model to produce unprecedented performance.<br />
Flynn’s taxonomy classifies computing systems with respect to the two types of streams<br />
that flow into <strong>and</strong> out of a processor: instructions <strong>and</strong> data. These two types of streams<br />
can be conceptually split into two different streams, even if delivered on the same wire.<br />
The classifications, based on the number of streams of each type, are:<br />
Single instruction stream/single data stream (SISD) systems have a single instruction<br />
processing unit <strong>and</strong> a single data processing unit. These are conventional single<br />
processor computers, also known as sequential computers or scalar processors.<br />
Single instruction stream/multiple data streams (SIMD) systems have a single instruction<br />
processing unit or controller <strong>and</strong> multiple data processing units. The instruction unit<br />
fetches <strong>and</strong> executes instructions until a data or arithmetic operation is reached. It then<br />
sends this instruction to all of the data processing units, which each perform the same<br />
task on different pieces of data, until all data is processed. These data processing units<br />
are either idle or all performing the same task as all other data processors. They cannot<br />
perform different tasks simultaneously. Each of the data processors has a dedicated<br />
memory storage area. They are directed by the instruction processor to store <strong>and</strong> retrieve<br />
data to <strong>and</strong> from memory. The advantage of this system is the decrease in the amount of<br />
logic on the data processors. Approximately 20 to 50 percent of a single processor’s<br />
logic is dedicated to control operations. The rest of the logic is shared by register, cache,<br />
arithmetic <strong>and</strong> data operations. The data processors have little or no control logic, which<br />
allows them to perform arithmetic <strong>and</strong> data operations much more rapidly. A vector or<br />
array processing machine is an example of an SIMD machine that distributes data across<br />
all memories (possibly stores each cell of an array or each column of a matrix in a<br />
different memory area). These machines are designed to execute arithmetic <strong>and</strong> data<br />
operations on a large number of data elements very quickly. A vector machine can<br />
perform operations in constant time if the length of the vectors (arrays) does not exceed<br />
the number of data processors. Most supercomputers used for scientific computing in<br />
the 1980’s <strong>and</strong> 1990’s were based on this architecture.<br />
Multiple instruction streams/single data stream (MISD) systems have multiple instruction<br />
processors <strong>and</strong> a single data processor. Few of these machines have been produced, <strong>and</strong><br />
they have had no commercial success.<br />
Multiple instruction streams/multiple data streams (MIMD) systems have multiple<br />
instruction processors <strong>and</strong> multiple data processors. MIMD systems are diverse, ranging<br />
from those constructed from inexpensive off-the-shelf components to much<br />
more expensive interconnected vector processors, among many other configurations.<br />
Computers over a network that simultaneously cooperate to complete a single task are<br />
MIMD systems. Computers that have two or more independent processors are another<br />
example. A machine with multiple independent processors has the ability to perform more than<br />
one task simultaneously. lvii<br />
There are three types of performance gains received from parallel processing solutions<br />
for the use of n processors:<br />
• Sub-linear speedup is when the increase in speed is less than n<br />
o e.g. five processors yield only a 3x speedup<br />
• Linear speedup is when the increase is equal to n<br />
o e.g. five processors yield a 5x speedup<br />
• Super-linear speedup is when the increase is greater than n<br />
o e.g. five processors yield a 7x speedup<br />
Generally linear or faster speedup is very hard to achieve because of the sequential nature<br />
of most algorithms. Parallel algorithms must be designed to take advantage of parallel<br />
hardware. Parallel systems may have one shared memory area, to which all processors<br />
may have access. In shared-memory systems, care must be taken to design parallel<br />
algorithms that ensure mutual exclusion, which protects data from being corrupted when<br />
operated on by more than one processor. The results from parallel operations should be<br />
determinate, meaning they should be the same as if done by a sequential algorithm. As<br />
an example, if two processors write to the same variable in memory such that:<br />
• Processor 1 reads: x<br />
• Processor 2 reads: x<br />
• Processor 1 writes: x = x + 1<br />
• Processor 2 writes: x = x – 1<br />
Depending on the possible orderings of the reads <strong>and</strong> writes, the resulting variable could<br />
be x–1, x+1 or x. This is a race condition <strong>and</strong> is extremely undesirable because the<br />
result depends on chance. Synchronization primitives, such as semaphores <strong>and</strong> monitors,<br />
aid in the resolution of conflicts due to race conditions. The shared memory may reside in a<br />
single machine with more than one processor, or it may be a distributed shared memory, where<br />
individual computers access the same memory area(s) located on other computer(s) on<br />
the network.<br />
Parallel processors must use some means to communicate. This is done on the system<br />
bus <strong>and</strong> with shared memory in the case of a single computer with multiple processors.<br />
When multiple machines are involved, communication can be implemented over a<br />
network using either message passing or a distributed shared memory.<br />
Cost is a very important consideration in distributed computing. A parallel system with n<br />
processors is cheaper to build than a processor that is n-times faster. For tasks that need<br />
to be completed quickly <strong>and</strong> can be performed by more than one thread of execution with<br />
minimal interdependence, parallel processing is an exceptional solution. Many<br />
high-performance or supercomputing machines have parallel processing architectures. The<br />
parallel implementations discussed in the remainder of this book will be based on<br />
distributed computing as opposed to single machines with multiple processors.<br />
Existing Tools for Parallel Processing<br />
The parallel programming systems discussed, PVM, MPI <strong>and</strong> Linda, are implemented<br />
with libraries of function calls that are coded directly into either C or Fortran source code<br />
<strong>and</strong> compiled. There are two primary types of communication used: message passing<br />
(PVM <strong>and</strong> MPI) <strong>and</strong> tuple space (Linda <strong>and</strong> <strong>Synergy</strong>). In message passing, a participating<br />
process may send messages directly to any other process, which is somewhat similar to<br />
inter-process communication (IPC) in the Linux/UNIX operating system. In fact, both<br />
message passing <strong>and</strong> tuple space systems are implemented with sockets in the<br />
Linux/UNIX environment. A tuple space is a type of distributed shared memory that is<br />
used by participating processes to hold messages. These messages can be posted or<br />
obtained by any of the participants. All of these systems function by the use of<br />
“master” <strong>and</strong> “worker” designations. The master is generally responsible for breaking the<br />
task into pieces <strong>and</strong> assembling the results. The workers are responsible for completing<br />
their pieces of the task. These systems communicate over computer networks <strong>and</strong><br />
typically have some type of middleware to facilitate cooperation between machines, such<br />
as the cluster discussed below.<br />
Computer Clusters<br />
Computer clusters, sometimes referred to as server farms, are groups of connected<br />
computers that form a parallel computer by working together to complete tasks. Clusters<br />
were originally developed in the 1980’s by Digital Equipment Corporation (DEC) to<br />
facilitate parallel computing <strong>and</strong> file <strong>and</strong> peripheral device sharing. An example of a<br />
cluster would be a Linux network with some middleware software to implement the<br />
parallelism. Well-established cluster systems have procedures to eliminate single points of<br />
failure, providing some level of fault tolerance. The four major types of clusters are:<br />
• Director-based clusters—one machine directs or controls the behavior of the<br />
cluster; usually implemented to enhance performance<br />
• Two-node clusters—two nodes perform the same part of the task or one serves as<br />
a backup in case the other fails to ensure fault tolerance<br />
• Multi-node clusters—may have tens of clustered machines, which are usually on<br />
the same network<br />
• Massively parallel clusters—may have hundreds or thous<strong>and</strong>s of machines on<br />
many networks<br />
Currently, the fastest supercomputing cluster is Earth Simulator at 35.86 TFlops, which is<br />
15 TFlops faster than the second place machine. The main reason for cluster based<br />
supercomputing, after performance, is cost efficiency. The third fastest supercomputing<br />
cluster is the 17.6 TFlop System X at Virginia Tech. It consists of 1100 dual processor<br />
Apple Power Macintosh G5s running Mac OS X. It cost a mere $5.2 million, which is 10<br />
percent of the cost of much slower mainframe supercomputers.<br />
The Parallel Virtual Machine (PVM)<br />
The Parallel Virtual Machine (PVM), a software tool to implement a system of<br />
networked parallel computers, was originally developed by Oak Ridge National<br />
Laboratory (ORNL) in 1989 by Vaidy Sunderam <strong>and</strong> Al Geist. Version 1 was a<br />
prototype that was only used internally for research. PVM was later rewritten by the<br />
University of Tennessee <strong>and</strong> released as Version 2 in 1991, which was used primarily for<br />
scientific applications. PVM Version 3, completed in 1993, supported fault tolerance <strong>and</strong><br />
provided better portability. This system supports C, C++ <strong>and</strong> Fortran programming<br />
languages.<br />
PVM allows a heterogeneous network of machines to function as a single distributed parallel processor. The system uses the message-passing model to share tasks between machines. Programmers use PVM's message passing to take advantage of the computational power of many computers of various types in a distributed system, making them appear to be one virtual machine. PVM's API is a collection of functions that facilitate parallel programming by message passing. To spawn workers, the pvm_spawn() function is called:
int status = pvm_spawn(char* task, char** argv, int flag, char* where, int ntask, int* tid);
where status is an integer that holds the number of tasks successfully spawned, task is the name of the executable to start, argv is the argument vector for the task program, flag is an integer that specifies PVM options, where identifies a host or architecture on which to start the processes, ntask is an integer holding the number of task processes to start, and tid is an array that receives the task process IDs. To end another task process, use the pvm_kill() function:

int status = pvm_kill(int tid);

where status contains information about the operation and tid is the task process number to kill. To end the calling task, use the pvm_exit() function:

int status = pvm_exit();
where status contains information about the operation. To obtain the task process ID of the calling task, use the pvm_mytid() function:

int myid = pvm_mytid();

where myid is an integer holding the calling task's process ID. To obtain the task process ID of the calling task's parent, use the pvm_parent() function:

int pid = pvm_parent();
where pid is an integer holding the parent task's process ID. To send a message, the send buffer must first be initialized by calling the pvm_initsend() function:

int bufid = pvm_initsend(int encoding);

where bufid is the buffer's ID number and encoding is the method used to pack the message. To pack a string message into the buffer, use the pvm_pkstr() function:

int status = pvm_pkstr(char* msg);

where status contains information about the operation and msg is a null-terminated string. This function packs the array msg into the buffer. There are companion functions to pack arrays of other data types into the buffer; for a complete listing, see the PVM User's Guide listed in the references. To send a message, use the pvm_send() function:
int status = pvm_send(int tid, int msgtag);

where status contains information about the operation, tid is the task process number of the recipient, and msgtag is the message identifier. To receive a message, use the pvm_recv() function:

int bufid = pvm_recv(int tid, int msgtag);

where bufid is the buffer's ID number, tid is the task process number of the sender, and msgtag is the message identifier. This is a blocking receive. Passing -1 as the tid value is a wildcard receive and will accept messages from any task process. To unpack a buffer, use the pvm_upkstr() function:

int status = pvm_upkstr(char* msg);

where status contains information about the operation and msg is a string in which to store the message. To compile and run a PVM application, type:
[c615111@owin ~/pvm ]>aimk master slave
[c615111@owin ~/pvm ]>master

The aimk command compiles the application, and the name of the master executable runs the application. An example of a PVM "Hello worker—Hello master" application is below. It demonstrates the structure of a basic PVM program. The master program is:
// master.c: "Hello worker" program
#include <stdio.h>
#include <unistd.h>      // for gethostname()
#include "pvm3.h"        // PVM library header
#define NUM_WKRS 3

int main(){
    int i;                         // Loop counter
    int status;                    // Status of operation
    int bufid;                     // Receive buffer ID
    int rtid;                      // Task ID of a replying worker
    int tid[NUM_WKRS];             // Array of task IDs; all must be unique in the system
    int msgtag;                    // Message tag to identify a message
    int flag = 0;                  // Used to specify options for pvm_spawn
    char buf[100];                 // Message string buffer
    char* wkr_args[1] = { NULL };  // Empty (NULL-terminated) argument list for the workers
    char host[128];                // Host machine name

    // Get the host machine name
    gethostname(host, sizeof(host));
    // Get my task ID and print the ID and host name to the screen
    printf("Master: ID is %x, name is %s\n", pvm_mytid(), host);
    // Spawn a program executable named "worker";
    // returns the number of workers spawned on success.
    // The empty string (fourth arg) requests any machine;
    // putting a name in this arg would request a specific machine
    status = pvm_spawn("worker", wkr_args, flag, "", NUM_WKRS, tid);
    // If the spawn was successful, it will return NUM_WKRS
    // since there are NUM_WKRS workers
    if(status == NUM_WKRS){
        // Label the first message as 1
        msgtag = 1;
        // Put the message in the buffer
        sprintf(buf, "Hello worker from %s", host);
        // Initialize the send buffer
        pvm_initsend(PvmDataDefault);
        // Transfer the message to PVM storage
        pvm_pkstr(buf);
        // Send the message to all workers
        for(i = 0; i < NUM_WKRS; i++)
            pvm_send(tid[i], msgtag);
        // Print the number of messages sent to workers
        printf("Master: Messages sent to %d workers\n", NUM_WKRS);
        // Get the replies from the workers
        for(i = 0; i < NUM_WKRS; i++){
            // Execute a blocking receive to wait for a reply from any (-1) worker
            bufid = pvm_recv(-1, msgtag);
            // Unpack the received message into the buffer
            pvm_upkstr(buf);
            // Find out which worker sent the reply, then print the message
            pvm_bufinfo(bufid, (int*)0, (int*)0, &rtid);
            printf("Master: From %x: %s\n", rtid, buf);
        }
        // Print the end message
        printf("Master: Application is finished\n");
    }
    // Else the spawn was not successful
    else
        printf("Cannot start worker program\n");
    // Exit the application
    pvm_exit();
    return 0;
}
The master program spawns a number of workers, sends the "Hello worker…" message, and waits for the replies. As each reply is received, it is printed to the screen, and the master then terminates. The worker program is:
// worker.c: "Hello master" program
#include <stdio.h>
#include <unistd.h>      // for gethostname()
#include "pvm3.h"        // PVM library header

int main(){
    int ptid;            // Parent's task ID
    int msgtag;          // Message tag to identify a message
    char buf[100];       // Message string buffer
    char host[128];      // Host machine name
    FILE* fd;            // File in which to write the master's message

    // Open the file in which to store the message
    fd = fopen("msg.txt", "a");
    // Get the host machine name
    gethostname(host, sizeof(host));
    // Get the parent's task ID
    ptid = pvm_parent();
    // Label the first message as 1
    msgtag = 1;
    // Execute a blocking receive to wait for the message from the master
    pvm_recv(ptid, msgtag);
    // Unpack the received message into the buffer
    pvm_upkstr(buf);
    // Print the message to the file
    fprintf(fd, "Worker: From %x: %s\n", ptid, buf);
    // Put the reply message in the buffer
    sprintf(buf, "Hello master from %s", host);
    // Initialize the send buffer
    pvm_initsend(PvmDataDefault);
    // Transfer the message to PVM storage
    pvm_pkstr(buf);
    // Send the reply to the master
    pvm_send(ptid, msgtag);
    // Close the file
    fclose(fd);
    // Exit the application
    pvm_exit();
    return 0;
}
The worker waits for the initial message from the master, writes the message to a file, sends a reply and terminates. The output on the master machine would resemble:

[c615111@owin ~/pvm ]>master
Master: ID is 0, name is owin
Master: Messages sent to 3 workers
Master: From 3: Hello master from saber
Master: From 1: Hello master from sarlac
Master: From 2: Hello master from owin
Master: Application is finished

All of the workers' output can be redirected to the master's terminal by running the application in PVM's console, which can be started by typing:

[c615111@owin ~/pvm ]>pvm
pvm>spawn -> master

Typing "pvm" at the command prompt activates the console, and typing "spawn -> master" at the console prompt executes the application in console mode. The "->" causes all worker screen output to be printed on the master's terminal. At any point in a parallel application, any executing PVM task may:
• Create or terminate other tasks
• Add or remove computers from the parallel virtual machine
• Have any of its processes communicate with any other task's processes
• Have any of its processes synchronize with any other task's processes
By proper use of PVM constructs and host-language control-flow statements, any specific dependency and control structure may be employed under the PVM system. Because of its easy-to-use programming interface and its implementation of the virtual machine concept, PVM became popular in the high-performance scientific computing community. It is no longer under active development, but it made a significant contribution to modern distributed processing designs and implementations. lviii
Message Passing Interface (MPI/MPICH)

The Message Passing Interface (MPI) is a communications protocol that was introduced in 1994. It is the product of a community effort to define the semantics and syntax of a core set of message-passing libraries usable by a wide variety of users on a wide variety of MPP systems. MPI is not a standalone parallel system for distributed computing, because it does not include facilities to manage processes, configure virtual machines or support input/output operations. It has become a standard for communication among machines running parallel programs on distributed-memory systems. MPI is primarily a library of routines that can be invoked from programs written in the C, C++ or Fortran languages. Its differential advantages over older protocols are portability and performance: it is more portable because MPI has an implementation for almost every distributed system, and faster because each implementation is optimized for the specific hardware on which it runs. MPICH is the most commonly used implementation of MPI.
The MPI API has hundreds of function calls that perform various operations within a parallel program. Many of these calls are similar to IPC calls in the UNIX operating system. Some of the basic MPI functions are briefly explained below and used in an example program. Before any MPI operations can be used in a program, the MPI interface must be initialized with the MPI_Init() function:
MPI_Init(&argc, &argv);

where argc is the number of arguments and argv is a vector of strings, both of which should be taken from the command-line arguments, because the same program will be used for both the master and worker processes in the example application. After initialization, a program must determine its rank, designated by process number, by calling MPI_Comm_rank(), to determine whether it is the master or a worker process. The master will be process number 0. The function call is:
MPI_Comm_rank(MPI_Comm comm, int* rank);

where comm is a communicator defined in MPI's libraries and rank is a pointer to an integer that receives this process's rank. It may also be necessary for an application to determine the number of currently running processes. The MPI_Comm_size() function returns this number. The function call is:

MPI_Comm_size(MPI_Comm comm, int* size);

where comm is a communicator defined in MPI's libraries and size is a pointer to an integer that receives the number of processes. To send a message to another process, the MPI_Send() function is used:
MPI_Send(void* msg, int count, MPI_Datatype type, int dest, int tag, MPI_Comm comm);

where msg is a message buffer, count is the number of elements to send (for a string, strlen(msg)+1 to include its null terminator), type is the data type of the message as defined by MPI's libraries, dest is an integer holding the process number of the destination, tag is an integer holding the message tag, and comm is a communicator defined in MPI's libraries. This is a blocking send: it does not return until the message buffer can safely be reused. To receive a message, the MPI_Recv() function is used:
MPI_Recv(void* msg, int size, MPI_Datatype type, int source, int tag, MPI_Comm comm, MPI_Status* status);

where msg is a message buffer, size is an integer holding the actual size of the receiving buffer, type is the data type of the message as defined by MPI's libraries, source is an integer holding the process number of the source, tag is an integer holding the message tag, comm is a communicator defined in MPI's libraries, and status receives data about the receive operation. To end an MPI application session, the MPI_Finalize() function is called:
MPI_Finalize();

which disables the MPI interface. To compile and run an MPI application, type:

[c615111@owin ~/mpi ]>mpicc -o hello hello.c
[c615111@owin ~/mpi ]>mpirun -np 4 hello

The mpirun command starts an MPI application named "hello" with 4 processes (1 master and 3 workers). The mpicc command is not actually a proprietary compiler; it is a wrapper equivalent to a call to the cc compiler with the following arguments to access the proper libraries:

[c615111@owin ~/mpi ]>cc -o hello hello.c -I/usr/local/mpi/include\
-L/usr/local/mpi/lib -lmpi
An example of an MPI application is:

// hello.c program
#include <stdio.h>
#include <string.h>     // for strlen()
#include <unistd.h>     // for gethostname()
#include "mpi.h"        // MPI library header

int main(int argc, char** argv){
    int my_rank;        // Rank of process
    int p;              // Number of processes
    int source;         // Rank of sender
    int dest;           // Rank of receiver
    int tag = 50;       // Tag for messages
    char buf[100];      // Storage buffer for the message
    char host[128];     // Host machine name
    MPI_Status status;  // Return status for receive
    FILE* fd;           // File in which to write the master's message

    // Open the file in which to store the message
    fd = fopen("msg.txt", "a");
    // Get the host machine name
    gethostname(host, sizeof(host));
    // Initialize the MPI application session
    // No MPI functions may be used until this is called
    // This function may only be called once
    MPI_Init(&argc, &argv);
    // Get my rank
    // The master's rank will be 0; the workers' ranks will be greater than 0
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    // Get the number of running processes
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    // If my_rank != 0, I am a worker
    if (my_rank != 0){
        // Set source to 0 for the master
        source = 0;
        // Receive the message from the master
        MPI_Recv(buf, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
        // Print the message to the file
        fprintf(fd, "Worker: %s\n", buf);
        // Put the reply in the buffer
        sprintf(buf, "Hello master from %s number %d", host, my_rank);
        // Set destination to 0 for the master
        dest = 0;
        // Send the reply to the master
        // Use strlen(buf)+1 to include '\0'
        MPI_Send(buf, strlen(buf)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    // Else my_rank == 0 and I am the master
    else{
        // Print my rank and host name to the screen
        printf("Master: ID rank %d, name is %s\n", my_rank, host);
        // Put the message in the buffer
        sprintf(buf, "Hello worker from %s number %d", host, my_rank);
        // Send the message to all workers
        for (dest = 1; dest < p; dest++){
            MPI_Send(buf, strlen(buf)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
            printf("Master: Sent: %s to %d\n", buf, dest);
        }
        // Receive the replies from the workers in any order (MPI_ANY_SOURCE)
        for (source = 1; source < p; source++){
            MPI_Recv(buf, 100, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &status);
            printf("Master: Received: %s\n", buf);
        }
        // Print the end message
        printf("Master: Application is finished\n");
    }
    // Close the file
    fclose(fd);
    // End the MPI application session
    // No MPI functions may be called after this function is called
    MPI_Finalize();
    return 0;
}
The screen output on the master machine would resemble:

Master: ID rank 0, name is owin
Master: Sent: Hello worker from owin number 0 to 1
Master: Sent: Hello worker from owin number 0 to 2
Master: Sent: Hello worker from owin number 0 to 3
Master: Received: Hello master from saber number 3
Master: Received: Hello master from owin number 1
Master: Received: Hello master from sarlac number 2
Master: Application is finished
Linda

Linda is an environment and coordination language for parallel processing that was initially developed as a research project at Yale University by David Gelernter and Nicholas Carriero and later became a commercial product. Linda's design is a compromise between message passing and shared memory within a distributed parallel processing system. The system introduced the concept of a tuple space, a distributed shared memory area in which machines communicate by reading, taking or putting tuples. A single tuple space is created when the master program is executed. Tuples are similar to a vector data type but do not have specified primitive or structured data types contained within them. This allows any data to be stored in a binary format within the tuple space; any combination of mixed data types can be placed not only into a tuple space but also into individual tuples within the space. Linda tuples may have a maximum of 16 fields, which are separated by commas. Entries in the tuple space are identified by names or numerical values in the tuple's data rather than by an address on a local machine. An example of a tuple space entry with 3 fields is:

("string", 123, 45.678);

which contains a character string, an integer and a floating-point number, respectively. There are two kinds of tuples in Linda: active tuples, also called live or process tuples, which are under active evaluation, and passive tuples, also called data tuples, which are
entries in the tuple space similar to the example above. Active tuples are created with the eval() function. The function call:

eval("worker", worker());

would create a tuple entry with "worker" in its first field and spawn a new process that immediately calls the worker() function. Passive tuples are created and added to the tuple space with Linda's out() function. The function call:

out("string", 123, 45.678);

would create the tuple and add it to the tuple space.
Data can be either read or removed from the tuple space. A template retrieves a tuple from the tuple space by matching a pattern against the tuple's fields. The following conditions must be met for a template to match a tuple:
1. The template and tuple must have the same number of fields.
2. All literal values in corresponding fields must have the same types, values and lengths.
3. All formals (typed wildcards) in corresponding fields must have matching types and lengths.
A read operation, using the rd() function, leaves the tuple in place for other processes to access. The function call:

rd("string", 123, ? A);

reads a three-field tuple that has "string" as its first element and 123 as its second; the data in the third field is placed in the variable A. The in() function gets and removes an entry from the tuple space. The function call:

in("string", 123, ? A);

gets a three-field tuple that has "string" as its first element and 123 as its second; the data in the third field is placed in the variable A, and the entry is removed from the tuple space.
Programming for a tuple space is similar to programming for shared memory, because all participating processes share it; it is also similar to message passing, because entries are posted to it and taken from it. The major benefit of this design is that participants can enter and leave the system without formally announcing an arrival or departure. They can also take messages, data or tasks from the tuple space at their own pace, which balances the workload, giving more work to machines capable of greater performance, and decreases the overall duration of a given task. Tuple spaces and load balancing will be discussed further in later sections.
It should also be noted that Linda tuple spaces do not observe a first-in, first-out (FIFO) discipline. Reading or retrieving an entry may not obtain the oldest entry, which can cause programming errors if a FIFO order is assumed. Linda parallel programs are written with both the master and worker programs in the same source file: the master function is the main function, and the worker is a named function. Linda has its own built-in compiler. To compile and execute a distributed network application, type:

[c615111@owin ~/linda ]>clc -o hello hello.cl
[c615111@owin ~/linda ]>ntsnet hello

The clc command invokes Linda's compiler, and the ntsnet command executes the hello program as a network application. An example of a Linda master (main) function for the "Hello worker—Hello master" application is:
// hello.cl program
#define NUM_WKRS 3

real_main(int argc, char* argv[]){
    int i;              // Loop counter
    int worker();       // Worker function declaration
    char buf[100];      // Message string buffer
    char host[128];     // Host machine name

    // Get the host machine name
    gethostname(host, sizeof(host));
    // Print the master's name
    printf("Master: Name is %s\n", host);
    // Put the message in the buffer
    sprintf(buf, "Hello workers from %s", host);
    // Put the message in the tuple space
    out("message", buf);
    // Start the workers
    for (i = 0; i < NUM_WKRS; i++)
        // Start an active tuple (a worker process)
        eval("worker", worker(i));
    // Get every worker's reply from the tuple space
    for (i = 0; i < NUM_WKRS; i++){
        // Get a reply and remove it from the tuple space
        in("reply", ? buf);
        // Print the reply to the screen
        printf("Master: %s\n", buf);
    }
    // Print the end message to the screen
    printf("Master: Application is finished\n");
    // End the master
    return(0);
}
An example of the worker function is:

// The worker function
worker(int i){
    char buf[100];      // Message string buffer
    char host[128];     // Host machine name

    // Get the host machine name
    gethostname(host, sizeof(host));
    // Read the message from the tuple space (the tuple is left in place)
    rd("message", ? buf);
    // Print the message to the screen
    printf("Worker: %s number %d got %s\n", host, i, buf);
    // Put the reply message in the buffer
    sprintf(buf, "Hello master from %s number %d", host, i);
    // Put the reply in the tuple space
    out("reply", buf);
    // Print the end message to the screen
    printf("Worker: %s finished\n", host);
    // End the worker
    return(0);
}
Linda prints both the master's and the workers' output to the master's screen. The screen output on the master machine would resemble:

[c615111@owin ~/fpc01 ]>ntsnet hello
Master: Name is owin
Worker: saber number 1 got Hello workers from owin
Worker: owin number 0 got Hello workers from owin
Worker: owin finished
Worker: sarlac number 2 got Hello workers from owin
Master: Hello master from sarlac number 2
Worker: saber finished
Worker: sarlac finished
Master: Hello master from saber number 1
Master: Hello master from owin number 0
Master: Application is finished

It should also be noted that global variables in Linda applications are not transferred to workers; using global variables will have unpredictable results. lix
Parallel Programming Concepts

Stateless Parallel Processing (SPP)

The Stateless Parallel Processing architecture comprises "fully configured computers" connected by a "multiple redundant switching network" that forms a "unidirectional virtual ring network," as shown below. Multiple direct paths are provided from each node to every other node. This redundancy allows for scalable performance and fault tolerance.
[Figure: The Stateless Parallel Processing Architecture — fully configured computers joined by a multiple redundant switching network that forms a unidirectional virtual ring network]
Please note that the unidirectional "virtual" ring is implemented through the multiple redundant switching network's hardware and is not an actual physical ring; each computer might have only one network interface adapter. Each node on the virtual ring is aware of every other node, because each maintains a current list of all participating nodes. Each node can also detect and isolate faulty nodes. The SPP virtual ring's responsibility is limited to tuple queries and SPP backbone management; tuple data is transmitted directly from point to point. The ring also provides full-bandwidth support for multicast communication through the network, where all nodes can access multicast
messages. The diagram below shows a conceptual representation of a unidirectional virtual ring, where the arrows might represent a single multicast message that all nodes can acquire. The multiple-switch network can transport a massive amount of data between machines.
[Figure: The Unidirectional Virtual Ring Configuration — nodes P1 through P8 arranged in a ring]
The tuple space model allows participating processes to acquire messages from a current tuple space without temporal restrictions. Processes can take messages when they are ready without causing a work stoppage, unlike communication methods that use a blocking send. In this design, tuples flow freely through the network from process to process. Each process performs its part of the task by taking work data tuples from the tuple space at its own pace. The processes are purely data-driven and activate or continue processing only when they receive the required data. There are no explicit global state controls in this "stateless" system, which ensures fault tolerance: if a process fails, the system can recover because the data can be renewed in the tuple space and taken by another worker process.
SPP applications use a parallel processing model called "scatter and gather," involving master and worker processes. A master process is the application controller for the worker processes. In a single-task, single-pass application, it divides the task into n subtasks, places the work data tuples in a tuple space, collects the completed subtasks from a tuple space, and directs the workers to terminate when all of the results are received. The three diagrams below show possible tuple space contents during an application's execution.
[Figure: three snapshots of tuple spaces shared by nodes Owin, Saber, Sarlac and Luke — left: a problem tuple space holding message tuples and data tuples 1..n; center: a result tuple space holding result tuples 1..n; right: a problem tuple space holding the message tuples and a termination tuple]
The left-most diagram shows a problem tuple space, where work data is stored, after<br />
messages to workers and work data tuples have been received. The center shows a result<br />
tuple space, where the master will receive completed subtasks. The right-most diagram shows<br />
a problem tuple space with a termination tuple, also called a poison pill, which instructs<br />
the workers to terminate. Notice that the message tuples remain in the tuple space and<br />
that the data tuples are removed. This is because the messages were accessed by a read<br />
operation and the data tuples were accessed by a take operation. If the termination message<br />
is accessed by a take operation, it must be replaced so that the next worker can access it.<br />
This scenario assumes a parallel system that can create multiple tuple spaces, such as<br />
Synergy. If the system is limited to one tuple space, then it depends more heavily on name pattern<br />
matching of tuples.<br />
The master program with its accompanying tuple spaces can reside on any participating<br />
node. The worker processes take work tuples that match a tuple query from the problem<br />
tuple space, put the results into the result tuple space until all work is completed, and<br />
terminate when they get the termination message tuple from the master. The diagram<br />
below shows a possible master-worker configuration. It should be noted that the master<br />
machine generally has both a master process <strong>and</strong> a worker process. Otherwise a valuable<br />
system resource would be wasted because the master machine would be idle between<br />
receiving results.<br />
[Diagram: a node running the master program (M) and nodes running worker programs (W), connected through a multiple switching network; initial requests are multicast on a virtual ring network.]<br />
The SPP Architectural Support<br />
Stateless Machine (SLM)<br />
A stateless machine (SLM) is a fully implemented stateless parallel processing system.<br />
An SLM should provide an API that offers a robust but easy-to-use interface to the<br />
system’s functionality. It should have a fault-tolerance facility to recover from dropped<br />
hosts and lost data. The network structure should offer high efficiency and high<br />
performance. The locations of processes should be transparent to all participating<br />
processes in the application, meaning that the system should handle communication<br />
between machines without being directly noticeable to running programs. The workload<br />
should be balanced among the participating processes, where each process is kept busy<br />
until all work is complete.<br />
Linda Tuple Spaces Revisited<br />
As previously mentioned, the tuple space was first defined in the Linda distributed<br />
parallel programming implementation as a method of multi-machine inter-process<br />
coordination. It’s easiest to think of a Linda tuple space as a buffer, a virtual bag or a<br />
public repository that cooperating processes from different computers can put tuples in,<br />
or read <strong>and</strong> get tuples from. It’s a type of distributed shared memory, where any process<br />
can access any tuple, regardless of its storage location. A tuple space is not a physical<br />
shared memory. It is a logical shared memory because processes have to access it<br />
through an intermediary or tuple h<strong>and</strong>ling process. The API only makes the tuple space<br />
appear to be physically shared memory. The computers, though physically dispersed,<br />
must be part of some distributed system. The machines can communicate with each other<br />
without being directly aware that the other machines exist, other than through the data passed<br />
through the tuple space. Heterogeneous data types can be stored in tuples and differently<br />
structured tuples can be placed in the tuple space. Hence, all of the following data types:<br />
char name[4] = "Bob";<br />
int number = 12;<br />
double fraction = 34.56;<br />
can be placed in the same tuple:<br />
(name, number, fraction)<br />
<strong>and</strong> all of the following tuples:<br />
(name, number, fraction)<br />
(102, 73, 36, 125, 67.5, 1000)<br />
(“Sally”, “123 Broad St”, “Philadelphia PA 19024”, “555-123-4567”)<br />
can be placed in the same tuple space.<br />
[Diagram: hosts Owin, Saber, Sarlac and Luke sharing a Tuple Space that holds the tuples ("Bob", 12, 34.56), (102, 73, 36, 125, 67.5, 1000) and ("Sally", "123 Broad St", "Philadelphia PA 19024", "555-123-4567").]<br />
Tuples are placed in <strong>and</strong> retrieved from tuple spaces by function calls, previously<br />
described, that match a pattern from a template. A template is essentially a tuple that is<br />
used to express a pattern. The template:<br />
(? A, 12, ? B)<br />
where A is a string <strong>and</strong> B is a double, matches:<br />
(name, number, fraction) = (“Bob”, 12, 34.56)<br />
However, this template will not match the other tuples in the example above. The<br />
general rules for a Linda tuple were stated previously. This is called an associative<br />
memory because elements or tuples in the memory are accessed by matching a pattern<br />
against their content, as opposed to being referenced by a memory address or physical<br />
location.<br />
Active tuples in Linda are based on the generative communication model, where<br />
dynamically spawned processes are turned into data upon completion of their task. The<br />
eval(“worker”, worker()) function will leave a tuple in the tuple space with two fields<br />
from the called worker function:<br />
worker(){<br />
// perform task<br />
return 0;<br />
}<br />
will place a tuple whose first field is the name assigned by the process that spawned the<br />
worker function (in this case “worker”) and whose second field is the worker function’s<br />
return value. All tuples placed by the worker into the tuple space remain accessible to all<br />
other processes even after the worker terminates. The tuple from the example above after<br />
the eval() function returns would be:<br />
(“worker”, 0)<br />
Since the concept was pioneered at Yale, many languages have been implemented using<br />
variants of Linda’s tuple space model, including LiPS, ActorSpaces, TSpace,<br />
PageSpaces, OpenSpaces, Jini/Javaspaces, <strong>Synergy</strong>, etc.<br />
Theory <strong>and</strong> Challenges of Parallel Programs <strong>and</strong> Performance<br />
Evaluation<br />
Basic Logic<br />
Logic is the study of the reasoning of arguments <strong>and</strong> is both a branch of mathematics <strong>and</strong><br />
a branch of philosophy. In the mathematical sense, it is the study of mathematical<br />
properties <strong>and</strong> relations, such as soundness <strong>and</strong> completeness of arguments. In the<br />
philosophical sense, logic is the study of the correctness of arguments. A logic<br />
consists of a formal language coupled with model-theoretic semantics and/or a<br />
deductive system. The language allows the arguments to be stated, which is similar to<br />
the way we state our thoughts in written or spoken languages. The semantics provide a<br />
definition of possible truth-conditions for arguments <strong>and</strong> the deductive system provides<br />
inferences that are correct for the given language.<br />
This section introduces formal logics that can be used as methods to design program logic<br />
<strong>and</strong> prove that the logic is sound. Systems based on propositional logic have been<br />
produced to facilitate the design <strong>and</strong> proofs for sequential programs. However, these<br />
systems were inadequate for concurrent applications. Variations of temporal logic, which<br />
is based on modal logic, are used to evaluate the logic of concurrent programs.<br />
Propositional Logic<br />
Symbolic logic is divided into several parts of which propositional calculus is the most<br />
fundamental. A proposition, or statement, is any declarative sentence that is either<br />
true or false. We refer to true (T) or false (F) as the truth-value of the statement.<br />
“1 + 1 = 2” is a true statement.<br />
“1 + 1 = 11” is a false statement.<br />
“Tomorrow will be a sunny day” is a proposition whose truth is yet to be determined.<br />
“The number 1” is not a proposition because it is not a sentence.<br />
Simple statements are those that represent a single idea or subject <strong>and</strong> contain no other<br />
statements within. Simple statements will be represented by the symbols: p, q, r <strong>and</strong> s. If<br />
p st<strong>and</strong>s for the proposition: “ice is cold”, we denote it as:<br />
p: “ice is cold”,<br />
which is read as:<br />
p is the statement “ice is cold”.<br />
The following is an example of a simple statement assertion <strong>and</strong> negation.<br />
p assertion p is true if p is true or p is false if p is false.<br />
¬p negation ¬p is false if p is true or ¬p is true if p is false.<br />
Then for the true statement: p: “ice is cold”, ¬p is the statement that “ice is not cold”,<br />
which is false.<br />
A compound statement is made up of two or more simple statements. The simple<br />
statements are known as components of the compound statement. These components<br />
may be made up of smaller components. Operators, or connectives, separate<br />
components. The sentential connectives are disjunction (∨, pronounced OR),<br />
conjunction (∧, pronounced AND), implication (→, pronounced IF) and equivalence<br />
(↔, pronounced IF AND ONLY IF). These are called sentential because they join<br />
statements, or sentences, into compound sentences. They are binary operators because<br />
they operate on two components or statements. Equivalence statements (p↔q) are also<br />
called biconditionals, <strong>and</strong> implication statements (p→q) are also called conditionals. In<br />
the p → q conditional statement, the "if- clause" or first statement, p, is called the<br />
antecedent <strong>and</strong> the "then-clause" or second statement, q, is called the consequent. The<br />
antecedent <strong>and</strong> consequent could be compounds in more complicated conditionals rather<br />
than the simple statements shown above. These terms are used for all the binary<br />
operators listed above. Negation (¬) is called a unary operator because it only operates<br />
on one component or statement. The following define the conditions under which<br />
components joined with connectives are true; otherwise they are false:<br />
p∨q disjunction either p is true, or q is true, or both are true<br />
p∧q conjunction both p <strong>and</strong> q are true<br />
p→q implication if p is true, then q is true<br />
p↔q equivalence p <strong>and</strong> q are either both true or both false<br />
The statements:<br />
p: “ice is cold”<br />
q: 1 + 1 = 2<br />
r: “water is dry”<br />
s: 1 + 1 = 11<br />
under conjunction:<br />
p∧q is true because “ice is cold” is true <strong>and</strong> “1 + 1 = 2” is true<br />
p∧r is false because “ice is cold” is true and “water is dry” is false<br />
s∧q is false because “1 + 1 = 11” is false <strong>and</strong> “1 + 1 = 2” is true<br />
r∧s is false because “water is dry” is false <strong>and</strong> “1 + 1 = 11” is false<br />
All meaningful statements will have a truth-value. The truth-value of a statement<br />
designates the statement as true T or false F. The statement p is either absolutely true or<br />
absolutely false. If a compound statement’s truth-value can be determined in its entirety<br />
based solely on its components, the compound statement is said to be truth-functional. If<br />
a connective constructs compounds that are all truth-functional, the connective is said to<br />
be truth-functional. Using these conditions it is possible to build truth-functional<br />
compounds from other truth-functional compounds <strong>and</strong> connectives. As an example: if<br />
the truth-values of p <strong>and</strong> of q are known, then we could deduce the truth-value of the<br />
compound using the disjunction connective, p∨q. This establishes that the compound,<br />
p∨q, is a truth-functional compound <strong>and</strong> disjunction is a truth-functional connective. A<br />
truth table contains all possible truth-values for a given statement. The truth table for p<br />
is:<br />
p<br />
T<br />
F<br />
because the simple statement p is either absolutely true or absolutely false. The<br />
following is the truth table of p and q for the five previously mentioned operators:<br />
p q ¬p ¬q p∨q p∧q p→q p↔q<br />
T T F F T T T T<br />
T F F T T F F F<br />
F T T F T F T F<br />
F F T T F F T T<br />
Parentheses ( ) are used to group components into whole statements. The whole<br />
compound statement p∧q can be negated by grouping it with parentheses <strong>and</strong> negating<br />
the group ¬(p∧q). The table below shows the negated truth-values for the operators in the<br />
previous table.<br />
p q ¬(¬p) ¬(¬q) ¬(p∨q) ¬(p∧q) ¬(p→q) ¬(p↔q)<br />
T T T T F F F F<br />
T F T F F T T T<br />
F T F T F T F T<br />
F F F F T T F F<br />
To avoid an excessive number of parentheses in statements, there is a st<strong>and</strong>ard for<br />
operator precedence. This simply means the order in which operations are performed.<br />
Negation has precedence over conjunction and conjunction has precedence over<br />
disjunction. The statement:<br />
¬p∨q is (¬p)∨q not ¬(p∨q)<br />
and<br />
¬p∨q∧r is (¬p)∨(q∧r)<br />
A truth table will have 2^n rows, where n is the number of distinct simple statements in the<br />
whole statement. The first truth table for p had only two rows and the previous two had<br />
four rows. If p, q and r were under consideration, there would be eight rows. To find<br />
which values of p, q and r evaluate to true for P(p, q, r) = ¬(p∨q)∧(r∨p), construct a<br />
truth table for the statement. Start by placing true values in the top row and false values<br />
in the row below for one instance of each unique simple statement, as<br />
shown below. The last row records the steps performed by operator precedence<br />
and parentheses. Mark all simple statements step 1.<br />
¬ (p ∨ q) ∧ (r ∨ p)<br />
T T T<br />
F F F<br />
1 1 1 1<br />
Then assume all F’s are 0’s <strong>and</strong> all T’s are 1’s, <strong>and</strong> count up the table from 0 to 7 in<br />
binary. Then copy values to all other duplicate simple statements.<br />
¬ (p ∨ q) ∧ (r ∨ p)<br />
T T T T<br />
T T F T<br />
T F T T<br />
T F F T<br />
F T T F<br />
F T F F<br />
F F T F<br />
F F F F<br />
1 1 1 1<br />
This holds all combinations of F’s and T’s relative to the three simple statements.<br />
Remember the pattern in the columns and you won’t have to count next time. Next mark<br />
the second set of columns to be evaluated by precedence and fill in the truth-values.<br />
Because of the parentheses, the next columns will be the third and seventh.<br />
¬ (p ∨ q) ∧ (r ∨ p)<br />
T T T T T T<br />
T T T F T T<br />
T T F T T T<br />
T T F F T T<br />
F T T T T F<br />
F T T F F F<br />
F F F T T F<br />
F F F F F F<br />
1 2 1 1 2 1<br />
Negation has precedence over conjunction. Hence the first column is the negation of the<br />
third. To find the truth-values for the conjunction, consider the columns with the highest<br />
step numbers in the last row on each side: column one on the left and column seven on the right.<br />
¬ (p ∨ q) ∧ (r ∨ p)<br />
F T T T F T T T<br />
F T T T F F T T<br />
F T T F F T T T<br />
F T T F F F T T<br />
F F T T F T T F<br />
F F T T F F F F<br />
T F F F T T T F<br />
T F F F F F F F<br />
3 1 2 1 4 1 2 1<br />
The statement is only true for P(p, q, r) = P(F, F, T).<br />
Again with p, q and r under consideration, to find which values of p, q and r evaluate to<br />
true for Q(p, q, r) = (p→q)∧[(r↔p)∨(¬p)], construct a truth table for the statement. Also note<br />
that brackets [ ] and braces { } can be used to differentiate compound groupings up to<br />
three levels.<br />
(p → q) ∧ [(r ↔ p) ∨ (¬ p)]<br />
T T T T T T T T F T<br />
T T T F F F T F F T<br />
T F F F T T T T F T<br />
T F F F F F T F F T<br />
F T T T T F F T T F<br />
F T T T F T F T T F<br />
F T F T T F F T T F<br />
F T F T F T F T T F<br />
1 2 1 4 1 2 1 3 2 1<br />
There are three types of propositional statements that can be deduced from all truth-functional<br />
statements:<br />
• If the truth-value column for the table has a mixture of T’s <strong>and</strong> F’s, the table’s<br />
statement is called a contingency.<br />
• If the truth-value column contains all T’s, the statement is called a tautology.<br />
• Lastly, if the truth-value column contains all F’s, the statement is called a<br />
contradiction.<br />
The following logical equivalences apply to any combination of statements used to create<br />
larger compound statements. The p’s, q’s and r’s can be atomic statements or compound<br />
statements.<br />
The Double Negative Law ¬(¬p) ≡ p<br />
The Commutative Law for conjunction p∧q ≡ q∧p<br />
The Commutative Law for disjunction p∨q ≡ q∨p<br />
The Associative Law for conjunction (p∧q)∧r ≡ p∧(q∧r)<br />
The Associative Law for disjunction (p∨q)∨r ≡ p∨(q∨r)<br />
DeMorgan's Law for conjunction ¬(p∧q) ≡ (¬p)∨(¬q)<br />
DeMorgan's Law for disjunction ¬(p∨q) ≡ (¬p)∧(¬q)<br />
The Distributive Law for conjunction p∧(q∨r) ≡ (p∧q)∨(p∧r)<br />
The Distributive Law for disjunction p∨(q∧r) ≡ (p∨q)∧(p∨r)<br />
Absorption Law for conjunction p∧p ≡ p<br />
Absorption Law for disjunction p∨p ≡ p<br />
Conditional using negation and disjunction p→q ≡ (¬p)∨q<br />
Equivalence using conditionals and conjunction p↔q ≡ (p→q)∧(q→p)<br />
Predicate Calculus<br />
Another part of symbolic logic is predicate calculus, which is built from propositional<br />
calculus. Predicate calculus allows logical arguments based on some or all variables<br />
under consideration. Consider the following arguments, which cannot be expressed in<br />
propositional logic:<br />
All dogs are mammals<br />
Fido is a dog<br />
Therefore, Fido is a mammal<br />
The three statements:<br />
p: All dogs are mammals<br />
q: Fido is a dog<br />
r: Fido is a mammal<br />
are of the form:<br />
p<br />
q<br />
∴ r<br />
can be independently evaluated under propositional logic but cannot be evaluated to<br />
derive the conclusion “r: Fido is a mammal” because “therefore” (‘∴’) is not a legitimate<br />
propositional logic operator. We need to exp<strong>and</strong> propositional calculus <strong>and</strong> set theory to<br />
make use of the predicate calculus.<br />
We use the universal quantifier ∀, which means for all or for every, to establish a<br />
symbolic statement that includes all of the things in a set X that we are considering as<br />
such:<br />
∀x[Px→Qx]<br />
The brackets define the scope of the quantifier. This example is read “For every variable<br />
x in set X, if Px then Qx”. Applied to the example above, we could reword the statement<br />
“All dogs are mammals” by letting Px be “x is a dog” and Qx be “x is a mammal”.<br />
We have:<br />
“For all x, if x is a dog, then x is a mammal”.<br />
This is called a statement form <strong>and</strong> will become a statement when x is given a value. Let<br />
f = Fido. A syllogism is a predicate calculus argument with two premises sharing a<br />
common term.<br />
∀x[Px→Qx]<br />
Pf<br />
∴ Qf<br />
The predicate P means “is a dog” <strong>and</strong> Q means “is a mammal”. The conclusion states<br />
that because Fido is a dog, Fido is a mammal. If we negate the quantifier as such:<br />
¬∀x[Px→Qx]<br />
The statement becomes:<br />
“Not every dog is a mammal”,<br />
which sounds ridiculous, but the statement is permissible by predicate logic. We can<br />
change this to:<br />
∀x[Px→¬Qx]<br />
which translates to:<br />
“No dog is a mammal”.<br />
Mathematical statements can also be constructed using predicate calculus. The statement:<br />
“If an integer is less than 11, then it is less than 11”<br />
can be converted using the universal quantifier so that it is true for every<br />
integer x (x ∈ N) as such:<br />
∀x ∈ N [(x < 10) → (x < 11)]<br />
The existential quantifier ∃, which means there exists or for some, establishes a symbolic<br />
statement about at least one of the things in a set. Consider the statement: “Some lawyers<br />
speak the truth”.<br />
If we let Px be “x is a lawyer” <strong>and</strong> Qx be “x speaks the truth”, we have:<br />
∃x [Px ∧ Qx],<br />
which states that at least one lawyer speaks the truth. Quantifiers can be applied to more<br />
than one variable in a statement.<br />
Let P be “is a shoe in my closet”, where x is a right shoe <strong>and</strong> y is a left shoe. Then:<br />
∀x, ∃y[Px ∧ Py],<br />
is a symbolic representation of the statement: “For every right shoe in my closet, there<br />
exists a left shoe”. A mathematical statement would be:<br />
∃z ∈ N [x = y×z], x ∈ N, y ∈ N,<br />
which states that there exists an integer z such that integer x is divisible by integer y.<br />
Modal Logic<br />
Modal logic extends the capabilities of traditional logic to include modal expressions,<br />
which contain premises such as “it is necessary that…” or “it is possible that…”. Modal<br />
logic is the study of deductive behavior of expressions based on necessary <strong>and</strong>/or<br />
possible premises. Modal logic can also be defined as a family of related logical systems<br />
that include logics for belief <strong>and</strong> temporal related expressions. The table below contains<br />
some common symbols <strong>and</strong> definitions used in the modal logic family:<br />
Logic Symbols Expressions Symbolized<br />
Modal Logic □ It is necessary that …<br />
◊ It is possible that …<br />
Deontic Logic O It is obligatory that …<br />
P It is permitted that …<br />
F It is forbidden that …<br />
Temporal Logic G It will always be the case that …<br />
F It will be the case that …<br />
H It has always been the case that …<br />
P It was the case that …<br />
Doxastic Logic Bx x believes that …<br />
A popular weak modal logic K, conceived by Saul Kripke, defines three operators:<br />
“negation” (¬), “if…then…” (→), and “it is necessary that…” (□). The other<br />
connectives, “and” (∧), “or” (∨), and “if and only if” (↔), can be defined by ¬ and → as<br />
in propositional logic. The operator “possibly” (◊) can be defined by ◊A = ¬□¬A. In<br />
addition to the standard rules in propositional logic, K has the following rules:<br />
Necessitation Rule: If A is a theorem of K, then so is □A.<br />
Distribution Axiom: □(A → B) → (□A → □B).<br />
The necessitation rule states that all theorems are necessary and the distribution axiom<br />
states that “if it is necessary that if A then B, then if necessarily A then necessarily B”. A<br />
and B range over all possible formulas for the language.<br />
(M) □A → A<br />
(4) □A → □□A<br />
(5) ◊A → □◊A<br />
(B) A → □◊A<br />
(S4): □□…□ = □ and ◊◊…◊ = ◊<br />
(S5): OO…□ = □ and OO…◊ = ◊, where each O is either □ or ◊<br />
Axiom Name Axiom Condition on Frames R is ...<br />
(D) □A → ◊A ∃u wRu Serial<br />
(M) □A → A wRw Reflexive<br />
(4) □A → □□A (wRv ∧ vRu) → wRu Transitive<br />
(B) A → □◊A wRv → vRw Symmetric<br />
(5) ◊A → □◊A (wRv ∧ wRu) → vRu Euclidean<br />
(CD) ◊A → □A (wRv ∧ wRu) → v = u Unique<br />
(□M) □(□A → A) wRv → vRv Shift Reflexive<br />
(C4) □□A → □A wRv → ∃u(wRu ∧ uRv) Dense<br />
(C) ◊□A → □◊A (wRv ∧ wRx) → ∃u(vRu ∧ xRu) Convergent<br />
Temporal Logic<br />
P "It has at some time been the case that …"<br />
F "It will at some time be the case that …"<br />
H "It has always been the case that …"<br />
G "It will always be the case that …"<br />
Pp ≡ ¬H¬p<br />
Fp ≡ ¬G¬p<br />
Gp→Fp "What will always be, will be"<br />
G(p→q)→(Gp→Gq) "If p will always imply q, then if p will always be the case, so will q"<br />
Fp→FFp "If it will be the case that p, it will be — in between — that it will be"<br />
¬Fp→F¬Fp "If it will never be that p then it will be that it will never be that p"<br />
p→HFp "What is, has always been going to be"<br />
p→GPp "What is, will always have been"<br />
H(p→q)→(Hp→Hq) "Whatever always follows from what always has been, always has been"<br />
G(p→q)→(Gp→Gq) "Whatever always follows from what always will be, always will be"<br />
RH: From a proof of p, derive a proof of Hp<br />
RG: From a proof of p, derive a proof of Gp<br />
F∃xp(x)→∃xFp(x) ("If there will be something that is p, then there is now something that will be p")<br />
Spq "q has been true since a time when p was true"<br />
Upq "q will be true until a time when p is true"<br />
Pp ≡ Sp(p∨¬p)<br />
Fp ≡ Up(p∨¬p)<br />
Pp ≡ ∃n(n < 0 ∧ Fnp)<br />
Hp ≡ ∀n(n < 0 → Fnp)<br />
Op ≡ Up(p∧¬p)<br />
Fp ≡ Op ∨ OFp<br />
Pp is true at t if and only if p is true at some time t′ such that t′ < t.<br />
Gp→Fp corresponds to the frame condition ∀t∃t′(t < t′).<br />
Petri Net<br />
Amdahl’s Law<br />
Gene Amdahl, a computer architect, entrepreneur, former IBM employee <strong>and</strong> one of the<br />
creators of the IBM System 360 architecture, devised this method in 1967 to determine<br />
the maximum expected improvement to a system when only part of it has been improved.<br />
He presented this as an argument against parallel processing. This law is similar to the<br />
law of diminishing returns, which states that as more input is applied, each additional<br />
input unit will produce less additional output. Amdahl’s law states that a number of<br />
functions or operations must be executed sequentially, limiting the gain in a computer’s speed<br />
when more processors are added. In other words, the work that must be<br />
completed sequentially limits computational speedup. This causes a bottleneck in the<br />
workflow, slowing the overall task. However, as the size of a task increases, the effect of<br />
Amdahl’s law decreases. The speedup of a system is:<br />
S = unimproved_time / improved_time = performance_with_improvement / performance_without_improvement<br />
If you make an improvement that greatly increases performance (maybe 100 times or<br />
more) in part of a computation, but that part accounts for only 25 percent of the overall<br />
computation, then the upper limit for speedup S is:<br />
S = unimproved_time / improved_time = 1.00 / (1.00 − 0.25) = 1.333<br />
Note: The unimproved execution time is 1.00 = 100% because this example makes use of<br />
the ratio between the two times, not the actual values. Assume that an unimproved<br />
computation takes 4 seconds <strong>and</strong> the improved computation takes 3 seconds. The<br />
equation is:<br />
S = unimproved_time / improved_time = 4 sec / 3 sec = 1.333<br />
If the improved computation is taken to be 100 percent performance, then by the<br />
relationship above the unimproved computation has 75 percent performance with respect<br />
to the improved.<br />
S = performance_with_improvement / performance_without_improvement = 100 / 75 = 1.333<br />
If a computation is improved such that it affects a proportion Fp of the computation, then<br />
the improvement will have a speedup S on that portion. The improved time for the<br />
computation will be equal to the unimproved time multiplied by the sum of the<br />
unaffected portion (1 − Fp) and the speedup-reduced affected portion (Fp ÷ S) of the task. To<br />
find the improved execution time we use:<br />
improved_time = unimproved_time × [(1 − Fp) + Fp/S]<br />
Continuing the example above with an affected portion of 40 percent and a speedup of<br />
2.66 times on this portion, we have:<br />
improved_time = 4 × [(1 − 0.4) + 0.4/2.66] = 4 × (0.6 + 0.15) = 4 × 0.75 = 3<br />
This method states, assuming that the speed of the unimproved computation<br />
is 100 percent, that the overall speedup for this computational improvement will be:<br />
S = unimproved_time / improved_time = 1 / [(1 − Fp) + Fp/S]<br />
Then plugging in the example proportional values:<br />
S = 1 / [(1 − 0.4) + 0.4/2.66] = 1 / 0.75 = 1.33<br />
Using time values instead of proportions, we have:<br />
S = \frac{4\ \text{sec}}{(4\ \text{sec} - 1.6\ \text{sec}) + \frac{1.6\ \text{sec}}{2.66}} = \frac{4\ \text{sec}}{3\ \text{sec}} = 1.33
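As a quick numeric check of the two formulas above, here is a small sketch in Python (the helper names are ours, not part of the manual's toolset):

```python
# Speedup of a partially improved computation: a fraction f_p of the task
# is sped up by a factor s; the rest runs at the original speed.
def improved_time(unimproved_time, f_p, s):
    return unimproved_time * ((1 - f_p) + f_p / s)

def overall_speedup(f_p, s):
    return 1.0 / ((1 - f_p) + f_p / s)

# The example above: a 4-second task, 40% of which is sped up 2.66x.
print(improved_time(4.0, 0.4, 2.66))   # ~3.0 seconds
print(overall_speedup(0.4, 2.66))      # ~1.33
```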
Amdahl’s law for parallelization states that, given the sequential fraction F_s of a task that cannot be performed in parallel and the fraction F_p = (1 − F_s) that can, the maximum speedup achievable with N_p processors is:
S = \frac{1}{F_s + \frac{1 - F_s}{N_p}}
As N_p approaches infinity, the maximal speedup approaches 1/F_s. As the (1 − F_s)/N_p term becomes very small, the price paid for marginal performance gains increases. Assume that F_s = 0.06; then F_p = 1 − F_s = 0.94. For 4 processors:
S = \frac{1}{0.06 + \frac{1 - 0.06}{4}} = \frac{1}{0.06 + \frac{0.94}{4}} = \frac{1}{0.06 + 0.235} = \frac{1}{0.295} = 3.3898
The table below shows the run time, speedup, efficiency and cost for processors N_p = {1, 2, 4, …, 1024}, where F_s = 0.06 and F_p = 0.94. Notice that the speedup gained per additional processor diminishes as N_p increases, causing greater cost and lower efficiency. The graphs show the effect on speedup (y-axis) with respect to F_s (x-axis) for increasing N_p.
Processors (N_p) 1 2 4 8 16 32 64 128 256 512 1024
Run Time 1024.00 542.72 302.08 181.76 121.60 91.52 76.48 68.96 65.20 63.32 62.38<br />
Speedup 1.0000 1.8868 3.3898 5.6338 8.4211 11.1888 13.3891 14.8492 15.7055 16.1718 16.4155<br />
Efficiency 100.00% 94.34% 84.75% 70.42% 52.63% 34.97% 20.92% 11.60% 6.13% 3.16% 1.60%<br />
Cost 1.00 1.06 1.18 1.42 1.90 2.86 4.78 8.62 16.30 31.66 62.38<br />
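The table values can be reproduced with a few lines of Python. This is an illustrative sketch; the function name is ours, and T_1 = 1024 is the single-processor run time used in the table:

```python
# Amdahl's law: run time, speedup, efficiency and cost for N_p processors,
# with serial fraction F_s = 0.06 and single-processor run time T_1 = 1024.
def amdahl(n_p, f_s=0.06, t_1=1024.0):
    run_time = t_1 * (f_s + (1 - f_s) / n_p)
    speedup = t_1 / run_time              # = 1 / (F_s + (1 - F_s)/N_p)
    efficiency = speedup / n_p
    cost = run_time * n_p / t_1
    return run_time, speedup, efficiency, cost

for n_p in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]:
    t, s, e, c = amdahl(n_p)
    print(f"{n_p:5d}  {t:8.2f}  {s:8.4f}  {e:7.2%}  {c:6.2f}")
```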
[Figure: speedup S = 1/(F_s + (1 − F_s)/N_p) plotted against F_s for 0 ≤ F_s ≤ 0.06, in five panels with N_p = 4, 16, 64, 256 and 1024. Each curve falls from S = N_p at F_s = 0 to 3.39, 8.421, 13.389, 15.706 and 16.416 respectively at F_s = 0.06.]
The graphs have values N_p of 4, 16, 64, 256 and 1024. Notice that as N_p increases, the area under the curve decreases, meaning that the non-parallelizable part of the serial program has a greater effect and the degradation occurs faster as N_p increases.
Amdahl’s intention was to show “the continued validity of the single processor approach and of the weaknesses of the multiple processor approach”. His paper offered supporting arguments such as:
• “The nature of this overhead appears to be sequential so that it is unlikely to be<br />
amenable to parallel processing techniques.”<br />
• “A fairly obvious conclusion which can be drawn at this point is that the effort<br />
expended on achieving high parallel performance rates is wasted unless it is<br />
accompanied by achievements in sequential processing rates of very nearly the<br />
same magnitude.”<br />
Gustafson’s Law<br />
In 1988, John L. Gustafson proposed that massively parallel processing was beneficial because Amdahl’s law assumes that the parallel part of the computation is independent of the number of processors [ lxiii ]. He proposed a formula for a scaled speedup based on the observation that in most real-world computations “the problem size scales with the number of processors”. His proposed formula is:
S = \frac{\text{fraction serial} + (\text{fraction parallel} \times \text{number of processors})}{\text{fraction serial} + \text{fraction parallel}} \qquad (\text{fraction serial} + \text{fraction parallel} = 1)

= \frac{F_s + (1 - F_s) \times N_p}{F_s + (1 - F_s)} = \frac{F_s + (1 - F_s) \times N_p}{1} = F_s + N_p - N_p F_s = N_p + (F_s - N_p F_s) = N_p + (1 - N_p) \times F_s
where S is the speedup, F_s is the serial portion and N_p is the number of processors. Again, assume that F_s = 0.06; then F_p = 1 − F_s = 0.94. For 4 processors:
S = N_p + (1 - N_p) \times F_s = 4 + (1 - 4) \times 0.06 = 4 - 0.18 = 3.82
The table <strong>and</strong> graphs below show the same data as in Amdahl but using Gustafson’s law.<br />
Processors (N_p) 1 2 4 8 16 32 64 128 256 512 1024
Run Time 1024.0000 527.8351 268.0628 135.0923 67.8146 33.9748 17.0043 8.5064 4.2543 2.1274 1.0638<br />
Speedup 1.0000 1.9400 3.8200 7.5800 15.1000 30.1400 60.2200 120.3800 240.7000 481.3400 962.6200<br />
Efficiency 100.00% 97.00% 95.50% 94.75% 94.38% 94.19% 94.09% 94.05% 94.02% 94.01% 94.01%<br />
Cost 1.0000 1.0309 1.0471 1.0554 1.0596 1.0617 1.0628 1.0633 1.0636 1.0637 1.0638<br />
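Gustafson's table can be reproduced the same way; the function names are ours, and the run times are computed as T_1/S with T_1 = 1024 as in the Amdahl table:

```python
# Gustafson's scaled speedup next to Amdahl's fixed-size speedup (F_s = 0.06).
def gustafson_speedup(n_p, f_s=0.06):
    return n_p + (1 - n_p) * f_s

def amdahl_speedup(n_p, f_s=0.06):
    return 1.0 / (f_s + (1 - f_s) / n_p)

for n_p in [1, 4, 64, 1024]:
    g = gustafson_speedup(n_p)
    print(f"N_p={n_p:5d}  Gustafson S={g:9.2f}  run time={1024.0 / g:9.4f}  "
          f"Amdahl S={amdahl_speedup(n_p):8.4f}")
```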
[Figure: Gustafson’s scaled speedup S = N_p + (1 − N_p) × F_s plotted against F_s for 0 ≤ F_s ≤ 0.06, in five panels with N_p = 4, 16, 64, 256 and 1024. The lines decrease almost linearly from S = N_p at F_s = 0 to the speedups shown in the table at F_s = 0.06.]
Consider the following diagrams, which are similar to those in Gustafson’s paper:<br />
[Diagram: fixed-size (Amdahl) model. On a single processor, Time = s_A + p_A = 1; on N_p processors, Time = s_A + p_A/N_p.]
[Diagram: scaled (Gustafson) model. On N_p processors, Time = s_G + p_G = 1; on a single processor the same scaled problem would take Time = s_G + N_p × p_G.]
Under Gustafson’s proposal, increasing the number of processors has little effect on cost or efficiency and yields an almost linear speedup, as shown in the graphs above. The problem with this method of evaluating computational speedup is that the serial and parallel programs perform different numbers of operations on the primary task, because the task for the parallel implementation is N_p times larger than that of the serial one. If the parallelized operation were multiplication of n × n matrices with n_s = 10, there would be 10³ = 1000 multiplication and 1000 addition operations in the serial program. Scaling the problem up for N_p = 4 processors increases the multiplication operations to 4000, so the matrix size n_p must increase to:
n_p = \sqrt[3]{4000} = \sqrt[3]{1000} \times \sqrt[3]{4} = 10 \times 1.5874 \approx 16
Because matrix multiplication has O(n³) complexity, increasing the size of the matrix, even minimally, creates a much bigger job. An observation by Yuan Shi was proposed in [ lxiv ], where an equivalence between Amdahl’s Law and Gustafson’s Law is explained. The relationship is based on an adjusted serial fraction for Amdahl’s Law, call it F_sA, derived from the unadjusted serial fraction used in Gustafson’s Law, call it F_sG, such that:
F_{sA} = \frac{1}{1 + \frac{(1 - F_{sG}) \times N_p}{F_{sG}}}
As an example, consider a task that has serial fraction F_sG = 0.05 with 1024 processors. Amdahl’s Law would predict the speedup S to be:
S = \frac{1}{F_{sG} + \frac{1 - F_{sG}}{N_p}} = \frac{1}{0.05 + \frac{1 - 0.05}{1024}} = \frac{1}{0.05 + \frac{0.95}{1024}} = \frac{1}{0.05 + 0.0009277} = \frac{1}{0.0509277} = 19.635666

Gustafson’s Law predicts:

S = N_p + (1 - N_p) \times F_s = 1024 + (1 - 1024) \times 0.05 = 1024 - 1023 \times 0.05 = 1024 - 51.15 = 972.85
However, when the serial fraction F_sA is calculated from F_sG using the equation above, we have:
F_{sA} = \frac{1}{1 + \frac{(1 - 0.05) \times 1024}{0.05}} = \frac{1}{1 + \frac{972.8}{0.05}} = \frac{1}{1 + 19456} = \frac{1}{19457} = 5.14 \times 10^{-5}
We substitute F_sA for F_sG and solve:
S = \frac{1}{F_{sA} + \frac{1 - F_{sA}}{N_p}} = \frac{1}{5.14 \times 10^{-5} + \frac{1 - 5.14 \times 10^{-5}}{1024}} = \frac{1}{5.14 \times 10^{-5} + \frac{0.99994}{1024}} = \frac{1}{5.14 \times 10^{-5} + 9.7650 \times 10^{-4}} = \frac{1}{1.0279 \times 10^{-3}} = 972.85
For this situation, the claim that Gustafson’s Law gives equivalent results when F_sA is obtained from F_sG, as defined above, and substituted for F_sG in Amdahl’s Law is true. The table below shows that this holds for all numbers of processors, where N_p = {1, 2, 4, 8, …, 1024} and F_sG = 0.05.
Processors N_p 1 2 4 8 16 32 64 128 256 512 1024
F_sG 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05
F_sA 0.05 0.025641 0.012987 0.0065359 0.0032787 0.001642 0.0008217 0.000411 0.0002055 0.0001028 5.14E-05
Amdahl-F_sG 1 1.9047619 3.4782609 5.9259259 9.1428571 12.54902 15.421687 17.414966 18.618182 19.284369 19.635666
Gustafson 1 1.95 3.85 7.65 15.25 30.45 60.85 121.65 243.25 486.45 972.85
Amdahl-F_sA 1 1.95 3.85 7.65 15.25 30.45 60.85 121.65 243.25 486.45 972.85
The table below shows that this is also true for all F_sG, where F_sG = {0.01, 0.02, …, 0.09, 0.1, 0.2} and N_p = 1024.
Processors N_p 1024 1024 1024 1024 1024 1024 1024 1024 1024 1024 1024
F_sG 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.2
F_sA 9.864E-06 1.993E-05 3.02E-05 4.069E-05 5.14E-05 6.233E-05 7.35E-05 8.491E-05 9.657E-05 0.0001085 0.0002441
Amdahl-F_sG 91.184328 47.716682 32.313033 24.427481 19.635666 16.415518 14.102741 12.361178 11.002471 9.9128751 4.9805447
Gustafson 1013.77 1003.54 993.31 983.08 972.85 962.62 952.39 942.16 931.93 921.7 819.4
Amdahl-F_sA 1013.77 1003.54 993.31 983.08 972.85 962.62 952.39 942.16 931.93 921.7 819.4
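The equivalence shown in both tables can be checked directly. This sketch uses our own function names; the identity holds algebraically, so the assertion passes for any F_sG and N_p:

```python
# Shi's adjustment: derive Amdahl's serial fraction F_sA from Gustafson's
# F_sG, then verify Amdahl's law with F_sA reproduces Gustafson's speedup.
def f_sa(f_sg, n_p):
    return 1.0 / (1.0 + (1.0 - f_sg) * n_p / f_sg)

def amdahl(f_s, n_p):
    return 1.0 / (f_s + (1.0 - f_s) / n_p)

def gustafson(f_sg, n_p):
    return n_p + (1 - n_p) * f_sg

for f_sg in (0.01, 0.05, 0.1, 0.2):
    for n_p in (1, 2, 8, 64, 1024):
        assert abs(amdahl(f_sa(f_sg, n_p), n_p) - gustafson(f_sg, n_p)) < 1e-6
print(f_sa(0.05, 1024))   # ~5.14e-05, as in the text
```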
Performance Metrics<br />
Performance metrics are basically measures of computer <strong>and</strong>/or network system behavior<br />
over a given period of time. The four primary types of performance metrics:<br />
• Latency<br />
• Throughput<br />
• Efficiency<br />
• Availability<br />
• Reliability<br />
• Utilization<br />
Latency is also called response time. It is a measure of the delay between the initial time<br />
of a request for some service <strong>and</strong> the time that the service arrives, expressed in units of<br />
elapsed time. The elapsed time between the completion of dialing a phone number <strong>and</strong><br />
the first ring, the time that a router holds a packet, <strong>and</strong> the time spent waiting for a Web<br />
page to be displayed after a hyperlink is clicked are all latency metrics. It can be stated<br />
as a statistical distribution. An example is a server that must acknowledge 99.9% of<br />
client requests in one second or less.<br />
Throughput, also called capacity, is the rate at which results arrive, or the amount of work done in a given time. It is measured in units of quantity per unit time. Megabits per second of data transmitted across a network, transactions completed per minute on a transaction server, and gigabytes of data per second transferred across a system bus are all throughput metrics. The theoretical maximum throughput is called bandwidth. The bandwidth of a 400 MHz, 64-bit data bus is 25.6 Gbit/s (400 MHz × 64 bits), but the actual throughput is less because of padding between data blocks and control protocols.
The ratio of usable throughput to bandwidth is called efficiency. The efficiency of a 400 MHz, 64-bit data bus with a throughput of 20.48 Gbit/s is 80% (20.48 Gbit/s ÷ 25.6 Gbit/s). Goodput is the arrival rate of good data packets across a computer network. If, on average, 920 of every 1000 transmitted packets arrive uncorrupted at the destination, the goodput is said to be 92%.
Availability is the percentage of time that a system is available to provide service. If a<br />
server is down for 15 minutes each day for maintenance, it has 98.96% availability<br />
(1425min ÷ 1440min).<br />
The reliability metric reports the mean time between failures (MTBF), which indicates<br />
the average period that the system is usable. The mean time to repair (MTTR) is the<br />
average time to recover from failures.<br />
Utilization is the percentage of time that a component in the system is active. The capacity, or maximum throughput, of a system is reached when the utilization of its busiest component is 100%. Many systems have a utilization threshold because, as utilization approaches 100%, system latency increases quickly.
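The worked numbers in this section can be tied together in a few lines; all figures are the examples given in the text:

```python
# The worked examples above: bus bandwidth and efficiency, availability,
# and goodput, computed from the figures given in the text.
bandwidth_gbps = 400e6 * 64 / 1e9       # 400 MHz x 64-bit bus -> 25.6 Gbit/s
efficiency = 20.48 / bandwidth_gbps     # measured throughput of 20.48 Gbit/s
availability = (1440 - 15) / 1440       # 15 minutes of downtime per day
goodput = 920 / 1000                    # 920 of 1000 packets arrive intact

print(f"bandwidth    = {bandwidth_gbps:.1f} Gbit/s")
print(f"efficiency   = {efficiency:.0%}")
print(f"availability = {availability:.2%}")
print(f"goodput      = {goodput:.0%}")
```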
Performance metrics for parallel systems include the following:<br />
• Runtime<br />
• Speedup<br />
• Efficiency<br />
• Cost<br />
• Scalability<br />
The run time of a parallel system is the elapsed time from the instant the master or controller program begins execution until the last program in the parallel system terminates. T_s usually denotes the serial or single-processor run time of a task, and T_p usually denotes the parallel run time.
Speedup, usually denoted by S, is the ratio calculated by dividing the serial run time of a<br />
particular task by the parallel run time for the same task:<br />
S = \frac{T_s}{T_p}
As an example, if two n × n matrices are to be multiplied, the operation has complexity Θ(n³). Assuming that the run time for the operation on a single processor is n³, the theoretical speedup, ignoring parallel system overhead, for 2 processors is:

T_1 = n^3, \qquad T_2 = \frac{n^3}{2}, \qquad S = \frac{T_1}{T_2} = \frac{n^3}{n^3 / 2} = 2
Be careful not to make the following mistake for parallel time <strong>and</strong> speedup:<br />
T_s = n^3, \qquad T_p = \left( \frac{n}{2} \right)^3 = \frac{n^3}{8}, \qquad S = \frac{T_s}{T_p} = \frac{n^3}{n^3 / 8} = 8
This assumes a change in the overall problem size, which is false: matrix multiplication requires n³ multiplications and n³ additions regardless of how many processors are used.
Efficiency, usually denoted E, is the ratio of the speedup S to the number of processors N_p; it measures the percentage of time that a processor is working on the primary task. For the matrix multiplication example the efficiency is:

E = \frac{S}{N_p} = \frac{2}{2} = 1 = 100\%
Parallel system overhead T_o can decrease system efficiency. It consists of all the operations necessary to set up and manage the parallel system, divide the task among the processors, transmit the task to the worker processes, collect the results from the processes and compile the results. The run time may also include pieces of the sequential program that cannot be parallelized, T_{1-p}. Hence a more realistic formula for the run time with n processors, T_n, where T_c is the time spent on computation of the task, is:
T_n = T_c + T_o + T_{1-p}
Assume that the following values are valid for the matrix multiplication above:
• Sequential run time T_1 = 120 sec
• Parallel computation time T_c = 60 sec
• Parallel overhead T_o = 20 sec
• Non-parallelizable code T_{1-p} = 0 sec (assumed none)
Then the speedup would be:

T_2 = T_c + T_o + T_{1-p} = 60\ \text{sec} + 20\ \text{sec} + 0\ \text{sec} = 80\ \text{sec}, \qquad S = \frac{120\ \text{sec}}{80\ \text{sec}} = 1.5
This is somewhat less than the previous speedup.<br />
The cost C of a parallel system is the parallel run time T_n multiplied by the number of processors N_p, divided by the sequential run time T_1:

C = \frac{T_n \times N_p}{T_1}
Using the values in the example above, ignoring overhead:

C = \frac{T_n \times N_p}{T_1} = \frac{60\ \text{sec} \times 2}{120\ \text{sec}} = \frac{120\ \text{sec}}{120\ \text{sec}} = 1
This shows that the parallel system is cost-optimal, because the increase in speed is proportional to the number of processors added. Typically, costs are not optimal.
Considering the overhead in the example above, we have:<br />
T<br />
C =<br />
n<br />
× N<br />
T<br />
1<br />
p<br />
80sec×<br />
2 160sec<br />
= = = 1.333.<br />
120sec 120sec<br />
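The matrix-multiplication example can be put together in one sketch (the function name is ours):

```python
# Run time, speedup, efficiency and cost with overhead:
# T_n = T_c + T_o + T_(1-p), S = T_1/T_n, E = S/N_p, C = T_n*N_p/T_1.
def parallel_metrics(t_1, t_c, t_o, t_1p, n_p):
    t_n = t_c + t_o + t_1p
    speedup = t_1 / t_n
    efficiency = speedup / n_p
    cost = t_n * n_p / t_1
    return t_n, speedup, efficiency, cost

# The example above: T_1 = 120 s, T_c = 60 s, T_o = 20 s, no serial residue.
t_n, s, e, c = parallel_metrics(t_1=120, t_c=60, t_o=20, t_1p=0, n_p=2)
print(t_n, s, e, c)   # 80 s, S = 1.5, E = 75%, C = 1.33...
```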
Timing Models<br />
Gathering System Performance Data<br />
Gathering Network Performance Data<br />
Optimal Load Balancing
Load balancing is the efficient distribution of the workload over all available processors,<br />
keeping all processors busy until the task is complete. Not all machines will have the<br />
same computational capacity. Some machines may have lower processor speeds or other<br />
tasks that consume system resources. The idea is to shift more work to processors that<br />
can accommodate it. Optimization is the modification of a system to improve<br />
performance <strong>and</strong> efficiency. Optimal load balancing occurs when the latency of requests<br />
is minimized, computation is distributed equally across all processors, system throughput<br />
is maximized, <strong>and</strong> the system completes all tasks in the least possible time. An<br />
absolutely optimal system is rare <strong>and</strong> can be difficult to produce. Optimization usually<br />
involves compromise. Performance or efficiency in one part of a system may have to be<br />
sacrificed to optimize another part.<br />
Successful optimization requires the development of sound algorithms and a functional prototype. Challenges to load balancing include problems with timing, communication, synchronization, and iterative tasks and branching that may depend on conditions elsewhere in the parallel system. If tasks in a parallel system have differing execution times, one or more processors will have to wait for the longest-executing task to finish. Communication and synchronization occur over some communication channel, such as the system bus or a network. Systems that require an abundance of communication may create a bottleneck in these channels. If a channel is shared between multiple processes, competition for the resource may cause contention when it is heavily loaded. Loops and branches can easily lead to non-deterministic program behavior if measures are not employed to prevent it.
There are two classifications of load balancing: static and dynamic. Static load balancing distributes the workload according to fixed statistics on each processor's ability to perform. Dynamic load balancing shares work by dynamically adjusting job size based on the observed performance of the participating processors. Dynamic load balancing requires more communication and synchronization between processes, which consumes communication time. The tradeoff is that dynamic load balancing can handle unexpected delays when jobs take unreasonable amounts of time, where static load balancing cannot: if a task is taking longer than anticipated, some work can be sent to other processes. The extra communication may decrease throughput, but the processes will be kept busy. It is also important to mention that load balancing should reduce the overall run time for the system. If it takes less time to complete the task without it, we should forgo load balancing. lxv
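The static/dynamic distinction can be illustrated with a minimal work-queue sketch (our own illustration, not Synergy code): workers of unequal speed pull small chunks from a shared queue, so the faster worker naturally ends up doing more of the work.

```python
import queue
import threading
import time

# Dynamic load balancing: a shared queue of small work chunks; each worker
# pulls the next chunk when it becomes free, so faster workers take more.
work = queue.Queue()
for chunk in range(20):
    work.put(chunk)

done = {"fast": 0, "slow": 0}

def worker(name, delay):
    while True:
        try:
            work.get_nowait()
        except queue.Empty:
            return
        time.sleep(delay)       # simulate differing processor speeds
        done[name] += 1

threads = [threading.Thread(target=worker, args=("fast", 0.001)),
           threading.Thread(target=worker, args=("slow", 0.010))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(done)                     # the fast worker processes most chunks
```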
About <strong>Synergy</strong><br />
Blue text: Copied <strong>and</strong> pasted from Getting Started by Dr. Shi<br />
Red text: Copied <strong>and</strong> pasted from syng_man.ps by Dr. Shi<br />
Introduction to The <strong>Synergy</strong> Project<br />
What is <strong>Synergy</strong>?<br />
<strong>Synergy</strong> is a parallel computing system using a Stateless Parallel Processing (SPP)<br />
principle. It is a simplified prototype implementation of a Stateless Machine (SLM). It<br />
lacks backbone fault tolerance <strong>and</strong> stateful process fault tolerance. It is also known to<br />
have an inefficient tuple matching engine in comparison to the full implementation of<br />
SLM.<br />
SPP is based on coarse-grain dataflow processing. A full SLM implementation will<br />
offer, in addition to all benefits that <strong>Synergy</strong> affords, a more efficient tuple matching<br />
engine <strong>and</strong> a non-stop computing platform with total fault tolerance for stateful processes<br />
<strong>and</strong> for the backbone. An SLM can be considered a higher form of Symmetric<br />
MultiProcessor (SMP).<br />
Functionally, Synergy can be thought of as an equivalent to PVM, Linda or MPI/MPICH. Synergy uses passive objects for inter-process(or) communication. It offers programming ease, load balancing and fault tolerance benefits. The application programming interface (API) is a small set of operators defined on the supported object types, such as tuple space, file and database. Synergy programs use a conventional open-manipulate-close sequence for each passive object. Each Synergy program is individually compiled using a conventional compiler and the Synergy Language Injection Library (LIL). A parallel application is synthesized through a configuration specification (CSL) and an automatic processor-binding algorithm. The Synergy runtime system can execute multiple parallel applications on the same cluster at the same time.

The Synergy API blends well into conventional sequential programs. It is particularly helpful for reengineering legacy applications. It even allows parallel processing of mixed PVM and MPI programs.
<strong>Synergy</strong> <strong>and</strong> SPP<br />
Synergy is a prototype implementation of a StateLess Machine (SLM). It uses a Passive Object-Flow Programming (POFP) method to offer programming ease, process fault tolerance and high efficiency on clusters of networked computers.

In principle, a Stateless Parallel Processing (SPP) system requires total location transparency for all processes (running programs). This affords three important non-functional features: ease of programming, fault tolerance and load balancing.

In programming, this means that location-dependent (host address and port) IPC primitives are NOT allowed. Consequently, a special asynchronous IPC layer (of Passive Objects) is used for inter-process communication and synchronization. The SPP runtime system can automatically determine the optimal process-to-processor binding during the execution of a parallel application. This additional IPC layer does carry some overhead in comparison to direct IPC systems such as MPI/PVM. In return, it gives three critical benefits: programming ease, load balancing and fault tolerance support at the architecture level.
Why <strong>Synergy</strong>?<br />
First, one hidden fact not mentioned in the high-performance multiprocessing literature is that using multiple processors for a single application necessarily reduces its availability if any processor failure can halt the entire application. The current state of the art in parallel processing is still under the shadow of this gloomy fact. SPP offers an approach that promises breakthroughs in both high performance and high availability using multiprocessors. Synergy is the first prototype designed to explore architectural flaws and to validate the claims of SPP.

Second, technically, separating functional programming from process coordination and resource management functions can ease parallel programming while maintaining high performance and availability. Although many believe that explicit manipulation of processes and data objects can produce highly optimized parallel code, we believe ease of programming, high performance and high availability are of higher importance in making industrial-strength parallel applications using multiprocessors.
<strong>Synergy</strong> Philosophy<br />
Facilitating the best use of computing and networking resources for each application is the key philosophy in Synergy. We advocate competitive resource sharing as opposed to ``cycle stealing.'' The tactic is to reduce processing time for each application; multiple running applications would then fully exploit system resources. Realizing these objectives, however, requires both quantitative analysis and highly efficient tools.

It is inevitable that parallel programming and debugging will be more time consuming than single-thread processing, regardless of how well the application programming interface (API) is designed. Elusive parallel processing results taught us that we must have quantitatively convincing reasons to process an application in parallel before committing to the potential expenses (programming, debugging and future maintenance).

We use Timing Models to evaluate the potential speedups of a parallel program using different processors and networking devices [13]. Timing models capture the orders of timing costs for computing, communication, disk I/O and synchronization requirements. We can quantitatively examine an application's speedup potential under various processor and networking assumptions. The analysis results delineate the limit of hopes. When applied in practice, timing models provide guidelines for processing grain selection and experiment design.
Efficiency analysis showed that effective parallel processing should follow an incremental coarse-to-fine grain refinement method. Processors can be added only if there is unexplored parallelism, processors are available and the network is capable of carrying the anticipated load. Hard-wiring programs to processors will only be efficient for a few special applications with restricted input, at the expense of programming difficulties.
To improve performance, we took an application-oriented approach in the tool design.<br />
Unlike conventional compilers <strong>and</strong> operating systems projects, we build tools to<br />
customize a given processing environment for a given application. This customization<br />
defines a new infrastructure among the pertinent compilers, operating systems <strong>and</strong> the<br />
application for effective resource exploitation. Simultaneous execution of multiple<br />
parallel applications permits exploiting available resources for all users. This makes the<br />
networked processors a fairly real ``virtual supercomputer.''<br />
An important advantage of the Synergy compiler-operating system-application infrastructure is its higher-level portability over existing systems. It allows parallel programs, once written, to adapt to any programming, processor and networking technologies without compromising performance.
An important lesson we learned was that mixing parallel processing, resource<br />
management <strong>and</strong> functional programming tools in one language made tool automation<br />
<strong>and</strong> parallel programming unnecessarily difficult. This is especially true for parallel<br />
processors employing high performance uni-processors.<br />
Building timing models before parallel programming can determine the worthiness of the<br />
undertaking in the target multiprocessor environment <strong>and</strong> prevent costly design mistakes.<br />
The analysis can also provide guidelines for parallelism grain size selection <strong>and</strong><br />
experiment design (http://joda.cis.temple.edu/~shi/super96/timing/timing.html).<br />
Except for server programs, all parallel processing applications can be represented by a<br />
coarse grain dataflow graph (CGDG). In a CGDG, each node is either a repetition node or<br />
a non-repetition node. A repetition node contains either an iterative or a recursive process.<br />
The edges represent data dependencies. It should be fairly obvious that a CGDG must be<br />
acyclic.<br />
A CGDG fully exhibits the potential effective (coarse grain) parallelism of a given application.<br />
For example, SIMD parallelism is only possible for a repetition node. MIMD<br />
parallelism is possible for any 1-K branch in the CGDG. Pipelines exist along all<br />
sequentially dependent paths, provided that there are repetitive input data feeds. The<br />
actual processor assignment determines the deliverable parallelism.<br />
Any repetition node can be processed in a coarse grain SIMD (or scatter-<strong>and</strong>-gather)<br />
fashion. A repetition node is implemented as a master <strong>and</strong> a worker<br />
program connected via two tuple space objects. The master is responsible for distributing<br />
the work tuples <strong>and</strong> collecting results. The worker is responsible for computing the<br />
results from a given input <strong>and</strong> delivering them.<br />
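The scatter-<strong>and</strong>-gather pattern can be sketched in plain C. The sketch below is an in-memory analogue only, not the <strong>Synergy</strong> LIL API: the shared arrays stand in for the two tuple space objects, <strong>and</strong> ordinary function calls stand in for the separate master <strong>and</strong> worker programs.<br />

```c
/* Toy scatter-and-gather: shared arrays stand in for the two tuple
 * space objects; master and worker are plain functions here, where
 * Synergy would run them as separate programs. */
#define NWORK 8

static int work[NWORK];     /* "work tuples": numbers to square      */
static int results[NWORK];  /* "result tuples", indexed by work tag  */
static int next_tuple = 0;  /* cursor standing in for a tuple take   */

/* Master: distribute one work tuple per task. */
void master_put_work(void) {
    for (int i = 0; i < NWORK; i++)
        work[i] = i + 1;
}

/* Worker: repeatedly take a tuple, compute, deliver the result. */
void worker_run(void) {
    while (next_tuple < NWORK) {
        int tag = next_tuple++;               /* take one work tuple */
        results[tag] = work[tag] * work[tag];
    }
}

/* Master: gather all result tuples. */
int master_gather(void) {
    int sum = 0;
    for (int i = 0; i < NWORK; i++)
        sum += results[i];
    return sum;
}
```

Calling master_put_work(), then worker_run() once per worker, then master_gather() mirrors the put/take cycle the real master <strong>and</strong> worker perform against the tuple space daemons.<br />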
For all other components in the graph, one can use tuple space or pipe. The use of<br />
file <strong>and</strong> database (yet to be implemented) objects is defined by the application.<br />
Following the above description results in a static IPC graph using passive objects. The<br />
programmer's job is to compose parallel programs communicating with these objects.<br />
History<br />
<strong>Synergy</strong> V3.0 is an enhancement of <strong>Synergy</strong> V2.0 (released in early 1994). Earlier<br />
versions of the same system appeared in the literature under the names MT (1989),<br />
ZEUS (1986), Configurator (1982) <strong>and</strong> <strong>Synergy</strong> V1.0 (1992).<br />
Major Components <strong>and</strong> Inner Workings of<br />
<strong>Synergy</strong><br />
Technically, the <strong>Synergy</strong> system is an automatic client/server software generation system<br />
that can form an effective parallel processor for each application using multiple<br />
distributed Unix or Linux computers. This parallel processor is specifically engineered to<br />
process programs inter-connected in an application-dependent IPC (Inter-Program<br />
Communication/Synchronization) graph using industry standard compilers, operating<br />
systems <strong>and</strong> communication protocols. This IPC graph exhibits application-dependent<br />
coarse grain SIMD (Single Instruction Multiple Data), MIMD (Multiple Instruction<br />
Multiple Data) <strong>and</strong> pipeline parallelisms.<br />
<strong>Synergy</strong> V3.0 supports three passive data objects for program-to-program communication<br />
<strong>and</strong> synchronization:<br />
1. Tuple space (a FIFO ordered tuple data manager)<br />
2. Pipe (a generic location independent indirect message queue)<br />
3. File (a location transparent sequential file)<br />
A passive object is any structured data repository permitting no object creation functions.<br />
All commonly known large data objects, such as databases, knowledge bases, hashed<br />
files, <strong>and</strong> ISAM files, can be passive objects provided the object creating operators are<br />
absent. Passive objects confine dynamic dataflows into a static IPC graph for any<br />
parallel application. This is the basis for automatic customization.<br />
POFP uses a simple open-manipulate-close sequence for each passive object. A one-dimensional<br />
Coarse-To-Fine (CTF) decomposition method (see the Adaptable Parallel<br />
Application Development section for details) can produce designs of modular parallel<br />
programs using passive objects. A global view of the connected parallel programs reveals<br />
application-dependent coarse grain SIMD, MIMD <strong>and</strong> pipeline potentials. Processing<br />
grain adjustments are done via the work distribution programs (usually called Masters).<br />
These adjustments can be made without changing code. All parallel programs can be<br />
developed <strong>and</strong> compiled independently.<br />
What is in <strong>Synergy</strong>? (<strong>Synergy</strong> Kernel with Explanation)<br />
The first important ingredient in <strong>Synergy</strong> is the confinement of inter-program<br />
communication <strong>and</strong> synchronization (IPC) mechanisms. They convert dynamic<br />
application dataflows to a static, bipartite IPC graph. In <strong>Synergy</strong>, this graph is used to<br />
automate process coordination <strong>and</strong> resource management. In other words, <strong>Synergy</strong> V3.0<br />
uses this static IPC graph to automatically map parallel programs onto a set of networked<br />
computers that forms a virtual multiprocessor. In the full SLM implementation, this<br />
static IPC graph will be implemented via a self-healing backbone.<br />
<strong>Synergy</strong> v3.0 contains the following service components:<br />
• A language injection library (LIL). This is the API programmers use to compose<br />
parallel programs. It contains operators defined on supported passive objects,<br />
such as tuple space, file, pipe or database.<br />
• Two memory resident service daemons (PMD <strong>and</strong> CID). These daemons resolve<br />
network references <strong>and</strong> are responsible for remote process/object execution <strong>and</strong><br />
management.<br />
• Two dynamic object daemons (TSH <strong>and</strong> FAH). These daemons are launched<br />
before every parallel application begins <strong>and</strong> are removed after the application<br />
terminates. They implement the defined semantics of LIL operators.<br />
• A customized Distributed Application Controller (DAC). This program actually<br />
synthesizes a multiprocessor application. It conducts processor binding <strong>and</strong><br />
records relevant information about all processes involved in the application until<br />
completion. DAC represents a customized virtual multiprocessor for each<br />
application.<br />
• <strong>Synergy</strong> shell (prun <strong>and</strong> pcheck). These programs are the <strong>Synergy</strong> runtime user<br />
interface.<br />
o prun launches a parallel application<br />
o pcheck is a runtime monitor for managing multiple parallel applications<br />
<strong>and</strong> processes<br />
ADD PRUN AND LIL INFO HERE<br />
Program ``pcheck'' functions analogously to the ``ps'' command in Unix. It monitors<br />
parallel applications <strong>and</strong> keeps track of the parallel processes of each application. Pcheck<br />
also allows killing running processes or applications if necessary.<br />
To make remote processors listen to personal commands, there are two light-weight<br />
utility daemons: the Command Interpreter Daemon (cid) <strong>and</strong> the Port Mapper Daemon<br />
(pmd). Cid interprets a limited set of process control commands from the network for<br />
each user account. In other words, parallel users on the same processor need different<br />
cid's. Pmd (the peer leader) provides a "yellow page" service for locating local cid's.<br />
Pmd is automatically started by any cid <strong>and</strong> is transparent to all users.<br />
FDD is a Fault Detection Daemon. It is activated by an option in the prun comm<strong>and</strong> to<br />
detect worker process failures at runtime.<br />
<strong>Synergy</strong> V3.0 requires no root privileged processes. All parallel processes assume<br />
respective user security <strong>and</strong> resource restrictions defined at account creation. Parallel use<br />
of multiple computers imposes no additional security threat to the existing systems.<br />
Theoretically, there should be one object daemon for each supported object type. For the<br />
three supported types (tuple space, pipe <strong>and</strong> file), we saved the pipe daemon by<br />
implementing it directly in LIL. Thus, <strong>Synergy</strong> V3.0 has only two object daemons: the<br />
Tuple Space Handler (tsh) <strong>and</strong> the File Access Handler (fah). The object daemons, when<br />
activated, talk to parallel programs via the LIL operators under the user-defined identity<br />
(via CSL). They are potentially resource hungry. However, they only "live" on the<br />
computers where they are needed <strong>and</strong> permitted.<br />
Optimal processor assignment is theoretically complex. <strong>Synergy</strong>'s automatic processor<br />
binding algorithm is extremely simple: unless specifically designated, it binds all tuple<br />
space objects, one master <strong>and</strong> one worker to a single processor. Other processors run the<br />
worker-type (with repeatable logic) processes. Since the network is the bottleneck, this<br />
binding algorithm minimizes network traffic, thus promising good performance for most<br />
applications using the current tuple matching engine. The full implementation of SLM<br />
will have a distributed tuple matching engine that promises to fulfill a wider range of<br />
performance requirements.<br />
Fault tolerance is a natural benefit of the SPP design. Processor failures discovered<br />
before a run are automatically isolated. Worker processor failures during a parallel<br />
execution are treated in V3.0 by a "tuple shadowing" technique. <strong>Synergy</strong> V3.0 can<br />
automatically recover the lost data from a lost worker with little overhead. This feature<br />
brings the availability of a multiprocessor application to be equal to that of a single<br />
processor <strong>and</strong> is completely transparent to application programs.<br />
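The "tuple shadowing" idea can be illustrated with a small state machine. This is a sketch of the concept, not <strong>Synergy</strong>'s actual implementation: when a worker takes a tuple, the space keeps a shadow copy; if the worker finishes, the shadow is discarded, <strong>and</strong> if the worker fails, the shadow is re-inserted so another worker can redo the lost work.<br />

```c
/* Tuple states in the toy model:
 * 0 = available, 1 = taken (shadow kept), 2 = done (shadow discarded). */
#define NT 4
static int state[NT];

/* A worker takes the first available tuple; -1 if none remain. */
int take_tuple(void) {
    for (int i = 0; i < NT; i++)
        if (state[i] == 0) { state[i] = 1; return i; }
    return -1;
}

/* Worker finished: discard the shadow copy. */
void worker_done(int t) { state[t] = 2; }

/* Worker failed: re-insert the shadow so the work can be redone. */
void worker_failed(int t) { state[t] = 0; }

/* Tuples not yet completed (still available or still shadowed). */
int tuples_left(void) {
    int n = 0;
    for (int i = 0; i < NT; i++)
        if (state[i] != 2) n++;
    return n;
}
```

A failed worker thus costs only the re-execution of its outstanding tuple, which is why the recovery overhead is small.<br />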
<strong>Synergy</strong> provides the basis for automatic load balancing. However, optimal load<br />
balancing requires adjusting tuple sizes. Tuple size adjustments can follow guided self-scheduling<br />
[1], factoring [2] or fixed chunking based on the theory of optimal granule size<br />
for load balancing [3].<br />
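As a concrete illustration, the per-request chunk (tuple) sizes under the cited schemes can be computed as below, where R is the number of iterations remaining <strong>and</strong> P is the number of workers. These are the standard formulas from the scheduling literature [1,2]; the actual tuple sizing in a <strong>Synergy</strong> master is left to the programmer.<br />

```c
/* Guided self-scheduling: each work request gets ceil(R/P)
 * of the remaining iterations [1]. */
int gss_chunk(int R, int P) {
    return (R + P - 1) / P;
}

/* Factoring: iterations are handed out in batches of P equal
 * chunks, each ceil(R/(2P)) of what remains [2]. */
int factoring_chunk(int R, int P) {
    return (R + 2 * P - 1) / (2 * P);
}

/* Fixed chunking: N iterations split into a precomputed number
 * of equal chunks; here simply ceil(N/chunks). */
int fixed_chunk(int N, int chunks) {
    return (N + chunks - 1) / chunks;
}
```

Larger early chunks lower tuple-handling overhead, while smaller late chunks smooth out load imbalance near the end of a run.<br />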
<strong>Synergy</strong> V3.0 runs on clusters of workstations. This evaluation copy allows unlimited<br />
processors across multiple file systems (*requires one binary installation per file system).<br />
Comparisons with Other Systems<br />
<strong>Synergy</strong> vs. PVM/MPI<br />
PVM/MPI is a direct message passing system [5,6] that requires that inter-process<br />
communication be carried out based on process task ids. This requirement forces an<br />
extra user-programming layer if fault tolerance <strong>and</strong> load balancing are desired. This is<br />
because, for load balancing <strong>and</strong> fault tolerance, working data cannot be "hard wired" to<br />
specific processors. An "anonymous" data item can only be supplied using an additional<br />
data management layer providing a tuple space-like interface. In this sense, we consider<br />
PVM/MPI a lower level parallel API as compared to Linda <strong>and</strong> <strong>Synergy</strong>.<br />
Fault tolerant <strong>and</strong> load balanced parallel programs typically require more inter-process<br />
communication than direct message passing, since they refresh their states frequently in<br />
order to expose more “stateless moments” – critical to load balancing <strong>and</strong> fault tolerance.<br />
This is a tradeoff that users must weigh before adopting the <strong>Synergy</strong> parallel programming<br />
platform.<br />
<strong>Synergy</strong> vs. Linda<br />
The original Linda implementation [4] uses a virtual global tuple space implemented<br />
using a compile time analysis method. The main advantage of the Linda method is the<br />
potential to reduce communication overhead. It was believed that many tuple access<br />
patterns could be unraveled into single lines of communication, so that the compiler could<br />
build the machine dependent codes directly without going through an intermediate<br />
runtime daemon that would potentially double the communication latency of each tuple<br />
transmission. However, experiments indicate that the majority of applications do not have<br />
static tuple access patterns that a compiler can easily discern. As a result, increased<br />
communication overhead is inevitable.<br />
The compile time tuple binding method is also detrimental to fault tolerance <strong>and</strong> load<br />
balancing.<br />
Another problem in the Linda design is its limited scalability. Composing all parallel<br />
programs in one file, compiled by a single compiler, makes programming<br />
unnecessarily complex <strong>and</strong> is impractical for large-scale applications. It also presents<br />
difficulties for mixed language processing.<br />
In comparison, <strong>Synergy</strong> uses dynamic tuple binding, at the expense of increased<br />
communication overhead, by using dynamic tuple space daemons. In the full SLM<br />
implementation, this overhead will be reduced by a distributed tuple matching engine.<br />
Practical computational experiments indicate that synchronization overhead (due to load<br />
imbalance) accounts for more time than communication. Thus <strong>Synergy</strong>'s load balancing<br />
advantage can be used to offset its increased communication overhead.<br />
Parallel Programming <strong>and</strong> Processing in <strong>Synergy</strong><br />
A parallel programmer must use the passive objects for communication <strong>and</strong><br />
synchronization purposes. These operations are provided via the language injection<br />
library (LIL). LIL is linked to source programs at compilation time to generate hostless<br />
binaries that can run on any binary-compatible platform.<br />
After making the parallel binaries, the interconnection of the parallel programs (the IPC graph)<br />
should be specified in CSL (Configuration Specification Language). Program ``prun''<br />
starts a parallel application. Prun calls CONF to process the IPC graph <strong>and</strong> to complete<br />
the program/object-to-processor assignments, automatically or as specified. It then<br />
activates DAC to start the appropriate object daemons <strong>and</strong> remote processes (via remote<br />
cid's). It preserves the process dependencies until all processes have terminated.<br />
Building parallel applications using <strong>Synergy</strong> requires the following steps:<br />
1. Parallel program definitions. This requires, preferably, establishing timing models<br />
for a given application. Timing model analysis provides decomposition<br />
guidelines. Parallel programs <strong>and</strong> passive objects are defined using these<br />
guidelines.<br />
2. Individual program composition using passive objects.<br />
3. Individual program compilation. This makes hostless binaries by compiling the<br />
source programs with the <strong>Synergy</strong> object library (LIL). It may also include<br />
moving the binaries to the $HOME/bin directory when appropriate.<br />
4. Application synthesis. This requires a specification of the program-to-program<br />
communication <strong>and</strong> synchronization graph (in CSL). When needed, user-preferred<br />
program-to-processor bindings can be specified as well.<br />
5. Run (prun). At this time the program synthesis information is mapped onto a<br />
selected processor pool. Dynamic IPC patterns are generated (by CONF) to guide<br />
the behavior of remote processes. Object daemons are<br />
started <strong>and</strong> remote processes are activated (via DAC <strong>and</strong> remote cid's).<br />
6. Monitor <strong>and</strong> control (pcheck).<br />
Load Balancing <strong>and</strong> Performance Optimization<br />
Fault Tolerance<br />
Installing <strong>and</strong> Configuring <strong>Synergy</strong><br />
Basic Requirements<br />
In addition to installing <strong>Synergy</strong> V3.0 on each computer in the cluster, there are four<br />
requirements for each ``parallel'' account:<br />
1. An active SNG_PATH symbol definition pointing to the directory where <strong>Synergy</strong><br />
V3.0 is installed. It is usually /usr/local/synergy.<br />
2. An active comm<strong>and</strong> search path ($SNG_PATH/bin) pointing to the directory<br />
holding the <strong>Synergy</strong> binaries.<br />
3. A local host file ($HOME/.sng_hosts). Note that this file is only necessary for a<br />
host to be used as an application submission console.<br />
4. An active personal command interpreter (cid) running in the background. Note<br />
that the destination of any future parallel process's graphic display should be defined<br />
before starting cid.<br />
Since the local host file is used each time an application is started, it needs to reflect a)<br />
all accessible processors; <strong>and</strong> b) selected hosts for the current application.<br />
Unpacking<br />
To uncompress, at Unix prompt, type<br />
% uncompress synergy-3.0.tar.Z<br />
To untar,<br />
% tar -xvf synergy-3.0.tar<br />
A directory called "synergy" will be created <strong>and</strong> all files<br />
unpacked under this directory.<br />
Compiling<br />
To compile, change to the synergy directory <strong>and</strong> type<br />
% make<br />
The current version has been tested on these platforms:<br />
- SUN 3/4, SunOs<br />
- IBM RS6000, AIX<br />
- DEC Alpha, OSF/1<br />
- DEC ULTRIX<br />
- Silicon Graphics, SGI<br />
- HP, HP-UX<br />
- CDC cyber, EP/IX<br />
The makefile will try to detect the operating system <strong>and</strong> build the binaries, libraries <strong>and</strong><br />
sample applications. You may need to edit the makefile if your system requires special<br />
flags <strong>and</strong>/or if your include/library path is nonstandard. Check the makefile for details.<br />
Configuring the <strong>Synergy</strong> Environment<br />
After the installation procedure is complete, some minor changes must be made to the<br />
computer's environment to access the <strong>Synergy</strong> system. When using a UNIX/Linux<br />
system we enter commands in a command-line environment called a shell. This shell<br />
must be configured to recognize the <strong>Synergy</strong> system. The two most commonly used shells are the C<br />
Shell (csh) <strong>and</strong> the Bourne Again Shell (bash). Example configuration (profile) files<br />
are shown below for csh <strong>and</strong> bash. Because these files are hidden, you must type:<br />
ls -a<br />
<strong>and</strong> press the enter key at the terminal command prompt to view them.<br />
To configure csh, you must edit the “.cshrc” file in your home directory by adding the<br />
line:<br />
setenv SNG_PATH synergy_directory<br />
where synergy_directory is the directory containing all the binary files <strong>and</strong> the<br />
<strong>Synergy</strong> object library. Next, add the <strong>Synergy</strong> binary directory to the path definition by<br />
typing:<br />
set path=($SNG_PATH/bin $path)<br />
at the command line <strong>and</strong> pressing enter. It is important to add $SNG_PATH/bin before<br />
$path, since “prun” may be overloaded in some operating systems (such as SunOS 5.9).<br />
To activate the new settings enter:<br />
source .cshrc<br />
at the command prompt.<br />
An example of a “.cshrc” file after the settings have been changed, with the changes in<br />
bold, for the SunOS is:<br />
#ident "@(#)local.cshrc 1.2 00/05/01 SMI"<br />
umask 077<br />
set path=( /usr/users/shi/synergy/bin /opt/SUNWspro/bin /bin /usr/bin /usr/ucb<br />
/etc ~ )<br />
if ( -d ~/bin ) then<br />
set path=( $path ~/bin )<br />
endif<br />
set path=( $path . )<br />
if ( $?prompt ) then<br />
set history=32<br />
endif<br />
set prompt="[%n@%m %c ]%#"<br />
# Initialize new variables<br />
setenv LD_LIBRARY_PATH ""<br />
setenv MANPATH "/opt/SUNWspro/man"<br />
# Adding the SUN Companion CD Software, including GCC 2.95<br />
set path=( $path /opt/sfw/bin /opt/sfw/sparc-sun-solaris2.9/bin /usr/local/bin<br />
)<br />
setenv LD_LIBRARY_PATH "${LD_LIBRARY_PATH}:/opt/sfw/lib:/usr/local/lib"<br />
setenv MANPATH "/opt/sfw/man:/usr/local/man:${MANPATH}"<br />
# Adding Usr-Local-Bin<br />
set path=( $path /usr/local/bin )<br />
setenv LD_LIBRARY_PATH "${LD_LIBRARY_PATH}:/usr/local/lib"<br />
setenv MANPATH "/usr/local/man:${MANPATH}"<br />
# Usr-Sfw<br />
set path=( $path /usr/sfw/bin )<br />
setenv LD_LIBRARY_PATH "${LD_LIBRARY_PATH}:/usr/lib:/usr/sfw/lib"<br />
setenv MANPATH "${MANPATH}:/usr/man:/usr/sfw/man"<br />
# DT Window Manager<br />
set path=( $path /usr/dt/bin )<br />
#setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/dt/lib<br />
setenv MANPATH "${MANPATH}:/usr/dt/man"<br />
# GNOME<br />
126
<strong>Synergy</strong> <strong>User</strong> <strong>Manual</strong> <strong>and</strong> <strong>Tutorial</strong><br />
set path=( $path /usr/share/gnome )<br />
setenv LD_LIBRARY_PATH "${LD_LIBRARY_PATH}:/usr/share/lib"<br />
setenv MANPATH "${MANPATH}:/usr/share/man"<br />
setenv SNG_PATH /usr/users/shi/synergy<br />
# SBIN<br />
set path=( $path /sbin /usr/sbin )<br />
An example “.cshrc” file for Linux OS would be:<br />
set path = ( ~ ~/bin /usr/java/j2sdk_nb/j2sdk1.4.2/bin $path \<br />
/usr/local/X11R6/bin /usr/local/bin /usr/bin /usr/users/shi/synergy/bin<br />
. )<br />
set noclobber<br />
limit coredumpsize 0<br />
# aliases for all shells<br />
#alias cd 'cd \!*;set prompt="`hostname`:`pwd`>"'<br />
alias pwd 'echo $cwd'<br />
alias edt 'textedit -fn screen.b.14'<br />
set history = 1000<br />
set savehist = 400<br />
set ignoreeof<br />
set prompt="%m:%~>"<br />
alias help man<br />
alias key 'man -k'<br />
setenv EDITOR 'pico -t'<br />
setenv MANPATH /usr/man:/usr/local/man:/usr/share/man<br />
setenv WWW_HOME http://www.cis.temple.edu<br />
setenv NNTPSERVER netnews.temple.edu<br />
setenv SNG_PATH /usr/users/shi/synergy<br />
#source ~/.aliases<br />
# auto goto client<br />
[ "$tty" != "" ] && [ `hostname` = 'lucas' ] && exec gotoclient<br />
To configure bash you must edit the “.bash_profile” file by adding the lines:<br />
SNG_PATH=synergy_directory<br />
export SNG_PATH<br />
where synergy_directory is the directory containing all the binary files <strong>and</strong> the<br />
<strong>Synergy</strong> object library, <strong>and</strong> add the following entry to the path:<br />
/usr/users/shi/synergy/bin:<br />
To activate the new settings enter:<br />
source .bash_profile<br />
at the command prompt.<br />
Below is an example of the “.bash_profile” file for the Linux OS.<br />
# .bash_profile<br />
# Get the aliases <strong>and</strong> functions<br />
if [ -f ~/.bashrc ]; then<br />
. ~/.bashrc<br />
fi<br />
# <strong>User</strong> specific environment <strong>and</strong> startup programs<br />
PATH=/usr/users/shi/synergy/bin:/usr/java/j2sdk_nb/j2sdk1.4.2/bin:$PATH:$HOME/bin<br />
SNG_PATH=/usr/users/shi/synergy<br />
export PATH<br />
export SNG_PATH<br />
unset USERNAME<br />
# auto goto client<br />
[ "$TERM" != "dumb" ] && [ `hostname` = 'lucas' ] && exec gotoclient<br />
Activating a Processor Pool<br />
To activate your personal parallel processors, you will need to start one "cid" on<br />
each of the hosts, either manually or by some shell script, at least once.<br />
In addition, if you have special remote display requirements, you need to set up your<br />
display characteristics BEFORE starting cid. For example, you may want to monitor a<br />
simulator running on many hosts <strong>and</strong> "steer" the program as it goes.<br />
In this case, you will need to open as many windows as the number of hosts you want to<br />
monitor <strong>and</strong> telnet (rlogin) to these hosts. Then you need to start a cid on each of these<br />
hosts after you designate your display host. Cid has memory: it will send the local<br />
display to the host designated by the "setenv DISPLAY" command.<br />
To start cid enter:<br />
%cid &<br />
Cid will try to connect to another daemon named "pmd". If it cannot contact the peer<br />
leader in three tries, it will start the peer leader automatically.<br />
To check for the total processor accessibility at any host, enter:<br />
%cds<br />
This command checks host status for all SELECTED entries in your host file.<br />
Note that you DO NOT have to restart cid on de-selected hosts in order to re-select<br />
them if a cid is already running, unless you want to change the display setup.<br />
Using <strong>Synergy</strong><br />
The <strong>Synergy</strong> System<br />
Using <strong>Synergy</strong>’s Tuple Space Objects<br />
Using <strong>Synergy</strong>’s Pipe Objects<br />
Using <strong>Synergy</strong>’s File Objects<br />
Compiling <strong>Synergy</strong> Applications<br />
Running <strong>Synergy</strong> Applications<br />
Debugging <strong>Synergy</strong> Applications<br />
Tuple Space Object Programming<br />
A Simple Application – Hello <strong>Synergy</strong>!<br />
The first example given in most introductory computer programming books is the “Hello<br />
World!” program. To get started with <strong>Synergy</strong> programming, the “Hello <strong>Synergy</strong>!”<br />
program will be the first example. The master program (tupleHello1Master.c) simply<br />
opens a tuple space, puts the message in the tuple space <strong>and</strong> terminates. The worker<br />
programs (tupleHello1Worker.c) open the tuple space, read the message from the tuple<br />
space, display the message <strong>and</strong> terminate. The following example programs can be found<br />
in the example01 directory.<br />
The following is the tuple space “Hello <strong>Synergy</strong>!” master program:<br />
#include <br />
#include <br />
main(){<br />
int tplength;       // Length of ts entry<br />
int status;         // Return status for tuple operations<br />
int P;              // Number of processors<br />
int tsd;            // Problem tuple space identifier<br />
char host[128];     // Host machine name<br />
char tpname[20];    // Identifier of ts entry<br />
// Message sent to workers<br />
char sendMsg[50] = "Hello Synergy!\0";<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Master: Opening tuple space\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem",0);<br />
printf("Master: Tuple space open complete\n");<br />
// Get number of processors<br />
P = cnf_getP();<br />
printf("Master: Processors %d\n", P);<br />
// Send 'Hello <strong>Synergy</strong>!' to problem tuple space<br />
// Set length of send entry<br />
tplength = sizeof(sendMsg);<br />
// Set name of entry to host<br />
strcpy(tpname, host);<br />
printf("Master: Putting '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, sendMsg, tplength);<br />
printf("Master: Put '%s' complete\n", sendMsg);<br />
// Sleep 1 second<br />
sleep(1);<br />
// Terminate program<br />
printf("Master: Terminated\n");<br />
cnf_term();<br />
}<br />
The following is the tuple space “Hello <strong>Synergy</strong>!” worker program:<br />
#include <br />
#include <br />
main(){<br />
int tsd;            // Problem tuple space identifier<br />
int status;         // Return status for tuple operations<br />
int tplength;       // Length of ts entry<br />
char host[128];     // Host machine name<br />
char tpname[20];    // Identifier of ts entry<br />
char recdMsg[50];   // Message received from master<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple space<br />
printf("Worker: Opening tuple space\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem",0);<br />
printf("Worker: Tuple space open complete\n");<br />
// Set name to any<br />
strcpy(tpname,"*");<br />
// Read problem from problem tuple space<br />
tplength = cnf_tsread(tsd, tpname, recdMsg, 0);<br />
printf("Worker: Taking item (%s)\n", tpname);<br />
// Normal receive<br />
if (tplength > 0){<br />
printf("Worker: Took message: %s from %s\n",<br />
recdMsg, tpname);<br />
}<br />
// Terminate program<br />
printf("Worker: Terminated\n");<br />
cnf_term();<br />
}<br />
Before the master <strong>and</strong> worker programs can be executed, a Configuration<br />
Specification Language (csl) file must be created. It is also convenient to<br />
use a makefile to compile the programs. Examples of both are below.<br />
The csl file for the programs is:<br />
configuration: tupleHello1;<br />
m: master = tupleHello1Master<br />
(factor = 1<br />
threshold = 1<br />
debug = 0<br />
)<br />
-> f: problem<br />
(type = TS)<br />
-> m: worker = tupleHello1Worker<br />
(type = slave)<br />
-> f: result<br />
(type = TS)<br />
-> m: master;<br />
The makefile for the programs is:<br />
CFLAGS = -O1<br />
OBJS = -L$(SNG_PATH)/obj -lsng -lnsl -lsocket<br />
all : nxdr copy<br />
nxdr : master1 worker1<br />
master1 : tupleHello1Master.c<br />
gcc $(CFLAGS) -o tupleHello1Master tupleHello1Master.c $(OBJS)<br />
worker1 : tupleHello1Worker.c<br />
gcc $(CFLAGS) -o tupleHello1Worker tupleHello1Worker.c $(OBJS)<br />
copy : tupleHello1Master tupleHello1Worker<br />
cp tupleHello1Master $(HOME)/bin<br />
cp tupleHello1Worker $(HOME)/bin<br />
To run the “Hello <strong>Synergy</strong>!” distributed application:<br />
1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />
2. Run the application by typing “prun tupleHello1” <strong>and</strong> pressing the enter key.<br />
The screen output for the master terminal should resemble:<br />
[c615111@owin ~/fpc01 ]>prun tupleHello1<br />
== Checking Processor Pool:<br />
++ Benchmark (186) ++ (owin) ready.<br />
== Done.<br />
== Parallel Application Console: (owin)<br />
== CONFiguring: (tupleHello1.csl)<br />
== Default directory: (/usr/classes/cis6151/c615111/fpc01)<br />
++ Automatic program assignment: (worker)->(owin)<br />
++ Automatic program assignment: (master)->(owin)<br />
++ Automatic object assignment: (problem)->(owin) pred(1) succ(1)<br />
++ Automatic object assignment: (result)->(owin) pred(1) succ(1)<br />
== Done.<br />
== Starting Distributed Application Controller ...<br />
Verifying process [|(c615111)|*/tupleHello1Master<br />
CID verify ****'d process (bin/tupleHello1Master)<br />
Verifying process [|(c615111)|*/tupleHello1Worker<br />
CID verify ****'d process (bin/tupleHello1Worker)<br />
** (tupleHello1.prcd) verified, all components executable.<br />
CID starting object (result)<br />
CID starting object (problem)<br />
CID starting program. path (bin/tupleHello1Master)<br />
Master: Opening tuple space<br />
CID starting program. path (bin/tupleHello1Worker)<br />
Master: Tuple space open complete<br />
Master: Processors 1<br />
Master: Putting 'Hello <strong>Synergy</strong>!' Length 50 Name owin<br />
Master: Put 'Hello <strong>Synergy</strong>!' complete<br />
Worker: Opening tuple space<br />
** (tupleHello1.prcd) started.<br />
Worker: Tuple space open complete<br />
Worker: Taking item (owin)<br />
Worker: Took message: Hello <strong>Synergy</strong>! from owin<br />
Worker: Terminated<br />
CID. subp(27144) terminated<br />
Setup exit status for (27144)<br />
Master: Terminated<br />
CID. subp(27143) terminated<br />
Setup exit status for (27143)<br />
CID. subp(27141) terminated<br />
Setup exit status for (27141)<br />
== (tupleHello1) completed. Elapsed [1] Seconds.<br />
CID. subp(27142) terminated<br />
Setup exit status for (27142)<br />
[c615111@owin ~/fpc01 ]><br />
The output for the worker terminal should resemble:<br />
CID verify ****'d process (bin/tupleHello1Worker)<br />
CID starting program. path (bin/tupleHello1Worker)<br />
Worker: Opening tuple space<br />
Worker: Tuple space open complete<br />
Worker: Taking item (owin)<br />
Worker: Took message: Hello <strong>Synergy</strong>! from owin<br />
Worker: Terminated<br />
CID. subp(21015) terminated<br />
Setup exit status for (21015)<br />
The output shows <strong>Synergy</strong>’s distributed application initialization screen output, the<br />
execution screen output of the master <strong>and</strong> worker programs, <strong>and</strong> termination screen<br />
output of both programs <strong>and</strong> the distributed application.<br />
Sending <strong>and</strong> Receiving Data<br />
Hello Workers!—Hello Master!!!<br />
In this example application, the master (tupleHello2Master.c) sends the message “Hello<br />
Workers!” to all workers (tupleHello2Worker.c) <strong>and</strong> gets the response “Hello Master!!!”<br />
<strong>and</strong> the worker’s name from each worker. The source code, makefile <strong>and</strong> csl file for this<br />
application are located in the example02 directory.<br />
The following is the tuple space “Hello Workers!—Hello Master!!!” master program:<br />
#include <stdio.h><br />
#include <br />
main() {<br />
int tplength;      // Length of ts entry<br />
int status;        // Return status for tuple operations<br />
int P;             // Number of processors<br />
int i;             // Counter index<br />
int res;           // Result tuple space identifier<br />
int tsd;           // Problem tuple space identifier<br />
char host[128];    // Host machine name<br />
char tpname[20];   // Identifier of ts entry<br />
char recdMsg[50];  // Message received from workers<br />
// Message sent to workers<br />
char sendMsg[50] = "Hello Workers!\0";<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Master: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem",0);<br />
// Open result tuple space<br />
res = cnf_open("result",0);<br />
printf("Master: Tuple spaces open complete\n");<br />
// Get number of processors<br />
P = cnf_getP();<br />
printf("Master: Processors %d\n", P);<br />
// Send 'Hello Workers!' to problem tuple space<br />
// Set length of send entry<br />
tplength = sizeof(sendMsg);<br />
// Set name of entry to host<br />
strcpy(tpname, host);<br />
printf("Master: Putting '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, sendMsg, tplength);<br />
printf("Master: Put '%s' complete\n", sendMsg);<br />
// Sleep 1 second<br />
sleep(1);<br />
// Receive 'Hello Master!!!' from result tuple space<br />
for(i=0; i<P; i++){<br />
// Set name to any<br />
strcpy(tpname, "*");<br />
printf("Master: Waiting for reply\n");<br />
// Get reply from result tuple space<br />
tplength = cnf_tsget(res, tpname, recdMsg, 0);<br />
printf("Master: Taking item from %s\n", tpname);<br />
printf("Master: Took message '%s'\n", recdMsg);<br />
}<br />
// Terminate program<br />
printf("Master: Terminated\n");<br />
cnf_term();<br />
}<br />
// Normal receive<br />
if (tplength > 0){<br />
printf("Worker: Took message: %s from %s\n",<br />
recdMsg, tpname);<br />
// Set size of entry<br />
tplength = sizeof(sendMsg);<br />
// Set name to host<br />
sprintf(tpname,"%s", host);<br />
printf("Worker: Put '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put response in result tuple space<br />
status = cnf_tsput(res, tpname, sendMsg, tplength);<br />
printf("Worker: Reply sent\n");<br />
}<br />
// Terminate program<br />
printf("Worker: Terminated\n");<br />
cnf_term();<br />
}<br />
The makefile <strong>and</strong> csl file are similar to those of the “Hello <strong>Synergy</strong>!” application except that all<br />
occurrences of “tupleHello1…” are changed to “tupleHello2…” in both files. To run the<br />
“Hello Workers!—Hello Master!!!” distributed application:<br />
1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />
2. Run the application by typing “prun tupleHello2” <strong>and</strong> pressing the enter key.<br />
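Applying that renaming, the tupleHello2 csl file would read (a sketch: identical to the tupleHello1 version above except for the names):<br />

```
configuration: tupleHello2;
m: master = tupleHello2Master
(factor = 1
threshold = 1
debug = 0
)
-> f: problem
(type = TS)
-> m: worker = tupleHello2Worker
(type = slave)
-> f: result
(type = TS)
-> m: master;
```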
The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
[c615111@owin ~/fpc02 ]>prun tupleHello2<br />
Master: Tuple spaces open complete<br />
Master: Processors 2<br />
Master: Putting 'Hello Workers!' Length 50 Name owin<br />
Master: Put 'Hello Workers!' complete<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: Taking item owin<br />
Worker: Took message: Hello Workers! from owin<br />
Worker: Put 'Hello Master!!!' Length 50 Name owin<br />
Worker: Reply sent<br />
Worker: Terminated<br />
Master: Waiting for reply<br />
Master: Taking item from saber<br />
Master: Took message 'Hello Master!!!'<br />
Master: Waiting for reply<br />
Master: Taking item from owin<br />
Master: Took message 'Hello Master!!!'<br />
Master: Terminated<br />
[c615111@owin ~/fpc02 ]><br />
The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: Taking item owin<br />
Worker: Took message: Hello Workers! from owin<br />
Worker: Put 'Hello Master!!!' Length 50 Name saber<br />
Worker: Reply sent<br />
Worker: Terminated<br />
Sending <strong>and</strong> Receiving Data Types<br />
Sending Various Data Types<br />
<strong>Synergy</strong> can put <strong>and</strong> get more than characters from its tuple space. The following<br />
example shows how to put various data types into a tuple space <strong>and</strong> get various data types<br />
out of a tuple space. The master program (tuplePassMaster.c) puts different data types<br />
into the problem tuple space, <strong>and</strong> the worker (tuplePassWorker.c) gets them, displays<br />
them <strong>and</strong> puts messages in the result tuple space identifying which data types it took.<br />
This application also uses a distributed semaphore to ensure that the workers take data<br />
properly. It also demonstrates the difference between the cnf_tsread() <strong>and</strong> cnf_tsget()<br />
functions. The tuplePass application is located in the example03 directory. The<br />
tuplePass.h file has the definitions for the constant <strong>and</strong> the data structure used in the<br />
application.<br />
The following is the tuple space “data type passing” master program:<br />
#include <stdio.h><br />
#include <br />
#include "tuplePass.h"<br />
main(){<br />
int tplength;      // Length of ts entry<br />
int status;        // Return status for tuple operations<br />
int P;             // Number of processors<br />
int i;             // Counter index<br />
int res;           // Result tuple space identifier<br />
int tsd;           // Problem tuple space identifier<br />
int sem = 0;       // Semaphore<br />
char host[128];    // Host machine name<br />
char tpname[20];   // Identifier of ts entry<br />
char recdMsg[50];  // Message received from workers<br />
// Different datatypes to send to workers<br />
// Integer sent to worker<br />
int num = 12000;<br />
int *numPtr = #<br />
// Long integer sent to worker<br />
long lnum = 1000000;<br />
long *lnumPtr = &lnum;<br />
// Float sent to worker<br />
float frac = 0.5;<br />
float *fracPtr = &frac;<br />
// Double sent to worker<br />
double dfrac = 12345.678;<br />
double *dfracPtr = &dfrac;<br />
// Integer array sent to worker<br />
int numArr[MAX] = {0,1,2,3,4};<br />
// Double array sent to worker<br />
double dblArr[MAX] = {10000.1234, 2000.567,<br />
300.89, 40.0, 5.01};<br />
// String sent to worker<br />
char sendMsg[50] = "A text string.\0";<br />
// Struct sent to worker<br />
struct person bob = {"Bob",<br />
"123 Broad St.",<br />
"Philadelphia", "PA", "19124",<br />
20, "brown", 70.5, "red"};<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Master: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem",0);<br />
// Open result tuple space<br />
res = cnf_open("result",0);<br />
printf("Master: Tuple spaces open complete\n");<br />
// Get number of processors<br />
P = cnf_getP();<br />
printf("Master: Processors %d\n", P);<br />
// Put semaphore in problem tuple space<br />
// Set name to sem<br />
strcpy(tpname,"sem");<br />
// Set length for semaphore<br />
tplength = sizeof(int);<br />
// Place the semaphore signal in problem ts<br />
printf("Master: Putting semaphore\n");<br />
status = cnf_tsput(tsd, tpname, &sem, tplength);<br />
// Put int num in ts<br />
// Set length of send entry<br />
tplength = sizeof(int);<br />
// Set name of entry to num<br />
strcpy(tpname, "D_num");<br />
printf("Master: Putting '%d' Length %d Name %s\n",<br />
num, tplength, tpname);<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, numPtr, tplength);<br />
printf("Master: Put '%d' complete\n", num);<br />
// Put long lnum in ts<br />
// Set length of send entry<br />
tplength = sizeof(long);<br />
// Set name of entry to lnum<br />
strcpy(tpname, "D_lnum");<br />
printf("Master: Putting '%ld' Length %d Name %s\n",<br />
lnum, tplength, tpname);<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, lnumPtr, tplength);<br />
printf("Master: Put '%ld' complete\n", lnum);<br />
// Put float frac in ts<br />
// Set length of send entry<br />
tplength = sizeof(float);<br />
// Set name of entry to frac<br />
strcpy(tpname, "D_frac");<br />
printf("Master: Putting '%f' Length %d Name %s\n",<br />
frac, tplength, tpname);<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, fracPtr, tplength);<br />
printf("Master: Put '%f' complete\n", frac);<br />
// Put double dfrac in ts<br />
// Set length of send entry<br />
tplength = sizeof(double);<br />
// Set name of entry to dfrac<br />
strcpy(tpname, "D_dfrac");<br />
printf("Master: Putting '%g' Length %d Name %s\n",<br />
dfrac, tplength, tpname);<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, (char *)dfracPtr, tplength);<br />
printf("Master: Put '%g' complete\n", dfrac);<br />
// Put int array numArr in ts<br />
// Set length of send entry<br />
tplength = sizeof(int)*MAX;<br />
// Set name of entry to numArr<br />
strcpy(tpname, "D_numArr");<br />
printf("Master: Putting\n ");<br />
for(i=0; i<MAX; i++)<br />
printf("%d ", numArr[i]);<br />
printf("\n Length %d Name %s\n", tplength, tpname);<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, numArr, tplength);<br />
printf("Master: Put 'D_numArr' complete\n");<br />
// Put double array dblArr in ts<br />
// Set length of send entry<br />
tplength = sizeof(double)*MAX;<br />
// Set name of entry to dblArr<br />
strcpy(tpname, "D_dblArr");<br />
printf("Master: Putting\n ");<br />
for(i=0; i<MAX; i++)<br />
printf("%g ", dblArr[i]);<br />
printf("\n Length %d Name %s\n", tplength, tpname);<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, dblArr, tplength);<br />
printf("Master: Put 'D_dblArr' complete\n");<br />
// Put struct person bob in ts<br />
// Set length of send entry<br />
tplength = sizeof(struct person);<br />
// Set name of entry to bob<br />
strcpy(tpname, "D_bob");<br />
printf("Master: Putting\n");<br />
printf(" %s\n", bob.name);<br />
printf(" %s %s, %s %s\n",<br />
bob.address, bob.city, bob.state, bob.zip);<br />
printf(" %d %s %f %s\n",<br />
bob.age, bob.eyes, bob.height, bob.hair);<br />
printf(" Length %d Name %s\n", tplength, tpname);<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, &bob, tplength);<br />
printf("Master: Put struct bob complete\n");<br />
// Put string in ts<br />
// Set length of send entry<br />
tplength = sizeof(sendMsg);<br />
// Set name of entry to msg<br />
strcpy(tpname, "D_msg");<br />
printf("Master: Putting '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, sendMsg, tplength);<br />
printf("Master: Put '%s' complete\n", sendMsg);<br />
// Receive results from result tuple space<br />
for(i=0; i<8; i++){ // One reply per data tuple put<br />
// Set name to any<br />
strcpy(tpname, "*");<br />
printf("Master: Waiting for reply\n");<br />
// Get reply from result tuple space<br />
tplength = cnf_tsget(res, tpname, recdMsg, 0);<br />
printf("Master: Taking item from %s\n", tpname);<br />
printf("Master: %s took '%s'\n", tpname, recdMsg);<br />
}<br />
// Put terminal signal in problem ts<br />
printf("Master: Putting terminal signal in problem ts\n");<br />
// Set length of entry<br />
tplength = sizeof(int);<br />
// Set name of entry to term<br />
strcpy(tpname, "D_term");<br />
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, &sem, tplength);<br />
printf("Master: Put terminal in ts\n");<br />
// Terminate program<br />
printf("Master: Terminated\n");<br />
cnf_term();<br />
}<br />
The following is the tuple space “data type passing” worker program:<br />
#include <stdio.h><br />
#include "tuplePass.h"<br />
main(){<br />
int tsd;           // Problem tuple space identifier<br />
int res;           // Result tuple space identifier<br />
int status; // Return status for tuple operations<br />
int tplength; // Length of ts entry<br />
int i;<br />
// Counter index<br />
int sem = 0; // Semaphore<br />
char host[128]; // Host machine name<br />
char tpname[20]; // Identifier of ts entry<br />
char sendMsg[50]; // Message sent back to master<br />
// Different datatypes to receive from master<br />
// Integer received from master<br />
int num;<br />
// Long integer received from master<br />
long lnum;<br />
// Float received from master<br />
float frac;<br />
// Double received from master<br />
double dfrac;<br />
// Integer array received from master<br />
int numArr[MAX];<br />
// Double array received from master<br />
double dblArr[MAX];<br />
// String received from master<br />
char recdMsg[50];<br />
// Struct received from master<br />
struct person bob;<br />
// Initialize sendMsg<br />
strcpy(sendMsg, "");<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Worker: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem",0);<br />
// Open result tuple space<br />
res = cnf_open("result",0);<br />
printf("Worker: Tuple spaces open complete\n");<br />
while(1){<br />
// Set name to sem<br />
strcpy(tpname,"sem");<br />
// Read semaphore from problem tuple space<br />
tplength = cnf_tsget(tsd, tpname, &sem, 0);<br />
printf("Worker: Taking semaphore\n");<br />
// Set name to any<br />
strcpy(tpname,"D_*");<br />
tplength = cnf_tsread(tsd, tpname, recdMsg, 0);<br />
printf("Worker: Taking item %s\n", tpname);<br />
// Get int num from ts<br />
if(!strcmp(tpname, "D_num")){<br />
// Read problem from problem tuple space<br />
tplength = cnf_tsget(tsd, tpname, &num, 0);<br />
// Record the data type received<br />
strcpy(sendMsg, tpname);<br />
// Display the data<br />
printf("Worker: took %s '%d'\n", tpname, num);<br />
// Send reply back to master<br />
// Set size of entry<br />
tplength = sizeof(sendMsg);<br />
// Set name to host<br />
strcpy(tpname, host);<br />
printf("Worker: Put '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put response in result tuple space<br />
status = cnf_tsput(res, tpname, sendMsg, tplength);<br />
printf("Worker: Reply sent\n");<br />
}<br />
// Get long lnum from ts<br />
else if(!strcmp(tpname, "D_lnum")){<br />
// Read problem from problem tuple space<br />
tplength = cnf_tsget(tsd, tpname, &lnum, 0);<br />
// Record the data type received<br />
strcpy(sendMsg, tpname);<br />
// Display the data<br />
printf("Worker: took %s '%ld'\n", tpname, lnum);<br />
// Send reply back to master<br />
// Set size of entry<br />
tplength = sizeof(sendMsg);<br />
// Set name to host<br />
strcpy(tpname, host);<br />
printf("Worker: Put '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put response in result tuple space<br />
status = cnf_tsput(res, tpname, sendMsg, tplength);<br />
printf("Worker: Reply sent\n");<br />
}<br />
// Get float frac from ts<br />
else if(!strcmp(tpname, "D_frac")){<br />
// Read problem from problem tuple space<br />
tplength = cnf_tsget(tsd, tpname, &frac, 0);<br />
// Record the data type received<br />
strcpy(sendMsg, tpname);<br />
// Display the data<br />
printf("Worker: took %s '%f'\n", tpname, frac);<br />
// Send reply back to master<br />
// Set size of entry<br />
tplength = sizeof(sendMsg);<br />
// Set name to host<br />
strcpy(tpname, host);<br />
printf("Worker: Put '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put response in result tuple space<br />
status = cnf_tsput(res, tpname, sendMsg, tplength);<br />
printf("Worker: Reply sent\n");<br />
}<br />
// Get double dfrac from ts<br />
else if(!strcmp(tpname, "D_dfrac")){<br />
// Read problem from problem tuple space<br />
tplength = cnf_tsget(tsd, tpname, &dfrac, 0);<br />
// Record the data type received<br />
strcpy(sendMsg, tpname);<br />
// Display the data<br />
printf("Worker: took (%s) '%g'\n", tpname, dfrac);<br />
// Send reply back to master<br />
// Set size of entry<br />
tplength = sizeof(sendMsg);<br />
// Set name to host<br />
strcpy(tpname, host);<br />
printf("Worker: Put '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put response in result tuple space<br />
status = cnf_tsput(res, tpname, sendMsg, tplength);<br />
printf("Worker: Reply sent\n");<br />
}<br />
// Get integer array numArr<br />
else if(!strcmp(tpname, "D_numArr")){<br />
// Read problem from problem tuple space<br />
tplength = cnf_tsget(tsd, tpname, numArr, 0);<br />
// Record the data type received<br />
strcpy(sendMsg, tpname);<br />
// Display the data<br />
printf("Worker: took %s\n ", tpname);<br />
for(i=0; i<MAX; i++)<br />
printf("%d ", numArr[i]);<br />
printf("\n Length(%d) Name(%s)\n", tplength, tpname);<br />
// Send reply back to master<br />
// Set size of entry<br />
tplength = sizeof(sendMsg);<br />
// Set name to host<br />
strcpy(tpname, host);<br />
printf("Worker: Put '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put response in result tuple space<br />
status = cnf_tsput(res, tpname, sendMsg, tplength);<br />
printf("Worker: Reply sent\n");<br />
}<br />
// Get double array dblArr<br />
else if(!strcmp(tpname, "D_dblArr")){<br />
// Read problem from problem tuple space<br />
tplength = cnf_tsget(tsd, tpname, dblArr, 0);<br />
// Record the data type received<br />
strcpy(sendMsg, tpname);<br />
// Display the data<br />
printf("Worker: took %s\n ", tpname);<br />
for(i=0; i<MAX; i++)<br />
printf("%g ", dblArr[i]);<br />
printf("\n Length %d Name %s\n", tplength, tpname);<br />
// Send reply back to master<br />
// Set size of entry<br />
tplength = sizeof(sendMsg);<br />
// Set name to host<br />
strcpy(tpname, host);<br />
printf("Worker: Put '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put response in result tuple space<br />
status = cnf_tsput(res, tpname, sendMsg, tplength);<br />
printf("Worker: Reply sent\n");<br />
}<br />
// Get struct person bob<br />
else if(!strcmp(tpname, "D_bob")){<br />
// Read problem from problem tuple space<br />
tplength = cnf_tsget(tsd, tpname, &bob, 0);<br />
// Record the data type received<br />
strcpy(sendMsg, tpname);<br />
// Display the data<br />
printf("Worker: took\n");<br />
printf(" %s\n", bob.name);<br />
printf(" %s %s, %s %s\n", bob.address,<br />
bob.city, bob.state, bob.zip);<br />
printf(" %d %s %f %s\n", bob.age, bob.eyes,<br />
bob.height, bob.hair);<br />
printf(" Length %d Name %s\n", tplength, tpname);<br />
// Send reply back to master<br />
// Set size of entry<br />
tplength = sizeof(sendMsg);<br />
// Set name to host<br />
strcpy(tpname, host);<br />
printf("Worker: Put '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put response in result tuple space<br />
status = cnf_tsput(res, tpname, sendMsg, tplength);<br />
printf("Worker: Reply sent\n");<br />
}<br />
// Get string<br />
else if(!strcmp(tpname, "D_msg")){<br />
// Read problem from problem tuple space<br />
tplength = cnf_tsget(tsd, tpname, recdMsg, 0);<br />
// Record the data type received<br />
strcpy(sendMsg, tpname);<br />
// Display the data<br />
printf("Worker: took %s '%s'\n", tpname, recdMsg);<br />
// Send reply back to master<br />
// Set size of entry<br />
tplength = sizeof(sendMsg);<br />
// Set name to host<br />
strcpy(tpname, host);<br />
printf("Worker: Put '%s' Length %d Name %s\n",<br />
sendMsg, tplength, tpname);<br />
// Put response in result tuple space<br />
status = cnf_tsput(res, tpname, sendMsg, tplength);<br />
printf("Worker: Reply sent\n");<br />
}<br />
// Get terminal<br />
else if(!strcmp(tpname, "D_term")){<br />
printf("Worker: Received terminal\n");<br />
// Set name to sem<br />
strcpy(tpname,"sem");<br />
// Set length for semaphore<br />
tplength = sizeof(int);<br />
// Replace the semaphore signal in problem ts<br />
printf("Worker: Putting semaphore\n");<br />
status = cnf_tsput(tsd, tpname, &sem, tplength);<br />
break;<br />
}<br />
// Set name to sem<br />
strcpy(tpname,"sem");<br />
// Set length for semaphore<br />
tplength = sizeof(int);<br />
// Replace the semaphore signal in problem ts<br />
printf("Worker: Putting semaphore\n");<br />
status = cnf_tsput(tsd, tpname, &sem, tplength);<br />
// Sleep 1 second<br />
sleep(1);<br />
}<br />
// Terminate program<br />
printf("Worker: Terminated\n");<br />
cnf_term();<br />
}<br />
The makefile <strong>and</strong> csl file are similar to the last two applications except in the naming of<br />
the application objects <strong>and</strong> files. To run the data passing distributed application:<br />
1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />
2. Run the application by typing “prun tuplePass” <strong>and</strong> pressing the enter key.<br />
The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
[c615111@owin ~/fpc03 ]>prun tuplePass<br />
Master: Opening tuple spaces<br />
Master: Tuple spaces open complete<br />
Master: Processors 2<br />
Master: Putting semaphore<br />
Master: Putting '12000' Length 4 Name D_num<br />
Master: Put '12000' complete<br />
Master: Putting '1000000' Length 4 Name D_lnum<br />
Master: Put '1000000' complete<br />
Master: Putting '0.500000' Length 4 Name D_frac<br />
Master: Put '0.500000' complete<br />
Master: Putting '12345.7' Length 8 Name D_dfrac<br />
Master: Put '12345.7' complete<br />
Master: Putting<br />
0 1 2 3 4<br />
Length 20 Name D_numArr<br />
Master: Put 'D_numArr' complete<br />
Master: Putting<br />
10000.1 2000.57 300.89 40 5.01<br />
Length 40 Name D_dblArr<br />
Master: Put 'D_dblArr' complete<br />
Master: Putting<br />
Bob<br />
123 Broad St. Philadelphia, PA 19124<br />
20 brown 70.500000 red<br />
Length 164 Name D_bob<br />
Master: Put struct bob complete<br />
Master: Putting 'A text string.' Length 50 Name D_msg<br />
Master: Put 'A text string.' complete<br />
Master: Waiting for reply<br />
Master: Taking item from saber<br />
Master: saber took 'D_num'<br />
Master: Waiting for reply<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: Taking semaphore<br />
Worker: Taking item D_lnum<br />
Worker: took D_lnum '1000000'<br />
Worker: Put 'D_lnum' Length 50 Name owin<br />
Master: Taking item from owin<br />
Master: owin took 'D_lnum'<br />
Master: Waiting for reply<br />
Worker: Reply sent<br />
Worker: Putting semaphore<br />
Master: Taking item from saber<br />
Master: saber took 'D_frac'<br />
Master: Waiting for reply<br />
Worker: Taking semaphore<br />
Worker: Taking item D_dfrac<br />
Worker: took (D_dfrac) '12345.7'<br />
Worker: Put 'D_dfrac' Length 50 Name owin<br />
Master: Taking item from owin<br />
Master: owin took 'D_dfrac'<br />
Master: Waiting for reply<br />
Worker: Reply sent<br />
Worker: Putting semaphore<br />
Master: Taking item from saber<br />
Master: saber took 'D_numArr'<br />
Master: Waiting for reply<br />
Worker: Taking semaphore<br />
Worker: Taking item D_dblArr<br />
Worker: took D_dblArr<br />
10000.1 2000.57 300.89 40 5.01<br />
Length 40 Name D_dblArr<br />
Worker: Put 'D_dblArr' Length 50 Name owin<br />
Worker: Reply sent<br />
Worker: Putting semaphore<br />
Master: Taking item from owin<br />
Master: owin took 'D_dblArr'<br />
Master: Waiting for reply<br />
Master: Taking item from saber<br />
Master: saber took 'D_bob'<br />
Master: Waiting for reply<br />
Worker: Taking semaphore<br />
Worker: Taking item D_msg<br />
Worker: took D_msg 'A text string.'<br />
Worker: Put 'D_msg' Length 50 Name owin<br />
Worker: Reply sent<br />
Worker: Putting semaphore<br />
Master: Taking item from owin<br />
Master: owin took 'D_msg'<br />
Master: Putting terminal signal in problem ts<br />
Master: Put terminal in ts<br />
Master: Terminated<br />
Worker: Taking semaphore<br />
Worker: Taking item D_term<br />
Worker: Received terminal<br />
Worker: Putting semaphore<br />
Worker: Terminated<br />
The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: Taking semaphore<br />
Worker: Taking item D_num<br />
Worker: took D_num '12000'<br />
Worker: Put 'D_num' Length 50 Name saber<br />
Worker: Reply sent<br />
Worker: Putting semaphore<br />
Worker: Taking semaphore<br />
Worker: Taking item D_frac<br />
Worker: took D_frac '0.500000'<br />
Worker: Put 'D_frac' Length 50 Name saber<br />
Worker: Reply sent<br />
Worker: Putting semaphore<br />
Worker: Taking semaphore<br />
Worker: Taking item D_numArr<br />
Worker: took D_numArr<br />
0 1 2 3 4<br />
Length(20) Name(D_numArr)<br />
Worker: Put 'D_numArr' Length 50 Name saber<br />
Worker: Reply sent<br />
Worker: Putting semaphore<br />
Worker: Taking semaphore<br />
Worker: Taking item D_bob<br />
Worker: took<br />
Bob<br />
123 Broad St. Philadelphia, PA 19124<br />
20 brown 70.500000 red<br />
Length 164 Name D_bob<br />
Worker: Put 'D_bob' Length 50 Name saber<br />
Worker: Reply sent<br />
Worker: Putting semaphore<br />
Worker: Taking semaphore<br />
Worker: Taking item D_term<br />
Worker: Received terminal<br />
Worker: Putting semaphore<br />
Worker: Terminated<br />
Getting Workers to Work<br />
Sum of First N Integers<br />
The sum of the first n integers, 1 + 2 + ... + n, can easily be calculated in a<br />
regular computer program. An ANSI C program would be:<br />
#include <stdio.h><br />
#define N 6<br />
int main(){<br />
int i;<br />
int sum = 0;<br />
for(i=N; i>0; i--)<br />
sum+=i;<br />
printf("The sum of the first %d integers is %d\n", N, sum);<br />
return 0;<br />
}<br />
This problem can easily be solved by a parallel program in which the master<br />
(tupleSum1Master.c) puts each integer into the problem tuple space. The workers<br />
(tupleSum1Workers.c) take the integers out of the problem tuple space, tally their<br />
respective sub sums <strong>and</strong> put the sub sums into the result tuple space. The master gets the<br />
sub sums from the result tuple space <strong>and</strong> produces the desired sum. This application is<br />
located in the example04 directory.<br />
The following is the tuple space sum of n integers master program:<br />
#include <stdio.h><br />
#include <br />
main(){<br />
int P;                   // Number of processors<br />
int i;                   // Counter index<br />
int status;              // Return status for tuple operations<br />
int res;                 // Result tuple space identifier<br />
int tsd;                 // Problem tuple space identifier<br />
int maxNum = 6;          // MAX of n for sum of 1..n<br />
int sendNum = 0;         // Number sent to problem ts<br />
int *sendPtr = &sendNum; // Pointer to sendNum<br />
int recdSum = 0;         // Subsum received from result ts<br />
int *recdPtr = &recdSum; // Pointer to recdSum<br />
int calcSum = 0;         // Calculated sum<br />
int sumTotal = 0;        // Sum total of all subsums<br />
int tplength;            // Length of ts entry<br />
char tpname[20];         // Identifier of ts entry<br />
char host[128];          // Host machine name<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Master: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem", 0);<br />
// Open result tuple space<br />
res = cnf_open("result", 0);<br />
printf("Master: Tuple spaces open complete\n");<br />
// Get number of processors<br />
P = cnf_getP();<br />
printf("Master: Processors %d\n", P);<br />
// Send integers to problem tuple space<br />
// Set length of entry<br />
tplength = sizeof(int);<br />
printf("Master: tplength = (%d)\n", tplength);<br />
// Set maximum n<br />
sendNum = maxNum;<br />
printf("Master: Putting 1...%d to problem tuple space\n", maxNum);<br />
// Loop until all numbers are sent to workers<br />
while (sendNum > 0) {<br />
printf("Master: Putting %d\n", sendNum);<br />
// Set name of entry<br />
sprintf(tpname,"%d", sendNum);<br />
// Put entry in problem tuple space<br />
status = cnf_tsput(tsd, tpname, (char *)sendPtr, tplength);<br />
// Decrement number to set entry value<br />
sendNum--;<br />
}<br />
printf("Master: Finished sending 1...%d to tuple space\n", maxNum);<br />
// Insert negative integer tuple as termination signal<br />
printf("Master: Sending terminal signal\n");<br />
// Set length of entry<br />
tplength = sizeof(int);<br />
// Set entry value<br />
sendNum = -1;<br />
// Set entry name<br />
sprintf(tpname, "%d", maxNum+1);<br />
// Put entry in problem tuple space<br />
status = cnf_tsput(tsd, tpname, (char *)sendPtr, tplength);<br />
printf("Master: Finished sending terminal signal\n");<br />
// Receive sub sums from result tuple space<br />
i = 1;<br />
printf("Master: Getting sub sums from result tuple space\n");<br />
while (i <= P) {
// Set name of entry to any
strcpy(tpname, "*");
// Get entry from result tuple space
tplength = cnf_tsget(res, tpname, (char *)recdPtr, 0);
printf("Master: Received %d from %s\n", recdSum, tpname);<br />
// Add result to total<br />
sumTotal += recdSum;<br />
// Increment counter<br />
i++;<br />
}<br />
printf("Master: The sum total is: %d\n", sumTotal);<br />
// Calculate correct answer with math formula<br />
calcSum = (maxNum*(maxNum+1))/2;<br />
printf ("Master: The calculated sum is: %d\n", calcSum);<br />
// Compare results<br />
if(calcSum == sumTotal)<br />
printf("Master: The workers gave the correct answer\n");<br />
else<br />
printf("Master: The workers gave an incorrect answer\n");<br />
// Terminate program<br />
printf("Master: Terminated\n");<br />
cnf_term();
}
The following is the tuple space sum of n integers worker program:<br />
#include <stdio.h>
#include <string.h>
#include <unistd.h>
main(){<br />
// Variable declarations<br />
int tsd;<br />
// Problem tuple space identifier<br />
int res;<br />
// Result tuple space identifier<br />
int recdNum = 0;<br />
// Number received to be added<br />
int *recdPtr = &recdNum; // Pointer to recdNum<br />
int sendSum = 0;<br />
// Sum of numbers received<br />
int *sendPtr = &sendSum; // Pointer to sendSum<br />
int status;<br />
// Return status for tuple operations<br />
int tplength;<br />
// Length of ts entry<br />
char tpname[20];<br />
// Identifier of ts entry<br />
char host[128];<br />
// Host machine name<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Worker: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem", 0);<br />
// Open result tuple space<br />
res = cnf_open("result", 0);<br />
printf("Worker: Tuple spaces open complete\n");<br />
// Loop forever to accumulate sendSum<br />
printf("Worker: Beginning to accumulate sum\n");<br />
while(1){<br />
// Set name to any<br />
strcpy(tpname, "*");<br />
// Get problem from tuple space<br />
tplength = cnf_tsget(tsd, tpname, (char *)recdPtr, 0);<br />
printf("Worker: Took item %s\n", tpname);<br />
// If normal receive<br />
if(recdNum > 0){<br />
// Add to sum<br />
sendSum += recdNum;<br />
printf("Worker: Present subtotal is %d\n", sendSum);<br />
}<br />
// Else terminate worker<br />
else{<br />
printf("Worker: Received terminal signal\n");<br />
// Put terminal message back in problem tuple space<br />
status = cnf_tsput(tsd, tpname, (char *)recdPtr, tplength);<br />
// Set length of entry<br />
tplength = sizeof(int);<br />
// Set name of entry to host<br />
sprintf(tpname,"%s", host);<br />
printf("Worker: Sending sum %d\n", sendSum);<br />
// Put sum in result tuple space<br />
status = cnf_tsput(res, tpname, (char *)sendPtr, tplength);<br />
// Terminate worker<br />
printf("Worker: Terminated\n");<br />
cnf_term();<br />
}<br />
// Sleep 1 second<br />
sleep(1);<br />
}
}
To run the sum of first n integers distributed application:<br />
1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />
2. Run the application by typing “prun tupleSum1” <strong>and</strong> pressing the enter key.<br />
The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
[c615111@owin ~/fpc04 ]>prun tupleSum1<br />
Master: Opening tuple spaces<br />
Master: Tuple spaces open complete<br />
Master: Processors 2<br />
Master: tplength = (4)<br />
Master: Putting 1...6 to problem tuple space<br />
Master: Putting 6<br />
Master: Putting 5<br />
Master: Putting 4<br />
Master: Putting 3<br />
Master: Putting 2<br />
Master: Putting 1<br />
Master: Finished sending 1...6 to tuple space<br />
Master: Sending terminal signal<br />
Master: Finished sending terminal signal<br />
Master: Getting sub sums from result tuple space<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: Beginning to accumulate sum<br />
Worker: Took item 5<br />
Worker: Present subtotal is 5<br />
Worker: Took item 3<br />
Worker: Present subtotal is 8<br />
Worker: Took item 1<br />
Worker: Present subtotal is 9<br />
Master: Received 12 from saber<br />
Worker: Took item 7<br />
Worker: Received terminal signal<br />
Worker: Sending sum 9<br />
Worker: Terminated<br />
Master: Received 9 from owin<br />
Master: The sum total is: 21<br />
Master: The calculated sum is: 21<br />
Master: The workers gave the correct answer<br />
Master: Terminated<br />
[c615111@owin ~/fpc04 ]><br />
The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
Worker: Tuple spaces open complete<br />
Worker: Beginning to accumulate sum<br />
Worker: Took item 6<br />
Worker: Present subtotal is 6<br />
Worker: Took item 4<br />
Worker: Present subtotal is 10<br />
Worker: Took item 2<br />
Worker: Present subtotal is 12<br />
Worker: Took item 7<br />
Worker: Received terminal signal<br />
Worker: Sending sum 12<br />
Worker: Terminated<br />
Matrix Multiplication<br />
Matrix multiplication, A ⋅ B = C, can be performed by a traditional C program using the<br />
following function:<br />
void multIntMats(int A[N][N], int B[N][N], int C[N][N]){
int i=0, j=0, k=0;
for(i=0; i<N; i++){
for(j=0; j<N; j++){
C[i][j] = 0;
for(k=0; k<N; k++){
C[i][j] += A[i][k] * B[k][j];
}
}
}
}
The procedure multiplies an array (or vector) by a matrix. An example of this
procedure is:

A0 = ( 1 0 1 0 0 0 )

     0  0  1  0 -1  0
     0  0  0  1  0 -1
B =  1  0 -1  0  1  0
     0  1  0 -1  0  1
    -1  0  1  0  0  0
     0 -1  0  1  0  0

C0 = A0 . B = ( 1 0 0 0 0 0 )
The master will know which row to put the Ci results in because the tuple name (the i)<br />
will be the row number, which is also the tuple entry name. The multiplication of A <strong>and</strong><br />
B after the results were taken out of the result tuple space <strong>and</strong> assembled by the master<br />
would be:

1 0 0 0 0 0
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 0 0 0 1 0
0 0 0 0 0 1
Notice that the multiplication produces the identity matrix. The B matrices used in<br />
examples are intentionally set to be the inverse of their respective A matrices to<br />
demonstrate that the programs actually work. The files for this application are located in<br />
the example05 directory. The master program for the matrix multiplication is:<br />
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define N 6<br />
main(){<br />
int i, j;
// Matrix indices
int tplength;
// Length of ts entry
int status;
// Return status for tuple operations
int P;
// Number of processors
int res;
// Result tuple space identifier
int tsd;
// Problem tuple space identifier
int n;
// Counter
int Ai[N];
// Row from A to send to worker
int Ci[N];
// Row from C to get from worker
char host[128];
// Host machine name
char tpname[20];
// Identifier of ts entry
// The A matrix to break up into arrays
// and send to workers
int A[N][N] = {{1,0,1,0,0,0},<br />
{0,1,0,1,0,0},<br />
{1,0,1,0,1,0},<br />
{0,1,0,1,0,1},<br />
{0,0,1,0,1,0},<br />
{0,0,0,1,0,1}};<br />
// The B matrix to send to workers<br />
int B[N][N] = {{0,0,1,0,-1,0},<br />
{0,0,0,1,0,-1},<br />
{1,0,-1,0,1,0},<br />
{0,1,0,-1,0,1},<br />
{-1,0,1,0,0,0},<br />
{0,-1,0,1,0,0}};<br />
// The C matrix built from arrays<br />
// received from workers<br />
int C[N][N];<br />
printf("Master: started\n");<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Master: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem",0);<br />
// Open result tuple space<br />
res = cnf_open("result",0);<br />
printf("Master: Tuple spaces open complete\n");<br />
// Get number of processors<br />
P = cnf_getP(); // Get number of processors<br />
printf("Master: Processors %d\n", P);<br />
// Print matrix A <strong>and</strong> B<br />
printf("Master: Matrix A\n");<br />
for(i=0; i<N; i++){
for(j=0; j<N; j++)
printf("%d ", A[i][j]);
printf("\n");
}
printf("Master: Matrix B\n");
for(i=0; i<N; i++){
for(j=0; j<N; j++)
printf("%d ", B[i][j]);
printf("\n");
}
printf("Master: Starting C = A . B\n");
// Put matrix B in problem tuple space
// Set length of entry
tplength = (N*N)*sizeof(int);
// Set name of entry
strcpy(tpname, "B");
printf("Master: Putting Length %d Name %s\n", tplength, tpname);
// Put entry in problem tuple space
status = cnf_tsput(tsd, tpname, B, tplength);
// Send rows of A to problem tuple space
// Set length of entry
tplength = N*sizeof(int);
printf("Master: tplength = %d\n", tplength);
for(i=0; i<N; i++){
// Copy row i of A into the Ai array
for(j=0; j<N; j++)
Ai[j] = A[i][j];
// Set name of entry to row number
sprintf(tpname, "A%d", i);
printf("Master: Putting item %s ", tpname);
for(j=0; j<N; j++)
printf("%d ", Ai[j]);
printf("\n");
// Put entry in problem tuple space
status = cnf_tsput(tsd, tpname, Ai, tplength);
}
// Receive result rows from result tuple space
for(n=0; n<N; n++){
// Set name of entry to any
strcpy(tpname, "*");
// Get entry from result tuple space
tplength = cnf_tsget(res, tpname, Ci, 0);
// Convert the row number in the entry name to an integer
i = atoi(&tpname[1]);
printf("Master: Received %d\n", i);
// Copy the received row into C
for(j=0; j<N; j++)
C[i][j] = Ci[j];
}
// Print the C matrix from workers
printf("Master: Matrix C\n");
for(i=0; i<N; i++){
for(j=0; j<N; j++)
printf("%d ", C[i][j]);
printf("\n");
}
// Insert terminal signal so the workers terminate
printf("Master: Putting terminal signal\n");
tplength = N*sizeof(int);
sprintf(tpname, "A%d", N);
status = cnf_tsput(tsd, tpname, Ai, tplength);
// Terminate program
printf("Master: Terminated\n");
cnf_term();
}
The worker program for the matrix multiplication is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define N 6
main(){
// Variable declarations
int i, j, k;
// Matrix indices
int tsd;
// Problem tuple space identifier
int res;
// Result tuple space identifier
int status;
// Return status for tuple operations
int tplength;
// Length of ts entry
int Ai[N];
// Row of A received from master
int Ci[N];
// Row of C sent to master
int B[N][N];
// The B matrix read from tuple space
char tpname[20];
// Identifier of ts entry
char host[128];
// Host machine name
// Get host machine name
gethostname(host, sizeof(host));
// Open tuple spaces
printf("Worker: Opening tuple spaces\n");
// Open problem tuple space
tsd = cnf_open("problem", 0);
// Open result tuple space
res = cnf_open("result", 0);
printf("Worker: Tuple spaces open complete\n");
// Set name to B<br />
strcpy(tpname,"B");<br />
// Read B matrix from problem tuple space<br />
status = cnf_tsread(tsd, tpname, B, 0);<br />
tplength = (N*N)*sizeof(int);
printf("Worker: Matrix B\n");<br />
for(i=0; i<N; i++){
for(j=0; j<N; j++)
printf("%d ", B[i][j]);
printf("\n");
}
// Loop until the terminal signal is received
while(1){
// Set name of entry to any row of A
strcpy(tpname, "A*");
// Get a row of A from problem tuple space
tplength = cnf_tsget(tsd, tpname, Ai, 0);
// If normal receive
if(tplength > 0){
printf("Worker: Taking item %s\n", tpname);
// Get the row number from the entry name
i = atoi(&tpname[1]);
// If the row number is the terminal signal
if(i >= N){
// Put terminal signal back for other workers
status = cnf_tsput(tsd, tpname, Ai, tplength);
// Terminate worker
printf("Worker: Terminated\n");
cnf_term();
}
// Print the received row
for(j=0; j<N; j++)
printf("%d ", Ai[j]);
printf("\n");
// Compute row i of C
for(j=0; j<N; j++){
Ci[j] = 0;
for(k=0; k<N; k++)
Ci[j] += Ai[k] * B[k][j];
}
// Print the result row
printf("Worker : Array C%s ", tpname);
for(j=0; j<N; j++)
printf("%d ", Ci[j]);
printf("\n");
// Put the result row in result tuple space
status = cnf_tsput(res, tpname, Ci, tplength);<br />
sleep(1);<br />
}
// Else a zero length tuple was received
else{
printf("Worker: Error-received zero length tuple");<br />
printf("Worker: Terminated\n");<br />
cnf_term();<br />
}
}
}
To run the matrix multiplication distributed application:<br />
1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />
2. Run the application by typing “prun tupleMat1” <strong>and</strong> pressing the enter key.<br />
The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
[c615111@owin ~/fpc05 ]>prun tupleMat1<br />
Master: Tuple spaces open complete<br />
Master: Processors 2<br />
Master: Matrix A<br />
1 0 1 0 0 0<br />
0 1 0 1 0 0<br />
1 0 1 0 1 0<br />
0 1 0 1 0 1<br />
0 0 1 0 1 0<br />
0 0 0 1 0 1<br />
Master: Matrix B<br />
0 0 1 0 -1 0<br />
0 0 0 1 0 -1<br />
1 0 -1 0 1 0<br />
0 1 0 -1 0 1<br />
-1 0 1 0 0 0<br />
0 -1 0 1 0 0<br />
Master: Starting C = A . B<br />
Master: Putting Length 144 Name B<br />
Master: tplength = 24<br />
Master: Putting item A0 1 0 1 0 0 0<br />
Master: Putting item A1 0 1 0 1 0 0<br />
Master: Putting item A2 1 0 1 0 1 0<br />
Master: Putting item A3 0 1 0 1 0 1<br />
Master: Putting item A4 0 0 1 0 1 0<br />
Master: Putting item A5 0 0 0 1 0 1<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: Matrix B<br />
0 0 1 0 -1 0<br />
0 0 0 1 0 -1<br />
Master: Received 0<br />
1 0 -1 0 1 0<br />
0 1 0 -1 0 1<br />
-1 0 1 0 0 0<br />
0 -1 0 1 0 0<br />
Worker: Taking item A1<br />
0 1 0 1 0 0<br />
Worker : Array CA1 0 1 0 0 0 0<br />
Master: Received 1<br />
Worker: Taking item A2<br />
1 0 1 0 1 0<br />
Worker : Array CA2 0 0 1 0 0 0<br />
Master: Received 2<br />
Master: Received 3<br />
Worker: Taking item A4<br />
0 0 1 0 1 0<br />
Worker : Array CA4 0 0 0 0 1 0<br />
Master: Received 4<br />
Master: Received 5<br />
Master: Matrix C<br />
1 0 0 0 0 0<br />
0 1 0 0 0 0<br />
0 0 1 0 0 0<br />
0 0 0 1 0 0<br />
0 0 0 0 1 0<br />
0 0 0 0 0 1<br />
Master: Putting terminal signal<br />
Master: Terminated<br />
Worker: Taking item A6<br />
Worker: Terminated<br />
[c615111@owin ~/fpc05 ]><br />
The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: Matrix B<br />
0 0 1 0 -1 0<br />
0 0 0 1 0 -1<br />
1 0 -1 0 1 0<br />
0 1 0 -1 0 1<br />
-1 0 1 0 0 0<br />
0 -1 0 1 0 0<br />
Worker: Taking item A0<br />
1 0 1 0 0 0<br />
Worker : Array CA0 1 0 0 0 0 0<br />
Worker: Taking item A3<br />
0 1 0 1 0 1<br />
Worker : Array CA3 0 0 0 1 0 0<br />
Worker: Taking item A5<br />
0 0 0 1 0 1<br />
Worker : Array CA5 0 0 0 0 0 1<br />
Worker: Taking item A6<br />
Worker: Terminated<br />
Work Distribution by Chunking<br />
Finding the Sum of the First n Integers with Chunking<br />
The following is the tuple space “sum of n integers” master program implemented by<br />
sending work in chunks:<br />
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define N 32<br />
main(){<br />
int P;<br />
// Number of processors<br />
int chunk_size;<br />
// Chunk size<br />
int remainder;<br />
// Remainder of numbers to be sent<br />
int i;<br />
// Counter index<br />
int job;<br />
// Job number<br />
int status;<br />
// Return status for tuple operations<br />
int res;<br />
// Result tuple space identifier<br />
int tsd;<br />
// Problem tuple space identifier<br />
int *sendArr = 0;
// Array of numbers sent to problem ts
int sendNum;<br />
// Number sent to worker in sendArr<br />
int recdSum = 0;<br />
// Subsum received from result ts
int *recdPtr = &recdSum; // Pointer to recdSum<br />
int calcSum = 0;<br />
// Calculated sum<br />
int sumTotal = 0;<br />
// Sum total of all subsums<br />
int tplength;<br />
// Length of ts entry<br />
char tpname[20];<br />
// Identifier of ts entry<br />
char host[128];<br />
// Host machine name<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Master: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem", 0);<br />
// Open result tuple space<br />
res = cnf_open("result", 0);<br />
printf("Master: Tuple spaces open complete\n");<br />
// Get number of processors<br />
P = cnf_getP();<br />
printf("Master: Processors %d\n", P);<br />
// Get chunk size<br />
chunk_size = cnf_getf();<br />
printf("Master: Chunk size %d\n", chunk_size);<br />
// Put chunk size in ts<br />
// Set length of entry<br />
tplength = sizeof(int);<br />
// Set name of entry<br />
strcpy(tpname, "chunk_size");<br />
// Put entry in ts<br />
status = cnf_tsput(tsd, tpname, &chunk_size, tplength);<br />
printf("Master: Sent chunk size\n");<br />
// Send integers to problem tuple space<br />
// Set length of entry to chunk_size + 1 integers<br />
tplength = (chunk_size+1) * sizeof(int);<br />
printf("Master: tplength = %d\n", tplength);<br />
// Prepare <strong>and</strong> send integer arrays into tuple space<br />
printf("Master: Putting 1...%d to problem tuple space\n", N);<br />
if((sendArr = (int *) malloc(tplength)) == NULL)<br />
exit(1);<br />
// Loop until all numbers are sent to workers<br />
remainder = N;<br />
job = 0;<br />
sendNum = 1;<br />
while (remainder > 0) {<br />
if (remainder < chunk_size)<br />
chunk_size = remainder;<br />
remainder = remainder - chunk_size;<br />
job++;<br />
// Set name of entry to job number<br />
sprintf(tpname,"A%d", job);<br />
// Put chunk_size in index zero<br />
sendArr[0] = chunk_size;<br />
printf("Master: Putting %s Size %d\n ", tpname, sendArr[0]);<br />
// Put chunk_size integers in array<br />
for(i=1; i<=chunk_size; i++){
sendArr[i] = sendNum;
sendNum++;
}
// Print the numbers in the entry
for(i=1; i<=sendArr[0]; i++)
printf("%d ", sendArr[i]);
printf("\n");
// Put entry in problem tuple space
status = cnf_tsput(tsd, tpname, sendArr, tplength);
}
printf("Master: Finished sending 1...%d to tuple space\n", N);
// Receive sub sums from result tuple space
printf("Master: Getting sub sums from result tuple space\n");
i = 1;
while (i <= job){
// Set name of entry to any<br />
strcpy(tpname,"*");<br />
// Get entry from result tuple space<br />
tplength = cnf_tsget(res, tpname, (char *)recdPtr, 0);<br />
printf("Master: Recieved %d from %s\n", recdSum, tpname);<br />
// Add result to total<br />
sumTotal += recdSum;<br />
// Increment counter
i++;
}
printf("Master: The sum total is: %d\n", sumTotal);<br />
// Calculate correct answer with math formula<br />
calcSum = (N*(N+1))/2;<br />
printf ("Master: The formula calculated sum is: %d\n", calcSum);<br />
// Compare results<br />
if(calcSum == sumTotal)<br />
printf("Master: The workers gave the correct answer\n");<br />
else<br />
printf("Master: The workers gave an incorrect answer\n");<br />
// Insert negative integer tuple as termination signal<br />
printf("Master: Sending terminal signal\n");<br />
// Set length of entry<br />
tplength = (1) * sizeof(int);<br />
// Set entry value<br />
sendArr[0] = -1;<br />
// Set entry name<br />
sprintf(tpname, "A%d", N+1);<br />
// Send entry to tuple space<br />
status = cnf_tsput(tsd, tpname, sendArr, tplength);<br />
printf("Master: Finished sending terminal signal\n");<br />
// Terminate program<br />
printf("Master: Terminated\n");<br />
cnf_term();
}
The following is the tuple space “sum of n integers” worker program implemented by<br />
receiving work in chunks:<br />
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
main(){<br />
// Variable declarations<br />
int tsd;<br />
// Problem tuple space identifier<br />
int res;<br />
// Result tuple space identifier<br />
int *recdPtr;<br />
// Pointer to recd array<br />
int sendSum = 0;<br />
// Sum of numbers received<br />
int *sendPtr = &sendSum; // Pointer to sendSum<br />
int status;<br />
// Return status for tuple operations<br />
int tplength;<br />
// Length of ts entry<br />
int chunk_size;<br />
// Size of recdPtr<br />
int i;<br />
// Index counter<br />
char tpname[20];<br />
// Identifier of ts entry<br />
char host[128];<br />
// Host machine name<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Worker: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem", 0);<br />
// Open result tuple space<br />
res = cnf_open("result", 0);<br />
printf("Worker: Tuple spaces open complete\n");<br />
// Get the chunk size from ts<br />
// Set name of entry<br />
strcpy(tpname, "chunk_size");<br />
// Read chunk size<br />
status = cnf_tsread(tsd, tpname, &chunk_size, 0);<br />
printf("Worker: Chunk size %d\n", chunk_size);<br />
// Set length of tuple space entry<br />
tplength = (chunk_size+1) * sizeof(int);<br />
// Allocate memory for entry<br />
if((recdPtr = (int *)malloc(tplength)) == NULL)<br />
exit(-1);<br />
printf("Worker: array size %d\n", tplength);<br />
// Loop forever to accumulate sendSum<br />
printf("Worker: Begining to accumulate sum\n");<br />
while(1){<br />
sendSum = 0;<br />
// Set name to any<br />
strcpy(tpname, "A*");<br />
// Get problem from tuple space<br />
tplength = cnf_tsget(tsd, tpname, recdPtr, 0);<br />
// Get chunk_size from index zero<br />
chunk_size = (int) recdPtr[0];<br />
printf("Worker: Took item %s length %d\n ", tpname, chunk_size);<br />
// If normal receive<br />
if(chunk_size > 0){<br />
// Get number of array elements<br />
// Add to sendSum<br />
for(i=1; i<=chunk_size; i++){
printf("%d ", recdPtr[i]);
sendSum += recdPtr[i];
}
printf("\n");
// Set length of entry
tplength = sizeof(int);
// Set name of entry to host
sprintf(tpname, "%s", host);
printf("Worker: Sending sum %d\n", sendSum);
// Put sum in result tuple space
status = cnf_tsput(res, tpname, sendPtr, tplength);
}
// Else terminate worker
else{
printf("Worker: Recieved terminal signal\n");
// Put terminal message back in problem tuple space
status = cnf_tsput(tsd, tpname, recdPtr, tplength);
// Free memory
free(recdPtr);
// Terminate worker
printf("Worker: Terminated\n");
cnf_term();
}
// Sleep 1 second
sleep(1);
}
}
To run the sum of first n integers distributed application with chunking:<br />
1. Make the executables by typing “make” <strong>and</strong> pressing the enter key.<br />
2. Run the application by typing “prun tupleSum2” <strong>and</strong> pressing the enter key.<br />
The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
[c615111@owin ~/fpc06 ]>prun tupleSum2<br />
Master: Opening tuple spaces<br />
Master: Tuple spaces open complete<br />
Master: Processors 2<br />
Master: Chunk size 4<br />
Master: Sent chunk size<br />
Master: tplength = 20<br />
Master: Putting 1...32 to problem tuple space<br />
Master: Putting A1 Size 4<br />
1 2 3 4<br />
Master: Putting A2 Size 4<br />
5 6 7 8<br />
Master: Putting A3 Size 4<br />
9 10 11 12<br />
Master: Putting A4 Size 4<br />
13 14 15 16<br />
Master: Putting A5 Size 4<br />
17 18 19 20<br />
Master: Putting A6 Size 4<br />
21 22 23 24<br />
Master: Putting A7 Size 4<br />
25 26 27 28<br />
Master: Putting A8 Size 4<br />
29 30 31 32<br />
Master: Finished sending 1...32 to tuple space<br />
Master: Getting sub sums from result tuple space<br />
Master: Recieved 10 from saber<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: Chunk size 4<br />
Worker: array size 20<br />
Worker: Begining to accumulate sum<br />
Worker: Took item A2 length 4<br />
5 6 7 8<br />
Worker: Sending sum 26<br />
Master: Recieved 26 from owin<br />
Master: Recieved 42 from saber<br />
Worker: Took item A4 length 4<br />
13 14 15 16<br />
Worker: Sending sum 58<br />
Master: Recieved 58 from owin<br />
Master: Recieved 74 from saber<br />
Worker: Took item A6 length 4<br />
21 22 23 24<br />
Worker: Sending sum 90<br />
Master: Recieved 90 from owin<br />
Master: Recieved 106 from saber<br />
Worker: Took item A8 length 4<br />
29 30 31 32<br />
Worker: Sending sum 122<br />
Master: Recieved 122 from owin<br />
Master: The sum total is: 528<br />
Master: The formula calculated sum is: 528<br />
Master: The workers gave the correct answer<br />
Master: Sending terminal signal<br />
Master: Finished sending terminal signal<br />
Master: Terminated<br />
Worker: Took item A33 length -1<br />
Worker: Recieved terminal signal<br />
Worker: Terminated<br />
[c615111@owin ~/fpc06 ]><br />
The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: Chunk size 4<br />
Worker: array size 20<br />
Worker: Begining to accumulate sum<br />
Worker: Took item A1 length 4<br />
1 2 3 4<br />
Worker: Sending sum 10<br />
Worker: Took item A3 length 4<br />
9 10 11 12<br />
Worker: Sending sum 42<br />
Worker: Took item A5 length 4<br />
17 18 19 20<br />
Worker: Sending sum 74<br />
Worker: Took item A7 length 4<br />
25 26 27 28<br />
Worker: Sending sum 106<br />
Worker: Took item A33 length -1<br />
Worker: Recieved terminal signal<br />
Worker: Terminated<br />
Matrix Multiplication with Chunking<br />
The A and B matrices for this example are 10 x 10, with B again set to the inverse
of A:

    1 1 1 1 1 1 1 1 1 1         0  0  0  0  0  0  0  0  0  1
    1 1 1 1 1 1 1 1 1 0         0  0  0  0  0  0  0  0  1 -1
    1 1 1 1 1 1 1 1 0 0         0  0  0  0  0  0  0  1 -1  0
    1 1 1 1 1 1 1 0 0 0         0  0  0  0  0  0  1 -1  0  0
A = 1 1 1 1 1 1 0 0 0 0     B = 0  0  0  0  0  1 -1  0  0  0
    1 1 1 1 1 0 0 0 0 0         0  0  0  0  1 -1  0  0  0  0
    1 1 1 1 0 0 0 0 0 0         0  0  0  1 -1  0  0  0  0  0
    1 1 1 0 0 0 0 0 0 0         0  0  1 -1  0  0  0  0  0  0
    1 1 0 0 0 0 0 0 0 0         0  1 -1  0  0  0  0  0  0  0
    1 0 0 0 0 0 0 0 0 0         1 -1  0  0  0  0  0  0  0  0

C = A . B is the 10 x 10 identity matrix.
The following is the tuple space “matrix multiplication” master program implemented by<br />
sending work in chunks:<br />
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "matrix.h"<br />
// The A matrix to break up into arrays<br />
// <strong>and</strong> send to workers<br />
double A[N][N];<br />
// The B matrix<br />
double B[N][N];<br />
// The resulting C matrix<br />
double C[N][N];<br />
main(){<br />
int processors; // Number of processors<br />
int chunk_size; // Chunk size<br />
int remaining; // Remaining arrays of work<br />
int i, j;<br />
// Matrix indices<br />
int matrix_row; // Index of matrix row<br />
int array_pos; // Array position in rows array<br />
int status;<br />
// Return status for tuple operations<br />
int res;<br />
// Result tuple space identifier<br />
int tsd;<br />
// Problem tuple space identifier<br />
double *rows; // Rows from A to send to worker<br />
double worker_time; // Sum of times returned by workers<br />
double total_time; // Total application run time<br />
int tplength; // Length of ts entry<br />
char tpname[20]; // Identifier of ts entry<br />
char host[128]; // Host machine name<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Get time stamp<br />
total_time = wall_clock();<br />
// Open tuple spaces<br />
printf("Master: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem",0);<br />
// Open result tuple space<br />
res = cnf_open("result",0);<br />
printf("Master: Tuple spaces open complete\n");<br />
// Get number of processors<br />
processors = cnf_getP();<br />
printf("Master: Processors %d\n", processors);<br />
// Get chunk size<br />
chunk_size = cnf_getf();<br />
printf("Master: Chunk size %d\n", chunk_size);<br />
printf("Master: Starting C = A . B\n");<br />
printf(" on %d x %d matrices\n", N, N);<br />
// Create <strong>and</strong> print matrix B<br />
makeDblInv(B);<br />
if(N <= 20){
// Print matrix B
printf("The B double matrix\n");
for(i=0; i<N; i++){
for(j=0; j<N; j++)
printf("%g ", B[i][j]);
printf("\n");
}
}
// Set length of entry
tplength = (N*N)*sizeof(double);
// Set name of entry
strcpy(tpname, "B");
printf("Master: Putting B Length(%d) Name(%s)\n", tplength, tpname);
// Put entry in tuple space<br />
status = cnf_tsput(tsd, tpname, B, tplength);<br />
// Create <strong>and</strong> print matrix A<br />
makeDblMat(A);<br />
if(N <= 20){
// Print matrix A
printf("The A double matrix\n");
for(i=0; i<N; i++){
for(j=0; j<N; j++)
printf("%g ", A[i][j]);
printf("\n");
}
}
// Put chunk size in problem tuple space
tplength = sizeof(int);
strcpy(tpname, "chunk_size");
printf("Master: Putting chunk_size Length(%d) Name(%s)\n", tplength, tpname);
status = cnf_tsput(tsd, tpname, &chunk_size, tplength);
// Set length of entry to chunk_size rows plus two header values
tplength = (chunk_size*N + 2)*sizeof(double);
printf("Master: Ai tplength = (%d)\n", tplength);
// Allocate memory for the rows array
if((rows = (double *) malloc(tplength)) == NULL)
exit(1);
// Loop until all rows are sent to workers
printf("Master: Putting A in problem tuple space\n");
remaining = N;
matrix_row = 0;
while (remaining > 0) {
// If remaining rows is less than chunk size<br />
// set number of rows sent to remaining rows<br />
if (remaining < chunk_size)<br />
chunk_size = remaining;<br />
// Subtract rows being sent from remaining rows<br />
remaining = remaining - chunk_size;<br />
printf("Master: chunk_size(%d) remaining(%d) \n",<br />
chunk_size, remaining);<br />
// Put chunk_size in first index<br />
rows[0] = chunk_size;<br />
// Set rows array position to 2<br />
// Second position (1) is reserved for<br />
// time returned by worker<br />
array_pos = 2;<br />
// Put rows of A matrix in rows array<br />
for (i=0; i<chunk_size; i++){
// Copy one row of A into the rows array
for (j=0; j<N; j++){
rows[array_pos] = A[matrix_row+i][j];
array_pos++;
}
}
// Set name of entry to beginning row number
sprintf(tpname, "A%d", matrix_row);
// Put entry in problem tuple space
status = cnf_tsput(tsd, tpname, rows, tplength);
if(N <= 20){
// Print the rows sent
for (i=0; i<chunk_size; i++){
for (j=0; j<N; j++)
printf("%g ", A[matrix_row+i][j]);
printf("\n");
}
}
// Advance to the next beginning row
matrix_row = matrix_row + chunk_size;
}
// Receive result rows from result tuple space
remaining = N;
worker_time = 0;
while (remaining > 0){
// Set entry name<br />
strcpy(tpname,"*");<br />
// Get entry from result tuple space<br />
tplength = cnf_tsget(res, tpname, rows, 0);<br />
// Get number of rows in this chunk from index zero
chunk_size = rows[0];<br />
// Get time returned by worker<br />
worker_time += rows[1];<br />
// Convert beginning row of entry to an integer<br />
matrix_row = atoi(tpname);<br />
printf("Master: Recieved %s Size %d\n", tpname, chunk_size);<br />
// Set the position in the array to 2<br />
array_pos = 2;<br />
// Assemble the received rows into the result matrix C
for (i=0; i<chunk_size; i++){
for (j=0; j<N; j++){
C[matrix_row+i][j] = rows[array_pos];
array_pos++;
}
}
if(N <= 20){
// Print the received rows
printf("Master: Recieved\n");
for (i=0; i<chunk_size; i++){
for (j=0; j<N; j++)
printf("%g ", C[matrix_row+i][j]);
printf("\n");
}
}
// Subtract received rows from remaining rows
remaining = remaining - chunk_size;
}<br />
// Resolve worker time<br />
printf("Master: The workers used %g seconds of processor time\n",<br />
(worker_time/1000000.0));<br />
// Check <strong>and</strong> print the C matrix<br />
if(N <= 20){
// Print the assembled C matrix
printf("Master: Matrix C\n");
for(i=0; i<N; i++){
for(j=0; j<N; j++)
printf("%g ", C[i][j]);
printf("\n");
}
}
// Insert the terminal signal entry
strcpy(tpname, "A-term");
rows[0] = -1;
tplength = 2*sizeof(double);
status = cnf_tsput(tsd, tpname, rows, tplength);
// Free memory for rows
free(rows);
// Resolve total application time
total_time = wall_clock() - total_time;
printf("Master: The application used %g seconds of wall clock time\n",
(total_time/1000000.0));
// Terminate program
printf("Master: Terminated\n");
cnf_term();
}
The following is the tuple space “matrix multiplication” worker program implemented
by receiving work in chunks:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "matrix.h"
// The B matrix read from tuple space
double B[N][N];
main(){
int tsd;
// Problem tuple space identifier
int res;
// Result tuple space identifier
int status;
// Return status for tuple operations
int chunk_size;
// Number of rows in a chunk
int matrix_row;
// Index of beginning matrix row
int array_put;
// Put position in rows array
int array_get;
// Get position in rows array
int i, j, n;
// Matrix and row indices
double *rows;
// Rows of A received and rows of C returned
double Ci[N];
// One computed row of C
double worker_time;
// Worker processor time
int tplength;
// Length of ts entry
char tpname[20];
// Identifier of ts entry
char host[128];
// Host machine name
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Worker: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem", 0);<br />
// Open result tuple space<br />
res = cnf_open("result", 0);<br />
printf("Worker: Tuple spaces open complete\n");<br />
// Set tpname to B<br />
strcpy(tpname,"B");<br />
// Read matrix B from tuple space<br />
status = cnf_tsread(tsd, tpname, B, 0);<br />
// Print matrix B<br />
if(N <= 20){
printf("The B double matrix\n");
for(i=0; i<N; i++){
for(j=0; j<N; j++)
printf("%g ", B[i][j]);
printf("\n");
}
}
// Read the chunk size from problem tuple space
strcpy(tpname, "chunk_size");
status = cnf_tsread(tsd, tpname, &chunk_size, 0);
// Allocate memory for the rows entry
tplength = (chunk_size*N + 2)*sizeof(double);
if((rows = (double *) malloc(tplength)) == NULL)
exit(-1);
// Loop until the terminal signal is received
while(1){
// Set name of entry to any chunk of A
strcpy(tpname, "A*");
// Get an entry from problem tuple space
tplength = cnf_tsget(tsd, tpname, rows, 0);
// If normal receive
if(tplength > 0){
// Check termination signal<br />
if (!strcmp(tpname, "A-term")){<br />
printf("Worker: Recieved the terminal signal\n");<br />
// Replace the terminal signal in problem ts<br />
status = cnf_tsput(tsd, tpname, rows, tplength);<br />
// Free memory for rows<br />
free(rows);<br />
// Terminate worker<br />
printf("Worker: Terminated\n");<br />
cnf_term();<br />
}<br />
// Get number of rows in this chunk from index zero
chunk_size = rows[0];<br />
// Convert beginning row of entry to an integer<br />
matrix_row = atoi(&tpname[1]);<br />
printf("Worker: chunk_size %d matrix_row %d\n",<br />
chunk_size, matrix_row);<br />
// Set rows array put position to 2<br />
array_put = 2;<br />
// Set rows array get position to 2<br />
array_get = 2;<br />
// Get beginning worker time<br />
worker_time = wall_clock();<br />
// For each row in chunk_size<br />
for(n=0; n<chunk_size; n++){
// Compute one row of C into the Ci array
for(j=0; j<N; j++){
Ci[j] = 0;
for(i=0; i<N; i++)
Ci[j] += rows[array_get+i] * B[i][j];
}
// Advance the get position past this row of A
array_get = array_get + N;
// Copy the result row back into the rows array
for(j=0; j<N; j++){
rows[array_put] = Ci[j];
array_put++;
}
}
// Resolve worker time and store it in index one
rows[1] = wall_clock() - worker_time;
// Set name of entry to beginning row number
sprintf(tpname, "%d", matrix_row);
// Put the result rows in result tuple space
status = cnf_tsput(res, tpname, rows, tplength);
if(N <= 20){
// Print the result rows
for(n=0; n<chunk_size; n++){
for(j=0; j<N; j++)
printf("%g ", rows[2 + n*N + j]);
printf("\n");
}
}
}
// Sleep 1 second
sleep(1);
}
}
To run the matrix multiplication distributed application with chunking:
1. Make the executables by typing “make” and pressing the enter key.
2. Run the application by typing “prun tupleMat2” and pressing the enter key.
The configuration file for this application is:
configuration: tupleMat2;
m: master = tupleMat2Master (type = master)
-> f: problem
(type = TS)<br />
-> m: worker = tupleMat2Worker<br />
(type = slave)<br />
-> f: result<br />
(type = TS)<br />
-> m: master;<br />
The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
[c615111@owin ~/fpc07new ]>prun tupleMat2<br />
Master: Opening tuple spaces<br />
Master: Tuple spaces open complete<br />
Master: Processors 2<br />
Master: Chunk size 4<br />
Master: Starting C = A . B<br />
on 10 x 10 matrices<br />
The B double matrix<br />
0 0 0 0 0 0 0 0 0 1<br />
0 0 0 0 0 0 0 0 1 -1<br />
0 0 0 0 0 0 0 1 -1 0<br />
0 0 0 0 0 0 1 -1 0 0<br />
0 0 0 0 0 1 -1 0 0 0<br />
0 0 0 0 1 -1 0 0 0 0<br />
0 0 0 1 -1 0 0 0 0 0<br />
0 0 1 -1 0 0 0 0 0 0<br />
0 1 -1 0 0 0 0 0 0 0<br />
1 -1 0 0 0 0 0 0 0 0<br />
Master: Putting B Length(800) Name(B)<br />
The A double matrix<br />
1 1 1 1 1 1 1 1 1 1<br />
1 1 1 1 1 1 1 1 1 0<br />
1 1 1 1 1 1 1 1 0 0<br />
1 1 1 1 1 1 1 0 0 0<br />
1 1 1 1 1 1 0 0 0 0<br />
1 1 1 1 1 0 0 0 0 0<br />
1 1 1 1 0 0 0 0 0 0<br />
1 1 1 0 0 0 0 0 0 0<br />
1 1 0 0 0 0 0 0 0 0<br />
1 0 0 0 0 0 0 0 0 0<br />
Master: Putting chunk_size Length(4) Name(chunk_size)<br />
Master: Ai tplength = (336)<br />
Master: Putting A in problem tuple space<br />
Master: chunk_size(4) remaining(6)<br />
1 1 1 1 1 1 1 1 1 1<br />
1 1 1 1 1 1 1 1 1 0<br />
1 1 1 1 1 1 1 1 0 0<br />
1 1 1 1 1 1 1 0 0 0<br />
Master: chunk_size(4) remaining(2)<br />
1 1 1 1 1 1 0 0 0 0<br />
1 1 1 1 1 0 0 0 0 0<br />
1 1 1 1 0 0 0 0 0 0<br />
1 1 1 0 0 0 0 0 0 0<br />
Master: chunk_size(2) remaining(0)<br />
1 1 0 0 0 0 0 0 0 0<br />
1 0 0 0 0 0 0 0 0 0<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
The B double matrix<br />
0 0 0 0 0 0 0 0 0 1<br />
0 0 0 0 0 0 0 0 1 -1<br />
0 0 0 0 0 0 0 1 -1 0<br />
0 0 0 0 0 0 1 -1 0 0<br />
0 0 0 0 0 1 -1 0 0 0<br />
0 0 0 0 1 -1 0 0 0 0<br />
0 0 0 1 -1 0 0 0 0 0<br />
0 0 1 -1 0 0 0 0 0 0<br />
0 1 -1 0 0 0 0 0 0 0<br />
1 -1 0 0 0 0 0 0 0 0<br />
Worker: chunk_size 4 matrix_row 4<br />
Worker: Recieved<br />
1 1 1 1 1 1 0 0 0 0<br />
Worker: Calculated array CA4+0<br />
0 0 0 0 1 0 0 0 0 0<br />
Worker: Recieved<br />
1 1 1 1 1 0 0 0 0 0<br />
Worker: Calculated array CA4+1<br />
0 0 0 0 0 1 0 0 0 0<br />
Worker: Recieved<br />
1 1 1 1 0 0 0 0 0 0<br />
Worker: Calculated array CA4+2<br />
0 0 0 0 0 0 1 0 0 0<br />
Worker: Recieved<br />
1 1 1 0 0 0 0 0 0 0<br />
Worker: Calculated array CA4+3<br />
0 0 0 0 0 0 0 1 0 0<br />
Worker: Putting 4<br />
Master: Recieved 4 Size 4<br />
Master: Recieved<br />
0 0 0 0 1 0 0 0 0 0<br />
0 0 0 0 0 1 0 0 0 0<br />
0 0 0 0 0 0 1 0 0 0<br />
0 0 0 0 0 0 0 1 0 0<br />
Master: Recieved 0 Size 4<br />
Master: Recieved<br />
1 0 0 0 0 0 0 0 0 0<br />
0 1 0 0 0 0 0 0 0 0<br />
0 0 1 0 0 0 0 0 0 0<br />
0 0 0 1 0 0 0 0 0 0<br />
Worker: chunk_size 2 matrix_row 8<br />
Worker: Recieved<br />
1 1 0 0 0 0 0 0 0 0<br />
Worker: Calculated array CA8+0<br />
0 0 0 0 0 0 0 0 1 0<br />
Worker: Recieved<br />
1 0 0 0 0 0 0 0 0 0<br />
Worker: Calculated array CA8+1<br />
0 0 0 0 0 0 0 0 0 1<br />
Worker: Putting 8<br />
Master: Recieved 8 Size 2<br />
Master: Recieved<br />
0 0 0 0 0 0 0 0 1 0<br />
0 0 0 0 0 0 0 0 0 1<br />
Master: The multiplication took 1.11439 seconds total time<br />
Master: The workers used 0.024033 seconds of processor time<br />
The C double matrix<br />
1 0 0 0 0 0 0 0 0 0<br />
0 1 0 0 0 0 0 0 0 0<br />
0 0 1 0 0 0 0 0 0 0<br />
0 0 0 1 0 0 0 0 0 0<br />
0 0 0 0 1 0 0 0 0 0<br />
0 0 0 0 0 1 0 0 0 0<br />
0 0 0 0 0 0 1 0 0 0<br />
0 0 0 0 0 0 0 1 0 0<br />
0 0 0 0 0 0 0 0 1 0<br />
0 0 0 0 0 0 0 0 0 1<br />
Master: C is Identity Matrix<br />
Master: Terminated<br />
Worker: Recieved the terminal signal<br />
Worker: Terminated<br />
== (tupleMat2) completed. Elapsed [2] Seconds.<br />
[c615111@owin ~/fpc07new ]><br />
The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
The B double matrix<br />
0 0 0 0 0 0 0 0 0 1<br />
0 0 0 0 0 0 0 0 1 -1<br />
0 0 0 0 0 0 0 1 -1 0<br />
0 0 0 0 0 0 1 -1 0 0<br />
0 0 0 0 0 1 -1 0 0 0<br />
0 0 0 0 1 -1 0 0 0 0<br />
0 0 0 1 -1 0 0 0 0 0<br />
0 0 1 -1 0 0 0 0 0 0<br />
0 1 -1 0 0 0 0 0 0 0<br />
1 -1 0 0 0 0 0 0 0 0<br />
Worker: chunk_size 4 matrix_row 0<br />
Worker: Recieved<br />
1 1 1 1 1 1 1 1 1 1<br />
Worker: Calculated array CA0+0<br />
1 0 0 0 0 0 0 0 0 0<br />
Worker: Recieved<br />
1 1 1 1 1 1 1 1 1 0<br />
Worker: Calculated array CA0+1<br />
0 1 0 0 0 0 0 0 0 0<br />
Worker: Recieved<br />
1 1 1 1 1 1 1 1 0 0<br />
Worker: Calculated array CA0+2<br />
0 0 1 0 0 0 0 0 0 0<br />
Worker: Recieved<br />
1 1 1 1 1 1 1 0 0 0<br />
Worker: Calculated array CA0+3<br />
0 0 0 1 0 0 0 0 0 0<br />
Worker: Putting 0<br />
Worker: Recieved the terminal signal<br />
Worker: Terminated<br />
To run the matrix multiplication distributed application with chunk size of 200 <strong>and</strong> N =<br />
500 (a 500 x 500 matrix):<br />
1. Set the factor value in the csl file to 200 (as shown below)<br />
2. Make the executables by typing “make SIZE=500” <strong>and</strong> pressing the enter key.<br />
3. Run the application by typing “prun tupleMat2” <strong>and</strong> pressing the enter key.<br />
configuration: tupleMat2;<br />
m: master = tupleMat2Master<br />
(factor = 200<br />
threshold = 1<br />
debug = 0<br />
)<br />
-> f: problem<br />
(type = TS)<br />
-> m: worker = tupleMat2Worker<br />
(type = slave)<br />
-> f: result<br />
(type = TS)<br />
-> m: master;<br />
The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
[c615111@owin ~/fpc07new ]>prun tupleMat2<br />
Master: Opening tuple spaces<br />
CID starting program. path (bin/tupleMat2Worker)<br />
Master: Tuple spaces open complete<br />
Master: Processors 2<br />
Master: Chunk size 200<br />
Master: Starting C = A . B<br />
on 500 x 500 matrices<br />
Master: Putting B Length(2000000) Name(B)<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Master: Putting chunk_size Length(4) Name(chunk_size)<br />
Master: Ai tplength = (800016)<br />
Master: Putting A in problem tuple space<br />
Master: chunk_size(200) remaining(300)<br />
Master: chunk_size(200) remaining(100)<br />
Worker: chunk_size 200 matrix_row 200<br />
Master: chunk_size(100) remaining(0)<br />
Worker: Putting 200<br />
Master: Recieved 0 Size 200<br />
Master: Recieved<br />
Master: Recieved 200 Size 200<br />
Master: Recieved<br />
Master: Recieved 400 Size 100<br />
Master: Recieved<br />
Master: The multiplication took 9.66808 seconds total time<br />
Master: The workers used 15.0322 seconds of processor time<br />
Master: C is Identity Matrix<br />
Worker: Recieved the terminal signal<br />
Master: Terminated<br />
Worker: Terminated<br />
== (tupleMat2) completed. Elapsed [10] Seconds.<br />
[c615111@owin ~/fpc07new ]><br />
The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: chunk_size 200 matrix_row 0<br />
Worker: Putting 0<br />
Worker: chunk_size 100 matrix_row 400<br />
Worker: Putting 400<br />
Worker: Recieved the terminal signal<br />
Worker: Terminated<br />
Optimized Programs<br />
Optimized Matrix Multiplication with Chunking<br />
The following is the tuple space “optimized matrix multiplication” master program<br />
implemented by sending work in chunks:<br />
#include <stdio.h><br />
#include <string.h><br />
#include <stdlib.h><br />
#include <unistd.h><br />
// The A matrix to break up into arrays<br />
// <strong>and</strong> send to workers<br />
double A[N][N];<br />
double B[N][N];<br />
double C[N][N];<br />
#include "matrix.h"<br />
// Main function<br />
main(){<br />
int processors; // Number of processors<br />
int chunk_size; // Chunk size<br />
int remaining; // Remaining arrays of work<br />
int i, j; // Matrix indices<br />
int matrix_row; // Index of matrix row<br />
int array_pos; // Array position in rows array<br />
int status; // Return status for tuple operations<br />
int res; // Result tuple space identifier<br />
int tsd; // Problem tuple space identifier<br />
double *rows; // Rows from A to send to worker<br />
double worker_time; // Sum of times returned by workers<br />
double total_time; // Total application run time<br />
int tplength; // Length of ts entry<br />
char tpname[20]; // Identifier of ts entry<br />
char host[128]; // Host machine name<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Get time stamp<br />
total_time = wall_clock();<br />
// Open tuple spaces<br />
printf("Master: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem",0);<br />
// Open result tuple space<br />
res = cnf_open("result",0);<br />
printf("Master: Tuple spaces open complete\n");<br />
// Get number of processors<br />
processors = cnf_getP();<br />
printf("Master: Processors %d\n", processors);<br />
// Get chunk size<br />
chunk_size = cnf_getf();<br />
printf("Master: Chunk size %d\n",<br />
chunk_size);<br />
printf("Master: Starting C = A . B\n");<br />
printf(" on %d x %d matrices\n", N, N);<br />
// Create <strong>and</strong> print matrix B<br />
makeDblInv(B);<br />
if(N <= 20){<br />
printf("The B double matrix\n");<br />
for(i=0; i<N; i++){<br />
for(j=0; j<N; j++)<br />
printf("%.0f ", B[i][j]);<br />
printf("\n");<br />
}<br />
}<br />
// Put matrix B in the problem tuple space<br />
tplength = N*N*sizeof(double);<br />
strcpy(tpname, "B");<br />
printf("Master: Putting B Length %d Name %s\n", tplength, tpname);<br />
status = cnf_tsput(tsd, tpname, B, tplength);<br />
// Create matrix A (an upper-left triangle of ones)<br />
for(i=0; i<N; i++)<br />
for(j=0; j<N; j++)<br />
A[i][j] = (j < N-i) ? 1.0 : 0.0;<br />
// Put chunk_size in the problem tuple space<br />
tplength = sizeof(int);<br />
strcpy(tpname, "chunk_size");<br />
printf("Master: Putting chunk_size Length %d Name %s\n", tplength, tpname);<br />
status = cnf_tsput(tsd, tpname, &chunk_size, tplength);<br />
// Allocate the rows array: a 2-double header plus chunk_size rows<br />
tplength = (2+chunk_size*N)*sizeof(double);<br />
printf("Master: Ai tplength = (%d)\n", tplength);<br />
if ((rows = (double*)malloc(tplength)) == NULL)<br />
exit(-1);<br />
printf("Master: Putting A in problem tuple space\n");<br />
// All N rows of A remain to be sent<br />
remaining = N;<br />
worker_time = 0.0;<br />
// Begin with row 0 of A<br />
matrix_row = 0;<br />
// Loop until all numbers are sent to workers<br />
while (remaining > 0) {<br />
// If remaining rows is less than chunk size<br />
// set number of rows sent to remaining rows<br />
if (remaining < chunk_size)<br />
chunk_size = remaining;<br />
// Subtract rows being sent from remaining rows<br />
remaining = remaining - chunk_size;<br />
// Set rows array position to 2<br />
// Second position (1) is reserved for<br />
// time returned by worker<br />
array_pos = 2;<br />
// Put chunk_size in slot 0 of the rows array<br />
rows[0] = chunk_size;<br />
// Put rows of A matrix in rows array<br />
for (i=0; i<chunk_size; i++)<br />
for (j=0; j<N; j++)<br />
rows[array_pos++] = A[matrix_row+i][j];<br />
// Name the entry with its beginning row<br />
sprintf(tpname, "A%d", matrix_row);<br />
printf("Master: Putting chunk_size %d matrix_row %s remaining %d\n",<br />
chunk_size, tpname, remaining);<br />
// Put the chunk in the problem tuple space<br />
status = cnf_tsput(tsd, tpname, rows, tplength);<br />
// Advance to the beginning row of the next chunk<br />
matrix_row = matrix_row + chunk_size;<br />
}<br />
printf("Master: All work has been sent\n");<br />
// Receive results until all N rows are back<br />
remaining = N;<br />
while (remaining > 0) {<br />
// Get any entry from the result tuple space<br />
strcpy(tpname, "*");<br />
tplength = cnf_tsget(res, tpname, rows, 0);<br />
// Get the chunk size and beginning row of this result<br />
chunk_size = (int)rows[0];<br />
matrix_row = atoi(tpname);<br />
// Accumulate this worker's processor time<br />
worker_time = worker_time + rows[1];<br />
printf("Master: Recieved chunk_sizs %d matrix_row %d\n",<br />
chunk_size, matrix_row);<br />
// Set the position in the array to 2<br />
array_pos = 2;<br />
// Assemble the result matrix C<br />
// Loop through recieved rows<br />
for (i=0; i<chunk_size; i++)<br />
for (j=0; j<N; j++)<br />
C[matrix_row+i][j] = rows[array_pos++];<br />
// Subtract received rows from remaining rows<br />
remaining = remaining - chunk_size;<br />
}<br />
printf("Master: Recieved all work from workers\n");<br />
// Replace the work entries with the terminal signal<br />
strcpy(tpname, "A-term");<br />
status = cnf_tsput(tsd, tpname, rows, tplength);<br />
printf("Master: C matrix has been assembled\n");<br />
// Compute and print the run times<br />
total_time = wall_clock() - total_time;<br />
printf("Master: The multiplication took %g seconds total time\n",<br />
total_time);<br />
printf("Master: The workers used %g seconds of processor time\n",<br />
worker_time);<br />
// Check that C is the identity matrix<br />
status = 1;<br />
for (i=0; i<N; i++)<br />
for (j=0; j<N; j++)<br />
if (C[i][j] != ((i == j) ? 1.0 : 0.0))<br />
status = 0;<br />
if (status)<br />
printf("Master: C is Identity Matrix\n");<br />
// Free memory for rows<br />
free(rows);<br />
printf("Master: Terminated\n");<br />
cnf_term();<br />
}<br />
The following is the tuple space “optimized matrix multiplication” worker program<br />
implemented by sending work in chunks:<br />
#include <stdio.h><br />
#include <string.h><br />
#include <stdlib.h><br />
#include <unistd.h><br />
double Ai[N/2][N]; // A chunk of A matrix<br />
double B[N][N]; // B matrix<br />
double Ci[N/2][N]; // A chunk of C matrix<br />
#include "matrix.h"<br />
// Main function<br />
main(){<br />
int chunk_size; // Chunk size<br />
int i, j, k; // Matrix indices<br />
int matrix_row; // Index of matrix row<br />
int array_pos; // Array position in rows array<br />
int status; // Return status for tuple operations<br />
int res; // Result tuple space identifier<br />
int tsd; // Problem tuple space identifier<br />
double *rows; // Rows from A<br />
double worker_time; // Time to return to master<br />
int tplength; // Length of ts entry<br />
char tpname[20]; // Identifier of ts entry<br />
char host[128]; // Host machine name<br />
// Get host machine name<br />
gethostname(host, sizeof(host));<br />
// Open tuple spaces<br />
printf("Worker: Opening tuple spaces\n");<br />
// Open problem tuple space<br />
tsd = cnf_open("problem", 0);<br />
// Open result tuple space<br />
res = cnf_open("result", 0);<br />
printf("Worker: Tuple spaces open complete\n");<br />
// Set tpname to B<br />
strcpy(tpname,"B");<br />
// Read matrix B from tuple space<br />
status = cnf_tsread(tsd, tpname, B, 0);<br />
// Print matrix B<br />
if(N <= 20){<br />
printf("The B double matrix\n");<br />
for(i=0; i<N; i++){<br />
for(j=0; j<N; j++)<br />
printf("%.0f ", B[i][j]);<br />
printf("\n");<br />
}<br />
}<br />
// Get chunk_size from master<br />
// Set tpname to chunk_size<br />
strcpy(tpname,"chunk_size");<br />
// Read chunk_size from tuple space<br />
status = cnf_tsread(tsd, tpname, &chunk_size, 0);<br />
// Prepare the rows array for tuple space exchanges<br />
tplength = (2+chunk_size*N)*sizeof(double);<br />
if ((rows = (double*)malloc(tplength)) == NULL)<br />
exit(-1);<br />
// Loop until the terminal signal is received<br />
while(1){<br />
// Set entry name to match any that begins with A<br />
strcpy(tpname,"A*");<br />
// Get an entry; cnf_tsget returns its length<br />
tplength = cnf_tsget(tsd, tpname, rows, 0);<br />
// Normal receive<br />
if(tplength > 0){<br />
// Check termination signal<br />
if (!strcmp(tpname, "A-term")){<br />
printf("Worker: Recieved the terminal signal\n");<br />
// Replace the terminal signal in problem ts<br />
status = cnf_tsput(tsd, tpname, rows, tplength);<br />
// Free memory for rows<br />
free(rows);<br />
// Terminate worker<br />
printf("Worker: Terminated\n");<br />
cnf_term();<br />
}<br />
// Get the number of rows in this chunk from slot 0 of the rows array<br />
chunk_size = (int)rows[0];<br />
// Convert beginning row of entry to an integer<br />
matrix_row = atoi(&tpname[1]);<br />
printf("Worker: Recieved chunk_size %d matrix_row %d\n",<br />
chunk_size, matrix_row);<br />
// Get beginning worker time<br />
worker_time = wall_clock();<br />
// For each row in chunk_size<br />
// Copy rows from rows to Ai<br />
array_pos = 2;<br />
for(i=0; i<chunk_size; i++)<br />
for(j=0; j<N; j++)<br />
Ai[i][j] = rows[array_pos++];<br />
// Compute Ci = Ai . B<br />
for(i=0; i<chunk_size; i++)<br />
for(j=0; j<N; j++){<br />
Ci[i][j] = 0.0;<br />
for(k=0; k<N; k++)<br />
Ci[i][j] += Ai[i][k] * B[k][j];<br />
}<br />
// Copy the result rows back into the rows array<br />
array_pos = 2;<br />
for(i=0; i<chunk_size; i++)<br />
for(j=0; j<N; j++)<br />
rows[array_pos++] = Ci[i][j];<br />
// Record this worker's processor time in slot 1<br />
rows[1] = wall_clock() - worker_time;<br />
// Name the result entry with its beginning row<br />
sprintf(tpname, "%d", matrix_row);<br />
printf("Worker: Putting chunk_size %d matrix_row %d\n",<br />
chunk_size, matrix_row);<br />
// Put the result in the result tuple space<br />
status = cnf_tsput(res, tpname, rows, tplength);<br />
}<br />
}<br />
}<br />
To run the optimized matrix multiplication distributed application with chunk size of<br />
200 and N = 500 (a 500 x 500 matrix):<br />
1. Set the factor value in the csl file to 200 (as shown below)<br />
2. Make the executables by typing “make SIZE=500” and pressing the enter key.<br />
3. Run the application by typing “prun tupleMat3” and pressing the enter key.<br />
configuration: tupleMat3;<br />
m: master = tupleMat3Master<br />
(factor = 200<br />
threshold = 1<br />
debug = 0<br />
)<br />
-> f: problem<br />
(type = TS)<br />
-> m: worker = tupleMat3Worker<br />
(type = slave)<br />
-> f: result<br />
(type = TS)<br />
-> m: master;<br />
The screen output for the master terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
Master: Opening tuple spaces<br />
Master: Tuple spaces open complete<br />
Master: Processors 2<br />
Master: Chunk size 200<br />
Master: Starting C = A . B<br />
on 500 x 500 matrices<br />
Master: Putting B Length 2000000 Name B<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Master: Putting chunk_size Length 4 Name chunk_size<br />
Master: Ai tplength = (800016)<br />
Master: Putting A in problem tuple space<br />
Master: Putting chunk_size 200 matrix_row A0 remaining 300<br />
Master: Putting chunk_size 200 matrix_row A200 remaining 100<br />
Worker: Recieved chunk_size 200 matrix_row 200<br />
Master: Putting chunk_size 100 matrix_row A400 remaining 0<br />
Master: All work has been sent<br />
Master: Recieved chunk_sizs 200 matrix_row 0<br />
Worker: Putting chunk_size 200 matrix_row 200<br />
Master: Recieved chunk_sizs 200 matrix_row 200<br />
Master: Recieved chunk_sizs 100 matrix_row 400<br />
Master: Recieved all work from workers<br />
Master: C matrix has been assembled<br />
Master: The multiplication took 4.39389 seconds total time<br />
Master: The workers used 6.23962 seconds of processor time<br />
Master: C is Identity Matrix<br />
Master: Terminated<br />
Worker: Recieved the terminal signal<br />
Worker: Terminated<br />
== (tupleMat3) completed. Elapsed [4] Seconds.<br />
[c615111@owin ~/fpc08 ]><br />
The screen output for the worker terminal with <strong>Synergy</strong>’s initialization <strong>and</strong> termination<br />
output removed should resemble:<br />
Worker: Opening tuple spaces<br />
Worker: Tuple spaces open complete<br />
Worker: Recieved chunk_size 200 matrix_row 0<br />
Worker: Putting chunk_size 200 matrix_row 0<br />
Worker: Recieved chunk_size 100 matrix_row 400<br />
Worker: Putting chunk_size 100 matrix_row 400<br />
Worker: Recieved the terminal signal<br />
Worker: Terminated<br />
<strong>Synergy</strong> in the Future<br />
Function and Command Reference<br />
Commands<br />
addhost<br />
This command adds a host into the host file. The command fails if the given host is not<br />
Synergy capable. The [-f] option forces the insertion even if the host is not ready. A<br />
newly added host automatically becomes “selected”.<br />
Syntax:<br />
[c615111@owin ~ ]>addhost [-f] <hostname><br />
cds<br />
Checks the status of remote daemons. This command prints all available remote hosts to<br />
screen and shows their benchmark, name, and availability.<br />
Example:<br />
[c615111@owin ~ ]>cds<br />
++ Benchmark (186) ++ (owin) ready.<br />
++ Benchmark (2077) ++ (rancor) ready.<br />
++ Benchmark (2109) ++ (saber) ready.<br />
++ Benchmark (1497) ++ (sarlac) ready.<br />
++ Benchmark (186) ++ (lynox) ready.<br />
[c615111@luke ~ ]><br />
[c615111@owin ~ ]>cds<br />
????? PMD down (129.32.92.82,ewok)<br />
????? CID down (129.32.92.66,luke) (c615111)<br />
????? CID down (129.32.92.89,ackbar) (c615111)<br />
????? CID down (129.32.92.69,r2d2) (c615111)<br />
[c615111@luke ~ ]><br />
[c615111@luke ~ ]>cds<br />
????? PMD down (129.32.92.82,ewok)<br />
++ Benchmark (371) ++ (luke) ready.<br />
????? CID down (129.32.92.89,ackbar) (c615111)<br />
????? CID down (129.32.92.69,r2d2) (c615111)<br />
[c615111@luke ~ ]><br />
chosts<br />
This command toggles the selected and de-selected status of processors. Only the<br />
selected processors will be used for immediate parallel processing. The -v option<br />
shows the current Synergy connection status; it requires some extra time.<br />
Syntax:<br />
[c615111@owin ~ ]>chosts [-v]<br />
Example:<br />
<strong>Synergy</strong> V3.0 : Host Selection Utility<br />
=Status=No.===IP Address=================Host Name==============Login=F Sys.=<br />
[-----] ( 1) #129.32.92.82 ewok c615111 none<br />
[-----] ( 2) #129.32.92.66 luke c615111 none<br />
[-----] ( 3) #129.32.92.89 ackbar c615111 none<br />
[-----] ( 4) #129.32.92.69 r2d2 c615111 none<br />
[-----] ( 5) #129.32.92.87 alliance c615111 none<br />
[-----] ( 6) #129.32.92.91 anakin c615111 none<br />
[-----] ( 7) #129.32.92.78 bantha c615111 none<br />
[-----] ( 8) #129.32.92.74 bobafet c615111 none<br />
[-----] ( 9) #129.32.92.80 c3p0 c615111 none<br />
[-----] ( 10) #129.32.92.88 chewbaca c615111 none<br />
[-----] ( 11) #129.32.92.86 droids c615111 none<br />
[-----] ( 12) #129.32.92.68 emperor c615111 none<br />
[-----] ( 13) #129.32.92.77 gredo c615111 none<br />
[-----] ( 14) #129.32.92.71 jabba c615111 none<br />
[-----] ( 15) #129.32.92.76 jawa c615111 none<br />
[-----] ( 16) #129.32.92.83 lando c615111 none<br />
[-----] ( 17) #129.32.92.84 leia c615111 none<br />
[-----] ( 18) #129.32.92.81 owin c615111 none<br />
[-----] ( 19) #129.32.92.70 rancor c615111 none<br />
=== Enter s(elect) | d(e-select) | c(ontinue):<br />
[-----] ( 3) #129.32.92.89 ackbar c615111 none<br />
[-----] ( 4) #129.32.92.69 r2d2 c615111 none<br />
[-----] ( 5) #129.32.92.87 alliance c615111 none<br />
[-----] ( 6) #129.32.92.91 anakin c615111 none<br />
[-----] ( 7) #129.32.92.78 bantha c615111 none<br />
[-----] ( 8) #129.32.92.74 bobafet c615111 none<br />
[-----] ( 9) #129.32.92.80 c3p0 c615111 none<br />
[-----] ( 10) #129.32.92.88 chewbaca c615111 none<br />
[-----] ( 11) #129.32.92.86 droids c615111 none<br />
[-----] ( 12) #129.32.92.68 emperor c615111 none<br />
[-----] ( 13) #129.32.92.77 gredo c615111 none<br />
[-----] ( 14) #129.32.92.71 jabba c615111 none<br />
[-----] ( 15) #129.32.92.76 jawa c615111 none<br />
[-----] ( 16) #129.32.92.83 lando c615111 none<br />
[-----] ( 17) #129.32.92.84 leia c615111 none<br />
[-----] ( 18) #129.32.92.81 owin c615111 none<br />
[-----] ( 19) #129.32.92.70 rancor c615111 none<br />
=== Enter s(elect) | d(e-select) | c(ontinue): s<br />
=== Host From (0 to continue) #: 1<br />
To #: 4<br />
(129.32.92.82 ewok) selected.<br />
(129.32.92.66 luke) selected.<br />
(129.32.92.89 ackbar) selected.<br />
(129.32.92.69 r2d2) selected.<br />
=== Enter s(elect) | d(e-select) | c(ontinue):<br />
<strong>Synergy</strong> V3.0 : Host Selection Utility<br />
=Status=No.===IP Address=================Host Name==============Login=F Sys.=<br />
[-----] ( 1) 129.32.92.82 ewok c615111 none<br />
[-----] ( 2) 129.32.92.66 luke c615111 none<br />
[-----] ( 3) 129.32.92.89 ackbar c615111 none<br />
[-----] ( 4) 129.32.92.69 r2d2 c615111 none<br />
[-----] ( 5) #129.32.92.87 alliance c615111 none<br />
[-----] ( 6) #129.32.92.91 anakin c615111 none<br />
[-----] ( 7) #129.32.92.78 bantha c615111 none<br />
[-----] ( 8) #129.32.92.74 bobafet c615111 none<br />
[-----] ( 9) #129.32.92.80 c3p0 c615111 none<br />
[-----] ( 10) #129.32.92.88 chewbaca c615111 none<br />
[-----] ( 11) #129.32.92.86 droids c615111 none<br />
[-----] ( 12) #129.32.92.68 emperor c615111 none<br />
[-----] ( 13) #129.32.92.77 gredo c615111 none<br />
[-----] ( 14) #129.32.92.71 jabba c615111 none<br />
[-----] ( 15) #129.32.92.76 jawa c615111 none<br />
[-----] ( 16) #129.32.92.83 lando c615111 none<br />
[-----] ( 17) #129.32.92.84 leia c615111 none<br />
[-----] ( 18) #129.32.92.81 owin c615111 none<br />
[-----] ( 19) #129.32.92.70 rancor c615111 none<br />
=== Enter s(elect) | d(e-select) | c(ontinue):<br />
[-----] ( 1) 129.32.92.82 ewok c615111 none<br />
[-----] ( 2) 129.32.92.66 luke c615111 none<br />
[-----] ( 3) 129.32.92.89 ackbar c615111 none<br />
[-----] ( 4) 129.32.92.69 r2d2 c615111 none<br />
[-----] ( 5) #129.32.92.87 alliance c615111 none<br />
[-----] ( 6) #129.32.92.91 anakin c615111 none<br />
[-----] ( 7) #129.32.92.78 bantha c615111 none<br />
[-----] ( 8) #129.32.92.74 bobafet c615111 none<br />
[-----] ( 9) #129.32.92.80 c3p0 c615111 none<br />
[-----] ( 10) #129.32.92.88 chewbaca c615111 none<br />
[-----] ( 11) #129.32.92.86 droids c615111 none<br />
[-----] ( 12) #129.32.92.68 emperor c615111 none<br />
[-----] ( 13) #129.32.92.77 gredo c615111 none<br />
[-----] ( 14) #129.32.92.71 jabba c615111 none<br />
[-----] ( 15) #129.32.92.76 jawa c615111 none<br />
[-----] ( 16) #129.32.92.83 lando c615111 none<br />
[-----] ( 17) #129.32.92.84 leia c615111 none<br />
[-----] ( 18) #129.32.92.81 owin c615111 none<br />
[-----] ( 19) #129.32.92.70 rancor c615111 none<br />
=== Enter s(elect) | d(e-select) | c(ontinue): d<br />
=== Host From (0 to continue) #: 2<br />
To #: 3<br />
(luke, #129.32.92.66) de-selected.<br />
(ackbar, #129.32.92.89) de-selected.<br />
=== Enter s(elect) | d(e-select) | c(ontinue):<br />
<strong>Synergy</strong> V3.0 : Host Selection Utility<br />
=Status=No.===IP Address=================Host Name==============Login=F Sys.=<br />
[-----] ( 1) 129.32.92.82 ewok c615111 none<br />
[-----] ( 2) #129.32.92.66 luke c615111 none<br />
[-----] ( 3) #129.32.92.89 ackbar c615111 none<br />
[-----] ( 4) 129.32.92.69 r2d2 c615111 none<br />
[-----] ( 5) #129.32.92.87 alliance c615111 none<br />
[-----] ( 6) #129.32.92.91 anakin c615111 none<br />
[-----] ( 7) #129.32.92.78 bantha c615111 none<br />
[-----] ( 8) #129.32.92.74 bobafet c615111 none<br />
[-----] ( 9) #129.32.92.80 c3p0 c615111 none<br />
[-----] ( 10) #129.32.92.88 chewbaca c615111 none<br />
[-----] ( 11) #129.32.92.86 droids c615111 none<br />
[-----] ( 12) #129.32.92.68 emperor c615111 none<br />
[-----] ( 13) #129.32.92.77 gredo c615111 none<br />
[-----] ( 14) #129.32.92.71 jabba c615111 none<br />
[-----] ( 15) #129.32.92.76 jawa c615111 none<br />
[-----] ( 16) #129.32.92.83 lando c615111 none<br />
[-----] ( 17) #129.32.92.84 leia c615111 none<br />
[-----] ( 18) #129.32.92.81 owin c615111 none<br />
[-----] ( 19) #129.32.92.70 rancor c615111 none<br />
=== Enter s(elect) | d(e-select) | c(ontinue):<br />
cid<br />
This command starts the CID daemon, which launches Synergy programs on the local<br />
host. It is normally run in the background.<br />
Example:<br />
[c615111@luke ~ ]>cid &<br />
[1] 23104<br />
[c615111@luke ~ ]> CID HOST NAME (luke)<br />
Actual CID IP(129.32.92.66)<br />
CID ready.<br />
[c615111@owin ~ ]><br />
[c615111@owin ~ ]>cid &<br />
[2] 240<br />
[c615111@owin ~ ]> CID HOST NAME (owin)<br />
Actual CID IP(129.32.92.81)<br />
Found an old CID.<br />
Removed an old CID<br />
Reusing cid entry.<br />
CID ready.<br />
[c615111@owin ~ ]><br />
delhost<br />
This command permanently deletes a host from the host file. It fails if the host is<br />
Synergy ready. The [-f] option forces the removal.<br />
Syntax:<br />
[c615111@owin ~ ]>delhost [-f] <hostname><br />
Example:<br />
dhosts<br />
This command lets you permanently delete more than one host at a time. The -v option<br />
will verify the hosts' current Synergy connection status (it takes some extra time).<br />
Syntax:<br />
[c615111@owin ~ ]>dhosts [-v]<br />
Example:<br />
kds<br />
This command kills all remote daemons. It only kills the daemons started by your own<br />
login. It will NOT kill daemons started by others.<br />
pcheck<br />
Utility to check and maintain running parallel programs.<br />
Syntax:<br />
[c615111@owin ~ ]>pcheck<br />
Example:<br />
pmd<br />
This command starts the PMD daemon on the local host in the background. Only one<br />
PMD instance runs per host.<br />
Example:<br />
[c615111@ewok ~ ]>pmd &<br />
[1] 24172<br />
[c615111@ewok ~ ]><br />
[c615111@luke ~ ]>pmd &<br />
[2] 23106<br />
[c615111@luke ~ ]>PMD already running.<br />
[2] Exit 1 pmd<br />
[c615111@luke ~ ]><br />
prun<br />
This command runs a parallel application: it configures the application from its csl<br />
file, assigns programs and tuple space objects to the selected hosts, and starts the<br />
distributed application controller.<br />
Example:<br />
[c615111@owin ~/example01 ]>prun tupleHello1<br />
== Checking Processor Pool:<br />
++ Benchmark (185) ++ (owin) ready.<br />
++ Benchmark (1487) ++ (rancor) ready.<br />
++ Benchmark (1482) ++ (saber) ready.<br />
== Done.<br />
== Parallel Application Console: (owin)<br />
== CONFiguring: (tupleHello1.csl)<br />
== Default directory: (/usr/classes/cis6151/c615111/example01)<br />
++ Automatic program assignment: (worker)->(owin)<br />
++ Automatic slave generation: (worker1)->(rancor)<br />
++ Automatic slave generation: (worker2)->(saber)<br />
++ Automatic program assignment: (master)->(owin)<br />
++ Automatic object assignment: (problem)->(owin) pred(1) succ(3)<br />
++ Automatic object assignment: (result)->(owin) pred(3) succ(1)<br />
== Done.<br />
== Starting Distributed Application Controller ...<br />
Verifying process [|(c615111)|*/tupleHello1Worker<br />
Verifying process [|(c615111)|*/tupleHello1Worker<br />
Verifying process [|(c615111)|*/tupleHello1Master<br />
Verifying process [|(c615111)|*/tupleHello1Worker<br />
** (tupleHello1.prcd) verified, all components executable.<br />
** (tupleHello1.prcd) started.<br />
== (tupleHello1) completed. Elapsed [5] Seconds.<br />
[c615111@owin ~/example01 ]><br />
sds<br />
This command starts daemons on the selected hosts (defined in ~/.sng_hosts).<br />
sfs<br />
Example:<br />
shosts<br />
Example:<br />
Functions<br />
cnf_close(id)<br />
PURPOSE: Close all internal data structures according to type<br />
PARAMETERS: int id – identifier of object to be closed<br />
RETURNS: Nothing<br />
cnf_dget(tpname, tpvalue, tpsize)<br />
PURPOSE: Destructively read a tuple from a direct tuple space<br />
PARAMETERS: char *tpname – the name of the object to be read from<br />
char *tpvalue – address of receiving buffer<br />
int tpsize – ?<br />
RETURNS: int tpsize – the length of the data read in 8-bit bytes<br />
cnf_dinit()<br />
PURPOSE: Initializes the tid_list before each scatter operation<br />
PARAMETERS: None<br />
RETURNS: 1 always<br />
cnf_dput(tsd, tid, tpname, tpvalue, tpsize)<br />
PURPOSE: Inserts a tuple into a direct tuple space<br />
PARAMETERS: int tsd<br />
long tpsize<br />
char *tid<br />
char *tpname<br />
char *tpvalue<br />
RETURNS: ?<br />
cnf_dread(tpname, tpvalue, tpsize)<br />
PURPOSE: Destructively read a tuple from a direct tuple space<br />
PARAMETERS: int tpsize<br />
char *tpname<br />
char *tpvalue<br />
RETURNS: int tpsize<br />
cnf_dzap()<br />
PURPOSE: Removes all local CID's tuples<br />
PARAMETERS: None<br />
RETURNS: 1 if success or an error code otherwise<br />
cnf_eot(id)<br />
PURPOSE: Marks the end of tasks<br />
PARAMETERS: int id - ?<br />
RETURNS: 1 if success or an error code otherwise<br />
cnf_error(errno)<br />
PURPOSE: Prints to the user the kind of error encountered<br />
PARAMETERS: int errno<br />
RETURNS: 1 always<br />
cnf_fflush(id)<br />
PURPOSE: Flushes a file<br />
PARAMETERS: int id – index into cnf_map to get channel #/ptr<br />
RETURNS: 1 if success or 0 if error<br />
cnf_fgetc(id, buf)<br />
PURPOSE: Read a char from file into buffer<br />
PARAMETERS: int id – index into cnf_map to get channel #/ptr<br />
char *buf – address of receiving buffer<br />
RETURNS: 0 on EOF otherwise 1<br />
cnf_fgets(id, buf, bufsiz)<br />
PURPOSE: Read a line from file into buffer<br />
PARAMETERS: int id – index into cnf_map to get channel #/ptr<br />
char *buf – address of receiving buffer<br />
int bufsiz – max size of receiving buffer<br />
RETURNS: 0 if EOF otherwise number of bytes read<br />
cnf_fputc(id, buf)<br />
PURPOSE: Write a char from buffer to file<br />
PARAMETERS: int id – index into cnf_map to get channel #/ptr<br />
char buf – the character to write<br />
RETURNS: 1 if success or 0 if error<br />
cnf_fputs(id, buf, bufsiz)<br />
PURPOSE: Write a line from buffer to file<br />
PARAMETERS: int id – index into cnf_map to get channel #/ptr<br />
char *buf – address of the buffer to write<br />
int bufsiz – size of buffer<br />
RETURNS: Number of bytes written or 0 if error<br />
cnf_fread(id, buf, bufsiz, nitems)<br />
PURPOSE: Read a 'record' from file into buffer<br />
PARAMETERS: int id – index into cnf_map to get channel #/ptr<br />
char *buf – address of receiving buffer<br />
int bufsiz – max size of receiving buffer<br />
int nitems – number of bufsiz blocks to read<br />
RETURNS: 0 if EOF otherwise number of bytes read<br />
cnf_fseek(id, from, offset)<br />
PURPOSE: Set the read pointer to "offset" relative to "from" in a file<br />
PARAMETERS: int id – index into cnf_map to get channel #/ptr<br />
int from<br />
int offset<br />
RETURNS: 1 if success or 0 if error<br />
cnf_fwrite(id, buf, bufsiz, nitems)<br />
PURPOSE: Write a 'record' from buffer into file<br />
PARAMETERS: int id – index into cnf_map to get channel #/ptr<br />
char *buf – address of the buffer to write<br />
int bufsiz – max size of the buffer<br />
int nitems – number of bufsiz blocks to write<br />
RETURNS: Number of bytes written or an error code on error<br />
cnf_getarg(idx)<br />
PURPOSE: Returns the runtime argument by index<br />
PARAMETERS: int idx – the index<br />
RETURNS: char * (the idx'th argument)<br />
cnf_getf()<br />
PURPOSE: Returns the factor value for loop scheduling<br />
PARAMETERS: None<br />
RETURNS: f value, an integer in (0..100]<br />
cnf_getP()<br />
PURPOSE: Returns the number of parallel workers<br />
PARAMETERS: None<br />
RETURNS: P value, an integer in [1..N]<br />
cnf_gett()<br />
PURPOSE: Returns the threshold value for loop scheduling<br />
PARAMETERS: None<br />
RETURNS: t value, an integer in [1..N)<br />
cnf_gts(tsd)<br />
PURPOSE: Get all tid's processor assignments in one shot<br />
PARAMETERS: int tsd - ?<br />
RETURNS: 1 if success, 0 if no memory or an error code otherwise<br />
cnf_init()<br />
PURPOSE: Initializes sng_map_hd and sng_map using either the init file or<br />
direct transmission from DAC. The init file's name is constructed<br />
from the value of the logical name CNF_MODULE suffixed with<br />
".ini".<br />
PARAMETERS: None<br />
RETURNS: Nothing if successful or an error code otherwise<br />
cnf_open(local_name, mode)<br />
PURPOSE: Looks up a pipe or tuple space object in the sng_map structure and<br />
opens a channel to the physical address for that ref_name<br />
PARAMETERS: char *local_name – local_name to find in cnf_map<br />
char *mode – open modes: r,w,a,r+,w+,a+. Only for FILEs<br />
RETURNS: int chan – an integer handle if successful, or an error code<br />
otherwise. This is used like a usual Unix file handle.<br />
cnf_print_map()<br />
PURPOSE: ?<br />
PARAMETERS: None<br />
RETURNS: Nothing<br />
cnf_read(id, buf, bufsiz)<br />
PURPOSE: Read a 'record' from file or pipe into buffer (starting at address<br />
buf)<br />
PARAMETERS: int id – index into cnf_map to get channel #/ptr<br />
char *buf – address of receiving buffer<br />
int bufsiz – max size of receiving buffer<br />
RETURNS: 0 on EOF otherwise number of bytes read<br />
cnf_rmall(id)<br />
PURPOSE: Destroy all tuples in a named tuple space<br />
PARAMETERS: int id - ?<br />
RETURNS: 0 if successful or an error code otherwise<br />
cnf_sot(id)<br />
PURPOSE: Marks the start of scattering of tasks<br />
PARAMETERS: int id<br />
RETURNS: 1 if successful or an error code otherwise<br />
cnf_spzap(tsd)<br />
PURPOSE: Removes all "retrieve" entries in TSH<br />
PARAMETERS: int tsd - ?<br />
RETURNS: 1 if successful or an error code otherwise<br />
cnf_term()<br />
PURPOSE: Called before image return to clean things up. Closes any files left<br />
open.<br />
PARAMETERS: None<br />
RETURNS: Nothing<br />
cnf_tget(tpname, tpvalue, tpsize)<br />
PURPOSE: Destructively read a tuple from a named tuple space<br />
PARAMETERS: int tpsize -<br />
char *tpname -<br />
char *tpvalue -<br />
RETURNS: int tpsize – the size of the tuple received if successful or an error<br />
code otherwise<br />
cnf_tsput(tpname, tpvalue, tpsize)<br />
PURPOSE: Inserts a tuple into a named tuple space<br />
PARAMETERS: int tpsize -<br />
char *tpname -<br />
char *tpvalue -<br />
RETURNS: ? on success or an error code otherwise<br />
cnf_tsread(tpname, tpvalue, tpsize)<br />
PURPOSE: Read a tuple from a named tuple space<br />
PARAMETERS: int tpsize -<br />
char *tpname -<br />
char *tpvalue -<br />
RETURNS: int tpsize – the size of the tuple received if successful or an error<br />
code otherwise<br />
cnf_tsget(id, tpname, tpvalue, tpsize)<br />
PURPOSE: Destructively read a tuple from a named tuple space<br />
PARAMETERS: int id -<br />
int tpsize -<br />
char *tpname -<br />
char *tpvalue -<br />
RETURNS: int tpsize – the size of the tuple received if successful or an error<br />
code otherwise<br />
cnf_tsput(id, tpname, tpvalue, tpsize)<br />
PURPOSE: Inserts a tuple into a named tuple space<br />
PARAMETERS: int id -<br />
int tpsize -<br />
char *tpname -<br />
char *tpvalue -<br />
RETURNS: ? on success or an error code otherwise<br />
cnf_tsread(id, tpname, tpvalue, tpsize)<br />
PURPOSE: Read a tuple from a named tuple space<br />
PARAMETERS: int id -<br />
int tpsize -<br />
char *tpname -<br />
char *tpvalue -<br />
RETURNS: int tpsize – the size of the tuple received if successful or an error<br />
code otherwise<br />
cnf_write(id, buf, bytes)<br />
PURPOSE: Send a 'record' to file (or mailbox or decnet channel) from buffer<br />
(starting at address buf). bytes is the number of bytes to send; id<br />
is the index into the cnf_map global data structure where the<br />
actual channel number or file pointer is stored.<br />
PARAMETERS: int id – index into cnf_map for channel #/ptr<br />
char buf[] – address of message to send<br />
int bytes – number of bytes to send/write<br />
RETURNS: 1 if successful or an error code otherwise<br />
cnf_xdr_fgets(id, buf, bufsize, e_type)<br />
PURPOSE: Read the external data representation of a line from file into buffer<br />
(starting at address xdr_buff) and translate it to the native C<br />
representation<br />
PARAMETERS: int id – the index into the cnf_map global data structure where<br />
the actual channel number or file pointer is stored<br />
char *buf -<br />
int bufsize – the number of bytes to read<br />
int e_type -<br />
RETURNS: 0 on EOF, number of bytes read on success, or an error code<br />
on error<br />
cnf_xdr_fputs(id, buf, bufsize, e_type)<br />
PURPOSE: Translates a line to its external data representation and sends it to<br />
file from buffer (starting at address xdr_buff)<br />
PARAMETERS: int id – the index into the cnf_map global data structure where<br />
the actual channel number or file pointer is stored<br />
char *buf -<br />
int bufsize – the number of bytes to send<br />
int e_type -<br />
RETURNS: int status – number of bytes written, 0 on a write error, or an<br />
error code otherwise<br />
cnf_xdr_fread(id, buf, bufsize, nitems, e_type)<br />
PURPOSE: Read the external data representation of a 'record' from file into<br />
buffer (starting at address xdr_buff) and translate it to the native<br />
C representation<br />
PARAMETERS: int id – the index into the cnf_map global data structure where<br />
the actual channel number or file pointer is stored<br />
char *buf -<br />
int bufsize – the number of bytes to read<br />
int nitems -<br />
int e_type -<br />
RETURNS: int status – number of bytes read, 0 on a read error, or an error<br />
code otherwise<br />
cnf_xdr_fwrite(id, buf, bufsize, nitems, e_type)<br />
PURPOSE: Translates a 'record' to its external data representation and sends<br />
it to file from buffer (starting at address xdr_buff)<br />
PARAMETERS: int id – the index into the cnf_map global data structure where<br />
the actual channel number or file pointer is stored<br />
char *buf -<br />
int bufsize – the number of bytes to send<br />
int nitems -<br />
int e_type -<br />
RETURNS: Number of bytes written, or an error code or -1 on error<br />
cnf_xdr_read(id, buf, bufsize, e_type)<br />
PURPOSE: Read the external data representation of a 'record' from file or<br />
pipe into buffer (starting at address xdr_buff) and translate it to<br />
the native C representation<br />
PARAMETERS: int id – the index into the cnf_map global data structure where<br />
the actual channel number or file pointer is stored<br />
char *buf -<br />
int bufsize – the number of bytes to read<br />
int e_type -<br />
RETURNS: int status – number of bytes read, 0 on a read error, or an error<br />
code otherwise<br />
cnf_xdr_tsget(tsh, tp_name, tuple, tp_len, e_type)<br />
PURPOSE: Destructively reads the external data representation of a tuple<br />
from a named tuple space and translates it to the native C<br />
representation<br />
PARAMETERS: int tsh<br />
char *tp_name<br />
char *tuple<br />
int tp_len<br />
int e_type<br />
RETURNS: int status – the size of the tuple received if successful, 0 if it is<br />
an asynchronous read, or –1 on error<br />
cnf_xdr_tsput(tsh, tp_name, tuple, tp_len, e_type)<br />
PURPOSE: Translates a tuple to its external data representation and inserts<br />
it into a named tuple space<br />
PARAMETERS: int tsh<br />
char *tp_name<br />
char *tuple<br />
int tp_len<br />
int e_type<br />
RETURNS: int status – ? on success or an error code otherwise<br />
cnf_xdr_tsread(tsh, tp_name, tuple, tp_len, e_type)<br />
PURPOSE: Reads the external data representation of a tuple from a named<br />
tuple space and translates it to the native C representation<br />
PARAMETERS: int tsh<br />
char *tp_name<br />
char *tuple<br />
int tp_len<br />
int e_type<br />
RETURNS: int status – number of bytes read, 0 on a read error, or an error<br />
code or –1 on error<br />
cnf_xdr_write(id, buf, bufsize, e_type)<br />
PURPOSE: Translates a 'record' to its external data representation and sends it<br />
to file (or mailbox or decnet channel) from buffer (starting at address xdr_buff).<br />
PARAMETERS: int id – The index into cnf_map global data structure where the<br />
actual channel number or file pointer is stored<br />
char *buf -<br />
int bufsize – the number of bytes to send<br />
int e_type -<br />
RETURNS: 1 if successful or an error code or –1 on error<br />
Error Codes<br />
TSH_ER_NOERROR – Normal operation, no error at all<br />
TSH_ER_INSTALL – Error: Tuple space daemon could not be started<br />
TSH_ER_NOTUPLE – Error: Could not find such tuple<br />
TSH_ER_NOMEM – Error: Tuple space daemon out of memory<br />
TSH_ER_OVERRT – Warning: Tuple was overwritten<br />