Back-End


next up previous
Next: Validation with SIMPLE Up: The Functional Algorithm Previous: Front-End

Back-End

  


Figure 1: Back-end of the FAST.

The back-end of FAST (see Figure 1) receives the output from the front-end and maps the task-flow graph onto a message-passing multiprocessor architecture. The mapping process is accomplished in a number of successive steps: first, FAST maps the task-flow graph onto an idealized architecture with a number of processors equal to the number of tasks, forming a fully-connected network (``abundant'' clique). The resulting graph is called parallel-execution graph. This mapping transforms the weights assigned to nodes and edges of the task-flow graph into processing times and message latencies respectively, according to a chosen set of hardware parameters. Furthermore, the user can select a communication paradigm for the message-passing interface, out of a couple of available choices such as synchronous or asynchronous blocking Send's and Receive's.

The parallel execution graph is subsequently passed through clustering, a stage seeking to minimize the communication overhead of the parallel execution, without sacrificing parallelism [8]. Clustering has been proven to be NP-Complete. A number of different heuristics have been implemented in FAST to allow experimentation with different choices [4].

After clustering, FAST performs a mapping of the clustered parallel-execution graph onto a message-passing architecture. The number of processors is provided by the user of the system. This mapping problem is also NP-complete [8]. To map the task-clusters onto processors we have implemented and incorporated in our system a number of heuristics similar to those included in the clustering stage [8]. More information about the clustering and mapping stages are presented in [3].

The clustering and mapping stages can be bypassed if the user of FAST chooses to have the front-end partition the problem-instance at hand into a number of blocks equal to the number of available processors. In that case, task-nodes representing the operations performed on each block are clustered together into single threads of execution and mapped directly to the processors.

Interconnection Network Topologies
A few different interconnection schemes are available in the current version of FAST: a ``limited'' clique (a clique with a limited number of processors), a ring, various multirings, and a binary hypercube. The limited clique represents the fastest possible parallel implementation of the applications studied, for the number of available processors and the clustering and mapping heuristics adopted. Although it is not a realistic interconnection topology, it is useful for benchmarking the efficiency of architectures with the same number of processors and sparser topologies.



next up previous
Next: Validation with SIMPLE Up: The Functional Algorithm Previous: Front-End