GCL/MPI (Beta version)
Comments to: Gene Cooperman (gene@ccs.neu.edu)

STAR/MPI is a system for binding MPI to a generic (STAR) interactive language. The paper describes the general technique. The file gclmpi.tar.Z is a specific implementation for GCL Common LISP.

GCL/MPI is intended as an easy-to-use master-slave distributed architecture. It combines the feedback of an interactive language (the GCL or AKCL dialect of LISP) with the use of MPI to take advantage of networks of workstations. As such, it is hoped that it will make available an SPMD architecture that helps people overcome the initial learning barrier in writing parallel programs. Ease of use is emphasized, while hoping to maintain reasonable efficiency and a reasonable feature set.

This distribution, along with a paper describing it, is available by anonymous ftp in the directory /pub/people/gene/starmpi at ftp.ccs.neu.edu. If you use this software, please send e-mail to gene@ccs.neu.edu to notify me.

This is admittedly a very simplistic manual for now. As the system develops, this manual will expand further.

The main idea to understand is that GCL/MPI is built in three layers: the master-slave layer, the slave-server layer, and the MPI layer. Most people will prefer to use the master-slave layer almost exclusively, taking advantage of the lower layers only as needed.

By default, commands are executed on the master only. The implicit PRINT in the read-eval-print loop operates only on the master. However, explicit print commands executed by user programs on the master AND the slaves will display on the user console. The routines par-eval and par-funcall cause execution on all processors (master and slaves). A user can restrict execution to a particular processor by using the functions master-p and mpi-comm-rank. However, as a matter of style, it is recommended to use these only as a last resort, since programming is conceptually easier when the same data structures are present on all processors.
The current implementation is based on MPI and GCL or AKCL, but it should be easily portable to other message-passing libraries (such as PVM) and to other dialects of Common LISP with a foreign function interface capable of loading object (.o) files and library archive (.a) files. It has been tested primarily under SPARC SunOS 4.x. If you are interested in a different architecture, please tell me (gene@ccs.neu.edu).

Note that messages are converted to strings (print representations) before being sent. This implies significant overhead for very large messages, and the inability to distinguish (vector t) from (vector fixnum), for example. A future version will also recognize (vector fixnum) and (vector float), and encode them more efficiently. Also, a timer will kill any process that has not received a message in 60 minutes.

This version is still experimental, and commands can change. More examples are planned for the future to make the learning curve less steep. Comments will be gratefully accepted.


MASTER SLAVE LAYER

(master-slave :get-task FNC :do-task FNC :get-task-result FNC
              :update-environment FNC :trace T-OR-NIL)
    See the related article for full details. The user is responsible for writing the FNC's:
    :get-task FNC - (get-task) returns TASK, an arbitrary user data structure.
    :do-task FNC - (do-task TASK) returns RESULT, an arbitrary user data structure.
    :get-task-result FNC - (get-task-result RESULT) must return T, NIL, ?, or (CONTINUATION . *):
        T means call (update-environment TASK RESULT)
        NIL means do nothing more with this task
        ? means re-send TASK for additional computation by do-task
        (CONTINUATION . *) is like ?, but * is a user parameter
    :update-environment FNC - (update-environment TASK RESULT) is executed on the master and all slaves, and is used only for its side effects.
    :trace T-OR-NIL - turns tracing on (T) or off (NIL).
    EXAMPLE: see the file myfactor.lsp

(up-to-date-p) - utility to test whether update-environment was called between the time when the most recent task was generated, using get-task, and the time when the corresponding result was obtained, using get-task-result.
    EXAMPLE:
      (defun get-task-result-fnc (result)
        (declare (ignore result))
        (if (up-to-date-p)
            t     ; data is consistent; update all processors
            '?))  ; data structures changed; re-do the computation

(par-eval 'EXP) - EXP is evaluated on all processors.
    EXAMPLE: (par-eval '(defun foo (x) (1+ x)))

(par-funcall FNC ARG1 ...) - like (funcall FNC ARG1 ...), but executed on all processors, with the values of the ARGs taken from the lexical environment on the master.
    EXAMPLE: (par-funcall #'load "file.lsp")

(par-reset) - resets the slave processors if they are wedged (deadlocked).
    EXAMPLE: (par-reset)   ; NOTE: (if (master-p) ...) is not needed.

(master-p) - returns T when executed on the master, and NIL when executed on a slave.
    EXAMPLE: (if (master-p) (set-up-large-master-data-structure))


SLAVE SERVER LAYER

The master distributes commands to the slaves. These commands should be executed on the master ONLY. The slave-server layer sets each slave's initial directory to the master's directory. The master can arbitrarily intermix sending commands and getting results.

(send-command LISP-EXPR OPTIONAL-MPI-ID OPT-TAG) - sends an arbitrary expression for evaluation to MPI-ID with TAG; the defaults are ID = 1, TAG = 0.
    EXAMPLE: (send-command '(+ 3 4))
    NOTE: if LISP-EXPR is a string, it is assumed to be the print representation of a LISP object, not a raw LISP string.

(get-result OPTIONAL-MPI-ID OPT-TAG) - returns the result from the next message from MPI-ID that was sent with TAG; the default is the next available slave and any tag.
    EXAMPLE: (get-result)   ; returns 7 for the above case

(broadcast-command LISP-EXPR) - sends an arbitrary expression to all slaves. No result is returned.

(mpi-comm-rank) - returns the MPI process ID; 0 = master.
    EXAMPLE: (if (= (mpi-comm-rank) 2) (print x))


MPI LAYER

Not all commands are implemented. The MPI manual is also a good source for what the commands do. The calling sequences can be found via (help 'mpi-command) or (describe 'mpi-command).

(mpi-iprobe) - non-blocking probe for remaining messages. Returns T or NIL.
    EXAMPLE: (if (null (mpi-iprobe)) (print "READY"))

(mpi-probe) - blocking probe for remaining messages. Always returns T, eventually.
    EXAMPLE: (progn (mpi-probe) (print "READY"))

Many other MPI commands are available, but they are not recommended except for special cases. One could load the MPI layer alone for special requirements.


DEBUGGING

When debugging, it is recommended to run the master and all slaves on a single processor. (The procgroup file can easily be set up this way.) The :trace keyword of the function master-slave is also useful.
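Before involving MPI at all, it can also help to exercise the four user functions in an ordinary Lisp image. The following is a minimal sketch of a hypothetical single-process stand-in for master-slave; the names sequential-master-slave, wire, symbol-named-p and continuation-p are illustrative only and are not part of GCL/MPI. It handles one task at a time and round-trips every task and result through prin1-to-string and read-from-string, to mimic the conversion of messages to print representations: a (vector fixnum) comes back as a general vector, and an object that prints as #<...> cannot be read back at all, so such problems show up early. It ASSUMES that get-task returns NIL when no tasks remain, since this manual does not say how the end of the tasks is signalled. Because only one task is ever outstanding, up-to-date-p is not modeled, and the user parameter of a (CONTINUATION . *) reply is ignored.

```lisp
;;; Hypothetical single-process stand-in for MASTER-SLAVE (not part of
;;; GCL/MPI), for debugging the user functions without MPI.
;;; ASSUMPTION: GET-TASK returns NIL when no tasks remain.

(defun wire (object)
  "Round-trip OBJECT through its print representation, mimicking how
GCL/MPI converts messages to strings.  A (vector fixnum) comes back as a
general vector; an object printed as #<...> signals a reader error."
  (read-from-string (prin1-to-string object)))

(defun symbol-named-p (object name)
  "True when OBJECT is a symbol whose name is NAME, in any package."
  (and (symbolp object) (string= (symbol-name object) name)))

(defun continuation-p (action)
  "True for a (CONTINUATION . user-parameter) reply."
  (and (consp action) (symbol-named-p (car action) "CONTINUATION")))

(defun sequential-master-slave (&key get-task do-task get-task-result
                                     update-environment trace)
  "Run the master-slave protocol in this Lisp image, one task at a time."
  (loop for task = (funcall get-task)
        while task                      ; ASSUMPTION: NIL means no more tasks
        do (loop                        ; repeat until this task is settled
             (let* ((result (wire (funcall do-task (wire task))))
                    (action (funcall get-task-result result)))
               (when trace
                 (format t "~&TASK ~S => RESULT ~S => ~S~%" task result action))
               (cond ((eq action t)
                      ;; Under GCL/MPI this runs on the master and all slaves.
                      (funcall update-environment task result)
                      (return))
                     ((null action)     ; nothing more to do with this task
                      (return))
                     ((or (symbol-named-p action "?") (continuation-p action))
                      ;; Re-do the same task.  The user parameter of a
                      ;; (CONTINUATION . *) reply is ignored in this sketch.
                      nil)
                     (t
                      (error "Unexpected GET-TASK-RESULT value: ~S" action)))))))

;; Usage: sum the squares of 1..5.
(let ((next 0) (total 0))
  (sequential-master-slave
   :get-task (lambda () (when (< next 5) (incf next)))
   :do-task (lambda (n) (* n n))
   :get-task-result (lambda (result) (declare (ignore result)) t)
   :update-environment (lambda (task result)
                         (declare (ignore task))
                         (incf total result)))
  total)                                ; => 55
```

Once the user functions behave correctly under a driver like this, the same functions can be passed to master-slave, ideally first with the master and all slaves on a single processor as recommended above.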