Messages are sent over nanomsg. The message format is JSON.

All messages have a string key 'op', a string key 'role' and a unique
name field, currently 'host'.

'op' can be:
  - 'discover'
  - 'register'

Further operations ('start', 'stop', 'stopped', 'output') are described
in the sections below.

'role' can only be:
  - 'master'
  - 'coordinator'
  - 'worker'

'host' is the FQDN or hostname of a coordinator (depends on the network
setup).

Operations:

Discovery:
----------

Master sends:

    {
      "op": "discover",
      "role": "master"
    }

All coordinators answer with:

    {
      "op": "register",
      "role": "coordinator",
      "host": "server1",
      "cpus": 2,
      "os": "cpe:\/o:arch:arch:rolling",
      "arch": "x86_64"
    }

The coordinator sends its own configuration to the master. On receiving
a 'register' operation the master must handle it accordingly, usually by
marking the coordinator as known and alive and making its platform and
architecture available to run workers on. Currently scheduled jobs must
also be re-examined.

The coordinators also send all known workers to the master as a list:

    {
      "workers": [
        {
          "name": "worker1",
          "mode": "direct",
          "command": "build.sh"
        },
        {
          "name": "worker2",
          "mode": "direct",
          "command": "build.sh"
        }
      ]
    }

Coordinator operations:
-----------------------

The master can start and stop workers:

    {
      "op": "start",
      "role": "master",
      "worker": "worker1"
    }

    {
      "op": "stop",
      "role": "master",
      "worker": "worker1"
    }

The master sends this as a survey call to all coordinators. The
coordinator that is responsible for this worker will start/stop/kill the
worker with the given name. It then sends an ack message back to the
master:

    {
      "op": "stopped",
      "role": "coordinator",
      "host": "eeepc",
      "worker": "worker1",
      "found": true
    }

Worker messages:
----------------

Workers send their output and states to the master via a data channel
(PIPELINE):

    {
      "op": "output",
      "role": "worker",
      "worker": "worker1",
      "msg": "Msg 30\n",
      "stdout": false
    }
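
Example (sketch):

The message shapes above could be built and checked along the following
lines. This is only a minimal sketch using the Python standard library;
it does not touch nanomsg itself. The names encode, decode, Master,
KNOWN_OPS and VALID_ROLES are hypothetical and not part of the protocol,
and the set of known ops is simply taken from the examples in this
document.

```python
import json

# Roles allowed by the protocol description.
VALID_ROLES = {"master", "coordinator", "worker"}
# Assumption: the ops that appear in the examples of this document.
KNOWN_OPS = {"discover", "register", "start", "stop", "stopped", "output"}


def encode(op, role, **fields):
    """Serialize a message carrying the mandatory 'op' and 'role' keys."""
    if op not in KNOWN_OPS:
        raise ValueError(f"unknown op: {op!r}")
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return json.dumps({"op": op, "role": role, **fields}).encode("utf-8")


def decode(raw):
    """Parse raw bytes and check the keys every message must have."""
    msg = json.loads(raw.decode("utf-8"))
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    for key in ("op", "role"):
        if not isinstance(msg.get(key), str):
            raise ValueError(f"missing or non-string key: {key!r}")
    if msg["role"] not in VALID_ROLES:
        raise ValueError(f"unknown role: {msg['role']!r}")
    return msg


class Master:
    """Hypothetical master-side bookkeeping for 'register' messages."""

    def __init__(self):
        self.coordinators = {}  # host -> configuration of the coordinator

    def handle(self, raw):
        msg = decode(raw)
        if msg["op"] == "register" and msg["role"] == "coordinator":
            # Mark the coordinator as known and alive, keep its platform.
            self.coordinators[msg["host"]] = {
                "cpus": msg.get("cpus"),
                "os": msg.get("os"),
                "arch": msg.get("arch"),
                "alive": True,
            }
        return msg
```

A master could then feed every received frame into Master.handle(), for
example master.handle(encode("register", "coordinator", host="server1",
cpus=2, arch="x86_64")), and look up master.coordinators["server1"]
afterwards. Re-examining scheduled jobs is left out of the sketch.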