TRACE FILES FOR VisAndOr This a very informal description of the format of the traces used by VisAndOr. We are working on writing a more formal and complete specification document which will include the definition of new significant information to be recorded in the trace files. However, the present description is exhaustive about the description of the events (and its formats) which the current version of VisAndOr is capable of dealing with. The files used by VisAndOr as input are assumed to contain a sequence of significant events happened during a (parallel) execution of a program. - Each event appears in a different line in the trace file. - The general format of a line in the trace file consists of seven numerical integer fields, separated by spaces: Timestamp event_code Nodeid Branches (WAM id) Agent/Workerid u d x d x d The characters 'u', 'd' and 'x' indicate how VisAndOr reads each number: 'u' means 'unsigned integer' (the time is expected not to be negative), 'd' means 'signed integer' and 'x' means 'hexa'. The reasons for using hexadecimal numbers in some fields are historical. The absolute values of the fields (except timpestamp and event_type, of course) are completely decoupled from the representation offered by VisAndOr. Just for fun, here is the C line that generates the events in &-Prolog: fprintf(event_file, "%10u %d %X %d %X %d \n", eventptr->timestamp, eventptr->evtype, eventptr->ppf, eventptr->nfork, eventptr->wam, min_ts_agent); The only exception to the rule above is the first line of the file which contains a single digit which tells visandor the type of file it is reading (see below). All events start with a timestamp and an event code. The event code determines the type of event traced. The timestamp represents the time when the event happened. Two important caveats: first, events must be ordered chronologically (increasing timestamps). In case of a truly parallel trace this may imply sorting the events before writing them out. Second, no two events can have the same timestamp. This generally doesn't happen if one uses a clock with microsecond resolution (as in the sequent machines). If you do not have this clock resolution but are sure of the order of the events, use the simple trick of incrementing by one the time-stamp of an event with the same time-stamp as the one before. Measuring time in microseconds is recommended in a real system, but any time measure is fine. Nodeid: this is a unique identifier for each node (fork point) in the tree or graph being represented (more below). All events finish with a WAM id. and an agent id. The agent id. is a unique number (starting from 0 onwards for each agent) given to agent ("worker") that traced the event. The WAM id. is a unique number of the WAM (set of stacks) in which that agent is running. In some execution models the separation of agents and WAMs in meaningless, for they assume a one-to-one processor-WAM mapping. This is the case for example in MUSE and Aurora, since there is no difference between a set of stacks and an agent. But both can be different in some versions of &-Prolog, for which the tool was originally developed. If you do not have to distinguish between agent and worker, use the same id. in both fields, BUT be careful, because they are read in different ways: a 10 in both fields is not correct, because 10 hexa is 16 decimal. The fields should be A and 10, to represent the same number. The middle fields are, while keeping the same format, interpreted differently for each event. - The events and their code and format are as follows (first the event code of each event is given by showing its C definition, and then the meaning of the rest of the fields is commented): ------------------------------------------------------------------------------ General events ----------------------------- ------------------------------------------------------------------------------ #define START_TIME 5 /* Timestamp, 5, Nodeid, 0, WAMid, Agentid */ #define STOP_TIME 6 /* Timestamp, 6, Nodeid, 0, WAMid, Agentid */ ------------------------------------------------------------------------------ These two events are simply to delimit the start and stop times of the trace. START_TIME must be the first event of a trace file (just after the file identifier). In correspondence, the last event (and also the last line!!!) of a trace file must be STOP_TIME. ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ And-parallelism related events ------------------------------ ------------------------------------------------------------------------------ #define FORK 1 /* Timestamp, 1, Nodeid, #tasks, WAMid, Agentid */ ------------------------------------------------------------------------------ COMMENT: - This event indicates when an and-fork is created, i.e. a set of parallel 'and' tasks corresponding to the parallel execution of the goals in a clause body - #tasks: number of parallel tasks for the fork - Remember that "Nodeid" is a unique identifier assigned (arbitrarily) to this fork when a FORK event is produced. The "father" of this event will found looking for a START_GOAL with the same Agentid. ------------------------------------------------------------------------------- #define START_GOAL 2 /* Timestamp, 2, Nodeid, task#, WAMid, Agentid */ ------------------------------------------------------------------------------- COMMENT: - This event indicates the time when a task really starts to be executed - Nodeid: must be the Nodeid of the corresponding FORK (its "father") - task#: the identifier (the number) of the task. The tasks are numbered starting by 0 (1, 2,... , #tasks-1) ------------------------------------------------------------------------------- #define FINISH_GOAL 3 /* Timestamp, 3, Nodeid, task#, WAMid, Agentid */ ------------------------------------------------------------------------------- COMMENT: - This event indicates the instant when a task has finished. Note that a START_GOAL [or JOIN] with the same Nodeid and task# must be previously found. ------------------------------------------------------------------------------- #define JOIN 4 /* Timestamp, 4, Nodeid, task#, WAMid, Agentid */ ------------------------------------------------------------------------------- COMMENT: - This event indicates the instant when a task execution continues after a fork which has previously executed has finished (the execution of all the parallel tasks has terminated and the "father" execution is resumed). Note that a FORK with the same Nodeid and task# has before to be necessarily found. ------------------------------------------------------------------------------- The following events currently have not a graphical representation #define AGENT_BUSY 7 /* agent found work and starts working */ #define AGENT_IDLE 8 /* agent stops working and starts looking for work */ #define CREATE_WAM 9 #define CREATE_AGENT 10 ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Or-parallelism related events ------------------------------- ------------------------------------------------------------------------------ #define MAKE_PUBLIC 20 /* Timestamp, 20, Nodeid, #branches, WAMid, Agentid */ ------------------------------------------------------------------------------- COMMENT: - This event indicates when an or-fork is created, i.e. a set of parallel 'or' tasks corresponding to a choice-point are made available for execution. - #branches: number of alternative branches for the fork - Remember that "Nodeid" is a unique identifier assigned (arbitrarily) to this fork when a MAKE_PUBLIC event is produced. The "father" of this event will found looking for a START_BRANCH [or RESUME_BRANCH] with the same Agentid. ------------------------------------------------------------------------------- #define START_BRANCH 21 /* Timestamp, 21, Nodeid, branch #, WAMid, Agentid*/ ------------------------------------------------------------------------------- COMMENT: - This event indicates the instant when the branch actually starts to be executed. - Nodeid: must be the Nodeid of the corresponding MAKE_PUBLIC (its "father") - branch#: the identifier (the number) of the branch. The branches are numbered starting by 0 (1, 2,... , #branches-1) ------------------------------------------------------------------------------- #define SUCC_BRANCH 22 /* Timestamp, 22, Nodeid, branch #, WAMid, Agentid*/ ------------------------------------------------------------------------------- COMMENT: - Precise time at which a branch has finished successfully. Note that a START_BRANCH (or RESUME_BRANCH) event with the same Nodeid, branch# and Agentid must appear previously in the trace file. ------------------------------------------------------------------------------- #define FAIL_BRANCH 23 /* Timestamp, 23, Nodeid, branch #, WAMid, Agentid*/ ------------------------------------------------------------------------------- COMMENT: - The instant when a branch has finished with failure. Same consideration that with the SUCCESS event. NOTE Some schedulers are not capable of distinguishing between this two events. The only difference from the point of view of the visualization tool is a different icon for representing each one of them. EXAMPLE ... 1851720561 20 4 2 2 2 > make public with two branches: node identifier '4' 1851723086 21 4 0 2 2 > start of the branch number 0 of the node '4' on worker 2 1851726142 21 4 1 3 3 > start of the branch number 1 of the node '4' on worker 3 1851730390 22 4 0 2 2 > success of the branch 0 of the node '4' 1851763787 20 5 2 3 3 > make public, with two branches, whose "father" is the branch number 1 of the node '4' on worker 3, and so on ... ------------------------------------------------------------------------------- #define SUSPEND_BRANCH 24/* Timestamp, 24, Nodeid, branch #, WAMid, Agentid*/ ------------------------------------------------------------------------------- COMMENT: - Branch execution has been suspended (the processor becomes free for working in another branch). The corresponding RESUME_BRANCH event must be found at a later time. ------------------------------------------------------------------------------- #define RESUME_BRANCH 25/* Timestamp, 25, Nodeid, branch #, WAMid, Agentid*/ ------------------------------------------------------------------------------- COMMENT: - A previously suspended branch is resumed (normally on a processor different from the used one before suspension). ------------------------------------------------------------------------------ #define CUTTING_BRANCH 26 /* Cutting branch, gives branch id */ /* Timestamp, 26, Nodeid, branch #, WAMid, Agentid*/ ------------------------------------------------------------------------------ COMMENT: The visualization of this event has not been implemented ------------------------------------------------------------------------------ #define LEAF-CUT 27/* Timestamp, 27, Nodeid, branch #, WAMid, Agentid*/ ------------------------------------------------------------------------------ COMMENT: A tentative event that currently has not sense. ------------------------------------------------------------------------------- Additional events for Andorra-I ------------------------------- ------------------------------------------------------------------------------ #define START_BUSY 33 /* Timestamp, 33, Nodeid, branch #, WAMid, Agentid*/ ------------------------------------------------------------------------------ COMMENT: - A new SLAVE join to work in this branch. - The codification of this event is somewhat different to the previous ones. The fields Nodeid and branch# are exactly equal for all the slaves working at a branch. The only difference will be the WAMid/Agentid. ------------------------------------------------------------------------------ #define STOP_BUSY 34 /* Timestamp, 34, Nodeid, branch #, WAMid, Agentid*/ ------------------------------------------------------------------------------ COMMENT: - A SLAVE has finished working in a branch. NOTE: At the end of a branch, the number of START_BUSYs (if any) is equal to the number of STOP_BUSYs. EXAMPLE: ... 1851464983 20 0 1 1 1 1851466980 21 0 0 1 1 1851468986 33 0 0 2 2 > first slave (processor 2) joined to the branch execution 1851470934 33 0 0 3 3 > second slave 1851472687 33 0 0 4 4 ... 1851579464 34 0 0 3 3 > the slave processor 3 has finished its work 1851582489 34 0 0 4 4 ... 1851585113 34 0 0 2 2 1851592863 33 0 0 3 3 > the processor 3 becomes newly slave 1851594589 33 0 0 2 2 ... 1851596377 33 0 0 4 4 1851598307 34 0 0 2 2 1851599906 34 0 0 4 4 1851601909 34 0 0 3 3 1851607254 20 1 3 1 1 > the branch continues perform a make_public (note that all slaves have finished their work yet) ... ------------------------------------------------------------------------------ IMPORTANT--IMPORTANT--IMPORTANT THE 'TYPE OF PARALLELISM' FLAG: For convenience, but not for elegance, the FIRST line of the trace file must contain a number indicating the type of parallelism. AND-parallelism: FLAG = 0 OR-parallelism: FLAG = 1 >> Andorra-I is supposed to be OR-parallelism. EXAMPLE: 1 > OR parallelism 1236420313 5 0 0 1 1 Fragment of an Andorra-I trace 1236422609 20 0 1 1 1 1236424094 21 0 0 1 1 1236425973 33 0 0 2 2 1236427516 33 0 0 3 3 ...