Advanced Tutorial on Program Development and Optimization using the Ciao Preprocessor

Author(s): The Ciao Development Team.

ciaopp is a standalone preprocessor to the standard clause-level compiler. It performs source-to-source transformations. The input to ciaopp are logic programs (optionally with assertions and syntactic extensions). The output are error/warning messages plus the transformed logic program, with results of analysis (as assertions), results of static checking of assertions, assertion run-time checking code, and optimizations (specialization, parallelization, etc.).

This tutorial is organized as follows:

  • Getting Started gives the “getting started” basics.
  • Static Analysis and Program Assertions shows how ciaopp performs program analysis.
  • Program Debugging and Assertion Validation does the same for program debugging and validation.
  • Source Program Optimization presents ciaopp at work for program transformation and optimization.

Getting Started

A ciaopp session consists in the preprocessing of a file. The session is governed by a menu, where you can choose the kind of preprocessing you want to be done to your file among several analyses and program transformations available. This tutorial shows how to customize the preprocessing but some predefined preprocessing options are described in CiaoPP Quick Tutorial.

Clicking on the icon in the buffer containing the file to be preprocessed displays the menu, which will look (depending on the options available in the current ciaopp version) something like the “Preprocessor Option Browser” shown in:

Except for the first line, which refers to selecting levels of customization. You can select analysis (analyze), assertion checking (check_assertions), certificate checking (check_certificate), or program optimization (optimize), and you can later combine the four kinds of preprocessing. The relevant options for the action group selected are then shown, together with the relevant flags.

A description of the values for each option will be given as it is used in the corresponding section of this tutorial.

Static Analysis and Program Assertions

The fundamental functionality behind CiaoPP is static global program analysis, based on abstract interpretation. For this task CiaoPP uses the PLAI abstract interpreter [MH92,BGH99].

The system infers information on variable-level properties such as moded types, definiteness, freeness, independence, and grounding dependencies: essentially, precise data structure shape and pointer sharing. It can also infer bounds on data structure sizes, as well as procedure-level properties such as determinacy, termination, non-failure, and bounds on resource consumption (time or space cost). CiaoPP implements several techniques for dealing with “difficult” language features (such as side-effects, meta-programming, higher-order, etc.) and as a result can for example deal safely with arbitrary ISO-Prolog programs [BCHP96]. A unified language of assertions [BCHP96,PBH00b] is used to express the results of analysis, to provide input to the analyzer of program specifications for debugging and validation, as well as the results of the comparisons performed against the specifications.

Static Analysis Basics

Consider the program defining a module which exports the qsort predicate:

:- module(qsort, [qsort/2], [assertions]).
:- entry qsort(A,B) : (list(A, num), var(B)).

qsort([X|L],R) :-
        qsort(L2,R2), qsort(L1,R1),

        E < C, !, partition(R,C,Left1,Right).
        E >= C, partition(R,C,Left,Right1).

append([H|X],Y,[H|Z]):- append(X,Y,Z).

The sharing and freenes analysis abstract domain computes freeness, independence, and grounding dependencies between program variables.

It is performed by selecting the menu option Aliasing-Mode:

The output of the analysis is performed via assertions (see Using assertions for preprocessing programs for a detailed description). In this case three assertions appear:

:- true pred qsort(A,B)
         : mshare([[A],[A,B],[B]])
        => mshare([[A,B]]).
:- true pred partition(A,B,C,D)
         : ( var(C), var(D), mshare([[A],[A,B],[B],[C],[D]]) )
        => ( ground(A), ground(C), ground(D), mshare([[B]]) ).
:- true pred append(A,B,C)
         : ( ground(A), mshare([[B],[B,C],[C]]) )
        => ( ground(A), mshare([[B,C]]) ).

These assertions express, for example, that the third and fourth arguments of partition have “output mode”: when partition is called (:) C and D are free unaliased variables and they are ground on success (=>). Also, append is used in a mode in which the first argument is input (i.e., ground on call). Also, upon success the arguments of qsort will share all variables (if any).

Type Analysis

CiaoPP can infer (parametric) types for programs both at the predicate level and at the literal level [GdW94,GP02,VB02]. Different type domains can be selected with the menu option Shape-Type Analysis. As shown in the following screenshot:

The preprocessor will output, as before, the analysis with assertions:

At the predicate level the inferred information is:

:- true pred qsort(A,B)
         : ( list(A,num), term(B) )
        => ( list(A,num), list(B,num) ).
:- true pred partition(_A,_B,Left,Right)
         : ( list(_A,num), num(_B), term(Left), term(Right) )
        => ( list(_A,num), num(_B), list(Left,num), list(Right,num) ).
:- true pred append(_A,X,_B)
         : ( list(_A,num), rt5(X), term(_B) )
        => ( list(_A,num), rt5(X), rt11(_B) ).
where term is any term and prop list is defined in library(lists) as:
:- regtype list(L,T) #"L is a list of T's.".
list([X|L],T) :- T(X), list(L).

And there were two new inferred types:

:- regtype rt5/1.
rt5([A]) :-

:- regtype rt11/1.
rt11([A|B]) :-

Entry assertions [BCHP96] to specify a restricted class of calls to the module entry points as acceptable:

:- entry qsort(A,B) : (list(A, num), var(B)).
This informs the analyzer that in all external calls to qsort, the first argument will be a list of numbers and the second a free variable. Note the use of builtin properties (i.e., defined in modules which are loaded by default, such as var, num, list, etc.). Note also that properties natively understood by different analysis domains can be combined in the same assertion.

Non-failure and Determinacy Analysis:

CiaoPP includes a non-failure analysis, based on [DLGH97] and [BLGH04], which can detect procedures and goals that can be guaranteed not to fail, i.e., to produce at least one solution or not terminate. It also can detect predicates that are “covered”, i.e., such that for any input (included in the calling type of the predicate), there is at least one clause whose “test” (head unification and body builtins) succeeds. CiaoPP also includes a determinacy analysis based on [LGBH05], which can detect predicates which produce at most one solution, or predicates whose clause tests are mutually exclusive, even if they are not deterministic (because they call other predicates that can produce more than one solution). Programs can be analyzed with this kind of domains by selecting to perform Non-Failure Analysis with domain nf:

Analyzing qsort with the nf domain will produce (among others) the following assertion:

:- true pred qsort(A,B)
         : ( mshare([[B]]), var(B), ground([A]), list(A,num), term(B) )
        => ( ground([A,B]), list(A,num), list(B,num) )
         + ( not_fails, covered ).

The + field in pred assertions can contain a conjunction of global properties of the computation of the predicate. not_fails states that if the precondition is met, the predicate will never fail.

Size, Resources, and Termination Analysis

CiaoPP can also infer lower and upper bounds on the sizes of terms and the computational cost of predicates [DLGHL94,DLGHL97]. The cost bounds are expressed as functions on the sizes of the input arguments and yield the number of resolution steps. Various measures are used for the “size” of an input, such as list-length, term-size, term-depth, integer-value, etc. Note that obtaining a non-infinite upper bound on cost also implies proving termination of the predicate.

Our resource analysis is parametric on the resources, therefore a package defining the resource to be used has to be imported in the module, in this case we use the default package that infers information about computation al steps. This is done by replacing the first line by:

:- module(qsort, [qsort/2], [assertions,predefres(res_steps)]).

Also, to be able to infer lower bounds a non-failure and determinacy analysis has to be performed:

As an example, the following assertions are part of the output of the upper bounds analysis:

:- true pred qsort(A,B)
         : ( list(A,num), var(B) )
        => ( list(A,num), list(B,num), size(ub,A,length(A)), size(ub,B,exp(2,length(A))-1.0) )
         + cost(ub,steps,sum($(j),1,length(A),exp(2,length(A)- $(j))* $(j))+exp(2,length(A)-1)*length(A)+2.0*exp(2,length(A))-1.0).
:- true pred append(_A,X,_B)
         : ( list(_A,num), rt13(X), var(_B) )
        => ( list(_A,num), rt13(X), rt13(_B), size(ub,_A,length(_A)), size(ub,X,length(X)), size(ub,_B,length(X)+length(_A)) )
         + cost(ub,steps,length(_A)+1).

For example, the second assertion is inferring on success size(ub,_B,length(X)+length(_A)), which means that an (upper) bound on the size of the third argument of append/3 is the sum of the sizes of the first and second arguments. The inferred upper bound on computational steps (+ cost(ub,steps,length(_A)+1)) is the length of the first argument of append/3.

The following is the output of the lower-bounds analysis:

:- true pred qsort(A,B)
         : ( list(A,num), var(B) )
        => ( list(A,num), list(B,num), size(lb,A,length(A)), size(lb,B,1) )
         + cost(lb,steps,length(A)+3).
:- true pred append(_A,X,_B)
         : ( list(_A,num), rt13(X), var(_B) )
        => ( list(_A,num), rt13(X), rt13(_B), size(lb,_A,length(_A)), size(lb,X,length(X)), size(lb,_B,length(X)+length(_A)) )
         + cost(lb,steps,0).
The lower-bounds analysis uses information from the non-failure analysis, without which a trivial lower bound of 0 would be derived.

In this case it is inferred that on success the lower bound of the third argument of append is size(lb,_B,length(X)+length(_A)) (the same as the upper bound!), and the upper bound on computational steps + cost(lb,steps,0), which represents the case in which the first list to concatenate is empty.

Decidability, Approximations, and Safety

As a final note on the analyses, it should be pointed out that since most of the inferred properties are in general undecidable at compile-time, the inference technique used, abstract interpretation, is necessarily approximate, i.e., possibly imprecise. On the other hand, such approximations are also always guaranteed to be safe, in the sense that (modulo bugs, of course) they are never incorrect: the properties stated in inferred assertions do always hold of the program.

Program Debugging and Assertion Checking

ciaopp is also capable of static assertion checking, and debugging using the ideas outlined so far. To this end, it implements the framework described in [HPB99,PBH00a] which involves several of the tools which comprise ciaopp. The following figure depicts the overall architecture.

Hexagons represent the different tools involved and arrows indicate the communication paths among them.

Program verification and detection of errors is first performed at compile-time by inferring properties of the program via abstract interpretation-based static analysis and comparing this information against (partial) specifications written in terms of assertions (see [HPBLG05] for a detailed description of the sufficient conditions used for achieving this ciaopp functionality).

The static checking is provably safe in the sense that all errors flagged are definite violations of the specifications.

Assertions and Properties

As introduced before, assertions are a means of specifying properties which are (or should be) true of a given predicate, predicate argument, and/or program point. See Using assertions for preprocessing programs for a more detailed description, of the concepts that we briefly introduce now.

They are of the form :- [Status] Scope Head : Pre => Post + Comp., where Status is a qualifier of the meaning of the assertion, Scope describes where the assertion is applied, Head is a normalized atom, and Pre, Post, and Comp are conjunctions of properties.

So far we have seen assertions with Status true, which mean that they express the behavior inferred by the analyzer. This status can only appear in the output of the analyzer (i.e. the user should not use it).

  • check: (input and output status) default status (i.e., if no status is specified), expresses properties that the user wants ensured to hold at run-time.
  • trust: (input status) the assertion represents an actual behavior of the predicate that the analyzer may not be able to infer automatically.
  • checked: (output status) the assertion was proved to hold statically by the analyzer.
  • false: (output status) the assertion was proved not to hold statically by the analyzer.

In CiaoPP properties are predicates, which may be builtin or user defined. For example, the property var used in the above examples is the standard builtin predicate to check for a free variable. The same applies to ground and mshare. The properties used by an analysis in its output (such as var, ground, and mshare for the previous mode analysis) are said to be native for that particular analysis. The system requires that properties be marked as such with a prop declaration which must be visible to the module in which the property is used.

Properties declared and/or defined in a module can be exported as any other predicate. For example:

:- prop list/1.
list([_|L]) :- list(L).
or, using the functional syntax package, more compactly as:
:- prop list/1. list := [] | [_|list].

defines the property “list”. A list is an instance of a very useful class of user-defined properties called regular types [YS87,DZ92,GdW94,GP02,VB02], which herein are simply a syntactically restricted class of logic programs. We can mark this fact by stating “:- regtype list/1.” instead of “:- prop list/1.” (this can be done automatically). The definition above can be included in a user program or, alternatively, it can be imported from a system library, e.g.: :- use_module(library(lists),[list/1]).

Using analysis information for debugging

The idea of using analysis information for debugging comes naturally after observing analysis outputs for erroneous programs. Consider this buggy implementation of qsort:

:- module(_, [qsort/2], [assertions]).
:- entry qsort(A,B) : (list(A, num), var(B)).

qsort([X|L],R) :-
        qsort(L2,R2), qsort(L1,R1), 
        append(R2,[x|R1],R).    % <-- 'x' should be X (variable)

        E < C, !, partition(R,C,Left1,Right).
        E >= C,   partition(R,C,Left,Right1).

append([H|X],Y,[H|Z]):- append(X,Y,Z).

The result of regular type analysis for this program includes the following code:

:- true pred qsort(A,B)
         : ( list(A,num), term(B) )
        => ( list(A,num), list(B,^(x)) ).
where list(B,^x) means “B is a list of atoms x.”. The information inferred does not seem compatible with a correct definition of qsort, which clearly points to a bug in the program.

Static Checking of Assertions in System Libraries

In addition to manual inspection of the analyzer output, ciaopp includes a number of automated facilities to help in the debugging task. For example, ciaopp can find incompatibilities between the ways in which library predicates are called and their intended mode of use, expressed in the form of assertions in the libraries themselves. Also, the preprocessor can detect inconsistencies in the program and check the assertions present in other modules used by the program.

Consider a different implementation of qsort, also with bugs:

:- module(_, [qsort/2], [assertions]).
:- entry qsort(A,B) : (list(A, num), var(B)).

qsort([X|L],R) :-
        partition(L,L1,X,L2),         % <-- swapped second and third arguments
        qsort(L2,R2), qsort(L1,R1),

partition([e|R],C,[E|Left1],Right):-  % <-- 'e' should be E (variable)
        E < C, !, partition(R,C,Left1,Right).
        E >= C, partition(R,C,Left,Right1).

append([H|X],Y,[H|Z]):- append(X,Y,Z).

We run compile-time error checking and selecting type and mode analysis for our tentative qsort program, by selecting the action check_assertions.

By default, the option Perform Compile-Time Checks is set to auto, which means that the system will automatically detect the analyses to be performed in order to check the program, depending on the information available in the program assertions (in the example in The entry assertion informs how the predicate qsort/2 will be called using types and modes information only).

Using the default options, and setting Report Non-Verified Assrts to error, we obtain the following messages (and the system highlights the line which produces the first of them, as shown:

WARNING (preproc_errors): (lns 5-8) goal qsort2:partition(L,L1,X,L2) at literal 1 does not succeed!

WARNING (ctchecks_messages): (lns 11-12) the head of clause 'qsort2:partition/4/2' is incompatible with its call type
         Head:      qsort2:partition([e|R],C,[E|Left1],Right)
         Call Type: qsort2:partition(basic_props:list(num),term,num,term)

WARNING (preproc_errors): (lns 13-14) goal arithmetic:>=(E,C) at literal 1 does not succeed!

First and last messages warn that all calls to partition and >=/2 will fail, something normally not intended (e.g., in our case). The error message indicates a wrong call to a builtin predicate, which is an obvious error. This error has been detected by comparing the mode information obtained by global analysis, which at the corresponding program point indicates that the second argument to the call to >=/2 is a variable, with the assertion:

:- check calls A>=B : (ground(A), ground(B)).

which is present in the default builtins module, and which implies that the two arguments to >=/2 should be ground when this arithmetic predicate is called. The message signals a compile-time, or abstract, incorrectness symptom [BDD97], indicating that the program does not satisfy the specification given (that of the builtin predicates, in this case). Checking the indicated call to partition and inspecting its arguments we detect that in the definition of qsort, partition is called with the second and third arguments in reversed order -- the correct call is partition(L,X,L1,L2).

After correcting this bug, we proceed to perform another round of compile-time checking, which continues producing the following message:

WARNING (ctchecks_messages): (lns 11-12) the head of clause 'qsort2:partition/4/2' is incompatible with its call type
         Head:      qsort2:partition([e|R],C,[E|Left1],Right)
         Call Type: qsort2:partition(basic_props:list(num),term,num,term)
This time the error is in the second clause of partition. Checking this clause we see that in the first argument of the head there is an e which should be E instead. Compile-time checking of the program with this bug corrected does not produce any further warning or error messages.

Static Checking of User Assertions and Program Validation

TODO: This section is not finished

Though, as seen above, it is often possible to detect error without adding assertions to user programs, if the program is not correct, the more assertions are present in the program the more likely it is for errors to be automatically detected. Thus, for those parts of the program which are potentially buggy or for parts whose correctness is crucial, the programmer may decide to invest more time in writing assertions than for other parts of the program which are more stable. In order to be more confident about our program, we add to it the following check assertions:

Note: The check prefix is assumed when no prefix is given, as in the example shown.

:- calls qsort(A,B) : list(A, num).                        % A1
:- success qsort(A,B)  => (ground(B), sorted_num_list(B)). % A2
:- calls partition(A,B,C,D) : (ground(A), ground(B)).      % A3
:- success partition(A,B,C,D) => (list(C, num),ground(D)). % A4
:- calls append(A,B,C) : (list(A,num),list(B,num)).        % A5
:- comp partition/4 + not_fails.                           % A6
:- comp partition/4 + is_det.                              % A7
:- comp partition(A,B,C,D) + terminates.                   % A8

:- prop sorted_num_list/1.
sorted_num_list([X]):- number(X).
        number(X), number(Y), X=<Y, sorted_num_list([Y|Z]).

where we also use a new property, sorted_num_list, defined in the module itself. These assertions provide a partial specification of the program. They can be seen as integrity constraints: if their properties do not hold at the corresponding program points (procedure call, procedure exit, etc.), the program is incorrect. Calls assertions specify properties of all calls to a predicate, while success assertions specify properties of exit points for all calls to a predicate. Properties of successes can be restricted to apply only to calls satisfying certain properties upon entry by adding a “:” field to success assertions. Finally, Comp assertions specify global properties of the execution of a predicate. These include complex properties such as determinacy or termination and are in general not amenable to run-time checking. They can also be restricted to a subset of the calls using “:”. More details on the assertion language can be found in [PBH00b].

ciaopp can perform compile-time checking of the assertions above, by comparing them with the assertions inferred by analysis (see [HPBLG05,BDD97,PBH00c] for details), producing as output the following assertions:

:- checked calls qsort(A,B) : list(A,num).                        % A1
:- check success qsort(A,B)  => sorted_num_list(B).               % A2
:- checked calls partition(A,B,C,D) : (ground(A),ground(B)).      % A3
:- checked success partition(A,B,C,D) => (list(C,num),ground(D) ).% A4 
:- false calls append(A,B,C) : ( list(A,num), list(B,num) ).      % A5
:- checked comp partition/4 + not_fails.                          % A6
:- checked comp partition/4 + is_det.                             % A7
:- checked comp partition/4 + terminates.                         % A8
In order to produce this output, the ciaopp check_assertions menu must be set to the same options as those used in Figure fig:debugging-ctchecking1a for checking assertions in system libraries. Since the auto mode has been used for the option Perform Compile-Time Checks, CiaoPP has automatically detected that the program must be analyzed not only for types and modes domains, but also to check non-failure, determinism, and upper-bound cost. Note that a number of initial assertions have been marked as checked, i.e., they have been validated. If all assertions had been moved to this checked status, the program would have been verified. In these cases ciaopp is capable of generating certificates which can be checked efficiently for, e.g., mobile code applications [APH04]. However, in our case assertion A5 has been detected to be false. This indicates a violation of the specification given, which is also flagged by ciaopp as follows:
ERROR: (lns 22-23) false calls assertion:
   :- calls append(A,B,C) : list(A,num),list(B,num)
      Called append(list(^x),[^x|list(^x)],var)
The error is now in the call append(R2,[x|R1],R) in qsort (x instead of X). Assertions A1, A3, A4, A6, A7, and A8 have been detected to hold. Note that though the predicate partition may fail in general, in the context of the current program it can be proved not to fail (assertion A6). However, it was not possible to prove statically assertion A2, which has remained with check status. Note also that A2 has been simplified, and this is because the mode analysis has determined that on success the second argument of qsort is ground, and thus this does not have to be checked at run-time. On the other hand the analyses used in our session (types, modes, non-failure, determinism, and upper-bound cost analysis) do not provide enough information to prove that the output of qsort is a sorted list of numbers, since this is not a native property of the analyses being used. While this property could be captured by including a more refined domain (such as constrained types), it is interesting to see what happens with the analyses selected for the example.

Note: Note that while property sorted_num_list cannot be proved with only (over approximations) of mode and regular type information, it may be possible to prove that it does not hold (an example of how properties which are not natively understood by the analysis can also be useful for detecting bugs at compile-time): while the regular type analysis cannot capture perfectly the property sorted_num_list, it can still approximate it (by analyzing the definition) as list(B, num). If type analysis for the program were to generate a type for B not compatible with list(B, num), then a definite error symptom would be detected.

Performance Debugging and Validation:

ciaopp allows stating assertions about the efficiency of the program. This is done by stating lower and/or upper bounds on the computational cost of predicates (given in number of execution steps). Consider for example the naive reverse program:

:- module(_, [nrev/2], [assertions,predefres(res_steps)]).

:- use_module(library(assertions/native_props)).

:- entry nrev(A,B) : (ground(A), list(A), var(B)).
%:- check comp nrev(A,B) + steps_ub(length(A)+1).     % (1) false
%:- check comp nrev(A,B) + steps_o(length(A)).        % (2) false
%:- check comp nrev(A,B) + steps_o(exp(length(A),2)). % (3) checked

nrev([H|L],R) :-

append([X|Xs],Ys,[X|Zs]):- append(Xs,Ys,Zs).

Suppose that the programmer thinks that the cost of nrev is given by a linear function on the size (list-length) of its first argument, maybe because he has not taken into account the cost of the append call, and adds the assertion:

:- check comp nrev(A,B) + steps_ub(length(A)+1).

Since append is linear, it causes nrev to be quadratic. ciaopp can be used to inform the programmer about this false idea about the cost of nrev.

As before, we set the option Select Action Group to check_assertions in the menu:

The output is:

We get the following error message:

ERROR (ctchecks_pred_messages): (lns 6-6) False assertion:
:- check comp nrev(A,B)
         + steps_ub(length(A)+1).

on comp nrev:nrev(A,B) :

[generic_comp] : covered,is_det,mut_exclusive,not_fails,steps_lb(0.5*exp(length(A),2)+1.5*length(A)+1),steps_ub(0.5*exp(length(A),2)+1.5*length(A)+1),[A:(name57,rt103),B:(name59,rt102)]

This message states that nrev will take at least resolution steps (which is the cost analysis output), while the assertion requires that it take at most resolution steps. The cost function in the user-provided assertion is compared with the lower-bound cost assertion inferred by analysis. This allows detecting the inconsistency and proving that the program does not satisfy the efficiency requirements imposed. Upper-bound cost assertions can also be proved to hold, i.e., can be checked, by using upper-bound cost analysis rather than lower-bound cost analysis. In such case, it holds when the upper-bound computed by analysis is lower or equal than the upper-bound stated by the user in the assertion. The converse holds for lower-bound cost assertions.

ciaopp can also verify or falsify cost assertions expressing worst case computational complexity orders (this is specially useful if the programmer does not want or does not know which particular cost function should be checked). For example, suppose now that the programmer adds the following “check” assertion:

:- check comp nrev(A,B) + steps_o(length(A)).

In this case, we get the following error message:

ERROR (ctchecks_pred_messages): (lns 7-7) False assertion:
:- check comp nrev(A,B)
         + steps_o(length(A)).

on comp nrev:nrev(A,B) :

[generic_comp] : covered,is_det,mut_exclusive,not_fails,steps_lb(0.5*exp(length(A),2)+1.5*length(A)+1),steps_ub(0.5*exp(length(A),2)+1.5*length(A)+1),[A:(name57,rt103),B:(name59,rt102)]

This message states that nrev will take at least resolution steps (which is the cost analysis output, as in the previous example), while the assertion requires that the worst case cost of nrev be linear on (the size of the input argument).

If the programmer adds now the following “check” assertion:

:- check comp nrev(A,B) + steps_o(exp(length(A),2)).

which states that the worst case cost of nrev is quadratic, i.e. is in , where is the length of the first list (represented as length(A)). Then the assertion is validated and the following “checked” assertion is included in the output produced by ciaopp:

:- checked comp nrev(A,_1) + steps_o( exp(length(A), 2) ).

ciaopp can certify programs with resource consumption assurances [HALGP04].