Tokenizer

Author(s): The Ciao Development Team, Jose F. Morales (curly blocks, Unicode source support).

This module defines the tokenizer for Ciao. In addition to optional flags, the main differences w.r.t. the ISO-Prolog standard are:

  • ` is a graphic char; there are no back_quoted_strings.
  • \ followed by any layout char (not only new_line) in a string is a continuation_escape_sequence.
  • \^ starts a control_escape_char in a string.
  • \c skips layout in a string.
  • \e = ESC, \d = DEL, \s = SPACE.
  • 13'23 is 23 in base 13 (same for other bases).
  • 0'' is accepted as 0''' (if not followed by ').
  • Quoted atoms and strings support the Unicode escapes \uDDDD (four hexadecimal digits) and \UDDDDDDDD (eight hexadecimal digits).
  • Support for Unicode identifiers (see Unicode source code).
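
A few of the extensions listed above can be checked with a small user file. This is a minimal sketch, assuming the default flag settings and that strings are read as lists of character codes; the predicate names are hypothetical and chosen only for illustration.

```prolog
% Hypothetical helper predicates illustrating some lexical extensions.

% 13'23 is read as the integer 23 in base 13, i.e. 2*13 + 3.
radix_example(13'23).

% \e, \d and \s in a string denote ESC (27), DEL (127) and SPACE (32).
escape_codes("\e\d\s").

% \u followed by four hexadecimal digits denotes a code point;
% U+0041 is the letter A, so this is the same atom as 'A'.
unicode_escape_atom('\u0041').
```

Under these assumptions, the query radix_example(N) should bind N to 29, escape_codes(Cs) should bind Cs to [27,127,32], and unicode_escape_atom(A) should bind A to the atom 'A'.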

Unicode source code

The tokenizer for Ciao extends ISO-Prolog with support for Unicode source identifiers, based on the Unicode Standard Annex 31, as follows:

  • Identifiers begin with an XID_Start character, optionally followed by XID_Continue characters (see the Unicode Derived Core Properties), extended with category No. Variables are those identifiers that start with a character in the Lu category.
  • Use Z* as layout characters, as well as other control characters (Cc category) with bidirectional category WS (whitespace), S (segment separator), or B (paragraph separator).
  • Use S* (Sm, Sc, Sk, So) and P* (Pc, Pd, Ps, Pe, Pi, Pf, Po) as symbols.
  • Identifiers that begin with XID_Continue are treated as solo tokens.
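
The identifier rules above can be illustrated as follows. This is a sketch only, assuming a UTF-8 encoded source file and a Ciao version with Unicode source support; the predicate names greek_atom/1 and double/2 are hypothetical.

```prolog
% α (U+03B1) is XID_Start but in category Ll (not Lu),
% so αβγ is read as an atom.
greek_atom(αβγ).

% Δ (U+0394) is in category Lu, so Δx and Δy are read as variables.
double(Δx, Δy) :- Δy is 2 * Δx.
```

With these definitions, the query greek_atom(A), atom(A) is expected to succeed, and double(3, Y) is expected to bind Y to 6.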

Usage and interface

Documentation on exports

REGTYPE token/1
A regular type, defined as follows:
token(atom(A)) :-
    atm(A).
token(badatom(S)) :-
    string(S).
token(number(N)) :-
    num(N).
token(string(S)) :-
    string(S).
token(var(T,S)) :-
    term(T),
    string(S).
token('/* ...').
token(',').
token('(').
token(' (').
token(')').
token('[').
token(']').
token('|').
token('{').
token('}').
token('.').
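
As an illustration of how terms of this shape can be processed, the following sketch collects the names stored in var/2 tokens. The token list in sample_tokens/1 is written by hand to match the regular type above; it is not claimed to be the exact output of the tokenizer for any particular input, and both predicate names are hypothetical.

```prolog
% A hand-written list of terms matching token/1 (illustrative only).
sample_tokens([atom(foo), '(', var(_, "X"), ',', number(1), ')', '.']).

% token_var_names(+Tokens, -Names): Names are the strings stored in the
% var/2 tokens of Tokens, in order of appearance.
token_var_names([], []).
token_var_names([var(_, Name)|Ts], [Name|Ns]) :- !,
    token_var_names(Ts, Ns).
token_var_names([_|Ts], Ns) :-
    token_var_names(Ts, Ns).
```

For example, the query sample_tokens(Ts), token_var_names(Ts, Ns) binds Ns to ["X"], assuming strings are read as lists of character codes.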

PREDICATE read_tokens/2

Usage: read_tokens(TokenList,Dictionary)

Documentation on multifiles

PREDICATE define_flag/3

Usage: define_flag(Flag,FlagValues,Default)

The predicate is multifile.

Documentation on imports

This module has the following direct dependencies: