A Gentle Introduction To CAMlog


Contents

1. Introduction: An Example In Haskell And In CAMlog
2. The Type System
3. Continuations And Other Control Elements
4. Backtracking And Comprehension
5. Java Interoperability
6. Compiler/Interpreter Usage

Other Papers
 - A White Paper on CAMlog
 - CAMlog Language Description          (NOT YET)
 - CAMlog User Guide                    (NOT YET)
 - CAMlog Scripting API documentation   (NOT YET)
 - CAMlog Quick Reference               (NOT YET)


Preface

a) About The Paper
This paper is aimed at anyone with some experience in programming, preferably
in some functional or logic proramming language.
It's purpose is to most efficiently provide just enough information for getting
started to implement your own programs and use the 'CAMlog Language Description'
as a reference, so working through this rather formal document can be avoided.
(Note: The language description does NOT YET exist.)

b) About The Language
Behind the Language is an extension of a computation model known as the
'Categorical Abstract Machine' (CAM).
This makes CAMlog a polymorphically typed functional language with a logical
flavor and some typically imperative -so called monadic- features built on top
of it.
It should be noted that CAMlog is still an experiment, in particular the
intended inclusion of states/objects/process will change the language.
A more detailed exposition of our goals are to be found in the 'White Paper'.
(Note: The white paper does NOT YET exist.)

c) About The Implementation
The system is essentially Java based and thus has good Java-interoperability.
Sources are compiled to 'CAMlog operational code', which is then interpreted.
A compilation to C is intended.

d) What CAMlog Is Good For
Generally declarative languages show their most strength in so-called
'intelligent' applications: complex structures and algorithms with a high level
of abstraction having nontrivial mathematical semantics behind it.
CAMlog is intended to be used as a scripting engine in modeling applications;
however it is still a general purpose language.


1. Introduction:
   An Example In Haskell And In CAMlog

The integer quotient and remainder if a fraction term n/m may be computed
recursively like this: For a proper fraction, the quotient is 0 and n is the
remainder; else we can lower n by m and then have to increment the resulting
quotient by 1.

As a reference we provide the following implementation, Haskell being the quasi-
standard of functional programming. Even those without knowledge of it may find
the following source convenient, since the Haskell notation is rather natural.

    divmod n m
        | n<m       = (0,n)
        | otherwise = (q,r)
            where (q-1,r) = divmod (n-m) m
    div n m = q
            where (q,_) = divmod n m
    mod n m = r
            where (_,q) = divmod n m

Next is some CAMlog code in 1-to-1 correspondence with the above. This is not
even the best way to use CAMlog syntax, but the one than comes closest to the
Haskell code.

    def divmod < n m > q r :
              n << m ,
              0 > q ,
              n > r
           ;  divmod < (n-m) m > (q-1) r .
        div < n m > q :
              divmod < n m > q _ .
        mod < n m > r :
              divmod < n m > _ r .
     in mod < 80 7

Now let us explain some the characteristic language construct that we meet here.

a) Rule Application
The most common ingredient of a CAMlog program is the 'rule application', e.g.
    divmod < (n-m) m > (q-1) r
It means: There is a rule (roughly: a function) named 'divmod' that should be
applied to input '<' arguments '(n-m)' and 'm' and yield it's output '>' to
(q-1) and r.
More precisely, this 'statement' consists of two separate parts: the application
term, computing a pair of values
    divmod < (n-m) m
and the binding construct
    > (q-1) r
using these values to define the variables q and r.

b) Terms And Values
Here is a brief list of the other terms (language constructs that yield a value)
in the above example
    80    integer constant
    n     reference to a variable that has been defined (bound) before
    n-m   infix notation for an application term, namely 'minus < n m'
Application terms can be nested, as '(n-m)' is in 'divmod < ...' , though
nestings in CAMlog programs tend to be not so deep.
Integer literals, variable names, infix operator priorities are as one expects
them to be. The application operator '<' has low priority.

c) Patterns And Variable Binding
Patterns appear in binding constructs. By matching data to a pattern, the data
will be decomposed and the variables of the pattern will be bound to values.
In the above example we meet these kinds of patterns:
    (q-1) r
          tuple pattern: matching to a tuple value <x,y>, the components of the
          value will be matched to those of the pattern: x with q-1, y with r.
    q-1   the 'n+k' pattern: matching to a value x will match q to x+1.
    r     variable pattern: matching will bind the variable to the value.

d) Conjunction Clauses (The Comma)
A conjunction clause is a comma-separated sequence of statements. The only
conjunction in our example is too trivial, so let us look at little snippet from
another program (actually a more efficient algorithm for the same problem).
    divmod < n (2*m) > q  r ,
    divmod < r m     > q' rest ,
    2*q+q' > quotient
The variables bound in one line are available for reference in all following
lines, so the members of a conjunction are evaluated sequentially.
Without the binding construct '> quotient' at the end, the clause is a term
having the value of it's last member.

e) Failures And Alternatives (The Semicolon)
A rule may either compute a value or fail, this is why we don't use the term
'function' for them. 
Now let us look at the following part of our example
      n << m ,
      ...
    ; divmod < (n-m) m > (q-1)
The expression 'n<<m' is an application of the 'less than' rule, which fails in
case n>=m.
The semicolon-separated clauses are called alternatives, together they form a
formula. If some statement fails, the whole alternative fails, and the program
continues with the next one. If it's last alternative has failed, the whole
formula fails. As soon as a whole clause has succeeded, the computation ends and
it's value is yielded the value of the formula.
Variable bindings are restricted to their alternatives, so no information is
passed out of a failing alternative.
In our example the effect is that of an if-then-else.

f) Rule Definitions
We have seen how the 'divmod' rule is applied so here is how it is defined.
    divmod < n m > q r :
       ...  .
The head of a rule definition resembles an application, only this time the
input part '< n m' is a pattern and the result '> q r' a tuple term.
The rule body is enclosed in ':' and '.' and usually consists of a couple of
alternatives. The result term is evaluated last, and each alternative must
provide bindings for the variables that occur in it.
Rule definitions may be written in a more funtional style, without a result
term, with the rule then yielding the value of it's body.
The 'div' rule of our example might thus be written as
    div < n m :
          divmod < n m > q _ ,
          q .
Obviously a rule fails if all alternatives of it's body do.

g) def Binding Terms And Recursion
Rule definitions may appear only in the def-construct, which usually constitutes
the main body of a CAMlog program, as in our case
    def divmod ... .
        div ... .
        mod ... .
     in mod < 80 7
The defined rules are available in the body term after the 'in' keyword as well
as in all the rule bodies, so recursion and forward references are possible.
(Note that this is not true for the '>' binding construct.)
A def expression is a term and evaluates to the value of it's body.
In connection of def terms there is another way to write alternatives, namely by
having consecutive rule definitions with the same name. So we may write
    def divmod < n m > 0 n :
              n << m .
        divmod < n m > (q+1) r :
              divmod < (n-m) m > q r .
        ...
This is more flexible than the ';' since the rule heads may differ.


2. The Type System

a) The Basic Types
CAMlog is equiped with the usual static type system of contemporary functional
languages. The types of the values that we have encountered so far are:
    int           the atomic type of integer numbers
    int,int       a tuple type, here the type if pairs of int values, occurring
                  as input and result of some rules
    int,int->int  a function type, the type of both div and mod
Every term, pattern and variable has a compile time known type, which is infered
and checked for consistency by the system, so no user declarations are needed.
We informally write '::' for 'has type', e.g.
    div :: int,int -> int
but this does not occur in the source code

b) Lists And Other Algebraic Types
As an example of an algebraic type we give the definition of the builtin type
'list'. A Haskell definition (but with good old LISP terminology) would be 
    data List x =   Nil
                  | Cons { car :: x ,
                           cdr :: List x
                         }
The CAMlog version is just a mild variation:
    type list X :   nil <
                  ; cons < car > X ,
                           cdr > list X .
The comma denotes a product (tuple) type, the semicolon a sum (variant) type.
Algebraic types admit parametric polymorphism, here X is a type variable.
Along with the type, constructor (nil and cons) and selector (car and cdr) rules
are defined, which have the following typings:
    cons ::  \-/X. X,(list X) -> list X
    nil  ::  \-/X. 1          -> list X
    car  ::  \-/X. list X -> X
    cdr  ::  \-/X. list X -> list X
The universal quantifier means that the type variable X may take different
values at different occurrences of the constructor and selector names.
From a mathematical point of view, list X is the free algebra over the X with
the constructors cons and nil.

c) Example: The length Of A List
Here is a program that recursively counts the length of a list:
    def length < (nil < )      > 0 .
        length < (cons < x xs) > 1 + (length < xs) .
     in
        length < (cons < 1 (cons < 2 (cons < 3 (cons < 4 (cons < 5 (nil <)))))) 
The nasty expression in the body of the def just constructs the list 1,2,3,4,5
using cons and nil.
The rule definitions contain a new form of patterns, called the constructor
pattern: Matching 'cons < x xs' to a list value may fail, namely in case the
toplevel constructor of the value is not 'cons' (i.e. it's the empty list).
If it is, the components (car and cdr) of the value are matched to the patterns
x and xs. Failure of a pattern matching implies failure of the alternative it
contains. Pattern matching is the customary way to control algorithms that
operate on algebraic types and to access components of the data.
Typically, the selectors are rarely ever used, though there is a neat way to
formulate the length function:
    length < x :   1 + (length < (cdr < x))
                 ; 0 .
This works, because the 'cdr' selector fails in case of an empty list.

A CAVEAT
At this point we should mention a common misunderstanding for those who are not
familiar with polymorphic type systems.
A type of variants (which is what we have) is something other than a variation
of types (e.g. by inheritance, casting and the like). For any application term,
the TYPES of functions and their actual parameters a matched statically, at
compile time, and 'failure' here will result in rejection of the whole program
by the compiler. Thus, for instance, a function expecting a triple can never
be called with a pair of int_s as input, since these are different types.
At runtime however, various CONSTRUCTORS of the same type can be matched against
each other. So a list of length 3 may be matched against a list of length 2;
the matching algorithm will perform a structure walk on pattern and value, where
in our example after 3 steps matching 'cons' against 'nil' will fail and trigger
execution of the next alternative.

d) Special Notation For List Operation
Above we used the standard notation for algebraic types. As for the list type in
particular, there is some special syntax for constructor terms, which is used in
both terms and patterns. The example then becomes
    def length < []     > 0 .
        length < [x|xs] > 1 + (length < xs) .
     in
        length < [1 2 3 4 5]

e) Ad Hoc Polymorphism In def-Terms
In the scope of the def term, the length rule has the polymorphic type
    length ::  \-/X. (list X) -> int
Applications of length may then have diffent instances of that type, e.g. we
may have the def term body
    (length < [1 2 3]) + (length < ['a' 'b' 'c'])
Types are then 'list int -> int' and 'list string -> int'.
Note: If the length function had been assigned to a name by the '>' binding
construcs, this would not be admissable.


3. Continuations And Other Control Elements

CAMlog has (or will have) three so called 'monadic' features, that go beyond the
evaluation mechanism of a functional programming model:
    (i)   failure and alternatives
    (ii)  continuations
    (iii) stateful objects and processes
The monadic approach is to build such imperative features on top of a functional
computation model without destroying declarative semantics. There is a rich
literature to this topic: just google for 'Phil Wadler'.
In this section we will explain continuations and the way they interfer with
the 'logic style' elements of feature (i).
(Note: stateful objects and processes do NOT YET exist. One of the reasons for
the development of CAMlog was to provide a platform for experiments with
this monad combination.)

a) Labels And Continuations
Continuations are a semantically sound way to incorporate the expressive power
of the imperative 'goto' in the functional context; friends of the logic style
may replace this un-word by another one: 'cut'.
Continuations are good for handling exception situations in a nicely modular
way.
Our example computes the intersection point <x0|y0> of two lines y=m*x+b and
y'=m'*x+b' by the formula
    x0 = (b'-b)/(m-m'),  y0 = m*x0+b
The problem is to properly handle the case of parallel lines

    def safedivide < x y esc > x / y :
                        y /= 0
                      ; esc < 'infinity' .
        intersect < m b m' b' esc > x y :
                        safedivide < b-b' m'-m esc > x,
                        m * x + b > y .
     in           (: intersect < 3 5 3 2 parallel > point ,
                     show < point
                  ) > result ,
        parallel: append < 'lines intersect at ' result

Let us look at the main body first: enclosed in '(:' and ')' -this is the
parenthesization for formulae- a rule is called to compute the intersection
of y=3x+5 and y=3x+2; the point then is converted to a string by the built in
'show' function. The result of this block is then passed to variable 'result',
and the following computation -applying the builtin append- is supplied with
the label 'parallel'.
The label is visible inside the block before (and only there) as a variable;
it's value is called a 'continuation' represents the future of the computation
of the block.
The continuation is passed through the intersect rule to the safedivision rule,
where it occurs in an application just like a rule. In case of an attempt to
divide by 0 this call causes the whole computation to be skipped and resume
at the 'parallel:' label, and neither the 2nd line of 'intersect' nor the
'show < point' will be executed. It should be noted that alternatives in the
code would also be skipped by a continuation call, a case not present in our
very simple example.

b) The Type Of A Continuation
The type of a continuation depends on that of the block before. In our case,
it yields result::string, which makes the continuation have type
    parallel :: \-/ X. string -> X
with the quantification allowing the continuation to be applied anywhere.

AN OPEN PROBLEM
Our computation model allows a continuation to be passed outside the scope of
it's label, which leads to interesting control structures. However, the types of
such continuations are inherently recursive, which is not allowd in our system
so far.

c) Conditionals And Other Negative Information
Until now nesting conjunction clauses and alternatives has provided all the
neccessary means of control. Sometimes however we need a good old if-then-else,
which is written in Prolog notation in CAMlog.
The following one decides if two integers a, b have the same sign with a minimal
number of comparison operations:
       a << 0
    -> b << 0
    ;  b >= 0
A comma instead of the '->' would not do the job here, because a failure of
the then-part should not lead into the 'else'.
This reason why this needs a new language construct is the negative information
implicit in the else part, in CAMlog syntax
      ~ a << 0
which cannot be expressed in terms of ',' and ';'. It is expressible however by
continuations, but this would be too heavy a weapon for the programmer. (It's
what the compiler does!)


4. Backtracking And Comprehension

Backtracking is often described paradigmatically as 'non-determinism' and means
undoing unsuccessful parts of a computation. Examples often arise in the realm
of syntax analysis. Closely related are list comprehensions, which are much
easier to understand and very helpful in handling lists. 

a) Linear Vs. Backtracking Evaluation
Real life backtracking examples tend to be complicated. We give a rather trivial
one here, which will nonetheless have interesting consequences. Please have a
guess what the following code will do:
    def *choice < [x|xs] :
                     x
                   ; choice < xs .
     in  choice < [4 2 3 7 1 9] > x ,
         x >> 5
The treatment of alternatives we have seen so far is biased: if an alternative
succeeds (and the first one in 'choice' always will!), CAMlog will commit to the
result and never consider later alternatives again: this is called linear
evaluation and it would make our program fail immediately. (In logic programming
this the highly undesirable '!' cut.)
If 'choice' is evaluated in backtracking mode however, as is indicated by the
'*' befor the rule name, it means that the other alternative is reconsidered if
something fails later on. So as '4 >> 5' fails, instead of letting the whole
program fail with it, more the 'choice' rule is queried for more results.

A note on the use of 'x >> 5' above: relations don't give a boolean value, but
indicate their outcome by either failing or returnig the left operand, so x >> 5
will evaluate to 7. Right-associativity allows chaining relations, like
    x << y <= 7

b) Comprehension Expressions And The Solutions Combinator
As an example of the use of Haskell's list comprehensions, here is a nice
formulation of the quicksort algorithm

    sort []      = []
    sort (x0:xs) =    sort [ x | x <- xs, x < x0 ]
                   ++ [x0]
                   ++ sort [ x | x <- xs, x > x0 ]

Since there is NOT YET a library, we have to include some extra definitions, so
in CAMlog the code looks like this:

    def (++) < []     ys > ys .
        (++) < [x|xs] ys > [x|xs++ys] .
       *choice < [x|xs] :
                    x
                  ; choice < xs .
        sort < [] > [] .
        sort < [x0|xs] >
                    (sort < [: (choice < xs) << x0 ])
                 ++ [x0]
                 ++ (sort < [: (choice < xs) >> x0 ]) .
     in sort < [5 9 3 9 1 7 2 8 4 3 6 0]

The comprehension expressions [ : ... ] should be read 'all elements chosen from
xs that are less (rsp. greater) than x0'. The syntax inside the comprehension
term is nothing special: it is that of a rule definintion without name and
input parts. This code will be completely backtracked for all possible results,
which are then comprehended to a list.
The similariy with the Haskell code is even more striking in the verbose
notation
    [ > x : choice < xs > x , x << x0 ]
There is some conceptual difference behind the scene however: Roughly what is
backtracking for CAMlog is lazyness for Haskell. 
(Note: It is intended to incorporate lazyness in CAMlog too, but this topic is
deferred until an object concept is implemented, because there might be
interference between the two.)


5. Java Interoperability

Real life applications of CAMlog will usually be embedded as scripts in Java
systems and operate on data provided by those systems. For this, more important
than an extensive standard library (which is NOT YET shipped with CAMlog), is
excellent Java interoperability. Through it the programmer can easily create a
customized evironment for his/hers application.

I.  Calling Java From CAMlog

a) Creating Operations And Types For CAMlog In Java
The programmer may enhance CAMlog by creating new builtin functions and
primitive types. This is done by supplying dedicated 'resource' classes
containing the neccessary implementations and meta-information and importing
them through the
    import 'my.path.to.the.Resource'
directive.

- Functions And Constants
Any static Java method or field declared public in the resource class is made
available as a function rsp. constant, provided all the involved classes
correspond to atomic CAMlog types. This way we can generate monomorphic,
non-backtracking operations with a single return value.
A Java-implemented function can signal a 'failure' by returning a null value.

- Atomic Types
By an atomic type we mean a CAMlog type which is unstructured in contrast to
algebraic types, tuples and functions. Each atomic type is associated with a
Java class, which can be done in the resource by a declaration like
   public static RES.Atomic Roshambo = new RES.Atomic(Roshambo.class);
The class declaration itself may be placed anywhere, including as inner class
of the resource. It may implement the java.lang.Object.equals and toString
methods to provide for CAMlog '==' and 'show'; apart from that, all access
must be organized through function and constant definintions. RES.Atomic has to
be imported from the com.florianhaber.camlog.res package.
The builtin atomic types are 'int' (java.lang.Integer) and 'string' (java.lang.
string).

- Further Features
To use structured types and multiple return values, polymorphism and
backtracking, a certain understanding of the CAMlog type system
(com.florianhaber.camlog.ast.TYP) and the value domain (..camlog.cam.DOM) are
neccessary.
For the generation of binaries it is neccessary to split the resource class
into a typing part and a runtime part.
A small example can be found in test.jav.RoshamboRES; more features can be
studied in the source of the standard builtin resource com.florianhaber.camlog.
res.STD.

c) Accessing Java Objects And Instance Members
The implementation of these features are deferred until stateful objects are
integrated into the CAMlog language.


II. Calling CAMlog From Java

a) Calling Back CAMlog Closures From Java
This feature means higher order programming over the Java interface.
It is NOT YET implemented.

b) CAMlog Scripts In Java Systems
There is NOT YET a consistent API for this. To get an idea, look at the source
code in com.florianhaber.camlog.CAMlogVM.java


6. Compiler/Interpreter Usage

a) Running The Compiler/Interpreter
On your Java VM run
    com.florianhaber.camlog.CAMlog [-options]* [source] [output]
You need both CAMlog and ANTLR in your classpath.

Command Line Options of CAMlog
 *   -pp   prettyprint lambda and combinator code
 *   -typ  typecheck lambda code
 *   -bm   benchmark interpreter 
 *   -ccc  interpret combinator code denotationally
 *   -asm  list assembly language code
 *   -run  run CAM
 *   -all  run CAM backtracking to all results
 *   -bin=<file>
 *         write cam bytecode to <file>.CAM
 *   -binc=<file>
 *         write combinator bytecode to <file>.CCC
 *   -dbg  debug mode cam execution
 *   -dmp  dump cam register contents after execution
 *  directives
 *   -backtrack
 *         run whole program in backtracking mode
 *   -mycroft=<level>
 *         maximum number of iterations in typing recursive def-terms
Appending a '-' sign switches the option off.
The defaults are
    pp- typ bm- ccc asm- run dbg- dmp- backtrack- mycroft=16
Command line options override the compiler defaults and the options directive
in the source code overrides the command line.
(Note: This might be reversed.)

b) Running The Standalone Interpreter
On your target systems Java VM run
    com.florianhaber.camlog.CAMlogVM [-options]* [binary] [output]
e.g. by invoking
    java -jar CAMlogVM.jar ...
The CAMlog VM accepts both CCC and CAM binaries.
(Note: The jared version currently accepts only CAM binaries.)

Command Line Options of CAMlogVM
 *   -bm   benchmark interpreter
 *   -dbg  debug mode cam execution
 *   -dmp  dump cam contents after execution