A Gentle Introduction To CAMlog Contents 1. Introduction: An Example In Haskell And In CAMlog 2. The Type System 3. Continuations And Other Control Elements 4. Backtracking And Comprehension 5. Java Interoperability 6. Compiler/Interpreter Usage Other Papers - A White Paper on CAMlog - CAMlog Language Description (NOT YET) - CAMlog User Guide (NOT YET) - CAMlog Scripting API documentation (NOT YET) - CAMlog Quick Reference (NOT YET) Preface a) About The Paper This paper is aimed at anyone with some experience in programming, preferably in some functional or logic proramming language. It's purpose is to most efficiently provide just enough information for getting started to implement your own programs and use the 'CAMlog Language Description' as a reference, so working through this rather formal document can be avoided. (Note: The language description does NOT YET exist.) b) About The Language Behind the Language is an extension of a computation model known as the 'Categorical Abstract Machine' (CAM). This makes CAMlog a polymorphically typed functional language with a logical flavor and some typically imperative -so called monadic- features built on top of it. It should be noted that CAMlog is still an experiment, in particular the intended inclusion of states/objects/process will change the language. A more detailed exposition of our goals are to be found in the 'White Paper'. (Note: The white paper does NOT YET exist.) c) About The Implementation The system is essentially Java based and thus has good Java-interoperability. Sources are compiled to 'CAMlog operational code', which is then interpreted. A compilation to C is intended. d) What CAMlog Is Good For Generally declarative languages show their most strength in so-called 'intelligent' applications: complex structures and algorithms with a high level of abstraction having nontrivial mathematical semantics behind it. CAMlog is intended to be used as a scripting engine in modeling applications; however it is still a general purpose language. 1. Introduction: An Example In Haskell And In CAMlog The integer quotient and remainder if a fraction term n/m may be computed recursively like this: For a proper fraction, the quotient is 0 and n is the remainder; else we can lower n by m and then have to increment the resulting quotient by 1. As a reference we provide the following implementation, Haskell being the quasi- standard of functional programming. Even those without knowledge of it may find the following source convenient, since the Haskell notation is rather natural. divmod n m | n q r : n << m , 0 > q , n > r ; divmod < (n-m) m > (q-1) r . div < n m > q : divmod < n m > q _ . mod < n m > r : divmod < n m > _ r . in mod < 80 7 Now let us explain some the characteristic language construct that we meet here. a) Rule Application The most common ingredient of a CAMlog program is the 'rule application', e.g. divmod < (n-m) m > (q-1) r It means: There is a rule (roughly: a function) named 'divmod' that should be applied to input '<' arguments '(n-m)' and 'm' and yield it's output '>' to (q-1) and r. More precisely, this 'statement' consists of two separate parts: the application term, computing a pair of values divmod < (n-m) m and the binding construct > (q-1) r using these values to define the variables q and r. b) Terms And Values Here is a brief list of the other terms (language constructs that yield a value) in the above example 80 integer constant n reference to a variable that has been defined (bound) before n-m infix notation for an application term, namely 'minus < n m' Application terms can be nested, as '(n-m)' is in 'divmod < ...' , though nestings in CAMlog programs tend to be not so deep. Integer literals, variable names, infix operator priorities are as one expects them to be. The application operator '<' has low priority. c) Patterns And Variable Binding Patterns appear in binding constructs. By matching data to a pattern, the data will be decomposed and the variables of the pattern will be bound to values. In the above example we meet these kinds of patterns: (q-1) r tuple pattern: matching to a tuple value , the components of the value will be matched to those of the pattern: x with q-1, y with r. q-1 the 'n+k' pattern: matching to a value x will match q to x+1. r variable pattern: matching will bind the variable to the value. d) Conjunction Clauses (The Comma) A conjunction clause is a comma-separated sequence of statements. The only conjunction in our example is too trivial, so let us look at little snippet from another program (actually a more efficient algorithm for the same problem). divmod < n (2*m) > q r , divmod < r m > q' rest , 2*q+q' > quotient The variables bound in one line are available for reference in all following lines, so the members of a conjunction are evaluated sequentially. Without the binding construct '> quotient' at the end, the clause is a term having the value of it's last member. e) Failures And Alternatives (The Semicolon) A rule may either compute a value or fail, this is why we don't use the term 'function' for them. Now let us look at the following part of our example n << m , ... ; divmod < (n-m) m > (q-1) The expression 'n<=m. The semicolon-separated clauses are called alternatives, together they form a formula. If some statement fails, the whole alternative fails, and the program continues with the next one. If it's last alternative has failed, the whole formula fails. As soon as a whole clause has succeeded, the computation ends and it's value is yielded the value of the formula. Variable bindings are restricted to their alternatives, so no information is passed out of a failing alternative. In our example the effect is that of an if-then-else. f) Rule Definitions We have seen how the 'divmod' rule is applied so here is how it is defined. divmod < n m > q r : ... . The head of a rule definition resembles an application, only this time the input part '< n m' is a pattern and the result '> q r' a tuple term. The rule body is enclosed in ':' and '.' and usually consists of a couple of alternatives. The result term is evaluated last, and each alternative must provide bindings for the variables that occur in it. Rule definitions may be written in a more funtional style, without a result term, with the rule then yielding the value of it's body. The 'div' rule of our example might thus be written as div < n m : divmod < n m > q _ , q . Obviously a rule fails if all alternatives of it's body do. g) def Binding Terms And Recursion Rule definitions may appear only in the def-construct, which usually constitutes the main body of a CAMlog program, as in our case def divmod ... . div ... . mod ... . in mod < 80 7 The defined rules are available in the body term after the 'in' keyword as well as in all the rule bodies, so recursion and forward references are possible. (Note that this is not true for the '>' binding construct.) A def expression is a term and evaluates to the value of it's body. In connection of def terms there is another way to write alternatives, namely by having consecutive rule definitions with the same name. So we may write def divmod < n m > 0 n : n << m . divmod < n m > (q+1) r : divmod < (n-m) m > q r . ... This is more flexible than the ';' since the rule heads may differ. 2. The Type System a) The Basic Types CAMlog is equiped with the usual static type system of contemporary functional languages. The types of the values that we have encountered so far are: int the atomic type of integer numbers int,int a tuple type, here the type if pairs of int values, occurring as input and result of some rules int,int->int a function type, the type of both div and mod Every term, pattern and variable has a compile time known type, which is infered and checked for consistency by the system, so no user declarations are needed. We informally write '::' for 'has type', e.g. div :: int,int -> int but this does not occur in the source code b) Lists And Other Algebraic Types As an example of an algebraic type we give the definition of the builtin type 'list'. A Haskell definition (but with good old LISP terminology) would be data List x = Nil | Cons { car :: x , cdr :: List x } The CAMlog version is just a mild variation: type list X : nil < ; cons < car > X , cdr > list X . The comma denotes a product (tuple) type, the semicolon a sum (variant) type. Algebraic types admit parametric polymorphism, here X is a type variable. Along with the type, constructor (nil and cons) and selector (car and cdr) rules are defined, which have the following typings: cons :: \-/X. X,(list X) -> list X nil :: \-/X. 1 -> list X car :: \-/X. list X -> X cdr :: \-/X. list X -> list X The universal quantifier means that the type variable X may take different values at different occurrences of the constructor and selector names. From a mathematical point of view, list X is the free algebra over the X with the constructors cons and nil. c) Example: The length Of A List Here is a program that recursively counts the length of a list: def length < (nil < ) > 0 . length < (cons < x xs) > 1 + (length < xs) . in length < (cons < 1 (cons < 2 (cons < 3 (cons < 4 (cons < 5 (nil <)))))) The nasty expression in the body of the def just constructs the list 1,2,3,4,5 using cons and nil. The rule definitions contain a new form of patterns, called the constructor pattern: Matching 'cons < x xs' to a list value may fail, namely in case the toplevel constructor of the value is not 'cons' (i.e. it's the empty list). If it is, the components (car and cdr) of the value are matched to the patterns x and xs. Failure of a pattern matching implies failure of the alternative it contains. Pattern matching is the customary way to control algorithms that operate on algebraic types and to access components of the data. Typically, the selectors are rarely ever used, though there is a neat way to formulate the length function: length < x : 1 + (length < (cdr < x)) ; 0 . This works, because the 'cdr' selector fails in case of an empty list. A CAVEAT At this point we should mention a common misunderstanding for those who are not familiar with polymorphic type systems. A type of variants (which is what we have) is something other than a variation of types (e.g. by inheritance, casting and the like). For any application term, the TYPES of functions and their actual parameters a matched statically, at compile time, and 'failure' here will result in rejection of the whole program by the compiler. Thus, for instance, a function expecting a triple can never be called with a pair of int_s as input, since these are different types. At runtime however, various CONSTRUCTORS of the same type can be matched against each other. So a list of length 3 may be matched against a list of length 2; the matching algorithm will perform a structure walk on pattern and value, where in our example after 3 steps matching 'cons' against 'nil' will fail and trigger execution of the next alternative. d) Special Notation For List Operation Above we used the standard notation for algebraic types. As for the list type in particular, there is some special syntax for constructor terms, which is used in both terms and patterns. The example then becomes def length < [] > 0 . length < [x|xs] > 1 + (length < xs) . in length < [1 2 3 4 5] e) Ad Hoc Polymorphism In def-Terms In the scope of the def term, the length rule has the polymorphic type length :: \-/X. (list X) -> int Applications of length may then have diffent instances of that type, e.g. we may have the def term body (length < [1 2 3]) + (length < ['a' 'b' 'c']) Types are then 'list int -> int' and 'list string -> int'. Note: If the length function had been assigned to a name by the '>' binding construcs, this would not be admissable. 3. Continuations And Other Control Elements CAMlog has (or will have) three so called 'monadic' features, that go beyond the evaluation mechanism of a functional programming model: (i) failure and alternatives (ii) continuations (iii) stateful objects and processes The monadic approach is to build such imperative features on top of a functional computation model without destroying declarative semantics. There is a rich literature to this topic: just google for 'Phil Wadler'. In this section we will explain continuations and the way they interfer with the 'logic style' elements of feature (i). (Note: stateful objects and processes do NOT YET exist. One of the reasons for the development of CAMlog was to provide a platform for experiments with this monad combination.) a) Labels And Continuations Continuations are a semantically sound way to incorporate the expressive power of the imperative 'goto' in the functional context; friends of the logic style may replace this un-word by another one: 'cut'. Continuations are good for handling exception situations in a nicely modular way. Our example computes the intersection point of two lines y=m*x+b and y'=m'*x+b' by the formula x0 = (b'-b)/(m-m'), y0 = m*x0+b The problem is to properly handle the case of parallel lines def safedivide < x y esc > x / y : y /= 0 ; esc < 'infinity' . intersect < m b m' b' esc > x y : safedivide < b-b' m'-m esc > x, m * x + b > y . in (: intersect < 3 5 3 2 parallel > point , show < point ) > result , parallel: append < 'lines intersect at ' result Let us look at the main body first: enclosed in '(:' and ')' -this is the parenthesization for formulae- a rule is called to compute the intersection of y=3x+5 and y=3x+2; the point then is converted to a string by the built in 'show' function. The result of this block is then passed to variable 'result', and the following computation -applying the builtin append- is supplied with the label 'parallel'. The label is visible inside the block before (and only there) as a variable; it's value is called a 'continuation' represents the future of the computation of the block. The continuation is passed through the intersect rule to the safedivision rule, where it occurs in an application just like a rule. In case of an attempt to divide by 0 this call causes the whole computation to be skipped and resume at the 'parallel:' label, and neither the 2nd line of 'intersect' nor the 'show < point' will be executed. It should be noted that alternatives in the code would also be skipped by a continuation call, a case not present in our very simple example. b) The Type Of A Continuation The type of a continuation depends on that of the block before. In our case, it yields result::string, which makes the continuation have type parallel :: \-/ X. string -> X with the quantification allowing the continuation to be applied anywhere. AN OPEN PROBLEM Our computation model allows a continuation to be passed outside the scope of it's label, which leads to interesting control structures. However, the types of such continuations are inherently recursive, which is not allowd in our system so far. c) Conditionals And Other Negative Information Until now nesting conjunction clauses and alternatives has provided all the neccessary means of control. Sometimes however we need a good old if-then-else, which is written in Prolog notation in CAMlog. The following one decides if two integers a, b have the same sign with a minimal number of comparison operations: a << 0 -> b << 0 ; b >= 0 A comma instead of the '->' would not do the job here, because a failure of the then-part should not lead into the 'else'. This reason why this needs a new language construct is the negative information implicit in the else part, in CAMlog syntax ~ a << 0 which cannot be expressed in terms of ',' and ';'. It is expressible however by continuations, but this would be too heavy a weapon for the programmer. (It's what the compiler does!) 4. Backtracking And Comprehension Backtracking is often described paradigmatically as 'non-determinism' and means undoing unsuccessful parts of a computation. Examples often arise in the realm of syntax analysis. Closely related are list comprehensions, which are much easier to understand and very helpful in handling lists. a) Linear Vs. Backtracking Evaluation Real life backtracking examples tend to be complicated. We give a rather trivial one here, which will nonetheless have interesting consequences. Please have a guess what the following code will do: def *choice < [x|xs] : x ; choice < xs . in choice < [4 2 3 7 1 9] > x , x >> 5 The treatment of alternatives we have seen so far is biased: if an alternative succeeds (and the first one in 'choice' always will!), CAMlog will commit to the result and never consider later alternatives again: this is called linear evaluation and it would make our program fail immediately. (In logic programming this the highly undesirable '!' cut.) If 'choice' is evaluated in backtracking mode however, as is indicated by the '*' befor the rule name, it means that the other alternative is reconsidered if something fails later on. So as '4 >> 5' fails, instead of letting the whole program fail with it, more the 'choice' rule is queried for more results. A note on the use of 'x >> 5' above: relations don't give a boolean value, but indicate their outcome by either failing or returnig the left operand, so x >> 5 will evaluate to 7. Right-associativity allows chaining relations, like x << y <= 7 b) Comprehension Expressions And The Solutions Combinator As an example of the use of Haskell's list comprehensions, here is a nice formulation of the quicksort algorithm sort [] = [] sort (x0:xs) = sort [ x | x <- xs, x < x0 ] ++ [x0] ++ sort [ x | x <- xs, x > x0 ] Since there is NOT YET a library, we have to include some extra definitions, so in CAMlog the code looks like this: def (++) < [] ys > ys . (++) < [x|xs] ys > [x|xs++ys] . *choice < [x|xs] : x ; choice < xs . sort < [] > [] . sort < [x0|xs] > (sort < [: (choice < xs) << x0 ]) ++ [x0] ++ (sort < [: (choice < xs) >> x0 ]) . in sort < [5 9 3 9 1 7 2 8 4 3 6 0] The comprehension expressions [ : ... ] should be read 'all elements chosen from xs that are less (rsp. greater) than x0'. The syntax inside the comprehension term is nothing special: it is that of a rule definintion without name and input parts. This code will be completely backtracked for all possible results, which are then comprehended to a list. The similariy with the Haskell code is even more striking in the verbose notation [ > x : choice < xs > x , x << x0 ] There is some conceptual difference behind the scene however: Roughly what is backtracking for CAMlog is lazyness for Haskell. (Note: It is intended to incorporate lazyness in CAMlog too, but this topic is deferred until an object concept is implemented, because there might be interference between the two.) 5. Java Interoperability Real life applications of CAMlog will usually be embedded as scripts in Java systems and operate on data provided by those systems. For this, more important than an extensive standard library (which is NOT YET shipped with CAMlog), is excellent Java interoperability. Through it the programmer can easily create a customized evironment for his/hers application. I. Calling Java From CAMlog a) Creating Operations And Types For CAMlog In Java The programmer may enhance CAMlog by creating new builtin functions and primitive types. This is done by supplying dedicated 'resource' classes containing the neccessary implementations and meta-information and importing them through the import 'my.path.to.the.Resource' directive. - Functions And Constants Any static Java method or field declared public in the resource class is made available as a function rsp. constant, provided all the involved classes correspond to atomic CAMlog types. This way we can generate monomorphic, non-backtracking operations with a single return value. A Java-implemented function can signal a 'failure' by returning a null value. - Atomic Types By an atomic type we mean a CAMlog type which is unstructured in contrast to algebraic types, tuples and functions. Each atomic type is associated with a Java class, which can be done in the resource by a declaration like public static RES.Atomic Roshambo = new RES.Atomic(Roshambo.class); The class declaration itself may be placed anywhere, including as inner class of the resource. It may implement the java.lang.Object.equals and toString methods to provide for CAMlog '==' and 'show'; apart from that, all access must be organized through function and constant definintions. RES.Atomic has to be imported from the com.florianhaber.camlog.res package. The builtin atomic types are 'int' (java.lang.Integer) and 'string' (java.lang. string). - Further Features To use structured types and multiple return values, polymorphism and backtracking, a certain understanding of the CAMlog type system (com.florianhaber.camlog.ast.TYP) and the value domain (..camlog.cam.DOM) are neccessary. For the generation of binaries it is neccessary to split the resource class into a typing part and a runtime part. A small example can be found in test.jav.RoshamboRES; more features can be studied in the source of the standard builtin resource com.florianhaber.camlog. res.STD. c) Accessing Java Objects And Instance Members The implementation of these features are deferred until stateful objects are integrated into the CAMlog language. II. Calling CAMlog From Java a) Calling Back CAMlog Closures From Java This feature means higher order programming over the Java interface. It is NOT YET implemented. b) CAMlog Scripts In Java Systems There is NOT YET a consistent API for this. To get an idea, look at the source code in com.florianhaber.camlog.CAMlogVM.java 6. Compiler/Interpreter Usage a) Running The Compiler/Interpreter On your Java VM run com.florianhaber.camlog.CAMlog [-options]* [source] [output] You need both CAMlog and ANTLR in your classpath. Command Line Options of CAMlog * -pp prettyprint lambda and combinator code * -typ typecheck lambda code * -bm benchmark interpreter * -ccc interpret combinator code denotationally * -asm list assembly language code * -run run CAM * -all run CAM backtracking to all results * -bin= * write cam bytecode to .CAM * -binc= * write combinator bytecode to .CCC * -dbg debug mode cam execution * -dmp dump cam register contents after execution * directives * -backtrack * run whole program in backtracking mode * -mycroft= * maximum number of iterations in typing recursive def-terms Appending a '-' sign switches the option off. The defaults are pp- typ bm- ccc asm- run dbg- dmp- backtrack- mycroft=16 Command line options override the compiler defaults and the options directive in the source code overrides the command line. (Note: This might be reversed.) b) Running The Standalone Interpreter On your target systems Java VM run com.florianhaber.camlog.CAMlogVM [-options]* [binary] [output] e.g. by invoking java -jar CAMlogVM.jar ... The CAMlog VM accepts both CCC and CAM binaries. (Note: The jared version currently accepts only CAM binaries.) Command Line Options of CAMlogVM * -bm benchmark interpreter * -dbg debug mode cam execution * -dmp dump cam contents after execution