Quark Framework

From Wikipedia, the free encyclopedia

CAL
Image:CALlogo3D.png
Paradigm functional, non-strict, modular
Appeared in 2004
Designed by Luke Evans, Bo Ilic (Business Objects)
Typing discipline static, strong, inferred
Major implementations Open Quark Framework for Java
Dialects --
Influenced by Haskell, Clean, Java
Influenced --

The Quark Framework (Open Quark) consists of a non-strict functional language and runtime for the Java platform. The framework allows the compilation and evaluation of functional logic on the Java Virtual Machine (JVM), directly or under the control of a regular Java application. The native language (syntax) for the Quark Framework is called CAL. A full range of Java APIs provides the means to invoke the CAL compiler, manipulate workspaces of modules, add metadata and evaluate functional logic. Functional logic is usually dynamically compiled to Java bytecodes and executed directly on the JVM (with some small kernel logic controlling the normal evaluation order). However, an interpreter (G-machine) is also available.

Contents

[edit] Motivation, History and Concept

CAL, and the associated tools and APIs forming the Quark Framework, was conceived in 1998 within Seagate Software (later Crystal Decisions, now Business Objects) as a way to enable a common framework for sharing business logic across business intelligence products. A framework was desired that would allow data behaviour to be captured, at all levels of abstraction, and for these components to be composable into the actual data flows that individual products required. In general terms, such a system had to provide a way to express data behaviour declaratively, with universal composition and typing laws. In essence, this places the problem firmly in the domain of functional languages, and the desire to allow machine compositions of functions without incurring increasing efficiency penalties strongly suggested a non-strict evaluation semantic.

As well as the operational requirements, it was envisaged that future application logic would likely be written for a dynamic platform (such as Java or .Net), and therefore it was determined that the Quark Framework should be native to Java (initially) with considerable emphasis on performance and interoperability with application logic written on that platform. In 1999, work began in Crystal's Research Group on an implementation of this framework. Many of the original insights into lazy functional systems were drawn from implementations of Haskell. Early on, Haskell (HUGS, GHC) was even considered as a starting point for the implementation itself, but a number of requirements made this impractical, so it was decided to let the project emerge and evolve freely following its own design criteria. For the first few years of development, the CAL source language itself was not a primary motivator, but the operational semantics were of primary concern. At this time, CAL was merely a convenient script for expressing functions rather than composing them programmatically through Java APIs, or using a graphical language native to a tool called the Gem Cutter, which began to be implemented in mid-2000 as a way to author systems of functions that could be used in applications. From about 2002 onward, the CAL language became rather more central to the Quark Framework, especially once programmers began to create usable libraries of functions for real applications. As the language evolved, so did the demand for tools, and so a range of tools and utilities were created in parallel to language development to support those doing real work with the platform.

While the Gem Cutter remained the main development environment in the initial years, since late 2005 there has been an intention to produce Eclipse-based tools, and the emphasis has shifted to activities advancing the state of Eclipse integration.

The motivations for the Quark Framework appear to be similar to those driving Microsoft's LINQ project, in particular the desire for a declarative style and some lazy evaluation for certain kinds of logic, hosted within applications coded in an Object Oriented language. While CAL cannot yet be embedded inline within Java source, generated functions are fully compiled and the system can efficiently share data between CAL and Java sourced logic. For instance, CAL lists can be marshalled dynamically to and from Java data structures that implement the Iterator interface.

In 2007, the Quark Framework is an advanced and well tested framework for integrating non-strict functional logic into Java programs. It can be used as a standalone functional language too, that happens to compile to Java bytecodes. The framework was offered as open source under a BSD-style license in January 2007, and continues to be used and developed within Business Objects.

[edit] The CAL Language

The CAL language borrows much from Haskell syntax, but also eschews some Haskell features. As such, CAL is a strongly typed, lazily evaluated functional language, supporting algebraic functions and data types with parametric polymorphism and type inferencing. CAL has special syntax for strings, tuples, characters, numbers, lists and records. Single parameter type classes are supported, with superclasses, derived instances, deriving clauses for common classes, default class methods and higher-kinded type variables. While doubtless a subjective measure, CAL's developers have tried to keep the language simple. In particular, only the expression style is supported (Haskell's equation based style with argument pattern matching is not supported), and CAL does not make use of layout (semicolons are required to terminate definitions). CAL also makes certain syntactic choices to align it more strongly with Java. For instance, Java's syntax for comments is used, and CAL's inline documentation comments are close to JavaDoc.

One of the main differences between Haskell and CAL is in the area of interfacing with the 'real world'. Whereas Haskell goes to great lengths to validate the purity of functions, CAL takes a more pragmatic approach and relies on the programmer to hide 'imported impurity', exposing pure functions from a module where impure imports are made. CAL has a range of mechanisms for controlling evaluation order and laziness. These are often essential tools in the creation of effective solutions with native functions, but are also important in the aforementioned interface with the stateful world. The choice to de-emphasise formal tracking of purity, in favour of practical mechanisms to allow the programmer to express the right logic, has proven to provide a good balance of flexibility and 'directness' when interfacing with external operations.

One of the main design goals for CAL was to make the language as comfortable as possible for mainstream developers to pick up and use effectively. This is reflected in choices for syntax, but also in conventions and patterns used within the standard libraries. For example, libraries use longer, descriptive names and are commented to explain the implementations and best practices for use.

To see some CAL language source code click here. This tutorial CAL module is designed as a top-to-bottom 'feature parade' to showcase basic syntax with examples of some built-in and user defined data structures.

Here are a few examples, derived from the tutorial module linked to in the preceding paragraph:

[edit] 1. Quicksort

/**
 * Here is a simple implementation of the quicksort algorithm for lists in 
 * CAL.
 * 
 * Note: it is not the most efficient implementation, since it filters the 
 * list twice to partition. 
 * 
 * It is used here as an illustration. The production implementation of 
 * sorting on lists is 
 * {@link List.sort@}. 
 * 
 * The type of quicksort is constrained by the {@link Ord@} type class. This
 * means that quicksort can sort list of any orderable type.
 */
quicksort :: Ord a => [a] -> [a];
quicksort list =
    let  
        //partition_min is a local function of 1 argument        
        partition_min pivot = filter (\x -> x < pivot); 
        partition_max pivot = filter (\x -> x >= pivot);
    in
        case list of
        [] -> [];
        pivot : tail ->
            quicksort (partition_min pivot tail)                       
            ++ (pivot : (quicksort (partition_max pivot tail)));
        ;

Note that CAL supports inline documentation comments, with embedded tags. Function type declarations immediately precede the function definition. CAL supports type classes (e.g. Ord). Two local functions are declared in the let block. Quicksort has a recursive definition, building up the output list at each level of recursion from the sorts applied to the list of values either side of the pivot item.

[edit] 2. Data Declarations

In common with other functional languages, CAL provides a way to declare new types using data declarations. Here is an example of declaring a new data type for 'Employee':

/**
 * The Employee data type is an example of a CAL algebraic data type. It 
 * has one data constructor, RegularEmployee. The RegularEmployee data 
 * constructor has 3 fields: name, manager and directReports. 
 * 
 * The firstName and lastName fields have type String. The manager field 
 * has type Maybe Employee. This reflects the fact that an employee may 
 * have one manager or no managers (in the case fo the CEO), and that 
 * manager is also an Employee. directReports has type [Employee] i.e. a 
 * list of 0 or more employees. Note that the manager and directReports 
 * fields have types that recursively refer to the Employee type. In other 
 * words, Employee is a recursive algebraic data type.
 */
data Employee =
      RegularEmployee  
              name           :: String
              //the employeeID field is a potential future addition to the 
              //Employee data type. Notice below that case expressions using 
              //field-name based extraction would not need to be updated due
              //to this change, but case expressions using positional based
              //extraction would need updating.
              //employeeID   :: Int
              manager        :: (Maybe Employee)
              directReports  :: [Employee]

      //This deriving clause provides a default instance implementation for
      //the Show type class for the Employee type. Essentially what this 
      //means is that the Debug.show function will work with Employee 
      //values to display a String representation of the value suitable for
      //debugging purposes.
      deriving Show;

Note that fields (data contructor arguments) must have names in CAL, as well as their type. The deriving clause makes the data type work with functions defined in certain type classes. Here we ensure that CAL automatically adds the requisite logic to make values of Employee be renderable as strings for the purposes of tracing.

Fields can be extracted in a variety of ways:

a) Positionally in a case expression:

/**
 * @arg employee
 * @return the employee's name
 */
employeeName :: Employee -> String;
employeeName employee = 
    case employee of
    //illustrates positionally based case expression extraction of the 
    //"name" field of the RegularEmployee data constructor  
    RegularEmployee empname _ _ -> empname;   
    ;

b) By field name in a case expression:

/**
 * @arg employee
 * @return the employee's name
 */
employeeName :: Employee -> String;
employeeName employee = 
    case employee of
    //illustrates field-name based case expression extraction of the 
    //"name" field of the RegularEmployee data constructor  
    RegularEmployee {name} -> name;   
    ;

This method is obviously more stable with respect to changes in data contructor arguments than the positional version

c) By a selector expression

/**
 * @arg employee
 * @return the employee's name
 */
employeeName :: Employee -> String;
employeeName employee = 
    //illustrates data constructor field selection of the directReports 
    //field  
    employee.RegularEmployee.name;

Note that CAL allows multiple constructor arguments to be cited in a case extractor, along with multiple constructors to match on (so long as they all have the named arguments). So, the following scenario is possible (assuming the data declaration includes the employeeID field and new constructors for Contractor and Associate):

// ...snip...
case employee of
    //illustrates multiple field case expression extraction over multiple 
    //data constructors.  Note the local renaming of the employeeID field
    //to 'id'  
    (RegularEmployee | Contractor | Associate) {name, employeeID = id } -> (name, id);   
    ;
// ...snip...

[edit] 3. Records

CAL unifies tuples and records, which can be used as containers for heterogeneously typed values (as compared to lists, which are sequences of values of the same type). Records are extensible and can be convenient for passing collections of values where the formality of a new data definition is not necessary. Records can have textual or numeric (ordinal) indexed fields. Traditional tuples are simply records with exclusively ordinal fields and tuple contructors (parentheses) simply generate ordinal fields in sequence up from #1.

Here are three examples of records: a tuple (demonstrating its simpler constructor syntax), a record with ordinal fields (fully equivalent to the first tuple) and a record with mixed ordinal and named fields:

/** 
 * A record with 3 fields: field #1 has type String, field #2 has type Maybe
 * Boolean and field #3 has type [Double].
 * It is expressed using tuple notation. 
 */
tupleRecord1 :: (String, Maybe Boolean, [Double]);
tupleRecord1 = ("apple", Just True, [2.0, 3.0, -5]);
    
/** 
 * This record actually has the same value as tupleRecord1, but it includes
 * field names explicitly, and thus uses braces rather than parentheses. 
 * When using explicit field names, the ordering of the fields does not 
 * matter.
 */
tupleRecord2 :: {#1 :: String, #3 :: [Double], #2 :: Maybe Boolean};
tupleRecord2 = {#3 = [2.0, 3.0, -5], #1 = "apple", #2 = Just True};

/** 
 * Here is a record with both textual and ordinal fields.
 */
mixedRecord1 :: {#1 :: Maybe [Int], #2 :: Boolean, age :: Double, name :: String};
mixedRecord1 = {name = "Anton", age = 3.0, #1 = Just [10 :: Int, 20], #2 = False};
 

[edit] CAL on Java

The CAL compiler takes CAL source, as text or as a source model (Java object model). This is processed by the early compiler stages to desugar and analyse the source. The intermediate form, plus metadata from analysis, is processed by a number of optimisers, optionally including a full rewrite optimiser capable of function fusion, deforestation and other major optimisations that preserve semantics but improve a program operationally.

The compiler supports plugable back-ends. First amongst these is LECC (Lazily Evaluating CAL Compiler). This back-end generates Java classes and byte codes directly, emitting methods according to compiler schemes that take account of context metadata derived in the compiler, such as strictness of function arguments. LECC can package generated code in a number of ways, including as a CAL Archive (CAR), or a Java JAR. At runtime, a class loader can load an entire corpus of functions, or the Quark Framework loader can load closely connected functions according to prior dependency analysis. This latter feature is important to minimise start up times, whereby only the functions actually required by an application incur loading overhead.

The LECC back-end can also generate Java source code, which is then compiled by the regular JDK Java compiler to produce class files. Amongst other things, this is very useful when validating compiler schemes during compiler development, and provides a way to reason about the operational behaviour of CAL on the Java platform.

As well as LECC, Open Quark includes a G-machine interpreter and a compiler back-end that generates G-machine code. While considerably slower than LECC, this option is useful for experiments and may be a better fit for some deployments.

While many deployments of CAL may use the language standalone, the Quark Framework is fundamentally designed to be used within regular Java applications to provide a hybrid/multi-paradigm system. The intent is to allow transformational logic, which benefits from more algebraic representation, to be embeddable within Java OO logic handling regular (stateful) aspects of the application. To this end, CAL supports a very powerful and easy to use interface to Java, and the Quark Framework SDK allows Java code considerable control over how functions are evaluated and results produced. Java code can issue new functions to the Quark Framework for compilation, which can be immediately available for evaluation. Thus, Java can use the Quark Framework as a functional meta-programming environment. This has been a common use case for the framework, and is supported efficiently (low latency, with concurrent compilation and evaluation). On the consumption side, results can be presented to Java as lazy suspensions, so that minimal functional evaluation is performed and only when Java logic requests the 'next' output value. This feature allows data-flow logic to be constructed on-the-fly within a Java application, used on-demand and then disposed of if necessary. Both the LECC and G-machine runtimes are able to load and unload functions from memory to support a fully dynamic environment.

The 'foreign' interface between CAL and Java is able to import any Java entity into CAL and make calls on Java methods. Values are passed efficiently between the two environments, without unnecessary boxing/unboxing operations. A powerful feature called "I/O policies" allows values to be lazily marshalled between structures if required (for instance, if you have a particular Java class or data structure that you wish to produce to represent a CAL value). These policies are declared completely on the CAL side, leaving the Java side 'natural'. The default policies are usually quite sufficient to share values, so usually nothing special must be done to exchange values.

Here are some examples of CAL code that declares interfaces to Java entities:

[edit] 1. Importing a Java type, a constructor and a method

The following code imports the "java.util.LinkedList" class (which becomes the CAL 'opaque' data type "JLinkedList"). The fragment then imports the default constructor for this class, and the instance method 'add'. All of these imports are marked private, which means that they would only be usable within the importing CAL module. This is quite common, as its usually good practise to export public functions from modules that behave as pure functions.

data foreign unsafe import jvm "java.util.LinkedList"
    private JLinkedList deriving Inputable, Outputable;

foreign unsafe import jvm "constructor"
    private linkedList_new :: JLinkedList;

foreign unsafe import jvm "method add"
    private linkedList_add :: JLinkedList -> JObject -> Boolean;

[edit] 2. Fields can be imported too

foreign unsafe import jvm "static field java.lang.Float.NEGATIVE_INFINITY" floatNegativeInfinity :: Float; 

This is especially useful for constants, per the example.

[edit] 3. Casts for Java types

Occasionally, it is necessary to cast between Java types that have been imported. For instance, JObject and JList are imported in the Prelude. If you needed to cast between them in CAL, then you could declare the appropriate casting functions. These declarations are properly checked for validity.

foreign unsafe import jvm "cast" castObjectToList :: JObject -> JList;
foreign unsafe import jvm "cast" castListToObject :: JList -> JObject;

Of course, with casting, there's always the potential for runtime class cast exceptions, so it's a good thing that CAL's exception features are fully integrated with Java too, e.g. ...

[edit] 4. Java Exceptions

The following fragment shows a CAL exception type being declared and a function that can demonstrate the throwing in CAL of this exception, or a Java NullPointerException depending on the argument it is passed. Note that CAL exceptions can have any payload. The definition of nullPointerException_make is not shown for brevity.

data MyExceptionType a =
    MyExceptionType 
        message :: String
        someOtherStuff :: a
        list :: [a]
    deriving Show
    ;

instance Exception Double where;
instance Exception a => Exception (MyExceptionType a) where;

//this is an interesting example because the JNullPointerException is not 
//caught by the JThrowable catch clause. exceptions thrown by CAL (with 
//CAL types) must be caught using their CAL type.
calThrownException1 :: Int -> String;
calThrownException1 n =
    (
        if n < 0 then
            throw (MyExceptionType "spinach salad" 10.0 [20, 30, 40])
        else if n > 0 then
            throw (nullPointerException_make "a new NullPointerException")
        else
            "all OK"
           
    )
    `catch`
    (\ex -> "caught MyExceptionType: " ++ show (ex :: MyExceptionType Double))
    `catch`
    (\ex -> "caught Throwable: " ++ show (ex :: JThrowable))
    `catch`
    (\ex -> "caught NullPointerException: " ++ show (ex :: JNullPointerException))
    ;

If you want, you can make any type an Exception. See the following fragment:

instance Exception String where;
instance Exception Int where;
instance Exception Integer where;
instance Exception a => Exception (Maybe a) where;
instance Exception a => Exception [a] where;
instance Exception r => Exception {r} where;

//tests using various Cal types as Exception types, including the interesting
//case of records
calThrownException5 =
    throw ("abc", 1 :: Int, 2 :: Integer, ["abc", "def"], Just (20 :: Int))
    `catch`
    (
        let
            handler :: (String, Int, Integer, [String], Maybe Int) -> String;
            handler r = show (r.#5, r.#4, r.#3, r.#2, r.#1);
        in
            handler
    );

Finally, and just for fun, we show how you can map an exception handler over a list!

handleAnyThrowableWithDefault :: a -> JThrowable -> a;
handleAnyThrowableWithDefault def throwable  = def;

exceptionTest1 :: Boolean;
exceptionTest1 =
    List.map (\x -> x `catch` handleAnyThrowableWithDefault (-999)) 
        [(4 :: Int) / 1, 10 /2, 5 / 0, (1 - 1) / 0, 2 / 2, 5 / (2 -2)]
    == [4, 5, -999, -999, 1, -999]             
    ;

The two arithmetic exceptions (divide by zero) are converted to the default value (-999) by the exception handler.

[edit] External References

Downloads of Open Quark (with and without sources) and collateral are currently hosted at the Open Quark site.

The following documents are available: