L programs consist of a set of files (with '.l' file extension) each containing a number of classes. The compiler walks all imported files and builds an intermediate representation of the source code. From this intermediate representation plus an on disk cache of previously compiled code the compiler determines which classes need to be compiled and then proceeds, through futher intermediate stages, to compile those classes to machine code.
The front end of the L compiler is implemented entirely in L apart from small parts of the runtime library (which are in assembly and C/C++). The back end is LLVM

Front end

The front end is concerned with preparing the program for compilation by converting the source program text into parse tree form and building various tables. The front end passes are always executed for the entire program source including libraries and run in a single thread.

Front end passes:

  • Lexical analysis

    The source program is scanned and each file is read as a stream of tokens which are read by the parser. The lexical analyser is hand rolled and operates with one character lookahead except when differentiating between inequalities and generic brackets, when an additional short buffer of tokens is also used to look ahead for a closing generic bracket.

  • Parsing

    Each source file is parsed by one of two Jay generated parsers (C-style syntax or alternative syntax) into a parse tree representing that file. The nodes of the parse tree are objects whose classes correspond to the language constructs that their nodes represent. From this point on, translation of the source program into intermediate code for a given construct is handled largely by its parse tree class.

  • Declare Global Symbols

    All namespaces, classes, interfaces, structures and enumerations are entered into the symbol table. Templates are entered into the symbol table but not instantiated.

  • Instantiate Templates

    All references to templates are scanned and the set of required template instances is determined. Each unique template instance is created with actual types substituted for formal arguments. As some types inside a template instance are not known before it is created, creating an instance may require further templates to be instantiated. As a result this pass must be repeated until no more template instances are created

  • Declare Class Symbols

    All symbols declared at class scope are entered into the symbol table. Once this pass is finished all types are completely defined and their internal layout can be determined

  • Build Tables

    Various internal tables are built including:

    • Determine the in memory layout of instance variables for each class
    • Virtual call dispatch tables are built and never-overridden methods are marked as not requiring dynamic method call sequence and as candidates for inlining
    • Interface call dispatch tables are built using graph coloring to reduce table size

  • Declare Remaining Symbols

    All remaining symbols are entered into the symbol table. When this pass finishes the symbol table is complete and the type, size and offset of all symbols is known

Back end

The back end performs the actual work of translating the source program into machine executable form. The back end is executed only for those classes that actually require compilation because:
  • No cached object code is available
  • Source file has changed since cached object code was compiled
  • Change in another class forces a recompile

    For example, a superclass or implemented interface has changed making the dispatch tables in the cached object file invalid or a called method has been overridden and static calls must be replaced with dynamic ones

Dependancies:

  • Jay, to build the parsers. Jay normally outputs Java but L has a similar enough syntax that Jay can be used with a handful of small modifications.
  • Boehm GC, for garbage collection.
  • LLVM, for code generation
  • GlibC, mainly as an interface to Linux system calls but also to support the C portion of the runtime library.
  • GNU Assembler, to translate assembly output into object files
  • GNU Linker, to link object files into executables
  • GCC, to driver the linker. LLVM currently uses GCC to run the linker and has C main() as the runtime library entry point. This ensures that GLibC and any other C or C++ library code is correctly initialized before control enters an L program.
  • Linux.
I first bootstrapped the L compiler by writing a crude version written in C++ that compiled a subset of the L language and then hand translating this C++ version into L. After the initial bootstrap I re-wrote the back end and since then I've been incrementally adding language features to the front end. Adding features or altering the language requires maintaining backwards compatibility until new features stabilise as older versions of the compiler must be used to compile each new version. The compiler is now entirely in L with the exception of the parser generator, which is a slightly modified version of jay. The original C++ version is now discarded as it lacks various language features that the L compiler has come to depend on (templates, operator overloading etc.)
site design and content copyright (C) jeek 1995-2010     [ 13132 ]