The front end of the L compiler is implemented entirely in L apart from small parts of the runtime library (which are in assembly and C/C++). The back end is LLVM
Front end
The front end is concerned with preparing the program for compilation by converting the source program text into parse tree form and building various tables. The front end passes are always executed for the entire program source including libraries and run in a single thread.Front end passes:
- Lexical analysis
The source program is scanned and each file is read as a stream of tokens which are read by the parser. The lexical analyser is hand rolled and operates with one character lookahead except when differentiating between inequalities and generic brackets, when an additional short buffer of tokens is also used to look ahead for a closing generic bracket.
- Parsing
Each source file is parsed by one of two Jay generated parsers (C-style syntax or alternative syntax) into a parse tree representing that file. The nodes of the parse tree are objects whose classes correspond to the language constructs that their nodes represent. From this point on, translation of the source program into intermediate code for a given construct is handled largely by its parse tree class.
- Declare Global Symbols
All namespaces, classes, interfaces, structures and enumerations are entered into the symbol table. Templates are entered into the symbol table but not instantiated.
- Instantiate Templates
All references to templates are scanned and the set of required template instances is determined. Each unique template instance is created with actual types substituted for formal arguments. As some types inside a template instance are not known before it is created, creating an instance may require further templates to be instantiated. As a result this pass must be repeated until no more template instances are created
- Declare Class Symbols
All symbols declared at class scope are entered into the symbol table. Once this pass is finished all types are completely defined and their internal layout can be determined
- Build Tables
Various internal tables are built including:
- Determine the in memory layout of instance variables for each class
- Virtual call dispatch tables are built and never-overridden methods are marked as not requiring dynamic method call sequence and as candidates for inlining
- Interface call dispatch tables are built using graph coloring to reduce table size
- Declare Remaining Symbols
All remaining symbols are entered into the symbol table. When this pass finishes the symbol table is complete and the type, size and offset of all symbols is known
Back end
The back end performs the actual work of translating the source program into machine executable form. The back end is executed only for those classes that actually require compilation because:- No cached object code is available
- Source file has changed since cached object code was compiled
-
Change in another class forces a recompile
For example, a superclass or implemented interface has changed making the dispatch tables in the cached object file invalid or a called method has been overridden and static calls must be replaced with dynamic ones
Dependancies:
- Jay, to build the parsers. Jay normally outputs Java but L has a similar enough syntax that Jay can be used with a handful of small modifications.
- Boehm GC, for garbage collection.
- LLVM, for code generation
- GlibC, mainly as an interface to Linux system calls but also to support the C portion of the runtime library.
- GNU Assembler, to translate assembly output into object files
- GNU Linker, to link object files into executables
- GCC, to driver the linker. LLVM currently uses GCC to run the linker and has C main() as the runtime library entry point. This ensures that GLibC and any other C or C++ library code is correctly initialized before control enters an L program.
- Linux.