Post

C Compiler

A collaborative project with Satyam Jay

View code

Overview

This project involves the development of a C compiler that is capable of parsing C code, building an Abstract Syntax Tree (AST), and emitting LLVM Intermediate Representation (IR). The compiler supports basic C constructs and is designed to be extended with more complex features and optimizations.

Development Phases

Part 1: Building AST

  • Objective: Build an AST using the provided Bison grammar specification.
  • Implementation:
    • Modified the c.y file to support function definitions and expressions.
    • Implemented a dump_ast() function to visualize the AST structure.
    • Ensured the AST could handle basic examples like
    1
    2
    3
    4
    5
    6
    7
    8
    9
    
    void empty() { }
    
    void simple_arith() {
      (10 - 10/3) << 3 | (23+8*12) & 1024;
    }
    
    void simple_arith_with_arg(int d) {
      (d > d/2) || (d >= 100) && (d < 99);
    }
    

Part 2: Supporting More Complex Constructs

  • Objective: Enhance the compiler to support additional C constructs necessary for compiling more complex programs.
  • Implementation:
    • Added support for variable declarations, control flow statements (if, while, return, goto).
    • Implemented a symbol table for tracking scopes and declarations.
    • Ensured the compiler could parse and build AST for
    1
    2
    3
    4
    5
    6
    7
    8
    
    int printf(char const *format, ...);
    
    int
    main(int argc, char **argv)
    {
    printf("hello, world\n");
    return 0;
    }
    

Part 3: Emitting LLVM IR

  • Objective: Generate LLVM IR from the AST.
  • Implementation:
    • Integrated LLVM libraries to convert the AST directly into LLVM IR.
    • Ensured the generated LLVM IR could be executed using the LLVM interpreter (lli).

Part 4: Local Optimizations

  • Objective: Implement local optimizations in the LLVM IR generation phase.
  • Implementation:
    • Added optimizations such as constant folding, dead code removal, and trivial constant propagation.
    • Example optimization:
      1. Algebraic Simplification. Adding 0, Multiplying 0/1...
      2. DeadCode Removal. if(1) then remove else. if(0) then remove if. while(0) remove while.
      3. Constant Propagation. Only supported propagation in the same block. x=1; y=x+1; => y=2

Testing

Tested on:

  1. CMake v3.16.3
  2. Boost v1.71.0
  3. llvm-19 / llvm-17
Build Instructions:
  1. mkdir build && cd build
  2. cmake ..
  3. make

Compile from source to LLVM IR

  1. ./cc --in input.c Run with default options no optimization, input file is “input.c”
  2. ./cc --in input.c --all-opt Enable all optimizations, input file is “input.c”
  3. ./cc --help to see all options

Run LLVM IR:

  1. lli-17 input.c.ll
This post is licensed under CC BY 4.0 by the author.
Bookmarks
Notes & Highlights

Comments powered by Disqus.