Author Topic: Why do we need Clang and LLVM? (Read 90633 times)

lerno · « **on:** October 21, 2018, 12:28:43 AM »

LLVM and Clang are huge codebases that take forever to compile whenever you want to play around with C2. That is bad.
LLVM offers excellent optimizations. That is good.

Here is an idea:

- For debug builds, initially use TinyCC (Tiny C Compiler)
- For optimized builds, use LLVM / Clang (and possibly later LLVM IR directly)

TinyCC is... tiny compared to LLVM. It's also very, very fast. TinyCC is also built to be fed C code directly.
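
A rough sketch of that split, assuming the compiler emits C as an intermediate step and that `tcc` and `clang` are both on PATH. `BuildMode` and `backendCommand` are hypothetical names, not part of c2c. Only the common `-g`, `-c`, `-o` and `-O2` flags are used.

```cpp
#include <string>

// Hypothetical build modes for a compiler that generates C first.
enum class BuildMode { Debug, Optimized };

// Returns the shell command that compiles one generated C file to an object.
// Tool names and flag choices are illustrative assumptions.
std::string backendCommand(BuildMode mode, const std::string& cFile,
                           const std::string& objFile) {
    if (mode == BuildMode::Debug) {
        // Fast turnaround: TinyCC compiles quickly but optimizes very little.
        return "tcc -g -c " + cFile + " -o " + objFile;
    }
    // Slower build, but LLVM's optimizer is applied.
    return "clang -O2 -c " + cFile + " -o " + objFile;
}
```

The point of the sketch is that only the last step differs between the two modes, so everything up to C generation could be shared.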

lerno · « **Reply #1 on:** October 24, 2018, 12:11:36 PM »

On my older laptop, compiling LLVM takes about 12 hours. When I tried the c2-docker version, it died halfway through, after running for a few hours, due to lack of virtual memory. LLVM is a resource hog.

bas · « **Reply #2 on:** October 24, 2018, 02:38:38 PM »

(see other post about LLVM/Clang).

In the beginning I was hoping to be able to reuse Clang's AST. That was not the case. Still,
just using the Tokenizer and Diagnostics engine already saved a lot of time. Additionally, most of
C2's parsing is heavily based on/inspired by Clang's parser.

lerno · « **Reply #3 on:** October 24, 2018, 04:46:30 PM »

My experience is that lexing is actually the most straightforward part of writing a parser?

lerno · « **Reply #4 on:** October 24, 2018, 11:06:00 PM »

One thing would be to localize the DiagnosticsEngine to part of the code and call it through a wrapper. That way it's straightforward to replace it.

Can you explain what you use Clang's preprocessing for? Are we parsing any standard C, so that we need it as input?

lerno · « **Reply #5 on:** October 25, 2018, 11:51:44 PM »

For fun I wrapped all the calls to DiagnosticsEngine, aside from the config in C2Builder::build.

The only calls outside of C2Builder would be starting a new file, Report, hasErrorOccurred + DiagnosticErrorTrap.
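
A minimal sketch of what such a wrapper could look like. The class names, method names and message format here are hypothetical and do not reflect the actual c2c code or Clang's API. The sink stands in for whatever engine sits behind the wrapper.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical wrapper: the rest of the compiler talks only to this class,
// so the engine behind it can be replaced without touching call sites.
class Diagnostics {
public:
    // The sink receives each formatted message. In c2c it could forward to
    // Clang's DiagnosticsEngine; here it is any callable.
    using Sink = std::function<void(const std::string&)>;

    explicit Diagnostics(Sink sink) : sink_(std::move(sink)) {}

    // Counterpart of "starting a new file".
    void beginFile(const std::string& name) { file_ = name; }

    // Counterpart of Report, restricted to errors for brevity.
    void error(unsigned line, const std::string& msg) {
        ++errors_;
        sink_(file_ + ":" + std::to_string(line) + ": error: " + msg);
    }

    bool hasErrorOccurred() const { return errors_ > 0; }
    unsigned errorCount() const { return errors_; }

private:
    Sink sink_;
    std::string file_;
    unsigned errors_ = 0;
};

// Rough counterpart of DiagnosticErrorTrap: remembers the error count at
// construction and reports whether new errors have arrived since then.
class ErrorTrap {
public:
    explicit ErrorTrap(const Diagnostics& d)
        : diags_(d), start_(d.errorCount()) {}
    bool hasErrorOccurred() const { return diags_.errorCount() > start_; }

private:
    const Diagnostics& diags_;
    unsigned start_;
};
```

With the four operations behind one small interface, replacing the engine would mean rewriting only the sink, not the parser or analyser.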

However, it's also passed into the Preprocessor and the SourceManager. From what I understand, neither is actually required, in the sense that C2 has neither a complex library search nor any preprocessing (or at most trivial preprocessing).

If we consider dogfooding C2 eventually, then this is a task that has to be done sooner or later. A self-hosting c2c might rely on LLVM IR and optimization, but not on Clang for the frontend (unless it's actually used for C parsing, which it isn't).

lerno · « **Reply #6 on:** October 25, 2018, 11:53:20 PM »

BTW, at some point a document outlining the coding standard for the C++ part is needed. It's occasionally inconsistent, and because the indentation style differs from what I'm used to I need some sort of reference to work with in order to use auto formatting.

lerno · « **Reply #7 on:** October 29, 2018, 09:36:23 PM »

Now that I have a better overview of the code I feel it's even more urgent that the Lexer is swapped for a custom one:

- The lexer parses a lot of stuff that C2 shouldn't, which makes the code rather unclear in places.
- The lexer has a firm idea about how to lex certain tokens, like "@", which might complicate parsing.
- We have full C pre-processing. I don't think this is a good thing.
- Keywords need to be defined in the Clang source.
- Code requires reading up on a lot of Clang code to see how things work, rather than just the c2compiler code.
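
To give an idea of the scale involved, here is a minimal hand-written lexer sketch. It is not the c2c lexer: the token kinds, the `lex` function and the keyword subset are hypothetical. It only illustrates keeping the keyword table next to the lexer and handling "@" and "%" without digraphs or preprocessor hooks.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical token kinds; a real lexer would have many more.
enum class Tok { Identifier, Keyword, Number, Percent, PercentEqual, At, Unknown };

struct Token {
    Tok kind;
    std::string text;
};

// Keywords live next to the lexer. The list is a small illustrative subset.
static const std::unordered_set<std::string> kKeywords = {
    "module", "import", "func", "if", "else", "return"};

std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    auto isWordChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: consume the whole word, then classify.
            std::size_t start = i;
            while (i < src.size() && isWordChar(src[i])) ++i;
            std::string word = src.substr(start, i - start);
            out.push_back({kKeywords.count(word) ? Tok::Keyword : Tok::Identifier, word});
        } else if (std::isdigit(c)) {
            // Decimal integer literal only, for brevity.
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({Tok::Number, src.substr(start, i - start)});
        } else if (c == '%') {
            // No digraphs and no preprocessor, so '%' has only two outcomes.
            if (i + 1 < src.size() && src[i + 1] == '=') {
                out.push_back({Tok::PercentEqual, "%="});
                i += 2;
            } else {
                out.push_back({Tok::Percent, "%"});
                ++i;
            }
        } else if (c == '@') {
            // The language decides what '@' means, not the lexer library.
            out.push_back({Tok::At, "@"});
            ++i;
        } else {
            out.push_back({Tok::Unknown, std::string(1, src[i])});
            ++i;
        }
    }
    return out;
}
```

Source locations, comments, strings and the rest of the operators are left out, so this is a lower bound on the work rather than an estimate.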

lerno · « **Reply #8 on:** November 01, 2018, 02:09:03 AM »

Status update:

1. Relevant parts of Clang lifted out and verified to work standalone.
2. Now working on cutting away the huge amounts of unnecessary code needed for various versions of C, C++ and different extensions that don't make any sense for C2.

lerno · « **Reply #9 on:** November 01, 2018, 02:29:36 PM »

Interestingly, about 90% of the Preprocessor's complexity comes from resolving modules, and macros from modules, correctly. So this is why macro resolution should come AFTER parsing and not as a pre-processor.

lerno · « **Reply #10 on:** November 01, 2018, 02:52:43 PM »

Code in clang's lexer:

    Char = getCharAndSize(CurPtr, SizeTmp);
    if (Char == '=') {
      Kind = tok::percentequal;
      CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
    } else if (LangOpts.Digraphs && Char == '>') {
      Kind = tok::r_brace;                             // '%>' -> '}'
      CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
    } else if (LangOpts.Digraphs && Char == ':') {
      CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
      Char = getCharAndSize(CurPtr, SizeTmp);
      if (Char == '%' && getCharAndSize(CurPtr+SizeTmp, SizeTmp2) == ':') {
        Kind = tok::hashhash;                          // '%:%:' -> '##'
        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
                             SizeTmp2, Result);
      } else {                                         // '%:' -> '#'
        // We parsed a # character.  If this occurs at the start of the line,
        // it's actually the start of a preprocessing directive.  Callback to
        // the preprocessor to handle it.
        // TODO: -fpreprocessed mode??
        if (TokAtPhysicalStartOfLine && !LexingRawMode && !Is_PragmaLexer)
          goto HandleDirective;

        Kind = tok::hash;
      }
    } else {
      Kind = tok::percent;
    }

Indentation is not a sufficient way to make C's if-statements readable.

Makes you want to do something like this:

if-stmt
: "if" expr "->" stmt
| "if" expr "{" stmt_list "}" else-clause
;

else-clause
: <empty>
| "else" "{" stmt_list "}"
| "elseif" expr "{" stmt_list "}" else-clause
;
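
The grammar above is small enough to check with a recursive-descent recognizer. The sketch below is only an illustration under simplifying assumptions that are not from the thread: input is already split into tokens, an expr is exactly one token, a stmt is either a nested if-stmt or one opaque token, and stmt_list may be empty. `IfParser` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Recursive-descent recognizer for the if-stmt / else-clause grammar,
// one member function per grammar rule.
class IfParser {
public:
    explicit IfParser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // True if the whole token stream is exactly one if-stmt.
    bool parse() { return ifStmt() && pos_ == toks_.size(); }

private:
    static bool reserved(const std::string& t) {
        return t == "if" || t == "else" || t == "elseif" ||
               t == "->" || t == "{" || t == "}";
    }
    bool peek(const std::string& t) const {
        return pos_ < toks_.size() && toks_[pos_] == t;
    }
    bool accept(const std::string& t) {
        if (!peek(t)) return false;
        ++pos_;
        return true;
    }
    // One opaque, non-reserved token (stands in for an expr or simple stmt).
    bool atom() {
        if (pos_ < toks_.size() && !reserved(toks_[pos_])) { ++pos_; return true; }
        return false;
    }
    bool expr() { return atom(); }
    bool stmt() { return peek("if") ? ifStmt() : atom(); }
    // Zero or more statements, up to the closing brace.
    bool stmtList() {
        while (pos_ < toks_.size() && !peek("}"))
            if (!stmt()) return false;
        return true;
    }
    bool block() { return accept("{") && stmtList() && accept("}"); }
    // if-stmt: "if" expr "->" stmt | "if" expr "{" stmt_list "}" else-clause
    bool ifStmt() {
        if (!accept("if") || !expr()) return false;
        if (accept("->")) return stmt();
        return block() && elseClause();
    }
    // else-clause: <empty> | "else" block | "elseif" expr block else-clause
    bool elseClause() {
        if (accept("else")) return block();
        if (accept("elseif")) return expr() && block() && elseClause();
        return true;
    }

    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};
```

Because braces are mandatory everywhere except the single-statement "->" form, each rule can be decided by looking at one token, and an "else" can never attach to the wrong "if".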

bas · « **Reply #11 on:** November 07, 2018, 10:37:37 AM »

Yes, lexical macros (like C's) allow everything, but cause so many issues in tooling, etc. Once C2 has a
replacement macro system, we can simplify the Lexer a lot.
