Author Topic: C2 in C  (Read 90354 times)

lerno

C2 in C
« on: January 23, 2019, 01:12:38 AM »
I'm implementing a version of the C2 parser in C right now. There's not all that much to see yet. It parses all of C2 as far as I know (though there are probably quite a few bugs) and mostly follows C2C's architecture. Reading recipe files, some other things, and the semantic analysis have only just been started.

However, the idea is that the parser is prepared from the start for some compile-time evaluation. For example, this would be completely valid and resolved at compile time:

Code: [Select]
Foo foo[sizeof(i32) > 0 ? 1 : 2];

I'm intending to have everything constant folded by default to open up for some lightweight compile time resolution. I suspect it will help with future semantic macros and generics.
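
To make the idea concrete, here is a minimal sketch in C of folding the array-length expression above. The node kinds, the Expr struct and the fold function are hypothetical names made up for illustration and are not taken from the actual compiler; sizeof(i32) is simply assumed to have been resolved to 4 already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expression node kinds; not taken from the actual compiler. */
typedef enum { EXPR_CONST, EXPR_SIZEOF, EXPR_GT, EXPR_TERNARY } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int64_t value;                 /* EXPR_CONST: the value; EXPR_SIZEOF: resolved size in bytes */
    const struct Expr *a, *b, *c;  /* operands (condition, then-branch, else-branch for ternary) */
} Expr;

/* Fold an expression to a constant. Returns false if it cannot be folded. */
static bool fold(const Expr *e, int64_t *out)
{
    int64_t x, y;
    switch (e->kind) {
    case EXPR_CONST:
    case EXPR_SIZEOF:
        *out = e->value;
        return true;
    case EXPR_GT:
        if (!fold(e->a, &x) || !fold(e->b, &y)) return false;
        *out = x > y;
        return true;
    case EXPR_TERNARY:
        /* Only the selected branch is folded. */
        if (!fold(e->a, &x)) return false;
        return fold(x ? e->b : e->c, out);
    }
    return false;
}

/* Builds and folds: sizeof(i32) > 0 ? 1 : 2 */
static int64_t example_array_length(void)
{
    Expr size = { EXPR_SIZEOF, 4, NULL, NULL, NULL };
    Expr zero = { EXPR_CONST, 0, NULL, NULL, NULL };
    Expr one  = { EXPR_CONST, 1, NULL, NULL, NULL };
    Expr two  = { EXPR_CONST, 2, NULL, NULL, NULL };
    Expr gt   = { EXPR_GT, 0, &size, &zero, NULL };
    Expr tern = { EXPR_TERNARY, 0, &gt, &one, &two };
    int64_t len = 0;
    bool ok = fold(&tern, &len);
    assert(ok);
    (void)ok;
    return len;
}
```

With these assumptions the array length folds to 1, so the declaration would be treated as Foo foo[1].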

That said it's far from being done.

bas

Re: C2 in C
« Reply #1 on: January 29, 2019, 10:18:12 AM »
The difference between C and C2 is that you need multiple passes to analyze C2. So sizeof(Foo) cannot be resolved until you know what Foo is. This means the parser (with Sema) has to create the AST first, and the Analyzer then resolves it iteratively. That's what's happening currently. A pass is very lightweight if it only needs to look at a subset (for example the imports), since we are iterating over a data structure and not actually 'parsing' it again.

So constant folding is just a certain pass that can happen if other required passes have been done.
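
As a rough illustration of that iterative approach, the C sketch below resolves type sizes over an already-built data structure, repeating a cheap pass until nothing new can be resolved. TypeDecl, resolve_pass and resolve_all are hypothetical names, not C2C's actual Analyzer API, and the size computation ignores padding and alignment to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FIELDS 4

/* Hypothetical type declaration as it might sit in the AST after parsing. */
typedef struct TypeDecl {
    const char *name;
    size_t size;                          /* valid only once resolved is true */
    bool resolved;
    struct TypeDecl *fields[MAX_FIELDS];  /* member types */
    size_t field_count;
} TypeDecl;

/* One pass: resolve every type whose members are all resolved already.
 * Returns how many types were newly resolved. */
static size_t resolve_pass(TypeDecl **types, size_t count)
{
    size_t progress = 0;
    for (size_t i = 0; i < count; i++) {
        TypeDecl *t = types[i];
        if (t->resolved) continue;
        assert(t->field_count <= MAX_FIELDS);
        size_t total = 0;
        bool ready = true;
        for (size_t f = 0; f < t->field_count; f++) {
            if (!t->fields[f]->resolved) { ready = false; break; }
            total += t->fields[f]->size;  /* simplification: no padding */
        }
        if (!ready) continue;
        t->size = total;
        t->resolved = true;
        progress++;
    }
    return progress;
}

/* Repeat passes until one makes no progress.
 * Returns true if every type ended up resolved (false e.g. for cycles). */
static bool resolve_all(TypeDecl **types, size_t count)
{
    while (resolve_pass(types, count) > 0) { }
    for (size_t i = 0; i < count; i++)
        if (!types[i]->resolved) return false;
    return true;
}
```

A later constant-folding pass could then treat sizeof(Foo) as a plain constant, because it only runs once resolve_all has succeeded.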

lerno

Re: C2 in C
« Reply #2 on: February 01, 2019, 10:37:40 PM »
I know, but the parser is sometimes hardcoded to identify types (or sometimes even just built-in types). This should be relaxed in order to allow more compile-time evaluation, which in turn opens the way for semantic macros.

lerno

Re: C2 in C
« Reply #3 on: February 12, 2019, 01:17:55 AM »
My pure C implementation of C2 is moving forward slowly (at https://github.com/lerno/titanos - look at the development branch). It's using LLVM exclusively (no C gen).

I'm focusing on the areas where C2C is weak today, which means both LLVM and constant folding.

The compiler uses BigInt for integer constants (and really should do the same for floats, but right now it doesn't). I recently worked on the implicit casts and in order to make sense out of it I've changed a bit from C's implicit casting, mostly following Zig.

To recap C's casting rules:

  • With two operands where at least one is a float, upgrade everything to the biggest float type (float -> double -> long double).
  • If no float is found, integer conversion applies.
  • If both types are signed or both types are unsigned: promote to the larger of the two (e.g. short -> int -> long etc.; unsigned short -> unsigned int -> unsigned long etc.).
  • If they differ in sign and the signed type can represent all values of the unsigned type (e.g. u16 - i32), promote to the signed type.
  • If that is not possible, promote to the unsigned version of the type (e.g. u32 - i32 => u32, u64 - i32 => u64).
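
These rules can be observed directly in C11 with _Generic, which reports the type the compiler picks for an expression. The TYPE_NAME macro and the check function below are illustrative helpers, not part of any library. The sketch also shows one step the recap leaves implicit: operands narrower than int are promoted to int before anything else happens. The unsigned short case assumes int is wider than short, which holds on common platforms.

```c
#include <assert.h>
#include <string.h>

/* Illustrative helper: name of the type of an expression (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x),                   \
    int: "int", unsigned int: "unsigned int",        \
    long: "long", unsigned long: "unsigned long",    \
    long long: "long long",                          \
    unsigned long long: "unsigned long long",        \
    float: "float", double: "double",                \
    long double: "long double",                      \
    default: "other")

static void check_usual_arithmetic_conversions(void)
{
    /* Any float operand wins; the larger float type is used. */
    assert(strcmp(TYPE_NAME(1.0f + 1), "float") == 0);
    assert(strcmp(TYPE_NAME(1.0f + 1.0), "double") == 0);

    /* Types narrower than int are promoted to int first. */
    assert(strcmp(TYPE_NAME((short)1 + (short)1), "int") == 0);

    /* Mixed sign, signed type can hold every unsigned value: signed wins.
     * Assumes int is wider than unsigned short. */
    assert(strcmp(TYPE_NAME((unsigned short)1 + 1), "int") == 0);

    /* Mixed sign, same width: unsigned wins. */
    assert(strcmp(TYPE_NAME(1u + 1), "unsigned int") == 0);

    /* Mixed sign, unsigned is wider: unsigned wins. */
    assert(strcmp(TYPE_NAME(1ull + 1), "unsigned long long") == 0);
}
```

The "unsigned wins" cases are the ones that silently turn a negative value into a large positive one, which is what the stricter folding rules try to catch.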

For constant folding I've followed these rules instead:

  • Float conversion as with C
  • Folding an operation with bigint and bit limited int will convert the bigint to that bit size. If it's not possible to convert the constant, a compile time error will occur. E.g. const i8 c = 1; const i32 d = c + 200; is illegal since 200 cannot be converted to i8 without loss.
  • If two non bigints are folded, it's converted to the biggest type (like in C), for signed / unsigned conversion: if the unsigned constant may be contained in the signed type, then the conversion is valid, otherwise it is a compile time error. (e.g. cast<u8>(200) + cast<i8>(1) is a compile time error)
  • In addition, constant overflow is a compile time error. So cast<u8>(200) + cast<u8>(100) would be a compile time error as 300 does not fit in u8.
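
The integer rules above could be sketched in C roughly as follows. This is only one reading of the rules, not the code from the compiler: int64_t stands in for the real BigInt, only addition is handled, types are limited to 32 bits, and the names IntType, Const, bigint, typed, fits and fold_add are all invented for the example. A false return value stands for a compile time error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit-limited integer type. */
typedef struct { unsigned bits; bool is_signed; } IntType;

/* A constant; typed == false means an untyped "bigint" literal.
 * int64_t is only a stand-in for a real BigInt. */
typedef struct { int64_t value; bool typed; IntType type; } Const;

static Const bigint(int64_t v)
{
    Const c = { v, false, { 0, false } };
    return c;
}

static Const typed(int64_t v, unsigned bits, bool is_signed)
{
    Const c = { v, true, { bits, is_signed } };
    return c;
}

/* Can v be represented in type t without loss? */
static bool fits(int64_t v, IntType t)
{
    assert(t.bits >= 1 && t.bits <= 32);
    if (t.is_signed) {
        int64_t max = ((int64_t)1 << (t.bits - 1)) - 1;
        return v >= -max - 1 && v <= max;
    }
    return v >= 0 && v <= ((int64_t)1 << t.bits) - 1;
}

/* Fold a + b. Returns false where a compile time error would be reported. */
static bool fold_add(Const a, Const b, Const *out)
{
    Const r = { 0, a.typed || b.typed, { 0, false } };
    if (a.typed && b.typed) {
        /* Biggest type; with mixed signs the signed type is the target. */
        r.type.bits = a.type.bits > b.type.bits ? a.type.bits : b.type.bits;
        r.type.is_signed = a.type.is_signed || b.type.is_signed;
    } else if (r.typed) {
        /* A bigint takes on the type of the bit-limited operand. */
        r.type = a.typed ? a.type : b.type;
    }
    /* Both operands must convert to the result type without loss. */
    if (r.typed && (!fits(a.value, r.type) || !fits(b.value, r.type)))
        return false;
    /* Guard the int64_t stand-in itself; a real BigInt would not need this. */
    if ((b.value > 0 && a.value > INT64_MAX - b.value) ||
        (b.value < 0 && a.value < INT64_MIN - b.value))
        return false;
    r.value = a.value + b.value;
    /* Constant overflow is an error as well. */
    if (r.typed && !fits(r.value, r.type)) return false;
    *out = r;
    return true;
}
```

Under this reading, fold_add(typed(1, 8, true), bigint(200), &out) fails just like the c + 200 example, and so do the u8/i8 and u8/u8 examples from the list.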

To make unsigned <=> signed conversions possible, I'm thinking of some very simple lossy conversion, like

Code: [Select]
i8 a = -1;
u8 b = @ucast(a); // Bitcast of a
a = @scast(b); // Bitcast of b
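
For comparison, a plain C approximation of these two casts for 8-bit values could look like this. The names ucast8 and scast8 are made up here; memcpy is used so that the bit pattern is copied unchanged without relying on implementation-defined conversions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(int8_t) == sizeof(uint8_t), "bitcast needs equal sizes");

/* Hypothetical C counterpart of @ucast: reinterpret the bits as unsigned. */
static uint8_t ucast8(int8_t v)
{
    uint8_t out;
    memcpy(&out, &v, sizeof out);  /* copy the bit pattern unchanged */
    return out;
}

/* Hypothetical C counterpart of @scast: reinterpret the bits as signed. */
static int8_t scast8(uint8_t v)
{
    int8_t out;
    memcpy(&out, &v, sizeof out);
    return out;
}
```

Since int8_t is required to be two's complement, ucast8(-1) gives 255 and scast8(255) gives -1, so a round trip returns the original value.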

Maybe even add a special assignment like (placeholder syntax!):

Code: [Select]
b u= a; // same as b = @ucast(a)
if (b u== a) ...; // same as if (b == @ucast(a)) ...
a s= b; // same as a = @scast(b)
if (b s== a) ...; // same as if (@scast(b) == a) ...
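
To show what such comparisons would change compared to C, here is a small sketch with hypothetical helper functions. In plain C both 8-bit operands are promoted to int before the comparison, so two values with the same bit pattern but different signedness compare as unequal; the u== and s== forms would compare the reinterpreted values instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain C comparison: both operands are promoted to int first. */
static bool eq_promoted(uint8_t b, int8_t a)
{
    return b == a;
}

/* Sketch of `b u== a`: reinterpret a as unsigned, then compare. */
static bool eq_ucast(uint8_t b, int8_t a)
{
    return b == (uint8_t)a;  /* conversion to unsigned wraps modulo 256 */
}

/* Sketch of `b s== a`: reinterpret b as signed, then compare. */
static bool eq_scast(uint8_t b, int8_t a)
{
    /* Map 128..255 to -128..-1 explicitly to avoid an
     * implementation-defined narrowing conversion. */
    int8_t sb = b >= 128 ? (int8_t)(b - 256) : (int8_t)b;
    return sb == a;
}

static void demo_comparisons(void)
{
    uint8_t b = 200;
    int8_t a = -56;  /* same bit pattern as 200 (0xC8) */
    assert(!eq_promoted(b, a));  /* 200 == -56 is false after promotion */
    assert(eq_ucast(b, a));
    assert(eq_scast(b, a));
}
```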