

Messages - chqrlie

1
Ideas / Re: Require explicit (un)initialization.
« on: January 30, 2019, 09:06:15 PM »
I don't like mandatory initialization either, but implicit initialization to the zero value of the type is confusing too. The compiler should simply issue an error, not a warning, when it detects that a variable is used before initialization. There are borderline cases where it is difficult, or plain impossible, to determine whether a variable was initialized; requiring explicit initialization in these cases is OK IMHO. This is basically what I get with `-Wall -Werror`.

Regarding a possible syntax to specify uninitialized status, I would suggest:

Code:
int a = void;  // a not initialized, warning if used before later assignment.
This could be used to tell the compiler that a variable is no longer valid and thus prevent its use in further computations. For example:

Code:
free(p);
p = void;
return p;   // error: use of an invalid value.
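Plain C cannot mark a variable as invalid at compile time. The closest common idiom, sketched here with a hypothetical `FREE_AND_CLEAR` macro, overwrites the pointer with NULL after freeing it; a later use then becomes a detectable null pointer at run time, not the compile-time error proposed in this post.

```c
#include <stdlib.h>

/* Hypothetical macro: free the pointer, then overwrite it with NULL so a
   later use can at least be caught at run time. Unlike the proposed
   `p = void;`, this yields no compile-time diagnostic.
   Note that `p` is evaluated twice, so pass a plain variable. */
#define FREE_AND_CLEAR(p) (free(p), (p) = NULL)
```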


2
Ideas / Re: Operation's priorities - more difficulties then benefits
« on: January 24, 2019, 10:08:20 PM »
What is ergonomic about swapping the semantics of < and <<?
Can you cite another language where a < b does not mean a less than b?
This change is a gratuitous departure from common practice, not just from C compatibility... very confusing and not at all ergonomic.

3
Ideas / Re: Operation's priorities - more difficulties then benefits
« on: January 24, 2019, 04:48:37 PM »
You cannot seriously propose to change the meaning of operators such as <, >, << or >>... It would be catastrophic.

Regarding the sometimes counter-intuitive order of precedence of the lesser-used operators, it is indeed confusing and error-prone. Many compilers offer warnings for unparenthesized use, preventing common errors.

Changing the precedence rules is IMHO a bad idea because it is a gratuitous difference from C, causing more confusion when porting code and/or writing in both languages. A simpler proposal is to make the relative precedence of these operators undefined, causing compile-time errors when they are used without parentheses, except for simple arguments and unary operators. This would make some C code incompatible with C2 but would preserve the compatibility of C2 code with C.
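A classic case of the counter-intuitive precedence mentioned above is a bit test written without parentheses: `x & mask == 0` parses as `x & (mask == 0)`, because `==` binds tighter than `&`. The sketch below (function names are illustrative) spells out both readings with explicit parentheses; gcc flags the unparenthesized form with `-Wparentheses`, which `-Wall` enables.

```c
/* What `x & mask == 0` actually means: == binds tighter than &. */
static unsigned parsed_as(unsigned x, unsigned mask) {
    return x & (mask == 0);
}

/* What the programmer almost always intends: no bit of mask is set in x. */
static int intended(unsigned x, unsigned mask) {
    return (x & mask) == 0;
}
```

For x = 2 and mask = 1, `intended` returns 1 while `parsed_as` returns 0, so the silent misparse flips the result of the test.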

4
Ideas / Re: Unnatural behavior in C type cast - (uint64)(int32)
« on: January 21, 2019, 11:36:34 PM »
I agree that the behavior of uint64 var3 = (uint64)(int64)var0; is somewhat unnatural, but converting a negative value to an unsigned type is bound to cause surprising results. Your solution is just as surprising: uint64 var4 = (uint64)(uint32)var0; has the disadvantage that converting the resulting value back to the signed type int64 does not produce the original value, whereas the C semantics do ensure that.

In other words, (int64)var3 == var0 for all values of var0, but (int64)var4 == var0 only holds for non-negative values of var0.
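A small C check of this round-trip argument, assuming var0 is an int32 as in the thread title; the helper names are hypothetical. Converting an out-of-range uint64 back to int64 is implementation-defined in C; gcc and clang define it as reduction modulo 2^64, which is what the argument relies on.

```c
#include <stdint.h>

/* C semantics: sign-extend to 64 bits, then reinterpret as unsigned. */
static uint64_t widen_sign(int32_t var0) { return (uint64_t)(int64_t)var0; }

/* Alternative from the thread: go through uint32, i.e. zero-extend. */
static uint64_t widen_zero(int32_t var0) { return (uint64_t)(uint32_t)var0; }

/* Does converting back to int64 recover the original value? */
static int round_trips(uint64_t u, int32_t var0) { return (int64_t)u == var0; }
```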

I rest my case.

5
Ideas / Re: Unsigned char
« on: February 05, 2016, 04:03:52 PM »
There should be 3 distinct 8-bit types (at least):

* `int8`: an 8-bit signed integer, translated to C as `int8_t`, equivalent to `signed char`.
* `uint8` or `byte`: an 8-bit unsigned integer, translated to `uint8_t` or `unsigned char`.
* `char`: an unsigned 8-bit byte, part of a UTF-8 encoded null-terminated string.
   This latter type translates to C as `char`, and the C compiler should be configured with `-funsigned-char` to treat `char` as unsigned by default.  I have already pointed out the inconsistencies caused by the type `char` being signed by default on many platforms, mostly for historical reasons.

For me, `char` and `uint8` are distinct types.  They have the same representation and arithmetic properties but should be used appropriately depending on context and semantics.  For example, a string literal is a `char[]`, not a `byte[]`, a `uint8[]` or an `int8[]`.  Character constants should probably be of type `char` instead of `int` as they are in C.
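A minimal sketch of how generated C code might map these types, assuming the typedef names used in this post; `char` is left as plain C `char`, whose signedness gcc and clang let you select with `-funsigned-char` or `-fsigned-char`. The hypothetical `plain_char_is_unsigned()` reports which way the compiler was configured.

```c
#include <limits.h>
#include <stdint.h>

typedef int8_t  int8;   /* signed 8-bit integer */
typedef uint8_t uint8;  /* unsigned 8-bit integer, the "byte" type */
/* char: kept as plain C char for UTF-8 string data; its signedness is
   implementation-defined unless the compiler is configured explicitly. */

/* Returns 1 if plain char is unsigned in this build (e.g. -funsigned-char). */
static int plain_char_is_unsigned(void) {
    return CHAR_MIN == 0;
}
```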

Another question is whether to use `int8` and `uint8` or the simpler alternatives `i8` and `u8`, and similarly `i16`, `i32` and `i64`...

6
Ideas / Re: Unsigned char
« on: September 15, 2015, 12:52:07 AM »
In your code snippet, using a special value for c still works the same if you use 255 instead of -1.  Yet it seems a tad sloppy to use a valid char value to indicate some special case: the standard idiom for this is to make c an int and use a value that is not a valid char value.  EOF comes to mind (namely -1 in your example), which is precisely the reason why char should *not* be int8.
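The standard idiom referred to here, in a short sketch (the function name `count_byte` is illustrative): `c` is declared `int` so that every byte value 0..255 stays distinct from `EOF`. With a signed 8-bit `c`, byte 0xFF would typically convert to -1 and, on the usual implementations where `EOF` is -1, end the loop early.

```c
#include <stdio.h>

/* Count how many times byte value `target` (0..255) occurs in stream `f`.
   `c` must be an int: getc() returns 0..UCHAR_MAX or EOF. */
static long count_byte(FILE *f, int target) {
    long n = 0;
    int c;
    while ((c = getc(f)) != EOF) {
        if (c == target)
            n++;
    }
    return n;
}
```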

Regarding the parsing of sizeof(X), restricting the syntax in unexpected ways without a meaningful error or warning message is of course a problem.  It violates the principle of least surprise.  I, for instance, wasted almost an hour trying to compile simple test code, digging through the compiler source code and then into the modified clang parser... not a very good trip.  It is quite common to see sizeof used to get the size of a struct member - sizeof(a.b) - or that of a dereferenced pointer - sizeof(*p) -, so you need to accept an expression in sizeof(X).  I also tried the operator syntax sizeof 'a', which is not ambiguous, but it is not accepted either.

I have never used sizeof('a') in production code, only in test code, where it is actually a simple way to check whether the compiler is invoked in C or C++ mode without using the preprocessor:

Code:
if (sizeof('a') == 1) {
    /* C++ mode */
} else {
    /* C mode */
}



7
Ideas / Re: Unsigned char
« on: September 13, 2015, 02:10:08 PM »
The impact should be very limited.  You can still produce C code with the char type for strings if you pass the appropriate flag to the C compiler such as -funsigned-char for gcc and clang.

Incidentally, character literals have type int in C, i.e. sizeof('a') == sizeof(int).  This is different from C++ where character literals have type char, arguably a more natural choice and definitely needed to select the correct overloaded function in cases such as:
Code:
    cout << 0x41;  // will output 65 on stdout.
    cout << 'A';   // will output A on stdout.

The C2 documentation does not specify whether 'a' has type char or int.  I'm having trouble compiling simple test code using the sizeof operator, and I couldn't find in the C++ source code how you handle sizeof, and more specifically sizeof('a').

8
Implementation Details / Re: BitOffsets
« on: September 08, 2015, 11:37:51 PM »
I see the rationale.  I guess it is logical to follow usage from the hardware community.  The ARM specs indeed use this convention, whereas the Intel docs use a mixture of reg[HIGH-LOW] and reg[HIGH .. LOW].

It bothers me that this convention is still inconsistent with the spirit of C where arrays always use offsets and sizes, not minimum and maximum indices.

We discussed case ranges in the form LOW .. HIGH, e.g.:
Code:
    switch (c) {
      case 'A'..'Z':  upper++; break;
      case 'a'..'z':   lower++; break;
    }
Do you intend to introduce a syntax for array slices similar to case ranges?  It would be very concise for initializing and copying, but the same issue arises: should we use offset:size or min..max (inclusive) or even start...stop (start included, stop excluded)?  All 3 syntaxes, borrowed from other languages, could coexist, but consistency is needed for similar but separate semantics.
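To make the trade-off concrete, here is a small C sketch (all type and function names are hypothetical) that converts the inclusive min..max and half-open start...stop forms to C's native offset:size convention. One practical difference it exposes: an inclusive range cannot describe an empty slice, whereas a half-open one can.

```c
#include <stddef.h>

/* Hypothetical representations of the same run of array elements. */
typedef struct { size_t offset, size; } slice_os;   /* offset:size                 */
typedef struct { size_t min, max; } slice_incl;     /* min..max, both inclusive    */
typedef struct { size_t start, stop; } slice_half;  /* start...stop, stop excluded */

/* Inclusive form: size is max - min + 1, so it is never empty. */
static slice_os from_incl(slice_incl s) {
    slice_os r = { s.min, s.max - s.min + 1 };
    return r;
}

/* Half-open form: empty when start == stop. */
static slice_os from_half(slice_half s) {
    slice_os r = { s.start, s.stop - s.start };
    return r;
}
```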

The syntax for bit slices HIGH .. LOW would then be consistent as a variation of LOW .. HIGH.

9
Implementation Details / Re: BitOffsets
« on: August 30, 2015, 11:40:55 PM »
Your proposal for bitfield manipulation operators is very interesting, but I find the syntax confusing, because it is somewhat inconsistent with the C conventions.

In your example:

    int counter = value[14:10];

the width of the bitfield is not obvious.  As a matter of fact, people unfamiliar with the construct may expect the width to be 4 bits or perhaps 10 bits.  It is inconsistent with both the C indexing paradigm and the standard bitfield syntax where the number of bits appears after the colon.  I would favor this alternative for the same bitfield:

    int counter = value[10:5];  /* extract 5 bits at bit offset 10 from the integer value */

This syntax is more consistent, as you can see in this classic example:

    int red   = rgb[0:5];
    int green = rgb[5:6];
    int blue  = rgb[11:5];

This syntax is the same for storing values as well or even updating them:

    value[10:5] = 43;  /* should produce a diagnostic: 43 does not fit in 5 bits */
    value[10:5] += 1;  /* much more concise than the C alternative, and better code generated too! */

Both the offset and the width could be unsigned integer expressions with some limitations.  We probably do not want to integrate this construct generically into the type system and create pointers or references to bitfields, fixed or variable, but the above constructs can be implemented simply and do simplify the programmer's life.
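Until such syntax exists, the offset:width semantics proposed above can be sketched in C with two hypothetical helpers. This version assumes a 32-bit container, 0 < width < 32 and offset + width <= 32; unlike the proposal, it silently truncates an out-of-range value instead of producing a diagnostic.

```c
#include <stdint.h>

/* value[offset:width] as an rvalue: extract `width` bits starting at `offset`. */
static uint32_t bits_get(uint32_t value, unsigned offset, unsigned width) {
    uint32_t mask = (UINT32_C(1) << width) - 1;
    return (value >> offset) & mask;
}

/* value[offset:width] = field: returns the updated value. Bits of `field`
   beyond `width` are discarded (no diagnostic, unlike the proposal). */
static uint32_t bits_set(uint32_t value, unsigned offset, unsigned width,
                         uint32_t field) {
    uint32_t mask = ((UINT32_C(1) << width) - 1) << offset;
    return (value & ~mask) | ((field << offset) & mask);
}
```

The `value[10:5] += 1;` case then becomes `v = bits_set(v, 10, 5, bits_get(v, 10, 5) + 1);`, which shows how much the proposed syntax would shorten.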

Finally, I have written a lot of database code dealing with bitmaps and wished for a good syntax for arrays of 1, 2 or 4 bit values.  The same syntax could be used for this purpose on integer pointer types.  For non-trivial bit widths, it is very cumbersome to write the code by hand, and even more complicated to choose the best underlying type to take advantage of target-specific opcodes.  Definitely something you want the compiler to take care of.
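A hand-written version of such a packed array shows the bookkeeping the compiler would take over. This hypothetical sketch handles element widths of 1, 2 or 4 bits stored in bytes, with element 0 in the least significant bits; because these widths divide 8, no element ever straddles a byte boundary, which would no longer hold for widths such as 3 or 5 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessors for an array of `bits`-wide unsigned values packed
   into bytes. Assumes bits is 1, 2 or 4. */
static unsigned packed_get(const uint8_t *a, size_t i, unsigned bits) {
    size_t bit = i * bits;                 /* absolute bit position */
    unsigned mask = (1u << bits) - 1;
    return (a[bit / 8] >> (bit % 8)) & mask;
}

static void packed_set(uint8_t *a, size_t i, unsigned bits, unsigned v) {
    size_t bit = i * bits;
    unsigned mask = ((1u << bits) - 1) << (bit % 8);  /* field within its byte */
    a[bit / 8] = (uint8_t)((a[bit / 8] & ~mask) | ((v << (bit % 8)) & mask));
}
```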

10
Ideas / Re: Unsigned char
« on: August 30, 2015, 10:56:06 PM »
Of course int8*, uint8* and int* are all different types, and coercing one into another should require an explicit cast.

With the current proposal, char being an alias to uint8, char* and uint8* are really the same type.  This comes as no surprise to C programmers used to the typedef semantics.  Yet it would be nice to have a way to prevent this in some cases, while preserving the scalar nature of these types.

My point boils down to 2 separate questions:
- should type aliases convert transparently, as they do in C?
- should pointers to type aliases be distinguishable or not?  e.g.: should we require a cast to pass a char* to a function expecting a uint8*?  They have the same sizes and value sets, but are different semantic beasts: converting a char to a string would yield a different representation from converting a uint8 with the same value.  You do not want overloading in C2, so this example is contrived, but you get the idea.
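For reference, this is how the two behaviors look in today's C, in a sketch with hypothetical type names: a typedef is just another name and converts transparently, while the usual way to get a distinct type, a single-member struct wrapper, gives pointer incompatibility at the cost of exactly the scalar nature mentioned above.

```c
#include <stdint.h>

typedef uint8_t byte_alias;                 /* typedef: a new name, not a new type */
typedef struct { uint8_t value; } byte_box; /* struct wrapper: a distinct type     */

/* Accepts any uint8_t*, since byte_alias and uint8_t are the same type. */
static unsigned sum_alias(const byte_alias *p, int n) {
    unsigned s = 0;
    for (int i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Only accepts byte_box*: passing a uint8_t* here is an incompatible pointer
   type, which compilers diagnose. */
static unsigned sum_box(const byte_box *p, int n) {
    unsigned s = 0;
    for (int i = 0; i < n; i++)
        s += p[i].value;   /* explicit unwrapping: no longer a plain scalar */
    return s;
}
```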

11
Ideas / Re: Enum classes
« on: August 30, 2015, 10:37:14 PM »
You both have valid points.  I don't have a definite position on this issue (aside from the syntax preferences).  I am a pragmatic programmer: I'd rather build an opinion from first hand experience with a real life project. Hopefully before the end of this year!

12
Ideas / Re: Enum classes
« on: August 30, 2015, 11:28:02 AM »
This is probably just a matter of taste, but the syntax with the :: class or namespace specifier does not appeal to me at all.

It also seems overkill to make enum constants separate by default.  It is an unnecessary departure from C.  The module namespace already isolates the names from other modules.

Whether a single level of module namespacing is sufficient is another matter.  I'm considering porting qemacs to C2 as a full-scale sample project, and it would make sense to have separate modules for the various functional parts of this moderate-size project (50K SLOC).  Yet all these modules should be submodules of a more general qemacs "package".

In any case, please avoid the double colon ( :: ) and the angle brackets (< >), the C++/Java look and feel is not appealing.

13
Ideas / Unsigned char
« on: August 30, 2015, 12:27:27 AM »
Hello everyone,

I'm very pleased with Bas's initiative to simplify and extend the C language, fixing some of its shortcomings while keeping its spirit.

Choosing as the base types the fixed bit size signed and unsigned integers and IEEE floating point types is quite pragmatic.  Interestingly, it does not prevent users from defining some of the standard C type names such as float, double, int, short and long as aliases.

Regarding the char type, I believe it should be an alias for uint8 instead of int8 as proposed currently. 

I completely agree that char should be an 8 bit type, but making it signed is a big source of problems.

Most current C compilers offer a switch to select between signed and unsigned char for the plain char type, but sadly default to signed char on many platforms.  This choice is probably linked to compatibility with historic code, but it is inconsistent with the rest of the C specification:

getc(), for instance, returns an int with a value set of 0..UCHAR_MAX plus EOF: comparing the return value of getc() to a char variable or even a character literal will fail for non-ASCII characters and might even mistakenly match EOF.

The macros defined in <ctype.h> also take an int with the same set of values: passing a char to these macros is incorrect for non-ASCII values.  The glibc implementation performs dirty tricks to prevent erratic behaviour by using tables with 384 entries where 257 would suffice.
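The usual workaround in C code today is to cast to unsigned char before every <ctype.h> call, as in this sketch (the function name is illustrative). It works, but the cast is easy to forget, which is the point of making char unsigned in the first place.

```c
#include <ctype.h>
#include <stddef.h>

/* Count upper-case letters in a NUL-terminated string. The cast to
   unsigned char is required: where plain char is signed, bytes >= 0x80
   would otherwise be passed as negative values, which is undefined
   behavior for the <ctype.h> functions (EOF is the only negative
   argument they accept). */
static size_t count_upper(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```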

Making char unsigned seems the only consistent choice for Unicode too: code points are positive as well, in the range 0..0x10FFFF.

Conversely, I fail to see any advantage in making char signed by default.  Historic code is not an issue, and programmers should only use the char type for actual character strings, and uint8 or int8 for variables and arrays of bytes.  int8 and uint8 are standard types in C2, so there is no reason to default to char for this.

Lastly, I think char*, int8* and uint8* should be incompatible pointer types.  I'm not sure if that's the case with the current specification.

14
Ideas / Re: Switch statement
« on: August 29, 2015, 11:50:08 PM »
Following the principle of least surprise, DerSaidin's proposal makes a lot of sense.

The question is do we really want a new keyword such as fallthrough or fallthru?  As an alternative to a new keyword, we could use the goto keyword with new semantics:

    goto next;      /* fall thru to the code below */
    goto default;   /* to branch to the default case */
    goto case 'a';  /* to branch to the handler for 'a' */

Regarding combining multiple cases, it would be preferable to allow multiple case statements to follow one another without any intervening statement, as this is the common idiom in C; it is widely used and poses no real problem.  We could also extend the syntax for the case clause, but with operators that do not clash with current use: commas could be used to enumerate values, and a range operator could be introduced to specify inclusive value ranges (.. seems an obvious choice for this).
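For comparison, this is how the two idioms discussed above look in today's C, in a sketch with illustrative function names: stacked case labels sharing one handler, and a deliberate fall-through marked only by a comment. C has no `goto case`; gcc's `-Wimplicit-fallthrough` accepts a `/* fall through */` comment as the marker, and C23 added a `[[fallthrough]]` attribute for the same purpose.

```c
/* Stacked labels: several values share one handler. */
static int is_vowel(int c) {
    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 1;
    default:
        return 0;
    }
}

/* Deliberate fall-through: each level also grants the rights of the levels
   below it. Bits: 4 = admin, 2 = write, 1 = read. */
static unsigned rights_for_level(int level) {
    unsigned r = 0;
    switch (level) {
    case 3:
        r |= 4;
        /* fall through */
    case 2:
        r |= 2;
        /* fall through */
    case 1:
        r |= 1;
        break;
    default:
        break;
    }
    return r;
}
```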
