Unsigned char

chqrlie:
Hello everyone,

I'm very pleased with Bas's initiative to simplify and extend the C language, fixing some of its shortcomings while keeping its spirit.

Choosing the fixed-bit-size signed and unsigned integers and the IEEE floating-point types as the base types is quite pragmatic. Interestingly, it does not prevent users from defining some of the standard C type names such as float, double, int, short and long as aliases.

Regarding the char type, I believe it should be an alias for uint8 instead of int8 as currently proposed.

I completely agree that char should be an 8-bit type, but making it signed is a big source of problems.

Most current C compilers offer a switch to select between signed and unsigned for the naked char type, but sadly default to signed char. This choice is probably motivated by compatibility with historic code, but it is inconsistent with the rest of the C specification:

getc(), for instance, returns an int with a value set of 0..UCHAR_MAX plus EOF: comparing the return value of getc() to a char variable or even a character literal will fail for non-ASCII characters and might even mistakenly match EOF.
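Here is a minimal sketch of the pitfall in today's C (the function names are just for illustration):

--- Code: ---#include <stdio.h>

/* buggy: where char is signed, a 0xFF byte read from the stream
   becomes -1 and compares equal to EOF, ending the loop early */
void copy_buggy(FILE *in, FILE *out) {
    char c;
    while ((c = getc(in)) != EOF)
        putc(c, out);
}

/* correct: int can hold every value in 0..UCHAR_MAX plus EOF */
void copy_fixed(FILE *in, FILE *out) {
    int c;
    while ((c = getc(in)) != EOF)
        putc(c, out);
}

--- End code ---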

The macros defined in <ctype.h> also take an int with the same set of values: passing a char to these macros is incorrect for non-ASCII values. The glibc implementation performs dirty tricks to prevent erratic behaviour, using tables with 384 entries where 257 would suffice.
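The portable idiom this forces on C programmers is a cast through unsigned char; a small sketch:

--- Code: ---#include <ctype.h>
#include <stdio.h>

int main(void) {
    char c = '\xE9';   /* 'é' in Latin-1: negative when char is signed */

    /* isupper(c) would receive a negative value other than EOF,
       which is undefined behaviour; convert to unsigned char first */
    if (isupper((unsigned char)c))
        printf("upper case\n");
    else
        printf("not upper case\n");
    return 0;
}

--- End code ---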

Making char unsigned seems the only consistent choice for Unicode too: code points are non-negative as well, in the range 0..0x10FFFF.
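A small illustration of what signed char does to UTF-8 data in today's C:

--- Code: ---#include <stdio.h>

int main(void) {
    const char *s = "\xC3\xA9";   /* UTF-8 encoding of U+00E9, 'é' */

    /* with signed char, every byte above 0x7F shows up as negative */
    for (const char *p = s; *p != '\0'; p++)
        printf("%d\n", *p);   /* prints -61 and -87 instead of 195 and 169 */
    return 0;
}

--- End code ---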

Conversely, I fail to see any advantage in making char signed by default. Historic code is not an issue, and programmers should only use the char type for actual character strings, and uint8 or int8 for variables and arrays of bytes. Since int8 and uint8 are standard types in C2, char need not be the default choice for raw bytes.

Lastly, I think char*, int8* and uint8* should be incompatible pointer types.  I'm not sure if that's the case with the current specification.

bas:

Good arguments! Can't argue with that ;)
I'll change the char to uint8.

Currently char is equal to int8 (and will become equal to uint8), so char* and uint8* will then be completely equal (just different syntax). Do you mean you want to make char* and uint8* incompatible, or char* and int8*? (after changing char to unsigned)

DerSaidin:
I agree with char being uint8.

Imo:

* uint8* and int* should be incompatible
* uint8* and char* could be incompatible, but it doesn't matter much

Should they be compatible just because they happen to be the same size? I think there is a reasonable argument for separating a number from a character.

I think it doesn't matter too much because explicit casting is still available if that is really the intention.

chqrlie:
Of course int8*, uint8* and int* are all different types, and coercing one into another should require an explicit cast.

With the current proposal, char being an alias to uint8, char* and uint8* are really the same type.  This comes as no surprise to C programmers used to the typedef semantics.  Yet it would be nice to have a way to prevent this in some cases, while preserving the scalar nature of these types.
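For reference, this is the C typedef behaviour I'm alluding to; an alias is not a distinct type (the byte alias and fill function are made up for the example):

--- Code: ---#include <string.h>

typedef unsigned char byte;   /* an alias, not a new type */

static void fill(unsigned char *p, size_t n, unsigned char v) {
    memset(p, v, n);
}

int main(void) {
    byte buf[4];
    fill(buf, sizeof buf, 0);   /* accepted without a cast: byte* and
                                   unsigned char* are the same type */
    return 0;
}

--- End code ---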

My point boils down to two separate questions:
- should type aliases convert transparently, as they do in C?
- should pointers to type aliases be distinguishable or not? E.g., should we require a cast to pass a char* to a function expecting a uint8*? They have the same sizes and value sets, but are different semantic beasts: converting a char to a string would yield a different representation from the conversion of a uint8 with the same value (see the sketch below). You do not want to have overloading in C2, so this example is contrived, but you get the idea.
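A contrived sketch of the point, written in today's C with uint8_t standing in for uint8:

--- Code: ---#include <stdio.h>
#include <stdint.h>

int main(void) {
    char c = 'A';
    uint8_t b = 'A';    /* same size, same bits, same value (65) */

    printf("%c\n", c);             /* character semantics: prints A */
    printf("%u\n", (unsigned)b);   /* numeric semantics:   prints 65 */
    return 0;
}

--- End code ---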

bas:
I'm currently implementing the proposed change (char == uint8). The next interesting question I ran into is this: the following piece of code should still work, right?

--- Code: ---const char* text = "Hello World";
char c = 'a';

--- End code ---

So changing *char* would imply that the type of string literals also changes to uint8, and that a character literal ('a') also has type uint8...
I don't think the impact is very big, but I'm asking here to avoid overlooking something important...
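For comparison, this is how standard C answers the same question today (the C2 part of the comment is just my assumption):

--- Code: ---#include <stdio.h>

int main(void) {
    /* in standard C, a character literal has type int and a string
       literal has type char[N]; after the change, C2 would presumably
       use uint8 and uint8[N] instead */
    printf("%zu\n", sizeof 'a');      /* sizeof(int), e.g. 4 */
    printf("%zu\n", sizeof "Hello");  /* 6: five chars plus '\0' */
    return 0;
}

--- End code ---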
