C2 forum

General Category => Ideas => Topic started by: kyle on January 12, 2015, 08:19:45 PM

Title: control endian-ness of data and bit structs
Post by: kyle on January 12, 2015, 08:19:45 PM
I was thinking again about Bas' addition of bit fields in such a nice way into the system.  I have some code that implements a fairly ugly protocol (industrial control systems) and there are few bit fields, but the endian-ness of the bytes in the words is very important. 

I have lots of little inline functions (type safe, and compilers generally do smart things with them) that convert values back and forth between host and target endianness.

Given that I am looking at this kind of code a lot right now, I thought about things that would make my life a lot easier for low level hardware and protocol support.


C is pretty good at handling bytes.  With packed structs, you can do a lot, but you still have to handle endian problems manually.  What about being able to declare the byte order directly?

foo: int32 little_endian;
bar: uint16 big_endian;
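
For comparison, here is roughly what those two declarations cost in plain C today. This is only a sketch: the helper names (load_le32, store_le32, load_be16, store_be16) are made up for illustration, and they work byte by byte so the result does not depend on the host's byte order.

```c
#include <stdint.h>

/* Hypothetical helpers, not from any library: fixed-endian access done
   byte by byte, so the host byte order never matters. */

/* foo: int32 little_endian */
static inline int32_t load_le32(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    return (int32_t)v;  /* assumes the usual two's complement conversion */
}

static inline void store_le32(uint8_t *p, int32_t x)
{
    uint32_t v = (uint32_t)x;
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* bar: uint16 big_endian */
static inline uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static inline void store_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}
```

Every field of every message needs a call like this at each read and write, which is the bookkeeping the keywords would take over.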

Many hardware systems are controlled by bit fields and those fields may be big or little endian.  Some (there are engineers who probably need to rethink some of their designs) have both :-(

How about a "bitstruct"?

bitstruct [160] {
    control: integer [0..30];
    clear_interrupts: integer [31..31];
    command: integer [32..37];
    ugly_field: integer [38..42,49..53];
    ...
}

The array-like statement [160] defines the number of bits.  Each field within the bitstruct specifies which bit positions it occupies.
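
To make the bit ranges concrete, here is one way a field such as "control: integer [0..30]" could be read in plain C. This is a sketch under assumptions the post does not settle: bit 0 is taken to be the least significant bit of byte 0, fields are at most 32 bits wide, and get_bits is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

#define BITSTRUCT_BITS 160u  /* size taken from the example: bitstruct [160] */

/* Hypothetical helper: return bits lo..hi (inclusive) of buf as an integer.
   Assumes bit n is stored in byte n / 8 at position n % 8 (LSB first). */
static uint32_t get_bits(const uint8_t *buf, unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi < BITSTRUCT_BITS && hi - lo < 32);
    uint32_t value = 0;
    for (unsigned i = lo; i <= hi; i++) {
        uint32_t bit = (buf[i / 8] >> (i % 8)) & 1u;
        value |= bit << (i - lo);   /* bit lo becomes bit 0 of the result */
    }
    return value;
}
```

With that helper, control would be get_bits(regs, 0, 30), clear_interrupts get_bits(regs, 31, 31) and command get_bits(regs, 32, 37), where regs points at the 20 bytes backing the bitstruct.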

It is common enough, though thankfully not that common, for some fields to be split into two or more sections.  I showed that with the "ugly_field" example.  I have seen cases where the most significant bits are before the least significant bits.  In that case we'd have something like:

ugly_field: integer [49..53, 38..42];

Ugly!
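
Here is a sketch of what gathering such a split field takes in C today. Assumptions, none of which the post pins down: bit n of the buffer is bit n % 8 of byte n / 8, the first listed range holds the least significant bits (which is what the two spellings of ugly_field seem to imply), and the names bit_range and get_split_field are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one inclusive bit range, e.g. 38..42. */
struct bit_range { unsigned lo, hi; };

/* Gather a field that is stored in several ranges. Ranges are taken in
   listed order, least significant first; total width must be <= 32 bits. */
static uint32_t get_split_field(const uint8_t *buf,
                                const struct bit_range *ranges, size_t count)
{
    uint32_t value = 0;
    unsigned out = 0;  /* next bit position to fill in the result */
    for (size_t r = 0; r < count; r++) {
        for (unsigned i = ranges[r].lo; i <= ranges[r].hi; i++, out++) {
            uint32_t bit = (buf[i / 8] >> (i % 8)) & 1u;
            value |= bit << out;
        }
    }
    return value;
}

/* ugly_field: integer [38..42,49..53]; */
static const struct bit_range ugly_low_first[]  = { {38, 42}, {49, 53} };
/* ugly_field: integer [49..53, 38..42]; */
static const struct bit_range ugly_high_first[] = { {49, 53}, {38, 42} };
```

Swapping the order of the ranges is all it takes to flip which half is most significant, which is exactly the kind of detail that is easy to get wrong when it is buried in shifts and masks.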

I am not sure how you would want the little_endian and big_endian keywords to work here.  Maybe you would specify it with the bit ranges like in ugly_field?

I just thought I would throw that out there.

Best,
Kyle
Title: Re: control endian-ness of data and bit structs
Post by: bas on January 14, 2015, 08:23:40 AM
There is a subtle issue here.
In C/C2 you can use bitfields to pack multiple small fields into a word or so. In C this looks roughly like this:
Code:
struct Foo {
  int a : 3;
  int b : 10;
  unsigned int c : 6;
};
Where the compiler decides to store these values is left to the compiler; the standard does not fix the layout.
So you can (should  ;)) never use this to overlay a hardware register to extract a field.

The other thing is C2's bitoffsets, which are designed to address bit fields in, for example, a hardware register.
C2 still has bitfields as well, since they have a valid use.

The second thing is that endianness has nothing to do with bitoffsets: bits 5-7 are just bits 5-7, nothing more,
nothing less. Shifting a word 5 positions to the right, for example, is done in hardware; in software you can just do
Code:
value2 = (value >> 5);

In my opinion, bitfields and bitoffsets cover the bit stuff. Bit features that span multiple words have no
valid use case.
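
A shift on its own keeps everything from bit 5 upward, so isolating bits 5-7 also needs a mask. A minimal C sketch of the shift-and-mask idiom follows; the helper name bits is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: extract bits lo..hi (inclusive, hi < 32) of a
   32-bit word. Shifts act on the value, not on its byte layout in memory,
   so the result is the same on big and little endian hosts. */
static inline uint32_t bits(uint32_t value, unsigned lo, unsigned hi)
{
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu
                                  : (((uint32_t)1 << width) - 1u);
    return (value >> lo) & mask;
}
```

So bits(value, 5, 7) returns a number in the range 0..7 whatever the byte order of the machine.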

I do like the idea of adding endianness features to the language, since many C projects create their own
htonl(), htons(), ntohl() (host-to-network-long, etc.) macros. Maybe we could use some intrinsics here that
allow for better code generation on some platforms...
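
As a sketch of what such projects typically hand-roll, here are a byte swap and a host-to-big-endian conversion in portable C. The names swap32 and host_to_be32 are made up; an intrinsic would replace them. Optimizing compilers can often turn the swap32 pattern into a single byte-swap instruction where the CPU has one, but that is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a byte-swap intrinsic: reverse the four bytes. */
static inline uint32_t swap32(uint32_t v)
{
    return  (v >> 24)
         | ((v >>  8) & 0x0000FF00u)
         | ((v <<  8) & 0x00FF0000u)
         |  (v << 24);
}

/* Hypothetical equivalent of htonl(): lay the bytes out most significant
   first, then reinterpret them as a uint32_t. No host-endianness test is
   needed; on a big endian host this is a no-op. */
static inline uint32_t host_to_be32(uint32_t v)
{
    uint8_t bytes[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16),
        (uint8_t)(v >> 8),  (uint8_t)v
    };
    uint32_t out;
    memcpy(&out, bytes, sizeof out);
    return out;
}
```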
Title: Re: control endian-ness of data and bit structs
Post by: kyle on January 15, 2015, 01:31:31 AM
Hi Bas,

Yes, with bit-fields the standard does not specify where they go.  In fact, it specifies that it is platform (and implementation, I think) dependent.

That is what I want to avoid.  Right now, in C, you have to play games with the specific compiler you have for the specific platform you target.  The idea of the bitstruct (I'm sure someone can come up with a better name) is to allow the programmer to specify exactly how bits are laid out in memory. 

The endian extra keywords are very handy, but only for specific integral types (I am not sure if there is a consistent definition for floating point).  I was just getting confused with putting both of them together.

There are three things that make it difficult to easily express hardware interfaces in C:

I think it is harder to look at code that pulls out bit-fields and determine that it is right.  You always have to think about the platform.  If you could avoid that, the code would be simpler and its correctness easier to check.  C2's bit ranges/bitoffsets get very close to this, but it is still code, not data.

Many CPUs have specific instructions to get bit ranges out of machine words.  Some even have instructions to save data with different endian-ness.

It is not just hardware interfaces that could use this.  There are machine languages (hello, x86-64) where the register bits are split across the instruction.  For instance, the REX prefix byte contains three bits (R, X and B) that each supply the most significant bit of a register field, allowing access to 16 registers instead of 8.  But the lower bits are in the ModRM byte (or the SIB byte), which comes later in the instruction.
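
To make the split concrete: the REX prefix is laid out as 0100WRXB and the ModRM byte as mod (2 bits), reg (3 bits), rm (3 bits). REX.R extends ModRM.reg and REX.B extends ModRM.rm. A rough C sketch of putting a register number back together follows; the function names are made up, and SIB and memory operands are ignored.

```c
#include <stdint.h>

/* Register number from ModRM.reg, extended by REX.R (bit 2 of REX). */
static inline unsigned decode_reg(uint8_t rex, uint8_t modrm)
{
    unsigned high = (rex >> 2) & 1u;     /* REX.R: most significant bit */
    unsigned low  = (modrm >> 3) & 7u;   /* ModRM.reg: low three bits */
    return (high << 3) | low;
}

/* Register number from ModRM.rm, extended by REX.B (bit 0 of REX).
   Only meaningful for register-direct operands (mod == 3) in this sketch. */
static inline unsigned decode_rm(uint8_t rex, uint8_t modrm)
{
    unsigned high = rex & 1u;            /* REX.B: most significant bit */
    unsigned low  = modrm & 7u;          /* ModRM.rm: low three bits */
    return (high << 3) | low;
}
```

For the byte sequence 4C 8B C8 (mov r9, rax), decode_reg(0x4C, 0xC8) gives register 9 and decode_rm(0x4C, 0xC8) gives register 0. A bitstruct-style declaration could state that split once, as data, instead of in shifts scattered through a decoder.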

Many languages use tagged pointers/values such as Lua (LuaJIT uses doubles and does some odd things with the mantissa bits IIRC), Smalltalk etc.  It would be easier to write VMs for those languages if you had data types that expressed exactly what you were doing.  Then there is JavaScript...  Ugh.  The main VMs do use some tagged pointers or values.
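
For anyone who has not seen the trick, here is a minimal sketch of one common tagging scheme in C: the low bit of a machine word says whether the rest is a small integer or a pointer. This is a generic illustration, not how Lua, LuaJIT or any particular VM does it. The names are made up, and the code assumes pointers are at least 2-byte aligned and that integer conversions behave as two's complement.

```c
#include <stdint.h>

/* Hypothetical tagged value: low bit 1 = small integer kept in the upper
   bits; low bit 0 = an (at least 2-byte aligned) pointer. */
typedef uintptr_t value_t;

static inline value_t make_int(intptr_t i)
{
    return ((uintptr_t)i << 1) | 1u;    /* shift in the tag bit */
}

static inline int is_int(value_t v)
{
    return (int)(v & 1u);
}

static inline intptr_t int_of(value_t v)
{
    return ((intptr_t)v - 1) / 2;       /* drop the tag, undo the shift */
}

static inline value_t make_ptr(void *p)
{
    return (uintptr_t)p;                /* aligned pointers have tag bit 0 */
}

static inline void *ptr_of(value_t v)
{
    return (void *)v;
}
```

All of the layout knowledge lives in those shifts and masks; a data type that said "bit 0 is the tag, the remaining bits are the payload" would put it in one declaration.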

I will go see if there are any languages which allow specification down to the bit level.  Probably BitC did.

Best,
Kyle