Topic: Tagged unions

lerno

Tagged unions
« on: November 07, 2018, 01:14:15 AM »
On GitHub, Rust's enums were mentioned, along with the observation that they are essentially tagged unions.

Several languages that offer unions also have tagged unions "out of the box".

Borrowing from Cyclone a bit, consider this:

Code:
tagged union Foo {
  int i;
  const char *c;
};

Foo foo;
foo.i = 3;
@istag(foo.i) // => true
@istag(foo.c) // => false
foo.c = "hello";
@istag(foo.i) // => false
@istag(foo.c) // => true

switch(@tag(foo)) {
  case Foo.i: printf("Was %d\n", foo.i); break;
  case Foo.c: printf("Was %s\n", foo.c); break;
}

The code is a variant of what Cyclone and Rust do.

Note that I use the @-prefix for compile-time keywords. If we decide against an @-prefix, then tag and istag would need to be reserved keywords.

lerno

Re: Tagged unions
« Reply #1 on: November 07, 2018, 01:35:24 AM »
Note that a tagged union is syntactic sugar for something like:

Code:
type enum Foo__tag : i8 {
  i, c
};

type struct Foo {
  Foo__tag tag;
  union {
    int i;
    const char *c;
  };
};

@istag(foo.i) compiles to (foo.tag == Foo__tag.i), @tag(foo) compiles to foo.tag, and finally Foo.i is an alias for Foo__tag.i.
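For comparison, here is a minimal C sketch of that lowering. It assumes C11 (for the anonymous union), spells Foo__tag.i as Foo__tag_i because C has no scoped enum syntax, and uses hypothetical names (FOO_ISTAG, FOO_TAG, foo_set_i, foo_set_c, foo_describe) that are not part of the proposal. The setters stand in for the implicit tag update on `foo.i = 3`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* The tag enum the compiler would generate implicitly (Foo__tag above). */
typedef enum { Foo__tag_i, Foo__tag_c } Foo__tag;

typedef struct {
    int8_t tag;            /* stored as i8, matching `: i8` in the post */
    union {                /* anonymous union: requires C11 */
        int i;
        const char *c;
    };
} Foo;

/* @istag(foo.field) lowers to a plain comparison on the tag. */
#define FOO_ISTAG(f, field) ((f).tag == Foo__tag_##field)
/* @tag(foo) lowers to reading the tag field. */
#define FOO_TAG(f) ((Foo__tag)(f).tag)

/* Hypothetical setters: each write also updates the tag, like foo.i = 3. */
static void foo_set_i(Foo *f, int v)         { f->tag = Foo__tag_i; f->i = v; }
static void foo_set_c(Foo *f, const char *v) { f->tag = Foo__tag_c; f->c = v; }

/* The switch(@tag(foo)) example, formatting into a buffer instead of stdout. */
static int foo_describe(const Foo *f, char *buf, size_t n) {
    switch (FOO_TAG(*f)) {
    case Foo__tag_i: return snprintf(buf, n, "Was %d", f->i);
    case Foo__tag_c: return snprintf(buf, n, "Was %s", f->c);
    }
    assert(0 && "invalid tag");
    return -1;
}
```

Everything here is ordinary code a programmer could write by hand; the proposal only removes the need to write it.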

bas

Re: Tagged unions
« Reply #2 on: November 12, 2018, 10:10:16 AM »
Ah, that makes it clear what tagged unions are. I would prefer to keep this out of the language, as it generates 'underwater' (hidden) code.

lerno

Re: Tagged unions
« Reply #3 on: November 13, 2018, 03:14:00 AM »
But not having it creates a lot of boilerplate. It's a feature of many earlier languages: Pascal, Modula-2 and Ada, for example, all had them.

I really do understand the need to avoid unnecessary features, but boilerplate should be removed as well. Look at the Stmt classes: those are tagged unions, and there is a (boilerplate) method that checks what type a statement is (is it type X?) and, if so, allows it to be cast to that particular type.
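To make the boilerplate concrete, here is a sketch of that check-then-cast pattern written in plain C. The names (StmtKind, ReturnStmt, BreakStmt, isReturnStmt, asReturnStmt, returnValue) are hypothetical and the actual Stmt classes are structured differently; the point is that every new statement kind needs another pair of "is it X?" and "cast to X" helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds; the real Stmt hierarchy differs. */
typedef enum { STMT_RETURN, STMT_BREAK } StmtKind;

/* Common header carrying the tag. */
typedef struct { StmtKind kind; } Stmt;

typedef struct {
    Stmt base;     /* must be first, so a Stmt* can be converted back */
    int value;
} ReturnStmt;

typedef struct { Stmt base; } BreakStmt;

/* The boilerplate: one type check and one checked cast per kind. */
static int isReturnStmt(const Stmt *s) { return s->kind == STMT_RETURN; }
static ReturnStmt *asReturnStmt(Stmt *s) {
    return isReturnStmt(s) ? (ReturnStmt *)s : NULL;
}

static int isBreakStmt(const Stmt *s) { return s->kind == STMT_BREAK; }
static BreakStmt *asBreakStmt(Stmt *s) {
    return isBreakStmt(s) ? (BreakStmt *)s : NULL;
}

/* Typical caller: check, cast, then use the variant's fields. */
static int returnValue(Stmt *s) {
    ReturnStmt *r = asReturnStmt(s);
    assert(r != NULL && "not a return statement");
    return r->value;
}
```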

Look at the many examples here: https://en.wikipedia.org/wiki/Tagged_union

If you look at Pascal, the discriminator tag is very explicit:

Code:
type shapeKind = (square, rectangle, circle);
     shape = record
                centerx : integer;
                centery : integer;
                case kind : shapeKind of
                   square : (side : integer);
                   rectangle : (length, height : integer);
                   circle : (radius : integer);
      end;

Something similar could be a minimal C2 extension:

Code:
type enum ShapeKind {
  SQUARE,
  CIRCLE,
  RECTANGLE
}

type Shape struct {
  i32 centerX;
  i32 centerY;
  ShapeKind kind;
  union (kind) {
    case SQUARE:
      i32 side;
    case RECTANGLE:
      i32 length, height;
    case CIRCLE:
      i32 radius;
  }
}
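A rough C equivalent of this Shape proposal, assuming C11 anonymous structs and unions, is below. The function shape_bbox_area (area of the axis-aligned bounding box) is a hypothetical example chosen only because it dispatches on every variant; the `switch` on `kind` is what a `union (kind)` match would lower to.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SQUARE, CIRCLE, RECTANGLE } ShapeKind;

typedef struct {
    int32_t centerX;
    int32_t centerY;
    ShapeKind kind;                           /* discriminator: union (kind) */
    union {
        struct { int32_t side; };             /* case SQUARE */
        struct { int32_t length, height; };   /* case RECTANGLE */
        struct { int32_t radius; };           /* case CIRCLE */
    };
} Shape;

/* Bounding-box area; 64-bit arithmetic avoids overflow on large inputs. */
static int64_t shape_bbox_area(const Shape *s) {
    switch (s->kind) {
    case SQUARE:
        return (int64_t)s->side * s->side;
    case RECTANGLE:
        return (int64_t)s->length * s->height;
    case CIRCLE: {
        int64_t d = 2 * (int64_t)s->radius;   /* diameter */
        return d * d;
    }
    }
    assert(0 && "invalid ShapeKind");
    return 0;
}
```

Nothing ties `kind` to the union here except convention, which is exactly the link the proposed syntax would make explicit.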

This design would need a few iterations, for example:

Code:
type Shape struct {
  i32 centerX;
  i32 centerY;
  i8 kind; // Just like this? Have the enum be implicitly created?
  union (kind) {
    square: { i32 side; }
    rectangle: { i32 length, height; }
    circle: { i32 radius; }
  }
}
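The "implicitly created enum" idea can be approximated in today's C with the X-macro technique, which is a different mechanism than the proposal: one list of variants generates both the tag enum and the union members. This is a sketch assuming C11; SHAPE_VARIANTS, AS_TAG, AS_MEMBER, AS_NAME, ShapeTag and shape_kind_name are hypothetical names. Payload types must not contain top-level commas, because braces do not protect commas inside macro arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Single source of truth: (variant name, payload struct type). */
#define SHAPE_VARIANTS(X)                                      \
    X(square,    struct { int32_t side; })                     \
    X(rectangle, struct { int32_t length; int32_t height; })   \
    X(circle,    struct { int32_t radius; })

/* Expansion helpers for the enum, the union members and a name table. */
#define AS_TAG(name, payload)    ShapeTag_##name,
#define AS_MEMBER(name, payload) payload name;
#define AS_NAME(name, payload)   #name,

/* Generated enum: ShapeTag_square, ShapeTag_rectangle, ShapeTag_circle. */
typedef enum { SHAPE_VARIANTS(AS_TAG) ShapeTag_count } ShapeTag;

typedef struct {
    int32_t centerX;
    int32_t centerY;
    int8_t kind;                          /* i8 tag holding a ShapeTag value */
    union { SHAPE_VARIANTS(AS_MEMBER) };  /* square, rectangle, circle */
} Shape;

static const char *const shape_tag_names[] = { SHAPE_VARIANTS(AS_NAME) };

/* Name of the active variant; asserts the tag is in range. */
static const char *shape_kind_name(const Shape *s) {
    assert(s->kind >= 0 && s->kind < ShapeTag_count);
    return shape_tag_names[s->kind];
}
```

Adding a variant then means editing one line, although the tag still has to be set by hand on every write.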

Compared with manual tagged unions, this has several advantages:

1. Boilerplate is removed
2. The tag is explicitly linked to the union, which helps when reading the code.
3. The tag is updated automatically when a union field is written, which removes a source of errors.
4. The compiler can analyse the code and emit better warnings.
5. In debug builds, the compiler can insert a check that warns when a union field is accessed while a different variant is active.
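Point 5 can be pictured as the checked accessors below, a hand-written C sketch of what a compiler might insert. It assumes C11 and uses hypothetical accessor names (shape_side, shape_radius, shape_length, shape_height). The checks rely on `assert`, so they disappear in builds compiled with `-DNDEBUG`, much like a debug-only check would.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SQUARE, CIRCLE, RECTANGLE } ShapeKind;

typedef struct {
    ShapeKind kind;
    union {
        int32_t side;                          /* SQUARE */
        int32_t radius;                        /* CIRCLE */
        struct { int32_t length, height; };    /* RECTANGLE */
    };
} Shape;

/* Reading a field of an inactive variant fails the assert in debug builds. */
static int32_t shape_side(const Shape *s)   { assert(s->kind == SQUARE);    return s->side; }
static int32_t shape_radius(const Shape *s) { assert(s->kind == CIRCLE);    return s->radius; }
static int32_t shape_length(const Shape *s) { assert(s->kind == RECTANGLE); return s->length; }
static int32_t shape_height(const Shape *s) { assert(s->kind == RECTANGLE); return s->height; }
```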

Note that everything here is extremely straightforward to implement. The only "magic" is the dual update of tag and union value during a write (shape.radius = 10 makes the shape a circle). Even that could actually be dropped. I think the most important points are 1, 2, 4 and 5; the usefulness of automatic tag updates can definitely be questioned.
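As a sketch of what the automatic update amounts to, each write can be pictured as a setter that stores the tag first. The C below assumes C11 and uses hypothetical setter names. It also shows one wrinkle that bears on whether the feature is worth it: for a multi-field variant such as a rectangle, writing only `length` cannot sensibly flip the tag, because `height` would then hold leftover bytes, so this sketch only allows single-field writes when the shape is already a rectangle.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SQUARE, CIRCLE, RECTANGLE } ShapeKind;

typedef struct {
    int32_t centerX;
    int32_t centerY;
    ShapeKind kind;
    union {
        int32_t side;                          /* SQUARE */
        int32_t radius;                        /* CIRCLE */
        struct { int32_t length, height; };    /* RECTANGLE */
    };
} Shape;

/* What `shape.radius = 10` would lower to with automatic tag updates. */
static void shape_set_radius(Shape *s, int32_t radius) {
    s->kind = CIRCLE;
    s->radius = radius;
}

static void shape_set_side(Shape *s, int32_t side) {
    s->kind = SQUARE;
    s->side = side;
}

/* A multi-field variant is switched to only when all its fields are written. */
static void shape_set_rectangle(Shape *s, int32_t length, int32_t height) {
    s->kind = RECTANGLE;
    s->length = length;
    s->height = height;
}

/* A single-field write keeps the tag and requires the variant to be active. */
static void shape_set_length(Shape *s, int32_t length) {
    assert(s->kind == RECTANGLE);
    s->length = length;
}
```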