Author Topic: Vectors in combination with semi-automatic memory management  (Read 7069 times)

gerard

  • Newbie
  • Posts: 2
Vectors in combination with semi-automatic memory management
« on: January 03, 2017, 07:05:20 PM »
I’ve been playing with runtime-growable “arrays” (vectors), in combination with semi-automatic memory cleanup (RAII and “fat pointers”).

C has lots of memory issues: use after free(), forgotten free(), and no bounds checking combined with pointer arithmetic. Most “higher level” languages deal with this through automatic memory management, usually a GC (which is user friendly); Rust instead relies on lifetime guarantees (which are pretty hard). My idea is to use “fat pointers”, a couple of rules, and two new types: vectors and slices.

This is how it works. All malloc’ed data is automatically freed when the function it belongs to goes out of scope. This is made possible by using “fat pointers” and zero-initializing them: at every exit point of the function (perhaps via an under-the-hood “goto cleanup”), the only thing that has to happen is a check whether the pointer is zero, and if not, a call to free(). The exceptions are vectors returned by the function, and function parameters, which are excluded from automatic cleanup. free() can still be called explicitly inside the function that owns the vector variable, for instance to limit memory consumption, but not from other functions, to keep things clear. This is a pretty large departure from C, but it also removes a whole class of hard-to-find errors.
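
Here is a rough sketch in plain C of what I mean; the vec type, the vec_push helper and the “goto cleanup” block are made-up names, only meant to show what the compiler would generate under the hood.

Code:
#include <stdlib.h>

/* Hypothetical "fat pointer": data pointer plus length and capacity. */
typedef struct {
    int    *data;   /* NULL (zero) means: nothing to free */
    size_t  len;
    size_t  cap;
} vec;

/* Grow the vector by one element, reallocating as needed. */
static int vec_push(vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (!p)
            return -1;
        v->data = p;
        v->cap  = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* What the compiler would emit for a function with a local vector:
   the fat pointer starts zeroed, and every exit point funnels through
   one cleanup block that frees the data if the pointer is not zero. */
static int sum_squares(int n)
{
    vec v = {0};            /* zero-initialized fat pointer */
    int total = 0;

    for (int i = 0; i < n; i++) {
        if (vec_push(&v, i * i) != 0)
            goto cleanup;   /* early exit still frees */
    }
    for (size_t i = 0; i < v.len; i++)
        total += v.data[i];

cleanup:
    if (v.data)             /* the "is the pointer zero?" check */
        free(v.data);
    return total;
}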

Well, this is the teaser. I have quite a lot of detail in mind but haven’t written it down yet. If you’re interested I will write it up; it’s probably two or three pages, so it could take a week or so.

kyle

  • Newbie
  • Posts: 48
Re: Vectors in combination with semi-automatic memory management
« Reply #1 on: January 04, 2017, 05:46:09 AM »
I'm trying to follow along here.   

So how do you handle the case where the function mallocs something and then puts it somewhere it can be reached from outside the scope of the function? This is a common pattern where a function initializes an entry in a tree or something similar. In that case, you would need to be able to zero out the local pointer to the malloc'ed data (or otherwise cause free to be skipped).
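
Something like this is the pattern I mean (plain C; the node and list types are just made-up names for illustration):

Code:
#include <stdlib.h>

struct node {
    int          key;
    struct node *next;
};

struct list {
    struct node *head;
};

/* The local pointer 'n' escapes the function: the allocation ends up
   reachable through 'l', so an automatic free on function exit would
   have to be skipped somehow. */
static int list_push(struct list *l, int key)
{
    struct node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->key  = key;
    n->next = l->head;
    l->head = n;            /* ownership handed to the caller's list */

    /* If the compiler freed 'n' here, l->head would dangle, so the
       scheme needs a way to mark the handoff, e.g. zeroing 'n'. */
    return 0;
}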

Is this just for the case where there is a local, growable array in a function?

Clearly you have something in mind here, but I am getting confused by what seems to be a collection of not-completely related things like fat pointers, deferred (a la Go) deallocation etc.

bas

  • Full Member
  • Posts: 220
Re: Vectors in combination with semi-automatic memory management
« Reply #2 on: January 04, 2017, 08:04:00 AM »
C has a perfect mechanism to clean up what you allocate in a function: the stack.

When you need a piece of memory with a lifetime longer than the function's, you use malloc.
So I'm a bit confused as well about what you want. In C++, destructors are used to do the kind of thing you describe
(with smart-pointer concepts).


kyle

  • Newbie
  • Posts: 48
Re: Vectors in combination with semi-automatic memory management
« Reply #3 on: January 07, 2017, 07:45:49 PM »

I'm sorry, I must be really missing something key that holds this together :-(

The part about preventing other functions from deallocating the new array sounds almost like Rust's borrowing mechanism.  I can see how implementing a semi-transparent version of C++'s unique_ptr and shared_ptr could be really powerful without having to implement much of a runtime.  GCC already lets you implement a form of Go's defer, which would give you part of what you are looking for...  I think?
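
For reference, the GCC feature I mean is the variable cleanup attribute, which can act as a poor man's defer. A sketch (the helper names are mine):

Code:
#include <stdio.h>
#include <stdlib.h>

/* Cleanup handlers get a pointer to the annotated variable. */
static void close_file(FILE **f) { if (*f) fclose(*f); }
static void free_buf(char **p)   { free(*p); }

static int count_lines(const char *path)
{
    /* Both resources are released automatically when the function
       returns, whichever exit path is taken: a defer-like effect. */
    FILE *f   __attribute__((cleanup(close_file))) = fopen(path, "r");
    char *buf __attribute__((cleanup(free_buf)))   = malloc(4096);

    if (!f || !buf)
        return -1;

    int lines = 0;
    while (fgets(buf, 4096, f))
        lines++;
    return lines;
}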

Can you give some concrete examples using pseudo-code?   I must have had extra stupid pills this morning :-(

kyle

  • Newbie
  • Posts: 48
Re: Vectors in combination with semi-automatic memory management
« Reply #4 on: January 11, 2017, 04:55:04 AM »
So for each level in the call hierarchy, you have data elements that are "local" in the sense that they can only be freed when the function in which they are allocated goes out of scope.  You can pass slices down to called functions, but not 'up' or 'out' to any higher level data structure.  Higher level here means higher in the call hierarchy.

So this does sound like C99's variable-length arrays, except allocated with malloc rather than on the stack.  You could prototype something like this now with GCC-specific extensions.

I am not sure why structs would be a problem, since you could just allocate them on the stack and the final return (with its stack-pointer adjustment) would wipe them out.  In the case of your arrays, you would need to trigger a cleanup function when leaving the function scope.

It does sound like this is very much like Rust.

kyle

  • Newbie
  • Posts: 48
Re: Vectors in combination with semi-automatic memory management
« Reply #5 on: January 16, 2017, 07:41:07 AM »
You are covering a lot of ground here.

I'll try to address each part.

First, I'll note that if you require the compiler to have a hook for when control leaves a function to delete a vector, you might as well make it general.  There are a lot of things that can be "freed" when control leaves a function, not just memory: you can close files, sockets, shared memory, etc.  C++'s RAII idiom is based on this, and Go uses a more explicit method with its defer construct.  I am more of a fan of RAII than defer, simply because defer can get forgotten, whereas RAII is baked into the code when you declare your variable.

Data lifetime and the function call hierarchy are not necessarily the same.  Data tends to fall into three main buckets: data that is ephemeral within one function, data that is only passed down to sub-functions that operate on it, and data that is built in a function and intended to be returned upward.  For the latter, think of a function that builds a tree node.  As far as I can tell, your vectors only cover the first two of these cases.  How do you turn the automatic cleanup off so that functions can construct data and return it to the caller?
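
The third bucket looks roughly like this (plain C, made-up names): the function builds something whose lifetime is meant to outlast it, and the caller takes ownership.

Code:
#include <stdlib.h>

struct node {
    int          key;
    struct node *left, *right;
};

/* Data built here is intended to be returned upward: the caller owns
   the allocation, so any automatic free-on-exit has to step aside. */
static struct node *make_node(int key)
{
    struct node *n = malloc(sizeof *n);
    if (!n)
        return NULL;
    n->key  = key;
    n->left = n->right = NULL;
    return n;
}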

Vectors and Blocks

If I understand correctly, vectors are resizable and blocks are not.  Other than that, they are intended to be used in more or less the same way?  If so, why the difference?  The memory in question is allocated from the heap in both cases.  If you allow resizing to reallocate, you can merge the two back into one construct.  The fewer constructs, the better.

The for..range loop is not possible without either significant additional infrastructure and conventions (C++) or a limited scope of application (just arrays).  From what you are doing here, it seems like it might be easier just to drop C's pointers-are-arrays (except when they are not) approach.  Either disallow pointer arithmetic or minimize it, and make arrays carry their sizes with them.  This may be another area for simplification: if you use an array, that means you want bounds checking; if you use a pointer, you do not.
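
Concretely, I am picturing something like this (plain C sketch, made-up names):

Code:
#include <stdio.h>
#include <stdlib.h>

/* An array that carries its size, so bounds checking is possible. */
typedef struct {
    int    *data;
    size_t  len;
} int_array;

/* Using the array type means you opted in to bounds checks. */
static int array_get(const int_array *a, size_t i)
{
    if (i >= a->len) {
        fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, a->len);
        abort();
    }
    return a->data[i];
}

/* What a for..range loop could lower to, since the size is known. */
static long array_sum(const int_array *a)
{
    long total = 0;
    for (size_t i = 0; i < a->len; i++)
        total += array_get(a, i);
    return total;
}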

You bring some interesting ideas to the table, but I think it might make sense to strip them down to the most basic parts possible and see how those can be combined to build what you want.

  • Automatic freeing: generalize this to some sort of on-function-exit hook.  That is very powerful and you can do a lot with it.  Cf. C++'s RAII and Go's defer.
  • Vectors/blocks: simplify to arrays having bounds.  The for..range construct only works with arrays.  Arrays could be resizable and assignable independently of this.  I would make a much clearer separation between pointers and arrays.
  • Slices: these are generally good and can be implemented as arrays with shared data parts (though you need to watch the lifetimes!); see the sketch below.
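
On slices, what I have in mind is roughly this (plain C, made-up names): a slice is just a pointer into someone else's array plus a length, it owns nothing, which is exactly why the lifetimes need watching.

Code:
#include <stdio.h>
#include <stddef.h>

typedef struct {
    int    *data;
    size_t  len;
} int_slice;

/* A slice borrows part of an existing array; it does not own it. */
static int_slice slice_of(int *data, size_t start, size_t count)
{
    int_slice s = { data + start, count };
    return s;
}

int main(void)
{
    int backing[8] = {0, 1, 2, 3, 4, 5, 6, 7};

    /* Shares the backing storage; valid only as long as 'backing' is. */
    int_slice middle = slice_of(backing, 2, 4);

    for (size_t i = 0; i < middle.len; i++)
        printf("%d ", middle.data[i]);
    printf("\n");   /* prints: 2 3 4 5 */
    return 0;
}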

Side Effects

Removing globals does not make functions pure.   Think about passing an address around (a pointer).  I pass the value of the pointer around, but each function that gets it can dereference the pointer to look at the value in memory.   So, I just made it a little bit harder to have globals, but I did not really remove them.
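
A tiny example of what I mean (plain C):

Code:
/* No global variables, but not pure either: the function mutates state
   that the caller (and anyone else holding the pointer) can see. */
static void bump(int *counter)
{
    (*counter)++;
}

int main(void)
{
    int shared = 0;
    bump(&shared);
    bump(&shared);
    return shared;   /* 2: the "global" simply travelled as a pointer */
}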

I think that memory safety is definitely interesting.  There are so many security exploits that rely on buffer overflows, bad pointer arithmetic, etc.