Stick with your own implementation of malloc. That way you have behind-the-scenes access to what it does, and you are not relying on the behaviour of the RTL or the application.
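To illustrate what "behind-the-scenes access" can look like, here is a minimal sketch of a custom allocator that exposes its own bookkeeping. It is not your implementation, just an assumed fixed-pool bump allocator written for C11. The names `my_malloc`, `my_pool_used`, `my_alloc_count` and the `POOL_SIZE` value are all hypothetical, and it never frees memory.

```c
#include <stddef.h>

/* Hypothetical pool size, chosen only for illustration. */
#define POOL_SIZE 4096

/* Statically reserved pool, aligned for any object type (C11). */
static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;    /* bytes handed out so far, including padding */
static size_t alloc_count = 0;  /* number of successful allocations */

/* Bump allocator: returns the next aligned chunk, or NULL if it does not fit. */
void *my_malloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);

    if (size == 0 || size > POOL_SIZE)
        return NULL;

    /* Round the request up to a multiple of the alignment. */
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > POOL_SIZE - pool_used)
        return NULL;

    void *p = &pool[pool_used];
    pool_used += rounded;
    alloc_count++;
    return p;
}

/* Because the allocator is yours, its internal state can be inspected directly. */
size_t my_pool_used(void)   { return pool_used; }
size_t my_alloc_count(void) { return alloc_count; }
```

With counters like these you can measure how many allocations the application makes and how much memory they consume, rather than guessing at what the RTL does internally.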
So basically it means I will now have to look at other possible areas for optimisation...