I hope Intel will add instructions for operating on big integers to its processors. Right now the SSE/AVX extensions are ugly for this: they are packed-only and do not set carry, zero, or other flags. Using them for big-number calculations is a real headache (general-purpose registers are the simpler way). It seems other developers think the same.
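To show why the carry flag matters here, this is a minimal Python sketch of big-integer addition over 64-bit segments. The names MASK64 and add_limbs are made up for illustration. The carry variable plays the role of the CPU carry flag in an ADD/ADC chain on general-purpose registers; packed SIMD additions work on each lane independently and do not pass a carry between lanes, so this step has to be rebuilt by hand.

```python
MASK64 = (1 << 64) - 1  # value range of one 64-bit general-purpose register

def add_limbs(a, b):
    """Add two equal-length little-endian lists of 64-bit limbs.

    Returns (sum_limbs, carry_out). The 'carry' variable stands in for
    the CPU carry flag used by an ADD/ADC instruction chain.
    """
    assert len(a) == len(b)
    result = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & MASK64)  # low 64 bits stay in this limb
        carry = total >> 64            # 0 or 1, handed to the next limb
    return result, carry
```

For example, adding 1 to a number whose lowest limb is all ones overflows that limb and the carry moves into the next one.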

RSA is widely used now; people like it. Its most expensive operation is multiplication of big integers, for example integers 4000 bits long (500 bytes). This is implemented like multiplication of polynomials, with 8-byte segments as the coefficients. That gives a lot of single multiplications, and their number grows quadratically with the modulus length. At 4000 bits, one RSA decryption takes 400 milliseconds (in my implementation, on a 1.8 GHz CPU). Multiplications account for 90% of the processing time.
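The segment-by-segment multiplication described above can be sketched in Python roughly like this. It is only an illustration of the schoolbook method with 8-byte segments, not my actual RSA code; the helper names to_limbs, from_limbs and schoolbook_mul are made up for the example, and real implementations do this in assembly or C with a 64x64 -> 128 bit MUL.

```python
MASK64 = (1 << 64) - 1  # one 8-byte segment (limb)

def to_limbs(n, count):
    """Split a non-negative integer into 'count' little-endian 64-bit limbs."""
    return [(n >> (64 * i)) & MASK64 for i in range(count)]

def from_limbs(limbs):
    """Rebuild the integer from little-endian 64-bit limbs."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

def schoolbook_mul(a, b):
    """Multiply two limb lists; return (product_limbs, single_multiplications)."""
    out = [0] * (len(a) + len(b))
    mults = 0
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            # One 64x64 -> 128 bit multiplication plus two additions;
            # the total always fits in 128 bits.
            t = x * y + out[i + j] + carry
            out[i + j] = t & MASK64
            carry = t >> 64
            mults += 1
        out[i + len(b)] = carry  # this slot has not been written yet
    return out, mults
```

A 4000-bit number needs 63 segments of 8 bytes, so one full multiplication costs 63 * 63 = 3969 single multiplications, and doubling the modulus length roughly quadruples that count.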

If the existing XMM/YMM/ZMM registers could operate on whole big integers (32 bytes * 32 bytes = 64 bytes), a segment would be 4 times longer than 8 bytes, so there would be 4^2 = 16 times fewer multiplications and this time could drop to about 25 milliseconds. I think this is a good argument for Intel to add this functionality to future processors. They have already added instructions for fast AES encryption and for fast hash calculation (SHA-1, SHA-2).
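The estimate can be checked with a few lines of Python. This is a rough sketch under optimistic assumptions: a hypothetical 32-byte multiply costs the same as today's 8-byte one, and the whole 400 ms scales with the number of single multiplications. The function name single_mults is made up for the example.

```python
import math

def single_mults(modulus_bits, limb_bytes):
    """Single multiplications in one schoolbook multiply of two numbers."""
    limbs = math.ceil(modulus_bits / (8 * limb_bytes))
    return limbs * limbs

m8 = single_mults(4000, 8)     # 63 limbs of 8 bytes
m32 = single_mults(4000, 32)   # 16 limbs of 32 bytes
speedup = m8 / m32             # close to the ideal 4^2 = 16
estimate_ms = 400 / speedup    # assumes all 400 ms scale with the count
```

Because 4000 bits do not divide evenly into segments, the ratio comes out slightly below 16, which is why the result is "about 25 milliseconds" rather than exactly 25.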

So in the future, when processors support it, it would be cool to extend the integer types to bigger lengths: "uint128", "uint256", "uint512". They would map to SSE/AVX registers, be aligned by default, and support add, subtract, multiply, bit-shift, etc.

Maybe floating-point numbers will become longer too: "real128", "real256", "real512"?

Does anybody need it? Maybe scientists?

*************************************************************************

SSE/AVX are ugly, probably because the generations of developers changed. The early programming core of this processor family is very solid. The change even touched the general-purpose registers when x86_64 appeared. Did you know the following? If you execute "MOV AL, 4", AH and the upper 16 bits of EAX don't change. But if you execute "MOV EAX, 4", the upper 32 bits of RAX are set to zero (!). This behaves like the MOVZX instruction, and to me it looks like a design error, although the usual explanation is that it avoids dependencies on the old upper half of the register. What is interesting: it is easy to miss, but the Intel documentation does state that in 64-bit mode 32-bit results are zero-extended to 64 bits in the destination register.
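The difference between the two writes can be modeled in Python with bit masks. This is only a simulation of the register semantics in 64-bit mode, not real assembly; the function names write_al and write_eax are made up for the example.

```python
MASK64 = (1 << 64) - 1

def write_al(rax, value):
    """Model of MOV AL, imm8: replace bits 0..7, keep the rest of RAX."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)

def write_eax(rax, value):
    """Model of MOV EAX, imm32 in 64-bit mode.

    The old contents of RAX are ignored: the 32-bit result is
    zero-extended, so bits 32..63 become zero.
    """
    return value & 0xFFFFFFFF

rax = 0x1122334455667788
after_al = write_al(rax, 4)    # only the lowest byte changes
after_eax = write_eax(rax, 4)  # the upper 32 bits are cleared
```

After the 8-bit write only the lowest byte differs from the old value, while after the 32-bit write nothing of the old value survives.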