I’m very slowly working on a simple vector math library, appropriately named SVML, which will be released soon. Eventually, it will be the vector math equivalent of The-Best-Thing-Ever™ (or at least I hope so). My goals are three-fold:

  1. Concatenate all the vector math functionality I’ve used in previous projects
  2. Learn more about C++
  3. Have fun

Since this was a learning/fun project, I decided to work on the stuff I didn’t already know how to do first. With my recent thesis work with shaders and graphics calculations, implementation of swizzling functionality is a must for me. Ideally, my swizzling would have an interface identical to that of shaders, with both reading and writing masks. I should be able to write:

my2DVector.yx = my4DVector.zw;

which would store the z component of my4DVector in the y component of my2DVector, and the w component of my4DVector in the x component of my2DVector. Similar conversions and operational overloads would apply to all combinations of swizzles, so long as the types and dimensions are equivalent.

I did some looking around at other libraries and forum posts where people had tried to implement it. I was able to find quite a few implementation attempts. But none of these did what I wanted. Only one had true write swizzling, and try as I might, I wasn’t able to figure out how they did it (only the header was available, I believe, not the actual implementation). All the rest implemented read swizzling as function calls, some with #define’d names to give the familiar syntax.

I didn’t know what I was getting into, but I decided to see if I could find a better way. This is an account of my journey.


Basics

The first, easiest route is to create function overloads for every potential read swizzle operation. This replaces a single function with swizzle-numbers defining the operation I want with lots of individually quick code.

I created lots of member functions like these:

vec2 xx()const{return vec2(x,x);}
vec2 xy()const{return vec2(x,y);}
vec2 yx()const{return vec2(y,x);}
vec2 yy()const{return vec2(y,y);}

and so on for every combination allowed by the dimension. This accomplished read swizzling. You can get rid of the parentheses by using #define’s:

#define xx xx()
#define xy xy()
#define yx yx()
#define yy yy()

So far, so good. What about write swizzling? Well, since I had polluted the global space with every combination of more than one component, I couldn’t use the same symbol for the write functions. I settled on a swizzle notation that appended an underscore. Additionally, I restricted input to swizzles that made sense. If you have a swizzle with duplicates, you’ve duplicated assignments needlessly. So, for the 2D write swizzles, I have this:

void xy_(float xI, float yI)
{
	x = xI;
	y = yI;
}

void yx_(float xI, float yI)
{
	y = xI;
	x = yI;
}

So that works. This version has correct read syntax at the expense of a large number of functions and a large amount of #define pollution. Writing requires different syntax. Instead of being able to simply assign values to it with an assignment operator, this syntax had to be used:

my2DVector.yx_(3.0f, 1.0f);

I might also have eventually made write functions that would take other vectors (and therefore swizzles), but at this point, I thought there must be a better way.

I started asking around, everyone I could. Showing them what I wanted to do, and what things I’d seen. Mike Fried suggested that such syntax might be possible with unions, but wasn’t able to elaborate how. I kept that on the back burner and re-examined what my syntax would mean in terms of C++, rather than what I wanted it to do. That’s when my next idea hit me.


Parent/Child Relationship

In C++, the dot operator “.” is used to access members of an object. The object could be a class, a struct, or union. The member could be a variable, a function, or… a child object (in the form of a variable).

Yes, of course! I could create sub-structs with the names of the swizzle. This would allow me to use the exact syntax desired, without all that nasty #define pollution. But then, how could I read the values or write them? Having the right syntax is useless if you can’t do anything.

That’s where the other operator overloads come in. I could overload the child struct’s operators to provide all the mathematical operations (reading) AND the assignment operators (writing). It’s exactly the syntax I want! But there’s a problem…

Even though I never intended to use these sub-structs in the absence of their parents, the possibility exists. Thus, child structs (classes, etc) are their own sovereign objects. They cannot access their parent’s data members directly. They’re not even supposed to be able to do so ever.

But I want to. So how could I do it? I came up with one solution. In the constructor of the parent object, I could send a copy of the parent’s this reference to a pointer (“that”) in the child objects. The problem with this is that for every single sub-struct, I had to create a pointer-sized variable, which is a lot of extra space I wouldn’t otherwise need. We’re talking about 300 pointers for a 4D vector. That’s a LOT for something that’s supposed to store only 4 numbers.

A friend of mine mentioned that some code he has seen uses a sizeof operator on the parent and then does a pointer subtraction from the child’s this pointer to get the pointer for the parent. This is apparently illegal, but works. Thankfully, in another epiphany I thought of the solution.


Unions to the Rescue

Realizing that I didn’t want access to the parent itself, just its data members, it suddenly fell into place how a union would allow such access. Mike Fried put the idea in my head, but it took me a while to get to it in a functioning setup.

So, the code looks something like this:

union vec2
{
	float v[2];
	V2() {}
	V2(float x, float y)
	{
		v[0] = x;
		v[1] = y;
	}

	struct X
	{
		float v[2];
		X& operator=(const float& rhs)
		{
			v[0] = rhs;
			return *this;
		}
		operator float()
		{
			return v[0];
		}
	} x;

	struct Y
	{
		float v[2];
		Y& operator=(const float& rhs)
		{
			v[1] = rhs;
			return *this;
		}
		operator float()
		{
			return v[1];
		}
	} y;

	struct YX
	{
		float v[2];
		YX& operator=(const vec2& rhs)
		{
			float second = rhs.v[1];
			v[1] = rhs.v[0];
			v[0] = second;
			return *this;
		}
		operator vec2()
		{
			return vec2(v[1], v[0]);
		}
	} yx;
};

And it works. Why does it work? All members of the union share the same memory. While this is usually used to access data in a different data format, I’m using it to access the data formatted the same way with a different name. When I’m accessing the v member array of the child structs of the union, I’m accessing the EXACT same memory as the parent, effectively giving me “access” to the parent’s data. This solves the access problem without the needless extra memory usage.

The good news is that I get exactly the syntax I wanted. Unfortunately, I have to generate a lot more code than what’s displayed here. That’s one of the two problems with this method, in fact.

I wrote a perl script to generate the code for this variation. WITHOUT the constructor code, the basic mathematical, assignment, and comparison operators summed up to over 10,000 lines of code. Too much.

Another, more difficult to spot potential problem is that the operators do not get “reduced down” to their parent types after being used. What can happen is illustrated with this code:

vec2 test1;
vec2 test2(2, 1);

test2.xy = test1.yx = test2.xy;

In the above multiple assignment, you might expect the operation to flip test2 from being (2, 1) to (1, 2). However, what happens is that test2′s (2, 1) gets stored into test1 as (1, 2). But then it is ACCESSED out of order for the next assignment, reversing it again. This is not an error, but it is behavior I wasn’t expecting. For now, I ignore this pitfall (frankly because future methods don’t have a solution for this yet).

So how do I reduce the amount of code required. How do I remove all those duplicated operators in the child structs?


Implicit Conversions and Non-Member Functions

Because all children have conversion operators to their parent type (so that member operators would work with parents and children), they are capable of being implicitly converted whenever needed. In another sudden realization, I found that the operator overloads could be placed outside the union which would force both sides of the given operator to be implicitly converted as necessary.

This reduces all duplicated functions (300 or so in the worst case) down to one. Well, not ALL of them. All swizzle operators with no duplicates still need assignment operators, but this is far less than 300 duplicates. In fact, for the worst case of a 4D vector, the number of swizzles that require assignment operators is only 60. The rest of the functions have only one copy for both the parent type and all swizzle variations (one function per operation per dimension).

So now I have the exact syntax I want and a minimum amount of function duplication. Can I go farther? Well, yes. I asked around at various places “If you could have your perfect vector math library, what functionality would you include?” The most common request was templated vector types.

What a perfect opportunity to learn more about templates!


Template Problems

To start off, I added the templated types to union types and the non-member functions. I used a typedef to give myself the same syntax I had before (vec2 is much easier to type than VECTOR2<float>). And I attempted to compile. Everything broke. After exploring and talking to a few people, it became clear that the operation of compiling templated types effectively “uses up” one of the allowed conversion operations that I needed for implicit conversions. Having more than one automatic compile-time conversion is dangerous, so the language doesn’t allow it.

At this point, the functionality is effectively possible to complete (float type vectors with swizzling) and it has the easy ability to add more (#define a “FLOAT_TYPE” instead of float in the definitions, then add an integer version). But, at the suggestion of some of my fellow students, I submitted the problem to Dr. Volper.

Dr. Volper had a lot of interesting suggestions that eventually led to a setup where the external functions explicitly converted the swizzle down to its parent type by using an internal typedef that provided a way of specifying the type of the parent inside the child:

Placed inside the child:

typedef VECTOR2 PARENT;

The addition overloaded operator as an example of the explicit conversion inside the non-member functions:

template <typename Swizzle1, typename Swizzle2>
VECTOR2<typename Swizzle1::PARENT> operator+(const Swizzle1& lhs, const Swizzle2& rhs)
{
	typename Swizzle1::PARENT vector1(lhs);
	typename Swizzle2::PARENT vector2(rhs);
	return typename Swizzle1::PARENT(vector1.v[0] + vector2.v[0], vector1.v[1] + vector2.v[1]);
}

This handles the case where a swizzle is added to another swizzle of the same dimension. I made other functions where the function signature is (vector, swizzle), (swizzle, vector), and (vector, vector). Since there is no implicit conversion, only one conversion is done by the compiler, which makes the code work as expected even with templates.

So while I don’t get the efficiency of using only one function for every operation, I get only four copies, not 300. A fair trade to allow for templating.

But there’s a problem. Did you notice it? Since Swizzle1 and Swizzle2 are two different templated types, this operator overload will apply to ALL addition operations not explained by other functions. This means that if you attempt to do an addition with another type of function where the types are different, it will use this function, even if the types are not even vectors or swizzles. Furthermore, the function signature is now identical for all dimensions of vector. What is needed is some kind of partial specialization…


Substitution Failure Is Not An Error (SFINAE)

Once again, Dr. Volper came to the rescue and suggested a solution and provided an example. This is a problem that enough other C++ programmers have run into that it has a name. It’s called Substitution Failure Is Not An Error (SFINAE).

It took me a while to understand this concept, but long-story-short, it’s a compile-time if. It forces the return type of the operator overload to become incorrect during mid-compile if the compiler attempts to use it with the wrong type of object. I found this page and this page helpful in understanding enough to make use of this in my own functions.

In the end, the addition overload function will look like this:

template <typename SWIZZLE0, typename SWIZZLE1>
typename EnableIf< Is2D< typename SWIZZLE0::PARENT >, typename SWIZZLE0::PARENT >::type
operator+(const SWIZZLE0& lhs, const SWIZZLE1& rhs)
{
	typename SWIZZLE0::PARENT lVec(lhs);
	typename SWIZZLE0::PARENT rVec(rhs);
	return typename SWIZZLE0::PARENT(lVec.v[0] + rVec.v[0], lVec.v[1] + rVec.v[1]);
}

The support functions that allows this principle to work are these:

template <typename TYPE, typename R, bool b = TYPE::value> struct EnableIf {};
template <typename TYPE, typename R> struct EnableIf<TYPE, R, true> { typedef R type; };
template <typename TYPE> struct Is2D { enum { value = false }; };
template <typename TYPE> struct Is3D { enum { value = false }; };
template <typename TYPE> struct Is4D { enum { value = false }; };
template <typename TYPE> struct Is2D< VECTOR2<TYPE> > { enum { value = true }; };
template <typename TYPE> struct Is3D< VECTOR3<TYPE> > { enum { value = true }; };
template <typename TYPE> struct Is4D< VECTOR4<TYPE> > { enum { value = true }; };

And that will allow all functions to be differentiated while retaining generality where I need it.

For a further example, here’s my ToString function that takes a swizzle type (2D):

template <typename SWIZZLE>
typename EnableIf< Is2D< typename SWIZZLE::PARENT >, std::string >::type
ToString(const SWIZZLE& printSwizzle)
{
	typename SWIZZLE::PARENT vectorSwizzle(printSwizzle);
	std::ostringstream buffer;
	buffer << "(" << vectorSwizzle.v[0] << ", " << vectorSwizzle.v[1] << ")";
	return buffer.str();
}

And that’s where I am so far. I’m rewriting the perl script to generate this new format of the code. It will be a while before I finish that because there are a lot of functions to add in addition to the mathematical, assignment, and comparison operators. However, I don’t see any further problems with completing the library in it’s current code form.

What has been accomplished? Vector types with swizzle read and write masks in the EXACT syntax expected, with no redundant memory. It also uses the minimal possible function redundancy to allow for complete templating of vector types. All issues solved so far.

Where to go from here? Finish that perl script, test, and release version 1.0 for everyone to try. I anticipate adding assembly optimizations (SIMD?) eventually, as well as seeing if this optimization is compatible with my swizzling. There’s also the question if such a massive usage of child structs significantly effects the speed of the code.

I’m open to suggestions for improvement both to the core makeup of the unions/structs as well as feature requests for the complete library. Leave a comment and tell me how you feel.