The Right Way to do Equality in C#

One of the pitfalls of doing development in C#, Java, C++, or really any predominantly object-oriented (OOP) language is how “equality” is defined.

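In C#, for instance, every class inherits object.Equals(object) and object.GetHashCode(), can be compared with the == operator, and can be passed to the static object.ReferenceEquals. Unless you override them, all of these work off of the object’s reference rather than its content.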

My personal opinion: in any managed language, checking for referential equality is a pretty bad default - and if you’re working with immutable objects then equality by reference doesn’t really work.

In C/C++, where pointer arithmetic and knowing the precise location of something in memory matter, it’s a different story. Equality by reference is the correct default in that case.

What’s the right thing to do?

Equality by value - i.e. determining if two objects are equal by comparing their content.
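To make the contrast concrete, here’s a minimal sketch of what you get when you don’t implement equality by value. The Point class is made up for illustration and deliberately leaves Equals alone, so it inherits the reference-based default:

```csharp
using System;

// Hypothetical class for illustration: it does NOT override Equals,
// so it inherits reference equality from object.
public class Point
{
    public Point(int x, int y) { X = x; Y = y; }
    public int X { get; }
    public int Y { get; }
}

public static class DefaultEqualityDemo
{
    public static void Main()
    {
        var a = new Point(1, 2);
        var b = new Point(1, 2); // same content, different object
        var c = a;               // same object as a

        Console.WriteLine(a.Equals(b)); // False: different references
        Console.WriteLine(a == b);      // False: == also compares references here
        Console.WriteLine(a.Equals(c)); // True: same reference
    }
}
```

Two points with identical coordinates are “different” as far as the runtime is concerned, which is rarely what you want from an immutable value.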

In Akka.NET all message classes are supposed to be immutable, which means a reference to a message is useless as soon as someone modifies it (because modifying an immutable object creates a copy). Therefore the Akka.NET development team has had a lot of practice implementing equality by value on many of the built-in message classes.

Here’s what that technique looks like:

  1. Implement IEquatable<T> for your class (where T is the class);
  2. Override object.Equals(object o); and
  3. [Special cases] Override object.GetHashCode() using a high-entropy function, with caveats.

1. Add a little IEquatable<T> love for your class

The [IEquatable interface](https://msdn.microsoft.com/en-us/library/ms131187.aspx) is straightforward - it simply adds the following method to your class:

bool Equals(T other);

Nothing complicated here.

The important distinction between this Equals method and the built-in object.Equals(object o) that comes with every .NET object is that when you’re comparing two instances whose compile-time type is T, the IEquatable<T> method is what gets called, because it’s the most specific overload.
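If you want to see which overload actually runs, here’s a small sketch. The Tracer class and its LastCall field are hypothetical, added only so we can observe which method was picked. The last call assumes that List<T>.Contains goes through the default equality comparer, which uses IEquatable<T> when the type implements it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class that records which Equals overload ran last.
public sealed class Tracer : IEquatable<Tracer>
{
    public static string LastCall = "";

    public Tracer(int value) { Value = value; }
    public int Value { get; }

    public bool Equals(Tracer other)
    {
        LastCall = "Equals(Tracer)";
        return !ReferenceEquals(other, null) && Value == other.Value;
    }

    public override bool Equals(object obj)
    {
        LastCall = "Equals(object)";
        var t = obj as Tracer;
        return !ReferenceEquals(t, null) && Value == t.Value;
    }

    public override int GetHashCode() { return Value; }
}

public static class OverloadDemo
{
    public static void Main()
    {
        var a = new Tracer(1);
        var b = new Tracer(1);

        a.Equals(b);                       // argument is typed as Tracer
        Console.WriteLine(Tracer.LastCall); // Equals(Tracer)

        a.Equals((object)b);               // argument is typed as object
        Console.WriteLine(Tracer.LastCall); // Equals(object)

        new List<Tracer> { a }.Contains(b); // generic collection lookup
        Console.WriteLine(Tracer.LastCall); // Equals(Tracer)
    }
}
```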

Here’s an example using an actual POCO class:

public class Foo : IEquatable<Foo>{
	public int MyNum {get; set;}
	public string MyStr {get; set;}
	public DateTime Time {get; set;}

	#region Equality

	public bool Equals(Foo other){
		throw new NotImplementedException();
	}
			
	#endregion
}

Time to implement the Equals(T other) method. This is tedious, but straightforward: we want to determine that the values of each individual property are equal. All properties that you want to include for comparisons must be equal in order for two object instances to be equal, so here’s what that would look like for this implementation:

public bool Equals(Foo other){
	if(other == null) return false;
	return MyNum == other.MyNum &&
			Time == other.Time &&
			string.Equals(MyStr, other.MyStr);
}

In this case we’re comparing all of the properties, because they’re simple. I’m paranoid about running into a NullReferenceException, so I immediately test whether other is null. string is a reference type, so MyStr can be null, but the static string.Equals will return true or false without throwing an exception even if one or both of the strings are null.

But what if one of my properties was another custom POCO object, say a Bar class? How would I perform this equality by value check for Foo? Well, here’s the bad news - Bar and any other class used as a property of Foo also have to implement equality by value.

It’s because of this that implementing equality by value in C# often feels like a yak-shaving exercise, but the reward is worth the pain.
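Here’s a rough sketch of what that nesting might look like. Both classes are hypothetical (FooWithBar is a cut-down stand-in, not the Foo from this post), and I’ve made them sealed and immutable to keep the example short:

```csharp
using System;

// Hypothetical child class; it needs its own equality by value.
public sealed class Bar : IEquatable<Bar>
{
    public Bar(string name) { Name = name; }
    public string Name { get; }

    public bool Equals(Bar other)
    {
        if (ReferenceEquals(other, null)) return false;
        return string.Equals(Name, other.Name);
    }

    public override bool Equals(object obj) { return Equals(obj as Bar); }

    public override int GetHashCode() { return Name != null ? Name.GetHashCode() : 0; }
}

// Hypothetical parent class with a Bar property.
public sealed class FooWithBar : IEquatable<FooWithBar>
{
    public FooWithBar(int myNum, Bar child) { MyNum = myNum; Child = child; }
    public int MyNum { get; }
    public Bar Child { get; }

    public bool Equals(FooWithBar other)
    {
        if (ReferenceEquals(other, null)) return false;
        // The static object.Equals(a, b) copes with null on either side
        // and otherwise defers to Bar's own Equals override.
        return MyNum == other.MyNum && object.Equals(Child, other.Child);
    }

    public override bool Equals(object obj) { return Equals(obj as FooWithBar); }

    public override int GetHashCode()
    {
        unchecked
        {
            return (MyNum * 397) ^ (Child != null ? Child.GetHashCode() : 0);
        }
    }
}

public static class NestedEqualityDemo
{
    public static void Main()
    {
        var x = new FooWithBar(1, new Bar("a"));
        var y = new FooWithBar(1, new Bar("a"));
        var z = new FooWithBar(1, new Bar("b"));

        Console.WriteLine(x.Equals(y));                       // True
        Console.WriteLine(x.Equals(z));                       // False
        Console.WriteLine(x.Equals(new FooWithBar(1, null))); // False
    }
}
```

If Bar didn’t implement equality by value, x and y would compare as unequal even though everything inside them matches.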

2. Tune up object.Equals(object o)

Most of the hard work in this process goes into step 1. Step 2 is pretty boilerplate by contrast:

public class Foo : IEquatable<Foo>{
	public int MyNum {get; set;}
	public string MyStr {get; set;}
	public DateTime Time {get; set;}

	#region Equality

	public bool Equals(Foo other){
		if(other == null) return false;
		return MyNum == other.MyNum &&
			Time == other.Time &&
			string.Equals(MyStr, other.MyStr);
	}

	public override bool Equals(object obj){
		if (ReferenceEquals(null, obj)) return false;
		if (ReferenceEquals(this, obj)) return true;
		if (obj.GetType() != GetType()) return false;
		return Equals(obj as Foo);
	}
			
	#endregion
}

We override the object.Equals method and replace it with some boilerplate code that builds upon our work with the IEquatable<Foo>.Equals(Foo other) method:

  1. Use ReferenceEquals to determine if obj is null - immediately return false if that’s the case.
  2. Use ReferenceEquals to check whether obj actually refers to this, and return true if it does.
  3. Check to see if the Type of obj is equal to our current Type - return false otherwise.
  4. Cast obj to Foo and hand it off to Equals(Foo other) to do all of the work we did in step 1.

Checks 1-2 on this list are normally all that the default object.Equals method does for a class.

Last step!

3. Use some prime numbers and XOR to get a well-distributed GetHashCode

So we have the ability to determine if two Foo instances are equal by value, but we haven’t fixed their GetHashCode functions yet. What does this mean?

Well, if equality by value was important to you and you wanted to use Foo in a HashSet<Foo>, you could end up adding two Foo instances with identical values to your HashSet<Foo> by accident - because that collection looks at the hashcode of each Foo object first and only calls Equals on objects whose hashcodes match.

There are lots of other cases where the hashcode gets used by built-in pieces of the .NET framework, so it’s critical that we override GetHashCode.
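Here’s a minimal sketch of that failure mode. NoHash and WithHash are hypothetical classes that differ only in whether GetHashCode is overridden. The compiler warns about the first one (CS0659), so the sketch silences that warning on purpose:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: equality by value, but GetHashCode is left at the default.
#pragma warning disable CS0659
public sealed class NoHash : IEquatable<NoHash>
{
    public NoHash(int value) { Value = value; }
    public int Value { get; }

    public bool Equals(NoHash other)
    {
        return !ReferenceEquals(other, null) && Value == other.Value;
    }

    public override bool Equals(object obj) { return Equals(obj as NoHash); }
}
#pragma warning restore CS0659

// The same class with a GetHashCode that agrees with Equals.
public sealed class WithHash : IEquatable<WithHash>
{
    public WithHash(int value) { Value = value; }
    public int Value { get; }

    public bool Equals(WithHash other)
    {
        return !ReferenceEquals(other, null) && Value == other.Value;
    }

    public override bool Equals(object obj) { return Equals(obj as WithHash); }

    public override int GetHashCode() { return Value; }
}

public static class HashSetDemo
{
    public static void Main()
    {
        var noHash = new HashSet<NoHash> { new NoHash(42), new NoHash(42) };
        var withHash = new HashSet<WithHash> { new WithHash(42), new WithHash(42) };

        // Usually 2: the default hash codes are identity-based and almost
        // always differ, so Equals is never consulted for the second item.
        Console.WriteLine(noHash.Count);

        // 1: matching hash codes lead to an Equals call, which rejects the duplicate.
        Console.WriteLine(withHash.Count);
    }
}
```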

public class Foo : IEquatable<Foo>{
	public int MyNum {get; set;}
	public string MyStr {get; set;}
	public DateTime Time {get; set;}

	#region Equality

	public bool Equals(Foo other){
		if(other == null) return false;
		return MyNum == other.MyNum &&
			Time == other.Time &&
			string.Equals(MyStr, other.MyStr);
	}

	public override bool Equals(object obj){
		if (ReferenceEquals(null, obj)) return false;
		if (ReferenceEquals(this, obj)) return true;
		if (obj.GetType() != GetType()) return false;
		return Equals(obj as Foo);
	}

	public override int GetHashCode(){
		unchecked{
			var hashCode = 13;
			hashCode = (hashCode * 397) ^ MyNum;
			// Null or empty strings contribute 0 to the hash.
			var myStrHashCode =
				!string.IsNullOrEmpty(MyStr) ?
					MyStr.GetHashCode() : 0;
			hashCode = (hashCode * 397) ^ myStrHashCode;
			hashCode =
				(hashCode * 397) ^ Time.GetHashCode();
			return hashCode;
		}
	}
			
	#endregion
}

Ok, weird unchecked keyword and lots of prime numbers for some reason - what the hell is going on?

We don’t really care what the value of hashCode is - all we care about is that equal objects always produce the same hash code and that objects with different values are very unlikely to share one. A hash code is only 32 bits, so it can never be truly unique.

We’re going to use a computation technique that provides a reproducible hashcode for all equal-by-value instances of Foo and makes the likelihood of a hash collision extremely low.

If you want to read an explanation of this technique written by someone who’s much more talented than I am, check out Jon Skeet’s answer about C# GetHashCode functions on StackOverflow.

First, we use the unchecked keyword to let the compiler know that we don’t care if hashCode overflows or underflows in this instance - all we care about is the value.
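To show what unchecked is protecting us from, here’s a small sketch with two hypothetical helper methods. One multiplies inside an unchecked block and the other inside a checked block. Whether plain arithmetic is checked by default depends on your compiler settings, which is why spelling out unchecked in GetHashCode is the safe move.

```csharp
using System;

public static class OverflowDemo
{
    // Wraps around silently when the result doesn't fit in an int.
    public static int MultiplyUnchecked(int a, int b)
    {
        unchecked { return a * b; }
    }

    // Returns false instead of a value when the multiplication overflows.
    public static bool TryMultiplyChecked(int a, int b, out int result)
    {
        try
        {
            checked { result = a * b; }
            return true;
        }
        catch (OverflowException)
        {
            result = 0;
            return false;
        }
    }

    public static void Main()
    {
        // A big running hash times our multiplier overflows an int.
        Console.WriteLine(MultiplyUnchecked(int.MaxValue, 397));

        int ignored;
        Console.WriteLine(TryMultiplyChecked(int.MaxValue, 397, out ignored)); // False
    }
}
```

For a hash code the wrapped value is perfectly fine, since all we need is the same int for the same inputs.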

Second - we pick two different prime numbers, one to act as the seed for the hash and the other to be used as part of our hash multiplier. For each property we need to include in the hash function we multiply the current hash times the prime and then we incorporate the hashcode of each individual member.

In my case I’m using bitwise XOR (^), but you could just use addition and achieve equally robust results.

When do I really need to override GetHashCode?

An important caveat - you should only override GetHashCode if your objects are immutable.

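If the properties of your Foo class can change and change the GetHashCode result, any hash-based collection (HashSet<Foo>, Dictionary<Foo, TValue>, etc.) will behave unpredictably or throw an exception if the hashcode of an individual Foo instance in the collection changes. The instance is still filed under its old hashcode, so lookups for it can fail.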

You can still follow steps 1 and 2 of this guide and have solid equality by value semantics, but only override the hashcode when you’re designing a class to be immutable from the get-go.
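Here’s a minimal sketch of what goes wrong with a mutable class. MutableKey is hypothetical: it overrides GetHashCode based on a property that has a public setter, which is exactly the thing to avoid.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: GetHashCode depends on a property that can change.
public sealed class MutableKey : IEquatable<MutableKey>
{
    public int Value { get; set; }

    public bool Equals(MutableKey other)
    {
        return !ReferenceEquals(other, null) && Value == other.Value;
    }

    public override bool Equals(object obj) { return Equals(obj as MutableKey); }

    public override int GetHashCode() { return Value; }
}

public static class MutableKeyDemo
{
    public static void Main()
    {
        var key = new MutableKey { Value = 1 };
        var set = new HashSet<MutableKey> { key };

        Console.WriteLine(set.Contains(key)); // True

        key.Value = 2; // the hash code changes while the object sits in the set

        // The set filed the object under its old hash code, so the lookup misses.
        Console.WriteLine(set.Contains(key)); // False
        Console.WriteLine(set.Count);         // 1: the object is still in there
    }
}
```

The object is in the set and can’t be found in it at the same time, which is the sort of bug that’s miserable to track down.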


I'm the CTO and founder of Petabridge, where I'm making distributed programming for .NET developers easy by working on Akka.NET, Phobos, and more.