NHibernate

The object-relational mapper for .NET

Identity Field, Equality and Hash Code

In this post I'll describe a possible base class for domain entities which implements a surrogate key as identity field and provides equality and hash code.

Introduction

Martin Fowler writes in his PoEAA book: "The identity field saves a database ID field in an object to maintain identity between an in-memory object and a database row."

And further he states: "The first concern is whether to use meaningful or meaningless keys. A meaningful key is something like the U.S. Social Security Number... A meaningless key is essentially a random number the database dreams up that's never intended for human use."

There are many reasons why meaningful keys are often NOT good candidates for an identity field. Primarily, they are often neither immutable (human error can force a correction) nor unique. Thus Martin Fowler states: "... As a result, meaningful keys should be distrusted. ..."

Having given you some background on the ongoing dispute over what makes a good candidate for an identity field, I'll now make my choice. I always choose meaningless keys as identity fields. Such fields are often called surrogate keys. Important: "The surrogate key is not derived from application data."

My favorite type of surrogate key is a GUID (globally unique identifier). The algorithm used to generate a new GUID makes it (nearly) impossible to generate the same ID twice (the probability tends to zero).

NHibernate supports GUID as one possible type for the identity field.

Problem Description

When working with NHibernate one often uses a special type of collection known as a Set. A set is a collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.Equals(e2), and at most one null element. As a Set is not provided by the .NET framework, NHibernate uses the Iesi.Collections library, which contains an implementation of a set.

The definition above names the predicate that decides whether two elements are the same: the Equals function. By default, the Equals function of a class compares references, that is, it checks whether both variables point to the same instance. So if two variables e1 and e2 refer to two different instances of a class, Equals will always return false. But we want to use the identity field as the relevant part in the comparison of two instances. If two different instances have the same identity field then they are equal (that is, they refer to the same database record).
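To make the default behavior concrete, here is a minimal sketch. The class PlainCustomer is a hypothetical entity invented for this illustration; it deliberately does not override Equals.

```csharp
using System;

// Hypothetical entity that does NOT override Equals or GetHashCode.
public class PlainCustomer
{
    public Guid Id { get; set; }
}

public static class DefaultEqualityDemo
{
    public static void Main()
    {
        Guid sharedId = Guid.NewGuid();

        // Two separate instances that stand for the same database row.
        PlainCustomer a = new PlainCustomer { Id = sharedId };
        PlainCustomer b = new PlainCustomer { Id = sharedId };
        PlainCustomer alias = a;

        Console.WriteLine(a.Equals(b));     // False: different instances
        Console.WriteLine(a.Equals(alias)); // True: same instance
    }
}
```

Although a and b carry the same Id, the inherited Equals treats them as different, so a set could end up holding both.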

Implementation

The default implementation of the Equals function is found in the System.Object class, from which all other classes in .NET implicitly or explicitly inherit. Fortunately the Equals function is virtual, so we are able to override it. But when overriding the Equals function we also have to override the GetHashCode function, because two objects that are equal must return the same hash code.

Assuming that we take a GUID called Id as the identity field, we can define the following base class from which all our domain classes will directly or indirectly inherit:

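using System;

// T is the concrete entity type, e.g. class Customer : IdentityFieldProvider<Customer>
public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private Guid _id;
 
    // Surrogate key; Guid.Empty means "not yet persisted"
    public virtual Guid Id
    {
        get { return _id; }
        set { _id = value; }
    }
}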

Now let's override the Equals method. A possible solution is:

public override bool Equals(object obj)
{
    // null or an object of a different type is never equal
    T other = obj as T;
    if (ReferenceEquals(other, null))
        return false;
 
    // handle the case of comparing two NEW objects
    bool otherIsTransient = Equals(other.Id, Guid.Empty);
    bool thisIsTransient = Equals(Id, Guid.Empty);
    if (otherIsTransient && thisIsTransient)
        return ReferenceEquals(other, this);
 
    // otherwise compare the identity fields
    return other.Id.Equals(Id);
}

We have to distinguish three possible cases. The first is that the developer compares two objects of different types (a null argument is handled the same way). This case is trivial; the answer is ALWAYS "not equal". The second case is when both objects are new (also called transient); they are then equal only if the two references point to the same instance. The third case simply uses the Equals method of the Guid type to compare the two identity fields.
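The three cases can be exercised with a small sketch. It uses a reduced copy of the base class, and the entities Customer and Order are hypothetical names chosen for illustration. The GetHashCode override is only a placeholder so that both methods are overridden together; it is not the caching version discussed next.

```csharp
using System;

// Reduced copy of the base class: only Id and Equals.
public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    public virtual Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        T other = obj as T;
        if (ReferenceEquals(other, null))
            return false;

        bool otherIsTransient = Equals(other.Id, Guid.Empty);
        bool thisIsTransient = Equals(Id, Guid.Empty);
        if (otherIsTransient && thisIsTransient)
            return ReferenceEquals(other, this);

        return other.Id.Equals(Id);
    }

    // Placeholder only (assumption for this sketch).
    public override int GetHashCode() { return Id.GetHashCode(); }
}

// Hypothetical entities for illustration.
public class Customer : IdentityFieldProvider<Customer> { }
public class Order : IdentityFieldProvider<Order> { }

public static class EqualsCasesDemo
{
    public static void Main()
    {
        Guid id = Guid.NewGuid();
        Customer persistent1 = new Customer { Id = id };
        Customer persistent2 = new Customer { Id = id };
        Order order = new Order { Id = id };
        Customer transient1 = new Customer();
        Customer transient2 = new Customer();

        // Case 1: different types are never equal, even with the same Id.
        Console.WriteLine(persistent1.Equals(order));       // False

        // Case 2: two transient objects are equal only if they are the same instance.
        Console.WriteLine(transient1.Equals(transient2));   // False
        Console.WriteLine(transient1.Equals(transient1));   // True

        // Case 3: otherwise the Id values decide.
        Console.WriteLine(persistent1.Equals(persistent2)); // True
    }
}
```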

Now we also have to override the GetHashCode method, which is likewise inherited from System.Object.

private int? _oldHashCode;
 
public override int GetHashCode()
{
    // Once we have a hash code we'll never change it
    if (_oldHashCode.HasValue)
        return _oldHashCode.Value;
 
    bool thisIsTransient = Equals(Id, Guid.Empty);
    
    // When this instance is transient, we use the base GetHashCode()
    // and remember it, so an instance can NEVER change its hash code.
    if (thisIsTransient)
    {
        _oldHashCode = base.GetHashCode();
        return _oldHashCode.Value;
    }
    return Id.GetHashCode();
}

Now, why this kind of code, you might ask yourself? Well, an object should never change its hash code during its lifetime, that is, from the moment it is instantiated until it is no longer used. Hash-based collections such as sets and dictionaries use the hash code to locate an element, so an element whose hash code changes after it has been added can no longer be found. If an object is restored from the database there is no problem, since every existing database record has a well-defined and unique identity field. Thus we can derive the hash code from this Id field. This is done in the last line of the code snippet above.

A little more problematic is the case when a new object is created in memory: its identity field is undefined (the object has not yet been saved to the database and is thus considered transient). In our case undefined means that the Id field has a value of Guid.Empty. In this case we use the default implementation of GetHashCode (from System.Object) to generate a hash code, but we store it in an instance variable for later use.

Later in its life cycle the instance may be persisted to the database (while it continues to live in memory). At this moment NHibernate assigns a new unique value to the Id field of the instance. Now the object isn't transient any more, but the first two lines of the method prevent the hash code of the object from changing. It is still the same object as before. It has just been made persistent.
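The effect of the cached hash code can be sketched as follows. This is an illustration under assumptions: it uses HashSet<T> from System.Collections.Generic in place of the Iesi set, the entity Customer is hypothetical, and assigning Id by hand stands in for what NHibernate does when the object is saved.

```csharp
using System;
using System.Collections.Generic;

public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private int? _oldHashCode;

    public virtual Guid Id { get; set; }

    public override bool Equals(object obj)
    {
        T other = obj as T;
        if (ReferenceEquals(other, null))
            return false;

        bool otherIsTransient = Equals(other.Id, Guid.Empty);
        bool thisIsTransient = Equals(Id, Guid.Empty);
        if (otherIsTransient && thisIsTransient)
            return ReferenceEquals(other, this);

        return other.Id.Equals(Id);
    }

    public override int GetHashCode()
    {
        // Once a hash code has been handed out, keep it.
        if (_oldHashCode.HasValue)
            return _oldHashCode.Value;

        if (Equals(Id, Guid.Empty))
        {
            _oldHashCode = base.GetHashCode();
            return _oldHashCode.Value;
        }
        return Id.GetHashCode();
    }
}

// Hypothetical entity for illustration.
public class Customer : IdentityFieldProvider<Customer> { }

public static class HashStabilityDemo
{
    public static void Main()
    {
        Customer customer = new Customer();          // transient, Id is Guid.Empty
        HashSet<Customer> customers = new HashSet<Customer>();
        customers.Add(customer);                     // the set asks for the hash code here
        int before = customer.GetHashCode();

        customer.Id = Guid.NewGuid();                // stands in for NHibernate saving it
        int after = customer.GetHashCode();

        Console.WriteLine(before == after);              // True
        Console.WriteLine(customers.Contains(customer)); // True
    }
}
```

Without the cached value, the hash code would switch to Id.GetHashCode() after the save and the set would most likely fail to find the element. Note that the cached value comes from System.Object, so a second instance loaded later with the same Id would hash by its Id and need not produce the same hash code.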

Finally we can also overload the two operators '==' and '!=' so that two instances can be compared with those operators and not only with the Equals method.

public static bool operator ==(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
{
    return Equals(x, y);
}
 
public static bool operator !=(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
{
    return !(x == y);
}

That's it. You can now use this class as the base for every entity class in your domain and never ever have to think about the identity field and the equality of objects. It just happens...
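For reference, the fragments from this post can be assembled into one class and used roughly like this. The entity Customer and its Name property are hypothetical and exist only for illustration; treat the whole block as a sketch rather than a finished implementation.

```csharp
using System;

public class IdentityFieldProvider<T>
    where T : IdentityFieldProvider<T>
{
    private Guid _id;
    private int? _oldHashCode;

    public virtual Guid Id
    {
        get { return _id; }
        set { _id = value; }
    }

    public override bool Equals(object obj)
    {
        T other = obj as T;
        if (ReferenceEquals(other, null))
            return false;

        bool otherIsTransient = Equals(other.Id, Guid.Empty);
        bool thisIsTransient = Equals(Id, Guid.Empty);
        if (otherIsTransient && thisIsTransient)
            return ReferenceEquals(other, this);

        return other.Id.Equals(Id);
    }

    public override int GetHashCode()
    {
        if (_oldHashCode.HasValue)
            return _oldHashCode.Value;

        if (Equals(Id, Guid.Empty))
        {
            _oldHashCode = base.GetHashCode();
            return _oldHashCode.Value;
        }
        return Id.GetHashCode();
    }

    public static bool operator ==(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
    {
        // Static object.Equals handles null on either side before calling x.Equals(y).
        return Equals(x, y);
    }

    public static bool operator !=(IdentityFieldProvider<T> x, IdentityFieldProvider<T> y)
    {
        return !(x == y);
    }
}

// Hypothetical domain class: it inherits identity, equality and hash code.
public class Customer : IdentityFieldProvider<Customer>
{
    public virtual string Name { get; set; }
}

public static class OperatorDemo
{
    public static void Main()
    {
        Guid id = Guid.NewGuid();
        Customer a = new Customer { Id = id, Name = "first copy" };
        Customer b = new Customer { Id = id, Name = "second copy" };
        Customer none = null;

        Console.WriteLine(a == b);       // True: same Id
        Console.WriteLine(a != b);       // False
        Console.WriteLine(a == none);    // False
        Console.WriteLine(none == null); // True
    }
}
```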

Enjoy


Posted Sat, 06 September 2008 12:02:27 PM by gabriel.schenker
Filed under: identity, equality

© NHibernate Community 2024