NHibernate Mapping - <set/>

And now it is time to go to the <set/> and explore it. Most of the collections in NHibernate follow much the same rules, so I am not going to go over them in details:

<set
    name="propertyName"                                         (1)
    table="table_name"                                          (2)
    schema="schema_name"                                        (3)
    lazy="true|false"                                           (4)
    inverse="true|false"                                        (5)
    cascade="all|none|save-update|delete|all-delete-orphan"     (6)
    sort="unsorted|natural|comparatorClass"                     (7)
    order-by="column_name asc|desc"                             (8)
    where="arbitrary sql where condition"                       (9)
    fetch="select|join|subselect"                               (10)
    batch-size="N"                                              (11)
    access="field|property|ClassName"                           (12)
    optimistic-lock="true|false"                                (13)
    outer-join="auto|true|false"                                (14)
>

    <key .... />
    <one-to-many .... />
</set>

1) is the collection property name, just like <property/> or <many-to-one/> are the value property names.

2) table is obviously the table name in which the values for this association exists.

3) schema is the schema in which that table lives.

4) lazy controls whatever this collection will be lazy loaded or not. By default it is set to true. Let us see how this work:

<set name="Posts" table="Posts">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

With the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blog = session.Get<Blog>(1);
	foreach (var post in blog.Posts)
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

This produces the following statements:

We need two select statements to load the data.

However, if we change the set definition to:

<set name="Posts" table="Posts" lazy="false">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

We… would get the exact same output. Why is that?

The answer is quite simple, lazy only control whatever the collection will be loaded lazily or not. It does not control how we load it. The default is to use a second select for that, because that tend to be more efficient in the general case, since this avoid the possibility of a Cartesian product. There are other options, of course.

If we just set lazy to false, it means that when we load the entity, we load the collection. The reason that we see the same output from SQL perspective is that we don’t have a time perspective of that. With lazy set to true, the collection will only be loaded in the foreach. With lazy set to true, the collection will be loaded on the Get call.

You are probably interested in outer-join, which we can set to true, which will give us:

<set name="Posts" table="Posts" outer-join="true">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And would result in the following SQL:

Here we get both the blog and its posts in a single query to the server.

The reason that lazy is somewhat complicated is that there are quite a bit of options to select from when choosing the fetching strategy for the collection, and in general, it is suggested that you would not set this in the mapping, because that is usually too generic. It is preferred to control this at a higher level, when you are actually making use of the entities.

5) inverse is something that I talk about extensively here, so I’ll not repeat this.

6) cascade is also something that I already talked about

7) sort gives you a way to sort the values in the collection, by providing a comparator. Note that this is done in memory, not in the database. The advantage is that it will keep thing sorted even for values that you add to the collection in memory.

8) order-by gives you the ability to sort the values directly from the database.

Note that both 7 & 8 does not work with generic sets and that in general, you don’t want to rely on those ordering properties, you want to use the natural properties of the selected collection. Sets are, by definition, unordered set of unique elements. But generic sorted bags does work:

<bag name="Posts" table="Posts" order-by="Title ASC">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</bag>

And would produce in the following SQL:

9) where allow us to use some arbitrary SQL expression to limit the values in the collection. Usually this is used to filter out things like logically deleted rows. Here is a silly example:

<set name="Posts" table="Posts"
		  where="(len(Title) > 6)">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

Which would result in:

Note that there is important subtlety here, if you intend to use this collection with eager loading, you must make sure that your where clause can handle null values appropriately (in the case of an outer join).

10) fetch controls how we get the values from the database. There are three values, select, join and subselect. The default is select, and you are already familiar with it. Setting it to join would result in:

<set name="Posts" table="Posts" fetch="join">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blog = session.CreateCriteria(typeof(Blog))
		.SetMaxResults(5)
		.List<Blog>()[0];
	foreach (var post in blog.Posts)
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

Will give us:

Setting it to subselect will show something quite a bit more interesting:

We have two queries, the first to load the blogs, and the second one:

In this case, we load all the related posts using a subselect. This is probably one of the more efficient ways of doing this. We load all the posts for all the blogs in a single query. That assumes, of course, that we actually want to use all those posts. In the code seen above, this is actually a waste, since we only ever access the first blog Posts collection.

11) batch-size is another way of controlling how we load data from the database. It is similar to fetch, but it gives us more control. Let us see how it actually work in action before we discuss it.

<set name="Posts" table="Posts" batch-size="5">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And this code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blogs = session.CreateCriteria(typeof(Blog))
		.SetMaxResults(30)
		.List<Blog>();
	foreach (var post in blogs.SelectMany(x=>x.Posts))
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

Produces:

Fist we load the blogs, and we load 30 of them. Now, when we access any of the unloaded collections, something very interesting is going to happen. NHibernate is going to search for up to batch-size unloaded collections of the same type and try to load them all in a single query. The idea is that we take a SELECT N+1 situation and turn that into a SELECT N/batch-size + 1 situation.

In this case, it will turn a 31 queries situation into a 7 queries situation. And we can increase the batch size a bit to reduce this even further. As usual, we have to balance the difference between local and global optimizations. If we make batch-size too large, we load too much data, if we make it too small, we still have too many queries.

This is one of the reasons that I consider those fancy options important, but not as important as setting the fetching strategy for each scenario independently. That is usually a much better strategy overall.

12) access was already discussed elsewhere.

13) optimistic-lock was already discussed elsewhere.

14) outer-join was discussed above, when we talked about lazy.

Posted Mon, 13 April 2009 11:00:18 AM by Ayende

Filed under: mapping