NHibernate Code Base Analysis

Patrick Smacchia writing. I am not a NH developer but the creator of a static analysis tool for .NET developer: NDepend. I recently analyzed NH v3.0.0 Candidate Release 1 with NDepend and I had a chance to discuss some results with NH developer Fabio Maulo. Fabio suggested me to show some results on the NH blog, so here it is.

NDepend generated a report by analyzing NH v3.0.0 CR1 code base. See the report here. NDepend has also the ability to show static analysis results live, inside Visual Studio. The live results are richer than the static report results. Here, I will mostly focus on results extracted from the report, but a few additional results will be obtained from the richer NDepend live capabilities.

Code Size

NH code base weights almost 63K Lines of Code (LoC as defined here). Developers hates LoC as a productivity yardstick measurement, but it doesn't mean that the LoC code metric is useless. LoC represents a great way to compare code base size and gets an idea of the overall development effort. In the report namespace metrics section, we can see that the namespace NHibernate.Hql.Ast.ANTLR.* generated by ANTLR weights around 18K LoC. So we can consider that NH handcrafted code weights 45 LoC. Now we have a number to compare to the 19K LoC of NUnit, the 28K LoC of CC.NET, the 32K LoC of Db4o, the 110K LoC of NDepend, the roughly 130 KLoC of Llblgen, the roughly 500K LoC (or so) of R# (that certainly contains a significant portion of generated code) and the roughly 2M LoC of the .NET Fx 4.

So not only NH is one of the most successful OSS initiative, it is also one of the biggest OSS code base. To quote one NH contributor, NH is a big beast!

Assembly Partitioning

NH is packaged in a single NHibernate.dll assembly. I am a big advocate of reducing the number of assemblies and one assembly seems an ideal number. This way:

Projects consumers of NH just need to link, maintain the reference to just one assembly. This is a very good thing compared to many other OSS Fx that force to reference, maintain many assemblies.
Compilation time is much (much) faster. Compilation time of one single VS project can be easily 10 times faster than the compilation time of the same code base partitioned in many VS projects.
Startup-time of an application using NH is a bit faster. Indeed, the CLR comes with a slight overhead for each extra assemblies to load at runtime.

On the dependency graph or dependency matrix diagrams of the report, I can see that the NH assembly is linking 3 extra assemblies that needs to be redistributed as well: Antlr3.Runtime, Remotion.Data.Linq, and Iesi.Collections.

Code Coverage and NH Code Correctness

The report shows the number 75.93% code coverage ratio. This is an excellent score, especially taken account the large code size. I consider code coverage ratio as the queen the of the code quality metrics. The higher it is, the less likely it is to release a bug in production. However things are not so simple. High code coverage ratio matters if (and only if) the number of checks performed while running unit tests is also high. These checks are usually done in test code (through API like Assert.IsTrue(...) ). But few developers realize that checks have the same value if they are done in the code tested itself through the API Debug.Assert(...) or through the new Microsoft Code Contract API. The two important things is that checks (or contract if you prefer) must not slow down execution, and must fail abruptly when the condition is violated.

I can quickly see that NH doesn't use Debug.Assert(...) nor the new Microsoft Code Contract API. But on the other hands I can see that NH comes with 2735 unit tests, all successfully executed. This significant number, coupled with the 75,93% code coverage ratio, advocate for an excellent testing plan for NH. To quote one NH contributor I talked with once: NH is very hard to break! (but by using code contracts and striving for an even higher code coverage ratio it would be even harder to break).

An another and obvious reason why NH code is rock solid, is related to the huge NH community size, that can be counted in hundred of thousands of developers and projects. In this condition, any bug has very few chances to live for a long time.

Code Architecture

Most of .NET developers consider (wrongly IMHO) that .NET code must be componentized through .NET assembly (meaning through VS projects). As discussed above, having very few assemblies comes with important benefits. The essential point is that assemblies are physical artifacts while components are logical artifacts. Hence assembly partitioning must be driven by physical reasons (like lazy-code loading or an addin system).

Nevertheless a 63K LoC code base needs a solid architecture. A solid architecture is the key for high code maintainability. How to define components in .NET code? Personally my preference goes to the usage of namespaces to define component. This way of doing comes wit many advantages: namespaces are logical artifacts, namespaces can be structured hierarchically, architecture explorer tooling can deal out-of-the-box with namespaces, namespaces are supported at language-level and namespaces can be used to draw explicit and concrete boundaries.

In a framework such as NH, namespaces are essentially used to organize the public API. This way of doing is not incompatible with componentizing the code through namespaces. But in the case of NH, the project inherited the API structure of the Hibernate project in the Java sphere. The former Hibernate project doesn't rely on code componentization through namespaces, so NH doesn't as well. And there is no hope for any refactoring : this would result in a fatal tsunami of breaking changes in the NH public API.

So NH code base has no obvious (at least to me) nor explicit componentization. I know there are architecture guidelines that NH contributors must learn, understand and follow, but sitting outside of the project, I cannot easily figure them out.

Code Quality

If you look back at the report, you'll see many typical Code Quality rules violated. As said, I consider Code Coverage ratio as the queen of code quality rules, but that doesn't mean that other code quality metrics don't matter. So I can see through the rule Methods too complex - critical (ILCyclomaticComplexity) two dozens of awfully complex methods. Most of them seems to be generated by ANTLR . So there is room here to refine the NDepend Code Query Rule to exclude this generated code, like for example...

// <Name>Methods too complex - critical (ILCyclomaticComplexity)</Name>
WARN IF Count > 0 IN SELECT METHODS
OUT OF NAMESPACES "NHibernate.Hql.Ast.ANTLR" WHERE
  ILCyclomaticComplexity > 40 AND
  ILNestingDepth > 4
  ORDER BY ILCyclomaticComplexity DESC

...and see than now only 3 handcrafted methods are matched (one of those, NHibernate.Cfg.Configuration.GenerateSchemaUpdateScript(Dialect,DatabaseMetadata) has 49 lines of code, a Cyclomatic Complexity of 25 and is 87% covered by tests).

The rule violated Methods with too many parameters - critical (NbParameters) is more a concern since we can see here a redundant code smell of having many constructors with plenty of parameters (up to 22 parameters for the ctor of the class NHibernate.Cfg.Mappings).

The rule violated Type should not have too many responsibilities (Efferent Coupling) seems to me another concern. It exhibits several god classes, meaning classes with too many responsibilities. Here NDepend bases its measure on the Efferent Coupling code metric, that represents, the number of other types a type is using. The notion of class responsibility is a bit abstract, it is often translated to the tenet: a class should have only one reason to change which is still abstract in my opinion. Obviously the higher the Efferent Coupling, the more likely a class has too many responsibilities. God classes often result from the lack of refactoring during project evolution, iterations after iterations. The god class represented an initial clear concept that has evolved without appropriate refactoring, and developers got used to live with this code smell. In the context of a public framework such as NH, refactoring a public god class or interface might be not and option if this implies unacceptable API public breaking changes.

The rule violated Complex methods should be 100% covered by tests exhibits a few hundreds of relatively complex methods not thoroughly covered by tests. Here also a lot of these methods belong to NHibernate.Hql.Ast.ANTLR and by filtering them, we still have more than 200 matches. This fact is a concern because having high code coverage ratio is not enough. What is important is to have a lot of methods and classes, 100% covered by tests. Indeed, empirically I noticed that: code that is hard to test is often code that contains subtle and hard to find bugs. Unfortunately, the 10% code hard to test is the code that demands more than 50% of test writing resources.

We could continue to enumerate one by one Code Quality rules violated. The truth is that any sufficiently large code base contains thousands of violation of most basic code quality rules. An important decision must be taken to care for code quality before the code becomes so messy that it discourage developers to work on it (and to be honest, I had feedback from two NH contributors that left the project, partly for that reason). Once again, the NH situation here is more the rule than the exception and I'd say that if you are a real-world developer yourself, there are 9 chances on 10 that you are not satisfied by the code quality of the everyday code base you are working on. The problem when deciding to begin to care for code quality is that tooling like NDepend or FxCop reports literally thousands of flaws. However, a tool like NDepend makes things easier through its support for baseline. Concretely one can decide to continuously compare the code base against, say, the last release, and then fix flaws only on code refactored or added since the baseline. This way the team follow the rule if it's not broken don't fix it and it achieves better and better code quality without significant effort. Concretely a CQL rule that should take account of the baseline can be refactored as easily as:

// <Name>From now, all methods added or refactored should not be too complex</Name>
WARN IF Count > 0 IN SELECT METHODS WHERE

// Match methods new or modified since Baseline for Comparison...
(WasAdded OR CodeWasChanged) AND

// ...that are too complex
CyclomaticComplexity > 10

Code Evolution

And this was a good transition to the last part I'd like to comment: Code Diff. As said NDepend can compare 2 versions of a code base and in the report we compared NH v3.0.0.CR1 with v2.1.2.GA. The rule API Breaking Changes: Types seems to exhibit a few matches:

// <Name>API Breaking Changes: Types</Name>
WARN IF Count > 0 IN SELECT TYPES
WHERE IsPublic AND (VisibilityWasChanged OR WasRemoved)

Types like NHibernate.Dialect.SybaseAnywhereDialect, NHibernate.Cache.ISoftLock or NHibernate.Cfg.ConfigurationSchema.ClassCacheUsage were public types that have either be removed, renamed, or set to internal types. Also we can see that some public interfaces such as, NHibernate.Proxy.IProxyFactory or NHibernate.Hql.IQueryTranslator have been changed. This can break client code if these interfaces were meant to be implemented by clients.

In the Code diff report section, the query Public Types added and also Namespaces added represent a mean to list new features added to NH v3.

// <Name>Public Types added</Name>
SELECT TYPES WHERE WasAdded AND IsPublic

Here, we mostly see the prominent new NH v3 linq capabilities through the numerous NHibernate.Linq.* namespaces added, and we can also focus on the many secondary featurettes like NHibernate.SqlTypes.XmlSqlType or NHibernate.Transaction.AdoNetWithDistributedTransactionFactory.

Posted Fri, 26 November 2010 03:40:00 AM by Patrick Smacchia

Filed under: News, NHibernate