DarioSantarelli.Blog(this);

[Entity Framework v4] Identity map pattern

Posted by dariosantarelli on November 26, 2011


One of the most important patterns that a good ORM technology should support in order to deal with the object-relational impedance mismatch is the Identity Map pattern. The impedance mismatch is a set of conceptual and technical difficulties that emerge when objects or class definitions are mapped in a straightforward way to database tables or relational schemas, and object identity is just one of them.

What’s an Identity Map?

In Martin Fowler’s book “Patterns of Enterprise Application Architecture”, the Identity Map is defined as a way of ensuring “that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.” If the requested data has already been loaded from the database, the identity map returns the same instance of the already instantiated object; if it has not been loaded yet, it loads it and stores the new object in the map. In this respect it follows a principle similar to lazy loading. As a result, the Identity Map design pattern introduces a consistent way of querying and persisting objects (e.g. through a context-specific in-memory cache), which prevents applications from loading duplicate copies of the same object data from the database.
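To make Fowler’s definition concrete, here is a minimal sketch of an identity map in C#. It is not Entity Framework code: IdentityMap, Get and the loader delegate are hypothetical names chosen for illustration, and the “database” is simply whatever function is passed to the constructor.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, minimal Identity Map (illustration only, not EF code).
// Each key is loaded at most once; later requests return the instance
// already held in the map.
public sealed class IdentityMap<TKey, TEntity> where TEntity : class
{
    private readonly Dictionary<TKey, TEntity> _loaded = new Dictionary<TKey, TEntity>();
    private readonly Func<TKey, TEntity> _loadFromDatabase; // hypothetical data-access delegate

    public IdentityMap(Func<TKey, TEntity> loadFromDatabase)
    {
        _loadFromDatabase = loadFromDatabase ?? throw new ArgumentNullException(nameof(loadFromDatabase));
    }

    public TEntity Get(TKey key)
    {
        // "Looks up objects using the map when referring to them"
        TEntity existing;
        if (_loaded.TryGetValue(key, out existing))
            return existing;

        // Not loaded yet: load it once and remember the instance
        TEntity loaded = _loadFromDatabase(key);
        _loaded.Add(key, loaded);
        return loaded;
    }
}

public sealed class Customer
{
    public string CustomerId { get; set; }
    public string Email { get; set; }
}
```

With this map, two calls to Get("dsantarelli") invoke the loader only once and return the same reference, which is the property the tests in this post check with Assert.IsTrue(customer1 == customer2).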

Ok, in order to better understand this concept, let’s start from a non-Identity Map example. If we have an application that uses a simple persistence layer that performs a database query and then materializes one or more objects, we might see code that creates different instances of the same logical entity:

[TestMethod]
public void Non_IdentityMap_Solution_Provides_Different_Copies_Of_The_Same_Customer()
{
    Customer customer1 = DAL.Customers.GetCustomerById("dsantarelli");
    Customer customer2 = DAL.Customers.GetCustomerById("dsantarelli");

    // customer1 and customer2 should represent the same customer...         
    Assert.AreEqual(customer1.CustomerId, customer2.CustomerId);
    Assert.AreEqual(customer1.Email, customer2.Email);

    // ... but they are two separate instances!     
    Assert.IsFalse(customer1 == customer2);

    // If we change a property of customer1... 
    customer1.Email = "xxx@yyy.zzz";

    // ... then, which instance should be valid?  
    Assert.AreNotEqual(customer1.Email, customer2.Email);
}

In this example, customer1 and customer2 both contain separate copies of the data for the same customer. If we change the data in customer1, the change has no effect on customer2. If we make changes to both and then save them back to the database, one just overwrites the changes of the other. That’s because our persistence framework just doesn’t know that customer1 and customer2 both contain data for the same logical entity.

Conclusion: multiple objects containing data for the same entity lead to concurrency problems when it’s time to save data.
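The lost update described above can be reproduced without a real database. The sketch below is a hypothetical, in-memory stand-in for the DAL used in the test (NaiveDal, CustomerRow and Save are invented names): every read returns a fresh copy of the row and every save writes the whole row back.

```csharp
using System.Collections.Generic;

public sealed class CustomerRow
{
    public string CustomerId { get; set; }
    public string Email { get; set; }
    public string ContactName { get; set; }
}

// Hypothetical non-Identity-Map data access layer (illustration only).
public static class NaiveDal
{
    // In-memory stand-in for the Customers table
    public static readonly Dictionary<string, CustomerRow> Table = new Dictionary<string, CustomerRow>();

    public static CustomerRow GetCustomerById(string id)
    {
        CustomerRow row = Table[id];
        // A brand-new copy on every call: nothing remembers what was already loaded
        return new CustomerRow { CustomerId = row.CustomerId, Email = row.Email, ContactName = row.ContactName };
    }

    public static void Save(CustomerRow customer)
    {
        // Whole-row update: whatever this copy holds overwrites the stored row
        Table[customer.CustomerId] = new CustomerRow
        {
            CustomerId = customer.CustomerId,
            Email = customer.Email,
            ContactName = customer.ContactName
        };
    }
}
```

If customer1 changes Email, customer2 changes ContactName, and both are saved in that order, the table ends up with customer2’s stale Email: customer1’s change is silently lost.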

How does Entity Framework approach the Identity Map pattern?

Now let’s have a look at the Identity Map way! In the unit test below, we have some Entity Framework code in which three different object queries are executed in order to get data for the same customer:

[TestMethod]
public void EF_IdentityMap_Solution_Provides_References_To_The_Same_Instance_Of_Customer()
{
    using (EFContext context = new EFContext())
    {
        Customer customer1 = context.Customers.Single(c => c.CustomerId == "dsantarelli");
        Customer customer2 = context.Customers.Single(c => c.Email == "dario@santarelli.com");
        Customer customer3 = context.Customers.First(c => c.ContactName == "Dario Santarelli");

        // The three queries above should return the same customer.
        // So, customer 1,2 and 3 are references to the same instance of Customer.
        Assert.IsTrue(customer1 == customer2);
        Assert.IsTrue(customer2 == customer3);

        // Now if we change a property of customer1...
        customer1.Email = "xxx@yyy.zzz";

        // ... then customer 1,2 and 3 still remain valid references to the same instance of Customer.
        Assert.AreEqual(customer1.Email, customer2.Email);
        Assert.AreEqual(customer2.Email, customer3.Email);
    }
}

As you can see, all three variables now reference the same object. Moreover, when we change a property on customer1, we see the same change on customer2 and customer3. In fact, they’re all references to a single object that is managed by EF’s ObjectContext. Behind the scenes, EF ensures that only one entity object is created for a given entity, and every subsequent load just returns another reference to that one object, regardless of how many times, or in how many different ways, we load it. This behavior is compliant with the Identity Map pattern!

The key is EntityKey

So how does this work? First of all, every entity type has a key that uniquely identifies that entity.

If your Customer entity inherits from EntityObject (the base class for all data classes generated by the Entity Data Model tools) or simply implements the IEntityWithKey interface, you’ll notice in the debugger that Customer has a property named EntityKey that EF created for you (it corresponds to the primary key in the database). EntityKey contains all the information the ObjectContext needs in order to maintain an Identity Map. You could think of the map as a “cache” that contains only one instance of each object, identified by its EntityKey.

REMEMBER: Entity Framework v4 does not require a custom data class to implement IEntityWithKey; in particular, POCO entities don’t need to. For them, the context keeps track of the EntityKey in the entity’s ObjectStateEntry.
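The idea behind EntityKey can be sketched as a value object: the entity set name plus the key name and value, compared by value rather than by reference, so that it can serve as the lookup key of the map. EntityKeySketch below is a hypothetical illustration, not the real System.Data.EntityKey class (which also records the entity container name and supports composite keys).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the idea behind EF's EntityKey (NOT System.Data.EntityKey):
// identity = entity set + key name + key value, compared by value.
public sealed class EntityKeySketch : IEquatable<EntityKeySketch>
{
    public EntityKeySketch(string entitySetName, string keyName, object keyValue)
    {
        EntitySetName = entitySetName;
        KeyName = keyName;
        KeyValue = keyValue;
    }

    public string EntitySetName { get; }
    public string KeyName { get; }
    public object KeyValue { get; }

    public bool Equals(EntityKeySketch other)
    {
        return other != null
            && string.Equals(EntitySetName, other.EntitySetName, StringComparison.Ordinal)
            && string.Equals(KeyName, other.KeyName, StringComparison.Ordinal)
            && object.Equals(KeyValue, other.KeyValue);
    }

    public override bool Equals(object obj) => Equals(obj as EntityKeySketch);

    // Must agree with Equals so the key works inside a Dictionary-based map
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (EntitySetName?.GetHashCode() ?? 0);
            hash = hash * 31 + (KeyName?.GetHashCode() ?? 0);
            hash = hash * 31 + (KeyValue?.GetHashCode() ?? 0);
            return hash;
        }
    }
}
```

Two keys built from the same set, key name and value are equal and hash alike, so a Dictionary<EntityKeySketch, object> holds at most one instance per entity; a key for a different entity set is a different identity even if the value matches.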

In the previous example, when we get customer1 from our context, by default EF runs the query, creates an instance of Customer (uniquely identified by its key CustomerId), stores that object in the cache, and gives us back a reference to it. When we get customer2 from the context, the context does run the query again and pulls data from our database, but then it sees that it already has a customer entity with the same EntityKey in the cache so it throws out the data and returns a reference to the entity that’s already in cache. The same thing happens for customer3.
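The sequence just described (run the query, look the materialized key up in the map, discard the fresh row if the key is already tracked) can be imitated with a small in-memory toy. ToyContext and its members are hypothetical, and the QueriesExecuted counter simply stands in for round trips to the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Customer
{
    public string CustomerId { get; set; }
    public string Email { get; set; }
    public string ContactName { get; set; }
}

// Hypothetical toy imitating the default behaviour described in the post:
// every query runs against the data source, then each row is resolved by key.
public sealed class ToyContext
{
    private readonly List<Customer> _rows; // stand-in for the Customers table
    private readonly Dictionary<string, Customer> _identityMap = new Dictionary<string, Customer>();

    public ToyContext(IEnumerable<Customer> rows)
    {
        _rows = rows.ToList();
    }

    public int QueriesExecuted { get; private set; }

    public Customer Single(Func<Customer, bool> predicate)
    {
        QueriesExecuted++;                  // the query always hits the "database"
        Customer row = _rows.Single(predicate);

        Customer tracked;
        if (_identityMap.TryGetValue(row.CustomerId, out tracked))
            return tracked;                 // key already in the map: discard the fresh row

        // First time this key is seen: materialize a new instance and track it
        var entity = new Customer { CustomerId = row.CustomerId, Email = row.Email, ContactName = row.ContactName };
        _identityMap.Add(entity.CustomerId, entity);
        return entity;
    }
}
```

Looking the same customer up by CustomerId, Email and ContactName (mirroring the earlier EF test) yields one instance but three executed queries.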

So how many database queries will EF perform if we write something like this?

Customer customer1 = context.Customers.Single(c => c.CustomerId == "dsantarelli");
Customer customer2 = context.Customers.Single(c => c.CustomerId == "dsantarelli");
Customer customer3 = context.Customers.Single(c => c.CustomerId == "dsantarelli");

The answer is: three.

Wait… if there’s a cache, why is it performing three queries? The second part of Martin Fowler’s definition of Identity map says “… looks up objects using the map when referring to them”. An obvious question is: if I’m loading an object that already exists in my cache, and EF is just going to return a reference to that cached object and throw away any changes it gets from the database query, can’t I just get the object directly from my cache and skip the database query altogether? That could really reduce database load.

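The answer is: you can get an entity directly from the cache without hitting the database, but only if you use a special method that looks the entity up by its EntityKey. ObjectContext.TryGetObjectByKey first checks the entities already tracked by the context and queries the data source only if the entity isn’t there. Here’s an example: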

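// EntityKey (System.Data namespace in EF v4): qualified entity set name, key name, key value
EntityKey entityKey = new EntityKey("EFContext.Customers", "CustomerId", "dsantarelli");
object customerObj;
if (context.TryGetObjectByKey(entityKey, out customerObj))
{
    // The customer was found: returned from the cache if already tracked,
    // otherwise loaded from the data source
    Customer customer = (Customer)customerObj;
}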

What if we don’t know the actual value of an EntityKey? Well, then we can’t use this feature.

In fact, having to use the EntityKey is a big limitation, since most of the time you want to look up data by some other field rather than by the primary key, which could be a Guid or some other value you have no way of knowing in advance.
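A toy version of the key-based lookup makes the trade-off visible: a key that is already tracked is answered from the map with no round trip, while a miss still has to go to the data source. KeyLookupContext and TryGetByKey are hypothetical names; the map-first, query-on-miss order reflects the documented behaviour of ObjectContext.TryGetObjectByKey, but this is not EF code.

```csharp
using System.Collections.Generic;

public sealed class Customer
{
    public string CustomerId { get; set; }
    public string ContactName { get; set; }
}

// Hypothetical toy: key lookups consult the identity map first and only
// query the data source on a miss (illustration only).
public sealed class KeyLookupContext
{
    private readonly Dictionary<string, Customer> _table; // stand-in for the database, keyed by CustomerId
    private readonly Dictionary<string, Customer> _identityMap = new Dictionary<string, Customer>();

    public KeyLookupContext(Dictionary<string, Customer> table)
    {
        _table = table;
    }

    public int QueriesExecuted { get; private set; }

    public bool TryGetByKey(string customerId, out Customer customer)
    {
        if (_identityMap.TryGetValue(customerId, out customer))
            return true;                        // already tracked: no database round trip

        QueriesExecuted++;                      // miss: query the data source
        Customer row;
        if (!_table.TryGetValue(customerId, out row))
        {
            customer = null;
            return false;
        }

        customer = new Customer { CustomerId = row.CustomerId, ContactName = row.ContactName };
        _identityMap.Add(customerId, customer); // track it for the next lookup
        return true;
    }
}
```

Note that the map is indexed only by key: a lookup by ContactName or Email has no equivalent shortcut, because nothing guarantees the map contains every matching row.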

Identity Map and MergeOptions

Now two interesting questions:

Can I customize the strategy that EF uses to compare the datasource values and the cache entities values?
What happens to cached entities when the underlying database rows change?

Suppose we have the following code:

Customer customer1 = context.Customers.Single(c => c.CustomerId == "dsantarelli");
// Now someone changes the customer1 record in the DB!!! 
Customer customer2 = context.Customers.Single(c => c.CustomerId == "dsantarelli");

After customer1 is loaded, someone changes the record in the DB. Will customer2 have the original values or the new values? Remember that customer1 and customer2 are references to the same entity object in the cache, and our first database hit, when we got customer1, pulled the original values; but the query for customer2 also hit the database and pulled data. How does EF handle that? The answer is: it depends on the MergeOption used by the query (in the tests below it’s set through the MergeOption property of the ObjectSet). The possible options are:

AppendOnly (default): it simply throws the new data away. If an object is already in the context, the current and original values of the object’s properties in the entry are not overwritten with data source values. The state of the object’s entry and the state of the object’s properties do not change, and the Identity Map is preserved. Here’s a test example:

[TestMethod]
public void EF_AppendOnly_MergeOption_Throws_NewData_Away()
{
    using (EFContext context = new EFContext())
    {
        context.Customers.MergeOption = MergeOption.AppendOnly;

        Customer customer1 = context.Customers.Single(c => c.CustomerId == "dsantarelli");
        Assert.AreEqual("Dario Santarelli", customer1.ContactName);

        // Now someone changes the customer1 record in the DB
        // by setting ContactName = "Luigi Santarelli" !!!
        // (ChangeDBRecord is a test helper, not shown, that updates the row directly in the database)
        ChangeDBRecord("dsantarelli", "Luigi Santarelli");

        Customer customer2 = context.Customers.Single(c => c.CustomerId == "dsantarelli");

        Assert.IsTrue(customer1 == customer2); // They are references to the same Customer instance (Identity Map)
        Assert.AreEqual("Dario Santarelli", customer2.ContactName); // Original values win!
    }
}

OverwriteChanges: unlike the AppendOnly option, it applies the new data. If an object is already in the context, the current and original values of the object’s properties in the entry are overwritten with data source values, discarding any changes we made in the meantime. The Identity Map principle is still preserved.

[TestMethod]
public void EF_OverwriteChanges_MergeOption_Applies_NewData()
{
    using (EFContext context = new EFContext())
    {
        context.Customers.MergeOption = MergeOption.OverwriteChanges;

        Customer customer1 = context.Customers.Single(c => c.CustomerId == "dsantarelli");
        Assert.AreEqual("Dario Santarelli", customer1.ContactName);

        // Now someone changes the customer1 record in the DB
        // by setting ContactName = "Luigi Santarelli" !!!
        ChangeDBRecord("dsantarelli", "Luigi Santarelli");

        Customer customer2 = context.Customers.Single(c => c.CustomerId == "dsantarelli");

        Assert.IsTrue(customer1 == customer2); // They are references to the same instance (Identity Map)
        Assert.AreEqual("Luigi Santarelli", customer2.ContactName); // New values win!
    }
}

NoTracking: in this scenario, objects are not tracked in the ObjectStateManager. Each time we query the DB for a customer, EF materializes a new instance of the Customer class. So, in this case, the Identity Map principle is broken (there are clear analogies with the non-Identity Map solution presented at the beginning of this post).

[TestMethod]
public void EF_NoTracking_MergeOption_Applies_NewData_And_Provides_Different_Copies_Of_The_Same_Customer()
{
    using (EFContext context = new EFContext())
    {
        context.Customers.MergeOption = MergeOption.NoTracking;

        Customer customer1 = context.Customers.Single(c => c.CustomerId == "dsantarelli");
        Assert.AreEqual("Dario Santarelli", customer1.ContactName);

        // Now someone changes the customer1 record in the DB
        // by setting ContactName = "Luigi Santarelli" !!!
        ChangeDBRecord("dsantarelli", "Luigi Santarelli");

        Customer customer2 = context.Customers.Single(c => c.CustomerId == "dsantarelli");

        Assert.IsFalse(customer1 == customer2); // They are NOT references to the same instance (NO Identity Map)
        Assert.AreEqual("Dario Santarelli", customer1.ContactName); // customer1 has the original values
        Assert.AreEqual("Luigi Santarelli", customer2.ContactName); // customer2 has the new values
    }
}

PreserveChanges: this option is a compromise between the AppendOnly and the OverwriteChanges options.

  • If we don’t change any property of our entity (i.e. the state of the entity is Unchanged), the current and original values in the entry are overwritten with data source values. The state of the entity remains Unchanged and no properties are marked as modified.
  • If we change a property of our entity (i.e. the state of the entity is Modified), the current values of modified properties are not overwritten with data source values. The original values of unmodified properties are overwritten with the values from the data source.
  • Entity Framework v4 compares the current values of unmodified properties with the values that were returned from the data source. If the values are not the same, the property is marked as modified.
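The four merge options, including the three PreserveChanges rules above, can be summarised as a toy merge step. This is a hypothetical sketch of one reading of the rules as stated in this post, not Entity Framework’s implementation: ToyEntry, ToyMerger and ToyMergeOption are invented names, and an entry is modelled simply as original values, current values and a set of modified property names (all dictionaries are assumed to share the same property names).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mirror of the four options (not System.Data.Objects.MergeOption).
public enum ToyMergeOption { AppendOnly, OverwriteChanges, PreserveChanges, NoTracking }

// Hypothetical, simplified state entry: original values, current values, modified properties.
public sealed class ToyEntry
{
    public ToyEntry(IDictionary<string, object> dataSourceValues)
    {
        Original = new Dictionary<string, object>(dataSourceValues);
        Current = new Dictionary<string, object>(dataSourceValues);
        ModifiedProperties = new HashSet<string>();
    }

    public Dictionary<string, object> Original { get; }
    public Dictionary<string, object> Current { get; }
    public HashSet<string> ModifiedProperties { get; }
    public bool IsModified => ModifiedProperties.Count > 0;

    // Client-side change: update the current value and mark the property as modified
    public void SetProperty(string name, object value)
    {
        Current[name] = value;
        ModifiedProperties.Add(name);
    }
}

public static class ToyMerger
{
    // Merges freshly queried values into a tracked entry and returns
    // the entry the query would hand back to the caller.
    public static ToyEntry Merge(ToyEntry tracked, IDictionary<string, object> dataSourceValues, ToyMergeOption option)
    {
        switch (option)
        {
            case ToyMergeOption.AppendOnly:
                return tracked;                               // new data is thrown away

            case ToyMergeOption.NoTracking:
                return new ToyEntry(dataSourceValues);        // a new, separate copy every time

            case ToyMergeOption.OverwriteChanges:
                foreach (KeyValuePair<string, object> p in dataSourceValues)
                {
                    tracked.Original[p.Key] = p.Value;
                    tracked.Current[p.Key] = p.Value;
                }
                tracked.ModifiedProperties.Clear();           // client changes are discarded
                return tracked;

            case ToyMergeOption.PreserveChanges:
                bool entityWasModified = tracked.IsModified;
                foreach (KeyValuePair<string, object> p in dataSourceValues)
                {
                    if (!entityWasModified)
                    {
                        // Unchanged entity: overwrite current and original values
                        tracked.Original[p.Key] = p.Value;
                        tracked.Current[p.Key] = p.Value;
                    }
                    else if (!tracked.ModifiedProperties.Contains(p.Key))
                    {
                        // Modified entity, unmodified property: overwrite the original value,
                        // then mark the property modified if its current value differs
                        tracked.Original[p.Key] = p.Value;
                        if (!object.Equals(tracked.Current[p.Key], p.Value))
                            tracked.ModifiedProperties.Add(p.Key);
                    }
                    // Modified property: its current value is preserved
                }
                return tracked;

            default:
                throw new ArgumentOutOfRangeException(nameof(option));
        }
    }
}
```

In this toy, the third rule has a visible side effect: if the client changed only Email while the database changed ContactName, ContactName ends up marked as modified because its current value no longer matches the value just read from the data source.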

So, let’s see this behavior in a test…

[TestMethod]
public void EF_PreserveChanges_MergeOption_Preserves_Client_Changes()
{
    using (EFContext context = new EFContext())
    {
        context.Customers.MergeOption = MergeOption.PreserveChanges;

        Customer customer1 = context.Customers.Single(c => c.CustomerId == "dsantarelli");
        Assert.AreEqual("Dario Santarelli", customer1.ContactName);

        customer1.ContactName = "Carlo Santarelli"; // We change the ContactName in memory

        // Now someone changes the customer1 record in the DB
        // by setting ContactName = "Luigi Santarelli" !!!
        ChangeDBRecord("dsantarelli", "Luigi Santarelli");

        Customer customer2 = context.Customers.Single(c => c.CustomerId == "dsantarelli");

        Assert.IsTrue(customer1 == customer2); // They are references to the same instance (Identity Map)
        Assert.AreEqual("Carlo Santarelli", customer2.ContactName); // Our changes are preserved!
    }
}

HTH


Posted in Entity Framework