Monday, August 20, 2012

How to Improve Performance of Entity Framework Query ?

What is Eager Lording?
  • When using Eager Loading, the related entities are loaded along with your target entity set
  • You use an Include statement in your query to indicate which related entities you want to bring in

Bad way of using Eager Loading
  • Single Call for Database Server with Many Include - BAD

public Invoice GetInvoice(string providerKey, string ownerKey, Guid id)
        {
            return (from owner in Catalog.Owners
                    where owner.Key == ownerKey
                    from invoice in owner.Invoices
                    where invoice.Provider.Key == providerKey
                    where invoice.Id == id
                    select invoice)
                    .Include(i => i.Owner.Credits)
                    .Include(i => i.Provider)
                    .Include(i => i.Items.Select(s => s.Allocation.Service))
                    .Include(i => i.Items.Select(s => s.Allocation.Pet))
                    .FirstOrDefault();
        }


Before breaking the query: Time for load the data  ~ 17 s

Before breaking the query

Best way of using Eager Loading
  • 2 or More Small Data Request from Database Server - BEST

public Invoice GetInvoice(string providerKey, string ownerKey, Guid id)
        {
            //split the eager method(Include) for better performances 
            var invoiceObject = (from owner in Catalog.Owners
                                 where owner.Key == ownerKey
                                 from invoice in owner.Invoices
                                 where invoice.Provider.Key == providerKey
                                 where invoice.Id == id
                                 select invoice)
                                 .Include(i => i.Owner.Credits)
                                 .Include(i => i.Provider)
                                 .FirstOrDefault();

            invoiceObject = (from owner in Catalog.Owners
                             where owner.Key == ownerKey
                             from invoice in owner.Invoices
                             where invoice.Provider.Key == providerKey
                             where invoice.Id == id
                             select invoice)
                             .Include(i => i.Items.Select(s => s.Allocation.Service))
                             .Include(i => i.Items.Select(s => s.Allocation.Pet))
                             .FirstOrDefault();

                 return invoiceObject;
        }

After breaking the query: Time for load the data ~ 0.038 s

After breaking the query

Performance Boost ~ Over 400 times than before

I have used JetBrains dotTrace Performance Tool for do the above Testing.

Do you Need to know more about JetBrains dotTrace ?


JetBrains dotTrace


Are You an Entity Framework Lover ?


 Entity Framework 5



Conclusion
  • If you would like to use Eager loading method for data loading with entity framework then use brake query method
  • Don’t use more than 2 Includes on single query (MAX = 2 Includes per Query)
  • Break them as above for better performances
  • It will give Over 400X performance gain

P.S.

In detail Answers for Comments Posted by Anonymous September 6, 2012 12:23 AM and
Anonymous September 9, 2012 7:27 AM Developers :

Part 1 : 

You can see that when I ran the first part of the invoiceObject by using debuger it generated Dynamic Proxy for Provider object But Not for the Items List object. 


Dynamic Proxy for Provider object







                            
Part 2 : 

In below screen you can see it Generated Data for Items List object But Nothing Happened to the Provider Object which was generated earlier (Your override concept is not happening here.Because 2 Sub queries are Merged by using Union. Theses are not happening like basic programming concepts.Its generating T-Sql by using Dynamic Proxies.).Because of that in this methodology (BEST) is giving Same Out Put as Earlier one (BAD) but Very Very Speedy way.(400 times faster than BAD scenario) 


Nothing Happened to the Provider Object













 
                       

Happy Coding.

May Be Use Full To You :

31 comments:

  1. Very nice Article.Thumps up.

    ReplyDelete
  2. Yes, You correct

    ReplyDelete
  3. Hi,

    Your code seems to have a problem :
    you override your object "invoiceObject" in the second query. You won't get the same results as the first one, because you don't iterate on the same collection.

    Since you don't compare the same queries, you cannot state that the second one is faster than the first one.

    ReplyDelete
    Replies
    1. Hi,
      Here your wrong.B'cos this solution is running on production and is giving same out put as before.There are 2 Trick here.

      1. Second option(BEST) goes data base 2 times and getting much lighter data from database.But on first scenario(BAD) it goes data base once and bring much bigger data.so because of that it consume more time than 2nd scenario(BEST).

      2.when doing 1st scenario (BAD) it generates so many outer joins for generate the T-sql needed for database processing(approximately 2230 T-SQL lines).But on second scenario(BEST) it's around 60 lines per call(120 in all).

      If you need to investigate this T-Sql query generation for EF then you can see my "How to Convert Linq Entity Query into T-SQL" blog post by using below link

      http://sampathloku.blogspot.com/2012/08/how-to-convert-linq-entity-query-into-t.html

      If you need further clarifications then let me know.

      Thanks.

      Delete

  4. Hi SampathLoku,

    the person above is right, your code contains a mistake. As he/she mentioned, the it's not about the performance, it's about the result sets. Both your codes (BAD and BEST) cannot return the same results, and here's why:

    1. In the first part of your "BEST" code sample, you perform a LINQ query to get an invoice object including some parameters ( "Owner.Credits" and "Provider" ), which you assign to the "invoiceObject" object.

    2. In the second part, you perform the same LINQ query as in the first part, this time including "Allocation.Service" and "Allocation.Pet" parameters, which you also assign to the same "invoiceObject" object, and therefore you overwrite the content of the "invoiceObject" that you get in the first part of the query.

    THE PROBLEM: the issue is that you are not using the content of the "invoiceObject" object in the second part of the query (THERE IS NO OCCURRENCE OF THE "invoiceObject" OBJECT IN THE SECOND PART OF THE QUERY). Since this is the basic of programming, you are losing the result of the first query in the second query.

    Thus, in your BEST solution, when you return from the GetInvoice() method, your result doesn't contain the "Owner.Credits" and "Provider" values.

    Now, it may be due to a copy/paste error...

    T.

    ReplyDelete
    Replies
    1. Hi,

      Your explanation is Wrong here.Please see the P.S. Section of above post for more details.If you need more clarifications then let me know.

      Thanks for Your comment.

      Delete
  5. Sampath,

    return invoiceObject, always will return the bottom part of the query.thats basic.
    bcoz invoiceObject is overidding.

    anyway appreciate your effort.

    thanks.

    ReplyDelete
    Replies
    1. Hi,

      Your explanation is Wrong here.Please see the P.S. Section of above post for more details.
      Then you can see part 1's provider object doesn't overridden by part 2's operation.Provider object still there.According to your explanation provider object must be null.B'cos second query does not retrieve provider object.
      In EF Linq not overridden the first instance.Its union the both Query output.Try to do your self and then let me know.

      Thanks for Your comment.

      Delete
  6. I'm here from StackOverflow.
    Thanks for your detailed post.

    I'm using VS2012 Ultimate with Azure SQL.
    How can I get the inspection screen like you (link: http://1.bp.blogspot.com/-V_kYZe8ATDg/UDJrx5zu9zI/AAAAAAAAABc/nsyPm1YBz-k/s1600/Before.png )

    ReplyDelete
    Replies
    1. Hi,

      Thanks for your feedback.

      I got that screen by using "JetBrains dotTrace Performance Tool".
      You can get more details about that by using below link:
      http://www.jetbrains.com/profiler/

      Delete
  7. Hi Sampath, I'm trying to understand this post, given that some of the comments say it is not correct.

    Does the returned value of your BEST method contain the "Owner.Credits" and "Provider" values, or not?

    If it does, then can you explain how it is doing that? I would appreciate this, because as I understand it the code *should* be discarding what comes out of the first query, when you assign the second one. Is it caching the first one, somehow, but invisibly? If so, how did you find out?!!

    Thanks, this is very interesting.

    ReplyDelete
    Replies
    1. Hi strumpet,

      Your Q 1 : Does the returned value of your BEST method
      contain the "Owner.Credits" and "Provider"
      values,or not?

      Answer : Yes.It does.(i.e. It's having earlier query
      2 traverse objects (Credits & Provider) when
      comes to the final out put)

      Explanation : Because 2 Sub queries are Merged
      by using Union inside the Application server (IIS).This is done by using EF API. Theses are not happening like basic programming concepts such as override.Its generating T-Sql by using Dynamic Proxies.

      Note 1 : You will have better explanation with images If you can see the P.S. section of the above article.

      Note 2 : I am using This article's Best method scenario for all our live customers EF queries.So no problem about the final out put.The only problem Now there having is lightning fast response for there queries :)

      You also can try this for your queries and feel the difference.If you're having further clarifications plz let me know.
      Keep in Touch.

      Delete
  8. Hi Asava,

    Thanks for the useful link about another ORM API.
    Keep in Touch.

    ReplyDelete
  9. At what point does this become inefficient? If i have 20 includes with one DB roundtrip and i use your method to split it up into 10 roundtrips with 2 includes...at what point does your BEST method become BAD?

    ReplyDelete
    Replies
    1. Hi Skylar,

      Yes.Here You made good question.So My strong suggestion is don't try to use more than 4 includes with a targeted query.If you have 4 (or less) includes then you can brake it with 2 includes per query as above and can get mentioned performance gain.But If you're having more than that you must rethink about your View(UI) and You must redesign your view for better performance.B'cos more round trips to the SQL server is also kill the performance.In conclusion my method also may be BAD when it's having more than 4 includes.I hope you got my point.
      If you're having more queries Plz let me know.

      Delete
  10. can you give a rough estimate of your database row count and including the joined tables?

    ReplyDelete
    Replies
    1. Hi Gary,

      Thanks for your feedback.

      I can provide a rough estimation about the row counts at the time when I was doing above R&D.Please check that.

      Owners = 8,941 Rows
      Invoices = 3,215 Rows
      Services = 14,295 Rows
      Pets = 7,250 Rows
      Providers = 2,010 Rows
      Allocations = 15,142 Rows
      Credits = 44 Rows

      Delete
  11. That's very bad performance compared to custom sql queries and stored procedures, sql server should have not much problem handling millions of rows with proper joins, I think I'd be better off running custom stored procs

    ReplyDelete
    Replies
    1. Hi Gary,

      I don't think your conclusion is right here. Above I showed the one bad way of use eager loading. So if you use EF on proper way you'll have lots of nice and cool features on it.So use EF in the right manner. Please read below article to know more about the performance improvements of the EF.

      http://msdn.microsoft.com/en-us/library/cc853327(v=vs.110).aspx

      Delete
  12. Hi Sampath,

    This article is very informative and makes sense.

    We have a Windows service running on a Windows Server that polls a SQL server database on another Windows server at regular intervals to check for new records on a SQL Database table. If there any new records to be processed, it attempts to fetch the record along with it related entities using INCLUDE statement to proces this new record into a legacy IBM Mainframe server. This windows service uses a Entity Framework query that has 57 includes in this one query and the T-SQL join that it generates is killing the SQL server when we look at in the Activity Monitor. The basic idea behind this big include is to avoid round trips to SQL server by firing this big one big SQL query.

    I have tried to break this query based on your suggestion into smaller includes. I now have 14 queries with 4 includes in each of the query. I would say that these 14 queries will generate smaller joins but I now have 14 round trips. Do you see this as a optimal solution? I reckon, our first priority is to reduce the load on the SQL server because of the MASSIVE JOIN that is generated because of the query with 57 includes.

    Also, can I ask you one more question? The big query with the includes as written with NoTracking() because it was a read only query. When I broke down the queries, with NoTracking(), results from the 2nd query overwrites the values returned back from the previous include. So, I reckon, I will need to drop the AsNoTracking()? Otherwise, only the last query's values are returned.

    Thanks,
    Venky

    ReplyDelete
    Replies
    1. Hi Venky,

      Thanks :)
      Can't you use stored procedure on your scenario ? If you use EF 6 or above, you can use SP. I hope It'll give better performance for your scenario.
      Note that the number of includes should be limited to 2 if you need to have better performance as I mentioned on conclusion section.I didn't test above with the NoTracking().I think you'll have issues if you use NoTracking().So drop it and test it.Note that you can create the index keys on your table according to your EF query.
      That thing will increase the performance of your query.Please check the below mentioned article.Good luck !

      http://sampathloku.blogspot.com/2012/08/how-to-convert-linq-entity-query-into-t.html

      Delete
  13. Hi Sampath,

    Thanks for your reply. We also have a ASP .NET MVC Web that queries the same database where we are using SQL stored procedure and we're getting timeout from the stored procedure as well. Basically, the SQL stored procedure uses more or less the same Query that the entity framework generates with the includes.

    For now, on the windows service, I have broken down the query with 57 includes into 14 queries with 4-5 includes on each of the query per your example. Now it means that we have 14 round trips to the database but I reckon the load on the SQL server will be reduced because now the big query generated by EF because of the one query with 57 includes has been replaced with 14 queries with 4 includes. I have 4 includes on each of the query as opposed to 2 that you had sugggested as a ideal benchmark. Do you reckon there will be a difference if I go with 4 instead of 2? If I break it with 2 instead of 4, I reckon I will be ending up with 25-30 round trips if I use 2 include.

    Also, when I removed the NoTracking() the queries started to work properly. With NoTracking() the query nulls the value returned from previous query.

    Thanks,
    Venky

    ReplyDelete
    Replies
    1. Hi Venky,
      Yes.It'll definitely improve the performance of your query if you use 2 includes per query.But yes it'll have roughly around 30 round trips according to your scenario.But with a minimum set of data.It retrieves data from your sql server and merged it inside the application server (IIS).So my advice is to bring the small set of data with more round trips.It'll give better performance than the larger set of data with 1 or 2 round trips. Actually you can test it with your live data.Hope this will help to you.

      Delete
  14. Hi,
    Just came across with this post and I tried to use it without success.
    Before me there were people saying your code doesn't work because you override the output object and you always replied they were wrong.

    So Let me tell you why you're wrong and this code doesn't work with Eager Loading.

    In the images I can see that the entities are Dynamic Proxies, meaning, you're using lazy loading. That's why it worked.
    If you disable the LazyLoading
    (Configuration.LazyLoadingEnabled = false;
    Configuration.ProxyCreationEnabled = false;)

    you'll see that in the 2nd query you don't have the Owner or Provider.

    Best regards.

    Tiago

    ReplyDelete
  15. hi
    when putting FirstOrDefault() to every call,shall i put it under last query?
    is that perfaorm well?

    ReplyDelete

Thanks for your Feedback.