1. N+1 Problem
The N+1 problem is mostly discussed in the context of ORMs. It occurs when the system has to load the N child entities of a parent entity although only the parent entity was requested.
Many ORMs enable lazy loading by default, so the one query issued for the parent entity is followed by N more queries, i.e. one for each of the N child entities.
The N+1 problem is often considered a significant performance bottleneck, so it should be addressed at the design level of the application.
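The query pattern described above can be sketched without any ORM, using plain SQL against an in-memory SQLite database. This is only an illustration: the author and book tables, their columns, and the function names are assumptions made up for this sketch, not part of any particular ORM.

```python
import sqlite3


def make_db():
    # Hypothetical schema: one author (parent) has many books (children).
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO author VALUES (1, 'Author A'), (2, 'Author B'), (3, 'Author C');
        INSERT INTO book VALUES (1, 1, 'Book 1'), (2, 2, 'Book 2'), (3, 3, 'Book 3');
        """
    )
    return conn


def load_lazily(conn):
    """Mimics lazy loading: 1 query for the parents, then 1 query per parent."""
    queries = 0
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    queries += 1
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM book WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def load_joined(conn):
    """Loads the same data with a single JOIN query."""
    rows = conn.execute(
        "SELECT a.name, b.title FROM author a "
        "LEFT JOIN book b ON b.author_id = a.id ORDER BY a.id, b.id"
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:
            result[name].append(title)
    return result, 1
```

With three authors in the table, load_lazily issues 1 + 3 queries while load_joined issues one, and both return the same data. ORMs usually offer an eager or join fetching option that has a similar effect.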
2. N+1 Problem in REST APIs
Although it is most often associated with ORMs, the N+1 problem is not specific to them. It can arise in web APIs as well, e.g. REST APIs.
In the case of web APIs, the N+1 problem is a situation where client applications have to call the server N+1 times to fetch one collection resource plus N child resources.
This mostly happens because the collection resource did not provide enough information about the child resources for the client application to build its user interface in one go.
For example, consider a REST API that returns a collection of books as a resource.
<books uri="/books" size="100">
<book uri="/books/1" id="1">
<isbn>3434253561</isbn>
</book>
<book uri="/books/2" id="2">
<isbn>3423423534</isbn>
</book>
<book uri="/books/3" id="3">
<isbn>5352342344</isbn>
</book>
...
...
</books>
Here, the /books resource returns a list of books with only each book's id and isbn. This information is not enough to build a client application UI, which will typically want to show the book's name rather than its ISBN.
In some situations, the clients may want to show other information such as the author’s name and the publication year as well.
In the above scenario, the client application must make N more requests, one for each individual book resource at /books/{id}. In total, the client ends up invoking the REST API N+1 times.
The above scenario is only an example. The idea is that insufficient information in collection resources may lead to the N+1 problem in REST APIs.
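The request count in this scenario can be made concrete with a small simulation. The sketch below uses an in-memory stand-in for the API instead of real HTTP calls; the FakeApi class, the build_book_list function, and the book names are hypothetical and exist only for illustration.

```python
# Hypothetical data behind the API; the names are placeholders.
BOOKS = {
    1: {"isbn": "3434253561", "name": "Placeholder Title 1"},
    2: {"isbn": "3423423534", "name": "Placeholder Title 2"},
    3: {"isbn": "5352342344", "name": "Placeholder Title 3"},
}


class FakeApi:
    """In-memory stand-in for the REST API that counts incoming requests."""

    def __init__(self):
        self.calls = 0

    def get(self, uri):
        self.calls += 1
        if uri == "/books":
            # The collection exposes only uri, id and isbn, as in the sample above.
            return [
                {"uri": f"/books/{book_id}", "id": book_id, "isbn": book["isbn"]}
                for book_id, book in BOOKS.items()
            ]
        # Otherwise the uri is assumed to be /books/{id}: return the full book.
        book_id = int(uri.rsplit("/", 1)[1])
        return {"id": book_id, **BOOKS[book_id]}


def build_book_list(api):
    """Client code that needs book names for its UI."""
    collection = api.get("/books")  # 1 request for the collection
    # N further requests, one per book, just to read each name
    return [api.get(entry["uri"])["name"] for entry in collection]
```

For the three books in this sketch, build_book_list results in four requests. With the 100 books of the sample response, the same client code would issue 101.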
3. How to Solve N+1 Problem
The good thing about the previously discussed problem is that we know exactly what the issue is, and this makes the solution pretty easy.
Include enough information about each individual resource inside the collection resource.
To decide what is enough, we may need to consult with API consumers, research similar applications and their user interfaces, or simply put ourselves in the client’s shoes.
Moreover, we may evolve our APIs over time as our understanding of client requirements improves. This is possible using API versioning.
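As a rough illustration of this advice, the sketch below renders a /books collection in which every book entry already carries the fields a typical UI needs. The name, author and year element names and all their values are assumptions chosen for this example; a real API should pick them based on what its clients actually display.

```python
import xml.etree.ElementTree as ET

# Hypothetical book data; names, authors and years are placeholders.
BOOKS = [
    {"id": 1, "isbn": "3434253561", "name": "Placeholder Title 1",
     "author": "Placeholder Author 1", "year": "2001"},
    {"id": 2, "isbn": "3423423534", "name": "Placeholder Title 2",
     "author": "Placeholder Author 2", "year": "2002"},
    {"id": 3, "isbn": "5352342344", "name": "Placeholder Title 3",
     "author": "Placeholder Author 3", "year": "2003"},
]


def render_collection(books):
    """Builds the collection resource with enough detail per book for a list UI."""
    root = ET.Element("books", uri="/books", size=str(len(books)))
    for book in books:
        element = ET.SubElement(
            root, "book", uri=f"/books/{book['id']}", id=str(book["id"])
        )
        for field in ("isbn", "name", "author", "year"):
            ET.SubElement(element, field).text = book[field]
    return ET.tostring(root, encoding="unicode")


def names_from_collection(xml_text):
    """Client side: reads every book name from the single collection response."""
    root = ET.fromstring(xml_text)
    return [book.findtext("name") for book in root.findall("book")]
```

A client can now build its book list from one response to /books. The individual /books/{id} resources remain available for detail views that need more than the collection provides.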
Dunno, sounds like something that could be easily solved with an optional uri parameter or custom header e.g. “detailed”. That way you could avoid high payloads when not necessary.
I mean… Having a “?detailed=true” filter doesn’t fix the issue of the N+1 problem. You’ll still hit the DB with a ton of individual requests, rather than just 1.
We’re looking for O(1), not O(n) here. Your idea just shifts the workload to potentially only impact a smaller set of requests. The solution is to analyse where your DB is getting hammered a lot, and looking at revising the data structure to include that data in to the root item, rather than making extra queries for often accessed information… This is really just a symptom of the advantage that document/NoSQL databases give over normalised SQL ones, the fact that data duplication isn’t public enemy number one, and can, in certain situations, IMPROVE performance.
Great article, but you may want to include an example at the end that mimics the original example but is the solution to the problem. Explaining it is perfectly fine, but a visual example helps convey the idea better imo. In fact, maybe adding a visual to show the reduced number of requests for each example would be good too.
GraphQL to the rescue
GraphQL doesn’t solve the problem… GraphQL doesn’t concern itself with HOW THE DATA IS STORED!
If the data is stored/grouped/whatever poorly, then you’ll still encounter the N+1 problem described here. This is a logical science problem, rather than an application implementation problem.
From the GraphQL.org FAQ:
“these resolver functions should delegate to a _business logic layer_ responsible for communicating with the various underlying data sources”
GraphQL is just responsible for getting data out of your chosen database in an efficient manner… If, similarly to the example in this article, I have references to each book, based on its ISBN, then GraphQL will STILL have to make the same lookups, because the data is STILL stored in that way. It’s not magic, and it still needs to link data together via ISBN’s, which means it still has to make additional database lookups… 1, for the initial query (to get all the ISBN’s it’s interested in), then N for all the book names it needs to get (looking up each book with a given ISBN to find its name).