Over-fetching Data with REST APIs
APIs are fantastic, but only if they are used effectively. When building a search API, developers tend to expose many filters, and the size of the response varies with those filters; a broad filter can return a huge number of records. On the consumer side, the response is occasionally used in full, but most of the time clients pick out the fields they need and discard the rest. In other words, they receive more than they can consume. This problem is called over-fetching of data.
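As a minimal sketch of the problem (the /api/users/search endpoint and the User shape here are hypothetical, not from any real API), consider a client that fetches full user records but uses only the names:

```typescript
// Hypothetical REST search endpoint: the server returns every field of every
// matching user, even though this client only needs one of them.
interface User {
  id: string;
  name: string;
  email: string;
  address: { street: string; city: string; zip: string };
  preferences: Record<string, unknown>;
  createdAt: string;
  // ...many other fields travel over the wire regardless
}

async function searchUsers(city: string): Promise<string[]> {
  const res = await fetch(`/api/users/search?city=${encodeURIComponent(city)}`);
  const users: User[] = await res.json(); // full documents are fetched...
  return users.map((u) => u.name);        // ...but only one field is used
}
```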
How Facebook (now Meta) solved the problem
The same over-fetching problem exists in most internet-based applications, and Facebook faced it in 2012. As they expanded their mobile footprint to deliver a single development interface that could enable a write-once-run-everywhere scenario, they experienced significant performance issues in their apps, sometimes even crashes. A considerable amount of code was required to build the data on the server and parse it on the client side. This frustration inspired them to develop an API that lets clients dictate exactly which data they need to load, reducing both the time spent preparing the data and the time spent parsing it. Graphql.org explains GraphQL in more detail, along with the different libraries available for using it.
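To make the idea concrete, here is a hedged sketch of a GraphQL request over HTTP; the /graphql endpoint, the users field, and its arguments are illustrative, not Facebook's actual API:

```typescript
// The client names exactly the fields it needs, and the server returns
// nothing more. Query shape and field names are assumptions for this example.
const query = `
  query UsersByCity($city: String!) {
    users(city: $city) {
      name
      email
    }
  }
`;

async function fetchUserNames(city: string) {
  const res = await fetch("/graphql", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables: { city } }),
  });
  const { data } = await res.json();
  return data.users; // only { name, email } per user crosses the network
}
```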
Extending GraphQL to filter data at the database level
The problem of fetching data without projections
The image below illustrates a scenario where GraphQL reduces both the network load and the data-processing time at the API (server) level. In the overall sequence of events, the filtration of data happens at step 4, i.e. the API does the filtering; however, the pull from MongoDB to the API still fetches the entire set of user data, as the JSON pulled in steps 2 and 3 contains every user field of the fetched documents. So although the client is relieved of the over-fetching issue, the API that fetches the data from the database still has it. How can this be fixed? That is explained in the section below.
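As a sketch of what steps 2 and 3 look like with the official MongoDB Node.js driver when no projection is supplied (the connection string, database, collection, and field names are placeholders for this example):

```typescript
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");

async function usersResolver(city: string) {
  await client.connect(); // resolves immediately if already connected
  const users = client.db("app").collection("users");
  // Without a projection, every field of every matching document is
  // transferred from the database server to the API server...
  const docs = await users.find({ "address.city": city }).toArray();
  // ...and the GraphQL layer discards most of it before responding.
  return docs;
}
```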
Solving the problem
The answer is to dynamically convert the GraphQL query into a MongoDB projection and use that projection while querying the database, as explained in this section. In points 2 (a) and 2 (b) of the diagram, I have shown that, with an algorithm, we can extend the GraphQL query itself from the server down to MongoDB by converting it into a projection.
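A minimal sketch of such an algorithm, assuming a Node.js server built on graphql-js: it walks the selection set of the incoming query and emits a MongoDB projection in dot notation, e.g. { name: 1, "address.city": 1 }. Fragments, aliases, and introspection fields are deliberately ignored here for brevity.

```typescript
import { GraphQLResolveInfo, SelectionNode } from "graphql";

// Recursively convert GraphQL AST selections into a MongoDB projection.
function toProjection(
  selections: readonly SelectionNode[],
  prefix = ""
): Record<string, 1> {
  const projection: Record<string, 1> = {};
  for (const sel of selections) {
    if (sel.kind !== "Field") continue;            // skip fragments in this sketch
    if (sel.name.value.startsWith("__")) continue; // skip introspection fields
    const path = prefix ? `${prefix}.${sel.name.value}` : sel.name.value;
    if (sel.selectionSet) {
      // Nested object: recurse, building dot-notation paths.
      Object.assign(projection, toProjection(sel.selectionSet.selections, path));
    } else {
      projection[path] = 1; // leaf field: include it in the projection
    }
  }
  return projection;
}

// Entry point: derive the projection from the resolver's info argument.
function projectionFromInfo(info: GraphQLResolveInfo): Record<string, 1> {
  return toProjection(info.fieldNodes[0].selectionSet?.selections ?? []);
}
```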
One might ask how a projection will optimize the end output. The point is that projections minimize the size of the data transferred from the MongoDB server to the application server, reducing the network cost, and MongoDB itself recommends using projections for quicker query execution. Thus, with a custom algorithm, we can reduce the total cost at the API level with the help of MongoDB projections, and at the UI level with the help of GraphQL.
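Wiring the sketch above into a resolver might look like the following; it reuses projectionFromInfo from the previous block, follows the standard graphql-js resolver signature (source, args, context, info), and the connection string and collection names remain placeholders:

```typescript
import { GraphQLResolveInfo } from "graphql";
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");

async function usersResolver(
  _source: unknown,
  args: { city: string },
  _context: unknown,
  info: GraphQLResolveInfo
) {
  await client.connect();
  const projection = projectionFromInfo(info); // from the previous sketch
  // Only the projected fields leave the MongoDB server, so far less data
  // crosses the network between the database and the API process.
  return client
    .db("app")
    .collection("users")
    .find({ "address.city": args.city }, { projection })
    .toArray();
}
```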