Instagram’s Explore page displays highly customised content specific to each user. With billions of posts and an equally large number of users, this feat is only achieved with the use of trained machine learning models. Instagram tackled this challenge by creating a series of custom query languages which support the scale of Explore while boosting developer efficiency.
Before building a recommendation engine, Instagram addressed three important needs for its developer tools:
- The ability to conduct rapid experimentation at scale
- The need to obtain a stronger signal on the breadth of people’s interests
- The need for a computationally efficient way to ensure that the recommendations are both high quality and fresh
They created a new domain-specific language, ‘IGQL’, optimised for retrieving candidates in the recommender system. IGQL made it simple to perform tasks that are usually quite complex, while allowing engineers to focus on ML (machine learning) and business logic. It provided a high degree of code reusability, with programmers writing recommendation algorithms in a Python-like manner that execute efficiently in C++.
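To give a feel for what a Python-like candidate query might look like, here is a minimal sketch of a chainable pipeline. It is not IGQL's actual syntax or API: the `CandidateQuery` class, its `filter`, `rank` and `limit` methods, and the sample data are all hypothetical names invented for this illustration.

```python
class CandidateQuery:
    """Hypothetical chainable query; not IGQL's real syntax or API."""

    def __init__(self, items):
        self.items = list(items)

    def filter(self, predicate):
        # Keep only the candidates for which the predicate is true.
        return CandidateQuery(x for x in self.items if predicate(x))

    def rank(self, score):
        # Order candidates from highest to lowest score.
        return CandidateQuery(sorted(self.items, key=score, reverse=True))

    def limit(self, n):
        # Keep the first n candidates.
        return CandidateQuery(self.items[:n])


# Made-up candidate media with a spam flag and a model score.
candidates = [
    {"id": "m1", "spam": False, "score": 0.4},
    {"id": "m2", "spam": True, "score": 0.9},
    {"id": "m3", "spam": False, "score": 0.7},
]

result = (
    CandidateQuery(candidates)
    .filter(lambda m: not m["spam"])
    .rank(lambda m: m["score"])
    .limit(2)
    .items
)
```

Each step returns a new query object, so stages can be reordered or reused across experiments, which is the kind of reusability the paragraph above describes.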
As Instagram has a large number of interest-focused accounts, each with many posts, they decided to work at the account level rather than the media level. There are many ways in which a user can interact with an account, including liking or saving its posts. Instagram defines a value model that weights these signals to decide whether content is relevant. For example, saving a post carries more weight than liking it. The accounts a person has interacted with in these ways are called seed accounts and are used as a basis for finding similar accounts to recommend.
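The idea of weighting signals can be sketched as a simple weighted sum per account. This is only an illustration under stated assumptions: the weights, the `pick_seed_accounts` helper and the account names are hypothetical, and the only fact taken from the text is that a save counts for more than a like.

```python
from collections import defaultdict

# Hypothetical weights: the only grounded fact is that save > like.
SIGNAL_WEIGHTS = {"save": 3.0, "like": 1.0}


def pick_seed_accounts(interactions, top_n=2):
    """Return the top_n account ids by weighted interaction score.

    interactions: iterable of (account_id, signal) pairs.
    """
    scores = defaultdict(float)
    for account_id, signal in interactions:
        # Unknown signals contribute nothing to the score.
        scores[account_id] += SIGNAL_WEIGHTS.get(signal, 0.0)
    # Sort by score descending, then by account id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [account_id for account_id, _ in ranked[:top_n]]


# Made-up interaction log for one user.
interactions = [
    ("pottery_daily", "like"),
    ("pottery_daily", "like"),
    ("trail_runs", "save"),
    ("street_food", "like"),
]
seeds = pick_seed_accounts(interactions)
```

With these assumed weights, a single save on `trail_runs` outranks two likes on `pottery_daily`.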
Word embedding techniques study the order in which words appear in text to measure how related they are. Instagram applies the same idea to accounts, treating the accounts a person interacts with in a session like the words in a sentence, to predict accounts with which that person is likely to interact in a given session within the Instagram app. They define a distance metric between two accounts in the embedding space and use a KNN (k-nearest neighbours) lookup to find accounts that are topically similar to a given account. After the candidates have been selected, they are passed through a simpler ‘distillation’ neural network model before being passed through the main high-performance model. This is mainly done to improve efficiency and reduce the computational power required.
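The KNN lookup can be sketched as a brute-force search over account vectors. This is a toy illustration, not Instagram's implementation: cosine distance is an assumed choice of metric, the three-dimensional vectors are made up, and a system at this scale would likely use an approximate nearest-neighbour index rather than scanning every account.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_accounts(seed, embeddings, k=2):
    """Return the k accounts closest to the seed account, nearest first."""
    seed_vec = embeddings[seed]
    distances = [
        (cosine_distance(seed_vec, vec), account)
        for account, vec in embeddings.items()
        if account != seed
    ]
    distances.sort()  # smallest distance first
    return [account for _, account in distances[:k]]


# Made-up embeddings; real ones would be learned from interaction sequences.
embeddings = {
    "pottery_daily": [0.9, 0.1, 0.0],
    "ceramics_lab": [0.8, 0.2, 0.1],
    "glaze_tips": [0.7, 0.3, 0.0],
    "trail_runs": [0.0, 0.1, 0.9],
}
similar = nearest_accounts("pottery_daily", embeddings, k=2)
```

Here the two pottery-related accounts come back as neighbours of the seed, while the unrelated `trail_runs` account is left out.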
Instagram makes sure that the content they recommend is both safe and appropriate for a global community. Using a variety of signals and ML systems, they filter out policy-violating content and spam. They also help you discover a wider range of new interests by downranking repeated posts from the same author or the same seed account.
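One simple way to picture this kind of downranking is a penalty that grows each time an author reappears in the ranked list. The sketch below is an assumption-laden illustration: the `diversify` function, the multiplicative `penalty` of 0.5 and the scores are hypothetical, and the text does not say how Instagram actually computes the penalty.

```python
def diversify(ranked_posts, penalty=0.5):
    """Downrank repeated authors and return post ids in the new order.

    ranked_posts: list of (post_id, author, score) tuples.
    penalty: hypothetical multiplier applied once per earlier post
             by the same author.
    """
    seen = {}
    adjusted = []
    # Walk posts from best to worst so the top post per author is untouched.
    for post_id, author, score in sorted(ranked_posts, key=lambda p: -p[2]):
        repeats = seen.get(author, 0)
        adjusted.append((post_id, author, score * (penalty ** repeats)))
        seen[author] = repeats + 1
    adjusted.sort(key=lambda p: -p[2])
    return [post_id for post_id, _, _ in adjusted]


# Made-up ranked posts: three from one author, one from another.
posts = [
    ("p1", "pottery_daily", 0.90),
    ("p2", "pottery_daily", 0.85),
    ("p3", "trail_runs", 0.60),
    ("p4", "pottery_daily", 0.80),
]
feed = diversify(posts)
```

The best post from each author keeps its score, so `p3` from a second author moves ahead of the second and third posts from the first author.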
Here’s what they had to say:
“The scale of both the Instagram community and inventory requires enabling a culture of high-velocity experimentation and developer efficiency to reliably recommend the best of Instagram for each person’s individual interests. Our custom tools and systems have given us a strong foundation for the continuous learning and iteration that are essential to building and scaling”
The best way to discover interesting new content on Explore is by interacting with accounts you like, which in turn helps the algorithm filter through the numerous posts to present you with the content you love.