Recommender systems have been made famous by Amazon – “Customers who bought this also bought that …”. Organizations in retail and related businesses that are not using a recommender system are behind the curve. The fact that fifty or so recommender systems are available shows how widespread their use has become. You can find thirty or so of them listed on this web site.
Not all recommender systems are equal, and several well known methods are used to create recommendations. Obviously they are based on some form of filtering (we don’t want to recommend every product we sell), and the two main methods are called collaborative filtering and content based filtering. Each has its strengths and weaknesses, and in an attempt to cancel out the weaknesses, a hybrid approach can also be used.
Collaborative Filtering
This is based on the idea that a community of customers demonstrates behavior that can be exploited. If a customer buys A and then B, another customer buying A might be amenable to buying B. Collaborative filtering comes in two flavors. Item based filtering is precisely the example just given. It bases recommendations on how often products are purchased together – either at the same time or during different shopping sessions. The profile of the customer is not used at all, other than details of products that have been purchased. The more often customers purchase two items together, the more strongly one will be recommended to a new customer buying the other. The strength of this approach is that a retailer needs no profile of a customer to start making recommendations. For new customers it sidesteps what is called the ‘cold start’ problem, although a brand new product that nobody has bought yet still cannot be recommended this way.
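Item based filtering can be sketched in a few lines of Python. This is only a minimal illustration, assuming purchase histories are available as a simple mapping of customer to products; the names build_cooccurrence and recommend_for_item and the sample data are made up for the example and do not come from any particular product.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count, for every pair of products, how many customers bought both."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        # set() so a repeat purchase by one customer is only counted once
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend_for_item(counts, item, top_n=3):
    """Return the products most often bought alongside `item`."""
    related = counts.get(item, {})
    # Highest count first; ties broken alphabetically for a stable order
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_n]]


# Hypothetical purchase histories (customer -> products bought)
purchases = {
    "alice": ["tent", "sleeping bag", "stove"],
    "bob": ["tent", "sleeping bag"],
    "carol": ["tent", "stove", "lantern"],
    "dave": ["sleeping bag", "pillow"],
}

counts = build_cooccurrence(purchases)
print(recommend_for_item(counts, "tent"))
# ['sleeping bag', 'stove', 'lantern']
```

Note that nothing about the shopper is needed beyond the item currently in view, which is why this approach works for a first-time customer.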
The other approach to collaborative filtering is called user based. This works by building a profile of each customer – what they have purchased in the past, their age, and any other details available to the retailer. Customers are then grouped together based on these profiles using a variety of data mining techniques (K nearest neighbors is a favorite). When a customer is shopping, the recommender system will suggest items that customers with similar profiles have purchased. This approach does suffer from the cold start problem. Until we have sufficient history for a customer we cannot say which group of customers they belong to.
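A rough sketch of the user based approach follows. It makes some simplifying assumptions: the profile is just the set of products purchased (age and other details are left out), similarity is measured with the Jaccard index, and the function names and sample data are invented for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets: 0.0 (nothing shared) to 1.0 (identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_for_user(histories, user, k=2, top_n=3):
    """Suggest items bought by the k customers most similar to `user`."""
    own = set(histories.get(user, []))
    if not own:
        return []  # cold start: no history, so no neighbours can be found

    # Rank every other customer by similarity to this one
    neighbours = sorted(
        ((jaccard(own, items), other)
         for other, items in histories.items() if other != user),
        key=lambda pair: (-pair[0], pair[1]),
    )[:k]

    # Score each unseen item by the similarity of the neighbours who bought it
    scores = {}
    for similarity, other in neighbours:
        if similarity == 0:
            continue
        for item in set(histories[other]) - own:
            scores[item] = scores.get(item, 0.0) + similarity

    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories
histories = {
    "alice": ["tent", "stove", "lantern"],
    "bob": ["tent", "stove", "sleeping bag"],
    "carol": ["tent", "lantern", "headlamp"],
    "dave": ["novel", "bookmark"],
    "erin": [],
}

print(recommend_for_user(histories, "alice"))  # ['headlamp', 'sleeping bag']
print(recommend_for_user(histories, "erin"))   # [] (cold start)
```

Every recommendation requires comparing one customer against all the others, which hints at why this approach becomes expensive as the customer base grows.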
So item based filtering focuses on the products that are typically purchased together, and user based filtering builds a profile of customers so they can be grouped in some way. User based filtering is much more difficult and resource hungry. Customer profiles need to be rebuilt regularly, otherwise they quickly become useless, and for a large retail operation with millions of customers this may not be feasible. A hybrid approach can be taken, where new customers are served recommendations using the item based approach, and established customers using the user based approach.
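The hybrid idea amounts to a simple switch on how much history a customer has. The sketch below is hypothetical throughout: the threshold of five purchases is an arbitrary assumption, and the two recommenders are passed in as plain functions so that any implementation can be plugged in.

```python
def hybrid_recommend(history, item_based, user_based, min_purchases=5):
    """Pick a strategy based on how much purchase history is available.

    history       -- list of products the customer has bought, oldest first
    item_based    -- function(product) -> list of recommended products
    user_based    -- function(history) -> list of recommended products
    min_purchases -- assumed threshold for an "established" customer
    """
    if not history:
        return []  # nothing to go on yet
    if len(history) < min_purchases:
        # New customer: recommend from the most recent purchase only
        return item_based(history[-1])
    # Established customer: use the full profile
    return user_based(history)


# Stand-in recommenders, purely for demonstration
def demo_item_based(product):
    return ["bought with " + product]


def demo_user_based(history):
    return ["liked by customers similar to you"]


new_customer = ["tent"]
regular = ["tent", "stove", "lantern", "boots", "map", "compass"]

print(hybrid_recommend(new_customer, demo_item_based, demo_user_based))
# ['bought with tent']
print(hybrid_recommend(regular, demo_item_based, demo_user_based))
# ['liked by customers similar to you']
```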
Content Based Filtering
A wholly different approach is taken in content based filtering. Additional content is added to product information, usually in the form of tags. These tags might indicate price range, product features, and so on. When a customer is looking at a particular product, other products with similar tags are served up as recommendations. The problem here, of course, is the overhead associated with creating tags and, more importantly, knowing which tags are meaningful to customers. Even so, this approach does not require heavy-duty computing resources, and can be manually fashioned by knowledgeable users in the business.
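Tag matching is simple enough to show in full. The following is a minimal sketch, assuming each product carries a set of hand-written tags and that similarity is the proportion of tags two products share (the Jaccard index); the catalog, tags, and function names are invented for the example.

```python
def tag_similarity(tags_a, tags_b):
    """Share of tags two products have in common (Jaccard index)."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def similar_products(catalog, product_id, top_n=3):
    """Return the products whose tags best match those of `product_id`."""
    target = catalog[product_id]
    scored = [
        (tag_similarity(target, tags), other)
        for other, tags in catalog.items()
        if other != product_id
    ]
    # Drop products with no tags in common, then rank best match first
    ranked = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [other for _, other in ranked[:top_n]]


# Hypothetical catalog: product -> tags added by the business
catalog = {
    "trail-tent": {"camping", "shelter", "mid-price"},
    "dome-tent": {"camping", "shelter", "budget"},
    "gas-stove": {"camping", "cooking", "mid-price"},
    "paperback": {"books", "budget"},
}

print(similar_products(catalog, "trail-tent"))
# ['dome-tent', 'gas-stove']
```

No purchase data is involved at all, so the quality of the recommendations depends entirely on how well the tags have been chosen.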
Reporting
It’s one thing to put a recommender system in place, and it is another for it to be effective. Any recommender system should come with copious reporting capabilities, so management can see what is working and what isn’t. A/B testing scenarios should be supported along with other methods of fine tuning. Again, such tools should create suitable reports for business users to make decisions.
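As an illustration of the kind of figure an A/B report might contain, the sketch below compares the conversion rates of two recommendation variants using a standard two-proportion z-test. The visitor and purchase counts are made up, and a real reporting tool may well use a different statistical method.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    conv_a, conv_b -- number of visitors who purchased in each variant
    n_a, n_b       -- number of visitors shown each variant
    Returns (rate_a, rate_b, z, two_sided_p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both variants perform the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("No variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical results: A = current recommender, B = new recommender
rate_a, rate_b, z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.3f}")
```

A small p-value suggests the difference between the variants is unlikely to be down to chance alone, which is the sort of evidence business users need before switching approach.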
DIY or Solution
Most organizations will buy a ready-made recommender solution – and there are lots of them. An organization that already has a data science team may choose to build its own – but this is unusual. A decision then has to be made on whether to use an online solution, or something that is hosted on-premises.
Software as a Service (SaaS) cloud based solutions are popular (e.g. bitREC, Magnetic and BlueKnow), and can often be easily integrated into on-premises applications using a programming interface of some sort. This type of offering can be strengthened by the use of external data (demographics etc.).
Finally, there is a fine line to be drawn here over privacy issues. Netflix, for example, had to cancel a project after being sued over privacy issues. So be careful.