The Apriori algorithm is used to discover frequent item sets and derive association rules from a transactional database (frequent pattern mining). It relies on two measures, “support” and “confidence”, each compared against a minimum threshold. Support is the frequency with which an item or item set occurs in the database, while confidence is a conditional probability expressing how likely it is that an item is purchased given that another item has been purchased.
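Written as formulas, the two measures look as follows. This is a sketch of the usual definitions; the symbols D (the set of transactions), t (a single transaction), and X, Y (disjoint item sets) are notation assumed here, not taken from the original text.

```latex
\[
\operatorname{support}(X) = \frac{\bigl|\{\, t \in D : X \subseteq t \,\}\bigr|}{|D|},
\qquad
\operatorname{confidence}(X \Rightarrow Y) = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
\]
```

Support is sometimes given as an absolute count (the numerator alone) rather than as a fraction of all transactions.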
Apriori starts by identifying individual items with high support and then extends them to larger frequent item sets using the downward closure property: every subset of a frequent item set must itself be frequent. The goal is to find all item sets that meet the minimum support threshold specified by the user or the problem statement.
Apriori has been shown to be effective at finding frequent patterns and has been applied in a wide range of areas, including transportation, medical diagnosis, and health care. For example, it has been used to find frequent patterns in asthma medication data and in combined symptom-medication data sets.
The procedure of Apriori
Let’s assume that you have a database of transactions from a night store where people mainly buy wine and cheese or beer and potato chips. You want to identify the combinations of products that are frequently purchased together so you can create more effective advertisements for those products. You can use the Apriori algorithm to do this.
First, the algorithm generates candidate item sets from the items that occur in the database. It then scans the transactions to obtain the support of each candidate. A candidate whose support falls below the minimum threshold is removed, and the list of frequent item sets is updated accordingly.
The next step is candidate pruning. This removes every candidate item set that contains an infrequent subset, because by downward closure such a candidate cannot be frequent. Keeping the number of candidates small matters because it reduces the number of item sets whose support has to be counted and so improves the efficiency of the algorithm.
Overall, the Apriori algorithm traverses the item set lattice in a breadth-first manner: it discovers the frequent 1-item sets first, followed by the frequent 2-item sets, and so on. Each level requires another scan of the database, so when the frequent item sets are long the database has to be scanned many times, with infrequent item sets discarded at every iteration.
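The level-by-level procedure can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the function name `apriori`, the parameter `min_count` (an absolute support count), and the night-store baskets are all invented for this example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: support count} for every frequent item set."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the database per level: count each candidate,
        # then keep only those that reach the minimum support count.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: every distinct item is a candidate.
    items = {item for t in transactions for item in t}
    level = count({frozenset([item]) for item in items})
    frequent = dict(level)
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-item sets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical night-store baskets, invented for illustration.
baskets = [
    ["wine", "cheese"],
    ["wine", "cheese", "bread"],
    ["beer", "chips"],
    ["beer", "chips", "wine"],
    ["wine", "cheese"],
]
result = apriori(baskets, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

On these baskets the loop stops after the second level, because the two frequent pairs cannot be joined into any 3-item candidate.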
Recommendations by Apriori algorithm
Recommendations can be made with the Apriori algorithm, which applies association rule learning to search for interesting patterns in large databases of historical transactions (the buying habits of customers). This technique has many applications, including finding out which items are purchased together in market baskets and building financial analysis tools that show how various stocks trend together.
The order list of an existing e-commerce platform shows customers’ purchase information within a specific time range. Because we study the association between product purchases, only the order number and the products purchased in each order are retained, as shown in the following table.
According to the table, five products (A1 to A5) were sold and a total of 9 orders were generated.
Assumption: the more often a product combination is purchased, the more likely it is to represent an associated purchase.
This requires setting a minimum number of occurrences manually, based on experience. Because the number of orders above is small, it is set to 2; that is, if a product combination appears at least 2 times, it is considered a possible associated purchase.
Therefore, it is necessary to count the number of occurrences of each product combination. Because there are five products, a combination can contain between 1 and 5 products, so the combinations are analyzed one size at a time.
First, calculate the support of every single item and remove the items whose support is less than 2, as shown in the following table.
Then combine two items. Calculate the support of every 2-item set and remove those whose support is less than 2, as shown in the following table.
Then combine three items. Calculate the support of every 3-item set and remove those whose support is less than 2.
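The counting steps can be reproduced with a short Python sketch. It assumes the order table matches the `$samples` array in the PHP listing of this article, and it counts every combination directly instead of generating pruned candidates, so it is a check on the numbers rather than an Apriori implementation. The names `orders`, `MIN_COUNT`, and `frequent_combinations` are made up for this example.

```python
from collections import Counter
from itertools import combinations

# Assumed to match the order table: copied from the $samples array
# in the article's PHP listing.
orders = [
    ["A1", "A2", "A5"], ["A2", "A4"], ["A2", "A3"],
    ["A1", "A2", "A4"], ["A1", "A3"], ["A2", "A3"],
    ["A1", "A3"], ["A1", "A2", "A3", "A5"], ["A1", "A2", "A5"],
]
MIN_COUNT = 2  # minimum number of occurrences chosen in the text


def frequent_combinations(orders, size, min_count):
    """Count every combination of `size` products and keep the frequent ones."""
    counts = Counter(
        combo
        for order in orders
        for combo in combinations(sorted(order), size)
    )
    return {combo: n for combo, n in counts.items() if n >= min_count}


for size in (1, 2, 3):
    print(size, frequent_combinations(orders, size, MIN_COUNT))
```

Under that assumption, all five single products pass the threshold, six pairs remain, and only the combination A1, A2, A5 survives at size three. Calling the function with `size=4` returns an empty result on this data.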
Then combine four items. Following the same principle, the table of 4-item combinations is as follows:
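<?php
// Load php-ml through Composer's autoloader (forward slashes work on every OS).
require_once __DIR__ . '/vendor/autoload.php';

use Phpml\Association\Apriori;

// The nine orders; each inner array lists the products in one order.
$samples = [['A1', 'A2', 'A5'], ['A2', 'A4'], ['A2', 'A3'], ['A1', 'A2', 'A4'], ['A1', 'A3'], ['A2', 'A3'], ['A1', 'A3'], ['A1', 'A2', 'A3', 'A5'], ['A1', 'A2', 'A5']];
// Apriori is unsupervised, so no labels are needed.
$labels = [];

// Minimum support 0.2 (at least 2 of the 9 orders) and minimum confidence 0.5.
$associator = new Apriori($support = 0.2, $confidence = 0.5);
$associator->train($samples, $labels);

// Each result is a list of products recommended for a customer who bought A1.
$results = $associator->predict(['A1']);
foreach ($results as $consequent) {
    echo implode(', ', $consequent), PHP_EOL;
}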
Suppose a customer has bought product A1. Using the Apriori algorithm, we can recommend products A2, A3, and A5 to that customer.
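To see where such a recommendation comes from, the rule confidences can be computed by hand. The Python sketch below enumerates every possible consequent for a given purchase and keeps those that meet both thresholds. It is a brute-force illustration under the thresholds used in the PHP listing (0.2 and 0.5), not a description of how php-ml works internally; `support` and `recommend` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import combinations

# Same nine orders as the $samples array in the PHP listing.
orders = [
    ["A1", "A2", "A5"], ["A2", "A4"], ["A2", "A3"],
    ["A1", "A2", "A4"], ["A1", "A3"], ["A2", "A3"],
    ["A1", "A3"], ["A1", "A2", "A3", "A5"], ["A1", "A2", "A5"],
]
# Exact fractions avoid floating-point surprises at the threshold.
MIN_SUPPORT = Fraction(1, 5)     # 0.2
MIN_CONFIDENCE = Fraction(1, 2)  # 0.5


def support(itemset):
    """Fraction of orders that contain every product in `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(1 for o in orders if itemset <= set(o)), len(orders))


def recommend(bought):
    """Map each qualifying consequent to the confidence of bought => consequent."""
    bought = set(bought)
    others = sorted({p for o in orders for p in o} - bought)
    rules = {}
    for size in range(1, len(others) + 1):
        for consequent in combinations(others, size):
            joint = support(bought | set(consequent))
            # The support check comes first, so an unseen `bought` never divides by zero.
            if joint >= MIN_SUPPORT and joint / support(bought) >= MIN_CONFIDENCE:
                rules[consequent] = joint / support(bought)
    return rules


for consequent, confidence in recommend(["A1"]).items():
    print(consequent, float(confidence))
```

For a customer who bought A1, A2 has the highest confidence (4 of the 6 orders containing A1 also contain A2), while A3, A5, and the pair A2 with A5 each sit exactly at the 0.5 threshold.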