Grouping Whisky Brands
By Wesley Satelis
April 9, 2022
In this post we will use the unsupervised clustering method Partitioning Around Medoids (PAM) to create clusters of whisky brands based on ratings given by users of the website https://www.whiskybase.com/whiskies/brands. PAM is a variation of the widely known k-means. The main difference is that PAM uses actual observations in the dataset (the medoids) as cluster centers, whereas k-means uses the cluster mean, which need not coincide with any observation.
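To make the difference between the two kinds of cluster center concrete, here is a minimal sketch in plain Python. It is not a full PAM implementation; it only computes the two candidate centers for a single cluster. The function names and the data points are made up for illustration and are not taken from the Whiskybase dataset.

```python
from math import dist


def mean_center(points):
    """k-means style center: the coordinate-wise mean (may not be an observation)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def medoid(points):
    """PAM style center: the observation with the smallest total distance to all others."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


# Hypothetical (votes, rating) pairs for one cluster, with one outlying brand.
cluster = [(10, 80.0), (12, 82.0), (11, 81.0), (40, 60.0)]

center_mean = mean_center(cluster)   # pulled toward the outlier
center_medoid = medoid(cluster)      # always one of the observed points

print("mean center:", center_mean)
print("medoid:", center_medoid)
```

In this toy cluster the mean is dragged toward the outlier, while the medoid stays on one of the three nearby points, which is one reason medoid-based methods are often described as more robust to outliers.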
The original dataset has the following variables:
- Brand: Whisky brand;
- Country: Country of origin of the whisky;
- Whiskies: Number of different whiskies;
- Votes: Number of votes given to that brand;
- Rating: (0-100) Rating given by a regular user to that whisky;
- WB Ranking: (A - G) Ranking based on ratings given by specialists in whisky.
The following table shows how many whisky brands each country has. I chose to discard countries with fewer than 10 whisky brands, since those wouldn’t yield very interesting results.
Country | Number of whisky brands |
---|---|
Scotland | 3670 |
United States | 1421 |
Germany | 401 |
Ireland | 322 |
Canada | 161 |
Japan | 130 |
France | 118 |
Switzerland | 106 |
United Kingdom | 86 |
Australia | 82 |
Austria | 76 |
Netherlands | 55 |
Sweden | 41 |
Belgium | 30 |
India | 28 |
Denmark | 24 |
New Zealand | 22 |
Czech Republic | 17 |
Spain | 13 |
Taiwan | 10 |
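The country filter described above could be sketched as follows. This is only an illustration under assumptions: the helper name `keep_frequent_countries`, the `(brand, country)` row layout, the placeholder brand names, and the country "Elsewhere" are all hypothetical and do not come from the original analysis.

```python
from collections import Counter


def keep_frequent_countries(rows, min_brands=10):
    """Keep only rows whose country has at least `min_brands` brands."""
    counts = Counter(country for _, country in rows)
    return [(brand, country) for brand, country in rows if counts[country] >= min_brands]


# Made-up rows: 10 placeholder brands for Taiwan (matching the table above)
# and 9 for a hypothetical country that falls below the threshold.
rows = (
    [(f"brand_tw_{i}", "Taiwan") for i in range(10)]
    + [(f"brand_x_{i}", "Elsewhere") for i in range(9)]
)

kept = keep_frequent_countries(rows)
kept_countries = sorted({country for _, country in kept})
print(kept_countries, len(kept))
```

Using `>=` keeps a country with exactly 10 brands, which is consistent with Taiwan appearing in the table.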