All Posts
-
Mar 19, 2024
This notebook explores a dataset containing the top 50 bestselling books on Amazon from the years 2010 to 2020 inclusive. Data was scraped from Amazon webpages and additional information was obtained from Google Books API.
-
Mar 15, 2024
In this notebook we will explore a synthetic bank customer churn dataset used in a Kaggle community prediction competition, treating this like a real world problem and avoiding the use of any performance-boosting tricks that is are only specific to this competition dataset (i.e. utilizing data leakages due to the syntheticity of the data.
-
Mar 13, 2024
In this notebook we take a look at a [Kaggle Playground Series](https://www.kaggle.com/competitions/playground-series-s4e2) competition where users submit their predictions for a multi-class classification problem on the sample's weight class.
-
Feb 20, 2024
In this notebook we will be exploring the IMDB dataset available on Kaggle, containing 50,000 reviews categorised as either positive or negative reviews. A text classification model will then be fine-tuned over DistilBERT and evaluated.
-
Feb 2, 2024
Capstone project for Google's Advanced Data Analytics Course on Coursera, simulating a scenario where the HR department of a large consulting firm is looking for insights from our data analysis and predictions on employee churn data.