wenhao.L

All Posts

Mar 19, 2024
Exploratory Data Analysis for Amazon's Top 50 Bestselling Books from 2010-2020
This notebook explores a dataset containing the top 50 bestselling books on Amazon for each year from 2010 to 2020 inclusive. The data was scraped from Amazon webpages, and additional information was obtained from the Google Books API.
Categories: Data wrangling, Data analysis, Visualization
Mar 15, 2024
Bank Churn Exploration and Binary Classification
In this notebook we explore a synthetic bank customer churn dataset used in a Kaggle community prediction competition. We treat it like a real-world problem and avoid any performance-boosting tricks that are specific to this competition dataset (such as exploiting data leakage caused by the synthetic nature of the data).
Categories: Data analysis, Visualization, Machine learning
Mar 13, 2024
Multi-Class Prediction of Obesity Risk
In this notebook we take a look at a [Kaggle Playground Series](https://www.kaggle.com/competitions/playground-series-s4e2) competition where participants submit predictions for a multi-class classification problem: predicting each sample's weight class.
Categories: Data analysis, Visualization, Machine learning
Feb 20, 2024
Text Classification with DistilBERT (IMDB Dataset)
In this notebook we explore the IMDB dataset available on Kaggle, which contains 50,000 reviews, each labelled as either positive or negative. We then fine-tune a text classification model based on DistilBERT and evaluate it.
Categories: Data analysis, Visualization, Machine learning, Deep learning
Feb 2, 2024
Capstone Project: Salifort Motors HR Suggestion
Capstone project for Google's Advanced Data Analytics Course on Coursera. It simulates a scenario in which the HR department of a large consulting firm wants insights from our analysis of employee churn data, along with predictions of which employees are likely to leave.
Categories: Data analysis, Visualization, Machine learning