Today in this post we are going to do a simple but thorough EDA for Shopee - Price Match Guarantee, one of the competitions currently running on Kaggle. Exploratory data analysis (EDA) is the first step towards solving any data science or machine learning problem. It is the simplest way of getting familiar with the data at your disposal. The better you know the data, the closer you get to solving the problem.
So, let's get our hands dirty and enter the world of exploratory data analysis. …
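A first pass at EDA usually starts with a few simple questions: how big is the data, how many unique values does each column have, and how are the labels distributed? Here is a minimal sketch of that pass using a tiny toy stand-in for the competition's training file; the column names (`posting_id`, `image_phash`, `title`, `label_group`) are assumptions based on the Shopee data description, and the rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the competition's train.csv; column names are
# assumptions based on the Shopee data description, rows are synthetic.
train = pd.DataFrame({
    "posting_id": ["p1", "p2", "p3", "p4"],
    "image_phash": ["a1f", "a1f", "b2c", "d9e"],
    "title": ["red dress s", "red dress small", "blue shoe", "coffee mug"],
    "label_group": [10, 10, 20, 30],
})

# How big is the data, and how many unique values per column?
print(train.shape)
print(train.nunique())

# How many postings share a label group (i.e., match each other)?
group_sizes = train["label_group"].value_counts()
print(group_sizes.describe())
```

On the real dataset, the same three prints already tell you a lot: the number of postings, whether `image_phash` values repeat (near-duplicate images), and how large the typical match group is.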
Many of the things we see quite often tend to follow a routine, and this routine often depends on and varies with time. For instance, take the number of people watching a movie on a particular day. Do you think it will depend on time?
Well, let's see. For people working Monday to Friday, who make up a large part of the population, we can assume that fewer people will go to theatres and cinema halls from Monday morning to Friday afternoon, and that we will see a major crowd on weekends. …
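This kind of hypothesis is exactly what EDA lets you check. A minimal sketch with made-up daily viewer counts (the numbers below are purely hypothetical) shows how a weekday/weekend split makes the pattern visible:

```python
import pandas as pd

# Hypothetical viewer counts for one week, to illustrate a
# time-dependent routine: attendance peaks on the weekend.
views = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    "viewers": [120, 110, 130, 125, 310, 540, 500],
})

# Compare average attendance on weekends vs weekdays.
weekend = views["day"].isin(["Sat", "Sun"])
weekend_avg = views.loc[weekend, "viewers"].mean()
weekday_avg = views.loc[~weekend, "viewers"].mean()
print(weekend_avg, weekday_avg)  # weekend average is clearly higher
```

With real data, the same group-and-compare step is how you confirm (or refute) a seasonality hunch before building any model on top of it.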
You might have heard that before proceeding with a machine learning problem it is good to do an end-to-end analysis of the data by carrying out a proper exploratory data analysis. A common question that pops into people’s heads after hearing this is: why EDA?
· What is it that makes EDA so important?
· How to do proper EDA and get insights from the data?
· What is the right way to begin with exploratory data analysis?
Computers are really good at answering questions with single, verifiable answers. But humans are often still better at answering questions about opinions, recommendations, or personal experiences.
Humans are better at addressing subjective questions that require a deeper, multidimensional understanding of context, something computers aren’t yet trained to do well. Questions can take many forms: some come with multi-sentence elaborations, while others may be simple curiosity or a fully developed problem. They can have multiple intents, or seek advice and opinions. Some may be helpful and others merely interesting. Some are simply right or wrong.
Problem Statement Overview:
Overfitting is a common problem with machine learning models, especially when we have only a few training data points. The fewer the training data points, the less able our model is to generalize to unseen (test) data points.
Hence, we need to be careful during the training process and watch how our model performs. Just because we get an accuracy of about 90% on the training data, we cannot assume that our model will perform the same on an unseen dataset.
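The train/test gap is easy to demonstrate. Here is a small sketch using scikit-learn on a synthetic dataset of just 40 points with some label noise; an unconstrained decision tree memorises the training set, so its training accuracy is perfect while its test accuracy is what actually matters.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy data: only 40 points, with 20% of labels flipped as noise,
# so a flexible model can easily memorise the training half.
X, y = make_classification(n_samples=40, n_features=5, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# An unconstrained decision tree fits the training set perfectly...
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))

# ...but the gap between train_acc and test_acc is what reveals overfitting.
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

This is why we always judge a model on held-out data: the 100% training score here tells us nothing about how the tree will behave on points it has never seen.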
This is the problem we are going to deal with today. We will also see by…