The Titanic competition revolves around predicting the survival status of passengers based on a set of features such as age, gender, class, and more. The dataset, derived from the infamous Titanic disaster, serves as a real-world scenario for honing data science skills. In this section of my portfolio, I share the journey of exploring the Titanic dataset, performing in-depth analysis, and implementing machine learning models to predict passenger survival. By navigating through this project, you'll gain insights into my problem-solving approach, model selection, and the thought process behind feature engineering.
This dataset from Kaggle contains data on the Titanic shipwreck, which can be found here. The dataset includes information about Sex, Age, Fare, Parch, and many more. Other features include passenger class (Pclass), the number of siblings or spouses aboard (SibSp), and the port of embarkation (Embarked).
Matplotlib and Seaborn were used to explore the dataset and find patterns. Since this is a classification problem, K-Nearest Neighbors and Logistic Regression algorithms were used to make predictions on this dataset.
Comparing the two models, K-Nearest Neighbors was not a good predictor for this dataset, with 67 percent accuracy. Logistic Regression, however, scored 94 percent, indicating a much better fit.
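The comparison above can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the reported scores: it assumes scikit-learn and pandas, uses a synthetic Titanic-like table built by a hypothetical `make_toy_passengers` helper instead of the real Kaggle file, and picks `n_neighbors=5`, a 75/25 split, and feature scaling as assumptions. The accuracies it prints will not match the 67 and 94 percent figures.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_toy_passengers(n=400, seed=0):
    """Build a synthetic, Titanic-like table (NOT the real Kaggle data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Sex": rng.choice(["male", "female"], size=n),
        "Age": rng.uniform(1, 70, size=n).round(),
        "Fare": rng.exponential(30, size=n).round(2),
        "Parch": rng.integers(0, 3, size=n),
    })
    # Made-up rule so the toy labels carry some signal:
    # women survive more often than men.
    p_survive = np.where(df["Sex"] == "female", 0.75, 0.20)
    df["Survived"] = (rng.random(n) < p_survive).astype(int)
    return df


def compare_models(df, seed=0):
    """Fit KNN and logistic regression; return held-out accuracy for each."""
    X = df[["Sex", "Age", "Fare", "Parch"]].copy()
    X["Sex"] = (X["Sex"] == "female").astype(int)  # encode Sex as 0/1
    y = df["Survived"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # Scaling matters for KNN because it relies on distances between rows.
    models = {
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


scores = compare_models(make_toy_passengers())
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

To try this on the real data, one would replace `make_toy_passengers()` with a DataFrame loaded from the Kaggle training file and handle missing values (for example in Age) before fitting.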
The survival rates by sex in the test dataset were:
Men - 18 percent
Women - 74 percent
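A breakdown like the one above can be computed with a pandas group-by. The sketch below assumes a DataFrame with `Sex` and `Survived` columns (as in the Kaggle training file); the eight-row table and the `survival_rate_by_sex` helper are made up for illustration and are not the project's data or code.

```python
import pandas as pd


def survival_rate_by_sex(df):
    """Return the percentage of survivors within each Sex group."""
    # The mean of a 0/1 column is the fraction of ones, i.e. the survival rate.
    return (df.groupby("Sex")["Survived"].mean() * 100).round(1)


# Tiny illustrative table, not real passenger records.
toy = pd.DataFrame({
    "Sex": ["male", "male", "male", "male",
            "female", "female", "female", "female"],
    "Survived": [0, 0, 0, 1, 1, 1, 1, 0],
})

rates = survival_rate_by_sex(toy)
print(rates)
```

On this toy table the rates come out to 25 percent for men and 75 percent for women; the real dataset would give the figures reported above.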