To help build your data science skills, we're inviting you to join the National Data Science competition where you will be using real-world data sets and advanced analytical tools. Through this project, you will develop modern data science skills and apply them to a captivating challenge: assessing the "College Happiness Score" and various factors influencing it.
Challenge: Predict the "College Happiness Score" across American Universities
As prospective students navigate their higher education options, understanding the value and satisfaction a college provides is crucial. Leveraging data from the College Scorecard Database, Forbes Rankings, HappyScore Data, Crime Data, Undergraduate Enrollment figures, and insights from Columbia University's Advanced Data Analysis Course, students will create models to gauge the quality of university life using any subset or combination of the features provided in our dataset. They will delve into what makes a college environment not just survivable, but enjoyable, and predict how various institutions rank on the happiness scale.
<aside> 💡 You can access the data here.
</aside>
Along the way, you will learn the basics of coding in Python, a language valued in both academia and industry; techniques for handling and interpreting real, messy data sets; and contemporary statistical modeling and machine learning methods tailored to multidimensional data such as that used for the "College Happiness Score."
We have split the data into two parts: the first part is your training data and the second part is the test data. Both sets have features and labels, but you will only be able to view the labels of the training data. Your objective is to use the training data to build a model, then apply that model to the test set features to predict the hidden labels, that is, the college happiness scores.
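The train-then-predict workflow described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the feature values, labels, and function names (`fit_line`, `predict`) are hypothetical and are not taken from the competition dataset, and a single-feature least-squares line stands in for whatever model you choose.

```python
# Hypothetical toy data: every number and name here is made up for
# illustration and does not come from the competition dataset.
train_features = [1.0, 2.0, 3.0, 4.0]  # one made-up feature per college
train_labels = [2.0, 4.0, 6.0, 8.0]    # made-up happiness scores (visible)
test_features = [5.0, 6.0]             # the matching test labels are hidden


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, xs):
    """Apply a fitted (slope, intercept) pair to new feature values."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Step 1: learn from the training features and labels.
model = fit_line(train_features, train_labels)

# Step 2: predict scores for the test features, whose labels you cannot see.
test_predictions = predict(model, test_features)
print(test_predictions)
```

The real dataset has many features, so in practice you would likely replace `fit_line` with a multivariate model from a library of your choice; the two-step shape (fit on training data, predict on test features) stays the same.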
<aside> 💡 Notes: Since this is a regression problem, the final grading metric is the Mean Square Error between your predictions and the true happiness scores. For details about Mean Square Error, please see here.
You may also treat it as a ranking problem. We do not expect you to have full knowledge of ranking methods, and you are welcome to use any methods you like. In any case, please be aware that we need the happiness scores in the end, not the ranks.
</aside>
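For reference, the Mean Square Error mentioned in the notes is the average of the squared differences between the true and predicted scores. The sketch below shows one way to compute it; the function name `mean_squared_error` and the example scores are hypothetical and chosen only for illustration, and the official grading code may differ in its details.

```python
def mean_squared_error(y_true, y_pred):
    """Return the average of (true - predicted)^2 over all items."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one score is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Made-up scores for three colleges, for illustration only.
true_scores = [3.0, 4.0, 5.0]
predicted_scores = [2.0, 4.0, 7.0]

# Squared errors are 1, 0, and 4, so the mean is 5 / 3.
mse = mean_squared_error(true_scores, predicted_scores)
print(mse)
```

A lower value means your predictions are closer to the true scores. Because the errors are squared, a few large misses are penalized more heavily than many small ones.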