In the previous blog on this page, we introduced the CRISP model (CRISP-DM, the Cross-Industry Standard Process for Data Mining). This model is the leading methodology for Data Mining projects, and it also plays a central role in our training.
First, we covered the Data preparation phase: how to delete rows and columns in a dataset, how to calculate the mean, median and mode, and how to generate a boxplot. Let’s say we started off easy. Things get a lot harder when the dataset contains missing values, but as BI consultants used to working with datasets, we could relate to this topic as well.
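These data preparation steps can be sketched in plain Python using only the standard library. This is a minimal illustration, not the method used in the training: it assumes a tiny hypothetical dataset stored as a list of dictionaries, with `None` marking a missing value, and the helper names (`drop_column`, `drop_rows_with_missing`, `impute_median`, `summarize`) are made up for this example. Filling a missing value with the median is only one of several common strategies; dropping the row, or using the mean or mode instead, are others.

```python
import statistics

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 34, "income": 2800, "city": "Utrecht"},
    {"age": 41, "income": None, "city": "Amsterdam"},
    {"age": None, "income": 3500, "city": "Utrecht"},
    {"age": 29, "income": 3100, "city": "Rotterdam"},
    {"age": 41, "income": 4200, "city": "Utrecht"},
]


def drop_column(data, column):
    """Return a copy of the dataset without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in data]


def drop_rows_with_missing(data):
    """Keep only the rows that contain no missing (None) values."""
    return [row for row in data if all(v is not None for v in row.values())]


def impute_median(data, column):
    """Replace missing values in one column by the median of the observed values."""
    observed = [row[column] for row in data if row[column] is not None]
    median = statistics.median(observed)
    return [{**row, column: median if row[column] is None else row[column]}
            for row in data]


def summarize(values):
    """Mean, median, mode and the five-number summary a boxplot is drawn from."""
    # statistics.quantiles uses the 'exclusive' method by default; plotting
    # tools may use another quartile convention and give slightly different boxes.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "five_numbers": (min(values), q1, q2, q3, max(values)),
    }


clean = impute_median(drop_column(rows, "city"), "age")
print(summarize([row["age"] for row in clean]))
```

In practice a library such as pandas would usually take care of these steps, but the logic stays the same: decide per column how to treat missing values before computing any statistics.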
But during weeks 4 and 5, we found ourselves in unknown territory. We covered the next phase in the CRISP cycle: the modeling phase. We heard about techniques such as linear regression, logistic regression, clustering, classification and decision trees. Used on their own, these single models were presented to us as weak predictors. Strong predictors are ensemble techniques, which combine multiple weak predictors into one model. Examples of ensemble techniques are Bagging, Boosting, Stacking and Random Forest. Together with Neural Networks, they are quite the black boxes of machine learning, because it is hard or even impossible to explain what really happens inside them.
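To make "combining multiple weak predictors" a bit more concrete, here is a minimal sketch of Bagging (bootstrap aggregating) in plain Python, assuming a toy binary classification problem with a single numeric feature. As the weak predictor it uses a decision stump, a decision tree of depth one; this is a stand-in chosen for brevity, not necessarily the model used in the training. The functions `fit_stump`, `fit_bagging` and `predict` are hypothetical helpers, not the API of any library. Random Forest builds on the same idea but also samples a random subset of features at each split, which a one-feature example cannot show.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Weak predictor: a decision stump (a decision tree of depth one).

    Tries every observed value as a threshold, in both orientations, and
    keeps the split with the fewest training errors.
    """
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def fit_bagging(xs, ys, n_models=25, seed=42):
    """Bagging: fit each weak predictor on its own bootstrap sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    models = []
    for _ in range(n_models):
        # Draw len(xs) indices with replacement: a bootstrap sample.
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Aggregate the weak predictors by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 1 when x >= 5, class 0 otherwise.
xs = list(range(10))
ys = [0] * 5 + [1] * 5
models = fit_bagging(xs, ys)
print([predict(models, x) for x in xs])
```

Each stump on its own can only draw a single threshold, and which threshold it picks depends on the sample it happened to see. Letting many stumps vote smooths out those differences, which is the variance reduction Bagging is known for. In practice you would use a library implementation, such as scikit-learn's `BaggingClassifier` or `RandomForestClassifier`, rather than writing this by hand.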
Yet those black boxes are exactly what a company needs to make its predictions. A famous example is the Netflix Prize: Netflix once held a competition to find a better algorithm for predicting consumer preferences (to give better movie and show recommendations). The algorithms that predicted best were ensemble techniques. You can read more about that in this PDF document:
So, just as we need to ensemble our skills and talents to build a complete Data Science team (as discussed in the previous blog), we might also need to ensemble our techniques and algorithms.
Interested in finding out more about Data Science? Click here and bookmark the page so you don’t miss any updates.
N.B. Thanks to Eline Bangert for this contribution.