Time flies when you ‘R’ having fun

As you know from an earlier blog post, Just-BI is very keen on staying up to date with the latest developments. Therefore, some of us are participating in the Data Science course @ DIKW. And as we find it important to share knowledge with our customers, we want to keep you posted.

Here are just a few things we have learned during training days 2 and 3.

  • Programming in R and Python
  • Preparing datasets for Data Mining
  • Creating models using methods like:
    • Logistic Regression
    • Classification
    • K-means
      and understanding their algorithms from both a functional and mathematical perspective
  • Training Models
  • Validating Models
  • Comparing ROC curves and confusion matrices
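To make the last two bullets a bit more concrete, here is a minimal sketch in plain Python of how a confusion matrix and a ROC curve can be computed when validating a binary classifier. It is not the course material: the labels, the scores, the 0.5 cut-off and the function names (`confusion_matrix`, `roc_points`, `auc`) are all made up for illustration. In practice you would more likely use a library in R or Python for this.

```python
# Hypothetical toy data (an assumption for illustration only):
# true labels (1 = positive, 0 = negative) and the scores a model assigned.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_points(y_true, scores):
    """Return ROC points (false positive rate, true positive rate).

    One point per distinct score threshold, starting at (0, 0).
    Assumes both classes occur at least once in y_true.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they produce a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Turn scores into class predictions with an assumed cut-off of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("tp, fp, fn, tn:", confusion_matrix(y_true, y_pred))
print("AUC:", auc(roc_points(y_true, scores)))
```

The confusion matrix depends on the chosen cut-off, while the ROC curve summarizes the trade-off over all cut-offs, which is why the two are usually looked at together when comparing models.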

But let’s maybe start with the big picture and look at some of the most important crafts of Data Science first.
What is it all about?

  • Discover unknown unknowns in data
  • Obtain predictive, actionable insights
  • Communicate business data stories
  • Build business decision confidence
  • Create valuable Data Products

Is it a bird? Is it a plane? Unless Superman works for you, you can’t possibly get all of that done by one single person. A Data Science TEAM is required, and not just any team of random people; it should consist of people with a variety of skills.

Here is an overview of the range of skills which should be covered by a Data Science team:

[Figure: range of skills in a Data Science team. Source: H. Harris et al. (2013), “Analyzing the Analyzers”]

And in this picture you can see how the skill sets should be distributed between the different roles within a Data Science team:

[Figure: distribution of skill sets across Data Science roles. Source: H. Harris et al., “Analyzing the Analyzers”]

Once a Data Science team is in place and ready to start, the process of Data Mining (otherwise known as “work”) can begin. But where and how to start?

The Cross Industry Standard Process for Data Mining (CRISP-DM) is considered the leading methodology to tackle possible problems and provides structure for any Data Mining project. This graphic illustrates what the workflow looks like and which steps to take first.

[Figure: CRISP-DM process diagram]

[Figure: generic tasks (bold) and outputs (italic) of the CRISP-DM reference model]

Watch this space for coming updates and good reads. In future blogs, we will go into more detail on some of the topics mentioned here.

And for our Dutch readers (or for everybody who does not mind watching a documentary on Dutch television) some food for thought and interesting insights.

This video, called “What makes you click”, questions the (ethical) boundaries regarding the collection and usage of personal data by organizations. Who uses this data and for what? Find out why even some of the data collectors are requesting the implementation of an ethical code around personal data.

This article belongs to
Tags
  • Data Science
Author
  • Viveca Cohen