This is the third blog post in our BW/4HANA journey. We travel using an instance you can spin up via SAP Cloud Appliance Library, and it takes about 30 minutes before you can start driving. However, the box is totally empty apart from some Business Content InfoObjects, primarily in the BPC and IP area; the new Business Content is not going to be available until the second half of 2017. It looks like SAP is getting ready for the future!
With new Business Content still in the making, we are in dire need of data, data, data to continue our journey. Of course we could opt for standard (read: boring) AdventureWorks or eFashion data… but we thought it would be more entertaining to try and find some FUN data.
>> The Search Process: Google is your Best Friend
So I turned to my best friend Google to find a database on Marvel Super Heroes, of which I’ve been an avid fan since childhood. The closest I could get was a graph database of the Marvel Universe, which gives a very nice insight into the Marvel Universe social network. The major links are the X-Men, the Avengers, Spider-Man and Wolverine, so it’s best to tweet to them to get your message across the Universe.
After some further browsing, I stumbled upon these two great sites, which have collections of very interesting and often hilarious datasets:
- 100+ Interesting Data Sets for Statistics by Robb Seaton
- Publicly available large data sets for database research
There are loads of datasets out there, but most are more suitable for Data Science and Machine Learning purposes. A very nice one was the Netflix Prize Data Set, which was used in a million-dollar (!) competition to help improve Netflix’s recommendation algorithm. The data is still available here, if you would like to have a stab at it.
Another fine example of a lucrative use of big data is analyzing Twitter feeds to predict the stock market.
If you are looking for really big datasets, AWS is the place to be, but that’s something our Data Science Class of 2016 can explore further.
>> The Ultimate Datasets
Most of the datasets we found were not really suitable for our use case (too big and unstructured). With the help of Just-Michael we finally landed on… (drum roll, please) Crime and Public toilets. If that doesn’t sound like a FUN combination, I don’t know what will!
Keep following us to find out where this will lead us, with our next steps in the BW/4HANA Journey.
Tags
- BW4HANA
- BW4HANA Journey
- DATASET
- FUNDATASET
Author
- Ralf Slofstra