A Path Into Data Science While Creating a Business Purpose
Where I Come From
It often puts a smile on my face when I read that somebody considers themselves a late starter in data science because they moved into the profession at age 30, or even as late as 35.
I was 48 when I got into the field of data science.
I have spent my whole professional life in the banking and real estate industries. Within that setting, real estate debt financing, real estate investment, risk management and restructuring have been my core activities.
But I was old school. Of course, in the early days we also built our own investment tools and tried to use market information for our investment and debt financing decisions.
Those tools, however, were built in Excel, and we were far from using data in the structured way that is possible today.
In this article, I would like to show how I found my way into the world of Big Data, and how you can combine your earlier, more conventional business expertise with the advanced toolkit of Big Data in order to serve a business purpose for a whole industry.
How It Started
It started when I came across a book called Moonshot! Game-Changing Strategies to Build Billion-Dollar Businesses, written by the wonderful John Sculley, the former CEO of Pepsi and Apple (references for all books mentioned are listed at the end).
And there is this one quote:
Today there is a tsunami that involves four exponential technologies converging at such speeds that they are ushering in a second digital age.
The first leg of this technology tsunami is cloud computing.
The second leg of this technology tsunami is termed the Internet of Things.
The third leg of this technological tsunami effect is Big Data.
The fourth leg of the exponential technology tsunami is all about mobile.
I was not aware of the other three, and not even of what Big Data really means (yes, hard to believe…), but I somehow realised that Big Data is, to some extent, connected to statistics.
At the very same time, I happened to get interested in statistics.
I don't know why, but it was enough for me to bite the bullet, look deeper into statistics and see where it would lead.
But where to start?
Statistics literature is not exactly known for being entertaining. Frankly, the content is quite hard to understand when you start from scratch.
Luckily for me, the learning platform www.edx.org came to my rescue!
Columbia University’s online courses on this platform:
- Statistical Thinking for Data Science and Analytics
- Machine Learning for Data Science and Analytics
- Enabling Technologies for Data Science and Analytics: The Internet of Things
became my first steps in this area. The material comes in bulk, though, and can be rather overwhelming for a newbie.
I did not catch it at first, but one of the first lectures contained a side hint in the form of a book recommendation: Head First Statistics.
I cannot express my gratitude to the author often enough! This book opened the gates to the field of statistics for me. It is lightly written and entertaining, yet challenging, and it explains everything you need to know as a beginner. Highly recommended!
From here, I got hungry for more.
The next goal was to consolidate this knowledge and go further.
In this context, I would recommend:
- Statistics Essentials for Dummies
- Statistics II for Dummies
- Statistik: Klassisch oder Bayes
- and, of course, Statistics for Dummies: 1,001 Practice Problems
Don't be put off by being addressed as a “Dummy”. These are really good books that are easy to read and make dry material quite digestible.
And yes, I did ALL of the 1,001 exercises. Some I did twice, and the really tough ones I did three times (or more) until I got them…
Changing Gear
At some point in this journey, I realised that I could keep expanding my knowledge of statistics, but I was increasingly limited by the tools I had at hand for putting that knowledge into practice.
This was a serious problem, as I am the type of learner who needs to practise on his own (quite a lot, to be frank…) in order to be able to “see” the substance. Unfortunately, Excel was no longer an option here.
After some failed attempts, I discovered the programming language R.
What a revelation! It is an open-source language created specifically for statistical computing, and its range of uses is almost unlimited.
The issue was that I was also a newbie in programming…
But again — learning platforms like www.edx.org to the rescue!
Harvard University created a whole programme of interconnected courses covering not only R programming but also statistics, development tools and machine learning. Courses in the programme include:
- R Basics
- Inference and Modeling
- Productivity Tools
- Linear Regression
Just to reveal my initial level of understanding: I did not see why I would have to learn the difference between a vector, a data frame, a matrix, a list and an array. How boring. So I skipped it. Needless to say, I came back very quickly to catch up on these topics…
I also have to admit that working through those courses would not have been possible without some complementary sources of wisdom. Above all, I would recommend The Book of R.
Other sources would be, for example:
- Practical Statistics for Data Scientists
- Machine Learning with R (good for a basic understanding, though already a bit outdated, as it does not cover the tidymodels machine learning framework)
- Tidy Modeling with R
Of course, these are just excerpts and examples of possible courses, books and ways to go. But I can tell you it worked for me. It opened up new horizons, both in understanding and in expanding my professional toolkit.
And it does not stop there. At this stage it is time to combine this knowledge with practical expertise. And it is time to tackle more specialised topics.
In a nutshell, there are plenty of resources, and the amount of wisdom available out there seems endless. Here are some highlights:
- medium.com: anything about data science, big data and technical issues;
- Analytics Vidhya (analyticsvidhya.com): anything for the data science community (I have since had the honour of becoming a writer for this site);
- RPubs by RStudio (rpubs.com): anything about R programming topics;
- R-bloggers (r-bloggers.com): anything about R programming topics and one of my favourites;
- tidymodels (tidymodels.org): a comprehensive framework for machine learning in R;
- Stack Overflow (stackoverflow.com): for specific questions and topics in programming;
- Towards Data Science (towardsdatascience.com): the name says it all…;
- Richard McElreath's Statistical Rethinking: A Bayesian Course with Examples in R and Stan, available as a book or as a tutorial on YouTube and on GitHub;
- Michael Betancourt: the grandmaster for anything connected to Bayesian statistics;
- Andrew Ng: the grandmaster for anything connected to deep learning and artificial neural networks;
- Journal of Statistical Software (jstatsoft.org): you guessed it…
- countless other blogs, publications and contributions, (partly) freely available on the net.
To get an idea of how things are put into practice, I would recommend, for example, the following books:
- A Man For All Markets
- The Signal and the Noise
- Forecasting: Principles and Practice
- Machine Learning: A Probabilistic Perspective
- An Introduction to Bayesian Data Analysis for Cognitive Science
Over time you will build up a broad portfolio of topics in the area of Big Data and Data Science. As an example, here is mine:
Anyway, it is time to take action yourself. After all, you want to embed all this knowledge and expertise in your professional life.
In this context, it can be a tremendous advantage to bring some business experience with you. For me, it was commercial real estate investment and financing, restructuring and risk management.
I started to combine my decades of “conventional” business expertise with all these fascinating tools from “big data” in order to create a business purpose!
This purpose, or rather the vision, was to change the way financial decisions are made in the commercial real estate industry by embedding data-driven risk decisions and incorporating advanced predictive analytics methods into companies' operational workflows.
The goal is to support decision makers by providing early warning systems, predictive options for decision making and combined analytics of market and business risk metrics.
In short, we intend to reduce uncertainty for decision makers in an industry which is typically exposed to long-term commitments in an increasingly volatile market environment.
Once more, old-fashioned experience is combined with new skills in order to create a new business purpose! In this case: RiskTech solutions for the commercial real estate industry.
In doing so, we try to build a bridge between the world of advanced technical tools and state-of-the-art model developments on the one side, and what makes sense in the respective industry and drives the best results for decision makers on the other.
So, sometimes we also opt for a seemingly “outdated” modelling framework when we see that it gives better results, is more stable and more scalable for business use, or simply fits the task at hand much better. In any case, it is good to be able to rely on your business experience.
In this article, I have tried to show how you can start your journey in the world of Big Data, and how you can combine these new skills with your previous experience in order to create a new business purpose.
Needless to say, there are many more options than just the one I showed here. Maybe you like the programming language Python more than R, or you prefer www.coursera.org, www.analyticsvidhya.com or www.datacamp.com to www.edx.org as a learning platform, or you know some books or blogs you find much better than the ones I presented here.
That is fine with me, and you should definitely go ahead!
Just keep the big picture in mind: acquiring new skills in an area which is driving the second digital age and combining them with your earlier industry expertise in order to satisfy a business need!
References
Moonshot! Game-Changing Strategies to Build Billion-Dollar Businesses by John Sculley, 2014
Head First Statistics by Dawn Griffiths, 2008
Statistics Essentials for Dummies by Deborah Rumsey, PhD, 2010
Statistics II for Dummies by Deborah Rumsey, PhD, 2009
Statistik: Klassisch oder Bayes by Wolfgang Tschirk, 2014
Statistics for Dummies: 1,001 Practice Problems published by John Wiley & Sons Inc., 2014
The Book of R by Tilman M. Davies, 2016
Practical Statistics for Data Scientists by Peter Bruce & Andrew Bruce, 2017
Machine Learning with R by Brett Lantz, 2019
Tidy Modeling with R by Max Kuhn and Julia Silge, 2021 (pre-release)
Statistical Rethinking: A Bayesian Course with Examples in R and Stan by Richard McElreath, 2020
A Man For All Markets by Edward O. Thorp, 2017
The Signal and the Noise by Nate Silver, 2012
Forecasting: Principles and Practice by Rob J. Hyndman and George Athanasopoulos, 2021 (3rd edition)
Machine Learning: A Probabilistic Perspective by Kevin P. Murphy, 2012
An Introduction to Bayesian Data Analysis for Cognitive Science by Bruno Nicenboim, Daniel Schad and Shravan Vasishth, 2021