My Big Data Experience – Part IV: Programming and Statistics

In the last post I talked about value, the MVP (minimum viable product), obstacles, and building a high-performance team. In this post I would like to talk a little more about the value proposition, skill sets, and how they contribute to the MVP.

Here are the other posts in case you have missed them:

  1. Part I: Where should I start?
  2. Part II: Spare Time and Getting the Job.
  3. Part III: Building the Right Team and Breaking the Business Model.
  4. Part IV: Programming and Statistics.
  5. Part V: Data Munging and Feature Engineering. (Coming soon)
  6. Part VI: Yippee! Machine Learning. (Coming soon)
  7. Part VII: How to Interview a Data Scientist. (Coming soon)

Secret to Success

It’s passion! No, I am not talking about that tingling feeling you get in your pants when you see an attractive guy / girl / sheep. I am talking about the one thing that wakes you up in the morning. That’s right boys and girls, passion is: Coffee

NO! The other thing, the drive, the motivation that keeps you going! For some it’s their children or running a pet shop, for others it’s about saving humanity. Why is this important? When passion is combined with skills, you can achieve great things.

If passion is like an engine, skills are the oil that runs it.

Especially in big data and data science, there are countless challenges and hardships waiting for you. You are going to hit brick wall after brick wall (that’s why my face is so messed up… *cry in the corner*): when technologies don’t work together, when code runs locally but fails on the cluster, when new libraries are not backward compatible with old code, when your model cannot be productionised. The list goes on.

If you are passionate about the work you are doing, these challenges are still hard, but you will have the heart to take them on, one after another. I believe this applies to any business or field. Without passion, you won’t last long in this one. I know people who went into big data for the money; they hated the job after a few months and very quickly moved to a different role. Big data is constantly evolving, and people are still trying to find and establish best practices. Keeping up with the technologies and methodologies is hard enough when you are passionate, never mind when you are not. The field moves so fast that no single person can stay fully up to date and still deliver value.

Unqualified Need Not Apply

So how many skills do you need to accomplish the tasks at hand?

Does this look familiar:

“We are looking for a candidate with at least 8+ years of commercial experience in Java, Python, Scala, R. Must have a master’s or doctorate in mathematics, statistics, or computer science. Must have worked with Hadoop, Spark, Hive, HBase, Kafka, Storm, Elasticsearch, Kibana. Must have experience with cloud services, e.g. AWS. You will also meet stakeholders to gather and analyse business requirements and build predictive models using machine learning; required to extract insights from data and present findings and visualisations to C-suite. Beneficial if you have solved world peace, saved Lois Lane, and have been elected president of a country”.

The scary thing is that I have seen jobs with descriptions similar to this…

Choose a Car!

(Image Source)

Did anyone choose the Mercedes Smart (bottom right)?

Everybody wants to be able to build / own a Ferrari, Lamborghini or some other luxury car, but do you need it? Similarly, companies want to be the next Google, building the next DeepMind that kicks ass and chews bubble gum, but will it solve your current business problem? With this kind of mentality (overly-hyped-about-how-big-data-and-machine-learning-will-make-your-mother-in-law-happy), companies want to hire the best with all the bells and whistles.

But all you really need is this:

(Image Source)

Look at how awesome this car is: wheels, passenger seat, engine, moves you from A to B. What more do you need? Are you still thinking about that schweet-schweet Lamborghini, be honest now…

Similarly, you can keep dreaming about your next Big-Digital-Disruption (I still don’t understand what this means), but it is more practical to stick to the basics. That is why it is so important to focus on a business case and the MVP for it, so you know what “done” looks like. Why? Because the MVP starts providing value, which means your boss will be happy to keep you on the payroll. Having a Lamborghini is the cherry on top.

So… ask yourself:

  • What skill sets do you need to build a Lamborghini?
  • What skill sets do you need to build a basic car?

A Lamborghini engineer’s skill set vs. a mechanic’s skill set: they are very different, right?

So what do you really need?

Just enough to get the job done!

Side Note: If you don’t even know how to build the basic car, don’t even think about building a luxury car. Once you have the basics, re-evaluate the next MVP and hire or train accordingly. Rinse and repeat to build it up to a Lamborghini slowly.

I don’t have a straight answer as to how much programming, statistics or mathematics you will need; it really depends on who is hiring and the project they want to build. Typically there is a spectrum of employers, ranging from research labs at one end to IT shops at the other. A research lab will want a serious stats / maths background, whereas an IT shop will want more programmers and engineers who know how to write clean code that scales well. That is why you need to research which types of project you want to be part of, and also think about the type of person you are.

Personally, I am a hands-on type of guy, so I prefer to spend most of my time building products and seeing them in action, with some R&D on the side. Hopefully you will get interviewed by someone who knows what they are looking for, because there are many stories of candidates who interviewed for a data scientist role but were then expected to be developers, and guess what, they quit very quickly.

That being said, one thing is for sure: you will need to be able to learn and adapt quickly, whether it’s stats, maths, programming, communication, domain knowledge or crisis control (yep, if your system doesn’t fall over at least once, you are not playing with big data). You will be faced with challenges every single day. If you are not, then you are not innovating; you are simply doing what has been done before, and that’s not the spirit of data science.

That’s why I want to end off with a quote that sums up this post.

Science is not only a discipline of reason but, also, one of romance and passion.

— Stephen Hawking

On the Next Episode

Next, we will travel to the world of data science and see how important each of the components is. You might be surprised by the answer! I hope you have enjoyed this; any questions and comments are welcome.

Until next time, stay curious!
