So you have decided on the role you want to play in the big data space, and you know this is the path for you. Great! If not, I guess you will find out soon enough.
Here are the other posts in case you missed them:
- Part I: Where should I start?
- Part II: Spare Time and Getting the Job.
- Part III: Building the Right Team and Breaking the Business Model.
- Part IV: Programming and Statistics.
- Part V: Data Munging and Feature Engineering. (Coming soon)
- Part VI: Yippee! Machine Learning. (Coming soon)
- Part VIII: How to Interview a Data Scientist. (Coming soon)
Spot the fluff
So here you go: you need to know all of this to create your big data products. Enjoy!
Don’t worry, you don’t need to know them all, just the technologies you need to get the job done. What technologies should we use then? Imagine you go to a hardware store, walk down the aisle and see hammers, nails, screws, spanners, a whole bunch of other tools and a feather duster like the one your mom used to hit you with. Which one do you buy? All of them? Definitely not. You are on a budget, so what do you do? This is where big data vendors come to the rescue.
These vendors prepackage a technology stack for businesses and provide support, cutting out the unnecessary fluff so you can get the job done. BUT, as we know, technology changes rapidly, so expect tech consolidation soon, meaning some technologies will be eliminated. Not to mention that big data is still very much hyped. So what now?
The question is not really about the technology, it’s about the skill sets. Skills can be transferred and applied to different technologies and companies. Technologies may not transfer, since two companies in the same industry may run different technology stacks.
The skills required for data science roles draw from a common set:
- Soft skills (Verbal and Written Communication Skills)
- Business Acumen / Domain Knowledge
- Data wrangling
- Programming skills
- Distributed file stores
- Parallel processing
- Machine Learning / Statistics / Linear Algebra
- Real-time data streaming
- Real-time predictive analysis
- Data visualization
In my previous post I mentioned the importance of knowing which role you want to play. This is where it plays a big part. For example, if you are a data engineer, your main focus would be parallel processing (functional programming) and setting up data infrastructure; if you are a data scientist, machine learning, statistics and sampling strategy are what matter.
One thing to keep in mind is that technologies are merely tools and will change over time, but skill sets remain useful and can be applied to new technologies. Always pick a tried and tested technology with a strong community, and invest your time in learning these tools as needed. Simply put, be a skilled artisan who can use different tools (new or old): learn the skill of hammering instead of learning about the hammer.
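As a rough illustration of a skill outliving a tool, the sketch below counts words with the same map and reduce steps run two ways: once with plain built-ins and once on a pool of workers. It uses only the Python standard library. The sample lines and the names `count_words`, `merge_counts`, `word_count_serial` and `word_count_pooled` are made up for this example, and the thread pool is only a stand-in for a real distributed engine.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical sample data: each string stands in for one chunk of a larger dataset.
LINES = [
    "big data is big",
    "data science is about skills",
    "skills outlive tools",
]


def count_words(line):
    """Map step: turn one line into its own word counts."""
    return Counter(line.split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count_serial(lines):
    # Tool 1: plain built-ins, one line at a time.
    return reduce(merge_counts, map(count_words, lines), Counter())


def word_count_pooled(lines, workers=2):
    # Tool 2: a thread pool. The map and reduce steps are unchanged.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, lines))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    # Both tools should agree, because the underlying logic is the same.
    print(word_count_serial(LINES) == word_count_pooled(LINES))
```

The pool will not make a job this small any faster. The point is that swapping the tool changes the plumbing, while the map and reduce functions, which are the actual skill, stay the same.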
Talk technologies, think skill sets, then focus!
You will need to focus. Knowing a little of everything sounds good on paper, but when it comes to doing the job, you will struggle. Build up your other areas of expertise over time, one bite at a time; that’s how you will make progress. By the time you are proficient in all the technologies, big data will be out of fashion. Now that that’s out of the way, how do you get into data science?
Getting the Job
This is probably the hard part, but it is not impossible. Before you go out and find a data analytics job, here is something to keep close to your heart, and if you don’t have one (since we are all robots), keep it on a post-it so you can look at it every day.
It’s easy to get into data science; it is hard to stay a data scientist.
Now, if your company is looking for a data scientist, you are one of the lucky few: you just need to apply.
Here are some tips for the rest:
- Pet projects – do a project that you are interested in, such as predicting property or stock prices.
- Kaggle competitions – plenty of practice and potential prizes to be won.
- Show initiative – create the opportunity in your workplace by implementing machine learning, and drive the change from the bottom up.
- Local meetups – network, network, network. Go meet other data scientists.
- Learn to learn fast.
Words of caution:
The only thing I find big about big data is the hype. It’s really big, which is a problem for job applications. It means that companies may or may not know what they are looking for, and there will be plenty of competition as well. On multiple occasions, I had companies ask for x, interview me on y, test me on z and in the end not know what they wanted, because they didn’t have a business case. They wanted big data just because they didn’t want to miss the boat. My honest suggestion is to focus on building skill sets and wait for the hype to die down, then look for a surviving company to invest your career in.
Sum it up!
Data science is not a get-rich-quick scheme; it’s about a passion for problem solving and analysis. So don’t be in a rush to get into data science. Take your time, enjoy the learning, then apply it to your work. I hope you enjoyed reading this post. In the next post, I will be talking about how to add value to the business.