My Big Data Experience – Part I: Where should I start?

A few years ago, while I was eating with my usual lunch group, one colleague mentioned “beeeeeg data”. That was the first time I had heard the term. It sounded really silly, and I joked, “What is this new hype now? It sounds like a sales pitch to sell me a 3D TV.” Like any other hype train, it was one I had no intention of jumping on.

And yet here I am, on that very train, sharing my experiences. I want to share my thoughts on working in the data science domain. Perhaps my experiences can provide some value to the data science community. I will update this series as time goes on; here are the topics I want to cover:

  1. Part I: Where should I start?
  2. Part II: Spare Time and Getting the Job.
  3. Part III: Building the Right Team and Breaking the Business Model.
  4. Part IV: Programming and Statistics.
  5. Part V: Data Munging and Feature Engineering. (Coming soon)
  6. Part VI: Yippee! Machine Learning. (Coming soon)
  7. Part VII: How to Interview a Data Scientist. (Coming soon)

What you have probably heard about big data / data science

  1. Money!
  2. Hottest job at the moment
  3. Machine Learning / Artificial Intelligence
  4. Maths, stats and programming required
  5. Data munging
  6. But mostly this diagram

[Image: data_science]

These ideas are thrown around fairly often, but there are multiple roles within the data science domain, for example: Data Analysts, Data Scientists, Business Analysts, Data Engineers, Data Architects, Statisticians, Database Administrators, and Data and Analytics Managers. You can read the article that explains the different roles within the field of data science here (source: KDnuggets).

I believe this is one of the main reasons why companies are struggling to find data scientists. There is a perception that data science can be done by one person instead of a team, similar to hiring a full-stack developer who must know how to code the front end and the back end, design the user interface, and set up the production environment. Companies should instead look at the different roles that make up a data science team. It is fairly rare to find one person with all the required skill sets, hence the term “Unicorn” used to describe data scientists.

What people don’t tell you

  1. Data science is a team effort, even if you are a “Unicorn” who can do everything.
  2. Communicating and sharing ideas with team members is super important; it is necessary to cross-validate each other's ideas.
  3. You will need to learn to speak to non-technical people using easy-to-understand business examples.
  4. You will need to ask lots of business questions across multiple departments to understand the business processes well enough to do your analysis.
  5. A data product will take a long time to develop, especially when you do not have enough data and data collection is very time-consuming.
  6. You will have great difficulty obtaining data because of politics, red tape and lack of infrastructure.
  7. You will have a difficult time working with data, because there is often no way to properly link up multiple sources and data integrity is a real issue (raw vs. processed data).
  8. Data munging and exploratory data analysis will take up 80% of your time.
  9. Machine learning is fun, but it is only 20% of the work.
  10. You will have to constantly remind people that data munging and feature engineering are more important than machine learning, and that more time should therefore be spent there.
  11. You will be under enormous pressure from management to provide value in the form of a data product, especially when they do not understand the data science methodology. This can be eased by delivering small findings continuously.
  12. You will be in meetings, a lot.
  13. There will be many projects that you need to deliver, and you will need to learn to prioritise your tasks and juggle projects.
  14. Every business problem is unique; there is no one method that solves all of them. You will need to spend your spare time learning, researching the problem and upskilling yourself.
  15. Most companies don’t have a BIG data problem.
  16. You need to be passionate about data; this is not negotiable.

Conclusion

My final suggestion to anyone who wants to start a career in data science is to answer these questions honestly and then go forward:

  1. What role do you want to play?
  2. Do you have a passion for data science?
  3. Are you willing to spend time after work to improve yourself?

I believe if you have the right attitude and aptitude, you will be successful in any path in life. Good luck!
