My Big Data Experience – Part IV: Programming and Statistics

In my last post I spoke about value, the MVP (minimum viable product), obstacles, and building a high-performance team. In this post I would like to talk a little more about the value proposition, skill sets, and how they contribute to the MVP.

Here are the other posts in case you missed them:

  1. Part I: Where should I start?
  2. Part II: Spare Time and Getting the Job.
  3. Part III: Building the Right Team and Breaking the Business Model.
  4. Part IV: Programming and Statistics.
  5. Part V: Data Munging and Feature Engineering. (Coming soon)
  6. Part VI: Yippee! Machine Learning. (Coming soon)
  7. Part VII: How to Interview a Data Scientist. (Coming soon)

Secret to Success

It’s passion! No, I am not talking about that tingling feeling you get in your pants when you see an attractive guy / girl / sheep. I am talking about the one thing that wakes you up in the morning. That’s right, boys and girls, passion is: Coffee.

NO! The other thing: the drive, the motivation that keeps you going! For some it’s their children or running a pet shop; for others it’s about saving humanity. Why is this important? Because when passion is combined with skills, you can achieve great things.

If passion is the engine, skills are the oil that keeps it running.

Especially in big data and data science, there are countless challenges and hardships waiting for you. You are going to hit brick wall after brick wall (that’s why my face is so messed up… *cries in the corner*): when technologies don’t work together, when code runs locally but fails on the cluster, when new libraries are not backward compatible with old code, when your model cannot be productionised; the list goes on.

If you are passionate about the work you are doing, these challenges are still hard, but you will have the heart to take them on, one after another. I believe this applies to any business or field. Without passion, you won’t last long in this field. I know people who work in big data for the money; they hated the job a few months in and very quickly moved to a different role. Big data is constantly evolving and people are still trying to find and establish best practices. Even when you are passionate, it is hard to keep up with the technologies and methodologies, let alone when you are not. The field is moving so fast that no one person could keep fully up to date and still deliver value.

Unqualified Need Not Apply

So how much skill do you need to accomplish the tasks at hand?

Does this look familiar:

“We are looking for a candidate with 8+ years of commercial experience in Java, Python, Scala, R. Must have a master’s or doctorate in mathematics, statistics, or computer science. Must have worked with Hadoop, Spark, Hive, HBase, Kafka, Storm, Elasticsearch, Kibana. Must have experience with cloud services, e.g. AWS. You will also meet stakeholders to gather and analyse business requirements and build predictive models using machine learning; required to extract insights from data and present findings and visualisations to the C-suite. Beneficial if you have solved world peace, saved Lois Lane, and been elected president of a country”.

The scary thing is that I have seen jobs with descriptions similar to this…

Choose a Car!

(Image Source)

Did anyone choose the Mercedes SMART (bottom right)?

Everybody wants to build / own a Ferrari, Lamborghini or some other luxury car, but do you need one? Similarly, companies want to be the next Google, building the next DeepMind that kicks ass and chews bubble gum, but will that solve your current business problem? With this kind of mentality (overly-hyped-about-how-big-data-and-machine-learning-will-make-your-mother-in-law-happy), companies want to hire the best with all the bells and whistles.

But all you really need is this:

(Image Source)

Look at how awesome this car is: wheels, a passenger seat, an engine, and it moves you from A to B. What more do you need? Are you still thinking about that schweet-schweet Lamborghini? Be honest now…

Similarly, you can keep dreaming about your next Big-Digital-Disruption (I still don’t understand what this means), but it is more practical to stick to the basics. That is why it is so important to emphasise a business case and the MVP for it, so you know what “done” looks like. Why? Because the MVP will start to provide value, which means your boss will be happy to keep you on the payroll; the Lamborghini is the cherry on top.

So… ask yourself:

  • What skill sets do you need to build a Lamborghini?
  • What skill sets do you need to build a basic car?

A Lamborghini engineer’s skill set vs. a mechanic’s skill set: very different, right?

So what do you really need?

Just enough to get the job done!

Side Note: If you don’t know how to build the basic car, don’t even think about building a luxury car. When you have the basics, re-evaluate the next MVP, then hire or train accordingly. Rinse and repeat to build it up to a Lamborghini slowly.

I don’t have a straight answer as to how much programming, statistics or mathematics you will need; it really depends on who is hiring and the project they want to build. Typically there is a spectrum of employers, with research labs at one end and IT shops at the other. A research lab will want a serious stats / maths background, whereas an IT shop will want more programmers and engineers who know how to write clean code that scales well. That is why you will need to research which types of project you want to be part of, and also know what type of person you are. Personally, I am a hands-on type of guy, so I prefer spending most of my time building products and seeing them in action, with some R&D on the side. Hopefully you will be interviewed by someone who knows what they are looking for, because there are many stories of candidates who interviewed for a data scientist role but were expected to be developers, and guess what, they quit very quickly.

That being said, one thing is for sure: you will need to be able to learn and adapt quickly, whether it’s stats, maths, programming, communication, domain knowledge or crisis control (yep, if your system doesn’t fall over at least once, you are not playing with big data). You will face challenges every single day; if you don’t, then you are not innovating, you are simply doing what has been done before, and that’s not the spirit of data science.

That’s why I want to end with a quote that sums up this post.

Science is not only a disciple of reason but, also, one of romance and passion.

— Stephen Hawking

On the Next Episode

Next, we will travel to the world of data science and see how important each of the components is; you might be surprised by the answer! I hope you enjoyed this; any questions and comments are welcome.

Until next time, stay curious!

My Big Data Experience – Part III: Building the Right Team and Breaking the Business Model

In my last post I wrote about the importance of having a business case and investing in core data science skills instead of technologies. In this post I will continue with the idea of a good business case, and then look at building a high-performance team to achieve it and provide value.

**WARNING** It’s a long read, also very opinionated **WARNING**

Here are the other posts in case you missed them:

  1. Part I: Where should I start?
  2. Part II: Spare Time and Getting the Job.
  3. Part III: Building the Right Team and Breaking the Business Model.
  4. Part IV: Programming and Statistics.
  5. Part V: Data Munging and Feature Engineering. (Coming soon)
  6. Part VI: Yippee! Machine Learning. (Coming soon)
  7. Part VII: How to Interview a Data Scientist. (Coming soon)

Short Story

Months had gone by since you joined, and the CEO had asked you to extract insights from the company’s data. While you were gathering your thoughts in your familiar nine-to-five cubicle, a cold chill suddenly ran down your spine as you noticed a reflection on the screen and heard a soft voice: “What can you show me?”.

You scrambled to close the browser containing your favorite morning news sites and struggled to find the file with your analysis. “Yes, got it!” you thought as you double-clicked the file. You smiled awkwardly at your boss as the progress bar took another step.

The CEO stood silently.

You went through the sales pitch, displayed trends, graph after graph, and explained each one of them. You paused, looking for some feedback. The CEO said: “It is interesting that you can identify all the right-handed customers based on data analysis and machine learning, but how is that information going to help our banking business? Are you also going to spend another few months finding the left-handed customers too?”

Insights Do Not Mean Value

The story might be silly and simple, but I hope it gets the idea across. Just because you can find insights in data doesn’t mean they are useful. That is why I need to emphasise the importance of a business case. Having a business case guides you to identify and generate the correct insights, with the end goal in mind as well as all the processes in between.

For example, assume you have a dataset of customer transactions. Without a business case you would look at the data, generate some features, play with charts and find correlations between features without knowing what you were looking for. On the other hand, when you do have a business case such as “I want to prevent credit card fraud” or “I want to predict the next customer purchase”, you start thinking differently: “I wonder how many times the customer went to that specific shop?”, “What are the characteristics of credit card fraud?”, “What previous sequences of items did the customer buy?”. Answering these questions generates specific features that might help solve the problem. That is why the definition of a business case is super important.
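To make the contrast concrete, here is a minimal Python sketch of turning two of those business questions into features. The transaction records, the (customer, shop, item) layout and the function names `shop_visit_counts` and `purchase_sequences` are hypothetical, and the sketch assumes each transaction row counts as one shop visit.

```python
from collections import Counter, defaultdict

# Hypothetical transaction records in time order: (customer_id, shop, item).
transactions = [
    ("c1", "shop_a", "coffee"),
    ("c1", "shop_b", "bread"),
    ("c1", "shop_a", "coffee"),
    ("c2", "shop_a", "tea"),
    ("c1", "shop_a", "muffin"),
]


def shop_visit_counts(rows):
    """Business question: how many times did each customer visit each shop?

    Assumes one transaction row equals one visit.
    """
    counts = Counter()
    for customer, shop, _item in rows:
        counts[(customer, shop)] += 1
    return counts


def purchase_sequences(rows):
    """Business question: in what order did each customer buy items?"""
    sequences = defaultdict(list)
    for customer, _shop, item in rows:
        sequences[customer].append(item)
    return dict(sequences)


visits = shop_visit_counts(transactions)
print(visits[("c1", "shop_a")])                # 3
print(purchase_sequences(transactions)["c1"])  # ['coffee', 'bread', 'coffee', 'muffin']
```

Each function exists only because a business question asked for it; without the question, there is no reason to compute either feature.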

This may sound rather silly and redundant and you may think: “Of course you need a business case! Everybody knows that! Duh!?”. You would be surprised how many people want big data / data science without a business case and think that machine learning is the silver bullet to improve their business (yep, the hype is real). Sometimes you will need to talk to stakeholders multiple times to clarify the business case, because honestly, sometimes they don’t even know what they want.

Value Does Not Mean Insights

Here you will need to distinguish between big data and data science. So far I have been using the terms interchangeably (my bad). Simply put, big data can be regarded as the technology stack and its implementation; data science is the analysis of data and the extraction of insights using statistical methods and machine learning. When used well together, they can make black magic happen (because you know, it’s a black box… haha…).

Big data on its own can already cut licensing costs, streamline processes and deduplicate data, and it enables streaming, distributed processing, monitoring and much more. None of these use cases require data science, and they could potentially transform your business already. Who needs data science anyway!

So What Exactly is Value?

Think of it like this: you go to McDonald’s and order and pay for 10 pieces of Chicken McNuggets. You open the box and discover 11 pieces inside; you get more than you paid for, and that gives you a warm happy feeling on the inside. More concretely, value doesn’t always mean money; it could be time saved, convenience, quality-of-life improvements etc., and can be summarised as:

A product or service that the users will love and is feasible for the engineers to produce.

Remember, the business case comes first; then determine the minimum viable product (MVP) to be built. This concept is very important: it allows you to determine the resources you need, get the product to the customer fast, get quick feedback, and fail fast. My boss once said: “Perfection in software engineering is just too expensive”, hence define the end goal of your MVP first. My recommendation is to go for many small wins instead of one big win; eventually the small wins will build up enough foundation / pipeline to achieve big wins continuously.

Data science can produce insights and models that could potentially change the business. The problem is that data science projects generally take a very long time to complete, and most of the time the result just sits on the laptop of your resident data scientist. If you cannot productionise your data product, it will not help the business; it will only be a cost. Sometimes it is also not feasible for the engineers to productionise the data product. For instance, even though Apache Spark is growing rapidly, it is still missing quite a bit of machine learning functionality and cannot produce complex models, not to mention that crunching big data with parallel processing (another topic for discussion later) is far more complicated and tricky than working with the sample datasets that data scientists generally use.

I am oversimplifying this, because in reality value generation is quite difficult to accomplish. You will need strong management with a clearly defined strategy and end goal, and a strong team to deliver the result. There are forces beyond technology that will prevent value generation. Also, the stages of building a data product are quite complex and deserve a topic of their own, which I will not discuss here.

Breaking the Business Model

If we are talking about value generation, we also need to talk about the obstacles that prevent it. Every new technology comes with a healthy or not-so-healthy dose of scepticism and cynicism. The hardest aspect of implementing big data is the mindset: the bigger the organisation, the harder it is to change. Here are a few examples of obstacles; they might be obvious but are still worth mentioning:

1. Misunderstanding of the Technology (can or cannot do)

How many times have you read that machine learning is going to cure cancer and create world peace, or that the robots are coming to take your job and kill all humans, or that big data is going to solve all your business problems? Those same articles don’t tell you about the limitations of machine learning and big data, do they? So it is important that you understand the business strategy and how to choose and implement the correct big data technology to drive that strategy. Nothing breaks my heart more than seeing a technology used improperly and then hearing that big data does not work, like people trying to use HDFS / HBase as a traditional relational database, then saying it doesn’t work for them and blaming big data for providing no value and being a waste of money.

2. Bad Communication between Team Members

If you ever have spare time, go watch your data scientists and engineers argue; it’s quite entertaining (they are such different people). Both have their point of view, but sooner or later they will reach an impasse. This is because the data pipeline is long and complicated and can be regarded as a big circle: the end product needs to provide a feedback loop back to the source, and the source needs to provide data to the end product. At any point in the pipeline, when one party of the project is not communicating with the others, the project will come to a halt. Some examples: when data scientists cannot get the data they need from the engineers to work their black magic; when the engineers cannot get any explanation of what the model does from the data scientists; or when one data scientist makes one business assumption about the data and another data scientist makes a different one without consulting the domain expert, which can spell big trouble when the system goes live (e.g. think about interest rate adjustments for a bank, or credit checks).

3. Fear

With all the news about how big data and artificial intelligence will replace jobs, who would want to lose their job? Fear brings uncertainty and resistance to implementing new strategies and technologies. Imagine you were a data analyst whose job is to deliver reports and present findings. Now that big data can do it in real time, why would they even need you? It’s simply cheaper and faster when a machine does it.

4. Politics

This is one of my biggest headaches in a large corporation, and a game that I refuse to play. Needless to say, everyone in the company has their own agenda, and if your department is the spanking new big data hot-shot in the company, you will also be the spanking new hot target to shoot down, with everyone waiting for you to fail. Imagine your company fails to improve its sales and the big data team gets blamed for not having a recommender system that personalises to customers’ needs. I am sure many more political scenarios go on in a corporation than the ones I am aware of. In the end, politics is not a good thing. Healthy competition between teams is okay, but politics is a no-no because it drives down morale and creates hostility among colleagues, which means no teamwork.


If your organisation has all the above-mentioned issues, then you don’t need big data, because you already have big problems, which are far costlier than big data. So what do I mean by breaking the business model? Big data implementation requires new ways of thinking:

  • Instead of batching, think streaming.
  • Instead of overnight processing, think real-time processing.
  • Instead of rule based decisions, think predictive / generative models.
  • Instead of customisation, think personalisation.
  • Instead of firing and retrenching people, think about re-assigning and freeing up resources to do more valuable projects that machines cannot do.
  • Instead of thinking “use all the data we’ve got”, think business cases and hand-pick the relevant data to use.
  • etc.
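As a toy illustration of the first two points, the Python sketch below contrasts a nightly batch job, which waits for the whole day's sales before computing a total, with a streaming approach that updates the running total as each sale arrives. The function names and sale amounts are hypothetical; a real system would typically sit on a dedicated streaming platform rather than a Python generator, but the shift in thinking is the same.

```python
from collections.abc import Iterable, Iterator


def nightly_batch_total(events: list[float]) -> float:
    """Batch mindset: collect the whole day's events, then compute once."""
    return sum(events)


def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    """Streaming mindset: update the running total as each event arrives."""
    total = 0.0
    for amount in events:
        total += amount
        yield total  # an up-to-date answer is available after every event


# Hypothetical sale amounts for one day.
sales = [12.5, 3.0, 40.0, 7.5]
print(nightly_batch_total(sales))     # 63.0
print(list(streaming_totals(sales)))  # [12.5, 15.5, 55.5, 63.0]
```

Both approaches end at the same number, but only the streaming version can answer "what are today's sales so far?" at any moment during the day.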

These new ways of thinking will surely change the business model over time, improve processes and hopefully lead to profitability. One other main concern I need to talk about is ethics and privacy, which I cannot emphasise enough. Just because you CAN collect data from your customers doesn’t mean you SHOULD. Just imagine that data getting out (internally or externally): what is the chance of the company getting sued or suffering reputational damage? Sometimes you even need to exclude a highly predictive feature from your model because it is private and sensitive. For example: you build a simple predictive model to decide whether to grant or reject a home loan, but the key deciding feature turns out to be the applicant’s gender or race. How would you explain to customers why their application got rejected? Would you be able to defend yourself in a court of law? Also, if you ever have doubts about whether you should be collecting or using specific data, chances are that it is not ethical and you shouldn’t collect or use it.
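Here is a minimal sketch of excluding sensitive attributes before a loan model ever sees them, assuming applicant features arrive as a Python dictionary. The field names, the sample applicant and the `SENSITIVE_FEATURES` set are all hypothetical.

```python
# Hypothetical set of attributes that must never drive a loan decision.
SENSITIVE_FEATURES = {"gender", "race"}


def strip_sensitive(applicant: dict[str, object]) -> dict[str, object]:
    """Return a copy of the applicant's features without sensitive attributes.

    The original dictionary is left untouched.
    """
    return {k: v for k, v in applicant.items() if k not in SENSITIVE_FEATURES}


# Hypothetical applicant record.
applicant = {"income": 85000, "loan_amount": 400000, "gender": "F", "race": "unspecified"}
print(strip_sensitive(applicant))  # {'income': 85000, 'loan_amount': 400000}
```

Dropping the columns is only a first step: other features that correlate with the removed ones (postcode, for example) can still act as proxies, so the remaining features need scrutiny too.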

Building a High Performance Team

Now that you have everything in place (a business case, strong management support and a strategy), you are ready to build a team to deliver results. Different stages of the data pipeline require different team compositions; generally, planning a big data team is like planning a heist or baking a cake. As a general rule of thumb, keep it small, at about 5 ~ 10 people:

  • 1 Data Architect
  • 1 ~ 2 Data Admins
  • 1 ~ 2 Data Scientists
  • 2 ~ 4 Data Engineers
  • 1 Domain Expert / Product Manager
  • 2 Eggs
  • 1 Cup of Flour
  • Bake for 15 min at 180 °C
  • A sense of humour

Yes, yes, skills shortage blah blah blah. Allow me to be brutally honest: I think the skills shortage is a myth, partly because some hiring criteria / requirements are rather restrictive, much like trying to fit everyone into a shoe box made for small feet. Big Data is so new and the tools out there are still developing, and if you want to create value you will need to be creative to get around the limitations of the technology, or build your own. So why put people in a shoe box that demands very specific knowledge of a specific technology or machine learning technique? It’s just not practical, because the landscape is changing so fast. This is especially true when you don’t have a business case but are looking for people with HDFS / Kafka / Spark / HBase etc.; these technologies might not even be suitable for your business case, even though they are popular. What you need is someone who is passionate, flexible, creative and willing to adapt and learn.

It’s true that you need specific skill sets for specific roles, but what really works well is a team with overlapping skills. For instance, an engineer with some data science knowledge, an architect with machine learning, an admin with statistics, a product manager with a bit of everything etc. Passionate people are driven: you don’t need to force them to learn or research big data technology, they will come to you with many solutions. The key factors are a focus on the business case and constant communication; these drive ideas and implementation, especially when the overlapping skill sets kick in, because people from different skill backgrounds think in different, fantastic ways.

Sum It Up!

In case you haven’t got it yet: you will need a business case and a minimum viable product with clearly defined end goals. With strong management support and a passionate team, you can create value through constant communication.

Next, I will go through the importance of programming and statistics in the data pipeline, continuing the theme of value generation: how these two aspects play a role in the stages of data development, as well as the ratio between the two. I hope you enjoyed this; any questions and comments are welcome!

Until next time, have a nice day!

My Big Data Experience – Part II: Spare Time and Getting the Job

So you have decided on the role you want to play in the big data space, and you know this is the path for you. Great! If not, I guess you will find out soon enough.

Here are the other posts in case you missed them:

  1. Part I: Where should I start?
  2. Part II: Spare Time and Getting the Job.
  3. Part III: Building the Right Team and Breaking the Business Model.
  4. Part IV: Programming and Statistics.
  5. Part V: Data Munging and Feature Engineering. (Coming soon)
  6. Part VI: Yippee! Machine Learning. (Coming soon)
  7. Part VII: How to Interview a Data Scientist. (Coming soon)

Spot the fluff

So here you go, you need to know all this to create your big data products. Enjoy!

This is where you get lost as a newbie (source here)

Don’t worry, you don’t need to know them all, just the technologies you need to get the job done. So which technologies should you use? Imagine you go to a hardware store, walk down the aisle and see hammers, nails, screws, spanners, a whole bunch of other tools and a feather duster like the one your mom used to hit you with. Which one do you buy? All of them? Definitely not, you are on a budget. So what do you do? This is where big data vendors come to the rescue.

Some examples of big data vendors / platforms (source here)

These vendors prepackage a technology stack for businesses and provide support, cutting out all the unnecessary fluff to get the job done. BUT, as we know, technology changes rapidly, and you should expect tech consolidation soon – meaning some technologies will be eliminated. Not to mention that big data is still very much hyped. So what now?

The question is not really about the technology, it’s about the skill sets. Skills can be transferred and applied to different technologies and companies. Technology knowledge may not transfer, since two companies in the same industry may use different technology stacks.

Spare time

The skills required for data science generally include the following:

  • Soft skills (Verbal and Written Communication Skills)
  • Business Acumen / Domain Knowledge
  • Data wrangling
  • Programming skills
  • Distributed file stores
  • Parallel processing
  • Machine Learning / Statistics / Linear Algebra
  • Real-time data streaming
  • Real-time predictive analysis
  • Data visualization

Back in my previous post I mentioned the importance of knowing which role you want to play; this is where that choice matters. For example, if you are a data engineer, your main focus would be parallel processing (functional programming) and setting up data infrastructure; if you are a data scientist, then machine learning, statistics and sampling strategy are important.

One thing to keep in mind is that technologies are merely tools and will change over time, but skill sets will remain useful and can be applied to new technologies. Always pick a technology with a strong community that is tried and tested, and invest your time learning these tools as needed. Simply put, be a skilled artisan who can use different tools (new or old): learn the skill of hammering instead of learning about the hammer.

Talk technologies, think skill sets, then focus!

You will need to focus. Knowing a little of everything sounds good on paper, but when it comes to doing the job, you will struggle. Build up your other areas of expertise over time, one bite at a time; that’s how you will make progress. By the time you are proficient in all the technologies, big data will be out of fashion. With that out of the way, how do you get into data science?

Getting the Job

This is probably the hard part, but not impossible. Before you go out and look for a data analytics job, here is something to keep close to your heart, and if you don’t have one (since we are all robots), keep it on a Post-it so you can look at it every day.

It’s easy to get into data science; it is hard to stay a data scientist.

Now, if you are in a position where your company is looking for data scientists, you are one of the lucky few; you just need to apply.

Here are some tips for the rest:

  1. Pet Projects – do a project that you are interested in, like predicting property or stock prices to name a few.
  2. Kaggle competitions – plenty of practice and potential prizes to be won.
  3. Show initiative – create the opportunity in the workplace by implementing machine learning, drive the change from the bottom up.
  4. Local meetups – Network, network, network, go meet other data scientists.
  5. Learn to learn fast.

Words of caution:

The only thing I find big about big data is the hype. It’s really big, which is a problem for job applications: companies may or may not know what they are looking for, and there will be a lot of competition as well. On multiple occasions, companies asked for x, interviewed me for y, tested me on z, and in the end didn’t know what they wanted, because they didn’t have a business case; they wanted big data just because they didn’t want to miss the boat. My honest suggestion is to focus on building your skill sets, wait for the hype to die down, then look for the surviving companies to invest your career in.

Sum it up!

Data science is not a get-rich-quick scheme; it’s about a passion for problem solving and analysis. So don’t be in a rush to get into data science; take your time, enjoy the learning, then apply it to your work. I hope you enjoyed reading this post. In the next post, I will be talking about how to add value to the business.

My Big Data Experience – Part I: Where should I start?

A few years ago, while eating with my usual lunch group, a colleague mentioned “beeeeeg data”. That was the first time I heard the term. It sounded really silly and I joked, “What is this new hype now? It sounds like a sales pitch to sell me a 3D TV.” Like any other hype train, it was one I had no intention of jumping on.

And yet here I am, on that very train, sharing my experiences. I want to share my thoughts on working in the data science domain; perhaps my experiences can provide some value to the data science community. I will update these posts as time goes on; here are the topics that I want to share over time:

  1. Part I: Where should I start?
  2. Part II: Spare Time and Getting the Job.
  3. Part III: Building the Right Team and Breaking the Business Model.
  4. Part IV: Programming and Statistics.
  5. Part V: Data Munging and Feature Engineering. (Coming soon)
  6. Part VI: Yippee! Machine Learning. (Coming soon)
  7. Part VII: How to Interview a Data Scientist. (Coming soon)

What you have probably heard about big data / data science

  1. Money!
  2. Hottest job at the moment
  3. Machine Learning / Artificial Intelligence
  4. Maths, stats and programming required
  5. Data munging
  6. But mostly this diagram


These ideas are thrown around fairly often, but there are multiple roles within the data science domain, for example: Data Analysts, Data Scientists, Business Analysts, Data Engineers, Data Architects, Statisticians, Database Administrators, and Data and Analytics Managers. You can read an article that explains the different roles within the field of data science here (source: kdnuggets).

I believe this is one main reason why companies are struggling to find data scientists. There is a perception that data science can be done by one person instead of a team, similar to hiring a full-stack developer who must know how to code the front-end and back-end, design the user interface, and set up the production environment. Companies should instead look at the different roles that make up a data science team. It is fairly rare to find one person with all the required skill sets, hence the term “Unicorn” is used to describe data scientists.

What people don’t tell you

  1. Data science is a team effort, even if you are a “Unicorn” that can do everything.
  2. Communicating and sharing ideas with team members is super important; it is necessary to cross-validate each other’s ideas.
  3. You will need to learn to speak to non-technical people using easy to understand business examples.
  4. You will need to ask lots of business questions across multiple departments to get a better idea of the business processes to do your analysis.
  5. Data product will take a long time to develop, especially when you do not have enough data and data collection is very time consuming.
  6. You will have great difficulty obtaining data, because of politics, red tape and lack of infrastructure.
  7. You will have a difficult time working with data, because there is no proper way to link up multiple sources, and data integrity is a real issue (raw vs. processed data).
  8. Data munging and exploratory data analysis will take up 80% of your time.
  9. Machine learning is fun but it is only 20% of the work.
  10. You will have to constantly remind people that data munging and feature engineering are more important than machine learning, and that more time should therefore be spent there.
  11. You will be under enormous pressure from management to provide value in the form of a data product, especially when they do not understand the data science methodology; this can be eased by delivering small findings continuously.
  12. You will be in meetings, a lot.
  13. There will be many projects that you need to deliver and you will need to learn to prioritise your tasks and juggle between projects.
  14. Every business problem is unique, there is no one method to solve all of them. You will need to spend your spare time learning and researching the problem and upskill yourself.
  15. Most companies don’t have a BIG data problem.
  16. You need to be passionate about the data; this is not negotiable.


My final suggestion to anyone who wants to start a career in data science is to answer these questions honestly and then go forward:

  1. What role do you want to play?
  2. Do you have a passion for data science?
  3. Are you willing to spend time after work to improve yourself?

I believe if you have the right attitude and aptitude, you will be successful in any path in life. Good luck!

Do it right the first time and then what?

What would you do if you were given a house with a collapsed ceiling and some tools? Would you start hammering away at every broken ceiling tile, or look for the reason why the ceiling collapsed? Maybe another pillar was needed to support the structure, or perhaps the ceiling was simply old? Either way, strap on your tool belt, put on a helmet and say: “I am going in!”

Do it right the first time!

Software development is much like building a house: you start with the design and foundations and then build upwards, and “Do it right the first time!” is actually the easy part. The number of design patterns, cookie-cutter architectures and boilerplate code available online makes a new project relatively easy to set up, especially with Maven Archetypes (Boom! Project setup done). The system may have a few custom components here and there, but the process is relatively simple and there won’t be many obstacles to implementing a new system. All you need is a good architecture, and the rest will fit in like Lego pieces.

A good initial design helps speed up development and maintenance, but this is not a perfect world: functionality keeps growing, deadlines loom, the business pressures you to deliver, code gets copied and pasted, and maintenance, refactoring and review fall by the wayside. Year after year, the system accumulates technical debt, and hopefully your company can pay the interest on time. How does the existing framework fare after one year, five years or ten years?

So, back to the initial example of fixing a house: you have a broken ceiling, so what would you do? I believe most people would say: “Let’s find the problem and fix it so it doesn’t happen again”. Unfortunately, because of technical debt, the problem is sometimes buried so deep within the system that the root cause cannot be identified, because it is caused by a combination of existing problems. So the developer just patches (*cough* hacks *cough*) it and prays to Linus Torvalds that it does not break while they are on production support duty.

Now, let’s talk business.

How much do you think this kind of problem is going to cost you? Technical problems such as hard coding, lazy copy and paste, hard-to-understand code and duplicated work make new projects take longer to implement and more prone to bugs, not to mention the resources you will need to maintain and debug the system. How much time are your developers spending on fixing (patching) the system instead of developing it? Are you hiring developers for the right reasons?

I had a project where, because of all the hard coding and bad architecture, every new functionality required us to modify 28 files and multiple database tables just to make it appear in the correct place, before we had even implemented the functionality itself. This process usually took one to two weeks to complete, and then another few weeks were needed just to code the logic for the new functionality. Now imagine doing this every single release cycle. Eventually I asked my manager for some time (approx. 3 months) to refactor the system and another month for regression testing. Now it only takes three SQL statements (approx. 5 minutes) to complete the same task. Spending some effort on refactoring saves money and time in the long run and certainly makes the developer’s job easier.
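The post doesn’t say what the refactored system looked like, so what follows is only a minimal, hypothetical Python sketch of one common way to get to “three SQL statements”: move the wiring that decides where a functionality appears (and for whom) out of the code and into registry tables. The table names (`feature`, `menu_placement`, `feature_role`), the `HANDLERS` dictionary and the `visible_features` function are all made up for illustration, and an in-memory SQLite database stands in for whatever database the real system used.

```python
import sqlite3

# Hypothetical registry schema: placement and access are stored as data
# instead of being hard coded across many source files.
SCHEMA = """
CREATE TABLE feature (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    handler TEXT NOT NULL            -- key into HANDLERS below
);
CREATE TABLE menu_placement (
    feature_id INTEGER NOT NULL REFERENCES feature(id),
    menu       TEXT NOT NULL,
    position   INTEGER NOT NULL
);
CREATE TABLE feature_role (
    feature_id INTEGER NOT NULL REFERENCES feature(id),
    role       TEXT NOT NULL
);
"""

# The actual logic of each functionality still lives in code (hypothetical
# examples); the registry only decides where and for whom it shows up.
HANDLERS = {
    "export_report": lambda: "report exported",
    "archive_orders": lambda: "orders archived",
}


def visible_features(conn, menu, role):
    """Return (name, handler) pairs for one menu and role, in display order."""
    rows = conn.execute(
        """
        SELECT f.name, f.handler
        FROM feature f
        JOIN menu_placement m ON m.feature_id = f.id
        JOIN feature_role r ON r.feature_id = f.id
        WHERE m.menu = ? AND r.role = ?
        ORDER BY m.position
        """,
        (menu, role),
    ).fetchall()
    return [(name, HANDLERS[handler]) for name, handler in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Registering a new functionality: three INSERTs, no UI code changes.
conn.execute("INSERT INTO feature (id, name, handler) "
             "VALUES (1, 'Export report', 'export_report')")
conn.execute("INSERT INTO menu_placement (feature_id, menu, position) "
             "VALUES (1, 'reports', 10)")
conn.execute("INSERT INTO feature_role (feature_id, role) "
             "VALUES (1, 'analyst')")

for name, handler in visible_features(conn, "reports", "analyst"):
    print(name, "->", handler())
```

The trade-off in this kind of design is that the wiring now lives in data, so it needs its own review and migration discipline, and the logic behind each functionality (the `HANDLERS` entries here) still has to be written and tested as ordinary code.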

In conclusion, good initial design is really important; however, it is not enough. Doing it right is not just about the initial setup but also about continuous refactoring and streamlining the system through code review and system (redundancy) analysis. It’s like cleaning a house: if you clean it every day, you will have less to do and everyone will benefit.