My Big Data Experience – Part IV: Programming and Statistics

In my last post I spoke about value, the MVP (minimum viable product), obstacles, and building a high-performance team. In this post I would like to talk a little more about the value proposition, skill sets, and how they contribute to the MVP.

Here are the other posts in case you have missed them:

  1. Part I: Where should I start?
  2. Part II: Spare Time and Getting the Job.
  3. Part III: Building the Right Team and Breaking the Business Model.
  4. Part IV: Programming and Statistics.
  5. Part V: Data Munging and Feature Engineering. (Coming soon)
  6. Part VI: Yippee! Machine Learning. (Coming soon)
  7. Part VII: How to Interview a Data Scientist. (Coming soon)

Secret to Success

It’s passion! No, I am not talking about that tingling feeling you get in your pants when you see an attractive guy / girl / sheep. I am talking about the one thing that wakes you up in the morning. That’s right boys and girls, passion is: Coffee

NO! The other thing, the drive, the motivation that keeps you going! For some it’s their children or running a pet shop; for others it’s about saving humanity. Why is this important? When passion is combined with skills, you can achieve great things.

If passion is like an engine, skills are the oil that runs it.

Especially in big data and data science, there are countless challenges and hardships waiting for you. You are going to hit brick wall after brick wall (that’s why my face is so messed up… *cry in the corner*): when technologies don’t work together, when code runs locally but fails on the cluster, when new library versions are not backward compatible with old code, when your model cannot be productionised; the list goes on.

If you are passionate about the work you are doing, these challenges are still hard, but you will have the heart to take them on, one after another. I believe this applies to any business or field. Without passion, you won’t last long in this field. I know people who worked in big data for the money; they hated the job a few months in, and very quickly moved to a different role. Big data is constantly evolving and people are still trying to find and establish best practices. Even when you are passionate, it is hard enough to keep up with the technologies and methodologies, not to mention when you are not. It is moving so fast that no one person can keep up to date and still deliver value.

Unqualified Need Not Apply

So how much skill do you need to accomplish the tasks at hand?

Does this look familiar?

“We are looking for a candidate with at least 8+ years of commercial experience in Java, Python, Scala and R. Must have a master’s or doctorate in mathematics, statistics, or computer science. Must have worked with Hadoop, Spark, Hive, HBase, Kafka, Storm, Elasticsearch and Kibana. Must have experience with cloud services, e.g. AWS. You will also meet stakeholders to gather and analyse business requirements and build predictive models using machine learning; required to extract insights from data and present findings and visualisations to the C-suite. Beneficial if you have solved world peace, saved Lois Lane, and been elected president of a country.”

The scary thing is that I have seen jobs with descriptions similar to this…

Choose a Car!

[Image: collage of cars (Image Source)]

Did anyone choose the Mercedes SMART (bottom right)?

Everybody wants to build / own a Ferrari, Lamborghini or some other luxury car, but do you need it? Similarly, companies want to be the next Google, building the next DeepMind that kicks ass and chews bubble gum, but will it solve their current business problem? With this kind of mentality (overly-hyped-about-how-big-data-and-machine-learning-will-make-your-mother-in-law-happy), companies want to hire the best with all the bells and whistles.

But all you really need is this:

[Image: a basic car (Image Source)]

Look at how awesome this car is: wheels, a passenger seat, an engine; it moves you from A to B. What more do you need? Are you still thinking about that schweet-schweet Lamborghini? Be honest now…

Similarly, you can keep dreaming about your next Big-Digital-Disruption (I still don’t understand what this means), but it would be more practical to stick to the basics. That is why it is so important to emphasise the business case and the MVP for it, so you know what “done” looks like. Why? Because it will start to provide value, which means your boss will be happy to keep you on the payroll; the Lamborghini is the cherry on top.

So… ask yourself:

  • What skill sets do you need to build a Lamborghini?
  • What skill sets do you need to build a basic car?

A Lamborghini engineer’s skill set versus a mechanic’s skill set: very different, right?

So what do you really need?

Just enough to get the job done!

Side Note: If you don’t even know how to build the basic car, don’t even think about building a luxury car. When you have the basics, re-evaluate the next MVP, then hire or train accordingly. Rinse and repeat to slowly build it up to a Lamborghini.

I don’t have a straight answer as to how much programming, statistics or mathematics you will need; it really depends on who is hiring and the project they want to build. Typically there is a spectrum of employers, ranging from research labs on one end to IT shops on the other. If your employer is a research lab, they will want a serious stats / maths background; an IT shop will want more programmers and engineers who know how to write clean code that scales well. That is why you will need to research which type of project you want to be part of, and also the type of person you are. Personally, I am a hands-on type of guy, so I prefer building products and seeing them in action for the major part of my time, with some R&D on the side. Hopefully you will be interviewed by someone who knows what they are looking for, because there are many stories where a candidate interviewed for a data scientist role but was expected to be a developer, and, guess what, they quit very quickly.

That being said, one thing is for sure: you will need to be able to learn and adapt quickly, whether it’s stats, maths, programming, communication, domain knowledge or crisis control (yep, if your system doesn’t fall over at least once, you are not playing with big data). You will be faced with challenges every single day; if you are not, then you are not innovating, and are simply doing what has been done before, and that’s not the spirit of data science.

That’s why I want to end off with a quote that sums up this post.

Science is not only a disciple of reason but, also, one of romance and passion.

— Stephen Hawking

On the Next Episode

Next, we will travel to the world of data science and see how important each of the components is; you might be surprised by the answer! I hope you have enjoyed this; any questions and comments are welcome.

Until next time, stay curious!

My Big Data Experience – Part III: Building the Right Team and Breaking the Business Model

In my last post I wrote about the importance of having a business case and investing in core data science skills instead of technologies. In this post I will continue with the same idea of a good business case, and then move on to building a high-performance team to achieve it and provide value.

**WARNING** It’s a long read, also very opinionated **WARNING**

Here are the other posts in case you have missed them:

  1. Part I: Where should I start?
  2. Part II: Spare Time and Getting the Job.
  3. Part III: Building the Right Team and Breaking the Business Model.
  4. Part IV: Programming and Statistics.
  5. Part V: Data Munging and Feature Engineering. (Coming soon)
  6. Part VI: Yippee! Machine Learning. (Coming soon)
  7. Part VII: How to Interview a Data Scientist. (Coming soon)

Short Story

Months have gone by since you joined, and the CEO has asked you to extract insights from the company’s data. While you were gathering your thoughts in your familiar nine-to-five cubicle, a cold chill suddenly ran down your spine as you noticed a reflection on the screen and heard a soft voice: “What can you show me?”

You scurried to close the browser containing your favourite morning news sites and struggled to find the file to present your analysis. “Yes, got it!” you exclaimed as you double-clicked the file. You smiled awkwardly at your boss as the progress bar took another step.

The CEO stood silently.

You went through the sales pitch, displayed trends, graph after graph, and explained each one of them. You paused, looking for some feedback. The CEO said: “It is interesting that you can identify all the right-handed customers based on data analysis and machine learning, but how is that information going to help our banking business? Are you also going to spend another few months finding the left-handed customers too?”

Insights Do Not Mean Value

The story might be silly and simple, but I hope it delivered the idea. Just because you can find insights in data doesn’t mean they are useful. That is why I need to emphasise the importance of a business case. Having a business case will guide you to identify and generate the correct insights, keeping the end goal in mind as well as all the processes in between.

For example: assume you have a dataset of customer transactions. Without a business case you would look at the data, generate some features, play with charts and find correlations between features, without knowing what you want. On the other hand, when you do have a business case such as “I want to prevent credit card fraud” or “I want to predict the next customer purchase”, you start thinking differently: “Oh, I wonder how many times the customer went to that specific shop?”, “What are the characteristics of credit card fraud?”, “What sequences of items did the customer buy previously?”. Answering these questions generates specific features that might help solve the problem, as in the sketch below. That is why the definition of a business case is super important.
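
Here is a minimal sketch of that idea in Java (the Transaction class, the field names and the sample values are all hypothetical, purely for illustration): the business-case question “how many times did the customer go to that specific shop?” becomes a concrete feature computation.

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TransactionFeatures {

	// Hypothetical transaction record: who bought, where, and for how much.
	static class Transaction {
		final String customerId;
		final String merchant;
		final double amount;

		Transaction(String customerId, String merchant, double amount) {
			this.customerId = customerId;
			this.merchant = merchant;
			this.amount = amount;
		}
	}

	// Feature driven by the business case: how many times did this
	// customer visit each merchant?
	static Map<String, Long> visitsPerMerchant(List<Transaction> transactions, String customerId) {
		return transactions.stream()
				.filter(t -> t.customerId.equals(customerId))
				.collect(Collectors.groupingBy(t -> t.merchant, Collectors.counting()));
	}

	public static void main(String[] args) {
		List<Transaction> transactions = Arrays.asList(
				new Transaction("c1", "CoffeeShop", 4.50),
				new Transaction("c1", "CoffeeShop", 5.00),
				new Transaction("c1", "BookStore", 20.00));
		// Prints e.g. {BookStore=1, CoffeeShop=2} (map order is not guaranteed)
		System.out.println(visitsPerMerchant(transactions, "c1"));
	}
}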

This may sound rather silly and redundant, and you may think: “Of course you need a business case! Everybody knows that! Duh!?”. You would be surprised how many people want big data / data science without a business case and think that machine learning is a silver bullet that will improve their business (yep, the hype is real). Sometimes you will need to communicate with stakeholders multiple times to clarify the business case, because honestly, sometimes they don’t even know what they want.

Value Does Not Mean Insights

Here you will need to distinguish between big data and data science. So far I have been talking about them synonymously (my bad). Simply put, big data can be regarded as the technology stack and its implementation; data science is the analysis of data and the extraction of insights using statistical methods and machine learning. When used well together, they can make black magic happen (because you know, it’s a black box… haha…).

Just by using big data technology you could already cut down licensing costs, streamline processes, deduplicate data, and enable streaming, distributed processing, monitoring and much more. These are examples of use cases that don’t require data science, and they could potentially transform your business already. Who needs data science anyway!

So What Exactly is Value?

Think of it like this: you go to McDonald’s and order and pay for 10 pieces of Chicken McNuggets. You open the box and discover 11 pieces inside; you got more than you paid for, and you get that warm happy feeling on the inside. More concretely, value doesn’t always mean money; it could be time saved, convenience, quality-of-life changes, etc., and can be seen as such:

A product or service that the users will love and is feasible for the engineers to produce.

Remember, the business case comes first; then determine the minimum viable product (MVP) to be built. This concept is very important: it allows you to determine the resources you need, get the product to the customer fast, gather quick feedback, and fail fast. My boss once said: “Perfection in software engineering is just too expensive”, hence define the end goal of your MVP first. My recommendation is to go for many small wins instead of the one big win; eventually you will build up enough foundation / pipeline with the small wins to achieve the big win continuously.

Data science can produce insights and models that could potentially change the business. The problem is that data science projects generally take a very long time to complete, not to mention that most of the time the result just sits on the laptop of your resident data scientist. If you cannot productionise your data product, it will not help the business; it will only be a cost to it. Sometimes it is also not feasible for the engineers to productionise the data product. For instance, even though Apache Spark is growing rapidly, it is still missing quite a bit of machine learning functionality and cannot produce complex models, not to mention that crunching big data with parallel processing (another topic for later) is rather complicated and tricky compared to the sample datasets that data scientists generally use.

I am oversimplifying this, because in reality value generation is actually quite difficult to accomplish. You will need strong management with a clearly defined strategy and end goal, and a strong team to deliver the result. There are forces beyond technology that will prevent value generation. Also, the stages of building a data product are quite complex and deserve a topic of their own, which I will not discuss here.

Breaking the Business Model

If we are talking about value generation, we also need to talk about the obstacles that prevent it. With every new technology comes a healthy, or not so healthy, dose of scepticism and cynicism. The hardest aspect of implementing big data is the mind-set; the bigger the organisation, the harder it is. Here are a few examples of obstacles; they might be obvious, but they are still worth mentioning:

1. Misunderstanding of the Technology (can or cannot do)

How many times have you read that machine learning is going to cure cancer and create world peace, or that the robots are coming to take your job and kill all humans, or that big data is going to solve all your business problems? These same articles don’t tell you about the limitations of machine learning and big data, do they? So it is important that you understand the business strategy and how to choose and implement the correct big data technology to drive that strategy. Nothing breaks my heart more than seeing a technology used improperly and then hearing that big data does not work, like people trying to use HDFS / HBase as a traditional relational database, then saying it doesn’t work for them and blaming big data for providing no value and wasting money.

2. Bad Communication between Team Members

If you ever have spare time, go watch your data scientists and engineers argue; it’s quite entertaining (they are such different people). Both have their points of view, but sooner or later they will be at an impasse. This is because the data pipeline is long and complicated and can be regarded as a big circle: the end product needs to provide a feedback loop back to the source, and the source needs to provide data to the end product. At any given point in the pipeline, when one party is not communicating with the others, the project will come to a halt. Some examples: when the data scientists cannot get the data from the engineers to work their black magic; or when the engineers cannot get an explanation from the data scientist of what the model does; or when one data scientist makes one business assumption about the data and another makes a different one without consulting the domain expert, which can spell big trouble when the system goes live, e.g. think of interest rate adjustments for a bank, or credit checks.

3. Fear

With all the news about how big data and artificial intelligence will replace jobs, why would any human want to lose theirs? Fear brings about uncertainty and resistance to implementing new strategies and technologies. Imagine you are a data analyst and your job is to deliver reports and present findings. Now that big data can do it in real time, why would they even need you? It’s simply cheaper and faster when a machine does it.

4. Politics

This is one of my biggest headaches in a large corporation, and a game that I refuse to play. Needless to say, everyone working in the company has their own agenda, and if your department is the spanky new big data hot-shot in the company, you will also be the spanky new hot target to shoot down, with everyone waiting for you to fail. Imagine your company fails to improve its sales and the big data team gets blamed for not having a recommender system that personalises to customers’ needs. There are many more political scenarios that go on in a corporation that I am not even aware of. In the end, politics is not a good thing. Good healthy competition between teams is okay, but politics is a no-no because it drives down morale and creates hostility among colleagues, which means no teamwork.

Alternatives

If your organisation has all the above-mentioned issues, then you don’t need big data, because you already have big problems which are by far more costly than big data. Now, what do I mean by breaking the business model? Big data implementation requires new ways of thinking:

  • Instead of batching, think streaming.
  • Instead of overnight processing, think real-time processing.
  • Instead of rule based decisions, think predictive / generative models.
  • Instead of customisation, think personalisation.
  • Instead of firing and retrenching people, think about re-assigning and freeing up resources to do more valuable projects that machines cannot do.
  • Instead of thinking “use all the data we’ve got”, think business cases and hand-pick the relevant data to use.
  • etc.

These new ways of thinking will surely change the business model over time, improve the processes and hopefully lead to profitability. One other major concern that I need to talk about is ethics and privacy, which I cannot emphasise enough. Just because you CAN collect data from your customers doesn’t mean you SHOULD. Just imagine that data getting out (internally or externally): what is the chance of the company getting sued or suffering reputational damage? Sometimes you even need to exclude a highly deterministic feature from your model because it is private and sensitive. For example: a simple predictive model determines whether to grant or reject a home loan, but the key deterministic feature is the applicant’s gender or race. How would you explain to customers that their application got rejected? Would you be able to defend yourself in a court of law? Also, if you ever have doubts about whether you should be collecting or using specific data, chances are it is not ethical and you shouldn’t.

Building a High Performance Team

Now that you have everything in place (a business case, strong management support and a strategy), you are ready to build a team to deliver results. Different stages of the data pipeline require different team compositions; generally, planning a big data team is like planning a heist or baking a cake. As a general rule of thumb, keep it small, at about 5 ~ 10 people:

  • 1 Data Architect
  • 1 ~ 2 Data Admins
  • 1 ~ 2 Data Scientists
  • 2 ~ 4 Data Engineers
  • 1 Domain Expert / Product Manager
  • 2 Eggs
  • 1 Cup of Flour
  • Bake for 15 min at 180 °C
  • A sense of humour

Yes, yes, skill shortages, blah blah blah. Allow me to be brutally honest: I think the skills shortage is partly a myth, because some of the hiring criteria / requirements are rather restrictive, much like a shoe box that only fits people with small feet. Big data is so new and the tools out there are still developing; if you want to create value, you will need to be creative to get around all the limitations of the technology, or build your own. So why put people in a shoe box by demanding very specific knowledge of a specific technology or machine learning method? It’s just not practical, because the landscape is changing so fast. Especially when you don’t have a business case but you are looking for people with HDFS / Kafka / Spark / HBase etc.: these technologies might not even be suitable for your business case, however popular they are. What you need is someone who is passionate, flexible and creative, with a willingness to adapt and learn.

It’s true that you need specific skill sets for specific roles, but what really works well is a team with overlapping skills. For instance, an engineer with some data science knowledge, an architect with machine learning, an admin with statistics, a product manager with a bit of everything, etc. Passionate people are driven; you don’t need to force them to learn or research big data technology, they will come to you with many solutions. The key factor is a focus on the business case and constant communication; this drives ideas and implementation, especially when the overlapping skill sets kick in, because people with different backgrounds think in fantastically different ways.

Sum It Up!

In case you haven’t got it yet: you will need a business case, and you will need to define the minimum viable product with clearly defined end goals. With strong management support and a passionate team, you can create value through constant communication.

Next, I will go through the importance of programming and statistics in the data pipeline and, continuing the theme of value generation, how these two aspects play a role in the stages of data development, as well as the ratio between the two. I hope you enjoyed this; any questions and comments are welcome!

Until next time, have a nice day!

My Big Data Experience – Part II: Spare Time and Getting the Job

So you have decided on the role you want to play in the big data space, and you know this is the path for you. Great! If not, I guess you will find out soon enough.

Here are the other posts in case you have missed them:

  1. Part I: Where should I start?
  2. Part II: Spare Time and Getting the Job.
  3. Part III: Building the Right Team and Breaking the Business Model.
  4. Part IV: Programming and Statistics.
  5. Part V: Data Munging and Feature Engineering. (Coming soon)
  6. Part VI: Yippee! Machine Learning. (Coming soon)
  7. Part VII: How to Interview a Data Scientist. (Coming soon)

Spot the fluff

So here you go, you need to know all this to create your big data products. Enjoy!

[Image: the big data landscape – this is where you get lost as a newbie (source here)]

Don’t worry, you don’t need to know them all, just the technologies you need to get the job done. Which technologies should we use then? Imagine you go to a hardware store, walk down the aisle and see hammers, nails, screws, spanners, a whole bunch of other tools, and a feather duster that your mom used to hit you with. Which one do you buy? All of them? Definitely not; you are on a budget, so what do you do? This is where big data vendors come to the rescue.

[Image: some examples of big data vendors / platforms (source here)]

These vendors prepackage a technology stack for businesses and provide support, cutting out all the unnecessary fluff to get the job done. BUT, as we know, technology changes rapidly, so do expect tech consolidation soon – meaning some technologies will be eliminated. Not to mention big data is still very much hyped. So what now?

The question is not really about the technology; it’s about the skill sets. Skills can be transferred and applied to different technologies and companies. Technology may not, since two companies in the same industry may have different technology stacks.

Spare time

The skills required for data science generally fall into the following areas:

  • Soft skills (Verbal and Written Communication Skills)
  • Business Acumen / Domain Knowledge
  • Data wrangling
  • Programming skills
  • Distributed file stores
  • Parallel processing
  • Machine Learning / Statistics / Linear Algebra
  • Real-time data streaming
  • Real-time predictive analysis
  • Data visualization

Back in my previous post I mentioned the importance of knowing which role you want to play; this is where it plays a big part. For example, if you are a data engineer, your main focus would be parallel processing (functional programming) and setting up data infrastructure; if you are a data scientist, then machine learning, statistics and sampling strategy are important.

One thing to keep in mind is that technologies are merely tools and will change over time, but skill sets will remain useful and can be applied to new technologies. Always pick a technology with a strong community, one that is tried and tested, and invest your time learning these tools as needed. Simply put, be a skilled artisan who can use different tools (new or old): learn the hammering skill instead of learning about the hammer.

Talk technologies, think skill sets, then focus!

You will need to focus. Having a little knowledge of everything sounds good on paper, but when it comes to doing the job, you will struggle. Build up your other areas of expertise with time, one bite at a time; that’s how you will make progress. By the time you are proficient in all the technologies, big data will be out of fashion. Now that that’s out of the way, how do you get into data science?

Getting the Job

This is probably the hard part, but not impossible. Before you go out and find a data analytics job, here is something to keep close to your heart, and if you don’t have one (since we are all robots), keep it on a post-it so you can look at it every day.

It’s easy to get into data science, it is hard to stay a data scientist.

Now, if you are in a position where your company is looking for data scientists, you are one of the lucky few; you just need to apply.

Here are some tips for the rest:

  1. Pet projects – do a project that you are interested in, like predicting property or stock prices, to name a couple.
  2. Kaggle competitions – plenty of practice and potential prizes to be won.
  3. Show initiative – create the opportunity in the workplace by implementing machine learning, drive the change from the bottom up.
  4. Local meetups – Network, network, network, go meet other data scientists.
  5. Learn to learn fast.

Words of caution:

The only thing I find big about big data is the hype; it’s really big, which is a problem for job applications. It means that companies may or may not know what they are looking for, and there will be a lot of competition as well. On multiple occasions I had companies asking for x, interviewing me on y, testing me on z, and in the end not knowing what they wanted because they didn’t have a business case; they wanted big data just because they didn’t want to miss the boat. My honest suggestion is to focus on building skill sets and wait for the hype to die down, then look for the surviving companies to invest your career in.

Sum it up!

Data science is not a get-rich-fast scheme; it’s about a passion for problem solving and analysis. So don’t be in a rush to get into data science; take your time, enjoy the learning, and then apply it to your work. I hope you enjoyed reading this post. In the next post, I will be talking about how to add value to the business.

My Big Data Experience – Part I: Where should I start?

A few years ago, eating with my usual lunch group, a colleague mentioned “beeeeeg data”. That was the first time I heard the term. It sounded really silly and I joked: “What is this new hype now? It sounds like a sales pitch to sell me a 3D-TV.” Like any other hype train, I had no intention of jumping on.

And yet here I am, on said train, sharing my experiences. I want to share my thoughts on working in the data science domain; perhaps I can provide some value to the data science community with my experiences. I will update them as time goes on. Here are some topics that I want to share over time:

  1. Part I: Where should I start?
  2. Part II: Spare Time and Getting the Job.
  3. Part III: Building the Right Team and Breaking the Business Model.
  4. Part IV: Programming and Statistics.
  5. Part V: Data Munging and Feature Engineering. (Coming soon)
  6. Part VI: Yippee! Machine Learning. (Coming soon)
  7. Part VII: How to Interview a Data Scientist. (Coming soon)

What you have probably heard about big data / data science

  1. Money!
  2. Hottest job at the moment
  3. Machine Learning / Artificial Intelligence
  4. Maths, stats and programming required
  5. Data munging
  6. But mostly this diagram

[Image: the data science Venn diagram]

These ideas are thrown around fairly often, but there are multiple roles within the data science domain, for example: Data Analyst, Data Scientist, Business Analyst, Data Engineer, Data Architect, Statistician, Database Administrator, and Data and Analytics Manager. You can read the article that explains the different roles within the field of data science here (source: kdnuggets).

I believe this is one main reason why companies are struggling to find data scientists. There is a perception that data science can be done by one person instead of a team, similar to hiring a full-stack developer who must know how to code the front-end and back-end, design the user interface, and set up the production environment. Companies should instead look at the different roles that make up a data science team. It is fairly rare to find one person with all the required skill sets, hence the term “Unicorn” is used to describe data scientists.

What people don’t tell you

  1. Data science is a team effort, even if you are a “Unicorn” that can do everything.
  2. Communication and sharing ideas with team members is super important; it is necessary to cross-validate each other’s ideas.
  3. You will need to learn to speak to non-technical people using easy to understand business examples.
  4. You will need to ask lots of business questions across multiple departments to get a better idea of the business processes to do your analysis.
  5. Data products take a long time to develop, especially when you do not have enough data, and data collection is very time consuming.
  6. You will have great difficulty obtaining data, because of politics, red tape and a lack of infrastructure.
  7. You will have a difficult time working with data, because there is often no way to properly link up multiple sources, and data integrity is a real issue (raw vs. processed data).
  8. Data munging and exploratory data analysis will take up 80% of your time.
  9. Machine learning is fun but it is only 20% of the work.
  10. You will have to constantly remind people that data munging and feature engineering are more important than machine learning and that more time should therefore be spent there.
  11. You will face enormous pressure from management to provide value in the form of a data product, especially when they do not understand the data science methodology; this can be eased by providing small findings continuously.
  12. You will be in meetings, a lot.
  13. There will be many projects that you need to deliver and you will need to learn to prioritise your tasks and juggle between projects.
  14. Every business problem is unique, there is no one method to solve all of them. You will need to spend your spare time learning and researching the problem and upskill yourself.
  15. Most companies don’t have a BIG data problem.
  16. You need to be passionate about the data, this is not negotiable.

Conclusion

My final suggestion to anyone who wants to start a career in data science is to answer these questions honestly, then go forward:

  1. Define the role you want to play.
  2. Do you have a passion for data science?
  3. Are you willing to spend time after work to improve yourself?

I believe if you have the right attitude and aptitude, you will be successful in any path in life. Good luck!

Do it right the first time and then what?

What would you do if you were given a house with a collapsed ceiling and some tools? Would you start hammering away at every broken ceiling tile, or try to find the reasons why the ceiling collapsed? Maybe another pillar was needed to support the structure, or perhaps the ceiling was simply old? Either way, strap on your tool belt, put on a helmet and say: “I am going in!”

Do it right the first time!

Software development is much like building a house: you start with the design and foundations and then you build upwards, and “Do it right the first time!” is actually the easy part. The number of design patterns, cookie-cutter architectures and pieces of boilerplate code available online makes it relatively easy to set up, especially with Maven archetypes (Boom! Project setup done). The system may have a few custom components here and there, but the process is relatively simple; there won’t be many obstacles when implementing a new system. All you need is a good architecture and the rest will fit in like Lego pieces.

A good initial design helps speed up development and maintenance, but this is a non-perfect world, with growing functionality, deadlines, business pressure to deliver, copy-and-pasting, and a lack of code maintenance, refactoring and review. Year after year, the system accumulates technical debt, and hopefully your company can pay the interest on time. How does the existing framework fare after one year, five years or ten years?

So, back to the initial example of fixing a house: you have a broken ceiling, what would you do? I believe most people would say: “Let’s find the problem and fix it so it doesn’t happen again”. Unfortunately, because of technical debt, sometimes the problem is buried so deep within the system that the root cause cannot be identified, because it is caused by a combination of existing problems. So the developer just patches (*cough* hacks *cough*) it and prays to Linus Torvalds that it does not break while they are on production support duty.

Now, let’s talk business.

How much do you think this kind of problem is going to cost you? Technical problems such as hard coding, lazy copy-and-paste, hard-to-understand code, duplicated work, etc. cause new projects to take longer to implement and make them more prone to bugs, not to mention the amount of resources you will need to maintain and debug the system. How much time are your developers spending on fixing (patching) the system instead of on development? Are you hiring developers for the right reasons?

I had a project where implementing a new functionality required us to modify 28 files and multiple database tables just to make the functionality appear in the correct place, excluding the actual implementation of the function, because of all the hard coding and bad architecture design. This process usually took 1 ~ 2 weeks to complete, then another few weeks just to code the logic of the new functionality. Now, imagine doing this every single release cycle. Eventually I asked my manager for some time (approx. 3 months) to refactor the system, and another month for regression testing. Now it only takes three SQL statements (approx. 5 minutes) to complete the same task, along the lines of the sketch below. Spending some effort on refactoring saves money and time in the long run, and certainly makes the developer’s job easier.
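
To make the idea concrete, here is a minimal, purely hypothetical Java sketch (this is not the actual system from the story): each function is described as data loaded from a table, so wiring in a new one becomes a database insert rather than a change to dozens of source files.

import java.util.ArrayList;
import java.util.List;

public class FunctionRegistry {

	// Hypothetical descriptor for a user-facing function: where it appears
	// in the menu and which handler key to dispatch to at runtime.
	public static class FunctionEntry {
		public final String menuLabel;
		public final String handlerKey;

		public FunctionEntry(String menuLabel, String handlerKey) {
			this.menuLabel = menuLabel;
			this.handlerKey = handlerKey;
		}
	}

	// In the refactored system this list would come from a query such as
	// "SELECT menu_label, handler_key FROM function_entry", so adding a
	// new function is an INSERT statement instead of editing many files.
	public List<FunctionEntry> loadFunctions() {
		List<FunctionEntry> entries = new ArrayList<>();
		entries.add(new FunctionEntry("View Portfolio", "portfolio.view"));
		entries.add(new FunctionEntry("Add Stock", "stock.add"));
		return entries;
	}
}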

In conclusion, a good initial design is really important; however, it is not enough. Doing it right is not just about the initial setup but also about continuous refactoring and streamlining of the system via code review and system (redundancy) analysis. It’s like cleaning a house: if you clean your house every day, you will have less to do and everyone will benefit.

Moving Forward with Spring & AngularJS

Introduction

“Do you know Spring? What about Angular?” are questions that often get asked during discussions and interviews. After some research, I decided to migrate my own application (Stock Monitor) to use the “new” frameworks and share my experiences and opinions regarding the migration process: the obstacles I encountered, the things I like, dislike and am completely horrified by, and whether the frameworks are applicable to large-scale projects.

Are AngularJS and the Spring Framework here to stay? Here is a comic that demonstrates my initial concern with new frameworks, both back-end and front-end.

[Image: comic]

Original Implementation of Stock Monitor

The idea of Stock Monitor was to develop a desktop application for personal use, to quickly access a stock portfolio without having to log onto a stock broker’s site or requiring an internet connection. It consists of a simple MVC architecture with CRUD and charting functionality. It was a proof of concept and was developed within one week. Here is a list of the technologies used and a screenshot of the application:

  • PostgreSQL
  • EclipseLink / JPA
  • Java Swing
  • JFreeChart

[Image: screenshot of the original Stock Monitor]

Implementing Stock Monitor Two-Point-Oh!

As the old adage goes: “if it ain’t broke, don’t fix it”. So why am I doing this?

  1. There are better ways of doing things, which provide learning opportunities, whether in technology, design patterns or different problem-solving mindsets.
  2. Keeping up with industry standards. If you are not improving, you are falling behind.
  3. To be able to demonstrate and present these new frameworks at an Architecture Board in a job environment and discuss the implications of using such frameworks on existing projects.

if it ain’t broke, don’t fix it.

With a list of Java frameworks and a list of JavaScript frameworks available, how did I manage to choose Spring and Angular? Choosing a framework is itself quite daunting. The way I go about it is actually quite simple: firstly, it has to meet the requirements of my application; secondly, it needs a supporting community.

Stock Monitor 2.0 will need the following:

  1. Easy to scale
  2. Easy to test
  3. REST / SOAP Architecture
  4. Single page web application
  5. Framework maturity
  6. Security

Spring was an easy choice because it caters for these requirements and has a large user base with constant updates and fixes. AngularJS, on the other hand, has a fairly small user base according to the statistics. I chose it because of its design philosophy, the MVC model for the front-end and the way it works with modules, but the deciding factor is that Google is maintaining it (the next big thing?).

[Image: framework usage statistics]

(Reference: here)

Now that the boring stuff is out of the way, I will discuss Spring and AngularJS in more detail separately.

Disclaimer: please keep in mind that this is not a tutorial but an opinion piece based on my experience with these technologies. Also, my primary skill set resides mainly in back-end development, so the learning curve varies between the front-end and the back-end.

Spring Web MVC Framework, I choose you!

Here is a list of the components and versions I used for the back-end framework:

  • Java 1.8
  • Maven 3.3.3
  • Apache Tomcat 8.0.27
  • Spring Framework 4.2.1.RELEASE
  • Spring Security 3.2.5.RELEASE
  • Spring Security CSRF Token Filter 1.1
  • Hibernate 5.0.2.Final
  • Postgresql 9.4-1203-jdbc42
  • slf4j 1.6.1
  • Jackson JSON Processor 2.4.4

Spring Bean Configuration

Spring is easy to set up: add the Spring libraries to the Maven pom file and you are good to go. Spring follows the Model-View-Controller (MVC) convention, and its configuration, the Spring bean definitions, is what drives dependency injection. There are three ways to go about configuring Spring beans:

  • Java Based Configuration
  • XML Based Configuration
  • Combination of Java and XML Configuration

What I dislike: I tried the XML configuration at first and it was fairly frustrating, mainly because I refactored the project packaging structure a few times before I got the structure I wanted. The problem is that the XML configuration does not automatically update the references to a Java class when you move / rename classes, so refactoring was a very manual exercise. Also, as the project grows, I would imagine the configuration file becoming fairly lengthy and complicated; however, this can be overcome by splitting the configuration into multiple files. There are advantages though: anyone picking up the project can easily locate the configuration in a single XML file, as opposed to crawling through the Java code to understand the dependencies between the annotated classes.

In the end, I chose the Java-based configuration over XML, simply because I believe that code in itself provides good-enough documentation, and it also makes debugging easier. Java-based configuration works with annotations such as:

  • @Configuration
  • @Profile(“development”)
  • @EnableTransactionManagement
  • @Bean
  • @EnableWebMvc
  • @ComponentScan
  • @EnableWebSecurity

I have four configuration files in the application: RootContextConfig.java, AppSecurityConfig.java, ServletContextConfig.java and WebAppInitializer.java. I have defined the configuration in a separate package structure, away from the main application, like so:

  • com.stockmonitor.app (holds the application logic)
  • com.stockmonitor.conf (holds my configuration files)

When you add the @Configuration annotation to a Java class, Spring automatically identifies it as a configuration class. Furthermore, you can add @Profile(“name”) to define different profiles with specific configurations, such as a testing or development profile with different properties (datasource, testing configuration, test data, etc.), as in the sketch below.
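
As a hedged illustration (the class name and the connection details below are made up, not from the actual project), a development profile could swap in its own datasource bean like this:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
@Profile("development")
public class DevContextConfig {

	// Only registered when the "development" profile is active,
	// e.g. when started with -Dspring.profiles.active=development.
	@Bean(name = "datasource")
	public DriverManagerDataSource dataSource() {
		DriverManagerDataSource dataSource = new DriverManagerDataSource();
		dataSource.setDriverClassName(org.postgresql.Driver.class.getName());
		// Hypothetical development database, kept separate from production data.
		dataSource.setUrl("jdbc:postgresql://localhost:5432/stockmonitor_dev");
		dataSource.setUsername("postgres");
		dataSource.setPassword("password");
		return dataSource;
	}
}

Activating the profile then selects this configuration without touching the production beans.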

RootContextConfig.java, shown below, is where the main configuration for Stock Monitor resides; it is where I have configured the data source and transaction manager. The @ComponentScan annotation tells the application to look for other components within the packages you have specified and add them to the application context, so that the application is aware of them. This means that you do not have to declare every single component in the configuration file; Spring will automatically include them based on the paths you have provided.

@Configuration
@ComponentScan({ 
	StringConstant.APPLICATION_ROOT_PATH + ".service", 
	StringConstant.APPLICATION_ROOT_PATH + ".dao", 
	StringConstant.APPLICATION_ROOT_PATH + ".init",
	StringConstant.APPLICATION_ROOT_PATH + ".security"})
@EnableTransactionManagement
public class RootContextConfig {

	@Bean(name = "transactionManager")
	public PlatformTransactionManager transactionManager(EntityManagerFactory entityManagerFactory, DriverManagerDataSource dataSource) {
		JpaTransactionManager transactionManager = new JpaTransactionManager();
		transactionManager.setEntityManagerFactory(entityManagerFactory);
		transactionManager.setDataSource(dataSource);
		return transactionManager;
	}
	
    @Bean(name = "datasource")
    public DriverManagerDataSource dataSource() {
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName(org.postgresql.Driver.class.getName());
        dataSource.setUrl("jdbc:postgresql://localhost:5432/stockmonitor");
        dataSource.setUsername("postgres");
        dataSource.setPassword("password");
        return dataSource;
    }

    @Bean(name = "entityManagerFactory")
    public LocalContainerEntityManagerFactoryBean entityManagerFactory(DriverManagerDataSource dataSource) {
        LocalContainerEntityManagerFactoryBean entityManagerFactoryBean = new LocalContainerEntityManagerFactoryBean();
        entityManagerFactoryBean.setDataSource(dataSource);
        entityManagerFactoryBean.setPackagesToScan(new String[]{StringConstant.APPLICATION_ROOT_PATH + ".model"});
        entityManagerFactoryBean.setLoadTimeWeaver(new InstrumentationLoadTimeWeaver());
        entityManagerFactoryBean.setJpaVendorAdapter(new HibernateJpaVendorAdapter());

        Map<String, Object> jpaProperties = new HashMap<String, Object>();
        jpaProperties.put("hibernate.hbm2ddl.auto", "");
        jpaProperties.put("hibernate.show_sql", "false");
        jpaProperties.put("hibernate.format_sql", "true");
        jpaProperties.put("hibernate.use_sql_comments", "true");
        entityManagerFactoryBean.setJpaPropertyMap(jpaProperties);

        return entityManagerFactoryBean;
    }
}

AppSecurityConfig.java is where I use Spring Security and CSRF protection to filter and block access to specific URL paths and to block cross-site request forgery attacks via a unique token. It uses the SecurityUserDetailsService from Spring Security to do the user lookup.

@Configuration
@EnableWebSecurity
public class AppSecurityConfig extends WebSecurityConfigurerAdapter {
	
    private static final Logger LOGGER = Logger.getLogger(AppSecurityConfig.class);

    @Autowired
    private SecurityUserDetailsService userDetailsService;

    @Autowired
    protected DataSource dataSource;

    @Override
    protected void configure(AuthenticationManagerBuilder auth) throws Exception {
        auth.userDetailsService(userDetailsService).passwordEncoder(new BCryptPasswordEncoder());
    }

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        CsrfTokenResponseHeaderBindingFilter csrfTokenFilter = new CsrfTokenResponseHeaderBindingFilter();
        http.addFilterAfter(csrfTokenFilter, CsrfFilter.class);

        http
            .authorizeRequests()
            	.antMatchers("/resources/public/**").permitAll()
            	.antMatchers("/resources/img/**").permitAll()
            	.antMatchers("/resources/bower_components/**").permitAll()
            	.antMatchers(HttpMethod.POST, "/user").permitAll()
            	.anyRequest().authenticated()
            .and()
            .formLogin()
            	.defaultSuccessUrl("/resources/view/#")
            	.loginProcessingUrl("/authenticate")
            	.usernameParameter("username")
            	.passwordParameter("password")
            	.successHandler(new AjaxAuthenticationSuccessHandler(new SavedRequestAwareAuthenticationSuccessHandler()))
            	.loginPage("/resources/public/login.html")
            .and()
            .httpBasic()
            .and()
            .logout()
            	.logoutUrl("/logout")
            	.logoutSuccessUrl("/resources/public/login.html").permitAll();
    }
}

ServletContextConfig.java tells Spring that this is a Web MVC architecture via @EnableWebMvc and scans for the controller layer. Other web MVC components can also be declared here by extending WebMvcConfigurerAdapter.

@Configuration
@EnableWebMvc
@ComponentScan(StringConstant.APPLICATION_ROOT_PATH + ".controller")
public class ServletContextConfig extends WebMvcConfigurerAdapter {

	@Override
	public void addResourceHandlers(ResourceHandlerRegistry registry) {
		registry.addResourceHandler("/resources/**").addResourceLocations("/resources/");
	}
}

WebAppInitializer.java basically replaces the web.xml configuration; you can define all your web.xml context here, loading all the configuration classes on server start-up.

public class WebAppInitializer extends AbstractAnnotationConfigDispatcherServletInitializer {

	@Override
	protected Class<?>[] getRootConfigClasses() {
		return new Class<?>[] { RootContextConfig.class, AppSecurityConfig.class };
	}

	@Override
	protected Class<?>[] getServletConfigClasses() {
		return new Class<?>[] { ServletContextConfig.class };
	}

	@Override
	protected String[] getServletMappings() {
		return new String[] { "/" };
	}
}

I find configuration the hardest part of learning the Spring framework; there are many aspects you can configure, but Spring makes it easy. Most of the configuration is by convention, so it is fairly intuitive to learn. Once you have configured where the components reside by declaring them with annotations, everything else falls into place, and that is the strength of Spring. Now let us take a look at the components of Stock Monitor two-point-oh! Here you will encounter a few more annotations, which are specialised versions of the @Component annotation that declares a class as a Spring bean:

  • @Repository
  • @Service
  • @Controller

Model & Data Access Object

The original model objects were implemented with EclipseLink; porting them to Hibernate was a breeze. The difference, however, lies in the data access objects (DAOs). Declaring @Repository lets Spring know that we are working with model objects, and exceptions will be handled as DataAccessException.

@Repository
public class StockRepository {
	
	@PersistenceContext
	private EntityManager em;

	public List<Stock> findStocks(String email) {
		List<Stock> stocks = em.createNamedQuery(Stock.FIND_BY_USER, Stock.class).setParameter("usemail", email).getResultList();
		return stocks;
	}
	
	public boolean isStockNotExist(String email, String ticker) {
		// Filter by both user and ticker; the named query takes both parameters.
		List<Stock> stocks = em.createNamedQuery(Stock.FIND_BY_USER_AND_TICKER, Stock.class).setParameter("usemail", email).setParameter("stticker", ticker).getResultList();
		return stocks.isEmpty();
	}
	
    public void delete(Integer id) {
        Stock stock = em.find(Stock.class, id);
        em.remove(stock);
    }

    public Stock findStockById(Integer id) {
        return em.find(Stock.class, id);
    }
	
	public Stock save(Stock stock) {
		return em.merge(stock);
	}
}

Service

The @Service annotation declares the class as a service component, which is used for processing business logic and validations. Currently the @Service annotation does nothing more than its parent @Component annotation, but that might change in the future. The @Autowired annotation will automagically inject the object for you, and you will see this annotation quite often.

The @Transactional annotation is part of Spring’s declarative transaction management. You can define the type of transaction you require, which is usually managed manually via the entity manager, but Spring does this for you with the @Transactional annotation, and you can add additional parameters such as rollbackFor, timeout, isolation, propagation, readOnly, etc., as in the sketch below.
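
As a quick sketch (the service and the attribute values here are illustrative, not part of Stock Monitor), a read-only lookup might declare its transaction semantics like this:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ReportService {

	// Illustrative attribute values: a read-only transaction that rolls back
	// for any exception, times out after 30 seconds, and spells out its
	// isolation and propagation settings.
	@Transactional(readOnly = true,
			timeout = 30,
			isolation = Isolation.READ_COMMITTED,
			propagation = Propagation.REQUIRED,
			rollbackFor = Exception.class)
	public void generateDailyReport() {
		// read-only queries would go here
	}
}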

@Service
public class StockService {
	private static final Logger LOGGER = LoggerFactory.getLogger(StockService.class);

	@Autowired
	private StockRepository stockRepository;

	@Autowired
	private UserRepository userRepository;

	@Transactional
	public SearchResult<Stock> findStocks(String email) {
		assertNotBlank(email, "email cannot be blank");
		
        List<Stock> stocks = stockRepository.findStocks(email);
        
        return new SearchResult<>(stocks.size(), stocks);
	}
	
	@Transactional
	public void deleteStocks(List<Integer> ids) {
		notNull(ids, "selection is mandatory");
		ids.stream().forEach((id) -> stockRepository.delete(id));
	}

	@Transactional
	public Stock saveStock(String email, Integer id, String sector, String name, String ticker, Boolean customPrice, BigDecimal price) {
		assertNotBlank(email, "email cannot be blank");
		assertNotBlank(sector, "Stock sector is mandatory");
		assertNotBlank(name, "Stock name is mandatory");
		assertNotBlank(ticker, "Stock ticker is mandatory");
		if (customPrice) {
			notNull(price, "Stock price is mandatory");	
		}

		SectorEnum sectorEnum = SectorEnum.getEnum(sector);
		notNull(sectorEnum, "Cannot find the sector");
		
		Stock stock = null;
		if (id != null) {
			stock = stockRepository.findStockById(id);
			stock.setSector(sectorEnum);
			stock.setName(name);
			stock.setTicker(ticker);
			stock.setCustomPrice(customPrice);
			stock.setPrice(price);
			LOGGER.info("Updating existing stock: id{}", id);
		} else {
			User user = userRepository.findUserByEmail(email);

			if (user != null) {
				stock = stockRepository.save(new Stock(user, sectorEnum, name, ticker, customPrice, price));
			} else {
				// Warn when the user cannot be found instead of saving nothing silently.
				LOGGER.warn("A stock was attempted to be saved for a non-existing user: {}", email);
			}
		}
		return stock;
	}

	@Transactional
	public List<Stock> saveStocks(String email, List<StockDTO> stocks) {
		return stocks.stream().map((stock) -> saveStock(email, stock.getId(), stock.getSector(), stock.getName(), stock.getTicker(),
				stock.isCustomPrice(), new BigDecimal(stock.getPrice()))).collect(Collectors.toList());
	}
}

Controller

Here I used RESTful web services. By declaring the @Controller annotation, the application will look for the @RequestMapping annotation, as required by the DispatcherServlet in the Spring Web MVC framework. The value inside @RequestMapping(“/stock”) determines which controller a request goes to, and the HTTP method of the request indicates which handler method to call.

Exception handling is managed by the @ExceptionHandler annotation. You can specify the types of exceptions and how they should be handled when thrown by the code, so you won’t see multiple try-catch blocks inside the methods, which keeps the code clean and makes the exception handling reusable. It also returns the “view” object whenever an exception occurs.

What I like: Look at the code below and ask yourself: “How do you pass a POJO through to a RESTful web service?” This was the question I had. Usually this is done by passing the object through the HttpServletRequest via the HttpSession object, but here we are using JSON, which gets converted to a POJO and vice versa. When the front-end (Ajax in this case) sends the content type as “application/json”, Spring picks it up with the @RequestBody annotation and converts it into a POJO, and it works the other way around with @ResponseBody, which converts the POJO back into JSON. Just like that, no extra work; it is that simple.

@Controller
@RequestMapping("/stock")
public class StockController {
	private static final Logger LOGGER = LoggerFactory.getLogger(StockController.class);

    @Autowired
    private StockService stockService;

    @ResponseBody
    @ResponseStatus(HttpStatus.OK)
    @RequestMapping(method = RequestMethod.GET)
    public StocksDTO findStocks(Principal principal) {
    	LOGGER.info("======================================== BEGIN SERVLET REQUEST ========================================");
    	LOGGER.info("> Moving to: StockService.findStocks");

        SearchResult<Stock> result = stockService.findStocks(principal.getName());

        return new StocksDTO(StockDTO.mapFromStocksEntities(result.getResult()));
    }
    
    @ResponseBody
    @ResponseStatus(HttpStatus.OK)
    @RequestMapping(method = RequestMethod.POST)
    public List<StockDTO> saveStocks(Principal principal, @RequestBody List<StockDTO> stocks) {
    	LOGGER.info("======================================== BEGIN SERVLET REQUEST ========================================");
    	LOGGER.info("> Moving to: StockService.saveStocks");

        List<Stock> savedStocks = stockService.saveStocks(principal.getName(), stocks);
        return savedStocks.stream().map(StockDTO::mapFromStockEntity).collect(Collectors.toList());
    }

    @ResponseBody
    @ResponseStatus(HttpStatus.OK)
    @RequestMapping(method = RequestMethod.DELETE)
    public void deleteStocks(@RequestBody List<Integer> id) {
    	LOGGER.info("======================================== BEGIN SERVLET REQUEST ========================================");
    	LOGGER.info("> Moving to: StockService.deleteStocks");

    	stockService.deleteStocks(id);
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<String> errorHandler(Exception exc) {
        LOGGER.error(exc.getMessage(), exc);
        return new ResponseEntity<>(exc.getMessage(), HttpStatus.BAD_REQUEST);
    }
}

Spring Verdict

So here I have explained the basic concepts of the Spring Web MVC framework, but does it answer my original question? Is it too soon to jump on the hype train? Does it satisfy my requirements? How well does it perform in a team environment and on large projects?

In short, I enjoyed learning and developing with the framework, and it feels really solid. Development is extremely fast, without having to declare and initialise all my services and components before use. Due to its modular design, debugging is really easy, and the code is easy to scale and maintain; it really allows me to focus on developing business requirements and leave the plumbing to Spring. It took less than a week to learn and migrate Stock Monitor to the Spring MVC framework; Angular, on the other hand, took much longer. I admit my implementation and requirements are fairly simple, and there are many concepts that I have not dealt with, but with the huge supporting documentation and user base, I am confident that it is fairly easy to learn.

focus on developing the business requirements and leave the plumbing to Spring

Team Environment

Large IT companies usually have multiple systems that integrate with one another, which makes service-oriented architecture a general requirement. The Spring framework “encourages” good coding practices such as loose coupling through dependency injection, re-usability, modular declarative programming and service-oriented design. In brief, Spring simply saves you time, whether reading code, debugging, or developing; it allows team members to discuss the design and implementation of individual components clearly, instead of arguing about how components should be wired together. Does that mean we should go and migrate all our systems to Spring?

For new projects, I highly recommend the Spring Framework, but for existing projects? It depends on the size: small to medium projects are doable, but not anything larger. Let me clarify: a full overhaul will not be an option. It takes too much time and resources just to do the same thing, the business (the guy who pays the bills) will not allow it, and it delivers no value. That being said, I would develop new components with the Spring Framework alongside the existing framework. In conclusion, I find the Spring Framework robust and easy to use, the learning curve is fairly mild, and I think it is a great tool for developing solid, scalable applications.

AngularJS Superheroic Framework

Here is a list of the components and versions I used for the front-end framework:

  • NodeJS 0.10.25
  • Bower 1.6.5
  • Angular 1.4.7
  • Angular-bootstrap 0.14.3
  • Angular-chart.js 0.8.5
  • Angular-loading-bar 0.8.0
  • Angular-messages 1.4.7
  • Angular-toggle-switch 1.3.0
  • Angular-ui-router 0.2.15
  • Bootstrap 3.3.5
  • JQuery 2.1.4
  • font-awesome 4.4.0
  • lodash 2.4.1
  • metisMenu 2.2.0
  • oclazyload 1.0.8
  • requireJS 2.1.20
  • spring-security-csrf-token-interceptor 0.1.5

AngularJS Setup & Configurations

Setting up AngularJS is relatively easy: download the JavaScript files, open up your index.html, include them in the HTML and you are good to go. However, I took a more adventurous route, got lost, and ended up in framework-hell (a special hell for developers who like new and shiny frameworks).

Really Ugly: During my research on AngularJS, I discovered Bower for managing front-end libraries, so I decided to try it out. Before I could install Bower I needed to install npm, and before I could install npm I needed to install NodeJS. Oh, and I also needed to set up a Git repository before I could run Bower. Once the environment was set up, I could finally write the Bower configuration file – bower.json. Not to mention I was doing all of this through a very restrictive proxy, which did not help with the frustration. That being said, Bower is actually pretty cool: it’s like Maven for the front-end, managing the dependencies between different modules and their required versions.
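
For illustration, here is roughly what a minimal bower.json could look like using some of the versions from the list above (the project name is made up, and Bower can generate most of this for you via bower init):

{
    "name": "stock-monitor",
    "dependencies": {
        "angular": "1.4.7",
        "angular-ui-router": "0.2.15",
        "bootstrap": "3.3.5",
        "jquery": "2.1.4",
        "lodash": "2.4.1"
    }
}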

Once I had all the required libraries – Angular and related modules, Bootstrap, Font-Awesome, MetisMenu, etc., several hundred files in my application later – I still needed to include them in the HTML page. This is where I stumbled across RequireJS to manage the dependencies for me. Why? Because Angular has a lot of modules, and each module has dependencies that you are required to include in the HTML page.

RequireJS is a JavaScript file and module loader. It is optimized for in-browser use, but it can be used in other JavaScript environments, like Rhino and Node. Using a modular script loader like RequireJS will improve the speed and quality of your code.

I didn’t find RequireJS useful in Stock Monitor: I still had to manually configure the dependencies in the configuration file for each HTML page, so I didn’t see a big advantage over the standard script include. I then searched for alternatives that could scan my JavaScript files / modules automatically and inject them into my HTML page, and I came across Grunt. This is when I realized I had been wasting time trying to find the perfect framework instead of developing functionality. All this took about 2 ~ 3 days and I still hadn’t coded a single line in AngularJS. Here is where I gave up, stuck to the manual method of adding the includes the old way, and started to develop functionality instead.

<!doctype html>
<html class="no-js">
<head>
	<meta charset="utf-8">
	<title>Stock Monitor</title>
	<meta name="description" content="">
	<meta name="viewport" content="width=device-width">
	<link rel="stylesheet" href="/app/resources/bower_components/bootstrap/dist/css/bootstrap.min.css" />
	<link rel="stylesheet" href="/app/resources/css/main.css">
	<link rel="stylesheet" href="/app/resources/css/stockMonitor.css">
	<link rel="stylesheet" href="/app/resources/bower_components/metisMenu/dist/metisMenu.min.css">
	<link rel="stylesheet" href="/app/resources/bower_components/angular-loading-bar/build/loading-bar.min.css">
	<link rel="stylesheet" href="/app/resources/bower_components/font-awesome/css/font-awesome.min.css" type="text/css">
    <script src="/app/resources/bower_components/jquery/dist/jquery.min.js"></script>
    <script src="/app/resources/bower_components/angular/angular.min.js"></script>
    <script src="/app/resources/bower_components/angular-ui-router/release/angular-ui-router.min.js"></script>
    <script src="/app/resources/bower_components/oclazyload/dist/ocLazyLoad.min.js"></script>
    <script src="/app/resources/bower_components/angular-loading-bar/build/loading-bar.min.js"></script>
    <script src="/app/resources/bower_components/angular-bootstrap/ui-bootstrap-tpls.min.js"></script>
    <script src="/app/resources/bower_components/metisMenu/dist/metisMenu.min.js"></script>
    <script src="/app/resources/bower_components/Chart.js/Chart.min.js"></script>
    <script src="/app/resources/bower_components/lodash/dist/lodash.js"></script>
    <script src="/app/resources/angular/frontend-services.js"></script>
    <script src="/app/resources/angular/app.js"></script>
    <script src="/app/resources/js/stockMonitor.js"></script>
</head>

<body>
	<div ng-app="stockMonitor">
		<div ui-view></div>
	</div>
</body>
</html>

As a suggestion, and to make your life easier, just start with an index.html with the AngularJS JavaScript loaded, then add the other components as you go along (a minimal sketch follows below). Start writing code before you get lost in the framework nightmares.
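
Something as bare as this is enough to get going (the script path matches the Bower layout above; everything else can come later):

<!doctype html>
<html>
<head>
	<title>Stock Monitor</title>
	<script src="/app/resources/bower_components/angular/angular.min.js"></script>
</head>
<body ng-app>
	<!-- a quick sanity check that Angular is wired up -->
	<p>{{ 1 + 1 }}</p>
	<!-- add styles, modules and components here as you go along -->
</body>
</html>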

Building a Single Page Web Application

The most important aspect of the front-end implementation is the user experience. A single page application is fast and responsive because it loads the HTML page only once and refreshes only the components within the page when required, which also reduces the server load. Here are the components that I have used to achieve this.

Module

AngularJS works with modules. A module is a container for the different parts of the application, such as routing, directives, services, controllers etc. It is the basic foundation of AngularJS: reusable and declarative.

Here is the main module, which defines the routing that I will explain in the next section. The parameters in the square brackets ([]) are the dependencies required by this module. Everything – services, controllers, config, directives etc. – needs to be registered under a module (there is a short sketch of this registration pattern at the end of this section).

angular.module('stockMonitorApp', [ 'oc.lazyLoad', 'ui.router', 'ui.bootstrap', 'angular-loading-bar' ])

Here is my service module, where all my services are declared and initialized.

angular.module('frontendServices', [])

Here is how you include the AngularJS module in the HTML page or components.

<div ng-app="stockMonitorApp">
	<div ui-view></div>
</div>
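
Everything else then hangs off the module. As a quick illustration (this controller is hypothetical, purely to show the registration pattern):

angular.module('stockMonitorApp')
	.controller('ExampleCtrl', ['$scope', function($scope) {
		// config, controllers, services and directives are all
		// registered against a module like this
		$scope.message = 'hello';
	}]);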

Routing

In order for the application to know where to go when a link or a button is pressed, routing is required. Routing tells the site which component to load into the <ui-view> directive, as opposed to loading a whole HTML page. The angular-ui-router module achieves this by defining the different states and their dependencies in the module configuration. This is really nice compared to a JSF configuration file, database-driven routing, or static HTML links. The ui-router also presents the route after a hash in the URL instead of as a static link.

For example: http://localhost:8080/app/resources/view/index.html#/dashboard/home

'use strict';

angular.module('stockMonitorApp', [ 'oc.lazyLoad', 'ui.router', 'ui.bootstrap', 'angular-loading-bar' ])
	.config(['$stateProvider', '$urlRouterProvider', '$ocLazyLoadProvider', function($stateProvider, $urlRouterProvider, $ocLazyLoadProvider) {
			
		$ocLazyLoadProvider.config({
			debug : false,
			events : true,
		});
		
		//default state
		$urlRouterProvider.otherwise('/dashboard/home');

		$stateProvider.state('dashboard', {
			url : '/dashboard',
			templateUrl : '/app/resources/view/dashboard/main.html',
			resolve : {
				loadMyDirectives : function($ocLazyLoad) {
					// Everything after a return statement is unreachable, so the
					// separate $ocLazyLoad.load(...) calls that used to follow the
					// return were never executed. ocLazyLoad can load several module
					// definitions in one call by passing an array, so the returned
					// promise now covers all of the files below.
					return $ocLazyLoad.load([ {
						name : 'stockMonitorApp',
						files : [
						         '/app/resources/angular/directive/header/header.js',
						         '/app/resources/angular/directive/header/header-notification/header-notification.js',
						         '/app/resources/angular/directive/sidebar/sidebar.js'
						         ]
					}, {
						name : 'toggle-switch',
						files : [
						         '/app/resources/bower_components/angular-toggle-switch/angular-toggle-switch.min.js',
						         '/app/resources/bower_components/angular-toggle-switch/angular-toggle-switch.css' ]
					}, {
						name : 'ngAnimate',
						files : [ '/app/resources/bower_components/angular-animate/angular-animate.js' ]
					}, {
						name : 'ngCookies',
						files : [ '/app/resources/bower_components/angular-cookies/angular-cookies.js' ]
					}, {
						name : 'ngResource',
						files : [ '/app/resources/bower_components/angular-resource/angular-resource.js' ]
					}, {
						name : 'ngSanitize',
						files : [ '/app/resources/bower_components/angular-sanitize/angular-sanitize.js' ]
					}, {
						name : 'ngTouch',
						files : [ '/app/resources/bower_components/angular-touch/angular-touch.js' ]
					} ])
				}
			}
		})

		.state('dashboard.home', {
			url : '/home',
			controller : 'MainCtrl',
			templateUrl : '/app/resources/view/dashboard/home.html',
			resolve : {
				loadMyFiles : function($ocLazyLoad) {
					return $ocLazyLoad.load({
						name : 'stockMonitorApp',
						files : [
						         '/app/resources/angular/frontend-services.js',
						         '/app/resources/angular/controller/main.js',
						         '/app/resources/angular/directive/dashboard/stats/stats.js' 
						        ]
					})
				}
			}
		})
		
		.state('dashboard.stock', {
			url : '/stock',
			controller : 'StockCtrl',
			templateUrl : '/app/resources/view/stock.html',
			resolve : {
				loadMyFile : function($ocLazyLoad) {
					// Load both module definitions in one call (the comma operator
					// in the original meant the returned promise only waited on the
					// second load). Passing an array makes the promise cover both.
					return $ocLazyLoad.load([ {
						name : 'chart.js',
						files : [
						         '/app/resources/bower_components/angular-chart.js/dist/angular-chart.min.js',
						         '/app/resources/bower_components/angular-chart.js/dist/angular-chart.css'
						        ]
					}, {
						name : 'stockMonitorApp',
						files : [
						         '/app/resources/angular/frontend-services.js',
						         '/app/resources/angular/controller/stockController.js'
						        ]
					} ])
				}
			}
		})
		
		.state('dashboard.logout', {
			url : '/logout',
			controller : 'LogoutCtrl',
			templateUrl : '/app/resources/view/logout.html',
			resolve : {
				loadMyFile : function($ocLazyLoad) {
					return $ocLazyLoad.load({
						name : 'stockMonitorApp',
						files : [ 
						         '/app/resources/angular/frontend-services.js',
						         '/app/resources/angular/controller/logoutController.js' 
						        ]
					})
				}
			}
		})		
	} 
]);

Service

I have defined my services in a separate module; they handle the communication between the front-end and the back-end using the request mappings we defined in the Controller. When a request is sent to /stock, the server picks it up, routes it to the controller with that mapping, processes it, and returns the response. The service layer sends the data to the server side as JSON for processing.

	.service('StockService', ['$http', '$q', function($http, $q) {
		return {
			//find stocks – a GET request to match the controller's GET mapping, so no request body is sent
			findStocks: function(){
               var deferred = $q.defer();

                $http({
                    method: 'GET',
                    url: '/app/stock',
                    headers: {
                        "Accept": "application/json"
                    }
                })
                .then(function (response) {
                    if (response.status == 200) {
                        // pass the JSON payload through so the controller receives the data
                        deferred.resolve(response.data);
                    } else {
                    	deferred.reject("Error finding stocks: " + response.data);
                    }
                });

                return deferred.promise;
			},
			//save list
			saveStocks: function(stocks) {
               var deferred = $q.defer();

                $http({
                    method: 'POST',
                    url: '/app/stock',
                    data: stocks,
                    headers: {
                        "Content-Type": "application/json",
                        "Accept": "text/plain, application/json"
                    }
                })
                .then(function (response) {
                    if (response.status == 200) {
                        deferred.resolve();
                    } else {
                    	deferred.reject("Error saving stocks: " + response.data);
                    }
                });

                return deferred.promise;
			},
			//delete list
			deleteStocks: function(ids) {
               var deferred = $q.defer();

                $http({
                    method: 'DELETE',
                    url: '/app/stock',
                    data: ids,
                    headers: {
                        "Content-Type": "application/json"
                    }
                })
                .then(function (response) {
                    if (response.status == 200) {
                        deferred.resolve();
                    } else {
                        deferred.reject('Error deleting stocks');
                    }
                });

                return deferred.promise;
			}
		}
	}])

Controller

The controller layer coordinates the communication between the service layer and the view layer. Here we retrieve data from the service layer, populate and manipulate the view objects, then send them back to the service layer for processing. Whenever something changes, the controller talks to the back-end asynchronously through the service layer, which uses the $http and $q services under the hood.

    .controller('StockCtrl', ['$scope' , 'StockService', '$timeout',
        function ($scope, StockService, $timeout) {
    	
            $scope.vm = {
                originalStocks: [],
                stocks: [],
                isSelectionEmpty: true,
                errorMessages: [],
                infoMessages: []
            };
            
            // loadStockData takes no parameters (the original call passed five stray arguments)
            loadStockData();

            function markAppAsInitialized() {
                if ($scope.vm.appReady == undefined) {
                    $scope.vm.appReady = true;
                }
            }

            function loadStockData() {
            	StockService.findStocks()
            		.then(function (data) {
            			$scope.vm.errorMessages = [];
                        $scope.vm.originalStocks = _.map(data.stock, function (stock) {
                            return stock;
                        });

                        $scope.vm.stocks = _.cloneDeep($scope.vm.originalStocks);

                        _.each($scope.vm.stocks, function (stock) {
                            stock.selected = false;
                        });

                        markAppAsInitialized();

                        if ($scope.vm.stocks && $scope.vm.stocks.length == 0) {
                            showInfoMessage("No results found.");
                        }
                    },
                    function (errorMessage) {
                        showErrorMessage(errorMessage);
                        markAppAsInitialized();
            		});
            }

            $scope.add = function () {
                $scope.vm.stocks.unshift({
                    id: null,
                    name: null,
                    ticker: null,
                    sector: null,
                    custom: null,
                    price: null,
                    selected: false,
                    new: true
                });
            };
        }])

The controller is attached to its view via the ng-controller directive.

<body id="stockMonitorApp" ng-controller="StockCtrl">

View

Angular uses directives for the view model; you can write custom directives or use the built-in ones. Directives are used alongside the HTML elements.

At a high level, directives are markers on a DOM element (such as an attribute, element name, comment or CSS class) that tell AngularJS’s HTML compiler ($compile) to attach a specified behavior to that DOM element (e.g. via event listeners), or even to transform the DOM element and its children.

Here we can see some of these directives in action: ng-repeat, ng-model, tt-editable-cell, tt-date-time-picker, tt-cell-field, tt-numeric-field. The first two are Angular directives while the rest are custom-made. Custom directives can be very flexible: you can create new components and reuse them across your HTML pages, making AngularJS component-focused. You can think of directives as JSP Custom Tags, except they are much easier to implement and are written in HTML and JavaScript.

			<div class="table-responsive">
				<div class="btn-group pull-right">
					<button type="submit" class="btn btn-default btn-lg" ng-click="add()">Add</button>
					<button type="submit" class="btn btn-default btn-lg" ng-click="delete()" ng-disabled="vm.isSelectionEmpty" disabled>Delete</button>
					<button type="reset" class="btn btn-default btn-lg" ng-click="reset()">Reset</button>
					<button type="submit" class="btn btn-default btn-lg btn-primary" ng-click="save()">Save</button>
				</div>
				<table class="table table-hover">
					<thead>
						<tr>
							<th></th>
							<th>Stock Name</th>
							<th>Ticker / Abbreviation</th>
							<th>Sector</th>
							<th>Use Custom Price?</th>
							<th>Custom Price</th>
						</tr>
					</thead>
					<tbody>
						<tr ng-repeat="stock in vm.stocks | excludeDeleted | limitTo : 10" tt-editable-row rowValue="stock">
							<td>
								<input type="checkbox" ng-model="stock.selected" ng-click="selectionChanged()">
							</td>
							<td tt-editable-cell value="stock.name" is-new="stock.new">
								<input tt-date-time-picker tt-cell-field type="text" ng-model="stock.name">
							</td>
							<td tt-editable-cell value="stock.ticker" is-new="stock.new">
								<input type="text" tt-cell-field ng-model="stock.ticker">
							</td>
							<td tt-editable-cell value="stock.sector" is-new="stock.new">
								<input type="text" tt-cell-field tt-numeric-field ng-model="stock.sector">
							</td>
							<td tt-editable-cell value="stock.custom" is-new="stock.new">
								<input type="checkbox" ng-model="stock.custom">
							</td>
							<td tt-editable-cell value="stock.price" is-new="stock.new">
								<input type="text" tt-cell-field tt-numeric-field ng-model="stock.price">
							</td>
						</tr>
					</tbody>
				</table>
			</div>

Here is an example of how to define a custom directive:

	.directive('ttEditableCell', function() {
	    return {
	        scope: {
	            value: '=',
	            isNew: '='
	        },
	        transclude:true,
	        replace:true,
	        templateUrl: '/app/resources/partials/editable-cell.html',
	        controller: ['$scope', function($scope) {
	
	            $scope.cellState = {
	                editMode: false
	            };
	
	            $scope.onValueChanged = function (newValue) {
	                $scope.value = newValue;
	            };
	
	            $scope.edit = function() {
	                $scope.cellState.editMode = true;
	            };
	
	        }]
	    };
	})
<td class="editable-cell">
    <div ng-hide="cellState.editMode || isNew" ng-click="edit()" class="edit-widget cell-description" >
        {{value}}
    </div>
    <div ng-show="cellState.editMode || isNew" ng-cloak>
        <div class="edit-widget" ng-transclude>

        </div>
    </div>
</td>

AngularJS Verdict

Here is a screenshot of Stock Monitor 2.0. It is fairly basic and incomplete in terms of functionality, but it is able to demonstrate the framework from the front-end to the back-end. For me, the learning curve was around 2 ~ 3 weeks.

StockMonitor2

So, back to the topic of a single page application. The header and the sidebar only get loaded once; only the main frame <ui-view> gets loaded with new content when navigating between components – here it loads stock.html. When you click the Add button, the HTML calls the Angular controller to insert a new row of inputs. Once you have entered all the values and clicked the Save button, the Angular controller validates the values, compiles them into JSON, and sends them off to the service layer, which passes the JSON through to the back-end for processing (a sketch of this handler follows below). All this happens in real time with Ajax; there are no page refreshes or loading during this process, making the application fluid and user friendly. This is achieved via the Angular directives inside the HTML tags.
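
The save handler itself is not shown in the controller snippet earlier, but a sketch of the flow just described could look something like this (the body is my assumption; StockService.saveStocks, loadStockData and showErrorMessage all appear earlier in this post):

            $scope.save = function () {
                // hand the edited rows to the service layer, which serializes
                // them to JSON and posts them to the back-end
                StockService.saveStocks($scope.vm.stocks)
                    .then(function () {
                        loadStockData();                 // reload the table after a successful save
                    },
                    function (errorMessage) {
                        showErrorMessage(errorMessage);  // surface back-end errors to the user
                    });
            };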

What I like: The AngularJS implementation is clean and maintainable; I like the segregation of duties in the MVC framework, instead of one JavaScript file that does everything. In a team environment it can be very useful: as long as the contract (the JSON data + HTTP request data) is defined between the front-end and the back-end teams, each can continue to develop its own components. The front-end team can simply run a stub JSON server to simulate the data returned from the back-end without waiting for the back-end developers, which speeds up the development process (see the sketch below).
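
For example, a throwaway Node.js stub along these lines (the file name and sample values are made up; the { stock: [...] } shape matches what loadStockData reads in the controller above) is all the front-end team needs to keep working:

// stub-server.js – a minimal stand-in for the back-end, assuming Node.js is installed.
// Serves canned JSON at /app/stock so the front-end can be developed in isolation.
var http = require('http');

var stubStocks = {
	stock : [
		// hypothetical sample row, mirroring the fields used in $scope.add
		{ id : 1, name : 'Acme Corp', ticker : 'ACME', sector : 'Mining', custom : false, price : 42.0 }
	]
};

http.createServer(function (req, res) {
	if (req.url === '/app/stock') {
		res.writeHead(200, { 'Content-Type' : 'application/json' });
		res.end(JSON.stringify(stubStocks));
	} else {
		res.writeHead(404);
		res.end();
	}
}).listen(3000);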

The directives are quite powerful, and they use only HTML and JavaScript, which means you can do virtually anything; this promotes reusable components. These components get updated in real time via the view-controller interaction, so the front-end is always up to date without a page refresh.

There is a wealth of modules and documentation online, so finding resources is easy.

What I dislike: Lots of coding; it almost feels as if I were writing the back-end implementation again. I cannot simply use the back-end Java objects directly; instead I had to work with JavaScript objects built from the JSON data. Also, defining the service layer and the controller layer can be tedious – it is like having déjà vu, a “haven’t I done this before?” feeling.

The problem that I had with AngularJS is that when I missed a JavaScript include, AngularJS tried to load it over and over again, which resulted in a memory leak and the application stopped responding. This makes debugging difficult, because the browser becomes unresponsive and the error log gets so full that I can’t tell what the initial errors were.

With regards to migrating to AngularJS, my answer is a definite “No”, not even for small projects. My reasoning is that you would need to break down the entire existing front-end to use Angular controllers and directives. The process is inefficient, so I would not recommend migrating.

However, with regards to new projects, AngularJS may be an option, but this really depends on the type of requirements. For simple projects, AngularJS is overkill. For medium and large projects in a team environment, I believe it can work, since it is a mature framework that promotes modularity and reusability, and it is very structured. However, here is the big BUT.

Really Ugly: Do you know Angular 2.0 is on its way? Do you know that it is not backward compatible? Yep. I am not sure where AngularJS will end up in the future. My concern is whether I should continue to develop the front-end with AngularJS, wait for the next version, or ditch the framework entirely.

Final Conclusion

In the end, I had great experiences working with both Spring and AngularJS. Spring really surprised me and exceeded my expectations; Angular, on the other hand, was interesting to work with, but I am not sold on it. I believe it all comes down to the business requirements, then finding the appropriate tools to solve them. AngularJS feels too complicated for my requirements, but I do believe there is a place for it in a more complex environment.