Improve Operations, Performance, and Business Metrics with Application Monitoring Solutions

Enhanced user experience is one of the top goals for every SaaS provider. How do you ensure that your customers are experiencing the value you intended your applications to deliver? 

“Customer experience is the next competitive battleground.”

– Jerry Gregoire (the founder and Chairman of Redbird Flight Simulations, Inc.)

Having real-time, non-real-time and partially real-time metrics at your fingertips may help you to stay ahead of expected and unexpected challenges regarding your service performance, business, and operations.  

Why do startups need to monitor their well-tested applications?

Previously, software was delivered (or “shipped”) to customers on a CD, and that was the extent of the provider’s job. Since the advent of SaaS and cloud services, however, software developers have visibility into much more information about the delivered application: how customers are using it, which features are popular, which are not, and much more.

This allows SaaS providers to leave the hardware mindset behind; they no longer have to ‘build it to never break’. Instead, application monitoring lets SaaS providers make customer satisfaction an iterative process. Why? Because application monitoring lets you track not only the performance of your product in the customer environment but also your operations, and it yields valuable business insights that you can leverage to grow revenue.

What are the different categories of metrics used to monitor such applications?

Broadly, the application monitoring metrics are divided into three categories.

The first and most important one is operational metrics. Generally, it covers the health of your services, all the related underlying microservices, and the interaction of the service in its environment. Operational metrics are usually real-time metrics that notify SaaS providers as soon as an application or any feature in the application ceases to function.  

The second category of metrics is business metrics which determine if your service is offering the value that you designed it to offer. Business metrics can be real-time metrics but they usually are not. They are measured over weeks and months to identify a trend. 

The third category is performance metrics. An application may be running and providing the value it was meant to, yet pages may take a long time to load or the TCP handshake may take longer than expected. Performance metrics capture such measurements. They are usually semi-real-time metrics.

What are some good examples of operational metrics?

The basic metrics that SaaS providers can use for monitoring applications, such as availability and CPU, memory, or I/O usage, are offered by the cloud for free. These metrics also don’t require any instrumentation in your code.

There are also metrics that are collected through instrumentation added to your application. If the code crashes, this instrumentation generates an alert for the undesired event across multiple microservices.

Threshold-based metrics, which generate a notification when a threshold is breached, are also used in operations, depending on your services.

Operational metrics are layered: you start small and add more sophisticated metrics on top of that foundation.
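
A threshold-based alert of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor’s API: the `Threshold` class, the metric names, and the limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str   # hypothetical metric name, e.g. "cpu_percent"
    limit: float  # alert when the observed value exceeds this limit


def breached(samples: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one alert message per threshold whose metric exceeds its limit."""
    alerts = []
    for t in thresholds:
        value = samples.get(t.metric)
        if value is not None and value > t.limit:
            alerts.append(f"{t.metric}={value} exceeds {t.limit}")
    return alerts


# Start small: two basic thresholds, more can be layered on later.
thresholds = [Threshold("cpu_percent", 80.0), Threshold("memory_percent", 90.0)]
alerts = breached({"cpu_percent": 93.5, "memory_percent": 41.0}, thresholds)
print(alerts)
```

In a real setup the returned messages would be handed to whatever paging or notification channel the team already uses.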

What are some top-tier performance metrics for application monitoring?

Page load time is a good example of how you measure the performance of your service. The time it takes a customer to submit a form hosted on your website is another. For a ride-hailing app, the time it takes to enter an address and the delay until the cab driver receives the notification are both performance measurements. Which ones matter depends on your service.

For more sophisticated monitoring, you gradually build on top of these simpler metrics. For example, when you are notified of an error, you can observe performance at a more granular level, such as TCP handshake time and SSL time.
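
As a rough sketch of what measuring TCP handshake time and SSL time can look like, the Python function below times the TCP connect and the TLS handshake separately using only the standard library. The function name `measure_handshakes` and its parameters are illustrative assumptions, and a production probe would take many samples rather than one.

```python
import socket
import ssl
import time


def measure_handshakes(host, port=443, timeout=5.0, use_tls=True):
    """Return seconds spent in the TCP connect and, optionally, the TLS handshake."""
    timings = {}
    start = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=timeout)
    timings["tcp_connect_s"] = time.perf_counter() - start
    try:
        if use_tls:
            context = ssl.create_default_context()
            start = time.perf_counter()
            # wrap_socket performs the TLS handshake on an already connected socket
            sock = context.wrap_socket(sock, server_hostname=host)
            timings["tls_handshake_s"] = time.perf_counter() - start
    finally:
        sock.close()
    return timings
```

Calling `measure_handshakes("example.com")` from a machine with network access would return both timings; comparing them over time shows which phase is slowing down.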

Does microservices architecture pose more complexity compared to a monolithic architecture?

Microservices may be harder to build but are easier to monitor, because an error surfaced by a metric is confined to that microservice alone.

Monolithic applications are easier to build but harder to monitor: if something breaks, debugging and fixing the problem can be very time-consuming and can affect every aspect of your business.

What are your recommendations regarding an alert system?

Amazon CloudWatch, the monitoring and observability solution for applications and infrastructure, is a very powerful system. Multiple alerting options, such as email, SNS, and SQS notifications, are built into it. It also integrates with ticketing and communication systems and with on-call tools such as PagerDuty, and much more.

Other than that, Google Cloud Monitoring, Datadog, and New Relic are a few SaaS providers of monitoring solutions.

How do monitoring solutions help operations teams with correlation, triage, and root cause diagnosis?

As a SaaS provider, you ought to know about a problem in your service or application before the customer complains, or at the latest when they do. One application is supported by multiple microservices, and identifying a performance issue and knowing where to look for it before you can resolve it is only possible through monitoring metrics.

Once you have identified the problem and the source of the performance bottleneck, you can start fixing it to bring the application back to its original functional state. That is how correlation, triage, and root cause diagnosis help you identify and remediate the problem and keep your service functional.

Are there monitoring solutions where you are proactively looking and might foresee an event?

As a SaaS-based startup, your software stack is missing a key monitoring aspect if your system is unable to alert you hours ahead of your customer. Your customer informing you about an error should be the last resort. Usually, this happens when a SaaS provider missed or misjudged some metric and deemed it irrelevant, so that when it broke in the customer environment, nobody was notified. SaaS providers need to choose their monitoring deck carefully to avoid this oversight, and monitoring needs to be proactive rather than reactive.

How much should the cost be for any application monitoring solution?

CloudWatch, on-call rotation systems, and other monitoring tools are not very expensive at first but can get expensive as your application scales and requires more and more components. The human resources may be expensive; the actual dollar cost of setting up monitoring systems, however, is generally pretty low.

How do we monitor API endpoints?

For APIs, I always suggest canaries, simply because they are the easiest, fastest, and most painless way to monitor your API endpoints. Every public-facing API should have multiple canaries that continually test it. Numerous tools are available from cloud providers and third parties for testing API endpoints in terms of security, function, and performance.

CloudWatch offers synthetics, which allow you to set up a synthetic stack in another region from which you can continuously test your public API endpoints.
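
To make the idea of a canary concrete, here is a minimal sketch in Python of a single canary check. It is not CloudWatch’s implementation; `run_canary`, `http_status`, and the latency budget are hypothetical names and values chosen for illustration, and the probe is injectable so the check can be exercised without a network.

```python
import time
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Default probe: fetch the URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def run_canary(url: str, probe: Callable[[str], int] = http_status,
               max_latency_s: float = 1.0) -> dict:
    """Call the endpoint once and report whether it was healthy and fast enough."""
    start = time.perf_counter()
    try:
        status = probe(url)
    except Exception as exc:  # any failure counts as an unhealthy check
        return {"url": url, "ok": False, "error": repr(exc)}
    latency = time.perf_counter() - start
    ok = status == 200 and latency <= max_latency_s
    return {"url": url, "ok": ok, "status": status, "latency_s": latency}
```

A scheduler would call `run_canary` every minute or so from a location outside the service’s own region and raise an alert after a few consecutive failures.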

How does a startup founder go about setting up a monitoring system and which tools are the best ones for it?

As a startup founder, it is easy to fall for the idea that more data means more results. I would caution against it and instead suggest starting small. For instance, start with CPU and memory, focus on mechanisms and processes, and work through where things might break. Install solutions for those points. Start with a few metrics, preferably ones that fit on a single screen, and then focus on your mechanisms, your backend system, your paging, and the culture of operational excellence rather than the metrics themselves. Once you have achieved that successfully without oversights, evolve from there.

What is the difference between agent-based and agentless monitoring?

In agentless monitoring, no agent needs to be installed in the system for you to monitor it; systems that already exist emit those metrics automatically. Agent-based monitoring requires an agent to be installed, typically by integrating an SDK; that agent collects all the desired metrics and emits them to your application backend. I recommend agentless monitoring, but it can only take you so far. The agent-based approach provides much deeper insights. Take application performance monitoring as an example. Here, agent-based monitoring can allow you to passively monitor DOM objects and measure the page load time via a short script, observe which part of the TCP handshake is taking longer, and so forth.

The choice depends on how deeply you want to observe operations or performance and on the feasibility of inserting an agent-based solution.
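
The agent-based idea can be illustrated with a small in-process sketch in Python. The `MetricsAgent` class and the `checkout` function are hypothetical; a real agent or SDK would ship the buffered metrics to a backend instead of keeping them in a list.

```python
import functools
import time


class MetricsAgent:
    """Hypothetical in-process agent that buffers metrics before shipping them."""

    def __init__(self):
        self.buffer = []

    def record(self, name, value):
        self.buffer.append((name, value))

    def instrument(self, func):
        """Wrap func so every call emits a duration metric and, on failure, an error metric."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                self.record(f"{func.__name__}.errors", 1)
                raise
            finally:
                self.record(f"{func.__name__}.duration_s", time.perf_counter() - start)
        return wrapper


agent = MetricsAgent()


@agent.instrument
def checkout(total):
    # Stand-in for application code that the agent observes.
    if total < 0:
        raise ValueError("negative total")
    return total
```

The application code itself stays unchanged apart from the decorator, which is the trade-off the agent-based approach asks for: a small integration step in exchange for deeper visibility.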

What good vs. bad monitoring practices do you see in the market and what does the future look like for Monitoring Solutions?

For now, there are two strategies in the market: the new way of shipping software with monitoring and observability services vs. the old way of shipping systems that never break. For me, technology is easy; the people are the hardest part of the puzzle. But I have noticed that startups are stuck in legacy practices of not having visibility into their customer environment and their service performance.

And as far as the future is concerned, the practice of monitoring your application is the future, where you can measure performance metrics, supervise the operational metrics of your application and observe business metrics for what features are providing value and which are not. All these metrics provide you with significant data to make agile decisions for your startup. All successful companies today, as well as in the future, will have these metrics and will be able to look each other in the eye and say we made a mistake and we’re going to change it because the data tells us otherwise. I think such courageous teams are winning today and they will continue to win in the future as well.

To Watch the Complete Episode, Please Click on This YouTube Link:


Supercharge your Product Development with Infrastructure as Code

Infrastructure as Code (IaC) is a dynamic and automated foundation for DevOps. But, as a startup founder, how much do you really know about IaC? 

We invited Abdullah Shah, Senior Software Engineer working with the Cloud Security Engineering Team at Salesforce, to enlighten us about IaC. Here is how it went down:

Why is there a need for IaC? What trouble will companies face if they don’t embrace IaC?

Without IaC, IT infrastructure is configured, operated, and maintained manually. Historically, applications were developed on legacy IT infrastructure, i.e. servers, storage, and compute, all provided on bare-metal hardware in an on-prem environment. Configuration, operation, and maintenance were performed manually, which meant high rates of human error, delays, and the need for a large IT team. In addition, these manual processes lacked standardized procedures and had ineffective documentation, if any. Collectively, this resulted not only in environment drift but in an inconsistent environment for application development overall, and it created even more challenges during scale-ups.

Explain to us the concept of IaC, and why should the companies work with it? 

The irregularities we discussed in the absence of IaC necessitate a more sophisticated solution for the infrastructure used to develop products. Infrastructure as Code (IaC) is that very revolution.

IaC, in contrast to the legacy manual model, is a more descriptive model which allows the configuration, provision, and management of infrastructure using machine-readable files.

With automated infrastructure, configuration, operations, and maintenance functions are performed using scripts. As a result, IaC gives you performance consistency, monitoring accountability, operations agility, scale-up flexibility, audit transparency, software security, and a much faster development cycle. Companies that have embraced IaC benefit from reduced operations costs and faster time to market for new products, services, and features. All in all, this means higher revenue and more satisfied customers.

As a Start Up founder, what steps do I need to take to embrace IaC?

I would advise embracing IaC promptly, or at least outlining a roadmap for adopting it as soon as possible.

The first step is to evaluate the infrastructure requirements for the products and/or services you offer. Secondly, with multiple options available, make a categorical decision on your tech stack according to your product. 

Startups usually want to build and push products to the market quickly, since they see them as mere prototypes. However, I would encourage you to create a solid foundation with the right IaC principles and to implement IaC from the ground up. With a strong footing, scaling beyond the first 2-3 servers becomes streamlined and efficient.

Is IaC required to have DevOps in a Start Up? How are they related? 

DevOps and IaC go hand in hand. One would not exist without the other. Although all companies apply the DevOps principles to varying degrees, the most popular is the Shift Left approach which is synonymous with the service ownership model. The idea is that the developers in IT teams are not working in silos but collaboratively, to create a holistic view of the entire application development lifecycle. In this spirit, developers are not only responsible for application development but also for creating the right infrastructure for the operations and deployment of the application as well. This means that the responsibility of coding has been fanned out among the IT team members. Testing and monitoring roles have been shifted left to the developers and all of this is enabled by IaC. 

Do I need to test and monitor IaC?

There is no substitute for testing and monitoring IaC; if anything, it calls for more stringent testing. Infrastructure can be automated reliably by using correct IaC templates and avoiding misconfigurations. An array of testing tools is available to choose from, and the importance of testing for IaC cannot be overstated.
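
One way to picture a test for IaC is a check that scans a template for a known misconfiguration before it is deployed. The Python sketch below assumes a simplified, made-up template layout (a `resources` mapping with `ingress` rules); it is not the CloudFormation or Terraform schema, and the rule it enforces is just an example.

```python
def find_misconfigurations(template: dict) -> list[str]:
    """Return human-readable findings for rules that expose SSH to the whole internet."""
    findings = []
    for name, resource in template.get("resources", {}).items():
        for rule in resource.get("ingress", []):
            if rule.get("port") == 22 and rule.get("cidr") == "0.0.0.0/0":
                findings.append(f"{name}: SSH open to 0.0.0.0/0")
    return findings


# Hypothetical template with one acceptable and one risky security group.
template = {
    "resources": {
        "web_sg": {"ingress": [{"port": 443, "cidr": "0.0.0.0/0"}]},
        "admin_sg": {"ingress": [{"port": 22, "cidr": "0.0.0.0/0"}]},
    }
}
print(find_misconfigurations(template))
```

Run as part of the pipeline, a non-empty list of findings would fail the build before the misconfigured infrastructure is ever provisioned.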

With QA functions disappearing in pure DevOps culture, who would carry out this stringent testing and monitoring?

In pure DevOps, CI/CD has automated IT testing and operations, which in turn dramatically accelerates application deployment. This results in continuous updates to your application and infrastructure. Now, if you don’t have automated testing, you are in a heap of trouble. At this rate of deployment, humans cannot keep up, and companies must implement automated testing strategies throughout the application development supply chain.

How would you address the IaC related fear of automation, in industry?

The idea is simple. If you anticipate failure, you can prepare for it and then mitigate it. The preparation you need is to implement a robust testing strategy and it is equally important to have a control feedback loop to continuously audit the strategy and improve. The ultimate goal is the provision of the right environment that doesn’t miss any build failures. Failures, caught and addressed in the next loop, will allow streamlined product deployment moving forward. 

Explain the difference between declarative and imperative approaches for us.

The difference is that declarative is a functional approach and imperative is a procedural one. The declarative IaC approach answers the question ‘What is the preferred state?’, and the imperative IaC approach answers the question ‘How is the preferred state achieved?’. Since it is critical to maintain your infrastructure in the desired state, a declarative IaC approach is recommended. The imperative approach relies on control loops and conditional statements, which can become complex.

Some supporting tools create an imperative layer over the declarative model to provide an abstraction. Pulumi, for instance, is one such tool: it is declarative itself and can provide an imperative layer. AWS CDK and Terraform are other examples of tools that provide the best of both approaches.
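
The contrast between the two approaches can be sketched in plain Python, without any real IaC tool. In this toy model, which is an assumption made purely for illustration, infrastructure state is a mapping from resource name to instance count. The declarative function derives the steps from the desired state, while the imperative function encodes the steps and conditions directly.

```python
def plan(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Declarative style: compare desired and actual state, then derive the steps."""
    steps = []
    for name in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(name, 0), actual.get(name, 0)
        if want > have:
            steps.append(f"create {want - have} x {name}")
        elif want < have:
            steps.append(f"destroy {have - want} x {name}")
    return steps


def scale_up_imperatively(actual: dict[str, int]) -> dict[str, int]:
    """Imperative style: the author spells out each step and its conditions."""
    result = dict(actual)
    if result.get("web", 0) < 3:
        result["web"] = result.get("web", 0) + 1
    return result


desired = {"web": 3, "db": 1}
actual = {"web": 1, "cache": 2}
print(plan(desired, actual))
```

Running `plan` again once the actual state matches the desired state yields no steps, which is why the declarative style is well suited to keeping infrastructure in a known state.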

Which of these approaches in your opinion can help the companies with their tech debt?

Traditional practices slow down the application development cycle and can lead to technical debt. In unpredictable cases, e.g. an urgent customer requirement, badly written code, or a new feature request, the right automated testing strategies are your only way to avoid incurring technical debt. That is exactly what IaC promises: it creates guard rails around your processes that reduce technical debt.

Mutable and Immutable infrastructures, which one would you recommend? 

During application development, changes are inevitable. Your infrastructure will need to be scaled up or down, updated, and/or patched. If the infrastructure can be changed after deployment, it is mutable; if it cannot, it is immutable. In the case of immutable infrastructure, changes are rolled out on a new replica machine before the old infrastructure is taken down. There are multiple ways to go about it. However, when in doubt, go with mutable infrastructure.
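
The distinction can be mirrored in a few lines of Python. This is only an analogy under stated assumptions: the two server classes and the version strings are hypothetical, and real infrastructure changes involve far more than a single field.

```python
from dataclasses import dataclass, replace


@dataclass
class MutableServer:
    name: str
    version: str


@dataclass(frozen=True)
class ImmutableServer:
    name: str
    version: str


def patch_in_place(server: MutableServer, version: str) -> MutableServer:
    """Mutable approach: change the running server itself."""
    server.version = version
    return server


def roll_out_replacement(old: ImmutableServer, version: str) -> ImmutableServer:
    """Immutable approach: build a new replica; the old one is retired afterwards."""
    return replace(old, version=version)
```

With the mutable server, the same object carries the new version after patching. With the immutable one, the old object is untouched and a second object exists, which is what makes it possible to switch over before taking the old machine down.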

In terms of tech stack, what are the best tools to implement IaC?

There is a spectrum of choices available, as discussed earlier. AWS CDK, CDKTF, CloudFormation, Terraform, and Pulumi are all tools used to implement IaC. I suggest democratizing the decision among developers, SREs, and stakeholders, since the tech stack is not meant only for IaC; it orchestrates the entire application development pipeline. For developers, you have version control using Git; for operations, CI/CD with AWS CodePipeline or GitOps; for build services, there are custom Jenkins tools. Argo CD is a popular tool that operates specifically with Kubernetes-based deployments, while Spinnaker allows you to deploy custom IaC technologies. However, the decision depends on your use cases and what is required to implement them.

My recommendation for IaC is Terraform: it is one of the best tools and has an early-mover advantage, a vibrant community, simplified syntax, and well-written documentation.

Does IaC help with the provisioning? 

For sure. IaC, much like software development, is version controlled in a centralized repository. Any update or change in features is validated, tested, built, and pushed to the required registries automatically. These automatic processes form a holistic development pipeline. All these tools come together to facilitate everything from integration to deployment, and provisioning is definitely one aspect of the whole picture.

About IaC On Cloud Vs IaC On-Prem; your thoughts?

On-prem infrastructures are heterogeneous when it comes to provisioning identity services, secret services, and/or network security. Only limited automation is possible in on-prem environments. Most services have to be custom-built and are not scalable. The cloud, on the other hand, is standardized in service provision, and infrastructure resources can be scaled up or down flexibly as required. Documented best practices, readily available abstractions, and a repeatable and predictable environment are some of the factors that put IaC on the cloud in a league of its own and hence offer more value.

What IaC profiles should I look for, as a StartUp founder? 

Broadly speaking, for pure DevOps, look for developers and Site Reliability Engineers (SREs) with a service-ownership mindset. These profiles are increasingly in demand in Silicon Valley and across the world.

Tell us about the best practices for IaC in the industry?

To build an application infrastructure specifically in the agile world of IaC, the best practices require a set of foundational services at the periphery as well. For your company, best practices mean acquiring the full package of foundational services, i.e. compute, storage, database, networking, security, all of it.

What does the future hold for IaC?

IaC has already saved a lot of money for different companies. It has sped up the process of software development from integration to deployment and ultimately delivers value to the customers. IaC helps create and serve those customers in a very fast, agile, and repeatable way. 

In terms of the future, we already know that manual infrastructure provisioning cannot scale or perform at the same pace as IaC; lost customers and technical debt are its natural outcomes. To enable fast, repeatable deployments and product iterations, IaC is the inevitable future, and I very strongly believe that companies are implementing, and will continue to implement, fully robust, orchestrated infrastructure and pipelines at the heart of their businesses.


How to Get Started With DevOps: Queries of a Startup Founder

As a startup founder, why should you know about DevOps? To answer that question, you need to avoid the most common mistakes similar ventures make when they start. Financial concerns aside, most startups experience challenges with timely product release, team collaboration and processes establishment, product quality, and customer satisfaction, to name a few. What if we were to tell you that all notable corporations are using one key concept to address the above challenges, and you can have a head-start if you instill that key concept into your startup from conception? Yes, you guessed it right! That key concept, being heavily adopted, is DevOps.

To answer all the above questions, we invited Ali Khayam to our Xgrid Talk series. Dr. Ali Khayam is currently working as a GM-SDN and Internet Services at Amazon Web Services (AWS). Being an expert on the subject, Dr. Ali sat down with us to help us navigate the concept of DevOps. 

What Is DevOps?

DevOps debunks the split ownership model in which software is built, tested, and operated by separate teams. In DevOps, you build it, you run it. The concept had been in practice before, but it became mainstream with the introduction of cloud technology. Since the servers and networking infrastructure are no longer managed by the developers on site, it is reasonable for the development team to run the software in production as well.

What Is CI/CD?

CI stands for Continuous Integration whereas CD stands for Continuous Deployment. CI/CD is the tool that enables the DevOps philosophy to be implemented. Continuous Integration is an automated build and test process. It is a huge improvement on the manual testing which was a very tedious procedure with enormous test plans and spreadsheets. Continuous Integration has eliminated the requirement of dedicated quality assurance teams. With CI, the software developers are the ones that create automated tests for each check-in. This allows for each commit to be tested before becoming a possible cause of regression for your existing software. If any of the commits are not working as expected or any packages are not built, CI will prohibit merging that commit to the repository. CD will then deploy the software without human intervention. 
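
The gating behavior of CI described above can be sketched as a small Python function. This is a simplified model, not how any particular CI service is implemented; `ci_gate` and the check names are hypothetical, and each check stands in for a build or test step.

```python
from typing import Callable


def ci_gate(checks: dict[str, Callable[[], bool]]) -> tuple[bool, list[str]]:
    """Run every check for a commit; the merge is allowed only if all of them pass."""
    failed = []
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            passed = False  # a crashing check is treated as a failure
        if not passed:
            failed.append(name)
    return (not failed, failed)


allowed, failed = ci_gate({"build": lambda: True, "unit_tests": lambda: False})
print(allowed, failed)
```

Only when the gate reports success would the commit be merged and handed to the deployment stage.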

What Are the Benefits of DevOps?

Rendering quality assurance and deployment teams unnecessary is the most cost-effective benefit for an organization. DevOps also reduces time to market and enables more ownership among the development team since they oversee the end-to-end process of development.  

What Steps Does a Startup Have to Take To Implement DevOps?

If a startup has not yet been using the DevOps methodology, the team needs to be brought on board. First, they need to understand that their role is no longer confined to the development of software and unit testing but they need to build functional and system tests as well. 

Automating the CI/CD pipeline requires the organization to choose from an array of tools already available. Your use case and where your workload is hosted play a vital role in selecting a tool. For example, most public clouds offer built-in CI/CD options. For data centers, you have Jenkins among other options. In the case of a SaaS platform, you have CircleCI, Travis CI, etc. for CI and Argo CD, Flux CD, etc. for CD.

It is a common misconception that DevOps cannot be adopted until a monolithic architecture is broken up into microservices. If test frameworks are not available, it will be worthwhile for startups to hire developers who can build those testing frameworks to automate testing for the monolithic architecture.

Once the right tools for your CI/CD pipeline have been identified and implemented, the next step is to iterate the process to make the product better and/or scalable for your application architecture.

What Employment Profiles fit a DevOps Team?

In order to understand the role of a DevOps team, we need to understand the software stack and its operations. Applications run on infrastructure, and the DevOps software stack needs a host for this infrastructure, which provides the orchestration layer on which developers write application software.

This is where DevOps engineers come into play. However, all developers should be DevOps engineers who build, test, and deploy the application. Apart from DevOps engineers, other profiles are Site Reliability Engineers (SREs) and Infrastructure Engineers.

The infrastructure engineers manage the IaC. Developers write the application and their own test cases. The responsibility for the performance of the code resides with the developer. The role of SREs is cross-cutting: it is to make sure that your entire infrastructure and all the applications are working as expected.

What Is the Path to Transition From Traditional Development to DevOps?

If you’re just starting, the fastest and easiest way to get familiarized with the concept of DevOps is using a cloud-based solution or a SaaS product. The models built in these platforms are portable and comparable to the legacy infrastructure. 

What Key Performance Indicators (KPIs) Do You Have in Place for DevOps?

The KPIs remain the same for traditional application development and DevOps application development, e.g. transactions per second, latency of API calls, and so on. The DevOps methodology changes the way development is done by moving responsibility from multiple teams to one team, so KPIs for both environments do not vary significantly.
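
As a small worked example of the two KPIs named above, the Python sketch below computes transactions per second and a latency percentile using the nearest-rank method. The sample latencies and function names are made up for illustration.

```python
import math


def transactions_per_second(count: int, window_s: float) -> float:
    """Throughput: completed transactions divided by the observation window."""
    return count / window_s


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for p95 latency."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical API call latencies in milliseconds, with one slow outlier.
latencies_ms = [12.0, 15.0, 11.0, 240.0, 14.0, 13.0, 16.0, 12.0, 18.0, 17.0]
print(transactions_per_second(1200, 60.0))
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))
```

The median hides the slow outlier while the p95 exposes it, which is why latency KPIs are usually tracked as percentiles rather than averages.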

Infrastructure as Code (IaC): Is It the Underlying Theme for DevOps?

IaC is a widely adopted practice. It helps remove misconfigurations from the product deployment process. With IaC, scaling and replicating the stack has become effortless, in contrast to legacy systems where delays and error-prone human elements were part and parcel of the process.

Every cloud service provider or SaaS offering provides you with an Application Programming Interface (API) service. The developers write code on top of that API surface. The next time the startup needs to replicate that stack, only the name of the region has to be changed, and the same stack will be up and ready in another region. The flexibility that comes with DevOps using IaC is massive; its extent can be seen in the fact that these high-level abstractions are available in simple config-based languages that make the API calls seamless.
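
The point about changing only the region can be illustrated with a toy stack definition in Python. The `build_stack` function and the fields it returns are hypothetical and do not correspond to any provider’s API; the region names are used only as example values.

```python
def build_stack(region: str, instance_count: int = 2) -> dict:
    """Render a hypothetical stack description; only `region` differs between copies."""
    return {
        "region": region,
        "network": {"cidr": "10.0.0.0/16"},
        "servers": [
            {"name": f"web-{i}", "size": "small"} for i in range(instance_count)
        ],
    }


primary = build_stack("us-east-1")
replica = build_stack("eu-west-1")
```

Because the stack is produced by code, the replica is guaranteed to match the primary in everything except the region, with no manual steps to repeat or forget.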

What Tech Stack in DevOps Development Do You Suggest?

The tech stack selection is a decision based on where your startup has its workload. If it’s on the cloud, or AWS to be specific, Cloud Development Kit (CDK) and CloudFormation are two native options. However, CDK is the newer generation and is the recommended option.

Puppet, Chef, and Ansible are other feature-rich automation options available on the cloud. They are quite similar in terms of functionality. Ansible is SSH-based and easy to set up; it manages everything through SSH commands and provides an automation abstraction on top of that.

Puppet and Chef are more full-blown languages that have their own management channels and offer separation of config from everything else in code. In conclusion, it really depends on what employment profiles you have and how long you have to bootstrap the development process. It is recommended to select an IaC tool carefully, keeping time, effort, and skill sets in mind. It takes longer to onboard developers on Puppet and Chef, but they offer more flexibility.

Your use case also factors into the choice of automation for your development process: whether it is a front-end or a back-end application, and which languages are being used.

In the case of CI/CD, the decision depends on whether you are developing fully on-prem, using cloud-hosted facilities, or running a hybrid infrastructure, and on what your budget allows. Jenkins is cheaper, hence a good place to start. For deployment, there are CD frameworks, many of which work on top of Jenkins seamlessly. Jenkins is available on the cloud as well. Nevertheless, clouds offer native options such as CodeDeploy or CodePipeline as well.

For cloud management specifically, the native cloud-offered services are recommended because they are better integrated and therefore provide a much richer experience. Application management and performance monitoring can be done with external tools, but for the cloud itself, the native solutions are best.

For security, since it is a layered construct, I would recommend taking full advantage of cloud-native security offerings at the base. On top of that, there is a communication layer with anomaly or intrusion detection. There are SaaS and cloud-native solutions available to tackle this, and they can be chosen with your startup’s needs in mind.

How to Bring Culture Change in a Startup That Started With Legacy Architecture?

If you inherited a team of developers with legacy architecture, a cultural change as well as a structural shift in the team is inevitable. Change mindsets. Hire strong team players who can build automations and improve the performance of the development process in a timely fashion.

What Has Been the Adoption Rate of DevOps Among Corporations, SMBs, and Startups?

The market segmentation is not based on the size of the company but how long a company has existed. New companies find it easier to start with DevOps. The more established companies that started with legacy architecture, however, are slow in their transition to DevOps. Companies such as AWS and Netflix are exceptions to this general trend. 

While transitioning from separate development and operations teams to a consolidated DevOps team, start with the QA team and ask them to automate their operations. They can learn how to build and integrate every check-in and can then be integrated with the development team.

Has DevOps Made It Harder for Employers to Find These Employment Profiles?

DevOps has made the process of application development faster. I would say, the developers’ mindset hasn’t shifted as quickly as the technology. This gap exists because universities have not changed their curriculum. The graduates are not familiar with DevOps terminologies. 

What Would You Recommend to a Legacy Engineer to Learn to Be Reskilled?

All the SaaS, Cloud, and CI/CD companies have built a lot of training material around their services. So pick your technology stack and get started!

Three Pieces of Advice for Startups Who Are About to Start Their DevOps Journey

Start early. Invest your time and effort into building quality IaC and testing frameworks even if it means your first 2-3 rollouts take longer. Worth it. Secondly, be very clear on the KPIs of your success and the metrics of your product development health and output. Lastly, iterate and keep the metrics and parameters tight throughout the journey. Do not let the culture corrode around the edges. It is easy to regress to legacy methods if you are not vigilant throughout your DevOps journey. 
