3 Biggest Misconceptions of Migrating to the Cloud

Surinder Singh

Dec 27, 2024


Cloud Migration as a Value Creation Initiative

There is no stopping the mass migration of technology workloads from existing data centers into the public cloud (e.g. AWS, Azure, GCP).

Companies looking to modernize their technology stacks or digitize their business workflows automatically think of the public cloud as the destination to operate out of when they are ready to launch to their users.

And public cloud companies have done a great job at marketing the benefits of the cloud such as scalability, flexibility, agility, elasticity, productivity and cost savings.

In the ~100 diligences I’ve supported evaluating Enterprise Software companies, there is almost always a value creation initiative that involves migration into the cloud. The general hypothesis is that the migration will help modernize the existing asset (“SaaSify”), support scale and growth as per the investment thesis and/or drive multiple expansion because the workload (software products) operates in the cloud and appears to be modern.

While the hypothesis might hold true at time of exit, the execution often comes with a set of speed bumps and detours that require corrections along the journey to get to our desired state.

These are the 3 biggest misconceptions I’ve seen for value creation initiatives driving migrations into the cloud, in some cases prematurely recommended without fully understanding the operational characteristics of the workload to be migrated against the investment goals of the acquisition.

1) Cloud = SaaS

Software-as-a-Service (SaaS) describes how you charge your customers for your software (the business model), not its technology stack, architecture, composition or where it is hosted.

SaaS is your customer paying for your software on a recurring basis, for as long as they are getting value. In some cases, it’s delivered over the internet in a browser (e.g. Salesforce) while in other cases SaaS could be software you have to download and install on your computer (e.g. Microsoft Office 365).

SaaS can be operated and delivered from a data center where you rent the rack, pay for power & cooling and bandwidth while providing your own hardware. SaaS can also be operated and delivered from a public cloud where power, cooling, bandwidth and hardware (undifferentiated services) are all abstracted away from you.

The benefit of the public cloud is that the undifferentiated services are abstracted and taken care of for you while you focus solely on your differentiated services (your software, data and workflows).

Data Center vs Public Cloud Operational Stack

How much of the undifferentiated services you want to (or should) manage comes down to the needs and characteristics of your operating environment.

It’s not uncommon during diligence for investors, portfolio operations teams or diligence partners to take the position that all non-cloud workloads must be migrated into the cloud to support the scale and growth in the investment thesis. That position is largely a product of the marketing we’ve received from public cloud companies and of industry sentiment in general.

The competitive nature of acquisitions along with the sparse access to information in the data room can lead to assessments that prematurely recommend migration of workloads into the cloud without fully understanding how future financial outcomes map to operational characteristics in the cloud.

Reasons to stay in your data center

The reality, however, is that not all workloads require migration into the cloud and each workload should be assessed on a case-by-case basis. The following workload characteristics could be ideal scenarios for continued operations in the data center (depending on your financial model and growth projections):

Software products towards the end of their maturity curve.
- No active marketing dollars will be allocated towards growing this product.
- No active efforts to modernize the platform with the only R&D investment to “keep the lights on”
Slow business growth anticipated for the product in your financial models
Software products that are not business critical, lacking high resiliency requirements
Software products that have a predictable usage pattern
- e.g. users only use the software from 8am to 6pm, Monday to Friday
Software products that run on a legacy technology stack today and have no budget allocated for modernization.
- Legacy architectures typically perform poorly in the cloud without any modernization and cost much more than operating out of the data center
Not disrupting long customer onboarding cycles due to complex data conversion and integration needs (enough time to provision resources in a data center)
High-cost sensitivity towards operating the product even when factoring growth

The stigma of not operating in the cloud gives investors the perception of not having a modern asset or the inability to scale to support growth, which is actually an ill-placed belief.

Litmus test for cloud migration

Are you willing to invest to modernize the technology stack for your product?
- Can you realize and quantify the benefits of a modern technology stack?
- Consider the cloud as a viable option if you can show a positive ROI vs TCO
Determine the rate of growth for each of your products and whether they require scalability with short lead times
- Is timing between bookings and go-live too short to provision and setup hardware in your existing data center?
Products requiring high resiliency (99.9% uptime or higher) can be good targets for the cloud
Does usage for your products constantly spike beyond its normal operating threshold?
- e.g. eCommerce stores, Financial Systems, B2C platforms

There are a lot of benefits (some harder to realize than others) to moving into the cloud, but each workload should be assessed individually to determine whether the additional cost will generate a positive return, which leads me to my next point.

2) The Cloud will be cheaper than my Data Center

By far, the most common misconception is that operating in the cloud is going to be cheaper or at least cost neutral, compared to the current data center setup.

The cloud offers benefits that are difficult to quantify, such as scalability, flexibility, agility and elasticity, and comparing cloud costs against data center operating costs alone is the wrong way of looking at this.

In a report published on April 28, 2021, titled “Realize Cost Savings After Migrating to the Cloud”, Gartner shares that organizations that lack a well-defined plan for cloud cost management may be overspending on the cloud by 70%.

Based on my time as an Operator building software and operating workloads out of the cloud and a PE Operating Partner supporting PortCos migrating into the cloud, I have no qualms stating that in most cases, cloud costs end up being higher than their data center costs.

It is your job to determine whether these higher operating costs, which afford you scalability, flexibility, agility and elasticity, deliver a positive ROI over the holding period.

It could very well be that these cloud benefits don’t really add any value to your business, relative to the investment thesis you are after.

We can examine decisions and actions taken before migration and after migration to understand why cloud costs net out higher than your data center costs.

Before Migration

The decisions and actions taken before migration that end up driving costs higher include:

Teams wanting to operate in the cloud the same way they did in the data center
- This is also known as the “lift-and-shift” approach
- Solutions tend to be hypervisor (VM) based running on EC2, allowing teams to run their applications with minimal changes needed
- EC2 compute nodes come in pre-configured compute and memory combination and hence might not represent the most efficient bundling for your workload
  - You either end up overpaying for memory or overpaying for compute depending on the operational characteristics of your workload
  - I’ve seen teams running a Windows-based workload with high memory consumption have to provision EC2 nodes that were bundled with high compute (vCPU) capacity. The setup ended up with more vCPUs than needed and, since it was running on Windows, the team also had to pay a Windows license fee for every vCPU provisioned, even though the compute was not needed.
Incorrect inventory gathering during the assessment phase of the migration
- Teams take inventory of what they have in the data center instead of taking inventory of actual usage and translating that usage into the capacity needed in the cloud. In essence, they are not right-sizing for the cloud to get a tailored-fit solution
Lack of expertise to forecast consumption and map to operational characteristics of the cloud which leads to the next point
Wait-and-see approach
- Seen this all too often. “Let’s push everything into the cloud and wait 4 months to see what our costs look like before we can tell the CFO and CEO how much the cloud will cost”
- Most get a sticker shock when they see their invoice on the 4th month and it’s a scramble to get costs back in line, often leading to bad technology and operational decisions for short-term financial correction
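
The bundling problem described above can be made concrete with a small sketch. The instance catalog, the workload numbers and the helper names below are all hypothetical and invented for illustration; they do not correspond to any real provider's instance types or prices. The point is only to show how a memory-heavy workload can drag along vCPUs it never uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int


# Hypothetical catalog: names and sizes are made up for this example and are
# not real cloud instance types.
CATALOG = [
    InstanceType("small", 2, 8),
    InstanceType("medium", 4, 16),
    InstanceType("large", 8, 32),
    InstanceType("xlarge", 16, 64),
]


def smallest_fit(needed_vcpus, needed_memory_gib, catalog=CATALOG):
    """Return the smallest instance that satisfies both the vCPU and memory need."""
    candidates = [
        i for i in catalog
        if i.vcpus >= needed_vcpus and i.memory_gib >= needed_memory_gib
    ]
    if not candidates:
        raise ValueError("no instance in the catalog is large enough")
    return min(candidates, key=lambda i: (i.vcpus, i.memory_gib))


def unused_vcpus(instance, needed_vcpus):
    """vCPUs that are provisioned (and possibly licensed) but not needed."""
    return instance.vcpus - needed_vcpus


if __name__ == "__main__":
    # A memory-heavy workload measured from actual usage: 4 vCPUs, 60 GiB.
    chosen = smallest_fit(needed_vcpus=4, needed_memory_gib=60)
    print(chosen.name, "unused vCPUs:", unused_vcpus(chosen, 4))
```

In this made-up catalog the memory requirement forces the largest bundle, so three quarters of the vCPUs sit unused. If a per-vCPU license applies, that unused capacity is paid for twice: once as compute and once as licensing.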

After Migration

Once workloads are in the cloud, the following decisions and actions continue to keep costs high in the cloud which include:

Lack of oversight and governance of the cloud environment
- Loose controls around who can spin up new resources in the cloud, how they are to be managed and lack of automated techniques to manage them
Immature cloud operating practices
- Sloppy operations in the cloud such as leaving resources running after they are no longer needed. This is like leaving your faucet running after you’re done washing your hands. Imagine your water bill at the end of the month!
- Overprovisioning of resources: spinning up resources with ample headroom to accommodate usage spikes
  - It’s not uncommon to find compute resources sitting idle 90% of the time because engineers have provisioned more than their workloads need, to cover those just-in-case scenarios when there is a spike in consumption. This is a very “data center approach” to solving for elasticity of demand, and an anti-pattern in the cloud
- Not taking advantage of Platform-as-a-Service (PaaS) offerings to simplify orchestration and operations in the cloud to manage databases, in-memory caches, load balancing and containerization
  - Resorting to EC2 compute nodes to self-operate and manage capabilities that are available as a readily available resource
Lack of modernization to take advantage of the cloud
- It’s not uncommon to see teams maintain their old architecture and continue to use relational databases to power event driven architectures, queues or in-memory caches when the cloud has tailored fit solutions to support these structures at a better cost profile
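
To put a number on idle capacity, here is a rough sketch of the arithmetic. The hourly rate and the hours per month are assumptions chosen for illustration, not real prices, and the function name is hypothetical.

```python
def idle_spend(hourly_rate, hours, idle_fraction):
    """Portion of spend that pays for time the resource sat idle."""
    if not 0 <= idle_fraction <= 1:
        raise ValueError("idle_fraction must be between 0 and 1")
    return hourly_rate * hours * idle_fraction


if __name__ == "__main__":
    # Assumed values: $0.40 per hour, roughly 730 hours in a month,
    # and a resource that is idle 90% of the time.
    wasted = idle_spend(hourly_rate=0.40, hours=730, idle_fraction=0.90)
    print(f"Idle spend per month: ${wasted:.2f}")
```

Multiplying this by the number of over-provisioned resources in an environment gives a first-order view of how much of the invoice is paying for headroom rather than usage.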

What used to be a fixed monthly spend in the data center is now variable spend, based on your customers’ consumption patterns and your engineering team’s maturity of operating in the cloud.

R&D OPEX (Headcount) Costs

One of the benefits the cloud touts is Productivity. The ability to do more with less or the ability to free up your team to work on higher value work by delegating the undifferentiated work to the cloud providers.

The concept is sound. However, what’s observed in the field is that teams that used to manage data center operations often move over to operate the cloud footprint, so the productivity savings the cloud promises are never realized.

The expense for the team still sits on the P&L resulting in no savings or the higher value work they’ve moved on to can’t be quantified and translated as outcomes on the P&L.

Cloud cost management

PE Firms to offer their PortCos access to relevant training to upskill for the cloud
PortCos to stand up a FinOps practice to allow for collaboration between IT, DevOps and Finance
Understanding the IaaS → PaaS → FaaS modernization journey and learning how to apply them to their workloads, in alignment with their growth plans and investment thesis
Understand how to quantify the benefits of scalability, flexibility, agility and elasticity so that migrations into the cloud can be justified, even at a higher cost relative to your data center
Realize the Productivity benefit from the cloud by finding ways to do more with less in the cloud and capturing the R&D OPEX savings on the P&L

3) Operational Resiliency included out of the box

There is the perception that once your workload is in the cloud, you can breathe a sigh of relief knowing that everything is now redundant, running several nines (99.99%) and should a catastrophic event take place, your data and operations are safe.

I’ve got bad news!

Those benefits don’t come by default when workloads are migrated into the cloud. Operational resiliency has to be explicitly designed and actioned into your migration plans and you need to break down requirements into High Availability and Fault Tolerance.

High Availability

High availability is your environment’s ability to minimize downtime during normal operating scenarios. It is the difference between running your operations at 99% uptime vs 99.99% uptime, based on the needs of your customer and business.

Let’s assume an Enterprise Software company just migrated a workload from their data center into the cloud, running on EC2 compute nodes. As listed on this AWS page, there are two different SLAs for the use of EC2 nodes, one at the instance level (99.5%) and another at the region level (99.99%).

Not understanding or not architecting for the nuanced difference between a Region, Availability Zone or a Virtual Private Cloud (VPC) will end up delivering different uptimes (and hence SLAs), from the ones you’ve promised to your customers.

A 99.5% uptime translates to 7m 12s of downtime per day while a 99.99% uptime translates to 8.6s of downtime per day. Depending on your industry and customers, that could be the difference between retaining satisfied and happy customers vs churning customers because your systems aren’t resilient enough for their needs.
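
The downtime figures above follow directly from the uptime percentage. A minimal sketch of the arithmetic, with hypothetical function names, looks like this:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_seconds(uptime_percent, period_seconds=SECONDS_PER_DAY):
    """Allowed downtime, in seconds, for a given uptime percentage and period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * period_seconds


def as_minutes_and_seconds(seconds):
    """Split a duration in seconds into whole minutes and remaining seconds."""
    minutes, remainder = divmod(seconds, 60)
    return int(minutes), remainder


if __name__ == "__main__":
    for uptime in (99.5, 99.99):
        minutes, seconds = as_minutes_and_seconds(downtime_seconds(uptime))
        print(f"{uptime}% uptime allows about {minutes}m {seconds:.1f}s of downtime per day")
```

Passing a different `period_seconds` (for example, the seconds in a month or a year) gives the same comparison over the window your customer contracts actually use.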

Your uptime requirements (and SLAs) along with your application design, tenancy model and network design will determine the level of sophistication needed to achieve High Availability. This in turn will have a direct impact on the cost of your cloud infrastructure which could be as much as 2X what you were used to in the data center.

Fault Tolerance

Fault Tolerance expands on the notion of High Availability to offer the greatest level of protection. If High Availability gives you the necessary uptime to operate under “normal” circumstances, Fault Tolerance is about the ability to weather catastrophic failures should multiple components fail.

Fault Tolerant systems are intrinsically Highly Available with near zero downtime, but a Highly Available solution is not completely Fault Tolerant.

Imagine if a single geographic region (e.g. East Coast) in your public cloud provider went down, would your application be able to withstand that disruption by switching over to a region on another coast?

If your software powers industries such as Online eCommerce Stores, Medical Systems, Financial Systems, Telecommunications or Aviation, the need for High Availability and Fault Tolerance will be a critical requirement.

Designing for Operational Resiliency

Out of box deployments will almost never get you the level of resiliency you need for your business. You need to explicitly design for High Availability and Fault Tolerance according to the needs of your customers and industry.

Review existing contracts to understand SLAs promised to customers
Understand consumption pattern throughout the day, week, month to plan for the right level of capacity and elasticity needed
Design a highly available system based on identified consumption and operational characteristics
- e.g. Single Region, Multi-AZ setup, Load Balanced, Horizontal Scaling
Determine level of Fault Tolerance required
- Some customers and industries could be forgiving in the event an entire region goes out (e.g. East Coast)
- Consider Multi-Region setup if Fault Tolerance requirements are stringent
  - e.g. Operating out of the East Coast and West Coast or East Coast and London
- Consider a Hybrid approach (Data Center + Public Cloud) to hedge your risks
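
One way to reason about why redundancy across zones or regions raises availability is the textbook formula for redundant components. The sketch below assumes failures are fully independent, which is a strong simplification (real outages can be correlated), and it is not how any cloud provider calculates its SLAs. The function name and the input values are illustrative only.

```python
def combined_availability(single, replicas):
    """Availability of `replicas` redundant copies when any one copy is enough.

    Assumes each copy fails independently with the same availability `single`
    (expressed as a fraction, e.g. 0.995 for 99.5%).
    """
    if not 0 <= single <= 1:
        raise ValueError("single must be a fraction between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1 - (1 - single) ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} replica(s): {combined_availability(0.995, n):.6%}")
```

Under these assumptions, each additional copy shrinks the chance that every copy is down at once, which is the intuition behind Multi-AZ and Multi-Region designs. Each copy also adds cost, so the right number depends on the SLAs you have promised.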

Considerations for the Cloud

While the public cloud has been around since 2006, the continued innovation coming from public cloud vendors makes it a challenge to find the right tailored-fit solution for your workload migrations, especially with the multitude of choices available today.

Teams who try to do this on their own without any prior experience often run into obstacles and speed bumps, delaying the realization of benefits from the cloud. It is recommended to find a partner that specializes in a particular cloud, for your workload type, to execute your migration or at the very least, guide you with the appropriate architectural design.

As part of each workload migration, PortCos should develop Total Cost of Ownership (TCO) models that take into consideration the costs to modernize, migrate and test in the cloud, along with the overlapping (“double bubble”) data center costs incurred during the migration period. PortCos should also explicitly plan to capture R&D OPEX savings to realize the Productivity benefit that the cloud can deliver, instead of passively relying on those savings at the tail end of the initiative.

In addition to understanding the TCO, PortCos should perform a Return On Investment (ROI) vs TCO analysis to capture and quantify the benefits (e.g. scalability, flexibility, agility, elasticity, productivity and cost savings) of the cloud for workloads being migrated into the cloud. The impact of benefits (ROI) should be visible on the P&L in years to come in order to justify the migration into the cloud.

Critical Success Factors for migrating into the Cloud

Perform a cloud readiness assessment to determine how much of your workload can be migrated into the cloud to take advantage of its native offerings
Leverage the 7R Migration Strategy to determine how best to migrate each workload into the cloud
- Prepare to make an investment in modernizing your workload, ahead of your migration, to maximize ROI from the cloud without running into speed bumps
- Revisit your application architecture and tenancy model to determine how well it can support scalability and elasticity to handle projected growth in the business as well as spikes in consumption
Prepare a financial model that captures both the TCO and ROI for your cloud migration, ensuring the break-even point is well within your hold period
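
As a rough illustration of the break-even check, the sketch below walks month by month through a simplified cash flow. The model is an assumption made for this example: a one-time modernization and migration cost, overlapping data center costs during the migration period, and a flat monthly net benefit that only starts once the overlap ends. All figures and names are hypothetical; a real TCO model would carry far more detail.

```python
def break_even_month(one_time_cost, monthly_overlap_cost, overlap_months,
                     monthly_net_benefit, hold_months):
    """Return the first month where cumulative cash flow is non-negative.

    Returns None if break-even is not reached within the hold period.
    """
    cumulative = -one_time_cost
    for month in range(1, hold_months + 1):
        if month <= overlap_months:
            # Double bubble: paying for the data center and the cloud together.
            cumulative -= monthly_overlap_cost
        else:
            # Simplifying assumption: benefits begin only after the overlap ends.
            cumulative += monthly_net_benefit
        if cumulative >= 0:
            return month
    return None


if __name__ == "__main__":
    month = break_even_month(
        one_time_cost=500_000,
        monthly_overlap_cost=20_000,
        overlap_months=6,
        monthly_net_benefit=40_000,
        hold_months=60,
    )
    print("Break-even month:", month)
```

If the function returns None, or a month uncomfortably close to the end of the hold period, that is a signal to revisit either the scope of the migration or the assumptions behind the projected benefits.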

Digital Value Creation
