Google Cloud Architect Case Study

Google Cloud Architect Case Study #1

Case Study 01: Design and Plan

This is a very common situation: there isn't actually an application and a solution yet. The customer is earlier in the business lifecycle of moving to the cloud, and is interested in establishing standards and systems whereby their organizations can design and develop their own solutions as needed.

The problem the customer came to us with was that their project teams were provisioning hardware, which was slow and arduous. A side effect of the slowness was that it stifled experimentation and innovation. Nobody wants to ask for hardware unless they know exactly what they need it for. That was the business requirement the customer expressed to us.

A customer had this interesting business requirement…

  • Provisioning VMs within their own data center took weeks
  • They wanted to empower their IT teams to develop new applications in the Cloud with greater autonomy and more quickly, using a hybrid Cloud strategy

We needed a solution where the customer could provision projects, including run-time environments, at scale, and it needed to work very quickly. There was a review process, so this was not a "free-for-all": a person or group reviews certain parameters – a review gate. Once a request has passed the review gate, it should take only a couple of minutes for the resources to be deployed.

We mapped that to technical requirements like this…

  • A turn-key Google Cloud project for any developer team that requested it
  • Provisioned within 24 hours of the request, following a brief manual review.

From a practical implementation perspective, that meant setting up the Organization node, with Folders for the different kinds of organizations and the different kinds of projects. We implemented VPN tunnels to Google Cloud so the company could reach its cloud resources without crossing the public internet unprotected.

We also went through a process of identifying the proper IAM roles to assign to the various environments. And we needed to establish processes for Project Owners to be able to log into the environment. We needed to synchronize these roles and identities with the Active Directory service in the data center.

And this is how we implemented that technical requirement.

  • User identities synchronized from the on-premises Active Directory via Google Cloud Directory Sync (GCDS)
  • Network connectivity between the colo facility and Google Cloud via two VPN tunnels
  • Organization node at the top; folders by department and use case; a project per environment
  • IAM roles: commonly Project Owner, but not Project Creator.

We needed to establish a process for creating new projects and deciding who would be assigned which roles. So there was a kind of process and logic around this that needed to be defined. What kind of data is going to be stored in each project? Is it confidential? We need to go through some basic classification. And who is going to pay for it? In this case, there were multiple billing accounts, and that needed to be decided.
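
Automating the creation step is what keeps the turnaround to a couple of minutes once the review gate approves a request. As a minimal sketch of that step – assuming the google-cloud-resource-manager client library, with hypothetical folder and project IDs – an approved request could be turned into a project under its department folder like this:

```python
# A minimal sketch of the post-review provisioning step. The folder ID
# and project ID below are hypothetical placeholders.
from google.cloud import resourcemanager_v3

def provision_project(project_id: str, folder_id: str) -> None:
    """Create a project under an approved department/use-case folder."""
    client = resourcemanager_v3.ProjectsClient()
    project = resourcemanager_v3.Project(
        project_id=project_id,
        parent="folders/" + folder_id,  # e.g. the department's dev folder
        display_name=project_id,
    )
    # create_project returns a long-running operation; block until done.
    operation = client.create_project(project=project)
    operation.result()

provision_project("team-a-dev-sandbox", "123456789012")
```

Linking the new project to the right billing account would be a similar call against the Cloud Billing API, driven by the decisions made at the review gate.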

Google Cloud Architect Case Study #2

Case Study 02: Provision and Manage

One thing I see with every large customer is this business requirement: they want resource provisioning to be documented. If I'm paying the bill, I want to know who is consuming resources – spinning up VMs, for example. And I want to know that this is going through some sort of review process.

This customer also wanted us to minimize the impact on developer productivity. As soon as you start to add any sort of process, you slow things down, and they were very adamant that we should not hurt their productivity. That was both a technical requirement and a business requirement.

A customer had this interesting business requirement…

  • Cloud infrastructure resource provisioning must be documented and auditable
  • Changes to the cloud infrastructure must go through a review process
  • Specific: Must minimize impact to developer velocity

We immediately knew that part of the solution was "infrastructure as code". There are a lot of reasons to implement infrastructure as code, but the key one here is that forcing every change through code gives you auditing, code review, and all the other controls that help satisfy the business requirement.

The customer had dozens of development teams but only one cloud engineer, so the solution needed to be distributed: each team is responsible for owning its own process.

We mapped that to technical requirements like this…

  • Provisioning infrastructure should be done as Infrastructure as Code
  • Specific: Decentralize approval/code review process. Teams should own the process.

And this is how we implemented that technical requirement.

  • Terraform/Deployment Manager
  • IAM Strategy – Least Privilege
  • Service Accounts
  • Specific: Utilize source control options, such as GitHub Enterprise

They used GitHub Enterprise with a CODEOWNERS file – a plain-text file in the repository that maps file paths to the teams that own them, so that (with branch protection enabled) changes to those paths cannot merge without the owning team's review.
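
As a small illustration – the paths and team names here are hypothetical – such a file might look like this:

```
# Hypothetical CODEOWNERS file. The last matching pattern wins,
# so the catch-all rule comes first and path-specific owners override it.
*                @example-org/cloud-engineering
/terraform/      @example-org/platform-team
/k8s/            @example-org/app-team
```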

Google Cloud Architect Case Study #3

Case 03: Security and Compliance

This customer had a common FinServ requirement. The customer did not want any data to traverse the public internet, for obvious reasons. So they had a security strategy that included a technical requirement to use private APIs to access Google Cloud resources.

They saw this as fundamental to their security strategy. Additionally, they wanted to know how the cloud provider earned its standards certifications, and what it did to stay current. They were concerned that the provider might lose a certification that the business was relying on.

A large financial company wanted to improve their security posture, a common FinServ requirement…

Security

  • Business Requirement: Data cannot traverse the public Internet.
  • Technical Requirement: Must have private API access to Google Cloud services as a good security practice and to minimize data exfiltration.

Compliance

  • Business Requirement: Cloud provider must earn the trust of the business. How does Google Cloud maintain the latest standards around security, availability, process integrity, privacy, and confidentiality?

The first thing we did was make sure all access to Google Cloud was through secure methods, including SSL/TLS, VPN, Interconnect, and private APIs.

We decided to use a new feature that was in alpha at the time, called VPC Service Controls (https://cloud.google.com/vpc-service-controls/). This enables a security perimeter: for example, BigQuery can be placed inside a perimeter, and then it can only be accessed at a private endpoint. And then there were standards and compliance such as ISO and SOC. We provided these to the customer, and they needed to sign agreements to be covered by Google's guarantees about these standards.
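
To make the perimeter concrete, here is a hedged sketch of creating one around BigQuery with the Access Context Manager REST API via the google-api-python-client library; the access policy ID and project number are placeholders:

```python
# A sketch only: assumes an existing access policy and Application
# Default Credentials with the Access Context Manager API enabled.
from googleapiclient import discovery

POLICY_ID = "123456789"        # hypothetical access policy ID
PROJECT_NUMBER = "987654321"   # hypothetical project number

acm = discovery.build("accesscontextmanager", "v1")
perimeter = {
    "name": f"accessPolicies/{POLICY_ID}/servicePerimeters/bq_perimeter",
    "title": "bq_perimeter",
    "perimeterType": "PERIMETER_TYPE_REGULAR",
    "status": {
        # Projects placed inside the perimeter...
        "resources": [f"projects/{PROJECT_NUMBER}"],
        # ...and the services that can only be reached from inside it.
        "restrictedServices": ["bigquery.googleapis.com"],
    },
}
acm.accessPolicies().servicePerimeters().create(
    parent=f"accessPolicies/{POLICY_ID}", body=perimeter
).execute()
```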

We mapped that to technical requirements and Google Cloud’s products and services…

Security

  • Ensure all traffic to Google Cloud is through secure methods, such as SSL/TLS, VPN, Interconnect, and private APIs / private endpoints.

Compliance

  • Google Cloud has Standards, Regulations & Certifications that would meet their compliance requirements and help earn their trust in our platform.

And this is how we implemented that technical requirement.

VPC Service Controls / Secure Google Cloud API

  • Restrict access to the user's Google Cloud resources based on the Google Cloud virtual network or IP range.
  • Restrict the set of Google APIs and Google Cloud resources accessible from the user's Google Cloud virtual network.

Standards, Regulations & Certifications

  • Products regularly undergo independent verification of their security, privacy, and compliance controls.
  • Certifications: ISO 27001, 27017, and 27018, and SOC 1, 2, and 3.

An interesting point about both security and compliance is that it is a "shared responsibility" model. Although we provided secure access and layered protection, the customer needed to use IAM to manage access for its employees and to implement secure practices in its procedures. Also, the standards compliance covers the cloud resources, but not the customer's application, so they may need to take extra steps to ensure that the overall solution is compliant.

Google Cloud Architect Case Study #4

Case 04: Technical and Business Processes

You may have seen this situation before. The customer experience is that pushing to production is scary because you never know when things might break. This customer could only push to production once a month. There was a significant risk of downtime, and when there is downtime, the application breaks, and that impacts revenue. So the summary is: we need to be able to develop and deploy features, and promote them to production, without it being an event.

A customer had this interesting business requirement…

  • Pushing to Prod is a big event and happens once a month.
  • Significant risk of downtime due to unforeseen issues.
  • Downtimes exceeding SLA have a revenue impact.
  • Need to develop and deploy features without burning the house down. Pushing to Prod should be a non-event.

There are two passes through the problem in the architectural process. First, there is the business pass, which includes both the business challenge and the people processes – understanding what roles there are and what actions the people need to be able to take. Then there is the technical pass, which maps all these needs and procedures to a technical solution.

We mapped that to technical requirements like this…

Establish CI/CD pipeline:

  • Single source repo per product; git-flow as the branching model.
  • Automated builds that are self-testing and rapid.
  • Automated deployment.
  • Robust monitoring, logging, and alerting for visibility.

Promote team culture:

  • Test Driven Development.
  • Push often, address broken builds immediately.
  • Transparency.
  • Change management / Release process.

To push to production on demand, we needed to analyze the development process. We figured out that a single source repo made the most sense for the whole team, and we decided to go with git-flow as the branching model rather than other branching strategies.

We automated the build process, and we made sure that the builds were self-testing, using SALT test coverage. We measured the build process and made sure that it was rapid. We also automated the deployment of the solution software. Couple this automation with monitoring, logging, and alerting, and you get a nice build-and-deploy system that can be handed off to a team.

But that doesn't solve the entire problem. The system has to be used, and this team was not accustomed to the development paradigm we had implemented. So we also needed to enhance their processes. This primarily had to do with how test and development worked together: in the new paradigm, they would write the tests alongside development.

The new way was to push to production often and early, and as soon as something breaks, fix it. This meant a new team culture of accountability and transparency, where all the stakeholders could participate in identifying and resolving a problem. And that meant change management and leadership buy-in were necessary for the technical solution to be successful.

And this is how we implemented that technical requirement.

  • Cloud Source Repositories – for hosting their repositories
  • Container Builder – builds the Docker container images
  • Container Registry – hosts those container images
  • Google Kubernetes Engine + Helm – for running and managing the containers
  • Spinnaker – for continuous delivery
  • Cloud Load Balancing, Google Cloud's operations suite (Stackdriver), Cloud IAM + Service Accounts – management constructs for proper visibility.
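
To show how the repo and build pieces connect, here is a hedged sketch – the project, repo, and trigger names are hypothetical – of registering a build trigger with the google-cloud-build Python client, so that every push to the develop branch produces a tested image:

```python
# A sketch only: assumes the google-cloud-build client library and an
# existing Cloud Source Repository; the build steps themselves live in
# the repo's cloudbuild.yaml.
from google.cloud.devtools import cloudbuild_v1

client = cloudbuild_v1.CloudBuildClient()
trigger = cloudbuild_v1.BuildTrigger(
    name="build-on-push-develop",
    description="git-flow: build and self-test every push to develop",
    trigger_template=cloudbuild_v1.RepoSource(
        project_id="my-project",
        repo_name="my-product",   # Cloud Source Repositories repo
        branch_name="develop",    # git-flow integration branch
    ),
    filename="cloudbuild.yaml",   # build, test, and push steps
)
client.create_build_trigger(project_id="my-project", trigger=trigger)
```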

Google Cloud Architect Case Study #5

Case 05: Managing Implementation

A lot of the customers we see are in the Enterprise space, so their needs are very similar. This example came from a Financial Services company. We often see similar requirements among FinServ companies.

A FinServ customer had this interesting business requirement…

  • Encryption in transit and at rest for all developer operations
  • Follow Google best practices
  • All keys must be managed by the company – they wanted to own their keys

The real trick here was that the structure and solution had to be put into production all at once; it couldn't be built up in production piece by piece. It had to be fully working when it went into production. That caused us to think about which parts were inherent to the platform and which parts we could automate. We ended up using a Jenkins pipeline and Deployment Manager templates for parts of this automation.

We mapped that to technical requirements like this…

  • Use Google Authentication.
  • No Public IP access unless through bastion host.
  • No Operations team access to the production environment. That means “no ops” – everything is automated.
  • Minimize downloaded keys.
  • Keys accounted for via business logic in the deployment automation.

All of the Google APIs are encrypted in transit and authenticated, so that requirement was inherited automatically. The production team needed operations access, but without being handed keys. So what we did was implement all operations as deployment pipelines using Jenkins and Deployment Manager. The business logic was implemented in Python in the Deployment Manager templates.

And this is how we implemented that technical requirement.

  • All Google APIs are encrypted in transit and authenticated.
  • Operations access to production only through the deployment pipelines, via Jenkins and Deployment Manager. Business logic in Python templates in Deployment Manager (see the sketch below).
  • The Cloud SDK was not installed on local machines – Cloud Shell ensures no keys are downloaded.
  • Service account keys, when needed for off-Google Cloud clients, are managed via the deployment pipelines.
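
As a minimal sketch of such a Deployment Manager Python template – the no-public-IP rule shown here is a hypothetical example of the kind of business logic described above:

```python
# Deployment Manager calls generate_config(context) and deploys the
# resources it returns. This hypothetical template encodes one policy:
# only instances flagged as bastion hosts get an external IP.
def generate_config(context):
    props = context.properties or {}
    zone = props.get("zone", "us-central1-a")

    access_configs = []
    if props.get("is_bastion", False):
        # Only the bastion host is reachable from outside.
        access_configs = [{"name": "external-nat", "type": "ONE_TO_ONE_NAT"}]

    instance = {
        "name": context.env["name"],
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": "zones/%s/machineTypes/%s"
                           % (zone, props.get("machineType", "e2-small")),
            "disks": [{
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    "sourceImage": "projects/debian-cloud/global/images/family/debian-12",
                },
            }],
            "networkInterfaces": [{
                "network": "global/networks/default",
                "accessConfigs": access_configs,  # empty list => no public IP
            }],
        },
    }
    return {"resources": [instance]}
```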

There are two kinds of operations actions: on-Google Cloud actions and off-Google Cloud actions. For on-Google Cloud actions, we didn't install the Cloud SDK on local machines. Instead, we set the teams up to use Cloud Shell. That ensured that no keys were downloaded.

For off-Google Cloud actions, the service account keys were managed via the deployment pipelines. Any time there was a need for off-Google Cloud access, the keys were issued through the pipelines, which means there is full audit, control, and a record of those keys – who had access to them, and when and where they were used.

Google Cloud Architect Case Study #6

Case 06: Solution and Operations Reliability

A customer had this interesting business requirement…

  • The back-office system needs to support frequent updates
  • The back-office system needs to be available – especially between 06:00 CEST and 18:00 CEST
  • A failure in one part of the back-office system shouldn’t bring down the entire system
  • The customer wants to re-architect the system and does not want to bring down the entire system when doing an update.

We mapped that to technical requirements like this…

Microservices!

  • Break apart the back-office system into independent services
  • Create a standard way for teams to publish logs and metrics for their services
  • Create a standard way for services to be rolled out

This was a natural fit for microservices. The customer knew that once they told development groups that they would be developing their own microservices, they would need standards for reliability and scalability, and they wanted common ways to monitor the applications.

And this is how we implemented that technical requirement.

Google Kubernetes Engine

  • Microservices deployed into a shared cluster
  • Surge-based rolling deployments using GKE's Deployment resource (see the sketch below)
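
A hedged sketch of that surge setting, applied with the official kubernetes Python client; the deployment and namespace names are hypothetical, and the same strategy block can equally live in the manifest itself:

```python
# Patch an existing Deployment so rollouts surge extra pods instead of
# taking capacity away: new pods come up before old ones are removed.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() inside the cluster
apps = client.AppsV1Api()

patch = {
    "spec": {
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {
                "maxSurge": "25%",    # extra pods allowed during a rollout
                "maxUnavailable": 0,  # never drop below desired capacity
            },
        }
    }
}
apps.patch_namespaced_deployment(
    name="orders-service", namespace="back-office", body=patch
)
```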

Stackdriver

  • Custom Metrics – a wrapper library around the Cloud Monitoring client libraries to:
  • Expose “common” metrics
  • Expose custom metrics

The solution was to use Cloud Monitoring. Metrics were exposed in the Prometheus format, scraped from the services, and sent to Cloud Monitoring, where they could be surfaced through dashboards.

They used custom metrics in Cloud Monitoring, so they were able to monitor and scale their microservices based on those metrics.
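
As a minimal sketch of the kind of wrapper described above – the project ID and metric name are hypothetical – a helper that publishes one custom metric point with the google-cloud-monitoring client library could look like this:

```python
import time

from google.cloud import monitoring_v3

def write_custom_metric(project_id: str, metric: str, value: int) -> None:
    """Publish a single point of a custom metric to Cloud Monitoring."""
    client = monitoring_v3.MetricServiceClient()
    series = monitoring_v3.TimeSeries()
    series.metric.type = "custom.googleapis.com/" + metric
    series.resource.type = "global"

    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
    )
    series.points = [
        monitoring_v3.Point({"interval": interval, "value": {"int64_value": value}})
    ]
    client.create_time_series(
        name="projects/" + project_id, time_series=[series]
    )

# e.g. a signal each microservice reports and autoscaling can key off
write_custom_metric("my-project", "backoffice/queue_depth", 42)
```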

