Key Facts for Google Cloud Platform Engineer: these are key facts and concepts around Google Cloud Platform cloud engineering that will help with quick revision for your Google Associate Cloud Engineer study.
- The command-line command to create a Cloud Storage bucket is gsutil mb, where gsutil is the command-line tool for accessing and manipulating Cloud Storage. mb is the specific command for creating, or making, a bucket.
- Adding virtual machines to an instance group can be triggered in an autoscaling policy by all of the following:
- CPU Utilisation
- Stackdriver metrics
- Load balancing serving capacity
- Data store options in GCP that support transactions and relational database operations using fully compliant SQL are Spanner and Cloud SQL.
- Instance templates are used to create a group of identical VMs. Instance templates include the following configuration parameters or attributes of a VM:
- Machine Type
- Boot Disk image
- Container Image
- The most efficient way for administrators to implement an object management policy that requires objects stored in Cloud Storage to be migrated from regional storage to nearline storage 90 days after the object is created is a lifecycle management configuration policy specifying an age of 90 days and a SetStorageClass action of nearline.
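As a rough illustration of the lifecycle policy just described, the sketch below builds the JSON configuration in Python. The function name `nearline_after_days` is hypothetical, and restricting the rule to objects currently in the REGIONAL class is an assumption based on the scenario above rather than a requirement.

```python
import json


def nearline_after_days(age_days: int = 90) -> dict:
    """Build a lifecycle configuration that moves objects to Nearline.

    The rule fires once an object is at least `age_days` days old.
    Limiting it to REGIONAL objects is an assumption for this example.
    """
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": age_days, "matchesStorageClass": ["REGIONAL"]},
            }
        ]
    }


if __name__ == "__main__":
    # Print the configuration so it can be saved to a file.
    print(json.dumps(nearline_after_days(), indent=2))
```

Saving the printed output to a file such as lifecycle.json (a placeholder name) and running gsutil lifecycle set lifecycle.json gs://[BUCKET_NAME] should apply the policy to a bucket.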
- The command to synchronize the contents of two buckets is gsutil rsync.
- All of the following are components of firewall rules: a) direction of traffic, b) priority, c) action on match, d) enforcement status, e) target, f) source, g) protocol.
- VPCs are global resources and subnets are regional resources.
- For a web application deployment where the team does not want to manage servers or clusters, a good option is a PaaS such as App Engine.
- A data warehouse needing SQL query capabilities over petabytes of data, with no servers or clusters to manage, can be met by BigQuery.
- In the Internet of Things space, an application will stream large volumes of data into GCP. The data needs to be filtered, transformed, and analysed before being stored in Datastore. Cloud Dataflow fits this requirement.
- Cloud Dataflow allows for stream and batch processing of data and is well suited for ETL work.
- Dataproc is a managed Hadoop and Spark service that is used for big data analytics.
- Buckets, directories and subdirectories are used to organise storage
- gcloud is the command-line tool for IAM, and list-grantable-roles lists the roles that can be granted on a resource: gcloud iam list-grantable-roles <resource>
- Cloud Endpoints is an API Service
- Cloud Interconnect is a network service.
- Compute Engine Virtual Machine is a zonal resource.
- Within the zonal and regional geographic scopes of GCP, network latencies are generally less than 1 millisecond.
- To create a custom role, a user must possess the iam.roles.create permission.
- Project is the base-level organizing entity for creating and using GCP resources and services.
- Organisations, folders and projects are the components used to manage an organizational hierarchy.
- gcloud compute zones describe gets a list of all CPU types available in a particular zone.
- Cloud Functions responds to events in Cloud Storage, making it a good choice for taking action after a file is loaded.
- Billing is set up at the project level in the GCP resource hierarchy.
- Cloud Dataproc is the managed Spark Service
- Cloud Dataflow is for stream and batch processing.
- Rate quotas reset at regular intervals.
- There are two types of quotas in billing, Rate Quotas and Allocation Quotas.
- In Kubernetes Engine, a node pool is a subset of node instances within a cluster that all have the same configuration.
- Code for Cloud Functions can be written in Node.js and Python
- Preemptible virtual machines may be shut down at any time but will always be shut down after running for 24 hours.
- After deciding to use Cloud Key Management Service, and before you can start to create cryptographic keys, you must enable the KMS API (Google Cloud Key Management Service) and set up billing.
- GCP Service for storing and managing Docker containers is Container Registry.
- Once you have opened the GCP console at console.cloud.google.com, you must verify that the project selected is the one you want to work with before performing tasks on VMs. All operations you perform apply to resources in the identified project.
- A one-time task you will need to complete before using the console is setting up billing. You will be able to create a project only if billing is enabled.
- A name for the VM, a machine type, a region, and a zone are the minimal set of information you will need when creating a VM.
- Different zones may have different machine types available.
- Billing different departments for the cost of the VMs used for their applications is possible with labels and descriptions.
- Google Cloud Interconnect – Dedicated is used to provide a dedicated connection between customer’s data center and a Google data center
- The purpose of instance groups in a Kubernetes cluster is to create sets of VMs that can be managed as a unit.
- A Kubernetes cluster has a single cluster master and one or more nodes to execute workloads.
- A pod is a single instance of a running process in a cluster
- To ensure applications calling Kubernetes services
- ReplicaSets are controllers that are responsible for maintaining the correct number of pods.
- Deployments are versions of application code running on a cluster.
- To maintain availability even if there is a major network outage in a data center, Multizone/multiregion clusters are available in Kubernetes Engine and are used to provide resiliency to an application
- Starting with an existing template, filling in parameters, and generating the gcloud command is the most reliable way to deploy a Kubernetes cluster with GPUs.
- gcloud beta container clusters create ch07-cluster-1 --num-nodes=4 will create a cluster named ch07-cluster-1 with four nodes.
- Application name, container image, and initial command can all be specified when creating a deployment from Cloud Console. Time to Live (TTL) is not specified and is not an attribute of deployments.
- Deployment configuration files created in Cloud Console use YAML format.
- When working with Kubernetes Engine, a cloud engineer may need to configure nodes, pods, services, clusters, and container images.
- After observing performance degradation, open Cloud Console and click the cluster name to see details of a specific cluster.
- You can find the number of vCPUs on the cluster listing in the Total Cores column or on the Details page in the Node Pool section in the size parameter.
- gcloud container clusters list shows the high-level characteristics of a cluster.
- gcloud container clusters get-credentials is the correct command to configure kubectl to use GCP credentials for the cluster.
- Clicking the Edit button allows you to change, add, or remove labels from the Kubernetes cluster.
- When resizing, the gcloud container clusters resize command requires the name of the cluster, the size, and the node pool to modify.
- Pods are used to implement replicas of a deployment, and it is best practice to modify deployments which are configured with a specification of the number of replicas that should always run.
- In the Kubernetes Engine navigation menu, you would select Workloads in order to see a list of deployments.
- The four actions available for deployments are Autoscale, Expose, Rolling Update, and Scale.
- Command to list deployments is kubectl get deployments
- You can specify container image, cluster name and application name along with the labels, initial command and namespace.
- The Deployment Details page includes services.
- The kubectl run command is used to start a deployment. It takes a name for the deployment, an image, and a port.
- The command to remove a service that is not functioning as expected from the cluster is kubectl delete service m1-classfied, where m1-classfied is the service name.
- Container Registry is the service for managing images that can be used in other services like Kubernetes Engine and Compute Engine.
- gcloud container images list lists container images from the command line.
- gcloud container images describe gets a detailed description of a container image.
- kubectl expose deployment makes a service accessible.
- Autoscaling is the most cost-effective and least burdensome way to respond to changes in demand for a service.
- Versions are the App Engine component you would use to minimize disruptions during updates to a service. Versions support migration. An app can have multiple versions, and by deploying with the --migrate parameter, you can migrate traffic to the new version.
- Autoscaling enables setting a maximum and minimum number of instances, the best way to ensure that you have enough instances to meet demand without spending more than you have to.
- Applications have one or more services. Services have one or more versions. Versions are executed on one or more instances when the application is running. Hence in the hierarchy of App Engine components, instance is the lowest-level component.
- gcloud app deploy is used to deploy an App Engine app from the command line.
- The app.yaml file is used to configure an App Engine application, so if the Python version associated with the app were to be upgraded, it would be upgraded via this file.
- A project can support only one App Engine app
- The best way to get the code out as soon as possible without exposing it to customers is to deploy with gcloud app deploy --no-promote.
- App Engine applications are accessible from URLs that consist of the project name followed by appspot.com.
- Related to App Engine scaling parameters:
- max_concurrent_requests lets you specify the maximum number of concurrent requests before another instance is started.
- target_throughput_utilization functions similarly but uses a 0.05 to 0.95 scale to specify maximum throughput utilization.
- max_instances specifies the maximum number of instances but not the criteria for adding instances.
- max_pending_latency is based on the time a request waits, not the number of requests.
- App Engine basic scaling only allows for idle time and maximum instances to be configured.
- In App Engine, the runtime parameter specifies the language environment to execute in. The script to execute is specified by the script parameter. The URL to access the application is based on the project name and the domain appspot.com.
- Two kinds of instances are present in App Engine Standard: resident instances are used with manual scaling, while dynamic instances are used with autoscaling and basic scaling.
- For apps running in App Engine, using dynamic instances by specifying autoscaling or basic scaling will automatically adjust the number of instances in use based on load.
- For apps running in App Engine, gcloud app services set-traffic can allocate some users to a new version without exposing all users to it.
- For apps running in App Engine, the --split-by parameter to gcloud app services set-traffic is used to specify the method to use when splitting traffic.
- For apps running in App Engine, the --splits parameter to gcloud app services set-traffic is used to specify the percentage of traffic that should go to each version.
- For apps running in App Engine, --migrate is the parameter for specifying that traffic should be moved or migrated to the newer version.
- The cookie used for cookie based splitting in App Engine is called GOOGAPPUID
- From the App Engine console you can view the list of services and versions as well as information about the utilization of each instance.
- IP address, HTTP cookie, and random splitting are all allowed methods for splitting traffic in App Engine.
- A new app will require several backend services, three business logic services, and access to relational databases. Each service will provide a single function, and completing a business task will require several services. Service execution time is dependent on the size of input and is expected to take up to 30 minutes in some cases. App Engine fits this case because it is designed to support multiple tightly coupled services comprising an application.
- Cloud Functions is designed to support single-purpose functions that operate independently, in response to isolated events in Google Cloud, and complete within a specified period of time.
- In Cloud Functions, a timeout period that is too low would explain why smaller files are processed in time but the largest are not.
- In Cloud Functions, an event is an action that occurs in GCP, such as a file being written to Cloud Storage or a message being added to a Cloud Pub/Sub topic.
- In Cloud Functions, a trigger is a declaration that a certain function should execute when an event occurs.
- The GCP products that generate events that can have triggers associated with them are Cloud Storage, Cloud Pub/Sub, Firebase, and HTTP.
- Python, Node.js 6, Node.js 8 are supported in Cloud Functions.
- In Cloud Functions, an HTTP trigger can be invoked by making a request using DELETE, POST and GET.
- With Cloud Storage working with Cloud Functions, upload or finalise, delete, metadata update and archive are the 4 events supported.
- The file type to apply a function to cannot be specified in a parameter and must be implemented in the function code.
- Cloud Functions can have between 128MB and 2GB of memory allocated.
- By default, Cloud Functions can run for up to 1 minute before timing out. You can, however, set the timeout parameter for a cloud function to a period of up to 9 minutes.
- Python Cloud Functions is currently in beta. The standard set of gcloud commands does not include commands for alpha or beta release features by default. You will need to explicitly install beta features using the gcloud components install beta command.
- The google.storage.object.finalize event occurs after a file is uploaded.
- If you are defining a cloud function to write a record to a database when a file in Cloud Storage is archived, you need
- If you'd like to stop using a cloud function and delete it from your project, use gcloud functions delete.
- As part of the Python code for a cloud function that works with Cloud Pub/Sub, a decode function is required. Messages in Pub/Sub topics are encoded to allow binary data to be used in places where text data is expected, so messages need to be decoded to access the data in the message.
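The decoding step can be sketched as follows. This is a minimal example that assumes the message payload arrives base64-encoded under a `data` key and contains UTF-8 text. The function name reuses pub_sub_function_test from these notes, and the sample message is made up for illustration.

```python
import base64


def pub_sub_function_test(event: dict, context=None) -> str:
    """Return the decoded text payload of a Pub/Sub-style message.

    Assumes event["data"] holds base64-encoded UTF-8 text.
    """
    payload = event.get("data")
    if payload is None:
        # A message may carry only attributes and no data.
        return ""
    return base64.b64decode(payload).decode("utf-8")


if __name__ == "__main__":
    # Simulate an incoming message locally (hypothetical content).
    sample = {"data": base64.b64encode(b"file uploaded").decode("ascii")}
    print(pub_sub_function_test(sample))
```

Running the file locally exercises the decode path without needing a Pub/Sub topic.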
- Bigtable is a wide-column database that can ingest large volumes of data consistently.
- Once a bucket is created as either regional or multi-regional, it cannot be changed to the other.
- When the goal is to reduce cost, you would want to use the least costly storage option. Coldline has the lowest per-gigabyte charge at $0.007/GB/month.
- Memorystore is a managed Redis cache. The cache can be used to store the results of queries. Follow-on queries that reference the data stored in the cache can read it from the cache, which is much faster than reading from persistent disks. SSDs have significantly lower latency than hard disk drives and should be used for performance-sensitive applications like databases.
- When versioning is enabled on a bucket, the latest version of the object is called the live version.
- Lifecycle configurations can change storage class from regional to nearline or coldline. Once a bucket is created as regional or multiregional, it cannot be changed to the other.
- When transactions and support for tabular data are important, Cloud SQL and Spanner are good choices. They are relational databases and are well suited for transaction-processing applications.
- A sample command for deploying a Python cloud function called pub_sub_function_test is gcloud functions deploy pub_sub_function_test --runtime python37 --trigger-topic gcp-ace-exam-test-topic
- There is only one type of event that is triggered in Cloud Pub/Sub, and that is when a message is published.
- Both MySQL and PostgreSQL are Cloud SQL options
- nam3 is a single super region
- us-central1 is a region
- us-west1-a is a zone
- The multiregional and multi-super-regional location nam-eur-asia1 is one of the most expensive Cloud Spanner configurations.
- BigQuery, Datastore, and Firebase are all fully managed services that do not require you to specify configuration information for VMs.
- Document data model is used by Datastore.
- BigQuery is a managed service designed for data warehouses and analytics. It uses standard SQL for querying and can support tens of petabytes of data.
- Bigtable can support tens of petabytes of data, but it does not use SQL as a query language.
- Firestore is a document database that has mobile supporting features, like data synchronization.
- Consistency, cost, read / write patterns, transaction support and latency are features of storage which should be considered while securing additional storage.
- Once a bucket has its storage class set to coldline, it cannot be changed to another storage class.
- To use BigQuery to store data, you must have a dataset to store it in.
- With a second-generation instance in Cloud SQL, you can configure the MySQL version, connectivity, machine type, automatic backups, failover replicas, database flags, maintenance windows, and labels.
- Access charges are used with nearline and coldline storage
- Memorystore can be configured to use between 1GB and 300GB of memory.
- Your company has a web application that allows job seekers to upload résumé files. Some files are in Microsoft Word, some are PDFs, and others are text files. You would like to store all résumés as PDFs. The solution is to implement a Cloud Function on Cloud Storage that executes on a finalize event. The function checks the file type, and if it is not PDF, the function calls a PDF converter function and writes the PDF version to the bucket that has the original.
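The file-type check in the résumé scenario has to live in the function body, since it cannot be set as a trigger parameter. The sketch below shows only that decision step. It assumes the event dictionary carries the object name under a `name` key and judges the type by file extension. The names `needs_conversion` and `on_finalize` are hypothetical, and the actual PDF conversion is left out.

```python
import os

PDF_EXTENSION = ".pdf"


def needs_conversion(event: dict) -> bool:
    """Return True when the uploaded object is not already a PDF."""
    _, extension = os.path.splitext(event.get("name", ""))
    return extension.lower() != PDF_EXTENSION


def on_finalize(event: dict, context=None) -> str:
    """Decide what to do with a newly finalized object.

    Returns a short description instead of converting the file, so the
    sketch stays runnable without any cloud services.
    """
    if needs_conversion(event):
        return f"convert {event['name']}"
    return f"skip {event['name']}"


if __name__ == "__main__":
    print(on_finalize({"name": "resume.docx", "bucket": "example-bucket"}))
    print(on_finalize({"name": "resume.pdf", "bucket": "example-bucket"}))
```

Checking the extension is the simplest option. A stricter version might inspect the object's content type instead, since file names can be misleading.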
- Options for uploading code to a cloud function are as follows
- Inline editor
- Zip upload
- Cloud source repository
- The HTTP trigger allows for the use of POST, GET, and PUT calls to invoke a cloud function.
- Cloud SQL is a fully managed relational database service, but database administrators still have to perform some tasks. Creating databases is one of them.
- Cloud SQL is controlled using the gcloud command. The sequence of terms is gcloud followed by the service, in this case sql; followed by a resource, in this case backups; and a command or verb, in this case create. The command gcloud sql backups create is used to create a backup of a Cloud SQL database.
- The following command will run an automatic backup on an instance called ace-exam-mysql. The base command is gcloud sql instances patch, which is followed by the instance name and a start time passed to the --backup-start-time parameter: gcloud sql instances patch ace-exam-mysql --backup-start-time 03:00
- GQL, a SQL-like query language, is used for Datastore.
- Exporting data from Datastore uses the following command: gcloud datastore export --namespaces='[NAMESPACE]' gs://[BUCKET_NAME]
- BigQuery analyzes the query and BigQuery displays an estimate of the amount of data scanned. This is important because BigQuery charges for data scanned in queries.
- To get an estimate of the volume of data scanned by BigQuery from the command line, use the bq command structure that includes the --dry_run option. This option calculates an estimate without actually running the query: bq --location=[LOCATION] query --use_legacy_sql=false --dry_run [SQL_QUERY]
- You are using Cloud Console and want to check on some jobs running in BigQuery. You navigate to the BigQuery part of the console. Job History is the menu item you would click to view jobs.
- To estimate the cost of running a BigQuery query, BigQuery provides an estimate of the amount of data scanned, and the Pricing Calculator gives a cost estimate for scanning that volume of data.
- You have just created a Cloud Spanner instance. You have been tasked with creating a way to store data about a product catalog, the next step is to create a database within the instance. Once a database is created, tables can be created, and data can be loaded into tables.
- Your software team is developing a distributed application and wants to send messages from one application to another. Once the consuming application reads a message, it should be deleted. You want your system to be robust to failure, so messages should be available for at least three days before they are discarded. It involves sending messages to the topic, and the subscription model is a good fit. Pub/Sub has a retention period to support the three-day retention period.
- Pub/Sub works with topics, which receive and hold messages, and subscriptions, which make messages available to consuming applications.
- Command-line tools for the Bigtable environment can be set up using gcloud components install cbt, which installs the Bigtable command-line tool.
- cbt createtable iot-ingest-data creates a table named iot-ingest-data in Bigtable.
- Cloud Dataproc is a managed service for Spark and Hadoop. Cassandra is a big data distributed database but is not offered as a managed service by Google
- gcloud dataproc clusters create spark-nightly-analysis --zone us-west2-a is the command to create a Dataproc cluster.
- The command to rename an object stored in a bucket is gsutil mv gs://[BUCKET_NAME]/[OLD_OBJECT_NAME] gs://[BUCKET_NAME]/[NEW_OBJECT_NAME]
- Dataproc with Spark and its machine learning library is well suited to machine learning use cases such as analysing data to help sell more products.
- gsutil mb is used to create buckets in Cloud Storage
- gsutil cp is the command to copy files from your local device to a bucket in Cloud Storage, assuming you have the Cloud SDK installed.
- If you are migrating a large number of files from a local storage system to Cloud Storage using Cloud Console, you can upload both files and folders.
- When exporting a database from Cloud SQL, the export file format options are CSV and SQL
- SQL format, exports a database as a series of SQL data definition commands. These commands can be executed in another relational database without having to first create a schema
- gcloud sql export sql ace-exam-mysql1 gs://ace-exam-buckete1/ace-exam-mysql-export.sql --database=mysql will export a MySQL database called ace-exam-mysql1 to a file called ace-exam-mysql-export.sql in a bucket named ace-exam-buckete1.
- The following command backs up data from your Datastore database to an object storage system when your data is stored in the default namespace: gcloud datastore export --namespaces="(default)" gs://ace-exam-bucket1
- Datastore export command process creates a metadata file with information about the data exported and a folder that has the data itself
- XML is not an option in the BigQuery export process; CSV, Avro, and JSON are valid options.
- CSV, Avro, and Parquet are valid BigQuery options.
- bq load --autodetect --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE] loads data into BigQuery, which analyses the data and makes it available for analysis.
- You have set up a Cloud Spanner process to export data to Cloud Storage. You notice that each time the process runs you incur charges for another GCP service, which you think is related to the export process. That service is Dataflow, a pipeline service for processing streaming and batch data that implements workflows used by Cloud Spanner.
- Exporting from Dataproc exports data about the cluster configuration. Dataproc supports Apache Spark, which has libraries for machine learning.
- The correct command to create a Pub/Sub topic is gcloud pubsub topics create.
- gcloud pubsub subscriptions create --topic=ace-exam-topic1 ace-exam-sub1 will create a subscription on the topic ace-exam-topic1.
- A direct advantage of using a message queue in distributed systems is that it decouples services, so if one lags, it does not cause other services to lag.
- gcloud components install beta installs beta gcloud commands.
- The BigQuery parameter to automatically detect the schema of a file on import is --autodetect.
- Avro supports Deflate and Snappy compression when exporting from BigQuery. CSV supports Gzip and no compression.
- As a developer on a project using Bigtable for an IoT application, you will need to export data from Bigtable to make some data available for analysis with another tool. The export is done with a Java program designed for importing and exporting data from Bigtable.
- The A record is used to map a domain name to an IPv4 address. The AAAA record is used for IPv6 addresses.
- DNSSEC is a secure protocol designed to prevent spoofing and cache poisoning.
- The TTL parameter in a DNS record specifies the time a record can be in a cache before the data should be queried again.
- The command to create a DNS zone from the command line is gcloud beta dns managed-zones create.
- --visibility=private is the parameter setting that makes a DNS zone private.
- HTTP(S), SSL Proxy and TCP Proxy provide global load balancing
- Network TCP/UDP enables balancing based on IP protocol, address and port.
- While configuring a load balancer, if you want to implement private load balancing, Only Between My VMs is the option to select.
- TCP Proxy load balancers require you to configure both frontend and back end.
- Health checks monitor the health of VMs used with load balancers
- You specify ports to forward when configuring the frontend. The backend is where you configure how traffic is routed to VMs.
- gcloud compute forwarding-rules create, used to create a network load balancer at the command line.
- A team is setting up a web service for internal use. They want to use the same IP address for the foreseeable future. Static addresses will be used, they are assigned until they are released.
- You create a VM to experiment with a new Python data science library. You'll SSH via the server name into the VM, use the Python interpreter interactively for a while, and then shut down the machine. An ephemeral address is sufficient, since resources outside the subnet will not need to reach the VM and you can SSH into the VM from the console.
- You cannot reduce the number of addresses in a subnet using any of the commands.
- The number of IP addresses in a subnet is 2 raised to the power of (32 minus the prefix length).
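The subnet-size arithmetic can be checked with a few lines of Python. This sketch covers IPv4 only, and `subnet_size` is a hypothetical helper name. The standard library `ipaddress` module is used purely as a cross-check.

```python
import ipaddress


def subnet_size(prefix_length: int) -> int:
    """Return the number of IPv4 addresses in a subnet: 2 ** (32 - prefix)."""
    if not 0 <= prefix_length <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    return 2 ** (32 - prefix_length)


if __name__ == "__main__":
    for cidr in ("10.0.0.0/24", "10.0.0.0/20", "10.0.0.0/16"):
        network = ipaddress.ip_network(cidr)
        # Both columns should agree for every CIDR block.
        print(cidr, subnet_size(network.prefixlen), network.num_addresses)
```

Note that this counts every address in the range. The number usable by VMs is slightly lower because a few addresses in each subnet are reserved.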
- Premium is the network service level that routes all traffic over the Google network
- You are deploying a distributed system. Messages will be passed between Compute Engine VMs using a reliable UDP protocol. All VMs are in the same region. Internal TCP/UDP is a good option. It is a regional load balancer that supports UDP.
- Network Services is the section of Cloud Console that has the Cloud DNS console
- Stopping and starting a VM will release ephemeral IP addresses. Use a static IP address to have the same IP address across reboots.
- Virtual private clouds are global. By default, they have subnets in all regions. Resources in any region can be accessed through the VPC.
- IP ranges are assigned to subnets, when CIDR ranges are defined, they are defined as per the number of subnets.
- Dynamic routing is the parameter that specifies whether routes are learned regionally or globally.
- gcloud compute networks create is the command to create a VPC.
- The Flow Log option of the create VPC command determines whether logs are sent to Stackdriver.
- Shared VPCs can be created at the organisation or folder level of the resource hierarchy.
- VPC peering is used for interproject communications.
- When creating a VM that should exist in a custom subnet, you need to specify the subnet in the Networking tab of the Management, Security, Disks, Networking, Sole Tenancy section of the form.
- VPC peering is used for interproject communications.
- Target is the part of the firewall rule that can reference a network tag to determine the set of instances affected by the rule.
- Direction specifies whether the rule is applied to incoming or outgoing traffic.
- The 0.0.0.0/0 matches all IP addresses.
- gcloud compute firewall-rules create, the product you are working with is compute and the resource you are creating is a firewall rule.
- When using gcloud to create a firewall rule, the --network parameter is used to specify the network it should apply to.
- gcloud compute firewall-rules create fwr1 --allow=udp:20000-30000 --direction=ingress creates a rule under which the service endpoints will accept any UDP traffic, with each endpoint using a port in the range 20000-30000.
- Suppose you want a firewall rule to apply only if there is not another rule that would deny that traffic. 65535 is the appropriate priority because it is the largest number allowed in the range of values for priorities. The larger the number, the lower the priority. Having the lowest priority will ensure that other rules that match will apply.
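To make the priority rule concrete, here is a deliberately simplified model of firewall evaluation. It matches only on protocol and port and ignores direction, source, and target. The `Rule` class, the `decide` function, and the second rule named block-one-port are all invented for this illustration. The tie-break (deny wins at equal priority) and the default of deny for unmatched ingress traffic are stated assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int      # 0 is the highest priority, 65535 the lowest
    action: str        # "allow" or "deny"
    protocol: str
    port_range: tuple  # inclusive (low, high)


def decide(rules, protocol: str, port: int, default: str = "deny") -> str:
    """Return the action of the highest-priority matching rule.

    Rules are checked in ascending priority number. At equal priority a
    deny rule is checked before an allow rule (an assumption of this model).
    """
    ordered = sorted(rules, key=lambda r: (r.priority, r.action != "deny"))
    for rule in ordered:
        low, high = rule.port_range
        if rule.protocol == protocol and low <= port <= high:
            return rule.action
    return default


if __name__ == "__main__":
    rules = [
        Rule("fwr1", 65535, "allow", "udp", (20000, 30000)),
        Rule("block-one-port", 1000, "deny", "udp", (25000, 25000)),
    ]
    print(decide(rules, "udp", 20001))  # only fwr1 matches
    print(decide(rules, "udp", 25000))  # the deny rule outranks fwr1
```

Because fwr1 sits at priority 65535, any other matching rule is consulted first, which is the behaviour the note above describes.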
- The VPN create option is available in the Hybrid Connectivity section.
- If you want to configure the GCP end of the VPN, use the Google Compute Engine VPN section of the Create VPN form, which is where you specify information about the Google Cloud end of the VPN connection.
- Global dynamic routing is used to learn all routes on a network. Use it if you want the router on a tunnel you are creating to learn routes from all GCP regions on the network. The autonomous system number (ASN) is a number used to identify a cloud router on a network.
- When using gcloud to create a VPN, you need to create forwarding rules, tunnels, and gateways, so the following gcloud commands would all be used: gcloud compute forwarding-rules, gcloud compute target-vpn-gateways, and gcloud compute vpn-tunnels.
- When you create a cloud router, you need to assign an ASN for the BGP protocol.
- If a remote component in your network has failed, resulting in a transient network error, a gsutil command you submit will fail. By default, gsutil will retry using a truncated binary exponential backoff strategy:
- Wait a random period between [0..1] seconds and retry;
- If that fails, wait a random period between [0..2] seconds and retry;
- If that fails, wait a random period between [0..4] seconds and retry;
- And so on, up to a configurable maximum number of retries (default = 23), with each retry period bounded by a configurable maximum period of time (default = 60 seconds).
Thus, by default, gsutil will retry 23 times over 1+2+4+8+16+32+60… seconds for about 10 minutes. You can adjust the number of retries and maximum delay of any individual retry by editing the num_retries and max_retry_delay configuration variables in the “[Boto]” section of the .boto config file. Most users shouldn’t need to change these values. For data transfers (the gsutil cp and rsync commands), gsutil provides additional retry functionality, in the form of resumable transfers. Essentially, a transfer that was interrupted because of a transient error can be restarted without starting over from scratch. For more details about this, see the “RESUMABLE TRANSFERS” section of gsutil help.
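The retry schedule described above can be sketched in Python. This is an illustration of truncated binary exponential backoff in general, not gsutil's actual implementation. The function names are hypothetical, and the parameter names simply mirror the num_retries and max_retry_delay settings mentioned in the text.

```python
import random


def backoff_delays(num_retries: int = 23, max_retry_delay: float = 60.0,
                   rng=random.random):
    """Yield the random wait, in seconds, before each retry.

    The upper bound doubles on every attempt (1, 2, 4, ...) until it is
    truncated at max_retry_delay. rng must return a value in [0, 1).
    """
    for attempt in range(num_retries):
        ceiling = min(2 ** attempt, max_retry_delay)
        yield rng() * ceiling


def worst_case_total(num_retries: int = 23, max_retry_delay: float = 60.0) -> float:
    """Sum of the upper bounds: the longest the retries could possibly take."""
    return sum(min(2 ** attempt, max_retry_delay) for attempt in range(num_retries))


if __name__ == "__main__":
    delays = list(backoff_delays())
    print(f"{len(delays)} retries, {sum(delays):.1f} s waited in this run")
    print(f"worst case: {worst_case_total():.0f} s")
```

With the defaults, the upper bounds add up to 1083 seconds. Each wait is drawn at random from below its bound, so the average total is about half of that, which lines up with the "about 10 minutes" figure quoted above.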