Section 4: Implementing observability practices (~20% of the exam)
4.1 Managing logs. Considerations include:
● Collecting and importing logs (e.g., Cloud Logging agent, Cloud Audit Logs, VPC Flow
Logs, Cloud Service Mesh)
● Logging optimization (e.g., filtering, sampling, exclusions, cost, source considerations)
● Exporting logs (e.g., BigQuery, Pub/Sub, for auditing)
● Retaining logs
● Analyzing logs
● Handling sensitive data (e.g., personally identifiable information [PII], protected health
information [PHI])
4.2 Managing metrics. Considerations include:
● Collecting and analyzing metrics (e.g., application, platform, networking, Cloud Service
Mesh, Google Cloud Managed Service for Prometheus, hybrid/multi-cloud)
● Creating custom metrics from logs
● Using Metrics Explorer for ad hoc metric analysis
● Creating synthetic monitors
4.3 Managing dashboards and alerts. Considerations include:
● Managing dashboards (e.g., creating, altering, sharing, playbooks)
● Configuring alerting and alerting policies (e.g., SLIs, SLOs, cost control)
● Widely used third-party alerting tools (Datadog, New Relic, Dynatrace, Zabbix, Nagios, Sematext, Grafana, SolarWinds, Splunk, and Azure Monitor)
Note: Cloud Trace (for request latency) and Cloud Profiler (for performance bottlenecks within specific code functions) are covered in the next part's questions.
GCP Cloud Logging vs Error Reporting vs Cloud Monitoring
Cloud Monitoring is for metrics: it can monitor 1,500+ metrics across 100+ resources. These can be system resources such as CPU, disk, memory, and network, or
application metrics such as latency, throughput, response time, error rate, and request rate.
Network metrics
Jitter: An indicator of network stability
Throughput: A measure of how many messages or packets reach their destination
Latency: The time it takes for data to travel across a network
Packet loss: The number of data packets that are lost during transmission
Round trip time: The time it takes for data to travel from a device to a server and back again
Bandwidth: The maximum rate at which data can be transmitted over a network connection
Error rate: The share of transmitted packets or requests that fail or arrive corrupted; it affects user experience, business operations, and network efficiency
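The definitions above can be turned into a small calculation. The sketch below is only illustrative: the function name `network_metrics` and its parameters are made up for this example, and jitter is taken here as the mean absolute difference between consecutive round-trip times, which is one common simplification rather than the only definition.

```python
from statistics import mean


def network_metrics(rtts_ms, sent, received, bytes_delivered, window_s):
    """Summarise one measurement window (hypothetical helper).

    rtts_ms: round-trip times in ms for the probes that got a reply.
    sent / received: number of probe packets sent and answered.
    bytes_delivered: payload bytes that reached the destination.
    window_s: length of the measurement window in seconds.
    """
    # Jitter (assumption): mean absolute change between consecutive RTTs.
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return {
        "avg_rtt_ms": mean(rtts_ms),
        "jitter_ms": mean(diffs) if diffs else 0.0,
        "packet_loss_pct": 100.0 * (sent - received) / sent,
        "throughput_bps": 8 * bytes_delivered / window_s,
    }


if __name__ == "__main__":
    print(network_metrics([20, 24, 22, 30], sent=5, received=4,
                          bytes_delivered=1_000_000, window_s=10))
```

Bandwidth is not computed here because it is a property of the link (its maximum rate), while throughput is what was actually delivered in the window.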
Uptime checks & alerts
GCP Cloud Logging
Logs are collected automatically from Google-managed services such as GKE, App Engine, and Cloud Run. If you only use VMs for compute, you need to install an agent; as of 2025 it is called the Ops Agent. (Billing for logs is covered in the next part; ingestion usually costs about $0.50 per GiB at the time of writing.)
With the help of the Ops Agent, these logs (syslog, server logs such as access_log and error logs, and application logs) all end up in one place.
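For application logs, a minimal sketch of structured logging looks like this. It assumes a managed runtime such as Cloud Run or GKE, where a one-line JSON object written to stdout is parsed into a structured log entry and the `severity` and `message` fields are recognised. On a plain VM the Ops Agent would need to be configured to collect and parse the output. The helper names `format_entry` and `log`, and the `component` field, are made up for this example.

```python
import json
import sys


def format_entry(severity, message, **fields):
    """Build one JSON log line; extra keyword arguments become extra fields."""
    entry = {"severity": severity, "message": message, **fields}
    return json.dumps(entry)


def log(severity, message, **fields):
    """Write the entry to stdout, one JSON object per line."""
    print(format_entry(severity, message, **fields), file=sys.stdout, flush=True)


if __name__ == "__main__":
    log("INFO", "order accepted", component="checkout")
    log("ERROR", "database timeout", component="checkout")
```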
Error reporting
Each error can be in one of these states: open, acknowledged, resolved, or muted.
GCP Error Reporting essentially “grabs” its information from Cloud Logging (for example: nginx errors >> Ops Agent >> Cloud Logging >> Error Reporting).
Cost of GCP Error Reporting: there is no extra cost beyond what you already pay for Cloud Logging, which is $0.50 per GiB of logs ingested, with the first 50 GiB per project per month free. Storage is included for 30 days; after that there is an additional charge of $0.01 per GiB per month.
Benefit: instead of checking error logs (system, database, server, application, etc.) on each system the old-fashioned way, we can use Error Reporting to resolve errors faster.
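Because Error Reporting reads from Cloud Logging, an application mostly needs to log its exceptions in a shape that can be recognised. The sketch below is an assumption-laden example, not the only way to do it: it writes the stack trace into `message` and marks the entry with the `ReportedErrorEvent` type. The function name `error_entry` and the `checkout` / `1.0.0` service values are hypothetical.

```python
import json
import traceback

# Type marker that asks Error Reporting to treat the log entry as an error event.
REPORTED_ERROR_TYPE = (
    "type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent"
)


def error_entry(exc, service="checkout", version="1.0.0"):
    """Turn an exception into a one-line JSON log entry (hypothetical helper)."""
    stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return json.dumps({
        "severity": "ERROR",
        "@type": REPORTED_ERROR_TYPE,
        "message": stack,  # the full stack trace goes in the message
        "serviceContext": {"service": service, "version": version},
    })


if __name__ == "__main__":
    try:
        1 / 0
    except ZeroDivisionError as exc:
        print(error_entry(exc))
```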
Trace vs profiler in PHP
To identify application performance bottlenecks, we use tracing first and then profiling.
In PHP development, a “trace” generally refers to a detailed record of function calls and execution flow within your code, often used for debugging and understanding the path of data through a program, while a “profiler” is a tool that measures the performance of your code, identifying specific sections that are taking the most time to execute and thus helping you pinpoint performance bottlenecks.
Cloud Trace vs Cloud Profiler in GCP
Use Cloud Trace to understand why individual requests are slow in a distributed system (OS, application 1, application 2, database). If after that you still wonder where CPU time is spent, use Cloud Profiler, which looks mostly at application code.
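The difference can be shown without any Google Cloud library. The sketch below uses only the Python standard library as a stand-in: a tiny `span` context manager plays the role of tracing (how long each step of one request took), and `cProfile` plays the role of profiling (which functions burn the CPU). All names here (`span`, `spans`, `handle_request`, `query_database`, `render_page`, `profile`) are made up for this example and are not the Cloud Trace or Cloud Profiler APIs.

```python
import cProfile
import io
import pstats
import time
from contextlib import contextmanager

spans = []  # (name, seconds) pairs, a stand-in for trace spans


@contextmanager
def span(name):
    """Record how long the wrapped step of a request took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, time.perf_counter() - start))


def query_database():
    time.sleep(0.05)  # simulated remote call: slow, but uses almost no CPU


def render_page():
    return sum(i * i for i in range(200_000))  # simulated CPU-heavy work


def handle_request():
    with span("handle_request"):
        with span("query_database"):
            query_database()
        with span("render_page"):
            render_page()


def profile(func):
    """Run func under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()


if __name__ == "__main__":
    handle_request()
    for name, seconds in spans:
        print(f"{name}: {seconds * 1000:.1f} ms")
    print(profile(render_page))
```

The spans tell you which step of the request is slow; the profile tells you which function inside that step is responsible.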
Tracing is about describing transactions.
GCP Cloud Trace
Enable the Cloud Trace API in your Google Cloud project (this can be done through the Google Cloud console), then install the appropriate client library for your programming language to start sending traces from your application.
Cloud Profiler
The types of profiling available with Cloud Profiler are listed below. You may also use Prometheus, New Relic, or Datadog for measuring application performance.
Profile type | Go | Java | Node.js | Python |
---|---|---|---|---|
CPU time | Y | Y | | Y |
Heap memory | Y | Y | Y | |
Allocated heap | Y | | | |
Contention | Y | | | |
Threads | Y | | | |
Wall time | | Y | Y | Y |
GCP Observability Pricing
All logs are priced, including VPC Flow Logs (based on storage and querying).
Cloud Logging
$0.50/GiB: a one-time charge for streaming logs into log bucket storage for indexing, querying, and analysis; it includes up to 30 days of storage in log buckets. The first 50 GiB per project per month is free.
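As a quick sanity check on a bill, the ingestion rate above can be put into a few lines. This is a rough estimate only, using the figures quoted in this section ($0.50/GiB, first 50 GiB per project per month free); the function name `logging_ingest_cost` is made up, and retention beyond 30 days is not included.

```python
FREE_GIB_PER_PROJECT = 50    # free allotment per project per month
INGEST_USD_PER_GIB = 0.50    # rate quoted in this section


def logging_ingest_cost(gib_ingested):
    """Estimated monthly ingestion cost in USD for one project."""
    billable = max(0.0, gib_ingested - FREE_GIB_PER_PROJECT)
    return billable * INGEST_USD_PER_GIB


if __name__ == "__main__":
    for gib in (30, 50, 250):
        print(f"{gib} GiB -> ${logging_ingest_cost(gib):.2f}")
```

For example, 250 GiB in a month leaves 200 GiB billable, which comes to $100.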
To keep costs under control, it is better to create a budget, quota limits, and a GCP billing alert for logging.
Cloud Monitoring:
Metrics are charged by bytes ingested (all Monitoring data except data ingested by using Managed Service for Prometheus):
Free: first 150 MiB per billing account
$0.2580/MiB: 150–100,000 MiB
$0.1510/MiB: next 100,000–250,000 MiB
$0.0610/MiB: >250,000 MiB
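Tiered pricing is easy to get wrong by hand, so here is a small sketch of the arithmetic. It assumes the tiers quoted in this section are still current and that each rate applies only to the MiB that fall inside its tier; the names `TIERS` and `monitoring_cost` are made up for this example.

```python
# (upper bound of tier in MiB, USD per MiB), figures as quoted in this section
TIERS = [
    (150, 0.0),                # free allotment per billing account
    (100_000, 0.2580),
    (250_000, 0.1510),
    (float("inf"), 0.0610),
]


def monitoring_cost(mib_ingested):
    """Estimated monthly cost in USD for metric data ingested."""
    cost, lower = 0.0, 0
    for upper, rate in TIERS:
        if mib_ingested > lower:
            # Charge only the part of the volume that falls inside this tier.
            cost += (min(mib_ingested, upper) - lower) * rate
        lower = upper
    return cost


if __name__ == "__main__":
    for mib in (100, 1_150, 150_000):
        print(f"{mib} MiB -> ${monitoring_cost(mib):,.2f}")
```

With these figures, 1,150 MiB costs 1,000 × $0.2580 = $258, because the first 150 MiB are free.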
OpenTelemetry
An OpenTelemetry SDK can export telemetry signals (traces, metrics, logs) directly to a backend by using the OpenTelemetry Protocol (OTLP). Alternatively, the SDK can use the OTLP exporter to send those signals to a collector, which then forwards them to a backend.