This repo houses all the GCP code for the chromeos-prebuilts
project.
Most of the GCP resources are defined and deployed through a terraform+annealing setup as per go/cciac. The terraform code for the resources is defined in the chromeos_prebuilts folder.
Some resources are deployed through Cloud Build, and some are managed and deployed manually. Details on how the different resource types are managed are documented below.
We're using Cloud Functions as a serverless platform for the lookup service.
The entry points for the Cloud Functions are defined in `cloud_functions/main.py`. Each entry point has a `functions_framework` decorator, which determines how the function receives its arguments based on the specified signature type.
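For illustration, a minimal sketch of what these entry points look like (the handler bodies here are placeholders, not the real service logic):

```python
import functions_framework

@functions_framework.http
def lookup_service(request):
    # http signature type: the framework passes a flask.Request.
    name = request.args.get("name", "world")
    return f"Hello {name}", 200

@functions_framework.cloud_event
def update_service(cloud_event):
    # cloud_event signature type: the framework passes a CloudEvent,
    # e.g. a Pub/Sub message delivered through an Eventarc trigger.
    print(cloud_event.data)
```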
All changes, once uploaded to Gerrit, are deployed to staging automatically. All changes, once merged, are deployed to production automatically.
We use Proctor/Cloud Build Integrations (go/gcb-ggob) to automatically deploy changes. A Cloud Build build is triggered for the staging environment whenever a new Gerrit patchset is uploaded to the `prebuilts-cloud` project. See here for details on the Cloud Build trigger setup through Proctor.
During a build, a few things happen:
A Cloud Build trigger can be run manually by following the steps here. Once a CL is successfully merged, a separate trigger deploys the latest source code to production.
The scripts and config files for the Cloud Build deployments are in the `cloudbuild/` folder.
The main way to test deployed Cloud Functions is via their triggers, including HTTP requests and Pub/Sub. `staging-lookup-service` is set up to accept HTTP triggers via the `main` function. A function cannot have more than one trigger associated with it, but a trigger can be associated with many functions (as long as the functions are unique).
To test `staging-lookup-service`, go to the Testing tab and click Test the Function. Output is only available if the Cloud Function is deployed using the 1st gen environment.
To develop and test cloud functions locally, we need to be able to connect to the Cloud SQL instance from a local machine/cloudtop. Hence, Public IP has been enabled on the `prebuilts-staging` instance to accommodate this, and the database can be accessed locally through the Cloud SQL Auth Proxy. Public IP should not be enabled in production.
Note: We are running the cloud function locally but are using GCP's staging environment, including staging Cloud SQL database and Secret Manager instances.
Steps to run cloud functions locally (all of these steps are done outside the chroot):

1. Add `cloud-sql-proxy` to your `$PATH`.
2. Start the proxy: `cloud-sql-proxy --address 127.0.0.1 --port 5432 chromeos-prebuilts:us-central1:prebuilts-staging`
3. Env variables used by the services are defined in the `scripts/.env.defaults` file. Add a `scripts/.env.local` file to override the values; this can be helpful for local testing without checking values into the repo (for local testing and development, use the cloud-sql-proxy database host and port).
4. Run `./scripts/run_server_local.sh` (this script also sets up a virtual env and installs the required packages).
5. In `.env.local`, update `FUNCTION_TARGET` and `FUNCTION_SIGNATURE_TYPE` based on whether you want to run the lookup or the update service:
   - Lookup service: `FUNCTION_TARGET=lookup_service` and `FUNCTION_SIGNATURE_TYPE=http`
   - Update service: `FUNCTION_TARGET=update_service` and `FUNCTION_SIGNATURE_TYPE=cloud_event`
6. Run `./scripts/test_cloud_function_local.sh -r lookup` to test the lookup service; the test request data can be modified in the `./scripts/test_cloud_function_local.sh` file (see the sketch after this list for an equivalent raw HTTP request).

Note: The cloud function auto-restarts when any of the source files are changed.
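Once the server is up, the HTTP-triggered function can also be exercised with a plain request; a minimal sketch, assuming the functions-framework default port of 8080 and a placeholder query parameter:

```python
import requests

# Hit the locally running lookup_service (port 8080 is the
# functions-framework default; the parameter name is a placeholder).
resp = requests.get("http://127.0.0.1:8080", params={"request": "<encoded-proto>"})
print(resp.status_code, resp.text)
```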
We use pytest as our unit-testing framework and VPython to run unit tests in a Python VirtualEnv. See here for a list of available wheels. `vpython3 run_tests.py` will run all unit tests by default.
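A minimal sketch of what a pytest test file looks like (the helper and assertions here are hypothetical; real tests live next to the modules they cover and are collected by `run_tests.py`):

```python
import base64

def _encode(data: bytes) -> str:
    # Hypothetical helper mirroring the URL-safe base64 encoding used
    # for lookup service payloads.
    return base64.urlsafe_b64encode(data).decode("utf-8")

def test_urlsafe_b64_roundtrip():
    payload = b"binhost-data"
    assert base64.urlsafe_b64decode(_encode(payload)) == payload
```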
We have two Cloud SQL instances:

- `prebuilts` - the production instance
- `prebuilts-staging` - the staging instance

Each instance currently contains a `lookup_service` database for the project.
During initial development, we are manually deploying changes to the `staging` instances of both Cloud SQL and Cloud Functions. DO NOT deploy to production instances, as the production deployment process will be automated later.
To manually deploy changes:

1. Upload the SQL file to the `cloudsql-manual-deployments` bucket.
2. On the Cloud SQL instance page, click IMPORT.
3. Select the uploaded file and choose `lookup_service` as the database destination.
4. Click Import and wait a few minutes for the import to complete.

The following steps can be used to connect to the database and verify deployments and/or query data:
1. Start the proxy: `./cloud-sql-proxy --port 5432 chromeos-prebuilts:us-central1:prebuilts-staging`
2. Connect with `psql "host=127.0.0.1 sslmode=disable dbname=lookup_service user=postgres"`
We're using Pub/Sub to receive messages and update metadata for snapshots, binhosts, etc. in the database. The `update_service` cloud functions receive messages as cloud events through Pub/Sub subscriptions via Eventarc triggers. Each use case has its own Pub/Sub topic (e.g. `update_snapshot_data`, `update_binhost_data`), and the corresponding cloud function processes the messages and performs database operations. With 2nd gen cloud functions, each function can only have one trigger, so each Pub/Sub topic has its own cloud function for processing messages.
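A sketch of how one of these functions unwraps a message (the field layout follows the standard Pub/Sub-over-Eventarc envelope; the handler body is a placeholder):

```python
import base64
import functions_framework

@functions_framework.cloud_event
def update_service(cloud_event):
    # Eventarc wraps the Pub/Sub message; the payload arrives
    # base64-encoded under message.data.
    raw = base64.b64decode(cloud_event.data["message"]["data"])
    # raw is a serialized protocol buffer (see the next section);
    # parse it and apply the database updates here.
    print(f"received {len(raw)} bytes")
```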
PubSub topics are defined and deployed through terraform, in the chromeos_prebuilts folder.
We're using protocol buffers to have a consistent data format when sending and retrieving Pub/Sub messages.
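As an example, publishing a serialized proto to one of the update topics might look like the following sketch (project and topic names are taken from this doc; the message class is left abstract because its exact name is defined in prebuilts_cloud.proto):

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("chromeos-prebuilts", "update_snapshot_data")

# payload = <message from prebuilts_cloud.proto>.SerializeToString()
payload = b"<serialized-proto-bytes>"
future = publisher.publish(topic_path, data=payload)
print(future.result())  # message id assigned by Pub/Sub
```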
The `scripts/gen_proto.sh` script compiles the protos and puts the generated files in `cloud_functions/protobuf/chromiumos/`.

The binhost lookup service is an HTTP GET endpoint running in a cloud function. The protocol buffer definitions for the request and response are defined in prebuilts_cloud.proto.
The query parameter to be sent with the request is a `LookupBinhostsRequest` object encoded as a URL-safe base64 byte object. The response is a `LookupBinhostsResponse` object encoded as a URL-safe base64 byte object, sent in the response body.
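Putting that together, a client call might look like the following sketch (the endpoint URL, the query-parameter name, and the generated-module import path are assumptions):

```python
import base64
import requests

from protobuf.chromiumos import prebuilts_cloud_pb2  # generated module path assumed

req = prebuilts_cloud_pb2.LookupBinhostsRequest()  # fill in request fields
encoded = base64.urlsafe_b64encode(req.SerializeToString())

resp = requests.get(
    "https://<lookup-service-url>",  # hypothetical endpoint
    params={"request": encoded},     # hypothetical parameter name
)
lookup_resp = prebuilts_cloud_pb2.LookupBinhostsResponse()
lookup_resp.ParseFromString(base64.urlsafe_b64decode(resp.content))
```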
GCP Secret Manager is used to store sensitive information that can be accessed by the cloud functions. The secrets are created and managed manually through the cloud console. Secrets being used:
NOTE: The name of each of these secrets is prefixed by the environment name, e.g. `staging-prebuilts-db`.
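A sketch of how a cloud function reads one of these secrets (the secret name follows the example above; the project id is assumed):

```python
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
name = "projects/chromeos-prebuilts/secrets/staging-prebuilts-db/versions/latest"
response = client.access_secret_version(request={"name": name})
secret_value = response.payload.data.decode("utf-8")
```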
The cloud functions have alerts set up through GCP, and the ChromeOS Build team is notified via email when an alert metric breaches the configured threshold (the alerts are configured via terraform).
Incidents can be viewed and managed on the Alerting page in the cloud console. Gen 2 cloud functions use the Cloud Run service, so the best way to view the logs is to go to the specific cloud function page -> Logs -> "View in Logs Explorer" icon. The logs can be filtered based on time, severity, and other criteria by altering the query. Incidents auto-close when the metrics for the specific alert go back within the threshold range.
Errors in the lookup-service-binhosts cloud function usually mean that builders/developers did not get the required binhosts; these alerts need to be investigated and fixed so that builds stay fast and efficient.

Errors in the update-service-snapshot-data and update-service-binhost-data cloud functions usually mean that data from the builders was not saved properly in the database. These alerts are very critical, and we need to ensure that the messages are reprocessed after the issue is fixed.