Symlink this directory into your appengine app.
```shell
cd infra/appengine/myapp
ln -s ../../appengine_module/gae_ts_mon .
```
Add the scheduled task to your `cron.yaml` file. Create it if you don't have one already.
```yaml
cron:
- description: Send ts_mon metrics
  url: /internal/cron/ts_mon/send
  schedule: every 1 minutes
  target: <cron_module_name>  # optional
```
Only if your application is using `webapp2`:
Include the URL handler for that scheduled task in your `app.yaml` file.
```yaml
includes:
- gae_ts_mon  # handles /internal/cron/ts_mon/send
```
Initialize the library in your request handler.
```python
import gae_ts_mon

[...]

app = webapp2.WSGIApplication(my_handlers)
gae_ts_mon.initialize_prod(app)
```
You must do this in every top-level request handler that's listed in your `app.yaml` to ensure metrics are registered no matter which type of request an instance receives first.
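For illustration, a second handler script would repeat the call. A minimal sketch, assuming a hypothetical `backend.py` listed as its own script entry in `app.yaml`:

```python
# backend.py -- a hypothetical second top-level handler script in app.yaml.
import webapp2

import gae_ts_mon

class WorkHandler(webapp2.RequestHandler):  # hypothetical handler
  def get(self):
    self.response.write('ok')

app = webapp2.WSGIApplication([('/work', WorkHandler)])
gae_ts_mon.initialize_prod(app)  # repeated here, not only in the main script
```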
If your app does not contain a `webapp2.WSGIApplication` instance (e.g. it's a Cloud Endpoints only app), then pass `None` as the first argument to `gae_ts_mon.initialize`.
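Initialization then has no WSGI application to register. A minimal sketch, assuming an Endpoints-only app (shown with `initialize_prod`, on the assumption that it takes the same first argument as `initialize`):

```python
import gae_ts_mon

# No webapp2.WSGIApplication exists in this app, so pass None instead.
gae_ts_mon.initialize_prod(None)
```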
There are multiple variations of the `gae_ts_mon.initialize` method:

- `gae_ts_mon.initialize`: maintained for backwards compatibility for apps which have not been flipped over to either `initialize_adhoc()` or `initialize_prod()`. New apps must use `initialize_prod()`.
- `gae_ts_mon.initialize_adhoc`: by default this uses the shared prodx-mon-chrome-infra service account for authentication with Prod X Mon. This is deprecated and no longer recommended.
- `gae_ts_mon.initialize_prod`: this uses the default App Engine service account for authentication with Prod X Mon. All new apps must use this method.

The `gae_ts_mon.initialize` method takes an optional parameter:
- `is_enabled_fn` (function with no arguments returning `bool`): a callback to enable or disable sending the actual metrics. Default: `None`, which is equivalent to `lambda: True`. The callback is called on every metrics flush and takes effect immediately. Make sure the callback is efficient, or it will slow down your requests.
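For instance, metrics can be toggled dynamically through such a callback. A minimal sketch, assuming `initialize_prod` accepts the same `is_enabled_fn` keyword, and using a hypothetical `settings.metrics_enabled()` helper that does a cheap, cached lookup:

```python
import webapp2

import gae_ts_mon

def _metrics_enabled():
  # Hypothetical cheap check (e.g. a cached config flag); it runs on every
  # metrics flush, so it must be fast.
  return settings.metrics_enabled()

app = webapp2.WSGIApplication(my_handlers)
gae_ts_mon.initialize_prod(app, is_enabled_fn=_metrics_enabled)
```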
Instrument all Cloud Endpoints methods, if you have any, by adding a decorator:

```python
@gae_ts_mon.instrument_endpoint()
@endpoints.method(...)
def your_method(self, request):
  ...
```
Give your app's service account permission to send metrics to the API. There are two ways to do this; the latter has been deprecated.
You need to follow go/monapi-onboarding to allow the project to send metrics to Prod X Mon. It is advised to use the default App Engine service account (which looks like `$GCP_PROJECT_NAME@appspot.gserviceaccount.com`) and grant it the role `roles/prodx.metricPublisher`.
(Deprecated) You need the email address of your app's "App Engine default service account" from the IAM & Admin page in the cloud console. It'll look something like `app-id@appspot.gserviceaccount.com`. Add it as a "Service Account Actor" of the "App Engine Metric Publishers" service account in the google.com:prodx-mon-chrome-infra project by selecting it from the list and clicking "Permissions". If you see the error "You do not have viewing permissions for the selected resource.", please ask the current chrome-trooper to do it for you. You'll then need to pass the service account `app-engine-metric-publishers@prodx-mon-chrome-infra.google.com.iam.gserviceaccount.com` into the `gae_ts_mon.initialize` method (see above). You also need to enable the Google Identity and Access Management (IAM) API for your project if it's not enabled already.
You're done! You can now use ts_mon metrics exactly as you normally would using the `infra_libs.ts_mon` module. Here's a quick example, but see the timeseries monitoring docs for more information.
```python
from infra_libs import ts_mon

class MyHandler(webapp2.RequestHandler):
  goats_teleported = ts_mon.CounterMetric(
      'goats/teleported',
      'Number of goats teleported',
      None)

  def get(self):
    count = goat_teleporter.teleport()
    self.goats_teleported.increment(count)
    self.response.write('Teleported %d goats this time' % count)
```
Multiple App Engine modules are fully supported; the module name will appear as the `job_name` field in metrics when they are exported.
The scheduled task only needs to run in one module.
Normally, each instance of an App Engine app sends its own set of metrics at its own pace. This can be a problem, however, if you want to report a metric that only makes sense globally, e.g. the count of certain Datastore entities computed once a minute in a cron job.
Setting such a metric in an individual instance is incorrect: the cron job will run in a randomly selected instance, and that instance will keep sending the same stale value until the cron job picks it again. The receiving monitoring endpoint cannot tell which value is the most recent, and by default it will try to sum up the values from all the instances, resulting in an incorrect value.
A “global” metric is a metric that is not tied to an instance, and is guaranteed to be computed and sent at most once per minute, globally. Here's an example of how to set up a global metric:
```python
from infra_libs import ts_mon

# Override default target fields for app-global metrics.
TARGET_FIELDS = {
    'job_name': '',  # module name
    'hostname': '',  # version
    'task_num': 0,   # instance ID
}

remaining = ts_mon.GaugeMetric('goats/remaining', '...', None)
in_flight = ts_mon.GaugeMetric('goats/in_flight', '...', None)

def set_global_metrics():
  # Query some global resource, e.g. Datastore.
  remaining.set(len(goats_to_teleport()), target_fields=TARGET_FIELDS)
  in_flight.set(len(goats_being_teleported()), target_fields=TARGET_FIELDS)

ts_mon.register_global_metrics([remaining, in_flight])
ts_mon.register_global_metrics_callback('my callback', set_global_metrics)
```
The registered callback will be called at most once per minute, and only one instance will be running it at a time. A global metric is then cleared the moment it is sent. Thus, global metrics will be sent at the correct intervals, regardless of the number of instances the app is currently running.
Note also the use of the `target_fields` parameter: it overrides the default target fields which would otherwise distinguish the metric per module, version, or instance ID. Using `target_fields` in regular, "local" metrics is not allowed, as it would result in errors on the monitoring endpoint and loss of data.