Setting up timeseries monitoring on App Engine.

  1. Symlink this directory into your appengine app.

    cd infra/appengine/myapp
    ln -s ../../appengine_module/gae_ts_mon .
  2. Add the scheduled task to your cron.yaml file. Create it if you don't have one already.

    - description: Send ts_mon metrics
      url: /internal/cron/ts_mon/send
      schedule: every 1 minutes
      target: <cron_module_name>  # optional
  3. Only if your application is using webapp2:

    1. Include the URL handler for that scheduled task in your app.yaml file.

      includes:
      - gae_ts_mon  # handles /internal/cron/ts_mon/send
  4. Initialize the library in your request handler.

    import gae_ts_mon
    import webapp2
    app = webapp2.WSGIApplication(my_handlers)
    gae_ts_mon.initialize(app)  # first argument: the WSGIApplication, or None

    You must do this in every top-level request handler that's listed in your app.yaml to ensure metrics are registered no matter which type of request an instance receives first.

    If your app does not contain a webapp2.WSGIApplication instance (e.g. it's a Cloud Endpoints only app), then pass None as the first argument to gae_ts_mon.initialize.

    There are multiple variations of the gae_ts_mon.initialize method:

    • gae_ts_mon.initialize: Maintained for backwards compatibility for apps which have not been flipped over to either initialize_adhoc() or initialize_prod(). New apps must use initialize_prod().
    • gae_ts_mon.initialize_adhoc: By default this uses the shared prodx-mon-chrome-infra service account for authentication with Prod X Mon. This is deprecated and no longer recommended.
    • gae_ts_mon.initialize_prod: This uses the default App Engine service account for authentication with Prod X Mon. All new apps must use this method.

    The gae_ts_mon.initialize method takes an optional parameter:

    • is_enabled_fn (function with no arguments returning bool): a callback to enable/disable sending the actual metrics. Default: None, which is equivalent to lambda: True. The callback is called on every metrics flush, and takes effect immediately. Make sure the callback is efficient, or it will slow down your requests.
  5. Instrument all Cloud Endpoints methods, if you have any, by adding a decorator:

    @gae_ts_mon.instrument_endpoint()
    def your_method(self, request):
  6. Give your app's service account permission to send metrics to the API. There are two ways to do this; the second one has been deprecated.


    Recommended: follow go/monapi-onboarding to allow the project to send metrics to Prod X Mon. It is advised to grant the role “roles/prodx.metricPublisher” to the default App Engine service account.


    Deprecated: you need the email address of your app’s “App Engine default service account” from the IAM & Admin page in the cloud console. Add it as a “Service Account Actor” of the “App Engine Metric Publishers” service account in the project by selecting it from the list and clicking “Permissions”. If you see the error “You do not have viewing permissions for the selected resource.”, ask the current chrome-trooper to do it for you. You’ll then need to pass the service account to the gae_ts_mon.initialize method (see above). You also need to enable the Google Identity and Access Management (IAM) API for your project if it’s not enabled already.

You’re done! You can now use ts_mon metrics exactly as you normally would using the infra_libs.ts_mon module. Here’s a quick example, but see the timeseries monitoring docs for more information.

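import webapp2
from infra_libs import ts_mon

class MyHandler(webapp2.RequestHandler):
  goats_teleported = ts_mon.CounterMetric(
      'goats/teleported',  # metric name (illustrative)
      'Number of goats teleported',  # description
      None)  # no metric fields

  def get(self):
    # goat_teleporter stands in for your own application code.
    count = goat_teleporter.teleport()
    self.goats_teleported.increment_by(count)

    self.response.write('Teleported %d goats this time' % count)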
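
The is_enabled_fn callback described in step 4 runs on every metrics flush, so a slow lookup there costs request latency. One way to keep it cheap is to cache the result for a short time. The following is a minimal sketch, not part of gae_ts_mon: make_cached_flag and load_flag_from_config are hypothetical names, and the 60-second TTL is an arbitrary assumption. Note the trade-off: with a cache, a change to the underlying flag takes effect only after the cached value expires.

```python
import time


def make_cached_flag(load_flag, ttl_seconds=60.0, clock=time.time):
    """Wrap a slow no-argument bool lookup so repeated calls hit a cache."""
    state = {'value': True, 'expires': None}

    def is_enabled():
        now = clock()
        # Reload only on the first call or after the cached value expires.
        if state['expires'] is None or now >= state['expires']:
            state['value'] = bool(load_flag())
            state['expires'] = now + ttl_seconds
        return state['value']

    return is_enabled


def load_flag_from_config():
    """Hypothetical slow lookup, e.g. reading a config entity."""
    return True


is_enabled = make_cached_flag(load_flag_from_config, ttl_seconds=60.0)

# The wrapped function would then be passed as the optional parameter:
#   gae_ts_mon.initialize(app, is_enabled_fn=is_enabled)
```

The clock argument exists only so the cache can be exercised with a fake clock in tests.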

Appengine Modules

Multiple Appengine modules are fully supported. The module name will appear as the job_name field in metrics when they are exported.

The scheduled task only needs to run in one module.

Global Metrics

Normally, each instance of an App Engine app sends its own set of metrics at its own pace. This can be a problem, however, if you want to report a metric that only makes sense globally, e.g. the count of certain Datastore entities computed once a minute in a cron job.

Setting such a metric in an individual instance is incorrect, since a cron job will run in a randomly selected instance, and that instance will continue to send the same old value until it's picked by the cron job again. The receiving monitoring endpoint will not be able to tell which metric is the most recent, and by default will try to sum up the values from all the instances, resulting in a wrong value.
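
The failure mode can be illustrated with a small simulation. This is a toy model with made-up instance names and counts, not the behaviour of any real monitoring endpoint: each cron run lands on a different instance, every instance keeps reporting the last value it was given, and the receiver sums across instances.

```python
def summed_view(instance_values):
    """Model a receiver that sums the latest value reported by each instance."""
    return sum(instance_values.values())


# instance ID -> last gauge value that instance set (and keeps re-sending)
reported = {}

# (instance picked by the cron job, true global count at that time)
cron_runs = [
    ('instance-a', 100),
    ('instance-b', 90),
    ('instance-c', 80),
]

for instance_id, true_count in cron_runs:
    # Only the instance that ran the cron job gets a fresh value;
    # the others keep their stale ones.
    reported[instance_id] = true_count

true_value = cron_runs[-1][1]
aggregated = summed_view(reported)
print(true_value, aggregated)  # prints: 80 270
```

The true count is 80, but the summed view is 270 because two stale values are still being reported.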

A “global” metric is a metric that is not tied to an instance, and is guaranteed to be computed and sent at most once per minute, globally. Here's an example of how to set up a global metric:

from infra_libs import ts_mon

# Override default target fields for app-global metrics.
TARGET_FIELDS = {
    'job_name':  '',  # module name
    'hostname': '',  # version
    'task_num':  0,  # instance ID
}

remaining = ts_mon.GaugeMetric('goats/remaining', '...', None)
in_flight = ts_mon.GaugeMetric('goats/in_flight', '...', None)

def set_global_metrics():
  # Query some global resource, e.g. Datastore
  remaining.set(len(goats_to_teleport()), target_fields=TARGET_FIELDS)
  in_flight.set(len(goats_being_teleported()), target_fields=TARGET_FIELDS)

ts_mon.register_global_metrics([remaining, in_flight])
ts_mon.register_global_metrics_callback('my callback', set_global_metrics)

The registered callback will be called at most once per minute, and only one instance will be running it at a time. A global metric is then cleared the moment it is sent. Thus, global metrics will be sent at the correct intervals, regardless of the number of instances the app is currently running.
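
The "at most once per minute, one instance at a time" guarantee can be pictured with the following toy model. It is only a sketch of the semantics, not how ts_mon implements them: GlobalCallbackRunner is a hypothetical class, and the in-process lock and timestamp stand in for whatever shared, cross-instance state the real library uses.

```python
import threading


class GlobalCallbackRunner(object):
    """Toy model: run callbacks at most once per interval, one caller at a time."""

    def __init__(self, interval_seconds=60.0):
        self._interval = interval_seconds
        self._last_run = None          # stands in for shared cross-instance state
        self._lock = threading.Lock()  # stands in for a cross-instance lock
        self._callbacks = {}

    def register(self, name, fn):
        self._callbacks[name] = fn

    def maybe_run(self, now):
        """Run all callbacks if the interval has elapsed; return True if they ran."""
        if not self._lock.acquire(False):
            return False  # another caller is running the callbacks right now
        try:
            if (self._last_run is not None
                    and now - self._last_run < self._interval):
                return False  # too soon since the last run
            self._last_run = now
            for fn in self._callbacks.values():
                fn()
            return True
        finally:
            self._lock.release()


runs = []
runner = GlobalCallbackRunner(interval_seconds=60.0)
runner.register('my callback', lambda: runs.append('ran'))
first = runner.maybe_run(now=0.0)    # runs: nothing has run yet
second = runner.maybe_run(now=30.0)  # skipped: only 30 seconds have passed
third = runner.maybe_run(now=61.0)   # runs: the interval has elapsed
```

However many callers invoke maybe_run within a minute, the callbacks execute once, which is why the number of instances does not affect how often a global metric is computed.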

Note also the use of the target_fields parameter: it overrides the default target fields which would otherwise distinguish the metric per module, version, or instance ID. Using target_fields in regular, “local” metrics is not allowed, as it would result in errors on the monitoring endpoint, and loss of data.