LogDog
======
LogDog is a high-performance log collection and dissemination platform. It is
designed to collect log data from a large number of cooperative individual
sources and make it available to users and systems for consumption. It is
composed of several services and tools that work cooperatively to provide a
reliable log streaming platform.
Like other LUCI components, LogDog primarily aims to be useful to the
[Chromium](https://www.chromium.org/) project.
LogDog offers several useful features:
* Log data is streamed, and is consequently available the moment that it is
ingested into the system.
* Flexible hierarchical log namespaces for organization and navigation.
* Recognition of different projects, and application of different ACLs for each
project.
* Able to stream text, binary data, or records.
* Long-term (possibly indefinite) log data storage and access.
* Log data is sourced from read-forward streams (think files, sockets, etc.).
* Leverages the LUCI Configuration Service for configuration and management.
* Log data is implemented as [protobufs](api/logpb/log.proto).
* The entire platform is written in Go.
* Rich metadata is collected and stored alongside log records.
* Built entirely on scalable platform technologies, targeting Google Cloud
Platform.
* Resource requirements scale linearly with log volume.
## APIs
Most applications will interact with a LogDog Coordinator instance via its
[Coordinator Logs API](api/endpoints/coordinator/logs/v1).
Chrome Operations currently runs a LogDog instance serving `*.chromium.org`,
located at [logs.chromium.org](https://logs.chromium.org). You can browse its
RPC explorer [here](https://logs.chromium.org/rpcexplorer/services/), or access
it from the command line with the `prpc` tool from depot_tools:
`prpc show logs.chromium.org`.
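For example, a stream's log entries can be fetched directly over pRPC. The
request below is illustrative only: it assumes the Coordinator registers its
Logs service as `logdog.Logs` and that the request accepts `project` and
`path` fields; consult the RPC explorer above for the authoritative schema.

```sh
# Fetch entries from a log stream over pRPC (illustrative request; verify
# the exact service name and request fields in the RPC explorer).
echo '{"project": "chromium", "path": "some/prefix/+/stream/name"}' \
  | prpc call logs.chromium.org logdog.Logs.Get
```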
## Life of a Log Stream
Log streams pass through several layers and states on their way from
generation to archival.
1. **Streaming**: A log stream is being emitted by a **Butler** instance and
pushed through the **Transport Layer** to the **Collector**.
1. **Pre-Registration**: The log stream hasn't been observed by a
**Collector** instance yet, and exists only in the mind of the **Butler**
and the **Transport** layer.
1. **Registered**: The log stream has been observed by a **Collector**
instance and successfully registered with the **Coordinator**. At this
point, it becomes queryable, listable, and the records that have been
loaded into **Intermediate Storage** are streamable.
1. **ArchivePending**: One of the following events causes the log stream to be
recognized as finished and an archival request to be dispatched to the
**Archivist** cluster:
    * The log stream's terminal entry is collected, and the terminal index is
      successfully registered with the **Coordinator**.
    * A sufficient amount of time has expired since the log stream's
      registration.
1. **Archived**: An **Archivist** instance has received an archival request for
the log stream, successfully executed the request according to its
parameters, and updated the log stream's state with the **Coordinator**.
Most of the lifecycle is hidden behind the Logs API endpoint by design. Users
need not distinguish between a stream that is streaming, has archival pending,
or has been archived: they issue the same `Get` requests and receive the same
log stream data.
A user may differentiate between a streaming and a complete log by observing its
terminal index, which will be `< 0` if the log stream is still streaming.
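As a minimal sketch of that check in Go (the `StreamState` type and its field
below are illustrative stand-ins, not the actual types generated from the Logs
API protos):

```go
package main

import "fmt"

// StreamState is an illustrative stand-in for the stream state metadata
// returned by the Coordinator Logs API; only the field relevant to
// completeness detection is shown.
type StreamState struct {
	// TerminalIndex is the index of the stream's final log entry, or a
	// negative value if the stream has not terminated yet.
	TerminalIndex int64
}

// isComplete reports whether the stream's terminal entry has been registered.
func isComplete(s *StreamState) bool {
	return s.TerminalIndex >= 0
}

func main() {
	fmt.Println(isComplete(&StreamState{TerminalIndex: -1}))   // false: still streaming
	fmt.Println(isComplete(&StreamState{TerminalIndex: 1042})) // true: terminal entry registered
}
```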
## Components
The LogDog platform consists of several components:
* [Coordinator](appengine/coordinator), a hosted service which serves log data
to users and manages the log stream lifecycle.
* [Butler](client/cmd/logdog_butler), which runs on each log stream producing
system and serves log data to the Collector for consumption.
* [Collector](server/cmd/logdog_collector), a microservice which takes log
stream data and ingests it into intermediate storage for streaming and
archival.
* [Archivist](server/cmd/logdog_archivist), a microservice which compacts
completed log streams and prepares them for long-term storage.
LogDog offers several log stream clients to query and consume log data:
* [LogDog](client/cmd/logdog), a CLI to query and view log streams.
* [Web App](/web/apps/logdog-app), a heavyweight log stream navigation
application built in [Polymer](https://www.polymer-project.org).
* [Web Viewer](/web/apps/logdog-view), a lightweight log stream viewer built in
[Polymer](https://www.polymer-project.org).
Additionally, LogDog is built on several abstract middleware technologies,
including:
* A **Transport**, a layer for the **Butler** to send data to the **Collector**.
* An **Intermediate Storage**, a fast, highly-accessible layer which stores log
data freshly ingested by the **Collector** until it can be archived.
* An **Archival Storage**, for cheap long-term file storage.
Log data is sent from the **Butler** through **Transport** to the **Collector**,
which stages it in **Intermediate Storage**. Once the log stream is complete
(or expired), the **Archivist** moves the data from **Intermediate Storage** to
**Archival Storage**, where it will permanently reside.
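Conceptually, each of these layers is a narrow interface that the services
program against. The Go sketch below is purely illustrative of that shape; the
real interfaces in the LogDog source differ in naming and detail.

```go
// Package middleware illustrates the abstract layers described above. These
// interfaces are hypothetical and do not mirror the actual LogDog source.
package middleware

import "context"

// Transport carries serialized log bundles from the Butler to the
// Collector (Cloud Pub/Sub in the Chromium deployment).
type Transport interface {
	Send(ctx context.Context, bundle []byte) error
}

// IntermediateStorage holds freshly-ingested log entries so they can be
// streamed before archival completes (Cloud BigTable in the Chromium
// deployment).
type IntermediateStorage interface {
	Put(ctx context.Context, path string, index int64, entry []byte) error
	Get(ctx context.Context, path string, index int64) ([]byte, error)
}

// ArchivalStorage is cheap, long-term storage for completed log streams
// (Cloud Storage in the Chromium deployment).
type ArchivalStorage interface {
	Archive(ctx context.Context, path string, entries [][]byte) error
}
```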
The Chromium-deployed LogDog service uses
[Google Cloud Platform](https://cloud.google.com/) for several of the middleware
layers:
* [Google AppEngine](https://cloud.google.com/appengine), a scaling application
hosting service.
* [Cloud Datastore](https://cloud.google.com/datastore/), a powerful
transactional NoSQL structured data storage system. This is used by the
Coordinator to store log stream state.
* [Cloud Pub/Sub](https://cloud.google.com/pubsub/), a publish / subscribe model
transport layer. This is used to ferry log data from **Butler** instances to
**Collector** instances for ingest.
* [Cloud BigTable](https://cloud.google.com/bigtable/), an unstructured
key/value store. This is used as **intermediate storage** for log stream
data.
* [Cloud Storage](https://cloud.google.com/storage/), used for long-term log
stream archival storage.
* [Container Engine](https://cloud.google.com/container-engine/), which manages
Kubernetes clusters. This is used to host the **Collector** and **Archivist**
microservices.
Additionally, other LUCI services are used, including:
* [Auth Service](https://github.com/luci/luci-py/tree/master/appengine/auth_service),
a configurable hosted access control system.
* [Configuration Service](https://github.com/luci/luci-py/tree/master/appengine/config_service),
a simple repository-based configuration service.
## Instantiation
To instantiate your own LogDog instance, you will need the following
prerequisites:
* A **Configuration Service** instance.
* A Google Cloud Platform project configured with:
* Datastore
* A Pub/Sub topic (Butler) and subscription (Collector) for log streaming.
* A Pub/Sub topic (Coordinator) and subscription (Archivist) for archival
coordination.
* A Container Engine instance for microservice hosting.
* A BigTable cluster.
* A Cloud Storage bucket for archival staging and storage.
Other compatible optional components include:
* An **Auth Service** instance to manage authentication. This is necessary if
something stricter than public read/write is desired.
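As a rough sketch, the Pub/Sub and Cloud Storage prerequisites above might be
provisioned with standard Google Cloud tooling along the following lines. All
resource names here are placeholders, and the BigTable cluster and Container
Engine setup are omitted because their flags vary by `gcloud` version; consult
the Cloud documentation for those.

```sh
# Log streaming: the Butler publishes, the Collector subscribes.
gcloud pubsub topics create logdog-butler-logs
gcloud pubsub subscriptions create logdog-collector --topic=logdog-butler-logs

# Archival coordination: the Coordinator publishes, the Archivist subscribes.
gcloud pubsub topics create logdog-archive-tasks
gcloud pubsub subscriptions create logdog-archivist --topic=logdog-archive-tasks

# Long-term archival storage bucket.
gsutil mb gs://my-logdog-archive
```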
### Config
The **Configuration Service** must have a valid service entry text protobuf for
this LogDog service (defined in
[svcconfig/config.proto](api/config/svcconfig/config.proto)).
### Coordinator
After deploying the Coordinator to a suitable cloud project, several
configuration parameters must be defined. Visit its settings page at
`https://<your-app>/admin/portal` and configure the following:
* Set the "Configuration Service Settings" to point to the **Configuration
Service** instance.
* If using timeseries monitoring, update the "Time Series Monitoring Settings".
* If using **Auth Service**, set the "Authorization Settings".
If you are using a BigTable instance outside of your cloud project (e.g.,
staging, dev), you will need to add your BigTable service account JSON to the
service's settings. Currently this can only be done with a command-line tool;
hopefully a proper settings page will be added to enable this, or Cloud
BigTable will be updated to support IAM.