Note for Googlers only: Gsubtreed is deprecated.
Please use [copybara](https://go/copybara-chrome#copybara-gsubtreed).
# Gsubtreed
#### (git subtree daemon)
Gsubtreed is a daemon which mirrors subtrees from a large git repo to their
own smaller repos. It is typically run for medium-term intervals (e.g. 10 min),
restarting after every interval. While it's running it has a short (e.g. 5s)
poll+process cycle for the repo that it's mirroring.
## Setup:
1. Push a file named `config.json` to the `refs/gsubtreed-config/main` ref of
the repo you intend to mirror. Its contents are determined by
[`GsubtreedConfigRef`][1]. You must at least set the `enabled_paths`
portion.
* You should create a new line of history to avoid having unnecessary extra
files in this ref (as opposed to branching it off of another ref like
`master`). Here's an example for the `infra.git` repo:
```sh
mkdir temp_dir
cd temp_dir
git init
vim config.json
git add config.json
git commit -am 'Created gsubtreed config'
git push https://chromium.googlesource.com/infra/infra HEAD:refs/gsubtreed-config/main
```
To modify an existing `config.json` e.g. in `infra.git`:
```sh
mkdir temp_dir
cd temp_dir
git init
git remote add origin https://chromium.googlesource.com/infra/infra
git fetch origin refs/gsubtreed-config/main
git checkout FETCH_HEAD
# edit config.json here
git commit -am 'Updated gsubtreed config'
git push origin HEAD:refs/gsubtreed-config/main
```
Anyone with committer status in the repo should be able to push
the config. However, modifying the config usually means you are
adding or deleting a mirrored subtree, which requires creating a
new repo or deleting an old one. For that, you will likely need
[git-admin support](https://bugs.chromium.org/p/chromium/issues/entry?template=Infra-Git).
1. (optional) If you have existing git mirrors (probably mirrored via git-svn),
disable the mirroring service for them. If not, create the mirrors. By
default if gsubtreed mirrors the path `bob` in `https://.../repo`, it will
expect to find the mirror repo at `https://.../repo/bob`. You can change
this with the `base_url` parameter in the config.
* If you do have existing mirrors, you will also need to bootstrap them.
gsubtreed uses the mirror repos to store state about what it has already
processed. It stores state as commit footers (e.g. `Cr-Mirrored-From`).
The existing mirrors don't have this state so we have to cheat.
* gsubtreed has a feature which allows it to pretend as if a given commit
has extra commit footers that it doesn't really have. These are stored
in a git-notes ref `refs/notes/extra_footers` using the `git notes`
tool. Setting them requires finding the commit in the original git repo
which corresponds with the commit in the mirror repo.
* The tool `run.py infra.services.gsubtreed.bootstrap_from_existing` does
most of this work for you, but it's not bullet-proof and tends to assume
a lot of things about how the repos are related to each other (it was
written as a one-off to assist the chromium git migration). However,
studying its source code should be enlightening.
* Note that `git notes` may not respect the `--ref` option with a fully
qualified ref (e.g. `refs/notes/extra_footers`). For example, `git notes
--ref refs/b/l/a add ...` may actually store notes in
`refs/notes/refs/b/l/a`. You can check which ref is really being used by
running `git notes --ref refs/b/l/a get-ref`. If you have this problem
AND your mirrors already have notes in `refs/notes/extra_footers`, you
may have to locally initialize the weird ref (the `get-ref` one) with
the commit hash of `refs/notes/extra_footers` of your mirror. You can do
this directly with the `git update-ref` command after fetching the
existing notes from the mirror.
1. Once your mirrors are in a good state (either empty or primed with
`extra_footers`), you should be able to run: `run.py
infra.services.gsubtreed <repo url>` and it should Just Work.
1. At this point you should set up a new bot on the [luci.infra.cron
bucket/pool][2] to run this on a regular interval.
* Request a new GCE Linux bot -
[sample ticket](http://crbug.com/626818) (internal)
* Create [new task service account (internal)](
http://go/luci-new-task-account).
* Add relevant new entries to `cr-buildbucket.cfg`, `luci-scheduler.cfg` and
`luci-milo.cfg`, using existing gsubtreed builders as a template.
Make sure to use the task service account from above.
* Register your new service account with Gerrit quota usage and
rejections monitoring ([example internal CL](http://cl/209480289)).
It may take some time for the changes to take effect.
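For the `config.json` pushed in the first setup step, a minimal sketch of generating the file contents might look like the following. The helper name `build_config` and all example values (`bob`, `refs/heads/*`, the interval of 5) are assumptions for illustration; the authoritative list of fields and their defaults is [`GsubtreedConfigRef`][1].

```python
import json


def build_config(enabled_paths, **optional_fields):
    """Return a dict suitable for serializing to config.json.

    enabled_paths is the only field this document says you must set, so
    it is required here; every other field is passed through unchanged.
    """
    if not enabled_paths:
        raise ValueError("enabled_paths must list at least one path")
    config = {"enabled_paths": list(enabled_paths)}
    config.update(optional_fields)
    return config


# Hypothetical values: mirror the `bob` subtree from all branch refs.
config = build_config(
    ["bob"],
    interval=5,
    enabled_refglobs=["refs/heads/*"],
)

# Print the JSON; redirect the output to config.json before committing it.
config_text = json.dumps(config, indent=2, sort_keys=True)
print(config_text)
```

Keeping `enabled_paths` as a required argument makes an empty config fail early, before anything is pushed to `refs/gsubtreed-config/main`.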
## Usage:
    run.py infra.services.gsubtreed <repo_url>
## GsubtreedConfigRef fields
* `interval` *number*: The time in seconds between iterations of gsubtreed.
Each iteration will fetch from the main repo, and try to push to all of the
subtree repos.
* `base_url` *string*: The base URL is the URL relative to which all mirror
repos are assumed to exist. For example, if you mirror the path `bob`, and
base_url is `https://.../main_repo`, then it would assume that the mirror
for the `bob` subtree is `https://.../main_repo/bob`. By default, base_url
is set to the repo that gsubtreed is processing.
* `enabled_paths` *[string]*: A list of paths in the repo to mirror. These are
absolute paths from the ref root. Any commits which affect these paths will
be mirrored to the target repo `'/'.join((base_url, path))`.
* `enabled_refglobs` *[string]*: A list of git-style absolute refglobs that
gsubtreed should attempt to mirror. If a subtree appears in multiple refs
covered by the refglob, then all of those refs will be pushed to the mirror
for that subtree. Say you are mirroring the subtree `bob` and the refglob
`refs/*`. If `bob` appeared on `refs/foo` and `refs/bar`, the `bob` subtree
repo would then contain both a `refs/foo` and a `refs/bar` ref.
* `path_map_exceptions` *{string: string}*: A dictionary mapping
`enabled_path` to `mirror_repo_path`. This will be used instead of the
generic join rule for calculating the mirror URL (so it would be
`'/'.join((base_url, path_map_exceptions[path]))` instead of using path
directly). For example, if this had the value `{'path/to/foo': 'bar'}`, and
base_url was `https://example.com`, then it would mirror `path/to/foo` to
`https://example.com/bar`, instead of `https://example.com/path/to/foo`.
* `path_extra_push` *{string: [string]}*: A dictionary mapping `enabled_path`
to a list of full_git_repo_urls. Any time we find changes in the
enabled_path, we'll also push those subtree commits to all the git repos in
full_git_repo_urls.
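The way `base_url`, `path_map_exceptions` and `path_extra_push` combine can be sketched as follows. This is a minimal illustration built from the join rules quoted above, not the code gsubtreed runs; the function name `mirror_urls` and the `https://example.org/extra.git` URL are hypothetical.

```python
def mirror_urls(path, base_url, path_map_exceptions=None, path_extra_push=None):
    """Return every repo URL that commits touching `path` would be pushed to.

    The first entry is the primary mirror, computed with the generic join
    rule unless path_map_exceptions overrides the repo path. Any URLs from
    path_extra_push follow it.
    """
    path_map_exceptions = path_map_exceptions or {}
    path_extra_push = path_extra_push or {}
    repo_path = path_map_exceptions.get(path, path)
    primary = '/'.join((base_url, repo_path))
    return [primary] + list(path_extra_push.get(path, []))


# Generic rule: the subtree path is appended to base_url.
print(mirror_urls('bob', 'https://example.com'))
# prints ['https://example.com/bob']

# Exception mapping plus one extra push target (hypothetical URL).
print(mirror_urls(
    'path/to/foo',
    'https://example.com',
    path_map_exceptions={'path/to/foo': 'bar'},
    path_extra_push={'path/to/foo': ['https://example.org/extra.git']},
))
# prints ['https://example.com/bar', 'https://example.org/extra.git']
```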
[1]: ./gsubtreed.py#32
[2]: https://ci.chromium.org/p/infra/g/cron/builders