Note (Googlers only): gsubtreed is deprecated. Please use Copybara instead.
Gsubtreed is a daemon that mirrors subtrees of a large git repo into their own, smaller repos. It is typically run for a medium-term interval (e.g. 10 min) and restarted after each interval. While running, it repeats a short poll+process cycle (e.g. every 5s) against the repo it's mirroring.
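The run/restart pattern described above can be sketched as a bounded polling loop. This is a minimal illustration, not gsubtreed's actual code: `run_for_duration` and `process_once` are hypothetical names, and the injectable `clock`/`sleep` arguments exist only so the loop can be tested without waiting.

```python
import time
from typing import Callable


def run_for_duration(
    process_once: Callable[[], None],
    duration: float = 600.0,   # medium-term run length, e.g. 10 min
    interval: float = 5.0,     # short poll+process cycle, e.g. 5s
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call process_once() every `interval` seconds until `duration` elapses.

    Returns the number of iterations run. Whatever launched the process
    (e.g. a cron-style job) is expected to start it again afterwards.
    """
    deadline = clock() + duration
    iterations = 0
    while clock() < deadline:
        started = clock()
        process_once()  # hypothetical: fetch the big repo, push changed subtrees
        iterations += 1
        # Sleep for the rest of this cycle, but never past the overall deadline.
        remaining = min(interval - (clock() - started), deadline - clock())
        if remaining > 0:
            sleep(remaining)
    return iterations
```

Bounding each run and relying on an external restart keeps any single process short-lived, which matches the "run for ~10 min, then restart" behavior described above.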
Push a file named config.json to the refs/gsubtreed-config/main ref of the repo you intend to mirror. Its contents are determined by GsubtreedConfigRef. You must at least set the enabled_paths portion (… master). Here's an example for the infra.git repo:

```shell
mkdir temp_dir
cd temp_dir
git init
vim config.json
git add config.json
git commit -am 'Created gsubtreed config'
git push https://chromium.googlesource.com/infra/infra HEAD:refs/gsubtreed-config/main
```
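Instead of writing config.json by hand in vim, you can generate it. The sketch below assumes the file is a single JSON object whose keys are the parameter names documented at the end of this page; check GsubtreedConfigRef for the authoritative schema. `write_minimal_config` is a hypothetical helper, and `path/to/bob` is a placeholder, not a real infra.git subtree.

```python
import json
from pathlib import Path


def write_minimal_config(enabled_paths, dest="config.json", **optional):
    """Write a gsubtreed-style config.json with enabled_paths plus any
    optional keys (e.g. base_url, interval) passed as keyword arguments."""
    if not enabled_paths:
        raise ValueError("enabled_paths must list at least one path")
    config = {"enabled_paths": list(enabled_paths)}
    config.update(optional)
    # Stable key order and indentation keep diffs of the config ref readable.
    Path(dest).write_text(json.dumps(config, indent=2, sort_keys=True) + "\n")
    return config
```

For example, `write_minimal_config(["path/to/bob"])` produces a file equivalent to `{"enabled_paths": ["path/to/bob"]}`, which you then add, commit, and push with the commands above.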
To modify an existing config.json, e.g. in infra.git:

```shell
mkdir temp_dir
cd temp_dir
git init
git remote add origin https://chromium.googlesource.com/infra/infra
git fetch origin refs/gsubtreed-config/main
git checkout FETCH_HEAD
# Edit config.json here, then stage the change so the commit picks it up.
git add config.json
git commit -m 'Updated gsubtreed config'
git push origin HEAD:refs/gsubtreed-config/main
```
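Before pushing an edited config, a quick local sanity check can catch typos. This sketch only checks the key names and JSON types listed in the parameter reference on this page; `check_config` is a hypothetical helper, and the real GsubtreedConfigRef may apply different rules (for instance, it may accept keys not documented here).

```python
import json

# Expected JSON types, taken from the parameter reference on this page.
EXPECTED_TYPES = {
    "interval": (int, float),
    "base_url": str,
    "enabled_paths": list,
    "enabled_refglobs": list,
    "path_map_exceptions": dict,
    "path_extra_push": dict,
}


def check_config(text):
    """Return a list of human-readable problems in a config.json string."""
    try:
        config = json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return [f"not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    enabled = config.get("enabled_paths")
    if not isinstance(enabled, list) or not enabled:
        problems.append("enabled_paths must be a non-empty list")
        enabled = []
    for key, value in config.items():
        if key not in EXPECTED_TYPES:
            problems.append(f"undocumented key: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, EXPECTED_TYPES[key]):
            problems.append(f"{key}: unexpected type {type(value).__name__}")
    # Keys of the per-path maps should name paths that are actually mirrored.
    for key in ("path_map_exceptions", "path_extra_push"):
        mapping = config.get(key)
        if isinstance(mapping, dict):
            for path in mapping:
                if path not in enabled:
                    problems.append(f"{key}: {path!r} is not in enabled_paths")
    return problems
```

Running `check_config(open("config.json").read())` right before `git commit` and confirming it returns an empty list is a cheap guard against pushing a config with a misspelled key.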
Anyone with committer status in the repo should be able to push the config. However, modifying the config usually means adding or removing a mirrored subtree, which requires creating a new repo or deleting an old one. For that, you will likely need help from a git admin.
(optional) If you have existing git mirrors (probably mirrored via git-svn), disable the mirroring service for them. If not, create the mirrors. By default, if gsubtreed mirrors the path bob in https://.../repo, it expects to find the mirror repo at https://.../repo/bob. You can change this with the base_url parameter in the config.
If you do have existing mirrors, you will also need to bootstrap them. gsubtreed uses the mirror repos to store state about what it has already processed, in the form of commit footers (e.g. Cr-Mirrored-From). Existing mirrors don't have this state, so we have to cheat.
gsubtreed has a feature that lets it pretend a given commit has extra commit footers that it doesn't actually have. These are stored in the git-notes ref refs/notes/extra_footers using the git notes tool. Setting them requires finding the commit in the original git repo that corresponds to each commit in the mirror repo.
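Conceptually, an extra_footers note is overlaid on the footers the commit already has. The sketch below illustrates that idea under two assumptions: footers are `Key: value` lines in the last paragraph of a commit message, and a note uses the same `Key: value` format. `parse_footer_block`, `message_footers`, and `effective_footers` are hypothetical names, and whether gsubtreed lets a note override or only add to real footers is also an assumption here.

```python
import re

# ASSUMPTION: a footer is a "Key: value" line; keys are letters, digits, and dashes.
_FOOTER_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.*)$")


def parse_footer_block(block):
    """Parse a block of 'Key: value' lines into {key: [values]}.

    Returns {} if any non-empty line is not a footer.
    """
    footers = {}
    for line in block.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        match = _FOOTER_RE.match(line)
        if not match:
            return {}
        footers.setdefault(match.group(1), []).append(match.group(2))
    return footers


def message_footers(message):
    """Footers of a commit message: its last paragraph, unless that is the subject."""
    paragraphs = re.split(r"\n\s*\n", message.strip())
    if len(paragraphs) < 2:  # a lone paragraph is the subject/body, not footers
        return {}
    return parse_footer_block(paragraphs[-1])


def effective_footers(message, note=None):
    """Overlay footers from an extra_footers note onto the commit's own footers.

    ASSUMPTION: a key present in the note replaces the commit's values for it.
    """
    merged = message_footers(message)
    if note:
        merged.update(parse_footer_block(note))
    return merged
```

With this model, bootstrapping a mirror amounts to attaching, to each existing mirror commit, a note containing the footers (such as Cr-Mirrored-From) that gsubtreed would have written itself.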
The tool run.py infra.services.gsubtreed.bootstrap_from_existing does most of this work for you, but it's not bulletproof and makes many assumptions about how the repos relate to each other (it was written as a one-off to assist the Chromium git migration). Still, studying its source code is instructive.
Note that git notes may not respect the --ref option with a fully qualified ref (e.g. refs/notes/extra_footers). For example, git notes --ref refs/b/l/a add ... may actually store notes in refs/notes/refs/b/l/a. You can check which ref is really being used by running git notes --ref refs/b/l/a get-ref. If you have this problem AND your mirrors already have notes in refs/notes/extra_footers, you may have to locally initialize the weird ref (the one reported by get-ref) with the commit hash of your mirror's refs/notes/extra_footers. You can do this directly with the git update-ref command after fetching the existing notes from the mirror.
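The get-ref check and update-ref workaround can be scripted. This is a hedged sketch: `alias_command` and `alias_mirror_notes` are hypothetical helpers, `mirror` stands for whatever remote name or URL you fetch the existing notes from, and the script only uses `git fetch`, `git notes --ref <ref> get-ref`, `git rev-parse`, and `git update-ref`. It defaults to a dry run that returns the command instead of executing it.

```python
import subprocess

EXTRA_FOOTERS = "refs/notes/extra_footers"


def alias_command(actual_ref, notes_hash, wanted=EXTRA_FOOTERS):
    """Build the `git update-ref` argv that points `actual_ref` at the fetched
    notes commit, or return None if git notes already uses `wanted` directly."""
    if actual_ref == wanted:
        return None
    return ["git", "update-ref", actual_ref, notes_hash]


def _git(*args):
    """Run a git command in the current directory and return its stripped stdout."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()


def alias_mirror_notes(mirror, wanted=EXTRA_FOOTERS, dry_run=True):
    """Fetch `wanted` from `mirror` and, if needed, alias it under the ref that
    `git notes --ref <wanted>` really uses. Returns the alias argv (or None)."""
    _git("fetch", mirror, f"{wanted}:{wanted}")
    actual = _git("notes", "--ref", wanted, "get-ref")
    notes_hash = _git("rev-parse", wanted)
    cmd = alias_command(actual, notes_hash, wanted)
    if cmd and not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Separating the pure `alias_command` from the git calls lets you review exactly which ref would be rewritten before passing `dry_run=False`.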
Once your mirrors are in a good state (either empty or primed with extra_footers), you should be able to run:

```shell
run.py infra.services.gsubtreed <repo url>
```

and it should Just Work.
At this point you should set up a new bot in the luci.infra.cron bucket/pool to run this on a regular interval:
1. Request a new GCE Linux bot (sample ticket, internal).
2. Add relevant new entries to cr-buildbucket.cfg, luci-scheduler.cfg and luci-milo.cfg, using existing gsubtreed builders as a template. Make sure to use the task service account from above.
3. Register your new service account with Gerrit quota usage and rejection monitoring (example internal CL). It may take some time for the changes to take effect.
run.py infra.services.gsubtreed <repo_url>
interval (number): The time in seconds between iterations of gsubtreed. Each iteration fetches from the main repo and tries to push to all of the subtree repos.
base_url (string): The URL relative to which all mirror repos are assumed to exist. For example, if you mirror the path bob and base_url is https://.../main_repo, gsubtreed assumes the mirror for the bob subtree is https://.../main_repo/bob. By default, base_url is the URL of the repo gsubtreed is processing.
enabled_paths ([string]): A list of paths in the repo to mirror. These are absolute paths from the root of the ref's tree. Any commit that affects one of these paths is mirrored to the target repo '/'.join((base_url, path)).
enabled_refglobs ([string]): A list of git-style absolute refglobs that gsubtreed should attempt to mirror. If a subtree appears in multiple refs covered by the refglobs, all of those refs are pushed to that subtree's mirror. Say you are mirroring the subtree bob with the refglob refs/*. If bob appears on both refs/foo and refs/bar, the bob mirror repo will contain both a refs/foo and a refs/bar ref.
path_map_exceptions ({string: string}): A dictionary mapping an enabled_path to a mirror_repo_path. This is used instead of the generic join rule when calculating the mirror URL (i.e. '/'.join((base_url, path_map_exceptions[path])) instead of using path directly). For example, if this had the value {'path/to/foo': 'bar'} and base_url was https://example.com, gsubtreed would mirror path/to/foo to https://example.com/bar instead of https://example.com/path/to/foo.
path_extra_push ({string: [string]}): A dictionary mapping an enabled_path to a list of full git repo URLs. Whenever gsubtreed finds changes in that enabled_path, it also pushes the subtree commits to every repo in that list.
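Taken together, the URL-related options determine where each subtree is pushed, and enabled_refglobs determines which refs are considered. The sketch below follows the rules stated above ('/'.join((base_url, path)), overridden by path_map_exceptions, plus any path_extra_push URLs), but it is not gsubtreed's code: `push_targets` and `mirrored_refs` are hypothetical names, and matching refglobs with Python's `fnmatchcase` is an assumption (its `*` also matches `/`, which may differ from gsubtreed's own glob handling). The default for an unset enabled_refglobs isn't stated on this page, so the sketch simply treats it as empty.

```python
from fnmatch import fnmatchcase


def push_targets(path, config, repo_url):
    """Return every repo URL that subtree commits for `path` are pushed to."""
    if path not in config.get("enabled_paths", []):
        return []
    # base_url defaults to the repo being processed.
    base_url = config.get("base_url") or repo_url
    # path_map_exceptions replaces the path component of the mirror URL.
    mirror_path = config.get("path_map_exceptions", {}).get(path, path)
    targets = ["/".join((base_url, mirror_path))]
    # path_extra_push adds full URLs on top of the primary mirror.
    targets.extend(config.get("path_extra_push", {}).get(path, []))
    return targets


def mirrored_refs(refs, config):
    """Filter `refs` down to those matched by any enabled refglob (order kept)."""
    globs = config.get("enabled_refglobs", [])  # ASSUMPTION: empty if unset
    return [ref for ref in refs if any(fnmatchcase(ref, g) for g in globs)]
```

Using `fnmatchcase` rather than `fnmatch` keeps matching case-sensitive on every platform, which is what git ref names require.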