MultipleRepositoryMethods
Notes on using multiple git repositories in a build. Having a good way to do this is a requirement for implementing a clean layered approach.
Requirements
Collected from various emails:
- Looks, feels and behaves like a standard git repo for the end user, no special tools needed.
- Can be checked out by the end user with one easy command.
- Contains full history for all the components in bisectable commits.
- The user can submit changes back easily with standard git workflow.
- The tool should make it easy to add or change the pull URL for a repo.
Git Submodules
The good
- It's part of git; no extra tools, languages, etc. need to be installed.
- Locks submodules to a specific commit, so subprojects can proceed at their own pace while the superproject moves them forward as they are verified and tested.
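The pinning behaviour can be sketched with two throwaway repositories. This is only an illustration: the repository names (sub, super), the demo identity and the temporary directory are assumptions, and the protocol.file.allow override is there only because recent git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
# Sketch: a superproject records one specific commit of a submodule.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical subproject with a single commit.
git init -q sub
git -C sub commit -q --allow-empty -m "sub: first"
first=$(git -C sub rev-parse HEAD)

# Superproject that adds the subproject as a submodule and commits the pointer.
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/sub" sub
git commit -q -m "Add sub as a submodule"

# The subproject moves ahead on its own.
git -C "$work/sub" commit -q --allow-empty -m "sub: second"

# The superproject still records the commit it was pinned to.
pinned=$(git rev-parse HEAD:sub)
echo "superproject pins sub at $pinned"
```

The superproject only moves forward when someone updates the submodule checkout and commits the new pointer in the superproject, which is the step the pitfalls below tend to revolve around.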
The bad
- Regarding git submodules, I haven't used them in a couple of years but the last time I did, the workflow was terrible. It kept rolling back submodule versions because someone forgot to do a full "git submodule update" and then did a "git commit -a" in the superproject; we were forever forgetting to commit and push the superproject after changing submodules; it seemed to delight in finding new ways to create merge conflicts; etc.
- http://book.git-scm.com/5_submodules.html (pitfalls, about 2/3 way through page)
- Submodule was committed and pushed, but superproject wasn't committed and/or pushed, so other developers kept using old submodule.
- Superproject was committed and pushed, but subproject wasn't committed and/or pushed. As a result other developers couldn't clone/update.
- Submodules are checked out as detached HEADs. I can't count the number of times I accidentally committed on top of that damned HEAD and had to go back later to create a new branch and cherry-pick my change over.
- If you accidentally commit on top of the detached HEAD, "git submodule update" will silently eat your changes. This led to people being afraid to run "git submodule update", which made the following problem more frequent.
- (Note, git does not eat your changes, they are still there and can be found with git reflog. It is agreed this is a pain, but it is not too bad to recover. See http://bec-systems.com/site/696/git-submodules-what-to-do-when-you-commit-to-no-branch)
- Developer 1 changed subproject A and properly pushed the superproject. Developer 2 did a "git pull" on the superproject but not a "git submodule update". Developer 2 then changed subproject B, committed, pushed, and did a "git commit -a" in the superproject not realizing this rolls back the superproject version of submodule A.
- The record of subproject revisions, and to a lesser extent the .gitmodules file, are essentially hot-spots for unrelated changes, just like the old checksums.ini. It just means more merge commits if people pull instead of rebase.
- I don't know if this is still true, but git just couldn't handle conflicts in the subproject revisions. It would abort the merge, hard,
- Most of the problems would be solved by rigorously using the tools the way they were designed, but I didn't meet anyone who was capable of doing this to the extent required, and recovering from problems was painful and time consuming.
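The detached-HEAD recovery mentioned in the list above can be sketched as follows. This is a minimal reproduction under assumptions: the repository names, the demo identity and the branch name "rescued" are made up for illustration, and the explicit "checkout --detach" stands in for the state a submodule is normally left in after "git submodule update".

```shell
#!/bin/sh
# Sketch: a commit made on a submodule's detached HEAD is stranded by
# "git submodule update", then recovered through the reflog.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

git init -q sub
git -C sub commit -q --allow-empty -m "sub: first"

git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/sub" sub
git commit -q -m "Add sub as a submodule"

# Put the submodule on a detached HEAD and commit there by accident.
git -C sub checkout -q --detach
git -C sub commit -q --allow-empty -m "accidental work on detached HEAD"
stranded=$(git -C sub rev-parse HEAD)

# The update checks out the pinned commit again; no branch points at the new work.
git submodule --quiet update

# The commit is still in the submodule's reflog: HEAD@{0} is the checkout
# done by the update, HEAD@{1} is the accidental commit.
found=$(git -C sub rev-parse 'HEAD@{1}')
git -C sub branch rescued "$found"
echo "recovered $found on branch rescued"
```

In a real checkout the reflog position will vary, so it is safer to read "git reflog" in the submodule and pick the commit by its message rather than rely on a fixed index.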
Git Subtrees
https://github.com/git/git/blob/master/contrib/subtree/git-subtree.txt
The good
- Part of git, no extra tools, etc (sort of... it's in git's "contrib". Most distros appear to package it).
- No extra commands to run when fetching from the superproject (e.g. no 'update' post-clone command is required).
- Superproject pulls in specific commits of the child projects (via a subtree merge).
- Superproject can (optionally) contain the entire commit history up to the most recent subtree merge. This means that you can look at (and even checkout) commits from each subproject from the same repository if needed.
- Changes can be made to superproject and then "split" back into each child project via a subtree command.
- If you don't really care about pushing back to the upstream child repos (e.g. they are out of maintenance), you can commit directly to your superproject and forget about the subtrees altogether.
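The subtree-merge idea behind these points can be sketched with core git alone. Note that this does not use the contrib "git subtree" command; it uses the older manual subtree merge (merge -s ours followed by read-tree --prefix), which leaves a similar history shape. The repository names, the child/ prefix and the demo identity are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: merge a child project into a subdirectory of a superproject,
# keeping the child's history reachable from the superproject.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical child project with one file.
git init -q child
echo "hello" > child/README
git -C child add README
git -C child commit -q -m "child: add README"
child_tip=$(git -C child rev-parse HEAD)

# Superproject with its own unrelated history.
git init -q super
cd super
git commit -q --allow-empty -m "super: initial"

# Fetch the child, record a merge without touching our tree, then read the
# child's tree into the child/ directory and commit the merge.
git fetch -q "$work/child" HEAD
git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
git read-tree --prefix=child/ -u FETCH_HEAD
git commit -q -m "Merge child project into child/"

# The child's commits are now part of the superproject's history.
git log --oneline --graph
```

A plain clone of the superproject then contains child/README and the child's commits, with no extra command to run after cloning.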
The bad
- The commit history for the superproject can be a little messy. Subtrees are based on the concept of merges, so in the ideal case your superproject is a series of subtree merges from the other independent commit graphs also housed in the repository.
- Each subproject must have a dedicated directory to live in, or the splitting of changes doesn't work (unlike combo-layer, which e.g. allows oe-core to be in the root of a repo with other subprojects).
- There are reports that splitting changes for child projects can be slow in some cases.
Google Repo
Notes
- requires Python
- very capable tool, allows submodules to be specified by branch or by hash
- Most projects are using XML configuration, but it seems to support using git submodules as well, with fewer pitfalls than using git submodules natively
- Tied to Gerrit (code review tool) for upload, i.e. plain "git push" not supported
Braid
Notes
- https://github.com/evilchelu/braid
- written in Ruby
update-proj.sh
- Pull script: http://jrz.cbnco.com/git/?p=toastix/toastix.git;a=blob;f=update-proj.sh
- Push script: http://jrz.cbnco.com/git/?p=toastix/toastix.git;a=blob;f=push.sh
- Sample config: http://jrz.cbnco.com/git/?p=toastix/toastix.git;a=blob;f=projects.list
Notes
- half-baked pair of shell scripts
- uses flat file for configuration
- requires different config file for committers and non-committers