Jethro's Braindump

Git Scalar


Git has trouble handling large repositories. This problem is resolved using VFS for Git. This is essentially a wrapper around GVFS, replacing the git command.

1. Focusing on Files that Matter

scalar clone uses Git’s sparse-checkout feature in cone mode, cloning a subset of the Git repositories’ files. Scalar uses Git’s updated index file format, which reduces the size of the Git index. The Git index is a list of every tracked path at current HEAD.

With Scalar, the populated size is at most as large as the number of files in tracked files.

Scalar also configures the Git repository to work better with modified files using Git’s fsmonitor feature. Without this, Git needs to scan the entire working directory to find which paths were modified.

2. Reducing Object Transfer

The VFS for Git protocol reduces object transfer.

3. Reducing Waiting for Expensive Operations

  1. Disable auto Git garbage collection by setting, and run do incremental GC instead.
  2. Periodically run git fetch
  3. Write the commit graph using the incremental format
  4. Cleanup loose objects
  5. Git multi-pack-files to efficiently pack and store Git objects. It does some sort of reference counting: unreferenced pack-files can be deleted.