diff mbox series

[v2,3/3] clone: document partial clone section

Message ID c1a44a35095e7d681c312ecaa07c46e49f2fae67.1586791560.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived

Commit Message

John Passaro via GitGitGadget April 13, 2020, 3:26 p.m. UTC
From: Dyrone Teng <dyroneteng@gmail.com>

Partial clones are created using 'git clone', but there is no related
help information in the git-clone documentation. Add a relevant section
to help users understand what partial clones are and how they differ
from normal clones.

The section briefly introduces the applicable scenarios and some
precautions of partial clone. If users want to know more about its
technical design and other details, users can view the link of
git-partial-clone(7) according to the guidelines in the section.

Signed-off-by: Dyrone Teng <dyroneteng@gmail.com>
---
 Documentation/git-clone.txt | 72 +++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

Comments

Philippe Blain Oct. 27, 2020, 1:41 p.m. UTC | #1
Hi Dyrone, 

> On Apr. 13, 2020, at 11:26, Dyrone Teng via GitGitGadget <gitgitgadget@gmail.com> wrote:
> 
> From: Dyrone Teng <dyroneteng@gmail.com>
> 
> Partial clones are created using 'git clone', but there is no related
> help information in the git-clone documentation. Add a relevant section
> to help users understand what partial clones are and how they differ
> from normal clones.
> 
> The section briefly introduces the applicable scenarios and some
> precautions of partial clone.

"some precautions users should take when using partial clone" 
would read better, I think.

> If users want to know more about its
> technical design and other details, users can view the link of
> git-partial-clone(7) according to the guidelines in the section.

Note: git-partial-clone(7) does not exist, i.e., there is no document
named 'gitpartial-clone.txt' in 'Documentation/' that is listed in the 'MAN7_TXT'
variable of the documentation Makefile. What exists is a document called
'partial-clone.txt' in the 'technical' folder of the documentation.
You can run `git grep 'technical/'` in 'Documentation/' to see how these pages
are referred to in the rest of the documentation.

Also, the wording could be better:

"In case users want to know more about the technical design of the partial clone
feature, add a link to 'technical/partial-clone.txt'."

would be sufficient.

> 
> Signed-off-by: Dyrone Teng <dyroneteng@gmail.com>
> ---
> Documentation/git-clone.txt | 72 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 72 insertions(+)
> 
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index eafa1c39927..a6e13666ea1 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -310,6 +310,78 @@ or `--mirror` is given)
> 	for `host.xz:foo/.git`).  Cloning into an existing directory
> 	is only allowed if the directory is empty.
> 
> +Partial Clone
> +-------------
> +
> +By default, `git clone` will download every reachable object, including
> +every version of every file in the history of the repository. The
> +**partial clone** feature allows Git to transfer fewer objects and
> +request them from the remote only when they are needed, so some
> +reachable objects can be omitted from the initial `git clone` and
> +subsequent `git fetch` operations.
> +
> +To use the partial clone feature, you can run `git clone` with the 
> +`--filter=<filter-spec>` option. If you want to clone a repository
> +without download

s/download/downloading/

> any blobs, the form `filter=blob:none` will omit all
> +the blobs. If the repository has some large blobs and you want to
> +prevent some large blobs being downloaded by an appropriate threshold,

Repeating "some large blobs" twice here feels a little awkward. Maybe:

"If the repository has some large blobs and you want to prevent them from
being downloaded"

> +the form `--filter=blob:limit=<n>[kmg]`omits blobs larger than n bytes
> +or units (see linkgit:git-rev-list[1]).

I think you could give an example here, and refer to git-rev-list[1] for the full syntax
(also, "or units" is a little unclear here). So maybe something like this:

"the form `--filter=blob:limit=1m` would prevent downloading objects bigger than 1 MiB 
(see the description of the `--filter=<filter-spec>` option in linkgit:git-rev-list[1] for the 
full filter syntax)".
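
To make the effect of a size threshold concrete, here is a minimal sketch
that can be run in a scratch directory. Everything in it is an assumption
made for illustration: the throwaway 'src' repository, the file names, the
`1k` threshold, and the `file://` URL (which requires
`uploadpack.allowFilter` on the "server" side). It only checks which
objects are missing after the clone; it is not meant as text for the patch.

```shell
set -e
tmp=$(mktemp -d)    # scratch area, left behind for inspection

# Hypothetical "server" repository with one small and one large blob.
git init -q "$tmp/src"
cd "$tmp/src"
git config user.name "Example"
git config user.email "example@example.com"
git config uploadpack.allowFilter true    # let clients pass --filter
printf 'small\n' > small.txt
head -c 4096 /dev/zero | tr '\0' 'x' > large.bin
git add small.txt large.bin
git commit -q -m "add a small and a large file"

# Partial clone that leaves out blobs above the threshold.
cd "$tmp"
git clone -q --no-checkout --filter=blob:limit=1k "file://$tmp/src" partial

# --missing=print lists omitted objects with a leading '?'.
missing=$(git -C partial rev-list --objects --missing=print HEAD | grep -c '^?')
echo "missing objects: $missing"
```

With these inputs only the 4096-byte blob should be reported as missing,
while the small blob is transferred with the initial clone.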

> +
> +As mentioned before, a partially cloned repository may have to request
> +the missing objects when they are needed. So some 'local' commands may
> +fail without a network connection to the remote repository.
> +
> +For example, The <repository> contains two branches which names 'master'
> +and 'topic. Then, we clone the repository by

wording, and formatting:

For example, let's say a remote repository contains two branches named 'master'
and 'topic'. We clone the repository with

> +
> +    $ git clone --filter=blob:none --no-checkout <repository>
> +
> +With the `--filter=blob:none` option Git will omit all the blobs and
> +the `--no-checkout` option Git will not perform a checkout of HEAD

Some punctuation would help:

With the `--filter=blob:none` option, Git will omit all the blobs, and
with `--no-checkout`, Git will not check out `HEAD`

> +after the clone is complete. Then, we check out the remote tracking
> +'topic' branch by

Here you are not checking out the remote-tracking 'topic' branch,
you are creating a local branch 'topic' that tracks the remote-tracking branch
'origin/topic'.
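
The distinction can be checked with a small sketch like the following. The
'src' and 'work' repositories and the empty commit are hypothetical and
only serve the illustration; the branch names match the example in the
patch. After the checkout, the local branch 'topic' exists and its
upstream is the remote-tracking branch 'origin/topic'.

```shell
set -e
tmp=$(mktemp -d)    # scratch area, left behind for inspection

# Hypothetical remote with the two branches 'master' and 'topic'.
git init -q "$tmp/src"
cd "$tmp/src"
git symbolic-ref HEAD refs/heads/master   # name the initial branch 'master'
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial"
git branch topic

cd "$tmp"
git clone -q "file://$tmp/src" work

# Create a local branch 'topic' starting at 'origin/topic'.
git -C work checkout -q -b topic origin/topic

# Ask Git which branch the new local branch tracks.
upstream=$(git -C work rev-parse --abbrev-ref 'topic@{upstream}')
echo "topic tracks: $upstream"
```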

> +
> +    $ git checkout -b topic origin/topic 
> +
> +The output looks like
> +
> +------------
> +    remote: Enumerating objects: 1, done.
> +    remote: Counting objects: 100% (1/1), done.
> +    remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
> +    Receiving objects: 100% (1/1), 43 bytes | 43.00 KiB/s, done.
> +    Branch 'topic' set up to track remote branch 'topic' from 'origin'.
> +    Switched to a new branch 'topic'
> +------------
> +
> +The output is a bit surprising but it shows how partial clone works.
> +When we check out the branch 'topic' Git will request the missing blobs
> +because they are needed. Then, We

s/We/we/

> can switch back to branch 'master' by
> +
> +    $ git checkout master
> +
> +This time the output looks like
> +
> +------------
> +    Switched to branch 'master'
> +    Your branch is up to date with 'origin/master'.
> +------------
> +
> +It shows that when we switch back to the previous location, the checkout
> +is done without a download because the repository has all the blobs that
> +were downloaded previously.
> +
> +`git log` may also make a surprise with partial clones.

"make a surprise" reads awkwardly. "have surprising behaviour" would be better.

> `git log
> +-- <pathspec>` will not cause downloads with the blob filters,

I think "if the repository was cloned with `--filter=blob:none`" would be clearer
than "with the blob filters".

> because
> +it's only reading commits and trees. In addition

"However, any options" would be more appropriate than "In addition to" here.

> to any options that
> +require git

s/git/Git/ (and the same thing below)

> to look at the contents of blobs, like "-p" and "--stat"
> +, options

You don't have to spell out "options" again here. Also, the options
should be enclosed in backticks instead of double quotes.

> that cause git to report pathnames, like "--summary" and
> +"--raw", will trigger lazy/on-demand fetching of blobs, as they are
> +needed to detect inexact renames.
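
For what it's worth, the difference between a path-limited `git log` and
`git log -p` can be observed with a sketch along these lines. The
repositories, file name, and commits are hypothetical, and the two
`uploadpack.*` settings are assumptions needed so that a local `file://`
remote accepts filters and on-demand requests. It counts missing objects
before and after each command rather than relying on progress output.

```shell
set -e
tmp=$(mktemp -d)    # scratch area, left behind for inspection

# Hypothetical "server" repository with two versions of one file.
git init -q "$tmp/src"
cd "$tmp/src"
git config user.name "Example"
git config user.email "example@example.com"
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
echo one > file.txt
git add file.txt
git commit -q -m "one"
echo two > file.txt
git commit -q -a -m "two"

cd "$tmp"
git clone -q --no-checkout --filter=blob:none "file://$tmp/src" partial

# Count objects reported as missing ('?' prefix) without fetching them.
count_missing() {
    git -C partial rev-list --objects --missing=print HEAD | grep -c '^?' || true
}

before=$(count_missing)
git -C partial log --oneline -- file.txt > /dev/null   # commits and trees only
after_log=$(count_missing)
git -C partial log -p > /dev/null                      # needs blob contents
after_patch=$(count_missing)
echo "missing: before=$before after_log=$after_log after_patch=$after_patch"
```

In this setup, the path-limited log should leave both blobs missing, and
`git log -p` should bring them in on demand.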
> +
> +linkgit:partial-clone[1]

Again, I'm pretty sure that does not work. You should build the documentation
locally and check that the links you are adding work. "MyFirstContribution"
has pointers on how to do that.


Thank you for working on this!
It's always great to see people wanting to improve the documentation.

Cheers,

Philippe.

Patch

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index eafa1c39927..a6e13666ea1 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -310,6 +310,78 @@  or `--mirror` is given)
 	for `host.xz:foo/.git`).  Cloning into an existing directory
 	is only allowed if the directory is empty.
 
+Partial Clone
+-------------
+
+By default, `git clone` will download every reachable object, including
+every version of every file in the history of the repository. The
+**partial clone** feature allows Git to transfer fewer objects and
+request them from the remote only when they are needed, so some
+reachable objects can be omitted from the initial `git clone` and
+subsequent `git fetch` operations.
+
+To use the partial clone feature, you can run `git clone` with the 
+`--filter=<filter-spec>` option. If you want to clone a repository
+without download any blobs, the form `filter=blob:none` will omit all
+the blobs. If the repository has some large blobs and you want to
+prevent some large blobs being downloaded by an appropriate threshold,
+the form `--filter=blob:limit=<n>[kmg]`omits blobs larger than n bytes
+or units (see linkgit:git-rev-list[1]).
+
+As mentioned before, a partially cloned repository may have to request
+the missing objects when they are needed. So some 'local' commands may
+fail without a network connection to the remote repository.
+
+For example, The <repository> contains two branches which names 'master'
+and 'topic. Then, we clone the repository by
+
+    $ git clone --filter=blob:none --no-checkout <repository>
+
+With the `--filter=blob:none` option Git will omit all the blobs and
+the `--no-checkout` option Git will not perform a checkout of HEAD
+after the clone is complete. Then, we check out the remote tracking
+'topic' branch by
+
+    $ git checkout -b topic origin/topic 
+
+The output looks like
+
+------------
+    remote: Enumerating objects: 1, done.
+    remote: Counting objects: 100% (1/1), done.
+    remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
+    Receiving objects: 100% (1/1), 43 bytes | 43.00 KiB/s, done.
+    Branch 'topic' set up to track remote branch 'topic' from 'origin'.
+    Switched to a new branch 'topic'
+------------
+
+The output is a bit surprising but it shows how partial clone works.
+When we check out the branch 'topic' Git will request the missing blobs
+because they are needed. Then, We can switch back to branch 'master' by
+
+    $ git checkout master
+
+This time the output looks like
+
+------------
+    Switched to branch 'master'
+    Your branch is up to date with 'origin/master'.
+------------
+
+It shows that when we switch back to the previous location, the checkout
+is done without a download because the repository has all the blobs that
+were downloaded previously.
+
+`git log` may also make a surprise with partial clones. `git log
+-- <pathspec>` will not cause downloads with the blob filters, because
+it's only reading commits and trees. In addition to any options that
+require git to look at the contents of blobs, like "-p" and "--stat"
+, options that cause git to report pathnames, like "--summary" and
+"--raw", will trigger lazy/on-demand fetching of blobs, as they are
+needed to detect inexact renames.
+
+linkgit:partial-clone[1]
+
 :git-clone: 1
 include::urls.txt[]