[v2,03/11] clone: add --sparse mode
diff mbox series

Message ID fef41b794a9886664616ce5e5c7902a82a474c2d.1568904188.git.gitgitgadget@gmail.com
State New
Headers show
Series
  • New sparse-checkout builtin and "cone" mode
Related show

Commit Message

kdnakt via GitGitGadget Sept. 19, 2019, 2:43 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

When someone wants to clone a large repository, but plans to work
using a sparse-checkout file, they either need to do a full
checkout first and then reduce the patterns they included, or
clone with --no-checkout, set up their patterns, and then run
a checkout manually. This requires knowing a lot about the repo
shape and how sparse-checkout works.

Add a new '--sparse' option to 'git clone' that initializes the
sparse-checkout file to include the following patterns:

	/*
	!/*/

These patterns include every file in the root directory, but
no directories. This allows a repo to include files like a
README or a bootstrapping script to grow enlistments from that
point.

During the 'git sparse-checkout init' call, we must first look
to see if HEAD is valid, or else we will fail while trying to
update the working directory. The first checkout will actually
update the working directory correctly.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-clone.txt        |  8 +++++++-
 builtin/clone.c                    | 27 +++++++++++++++++++++++++++
 builtin/sparse-checkout.c          |  6 ++++++
 t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++
 4 files changed, 53 insertions(+), 1 deletion(-)

Comments

Elijah Newren Oct. 5, 2019, 7:40 p.m. UTC | #1
On Thu, Sep 19, 2019 at 3:06 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:

> During the 'git sparse-checkout init' call, we must first look
> to see if HEAD is valid, or else we will fail while trying to
> update the working directory. The first checkout will actually
> update the working directory correctly.

This is new since the RFC series, but I'm not sure I understand.  Is
the issue you're fixing here that a 'git init somerepo' would hit this
codepath and print funny errors because HEAD doesn't exist yet and
thus the whole `git read-tree -mu HEAD` stuff can't work?  Or that
when the remote has HEAD pointing at a bad commit that you get error
messages different than expected?

> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 895479970d..656e6ebdd5 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -99,6 +99,7 @@ static int sparse_checkout_init(int argc, const char **argv)
>         char *sparse_filename;
>         FILE *fp;
>         int res;
> +       struct object_id oid;
>
>         if (sc_enable_config())
>                 return 1;
> @@ -120,6 +121,11 @@ static int sparse_checkout_init(int argc, const char **argv)
>         fprintf(fp, "/*\n!/*/\n");
>         fclose(fp);
>
> +       if (get_oid("HEAD", &oid)) {
> +               /* assume we are in a fresh repo */
> +               return 0;
> +       }
> +
>  reset_dir:
>         return update_working_directory();
>  }
Derrick Stolee Oct. 7, 2019, 1:56 p.m. UTC | #2
On 10/5/2019 3:40 PM, Elijah Newren wrote:
> On Thu, Sep 19, 2019 at 3:06 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> 
>> During the 'git sparse-checkout init' call, we must first look
>> to see if HEAD is valid, or else we will fail while trying to
>> update the working directory. The first checkout will actually
>> update the working directory correctly.
> 
> This is new since the RFC series, but I'm not sure I understand.  Is
> the issue you're fixing here that a 'git init somerepo' would hit this
> codepath and print funny errors because HEAD doesn't exist yet and
> thus the whole `git read-tree -mu HEAD` stuff can't work?  Or that
> when the remote has HEAD pointing at a bad commit that you get error
> messages different than expected?

At the point where `git clone --sparse` calls `git sparse-checkout init`,
there is no HEAD. We need to initialize the sparse-checkout before the
clone operation populates the working directory and creates the HEAD
ref. For that reason, `git read-tree -mu HEAD` wouldn't work. But that's
fine, since there is nothing to do. The index update will happen later.

> 
>> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
>> index 895479970d..656e6ebdd5 100644
>> --- a/builtin/sparse-checkout.c
>> +++ b/builtin/sparse-checkout.c
>> @@ -99,6 +99,7 @@ static int sparse_checkout_init(int argc, const char **argv)
>>         char *sparse_filename;
>>         FILE *fp;
>>         int res;
>> +       struct object_id oid;
>>
>>         if (sc_enable_config())
>>                 return 1;
>> @@ -120,6 +121,11 @@ static int sparse_checkout_init(int argc, const char **argv)
>>         fprintf(fp, "/*\n!/*/\n");
>>         fclose(fp);
>>
>> +       if (get_oid("HEAD", &oid)) {
>> +               /* assume we are in a fresh repo */
>> +               return 0;
>> +       }
>> +
>>  reset_dir:
>>         return update_working_directory();
>>  }

Patch
diff mbox series

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 5fc97f14de..03299a8adb 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -15,7 +15,7 @@  SYNOPSIS
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch] [--no-tags]
 	  [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
-	  [--[no-]remote-submodules] [--jobs <n>] [--] <repository>
+	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -156,6 +156,12 @@  objects from the source repository into a pack in the cloned repository.
 	used, neither remote-tracking branches nor the related
 	configuration variables are created.
 
+--sparse::
+	Initialize the sparse-checkout file so the working
+	directory starts with only the files in the root
+	of the repository. The sparse-checkout file can be
+	modified to grow the working directory as needed.
+
 --mirror::
 	Set up a mirror of the source repository.  This implies `--bare`.
 	Compared to `--bare`, `--mirror` not only maps local branches of the
diff --git a/builtin/clone.c b/builtin/clone.c
index a693e6ca44..16f4e8b6fd 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -58,6 +58,7 @@  static const char *real_git_dir;
 static char *option_upload_pack = "git-upload-pack";
 static int option_verbosity;
 static int option_progress = -1;
+static int option_sparse_checkout;
 static enum transport_family family;
 static struct string_list option_config = STRING_LIST_INIT_NODUP;
 static struct string_list option_required_reference = STRING_LIST_INIT_NODUP;
@@ -145,6 +146,8 @@  static struct option builtin_clone_options[] = {
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
 	OPT_BOOL(0, "remote-submodules", &option_remote_submodules,
 		    N_("any cloned submodules will use their remote-tracking branch")),
+	OPT_BOOL(0, "sparse", &option_sparse_checkout,
+		    N_("initialize sparse-checkout file to include only files at root")),
 	OPT_END()
 };
 
@@ -723,6 +726,27 @@  static void update_head(const struct ref *our, const struct ref *remote,
 	}
 }
 
+static int git_sparse_checkout_init(const char *repo)
+{
+	struct argv_array argv = ARGV_ARRAY_INIT;
+	int result = 0;
+	argv_array_pushl(&argv, "-C", repo, "sparse-checkout", "init", NULL);
+
+	/*
+	 * We must apply the setting in the current process
+	 * for the later checkout to use the sparse-checkout file.
+	 */
+	core_apply_sparse_checkout = 1;
+
+	if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
+		error(_("failed to initialize sparse-checkout"));
+		result = 1;
+	}
+
+	argv_array_clear(&argv);
+	return result;
+}
+
 static int checkout(int submodule_progress)
 {
 	struct object_id oid;
@@ -1096,6 +1120,9 @@  int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_required_reference.nr || option_optional_reference.nr)
 		setup_reference();
 
+	if (option_sparse_checkout && git_sparse_checkout_init(repo))
+		return 1;
+
 	remote = remote_get(option_origin);
 
 	strbuf_addf(&default_refspec, "+%s*:%s*", src_ref_prefix,
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 895479970d..656e6ebdd5 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -99,6 +99,7 @@  static int sparse_checkout_init(int argc, const char **argv)
 	char *sparse_filename;
 	FILE *fp;
 	int res;
+	struct object_id oid;
 
 	if (sc_enable_config())
 		return 1;
@@ -120,6 +121,11 @@  static int sparse_checkout_init(int argc, const char **argv)
 	fprintf(fp, "/*\n!/*/\n");
 	fclose(fp);
 
+	if (get_oid("HEAD", &oid)) {
+		/* assume we are in a fresh repo */
+		return 0;
+	}
+
 reset_dir:
 	return update_working_directory();
 }
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index a6c6b336c9..26b4ce9acd 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -88,5 +88,18 @@  test_expect_success 'init with existing sparse-checkout' '
 	test_cmp expect dir
 '
 
+test_expect_success 'clone --sparse' '
+	git clone --sparse repo clone &&
+	git -C clone sparse-checkout list >actual &&
+	cat >expect <<-EOF &&
+		/*
+		!/*/
+	EOF
+	test_cmp expect actual &&
+	ls clone >dir &&
+	echo a >expect &&
+	test_cmp expect dir
+'
+
 test_done