diff mbox series

[4/6] builtin/stash: provide a way to export stashes to a ref

Message ID 20220310173236.4165310-5-sandals@crustytoothpaste.net (mailing list archive)
State Superseded
Headers show
Series Importing and exporting stashes to refs | expand

Commit Message

brian m. carlson March 10, 2022, 5:32 p.m. UTC
A common user problem is how to sync in-progress work to another
machine.  Users currently must use some sort of transfer of the working
tree, which poses security risks and also necessarily causes the index
to become dirty.  The experience is suboptimal and frustrating for
users.

A reasonable idea is to use the stash for this purpose, but the stash is
stored in the reflog, not in a ref, and as such it cannot be pushed or
pulled.  This also means that it cannot be saved into a bundle or
preserved elsewhere, which is a problem when using throwaway development
environments.

Let's solve this problem by allowing the user to export the stash to a
ref (or, to just write it into the repository and print the hash, à la
git commit-tree).  Introduce git stash export, which writes a chain of
commits identical to the ones used for reflog-based stashes, except that
the first parent is always a chain to the previous stash, or to a
single, empty commit (for the final item).

Iterate over each stash from topmost to bottomost, looking up the data
for each one, and then create the chain from the single empty commit
back up in reverse order.  Preserve the author and committer
information, as well as the commit message, so we produce an identical
stash when importing later.

If the user has specified specific stashes they'd like to export
instead, use those instead of iterating over all of the stashes.

As part of this, specifically request quiet behavior when looking up the
OID for a revision because we will eventually hit a revision that
doesn't exist and we don't want to die when that occurs.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/stash.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 179 insertions(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason March 11, 2022, 2:08 a.m. UTC | #1
On Thu, Mar 10 2022, brian m. carlson wrote:

> +	size_t author_len, committer_len;
> +	struct commit *this = NULL;
> +	const char *orig_author = NULL, *orig_committer = NULL;
> +	char *author = NULL, *committer = NULL;
> +	const char *buffer = NULL;
> +	unsigned long bufsize;
> +	const char *p;
> +	char *msg = NULL;

These shouldn't be initialized unless they really need to..

> +	this = lookup_commit_reference(the_repository, &info->w_commit);

..and some are clobbered right away here, so all of these should not be initializzed.

> +	buffer = get_commit_buffer(this, &bufsize);
> +	orig_author = find_commit_header(buffer, "author", &author_len);
> +	orig_committer = find_commit_header(buffer, "committer", &committer_len);
> +	p = memmem(buffer, bufsize, "\n\n", 2);

...since by doing so we hide genuine "uninitialized"
warnings. E.g. "author_len" here isn't initialized, but is set by
find_commit_header(), but if that line was removed we'd warn below, but
not if it's initialized when the variables are declared..

> +		for (size_t i = 0;; i++, nitems++) {
> +			char buf[32];
> +			int ret;
> +
> +			if (nalloc <= i) {
> +				size_t new = nalloc * 3 / 2 + 5;
> +				items = xrealloc(items, new * sizeof(*items));
> +				nalloc = new;

Can't we just use the usual ALLOC_GROW() pattern here?

> +			}
> +			snprintf(buf, sizeof(buf), "%zu", i);

Aren't the %z formats unportable (even with our newly found reliance on
more C99)? I vaguely recall trying them recently and the windows CI jobs
erroring...

> +	for (ssize_t i = nitems - 1; i >= 0; i--) {

The ssize_t type can be really small (it's not a signed size_t), so this
is unportable, but in practice maybe it's OK...

In this case if you're just wanting to count down in a list maybe you
can use the pattern I used in 99d60545f87 (string-list API: change "nr"
and "alloc" to "size_t", 2022-03-07)?

> +	if (ref) {
> +		update_ref(NULL, ref, &prev->object.oid, NULL, 0, UPDATE_REFS_DIE_ON_ERR);
> +	} else {
> +		puts(oid_to_hex(&prev->object.oid));
> +	}

Nit: braces can be gone.

> +out:
> +	for (size_t i = 0; i < nitems; i++) {
> +		free_stash_info(&items[i]);
> +	}

Ditto.

> +static int export_stash(int argc, const char **argv, const char *prefix)
> +{
> +	int ret = 0;
> +	int print = 0;
> +	const char *ref = NULL;
> +	struct option options[] = {
> +		OPT_BOOL(0, "print", &print,
> +			 N_("print the object ID instead of writing it to a ref")),
> +		OPT_STRING(0, "to-ref", &ref, "refname",

Needs _("refname")

> +			   N_("save the data to the given ref")),
> +		OPT_END()
> +	};
> +
> +	argc = parse_options(argc, argv, prefix, options,
> +			     git_stash_export_usage,
> +			     PARSE_OPT_KEEP_DASHDASH);
> +
> +	if (!(!!ref ^ print))
> +		return error("exactly one of --print or --to-ref is required");

Cute, but maybe we can just use OPT_CMDMODE(), and if it's "to-ref"
shift one off argv afterwards (which'll be your &ref).

I.e. it'll give you the option incompatibility check, and without a new
translaed string.

Which, if we're keeping this version should use _().
Phillip Wood March 14, 2022, 9:19 p.m. UTC | #2
Hi Brian and Ævar

Firstly I think this is a useful feature to add to git stash, thanks for 
working on it Brian

On 11/03/2022 02:08, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Mar 10 2022, brian m. carlson wrote:
> 
>> +	size_t author_len, committer_len;
>> +	struct commit *this = NULL;
>> +	const char *orig_author = NULL, *orig_committer = NULL;
>> +	char *author = NULL, *committer = NULL;
>> +	const char *buffer = NULL;
>> +	unsigned long bufsize;
>> +	const char *p;
>> +	char *msg = NULL;
> 
> These shouldn't be initialized unless they really need to..
> 
>> +	this = lookup_commit_reference(the_repository, &info->w_commit);
> 
> ..and some are clobbered right away here, so all of these should not be initializzed.
> 
>> +	buffer = get_commit_buffer(this, &bufsize);
>> +	orig_author = find_commit_header(buffer, "author", &author_len);
>> +	orig_committer = find_commit_header(buffer, "committer", &committer_len);
>> +	p = memmem(buffer, bufsize, "\n\n", 2);

You could start searching from orig_committer rather than buffer but I'm 
sure it doesn't make any real difference. The sequencer does something 
similar to this to replay commits when rebasing - is there any scope for 
sharing code between the two?

> ...since by doing so we hide genuine "uninitialized"
> warnings. E.g. "author_len" here isn't initialized, but is set by
> find_commit_header(), but if that line was removed we'd warn below, but
> not if it's initialized when the variables are declared..
> 
>> +		for (size_t i = 0;; i++, nitems++) {

Do we need i and nitems?

>> +			char buf[32];
>> +			int ret;
>> +
>> +			if (nalloc <= i) {
>> +				size_t new = nalloc * 3 / 2 + 5;
>> +				items = xrealloc(items, new * sizeof(*items));
>> +				nalloc = new;
> 
> Can't we just use the usual ALLOC_GROW() pattern here?
ALLOC_GROW_BY() zeros out the memory which would mean we could remove 
the memset() calls in the loops. I noticed in some other loops we know 
the size in advance and could use CALLOC_ARRAY().

>> +			}
>> +			snprintf(buf, sizeof(buf), "%zu", i);
> 
> Aren't the %z formats unportable (even with our newly found reliance on
> more C99)? I vaguely recall trying them recently and the windows CI jobs
> erroring...

According to [1] it has been available since at least 2015. It is 
certainly much nicer than casting every size_t to uintmax_t and having 
to use PRIuMAX.

>> +	for (ssize_t i = nitems - 1; i >= 0; i--) {
> 
> The ssize_t type can be really small (it's not a signed size_t), so this
> is unportable, but in practice maybe it's OK...

I'm not really convinced by this ssize_t can be small argument[2], do 
you know of any platforms where it is true?

Best Wishes

Phillip

[1] 
https://docs.microsoft.com/en-us/cpp/c-runtime-library/format-specification-syntax-printf-and-wprintf-functions?view=msvc-140#size

[2] 
https://lore.kernel.org/git/e881a455-88b5-9c87-03a8-caaee68bb344@gmail.com/
Phillip Wood March 15, 2022, 10:50 a.m. UTC | #3
On 14/03/2022 21:19, Phillip Wood wrote:
> Hi Brian and Ævar

Sorry brian, I forgot you prefer a lowercase "b"

> Firstly I think this is a useful feature to add to git stash, thanks for 
> working on it Brian
> 
> On 11/03/2022 02:08, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, Mar 10 2022, brian m. carlson wrote:
>>
>>> +    size_t author_len, committer_len;
>>> +    struct commit *this = NULL;
>>> +    const char *orig_author = NULL, *orig_committer = NULL;
>>> +    char *author = NULL, *committer = NULL;
>>> +    const char *buffer = NULL;
>>> +    unsigned long bufsize;
>>> +    const char *p;
>>> +    char *msg = NULL;
>>
>> These shouldn't be initialized unless they really need to..
>>
>>> +    this = lookup_commit_reference(the_repository, &info->w_commit);
>>
>> ..and some are clobbered right away here, so all of these should not 
>> be initializzed.
>>
>>> +    buffer = get_commit_buffer(this, &bufsize);
>>> +    orig_author = find_commit_header(buffer, "author", &author_len);
>>> +    orig_committer = find_commit_header(buffer, "committer", 
>>> &committer_len);
>>> +    p = memmem(buffer, bufsize, "\n\n", 2);
> 
> You could start searching from orig_committer rather than buffer but I'm 
> sure it doesn't make any real difference. The sequencer does something 
> similar to this to replay commits when rebasing - is there any scope for 
> sharing code between the two?

I had a look at sequencer.c this morning and I think the answer is "not 
easily". In principle it would be nice to have a split_commit_header() 
function that filled a struct of pointer, length pairs for the author, 
committer, message and optionally the subject that could be used here, 
in the sequencer and in fast-export but I think that is more work than 
is reasonable for this series.

Best Wishes

Phillip
Junio C Hamano March 16, 2022, 5:05 p.m. UTC | #4
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> diff --git a/builtin/stash.c b/builtin/stash.c
> index 128f0a01ef..582a04dbab 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -33,6 +33,7 @@ static const char * const git_stash_usage[] = {
>  	   "          [--] [<pathspec>...]]"),
>  	N_("git stash save [-p|--patch] [-S|--staged] [-k|--[no-]keep-index] [-q|--quiet]\n"
>  	   "          [-u|--include-untracked] [-a|--all] [<message>]"),
> +	N_("git stash export (--print | --to-ref <ref>) [<stashes>]"),

If you mean we can take zero or more stash entries, the way to write
it I think is 

	[<stash>...]

cf. a few lines above about <pathspec>.

> +static const char * const git_stash_export_usage[] = {
> +	N_("git stash export (--print | --to-ref <ref>) [<stashes>]"),
> +	NULL
> +};

Likewise.

I think we agreed that it is a better design to reuse the actual
stash entries as one of the parents of the exported chain element
(while using another parent to string the entries together), so I
won't comment on the actual implementation that recreates multi-way
merge commits with no relation to the stash entries, which we won't
see in the next iteration.

Thanks.
brian m. carlson March 16, 2022, 9:48 p.m. UTC | #5
On 2022-03-14 at 21:19:10, Phillip Wood wrote:
> Hi Brian and Ævar
> 
> Firstly I think this is a useful feature to add to git stash, thanks for
> working on it Brian

Thanks.  I'm glad folks other than me will find it useful.

> On 11/03/2022 02:08, Ævar Arnfjörð Bjarmason wrote:
> > 
> > On Thu, Mar 10 2022, brian m. carlson wrote:
> > 
> > > +	size_t author_len, committer_len;
> > > +	struct commit *this = NULL;
> > > +	const char *orig_author = NULL, *orig_committer = NULL;
> > > +	char *author = NULL, *committer = NULL;
> > > +	const char *buffer = NULL;
> > > +	unsigned long bufsize;
> > > +	const char *p;
> > > +	char *msg = NULL;
> > 
> > These shouldn't be initialized unless they really need to..
> > 
> > > +	this = lookup_commit_reference(the_repository, &info->w_commit);
> > 
> > ..and some are clobbered right away here, so all of these should not be initializzed.

This function got hoisted out of what would otherwise be duplicated
code, and that's why they're all initialized (because we would otherwise
have called free on an uninitialized value).  I can remove the ones that
aren't strictly needed.

> > > +	buffer = get_commit_buffer(this, &bufsize);
> > > +	orig_author = find_commit_header(buffer, "author", &author_len);
> > > +	orig_committer = find_commit_header(buffer, "committer", &committer_len);
> > > +	p = memmem(buffer, bufsize, "\n\n", 2);
> 
> You could start searching from orig_committer rather than buffer but I'm
> sure it doesn't make any real difference. The sequencer does something
> similar to this to replay commits when rebasing - is there any scope for
> sharing code between the two?

I can look into it.  The amount of code that would be duplicated here is
very minimal, so I'm okay with just adding a few lines here.

> > ...since by doing so we hide genuine "uninitialized"
> > warnings. E.g. "author_len" here isn't initialized, but is set by
> > find_commit_header(), but if that line was removed we'd warn below, but
> > not if it's initialized when the variables are declared..
> > 
> > > +		for (size_t i = 0;; i++, nitems++) {
> 
> Do we need i and nitems?

I can look into removing them.

> > > +			char buf[32];
> > > +			int ret;
> > > +
> > > +			if (nalloc <= i) {
> > > +				size_t new = nalloc * 3 / 2 + 5;
> > > +				items = xrealloc(items, new * sizeof(*items));
> > > +				nalloc = new;
> > 
> > Can't we just use the usual ALLOC_GROW() pattern here?
> ALLOC_GROW_BY() zeros out the memory which would mean we could remove the
> memset() calls in the loops. I noticed in some other loops we know the size
> in advance and could use CALLOC_ARRAY().

Yeah, I can switch to that.  I was looking for that, but I was thinking
of a function and not a macro, so I missed it.

> > > +			}
> > > +			snprintf(buf, sizeof(buf), "%zu", i);
> > 
> > Aren't the %z formats unportable (even with our newly found reliance on
> > more C99)? I vaguely recall trying them recently and the windows CI jobs
> > erroring...
> 
> According to [1] it has been available since at least 2015. It is certainly
> much nicer than casting every size_t to uintmax_t and having to use PRIuMAX.

If we're relying on a new enough MSVC for C11, then it's much newer than
2015, so we should be fine.  It's mandatory on POSIX systems.
Ævar Arnfjörð Bjarmason March 18, 2022, 1:34 p.m. UTC | #6
On Wed, Mar 16 2022, brian m. carlson wrote:

> [[PGP Signed Part:Undecided]]
> On 2022-03-14 at 21:19:10, Phillip Wood wrote:
>> Hi Brian and Ævar
>> 
>> Firstly I think this is a useful feature to add to git stash, thanks for
>> working on it Brian
>
> Thanks.  I'm glad folks other than me will find it useful.
>
>> On 11/03/2022 02:08, Ævar Arnfjörð Bjarmason wrote:
>> > 
>> > On Thu, Mar 10 2022, brian m. carlson wrote:
>> > 
>> > > +	size_t author_len, committer_len;
>> > > +	struct commit *this = NULL;
>> > > +	const char *orig_author = NULL, *orig_committer = NULL;
>> > > +	char *author = NULL, *committer = NULL;
>> > > +	const char *buffer = NULL;
>> > > +	unsigned long bufsize;
>> > > +	const char *p;
>> > > +	char *msg = NULL;
>> > 
>> > These shouldn't be initialized unless they really need to..
>> > 
>> > > +	this = lookup_commit_reference(the_repository, &info->w_commit);
>> > 
>> > ..and some are clobbered right away here, so all of these should not be initializzed.
>
> This function got hoisted out of what would otherwise be duplicated
> code, and that's why they're all initialized (because we would otherwise
> have called free on an uninitialized value).  I can remove the ones that
> aren't strictly needed.
>
>> > > +	buffer = get_commit_buffer(this, &bufsize);
>> > > +	orig_author = find_commit_header(buffer, "author", &author_len);
>> > > +	orig_committer = find_commit_header(buffer, "committer", &committer_len);
>> > > +	p = memmem(buffer, bufsize, "\n\n", 2);
>> 
>> You could start searching from orig_committer rather than buffer but I'm
>> sure it doesn't make any real difference. The sequencer does something
>> similar to this to replay commits when rebasing - is there any scope for
>> sharing code between the two?
>
> I can look into it.  The amount of code that would be duplicated here is
> very minimal, so I'm okay with just adding a few lines here.
>
>> > ...since by doing so we hide genuine "uninitialized"
>> > warnings. E.g. "author_len" here isn't initialized, but is set by
>> > find_commit_header(), but if that line was removed we'd warn below, but
>> > not if it's initialized when the variables are declared..
>> > 
>> > > +		for (size_t i = 0;; i++, nitems++) {
>> 
>> Do we need i and nitems?
>
> I can look into removing them.
>
>> > > +			char buf[32];
>> > > +			int ret;
>> > > +
>> > > +			if (nalloc <= i) {
>> > > +				size_t new = nalloc * 3 / 2 + 5;
>> > > +				items = xrealloc(items, new * sizeof(*items));
>> > > +				nalloc = new;
>> > 
>> > Can't we just use the usual ALLOC_GROW() pattern here?
>> ALLOC_GROW_BY() zeros out the memory which would mean we could remove the
>> memset() calls in the loops. I noticed in some other loops we know the size
>> in advance and could use CALLOC_ARRAY().
>
> Yeah, I can switch to that.  I was looking for that, but I was thinking
> of a function and not a macro, so I missed it.
>
>> > > +			}
>> > > +			snprintf(buf, sizeof(buf), "%zu", i);
>> > 
>> > Aren't the %z formats unportable (even with our newly found reliance on
>> > more C99)? I vaguely recall trying them recently and the windows CI jobs
>> > erroring...
>> 
>> According to [1] it has been available since at least 2015. It is certainly
>> much nicer than casting every size_t to uintmax_t and having to use PRIuMAX.
>
> If we're relying on a new enough MSVC for C11, then it's much newer than
> 2015, so we should be fine.  It's mandatory on POSIX systems.

FWIW I dug into my logs and I ran into it with %zu (not %z), but that's
what you're using.

Sorry about being inaccurate, it seems %z's portability isn't the same
as %z.

I ran into it in mid-2021 in the GitHub CI, but those logs are deleted
now (and I didn't re-push that specific OID):

    https://github.com/avar/git/runs/2298653913

Where it would emit output like:

    builtin/log.c: In function 'gen_message_id':
    311
    builtin/log.c:1047:29: error: unknown conversion type character 'z' in format [-Werror=format=]
    312
     1047 |  strbuf_addf(&fmt, "%%s-%%0%zud.%%0%zud-%%s-%%s-%%s", tmp.len, tmp.len);

This SO post, whose accuracy I can't verify, claims it is supported in
VS 2013 or later, and that the way to check for it is with "_MSC_VER >=
1800":

    https://stackoverflow.com/questions/44382862/how-to-printf-a-size-t-without-warning-in-mingw-w64-gcc-7-1

So if we are going to use it and that's true (which would be great!) it
would IMO make sense to introduce it in some prep commit where we delete
e.g. "_MSC_VER>=1310" and other things you'll see in-tree if you look
through:

    git grep _MSC_VER

I.e. to push out some canary commit for using that specific feature, and
along with it delete the old MSVC compatibility shims (which presumably
we can't use if we're going to hard depend on %zu).
Ævar Arnfjörð Bjarmason March 18, 2022, 1:41 p.m. UTC | #7
On Mon, Mar 14 2022, Phillip Wood wrote:

> Hi Brian and Ævar
> [...]

Hi, sorry about the late reply, just on this point (which I see is stil
unaddressed), and changing $subject for the generic question:

>>> +	for (ssize_t i = nitems - 1; i >= 0; i--) {
>> The ssize_t type can be really small (it's not a signed size_t), so
>> this
>> is unportable, but in practice maybe it's OK...
>
> I'm not really convinced by this ssize_t can be small argument[2], do
> you know of any platforms where it is true?

Where we'd overflow in this code as written? Yes, every platform we
build on, since e.g. on Linux it's got half the unsigned size of
SIZE_MAX, on my 64 bit box:

    SIZE_MAX  = 18446744073709551615
    SSIZE_MAX = 9223372036854775807

Of course exceeding 2^63-1 or even 2^31-1 number of stashes seems
unlikely in practice.

If you meant are there platforms where ssize_t is as small as you can
pedintically make it, i.e. not just half the signed range of size_t, but
something much smaller?

Possibly, but I think that's unlikely in practice given the
homogenization in computing. Even C just mandated 2's compliment!

Although I think it's still imporant to understand that in the specific
case of ssize_t, unlike other "signed version of type X" the *intent* of
the standard is clearly to *not* mandate that it's a signed version of
size_t. I.e. it's intended for functions like:

     ssize_t read(int fildes, void *buf, size_t nbyte);

Which both per the standard and in practice might have limits that are
much smaller than their respective types. E.g. on my Linux box read(2)
says:

    On Linux, read() (and similar system calls) will transfer at most
    0x7ffff000 (2,147,479,552) bytes, returning the number of bytes
    actually transferred.  (This is true on both 32-bit and 64-bit
    systems.)

I can see how *some* platform might take those liberties in
practice. I.e. 1/2 of your addressable memory/offset != number of bytes
you'd want to return or consider at once for I/O purposes.

But in any case, we do have existing unportable code in-tree, but
generally with C it's good practice to avoid unportable code if it's
easy, you never know what platform will be released tomorrow that you
might have to scramble to support.

As I noted in this and other similar cases it's trivial to just use
size_t here. It's just a matter of changing (leaving the C99-specifics
here aside):

	for (i = nitems - 1; i >= 0; i--) {
		item = ary[i];
                [...];

To, as I did in 99d60545f87 (string-list API: change "nr" and "alloc" to
"size_t", 2022-03-07):

	for (i = nitems; i >= 1; i--) {
		item = ary[i - 1];
                [...];

Or even, if repeating "i - 1" is tedious:

	for (cnt = nitems; cnt >= 1; cnt--) {
		size_t i = cnt - 1;

		item = ary[i];
                [...];

Portability concerns aside I think such code is much clearer and easier
to reason about, since you no longer have to carefully squint to see if
we're doing the right thing with the two types in play.
Phillip Wood March 18, 2022, 4:26 p.m. UTC | #8
On 18/03/2022 13:34, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Mar 16 2022, brian m. carlson wrote:
> 
>> [[PGP Signed Part:Undecided]]
>> On 2022-03-14 at 21:19:10, Phillip Wood wrote:
>>> Hi Brian and Ævar
>>>
>>> Firstly I think this is a useful feature to add to git stash, thanks for
>>> working on it Brian
>>
>> Thanks.  I'm glad folks other than me will find it useful.
>>
>>> On 11/03/2022 02:08, Ævar Arnfjörð Bjarmason wrote:
>>>>
>>>> On Thu, Mar 10 2022, brian m. carlson wrote:
>>>>
>>>>> +	size_t author_len, committer_len;
>>>>> +	struct commit *this = NULL;
>>>>> +	const char *orig_author = NULL, *orig_committer = NULL;
>>>>> +	char *author = NULL, *committer = NULL;
>>>>> +	const char *buffer = NULL;
>>>>> +	unsigned long bufsize;
>>>>> +	const char *p;
>>>>> +	char *msg = NULL;
>>>>
>>>> These shouldn't be initialized unless they really need to..
>>>>
>>>>> +	this = lookup_commit_reference(the_repository, &info->w_commit);
>>>>
>>>> ..and some are clobbered right away here, so all of these should not be initializzed.
>>
>> This function got hoisted out of what would otherwise be duplicated
>> code, and that's why they're all initialized (because we would otherwise
>> have called free on an uninitialized value).  I can remove the ones that
>> aren't strictly needed.
>>
>>>>> +	buffer = get_commit_buffer(this, &bufsize);
>>>>> +	orig_author = find_commit_header(buffer, "author", &author_len);
>>>>> +	orig_committer = find_commit_header(buffer, "committer", &committer_len);
>>>>> +	p = memmem(buffer, bufsize, "\n\n", 2);
>>>
>>> You could start searching from orig_committer rather than buffer but I'm
>>> sure it doesn't make any real difference. The sequencer does something
>>> similar to this to replay commits when rebasing - is there any scope for
>>> sharing code between the two?
>>
>> I can look into it.  The amount of code that would be duplicated here is
>> very minimal, so I'm okay with just adding a few lines here.
>>
>>>> ...since by doing so we hide genuine "uninitialized"
>>>> warnings. E.g. "author_len" here isn't initialized, but is set by
>>>> find_commit_header(), but if that line was removed we'd warn below, but
>>>> not if it's initialized when the variables are declared..
>>>>
>>>>> +		for (size_t i = 0;; i++, nitems++) {
>>>
>>> Do we need i and nitems?
>>
>> I can look into removing them.
>>
>>>>> +			char buf[32];
>>>>> +			int ret;
>>>>> +
>>>>> +			if (nalloc <= i) {
>>>>> +				size_t new = nalloc * 3 / 2 + 5;
>>>>> +				items = xrealloc(items, new * sizeof(*items));
>>>>> +				nalloc = new;
>>>>
>>>> Can't we just use the usual ALLOC_GROW() pattern here?
>>> ALLOC_GROW_BY() zeros out the memory which would mean we could remove the
>>> memset() calls in the loops. I noticed in some other loops we know the size
>>> in advance and could use CALLOC_ARRAY().
>>
>> Yeah, I can switch to that.  I was looking for that, but I was thinking
>> of a function and not a macro, so I missed it.
>>
>>>>> +			}
>>>>> +			snprintf(buf, sizeof(buf), "%zu", i);
>>>>
>>>> Aren't the %z formats unportable (even with our newly found reliance on
>>>> more C99)? I vaguely recall trying them recently and the windows CI jobs
>>>> erroring...
>>>
>>> According to [1] it has been available since at least 2015. It is certainly
>>> much nicer than casting every size_t to uintmax_t and having to use PRIuMAX.
>>
>> If we're relying on a new enough MSVC for C11, then it's much newer than
>> 2015, so we should be fine.  It's mandatory on POSIX systems.
> 
> FWIW I dug into my logs and I ran into it with %zu (not %z), but that's
> what you're using.
> 
> Sorry about being inaccurate, it seems %z's portability isn't the same
> as %z.
> 
> I ran into it in mid-2021 in the GitHub CI, but those logs are deleted
> now (and I didn't re-push that specific OID):
> 
>      https://github.com/avar/git/runs/2298653913
> 
> Where it would emit output like:
> 
>      builtin/log.c: In function 'gen_message_id':
>      311
>      builtin/log.c:1047:29: error: unknown conversion type character 'z' in format [-Werror=format=]
>      312
>       1047 |  strbuf_addf(&fmt, "%%s-%%0%zud.%%0%zud-%%s-%%s-%%s", tmp.len, tmp.len);
> 
> This SO post, whose accuracy I can't verify, claims it is supported in
> VS 2013 or later, and that the way to check for it is with "_MSC_VER >=
> 1800":
> 
>      https://stackoverflow.com/questions/44382862/how-to-printf-a-size-t-without-warning-in-mingw-w64-gcc-7-1

Oh, so it is mingw that is the problem, not MSVC. Indeed using "%zu" 
(see the diff below) fails for the "win build" job[1] but the "win+VS 
build" succeeds[2]. mingw config.mak.uname to uses 
-D__MINGW_USE_ANSI_STDIO=0. Even if we could change that to =1 it would 
be insufficient as it does not affect the format specifiers allowed by 
__attribute__((format (printf, 3, 4))). I think to do that we would have 
to change "printf" to "__MINGW_PRINTF_FORMAT" for each attribute 
declaration[3].

In short unfortunately I don't think we can easily use "%zu"

Best wishes

Phillip

diff --git a/add-interactive.c b/add-interactive.c
index e1ab39cce3..1790ad6359 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -193,9 +193,8 @@ static ssize_t find_unique(const char *string, 
struct prefix_item_list *list)
         struct string_list_item *item;

         if (list->items.nr != list->sorted.nr)
-               BUG("prefix_item_list in inconsistent state (%"PRIuMAX
-                   " vs %"PRIuMAX")",
-                   (uintmax_t)list->items.nr, (uintmax_t)list->sorted.nr);
+               BUG("prefix_item_list in inconsistent state (%zu vs %zu)",
+                   list->items.nr, list->sorted.nr);

         if (index < 0)
                 item = list->sorted.items[-1 - index].util;

[1] 
https://github.com/phillipwood/git/runs/5601840748?check_suite_focus=true
[2] 
https://github.com/phillipwood/git/runs/5601840528?check_suite_focus=true
[3] https://sourceforge.net/p/mingw-w64/wiki2/gnu%20printf

> So if we are going to use it and that's true (which would be great!) it
> would IMO make sense to introduce it in some prep commit where we delete
> e.g. "_MSC_VER>=1310" and other things you'll see in-tree if you look
> through:
> 
>      git grep _MSC_VER
> 
> I.e. to push out some canary commit for using that specific feature, and
> along with it delete the old MSVC compatibility shims (which presumably
> we can't use if we're going to hard depend on %zu).
Johannes Schindelin March 24, 2022, 2:02 p.m. UTC | #9
Hi brian,

On Wed, 16 Mar 2022, brian m. carlson wrote:

> On 2022-03-14 at 21:19:10, Phillip Wood wrote:
>
> > On 11/03/2022 02:08, Ævar Arnfjörð Bjarmason wrote:
> > >
> > > On Thu, Mar 10 2022, brian m. carlson wrote:
> > >
> > > > +			}
> > > > +			snprintf(buf, sizeof(buf), "%zu", i);
> > >
> > > Aren't the %z formats unportable (even with our newly found reliance on
> > > more C99)? I vaguely recall trying them recently and the windows CI jobs
> > > erroring...
> >
> > According to [1] it has been available since at least 2015. It is certainly
> > much nicer than casting every size_t to uintmax_t and having to use PRIuMAX.
>
> If we're relying on a new enough MSVC for C11, then it's much newer than
> 2015, so we should be fine.  It's mandatory on POSIX systems.

The MSVCRT we're using in GCC is much, much older, and that won't change
anytime soon, I don't think.

You _might_ get the code to compile a `%zu` format by playing some macro
tricks, but executing the result would still not work.

It sure would be nice if we could use all that POSIX promises... but we
can't. Sorry...

Ciao,
Dscho
diff mbox series

Patch

diff --git a/builtin/stash.c b/builtin/stash.c
index 128f0a01ef..582a04dbab 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -33,6 +33,7 @@  static const char * const git_stash_usage[] = {
 	   "          [--] [<pathspec>...]]"),
 	N_("git stash save [-p|--patch] [-S|--staged] [-k|--[no-]keep-index] [-q|--quiet]\n"
 	   "          [-u|--include-untracked] [-a|--all] [<message>]"),
+	N_("git stash export (--print | --to-ref <ref>) [<stashes>]"),
 	NULL
 };
 
@@ -89,6 +90,12 @@  static const char * const git_stash_save_usage[] = {
 	NULL
 };
 
+static const char * const git_stash_export_usage[] = {
+	N_("git stash export (--print | --to-ref <ref>) [<stashes>]"),
+	NULL
+};
+
+
 static const char ref_stash[] = "refs/stash";
 static struct strbuf stash_index_path = STRBUF_INIT;
 
@@ -139,6 +146,7 @@  static int get_stash_info_1(struct stash_info *info, const char *commit, int qui
 	const char *revision;
 	struct object_id dummy;
 	struct strbuf symbolic = STRBUF_INIT;
+	struct object_context unused;
 
 	strbuf_init(&info->revision, 0);
 	if (!commit) {
@@ -158,7 +166,9 @@  static int get_stash_info_1(struct stash_info *info, const char *commit, int qui
 
 	revision = info->revision.buf;
 
-	if (get_oid(revision, &info->w_commit)) {
+	if (get_oid_with_context(the_repository, revision,
+				 GET_OID_QUIETLY | GET_OID_RETURN_FAILURE,
+				 &info->w_commit, &unused)) {
 		if (!quiet)
 			error(_("%s is not a valid reference"), revision);
 		free_stash_info(info);
@@ -1775,6 +1785,172 @@  static int save_stash(int argc, const char **argv, const char *prefix)
 	return ret;
 }
 
+static char *write_commit_with_parents(struct object_id *out, const struct stash_info *info, struct commit_list *parents)
+{
+	size_t author_len, committer_len;
+	struct commit *this = NULL;
+	const char *orig_author = NULL, *orig_committer = NULL;
+	char *author = NULL, *committer = NULL;
+	const char *buffer = NULL;
+	unsigned long bufsize;
+	const char *p;
+	char *msg = NULL;
+
+	this = lookup_commit_reference(the_repository, &info->w_commit);
+	buffer = get_commit_buffer(this, &bufsize);
+	orig_author = find_commit_header(buffer, "author", &author_len);
+	orig_committer = find_commit_header(buffer, "committer", &committer_len);
+	p = memmem(buffer, bufsize, "\n\n", 2);
+
+	if (!orig_author || !orig_committer || !p) {
+		error(_("cannot parse commit %s"), oid_to_hex(&info->w_commit));
+		goto out;
+	}
+	/* Jump to message. */
+	p += 2;
+
+	author = xmemdupz(orig_author, author_len);
+	committer = xmemdupz(orig_committer, committer_len);
+
+	if (commit_tree_extended(p, bufsize - (p - buffer),
+				 &info->w_tree, parents,
+				 out, author, committer,
+				 NULL, NULL)) {
+		error(_("could not write commit"));
+		goto out;
+	}
+	msg = xmemdupz(p, bufsize - (p - buffer));
+out:
+	unuse_commit_buffer(this, buffer);
+	free(author);
+	free(committer);
+	return msg;
+}
+
+static int do_export_stash(const char *ref, size_t argc, const char **argv)
+{
+	struct object_id base;
+	struct commit *prev;
+	struct stash_info *items = NULL;
+	size_t nitems = 0, nalloc = 0;
+	int res = 0;
+
+	prepare_fallback_ident("git stash", "git@stash");
+
+	/* First, we create a single empty commit. */
+	if (commit_tree(NULL, 0, the_hash_algo->empty_tree, NULL, &base, NULL, NULL))
+		return error(_("unable to write base commit"));
+
+	prev = lookup_commit_reference(the_repository, &base);
+
+
+	if (argc) {
+		/*
+		 * Find each specified stash, and load data into the array.
+		 */
+		for (size_t i = 0; i < argc; i++, nitems++) {
+			int ret;
+
+			if (nalloc <= i) {
+				size_t new = nalloc * 3 / 2 + 5;
+				items = xrealloc(items, new * sizeof(*items));
+				nalloc = new;
+			}
+			memset(&items[i], 0, sizeof(*items));
+			/* We want this to be quiet because it might not exist. */
+			ret = get_stash_info_1(&items[i], argv[i], 1);
+			if (ret)
+				return error(_("unable to find stash entry %s"), argv[i]);
+		}
+	} else {
+		/*
+		 * Walk the reflog, finding each stash entry, and load data into the
+		 * array.
+		 */
+		for (size_t i = 0;; i++, nitems++) {
+			char buf[32];
+			int ret;
+
+			if (nalloc <= i) {
+				size_t new = nalloc * 3 / 2 + 5;
+				items = xrealloc(items, new * sizeof(*items));
+				nalloc = new;
+			}
+			snprintf(buf, sizeof(buf), "%zu", i);
+			memset(&items[i], 0, sizeof(*items));
+			/* We want this to be quiet because it might not exist. */
+			ret = get_stash_info_1(&items[i], buf, 1);
+			if (ret)
+				break;
+		}
+	}
+
+	/*
+	 * Now, create a set of commits identical to the regular stash commits,
+	 * but where their first parents form a chain to our original empty
+	 * base commit.
+	 */
+	for (ssize_t i = nitems - 1; i >= 0; i--) {
+		struct commit_list *parents = NULL;
+		struct commit_list **next = &parents;
+		struct object_id out;
+		char *msg;
+
+		next = commit_list_append(prev, next);
+		next = commit_list_append(lookup_commit_reference(the_repository, &items[i].b_commit), next);
+		next = commit_list_append(lookup_commit_reference(the_repository, &items[i].i_commit), next);
+		if (items[i].has_u)
+			next = commit_list_append(lookup_commit_reference(the_repository,
+									  &items[i].u_commit),
+						  next);
+
+		msg = write_commit_with_parents(&out, &items[i], parents);
+		if (!msg) {
+			res = -1;
+			goto out;
+		}
+		free(msg);
+		prev = lookup_commit_reference(the_repository, &out);
+	}
+	if (ref) {
+		update_ref(NULL, ref, &prev->object.oid, NULL, 0, UPDATE_REFS_DIE_ON_ERR);
+	} else {
+		puts(oid_to_hex(&prev->object.oid));
+	}
+out:
+	for (size_t i = 0; i < nitems; i++) {
+		free_stash_info(&items[i]);
+	}
+	free(items);
+
+	return res;
+}
+
+static int export_stash(int argc, const char **argv, const char *prefix)
+{
+	int ret = 0;
+	int print = 0;
+	const char *ref = NULL;
+	struct option options[] = {
+		OPT_BOOL(0, "print", &print,
+			 N_("print the object ID instead of writing it to a ref")),
+		OPT_STRING(0, "to-ref", &ref, "refname",
+			   N_("save the data to the given ref")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options,
+			     git_stash_export_usage,
+			     PARSE_OPT_KEEP_DASHDASH);
+
+	if (!(!!ref ^ print))
+		return error("exactly one of --print or --to-ref is required");
+
+
+	ret = do_export_stash(ref, argc, argv);
+	return ret;
+}
+
 int cmd_stash(int argc, const char **argv, const char *prefix)
 {
 	pid_t pid = getpid();
@@ -1818,6 +1994,8 @@  int cmd_stash(int argc, const char **argv, const char *prefix)
 		return !!push_stash(argc, argv, prefix, 0);
 	else if (!strcmp(argv[0], "save"))
 		return !!save_stash(argc, argv, prefix);
+	else if (!strcmp(argv[0], "export"))
+		return !!export_stash(argc, argv, prefix);
 	else if (*argv[0] != '-')
 		usage_msg_optf(_("unknown subcommand: %s"),
 			       git_stash_usage, options, argv[0]);