Message ID | 20181220002610.43832-1-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | diff: add support for reading files literally with --no-index | expand |
On Thu, Dec 20, 2018 at 12:26:10AM +0000, brian m. carlson wrote: > In some shells, such as bash and zsh, it's possible to use a command > substitution to provide the output of a command as a file argument to > another process, like so: > > diff -u <(printf "a\nb\n") <(printf "a\nc\n") > > However, this syntax does not produce useful results with git diff > --no-index. > > On macOS, the arguments to the command are named pipes under /dev/fd, > and git diff doesn't know how to handle a named pipe. On Linux, the > arguments are symlinks to pipes, so git diff "helpfully" diffs these > symlinks, comparing their targets like "pipe:[1234]" and "pipe:[5678]". > > Because this behavior is not very helpful, and because git diff has many > features that people would like to use even on non-Git files, add an > option to git diff --no-index to read files literally, dereferencing > symlinks and reading them as a normal file. > > Note that this behavior requires that the files be read entirely into > memory, just as we do when reading from standard input. Yeah, I think this is a useful feature to have. It seemed familiar, and indeed there were some patches a while back: https://public-inbox.org/git/20170113102021.6054-1-dennis@kaarsemaker.net/ Compared to that series, the implementation here seems simpler and less likely to have unexpected effects. Reading the patch, one thing I _thought_ it did (but it does not) is to impact only the actual arguments on the command line, not entries we find from recursing trees. I.e., if I said: ln -s bar a/foo ln -s baz b/foo git diff --no-index --literally a/foo b/foo git diff --no-index --literally a/ b/ should I still see a/foo as a symlink in the second one, or not? The distinction is a bit subtle, but I think treating only the actual top-level arguments as symlinks would solve your problem, but still allow a more detailed diff for the recursive cases. There's some more discussion on this in the earlier iteration of that series: https://public-inbox.org/git/20161111201958.2175-1-dennis@kaarsemaker.net/ I also suspect people might want a config option to make this the default (which should be OK, as git-diff is, after all, porcelain). But either way, a command line option is the first step. > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > This is a long-standing annoyance of mine, and it also makes some use > cases possible (for example, diffing filtered and non-filtered objects). > > We don't include a test for the pipe scenario because I couldn't get > that case to work in portable shell (although of course it works in > bash). I have, however, tested it on both macOS and Linux. No clue how > this works on Windows. I think focusing on symlinks makes sense. You could probably test pipes with mkfifo, but looking at your implementation, it's pretty clear that it will operate on anything that works with read(). > diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt > index 030f162f30..4c4695c88d 100644 > --- a/Documentation/git-diff.txt > +++ b/Documentation/git-diff.txt > @@ -111,6 +111,11 @@ include::diff-options.txt[] > "Unmerged". Can be used only when comparing the working tree > with the index. > > +--literally:: > + Read the specified files literally, as `diff` would, > + dereferencing any symlinks and reading data from pipes. > + This option only works with `--no-index`. > + Looks like spaces for indent, whereas the context uses tabs. I think "literal" is a good way to describe this concept. If we do grow a config option to make this the default, then countermanding it would be "--no-literally", which parses a bit funny as English. If you agree with my "only the top-level" line of reasoning above, maybe "--literal-arguments" and "--no-literal-arguments" might make sense. We could also in theory offer several levels: no literals, top-level literals, everything literal, at which point it becomes a tri-state. > -static struct diff_filespec *noindex_filespec(const char *name, int mode) > +static int populate_literally(struct diff_filespec *s) > +{ > + struct strbuf buf = STRBUF_INIT; > + size_t size = 0; > + int fd = xopen(s->path, O_RDONLY); > + > + if (strbuf_read(&buf, fd, 0) < 0) > + return error_errno("error while reading from '%s'", s->path); > + > + s->should_munmap = 0; > + s->data = strbuf_detach(&buf, &size); > + s->size = size; > + s->should_free = 1; > + s->read_literally = 1; > + return 0; > +} > + > +static struct diff_filespec *noindex_filespec(const char *name, int mode, > + struct diff_options *o) > [...] > + else if (o->flags.read_literally) > + populate_literally(s); The implementation is pleasantly simple. If we want to do the "only top-level" thing, we'd just have to pass in a depth flag here. > diff --git a/diff.c b/diff.c > index dc9965e836..740d0087b9 100644 > --- a/diff.c > +++ b/diff.c > @@ -4282,18 +4282,18 @@ static void run_diff_cmd(const char *pgm, > fprintf(o->file, "* Unmerged path %s\n", name); > } > > -static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *istate) > +static void diff_fill_oid_info(struct diff_filespec *one, struct diff_options *o) It might be worth breaking these "pass the options around" hunks into a separate preparatory patch. -Peff
On Thu, Dec 20, 2018 at 1:26 AM brian m. carlson <sandals@crustytoothpaste.net> wrote: > @@ -5159,6 +5159,8 @@ int diff_opt_parse(struct diff_options *options, > options->flags.funccontext = 1; > else if (!strcmp(arg, "--no-function-context")) > options->flags.funccontext = 0; > + else if (!strcmp(arg, "--literally")) > + options->flags.read_literally = 1; You probably want to check in diff_setup_done() that if flags.read_literally is set but flags.no_index is not, then abandon ship and die() because --literally is not used with --no-index. Even when --no-index is implicit, flags.no_index should be set. I wonder if --follow-symlinks would be a good alternative for this (then if the final destination is unmmapable then we just read the file whole in memory without the user asking, so it will work with pipes). --follow-symlinks then could be made work with non-"no-index" case too. But --literally is also ok.
On Thu, Dec 20, 2018 at 06:06:11PM +0100, Duy Nguyen wrote: > On Thu, Dec 20, 2018 at 1:26 AM brian m. carlson > <sandals@crustytoothpaste.net> wrote: > > @@ -5159,6 +5159,8 @@ int diff_opt_parse(struct diff_options *options, > > options->flags.funccontext = 1; > > else if (!strcmp(arg, "--no-function-context")) > > options->flags.funccontext = 0; > > + else if (!strcmp(arg, "--literally")) > > + options->flags.read_literally = 1; > > You probably want to check in diff_setup_done() that if > flags.read_literally is set but flags.no_index is not, then abandon > ship and die() because --literally is not used with --no-index. Even > when --no-index is implicit, flags.no_index should be set. Yeah, good catch. "git diff --literally HEAD" should report an error. > I wonder if --follow-symlinks would be a good alternative for this > (then if the final destination is unmmapable then we just read the > file whole in memory without the user asking, so it will work with > pipes). --follow-symlinks then could be made work with non-"no-index" > case too. But --literally is also ok. It's more than symlinks, though. Reading from a named pipe, we'd want to see the actual contents with --literally (and not "oops, I don't know how to represent a named pipe"). -Peff
On Thu, Dec 20, 2018 at 6:18 PM Jeff King <peff@peff.net> wrote: > > I wonder if --follow-symlinks would be a good alternative for this > > (then if the final destination is unmmapable then we just read the > > file whole in memory without the user asking, so it will work with > > pipes). --follow-symlinks then could be made work with non-"no-index" > > case too. But --literally is also ok. > > It's more than symlinks, though. Reading from a named pipe, we'd want to > see the actual contents with --literally (and not "oops, I don't know > how to represent a named pipe"). Yes, but I think at least --no-index it makes sense to just fall back to read() if we can't mmap(). mmap is more of an optimization than a requirement. There's no loss going from "oops I don't know how to represent it" to "here's the content from whatever what that device is". Symlinks are different because we have two possible representations and the user should choose.
On Thu, Dec 20, 2018 at 06:23:42PM +0100, Duy Nguyen wrote: > On Thu, Dec 20, 2018 at 6:18 PM Jeff King <peff@peff.net> wrote: > > > I wonder if --follow-symlinks would be a good alternative for this > > > (then if the final destination is unmmapable then we just read the > > > file whole in memory without the user asking, so it will work with > > > pipes). --follow-symlinks then could be made work with non-"no-index" > > > case too. But --literally is also ok. > > > > It's more than symlinks, though. Reading from a named pipe, we'd want to > > see the actual contents with --literally (and not "oops, I don't know > > how to represent a named pipe"). > > Yes, but I think at least --no-index it makes sense to just fall back > to read() if we can't mmap(). mmap is more of an optimization than a > requirement. There's no loss going from "oops I don't know how to > represent it" to "here's the content from whatever what that device > is". Symlinks are different because we have two possible > representations and the user should choose. Oh, I see. I misread your paragraph. Yeah, I think that is the behavior I would want by default, too, though the previous thread makes me thing some people would literally rather have the "I can't represent this" behavior (because they really do want a diff that can reconstruct the filesystem state, and an error if not). -Peff
On Thu, Dec 20, 2018 at 6:32 PM Jeff King <peff@peff.net> wrote: > > On Thu, Dec 20, 2018 at 06:23:42PM +0100, Duy Nguyen wrote: > > > On Thu, Dec 20, 2018 at 6:18 PM Jeff King <peff@peff.net> wrote: > > > > I wonder if --follow-symlinks would be a good alternative for this > > > > (then if the final destination is unmmapable then we just read the > > > > file whole in memory without the user asking, so it will work with > > > > pipes). --follow-symlinks then could be made work with non-"no-index" > > > > case too. But --literally is also ok. > > > > > > It's more than symlinks, though. Reading from a named pipe, we'd want to > > > see the actual contents with --literally (and not "oops, I don't know > > > how to represent a named pipe"). > > > > Yes, but I think at least --no-index it makes sense to just fall back > > to read() if we can't mmap(). mmap is more of an optimization than a > > requirement. There's no loss going from "oops I don't know how to > > represent it" to "here's the content from whatever what that device > > is". Symlinks are different because we have two possible > > representations and the user should choose. > > Oh, I see. I misread your paragraph. Yeah, I think that is the behavior > I would want by default, too, though the previous thread makes me thing > some people would literally rather have the "I can't represent this" > behavior (because they really do want a diff that can reconstruct the > filesystem state, and an error if not). I can't see a good use case for that. But yeah it would not harm to be a bit more conservative, then make --literally default later and leave --no-literally as an escape hatch.
On Thu, Dec 20 2018, brian m. carlson wrote: > We don't include a test for the pipe scenario because I couldn't get > that case to work in portable shell (although of course it works in > bash). I have, however, tested it on both macOS and Linux. No clue how > this works on Windows. You can (and should) add tests using the PROCESS_SUBSTITUTION prereq Dennis used in an earlier version of this: https://public-inbox.org/git/20170113102021.6054-3-dennis@kaarsemaker.net/ Even if that wasn't possible a bash-specific test is better than no test, and can be had by including t/lib-bash.sh in your test, see e.g. the completion tests.
On Thu, Dec 20, 2018 at 10:43:06PM +0100, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Dec 20 2018, brian m. carlson wrote: > > > We don't include a test for the pipe scenario because I couldn't get > > that case to work in portable shell (although of course it works in > > bash). I have, however, tested it on both macOS and Linux. No clue how > > this works on Windows. > > You can (and should) add tests using the PROCESS_SUBSTITUTION prereq > Dennis used in an earlier version of this: > https://public-inbox.org/git/20170113102021.6054-3-dennis@kaarsemaker.net/ I'm happy to steal that code. I didn't know this had come up before, so I didn't think of it.
On Thu, Dec 20, 2018 at 10:48:41AM -0500, Jeff King wrote: > The distinction is a bit subtle, but I think treating only the actual > top-level arguments as symlinks would solve your problem, but still > allow a more detailed diff for the recursive cases. Yeah, I think that would be better. I'll add a test. > Looks like spaces for indent, whereas the context uses tabs. Will fix. > I think "literal" is a good way to describe this concept. If we do grow > a config option to make this the default, then countermanding it would > be "--no-literally", which parses a bit funny as English. > > If you agree with my "only the top-level" line of reasoning above, maybe > "--literal-arguments" and "--no-literal-arguments" might make sense. > > We could also in theory offer several levels: no literals, top-level > literals, everything literal, at which point it becomes a tri-state. Yeah, this is what the POSIX symlink options (-H, -L, -P) do. I originally came up with "--dereference" as the name, but I decided to change it, because it doesn't affect just symlinks. I think --literal-arguments is better. Theoretically, we could adopt the POSIX options for symlink/pipe handling if we want to in the future, but I think that's a decision we should make later. > > diff --git a/diff.c b/diff.c > > index dc9965e836..740d0087b9 100644 > > --- a/diff.c > > +++ b/diff.c > > @@ -4282,18 +4282,18 @@ static void run_diff_cmd(const char *pgm, > > fprintf(o->file, "* Unmerged path %s\n", name); > > } > > > > -static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *istate) > > +static void diff_fill_oid_info(struct diff_filespec *one, struct diff_options *o) > > It might be worth breaking these "pass the options around" hunks into a > separate preparatory patch. I can do that.
Hi Brian, On Thu, 20 Dec 2018, brian m. carlson wrote: > In some shells, such as bash and zsh, it's possible to use a command > substitution to provide the output of a command as a file argument to > another process, like so: > > diff -u <(printf "a\nb\n") <(printf "a\nc\n") > > However, this syntax does not produce useful results with git diff > --no-index. > > On macOS, the arguments to the command are named pipes under /dev/fd, > and git diff doesn't know how to handle a named pipe. On Linux, the > arguments are symlinks to pipes, so git diff "helpfully" diffs these > symlinks, comparing their targets like "pipe:[1234]" and "pipe:[5678]". ... and in Git for Windows' Bash, it would result in something like: $ git -P diff --no-index <(printf "a\nb\n") <(printf "a\nc\n") error: Could not access '/proc/24012/fd/63' ... because the Bash is "MSYS2-aware" and knows about `/proc/` while `git.exe` is a pure Win32 executable that has no idea what Bash is talking about. Sadly, your patch does not change the situation one bit (because it does not change the fact that the MSYS2 Bash passes a path to `git.exe` that is not a valid Windows path, and neither could it, but that's not the problem of your patch). I reviewed your patch and it looks good to me! Thanks, Dscho > Because this behavior is not very helpful, and because git diff has many > features that people would like to use even on non-Git files, add an > option to git diff --no-index to read files literally, dereferencing > symlinks and reading them as a normal file. > > Note that this behavior requires that the files be read entirely into > memory, just as we do when reading from standard input. > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > This is a long-standing annoyance of mine, and it also makes some use > cases possible (for example, diffing filtered and non-filtered objects). > > We don't include a test for the pipe scenario because I couldn't get > that case to work in portable shell (although of course it works in > bash). I have, however, tested it on both macOS and Linux. No clue how > this works on Windows. > > Documentation/git-diff.txt | 5 +++++ > diff-no-index.c | 34 +++++++++++++++++++++++++++------- > diff.c | 24 +++++++++++++----------- > diff.h | 1 + > diffcore.h | 1 + > t/t4053-diff-no-index.sh | 28 ++++++++++++++++++++++++++++ > 6 files changed, 75 insertions(+), 18 deletions(-) > > diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt > index 030f162f30..4c4695c88d 100644 > --- a/Documentation/git-diff.txt > +++ b/Documentation/git-diff.txt > @@ -111,6 +111,11 @@ include::diff-options.txt[] > "Unmerged". Can be used only when comparing the working tree > with the index. > > +--literally:: > + Read the specified files literally, as `diff` would, > + dereferencing any symlinks and reading data from pipes. > + This option only works with `--no-index`. > + > <path>...:: > The <paths> parameters, when given, are used to limit > the diff to the named paths (you can give directory > diff --git a/diff-no-index.c b/diff-no-index.c > index 9414e922d1..2707206aee 100644 > --- a/diff-no-index.c > +++ b/diff-no-index.c > @@ -75,7 +75,25 @@ static int populate_from_stdin(struct diff_filespec *s) > return 0; > } > > -static struct diff_filespec *noindex_filespec(const char *name, int mode) > +static int populate_literally(struct diff_filespec *s) > +{ > + struct strbuf buf = STRBUF_INIT; > + size_t size = 0; > + int fd = xopen(s->path, O_RDONLY); > + > + if (strbuf_read(&buf, fd, 0) < 0) > + return error_errno("error while reading from '%s'", s->path); > + > + s->should_munmap = 0; > + s->data = strbuf_detach(&buf, &size); > + s->size = size; > + s->should_free = 1; > + s->read_literally = 1; > + return 0; > +} > + > +static struct diff_filespec *noindex_filespec(const char *name, int mode, > + struct diff_options *o) > { > struct diff_filespec *s; > > @@ -85,6 +103,8 @@ static struct diff_filespec *noindex_filespec(const char *name, int mode) > fill_filespec(s, &null_oid, 0, mode); > if (name == file_from_standard_input) > populate_from_stdin(s); > + else if (o->flags.read_literally) > + populate_literally(s); > return s; > } > > @@ -101,14 +121,14 @@ static int queue_diff(struct diff_options *o, > > if (S_ISDIR(mode1)) { > /* 2 is file that is created */ > - d1 = noindex_filespec(NULL, 0); > - d2 = noindex_filespec(name2, mode2); > + d1 = noindex_filespec(NULL, 0, o); > + d2 = noindex_filespec(name2, mode2, o); > name2 = NULL; > mode2 = 0; > } else { > /* 1 is file that is deleted */ > - d1 = noindex_filespec(name1, mode1); > - d2 = noindex_filespec(NULL, 0); > + d1 = noindex_filespec(name1, mode1, o); > + d2 = noindex_filespec(NULL, 0, o); > name1 = NULL; > mode1 = 0; > } > @@ -189,8 +209,8 @@ static int queue_diff(struct diff_options *o, > SWAP(name1, name2); > } > > - d1 = noindex_filespec(name1, mode1); > - d2 = noindex_filespec(name2, mode2); > + d1 = noindex_filespec(name1, mode1, o); > + d2 = noindex_filespec(name2, mode2, o); > diff_queue(&diff_queued_diff, d1, d2); > return 0; > } > diff --git a/diff.c b/diff.c > index dc9965e836..740d0087b9 100644 > --- a/diff.c > +++ b/diff.c > @@ -4282,18 +4282,18 @@ static void run_diff_cmd(const char *pgm, > fprintf(o->file, "* Unmerged path %s\n", name); > } > > -static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *istate) > +static void diff_fill_oid_info(struct diff_filespec *one, struct diff_options *o) > { > if (DIFF_FILE_VALID(one)) { > if (!one->oid_valid) { > struct stat st; > - if (one->is_stdin) { > + if (one->is_stdin || one->read_literally) { > oidclr(&one->oid); > return; > } > if (lstat(one->path, &st) < 0) > die_errno("stat '%s'", one->path); > - if (index_path(istate, &one->oid, one->path, &st, 0)) > + if (index_path(o->repo->index, &one->oid, one->path, &st, 0)) > die("cannot hash %s", one->path); > } > } > @@ -4341,8 +4341,8 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o) > return; > } > > - diff_fill_oid_info(one, o->repo->index); > - diff_fill_oid_info(two, o->repo->index); > + diff_fill_oid_info(one, o); > + diff_fill_oid_info(two, o); > > if (!pgm && > DIFF_FILE_VALID(one) && DIFF_FILE_VALID(two) && > @@ -4389,8 +4389,8 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o, > if (o->prefix_length) > strip_prefix(o->prefix_length, &name, &other); > > - diff_fill_oid_info(p->one, o->repo->index); > - diff_fill_oid_info(p->two, o->repo->index); > + diff_fill_oid_info(p->one, o); > + diff_fill_oid_info(p->two, o); > > builtin_diffstat(name, other, p->one, p->two, > diffstat, o, p); > @@ -4414,8 +4414,8 @@ static void run_checkdiff(struct diff_filepair *p, struct diff_options *o) > if (o->prefix_length) > strip_prefix(o->prefix_length, &name, &other); > > - diff_fill_oid_info(p->one, o->repo->index); > - diff_fill_oid_info(p->two, o->repo->index); > + diff_fill_oid_info(p->one, o); > + diff_fill_oid_info(p->two, o); > > builtin_checkdiff(name, other, attr_path, p->one, p->two, o); > } > @@ -5159,6 +5159,8 @@ int diff_opt_parse(struct diff_options *options, > options->flags.funccontext = 1; > else if (!strcmp(arg, "--no-function-context")) > options->flags.funccontext = 0; > + else if (!strcmp(arg, "--literally")) > + options->flags.read_literally = 1; > else if ((argcount = parse_long_opt("output", av, &optarg))) { > char *path = prefix_filename(prefix, optarg); > options->file = xfopen(path, "w"); > @@ -5720,8 +5722,8 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid > if (DIFF_PAIR_UNMERGED(p)) > continue; > > - diff_fill_oid_info(p->one, options->repo->index); > - diff_fill_oid_info(p->two, options->repo->index); > + diff_fill_oid_info(p->one, options); > + diff_fill_oid_info(p->two, options); > > len1 = remove_space(p->one->path, strlen(p->one->path)); > len2 = remove_space(p->two->path, strlen(p->two->path)); > diff --git a/diff.h b/diff.h > index ce5e8a8183..7dedd3bcd1 100644 > --- a/diff.h > +++ b/diff.h > @@ -97,6 +97,7 @@ struct diff_flags { > unsigned stat_with_summary:1; > unsigned suppress_diff_headers:1; > unsigned dual_color_diffed_diffs:1; > + unsigned read_literally:1; > }; > > static inline void diff_flags_or(struct diff_flags *a, > diff --git a/diffcore.h b/diffcore.h > index b651061c0e..363869447a 100644 > --- a/diffcore.h > +++ b/diffcore.h > @@ -48,6 +48,7 @@ struct diff_filespec { > #define DIRTY_SUBMODULE_UNTRACKED 1 > #define DIRTY_SUBMODULE_MODIFIED 2 > unsigned is_stdin : 1; > + unsigned read_literally : 1; > unsigned has_more_entries : 1; /* only appear in combined diff */ > /* data should be considered "binary"; -1 means "don't know yet" */ > signed int is_binary : 2; > diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh > index 6e0dd6f9e5..53e6bcdc19 100755 > --- a/t/t4053-diff-no-index.sh > +++ b/t/t4053-diff-no-index.sh > @@ -137,4 +137,32 @@ test_expect_success 'diff --no-index from repo subdir with absolute paths' ' > test_cmp expect actual > ' > > +test_expect_success 'diff --no-index --literally' ' > + echo "diff --git a/../../non/git/a b/../../non/git/b" >expect && > + test_expect_code 1 \ > + git -C repo/sub \ > + diff --literally ../../non/git/a ../../non/git/b >actual && > + head -n 1 <actual >actual.head && > + test_cmp expect actual.head > +' > + > +test_expect_success SYMLINKS 'diff --no-index --literally with symlinks' ' > + test_write_lines a b c >f1 && > + test_write_lines a d c >f2 && > + ln -s f1 s1 && > + ln -s f2 s2 && > + cat >expect <<-\EOF && > + diff --git a/s1 b/s2 > + --- a/s1 > + +++ b/s2 > + @@ -1,3 +1,3 @@ > + a > + -b > + +d > + c > + EOF > + test_expect_code 1 git diff --no-index --literally s1 s2 >actual && > + test_cmp expect actual > +' > + > test_done >
On Fri, Dec 21, 2018 at 12:52:04PM +0100, Johannes Schindelin wrote: > ... and in Git for Windows' Bash, it would result in something like: > > $ git -P diff --no-index <(printf "a\nb\n") <(printf "a\nc\n") > error: Could not access '/proc/24012/fd/63' > > ... because the Bash is "MSYS2-aware" and knows about `/proc/` while > `git.exe` is a pure Win32 executable that has no idea what Bash is talking > about. > > Sadly, your patch does not change the situation one bit (because it does > not change the fact that the MSYS2 Bash passes a path to `git.exe` that is > not a valid Windows path, and neither could it, but that's not the problem > of your patch). That tells me that I need to exclude Windows (excepting Cygwin) if I add a PROCESS_SUBSTITUTION test like Ævar suggested, and that indeed is helpful information. I'm sorry my patch won't be useful on Windows, though.
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > +test_expect_success SYMLINKS 'diff --no-index --literally with symlinks' ' > + test_write_lines a b c >f1 && > + test_write_lines a d c >f2 && > + ln -s f1 s1 && > + ln -s f2 s2 && > + cat >expect <<-\EOF && > + diff --git a/s1 b/s2 > + --- a/s1 > + +++ b/s2 > + @@ -1,3 +1,3 @@ > + a > + -b > + +d > + c > + EOF > + test_expect_code 1 git diff --no-index --literally s1 s2 >actual && > + test_cmp expect actual > +' This is good as a goal, but the implementation seems to be overly eager to dereference any symlink or non-regular file found in any level of recursion. The use case presented as the justification in the proposed log message, and the explanation in the documentation, suggests that only the paths given from the command line are treated this way. It may make sense to do this as two orthgonal flags. Dereference symlinks and named pipes given from the command line is one use case. Dereference any symlinks encountered during recursion is another. And the latter might be useful even inside a repository as an option, even though the former would never make sense unless running in --no-index mode. Thanks.
On Wed, Jan 02, 2019 at 10:56:45AM -0800, Junio C Hamano wrote: > This is good as a goal, but the implementation seems to be overly > eager to dereference any symlink or non-regular file found in any > level of recursion. The use case presented as the justification in > the proposed log message, and the explanation in the documentation, > suggests that only the paths given from the command line are treated > this way. > > It may make sense to do this as two orthgonal flags. Dereference > symlinks and named pipes given from the command line is one use > case. Dereference any symlinks encountered during recursion is > another. And the latter might be useful even inside a repository > as an option, even though the former would never make sense unless > running in --no-index mode. Yeah, I definitely think I'll be doing a reroll to address the recursion issue as Peff suggested. I've just been on vacation and will be traveling again soon, so I haven't gotten to it yet. Feel free to wait for the reroll before picking it up.
Hi, brian m. carlson wrote: > In some shells, such as bash and zsh, it's possible to use a command > substitution to provide the output of a command as a file argument to > another process, like so: > > diff -u <(printf "a\nb\n") <(printf "a\nc\n") > > However, this syntax does not produce useful results with git diff > --no-index. Thanks much for fixing this. It's something I've run into, too. [...] > --- a/Documentation/git-diff.txt > +++ b/Documentation/git-diff.txt > @@ -111,6 +111,11 @@ include::diff-options.txt[] > "Unmerged". Can be used only when comparing the working tree > with the index. > > +--literally:: > + Read the specified files literally, as `diff` would, > + dereferencing any symlinks and reading data from pipes. > + This option only works with `--no-index`. I may be a minority in this opinion, but I had trouble understanding what --literally would do from its name. I suspect we can come up with a better name. Unfortunately, I'm terrible at coming up with names. :-P --dereference would be a good name when it comes to symlinks, but it's not a good name for reading what is on the other side of a pipe. On the plus side, it matches "diff" and "cp"'s name for the "follow symbolic links" option. --plain captures the desire a little better --- we want a plain read(2) from the file instead of trying to be smart and look at whether it is e.g. a block device. But in the context of "diff", that would seem more like an option that affects the output. What would you think of - always reading from fifos instead of describing them, since I've never encountered a use case where people want the latter - --dereference to control whether to follow symlinks ? [...] > --- a/diff-no-index.c > +++ b/diff-no-index.c > @@ -75,7 +75,25 @@ static int populate_from_stdin(struct diff_filespec *s) > return 0; > } > > -static struct diff_filespec *noindex_filespec(const char *name, int mode) > +static int populate_literally(struct diff_filespec *s) > +{ > + struct strbuf buf = STRBUF_INIT; > + size_t size = 0; > + int fd = xopen(s->path, O_RDONLY); > + > + if (strbuf_read(&buf, fd, 0) < 0) > + return error_errno("error while reading from '%s'", s->path); > + > + s->should_munmap = 0; > + s->data = strbuf_detach(&buf, &size); > + s->size = size; > + s->should_free = 1; > + s->read_literally = 1; Oh! --read-literally works perfectly for me as a name. :) Jonathan
On Thu, Jan 03, 2019 at 06:18:55PM -0800, Jonathan Nieder wrote: > I may be a minority in this opinion, but I had trouble understanding > what --literally would do from its name. I suspect we can come up > with a better name. > > Unfortunately, I'm terrible at coming up with names. :-P > > --dereference would be a good name when it comes to symlinks, but > it's not a good name for reading what is on the other side of a pipe. > On the plus side, it matches "diff" and "cp"'s name for the "follow > symbolic links" option. > > --plain captures the desire a little better --- we want a plain > read(2) from the file instead of trying to be smart and look at > whether it is e.g. a block device. But in the context of "diff", that > would seem more like an option that affects the output. > > What would you think of > > - always reading from fifos instead of describing them, since I've > never encountered a use case where people want the latter > > - --dereference to control whether to follow symlinks This is actually surprisingly difficult. The reason I implemented this only for no-index mode is because there are actually several places we can stat a file in the diff code, and implementing a --dereference option that catches all of those cases and getting the option passed down to them is non-trivial. Since my primary environment is on Linux, I need to implement functional --dereference semantics before implementing the FIFO functionality to know if it works in my use case. I agree that would be a nicer approach overall, so let me see if I can take another stab at it in a way that isn't excessively complex. It would also make the name less controversial, since "--dereference" is the obvious choice for dereferencing symlinks. > [...] > > --- a/diff-no-index.c > > +++ b/diff-no-index.c > > @@ -75,7 +75,25 @@ static int populate_from_stdin(struct diff_filespec *s) > > return 0; > > } > > > > -static struct diff_filespec *noindex_filespec(const char *name, int mode) > > +static int populate_literally(struct diff_filespec *s) > > +{ > > + struct strbuf buf = STRBUF_INIT; > > + size_t size = 0; > > + int fd = xopen(s->path, O_RDONLY); > > + > > + if (strbuf_read(&buf, fd, 0) < 0) > > + return error_errno("error while reading from '%s'", s->path); > > + > > + s->should_munmap = 0; > > + s->data = strbuf_detach(&buf, &size); > > + s->size = size; > > + s->should_free = 1; > > + s->read_literally = 1; > > Oh! --read-literally works perfectly for me as a name. :) If I can't manage to split this out into two pieces, I'll pick that as the name for the reroll.
"brian m. carlson" <sandals@crustytoothpaste.net> writes: >> - --dereference to control whether to follow symlinks > > This is actually surprisingly difficult. The reason I implemented this > only for no-index mode is because there are actually several places we > can stat a file in the diff code, and implementing a --dereference > option that catches all of those cases and getting the option passed > down to them is non-trivial. Another thing to worry about is symlinks that point outside the working tree. When a tracked content "dir/link" is a symlink to "/etc/motd", it probably makes sense to open("/etc/motd") and read() it on the working tree side of the diff, and probably even on the index side of the diff, but what about obtaining contents for "dir/link" in a year-old commit under --deference mode? I am not sure if it makes sense to read from the filesystem in such a case. I personally am perfectly fine if this "do not compare readlink(2), but read contents literally" is limited to the --no-index mode. Thanks.
On Fri, Jan 04, 2019 at 11:26:56AM -0800, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > >> - --dereference to control whether to follow symlinks > > > > This is actually surprisingly difficult. The reason I implemented this > > only for no-index mode is because there are actually several places we > > can stat a file in the diff code, and implementing a --dereference > > option that catches all of those cases and getting the option passed > > down to them is non-trivial. > > Another thing to worry about is symlinks that point outside the > working tree. When a tracked content "dir/link" is a symlink to > "/etc/motd", it probably makes sense to open("/etc/motd") and read() > it on the working tree side of the diff, and probably even on the > index side of the diff, but what about obtaining contents for > "dir/link" in a year-old commit under --deference mode? I am not > sure if it makes sense to read from the filesystem in such a case. > > I personally am perfectly fine if this "do not compare readlink(2), > but read contents literally" is limited to the --no-index mode. That's a good point. I think I'll stick with the current design, then, since that seems like the least surprising way forward. It also means that we don't read outside of the working tree unless --no-index is in use, which may be beneficial for security purposes. Thanks for a helpful perspective.
diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt index 030f162f30..4c4695c88d 100644 --- a/Documentation/git-diff.txt +++ b/Documentation/git-diff.txt @@ -111,6 +111,11 @@ include::diff-options.txt[] "Unmerged". Can be used only when comparing the working tree with the index. +--literally:: + Read the specified files literally, as `diff` would, + dereferencing any symlinks and reading data from pipes. + This option only works with `--no-index`. + <path>...:: The <paths> parameters, when given, are used to limit the diff to the named paths (you can give directory diff --git a/diff-no-index.c b/diff-no-index.c index 9414e922d1..2707206aee 100644 --- a/diff-no-index.c +++ b/diff-no-index.c @@ -75,7 +75,25 @@ static int populate_from_stdin(struct diff_filespec *s) return 0; } -static struct diff_filespec *noindex_filespec(const char *name, int mode) +static int populate_literally(struct diff_filespec *s) +{ + struct strbuf buf = STRBUF_INIT; + size_t size = 0; + int fd = xopen(s->path, O_RDONLY); + + if (strbuf_read(&buf, fd, 0) < 0) + return error_errno("error while reading from '%s'", s->path); + + s->should_munmap = 0; + s->data = strbuf_detach(&buf, &size); + s->size = size; + s->should_free = 1; + s->read_literally = 1; + return 0; +} + +static struct diff_filespec *noindex_filespec(const char *name, int mode, + struct diff_options *o) { struct diff_filespec *s; @@ -85,6 +103,8 @@ static struct diff_filespec *noindex_filespec(const char *name, int mode) fill_filespec(s, &null_oid, 0, mode); if (name == file_from_standard_input) populate_from_stdin(s); + else if (o->flags.read_literally) + populate_literally(s); return s; } @@ -101,14 +121,14 @@ static int queue_diff(struct diff_options *o, if (S_ISDIR(mode1)) { /* 2 is file that is created */ - d1 = noindex_filespec(NULL, 0); - d2 = noindex_filespec(name2, mode2); + d1 = noindex_filespec(NULL, 0, o); + d2 = noindex_filespec(name2, mode2, o); name2 = NULL; mode2 = 0; } else { /* 1 is file that is deleted */ - d1 = noindex_filespec(name1, mode1); - d2 = noindex_filespec(NULL, 0); + d1 = noindex_filespec(name1, mode1, o); + d2 = noindex_filespec(NULL, 0, o); name1 = NULL; mode1 = 0; } @@ -189,8 +209,8 @@ static int queue_diff(struct diff_options *o, SWAP(name1, name2); } - d1 = noindex_filespec(name1, mode1); - d2 = noindex_filespec(name2, mode2); + d1 = noindex_filespec(name1, mode1, o); + d2 = noindex_filespec(name2, mode2, o); diff_queue(&diff_queued_diff, d1, d2); return 0; } diff --git a/diff.c b/diff.c index dc9965e836..740d0087b9 100644 --- a/diff.c +++ b/diff.c @@ -4282,18 +4282,18 @@ static void run_diff_cmd(const char *pgm, fprintf(o->file, "* Unmerged path %s\n", name); } -static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *istate) +static void diff_fill_oid_info(struct diff_filespec *one, struct diff_options *o) { if (DIFF_FILE_VALID(one)) { if (!one->oid_valid) { struct stat st; - if (one->is_stdin) { + if (one->is_stdin || one->read_literally) { oidclr(&one->oid); return; } if (lstat(one->path, &st) < 0) die_errno("stat '%s'", one->path); - if (index_path(istate, &one->oid, one->path, &st, 0)) + if (index_path(o->repo->index, &one->oid, one->path, &st, 0)) die("cannot hash %s", one->path); } } @@ -4341,8 +4341,8 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o) return; } - diff_fill_oid_info(one, o->repo->index); - diff_fill_oid_info(two, o->repo->index); + diff_fill_oid_info(one, o); + diff_fill_oid_info(two, o); if (!pgm && DIFF_FILE_VALID(one) && DIFF_FILE_VALID(two) && @@ -4389,8 +4389,8 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o, if (o->prefix_length) strip_prefix(o->prefix_length, &name, &other); - diff_fill_oid_info(p->one, o->repo->index); - diff_fill_oid_info(p->two, o->repo->index); + diff_fill_oid_info(p->one, o); + diff_fill_oid_info(p->two, o); builtin_diffstat(name, other, p->one, p->two, diffstat, o, p); @@ -4414,8 +4414,8 @@ static void run_checkdiff(struct diff_filepair *p, struct diff_options *o) if (o->prefix_length) strip_prefix(o->prefix_length, &name, &other); - diff_fill_oid_info(p->one, o->repo->index); - diff_fill_oid_info(p->two, o->repo->index); + diff_fill_oid_info(p->one, o); + diff_fill_oid_info(p->two, o); builtin_checkdiff(name, other, attr_path, p->one, p->two, o); } @@ -5159,6 +5159,8 @@ int diff_opt_parse(struct diff_options *options, options->flags.funccontext = 1; else if (!strcmp(arg, "--no-function-context")) options->flags.funccontext = 0; + else if (!strcmp(arg, "--literally")) + options->flags.read_literally = 1; else if ((argcount = parse_long_opt("output", av, &optarg))) { char *path = prefix_filename(prefix, optarg); options->file = xfopen(path, "w"); @@ -5720,8 +5722,8 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid if (DIFF_PAIR_UNMERGED(p)) continue; - diff_fill_oid_info(p->one, options->repo->index); - diff_fill_oid_info(p->two, options->repo->index); + diff_fill_oid_info(p->one, options); + diff_fill_oid_info(p->two, options); len1 = remove_space(p->one->path, strlen(p->one->path)); len2 = remove_space(p->two->path, strlen(p->two->path)); diff --git a/diff.h b/diff.h index ce5e8a8183..7dedd3bcd1 100644 --- a/diff.h +++ b/diff.h @@ -97,6 +97,7 @@ struct diff_flags { unsigned stat_with_summary:1; unsigned suppress_diff_headers:1; unsigned dual_color_diffed_diffs:1; + unsigned read_literally:1; }; static inline void diff_flags_or(struct diff_flags *a, diff --git a/diffcore.h b/diffcore.h index b651061c0e..363869447a 100644 --- a/diffcore.h +++ b/diffcore.h @@ -48,6 +48,7 @@ struct diff_filespec { #define DIRTY_SUBMODULE_UNTRACKED 1 #define DIRTY_SUBMODULE_MODIFIED 2 unsigned is_stdin : 1; + unsigned read_literally : 1; unsigned has_more_entries : 1; /* only appear in combined diff */ /* data should be considered "binary"; -1 means "don't know yet" */ signed int is_binary : 2; diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh index 6e0dd6f9e5..53e6bcdc19 100755 --- a/t/t4053-diff-no-index.sh +++ b/t/t4053-diff-no-index.sh @@ -137,4 +137,32 @@ test_expect_success 'diff --no-index from repo subdir with absolute paths' ' test_cmp expect actual ' +test_expect_success 'diff --no-index --literally' ' + echo "diff --git a/../../non/git/a b/../../non/git/b" >expect && + test_expect_code 1 \ + git -C repo/sub \ + diff --literally ../../non/git/a ../../non/git/b >actual && + head -n 1 <actual >actual.head && + test_cmp expect actual.head +' + +test_expect_success SYMLINKS 'diff --no-index --literally with symlinks' ' + test_write_lines a b c >f1 && + test_write_lines a d c >f2 && + ln -s f1 s1 && + ln -s f2 s2 && + cat >expect <<-\EOF && + diff --git a/s1 b/s2 + --- a/s1 + +++ b/s2 + @@ -1,3 +1,3 @@ + a + -b + +d + c + EOF + test_expect_code 1 git diff --no-index --literally s1 s2 >actual && + test_cmp expect actual +' + test_done
In some shells, such as bash and zsh, it's possible to use a command substitution to provide the output of a command as a file argument to another process, like so: diff -u <(printf "a\nb\n") <(printf "a\nc\n") However, this syntax does not produce useful results with git diff --no-index. On macOS, the arguments to the command are named pipes under /dev/fd, and git diff doesn't know how to handle a named pipe. On Linux, the arguments are symlinks to pipes, so git diff "helpfully" diffs these symlinks, comparing their targets like "pipe:[1234]" and "pipe:[5678]". Because this behavior is not very helpful, and because git diff has many features that people would like to use even on non-Git files, add an option to git diff --no-index to read files literally, dereferencing symlinks and reading them as a normal file. Note that this behavior requires that the files be read entirely into memory, just as we do when reading from standard input. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- This is a long-standing annoyance of mine, and it also makes some use cases possible (for example, diffing filtered and non-filtered objects). We don't include a test for the pipe scenario because I couldn't get that case to work in portable shell (although of course it works in bash). I have, however, tested it on both macOS and Linux. No clue how this works on Windows. Documentation/git-diff.txt | 5 +++++ diff-no-index.c | 34 +++++++++++++++++++++++++++------- diff.c | 24 +++++++++++++----------- diff.h | 1 + diffcore.h | 1 + t/t4053-diff-no-index.sh | 28 ++++++++++++++++++++++++++++ 6 files changed, 75 insertions(+), 18 deletions(-)