diff mbox series

[v3] grep: provide sane default to grep_source struct

Message ID 20190522003402.77767-1-emilyshaffer@google.com (mailing list archive)
State New, archived
Headers show
Series [v3] grep: provide sane default to grep_source struct | expand

Commit Message

Emily Shaffer May 22, 2019, 12:34 a.m. UTC
grep_buffer creates a struct grep_source gs and calls grep_source()
with it. However, gs.name is null, which eventually produces a
segmentation fault in grep_source()->grep_source_1()->show_line()
when grep_opt.status_only is not set.

This seems to be unreachable from existing commands but is reachable in
the event that someone rolls their own revision walk and neglects to set
rev_info->grep_filter->status_only. Conceivably, someone might want to
print all changed lines of all commits which match some regex.

To futureproof, give developers a heads-up if grep_source() is called
and a valid name field is expected but not given. This path should be
unreachable now, but in the future may become reachable if developers
want to add the ability to use grep_buffer() in prepared in-core data
and provide hints to the user about where the data came from.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---

I think Peff and Jonathan are right. If someone does want to hack away
on something in the very early stages, the BUG() line gives a pretty
clear indicator of what to modify to make it work.

I also moved the BUG() to grep_source_1() to keep it close to the error
itself and reflowed the commit message.

 - Emily

 grep.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jonathan Nieder May 22, 2019, 12:58 a.m. UTC | #1
Emily Shaffer wrote:

> Subject: grep: provide sane default to grep_source struct

This subject line doesn't describe what the patch does any more.  How
about something like

	grep: fail early if call could print output and name is not set

?

> grep_buffer creates a struct grep_source gs and calls grep_source()
> with it. However, gs.name is null, which eventually produces a
> segmentation fault in grep_source()->grep_source_1()->show_line()
> when grep_opt.status_only is not set.

This is the place to describe the motivation behind the patch:

- what is the current behavior?
- what is the motivation behind the current behavior?
- in what scenario does it fail?
- how does it fail, and what is undesirable about that failure?

Perhaps something like

	grep_source, the workhorse of Git's grep library, allows passing in an
	arbitrary grep_source representing the haystack in which to search for
	a needle matching a pattern.  In most callers, the grep_source::name
	field is set to an appropriate prefix to print before a colon when a
	result matches:

		README:Git is an Open Source project covered by the GNU General Public

	One caller, grep_buffer, leaves the 'name' field set to NULL because
	there is not enough context to determine an appropriate name to put in
	such output lines.  In practice, this has been fine because the only
	caller of grep_buffer is "git log --grep" and that caller sets
	status_only to disable output (and only check whether a match exists).
	But this is brittle: a future caller can call grep_buffer without
	status_only set, and as soon as it hits a match, grep_source will
	segfault in the process of trying to print

		(null):Git is an Open Source project covered by the GNU General Public

> This seems to be unreachable from existing commands but is reachable in
> the event that someone rolls their own revision walk and neglects to set
> rev_info->grep_filter->status_only. Conceivably, someone might want to
> print all changed lines of all commits which match some regex.

Focusing on the use case instead of the details of how it is implemented,
we can simplify this to

	For example, such a future caller might want to print all matching
	lines from commits which match a regex.

> To futureproof, give developers a heads-up if grep_source() is called
> and a valid name field is expected but not given. This path should be
> unreachable now, but in the future may become reachable if developers
> want to add the ability to use grep_buffer() in prepared in-core data
> and provide hints to the user about where the data came from.

Here we can concisely describe the improved behavior with the fix:

	Futureproof by diagnosing a use of the API that could trigger that
	condition early, before we know whether the pattern matches:

		BUG: git.c:832: grep call which could print a name requires grep_source.name be non-NULL
		Aborted

And then what it makes possible:

	This way, the caller's author gets a quick indication of how to fix
	it, by ensuring that either grep_source.name is set or that
	status_only is set to prevent printing output.

And how it came up:

	Noticed while adding such a call in a tutorial on revision walks.

> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
[...]
> I also moved the BUG() to grep_source_1() to keep it close to the error
> itself and reflowed the commit message.

Makes sense.  Thanks for these summaries of what changed between each
revision of the patch --- they're very helpful.

With whatever subset of the above suggested commit message changes
make sense,

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.

[...]
> --- a/grep.c
> +++ b/grep.c
> @@ -1780,6 +1780,10 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
>  	enum grep_context ctx = GREP_CONTEXT_HEAD;
>  	xdemitconf_t xecfg;
>  
> +	if (!opt->status_only && gs->name == NULL)

optional: can use !gs->name here.

> +		BUG("grep call which could print a name requires "
> +		    "grep_source.name be non-NULL");
Jeff King May 22, 2019, 4:01 a.m. UTC | #2
On Tue, May 21, 2019 at 05:34:02PM -0700, Emily Shaffer wrote:

> I think Peff and Jonathan are right. If someone does want to hack away
> on something in the very early stages, the BUG() line gives a pretty
> clear indicator of what to modify to make it work.
> 
> I also moved the BUG() to grep_source_1() to keep it close to the error
> itself and reflowed the commit message.

Thanks. This version looks good to me (though if you took any of
Jonathan's suggestions in the other part of the thread that would be
fine with me, too).

-Peff
diff mbox series

Patch

diff --git a/grep.c b/grep.c
index 0d50598acd..f7c3a5803e 100644
--- a/grep.c
+++ b/grep.c
@@ -1780,6 +1780,10 @@  static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 	enum grep_context ctx = GREP_CONTEXT_HEAD;
 	xdemitconf_t xecfg;
 
+	if (!opt->status_only && gs->name == NULL)
+		BUG("grep call which could print a name requires "
+		    "grep_source.name be non-NULL");
+
 	if (!opt->output)
 		opt->output = std_output;