diff mbox series

[1/2,GSOC] ref-filter: add obj-type check in grab contents

Message ID 97955705c22e00a718a8de7555ab7e0e401e792e.1622558243.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series ref-filter: add %(raw) atom | expand

Commit Message

ZheNing Hu June 1, 2021, 2:37 p.m. UTC
From: ZheNing Hu <adlternative@gmail.com>

Only tag and commit objects use `grab_sub_body_contents()` to grab
object contents in the current codebase.  We want to teach the
function to also handle blobs and trees to get their raw data,
without parsing a blob (whose contents looks like a commit or a tag)
incorrectly as a commit or a tag.

Skip the block of code that is specific to handling commits and tags
early when the given object is of a wrong type to help later
addition to handle other types of objects in this function.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

Comments

Junio C Hamano June 3, 2021, 2:10 a.m. UTC | #1
"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  /* See grab_values */
> -static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
> +static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
> +				   struct object *obj)

Neither this step or the next change needs anything but type member
of the 'obj' (and 'buf' is coming from oi.content of the result of
asking about that same 'obj').

I wonder if we should do one of the following:

 (1) stop passing "void *buf" and instead "struct expand_data
     *data", and use "data->content" to access "buf", which would
     allow you to access "data->type" to perform the added check.

 (2) instead of adding "struct obj *obj" to the parameters, just add
     "enum object_type type", as that is the only thing you need.

Obviously (2) is with lessor impact, but if it can be done safely
without breaking the code [*], (1) would probably be a much more
preferrable direction to go in the longer term.

    Side note [*].  A caller is allowed to choose to feed "buf" that
    is different from "oi.content" (perhaps buf may sometimes want
    to be a utf-8 recoded version of oi.content for certain types of
    objects) with the current system, but if we pass expand_data
    throughout the callchain, such a caller is broken, for example.
ZheNing Hu June 3, 2021, 4:52 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> 于2021年6月3日周四 上午10:10写道:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> >  /* See grab_values */
> > -static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
> > +static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
> > +                                struct object *obj)
>
> Neither this step or the next change needs anything but type member
> of the 'obj' (and 'buf' is coming from oi.content of the result of
> asking about that same 'obj').
>
> I wonder if we should do one of the following:
>
>  (1) stop passing "void *buf" and instead "struct expand_data
>      *data", and use "data->content" to access "buf", which would
>      allow you to access "data->type" to perform the added check.
>
>  (2) instead of adding "struct obj *obj" to the parameters, just add
>      "enum object_type type", as that is the only thing you need.
>
> Obviously (2) is with lessor impact, but if it can be done safely
> without breaking the code [*], (1) would probably be a much more
> preferrable direction to go in the longer term.
>

I agree with (1). In future versions of grab_sub_body_contents(), we will
use the content of "data" more frequently instead of using the
crude "obj". The type provided by "obj" can also be provided by
"data". So yes, I would be very willing to let grab_sub_body_contents()
only use "data". (delete "obj")

E.g.

static void grab_sub_body_contents(struct atom_value *val, int deref,
struct expand_data *data)

Using (2), we will need more parameters to pass other object info.

>     Side note [*].  A caller is allowed to choose to feed "buf" that
>     is different from "oi.content" (perhaps buf may sometimes want
>     to be a utf-8 recoded version of oi.content for certain types of
>     objects) with the current system, but if we pass expand_data
>     throughout the callchain, such a caller is broken, for example.
>

Just see the situation in front of us: grab_sub_body_contents()
have only one caller: grab_values(). If someone need a function like
grab_sub_body_contents() to grab another buf, they can use rewrite
a more universal function interface:

static void grab_sub_body_contents(struct atom_value *val, int deref,
struct expand_data *data)
{
   grab_sub_body_contents_internal(val, deref, data->content,
data->size, data->type);
}

static void grab_sub_body_contents_internal(struct atom_value *val,
int deref, void *buf,
                                           unsigned long buf_size,
enum object_type type)
{
...
}

But for the time being, the above one is sufficient.

Thanks.
--
ZheNing Hu
diff mbox series

Patch

diff --git a/ref-filter.c b/ref-filter.c
index 4db0e40ff4c6..c0334857653a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1356,7 +1356,8 @@  static void append_lines(struct strbuf *out, const char *buf, unsigned long size
 }
 
 /* See grab_values */
-static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
+static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
+				   struct object *obj)
 {
 	int i;
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
@@ -1371,10 +1372,13 @@  static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			continue;
 		if (deref)
 			name++;
-		if (strcmp(name, "body") &&
-		    !starts_with(name, "subject") &&
-		    !starts_with(name, "trailers") &&
-		    !starts_with(name, "contents"))
+
+		if ((obj->type != OBJ_TAG &&
+		     obj->type != OBJ_COMMIT) ||
+		    (strcmp(name, "body") &&
+		     !starts_with(name, "subject") &&
+		     !starts_with(name, "trailers") &&
+		     !starts_with(name, "contents")))
 			continue;
 		if (!subpos)
 			find_subpos(buf,
@@ -1443,12 +1447,12 @@  static void grab_values(struct atom_value *val, int deref, struct object *obj, v
 	switch (obj->type) {
 	case OBJ_TAG:
 		grab_tag_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf);
+		grab_sub_body_contents(val, deref, buf, obj);
 		grab_person("tagger", val, deref, buf);
 		break;
 	case OBJ_COMMIT:
 		grab_commit_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf);
+		grab_sub_body_contents(val, deref, buf, obj);
 		grab_person("author", val, deref, buf);
 		grab_person("committer", val, deref, buf);
 		break;