diff mbox series

[RFC,1/5] ref-filter: add objectsize:disk option

Message ID 01020166f76d845f-1a02a31e-5094-4b27-974d-a23811066c58-000000@eu-west-1.amazonses.com (mailing list archive)
State New, archived
Headers show
Series ref-filter: add new formatting options | expand

Commit Message

Olga Telezhnaya Nov. 9, 2018, 7:44 a.m. UTC
Add new formatting option objectsize:disk to know
exact size that object takes up on disk.

Signed-off-by: Olga Telezhnaia <olyatelezhnaya@gmail.com>
---
 ref-filter.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)


--
https://github.com/git/git/pull/552

Comments

Junio C Hamano Nov. 12, 2018, 5:03 a.m. UTC | #1
Olga Telezhnaya <olyatelezhnaya@gmail.com> writes:

> @@ -876,11 +882,13 @@ static void grab_common_values(struct atom_value *val, int deref, struct expand_
>  			name++;
>  		if (!strcmp(name, "objecttype"))
>  			v->s = xstrdup(type_name(oi->type));
> -		else if (!strcmp(name, "objectsize")) {
> +		else if (!strcmp(name, "objectsize:disk")) {
> +			v->value = oi->disk_size;
> +			v->s = xstrfmt("%lld", (long long)oi->disk_size);

oi.disk_size is off_t; do we know "long long" 

   (1) is available widely enough (I think it is from c99)?

   (2) is sufficiently wide so that we can safely cast off_t to?

   (3) will stay to be sufficiently wide as machines get larger
       together with standard types like off_t in the future?

I'd rather use intmax_t (as off_t is a signed integral type) so that
we do not have to worry about these questions in the first place.

> +		} else if (!strcmp(name, "objectsize")) {
>  			v->value = oi->size;
>  			v->s = xstrfmt("%lu", oi->size);

This is not a suggestion but is a genuine question, but I wonder if
two years down the road somebody who meets this API for the first
time find the asymmetry between "objectsize" and "objectsize:disk" a
tad strange and suggest the former to have "objectsize:real" or some
synonym.  Or we can consider "objectsize" the primary thing (hence
needing no colon-plus-modifier to clarify what kind of size we are
asking) and leave these two deliberatly asymmetric.  I am leaning
towards the latter myself.

> -		}
> -		else if (deref)
> +		} else if (deref)
>  			grab_objectname(name, &oi->oid, v, &used_atom[i]);
>  	}
>  }
>
> --
> https://github.com/git/git/pull/552
Johannes Schindelin Nov. 12, 2018, 12:03 p.m. UTC | #2
Hi,

On Mon, 12 Nov 2018, Junio C Hamano wrote:

> Olga Telezhnaya <olyatelezhnaya@gmail.com> writes:
> 
> > @@ -876,11 +882,13 @@ static void grab_common_values(struct atom_value *val, int deref, struct expand_
> >  			name++;
> >  		if (!strcmp(name, "objecttype"))
> >  			v->s = xstrdup(type_name(oi->type));
> > -		else if (!strcmp(name, "objectsize")) {
> > +		else if (!strcmp(name, "objectsize:disk")) {
> > +			v->value = oi->disk_size;
> > +			v->s = xstrfmt("%lld", (long long)oi->disk_size);
> 
> oi.disk_size is off_t; do we know "long long" 
> 
>    (1) is available widely enough (I think it is from c99)?
> 
>    (2) is sufficiently wide so that we can safely cast off_t to?
> 
>    (3) will stay to be sufficiently wide as machines get larger
>        together with standard types like off_t in the future?
> 
> I'd rather use intmax_t (as off_t is a signed integral type) so that
> we do not have to worry about these questions in the first place.

You mean something like

			v->s = xstrfmt("%"PRIdMAX, (intmax_t)oi->disk_size);


If so, I agree.

Ciao,
Dscho

P.S.: I wondered whether we have precedent for PRIdMAX, as we used to use
only PRIuMAX, but yes: JeffH's json-writer uses PRIdMAX.

> 
> > +		} else if (!strcmp(name, "objectsize")) {
> >  			v->value = oi->size;
> >  			v->s = xstrfmt("%lu", oi->size);
> 
> This is not a suggestion but is a genuine question, but I wonder if
> two years down the road somebody who meets this API for the first
> time find the asymmetry between "objectsize" and "objectsize:disk" a
> tad strange and suggest the former to have "objectsize:real" or some
> synonym.  Or we can consider "objectsize" the primary thing (hence
> needing no colon-plus-modifier to clarify what kind of size we are
> asking) and leave these two deliberatly asymmetric.  I am leaning
> towards the latter myself.
> 
> > -		}
> > -		else if (deref)
> > +		} else if (deref)
> >  			grab_objectname(name, &oi->oid, v, &used_atom[i]);
> >  	}
> >  }
> >
> > --
> > https://github.com/git/git/pull/552
>
Jeff King Nov. 12, 2018, 1:05 p.m. UTC | #3
On Mon, Nov 12, 2018 at 02:03:20PM +0900, Junio C Hamano wrote:

> > +		} else if (!strcmp(name, "objectsize")) {
> >  			v->value = oi->size;
> >  			v->s = xstrfmt("%lu", oi->size);
> 
> This is not a suggestion but is a genuine question, but I wonder if
> two years down the road somebody who meets this API for the first
> time find the asymmetry between "objectsize" and "objectsize:disk" a
> tad strange and suggest the former to have "objectsize:real" or some
> synonym.  Or we can consider "objectsize" the primary thing (hence
> needing no colon-plus-modifier to clarify what kind of size we are
> asking) and leave these two deliberatly asymmetric.  I am leaning
> towards the latter myself.

I think to some degree that ship has already sailed (and is my fault!).

The ulterior motive here is to eventually unify the cat-file formatter
with the ref-filter formatter.  So for that we'll have to support
%(objectsize) anyway.

That still leaves the option of having %(objectsize:real) later and
marking a bare %(objectsize) as a deprecated synonym. But I don't think
there's any advantage to trying to deal with it at this stage.

-Peff
Jeff King Nov. 12, 2018, 1:12 p.m. UTC | #4
On Mon, Nov 12, 2018 at 01:03:25PM +0100, Johannes Schindelin wrote:

> > oi.disk_size is off_t; do we know "long long" 
> > 
> >    (1) is available widely enough (I think it is from c99)?
> > 
> >    (2) is sufficiently wide so that we can safely cast off_t to?
> > 
> >    (3) will stay to be sufficiently wide as machines get larger
> >        together with standard types like off_t in the future?
> > 
> > I'd rather use intmax_t (as off_t is a signed integral type) so that
> > we do not have to worry about these questions in the first place.
> 
> You mean something like
> 
> 			v->s = xstrfmt("%"PRIdMAX, (intmax_t)oi->disk_size);

I think elsewhere we simply use PRIuMAX for printing large sizes via
off_t; we know this value isn't going to be negative.

I'm not opposed to PRIdMAX, which _is_ more accurate, but...

> P.S.: I wondered whether we have precedent for PRIdMAX, as we used to use
> only PRIuMAX, but yes: JeffH's json-writer uses PRIdMAX.

That's pretty recent. I won't be surprised if we have to do some
preprocessor trickery to handle that at some point. We have a PRIuMAX
fallback already. That comes from c4001d92be (Use off_t when we really
mean a file offset., 2007-03-06), but it's not clear to me if that was
motivated by a real platform or an over-abundance of caution.

I'm OK with just using PRIdMAX as appropriate for now. It will serve as
a weather-balloon, and we can #define our way out of it later if need
be.

-Peff
Junio C Hamano Nov. 13, 2018, 1:52 a.m. UTC | #5
Jeff King <peff@peff.net> writes:

>> You mean something like
>> 
>> 			v->s = xstrfmt("%"PRIdMAX, (intmax_t)oi->disk_size);
>
> I think elsewhere we simply use PRIuMAX for printing large sizes via
> off_t; we know this value isn't going to be negative.
>
> I'm not opposed to PRIdMAX, which _is_ more accurate, but...
>
>> P.S.: I wondered whether we have precedent for PRIdMAX, as we used to use
>> only PRIuMAX, but yes: JeffH's json-writer uses PRIdMAX.
>
> That's pretty recent. I won't be surprised if we have to do some
> preprocessor trickery to handle that at some point. We have a PRIuMAX
> fallback already. That comes from c4001d92be (Use off_t when we really
> mean a file offset., 2007-03-06), but it's not clear to me if that was
> motivated by a real platform or an over-abundance of caution.
>
> I'm OK with just using PRIdMAX as appropriate for now. It will serve as
> a weather-balloon, and we can #define our way out of it later if need
> be.

I am OK if we avoid PRIdMAX and use PRIuMAX instead with a cast to
the corresponding size in this codepath, as long as we properly
handle negative oi.disk_size field, which may be telling some
"unusual" condition to us.
Olga Telezhnaya Nov. 20, 2018, 9:17 a.m. UTC | #6
вт, 13 нояб. 2018 г. в 04:52, Junio C Hamano <gitster@pobox.com>:
>
> Jeff King <peff@peff.net> writes:
>
> >> You mean something like
> >>
> >>                      v->s = xstrfmt("%"PRIdMAX, (intmax_t)oi->disk_size);
> >
> > I think elsewhere we simply use PRIuMAX for printing large sizes via
> > off_t; we know this value isn't going to be negative.
> >
> > I'm not opposed to PRIdMAX, which _is_ more accurate, but...
> >
> >> P.S.: I wondered whether we have precedent for PRIdMAX, as we used to use
> >> only PRIuMAX, but yes: JeffH's json-writer uses PRIdMAX.
> >
> > That's pretty recent. I won't be surprised if we have to do some
> > preprocessor trickery to handle that at some point. We have a PRIuMAX
> > fallback already. That comes from c4001d92be (Use off_t when we really
> > mean a file offset., 2007-03-06), but it's not clear to me if that was
> > motivated by a real platform or an over-abundance of caution.
> >
> > I'm OK with just using PRIdMAX as appropriate for now. It will serve as
> > a weather-balloon, and we can #define our way out of it later if need
> > be.
>
> I am OK if we avoid PRIdMAX and use PRIuMAX instead with a cast to
> the corresponding size in this codepath, as long as we properly
> handle negative oi.disk_size field, which may be telling some
> "unusual" condition to us.

Maybe we want to change the type (from off_t to unsigned) directly in
struct object_info? That will help us not to make additional
checkings. Or, at least, I suggest to add check to
oid_object_info_extended() so that this function will give a guarantee
that the size is non-negative. That will make code cleaner (otherwise
we need to add checks everywhere after oid_object_info_extended()
usage).

Please, look at this one also [1]. Thanks a lot!

[1] https://public-inbox.org/git/CAL21BmnoZuRih3Ky66_Tk0PweD36eZ6=fbY3jGumRcSJ=Bc_pQ@mail.gmail.com/

>
>
Junio C Hamano Nov. 21, 2018, 12:27 a.m. UTC | #7
Оля Тележная  <olyatelezhnaya@gmail.com> writes:

>> I am OK if we avoid PRIdMAX and use PRIuMAX instead with a cast to
>> the corresponding size in this codepath, as long as we properly
>> handle negative oi.disk_size field, which may be telling some
>> "unusual" condition to us.
>
> Maybe we want to change the type (from off_t to unsigned) directly in
> struct object_info? That will help us not to make additional
> checkings. Or, at least, I suggest to add check to
> oid_object_info_extended() so that this function will give a guarantee
> that the size is non-negative.

I am fine with the approach.  The potential gain of allowing
oi.disk_size is it would allow the code to say "I'll give these
pieces of info about the object, but the on-disk size is unknown"
without failing the whole object_info_extended() request.  And I
personally do not think such an ability is not all that important
or useful.
Johannes Schindelin Nov. 22, 2018, 11:33 a.m. UTC | #8
Hi Junio,

On Wed, 21 Nov 2018, Junio C Hamano wrote:

> Оля Тележная  <olyatelezhnaya@gmail.com> writes:
> 
> >> I am OK if we avoid PRIdMAX and use PRIuMAX instead with a cast to
> >> the corresponding size in this codepath, as long as we properly
> >> handle negative oi.disk_size field, which may be telling some
> >> "unusual" condition to us.
> >
> > Maybe we want to change the type (from off_t to unsigned) directly in
> > struct object_info? That will help us not to make additional
> > checkings. Or, at least, I suggest to add check to
> > oid_object_info_extended() so that this function will give a guarantee
> > that the size is non-negative.
> 
> I am fine with the approach.  The potential gain of allowing
> oi.disk_size is it would allow the code to say "I'll give these
> pieces of info about the object, but the on-disk size is unknown"
> without failing the whole object_info_extended() request.  And I
> personally do not think such an ability is not all that important
> or useful.

I see that this topic advanced to `next`, which means that the Windows
build of `next` is now broken.

To fix this, I prepared a GitGitGadget PR
(https://github.com/gitgitgadget/git/pull/87) and will submit it as soon
as I am satisfied that the build works.

Ciao,
Dscho
Junio C Hamano Nov. 24, 2018, 1:48 a.m. UTC | #9
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> To fix this, I prepared a GitGitGadget PR
> (https://github.com/gitgitgadget/git/pull/87) and will submit it as soon
> as I am satisfied that the build works.

Thanks.  This won't be in the upcoming release anyway, so we can fix
it up without "oops, let's pile another fix on it" if we wanted to
after the release by kicking it back to 'pu', but in the meantime,
keeping the tip of 'next' free from known breakage certainly is a
sensible thing to do.
diff mbox series

Patch

diff --git a/ref-filter.c b/ref-filter.c
index 0c45ed9d94a4b..8ba1a4e72f2c3 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -231,12 +231,18 @@  static int objecttype_atom_parser(const struct ref_format *format, struct used_a
 static int objectsize_atom_parser(const struct ref_format *format, struct used_atom *atom,
 				  const char *arg, struct strbuf *err)
 {
-	if (arg)
-		return strbuf_addf_ret(err, -1, _("%%(objectsize) does not take arguments"));
-	if (*atom->name == '*')
-		oi_deref.info.sizep = &oi_deref.size;
-	else
-		oi.info.sizep = &oi.size;
+	if (!arg) {
+		if (*atom->name == '*')
+			oi_deref.info.sizep = &oi_deref.size;
+		else
+			oi.info.sizep = &oi.size;
+	} else if (!strcmp(arg, "disk")) {
+		if (*atom->name == '*')
+			oi_deref.info.disk_sizep = &oi_deref.disk_size;
+		else
+			oi.info.disk_sizep = &oi.disk_size;
+	} else
+		return strbuf_addf_ret(err, -1, _("unrecognized %%(objectsize) argument: %s"), arg);
 	return 0;
 }
 
@@ -876,11 +882,13 @@  static void grab_common_values(struct atom_value *val, int deref, struct expand_
 			name++;
 		if (!strcmp(name, "objecttype"))
 			v->s = xstrdup(type_name(oi->type));
-		else if (!strcmp(name, "objectsize")) {
+		else if (!strcmp(name, "objectsize:disk")) {
+			v->value = oi->disk_size;
+			v->s = xstrfmt("%lld", (long long)oi->disk_size);
+		} else if (!strcmp(name, "objectsize")) {
 			v->value = oi->size;
 			v->s = xstrfmt("%lu", oi->size);
-		}
-		else if (deref)
+		} else if (deref)
 			grab_objectname(name, &oi->oid, v, &used_atom[i]);
 	}
 }