ls-files: use correct format string

Message ID	20190407184751.28027-1-t.gummerer@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> From: Thomas Gummerer <t.gummerer@gmail.com> To: git@vger.kernel.org Cc: Thomas Gummerer <t.gummerer@gmail.com> Subject: [PATCH] ls-files: use correct format string Date: Sun, 7 Apr 2019 19:47:51 +0100 Message-Id: <20190407184751.28027-1-t.gummerer@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	ls-files: use correct format string \| expand ls-files: use correct format string

Message ID

20190407184751.28027-1-t.gummerer@gmail.com (mailing list archive)

State

New, archived

Headers

From: Thomas Gummerer <t.gummerer@gmail.com>
To: git@vger.kernel.org
Cc: Thomas Gummerer <t.gummerer@gmail.com>
Subject: [PATCH] ls-files: use correct format string
Date: Sun,  7 Apr 2019 19:47:51 +0100
Message-Id: <20190407184751.28027-1-t.gummerer@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: git-owner@vger.kernel.org
Precedence: bulk

Series

ls-files: use correct format string | expand

Commit Message

Thomas Gummerer April 7, 2019, 6:47 p.m. UTC

struct stat_data and struct cache_time both use unsigned ints for all
their members.  However the format string for 'git ls-files --debug'
currently uses %d for formatting these numbers.  This means that we
potentially print these values incorrectly if they are greater than
INT_MAX.

This has been the case since the --debug option was introduced in 'git
ls-files' in 8497421715 ("ls-files: learn a debugging dump format",
2010-07-31).

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

I noticed this running 'git ls-files --debug' the other day.
Presumably there's not too many users of 'git ls-files', and even
fewer will see values that trigger this behaviour.  

 builtin/ls-files.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Jeff King April 11, 2019, 4:18 a.m. UTC | #1

On Sun, Apr 07, 2019 at 07:47:51PM +0100, Thomas Gummerer wrote:

> struct stat_data and struct cache_time both use unsigned ints for all
> their members.  However the format string for 'git ls-files --debug'
> currently uses %d for formatting these numbers.  This means that we
> potentially print these values incorrectly if they are greater than
> INT_MAX.
> 
> This has been the case since the --debug option was introduced in 'git
> ls-files' in 8497421715 ("ls-files: learn a debugging dump format",
> 2010-07-31).

I didn't see any comment on this, but it seems like it must be obviously
correct, since as you note we do define those fields as unsigned. I'm
really surprised that -Wformat doesn't catch this, though. I wonder why.

-Peff

Thomas Gummerer April 11, 2019, 9:28 p.m. UTC | #2

On 04/11, Jeff King wrote:
> On Sun, Apr 07, 2019 at 07:47:51PM +0100, Thomas Gummerer wrote:
> 
> > struct stat_data and struct cache_time both use unsigned ints for all
> > their members.  However the format string for 'git ls-files --debug'
> > currently uses %d for formatting these numbers.  This means that we
> > potentially print these values incorrectly if they are greater than
> > INT_MAX.
> > 
> > This has been the case since the --debug option was introduced in 'git
> > ls-files' in 8497421715 ("ls-files: learn a debugging dump format",
> > 2010-07-31).
> 
> I didn't see any comment on this, but it seems like it must be obviously
> correct, since as you note we do define those fields as unsigned. I'm
> really surprised that -Wformat doesn't catch this, though. I wonder why.

Good point.  A bit of digging led me to -Wformat-signedness, which
should catch this.  This turns up a lot of errors in our codebase.  I
didn't go through to see how many of them are actual errors, and how
many are false-positives though.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65446 describes how the
option can lead to false positives, e.g.

    printf ("%u\n", unsigned_short);

might turn up an error.  From a quick test this seems to work
correctly with gcc 8.2.1 that I have on my machine though, so the
issue might be fixed in newer gcc version, even though that bug report
is still marked as new.

Maybe it's worth going through the warnings at some point to see if it
would be possible to turn -Wformat-signedness on.

Jeff King April 11, 2019, 11:49 p.m. UTC | #3

On Thu, Apr 11, 2019 at 10:28:30PM +0100, Thomas Gummerer wrote:

> > I didn't see any comment on this, but it seems like it must be obviously
> > correct, since as you note we do define those fields as unsigned. I'm
> > really surprised that -Wformat doesn't catch this, though. I wonder why.
> 
> Good point.  A bit of digging led me to -Wformat-signedness, which
> should catch this.  This turns up a lot of errors in our codebase.  I
> didn't go through to see how many of them are actual errors, and how
> many are false-positives though.

Ah, right, I totally forgot that signedness got its own warning class.
Thanks for enlightening me.

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65446 describes how the
> option can lead to false positives, e.g.
> 
>     printf ("%u\n", unsigned_short);
> 
> might turn up an error.  From a quick test this seems to work
> correctly with gcc 8.2.1 that I have on my machine though, so the
> issue might be fixed in newer gcc version, even though that bug report
> is still marked as new.

Interesting. Looking at that thread, I actually don't think it would be
so bad to warn there anyway. It's true that due to integer promotion an
unsigned short will work with %u, but I'd be just as happy to switch
such a format to "%hu", which is more correct.

> Maybe it's worth going through the warnings at some point to see if it
> would be possible to turn -Wformat-signedness on.

I skimmed over a few of the results. There are definitely some that
could produce funny output. There are also many that are harmless (e.g.,
printing a constant 0 with "%o", which technically should be "0U"). I
don't think it's high priority, but if anybody wants to chip away at it,
be my guest.

In the meantime, I think your patch here is an obvious improvement.

-Peff

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 29a8762d46..463105ccb5 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -112,11 +112,11 @@  static void print_debug(const struct cache_entry *ce)
 	if (debug_mode) {
 		const struct stat_data *sd = &ce->ce_stat_data;
 
-		printf("  ctime: %d:%d\n", sd->sd_ctime.sec, sd->sd_ctime.nsec);
-		printf("  mtime: %d:%d\n", sd->sd_mtime.sec, sd->sd_mtime.nsec);
-		printf("  dev: %d\tino: %d\n", sd->sd_dev, sd->sd_ino);
-		printf("  uid: %d\tgid: %d\n", sd->sd_uid, sd->sd_gid);
-		printf("  size: %d\tflags: %x\n", sd->sd_size, ce->ce_flags);
+		printf("  ctime: %u:%u\n", sd->sd_ctime.sec, sd->sd_ctime.nsec);
+		printf("  mtime: %u:%u\n", sd->sd_mtime.sec, sd->sd_mtime.nsec);
+		printf("  dev: %u\tino: %u\n", sd->sd_dev, sd->sd_ino);
+		printf("  uid: %u\tgid: %u\n", sd->sd_uid, sd->sd_gid);
+		printf("  size: %u\tflags: %x\n", sd->sd_size, ce->ce_flags);
 	}
 }

ls-files: use correct format string

Commit Message

Comments

Patch