diff mbox series

[01/19] reftable/stack: stop using `read_in_full()`

Message ID 20250127-pks-reftable-drop-git-compat-util-v1-1-6e280a564877@pks.im (mailing list archive)
State Superseded
Headers show
Series reftable: stop using "git-compat-util.h" | expand

Commit Message

Patrick Steinhardt Jan. 27, 2025, 1:04 p.m. UTC
There is a single callsite of `read_in_full()` in the reftable library.
Open-code the function to reduce our dependency on the Git library.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

Comments

Justin Tobler Jan. 27, 2025, 4:57 p.m. UTC | #1
On 25/01/27 02:04PM, Patrick Steinhardt wrote:
> There is a single callsite of `read_in_full()` in the reftable library.
> Open-code the function to reduce our dependency on the Git library.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  reftable/stack.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/reftable/stack.c b/reftable/stack.c
> index f7c1845e15..9490366795 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -115,13 +115,16 @@ int reftable_new_stack(struct reftable_stack **dest, const char *dir,
>  
>  static int fd_read_lines(int fd, char ***namesp)
>  {
> -	off_t size = lseek(fd, 0, SEEK_END);
>  	char *buf = NULL;
>  	int err = 0;
> +	off_t size;
> +
> +	size = lseek(fd, 0, SEEK_END);
>  	if (size < 0) {
>  		err = REFTABLE_IO_ERROR;
>  		goto done;
>  	}
> +
>  	err = lseek(fd, 0, SEEK_SET);
>  	if (err < 0) {
>  		err = REFTABLE_IO_ERROR;
> @@ -134,9 +137,16 @@ static int fd_read_lines(int fd, char ***namesp)
>  		goto done;
>  	}
>  
> -	if (read_in_full(fd, buf, size) != size) {
> -		err = REFTABLE_IO_ERROR;
> -		goto done;
> +	for (size_t total_read = 0; total_read < (size_t) size; ) {

The cast from off_t -> size_t matches the currect behavior, but is it
always safe to do this? In `git-compat-util.h` it looks like we have
`xsize_t()` to safely handle these conversions. Since this series is
moving away from `git-compat-util.h` should ideally something similar be
implemented?

> +		ssize_t bytes_read = read(fd, buf + total_read, size - total_read);
> +		if (bytes_read < 0 && (errno == EAGAIN || errno == EINTR))

The error handling here for EAGAIN doesn't go as far as what `xread()`
does via `handle_nonblock()`. In this scenario is that ok?

> +			continue;
> +		if (bytes_read < 0 || !bytes_read) {
> +			err = REFTABLE_IO_ERROR;
> +			goto done;
> +		}
> +
> +		total_read += bytes_read;
>  	}
>  	buf[size] = 0;
>  
> 
> -- 
> 2.48.1.362.g079036d154.dirty
> 
>
Patrick Steinhardt Jan. 28, 2025, 8:06 a.m. UTC | #2
On Mon, Jan 27, 2025 at 10:57:20AM -0600, Justin Tobler wrote:
> On 25/01/27 02:04PM, Patrick Steinhardt wrote:
> > diff --git a/reftable/stack.c b/reftable/stack.c
> > index f7c1845e15..9490366795 100644
> > --- a/reftable/stack.c
> > +++ b/reftable/stack.c
> > @@ -115,13 +115,16 @@ int reftable_new_stack(struct reftable_stack **dest, const char *dir,
> >  
> >  static int fd_read_lines(int fd, char ***namesp)
> >  {
> > -	off_t size = lseek(fd, 0, SEEK_END);
> >  	char *buf = NULL;
> >  	int err = 0;
> > +	off_t size;
> > +
> > +	size = lseek(fd, 0, SEEK_END);
> >  	if (size < 0) {
> >  		err = REFTABLE_IO_ERROR;
> >  		goto done;
> >  	}
> > +
> >  	err = lseek(fd, 0, SEEK_SET);
> >  	if (err < 0) {
> >  		err = REFTABLE_IO_ERROR;
> > @@ -134,9 +137,16 @@ static int fd_read_lines(int fd, char ***namesp)
> >  		goto done;
> >  	}
> >  
> > -	if (read_in_full(fd, buf, size) != size) {
> > -		err = REFTABLE_IO_ERROR;
> > -		goto done;
> > +	for (size_t total_read = 0; total_read < (size_t) size; ) {
> 
> The cast from off_t -> size_t matches the currect behavior, but is it
> always safe to do this? In `git-compat-util.h` it looks like we have
> `xsize_t()` to safely handle these conversions. Since this series is
> moving away from `git-compat-util.h` should ideally something similar be
> implemented?

It is safe, because a couple lines further up we check for `size < 0`
and error out if that is the case. So we know it's a positive integer,
and thus it can be represented via `size_t`.

> > +		ssize_t bytes_read = read(fd, buf + total_read, size - total_read);
> > +		if (bytes_read < 0 && (errno == EAGAIN || errno == EINTR))
> 
> The error handling here for EAGAIN doesn't go as far as what `xread()`
> does via `handle_nonblock()`. In this scenario is that ok?

Yes, because we don't set `O_NONBLOCK` in the reftable library.

I'll note that in the commit message.

Patrick
Junio C Hamano Jan. 28, 2025, 5:05 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

>> The cast from off_t -> size_t matches the currect behavior, but is it
>> always safe to do this? In `git-compat-util.h` it looks like we have
>> `xsize_t()` to safely handle these conversions. Since this series is
>> moving away from `git-compat-util.h` should ideally something similar be
>> implemented?
>
> It is safe, because a couple lines further up we check for `size < 0`
> and error out if that is the case. So we know it's a positive integer,
> and thus it can be represented via `size_t`.

Even where off_t (which measures on-disk file in bytes) may be wider
than size_t (which measures in-core piece of memory in bytes)?

>> > +		ssize_t bytes_read = read(fd, buf + total_read, size - total_read);
>> > +		if (bytes_read < 0 && (errno == EAGAIN || errno == EINTR))
>> 
>> The error handling here for EAGAIN doesn't go as far as what `xread()`
>> does via `handle_nonblock()`. In this scenario is that ok?
>
> Yes, because we don't set `O_NONBLOCK` in the reftable library.
>
> I'll note that in the commit message.

Good.  Thanks.
Patrick Steinhardt Jan. 29, 2025, 7:29 a.m. UTC | #4
On Tue, Jan 28, 2025 at 09:05:44AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> The cast from off_t -> size_t matches the currect behavior, but is it
> >> always safe to do this? In `git-compat-util.h` it looks like we have
> >> `xsize_t()` to safely handle these conversions. Since this series is
> >> moving away from `git-compat-util.h` should ideally something similar be
> >> implemented?
> >
> > It is safe, because a couple lines further up we check for `size < 0`
> > and error out if that is the case. So we know it's a positive integer,
> > and thus it can be represented via `size_t`.
> 
> Even where off_t (which measures on-disk file in bytes) may be wider
> than size_t (which measures in-core piece of memory in bytes)?

Wait, can that actually happen? Hm. I assume it can, for example on 32
bit systems with large-file support enabled. There I assume that `off_t`
would be a 64 bit signed integer, whereas `size_t` may be a 32 bit
unsigned integer.

Will address in v3.

Patrick
diff mbox series

Patch

diff --git a/reftable/stack.c b/reftable/stack.c
index f7c1845e15..9490366795 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -115,13 +115,16 @@  int reftable_new_stack(struct reftable_stack **dest, const char *dir,
 
 static int fd_read_lines(int fd, char ***namesp)
 {
-	off_t size = lseek(fd, 0, SEEK_END);
 	char *buf = NULL;
 	int err = 0;
+	off_t size;
+
+	size = lseek(fd, 0, SEEK_END);
 	if (size < 0) {
 		err = REFTABLE_IO_ERROR;
 		goto done;
 	}
+
 	err = lseek(fd, 0, SEEK_SET);
 	if (err < 0) {
 		err = REFTABLE_IO_ERROR;
@@ -134,9 +137,16 @@  static int fd_read_lines(int fd, char ***namesp)
 		goto done;
 	}
 
-	if (read_in_full(fd, buf, size) != size) {
-		err = REFTABLE_IO_ERROR;
-		goto done;
+	for (size_t total_read = 0; total_read < (size_t) size; ) {
+		ssize_t bytes_read = read(fd, buf + total_read, size - total_read);
+		if (bytes_read < 0 && (errno == EAGAIN || errno == EINTR))
+			continue;
+		if (bytes_read < 0 || !bytes_read) {
+			err = REFTABLE_IO_ERROR;
+			goto done;
+		}
+
+		total_read += bytes_read;
 	}
 	buf[size] = 0;