diff mbox series

[5/6] src/fssum: Allow single file input

Message ID 91a82230f5929b45a6093576b7c7afd311fde97a.1590006879.git.raghavan.arvind@gmail.com (mailing list archive)
State New, archived
Headers show
Series Changes to fssum to support POSIX | expand

Commit Message

Arvind Raghavan May 20, 2020, 9:21 p.m. UTC
Allow regular links and symlinks to be passed as input to fssum.

Signed-off-by: Arvind Raghavan <raghavan.arvind@gmail.com>
Signed-off-by: Jayashree Mohan <jaya@cs.utexas.edu>
Signed-off-by: Vijay Chidambaram <vijay@cs.utexas.edu>
---
 src/fssum.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

Comments

Amir Goldstein May 21, 2020, 9:18 a.m. UTC | #1
On Thu, May 21, 2020 at 3:10 AM Arvind Raghavan
<raghavan.arvind@gmail.com> wrote:
>
> Allow regular links and symlinks to be passed as input to fssum.
>
> Signed-off-by: Arvind Raghavan <raghavan.arvind@gmail.com>
> Signed-off-by: Jayashree Mohan <jaya@cs.utexas.edu>
> Signed-off-by: Vijay Chidambaram <vijay@cs.utexas.edu>
> ---
>  src/fssum.c | 35 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/src/fssum.c b/src/fssum.c
> index ece0f556..2d1624ca 100644
> --- a/src/fssum.c
> +++ b/src/fssum.c
> @@ -29,6 +29,7 @@
>  #include <inttypes.h>
>  #include <assert.h>
>  #include <endian.h>
> +#include <libgen.h>
>
>  #define CS_SIZE 16
>  #define CHUNKS 128
> @@ -884,8 +885,40 @@ main(int argc, char *argv[])
>         if (gen_manifest)
>                 fprintf(out_fp, "Flags: %s\n", flagstring);
>
> +       struct stat64 path_st;
> +       if (fstat64(fd, &path_st)) {
> +               perror("fstat");
> +               exit(-1);
> +       }
> +
>         sum_init(&cs);
> -       sum(fd, 1, &cs, path, "");
> +
> +       if (S_ISDIR(path_st.st_mode)) {
> +               sum(fd, 1, &cs, path, "");
> +       } else if (S_ISREG(path_st.st_mode) || S_ISLNK(path_st.st_mode)) {
> +               // Copy because dirname may modify path
> +               char* path_copy = alloc(strlen(path));
> +               strcpy(path_copy, path);
> +
> +               char* dir_path = dirname(path);
> +               char* name = basename(path_copy);
> +
> +               int dirfd = open(dir_path, O_RDONLY);
> +               if (fd == -1) {
> +                       fprintf(stderr, "failed to open %s: %s\n", dir_path,
> +                               strerror(errno));
> +                       exit(-1);
> +               }
> +
> +               sum_one(dirfd, 1, &cs, dir_path, "", name);

Instead of all of the above, how about just:
               sum_one(fd, 1, &cs, path, "", "");

From looking at sum_one() code, it seems to me like that will work,
but I may be missing something.
It's not that you *want* the name in the checksum, it is not even
part of the metadata that is being synced with fsync.

Other than that patch set looks excellent.
Very pleasant for review :-)

One little thing is missing from the cover letter -
Which tests did you run to verify these changes do not regress existing
tests?

Feel free to add to all the other patches in the series:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

If you are able to change sum_one() of empty name as I suggested,
feel free to add that to this patch as well.

Thanks,
Amir.
Arvind Raghavan May 22, 2020, 1:06 a.m. UTC | #2
On 05/21, Amir Goldstein wrote:
> On Thu, May 21, 2020 at 3:10 AM Arvind Raghavan
> <raghavan.arvind@gmail.com> wrote:
> >
> > Allow regular links and symlinks to be passed as input to fssum.
> >
> > Signed-off-by: Arvind Raghavan <raghavan.arvind@gmail.com>
> > Signed-off-by: Jayashree Mohan <jaya@cs.utexas.edu>
> > Signed-off-by: Vijay Chidambaram <vijay@cs.utexas.edu>
> > ---
> >  src/fssum.c | 35 ++++++++++++++++++++++++++++++++++-
> >  1 file changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/fssum.c b/src/fssum.c
> > index ece0f556..2d1624ca 100644
> > --- a/src/fssum.c
> > +++ b/src/fssum.c
> > @@ -29,6 +29,7 @@
> >  #include <inttypes.h>
> >  #include <assert.h>
> >  #include <endian.h>
> > +#include <libgen.h>
> >
> >  #define CS_SIZE 16
> >  #define CHUNKS 128
> > @@ -884,8 +885,40 @@ main(int argc, char *argv[])
> >         if (gen_manifest)
> >                 fprintf(out_fp, "Flags: %s\n", flagstring);
> >
> > +       struct stat64 path_st;
> > +       if (fstat64(fd, &path_st)) {
> > +               perror("fstat");
> > +               exit(-1);
> > +       }
> > +
> >         sum_init(&cs);
> > -       sum(fd, 1, &cs, path, "");
> > +
> > +       if (S_ISDIR(path_st.st_mode)) {
> > +               sum(fd, 1, &cs, path, "");
> > +       } else if (S_ISREG(path_st.st_mode) || S_ISLNK(path_st.st_mode)) {
> > +               // Copy because dirname may modify path
> > +               char* path_copy = alloc(strlen(path));
> > +               strcpy(path_copy, path);
> > +
> > +               char* dir_path = dirname(path);
> > +               char* name = basename(path_copy);
> > +
> > +               int dirfd = open(dir_path, O_RDONLY);
> > +               if (fd == -1) {
> > +                       fprintf(stderr, "failed to open %s: %s\n", dir_path,
> > +                               strerror(errno));
> > +                       exit(-1);
> > +               }
> > +
> > +               sum_one(dirfd, 1, &cs, dir_path, "", name);
> 
> Instead of all of the above, how about just:
>                sum_one(fd, 1, &cs, path, "", "");
> 
> From looking at sum_one() code, it seems to me like that will work,
> but I may be missing something.
> It's not that you *want* the name in the checksum, it is not even
> part of the metadata that is being synced with fsync.

The issue here is that we preserved the code from sum which does
all its opens using openat with the parent directory fd and a
filename. Since we're trying to reuse that code I believe we need
to have this somewhat ugly boilerplate.

> Other than that patch set looks excellent.
> Very pleasant for review :-)

Thanks! :)

> One little thing is missing from the cover letter -
> Which tests did you run to verify these changes do not regress existing
> tests?

I just ran the relevant tests and encountered a small issue with
the refactoring patch. This is my bad, since we changed lstat to
use fstatat, we are no longer doing a fchdir which a readlink
call later on relies on. I can fix it by changing the readlink to
a readlinkat.

I'll add that change and add the set of relevant patches to the
cover letter in a V2.

> Feel free to add to all the other patches in the series:
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> 
> If you are able to change sum_one() of empty name as I suggested,
> feel free to add that to this patch as well.
> 
> Thanks,
> Amir.
Amir Goldstein May 22, 2020, 5:37 a.m. UTC | #3
On Fri, May 22, 2020 at 4:06 AM Arvind Raghavan
<raghavan.arvind@gmail.com> wrote:
>
> On 05/21, Amir Goldstein wrote:
> > On Thu, May 21, 2020 at 3:10 AM Arvind Raghavan
> > <raghavan.arvind@gmail.com> wrote:
> > >
> > > Allow regular links and symlinks to be passed as input to fssum.
> > >
> > > Signed-off-by: Arvind Raghavan <raghavan.arvind@gmail.com>
> > > Signed-off-by: Jayashree Mohan <jaya@cs.utexas.edu>
> > > Signed-off-by: Vijay Chidambaram <vijay@cs.utexas.edu>
> > > ---
> > >  src/fssum.c | 35 ++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 34 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/fssum.c b/src/fssum.c
> > > index ece0f556..2d1624ca 100644
> > > --- a/src/fssum.c
> > > +++ b/src/fssum.c
> > > @@ -29,6 +29,7 @@
> > >  #include <inttypes.h>
> > >  #include <assert.h>
> > >  #include <endian.h>
> > > +#include <libgen.h>
> > >
> > >  #define CS_SIZE 16
> > >  #define CHUNKS 128
> > > @@ -884,8 +885,40 @@ main(int argc, char *argv[])
> > >         if (gen_manifest)
> > >                 fprintf(out_fp, "Flags: %s\n", flagstring);
> > >
> > > +       struct stat64 path_st;
> > > +       if (fstat64(fd, &path_st)) {
> > > +               perror("fstat");
> > > +               exit(-1);
> > > +       }
> > > +
> > >         sum_init(&cs);
> > > -       sum(fd, 1, &cs, path, "");
> > > +
> > > +       if (S_ISDIR(path_st.st_mode)) {
> > > +               sum(fd, 1, &cs, path, "");
> > > +       } else if (S_ISREG(path_st.st_mode) || S_ISLNK(path_st.st_mode)) {
> > > +               // Copy because dirname may modify path
> > > +               char* path_copy = alloc(strlen(path));
> > > +               strcpy(path_copy, path);

If you stay with this code please use strdup().

> > > +
> > > +               char* dir_path = dirname(path);
> > > +               char* name = basename(path_copy);
> > > +
> > > +               int dirfd = open(dir_path, O_RDONLY);
> > > +               if (fd == -1) {
> > > +                       fprintf(stderr, "failed to open %s: %s\n", dir_path,
> > > +                               strerror(errno));
> > > +                       exit(-1);
> > > +               }
> > > +
> > > +               sum_one(dirfd, 1, &cs, dir_path, "", name);
> >
> > Instead of all of the above, how about just:
> >                sum_one(fd, 1, &cs, path, "", "");
> >
> > From looking at sum_one() code, it seems to me like that will work,
> > but I may be missing something.
> > It's not that you *want* the name in the checksum, it is not even
> > part of the metadata that is being synced with fsync.
>
> The issue here is that we preserved the code from sum which does
> all its opens using openat with the parent directory fd and a
> filename. Since we're trying to reuse that code I believe we need
> to have this somewhat ugly boilerplate.

Ok. But if you stay with this please add a comment about why
this is done with a hint for the future how to fix this properly.

Or (up to you) you can fix it by calling this helper instead of openat():

int open_one(int dirfd, const char *name)
{
    if (!name || !*name)
        return dup(dirfd);
    return openat(dirfd, name, 0);
}

fstatat() can take empty name with AT_EMPTY_PATH flag.
readlinkat() should be able to take an empty name, but documentation
is not clear whether fd must be O_PATH - need to verify if it works with
non O_PATH fd.

Again, you don't have to do this to get my reviewed-by its just if you
want to and then of course do it in a prep patch, the same one that
gets rid of fchdir and converts to fstatat() and readlinkat().

>
> > Other than that patch set looks excellent.
> > Very pleasant for review :-)
>
> Thanks! :)
>
> > One little thing is missing from the cover letter -
> > Which tests did you run to verify these changes do not regress existing
> > tests?
>
> I just ran the relevant tests and encountered a small issue with
> the refactoring patch. This is my bad, since we changed lstat to
> use fstatat, we are no longer doing a fchdir which a readlink
> call later on relies on. I can fix it by changing the readlink to
> a readlinkat.
>

I see two valid options. please chose the one you like.

1. Revert removal of fchdir. let refactoring be only refactoring.
2. Remove fchdir and convert to fstatat/readlinkat in separate prep patch
    (with or without the empty name support suggested above)

> I'll add that change and add the set of relevant patches to the
> cover letter in a V2.

For patches that did not change from v1 please add my reviewed-by
so I know I do not need to re-review them.

Please include summary of "changes since v1" in cover letter.

Thanks,
Amir.
Arvind Raghavan May 31, 2020, 6:28 p.m. UTC | #4
On 05/22, Amir Goldstein wrote:
> On Fri, May 22, 2020 at 4:06 AM Arvind Raghavan
> <raghavan.arvind@gmail.com> wrote:
> >
> > On 05/21, Amir Goldstein wrote:
> > > On Thu, May 21, 2020 at 3:10 AM Arvind Raghavan
> > > <raghavan.arvind@gmail.com> wrote:
> > > >
> > > > Allow regular links and symlinks to be passed as input to fssum.
> > > >
> > > > Signed-off-by: Arvind Raghavan <raghavan.arvind@gmail.com>
> > > > Signed-off-by: Jayashree Mohan <jaya@cs.utexas.edu>
> > > > Signed-off-by: Vijay Chidambaram <vijay@cs.utexas.edu>
> > > > ---
> > > >  src/fssum.c | 35 ++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 34 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/src/fssum.c b/src/fssum.c
> > > > index ece0f556..2d1624ca 100644
> > > > --- a/src/fssum.c
> > > > +++ b/src/fssum.c
> > > > @@ -29,6 +29,7 @@
> > > >  #include <inttypes.h>
> > > >  #include <assert.h>
> > > >  #include <endian.h>
> > > > +#include <libgen.h>
> > > >
> > > >  #define CS_SIZE 16
> > > >  #define CHUNKS 128
> > > > @@ -884,8 +885,40 @@ main(int argc, char *argv[])
> > > >         if (gen_manifest)
> > > >                 fprintf(out_fp, "Flags: %s\n", flagstring);
> > > >
> > > > +       struct stat64 path_st;
> > > > +       if (fstat64(fd, &path_st)) {
> > > > +               perror("fstat");
> > > > +               exit(-1);
> > > > +       }
> > > > +
> > > >         sum_init(&cs);
> > > > -       sum(fd, 1, &cs, path, "");
> > > > +
> > > > +       if (S_ISDIR(path_st.st_mode)) {
> > > > +               sum(fd, 1, &cs, path, "");
> > > > +       } else if (S_ISREG(path_st.st_mode) || S_ISLNK(path_st.st_mode)) {
> > > > +               // Copy because dirname may modify path
> > > > +               char* path_copy = alloc(strlen(path));
> > > > +               strcpy(path_copy, path);
> 
> If you stay with this code please use strdup().
> 
> > > > +
> > > > +               char* dir_path = dirname(path);
> > > > +               char* name = basename(path_copy);
> > > > +
> > > > +               int dirfd = open(dir_path, O_RDONLY);
> > > > +               if (fd == -1) {
> > > > +                       fprintf(stderr, "failed to open %s: %s\n", dir_path,
> > > > +                               strerror(errno));
> > > > +                       exit(-1);
> > > > +               }
> > > > +
> > > > +               sum_one(dirfd, 1, &cs, dir_path, "", name);
> > >
> > > Instead of all of the above, how about just:
> > >                sum_one(fd, 1, &cs, path, "", "");
> > >
> > > From looking at sum_one() code, it seems to me like that will work,
> > > but I may be missing something.
> > > It's not that you *want* the name in the checksum, it is not even
> > > part of the metadata that is being synced with fsync.
> >
> > The issue here is that we preserved the code from sum which does
> > all its opens using openat with the parent directory fd and a
> > filename. Since we're trying to reuse that code I believe we need
> > to have this somewhat ugly boilerplate.
> 
> Ok. But if you stay with this please add a comment about why
> this is done with a hint for the future how to fix this properly.
> 
> Or (up to you) you can fix it by calling this helper instead of openat():
> 
> int open_one(int dirfd, const char *name)
> {
>     if (!name || !*name)
>         return dup(dirfd);
>     return openat(dirfd, name, 0);
> }
> 
> fstatat() can take empty name with AT_EMPTY_PATH flag.
> readlinkat() should be able to take an empty name, but documentation
> is not clear whether fd must be O_PATH - need to verify if it works with
> non O_PATH fd.

I might be doing something wrong but I can't seem to get
AT_EMPTY_PATH to work with fstatat. I don't see it on the man
page and when I pass it to fstatat it errors out with invalid
argument.

In the case that fstatat doesn't support AT_EMPTY_PATH, do you
see another way to fix the above code?

Thanks,
Arvind
Amir Goldstein May 31, 2020, 7:31 p.m. UTC | #5
On Sun, May 31, 2020 at 9:29 PM Arvind Raghavan
<raghavan.arvind@gmail.com> wrote:
>
> On 05/22, Amir Goldstein wrote:
> > On Fri, May 22, 2020 at 4:06 AM Arvind Raghavan
> > <raghavan.arvind@gmail.com> wrote:
> > >
> > > On 05/21, Amir Goldstein wrote:
> > > > On Thu, May 21, 2020 at 3:10 AM Arvind Raghavan
> > > > <raghavan.arvind@gmail.com> wrote:
> > > > >
> > > > > Allow regular links and symlinks to be passed as input to fssum.
> > > > >
> > > > > Signed-off-by: Arvind Raghavan <raghavan.arvind@gmail.com>
> > > > > Signed-off-by: Jayashree Mohan <jaya@cs.utexas.edu>
> > > > > Signed-off-by: Vijay Chidambaram <vijay@cs.utexas.edu>
> > > > > ---
> > > > >  src/fssum.c | 35 ++++++++++++++++++++++++++++++++++-
> > > > >  1 file changed, 34 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/src/fssum.c b/src/fssum.c
> > > > > index ece0f556..2d1624ca 100644
> > > > > --- a/src/fssum.c
> > > > > +++ b/src/fssum.c
> > > > > @@ -29,6 +29,7 @@
> > > > >  #include <inttypes.h>
> > > > >  #include <assert.h>
> > > > >  #include <endian.h>
> > > > > +#include <libgen.h>
> > > > >
> > > > >  #define CS_SIZE 16
> > > > >  #define CHUNKS 128
> > > > > @@ -884,8 +885,40 @@ main(int argc, char *argv[])
> > > > >         if (gen_manifest)
> > > > >                 fprintf(out_fp, "Flags: %s\n", flagstring);
> > > > >
> > > > > +       struct stat64 path_st;
> > > > > +       if (fstat64(fd, &path_st)) {
> > > > > +               perror("fstat");
> > > > > +               exit(-1);
> > > > > +       }
> > > > > +
> > > > >         sum_init(&cs);
> > > > > -       sum(fd, 1, &cs, path, "");
> > > > > +
> > > > > +       if (S_ISDIR(path_st.st_mode)) {
> > > > > +               sum(fd, 1, &cs, path, "");
> > > > > +       } else if (S_ISREG(path_st.st_mode) || S_ISLNK(path_st.st_mode)) {
> > > > > +               // Copy because dirname may modify path
> > > > > +               char* path_copy = alloc(strlen(path));
> > > > > +               strcpy(path_copy, path);
> >
> > If you stay with this code please use strdup().
> >
> > > > > +
> > > > > +               char* dir_path = dirname(path);
> > > > > +               char* name = basename(path_copy);
> > > > > +
> > > > > +               int dirfd = open(dir_path, O_RDONLY);
> > > > > +               if (fd == -1) {
> > > > > +                       fprintf(stderr, "failed to open %s: %s\n", dir_path,
> > > > > +                               strerror(errno));
> > > > > +                       exit(-1);
> > > > > +               }
> > > > > +
> > > > > +               sum_one(dirfd, 1, &cs, dir_path, "", name);
> > > >
> > > > Instead of all of the above, how about just:
> > > >                sum_one(fd, 1, &cs, path, "", "");
> > > >
> > > > From looking at sum_one() code, it seems to me like that will work,
> > > > but I may be missing something.
> > > > It's not that you *want* the name in the checksum, it is not even
> > > > part of the metadata that is being synced with fsync.
> > >
> > > The issue here is that we preserved the code from sum which does
> > > all its opens using openat with the parent directory fd and a
> > > filename. Since we're trying to reuse that code I believe we need
> > > to have this somewhat ugly boilerplate.
> >
> > Ok. But if you stay with this please add a comment about why
> > this is done with a hint for the future how to fix this properly.
> >
> > Or (up to you) you can fix it by calling this helper instead of openat():
> >
> > int open_one(int dirfd, const char *name)
> > {
> >     if (!name || !*name)
> >         return dup(dirfd);
> >     return openat(dirfd, name, 0);
> > }
> >
> > fstatat() can take empty name with AT_EMPTY_PATH flag.
> > readlinkat() should be able to take an empty name, but documentation
> > is not clear whether fd must be O_PATH - need to verify if it works with
> > non O_PATH fd.
>
> I might be doing something wrong but I can't seem to get
> AT_EMPTY_PATH to work with fstatat. I don't see it on the man
> page and when I pass it to fstatat it errors out with invalid
> argument.
>

https://www.man7.org/linux/man-pages/man2/stat.2.html

Says:
       AT_EMPTY_PATH (since Linux 2.6.39)
... This flag is Linux-specific; define _GNU_SOURCE to obtain its definition.

And I see that fssum.c does define _GNU_SOURCE, so not sure what the
problem is.

> In the case that fstatat doesn't support AT_EMPTY_PATH, do you
> see another way to fix the above code?
>

Yes, you can pass path_st to sum_one() you already have it.

But I don't think that you or anyone that cares about xfstests is running
a kernel older than 2.6.39...

Thanks,
Amir.
Arvind Raghavan June 21, 2020, 11:07 p.m. UTC | #6
On 05/31, Amir Goldstein wrote:
> > I might be doing something wrong but I can't seem to get
> > AT_EMPTY_PATH to work with fstatat. I don't see it on the man
> > page and when I pass it to fstatat it errors out with invalid
> > argument.
> >
> 
> https://www.man7.org/linux/man-pages/man2/stat.2.html
> 
> Says:
>        AT_EMPTY_PATH (since Linux 2.6.39)
> ... This flag is Linux-specific; define _GNU_SOURCE to obtain its definition.
> 
> And I see that fssum.c does define _GNU_SOURCE, so not sure what the
> problem is.

Ah yes, it is in the documentation, and I've got it working now!
Must've been some other issue I had... I apologize for the delay,
I'll send out a v2 shortly.

Thanks,
Arvind
diff mbox series

Patch

diff --git a/src/fssum.c b/src/fssum.c
index ece0f556..2d1624ca 100644
--- a/src/fssum.c
+++ b/src/fssum.c
@@ -29,6 +29,7 @@ 
 #include <inttypes.h>
 #include <assert.h>
 #include <endian.h>
+#include <libgen.h>
 
 #define CS_SIZE 16
 #define CHUNKS	128
@@ -884,8 +885,40 @@  main(int argc, char *argv[])
 	if (gen_manifest)
 		fprintf(out_fp, "Flags: %s\n", flagstring);
 
+	struct stat64 path_st;
+	if (fstat64(fd, &path_st)) {
+		perror("fstat");
+		exit(-1);
+	}
+
 	sum_init(&cs);
-	sum(fd, 1, &cs, path, "");
+
+	if (S_ISDIR(path_st.st_mode)) {
+		sum(fd, 1, &cs, path, "");
+	} else if (S_ISREG(path_st.st_mode) || S_ISLNK(path_st.st_mode)) {
+		// Copy because dirname may modify path
+		char* path_copy = alloc(strlen(path));
+		strcpy(path_copy, path);
+
+		char* dir_path = dirname(path);
+		char* name = basename(path_copy);
+
+		int dirfd = open(dir_path, O_RDONLY);
+		if (fd == -1) {
+			fprintf(stderr, "failed to open %s: %s\n", dir_path,
+				strerror(errno));
+			exit(-1);
+		}
+
+		sum_one(dirfd, 1, &cs, dir_path, "", name);
+
+		free(path_copy);
+		close(dirfd);
+	} else {
+		fprintf(stderr, "path must be file or dir: %s", path);
+		exit(-1);
+	}
+
 	sum_fini(&cs);
 
 	close(fd);