diff mbox

[RFC,v2a,11/12] net: ceph: use vfs_time data type instead of timespec

Message ID 1455269766-2994-12-git-send-email-deepa.kernel@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Deepa Dinamani Feb. 12, 2016, 9:36 a.m. UTC
The VFS inode timestamps are not y2038 safe as they use
struct timespec. These will be changed to use struct timespec64
instead and that is y2038 safe.
But, since the above data type conversion will break the end
file systems, use vfs_time aliases here to access inode times.

These timestamps are passed in as arguments to functions
using inode timestamps. Hence, these need to change along
with vfs to support 64 bit timestamps. vfs_time helps do
this transition.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
---
 include/linux/ceph/messenger.h  | 1 +
 include/linux/ceph/osd_client.h | 4 ++--
 net/ceph/osd_client.c           | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

Comments

Dave Chinner Feb. 13, 2016, 10:08 p.m. UTC | #1
On Fri, Feb 12, 2016 at 01:36:05AM -0800, Deepa Dinamani wrote:
> The VFS inode timestamps are not y2038 safe as they use
> struct timespec. These will be changed to use struct timespec64
> instead and that is y2038 safe.
> But, since the above data type conversion will break the end
> file systems, use vfs_time aliases here to access inode times.
> 
> These timestamps are passed in as arguments to functions
> using inode timestamps. Hence, these need to change along
> with vfs to support 64 bit timestamps. vfs_time helps do
> this transition.
> 
> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>

Just a point to highlight the problem with this approach:

> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index f8f2359..1273db6 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -2401,7 +2401,7 @@ bad:
>   */
>  void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off,
>  				struct ceph_snap_context *snapc, u64 snap_id,
> -				struct timespec *mtime)
> +				struct vfs_time *mtime)
>  {
>  	struct ceph_msg *msg = req->r_request;
>  	void *p;

So this change assumes that mtime is not passed by reference to
another function. If we change vfs_time to be a timespec64, then
dereferencing in this function works fine, but passing to another
function will not because that function will be expecting a
timespec.

That, indeed, is what happens here. A few lines into this function:

        if (req->r_flags & CEPH_OSD_FLAG_WRITE)
                ceph_encode_timespec(p, mtime);

And that function:

static inline void ceph_encode_timespec(struct ceph_timespec *tv,
                                        const struct timespec *ts)
{
        tv->tv_sec = cpu_to_le32((u32)ts->tv_sec);
        tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec);
}

expects a timespec. It will silently lose 64 bit times even if it
did compile. I note in version 2b, the mtime was not passed by
reference as a vfs time, but converted at the call site to
a timespec and so the internal usage of the timestamp remains
unchanged and unaffected by a VFS level timespec->timespec64 change.

I think an approach that requires changes to the API without
actually beign able to verify they are correct, fully propagated or
don't impact on write/disk formats before the final change of the
VFS type is not going to fly. This is the sort of subtle bug that
can occur with type changes, and hence why I think that the fs
developers should be left to do the conversion of their filesystem
to support 64 bit times (i.e. approach 2b).

Any change is going to take a significant amount of testing and
verification, and that's something we don't have yet. Nobody has
written any tests for xfstests to verify correct 64 bit timestamp
behaviour, nor do we have tests to verify 32 bit timestamp behaviour
on a 64 bit time kernel. These are things that we are going to need;
all filesystems should behave the same w.r.t. these configurations,
so we really do need regression tests for this....

Cheers,

Dave.
Deepa Dinamani Feb. 14, 2016, 1:46 a.m. UTC | #2
On Sat, Feb 13, 2016 at 2:08 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Feb 12, 2016 at 01:36:05AM -0800, Deepa Dinamani wrote:
>> The VFS inode timestamps are not y2038 safe as they use
>> struct timespec. These will be changed to use struct timespec64
>> instead and that is y2038 safe.
>> But, since the above data type conversion will break the end
>> file systems, use vfs_time aliases here to access inode times.
>>
>> These timestamps are passed in as arguments to functions
>> using inode timestamps. Hence, these need to change along
>> with vfs to support 64 bit timestamps. vfs_time helps do
>> this transition.
>>
>> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
>
> Just a point to highlight the problem with this approach:
>
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index f8f2359..1273db6 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -2401,7 +2401,7 @@ bad:
>>   */
>>  void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off,
>>                               struct ceph_snap_context *snapc, u64 snap_id,
>> -                             struct timespec *mtime)
>> +                             struct vfs_time *mtime)
>>  {
>>       struct ceph_msg *msg = req->r_request;
>>       void *p;
>
> So this change assumes that mtime is not passed by reference to
> another function. If we change vfs_time to be a timespec64, then
> dereferencing in this function works fine, but passing to another
> function will not because that function will be expecting a
> timespec.
>
> That, indeed, is what happens here. A few lines into this function:
>
>         if (req->r_flags & CEPH_OSD_FLAG_WRITE)
>                 ceph_encode_timespec(p, mtime);
>
> And that function:
>
> static inline void ceph_encode_timespec(struct ceph_timespec *tv,
>                                         const struct timespec *ts)
> {
>         tv->tv_sec = cpu_to_le32((u32)ts->tv_sec);
>         tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec);
> }

I'm not sure where you picked up this encode function from.

You might be missing the patches( 9 and 10) before this?:

2b5f8e517c6fd7121fc1b890c51c6256bc21beb6 net: ceph: use vfs_time data
type instead of timespec
ca5b82952a6a522ae058ccede57ba1a71da498c5 fs: ceph: Replace timespec
data type with vfs_time
3a3ac0bdd23284c4f27a7ab1c133056c1a998075 fs: ceph: Change encode and
decode functions to use vfs_time

So encode function actually looks like

-static inline void ceph_decode_timespec(struct timespec *ts,
+static inline void ceph_decode_timespec(struct vfs_time *ts,
                                        const struct ceph_timespec *tv)
 {
-       ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
-       ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
+       ts->tv_sec = (s64)(u32)le32_to_cpu(tv->tv_sec);
+       ts->tv_nsec = (long)(u32)le32_to_cpu(tv->tv_nsec);
 }

There is a conversion error here which I will be fixing separately
from the series.

Also, there is another commit in my tree that is pointed to in the cover letter
that is also required here:

40219ae801c0284a233ed908b07bb468d219cbc8 net: ceph: Remove
CURRENT_TIME references

Changes have been split so that they can done in manageable chunks
just like how we are not
changing all filesystems at once.

> I think an approach that requires changes to the API without
> actually beign able to verify they are correct, fully propagated or
> don't impact on write/disk formats before the final change of the
> VFS type is not going to fly. This is the sort of subtle bug that
> can occur with type changes, and hence why I think that the fs
> developers should be left to do the conversion of their filesystem
> to support 64 bit times (i.e. approach 2b).

Approach 2b has the same problem of not being able to verify the
conversion before the vfs switch. Consider what happens if you miss changing
one of the instances of direct inode time access. So 2b is also not completely
verifiable that the changes are completely propagated. The only approach that
does this is the 2c because the data types in the individual filesystems are
decoupled from vfs data types from the get go.

Besides, anything omitted by 2a or 2b in the process of conversion should be
easily verifiable when vfs does switch. At this point, there will be
warnings in case of pointer conversion or
errors in case of pass by value, if the data types do not match.

Apart from this, process of how individual filesystems are converted
will help avoid
these bugs. Here is one of the tricks I plan to do (consider example
approach 2a):

1. Change an individual filesystem to use vfs_time.
2. Change vfs to timespec64 and verify that the targeted filesystem
will actually
    compile.

Tagging tricks are also useful here.

Keep in mind that timespec = timespec64 already on 64 bit systems.
But, still there might be some tricky cases which should be okay
because it will have to reviewed by individual
filesystem maintainers.

> Any change is going to take a significant amount of testing and
> verification, and that's something we don't have yet. Nobody has
> written any tests for xfstests to verify correct 64 bit timestamp
> behaviour, nor do we have tests to verify 32 bit timestamp behaviour
> on a 64 bit time kernel. These are things that we are going to need;
> all filesystems should behave the same w.r.t. these configurations,
> so we really do need regression tests for this....

Agree. This is needed regardless of what approach is chosen.
And, this is a problem for all filesystems even today.

-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Deepa Dinamani Feb. 14, 2016, 2:05 a.m. UTC | #3
>> static inline void ceph_encode_timespec(struct ceph_timespec *tv,
>>                                         const struct timespec *ts)
>> {
>>         tv->tv_sec = cpu_to_le32((u32)ts->tv_sec);
>>         tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec);
>> }

Pointed to decode function change in previous email.
Pasting encode function change also here:

 static inline void ceph_encode_timespec(struct ceph_timespec *tv,
-                                       const struct timespec *ts)
+                                       const struct vfs_time *ts)


-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Feb. 14, 2016, 9 p.m. UTC | #4
On Sat, Feb 13, 2016 at 05:46:11PM -0800, Deepa Dinamani wrote:
> On Sat, Feb 13, 2016 at 2:08 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Feb 12, 2016 at 01:36:05AM -0800, Deepa Dinamani wrote:
> >> The VFS inode timestamps are not y2038 safe as they use
> >> struct timespec. These will be changed to use struct timespec64
> >> instead and that is y2038 safe.
> >> But, since the above data type conversion will break the end
> >> file systems, use vfs_time aliases here to access inode times.
> >>
> >> These timestamps are passed in as arguments to functions
> >> using inode timestamps. Hence, these need to change along
> >> with vfs to support 64 bit timestamps. vfs_time helps do
> >> this transition.
> >>
> >> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
> >
> > Just a point to highlight the problem with this approach:
> >
> >> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> >> index f8f2359..1273db6 100644
> >> --- a/net/ceph/osd_client.c
> >> +++ b/net/ceph/osd_client.c
> >> @@ -2401,7 +2401,7 @@ bad:
> >>   */
> >>  void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off,
> >>                               struct ceph_snap_context *snapc, u64 snap_id,
> >> -                             struct timespec *mtime)
> >> +                             struct vfs_time *mtime)
> >>  {
> >>       struct ceph_msg *msg = req->r_request;
> >>       void *p;
> >
> > So this change assumes that mtime is not passed by reference to
> > another function. If we change vfs_time to be a timespec64, then
> > dereferencing in this function works fine, but passing to another
> > function will not because that function will be expecting a
> > timespec.
> >
> > That, indeed, is what happens here. A few lines into this function:
> >
> >         if (req->r_flags & CEPH_OSD_FLAG_WRITE)
> >                 ceph_encode_timespec(p, mtime);
> >
> > And that function:
> >
> > static inline void ceph_encode_timespec(struct ceph_timespec *tv,
> >                                         const struct timespec *ts)
> > {
> >         tv->tv_sec = cpu_to_le32((u32)ts->tv_sec);
> >         tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec);
> > }
> 
> I'm not sure where you picked up this encode function from.
> 
> You might be missing the patches( 9 and 10) before this?:
> 
> 2b5f8e517c6fd7121fc1b890c51c6256bc21beb6 net: ceph: use vfs_time data
> type instead of timespec
> ca5b82952a6a522ae058ccede57ba1a71da498c5 fs: ceph: Replace timespec
> data type with vfs_time
> 3a3ac0bdd23284c4f27a7ab1c133056c1a998075 fs: ceph: Change encode and
> decode functions to use vfs_time

So I missed this last patch when quickly reading throught he series.
I think this is being broken up way too much, and that makes it hard
to see how the changes relate.

i.e. this series could have easily been just 3 or 4 patches - it's
only ~100 lines of code total that is changed by the series.

FWIW, let me put this in context for you, so maybe you'll understand
why I think this timespec64 changeover is actually a trivial, simply
thing. Last week I reviewed about 10,000 lines of new code amongst
some 14,000 lines of change (twice) for xfstests. That was amongst
getting ~1000 lines of my own changes reviewed and committed into
the XFS kernel tree, handling user problems, reviewing and
commenting on DAX changes, etc.

This week I only have a libxfs kernel/userspace code sync to do
(only 500 lines of change this time), and then I've about 10,000
lines of complex new XFS kernel code to review (i.e. for reverse
mapping, reflink, dedupe and copy-on-write support). I'll have to
review that more than once, and once that is done and all th changes
have been propagated over into the userspace code, I've got another
10,000 lines of code in userspace (again for reflink, etc) to
review, test and merge.

So, excuse me if I made a mistake and missed something in a patchset
that a) had 3 different versions posted, b) is way too fine-grained,
and c) being treated like a mountain when it's really a tiny
molehill. I do have much more important things to do with my time
than be dragged into another silly "oh this is so difficult and
hard" bikeshedding argument when I could easily write the entire
patchset to do a timespec64 changeover for the VFS in a couple of
hours. It's just not that hard to do.

And, FWIW, I'm still waiting to hear how we're going to regression
test all this. Has anyone written any xfstests yet to ensure that
all the filesystems behave the same and we won't break anything in
future as we add 64 bit timestamp support to filesystem on-disk
formats? IMO, there's more work in writing the regression tests to
make sure everything works correctly in all the different possible
combinations of filesystem, kernel and userspace support (e.g. 32 on 32,
32 on 64, 64 on 32 and 64 on 64). I'm much more concerned about
this aspect of the problem than actually changing the VFS
code, because without it we can't verify the changes we are making
are behaving correctly...

Cheers,

Dave.
Arnd Bergmann Feb. 17, 2016, 9:32 a.m. UTC | #5
On Monday 15 February 2016 08:00:50 Dave Chinner wrote:
> On Sat, Feb 13, 2016 at 05:46:11PM -0800, Deepa Dinamani wrote:
> > On Sat, Feb 13, 2016 at 2:08 PM, Dave Chinner <david@fromorbit.com> wrote:
> > > On Fri, Feb 12, 2016 at 01:36:05AM -0800, Deepa Dinamani wrote:

> So, excuse me if I made a mistake and missed something in a patchset
> that a) had 3 different versions posted, b) is way too fine-grained,
> and c) being treated like a mountain when it's really a tiny
> molehill. I do have much more important things to do with my time
> than be dragged into another silly "oh this is so difficult and
> hard" bikeshedding argument when I could easily write the entire
> patchset to do a timespec64 changeover for the VFS in a couple of
> hours. It's just not that hard to do.
> 
> And, FWIW, I'm still waiting to hear how we're going to regression
> test all this. Has anyone written any xfstests yet to ensure that
> all the filesystems behave the same and we won't break anything in
> future as we add 64 bit timestamp support to filesystem on-disk
> formats? IMO, there's more work in writing the regression tests to
> make sure everything works correctly in all the different possible
> combinations of filesystem, kernel and userspace support (e.g. 32 on 32,
> 32 on 64, 64 on 32 and 64 on 64). I'm much more concerned about
> this aspect of the problem than actually changing the VFS
> code, because without it we can't verify the changes we are making
> are behaving correctly...

You are mixing up way too many things here, for this series all
we need is for you to say that one of the approaches is ok, and they
are all to the point where they are simple enough that they don't
really do much at all. Deepa is taking baby steps here because
you complained about v1 being too complex.

This series is not about changing the on-disk format, it is not
even changing the VFS time format (yet), it's just a preparation
so we can eventually change it.

There are four different things that are going on at the
same time, all independent of one another:

1. Changing the file systems so we are able to do the change
   in struct inode, this series. The *only* part we care about
   here is that this does not change the existing behavior
   on either 32-bit or 64-bit systems, and that should be trivial
   to review.

2. Changing the file systems to provide information to VFS about
   the time stamp ranges they support in order to do proper
   handling of overflows in VFS. Deepa has posted a first set
   of patches to always use current_fs_time() consistently,
   work on that is continuing and once done, we can debate the
   policy for what should happen in case of overflow.

3. Writing test cases in xfstests and/or LTP. Yes, we need them,
   and I think Deepa has started on those, but I don't think
   they are needed at this point as there is little to test before
   steps 1 and 2 are done.

4. Changing file systems to use longer on-disk timestamps where
   needed. This is completely independent of anything else and up
   to the individual file system developers. Anyone can test
   this now on 64-bit architectures, and most file systems we
   care about (xfs being the notable exception, ext4 also
   until very recently) already do this properly.

After 1, 2 and 3 are done, the simple patch to switch over VFS
can be implemented and tested, followed by whatever work remains
to switch over file systems to use 64-bit timestamps in the
kernel (independent of what they use on disk, again).

My current line of thinking is that for step 1, I'd let Deepa
pick one of the three approaches she posted (I don't think we
found any showstoppers), and put the patches in my y2038 tree
for merging in 4.6. We can easily leave out the file systems that
have conflicts against linux-next, and you can put Deepa's
patch or another implementation of that into 4.7.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index afe886b..28bba12 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -8,6 +8,7 @@ 
 #include <linux/radix-tree.h>
 #include <linux/uio.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 #include <net/net_namespace.h>
 
 #include <linux/ceph/types.h>
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 7506b48..2b6f08b 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -322,7 +322,7 @@  extern struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *
 extern void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off,
 				    struct ceph_snap_context *snapc,
 				    u64 snap_id,
-				    struct timespec *mtime);
+				    struct vfs_time *mtime);
 
 extern struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *,
 				      struct ceph_file_layout *layout,
@@ -364,7 +364,7 @@  extern int ceph_osdc_writepages(struct ceph_osd_client *osdc,
 				struct ceph_snap_context *sc,
 				u64 off, u64 len,
 				u32 truncate_seq, u64 truncate_size,
-				struct timespec *mtime,
+				struct vfs_time *mtime,
 				struct page **pages, int nr_pages);
 
 /* watch/notify events */
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index f8f2359..1273db6 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2401,7 +2401,7 @@  bad:
  */
 void ceph_osdc_build_request(struct ceph_osd_request *req, u64 off,
 				struct ceph_snap_context *snapc, u64 snap_id,
-				struct timespec *mtime)
+				struct vfs_time *mtime)
 {
 	struct ceph_msg *msg = req->r_request;
 	void *p;