
[09/10] fs: ceph: Replace CURRENT_TIME by ktime_get_real_ts()

Message ID 1454479670-8204-10-git-send-email-deepa.kernel@gmail.com
State New, archived

Commit Message

Deepa Dinamani Feb. 3, 2016, 6:07 a.m. UTC
This is in preparation for the series that transitions
filesystem timestamps to use 64-bit time and hence makes
them y2038 safe.

The CURRENT_TIME macro will be deleted before the
aforementioned series is merged.

Filesystems will use current_fs_time() instead of
CURRENT_TIME.
Use ktime_get_real_ts() here, as this is not filesystem time.
ktime_get_real_ts() fills in a timespec with nanosecond
resolution, which can be used to calculate the MDS request
timestamp.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Sage Weil <sage@redhat.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
---
 fs/ceph/mds_client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Yan, Zheng Feb. 3, 2016, 2:34 p.m. UTC | #1
On Wed, Feb 3, 2016 at 2:07 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
> This is in preparation for the series that transitions
> filesystem timestamps to use 64-bit time and hence makes
> them y2038 safe.
>
> The CURRENT_TIME macro will be deleted before the
> aforementioned series is merged.
>
> Filesystems will use current_fs_time() instead of
> CURRENT_TIME.
> Use ktime_get_real_ts() here, as this is not filesystem time.
> ktime_get_real_ts() fills in a timespec with nanosecond
> resolution, which can be used to calculate the MDS request
> timestamp.
>
> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
> Cc: "Yan, Zheng" <zyan@redhat.com>
> Cc: Sage Weil <sage@redhat.com>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: ceph-devel@vger.kernel.org
> ---
>  fs/ceph/mds_client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index e7b130a..348b22e 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -1721,7 +1721,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
>         init_completion(&req->r_safe_completion);
>         INIT_LIST_HEAD(&req->r_unsafe_item);
>
> -       req->r_stamp = CURRENT_TIME;
> +       ktime_get_real_ts(&req->r_stamp);

I think we should use current_fs_time() here. I have squashed the change
into another patch.

>
>         req->r_op = op;
>         req->r_direct_mode = mode;
> --
> 1.9.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Deepa Dinamani Feb. 3, 2016, 4:17 p.m. UTC | #2
On Wed, Feb 03, 2016 at 10:34:00PM +0800, Yan, Zheng wrote:
> On Wed, Feb 3, 2016 at 2:07 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
> > This is in preparation for the series that transitions
> > filesystem timestamps to use 64-bit time and hence makes
> > them y2038 safe.
> >
> > The CURRENT_TIME macro will be deleted before the
> > aforementioned series is merged.
> >
> > Filesystems will use current_fs_time() instead of
> > CURRENT_TIME.
> > Use ktime_get_real_ts() here, as this is not filesystem time.
> > ktime_get_real_ts() fills in a timespec with nanosecond
> > resolution, which can be used to calculate the MDS request
> > timestamp.
> >
> > Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
> > Cc: "Yan, Zheng" <zyan@redhat.com>
> > Cc: Sage Weil <sage@redhat.com>
> > Cc: Ilya Dryomov <idryomov@gmail.com>
> > Cc: ceph-devel@vger.kernel.org
> > ---
> >  fs/ceph/mds_client.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index e7b130a..348b22e 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -1721,7 +1721,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
> >         init_completion(&req->r_safe_completion);
> >         INIT_LIST_HEAD(&req->r_unsafe_item);
> >
> > -       req->r_stamp = CURRENT_TIME;
> > +       ktime_get_real_ts(&req->r_stamp);
> 
> I think we should use current_fs_time() here. I have squashed the change
> into another patch.

Ok. I missed commit b8e69066d8afa8d2670dc697252ff0e5907aafad
earlier, which says that r_stamp is now used as ctime.
I had assumed that it was a message timestamp.

I was not able to find any documentation on what the server does
with the message sent by the client. Where can I find that?

So, this should actually look like

req->r_stamp = current_fs_time(mdsc->fsc->sb);

Let me know if you want me to resend.

-Deepa
Arnd Bergmann Feb. 3, 2016, 9:27 p.m. UTC | #3
On Wednesday 03 February 2016 08:17:23 Deepa Dinamani wrote:
> On Wed, Feb 03, 2016 at 10:34:00PM +0800, Yan, Zheng wrote:
> > On Wed, Feb 3, 2016 at 2:07 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -1721,7 +1721,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
> > >         init_completion(&req->r_safe_completion);
> > >         INIT_LIST_HEAD(&req->r_unsafe_item);
> > >
> > > -       req->r_stamp = CURRENT_TIME;
> > > +       ktime_get_real_ts(&req->r_stamp);
> > 
> > I think we should use current_fs_time() here. I have squashed the change
> > into another patch.
> 
> Ok. I missed commit b8e69066d8afa8d2670dc697252ff0e5907aafad
> earlier, which says that r_stamp is now used as ctime.
> I had assumed that it was a message timestamp.
> 
> I was not able to find any documentation on what the server does
> with the message sent by the client. Where can I find that?
> 
> So, this should actually look like
> 
> req->r_stamp = current_fs_time(mdsc->fsc->sb);
> 
> Let me know if you want me to resend.

I see that the timestamp is sent using

	ceph_encode_copy(&p, &req->r_stamp, sizeof(req->r_stamp));

What happens with the timestamp across reboots if we change the
type? I assume the data will not be used across reboots; if it
is, we already have a problem on machines that can boot
both big-endian and little-endian kernels, or that can boot
both 32-bit and 64-bit kernels.

	Arnd
Yan, Zheng Feb. 4, 2016, 2 a.m. UTC | #4
> On Feb 4, 2016, at 05:27, Arnd Bergmann <arnd@arndb.de> wrote:
> 
> On Wednesday 03 February 2016 08:17:23 Deepa Dinamani wrote:
>> On Wed, Feb 03, 2016 at 10:34:00PM +0800, Yan, Zheng wrote:
>>> On Wed, Feb 3, 2016 at 2:07 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>>>> --- a/fs/ceph/mds_client.c
>>>> +++ b/fs/ceph/mds_client.c
>>>> @@ -1721,7 +1721,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
>>>>        init_completion(&req->r_safe_completion);
>>>>        INIT_LIST_HEAD(&req->r_unsafe_item);
>>>> 
>>>> -       req->r_stamp = CURRENT_TIME;
>>>> +       ktime_get_real_ts(&req->r_stamp);
>>> 
>>> I think we should use current_fs_time() here. I have squashed the change
>>> into another patch.
>> 
>> Ok. I missed commit b8e69066d8afa8d2670dc697252ff0e5907aafad
>> earlier, which says that r_stamp is now used as ctime.
>> I had assumed that it was a message timestamp.
>> 
>> I was not able to find any documentation on what the server does
>> with the message sent by the client. Where can I find that?
>> 
>> So, this should actually look like
>> 
>> req->r_stamp = current_fs_time(mdsc->fsc->sb);
>> 
>> Let me know if you want me to resend.

I have already squashed the change into patch 8.

> 
> I see that the timestamp is sent using
> 
> 	ceph_encode_copy(&p, &req->r_stamp, sizeof(req->r_stamp));

This code is outdated; the current code is:

{
          struct ceph_timespec ts;
          ceph_encode_timespec(&ts, &req->r_stamp);
          ceph_encode_copy(&p, &ts, sizeof(ts));
}
> 
> What happens with the timestamp across reboots if we change the
> type? I assume the data will not be used across reboots; if it
> is, we already have a problem on machines that can boot
> both big-endian and little-endian kernels, or that can boot
> both 32-bit and 64-bit kernels.
> 
> 	Arnd

Arnd Bergmann Feb. 4, 2016, 8:30 a.m. UTC | #5
On Thursday 04 February 2016 10:00:19 Yan, Zheng wrote:
> > On Feb 4, 2016, at 05:27, Arnd Bergmann <arnd@arndb.de> wrote:
> {
>           struct ceph_timespec ts;
>           ceph_encode_timespec(&ts, &req->r_stamp);
>           ceph_encode_copy(&p, &ts, sizeof(ts));
> }

Ok, that does make the behavior consistent on all architectures, but
leads to a different question:

struct ceph_timespec {
        __le32 tv_sec;
        __le32 tv_nsec;
} __attribute__ ((packed));

How do you define ceph_timespec: is tv_sec supposed to be signed or unsigned?

It seems that you treat it as signed, meaning you interpret times
from the server as being in the [1902..2038] range, rather than the
[1970..2106] range:

static inline void ceph_decode_timespec(struct timespec *ts,
                                        const struct ceph_timespec *tv)
{
        ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
        ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
}

Is that intentional and documented? If yes, what is your plan to deal
with y2038 support?

	Arnd
Ilya Dryomov Feb. 4, 2016, 9:01 a.m. UTC | #6
On Thu, Feb 4, 2016 at 9:30 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 04 February 2016 10:00:19 Yan, Zheng wrote:
>> > On Feb 4, 2016, at 05:27, Arnd Bergmann <arnd@arndb.de> wrote:
>> {
>>           struct ceph_timespec ts;
>>           ceph_encode_timespec(&ts, &req->r_stamp);
>>           ceph_encode_copy(&p, &ts, sizeof(ts));
>> }
>
> Ok, that does make the behavior consistent on all architectures, but
> leads to a different question:
>
> struct ceph_timespec {
>         __le32 tv_sec;
>         __le32 tv_nsec;
> } __attribute__ ((packed));
>
> How do you define ceph_timespec: is tv_sec supposed to be signed or unsigned?
>
> It seems that you treat it as signed, meaning you interpret times
> from the server as being in the [1902..2038] range, rather than the
> [1970..2106] range:
>
> static inline void ceph_decode_timespec(struct timespec *ts,
>                                         const struct ceph_timespec *tv)
> {
>         ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
>         ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
> }
>
> Is that intentional and documented? If yes, what is your plan to deal
> with y2038 support?

tv_sec is used as a time_t, so signed.  The problem is that ceph_timespec is
not only passed over the wire but also stored on disk as part of quite a few
other data structures.  The plan is to eventually switch to a 64-bit tv_sec and
tv_nsec, bump the version on all the structures that contain it and add
a cluster-wide feature bit to deal with older clients.  We've recently had
a discussion about this, so it may even happen in the not-so-distant future, but
no promises ;)

Thanks,

                Ilya
Arnd Bergmann Feb. 4, 2016, 1:31 p.m. UTC | #7
On Thursday 04 February 2016 10:01:31 Ilya Dryomov wrote:
> On Thu, Feb 4, 2016 at 9:30 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 04 February 2016 10:00:19 Yan, Zheng wrote:
> >> > On Feb 4, 2016, at 05:27, Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > static inline void ceph_decode_timespec(struct timespec *ts,
> >                                         const struct ceph_timespec *tv)
> > {
> >         ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
> >         ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
> > }
> >
> > Is that intentional and documented? If yes, what is your plan to deal
> > with y2038 support?
> 
> tv_sec is used as a time_t, so signed.  The problem is that ceph_timespec is
> not only passed over the wire but also stored on disk as part of quite a few
> other data structures. 

That is only part of the issue though:

Most file systems that store a timespec on disk define the function
differently:

static inline void ceph_decode_timespec(struct timespec *ts,
                                        const struct ceph_timespec *tv)
{
        ts->tv_sec = (time_t)(u32)le32_to_cpu(tv->tv_sec);
        ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
}

On systems that have a 64-bit time_t, the 1902..1970 interval
(0xffffffff80000000..0xffffffffffffffff) and the 2038..2106
interval (0x0000000080000000..0x00000000ffffffff) are written
as the same 32-bit numbers, so when reading back you have to
decide which interpretation you want, and your cast to
__kernel_time_t means that you get the first representation on
both 32-bit and 64-bit systems.

On systems with a 32-bit time_t, this is the only option you
have anyway, and some other file systems (ext2/3/4, xfs, ...)
made the same decision in order to behave in a consistent way
independent of what kernel (32-bit or 64-bit) you use. This
is generally a reasonable goal, but it means that you get the
overflow in 2038 rather than 2106.

Alex Elder changed ceph's behavior in 2013 to work the same
way, but judging from the changelog of c3f56102f28d ("libceph: validate
timespec conversions"), I guess this was not intentional, as
he was also adding a comparison against U32_MAX, which should
have been S32_MAX.

A lot of other file systems (jfs, jffs2, hpfs, minix) apparently
prefer the 1970..2106 interpretation of time values.

> The plan is to eventually switch to a 64-bit tv_sec and
> tv_nsec, bump the version on all the structures that contain it and add
> a cluster-wide feature bit to deal with older clients.  We've recently had
> a discussion about this, so it may even happen in the not-so-distant future, but
> no promises 

Ok. We have a (rough) plan to deal with file systems that don't support
extended timestamps in the meantime: depending on user preferences,
we would either allow them to be used as before, with times clamped
to the 2038 overflow date, or only allow read-only mounts for users
who want to ensure their systems can survive 2038 without regressions.

	Arnd
Gregory Farnum Feb. 4, 2016, 3:26 p.m. UTC | #8
On Thu, Feb 4, 2016 at 5:31 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 04 February 2016 10:01:31 Ilya Dryomov wrote:
>> On Thu, Feb 4, 2016 at 9:30 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Thursday 04 February 2016 10:00:19 Yan, Zheng wrote:
>> >> > On Feb 4, 2016, at 05:27, Arnd Bergmann <arnd@arndb.de> wrote:
>> >
>> > static inline void ceph_decode_timespec(struct timespec *ts,
>> >                                         const struct ceph_timespec *tv)
>> > {
>> >         ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
>> >         ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
>> > }
>> >
>> > Is that intentional and documented? If yes, what is your plan to deal
>> > with y2038 support?
>>
>> tv_sec is used as a time_t, so signed.  The problem is that ceph_timespec is
>> not only passed over the wire but also stored on disk as part of quite a few
>> other data structures.
>
> That is only part of the issue though:
>
> Most file systems that store a timespec on disk define the function
> differently:
>
> static inline void ceph_decode_timespec(struct timespec *ts,
>                                         const struct ceph_timespec *tv)
> {
>         ts->tv_sec = (time_t)(u32)le32_to_cpu(tv->tv_sec);
>         ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
> }
>
> On systems that have a 64-bit time_t, the 1902..1970 interval
> (0xffffffff80000000..0xffffffffffffffff) and the 2038..2106
> interval (0x0000000080000000..0x00000000ffffffff) are written
> as the same 32-bit numbers, so when reading back you have to
> decide which interpretation you want, and your cast to
> __kernel_time_t means that you get the first representation on
> both 32-bit and 64-bit systems.
>
> On systems with a 32-bit time_t, this is the only option you
> have anyway, and some other file systems (ext2/3/4, xfs, ...)
> made the same decision in order to behave in a consistent way
> independent of what kernel (32-bit or 64-bit) you use. This
> is generally a reasonable goal, but it means that you get the
> overflow in 2038 rather than 2106.
>
> Alex Elder changed ceph's behavior in 2013 to work the same
> way, but judging from the changelog of c3f56102f28d ("libceph: validate
> timespec conversions"), I guess this was not intentional, as
> he was also adding a comparison against U32_MAX, which should
> have been S32_MAX.
>
> A lot of other file systems (jfs, jffs2, hpfs, minix) apparently
> prefer the 1970..2106 interpretation of time values.
>
>> The plan is to eventually switch to a 64-bit tv_sec and
>> tv_nsec, bump the version on all the structures that contain it and add
>> a cluster-wide feature bit to deal with older clients.  We've recently had
>> a discussion about this, so it may even happen in the not-so-distant future, but
>> no promises
>
> Ok. We have a (rough) plan to deal with file systems that don't support
> extended timestamps in the meantime: depending on user preferences,
> we would either allow them to be used as before, with times clamped
> to the 2038 overflow date, or only allow read-only mounts for users
> who want to ensure their systems can survive 2038 without regressions.

I dug up the email conversation about it, although I think Adam has
done more work than it indicates:
http://www.spinics.net/lists/ceph-devel/msg27900.html. I can't speak
to any kernel-specific issues, but this kind of transition while
maintaining wire compatibility with older code is something we've done
a lot; it shouldn't be a big deal even in the kernel, where we're
slightly less prolific with such things. :)
-Greg
Arnd Bergmann Feb. 4, 2016, 9:02 p.m. UTC | #9
On Thursday 04 February 2016 07:26:51 Gregory Farnum wrote:
> On Thu, Feb 4, 2016 at 5:31 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 04 February 2016 10:01:31 Ilya Dryomov wrote:
> >> On Thu, Feb 4, 2016 at 9:30 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > A lot of other file systems (jfs, jffs2, hpfs, minix) apparently
> > prefer the 1970..2106 interpretation of time values.
> >
> >> The plan is to eventually switch to a 64-bit tv_sec and
> >> tv_nsec, bump the version on all the structures that contain it and add
> >> a cluster-wide feature bit to deal with older clients.  We've recently had
> >> a discussion about this, so it may even happen in the not-so-distant future, but
> >> no promises
> >
> > Ok. We have a (rough) plan to deal with file systems that don't support
> > extended timestamps in the meantime: depending on user preferences,
> > we would either allow them to be used as before, with times clamped
> > to the 2038 overflow date, or only allow read-only mounts for users
> > who want to ensure their systems can survive 2038 without regressions.
> 
> I dug up the email conversation about it, although I think Adam has
> done more work than it indicates:
> http://www.spinics.net/lists/ceph-devel/msg27900.html. I can't speak
> to any kernel-specific issues, but this kind of transition while
> maintaining wire compatibility with older code is something we've done
> a lot; it shouldn't be a big deal even in the kernel, where we're
> slightly less prolific with such things.

On the kernel side, the interesting part is to figure out whether
the other end can support the new format or not, and to set the limit
in the superblock accordingly. Once you have determined that both
sides support the extended timestamps, sending a timestamp beyond 2038
must not fail or cause incorrect data.

On the wire protocol, you could consider extending the timestamps in
the same way as ext4: since you already have nanosecond timestamps,
you can use the upper two bits of the nanoseconds field to extend the
seconds field to 34 bits, giving you a range of valid times between 1902
and 2446. However, if you have to make an incompatible change anyway,
going to 64 bit is easier.

	Arnd

Patch

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index e7b130a..348b22e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1721,7 +1721,7 @@  ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
 	init_completion(&req->r_safe_completion);
 	INIT_LIST_HEAD(&req->r_unsafe_item);
 
-	req->r_stamp = CURRENT_TIME;
+	ktime_get_real_ts(&req->r_stamp);
 
 	req->r_op = op;
 	req->r_direct_mode = mode;