
include: ceph: Fix encode and decode type conversions

Message ID 1455505313-1298-1-git-send-email-deepa.kernel@gmail.com (mailing list archive)
State New, archived

Commit Message

Deepa Dinamani Feb. 15, 2016, 3:01 a.m. UTC
long/kernel_time_t is 32 bits wide on a 32-bit system and
64 bits wide on a 64-bit system.

ceph_encode_timespec() encodes only the lower 32 bits on
a 64-bit system and encodes all 32 bits on a 32-bit
system.

ceph_decode_timespec() decodes the 32-bit tv_sec and tv_nsec
into kernel_time_t/long.

The encode and decode functions do not match when the
values are negative:

Consider the following scenario on a 32-bit system:
When a negative number is cast to u32, as encode does, the
value becomes positive and greater than INT_MAX. Decode reads
this value back, and since it cannot be represented by a
long on 32-bit systems, by section 6.3.1.3 of the C99
standard the result is implementation-defined.

Consider the following scenario on a 64-bit system:
When a negative number is cast to u32, as encode does, the
value becomes positive. Decode later assigns this value with
a cast to long. Since the value can be represented in the
long data type, it becomes a positive value greater than
INT_MAX. But the value that was encoded was negative, so the
encode and decode functions do not match.

Change the decode function as follows to overcome the above
bug:
Decode should first cast the value to s64; this yields a
positive value greater than INT_MAX (in the case of a negative
encoded value). It should then cast this value to s32, which
drops the higher-order 32 bits.
On 32-bit systems, this is the right value for kernel_time_t/
long.
On 64-bit systems, the assignment to kernel_time_t/long
sign-extends this value to reflect the sign bit that was
encoded.

Assume the permitted ceph timestamp range is 1902..2038.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
---
 include/linux/ceph/decode.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
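
For illustration, a minimal userspace sketch (not part of the patch) of
the round trip described above, using stdint types in place of the
kernel's:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	int64_t tv_sec = -1;                  /* 64-bit time_t on the client */
	uint32_t wire = (uint32_t)tv_sec;     /* what encode puts on the wire */

	/* old decode on a 64-bit system: no sign extension, -1 comes back
	 * as 4294967295 (a year-2106 date) */
	int64_t old_decode = (int64_t)wire;

	/* proposed decode: truncate to s32 first, so the assignment to a
	 * 64-bit tv_sec sign-extends and -1 comes back as -1 */
	int64_t new_decode = (int32_t)(int64_t)wire;

	printf("wire=%u old=%lld new=%lld\n", (unsigned)wire,
	       (long long)old_decode, (long long)new_decode);
	return 0;
}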

Comments

Yan, Zheng Feb. 15, 2016, 9:04 a.m. UTC | #1
> On Feb 15, 2016, at 11:01, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
> 
> long/ kernel_time_t is 32 bit on a 32 bit system and
> 64 bit on a 64 bit system.
> 
> ceph_encode_timespec() encodes only the lower 32 bits on
> a 64 bit system and encodes all of 32 bits on a 32bit
> system.
> 
> ceph_decode_timespec() decodes 32 bit tv_sec and tv_nsec
> into kernel_time_t/ long.
> 
> The encode and decode functions do not match when the
> values are negative:
> 
> Consider the following scenario on a 32 bit system:
> When a negative number is cast to u32 as encode does, the
> value is positive and is greater than INT_MAX. Decode reads
> back this value. And, this value cannot be represented by
> long on 32 bit systems. So by section 6.3.1.3 of the
> C99 standard, the result is implementation defined.
> 
> Consider the following scenario on a 64 bit system:
> When a negative number is cast to u32 as encode does, the
> value is positive. This value is later assigned by decode
> function by a cast to long. Since this value can be
> represented in long data type, this becomes a positive
> value greater than INT_MAX. But, the value encoded was
> negative, so the encode and decode functions do not match.
> 
> Change the decode function as follows to overcome the above
> bug:
> The decode should first cast the value to a s64 this will
> be positive value greater than INT_MAX(in case of a negative
> encoded value)and then cast this value again as s32, which
> drops the higher order 32 bits.
> On 32 bit systems, this is the right value in kernel_time_t/
> long.
> On 64 bit systems, assignment to kernel_time_t/ long
> will sign extend this value to reflect the signed bit encoded.
> 
> Assume ceph timestamp ranges permitted are 1902..2038.
> 
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
> ---
> include/linux/ceph/decode.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h
> index a6ef9cc..e777e99 100644
> --- a/include/linux/ceph/decode.h
> +++ b/include/linux/ceph/decode.h
> @@ -137,8 +137,8 @@ bad:
> static inline void ceph_decode_timespec(struct timespec *ts,
> 					const struct ceph_timespec *tv)
> {
> -	ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
> -	ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
> +	ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
> +	ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
> }
> static inline void ceph_encode_timespec(struct ceph_timespec *tv,
> 					const struct timespec *ts)

Applied, thanks

Yan, Zheng

> -- 
> 1.9.1
> 

Arnd Bergmann Feb. 15, 2016, 11:17 a.m. UTC | #2
On Sunday 14 February 2016 19:01:53 Deepa Dinamani wrote:
> long/ kernel_time_t is 32 bit on a 32 bit system and
> 64 bit on a 64 bit system.
> 
> ceph_encode_timespec() encodes only the lower 32 bits on
> a 64 bit system and encodes all of 32 bits on a 32bit
> system.
> 
> ceph_decode_timespec() decodes 32 bit tv_sec and tv_nsec
> into kernel_time_t/ long.
> 
> The encode and decode functions do not match when the
> values are negative:
> 
> Consider the following scenario on a 32 bit system:
> When a negative number is cast to u32 as encode does, the
> value is positive and is greater than INT_MAX. Decode reads
> back this value. And, this value cannot be represented by
> long on 32 bit systems. So by section 6.3.1.3 of the
> C99 standard, the result is implementation defined.
> 
> Consider the following scenario on a 64 bit system:
> When a negative number is cast to u32 as encode does, the
> value is positive. This value is later assigned by decode
> function by a cast to long. Since this value can be
> represented in long data type, this becomes a positive
> value greater than INT_MAX. But, the value encoded was
> negative, so the encode and decode functions do not match.
> 
> Change the decode function as follows to overcome the above
> bug:
> The decode should first cast the value to a s64 this will
> be positive value greater than INT_MAX(in case of a negative
> encoded value)and then cast this value again as s32, which
> drops the higher order 32 bits.
> On 32 bit systems, this is the right value in kernel_time_t/
> long.
> On 64 bit systems, assignment to kernel_time_t/ long
> will sign extend this value to reflect the signed bit encoded.
> 
> Assume ceph timestamp ranges permitted are 1902..2038.
> 
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
> ---

Actually I was thinking we'd do the opposite and document the
existing 64-bit behavior, and then make sure we do it the same
on 32-bit machines once we have the new syscalls.

The most important part of course is to document what the file
system is expected to do. Having this in the changelog is important
here, but I'd also like to see a comment in the code to ensure
readers can see that the behavior is intentional.
 
> diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h
> index a6ef9cc..e777e99 100644
> --- a/include/linux/ceph/decode.h
> +++ b/include/linux/ceph/decode.h
> @@ -137,8 +137,8 @@ bad:
>  static inline void ceph_decode_timespec(struct timespec *ts,
>                                         const struct ceph_timespec *tv)
>  {
> -       ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
> -       ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
> +       ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
> +       ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
>  }

Why did you choose to express this as "(s32)(s64)..." rather than
"(s64)(s32)..."? The result is the same (just the s32 cast by itself
would be sufficient), I just don't see yet how it clarifies what is
going on.

	Arnd
Ilya Dryomov Feb. 15, 2016, 12:37 p.m. UTC | #3
On Mon, Feb 15, 2016 at 4:01 AM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
> long/ kernel_time_t is 32 bit on a 32 bit system and
> 64 bit on a 64 bit system.
>
> ceph_encode_timespec() encodes only the lower 32 bits on
> a 64 bit system and encodes all of 32 bits on a 32bit
> system.
>
> ceph_decode_timespec() decodes 32 bit tv_sec and tv_nsec
> into kernel_time_t/ long.
>
> The encode and decode functions do not match when the
> values are negative:
>
> Consider the following scenario on a 32 bit system:
> When a negative number is cast to u32 as encode does, the
> value is positive and is greater than INT_MAX. Decode reads
> back this value. And, this value cannot be represented by
> long on 32 bit systems. So by section 6.3.1.3 of the
> C99 standard, the result is implementation defined.
>
> Consider the following scenario on a 64 bit system:
> When a negative number is cast to u32 as encode does, the
> value is positive. This value is later assigned by decode
> function by a cast to long. Since this value can be
> represented in long data type, this becomes a positive
> value greater than INT_MAX. But, the value encoded was
> negative, so the encode and decode functions do not match.
>
> Change the decode function as follows to overcome the above
> bug:
> The decode should first cast the value to a s64 this will
> be positive value greater than INT_MAX(in case of a negative
> encoded value)and then cast this value again as s32, which
> drops the higher order 32 bits.
> On 32 bit systems, this is the right value in kernel_time_t/
> long.
> On 64 bit systems, assignment to kernel_time_t/ long
> will sign extend this value to reflect the signed bit encoded.
>
> Assume ceph timestamp ranges permitted are 1902..2038.
>
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
> ---
>  include/linux/ceph/decode.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h
> index a6ef9cc..e777e99 100644
> --- a/include/linux/ceph/decode.h
> +++ b/include/linux/ceph/decode.h
> @@ -137,8 +137,8 @@ bad:
>  static inline void ceph_decode_timespec(struct timespec *ts,
>                                         const struct ceph_timespec *tv)
>  {
> -       ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
> -       ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
> +       ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
> +       ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
>  }
>  static inline void ceph_encode_timespec(struct ceph_timespec *tv,
>                                         const struct timespec *ts)

I think you are forgetting that this is a wire protocol.  Changing how
values are interpreted on one side without looking at the other side is
generally not what you want to do.

There aren't any casts on the server side: __u32 is simply assigned to
time_t, meaning no sign-extension is happening - see src/include/utime.h
in ceph.git.  The current __kernel_time_t and long casts on the kernel
side are meaningless - they don't change the bit pattern, so everything
is interpreted the same way.  Your patch changes that.
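
As a small illustration of that point (stand-in types, not the actual
utime_t code):

#include <stdint.h>
#include <assert.h>

int main(void)
{
	uint32_t wire_sec = 0xFFFFFFFFu;  /* what encoding -1 produced */
	int64_t sec = wire_sec;           /* plain assignment, as on the server */

	assert(sec == 4294967295LL);      /* large positive value, not -1 */
	return 0;
}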

If there is anything to be done here, it is documenting the existing
behavior.  This isn't to say that there aren't any quirks or bugs in
our time handling, it's that any changes concerning the protocol or how
the actual bits are interpreted should follow or align with the rest of
ceph and have a note in the changelog on that.

(As Greg and I mentioned before, we do have a semi-concrete plan on how
to deal with y2038 in ceph, just no timetable yet.)

Thanks,

                Ilya
Deepa Dinamani Feb. 16, 2016, 5:02 a.m. UTC | #4
> Actually I was thinking we'd do the opposite and document the
> existing 64-bit behavior, and then make sure we do it the same
> on 32-bit machines once we have the new syscalls.

I'm not sure I follow what is opposite here.
You just want to document the behavior and fix it later when the time
range is extended beyond 2038?

> The most important part of course is to document what the file
> system is expected to do. Having this in the changelog is important
> here, but I'd also like to see a comment in the code to ensure
> readers can see that the behavior is intentional.

Will add this comment in the code.

>> diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h
>> index a6ef9cc..e777e99 100644
>> --- a/include/linux/ceph/decode.h
>> +++ b/include/linux/ceph/decode.h
>> @@ -137,8 +137,8 @@ bad:
>>  static inline void ceph_decode_timespec(struct timespec *ts,
>>                                         const struct ceph_timespec *tv)
>>  {
>> -       ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
>> -       ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
>> +       ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
>> +       ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
>>  }
>
> Why did you choose to express this as "(s32)(s64)..." rather than
> "(s64)(s32)..."? The result is the same (just the s32 cast by itself
> would be sufficient), I just don't see yet how it clarifies what is
> going on.

Let's say we encode -1.
So when you pass the same value to decode, the ceph_timespec is {0xFFFFFFFF, 0}.
0xFFFFFFFF is greater than INT_MAX, and according to C99 the result
is implementation-defined if this is cast to an s32.

Quoting from http://c0x.coding-guidelines.com/6.3.1.3.html :
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the newtype, it
is unchanged.
2 Otherwise, if the newtype is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the newtype until the value is in the range of
the newtype.49)
3 Otherwise, the newtype is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised

GCC does guarantee the behavior.
Quoting from GCC manual
(https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html):
The result of, or the signal raised by, converting an integer to a
signed integer type when the value cannot be represented in an object
of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N
to be within range of the type; no signal is raised.

But, as the C standard says, the behavior is implementation-defined.
This is why I cast it to s64 before casting it to s32.
I explained this in the commit text, but missed the GCC part.
Quoting from the commit text:

>> Consider the following scenario on a 32 bit system:
>> When a negative number is cast to u32 as encode does, the
>> value is positive and is greater than INT_MAX. Decode reads
>> back this value. And, this value cannot be represented by
>> long on 32 bit systems. So by section 6.3.1.3 of the
>> C99 standard, the result is implementation defined.
>>
>> Consider the following scenario on a 64 bit system:
>> When a negative number is cast to u32 as encode does, the
>> value is positive. This value is later assigned by decode
>> function by a cast to long. Since this value can be
>> represented in long data type, this becomes a positive
>> value greater than INT_MAX. But, the value encoded was
>> negative, so the encode and decode functions do not match.
>>
>> Change the decode function as follows to overcome the above
>> bug:
>> The decode should first cast the value to a s64 this will
>> be positive value greater than INT_MAX(in case of a negative
>> encoded value)and then cast this value again as s32, which
>> drops the higher order 32 bits.
>> On 32 bit systems, this is the right value in kernel_time_t/
>> long.
>> On 64 bit systems, assignment to kernel_time_t/ long
>> will sign extend this value to reflect the signed bit encoded
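
A tiny sketch of the two cast orders under discussion, relying on GCC's
documented modulo-2^N behavior (C99 itself leaves the out-of-range
signed conversion implementation-defined):

#include <stdint.h>
#include <assert.h>

int main(void)
{
	uint32_t wire = 0xFFFFFFFFu;          /* encoded -1 */

	int32_t a = (int32_t)(int64_t)wire;   /* widen first, then truncate */
	int32_t b = (int32_t)wire;            /* truncate directly */

	assert(a == -1 && b == -1);           /* same result under GCC */
	return 0;
}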

-Deepa
Arnd Bergmann Feb. 16, 2016, 8:56 a.m. UTC | #5
On Monday 15 February 2016 21:02:33 Deepa Dinamani wrote:
> > Actually I was thinking we'd do the opposite and document the
> > existing 64-bit behavior, and then make sure we do it the same
> > on 32-bit machines once we have the new syscalls.
> 
> I'm not sure I follow what is opposite here.
> You just want to document the behavior and fix it later when the time
> range is extended beyond 2038?

What I meant was that rather than changing the returned range from
1970..2106 to 1902..2038, I would make the current behavior explicit
and document that it is correct.

> >>  static inline void ceph_decode_timespec(struct timespec *ts,
> >>                                         const struct ceph_timespec *tv)
> >>  {
> >> -       ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
> >> -       ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
> >> +       ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
> >> +       ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
> >>  }
> >
> > Why did you choose to express this as "(s32)(s64)..." rather than
> > "(s64)(s32)..."? The result is the same (just the s32 cast by itself
> > would be sufficient), I just don't see yet how it clarifies what is
> > going on.
> 
> Let's say we encode -1.
> so when you pass the same value to decode, ceph_timespec is {0xFFFFFFFF,0}.
> 0xFFFFFFFF is greater than INT_MAX. And, according to c99 the result
> is implementation dependent if this cast to a s32.
> 
> Quoting from http://c0x.coding-guidelines.com/6.3.1.3.html :
> 6.3.1.3 Signed and unsigned integers
> 1 When a value with integer type is converted to another integer type
> other than _Bool, if the value can be represented by the newtype, it
> is unchanged.
> 2 Otherwise, if the newtype is unsigned, the value is converted by
> repeatedly adding or subtracting one more than the maximum value that
> can be represented in the newtype until the value is in the range of
> the newtype.49)
> 3 Otherwise, the newtype is signed and the value cannot be represented
> in it; either the result is implementation-defined or an
> implementation-defined signal is raised
> 
> GCC does guarantee the behavior.
> Quoting from GCC manual
> (https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html):
> The result of, or the signal raised by, converting an integer to a
> signed integer type when the value cannot be represented in an object
> of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
> For conversion to a type of width N, the value is reduced modulo 2^N
> to be within range of the type; no signal is raised.
> 
> But, as the C standard suggests, behavior is implementation dependent.
> This is why I cast it to s64 before casting it to s32.
> I explained in the commit text, but missed GCC part.

Ok, fair enough. I always assumed that "implementation specific"
in this context meant that on any architectures that implement
negative numbers as twos-complement (anything that Linux runs on)
we get the expected result. I'm sure we rely on that behavior in
a lot of places in the kernel, but you are correct that the C
standard makes your code well-defined, and the other one not.

	Arnd
Deepa Dinamani Feb. 18, 2016, 6:02 a.m. UTC | #6
> I think you are forgetting that this is a wire protocol.  Changing how
> values are interpreted on one side without looking at the other side is
> generally not what you want to do.
>
> There aren't any casts on the server side: __u32 is simply assigned to
> time_t, meaning no sign-extension is happening - see src/include/utime.h
> in ceph.git.  The current __kernel_time_t and long casts on the kernel
> side are meaningless - they don't change the bit pattern, so everything
> is interpreted the same way.  Your patch changes that.
>
> If there is anything to be done here, it is documenting the existing
> behavior.  This isn't to say that there aren't any quirks or bugs in
> our time handling, it's that any changes concerning the protocol or how
> the actual bits are interpreted should follow or align with the rest of
> ceph and have a note in the changelog on that.
>
> (As Greg and I mentioned before, we do have a semi-concrete plan on how
> to deal with y2038 in ceph, just no timetable yet.)

I checked the server side code.
So the server side also reads whatever is sent over wire as u32 and
then assigns it to a time_t.

Do you have any documentation about the wire protocol?

To me, the wire protocol seems to have been designed to support only
positive time values (u32).
This means ceph can represent times from 1970 to 2106 as long as both
server and client are 64-bit machines,
or the client is 64-bit and the server does not operate on time values.
And on 32-bit machines the implementation will break in 2038.

I will change the above code back to not use any casts (they are a
no-op anyway), matching the server side, and note the above
points in the file and changelog:

 static inline void ceph_decode_timespec(struct timespec *ts,
                                         const struct ceph_timespec *tv)
 {
 -       ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
 -       ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
 +       ts->tv_sec = le32_to_cpu(tv->tv_sec);
 +       ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
  }

-Deepa
Arnd Bergmann Feb. 18, 2016, 9:17 a.m. UTC | #7
On Wednesday 17 February 2016 22:02:11 Deepa Dinamani wrote:
> 
> I checked the server side code.
> So the server side also reads whatever is sent over wire as u32 and
> then assigns it to a time_t.

I think the more important question is whether the server actually
does anything with the timestamp other than pass it back to the
client. If not, the interpretation is totally up to the client
and the type on the server has no meaning (as long as it can store
at least as many bits as the wire protocol, and passes all bits
back the same way).

> Do you have any documentation about the wire protocol?
> 
> To me, the wire protocol seems to have been designed to support only
> positive time values(u32).

I think assigning to a time_t usually just means that this wasn't
completely thought through when it got implemented (which we already
know from the fact that it started out using a 32-bit number from
the time). It's totally possible that this code originated on a 32-bit
machine and was meant to encode a signed number and that the behavior
changed without anyone noticing when it got ported to a 64-bit
architecture for the first time.

> This means ceph can represent times from 1970 - 2106 as long as both
> server and client are 64 bit machines.
> Or, client is 64 bit and server does not operate on time values.
> And, on 32 bit machines implementation will be broken in 2038.

> I will change back the above code to not use any casts(as they are a
> no-op) as on the server side and note the above
> things in the file and changelog:
> 
>  static inline void ceph_decode_timespec(struct timespec *ts,
>                                          const struct ceph_timespec *tv)
>  {
>  -       ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
>  -       ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
>  +       ts->tv_sec = le32_to_cpu(tv->tv_sec);
>  +       ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
>   }

Ok. I'd really like to have an explicit range-extension as well, but
as you say, your patch above does not change behavior.

I would write this as 

	ts->tv_sec = (__u64)le32_to_cpu(tv->tv_sec);
	ts->tv_nsec = le32_to_cpu(tv->tv_nsec);

which is again the same thing, but leaves less ambiguity.

	Arnd
Deepa Dinamani Feb. 18, 2016, 9:35 a.m. UTC | #8
On Thu, Feb 18, 2016 at 1:17 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 17 February 2016 22:02:11 Deepa Dinamani wrote:
>>
>> I checked the server side code.
>> So the server side also reads whatever is sent over wire as u32 and
>> then assigns it to a time_t.
>
> I think the more important question is whether the server actually
> does anything with the timestamp other than pass it back to the
> client. If not, the interpretation it totally up to the client
> and the type on the server has no meaning (as long as it can store
> at least as many bits as the wire protocol, and passes all bits
> back the same way).

Timestamps do get interpreted and written to on the server according to the
files in the mds directory.

>> Do you have any documentation about the wire protocol?
>>
>> To me, the wire protocol seems to have been designed to support only
>> positive time values(u32).
>
> I think assigning to a time_t usually just means that this wasn't
> completely thought through when it got implemented (which we already
> know from the fact that it started out using a 32-bit number from
> the time). It's totally possible that this code originated on a 32-bit
> machine and was meant to encode a signed number and that the behavior
> changed without anyone noticing when it got ported to a 64-bit
> architecture for the first time.
>
>> This means ceph can represent times from 1970 - 2106 as long as both
>> server and client are 64 bit machines.
>> Or, client is 64 bit and server does not operate on time values.
>> And, on 32 bit machines implementation will be broken in 2038.
>
>> I will change back the above code to not use any casts(as they are a
>> no-op) as on the server side and note the above
>> things in the file and changelog:
>>
>>  static inline void ceph_decode_timespec(struct timespec *ts,
>>                                          const struct ceph_timespec *tv)
>>  {
>>  -       ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
>>  -       ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
>>  +       ts->tv_sec = le32_to_cpu(tv->tv_sec);
>>  +       ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
>>   }
>
> Ok. I really like to have an explicit range-extension as well, but
> as you say, your patch above does not change behavior.

You mean change the functions and data types on both sides?
Not sure if this is meant for me or the Ceph team.

> I would write this as
>
>         ts->tv_sec = (__u64)le32_to_cpu(tv->tv_sec);
>         ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
>
> which is again the same thing, but leaves less ambiguity.

Will do.

-Deepa
Arnd Bergmann Feb. 18, 2016, 11:29 a.m. UTC | #9
On Thursday 18 February 2016 01:35:25 Deepa Dinamani wrote:
> On Thu, Feb 18, 2016 at 1:17 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Wednesday 17 February 2016 22:02:11 Deepa Dinamani wrote:
> >>
> >> I checked the server side code.
> >> So the server side also reads whatever is sent over wire as u32 and
> >> then assigns it to a time_t.
> >
> > I think the more important question is whether the server actually
> > does anything with the timestamp other than pass it back to the
> > client. If not, the interpretation it totally up to the client
> > and the type on the server has no meaning (as long as it can store
> > at least as many bits as the wire protocol, and passes all bits
> > back the same way).
> 
> Timestamps do get interpreted and written to on the server according to the
> files in the mds directory.

Ok. I see they are stored internally as a __u32/__u32 pair in "class utime_t",
I just couldn't figure out whether this is used on the server at all, or
stored in a timespec or time_t.

> >> Do you have any documentation about the wire protocol?
> >>
> >> To me, the wire protocol seems to have been designed to support only
> >> positive time values(u32).
> >
> > I think assigning to a time_t usually just means that this wasn't
> > completely thought through when it got implemented (which we already
> > know from the fact that it started out using a 32-bit number from
> > the time). It's totally possible that this code originated on a 32-bit
> > machine and was meant to encode a signed number and that the behavior
> > changed without anyone noticing when it got ported to a 64-bit
> > architecture for the first time.
> >
> >> This means ceph can represent times from 1970 - 2106 as long as both
> >> server and client are 64 bit machines.
> >> Or, client is 64 bit and server does not operate on time values.
> >> And, on 32 bit machines implementation will be broken in 2038.
> >
> >> I will change back the above code to not use any casts(as they are a
> >> no-op) as on the server side and note the above
> >> things in the file and changelog:
> >>
> >>  static inline void ceph_decode_timespec(struct timespec *ts,
> >>                                          const struct ceph_timespec *tv)
> >>  {
> >>  -       ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
> >>  -       ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
> >>  +       ts->tv_sec = le32_to_cpu(tv->tv_sec);
> >>  +       ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
> >>   }
> >
> > Ok. I really like to have an explicit range-extension as well, but
> > as you say, your patch above does not change behavior.
> 
> You mean change the functions and data types on both sides?
> Not sure if this is meant for me or the Ceph team.

I meant it should be part of your patch.

	Arnd
Ilya Dryomov Feb. 18, 2016, 4:02 p.m. UTC | #10
On Thu, Feb 18, 2016 at 10:35 AM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
> On Thu, Feb 18, 2016 at 1:17 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Wednesday 17 February 2016 22:02:11 Deepa Dinamani wrote:
>>>
>>> I checked the server side code.
>>> So the server side also reads whatever is sent over wire as u32 and
>>> then assigns it to a time_t.
>>
>> I think the more important question is whether the server actually
>> does anything with the timestamp other than pass it back to the
>> client. If not, the interpretation it totally up to the client
>> and the type on the server has no meaning (as long as it can store
>> at least as many bits as the wire protocol, and passes all bits
>> back the same way).
>
> Timestamps do get interpreted and written to on the server according to the
> files in the mds directory.
>
>>> Do you have any documentation about the wire protocol?
>>>
>>> To me, the wire protocol seems to have been designed to support only
>>> positive time values(u32).
>>
>> I think assigning to a time_t usually just means that this wasn't
>> completely thought through when it got implemented (which we already
>> know from the fact that it started out using a 32-bit number from
>> the time). It's totally possible that this code originated on a 32-bit
>> machine and was meant to encode a signed number and that the behavior
>> changed without anyone noticing when it got ported to a 64-bit
>> architecture for the first time.
>>
>>> This means ceph can represent times from 1970 - 2106 as long as both
>>> server and client are 64 bit machines.
>>> Or, client is 64 bit and server does not operate on time values.
>>> And, on 32 bit machines implementation will be broken in 2038.
>>
>>> I will change back the above code to not use any casts(as they are a
>>> no-op) as on the server side and note the above
>>> things in the file and changelog:
>>>
>>>  static inline void ceph_decode_timespec(struct timespec *ts,
>>>                                          const struct ceph_timespec *tv)
>>>  {
>>>  -       ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
>>>  -       ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
>>>  +       ts->tv_sec = le32_to_cpu(tv->tv_sec);
>>>  +       ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
>>>   }
>>
>> Ok. I really like to have an explicit range-extension as well, but
>> as you say, your patch above does not change behavior.
>
> You mean change the functions and data types on both sides?
> Not sure if this is meant for me or the Ceph team.
>
>> I would write this as
>>
>>         ts->tv_sec = (__u64)le32_to_cpu(tv->tv_sec);
>>         ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
>>
>> which is again the same thing, but leaves less ambiguity.
>
> Will do.

If we are going to touch this function at all, let's drop all casts and
just add a comment along the lines of "the lack of cast/sign-extension
here is intentional, this function has a counterpart on the server side
which also lacks cast/sign-extension, etc".  To someone who is not up
on these subtleties, the __u64 cast is going to be more confusing than
explicit...

Thanks,

                Ilya
Arnd Bergmann Feb. 18, 2016, 4:20 p.m. UTC | #11
On Thursday 18 February 2016 17:02:25 Ilya Dryomov wrote:
> 
> If we are going to touch this function at all, let's drop all casts and
> just add a comment along the lines of "the lack of cast/sign-extension
> here is intentional, this function has a counterpart on the server side
> which also lacks cast/sign-extension, etc".  To someone who is not up
> on these subtleties, the __u64 cast is going to be more confusing than
> explicit...

We certainly need a comment; I was just suggesting to add the cast as well.

Again, it's not the lack of the cast on the server that is important
here, it's whether it gets interpreted as signed or unsigned at the
point where the timestamp is actually used for comparison or
printing. Let me suggest a comment here:

/*
 * ceph timestamps are unsigned 32-bit starting at 1970, which is
 * different from a signed 32-bit or 64-bit time_t. On 64-bit
 * architectures, this gets interpreted as a subset of time_t
 * in the range from 1970 to 2106.
 * Machines with a 32-bit time_t will incorrectly interpret the
 * timestamps with years 2038-2106 as negative numbers in the
 * 1902-1969 range, until both kernel and glibc are changed to
 * using 64-bit time_t.
 */

	Arnd
Ilya Dryomov Feb. 18, 2016, 5:22 p.m. UTC | #12
On Thu, Feb 18, 2016 at 5:20 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 18 February 2016 17:02:25 Ilya Dryomov wrote:
>>
>> If we are going to touch this function at all, let's drop all casts and
>> just add a comment along the lines of "the lack of cast/sign-extension
>> here is intentional, this function has a counterpart on the server side
>> which also lacks cast/sign-extension, etc".  To someone who is not up
>> on these subtleties, the __u64 cast is going to be more confusing than
>> explicit...
>
> We certainly a comment, I was just suggesting to add the cast as well.
>
> Again, it's not the lack of the cast on the server that is important
> here, it's whether it gets interpreted as signed or unsigned at the
> point where the timestamp is actually used for comparison or
> printing. Let me suggest a comment here:

I'd argue it's important to have encode/decode code look the same.

I mentioned before that at least in some places it's interpreted as
signed and that the interpretation is not totally up to the client.
The printing (at least some of it) is right there in the same file
(src/include/utime.h) and you can see ... + 1900.

>
> /*
>  * ceph timestamps are unsigned 32-bit starting at 1970, which is
>  * different from a signed 32-bit or 64-bit time_t. On 64-bit
>  * architectures, this gets interpreted as a subset of time_t
>  * in the range from 1970 to 2106.
>  * Machines with a 32-bit time_t will incorrectly interpret the
>  * timestamps with years 2038-2106 as negative numbers in the
>  * 1902-1969 range, until both kernel and glibc are change to
>  * using 64-bit time_t.
>  */

I think a more accurate description would be "ceph timestamps are
signed 32-bit, with the caveat that something somewhere is likely to
break on a negative timestamp".

We are not looking at trying to reuse that extra bit for 1970..2106.
As I said earlier, utime_t is used in quite a lot of places throughout
the code base and is part of both in-core and on-disk data structures.
Auditing all those sites for that is probably never going to happen.
I think the plan is to live with what we've got until a proper 64-bit
sec/nsec conversion is done in a way that was described earlier
(message versions, ceph feature bit, etc).  Any vfs timespec-handling
changes should simply preserve the existing behavior, for now.

Thanks,

                Ilya
Arnd Bergmann Feb. 19, 2016, 10 a.m. UTC | #13
On Thursday 18 February 2016 18:22:55 Ilya Dryomov wrote:
> On Thu, Feb 18, 2016 at 5:20 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 18 February 2016 17:02:25 Ilya Dryomov wrote:
> >>
> >> If we are going to touch this function at all, let's drop all casts and
> >> just add a comment along the lines of "the lack of cast/sign-extension
> >> here is intentional, this function has a counterpart on the server side
> >> which also lacks cast/sign-extension, etc".  To someone who is not up
> >> on these subtleties, the __u64 cast is going to be more confusing than
> >> explicit...
> >
> > We certainly a comment, I was just suggesting to add the cast as well.
> >
> > Again, it's not the lack of the cast on the server that is important
> > here, it's whether it gets interpreted as signed or unsigned at the
> > point where the timestamp is actually used for comparison or
> > printing. Let me suggest a comment here:
> 
> I'd argue it's important to have encode/decode code look the same.
> 
> I mentioned before that at least in some places it's interpreted as
> signed and that the interpretation is not totally up to the client.
> The printing (at least some of it) is right there in the same file
> (src/include/utime.h) and you can see ... + 1900.

I see this code:

class utime_t {
  struct {
    __u32 tv_sec, tv_nsec;
  } tv;

  time_t        sec()  const { return tv.tv_sec; }

  ostream& gmtime(ostream& out) const {
    out.setf(std::ios::right);
    char oldfill = out.fill();
    out.fill('0');
    if (sec() < ((time_t)(60*60*24*365*10))) {
      // raw seconds.  this looks like a relative time.
      out << (long)sec() << "." << std::setw(6) << usec();
    } else {
      // localtime.  this looks like an absolute time.
      //  aim for http://en.wikipedia.org/wiki/ISO_8601
      struct tm bdt;
      time_t tt = sec();
      gmtime_r(&tt, &bdt);
      out << std::setw(4) << (bdt.tm_year+1900)  // 2007 -> '07'
          << '-' << std::setw(2) << (bdt.tm_mon+1)
          << '-' << std::setw(2) << bdt.tm_mday
          << ' '
          << std::setw(2) << bdt.tm_hour
          << ':' << std::setw(2) << bdt.tm_min
          << ':' << std::setw(2) << bdt.tm_sec;
      out << "." << std::setw(6) << usec();
      out << "Z";
    }
    out.fill(oldfill);
    out.unsetf(std::ios::right);
    return out;
  }

which interprets the time roughly in the way I explained:

* On 64-bit architectures, the time is interpreted as an unsigned
  number in the range from 1980-2106, while any times between 1970
  and 1980 are interpreted as a relative time and printed as
  a positive seconds number.

* On 32-bit architectures, times in the range 1980-2038 are printed
  as a date like 64-bit does, times from 1970-1980 are also printed
  as a positive seconds number, and anything with the top bit set
  is printed as a negative seconds number.

Is this the intended behavior? I guess we could change the kernel
to reject any timestamps before 1980 and after 2038 in
ceph_decode_timespec and just set the Linux timestamp to (time_t)0
(Jan 1 1970) in that case, and document that there is no valid
interpretation for those.
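
Roughly something like this, as a sketch only (the cutoff constants and
the helper name are made up for illustration; 315532800 is Jan 1 1980
and 2147483647 is Jan 19 2038):

static inline void ceph_decode_timespec_strict(struct timespec *ts,
					       const struct ceph_timespec *tv)
{
	u32 sec = le32_to_cpu(tv->tv_sec);

	/* only accept the range that 32-bit and 64-bit handle consistently */
	if (sec < 315532800 || sec > 2147483647) {
		ts->tv_sec = 0;		/* no valid interpretation: Jan 1 1970 */
		ts->tv_nsec = 0;
	} else {
		ts->tv_sec = sec;
		ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
	}
}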

> >  * ceph timestamps are unsigned 32-bit starting at 1970, which is
> >  * different from a signed 32-bit or 64-bit time_t. On 64-bit
> >  * architectures, this gets interpreted as a subset of time_t
> >  * in the range from 1970 to 2106.
> >  * Machines with a 32-bit time_t will incorrectly interpret the
> >  * timestamps with years 2038-2106 as negative numbers in the
> >  * 1902-1969 range, until both kernel and glibc are change to
> >  * using 64-bit time_t.
> >  */
> 
> I think a more accurate description would be "ceph timestamps are
> signed 32-bit, with the caveat that something somewhere is likely to
> break on a negative timestamp".

This is the opposite of what you said previously, citing:

| There aren't any casts on the server side: __u32 is simply assigned to
| time_t, meaning no sign-extension is happening

which I read as meaning you are using it as an unsigned number,
because of the way the assignment operator works when you assign
an unsigned 32-bit number to a signed 64-bit number.

> We are not looking at trying to reuse that extra bit for 1970..2106.
> As I said earlier, utime_t is used in quite a lot of places throughout
> the code base and is part of both in-core and on-disk data structures.
> Auditing all those sites for that is probably never going to happen.
> I think the plan is to live with what we've got until a proper 64-bit
> sec/nsec conversion is done in a way that was described earlier
> (message versions, ceph feature bit, etc).  Any vfs timespec-handling
> changes should simply preserve the existing behavior, for now.

And that is a third option: preserve the existing behavior (different
between 32-bit and 64-bit) even when we extend the 32-bit client to
have 64-bit timestamps. I can't believe that is what you actually meant
here but if you did, that would probably be the worst solution.

Please just make up your mind and say which behavior you want the
kernel client to have when dealing with servers that communicate
using 32-bit timestamps:

a) Remain consistent with current 64-bit servers and clients,
   interpreting the number as an unsigned 32-bit integer where
   possible, and remain broken on existing 32-bit machines
   until they start using 64-bit time_t.

b) Change 64-bit clients to behave the same way that 32-bit
   clients and servers do today, and actually use the times as
   a signed number, which you claimed they already do.

c) Interpret all times before 1980 or after 2038 as buggy servers
   and only accept times that are always handled consistently.
   Also fix the server to reject those times coming from broken
   clients.

d) Leave 64-bit clients unchanged (using unsigned timestamps) and
   document that 32-bit clients should keep interpreting them as
   signed even when we fix VFS, to preserve the existing behavior.

I think a), b) and c) are all reasonable, and I can send a patch
as soon as you know what you want. If you want d), please leave
me out of that and do it yourself.

	Arnd
Ilya Dryomov Feb. 19, 2016, 11:42 a.m. UTC | #14
On Fri, Feb 19, 2016 at 11:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 18 February 2016 18:22:55 Ilya Dryomov wrote:
>> On Thu, Feb 18, 2016 at 5:20 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Thursday 18 February 2016 17:02:25 Ilya Dryomov wrote:
>> >>
>> >> If we are going to touch this function at all, let's drop all casts and
>> >> just add a comment along the lines of "the lack of cast/sign-extension
>> >> here is intentional, this function has a counterpart on the server side
>> >> which also lacks cast/sign-extension, etc".  To someone who is not up
>> >> on these subtleties, the __u64 cast is going to be more confusing than
>> >> explicit...
>> >
>> > We certainly a comment, I was just suggesting to add the cast as well.
>> >
>> > Again, it's not the lack of the cast on the server that is important
>> > here, it's whether it gets interpreted as signed or unsigned at the
>> > point where the timestamp is actually used for comparison or
>> > printing. Let me suggest a comment here:
>>
>> I'd argue it's important to have encode/decode code look the same.
>>
>> I mentioned before that at least in some places it's interpreted as
>> signed and that the interpretation is not totally up to the client.
>> The printing (at least some of it) is right there in the same file
>> (src/include/utime.h) and you can see ... + 1900.
>
> I see this code:
>
> class utime_t {
>   struct {
>     __u32 tv_sec, tv_nsec;
>   } tv;
>
>   time_t        sec()  const { return tv.tv_sec; }
>
>   ostream& gmtime(ostream& out) const {
>     out.setf(std::ios::right);
>     char oldfill = out.fill();
>     out.fill('0');
>     if (sec() < ((time_t)(60*60*24*365*10))) {
>       // raw seconds.  this looks like a relative time.
>       out << (long)sec() << "." << std::setw(6) << usec();
>     } else {
>       // localtime.  this looks like an absolute time.
>       //  aim for http://en.wikipedia.org/wiki/ISO_8601
>       struct tm bdt;
>       time_t tt = sec();
>       gmtime_r(&tt, &bdt);
>       out << std::setw(4) << (bdt.tm_year+1900)  // 2007 -> '07'
>           << '-' << std::setw(2) << (bdt.tm_mon+1)
>           << '-' << std::setw(2) << bdt.tm_mday
>           << ' '
>           << std::setw(2) << bdt.tm_hour
>           << ':' << std::setw(2) << bdt.tm_min
>           << ':' << std::setw(2) << bdt.tm_sec;
>       out << "." << std::setw(6) << usec();
>       out << "Z";
>     }
>     out.fill(oldfill);
>     out.unsetf(std::ios::right);
>     return out;
>   }
>
> which interprets the time roughly in the way I explained:
>
> * On 64-bit architectures, the time is interpreted as an unsigned
>   number in the range from 1980-2106, while any times between 1970
>   and 1980 are interpreted as a relative time and printed as
>   a positive seconds number.
>
> * On 32-bit architectures, times in the range 1980-2038 are printed
>   as a date like 64-bit does, times from 1970-1980 are also printed
>   as a positive seconds number, and anything with the top bit set
>   is printed as a negative seconds number.
>
> It this the intended behavior? I guess we could change the kernel
> to reject any timestamps before 1980 and after 2038 in
> ceph_decode_timespec and just set the Linux timestamp to (time_t)0
> (Jan 1 1970) in that case, and document that there is no valid
> interpretation for those.

At least conceptually, this isn't a bad idea, as only 1970(1980)..2038
works on both architectures; everything else is likely to break
something in ceph either on 32-bit or on both, I think.  But, I'm not
sure if rejecting by setting to 0 is going to work all around and we
can't change just the kernel client.

>
>> >  * ceph timestamps are unsigned 32-bit starting at 1970, which is
>> >  * different from a signed 32-bit or 64-bit time_t. On 64-bit
>> >  * architectures, this gets interpreted as a subset of time_t
>> >  * in the range from 1970 to 2106.
>> >  * Machines with a 32-bit time_t will incorrectly interpret the
>> >  * timestamps with years 2038-2106 as negative numbers in the
>> >  * 1902-1969 range, until both kernel and glibc are change to
>> >  * using 64-bit time_t.
>> >  */
>>
>> I think a more accurate description would be "ceph timestamps are
>> signed 32-bit, with the caveat that something somewhere is likely to
>> break on a negative timestamp".
>
> This is the opposite of what you said previously, citing:
>
> | There aren't any casts on the server side: __u32 is simply assigned to
> | time_t, meaning no sign-extension is happening
>
> which I read as meaning you are using it as an unsigned number,
> because of the way the assignment operator works when you assign
> an unsigned 32-bit number to a signed 64-bit number.

All I was trying to convey throughout this thread is: we can't change
just the kernel client.  We can't add sign-extending casts without
either adding the same casts in ceph.git (which is both servers and
other clients) or explaining why it is OK to change just the kernel
client in the commit message.

>
>> We are not looking at trying to reuse that extra bit for 1970..2106.
>> As I said earlier, utime_t is used in quite a lot of places throughout
>> the code base and is part of both in-core and on-disk data structures.
>> Auditing all those sites for that is probably never going to happen.
>> I think the plan is to live with what we've got until a proper 64-bit
>> sec/nsec conversion is done in a way that was described earlier
>> (message versions, ceph feature bit, etc).  Any vfs timespec-handling
>> changes should simply preserve the existing behavior, for now.
>
> And that is a third option: preserve the existing behavior (different
> between 32-bit and 64-bit) even when we extend the 32-bit client to
> have 64-bit timestamps. I can't believe that is what you actually meant
> here but if you did, that would probably be the worst solution.
>
> Please just make up your mind and say which behavior you want the
> kernel client to have when dealing with servers that communicate
> using 32-bit timestamps:

I want the kernel client to be bug compatible with servers and other
clients in ceph.git.

>
> a) Remain consistent with current 64-bit servers and clients,
>    interpreting the number as an unsigned 32-bit integer where
>    possible, and remain broken on existing 32-bit machines
>    until they start using 64-bit time_t.
>
> b) Change 64-bit clients to behave the same way that that 32-bit
>    clients and servers do today, and actually use the times as
>    a signed number that you claimed they already did.
>
> c) Interpret all times before 1980 or after 2038 as buggy servers
>    and only accept times that are always handled consistently.
>    Also fix the server to reject those times coming from broken
>    clients.
>
> d) Leave 64-bit clients unchanged (using unsigned timestamps) and
>    document that 32-bit clients should keep interpreting them as
>    signed even when we fix VFS, to preserve the existing behavior.
>
> I think a), b) and c) are all reasonable, and I can send a patch
> as soon as you know what you want. If you want d), please leave
> me out of that and do it yourself.

My preference would be a), possibly with some comments along the lines
of c), to make the ..2038 range more explicit, as I doubt we've ever
assumed ..2106 on 64-bit.  This is all really old code and goes back to
the dawn of ceph and beyond the kernel client, so let's wait for Sage
to chime in.

Thanks,

                Ilya
Arnd Bergmann Feb. 19, 2016, 4:30 p.m. UTC | #15
On Friday 19 February 2016 12:42:02 Ilya Dryomov wrote:
> On Fri, Feb 19, 2016 at 11:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 18 February 2016 18:22:55 Ilya Dryomov wrote:
> >> On Thu, Feb 18, 2016 at 5:20 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >> > On Thursday 18 February 2016 17:02:25 Ilya Dryomov wrote:
> >> >>
> >> >> If we are going to touch this function at all, let's drop all casts and
> >> >> just add a comment along the lines of "the lack of cast/sign-extension
> >> >> here is intentional, this function has a counterpart on the server side
> >> >> which also lacks cast/sign-extension, etc".  To someone who is not up
> >> >> on these subtleties, the __u64 cast is going to be more confusing than
> >> >> explicit...
> >> >
> >> > We certainly a comment, I was just suggesting to add the cast as well.
> >> >
> >> > Again, it's not the lack of the cast on the server that is important
> >> > here, it's whether it gets interpreted as signed or unsigned at the
> >> > point where the timestamp is actually used for comparison or
> >> > printing. Let me suggest a comment here:
> >>
> >> I'd argue it's important to have encode/decode code look the same.
> >>
> >> I mentioned before that at least in some places it's interpreted as
> >> signed and that the interpretation is not totally up to the client.
> >> The printing (at least some of it) is right there in the same file
> >> (src/include/utime.h) and you can see ... + 1900.
> >
> > I see this code:
> >
> > class utime_t {
> >   struct {
> >     __u32 tv_sec, tv_nsec;
> >   } tv;
> >
> >   time_t        sec()  const { return tv.tv_sec; }
> >
> >   ostream& gmtime(ostream& out) const {
> >     out.setf(std::ios::right);
> >     char oldfill = out.fill();
> >     out.fill('0');
> >     if (sec() < ((time_t)(60*60*24*365*10))) {
> >       // raw seconds.  this looks like a relative time.
> >       out << (long)sec() << "." << std::setw(6) << usec();
> >     } else {
> >       // localtime.  this looks like an absolute time.
> >       //  aim for http://en.wikipedia.org/wiki/ISO_8601
> >       struct tm bdt;
> >       time_t tt = sec();
> >       gmtime_r(&tt, &bdt);
> >       out << std::setw(4) << (bdt.tm_year+1900)  // 2007 -> '07'
> >           << '-' << std::setw(2) << (bdt.tm_mon+1)
> >           << '-' << std::setw(2) << bdt.tm_mday
> >           << ' '
> >           << std::setw(2) << bdt.tm_hour
> >           << ':' << std::setw(2) << bdt.tm_min
> >           << ':' << std::setw(2) << bdt.tm_sec;
> >       out << "." << std::setw(6) << usec();
> >       out << "Z";
> >     }
> >     out.fill(oldfill);
> >     out.unsetf(std::ios::right);
> >     return out;
> >   }
> >
> > which interprets the time roughly in the way I explained:
> >
> > * On 64-bit architectures, the time is interpreted as an unsigned
> >   number in the range from 1980-2106, while any times between 1970
> >   and 1980 are interpreted as a relative time and printed as
> >   a positive seconds number.
> >
> > * On 32-bit architectures, times in the range 1980-2038 are printed
> >   as a date like 64-bit does, times from 1970-1980 are also printed
> >   as a positive seconds number, and anything with the top bit set
> >   is printed as a negative seconds number.
> >
> > It this the intended behavior? I guess we could change the kernel
> > to reject any timestamps before 1980 and after 2038 in
> > ceph_decode_timespec and just set the Linux timestamp to (time_t)0
> > (Jan 1 1970) in that case, and document that there is no valid
> > interpretation for those.
> 
> At least conceptually, this isn't a bad idea, as only 1970(1980)..2038
> works on both architectures; everything else is likely to break
> something in ceph either on 32-bit or on both, I think.  But, I'm not
> sure if rejecting by setting to 0 is going to work all around and we
> can't change just the kernel client.

Setting to 0 is a bit fishy, but I couldn't think of anything better
either. Right now, we just overflow 32-bit timestamps all the time
and get unpredictable results. I think we will end up implementing
some form of truncation, so times earlier than the start of the 
file system specific epoch (1902, 1970, 1980, ... depending on the
implementation) get set to the earliest supported time, while
times after the last supported timestamp (2037, 2038, 2106, ...)
get set to the maximum.

The problem here is that we don't even know whether an underflow
or an overflow happened.
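
A rough sketch of that kind of truncation (illustrative only; the
clamping policy and the timespec64 form are assumptions, not an agreed
design):

static inline void ceph_decode_timespec64_clamped(struct timespec64 *ts,
						  const struct ceph_timespec *tv)
{
	u32 sec = le32_to_cpu(tv->tv_sec);

	/*
	 * Clamp to the window both 32-bit and 64-bit handle consistently.
	 * A value above S32_MAX could be either a post-2038 overflow or a
	 * pre-1970 underflow; we cannot tell, so clamp to the maximum.
	 */
	if (sec > 0x7fffffff)
		sec = 0x7fffffff;

	ts->tv_sec = sec;
	ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
}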

> I want the kernel client to be bug compatible with servers and other
> clients in ceph.git.
> 
> >
> > a) Remain consistent with current 64-bit servers and clients,
> >    interpreting the number as an unsigned 32-bit integer where
> >    possible, and remain broken on existing 32-bit machines
> >    until they start using 64-bit time_t.
> >
> > b) Change 64-bit clients to behave the same way that that 32-bit
> >    clients and servers do today, and actually use the times as
> >    a signed number that you claimed they already did.
> >
> > c) Interpret all times before 1980 or after 2038 as buggy servers
> >    and only accept times that are always handled consistently.
> >    Also fix the server to reject those times coming from broken
> >    clients.
> >
> > d) Leave 64-bit clients unchanged (using unsigned timestamps) and
> >    document that 32-bit clients should keep interpreting them as
> >    signed even when we fix VFS, to preserve the existing behavior.
> >
> > I think a), b) and c) are all reasonable, and I can send a patch
> > as soon as you know what you want. If you want d), please leave
> > me out of that and do it yourself.
> 
> My preference would be a), possibly with some comments along the lines
> of c), to make the ..2038 range more explicit, as I doubt we've ever
> assumed ..2106 on 64-bit.  This is all really old code and goes back to
> the dawn of ceph and beyond the kernel client, so let's wait for Sage
> to chime in.

Ok, thanks!

I think the same problem appeared in a lot of places when code was
originally written with the assumption that time_t is 32-bit,
and then it kept working in practice when ported to 64-bit systems
for the first time but the underlying assumptions changed.

/*
 * ceph timestamp handling is traditionally inconsistent between
 * 32-bit clients and servers using them as a signed 32-bit time_t
 * ranging from 1902 to 2038, and 64-bit clients and servers
 * interpreting the same numbers as unsigned 32-bit integers
 * with negative numbers turning into dates between 2038 and 2106.
 *
 * Most clients and servers these days are 64-bit, so let's pretend
 * that interpreting the times as "__u32" is intended, and use
 * that on 32-bit systems as well once using a 64-bit time_t.
 */

	Arnd
Sage Weil March 1, 2016, 4:18 p.m. UTC | #16
On Fri, 19 Feb 2016, Arnd Bergmann wrote:
> On Friday 19 February 2016 12:42:02 Ilya Dryomov wrote:
> > I want the kernel client to be bug compatible with servers and other
> > clients in ceph.git.
> > 
> > >
> > > a) Remain consistent with current 64-bit servers and clients,
> > >    interpreting the number as an unsigned 32-bit integer where
> > >    possible, and remain broken on existing 32-bit machines
> > >    until they start using 64-bit time_t.
> > >
> > > b) Change 64-bit clients to behave the same way that that 32-bit
> > >    clients and servers do today, and actually use the times as
> > >    a signed number that you claimed they already did.
> > >
> > > c) Interpret all times before 1980 or after 2038 as buggy servers
> > >    and only accept times that are always handled consistently.
> > >    Also fix the server to reject those times coming from broken
> > >    clients.
> > >
> > > d) Leave 64-bit clients unchanged (using unsigned timestamps) and
> > >    document that 32-bit clients should keep interpreting them as
> > >    signed even when we fix VFS, to preserve the existing behavior.
> > >
> > > I think a), b) and c) are all reasonable, and I can send a patch
> > > as soon as you know what you want. If you want d), please leave
> > > me out of that and do it yourself.
> > 
> > My preference would be a), possibly with some comments along the lines
> > of c), to make the ..2038 range more explicit, as I doubt we've ever
> > assumed ..2106 on 64-bit.  This is all really old code and goes back to
> > the dawn of ceph and beyond the kernel client, so let's wait for Sage
> > to chime in.
> 
> Ok, thanks!
> 
> I think the same problem appeared in a lot of places when code was
> originally written with the assumption that time_t is 32-bit,
> and then it kept working in practice when ported to 64-bit systems
> for the first time but the underlying assumptions changed.
> 
> /*
>  * ceph timestamp handling is traditionally inconsistent between
>  * 32-bit clients and servers using them as a signed 32-bit time_t
>  * ranging from 1902 to 2038, and 64-bit clients and servers
>  * interpreting the same numbers as unsigned 32-bit integers
>  * with negative numbers turning into dates between 2038 and 2106.
>  *
>  * Most clients and servers these days are 64-bit, so let's pretend
>  * that interpreting the times as "__u32" is intended, and use
>  * that on 32-bit systems as well once using a 64-bit time_t.
>  */

This sounds right to me.  In practice we never have negative timestamps,
so keeping the unsigned interpretation makes the most sense, and buys us 
several decades.  Although it's pretty likely we'll switch to a 64-bit 
encoding long before then...

sage

Patch

diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h
index a6ef9cc..e777e99 100644
--- a/include/linux/ceph/decode.h
+++ b/include/linux/ceph/decode.h
@@ -137,8 +137,8 @@  bad:
 static inline void ceph_decode_timespec(struct timespec *ts,
 					const struct ceph_timespec *tv)
 {
-	ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
-	ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
+	ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
+	ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
 }
 static inline void ceph_encode_timespec(struct ceph_timespec *tv,
 					const struct timespec *ts)