diff mbox

[RFC,02/15] vfs: Change all structures to support 64 bit time

Message ID 1452144972-15802-3-git-send-email-deepa.kernel@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Deepa Dinamani Jan. 7, 2016, 5:35 a.m. UTC
The current representation of inode times in struct inode, struct iattr,
and struct kstat use struct timespec. timespec is not y2038 safe.

Use scalar data types (seconds and nanoseconds stored separately) to
represent timestamps in struct inode in order to maintain same size for
times across 32 bit and 64 bit architectures.
In addition, lay them out such that they are packed on a naturally
aligned boundary on 64 bit arch as 4 bytes are used to store nsec.
This makes each tuple(sec, nscec) use 12 bytes instead of 16 bytes.
This will help save RAM space as inode structure is cached in memory.
The other structures are transient and do not benefit from these
changes.

Add accessors for inode timestamps.
These provide a way to access the time field members.
Accessors abstract the timestamp representation so that any logic
to convert between the struct inode timestamps and other interfaces
can be placed here.
The plan is to globally change all references to these types through
these accessors only. So when the actual internal representation
changes, it will be transparent to the outside world.
This can be extended to add code to validate the inode times that
are being set.
Macros are chosen as accessors rather than functions because we can
condense 3 {a,c,m} time functions into a single macro. After we agree
on an approach, the implementation could be changed to use static
inline functions if it suits more.

Add inode_timespec aliases to help convert kstat and iattr times to use
64 bit times. These hide the internal data type.
Use uapi exposed data types here to keep minimal timstamp data type
conversions in API's interfacing with vfs.

After the CONFIG_FS_USES_64BIT_TIME is enabled, all inode_timespec
aliases will be removed and timespec64 data types and API's will
be used directly.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
---
 include/linux/fs.h     | 55 +++++++++++++++++++++++++++++++++++++++++++-------
 include/linux/stat.h   |  6 +++---
 include/linux/time64.h | 21 +++++++++++++++++++
 3 files changed, 72 insertions(+), 10 deletions(-)

Comments

Dave Chinner Jan. 10, 2016, 11:03 p.m. UTC | #1
On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:
> The current representation of inode times in struct inode, struct iattr,
> and struct kstat use struct timespec. timespec is not y2038 safe.
> 
> Use scalar data types (seconds and nanoseconds stored separately) to
> represent timestamps in struct inode in order to maintain same size for
> times across 32 bit and 64 bit architectures.
> In addition, lay them out such that they are packed on a naturally
> aligned boundary on 64 bit arch as 4 bytes are used to store nsec.
> This makes each tuple(sec, nscec) use 12 bytes instead of 16 bytes.
> This will help save RAM space as inode structure is cached in memory.
> The other structures are transient and do not benefit from these
> changes.

IMO, this decisions sends the patch series immediately down the
wrong path. TO me, this is a severe case of premature optimisation
because everything gets way more complex just to save those 8 bytes,
especially as those holes can be filled simply by changing the
variable declaration order in the structure and adding a simple
comment.

And, really, I don't like those VFS_INODE_[GS]ET_XTIME macros at
all; you've got to touch lots of code(*), making it all shouty and
harder to read.  They seem only to exist because of the above
structural change requires an abstract timestamp accessor while
CONFIG_FS_USES_64BIT_TIME exists. Given that goes away at the end o
the series, so should the macro - if we use a struct timespec64 in
the first place, it isn't even necessary as a temporary construct.

(*) I note you haven't touched XFS, which means you've probably
broken lots of other filesystem code. e.g. in XFS, functions like
xfs_vn_getattr() and xfs_vn_update_time() access inode->i_[acm]time
directly and hence are not going to compile after this patch series.

Cheers,

Dave.
Deepa Dinamani Jan. 12, 2016, 5:42 a.m. UTC | #2
> On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote:
>
>> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:
>> The current representation of inode times in struct inode, struct iattr,
>> and struct kstat use struct timespec. timespec is not y2038 safe.
>>
>> Use scalar data types (seconds and nanoseconds stored separately) to
>> represent timestamps in struct inode in order to maintain same size for
>> times across 32 bit and 64 bit architectures.
>> In addition, lay them out such that they are packed on a naturally
>> aligned boundary on 64 bit arch as 4 bytes are used to store nsec.
>> This makes each tuple(sec, nscec) use 12 bytes instead of 16 bytes.
>> This will help save RAM space as inode structure is cached in memory.
>> The other structures are transient and do not benefit from these
>> changes.
>
> IMO, this decisions sends the patch series immediately down the
> wrong path.

There are other things the patch does that I would like to get comments
on: inode_timespec aliases, range check, individual fs changes etc.
These are independent of the inode timestamp representation changes.

>TO me, this is a severe case of premature optimisation
> because everything gets way more complex just to save those 8 bytes,
> especially as those holes can be filled simply by changing the
> variable declaration order in the structure and adding a simple
> comment.

I had tried rearranging the structure and the pahole tool does not show
any difference unless you pack and align the struct to 4 bytes on 64
bit arch.  The change actually saves 16 bytes on x86_64 and adds 12
bytes on i386.

Here is the breakdown for struct inode before and after the patch:

x86_64:
/* size: 544, cachelines: 9, members: 44 */           |
/* size: 528, cachelines: 9, members: 47 */
/* sum members: 534, holes: 3, sum holes: 10 */       |
/* sum members: 522, holes: 2, sum holes: 6 */

i386:
/* size: 328, cachelines: 6, members: 45 */           |
/* size: 340, cachelines: 6, members: 48 */
/* sum members: 326, holes: 1, sum holes: 2 */        |
/* sum members: 338, holes: 1, sum holes: 2 */

According to /proc/slabinfo I estimated savings of 4MB on a lightly
loaded system.

> And, really, I don't like those VFS_INODE_[GS]ET_XTIME macros at
> all; you've got to touch lots of code(*), making it all shouty and
> harder to read.  They seem only to exist because of the above
> structural change requires an abstract timestamp accessor while
> CONFIG_FS_USES_64BIT_TIME exists.
> Given that goes away at the end o
> the series, so should the macro - if we use a struct timespec64 in
> the first place, it isn't even necessary as a temporary construct

timespec64 was the first option considered here.  The problem with using
timespec64 is the long data type to represent nsec.  If it were possible
to change timespec64 nsec to int data type then it might be okay  to use
that if we are not worried about holes.  I do not see why time stamps
should have different representations on a 32 bit vs a 64 bit arch.
This left us with the option define a new data type to represent
timestamps.  I agreed with the concerns on the earlier RFC series that
there are already very many data types to represent time in the kernel.
So this left me with the option of using scalar types to represent these.
The scalar types were not used for optimization. They just happened to
serve that purpose as well.  This could be in a follow on patch, but as
long as we are changing the representation everywhere, I don't see why
there should be an intermediate step to change it to timespec64 only to
change it to this representation later.

As far as accessors are concerned, there already are accessors in the
VFS: generic_fillattr() and setattr_copy(). The problem really is that
there is more than one way of updating these attributes(timestamps in
this particular case). The side effect of this is that we don't always
call timespec_trunc() before assigning timestamps which can lead to
inconsistencies between on disk and in memory inode timestamps. Also,
since these also touch other attributes, these become more restrictive.
The accessors were an idea to streamline all accesses to timestamps in
inode.  Right now the accessor macros also figure out if the timestamps
were clamped and then call the registered callback.  But, this can be
extended to include fs_time_trunc() and then all the end users can
just use these and not worry about the right granularity or range.
As the commit text says, these can be changed to inline functions to
avoid shouty case.

> (*) I note you haven't touched XFS, which means you've probably
> broken lots of other filesystem code. e.g. in XFS, functions like
> xfs_vn_getattr() and xfs_vn_update_time() access inode->i_[acm]time
> directly and hence are not going to compile after this patch series.

I think I should have explained this more in my cover letter, as this
has come up twice now.  Patches 1-7 are the only ones that are relevant
and compiled and tested.  These change three example filesystems as an
illustration of the proposed solution.  Of course, every filesystem will
have to be changed similarly before patches 8-15 and a few more to change
additional filesystems to use timespec64 can be merged.  Patches 8-15 were
included merely to provide a complete picture, as I thought patches
explained the concept better than words only. These have not even been
compiled, as these are for illustration purposes as noted in the cover
letter.

-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Jan. 12, 2016, 8:29 a.m. UTC | #3
On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote:
> > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote:
> >
> >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:
> >> The current representation of inode times in struct inode, struct iattr,
> >> and struct kstat use struct timespec. timespec is not y2038 safe.
> >>
> >> Use scalar data types (seconds and nanoseconds stored separately) to
> >> represent timestamps in struct inode in order to maintain same size for
> >> times across 32 bit and 64 bit architectures.
> >> In addition, lay them out such that they are packed on a naturally
> >> aligned boundary on 64 bit arch as 4 bytes are used to store nsec.
> >> This makes each tuple(sec, nscec) use 12 bytes instead of 16 bytes.
> >> This will help save RAM space as inode structure is cached in memory.
> >> The other structures are transient and do not benefit from these
> >> changes.
> >
> > IMO, this decisions sends the patch series immediately down the
> > wrong path.
> 
> There are other things the patch does that I would like to get comments
> on: inode_timespec aliases, range check, individual fs changes etc.
> These are independent of the inode timestamp representation changes.

The inode_timespec alias is part of the problem - AFAICT it exists
because you need a representation that is independent of both the
old and new in-memory structures.

The valid timestamp range stuff in the superblock is absolutely
necessary, but that's something that can be done completely
independently to the changes for supporting a differnet time storage
format.

And the fs changes cannot really be commented on until the VFs time
representation is sorted out properly...

> >TO me, this is a severe case of premature optimisation
> > because everything gets way more complex just to save those 8 bytes,
> > especially as those holes can be filled simply by changing the
> > variable declaration order in the structure and adding a simple
> > comment.
> 
> I had tried rearranging the structure and the pahole tool does not show
> any difference unless you pack and align the struct to 4 bytes on 64
> bit arch.  The change actually saves 16 bytes on x86_64 and adds 12
> bytes on i386.
> 
> Here is the breakdown for struct inode before and after the patch:
> 
> x86_64:
> /* size: 544, cachelines: 9, members: 44 */           |
> /* size: 528, cachelines: 9, members: 47 */
> /* sum members: 534, holes: 3, sum holes: 10 */       |
> /* sum members: 522, holes: 2, sum holes: 6 */
> 
> i386:
> /* size: 328, cachelines: 6, members: 45 */           |
> /* size: 340, cachelines: 6, members: 48 */
> /* sum members: 326, holes: 1, sum holes: 2 */        |
> /* sum members: 338, holes: 1, sum holes: 2 */
> 
> According to /proc/slabinfo I estimated savings of 4MB on a lightly
> loaded system.
> 
> > And, really, I don't like those VFS_INODE_[GS]ET_XTIME macros at
> > all; you've got to touch lots of code(*), making it all shouty and
> > harder to read.  They seem only to exist because of the above
> > structural change requires an abstract timestamp accessor while
> > CONFIG_FS_USES_64BIT_TIME exists.
> > Given that goes away at the end o
> > the series, so should the macro - if we use a struct timespec64 in
> > the first place, it isn't even necessary as a temporary construct
> 
> timespec64 was the first option considered here.  The problem with using
> timespec64 is the long data type to represent nsec.  If it were possible
> to change timespec64 nsec to int data type then it might be okay  to use
> that if we are not worried about holes.  I do not see why time stamps
> should have different representations on a 32 bit vs a 64 bit arch.

What's it matter? iot's irrelevant to the problem at hand.

Besides, have you looked at the existing timestamp definitions? they
use a struct timespec, which on a 64 bit system:

        struct timespec            i_atime;              /*    88    16 */
        struct timespec            i_mtime;              /*   104    16 */
        struct timespec            i_ctime;              /*   120    16 */

use 2 longs and are identical to a timespec64 in everything but
name. These should just be changed to a timespec64, and we suck up
the fact it increases the size of the 32 bit inode as we have to
increase it's size to support time > y2038 anyway.

This is what I meant about premature optimisation - you've got a
wonderfully complex solution to a problem that we don't need to
solve to support timestamps >y2038. It's also why it goes down the
wrong path at this point - most of the changes are not necessary if
all we need to do is a simple timespec -> timespec64 type change and
the addition timestamp range limiting in the existing truncation
function...

> The problem really is that
> there is more than one way of updating these attributes(timestamps in
> this particular case). The side effect of this is that we don't always
> call timespec_trunc() before assigning timestamps which can lead to
> inconsistencies between on disk and in memory inode timestamps.

That's a problem that can be fixed independently of y2038 support.
Indeed, we can be quite lazy about updating timestamps - by intent
and design we usually have different timestamps in memory compared
to on disk, which is one of the reasons why there are so many
different ways to change and update timestamps....

Cheers,

Dave.
Arnd Bergmann Jan. 12, 2016, 9:27 a.m. UTC | #4
On Tuesday 12 January 2016 19:29:57 Dave Chinner wrote:
> 
> This is what I meant about premature optimisation - you've got a
> wonderfully complex solution to a problem that we don't need to
> solve to support timestamps >y2038. It's also why it goes down the
> wrong path at this point - most of the changes are not necessary if
> all we need to do is a simple timespec -> timespec64 type change and
> the addition timestamp range limiting in the existing truncation
> function...

I originally suggested doing the split representation because I
was worried about the downsides of using timespec64 on 32-bit systems
after looking at actual memory consumption on my test box.

At this moment, I have a total of 145712700 inodes in memory on a machine
with 64GB ram, saving 12 bytes on each amounts to a total of 145MB. I think
it was more than that when I first looked, so it's between 0.2% and 0.3%
of savings in total memory, which is certainly worth discussing about,
given the renewed interest in conserving RAM in general. If we want to
save this memory, then doing it at the same time as the timespec64 conversion
is the right time so we don't need to touch every file twice.

One point that I had not considered though is on the 32-bit systems we
are talking about, not only is RAM much smaller, but also there would be
a smaller fraction of RAM available to store inodes, so there is not as
much to gain.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Jan. 13, 2016, 6:27 a.m. UTC | #5
On Tue, Jan 12, 2016 at 10:27:07AM +0100, Arnd Bergmann wrote:
> On Tuesday 12 January 2016 19:29:57 Dave Chinner wrote:
> > 
> > This is what I meant about premature optimisation - you've got a
> > wonderfully complex solution to a problem that we don't need to
> > solve to support timestamps >y2038. It's also why it goes down the
> > wrong path at this point - most of the changes are not necessary if
> > all we need to do is a simple timespec -> timespec64 type change and
> > the addition timestamp range limiting in the existing truncation
> > function...
> 
> I originally suggested doing the split representation because I
> was worried about the downsides of using timespec64 on 32-bit systems
> after looking at actual memory consumption on my test box.
> 
> At this moment, I have a total of 145712700 inodes in memory on a machine

Is that all? :P

> with 64GB ram, saving 12 bytes on each amounts to a total of 145MB.

I just posted a patchset that knocks 104 bytes off the XFS inode
(~12% reduction in size).  We need the changes in that patchset to
sanely support >y2038k support in XFS, and it means we now won't
need to grow the XFS inode to add that support, either.

> I think
> it was more than that when I first looked, so it's between 0.2% and 0.3%
> of savings in total memory, which is certainly worth discussing about,
> given the renewed interest in conserving RAM in general.  If we want to
> save this memory, then doing it at the same time as the timespec64 conversion
> is the right time so we don't need to touch every file twice.

You just uttered the key words: "If we want to save this memory"

So let's stop conflating two different lines of development because
we only actually *need* y2038k support.

The fact we haven't made timestamp space optimisations means
that nobody has thought it necessary or worthwhile. y2038k support
doesn't change the landscape under which we might consider the
optimisation, so we need to determine if the benefit outweighs the
cost in terms of code complexity and maintainability.

So separate the two changes - make the y2038k change simple and
obviously correct first by changing everything to timespec64. Then it
won't get delayed by bikeshedding about an optimisation of that is
of questionable benefit.

Cheers,

Dave.
Arnd Bergmann Jan. 13, 2016, 9:20 a.m. UTC | #6
On Wednesday 13 January 2016 17:27:16 Dave Chinner wrote:
> > I think
> > it was more than that when I first looked, so it's between 0.2% and 0.3%
> > of savings in total memory, which is certainly worth discussing about,
> > given the renewed interest in conserving RAM in general.  If we want to
> > save this memory, then doing it at the same time as the timespec64 conversion
> > is the right time so we don't need to touch every file twice.
> 
> You just uttered the key words: "If we want to save this memory"
> 
> So let's stop conflating two different lines of development because
> we only actually *need* y2038k support.
> 
> The fact we haven't made timestamp space optimisations means
> that nobody has thought it necessary or worthwhile. y2038k support
> doesn't change the landscape under which we might consider the
> optimisation, so we need to determine if the benefit outweighs the
> cost in terms of code complexity and maintainability.
> 
> So separate the two changes - make the y2038k change simple and
> obviously correct first by changing everything to timespec64. Then it
> won't get delayed by bikeshedding about an optimisation of that is
> of questionable benefit.

Fine with me. I think Deepa already started simplifying the series
already. I agree that for 64-bit machines, there is no need to optimize
that code now, since we are not regressing in terms of memory size.

For 32-bit machines, we are regressing anyway, the question is whether
it's by 12 or 24 bytes per inode. Let me try to estimate the worse-case
scenario here: let's assume that we have 1GB of RAM (anything more
on a 32-bit system gets you into trouble, and if you have less, there
will be less of a problem). Filling all of system ram with small tmpfs
files means a single 4K page plus 280 bytes for the minimum inode,
so we need an additional 6MB or 12MB to store the extra timespec
bits. Probably not too bad for a worst-case scenario, but there is
also the case of storing just the inodes but no pages, and that would
be worse.

I've added the linux-arm and linux-mips lists to cc, to see if anyone has
strong opinions on this matter. We don't have to worry about x86-32 here,
because sizeof(struct timespec64) is 12 bytes there anyway, and I don't
think there are any other 32-bit architectures that have large-scale
deployments or additional requirements we don't already have on ARM.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Deepa Dinamani Jan. 13, 2016, 4:33 p.m. UTC | #7
On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote:
> On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote:
> > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote:
> > >
> > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:

Summarizing, here are the open questions that need to be sorted before another
series:
1. What should be part of the series?
	a. y2038 safe data types conversion.
	b. range check and clamping
2. How to achieve a seamless transition?
	Is inode_timespec solution agreed upon to achieve 1a?
	An alternate approach is included in the cover letter.
3. policy for handling out of range timestamps:
	There was no conclusion on this from the previous series as noted in the
	cover letter.
	I think this can be solved by figuring out the answer to question:
	who should have a say in deciding the course of action if the timestamps
	are out of range:
	
	a. sysadmin through sysctl (Arnd's suggestion)
	b. have default vfs handlers with an option for individual fs to override.
	c. clamp and ignore
	d. disable expired fs at compile time (Arnd's suggestion)


> The inode_timespec alias is part of the problem - AFAICT it exists
> because you need a representation that is independent of both the
> old and new in-memory structures.

Maybe you are misunderstanding the fact that only struct inode is changed
to have individual fields. Whereas, timespec64 is used everywhere else in
the series.
inode_timespec is an alias to transition timespec references to
timespec64 without breaking anything. This is needed because we
don't want to change all references to timespec in a single patch like
the cover letter says.
The same RFC Patch 2 has more details.

> The valid timestamp range stuff in the superblock is absolutely
> necessary, but that's something that can be done completely
> independently to the changes for supporting a differnet time storage
> format.

If I'm defining new functions to support new format, then I should at
least get the function signatures right before using them.
These can be in a different patch, but should be in the same patch series,
before they are used anywhere.
For instance,
struct timespec timespec_trunc(struct timespec t, unsigned gran);
should now take superblock as an argument instead of gran.

> And the fs changes cannot really be commented on until the VFs time
> representation is sorted out properly...

Each fs is changed twice in the current approach to transition everything to
timespec64. And, there are different ways of doing this.
For instance, Arnd had an idea different from mine as to how this can be done:
He was suggesting using something like these accessor macros and incorporating
timespec64 from the beginning in the individual filesystems rather than
inode_timespec.
Again, this is independent of how timestamps are stored in struct inode.
There are others that are independent of inode timestamp representation as well


> Besides, have you looked at the existing timestamp definitions? they
> use a struct timespec, which on a 64 bit system:
> 
>         struct timespec            i_atime;              /*    88    16 */
>         struct timespec            i_mtime;              /*   104    16 */
>         struct timespec            i_ctime;              /*   120    16 */
> 
> use 2 longs and are identical to a timespec64 in everything but
> name. These should just be changed to a timespec64, and we suck up
> the fact it increases the size of the 32 bit inode as we have to
> increase it's size to support time > y2038 anyway.
> 
> This is what I meant about premature optimisation - you've got a
> wonderfully complex solution to a problem that we don't need to
> solve to support timestamps >y2038. It's also why it goes down the
> wrong path at this point - most of the changes are not necessary if
> all we need to do is a simple timespec -> timespec64 type change and
> the addition timestamp range limiting in the existing truncation
> function...

The pahole output I pasted in the previous email(on the left) was for
timespec.

Yes, I do know that timespec64 is same as timespec on 64 bit systems:
#if __BITS_PER_LONG == 64
# define timespec64 timespec

I think it's been agreed upon now that inode timestamps will be changed
to use timespec64 as Arnd mentioned, if I do not hear any objections.
The whole purpose of this is to gather comments.

> > The problem really is that
> > there is more than one way of updating these attributes(timestamps in
> > this particular case). The side effect of this is that we don't always
> > call timespec_trunc() before assigning timestamps which can lead to
> > inconsistencies between on disk and in memory inode timestamps.
> 
> That's a problem that can be fixed independently of y2038 support.
> Indeed, we can be quite lazy about updating timestamps - by intent
> and design we usually have different timestamps in memory compared
> to on disk, which is one of the reasons why there are so many
> different ways to change and update timestamps....

This has nothing to do with lazy updates.
This is about writing wrong granularities and non clamped values to
in-memory inode.

-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Jan. 13, 2016, 9:04 p.m. UTC | #8
On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote:
> On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote:
> > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote:
> > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:
> 
> Summarizing, here are the open questions that need to be sorted before another
> series:
> 1. What should be part of the series?
> 	a. y2038 safe data types conversion.
> 	b. range check and clamping

Yes.

> 2. How to achieve a seamless transition?
> 	Is inode_timespec solution agreed upon to achieve 1a?

No. Just convert direct to timespec64.

> 	An alternate approach is included in the cover letter.
> 3. policy for handling out of range timestamps:
> 	There was no conclusion on this from the previous series as noted in the
> 	cover letter.
> 	a. sysadmin through sysctl (Arnd's suggestion)
> 	b. have default vfs handlers with an option for individual fs to override.
> 	c. clamp and ignore

I think it's a mix - if the timestamps come in from userspace,
fail with ERANGE. That could be controlled by sysctl via VFS
part of the ->setattr operation, or in each of the individual FS
implementations. If they come from the kernel (e.g. atime update) then
the generic behvaiour is to warn and continue, filesystems can
otherwise select their own policy for kernel updates via
->update_time.

> 	d. disable expired fs at compile time (Arnd's suggestion)

Not really an option, because it means we can't use filesystems that
interop with other systems (e.g. cameras, etc) because they won't
support y2038k timestamps for a long time, if ever (e.g. vfat).

> > > The problem really is that
> > > there is more than one way of updating these attributes(timestamps in
> > > this particular case). The side effect of this is that we don't always
> > > call timespec_trunc() before assigning timestamps which can lead to
> > > inconsistencies between on disk and in memory inode timestamps.
> > 
> > That's a problem that can be fixed independently of y2038 support.
> > Indeed, we can be quite lazy about updating timestamps - by intent
> > and design we usually have different timestamps in memory compared
> > to on disk, which is one of the reasons why there are so many
> > different ways to change and update timestamps....
> 
> This has nothing to do with lazy updates.
> This is about writing wrong granularities and non clamped values to
> in-memory inode.

Which really shouldn't happen because we should be clamping and/or
truncating timestamps at the creation/entry point into the
VFS/filesystem.

e.g.  current_fs_time(sb) is how filesystems grab the current kernel
time for timestamp updates. Add an equivalent current_fs_time64(sb)
to do return timespec64 and do clamping and limit warning, and now
you have a simple vehicle for converting the VFS and filesystems to
support y2038k clean date formats.

If there are places where filesystems are receiving or using
unchecked timestamps then those are bugs that need fixing. Those
need to be in separate patches to y2038k support...

Cheers,

Dave.
Arnd Bergmann Jan. 14, 2016, 4:53 p.m. UTC | #9
On Thursday 14 January 2016 08:04:36 Dave Chinner wrote:
> On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote:
> > On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote:
> > > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote:
> > > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote:
> > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:
> >
> > 2. How to achieve a seamless transition?
> > 	Is inode_timespec solution agreed upon to achieve 1a?
> 
> No. Just convert direct to timespec64.

The hard part here is how to split that change into logical patches
per file system. We have already discussed all sorts of ways to
do that, but there is no ideal solution, as you usually end up
either having some really large patches, or you have to modify
the same lines multiple times.

The most promising approaches are:

a) In Deepa's current patch set, some infrastructure is first
   introduced by changing the type from timespec to an identical
   inode_timespec, which lets us convert one file system at a time
   to inode_timespec and then change the type once they are all
   done. The downside is then that all file systems have to get
   touched twice so we end up with timespec64 everywhere.

b) A variation of that which I would do is to use have a smaller
   set of infrastructure first, so we can change one file system
   at a time to timespec64 while leaving the common structures to
   use timespec until all file systems are converted. The downside
   is the use of some conversion macros when accessing the times
   in the inode.
   When the common code is changed, those accessor macros get
   turned into trivial assignments that can be removed up later
   or changed in the same patch.

c) The opposite direction from b) is to first change the common
   code, but then any direct assignment between a timespec in
   a file system and the timespec64 in the inode/iattr/kstat/etc
   first needs a conversion helper so we can build cleanly,
   and then we do one file system at a time to remove them all
   again while changing the internal structures in the
   file system from timespec to timespec64.   

> > 	An alternate approach is included in the cover letter.
> > 3. policy for handling out of range timestamps:
> > 	There was no conclusion on this from the previous series as noted in the
> > 	cover letter.
> > 	a. sysadmin through sysctl (Arnd's suggestion)
> > 	b. have default vfs handlers with an option for individual fs to override.
> > 	c. clamp and ignore
> 
> I think it's a mix - if the timestamps come in from userspace,
> fail with ERANGE. That could be controlled by sysctl via VFS
> part of the ->setattr operation, or in each of the individual FS
> implementations. If they come from the kernel (e.g. atime update) then
> the generic behvaiour is to warn and continue, filesystems can
> otherwise select their own policy for kernel updates via
> ->update_time.

I'd prefer not to have it done by the individual file system
implementation, so we get a consistent behavior. Normally you either
care about correct time stamps, or you care about interoperability
and you don't want to have errors returned here.

It could be done per mount, but that seems overly complicated
for rather little to be gained.

> > 	d. disable expired fs at compile time (Arnd's suggestion)
> 
> Not really an option, because it means we can't use filesystems that
> interop with other systems (e.g. cameras, etc) because they won't
> support y2038k timestamps for a long time, if ever (e.g. vfat).

Let me clarify what my idea is here: I want a global kernel option
that disables all code that has known y2038 issues. If anyone tries
to build an embedded system with support beyond 2038, that should
disable all of those things, including file systems, drivers and
system calls, so we can reasonably assume that everything that works
today with that kernel build will keep working in the future and
not break in random ways.

For a file system, this can be done in a number of ways:

* Most file systems today interpret the time as an unsigned 32-bit
  number (as opposed to signed as ext3, xfs and few others do),
  so as long as we use timespec64 in the syscalls, we are ok.

* Some legacy file systems (maybe hfs) can remain disabled, as
  nobody cares about them any more.

* If we still care about them (e.g. ext2), we can make them support
  only read-only mode. In ext4, this would mean forbidding write
  access to file systems that don't have the extended inode format
  enabled.

Normal users that don't care about not breaking in 2038 obviously
won't set the option, and have the same level of backwards compatibility
support as today.

> > > > The problem really is that
> > > > there is more than one way of updating these attributes(timestamps in
> > > > this particular case). The side effect of this is that we don't always
> > > > call timespec_trunc() before assigning timestamps which can lead to
> > > > inconsistencies between on disk and in memory inode timestamps.
> > > 
> > > That's a problem that can be fixed independently of y2038 support.
> > > Indeed, we can be quite lazy about updating timestamps - by intent
> > > and design we usually have different timestamps in memory compared
> > > to on disk, which is one of the reasons why there are so many
> > > different ways to change and update timestamps....
> > 
> > This has nothing to do with lazy updates.
> > This is about writing wrong granularities and non clamped values to
> > in-memory inode.
> 
> Which really shouldn't happen because we should be clamping and/or
> truncating timestamps at the creation/entry point into the
> VFS/filesystem.
> 
> e.g.  current_fs_time(sb) is how filesystems grab the current kernel
> time for timestamp updates. Add an equivalent current_fs_time64(sb)
> to do return timespec64 and do clamping and limit warning, and now
> you have a simple vehicle for converting the VFS and filesystems to
> support y2038k clean date formats.

I think the current patch series does this already.
 
> If there are places where filesystems are receiving or using
> unchecked timestamps then those are bugs that need fixing. Those
> need to be in separate patches to y2038k support...

Fair enough, but that probably means that patch series will have to
come first. This will also reduce the number of places in which a
separate type conversion function needs to be added.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Deepa Dinamani Jan. 14, 2016, 6 p.m. UTC | #10
On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote:
> On Thursday 14 January 2016 08:04:36 Dave Chinner wrote:
> > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote:
> > > On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote:
> > > > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote:
> > > > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote:
> > > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:
> > >
> > > 2. How to achieve a seamless transition?
> > > 	Is inode_timespec solution agreed upon to achieve 1a?
> > 
> > No. Just convert direct to timespec64.
> 
> The hard part here is how to split that change into logical patches
> per file system. We have already discussed all sorts of ways to
> do that, but there is no ideal solution, as you usually end up
> either having some really large patches, or you have to modify
> the same lines multiple times.
> 
> The most promising approaches are:
> 
> a) In Deepa's current patch set, some infrastructure is first
>    introduced by changing the type from timespec to an identical
>    inode_timespec, which lets us convert one file system at a time
>    to inode_timespec and then change the type once they are all
>    done. The downside is then that all file systems have to get
>    touched twice so we end up with timespec64 everywhere.
> 
> b) A variation of that which I would do is to use have a smaller
>    set of infrastructure first, so we can change one file system
>    at a time to timespec64 while leaving the common structures to
>    use timespec until all file systems are converted. The downside
>    is the use of some conversion macros when accessing the times
>    in the inode.
>    When the common code is changed, those accessor macros get
>    turned into trivial assignments that can be removed up later
>    or changed in the same patch.
> 
> c) The opposite direction from b) is to first change the common
>    code, but then any direct assignment between a timespec in
>    a file system and the timespec64 in the inode/iattr/kstat/etc
>    first needs a conversion helper so we can build cleanly,
>    and then we do one file system at a time to remove them all
>    again while changing the internal structures in the
>    file system from timespec to timespec64.   

Just a clarification here:
approaches b and c also need some functions that take times as arguments,
including a function pointer in the vfs layer to be supported in both forms:
timespec and timespec64 concurrently.
As included in the cover letter, these are:
generic_update_time(), inode->i_op->update_time(), lease_get_mtime(),
fstack_copy_attr_all(), setattr_copy(), generic_fillattr().

-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Jan. 14, 2016, 9 p.m. UTC | #11
On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote:
> On Thursday 14 January 2016 08:04:36 Dave Chinner wrote:
> > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote:
> > > On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote:
> > > > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote:
> > > > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote:
> > > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:
> > >
> > > 2. How to achieve a seamless transition?
> > > 	Is inode_timespec solution agreed upon to achieve 1a?
> > 
> > No. Just convert direct to timespec64.
> 
> The hard part here is how to split that change into logical patches
> per file system. We have already discussed all sorts of ways to
> do that, but there is no ideal solution, as you usually end up
> either having some really large patches, or you have to modify
> the same lines multiple times.
> 
> The most promising approaches are:
> 
> a) In Deepa's current patch set, some infrastructure is first
>    introduced by changing the type from timespec to an identical
>    inode_timespec, which lets us convert one file system at a time
>    to inode_timespec and then change the type once they are all
>    done. The downside is then that all file systems have to get
>    touched twice so we end up with timespec64 everywhere.

Yes, and the result is not pretty.

> b) A variation of that which I would do is to use have a smaller
>    set of infrastructure first, so we can change one file system
>    at a time to timespec64 while leaving the common structures to
>    use timespec until all file systems are converted. The downside
>    is the use of some conversion macros when accessing the times
>    in the inode.
>    When the common code is changed, those accessor macros get
>    turned into trivial assignments that can be removed up later
>    or changed in the same patch.

Doesn't really make a lot of sense to me. We have to change
everything evntually, and it's not that much work to do so up
front...

> c) The opposite direction from b) is to first change the common
>    code, but then any direct assignment between a timespec in
>    a file system and the timespec64 in the inode/iattr/kstat/etc
>    first needs a conversion helper so we can build cleanly,
>    and then we do one file system at a time to remove them all
>    again while changing the internal structures in the
>    file system from timespec to timespec64.   

No new helpers are necessary - we've already got the helper
functions we need. This:

int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
 {
        struct inode *inode = d_inode(old_dentry);
+       struct inode_timespec now = current_fs_time(inode->i_sb);

-       inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+       VFS_INODE_SET_XTIME(i_ctime, inode, now);
+       VFS_INODE_SET_XTIME(i_mtime, dir, now);
+       VFS_INODE_SET_XTIME(i_ctime, dir, now);
        inc_nlink(inode);
.....

is just wrong. All the type conversion and clamping and checking
done in that VFS_INODE_SET_XTIME() should be done in
current_fs_time() and have it return a timespec64 directly. Indeed,
it already does truncation, and can easily be made to do range
clamping, too.  i.e.  the change should simply be:

-       inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_fs_time(inode->i_sb);

This will work cleanly with a s/timespec/timespec64/ API changeover,
too, without the need for any additional helpers at all.

i.e. the first thing to do is make sure everything that
creates/converts timestamps uses/returns a struct timespec that is
clamped and truncated at the earliest possible opportunity. i.e. on
entry to the kernel and when pulled from the kernel time. Once all
the code is using and passing around truncated+clamped struct
timespec (like above), then we can simply do a
s/timespec/timespec64/ on the inode and API, and we're done.

> > I think it's a mix - if the timestamps come in from userspace,
> > fail with ERANGE. That could be controlled by sysctl via VFS
> > part of the ->setattr operation, or in each of the individual FS
> > implementations. If they come from the kernel (e.g. atime update) then
> > the generic behvaiour is to warn and continue, filesystems can
> > otherwise select their own policy for kernel updates via
> > ->update_time.
> 
> I'd prefer not to have it done by the individual file system
> implementation, so we get a consistent behavior.

Can't be helped, because different filesystems have different
timestamp behaviours, and not all use the generic VFS code for
timestamp updates. The filesystems need to use the correct helper
functions to obtain a valid current time, but you can't stop them
from storing and using arbitrary timestamp formats if they so
desire...

> > > 	d. disable expired fs at compile time (Arnd's suggestion)
> > 
> > Not really an option, because it means we can't use filesystems that
> > interop with other systems (e.g. cameras, etc) because they won't
> > support y2038k timestamps for a long time, if ever (e.g. vfat).
> 
> Let me clarify what my idea is here: I want a global kernel option
> that disables all code that has known y2038 issues. If anyone tries
> to build an embedded system with support beyond 2038, that should
> disable all of those things, including file systems, drivers and
> system calls, so we can reasonably assume that everything that works
> today with that kernel build will keep working in the future and
> not break in random ways.

It's not that black and white when it comes to filesystems. y2038k
support is determined by the on-disk structure of the filesystem
being mounted, and that is determined at mount time.  When the
filesystem mounts and sets it's valid timestamp ranges the VFS
will need to decide as to whether the filesystem is allowed to
continue mounting or not.

> For a file system, this can be done in a number of ways:
> 
> * Most file systems today interpret the time as an unsigned 32-bit
>   number (as opposed to signed as ext3, xfs and few others do),
>   so as long as we use timespec64 in the syscalls, we are ok.

Actually, sign conversion is a problem we currently have to be very
careful of.  See, for example, xfstests:tests/generic/258, which
tests timestamps recording times before epoch. i.e. in XFS we have
to convert the unsigned 32 bit disk timestamp to signed 32 bit
before storing it in the VFS timestamp so it behaves correctly on 64
bit systems. This results in us needing to do this when reading the
inode from disk:

+       /*
+        * time is signed, so need to convert to signed 32 bit before
+        * storing in inode timestamp which may be 64 bit. Otherwise
+        * a time before epoch is converted to a time long after epoch
+        * on 64 bit systems.
+        */
+       inode->i_atime.tv_sec = (int)be32_to_cpu(from->di_atime.t_sec);
+       inode->i_atime.tv_nsec = (int)be32_to_cpu(from->di_atime.t_nsec);
+       inode->i_mtime.tv_sec = (int)be32_to_cpu(from->di_mtime.t_sec);
+       inode->i_mtime.tv_nsec = (int)be32_to_cpu(from->di_mtime.t_nsec);
+       inode->i_ctime.tv_sec = (int)be32_to_cpu(from->di_ctime.t_sec);
+       inode->i_ctime.tv_nsec = (int)be32_to_cpu(from->di_ctime.t_nsec);
+
(http://oss.sgi.com/archives/xfs/2016-01/msg00456.html)

> * Some legacy file systems (maybe hfs) can remain disabled, as
>   nobody cares about them any more.
> 
> * If we still care about them (e.g. ext2), we can make them support
>   only read-only mode. In ext4, this would mean forbidding write
>   access to file systems that don't have the extended inode format
>   enabled.

For ext2/4, that would have to be handled internally by the
filesystem with feature masks. For other legacy filesystems, then
the VFS mount time checking could allow RO mounts if the supported
ranges are not y2038k clean. Compile time options are not
really the best approach here...

> > If there are places where filesystems are receiving or using
> > unchecked timestamps then those are bugs that need fixing. Those
> > need to be in separate patches to y2038k support...
> 
> Fair enough, but that probably means that patch series will have to
> come first.

Yes, that is normal practice for structuring a non-trivial feature
addition patch series. Bug fixes first, then cleanups, then the
functionality changes.

> This will also reduce the number of places in which a
> separate type conversion function needs to be added.

Precisely. I'm pretty sure this should come down to "no new
conversion functions needed" for the vast majority of the code.

Cheers,

Dave.
Arnd Bergmann Jan. 14, 2016, 10:46 p.m. UTC | #12
On Friday 15 January 2016 08:00:01 Dave Chinner wrote:
> On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote:
> > On Thursday 14 January 2016 08:04:36 Dave Chinner wrote:
> > > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote:
> > c) The opposite direction from b) is to first change the common
> >    code, but then any direct assignment between a timespec in
> >    a file system and the timespec64 in the inode/iattr/kstat/etc
> >    first needs a conversion helper so we can build cleanly,
> >    and then we do one file system at a time to remove them all
> >    again while changing the internal structures in the
> >    file system from timespec to timespec64.   
> 
> No new helpers are necessary - we've already got the helper
> functions we need. This:
> 
> int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
>  {
>         struct inode *inode = d_inode(old_dentry);
> +       struct inode_timespec now = current_fs_time(inode->i_sb);
> 
> -       inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
> +       VFS_INODE_SET_XTIME(i_ctime, inode, now);
> +       VFS_INODE_SET_XTIME(i_mtime, dir, now);
> +       VFS_INODE_SET_XTIME(i_ctime, dir, now);
>         inc_nlink(inode);
> .....
> 
> is just wrong. All the type conversion and clamping and checking
> done in that VFS_INODE_SET_XTIME() should be done in
> current_fs_time() and have it return a timespec64 directly. Indeed,
> it already does truncation, and can easily be made to do range
> clamping, too.  i.e.  the change should simply be:
>
> -       inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
> +	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_fs_time(inode->i_sb);


Yes, that is the obvious case, and I guess works for at least half the
file systems when they always assign righthand side and lefthand
side of the time stamps using the external types or helpers like
CURRENT_TIME and current_fs_time().

However, there are a couple of file systems that need a bit more refactoring
before we can do this, e.g. in ntfs_truncate:

        if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) {
                struct timespec now = current_fs_time(VFS_I(base_ni)->i_sb);
                int sync_it = 0;

                if (!timespec_equal(&VFS_I(base_ni)->i_mtime, &now) ||
                    !timespec_equal(&VFS_I(base_ni)->i_ctime, &now))
                        sync_it = 1;
                VFS_I(base_ni)->i_mtime = now;
                VFS_I(base_ni)->i_ctime = now;
	}

The type of the local variable must match the return code of
current_fs_time(), so if we change over i_mtime and current_fs_time
globally, this either has to be rewritten first to avoid the use of
local variables, or it needs temporary conversion helpers, or
it has to be changed in the same patch. None of those is particularly
appealing. There are a few dozen such things in various file systems.

> > > I think it's a mix - if the timestamps come in from userspace,
> > > fail with ERANGE. That could be controlled by sysctl via VFS
> > > part of the ->setattr operation, or in each of the individual FS
> > > implementations. If they come from the kernel (e.g. atime update) then
> > > the generic behvaiour is to warn and continue, filesystems can
> > > otherwise select their own policy for kernel updates via
> > > ->update_time.
> > 
> > I'd prefer not to have it done by the individual file system
> > implementation, so we get a consistent behavior.
> 
> Can't be helped, because different filesystems have different
> timestamp behaviours, and not all use the generic VFS code for
> timestamp updates. The filesystems need to use the correct helper
> functions to obtain a valid current time, but you can't stop them
> from storing and using arbitrary timestamp formats if they so
> desire...

I mean the decision whether to clamp or error on an overflow
should be done consistently. Having a global sysctl knob or
a compile-time option is better than having each file system
implementor take a guess at what users might prefer, if we can't
come up with a behavior (e.g. clamp all the time, or error out
all the time) that everybody agrees is always correct.

> > > > 	d. disable expired fs at compile time (Arnd's suggestion)
> > > 
> > > Not really an option, because it means we can't use filesystems that
> > > interop with other systems (e.g. cameras, etc) because they won't
> > > support y2038k timestamps for a long time, if ever (e.g. vfat).
> > 
> > Let me clarify what my idea is here: I want a global kernel option
> > that disables all code that has known y2038 issues. If anyone tries
> > to build an embedded system with support beyond 2038, that should
> > disable all of those things, including file systems, drivers and
> > system calls, so we can reasonably assume that everything that works
> > today with that kernel build will keep working in the future and
> > not break in random ways.
> 
> It's not that black and white when it comes to filesystems. y2038k
> support is determined by the on-disk structure of the filesystem
> being mounted, and that is determined at mount time.  When the
> filesystem mounts and sets it's valid timestamp ranges the VFS
> will need to decide as to whether the filesystem is allowed to
> continue mounting or not.

Some file systems are always broken around 2038 (e.g. HFS in 2040),
so if we can't fix them, I want to be able to turn them off
in Kconfig along with the 32-bit time_t syscalls. For those
where y2038 support depends on on-disk feature flags (ext4 and
xfs I guess, maybe one or two more), the config option can
turn off write support for the old format.

This is a rather important part of the y2038 work: If anyone cares about
y2038 problems in this decade, they want to deploy systems with extremely
long service contracts and don't want to update them 20 years from now
to fix an obscure bug that could be prevented today, so we try hard to
identify every line of code that won't work then as it does today.

> > For a file system, this can be done in a number of ways:
> > 
> > * Most file systems today interpret the time as an unsigned 32-bit
> >   number (as opposed to signed as ext3, xfs and few others do),
> >   so as long as we use timespec64 in the syscalls, we are ok.
> 
> Actually, sign conversion is a problem we currently have to be very
> careful of.  See, for example, xfstests:tests/generic/258, which
> tests timestamps recording times before epoch. i.e. in XFS we have
> to convert the unsigned 32 bit disk timestamp to signed 32 bit
> before storing it in the VFS timestamp so it behaves correctly on 64
> bit systems. This results in us needing to do this when reading the
> inode from disk:
> 
> +       /*
> +        * time is signed, so need to convert to signed 32 bit before
> +        * storing in inode timestamp which may be 64 bit. Otherwise
> +        * a time before epoch is converted to a time long after epoch
> +        * on 64 bit systems.
> +        */
> +       inode->i_atime.tv_sec = (int)be32_to_cpu(from->di_atime.t_sec);
> +       inode->i_atime.tv_nsec = (int)be32_to_cpu(from->di_atime.t_nsec);
> +       inode->i_mtime.tv_sec = (int)be32_to_cpu(from->di_mtime.t_sec);
> +       inode->i_mtime.tv_nsec = (int)be32_to_cpu(from->di_mtime.t_nsec);
> +       inode->i_ctime.tv_sec = (int)be32_to_cpu(from->di_ctime.t_sec);
> +       inode->i_ctime.tv_nsec = (int)be32_to_cpu(from->di_ctime.t_nsec);
> +
> (http://oss.sgi.com/archives/xfs/2016-01/msg00456.html)

I'm very aware of this issue. Most file system developers however were
not, so e.g. a timestamp on btrfs is interpreted differently on 32-bit
and 64-bit kernels. Some file systems (AFS, NFSv3?) explicitly define
the timestamps as unsigned, and most others that store 32-bit seconds
apparently never thought about the issue and happen to use unsigned
interpretation on 64-bit systems, since that is what you get out of
be32_to_cpu() without adding the cast.

For the y2038 case, this means we are lucky: almost all the users today
have 64-bit hardware, so they can already represent timestamps in the
range from 1970 to 2106. Once we have 64-bit timestamps in 32-bit user
space, we just need to make that use the same format we use on the
64-bit machines already.

ext2/3/4, xfs and ocfs2 (maybe one or two more, I'd have to check)
currently behave in a consistent manner across 32-bit and 64-bit
architectures by allowing a range between 1902 and 2037, and we
obviously don't have a choice there but to keep that current
behavior, and extend the time format in one way or another to
store additional bits for the epoch.

> > * Some legacy file systems (maybe hfs) can remain disabled, as
> >   nobody cares about them any more.
> > 
> > * If we still care about them (e.g. ext2), we can make them support
> >   only read-only mode. In ext4, this would mean forbidding write
> >   access to file systems that don't have the extended inode format
> >   enabled.
> 
> For ext2/4, that would have to be handled internally by the
> filesystem with feature masks. For other legacy filesystems, then
> the VFS mount time checking could allow RO mounts if the supported
> ranges are not y2038k clean. Compile time options are not
> really the best approach here...

I'm not following the line of thought here. We have some users
that want ext4 to mount old file system images without long
inodes writable, because they don't care about the 2038 problem.
We also have other users that want to force the same file system
image to be read-only because they want to ensure that it does
not stop working correctly when the time overflow happens while
the fs is mounted.

If you don't want a compile-time option for it, how do you suggest
we decide which case we have?

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Jan. 14, 2016, 10:54 p.m. UTC | #13
On Thursday 14 January 2016 23:46:16 Arnd Bergmann wrote:
> 
> I'm not following the line of thought here. We have some users
> that want ext4 to mount old file system images without long
> inodes writable, because they don't care about the 2038 problem.
> We also have other users that want to force the same file system
> image to be read-only because they want to ensure that it does
> not stop working correctly when the time overflow happens while
> the fs is mounted.
> 
> If you don't want a compile-time option for it, how do you suggest
> we decide which case we have?

In case that came across wrong, I'm assuming that the first
user also wants all the system calls enabled that pass 32-bit
time_t values, while the second one wants them all left out from
the kernel to ensure that no user space program gets incorrect
data. This could be done using a sysctl of course, but I still
think we want a compile-time option for the syscalls for clarity,
and I would simply use the same compile-time option to determine
the behavior of the file system, network protocols and device
drivers that deal with 32-bit timestamps outside of the kernel.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Jan. 15, 2016, 2:27 a.m. UTC | #14
On Thu, Jan 14, 2016 at 11:54:36PM +0100, Arnd Bergmann wrote:
> On Thursday 14 January 2016 23:46:16 Arnd Bergmann wrote:
> > 
> > I'm not following the line of thought here. We have some users
> > that want ext4 to mount old file system images without long
> > inodes writable, because they don't care about the 2038 problem.
> > We also have other users that want to force the same file system
> > image to be read-only because they want to ensure that it does
> > not stop working correctly when the time overflow happens while
> > the fs is mounted.
> > 
> > If you don't want a compile-time option for it, how do you suggest
> > we decide which case we have?
> 
> In case that came across wrong, I'm assuming that the first
> user also wants all the system calls enabled that pass 32-bit
> time_t values, while the second one wants them all left out from
> the kernel to ensure that no user space program gets incorrect
> data.

system call API support is a completely different class of problem.
It's out of the scope of this patchset, and really I don't care what
you do with them.

The point I'm making is that we'll have to modify all the existing
filesystem code to supply a valid timestamp range to the VFS at
mount time for the range checking/clamping, similar to how we do the
granularity specification right now. That means we can do rejection
of non-y2038k compliant filesystems at runtime based on what the
filesystem tells the VFS it supports..  Set up the default to be
reject if rw, allow if ro, and provide a mount option to override ad
allow mounting rw.

Users can then make the decision when mounting their filesystems.
If they are system/automatically mounted filesystems and aren't
y2038k compliant, then the override option can be added to
/etc/fstab and we're all good. If the truly paranoid users want to
disallow the override and/or read only mount options, then add a
sysctl to control that.

Cheers,

Dave.
Dave Chinner Jan. 15, 2016, 2:49 a.m. UTC | #15
On Thu, Jan 14, 2016 at 11:46:16PM +0100, Arnd Bergmann wrote:
> On Friday 15 January 2016 08:00:01 Dave Chinner wrote:
> > On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote:
> > > On Thursday 14 January 2016 08:04:36 Dave Chinner wrote:
> > > > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote:
> > > c) The opposite direction from b) is to first change the common
> > >    code, but then any direct assignment between a timespec in
> > >    a file system and the timespec64 in the inode/iattr/kstat/etc
> > >    first needs a conversion helper so we can build cleanly,
> > >    and then we do one file system at a time to remove them all
> > >    again while changing the internal structures in the
> > >    file system from timespec to timespec64.   
> > 
> > No new helpers are necessary - we've already got the helper
> > functions we need. This:
> > 
> > int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
> >  {
> >         struct inode *inode = d_inode(old_dentry);
> > +       struct inode_timespec now = current_fs_time(inode->i_sb);
> > 
> > -       inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
> > +       VFS_INODE_SET_XTIME(i_ctime, inode, now);
> > +       VFS_INODE_SET_XTIME(i_mtime, dir, now);
> > +       VFS_INODE_SET_XTIME(i_ctime, dir, now);
> >         inc_nlink(inode);
> > .....
> > 
> > is just wrong. All the type conversion and clamping and checking
> > done in that VFS_INODE_SET_XTIME() should be done in
> > current_fs_time() and have it return a timespec64 directly. Indeed,
> > it already does truncation, and can easily be made to do range
> > clamping, too.  i.e.  the change should simply be:
> >
> > -       inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
> > +	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_fs_time(inode->i_sb);
> 
> 
> Yes, that is the obvious case, and I guess works for at least half the
> file systems when they always assign righthand side and lefthand
> side of the time stamps using the external types or helpers like
> CURRENT_TIME and current_fs_time().
> 
> However, there are a couple of file systems that need a bit more refactoring
> before we can do this, e.g. in ntfs_truncate:

Sure, and nfs is a pain because of all it's internal use of
timespecs, too. But we have these timespec_to_timespec64 helper
functions, and that's what we should use in these cases where the
filesystem cannot support full 64 bit timestamps internally. In
those cases, they'll be telling the superblock this at mount time
things like current_fs_time() won't be returning then a timestamp
that is out of range for a 32 bit timestamp....

>         if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) {
>                 struct timespec now = current_fs_time(VFS_I(base_ni)->i_sb);
>                 int sync_it = 0;
> 
>                 if (!timespec_equal(&VFS_I(base_ni)->i_mtime, &now) ||
>                     !timespec_equal(&VFS_I(base_ni)->i_ctime, &now))
>                         sync_it = 1;
>                 VFS_I(base_ni)->i_mtime = now;
>                 VFS_I(base_ni)->i_ctime = now;
> 	}
> 
> The type of the local variable must match the return code of
> current_fs_time(), so if we change over i_mtime and current_fs_time
> globally, this either has to be rewritten first to avoid the use of
> local variables, or it needs temporary conversion helpers, or
> it has to be changed in the same patch. None of those is particularly
> appealing. There are a few dozen such things in various file systems.

it gets rewritten to:

		struct timespec now;

		now = timespec64_to_timespec(current_fs_time(VFS_I(base_ni)->i_sb));
....

and the valid timestamp range for ntfs is set to 32bit timestamps.
This then leaves it up to the filesystem developers to make
the ntfs filesystem code 64 bit timestamp clean if the on disk
format is ever changed to support 64 bit times.

Same goes for NFS, and any of the other filesystems that use struct
timespec internally for time representation.

> > > > I think it's a mix - if the timestamps come in from userspace,
> > > > fail with ERANGE. That could be controlled by sysctl via VFS
> > > > part of the ->setattr operation, or in each of the individual FS
> > > > implementations. If they come from the kernel (e.g. atime update) then
> > > > the generic behvaiour is to warn and continue, filesystems can
> > > > otherwise select their own policy for kernel updates via
> > > > ->update_time.
> > > 
> > > I'd prefer not to have it done by the individual file system
> > > implementation, so we get a consistent behavior.
> > 
> > Can't be helped, because different filesystems have different
> > timestamp behaviours, and not all use the generic VFS code for
> > timestamp updates. The filesystems need to use the correct helper
> > functions to obtain a valid current time, but you can't stop them
> > from storing and using arbitrary timestamp formats if they so
> > desire...
> 
> I mean the decision whether to clamp or error on an overflow
> should be done consistently.

Sure, but that comes from using the helpers we already have, and
applying the clamping at a point where errors can be returned to
userspace. current_fs_time() should never return a timestamp outside
what the filesystem has said it supports and we've validated that
behaviour at mount time. hence it's only user provided timestamps
that we need range errors on.

> Having a global sysctl knob or
> a compile-time option is better than having each file system
> implementor take a guess at what users might prefer, if we can't
> come up with a behavior (e.g. clamp all the time, or error out
> all the time) that everybody agrees is always correct.

filesystem implementors will use the helper funtions that are
provided for this. If they don't (like all the current use of
CURRENT_TIME), then that's a bug that needs fixing. i.e. we need
a timespec_clamp() function, similar to timespec_trunc(), and y2038k
compliant filesystems and syscalls need to use them....

> > > > 
> > > > Not really an option, because it means we can't use filesystems that
> > > > interop with other systems (e.g. cameras, etc) because they won't
> > > > support y2038k timestamps for a long time, if ever (e.g. vfat).
> > > 
> > > Let me clarify what my idea is here: I want a global kernel option
> > > that disables all code that has known y2038 issues. If anyone tries
> > > to build an embedded system with support beyond 2038, that should
> > > disable all of those things, including file systems, drivers and
> > > system calls, so we can reasonably assume that everything that works
> > > today with that kernel build will keep working in the future and
> > > not break in random ways.
> > 
> > It's not that black and white when it comes to filesystems. y2038k
> > support is determined by the on-disk structure of the filesystem
> > being mounted, and that is determined at mount time.  When the
> > filesystem mounts and sets it's valid timestamp ranges the VFS
> > will need to decide as to whether the filesystem is allowed to
> > continue mounting or not.
> 
> Some file systems are always broken around 2038 (e.g. HFS in 2040),
> so if we can't fix them, I want to be able to turn them off
> in Kconfig along with the 32-bit time_t syscalls.

That can be done with kconfig depends rules - it has nothing to do
with this patch set.

> ext2/3/4, xfs and ocfs2 (maybe one or two more, I'd have to check)
> currently behave in a consistent manner across 32-bit and 64-bit
> architectures by allowing a range between 1902 and 2037, and we
> obviously don't have a choice there but to keep that current
> behavior, and extend the time format in one way or another to
> store additional bits for the epoch.

That's a filesystem implementation problem, not a generic inode
timestamp problem. i.e. this is handled when the filesystem converts
the inode timestamp from a timespec64 in the struct inode to
whatever format it stores the timestamp on disk. That conversion
does not change just because the VFS inode moves from a timespec to
a timespec64. Again, those on-disk format changes to support beyond
the current epoch are outside the scope of this patchset, because
they are not affected by the timestamp format the VFS choses to use.

Cheers,

Dave.
Deepa Dinamani Jan. 15, 2016, 5:03 a.m. UTC | #16
On Fri, Jan 15, 2016 at 08:00:01AM +1100, Dave Chinner wrote:
> On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote:
> > On Thursday 14 January 2016 08:04:36 Dave Chinner wrote:
> > > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote:
> > > > On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote:
> > > > > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote:
> > > > > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote:
> > > > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:
> > > >
> > c) The opposite direction from b) is to first change the common
> >    code, but then any direct assignment between a timespec in
> >    a file system and the timespec64 in the inode/iattr/kstat/etc
> >    first needs a conversion helper so we can build cleanly,
> >    and then we do one file system at a time to remove them all
> >    again while changing the internal structures in the
> >    file system from timespec to timespec64.   
> 
> No new helpers are necessary - we've already got the helper
> functions we need. This:
> 
> int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
>  {
>         struct inode *inode = d_inode(old_dentry);
> +       struct inode_timespec now = current_fs_time(inode->i_sb);
> 
> -       inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
> +       VFS_INODE_SET_XTIME(i_ctime, inode, now);
> +       VFS_INODE_SET_XTIME(i_mtime, dir, now);
> +       VFS_INODE_SET_XTIME(i_ctime, dir, now);
>         inc_nlink(inode);
> .....
> 
> is just wrong. All the type conversion and clamping and checking
> done in that VFS_INODE_SET_XTIME() should be done in
> current_fs_time() and have it return a timespec64 directly. Indeed,
> it already does truncation, and can easily be made to do range
> clamping, too.  i.e.  the change should simply be:
> 
> -       inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
> +	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_fs_time(inode->i_sb);

The current patch series already does this and macros don't do any clamping.
The accesors are only illustrating a way of using a callback when clamping is detected.
And, if we don't need accessors, I will find a different way of doing that if we agree
it is necessary.

+struct timespec64 current_fs_time(struct super_block *sb)
+{
+       struct timespec64 now = current_kernel_time64();
+
+       return fs_time_trunc(now, sb);
+}
+EXPORT_SYMBOL(current_fs_time);
+
+struct timespec64 current_fs_time_sec(struct super_block *sb)
+{
+       struct timespec64 ts = {ktime_get_real_seconds(), 0};
+
+       /* range check for time. */
+       fs_time_range_check(sb, &ts);
+
+       return ts;
+}

+struct inode_timespec
+fs_time_trunc(struct inode_timespec t, struct super_block *sb)
+{
+       u32 gran = sb->s_time_gran;
+
+       /* range check for time. */
+       fs_time_range_check(sb, &t);
+       if (unlikely(is_fs_timestamp_bad(t)))
+               return t;
+
+       /* Avoid division in the common cases 1 ns and 1 s. */
+       if (gran == 1)
+               ;/* nothing */
+       else if (gran == NSEC_PER_SEC)
+               t.tv_nsec = 0;
+       else if (gran > 1 && gran < NSEC_PER_SEC)
+               t.tv_nsec -= t.tv_nsec % gran;
+       else
+               WARN(1, "illegal file time granularity: %u", gran);
+
+       return t;
+}

But really, you are still misunderstanding what inode_timespec does.
It only introduces aliases and not the accessors.
This is only a way to avoid code duplication since we cannot change all fs
at one time.

And, this is what you would do if you were coding in any object oriented
language. This is object polymorphism.

So, I will paste here again. Whatever is below is all the extra code
inode_timespec introduces:

+#ifdef CONFIG_FS_USES_64BIT_TIME
+
+/* Place holder defines until CONFIG_FS_USES_64BIT_TIME
+ * is enabled.
+ * timespec64 data type and functions will be used at that
+ * time directly and these defines will be deleted.
+ */
+#define inode_timespec timespec64
+
+#define inode_timespec_compare timespec64_compare
+#define inode_timespec_equal   timespec64_equal
+
+#else
+
+#define inode_timespec timespec
+
+#define inode_timespec_compare timespec_compare
+#define inode_timespec_equal   timespec_equal
+
+#endif
+

-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Jan. 15, 2016, 4:50 p.m. UTC | #17
On Friday 15 January 2016 13:49:37 Dave Chinner wrote:
> On Thu, Jan 14, 2016 at 11:46:16PM +0100, Arnd Bergmann wrote:
> > On Friday 15 January 2016 08:00:01 Dave Chinner wrote:
> > > On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote:
> > > > On Thursday 14 January 2016 08:04:36 Dave Chinner wrote:
> > 
> > Yes, that is the obvious case, and I guess works for at least half the
> > file systems when they always assign righthand side and lefthand
> > side of the time stamps using the external types or helpers like
> > CURRENT_TIME and current_fs_time().
> > 
> > However, there are a couple of file systems that need a bit more refactoring
> > before we can do this, e.g. in ntfs_truncate:
> 
> Sure, and nfs is a pain because of all it's internal use of
> timespecs, too.

lustre is probably the worst.

> But we have these timespec_to_timespec64 helper
> functions, and that's what we should use in these cases where the
> filesystem cannot support full 64 bit timestamps internally. In
> those cases, they'll be telling the superblock this at mount time
> things like current_fs_time() won't be returning then a timestamp
> that is out of range for a 32 bit timestamp....

I'm not worried about the runtime problems here, just how to
get a series of patches that are each doing one reasonable thing
at a time.

> >         if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) {
> >                 struct timespec now = current_fs_time(VFS_I(base_ni)->i_sb);
> >                 int sync_it = 0;
> > 
> >                 if (!timespec_equal(&VFS_I(base_ni)->i_mtime, &now) ||
> >                     !timespec_equal(&VFS_I(base_ni)->i_ctime, &now))
> >                         sync_it = 1;
> >                 VFS_I(base_ni)->i_mtime = now;
> >                 VFS_I(base_ni)->i_ctime = now;
> > 	}
> > 
> > The type of the local variable must match the return code of
> > current_fs_time(), so if we change over i_mtime and current_fs_time
> > globally, this either has to be rewritten first to avoid the use of
> > local variables, or it needs temporary conversion helpers, or
> > it has to be changed in the same patch. None of those is particularly
> > appealing. There are a few dozen such things in various file systems.
> 
> it gets rewritten to:
> 
> 		struct timespec now;
> 
> 		now = timespec64_to_timespec(current_fs_time(VFS_I(base_ni)->i_sb));
> ....
> 
> and the valid timestamp range for ntfs is set to 32bit timestamps.
> This then leaves it up to the filesystem developers to make
> the ntfs filesystem code 64 bit timestamp clean if the on disk
> format is ever changed to support 64 bit times.
> 
> Same goes for NFS, and any of the other filesystems that use struct
> timespec internally for time representation.

That is what I meant in my previous mail about approach c) being ugly
because it requires sprinkling lots of timespec64_to_timespec() and
timespec_to_timespec64() in the initial patch in order to atomically
change the types in inode/iattr/kstat/... without introducing build
regressions.

It's a rather horrible patch, and quite likely to cause conflicts with
other patches that introduce another use of those structures in the
merge window.

> > Having a global sysctl knob or
> > a compile-time option is better than having each file system
> > implementor take a guess at what users might prefer, if we can't
> > come up with a behavior (e.g. clamp all the time, or error out
> > all the time) that everybody agrees is always correct.
> 
> filesystem implementors will use the helper funtions that are
> provided for this. If they don't (like all the current use of
> CURRENT_TIME), then that's a bug that needs fixing.

Ok, then we are in total agreement here: the policy remains
to be decided by common code, but the implementation can differ
per file system.

> i.e. we need
> a timespec_clamp() function, similar to timespec_trunc(), and y2038k
> compliant filesystems and syscalls need to use them....

I was thinking we end up with a single function that does both
clamp() and trunk(), but that's an implementation detail.
> > > > Let me clarify what my idea is here: I want a global kernel option
> > > > that disables all code that has known y2038 issues. If anyone tries
> > > > to build an embedded system with support beyond 2038, that should
> > > > disable all of those things, including file systems, drivers and
> > > > system calls, so we can reasonably assume that everything that works
> > > > today with that kernel build will keep working in the future and
> > > > not break in random ways.
> > > 
> > > It's not that black and white when it comes to filesystems. y2038k
> > > support is determined by the on-disk structure of the filesystem
> > > being mounted, and that is determined at mount time.  When the
> > > filesystem mounts and sets it's valid timestamp ranges the VFS
> > > will need to decide as to whether the filesystem is allowed to
> > > continue mounting or not.
> > 
> > Some file systems are always broken around 2038 (e.g. HFS in 2040),
> > so if we can't fix them, I want to be able to turn them off
> > in Kconfig along with the 32-bit time_t syscalls.
> 
> That can be done with kconfig depends rules - it has nothing to do
> with this patch set.

kconfig dependencies is what I meant for the simple cases where a
file system is known to always be broken, we just need a small
modification for the cases you mentioned below.

> > ext2/3/4, xfs and ocfs2 (maybe one or two more, I'd have to check)
> > currently behave in a consistent manner across 32-bit and 64-bit
> > architectures by allowing a range between 1902 and 2037, and we
> > obviously don't have a choice there but to keep that current
> > behavior, and extend the time format in one way or another to
> > store additional bits for the epoch.
> 
> That's a filesystem implementation problem, not a generic inode
> timestamp problem. i.e. this is handled when the filesystem converts
> the inode timestamp from a timespec64 in the struct inode to
> whatever format it stores the timestamp on disk. That conversion
> does not change just because the VFS inode moves from a timespec to
> a timespec64. Again, those on-disk format changes to support beyond
> the current epoch are outside the scope of this patchset, because
> they are not affected by the timestamp format the VFS choses to use.

Fine with me, we can have another series to add the Kconfig dependencies
and modify the file systems that need this.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Jan. 15, 2016, 5:01 p.m. UTC | #18
On Friday 15 January 2016 13:27:34 Dave Chinner wrote:
> On Thu, Jan 14, 2016 at 11:54:36PM +0100, Arnd Bergmann wrote:
> > On Thursday 14 January 2016 23:46:16 Arnd Bergmann wrote:
> > > 
> > > I'm not following the line of thought here. We have some users
> > > that want ext4 to mount old file system images without long
> > > inodes writable, because they don't care about the 2038 problem.
> > > We also have other users that want to force the same file system
> > > image to be read-only because they want to ensure that it does
> > > not stop working correctly when the time overflow happens while
> > > the fs is mounted.
> > > 
> > > If you don't want a compile-time option for it, how do you suggest
> > > we decide which case we have?
> > 
> > In case that came across wrong, I'm assuming that the first
> > user also wants all the system calls enabled that pass 32-bit
> > time_t values, while the second one wants them all left out from
> > the kernel to ensure that no user space program gets incorrect
> > data.
> 
> system call API support is a completely different class of problem.
> It's out of the scope of this patchset, and really I don't care what
> you do with them.

Sure, I was just providing some background about why we want a
compile-time option in general.

> The point I'm making is that we'll have to modify all the existing
> filesystem code to supply a valid timestamp range to the VFS at
> mount time for the range checking/clamping, similar to how we do the
> granularity specification right now. That means we can do rejection
> of non-y2038k compliant filesystems at runtime based on what the
> filesystem tells the VFS it supports..  Set up the default to be
> reject if rw, allow if ro, and provide a mount option to override ad
> allow mounting rw.

We can't really default to "reject if rw", because that would break
all systems using ext3 or xfs, unless users modify their fstab
or set the flag that makes the partition y2038 compliant.

The compile-time option that I'm thinking of would change the default
beween "always allow" and "reject if rw", based on whether the
system cares about this issue or not. Almost everyone today won't
care about it at all and would be rather annoyed by being unable
to mount their rootfs, but some people care about the behavior
a lot.

Having a global sysctl or mount option as an override would be good,
maybe both if that isn't over-engineering the problem when we already
have a compile-time option. 

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Jan. 15, 2016, 10:41 p.m. UTC | #19
On Fri, Jan 15, 2016 at 06:01:36PM +0100, Arnd Bergmann wrote:
> On Friday 15 January 2016 13:27:34 Dave Chinner wrote:
> > The point I'm making is that we'll have to modify all the existing
> > filesystem code to supply a valid timestamp range to the VFS at
> > mount time for the range checking/clamping, similar to how we do the
> > granularity specification right now. That means we can do rejection
> > of non-y2038k compliant filesystems at runtime based on what the
> > filesystem tells the VFS it supports..  Set up the default to be
> > reject if rw, allow if ro, and provide a mount option to override ad
> > allow mounting rw.
> 
> We can't really default to "reject if rw", because that would break
> all systems using ext3 or xfs, unless users modify their fstab
> or set the flag that makes the partition y2038 compliant.

Right, I was refering to the behaviour of a y2038k compliant kernel,
A current non-compliant kernel will have the default behaviour you
are suggesting.

> The compile-time option that I'm thinking of would change the default
> beween "always allow" and "reject if rw", based on whether the
> system cares about this issue or not. Almost everyone today won't
> care about it at all and would be rather annoyed by being unable
> to mount their rootfs, but some people care about the behavior
> a lot.

Yup, that's exactly what I was implying.

> Having a global sysctl or mount option as an override would be good,
> maybe both if that isn't over-engineering the problem when we already
> have a compile-time option. 

Distros should not be forces to ship multiple kernels just to
provide all the different runtime compliance behaviours their users
require. Make the policy runtime enforcable, but select the default
behaviour and supported policies via compile time options.

Cheers,

Dave
Andreas Dilger Jan. 16, 2016, 7:14 p.m. UTC | #20
> On Jan 15, 2016, at 9:50 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> 
> On Friday 15 January 2016 13:49:37 Dave Chinner wrote:
>> On Thu, Jan 14, 2016 at 11:46:16PM +0100, Arnd Bergmann wrote:
>>> On Friday 15 January 2016 08:00:01 Dave Chinner wrote:
>>>> On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote:
>>>>> On Thursday 14 January 2016 08:04:36 Dave Chinner wrote:
>>> 
>>> Yes, that is the obvious case, and I guess works for at least half the
>>> file systems when they always assign righthand side and lefthand
>>> side of the time stamps using the external types or helpers like
>>> CURRENT_TIME and current_fs_time().
>>> 
>>> However, there are a couple of file systems that need a bit more refactoring
>>> before we can do this, e.g. in ntfs_truncate:
>> 
>> Sure, and nfs is a pain because of all it's internal use of
>> timespecs, too.
> 
> lustre is probably the worst.

Lustre currently only has one-second granularity in a 64-bit field,
so it doesn't really care about the difference between timespec or
timespec64 at all.

The only other uses are for measuring relative times, so the 64-bitness
shouldn't really matter.

Could you please point out what issues exist so they can be fixed.

Cheers, Andreas

>> But we have these timespec_to_timespec64 helper
>> functions, and that's what we should use in these cases where the
>> filesystem cannot support full 64 bit timestamps internally. In
>> those cases, they'll be telling the superblock this at mount time
>> things like current_fs_time() won't be returning then a timestamp
>> that is out of range for a 32 bit timestamp....
> 
> I'm not worried about the runtime problems here, just how to
> get a series of patches that are each doing one reasonable thing
> at a time.
> 
>>>        if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) {
>>>                struct timespec now = current_fs_time(VFS_I(base_ni)->i_sb);
>>>                int sync_it = 0;
>>> 
>>>                if (!timespec_equal(&VFS_I(base_ni)->i_mtime, &now) ||
>>>                    !timespec_equal(&VFS_I(base_ni)->i_ctime, &now))
>>>                        sync_it = 1;
>>>                VFS_I(base_ni)->i_mtime = now;
>>>                VFS_I(base_ni)->i_ctime = now;
>>> 	}
>>> 
>>> The type of the local variable must match the return code of
>>> current_fs_time(), so if we change over i_mtime and current_fs_time
>>> globally, this either has to be rewritten first to avoid the use of
>>> local variables, or it needs temporary conversion helpers, or
>>> it has to be changed in the same patch. None of those is particularly
>>> appealing. There are a few dozen such things in various file systems.
>> 
>> it gets rewritten to:
>> 
>> 		struct timespec now;
>> 
>> 		now = timespec64_to_timespec(current_fs_time(VFS_I(base_ni)->i_sb));
>> ....
>> 
>> and the valid timestamp range for ntfs is set to 32bit timestamps.
>> This then leaves it up to the filesystem developers to make
>> the ntfs filesystem code 64 bit timestamp clean if the on disk
>> format is ever changed to support 64 bit times.
>> 
>> Same goes for NFS, and any of the other filesystems that use struct
>> timespec internally for time representation.
> 
> That is what I meant in my previous mail about approach c) being ugly
> because it requires sprinkling lots of timespec64_to_timespec() and
> timespec_to_timespec64() in the initial patch in order to atomically
> change the types in inode/iattr/kstat/... without introducing build
> regressions.
> 
> It's a rather horrible patch, and quite likely to cause conflicts with
> other patches that introduce another use of those structures in the
> merge window.
> 
>>> Having a global sysctl knob or
>>> a compile-time option is better than having each file system
>>> implementor take a guess at what users might prefer, if we can't
>>> come up with a behavior (e.g. clamp all the time, or error out
>>> all the time) that everybody agrees is always correct.
>> 
>> filesystem implementors will use the helper funtions that are
>> provided for this. If they don't (like all the current use of
>> CURRENT_TIME), then that's a bug that needs fixing.
> 
> Ok, then we are in total agreement here: the policy remains
> to be decided by common code, but the implementation can differ
> per file system.
> 
>> i.e. we need
>> a timespec_clamp() function, similar to timespec_trunc(), and y2038k
>> compliant filesystems and syscalls need to use them....
> 
> I was thinking we end up with a single function that does both
> clamp() and trunk(), but that's an implementation detail.
>>>>> Let me clarify what my idea is here: I want a global kernel option
>>>>> that disables all code that has known y2038 issues. If anyone tries
>>>>> to build an embedded system with support beyond 2038, that should
>>>>> disable all of those things, including file systems, drivers and
>>>>> system calls, so we can reasonably assume that everything that works
>>>>> today with that kernel build will keep working in the future and
>>>>> not break in random ways.
>>>> 
>>>> It's not that black and white when it comes to filesystems. y2038k
>>>> support is determined by the on-disk structure of the filesystem
>>>> being mounted, and that is determined at mount time.  When the
>>>> filesystem mounts and sets it's valid timestamp ranges the VFS
>>>> will need to decide as to whether the filesystem is allowed to
>>>> continue mounting or not.
>>> 
>>> Some file systems are always broken around 2038 (e.g. HFS in 2040),
>>> so if we can't fix them, I want to be able to turn them off
>>> in Kconfig along with the 32-bit time_t syscalls.
>> 
>> That can be done with kconfig depends rules - it has nothing to do
>> with this patch set.
> 
> kconfig dependencies is what I meant for the simple cases where a
> file system is known to always be broken, we just need a small
> modification for the cases you mentioned below.
> 
>>> ext2/3/4, xfs and ocfs2 (maybe one or two more, I'd have to check)
>>> currently behave in a consistent manner across 32-bit and 64-bit
>>> architectures by allowing a range between 1902 and 2037, and we
>>> obviously don't have a choice there but to keep that current
>>> behavior, and extend the time format in one way or another to
>>> store additional bits for the epoch.
>> 
>> That's a filesystem implementation problem, not a generic inode
>> timestamp problem. i.e. this is handled when the filesystem converts
>> the inode timestamp from a timespec64 in the struct inode to
>> whatever format it stores the timestamp on disk. That conversion
>> does not change just because the VFS inode moves from a timespec to
>> a timespec64. Again, those on-disk format changes to support beyond
>> the current epoch are outside the scope of this patchset, because
>> they are not affected by the timestamp format the VFS choses to use.
> 
> Fine with me, we can have another series to add the Kconfig dependencies
> and modify the file systems that need this.
> 
> 	Arnd
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas
Arnd Bergmann Jan. 16, 2016, 11:36 p.m. UTC | #21
On Saturday 16 January 2016 12:14:22 Andreas Dilger wrote:
> >> 
> >> Sure, and nfs is a pain because of all it's internal use of
> >> timespecs, too.
> > 
> > lustre is probably the worst.
> 
> Lustre currently only has one-second granularity in a 64-bit field,
> so it doesn't really care about the difference between timespec or
> timespec64 at all.
> 
> The only other uses are for measuring relative times, so the 64-bitness
> shouldn't really matter.
> 
> Could you please point out what issues exist so they can be fixed.

It's not really a bug that needs to be fixed, but more the general
issue of referencing inode->i_?time and attr->ia_?time and passing
them around. When we change the types in the inode and iattr from
timespec to timespec64, all assigments need to be modified, and lustre
has more of those assignments than any other file system I'm aware of.

	Arnd

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger Jan. 17, 2016, 2:30 a.m. UTC | #22
On Jan 16, 2016, at 4:36 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> 
> On Saturday 16 January 2016 12:14:22 Andreas Dilger wrote:
>>>> 
>>>> Sure, and nfs is a pain because of all it's internal use of
>>>> timespecs, too.
>>> 
>>> lustre is probably the worst.
>> 
>> Lustre currently only has one-second granularity in a 64-bit field,
>> so it doesn't really care about the difference between timespec or
>> timespec64 at all.
>> 
>> The only other uses are for measuring relative times, so the 64-bitness
>> shouldn't really matter.
>> 
>> Could you please point out what issues exist so they can be fixed.
> 
> It's not really a bug that needs to be fixed, but more the general
> issue of referencing inode->i_?time and attr->ia_?time and passing
> them around. When we change the types in the inode and iattr from
> timespec to timespec64, all assigments need to be modified, and lustre
> has more of those assignments than any other file system I'm aware of.

All of those accesses are with the LTIME_S() macro to get/set only the
seconds field of the inode time, so it should only be a one-line patch?

Cheers, Andreas
Deepa Dinamani Jan. 18, 2016, 6:09 a.m. UTC | #23
Based on the discussion, here is how I propose to proceed:

1. Series for timestamp range check and clamping
2. Bug fixing patches like change all CURRENT_TIME use cases to 
current_fs_time()
3. Patches for vfs to use timespec64 internally (maybe a series, if 
required)
4. Patches that change all fs that use vfs APIs using timestamp arguments
(not a series)
5. Change individual fs to use timespec64 (not a series)
6. Change back whatever time conversion APIs left in vfs or individual fs 
(maybe a series, if required)

So, I don't see a need for submitting another series as all the changes now 
are handled on a case by case basis and no longer have a generic theme.

If everyone's in sync then I can proceed with the above plan.

-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Jan. 18, 2016, 10:56 a.m. UTC | #24
On Sunday 17 January 2016 22:09:26 Deepa Dinamani wrote:
> Based on the discussion, here is how I propose to proceed:
> 
> 1. Series for timestamp range check and clamping
> 2. Bug fixing patches like change all CURRENT_TIME use cases to 
> current_fs_time()
> 3. Patches for vfs to use timespec64 internally (maybe a series, if 
> required)
> 4. Patches that change all fs that use vfs APIs using timestamp arguments
> (not a series)
> 5. Change individual fs to use timespec64 (not a series)
> 6. Change back whatever time conversion APIs left in vfs or individual fs 
> (maybe a series, if required)
> 
> So, I don't see a need for submitting another series as all the changes now 
> are handled on a case by case basis and no longer have a generic theme.
> 
> If everyone's in sync then I can proceed with the above plan.

Sounds good to me. Step 3 of course is the hard one, and you may run into
further problems with it, as we both have in our previous attempts to
crack this nut, but with step 2 before it that may become manageable.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Deepa Dinamani Jan. 18, 2016, 5:40 p.m. UTC | #25
On Mon, Jan 18, 2016 at 2:56 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sunday 17 January 2016 22:09:26 Deepa Dinamani wrote:
>> Based on the discussion, here is how I propose to proceed:
>>
>> 1. Series for timestamp range check and clamping
>> 2. Bug fixing patches like change all CURRENT_TIME use cases to
>> current_fs_time()
>> 3. Patches for vfs to use timespec64 internally (maybe a series, if
>> required)
>> 4. Patches that change all fs that use vfs APIs using timestamp arguments
>> (not a series)
>> 5. Change individual fs to use timespec64 (not a series)
>> 6. Change back whatever time conversion APIs left in vfs or individual fs
>> (maybe a series, if required)
>>
>> So, I don't see a need for submitting another series as all the changes now
>> are handled on a case by case basis and no longer have a generic theme.
>>
>> If everyone's in sync then I can proceed with the above plan.
>
> Sounds good to me. Step 3 of course is the hard one, and you may run into
> further problems with it, as we both have in our previous attempts to
> crack this nut, but with step 2 before it that may become manageable.

Right, I don't agree with this approach and it will get very ugly.
I was just proposing a way to move forward because it looked like we are at
a stalemate.

Maybe xfs doesn't have these problems but some of the other fs-es do.
And, these will need changing twice: before(to use 64 bit arithmetic
like cifs, use current_fs_time() like fat etc) and along with vfs.

It will unnecessarily bloat the vfs switching to timespec64 code.
Below are 3 example filesystem changes that illustrates this problem:

Ext4:
1. cr_time
2. Encode and Decode api's

Both these ext4 changes need to made along with vfs change to ext4.
Many such fs exists and will make the vfs switch over very ugly.

FAT:
1. fat_time_fat2unix, fat_time_unix2fat

Both the above 2 functions also will have to be modified along with vfs.

CIFS:
1.  struct cifs_fscache_inode_auxdata - last_write_time, last_change_time
2.  cifs_fattr
3.  cifs_NTtimeToUnix, cifs_UnixTimeToNT, cnvrtDosUnixTm

All the above cifs changes also need to be changed in the same patch as
vfs switch to timespec64.

I don't think there is any nicer way to do this without having an
encapsulation layer like inode_timespec or accessors you mentioned to
change the underlying data type in the vfs.

Also, this scheme is so outrageously ugly that you can easily miss
some change.  There is no way of verifying the approach theoretically.
Of course, I will be using kernel tests like in other cases.

-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Jan. 18, 2016, 7:53 p.m. UTC | #26
On Monday 18 January 2016 09:40:12 Deepa Dinamani wrote:
> On Mon, Jan 18, 2016 at 2:56 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Sunday 17 January 2016 22:09:26 Deepa Dinamani wrote:
> >> Based on the discussion, here is how I propose to proceed:
> > Sounds good to me. Step 3 of course is the hard one, and you may run into
> > further problems with it, as we both have in our previous attempts to
> > crack this nut, but with step 2 before it that may become manageable.
> 
> Right, I don't agree with this approach and it will get very ugly.
> I was just proposing a way to move forward because it looked like we are at
> a stalemate.
> 
> Maybe xfs doesn't have these problems but some of the other fs-es do.
> And, these will need changing twice: before(to use 64 bit arithmetic
> like cifs, use current_fs_time() like fat etc) and along with vfs.
> 
> It will unnecessarily bloat the vfs switching to timespec64 code.
> Below are 3 example filesystem changes that illustrates this problem:
> 
> Ext4:
> 1. cr_time
> 2. Encode and Decode api's
> 
> Both these ext4 changes need to made along with vfs change to ext4.
> Many such fs exists and will make the vfs switch over very ugly.
> 
> FAT:
> 1. fat_time_fat2unix, fat_time_unix2fat
> 
> Both the above 2 functions also will have to be modified along with vfs.
> 
> CIFS:
> 1.  struct cifs_fscache_inode_auxdata - last_write_time, last_change_time
> 2.  cifs_fattr
> 3.  cifs_NTtimeToUnix, cifs_UnixTimeToNT, cnvrtDosUnixTm
> 
> All the above cifs changes also need to be changed in the same patch as
> vfs switch to timespec64.
> 
> I don't think there is any nicer way to do this without having an
> encapsulation layer like inode_timespec or accessors you mentioned to
> change the underlying data type in the vfs.
> 
> Also, this scheme is so outrageously ugly that you can easily miss
> some change.  There is no way of verifying the approach theoretically.
> Of course, I will be using kernel tests like in other cases.

I agree it's ugly and fragile to have one huge patch, but I think the
best way to illustrate it is to make it as small as possible and
then talk about whether that makes it acceptable or how we can
work around the problems.

Do you have an estimate what portion of the file systems need any
changes at all before we can flip over VFS to the new types?

If it's less than half, we you can try yet another variation (nothing
new really, we are always dealing with the same few tricks):

1. add timestamp range checking and clamping
2. kill off CURRENT_TIME
3. for each file system that uses struct timespec internally to pass
   around inode timestamps, do one patch that adds a
   timespec_to_inode_time() and vice versa, which gets defined like

static inline struct timespec timespec_to_inode(struct timespec t)
{
	return t;
}

4. change the internal representation in one patch that changes those
   helpers along with the struct members.
5. change the file systems to use timespec64 internally instead of
   timespec.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Jan. 18, 2016, 9:14 p.m. UTC | #27
On Mon, Jan 18, 2016 at 08:53:22PM +0100, Arnd Bergmann wrote:
> On Monday 18 January 2016 09:40:12 Deepa Dinamani wrote:
> > On Mon, Jan 18, 2016 at 2:56 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Sunday 17 January 2016 22:09:26 Deepa Dinamani wrote:
> > >> Based on the discussion, here is how I propose to proceed:
> > > Sounds good to me. Step 3 of course is the hard one, and you may run into
> > > further problems with it, as we both have in our previous attempts to
> > > crack this nut, but with step 2 before it that may become manageable.
> > 
> > Right, I don't agree with this approach and it will get very ugly.
> > I was just proposing a way to move forward because it looked like we are at
> > a stalemate.
> > 
> > Maybe xfs doesn't have these problems but some of the other fs-es do.
> > And, these will need changing twice: before(to use 64 bit arithmetic
> > like cifs, use current_fs_time() like fat etc) and along with vfs.
> > 
> > It will unnecessarily bloat the vfs switching to timespec64 code.
> > Below are 3 example filesystem changes that illustrates this problem:
> > 
> > Ext4:
> > 1. cr_time
> > 2. Encode and Decode api's
> > 
> > Both these ext4 changes need to made along with vfs change to ext4.
> > Many such fs exists and will make the vfs switch over very ugly.
> > 
> > FAT:
> > 1. fat_time_fat2unix, fat_time_unix2fat
> > 
> > Both the above 2 functions also will have to be modified along with vfs.
> > 
> > CIFS:
> > 1.  struct cifs_fscache_inode_auxdata - last_write_time, last_change_time
> > 2.  cifs_fattr
> > 3.  cifs_NTtimeToUnix, cifs_UnixTimeToNT, cnvrtDosUnixTm
> > 
> > All the above cifs changes also need to be changed in the same patch as
> > vfs switch to timespec64.
> > 
> > I don't think there is any nicer way to do this without having an
> > encapsulation layer like inode_timespec or accessors you mentioned to
> > change the underlying data type in the vfs.
> > 
> > Also, this scheme is so outrageously ugly that you can easily miss
> > some change.  There is no way of verifying the approach theoretically.
> > Of course, I will be using kernel tests like in other cases.
> 
> I agree it's ugly and fragile to have one huge patch,

Nobody is suggesting one huge patch here. This can all be done with
small steps.

> but I think the
> best way to illustrate it is to make it as small as possible and
> then talk about whether that makes it acceptable or how we can
> work around the problems.
> 
> Do you have an estimate what portion of the file systems need any
> changes at all before we can flip over VFS to the new types?

All filesystems will, at least, need auditing. A large number of
them will need changes, no matter how we "abstract" the VFS type
change, even if it is just for 32->64 bit sign extension bugs.

Filesystems that have intermediate timestamp formats such as Lustre,
NFS, CIFS, etc will need conversion at the vfs/filesytem entry
points, and their internals will remain unchanged. Fixing the
internals is outside the scope fo the VFS change - the 64 bit VFS
inode support stops at the VFS inode/filesystem boundary.

> If it's less than half, we you can try yet another variation (nothing
> new really, we are always dealing with the same few tricks):
> 
> 1. add timestamp range checking and clamping
> 2. kill off CURRENT_TIME

Other way around. First make everything use the existing current
time functions, then ensure that incoming timestamps are truncated
correctly, then add range checking and clamping to the existing
time modification functions.

> 3. for each file system that uses struct timespec internally to pass
>    around inode timestamps, do one patch that adds a
>    timespec_to_inode_time() and vice versa, which gets defined like
> 
> static inline struct timespec timespec_to_inode(struct timespec t)
> {
> 	return t;
> }

This works, and is much cleaner than propagating the macro nastiness
everywhere. IMO vfs_time_to_timespec()/timespec_to_vfs_time would be
better named as it describes the conversion exactly. I don't think
this is a huge patch, though - it's mainly the setattr/kstat
operations that need changing here.

> 4. change the internal representation in one patch that changes those
>    helpers along with the struct members.

If you are talking about converting internal filesystem
representations to (e.g. CIFS fattr, NFS fattr, etc) then this is
wrong. Those filesystems are isolated and able to use timespecs
internally by step 3, and without protocol/format changes can't
support y2038k compliant dates. Hence fixing such problems is a
problem for the filesystem developers and is not an issue for the
VFS timestamp conversion.

That said, stuff like the ext4 encode/decode routines (my eyes, they
bleed!) that pass the VFS inode timestamp by reference to other
functions will need fixing here.

> 5. change the file systems to use timespec64 internally instead of
>    timespec.

I think that will work and leave use with a relatively clean code
base, as well as be able to address y2038k support each individual
filesystem in our own time.

Cheers,

Dave.
Arnd Bergmann Jan. 18, 2016, 9:46 p.m. UTC | #28
On Tuesday 19 January 2016 08:14:59 Dave Chinner wrote:
> On Mon, Jan 18, 2016 at 08:53:22PM +0100, Arnd Bergmann wrote:
> > On Monday 18 January 2016 09:40:12 Deepa Dinamani wrote:
> > > On Mon, Jan 18, 2016 at 2:56 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > 
> > I agree it's ugly and fragile to have one huge patch,
> 
> Nobody is suggesting one huge patch here. This can all be done with
> small steps.
> 
> > but I think the
> > best way to illustrate it is to make it as small as possible and
> > then talk about whether that makes it acceptable or how we can
> > work around the problems.
> > 
> > Do you have an estimate what portion of the file systems need any
> > changes at all before we can flip over VFS to the new types?
> 
> All filesystems will, at least, need auditing. A large number of
> them will need changes, no matter how we "abstract" the VFS type
> change, even if it is just for 32->64 bit sign extension bugs.
> 
> Filesystems that have intermediate timestamp formats such as Lustre,
> NFS, CIFS, etc will need conversion at the vfs/filesytem entry
> points, and their internals will remain unchanged. Fixing the
> internals is outside the scope fo the VFS change - the 64 bit VFS
> inode support stops at the VFS inode/filesystem boundary.

What I meant with "one huge patch" is simply just the change that
is needed to modify the type, if we don't use conversion helper
functions.

> > If it's less than half, we you can try yet another variation (nothing
> > new really, we are always dealing with the same few tricks):
> > 
> > 1. add timestamp range checking and clamping
> > 2. kill off CURRENT_TIME
> 
> Other way around. First make everything use the existing current
> time functions, then ensure that incoming timestamps are truncated
> correctly, then add range checking and clamping to the existing
> time modification functions.

Makes sense.

> > 3. for each file system that uses struct timespec internally to pass
> >    around inode timestamps, do one patch that adds a
> >    timespec_to_inode_time() and vice versa, which gets defined like
> > 
> > static inline struct timespec timespec_to_inode(struct timespec t)
> > {
> > 	return t;
> > }
> 
> This works, and is much cleaner than propagating the macro nastiness
> everywhere. IMO vfs_time_to_timespec()/timespec_to_vfs_time would be
> better named as it describes the conversion exactly. I don't think
> this is a huge patch, though - it's mainly the setattr/kstat
> operations that need changing here.

Good idea for the name.

If you are ok with adding those helpers, then it can be done in small
steps indeed. I was under the assumption that you didn't like any
kind of abstraction of the type in struct inode at all.

> > 4. change the internal representation in one patch that changes those
> >    helpers along with the struct members.
> 
> If you are talking about converting internal filesystem
> representations to (e.g. CIFS fattr, NFS fattr, etc) then this is
> wrong. Those filesystems are isolated and able to use timespecs
> internally by step 3, and without protocol/format changes can't
> support y2038k compliant dates. Hence fixing such problems is a
> problem for the filesystem developers and is not an issue for the
> VFS timestamp conversion.

No, once we have the timespec_to_vfs_time helpers in all file
systems, that change is just for VFS, and should not touch
any file system specific code.

It is the equivalent of patch 8/15 in the current version
of the series, except that it changes one version of the
code to another rather than changing a CONFIG_* symbol
that alternates between the two versions coexisting in source.

When I first attempted the conversion, I ended up with a very
similar trick that Deepa has now, and it's very helpful to
find what the code locations are that need to be touched,
without doing them all at the same time, as you can simply
flip that option to try out another file system.

However, I agree that this is better not reflected in how the
patches get applied in the end, and there is no need to clutter
the git history with having both options in the code at the
same time, and we should try to avoid touching a lot of code
more than once wherever possible.

> > 5. change the file systems to use timespec64 internally instead of
> >    timespec.
> 
> I think that will work and leave use with a relatively clean code
> base, as well as be able to address y2038k support each individual
> filesystem in our own time.

Ok, thanks.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Jan. 19, 2016, 1:38 a.m. UTC | #29
On Mon, Jan 18, 2016 at 10:46:07PM +0100, Arnd Bergmann wrote:
> On Tuesday 19 January 2016 08:14:59 Dave Chinner wrote:
> > On Mon, Jan 18, 2016 at 08:53:22PM +0100, Arnd Bergmann wrote:
> > > 3. for each file system that uses struct timespec internally to pass
> > >    around inode timestamps, do one patch that adds a
> > >    timespec_to_inode_time() and vice versa, which gets defined like
> > > 
> > > static inline struct timespec timespec_to_inode(struct timespec t)
> > > {
> > > 	return t;
> > > }
> > 
> > This works, and is much cleaner than propagating the macro nastiness
> > everywhere. IMO vfs_time_to_timespec()/timespec_to_vfs_time would be
> > better named as it describes the conversion exactly. I don't think
> > this is a huge patch, though - it's mainly the setattr/kstat
> > operations that need changing here.
> 
> Good idea for the name.
> 
> If you are ok with adding those helpers, then it can be done in small
> steps indeed. I was under the assumption that you didn't like any
> kind of abstraction of the type in struct inode at all.

You're right, I don't like unnecessary abstractions.  I guess I've
not communicated the "convert timestamps at the edges, use native
timestamp types everywhere inside" structure very well, because type
conversion functions such as the above are an absolutely necessary
part of ensuring we don't need abstractions in the core code... :P

> > > 4. change the internal representation in one patch that changes those
> > >    helpers along with the struct members.
> > 
> > If you are talking about converting internal filesystem
> > representations to (e.g. CIFS fattr, NFS fattr, etc) then this is
> > wrong. Those filesystems are isolated and able to use timespecs
> > internally by step 3, and without protocol/format changes can't
> > support y2038k compliant dates. Hence fixing such problems is a
> > problem for the filesystem developers and is not an issue for the
> > VFS timestamp conversion.
> 
> No, once we have the timespec_to_vfs_time helpers in all file
> systems, that change is just for VFS, and should not touch
> any file system specific code.

OK, just wanted to make clear, because to me "internal" tends to
mean "within a specific filesystem" whilst "generic" is used to
refer to things at the VFS layer...

Cheers,

Dave.
Deepa Dinamani Jan. 19, 2016, 5:27 a.m. UTC | #30
On Mon, Jan 18, 2016 at 5:38 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Jan 18, 2016 at 10:46:07PM +0100, Arnd Bergmann wrote:
>> On Tuesday 19 January 2016 08:14:59 Dave Chinner wrote:
>> > On Mon, Jan 18, 2016 at 08:53:22PM +0100, Arnd Bergmann wrote:
>> > > 3. for each file system that uses struct timespec internally to pass
>> > >    around inode timestamps, do one patch that adds a
>> > >    timespec_to_inode_time() and vice versa, which gets defined like
>> > >
>> > > static inline struct timespec timespec_to_inode(struct timespec t)
>> > > {
>> > >   return t;
>> > > }
>> >
>> > This works, and is much cleaner than propagating the macro nastiness
>> > everywhere. IMO vfs_time_to_timespec()/timespec_to_vfs_time would be
>> > better named as it describes the conversion exactly. I don't think
>> > this is a huge patch, though - it's mainly the setattr/kstat
>> > operations that need changing here.
>>
>> Good idea for the name.
>>
>> If you are ok with adding those helpers, then it can be done in small
>> steps indeed. I was under the assumption that you didn't like any
>> kind of abstraction of the type in struct inode at all.
>
> You're right, I don't like unnecessary abstractions.  I guess I've
> not communicated the "convert timestamps at the edges, use native
> timestamp types everywhere inside" structure very well, because type
> conversion functions such as the above are an absolutely necessary
> part of ensuring we don't need abstractions in the core code... :P


Let's back out a bit and consider a few changes with the suggested "abstraction":

original code:

extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec *ts,
__le16 __time, __le16 __date, u8 time_cs);

fat_time_fat2unix(sbi, &inode->i_mtime, de->time, de->date, 0);

becomes ugly

extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec64 *ts,
__le16 __time, __le16 __date, u8 time_cs);

struct timespec64 mtime = vfs_time_to_timespec64(i_mtime, inode);
fat_time_fat2unix(sbi, &mtime, de->time, de->date, 0);

with inode_timespec it becomes

extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct inode_timespec *ts,
__le16 __time, __le16 __date, u8 time_cs);

fat_time_fat2unix(sbi, &inode->i_mtime, de->time, de->date, 0);

Time-conversion function abstraction:

Pros:
1. do not have to change vfs core code.

Cons:
1. makes all the filesystems that have to use this ugly.
2. How do you make people use these all the time and not go back to use
inode_timestamps directly as is the case right now?
3. Even during this switch, how do you stop people from adding new code which does
not use these functions?
4. There are some scenarios like direct assignments when these conversions are not
required and some other times they are. Imposing something that needs to be only used
sometimes and not even having a clear guidelines on this is done is very very wrong.
5. And, if you do not plan on removing these functions once done switching to timespec64,
I doubt it will ever get used again and you are leaving dead code in.
6. And, we cannot proving everything is in sync is again a problem.

inode_timespec:

Pros:
1. does not have to change vfs code.
2. individual filesystem changes are also less ugly.
3. Easy to manage: 1 simple rule: always use inode_timespec from now on until the
conversion and then use timespec64.
4. Goes away in the end.
5. Every step is simple and can be proved theoretically right.

Cons:
1. Needs 2 step process as with the other approach.


I think inode_timespec is a much better abstraction.
And, if we are going to use one, then it better be the right one.

-Deepa
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Jan. 19, 2016, 8:49 p.m. UTC | #31
On Mon, Jan 18, 2016 at 09:27:13PM -0800, Deepa Dinamani wrote:
> 
> 
> On Mon, Jan 18, 2016 at 5:38 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Mon, Jan 18, 2016 at 10:46:07PM +0100, Arnd Bergmann wrote:
> >> On Tuesday 19 January 2016 08:14:59 Dave Chinner wrote:
> >> > On Mon, Jan 18, 2016 at 08:53:22PM +0100, Arnd Bergmann wrote:
> >> > > 3. for each file system that uses struct timespec internally to pass
> >> > >    around inode timestamps, do one patch that adds a
> >> > >    timespec_to_inode_time() and vice versa, which gets defined like
> >> > >
> >> > > static inline struct timespec timespec_to_inode(struct timespec t)
> >> > > {
> >> > >   return t;
> >> > > }
> >> >
> >> > This works, and is much cleaner than propagating the macro nastiness
> >> > everywhere. IMO vfs_time_to_timespec()/timespec_to_vfs_time would be
> >> > better named as it describes the conversion exactly. I don't think
> >> > this is a huge patch, though - it's mainly the setattr/kstat
> >> > operations that need changing here.
> >>
> >> Good idea for the name.
> >>
> >> If you are ok with adding those helpers, then it can be done in small
> >> steps indeed. I was under the assumption that you didn't like any
> >> kind of abstraction of the type in struct inode at all.
> >
> > You're right, I don't like unnecessary abstractions.  I guess I've
> > not communicated the "convert timestamps at the edges, use native
> > timestamp types everywhere inside" structure very well, because type
> > conversion functions such as the above are an absolutely necessary
> > part of ensuring we don't need abstractions in the core code... :P
> 
> 
> Let's back out a bit and consider a few changes with the suggested "abstraction":
> 
> original code:
> 
> extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec *ts,
> __le16 __time, __le16 __date, u8 time_cs);
> 
> fat_time_fat2unix(sbi, &inode->i_mtime, de->time, de->date, 0);
> 
> becomes ugly
> 
> extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec64 *ts,
> __le16 __time, __le16 __date, u8 time_cs);
> 
> struct timespec64 mtime = vfs_time_to_timespec64(i_mtime, inode);
> fat_time_fat2unix(sbi, &mtime, de->time, de->date, 0);

You're doing it wrong. fat_time_fat2unix() still gets passed
&inode->i_mtime, and the function prototype is changed to a
timespec64.  *Nothing else needs to change*, because
fat_time_fat2unix() does it own calculations and then stores the
time directly into the timespec structure members....

I think you're making a mountain out of a molehill. Most filesystems
will be unchanged except for s/timespec/timespec64/ as they store
values directly into timespec members when encoding/decoding. There
is no need for timestamp conversion in places like this - you're
simply not looking deep enough and applying the conversion at the
wrong layer.

Cheers,

Dave.
diff mbox

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 12ba937..b9f3cee 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -245,13 +245,13 @@  typedef void (dax_iodone_t)(struct buffer_head *bh_map, int uptodate);
  */
 struct iattr {
 	unsigned int	ia_valid;
-	umode_t		ia_mode;
-	kuid_t		ia_uid;
-	kgid_t		ia_gid;
-	loff_t		ia_size;
-	struct timespec	ia_atime;
-	struct timespec	ia_mtime;
-	struct timespec	ia_ctime;
+	umode_t			ia_mode;
+	kuid_t			ia_uid;
+	kgid_t			ia_gid;
+	loff_t			ia_size;
+	struct inode_timespec	ia_atime;
+	struct inode_timespec	ia_mtime;
+	struct inode_timespec	ia_ctime;
 
 	/*
 	 * Not an attribute, but an auxiliary info for filesystems wanting to
@@ -616,9 +616,18 @@  struct inode {
 	};
 	dev_t			i_rdev;
 	loff_t			i_size;
+#ifdef CONFIG_FS_USES_64BIT_TIME
+	time64_t		i_atime_sec;
+	time64_t		i_mtime_sec;
+	time64_t		i_ctime_sec;
+	s32			i_atime_nsec;
+	s32			i_mtime_nsec;
+	s32			i_ctime_nsec;
+#else
 	struct timespec		i_atime;
 	struct timespec		i_mtime;
 	struct timespec		i_ctime;
+#endif
 	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
 	unsigned short          i_bytes;
 	unsigned int		i_blkbits;
@@ -679,6 +688,38 @@  struct inode {
 	void			*i_private; /* fs or device private pointer */
 };
 
+#ifdef CONFIG_FS_USES_64BIT_TIME
+
+#define VFS_INODE_SET_XTIME(xtime, inode, ts64)			\
+	do {								\
+		struct inode_timespec __ts = (ts64);			\
+		(inode)->xtime##_sec = __ts.tv_sec;			\
+		(inode)->xtime##_nsec = __ts.tv_nsec;			\
+	} while (0)
+
+#define VFS_INODE_GET_XTIME(xtime, inode)			\
+	(struct timespec64){.tv_sec = (inode)->xtime##_sec,	\
+			    .tv_nsec = (inode)->xtime##_nsec}
+
+#else
+
+#define VFS_INODE_SET_XTIME(xtime, inode, ts)			\
+	((inode)->xtime = (ts))
+
+#define VFS_INODE_GET_XTIME(xtime, inode)       \
+	((inode)->xtime)
+
+#endif
+
+#define VFS_INODE_SWAP_XTIME(xtime, inode1, inode2)		\
+	do {							\
+		struct inode_timespec __ts =			\
+			VFS_INODE_GET_XTIME(xtime, inode1);	\
+		VFS_INODE_SET_XTIME(xtime, inode1,		\
+			VFS_INODE_GET_XTIME(xtime, inode2));	\
+		VFS_INODE_SET_XTIME(xtime, inode2, __ts);	\
+	} while (0)
+
 static inline int inode_unhashed(struct inode *inode)
 {
 	return hlist_unhashed(&inode->i_hash);
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 075cb0c..559983f 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -27,9 +27,9 @@  struct kstat {
 	kgid_t		gid;
 	dev_t		rdev;
 	loff_t		size;
-	struct timespec  atime;
-	struct timespec	mtime;
-	struct timespec	ctime;
+	struct inode_timespec	atime;
+	struct inode_timespec	mtime;
+	struct inode_timespec	ctime;
 	unsigned long	blksize;
 	unsigned long long	blocks;
 };
diff --git a/include/linux/time64.h b/include/linux/time64.h
index 367d5af..be98201 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -26,6 +26,27 @@  struct itimerspec64 {
 
 #endif
 
+#ifdef CONFIG_FS_USES_64BIT_TIME
+
+/* Place holder defines until CONFIG_FS_USES_64BIT_TIME
+ * is enabled.
+ * timespec64 data type and functions will be used at that
+ * time directly and these defines will be deleted.
+ */
+#define inode_timespec timespec64
+
+#define inode_timespec_compare	timespec64_compare
+#define inode_timespec_equal	timespec64_equal
+
+#else
+
+#define inode_timespec timespec
+
+#define inode_timespec_compare	timespec_compare
+#define inode_timespec_equal	timespec_equal
+
+#endif
+
 /* Parameters used to convert the timespec values: */
 #define MSEC_PER_SEC	1000L
 #define USEC_PER_MSEC	1000L