Message ID | 1452144972-15802-3-git-send-email-deepa.kernel@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote: > The current representation of inode times in struct inode, struct iattr, > and struct kstat use struct timespec. timespec is not y2038 safe. > > Use scalar data types (seconds and nanoseconds stored separately) to > represent timestamps in struct inode in order to maintain same size for > times across 32 bit and 64 bit architectures. > In addition, lay them out such that they are packed on a naturally > aligned boundary on 64 bit arch as 4 bytes are used to store nsec. > This makes each tuple(sec, nscec) use 12 bytes instead of 16 bytes. > This will help save RAM space as inode structure is cached in memory. > The other structures are transient and do not benefit from these > changes. IMO, this decisions sends the patch series immediately down the wrong path. TO me, this is a severe case of premature optimisation because everything gets way more complex just to save those 8 bytes, especially as those holes can be filled simply by changing the variable declaration order in the structure and adding a simple comment. And, really, I don't like those VFS_INODE_[GS]ET_XTIME macros at all; you've got to touch lots of code(*), making it all shouty and harder to read. They seem only to exist because of the above structural change requires an abstract timestamp accessor while CONFIG_FS_USES_64BIT_TIME exists. Given that goes away at the end o the series, so should the macro - if we use a struct timespec64 in the first place, it isn't even necessary as a temporary construct. (*) I note you haven't touched XFS, which means you've probably broken lots of other filesystem code. e.g. in XFS, functions like xfs_vn_getattr() and xfs_vn_update_time() access inode->i_[acm]time directly and hence are not going to compile after this patch series. Cheers, Dave.
> On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote: > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote: >> The current representation of inode times in struct inode, struct iattr, >> and struct kstat use struct timespec. timespec is not y2038 safe. >> >> Use scalar data types (seconds and nanoseconds stored separately) to >> represent timestamps in struct inode in order to maintain same size for >> times across 32 bit and 64 bit architectures. >> In addition, lay them out such that they are packed on a naturally >> aligned boundary on 64 bit arch as 4 bytes are used to store nsec. >> This makes each tuple(sec, nscec) use 12 bytes instead of 16 bytes. >> This will help save RAM space as inode structure is cached in memory. >> The other structures are transient and do not benefit from these >> changes. > > IMO, this decisions sends the patch series immediately down the > wrong path. There are other things the patch does that I would like to get comments on: inode_timespec aliases, range check, individual fs changes etc. These are independent of the inode timestamp representation changes. >TO me, this is a severe case of premature optimisation > because everything gets way more complex just to save those 8 bytes, > especially as those holes can be filled simply by changing the > variable declaration order in the structure and adding a simple > comment. I had tried rearranging the structure and the pahole tool does not show any difference unless you pack and align the struct to 4 bytes on 64 bit arch. The change actually saves 16 bytes on x86_64 and adds 12 bytes on i386. Here is the breakdown for struct inode before and after the patch: x86_64: /* size: 544, cachelines: 9, members: 44 */ | /* size: 528, cachelines: 9, members: 47 */ /* sum members: 534, holes: 3, sum holes: 10 */ | /* sum members: 522, holes: 2, sum holes: 6 */ i386: /* size: 328, cachelines: 6, members: 45 */ | /* size: 340, cachelines: 6, members: 48 */ /* sum members: 326, holes: 1, sum holes: 2 */ | /* sum members: 338, holes: 1, sum holes: 2 */ According to /proc/slabinfo I estimated savings of 4MB on a lightly loaded system. > And, really, I don't like those VFS_INODE_[GS]ET_XTIME macros at > all; you've got to touch lots of code(*), making it all shouty and > harder to read. They seem only to exist because of the above > structural change requires an abstract timestamp accessor while > CONFIG_FS_USES_64BIT_TIME exists. > Given that goes away at the end o > the series, so should the macro - if we use a struct timespec64 in > the first place, it isn't even necessary as a temporary construct timespec64 was the first option considered here. The problem with using timespec64 is the long data type to represent nsec. If it were possible to change timespec64 nsec to int data type then it might be okay to use that if we are not worried about holes. I do not see why time stamps should have different representations on a 32 bit vs a 64 bit arch. This left us with the option define a new data type to represent timestamps. I agreed with the concerns on the earlier RFC series that there are already very many data types to represent time in the kernel. So this left me with the option of using scalar types to represent these. The scalar types were not used for optimization. They just happened to serve that purpose as well. This could be in a follow on patch, but as long as we are changing the representation everywhere, I don't see why there should be an intermediate step to change it to timespec64 only to change it to this representation later. As far as accessors are concerned, there already are accessors in the VFS: generic_fillattr() and setattr_copy(). The problem really is that there is more than one way of updating these attributes(timestamps in this particular case). The side effect of this is that we don't always call timespec_trunc() before assigning timestamps which can lead to inconsistencies between on disk and in memory inode timestamps. Also, since these also touch other attributes, these become more restrictive. The accessors were an idea to streamline all accesses to timestamps in inode. Right now the accessor macros also figure out if the timestamps were clamped and then call the registered callback. But, this can be extended to include fs_time_trunc() and then all the end users can just use these and not worry about the right granularity or range. As the commit text says, these can be changed to inline functions to avoid shouty case. > (*) I note you haven't touched XFS, which means you've probably > broken lots of other filesystem code. e.g. in XFS, functions like > xfs_vn_getattr() and xfs_vn_update_time() access inode->i_[acm]time > directly and hence are not going to compile after this patch series. I think I should have explained this more in my cover letter, as this has come up twice now. Patches 1-7 are the only ones that are relevant and compiled and tested. These change three example filesystems as an illustration of the proposed solution. Of course, every filesystem will have to be changed similarly before patches 8-15 and a few more to change additional filesystems to use timespec64 can be merged. Patches 8-15 were included merely to provide a complete picture, as I thought patches explained the concept better than words only. These have not even been compiled, as these are for illustration purposes as noted in the cover letter. -Deepa -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote: > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote: > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote: > >> The current representation of inode times in struct inode, struct iattr, > >> and struct kstat use struct timespec. timespec is not y2038 safe. > >> > >> Use scalar data types (seconds and nanoseconds stored separately) to > >> represent timestamps in struct inode in order to maintain same size for > >> times across 32 bit and 64 bit architectures. > >> In addition, lay them out such that they are packed on a naturally > >> aligned boundary on 64 bit arch as 4 bytes are used to store nsec. > >> This makes each tuple(sec, nscec) use 12 bytes instead of 16 bytes. > >> This will help save RAM space as inode structure is cached in memory. > >> The other structures are transient and do not benefit from these > >> changes. > > > > IMO, this decisions sends the patch series immediately down the > > wrong path. > > There are other things the patch does that I would like to get comments > on: inode_timespec aliases, range check, individual fs changes etc. > These are independent of the inode timestamp representation changes. The inode_timespec alias is part of the problem - AFAICT it exists because you need a representation that is independent of both the old and new in-memory structures. The valid timestamp range stuff in the superblock is absolutely necessary, but that's something that can be done completely independently to the changes for supporting a differnet time storage format. And the fs changes cannot really be commented on until the VFs time representation is sorted out properly... > >TO me, this is a severe case of premature optimisation > > because everything gets way more complex just to save those 8 bytes, > > especially as those holes can be filled simply by changing the > > variable declaration order in the structure and adding a simple > > comment. > > I had tried rearranging the structure and the pahole tool does not show > any difference unless you pack and align the struct to 4 bytes on 64 > bit arch. The change actually saves 16 bytes on x86_64 and adds 12 > bytes on i386. > > Here is the breakdown for struct inode before and after the patch: > > x86_64: > /* size: 544, cachelines: 9, members: 44 */ | > /* size: 528, cachelines: 9, members: 47 */ > /* sum members: 534, holes: 3, sum holes: 10 */ | > /* sum members: 522, holes: 2, sum holes: 6 */ > > i386: > /* size: 328, cachelines: 6, members: 45 */ | > /* size: 340, cachelines: 6, members: 48 */ > /* sum members: 326, holes: 1, sum holes: 2 */ | > /* sum members: 338, holes: 1, sum holes: 2 */ > > According to /proc/slabinfo I estimated savings of 4MB on a lightly > loaded system. > > > And, really, I don't like those VFS_INODE_[GS]ET_XTIME macros at > > all; you've got to touch lots of code(*), making it all shouty and > > harder to read. They seem only to exist because of the above > > structural change requires an abstract timestamp accessor while > > CONFIG_FS_USES_64BIT_TIME exists. > > Given that goes away at the end o > > the series, so should the macro - if we use a struct timespec64 in > > the first place, it isn't even necessary as a temporary construct > > timespec64 was the first option considered here. The problem with using > timespec64 is the long data type to represent nsec. If it were possible > to change timespec64 nsec to int data type then it might be okay to use > that if we are not worried about holes. I do not see why time stamps > should have different representations on a 32 bit vs a 64 bit arch. What's it matter? iot's irrelevant to the problem at hand. Besides, have you looked at the existing timestamp definitions? they use a struct timespec, which on a 64 bit system: struct timespec i_atime; /* 88 16 */ struct timespec i_mtime; /* 104 16 */ struct timespec i_ctime; /* 120 16 */ use 2 longs and are identical to a timespec64 in everything but name. These should just be changed to a timespec64, and we suck up the fact it increases the size of the 32 bit inode as we have to increase it's size to support time > y2038 anyway. This is what I meant about premature optimisation - you've got a wonderfully complex solution to a problem that we don't need to solve to support timestamps >y2038. It's also why it goes down the wrong path at this point - most of the changes are not necessary if all we need to do is a simple timespec -> timespec64 type change and the addition timestamp range limiting in the existing truncation function... > The problem really is that > there is more than one way of updating these attributes(timestamps in > this particular case). The side effect of this is that we don't always > call timespec_trunc() before assigning timestamps which can lead to > inconsistencies between on disk and in memory inode timestamps. That's a problem that can be fixed independently of y2038 support. Indeed, we can be quite lazy about updating timestamps - by intent and design we usually have different timestamps in memory compared to on disk, which is one of the reasons why there are so many different ways to change and update timestamps.... Cheers, Dave.
On Tuesday 12 January 2016 19:29:57 Dave Chinner wrote: > > This is what I meant about premature optimisation - you've got a > wonderfully complex solution to a problem that we don't need to > solve to support timestamps >y2038. It's also why it goes down the > wrong path at this point - most of the changes are not necessary if > all we need to do is a simple timespec -> timespec64 type change and > the addition timestamp range limiting in the existing truncation > function... I originally suggested doing the split representation because I was worried about the downsides of using timespec64 on 32-bit systems after looking at actual memory consumption on my test box. At this moment, I have a total of 145712700 inodes in memory on a machine with 64GB ram, saving 12 bytes on each amounts to a total of 145MB. I think it was more than that when I first looked, so it's between 0.2% and 0.3% of savings in total memory, which is certainly worth discussing about, given the renewed interest in conserving RAM in general. If we want to save this memory, then doing it at the same time as the timespec64 conversion is the right time so we don't need to touch every file twice. One point that I had not considered though is on the 32-bit systems we are talking about, not only is RAM much smaller, but also there would be a smaller fraction of RAM available to store inodes, so there is not as much to gain. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 12, 2016 at 10:27:07AM +0100, Arnd Bergmann wrote: > On Tuesday 12 January 2016 19:29:57 Dave Chinner wrote: > > > > This is what I meant about premature optimisation - you've got a > > wonderfully complex solution to a problem that we don't need to > > solve to support timestamps >y2038. It's also why it goes down the > > wrong path at this point - most of the changes are not necessary if > > all we need to do is a simple timespec -> timespec64 type change and > > the addition timestamp range limiting in the existing truncation > > function... > > I originally suggested doing the split representation because I > was worried about the downsides of using timespec64 on 32-bit systems > after looking at actual memory consumption on my test box. > > At this moment, I have a total of 145712700 inodes in memory on a machine Is that all? :P > with 64GB ram, saving 12 bytes on each amounts to a total of 145MB. I just posted a patchset that knocks 104 bytes off the XFS inode (~12% reduction in size). We need the changes in that patchset to sanely support >y2038k support in XFS, and it means we now won't need to grow the XFS inode to add that support, either. > I think > it was more than that when I first looked, so it's between 0.2% and 0.3% > of savings in total memory, which is certainly worth discussing about, > given the renewed interest in conserving RAM in general. If we want to > save this memory, then doing it at the same time as the timespec64 conversion > is the right time so we don't need to touch every file twice. You just uttered the key words: "If we want to save this memory" So let's stop conflating two different lines of development because we only actually *need* y2038k support. The fact we haven't made timestamp space optimisations means that nobody has thought it necessary or worthwhile. y2038k support doesn't change the landscape under which we might consider the optimisation, so we need to determine if the benefit outweighs the cost in terms of code complexity and maintainability. So separate the two changes - make the y2038k change simple and obviously correct first by changing everything to timespec64. Then it won't get delayed by bikeshedding about an optimisation of that is of questionable benefit. Cheers, Dave.
On Wednesday 13 January 2016 17:27:16 Dave Chinner wrote: > > I think > > it was more than that when I first looked, so it's between 0.2% and 0.3% > > of savings in total memory, which is certainly worth discussing about, > > given the renewed interest in conserving RAM in general. If we want to > > save this memory, then doing it at the same time as the timespec64 conversion > > is the right time so we don't need to touch every file twice. > > You just uttered the key words: "If we want to save this memory" > > So let's stop conflating two different lines of development because > we only actually *need* y2038k support. > > The fact we haven't made timestamp space optimisations means > that nobody has thought it necessary or worthwhile. y2038k support > doesn't change the landscape under which we might consider the > optimisation, so we need to determine if the benefit outweighs the > cost in terms of code complexity and maintainability. > > So separate the two changes - make the y2038k change simple and > obviously correct first by changing everything to timespec64. Then it > won't get delayed by bikeshedding about an optimisation of that is > of questionable benefit. Fine with me. I think Deepa already started simplifying the series already. I agree that for 64-bit machines, there is no need to optimize that code now, since we are not regressing in terms of memory size. For 32-bit machines, we are regressing anyway, the question is whether it's by 12 or 24 bytes per inode. Let me try to estimate the worse-case scenario here: let's assume that we have 1GB of RAM (anything more on a 32-bit system gets you into trouble, and if you have less, there will be less of a problem). Filling all of system ram with small tmpfs files means a single 4K page plus 280 bytes for the minimum inode, so we need an additional 6MB or 12MB to store the extra timespec bits. Probably not too bad for a worst-case scenario, but there is also the case of storing just the inodes but no pages, and that would be worse. I've added the linux-arm and linux-mips lists to cc, to see if anyone has strong opinions on this matter. We don't have to worry about x86-32 here, because sizeof(struct timespec64) is 12 bytes there anyway, and I don't think there are any other 32-bit architectures that have large-scale deployments or additional requirements we don't already have on ARM. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote: > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote: > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote: > > > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote: Summarizing, here are the open questions that need to be sorted before another series: 1. What should be part of the series? a. y2038 safe data types conversion. b. range check and clamping 2. How to achieve a seamless transition? Is inode_timespec solution agreed upon to achieve 1a? An alternate approach is included in the cover letter. 3. policy for handling out of range timestamps: There was no conclusion on this from the previous series as noted in the cover letter. I think this can be solved by figuring out the answer to question: who should have a say in deciding the course of action if the timestamps are out of range: a. sysadmin through sysctl (Arnd's suggestion) b. have default vfs handlers with an option for individual fs to override. c. clamp and ignore d. disable expired fs at compile time (Arnd's suggestion) > The inode_timespec alias is part of the problem - AFAICT it exists > because you need a representation that is independent of both the > old and new in-memory structures. Maybe you are misunderstanding the fact that only struct inode is changed to have individual fields. Whereas, timespec64 is used everywhere else in the series. inode_timespec is an alias to transition timespec references to timespec64 without breaking anything. This is needed because we don't want to change all references to timespec in a single patch like the cover letter says. The same RFC Patch 2 has more details. > The valid timestamp range stuff in the superblock is absolutely > necessary, but that's something that can be done completely > independently to the changes for supporting a differnet time storage > format. If I'm defining new functions to support new format, then I should at least get the function signatures right before using them. These can be in a different patch, but should be in the same patch series, before they are used anywhere. For instance, struct timespec timespec_trunc(struct timespec t, unsigned gran); should now take superblock as an argument instead of gran. > And the fs changes cannot really be commented on until the VFs time > representation is sorted out properly... Each fs is changed twice in the current approach to transition everything to timespec64. And, there are different ways of doing this. For instance, Arnd had an idea different from mine as to how this can be done: He was suggesting using something like these accessor macros and incorporating timespec64 from the beginning in the individual filesystems rather than inode_timespec. Again, this is independent of how timestamps are stored in struct inode. There are others that are independent of inode timestamp representation as well > Besides, have you looked at the existing timestamp definitions? they > use a struct timespec, which on a 64 bit system: > > struct timespec i_atime; /* 88 16 */ > struct timespec i_mtime; /* 104 16 */ > struct timespec i_ctime; /* 120 16 */ > > use 2 longs and are identical to a timespec64 in everything but > name. These should just be changed to a timespec64, and we suck up > the fact it increases the size of the 32 bit inode as we have to > increase it's size to support time > y2038 anyway. > > This is what I meant about premature optimisation - you've got a > wonderfully complex solution to a problem that we don't need to > solve to support timestamps >y2038. It's also why it goes down the > wrong path at this point - most of the changes are not necessary if > all we need to do is a simple timespec -> timespec64 type change and > the addition timestamp range limiting in the existing truncation > function... The pahole output I pasted in the previous email(on the left) was for timespec. Yes, I do know that timespec64 is same as timespec on 64 bit systems: #if __BITS_PER_LONG == 64 # define timespec64 timespec I think it's been agreed upon now that inode timestamps will be changed to use timespec64 as Arnd mentioned, if I do not hear any objections. The whole purpose of this is to gather comments. > > The problem really is that > > there is more than one way of updating these attributes(timestamps in > > this particular case). The side effect of this is that we don't always > > call timespec_trunc() before assigning timestamps which can lead to > > inconsistencies between on disk and in memory inode timestamps. > > That's a problem that can be fixed independently of y2038 support. > Indeed, we can be quite lazy about updating timestamps - by intent > and design we usually have different timestamps in memory compared > to on disk, which is one of the reasons why there are so many > different ways to change and update timestamps.... This has nothing to do with lazy updates. This is about writing wrong granularities and non clamped values to in-memory inode. -Deepa -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote: > On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote: > > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote: > > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote: > > > > > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote: > > Summarizing, here are the open questions that need to be sorted before another > series: > 1. What should be part of the series? > a. y2038 safe data types conversion. > b. range check and clamping Yes. > 2. How to achieve a seamless transition? > Is inode_timespec solution agreed upon to achieve 1a? No. Just convert direct to timespec64. > An alternate approach is included in the cover letter. > 3. policy for handling out of range timestamps: > There was no conclusion on this from the previous series as noted in the > cover letter. > a. sysadmin through sysctl (Arnd's suggestion) > b. have default vfs handlers with an option for individual fs to override. > c. clamp and ignore I think it's a mix - if the timestamps come in from userspace, fail with ERANGE. That could be controlled by sysctl via VFS part of the ->setattr operation, or in each of the individual FS implementations. If they come from the kernel (e.g. atime update) then the generic behvaiour is to warn and continue, filesystems can otherwise select their own policy for kernel updates via ->update_time. > d. disable expired fs at compile time (Arnd's suggestion) Not really an option, because it means we can't use filesystems that interop with other systems (e.g. cameras, etc) because they won't support y2038k timestamps for a long time, if ever (e.g. vfat). > > > The problem really is that > > > there is more than one way of updating these attributes(timestamps in > > > this particular case). The side effect of this is that we don't always > > > call timespec_trunc() before assigning timestamps which can lead to > > > inconsistencies between on disk and in memory inode timestamps. > > > > That's a problem that can be fixed independently of y2038 support. > > Indeed, we can be quite lazy about updating timestamps - by intent > > and design we usually have different timestamps in memory compared > > to on disk, which is one of the reasons why there are so many > > different ways to change and update timestamps.... > > This has nothing to do with lazy updates. > This is about writing wrong granularities and non clamped values to > in-memory inode. Which really shouldn't happen because we should be clamping and/or truncating timestamps at the creation/entry point into the VFS/filesystem. e.g. current_fs_time(sb) is how filesystems grab the current kernel time for timestamp updates. Add an equivalent current_fs_time64(sb) to do return timespec64 and do clamping and limit warning, and now you have a simple vehicle for converting the VFS and filesystems to support y2038k clean date formats. If there are places where filesystems are receiving or using unchecked timestamps then those are bugs that need fixing. Those need to be in separate patches to y2038k support... Cheers, Dave.
On Thursday 14 January 2016 08:04:36 Dave Chinner wrote: > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote: > > On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote: > > > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote: > > > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote: > > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote: > > > > 2. How to achieve a seamless transition? > > Is inode_timespec solution agreed upon to achieve 1a? > > No. Just convert direct to timespec64. The hard part here is how to split that change into logical patches per file system. We have already discussed all sorts of ways to do that, but there is no ideal solution, as you usually end up either having some really large patches, or you have to modify the same lines multiple times. The most promising approaches are: a) In Deepa's current patch set, some infrastructure is first introduced by changing the type from timespec to an identical inode_timespec, which lets us convert one file system at a time to inode_timespec and then change the type once they are all done. The downside is then that all file systems have to get touched twice so we end up with timespec64 everywhere. b) A variation of that which I would do is to use have a smaller set of infrastructure first, so we can change one file system at a time to timespec64 while leaving the common structures to use timespec until all file systems are converted. The downside is the use of some conversion macros when accessing the times in the inode. When the common code is changed, those accessor macros get turned into trivial assignments that can be removed up later or changed in the same patch. c) The opposite direction from b) is to first change the common code, but then any direct assignment between a timespec in a file system and the timespec64 in the inode/iattr/kstat/etc first needs a conversion helper so we can build cleanly, and then we do one file system at a time to remove them all again while changing the internal structures in the file system from timespec to timespec64. > > An alternate approach is included in the cover letter. > > 3. policy for handling out of range timestamps: > > There was no conclusion on this from the previous series as noted in the > > cover letter. > > a. sysadmin through sysctl (Arnd's suggestion) > > b. have default vfs handlers with an option for individual fs to override. > > c. clamp and ignore > > I think it's a mix - if the timestamps come in from userspace, > fail with ERANGE. That could be controlled by sysctl via VFS > part of the ->setattr operation, or in each of the individual FS > implementations. If they come from the kernel (e.g. atime update) then > the generic behvaiour is to warn and continue, filesystems can > otherwise select their own policy for kernel updates via > ->update_time. I'd prefer not to have it done by the individual file system implementation, so we get a consistent behavior. Normally you either care about correct time stamps, or you care about interoperability and you don't want to have errors returned here. It could be done per mount, but that seems overly complicated for rather little to be gained. > > d. disable expired fs at compile time (Arnd's suggestion) > > Not really an option, because it means we can't use filesystems that > interop with other systems (e.g. cameras, etc) because they won't > support y2038k timestamps for a long time, if ever (e.g. vfat). Let me clarify what my idea is here: I want a global kernel option that disables all code that has known y2038 issues. If anyone tries to build an embedded system with support beyond 2038, that should disable all of those things, including file systems, drivers and system calls, so we can reasonably assume that everything that works today with that kernel build will keep working in the future and not break in random ways. For a file system, this can be done in a number of ways: * Most file systems today interpret the time as an unsigned 32-bit number (as opposed to signed as ext3, xfs and few others do), so as long as we use timespec64 in the syscalls, we are ok. * Some legacy file systems (maybe hfs) can remain disabled, as nobody cares about them any more. * If we still care about them (e.g. ext2), we can make them support only read-only mode. In ext4, this would mean forbidding write access to file systems that don't have the extended inode format enabled. Normal users that don't care about not breaking in 2038 obviously won't set the option, and have the same level of backwards compatibility support as today. > > > > The problem really is that > > > > there is more than one way of updating these attributes(timestamps in > > > > this particular case). The side effect of this is that we don't always > > > > call timespec_trunc() before assigning timestamps which can lead to > > > > inconsistencies between on disk and in memory inode timestamps. > > > > > > That's a problem that can be fixed independently of y2038 support. > > > Indeed, we can be quite lazy about updating timestamps - by intent > > > and design we usually have different timestamps in memory compared > > > to on disk, which is one of the reasons why there are so many > > > different ways to change and update timestamps.... > > > > This has nothing to do with lazy updates. > > This is about writing wrong granularities and non clamped values to > > in-memory inode. > > Which really shouldn't happen because we should be clamping and/or > truncating timestamps at the creation/entry point into the > VFS/filesystem. > > e.g. current_fs_time(sb) is how filesystems grab the current kernel > time for timestamp updates. Add an equivalent current_fs_time64(sb) > to do return timespec64 and do clamping and limit warning, and now > you have a simple vehicle for converting the VFS and filesystems to > support y2038k clean date formats. I think the current patch series does this already. > If there are places where filesystems are receiving or using > unchecked timestamps then those are bugs that need fixing. Those > need to be in separate patches to y2038k support... Fair enough, but that probably means that patch series will have to come first. This will also reduce the number of places in which a separate type conversion function needs to be added. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote: > On Thursday 14 January 2016 08:04:36 Dave Chinner wrote: > > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote: > > > On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote: > > > > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote: > > > > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote: > > > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote: > > > > > > 2. How to achieve a seamless transition? > > > Is inode_timespec solution agreed upon to achieve 1a? > > > > No. Just convert direct to timespec64. > > The hard part here is how to split that change into logical patches > per file system. We have already discussed all sorts of ways to > do that, but there is no ideal solution, as you usually end up > either having some really large patches, or you have to modify > the same lines multiple times. > > The most promising approaches are: > > a) In Deepa's current patch set, some infrastructure is first > introduced by changing the type from timespec to an identical > inode_timespec, which lets us convert one file system at a time > to inode_timespec and then change the type once they are all > done. The downside is then that all file systems have to get > touched twice so we end up with timespec64 everywhere. > > b) A variation of that which I would do is to use have a smaller > set of infrastructure first, so we can change one file system > at a time to timespec64 while leaving the common structures to > use timespec until all file systems are converted. The downside > is the use of some conversion macros when accessing the times > in the inode. > When the common code is changed, those accessor macros get > turned into trivial assignments that can be removed up later > or changed in the same patch. > > c) The opposite direction from b) is to first change the common > code, but then any direct assignment between a timespec in > a file system and the timespec64 in the inode/iattr/kstat/etc > first needs a conversion helper so we can build cleanly, > and then we do one file system at a time to remove them all > again while changing the internal structures in the > file system from timespec to timespec64. Just a clarification here: approaches b and c also need some functions that take times as arguments, including a function pointer in the vfs layer to be supported in both forms: timespec and timespec64 concurrently. As included in the cover letter, these are: generic_update_time(), inode->i_op->update_time(), lease_get_mtime(), fstack_copy_attr_all(), setattr_copy(), generic_fillattr(). -Deepa -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote: > On Thursday 14 January 2016 08:04:36 Dave Chinner wrote: > > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote: > > > On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote: > > > > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote: > > > > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote: > > > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote: > > > > > > 2. How to achieve a seamless transition? > > > Is inode_timespec solution agreed upon to achieve 1a? > > > > No. Just convert direct to timespec64. > > The hard part here is how to split that change into logical patches > per file system. We have already discussed all sorts of ways to > do that, but there is no ideal solution, as you usually end up > either having some really large patches, or you have to modify > the same lines multiple times. > > The most promising approaches are: > > a) In Deepa's current patch set, some infrastructure is first > introduced by changing the type from timespec to an identical > inode_timespec, which lets us convert one file system at a time > to inode_timespec and then change the type once they are all > done. The downside is then that all file systems have to get > touched twice so we end up with timespec64 everywhere. Yes, and the result is not pretty. > b) A variation of that which I would do is to use have a smaller > set of infrastructure first, so we can change one file system > at a time to timespec64 while leaving the common structures to > use timespec until all file systems are converted. The downside > is the use of some conversion macros when accessing the times > in the inode. > When the common code is changed, those accessor macros get > turned into trivial assignments that can be removed up later > or changed in the same patch. Doesn't really make a lot of sense to me. We have to change everything evntually, and it's not that much work to do so up front... > c) The opposite direction from b) is to first change the common > code, but then any direct assignment between a timespec in > a file system and the timespec64 in the inode/iattr/kstat/etc > first needs a conversion helper so we can build cleanly, > and then we do one file system at a time to remove them all > again while changing the internal structures in the > file system from timespec to timespec64. No new helpers are necessary - we've already got the helper functions we need. This: int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry) { struct inode *inode = d_inode(old_dentry); + struct inode_timespec now = current_fs_time(inode->i_sb); - inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME; + VFS_INODE_SET_XTIME(i_ctime, inode, now); + VFS_INODE_SET_XTIME(i_mtime, dir, now); + VFS_INODE_SET_XTIME(i_ctime, dir, now); inc_nlink(inode); ..... is just wrong. All the type conversion and clamping and checking done in that VFS_INODE_SET_XTIME() should be done in current_fs_time() and have it return a timespec64 directly. Indeed, it already does truncation, and can easily be made to do range clamping, too. i.e. the change should simply be: - inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME; + inode->i_ctime = dir->i_ctime = dir->i_mtime = current_fs_time(inode->i_sb); This will work cleanly with a s/timespec/timespec64/ API changeover, too, without the need for any additional helpers at all. i.e. the first thing to do is make sure everything that creates/converts timestamps uses/returns a struct timespec that is clamped and truncated at the earliest possible opportunity. i.e. on entry to the kernel and when pulled from the kernel time. Once all the code is using and passing around truncated+clamped struct timespec (like above), then we can simply do a s/timespec/timespec64/ on the inode and API, and we're done. > > I think it's a mix - if the timestamps come in from userspace, > > fail with ERANGE. That could be controlled by sysctl via VFS > > part of the ->setattr operation, or in each of the individual FS > > implementations. If they come from the kernel (e.g. atime update) then > > the generic behvaiour is to warn and continue, filesystems can > > otherwise select their own policy for kernel updates via > > ->update_time. > > I'd prefer not to have it done by the individual file system > implementation, so we get a consistent behavior. Can't be helped, because different filesystems have different timestamp behaviours, and not all use the generic VFS code for timestamp updates. The filesystems need to use the correct helper functions to obtain a valid current time, but you can't stop them from storing and using arbitrary timestamp formats if they so desire... > > > d. disable expired fs at compile time (Arnd's suggestion) > > > > Not really an option, because it means we can't use filesystems that > > interop with other systems (e.g. cameras, etc) because they won't > > support y2038k timestamps for a long time, if ever (e.g. vfat). > > Let me clarify what my idea is here: I want a global kernel option > that disables all code that has known y2038 issues. If anyone tries > to build an embedded system with support beyond 2038, that should > disable all of those things, including file systems, drivers and > system calls, so we can reasonably assume that everything that works > today with that kernel build will keep working in the future and > not break in random ways. It's not that black and white when it comes to filesystems. y2038k support is determined by the on-disk structure of the filesystem being mounted, and that is determined at mount time. When the filesystem mounts and sets it's valid timestamp ranges the VFS will need to decide as to whether the filesystem is allowed to continue mounting or not. > For a file system, this can be done in a number of ways: > > * Most file systems today interpret the time as an unsigned 32-bit > number (as opposed to signed as ext3, xfs and few others do), > so as long as we use timespec64 in the syscalls, we are ok. Actually, sign conversion is a problem we currently have to be very careful of. See, for example, xfstests:tests/generic/258, which tests timestamps recording times before epoch. i.e. in XFS we have to convert the unsigned 32 bit disk timestamp to signed 32 bit before storing it in the VFS timestamp so it behaves correctly on 64 bit systems. This results in us needing to do this when reading the inode from disk: + /* + * time is signed, so need to convert to signed 32 bit before + * storing in inode timestamp which may be 64 bit. Otherwise + * a time before epoch is converted to a time long after epoch + * on 64 bit systems. + */ + inode->i_atime.tv_sec = (int)be32_to_cpu(from->di_atime.t_sec); + inode->i_atime.tv_nsec = (int)be32_to_cpu(from->di_atime.t_nsec); + inode->i_mtime.tv_sec = (int)be32_to_cpu(from->di_mtime.t_sec); + inode->i_mtime.tv_nsec = (int)be32_to_cpu(from->di_mtime.t_nsec); + inode->i_ctime.tv_sec = (int)be32_to_cpu(from->di_ctime.t_sec); + inode->i_ctime.tv_nsec = (int)be32_to_cpu(from->di_ctime.t_nsec); + (http://oss.sgi.com/archives/xfs/2016-01/msg00456.html) > * Some legacy file systems (maybe hfs) can remain disabled, as > nobody cares about them any more. > > * If we still care about them (e.g. ext2), we can make them support > only read-only mode. In ext4, this would mean forbidding write > access to file systems that don't have the extended inode format > enabled. For ext2/4, that would have to be handled internally by the filesystem with feature masks. For other legacy filesystems, then the VFS mount time checking could allow RO mounts if the supported ranges are not y2038k clean. Compile time options are not really the best approach here... > > If there are places where filesystems are receiving or using > > unchecked timestamps then those are bugs that need fixing. Those > > need to be in separate patches to y2038k support... > > Fair enough, but that probably means that patch series will have to > come first. Yes, that is normal practice for structuring a non-trivial feature addition patch series. Bug fixes first, then cleanups, then the functionality changes. > This will also reduce the number of places in which a > separate type conversion function needs to be added. Precisely. I'm pretty sure this should come down to "no new conversion functions needed" for the vast majority of the code. Cheers, Dave.
On Friday 15 January 2016 08:00:01 Dave Chinner wrote: > On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote: > > On Thursday 14 January 2016 08:04:36 Dave Chinner wrote: > > > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote: > > c) The opposite direction from b) is to first change the common > > code, but then any direct assignment between a timespec in > > a file system and the timespec64 in the inode/iattr/kstat/etc > > first needs a conversion helper so we can build cleanly, > > and then we do one file system at a time to remove them all > > again while changing the internal structures in the > > file system from timespec to timespec64. > > No new helpers are necessary - we've already got the helper > functions we need. This: > > int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry) > { > struct inode *inode = d_inode(old_dentry); > + struct inode_timespec now = current_fs_time(inode->i_sb); > > - inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME; > + VFS_INODE_SET_XTIME(i_ctime, inode, now); > + VFS_INODE_SET_XTIME(i_mtime, dir, now); > + VFS_INODE_SET_XTIME(i_ctime, dir, now); > inc_nlink(inode); > ..... > > is just wrong. All the type conversion and clamping and checking > done in that VFS_INODE_SET_XTIME() should be done in > current_fs_time() and have it return a timespec64 directly. Indeed, > it already does truncation, and can easily be made to do range > clamping, too. i.e. the change should simply be: > > - inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME; > + inode->i_ctime = dir->i_ctime = dir->i_mtime = current_fs_time(inode->i_sb); Yes, that is the obvious case, and I guess works for at least half the file systems when they always assign righthand side and lefthand side of the time stamps using the external types or helpers like CURRENT_TIME and current_fs_time(). However, there are a couple of file systems that need a bit more refactoring before we can do this, e.g. in ntfs_truncate: if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) { struct timespec now = current_fs_time(VFS_I(base_ni)->i_sb); int sync_it = 0; if (!timespec_equal(&VFS_I(base_ni)->i_mtime, &now) || !timespec_equal(&VFS_I(base_ni)->i_ctime, &now)) sync_it = 1; VFS_I(base_ni)->i_mtime = now; VFS_I(base_ni)->i_ctime = now; } The type of the local variable must match the return code of current_fs_time(), so if we change over i_mtime and current_fs_time globally, this either has to be rewritten first to avoid the use of local variables, or it needs temporary conversion helpers, or it has to be changed in the same patch. None of those is particularly appealing. There are a few dozen such things in various file systems. > > > I think it's a mix - if the timestamps come in from userspace, > > > fail with ERANGE. That could be controlled by sysctl via VFS > > > part of the ->setattr operation, or in each of the individual FS > > > implementations. If they come from the kernel (e.g. atime update) then > > > the generic behvaiour is to warn and continue, filesystems can > > > otherwise select their own policy for kernel updates via > > > ->update_time. > > > > I'd prefer not to have it done by the individual file system > > implementation, so we get a consistent behavior. > > Can't be helped, because different filesystems have different > timestamp behaviours, and not all use the generic VFS code for > timestamp updates. The filesystems need to use the correct helper > functions to obtain a valid current time, but you can't stop them > from storing and using arbitrary timestamp formats if they so > desire... I mean the decision whether to clamp or error on an overflow should be done consistently. Having a global sysctl knob or a compile-time option is better than having each file system implementor take a guess at what users might prefer, if we can't come up with a behavior (e.g. clamp all the time, or error out all the time) that everybody agrees is always correct. > > > > d. disable expired fs at compile time (Arnd's suggestion) > > > > > > Not really an option, because it means we can't use filesystems that > > > interop with other systems (e.g. cameras, etc) because they won't > > > support y2038k timestamps for a long time, if ever (e.g. vfat). > > > > Let me clarify what my idea is here: I want a global kernel option > > that disables all code that has known y2038 issues. If anyone tries > > to build an embedded system with support beyond 2038, that should > > disable all of those things, including file systems, drivers and > > system calls, so we can reasonably assume that everything that works > > today with that kernel build will keep working in the future and > > not break in random ways. > > It's not that black and white when it comes to filesystems. y2038k > support is determined by the on-disk structure of the filesystem > being mounted, and that is determined at mount time. When the > filesystem mounts and sets it's valid timestamp ranges the VFS > will need to decide as to whether the filesystem is allowed to > continue mounting or not. Some file systems are always broken around 2038 (e.g. HFS in 2040), so if we can't fix them, I want to be able to turn them off in Kconfig along with the 32-bit time_t syscalls. For those where y2038 support depends on on-disk feature flags (ext4 and xfs I guess, maybe one or two more), the config option can turn off write support for the old format. This is a rather important part of the y2038 work: If anyone cares about y2038 problems in this decade, they want to deploy systems with extremely long service contracts and don't want to update them 20 years from now to fix an obscure bug that could be prevented today, so we try hard to identify every line of code that won't work then as it does today. > > For a file system, this can be done in a number of ways: > > > > * Most file systems today interpret the time as an unsigned 32-bit > > number (as opposed to signed as ext3, xfs and few others do), > > so as long as we use timespec64 in the syscalls, we are ok. > > Actually, sign conversion is a problem we currently have to be very > careful of. See, for example, xfstests:tests/generic/258, which > tests timestamps recording times before epoch. i.e. in XFS we have > to convert the unsigned 32 bit disk timestamp to signed 32 bit > before storing it in the VFS timestamp so it behaves correctly on 64 > bit systems. This results in us needing to do this when reading the > inode from disk: > > + /* > + * time is signed, so need to convert to signed 32 bit before > + * storing in inode timestamp which may be 64 bit. Otherwise > + * a time before epoch is converted to a time long after epoch > + * on 64 bit systems. > + */ > + inode->i_atime.tv_sec = (int)be32_to_cpu(from->di_atime.t_sec); > + inode->i_atime.tv_nsec = (int)be32_to_cpu(from->di_atime.t_nsec); > + inode->i_mtime.tv_sec = (int)be32_to_cpu(from->di_mtime.t_sec); > + inode->i_mtime.tv_nsec = (int)be32_to_cpu(from->di_mtime.t_nsec); > + inode->i_ctime.tv_sec = (int)be32_to_cpu(from->di_ctime.t_sec); > + inode->i_ctime.tv_nsec = (int)be32_to_cpu(from->di_ctime.t_nsec); > + > (http://oss.sgi.com/archives/xfs/2016-01/msg00456.html) I'm very aware of this issue. Most file system developers however were not, so e.g. a timestamp on btrfs is interpreted differently on 32-bit and 64-bit kernels. Some file systems (AFS, NFSv3?) explicitly define the timestamps as unsigned, and most others that store 32-bit seconds apparently never thought about the issue and happen to use unsigned interpretation on 64-bit systems, since that is what you get out of be32_to_cpu() without adding the cast. For the y2038 case, this means we are lucky: almost all the users today have 64-bit hardware, so they can already represent timestamps in the range from 1970 to 2106. Once we have 64-bit timestamps in 32-bit user space, we just need to make that use the same format we use on the 64-bit machines already. ext2/3/4, xfs and ocfs2 (maybe one or two more, I'd have to check) currently behave in a consistent manner across 32-bit and 64-bit architectures by allowing a range between 1902 and 2037, and we obviously don't have a choice there but to keep that current behavior, and extend the time format in one way or another to store additional bits for the epoch. > > * Some legacy file systems (maybe hfs) can remain disabled, as > > nobody cares about them any more. > > > > * If we still care about them (e.g. ext2), we can make them support > > only read-only mode. In ext4, this would mean forbidding write > > access to file systems that don't have the extended inode format > > enabled. > > For ext2/4, that would have to be handled internally by the > filesystem with feature masks. For other legacy filesystems, then > the VFS mount time checking could allow RO mounts if the supported > ranges are not y2038k clean. Compile time options are not > really the best approach here... I'm not following the line of thought here. We have some users that want ext4 to mount old file system images without long inodes writable, because they don't care about the 2038 problem. We also have other users that want to force the same file system image to be read-only because they want to ensure that it does not stop working correctly when the time overflow happens while the fs is mounted. If you don't want a compile-time option for it, how do you suggest we decide which case we have? Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thursday 14 January 2016 23:46:16 Arnd Bergmann wrote: > > I'm not following the line of thought here. We have some users > that want ext4 to mount old file system images without long > inodes writable, because they don't care about the 2038 problem. > We also have other users that want to force the same file system > image to be read-only because they want to ensure that it does > not stop working correctly when the time overflow happens while > the fs is mounted. > > If you don't want a compile-time option for it, how do you suggest > we decide which case we have? In case that came across wrong, I'm assuming that the first user also wants all the system calls enabled that pass 32-bit time_t values, while the second one wants them all left out from the kernel to ensure that no user space program gets incorrect data. This could be done using a sysctl of course, but I still think we want a compile-time option for the syscalls for clarity, and I would simply use the same compile-time option to determine the behavior of the file system, network protocols and device drivers that deal with 32-bit timestamps outside of the kernel. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 14, 2016 at 11:54:36PM +0100, Arnd Bergmann wrote: > On Thursday 14 January 2016 23:46:16 Arnd Bergmann wrote: > > > > I'm not following the line of thought here. We have some users > > that want ext4 to mount old file system images without long > > inodes writable, because they don't care about the 2038 problem. > > We also have other users that want to force the same file system > > image to be read-only because they want to ensure that it does > > not stop working correctly when the time overflow happens while > > the fs is mounted. > > > > If you don't want a compile-time option for it, how do you suggest > > we decide which case we have? > > In case that came across wrong, I'm assuming that the first > user also wants all the system calls enabled that pass 32-bit > time_t values, while the second one wants them all left out from > the kernel to ensure that no user space program gets incorrect > data. system call API support is a completely different class of problem. It's out of the scope of this patchset, and really I don't care what you do with them. The point I'm making is that we'll have to modify all the existing filesystem code to supply a valid timestamp range to the VFS at mount time for the range checking/clamping, similar to how we do the granularity specification right now. That means we can do rejection of non-y2038k compliant filesystems at runtime based on what the filesystem tells the VFS it supports.. Set up the default to be reject if rw, allow if ro, and provide a mount option to override ad allow mounting rw. Users can then make the decision when mounting their filesystems. If they are system/automatically mounted filesystems and aren't y2038k compliant, then the override option can be added to /etc/fstab and we're all good. If the truly paranoid users want to disallow the override and/or read only mount options, then add a sysctl to control that. Cheers, Dave.
On Thu, Jan 14, 2016 at 11:46:16PM +0100, Arnd Bergmann wrote: > On Friday 15 January 2016 08:00:01 Dave Chinner wrote: > > On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote: > > > On Thursday 14 January 2016 08:04:36 Dave Chinner wrote: > > > > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote: > > > c) The opposite direction from b) is to first change the common > > > code, but then any direct assignment between a timespec in > > > a file system and the timespec64 in the inode/iattr/kstat/etc > > > first needs a conversion helper so we can build cleanly, > > > and then we do one file system at a time to remove them all > > > again while changing the internal structures in the > > > file system from timespec to timespec64. > > > > No new helpers are necessary - we've already got the helper > > functions we need. This: > > > > int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry) > > { > > struct inode *inode = d_inode(old_dentry); > > + struct inode_timespec now = current_fs_time(inode->i_sb); > > > > - inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME; > > + VFS_INODE_SET_XTIME(i_ctime, inode, now); > > + VFS_INODE_SET_XTIME(i_mtime, dir, now); > > + VFS_INODE_SET_XTIME(i_ctime, dir, now); > > inc_nlink(inode); > > ..... > > > > is just wrong. All the type conversion and clamping and checking > > done in that VFS_INODE_SET_XTIME() should be done in > > current_fs_time() and have it return a timespec64 directly. Indeed, > > it already does truncation, and can easily be made to do range > > clamping, too. i.e. the change should simply be: > > > > - inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME; > > + inode->i_ctime = dir->i_ctime = dir->i_mtime = current_fs_time(inode->i_sb); > > > Yes, that is the obvious case, and I guess works for at least half the > file systems when they always assign righthand side and lefthand > side of the time stamps using the external types or helpers like > CURRENT_TIME and current_fs_time(). > > However, there are a couple of file systems that need a bit more refactoring > before we can do this, e.g. in ntfs_truncate: Sure, and nfs is a pain because of all it's internal use of timespecs, too. But we have these timespec_to_timespec64 helper functions, and that's what we should use in these cases where the filesystem cannot support full 64 bit timestamps internally. In those cases, they'll be telling the superblock this at mount time things like current_fs_time() won't be returning then a timestamp that is out of range for a 32 bit timestamp.... > if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) { > struct timespec now = current_fs_time(VFS_I(base_ni)->i_sb); > int sync_it = 0; > > if (!timespec_equal(&VFS_I(base_ni)->i_mtime, &now) || > !timespec_equal(&VFS_I(base_ni)->i_ctime, &now)) > sync_it = 1; > VFS_I(base_ni)->i_mtime = now; > VFS_I(base_ni)->i_ctime = now; > } > > The type of the local variable must match the return code of > current_fs_time(), so if we change over i_mtime and current_fs_time > globally, this either has to be rewritten first to avoid the use of > local variables, or it needs temporary conversion helpers, or > it has to be changed in the same patch. None of those is particularly > appealing. There are a few dozen such things in various file systems. it gets rewritten to: struct timespec now; now = timespec64_to_timespec(current_fs_time(VFS_I(base_ni)->i_sb)); .... and the valid timestamp range for ntfs is set to 32bit timestamps. This then leaves it up to the filesystem developers to make the ntfs filesystem code 64 bit timestamp clean if the on disk format is ever changed to support 64 bit times. Same goes for NFS, and any of the other filesystems that use struct timespec internally for time representation. > > > > I think it's a mix - if the timestamps come in from userspace, > > > > fail with ERANGE. That could be controlled by sysctl via VFS > > > > part of the ->setattr operation, or in each of the individual FS > > > > implementations. If they come from the kernel (e.g. atime update) then > > > > the generic behvaiour is to warn and continue, filesystems can > > > > otherwise select their own policy for kernel updates via > > > > ->update_time. > > > > > > I'd prefer not to have it done by the individual file system > > > implementation, so we get a consistent behavior. > > > > Can't be helped, because different filesystems have different > > timestamp behaviours, and not all use the generic VFS code for > > timestamp updates. The filesystems need to use the correct helper > > functions to obtain a valid current time, but you can't stop them > > from storing and using arbitrary timestamp formats if they so > > desire... > > I mean the decision whether to clamp or error on an overflow > should be done consistently. Sure, but that comes from using the helpers we already have, and applying the clamping at a point where errors can be returned to userspace. current_fs_time() should never return a timestamp outside what the filesystem has said it supports and we've validated that behaviour at mount time. hence it's only user provided timestamps that we need range errors on. > Having a global sysctl knob or > a compile-time option is better than having each file system > implementor take a guess at what users might prefer, if we can't > come up with a behavior (e.g. clamp all the time, or error out > all the time) that everybody agrees is always correct. filesystem implementors will use the helper funtions that are provided for this. If they don't (like all the current use of CURRENT_TIME), then that's a bug that needs fixing. i.e. we need a timespec_clamp() function, similar to timespec_trunc(), and y2038k compliant filesystems and syscalls need to use them.... > > > > > > > > Not really an option, because it means we can't use filesystems that > > > > interop with other systems (e.g. cameras, etc) because they won't > > > > support y2038k timestamps for a long time, if ever (e.g. vfat). > > > > > > Let me clarify what my idea is here: I want a global kernel option > > > that disables all code that has known y2038 issues. If anyone tries > > > to build an embedded system with support beyond 2038, that should > > > disable all of those things, including file systems, drivers and > > > system calls, so we can reasonably assume that everything that works > > > today with that kernel build will keep working in the future and > > > not break in random ways. > > > > It's not that black and white when it comes to filesystems. y2038k > > support is determined by the on-disk structure of the filesystem > > being mounted, and that is determined at mount time. When the > > filesystem mounts and sets it's valid timestamp ranges the VFS > > will need to decide as to whether the filesystem is allowed to > > continue mounting or not. > > Some file systems are always broken around 2038 (e.g. HFS in 2040), > so if we can't fix them, I want to be able to turn them off > in Kconfig along with the 32-bit time_t syscalls. That can be done with kconfig depends rules - it has nothing to do with this patch set. > ext2/3/4, xfs and ocfs2 (maybe one or two more, I'd have to check) > currently behave in a consistent manner across 32-bit and 64-bit > architectures by allowing a range between 1902 and 2037, and we > obviously don't have a choice there but to keep that current > behavior, and extend the time format in one way or another to > store additional bits for the epoch. That's a filesystem implementation problem, not a generic inode timestamp problem. i.e. this is handled when the filesystem converts the inode timestamp from a timespec64 in the struct inode to whatever format it stores the timestamp on disk. That conversion does not change just because the VFS inode moves from a timespec to a timespec64. Again, those on-disk format changes to support beyond the current epoch are outside the scope of this patchset, because they are not affected by the timestamp format the VFS choses to use. Cheers, Dave.
On Fri, Jan 15, 2016 at 08:00:01AM +1100, Dave Chinner wrote: > On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote: > > On Thursday 14 January 2016 08:04:36 Dave Chinner wrote: > > > On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote: > > > > On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote: > > > > > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote: > > > > > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote: > > > > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote: > > > > > > c) The opposite direction from b) is to first change the common > > code, but then any direct assignment between a timespec in > > a file system and the timespec64 in the inode/iattr/kstat/etc > > first needs a conversion helper so we can build cleanly, > > and then we do one file system at a time to remove them all > > again while changing the internal structures in the > > file system from timespec to timespec64. > > No new helpers are necessary - we've already got the helper > functions we need. This: > > int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry) > { > struct inode *inode = d_inode(old_dentry); > + struct inode_timespec now = current_fs_time(inode->i_sb); > > - inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME; > + VFS_INODE_SET_XTIME(i_ctime, inode, now); > + VFS_INODE_SET_XTIME(i_mtime, dir, now); > + VFS_INODE_SET_XTIME(i_ctime, dir, now); > inc_nlink(inode); > ..... > > is just wrong. All the type conversion and clamping and checking > done in that VFS_INODE_SET_XTIME() should be done in > current_fs_time() and have it return a timespec64 directly. Indeed, > it already does truncation, and can easily be made to do range > clamping, too. i.e. the change should simply be: > > - inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME; > + inode->i_ctime = dir->i_ctime = dir->i_mtime = current_fs_time(inode->i_sb); The current patch series already does this and macros don't do any clamping. The accesors are only illustrating a way of using a callback when clamping is detected. And, if we don't need accessors, I will find a different way of doing that if we agree it is necessary. +struct timespec64 current_fs_time(struct super_block *sb) +{ + struct timespec64 now = current_kernel_time64(); + + return fs_time_trunc(now, sb); +} +EXPORT_SYMBOL(current_fs_time); + +struct timespec64 current_fs_time_sec(struct super_block *sb) +{ + struct timespec64 ts = {ktime_get_real_seconds(), 0}; + + /* range check for time. */ + fs_time_range_check(sb, &ts); + + return ts; +} +struct inode_timespec +fs_time_trunc(struct inode_timespec t, struct super_block *sb) +{ + u32 gran = sb->s_time_gran; + + /* range check for time. */ + fs_time_range_check(sb, &t); + if (unlikely(is_fs_timestamp_bad(t))) + return t; + + /* Avoid division in the common cases 1 ns and 1 s. */ + if (gran == 1) + ;/* nothing */ + else if (gran == NSEC_PER_SEC) + t.tv_nsec = 0; + else if (gran > 1 && gran < NSEC_PER_SEC) + t.tv_nsec -= t.tv_nsec % gran; + else + WARN(1, "illegal file time granularity: %u", gran); + + return t; +} But really, you are still misunderstanding what inode_timespec does. It only introduces aliases and not the accessors. This is only a way to avoid code duplication since we cannot change all fs at one time. And, this is what you would do if you were coding in any object oriented language. This is object polymorphism. So, I will paste here again. Whatever is below is all the extra code inode_timespec introduces: +#ifdef CONFIG_FS_USES_64BIT_TIME + +/* Place holder defines until CONFIG_FS_USES_64BIT_TIME + * is enabled. + * timespec64 data type and functions will be used at that + * time directly and these defines will be deleted. + */ +#define inode_timespec timespec64 + +#define inode_timespec_compare timespec64_compare +#define inode_timespec_equal timespec64_equal + +#else + +#define inode_timespec timespec + +#define inode_timespec_compare timespec_compare +#define inode_timespec_equal timespec_equal + +#endif + -Deepa -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 15 January 2016 13:49:37 Dave Chinner wrote: > On Thu, Jan 14, 2016 at 11:46:16PM +0100, Arnd Bergmann wrote: > > On Friday 15 January 2016 08:00:01 Dave Chinner wrote: > > > On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote: > > > > On Thursday 14 January 2016 08:04:36 Dave Chinner wrote: > > > > Yes, that is the obvious case, and I guess works for at least half the > > file systems when they always assign righthand side and lefthand > > side of the time stamps using the external types or helpers like > > CURRENT_TIME and current_fs_time(). > > > > However, there are a couple of file systems that need a bit more refactoring > > before we can do this, e.g. in ntfs_truncate: > > Sure, and nfs is a pain because of all it's internal use of > timespecs, too. lustre is probably the worst. > But we have these timespec_to_timespec64 helper > functions, and that's what we should use in these cases where the > filesystem cannot support full 64 bit timestamps internally. In > those cases, they'll be telling the superblock this at mount time > things like current_fs_time() won't be returning then a timestamp > that is out of range for a 32 bit timestamp.... I'm not worried about the runtime problems here, just how to get a series of patches that are each doing one reasonable thing at a time. > > if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) { > > struct timespec now = current_fs_time(VFS_I(base_ni)->i_sb); > > int sync_it = 0; > > > > if (!timespec_equal(&VFS_I(base_ni)->i_mtime, &now) || > > !timespec_equal(&VFS_I(base_ni)->i_ctime, &now)) > > sync_it = 1; > > VFS_I(base_ni)->i_mtime = now; > > VFS_I(base_ni)->i_ctime = now; > > } > > > > The type of the local variable must match the return code of > > current_fs_time(), so if we change over i_mtime and current_fs_time > > globally, this either has to be rewritten first to avoid the use of > > local variables, or it needs temporary conversion helpers, or > > it has to be changed in the same patch. None of those is particularly > > appealing. There are a few dozen such things in various file systems. > > it gets rewritten to: > > struct timespec now; > > now = timespec64_to_timespec(current_fs_time(VFS_I(base_ni)->i_sb)); > .... > > and the valid timestamp range for ntfs is set to 32bit timestamps. > This then leaves it up to the filesystem developers to make > the ntfs filesystem code 64 bit timestamp clean if the on disk > format is ever changed to support 64 bit times. > > Same goes for NFS, and any of the other filesystems that use struct > timespec internally for time representation. That is what I meant in my previous mail about approach c) being ugly because it requires sprinkling lots of timespec64_to_timespec() and timespec_to_timespec64() in the initial patch in order to atomically change the types in inode/iattr/kstat/... without introducing build regressions. It's a rather horrible patch, and quite likely to cause conflicts with other patches that introduce another use of those structures in the merge window. > > Having a global sysctl knob or > > a compile-time option is better than having each file system > > implementor take a guess at what users might prefer, if we can't > > come up with a behavior (e.g. clamp all the time, or error out > > all the time) that everybody agrees is always correct. > > filesystem implementors will use the helper funtions that are > provided for this. If they don't (like all the current use of > CURRENT_TIME), then that's a bug that needs fixing. Ok, then we are in total agreement here: the policy remains to be decided by common code, but the implementation can differ per file system. > i.e. we need > a timespec_clamp() function, similar to timespec_trunc(), and y2038k > compliant filesystems and syscalls need to use them.... I was thinking we end up with a single function that does both clamp() and trunk(), but that's an implementation detail. > > > > Let me clarify what my idea is here: I want a global kernel option > > > > that disables all code that has known y2038 issues. If anyone tries > > > > to build an embedded system with support beyond 2038, that should > > > > disable all of those things, including file systems, drivers and > > > > system calls, so we can reasonably assume that everything that works > > > > today with that kernel build will keep working in the future and > > > > not break in random ways. > > > > > > It's not that black and white when it comes to filesystems. y2038k > > > support is determined by the on-disk structure of the filesystem > > > being mounted, and that is determined at mount time. When the > > > filesystem mounts and sets it's valid timestamp ranges the VFS > > > will need to decide as to whether the filesystem is allowed to > > > continue mounting or not. > > > > Some file systems are always broken around 2038 (e.g. HFS in 2040), > > so if we can't fix them, I want to be able to turn them off > > in Kconfig along with the 32-bit time_t syscalls. > > That can be done with kconfig depends rules - it has nothing to do > with this patch set. kconfig dependencies is what I meant for the simple cases where a file system is known to always be broken, we just need a small modification for the cases you mentioned below. > > ext2/3/4, xfs and ocfs2 (maybe one or two more, I'd have to check) > > currently behave in a consistent manner across 32-bit and 64-bit > > architectures by allowing a range between 1902 and 2037, and we > > obviously don't have a choice there but to keep that current > > behavior, and extend the time format in one way or another to > > store additional bits for the epoch. > > That's a filesystem implementation problem, not a generic inode > timestamp problem. i.e. this is handled when the filesystem converts > the inode timestamp from a timespec64 in the struct inode to > whatever format it stores the timestamp on disk. That conversion > does not change just because the VFS inode moves from a timespec to > a timespec64. Again, those on-disk format changes to support beyond > the current epoch are outside the scope of this patchset, because > they are not affected by the timestamp format the VFS choses to use. Fine with me, we can have another series to add the Kconfig dependencies and modify the file systems that need this. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 15 January 2016 13:27:34 Dave Chinner wrote: > On Thu, Jan 14, 2016 at 11:54:36PM +0100, Arnd Bergmann wrote: > > On Thursday 14 January 2016 23:46:16 Arnd Bergmann wrote: > > > > > > I'm not following the line of thought here. We have some users > > > that want ext4 to mount old file system images without long > > > inodes writable, because they don't care about the 2038 problem. > > > We also have other users that want to force the same file system > > > image to be read-only because they want to ensure that it does > > > not stop working correctly when the time overflow happens while > > > the fs is mounted. > > > > > > If you don't want a compile-time option for it, how do you suggest > > > we decide which case we have? > > > > In case that came across wrong, I'm assuming that the first > > user also wants all the system calls enabled that pass 32-bit > > time_t values, while the second one wants them all left out from > > the kernel to ensure that no user space program gets incorrect > > data. > > system call API support is a completely different class of problem. > It's out of the scope of this patchset, and really I don't care what > you do with them. Sure, I was just providing some background about why we want a compile-time option in general. > The point I'm making is that we'll have to modify all the existing > filesystem code to supply a valid timestamp range to the VFS at > mount time for the range checking/clamping, similar to how we do the > granularity specification right now. That means we can do rejection > of non-y2038k compliant filesystems at runtime based on what the > filesystem tells the VFS it supports.. Set up the default to be > reject if rw, allow if ro, and provide a mount option to override ad > allow mounting rw. We can't really default to "reject if rw", because that would break all systems using ext3 or xfs, unless users modify their fstab or set the flag that makes the partition y2038 compliant. The compile-time option that I'm thinking of would change the default beween "always allow" and "reject if rw", based on whether the system cares about this issue or not. Almost everyone today won't care about it at all and would be rather annoyed by being unable to mount their rootfs, but some people care about the behavior a lot. Having a global sysctl or mount option as an override would be good, maybe both if that isn't over-engineering the problem when we already have a compile-time option. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 15, 2016 at 06:01:36PM +0100, Arnd Bergmann wrote: > On Friday 15 January 2016 13:27:34 Dave Chinner wrote: > > The point I'm making is that we'll have to modify all the existing > > filesystem code to supply a valid timestamp range to the VFS at > > mount time for the range checking/clamping, similar to how we do the > > granularity specification right now. That means we can do rejection > > of non-y2038k compliant filesystems at runtime based on what the > > filesystem tells the VFS it supports.. Set up the default to be > > reject if rw, allow if ro, and provide a mount option to override ad > > allow mounting rw. > > We can't really default to "reject if rw", because that would break > all systems using ext3 or xfs, unless users modify their fstab > or set the flag that makes the partition y2038 compliant. Right, I was refering to the behaviour of a y2038k compliant kernel, A current non-compliant kernel will have the default behaviour you are suggesting. > The compile-time option that I'm thinking of would change the default > beween "always allow" and "reject if rw", based on whether the > system cares about this issue or not. Almost everyone today won't > care about it at all and would be rather annoyed by being unable > to mount their rootfs, but some people care about the behavior > a lot. Yup, that's exactly what I was implying. > Having a global sysctl or mount option as an override would be good, > maybe both if that isn't over-engineering the problem when we already > have a compile-time option. Distros should not be forces to ship multiple kernels just to provide all the different runtime compliance behaviours their users require. Make the policy runtime enforcable, but select the default behaviour and supported policies via compile time options. Cheers, Dave
> On Jan 15, 2016, at 9:50 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > On Friday 15 January 2016 13:49:37 Dave Chinner wrote: >> On Thu, Jan 14, 2016 at 11:46:16PM +0100, Arnd Bergmann wrote: >>> On Friday 15 January 2016 08:00:01 Dave Chinner wrote: >>>> On Thu, Jan 14, 2016 at 05:53:21PM +0100, Arnd Bergmann wrote: >>>>> On Thursday 14 January 2016 08:04:36 Dave Chinner wrote: >>> >>> Yes, that is the obvious case, and I guess works for at least half the >>> file systems when they always assign righthand side and lefthand >>> side of the time stamps using the external types or helpers like >>> CURRENT_TIME and current_fs_time(). >>> >>> However, there are a couple of file systems that need a bit more refactoring >>> before we can do this, e.g. in ntfs_truncate: >> >> Sure, and nfs is a pain because of all it's internal use of >> timespecs, too. > > lustre is probably the worst. Lustre currently only has one-second granularity in a 64-bit field, so it doesn't really care about the difference between timespec or timespec64 at all. The only other uses are for measuring relative times, so the 64-bitness shouldn't really matter. Could you please point out what issues exist so they can be fixed. Cheers, Andreas >> But we have these timespec_to_timespec64 helper >> functions, and that's what we should use in these cases where the >> filesystem cannot support full 64 bit timestamps internally. In >> those cases, they'll be telling the superblock this at mount time >> things like current_fs_time() won't be returning then a timestamp >> that is out of range for a 32 bit timestamp.... > > I'm not worried about the runtime problems here, just how to > get a series of patches that are each doing one reasonable thing > at a time. > >>> if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) { >>> struct timespec now = current_fs_time(VFS_I(base_ni)->i_sb); >>> int sync_it = 0; >>> >>> if (!timespec_equal(&VFS_I(base_ni)->i_mtime, &now) || >>> !timespec_equal(&VFS_I(base_ni)->i_ctime, &now)) >>> sync_it = 1; >>> VFS_I(base_ni)->i_mtime = now; >>> VFS_I(base_ni)->i_ctime = now; >>> } >>> >>> The type of the local variable must match the return code of >>> current_fs_time(), so if we change over i_mtime and current_fs_time >>> globally, this either has to be rewritten first to avoid the use of >>> local variables, or it needs temporary conversion helpers, or >>> it has to be changed in the same patch. None of those is particularly >>> appealing. There are a few dozen such things in various file systems. >> >> it gets rewritten to: >> >> struct timespec now; >> >> now = timespec64_to_timespec(current_fs_time(VFS_I(base_ni)->i_sb)); >> .... >> >> and the valid timestamp range for ntfs is set to 32bit timestamps. >> This then leaves it up to the filesystem developers to make >> the ntfs filesystem code 64 bit timestamp clean if the on disk >> format is ever changed to support 64 bit times. >> >> Same goes for NFS, and any of the other filesystems that use struct >> timespec internally for time representation. > > That is what I meant in my previous mail about approach c) being ugly > because it requires sprinkling lots of timespec64_to_timespec() and > timespec_to_timespec64() in the initial patch in order to atomically > change the types in inode/iattr/kstat/... without introducing build > regressions. > > It's a rather horrible patch, and quite likely to cause conflicts with > other patches that introduce another use of those structures in the > merge window. > >>> Having a global sysctl knob or >>> a compile-time option is better than having each file system >>> implementor take a guess at what users might prefer, if we can't >>> come up with a behavior (e.g. clamp all the time, or error out >>> all the time) that everybody agrees is always correct. >> >> filesystem implementors will use the helper funtions that are >> provided for this. If they don't (like all the current use of >> CURRENT_TIME), then that's a bug that needs fixing. > > Ok, then we are in total agreement here: the policy remains > to be decided by common code, but the implementation can differ > per file system. > >> i.e. we need >> a timespec_clamp() function, similar to timespec_trunc(), and y2038k >> compliant filesystems and syscalls need to use them.... > > I was thinking we end up with a single function that does both > clamp() and trunk(), but that's an implementation detail. >>>>> Let me clarify what my idea is here: I want a global kernel option >>>>> that disables all code that has known y2038 issues. If anyone tries >>>>> to build an embedded system with support beyond 2038, that should >>>>> disable all of those things, including file systems, drivers and >>>>> system calls, so we can reasonably assume that everything that works >>>>> today with that kernel build will keep working in the future and >>>>> not break in random ways. >>>> >>>> It's not that black and white when it comes to filesystems. y2038k >>>> support is determined by the on-disk structure of the filesystem >>>> being mounted, and that is determined at mount time. When the >>>> filesystem mounts and sets it's valid timestamp ranges the VFS >>>> will need to decide as to whether the filesystem is allowed to >>>> continue mounting or not. >>> >>> Some file systems are always broken around 2038 (e.g. HFS in 2040), >>> so if we can't fix them, I want to be able to turn them off >>> in Kconfig along with the 32-bit time_t syscalls. >> >> That can be done with kconfig depends rules - it has nothing to do >> with this patch set. > > kconfig dependencies is what I meant for the simple cases where a > file system is known to always be broken, we just need a small > modification for the cases you mentioned below. > >>> ext2/3/4, xfs and ocfs2 (maybe one or two more, I'd have to check) >>> currently behave in a consistent manner across 32-bit and 64-bit >>> architectures by allowing a range between 1902 and 2037, and we >>> obviously don't have a choice there but to keep that current >>> behavior, and extend the time format in one way or another to >>> store additional bits for the epoch. >> >> That's a filesystem implementation problem, not a generic inode >> timestamp problem. i.e. this is handled when the filesystem converts >> the inode timestamp from a timespec64 in the struct inode to >> whatever format it stores the timestamp on disk. That conversion >> does not change just because the VFS inode moves from a timespec to >> a timespec64. Again, those on-disk format changes to support beyond >> the current epoch are outside the scope of this patchset, because >> they are not affected by the timestamp format the VFS choses to use. > > Fine with me, we can have another series to add the Kconfig dependencies > and modify the file systems that need this. > > Arnd > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Cheers, Andreas
On Saturday 16 January 2016 12:14:22 Andreas Dilger wrote: > >> > >> Sure, and nfs is a pain because of all it's internal use of > >> timespecs, too. > > > > lustre is probably the worst. > > Lustre currently only has one-second granularity in a 64-bit field, > so it doesn't really care about the difference between timespec or > timespec64 at all. > > The only other uses are for measuring relative times, so the 64-bitness > shouldn't really matter. > > Could you please point out what issues exist so they can be fixed. It's not really a bug that needs to be fixed, but more the general issue of referencing inode->i_?time and attr->ia_?time and passing them around. When we change the types in the inode and iattr from timespec to timespec64, all assigments need to be modified, and lustre has more of those assignments than any other file system I'm aware of. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Jan 16, 2016, at 4:36 PM, Arnd Bergmann <arnd@arndb.de> wrote: > > On Saturday 16 January 2016 12:14:22 Andreas Dilger wrote: >>>> >>>> Sure, and nfs is a pain because of all it's internal use of >>>> timespecs, too. >>> >>> lustre is probably the worst. >> >> Lustre currently only has one-second granularity in a 64-bit field, >> so it doesn't really care about the difference between timespec or >> timespec64 at all. >> >> The only other uses are for measuring relative times, so the 64-bitness >> shouldn't really matter. >> >> Could you please point out what issues exist so they can be fixed. > > It's not really a bug that needs to be fixed, but more the general > issue of referencing inode->i_?time and attr->ia_?time and passing > them around. When we change the types in the inode and iattr from > timespec to timespec64, all assigments need to be modified, and lustre > has more of those assignments than any other file system I'm aware of. All of those accesses are with the LTIME_S() macro to get/set only the seconds field of the inode time, so it should only be a one-line patch? Cheers, Andreas
Based on the discussion, here is how I propose to proceed: 1. Series for timestamp range check and clamping 2. Bug fixing patches like change all CURRENT_TIME use cases to current_fs_time() 3. Patches for vfs to use timespec64 internally (maybe a series, if required) 4. Patches that change all fs that use vfs APIs using timestamp arguments (not a series) 5. Change individual fs to use timespec64 (not a series) 6. Change back whatever time conversion APIs left in vfs or individual fs (maybe a series, if required) So, I don't see a need for submitting another series as all the changes now are handled on a case by case basis and no longer have a generic theme. If everyone's in sync then I can proceed with the above plan. -Deepa -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sunday 17 January 2016 22:09:26 Deepa Dinamani wrote: > Based on the discussion, here is how I propose to proceed: > > 1. Series for timestamp range check and clamping > 2. Bug fixing patches like change all CURRENT_TIME use cases to > current_fs_time() > 3. Patches for vfs to use timespec64 internally (maybe a series, if > required) > 4. Patches that change all fs that use vfs APIs using timestamp arguments > (not a series) > 5. Change individual fs to use timespec64 (not a series) > 6. Change back whatever time conversion APIs left in vfs or individual fs > (maybe a series, if required) > > So, I don't see a need for submitting another series as all the changes now > are handled on a case by case basis and no longer have a generic theme. > > If everyone's in sync then I can proceed with the above plan. Sounds good to me. Step 3 of course is the hard one, and you may run into further problems with it, as we both have in our previous attempts to crack this nut, but with step 2 before it that may become manageable. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 18, 2016 at 2:56 AM, Arnd Bergmann <arnd@arndb.de> wrote: > On Sunday 17 January 2016 22:09:26 Deepa Dinamani wrote: >> Based on the discussion, here is how I propose to proceed: >> >> 1. Series for timestamp range check and clamping >> 2. Bug fixing patches like change all CURRENT_TIME use cases to >> current_fs_time() >> 3. Patches for vfs to use timespec64 internally (maybe a series, if >> required) >> 4. Patches that change all fs that use vfs APIs using timestamp arguments >> (not a series) >> 5. Change individual fs to use timespec64 (not a series) >> 6. Change back whatever time conversion APIs left in vfs or individual fs >> (maybe a series, if required) >> >> So, I don't see a need for submitting another series as all the changes now >> are handled on a case by case basis and no longer have a generic theme. >> >> If everyone's in sync then I can proceed with the above plan. > > Sounds good to me. Step 3 of course is the hard one, and you may run into > further problems with it, as we both have in our previous attempts to > crack this nut, but with step 2 before it that may become manageable. Right, I don't agree with this approach and it will get very ugly. I was just proposing a way to move forward because it looked like we are at a stalemate. Maybe xfs doesn't have these problems but some of the other fs-es do. And, these will need changing twice: before(to use 64 bit arithmetic like cifs, use current_fs_time() like fat etc) and along with vfs. It will unnecessarily bloat the vfs switching to timespec64 code. Below are 3 example filesystem changes that illustrates this problem: Ext4: 1. cr_time 2. Encode and Decode api's Both these ext4 changes need to made along with vfs change to ext4. Many such fs exists and will make the vfs switch over very ugly. FAT: 1. fat_time_fat2unix, fat_time_unix2fat Both the above 2 functions also will have to be modified along with vfs. CIFS: 1. struct cifs_fscache_inode_auxdata - last_write_time, last_change_time 2. cifs_fattr 3. cifs_NTtimeToUnix, cifs_UnixTimeToNT, cnvrtDosUnixTm All the above cifs changes also need to be changed in the same patch as vfs switch to timespec64. I don't think there is any nicer way to do this without having an encapsulation layer like inode_timespec or accessors you mentioned to change the underlying data type in the vfs. Also, this scheme is so outrageously ugly that you can easily miss some change. There is no way of verifying the approach theoretically. Of course, I will be using kernel tests like in other cases. -Deepa -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Monday 18 January 2016 09:40:12 Deepa Dinamani wrote: > On Mon, Jan 18, 2016 at 2:56 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > On Sunday 17 January 2016 22:09:26 Deepa Dinamani wrote: > >> Based on the discussion, here is how I propose to proceed: > > Sounds good to me. Step 3 of course is the hard one, and you may run into > > further problems with it, as we both have in our previous attempts to > > crack this nut, but with step 2 before it that may become manageable. > > Right, I don't agree with this approach and it will get very ugly. > I was just proposing a way to move forward because it looked like we are at > a stalemate. > > Maybe xfs doesn't have these problems but some of the other fs-es do. > And, these will need changing twice: before(to use 64 bit arithmetic > like cifs, use current_fs_time() like fat etc) and along with vfs. > > It will unnecessarily bloat the vfs switching to timespec64 code. > Below are 3 example filesystem changes that illustrates this problem: > > Ext4: > 1. cr_time > 2. Encode and Decode api's > > Both these ext4 changes need to made along with vfs change to ext4. > Many such fs exists and will make the vfs switch over very ugly. > > FAT: > 1. fat_time_fat2unix, fat_time_unix2fat > > Both the above 2 functions also will have to be modified along with vfs. > > CIFS: > 1. struct cifs_fscache_inode_auxdata - last_write_time, last_change_time > 2. cifs_fattr > 3. cifs_NTtimeToUnix, cifs_UnixTimeToNT, cnvrtDosUnixTm > > All the above cifs changes also need to be changed in the same patch as > vfs switch to timespec64. > > I don't think there is any nicer way to do this without having an > encapsulation layer like inode_timespec or accessors you mentioned to > change the underlying data type in the vfs. > > Also, this scheme is so outrageously ugly that you can easily miss > some change. There is no way of verifying the approach theoretically. > Of course, I will be using kernel tests like in other cases. I agree it's ugly and fragile to have one huge patch, but I think the best way to illustrate it is to make it as small as possible and then talk about whether that makes it acceptable or how we can work around the problems. Do you have an estimate what portion of the file systems need any changes at all before we can flip over VFS to the new types? If it's less than half, we you can try yet another variation (nothing new really, we are always dealing with the same few tricks): 1. add timestamp range checking and clamping 2. kill off CURRENT_TIME 3. for each file system that uses struct timespec internally to pass around inode timestamps, do one patch that adds a timespec_to_inode_time() and vice versa, which gets defined like static inline struct timespec timespec_to_inode(struct timespec t) { return t; } 4. change the internal representation in one patch that changes those helpers along with the struct members. 5. change the file systems to use timespec64 internally instead of timespec. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 18, 2016 at 08:53:22PM +0100, Arnd Bergmann wrote: > On Monday 18 January 2016 09:40:12 Deepa Dinamani wrote: > > On Mon, Jan 18, 2016 at 2:56 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > > On Sunday 17 January 2016 22:09:26 Deepa Dinamani wrote: > > >> Based on the discussion, here is how I propose to proceed: > > > Sounds good to me. Step 3 of course is the hard one, and you may run into > > > further problems with it, as we both have in our previous attempts to > > > crack this nut, but with step 2 before it that may become manageable. > > > > Right, I don't agree with this approach and it will get very ugly. > > I was just proposing a way to move forward because it looked like we are at > > a stalemate. > > > > Maybe xfs doesn't have these problems but some of the other fs-es do. > > And, these will need changing twice: before(to use 64 bit arithmetic > > like cifs, use current_fs_time() like fat etc) and along with vfs. > > > > It will unnecessarily bloat the vfs switching to timespec64 code. > > Below are 3 example filesystem changes that illustrates this problem: > > > > Ext4: > > 1. cr_time > > 2. Encode and Decode api's > > > > Both these ext4 changes need to made along with vfs change to ext4. > > Many such fs exists and will make the vfs switch over very ugly. > > > > FAT: > > 1. fat_time_fat2unix, fat_time_unix2fat > > > > Both the above 2 functions also will have to be modified along with vfs. > > > > CIFS: > > 1. struct cifs_fscache_inode_auxdata - last_write_time, last_change_time > > 2. cifs_fattr > > 3. cifs_NTtimeToUnix, cifs_UnixTimeToNT, cnvrtDosUnixTm > > > > All the above cifs changes also need to be changed in the same patch as > > vfs switch to timespec64. > > > > I don't think there is any nicer way to do this without having an > > encapsulation layer like inode_timespec or accessors you mentioned to > > change the underlying data type in the vfs. > > > > Also, this scheme is so outrageously ugly that you can easily miss > > some change. There is no way of verifying the approach theoretically. > > Of course, I will be using kernel tests like in other cases. > > I agree it's ugly and fragile to have one huge patch, Nobody is suggesting one huge patch here. This can all be done with small steps. > but I think the > best way to illustrate it is to make it as small as possible and > then talk about whether that makes it acceptable or how we can > work around the problems. > > Do you have an estimate what portion of the file systems need any > changes at all before we can flip over VFS to the new types? All filesystems will, at least, need auditing. A large number of them will need changes, no matter how we "abstract" the VFS type change, even if it is just for 32->64 bit sign extension bugs. Filesystems that have intermediate timestamp formats such as Lustre, NFS, CIFS, etc will need conversion at the vfs/filesytem entry points, and their internals will remain unchanged. Fixing the internals is outside the scope fo the VFS change - the 64 bit VFS inode support stops at the VFS inode/filesystem boundary. > If it's less than half, we you can try yet another variation (nothing > new really, we are always dealing with the same few tricks): > > 1. add timestamp range checking and clamping > 2. kill off CURRENT_TIME Other way around. First make everything use the existing current time functions, then ensure that incoming timestamps are truncated correctly, then add range checking and clamping to the existing time modification functions. > 3. for each file system that uses struct timespec internally to pass > around inode timestamps, do one patch that adds a > timespec_to_inode_time() and vice versa, which gets defined like > > static inline struct timespec timespec_to_inode(struct timespec t) > { > return t; > } This works, and is much cleaner than propagating the macro nastiness everywhere. IMO vfs_time_to_timespec()/timespec_to_vfs_time would be better named as it describes the conversion exactly. I don't think this is a huge patch, though - it's mainly the setattr/kstat operations that need changing here. > 4. change the internal representation in one patch that changes those > helpers along with the struct members. If you are talking about converting internal filesystem representations to (e.g. CIFS fattr, NFS fattr, etc) then this is wrong. Those filesystems are isolated and able to use timespecs internally by step 3, and without protocol/format changes can't support y2038k compliant dates. Hence fixing such problems is a problem for the filesystem developers and is not an issue for the VFS timestamp conversion. That said, stuff like the ext4 encode/decode routines (my eyes, they bleed!) that pass the VFS inode timestamp by reference to other functions will need fixing here. > 5. change the file systems to use timespec64 internally instead of > timespec. I think that will work and leave use with a relatively clean code base, as well as be able to address y2038k support each individual filesystem in our own time. Cheers, Dave.
On Tuesday 19 January 2016 08:14:59 Dave Chinner wrote: > On Mon, Jan 18, 2016 at 08:53:22PM +0100, Arnd Bergmann wrote: > > On Monday 18 January 2016 09:40:12 Deepa Dinamani wrote: > > > On Mon, Jan 18, 2016 at 2:56 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > > > I agree it's ugly and fragile to have one huge patch, > > Nobody is suggesting one huge patch here. This can all be done with > small steps. > > > but I think the > > best way to illustrate it is to make it as small as possible and > > then talk about whether that makes it acceptable or how we can > > work around the problems. > > > > Do you have an estimate what portion of the file systems need any > > changes at all before we can flip over VFS to the new types? > > All filesystems will, at least, need auditing. A large number of > them will need changes, no matter how we "abstract" the VFS type > change, even if it is just for 32->64 bit sign extension bugs. > > Filesystems that have intermediate timestamp formats such as Lustre, > NFS, CIFS, etc will need conversion at the vfs/filesytem entry > points, and their internals will remain unchanged. Fixing the > internals is outside the scope fo the VFS change - the 64 bit VFS > inode support stops at the VFS inode/filesystem boundary. What I meant with "one huge patch" is simply just the change that is needed to modify the type, if we don't use conversion helper functions. > > If it's less than half, we you can try yet another variation (nothing > > new really, we are always dealing with the same few tricks): > > > > 1. add timestamp range checking and clamping > > 2. kill off CURRENT_TIME > > Other way around. First make everything use the existing current > time functions, then ensure that incoming timestamps are truncated > correctly, then add range checking and clamping to the existing > time modification functions. Makes sense. > > 3. for each file system that uses struct timespec internally to pass > > around inode timestamps, do one patch that adds a > > timespec_to_inode_time() and vice versa, which gets defined like > > > > static inline struct timespec timespec_to_inode(struct timespec t) > > { > > return t; > > } > > This works, and is much cleaner than propagating the macro nastiness > everywhere. IMO vfs_time_to_timespec()/timespec_to_vfs_time would be > better named as it describes the conversion exactly. I don't think > this is a huge patch, though - it's mainly the setattr/kstat > operations that need changing here. Good idea for the name. If you are ok with adding those helpers, then it can be done in small steps indeed. I was under the assumption that you didn't like any kind of abstraction of the type in struct inode at all. > > 4. change the internal representation in one patch that changes those > > helpers along with the struct members. > > If you are talking about converting internal filesystem > representations to (e.g. CIFS fattr, NFS fattr, etc) then this is > wrong. Those filesystems are isolated and able to use timespecs > internally by step 3, and without protocol/format changes can't > support y2038k compliant dates. Hence fixing such problems is a > problem for the filesystem developers and is not an issue for the > VFS timestamp conversion. No, once we have the timespec_to_vfs_time helpers in all file systems, that change is just for VFS, and should not touch any file system specific code. It is the equivalent of patch 8/15 in the current version of the series, except that it changes one version of the code to another rather than changing a CONFIG_* symbol that alternates between the two versions coexisting in source. When I first attempted the conversion, I ended up with a very similar trick that Deepa has now, and it's very helpful to find what the code locations are that need to be touched, without doing them all at the same time, as you can simply flip that option to try out another file system. However, I agree that this is better not reflected in how the patches get applied in the end, and there is no need to clutter the git history with having both options in the code at the same time, and we should try to avoid touching a lot of code more than once wherever possible. > > 5. change the file systems to use timespec64 internally instead of > > timespec. > > I think that will work and leave use with a relatively clean code > base, as well as be able to address y2038k support each individual > filesystem in our own time. Ok, thanks. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 18, 2016 at 10:46:07PM +0100, Arnd Bergmann wrote: > On Tuesday 19 January 2016 08:14:59 Dave Chinner wrote: > > On Mon, Jan 18, 2016 at 08:53:22PM +0100, Arnd Bergmann wrote: > > > 3. for each file system that uses struct timespec internally to pass > > > around inode timestamps, do one patch that adds a > > > timespec_to_inode_time() and vice versa, which gets defined like > > > > > > static inline struct timespec timespec_to_inode(struct timespec t) > > > { > > > return t; > > > } > > > > This works, and is much cleaner than propagating the macro nastiness > > everywhere. IMO vfs_time_to_timespec()/timespec_to_vfs_time would be > > better named as it describes the conversion exactly. I don't think > > this is a huge patch, though - it's mainly the setattr/kstat > > operations that need changing here. > > Good idea for the name. > > If you are ok with adding those helpers, then it can be done in small > steps indeed. I was under the assumption that you didn't like any > kind of abstraction of the type in struct inode at all. You're right, I don't like unnecessary abstractions. I guess I've not communicated the "convert timestamps at the edges, use native timestamp types everywhere inside" structure very well, because type conversion functions such as the above are an absolutely necessary part of ensuring we don't need abstractions in the core code... :P > > > 4. change the internal representation in one patch that changes those > > > helpers along with the struct members. > > > > If you are talking about converting internal filesystem > > representations to (e.g. CIFS fattr, NFS fattr, etc) then this is > > wrong. Those filesystems are isolated and able to use timespecs > > internally by step 3, and without protocol/format changes can't > > support y2038k compliant dates. Hence fixing such problems is a > > problem for the filesystem developers and is not an issue for the > > VFS timestamp conversion. > > No, once we have the timespec_to_vfs_time helpers in all file > systems, that change is just for VFS, and should not touch > any file system specific code. OK, just wanted to make clear, because to me "internal" tends to mean "within a specific filesystem" whilst "generic" is used to refer to things at the VFS layer... Cheers, Dave.
On Mon, Jan 18, 2016 at 5:38 PM, Dave Chinner <david@fromorbit.com> wrote: > On Mon, Jan 18, 2016 at 10:46:07PM +0100, Arnd Bergmann wrote: >> On Tuesday 19 January 2016 08:14:59 Dave Chinner wrote: >> > On Mon, Jan 18, 2016 at 08:53:22PM +0100, Arnd Bergmann wrote: >> > > 3. for each file system that uses struct timespec internally to pass >> > > around inode timestamps, do one patch that adds a >> > > timespec_to_inode_time() and vice versa, which gets defined like >> > > >> > > static inline struct timespec timespec_to_inode(struct timespec t) >> > > { >> > > return t; >> > > } >> > >> > This works, and is much cleaner than propagating the macro nastiness >> > everywhere. IMO vfs_time_to_timespec()/timespec_to_vfs_time would be >> > better named as it describes the conversion exactly. I don't think >> > this is a huge patch, though - it's mainly the setattr/kstat >> > operations that need changing here. >> >> Good idea for the name. >> >> If you are ok with adding those helpers, then it can be done in small >> steps indeed. I was under the assumption that you didn't like any >> kind of abstraction of the type in struct inode at all. > > You're right, I don't like unnecessary abstractions. I guess I've > not communicated the "convert timestamps at the edges, use native > timestamp types everywhere inside" structure very well, because type > conversion functions such as the above are an absolutely necessary > part of ensuring we don't need abstractions in the core code... :P Let's back out a bit and consider a few changes with the suggested "abstraction": original code: extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec *ts, __le16 __time, __le16 __date, u8 time_cs); fat_time_fat2unix(sbi, &inode->i_mtime, de->time, de->date, 0); becomes ugly extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec64 *ts, __le16 __time, __le16 __date, u8 time_cs); struct timespec64 mtime = vfs_time_to_timespec64(i_mtime, inode); fat_time_fat2unix(sbi, &mtime, de->time, de->date, 0); with inode_timespec it becomes extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct inode_timespec *ts, __le16 __time, __le16 __date, u8 time_cs); fat_time_fat2unix(sbi, &inode->i_mtime, de->time, de->date, 0); Time-conversion function abstraction: Pros: 1. do not have to change vfs core code. Cons: 1. makes all the filesystems that have to use this ugly. 2. How do you make people use these all the time and not go back to use inode_timestamps directly as is the case right now? 3. Even during this switch, how do you stop people from adding new code which does not use these functions? 4. There are some scenarios like direct assignments when these conversions are not required and some other times they are. Imposing something that needs to be only used sometimes and not even having a clear guidelines on this is done is very very wrong. 5. And, if you do not plan on removing these functions once done switching to timespec64, I doubt it will ever get used again and you are leaving dead code in. 6. And, we cannot proving everything is in sync is again a problem. inode_timespec: Pros: 1. does not have to change vfs code. 2. individual filesystem changes are also less ugly. 3. Easy to manage: 1 simple rule: always use inode_timespec from now on until the conversion and then use timespec64. 4. Goes away in the end. 5. Every step is simple and can be proved theoretically right. Cons: 1. Needs 2 step process as with the other approach. I think inode_timespec is a much better abstraction. And, if we are going to use one, then it better be the right one. -Deepa -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 18, 2016 at 09:27:13PM -0800, Deepa Dinamani wrote: > > > On Mon, Jan 18, 2016 at 5:38 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Mon, Jan 18, 2016 at 10:46:07PM +0100, Arnd Bergmann wrote: > >> On Tuesday 19 January 2016 08:14:59 Dave Chinner wrote: > >> > On Mon, Jan 18, 2016 at 08:53:22PM +0100, Arnd Bergmann wrote: > >> > > 3. for each file system that uses struct timespec internally to pass > >> > > around inode timestamps, do one patch that adds a > >> > > timespec_to_inode_time() and vice versa, which gets defined like > >> > > > >> > > static inline struct timespec timespec_to_inode(struct timespec t) > >> > > { > >> > > return t; > >> > > } > >> > > >> > This works, and is much cleaner than propagating the macro nastiness > >> > everywhere. IMO vfs_time_to_timespec()/timespec_to_vfs_time would be > >> > better named as it describes the conversion exactly. I don't think > >> > this is a huge patch, though - it's mainly the setattr/kstat > >> > operations that need changing here. > >> > >> Good idea for the name. > >> > >> If you are ok with adding those helpers, then it can be done in small > >> steps indeed. I was under the assumption that you didn't like any > >> kind of abstraction of the type in struct inode at all. > > > > You're right, I don't like unnecessary abstractions. I guess I've > > not communicated the "convert timestamps at the edges, use native > > timestamp types everywhere inside" structure very well, because type > > conversion functions such as the above are an absolutely necessary > > part of ensuring we don't need abstractions in the core code... :P > > > Let's back out a bit and consider a few changes with the suggested "abstraction": > > original code: > > extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec *ts, > __le16 __time, __le16 __date, u8 time_cs); > > fat_time_fat2unix(sbi, &inode->i_mtime, de->time, de->date, 0); > > becomes ugly > > extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec64 *ts, > __le16 __time, __le16 __date, u8 time_cs); > > struct timespec64 mtime = vfs_time_to_timespec64(i_mtime, inode); > fat_time_fat2unix(sbi, &mtime, de->time, de->date, 0); You're doing it wrong. fat_time_fat2unix() still gets passed &inode->i_mtime, and the function prototype is changed to a timespec64. *Nothing else needs to change*, because fat_time_fat2unix() does it own calculations and then stores the time directly into the timespec structure members.... I think you're making a mountain out of a molehill. Most filesystems will be unchanged except for s/timespec/timespec64/ as they store values directly into timespec members when encoding/decoding. There is no need for timestamp conversion in places like this - you're simply not looking deep enough and applying the conversion at the wrong layer. Cheers, Dave.
diff --git a/include/linux/fs.h b/include/linux/fs.h index 12ba937..b9f3cee 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -245,13 +245,13 @@ typedef void (dax_iodone_t)(struct buffer_head *bh_map, int uptodate); */ struct iattr { unsigned int ia_valid; - umode_t ia_mode; - kuid_t ia_uid; - kgid_t ia_gid; - loff_t ia_size; - struct timespec ia_atime; - struct timespec ia_mtime; - struct timespec ia_ctime; + umode_t ia_mode; + kuid_t ia_uid; + kgid_t ia_gid; + loff_t ia_size; + struct inode_timespec ia_atime; + struct inode_timespec ia_mtime; + struct inode_timespec ia_ctime; /* * Not an attribute, but an auxiliary info for filesystems wanting to @@ -616,9 +616,18 @@ struct inode { }; dev_t i_rdev; loff_t i_size; +#ifdef CONFIG_FS_USES_64BIT_TIME + time64_t i_atime_sec; + time64_t i_mtime_sec; + time64_t i_ctime_sec; + s32 i_atime_nsec; + s32 i_mtime_nsec; + s32 i_ctime_nsec; +#else struct timespec i_atime; struct timespec i_mtime; struct timespec i_ctime; +#endif spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ unsigned short i_bytes; unsigned int i_blkbits; @@ -679,6 +688,38 @@ struct inode { void *i_private; /* fs or device private pointer */ }; +#ifdef CONFIG_FS_USES_64BIT_TIME + +#define VFS_INODE_SET_XTIME(xtime, inode, ts64) \ + do { \ + struct inode_timespec __ts = (ts64); \ + (inode)->xtime##_sec = __ts.tv_sec; \ + (inode)->xtime##_nsec = __ts.tv_nsec; \ + } while (0) + +#define VFS_INODE_GET_XTIME(xtime, inode) \ + (struct timespec64){.tv_sec = (inode)->xtime##_sec, \ + .tv_nsec = (inode)->xtime##_nsec} + +#else + +#define VFS_INODE_SET_XTIME(xtime, inode, ts) \ + ((inode)->xtime = (ts)) + +#define VFS_INODE_GET_XTIME(xtime, inode) \ + ((inode)->xtime) + +#endif + +#define VFS_INODE_SWAP_XTIME(xtime, inode1, inode2) \ + do { \ + struct inode_timespec __ts = \ + VFS_INODE_GET_XTIME(xtime, inode1); \ + VFS_INODE_SET_XTIME(xtime, inode1, \ + VFS_INODE_GET_XTIME(xtime, inode2)); \ + VFS_INODE_SET_XTIME(xtime, inode2, __ts); \ + } while (0) + static inline int inode_unhashed(struct inode *inode) { return hlist_unhashed(&inode->i_hash); diff --git a/include/linux/stat.h b/include/linux/stat.h index 075cb0c..559983f 100644 --- a/include/linux/stat.h +++ b/include/linux/stat.h @@ -27,9 +27,9 @@ struct kstat { kgid_t gid; dev_t rdev; loff_t size; - struct timespec atime; - struct timespec mtime; - struct timespec ctime; + struct inode_timespec atime; + struct inode_timespec mtime; + struct inode_timespec ctime; unsigned long blksize; unsigned long long blocks; }; diff --git a/include/linux/time64.h b/include/linux/time64.h index 367d5af..be98201 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -26,6 +26,27 @@ struct itimerspec64 { #endif +#ifdef CONFIG_FS_USES_64BIT_TIME + +/* Place holder defines until CONFIG_FS_USES_64BIT_TIME + * is enabled. + * timespec64 data type and functions will be used at that + * time directly and these defines will be deleted. + */ +#define inode_timespec timespec64 + +#define inode_timespec_compare timespec64_compare +#define inode_timespec_equal timespec64_equal + +#else + +#define inode_timespec timespec + +#define inode_timespec_compare timespec_compare +#define inode_timespec_equal timespec_equal + +#endif + /* Parameters used to convert the timespec values: */ #define MSEC_PER_SEC 1000L #define USEC_PER_MSEC 1000L
The current representation of inode times in struct inode, struct iattr, and struct kstat use struct timespec. timespec is not y2038 safe. Use scalar data types (seconds and nanoseconds stored separately) to represent timestamps in struct inode in order to maintain same size for times across 32 bit and 64 bit architectures. In addition, lay them out such that they are packed on a naturally aligned boundary on 64 bit arch as 4 bytes are used to store nsec. This makes each tuple(sec, nscec) use 12 bytes instead of 16 bytes. This will help save RAM space as inode structure is cached in memory. The other structures are transient and do not benefit from these changes. Add accessors for inode timestamps. These provide a way to access the time field members. Accessors abstract the timestamp representation so that any logic to convert between the struct inode timestamps and other interfaces can be placed here. The plan is to globally change all references to these types through these accessors only. So when the actual internal representation changes, it will be transparent to the outside world. This can be extended to add code to validate the inode times that are being set. Macros are chosen as accessors rather than functions because we can condense 3 {a,c,m} time functions into a single macro. After we agree on an approach, the implementation could be changed to use static inline functions if it suits more. Add inode_timespec aliases to help convert kstat and iattr times to use 64 bit times. These hide the internal data type. Use uapi exposed data types here to keep minimal timstamp data type conversions in API's interfacing with vfs. After the CONFIG_FS_USES_64BIT_TIME is enabled, all inode_timespec aliases will be removed and timespec64 data types and API's will be used directly. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> --- include/linux/fs.h | 55 +++++++++++++++++++++++++++++++++++++++++++------- include/linux/stat.h | 6 +++--- include/linux/time64.h | 21 +++++++++++++++++++ 3 files changed, 72 insertions(+), 10 deletions(-)