Message ID | 20240227174649.GL6184@frogsfrogsfrogs (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/14] vfs: export remap and write check helpers | expand |
On Tue, Feb 27, 2024 at 7:46 PM Darrick J. Wong <djwong@kernel.org> wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > To head off bikeshedding about the fields in xfs_commit_range, let's > make it an opaque u64 array and require the userspace program to call > a third ioctl to sample the freshness data for us. If we ever converge > on a definition for i_version then we can use that; for now we'll just > use mtime/ctime like the old swapext ioctl. This addresses my concerns about using mtime/ctime. I have to say, Darrick, that I think that referring to this concern as bikeshedding is not being honest. I do hate nit picking reviews and I do hate "maybe also fix the world" review comments, but I think the question about using mtime/ctime in this new API was not out of place and I think that making the freshness data opaque is better for everyone in the long run and hopefully, this will help you move to the things you care about faster. Thanks, Amir.
On Tue, Feb 27, 2024 at 08:52:58PM +0200, Amir Goldstein wrote: > On Tue, Feb 27, 2024 at 7:46 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > To head off bikeshedding about the fields in xfs_commit_range, let's > > make it an opaque u64 array and require the userspace program to call > > a third ioctl to sample the freshness data for us. If we ever converge > > on a definition for i_version then we can use that; for now we'll just > > use mtime/ctime like the old swapext ioctl. > > This addresses my concerns about using mtime/ctime. Oh good! :) > I have to say, Darrick, that I think that referring to this concern as > bikeshedding is not being honest. > > I do hate nit picking reviews and I do hate "maybe also fix the world" > review comments, but I think the question about using mtime/ctime in > this new API was not out of place I agree, your question about mtime/ctime: "Maybe a stupid question, but under which circumstances would mtime change and ctime not change? Why are both needed?" was a very good question. But perhaps that statement referred to the other part of that thread. > and I think that making the freshness > data opaque is better for everyone in the long run and hopefully, this will > help you move to the things you care about faster. I wish you'd suggested an opaque blob that the fs can lay out however it wants instead of suggesting specifically the change cookie. I'm very much ok with an opaque freshness blob that allows future flexibility in how we define the blob's contents. I was however very upset about the Jeff's suggestion of using i_version. I apologize for using all caps in that reply, and snarling about it in the commit message here. The final version of this patch will not have that. That said, I don't think it is at all helpful to suggest using a file attribute whose behavior is as yet unresolved. Multigrain timestamps were a clever idea, regrettably reverted. As far as I could tell when I wrote my reply, neither had NFS implemented a better behavior and quietly merged it; nor have Jeff and Dave produced any sort of candidate patchset to fix all the resulting issues in XFS. Reading "I realize that STATX_CHANGE_COOKIE is currently kernel internal" made me think "OH $deity, they wants me to do that work too???" A better way to have woreded that might've been "How about switching this to a fs-determined structure so that we can switch the freshness check to i_version when that's fully working on XFS?" The problem I have with reading patch review emails is that I can't easily tell whether an author's suggestion is being made in a casual offhand manner? Or if it reflects something they feel strongly needs change before merging. In fairness to you, Amir, I don't know how much you've kept on top of that i_version vs. XFS discussion. So I have no idea if you were aware of the status of that work. --D > Thanks, > Amir. >
On Fri, Mar 1, 2024 at 1:27 AM Darrick J. Wong <djwong@kernel.org> wrote: > > On Tue, Feb 27, 2024 at 08:52:58PM +0200, Amir Goldstein wrote: > > On Tue, Feb 27, 2024 at 7:46 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > To head off bikeshedding about the fields in xfs_commit_range, let's > > > make it an opaque u64 array and require the userspace program to call > > > a third ioctl to sample the freshness data for us. If we ever converge > > > on a definition for i_version then we can use that; for now we'll just > > > use mtime/ctime like the old swapext ioctl. > > > > This addresses my concerns about using mtime/ctime. > > Oh good! :) > > > I have to say, Darrick, that I think that referring to this concern as > > bikeshedding is not being honest. > > > > I do hate nit picking reviews and I do hate "maybe also fix the world" > > review comments, but I think the question about using mtime/ctime in > > this new API was not out of place > > I agree, your question about mtime/ctime: > > "Maybe a stupid question, but under which circumstances would mtime > change and ctime not change? Why are both needed?" > > was a very good question. But perhaps that statement referred to the > other part of that thread. > > > and I think that making the freshness > > data opaque is better for everyone in the long run and hopefully, this will > > help you move to the things you care about faster. > > I wish you'd suggested an opaque blob that the fs can lay out however it > wants instead of suggesting specifically the change cookie. I'm very > much ok with an opaque freshness blob that allows future flexibility in > how we define the blob's contents. > I wish I had thought of it myself - it is a good idea - just did not occur to me. Using the language of i_changecounter, that is "the current xfs implementation of i_version", I still think that using it as the content of the opaque freshness blob makes more sense than mtime+ctime, but it is none of my concern what you decide to fill in the freshness blob for the first version. I was not aware of the way xfs_fsr is currently using mtime+ctime when I replied and I am not sure if and how it is relevant to the new API. > I was however very upset about the Jeff's suggestion of using i_version. > I apologize for using all caps in that reply, and snarling about it in > the commit message here. The final version of this patch will not have > that. > > That said, I don't think it is at all helpful to suggest using a file > attribute whose behavior is as yet unresolved. Multigrain timestamps > were a clever idea, regrettably reverted. As far as I could tell when I > wrote my reply, neither had NFS implemented a better behavior and > quietly merged it; nor have Jeff and Dave produced any sort of candidate > patchset to fix all the resulting issues in XFS. > > Reading "I realize that STATX_CHANGE_COOKIE is currently kernel > internal" made me think "OH $deity, they wants me to do that work > too???" > > A better way to have woreded that might've been "How about switching > this to a fs-determined structure so that we can switch the freshness > check to i_version when that's fully working on XFS?" > Yeh, I should have chosen my words more carefully. I was perfectly aware of your lack of interest in doing extra work and wasn't trying to request any. > The problem I have with reading patch review emails is that I can't > easily tell whether an author's suggestion is being made in a casual > offhand manner? Or if it reflects something they feel strongly needs > change before merging. > Can't speak for everyone else, but coming from the middle east, I have fewer politeness filters. When I write "wouldn't it be better to use change_cookie?" I am just asking that question. When I am asking something to be changed before merge, I try to be much more explicit about it and this is what I expect others to do when reviewing my patches. Thanks, Amir.
On Thu, 2024-02-29 at 15:27 -0800, Darrick J. Wong wrote: > On Tue, Feb 27, 2024 at 08:52:58PM +0200, Amir Goldstein wrote: > > On Tue, Feb 27, 2024 at 7:46 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > To head off bikeshedding about the fields in xfs_commit_range, let's > > > make it an opaque u64 array and require the userspace program to call > > > a third ioctl to sample the freshness data for us. If we ever converge > > > on a definition for i_version then we can use that; for now we'll just > > > use mtime/ctime like the old swapext ioctl. > > > > This addresses my concerns about using mtime/ctime. > > Oh good! :) > > > I have to say, Darrick, that I think that referring to this concern as > > bikeshedding is not being honest. > > > > I do hate nit picking reviews and I do hate "maybe also fix the world" > > review comments, but I think the question about using mtime/ctime in > > this new API was not out of place > > I agree, your question about mtime/ctime: > > "Maybe a stupid question, but under which circumstances would mtime > change and ctime not change? Why are both needed?" > > was a very good question. But perhaps that statement referred to the > other part of that thread. > > > and I think that making the freshness > > data opaque is better for everyone in the long run and hopefully, this will > > help you move to the things you care about faster. > > I wish you'd suggested an opaque blob that the fs can lay out however it > wants instead of suggesting specifically the change cookie. I'm very > much ok with an opaque freshness blob that allows future flexibility in > how we define the blob's contents. > > I was however very upset about the Jeff's suggestion of using i_version. > I apologize for using all caps in that reply, and snarling about it in > the commit message here. The final version of this patch will not have > that. > > That said, I don't think it is at all helpful to suggest using a file > attribute whose behavior is as yet unresolved. Multigrain timestamps > were a clever idea, regrettably reverted. As far as I could tell when I > wrote my reply, neither had NFS implemented a better behavior and > quietly merged it; nor have Jeff and Dave produced any sort of candidate > patchset to fix all the resulting issues in XFS. > > Reading "I realize that STATX_CHANGE_COOKIE is currently kernel > internal" made me think "OH $deity, they wants me to do that work > too???" > > A better way to have woreded that might've been "How about switching > this to a fs-determined structure so that we can switch the freshness > check to i_version when that's fully working on XFS?" > > The problem I have with reading patch review emails is that I can't > easily tell whether an author's suggestion is being made in a casual > offhand manner? Or if it reflects something they feel strongly needs > change before merging. > > In fairness to you, Amir, I don't know how much you've kept on top of > that i_version vs. XFS discussion. So I have no idea if you were aware > of the status of that work. > Sorry, I didn't mean to trigger anyone, but I do have real concerns about any API that attempts to use timestamps to detect whether something has changed. We learned that lesson in NFS in the 90's. VFS timestamp resolution is just not enough to show whether there was a change to a file -- full stop. I get the hand-wringing over i_version definitions and I don't care to rehash that discussion here, but I'll point out that this is a (proposed) XFS-private interface: What you could do is expose the XFS change counter (the one that gets bumped for everything, even atime updates, possibly via different ioctl), and use that for your "freshness" check. You'd unfortunately get false negative freshness checks after read operations, but you shouldn't get any false positives (which is real danger with timestamps).
On Fri, Mar 01, 2024 at 08:31:21AM -0500, Jeff Layton wrote: > On Thu, 2024-02-29 at 15:27 -0800, Darrick J. Wong wrote: > > On Tue, Feb 27, 2024 at 08:52:58PM +0200, Amir Goldstein wrote: > > > On Tue, Feb 27, 2024 at 7:46 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > > > To head off bikeshedding about the fields in xfs_commit_range, let's > > > > make it an opaque u64 array and require the userspace program to call > > > > a third ioctl to sample the freshness data for us. If we ever converge > > > > on a definition for i_version then we can use that; for now we'll just > > > > use mtime/ctime like the old swapext ioctl. > > > > > > This addresses my concerns about using mtime/ctime. > > > > Oh good! :) > > > > > I have to say, Darrick, that I think that referring to this concern as > > > bikeshedding is not being honest. > > > > > > I do hate nit picking reviews and I do hate "maybe also fix the world" > > > review comments, but I think the question about using mtime/ctime in > > > this new API was not out of place > > > > I agree, your question about mtime/ctime: > > > > "Maybe a stupid question, but under which circumstances would mtime > > change and ctime not change? Why are both needed?" > > > > was a very good question. But perhaps that statement referred to the > > other part of that thread. > > > > > and I think that making the freshness > > > data opaque is better for everyone in the long run and hopefully, this will > > > help you move to the things you care about faster. > > > > I wish you'd suggested an opaque blob that the fs can lay out however it > > wants instead of suggesting specifically the change cookie. I'm very > > much ok with an opaque freshness blob that allows future flexibility in > > how we define the blob's contents. > > > > I was however very upset about the Jeff's suggestion of using i_version. > > I apologize for using all caps in that reply, and snarling about it in > > the commit message here. The final version of this patch will not have > > that. > > > > That said, I don't think it is at all helpful to suggest using a file > > attribute whose behavior is as yet unresolved. Multigrain timestamps > > were a clever idea, regrettably reverted. As far as I could tell when I > > wrote my reply, neither had NFS implemented a better behavior and > > quietly merged it; nor have Jeff and Dave produced any sort of candidate > > patchset to fix all the resulting issues in XFS. > > > > Reading "I realize that STATX_CHANGE_COOKIE is currently kernel > > internal" made me think "OH $deity, they wants me to do that work > > too???" > > > > A better way to have woreded that might've been "How about switching > > this to a fs-determined structure so that we can switch the freshness > > check to i_version when that's fully working on XFS?" > > > > The problem I have with reading patch review emails is that I can't > > easily tell whether an author's suggestion is being made in a casual > > offhand manner? Or if it reflects something they feel strongly needs > > change before merging. > > > > In fairness to you, Amir, I don't know how much you've kept on top of > > that i_version vs. XFS discussion. So I have no idea if you were aware > > of the status of that work. > > > > Sorry, I didn't mean to trigger anyone, but I do have real concerns > about any API that attempts to use timestamps to detect whether > something has changed. > > We learned that lesson in NFS in the 90's. VFS timestamp resolution is > just not enough to show whether there was a change to a file -- full > stop. > > I get the hand-wringing over i_version definitions and I don't care to > rehash that discussion here, but I'll point out that this is a > (proposed) XFS-private interface: > > What you could do is expose the XFS change counter (the one that gets > bumped for everything, even atime updates, possibly via different > ioctl), and use that for your "freshness" check. > > You'd unfortunately get false negative freshness checks after read > operations, but you shouldn't get any false positives (which is real > danger with timestamps). I don't see how would that work for this usecase? You have to sample file2 before reflinking file2's contents to file1, writing the changes to file1, and executing COMMIT_RANGE. Setting the xfs-private REFLINK inode flag on file2 will trigger an iversion update even though it won't change mtime or ctime. The COMMIT then fails due to the inode flags change. Worse yet, applications aren't going to know if a particular access is actually the one that will trigger an atime update. So this will just fail unpredictably. If iversion was purely a write counter then I would switch the freshness implementation to use it. But it's not, and I know this to be true because I tried that and could not get COMMIT_RANGE to work reliably. I suppose the advantage of the blob thing is that we actually /can/ switch over whenever it's ready. --D > -- > Jeff Layton <jlayton@kernel.org> >
On Fri, 2024-03-01 at 18:48 -0800, Darrick J. Wong wrote: > On Fri, Mar 01, 2024 at 08:31:21AM -0500, Jeff Layton wrote: > > On Thu, 2024-02-29 at 15:27 -0800, Darrick J. Wong wrote: > > > On Tue, Feb 27, 2024 at 08:52:58PM +0200, Amir Goldstein wrote: > > > > On Tue, Feb 27, 2024 at 7:46 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > > > > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > > > > > To head off bikeshedding about the fields in xfs_commit_range, let's > > > > > make it an opaque u64 array and require the userspace program to call > > > > > a third ioctl to sample the freshness data for us. If we ever converge > > > > > on a definition for i_version then we can use that; for now we'll just > > > > > use mtime/ctime like the old swapext ioctl. > > > > > > > > This addresses my concerns about using mtime/ctime. > > > > > > Oh good! :) > > > > > > > I have to say, Darrick, that I think that referring to this concern as > > > > bikeshedding is not being honest. > > > > > > > > I do hate nit picking reviews and I do hate "maybe also fix the world" > > > > review comments, but I think the question about using mtime/ctime in > > > > this new API was not out of place > > > > > > I agree, your question about mtime/ctime: > > > > > > "Maybe a stupid question, but under which circumstances would mtime > > > change and ctime not change? Why are both needed?" > > > > > > was a very good question. But perhaps that statement referred to the > > > other part of that thread. > > > > > > > and I think that making the freshness > > > > data opaque is better for everyone in the long run and hopefully, this will > > > > help you move to the things you care about faster. > > > > > > I wish you'd suggested an opaque blob that the fs can lay out however it > > > wants instead of suggesting specifically the change cookie. I'm very > > > much ok with an opaque freshness blob that allows future flexibility in > > > how we define the blob's contents. > > > > > > I was however very upset about the Jeff's suggestion of using i_version. > > > I apologize for using all caps in that reply, and snarling about it in > > > the commit message here. The final version of this patch will not have > > > that. > > > > > > That said, I don't think it is at all helpful to suggest using a file > > > attribute whose behavior is as yet unresolved. Multigrain timestamps > > > were a clever idea, regrettably reverted. As far as I could tell when I > > > wrote my reply, neither had NFS implemented a better behavior and > > > quietly merged it; nor have Jeff and Dave produced any sort of candidate > > > patchset to fix all the resulting issues in XFS. > > > > > > Reading "I realize that STATX_CHANGE_COOKIE is currently kernel > > > internal" made me think "OH $deity, they wants me to do that work > > > too???" > > > > > > A better way to have woreded that might've been "How about switching > > > this to a fs-determined structure so that we can switch the freshness > > > check to i_version when that's fully working on XFS?" > > > > > > The problem I have with reading patch review emails is that I can't > > > easily tell whether an author's suggestion is being made in a casual > > > offhand manner? Or if it reflects something they feel strongly needs > > > change before merging. > > > > > > In fairness to you, Amir, I don't know how much you've kept on top of > > > that i_version vs. XFS discussion. So I have no idea if you were aware > > > of the status of that work. > > > > > > > Sorry, I didn't mean to trigger anyone, but I do have real concerns > > about any API that attempts to use timestamps to detect whether > > something has changed. > > > > We learned that lesson in NFS in the 90's. VFS timestamp resolution is > > just not enough to show whether there was a change to a file -- full > > stop. > > > > I get the hand-wringing over i_version definitions and I don't care to > > rehash that discussion here, but I'll point out that this is a > > (proposed) XFS-private interface: > > > > What you could do is expose the XFS change counter (the one that gets > > bumped for everything, even atime updates, possibly via different > > ioctl), and use that for your "freshness" check. > > > > You'd unfortunately get false negative freshness checks after read > > operations, but you shouldn't get any false positives (which is real > > danger with timestamps). > > I don't see how would that work for this usecase? You have to sample > file2 before reflinking file2's contents to file1, writing the changes > to file1, and executing COMMIT_RANGE. Setting the xfs-private REFLINK > inode flag on file2 will trigger an iversion update even though it won't > change mtime or ctime. The COMMIT then fails due to the inode flags > change. > > Worse yet, applications aren't going to know if a particular access is > actually the one that will trigger an atime update. So this will just > fail unpredictably. > > If iversion was purely a write counter then I would switch the freshness > implementation to use it. But it's not, and I know this to be true > because I tried that and could not get COMMIT_RANGE to work reliably. > I suppose the advantage of the blob thing is that we actually /can/ > switch over whenever it's ready. > Yeah, that's the other part -- you have to be willing to redrive the I/O every time the freshness check fails, which can get expensive depending on how active the file is. Again this is an XFS interface, so I don't really have a dog in this fight. If you think timestamps are good enough, then so be it. All I can do is mention that it has been our experience in the NFS world that relying on timestamps like this will eventually lead to data corruption. The race conditions may be tight, and much of the time the race may be benign, but if you do this enough you'll eventually get bitten, and end up exchanging data when you shouldn't have. All of that said, I think this is great discussion fodder for LSF this year. I feel like the time is right to consider these sorts of interfaces that do synchronized I/O without locking. I've already proposed a discussion around the state of the i_version counter, so maybe we can chat about it then?
On Sat, Mar 02, 2024 at 07:43:53AM -0500, Jeff Layton wrote: > On Fri, 2024-03-01 at 18:48 -0800, Darrick J. Wong wrote: > > On Fri, Mar 01, 2024 at 08:31:21AM -0500, Jeff Layton wrote: > > > On Thu, 2024-02-29 at 15:27 -0800, Darrick J. Wong wrote: > > > > On Tue, Feb 27, 2024 at 08:52:58PM +0200, Amir Goldstein wrote: > > > > > On Tue, Feb 27, 2024 at 7:46 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > > > > > > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > > > > > > > To head off bikeshedding about the fields in xfs_commit_range, let's > > > > > > make it an opaque u64 array and require the userspace program to call > > > > > > a third ioctl to sample the freshness data for us. If we ever converge > > > > > > on a definition for i_version then we can use that; for now we'll just > > > > > > use mtime/ctime like the old swapext ioctl. > > > > > > > > > > This addresses my concerns about using mtime/ctime. > > > > > > > > Oh good! :) > > > > > > > > > I have to say, Darrick, that I think that referring to this concern as > > > > > bikeshedding is not being honest. > > > > > > > > > > I do hate nit picking reviews and I do hate "maybe also fix the world" > > > > > review comments, but I think the question about using mtime/ctime in > > > > > this new API was not out of place > > > > > > > > I agree, your question about mtime/ctime: > > > > > > > > "Maybe a stupid question, but under which circumstances would mtime > > > > change and ctime not change? Why are both needed?" > > > > > > > > was a very good question. But perhaps that statement referred to the > > > > other part of that thread. > > > > > > > > > and I think that making the freshness > > > > > data opaque is better for everyone in the long run and hopefully, this will > > > > > help you move to the things you care about faster. > > > > > > > > I wish you'd suggested an opaque blob that the fs can lay out however it > > > > wants instead of suggesting specifically the change cookie. I'm very > > > > much ok with an opaque freshness blob that allows future flexibility in > > > > how we define the blob's contents. > > > > > > > > I was however very upset about the Jeff's suggestion of using i_version. > > > > I apologize for using all caps in that reply, and snarling about it in > > > > the commit message here. The final version of this patch will not have > > > > that. > > > > > > > > That said, I don't think it is at all helpful to suggest using a file > > > > attribute whose behavior is as yet unresolved. Multigrain timestamps > > > > were a clever idea, regrettably reverted. As far as I could tell when I > > > > wrote my reply, neither had NFS implemented a better behavior and > > > > quietly merged it; nor have Jeff and Dave produced any sort of candidate > > > > patchset to fix all the resulting issues in XFS. > > > > > > > > Reading "I realize that STATX_CHANGE_COOKIE is currently kernel > > > > internal" made me think "OH $deity, they wants me to do that work > > > > too???" > > > > > > > > A better way to have woreded that might've been "How about switching > > > > this to a fs-determined structure so that we can switch the freshness > > > > check to i_version when that's fully working on XFS?" > > > > > > > > The problem I have with reading patch review emails is that I can't > > > > easily tell whether an author's suggestion is being made in a casual > > > > offhand manner? Or if it reflects something they feel strongly needs > > > > change before merging. > > > > > > > > In fairness to you, Amir, I don't know how much you've kept on top of > > > > that i_version vs. XFS discussion. So I have no idea if you were aware > > > > of the status of that work. > > > > > > > > > > Sorry, I didn't mean to trigger anyone, but I do have real concerns > > > about any API that attempts to use timestamps to detect whether > > > something has changed. > > > > > > We learned that lesson in NFS in the 90's. VFS timestamp resolution is > > > just not enough to show whether there was a change to a file -- full > > > stop. > > > > > > I get the hand-wringing over i_version definitions and I don't care to > > > rehash that discussion here, but I'll point out that this is a > > > (proposed) XFS-private interface: > > > > > > What you could do is expose the XFS change counter (the one that gets > > > bumped for everything, even atime updates, possibly via different > > > ioctl), and use that for your "freshness" check. > > > > > > You'd unfortunately get false negative freshness checks after read > > > operations, but you shouldn't get any false positives (which is real > > > danger with timestamps). > > > > I don't see how would that work for this usecase? You have to sample > > file2 before reflinking file2's contents to file1, writing the changes > > to file1, and executing COMMIT_RANGE. Setting the xfs-private REFLINK > > inode flag on file2 will trigger an iversion update even though it won't > > change mtime or ctime. The COMMIT then fails due to the inode flags > > change. > > > > Worse yet, applications aren't going to know if a particular access is > > actually the one that will trigger an atime update. So this will just > > fail unpredictably. > > > > If iversion was purely a write counter then I would switch the freshness > > implementation to use it. But it's not, and I know this to be true > > because I tried that and could not get COMMIT_RANGE to work reliably. > > I suppose the advantage of the blob thing is that we actually /can/ > > switch over whenever it's ready. > > > > Yeah, that's the other part -- you have to be willing to redrive the I/O > every time the freshness check fails, which can get expensive depending > on how active the file is. Again this is an XFS interface, so I don't > really have a dog in this fight. If you think timestamps are good > enough, then so be it. > > All I can do is mention that it has been our experience in the NFS world > that relying on timestamps like this will eventually lead to data > corruption. The race conditions may be tight, and much of the time the > race may be benign, but if you do this enough you'll eventually get > bitten, and end up exchanging data when you shouldn't have. > > All of that said, I think this is great discussion fodder for LSF this > year. I feel like the time is right to consider these sorts of > interfaces that do synchronized I/O without locking. I've already > proposed a discussion around the state of the i_version counter, so > maybe we can chat about it then? Yes. I've gotten an invitation, so corporate approval and dumb injuries notwithstanding, I'll be there this year. :) --D > -- > Jeff Layton <jlayton@kernel.org> >
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 01b3553adfc55..4019a78ee3ea5 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -860,14 +860,8 @@ struct xfs_commit_range { __u64 flags; /* see XFS_EXCHRANGE_* below */ - /* file2 metadata for freshness checks */ - __u64 file2_ino; /* inode number */ - __s64 file2_mtime; /* modification time */ - __s64 file2_ctime; /* change time */ - __s32 file2_mtime_nsec; /* mod time, nsec */ - __s32 file2_ctime_nsec; /* change time, nsec */ - - __u64 pad; /* must be zeroes */ + /* opaque file2 metadata for freshness checks */ + __u64 file2_freshness[5]; }; /* @@ -973,7 +967,8 @@ struct xfs_commit_range { #define XFS_IOC_BULKSTAT _IOR ('X', 127, struct xfs_bulkstat_req) #define XFS_IOC_INUMBERS _IOR ('X', 128, struct xfs_inumbers_req) #define XFS_IOC_EXCHANGE_RANGE _IOWR('X', 129, struct xfs_exch_range) -#define XFS_IOC_COMMIT_RANGE _IOWR('X', 129, struct xfs_commit_range) +#define XFS_IOC_START_COMMIT _IOWR('X', 130, struct xfs_commit_range) +#define XFS_IOC_COMMIT_RANGE _IOWR('X', 131, struct xfs_commit_range) /* XFS_IOC_GETFSUUID ---------- deprecated 140 */ diff --git a/fs/xfs/xfs_exchrange.c b/fs/xfs/xfs_exchrange.c index e55ae06f1a32c..dae855515c3c4 100644 --- a/fs/xfs/xfs_exchrange.c +++ b/fs/xfs/xfs_exchrange.c @@ -863,3 +863,18 @@ xfs_exchange_range( fsnotify_modify(fxr->file2); return 0; } + +/* Sample freshness data from fxr->file2 for a commit range operation. */ +void +xfs_exchrange_freshness( + struct xfs_exchrange *fxr) +{ + struct inode *inode2 = file_inode(fxr->file2); + struct xfs_inode *ip2 = XFS_I(inode2); + + xfs_ilock(ip2, XFS_IOLOCK_SHARED | XFS_MMAPLOCK_SHARED | XFS_ILOCK_SHARED); + fxr->file2_ino = ip2->i_ino; + fxr->file2_ctime = inode_get_ctime(inode2); + fxr->file2_mtime = inode_get_mtime(inode2); + xfs_iunlock(ip2, XFS_IOLOCK_SHARED | XFS_MMAPLOCK_SHARED | XFS_ILOCK_SHARED); +} diff --git a/fs/xfs/xfs_exchrange.h b/fs/xfs/xfs_exchrange.h index 2dd9ab7d76828..942283a7f75f5 100644 --- a/fs/xfs/xfs_exchrange.h +++ b/fs/xfs/xfs_exchrange.h @@ -36,6 +36,7 @@ struct xfs_exchrange { struct timespec64 file2_ctime; }; +void xfs_exchrange_freshness(struct xfs_exchrange *fxr); int xfs_exchange_range(struct xfs_exchrange *fxr); /* XFS-specific parts of file exchanges */ diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index ee26ac2028da1..1940da72a1da7 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -2402,6 +2402,47 @@ xfs_ioc_exchange_range( return error; } +/* Opaque freshness blob for XFS_IOC_COMMIT_RANGE */ +struct xfs_commit_range_fresh { + __u64 file2_ino; /* inode number */ + __s64 file2_mtime; /* modification time */ + __s64 file2_ctime; /* change time */ + __s32 file2_mtime_nsec; /* mod time, nsec */ + __s32 file2_ctime_nsec; /* change time, nsec */ + __u64 pad; /* zero */ +}; + +static long +xfs_ioc_start_commit( + struct file *file, + struct xfs_commit_range __user *argp) +{ + struct xfs_exchrange fxr = { + .file2 = file, + }; + struct xfs_commit_range args; + struct xfs_commit_range_fresh *kern_f; + struct xfs_commit_range_fresh __user *user_f; + + BUILD_BUG_ON(sizeof(struct xfs_commit_range_fresh) != + sizeof(args.file2_freshness)); + + xfs_exchrange_freshness(&fxr); + + kern_f = (struct xfs_commit_range_fresh *)&args.file2_freshness; + kern_f->file2_ino = fxr.file2_ino; + kern_f->file2_mtime = fxr.file2_mtime.tv_sec; + kern_f->file2_mtime_nsec = fxr.file2_mtime.tv_nsec; + kern_f->file2_ctime = fxr.file2_ctime.tv_sec; + kern_f->file2_ctime_nsec = fxr.file2_ctime.tv_nsec; + + user_f = (struct xfs_commit_range_fresh *)&argp->file2_freshness; + if (copy_to_user(user_f, kern_f, sizeof(*kern_f))) + return -EFAULT; + + return 0; +} + static long xfs_ioc_commit_range( struct file *file, @@ -2411,12 +2452,15 @@ xfs_ioc_commit_range( .file2 = file, }; struct xfs_commit_range args; + struct xfs_commit_range_fresh *kern_f; struct fd file1; int error; + kern_f = (struct xfs_commit_range_fresh *)&args.file2_freshness; + if (copy_from_user(&args, argp, sizeof(args))) return -EFAULT; - if (memchr_inv(&args.pad, 0, sizeof(args.pad))) + if (memchr_inv(&kern_f->pad, 0, sizeof(kern_f->pad))) return -EINVAL; if (args.flags & ~XFS_EXCHRANGE_ALL_FLAGS) return -EINVAL; @@ -2425,11 +2469,11 @@ xfs_ioc_commit_range( fxr.file2_offset = args.file2_offset; fxr.length = args.length; fxr.flags = args.flags | __XFS_EXCHRANGE_CHECK_FRESH2; - fxr.file2_ino = args.file2_ino; - fxr.file2_mtime.tv_sec = args.file2_mtime; - fxr.file2_mtime.tv_nsec = args.file2_mtime_nsec; - fxr.file2_ctime.tv_sec = args.file2_ctime; - fxr.file2_ctime.tv_nsec = args.file2_ctime_nsec; + fxr.file2_ino = kern_f->file2_ino; + fxr.file2_mtime.tv_sec = kern_f->file2_mtime; + fxr.file2_mtime.tv_nsec = kern_f->file2_mtime_nsec; + fxr.file2_ctime.tv_sec = kern_f->file2_ctime; + fxr.file2_ctime.tv_nsec = kern_f->file2_ctime_nsec; file1 = fdget(args.file1_fd); if (!file1.file) @@ -2782,6 +2826,8 @@ xfs_file_ioctl( case XFS_IOC_EXCHANGE_RANGE: return xfs_ioc_exchange_range(filp, arg); + case XFS_IOC_START_COMMIT: + return xfs_ioc_start_commit(filp, arg); case XFS_IOC_COMMIT_RANGE: return xfs_ioc_commit_range(filp, arg);