diff mbox

[5/5] CIFS: Fix write after setting a read lock for read oplock files

Message ID 1353999029-3975-6-git-send-email-piastry@etersoft.ru (mailing list archive)
State New, archived
Headers show

Commit Message

Pavel Shilovsky Nov. 27, 2012, 6:50 a.m. UTC
If we have a read oplock and a read lock set on the file, we can't write to
the locked area - so filemap_fdatawrite may fail, returning no information to
the userspace application, even if the write targets a non-locked area. Fix
this by replacing it with a filemap_write_and_wait call and sending a non-page
write in the error case.

While this may end up with two write requests to the server, we can be sure
that our data will be the same on the server and in the page cache - the next
read on this file gets the valid data.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
---
 fs/cifs/file.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Comments

Jeff Layton Nov. 27, 2012, 12:52 p.m. UTC | #1
On Tue, 27 Nov 2012 10:50:28 +0400
Pavel Shilovsky <piastry@etersoft.ru> wrote:

> If we have a read oplock and set a read lock in it, we can't write to the
> locked area - so, filemap_fdatawrite may fail with a no information for a
> userspace application even if we request a write to non-locked area. Fix
> this by replacing it with filemap_write_and_wait call and sending non-page
> write in a error case.
> 
> While this may end up with two write requests to the server, we can be sure
> that our data will be the same at the server and the page cache - the next read
> on this file gets the valid data.
> 
> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
> ---
>  fs/cifs/file.c |   10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index f8fe1bd..89efd85 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov,
>  	 */
>  	if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) {
>  		ssize_t written;
> -		int rc;
>  
>  		written = generic_file_aio_write(iocb, iov, nr_segs, pos);
> -		rc = filemap_fdatawrite(inode->i_mapping);
> -		if (rc)
> -			return (ssize_t)rc;
> -
> -		return written;
> +		/* try page write at first */
> +		if (!filemap_write_and_wait(inode->i_mapping))
> +			return written;
> +		/* page write failed - try from pos to pos+len-1 */
>  	}
>  #endif
>  

Bleh -- nasty. I guess this will work though...

Wonder if there's some way to populate the cache and then just mark the
pages clean without sending out writes? That would be a better solution
IMO, but I guess we can live with this for now...

Acked-by: Jeff Layton <jlayton@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pavel Shilovsky Nov. 27, 2012, 2:57 p.m. UTC | #2
2012/11/27 Jeff Layton <jlayton@redhat.com>:
> On Tue, 27 Nov 2012 10:50:28 +0400
> Pavel Shilovsky <piastry@etersoft.ru> wrote:
>
>> If we have a read oplock and set a read lock in it, we can't write to the
>> locked area - so, filemap_fdatawrite may fail with a no information for a
>> userspace application even if we request a write to non-locked area. Fix
>> this by replacing it with filemap_write_and_wait call and sending non-page
>> write in a error case.
>>
>> While this may end up with two write requests to the server, we can be sure
>> that our data will be the same at the server and the page cache - the next read
>> on this file gets the valid data.
>>
>> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
>> ---
>>  fs/cifs/file.c |   10 ++++------
>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
>> index f8fe1bd..89efd85 100644
>> --- a/fs/cifs/file.c
>> +++ b/fs/cifs/file.c
>> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov,
>>        */
>>       if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) {
>>               ssize_t written;
>> -             int rc;
>>
>>               written = generic_file_aio_write(iocb, iov, nr_segs, pos);
>> -             rc = filemap_fdatawrite(inode->i_mapping);
>> -             if (rc)
>> -                     return (ssize_t)rc;
>> -
>> -             return written;
>> +             /* try page write at first */
>> +             if (!filemap_write_and_wait(inode->i_mapping))
>> +                     return written;
>> +             /* page write failed - try from pos to pos+len-1 */
>>       }
>>  #endif
>>
>
> Bleh -- nasty. I guess this will work though...
>
> Wonder if there's some way to populate the cache and then just mark the
> pages clean without sending out writes? That would be a better solution
> IMO, but I guess we can live with this for now...
>
> Acked-by: Jeff Layton <jlayton@redhat.com>

Thanks for reviewing the patchset.

As for this patch, I have a follow-on patch:

http://git.altlinux.org/people/piastry/public/?p=cifs-2.6.git;a=commitdiff;h=11673376e6840c686f0c6e51892b16ad1945887e

that allows us to populate the page cache without marking pages dirty.
I didn't have enough time to test it well (that's why I haven't posted
it yet) - going to do that tomorrow. But you are welcome to comment on
the approach.
Steve French Nov. 27, 2012, 5:04 p.m. UTC | #3
Am curious how you noticed this problem?  Did you have a test case?

On Tue, Nov 27, 2012 at 12:50 AM, Pavel Shilovsky <piastry@etersoft.ru> wrote:
> If we have a read oplock and set a read lock in it, we can't write to the
> locked area - so, filemap_fdatawrite may fail with a no information for a
> userspace application even if we request a write to non-locked area. Fix
> this by replacing it with filemap_write_and_wait call and sending non-page
> write in a error case.
>
> While this may end up with two write requests to the server, we can be sure
> that our data will be the same at the server and the page cache - the next read
> on this file gets the valid data.
>
> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
> ---
>  fs/cifs/file.c |   10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index f8fe1bd..89efd85 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov,
>          */
>         if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) {
>                 ssize_t written;
> -               int rc;
>
>                 written = generic_file_aio_write(iocb, iov, nr_segs, pos);
> -               rc = filemap_fdatawrite(inode->i_mapping);
> -               if (rc)
> -                       return (ssize_t)rc;
> -
> -               return written;
> +               /* try page write at first */
> +               if (!filemap_write_and_wait(inode->i_mapping))
> +                       return written;
> +               /* page write failed - try from pos to pos+len-1 */
>         }
>  #endif
>
> --
> 1.7.10.4
>
Pavel Shilovsky Nov. 28, 2012, 7:30 a.m. UTC | #4
2012/11/27 Steve French <smfrench@gmail.com>:
> Am curious how you noticed this problem?  Did you have a test case?

The test case that reproduces the problem:
1) client1 opens "filename" (a file that already has enough data; ~4096
bytes is ok)
2) client1 sets a read lock on the file from pos=1 to pos=2
3) client2 opens "filename" - client1 gets an oplock break to level II
and pushes the lock from #2 to the server
4) client1 writes 'a' to the file (at pos 0) - it gets status OK, but the
server rejects the write (because client1 writes the whole page from
pos=0 to pos=4095, which conflicts with the read lock from pos=1 to
pos=2) - ERROR
5) client2 reads 1 byte and gets the old data (not 'a') because the
server returns its version of the data

This applies to both CIFS and SMB2.
Jeff Layton Nov. 28, 2012, 11:36 a.m. UTC | #5
On Tue, 27 Nov 2012 18:57:22 +0400
Pavel Shilovsky <piastry@etersoft.ru> wrote:

> 2012/11/27 Jeff Layton <jlayton@redhat.com>:
> > On Tue, 27 Nov 2012 10:50:28 +0400
> > Pavel Shilovsky <piastry@etersoft.ru> wrote:
> >
> >> If we have a read oplock and set a read lock in it, we can't write to the
> >> locked area - so, filemap_fdatawrite may fail with a no information for a
> >> userspace application even if we request a write to non-locked area. Fix
> >> this by replacing it with filemap_write_and_wait call and sending non-page
> >> write in a error case.
> >>
> >> While this may end up with two write requests to the server, we can be sure
> >> that our data will be the same at the server and the page cache - the next read
> >> on this file gets the valid data.
> >>
> >> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
> >> ---
> >>  fs/cifs/file.c |   10 ++++------
> >>  1 file changed, 4 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> >> index f8fe1bd..89efd85 100644
> >> --- a/fs/cifs/file.c
> >> +++ b/fs/cifs/file.c
> >> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov,
> >>        */
> >>       if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) {
> >>               ssize_t written;
> >> -             int rc;
> >>
> >>               written = generic_file_aio_write(iocb, iov, nr_segs, pos);
> >> -             rc = filemap_fdatawrite(inode->i_mapping);
> >> -             if (rc)
> >> -                     return (ssize_t)rc;
> >> -
> >> -             return written;
> >> +             /* try page write at first */
> >> +             if (!filemap_write_and_wait(inode->i_mapping))
> >> +                     return written;
> >> +             /* page write failed - try from pos to pos+len-1 */
> >>       }
> >>  #endif
> >>
> >
> > Bleh -- nasty. I guess this will work though...
> >
> > Wonder if there's some way to populate the cache and then just mark the
> > pages clean without sending out writes? That would be a better solution
> > IMO, but I guess we can live with this for now...
> >
> > Acked-by: Jeff Layton <jlayton@redhat.com>
> 
> Thanks for reviewing the patchset.
> 
> As for this patch I have the followon patch:
> 
> http://git.altlinux.org/people/piastry/public/?p=cifs-2.6.git;a=commitdiff;h=11673376e6840c686f0c6e51892b16ad1945887e
> 
> that allows us to populate the page cache without making pages dirty.
> I didn't have enough time to test it well (that's why I haven't posted
> it yet)  - going to do it tomorrow. But you are welcome to comment the
> approach.
> 

The only thing that makes me wary here is that you're setting this flag
on the inode. Could there ever be a situation where another task might
be writing to this inode at the same time and needs to set them dirty?

If not, then I'm not sure I see the need for a new bool in the inode.
It might be simpler to just check what sort of oplock you have in
write_end instead.

There's also a lot of logic around what sort of locking you're doing
here too. I think we ought to do the same sort of I/O regardless of
whether POSIX locks are being used or not.
Pavel Shilovsky Nov. 28, 2012, 12:12 p.m. UTC | #6
2012/11/28 Jeff Layton <jlayton@redhat.com>:
> On Tue, 27 Nov 2012 18:57:22 +0400
> Pavel Shilovsky <piastry@etersoft.ru> wrote:
>
>> 2012/11/27 Jeff Layton <jlayton@redhat.com>:
>> > On Tue, 27 Nov 2012 10:50:28 +0400
>> > Pavel Shilovsky <piastry@etersoft.ru> wrote:
>> >
>> >> If we have a read oplock and set a read lock in it, we can't write to the
>> >> locked area - so, filemap_fdatawrite may fail with a no information for a
>> >> userspace application even if we request a write to non-locked area. Fix
>> >> this by replacing it with filemap_write_and_wait call and sending non-page
>> >> write in a error case.
>> >>
>> >> While this may end up with two write requests to the server, we can be sure
>> >> that our data will be the same at the server and the page cache - the next read
>> >> on this file gets the valid data.
>> >>
>> >> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
>> >> ---
>> >>  fs/cifs/file.c |   10 ++++------
>> >>  1 file changed, 4 insertions(+), 6 deletions(-)
>> >>
>> >> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
>> >> index f8fe1bd..89efd85 100644
>> >> --- a/fs/cifs/file.c
>> >> +++ b/fs/cifs/file.c
>> >> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov,
>> >>        */
>> >>       if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) {
>> >>               ssize_t written;
>> >> -             int rc;
>> >>
>> >>               written = generic_file_aio_write(iocb, iov, nr_segs, pos);
>> >> -             rc = filemap_fdatawrite(inode->i_mapping);
>> >> -             if (rc)
>> >> -                     return (ssize_t)rc;
>> >> -
>> >> -             return written;
>> >> +             /* try page write at first */
>> >> +             if (!filemap_write_and_wait(inode->i_mapping))
>> >> +                     return written;
>> >> +             /* page write failed - try from pos to pos+len-1 */
>> >>       }
>> >>  #endif
>> >>
>> >
>> > Bleh -- nasty. I guess this will work though...
>> >
>> > Wonder if there's some way to populate the cache and then just mark the
>> > pages clean without sending out writes? That would be a better solution
>> > IMO, but I guess we can live with this for now...
>> >
>> > Acked-by: Jeff Layton <jlayton@redhat.com>
>>
>> Thanks for reviewing the patchset.
>>
>> As for this patch I have the followon patch:
>>
>> http://git.altlinux.org/people/piastry/public/?p=cifs-2.6.git;a=commitdiff;h=11673376e6840c686f0c6e51892b16ad1945887e
>>
>> that allows us to populate the page cache without making pages dirty.
>> I didn't have enough time to test it well (that's why I haven't posted
>> it yet)  - going to do it tomorrow. But you are welcome to comment the
>> approach.
>>
>
> The only thing that makes me wary here is that you're setting this flag
> on the inode. Could there ever be a situation where another task might
> be writing to this inode at the same time and needs to set them dirty?
>
> If not, then I'm not sure I see the need for a new bool in the inode.
> It might be simpler to just check what sort of oplock you have in
> write_end instead.
>
> There's also a lot of logic around what sort of locking you're doing
> here too. I think we ought to do the same sort of I/O regardless of
> whether POSIX locks are being used or not.
>

There are some places where VFS code uses the write_end call (through
pagecache_write_end), but I didn't find any place where cifs code can
hit them. So, I think we can assume for now that cifs_write_end is
called only from the generic_file_aio_write codepath. If so, we can be
sure that only one process may want to set pages dirty through
cifs_write_end, due to the i_mutex lock.

It seems you are right and we can use the clientCanCacheAll value
directly from cifs_write_end - I will make the changes. Also, I think I
can merge these two patches into one.

Thanks.
Jeff Layton Nov. 28, 2012, 12:37 p.m. UTC | #7
On Wed, 28 Nov 2012 16:12:56 +0400
Pavel Shilovsky <piastry@etersoft.ru> wrote:

> 2012/11/28 Jeff Layton <jlayton@redhat.com>:
> > On Tue, 27 Nov 2012 18:57:22 +0400
> > Pavel Shilovsky <piastry@etersoft.ru> wrote:
> >
> >> 2012/11/27 Jeff Layton <jlayton@redhat.com>:
> >> > On Tue, 27 Nov 2012 10:50:28 +0400
> >> > Pavel Shilovsky <piastry@etersoft.ru> wrote:
> >> >
> >> >> If we have a read oplock and set a read lock in it, we can't write to the
> >> >> locked area - so, filemap_fdatawrite may fail with a no information for a
> >> >> userspace application even if we request a write to non-locked area. Fix
> >> >> this by replacing it with filemap_write_and_wait call and sending non-page
> >> >> write in a error case.
> >> >>
> >> >> While this may end up with two write requests to the server, we can be sure
> >> >> that our data will be the same at the server and the page cache - the next read
> >> >> on this file gets the valid data.
> >> >>
> >> >> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
> >> >> ---
> >> >>  fs/cifs/file.c |   10 ++++------
> >> >>  1 file changed, 4 insertions(+), 6 deletions(-)
> >> >>
> >> >> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> >> >> index f8fe1bd..89efd85 100644
> >> >> --- a/fs/cifs/file.c
> >> >> +++ b/fs/cifs/file.c
> >> >> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov,
> >> >>        */
> >> >>       if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) {
> >> >>               ssize_t written;
> >> >> -             int rc;
> >> >>
> >> >>               written = generic_file_aio_write(iocb, iov, nr_segs, pos);
> >> >> -             rc = filemap_fdatawrite(inode->i_mapping);
> >> >> -             if (rc)
> >> >> -                     return (ssize_t)rc;
> >> >> -
> >> >> -             return written;
> >> >> +             /* try page write at first */
> >> >> +             if (!filemap_write_and_wait(inode->i_mapping))
> >> >> +                     return written;
> >> >> +             /* page write failed - try from pos to pos+len-1 */
> >> >>       }
> >> >>  #endif
> >> >>
> >> >
> >> > Bleh -- nasty. I guess this will work though...
> >> >
> >> > Wonder if there's some way to populate the cache and then just mark the
> >> > pages clean without sending out writes? That would be a better solution
> >> > IMO, but I guess we can live with this for now...
> >> >
> >> > Acked-by: Jeff Layton <jlayton@redhat.com>
> >>
> >> Thanks for reviewing the patchset.
> >>
> >> As for this patch I have the followon patch:
> >>
> >> http://git.altlinux.org/people/piastry/public/?p=cifs-2.6.git;a=commitdiff;h=11673376e6840c686f0c6e51892b16ad1945887e
> >>
> >> that allows us to populate the page cache without making pages dirty.
> >> I didn't have enough time to test it well (that's why I haven't posted
> >> it yet)  - going to do it tomorrow. But you are welcome to comment the
> >> approach.
> >>
> >
> > The only thing that makes me wary here is that you're setting this flag
> > on the inode. Could there ever be a situation where another task might
> > be writing to this inode at the same time and needs to set them dirty?
> >
> > If not, then I'm not sure I see the need for a new bool in the inode.
> > It might be simpler to just check what sort of oplock you have in
> > write_end instead.
> >
> > There's also a lot of logic around what sort of locking you're doing
> > here too. I think we ought to do the same sort of I/O regardless of
> > whether POSIX locks are being used or not.
> >
> 
> There are some places where VFS code uses write_end call (through
> pagecache_write_end) but I didn't find any place where cifs code can
> hit it. So, I think we can assume now that cifs_write_end is called
> only from generic_file_aio_write codepath. If so, we can be sure that
> only on process may want to set pages dirty through cifs_write_end due
> to i_mutex lock.
> 
> It seems your are right and we can use clientCanCacheAll value
> directly from cifs_write_end - will make changes. Also, I think I can
> merge these two patches into one.
> 
> Thanks.
> 

Note though, that it matters what sort of cache= option is in force
too. If cache=loose then you *do* want to set the page dirty.
Pavel Shilovsky Nov. 28, 2012, 12:48 p.m. UTC | #8
2012/11/28 Jeff Layton <jlayton@redhat.com>:
> On Wed, 28 Nov 2012 16:12:56 +0400
> Pavel Shilovsky <piastry@etersoft.ru> wrote:
>
>> 2012/11/28 Jeff Layton <jlayton@redhat.com>:
>> > On Tue, 27 Nov 2012 18:57:22 +0400
>> > Pavel Shilovsky <piastry@etersoft.ru> wrote:
>> >
>> >> 2012/11/27 Jeff Layton <jlayton@redhat.com>:
>> >> > On Tue, 27 Nov 2012 10:50:28 +0400
>> >> > Pavel Shilovsky <piastry@etersoft.ru> wrote:
>> >> >
>> >> >> If we have a read oplock and set a read lock in it, we can't write to the
>> >> >> locked area - so, filemap_fdatawrite may fail with a no information for a
>> >> >> userspace application even if we request a write to non-locked area. Fix
>> >> >> this by replacing it with filemap_write_and_wait call and sending non-page
>> >> >> write in a error case.
>> >> >>
>> >> >> While this may end up with two write requests to the server, we can be sure
>> >> >> that our data will be the same at the server and the page cache - the next read
>> >> >> on this file gets the valid data.
>> >> >>
>> >> >> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
>> >> >> ---
>> >> >>  fs/cifs/file.c |   10 ++++------
>> >> >>  1 file changed, 4 insertions(+), 6 deletions(-)
>> >> >>
>> >> >> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
>> >> >> index f8fe1bd..89efd85 100644
>> >> >> --- a/fs/cifs/file.c
>> >> >> +++ b/fs/cifs/file.c
>> >> >> @@ -2511,14 +2511,12 @@ cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov,
>> >> >>        */
>> >> >>       if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) {
>> >> >>               ssize_t written;
>> >> >> -             int rc;
>> >> >>
>> >> >>               written = generic_file_aio_write(iocb, iov, nr_segs, pos);
>> >> >> -             rc = filemap_fdatawrite(inode->i_mapping);
>> >> >> -             if (rc)
>> >> >> -                     return (ssize_t)rc;
>> >> >> -
>> >> >> -             return written;
>> >> >> +             /* try page write at first */
>> >> >> +             if (!filemap_write_and_wait(inode->i_mapping))
>> >> >> +                     return written;
>> >> >> +             /* page write failed - try from pos to pos+len-1 */
>> >> >>       }
>> >> >>  #endif
>> >> >>
>> >> >
>> >> > Bleh -- nasty. I guess this will work though...
>> >> >
>> >> > Wonder if there's some way to populate the cache and then just mark the
>> >> > pages clean without sending out writes? That would be a better solution
>> >> > IMO, but I guess we can live with this for now...
>> >> >
>> >> > Acked-by: Jeff Layton <jlayton@redhat.com>
>> >>
>> >> Thanks for reviewing the patchset.
>> >>
>> >> As for this patch I have the followon patch:
>> >>
>> >> http://git.altlinux.org/people/piastry/public/?p=cifs-2.6.git;a=commitdiff;h=11673376e6840c686f0c6e51892b16ad1945887e
>> >>
>> >> that allows us to populate the page cache without making pages dirty.
>> >> I didn't have enough time to test it well (that's why I haven't posted
>> >> it yet)  - going to do it tomorrow. But you are welcome to comment the
>> >> approach.
>> >>
>> >
>> > The only thing that makes me wary here is that you're setting this flag
>> > on the inode. Could there ever be a situation where another task might
>> > be writing to this inode at the same time and needs to set them dirty?
>> >
>> > If not, then I'm not sure I see the need for a new bool in the inode.
>> > It might be simpler to just check what sort of oplock you have in
>> > write_end instead.
>> >
>> > There's also a lot of logic around what sort of locking you're doing
>> > here too. I think we ought to do the same sort of I/O regardless of
>> > whether POSIX locks are being used or not.
>> >
>>
>> There are some places where VFS code uses write_end call (through
>> pagecache_write_end) but I didn't find any place where cifs code can
>> hit it. So, I think we can assume now that cifs_write_end is called
>> only from generic_file_aio_write codepath. If so, we can be sure that
>> only on process may want to set pages dirty through cifs_write_end due
>> to i_mutex lock.
>>
>> It seems your are right and we can use clientCanCacheAll value
>> directly from cifs_write_end - will make changes. Also, I think I can
>> merge these two patches into one.
>>
>> Thanks.
>>
>
> Note though, that it matters what sort of cache= option is in force
> too. If cache=loose then you *do* want to set the page dirty.

This makes sense, thanks.
Pavel Shilovsky Nov. 28, 2012, 1:55 p.m. UTC | #9
2012/11/28 Jeff Layton <jlayton@redhat.com>:
> There's also a lot of logic around what sort of locking you're doing
> here too. I think we ought to do the same sort of I/O regardless of
> whether POSIX locks are being used or not.

We can use cifs_writev for both the POSIX and mandatory variants, but I
divided them to make the POSIX variant work faster (no need to take a
semaphore, walk through a lock list, etc.).
Jeff Layton Nov. 28, 2012, 2:02 p.m. UTC | #10
On Wed, 28 Nov 2012 17:55:41 +0400
Pavel Shilovsky <piastry@etersoft.ru> wrote:

> 2012/11/28 Jeff Layton <jlayton@redhat.com>:
> > There's also a lot of logic around what sort of locking you're doing
> > here too. I think we ought to do the same sort of I/O regardless of
> > whether POSIX locks are being used or not.
> 
> We can use cifs_writev for both POSIX and mandatory variants but I
> divided them to make POSIX variant work faster (no need to check hold
> a semaphore, walk through a lock list, etc).
> 

Refresh my memory -- why do we need to handle writes differently when
POSIX vs. non-POSIX locking is in force? It seems to me that that
shouldn't matter and the behavior should be solely a function of what
sort of oplock you have.
Pavel Shilovsky Nov. 28, 2012, 2:07 p.m. UTC | #11
2012/11/28 Jeff Layton <jlayton@redhat.com>:
> On Wed, 28 Nov 2012 17:55:41 +0400
> Pavel Shilovsky <piastry@etersoft.ru> wrote:
>
>> 2012/11/28 Jeff Layton <jlayton@redhat.com>:
>> > There's also a lot of logic around what sort of locking you're doing
>> > here too. I think we ought to do the same sort of I/O regardless of
>> > whether POSIX locks are being used or not.
>>
>> We can use cifs_writev for both POSIX and mandatory variants but I
>> divided them to make POSIX variant work faster (no need to check hold
>> a semaphore, walk through a lock list, etc).
>>
>
> Refresh my memory -- why do we need to handle writes differently when
> POSIX vs. non-POSIX locking is in force? It seems to me that that
> shouldn't matter and the behavior should be solely a function of what
> sort of oplock you have.

A write request can conflict with mandatory locks set on a file.
That's why we need to check for lock conflicts before issuing the
write/read.
Jeff Layton Nov. 28, 2012, 2:29 p.m. UTC | #12
On Wed, 28 Nov 2012 18:07:46 +0400
Pavel Shilovsky <piastry@etersoft.ru> wrote:

> 2012/11/28 Jeff Layton <jlayton@redhat.com>:
> > On Wed, 28 Nov 2012 17:55:41 +0400
> > Pavel Shilovsky <piastry@etersoft.ru> wrote:
> >
> >> 2012/11/28 Jeff Layton <jlayton@redhat.com>:
> >> > There's also a lot of logic around what sort of locking you're doing
> >> > here too. I think we ought to do the same sort of I/O regardless of
> >> > whether POSIX locks are being used or not.
> >>
> >> We can use cifs_writev for both POSIX and mandatory variants but I
> >> divided them to make POSIX variant work faster (no need to check hold
> >> a semaphore, walk through a lock list, etc).
> >>
> >
> > Refresh my memory -- why do we need to handle writes differently when
> > POSIX vs. non-POSIX locking is in force? It seems to me that that
> > shouldn't matter and the behavior should be solely a function of what
> > sort of oplock you have.
> 
> A write request can have conflict with mandatory locks set on a file.
> That's why we need to check for lock conflicts before issue the
> write/read.
> 

Hmmm...and the server might not know about the lock if it's cached.
Fair enough.
Pavel Shilovsky Nov. 28, 2012, 2:38 p.m. UTC | #13
2012/11/28 Jeff Layton <jlayton@redhat.com>:
> On Wed, 28 Nov 2012 18:07:46 +0400
> Pavel Shilovsky <piastry@etersoft.ru> wrote:
>
>> 2012/11/28 Jeff Layton <jlayton@redhat.com>:
>> > On Wed, 28 Nov 2012 17:55:41 +0400
>> > Pavel Shilovsky <piastry@etersoft.ru> wrote:
>> >
>> >> 2012/11/28 Jeff Layton <jlayton@redhat.com>:
>> >> > There's also a lot of logic around what sort of locking you're doing
>> >> > here too. I think we ought to do the same sort of I/O regardless of
>> >> > whether POSIX locks are being used or not.
>> >>
>> >> We can use cifs_writev for both POSIX and mandatory variants but I
>> >> divided them to make POSIX variant work faster (no need to check hold
>> >> a semaphore, walk through a lock list, etc).
>> >>
>> >
>> > Refresh my memory -- why do we need to handle writes differently when
>> > POSIX vs. non-POSIX locking is in force? It seems to me that that
>> > shouldn't matter and the behavior should be solely a function of what
>> > sort of oplock you have.
>>
>> A write request can have conflict with mandatory locks set on a file.
>> That's why we need to check for lock conflicts before issue the
>> write/read.
>>
>
> Hmmm...and the server might not know about the lock if it's cached.
> Fair enough.

Yes. There can also be a situation where we have a level II oplock and
brlocks already sent to the server (we can only cache brlocks in the
exclusive oplock case): we can check the locks locally too and avoid
sending a write to the server if a conflict exists - a performance win.

Patch

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index f8fe1bd..89efd85 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2511,14 +2511,12 @@  cifs_strict_writev(struct kiocb *iocb, const struct iovec *iov,
 	 */
 	if (!cinode->clientCanCacheAll && cinode->clientCanCacheRead) {
 		ssize_t written;
-		int rc;
 
 		written = generic_file_aio_write(iocb, iov, nr_segs, pos);
-		rc = filemap_fdatawrite(inode->i_mapping);
-		if (rc)
-			return (ssize_t)rc;
-
-		return written;
+		/* try page write at first */
+		if (!filemap_write_and_wait(inode->i_mapping))
+			return written;
+		/* page write failed - try from pos to pos+len-1 */
 	}
 #endif