diff mbox

[linux-cifs-client] Re: [PATCH] fs/cifs: send SMB_COM_FLUSH in cifs_fsync

Message ID 524f69650902201721l1c7f0d34p4708980a90d7fc8f@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Steve French Feb. 21, 2009, 1:21 a.m. UTC
On Fri, Feb 20, 2009 at 2:51 PM, Horst Reiterer
<horst.reiterer@gmail.com> wrote:
> Hi,
>
> In contrast to the now-obsolete smbfs, cifs does not send SMB_COM_FLUSH
> in response to an explicit fsync(2) to guarantee that all volatile data
> is written to stable storage on the server side, provided the server
> honors the request (which, to my knowledge, is true for Windows and
> Samba with 'strict sync' enabled).
> This patch modifies the cifs_fsync implementation

I modified your patch slightly to not lose the writeback rc in one
case, and to change camel case pTcon to tcon and remove one
unnecessary local variable.

See attached.   Thanks for the submission - looks fine otherwise.   If
you have any performance numbers before and after (with e.g. dbench,
iozone, bonnie etc. or perhaps something which calls fsync more often
- that would be helpful in determining whether we need a mount option
to optionally disable it - as the samba server does)

Comments

Horst Reiterer Feb. 22, 2009, 8:50 p.m. UTC | #1
Hi,

On Fri, 20 Feb 2009, Steve French wrote:
> I modified your patch slightly to not lose the writeback rc in one
> case, and to change camel case pTcon to tcon and remove one
> unnecessary local variable.

Thanks.

> If you have any performance numbers before and after (with e.g. dbench,
> iozone, bonnie etc. or perhaps something which calls fsync more often
> - that would be helpful in determining whether we need a mount option
> to optionally disable it - as the samba server does)

I tested the patch using a simple test driver that transfers data in 1MB
chunks and calls fsync after each chunk. The results are as follows:

  Test run            Throughput (MB/s)
                      1 client (MB/s) 10 clients 20 clients 40 clients
  cifs - vanilla                   64        116        117        111
  cifs - fsync.patch               54         90         95         95

As you can see, fsync does make a noticeable difference here. The problem
with benchmarks in this respect is that the difference solely depends on
the underlying device. In this scenario, the storage device featured a
battery-backed disk cache. Without it, the overhead would have been much
more significant.

In any case, it's probably a good idea to introduce a mount option. The
new behavior should be enabled by default though, to provide maximum data
consistency. While I can't think of a legitimate reason to disable fsync
if data consistency is a concern, an option might be useful in the
exceptional case where 1) data consistency is not a strict requirement and
2) SMB_COM_FLUSH cannot be ignored on the server side.

Thanks,

Horst.
Steve French Feb. 22, 2009, 9:03 p.m. UTC | #2
On Sun, Feb 22, 2009 at 2:50 PM, Horst Reiterer
<horst.reiterer@gmail.com> wrote:
> In any case, it's probably a good idea to introduce a mount option. The
> new behavior should be enabled by default though, to provide maximum data
> consistency.

Suggestions on what to call such a new mount option?  How about
"nostrictfsync"  ?
Horst Reiterer Feb. 22, 2009, 9:12 p.m. UTC | #3
On Sun, 22 Feb 2009, Steve French wrote:
> Suggestions on what to call such a new mount option?  How about
> "nostrictfsync"  ?

Sound good, should be self-explanatory for Samba users and those familiar
with the fsync concept.

Horst.
Jeff Layton Feb. 23, 2009, 1:34 a.m. UTC | #4
On Sun, 22 Feb 2009 22:12:11 +0100 (CET)
Horst Reiterer <horst.reiterer@gmail.com> wrote:

> On Sun, 22 Feb 2009, Steve French wrote:
> > Suggestions on what to call such a new mount option?  How about
> > "nostrictfsync"  ?
> 
> Sound good, should be self-explanatory for Samba users and those familiar
> with the fsync concept.
> 
> Horst.

My suggestion would be to not add a new option until someone requests
it/complains about it. We already have a lot of unneeded/unused mount
options, and I think this will just be adding one to the pile.

My $.02...
Jeff Layton Feb. 23, 2009, 2:10 a.m. UTC | #5
On Sun, 22 Feb 2009 17:34:46 -0800
Jeff Layton <jlayton@redhat.com> wrote:

> On Sun, 22 Feb 2009 22:12:11 +0100 (CET)
> Horst Reiterer <horst.reiterer@gmail.com> wrote:
> 
> > On Sun, 22 Feb 2009, Steve French wrote:
> > > Suggestions on what to call such a new mount option?  How about
> > > "nostrictfsync"  ?
> > 
> > Sound good, should be self-explanatory for Samba users and those familiar
> > with the fsync concept.
> > 
> > Horst.
> 
> My suggestion would be to not add a new option until someone requests
> it/complains about it. We already have a lot of unneeded/unused mount
> options, and I think this will just be adding one to the pile.
> 
> My $.02...
> 

...and to lend further strength to this argument, we're not doing this
at close(), but rather at fsync(). posix is pretty clear about what's
supposed to happen at fsync:

"The fsync() function shall request that all data for the open file
descriptor named by fildes is to be transferred to the storage device
associated with the file described by fildes."

...if you don't want to take the performance penalty then don't use
fsync(). If you're using fsync, then you care about data integrity
and adding a mount option to disable it is rather pointless.
Steve French Feb. 23, 2009, 2:35 a.m. UTC | #6
On Sun, Feb 22, 2009 at 8:10 PM, Jeff Layton <jlayton@redhat.com> wrote:
> On Sun, 22 Feb 2009 17:34:46 -0800
> Jeff Layton <jlayton@redhat.com> wrote:
>
>> On Sun, 22 Feb 2009 22:12:11 +0100 (CET)
>> Horst Reiterer <horst.reiterer@gmail.com> wrote:
>>
>> > On Sun, 22 Feb 2009, Steve French wrote:
>> > > Suggestions on what to call such a new mount option?  How about
>> > > "nostrictfsync"  ?
>> >
>> > Sound good, should be self-explanatory for Samba users and those familiar
>> > with the fsync concept.
>> >
>> > Horst.
>>
>> My suggestion would be to not add a new option until someone requests
>> it/complains about it. We already have a lot of unneeded/unused mount
>> options, and I think this will just be adding one to the pile.
>>
>> My $.02...
>>
>
> ...and to lend further strength to this argument, we're not doing this
> at close(), but rather at fsync(). posix is pretty clear about what's
> supposed to happen at fsync:
>
> "The fsync() function shall request that all data for the open file
> descriptor named by fildes is to be transferred to the storage device
> associated with the file described by fildes."
>
> ...if you don't want to take the performance penalty then don't use
> fsync(). If you're using fsync, then you care about data integrity
> and adding a mount option to disable it is rather pointless.

There is a lot of history here (especially in the man page of
smb.conf) on dumb things apps do in this area.  I don't object to
allowing users to turn strict sync behavior off.   There are some apps
who presumably consider getting data to the server (which we already
do in fsync) "stable enough" storage (clients are not battery backed
up, with redundant, highly available hardware as some servers are).
I agree that the default should be to write the data to the metal on
the server (even though this can be very, very expensive) for fsync
diff mbox

Patch

diff --git a/fs/cifs/CHANGES b/fs/cifs/CHANGES
index 851388f..d43e0fe 100644
--- a/fs/cifs/CHANGES
+++ b/fs/cifs/CHANGES
@@ -6,7 +6,9 @@  the server to treat subsequent connections, especially those that
 are authenticated as guest, as reconnections, invalidating the earlier
 user's smb session.  This fix allows cifs to mount multiple times to the
 same server with different userids without risking invalidating earlier
-established security contexts.
+established security contexts.  fsync now sends SMB Flush operation
+to better ensure that we wait for server to write all of the data to
+server disk (not just write it over the network).
 
 Version 1.56
 ------------
diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c
index 490e34b..877e4d9 100644
--- a/fs/cifs/cifs_debug.c
+++ b/fs/cifs/cifs_debug.c
@@ -340,6 +340,8 @@  static int cifs_stats_proc_show(struct seq_file *m, void *v)
 				seq_printf(m, "\nWrites: %d Bytes: %lld",
 					atomic_read(&tcon->num_writes),
 					(long long)(tcon->bytes_written));
+				seq_printf(m, "\nFlushes: %d",
+					atomic_read(&tcon->num_flushes));
 				seq_printf(m, "\nLocks: %d HardLinks: %d "
 					      "Symlinks: %d",
 					atomic_read(&tcon->num_locks),
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index e004f6d..44ff94d 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -254,6 +254,7 @@  struct cifsTconInfo {
 	atomic_t num_smbs_sent;
 	atomic_t num_writes;
 	atomic_t num_reads;
+	atomic_t num_flushes;
 	atomic_t num_oplock_brks;
 	atomic_t num_opens;
 	atomic_t num_closes;
diff --git a/fs/cifs/cifspdu.h b/fs/cifs/cifspdu.h
index b4e2e9f..eda6e51 100644
--- a/fs/cifs/cifspdu.h
+++ b/fs/cifs/cifspdu.h
@@ -43,6 +43,7 @@ 
 #define SMB_COM_CREATE_DIRECTORY      0x00 /* trivial response */
 #define SMB_COM_DELETE_DIRECTORY      0x01 /* trivial response */
 #define SMB_COM_CLOSE                 0x04 /* triv req/rsp, timestamp ignored */
+#define SMB_COM_FLUSH                 0x05 /* triv req/rsp */
 #define SMB_COM_DELETE                0x06 /* trivial response */
 #define SMB_COM_RENAME                0x07 /* trivial response */
 #define SMB_COM_QUERY_INFORMATION     0x08 /* aka getattr */
@@ -790,6 +791,12 @@  typedef struct smb_com_close_rsp {
 	__u16 ByteCount;	/* bct = 0 */
 } __attribute__((packed)) CLOSE_RSP;
 
+typedef struct smb_com_flush_req {
+	struct smb_hdr hdr;	/* wct = 1 */
+	__u16 FileID;
+	__u16 ByteCount;	/* 0 */
+} __attribute__((packed)) FLUSH_REQ;
+
 typedef struct smb_com_findclose_req {
 	struct smb_hdr hdr; /* wct = 1 */
 	__u16 FileID;
diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 083dfc5..596fc86 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -281,6 +281,9 @@  extern int CIFSPOSIXCreate(const int xid, struct cifsTconInfo *tcon,
 extern int CIFSSMBClose(const int xid, struct cifsTconInfo *tcon,
 			const int smb_file_id);
 
+extern int CIFSSMBFlush(const int xid, struct cifsTconInfo *tcon,
+			const int smb_file_id);
+
 extern int CIFSSMBRead(const int xid, struct cifsTconInfo *tcon,
 			const int netfid, unsigned int count,
 			const __u64 lseek, unsigned int *nbytes, char **buf,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 939e2f7..4c344fe 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1934,6 +1934,27 @@  CIFSSMBClose(const int xid, struct cifsTconInfo *tcon, int smb_file_id)
 }
 
 int
+CIFSSMBFlush(const int xid, struct cifsTconInfo *tcon, int smb_file_id)
+{
+	int rc = 0;
+	FLUSH_REQ *pSMB = NULL;
+	cFYI(1, ("In CIFSSMBFlush"));
+
+	rc = small_smb_init(SMB_COM_FLUSH, 1, tcon, (void **) &pSMB);
+	if (rc)
+		return rc;
+
+	pSMB->FileID = (__u16) smb_file_id;
+	pSMB->ByteCount = 0;
+	rc = SendReceiveNoRsp(xid, tcon->ses, (struct smb_hdr *) pSMB, 0);
+	cifs_stats_inc(&tcon->num_flushes);
+	if (rc)
+		cERROR(1, ("Send error in Flush = %d", rc));
+
+	return rc;
+}
+
+int
 CIFSSMBRename(const int xid, struct cifsTconInfo *tcon,
 	      const char *fromName, const char *toName,
 	      const struct nls_table *nls_codepage, int remap)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 12bb656..83b4741 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1523,6 +1523,9 @@  int cifs_fsync(struct file *file, struct dentry *dentry, int datasync)
 {
 	int xid;
 	int rc = 0;
+	struct cifsTconInfo *tcon;
+	struct cifsFileInfo *smbfile =
+		(struct cifsFileInfo *)file->private_data;
 	struct inode *inode = file->f_path.dentry->d_inode;
 
 	xid = GetXid();
@@ -1534,7 +1537,11 @@  int cifs_fsync(struct file *file, struct dentry *dentry, int datasync)
 	if (rc == 0) {
 		rc = CIFS_I(inode)->write_behind_rc;
 		CIFS_I(inode)->write_behind_rc = 0;
+		tcon = CIFS_SB(inode->i_sb)->tcon;
+		if (!rc && tcon && smbfile)
+			rc = CIFSSMBFlush(xid, tcon, smbfile->netfid);
 	}
+
 	FreeXid(xid);
 	return rc;
 }