From patchwork Thu Mar 3 22:25:08 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Marshall X-Patchwork-Id: 8496511 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id E653EC0553 for ; Thu, 3 Mar 2016 22:25:16 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id D62722039D for ; Thu, 3 Mar 2016 22:25:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8BF5220398 for ; Thu, 3 Mar 2016 22:25:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758571AbcCCWZM (ORCPT ); Thu, 3 Mar 2016 17:25:12 -0500 Received: from mail-lb0-f172.google.com ([209.85.217.172]:34049 "EHLO mail-lb0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758530AbcCCWZK (ORCPT ); Thu, 3 Mar 2016 17:25:10 -0500 Received: by mail-lb0-f172.google.com with SMTP id cf7so24577026lbb.1 for ; Thu, 03 Mar 2016 14:25:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=omnibond-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=D8R95VlQg9AUYH1bKLMYWm7I2cK/V08gYMAORQrZbxs=; b=SiYidb4156KO+nE9RVu9D4bu/qRlMkDMyyCbdZXJb5S3baP2L/g8+V+00DZJXqxuUQ OqR6oSAkZJjJHwp/VR4DogcdG72vR1j2nmt+bpgZ4ysiTlsHJNEtTjtmYukyn4flIoo8 59DwdWKEIs8OY3rQ/mGjdJqs3pvDbrUCCfUrvxkYeYhNaX72oxzF6VQLWgvjSNROKIsa QnV4Tv/qev3WQO/fAF65pKodzWcMEMo3SRabObxm37ms89f6+xChYj7Xg3R2bISc5fU9 vOmh9FMC0ogEuzFOqrmzws0iEIp22jRFbPQr207SjX90LLTzcD+zh/kQIXQWFaqx1xSb tntw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=D8R95VlQg9AUYH1bKLMYWm7I2cK/V08gYMAORQrZbxs=; b=LuGef1OvAzPmI2mV8er8kvXgEnTqXZqQr1jZG2crznJ9VbC9hgxvx+4uwIWLD4KEc/ mP+AWB1EfQW3y/uiAE4WuTOx9cjBJVFHFLy8dw797GqnVEmKSdBKSH7TBvtJLBzWSxuJ bEzscWkdRq8IR0/dMXsP/OyafwJ7He29echXK8Nr6QZIGUKMTDSDjrxkWJ2/9yH9veC9 VA/0MkPQToTcoGo88jikFgnkqsCSiWjNh1q0s3VwOl5rOJNcG2tFtAemALpmVBdkpgip PdIqvs4dv4yz7qieArEg+srZ48VTZrEoAsthoNbCSivqYgqponbHYqmOCeMVpo5tG5Cz as3g== X-Gm-Message-State: AD7BkJLoABGeOPwCJ/PoND3rtmfpEis/hIqSI1jTPcUIZFxKpWcBqoJOL50uofyRUMBAW7U1oUj3GwWKDGwx5w== MIME-Version: 1.0 X-Received: by 10.112.156.6 with SMTP id wa6mr1860302lbb.66.1457043908620; Thu, 03 Mar 2016 14:25:08 -0800 (PST) Received: by 10.112.161.106 with HTTP; Thu, 3 Mar 2016 14:25:08 -0800 (PST) In-Reply-To: References: <20160122200442.GF17997@ZenIV.linux.org.uk> <20160123001202.GJ17997@ZenIV.linux.org.uk> <20160123012808.GK17997@ZenIV.linux.org.uk> <20160123191055.GN17997@ZenIV.linux.org.uk> <20160123214006.GO17997@ZenIV.linux.org.uk> <20160123224632.GQ17997@ZenIV.linux.org.uk> Date: Thu, 3 Mar 2016 17:25:08 -0500 Message-ID: Subject: Re: write() semantics (Re: Orangefs ABI documentation) From: Mike Marshall To: Linus Torvalds Cc: Al Viro , linux-fsdevel , Mike Marshall Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Here is what I have come up with to try and make our return to interrupted writes more acceptable... in my tests it seems to work. My test involved running the client-core with a tiny IO buffer size (4k) and a C program with a large write buffer(32M). That way there was plenty of time for me to fire off the C program and hit Ctrl-C while the write was chugging along. The return value from do_readv_writev always matches the size of the file written by the aborted C program in my tests. I changed the C program around so that sometimes I ran it with (O_CREAT | O_RDWR) on open and sometimes with (O_CREAT | O_RDWR | O_APPEND) and it seemed to do the right thing. I didn't try to set up a signal handler so that the signal wasn't fatal to the process, I guess that would be a test to actually see and verify the correct short return code to write... Do you all think this looks like it should work in principle? BTW: in the distant past someone else attempted to solve this problem the "nfs intr" way - we have an intr mount option, and that's why there's all that sweating over whether or not stuff is "interruptible" in waitqueue.c... I'm not sure if our intr mount option is relevant anymore given the way the op handling code has evolved... On Sat, Jan 23, 2016 at 6:35 PM, Linus Torvalds wrote: > On Sat, Jan 23, 2016 at 2:46 PM, Al Viro wrote: >> >> What should we get? -EINTR, despite having written some data? > > No, that's not acceptable. > > Either all or nothing (which is POSIX) or the NFS 'intr' mount > behavior (partial write return, -EINTR only when nothing was written > at all). And, like NFS, a mount option might be a good thing. > > And of course, for the usual reasons, fatal signals are special in > that for them we generally say "screw posix, nobody sees the return > value anyway", but even there the filesystem might as well still > return the partial return value (just to not introduce yet another > special case). > > In fact, I think that with our "fatal signals interrupt" behavior, > nobody should likely use the "intr" mount option on NFS. Even if the > semantics may be "better", there are likely simply just too many > programs that don't check the return value of "write()" at all, much > less handle partial writes correctly. > > (And yes, our "screw posix" behavior wrt fatal signals is strictly > wrong even _despite_ the fact that nobody sees the return value - > other processes can still obviously see that the whole write wasn't > done. But blocking on a fatal signal is _so_ annoying that it's one of > those things where we just say "posix was wrong on this one, and if we > squint a bit we look _almost_ like we're compliant"). > > Linus --- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c index 6f2e0f7..4349c9d 100644 --- a/fs/orangefs/file.c +++ b/fs/orangefs/file.c @@ -180,21 +180,54 @@ populate_shared_memory: } if (ret < 0) { - /* - * don't write an error to syslog on signaled operation - * termination unless we've got debugging turned on, as - * this can happen regularly (i.e. ctrl-c) - */ - if (ret == -EINTR) + if (ret == -EINTR) { + /* + * We can't return EINTR if any data was written, + * it's not POSIX. It is minimally acceptable + * to give a partial write, the way NFS does. + * + * It would be optimal to return all or nothing, + * but if a userspace write is bigger than + * an IO buffer, and the interrupt occurs + * between buffer writes, that would not be + * possible. + */ + switch (new_op->op_state - OP_VFS_STATE_GIVEN_UP) { + /* + * If the op was waiting when the interrupt + * occurred, then the client-core did not + * trigger the write. + */ + case OP_VFS_STATE_WAITING: + ret = 0; + break; + /* + * If the op was in progress when the interrupt + * occurred, then the client-core was able to + * trigger the write. + */ + case OP_VFS_STATE_INPROGR: + ret = total_size; + break; + default: + gossip_err("%s: unexpected op state :%d:.\n", + __func__, + new_op->op_state); + ret = 0; + break; + } gossip_debug(GOSSIP_FILE_DEBUG, - "%s: returning error %ld\n", __func__, - (long)ret); - else + "%s: got EINTR, state:%d: %p\n", + __func__, + new_op->op_state, + new_op); + } else { gossip_err("%s: error in %s handle %pU, returning %zd\n", __func__, type == ORANGEFS_IO_READ ? "read from" : "write to", handle, ret); + } if (orangefs_cancel_op_in_progress(new_op)) return ret;