From patchwork Mon Feb 6 15:49:52 2017
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 9558259
Message-ID: <1486396192.2712.4.camel@redhat.com>
Subject: Re: [PATCH v2 6/6] libceph: allow requests to return immediately on
 full conditions if caller wishes
From: Jeff Layton
To: Ilya Dryomov
Cc: Ceph Development, "Yan, Zheng", Sage Weil, John Spray
Date: Mon, 06 Feb 2017 10:49:52 -0500
References: <20170206132927.9219-1-jlayton@redhat.com>
 <20170206132927.9219-7-jlayton@redhat.com>
X-Mailing-List: ceph-devel@vger.kernel.org

On Mon, 2017-02-06 at 15:09 +0100, Ilya Dryomov wrote:
[...]
> > diff --git a/include/linux/ceph/rados.h b/include/linux/ceph/rados.h
> > index 5c0da61cb763..def43570a85a 100644
> > --- a/include/linux/ceph/rados.h
> > +++ b/include/linux/ceph/rados.h
> > @@ -401,6 +401,7 @@ enum {
> >         CEPH_OSD_FLAG_KNOWN_REDIR = 0x400000,  /* redirect bit is authoritative */
> >         CEPH_OSD_FLAG_FULL_TRY =    0x800000,  /* try op despite full flag */
> >         CEPH_OSD_FLAG_FULL_FORCE = 0x1000000,  /* force op despite full flag */
> > +       CEPH_OSD_FLAG_FULL_CANCEL = 0x2000000, /* cancel operation on full flag */
>
> Is this a new flag?  This is the wire protocol and I don't see it in
> ceph.git.
>
> I'll look at epoch_barrier and callback stuff later.
>
> Thanks,
>
>                 Ilya

Here's a respun version of the last patch in the set. This should avoid
adding an on-the-wire flag. I just added a new bool and changed the code
to set and look at that to indicate the desire for an immediate error
return in this case.

Compiles but is otherwise untested. I'll give it a go in a bit.

-----------------------------8<------------------------------

libceph: allow requests to return immediately on full conditions if
caller wishes

Right now, cephfs will cancel any in-flight OSD write operations when a
new map comes in that shows the OSD or pool as full, but nothing
prevents new requests from stalling out after that point.

If the caller knows that it will want an immediate error return instead
of blocking on a full or at-quota error condition, then allow it to set
a flag to request that behavior. Cephfs write requests will always set
that flag.

Signed-off-by: Jeff Layton
---
 fs/ceph/addr.c                  | 4 ++++
 fs/ceph/file.c                  | 4 ++++
 include/linux/ceph/osd_client.h | 1 +
 net/ceph/osd_client.c           | 6 ++++++
 4 files changed, 15 insertions(+)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 4547bbf80e4f..ef9c9bae7460 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1040,6 +1040,7 @@ static int ceph_writepages_start(struct address_space *mapping,

 		req->r_callback = writepages_finish;
 		req->r_inode = inode;
+		req->r_enospc_on_full = true;

 		/* Format the osd request message and submit the write */
 		len = 0;
@@ -1689,6 +1690,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 	}

 	req->r_mtime = inode->i_mtime;
+	req->r_enospc_on_full = true;
 	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 	if (!err)
 		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1732,6 +1734,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 	}

 	req->r_mtime = inode->i_mtime;
+	req->r_enospc_on_full = true;
 	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 	if (!err)
 		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1893,6 +1896,7 @@ static int __ceph_pool_perm_get(struct ceph_inode_info *ci,
 	err = ceph_osdc_start_request(&fsc->client->osdc, rd_req, false);

 	wr_req->r_mtime = ci->vfs_inode.i_mtime;
+	wr_req->r_enospc_on_full = true;
 	err2 = ceph_osdc_start_request(&fsc->client->osdc, wr_req, false);

 	if (!err)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index a91a4f1fc837..eaed17f90d5f 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -714,6 +714,7 @@ static void ceph_aio_retry_work(struct work_struct *work)
 	req->r_callback = ceph_aio_complete_req;
 	req->r_inode = inode;
 	req->r_priv = aio_req;
+	req->r_enospc_on_full = true;

 	ret = ceph_osdc_start_request(req->r_osdc, req, false);
 out:
@@ -912,6 +913,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 			osd_req_op_init(req, 1, CEPH_OSD_OP_STARTSYNC, 0);

 		req->r_mtime = mtime;
+		req->r_enospc_on_full = true;
 	}
 	osd_req_op_extent_osd_data_pages(req, 0, pages, len, start,
@@ -1105,6 +1107,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 					false, true);

 	req->r_mtime = mtime;
+	req->r_enospc_on_full = true;
 	ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 	if (!ret)
 		ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1557,6 +1560,7 @@ static int ceph_zero_partial_object(struct inode *inode,
 	}

 	req->r_mtime = inode->i_mtime;
+	req->r_enospc_on_full = true;
 	ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 	if (!ret) {
 		ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 17bf1873bb01..f01e93ff03d5 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -172,6 +172,7 @@ struct ceph_osd_request {

 	int r_result;
 	bool r_got_reply;
+	bool r_enospc_on_full; /* return ENOSPC when full */

 	struct ceph_osd_client *r_osdc;
 	struct kref r_kref;
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index d61d7a79fdb3..9f40d11b3c68 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -50,6 +50,7 @@ static void link_linger(struct ceph_osd *osd,
 			struct ceph_osd_linger_request *lreq);
 static void unlink_linger(struct ceph_osd *osd,
 			  struct ceph_osd_linger_request *lreq);
+static void complete_request(struct ceph_osd_request *req, int err);

 #if 1
 static inline bool rwsem_is_wrlocked(struct rw_semaphore *sem)
@@ -1643,6 +1644,7 @@ static void __submit_request(struct ceph_osd_request *req, bool wrlocked)
 	enum calc_target_result ct_res;
 	bool need_send = false;
 	bool promoted = false;
+	int ret = 0;

 	WARN_ON(req->r_tid || req->r_got_reply);
 	dout("%s req %p wrlocked %d\n", __func__, req, wrlocked);
@@ -1683,6 +1685,8 @@ static void __submit_request(struct ceph_osd_request *req, bool wrlocked)
 		pr_warn_ratelimited("FULL or reached pool quota\n");
 		req->r_t.paused = true;
 		maybe_request_map(osdc);
+		if (req->r_enospc_on_full)
+			ret = -ENOSPC;
 	} else if (!osd_homeless(osd)) {
 		need_send = true;
 	} else {
@@ -1699,6 +1703,8 @@ static void __submit_request(struct ceph_osd_request *req, bool wrlocked)
 	link_request(osd, req);
 	if (need_send)
 		send_request(req);
+	else if (ret)
+		complete_request(req, ret);
 	mutex_unlock(&osd->lock);

 	if (ct_res == CALC_TARGET_POOL_DNE)
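
For clarity, here's roughly how a caller is expected to use the new field.
This is just a usage sketch, not part of the patch; example_sync_write() is
a made-up name and the request setup/teardown are elided:

/*
 * Usage sketch (not part of the patch): a write path that opts in to an
 * immediate error instead of blocking when the OSD or pool is full.
 */
static int example_sync_write(struct ceph_osd_client *osdc,
			      struct ceph_osd_request *req)
{
	int ret;

	/* fail fast with -ENOSPC instead of pausing on a full condition */
	req->r_enospc_on_full = true;

	ret = ceph_osdc_start_request(osdc, req, false);
	if (!ret)
		ret = ceph_osdc_wait_request(osdc, req);

	/*
	 * With r_enospc_on_full set, __submit_request() completes the
	 * request with -ENOSPC rather than leaving it paused until a new
	 * osdmap clears the full flag, so the error surfaces here.
	 */
	return ret;
}

That's the same pattern the cephfs write paths above follow: set the bool
before ceph_osdc_start_request() and let the normal wait/return path carry
the error back to the caller.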