From patchwork Fri Feb 10 23:17:57 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Shilovsky
X-Patchwork-Id: 9567595
From: Pavel Shilovsky
To: linux-cifs@vger.kernel.org
Subject: [PATCH 1/2] CIFS: Add asynchronous read support through kernel AIO
Date: Fri, 10 Feb 2017 15:17:57 -0800
Message-Id: <1486768678-36802-2-git-send-email-pshilov@microsoft.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1486768678-36802-1-git-send-email-pshilov@microsoft.com>
References: <1486768678-36802-1-git-send-email-pshilov@microsoft.com>
X-Mailing-List: linux-cifs@vger.kernel.org

Currently the code doesn't recognize asynchronous calls passed by
io_submit() and processes all calls synchronously. This is not what
kernel AIO expects. This patch improves the situation for reads by
introducing a new asynchronous context that keeps track of all issued
i/o requests and by moving response collection to a separate thread.
This allows asynchronous calls to return to the caller immediately,
with iocb->ki_complete() invoked once all requests have completed. For
synchronous calls the current thread simply waits until all requests
are completed. This improves read performance for single-threaded
applications as the i/o queue depth grows.

Signed-off-by: Pavel Shilovsky
---
 fs/cifs/cifsglob.h |  18 ++++
 fs/cifs/file.c     | 273 +++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 252 insertions(+), 39 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index f9a9a12..80771ca 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1109,6 +1109,23 @@ struct cifs_io_parms {
 	struct cifs_tcon *tcon;
 };
 
+struct cifs_aio_ctx {
+	struct kref		refcount;
+	struct list_head	list;
+	struct mutex		aio_mutex;
+	struct completion	done;
+	struct iov_iter		iter;
+	struct kiocb		*iocb;
+	struct cifsFileInfo	*cfile;
+	struct page		**pages;
+	struct bio_vec		*bv;
+	unsigned int		npages;
+	ssize_t			rc;
+	unsigned int		len;
+	unsigned int		total_len;
+	bool			should_dirty;
+};
+
 struct cifs_readdata;
 
 /* asynchronous read support */
@@ -1118,6 +1135,7 @@ struct cifs_readdata {
 	struct completion		done;
 	struct cifsFileInfo		*cfile;
 	struct address_space		*mapping;
+	struct cifs_aio_ctx		*ctx;
 	__u64				offset;
 	unsigned int			bytes;
 	unsigned int			got_bytes;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 98dc842..6ceeed2 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -33,6 +33,7 @@
 #include <linux/mount.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
+#include <linux/mm.h>
 #include <asm/div64.h>
 #include "cifsfs.h"
 #include "cifspdu.h"
@@ -2410,6 +2411,108 @@ int cifs_flush(struct file *file, fl_owner_t id)
 	return rc;
 }
 
+static inline struct cifs_aio_ctx *
+cifs_aio_ctx_alloc(void)
+{
+	struct cifs_aio_ctx *ctx;
+
+	ctx = kzalloc(sizeof(struct cifs_aio_ctx), GFP_KERNEL);
+	if (!ctx)
+		return NULL;
+
+	INIT_LIST_HEAD(&ctx->list);
+	mutex_init(&ctx->aio_mutex);
+	init_completion(&ctx->done);
+	kref_init(&ctx->refcount);
+	return ctx;
+}
+
+static inline void
+cifs_aio_ctx_release(struct kref *refcount)
+{
+	struct cifs_aio_ctx *ctx = container_of(refcount,
+					struct cifs_aio_ctx, refcount);
+
+	cifsFileInfo_put(ctx->cfile);
+	kvfree(ctx->bv);
+	kfree(ctx);
+}
+
+static int
+setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw)
+{
+	ssize_t rc;
+	unsigned int cur_npages;
+	unsigned int npages = 0;
+	unsigned int i;
+	size_t len;
+	size_t count = iov_iter_count(iter);
+	unsigned int saved_len;
+	size_t start;
+	unsigned int max_pages = iov_iter_npages(iter, INT_MAX);
+	struct page **pages;
+	struct bio_vec *bv;
+
+	if (iter->type & ITER_KVEC) {
+		memcpy(&ctx->iter, iter, sizeof(struct iov_iter));
+		ctx->len = count;
+		iov_iter_advance(iter, count);
+		return 0;
+	}
+
+	bv = kmalloc_array(max_pages, sizeof(struct bio_vec), GFP_KERNEL);
+	if (!bv) {
+		bv = vmalloc(max_pages * sizeof(struct bio_vec));
+		if (!bv)
+			return -ENOMEM;
+	}
+
+	saved_len = count;
+
+	while (count && npages < max_pages) {
+		rc = iov_iter_get_pages_alloc(iter, &pages, count, &start);
+		if (rc < 0) {
+			cifs_dbg(VFS, "couldn't get user pages (rc=%zd)\n", rc);
+			break;
+		}
+
+		if (rc > count) {
+			cifs_dbg(VFS, "get pages rc=%zd more than %zu\n", rc,
+				 count);
+			break;
+		}
+
+		iov_iter_advance(iter, rc);
+		count -= rc;
+		rc += start;
+		cur_npages = (rc + PAGE_SIZE - 1) / PAGE_SIZE;
+
+		if (npages + cur_npages > max_pages) {
+			cifs_dbg(VFS, "out of vec array capacity (%u vs %u)\n",
+				 npages + cur_npages, max_pages);
+			break;
+		}
+
+		for (i = 0; i < cur_npages; i++) {
+			len = rc > PAGE_SIZE ? PAGE_SIZE : rc;
+			bv[npages + i].bv_page = pages[i];
+			bv[npages + i].bv_offset = start;
+			bv[npages + i].bv_len = len - start;
+			rc -= len;
+			start = 0;
+		}
+
+		npages += cur_npages;
+		kvfree(pages);
+	}
+
+	ctx->bv = bv;
+	ctx->len = saved_len - count;
+	ctx->npages = npages;
+	iov_iter_bvec(&ctx->iter, ITER_BVEC | rw, ctx->bv, npages, ctx->len);
+	return 0;
+}
+
 static int
 cifs_write_allocate_pages(struct page **pages, unsigned long num_pages)
 {
@@ -2859,6 +2962,7 @@ cifs_uncached_readdata_release(struct kref *refcount)
 			struct cifs_readdata, refcount);
 	unsigned int i;
 
+	kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release);
 	for (i = 0; i < rdata->nr_pages; i++) {
 		put_page(rdata->pages[i]);
 		rdata->pages[i] = NULL;
@@ -2900,6 +3004,8 @@ cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter)
 	return remaining ? -EFAULT : 0;
 }
 
+static void collect_uncached_read_data(struct cifs_aio_ctx *ctx);
+
 static void
 cifs_uncached_readv_complete(struct work_struct *work)
 {
@@ -2907,6 +3013,8 @@ cifs_uncached_readv_complete(struct work_struct *work)
 					struct cifs_readdata, work);
 
 	complete(&rdata->done);
+	collect_uncached_read_data(rdata->ctx);
+	/* the below call can possibly free the last ref to aio ctx */
 	kref_put(&rdata->refcount, cifs_uncached_readdata_release);
 }
 
@@ -2973,7 +3081,8 @@ cifs_uncached_copy_into_pages(struct TCP_Server_Info *server,
 
 static int
 cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
-		     struct cifs_sb_info *cifs_sb, struct list_head *rdata_list)
+		     struct cifs_sb_info *cifs_sb, struct list_head *rdata_list,
+		     struct cifs_aio_ctx *ctx)
 {
 	struct cifs_readdata *rdata;
 	unsigned int npages, rsize, credits;
@@ -3020,6 +3129,8 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 	rdata->read_into_pages = cifs_uncached_read_into_pages;
 	rdata->copy_into_pages = cifs_uncached_copy_into_pages;
 	rdata->credits = credits;
+	rdata->ctx = ctx;
+	kref_get(&ctx->refcount);
 
 	if (!rdata->cfile->invalidHandle ||
 	    !cifs_reopen_file(rdata->cfile, true))
@@ -3042,50 +3153,37 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 	return rc;
 }
 
-ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
+static void
+collect_uncached_read_data(struct cifs_aio_ctx *ctx)
 {
-	struct file *file = iocb->ki_filp;
-	ssize_t rc;
-	size_t len;
-	ssize_t total_read = 0;
-	loff_t offset = iocb->ki_pos;
+	struct cifs_readdata *rdata, *tmp;
+	struct iov_iter *to = &ctx->iter;
 	struct cifs_sb_info *cifs_sb;
 	struct cifs_tcon *tcon;
-	struct cifsFileInfo *open_file;
-	struct cifs_readdata *rdata, *tmp;
-	struct list_head rdata_list;
-
-	len = iov_iter_count(to);
-	if (!len)
-		return 0;
-
-	INIT_LIST_HEAD(&rdata_list);
-	cifs_sb = CIFS_FILE_SB(file);
-	open_file = file->private_data;
-	tcon = tlink_tcon(open_file->tlink);
-
-	if (!tcon->ses->server->ops->async_readv)
-		return -ENOSYS;
+	unsigned int i;
+	int rc;
 
-	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
-		cifs_dbg(FYI, "attempting read on write only file instance\n");
+	tcon = tlink_tcon(ctx->cfile->tlink);
+	cifs_sb = CIFS_SB(ctx->cfile->dentry->d_sb);
 
-	rc = cifs_send_async_read(offset, len, open_file, cifs_sb, &rdata_list);
+	mutex_lock(&ctx->aio_mutex);
 
-	/* if at least one read request send succeeded, then reset rc */
-	if (!list_empty(&rdata_list))
-		rc = 0;
+	if (list_empty(&ctx->list)) {
+		mutex_unlock(&ctx->aio_mutex);
+		return;
+	}
 
-	len = iov_iter_count(to);
+	rc = ctx->rc;
 
 	/* the loop below should proceed in the order of increasing offsets */
 again:
-	list_for_each_entry_safe(rdata, tmp, &rdata_list, list) {
+	list_for_each_entry_safe(rdata, tmp, &ctx->list, list) {
 		if (!rc) {
-			/* FIXME: freezable sleep too? */
-			rc = wait_for_completion_killable(&rdata->done);
-			if (rc)
-				rc = -EINTR;
-			else if (rdata->result == -EAGAIN) {
+			if (!try_wait_for_completion(&rdata->done)) {
+				mutex_unlock(&ctx->aio_mutex);
+				return;
+			}
+
+			if (rdata->result == -EAGAIN) {
 				/* resend call if it's a retryable error */
 				struct list_head tmp_list;
 				unsigned int got_bytes = rdata->got_bytes;
@@ -3111,9 +3209,9 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
 						rdata->offset + got_bytes,
 						rdata->bytes - got_bytes,
 						rdata->cfile, cifs_sb,
-						&tmp_list);
+						&tmp_list, ctx);
 
-				list_splice(&tmp_list, &rdata_list);
+				list_splice(&tmp_list, &ctx->list);
 
 				kref_put(&rdata->refcount,
 					 cifs_uncached_readdata_release);
@@ -3131,14 +3229,111 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
 		kref_put(&rdata->refcount, cifs_uncached_readdata_release);
 	}
 
-	total_read = len - iov_iter_count(to);
+	for (i = 0; i < ctx->npages; i++) {
+		if (ctx->should_dirty)
+			set_page_dirty(ctx->bv[i].bv_page);
+		put_page(ctx->bv[i].bv_page);
+	}
+
+	ctx->total_len = ctx->len - iov_iter_count(to);
 
-	cifs_stats_bytes_read(tcon, total_read);
+	cifs_stats_bytes_read(tcon, ctx->total_len);
 
 	/* mask nodata case */
 	if (rc == -ENODATA)
 		rc = 0;
 
+	ctx->rc = (rc == 0) ? ctx->total_len : rc;
+
+	mutex_unlock(&ctx->aio_mutex);
+
+	if (ctx->iocb && ctx->iocb->ki_complete)
+		ctx->iocb->ki_complete(ctx->iocb, ctx->rc, 0);
+	else
+		complete(&ctx->done);
+}
+
+ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct file *file = iocb->ki_filp;
+	ssize_t rc;
+	size_t len;
+	ssize_t total_read = 0;
+	loff_t offset = iocb->ki_pos;
+	struct cifs_sb_info *cifs_sb;
+	struct cifs_tcon *tcon;
+	struct cifsFileInfo *cfile;
+	struct cifs_aio_ctx *ctx;
+
+	len = iov_iter_count(to);
+	if (!len)
+		return 0;
+
+	cifs_sb = CIFS_FILE_SB(file);
+	cfile = file->private_data;
+	tcon = tlink_tcon(cfile->tlink);
+
+	if (!tcon->ses->server->ops->async_readv)
+		return -ENOSYS;
+
+	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
+		cifs_dbg(FYI, "attempting read on write only file instance\n");
+
+	ctx = cifs_aio_ctx_alloc();
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->cfile = cifsFileInfo_get(cfile);
+
+	if (!is_sync_kiocb(iocb))
+		ctx->iocb = iocb;
+
+	if (to->type & ITER_IOVEC)
+		ctx->should_dirty = true;
+
+	rc = setup_aio_ctx_iter(ctx, to, READ);
+	if (rc) {
+		kref_put(&ctx->refcount, cifs_aio_ctx_release);
+		return rc;
+	}
+
+	len = ctx->len;
+
+	/* grab a lock here because read response handlers can access ctx */
+	mutex_lock(&ctx->aio_mutex);
+
+	rc = cifs_send_async_read(offset, len, cfile, cifs_sb, &ctx->list, ctx);
+
+	/* if at least one read request send succeeded, then reset rc */
+	if (!list_empty(&ctx->list))
+		rc = 0;
+
+	mutex_unlock(&ctx->aio_mutex);
+
+	if (rc) {
+		kref_put(&ctx->refcount, cifs_aio_ctx_release);
+		return rc;
+	}
+
+	if (!is_sync_kiocb(iocb)) {
+		kref_put(&ctx->refcount, cifs_aio_ctx_release);
+		return -EIOCBQUEUED;
+	}
+
+	/* FIXME: freezable sleep too? */
+	rc = wait_for_completion_killable(&ctx->done);
+	if (rc) {
+		mutex_lock(&ctx->aio_mutex);
+		ctx->rc = rc = -EINTR;
+		total_read = ctx->total_len;
+		mutex_unlock(&ctx->aio_mutex);
+	} else {
+		rc = ctx->rc;
+		total_read = ctx->total_len;
+	}
+
+	kref_put(&ctx->refcount, cifs_aio_ctx_release);
+
 	if (total_read) {
 		iocb->ki_pos += total_read;
 		return total_read;
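
For readers coming at this from the userspace side, below is a minimal
sketch of the io_submit() flow that the patch makes genuinely
asynchronous. It is illustrative only and not part of the patch: it
assumes libaio (build with -laio) and a file on a mount whose reads go
through cifs_user_readv(), for example one mounted with cache=none;
the file name aio_pread_demo.c and the queue-depth and chunk-size
constants are arbitrary choices.

/* aio_pread_demo.c: queue several reads at once, then harvest them */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define QUEUE_DEPTH 8
#define CHUNK (1024 * 1024)

int main(int argc, char **argv)
{
	struct iocb iocbs[QUEUE_DEPTH], *iocbps[QUEUE_DEPTH];
	struct io_event events[QUEUE_DEPTH];
	io_context_t ctx = 0;
	int fd, i, ret;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file on cifs mount>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* libaio calls return -errno on failure instead of setting errno */
	ret = io_setup(QUEUE_DEPTH, &ctx);
	if (ret < 0) {
		fprintf(stderr, "io_setup: %s\n", strerror(-ret));
		return 1;
	}

	/*
	 * Queue all reads up front. With this patch applied the CIFS
	 * client puts every request on the wire and io_submit() returns
	 * immediately; without it, each iocb is served synchronously
	 * inside io_submit(). malloc() error checking is elided.
	 */
	for (i = 0; i < QUEUE_DEPTH; i++) {
		io_prep_pread(&iocbs[i], fd, malloc(CHUNK), CHUNK,
			      (long long)i * CHUNK);
		iocbps[i] = &iocbs[i];
	}

	ret = io_submit(ctx, QUEUE_DEPTH, iocbps);
	if (ret != QUEUE_DEPTH) {
		fprintf(stderr, "io_submit returned %d\n", ret);
		return 1;
	}

	/*
	 * Completions are reported through iocb->ki_complete() on the
	 * kernel side and surface here as aio events.
	 */
	ret = io_getevents(ctx, QUEUE_DEPTH, QUEUE_DEPTH, events, NULL);
	for (i = 0; i < ret; i++)
		printf("read %d returned %lld bytes\n", i,
		       (long long)events[i].res);

	io_destroy(ctx);
	close(fd);
	return 0;
}

With the patch, the reads queued above overlap on the network, which
is exactly the single-threaded queue-depth scaling the commit message
refers to.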