From patchwork Sat Sep 4 20:24:52 2010
X-Patchwork-Submitter: Latchesar Ionkov
X-Patchwork-Id: 156111
X-Patchwork-Delegate: ericvh@gmail.com
Date: Sat, 4 Sep 2010 14:24:52 -0600
From: Latchesar Ionkov
To: v9fs-developer@lists.sourceforge.net
Cc: linux-fsdevel@vger.kernel.org
Subject: [V9fs-developer] [RFC2][PATCH] 9p/net: add zero copy support
Message-ID: <20100904202452.GA2279@valinor>

diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index a8de812..871e90d 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -27,6 +27,8 @@
 #ifndef NET_9P_H
 #define NET_9P_H

+#include
+
 /**
  * enum p9_debug_flags - bits for mount time debug parameter
  * @P9_DEBUG_ERROR: more verbose error messages including original error string
@@ -304,6 +306,8 @@ enum p9_qid_t {

 /* ample room for Twrite/Rread header */
 #define P9_IOHDRSZ	24
+#define P9_TWRITE_HDRSZ	23
+#define P9_RREAD_HDRSZ	11

 /* Room for readdir header */
 #define P9_READDIRHDRSZ	24
@@ -633,9 +637,12 @@ struct p9_rwstat {
  * @size: prefixed length of the structure
  * @id: protocol operating identifier of type &p9_msg_t
  * @tag: transaction id of the request
- * @offset: used by marshalling routines to track currentposition in buffer
+ * @offset: used by marshalling routines to track current position in buffer
  * @capacity: used by marshalling routines to track total capacity
- * @sdata: payload
+ * @sgcount: number of entries in the scatterlist array
+ * @sg: array of scatterlist structs that describe the payload. The first
+ *	entry in the array is always mapped and pointer to it is stored in &buf
+ * @buf: payload (whole if sg is NULL, otherwise the first entry).
  *
  * &p9_fcall represents the structure for all 9P RPC
  * transactions. Requests are packaged into fcalls, and reponses
@@ -645,14 +652,21 @@ struct p9_rwstat {
  */

 struct p9_fcall {
+	/* 9P message header data */
 	u32 size;
 	u8 id;
 	u16 tag;

+	/* data related to marshalling */
 	size_t offset;
 	size_t capacity;

-	uint8_t *sdata;
+	/* buffer description */
+	int sgcount;
+	struct scatterlist *sg;
+	uint8_t *buf;
+
+	struct list_head fclist;
 };

 struct p9_idpool;
diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index d1aa2cf..ccc3ab7 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -95,6 +95,8 @@ enum p9_req_status_t {
  * @wq: wait_queue for the client to block on for this request
  * @tc: the request fcall structure
  * @rc: the response fcall structure
+ * @tb: the request buffer
+ * @rb: the response buffer
  * @aux: transport specific data (provided for trans_fd migration)
  * @req_list: link for higher level objects to chain requests
  *
@@ -110,6 +112,7 @@ enum p9_req_status_t {
  */

 struct p9_req_t {
+	u16 tag;
 	int status;
 	int t_err;
 	wait_queue_head_t *wq;
@@ -134,6 +137,12 @@ struct p9_req_t {
  * @tagpool - transaction id accounting for session
  * @reqs - 2D array of requests
  * @max_tag - current maximum tag id allocated
+ * @sblist - list of small pdu
+ * @sbcount - number of elements in sblist
+ * @lblist - list of large (->msize) pdu
+ * @lbcount - number of elements in lblist
+ * @sgblist - list of scatterlist pdu
+ * @sgbcount - number of elements in sgblist
  *
  * The client structure is used to keep track of various per-client
  * state that has been instantiated.
@@ -146,12 +155,20 @@ struct p9_req_t {
  * Each row is 256 requests and we'll support up to 256 rows for
  * a total of 64k concurrent requests per session.
  *
+ * The client structure keeps track of two classes of 9P message buffers:
+ *  - large buffers (lblist, lbcount) that can hold up to &msize bytes
+ *    messages.
+ *  - small buffers (sblist, sbcount) that can hold up to P9_SMALL_SIZE
+ *    byte header + scatterlist for up to &msize/PAGE_SIZE+2 pages.
+ * There are up to P9_MAX_BUFS messages kept in the lists.
+ *
  * Bugs: duplicated data and potentially unnecessary elements.
  */

 struct p9_client {
 	spinlock_t lock; /* protect client structure */
 	int msize;
+	int maxsgcount;
 	unsigned char proto_version;
 	struct p9_trans_module *trans_mod;
 	enum p9_trans_status status;
@@ -164,6 +181,11 @@ struct p9_client {
 	struct p9_idpool *tagpool;
 	struct p9_req_t *reqs[P9_ROW_MAXTAG];
 	int max_tag;
+
+	struct list_head sblist;
+	int sbcount;
+	struct list_head lblist;
+	int lbcount;
 };

 /**
@@ -257,6 +279,8 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req);
 int p9_parse_header(struct p9_fcall *, int32_t *, int8_t *, int16_t *, int);
 int p9stat_read(char *, int, struct p9_wstat *, int);
 void p9stat_free(struct p9_wstat *);
+struct p9_fcall *p9_fcall_alloc(struct p9_client *c, int size);
+void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc);

 int p9_is_proto_dotu(struct p9_client *clnt);
 int p9_is_proto_dotl(struct p9_client *clnt);
diff --git a/include/net/9p/transport.h b/include/net/9p/transport.h
index 6d5886e..4912dc1 100644
--- a/include/net/9p/transport.h
+++ b/include/net/9p/transport.h
@@ -43,11 +43,16 @@
  * BUGS: the transport module list isn't protected.
*/ +enum { + P9_TRANS_SG = 1, /* support for scatterlists */ +}; + struct p9_trans_module { struct list_head list; char *name; /* name of transport */ int maxsize; /* max message size of transport */ int def; /* this transport should be default */ + int flags; /* P9_TRANS_* flags */ struct module *owner; int (*create)(struct p9_client *, const char *, char *); void (*close) (struct p9_client *); @@ -60,4 +65,10 @@ void v9fs_unregister_trans(struct p9_trans_module *m); struct p9_trans_module *v9fs_get_trans_by_name(const substring_t *name); struct p9_trans_module *v9fs_get_default_trans(void); void v9fs_put_trans(struct p9_trans_module *m); + +static int inline p9_trans_sg_support(struct p9_trans_module *ts) +{ + return ts->flags & P9_TRANS_SG; +} + #endif /* NET_9P_TRANSPORT_H */ diff --git a/net/9p/client.c b/net/9p/client.c index 9eb7250..8536411 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -32,12 +32,16 @@ #include #include #include +#include #include #include #include #include #include "protocol.h" +#define P9_FCALL_MINSZ 256 +#define P9_MAX_BUF 32 + /* * Client Option Parsing (code inspired by NFS code) * - a little lazy - parse all client options @@ -92,8 +96,8 @@ static int get_protocol_version(const substring_t *name) return version; } -static struct p9_req_t * -p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...); +static struct p9_req_t *p9_client_rpc(struct p9_client *c, int8_t type, + struct p9_fcall *tc, struct p9_fcall *rc, const char *fmt, ...); /** * parse_options - parse mount options into client structure @@ -229,29 +233,11 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag) return ERR_PTR(-ENOMEM); } init_waitqueue_head(req->wq); - req->tc = kmalloc(sizeof(struct p9_fcall)+c->msize, - GFP_KERNEL); - req->rc = kmalloc(sizeof(struct p9_fcall)+c->msize, - GFP_KERNEL); - if ((!req->tc) || (!req->rc)) { - printk(KERN_ERR "Couldn't grow tag array\n"); - kfree(req->tc); - kfree(req->rc); - kfree(req->wq); - 
req->tc = req->rc = NULL; - req->wq = NULL; - return ERR_PTR(-ENOMEM); - } - req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall); - req->tc->capacity = c->msize; - req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall); - req->rc->capacity = c->msize; + req->tc = NULL; + req->rc = NULL; } - p9pdu_reset(req->tc); - p9pdu_reset(req->rc); - - req->tc->tag = tag-1; + req->tag = tag - 1; req->status = REQ_STATUS_ALLOC; return &c->reqs[row][col]; @@ -357,14 +343,91 @@ static void p9_tag_cleanup(struct p9_client *c) static void p9_free_req(struct p9_client *c, struct p9_req_t *r) { - int tag = r->tc->tag; - P9_DPRINTK(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag); + int tag = r->tag; + P9_DPRINTK(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag); + p9_fcall_free(c, r->tc); + r->tc = NULL; + p9_fcall_free(c, r->rc); + r->rc = NULL; r->status = REQ_STATUS_IDLE; if (tag != P9_NOTAG && p9_idpool_check(tag, c->tagpool)) p9_idpool_put(tag, c->tagpool); } +struct p9_fcall *p9_fcall_alloc(struct p9_client *c, int size) +{ + unsigned long flags; + struct p9_fcall *fc; + int *count; + struct list_head *list; + + fc = NULL; + if (size < P9_FCALL_MINSZ) { + size = P9_FCALL_MINSZ + c->maxsgcount*sizeof(struct scatterlist); + list = &c->sblist; + count = &c->sbcount; + } else { + size = c->msize; + list = &c->lblist; + count = &c->lbcount; + } + + spin_lock_irqsave(&c->lock, flags); + if (*count > 0) { + fc = list_first_entry(list, struct p9_fcall, fclist); + list_del(&fc->fclist); + (*count)--; + } + spin_unlock_irqrestore(&c->lock, flags); + if (fc) + return fc; + + /* no free fcalls, allocate one */ + fc = kmalloc(sizeof(struct p9_fcall) + size, GFP_KERNEL); + if (!fc) + return NULL; + + INIT_LIST_HEAD(&fc->fclist); + fc->capacity = size; + fc->buf = (uint8_t *) &fc[1]; + fc->sgcount = 0; + fc->sg = NULL; + + return fc; +} +EXPORT_SYMBOL(p9_fcall_alloc); + +void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc) +{ + unsigned long flags; + int *count; + 
struct list_head *list; + + if (!fc) + return; + + fc->sgcount = 0; + fc->sg = NULL; + if (fc->capacity == c->msize) { + count = &c->lbcount; + list = &c->lblist; + } else { + count = &c->sbcount; + list = &c->sblist; + } + + spin_lock_irqsave(&c->lock, flags); + if (*count < P9_MAX_BUF) { + list_add_tail(&fc->fclist, list); + fc = NULL; + (*count)++; + } + spin_unlock_irqrestore(&c->lock, flags); + kfree(fc); +} +EXPORT_SYMBOL(p9_fcall_free); + /** * p9_client_cb - call back from transport to client * c: client state @@ -399,14 +462,12 @@ p9_parse_header(struct p9_fcall *pdu, int32_t *size, int8_t *type, int16_t *tag, int err; pdu->offset = 0; - if (pdu->size == 0) - pdu->size = 7; - err = p9pdu_readf(pdu, 0, "dbw", &r_size, &r_type, &r_tag); if (err) goto rewind_and_exit; - pdu->size = r_size; +/* we allow short reads + pdu->size = r_size; */ pdu->id = r_type; pdu->tag = r_tag; @@ -428,6 +489,44 @@ rewind_and_exit: } EXPORT_SYMBOL(p9_parse_header); +static int p9_sg_read_error(struct p9_fcall *rc, int proto_version, + char **ename, int *ecode) +{ + int err, n, fecode; + int16_t len, count; + char *str; + + /* the string len is in the header */ + err = p9pdu_readf(rc, proto_version, "w", &len); + if (err < 0) + return err; + + count = len; + fecode = (proto_version == p9_proto_2000u) || + (proto_version == p9_proto_2000L); + + if (fecode) + count += 4; + + str = kmalloc(count + 1, GFP_KERNEL); + if (!str) + return -ENOMEM; + + n = sg_copy_to_buffer(rc->sg, rc->sgcount, str, count); + if (fecode) { + /* if we got short read, set ecode to -EINVAL */ + if (n == count) { + *ecode = le32_to_cpu(*(__le32 *) &str[len]); + n -= 4; + } else + *ecode = -EINVAL; + } + + str[n] = '\0'; + *ename = str; + return 0; +} + /** * p9_check_errors - check 9p packet for error return and process it * @c: current client instance @@ -442,28 +541,50 @@ EXPORT_SYMBOL(p9_parse_header); static int p9_check_errors(struct p9_client *c, struct p9_req_t *req) { int8_t type; + int32_t size; int 
err; + struct p9_fcall *rc; - err = p9_parse_header(req->rc, NULL, &type, NULL, 0); + rc = req->rc; + err = p9_parse_header(rc, &size, &type, NULL, 0); if (err) { P9_DPRINTK(P9_DEBUG_ERROR, "couldn't parse header %d\n", err); return err; } - if (type == P9_RERROR) { + if (type == req->tc->id+1) { + if (size != rc->size) { + P9_DPRINTK(P9_DEBUG_ERROR, "short read: %d got %d\n", + rc->size, size); + err = -EINVAL; + } else + err = 0; + } else if (type == P9_RERROR) { int ecode; char *ename; - err = p9pdu_readf(req->rc, c->proto_version, "s?d", + /* error handling is complicated because of the + * Tread/Rerror case if scatterlist is used */ + if (rc->sg) + err = p9_sg_read_error(req->rc, c->proto_version, &ename, &ecode); + else + err = p9pdu_readf(req->rc, c->proto_version, "s?d", + &ename, &ecode); + if (err) { - P9_DPRINTK(P9_DEBUG_ERROR, "couldn't parse error%d\n", - err); - return err; + if (size != rc->size) { + ename = kstrdup("Error name too long", + GFP_KERNEL); + ecode = -EINVAL; + } else { + P9_DPRINTK(P9_DEBUG_ERROR, + "couldn't parse error%d\n", err); + return err; + } } - if (p9_is_proto_dotu(c) || - p9_is_proto_dotl(c)) + if (p9_is_proto_dotu(c) || p9_is_proto_dotl(c)) err = -ecode; if (!err || !IS_ERR_VALUE(err)) @@ -472,8 +593,11 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req) P9_DPRINTK(P9_DEBUG_9P, "<<< RERROR (%d) %s\n", -ecode, ename); kfree(ename); - } else - err = 0; + } else { + P9_DPRINTK(P9_DEBUG_ERROR, "mismatch: expected %d, got %d\n", + req->tc->id + 1, type); + err = -EINVAL; + } return err; } @@ -502,7 +626,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq) P9_DPRINTK(P9_DEBUG_9P, ">>> TFLUSH tag %d\n", oldtag); - req = p9_client_rpc(c, P9_TFLUSH, "w", oldtag); + req = p9_client_rpc(c, P9_TFLUSH, NULL, NULL, "w", oldtag); if (IS_ERR(req)) return PTR_ERR(req); @@ -522,13 +646,13 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq) * p9_client_rpc - issue a request 
and wait for a response * @c: client session * @type: type of request - * @fmt: protocol format string (see protocol.c) + * @tcfmt: request protocol format string (see protocol.c) + * @rcfmt: response protocol format string * - * Returns request structure (which client must free using p9_free_req) */ -static struct p9_req_t * -p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) +static struct p9_req_t *p9_client_rpc(struct p9_client *c, int8_t type, + struct p9_fcall *tc, struct p9_fcall *rc, const char *fmt, ...) { va_list ap; int tag, err; @@ -563,13 +687,43 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) if (IS_ERR(req)) return req; + /* set up the outgoing buffer */ + req->tc = tc; + if (!req->tc) + req->tc = p9_fcall_alloc(c, 0); + + if (!req->tc) { + err = -ENOMEM; + goto reterr; + } + req->tc->tag = req->tag; + /* marshall the data */ - p9pdu_prepare(req->tc, tag, type); va_start(ap, fmt); + p9pdu_prepare(req->tc, tag, type); err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap); - va_end(ap); p9pdu_finalize(req->tc); + va_end(ap); + if (err<0 && !tc) { + /* the reason for the failure may be that the small buffer + was too small. 
Try with the large one */ + p9_fcall_free(c, req->tc); + req->tc = p9_fcall_alloc(c, c->msize); + if (!req->tc) { + err = -ENOMEM; + goto reterr; + } + va_start(ap, fmt); + p9pdu_prepare(req->tc, tag, type); + err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap); + p9pdu_finalize(req->tc); + va_end(ap); + } + + /* we'll let the transport to allocate the appropriate response buffer + in case it is not specified by the caller */ + req->rc = rc; err = c->trans_mod->request(c, req); if (err < 0) { c->status = Disconnected; @@ -683,15 +837,15 @@ int p9_client_version(struct p9_client *c) switch (c->proto_version) { case p9_proto_2000L: - req = p9_client_rpc(c, P9_TVERSION, "ds", + req = p9_client_rpc(c, P9_TVERSION, NULL, NULL, "ds", c->msize, "9P2000.L"); break; case p9_proto_2000u: - req = p9_client_rpc(c, P9_TVERSION, "ds", + req = p9_client_rpc(c, P9_TVERSION, NULL, NULL, "ds", c->msize, "9P2000.u"); break; case p9_proto_legacy: - req = p9_client_rpc(c, P9_TVERSION, "ds", + req = p9_client_rpc(c, P9_TVERSION, NULL, NULL, "ds", c->msize, "9P2000"); break; default: @@ -721,8 +875,10 @@ int p9_client_version(struct p9_client *c) goto error; } - if (msize < c->msize) + if (msize < c->msize) { c->msize = msize; + c->maxsgcount = c->msize/PAGE_SIZE + 2; + } error: kfree(version); @@ -746,6 +902,10 @@ struct p9_client *p9_client_create(const char *dev_name, char *options) clnt->trans = NULL; spin_lock_init(&clnt->lock); INIT_LIST_HEAD(&clnt->fidlist); + clnt->sbcount = 0; + INIT_LIST_HEAD(&clnt->sblist); + clnt->lbcount = 0; + INIT_LIST_HEAD(&clnt->lblist); p9_tag_init(clnt); @@ -780,6 +940,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options) if ((clnt->msize+P9_IOHDRSZ) > clnt->trans_mod->maxsize) clnt->msize = clnt->trans_mod->maxsize-P9_IOHDRSZ; + clnt->maxsgcount = clnt->msize/PAGE_SIZE + 2; err = p9_client_version(clnt); if (err) goto close_trans; @@ -856,7 +1017,7 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid 
*afid, goto error; } - req = p9_client_rpc(clnt, P9_TATTACH, "ddss?d", fid->fid, + req = p9_client_rpc(clnt, P9_TATTACH, NULL, NULL, "ddss?d", fid->fid, afid ? afid->fid : P9_NOFID, uname, aname, n_uname); if (IS_ERR(req)) { err = PTR_ERR(req); @@ -905,7 +1066,7 @@ p9_client_auth(struct p9_client *clnt, char *uname, u32 n_uname, char *aname) goto error; } - req = p9_client_rpc(clnt, P9_TAUTH, "dss?d", + req = p9_client_rpc(clnt, P9_TAUTH, NULL, NULL, "dss?d", afid ? afid->fid : P9_NOFID, uname, aname, n_uname); if (IS_ERR(req)) { err = PTR_ERR(req); @@ -964,7 +1125,7 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, int nwname, char **wnames, P9_DPRINTK(P9_DEBUG_9P, ">>> TWALK fids %d,%d nwname %d wname[0] %s\n", oldfid->fid, fid->fid, nwname, wnames ? wnames[0] : NULL); - req = p9_client_rpc(clnt, P9_TWALK, "ddT", oldfid->fid, fid->fid, + req = p9_client_rpc(clnt, P9_TWALK, NULL, NULL, "ddT", oldfid->fid, fid->fid, nwname, wnames); if (IS_ERR(req)) { err = PTR_ERR(req); @@ -1030,9 +1191,9 @@ int p9_client_open(struct p9_fid *fid, int mode) return -EINVAL; if (p9_is_proto_dotl(clnt)) - req = p9_client_rpc(clnt, P9_TLOPEN, "dd", fid->fid, mode); + req = p9_client_rpc(clnt, P9_TLOPEN, NULL, NULL, "dd", fid->fid, mode); else - req = p9_client_rpc(clnt, P9_TOPEN, "db", fid->fid, mode); + req = p9_client_rpc(clnt, P9_TOPEN, NULL, NULL, "db", fid->fid, mode); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1074,8 +1235,8 @@ int p9_client_create_dotl(struct p9_fid *ofid, char *name, u32 flags, u32 mode, if (ofid->mode != -1) return -EINVAL; - req = p9_client_rpc(clnt, P9_TLCREATE, "dsddd", ofid->fid, name, flags, - mode, gid); + req = p9_client_rpc(clnt, P9_TLCREATE, NULL, NULL, "dsddd", ofid->fid, name, + flags, mode, gid); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1119,7 +1280,7 @@ int p9_client_fcreate(struct p9_fid *fid, char *name, u32 perm, int mode, if (fid->mode != -1) return -EINVAL; - req = p9_client_rpc(clnt, P9_TCREATE, "dsdb?s", 
fid->fid, name, perm, + req = p9_client_rpc(clnt, P9_TCREATE, NULL, NULL, "dsdb?s", fid->fid, name, perm, mode, extension); if (IS_ERR(req)) { err = PTR_ERR(req); @@ -1158,8 +1319,8 @@ int p9_client_symlink(struct p9_fid *dfid, char *name, char *symtgt, gid_t gid, dfid->fid, name, symtgt); clnt = dfid->clnt; - req = p9_client_rpc(clnt, P9_TSYMLINK, "dssd", dfid->fid, name, symtgt, - gid); + req = p9_client_rpc(clnt, P9_TSYMLINK, NULL, NULL, "dssd", dfid->fid, + name, symtgt, gid); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1189,8 +1350,8 @@ int p9_client_link(struct p9_fid *dfid, struct p9_fid *oldfid, char *newname) P9_DPRINTK(P9_DEBUG_9P, ">>> TLINK dfid %d oldfid %d newname %s\n", dfid->fid, oldfid->fid, newname); clnt = dfid->clnt; - req = p9_client_rpc(clnt, P9_TLINK, "dds", dfid->fid, oldfid->fid, - newname); + req = p9_client_rpc(clnt, P9_TLINK, NULL, NULL, "dds", dfid->fid, + oldfid->fid, newname); if (IS_ERR(req)) return PTR_ERR(req); @@ -1210,7 +1371,7 @@ int p9_client_clunk(struct p9_fid *fid) err = 0; clnt = fid->clnt; - req = p9_client_rpc(clnt, P9_TCLUNK, "d", fid->fid); + req = p9_client_rpc(clnt, P9_TCLUNK, NULL, NULL, "d", fid->fid); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1236,7 +1397,7 @@ int p9_client_remove(struct p9_fid *fid) err = 0; clnt = fid->clnt; - req = p9_client_rpc(clnt, P9_TREMOVE, "d", fid->fid); + req = p9_client_rpc(clnt, P9_TREMOVE, NULL, NULL, "d", fid->fid); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1251,54 +1412,148 @@ error: } EXPORT_SYMBOL(p9_client_remove); -int -p9_client_read(struct p9_fid *fid, char *data, char __user *udata, u64 offset, - u32 count) +int p9_sg_prepare(struct p9_fcall *fc, int sz0, char *data, + const char __user *udata, u32 count, int rw) { - int err, rsize, total; + int i, m, err, off; + struct page *pages[64]; + + err = 0; + fc->sgcount = min(count/PAGE_SIZE + 3, + (fc->capacity - P9_FCALL_MINSZ) / sizeof(struct scatterlist)); + fc->sg = (struct scatterlist *) 
&fc->buf[P9_FCALL_MINSZ]; + sg_init_table(fc->sg, fc->sgcount); + sg_set_buf(&fc->sg[0], fc->buf, sz0); + + if (udata) { + fc->sgcount = min(fc->sgcount, (int)(ARRAY_SIZE(pages) + 1)); + err = get_user_pages_fast((unsigned long) udata, fc->sgcount-1, + rw, pages); + if (err < 0) + goto error; + + off = ((unsigned long) udata) & ~PAGE_MASK; + for(i = 0; count > 0; off = 0, i++) { + m = min((int)(PAGE_SIZE - off), (int) count); + sg_set_page(&fc->sg[i+1], pages[i], m, off); + count -= m; + } + } else { + off = ((unsigned long) data) & ~PAGE_MASK; + for(i = 0; count > 0; off = 0, i++) { + m = min((int)(PAGE_SIZE - off), (int) count); + sg_set_buf(&fc->sg[i+1], data, m); + count -= m; + data += m; + } + } + + sg_mark_end(&fc->sg[i]); + fc->sgcount = i+1; + fc->capacity = P9_FCALL_MINSZ; + return 0; + +error: + fc->sg = NULL; + fc->sgcount = 0; + return err; +} + +void p9_sg_release(struct p9_client *c, struct p9_fcall *fc, char *data, + const char __user *udata) +{ + struct scatterlist *sg; + struct page *p; + + if (!fc || !fc->sg) + return; + + if (udata) { + /* fc->sg[0] is for the header and pointing to kernel space */ + for(sg=&fc->sg[1]; sg != NULL; sg = sg_next(sg)) { + p = sg_page(sg); + kunmap(p); + put_page(p); + } + } + + fc->sgcount = 0; + fc->sg = NULL; + fc->capacity = 256 + c->maxsgcount*sizeof(struct scatterlist); +} + +int p9_client_read(struct p9_fid *fid, char *data, char __user *udata, + u64 offset, u32 count) +{ + int err, sgsupport; struct p9_client *clnt; struct p9_req_t *req; + struct p9_fcall *rc; char *dataptr; P9_DPRINTK(P9_DEBUG_9P, ">>> TREAD fid %d offset %llu %d\n", fid->fid, (long long unsigned) offset, count); + rc = NULL; err = 0; clnt = fid->clnt; - total = 0; - - rsize = fid->iounit; - if (!rsize || rsize > clnt->msize-P9_IOHDRSZ) - rsize = clnt->msize - P9_IOHDRSZ; + if (count > clnt->msize-P9_IOHDRSZ) + count = clnt->msize-P9_IOHDRSZ; + + if (fid->iounit>0 && count>fid->iounit) + count = fid->iounit; + + sgsupport = 
p9_trans_sg_support(clnt->trans_mod); + if (sgsupport) { + rc = p9_fcall_alloc(clnt, 0); + if (!rc) + return -ENOMEM; + + /* don't use scatterlist for small buffers */ + if (count >= 256) { + err = p9_sg_prepare(rc, P9_RREAD_HDRSZ, data, udata, + count, 1); + if (err < 0) { + p9_fcall_free(clnt, rc); + return err; + } + } + } - if (count < rsize) - rsize = count; + req = p9_client_rpc(clnt, P9_TREAD, NULL, rc, "dqd", fid->fid, offset, + count); - req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; } - err = p9pdu_readf(req->rc, clnt->proto_version, "D", &count, &dataptr); + P9_DPRINTK(P9_DEBUG_9P, "<<< RREAD count %d\n", count); + + rc = req->rc; + if (rc->sg) + err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count); + else + err = p9pdu_readf(req->rc, clnt->proto_version, "D", &count, &dataptr); + if (err) { p9pdu_dump(1, req->rc); goto free_and_error; } - P9_DPRINTK(P9_DEBUG_9P, "<<< RREAD count %d\n", count); - - if (data) { - memmove(data, dataptr, count); - } + if (!rc->sg) { + if (data) + memmove(data, dataptr, count); - if (udata) { - err = copy_to_user(udata, dataptr, count); - if (err) { - err = -EFAULT; - goto free_and_error; + if (udata) { + err = copy_to_user(udata, dataptr, count); + if (err) { + err = -EFAULT; + goto free_and_error; + } } } + p9_sg_release(clnt, rc, data, udata); p9_free_req(clnt, req); return count; @@ -1313,28 +1568,54 @@ int p9_client_write(struct p9_fid *fid, char *data, const char __user *udata, u64 offset, u32 count) { - int err, rsize, total; + int err, sgsupport; struct p9_client *clnt; struct p9_req_t *req; + struct p9_fcall *tc; P9_DPRINTK(P9_DEBUG_9P, ">>> TWRITE fid %d offset %llu count %d\n", fid->fid, (long long unsigned) offset, count); + tc = NULL; err = 0; clnt = fid->clnt; - total = 0; - rsize = fid->iounit; - if (!rsize || rsize > clnt->msize-P9_IOHDRSZ) - rsize = clnt->msize - P9_IOHDRSZ; + if (count > clnt->msize-P9_IOHDRSZ) + count = 
clnt->msize-P9_IOHDRSZ; + + if (fid->iounit>0 && count>fid->iounit) + count = fid->iounit; + + sgsupport = p9_trans_sg_support(clnt->trans_mod); + if (sgsupport) { + tc = p9_fcall_alloc(clnt, 0); + if (!tc) + return -ENOMEM; + + /* don't use scatterlist for small buffers */ + if (count >= 256) { + err = p9_sg_prepare(tc, P9_TWRITE_HDRSZ, data, udata, + count, 0); + if (err < 0) { + p9_fcall_free(clnt, tc); + return err; + } + } + } + + if (tc && tc->sg) { + /* the data is already put in the appropriate place in sg, + just put the fid, the offset and the count */ + req = p9_client_rpc(clnt, P9_TWRITE, tc, NULL, "dqd", fid->fid, + offset, count); + } else { + if (data) + req = p9_client_rpc(clnt, P9_TWRITE, NULL, NULL, "dqD", + fid->fid, offset, count, data); + else + req = p9_client_rpc(clnt, P9_TWRITE, NULL, NULL, "dqU", + fid->fid, offset, count, udata); + } - if (count < rsize) - rsize = count; - if (data) - req = p9_client_rpc(clnt, P9_TWRITE, "dqD", fid->fid, offset, - rsize, data); - else - req = p9_client_rpc(clnt, P9_TWRITE, "dqU", fid->fid, offset, - rsize, udata); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1348,6 +1629,7 @@ p9_client_write(struct p9_fid *fid, char *data, const char __user *udata, P9_DPRINTK(P9_DEBUG_9P, "<<< RWRITE count %d\n", count); + p9_sg_release(clnt, req->tc, data, udata); p9_free_req(clnt, req); return count; @@ -1374,7 +1656,7 @@ struct p9_wstat *p9_client_stat(struct p9_fid *fid) err = 0; clnt = fid->clnt; - req = p9_client_rpc(clnt, P9_TSTAT, "d", fid->fid); + req = p9_client_rpc(clnt, P9_TSTAT, NULL, NULL, "d", fid->fid); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1425,7 +1707,8 @@ struct p9_stat_dotl *p9_client_getattr_dotl(struct p9_fid *fid, err = 0; clnt = fid->clnt; - req = p9_client_rpc(clnt, P9_TGETATTR, "dq", fid->fid, request_mask); + req = p9_client_rpc(clnt, P9_TGETATTR, NULL, NULL, "dq", fid->fid, + request_mask); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1516,7 +1799,8 @@ int 
p9_client_wstat(struct p9_fid *fid, struct p9_wstat *wst) wst->name, wst->uid, wst->gid, wst->muid, wst->extension, wst->n_uid, wst->n_gid, wst->n_muid); - req = p9_client_rpc(clnt, P9_TWSTAT, "dwS", fid->fid, wst->size+2, wst); + req = p9_client_rpc(clnt, P9_TWSTAT, NULL, NULL, "dwS", fid->fid, + wst->size+2, wst); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1547,7 +1831,8 @@ int p9_client_setattr(struct p9_fid *fid, struct p9_iattr_dotl *p9attr) p9attr->size, p9attr->atime_sec, p9attr->atime_nsec, p9attr->mtime_sec, p9attr->mtime_nsec); - req = p9_client_rpc(clnt, P9_TSETATTR, "dI", fid->fid, p9attr); + req = p9_client_rpc(clnt, P9_TSETATTR, NULL, NULL, "dI", fid->fid, + p9attr); if (IS_ERR(req)) { err = PTR_ERR(req); @@ -1571,7 +1856,7 @@ int p9_client_statfs(struct p9_fid *fid, struct p9_rstatfs *sb) P9_DPRINTK(P9_DEBUG_9P, ">>> TSTATFS fid %d\n", fid->fid); - req = p9_client_rpc(clnt, P9_TSTATFS, "d", fid->fid); + req = p9_client_rpc(clnt, P9_TSTATFS, NULL, NULL, "d", fid->fid); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1611,7 +1896,7 @@ int p9_client_rename(struct p9_fid *fid, struct p9_fid *newdirfid, char *name) P9_DPRINTK(P9_DEBUG_9P, ">>> TRENAME fid %d newdirfid %d name %s\n", fid->fid, newdirfid->fid, name); - req = p9_client_rpc(clnt, P9_TRENAME, "dds", fid->fid, + req = p9_client_rpc(clnt, P9_TRENAME, NULL, NULL, "dds", fid->fid, newdirfid->fid, name); if (IS_ERR(req)) { err = PTR_ERR(req); @@ -1649,7 +1934,7 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid, ">>> TXATTRWALK file_fid %d, attr_fid %d name %s\n", file_fid->fid, attr_fid->fid, attr_name); - req = p9_client_rpc(clnt, P9_TXATTRWALK, "dds", + req = p9_client_rpc(clnt, P9_TXATTRWALK, NULL, NULL, "dds", file_fid->fid, attr_fid->fid, attr_name); if (IS_ERR(req)) { err = PTR_ERR(req); @@ -1688,7 +1973,7 @@ int p9_client_xattrcreate(struct p9_fid *fid, const char *name, fid->fid, name, (long long)attr_size, flags); err = 0; clnt = fid->clnt; - req = 
p9_client_rpc(clnt, P9_TXATTRCREATE, "dsqd", + req = p9_client_rpc(clnt, P9_TXATTRCREATE, NULL, NULL, "dsqd", fid->fid, name, attr_size, flags); if (IS_ERR(req)) { err = PTR_ERR(req); @@ -1722,7 +2007,8 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset) if (count < rsize) rsize = count; - req = p9_client_rpc(clnt, P9_TREADDIR, "dqd", fid->fid, offset, rsize); + req = p9_client_rpc(clnt, P9_TREADDIR, NULL, NULL, "dqd", fid->fid, + offset, rsize); if (IS_ERR(req)) { err = PTR_ERR(req); goto error; @@ -1760,8 +2046,8 @@ int p9_client_mknod_dotl(struct p9_fid *fid, char *name, int mode, clnt = fid->clnt; P9_DPRINTK(P9_DEBUG_9P, ">>> TMKNOD fid %d name %s mode %d major %d " "minor %d\n", fid->fid, name, mode, MAJOR(rdev), MINOR(rdev)); - req = p9_client_rpc(clnt, P9_TMKNOD, "dsdddd", fid->fid, name, mode, - MAJOR(rdev), MINOR(rdev), gid); + req = p9_client_rpc(clnt, P9_TMKNOD, NULL, NULL, "dsdddd", fid->fid, + name, mode, MAJOR(rdev), MINOR(rdev), gid); if (IS_ERR(req)) return PTR_ERR(req); @@ -1791,8 +2077,8 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, char *name, int mode, clnt = fid->clnt; P9_DPRINTK(P9_DEBUG_9P, ">>> TMKDIR fid %d name %s mode %d gid %d\n", fid->fid, name, mode, gid); - req = p9_client_rpc(clnt, P9_TMKDIR, "dsdd", fid->fid, name, mode, - gid); + req = p9_client_rpc(clnt, P9_TMKDIR, NULL, NULL, "dsdd", fid->fid, + name, mode, gid); if (IS_ERR(req)) return PTR_ERR(req); diff --git a/net/9p/protocol.c b/net/9p/protocol.c index 3acd3af..36ab7bf 100644 --- a/net/9p/protocol.c +++ b/net/9p/protocol.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include "protocol.h" @@ -60,7 +61,7 @@ void p9pdu_dump(int way, struct p9_fcall *pdu) { int i, n; - u8 *data = pdu->sdata; + u8 *data = pdu->buf; int datalen = pdu->size; char buf[255]; int buflen = 255; @@ -104,30 +105,38 @@ EXPORT_SYMBOL(p9stat_free); static size_t pdu_read(struct p9_fcall *pdu, void *data, size_t size) { - size_t len = MIN(pdu->size - 
pdu->offset, size); - memcpy(data, &pdu->sdata[pdu->offset], len); - pdu->offset += len; - return size - len; + if (pdu->size-pdu->offset < size) + return 1; + + memcpy(data, &pdu->buf[pdu->offset], size); + pdu->offset += size; + return 0; } static size_t pdu_write(struct p9_fcall *pdu, const void *data, size_t size) { - size_t len = MIN(pdu->capacity - pdu->size, size); - memcpy(&pdu->sdata[pdu->size], data, len); - pdu->size += len; - return size - len; + if (pdu->capacity-pdu->offset < size) + return 1; + + memcpy(&pdu->buf[pdu->offset], data, size); + pdu->offset += size; + return 0; } static size_t pdu_write_u(struct p9_fcall *pdu, const char __user *udata, size_t size) { - size_t len = MIN(pdu->capacity - pdu->size, size); - int err = copy_from_user(&pdu->sdata[pdu->size], udata, len); + int err; + + if (pdu->capacity-pdu->offset < size) + return 1; + + err = copy_from_user(&pdu->buf[pdu->offset], udata, size); if (err) - printk(KERN_WARNING "pdu_write_u returning: %d\n", err); + return 1; - pdu->size += len; - return size - len; + pdu->offset += size; + return 0; } /* @@ -259,7 +268,7 @@ p9pdu_vreadf(struct p9_fcall *pdu, int proto_version, const char *fmt, *count = MIN(*count, pdu->size - pdu->offset); - *data = &pdu->sdata[pdu->offset]; + *data = &pdu->buf[pdu->offset]; } } break; @@ -582,7 +591,7 @@ int p9stat_read(char *buf, int len, struct p9_wstat *st, int proto_version) fake_pdu.size = len; fake_pdu.capacity = len; - fake_pdu.sdata = buf; + fake_pdu.buf = buf; fake_pdu.offset = 0; ret = p9pdu_readf(&fake_pdu, proto_version, "S", st); @@ -597,16 +606,25 @@ EXPORT_SYMBOL(p9stat_read); int p9pdu_prepare(struct p9_fcall *pdu, int16_t tag, int8_t type) { + pdu->id = type; + pdu->offset = 0; return p9pdu_writef(pdu, 0, "dbw", 0, type, tag); } int p9pdu_finalize(struct p9_fcall *pdu) { - int size = pdu->size; + int size; int err; + struct scatterlist *sg; - pdu->size = 0; + size = pdu->offset; + if (pdu->sg) { + for(sg = &pdu->sg[1]; sg != NULL; sg = 
sg_next(sg)) + size += sg->length; + } + pdu->offset = 0; err = p9pdu_writef(pdu, 0, "d", size); + pdu->offset = size; pdu->size = size; #ifdef CONFIG_NET_9P_DEBUG @@ -635,7 +653,7 @@ int p9dirent_read(char *buf, int len, struct p9_dirent *dirent, fake_pdu.size = len; fake_pdu.capacity = len; - fake_pdu.sdata = buf; + fake_pdu.buf = buf; fake_pdu.offset = 0; ret = p9pdu_readf(&fake_pdu, proto_version, "Qqbs", &dirent->qid, diff --git a/net/9p/protocol.h b/net/9p/protocol.h index 2431c0f..09da1a7 100644 --- a/net/9p/protocol.h +++ b/net/9p/protocol.h @@ -32,3 +32,6 @@ int p9pdu_prepare(struct p9_fcall *pdu, int16_t tag, int8_t type); int p9pdu_finalize(struct p9_fcall *pdu); void p9pdu_dump(int, struct p9_fcall *); void p9pdu_reset(struct p9_fcall *pdu); +int p9pdu_prepare_sg(struct p9_fcall *pdu, int32_t hdrsz, int32_t count, + int rw, const void __user *data); +void p9pdu_free_sg(struct p9_fcall *pdu); diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index c85109d..9843376 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -347,16 +347,15 @@ static void p9_read_work(struct work_struct *work) goto error; } - if (m->req->rc == NULL) { - m->req->rc = kmalloc(sizeof(struct p9_fcall) + - m->client->msize, GFP_KERNEL); - if (!m->req->rc) { - m->req = NULL; - err = -ENOMEM; - goto error; - } + /* trans_fd doesn't support scatterlists, rc can't be non-NULL */ + BUG_ON(m->req->rc != NULL); + m->req->rc = p9_fcall_alloc(m->client, n); + if (!m->req->rc) { + m->req = NULL; + err = -ENOMEM; + goto error; } - m->rbuf = (char *)m->req->rc + sizeof(struct p9_fcall); + m->rbuf = (char *)m->req->rc->buf; memcpy(m->rbuf, m->tmp_buf, m->rsize); m->rsize = n; } @@ -364,6 +363,7 @@ static void p9_read_work(struct work_struct *work) /* not an else because some packets (like clunk) have no payload */ if ((m->req) && (m->rpos == m->rsize)) { /* packet is read in */ P9_DPRINTK(P9_DEBUG_TRANS, "got new packet\n"); + m->req->rc->size = m->rsize; spin_lock(&m->client->lock); if
(m->req->status != REQ_STATUS_ERROR) m->req->status = REQ_STATUS_RCVD; @@ -462,7 +462,7 @@ static void p9_write_work(struct work_struct *work) P9_DPRINTK(P9_DEBUG_TRANS, "move req %p\n", req); list_move_tail(&req->req_list, &m->req_list); - m->wbuf = req->tc->sdata; + m->wbuf = req->tc->buf; m->wsize = req->tc->size; m->wpos = 0; spin_unlock(&m->client->lock); diff --git a/net/9p/util.c b/net/9p/util.c index e048701..f8c35f8 100644 --- a/net/9p/util.c +++ b/net/9p/util.c @@ -66,7 +66,7 @@ struct p9_idpool *p9_idpool_create(void) EXPORT_SYMBOL(p9_idpool_create); /** - * p9_idpool_destroy - create a new per-connection id pool + * p9_idpool_destroy - destroy the per-connection id pool * @p: idpool to destory */
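
Note on the pdu_read()/pdu_write() change in net/9p/protocol.c above: the old helpers copied as many bytes as fit and returned the shortfall, while the new ones either copy the whole request or copy nothing and return 1, and both now advance pdu->offset. The following is only a userspace sketch of that all-or-nothing behaviour, not the kernel code. The struct pdu_sketch type, the pdu_sketch_* names, and the self-test are hypothetical and exist purely for illustration; the extra "offset > size" guard is an addition of this sketch, needed because it uses unsigned size_t fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the p9_fcall fields used here. */
struct pdu_sketch {
	size_t size;     /* number of valid bytes in buf (read side) */
	size_t capacity; /* total bytes available in buf (write side) */
	size_t offset;   /* cursor shared by reads and writes */
	uint8_t *buf;
};

/* Copy exactly `size` bytes out, or nothing. Returns 0 on success, 1 on failure. */
static size_t pdu_sketch_read(struct pdu_sketch *pdu, void *data, size_t size)
{
	/* Sketch-only guard: avoid unsigned wrap-around if offset ran past size. */
	if (pdu->offset > pdu->size || pdu->size - pdu->offset < size)
		return 1;

	memcpy(data, &pdu->buf[pdu->offset], size);
	pdu->offset += size;
	return 0;
}

/* Copy exactly `size` bytes in, or nothing. Returns 0 on success, 1 on failure. */
static size_t pdu_sketch_write(struct pdu_sketch *pdu, const void *data,
			       size_t size)
{
	if (pdu->offset > pdu->capacity || pdu->capacity - pdu->offset < size)
		return 1;

	memcpy(&pdu->buf[pdu->offset], data, size);
	pdu->offset += size;
	return 0;
}

/* Exercise both helpers; returns 0 if every check holds. */
static int pdu_sketch_selftest(void)
{
	uint8_t storage[8];
	uint8_t big[16] = { 0 };
	uint32_t in = 0x11223344u, out = 0;
	struct pdu_sketch pdu = {
		.size = 0, .capacity = sizeof(storage), .offset = 0,
		.buf = storage,
	};

	assert(pdu_sketch_write(&pdu, &in, sizeof(in)) == 0);
	assert(pdu.offset == 4);

	/* 16 bytes do not fit in the remaining 4: nothing is written. */
	assert(pdu_sketch_write(&pdu, big, sizeof(big)) == 1);
	assert(pdu.offset == 4);

	/* Switch to reading: what was written becomes the valid size. */
	pdu.size = pdu.offset;
	pdu.offset = 0;
	assert(pdu_sketch_read(&pdu, &out, sizeof(out)) == 0);
	assert(out == in);

	/* Nothing left to read: the request fails and the cursor stays put. */
	assert(pdu_sketch_read(&pdu, &out, 1) == 1);
	assert(pdu.offset == 4);
	return 0;
}
```

Under these assumptions a caller only needs to test for a non-zero return; a failed call leaves the cursor where it was, so no partially written field can end up in the buffer.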