Message ID | 87obmym8jv.fsf@rustcorp.com.au (mailing list archive) |
---|---|
State | New, archived |
On 30/07/2012 01:50, Rusty Russell wrote:
>> Also, being the first user of chained scatterlist doesn't exactly give
>> me warm fuzzies.
>
> We're far from the first user: they've been in the kernel for well over
> 7 years. They were introduced for the block layer, but they tended to
> ignore private uses of scatterlists like this one.

Yeah, but sg_chain has no users in drivers, only a private one in
lib/scatterlist.c. The internal API could be changed to something else
and leave virtio-scsi screwed...

> Yes, we should do this. But note that this means an iteration, so we
> might as well combine the loops :)

I'm really bad at posting pseudo-code, but you can count the number of
physically-contiguous entries at the beginning of the list only. So if
everything is contiguous, you use a single non-indirect buffer and save
a kmalloc. If you use indirect buffers, I suspect it's much less
effective to collapse physically-contiguous entries. More elaborate
heuristics do need a loop, though.

Paolo
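[For reference, a minimal kernel-style sketch of the head-counting idea Paolo
describes above. The helper name count_contiguous_head is made up for
illustration, and it assumes the list has already been DMA-mapped so that
sg_dma_address()/sg_dma_len() are valid; it is not code from the thread.]

#include <linux/scatterlist.h>

/*
 * Hypothetical helper (illustration only): count how many leading entries
 * of an already DMA-mapped scatterlist are physically contiguous.  If the
 * result equals the mapped entry count, the whole request fits in direct
 * descriptors and no indirect table (hence no kmalloc) is needed.
 */
static unsigned int count_contiguous_head(struct scatterlist *sgl,
					  unsigned int nents)
{
	struct scatterlist *sg;
	dma_addr_t expected = 0;
	unsigned int i, count = 0;

	for_each_sg(sgl, sg, nents, i) {
		/* Stop at the first entry that does not start where the
		 * previous one ended. */
		if (i && sg_dma_address(sg) != expected)
			break;
		expected = sg_dma_address(sg) + sg_dma_len(sg);
		count++;
	}
	return count;
}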
On 07/30/2012 10:12 AM, Paolo Bonzini wrote:
> On 30/07/2012 01:50, Rusty Russell wrote:
>>> Also, being the first user of chained scatterlist doesn't exactly give
>>> me warm fuzzies.
>>
>> We're far from the first user: they've been in the kernel for well over
>> 7 years. They were introduced for the block layer, but they tended to
>> ignore private uses of scatterlists like this one.
>
> Yeah, but sg_chain has no users in drivers, only a private one in
> lib/scatterlist.c. The internal API could be changed to something else
> and leave virtio-scsi screwed...
>
>> Yes, we should do this. But note that this means an iteration, so we
>> might as well combine the loops :)
>
> I'm really bad at posting pseudo-code, but you can count the number of
> physically-contiguous entries at the beginning of the list only. So if
> everything is contiguous, you use a single non-indirect buffer and save
> a kmalloc. If you use indirect buffers, I suspect it's much less
> effective to collapse physically-contiguous entries. More elaborate
> heuristics do need a loop, though.

[All the below with a grain of salt, from my senile memory]

You must not forget some facts about the scatterlist received here at the
LLD. It has already been DMA mapped and locked by the generic layer, which
means that the DMA engine has already collapsed physically-contiguous
entries. Those you get here are already unique physically. (There were bugs
in the past where this was not true; please complain if you find them again.)

A scatterlist is two different lists taking the same space, but with two
different lengths:

- One list is the PAGE pointers plus offset and length, which is bigger
  than or equal to the 2nd list. The end marker corresponds to this list.
  This list is the input into the DMA engine.

- The second list is the physical DMA addresses list, with their physical
  lengths. Offset is not needed because it is incorporated in the DMA
  address. This list is the output from the DMA engine.

The reason the 2nd list is shorter is that the DMA engine tries to minimize
the number of physical scatter-list entries, which is usually a limited HW
resource. This list might follow chains, but its end is determined by the
sg_count received from the DMA engine, not by the end marker.

At the time my opinion, and I think Rusty agreed, was that the scatterlist
should be split in two. The input page-ptr list is just the BIO, and the
output of the DMA engine should just be the physical part of the sg_list,
as a separate parameter. But all this was buried under too many APIs and
the noise was too strong for any single brave soul.

So I'd just blindly trust the returned sg_count from the DMA engine; it is
already optimized. I THINK

> Paolo

Boaz
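[To make the two "views" Boaz describes concrete, here is an illustrative
sketch, not from the thread. In a real LLD the mapping has already been done
by the midlayer; dma_map_sg() is called here only so the example is
self-contained, and walk_both_views is a made-up name.]

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Illustration only: the same sg entries hold both the page view and,
 * after mapping, the DMA view. */
static void walk_both_views(struct device *dev, struct scatterlist *sgl,
			    unsigned int nents)
{
	struct scatterlist *sg;
	unsigned int i, mapped;

	/* Input view: page + offset + length, bounded by nents (or the
	 * end marker). */
	for_each_sg(sgl, sg, nents, i)
		pr_debug("page view: offset %u length %u\n",
			 sg->offset, sg->length);

	/* Output view: whatever the DMA engine produced.  Its length is
	 * the return value of dma_map_sg(), which may be shorter than
	 * nents because physically-contiguous entries were merged. */
	mapped = dma_map_sg(dev, sgl, nents, DMA_TO_DEVICE);
	if (!mapped)
		return;

	for_each_sg(sgl, sg, mapped, i)
		pr_debug("dma view: addr %llx length %u\n",
			 (unsigned long long)sg_dma_address(sg),
			 sg_dma_len(sg));

	dma_unmap_sg(dev, sgl, nents, DMA_TO_DEVICE);
}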
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -63,6 +63,8 @@ struct vring_virtqueue
 	/* Host supports indirect buffers */
 	bool indirect;
+	/* Threshold before we go indirect. */
+	unsigned int indirect_threshold;
 
 	/* Number of free buffers */
 	unsigned int num_free;
@@ -139,6 +141,32 @@ static int vring_add_indirect(struct vri
 	return head;
 }
 
+static void adjust_threshold(struct vring_virtqueue *vq,
+			     unsigned int out, unsigned int in)
+{
+	/* There are really two species of virtqueue, and it matters here.
+	 * If there are no output parts, it's a "normally full" receive queue,
+	 * otherwise it's a "normally empty" send queue. */
+	if (out) {
+		/* Leave threshold unless we're full. */
+		if (out + in < vq->num_free)
+			return;
+	} else {
+		/* Leave threshold unless we're empty. */
+		if (vq->num_free != vq->vring.num)
+			return;
+	}
+
+	/* Never drop threshold below 1 */
+	vq->indirect_threshold /= 2;
+	vq->indirect_threshold |= 1;
+
+	printk("%s %s: indirect threshold %u (%u+%u vs %u)\n",
+	       dev_name(&vq->vq.vdev->dev),
+	       vq->vq.name, vq->indirect_threshold,
+	       out, in, vq->num_free);
+}
+
 static int vring_add_buf(struct virtqueue *_vq,
 			 struct scatterlist sg[],
 			 unsigned int out,
@@ -151,19 +179,33 @@ static int vring_add_buf(struct virtqueu
 	START_USE(vq);
 
 	BUG_ON(data == NULL);
-
-	/* If the host supports indirect descriptor tables, and we have multiple
-	 * buffers, then go indirect. FIXME: tune this threshold */
-	if (vq->indirect && (out + in) > 1 && vq->num_free) {
-		head = vring_add_indirect(vq, sg, out, in);
-		if (head != vq->vring.num)
-			goto add_head;
-	}
-
 	BUG_ON(out + in > vq->vring.num);
 	BUG_ON(out + in == 0);
 
 	vq->addbuf_total++;
+
+	/* If the host supports indirect descriptor tables, consider it. */
+	if (vq->indirect) {
+		bool try_indirect;
+
+		/* We tweak the threshold automatically. */
+		adjust_threshold(vq, out, in);
+
+		/* If we can't fit any at all, fall through. */
+		if (vq->num_free == 0)
+			try_indirect = false;
+		else if (out + in > vq->num_free)
+			try_indirect = true;
+		else
+			try_indirect = (out + in > vq->indirect_threshold);
+
+		if (try_indirect) {
+			head = vring_add_indirect(vq, sg, out, in);
+			if (head != vq->vring.num)
+				goto add_head;
+		}
+	}
+
 	if (vq->num_free < out + in) {
 		pr_debug("Can't add buf len %i - avail = %i\n",
 			 out + in, vq->num_free);
@@ -401,6 +443,7 @@ struct virtqueue *vring_new_virtqueue(un
 		= vq->other_notify = 0;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
+	vq->indirect_threshold = num;
 
 	/* No callback? Tell other side not to bother us. */
 	if (!callback)
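[A side note on the behaviour of adjust_threshold() above: the threshold
starts at the ring size and is halved, never dropping below 1, each time a
send queue fills up or a receive queue drains, so indirect descriptors are
used progressively more often. A throwaway userspace sketch, assuming a
256-entry ring, shows how quickly it converges; this is not part of the
patch.]

#include <stdio.h>

int main(void)
{
	unsigned int threshold = 256;	/* vq->indirect_threshold = num */

	/* Same decay as adjust_threshold(): halve, but never below 1.
	 * For a 256-entry ring it reaches 1 after 8 steps:
	 * 129, 65, 33, 17, 9, 5, 3, 1. */
	while (threshold > 1) {
		threshold /= 2;
		threshold |= 1;
		printf("threshold now %u\n", threshold);
	}
	return 0;
}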