Message ID | 20190130102604.14496-1-javier@javigon.com (mailing list archive) |
---|---|
State | New, archived |
Series | [V2] lightnvm: pblk: prevent stall due to wb threshold |
On 1/30/19 11:26 AM, Javier González wrote:
> In order to respect mw_cunits, pblk's write buffer maintains a
> backpointer to protect data not yet persisted; when writing to the write
> buffer, this backpointer defines a threshold that pblk's rate-limiter
> enforces.
>
> On small PU configurations, the following scenarios might take place: (i)
> the threshold is larger than the write buffer and (ii) the threshold is
> smaller than the write buffer, but larger than the maximum allowed
> split bio - 256KB at this moment (Note that writes are not always
> split - we only do this when the size of the buffer is smaller
> than the I/O). In both cases, pblk's rate-limiter prevents the I/O from
> being written to the buffer, thus stalling.
>
> This patch fixes the original backpointer implementation by considering
> the threshold both on buffer creation and on the rate-limiter's path,
> when bio_split is triggered (case (ii) above).
>
> Fixes: 766c8ceb16fc ("lightnvm: pblk: guarantee that backpointer is respected on writer stall")
> Signed-off-by: Javier González <javier@javigon.com>
> ---
>
> Changes since V1:
>   - Fix bad arithmetic in the rate-limiter max_io calculation (from
>     Hans)
>
>  drivers/lightnvm/pblk-rb.c | 25 +++++++++++++++++++------
>  drivers/lightnvm/pblk-rl.c |  5 ++---
>  drivers/lightnvm/pblk.h    |  2 +-
>  3 files changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index d4ca8c64ee0f..a6133b50ed9c 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -45,10 +45,23 @@ void pblk_rb_free(struct pblk_rb *rb)
>  /*
>   * pblk_rb_calculate_size -- calculate the size of the write buffer
>   */
> -static unsigned int pblk_rb_calculate_size(unsigned int nr_entries)
> +static unsigned int pblk_rb_calculate_size(unsigned int nr_entries,
> +					   unsigned int threshold)
>  {
> -	/* Alloc a write buffer that can at least fit 128 entries */
> -	return (1 << max(get_count_order(nr_entries), 7));
> +	unsigned int thr_sz = 1 << (get_count_order(threshold + NVM_MAX_VLBA));
> +	unsigned int max_sz = max(thr_sz, nr_entries);
> +	unsigned int max_io;
> +
> +	/* Alloc a write buffer that can (i) fit at least two split bios
> +	 * (considering max I/O size NVM_MAX_VLBA, and (ii) guarantee that the
> +	 * threshold will be respected
> +	 */
> +	max_io = (1 << max((int)(get_count_order(max_sz)),
> +				(int)(get_count_order(NVM_MAX_VLBA << 1))));
> +	if ((threshold + NVM_MAX_VLBA) >= max_io)
> +		max_io <<= 1;
> +
> +	return max_io;
>  }
>
>  /*
> @@ -67,12 +80,12 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
>  	unsigned int alloc_order, order, iter;
>  	unsigned int nr_entries;
>
> -	nr_entries = pblk_rb_calculate_size(size);
> +	nr_entries = pblk_rb_calculate_size(size, threshold);
>  	entries = vzalloc(array_size(nr_entries, sizeof(struct pblk_rb_entry)));
>  	if (!entries)
>  		return -ENOMEM;
>
> -	power_size = get_count_order(size);
> +	power_size = get_count_order(nr_entries);
>  	power_seg_sz = get_count_order(seg_size);
>
>  	down_write(&pblk_rb_lock);
> @@ -149,7 +162,7 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
>  	 * Initialize rate-limiter, which controls access to the write buffer
>  	 * by user and GC I/O
>  	 */
> -	pblk_rl_init(&pblk->rl, rb->nr_entries);
> +	pblk_rl_init(&pblk->rl, rb->nr_entries, threshold);
>
>  	return 0;
>  }
> diff --git a/drivers/lightnvm/pblk-rl.c b/drivers/lightnvm/pblk-rl.c
> index 76116d5f78e4..e9e0af0df165 100644
> --- a/drivers/lightnvm/pblk-rl.c
> +++ b/drivers/lightnvm/pblk-rl.c
> @@ -207,7 +207,7 @@ void pblk_rl_free(struct pblk_rl *rl)
>  	del_timer(&rl->u_timer);
>  }
>
> -void pblk_rl_init(struct pblk_rl *rl, int budget)
> +void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold)
>  {
>  	struct pblk *pblk = container_of(rl, struct pblk, rl);
>  	struct nvm_tgt_dev *dev = pblk->dev;
> @@ -217,7 +217,6 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
>  	int sec_meta, blk_meta;
>  	unsigned int rb_windows;
>
> -
>  	/* Consider sectors used for metadata */
>  	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
>  	blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
> @@ -234,7 +233,7 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
>  	/* To start with, all buffer is available to user I/O writers */
>  	rl->rb_budget = budget;
>  	rl->rb_user_max = budget;
> -	rl->rb_max_io = budget >> 1;
> +	rl->rb_max_io = budget - threshold;
>  	rl->rb_gc_max = 0;
>  	rl->rb_state = PBLK_RL_HIGH;
>
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 72ae8755764e..a6386d5acd73 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -924,7 +924,7 @@ int pblk_gc_sysfs_force(struct pblk *pblk, int force);
>  /*
>   * pblk rate limiter
>   */
> -void pblk_rl_init(struct pblk_rl *rl, int budget);
> +void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold);
>  void pblk_rl_free(struct pblk_rl *rl);
>  void pblk_rl_update_rates(struct pblk_rl *rl);
>  int pblk_rl_high_thrs(struct pblk_rl *rl);
>

Thanks Javier. Applied for 5.1.
Hi Javier!

How did you test this? I'm trying to add a test case to our testing framework.

This is what I ran in qemu, and I got a hang (with this version of the patch):

nvme lnvm create -d nvme0n1 -t pblk -n pblk0 -f -b 0 -e 0

kernel log:
[  116.381799] pblk pblk0: luns:1, lines:280, secs:212736, buf entries:128

# dd if=/dev/zero of=/dev/pblk0 oflag=direct bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000480941 s, 8.5 MB/s
# dd if=/dev/zero of=/dev/pblk0 oflag=direct bs=64k count=1
1+0 records in
1+0 records out
65536 bytes (66 kB, 64 KiB) copied, 0.000477373 s, 137 MB/s
# dd if=/dev/zero of=/dev/pblk0 oflag=direct bs=128k count=1
1+0 records in
1+0 records out
131072 bytes (131 kB, 128 KiB) copied, 0.000548722 s, 239 MB/s
# dd if=/dev/zero of=/dev/pblk0 oflag=direct bs=256k count=1
1+0 records in
1+0 records out
262144 bytes (262 kB, 256 KiB) copied, 0.000718515 s, 365 MB/s
# dd if=/dev/zero of=/dev/pblk0 oflag=direct bs=512k count=1
<HANG>
> On 31 Jan 2019, at 11.41, Hans Holmberg <hans@owltronix.com> wrote:
>
> Hi Javier!
>
> How did you test this? I'm trying to add a test case to our testing framework.
>
> This is what I ran in qemu, and I got a hang (with this version of the patch)
>
> nvme lnvm create -d nvme0n1 -t pblk -n pblk0 -f -b 0 -e 0

I ran several small configurations without problems. Can you share the qemu configuration and version?

I’m traveling until Friday - I’ll get back to you over the weekend.
On Thu, Jan 31, 2019 at 5:33 PM Javier González <javier@javigon.com> wrote:
>
> > On 31 Jan 2019, at 11.41, Hans Holmberg <hans@owltronix.com> wrote:
> >
> > nvme lnvm create -d nvme0n1 -t pblk -n pblk0 -f -b 0 -e 0
>
> I ran several small configurations without problems. Can you share the qemu configuration and version?

Of course!

qemu remote: https://github.com/CNEX-Labs/qemu-nvme.git
branch: master (cb200e3ccf9c1ff21f6275c6cb68b2801135a640)

My geometry:
parallel units: 8, secs per chk: 768, meta size 16, ws min 12, ws opt 24, cunits=0

> I’m traveling until Friday - I’ll get back to you over the weekend.

Oh, no worries. This patch did not introduce the issue, so it's not a regression, but it looks like the same type of hang that the patch addresses. I might have stumbled across another bug.

One thing I noticed was that the dd write block size that triggered the hang is >= the write buffer size (128 entries * 4k).

Safe travels!
> On 31 Jan 2019, at 21.10, Hans Holmberg <hans@owltronix.com> wrote:
>
> Of course!
>
> qemu remote: https://github.com/CNEX-Labs/qemu-nvme.git
> branch: master (cb200e3ccf9c1ff21f6275c6cb68b2801135a640)
>
> My geometry:
> parallel units: 8, secs per chk: 768, meta size 16, ws min 12, ws opt
> 24, cunits=0

You are right, lmw_cunits=0 gives the problem.

What about the following? If it goes through cijoe, I'll send a V3.

diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index d4ca8c64ee0f..a6133b50ed9c 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -45,10 +45,23 @@ void pblk_rb_free(struct pblk_rb *rb)
 /*
  * pblk_rb_calculate_size -- calculate the size of the write buffer
  */
-static unsigned int pblk_rb_calculate_size(unsigned int nr_entries)
+static unsigned int pblk_rb_calculate_size(unsigned int nr_entries,
+					   unsigned int threshold)
 {
-	/* Alloc a write buffer that can at least fit 128 entries */
-	return (1 << max(get_count_order(nr_entries), 7));
+	unsigned int thr_sz = 1 << (get_count_order(threshold + NVM_MAX_VLBA));
+	unsigned int max_sz = max(thr_sz, nr_entries);
+	unsigned int max_io;
+
+	/* Alloc a write buffer that can (i) fit at least two split bios
+	 * (considering max I/O size NVM_MAX_VLBA, and (ii) guarantee that the
+	 * threshold will be respected
+	 */
+	max_io = (1 << max((int)(get_count_order(max_sz)),
+				(int)(get_count_order(NVM_MAX_VLBA << 1))));
+	if ((threshold + NVM_MAX_VLBA) >= max_io)
+		max_io <<= 1;
+
+	return max_io;
 }

 /*
@@ -67,12 +80,12 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
 	unsigned int alloc_order, order, iter;
 	unsigned int nr_entries;

-	nr_entries = pblk_rb_calculate_size(size);
+	nr_entries = pblk_rb_calculate_size(size, threshold);
 	entries = vzalloc(array_size(nr_entries, sizeof(struct pblk_rb_entry)));
 	if (!entries)
 		return -ENOMEM;

-	power_size = get_count_order(size);
+	power_size = get_count_order(nr_entries);
 	power_seg_sz = get_count_order(seg_size);

 	down_write(&pblk_rb_lock);
@@ -149,7 +162,7 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
 	 * Initialize rate-limiter, which controls access to the write buffer
 	 * by user and GC I/O
 	 */
-	pblk_rl_init(&pblk->rl, rb->nr_entries);
+	pblk_rl_init(&pblk->rl, rb->nr_entries, threshold);

 	return 0;
 }
diff --git a/drivers/lightnvm/pblk-rl.c b/drivers/lightnvm/pblk-rl.c
index 76116d5f78e4..b014957dde0b 100644
--- a/drivers/lightnvm/pblk-rl.c
+++ b/drivers/lightnvm/pblk-rl.c
@@ -207,7 +207,7 @@ void pblk_rl_free(struct pblk_rl *rl)
 	del_timer(&rl->u_timer);
 }

-void pblk_rl_init(struct pblk_rl *rl, int budget)
+void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold)
 {
 	struct pblk *pblk = container_of(rl, struct pblk, rl);
 	struct nvm_tgt_dev *dev = pblk->dev;
@@ -217,7 +217,6 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
 	int sec_meta, blk_meta;
 	unsigned int rb_windows;

-
 	/* Consider sectors used for metadata */
 	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
 	blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
@@ -234,7 +233,7 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
 	/* To start with, all buffer is available to user I/O writers */
 	rl->rb_budget = budget;
 	rl->rb_user_max = budget;
-	rl->rb_max_io = budget >> 1;
+	rl->rb_max_io = threshold ? (budget - threshold) : (budget - 1);
 	rl->rb_gc_max = 0;
 	rl->rb_state = PBLK_RL_HIGH;

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 72ae8755764e..a6386d5acd73 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -924,7 +924,7 @@ int pblk_gc_sysfs_force(struct pblk *pblk, int force);
 /*
  * pblk rate limiter
  */
-void pblk_rl_init(struct pblk_rl *rl, int budget);
+void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold);
 void pblk_rl_free(struct pblk_rl *rl);
 void pblk_rl_update_rates(struct pblk_rl *rl);
 int pblk_rl_high_thrs(struct pblk_rl *rl);
--
2.17.1
On Mon, Feb 4, 2019 at 9:14 AM Javier González <javier@javigon.com> wrote:
>
>
>
> > On 31 Jan 2019, at 21.10, Hans Holmberg <hans@owltronix.com> wrote:
> >
> > On Thu, Jan 31, 2019 at 5:33 PM Javier González <javier@javigon.com> wrote:
> >>> On 31 Jan 2019, at 11.41, Hans Holmberg <hans@owltronix.com> wrote:
> >>>
> >>> Hi Javier!
> >>>
> >>> How did you test this? I'm trying to add a test case to our testing framework.
> >>>
> >>> This is what i ran in qemu, and I got a hang (with this version of the patch)
> >>>
> >>> nvme lnvm create -d nvme0n1 -t pblk -n pblk0 -f -b 0 -e 0
> >>
> >> I run several low configurations without problem. Can you share the qemu configuration and version?
> >
> > Of course!
> >
> > qemu remote: https://github.com/CNEX-Labs/qemu-nvme.git
> > branch: master (cb200e3ccf9c1ff21f6275c6cb68b2801135a640)
> >
> > My geometry:
> > parallel units: 8, secs per chk: 768, meta size 16, ws min 12, ws opt
> > 24, cunits=0
>
> You are right, lmw_cunits=0 gives the problem.
>
> What about the following? If it goes through cijoe, I'll send a V3.

Please do send a V3 if it passes your testing.

Thanks!
Hans

>
> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> index d4ca8c64ee0f..a6133b50ed9c 100644
> --- a/drivers/lightnvm/pblk-rb.c
> +++ b/drivers/lightnvm/pblk-rb.c
> @@ -45,10 +45,23 @@ void pblk_rb_free(struct pblk_rb *rb)
>  /*
>   * pblk_rb_calculate_size -- calculate the size of the write buffer
>   */
> -static unsigned int pblk_rb_calculate_size(unsigned int nr_entries)
> +static unsigned int pblk_rb_calculate_size(unsigned int nr_entries,
> +					   unsigned int threshold)
>  {
> -	/* Alloc a write buffer that can at least fit 128 entries */
> -	return (1 << max(get_count_order(nr_entries), 7));
> +	unsigned int thr_sz = 1 << (get_count_order(threshold + NVM_MAX_VLBA));
> +	unsigned int max_sz = max(thr_sz, nr_entries);
> +	unsigned int max_io;
> +
> +	/* Alloc a write buffer that can (i) fit at least two split bios
> +	 * (considering max I/O size NVM_MAX_VLBA, and (ii) guarantee that the
> +	 * threshold will be respected
> +	 */
> +	max_io = (1 << max((int)(get_count_order(max_sz)),
> +				(int)(get_count_order(NVM_MAX_VLBA << 1))));
> +	if ((threshold + NVM_MAX_VLBA) >= max_io)
> +		max_io <<= 1;
> +
> +	return max_io;
>  }
>
>  /*
> @@ -67,12 +80,12 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
>  	unsigned int alloc_order, order, iter;
>  	unsigned int nr_entries;
>
> -	nr_entries = pblk_rb_calculate_size(size);
> +	nr_entries = pblk_rb_calculate_size(size, threshold);
>  	entries = vzalloc(array_size(nr_entries, sizeof(struct pblk_rb_entry)));
>  	if (!entries)
>  		return -ENOMEM;
>
> -	power_size = get_count_order(size);
> +	power_size = get_count_order(nr_entries);
>  	power_seg_sz = get_count_order(seg_size);
>
>  	down_write(&pblk_rb_lock);
> @@ -149,7 +162,7 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
>  	 * Initialize rate-limiter, which controls access to the write buffer
>  	 * by user and GC I/O
>  	 */
> -	pblk_rl_init(&pblk->rl, rb->nr_entries);
> +	pblk_rl_init(&pblk->rl, rb->nr_entries, threshold);
>
>  	return 0;
>  }
> diff --git a/drivers/lightnvm/pblk-rl.c b/drivers/lightnvm/pblk-rl.c
> index 76116d5f78e4..b014957dde0b 100644
> --- a/drivers/lightnvm/pblk-rl.c
> +++ b/drivers/lightnvm/pblk-rl.c
> @@ -207,7 +207,7 @@ void pblk_rl_free(struct pblk_rl *rl)
>  	del_timer(&rl->u_timer);
>  }
>
> -void pblk_rl_init(struct pblk_rl *rl, int budget)
> +void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold)
>  {
>  	struct pblk *pblk = container_of(rl, struct pblk, rl);
>  	struct nvm_tgt_dev *dev = pblk->dev;
> @@ -217,7 +217,6 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
>  	int sec_meta, blk_meta;
>  	unsigned int rb_windows;
>
> -
>  	/* Consider sectors used for metadata */
>  	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
>  	blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
> @@ -234,7 +233,7 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
>  	/* To start with, all buffer is available to user I/O writers */
>  	rl->rb_budget = budget;
>  	rl->rb_user_max = budget;
> -	rl->rb_max_io = budget >> 1;
> +	rl->rb_max_io = threshold ? (budget - threshold) : (budget - 1);
>  	rl->rb_gc_max = 0;
>  	rl->rb_state = PBLK_RL_HIGH;
>
> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> index 72ae8755764e..a6386d5acd73 100644
> --- a/drivers/lightnvm/pblk.h
> +++ b/drivers/lightnvm/pblk.h
> @@ -924,7 +924,7 @@ int pblk_gc_sysfs_force(struct pblk *pblk, int force);
>  /*
>   * pblk rate limiter
>   */
> -void pblk_rl_init(struct pblk_rl *rl, int budget);
> +void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold);
>  void pblk_rl_free(struct pblk_rl *rl);
>  void pblk_rl_update_rates(struct pblk_rl *rl);
>  int pblk_rl_high_thrs(struct pblk_rl *rl);
> --
> 2.17.1
>
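To make the difference between the three rb_max_io expressions in this thread concrete, here is a minimal user-space sketch. The helper names (max_io_orig, max_io_v2, max_io_v3) are hypothetical and exist only for illustration; the arithmetic is copied from the diffs. It assumes that a geometry with cunits=0 results in a threshold of 0, which is what the reported configuration suggests but is not shown in the patch itself.

```c
/* Hypothetical helpers mirroring the rb_max_io expressions in the thread. */

/* Before the patch: half of the write buffer budget. */
static int max_io_orig(int budget)
{
	return budget >> 1;
}

/* V2 of the patch: everything except the backpointer threshold. */
static int max_io_v2(int budget, int threshold)
{
	return budget - threshold;
}

/*
 * Proposal quoted above: same as V2, except that a zero threshold
 * leaves one entry free instead of allowing an I/O as large as the
 * whole buffer.
 */
static int max_io_v3(int budget, int threshold)
{
	return threshold ? (budget - threshold) : (budget - 1);
}
```

With a budget of 128 entries and a threshold of 0, the V2 expression allows a single I/O of 128 entries, that is, the entire buffer, while the proposed expression caps it at 127. For a non-zero threshold the two agree.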
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index d4ca8c64ee0f..a6133b50ed9c 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -45,10 +45,23 @@ void pblk_rb_free(struct pblk_rb *rb)
 /*
  * pblk_rb_calculate_size -- calculate the size of the write buffer
  */
-static unsigned int pblk_rb_calculate_size(unsigned int nr_entries)
+static unsigned int pblk_rb_calculate_size(unsigned int nr_entries,
+					   unsigned int threshold)
 {
-	/* Alloc a write buffer that can at least fit 128 entries */
-	return (1 << max(get_count_order(nr_entries), 7));
+	unsigned int thr_sz = 1 << (get_count_order(threshold + NVM_MAX_VLBA));
+	unsigned int max_sz = max(thr_sz, nr_entries);
+	unsigned int max_io;
+
+	/* Alloc a write buffer that can (i) fit at least two split bios
+	 * (considering max I/O size NVM_MAX_VLBA, and (ii) guarantee that the
+	 * threshold will be respected
+	 */
+	max_io = (1 << max((int)(get_count_order(max_sz)),
+				(int)(get_count_order(NVM_MAX_VLBA << 1))));
+	if ((threshold + NVM_MAX_VLBA) >= max_io)
+		max_io <<= 1;
+
+	return max_io;
 }

 /*
@@ -67,12 +80,12 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
 	unsigned int alloc_order, order, iter;
 	unsigned int nr_entries;

-	nr_entries = pblk_rb_calculate_size(size);
+	nr_entries = pblk_rb_calculate_size(size, threshold);
 	entries = vzalloc(array_size(nr_entries, sizeof(struct pblk_rb_entry)));
 	if (!entries)
 		return -ENOMEM;

-	power_size = get_count_order(size);
+	power_size = get_count_order(nr_entries);
 	power_seg_sz = get_count_order(seg_size);

 	down_write(&pblk_rb_lock);
@@ -149,7 +162,7 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
 	 * Initialize rate-limiter, which controls access to the write buffer
 	 * by user and GC I/O
 	 */
-	pblk_rl_init(&pblk->rl, rb->nr_entries);
+	pblk_rl_init(&pblk->rl, rb->nr_entries, threshold);

 	return 0;
 }
diff --git a/drivers/lightnvm/pblk-rl.c b/drivers/lightnvm/pblk-rl.c
index 76116d5f78e4..e9e0af0df165 100644
--- a/drivers/lightnvm/pblk-rl.c
+++ b/drivers/lightnvm/pblk-rl.c
@@ -207,7 +207,7 @@ void pblk_rl_free(struct pblk_rl *rl)
 	del_timer(&rl->u_timer);
 }

-void pblk_rl_init(struct pblk_rl *rl, int budget)
+void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold)
 {
 	struct pblk *pblk = container_of(rl, struct pblk, rl);
 	struct nvm_tgt_dev *dev = pblk->dev;
@@ -217,7 +217,6 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
 	int sec_meta, blk_meta;
 	unsigned int rb_windows;

-
 	/* Consider sectors used for metadata */
 	sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
 	blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
@@ -234,7 +233,7 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
 	/* To start with, all buffer is available to user I/O writers */
 	rl->rb_budget = budget;
 	rl->rb_user_max = budget;
-	rl->rb_max_io = budget >> 1;
+	rl->rb_max_io = budget - threshold;
 	rl->rb_gc_max = 0;
 	rl->rb_state = PBLK_RL_HIGH;

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 72ae8755764e..a6386d5acd73 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -924,7 +924,7 @@ int pblk_gc_sysfs_force(struct pblk *pblk, int force);
 /*
  * pblk rate limiter
  */
-void pblk_rl_init(struct pblk_rl *rl, int budget);
+void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold);
 void pblk_rl_free(struct pblk_rl *rl);
 void pblk_rl_update_rates(struct pblk_rl *rl);
 int pblk_rl_high_thrs(struct pblk_rl *rl);
In order to respect mw_cunits, pblk's write buffer maintains a
backpointer to protect data not yet persisted; when writing to the write
buffer, this backpointer defines a threshold that pblk's rate-limiter
enforces.

On small PU configurations, the following scenarios might take place: (i)
the threshold is larger than the write buffer and (ii) the threshold is
smaller than the write buffer, but larger than the maximum allowed
split bio - 256KB at this moment (Note that writes are not always
split - we only do this when the size of the buffer is smaller
than the buffer). In both cases, pblk's rate-limiter prevents the I/O from
being written to the buffer, thus stalling.

This patch fixes the original backpointer implementation by considering
the threshold both on buffer creation and on the rate-limiter's path,
when bio_split is triggered (case (ii) above).

Fixes: 766c8ceb16fc ("lightnvm: pblk: guarantee that backpointer is respected on writer stall")
Signed-off-by: Javier González <javier@javigon.com>
---

Changes since V1:
  - Fix bad arithmetic on the rate-limiter max_io calculation (from
    Hans)

 drivers/lightnvm/pblk-rb.c | 25 +++++++++++++++++++------
 drivers/lightnvm/pblk-rl.c |  5 ++---
 drivers/lightnvm/pblk.h    |  2 +-
 3 files changed, 22 insertions(+), 10 deletions(-)
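
The buffer sizing rule described above can be tried out in user space with the sketch below. It is an illustration only, not the kernel code: count_order() is a stand-in for the kernel's get_count_order() (valid for count >= 1), rb_calculate_size() is a hypothetical name for a copy of the patched pblk_rb_calculate_size() arithmetic, and NVM_MAX_VLBA is assumed to be 64, as it was in the lightnvm headers of this period.

```c
/* Assumption: NVM_MAX_VLBA is 64, as in include/linux/lightnvm.h. */
#define NVM_MAX_VLBA 64

/*
 * User-space stand-in for get_count_order(): the smallest order such
 * that (1 << order) >= count. Only meaningful for count >= 1.
 */
static int count_order(unsigned int count)
{
	int order = 0;

	while (order < 31 && (1u << order) < count)
		order++;
	return order;
}

/* Same arithmetic as the patched pblk_rb_calculate_size(). */
static unsigned int rb_calculate_size(unsigned int nr_entries,
				      unsigned int threshold)
{
	/* Power-of-two size that covers threshold plus one max-size I/O. */
	unsigned int thr_sz = 1u << count_order(threshold + NVM_MAX_VLBA);
	unsigned int max_sz = thr_sz > nr_entries ? thr_sz : nr_entries;
	int order = count_order(max_sz);
	int min_order = count_order(NVM_MAX_VLBA << 1);
	unsigned int max_io = 1u << (order > min_order ? order : min_order);

	/* Grow once more if threshold plus one I/O would fill the buffer. */
	if (threshold + NVM_MAX_VLBA >= max_io)
		max_io <<= 1;

	return max_io;
}
```

For instance, rb_calculate_size(96, 192) yields 512: the threshold plus one maximum I/O is exactly 256 entries, so the 256-entry candidate is doubled. With a threshold of 0 and a request for 128 entries the result stays at 128, the same minimum the pre-patch code guaranteed.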