Commit Graph

870769 Commits

Connor O'Brien
1a4353af5d cpufreq: schedutil: fix check for stale utilization values
Part of the fix from commit d86ab9cff8 ("cpufreq: schedutil: use now
as reference when aggregating shared policy requests") is reversed in
commit 05d2ca242067 ("cpufreq: schedutil: Ignore CPU load older than
WALT window size") due to a porting mistake. Restore it while keeping
the relevant change from the latter patch.

Bug: 117438867
Bug: 144961676
Test: build & boot
Change-Id: I21399be760d7c8e2fff6c158368a285dc6261647
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
2022-11-12 11:24:59 +00:00
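
For illustration, a sketch of the stale-utilization check the two commits above disagree on, loosely following the mainline schedutil shared-policy path; the TICK_NSEC threshold and field names are assumptions, not the actual diff:

/*
 * Sketch only: when aggregating a shared cpufreq policy, per-CPU data
 * older than a staleness threshold is ignored, and the reference time
 * must be "now" (the aggregation time), not a per-CPU timestamp.
 */
static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 now)
{
	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
	struct cpufreq_policy *policy = sg_policy->policy;
	unsigned long util = 0, max = 1;
	unsigned int j;

	for_each_cpu(j, policy->cpus) {
		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
		s64 delta_ns = now - j_sg_cpu->last_update;	/* reference is "now" */

		if (delta_ns > TICK_NSEC)	/* utilization is stale, skip this CPU */
			continue;

		if (j_sg_cpu->util * max > util * j_sg_cpu->max) {
			util = j_sg_cpu->util;
			max = j_sg_cpu->max;
		}
	}
	return get_next_freq(sg_policy, util, max);
}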
Miguel de Dios
8399535f1b sched: core: Disable double lock/unlock balance in move_queued_task()
CONFIG_LOCK_STAT shows warnings in move_queued_task() for releasing a
pinned lock. The warnings are due to the calls to
double_unlock_balance() added to snapshot WALT. Let's disable them when
not building with SCHED_WALT.

Bug: 123720375
Bug: 148940637
Change-Id: I8bff8550c4f79ca535556f6ec626f17ff5fce637
Signed-off-by: Miguel de Dios <migueldedios@google.com>
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
2022-11-12 11:24:59 +00:00
Miguel de Dios
96c3f64cd4 sched: fair: Disable double lock/unlock balance in detach_task()
CONFIG_LOCK_STAT shows warnings in detach_task() for releasing a
pinned lock. The warnings are due to the calls to
double_unlock_balance() added to snapshot WALT. Let's disable them when
not building with SCHED_WALT.

Bug: 123720375
Bug: 148940637
Change-Id: Ibfa28b1434fa6006fa0117fd2df1a3eadb321568
Signed-off-by: Miguel de Dios <migueldedios@google.com>
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
2022-11-12 11:24:58 +00:00
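
For both commits above, an illustrative sketch (not the actual diff) of how the extra balance locking is compiled out when WALT is disabled; shown here for detach_task(), with move_queued_task() following the same pattern:

static void detach_task(struct task_struct *p, struct lb_env *env)
{
	lockdep_assert_held(&env->src_rq->lock);

	deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
#ifdef CONFIG_SCHED_WALT
	/* only needed to keep WALT's snapshot consistent across the move;
	 * without WALT it just trips CONFIG_LOCK_STAT pinned-lock warnings */
	double_lock_balance(env->src_rq, env->dst_rq);
#endif
	set_task_cpu(p, env->dst_cpu);
#ifdef CONFIG_SCHED_WALT
	double_unlock_balance(env->src_rq, env->dst_rq);
#endif
}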
Rick Yiu
7de89e1d8b sched/fair: apply sync wake-up to pure CFS path
Since CONFIG_SCHED_WALT is disabled, we need another way to boost
perf as sched_boost does, and skipping EAS has a similar effect. We
use powerhal to handle it. Also apply sync wake-up so that the pure
CFS path (taken when skipping EAS) can benefit from it.

(Combine the following two commits
  2d21560126cb sched/fair: apply sync wake-up to pure CFS path
  9917d5335479 sched/fair: refine check for sync wake-up)

Bug: 119932121
Bug: 117438867
Bug: 144961676
Test: boot to home, operation normal
Change-Id: I970852540839881a926b7e7da5f70ef7e0185349
Signed-off-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
2022-11-12 11:24:58 +00:00
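
A minimal sketch of the sync-wakeup shortcut described above for the non-EAS path; the helper name and the exact conditions are assumptions, not the actual diff:

/* Sketch: when EAS is skipped, still honour the sync hint so the wakee can
 * reuse the waker's CPU if the waker is about to sleep and the CPU fits. */
static int wake_sync_target(struct task_struct *p, int this_cpu, int sync)
{
	if (sync && cpu_rq(this_cpu)->nr_running == 1 &&
	    cpumask_test_cpu(this_cpu, &p->cpus_allowed) &&
	    task_fits_capacity(p, capacity_orig_of(this_cpu)))
		return this_cpu;

	return -1;	/* no shortcut, fall back to the normal CFS selection */
}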
Connor O'Brien
63e23634b7 Revert "sched: fair: Always try to use energy efficient cpu for wakeups"
This reverts commit 63c27502786646271b4c4ba32268b727e294bbb2.

Bug: 117438867
Bug: 144961676
Test: Tracing confirms EAS is no longer always used
Change-Id: If321547a86592527438ac21c3734a9f4decda712
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
2022-11-12 11:24:58 +00:00
Jimmy Shiu
7d9ac85ec1 sched: fair: avoid little cpus due to sync, prev bias
Important threads can get forced to little CPUs
when the sync or prev_bias hints are followed
blindly. This patch adds a check to see whether
those paths are forcing the task onto a CPU that
has less capacity than other CPUs available to
the task. If so, we ignore the sync and prev_bias
hints and allow the scheduler to make a free decision.

Bug: 117438867
Bug: 144961676
Change-Id: Ie5a99f9a8b65ba9382a8d0de2ae0aad843e558d1
Signed-off-by: Miguel de Dios <migueldedios@google.com>
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
2022-11-12 11:24:58 +00:00
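
An illustrative sketch of the capacity check described above (the helper name is hypothetical): the sync/prev_bias hint is only followed when the hinted CPU is no smaller than the best CPU the task may use.

/* Sketch only: refuse sync/prev-bias hints that would force the task onto
 * a CPU with less capacity than other CPUs it is allowed to run on. */
static bool bias_to_this_cpu(struct task_struct *p, int cpu)
{
	unsigned long cap = capacity_orig_of(cpu);
	unsigned long best = 0;
	int i;

	for_each_cpu_and(i, &p->cpus_allowed, cpu_online_mask)
		best = max(best, capacity_orig_of(i));

	return cap >= best;	/* follow the hint only if no capacity is lost */
}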
Srinath Sridharan
d7ab758a70 ANDROID: sched: EAS: take cstate into account when selecting idle core
Introduce a new sysctl for this option, 'sched_cstate_aware'.
When this is enabled, select_idle_sibling in CFS is modified to
choose the idle CPU in the sibling group which has the lowest
idle state index - idle state indexes are assumed to increase
as sleep depth and hence wakeup latency increase. In this way,
we attempt to minimise wakeup latency when an idle CPU is
required.

Signed-off-by: Srinath Sridharan <srinathsr@google.com>

Includes:
sched: EAS: fix select_idle_sibling
when sysctl_sched_cstate_aware is enabled, the best_idle CPU will not be
chosen in the original flow because it jumps to the done label directly

Bug: 30107557
Bug: 144961676
Change-Id: Ie09c2e3960cafbb976f8d472747faefab3b4d6ac
Signed-off-by: martin_liu <martin_liu@htc.com>
Signed-off-by: Andres Oportus <andresoportus@google.com>
[refactored and fixed conflicts]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
2022-11-12 11:24:58 +00:00
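
A sketch of the cstate-aware choice described above, using the idle state's exit latency as a proxy for its index; names are illustrative rather than the actual select_idle_sibling() diff:

/* Sketch: among idle candidates, prefer the CPU in the shallowest idle
 * state, i.e. the one with the cheapest wakeup. */
static int select_shallowest_idle_cpu(struct task_struct *p,
				      const struct cpumask *candidates)
{
	unsigned int min_exit_latency = UINT_MAX;
	int cpu, best_cpu = -1;

	for_each_cpu_and(cpu, candidates, &p->cpus_allowed) {
		struct cpuidle_state *state;

		if (!idle_cpu(cpu))
			continue;

		state = idle_get_state(cpu_rq(cpu));
		if (!state)		/* no idle state recorded: shallowest, take it */
			return cpu;

		if (state->exit_latency < min_exit_latency) {
			min_exit_latency = state->exit_latency;
			best_cpu = cpu;
		}
	}
	return best_cpu;
}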
Rick Yiu
d72455859d sched: fix issue of cpu freq running at max always
All CPUs are running at max freq. The reason is that in the
sugov_up_down_rate_limit() check in cpufreq_schedutil.c, the time passed
in is always 0, so the check is always true and hence the freq will not
be updated. This is caused by sched_ktime_clock() returning 0 when
CONFIG_SCHED_WALT is not set. Fix it by replacing sched_ktime_clock()
with ktime_get_ns().

Bug: 119932718
Test: cpu freq could change after fix
Change-Id: I62a0b35208dcd7a1d23da27f909cce3e59208d1f
Signed-off-by: Rick Yiu <rickyiu@google.com>
2022-11-12 11:24:57 +00:00
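
A hedged sketch of the rate-limit check in question (field names are approximate): with a clock stuck at 0, delta_ns never exceeds the up/down delays, so every frequency change is rejected as "too soon".

/* Sketch only: with sched_ktime_clock() returning 0 under !CONFIG_SCHED_WALT,
 * both "time" and last_freq_update_time stay 0, delta_ns stays 0, and this
 * always reports "rate limited", so the frequency is never updated.
 * ktime_get_ns() restores a monotonically increasing timestamp. */
static bool sugov_up_down_rate_limit(struct sugov_policy *sg_policy, u64 time,
				     unsigned int next_freq)
{
	s64 delta_ns = time - sg_policy->last_freq_update_time;

	if (next_freq > sg_policy->next_freq &&
	    delta_ns < sg_policy->up_rate_delay_ns)
		return true;	/* too soon to ramp up */

	if (next_freq < sg_policy->next_freq &&
	    delta_ns < sg_policy->down_rate_delay_ns)
		return true;	/* too soon to ramp down */

	return false;
}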
Jimmy Shiu
dd13056973 sched/fair: Fix compilation issues for !CONFIG_SCHED_WALT
Fix compilation issues for !CONFIG_SCHED_WALT introduced by the
following two commits:

commit fbd15b6297 ("sched/fair: Avoid force newly idle load balance if have
iowait task")
commit 5250fc5df0 ("sched/fair: Force gold cpus to do idle lb when silver has
big tasks")

Bug: 148766738
Test: build pass and boot to home
Change-Id: Ia2c6ed57d1385a8105bbd7f0aefad6efd7d76c01
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
2022-11-12 11:24:57 +00:00
Wei Wang
06bad29583 drivers: arch_topology: wire up thermal limit for arch_scale_max_freq_capacity
Before the patch, with "echo 50000 > /sys/class/thermal/tz-by-name/sdm-therm/emul_temp":
com.android.uibench.janktests.UiBenchJankTests#testInvalidateTree: PASSED (02m6.247s)
        gfx-avg-slow-ui-thread: 0.07110321338664297
        gfx-avg-missed-vsync: 0.0
        gfx-avg-high-input-latency: 74.25140826299423
        gfx-max-frame-time-50: 12
        gfx-min-total-frames: 2250
        gfx-avg-frame-time-99: 11.8
        gfx-avg-num-frame-deadline-missed: 1.6
        gfx-avg-frame-time-50: 9.6
        gfx-max-high-input-latency: 99.86666666666667
        gfx-avg-frame-time-90: 11.0
        gfx-avg-frame-time-95: 11.0
        gfx-max-frame-time-95: 13
        gfx-max-frame-time-90: 13
        gfx-max-slow-draw: 0.0
        gfx-max-frame-time-99: 13
        gfx-avg-slow-draw: 0.0
        gfx-max-total-frames: 2251
        gfx-avg-jank: 43.678000000000004
        gfx-max-slow-bitmap-uploads: 0.0
        gfx-max-missed-vsync: 0.0
        gfx-avg-total-frames: 2250
        gfx-max-jank: 96.67
        gfx-max-slow-ui-thread: 0.13333333333333333
        gfx-max-num-frame-deadline-missed: 3
        gfx-avg-slow-bitmap-uploads: 0.0

After the patch, with "echo 50000 > /sys/class/thermal/tz-by-name/sdm-therm/emul_temp":
google/perf/jank/UIBench/UIBench (1 Test)
----------------------------------------
[1/1] com.android.uibench.janktests.UiBenchJankTests#testInvalidateTree: PASSED (02m7.027s)
        gfx-avg-slow-ui-thread: 0.0
        gfx-avg-missed-vsync: 0.0
        gfx-avg-high-input-latency: 11.53777777777778
        gfx-max-frame-time-50: 7
        gfx-min-total-frames: 2250
        gfx-avg-frame-time-99: 8.0
        gfx-avg-num-frame-deadline-missed: 0.0
        gfx-avg-frame-time-50: 7.0
        gfx-max-high-input-latency: 41.15555555555556
        gfx-avg-frame-time-90: 7.2
        gfx-avg-frame-time-95: 7.8
        gfx-max-frame-time-95: 8
        gfx-max-frame-time-90: 8
        gfx-max-slow-draw: 0.0
        gfx-max-frame-time-99: 8
        gfx-avg-slow-draw: 0.0
        gfx-max-total-frames: 2250
        gfx-avg-jank: 0.0
        gfx-max-slow-bitmap-uploads: 0.0
        gfx-max-missed-vsync: 0.0
        gfx-avg-total-frames: 2250
        gfx-max-jank: 0.0
        gfx-max-slow-ui-thread: 0.0
        gfx-max-num-frame-deadline-missed: 0
        gfx-avg-slow-bitmap-uploads: 0.0

Bug: 143162654
Test: use emul_temp to change thermal condition and see capacity changed
Change-Id: Idbf943f9c831c288db40d820682583ade3bbf05e
Signed-off-by: Wei Wang <wvw@google.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2022-11-12 11:24:57 +00:00
Wei Wang
5736acecef kernel: sched: fix cpu cpu_capacity_orig being capped incorrectly
update_cpu_capacity() updates cpu_capacity_orig capped with
thermal_cap; in the non-WALT case, thermal_cap is the previous
cpu_capacity_orig. This caused cpu_capacity_orig to be capped
incorrectly.

Test: Build
Bug: 144143594
Change-Id: I1ff9d9c87554c2d2395d46b215276b7ab50585c0
Signed-off-by: Wei Wang <wvw@google.com>
(cherry picked from commit dac65a5a494f8d0c80101acc5d482d94cda6f158)
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2022-11-12 11:24:57 +00:00
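
A condensed sketch of the feedback loop described above (names are approximations, not the actual code): because thermal_cap falls back to the previous cpu_capacity_orig in the non-WALT case, the clamp can only hold or shrink the value.

/* Illustrative only: the incorrect capping described above. */
static void update_cpu_capacity(struct sched_domain *sd, int cpu)
{
	unsigned long capacity = arch_scale_cpu_capacity(sd, cpu);

	/* !CONFIG_SCHED_WALT: thermal_cap() is effectively the previous
	 * cpu_capacity_orig, so capacity can never recover once lowered */
	capacity = min_t(unsigned long, capacity, thermal_cap(cpu));
	cpu_rq(cpu)->cpu_capacity_orig = capacity;
}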
Pavankumar Kondeti
3d4772825c sched/walt: Fix negative count of sched_asym_cpucapacity static key
The current code sets the per-cpu variable sd_asym_cpucapacity while
building sched domains even when there are no asymmetric CPUs.
This is done to make sure that EAS remains enabled on a b.L system
after hotplugging out all big/LITTLE CPUs. However, it causes the
below warning during CPU hotplug.

[13988.932604] pc : static_key_slow_dec_cpuslocked+0xe8/0x150
[13988.932608] lr : static_key_slow_dec_cpuslocked+0xe8/0x150
[13988.932610] sp : ffffffc010333c00
[13988.932612] x29: ffffffc010333c00 x28: ffffff8138d88088
[13988.932615] x27: 0000000000000000 x26: 0000000000000081
[13988.932618] x25: ffffff80917efc80 x24: ffffffc010333c60
[13988.932621] x23: ffffffd32bf09c58 x22: 0000000000000000
[13988.932623] x21: 0000000000000000 x20: ffffff80917efc80
[13988.932626] x19: ffffffd32bf0a3e0 x18: ffffff8138039c38
[13988.932628] x17: ffffffd32bf2b000 x16: 0000000000000050
[13988.932631] x15: 0000000000000050 x14: 0000000000040000
[13988.932633] x13: 0000000000000178 x12: 0000000000000001
[13988.932636] x11: 16a9ca5426841300 x10: 16a9ca5426841300
[13988.932639] x9 : 16a9ca5426841300 x8 : 16a9ca5426841300
[13988.932641] x7 : 0000000000000000 x6 : ffffff813f4edadb
[13988.932643] x5 : 0000000000000000 x4 : 0000000000000004
[13988.932646] x3 : ffffffc010333880 x2 : ffffffd32a683a2c
[13988.932648] x1 : ffffffd329355498 x0 : 000000000000001b
[13988.932651] Call trace:
[13988.932656]  static_key_slow_dec_cpuslocked+0xe8/0x150
[13988.932660]  partition_sched_domains_locked+0x1f8/0x80c
[13988.932666]  sched_cpu_deactivate+0x9c/0x13c
[13988.932670]  cpuhp_invoke_callback+0x6ac/0xa8c
[13988.932675]  cpuhp_thread_fun+0x158/0x1ac
[13988.932678]  smpboot_thread_fn+0x244/0x3e4
[13988.932681]  kthread+0x168/0x178
[13988.932685]  ret_from_fork+0x10/0x18

The mismatch between increment and decrement of the sched_asym_cpucapacity
static key is resulting in the above warning. It is due to
the fact that the increment happens only when the system really
has asymmetric-capacity CPUs. This check is done by going through
each CPU's capacity. So when the system becomes SMP during hotplug,
the increment never happens. However, the decrement of this static
key is done when any of the currently online CPUs has the per-cpu variable
sd_asym_cpucapacity set to a non-NULL value. Since we always set this
variable, we run into this issue.

Our goal was to enable EAS on SMP. To achieve that, enable EAS and
build perf domains (required for computing energy) irrespective
of the per-cpu sd_asym_cpucapacity value. This way we
no longer have to enable the sched_asym_cpucapacity feature on SMP
to enable EAS.

Change-Id: Id46f2b80350b742c75195ad6939b814d4695eb07
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2022-11-12 11:24:57 +00:00
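
Condensed into a sketch (not the actual diff), the imbalance described above looks like this: the increment side keys off detected capacity asymmetry, while the decrement side keys off the always-set per-cpu pointer.

/* domain build: increment only when capacities really differ */
if (has_asym_capacity)				/* false on an SMP system */
	static_branch_inc_cpuslocked(&sched_asym_cpucapacity);

/* per-cpu pointer set unconditionally so EAS stays usable after hotplug */
rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);

/* domain teardown: decrement whenever the pointer was set -- always here,
 * hence the negative count and the warning above */
if (rcu_access_pointer(per_cpu(sd_asym_cpucapacity, cpu)))
	static_branch_dec_cpuslocked(&sched_asym_cpucapacity);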
spakkkk
dd87470fed Revert "sched/fair: Add policy for restricting prefer_spread to newly idle balance"
To fix PELT build

This reverts commit f177186646.
2022-11-12 11:24:56 +00:00
spakkkk
202e7d3077 arm64: config: sched changes
*switch to 300Hz timer frequency
*disable cgroup freezer
*disable fair group sched
*disable cpu isolation
*disable WALT
*disable corectl
*disable cpu-boost
*disable msm-performance
*disable sched_autogroup
2022-11-12 11:24:56 +00:00
spakkkk
b297a634f5 arm64: config: disable SLUB per-CPU partial caches 2022-11-12 11:24:56 +00:00
spakkkk
43101fe0ca arm64: config: enable fq_codel queueing discipline 2022-11-12 11:24:56 +00:00
spakkkk
e5f009e15f arm64: config: disable EL2 vector hardening 2022-11-12 11:24:56 +00:00
spakkkk
04e3beff47 arm64: config: disable more unnecessary errata 2022-11-12 11:24:55 +00:00
spakkkk
431e8beb04 arm64: config: disable per-cgroup pressure tracking
Based on: 87124d3f2d
2022-11-12 11:24:55 +00:00
spakkkk
387b5b7cc4 arm64: config: disable EDAC
de0b90de78
2022-11-12 11:24:55 +00:00
momojuro
d3947c5bfd arm64: config: disable IPA unit test
"The IPA_UT (Unit Test) is onky for debugging purposes and it's
meaningless to keep it enabled for production / daily usage." kholk on Github.
2022-11-12 11:24:55 +00:00
spakkkk
18241de031 arm64: config: disable logo 2022-11-12 11:24:54 +00:00
spakkkk
640aed7a38 arm64: config: disable stack frame size warning 2022-11-12 11:24:54 +00:00
spakkkk
df7c65a9f2 arm64: config: add required configs to pass CTS tests mandatory
Based on: 672529f93d
2022-11-12 11:24:54 +00:00
spakkkk
e5dadd9ea0 arm64: config: disable some drivers
b45a3c963c
f123413842
55dfa3ce88
2022-11-12 11:24:54 +00:00
spakkkk
caef7be1c2 arm64: config: disable stability debug config 2022-11-12 11:24:54 +00:00
Danny Lin
71f41132c1 regulator: core: Set more descriptive device names
We have proper names for regulators, so use them instead of meaningless
numbers generated at boot that make it hard to identify regulators
based on their device names alone.

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2022-11-12 11:24:53 +00:00
Danny Lin
0e4c247c72 Suppress overly verbose log spam
It's hard to see anything that's actually useful with so much verbose
spam in the log buffer.

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2022-11-12 11:24:53 +00:00
Panchajanya1999
67d7e0a712 f2fs/sysfs: Introduce a Read-Only attribute macro
Useful when we need to set a node read-only to avoid Android overriding
the custom-set values.

Change-Id: Iad8cf81504d55b8ed75e6b5563f7cf397595ec1a
Signed-off-by: Panchajanya1999 <rsk52959@gmail.com>
2022-11-12 11:24:53 +00:00
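
A sketch of what such a macro might look like, modelled on the existing F2FS_RW_ATTR/F2FS_ATTR_OFFSET helpers in fs/f2fs/sysfs.c; the F2FS_RO_ATTR name is an assumption:

/* Hypothetical sketch: expose a tunable with mode 0444 and no store hook,
 * so userspace (including Android's init) cannot overwrite a custom value. */
#define F2FS_RO_ATTR(struct_type, struct_name, name, elname)	\
	F2FS_ATTR_OFFSET(struct_type, name, 0444,		\
			 f2fs_sbi_show, NULL,			\
			 offsetof(struct struct_name, elname))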
Panchajanya1999
c6eab8dcb9 f2fs/gc: Reduce GC thread urgent sleep time to 50ms
Android sets the value to 50ms via vold's IdleMaint service, since
500ms is too long for GC to collect all invalid segments in time,
which results in performance degradation.

On an un-encrypted device, vold fails to set this value to 50ms, which
degrades performance over time.

Based on [1].

[1] https://github.com/topjohnwu/Magisk/pull/5462
Signed-off-by: Panchajanya1999 <rsk52959@gmail.com>
Change-Id: I80f2c29558393d726d5e696aaf285096c8108b23
Signed-off-by: Panchajanya1999 <rsk52959@gmail.com>
2022-11-12 11:24:53 +00:00
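
Assuming the default lives in the usual DEF_GC_THREAD_URGENT_SLEEP_TIME constant in fs/f2fs/gc.h, the change would amount to something like:

/* Sketch: reclaim invalid segments promptly even if vold never lowers the
 * value at runtime (e.g. on an un-encrypted device). */
#define DEF_GC_THREAD_URGENT_SLEEP_TIME	50	/* ms, previously 500 */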
Park Ju Hyung
9566261ba1 f2fs: reduce timeout for uncongestion
On high fs utilization, congestion is hit quite frequently and waiting for a
whopping 20ms is too expensive, especially on critical paths.

Reduce it to an amount that is unlikely to affect UI rendering paths.

The new times are as follows:
  100 Hz  => 1 jiffy   (effective: 10 ms)
  250 Hz  => 2 jiffies (effective: 8 ms)
  300 Hz  => 2 jiffies (effective: 6 ms)
  1000 Hz => 6 jiffies (effective: 6 ms)

Co-authored-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: LibXZR <xzr467706992@163.com>
2022-11-12 11:24:53 +00:00
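
The per-HZ table above matches a wait of msecs_to_jiffies(6); a hedged sketch of the shortened back-off (the macro name is an assumption), replacing the previous HZ/50 (20 ms) wait:

/* Sketch: ~6 ms congestion back-off instead of 20 ms. msecs_to_jiffies(6)
 * rounds up to 1 jiffy @100Hz (10 ms), 2 jiffies @250/300Hz (8/6.7 ms) and
 * 6 jiffies @1000Hz (6 ms), matching the table above. */
#define DEFAULT_IO_TIMEOUT	msecs_to_jiffies(6)

	/* usage on a congested path: */
	congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);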
Danny Lin
182804e6fc f2fs: Demote GC thread to idle scheduler class
We don't want the background GC work causing UI jitter should it ever
collide with periods of user activity.

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2022-11-12 11:24:52 +00:00
Park Ju Hyung
78e3f5a4b3 f2fs: set ioprio of GC kthread to idle
GC should run as conservatively as possible to reduce latency spikes for the user.

Setting ioprio to the idle class allows the kernel to schedule the GC thread's
I/O so that it does not affect any other processes' I/O requests.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2022-11-12 11:24:52 +00:00
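
Taken together, the two commits above amount to something like the following sketch (the helper and task-pointer names are assumptions): the GC kthread runs in SCHED_IDLE and issues I/O in the idle ioprio class, so it only makes progress when nothing else wants the CPU or the disk.

#include <linux/sched.h>
#include <linux/ioprio.h>
#include <uapi/linux/sched/types.h>

/* Sketch: demote the f2fs GC kthread for both CPU time and I/O. */
static void f2fs_gc_thread_set_idle(struct task_struct *gc_task)
{
	struct sched_param param = { .sched_priority = 0 };

	/* CPU: only run when the CPU would otherwise be idle */
	sched_setscheduler(gc_task, SCHED_IDLE, &param);

	/* I/O: idle class, serviced only when no other I/O is pending */
	set_task_ioprio(gc_task, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));
}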
Jesse Chan
f64ba4724c f2fs: Enlarge min_fsync_blocks to 20
In OPPO's kernel:
enlarge min_fsync_blocks to optimize performance
  - yanwu@TECH.Storage.FS.oF2FS, 2019/08/12

Huawei is also doing this in their production kernel.

If this optimization is good for them and shipped
with their devices, it should be good for us.

Signed-off-by: Jesse Chan <jc@linux.com>
2022-11-12 11:24:52 +00:00
alk3pInjection
548103b8b3 fs: ext4-f2fs: Remove CAF tracings
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
(cherry picked from commit 7b9cf5ece723a1445ebbd472a1e8063e48a83a02)
Change-Id: I6539321164d96fff98a0180b636456e9cb6e07e4
Signed-off-by: Carlos Ayrton Lopez Arroyo <15030201@itcelaya.edu.mx>
2022-11-12 11:24:52 +00:00
EmanuelCN
c2944a884c arm64: config: enable CONFIG_CRYPTO_LZ4HC 2022-11-12 11:24:52 +00:00
EmanuelCN
6b2081cd80 arm64: config enable EROFS 2022-11-12 11:24:51 +00:00
Hongyu Jin
40afa80d88 erofs: fix use-after-free of on-stack io[]
The root cause is the race as follows:
Thread #1                              Thread #2(irq ctx)

z_erofs_runqueue()
  struct z_erofs_decompressqueue io_A[];
  submit bio A
  z_erofs_decompress_kickoff(,,1)
                                       z_erofs_decompressqueue_endio(bio A)
                                       z_erofs_decompress_kickoff(,,-1)
                                       spin_lock_irqsave()
                                       atomic_add_return()
  io_wait_event()	-> pending_bios is already 0
  [end of function]
                                       wake_up_locked(io_A[]) // crash

Referenced backtrace in kernel 5.4:

[   10.129422] Unable to handle kernel paging request at virtual address eb0454a4
[   10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G        WC O      5.4.147-ab09225 #1
[   11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48)
[   11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0)
[   11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c)
[   11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0)
[   11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc)
[   11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c)
[   11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc)
[   11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c)
[   11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138)
[   11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0)
[   11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4)
[   11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0)

Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220401115527.4935-1-hongyu.jin.cn@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Change-Id: Id1df2670db486f487917393c0c9f76c3efc814b1
2022-11-12 11:24:51 +00:00
Yue Hu
4b2b49f86b erofs: remove the fast path of per-CPU buffer decompression
As Xiang mentioned, such a path has no real impact on our current
decompression strategy, so remove it directly. Also, update the return
value of z_erofs_lz4_decompress() to 0 on success to keep it consistent
with LZMA, which also returns 0 in that case.

Link: https://lore.kernel.org/r/20211014065744.1787-1-zbestahu@gmail.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Change-Id: I56883674be2477e1ae462554188f2c8ab44d6d70
2022-11-12 11:24:51 +00:00
Yue Hu
a99a739399 erofs: clear compacted_2b if compacted_4b_initial > totalidx
Currently, the whole indexes will only be compacted 4B if
compacted_4b_initial > totalidx. So, the calculated compacted_2b
is worthless for that case. It may waste CPU resources.

No need to update compacted_4b_initial as mkfs since it's used to
fulfill the alignment of the 1st compacted_2b pack and would handle
the case above.

We also need to clarify compacted_4b_end here. It's used for the
last lclusters which aren't fitted in the previous compacted_2b
packs.

Some messages are from Xiang.

Link: https://lore.kernel.org/r/20210914035915.1190-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[ Gao Xiang: it's enough to use "compacted_4b_initial < totalidx". ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Change-Id: Id6f41153033aa1a8b8bc69328064d5902f4bd21f
2022-11-12 11:24:51 +00:00
Yue Hu
f96c4f125c erofs: remove the mapping parameter from erofs_try_to_free_cached_page()
The mapping is not used at all, remove it and update related code.

Link: https://lore.kernel.org/r/20210810072416.1392-1-zbestahu@gmail.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Change-Id: Iec9e654528b36a579b18fbd20677e76fc733ff6e
2022-11-12 11:24:51 +00:00
Yue Hu
3fa8ac111c erofs: directly use wrapper erofs_page_is_managed() when shrinking
We already have the wrapper function to identify managed page.

Link: https://lore.kernel.org/r/20210810065450.1320-1-zbestahu@gmail.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Change-Id: Id68356b5fc0cd6f4da2b53c2e56f9075253d624d
2022-11-12 11:24:50 +00:00
Gao Xiang
027bb6d69b erofs: fix 1 lcluster-sized pcluster for big pcluster
If the 1st NONHEAD lcluster of a pcluster isn't a CBLKCNT lcluster type
but a HEAD or PLAIN type instead, its pclustersize
_must_ be 1 lcluster (since its uncompressed size < 2 lclusters),
as illustrated below:

       HEAD     HEAD / PLAIN    lcluster type
   ____________ ____________
  |_:__________|_________:__|   file data (uncompressed)
   .                .
  .____________.
  |____________|                pcluster data (compressed)

Such an on-disk case was explained before [1] but was missed and not
handled properly in the runtime implementation.

It can be observed by manually generating a 1 lcluster-sized pcluster
with 2 lclusters (thus CBLKCNT doesn't exist). Let's fix it now.

[1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org

Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org
Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <xiang@kernel.org>
Change-Id: I8a11d6d0c883a7222767e5c48f5da41c22698784
2022-11-12 11:24:50 +00:00
Gao Xiang
c94dab0246 erofs: enable big pcluster feature
Enable COMPR_CFGS and BIG_PCLUSTER since the implementations are
all settled properly.

Link: https://lore.kernel.org/r/20210407043927.10623-11-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: I3e12261cf03e62bad4d7287d5827f3309a45d2a6
2022-11-12 11:24:50 +00:00
Gao Xiang
6247dd5bfb erofs: support decompress big pcluster for lz4 backend
Prior to big pcluster, there was only one compressed page so it was
easy to map it directly. However, when big pcluster is enabled, more work
needs to be done to handle multiple compressed pages. In detail,

 - (maptype 0) if there is only one compressed page + no need
   to copy inplace I/O, just map it directly what we did before;

 - (maptype 1) if there are more compressed pages + no need to
   copy inplace I/O, vmap such compressed pages instead;

 - (maptype 2) if inplace I/O needs to be copied, use per-CPU
   buffers for decompression then.

Another thing is how to detect whether inplace decompression is feasible
(it's still quite easy for non-big pclusters): apart from the
inplace margin calculation, the inplace I/O page reuse order also
needs to be considered for each compressed page. Currently, if the
compressed page is the xth page, it shouldn't be reused as [0 ...
nrpages_out - nrpages_in + x], otherwise a full copy will be triggered.

Although there are some extra optimization ideas for this, I'd like
to make big pcluster work correctly first; obviously it can be
further optimized later since it has nothing to do with the on-disk
format at all.

Link: https://lore.kernel.org/r/20210407043927.10623-10-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: Icf073fd978b9a7f659651540e478a67d14450a09
2022-11-12 11:24:50 +00:00
Gao Xiang
7caadd02df erofs: support parsing big pcluster compact indexes
Different from non-compact indexes, several lclusters are packed
as the compact form at once and a unique base blkaddr is stored for
each pack, so each lcluster index would take less space on average
(e.g. 2 bytes for COMPACT_2B). Btw, that is also why the BIG_PCLUSTER
switch should be consistent for compact head0/1.

Prior to big pcluster, the size of all pclusters was 1 lcluster.
Therefore, when a new HEAD lcluster was scanned, blkaddr would be
bumped by 1 lcluster. However, that way doesn't work anymore for
big pcluster since we actually don't know the compressed size of
pclusters in advance (before reading CBLKCNT lcluster).

So, instead, let blkaddr of each pack be the first pcluster blkaddr
with a valid CBLKCNT, in detail,

 1) if CBLKCNT starts at the pack, this first valid pcluster is
    itself, e.g.
  _____________________________________________________________
 |_CBLKCNT0_|_NONHEAD_| .. |_HEAD_|_CBLKCNT1_| ... |_HEAD_| ...
 ^ = blkaddr base          ^ += CBLKCNT0           ^ += CBLKCNT1

 2) if CBLKCNT doesn't start at the pack, the first valid pcluster
    is the next pcluster, e.g.
  _________________________________________________________
 | NONHEAD_| .. |_HEAD_|_CBLKCNT0_| ... |_HEAD_|_HEAD_| ...
                ^ = blkaddr base        ^ += CBLKCNT0
                                               ^ += 1

When a CBLKCNT is found, blkaddr will be increased by CBLKCNT
lclusters, or a new HEAD is found immediately, bump blkaddr by 1
instead (see the picture above.)

Also note that if CBLKCNT is at the end of the pack, instead of storing
delta1 (distance of the next HEAD lcluster) as normal NONHEADs do,
it still uses the compressed block count (delta0) since delta1
can be calculated indirectly but the block count can't.

Adjust decoding logic to fit big pcluster compact indexes as well.

Link: https://lore.kernel.org/r/20210407043927.10623-9-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: Ia9dc0c8bf03b630edc4dfcedf61ef0632e1fcb05
2022-11-12 11:24:49 +00:00
Gao Xiang
05658e6d14 erofs: support parsing big pcluster compress indexes
When INCOMPAT_BIG_PCLUSTER sb feature is enabled, legacy compress indexes
will also have the same on-disk header compact indexes to keep per-file
configurations instead of leaving it zeroed.

If ADVISE_BIG_PCLUSTER is set for a file, CBLKCNT will be loaded for each
pcluster in this file by parsing 1st non-head lcluster.

Link: https://lore.kernel.org/r/20210407043927.10623-8-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: I316bad6965e2cfa97e6fb1fb548dda8bc363b3b1
2022-11-12 11:24:49 +00:00
Gao Xiang
3a2d5cdb17 erofs: adjust per-CPU buffers according to max_pclusterblks
Adjust per-CPU buffers on demand since the big pcluster definition is
now available. Also, bail out for unsupported pcluster sizes according to
Z_EROFS_PCLUSTER_MAX_SIZE.

Link: https://lore.kernel.org/r/20210407043927.10623-7-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: I5679ca306856206222bba50d78985390252e6b1f
2022-11-12 11:24:49 +00:00
Gao Xiang
d0d8fc4059 erofs: add big physical cluster definition
Big pcluster indicates the size of compressed data for each physical
pcluster is no longer fixed as block size, but could be more than 1
block (more accurately, 1 logical pcluster).

When big pcluster feature is enabled for head0/1, delta0 of the 1st
non-head lcluster index will keep block count of this pcluster in
lcluster size instead of 1. Or, the compressed size of pcluster
should be 1 lcluster if pcluster has no non-head lcluster index.

Also note that BIG_PCLUSTER feature reuses COMPR_CFGS feature since
it depends on COMPR_CFGS and will be released together.

Link: https://lore.kernel.org/r/20210407043927.10623-6-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: I00c6ff3b9197f0d4c3b19698d4f1fb1899773734
2022-11-12 11:24:49 +00:00
Gao Xiang
fd0b2e82e9 erofs: fix up inplace I/O pointer for big pcluster
When picking up inplace I/O pages, they should be traversed in reverse
order, aligned with the traversal order of file-backed online pages.
Also, the index should be updated together when preloading compressed pages.

Previously, only page-sized pclustersize was supported so no problem
at all. Also rename `compressedpages' to `icpage_ptr' to reflect its
functionality.

Link: https://lore.kernel.org/r/20210407043927.10623-5-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: Iefdad4fe34757134e4ae06dd4dcc16ada0a7880c
2022-11-12 11:24:49 +00:00