Commit Graph

870599 Commits

Author SHA1 Message Date
Lukasz Luba
8addd3815e BACKPORT: trace: events: add devfreq trace event file
The patch adds a new file with trace events for the devfreq
framework. They are used for performance analysis of the framework.
It also updates the MAINTAINERS file, adding a new entry for the
devfreq maintainers.
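
The file boils down to standard TRACE_EVENT() boilerplate; the sketch
below is an illustrative reconstruction of the kind of event it defines
(the devfreq_monitor name and fields are shown for illustration, not as
the verbatim header, and the usual TRACE_SYSTEM/TRACE_HEADER_MULTI_READ
guards are omitted):

  /* include/trace/events/devfreq.h -- illustrative sketch only */
  #include <linux/devfreq.h>
  #include <linux/tracepoint.h>

  TRACE_EVENT(devfreq_monitor,
          TP_PROTO(struct devfreq *devfreq),
          TP_ARGS(devfreq),

          TP_STRUCT__entry(
                  __field(unsigned long, freq)
                  __field(unsigned int, polling_ms)
                  __string(dev_name, dev_name(&devfreq->dev))
          ),

          TP_fast_assign(
                  __entry->freq = devfreq->previous_freq;
                  __entry->polling_ms = devfreq->profile->polling_ms;
                  __assign_str(dev_name, dev_name(&devfreq->dev));
          ),

          TP_printk("dev_name=%s freq=%lu polling_ms=%u",
                    __get_str(dev_name), __entry->freq, __entry->polling_ms)
  );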

Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: Id252a3809cd0d210e3af027141b3ff7572fbfcc7
2022-11-12 11:24:24 +00:00
Saravana Kannan
6d9b5bae1b BACKPORT: PM / devfreq: Restart previous governor if new governor fails to start
If the new governor fails to start, switch back to the old governor so
that the devfreq state is not left in limbo.

[Myungjoo: assume fatal on revert failure and set df->governor to NULL]
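
A rough sketch of the fallback logic (simplified, not the exact hunk;
prev_governor stands in for however the previous governor is remembered):

  ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
  if (ret) {
          dev_warn(df->dev.parent, "%s: Governor %s failed to start (%d)\n",
                   __func__, df->governor->name, ret);
          /* Fall back to the governor that was running before */
          df->governor = prev_governor;
          if (df->governor &&
              df->governor->event_handler(df, DEVFREQ_GOV_START, NULL)) {
                  /* Assume a failed revert is fatal */
                  dev_err(df->dev.parent, "%s: reverting to %s failed\n",
                          __func__, df->governor->name);
                  df->governor = NULL;
          }
  }
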
Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: If9c35dca5d07fbfff6de236e7b2bfb5fd299abc7
2022-11-12 11:24:24 +00:00
MyungJoo Ham
23b4a59cdb BACKPORT: PM / devfreq: consistent indentation
Following up on complaints from Yangtao Li about inconsistent
indentation, this fixes the indentation inconsistency.

In principle, this aligns arguments to the left, including the first
argument, except where the first argument already sits on the
far-right side.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: Ib337712266c82f8bf70c3cbd0ed61f40d300db73
2022-11-12 11:24:24 +00:00
UtsavBalar1231
44aae90745 Revert "devfreq: add support to handle device suspend state"
This reverts commit aef86b2ecc.

Change-Id: Idbbe114721d5bc305a00fb37cf200f4123126756
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:24 +00:00
Yangtao Li
ff7a51f4c0 BACKPORT: PM / devfreq: fix missing check of return value in devfreq_add_device()
devm_kzalloc() can fail, so insert a check of its return value and
return -ENOMEM if it does.
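
In other words, the usual allocation-failure pattern; the names below
(buf, dev, size, err_devfreq) are placeholders, not the real hunk:

  buf = devm_kzalloc(dev, size, GFP_KERNEL);
  if (!buf) {
          err = -ENOMEM;
          goto err_devfreq;       /* unwind via the existing error path */
  }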

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I8d305a8929b41854b4145dc9e04fa01e31b2af20
2022-11-12 11:24:23 +00:00
Yangtao Li
44e98834ca BACKPORT: PM / devfreq: fix mem leak in devfreq_add_device()
'devfreq' is allocated in devfreq_add_device() and should be freed in
the error handling paths; otherwise it causes a memory leak.
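
The error paths now release the allocation, along these lines
(do_setup() is a stand-in for the real initialization steps):

  devfreq = kzalloc(sizeof(*devfreq), GFP_KERNEL);
  if (!devfreq)
          return ERR_PTR(-ENOMEM);

  err = do_setup(devfreq);
  if (err) {
          kfree(devfreq);         /* previously leaked on this error path */
          return ERR_PTR(err);
  }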

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I2ce1c43a550abb1e31eb3024222a3be7e8d67fe7
2022-11-12 11:24:23 +00:00
Lukasz Luba
0ab5e88438 BACKPORT: PM / devfreq: add devfreq_suspend/resume() functions
This patch adds an implementation of global suspend/resume for the
devfreq framework. System suspend will use these functions next.

Suggested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Suggested-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I58de77b6d59f2508e254ccbeec49c0be82dd3c8e
2022-11-12 11:24:23 +00:00
Lukasz Luba
d83b387685 BACKPORT: PM / devfreq: add support for suspend/resume of a devfreq device
The patch prepares the devfreq device for handling suspend/resume
functionality. The new fields store the information needed during this
process. The devfreq framework handles the opp-suspend DT entry, so no
modifications to driver code are needed. It uses atomic variables to
make sure no race condition affects the process.

Suggested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Suggested-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I853e345c565593d7af3a8efc0c7c60514f9007f3
2022-11-12 11:24:23 +00:00
Lukasz Luba
2682fbed13 BACKPORT: PM / devfreq: refactor set_target frequency function
The refactoring is needed for the new client in devfreq: suspend.
To avoid code duplication, move the frequency-setting code into a new
local function, devfreq_set_target.
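
A rough shape of the helper (sketch only; the real function also emits
the frequency-change notifications that update_devfreq() used to do
inline), which update_devfreq() and the new suspend path both call:

  static int devfreq_set_target(struct devfreq *devfreq,
                                unsigned long new_freq, u32 flags)
  {
          int err;

          err = devfreq->profile->target(devfreq->dev.parent,
                                         &new_freq, flags);
          if (err)
                  return err;

          devfreq->previous_freq = new_freq;
          return 0;
  }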

Suggested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Suggested-by: Chanwoo Choi <cw00.choi@samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I735f1f380a84d5151a6b22b4824c307db28f2a82
2022-11-12 11:24:23 +00:00
zhong jiang
4cffe60f39 BACKPORT: PM / devfreq: remove redundant null pointer check before kfree
kfree() already handles a NULL pointer, hence it is safe to remove
the redundant NULL pointer check before kfree().
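
i.e., a pattern like the following, with 'ptr' standing in for the
actual pointer:

  if (ptr)
          kfree(ptr);

becomes simply:

  kfree(ptr);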

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I5d8568549fe93f131c390742fcdfcf3d290c65f4
2022-11-12 11:24:22 +00:00
Bjorn Andersson
db569d7a58 BACKPORT: PM / devfreq: Drop custom MIN/MAX macros
Drop the custom MIN/MAX macros in favour of the standard min/max from
kernel.h
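
A sketch of the change (the macro bodies shown are illustrative of the
private helpers that used to live in devfreq.c):

  /* before: private helpers in devfreq.c */
  #define MAX(a,b)        ((a > b) ? a : b)
  #define MIN(a,b)        ((a < b) ? a : b)

  /* after: the type-checked min()/max() from <linux/kernel.h> */
  freq = max(min(freq, max_freq), min_freq);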

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I60ef1488ea87b47ef05c817052dc86b4695c6ffd
2022-11-12 11:24:22 +00:00
UtsavBalar1231
05533a7780 Revert "ANDROID: GKI: PM / devfreq: Introduce a sysfs lock"
This reverts commit 8f43993c58.

Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:22 +00:00
UtsavBalar1231
bb333f080d Revert "ANDROID: GKI: PM / devfreq: Fix race condition between suspend/resume and governor_store"
This reverts commit 902ad8fa08.

Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:22 +00:00
UtsavBalar1231
02f382ca53 Revert "ANDROID: GKI: PM/devfreq: Do not switch governors from sysfs when device is suspended"
This reverts commit 4f9183cc24.

Change-Id: Ie18b2aac4f191b789b3ffd69a862a07b677273a5
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:22 +00:00
UtsavBalar1231
e181da7d89 Revert "ANDROID: GKI: PM / devfreq: Allow min freq to be 0"
This reverts commit ace5c22c16.

Change-Id: Ifec1d4a7e5e4851181ca0cd46453513b65313260
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:21 +00:00
UtsavBalar1231
c165ac0a47 Revert "FROMLIST: PM / devfreq: Restart previous governor if new governor fails to start"
This reverts commit a2038b4794.

Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:21 +00:00
Stephen Dickey
de090849fd BACKPORT: devfreq: memlat: track cpu during ipi to cluster
To aid debugging of lockups in perf_event_read_value(), track the CPU
being IPI'd.

Change-Id: Ia948f31bb2d91bca6144c0c50f8f66bd9c1459fe
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:21 +00:00
Sultan Alsawaf
bbe528de02 of: Keep the phandle cache around after boot
Phandle lookups still occur frequently after boot (like in the regulator
subsystem), and they can be quite expensive if the device tree is
complex. Lookups disable IRQs and have been observed to take over a
millisecond on a mobile arm64 device, which is very bad for system
latency.

Keep the phandle cache around after boot to retain O(1) lookup times.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:21 +00:00
Sultan Alsawaf
3f9503e8f6 thread_info: Order thread flag tests with respect to flag mutations
Currently, thread flag tests are unordered with respect to flag changes,
which results in thread flag changes not becoming immediately visible to
other CPUs. On a weakly-ordered CPU, this is most noticeable with the
TIF_NEED_RESCHED flag and optimistic lock spinners, where the preemptoff
tracer shows an optimistic lock spinner will often exhaust its scheduling
quantum despite checking TIF_NEED_RESCHED on every loop iteration. This
leads to scheduling delays and latency spikes, especially when disabling
preemption is involved, as is the case for optimistic lock spinning.

Making the thread flag helpers ordered with respect to test operations
resolves the issue seen in the preemptoff tracer. Now, optimistic lock
spinners bail out in a timely manner, and other TIF_NEED_RESCHED users
will benefit similarly.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:20 +00:00
Sultan Alsawaf
b75887b8a8 mm: Don't hog the CPU and zone lock in rmqueue_bulk()
There is noticeable scheduling latency and heavy zone lock contention
stemming from rmqueue_bulk's single hold of the zone lock while doing
its work, as seen with the preemptoff tracer. There's no actual need for
rmqueue_bulk() to hold the zone lock the entire time; it only does so
for supposed efficiency. As such, we can relax the zone lock and even
reschedule when IRQs are enabled in order to keep the scheduling delays
and zone lock contention at bay. Forward progress is still guaranteed,
as the zone lock can only be relaxed after page removal.

With this change, rmqueue_bulk() no longer appears as a serious offender
in the preemptoff tracer, and system latency is noticeably improved.
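
The general shape of the change is to drop and re-take the zone lock
between page removals, roughly like this (illustrative, not the exact
hunk; the real loop also keeps its page checks and accounting):

  for (i = 0; i < count; ++i) {
          struct page *page = __rmqueue(zone, order, migratetype);

          if (unlikely(!page))
                  break;

          list_add_tail(&page->lru, list);

          /* The page is already off the free list, so the zone lock
           * can be relaxed before grabbing the next one. */
          spin_unlock(&zone->lock);
          if (!irqs_disabled())
                  cond_resched();
          spin_lock(&zone->lock);
  }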

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:20 +00:00
Sultan Alsawaf
370fefb53f mm: Lower the non-hugetlbpage pageblock size to reduce scheduling delays
The page allocator processes free pages in groups of pageblocks, where
the size of a pageblock is typically quite large (1024 pages without
hugetlbpage support). Pageblocks are processed atomically with the zone
lock held, which can cause severe scheduling delays on both the CPU
going through the pageblock and any other CPUs waiting to acquire the
zone lock. A frequent offender is move_freepages_block(), which is used
by rmqueue() for page allocation.

As it turns out, there's no requirement for pageblocks to be so large,
so the pageblock order can simply be reduced to ease the scheduling
delays and zone lock contention. PAGE_ALLOC_COSTLY_ORDER is used as a
reasonable setting to ensure non-costly page allocation requests can
still be serviced without always needing to free up more than one
pageblock's worth of pages at a time.

This has a noticeable effect on overall system latency when memory
pressure is elevated. The various mm functions which operate on
pageblocks no longer appear in the preemptoff tracer, where previously
they would spend up to 100 ms on a mobile arm64 CPU processing a
pageblock with preemption disabled and the zone lock held.
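
In concrete terms, for the non-hugetlbpage case the pageblock order
drops from MAX_ORDER-1 (2^10 = 1024 pages with the default MAX_ORDER of
11) to PAGE_ALLOC_COSTLY_ORDER (2^3 = 8 pages); a sketch of the change
in include/linux/pageblock-flags.h:

  /* !CONFIG_HUGETLB_PAGE case */

  /* before */
  #define pageblock_order         (MAX_ORDER-1)

  /* after */
  #define pageblock_order         PAGE_ALLOC_COSTLY_ORDER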

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:20 +00:00
Sultan Alsawaf
3e0eb86439 perf/core: Fix risky smp_processor_id() usage in perf_event_read_local()
There's no requirement that perf_event_read_local() be used from a
context where CPU migration isn't possible, yet smp_processor_id() is
used with the assumption that the caller guarantees CPU migration can't
occur. Since IRQs are disabled here anyway, the smp_processor_id() can
simply be moved to the IRQ-disabled section to guarantee its safety.
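
i.e., the CPU check moves inside the IRQ-disabled region, roughly
(sketch, not the exact hunk):

  local_irq_save(flags);

  /* smp_processor_id() is now safe: with IRQs off, no migration occurs */
  if (event->oncpu != smp_processor_id()) {
          ret = -EINVAL;
          goto out;
  }

  /* ... read the counter ... */
  out:
  local_irq_restore(flags);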

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:20 +00:00
Sultan Alsawaf
e06cdf4ce5 ion: Fix partial cache maintenance operations
The partial cache maintenance helpers check the number of segments in
each mapping before checking if the mapping is actually in use, which
sometimes results in spurious errors being returned to vidc. The errors
then cause vidc to malfunction, even though nothing's wrong.

The reason for checking the segment count first was to elide map_rwsem;
however, it turns out that map_rwsem isn't needed anyway, so we can have
our cake and eat it too.

Fix the spurious segment count errors by reordering the checks, and
remove map_rwsem entirely so we don't have to worry about eliding it for
performance reasons.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:20 +00:00
Sultan Alsawaf
7efe3414b7 qos: Change cpus_affine to not be atomic
There isn't a need for cpus_affine to be atomic, and reading/writing to
it outside of the global pm_qos lock is racy anyway. As such, we can
simply turn it into a primitive integer type.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:19 +00:00
Sultan Alsawaf
803faf2d23 qos: Speed up plist traversal in pm_qos_set_value_for_cpus()
The plist is already sorted and traversed in ascending order of PM QoS
value, so we can simply look at the lowest PM QoS values which affect
the given request's CPUs until we've looked at all of them, at which
point the traversal can be stopped early. This also lets us get rid of
the pesky qos_val array.
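
Since the plist is kept sorted in ascending order, the first time a CPU
is seen its value is already that CPU's effective target, so the walk
can bail out early; a rough sketch (req, c, seen, target and new_req are
illustrative names, not the exact code):

  plist_for_each_entry(req, &c->list, node) {
          for_each_cpu(cpu, &req->cpus_affine)
                  if (!cpumask_test_and_set_cpu(cpu, &seen))
                          target[cpu] = req->node.prio;

          /* Every CPU affected by the updated request is resolved */
          if (cpumask_subset(&new_req->cpus_affine, &seen))
                  break;
  }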

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:19 +00:00
Sultan Alsawaf
5d000fb40c qos: Fix PM QoS requests almost never shutting off
Andrzej Perczak discovered that his CPUs would almost never enter an
idle state deeper than C0, and pinpointed the cause of the issue to be
commit "qos: Speed up pm_qos_set_value_for_cpus()". As it turns out, the
optimizations introduced in that commit contain two issues that are
responsible for this behavior: pm_qos_remove_request() fails to refresh
the affected per-CPU targets, and IRQ migrations fail to refresh their
old affinity's targets.

Removing a request fails to refresh the per-CPU targets because
`new_req->node.prio` isn't updated to the PM QoS class' default value
upon removal, and so it contains its old value from when it was active.
This causes the `changed` loop in pm_qos_set_value_for_cpus() to check
against a stale PM QoS request value and erroneously determine that the
request in question doesn't alter the current per-CPU targets.

As for IRQ migrations, only the new CPU affinity mask gets updated,
which causes the CPUs present in the old affinity mask but not the new
one to retain their targets, specifically when a migration occurs while
the associated PM QoS request is active.

To fix these issues while retaining optimal speed, update PM QoS
requests' CPU affinity inside pm_qos_set_value_for_cpus() so that the
old affinity can be known, and skip the `changed` loop when the request
in question is being removed.

Reported-by: Andrzej Perczak <kartapolska@gmail.com>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:19 +00:00
Sultan Alsawaf
40106c0f1f cpuidle: lpm-levels: Only cancel the bias timer when it's used
The bias timer is only started when WFI is used, so we only need to
try to cancel it after leaving WFI.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:19 +00:00
Sultan Alsawaf
cf3a9c131c sched/core: Always panic when scheduling in atomic context
Scheduling in atomic context is indicative of a serious problem that,
although it may not be immediately lethal, can lead to strange issues
and eventually a panic. We should therefore panic the first time it's
detected.
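
Conceptually the change sits in __schedule_bug(); instead of the
one-time warning, the first detection halts the machine (sketch only):

  static noinline void __schedule_bug(struct task_struct *prev)
  {
          /* sketch: halt immediately instead of warning and limping on */
          panic("BUG: scheduling while atomic: %s/%d/0x%08x\n",
                prev->comm, prev->pid, preempt_count());
  }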

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:19 +00:00
Sultan Alsawaf
57cc3cbe1e ion: Restore ION_IOC_HEAP_QUERY ioctl command
It turns out that the ION_IOC_HEAP_QUERY command is actually used in
some camera-related components in Android 11, such as libultradepth_api,
and in libdmabufheap in Android 12. The omission of this command causes
these components to break when their ioctl attempt returns -ENOTTY.

Restore the ION_IOC_HEAP_QUERY command to fix the incompatibility.

Unfortunately, libdmabufheap uses heap names in order to look up heap
IDs so that the calling userspace code can maintain a constant heap name
and cope with inconsistent heap IDs. For example, if some user code
wants to allocate from the system heap, it only has to specify "system"
as the desired heap name, and it doesn't need to keep track of the
system heap ID.

This is unfortunate because now we must copy heap name strings to
userspace. In order to speed this up, a pre-allocated array, which is
statically allocated to accommodate the maximum number of heaps, is
populated with heap data as heaps are created. When a heap query command
requests heap data, all we have to do is copy the big array of pre-made
data, and we're done.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:18 +00:00
Sultan Alsawaf
5fb388c1c1 ion: Further optimize ioctl handler
We can omit the _IOC_SIZE() check and also inline copy_from_user() by
duplicating copy_from_user() for each ioctl command and giving it a
constant size. Since there aren't many ioctls here, this doesn't turn
the code into spaghetti.

We can further optimize the prefetch ioctls as well by omitting one word
of data from the copy_from_user(), since the first member of `struct
ion_prefetch_data` (the `len` field) is unused. As proof of this, rename
`len` to `unused` in the uapi header, which also ensures that the
compiler will notify us if this ever changes in the future. This is
necessary because the prefetch data is used outside of ion.c, where we
cannot easily audit its usage.

There's no reduction done for the allocation ioctl because we could only
reduce the copy_from_user() payload by a half word, which will result in
a payload size that isn't a multiple of a word. The copy_from_user()
implementation on arm64 will go slower as a result, so just leave it
untouched.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:18 +00:00
Sultan Alsawaf
5e17b2f335 ion: Remove unneeded rwsem for the heap priority list
Heaps are never removed, and there is only one ion_device_add_heap()
user: msm_ion_probe(). This single user calls ion_device_add_heap()
sequentially, not concurrently. Furthermore, all heap additions are done
once kernel init is complete, and heaps are only accessed by userspace,
so no locking is needed at all here.

The write lock in ion_walk_heaps() doesn't make sense either since the
heap-walking functions neither mutate a heap's placement in the plist,
nor change a heap in a way that requires pausing all buffer allocations.
The functions used in the heap walking routine handle synchronization
themselves, so there's no need for the mutex-style locking here. This
write lock appears to be a historical artifact from the following 2013
commit (present in msm-3.4 trees) where a justification for the write
lock was never given: 7c1b8aa23ef ("gpu: ion: Add support for heap
walking").

Since the heap plist rwsem appears to be thoroughly useless, we can
safely remove it to reduce complexity and improve performance.

Also, change the name of ion_device_add_heap() to ion_add_heap() so the
compiler can notify us if ion_device_add_heap() is used elsewhere in the
future.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:18 +00:00
celtare21
5ddd0c1940 sched/core: Fix rq clock warning in sched_migrate_to_cpumask_end()
The following warning occurs because we don't update the runqueue's
clock when taking rq->lock in sched_migrate_to_cpumask_end():

rq->clock_update_flags < RQCF_ACT_SKIP
WARNING: CPU: 0 PID: 991 at update_curr+0x1c8/0x2bc
[...]
Call trace:
update_curr+0x1c8/0x2bc
dequeue_task_fair+0x7c/0x1238
do_set_cpus_allowed+0x64/0x28c
sched_migrate_to_cpumask_end+0xa8/0x1b4
m_stop+0x40/0x78
seq_read+0x39c/0x4ac
__vfs_read+0x44/0x12c
vfs_read+0xf0/0x1d8
SyS_read+0x6c/0xcc
el0_svc_naked+0x34/0x38

Fix it by adding an update_rq_clock() call when taking rq->lock.
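
Roughly (sketch of the locking sequence, not the exact hunk):

  struct rq_flags rf;
  struct rq *rq;

  rq = task_rq_lock(current, &rf);
  update_rq_clock(rq);    /* added: refresh rq->clock before dequeue/enqueue */
  do_set_cpus_allowed(current, old_mask);
  task_rq_unlock(rq, current, &rf);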

Signed-off-by: celtare21 <celtare21@gmail.com>
2022-11-12 11:24:18 +00:00
Sultan Alsawaf
adaa599abb msm: kgsl: Affine kgsl_3d0_irq and worker kthread to the big CPU cluster
These are in the critical path for rendering frames to the display, so
mark them as performance-critical and affine them to the big CPU
cluster. They aren't placed onto the prime cluster because the
single-CPU prime cluster will be used to run the DRM IRQ and kthreads.
DRM is more latency-critical than KGSL and we need to have DRM and KGSL
running on separate CPUs for the best performance, so KGSL gets the big
cluster.

Note that since there are other IRQs requested via kgsl_request_irq(),
we must specify that the IRQ to be made perf-critical is kgsl_3d0_irq.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:18 +00:00
Sultan Alsawaf
94f3e31c1b Revert "msm: kgsl: Affine IRQ and worker kthread to the big CPU cluster"
This reverts commit 417bded5a942a2a23ad65b3fe5fd3fff2d0dbf5b.

This is wrong. This causes 3 IRQs to be affined to the big CPU cluster,
not just the primary kgsl_3d0_irq one. As a result, the perf crit API
thinks that the 2 extra IRQs are critical and will balance them despite
them being rarely used (kgsl_hfi_irq and kgsl_gmu_irq).

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:17 +00:00
Sultan Alsawaf
686d26f283 sched/fair: Compile out NUMA code entirely when NUMA is disabled
Scheduler code is very hot and every little optimization counts. Instead
of constantly checking sched_numa_balancing when NUMA is disabled,
compile it out.
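
The mechanical form of the change is letting the preprocessor discard
the check entirely, e.g. (sketch of one call site, not the exact hunk):

  #ifdef CONFIG_NUMA_BALANCING
          if (static_branch_unlikely(&sched_numa_balancing))
                  task_tick_numa(rq, curr);
  #endif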

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:17 +00:00
Sultan Alsawaf
e23d4ea590 mm: Perform PID map reads on the little CPU cluster
PID map reads for processes with thousands of mappings can be done
extensively by certain Android apps, burning through CPU time on
higher-performance CPUs even though reading PID maps is never a
performance-critical task. We can relieve the load on the important CPUs
by moving PID map reads to little CPUs via sched_migrate_to_cpumask_*().

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: dreamisbaka <jolinux.g@gmail.com>
2022-11-12 11:24:17 +00:00
Sultan Alsawaf
c2c3304ca2 sched: Add API to migrate the current process to a given cpumask
There are some chunks of code in the kernel running in process context
where it may be helpful to run the code on a specific set of CPUs, such
as when reading some CPU-intensive procfs files. This is especially
useful when the code in question must run within the context of the
current process (so kthreads cannot be used).

Add an API to make this possible, which consists of the following:
sched_migrate_to_cpumask_start():
 @old_mask: pointer to output the current task's old cpumask
 @dest: pointer to a cpumask the current task should be moved to

sched_migrate_to_cpumask_end():
 @old_mask: pointer to the old cpumask generated earlier
 @dest: pointer to the dest cpumask provided earlier
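
A typical call site then looks roughly like this (cpu_lp_mask, the
little-cluster mask from the Qualcomm tree, is assumed here):

  cpumask_t old_mask;

  sched_migrate_to_cpumask_start(&old_mask, cpu_lp_mask);

  /*
   * CPU-intensive work that must run in the context of the current
   * process, now confined to the chosen CPUs.
   */

  sched_migrate_to_cpumask_end(&old_mask, cpu_lp_mask);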

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: dreamisbaka <jolinux.g@gmail.com>
2022-11-12 11:24:17 +00:00
Sultan Alsawaf
d516115a24 mm: Micro-optimize PID map reads for arm64 while retaining output format
Android and various applications in Android need to read PID map data in
order to work. Some processes can contain over 10,000 mappings, which
results in lots of time wasted on simply generating strings. This wasted
time adds up, especially in the case of Unity-based games, which utilize
the Boehm garbage collector. A game's main process typically has well
over 10,000 mappings due to the loaded textures, and the Boehm GC reads
PID maps several times a second. This results in over 100,000 map
entries being printed out per second, so micro-optimization here is
important. Before this commit, show_vma_header_prefix() would typically
take around 1000 ns to run on a Snapdragon 855; now it only takes about
50 ns to run, which is a 20x improvement.

The primary micro-optimizations here assume that there are no more than
40 bits in the virtual address space, hence the CONFIG_ARM64_VA_BITS
check. Arm64 uses a virtual address size of 39 bits, so this perfectly
covers it.

This also removes padding used to beautify PID map output to further
speed up reads and reduce the amount of bytes printed, and optimizes the
dentry path retrieval for file-backed mappings. Note, however, that the
trailing space at the end of the line for non-file-backed mappings
cannot be omitted, as it breaks some PID map parsers.

This still retains insignificant leading zeros from printed hex values
to maintain the current output format.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: LibXZR <xzr467706992@163.com>
2022-11-12 11:24:17 +00:00
Sultan Alsawaf
2d1025e96a msm: msm_bus: Don't enable QoS clocks when none are present
There's no point in enabling QoS clocks when there aren't any for
certain clients.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:16 +00:00
Sultan Alsawaf
f3df72d95e msm: kgsl: Use lock-less list for page pools
Page pool additions and removals are very hot during GPU workloads, so
they should be optimized accordingly. We can use a lock-less list for
storing the free pages in order to speed things up. The lock-less list
allows for one llist_del_first() user and unlimited llist_add() users to
run concurrently, so only a spin lock around the llist_del_first() is
needed; everything else is lock-free. The per-pool page count is now an
atomic to make it lock-free as well.
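
The resulting add/remove paths look roughly like this (struct and field
names are illustrative, not the actual kgsl definitions):

  struct kgsl_page_pool {
          struct llist_head page_list;    /* lock-less list of free pages */
          atomic_t page_count;
          spinlock_t list_lock;           /* only serializes llist_del_first() */
  };

  /* add: fully lock-free, any number of concurrent callers */
  llist_add(node, &pool->page_list);
  atomic_inc(&pool->page_count);

  /* remove: only the single-consumer llist_del_first() needs the lock */
  spin_lock(&pool->list_lock);
  node = llist_del_first(&pool->page_list);
  spin_unlock(&pool->list_lock);
  if (node)
          atomic_dec(&pool->page_count);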

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: LibXZR <xzr467706992@163.com>
2022-11-12 11:24:16 +00:00
Sultan Alsawaf
da748e58a9 drm/msm/sde: Don't clear dim layers when there aren't any applied
Clearing dim layers indiscriminately for each blend stage on each commit
wastes a lot of CPU time since the clearing process is heavy on register
accesses. We can optimize this by only clearing dim layers when they're
actually set, and only clearing them on a per-stage basis at that. This
reduces display commit latency considerably.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:16 +00:00
Sultan Alsawaf
03af212bd8 msm: kgsl: Don't busy wait for fenced GMU writes when possible
The most frequent user of fenced GMU writes, adreno_ringbuffer_submit(),
performs a fenced GMU write under a spin lock, and since fenced GMU
writes use udelay(), a lot of CPU cycles are burned here. Not only is
the spin lock held for longer than necessary (because the write doesn't
need to be inside the spin lock), but also a lot of CPU time is wasted
in udelay() for tens of microseconds when usleep_range() can be used
instead.

Move the locked fenced GMU writes to outside their spin locks and make
adreno_gmu_fenced_write() use usleep_range() when not in atomic/IRQ
context, to save power and improve performance. Fenced GMU writes are
found to take an average of 28 microseconds on the Snapdragon 855, so a
usleep range of 10 to 30 microseconds is optimal.
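
i.e., the wait loop picks its delay primitive based on context, roughly
(the context test and delay constant shown are illustrative; the actual
patch may thread an "atomic" flag through instead):

  if (in_atomic() || irqs_disabled())
          udelay(GMU_FENCE_DELAY_US);     /* must busy-wait here */
  else
          usleep_range(10, 30);           /* ~28 us typical on the Snapdragon 855 */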

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:16 +00:00
Sultan Alsawaf
0871f5ceea msm: kgsl: Remove unneeded time profiling from ringbuffer submission
The time profiling here is only used to provide additional debug info
for a context dump as well as a tracepoint. It adds non-trivial overhead
to ringbuffer submission since it accesses GPU registers, so remove it
along with the tracepoint since we're not debugging adreno.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:15 +00:00
Sultan Alsawaf
a8052f1777 scsi: ufs: Add simple IRQ-affined PM QoS operations
Qualcomm's PM QoS solution suffers from a number of issues: applying
PM QoS to all CPUs, convoluted spaghetti code that wastes CPU cycles,
and keeping PM QoS applied for 10 ms after all requests finish
processing.

This implements a simple IRQ-affined PM QoS mechanism for each UFS
adapter which uses atomics to elide locking, and enqueues a worker to
apply PM QoS to the target CPU as soon as a command request is issued.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
2022-11-12 11:24:15 +00:00
Panchajanya1999
67322ffa74 drivers/char: adsprpc: Remove Qcom's PM_QoS implementation
Qualcomm's QoS implementation wastes significant power on CPU cycles.
Scrap the QoS bits and save a bit of power without hurting any
functionality.

Change-Id: I1de3563d9c99ba863f10a90a900d290bdd8e6b79
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: Carlos Ayrton Lopez Arroyo <15030201@itcelaya.edu.mx>
2022-11-12 11:24:15 +00:00
Sultan Alsawaf
d89f76b07e scsi: ufs: Scrap Qualcomm's PM QoS implementation
This implementation is completely over the top and wastes lots of CPU
cycles. It's too convoluted to fix, so just scrap it to make way for a
simpler solution. This purges every PM QoS reference in the UFS drivers.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
2022-11-12 11:24:15 +00:00
Sultan Alsawaf
c0b8dc4d0a qos: Remove pm_qos_update_request_timeout() API
Using a timeout for a PM QoS request can lead to disastrous results on
power consumption. It's always possible to find a fixed scope in which a
PM QoS request should be applied, so timeouts aren't ever strictly
needed; they're usually just a lazy way of using PM QoS. Remove the API
so that it cannot be abused any longer.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:15 +00:00
Sultan Alsawaf
28c9e970a8 msm: kgsl: Remove L2PC PM QoS feature
KGSL already has PM QoS covering what matters. The L2PC PM QoS code is
not only unneeded, but also unused, so remove it. It's poorly designed
anyway since it uses a timeout with PM QoS, which is drastically bad for
power consumption.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:14 +00:00
Sultan Alsawaf
85bc91328d drm/msm/sde: Don't read and clear VBIF errors upon commit
Reading and clearing any errors from the VBIF error registers takes a
significant amount of time during kickoff, and is only used to produce
debug logs when errors are detected. Since we're not debugging hardware
issues in MDSS, remove the VBIF error clearing entirely to reduce
display rendering latency.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:14 +00:00
Sultan Alsawaf
952dfb1b3f drm/msm/sde: Remove redundant write memory barriers from IRQ routines
Explicit write memory barriers are unneeded here since releasing a lock
already implies a full memory barrier.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:14 +00:00