Commit Graph

870599 Commits

Author SHA1 Message Date
Lukasz Luba
8addd3815e BACKPORT: trace: events: add devfreq trace event file
The patch adds a new file with trace events for the devfreq
framework. They are used for performance analysis of the framework.
It also updates the MAINTAINERS file, adding a new entry for the
devfreq maintainers.
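
The file boils down to standard TRACE_EVENT() boilerplate; the sketch
below is an illustrative reconstruction of the kind of event it defines
(the devfreq_monitor name and fields are shown for illustration, not as
the verbatim header, and the usual TRACE_SYSTEM/TRACE_HEADER_MULTI_READ
guards are omitted):

  /* include/trace/events/devfreq.h -- illustrative sketch only */
  #include <linux/devfreq.h>
  #include <linux/tracepoint.h>

  TRACE_EVENT(devfreq_monitor,
          TP_PROTO(struct devfreq *devfreq),
          TP_ARGS(devfreq),

          TP_STRUCT__entry(
                  __field(unsigned long, freq)
                  __field(unsigned int, polling_ms)
                  __string(dev_name, dev_name(&devfreq->dev))
          ),

          TP_fast_assign(
                  __entry->freq = devfreq->previous_freq;
                  __entry->polling_ms = devfreq->profile->polling_ms;
                  __assign_str(dev_name, dev_name(&devfreq->dev));
          ),

          TP_printk("dev_name=%s freq=%lu polling_ms=%u",
                    __get_str(dev_name), __entry->freq, __entry->polling_ms)
  );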

Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: Id252a3809cd0d210e3af027141b3ff7572fbfcc7
2022-11-12 11:24:24 +00:00
Saravana Kannan
6d9b5bae1b BACKPORT: PM / devfreq: Restart previous governor if new governor fails to start
If the new governor fails to start, switch back to the old governor so
that the devfreq state is not left in limbo.

[Myungjoo: assume fatal on revert failure and set df->governor to NULL]
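
A rough sketch of the fallback logic (simplified, not the exact hunk;
prev_governor stands in for however the previous governor is remembered):

  ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
  if (ret) {
          dev_warn(df->dev.parent, "%s: Governor %s failed to start (%d)\n",
                   __func__, df->governor->name, ret);
          /* Fall back to the governor that was running before */
          df->governor = prev_governor;
          if (df->governor &&
              df->governor->event_handler(df, DEVFREQ_GOV_START, NULL)) {
                  /* Assume a failed revert is fatal */
                  dev_err(df->dev.parent, "%s: reverting to %s failed\n",
                          __func__, df->governor->name);
                  df->governor = NULL;
          }
  }
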
Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: If9c35dca5d07fbfff6de236e7b2bfb5fd299abc7
2022-11-12 11:24:24 +00:00
MyungJoo Ham
23b4a59cdb BACKPORT: PM / devfreq: consistent indentation
Following up on complaints from Yangtao Li about inconsistent
indentation, this fixes the indentation inconsistency.

In principle, this aligns arguments to the left, including the first
argument, except where the first argument already sits on the
far-right side.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: Ib337712266c82f8bf70c3cbd0ed61f40d300db73
2022-11-12 11:24:24 +00:00
UtsavBalar1231
44aae90745 Revert "devfreq: add support to handle device suspend state"
This reverts commit aef86b2ecc.

Change-Id: Idbbe114721d5bc305a00fb37cf200f4123126756
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:24 +00:00
Yangtao Li
ff7a51f4c0 BACKPORT: PM / devfreq: fix missing check of return value in devfreq_add_device()
devm_kzalloc() can fail, so insert a check of its return value and
return -ENOMEM if it does.
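
In other words, the usual allocation-failure pattern; the names below
(buf, dev, size, err_devfreq) are placeholders, not the real hunk:

  buf = devm_kzalloc(dev, size, GFP_KERNEL);
  if (!buf) {
          err = -ENOMEM;
          goto err_devfreq;       /* unwind via the existing error path */
  }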

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I8d305a8929b41854b4145dc9e04fa01e31b2af20
2022-11-12 11:24:23 +00:00
Yangtao Li
44e98834ca BACKPORT: PM / devfreq: fix mem leak in devfreq_add_device()
'devfreq' is allocated in devfreq_add_device() and should be freed in
the error handling paths; otherwise it causes a memory leak.
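
The error paths now release the allocation, along these lines
(do_setup() is a stand-in for the real initialization steps):

  devfreq = kzalloc(sizeof(*devfreq), GFP_KERNEL);
  if (!devfreq)
          return ERR_PTR(-ENOMEM);

  err = do_setup(devfreq);
  if (err) {
          kfree(devfreq);         /* previously leaked on this error path */
          return ERR_PTR(err);
  }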

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I2ce1c43a550abb1e31eb3024222a3be7e8d67fe7
2022-11-12 11:24:23 +00:00
Lukasz Luba
0ab5e88438 BACKPORT: PM / devfreq: add devfreq_suspend/resume() functions
This patch adds an implementation of global suspend/resume for the
devfreq framework. System suspend will use these functions next.

Suggested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Suggested-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I58de77b6d59f2508e254ccbeec49c0be82dd3c8e
2022-11-12 11:24:23 +00:00
Lukasz Luba
d83b387685 BACKPORT: PM / devfreq: add support for suspend/resume of a devfreq device
The patch prepares the devfreq device for handling suspend/resume
functionality. The new fields store the information needed during this
process. The devfreq framework handles the opp-suspend DT entry, so no
modifications to driver code are needed. It uses atomic variables to
make sure no race condition affects the process.

Suggested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Suggested-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I853e345c565593d7af3a8efc0c7c60514f9007f3
2022-11-12 11:24:23 +00:00
Lukasz Luba
2682fbed13 BACKPORT: PM / devfreq: refactor set_target frequency function
The refactoring is needed for the new client in devfreq: suspend.
To avoid code duplication, move the frequency-setting code into a new
local function, devfreq_set_target.
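
A rough shape of the helper (sketch only; the real function also emits
the frequency-change notifications that update_devfreq() used to do
inline), which update_devfreq() and the new suspend path both call:

  static int devfreq_set_target(struct devfreq *devfreq,
                                unsigned long new_freq, u32 flags)
  {
          int err;

          err = devfreq->profile->target(devfreq->dev.parent,
                                         &new_freq, flags);
          if (err)
                  return err;

          devfreq->previous_freq = new_freq;
          return 0;
  }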

Suggested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Suggested-by: Chanwoo Choi <cw00.choi@samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I735f1f380a84d5151a6b22b4824c307db28f2a82
2022-11-12 11:24:23 +00:00
zhong jiang
4cffe60f39 BACKPORT: PM / devfreq: remove redundant null pointer check before kfree
kfree() already handles a NULL pointer, hence it is safe to remove
the redundant NULL pointer check before kfree().
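
i.e., a pattern like the following, with 'ptr' standing in for the
actual pointer:

  if (ptr)
          kfree(ptr);

becomes simply:

  kfree(ptr);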

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I5d8568549fe93f131c390742fcdfcf3d290c65f4
2022-11-12 11:24:22 +00:00
Bjorn Andersson
db569d7a58 BACKPORT: PM / devfreq: Drop custom MIN/MAX macros
Drop the custom MIN/MAX macros in favour of the standard min/max from
kernel.h
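
A sketch of the change (the macro bodies shown are illustrative of the
private helpers that used to live in devfreq.c):

  /* before: private helpers in devfreq.c */
  #define MAX(a,b)        ((a > b) ? a : b)
  #define MIN(a,b)        ((a < b) ? a : b)

  /* after: the type-checked min()/max() from <linux/kernel.h> */
  freq = max(min(freq, max_freq), min_freq);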

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Change-Id: I60ef1488ea87b47ef05c817052dc86b4695c6ffd
2022-11-12 11:24:22 +00:00
UtsavBalar1231
05533a7780 Revert "ANDROID: GKI: PM / devfreq: Introduce a sysfs lock"
This reverts commit 8f43993c58.

Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:22 +00:00
UtsavBalar1231
bb333f080d Revert "ANDROID: GKI: PM / devfreq: Fix race condition between suspend/resume and governor_store"
This reverts commit 902ad8fa08.

Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:22 +00:00
UtsavBalar1231
02f382ca53 Revert "ANDROID: GKI: PM/devfreq: Do not switch governors from sysfs when device is suspended"
This reverts commit 4f9183cc24.

Change-Id: Ie18b2aac4f191b789b3ffd69a862a07b677273a5
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:22 +00:00
UtsavBalar1231
e181da7d89 Revert "ANDROID: GKI: PM / devfreq: Allow min freq to be 0"
This reverts commit ace5c22c16.

Change-Id: Ifec1d4a7e5e4851181ca0cd46453513b65313260
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:21 +00:00
UtsavBalar1231
c165ac0a47 Revert "FROMLIST: PM / devfreq: Restart previous governor if new governor fails to start"
This reverts commit a2038b4794.

Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:21 +00:00
Stephen Dickey
de090849fd BACKPORT: devfreq: memlat: track cpu during ipi to cluster
To aid debugging of lockups in perf_event_read_value(), track the CPU
being IPI'd.

Change-Id: Ia948f31bb2d91bca6144c0c50f8f66bd9c1459fe
Signed-off-by: Stephen Dickey <dickey@codeaurora.org>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:21 +00:00
Sultan Alsawaf
bbe528de02 of: Keep the phandle cache around after boot
Phandle lookups still occur frequently after boot (like in the regulator
subsystem), and they can be quite expensive if the device tree is
complex. Lookups disable IRQs and have been observed to take over a
millisecond on a mobile arm64 device, which is very bad for system
latency.

Keep the phandle cache around after boot to retain O(1) lookup times.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:21 +00:00
Sultan Alsawaf
3f9503e8f6 thread_info: Order thread flag tests with respect to flag mutations
Currently, thread flag tests are unordered with respect to flag changes,
which results in thread flag changes not becoming immediately visible to
other CPUs. On a weakly-ordered CPU, this is most noticeable with the
TIF_NEED_RESCHED flag and optimistic lock spinners, where the preemptoff
tracer shows an optimistic lock spinner will often exhaust its scheduling
quantum despite checking TIF_NEED_RESCHED on every loop iteration. This
leads to scheduling delays and latency spikes, especially when disabling
preemption is involved, as is the case for optimistic lock spinning.

Making the thread flag helpers ordered with respect to test operations
resolves the issue seen in the preemptoff tracer. Now, optimistic lock
spinners bail out in a timely manner, and other TIF_NEED_RESCHED users
will benefit similarly.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:20 +00:00
Sultan Alsawaf
b75887b8a8 mm: Don't hog the CPU and zone lock in rmqueue_bulk()
There is noticeable scheduling latency and heavy zone lock contention
stemming from rmqueue_bulk's single hold of the zone lock while doing
its work, as seen with the preemptoff tracer. There's no actual need for
rmqueue_bulk() to hold the zone lock the entire time; it only does so
for supposed efficiency. As such, we can relax the zone lock and even
reschedule when IRQs are enabled in order to keep the scheduling delays
and zone lock contention at bay. Forward progress is still guaranteed,
as the zone lock can only be relaxed after page removal.

With this change, rmqueue_bulk() no longer appears as a serious offender
in the preemptoff tracer, and system latency is noticeably improved.
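
The general shape of the change is to drop and re-take the zone lock
between page removals, roughly like this (illustrative, not the exact
hunk; the real loop also keeps its page checks and accounting):

  for (i = 0; i < count; ++i) {
          struct page *page = __rmqueue(zone, order, migratetype);

          if (unlikely(!page))
                  break;

          list_add_tail(&page->lru, list);

          /* The page is already off the free list, so the zone lock
           * can be relaxed before grabbing the next one. */
          spin_unlock(&zone->lock);
          if (!irqs_disabled())
                  cond_resched();
          spin_lock(&zone->lock);
  }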

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:20 +00:00
Sultan Alsawaf
370fefb53f mm: Lower the non-hugetlbpage pageblock size to reduce scheduling delays
The page allocator processes free pages in groups of pageblocks, where
the size of a pageblock is typically quite large (1024 pages without
hugetlbpage support). Pageblocks are processed atomically with the zone
lock held, which can cause severe scheduling delays on both the CPU
going through the pageblock and any other CPUs waiting to acquire the
zone lock. A frequent offender is move_freepages_block(), which is used
by rmqueue() for page allocation.

As it turns out, there's no requirement for pageblocks to be so large,
so the pageblock order can simply be reduced to ease the scheduling
delays and zone lock contention. PAGE_ALLOC_COSTLY_ORDER is used as a
reasonable setting to ensure non-costly page allocation requests can
still be serviced without always needing to free up more than one
pageblock's worth of pages at a time.

This has a noticeable effect on overall system latency when memory
pressure is elevated. The various mm functions which operate on
pageblocks no longer appear in the preemptoff tracer, where previously
they would spend up to 100 ms on a mobile arm64 CPU processing a
pageblock with preemption disabled and the zone lock held.
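
In concrete terms, for the non-hugetlbpage case the pageblock order
drops from MAX_ORDER-1 (2^10 = 1024 pages with the default MAX_ORDER of
11) to PAGE_ALLOC_COSTLY_ORDER (2^3 = 8 pages); a sketch of the change
in include/linux/pageblock-flags.h:

  /* !CONFIG_HUGETLB_PAGE case */

  /* before */
  #define pageblock_order         (MAX_ORDER-1)

  /* after */
  #define pageblock_order         PAGE_ALLOC_COSTLY_ORDER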

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:20 +00:00
Sultan Alsawaf
3e0eb86439 perf/core: Fix risky smp_processor_id() usage in perf_event_read_local()
There's no requirement that perf_event_read_local() be used from a
context where CPU migration isn't possible, yet smp_processor_id() is
used with the assumption that the caller guarantees CPU migration can't
occur. Since IRQs are disabled here anyway, the smp_processor_id() can
simply be moved to the IRQ-disabled section to guarantee its safety.
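
i.e., the CPU check moves inside the IRQ-disabled region, roughly
(sketch, not the exact hunk):

  local_irq_save(flags);

  /* smp_processor_id() is now safe: with IRQs off, no migration occurs */
  if (event->oncpu != smp_processor_id()) {
          ret = -EINVAL;
          goto out;
  }

  /* ... read the counter ... */
  out:
  local_irq_restore(flags);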

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:20 +00:00
Sultan Alsawaf
e06cdf4ce5 ion: Fix partial cache maintenance operations
The partial cache maintenance helpers check the number of segments in
each mapping before checking if the mapping is actually in use, which
sometimes results in spurious errors being returned to vidc. The errors
then cause vidc to malfunction, even though nothing's wrong.

The reason for checking the segment count first was to elide map_rwsem;
however, it turns out that map_rwsem isn't needed anyway, so we can have
our cake and eat it too.

Fix the spurious segment count errors by reordering the checks, and
remove map_rwsem entirely so we don't have to worry about eliding it for
performance reasons.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:20 +00:00
Sultan Alsawaf
7efe3414b7 qos: Change cpus_affine to not be atomic
There isn't a need for cpus_affine to be atomic, and reading/writing to
it outside of the global pm_qos lock is racy anyway. As such, we can
simply turn it into a primitive integer type.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:19 +00:00
Sultan Alsawaf
803faf2d23 qos: Speed up plist traversal in pm_qos_set_value_for_cpus()
The plist is already sorted and traversed in ascending order of PM QoS
value, so we can simply look at the lowest PM QoS values which affect
the given request's CPUs until we've looked at all of them, at which
point the traversal can be stopped early. This also lets us get rid of
the pesky qos_val array.
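
Since the plist is kept sorted in ascending order, the first time a CPU
is seen its value is already that CPU's effective target, so the walk
can bail out early; a rough sketch (req, c, seen, target and new_req are
illustrative names, not the exact code):

  plist_for_each_entry(req, &c->list, node) {
          for_each_cpu(cpu, &req->cpus_affine)
                  if (!cpumask_test_and_set_cpu(cpu, &seen))
                          target[cpu] = req->node.prio;

          /* Every CPU affected by the updated request is resolved */
          if (cpumask_subset(&new_req->cpus_affine, &seen))
                  break;
  }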

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:19 +00:00
Sultan Alsawaf
5d000fb40c qos: Fix PM QoS requests almost never shutting off
Andrzej Perczak discovered that his CPUs would almost never enter an
idle state deeper than C0, and pinpointed the cause of the issue to be
commit "qos: Speed up pm_qos_set_value_for_cpus()". As it turns out, the
optimizations introduced in that commit contain two issues that are
responsible for this behavior: pm_qos_remove_request() fails to refresh
the affected per-CPU targets, and IRQ migrations fail to refresh their
old affinity's targets.

Removing a request fails to refresh the per-CPU targets because
`new_req->node.prio` isn't updated to the PM QoS class' default value
upon removal, and so it contains its old value from when it was active.
This causes the `changed` loop in pm_qos_set_value_for_cpus() to check
against a stale PM QoS request value and erroneously determine that the
request in question doesn't alter the current per-CPU targets.

As for IRQ migrations, only the new CPU affinity mask gets updated,
which causes the CPUs present in the old affinity mask but not the new
one to retain their targets, specifically when a migration occurs while
the associated PM QoS request is active.

To fix these issues while retaining optimal speed, update PM QoS
requests' CPU affinity inside pm_qos_set_value_for_cpus() so that the
old affinity can be known, and skip the `changed` loop when the request
in question is being removed.

Reported-by: Andrzej Perczak <kartapolska@gmail.com>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:19 +00:00
Sultan Alsawaf
40106c0f1f cpuidle: lpm-levels: Only cancel the bias timer when it's used
The bias timer is only started when WFI is used, so we only need to
try to cancel it after leaving WFI.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:19 +00:00
Sultan Alsawaf
cf3a9c131c sched/core: Always panic when scheduling in atomic context
Scheduling in atomic context is indicative of a serious problem that,
although it may not be immediately lethal, can lead to strange issues
and eventually a panic. We should therefore panic the first time it's
detected.
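
Conceptually the change sits in __schedule_bug(); instead of the
one-time warning, the first detection halts the machine (sketch only):

  static noinline void __schedule_bug(struct task_struct *prev)
  {
          /* sketch: halt immediately instead of warning and limping on */
          panic("BUG: scheduling while atomic: %s/%d/0x%08x\n",
                prev->comm, prev->pid, preempt_count());
  }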

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:19 +00:00
Sultan Alsawaf
57cc3cbe1e ion: Restore ION_IOC_HEAP_QUERY ioctl command
It turns out that the ION_IOC_HEAP_QUERY command is actually used in
some camera-related components in Android 11, such as libultradepth_api,
and in libdmabufheap in Android 12. The omission of this command causes
these components to break when their ioctl attempt returns -ENOTTY.

Restore the ION_IOC_HEAP_QUERY command to fix the incompatibility.

Unfortunately, libdmabufheap uses heap names in order to look up heap
IDs so that the calling userspace code can maintain a constant heap name
and cope with inconsistent heap IDs. For example, if some user code
wants to allocate from the system heap, it only has to specify "system"
as the desired heap name, and it doesn't need to keep track of the
system heap ID.

This is unfortunate because now we must copy heap name strings to
userspace. In order to speed this up, a pre-allocated array, which is
statically allocated to accommodate the maximum number of heaps, is
populated with heap data as heaps are created. When a heap query command
requests heap data, all we have to do is copy the big array of pre-made
data, and we're done.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:18 +00:00
Sultan Alsawaf
5fb388c1c1 ion: Further optimize ioctl handler
We can omit the _IOC_SIZE() check and also inline copy_from_user() by
duplicating copy_from_user() for each ioctl command and giving it a
constant size. Since there aren't many ioctls here, this doesn't turn
the code into spaghetti.

We can further optimize the prefetch ioctls as well by omitting one word
of data from the copy_from_user(), since the first member of `struct
ion_prefetch_data` (the `len` field) is unused. As proof of this, rename
`len` to `unused` in the uapi header, which also ensures that the
compiler will notify us if this ever changes in the future. This is
necessary because the prefetch data is used outside of ion.c, where we
cannot easily audit its usage.

There's no reduction done for the allocation ioctl because we could only
reduce the copy_from_user() payload by a half word, which will result in
a payload size that isn't a multiple of a word. The copy_from_user()
implementation on arm64 will go slower as a result, so just leave it
untouched.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:18 +00:00
Sultan Alsawaf
5e17b2f335 ion: Remove unneeded rwsem for the heap priority list
Heaps are never removed, and there is only one ion_device_add_heap()
user: msm_ion_probe(). This single user calls ion_device_add_heap()
sequentially, not concurrently. Furthermore, all heap additions are done
once kernel init is complete, and heaps are only accessed by userspace,
so no locking is needed at all here.

The write lock in ion_walk_heaps() doesn't make sense either since the
heap-walking functions neither mutate a heap's placement in the plist,
nor change a heap in a way that requires pausing all buffer allocations.
The functions used in the heap walking routine handle synchronization
themselves, so there's no need for the mutex-style locking here. This
write lock appears to be a historical artifact from the following 2013
commit (present in msm-3.4 trees) where a justification for the write
lock was never given: 7c1b8aa23ef ("gpu: ion: Add support for heap
walking").

Since the heap plist rwsem appears to be thoroughly useless, we can
safely remove it to reduce complexity and improve performance.

Also, change the name of ion_device_add_heap() to ion_add_heap() so the
compiler can notify us if ion_device_add_heap() is used elsewhere in the
future.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:18 +00:00
celtare21
5ddd0c1940 sched/core: Fix rq clock warning in sched_migrate_to_cpumask_end()
The following warning occurs because we don't update the runqueue's
clock when taking rq->lock in sched_migrate_to_cpumask_end():

rq->clock_update_flags < RQCF_ACT_SKIP
WARNING: CPU: 0 PID: 991 at update_curr+0x1c8/0x2bc
[...]
Call trace:
update_curr+0x1c8/0x2bc
dequeue_task_fair+0x7c/0x1238
do_set_cpus_allowed+0x64/0x28c
sched_migrate_to_cpumask_end+0xa8/0x1b4
m_stop+0x40/0x78
seq_read+0x39c/0x4ac
__vfs_read+0x44/0x12c
vfs_read+0xf0/0x1d8
SyS_read+0x6c/0xcc
el0_svc_naked+0x34/0x38

Fix it by adding an update_rq_clock() call when taking rq->lock.
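
Roughly (sketch of the locking sequence, not the exact hunk):

  struct rq_flags rf;
  struct rq *rq;

  rq = task_rq_lock(current, &rf);
  update_rq_clock(rq);    /* added: refresh rq->clock before dequeue/enqueue */
  do_set_cpus_allowed(current, old_mask);
  task_rq_unlock(rq, current, &rf);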

Signed-off-by: celtare21 <celtare21@gmail.com>
2022-11-12 11:24:18 +00:00
Sultan Alsawaf
adaa599abb msm: kgsl: Affine kgsl_3d0_irq and worker kthread to the big CPU cluster
These are in the critical path for rendering frames to the display, so
mark them as performance-critical and affine them to the big CPU
cluster. They aren't placed onto the prime cluster because the
single-CPU prime cluster will be used to run the DRM IRQ and kthreads.
DRM is more latency-critical than KGSL and we need to have DRM and KGSL
running on separate CPUs for the best performance, so KGSL gets the big
cluster.

Note that since there are other IRQs requested via kgsl_request_irq(),
we must specify that the IRQ to be made perf-critical is kgsl_3d0_irq.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:18 +00:00
Sultan Alsawaf
94f3e31c1b Revert "msm: kgsl: Affine IRQ and worker kthread to the big CPU cluster"
This reverts commit 417bded5a942a2a23ad65b3fe5fd3fff2d0dbf5b.

This is wrong. This causes 3 IRQs to be affined to the big CPU cluster,
not just the primary kgsl_3d0_irq one. As a result, the perf crit API
thinks that the 2 extra IRQs are critical and will balance them despite
them being rarely used (kgsl_hfi_irq and kgsl_gmu_irq).

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:17 +00:00
Sultan Alsawaf
686d26f283 sched/fair: Compile out NUMA code entirely when NUMA is disabled
Scheduler code is very hot and every little optimization counts. Instead
of constantly checking sched_numa_balancing when NUMA is disabled,
compile it out.
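
The mechanical form of the change is letting the preprocessor discard
the check entirely, e.g. (sketch of one call site, not the exact hunk):

  #ifdef CONFIG_NUMA_BALANCING
          if (static_branch_unlikely(&sched_numa_balancing))
                  task_tick_numa(rq, curr);
  #endif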

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:17 +00:00
Sultan Alsawaf
e23d4ea590 mm: Perform PID map reads on the little CPU cluster
PID map reads for processes with thousands of mappings can be done
extensively by certain Android apps, burning through CPU time on
higher-performance CPUs even though reading PID maps is never a
performance-critical task. We can relieve the load on the important CPUs
by moving PID map reads to little CPUs via sched_migrate_to_cpumask_*().

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: dreamisbaka <jolinux.g@gmail.com>
2022-11-12 11:24:17 +00:00
Sultan Alsawaf
c2c3304ca2 sched: Add API to migrate the current process to a given cpumask
There are some chunks of code in the kernel running in process context
where it may be helpful to run the code on a specific set of CPUs, such
as when reading some CPU-intensive procfs files. This is especially
useful when the code in question must run within the context of the
current process (so kthreads cannot be used).

Add an API to make this possible, which consists of the following:
sched_migrate_to_cpumask_start():
 @old_mask: pointer to output the current task's old cpumask
 @dest: pointer to a cpumask the current task should be moved to

sched_migrate_to_cpumask_end():
 @old_mask: pointer to the old cpumask generated earlier
 @dest: pointer to the dest cpumask provided earlier
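
A typical call site then looks roughly like this (cpu_lp_mask, the
little-cluster mask from the Qualcomm tree, is assumed here):

  cpumask_t old_mask;

  sched_migrate_to_cpumask_start(&old_mask, cpu_lp_mask);

  /*
   * CPU-intensive work that must run in the context of the current
   * process, now confined to the chosen CPUs.
   */

  sched_migrate_to_cpumask_end(&old_mask, cpu_lp_mask);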

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: dreamisbaka <jolinux.g@gmail.com>
2022-11-12 11:24:17 +00:00
Sultan Alsawaf
d516115a24 mm: Micro-optimize PID map reads for arm64 while retaining output format
Android and various applications in Android need to read PID map data in
order to work. Some processes can contain over 10,000 mappings, which
results in lots of time wasted on simply generating strings. This wasted
time adds up, especially in the case of Unity-based games, which utilize
the Boehm garbage collector. A game's main process typically has well
over 10,000 mappings due to the loaded textures, and the Boehm GC reads
PID maps several times a second. This results in over 100,000 map
entries being printed out per second, so micro-optimization here is
important. Before this commit, show_vma_header_prefix() would typically
take around 1000 ns to run on a Snapdragon 855; now it only takes about
50 ns to run, which is a 20x improvement.

The primary micro-optimizations here assume that there are no more than
40 bits in the virtual address space, hence the CONFIG_ARM64_VA_BITS
check. Arm64 uses a virtual address size of 39 bits, so this perfectly
covers it.

This also removes padding used to beautify PID map output to further
speed up reads and reduce the amount of bytes printed, and optimizes the
dentry path retrieval for file-backed mappings. Note, however, that the
trailing space at the end of the line for non-file-backed mappings
cannot be omitted, as it breaks some PID map parsers.

This still retains insignificant leading zeros from printed hex values
to maintain the current output format.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: LibXZR <xzr467706992@163.com>
2022-11-12 11:24:17 +00:00
Sultan Alsawaf
2d1025e96a msm: msm_bus: Don't enable QoS clocks when none are present
There's no point in enabling QoS clocks when there aren't any for
certain clients.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:16 +00:00
Sultan Alsawaf
f3df72d95e msm: kgsl: Use lock-less list for page pools
Page pool additions and removals are very hot during GPU workloads, so
they should be optimized accordingly. We can use a lock-less list for
storing the free pages in order to speed things up. The lock-less list
allows for one llist_del_first() user and unlimited llist_add() users to
run concurrently, so only a spin lock around the llist_del_first() is
needed; everything else is lock-free. The per-pool page count is now an
atomic to make it lock-free as well.
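
The resulting add/remove paths look roughly like this (struct and field
names are illustrative, not the actual kgsl definitions):

  struct kgsl_page_pool {
          struct llist_head page_list;    /* lock-less list of free pages */
          atomic_t page_count;
          spinlock_t list_lock;           /* only serializes llist_del_first() */
  };

  /* add: fully lock-free, any number of concurrent callers */
  llist_add(node, &pool->page_list);
  atomic_inc(&pool->page_count);

  /* remove: only the single-consumer llist_del_first() needs the lock */
  spin_lock(&pool->list_lock);
  node = llist_del_first(&pool->page_list);
  spin_unlock(&pool->list_lock);
  if (node)
          atomic_dec(&pool->page_count);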

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: LibXZR <xzr467706992@163.com>
2022-11-12 11:24:16 +00:00
Sultan Alsawaf
da748e58a9 drm/msm/sde: Don't clear dim layers when there aren't any applied
Clearing dim layers indiscriminately for each blend stage on each commit
wastes a lot of CPU time since the clearing process is heavy on register
accesses. We can optimize this by only clearing dim layers when they're
actually set, and only clearing them on a per-stage basis at that. This
reduces display commit latency considerably.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:16 +00:00
Sultan Alsawaf
03af212bd8 msm: kgsl: Don't busy wait for fenced GMU writes when possible
The most frequent user of fenced GMU writes, adreno_ringbuffer_submit(),
performs a fenced GMU write under a spin lock, and since fenced GMU
writes use udelay(), a lot of CPU cycles are burned here. Not only is
the spin lock held for longer than necessary (because the write doesn't
need to be inside the spin lock), but also a lot of CPU time is wasted
in udelay() for tens of microseconds when usleep_range() can be used
instead.

Move the locked fenced GMU writes to outside their spin locks and make
adreno_gmu_fenced_write() use usleep_range() when not in atomic/IRQ
context, to save power and improve performance. Fenced GMU writes are
found to take an average of 28 microseconds on the Snapdragon 855, so a
usleep range of 10 to 30 microseconds is optimal.
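
i.e., the wait loop picks its delay primitive based on context, roughly
(the context test and delay constant shown are illustrative; the actual
patch may thread an "atomic" flag through instead):

  if (in_atomic() || irqs_disabled())
          udelay(GMU_FENCE_DELAY_US);     /* must busy-wait here */
  else
          usleep_range(10, 30);           /* ~28 us typical on the Snapdragon 855 */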

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:16 +00:00
Sultan Alsawaf
0871f5ceea msm: kgsl: Remove unneeded time profiling from ringbuffer submission
The time profiling here is only used to provide additional debug info
for a context dump as well as a tracepoint. It adds non-trivial overhead
to ringbuffer submission since it accesses GPU registers, so remove it
along with the tracepoint since we're not debugging adreno.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:15 +00:00
Sultan Alsawaf
a8052f1777 scsi: ufs: Add simple IRQ-affined PM QoS operations
Qualcomm's PM QoS solution suffers from a number of issues: applying
PM QoS to all CPUs, convoluted spaghetti code that wastes CPU cycles,
and keeping PM QoS applied for 10 ms after all requests finish
processing.

This implements a simple IRQ-affined PM QoS mechanism for each UFS
adapter which uses atomics to elide locking, and enqueues a worker to
apply PM QoS to the target CPU as soon as a command request is issued.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
2022-11-12 11:24:15 +00:00
Panchajanya1999
67322ffa74 drivers/char: adsprpc: Remove Qcom's PM_QoS implementation
Qualcomm's QoS implementation wastes significant power on CPU cycles.
Scrap the QoS bits and save a bit of power without hurting any
functionality.

Change-Id: I1de3563d9c99ba863f10a90a900d290bdd8e6b79
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: Carlos Ayrton Lopez Arroyo <15030201@itcelaya.edu.mx>
2022-11-12 11:24:15 +00:00
Sultan Alsawaf
d89f76b07e scsi: ufs: Scrap Qualcomm's PM QoS implementation
This implementation is completely over the top and wastes lots of CPU
cycles. It's too convoluted to fix, so just scrap it to make way for a
simpler solution. This purges every PM QoS reference in the UFS drivers.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
2022-11-12 11:24:15 +00:00
Sultan Alsawaf
c0b8dc4d0a qos: Remove pm_qos_update_request_timeout() API
Using a timeout for a PM QoS request can lead to disastrous results on
power consumption. It's always possible to find a fixed scope in which a
PM QoS request should be applied, so timeouts aren't ever strictly
needed; they're usually just a lazy way of using PM QoS. Remove the API
so that it cannot be abused any longer.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:15 +00:00
Sultan Alsawaf
28c9e970a8 msm: kgsl: Remove L2PC PM QoS feature
KGSL already has PM QoS covering what matters. The L2PC PM QoS code is
not only unneeded, but also unused, so remove it. It's poorly designed
anyway since it uses a timeout with PM QoS, which is drastically bad for
power consumption.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:14 +00:00
Sultan Alsawaf
85bc91328d drm/msm/sde: Don't read and clear VBIF errors upon commit
Reading and clearing any errors from the VBIF error registers takes a
significant amount of time during kickoff, and is only used to produce
debug logs when errors are detected. Since we're not debugging hardware
issues in MDSS, remove the VBIF error clearing entirely to reduce
display rendering latency.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:14 +00:00
Sultan Alsawaf
952dfb1b3f drm/msm/sde: Remove redundant write memory barriers from IRQ routines
Explicit write memory barriers are unneeded here since releasing a lock
already implies a full memory barrier.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2022-11-12 11:24:14 +00:00