Commit Graph

870686 Commits

Author SHA1 Message Date
Ritesh Harjani
70cc7eeb24 ext4: optimize file overwrites
If the file already has underlying blocks/extents allocated, we don't
need to start a journal txn and can directly return the underlying
mapping. Currently ext4_iomap_begin() is used by both the DAX and DIO
paths. We can check whether the write request is an overwrite and, if
so, directly return the mapping information.

This could give a significant perf boost for multi-threaded writes,
especially random overwrites.
On a PPC64 VM with a simulated pmem (DAX) device, a ~10x perf improvement
could be seen in random writes (overwrite), partly because this optimizes
away the spinlock contention during jbd2 slab cache allocation
(jbd2_journal_handle). On an x86 VM, a ~2x perf improvement was observed.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/88e795d8a4d5cd22165c7ebe857ba91d68d8813e.1600401668.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-12 11:24:42 +00:00
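
The overwrite check described in the commit above can be sketched as a toy userspace model. This is only an illustration under stated assumptions: `toy_file`, `toy_is_overwrite()`, `toy_map_for_write()` and `toy_journal_starts` are hypothetical names, and the per-block flag array stands in for ext4's real extent tree. It is not the actual ext4_iomap_begin() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: one flag per logical block, true if the block
 * is already allocated on disk. These are not the ext4 data structures. */
struct toy_file {
    const bool *allocated;
    size_t nblocks;
};

/* Counts how many times the slow path (journal transaction) was taken. */
static unsigned int toy_journal_starts;

/* Returns true if every block in [first, first + count) is allocated. */
bool toy_is_overwrite(const struct toy_file *f, size_t first, size_t count)
{
    assert(f != NULL);
    if (count == 0 || first >= f->nblocks || count > f->nblocks - first)
        return false;
    for (size_t i = first; i < first + count; i++)
        if (!f->allocated[i])
            return false;
    return true;
}

/* Write-path entry point. Returns true when the fast path was used. */
bool toy_map_for_write(const struct toy_file *f, size_t first, size_t count)
{
    if (toy_is_overwrite(f, first, count))
        return true;        /* mapping already exists: no journal handle needed */
    toy_journal_starts++;   /* slow path: would start a txn and allocate blocks */
    return false;
}
```

In this model, a write that touches only allocated blocks never increments `toy_journal_starts`, which mirrors the idea of skipping the jbd2 handle allocation for pure overwrites.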
Park Ju Hyung
4d8e9a4708 quota_tree: Avoid dynamic memory allocations
Most allocations done here are rather small and can fit on the stack,
eliminating the need to allocate them dynamically. Reserve a 1024B
stack buffer for this purpose to avoid the overhead of dynamic
memory allocation.

1024B covers most use cases, and higher values were observed to cause
stack corruption.

Co-authored-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
2022-11-12 11:24:42 +00:00
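
The stack-buffer pattern from the commit above might look roughly like the following userspace sketch. It assumes that requests larger than the buffer fall back to the heap, which the commit message implies but does not state. `checksum_with_scratch()` and `STACK_BUF_SIZE` are hypothetical names chosen for illustration, not the quota_tree code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACK_BUF_SIZE 1024  /* mirrors the 1024B figure from the commit message */

/* Hypothetical helper: copies `len` bytes into scratch space and sums them.
 * Sets *used_heap so callers can see which path was taken.
 * Returns -1 if the heap fallback fails. */
long checksum_with_scratch(const unsigned char *src, size_t len, int *used_heap)
{
    unsigned char stack_buf[STACK_BUF_SIZE];
    unsigned char *buf = stack_buf;
    long sum = 0;

    assert(src != NULL && used_heap != NULL);
    *used_heap = 0;

    /* Only oversized requests pay for a dynamic allocation. */
    if (len > sizeof(stack_buf)) {
        buf = malloc(len);
        if (!buf)
            return -1;
        *used_heap = 1;
    }

    memcpy(buf, src, len);
    for (size_t i = 0; i < len; i++)
        sum += buf[i];

    if (buf != stack_buf)
        free(buf);
    return sum;
}
```

The trade-off is the one the commit message hints at: a larger on-stack buffer covers more requests but eats into a limited stack, so the buffer size has to stay modest.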
Park Ju Hyung
33ce619f2f techpack: display: add some bp hints to hot paths
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
[@0ctobot: Adapted for 4.19]
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:41 +00:00
Park Ju Hyung
aeef1bc25e drm/msm: use kmem_cache pool for struct vblank_work
These get allocated and freed millions of times on this kernel tree.

Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
2022-11-12 11:24:41 +00:00
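
Several commits in this list move a hot struct to a dedicated kmem_cache pool. In the kernel this is done through the slab interface (kmem_cache_create(), kmem_cache_alloc(), kmem_cache_free()). The sketch below is only a userspace analogue of the idea, a per-type free list that recycles objects instead of returning them to the general allocator. `work_item`, `work_pool` and the `pool_*` functions are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a small, frequently allocated struct. */
struct work_item {
    int id;
    struct work_item *next_free;  /* links freed objects inside the pool */
};

struct work_pool {
    struct work_item *free_list;
    unsigned long fresh_allocs;   /* how often malloc() had to be called */
};

/* Reuse a recycled object if one is available, otherwise allocate a new one. */
struct work_item *pool_alloc(struct work_pool *p)
{
    struct work_item *it;

    assert(p != NULL);
    it = p->free_list;
    if (it) {
        p->free_list = it->next_free;
    } else {
        it = malloc(sizeof(*it));
        if (!it)
            return NULL;
        p->fresh_allocs++;
    }
    it->id = 0;
    it->next_free = NULL;
    return it;
}

/* Return an object to the pool instead of the general-purpose allocator. */
void pool_free(struct work_pool *p, struct work_item *it)
{
    assert(p != NULL && it != NULL);
    it->next_free = p->free_list;
    p->free_list = it;
}

/* Release every recycled object back to the system. */
void pool_destroy(struct work_pool *p)
{
    assert(p != NULL);
    while (p->free_list) {
        struct work_item *it = p->free_list;
        p->free_list = it->next_free;
        free(it);
    }
}
```

With an alloc/free/alloc sequence, the second allocation returns the same object and `fresh_allocs` stays at 1, which is the effect these commits are after for objects that are created and destroyed very often.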
Park Ju Hyung
d2baa1092e kthread: use buffer from the stack space
struct kthread_create_info is small enough to fit comfortably on the
stack.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:41 +00:00
Park Ju Hyung
518acbdb88 exec: use bprm from the stack space
struct linux_binprm isn't big and is safe to place on the stack.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
[@0ctobot: Adapted for 4.19]
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:41 +00:00
Park Ju Hyung
60b66cad88 sched: do not allocate window cpu arrays separately
These are allocated extremely frequently.

Allocate them with CONFIG_NR_CPUS upon struct ravg's allocation.

This will break walt debug tracings.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:41 +00:00
Park Ju Hyung
19d0fb968c power_supply: don't allocate attrname
healthd queries this extremely frequently and attrname is allocated
and de-allocated repeatedly.

Use the stack space instead.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:40 +00:00
Park Ju Hyung
26d295a1b4 dma_buf: try to use kmem_cache pool for dmabuf allocations
These get allocated and freed millions of times on this kernel tree.
Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations.

Most allocations' size is:
(sizeof(struct dma_buf) + sizeof(struct reservation_object)).

Put those under kmem_cache pool and distinguish them with dmabuf->from_kmem
flag.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
[@0ctobot: Adapted for 4.19]
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:40 +00:00
Park Ju Hyung
774d7449d5 dma_buf: use kmem_cache pool for struct sync_file
These get allocated and freed millions of times on this kernel tree.

Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:40 +00:00
Park Ju Hyung
515f0624db dma_buf: use kmem_cache pool for struct dma_buf_attachment
These get allocated and freed millions of times on this kernel tree.

Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:40 +00:00
Park Ju Hyung
80a6eade76 dcache: increase DNAME_INLINE_LEN
Most dentry allocations exceed 32B.

Increase it by 192 bytes to accommodate larger allocation requests.
This still preserves 64-byte cacheline alignment.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:39 +00:00
Park Ju Hyung
737ee0e2bf techpack: camera: use kmem_cache pool for struct sync_user_payload
These get allocated and freed millions of times on this kernel tree.

Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
[@0ctobot: Adapted for 4.19]
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:39 +00:00
Park Ju Hyung
4c374f8565 kernfs: use kmem_cache pool for struct kernfs_open_node/file
These get allocated and freed millions of times on this kernel tree.

Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:39 +00:00
Park Ju Hyung
4c2bbf266f sdcardfs: use kmem_cache pool for struct sdcardfs_file_info
These get allocated and freed millions of times on this kernel tree.

Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:39 +00:00
Park Ju Hyung
096c92b442 cgroup: use kmem_cache pool for struct cgrp_cset_link
These get allocated and freed millions of times on Android.

Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:39 +00:00
Park Ju Hyung
ea73c00adc trace: Add a CONFIG_DISABLE_TRACE_PRINTK option.
* Poorly made kernel trees often use trace_printk() without
  properly guarding it with an #ifdef.

* Such usage of trace_printk() causes a warning at
  boot and additional memory allocation.

This option disables all of them at once.

Change-Id: I3edd80bdc0cc6763c7184017f8c0a15de06952bb
Signed-off-by: starlight5234 <starlight5234@protonmail.ch>
2022-11-12 11:24:38 +00:00
Park Ju Hyung
0752ac29bb writeback: hardcode dirty_expire_centisecs=3000 (30s)
https://android-review.googlesource.com/c/platform/system/core/+/938362

Hardcode this and make /proc/sys/vm/dirty_expire_centisecs a no-op.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
(cherry picked from commit b2c3e47759aade606691d5c67f250005a0f2fe1c)
(cherry picked from commit b95894f5fa7a8f8df1d14977b04ca8f255cfa373)
(cherry picked from commit bc361a6fd917533e6bb00eb8dc313d8d88aad044)
(cherry picked from commit 89385d7e3e4f1c141667ee9756f7b916c1548830)
(cherry picked from commit faa16d5fa01d646155ac8c2e749b746eca653dc2)
(cherry picked from commit 9d0a6443278bb12df5bd36dd77ca12cca50b992f)
(cherry picked from commit 6f18d30a6241074b0f4c0a602f74e9abc35898e8)
(cherry picked from commit 3a619a8c02a6b082f6b5a40de34b3c0a6a939b5c)
(cherry picked from commit a4ede9b9dc48c513858a3b465432891b4012b182)
(cherry picked from commit 18f18de6613a63b2ff2dc657fee8e472d090dca8)
(cherry picked from commit 684b4fa2175183a7424280f270aa4a30e5c818ad)
(cherry picked from commit 417a0b93c3616a26d5c4e5272b628c6e5860c53d)
2022-11-12 11:24:38 +00:00
Juhyung Park
c0b3686526 sched: promote nodes out of CONFIG_SCHED_DEBUG
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:38 +00:00
Jason Edson
87d7deab74 sched_clock: Remove cycles count in suspend and resume
This reverts commit 0205e659f4.

Change-Id: If08a71e7555a74f98e1af7d73604b6fcbb61765a
Signed-off-by: Jason Edson <jaysonedson@gmail.com>
2022-11-12 11:24:38 +00:00
Kyle Lin
9ba5b76315 cpufreq: stats: replace the global lock with atomic
To reduce lock contention, replace the global lock with atomic
operations.

bug: 127722781
Change-Id: I08ed3d55bf6bf17f31f4017c82c998fb513bad3e
Signed-off-by: Kyle Lin <kylelin@google.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2022-11-12 11:24:38 +00:00
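
Replacing a global lock with atomics, as in the commit above, can be illustrated with C11 atomics in userspace. This is a sketch under assumptions: `time_in_state`, `NUM_FREQS`, `stats_account()` and `stats_read()` are hypothetical names, and the kernel itself uses its own atomic_t/atomic64_t API rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_FREQS 4  /* hypothetical number of frequency states */

static_assert(NUM_FREQS > 0, "need at least one frequency state");

/* Hypothetical per-state counters. Static storage means they start at zero. */
static atomic_ulong time_in_state[NUM_FREQS];

/* Each update is a single atomic add, so no global lock is taken. */
void stats_account(unsigned int state, unsigned long delta)
{
    if (state < NUM_FREQS)
        atomic_fetch_add_explicit(&time_in_state[state], delta,
                                  memory_order_relaxed);
}

/* Readers load one counter atomically; out-of-range states read as 0. */
unsigned long stats_read(unsigned int state)
{
    return state < NUM_FREQS
        ? atomic_load_explicit(&time_in_state[state], memory_order_relaxed)
        : 0;
}
```

Relaxed ordering is enough here because each counter is independent. The cost of dropping the lock is that a reader walking all counters no longer sees a single consistent snapshot.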
LibXZR
e3dc7d8199 ion: Limit max active to 1 for the buffer freeing workers
* This is not time critical. The load on the workqueues can sometimes
be very high, especially when unbound workqueues are restricted to the
small cluster, causing noticeable lag in userspace. Limit it with
max_active = 1 to reduce the instantaneous load.

* This fixes UI lag after running faceunlock on OOS.

Signed-off-by: LibXZR <xzr467706992@163.com>
2022-11-12 11:24:37 +00:00
Chris Lew
1582fcb2ee soc: qcom: smp2p_sleepstate: Add suspend delay
Add a delay in the smp2p_sleepstate suspend path to prevent a wakeup loop
with non-wakeup streaming from sensors. This delay should be the maximum
time needed for in-flight sensor samples to be drained.

Change-Id: I79f944caa2ccdd65dc1649aef8d6359f44612479
Signed-off-by: Chris Lew <clew@codeaurora.org>
Signed-off-by: Ananth Raghavan Subramanian <sananth@codeaurora.org>
Signed-off-by: Kelly Rossmoyer <krossmo@google.com>
Bug: 123377615
Bug: 131260677
2022-11-12 11:24:37 +00:00
celtare21
a6700dbb48 rmnet_ipa: Fix netdev watchdog triggering on suspend
Sometimes rmnet_ipa fails to suspend with the following trace:

NETDEV WATCHDOG: rmnet_ipa0 (): transmit queue 0 timed out

Signed-off-by: celtare21 <celtare21@gmail.com>
2022-11-12 11:24:37 +00:00
Weiyi Chen
c203bbded3 qmi_rmnet: Do not use alarm timer for DFC
Do not use the alarm timer for the DFC powersave check, which eliminates
the need for a wakelock. This allows the AP to suspend sooner.

Change-Id: I7153055d0231a65125ad88808db9e1d0032f24d9
Signed-off-by: Weiyi Chen <quic_weiyic@quicinc.com>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
2022-11-12 11:24:37 +00:00
LibXZR
ce3006a555 subsystem_restart: Always performs soft resets when subsystems crash
* OnePlus is using a dirty list and userspace nodes to override restart_level
for some subsystems. Why not just ignore all RESET_SOC and use a soft reset
instead? Nobody wants to see their device randomly reboot.

* Note that commit 578bed2 is also needed to fix the broken irq free when
soft resetting the modem.

Signed-off-by: LibXZR <xzr467706992@163.com>
2022-11-12 11:24:37 +00:00
Yaroslav Furman
aefe7a673f drivers: power: Add timeouts to wakelocks
These can get stuck sometimes and prevent the system from sleeping.

Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: alk3pInjection <webmaster@raspii.tech>
Signed-off-by: Forenche <prahul2003@gmail.com>
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
2022-11-12 11:24:36 +00:00
Danny Lin
3078e78e38 tcp: Enable ECN negotiation by default
This is now the default for all connections in iOS 11+, and we have
RFC 3168 ECN fallback to detect and disable ECN for broken flows.

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2022-11-12 11:24:36 +00:00
spakkkk
3d75a15a21 arm64: config: disable seccomp
2022-11-12 11:24:36 +00:00
Park Ju Hyung
1bc273fcb0 kernel: fake system calls on seccomp to succeed
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
2022-11-12 11:24:36 +00:00
spakkkk
f0625025f0 thermal: tsens: remove unused
2022-11-12 11:24:36 +00:00
Tyler Nijmeh
39420e241e block: Do not collect I/O statistics
This adds a great deal of latency to block requests due to disabling
preemption temporarily during statistics collection.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
2022-11-12 11:24:35 +00:00
Zlatan Radovanovic
2eaf0ecf44 Makefile: Enable opaque pointers mode
https://llvm.org/docs/OpaquePointers.html

Signed-off-by: Zlatan Radovanovic <zlatan.radovanovic@fet.ba>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2022-11-12 11:24:35 +00:00
Tashfin Shakeer Rhythm
b597386ece Makefile: Pass more feature modifiers to -march
• armv8.1-a: Includes armv8-a, +crc, +lse, +rdma by default.
• fp16: Enables FP16 extension. This also enables floating-point instructions.
• rcpc: Enables the RcPc extension. This is passed on to the assembler,
	enabling inline asm statements to use instructions from the RcPc extension.

Lookup: https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/AArch64-Options.html
Test: A successful build with both Clang & GCC. Kernel booted just fine.

Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2022-11-12 11:24:35 +00:00
mikairyuu
96243a34cc Makefile: allow missing prototypes
2022-11-12 11:24:35 +00:00
mikairyuu
34ead8f146 Makefile: return -O3
looks like it got fixed on latest clang-15 builds
2022-11-12 11:24:35 +00:00
John Galt
45aa6f5477 Makefile/Polly: update optimizations
2022-11-12 11:24:34 +00:00
mikairyuu
ab85d010c5 arm64: config: enforce LTO/Polly and SCS
2022-11-12 11:24:34 +00:00
mikairyuu
77aef8ee79 Kbuild: Make polly enabled by default
2022-11-12 11:24:34 +00:00
mikairyuu
d92b446b14 arm64: config: Enable optimize inlining
2022-11-12 11:24:34 +00:00
kondors1995
95410c538b Makefile: update polly and inline optimizations
2022-11-12 11:24:34 +00:00
Diab Neiroukh
2c336746e6 kbuild: Add a config to optimise inlining.
Increased inlining can provide an improvement in performance, but also
increases the size of the final kernel image. This config provides a bit
more control over how much is inlined to further optimise the kernel for
certain workloads.

Signed-off-by: Diab Neiroukh <lazerl0rd@thezest.dev>
2022-11-12 11:24:33 +00:00
mikairyuu
7285dd1a70 Makefile: LTO Tweaks
* Ensure -O3 is always set regardless of the linker (doesn't impact
  release builds).
* Enable -fwhole-program-vtables with LTO for better inlining decisions
  (0.00489045% binary size decrease).
* Set import-instr-limit to 40:
  * Decreases output size by 10.0308%, and is also the point where
    measurable performance changes stop occurring. Chromium found 10 was
    a good limit for performance/binary size, and AOSP found 5 was a good
    compromise. However, we're a kernel, and a bit different.

import-instr-limit tests (compared to no limit):
import-instr-limit=10: 15.1171% binary size decrease.
import-instr-limit=20: 15.1025% binary size decrease.
import-instr-limit=30: 10.0455% binary size decrease.
import-instr-limit=40: 10.0308% binary size decrease.
import-instr-limit=50: 5.01785% binary size decrease.
import-instr-limit=60: 5.01296% binary size decrease.

Makefile: re address lto tweaks

Subsequent to 28d40c3798

After additional clean testing, it was found 20 is the reasonable limit
before any measurable performance loss occurs.

Makefile: re address lto tweaks

All previous testing was embarrassingly flawed.

After further investigation, upstream determined that 5 is a good fit.
2022-11-12 11:24:33 +00:00
idkwhoiam322
46064cafed Makefile: strip debug always
2022-11-12 11:24:33 +00:00
kdrag0n
927592765e Makefile: Set --lto-O3 LLD linker flag when building with clang LTO
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Change-Id: I87faddd3bb5ca6e132ff3831bfddd2a0b4511fb9
2022-11-12 11:24:33 +00:00
Adam W. Willis
cfc65fcfc4 Kbuild: Support LLVM Polyhedral Loop Optimizations
Polly is able to optimize various loops throughout the kernel for cache
locality. A mathematical representation of the program, based on
polyhedra, is analysed to find opportunistic optimisations in memory
access patterns which then leads to loop transformations.

I generally see static Kconfig entries being created to enable
these flags, which inevitably breaks builds and emits misleading
errors for those trying to compile the kernel with standard Clang
toolchains (AOSP, Snapdragon, etc.) which do not natively support
Polly.

Let's instead take advantage of Kconfig compile time checks and
determine compatibility dynamically, similarly to how RELR
relocations and other compiler specific features are handled on
4.19.

[0ctobot: Based on kdrag0n/proton_bluecross@0537f23]

Change-Id: I8c8e4e62f54dc4f84b043030b75d745039c786e8
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Co-authored-by: Diab Neiroukh <lazerl0rd@thezest.dev>

Makefile: Restore support for -polly-run-dce

This flag was initially omitted due to incompatibility with Clang
13 development builds, which has since been resolved.

Change-Id: I8f75c6498df1d3e2c7886da9d0c15446a971edc4
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:33 +00:00
Danny Lin
0d7292a8ba Makefile: Use O3 optimization level for Clang LTO
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2022-11-12 11:24:32 +00:00
spakkkk
a3a0b6d9f7 arm64: config: enable DCE
2022-11-12 11:24:32 +00:00
Adam W. Willis
3bdb4cf8c6 Kbuild: Select LD_DEAD_CODE_DATA_ELIMINATION with LTO_CLANG
This selection being the default behavior for Clang LTO has been
reverted upstream over concerns that the use of -gc-sections
carries a greater potential risk of breakage.

That being said, as someone who has been using these features in
tandem for several years to no ill effect, this is a risk that I
am willing to take in order to trim the fat from LTO's thick
kernel images and potentially reduce boot times.

Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
2022-11-12 11:24:32 +00:00
spakkkk
a020141124 arm64: config: Switch to LLD, enable RELR and ThinLTO
2022-11-12 11:24:32 +00:00