Makefile: LTO Tweaks

 * Ensure -O3 is always set regardless of which linker is used (doesn't
   impact release builds).
 * Enable -fwhole-program-vtables with LTO for better inlining decisions
   (0.00489045% binary size decrease).
 * Set import-instr-limit to 40:
   * Decreases output size by 10.0308% compared to no limit, and is the
     point at which measurable performance changes stop occurring.
     Chromium found 10 was a good limit for performance/binary size, and
     AOSP found 5 was a good compromise. However, we're a kernel, and a
     bit different.

import-instr-limit tests (compared to no limit):
import-instr-limit=10: 15.1171% binary size decrease.
import-instr-limit=20: 15.1025% binary size decrease.
import-instr-limit=30: 10.0455% binary size decrease.
import-instr-limit=40: 10.0308% binary size decrease.
import-instr-limit=50: 5.01785% binary size decrease.
import-instr-limit=60: 5.01296% binary size decrease.
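
For reference, a percentage like the ones above can be derived from two image sizes roughly as follows. This is only a sketch: the function names and the idea of comparing two on-disk images by file size are assumptions, not the procedure actually used for these measurements.

```python
import os


def size_decrease_percent(baseline_bytes: int, candidate_bytes: int) -> float:
    """Return how much smaller the candidate is than the baseline, in percent.

    A negative result means the candidate is larger than the baseline.
    """
    if baseline_bytes <= 0:
        raise ValueError("baseline size must be positive")
    return (baseline_bytes - candidate_bytes) / baseline_bytes * 100.0


def compare_images(baseline_path: str, candidate_path: str) -> float:
    """Compare two built images on disk (paths are supplied by the caller)."""
    return size_decrease_percent(
        os.path.getsize(baseline_path),
        os.path.getsize(candidate_path),
    )
```

Here the baseline would be the image built with no import-instr-limit and the candidate the image built with a given limit.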

Makefile: Readdress LTO tweaks

Subsequent to 28d40c3798

After additional clean testing, it was found that 20 is a reasonable
limit before any measurable performance loss occurs.

Makefile: Readdress LTO tweaks

All previous testing was embarrassingly flawed.

After further investigation, upstream determined that 5 is a good fit.
mikairyuu 2021-12-24 19:19:03 +10:00 committed by spakkkk
parent 46064cafed
commit 7285dd1a70

@@ -652,6 +652,7 @@ LLVM_NM := llvm-nm
export LLVM_AR LLVM_NM
# Set O3 optimization level for LTO
LDFLAGS += --plugin-opt=O3
LDFLAGS += --plugin-opt=-import-instr-limit=5
endif
ifdef CONFIG_LTO_GCC
@@ -732,6 +733,9 @@ KBUILD_CFLAGS += -Os
else
KBUILD_CFLAGS += -O2
endif
ifdef CONFIG_LTO_CLANG
KBUILD_CFLAGS += -fwhole-program-vtables
endif
# Tell gcc to never replace conditional load with a non-conditional one
KBUILD_CFLAGS += $(call cc-option,--param=allow-store-data-races=0)