android_kernel_xiaomi_sm7250

Author	SHA1	Message	Date
Greg Kroah-Hartman	5a7d5998b9	ANDROID: GKI: mount.h: add Android ABI padding to some structures Try to mitigate potential future driver core api changes by adding a padding to struct vfsmount. Based on a patch from Michal Marek <mmarek@suse.cz> from the SLES kernel Bug: 151154716 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I9ce1b63f05c90af168eeea1312ac88d3cc5cfdf3	2020-05-01 15:18:12 +02:00
Greg Kroah-Hartman	cab4399ebf	This is the 4.19.47 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzxMDsACgkQONu9yGCS aT6kjxAAvyKCDNGaQgBXGXe6xvBK7ad+mk+MU6WVycN+PIQzA8zVfR7RcGJgEP8t 65QrePyacMe5bmSgTnUKGz6DpwpWbCMamyftjoaPWIhxDFmQy7FB9ANOVoPDBw49 +jKT30ioBSI6LYQDU4xhxDO01HVEXmmLZquqYDfLoHOMLeXCivfTlM7PQPfxMZzn fdeLtvfCnfMiftkXZjGqaWoUAnrlTmncQk9nXMcDgxrGy9pHJ6B1WWE5ygqr4Z5s MaKfHotCSYD/eP8JsyIdJg+iMESv5Z0ZDCjbVslm81fCLeUD6atdnmpYxCphmT7Q ifN23i4FJrXBX4xLpD9RYzavH3+hQzqb2pt02aBZRW0OLFK0qjYrkdayjwWGOIUI zK9bgfHiiNKtoyvakQJ09uMhpO2thWeTMh8a6iBLTQ5Koi60adW1l5GrPuTTZYG7 V8xNB2cLsUktDsAr1I/kwrCMlE/oNFgy2La5zMzmELnFTUJRlMAoAGaa1DPcOFLt QVdT8luJMu+1KTeMZPoK/7QQGszMDTAot4+Ys56KyPQ6zN/rGr2vm7kHYn41FyEp KXpyeIm0/RKxUysz2Fyx+dL75e4ZhBof+amQ7Kotz6bF45o+ZwJ7THT6XKIOoln5 E7iI5RhQMZ2WuVIHLKa0QFBRn4RPpzie1lwOXWAAF6oqNgewXHw= =fjPE -----END PGP SIGNATURE----- Merge 4.19.47 into android-4.19 Changes in 4.19.47 x86: Hide the int3_emulate_call/jmp functions from UML ext4: do not delete unlinked inode from orphan list on failed truncate ext4: wait for outstanding dio during truncate in nojournal mode f2fs: Fix use of number of devices KVM: x86: fix return value for reserved EFER bio: fix improper use of smp_mb__before_atomic() sbitmap: fix improper use of smp_mb__before_atomic() Revert "scsi: sd: Keep disk read-only when re-reading partition" crypto: vmx - CTR: always increment IV as quadword mmc: sdhci-iproc: cygnus: Set NO_HISPD bit to fix HS50 data hold time problem mmc: sdhci-iproc: Set NO_HISPD bit to fix HS50 data hold time problem kvm: svm/avic: fix off-by-one in checking host APIC ID libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead arm64/kernel: kaslr: reduce module randomization range to 2 GB arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable gfs2: Fix sign extension bug in gfs2_update_stats btrfs: don't double unlock on error in btrfs_punch_hole Btrfs: do not abort transaction at btrfs_update_root() after failure to COW path Btrfs: avoid fallback to transaction commit during fsync of files with holes Btrfs: fix race between ranged fsync and writeback of adjacent ranges btrfs: sysfs: Fix error path kobject memory leak btrfs: sysfs: don't leak memory when failing add fsid udlfb: fix some inconsistent NULL checking fbdev: fix divide error in fb_var_to_videomode NFSv4.2 fix unnecessary retry in nfs4_copy_file_range NFSv4.1 fix incorrect return value in copy_file_range bpf: add bpf_jit_limit knob to restrict unpriv allocations brcmfmac: assure SSID length from firmware is limited brcmfmac: add subtype check for event handling in data path arm64: errata: Add workaround for Cortex-A76 erratum #1463225 btrfs: honor path->skip_locking in backref code ovl: relax WARN_ON() for overlapping layers use case fbdev: fix WARNING in __alloc_pages_nodemask bug media: cpia2: Fix use-after-free in cpia2_exit media: serial_ir: Fix use-after-free in serial_ir_init_module media: vb2: add waiting_in_dqbuf flag media: vivid: use vfree() instead of kfree() for dev->bitmap_cap ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit bpf: devmap: fix use-after-free Read in __dev_map_entry_free batman-adv: mcast: fix multicast tt/tvlv worker locking at76c50x-usb: Don't register led_trigger if usb_register_driver failed acct_on(): don't mess with freeze protection Revert "btrfs: Honour FITRIM range constraints during free space trim" gfs2: Fix lru_count going negative cxgb4: Fix error path in cxgb4_init_module NFS: make nfs_match_client killable IB/hfi1: Fix WQ_MEM_RECLAIM warning gfs2: Fix occasional glock use-after-free mmc: core: Verify SD bus width tools/bpf: fix perf build error with uClibc (seen on ARC) selftests/bpf: set RLIMIT_MEMLOCK properly for test_libbpf_open.c bpftool: exclude bash-completion/bpftool from .gitignore pattern dmaengine: tegra210-dma: free dma controller in remove() net: ena: gcc 8: fix compilation warning hv_netvsc: fix race that may miss tx queue wakeup Bluetooth: Ignore CC events not matching the last HCI command pinctrl: zte: fix leaked of_node references ASoC: Intel: kbl_da7219_max98357a: Map BTN_0 to KEY_PLAYPAUSE usb: dwc2: gadget: Increase descriptors count for ISOC's usb: dwc3: move synchronize_irq() out of the spinlock protected block ASoC: hdmi-codec: unlock the device on startup errors powerpc/perf: Return accordingly on invalid chip-id in powerpc/boot: Fix missing check of lseek() return value powerpc/perf: Fix loop exit condition in nest_imc_event_init ASoC: imx: fix fiq dependencies spi: pxa2xx: fix SCR (divisor) calculation brcm80211: potential NULL dereference in brcmf_cfg80211_vndr_cmds_dcmd_handler() ACPI / property: fix handling of data_nodes in acpi_get_next_subnode() drm/nouveau/bar/nv50: ensure BAR is mapped media: stm32-dcmi: return appropriate error codes during probe ARM: vdso: Remove dependency with the arch_timer driver internals arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variable powerpc/watchdog: Use hrtimers for per-CPU heartbeat sched/cpufreq: Fix kobject memleak scsi: qla2xxx: Fix a qla24xx_enable_msix() error path scsi: qla2xxx: Fix abort handling in tcm_qla2xxx_write_pending() scsi: qla2xxx: Avoid that lockdep complains about unsafe locking in tcm_qla2xxx_close_session() scsi: qla2xxx: Fix hardirq-unsafe locking x86/modules: Avoid breaking W^X while loading modules Btrfs: fix data bytes_may_use underflow with fallocate due to failed quota reserve btrfs: fix panic during relocation after ENOSPC before writeback happens btrfs: Don't panic when we can't find a root key iwlwifi: pcie: don't crash on invalid RX interrupt rtc: 88pm860x: prevent use-after-free on device remove rtc: stm32: manage the get_irq probe defer case scsi: qedi: Abort ep termination if offload not scheduled s390/kexec_file: Fix detection of text segment in ELF loader sched/nohz: Run NOHZ idle load balancer on HK_FLAG_MISC CPUs w1: fix the resume command API s390: qeth: address type mismatch warning dmaengine: pl330: _stop: clear interrupt status mac80211/cfg80211: update bss channel on channel switch libbpf: fix samples/bpf build failure due to undefined UINT32_MAX slimbus: fix a potential NULL pointer dereference in of_qcom_slim_ngd_register ASoC: fsl_sai: Update is_slave_mode with correct value mwifiex: prevent an array overflow rsi: Fix NULL pointer dereference in kmalloc net: cw1200: fix a NULL pointer dereference nvme: set 0 capacity if namespace block size exceeds PAGE_SIZE nvme-rdma: fix a NULL deref when an admin connect times out crypto: sun4i-ss - Fix invalid calculation of hash end bcache: avoid potential memleak of list of journal_replay(s) in the CACHE_SYNC branch of run_cache_set bcache: return error immediately in bch_journal_replay() bcache: fix failure in journal relplay bcache: add failure check to run_cache_set() for journal replay bcache: avoid clang -Wunintialized warning RDMA/cma: Consider scope_id while binding to ipv6 ll address vfio-ccw: Do not call flush_workqueue while holding the spinlock vfio-ccw: Release any channel program when releasing/removing vfio-ccw mdev x86/build: Move _etext to actual end of .text smpboot: Place the __percpu annotation correctly x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault() mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions Bluetooth: hci_qca: Give enough time to ROME controller to bootup. HID: logitech-hidpp: use RAP instead of FAP to get the protocol version pinctrl: pistachio: fix leaked of_node references pinctrl: samsung: fix leaked of_node references clk: rockchip: undo several noc and special clocks as critical on rk3288 perf/arm-cci: Remove broken race mitigation dmaengine: at_xdmac: remove BUG_ON macro in tasklet media: coda: clear error return value before picture run media: ov6650: Move v4l2_clk_get() to ov6650_video_probe() helper media: au0828: stop video streaming only when last user stops media: ov2659: make S_FMT succeed even if requested format doesn't match audit: fix a memory leak bug media: stm32-dcmi: fix crash when subdev do not expose any formats media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable() media: pvrusb2: Prevent a buffer overflow iio: adc: stm32-dfsdm: fix unmet direct dependencies detected block: fix use-after-free on gendisk powerpc/numa: improve control of topology updates powerpc/64: Fix booting large kernels with STRICT_KERNEL_RWX random: fix CRNG initialization when random.trust_cpu=1 random: add a spinlock_t to struct batched_entropy cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock sched/core: Check quota and period overflow at usec to nsec conversion sched/rt: Check integer overflow at usec to nsec conversion sched/core: Handle overflow in cpu_shares_write_u64 staging: vc04_services: handle kzalloc failure drm/msm: a5xx: fix possible object reference leak irq_work: Do not raise an IPI when queueing work on the local CPU thunderbolt: Take domain lock in switch sysfs attribute callbacks s390/qeth: handle error from qeth_update_from_chp_desc() USB: core: Don't unbind interfaces following device reset failure x86/irq/64: Limit IST stack overflow check to #DB stack drm: etnaviv: avoid DMA API warning when importing buffers phy: sun4i-usb: Make sure to disable PHY0 passby for peripheral mode phy: mapphone-mdm6600: add gpiolib dependency i40e: Able to add up to 16 MAC filters on an untrusted VF i40e: don't allow changes to HW VLAN stripping on active port VLANs ACPI/IORT: Reject platform device creation on NUMA node mapping failure arm64: vdso: Fix clock_getres() for CLOCK_REALTIME RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure perf/x86/msr: Add Icelake support perf/x86/intel/rapl: Add Icelake support perf/x86/intel/cstate: Add Icelake support hwmon: (vt1211) Use request_muxed_region for Super-IO accesses hwmon: (smsc47m1) Use request_muxed_region for Super-IO accesses hwmon: (smsc47b397) Use request_muxed_region for Super-IO accesses hwmon: (pc87427) Use request_muxed_region for Super-IO accesses hwmon: (f71805f) Use request_muxed_region for Super-IO accesses scsi: libsas: Do discovery on empty PHY to update PHY info mmc: core: make pwrseq_emmc (partially) support sleepy GPIO controllers mmc_spi: add a status check for spi_sync_locked mmc: sdhci-of-esdhc: add erratum eSDHC5 support mmc: sdhci-of-esdhc: add erratum A-009204 support mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support drm/amdgpu: fix old fence check in amdgpu_fence_emit PM / core: Propagate dev->power.wakeup_path when no callbacks clk: rockchip: Fix video codec clocks on rk3288 extcon: arizona: Disable mic detect if running when driver is removed clk: rockchip: Make rkpwm a critical clock on rk3288 s390: zcrypt: initialize variables before_use x86/microcode: Fix the ancient deprecated microcode loading method s390/mm: silence compiler warning when compiling without CONFIG_PGSTE s390: cio: fix cio_irb declaration selftests: cgroup: fix cleanup path in test_memcg_subtree_control() qmi_wwan: Add quirk for Quectel dynamic config cpufreq: ppc_cbe: fix possible object reference leak cpufreq/pasemi: fix possible object reference leak cpufreq: pmac32: fix possible object reference leak cpufreq: kirkwood: fix possible object reference leak block: sed-opal: fix IOC_OPAL_ENABLE_DISABLE_MBR x86/build: Keep local relocations with ld.lld drm/pl111: fix possible object reference leak iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion iio: hmc5843: fix potential NULL pointer dereferences iio: common: ssp_sensors: Initialize calculated_time in ssp_common_process_data iio: adc: ti-ads7950: Fix improper use of mlock selftests/bpf: ksym_search won't check symbols exists rtlwifi: fix a potential NULL pointer dereference mwifiex: Fix mem leak in mwifiex_tm_cmd brcmfmac: fix missing checks for kmemdup b43: shut up clang -Wuninitialized variable warning brcmfmac: convert dev_init_lock mutex to completion brcmfmac: fix WARNING during USB disconnect in case of unempty psq brcmfmac: fix race during disconnect when USB completion is in progress brcmfmac: fix Oops when bringing up interface during USB disconnect rtc: xgene: fix possible race condition rtlwifi: fix potential NULL pointer dereference scsi: ufs: Fix regulator load and icc-level configuration scsi: ufs: Avoid configuring regulator with undefined voltage range drm/panel: otm8009a: Add delay at the end of initialization arm64: cpu_ops: fix a leaked reference by adding missing of_node_put wil6210: fix return code of wmi_mgmt_tx and wmi_mgmt_tx_ext x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP x86/uaccess, signal: Fix AC=1 bloat x86/ia32: Fix ia32_restore_sigcontext() AC leak x86/uaccess: Fix up the fixup chardev: add additional check for minor range overlap RDMA/hns: Fix bad endianess of port_pd variable sh: sh7786: Add explicit I/O cast to sh7786_mm_sel() HID: core: move Usage Page concatenation to Main item ASoC: eukrea-tlv320: fix a leaked reference by adding missing of_node_put ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put cxgb3/l2t: Fix undefined behaviour HID: logitech-hidpp: change low battery level threshold from 31 to 30 percent spi: tegra114: reset controller on probe kobject: Don't trigger kobject_uevent(KOBJ_REMOVE) twice. media: video-mux: fix null pointer dereferences media: wl128x: prevent two potential buffer overflows media: gspca: Kill URBs on USB device disconnect efifb: Omit memory map check on legacy boot thunderbolt: property: Fix a missing check of kzalloc thunderbolt: Fix to check the return value of kmemdup timekeeping: Force upper bound for setting CLOCK_REALTIME scsi: qedf: Add missing return in qedf_post_io_req() in the fcport offload check virtio_console: initialize vtermno value for ports tty: ipwireless: fix missing checks for ioremap overflow: Fix -Wtype-limits compilation warnings x86/mce: Fix machine_check_poll() tests for error types rcutorture: Fix cleanup path for invalid torture_type strings x86/mce: Handle varying MCA bank counts rcuperf: Fix cleanup path for invalid perf_type strings usb: core: Add PM runtime calls to usb_hcd_platform_shutdown scsi: qla4xxx: avoid freeing unallocated dma memory scsi: lpfc: avoid uninitialized variable warning selinux: avoid uninitialized variable warning batman-adv: allow updating DAT entry timeouts on incoming ARP Replies dmaengine: tegra210-adma: use devm_clk_*() helpers hwrng: omap - Set default quality thunderbolt: Fix to check return value of ida_simple_get thunderbolt: Fix to check for kmemdup failure drm/amd/display: fix releasing planes when exiting odm thunderbolt: property: Fix a NULL pointer dereference e1000e: Disable runtime PM on CNP+ tinydrm/mipi-dbi: Use dma-safe buffers for all SPI transfers igb: Exclude device from suspend direct complete optimization media: si2165: fix a missing check of return value media: dvbsky: Avoid leaking dvb frontend media: m88ds3103: serialize reset messages in m88ds3103_set_frontend media: staging: davinci_vpfe: disallow building with COMPILE_TEST drm/amd/display: Fix Divide by 0 in memory calculations drm/amd/display: Set stream->mode_changed when connectors change scsi: ufs: fix a missing check of devm_reset_control_get media: vimc: stream: fix thread state before sleep media: gspca: do not resubmit URBs when streaming has stopped media: go7007: avoid clang frame overflow warning with KASAN media: vimc: zero the media_device on probe scsi: lpfc: Fix FDMI manufacturer attribute value scsi: lpfc: Fix fc4type information for FDMI media: saa7146: avoid high stack usage with clang scsi: lpfc: Fix SLI3 commands being issued on SLI4 devices spi : spi-topcliff-pch: Fix to handle empty DMA buffers drm/omap: dsi: Fix PM for display blank with paired dss_pll calls spi: rspi: Fix sequencer reset during initialization spi: imx: stop buffer overflow in RX FIFO flush spi: Fix zero length xfer bug ASoC: davinci-mcasp: Fix clang warning without CONFIG_PM drm/v3d: Handle errors from IRQ setup. drm/drv: Hold ref on parent device during drm_device lifetime drm: Wake up next in drm_read() chain if we are forced to putback the event drm/sun4i: dsi: Change the start delay calculation vfio-ccw: Prevent quiesce function going into an infinite loop drm/sun4i: dsi: Enforce boundaries on the start delay NFS: Fix a double unlock from nfs_match,get_client Linux 4.19.47 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-05-31 08:14:29 -07:00
Al Viro	7c2bcb3cca	acct_on(): don't mess with freeze protection commit 9419a3191dcb27f24478d288abaab697228d28e6 upstream. What happens there is that we are replacing file->path.mnt of a file we'd just opened with a clone and we need the write count contribution to be transferred from original mount to new one. That's it. We do NOT want any kind of freeze protection for the duration of switchover. IOW, we should just use __mnt_{want,drop}_write() for that switchover; no need to bother with mnt_{want,drop}_write() there. Tested-by: Amir Goldstein <amir73il@gmail.com> Reported-by: syzbot+2a73a6ea9507b7112141@syzkaller.appspotmail.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-31 06:46:05 -07:00
Daniel Rosenberg	a618d31ac8	ANDROID: mnt: Add filesystem private data to mount points This starts to add private data associated directly to mount points. The intent is to give filesystems a sense of where they have come from, as a means of letting a filesystem take different actions based on this information. Bug: 62094374 Bug: 120446149 Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2 Signed-off-by: Daniel Rosenberg <drosen@google.com> [astrachan: Folded 89a54ed3bf68 ("ANDROID: mnt: Fix next_descendent") into this patch] Signed-off-by: Alistair Strachan <astrachan@google.com>	2018-12-05 09:48:13 -08:00
Greg Kroah-Hartman	b24413180f	License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a /uapi/ one with no licensing information in it, - file was a /uapi/ one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non /uapi/ files that summary was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a /uapi/ path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the /uapi/ ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------\|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-02 11:10:55 +01:00
Kees Cook	3859a271a0	randstruct: Mark various structs for randomization This marks many critical kernel structures for randomization. These are structures that have been targeted in the past in security exploits, or contain functions pointers, pointers to function pointer tables, lists, workqueues, ref-counters, credentials, permissions, or are otherwise sensitive. This initial list was extracted from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Left out of this list is task_struct, which requires special handling and will be covered in a subsequent patch. Signed-off-by: Kees Cook <keescook@chromium.org>	2017-06-30 12:00:51 -07:00
Eric W. Biederman	93faccbbfa	fs: Better permission checking for submounts To support unprivileged users mounting filesystems two permission checks have to be performed: a test to see if the user allowed to create a mount in the mount namespace, and a test to see if the user is allowed to access the specified filesystem. The automount case is special in that mounting the original filesystem grants permission to mount the sub-filesystems, to any user who happens to stumble across the their mountpoint and satisfies the ordinary filesystem permission checks. Attempting to handle the automount case by using override_creds almost works. It preserves the idea that permission to mount the original filesystem is permission to mount the sub-filesystem. Unfortunately using override_creds messes up the filesystems ordinary permission checks. Solve this by being explicit that a mount is a submount by introducing vfs_submount, and using it where appropriate. vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let sget and friends know that a mount is a submount so they can take appropriate action. sget and sget_userns are modified to not perform any permission checks on submounts. follow_automount is modified to stop using override_creds as that has proven problemantic. do_mount is modified to always remove the new MS_SUBMOUNT flag so that we know userspace will never by able to specify it. autofs4 is modified to stop using current_real_cred that was put in there to handle the previous version of submount permission checking. cifs is modified to pass the mountpoint all of the way down to vfs_submount. debugfs is modified to pass the mountpoint all of the way down to trace_automount by adding a new parameter. To make this change easier a new typedef debugfs_automount_t is introduced to capture the type of the debugfs automount function. Cc: stable@vger.kernel.org Fixes: `069d5ac9ae` ("autofs: Fix automounts by using current_real_cred()->uid") Fixes: `aeaa4a79ff` ("fs: Call d_automount with the filesystems creds") Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2017-02-02 04:36:12 +13:00
Al Viro	9763f7a4a5	Merge branch 'work.autofs' into for-linus Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-16 16:34:52 -05:00
Al Viro	ca71cf71ee	namespace.c: constify struct path passed to a bunch of primitives Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-05 19:03:12 -05:00
Ian Kent	c6609c0a1c	vfs: add path_is_mountpoint() helper d_mountpoint() can only be used reliably to establish if a dentry is not mounted in any namespace. It isn't aware of the possibility there may be multiple mounts using a given dentry that may be in a different namespace. Add helper functions, path_is_mountpoint(), that checks if a struct path is a mountpoint for this case. Link: http://lkml.kernel.org/r/20161011053358.27645.9729.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-03 20:51:35 -05:00
Eric W. Biederman	d29216842a	mnt: Add a per mount namespace limit on the number of mounts CAI Qian <caiqian@redhat.com> pointed out that the semantics of shared subtrees make it possible to create an exponentially increasing number of mounts in a mount namespace. mkdir /tmp/1 /tmp/2 mount --make-rshared / for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done Will create create 2^20 or 1048576 mounts, which is a practical problem as some people have managed to hit this by accident. As such CVE-2016-6213 was assigned. Ian Kent <raven@themaw.net> described the situation for autofs users as follows: > The number of mounts for direct mount maps is usually not very large because of > the way they are implemented, large direct mount maps can have performance > problems. There can be anywhere from a few (likely case a few hundred) to less > than 10000, plus mounts that have been triggered and not yet expired. > > Indirect mounts have one autofs mount at the root plus the number of mounts that > have been triggered and not yet expired. > > The number of autofs indirect map entries can range from a few to the common > case of several thousand and in rare cases up to between 30000 and 50000. I've > not heard of people with maps larger than 50000 entries. > > The larger the number of map entries the greater the possibility for a large > number of active mounts so it's not hard to expect cases of a 1000 or somewhat > more active mounts. So I am setting the default number of mounts allowed per mount namespace at 100,000. This is more than enough for any use case I know of, but small enough to quickly stop an exponential increase in mounts. Which should be perfect to catch misconfigurations and malfunctioning programs. For anyone who needs a higher limit this can be changed by writing to the new /proc/sys/fs/mount-max sysctl. Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2016-09-30 12:46:48 -05:00
Andy Lutomirski	380cf5ba6b	fs: Treat foreign mounts as nosuid If a process gets access to a mount from a different user namespace, that process should not be able to take advantage of setuid files or selinux entrypoints from that filesystem. Prevent this by treating mounts from other mount namespaces and those not owned by current_user_ns() or an ancestor as nosuid. This will make it safer to allow more complex filesystems to be mounted in non-root user namespaces. This does not remove the need for MNT_LOCK_NOSUID. The setuid, setgid, and file capability bits can no longer be abused if code in a user namespace were to clear nosuid on an untrusted filesystem, but this patch, by itself, is insufficient to protect the system from abuse of files that, when execed, would increase MAC privilege. As a more concrete explanation, any task that can manipulate a vfsmount associated with a given user namespace already has capabilities in that namespace and all of its descendents. If they can cause a malicious setuid, setgid, or file-caps executable to appear in that mount, then that executable will only allow them to elevate privileges in exactly the set of namespaces in which they are already privileges. On the other hand, if they can cause a malicious executable to appear with a dangerous MAC label, running it could change the caller's security context in a way that should not have been possible, even inside the namespace in which the task is confined. As a hardening measure, this would have made CVE-2014-5207 much more difficult to exploit. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: James Morris <james.l.morris@oracle.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2016-06-24 10:40:41 -05:00
Linus Torvalds	8f502d5b9e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull usernamespace mount fixes from Eric Biederman: "Way back in October Andrey Vagin reported that umount(MNT_DETACH) could be used to defeat MNT_LOCKED. As I worked to fix this I discovered that combined with mount propagation and an appropriate selection of shared subtrees a reference to a directory on an unmounted filesystem is not necessary. That MNT_DETACH is allowed in user namespace in a form that can break MNT_LOCKED comes from my early misunderstanding what MNT_DETACH does. To avoid breaking existing userspace the conflict between MNT_DETACH and MNT_LOCKED is fixed by leaving mounts that are locked to their parents in the mount hash table until the last reference goes away. While investigating this issue I also found an issue with __detach_mounts. The code was unnecessarily and incorrectly triggering mount propagation. Resulting in too many mounts going away when a directory is deleted, and too many cpu cycles are burned while doing that. Looking some more I realized that __detach_mounts by only keeping mounts connected that were MNT_LOCKED it had the potential to still leak information so I tweaked the code to keep everything locked together that possibly could be. This code was almost ready last cycle but Al invented fs_pin which slightly simplifies this code but required rewrites and retesting, and I have not been in top form for a while so it took me a while to get all of that done. Similiarly this pull request is late because I have been feeling absolutely miserable all week. The issue of being able to escape a bind mount has not yet been addressed, as the fixes are not yet mature" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mnt: Update detach_mounts to leave mounts connected mnt: Fix the error check in __detach_mounts mnt: Honor MNT_LOCKED when detaching mounts fs_pin: Allow for the possibility that m_list or s_list go unused. mnt: Factor umount_mnt from umount_tree mnt: Factor out unhash_mnt from detach_mnt and umount_tree mnt: Fail collect_mounts when applied to unmounted mounts mnt: Don't propagate unmounts to locked mounts mnt: On an unmount propagate clearing of MNT_LOCKED mnt: Delay removal from the mount hash. mnt: Add MNT_UMOUNT flag mnt: In umount_tree reuse mnt_list instead of mnt_hash mnt: Don't propagate umounts in __detach_mounts mnt: Improve the umount_tree flags mnt: Use hlist_move_list in namespace_unlock	2015-04-18 11:20:31 -04:00
Dan Ehrenberg	e6e20a7a5f	init: export name_to_dev_t and mark name argument as const DM will switch its device lookup code to using name_to_dev_t() so it must be exported. Also, the @name argument should be marked const. Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:18 -04:00
Eric W. Biederman	590ce4bcbf	mnt: Add MNT_UMOUNT flag In some instances it is necessary to know if the the unmounting process has begun on a mount. Add MNT_UMOUNT to make that reliably testable. This fix gets used in fixing locked mounts in MNT_DETACH Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-04-02 20:34:18 -05:00
Miklos Szeredi	c771d683a6	vfs: introduce clone_private_mount() Overlayfs needs a private clone of the mount, so create a function for this and export to modules. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-10-24 00:14:36 +02:00
Linus Torvalds	f6f993328b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Stuff in here: - acct.c fixes and general rework of mnt_pin mechanism. That allows to go for delayed-mntput stuff, which will permit mntput() on deep stack without worrying about stack overflows - fs shutdown will happen on shallow stack. IOW, we can do Eric's umount-on-rmdir series without introducing tons of stack overflows on new mntput() call chains it introduces. - Bruce's d_splice_alias() patches - more Miklos' rename() stuff. - a couple of regression fixes (stable fodder, in the end of branch) and a fix for API idiocy in iov_iter.c. There definitely will be another pile, maybe even two. I'd like to get Eric's series in this time, but even if we miss it, it'll go right in the beginning of for-next in the next cycle - the tricky part of prereqs is in this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) fix copy_tree() regression __generic_file_write_iter(): fix handling of sync error after DIO switch iov_iter_get_pages() to passing maximal number of pages fs: mark __d_obtain_alias static dcache: d_splice_alias should detect loops exportfs: update Exporting documentation dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED dcache: remove unused d_find_alias parameter dcache: d_obtain_alias callers don't all want DISCONNECTED dcache: d_splice_alias should ignore DCACHE_DISCONNECTED dcache: d_splice_alias mustn't create directory aliases dcache: close d_move race in d_splice_alias dcache: move d_splice_alias namei: trivial fix to vfs_rename_dir comment VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode. cifs: support RENAME_NOREPLACE hostfs: support rename flags shmem: support RENAME_EXCHANGE shmem: support RENAME_NOREPLACE btrfs: add RENAME_NOREPLACE ...	2014-08-11 11:44:11 -07:00
Al Viro	3064c3563b	death to mnt_pinned Rather than playing silly buggers with vfsmount refcounts, just have acct_on() ask fs/namespace.c for internal clone of file->f_path.mnt and replace it with said clone. Then attach the pin to original vfsmount. Voila - the clone will be alive until the file gets closed, making sure that underlying superblock remains active, etc., and we can drop the original vfsmount, so that it's not kept busy. If the file lives until the final mntput of the original vfsmount, we'll notice that there's an fs_pin (one in bsd_acct_struct that holds that file) and mnt_pin_kill() will take it out. Since ->kill() is synchronous, we won't proceed past that point until these files are closed (and private clones of our vfsmount are gone), so we get the same ordering warranties we used to get. mnt_pin()/mnt_unpin()/->mnt_pinned is gone now, and good riddance - it never became usable outside of kernel/acct.c (and racy wrt umount even there). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-08-07 14:40:09 -04:00
Eric W. Biederman	9566d67428	mnt: Correct permission checks in do_remount While invesgiating the issue where in "mount --bind -oremount,ro ..." would result in later "mount --bind -oremount,rw" succeeding even if the mount started off locked I realized that there are several additional mount flags that should be locked and are not. In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime flags in addition to MNT_READONLY should all be locked. These flags are all per superblock, can all be changed with MS_BIND, and should not be changable if set by a more privileged user. The following additions to the current logic are added in this patch. - nosuid may not be clearable by a less privileged user. - nodev may not be clearable by a less privielged user. - noexec may not be clearable by a less privileged user. - atime flags may not be changeable by a less privileged user. The logic with atime is that always setting atime on access is a global policy and backup software and auditing software could break if atime bits are not updated (when they are configured to be updated), and serious performance degradation could result (DOS attack) if atime updates happen when they have been explicitly disabled. Therefore an unprivileged user should not be able to mess with the atime bits set by a more privileged user. The additional restrictions are implemented with the addition of MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME mnt flags. Taken together these changes and the fixes for MNT_LOCK_READONLY should make it safe for an unprivileged user to create a user namespace and to call "mount --bind -o remount,... ..." without the danger of mount flags being changed maliciously. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-31 17:12:34 -07:00
Eric W. Biederman	a6138db815	mnt: Only change user settable mount flags in remount Kenton Varda <kenton@sandstorm.io> discovered that by remounting a read-only bind mount read-only in a user namespace the MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user to the remount a read-only mount read-write. Correct this by replacing the mask of mount flags to preserve with a mask of mount flags that may be changed, and preserve all others. This ensures that any future bugs with this mask and remount will fail in an easy to detect way where new mount flags simply won't change. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-07-31 17:11:54 -07:00
Al Viro	f2ebb3a921	smarter propagate_mnt() The current mainline has copies propagated to all nodes, then tears down the copies we made for nodes that do not contain counterparts of the desired mountpoint. That sets the right propagation graph for the copies (at teardown time we move the slaves of removed node to a surviving peer or directly to master), but we end up paying a fairly steep price in useless allocations. It's fairly easy to create a situation where N calls of mount(2) create exactly N bindings, with O(N^2) vfsmounts allocated and freed in process. Fortunately, it is possible to avoid those allocations/freeings. The trick is to create copies in the right order and find which one would've eventually become a master with the current algorithm. It turns out to be possible in O(nodes getting propagation) time and with no extra allocations at all. One part is that we need to make sure that eventual master will be created before its slaves, so we need to walk the propagation tree in a different order - by peer groups. And iterate through the peers before dealing with the next group. Another thing is finding the (earlier) copy that will be a master of one we are about to create; to do that we are (temporary) marking the masters of mountpoints we are attaching the copies to. Either we are in a peer of the last mountpoint we'd dealt with, or we have the following situation: we are attaching to mountpoint M, the last copy S_0 had been attached to M_0 and there are sequences S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i}, S_{i} mounted on M{i} and we need to create a slave of the first S_{k} such that M is getting propagation from M_{k}. It means that the master of M_{k} will be among the sequence of masters of M. On the other hand, the nearest marked node in that sequence will either be the master of M_{k} or the master of M_{k-1} (the latter - in the case if M_{k-1} is a slave of something M gets propagation from, but in a wrong peer group). So we go through the sequence of masters of M until we find a marked one (P). Let N be the one before it. Then we go through the sequence of masters of S_0 until we find one (say, S) mounted on a node D that has P as master and check if D is a peer of N. If it is, S will be the master of new copy, if not - the master of S will be. That's it for the hard part; the rest is fairly simple. Iterator is in next_group(), handling of one prospective mountpoint is propagate_one(). It seems to survive all tests and gives a noticably better performance than the current mainline for setups that are seriously using shared subtrees. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-04-01 23:19:08 -04:00
Al Viro	48a066e72d	RCU'd vfsmounts * RCU-delayed freeing of vfsmounts * vfsmount_lock replaced with a seqlock (mount_lock) * sequence number from mount_lock is stored in nameidata->m_seq and used when we exit RCU mode * new vfsmount flag - MNT_SYNC_UMOUNT. Set by umount_tree() when its caller knows that vfsmount will have no surviving references. * synchronize_rcu() done between unlocking namespace_sem in namespace_unlock() and doing pending mntput(). * new helper: legitimize_mnt(mnt, seq). Checks the mount_lock sequence number against seq, then grabs reference to mnt. Then it rechecks mount_lock again to close the race and either returns success or drops the reference it has acquired. The subtle point is that in case of MNT_SYNC_UMOUNT we can simply decrement the refcount and sod off - aforementioned synchronize_rcu() makes sure that final mntput() won't come until we leave RCU mode. We need that, since we don't want to end up with some lazy pathwalk racing with umount() and stealing the final mntput() from it - caller of umount() may expect it to return only once the fs is shut down and we don't want to break that. In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do full-blown mntput() in case of mount_lock sequence number mismatch happening just as we'd grabbed the reference, but in those cases we won't be stealing the final mntput() from anything that would care. * mntput_no_expire() doesn't lock anything on the fast path now. Incidentally, SMP and UP cases are handled the same way - no ifdefs there. * normal pathname resolution does not do any writes to mount_lock. It does, of course, bump the refcounts of vfsmount and dentry in the very end, but that's it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-09 00:16:19 -05:00
Eric W. Biederman	5ff9d8a65c	vfs: Lock in place mounts from more privileged users When creating a less privileged mount namespace or propogating mounts from a more privileged to a less privileged mount namespace lock the submounts so they may not be unmounted individually in the child mount namespace revealing what is under them. This enforces the reasonable expectation that it is not possible to see under a mount point. Most of the time mounts are on empty directories and revealing that does not matter, however I have seen an occassionaly sloppy configuration where there were interesting things concealed under a mount point that probably should not be revealed. Expirable submounts are not locked because they will eventually unmount automatically so whatever is under them already needs to be safe for unprivileged users to access. From a practical standpoint these restrictions do not appear to be significant for unprivileged users of the mount namespace. Recursive bind mounts and pivot_root continues to work, and mounts that are created in a mount namespace may be unmounted there. All of which means that the common idiom of keeping a directory of interesting files and using pivot_root to throw everything else away continues to work just fine. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-07-24 09:14:46 -07:00
Eric W. Biederman	90563b198e	vfs: Add a mount flag to lock read only bind mounts When a read-only bind mount is copied from mount namespace in a higher privileged user namespace to a mount namespace in a lesser privileged user namespace, it should not be possible to remove the the read-only restriction. Add a MNT_LOCK_READONLY mount flag to indicate that a mount must remain read-only. CC: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2013-03-27 07:50:04 -07:00
Al Viro	c63181e6b6	vfs: move fsnotify junk to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:12 -05:00
Al Viro	52ba1621de	vfs: move mnt_devname Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:11 -05:00
Al Viro	1a4eeaf2a8	vfs: move mnt_list to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:11 -05:00
Al Viro	863d684f94	vfs: move the rest of int fields to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:10 -05:00
Al Viro	15169fe784	vfs: mnt_id/mnt_group_id moved Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:10 -05:00
Al Viro	143c8c91ce	vfs: mnt_ns moved to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:09 -05:00
Al Viro	6776db3d32	vfs: take mnt_share/mnt_slave/mnt_slave_list and mnt_expire to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:08 -05:00
Al Viro	d10e8def07	vfs: take mnt_master to struct mount make IS_MNT_SLAVE take struct mount * at the same time Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:08 -05:00
Al Viro	6b41d536f7	vfs: take mnt_child/mnt_mounts to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:06 -05:00
Al Viro	68e8a9feab	vfs: all counters taken to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:06 -05:00
Al Viro	a73324da7a	vfs: move mnt_mountpoint to struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:05 -05:00
Al Viro	3376f34fff	vfs: mnt_parent moved to struct mount the second victim... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:04 -05:00
Al Viro	1b8e5564b9	vfs: the first spoils - mnt_hash moved taken out of struct vfsmount into struct mount Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:57:02 -05:00
Al Viro	2a79f17e4a	vfs: mnt_drop_write_file() new helper (wrapper around mnt_drop_write()) to be used in pair with mnt_want_write_file(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:40 -05:00
Al Viro	79e801a906	vfs: make do_kern_mount() static the only user outside of fs/namespace.c has died Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:52:39 -05:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Al Viro	f03c65993b	sanitize vfsmount refcounting changes Instead of splitting refcount between (per-cpu) mnt_count and (SMP-only) mnt_longrefs, make all references contribute to mnt_count again and keep track of how many are longterm ones. Accounting rules for longterm count: * 1 for each fs_struct.root.mnt * 1 for each fs_struct.pwd.mnt * 1 for having non-NULL ->mnt_ns * decrement to 0 happens only under vfsmount lock exclusive That allows nice common case for mntput() - since we can't drop the final reference until after mnt_longterm has reached 0 due to the rules above, mntput() can grab vfsmount lock shared and check mnt_longterm. If it turns out to be non-zero (which is the common case), we know that this is not the final mntput() and can just blindly decrement percpu mnt_count. Otherwise we grab vfsmount lock exclusive and do usual decrement-and-check of percpu mnt_count. For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm(); namespace.c uses the latter in places where we don't already hold vfsmount lock exclusive and opencodes a few remaining spots where we need to manipulate mnt_longterm. Note that we mostly revert the code outside of fs/namespace.c back to what we used to have; in particular, normal code doesn't need to care about two kinds of references, etc. And we get to keep the optimization Nick's variant had bought us... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-16 13:47:07 -05:00
David Howells	ea5b778a8b	Unexport do_add_mount() and add in follow_automount(), not ->d_automount() Unexport do_add_mount() and make ->d_automount() return the vfsmount to be added rather than calling do_add_mount() itself. follow_automount() will then do the addition. This slightly complicates things as ->d_automount() normally wants to add the new vfsmount to an expiration list and start an expiration timer. The problem with that is that the vfsmount will be deleted if it has a refcount of 1 and the timer will not repeat if the expiration list is empty. To this end, we require the vfsmount to be returned from d_automount() with a refcount of (at least) 2. One of these refs will be dropped unconditionally. In addition, follow_automount() must get a 3rd ref around the call to do_add_mount() lest it eat a ref and return an error, leaving the mount we have open to being expired as we would otherwise have only 1 ref on it. d_automount() should also add the the vfsmount to the expiration list (by calling mnt_set_expiry()) and start the expiration timer before returning, if this mechanism is to be used. The vfsmount will be unlinked from the expiration list by follow_automount() if do_add_mount() fails. This patch also fixes the call to do_add_mount() for AFS to propagate the mount flags from the parent vfsmount. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:48 -05:00
Nick Piggin	b3e19d924b	fs: scale mntget/mntput The problem that this patch aims to fix is vfsmount refcounting scalability. We need to take a reference on the vfsmount for every successful path lookup, which often go to the same mount point. The fundamental difficulty is that a "simple" reference count can never be made scalable, because any time a reference is dropped, we must check whether that was the last reference. To do that requires communication with all other CPUs that may have taken a reference count. We can make refcounts more scalable in a couple of ways, involving keeping distributed counters, and checking for the global-zero condition less frequently. - check the global sum once every interval (this will delay zero detection for some interval, so it's probably a showstopper for vfsmounts). - keep a local count and only taking the global sum when local reaches 0 (this is difficult for vfsmounts, because we can't hold preempt off for the life of a reference, so a counter would need to be per-thread or tied strongly to a particular CPU which requires more locking). - keep a local difference of increments and decrements, which allows us to sum the total difference and hence find the refcount when summing all CPUs. Then, keep a single integer "long" refcount for slow and long lasting references, and only take the global sum of local counters when the long refcount is 0. This last scheme is what I implemented here. Attached mounts and process root and working directory references are "long" references, and everything else is a short reference. This allows scalable vfsmount references during path walking over mounted subtrees and unattached (lazy umounted) mounts with processes still running in them. This results in one fewer atomic op in the fastpath: mntget is now just a per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock and non-atomic decrement in the common case. However code is otherwise bigger and heavier, so single threaded performance is basically a wash. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:33 +11:00
Miklos Szeredi	532490f0a5	vfs: remove unused MNT_STRICTATIME Commit `d0adde574b` added MNT_STRICTATIME but it isn't actually used (MS_STRICTATIME clears MNT_RELATIME and MNT_NOATIME rather than setting any mount flag). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-11 00:29:47 -04:00
Andreas Gruenbacher	2504c5d63b	fsnotify/vfsmount: add fsnotify fields to struct vfsmount This patch adds the list and mask fields needed to support vfsmount marks. These are the same fields fsnotify needs on an inode. They are not used, just declared and we note where the cleanup hook should be (the function is not yet defined) Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-07-28 09:58:57 -04:00
Linus Torvalds	0f2cc4ecd8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) init: Open /dev/console from rootfs mqueue: fix typo "failues" -> "failures" mqueue: only set error codes if they are really necessary mqueue: simplify do_open() error handling mqueue: apply mathematics distributivity on mq_bytes calculation mqueue: remove unneeded info->messages initialization mqueue: fix mq_open() file descriptor leak on user-space processes fix race in d_splice_alias() set S_DEAD on unlink() and non-directory rename() victims vfs: add NOFOLLOW flag to umount(2) get rid of ->mnt_parent in tomoyo/realpath hppfs can use existing proc_mnt, no need for do_kern_mount() in there Mirror MS_KERNMOUNT in ->mnt_flags get rid of useless vfsmount_lock use in put_mnt_ns() Take vfsmount_lock to fs/internal.h get rid of insanity with namespace roots in tomoyo take check for new events in namespace (guts of mounts_poll()) to namespace.c Don't mess with generic_permission() under ->d_lock in hpfs sanitize const/signedness for udf nilfs: sanitize const/signedness in dealing with ->d_name.name ... Fix up fairly trivial (famous last words...) conflicts in drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c	2010-03-04 08:15:33 -08:00
Al Viro	8089352a13	Mirror MS_KERNMOUNT in ->mnt_flags Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:08:00 -05:00
Al Viro	47cd813f29	Take vfsmount_lock to fs/internal.h no more users left outside of fs/*.c (and very few outside of fs/namespace.c, actually) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:07:59 -05:00
Valerie Aurora	495d6c9c65	VFS: Clean up shared mount flag propagation The handling of mount flags in set_mnt_shared() got a little tangled up during previous cleanups, with the following problems: * MNT_PNODE_MASK is defined as a literal constant when it should be a bitwise xor of other MNT_* flags * set_mnt_shared() clears and then sets MNT_SHARED (part of MNT_PNODE_MASK) * MNT_PNODE_MASK could use a comment in mount.h * MNT_PNODE_MASK is a terrible name, change to MNT_SHARED_MASK This patch fixes these problems. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-03-03 14:07:55 -05:00
Tejun Heo	003cb608a2	percpu: add __percpu sparse annotations to fs Add __percpu sparse annotations to fs. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk>	2010-02-17 11:17:38 +09:00

1 2

84 Commits