* refs/heads/tmp-93083852:
Linux 4.19.69
rxrpc: Fix local refcounting
rxrpc: Fix local endpoint replacement
rxrpc: Fix read-after-free in rxrpc_queue_local()
rxrpc: Fix local endpoint refcounting
powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GB
dm zoned: fix potential NULL dereference in dmz_do_reclaim()
xfs: always rejoin held resources during defer roll
xfs: Add attibute remove and helper functions
xfs: Add attibute set and helper functions
xfs: Add helper function xfs_attr_try_sf_addname
xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
xfs: don't trip over uninitialized buffer on extent read of corrupted inode
xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT
mm/zsmalloc.c: fix race condition in zs_destroy_pool
mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
mm, page_owner: handle THP splits correctly
genirq: Properly pair kobject_del() with kobject_add()
dm zoned: properly handle backing device failure
dm zoned: improve error handling in i/o map code
dm zoned: improve error handling in reclaim
dm table: fix invalid memory accesses with too high sector number
dm space map metadata: fix missing store of apply_bops() return value
dm raid: add missing cleanup in raid_ctr()
dm integrity: fix a crash due to BUG_ON in __journal_read_write()
dm btree: fix order of block initialization in btree_split_beneath
dm kcopyd: always complete failed jobs
x86/boot: Fix boot regression caused by bootparam sanitizing
x86/boot: Save fields explicitly, zero out everything else
x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
x86/apic: Handle missing global clockevent gracefully
x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
Drivers: hv: vmbus: Fix virt_to_hvpfn() for X86_PAE
gpiolib: never report open-drain/source lines as 'input' to user-space
drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUX
libceph: fix PG split vs OSD (re)connect race
ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply
ceph: clear page dirty before invalidate page
clk: socfpga: stratix10: fix rate caclulationg for cnt_clks
Revert "dm bufio: fix deadlock with loop device"
HID: wacom: Correct distance scale for 2nd-gen Intuos devices
HID: wacom: correct misreported EKR ring values
selftests: kvm: Adding config fragments
KVM: arm: Don't write junk to CP15 registers on reset
KVM: arm64: Don't write junk to sysregs on reset
perf pmu-events: Fix missing "cpu_clk_unhalted.core" event
perf cpumap: Fix writing to illegal memory in handling cpumap mask
perf ftrace: Fix failure to set cpumask when only one cpu is present
block, bfq: handle NULL return value by bfq_init_rq()
drm/vmwgfx: fix memory leak when too many retries have occurred
x86/lib/cpu: Address missing prototypes warning
libata: add SG safety checks in SFF pio transfers
libata: have ata_scsi_rw_xlat() fail invalid passthrough requests
net: hisilicon: Fix dma_map_single failed on arm64
net: hisilicon: fix hip04-xmit never return TX_BUSY
net: hisilicon: make hip04_tx_reclaim non-reentrant
net: stmmac: tc: Do not return a fragment entry
net: stmmac: Fix issues when number of Queues >= 4
net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
s390: put _stext and _etext into .text section
SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
SMB3: Fix potential memory leak when processing compound chain
drm/rockchip: Suspend DP late
HID: input: fix a4tech horizontal wheel custom usage
HID: quirks: Set the INCREMENT_USAGE_ON_DUPLICATE quirk on Saitek X52
NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts
NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
net/ethernet/qlogic/qed: force the string buffer NULL-terminated
can: peak_usb: force the string buffer NULL-terminated
can: sja1000: force the string buffer NULL-terminated
perf bench numa: Fix cpu0 binding
net: phy: phy_led_triggers: Fix a possible null-pointer dereference in phy_led_trigger_change_speed()
isdn: hfcsusb: Fix mISDN driver crash caused by transfer buffer on the stack
rxrpc: Fix the lack of notification when sendmsg() fails on a DATA packet
rxrpc: Fix potential deadlock
netfilter: ipset: Fix rename concurrency with listing
netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets
netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too
mac80211_hwsim: Fix possible null-pointer dereferences in hwsim_dump_radio_nl()
isdn: mISDN: hfcsusb: Fix possible null-pointer dereferences in start_isoc_chain()
qed: RDMA - Fix the hw_ver returned in device attributes
net: usb: qmi_wwan: Add the BroadMobi BM818 card
ASoC: ti: davinci-mcasp: Correct slot_width posed constraint
ASoC: rockchip: Fix mono capture
st_nci_hci_connectivity_event_received: null check the allocation
st21nfca_connectivity_event_received: null check the allocation
ASoC: Fail card instantiation if DAI format setup fails
can: gw: Fix error path of cgw_module_init
can: mcp251x: add error check when wq alloc failed
can: dev: call netif_carrier_off() in register_candev()
selftests: forwarding: gre_multipath: Fix flower filters
selftests: forwarding: gre_multipath: Enable IPv4 forwarding
net: mvpp2: Don't check for 3 consecutive Idle frames for 10G links
bonding: Force slave speed check after link state recovery for 802.3ad
selftests/bpf: fix sendmsg6_prog on s390
ASoC: dapm: Fix handling of custom_stop_condition on DAPM graph walks
netfilter: ebtables: fix a memory leak bug in compat
mips: fix cacheinfo
MIPS: kernel: only use i8253 clocksource with periodic clockevent
HID: Add 044f:b320 ThrustMaster, Inc. 2 in 1 DT
Conflicts:
fs/userfaultfd.c
Change-Id: I35868bf90a3b2693b033ff302fbbb0dfd36b43fa
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
commit a561372405cf6bc6f14239b3a9e57bb39f2788b0 upstream.
We can't rely on ->peer_features in calc_target() because it may be
called both when the OSD session is established and open and when it's
not. ->peer_features is not valid unless the OSD session is open. If
this happens on a PG split (pg_num increase), that could mean we don't
resend a request that should have been resent, hanging the client
indefinitely.
In userspace this was fixed by looking at require_osd_release and
get_xinfo[osd].features fields of the osdmap. However these fields
belong to the OSD section of the osdmap, which the kernel doesn't
decode (only the client section is decoded).
Instead, let's drop this feature check. It effectively checks for
luminous, so only pre-luminous OSDs would be affected in that on a PG
split the kernel might resend a request that should not have been
resent. Duplicates can occur in other scenarios, so both sides should
already be prepared for them: see dup/replay logic on the OSD side and
retry_attempt check on the client side.
Cc: stable@vger.kernel.org
Fixes: 7de030d6b1 ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
Link: https://tracker.ceph.com/issues/41162
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Jerry Lee <leisurelysw24@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-6f994bf:
Revert "ANDROID: sched: Disable find_best_target() by default"
ANDROID: cpufreq: times: don't copy invalid freqs from freq table
UPSTREAM: filemap: add a comment about FAULT_FLAG_RETRY_NOWAIT behavior
BACKPORT: filemap: drop the mmap_sem for all blocking operations
BACKPORT: filemap: kill page_cache_read usage in filemap_fault
UPSTREAM: filemap: pass vm_fault to the mmap ra helpers
ANDROID: binder: remove extra declaration left after backport
FROMGIT: binder: fix BUG_ON found by selinux-testsuite
ANDROID: sched: Disable find_best_target() by default
ANDROID: sched/fair: Make the EAS wake-up prefer-idle aware
Linux 4.19.32
power: supply: charger-manager: Fix incorrect return value
ALSA: hda - Enforces runtime_resume after S3 and S4 for each codec
ALSA: hda - Record the current power state before suspend/resume calls
locking/lockdep: Add debug_locks check in __lock_downgrade()
x86/unwind: Add hardcoded ORC entry for NULL
x86/unwind: Handle NULL pointer calls better in frame unwinder
loop: access lo_backing_file only when the loop device is Lo_bound
netfilter: ebtables: remove BUGPRINT messages
f2fs: fix to avoid deadlock of atomic file operations
RDMA/cma: Rollback source IP address if failing to acquire device
drm: Reorder set_property_atomic to avoid returning with an active ww_ctx
Bluetooth: hci_ldisc: Postpone HCI_UART_PROTO_READY bit set in hci_uart_set_proto()
Bluetooth: hci_ldisc: Initialize hci_dev before open()
Bluetooth: Fix decrementing reference count twice in releasing socket
Bluetooth: hci_uart: Check if socket buffer is ERR_PTR in h4_recv_buf()
media: v4l2-ctrls.c/uvc: zero v4l2_event
ext4: brelse all indirect buffer in ext4_ind_remove_space()
ext4: fix data corruption caused by unaligned direct AIO
ext4: fix NULL pointer dereference while journal is aborted
ALSA: ac97: Fix of-node refcount unbalance
ALSA: hda/ca0132 - make pci_iounmap() call conditional
ALSA: x86: Fix runtime PM for hdmi-lpe-audio
SMB3: Fix SMB3.1.1 guest mounts to Samba
irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp
objtool: Move objtool_file struct off the stack
perf probe: Fix getting the kernel map
cifs: allow guest mounts to work for smb3.11
futex: Ensure that futex address is aligned in handle_futex_death()
scsi: ibmvscsi: Fix empty event pool access during host removal
scsi: ibmvscsi: Protect ibmvscsi_head from concurrent modificaiton
powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038
MIPS: Fix kernel crash for R6 in jump label branch function
MIPS: Ensure ELF appended dtb is relocated
mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction.
udf: Fix crash on IO error during truncate
libceph: wait for latest osdmap in ceph_monc_blacklist_add()
iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE
drm/vmwgfx: Return 0 when gmrid::get_node runs out of ID's
drm/vmwgfx: Don't double-free the mode stored in par->set_mode
mmc: renesas_sdhi: limit block count to 16 bit for old revisions
mmc: mxcmmc: "Revert mmc: mxcmmc: handle highmem pages"
mmc: pxamci: fix enum type confusion
ALSA: firewire-motu: use 'version' field of unit directory to identify model
ALSA: hda - add Lenovo IdeaCentre B550 to the power_save_blacklist
ANDROID: dm-bow: Fix 32 bit compile errors
UPSTREAM: sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity
UPSTREAM: sched/fair: Update scale invariance of PELT
UPSTREAM: sched/fair: Move the rq_of() helper function
UPSTREAM: sched/fair: Remove setting task's se->runnable_weight during PELT update
ANDROID: Add dm-bow to cuttlefish configuration
UPSTREAM: binder: fix handling of misaligned binder object
UPSTREAM: binder: fix sparse issue in binder_alloc_selftest.c
BACKPORT: binder: use userspace pointer as base of buffer space
UPSTREAM: binder: fix kerneldoc header for struct binder_buffer
BACKPORT: binder: remove user_buffer_offset
UPSTREAM: binder: remove kernel vm_area for buffer space
UPSTREAM: binder: avoid kernel vm_area for buffer fixups
BACKPORT: binder: add function to copy binder object from buffer
BACKPORT: binder: add functions to copy to/from binder buffers
UPSTREAM: binder: create userspace-to-binder-buffer copy function
ANDROID: dm-bow: Add dm-bow feature
f2fs: set pin_file under CAP_SYS_ADMIN
f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
f2fs: fix to do sanity check with inode.i_inline_xattr_size
f2fs: give some messages for inline_xattr_size
f2fs: don't trigger read IO for beyond EOF page
f2fs: fix to add refcount once page is tagged PG_private
f2fs: remove wrong comment in f2fs_invalidate_page()
f2fs: fix to use kvfree instead of kzfree
f2fs: print more parameters in trace_f2fs_map_blocks
f2fs: trace f2fs_ioc_shutdown
f2fs: fix to avoid deadlock of atomic file operations
f2fs: fix to dirty inode for i_mode recovery
f2fs: give random value to i_generation
f2fs: no need to take page lock in readdir
f2fs: fix to update iostat correctly in IPU path
f2fs: fix encrypted page memory leak
f2fs: make fault injection covering __submit_flush_wait()
f2fs: fix to retry fill_super only if recovery failed
f2fs: silence VM_WARN_ON_ONCE in mempool_alloc
f2fs: correct spelling mistake
f2fs: fix wrong #endif
f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG
f2fs: don't allow negative ->write_io_size_bits
f2fs: fix to check inline_xattr_size boundary correctly
Revert "f2fs: fix to avoid deadlock of atomic file operations"
Revert "f2fs: fix to check inline_xattr_size boundary correctly"
f2fs: do not use mutex lock in atomic context
f2fs: fix potential data inconsistence of checkpoint
f2fs: fix to avoid deadlock of atomic file operations
f2fs: fix to check inline_xattr_size boundary correctly
f2fs: jump to label 'free_node_inode' when failing from d_make_root()
f2fs: fix to document inline_xattr_size option
f2fs: fix to data block override node segment by mistake
f2fs: fix typos in code comments
f2fs: use xattr_prefix to wrap up
f2fs: sync filesystem after roll-forward recovery
f2fs: flush quota blocks after turnning it off
f2fs: avoid null pointer exception in dcc_info
f2fs: don't wake up too frequently, if there is lots of IOs
f2fs: try to keep CP_TRIMMED_FLAG after successful umount
f2fs: add quick mode of checkpoint=disable for QA
f2fs: run discard jobs when put_super
f2fs: fix to set sbi dirty correctly
f2fs: fix to initialize variable to avoid UBSAN/smatch warning
f2fs: UBSAN: set boolean value iostat_enable correctly
f2fs: add brackets for macros
f2fs: check if file namelen exceeds max value
f2fs: fix to trigger fsck if dirent.name_len is zero
f2fs: no need to check return value of debugfs_create functions
f2fs: export FS_NOCOW_FL flag to user
f2fs: check inject_rate validity during configuring
f2fs: remove set but not used variable 'err'
f2fs: fix compile warnings: 'struct *' declared inside parameter list
f2fs: change error code to -ENOMEM from -EINVAL
Conflicts:
drivers/md/Kconfig
kernel/sched/fair.c
Change-Id: I2c6ba055f1160864446c87507a7fd7c8249ad885
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
commit bb229bbb3bf63d23128e851a1f3b85c083178fa1 upstream.
Because map updates are distributed lazily, an OSD may not know about
the new blacklist for quite some time after "osd blacklist add" command
is completed. This makes it possible for a blacklisted but still alive
client to overwrite a post-blacklist update, resulting in data
corruption.
Waiting for latest osdmap in ceph_monc_blacklist_add() and thus using
the post-blacklist epoch for all post-blacklist requests ensures that
all such requests "wait" for the blacklist to come into force on their
respective OSDs.
Cc: stable@vger.kernel.org
Fixes: 6305a3b415 ("libceph: support for blacklisting clients")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-c97d2b5:
Linux 4.19.26
net: phylink: avoid resolving link state too early
pinctrl: max77620: Use define directive for max77620_pinconf_param values
udlfb: handle unplug properly
netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in clusterip_config_entry_put()
netfilter: nfnetlink_osf: add missing fmatch check
netfilter: ipv6: Don't preserve original oif for loopback address
netfilter: nft_compat: use-after-free when deleting targets
netfilter: nf_tables: fix flush after rule deletion in the same batch
Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
staging: erofs: unzip_vle_lz4.c,utils.c: rectify BUG_ONs
staging: erofs: unzip_{pagevec.h,vle.c}: rectify BUG_ONs
staging: erofs: {dir,inode,super}.c: rectify BUG_ONs
staging: erofs: add a full barrier in erofs_workgroup_unfreeze
staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}'
staging: erofs: atomic_cond_read_relaxed on ref-locked workgroup
staging: erofs: remove the redundant d_rehash() for the root dentry
staging: erofs: drop multiref support temporarily
staging: erofs: replace BUG_ON with DBG_BUGON in data.c
staging: erofs: complete error handing of z_erofs_do_read_page
staging: erofs: fix a bug when appling cache strategy
net: avoid false positives in untrusted gso validation
net: validate untrusted gso packets without csum offload
kvm: x86: Return LA57 feature based on hardware capability
mac80211: allocate tailroom for forwarded mesh packets
drm/amd/display: Fix MST reboot/poweroff sequence
drm/i915/fbdev: Actually configure untiled displays
gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
drm/amdgpu: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
ARC: define ARCH_SLAB_MINALIGN = 8
ARC: U-boot: check arguments paranoidly
ARCv2: Enable unaligned access in early ASM code
parisc: Fix ptrace syscall number modification
KEYS: always initialize keyring_index_key::desc_len
KEYS: user: Align the payload buffer
RDMA/srp: Rework SCSI device reset handling
net/mlx5e: XDP, fix redirect resources availability check
net_sched: fix two more memory leaks in cls_tcindex
net_sched: fix a memory leak in cls_tcindex
net_sched: fix a race condition in tcindex_destroy()
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
geneve: should not call rt6_lookup() when ipv6 was disabled
net: socket: make bond ioctls go through compat_ifreq_ioctl()
net: socket: fix SIOCGIFNAME in compat
Revert "kill dev_ifsioc()"
Revert "socket: fix struct ifreq size in compat ioctl"
team: avoid complex list operations in team_nl_cmd_options_set()
sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
net: sfp: do not probe SFP module before we're attached
net/packet: fix 4gb buffer limit due to overflow check
net/mlx5e: Don't overwrite pedit action when multiple pedit used
net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
net: ena: fix race between link up and device initalization
ipv6: propagate genlmsg_reply return code
inet_diag: fix reporting cgroup classid and fallback to priority
batman-adv: fix uninit-value in batadv_interface_tx()
isdn: avm: Fix string plus integer warning from Clang
net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
selftests: forwarding: Add a test case for externally learned FDB entries
mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
net: bridge: Mark FDB entries that were added by user as such
mlxsw: pci: Return error on PCI reset timeout
dpaa_eth: NETIF_F_LLTX requires to do our own update of trans_start
bpf: bpf_setsockopt: reset sock dst on SO_MARK changes
leds: lp5523: fix a missing check of return value of lp55xx_read
hwmon: (tmp421) Correct the misspelling of the tmp442 compatible attribute in OF device ID table
atm: he: fix sign-extension overflow on large shift
selftests/bpf: retry tests that expect build-id
bpf: zero out build_id for BPF_STACK_BUILD_ID_IP
bpf: don't assume build-id length is always 20 bytes
afs: Fix key refcounting in file locking code
afs: Don't set vnode->cb_s_break in afs_validate()
selftests: tc-testing: fix parsing of ife type
selftests: tc-testing: fix tunnel_key failure if dst_port is unspecified
selftests: tc-testing: drop test on missing tunnel key id
pvcalls-front: fix potential null dereference
drm/sun4i: backend: add missing of_node_puts
vhost: return EINVAL if iovecs size does not match the message size
drm/amd/display: fix PME notification not working in RV desktop
drm/amdkfd: Don't assign dGPUs to APU topology devices
drm/meson: add missing of_node_put
always clear the X2APIC_ENABLE bit for PV guest
netfilter: nft_flow_offload: fix checking method of conntrack helper
scsi: cxgb4i: add wait_for_completion()
scsi: ufs: Fix geometry descriptor size
scsi: qedi: Add ep_state for login completion on un-reachable targets
scsi: ufs: Fix system suspend status
scsi: tcmu: avoid cmd/qfull timers updated whenever a new cmd comes
isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
net: stmmac: Prevent RX starvation in stmmac_napi_poll()
net: stmmac: Fix the logic of checking if RX Watchdog must be enabled
net: stmmac: Check if CBS is supported before configuring
net: stmmac: dwxgmac2: Only clear interrupts that are active
net: stmmac: Fix PCI module removal leak
acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id()
powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.
RDMA/mthca: Clear QP objects during their allocation
netfilter: nft_flow_offload: fix interaction with vrf slave device
bpf: fix panic in stack_map_get_build_id() on i386 and arm32
pvcalls-front: Avoid get_free_pages(GFP_KERNEL) under spinlock
bpf: correctly set initial window on active Fast Open sender
netfilter: nft_flow_offload: Fix reverse route lookup
MIPS: jazz: fix 64bit build
include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
scsi: isci: initialize shost fully before calling scsi_add_host()
scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
netfilter: nf_tables: fix leaking object reference count
selftests: forwarding: Add a test for VLAN deletion
mlxsw: spectrum_acl: Add cleanup after C-TCAM update error condition
xprtrdma: Double free in rpcrdma_sendctxs_create()
MIPS: ath79: Enable OF serial ports in the default config
net/mlx4: Get rid of page operation after dma_alloc_coherent
watchdog: mt7621_wdt/rt2880_wdt: Fix compilation problem
selftests/bpf: Test [::] -> [::1] rewrite in sys_sendmsg in test_sock_addr
bpf: Fix [::] -> [::1] rewrite in sys_sendmsg
net: hns: Fix use after free identified by SLUB debug
qed: Fix qed_ll2_post_rx_buffer_notify_fw() by adding a write memory barrier
qed: Fix qed_chain_set_prod() for PBL chains with non power of 2 page count
xen/pvcalls: remove set but not used variable 'intf'
mfd: mc13xxx: Fix a missing check of a register-read failure
mfd: tps65218: Use devm_regmap_add_irq_chip and clean up error path in probe()
mfd: cros_ec_dev: Add missing mfd_remove_devices() call in remove
mfd: axp20x: Add supported cells for AXP803
mfd: axp20x: Re-align MFD cell entries
mfd: axp20x: Add AC power supply cell for AXP813
mfd: wm5110: Add missing ASRC rate register
mfd: qcom_rpm: write fw_version to CTRL_REG
mfd: bd9571mwv: Add volatile register to make DVFS work
mfd: ab8500-core: Return zero in get_register_interruptible()
mfd: mt6397: Do not call irq_domain_remove if PMIC unsupported
mfd: db8500-prcmu: Fix some section annotations
mfd: twl-core: Fix section annotations on {,un}protect_pm_master
pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read
pvcalls-front: properly allocate sk
pvcalls-front: don't try to free unallocated rings
pvcalls-front: read all data before closing the connection
mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells
backlight: pwm_bl: Fix devicetree parsing with auto-generated brightness tables
KEYS: allow reaching the keys quotas exactly
ALSA: hda/realtek: Disable PC beep in passthrough on alc285
ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5
proc, oom: do not report alien mms when setting oom_score_adj
numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES
ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
libceph: handle an empty authorize reply
mac80211: Free mpath object when rhashtable insertion fails
mac80211: Use linked list instead of rhashtable walk for mesh tables
mac80211: Restore vif beacon interval if start ap fails
gpio: pxa: avoid attempting to set pin direction via pinctrl on MMP2
gpio: MT7621: use a per instance irq_chip structure
MIPS: eBPF: Always return sign extended 32b values
tracing: Fix number of entries in trace header
ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction
Change-Id: Ie585d8274f881ac87155e9deda341c43cd8923b4
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
* refs/heads/tmp-0755dc9:
Linux 4.19.22
svcrdma: Remove max_sge check at connect time
svcrdma: Reduce max_send_sges
batman-adv: Force mac header to start of data on xmit
batman-adv: Avoid WARN on net_device without parent in netns
xfrm: refine validation of template and selector families
libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive()
Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
xfrm: Make set-mark default behavior backward compatible
SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT
drm/vmwgfx: Return error code from vmw_execbuf_copy_fence_user
drm/vmwgfx: Fix setting of dma masks
drm/i915: always return something on DDI clock selection
drm/amd/powerplay: Fix missing break in switch
drm/modes: Prevent division by zero htotal
mac80211: ensure that mgmt tx skbs have tailroom for encryption
mic: vop: Fix use-after-free on remove
powerpc/radix: Fix kernel crash with mremap()
firmware: arm_scmi: provide the mandatory device release callback
ARM: dts: da850: fix interrupt numbers for clocksource
ARM: tango: Improve ARCH_MULTIPLATFORM compatibility
ARM: iop32x/n2100: fix PCI IRQ mapping
MIPS: VDSO: Include $(ccflags-vdso) in o32,n32 .lds builds
mips: loongson64: remove unreachable(), fix loongson_poweroff().
MIPS: VDSO: Use same -m%-float cflag as the kernel proper
MIPS: OCTEON: don't set octeon_dma_bar_type if PCI is disabled
mips: cm: reprime error cause
tracing: uprobes: Fix typo in pr_fmt string
pinctrl: cherryview: fix Strago DMI workaround
pinctrl: sunxi: Correct number of IRQ banks on H6 main pin controller
debugfs: fix debugfs_rename parameter checking
samples: mei: use /dev/mei0 instead of /dev/mei
mei: me: add ice lake point device id.
misc: vexpress: Off by one in vexpress_syscfg_exec()
signal: Better detection of synchronous signals
signal: Always notice exiting tasks
iio: ti-ads8688: Update buffer allocation for timestamps
iio: chemical: atlas-ph-sensor: correct IIO_TEMP values to millicelsius
iio: adc: axp288: Fix TS-pin handling
tools: iio: iio_generic_buffer: make num_loops signed
libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
mtd: rawnand: gpmi: fix MX28 bus master lockup problem
mtd: spinand: Fix the error/cleanup path in spinand_init()
mtd: spinand: Handle the case where PROGRAM LOAD does not reset the cache
mtd: Make sure mtd->erasesize is valid even if the partition is of size 0
ANDROID: cuttlefish: enable CONFIG_NET_SCH_NETEM=y
Add XFRM-I to cuttlefish defconfigs
ANDROID: Move from clang r346389b to r349610.
Change-Id: Ie249267aa9e0d4eb169adecafc0cdc59a0a2eb0f
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
commit 0fd3fd0a9bb0b02b6435bb7070e9f7b82a23f068 upstream.
The authorize reply can be empty, for example when the ticket used to
build the authorizer is too old and TAG_BADAUTHORIZER is returned from
the service. Calling ->verify_authorizer_reply() results in an attempt
to decrypt and validate (somewhat) random data in au->buf (most likely
the signature block from calc_signature()), which fails and ends up in
con_fault_finish() with !con->auth_retry. The ticket isn't invalidated
and the connection is retried again and again until a new ticket is
obtained from the monitor:
    libceph: osd2 192.168.122.1:6809 bad authorize reply
    libceph: osd2 192.168.122.1:6809 bad authorize reply
    libceph: osd2 192.168.122.1:6809 bad authorize reply
    libceph: osd2 192.168.122.1:6809 bad authorize reply
Let TAG_BADAUTHORIZER handler kick in and increment con->auth_retry.
Cc: stable@vger.kernel.org
Fixes: 5c056fdc5b ("libceph: verify authorize reply on connect")
Link: https://tracker.ceph.com/issues/20164
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4aac9228d16458cedcfd90c7fb37211cf3653ac3 upstream.
con_fault() can transition the connection into STANDBY right after
ceph_con_keepalive() clears STANDBY in clear_standby():
    libceph user thread               ceph-msgr worker

    ceph_con_keepalive()
      mutex_lock(&con->mutex)
      clear_standby(con)
      mutex_unlock(&con->mutex)
                                      mutex_lock(&con->mutex)
                                      con_fault()
                                        ...
                                        if KEEPALIVE_PENDING isn't set
                                          set state to STANDBY
                                        ...
                                      mutex_unlock(&con->mutex)
      set KEEPALIVE_PENDING
      set WRITE_PENDING
This triggers warnings in clear_standby() when either ceph_con_send()
or ceph_con_keepalive() get to clearing STANDBY next time.
I don't see a reason to condition queue_con() call on the previous
value of KEEPALIVE_PENDING, so move the setting of KEEPALIVE_PENDING
into the critical section -- unlike WRITE_PENDING, KEEPALIVE_PENDING
could have been a non-atomic flag.
Reported-by: syzbot+acdeb633f6211ccdf886@syzkaller.appspotmail.com
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Myungho Jung <mhjungk@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
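The ordering change described in the KEEPALIVE_PENDING commit message above can be sketched in userspace. This is a minimal model under stated assumptions, not the libceph code: toy_con, toy_keepalive() and toy_fault() are hypothetical names, a pthread mutex stands in for con->mutex, and the flag is a plain bool guarded by that mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum toy_state { TOY_OPEN, TOY_STANDBY };

/* Hypothetical stand-in for struct ceph_connection. */
struct toy_con {
	pthread_mutex_t mutex;
	enum toy_state state;
	bool keepalive_pending;	/* guarded by mutex in this sketch */
};

void toy_con_init(struct toy_con *con)
{
	assert(con != NULL);
	pthread_mutex_init(&con->mutex, NULL);
	con->state = TOY_STANDBY;
	con->keepalive_pending = false;
}

/* User thread side: leave STANDBY and request a keepalive. */
void toy_keepalive(struct toy_con *con)
{
	pthread_mutex_lock(&con->mutex);
	con->state = TOY_OPEN;		/* models clear_standby() */
	con->keepalive_pending = true;	/* set before the mutex is dropped */
	pthread_mutex_unlock(&con->mutex);
	/* Setting WRITE_PENDING and queueing the worker would follow here. */
}

/* Worker side: a fault may only park the connection if nothing is pending. */
void toy_fault(struct toy_con *con)
{
	pthread_mutex_lock(&con->mutex);
	if (!con->keepalive_pending)
		con->state = TOY_STANDBY;
	pthread_mutex_unlock(&con->mutex);
}
```

Because the flag is set while the mutex is still held, a fault handler that takes the mutex afterwards always sees the pending keepalive and leaves the state alone. With the flag set after the unlock, the handler could run in the gap and move the connection back to STANDBY.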
In the quest to remove all stack VLA usage from the kernel[1], this
replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage
with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(),
which uses a fixed stack size.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Sage Weil <sage@redhat.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Git-Commit: 69d6302b65a83ce04720158f3f6fc2c9fb46c941
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: Ic0dedbad02494fdc1475ee40ded392221a62a69a
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
commit 7e241f647dc7087a0401418a187f3f5b527cc690 upstream.
skb_can_coalesce() allows coalescing neighboring slab objects into
a single frag:
    return page == skb_frag_page(frag) &&
           off == frag->page_offset + skb_frag_size(frag);
ceph_tcp_sendpage() can be handed slab pages. One example of this is
XFS: it passes down sector sized slab objects for its metadata I/O. If
the kernel client is co-located on the OSD node, the skb may go through
loopback and pop on the receive side with the exact same set of frags.
When tcp_recvmsg() attempts to copy out such a frag, hardened usercopy
complains because the size exceeds the object's allocated size:
    usercopy: kernel memory exposure attempt detected from ffff9ba917f20a00 (kmalloc-512) (1024 bytes)
Although skb_can_coalesce() could be taught to return false if the
resulting frag would cross a slab object boundary, we already have
a fallback for non-refcounted pages. Utilize it for slab pages too.
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
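The coalescing effect described in the commit message above can be reproduced with a toy frag list. This is a sketch only: toy_frag, toy_add_frag() and toy_must_copy() are hypothetical names, and toy_must_copy() is an assumption about the shape of the fallback check (non-refcounted or slab page), not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for skb_frag_t. */
struct toy_frag {
	const void *page;
	size_t page_offset;
	size_t size;
};

/* Same test as the snippet quoted in the commit message, on the toy type. */
static bool toy_can_coalesce(const struct toy_frag *frag,
			     const void *page, size_t off)
{
	return page == frag->page &&
	       off == frag->page_offset + frag->size;
}

/*
 * Append (page, off, len), merging with the last frag when it is adjacent.
 * Returns the new frag count; the caller must provide enough room.
 */
size_t toy_add_frag(struct toy_frag *frags, size_t n,
		    const void *page, size_t off, size_t len)
{
	assert(frags != NULL);

	if (n > 0 && toy_can_coalesce(&frags[n - 1], page, off)) {
		frags[n - 1].size += len;
		return n;
	}
	frags[n].page = page;
	frags[n].page_offset = off;
	frags[n].size = len;
	return n + 1;
}

/* Assumed shape of the fix: copy instead of passing the page by reference. */
bool toy_must_copy(int page_refcount, bool page_is_slab)
{
	return page_refcount == 0 || page_is_slab;
}
```

Adding two neighboring 512-byte objects at offsets 0xa00 and 0xc00 of the same page leaves a single 1024-byte frag, which is the size hardened usercopy objects to in the log line above. Taking the copy path for slab pages avoids handing such objects to the frag list at all.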
request_key() never returns NULL, so there is no need for a non-NULL check.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Avoid scribbling over memory if the received reply/challenge is larger
than the buffer supplied with the authorizer.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
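The length check described in the commit message above amounts to the following pattern. This is a minimal sketch: toy_store_auth_reply() is a hypothetical helper, not a libceph function, and returning -EINVAL is an assumption about how the oversized case is reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a received reply/challenge into the buffer that was supplied with
 * the authorizer, refusing anything that would not fit.
 */
int toy_store_auth_reply(unsigned char *buf, size_t buf_len,
			 const unsigned char *reply, size_t reply_len)
{
	assert(buf != NULL && reply != NULL);

	if (reply_len > buf_len)
		return -EINVAL;	/* reject instead of scribbling past buf */

	memcpy(buf, reply, reply_len);
	return 0;
}
```

The point of the design is that the length claimed by the peer is validated against the receiver's own buffer size before any bytes are copied.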
Derive the signature from the entire buffer (both AES cipher blocks)
instead of using just the first half of the first block, leaving out
data_crc entirely.
This addresses CVE-2018-1129.
Link: http://tracker.ceph.com/issues/24837
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
When a client authenticates with a service, an authorizer is sent with
a nonce to the service (ceph_x_authorize_[ab]) and the service responds
with a mutation of that nonce (ceph_x_authorize_reply). This lets the
client verify the service is who it says it is but it doesn't protect
against a replay: someone can trivially capture the exchange and reuse
the same authorizer to authenticate themselves.
Allow the service to reject an initial authorizer with a random
challenge (ceph_x_authorize_challenge). The client then has to respond
with an updated authorizer proving they are able to decrypt the
service's challenge and that the new authorizer was produced for this
specific connection instance.
The accepting side requires this challenge and response unconditionally
if the client side advertises the CEPHX_V2 feature bit.
This addresses CVE-2018-1128.
Link: http://tracker.ceph.com/issues/24836
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Will be used for encrypting both the initial and updated authorizers.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Will be used for decrypting the server challenge which is only preceded
by ceph_x_encrypt_header.
Drop struct_v check to allow for extending ceph_x_encrypt_header in the
future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Will be used for sending ceph_msg_connect with an updated authorizer,
after the server challenges the initial authorizer.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
We already copy authorizer_reply_buf and authorizer_reply_buf_len into
ceph_connection. Factoring out __prepare_write_connect() requires two
more: authorizer_buf and authorizer_buf_len. Store the pointer to the
handshake in con->auth rather than piling on.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Remove blank lines at end of file and trailing whitespace.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The request mtime field is used all over ceph, and is currently
represented as a 'timespec' structure in Linux. This changes it to
timespec64 to allow times beyond 2038, modifying all users at the
same time.
[ Remove now redundant ts variable in writepage_nounlock(). ]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
ceph_con_keepalive_expired() is the last user of timespec_add() and some
of the last uses of ktime_get_real_ts(). Replacing this with timespec64
based interfaces lets us remove that deprecated API.
I'm introducing new ceph_encode_timespec64()/ceph_decode_timespec64()
here that take timespec64 structures and convert to/from ceph_timespec,
which is defined to have an unsigned 32-bit tv_sec member. This extends
the range of valid times to year 2106, avoiding the year 2038 overflow.
The ceph file system portion still uses the old functions for inode
timestamps; it will be converted separately after the VFS layer is converted.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
There is no reason to continue option parsing after detecting
a bad option.
[ Return match_int() errors from ceph_parse_options() to match the
behaviour of parse_rbd_opts_token() and parse_fsopt_token(). ]
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The wire format dictates that payload_len fits into 4 bytes. Take u32
instead of size_t to reflect that.
All callers pass a small integer, so no changes required.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"The main piece is a set of libceph changes that revamps how OSD
requests are aborted, improving CephFS ENOSPC handling and making
"umount -f" actually work (Zheng and myself).
The rest is mostly mount option handling cleanups from Chengguang and
assorted fixes from Zheng, Luis and Dongsheng.
* tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client: (31 commits)
rbd: flush rbd_dev->watch_dwork after watch is unregistered
ceph: update description of some mount options
ceph: show ino32 if the value is different with default
ceph: strengthen rsize/wsize/readdir_max_bytes validation
ceph: fix alignment of rasize
ceph: fix use-after-free in ceph_statfs()
ceph: prevent i_version from going back
ceph: fix wrong check for the case of updating link count
libceph: allocate the locator string with GFP_NOFAIL
libceph: make abort_on_full a per-osdc setting
libceph: don't abort reads in ceph_osdc_abort_on_full()
libceph: avoid a use-after-free during map check
libceph: don't warn if req->r_abort_on_full is set
libceph: use for_each_request() in ceph_osdc_abort_on_full()
libceph: defer __complete_request() to a workqueue
libceph: move more code into __complete_request()
libceph: no need to call flush_workqueue() before destruction
ceph: flush pending works before shutdown super
ceph: abort osd requests on force umount
libceph: introduce ceph_osdc_abort_requests()
...
Merge tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull overflow updates from Kees Cook:
"This adds the new overflow checking helpers and adds them to the
2-factor argument allocators. And this adds the saturating size
helpers and does a treewide replacement for the struct_size() usage.
Additionally this adds the overflow testing modules to make sure
everything works.
I'm still working on the treewide replacements for allocators with
"simple" multiplied arguments:
*alloc(a * b, ...) -> *alloc_array(a, b, ...)
and
*zalloc(a * b, ...) -> *calloc(a, b, ...)
as well as the more complex cases, but that's separable from this
portion of the series. I expect to have the rest sent before -rc1
closes; there are a lot of messy cases to clean up.
Summary:
- Introduce arithmetic overflow test helper functions (Rasmus)
- Use overflow helpers in 2-factor allocators (Kees, Rasmus)
- Introduce overflow test module (Rasmus, Kees)
- Introduce saturating size helper functions (Matthew, Kees)
- Treewide use of struct_size() for allocators (Kees)"
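
For illustration, a userspace sketch of what a checked 2-factor allocator does. array_size_model() and alloc_array_model() are hypothetical names, not the kernel helpers, and the sketch assumes a GCC or Clang compiler for __builtin_mul_overflow().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Compute n * size into *bytes. Returns false if the product does not
 * fit in size_t. __builtin_mul_overflow() returns true on overflow.
 */
static bool array_size_model(size_t n, size_t size, size_t *bytes)
{
	assert(bytes != NULL);
	return !__builtin_mul_overflow(n, size, bytes);
}

/*
 * Hypothetical analogue of an *alloc_array(a, b, ...) helper: refuse
 * the allocation instead of allocating a wrapped-around size.
 */
static void *alloc_array_model(size_t n, size_t size)
{
	size_t bytes;

	if (!array_size_model(n, size, &bytes))
		return NULL;
	return malloc(bytes);
}
```

An open-coded malloc(n * size) would silently wrap for large n and return a buffer that is too small; the checked form fails the allocation instead.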
* tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
treewide: Use struct_size() for devm_kmalloc() and friends
treewide: Use struct_size() for vmalloc()-family
treewide: Use struct_size() for kmalloc()-family
device: Use overflow helpers for devm_kmalloc()
mm: Use overflow helpers in kvmalloc()
mm: Use overflow helpers in kmalloc_array*()
test_overflow: Add memory allocation overflow tests
overflow.h: Add allocation size calculation helpers
test_overflow: Report test failures
test_overflow: macrofy some more, do more tests for free
lib: add runtime test of check_*_overflow functions
compiler.h: enable builtin overflow checkers and add fallback code
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
void *entry[];
};
instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
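
As a sketch of the idea, the size calculation can be modeled in userspace as follows. struct_size_model() and foo_size_model() are hypothetical function forms written for this example, not the kernel macro; the sketch assumes GCC or Clang overflow builtins and assumes that an overflowing calculation saturates to SIZE_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo {
	int stuff;
	void *entry[];
};

/*
 * Return base + elem * count, or SIZE_MAX if either the
 * multiplication or the addition would overflow size_t.
 */
static size_t struct_size_model(size_t base, size_t elem, size_t count)
{
	size_t bytes;

	if (__builtin_mul_overflow(elem, count, &bytes) ||
	    __builtin_add_overflow(bytes, base, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Size of a struct foo with count trailing entries. */
static size_t foo_size_model(size_t count)
{
	return struct_size_model(sizeof(struct foo), sizeof(void *), count);
}
```

Saturating to SIZE_MAX means that an overflowed request is expected to be rejected by the allocator rather than silently producing an undersized object.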
This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
uses. It was done via automatic conversion with manual review for the
"CHECKME" non-standard cases noted below, using the following Coccinelle
script:
// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
// sizeof *pkey_cache->table, GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@
- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@
- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
// Same pattern, but can't trivially locate the trailing element name,
// or variable name.
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
expression SOMETHING, COUNT, ELEMENT;
@@
- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
+ alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)
Signed-off-by: Kees Cook <keescook@chromium.org>
calc_target() isn't supposed to fail with anything but POOL_DNE, in
which case we report that the pool doesn't exist and fail the request
with -ENOENT. Doing this for -ENOMEM is at the very least confusing
and also harmful -- as the preceding requests complete, a short-lived
locator string allocation is likely to succeed after a wait.
(We used to call ceph_object_locator_to_pg() for a pi lookup. In
theory that could fail with -ENOENT, hence the "ret != -ENOENT" warning
being removed.)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The intent behind making it a per-request setting was that it would be
set for writes, but not for reads. As it is, the flag is set for all
fs/ceph requests except for the pool perm check stat request (technically
a read).
ceph_osdc_abort_on_full() skips reads since the previous commit and
I don't see a use case for marking individual requests.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Don't consider reads for aborting and use ->base_oloc instead of
->target_oloc, as done in __submit_request().
Strictly speaking, we shouldn't be aborting FULL_TRY/FULL_FORCE writes
either. But, there is an inconsistency in FULL_TRY/FULL_FORCE handling
on the OSD side [1], so given that neither of these is used in the
kernel client, leave it for when the OSD behaviour is sorted out.
[1] http://tracker.ceph.com/issues/24339
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Sending map check after complete_request() was called is not only
useless, but can lead to a use-after-free as req->r_kref decrement in
__complete_request() races with map check code.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
The "FULL or reached pool quota" warning is there to explain paused
requests. No need to emit it if pausing isn't going to occur.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Scanning the trees just to see if there is anything to abort is
unnecessary -- all that is needed here is to update the epoch barrier
first, before we start aborting. Simplify and do the update inside the
loop before calling abort_request() for the first time.
The switch to for_each_request() also fixes a bug: homeless requests
weren't even considered for aborting.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
In the common case, req->r_callback is called by handle_reply() on the
ceph-msgr worker thread without any locks. If handle_reply() fails, it
is called with both osd->lock and osdc->lock. In the map check case,
it is called with just osdc->lock but held for write. Finally, if the
request is aborted because of -ENOSPC or by ceph_osdc_abort_requests(),
it is called directly on the submitter's thread, again with both locks.
req->r_callback on the submitter's thread is relatively new (introduced
in 4.12) and ripe for deadlocks -- e.g. writeback worker thread waiting
on itself:
inode_wait_for_writeback+0x26/0x40
evict+0xb5/0x1a0
iput+0x1d2/0x220
ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph]
writepages_finish+0x2d3/0x410 [ceph]
__complete_request+0x26/0x60 [libceph]
complete_request+0x2e/0x70 [libceph]
__submit_request+0x256/0x330 [libceph]
submit_request+0x2b/0x30 [libceph]
ceph_osdc_start_request+0x25/0x40 [libceph]
ceph_writepages_start+0xdfe/0x1320 [ceph]
do_writepages+0x1f/0x70
__writeback_single_inode+0x45/0x330
writeback_sb_inodes+0x26a/0x600
__writeback_inodes_wb+0x92/0xc0
wb_writeback+0x274/0x330
wb_workfn+0x2d5/0x3b0
Defer __complete_request() to a workqueue in all failure cases so it's
never on the same thread as ceph_osdc_start_request() and always called
with no locks held.
Link: http://tracker.ceph.com/issues/23978
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Move req->r_completion wake up and req->r_kref decrement into
__complete_request().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
All gotos to "more" are conditioned on con->state == OPEN, but the only
thing "more" does is opening the socket if con->state == PREOPEN. Kill
that label and rename "more_kvec" to "more".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
... and store num_bvecs for client code's convenience.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
ceph_con_workfn() validates con->state before calling try_read() and
then try_write(). However, try_read() temporarily releases con->mutex,
notably in process_message() and ceph_con_in_msg_alloc(), opening the
window for ceph_con_close() to sneak in, close the connection and
release con->sock. When try_write() is called on the assumption that
con->state is still valid (i.e. not STANDBY or CLOSED), a NULL sock
gets passed to the networking stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: selinux_socket_sendmsg+0x5/0x20
Make sure con->state is valid at the top of try_write() and add an
explicit BUG_ON for this, similar to try_read().
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/23706
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
If we go without an established session for a while, backoff delay will
climb to 30 seconds. The keepalive timeout is also 30 seconds, so it's
pretty easily hit after a prolonged hunting for a monitor: we don't get
a chance to send out a keepalive in time, which means we never get back
a keepalive ack in time, cutting an established session and attempting
to connect to a different monitor every 30 seconds:
[Sun Apr 1 23:37:05 2018] libceph: mon0 10.80.20.99:6789 session established
[Sun Apr 1 23:37:36 2018] libceph: mon0 10.80.20.99:6789 session lost, hunting for new mon
[Sun Apr 1 23:37:36 2018] libceph: mon2 10.80.20.103:6789 session established
[Sun Apr 1 23:38:07 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
[Sun Apr 1 23:38:07 2018] libceph: mon1 10.80.20.100:6789 session established
[Sun Apr 1 23:38:37 2018] libceph: mon1 10.80.20.100:6789 session lost, hunting for new mon
[Sun Apr 1 23:38:37 2018] libceph: mon2 10.80.20.103:6789 session established
[Sun Apr 1 23:39:08 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
The regular keepalive interval is 10 seconds. After ->hunting is
cleared in finish_hunting(), call __schedule_delayed() to ensure we
send out a keepalive after 10 seconds.
Cc: stable@vger.kernel.org # 4.7+
Link: http://tracker.ceph.com/issues/23537
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
This means that if we do some backoff, then authenticate, and are
healthy for an extended period of time, a subsequent failure won't
leave us starting our hunting sequence with a large backoff.
Mirrors ceph.git commit d466bc6e66abba9b464b0b69687cf45c9dccf383.
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Merge tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"The big ticket items are:
- support for rbd "fancy" striping (myself).
The striping feature bit is now fully implemented, allowing mapping
v2 images with non-default striping patterns. This completes
support for --image-format 2.
- CephFS quota support (Luis Henriques and Zheng Yan).
This set is based on the new SnapRealm code in the upcoming v13.y.z
("Mimic") release. Quota handling will be rejected on older
filesystems.
- memory usage improvements in CephFS (Chengguang Xu).
Directory specific bits have been split out of ceph_file_info and
some effort went into improving cap reservation code to avoid OOM
crashes.
Also included a bunch of assorted fixes all over the place from
Chengguang and others"
* tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client: (67 commits)
ceph: quota: report root dir quota usage in statfs
ceph: quota: add counter for snaprealms with quota
ceph: quota: cache inode pointer in ceph_snap_realm
ceph: fix root quota realm check
ceph: don't check quota for snap inode
ceph: quota: update MDS when max_bytes is approaching
ceph: quota: support for ceph.quota.max_bytes
ceph: quota: don't allow cross-quota renames
ceph: quota: support for ceph.quota.max_files
ceph: quota: add initial infrastructure to support cephfs quotas
rbd: remove VLA usage
rbd: fix spelling mistake: "reregisteration" -> "reregistration"
ceph: rename function drop_leases() to a more descriptive name
ceph: fix invalid point dereference for error case in mdsc destroy
ceph: return proper bool type to caller instead of pointer
ceph: optimize memory usage
ceph: optimize mds session register
libceph, ceph: add __init attribution to init funcitons
ceph: filter out used flags when printing unused open flags
ceph: don't wait on writeback when there is no more dirty pages
...
This patch adds the infrastructure required to support cephfs quotas as it
is currently implemented in the ceph fuse client. Cephfs quotas can be
set on any directory, and can restrict the number of bytes or the number
of files stored beneath that point in the directory hierarchy.
Quotas are set using the extended attributes 'ceph.quota.max_files' and
'ceph.quota.max_bytes', and can be removed by setting these attributes to
'0'.
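
From userspace, setting a byte quota could look roughly like the sketch below (Linux only). The helper names are hypothetical, and the sketch assumes the attribute value is passed as a decimal ASCII string, as with setfattr; setxattr() is the standard Linux system call from <sys/xattr.h>.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Format a quota limit as a decimal string (assumed value format).
 * Returns the string length, or -1 if buf is too small.
 */
static int format_quota_model(uint64_t limit, char *buf, size_t len)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "%" PRIu64, limit);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Set the byte quota on a directory; a limit of 0 removes it.
 * Returns 0 on success, -1 on failure (errno set by setxattr()).
 */
static int set_quota_max_bytes(const char *dir, uint64_t limit)
{
	char value[32];
	int n = format_quota_model(limit, value, sizeof(value));

	if (n < 0)
		return -1;
	return setxattr(dir, "ceph.quota.max_bytes", value, (size_t)n, 0);
}
```

For example, set_quota_max_bytes(dir, 1073741824) would request a 1 GiB limit on a CephFS directory, and set_quota_max_bytes(dir, 0) would remove it. The same pattern applies to 'ceph.quota.max_files'.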
Link: http://tracker.ceph.com/issues/22372
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Add the __init attribute to functions that are called only once
during initialization/registration, and delete unnecessary
symbol exports.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>