Commit Graph

967 Commits

Author SHA1 Message Date
Ivaylo Georgiev
a8b2f6325b Merge android-4.19-q.69 (93083852) into msm-4.19
* refs/heads/tmp-93083852:
  Linux 4.19.69
  rxrpc: Fix local refcounting
  rxrpc: Fix local endpoint replacement
  rxrpc: Fix read-after-free in rxrpc_queue_local()
  rxrpc: Fix local endpoint refcounting
  powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GB
  dm zoned: fix potential NULL dereference in dmz_do_reclaim()
  xfs: always rejoin held resources during defer roll
  xfs: Add attibute remove and helper functions
  xfs: Add attibute set and helper functions
  xfs: Add helper function xfs_attr_try_sf_addname
  xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
  xfs: don't trip over uninitialized buffer on extent read of corrupted inode
  xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT
  mm/zsmalloc.c: fix race condition in zs_destroy_pool
  mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
  mm, page_owner: handle THP splits correctly
  genirq: Properly pair kobject_del() with kobject_add()
  dm zoned: properly handle backing device failure
  dm zoned: improve error handling in i/o map code
  dm zoned: improve error handling in reclaim
  dm table: fix invalid memory accesses with too high sector number
  dm space map metadata: fix missing store of apply_bops() return value
  dm raid: add missing cleanup in raid_ctr()
  dm integrity: fix a crash due to BUG_ON in __journal_read_write()
  dm btree: fix order of block initialization in btree_split_beneath
  dm kcopyd: always complete failed jobs
  x86/boot: Fix boot regression caused by bootparam sanitizing
  x86/boot: Save fields explicitly, zero out everything else
  x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
  x86/apic: Handle missing global clockevent gracefully
  x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
  userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
  Drivers: hv: vmbus: Fix virt_to_hvpfn() for X86_PAE
  gpiolib: never report open-drain/source lines as 'input' to user-space
  drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUX
  libceph: fix PG split vs OSD (re)connect race
  ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply
  ceph: clear page dirty before invalidate page
  clk: socfpga: stratix10: fix rate caclulationg for cnt_clks
  Revert "dm bufio: fix deadlock with loop device"
  HID: wacom: Correct distance scale for 2nd-gen Intuos devices
  HID: wacom: correct misreported EKR ring values
  selftests: kvm: Adding config fragments
  KVM: arm: Don't write junk to CP15 registers on reset
  KVM: arm64: Don't write junk to sysregs on reset
  perf pmu-events: Fix missing "cpu_clk_unhalted.core" event
  perf cpumap: Fix writing to illegal memory in handling cpumap mask
  perf ftrace: Fix failure to set cpumask when only one cpu is present
  block, bfq: handle NULL return value by bfq_init_rq()
  drm/vmwgfx: fix memory leak when too many retries have occurred
  x86/lib/cpu: Address missing prototypes warning
  libata: add SG safety checks in SFF pio transfers
  libata: have ata_scsi_rw_xlat() fail invalid passthrough requests
  net: hisilicon: Fix dma_map_single failed on arm64
  net: hisilicon: fix hip04-xmit never return TX_BUSY
  net: hisilicon: make hip04_tx_reclaim non-reentrant
  net: stmmac: tc: Do not return a fragment entry
  net: stmmac: Fix issues when number of Queues >= 4
  net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
  s390: put _stext and _etext into .text section
  SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
  SMB3: Fix potential memory leak when processing compound chain
  drm/rockchip: Suspend DP late
  HID: input: fix a4tech horizontal wheel custom usage
  HID: quirks: Set the INCREMENT_USAGE_ON_DUPLICATE quirk on Saitek X52
  NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts
  NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
  net/ethernet/qlogic/qed: force the string buffer NULL-terminated
  can: peak_usb: force the string buffer NULL-terminated
  can: sja1000: force the string buffer NULL-terminated
  perf bench numa: Fix cpu0 binding
  net: phy: phy_led_triggers: Fix a possible null-pointer dereference in phy_led_trigger_change_speed()
  isdn: hfcsusb: Fix mISDN driver crash caused by transfer buffer on the stack
  rxrpc: Fix the lack of notification when sendmsg() fails on a DATA packet
  rxrpc: Fix potential deadlock
  netfilter: ipset: Fix rename concurrency with listing
  netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets
  netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too
  mac80211_hwsim: Fix possible null-pointer dereferences in hwsim_dump_radio_nl()
  isdn: mISDN: hfcsusb: Fix possible null-pointer dereferences in start_isoc_chain()
  qed: RDMA - Fix the hw_ver returned in device attributes
  net: usb: qmi_wwan: Add the BroadMobi BM818 card
  ASoC: ti: davinci-mcasp: Correct slot_width posed constraint
  ASoC: rockchip: Fix mono capture
  st_nci_hci_connectivity_event_received: null check the allocation
  st21nfca_connectivity_event_received: null check the allocation
  ASoC: Fail card instantiation if DAI format setup fails
  can: gw: Fix error path of cgw_module_init
  can: mcp251x: add error check when wq alloc failed
  can: dev: call netif_carrier_off() in register_candev()
  selftests: forwarding: gre_multipath: Fix flower filters
  selftests: forwarding: gre_multipath: Enable IPv4 forwarding
  net: mvpp2: Don't check for 3 consecutive Idle frames for 10G links
  bonding: Force slave speed check after link state recovery for 802.3ad
  selftests/bpf: fix sendmsg6_prog on s390
  ASoC: dapm: Fix handling of custom_stop_condition on DAPM graph walks
  netfilter: ebtables: fix a memory leak bug in compat
  mips: fix cacheinfo
  MIPS: kernel: only use i8253 clocksource with periodic clockevent
  HID: Add 044f:b320 ThrustMaster, Inc. 2 in 1 DT

Conflicts:
	fs/userfaultfd.c

Change-Id: I35868bf90a3b2693b033ff302fbbb0dfd36b43fa
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-08-30 00:24:09 -07:00
Ilya Dryomov
51f6afddb1 libceph: fix PG split vs OSD (re)connect race
commit a561372405cf6bc6f14239b3a9e57bb39f2788b0 upstream.

We can't rely on ->peer_features in calc_target() because it may be
called both when the OSD session is established and open and when it's
not.  ->peer_features is not valid unless the OSD session is open.  If
this happens on a PG split (pg_num increase), that could mean we don't
resend a request that should have been resent, hanging the client
indefinitely.

In userspace this was fixed by looking at require_osd_release and
get_xinfo[osd].features fields of the osdmap.  However these fields
belong to the OSD section of the osdmap, which the kernel doesn't
decode (only the client section is decoded).

Instead, let's drop this feature check.  It effectively checks for
luminous, so only pre-luminous OSDs would be affected in that on a PG
split the kernel might resend a request that should not have been
resent.  Duplicates can occur in other scenarios, so both sides should
already be prepared for them: see dup/replay logic on the OSD side and
retry_attempt check on the client side.

Cc: stable@vger.kernel.org
Fixes: 7de030d6b1 ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
Link: https://tracker.ceph.com/issues/41162
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Jerry Lee <leisurelysw24@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-29 08:28:50 +02:00
Ivaylo Georgiev
4925b05687 Merge android-4.19.32 (6f994bf) into msm-4.19
* refs/heads/tmp-6f994bf:
  Revert "ANDROID: sched: Disable find_best_target() by default"
  ANDROID: cpufreq: times: don't copy invalid freqs from freq table
  UPSTREAM: filemap: add a comment about FAULT_FLAG_RETRY_NOWAIT behavior
  BACKPORT: filemap: drop the mmap_sem for all blocking operations
  BACKPORT: filemap: kill page_cache_read usage in filemap_fault
  UPSTREAM: filemap: pass vm_fault to the mmap ra helpers
  ANDROID: binder: remove extra declaration left after backport
  FROMGIT: binder: fix BUG_ON found by selinux-testsuite
  ANDROID: sched: Disable find_best_target() by default
  ANDROID: sched/fair: Make the EAS wake-up prefer-idle aware
  Linux 4.19.32
  power: supply: charger-manager: Fix incorrect return value
  ALSA: hda - Enforces runtime_resume after S3 and S4 for each codec
  ALSA: hda - Record the current power state before suspend/resume calls
  locking/lockdep: Add debug_locks check in __lock_downgrade()
  x86/unwind: Add hardcoded ORC entry for NULL
  x86/unwind: Handle NULL pointer calls better in frame unwinder
  loop: access lo_backing_file only when the loop device is Lo_bound
  netfilter: ebtables: remove BUGPRINT messages
  f2fs: fix to avoid deadlock of atomic file operations
  RDMA/cma: Rollback source IP address if failing to acquire device
  drm: Reorder set_property_atomic to avoid returning with an active ww_ctx
  Bluetooth: hci_ldisc: Postpone HCI_UART_PROTO_READY bit set in hci_uart_set_proto()
  Bluetooth: hci_ldisc: Initialize hci_dev before open()
  Bluetooth: Fix decrementing reference count twice in releasing socket
  Bluetooth: hci_uart: Check if socket buffer is ERR_PTR in h4_recv_buf()
  media: v4l2-ctrls.c/uvc: zero v4l2_event
  ext4: brelse all indirect buffer in ext4_ind_remove_space()
  ext4: fix data corruption caused by unaligned direct AIO
  ext4: fix NULL pointer dereference while journal is aborted
  ALSA: ac97: Fix of-node refcount unbalance
  ALSA: hda/ca0132 - make pci_iounmap() call conditional
  ALSA: x86: Fix runtime PM for hdmi-lpe-audio
  SMB3: Fix SMB3.1.1 guest mounts to Samba
  irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp
  objtool: Move objtool_file struct off the stack
  perf probe: Fix getting the kernel map
  cifs: allow guest mounts to work for smb3.11
  futex: Ensure that futex address is aligned in handle_futex_death()
  scsi: ibmvscsi: Fix empty event pool access during host removal
  scsi: ibmvscsi: Protect ibmvscsi_head from concurrent modificaiton
  powerpc/vdso64: Fix CLOCK_MONOTONIC inconsistencies across Y2038
  MIPS: Fix kernel crash for R6 in jump label branch function
  MIPS: Ensure ELF appended dtb is relocated
  mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction.
  udf: Fix crash on IO error during truncate
  libceph: wait for latest osdmap in ceph_monc_blacklist_add()
  iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE
  drm/vmwgfx: Return 0 when gmrid::get_node runs out of ID's
  drm/vmwgfx: Don't double-free the mode stored in par->set_mode
  mmc: renesas_sdhi: limit block count to 16 bit for old revisions
  mmc: mxcmmc: "Revert mmc: mxcmmc: handle highmem pages"
  mmc: pxamci: fix enum type confusion
  ALSA: firewire-motu: use 'version' field of unit directory to identify model
  ALSA: hda - add Lenovo IdeaCentre B550 to the power_save_blacklist
  ANDROID: dm-bow: Fix 32 bit compile errors
  UPSTREAM: sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity
  UPSTREAM: sched/fair: Update scale invariance of PELT
  UPSTREAM: sched/fair: Move the rq_of() helper function
  UPSTREAM: sched/fair: Remove setting task's se->runnable_weight during PELT update
  ANDROID: Add dm-bow to cuttlefish configuration
  UPSTREAM: binder: fix handling of misaligned binder object
  UPSTREAM: binder: fix sparse issue in binder_alloc_selftest.c
  BACKPORT: binder: use userspace pointer as base of buffer space
  UPSTREAM: binder: fix kerneldoc header for struct binder_buffer
  BACKPORT: binder: remove user_buffer_offset
  UPSTREAM: binder: remove kernel vm_area for buffer space
  UPSTREAM: binder: avoid kernel vm_area for buffer fixups
  BACKPORT: binder: add function to copy binder object from buffer
  BACKPORT: binder: add functions to copy to/from binder buffers
  UPSTREAM: binder: create userspace-to-binder-buffer copy function
  ANDROID: dm-bow: Add dm-bow feature
  f2fs: set pin_file under CAP_SYS_ADMIN
  f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
  f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
  f2fs: fix to do sanity check with inode.i_inline_xattr_size
  f2fs: give some messages for inline_xattr_size
  f2fs: don't trigger read IO for beyond EOF page
  f2fs: fix to add refcount once page is tagged PG_private
  f2fs: remove wrong comment in f2fs_invalidate_page()
  f2fs: fix to use kvfree instead of kzfree
  f2fs: print more parameters in trace_f2fs_map_blocks
  f2fs: trace f2fs_ioc_shutdown
  f2fs: fix to avoid deadlock of atomic file operations
  f2fs: fix to dirty inode for i_mode recovery
  f2fs: give random value to i_generation
  f2fs: no need to take page lock in readdir
  f2fs: fix to update iostat correctly in IPU path
  f2fs: fix encrypted page memory leak
  f2fs: make fault injection covering __submit_flush_wait()
  f2fs: fix to retry fill_super only if recovery failed
  f2fs: silence VM_WARN_ON_ONCE in mempool_alloc
  f2fs: correct spelling mistake
  f2fs: fix wrong #endif
  f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG
  f2fs: don't allow negative ->write_io_size_bits
  f2fs: fix to check inline_xattr_size boundary correctly
  Revert "f2fs: fix to avoid deadlock of atomic file operations"
  Revert "f2fs: fix to check inline_xattr_size boundary correctly"
  f2fs: do not use mutex lock in atomic context
  f2fs: fix potential data inconsistence of checkpoint
  f2fs: fix to avoid deadlock of atomic file operations
  f2fs: fix to check inline_xattr_size boundary correctly
  f2fs: jump to label 'free_node_inode' when failing from d_make_root()
  f2fs: fix to document inline_xattr_size option
  f2fs: fix to data block override node segment by mistake
  f2fs: fix typos in code comments
  f2fs: use xattr_prefix to wrap up
  f2fs: sync filesystem after roll-forward recovery
  f2fs: flush quota blocks after turnning it off
  f2fs: avoid null pointer exception in dcc_info
  f2fs: don't wake up too frequently, if there is lots of IOs
  f2fs: try to keep CP_TRIMMED_FLAG after successful umount
  f2fs: add quick mode of checkpoint=disable for QA
  f2fs: run discard jobs when put_super
  f2fs: fix to set sbi dirty correctly
  f2fs: fix to initialize variable to avoid UBSAN/smatch warning
  f2fs: UBSAN: set boolean value iostat_enable correctly
  f2fs: add brackets for macros
  f2fs: check if file namelen exceeds max value
  f2fs: fix to trigger fsck if dirent.name_len is zero
  f2fs: no need to check return value of debugfs_create functions
  f2fs: export FS_NOCOW_FL flag to user
  f2fs: check inject_rate validity during configuring
  f2fs: remove set but not used variable 'err'
  f2fs: fix compile warnings: 'struct *' declared inside parameter list
  f2fs: change error code to -ENOMEM from -EINVAL

Conflicts:
	drivers/md/Kconfig
	kernel/sched/fair.c

Change-Id: I2c6ba055f1160864446c87507a7fd7c8249ad885
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-05-09 00:12:14 -07:00
Ilya Dryomov
9cae232a87 libceph: wait for latest osdmap in ceph_monc_blacklist_add()
commit bb229bbb3bf63d23128e851a1f3b85c083178fa1 upstream.

Because map updates are distributed lazily, an OSD may not know about
the new blacklist for quite some time after "osd blacklist add" command
is completed.  This makes it possible for a blacklisted but still alive
client to overwrite a post-blacklist update, resulting in data
corruption.

Waiting for latest osdmap in ceph_monc_blacklist_add() and thus using
the post-blacklist epoch for all post-blacklist requests ensures that
all such requests "wait" for the blacklist to come into force on their
respective OSDs.

Cc: stable@vger.kernel.org
Fixes: 6305a3b415 ("libceph: support for blacklisting clients")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-27 14:14:39 +09:00
Ivaylo Georgiev
bb8fbbfe0f Merge android-4.19.26 (c97d2b5) into msm-4.19
* refs/heads/tmp-c97d2b5:
  Linux 4.19.26
  net: phylink: avoid resolving link state too early
  pinctrl: max77620: Use define directive for max77620_pinconf_param values
  udlfb: handle unplug properly
  netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in clusterip_config_entry_put()
  netfilter: nfnetlink_osf: add missing fmatch check
  netfilter: ipv6: Don't preserve original oif for loopback address
  netfilter: nft_compat: use-after-free when deleting targets
  netfilter: nf_tables: fix flush after rule deletion in the same batch
  Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
  staging: erofs: unzip_vle_lz4.c,utils.c: rectify BUG_ONs
  staging: erofs: unzip_{pagevec.h,vle.c}: rectify BUG_ONs
  staging: erofs: {dir,inode,super}.c: rectify BUG_ONs
  staging: erofs: add a full barrier in erofs_workgroup_unfreeze
  staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}'
  staging: erofs: atomic_cond_read_relaxed on ref-locked workgroup
  staging: erofs: remove the redundant d_rehash() for the root dentry
  staging: erofs: drop multiref support temporarily
  staging: erofs: replace BUG_ON with DBG_BUGON in data.c
  staging: erofs: complete error handing of z_erofs_do_read_page
  staging: erofs: fix a bug when appling cache strategy
  net: avoid false positives in untrusted gso validation
  net: validate untrusted gso packets without csum offload
  kvm: x86: Return LA57 feature based on hardware capability
  mac80211: allocate tailroom for forwarded mesh packets
  drm/amd/display: Fix MST reboot/poweroff sequence
  drm/i915/fbdev: Actually configure untiled displays
  gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
  drm/amdgpu: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
  ARC: define ARCH_SLAB_MINALIGN = 8
  ARC: U-boot: check arguments paranoidly
  ARCv2: Enable unaligned access in early ASM code
  parisc: Fix ptrace syscall number modification
  KEYS: always initialize keyring_index_key::desc_len
  KEYS: user: Align the payload buffer
  RDMA/srp: Rework SCSI device reset handling
  net/mlx5e: XDP, fix redirect resources availability check
  net_sched: fix two more memory leaks in cls_tcindex
  net_sched: fix a memory leak in cls_tcindex
  net_sched: fix a race condition in tcindex_destroy()
  sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
  geneve: should not call rt6_lookup() when ipv6 was disabled
  net: socket: make bond ioctls go through compat_ifreq_ioctl()
  net: socket: fix SIOCGIFNAME in compat
  Revert "kill dev_ifsioc()"
  Revert "socket: fix struct ifreq size in compat ioctl"
  team: avoid complex list operations in team_nl_cmd_options_set()
  sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
  sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
  net: sfp: do not probe SFP module before we're attached
  net/packet: fix 4gb buffer limit due to overflow check
  net/mlx5e: Don't overwrite pedit action when multiple pedit used
  net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
  net: ena: fix race between link up and device initalization
  ipv6: propagate genlmsg_reply return code
  inet_diag: fix reporting cgroup classid and fallback to priority
  batman-adv: fix uninit-value in batadv_interface_tx()
  isdn: avm: Fix string plus integer warning from Clang
  net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
  selftests: forwarding: Add a test case for externally learned FDB entries
  mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
  net: bridge: Mark FDB entries that were added by user as such
  mlxsw: pci: Return error on PCI reset timeout
  dpaa_eth: NETIF_F_LLTX requires to do our own update of trans_start
  bpf: bpf_setsockopt: reset sock dst on SO_MARK changes
  leds: lp5523: fix a missing check of return value of lp55xx_read
  hwmon: (tmp421) Correct the misspelling of the tmp442 compatible attribute in OF device ID table
  atm: he: fix sign-extension overflow on large shift
  selftests/bpf: retry tests that expect build-id
  bpf: zero out build_id for BPF_STACK_BUILD_ID_IP
  bpf: don't assume build-id length is always 20 bytes
  afs: Fix key refcounting in file locking code
  afs: Don't set vnode->cb_s_break in afs_validate()
  selftests: tc-testing: fix parsing of ife type
  selftests: tc-testing: fix tunnel_key failure if dst_port is unspecified
  selftests: tc-testing: drop test on missing tunnel key id
  pvcalls-front: fix potential null dereference
  drm/sun4i: backend: add missing of_node_puts
  vhost: return EINVAL if iovecs size does not match the message size
  drm/amd/display: fix PME notification not working in RV desktop
  drm/amdkfd: Don't assign dGPUs to APU topology devices
  drm/meson: add missing of_node_put
  always clear the X2APIC_ENABLE bit for PV guest
  netfilter: nft_flow_offload: fix checking method of conntrack helper
  scsi: cxgb4i: add wait_for_completion()
  scsi: ufs: Fix geometry descriptor size
  scsi: qedi: Add ep_state for login completion on un-reachable targets
  scsi: ufs: Fix system suspend status
  scsi: tcmu: avoid cmd/qfull timers updated whenever a new cmd comes
  isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
  net: stmmac: Prevent RX starvation in stmmac_napi_poll()
  net: stmmac: Fix the logic of checking if RX Watchdog must be enabled
  net: stmmac: Check if CBS is supported before configuring
  net: stmmac: dwxgmac2: Only clear interrupts that are active
  net: stmmac: Fix PCI module removal leak
  acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id()
  powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.
  RDMA/mthca: Clear QP objects during their allocation
  netfilter: nft_flow_offload: fix interaction with vrf slave device
  bpf: fix panic in stack_map_get_build_id() on i386 and arm32
  pvcalls-front: Avoid get_free_pages(GFP_KERNEL) under spinlock
  bpf: correctly set initial window on active Fast Open sender
  netfilter: nft_flow_offload: Fix reverse route lookup
  MIPS: jazz: fix 64bit build
  include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
  scsi: isci: initialize shost fully before calling scsi_add_host()
  scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
  netfilter: nf_tables: fix leaking object reference count
  selftests: forwarding: Add a test for VLAN deletion
  mlxsw: spectrum_acl: Add cleanup after C-TCAM update error condition
  xprtrdma: Double free in rpcrdma_sendctxs_create()
  MIPS: ath79: Enable OF serial ports in the default config
  net/mlx4: Get rid of page operation after dma_alloc_coherent
  watchdog: mt7621_wdt/rt2880_wdt: Fix compilation problem
  selftests/bpf: Test [::] -> [::1] rewrite in sys_sendmsg in test_sock_addr
  bpf: Fix [::] -> [::1] rewrite in sys_sendmsg
  net: hns: Fix use after free identified by SLUB debug
  qed: Fix qed_ll2_post_rx_buffer_notify_fw() by adding a write memory barrier
  qed: Fix qed_chain_set_prod() for PBL chains with non power of 2 page count
  xen/pvcalls: remove set but not used variable 'intf'
  mfd: mc13xxx: Fix a missing check of a register-read failure
  mfd: tps65218: Use devm_regmap_add_irq_chip and clean up error path in probe()
  mfd: cros_ec_dev: Add missing mfd_remove_devices() call in remove
  mfd: axp20x: Add supported cells for AXP803
  mfd: axp20x: Re-align MFD cell entries
  mfd: axp20x: Add AC power supply cell for AXP813
  mfd: wm5110: Add missing ASRC rate register
  mfd: qcom_rpm: write fw_version to CTRL_REG
  mfd: bd9571mwv: Add volatile register to make DVFS work
  mfd: ab8500-core: Return zero in get_register_interruptible()
  mfd: mt6397: Do not call irq_domain_remove if PMIC unsupported
  mfd: db8500-prcmu: Fix some section annotations
  mfd: twl-core: Fix section annotations on {,un}protect_pm_master
  pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read
  pvcalls-front: properly allocate sk
  pvcalls-front: don't try to free unallocated rings
  pvcalls-front: read all data before closing the connection
  mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells
  backlight: pwm_bl: Fix devicetree parsing with auto-generated brightness tables
  KEYS: allow reaching the keys quotas exactly
  ALSA: hda/realtek: Disable PC beep in passthrough on alc285
  ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5
  proc, oom: do not report alien mms when setting oom_score_adj
  numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES
  ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
  libceph: handle an empty authorize reply
  mac80211: Free mpath object when rhashtable insertion fails
  mac80211: Use linked list instead of rhashtable walk for mesh tables
  mac80211: Restore vif beacon interval if start ap fails
  gpio: pxa: avoid attempting to set pin direction via pinctrl on MMP2
  gpio: MT7621: use a per instance irq_chip structure
  MIPS: eBPF: Always return sign extended 32b values
  tracing: Fix number of entries in trace header
  ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction

Change-Id: Ie585d8274f881ac87155e9deda341c43cd8923b4
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-03-13 10:41:20 -07:00
Ivaylo Georgiev
af057a3fcf Merge android-4.19.22 (0755dc9) into msm-4.19
* refs/heads/tmp-0755dc9:
  Linux 4.19.22
  svcrdma: Remove max_sge check at connect time
  svcrdma: Reduce max_send_sges
  batman-adv: Force mac header to start of data on xmit
  batman-adv: Avoid WARN on net_device without parent in netns
  xfrm: refine validation of template and selector families
  libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive()
  Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
  xfrm: Make set-mark default behavior backward compatible
  SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT
  drm/vmwgfx: Return error code from vmw_execbuf_copy_fence_user
  drm/vmwgfx: Fix setting of dma masks
  drm/i915: always return something on DDI clock selection
  drm/amd/powerplay: Fix missing break in switch
  drm/modes: Prevent division by zero htotal
  mac80211: ensure that mgmt tx skbs have tailroom for encryption
  mic: vop: Fix use-after-free on remove
  powerpc/radix: Fix kernel crash with mremap()
  firmware: arm_scmi: provide the mandatory device release callback
  ARM: dts: da850: fix interrupt numbers for clocksource
  ARM: tango: Improve ARCH_MULTIPLATFORM compatibility
  ARM: iop32x/n2100: fix PCI IRQ mapping
  MIPS: VDSO: Include $(ccflags-vdso) in o32,n32 .lds builds
  mips: loongson64: remove unreachable(), fix loongson_poweroff().
  MIPS: VDSO: Use same -m%-float cflag as the kernel proper
  MIPS: OCTEON: don't set octeon_dma_bar_type if PCI is disabled
  mips: cm: reprime error cause
  tracing: uprobes: Fix typo in pr_fmt string
  pinctrl: cherryview: fix Strago DMI workaround
  pinctrl: sunxi: Correct number of IRQ banks on H6 main pin controller
  debugfs: fix debugfs_rename parameter checking
  samples: mei: use /dev/mei0 instead of /dev/mei
  mei: me: add ice lake point device id.
  misc: vexpress: Off by one in vexpress_syscfg_exec()
  signal: Better detection of synchronous signals
  signal: Always notice exiting tasks
  iio: ti-ads8688: Update buffer allocation for timestamps
  iio: chemical: atlas-ph-sensor: correct IIO_TEMP values to millicelsius
  iio: adc: axp288: Fix TS-pin handling
  tools: iio: iio_generic_buffer: make num_loops signed
  libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
  mtd: rawnand: gpmi: fix MX28 bus master lockup problem
  mtd: spinand: Fix the error/cleanup path in spinand_init()
  mtd: spinand: Handle the case where PROGRAM LOAD does not reset the cache
  mtd: Make sure mtd->erasesize is valid even if the partition is of size 0
  ANDROID: cuttlefish: enable CONFIG_NET_SCH_NETEM=y
  Add XFRM-I to cuttlefish defconfigs
  ANDROID: Move from clang r346389b to r349610.

Change-Id: Ie249267aa9e0d4eb169adecafc0cdc59a0a2eb0f
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-03-13 10:41:06 -07:00
Ilya Dryomov
c74260710e libceph: handle an empty authorize reply
commit 0fd3fd0a9bb0b02b6435bb7070e9f7b82a23f068 upstream.

The authorize reply can be empty, for example when the ticket used to
build the authorizer is too old and TAG_BADAUTHORIZER is returned from
the service.  Calling ->verify_authorizer_reply() results in an attempt
to decrypt and validate (somewhat) random data in au->buf (most likely
the signature block from calc_signature()), which fails and ends up in
con_fault_finish() with !con->auth_retry.  The ticket isn't invalidated
and the connection is retried again and again until a new ticket is
obtained from the monitor:

  libceph: osd2 192.168.122.1:6809 bad authorize reply
  libceph: osd2 192.168.122.1:6809 bad authorize reply
  libceph: osd2 192.168.122.1:6809 bad authorize reply
  libceph: osd2 192.168.122.1:6809 bad authorize reply

Let TAG_BADAUTHORIZER handler kick in and increment con->auth_retry.

Cc: stable@vger.kernel.org
Fixes: 5c056fdc5b ("libceph: verify authorize reply on connect")
Link: https://tracker.ceph.com/issues/20164
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27 10:08:50 +01:00
Ilya Dryomov
f7fb58a78a libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive()
commit 4aac9228d16458cedcfd90c7fb37211cf3653ac3 upstream.

con_fault() can transition the connection into STANDBY right after
ceph_con_keepalive() clears STANDBY in clear_standby():

    libceph user thread               ceph-msgr worker

ceph_con_keepalive()
  mutex_lock(&con->mutex)
  clear_standby(con)
  mutex_unlock(&con->mutex)
                                mutex_lock(&con->mutex)
                                con_fault()
                                  ...
                                  if KEEPALIVE_PENDING isn't set
                                    set state to STANDBY
                                  ...
                                mutex_unlock(&con->mutex)
  set KEEPALIVE_PENDING
  set WRITE_PENDING

This triggers warnings in clear_standby() when either ceph_con_send()
or ceph_con_keepalive() get to clearing STANDBY next time.

I don't see a reason to condition queue_con() call on the previous
value of KEEPALIVE_PENDING, so move the setting of KEEPALIVE_PENDING
into the critical section -- unlike WRITE_PENDING, KEEPALIVE_PENDING
could have been a non-atomic flag.

Reported-by: syzbot+acdeb633f6211ccdf886@syzkaller.appspotmail.com
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Myungho Jung <mhjungk@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-15 08:10:13 +01:00
Kees Cook
6ed29e4be3 libceph: Remove VLA usage of skcipher
In the quest to remove all stack VLA usage from the kernel[1], this
replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage
with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(),
which uses a fixed stack size.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Sage Weil <sage@redhat.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Git-Commit: 69d6302b65a83ce04720158f3f6fc2c9fb46c941
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: Ic0dedbad02494fdc1475ee40ded392221a62a69a
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2019-01-10 12:24:31 -07:00
Ilya Dryomov
02e28d5b85 libceph: fall back to sendmsg for slab pages
commit 7e241f647dc7087a0401418a187f3f5b527cc690 upstream.

skb_can_coalesce() allows coalescing neighboring slab objects into
a single frag:

  return page == skb_frag_page(frag) &&
         off == frag->page_offset + skb_frag_size(frag);

ceph_tcp_sendpage() can be handed slab pages.  One example of this is
XFS: it passes down sector sized slab objects for its metadata I/O.  If
the kernel client is co-located on the OSD node, the skb may go through
loopback and pop on the receive side with the exact same set of frags.
When tcp_recvmsg() attempts to copy out such a frag, hardened usercopy
complains because the size exceeds the object's allocated size:

  usercopy: kernel memory exposure attempt detected from ffff9ba917f20a00 (kmalloc-512) (1024 bytes)

Although skb_can_coalesce() could be taught to return false if the
resulting frag would cross a slab object boundary, we already have
a fallback for non-refcounted pages.  Utilize it for slab pages too.

Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-27 16:13:11 +01:00
YueHaibing
4de17aea5c crush: fix using plain integer as NULL warning
Fixes the following sparse warnings:

net/ceph/crush/mapper.c:517:76: warning: Using plain integer as NULL pointer
net/ceph/crush/mapper.c:728:68: warning: Using plain integer as NULL pointer

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-13 17:55:44 +02:00
YueHaibing
bad87216fb libceph: remove unnecessary non NULL check for request_key
request_key never return NULL,so no need do non-NULL check.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-13 17:55:44 +02:00
Ilya Dryomov
f1d10e0463 libceph: weaken sizeof check in ceph_x_verify_authorizer_reply()
Allow for extending ceph_x_authorize_reply in the future.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-08-02 21:33:26 +02:00
Ilya Dryomov
130f52f2b2 libceph: check authorizer reply/challenge length before reading
Avoid scribbling over memory if the received reply/challenge is larger
than the buffer supplied with the authorizer.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-08-02 21:33:26 +02:00
Ilya Dryomov
cc255c76c7 libceph: implement CEPHX_V2 calculation mode
Derive the signature from the entire buffer (both AES cipher blocks)
instead of using just the first half of the first block, leaving out
data_crc entirely.

This addresses CVE-2018-1129.

Link: http://tracker.ceph.com/issues/24837
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-08-02 21:33:25 +02:00
Ilya Dryomov
6daca13d2e libceph: add authorizer challenge
When a client authenticates with a service, an authorizer is sent with
a nonce to the service (ceph_x_authorize_[ab]) and the service responds
with a mutation of that nonce (ceph_x_authorize_reply).  This lets the
client verify the service is who it says it is but it doesn't protect
against a replay: someone can trivially capture the exchange and reuse
the same authorizer to authenticate themselves.

Allow the service to reject an initial authorizer with a random
challenge (ceph_x_authorize_challenge).  The client then has to respond
with an updated authorizer proving they are able to decrypt the
service's challenge and that the new authorizer was produced for this
specific connection instance.

The accepting side requires this challenge and response unconditionally
if the client side advertises they have CEPHX_V2 feature bit.

This addresses CVE-2018-1128.

Link: http://tracker.ceph.com/issues/24836
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-08-02 21:33:24 +02:00
Ilya Dryomov
149cac4a50 libceph: factor out encrypt_authorizer()
Will be used for encrypting both the initial and updated authorizers.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-08-02 21:33:24 +02:00
Ilya Dryomov
c571fe24d2 libceph: factor out __ceph_x_decrypt()
Will be used for decrypting the server challenge which is only preceded
by ceph_x_encrypt_header.

Drop struct_v check to allow for extending ceph_x_encrypt_header in the
future.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-08-02 21:33:23 +02:00
Ilya Dryomov
c0f56b483a libceph: factor out __prepare_write_connect()
Will be used for sending ceph_msg_connect with an updated authorizer,
after the server challenges the initial authorizer.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-08-02 21:33:22 +02:00
Ilya Dryomov
262614c429 libceph: store ceph_auth_handshake pointer in ceph_connection
We already copy authorizer_reply_buf and authorizer_reply_buf_len into
ceph_connection.  Factoring out __prepare_write_connect() requires two
more: authorizer_buf and authorizer_buf_len.  Store the pointer to the
handshake in con->auth rather than piling on.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2018-08-02 21:33:22 +02:00
Stephen Hemminger
24e1dd6afd ceph: fix whitespace
Remove blank lines at end of file and trailing whitespace.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-02 21:33:21 +02:00
Arnd Bergmann
fac02ddf91 libceph: use timespec64 for r_mtime
The request mtime field is used all over ceph, and is currently
represented as a 'timespec' structure in Linux. This changes it to
timespec64 to allow times beyond 2038, modifying all users at the
same time.

[ Remove now redundant ts variable in writepage_nounlock(). ]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-02 21:33:14 +02:00
Arnd Bergmann
473bd2d780 libceph: use timespec64 in for keepalive2 and ticket validity
ceph_con_keepalive_expired() is the last user of timespec_add() and some
of the last uses of ktime_get_real_ts().  Replacing this with timespec64
based interfaces  lets us remove that deprecated API.

I'm introducing new ceph_encode_timespec64()/ceph_decode_timespec64()
here that take timespec64 structures and convert to/from ceph_timespec,
which is defined to have an unsigned 32-bit tv_sec member. This extends
the range of valid times to year 2106, avoiding the year 2038 overflow.

The ceph file system portion still uses the old functions for inode
timestamps, this will be done separately after the VFS layer is converted.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-02 21:26:12 +02:00
Ilya Dryomov
2f56b6bae7 libceph: amend "bad option arg" error message
Don't mention "mount" -- in the rbd case it is "mapping".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-02 21:26:11 +02:00
Chengguang Xu
17173c82e3 libceph: stop parsing when a bad int arg is detected
There is no reason to continue option parsing after detecting
bad option.

[ Return match_int() errors from ceph_parse_options() to match the
  behaviour of parse_rbd_opts_token() and parse_fsopt_token(). ]

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-02 21:26:11 +02:00
Ilya Dryomov
6d54228fd1 libceph: make ceph_osdc_notify{,_ack}() payload_len u32
The wire format dictates that payload_len fits into 4 bytes.  Take u32
instead of size_t to reflect that.

All callers pass a small integer, so no changes required.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-02 21:26:11 +02:00
Linus Torvalds
dc594c39f7 The main piece is a set of libceph changes that revamps how OSD
requests are aborted, improving CephFS ENOSPC handling and making
 "umount -f" actually work (Zheng and myself).  The rest is mostly
 mount option handling cleanups from Chengguang and assorted fixes
 from Zheng, Luis and Dongsheng.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJbIkigAAoJEEp/3jgCEfOL3EUH/1s7Ib3FgFzG/SPPKISxZOGr
 ndZGg0rPT9mPIQ4rp6t0z/cDlMrluPmCK3sWrAPe//sZz9iZiuip+mCL0gUFXFNr
 1kL2xDKkJzGxtP3UlUvr5CC6bnxLdeBXJRBDLk/swtphuqArKndlbN/iLZnCZivT
 uJDk+vZTwNJ3UhQP4QdnOQLV60NYs+q4euTqbZF3+pDiRiONbxRfXC3adFsc8zL9
 zlie3CHPbrQHWMsfNvbfM3rBH1WhTwEssDm+IEFlKl19q9SKP2WPZfmBcE1pmZ58
 AhIMoNGdQha1FXS6N96kaPaqFgeysPnEPoyHDqLxsUMKqsvJlOEZsK1jujza4rE=
 =EfXm
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The main piece is a set of libceph changes that revamps how OSD
  requests are aborted, improving CephFS ENOSPC handling and making
  "umount -f" actually work (Zheng and myself).

  The rest is mostly mount option handling cleanups from Chengguang and
  assorted fixes from Zheng, Luis and Dongsheng.

* tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client: (31 commits)
  rbd: flush rbd_dev->watch_dwork after watch is unregistered
  ceph: update description of some mount options
  ceph: show ino32 if the value is different with default
  ceph: strengthen rsize/wsize/readdir_max_bytes validation
  ceph: fix alignment of rasize
  ceph: fix use-after-free in ceph_statfs()
  ceph: prevent i_version from going back
  ceph: fix wrong check for the case of updating link count
  libceph: allocate the locator string with GFP_NOFAIL
  libceph: make abort_on_full a per-osdc setting
  libceph: don't abort reads in ceph_osdc_abort_on_full()
  libceph: avoid a use-after-free during map check
  libceph: don't warn if req->r_abort_on_full is set
  libceph: use for_each_request() in ceph_osdc_abort_on_full()
  libceph: defer __complete_request() to a workqueue
  libceph: move more code into __complete_request()
  libceph: no need to call flush_workqueue() before destruction
  ceph: flush pending works before shutdown super
  ceph: abort osd requests on force umount
  libceph: introduce ceph_osdc_abort_requests()
  ...
2018-06-15 07:24:58 +09:00
Kees Cook
6da2ec5605 treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

        kmalloc(a * b, gfp)

with:
        kmalloc_array(a * b, gfp)

as well as handling cases of:

        kmalloc(a * b * c, gfp)

with:

        kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kmalloc(sizeof(THING) * C2, ...)
|
  kmalloc(sizeof(TYPE) * C2, ...)
|
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Linus Torvalds
2857676045 - Introduce arithmetic overflow test helper functions (Rasmus)
- Use overflow helpers in 2-factor allocators (Kees, Rasmus)
 - Introduce overflow test module (Rasmus, Kees)
 - Introduce saturating size helper functions (Matthew, Kees)
 - Treewide use of struct_size() for allocators (Kees)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsYJ1gWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlCTEACwdEeriAd2VwxknnsstojGD/3g
 8TTFA19vSu4Gxa6WiDkjGoSmIlfhXTlZo1Nlmencv16ytSvIVDNLUIB3uDxUIv1J
 2+dyHML9JpXYHHR7zLXXnGFJL0wazqjbsD3NYQgXqmun7EVVYnOsAlBZ7h/Lwiej
 jzEJd8DaHT3TA586uD3uggiFvQU0yVyvkDCDONIytmQx+BdtGdg9TYCzkBJaXuDZ
 YIthyKDvxIw5nh/UaG3L+SKo73tUr371uAWgAfqoaGQQCWe+mxnWL4HkCKsjFzZL
 u9ouxxF/n6pij3E8n6rb0i2fCzlsTDdDF+aqV1rQ4I4hVXCFPpHUZgjDPvBWbj7A
 m6AfRHVNnOgI8HGKqBGOfViV+2kCHlYeQh3pPW33dWzy/4d/uq9NIHKxE63LH+S4
 bY3oO2ela8oxRyvEgXLjqmRYGW1LB/ZU7FS6Rkx2gRzo4k8Rv+8K/KzUHfFVRX61
 jEbiPLzko0xL9D53kcEn0c+BhofK5jgeSWxItdmfuKjLTW4jWhLRlU+bcUXb6kSS
 S3G6aF+L+foSUwoq63AS8QxCuabuhreJSB+BmcGUyjthCbK/0WjXYC6W/IJiRfBa
 3ZTxBC/2vP3uq/AGRNh5YZoxHL8mSxDfn62F+2cqlJTTKR/O+KyDb1cusyvk3H04
 KCDVLYPxwQQqK1Mqig==
 =/3L8
 -----END PGP SIGNATURE-----

Merge tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull overflow updates from Kees Cook:
 "This adds the new overflow checking helpers and adds them to the
  2-factor argument allocators. And this adds the saturating size
  helpers and does a treewide replacement for the struct_size() usage.
  Additionally this adds the overflow testing modules to make sure
  everything works.

  I'm still working on the treewide replacements for allocators with
  "simple" multiplied arguments:

     *alloc(a * b, ...) -> *alloc_array(a, b, ...)

  and

     *zalloc(a * b, ...) -> *calloc(a, b, ...)

  as well as the more complex cases, but that's separable from this
  portion of the series. I expect to have the rest sent before -rc1
  closes; there are a lot of messy cases to clean up.

  Summary:

   - Introduce arithmetic overflow test helper functions (Rasmus)

   - Use overflow helpers in 2-factor allocators (Kees, Rasmus)

   - Introduce overflow test module (Rasmus, Kees)

   - Introduce saturating size helper functions (Matthew, Kees)

   - Treewide use of struct_size() for allocators (Kees)"

* tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  treewide: Use struct_size() for devm_kmalloc() and friends
  treewide: Use struct_size() for vmalloc()-family
  treewide: Use struct_size() for kmalloc()-family
  device: Use overflow helpers for devm_kmalloc()
  mm: Use overflow helpers in kvmalloc()
  mm: Use overflow helpers in kmalloc_array*()
  test_overflow: Add memory allocation overflow tests
  overflow.h: Add allocation size calculation helpers
  test_overflow: Report test failures
  test_overflow: macrofy some more, do more tests for free
  lib: add runtime test of check_*_overflow functions
  compiler.h: enable builtin overflow checkers and add fallback code
2018-06-06 17:27:14 -07:00
Kees Cook
acafe7e302 treewide: Use struct_size() for kmalloc()-family
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
uses. It was done via automatic conversion with manual review for the
"CHECKME" non-standard cases noted below, using the following Coccinelle
script:

// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
//                      sizeof *pkey_cache->table, GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// Same pattern, but can't trivially locate the trailing element name,
// or variable name.
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
expression SOMETHING, COUNT, ELEMENT;
@@

- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
+ alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-06 11:15:43 -07:00
Ilya Dryomov
a86f009f10 libceph: allocate the locator string with GFP_NOFAIL
calc_target() isn't supposed to fail with anything but POOL_DNE, in
which case we report that the pool doesn't exist and fail the request
with -ENOENT.  Doing this for -ENOMEM is at the very least confusing
and also harmful -- as the preceding requests complete, a short-lived
locator string allocation is likely to succeed after a wait.

(We used to call ceph_object_locator_to_pg() for a pi lookup.  In
theory that could fail with -ENOENT, hence the "ret != -ENOENT" warning
being removed.)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:46:00 +02:00
Ilya Dryomov
c843d13cae libceph: make abort_on_full a per-osdc setting
The intent behind making it a per-request setting was that it would be
set for writes, but not for reads.  As it is, the flag is set for all
fs/ceph requests except for pool perm check stat request (technically
a read).

ceph_osdc_abort_on_full() skips reads since the previous commit and
I don't see a use case for marking individual requests.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:46:00 +02:00
Ilya Dryomov
690f951d7e libceph: don't abort reads in ceph_osdc_abort_on_full()
Don't consider reads for aborting and use ->base_oloc instead of
->target_oloc, as done in __submit_request().

Strictly speaking, we shouldn't be aborting FULL_TRY/FULL_FORCE writes
either.  But, there is an inconsistency in FULL_TRY/FULL_FORCE handling
on the OSD side [1], so given that neither of these is used in the
kernel client, leave it for when the OSD behaviour is sorted out.

[1] http://tracker.ceph.com/issues/24339

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:59 +02:00
Ilya Dryomov
6001567c14 libceph: avoid a use-after-free during map check
Sending map check after complete_request() was called is not only
useless, but can lead to a use-after-free as req->r_kref decrement in
__complete_request() races with map check code.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:59 +02:00
Ilya Dryomov
29e878201e libceph: don't warn if req->r_abort_on_full is set
The "FULL or reached pool quota" warning is there to explain paused
requests.  No need to emit it if pausing isn't going to occur.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:58 +02:00
Ilya Dryomov
4eea0fefd7 libceph: use for_each_request() in ceph_osdc_abort_on_full()
Scanning the trees just to see if there is anything to abort is
unnecessary -- all that is needed here is to update the epoch barrier
first, before we start aborting.  Simplify and do the update inside the
loop before calling abort_request() for the first time.

The switch to for_each_request() also fixes a bug: homeless requests
weren't even considered for aborting.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:58 +02:00
Ilya Dryomov
88bc1922c2 libceph: defer __complete_request() to a workqueue
In the common case, req->r_callback is called by handle_reply() on the
ceph-msgr worker thread without any locks.  If handle_reply() fails, it
is called with both osd->lock and osdc->lock.  In the map check case,
it is called with just osdc->lock but held for write.  Finally, if the
request is aborted because of -ENOSPC or by ceph_osdc_abort_requests(),
it is called directly on the submitter's thread, again with both locks.

req->r_callback on the submitter's thread is relatively new (introduced
in 4.12) and ripe for deadlocks -- e.g. writeback worker thread waiting
on itself:

  inode_wait_for_writeback+0x26/0x40
  evict+0xb5/0x1a0
  iput+0x1d2/0x220
  ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph]
  writepages_finish+0x2d3/0x410 [ceph]
  __complete_request+0x26/0x60 [libceph]
  complete_request+0x2e/0x70 [libceph]
  __submit_request+0x256/0x330 [libceph]
  submit_request+0x2b/0x30 [libceph]
  ceph_osdc_start_request+0x25/0x40 [libceph]
  ceph_writepages_start+0xdfe/0x1320 [ceph]
  do_writepages+0x1f/0x70
  __writeback_single_inode+0x45/0x330
  writeback_sb_inodes+0x26a/0x600
  __writeback_inodes_wb+0x92/0xc0
  wb_writeback+0x274/0x330
  wb_workfn+0x2d5/0x3b0

Defer __complete_request() to a workqueue in all failure cases so it's
never on the same thread as ceph_osdc_start_request() and always called
with no locks held.

Link: http://tracker.ceph.com/issues/23978
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:58 +02:00
Ilya Dryomov
26df726bcd libceph: move more code into __complete_request()
Move req->r_completion wake up and req->r_kref decrement into
__complete_request().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:58 +02:00
Ilya Dryomov
0d09c57d08 libceph: no need to call flush_workqueue() before destruction
destroy_workqueue() drains the workqueue before proceeding with
destruction.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:57 +02:00
Ilya Dryomov
66850df585 libceph: introduce ceph_osdc_abort_requests()
This will be used by the filesystem for "umount -f".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:57 +02:00
Ilya Dryomov
e5c9388399 libceph: use MSG_TRUNC for discarding received bytes
Avoid a copy into the "skip buffer".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:55 +02:00
Ilya Dryomov
d2935d6f75 libceph: get rid of more_kvec in try_write()
All gotos to "more" are conditioned on con->state == OPEN, but the only
thing "more" does is opening the socket if con->state == PREOPEN.  Kill
that label and rename "more_kvec" to "more".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-06-04 20:45:55 +02:00
Chengguang Xu
fe943d5042 libceph, rbd: add error handling for osd_req_op_cls_init()
Add proper error handling for osd_req_op_cls_init() to replace
BUG_ON statement when failing from memory allocation.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:54 +02:00
Ilya Dryomov
0010f7052d libceph: add osd_req_op_extent_osd_data_bvecs()
... and store num_bvecs for client code's convenience.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-05-10 10:15:05 +02:00
Ilya Dryomov
9c55ad1c21 libceph: validate con->state at the top of try_write()
ceph_con_workfn() validates con->state before calling try_read() and
then try_write().  However, try_read() temporarily releases con->mutex,
notably in process_message() and ceph_con_in_msg_alloc(), opening the
window for ceph_con_close() to sneak in, close the connection and
release con->sock.  When try_write() is called on the assumption that
con->state is still valid (i.e. not STANDBY or CLOSED), a NULL sock
gets passed to the networking stack:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: selinux_socket_sendmsg+0x5/0x20

Make sure con->state is valid at the top of try_write() and add an
explicit BUG_ON for this, similar to try_read().

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/23706
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-04-26 17:39:08 +02:00
Ilya Dryomov
7b4c443d13 libceph: reschedule a tick in finish_hunting()
If we go without an established session for a while, backoff delay will
climb to 30 seconds.  The keepalive timeout is also 30 seconds, so it's
pretty easily hit after a prolonged hunting for a monitor: we don't get
a chance to send out a keepalive in time, which means we never get back
a keepalive ack in time, cutting an established session and attempting
to connect to a different monitor every 30 seconds:

  [Sun Apr 1 23:37:05 2018] libceph: mon0 10.80.20.99:6789 session established
  [Sun Apr 1 23:37:36 2018] libceph: mon0 10.80.20.99:6789 session lost, hunting for new mon
  [Sun Apr 1 23:37:36 2018] libceph: mon2 10.80.20.103:6789 session established
  [Sun Apr 1 23:38:07 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
  [Sun Apr 1 23:38:07 2018] libceph: mon1 10.80.20.100:6789 session established
  [Sun Apr 1 23:38:37 2018] libceph: mon1 10.80.20.100:6789 session lost, hunting for new mon
  [Sun Apr 1 23:38:37 2018] libceph: mon2 10.80.20.103:6789 session established
  [Sun Apr 1 23:39:08 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon

The regular keepalive interval is 10 seconds.  After ->hunting is
cleared in finish_hunting(), call __schedule_delayed() to ensure we
send out a keepalive after 10 seconds.

Cc: stable@vger.kernel.org # 4.7+
Link: http://tracker.ceph.com/issues/23537
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-04-24 10:40:21 +02:00
Ilya Dryomov
facb9f6eba libceph: un-backoff on tick when we have a authenticated session
This means that if we do some backoff, then authenticate, and are
healthy for an extended period of time, a subsequent failure won't
leave us starting our hunting sequence with a large backoff.

Mirrors ceph.git commit d466bc6e66abba9b464b0b69687cf45c9dccf383.

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-04-24 10:39:52 +02:00
Linus Torvalds
b284d4d5a6 The big ticket items are:
- support for rbd "fancy" striping (myself).  The striping feature bit
   is now fully implemented, allowing mapping v2 images with non-default
   striping patterns.  This completes support for --image-format 2.
 
 - CephFS quota support (Luis Henriques and Zheng Yan).  This set is
   based on the new SnapRealm code in the upcoming v13.y.z ("Mimic")
   release.  Quota handling will be rejected on older filesystems.
 
 - memory usage improvements in CephFS (Chengguang Xu).  Directory
   specific bits have been split out of ceph_file_info and some effort
   went into improving cap reservation code to avoid OOM crashes.
 
 Also included a bunch of assorted fixes all over the place from
 Chengguang and others.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJazOI/AAoJEEp/3jgCEfOLOu0IAKGFkcCo0UdQDGHHJZHn2rAm
 CSWMMwyYGAhoWI6Gva0jx1A2omZLFSeq/MC8dWLL/MNAKt8i/qo8bTsTrwCHMR2Q
 D0FsvMWIhkWRS1/FcD1uVDhn0a/DFm5Kfy8kzz3v695TDCt+BYWrCqyHTB/wSdRR
 VpO3KdpHQ9h3ojNBRgIniOCNPeQP+QzLXy+P0h0oKbP2Y03mwJlsWG4L6zakkkwT
 e2I+RVdlOMUDJ7rZxiXESBr6BuLI4oOkPe8roQGmZPy1Xe17xa9M5iWVNuM6RUhO
 Z9bS2aLMhbDyeCPqvzgAnsUtFT0PAQjB5NYw2yqisbHs/wrU5kMOOpcLqz/Ls/s=
 =v1I9
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The big ticket items are:

   - support for rbd "fancy" striping (myself).

     The striping feature bit is now fully implemented, allowing mapping
     v2 images with non-default striping patterns. This completes
     support for --image-format 2.

   - CephFS quota support (Luis Henriques and Zheng Yan).

     This set is based on the new SnapRealm code in the upcoming v13.y.z
     ("Mimic") release. Quota handling will be rejected on older
     filesystems.

   - memory usage improvements in CephFS (Chengguang Xu).

     Directory specific bits have been split out of ceph_file_info and
     some effort went into improving cap reservation code to avoid OOM
     crashes.

  Also included a bunch of assorted fixes all over the place from
  Chengguang and others"

* tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client: (67 commits)
  ceph: quota: report root dir quota usage in statfs
  ceph: quota: add counter for snaprealms with quota
  ceph: quota: cache inode pointer in ceph_snap_realm
  ceph: fix root quota realm check
  ceph: don't check quota for snap inode
  ceph: quota: update MDS when max_bytes is approaching
  ceph: quota: support for ceph.quota.max_bytes
  ceph: quota: don't allow cross-quota renames
  ceph: quota: support for ceph.quota.max_files
  ceph: quota: add initial infrastructure to support cephfs quotas
  rbd: remove VLA usage
  rbd: fix spelling mistake: "reregisteration" -> "reregistration"
  ceph: rename function drop_leases() to a more descriptive name
  ceph: fix invalid point dereference for error case in mdsc destroy
  ceph: return proper bool type to caller instead of pointer
  ceph: optimize memory usage
  ceph: optimize mds session register
  libceph, ceph: add __init attribution to init funcitons
  ceph: filter out used flags when printing unused open flags
  ceph: don't wait on writeback when there is no more dirty pages
  ...
2018-04-10 12:25:30 -07:00
Luis Henriques
fb18a57568 ceph: quota: add initial infrastructure to support cephfs quotas
This patch adds the infrastructure required to support cephfs quotas as it
is currently implemented in the ceph fuse client.  Cephfs quotas can be
set on any directory, and can restrict the number of bytes or the number
of files stored beneath that point in the directory hierarchy.

Quotas are set using the extended attributes 'ceph.quota.max_files' and
'ceph.quota.max_bytes', and can be removed by setting these attributes to
'0'.

Link: http://tracker.ceph.com/issues/22372
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 11:17:51 +02:00
Chengguang Xu
57a35dfb52 libceph, ceph: add __init attribution to init funcitons
Add __init attribution to the functions which are called only once
during initiating/registering operations and deleting unnecessary
symbol exports.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:48 +02:00