RISC-V Linux 内核及周边技术动态第 78 期

呀呀呀创作于 2024/02/17

时间：20240213
编辑：晓怡
仓库：RISC-V Linux 内核技术调研活动
赞助：PLCT Lab, ISCAS

内核动态

RISC-V 架构支持

v8: riscv: sophgo: add clock support for Sophgo CV1800/SG2000 SoCs

To perform well on short messages, the new implementation processes the full message in one call to the assembly function if the data is contiguous. Otherwise it falls back to CBC operations followed by CTS at the end. For decryption, to further improve performance on short messages, especially block-aligned messages, the CBC-CTS assembly function parallelizes the AES decryption of all full blocks.

v11: riscv: Create and document PR_RISCV_SET_ICACHE_FLUSH_CTX prctl

Improve the performance of icache flushing by creating a new prctl flag PR_RISCV_SET_ICACHE_FLUSH_CTX. The interface is left generic to allow for future expansions such as with the proposed J extension [1].
Documentation is also provided to explain the use case.

v2: riscv: pwm: sophgo: add pwm support for CV1800

The Sophgo CV1800 chip provides a set of four independent PWM channel outputs. This series adds PWM controller support for Sophgo cv1800.

v2: riscv/fence: Consolidate fence definitions and define __{mb,rmb,wmb}

Disparate fence implementations are consolidated into fence.h.
Introduce __{mb,rmb,wmb}, and rely on the generic definitions for {mb,rmb,wmb}. A first consequence is that __{mb,rmb,wmb} map to a compiler barrier on !SMP (while their definition remains unchanged on SMP).

v1: riscv: Various text patching improvements

Here are a few changes to minimize calls to stop_machine() and flush_icache_*() in the various text patching functions, as well as to simplify the code.

v2: RISC-V: Add dynamic TSO support

The upcoming RISC-V Ssdtso specification introduces a bit in the senvcfg CSR to switch the memory consistency model of user mode at run-time from RVWMO to TSO. The active consistency model can therefore be switched on a per-hart base and managed by the kernel on a per-process base.

v1: clk: renesas: rzg2l: Add support for power domains

Series adds support for power domains on rzg2l driver.
RZ/G2L kind of devices support a functionality called MSTOP (module stop/standby). According to hardware manual the module could be switch to standby after its clocks are disabled. The reverse order of operation should be done when enabling a module (get the module out of standby, enable its clocks etc).

v1: -next: RISC-V: ACPI: Enable CPPC based cpufreq support

This series enables the support for “Collaborative Processor Performance Control (CPPC) on ACPI based RISC-V platforms. It depends on the encoding of CPPC registers as defined in RISC-V FFH spec [2].

GIT PULL: percpu changes for v6.8-rc4

The PR to enable the percpu page allocator had a tlb flush parameter mixup of end vs size.. This contains the fix.

v2: riscv: sophgo: add i2c and spi device to CV180x/SG2000x SoCs

Add i2c and spi devices
The patch depends on the clk patch: https://lore.kernel.org/all/IA1PR20MB4953C774D41EDF1EADB6EC18BB6D2@IA1PR20MB4953.namprd20.prod.outlook.com/

v1: RISC-V: Don’t use IPIs in flush_icache_all() when patching text

If some of the HARTs are parked by stop machine then IPI-based flushing in flush_icache_all() will hang. This hang is observed when text patching is invoked by various debug and BPF features.

进程调度

v1: sched: make cpu_util_cfs visible

As RT, DL, IRQ time could be deemed as lost time of CFS’s task, some timing value want to know the distribution of how these spread approximately by using utilization account value (nivcsw is not enough sometimes), wheras, cpu_util_cfs is not visible out side of kernel/sched, This commit would like to make it be visible.

v2: sched/debug: Dump end of stack when detected corrupted

When debugging a kernel hang during suspend/resume [1], there were random memory corruptions in different places like being reported by slub_debug+KASAN, or detected by scheduler with error message

v2: perf sched: Minor optimizations for resource initialization

start_work_mutex, work_done_wait_mutex, curr_thread, curr_pid, and cpu_last_switched are initialized together in cmd_sched(), but for different perf sched subcommands, some actions are unnecessary, especially perf sched record. This series of patches initialize only required resources for different subcommands.

v2: net-next: net/sched: actions report errors with extack

When an action detects invalid parameters, it should be adding an external ack to netlink so that the user is able to diagnose the issue.

v1: sched: cpufreq: Rename map_util_perf to apply_dvfs_headroom

We are providing headroom for the utilization to grow until the next decision point to pick the next frequency. Give the function a better name and give it some documentation. It is not really mapping anything.

内存管理

v1: hugetlb: two small improvements of hugetlb init parallelization

This series includes two improvements: fixing the PADATA Kconfig warning and a potential bug in gather_bootmem_prealloc_parallel. Please refer to the specific commit message for details.

v2: enable bs > ps in XFS

This is the second version of the series that enables block size > page size (Large Block Size) in XFS. This version has various bug fixes and suggestion collected from the v1[1]. The context and motivation can be seen in cover letter of the v1. We also recorded a talk about this effort at LPC [2], if someone would like more context on this effort.

v5: per-vma locks in userfaultfd

Performing userfaultfd operations (like copy/move etc.) in critical section of mmap_lock (read-mode) causes significant contention on the lock when operations requiring the lock in write-mode are taking place concurrently. We can use per-vma locks instead to significantly reduce the contention issue.

v3: Memory allocation profiling

Memory allocation, v3 and final:
Overview: Low overhead [1] per-callsite memory allocation profiling. Not just for debug kernels, overhead low enough to be deployed in production.

v1: mm: document memalloc_noreclaim_save() and memalloc_pin_save()

The memalloc_noreclaim_save() function currently has no documentation comment, so the implications of its usage are not obvious. Namely that it not only prevents entering reclaim (as the name suggests), but also allows using all memory reserves and thus should be only used in contexts that are allocating memory to free memory. This may lead to new improper usages being added.

v4: Enable >0 order folio memory compaction

This patchset enables >0 order folio memory compaction, which is one of the prerequisitions for large folio support[1]. It is on top of mm-everything-2024-02-10-00-56.

v3: kasan: add atomic tests

Test that KASan can detect some unsafe atomic accesses.
As discussed in the linked thread below, these tests attempt to cover the most common uses of atomics and, therefore, aren’t exhaustive.

v2: Port hierarchical_{memory,swap}_limit cgroup1->cgroup2

which are useful for userland to easily and performance-wise find out the effective cgroup limits being applied. Otherwise userland has to open+read+close the file “memory.max” and/or “memory.swap.max” in multiple parent directories of a nested cgroup.

v1: mm/zswap: optimize for dynamic zswap_pools

Dynamic pool creation has been supported for a long time, which maybe not used so much in practice. But with the per-memcg lru merged, the current structure of zswap_pool’s lru and shrinker become less optimal.

v1: x86/vdso: Move vDSO to mmap region

The vDSO (and its initial randomization) was introduced in commit 2aae950b21e4 (“x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu”), but had very low entropy. The entropy was improved in commit but there is still improvement to be made.

v2: mm/memory: optimize unmap/zap with PTE-mapped THP

This series is based on [1]. Similar to what we did with fork(), let’s implement PTE batching during unmap/zap when processing PTE-mapped THPs.
We collect consecutive PTEs that map consecutive pages of the same large folio, making sure that the other PTE bits are compatible, and (a) adjust the refcount only once per batch, (b) call rmap handling functions only once per batch, (c) perform batch PTE setting/updates and (d) perform TLB entry removal once per batch.

v1: selftests/mm: Don’t needlessly use sudo to obtain root in run_vmtests.sh

When opening yama/ptrace_scope we unconditionally use sudo to ensure we are running as root, resulting in failures if running in a minimal root filesystem where sudo is not installed. Since automated test systems will typically just run all of kselftest as root (and many kselftests rely on this for full functionality) add a check to see if we’re already root and only invoke sudo if not.

v1: mm/hugetlb: Move page order check inside hugetlb_cma_reserve()

All platforms could benefit from page order check against MAX_PAGE_ORDER before allocating a CMA area for gigantic hugetlb pages. Let’s move this check from individual platforms to generic hugetlb.

v2: bpf-next: bpf: Introduce BPF arena.

The work on bpf_arena was inspired by Barret’s work: https://github.com/google/ghost-userspace/blob/main/lib/queue.bpf.h that implements queues, lists and AVL trees completely as bpf programs using giant bpf array map and integer indices instead of pointers. bpf_arena is a sparse array that allows to use normal C pointers to build such data structures. Last few patches implement page_frag allocator, link list and hash table as bpf programs.

v1: mm/memblock: Add MEMBLOCK_RSRV_NOINIT into flagname[] array

The commit 77e6c43e137c (“memblock: introduce MEMBLOCK_RSRV_NOINIT flag”) skipped adding this newly introduced memblock flag into flagname[] array, thus preventing a correct memblock flags output for applicable memblock regions.

v2: Memory management patches needed by Rust Binder

This patchset contains some abstractions needed by the Rust implementation of the Binder driver for passing data between userspace, kernelspace, and directly into other processes.

v1: fs/proc/task_mmu: Add display flag for VM_MAYOVERLAY

VM_UFFD_MISSING flag is mutually exclussive with VM_MAYOVERLAY flag as they both use the same bit position i.e 0x00000200 in the vm_flags. Let’s update show_smap_vma_flags() to display the correct flags depending on CONFIG_MMU.

文件系统

v6: Set casefold/fscrypt dentry operations through sb->s_d_op

v6 of this patchset applying the comments from Eric and the suggestion from Christian. Thank you for your feedback.

v4: fs-verity support for XFS

Here’s v4 of my patchset of adding fs-verity support to XFS.
This implementation uses extended attributes to store fs-verity metadata. The Merkle tree blocks are stored in the remote extended attributes. The names are offsets into the tree.

v1: dcache: rename d_genocide()

Political context aside, using analogies from the real world in code is supposed to help us human programmers understand the code better.

v1: fs/hfsplus: use better @opf description

Use a more descriptive explanation of the @opf function parameter, more in line with <linux/blk_types.h>.

v1: udf: convert to new mount API

Convert the UDF filesystem to the new mount API.
UDF is slightly unique in that it always preserves prior mount options across a remount, so that’s handled by udf_init_options().

v2: zonefs: convert zonefs to use the new mount api

Convert the zonefs filesystem to use the new mount API. Tested using the zonefs test suite from: https://github.com/damien-lemoal/zonefs-tools

v9: Landlock: IOCTL support

Introduce the LANDLOCK_ACCESS_FS_IOCTL right, which restricts the use of ioctl(2) on file descriptors.
We attach IOCTL access rights to opened file descriptors, as we already do for LANDLOCK_ACCESS_FS_TRUNCATE.

v7: filtering and snapshots of a block devices

The filtering block device mechanism is implemented in the block layer. This allows to attach and detach block device filters. Filters extend the functionality of the block layer. See more in Documentation/block/blkfilter.rst.

v1: quota: Detect loops in quota tree

Syzbot has found that when it creates corrupted quota files where the quota tree contains a loop, we will deadlock when tryling to insert a dquot. Add loop detection into functions traversing the quota tree.

安全增强

v1: Xperia 1 V support

DTS for the phone and some fly-by fixes
Patch 1 for Mark/sound Rest for qcom

v1: hardening: Enable KFENCE in the hardening config

KFENCE is not a security mitigation mechanism (due to sampling), but has the performance characteristics of unintrusive hardening techniques. When used at scale, however, it improves overall security by allowing kernel developers to detect heap memory-safety bugs cheaply.

v1: iommu/mtk_iommu: Use devm_kcalloc() instead of devm_kzalloc()

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1].
Here the multiplication is obviously safe because MTK_PROTECT_PA_ALIGN is defined as a literal value of 256 or 128.

v1: iommu/vt-d: Use kcalloc() instead of kzalloc()

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1].
Here the multiplication is obviously safe because DMAR_LATENCY_NUM is the number of latency types defined in the “latency_type” enum.

v2: mtd: rawnand: Prefer struct_size over open coded arithmetic

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1].

v1: fs/ntfs3: use kcalloc() instead of kzalloc()

We are trying to get rid of all multiplications from allocation functions to prevent integer overflows[1]. Here the multiplication is obviously safe, but using kcalloc() is more appropriate and improves readability. This patch has no effect on runtime behavior.

v1: stddef: Allow attributes to be used when creating flex arrays

We’re going to have more cases where we need to apply attributes (e.g. __counted_by) to struct members that have been declared with DECLARE_FLEX_ARRAY. Add a new …_ATTR helper to allow for this and annotate one such user in linux/in.h.

v1: irqchip/bcm-6345-l1: Prefer struct_size over open coded arithmetic

This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1].

v1: drm/i915: Add flex arrays to struct i915_syncmap

The “struct i915_syncmap” uses a dynamically sized set of trailing elements. It can use an “u32” array or a “struct i915_syncmap *” array.

v1: scsi: Replace {v}snprintf() variants with safer alternatives

Note: We’re also taking the time to obay our new .editorconfig overlord!
For a far better description of the problem than I could author, see Jon’s write-up on LWN [1] and/or Alex’s on the Kernel Self Protection Project [1].

v3: wifi: mwifiex: Refactor 1-element array into flexible array in struct mwifiex_ie_types_chan_list_param_set

struct mwifiex_ie_types_chan_list_param_set::chan_scan_param is treated as a flexible array, so convert it into one so that it doesn’t trip the array bounds sanitizer[1]. Only a few places were using sizeof() on the whole struct, so adjust those to follow the calculation pattern to avoid including the trailing single element.

v3: pstore: add multi-backend support

I have been steadily working but struggled to find a seamlessly integrated way to implement tty frontend until Guilherme inspired me that multi-backend and tty frontend are actually two separate entities.

v1: xen/gntalloc: Replace UAPI 1-element array

Without changing the structure size (since it is UAPI), add a proper flexible array member, and reference it in the kernel so that it will not be trip the array-bounds sanitizer[1].

v1: net/sun3_82586: Avoid reading past buffer in debug output

Since NUM_XMIT_BUFFS is always 1, building m68k with sun3_defconfig and -Warraybounds, this build warning is visible[1]:

v3: Tegra30: add support for LG tegra based phones

Bring up Tegra 3 based LG phones Optimus 4X HD and Optimus Vu based on LG X3 board.

v2: selftests/seccomp: Pin benchmark to single CPU

The seccomp benchmark test (for validating the benefit of bitmaps) can be sensitive to scheduling speed, so pin the process to a single CPU, which appears to significantly improve reliability, and loosen the “close enough” checking to allow up to 10% variance instead of 1%.

v3: ubsan: Reintroduce signed overflow sanitizer

In order to mitigate unexpected signed wrap-around[1], bring back the signed integer overflow sanitizer. It was removed in commit 6aaa31aeb9cf (“ubsan: remove overflow checks”) because it was effectively a no-op when combined with -fno-strict-overflow (which correctly changes signed overflow from being “undefined” to being explicitly “wrap around”).

异步 IO

v1: -next: io_uring: switch struct io_kiocb flag definitions to BIT_ULL()

The io_kiocb.flags variable was expanded to 64 bits, but none of the existing or newly-added flag definitions were updated, causing build issues on 32-bit platforms, where unsigned long is a 32-bit value.

v1: liburing: add script for statistics sqpoll running time

Count the running time and actual IO processing time of the sqpoll thread, and output the statistical time to terminal.

v8: io_uring: Statistics of the true utilization of sq threads.

Count the running time and actual IO processing time of the sqpoll thread, and output the statistical data to fdinfo.
Variable description: “work_time” in the code represents the sum of the jiffies of the sq thread actually processing IO, that is, how many milliseconds it actually takes to process IO. “total_time” represents the total time that the sq thread has elapsed from the beginning of the loop to the current time point, that is, how many milliseconds it has spent in total.

Rust For Linux

v2: rust: locks: Add get_mut method to Lock

Having a mutable reference guarantees that no other threads have access to the lock, so we can take advantage of that to grant callers access to the protected data without the the cost of acquiring and releasing the locks. Since the lifetime of the data is tied to the mutable reference, the borrow checker guarantees that the usage is safe.

v1: bcachefs: add framework for internal Rust code

This series adds support for Rust code into bcachefs. This only enables using Rust internally within bcachefs; there are no public Rust APIs added. Rust support is hidden behind a new config option, CONFIG_BCACHEFS_RUST. It is optional and bcachefs can still be built with full functionality without rust.

v3: rust: place generated init_module() function in .init.text

Currently Rust kernel modules have their init code placed in the .text section of the .ko file. I don’t think this causes any real problems for Rust modules as long as all code called during initialization lives in .text.

v1: rust: stop using ptr_metadata feature

The byte_sub method was stabilized in Rust 1.75.0. By using that method, we no longer need the unstable ptr_metadata feature for implementing Arc::from_raw.

周边技术动态

Buildroot

v1: package/libopenssl: security bump to version 3.2.1

And drop the now upstreamed patches.
Fixes the following (low severity) issues:
CVE-2023-6129 POLY1305 MAC implementation corrupts vector registers on PowerPC https://www.openssl.org/news/secadv/20240109.txt

support/testing: add optee-os runtime test

commit: https://git.buildroot.net/buildroot/commit/?id=cd56ac9eb63f0acecd78b1983f9d889f21f8fe0e branch: https://git.buildroot.net/buildroot/commit/?id=refs/heads/master

U-Boot

v1: Added FDT PAD memory size while reserving memory for FDT to avoid some memory corruption issue

In the board_f.c file the FDT memory region is reserved without FDT padding bytes. uboot will add some params like bootargs while launching linux. While relocate the FDT, if its decided as run in the Fixed memory location i.e fdy_high is set as -1, then the padding bytes not added while relocating the FDT, but the size is blindly added with padding bytes without reserving the physical memory in the FDT header in the image_fdt.c file.

[置顶] Linux Lab v1.3 升级部分内核到 v6.6，新增上游内核工具链支持，完善 riscv64 和 nolibc 开发支持，另有新增 2 款虚拟开发板：ppc64le/pseries 和 ppc64le/powernvLinux Lab 发布 v1.3 正式版，升级部分内核到 v6.6，新增 2 款 ppc64 虚拟开发板

RISC-V Linux 内核及周边技术动态第 78 期

内核动态

RISC-V 架构支持

进程调度

内存管理

文件系统

安全增强

异步 IO

Rust For Linux

周边技术动态

Buildroot

U-Boot

猜你喜欢：

Read Album:

Read Related:

Read Latest:

支付宝打赏￥9.68元		微信打赏￥9.68元
	请作者喝杯咖啡吧