RISC-V Linux Kernel and Related Technology News, Issue 88

Written by 呀呀呀 on 2024/04/22

Date: 20240421
Editor: 晓怡
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

v6: add initial Milk-V Duo S board support

This adds an initial device tree for the Milk-V Duo S board. Last tested on linux-next 20240419

v3: riscv: Support vendor extensions and xtheadvector

This patch series ended up much larger than expected, please bear with me! The goal here is to support vendor extensions, starting at probing the device tree and ending with reporting to userspace.

v8: RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest

This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot and fw_read_hi() functions.
SBI v2.0 introduced the PMU snapshot feature, which allows the SBI implementation to provide counter information (i.e. values/overflow status) via shared memory between the SBI implementation and the supervisor OS. This minimizes the number of traps when perf is used inside a KVM guest, since perf there relies on the SBI PMU plus trap-and-emulate of the counters.

v2: Linux RISC-V IOMMU Support

This patch series introduces support for RISC-V IOMMU architected hardware into the Linux kernel.
The RISC-V IOMMU specification, which this series is based on, is ratified and available at GitHub/riscv-non-isa [1].

v1: riscv: Add support for Ssdbltrp extension

A double trap typically arises during a sensitive phase of trap handling: an exception or interrupt occurs while the trap handler (the component responsible for managing these events) is in a non-reentrant state.
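To make the idea concrete, here is a deliberately simplified C model, not the kernel or spec implementation: a per-hart flag (named sdt here, an assumption for illustration) is set when a trap enters the handler and cleared once the handler is reentrant again. A trap that arrives while the flag is still set counts as a double trap.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of a hart taking traps into S-mode. */
enum trap_result { TRAP_HANDLED, TRAP_DOUBLE };

struct hart_model {
    bool sdt; /* set while the trap handler is in its non-reentrant phase */
};

/* Called when an exception or interrupt is delivered. */
static enum trap_result take_trap(struct hart_model *h)
{
    if (h->sdt)
        return TRAP_DOUBLE; /* trap during the sensitive phase */
    h->sdt = true;          /* handler entered: not yet safe to re-enter */
    return TRAP_HANDLED;
}

/* Models the handler becoming reentrant again (state saved, or returned). */
static void trap_done(struct hart_model *h)
{
    h->sdt = false;
}
```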

v1: RISC-V: clarify what some RISCV_ISA* config options do

During some discussion on IRC yesterday and on Pu’s bpf patch [1] I noticed that these RISCV_ISA* Kconfig options are not really clear about their implications. Many of these options have no impact on what userspace is allowed to do, for example an application can use Zbb regardless of whether or not the kernel does.

v2: Add support for a few Zc* extensions as well as Zcmop

Add support for yet more extensions that RVA23U64 requires and that are still missing: ISA string parsing, hwprobe and KVM support for the Zcmop, Zca, Zcf, Zcd and Zcb extensions. The Zce, Zcmt and Zcmp extensions have been left out, since they target microcontrollers/embedded CPUs and are not needed by RVA23U64.

v1: riscv: Idle thread using Zawrs extension

This patch series introduces a new implementation of idle thread using Zawrs extension.

v2: perf: RISC-V: Check standard event availability

The RISC-V SBI PMU specification defines several standard hardware and cache events. Currently, all of these events are exposed to userspace, even when not actually implemented. They appear in the perf list output, and commands like perf stat try to use them.

v2: RISCV: KVM: Avoid lock inversion in SBI_EXT_HSM_HART_START

Documentation/virt/kvm/locking.rst advises that kvm->lock should be acquired outside vcpu->mutex and kvm->srcu. However, when KVM/RISC-V handles SBI_EXT_HSM_HART_START, the lock ordering is vcpu->mutex, kvm->srcu, then kvm->lock.

v1: RISC-V: Enable IPI CPU Backtrace

Add a CPU backtrace feature using IPIs on riscv. Other architectures already support this feature, but riscv does not yet. Since IPI multiplexing allows multiple IPIs to be handled, I think this feature can also be enabled on riscv by adding a new IPI.

v1: mm: code and data partitioning improvements

Managing allocations to ensure code and data pages are not interleaved is not possible prior to this patch, as ASLR requires programming a dynamic _text offset while the vmalloc infrastructure maintains static VMALLOC_START and VMALLOC_END constants.

v1: mmc: sdhci-of-dwcmshc: support Sophgo SG2042

Add support for the MMC controller of the Sophgo SG2042: add the corresponding new compatible strings and implement custom sdhci_ops.

v2: Consistently prefer sysfs/json events

As discussed in https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/, consistently preferring sysfs/json events (with or without a given PMU) will let RISC-V customize legacy events in the perf tool, as it hopes to.

v1: Revert “riscv: disable generation of unwind tables”

RISC-V has supported the complete set of relocation types in the module loader since ‘8fd6c5142395 (“riscv: Add remaining module relocations”)’. Now the RISC-V port can enable unwind tables in case eh_frame parsing is needed.

v4: RISC-V: ACPI: Add external interrupt controller support

This series adds support for the below ECR approved by ASWG. 1) MADT - https://drive.google.com/file/d/1oMGPyOD58JaPgMl1pKasT-VKsIKia7zR/view?usp=sharing

Process Scheduling

v1: perf sched map: Add command-name option to filter the output map

By default, perf sched map prints sched-in events for all tasks. This is not always needed, as it prints a lot of symbols and rows to the terminal.

Memory Management

v1: -next: cgroup: Introduce css_is_online() helper

Introduce the css_is_online() helper to test whether the specified css is online, to avoid testing css.flags against CSS_ONLINE directly outside of cgroup.c.

v1: cgroup: Introduce css_is_online() helper

Introduce the css_is_online() helper to test whether the specified css is online, to avoid testing css.flags against CSS_ONLINE directly outside of cgroup.c.
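A minimal sketch of such a helper, assuming simplified stand-ins for struct cgroup_subsys_state and the CSS_ONLINE flag (the bit value below is illustrative, not taken from the kernel headers):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's cgroup definitions. */
#define CSS_ONLINE (1u << 1)

struct cgroup_subsys_state {
    unsigned int flags;
};

/* Callers outside cgroup.c ask "is it online?" through this helper
 * instead of testing css->flags against CSS_ONLINE themselves. */
static inline bool css_is_online(const struct cgroup_subsys_state *css)
{
    return css->flags & CSS_ONLINE;
}
```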

v1: mm/slub: create kmalloc 96 and 192 caches regardless cache size order

For SLAB the kmalloc caches needed to be created in ascending sizes in order. However, the constraint is not necessary anymore because SLAB has been deprecated and SLUB doesn’t need to comply with the constraint. Thus, kmalloc 96 and 192 caches can be created after the other size kmalloc caches are created instead of checking every time to find their order to be created. Also, this change could prevent engineers from being confused by the deprecated constraint.

v3: slub: introduce count_partial_free_approx()

This patch fixes a known issue in get_slabinfo() which relies on count_partial() to get the exact count of free objects in a kmem_cache_node’s partial list. For some slubs, their partial lists can be extremely long. Currently, count_partial() traverses a partial list to get the exact count of objects. This process may take a long time, during which slab allocations are blocked and IRQs are disabled. In production, even NMI watchdog can be triggered due to this matter.
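The idea behind an approximate count can be sketched as follows. This is a simplified illustration, not the patch itself: the slab descriptor is a hypothetical stand-in, and the estimate scans at most max_scan slabs from the head of the list, then scales the sampled free count by the total number of partial slabs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified slab descriptor: only what the estimate needs. */
struct slab {
    unsigned int objects; /* total objects in this slab */
    unsigned int inuse;   /* allocated objects */
    struct slab *next;
};

/* Estimate free objects on a partial list of nr_partial slabs by scanning
 * at most max_scan of them, instead of walking a possibly huge list while
 * holding the lock with IRQs disabled. */
static unsigned long count_partial_free_approx(const struct slab *head,
                                               unsigned long nr_partial,
                                               unsigned long max_scan)
{
    unsigned long nfree = 0, scanned = 0;

    for (const struct slab *s = head; s && scanned < max_scan; s = s->next) {
        nfree += s->objects - s->inuse;
        scanned++;
    }
    if (scanned == 0 || scanned == nr_partial)
        return nfree;                   /* empty list, or an exact count */
    return nfree * nr_partial / scanned; /* extrapolate the sample */
}
```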

v8: ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers

Increasing DRAM size and cost has made memory subsystem reliability an important concern. These modules are used where potentially corrupted data could cause expensive or fatal issues. Memory errors are one of the top hardware failures that cause server and workload crashes.

v1: selftests: mm: protection_keys: save/restore nr_hugepages value from launch script

The save/restore of nr_hugepages was added to the test itself using atexit(). But it is broken: the parent exits after creating a child, so the atexit() handler runs early. And that's not all: the child exits after creating its own child, and so on.

v1: arm64/mm: uffd write-protect and soft-dirty tracking

This series adds uffd write-protect and soft-dirty tracking support for arm64. I consider the soft-dirty support (patches 3 and 4) as RFC - see rationale below.

v13: Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

This patchset is also available at:
https://github.com/amdese/linux/commits/snp-host-v13

v1: mm/huge_memory: improve split_huge_page_to_list_to_order() return value documentation

The documentation is wrong and relying on it almost resulted in BUGs in new callers: we return -EAGAIN on unexpected folio references, not -EBUSY.

v2: mm: convert mm’s rss stats to use atomic mode

Since commit f1a7941243c1 (“mm: convert mm’s rss stats into percpu_counter”), the rss_stats have been converted to percpu_counter, which changes the error margin from (nr_threads * 64) to approximately (nr_cpus ^ 2). However, the new percpu allocation in mm_init() causes a performance regression on fork/exec/shell. Even after commit 14ef95be6f55 (“kernel/fork: group allocation/free of per-cpu counters for mm struct”), fork/exec/shell performance is still poor compared to previous kernel versions.

v1: mm: swapfile: check usable swap device in __folio_throttle_swaprate()

Skip blk_cgroup_congested() if there is no usable swap device, since no swap-in/swap-out can happen then. Throttling is assumed to be unnecessary around swapon/swapoff, so swap_lock is not taken. The patch shows the difference with perf data from a CoW page-fault test.

v10: mm/madvise: enhance lazyfreeing with mTHP in madvise_free

This patchset adds support for lazyfreeing multi-size THP (mTHP) without needing to first split the large folio via split_folio(). However, we still need to split a large folio that is not fully mapped within the target range.
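From userspace, lazyfree is requested with madvise(MADV_FREE). The sketch below only exercises that system call on an anonymous mapping; whether the range is backed by mTHP, and therefore whether the new path in this series is taken, depends on the kernel and its configuration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Dirty npages of anonymous memory, then mark them lazily freeable.
 * Returns madvise()'s result (0 on success) or -1 if mmap() fails. */
static int lazyfree_demo(size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xab, len); /* populate and dirty the pages */
    /* The kernel may now reclaim these pages without swapping them out;
     * until it does, reads may still return the old contents, and writing
     * a page again cancels the lazy free for that page. */
    int ret = madvise(p, len, MADV_FREE);
    munmap(p, len);
    return ret;
}
```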

v2: mm: memory: check userfaultfd_wp() in vmf_orig_pte_uffd_wp()

Add a userfaultfd_wp() check in vmf_orig_pte_uffd_wp() to avoid the unnecessary pte_marker_entry_uffd_wp() call in most page faults. The patch shows the difference with perf data from lat_pagefault; note that vmf_orig_pte_uffd_wp() is not inlined in either of the two kernel versions.

v10: net-next: net: intel: start The Great Code Dedup + Page Pool for iavf

Here’s a two-shot: introduce {,Intel} Ethernet common library (libeth and libie) and switch iavf to Page Pool. Details are in the commit messages; here’s a summary:
It's no secret that there is a ton of code duplication between two or more Intel Ethernet modules. Before introducing new changes, which would need to be copied over again, start moving the existing duplicated functionality into a new module that will be shared between several Intel Ethernet drivers.

v6: Memory management patches needed by Rust Binder

This patchset contains some abstractions needed by the Rust implementation of the Binder driver for passing data between userspace, kernelspace, and directly into other processes.

v1: Improve memory statistics for virtio balloon

RFC -> v1:
several text changes: oom-kill -> oom-kills, SCAN_ASYNC -> ASYN_SCAN.
move vm events codes into ‘#ifdef CONFIG_VM_EVENT_COUNTERS’

[PATCH v9 rebase on mm-unstable 0/8] Reduce tlb and interrupt numbers over 90% by improving folio migration

However, it’s only for ones using hinting fault. I thought it’d be much better if we have a general mechanism to reduce all tlb numbers that we can ultimately apply to any type of migration.

v1: fstests: add fsstress + compaction test

Running compaction while fsstress runs can crash older kernels, as per korg#218227 [0]. The fix for that has been posted [1], but that patch is not yet in v6.9-rc4, and it requires changes for v6.9.

v4: mm/page_table_check: Support userfault wr-protect entries

Allow page_table_check hooks to check userfaultfd wr-protect criteria upon page table updates. The rule is that no writable flag may coexist with the userfault wr-protect flag.
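The invariant can be expressed as a simple predicate. The PTE bit positions below are hypothetical; in the kernel, the check lives inside the page_table_check hooks rather than in a standalone helper like this one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical PTE flag bits, for illustration only. */
#define PTE_WRITE   (1u << 1)
#define PTE_UFFD_WP (1u << 2)

/* An entry must never be both writable and userfaultfd write-protected. */
static bool pte_flags_valid(unsigned int flags)
{
    return !((flags & PTE_WRITE) && (flags & PTE_UFFD_WP));
}
```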

v2: slub: limit number of slabs to scan in count_partial()

This patch fixes a known issue in get_slabinfo() which relies on count_partial() to get the exact count of free objects in a kmem_cache_node’s partial list. For some slub caches, their per-node partial list can be extremely long. The current version of count_partial() traverses the partial list to get the exact count of objects while holding the kmem_cache_node’s spinlock.

v1: fs/proc/task_mmu: convert hugetlb functions to work on folios

Let’s convert two more functions, getting rid of two more page_mapcount() calls.

v9: Reduce tlb and interrupt numbers over 90% by improving folio migration

While I’m working with a tiered memory system e.g. CXL memory, I have been facing migration overhead esp. tlb shootdown on promotion or demotion between different tiers.

v1: selftest mm/mseal: style change

This patch is a follow-up to the comments [1] on the test code during the mseal discussion. It is a style-only change to the selftest code; the test logic is unchanged.

v10: mm: report per-page metadata information

Adds a global Memmap field to /proc/meminfo. This information can be used by users to see how much memory is being used by per-page metadata, which can vary depending on build configuration, machine architecture, and system use.

[PATCH 0/5 RESEND] mm: code and data partitioning improvements

Managing allocations to ensure code and data pages are not interleaved is not possible prior to this patch, as ASLR requires programming a dynamic _text offset while the vmalloc infrastructure maintains static VMALLOC_START and VMALLOC_END constants.

v1: cgroup/rstat: global cgroup_rstat_lock changes

This patchset is focused on the global cgroup_rstat_lock.
Patch-1: Adds tracepoints to improve measuring lock behavior.
Patch-2: Converts the global lock into a mutex.
Patch-3: Limits userspace triggered pressure on the lock.

v2: selftests: exec: make binaries position independent

-static overrides -pie, so the binaries are no longer position independent. Use -static-pie instead, which produces a static, position-independent binary.

v2: memblock: add no-map alloc functions

Like reserved-memory with the ‘no-map’ property and only a ‘size’ property (w/o ‘reg’ property), some memory regions need to be allocated in memblock.memory and marked with the MEMBLOCK_NOMAP flag, but should not be allocated in memblock.reserved.

v1: mm/sparse: guard the size of mem_section is power of 2

We usually have this check, but commit 2a3cb8baef71 (“mm/sparse: delete old sparse_init and enable new one”) failed to carry it over.
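A likely reason for such a check (our reading, not stated in the summary above): SECTIONS_PER_ROOT is computed as PAGE_SIZE / sizeof(struct mem_section), and SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1) only works as a mask if that value is a power of 2. A compile-time guard can be sketched with C11 _Static_assert; the struct is a hypothetical stand-in, and kernel code typically uses BUILD_BUG_ON() for this kind of check.

```c
#include <assert.h>

/* Hypothetical stand-in; the real layout depends on kernel configuration. */
struct mem_section {
    unsigned long section_mem_map;
    unsigned long *usage;
};

#define IS_POWER_OF_2(n) ((n) != 0 && (((n) & ((n) - 1)) == 0))

/* Fails the build, rather than misbehaving at runtime, if the size drifts. */
_Static_assert(IS_POWER_OF_2(sizeof(struct mem_section)),
               "sizeof(struct mem_section) must be a power of 2");
```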

File Systems

v15: Landlock: IOCTL support

Make ioctl(2) requests for device files restrictable with Landlock, in a way that is useful for real-world applications.

v2: sysctl: treewide: constify ctl_table_header::ctl_table_arg

To be able to constify instances of struct ctl_tables it is necessary to remove ways through which non-const versions are exposed from the sysctl core. One of these is the ctl_table_arg member of struct ctl_table_header.

v2: JFS folio conversion

This patchset removes uses of struct page from the I/O paths of JFS. write_begin and write_end are still passed a struct page, but they convert to a folio as their first thing. The logmgr still uses a struct page, but I think that’s one we actually don’t want to convert since it’s never inserted into the page cache.

v1: ntfs3: Convert (most of) ntfs3 to use folios

I’m not making any attempt here to support large folios. This is just to remove uses of the page-based APIs. There are still a number of places in ntfs3 which use a struct page, but this is a good start on the conversions.

v1: Convert UDF to folios

I’m not making any attempt here to support large folios. This is just to remove uses of the page-based APIs. Most of these places are for inline data or symlinks, so it wouldn’t be appropriate to use large folios (unless we want to support bs>PS, which seems to be permitted by UDF, although not widely supported).

v1: Convert ext4’s mballoc to use folios

These pages are stored in the page cache, so they’re really folios. Convert the whole file from pages to folios.

v2: dm: restore synchronous close of device mapper block device

‘dmsetup remove’ and ‘dmsetup remove_all’ require synchronous bdev release. Otherwise dm_lock_for_deletion() may return -EBUSY if the open count is > 0, because the open count is dropped in dm_blk_close() which occurs after fput() completes.

v2: xarray: inline xas_descend to improve performance

Commit 63b1898fffcd (“XArray: Disallow sibling entries of nodes”) increased the size of xas_descend() enough that the compiler no longer inlines it. This hurt performance: xas_descend() is called frequently to walk down the XArray tree, making it a hot function.

v3: Improve buffer head documentation

Turn buffer head documentation into its own document, and make many general improvements to the docs. Obviously there is much more that could be done. Tested with make htmldocs.

v1: dm: core: put device mapper block device synchronously

‘dmsetup remove_all’ actually depends on sync bdev release since dm_lock_for_deletion() may return -EBUSY if the open count is > 0, and the open count is dropped in dm_blk_close().

v2: fuse: Add initial support for fs-verity

This adds support for the FS_IOC_ENABLE_VERITY and FS_IOC_MEASURE_VERITY ioctls. The FS_IOC_READ_VERITY_METADATA is missing but from the documentation, “This is a fairly specialized use case, and most fs-verity users won’t need this ioctl.”

v1: fstests: add mmap page boundary tests

mmap() POSIX compliance says we should zero-fill data beyond the file size up to the page boundary, and raise SIGBUS if we go beyond that. While fsx sometimes helps us test zero-fill, and fsstress also sometimes lets us test for SIGBUS, that is based on a random value and we are not likely to always test it. Dedicate a specific test to this situation, which also makes it easy to expand to other corner cases.
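The zero-fill half of that requirement can be checked with a short POSIX C program. This is an independent sketch, not the fstests code: it writes a few bytes to a temporary file, maps one page, and verifies that everything from EOF to the end of the page reads as zero. It deliberately does not touch the next page, which would raise SIGBUS.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if the bytes between EOF and the end of the first page read
 * as zero, -1 on any failure. Only page 0 is mapped, so no SIGBUS. */
static int check_zero_fill_past_eof(void)
{
    static const char data[] = "hello, mmap"; /* 12 bytes including NUL */
    char path[] = "/tmp/mmap-eof-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file lives on until fd is closed */

    size_t size = sizeof(data); /* EOF falls inside page 0 */
    if (write(fd, data, size) != (ssize_t)size) {
        close(fd);
        return -1;
    }

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    int ret = (p[0] == 'h') ? 0 : -1; /* file data is visible */
    for (size_t i = size; ret == 0 && i < page; i++)
        if (p[i] != 0)
            ret = -1; /* tail of the page must be zero-filled */
    munmap(p, page);
    return ret;
}
```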

v1: module: ban ‘.’, ‘..’ as module names, ban ‘/’ in module names

Ban ‘.’ and ‘..’ as module names, as well as any name containing ‘/’, since module names show up in sysfs as directory names:
/sys/module/${mod.name}
sysfs tries to mangle the name by turning ‘/’ into ‘!’, which kind of works, but not really.
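The rule reduces to a simple name check. The helper below is hypothetical (it is not the function used in the patch) and only encodes what is described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* A module name becomes a directory under /sys/module/, so "." and ".."
 * and any name containing '/' cannot be represented and are rejected. */
static bool module_name_ok(const char *name)
{
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return false;
    return strchr(name, '/') == NULL;
}
```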

v1: blk: optimization for classic polling

This removes the dependency on interrupts to wake up the task: while polling for IO completion, set the task state to TASK_RUNNING if need_resched() returns true. Previously, the polling task slept and relied on an interrupt to wake it up, which made some IOs take very long when interrupt coalescing is enabled in NVMe.

Network Devices

v1: net: phy: mediatek-ge-soc: follow netdev LED trigger semantics

Only blink if the link is up on an LED which is programmed to also indicate link status.
Otherwise, if both LEDs are used to indicate different speeds, the resulting inverted blinking on LEDs that are not switched on at a specific speed is quite counter-intuitive.

v3: net-next: net: sparx5: add support for port mirroring

This series adds support for port mirroring, and port mirroring stats, through tc matchall action FLOW_ACTION_MIRRED.

v2: net-next: net: phy: adin: add support for setting led-, link-status-pin polarity

ADIN1300/1200 support software control over pin polarity for both LED_0 and LINK_ST pins.
LED_0 polarity can be supported in led framework with led_polarity_set callback in a future patch-set. LINK_ST is a fixed-function output not suitable for led framework.

[net-next PATCH v3] octeontx2-pf: Add support for offload tc with skbedit mark action

Support offloading of skbedit mark action.
For example, to mark with 0x0008, with dest ip 60.60.60.2 on eth2 interface:
# tc qdisc add dev eth2 ingress
# tc filter add dev eth2 ingress protocol ip flower dst_ip 60.60.60.2 action skbedit mark 0x0008 skip_sw

v2: net: icmp: prevent possible NULL dereferences from icmp_build_probe()

The first problem is a double call to __in_dev_get_rcu(): the second call could return NULL.
if (__in_dev_get_rcu(dev) && __in_dev_get_rcu(dev)->ifa_list)
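The usual fix for this pattern is to fetch the pointer once and test the local copy. The sketch below uses simplified stand-in structures and a plain accessor (in_dev_get(), hypothetical) in place of __in_dev_get_rcu(); the real code runs under rcu_read_lock() and dereferences with RCU helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct in_ifaddr { int dummy; };
struct in_device { struct in_ifaddr *ifa_list; };
struct net_device { struct in_device *ip_ptr; };

/* Stand-in accessor: in the kernel the result may become NULL at any time. */
static struct in_device *in_dev_get(const struct net_device *dev)
{
    return dev->ip_ptr;
}

/* Read the pointer once, then test and use that same value. */
static struct in_ifaddr *first_ifaddr(const struct net_device *dev)
{
    struct in_device *in_dev = in_dev_get(dev);

    if (in_dev && in_dev->ifa_list)
        return in_dev->ifa_list;
    return NULL;
}
```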

v5: net-next: selftests: drv-net: support testing with a remote system

Implement support for tests which require access to a remote system / endpoint which can generate traffic. This series concludes the “groundwork” for upstream driver tests.

v1: net-next: netdev: support dumping a single netdev in qstats

I was writing a test for page pool which depended on qstats, and got tired of having to filter dumps in user space. Add support for dumping stats for a single netdev.

v1: net: tools: ynl: don’t ignore errors in NLMSG_DONE messages

NLMSG_DONE contains an error code, which has to be extracted. Prior to this change, all dumps ended in success, and on failure the result was silently truncated.
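YNL's own library is not reproduced here; as an illustration of the same idea, the C sketch below reads the error code from the payload of an NLMSG_DONE message using the uapi <linux/netlink.h> macros. Treating a missing payload as success is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Return the error code carried by a dump's NLMSG_DONE message:
 * 0 on success or a negative errno. Returns 1 if nlh is not NLMSG_DONE. */
static int nlmsg_done_error(const struct nlmsghdr *nlh)
{
    int err;

    if (nlh->nlmsg_type != NLMSG_DONE)
        return 1;
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(err)))
        return 0; /* assumption: no payload means success */
    memcpy(&err, NLMSG_DATA(nlh), sizeof(err)); /* copy, don't cast-deref */
    return err;
}
```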

v1: net-next: af_unix: Don’t access successor in unix_del_edges() during GC.

syzbot reported use-after-free in unix_del_edges(). [0]
What the repro does is basically repeat the following quickly.

v1: net: skb: Increasing allocation in __napi_alloc_skb() to 2k when needed.

When testing CONFIG_MAX_SKB_FRAGS=45 on ppc64le and x86_64 I ran into a couple of issues.

v2: net-next: netdevsim: add NAPI support

Add NAPI support to netdevsim and register its Rx queues with NAPI instances. Then add a selftest using the new netdev Python selftest infra to exercise the existing Netdev Netlink API, specifically the queue-get API.

v2: net-next: net: A lightweight zero-copy notification

Original title is “net: socket sendmsg MSG_ZEROCOPY_UARG” https://lore.kernel.org/all/
The original notification mechanism needs poll + recvmmsg, which is not easy for applications to accommodate. It also incurs non-negligible overhead, including extra system calls and optmem usage.

v3: net: b44: set pause params only when interface is up

If you try to change the pause params while the network interface is disabled/administratively down (which netifd likely tries to do), everything explodes.

v2: net: rxrpc: Clients must accept conn from any address

The find connection logic of Transarc’s Rx was modified in the mid-1990s to support multi-homed servers which might send a response packet from an address other than the destination address in the received packet.

v1: ibmvnic: Use -EBUSY in __ibmvnic_reset()

Date: Fri, 19 Apr 2024 15:46:17 +0200
Add a minus sign before the error code “EBUSY” so that a negative value will be used as in other cases.

v1: net-next: net: microchip: Correct spelling in comments

Correct spelling in comments in Microchip drivers. Flagged by codespell.

[PATCH net-next RFC] net: dsa: mv88e6xxx: Correct check for empty list

Since commit a3c53be55c95 (“net: dsa: mv88e6xxx: Support multiple MDIO busses”) mv88e6xxx_default_mdio_bus() has checked that the return value of list_first_entry() is non-NULL. This appears to be intended to guard against the list chip->mdios being empty.
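The catch is that list_first_entry() never returns NULL: on an empty list it computes an "entry" from the list head itself. The kernel offers list_empty() and list_first_entry_or_null() for this case. A minimal userspace re-creation of the relevant macros (simplified, for illustration only) shows the behaviour:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified copies of the kernel's circular list helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
    container_of((head)->next, type, member)

static inline int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Hypothetical entry type; 'list' is placed first here so that the
 * empty-list demonstration involves no out-of-object pointer arithmetic. */
struct mdio_bus { struct list_head list; int id; };
```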

v2: io_uring-next/net-next: implement io_uring notification (ubuf_info) stacking

Please don't take this directly; it conflicts with io_uring.
To have per request buffer notifications each zerocopy io_uring send request allocates a new ubuf_info. However, as an skb can carry only one uarg, it may force the stack to create many small skbs hurting performance in many ways.

v1: net-next: MT7530 DSA Subdriver Improvements Act IV

This is the fourth patch series with the goal of simplifying the MT7530 DSA subdriver and improving support for MT7530, MT7531, and the switch on the MT7988 SoC.

[Intel-wired-lan] v1: iwl-next: ice: refactor struct ice_vsi_cfg_params to be inside of struct ice_vsi

Refactor struct ice_vsi_cfg_params to be embedded into struct ice_vsi. Previously, the members of the struct were scattered across ice_vsi and were copy-pasted for reinit purposes.

[Intel-wired-lan] v10: net-next: ice: Support 5 layer Tx scheduler topology

For performance reasons, a selectable Tx scheduler topology needs to be supported. Currently, the firmware supports only the default 9-layer topology and a 5-layer topology. This patch series enables switching from the default to the 5-layer topology if the user opts in.

v1: net-next: mlx5e per-queue coalescing

This patchset adds ethtool per-queue coalescing support for the mlx5e driver.

v1: net: bridge/br_netlink.c: no need to return void function

br_info_notify() is a void function, so there is no need to return its result.

v1: net-next: tcp: do not export tcp_twsk_purge()

After commit 1eeb50435739 (“tcp/dccp: do not care about families in inet_twsk_purge()”) tcp_twsk_purge() is no longer potentially called from a module.

v1: net: af_unix: Read with MSG_PEEK loops if the first unread byte is OOB

Read with MSG_PEEK flag loops if the first byte to read is an OOB byte. commit 22dd70eb2c3d (“af_unix: Don’t peek OOB data without MSG_OOB.”) addresses the loop issue but does not address the issue that no data beyond OOB byte can be read.

v1: net-next: Resolve security issue in MACsec offload Rx datapath

Some device drivers support devices that enable them to annotate whether an Rx skb refers to a packet that was processed by the MACsec offloading functionality of the device. Currently, the Rx handling logic for MACsec offload does not use this information to preemptively avoid forwarding to the macsec netdev.

v1: net-next: tcp: avoid sending too small packets

tcp_sendmsg() cooks ‘large’ skbs, that are later split if needed from tcp_write_xmit().

v1: net-next: gve: Implement netdev queue api

Following the discussion on https://patchwork.kernel.org/project/linux-media/patch/20240305020153.2787423-2-almasrymina@google.com/, the queue api defined by Mina is implemented for gve.

[net-next PATCH v2] octeontx2-pf: Add ucast filter count configurability via devlink.

Added a devlink param to set/modify unicast filter count. Currently it’s hardcoded with a macro.

GIT PULL: Networking for v6.9-rc5

The following changes since commit 2ae9a8972ce04046957f8af214509cebfd3bfb9c:
Merge tag ‘net-6.9-rc4’ of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2024-04-11 11:46:31 -0700)

v3: net: udp: preserve the connected status if only UDP cmsg

If “udp_cmsg_send()” returned 0 (i.e. only UDP cmsg), “connected” should not be set to 0. Otherwise it stops the connected socket from using the cached route.

v1: net-next: net: ethernet: mtk_eth_soc: flower: validate control flags

This driver currently doesn’t support any control flags.
Use flow_rule_has_control_flags() to check for control flags, such as can be set through tc flower ... ip_flags frag.

v4: net-next: selftests: virtio_net: introduce initial testing infrastructure

This patchset aims at introducing very basic initial infrastructure for virtio_net testing, namely it focuses on virtio feature testing.

Security Enhancements

v1: string: Merge separate tests into string_kunit.c

We have a few lone function tests (strscpy and strcat) that are better collected into the common string_kunit.c test suite. Perform various renamings, merge everything, and clean up after them.

v2: string_kunit: Add test cases for str*cmp functions

Currently, str*cmp functions (strcmp, strncmp, strcasecmp and strncasecmp) are not covered with tests. Extend the string_kunit.c test by adding the test cases for them.
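KUnit cases cannot run outside the kernel, so the sketch below uses plain assert() against the C library's implementations to show the kind of cases such a test covers; the function names are hypothetical. Only the sign of the result is specified, so the checks compare signs.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

static int sign(int x) { return (x > 0) - (x < 0); }

static void test_str_cmp_family(void)
{
    assert(sign(strcmp("abc", "abc")) == 0);
    assert(sign(strcmp("abc", "abd")) < 0);
    assert(sign(strcmp("abd", "abc")) > 0);
    assert(sign(strcmp("ab", "abc")) < 0);         /* a prefix sorts first */

    assert(sign(strncmp("abcX", "abcY", 3)) == 0); /* stops after n bytes */
    assert(sign(strncmp("abcX", "abcY", 4)) < 0);
    assert(sign(strncmp("a", "b", 0)) == 0);       /* n == 0 compares nothing */

    assert(sign(strcasecmp("Hello", "hELLO")) == 0);
    assert(sign(strcasecmp("apple", "Banana")) < 0);
    assert(sign(strncasecmp("ABCdef", "abcXYZ", 3)) == 0);
    assert(sign(strncasecmp("ABCdef", "abcXYZ", 4)) < 0);
}
```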

v1: ubsan: Add awareness of signed integer overflow traps

On arm64, UBSAN traps can be decoded from the trap instruction. Add the add, sub, and mul overflow trap codes now that CONFIG_UBSAN_SIGNED_WRAP exists. Seen under clang 19:
Internal error: UBSAN: unrecognized failure code: 00000000f2005515 [#1] PREEMPT SMP
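For reference, GCC and Clang expose the same add/sub/mul overflow conditions explicitly through __builtin_add_overflow() and friends, which the kernel wraps as check_add_overflow() and similar in <linux/overflow.h>. The sketch below only detects overflow; it does not model UBSAN's trap encoding.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Each helper reports whether the signed operation would overflow;
 * the wrapped result is stored in r and discarded. */
static bool add_overflows(int a, int b) { int r; return __builtin_add_overflow(a, b, &r); }
static bool sub_overflows(int a, int b) { int r; return __builtin_sub_overflow(a, b, &r); }
static bool mul_overflows(int a, int b) { int r; return __builtin_mul_overflow(a, b, &r); }
```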

v10: Introduce mseal

This is version 10; it rebases the v9 patch onto 6.9-rc3. We have also applied and tested mseal() in Chrome and on Chromebooks.

Asynchronous IO

v2: io_uring: releasing CPU resources when polling

This patch is intended to release the CPU resources held by io_uring in polling mode. When IO is issued, the program immediately starts polling for completion, which wastes CPU resources while the IO commands are executing on the disk.

Rust For Linux

v1: rust: add ‘firmware’ tag support to module! macro

This patch is necessary for a new QT2025 PHY driver in Rust:
https://lwn.net/Articles/969888/

v1: kbuild: rust: split up helpers.c

This patch splits up the Rust helpers C file. When rebasing patch sets on upstream Linux, merge conflicts in helpers.c are common and time-consuming [1]. Thus, split the file so that each kernel component can live in a separate file.

v1: net-next: net: phy: add Applied Micro QT2025 PHY driver

This patchset adds a PHY driver for the Applied Micro Circuits Corporation QT2025. Patches 1-3 simply add more support functions to the PHYLIB Rust bindings, which are needed by the driver (the fourth patch).

v1: Add room for insn-encoding and symbol name

The longest length of a symbol (KSYM_NAME_LEN) was increased to 512 in the reference [1]. This patch adds room for insn-encoding and a symbol name, as proposed in [2]

BPF

v1: selftests/bpf: Add ring_buffer__consume_n test.

Add a testcase for the ring_buffer__consume_n() API.
The test produces multiple samples in a ring buffer, using a sys_getpid() fentry prog, and consumes them from user-space in batches, rather than consuming all of them greedily, like ring_buffer__consume() does.

v1: bpf-next: use network helpers, part 2

This patchset uses more network helpers in test_sock_addr.c, but first of all, patch 2 is needed to make network_helpers.c independent of test_progs.c. Then network_helpers.h can be included into test_sock_addr.c without compile errors.

v1: bpf-next: Add notrace to queued_spin_lock_slowpath function to avoid deadlocks

This patch is to prevent deadlocks when multiple bpf programs are attached to queued_spin_locks functions. This issue is similar to what is already discussed [1] before with the spin_lock helpers.

v2: bpf-next: Introduce bpf_wq

This is a followup of sleepable bpf_timer[0].
When discussing sleepable bpf_timer, it was thought that we should give a try to bpf_wq, as the 2 APIs are similar but distinct enough to justify a new one.

v3: bpf-next: bpf: btf: include linux/types.h for u32

Inclusion of the header linux/btf_ids.h relies on indirect inclusion of the header linux/types.h. Including it directly on the top level helps to avoid potential problems if linux/types.h hasn’t been included before.

v1: bpf: arm32, bpf: reimplement sign-extension mov instruction

The steps for fixing this and the instructions emitted by the JIT are explained below with examples in all combinations

v3: 5.15.y: Backport bounds checks for bpf

These backports fix CVE-2021-4204, CVE-2022-23222 for 5.15.y.
This includes a conflict resolution with 45ce4b4f9009 (“bpf: Fix crash due to out of bounds access into reg2btf_ids.”) which was cherry-picked previously.

v1: bpf: xdp: use flags field to disambiguate broadcast redirect

When redirecting a packet using XDP, the bpf_redirect_map() helper will set up the redirect destination information in struct bpf_redirect_info (using the __bpf_xdp_redirect_map() helper function), and the xdp_do_redirect() function will read this information after the XDP program returns and pass the frame on to the right redirect destination.

v4: bpf-next: Replace mono_delivery_time with tstamp_type

Patch 1: This patch only renames the mono delivery timestamp to tstamp_type, with no change in functionality of the existing kernel code. It also starts assigning tstamp_type as either mono or real and introduces a new enum in skbuff.h, again with no functional change to the existing code, just making the code scalable.

v1: bpf-next: bpf: add sacked flag in BPF_SOCK_OPS_RETRANS_CB

Add TCP_SKB_CB(skb)->sacked as the 4th arg of sockops passed to bpf program. Then we can get the retransmission efficiency by counting skbs w/ and w/o TCPCB_EVER_RETRANS mark. And for this purpose, sacked updating is moved after the BPF_SOCK_OPS_RETRANS_CB hook.

v2: bpf-next: bpf/verifier: range computation improvements

This is the second version of this series, reworked after Yonghong's review. Thank you for the review.

v9: net-next: virtio_net: Support RX hash XDP hint

The RSS hash report is a feature that’s part of the virtio specification. Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost (still a work in progress as per [1]) support this feature. While the capability to obtain the RSS hash has been enabled in the normal path, it’s currently missing in the XDP path. Therefore, we are introducing XDP hints through kfuncs to allow XDP programs to access the RSS hash.

v9: bpf-next: BPF crypto API framework

This series introduces crypto kfuncs so that BPF programs can use the kernel crypto subsystem. Crypto operations are made pluggable to avoid extensive growth of the kernel when they are not needed. Only skcipher is added in this series, but it can easily be extended to other types of operations. Hardware offload is not supported, as it needs a sleepable context, which is not available for TX or XDP programs. At the same time, the crypto context initialization kfunc can only run in a sleepable context, which is why it has to be run separately, storing its result in a map.

v1: dwarves: pahole: support nonstandard btf_features

This small series allows the user to specify --btf_features=all along with non-standard features such as --btf_features=reproducible_build. Features are documented as either standard, and so included in “all”, or non-standard, such as “reproducible_build”.

v3: bpf-next: bpf: Harden and/or/xor value tracking

This patch addresses a latent unsoundness issue in the scalar(32)_min_max_and/or/xor functions. While it is not a bugfix, it ensures that the functions produce sound outputs for all inputs.
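What "sound" means here can be illustrated for AND on unsigned ranges. This is a simplified sketch using only min/max bounds (the verifier additionally tracks known bits via tnums, which tightens the lower bound); the struct and function names are hypothetical, and a brute-force loop checks that every concrete result falls inside the computed range.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified unsigned range, standing in for the verifier's umin/umax. */
struct urange { uint64_t umin, umax; };

/* Sound (if imprecise) bounds for x & y: clearing bits can only make a
 * value smaller, so the result never exceeds either operand's maximum,
 * and 0 is always a safe lower bound. */
static struct urange range_and(struct urange a, struct urange b)
{
    struct urange r = { 0, a.umax < b.umax ? a.umax : b.umax };
    return r;
}

/* Exhaustively check soundness; only meant for small ranges. */
static int range_and_is_sound(struct urange a, struct urange b)
{
    struct urange r = range_and(a, b);
    for (uint64_t x = a.umin; x <= a.umax; x++)
        for (uint64_t y = b.umin; y <= b.umax; y++) {
            uint64_t v = x & y;
            if (v < r.umin || v > r.umax)
                return 0;
        }
    return 1;
}
```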

v1: neighbour: guarantee the localhost connections be established successfully even when the ARP table is full

Inter-process communication over localhost should be established successfully even when the ARP table is full. Many processes on a server use localhost to communicate, for example the command-line interface (CLI), and servers expect all CLI commands to succeed even when the ARP table is full. Currently, CLI commands time out when the ARP table is full. Set exempt_from_gc to true for the LOOPBACK net device so that localhost neighbour entries stay in the ARP table and are not removed by GC.

v4: kbuild: Avoid weak external linkage where possible

Weak external linkage is intended for cases where a symbol reference can remain unsatisfied in the final link. Taking the address of such a symbol should yield NULL if the reference was not satisfied.
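A minimal C illustration of that semantics, assuming an ELF toolchain (GCC or Clang on Linux); optional_feature_hook is a hypothetical symbol that nothing defines, so its address evaluates to NULL and the fallback path runs.

```c
#include <assert.h>
#include <stddef.h>

/* Weak reference to a symbol that no object file defines; the link still
 * succeeds and the symbol's address is NULL. */
extern int optional_feature_hook(void) __attribute__((weak));

static int call_hook_if_present(void)
{
    if (optional_feature_hook)   /* NULL because the reference is unsatisfied */
        return optional_feature_hook();
    return -1;                   /* fallback path */
}
```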

v4: security: digest_cache LSM

Integrity detection and protection has long been a desirable feature, to reach a large user base and mitigate the risk of flaws in the software and attacks.

v2: net-next: First try to replace page_frag with page_frag_cache

This patchset tries to unify the page frag implementation by first replacing page_frag with page_frag_cache for sk_page_frag(). net_high_order_alloc_disable_key, used by the implementation in net/core/sock.c, no longer seems to matter much now that we have pcp support for high-order pages since commit 44042b449872 (“mm/page_alloc: allow high-order pages to be stored on the per-cpu lists”).

Related Technology News

Qemu

v1: target/riscv: Add support for Smdbltrp and Ssdbltrp extensions

A double trap typically arises during a sensitive phase of trap handling: an exception or interrupt occurs while the trap handler (the component responsible for managing these events) is in a non-reentrant state.

v3: riscv: thead: Add th.sxstatus CSR emulation

The th.sxstatus CSR can be used to identify the custom extensions available on T-Head CPUs. The CSR is documented here: https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadsxstatus.adoc

v3: for-9.1: target/riscv: set tval in breakpoints