TinyLab (泰晓科技) -- Focused on Linux: trace things to their source, see the whole from the details!
Website: https://tinylab.org


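RISC-V Linux Kernel and Related Technology News, Issue 86

Written by 呀呀呀 on 2024/04/08

Date: 20240407
Editor: 晓怡
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support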

v8: riscv: add initial support for Canaan Kendryte K230

K230 is an ideal chip for RISC-V Vector 1.0 evaluation now. Add initial support for it to allow more people to participate in building drivers to mainline for it.

v2: riscv: improve nommu and timer-clint

As is known, the sophgo CV1800B contains a so-called little core, which is a C906 w/o MMU, so I want to run nommu linux on it. This series is the result of the bring-up. After this series, w/ proper dts, we can run nommu linux on milkv duo’s little core.

**[v2: clocksouce/timer-clint|riscv: some improvements](http://lore.kernel.org/linux-riscv/20240406111757.1597-1-jszhang@kernel.org/)**

This series is a simple improvement for timer-clint and timer-riscv:

Add set_state_shutdown for timer-clint. This hook is used when switching clockevent from timer-clint to another timer.

v6: riscv: pwm: sophgo: add pwm support for CV1800

The Sophgo CV1800 chip provides a set of four independent PWM channel outputs. This series adds PWM controller support for Sophgo cv1800.

v1: ftrace: riscv: move from REGS to ARGS

This commit replaces riscv’s support for FTRACE_WITH_REGS with support for FTRACE_WITH_ARGS. This is required for the ongoing effort to stop relying on stop_machine() for RISCV’s implementation of ftrace.

v1: bpf-next: riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs

Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use it directly. It will only be used for internal inlining optimizations for now between BPF verifier and BPF JITs.

v3: rust: make mutually exclusive with CFI_CLANG

On RISC-V and arm64, and presumably x86, if CFI_CLANG is enabled, loading a rust module will trigger a kernel panic. Support for sanitisers, including kcfi (CFI_CLANG), is in the works, but for now they’re nightly-only options in rustc. Make RUST depend on !CFI_CLANG to prevent configuring a kernel without symmetrical support for kcfi.

v1: Add parsing for Zimop ISA extension

The Zimop ISA extension was ratified recently. This series adds support for parsing it from riscv,isa, hwprobe export and kvm support for Guest/VM.

v1: riscv: selftests: Add signal handling vector tests

Add two tests to check vector save/restore when a signal is received during a vector routine. One test ensures that a value is not clobbered during signal handling. The other verifies that vector registers modified in the signal handler are properly reflected when the signal handling is complete.

v3: riscv control-flow integrity for usermode

Sending out v3 for cpu assisted riscv user mode control flow integrity.

v2: Add IAX45 support for RZ/Five SoC

The IAX45 block on RZ/Five SoC is almost identical to the IRQC block found on the RZ/G2L family of SoCs.

v6: Add StarFive JH8100 dwmac support

Add StarFive JH8100 dwmac support. The JH8100 dwmac shares the same driver code as the JH7110 dwmac and has only one reset signal.

v5: RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest

This series implements SBI PMU improvements done in SBI v2.0[1] i.e. PMU snapshot and fw_read_hi() functions.

v3: bpf-next: Add 12-argument support for RV64 bpf trampoline

This patch adds 12 function arguments support for riscv64 bpf trampoline. The current bpf trampoline supports <= sizeof(u64) bytes scalar arguments [0] and <= 16 bytes struct arguments [1]. Therefore, we focus on the situation where scalars are at most XLEN bits and aggregates whose total size does not exceed 2×XLEN bits in the riscv calling convention [2].
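
The register/stack split behind this limit can be pictured with a small model. This is a minimal sketch in Python, not the trampoline code from the series: `place_args`, `XLEN_BYTES` and `NUM_ARG_REGS` are hypothetical names, and it assumes the RV64 integer calling convention (eight argument registers a0 to a7, further arguments on the stack, an aggregate larger than XLEN occupying two slots).

```python
XLEN_BYTES = 8      # RV64: one register holds 8 bytes
NUM_ARG_REGS = 8    # a0..a7 carry integer arguments


def place_args(sizes):
    """Return, for each argument size in bytes, the slots it occupies.

    Assumption: each argument is a scalar of at most XLEN bits or an
    aggregate of at most 2*XLEN bits, as in the summary above.
    """
    placements = []
    next_reg = 0
    stack_off = 0
    for size in sizes:
        if not 0 < size <= 2 * XLEN_BYTES:
            raise ValueError("argument size outside the modelled range")
        slots = 1 if size <= XLEN_BYTES else 2
        locs = []
        for _ in range(slots):
            if next_reg < NUM_ARG_REGS:
                locs.append(f"a{next_reg}")
                next_reg += 1
            else:
                # Registers are exhausted: spill to the caller's stack area.
                locs.append(f"sp+{stack_off}")
                stack_off += XLEN_BYTES
        placements.append(locs)
    return placements


twelve_scalars = place_args([8] * 12)
split_struct = place_args([8] * 7 + [16])
```

In this model, twelve 8-byte scalars fill a0 to a7 and spill the last four to the stack, while a 16-byte struct passed as the eighth argument is split between a7 and the first stack slot.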

v1: riscv: defconfig: Enable StarFive JH7110 drivers

Add support for StarFive JH7110 SoC and VisionFive 2 board.

  • Clock & reset
  • Cache
  • Temperature sensor
  • PMIC (AXP15060)
  • Restart GPIO
  • RNG
  • I2C
  • SPI
  • Quad SPI
  • USB & USB 2.0 PHY & PCIe 2.0/USB 3.0 PHY
  • Audio (I2S / TDM / PWM-DAC)
  • Camera Subsystem & MIPI-CSI2 RX & D-PHY RX

v1: mm/gup: consistently call it GUP-fast

Some cleanups around function names, comments and the config option of “GUP-fast” – GUP without “lock” safety belts on.

v1: KVM: riscv: selftests: Add SBI base extension test

This is the first patch to enable the base extension selftest for the SBI implementation in KVM. Tests for other extensions will be added later.

Process Scheduling

v1: net-next: net: sched: cake: Optimize number of calls to cake_heapify()

Improve the max-heap construction process by reducing unnecessary heapify operations. Specifically, adjust the starting condition from n / 2 to n / 2 - 1 in the loop that iterates over all non-leaf elements.
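
Why n / 2 - 1 is enough can be checked with a toy heap. The following is a minimal sketch in Python, not the cake qdisc code: `sift_down`, `build_max_heap` and `is_max_heap` are hypothetical helpers, and the heap is assumed to be a 0-indexed array, where index n / 2 is already a leaf whenever n is even.

```python
def sift_down(heap, i, n):
    """Restore the max-heap property for the subtree rooted at index i."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def build_max_heap(values, start):
    """Heapify a copy of `values`, sifting down from index `start` to 0.

    Returns the heap and the number of sift_down() calls made.
    """
    heap = list(values)
    calls = 0
    for i in range(start, -1, -1):
        sift_down(heap, i, len(heap))
        calls += 1
    return heap, calls


def is_max_heap(heap):
    """Check that every element is no larger than its parent."""
    return all(heap[(i - 1) // 2] >= heap[i] for i in range(1, len(heap)))


data = [3, 9, 2, 7, 1, 8, 5, 6]
n = len(data)
heap_old, calls_old = build_max_heap(data, n // 2)      # first call hits a leaf
heap_new, calls_new = build_max_heap(data, n // 2 - 1)  # last non-leaf node
```

Both starting points yield the same heap here; the lower one simply skips a sift-down on a node that has no children.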

v1: sched/fair: Complete EEVDF

I’m slowly crawling back out of the hole and trying to get back to work. Availability is still low on my end, but I’ll try and respond to some email.

**[v1: sched/fair|isolation: Correctly clear nohz.[nr_cpus|idle_cpus_mask] for isolated CPUs](http://lore.kernel.org/lkml/20240403150543.2793354-1-pierre.gondois@arm.com/)**

Zhang Rui reported that find_new_ilb() was iterating over CPUs in isolated cgroup partitions. This triggered spurious wakeups for these CPUs.

v1: sched/pi: Reweight fair_policy() tasks when inheriting prio

For fair tasks, inheriting the priority (nice) without reweighting is a NOP as the task’s share won’t change.

Memory Management

v1: mm,swap: add document about RCU read lock and swapoff interaction

While reviewing a patch to fix the race condition between free_swap_and_cache() and swapoff() [1], it was found that the documentation about how to prevent racing with swapoff isn’t clear enough. In particular, the RCU read lock can prevent swapoff from freeing data structures. So, the documentation is added as comments.

v1: mm/mmap: make accountable_mapping return bool

accountable_mapping() can return bool, so change its return type accordingly.

v1: mm/userfaultfd: Allow hugetlb change protection upon poison entry

After UFFDIO_POISON, there can be two kinds of hugetlb pte markers, either the POISON one or UFFD_WP one.

v1: mm: Convert pagecache_isize_extended to use a folio

Remove four hidden calls to compound_head(). Also exit early if the filesystem block size is >= PAGE_SIZE instead of just equal to PAGE_SIZE.

v1: Documentation/admin-guide/sysctl/vm.rst adding the importance of NUMA-node count to documentation

If any bits are set in node_reclaim_mode (tunable via /proc/sys/vm/zone_reclaim_mode) within get_pages_from_freelist(), then page allocations start getting early access to reclaim via the node_reclaim() code path when memory pressure increases. This behavior provides the most optimization for multiple NUMA node machines.

v1: selftests: Replace “Bail out” with “Error” in ksft_exit_fail_msg()

“Bail out! ” is not descriptive. It should rather be “Failed: ”, and then this prefix doesn’t need to be added everywhere. Usually in the logs, we are searching for “Failed” or “Error” instead of “Bail out”, so it must be replaced.

v1: mm: set pageblock_order to HPAGE_PMD_ORDER in case with !CONFIG_HUGETLB_PAGE but THP enabled

As Vlastimil suggested in previous discussion[1], it doesn’t make sense to set pageblock_order as MAX_PAGE_ORDER when hugetlbfs is not enabled and THP is enabled. Instead, it should be set to HPAGE_PMD_ORDER.
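
The selection rule described here can be written out as a small decision function. This is a minimal sketch in Python, not the kernel's preprocessor logic: `pick_pageblock_order` is a hypothetical name, the three order values are assumed for x86-64 with 4 KiB pages, and configurations with a variable hugetlb page size are ignored.

```python
# Illustrative orders, assumed for x86-64 with 4 KiB base pages.
HUGETLB_PAGE_ORDER = 9   # 2 MiB hugetlb page
HPAGE_PMD_ORDER = 9      # 2 MiB PMD-sized THP
MAX_PAGE_ORDER = 10      # largest buddy allocator order


def pick_pageblock_order(hugetlb_page, thp):
    """Model of the pageblock_order choice after the proposed change."""
    if hugetlb_page:
        return HUGETLB_PAGE_ORDER
    if thp:
        # The case the patch changes: previously MAX_PAGE_ORDER was used.
        return HPAGE_PMD_ORDER
    return MAX_PAGE_ORDER


order_thp_only = pick_pageblock_order(hugetlb_page=False, thp=True)
order_neither = pick_pageblock_order(hugetlb_page=False, thp=False)
```

With these assumed values, a THP-only configuration gets a pageblock of order 9 rather than 10, so one pageblock matches one PMD-sized huge page.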

v4: mm: add per-order mTHP alloc and swpout counters

The patchset introduces a framework to facilitate mTHP counters, starting with the allocation and swap-out counters. Currently, only four new nodes are appended to the stats directory for each mTHP size.

v3: DAMON based tiered memory management for CXL memory

There was an RFC IDEA “DAMOS-based Tiered-Memory Management” previously posted at [1].

It says no implementation of the demote/promote DAMOS actions has been made. This RFC is about their implementation for the physical address space.

v11: Improved Memory Tier Creation for CPUless NUMA Nodes

When a memory device, such as CXL1.1 type3 memory, is emulated as normal memory (E820_TYPE_RAM), the memory device is indistinguishable from normal DRAM in terms of memory tiering with the current implementation. The current memory tiering assigns all detected normal memory nodes to the same DRAM tier.

v1: userfaultfd: change src_folio after ensuring it’s unpinned in UFFDIO_MOVE

Commit d7a08838ab74 (“mm: userfaultfd: fix unexpected change to src_folio when UFFDIO_MOVE fails”) moved the src_folio->{mapping, index} changing to after clearing the page-table and ensuring that it’s not pinned. This avoids failure of swapout+migration and possibly memory corruption.

v1: s390: page_mapcount(), page_has_private() and PG_arch_1

On my journey to remove page_mapcount(), I got hooked up on other folio cleanups that Willy most certainly will enjoy.

v1: selftests: add ksft_exit_fail_perror()

In this series, ksft_exit_fail_perror() is being added, which is a helper function on top of ksft_exit_fail_msg(). It always prints errno and its string form.

v9: net-next: net: intel: start The Great Code Dedup + Page Pool for iavf

Here’s a two-shot: introduce {,Intel} Ethernet common library (libeth and libie) and switch iavf to Page Pool. Details are in the commit messages; here’s a summary

v1: Make {virt, phys, page, pfn} translation work with KFENCE for LoongArch

On LoongArch, the kmalloc() range is DMW-mapped rather than TLB-mapped, so KFENCE remaps __kfence_pool to the TLB-mapped range.

v1: kasan: hw_tags: include linux/vmalloc.h

This header is no longer included implicitly and instead needs to be pulled in directly.

v4: Memory management patches needed by Rust Binder

This patchset contains some abstractions needed by the Rust implementation of the Binder driver for passing data between userspace, kernelspace, and directly into other processes.

v13: mm/gup: Introduce memfd_pin_folios() for pinning memfd folios

Currently, some drivers (e.g, Udmabuf) that want to longterm-pin the pages/folios associated with a memfd, do so by simply taking a reference on them. This is not desirable because the pages/folios may reside in Movable zone or CMA block.

v2: SLUB: improve filling cpu partial a bit in get_partial_node()

This series is to remove the unnecessary check for filling cpu partial and improve the readability.

v1: khugepaged folio conversions

We’ve been kind of hacking piecemeal at converting khugepaged to use folios instead of compound pages, and so this patchset is a little larger than it should be as I undo some of our wrong moves in the past. In particular, collapse_file() now consistently uses ‘new_folio’ for the freshly allocated folio and ‘folio’ for the one that’s currently in use.

v1: Use folio APIs in procfs

Not sure whether Andrew or Christian will want to take this set of fixes. We’re down to very few users of the PageFoo macros, with proc being a major user. After this patchset and another patchset I have for khugepaged, we can get rid of PageActive, PageReadahead and PageSwapBacked.

v1: mm: page_alloc: use the correct THP order for THP PCP

Commit 44042b449872 (“mm/page_alloc: allow high-order pages to be stored on the per-cpu lists”) extends the PCP allocator to store THP pages, and it determines whether to cache THP pages in PCP by comparing with pageblock_order. But the pageblock_order is not always equal to the THP order; it might also be MAX_PAGE_ORDER, which could prevent PCP from caching THP pages.

v6: Swap-out mTHP without splitting

This series adds support for swapping out multi-size THP (mTHP) without needing to first split the large folio via split_huge_page_to_list_to_order(). It closely follows the approach already used to swap-out PMD-sized THP.

v3: support multi-size THP numa balancing

This patchset tries to support mTHP NUMA balancing. As a simple solution to start with, the NUMA balancing algorithm for mTHP will follow the THP strategy as the basic support. Please find details in each patch.

v2: arch/mm/fault: accelerate pagefault when badaccess

After VMA lock-based page fault handling is enabled, if a bad access is met under the per-VMA lock, it will fall back to mmap_lock-based handling, which leads to an unnecessary mmap lock and VMA lookup again. A test from lmbench shows a 34% improvement after these changes on arm64.

v3: mm: add per-order mTHP alloc_success and alloc_fail counters

Profiling a system blindly with mTHP has become challenging due to the lack of visibility into its operations. Presenting the success rate of mTHP allocations appears to be a pressing need.

File Systems

v14: Landlock: IOCTL support

Make ioctl(2) requests for device files restrictable with Landlock, in a way that is useful for real-world applications.

v2: fs: Set file_handle::handle_bytes before referencing file_handle::f_handle

Since __counted_by(handle_bytes) was added to struct file_handle, we need to explicitly set it in the one place it wasn’t yet happening prior to accessing the flex array “f_handle”. For robustness also check for a negative value for handle_bytes, which is possible for an “int”, but nothing appears to set.

v1: cifs: Add tracing for the cifs_tcon struct refcounting

Add tracing for the refcounting/lifecycle of the cifs_tcon struct, marking

GIT PULL: bcachefs repair code for rc3

The following changes since commit b3c7fd35c03c17a950737fb56a06b730a7962d28:

bcachefs: On emergency shutdown, print out current journal sequence number (2024-04-01 01:07:24 -0400)

v1: More GFS2 folio conversions

Yet more gfs2 folio conversions. As usual, compile tested only. The third patch is a bit more “interesting” than most.

[RESEND]v3: security: Place security_path_post_mknod() where the original IMA call was

Commit 08abce60d63f (“security: Introduce path_post_mknod hook”) introduced security_path_post_mknod(), to replace the IMA-specific call to ima_post_path_mknod().

v1: exfat: move extend valid_size into ->page_mkwrite()

It is not a good way to extend valid_size to the end of the mmap area by writing zeros in mmap. Because after calling mmap, no data may be written, or only a small amount of data may be written to the head of the mmap area.

v3: fiemap extension for more physical information

For many years, various btrfs users have written programs to discover the actual disk space used by files, using root-only interfaces. However, this information is a great fit for fiemap: it is inherently tied to extent information, all filesystems can use it, and the capabilities required for FIEMAP make sense for this additional information also.

GIT PULL: security changes for v6.9-rc3

A single bug fix to address a kernel panic in the newly introduced function security_path_post_mknod.

PS: sorry for the email mismatch, @huawei.com emails resent from the mailing list are classified by Gmail as spam, we are working on fixing it.

v2: fuse: allow FUSE drivers to declare themselves free from outside changes

Traditionally, we’ve allowed people to set leases on FUSE inodes. Some FUSE drivers are effectively local filesystems and should be fine with kernel-internal lease support. Others are backed by a network server that may have multiple clients, or may be backed by something non-file like entirely. On those, we don’t want to allow leases.

GIT PULL: security changes for v6.9-rc3

I have a small bug fix for this kernel version. Please pull.

PS: sorry for the email mismatch, @huawei.com emails resent from the mailing list are classified by Gmail as spam, we are working on fixing it.

v2: security: Handle dentries without inode in security_path_post_mknod()

Commit 08abce60d63f (“security: Introduce path_post_mknod hook”) introduced security_path_post_mknod(), to replace the IMA-specific call to ima_post_path_mknod().

v2: vmcore: replace strncpy with strscpy_pad

strncpy() is in the process of being replaced as it is deprecated [1]. We should move towards safer and less ambiguous string interfaces.

v1: blk: optimization for classic polling

This removes the dependency on interrupts to wake up task. Set task state as TASK_RUNNING, if need_resched() returns true, while polling for IO completion. Earlier, polling task used to sleep, relying on interrupt to wake it up. This made some IO take very long when interrupt-coalescing is enabled in NVMe.

Networking

v1: net-next: First try to replace page_frag with page_frag_cache

This patchset tries to unify the page frag implementation by replacing page_frag with page_frag_cache for sk_page_frag() first.

v2: net-next: net: display more skb fields in skb_dump()

Print these additional fields in skb_dump() to ease debugging.

v4: net-next: net: Add generic support for netdev LEDs

For some devices, the MAC controls the LEDs in the RJ45 connector, not the PHY. This patchset provides generic support for such LEDs, and adds the first user, mv88e6xxx.

v3: RDMA/mana_ib: Add flex array to struct mana_cfg_rx_steer_req_v2

The “struct mana_cfg_rx_steer_req_v2” uses a dynamically sized set of trailing elements. Specifically, it uses a “mana_handle_t” array. So, use the preferred way in the kernel declaring a flexible array [1].

v1: net: change maximum number of UDP segments to 128

Earlier commit fc8b2a619469378 (“net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation”) added check of potential number of UDP segment vs UDP_MAX_SEGMENTS in linux/virtio_net.h.

v2: net-next: mptcp: add reset reason options in some places

The reason codes are handled in two ways nowadays (quoting Mat Martineau):

  1. Sending in the MPTCP option on RST packets when there is no subflow context available (these use subflow_add_reset_reason() and directly call a TCP-level send_reset function)
  2. The “normal” way via subflow->reset_reason. This will propagate to both the outgoing reset packet and to a local path manager process via netlink in mptcp_event_sub_closed()

v2: net: af_unix: Clear stale u->oob_skb.

syzkaller started to report a deadlock of unix_gc_lock after an earlier commit, but that commit just uncovers a bug that has been there since commit 314001f0bf92 (“af_unix: Add OOB support”).

v1: net-next: ipv4: Set scope explicitly in ip_route_output().

Add a “scope” parameter to ip_route_output() so that callers don’t have to override the tos parameter with the RTO_ONLINK flag if they want a local scope.

[net-next,RFC PATCH 0/5] Configuring NAPI instance for a queue

Support user configurability of queue<->NAPI association. The netdev-genl interface is extended with ‘queue-set’ command. Currently, the set command enables associating a NAPI ID for a queue, but can also be extended to support configuring other attributes. To set the NAPI attribute, the command requires the queue identifiers and the ID of the NAPI instance that the queue has to be associated with.

v2: net-next: ipvlan: handle NETDEV_DOWN event

In case of stacked devices, to help propagate the down link state from the parent/root device (to this leaf device), handle NETDEV_DOWN event like it is done now for NETDEV_UP.

v1: net-next: mptcp: add last time fields in mptcp_info

These patches from Geliang add support for the “last time” field in MPTCP Info, and verify that the counters look valid.

v6: ipsec-next: xfrm: Add Direction to the SA in or out

This patch introduces the ‘dir’ attribute, ‘in’ or ‘out’, to the xfrm_state, SA, enhancing usability by delineating the scope of values based on direction. An input SA will now exclusively encompass values pertinent to input, effectively segregating them from output-related values. This change aims to streamline the configuration process and improve the overall clarity of SA attributes.

v1: wifi: Un-embed ath10k and ath11k dummy netdev

struct net_device shouldn’t be embedded into any structure, instead, the owner should use the private space to embed their state into net_device.

v1: net-next: XYZ: Handle HAS_IOPORT dependencies

This is a follow up in my ongoing effort of making inb()/outb() and similar I/O port accessors compile-time optional. Previously I sent this as a treewide series titled “treewide: Remove I/O port accessors for HAS_IOPORT=n” with the latest being its 5th version[0]. With a significant subset of patches merged I’ve changed over to per-subsystem series. These series are stand alone and should be merged via the relevant tree such that with all subsystems complete we can follow this up with the final patch that will make the I/O port accessors compile-time optional.

v1: net-next: tcp: more struct tcp_sock adjustments

tp->recvmsg_inq is used from tcp recvmsg() thus should be in tcp_sock_read_rx group.

tp->tcp_clock_cache and tp->tcp_mstamp are written both in rx and tx paths, thus are better placed in tcp_sock_write_txrx group.

v2: Bluetooth: keep LE flow credits when recvbuf full

Previously LE flow credits were returned to the sender even if the socket’s receive buffer was full. This meant that no back-pressure was applied to the sender, thus it continued to send data, resulting in data loss without any error being reported.

v1: HW TX Rate Limiting Driver API

This is a follow-up to the ongoing discussion started by Intel to extend the support for TX shaping H/W offload [1].

v6: iwl-next: Introduce ETH56G PHY model for E825C products

E825C products have a different PHY model than E822, E823 and E810 products. This PHY is ETH56G and its support is necessary to have functional PTP stack for E825C products.

v3: net-next: Enhanced DCB and DSCP Support for KSZ Switches

This patch series is aimed at improving support for DCB (Data Center Bridging) and DSCP (Differentiated Services Code Point) on KSZ switches.

v1: l2cap: do not return LE flow credits when buf full

Previously LE flow credits were returned to the sender even if the socket’s receive buffer was full. This meant that no back-pressure was applied to the sender, thus it continued to send data, resulting in data loss without any error being reported.

v1: pull request for net-next: batman-adv 2024-04-05

The following changes since commit 4cece764965020c22cff7665b18a012006359095:

Linux 6.9-rc1 (2024-03-24 14:10:05 -0700)

are available in the Git repository at:

git://git.open-mesh.org/linux-merge.git tags/batadv-next-pullrequest-20240405

v4: net-next: net: usb: ax88179_178a: non necessary second random mac address

If the mac address can not be read from the device registers or the devicetree, a random address is generated, but this was already done from usbnet_probe, so it is not necessary to call eth_hw_addr_random from here again to generate another random address.

v2: net-next: net: phy: micrel: lan8814: Enable PTP_PF_PEROUT

Add support for PTP_PF_PEROUT to lan8814. The first patch just enables the LTC at probe time, so that it is not required to enable timestamping to have the LTC enabled, while the second patch actually adds support for PTP_PF_PEROUT.

v4: net-next: nfp: series of minor driver improvements

This short series bundles two unrelated but small updates to the nfp driver.

v2: net-next: Add support for flower actions mirred and redirect

This series adds support for the two tc flower actions mirred and redirect. Both actions are implemented by means of a port mask and a mask mode. The mask mode controls how the mask is applied, and together they are used by the switch to make a forwarding decision. Both actions are configurable via the IS0 or IS2 VCAP’s (ingress stage 0 and 2, respectively).

v3: net-next: selftests: net: groundwork for YNL-based tests

Currently the options for writing networking tests are C, bash or some mix of the two. YAML/Netlink gives us the ability to easily interface with Netlink in higher level languages. In particular, there is a Python library already available in tree, under tools/net. Add the scaffolding which allows writing tests using this library.

v1: net-next: mptcp: add reset reasons in skb in more cases

The first patch only removes the check while the second adds reasons into some places.

v1: Add REQ_F_CQE_SKIP support to io_uring zerocopy

This patchset allows for io_uring zerocopy to support REQ_F_CQE_SKIP, skipping the normal completion notification, but not the zerocopy buffer release notification.

[PATCH RESEND net-next v3] net: cache for same cpu skb_attempt_defer_free

Optimise skb_attempt_defer_free() when run by the same CPU the skb was allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can disable softirqs and put the buffer into cpu local caches.

v2: net-next: net: enable SOCK_NOSPACE for UDP

wake_up_poll() and variants can be expensive even if they don’t actually wake anything up as it involves disabling irqs, taking a spinlock and walking through the poll list, which is fraught with cache bounces. That might happen when someone waits for POLLOUT or even POLLIN as the waitqueue is shared, even though we should be able to skip these false positive calls when the tx queue is not full.

Security Hardening

v1: xfs: replace deprecated strncpy with strscpy_pad

strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.
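
The difference that motivates these conversions can be shown with a byte-level model. This is a minimal sketch in Python, not kernel code: `strncpy_model` and `strscpy_pad_model` are hypothetical functions that imitate the documented behaviour of strncpy() and strscpy_pad() on a destination buffer of `size` bytes.

```python
import errno


def strncpy_model(src: bytes, size: int) -> bytes:
    """Imitate strncpy(): copy at most `size` bytes and pad with NULs.

    If src has `size` bytes or more, the result has no NUL terminator.
    """
    out = src[:size]
    return out + b"\0" * (size - len(out))


def strscpy_pad_model(src: bytes, size: int):
    """Imitate strscpy_pad(): always NUL-terminate and pad the remainder.

    Returns (destination bytes, copied length or -E2BIG on truncation).
    """
    if size == 0:
        return b"", -errno.E2BIG
    copied = src[: size - 1]
    dst = copied + b"\0" * (size - len(copied))
    ret = len(copied) if len(src) < size else -errno.E2BIG
    return dst, ret


unterminated = strncpy_model(b"abcdefgh", 8)
terminated, status = strscpy_pad_model(b"abcdefgh", 8)
short_dst, short_len = strscpy_pad_model(b"abc", 8)
```

With an 8-byte source and an 8-byte buffer, the strncpy() model leaves no room for the terminator, while the strscpy_pad() model truncates to 7 bytes, terminates the string, and reports the truncation through its return value.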

v1: kdb: replace deprecated strncpy

All the other cases in this big switch statement use memcpy or other methods for copying string data. Since the lengths are handled manually and carefully, using strncpy() may be misleading. It doesn’t guarantee any sort of NUL-termination on its destination buffer. At any rate, it’s deprecated [1] and we want to remove all its uses [2].

v2: Add sy7802 flash led driver

This series introduces a driver for the Silergy SY7802 charge pump used in the BQ Aquaris M5 and X5 smartphones.

v1: udf: replace deprecated strncpy/strcpy with strscpy

strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. Also replace an instance of strcpy() which is also deprecated.

v6: RESEND: arm64: qcom: add AIM300 AIoT board support

Add AIM300 AIoT support along with usb, ufs, regulators, serial, PCIe, and PMIC functions. AIM300 Series is a highly optimized family of modules designed to support AIoT applications. It integrates QCS8550 SoC, UFS and PMIC chip etc.

Asynchronous IO

v1: liburing: manpage improvements

Just sweeping through github issues.

v1: io_uring: use private workqueue for exit work

Rather than use the system unbound event workqueue, use an io_uring specific one. This avoids dependencies with the tty, which also uses the system_unbound_wq, and issues flushes of said workqueue from inside its poll handling.

Rust For Linux

v1: Rust bindings for cpufreq and OPP core + sample driver

This RFC adds initial rust bindings for two subsystems, cpufreq and operating performance points (OPP). The bindings are provided for most of the interface these subsystems expose.

v1: rust: init: change the generated name of guard variables

The initializers created by the [try_][pin_]init! macros utilize the guard pattern to drop already initialized fields, when initialization fails mid-way. These guards are generated to have the same name as the field that they handle. To prevent namespacing issues when the field name is the same as e.g. a constant name, add __ as a prefix and _guard as the suffix.

v4: Arc methods for linked list

This patchset contains two useful methods for the Arc type. They will be used in my Rust linked list implementation [1], which Rust Binder uses. See the Rust Binder RFC [2] for more information.

v1: Rust 1.78.0 upgrade

This is the first upgrade without the alloc fork.

In other words, it is based on top of Wedson’s “Allocation APIs” series [1], applied on top of the current rust-next, i.e. commit 9ffe2a730313 (“rust: str: add {make,to}_{upper,lower}case() to CString”).

BPF

v2: bpf-next: selftests/bpf: Add F_SETFL for fcntl

Incorrect arguments are passed to fcntl() in test_sockmap.c when invoking it to set file status flags. If O_NONBLOCK is used as 2nd argument and passed into fcntl, -EINVAL will be returned (See do_fcntl() in fs/fcntl.c). The correct approach is to use F_SETFL as 2nd argument, and O_NONBLOCK as 3rd one.
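
The corrected argument order looks like this from userspace. This is a minimal sketch in Python using the standard `fcntl` module on Linux, not the selftest itself: `set_nonblocking` is a hypothetical helper, and reading the current flags with F_GETFL first is the usual idiom rather than something the summary spells out.

```python
import fcntl
import os
import socket


def set_nonblocking(fd):
    """Set O_NONBLOCK with F_SETFL as the command and the flags as the argument."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    # Command goes second, flag value third. Passing os.O_NONBLOCK as the
    # command instead is the mistake the patch fixes.
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    return fcntl.fcntl(fd, fcntl.F_GETFL)


a, b = socket.socketpair()
new_flags = set_nonblocking(a.fileno())
is_nonblocking = bool(new_flags & os.O_NONBLOCK)
a.close()
b.close()
```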

v1: bpf: dereference of null in __cgroup_bpf_query() function

In the __cgroup_bpf_query() function, it is possible to dereference the null pointer in the line id = prog->aux->id; since there is no check for a non-zero value of the variable prog.

v4: libbpf: API to partially consume items from ringbuffer

Introduce ring__consume_n() and ring_buffer__consume_n() API to partially consume items from one (or more) ringbuffer(s).
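
What "partially consume" means can be illustrated with a toy queue. This is a minimal sketch in Python, not libbpf: `ToyRing` and its methods are hypothetical, and the only behaviour it models is a bounded consume call that stops after at most n items and reports how many it handled.

```python
from collections import deque


class ToyRing:
    """Toy stand-in for a ring buffer with a per-record callback."""

    def __init__(self, callback):
        self._items = deque()
        self._callback = callback

    def produce(self, item):
        self._items.append(item)

    def consume_n(self, n):
        """Pass at most n pending items to the callback; return the count."""
        consumed = 0
        while consumed < n and self._items:
            self._callback(self._items.popleft())
            consumed += 1
        return consumed

    def pending(self):
        return len(self._items)


seen = []
ring = ToyRing(seen.append)
for i in range(5):
    ring.produce(i)

first = ring.consume_n(3)    # handles 3 items, leaves 2 pending
left_after_first = ring.pending()
second = ring.consume_n(3)   # only 2 items remain
```

A bounded call like this lets the consumer cap the work done per iteration instead of draining everything that is available.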

v2: bpf-next: bpf: Allow invoking kfuncs from BPF_PROG_TYPE_SYSCALL progs

Currently, a set of core BPF kfuncs (e.g. bpf_task_*, bpf_cgroup_*, bpf_cpumask_*, etc) cannot be invoked from BPF_PROG_TYPE_SYSCALL programs. The whitelist approach taken for enabling kfuncs makes sense: it is not safe to call these kfuncs from every program type. For example, it may not be safe to call bpf_task_acquire() in an fentry to free_task().

v1: bpf-next: arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs

Support an instruction for resolving absolute addresses of per-CPU data from their per-CPU offsets. This instruction is internal-only and users are not allowed to use it directly. It will only be used for internal inlining optimizations for now between BPF verifier and BPF JITs.

v1: bpf-next: bpf: allow bpf_for_each_map_elem() helper with different input maps

Currently, taking different maps within a single bpf_for_each_map_elem call is not allowed. For example, code that does so cannot pass the verifier (it fails with the error “tail_call abusing map_ptr”).

v2: bpf-next: selftests/bpf: Make sure libbpf doesn’t enforce the signature of a func pointer.

The verifier in the kernel ensures that the struct_ops operators behave correctly by checking that they access parameters and context appropriately. The verifier will approve a program as long as it correctly accesses the context/parameters, regardless of its function signature.

GIT PULL: Networking for v6.9-rc3

The following changes since commit 50108c352db70405b3d71d8099d0b3adc3b3352c:

Merge tag ‘net-6.9-rc2’ of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2024-03-28 13:09:37 -0700)

**[v1: net: xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING](http://lore.kernel.org/bpf/20240404202738.3634547-1-edumazet@google.com/)**

syzbot reported an illegal copy in xsk_setsockopt() [1]

Make sure to validate setsockopt() @optlen parameter.

v5: bpf-next: bpftool: Mount bpffs on provided dir instead of parent dir

When pinning programs/objects under PATH (eg: during “bpftool prog loadall”) the bpffs is mounted on the parent dir of PATH in the following situations:

  • the given dir exists but it is not bpffs.
  • the given dir doesn’t exist and the parent dir is not bpffs.

v5: kallsyms: rework symbol lookup return codes

This originally showed up while building with -O3, but later started happening in other configurations as well, depending on inlining decisions. The underlying issue is that the local ‘name’ variable is always initialized to be the same as ‘buffer’ in the called functions that fill the buffer, which gcc notices while inlining, though it could see that the address check always skips the copy.

v1: bpf: x86: avoid link error with CONFIG_SMP=n

On UP systems, this_cpu_off is not defined, so the new percpu code in bpf fails to link:

x86_64-linux-ld: vmlinux.o: in function `do_jit':
bpf_jit_comp.c:(.text+0xbab14): undefined reference to `this_cpu_off'

Use offset zero on UP builds instead.

v14: net-next: Introducing P4TC (series 1)

This is the first patchset of two. In this patchset we are submitting 15 patches, which cover the minimal viable P4 PNA architecture.

v3: net-next: allocate dummy device dynamically

struct net_device shouldn’t be embedded into any structure, instead, the owner should use the private space to embed their state into net_device.

v1: bpf-next: bpf: handle CONFIG_SMP=n configuration in x86 BPF JIT

On non-SMP systems there is no “per-CPU” data; it is just global data. In that case, skip the this_cpu_off-based per-CPU address adjustment.

Closes: https://lore.kernel.org/oe-kbuild-all/202404040951.d4CUx5S6-lkp@intel.com/
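
The idea can be sketched in user space as follows. This is only an illustration: `demo_this_cpu_off` and `demo_percpu_addr` are hypothetical names, and the `smp` run-time flag stands in for what is a build-time configuration (CONFIG_SMP) in the kernel.

```c
#include <stdint.h>

/* Hypothetical stand-in for the current CPU's per-CPU base offset. */
uintptr_t demo_this_cpu_off = 0x1000;

/*
 * Resolve the address of a "per-CPU" variable. With SMP, the current
 * CPU's base offset is added. Without SMP the data is plain global
 * data, so the adjustment is zero.
 */
uintptr_t demo_percpu_addr(uintptr_t percpu_offset, int smp)
{
    return percpu_offset + (smp ? demo_this_cpu_off : 0);
}
```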

v3: bpf-next: Inline bpf_get_branch_snapshot() BPF helper

Implement inlining of the bpf_get_branch_snapshot() BPF helper using a generic BPF assembly approach. This makes it possible to reduce LBR record usage right before LBR records are captured from inside a BPF program.

v1: bpf-next: bpf: pack struct bpf_fib_lookup

The struct bpf_fib_lookup is supposed to be of size 64. A recent commit added a static assertion to check this property so that future changes to the structure will not accidentally break this assumption.
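
A minimal sketch of the technique, assuming a GCC- or Clang-compatible compiler: `struct demo_lookup` below is a hypothetical 64-byte layout made up for illustration (it is not the real struct bpf_fib_lookup), packed so that the compiler cannot insert padding, with a compile-time assertion guarding its size.

```c
#include <stdint.h>

/* Hypothetical layout: the field sizes add up to exactly 64 bytes. */
struct demo_lookup {
    uint8_t  family;        /*  1 byte            */
    uint8_t  l4_protocol;   /*  1 byte            */
    uint16_t sport;         /*  2 bytes           */
    uint16_t dport;         /*  2 bytes           */
    uint16_t tot_len;       /*  2 bytes  ->  8    */
    uint32_t ifindex;       /*  4 bytes           */
    uint32_t flowinfo;      /*  4 bytes  -> 16    */
    uint32_t src[4];        /* 16 bytes  -> 32    */
    uint32_t dst[4];        /* 16 bytes  -> 48    */
    uint16_t vlan_proto;    /*  2 bytes           */
    uint16_t vlan_tci;      /*  2 bytes  -> 52    */
    uint8_t  smac[6];       /*  6 bytes           */
    uint8_t  dmac[6];       /*  6 bytes  -> 64    */
} __attribute__((packed));  /* GCC/Clang extension: no padding */

/* The build fails if a later change alters the size. */
_Static_assert(sizeof(struct demo_lookup) == 64,
               "struct demo_lookup must be exactly 64 bytes");
```

With the attribute in place, the size no longer depends on the padding rules of a particular ABI, which is the property the static assertion is meant to protect.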

v8: net-next: Device Memory TCP

This revision largely rebases on top of net-next and addresses the feedback RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.

v1: bpf: replace deprecated strncpy with strscpy

strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.
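
To show why the replacement is less ambiguous, here is a user-space sketch that mimics the documented behaviour of strscpy(): the destination is always NUL-terminated, and truncation is reported as -E2BIG. `demo_strscpy` is a hypothetical re-implementation for illustration, not the kernel's code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Copy src into dst (capacity: size bytes). Returns the number of
 * characters copied, excluding the NUL, or -E2BIG if size is 0 or
 * src did not fit. dst is NUL-terminated whenever size > 0.
 */
ssize_t demo_strscpy(char *dst, const char *src, size_t size)
{
    size_t i;

    if (size == 0)
        return -E2BIG;

    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    /* If src still has characters left, the copy was truncated. */
    return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

By contrast, strncpy() leaves the destination without a terminating NUL when the source is at least as long as the buffer, and gives the caller no direct indication that truncation happened.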

v1: perf lock contention: Add a missing NULL check

I got a report of a BPF verifier failure on a recent kernel when running the perf lock contention command. The code checks task->sighand->siglock without first checking whether sighand is NULL. Let’s add that check.

v2: libbpf: use local bpf_helpers.h include

Commit 20d59ee55172fdf6 (“libbpf: add bpf_core_cast() macro”) added a bpf_helpers include in bpf_core_read.h as a system include. Usually, the includes are local, though, like in bpf_tracing.h. This commit adjusts the include to be local as well.

v3: bpf-next: Selftests/xsk: Test with maximum and minimum HW ring size configurations

Please find enclosed a patch set that introduces enhancements and new test cases to the selftests/xsk framework. These test the robustness and reliability of AF_XDP across both minimal and maximal ring size configurations.

v1: bpf-next: selftests/bpf: Add pid limit for mptcpify prog

To prevent the mptcpify prog from affecting the results of other BPF tests, a pid limit is added so that it modifies only its own program.

v6: net-next: Add minimal XDP support to TI AM65 CPSW Ethernet driver

This patch adds XDP support to TI AM65 CPSW Ethernet driver.

v3: bpf-next: bpf: add a verbose message if map limit is reached

When more than 64 maps are used by a program and its subprograms the verifier returns -E2BIG. Add a verbose message which highlights the source of the error and also print the actual limit.
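
A small user-space sketch of the pattern, in which the error path also records a human-readable reason. The names `demo_env` and `demo_add_used_map` and the message wording are hypothetical; only the limit of 64 and the -E2BIG return value come from the summary above.

```c
#include <errno.h>
#include <stdio.h>

#define DEMO_MAX_USED_MAPS 64   /* limit quoted in the summary */

struct demo_env {
    int  used_map_cnt;
    char log[128];              /* stand-in for the verifier log */
};

/* Account for one more map, or fail with an explanatory message. */
int demo_add_used_map(struct demo_env *env)
{
    if (env->used_map_cnt >= DEMO_MAX_USED_MAPS) {
        snprintf(env->log, sizeof(env->log),
                 "too many maps in program: limit is %d\n",
                 DEMO_MAX_USED_MAPS);
        return -E2BIG;
    }
    env->used_map_cnt++;
    return 0;
}
```

Without the message, the caller sees only -E2BIG, which could come from several unrelated limits.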

v2: bpf-next: selftests/bpf: Skip test when perf_event_open returns EOPNOTSUPP

The send_signal and stacktrace_build_id_nmi tests fail when run with the riscv sbi pmu driver without the sscofpmf extension, or with the riscv legacy pmu driver.

v1: bpf-next: export send_byte and send_recv_data

Export two helpers used by MPTCP BPF selftests.

v1: bpf-next: bpf: Improve program stats run-time calculation

This patch improves the run-time calculation for program stats by capturing the duration as soon as possible after the program returns.

v1: bpf-next: selftests/bpf: Using llvm may_goto inline asm for cond_break macro

Currently, cond_break macro uses bytes to encode the may_goto insn. Patch [1] in llvm implemented may_goto insn in BPF backend. Replace byte-level encoding with llvm inline asm for better usability. Using llvm may_goto insn is controlled by macro __BPF_FEATURE_MAY_GOTO.

v2: ftrace: make extra rcu_is_watching() validation check optional

Introduce CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING config option to control whether ftrace low-level code performs additional rcu_is_watching()-based validation logic in an attempt to catch noinstr violations.

v5: perf/x86/amd: add LBR capture support outside of hardware events

Add AMD-specific implementation of perf_snapshot_branch_stack static call that allows LBR capture from arbitrary points in the kernel. This is utilized by BPF programs. See patch #3 for all the details.

Related Technology Updates

Qemu

riscv disassembler error with pmpcfg0

I’ve been using QEMU8 to collect instruction information on U-Boot + OpenSBI.

v3: target/riscv: raise an exception when CSRRS/CSRRC writes a read-only CSR

Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields. Note that if rs1 specifies a register holding a zero value other than x0, the instruction will still attempt to write the unmodified value back to the CSR and will cause any attendant side effects.
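
The rule can be modelled in a few lines. The sketch below is a simplified, hypothetical model of CSRRS and is not QEMU code: whether a write is attempted depends on the rs1 register index, not on the value held in that register, and a write attempt to a read-only CSR raises an illegal-instruction exception.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_csr_result { DEMO_CSR_OK, DEMO_CSR_ILLEGAL_INSN };

/*
 * Model of CSRRS rd, csr, rs1.
 * rs1_index is the register number (0 means x0); rs1_value is the
 * value held in that register.
 */
enum demo_csr_result demo_csrrs(uint64_t *csr, bool csr_read_only,
                                unsigned rs1_index, uint64_t rs1_value,
                                uint64_t *rd_value)
{
    /* A write is attempted for any rs1 other than x0, even if it holds 0. */
    bool writes = (rs1_index != 0);

    if (writes && csr_read_only)
        return DEMO_CSR_ILLEGAL_INSN;

    *rd_value = *csr;           /* the CSR is always read */
    if (writes)
        *csr |= rs1_value;      /* set the bits given in rs1 */
    return DEMO_CSR_OK;
}
```

In this model, `csrrs rd, csr, x0` on a read-only CSR succeeds as a pure read, while the same instruction with `x5` holding zero is rejected.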

U-Boot

v3: riscv: add support for Milk-V Mars board

The Milk-V Mars board is technically very close to the StarFive VisionFive 2 board.


