RISC-V Linux Kernel and Related Technology News, Issue 81

Written by 呀呀呀 on 2024/03/04

Date: 20240303
Editor: 晓怡
Repository: RISC-V Linux Kernel Technology Research Activity
Sponsor: PLCT Lab, ISCAS

Kernel News

RISC-V Architecture Support

v6: riscv: Use Kconfig to set unaligned access speed

If the hardware unaligned access speed is known at compile time, the unaligned access speed probe can be skipped to speed up boot.

v1: riscv: hwprobe: export highest virtual userspace address

Some userspace applications (OpenJDK for instance) use the free MSBs in pointers to insert additional information for their own logic and need to get this information from somewhere. Currently they rely on parsing the /proc/cpuinfo “mmu=svxx” string to obtain the current number of usable virtual address bits [1]. Since this reflects the raw supported MMU mode, it might differ from the logical one used internally, which is why arch_get_mmap_end() is used.
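As a rough illustration of the parsing approach described above (the one this patch aims to replace), the sketch below extracts the svNN mode from /proc/cpuinfo-style text and derives an upper bound on the free high pointer bits. The sample text, the helper names, and the 64-bit pointer width are assumptions made for illustration; as the cover letter notes, the raw MMU mode may differ from the range the kernel actually gives to userspace.

```python
import re


def mmu_va_bits(cpuinfo_text):
    """Return NN from the first 'mmu : svNN' line, or None if there is none."""
    match = re.search(r"^mmu\s*:\s*sv(\d+)\s*$", cpuinfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def free_msb_upper_bound(va_bits, pointer_bits=64):
    """Upper bound on untouched high bits; the kernel may expose fewer usable bits."""
    return pointer_bits - va_bits


# Hypothetical sample, shaped like a RISC-V /proc/cpuinfo excerpt.
SAMPLE = "processor\t: 0\nhart\t\t: 0\nisa\t\t: rv64imafdc\nmmu\t\t: sv39\n"

if __name__ == "__main__":
    bits = mmu_va_bits(SAMPLE)
    print(bits, free_msb_upper_bound(bits))
```

A dedicated hwprobe key would make this kind of text parsing unnecessary.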

v5: riscv: ASID-related and UP-related TLB flush enhancements

While reviewing Alexandre Ghiti’s “riscv: tlb flush improvements” series [1], I noticed that most TLB flush functions end up as a call to local_flush_tlb_all() when SMP is disabled. This series resolves that, and also optimizes the scenario where SMP is enabled but only one CPU is present or online. Along the way, I realized that we should be using single-ASID flushes wherever possible, so I implemented that as well.

v4: RISC-V SBI v2.0 PMU improvements and Perf sampling in KVM guest

This series implements the SBI PMU improvements done in SBI v2.0 [1], i.e. PMU snapshot and fw_read_hi() functions.
SBI v2.0 introduced the PMU snapshot feature, which allows the SBI implementation to provide counter information (i.e. values/overflow status) via memory shared between the SBI implementation and the supervisor OS. This minimizes the number of traps when perf is used inside a KVM guest, as it relies on SBI PMU plus trap/emulation of the counters.

GIT PULL: RISC-V Sophgo Devicetrees for v6.9

The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:
Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)
are available in the Git repository at:
https://github.com:sophgo/linux.git riscv-sophgo-dt-for-v6.9

v15: Refactoring Microchip PCIe driver and add StarFive PCIe

The final purpose of this patchset is to add a PCIe driver for the StarFive JH7110 SoC. JH7110 uses the PLDA XpressRICH PCIe IP. Microchip PolarFire uses the same IP and has already committed its code, which mixes PLDA controller code with Microchip platform code.

Re: v8: Add timer driver for StarFive JH7110 RISC-V SoC

Could you please help to review this patch and give your comments if you have time? Thanks.

v1: arch: mm, vdso: consolidate PAGE_SIZE definition

Naresh noticed that the newly added usage of the PAGE_SIZE macro in include/vdso/datapage.h introduced a build regression. I had an older patch that I revived to have this defined through Kconfig rather than through including asm/page.h, which is not allowed in vdso code.

v1: riscv: deprecate CONFIG_MMU=n

Deprecation of NOMMU support for riscv was discussed during LPC 2023 [1]. Reasons for this involve the lack of users as well as the maintenance effort needed to support this mode. The psABI FDPIC specification also never made it upstream, and the last public messages about this development seem to date back to 2020 [2]. Plan the deprecation to be done in 2 years from now. Mark the Kconfig option as deprecated by adding a new dummy option which explicitly displays the deprecation in case of CONFIG_MMU=n.

Patch “irqchip/sifive-plic: Enable interrupt if needed before EOI” has been added to the 6.7-stable tree

This is a note to let you know that I’ve just added the patch titled
irqchip/sifive-plic: Enable interrupt if needed before EOI
to the 6.7-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

Patch “irqchip/sifive-plic: Enable interrupt if needed before EOI” has been added to the 6.6-stable tree

This is a note to let you know that I’ve just added the patch titled
irqchip/sifive-plic: Enable interrupt if needed before EOI
to the 6.6-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

Patch “irqchip/sifive-plic: Enable interrupt if needed before EOI” has been added to the 6.1-stable tree

This is a note to let you know that I’ve just added the patch titled
irqchip/sifive-plic: Enable interrupt if needed before EOI
to the 6.1-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

v1: cpuidle: riscv-sbi: Add cluster_pm_enter()/exit()

When the CPUs in the same cluster are all in the idle state, the kernel might put the cluster into a deeper low power state. Call cluster_pm_enter() before entering the low power state and call cluster_pm_exit() after the cluster wakes up.

v15: Linux RISC-V AIA Support

The RISC-V AIA specification is ratified as per the RISC-V International process. The latest ratified AIA specification can be found at: https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf

v1: dt-bindings: pwm: opencores: Add compatible for StarFive JH8100

StarFive JH8100 uses the same OpenCores PWM controller as JH7110. Mark JH8100 as compatible to the OpenCores PWM controller.

Process Scheduling

v2: sched: Add trace_sched_waking() tracepoint to sched_ttwu_pending()

Zimuzo reported seeing occasional cases in perfetto traces where tasks went from sleeping directly to trace_sched_wakeup() without always seeing a trace_sched_waking().

v1: sched/eevdf: avoid task starvation in cgroups

When running update_curr, it is checked whether the current task has missed its deadline (update_deadline). If the deadline has been crossed, the task is set to be rescheduled if there are other tasks available on its cfs_rq. This can cause task starvation in some cgroup configurations.

v1: sched/eevdf: sched feature to dismiss lag on wakeup

The previously used CFS scheduler gave tasks that were woken up an enhanced chance to see runtime immediately by deducting a certain value from its vruntime on runqueue placement during wakeup.
This property was used by some, at least vhost, to ensure that certain kworkers are scheduled immediately after being woken up. The EEVDF scheduler does not support this so far. Instead, if such a woken-up entity carries a negative lag from its previous execution, it will have to wait for the current time slice to finish, which negatively affects the performance of the process expecting immediate execution.

v1: sched/core: split iowait state into two states

iowait is a bogus metric, but it’s helpful in the sense that it allows short waits to not enter sleep states that have a higher exit latency than we would’ve picked for iowait’ing tasks. However, it’s harmful in that lots of applications and monitoring assume that iowait is busy time, or otherwise use it as a health metric. Particularly for async IO it’s entirely nonsensical.

Memory Management

v1: memcg_kmem hooks refactoring and kmem_cache_charge()

I have tried to look into Linus’s suggestions to reduce slab memcg accounting overhead [1] [2].
The reorganized hooks are in Patch 1 and it definitely seems like nice cleanup on its own.

v2: enable bs > ps in XFS

This is the second version of the series that enables block size > page size (Large Block Size) in XFS. The context and motivation can be seen in the cover letter of the RFC v1 [1]. We also recorded a talk about this effort at LPC [3], if someone would like more context on this effort.

v7: mm/vmalloc: lock contention optimization under multi-threading

This version has the rearrangement of macros from the previous one.
We are not sure whether we have completely moved these macros and their corresponding helper to the correct position. Could you please help to check whether they are correct?

v1: Merge arm64/riscv hugetlbfs contpte support

This patchset intends to merge the contiguous ptes hugetlbfs implementation of arm64 and riscv.
Both arm64 and riscv support the use of contiguous ptes to map pages that are larger than the default page table size, respectively called contpte and svnapot.
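To make the idea of a contiguous-PTE mapping concrete, here is a minimal sketch that computes the aligned start of such a mapping and the number of base-page PTEs it spans. The 64 KiB mapping size and 4 KiB base page size are assumptions chosen for illustration, and the function name is hypothetical.

```python
def contiguous_mapping(addr, mapping_size=64 * 1024, base_page_size=4096):
    """Return (aligned_start, pte_count) for the contiguous mapping covering addr."""
    if mapping_size % base_page_size:
        raise ValueError("mapping size must be a multiple of the base page size")
    # All PTEs of one contiguous mapping describe a naturally aligned range.
    start = addr - (addr % mapping_size)
    pte_count = mapping_size // base_page_size
    return start, pte_count


if __name__ == "__main__":
    start, count = contiguous_mapping(0x12345)
    print(hex(start), count)
```

Under these assumptions one 64 KiB huge page is backed by 16 ordinary PTEs that the hardware may treat as a single translation.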

v1: Improved Memory Tier Creation for CPUless NUMA Nodes

The memory tiering component in the kernel is functionally useless for CPUless memory/non-DRAM devices like CXL1.1 type3 memory because the nodes are lumped together in the DRAM tier. https://lore.kernel.org/linux-mm/PH0PR08MB7955E9F08CCB64F23963B5C3A860A@PH0PR08MB7955.namprd08.prod.outlook.com/T/

v1: selftests/mm: Dont fail testsuite due to a lack of hugepages

On systems that have large core counts and large page sizes, but limited memory, the userfaultfd test hugepage requirement is too large.

v1: mm/mempolicy: Use a folio in do_mbind()

We actually add folios to the pagelist already, but then work with them as pages. Removes a call to compound_head() in PageKsm() and removes a reference to page->index.

v2: mm/vmstat: Add order’s information for extfrag_index and unusable_index

The current output of /sys/kernel/debug/extfrag/extfrag_index and /sys/kernel/debug/extfrag/unusable_index (as shown by cat) is not friendly to userspace.

v2: zswap: replace RB tree with xarray

A very deep RB tree requires rebalancing at times, which contributes to zswap fault latencies. An xarray does not need to perform tree rebalancing. Replacing the RB tree with an xarray can yield a small performance gain.
One small difference is that xarray insert might fail with ENOMEM, while RB tree insert does not allocate additional memory.

v3: filemap: avoid unnecessary major faults in filemap_fault()

The major fault occurred when using mlockall(MCL_CURRENT | MCL_FUTURE) in an application, which leads to an unexpected issue [1].
This is caused by a temporarily cleared PTE during a read+clear/modify/write update of the PTE, e.g. do_numa_page()/change_pte_range().

v2: mm: support large folios swap-in

-v2:
lots of code cleanup according to Chris’s comments, thanks!
collect Chris’s ack tags, thanks!
address David’s comment on moving to use folio_add_new_anon_rmap for !folio_test_anon in do_swap_page, thanks!
remove the MADV_PAGEOUT patch from this series, as Ryan will integrate it into the swap-out series
apply Kairui’s work of “mm/swap: fix race when skipping swapcache” to large folios swap-in as well
fix corrupted data (zero-filled data) in two races, zswap and the case where some entries are in swapcache while others are not, by checking SWAP_HAS_CACHE while swapping in a large folio

v1: mm: page_alloc: Use div64_ul() instead of do_div()

Fixes Coccinelle/coccicheck warning reported by do_div.cocci.
Compared to do_div(), div64_ul() does not implicitly cast the divisor and does not unnecessarily calculate the remainder.
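The difference in divisor handling can be modeled in a few lines. This is only a sketch of the semantics on a 64-bit kernel, not the kernel macros themselves: the function names are hypothetical, and the model assumes do_div() narrows its divisor to 32 bits while div64_ul() takes a full unsigned long.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def do_div_model(n, base):
    """Model of do_div(): divisor narrowed to 32 bits; returns (quotient, remainder)."""
    base32 = base & MASK32  # implicit truncation of the divisor
    if base32 == 0:
        raise ZeroDivisionError("divisor is zero after truncation to 32 bits")
    n &= MASK64
    return n // base32, n % base32


def div64_ul_model(n, divisor):
    """Model of div64_ul(): full-width divisor, quotient only."""
    return (n & MASK64) // (divisor & MASK64)


if __name__ == "__main__":
    big_divisor = (1 << 32) + 2
    print(do_div_model(1 << 40, big_divisor))
    print(div64_ul_model(1 << 40, big_divisor))
```

With a divisor that fits in 32 bits both models agree; once the divisor exceeds 32 bits, the truncating model silently divides by the wrong value.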

v1: mm/kmemleak: Don’t hold kmemleak_lock when calling printk()

When some error conditions happen (like OOM), some kmemleak functions call printk() to dump out some useful debugging information while holding the kmemleak_lock. This may cause deadlock as the printk() function may need to allocate additional memory leading to a create_object() call acquiring kmemleak_lock again.

v1: Is pagecache_isize_extended() compatible with large folios?

I’d appreciate some filesystem people checking my work here (in that pagecache_isize_extended() may already be broken and we didn’t notice).
As far as I can tell (and it’d be nice to explain this in the kernel-doc a little more thoroughly), the reason pagecache_isize_extended() exists is that some filesystems rely on getting page_mkwrite() calls in order to instantiate blocks. So if you have a filesystem using 512 byte blocks and a 256 byte file mmaped, a store anywhere in the page will only result in block 0 of the file being instantiated and the folio will now be marked as dirty.
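The block arithmetic in that example can be sketched as follows. The 4096-byte page size and the helper names are assumptions for illustration; the sketch only counts blocks and does not model page_mkwrite() itself.

```python
def blocks_backing_file(i_size, block_size):
    """Number of blocks needed to hold i_size bytes (rounded up)."""
    return (i_size + block_size - 1) // block_size


def blocks_per_page(page_size, block_size):
    """Number of filesystem blocks covered by one page."""
    return page_size // block_size


if __name__ == "__main__":
    print(blocks_backing_file(256, 512), blocks_per_page(4096, 512))
```

For a 256-byte file with 512-byte blocks, only 1 of the 8 blocks under the page lies within the file size, which matches the statement that a store anywhere in the page instantiates only block 0.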

v1: mm: Use folio more widely in __split_huge_page

We already have a folio; use it instead of the head page where reasonable. Saves a couple of calls to compound_head() and eliminates a few references to page->mapping.

v1: mm/zsmalloc: move get_zspage_lockless into #ifdef

It’s only used from inside of an #ifdef section, causing a warning otherwise:
mm/zsmalloc.c:735:23: error: unused function ‘get_zspage_lockless’ [-Werror,-Wunused-function]
  735 | static struct zspage *get_zspage_lockless(struct page *page)
      |                       ^
Move it down into that block to avoid adding another #ifdef.

v1: mm/treewide: Replace pXd_large() with pXd_leaf()

[based on latest akpm/mm-unstable, commit 1274e7646240]
These two APIs are almost always the same. It’s confusing to have both of them. Merge them into one. Here I used pXd_leaf() only because pXd_leaf() is a global API which is always defined, while pXd_large() is not.

v2: mm: add alloc_contig_migrate_range allocation statistics

alloc_contig_migrate_range has all the information needed to understand the latency of big contiguous allocations: for example, how many pages are migrated and how many times they needed to be unmapped from page tables.

v2: mm/zsmalloc: don’t need to reserve LSB in handle

We will save allocated tag in the object header to indicate that it’s allocated.
handle |= OBJ_ALLOCATED_TAG;
So the object header needs to reserve LSB for this tag bit.

v1: mm/vmscan: simplify the calculation of fractions for SCAN_FRACT

The current way to calculate fractions for SCAN_FRACT is hard to read and more complicated than it should be. It also performs unnecessary division and adjustment to avoid zero operands. Prune these away by multiplying the fractions by ‘anon_cost * file_cost / (3 * total_cost)’.

v1: mm: convert folio_estimated_sharers() to folio_likely_mapped_shared()

Callers of folio_estimated_sharers() only care about “mapped shared vs. mapped exclusively”, not the exact estimate of sharers. Let’s consolidate and unify the condition users are checking. While at it clarify the semantics and extend the discussion on the fuzziness.

v1: mm/cma: convert cma_alloc() to return folio

Change cma_alloc() to return struct folio. This further increases the usage of folios in mm/hugetlb.

v3: Rearrange batched folio freeing

Other than the obvious “remove calls to compound_head” changes, the fundamental belief here is that iterating a linked list is much slower than iterating an array (5-15x slower in my testing). There’s also an associated belief that since we iterate the batch of folios three times, we do better when the array is small (i.e. 15 entries) than we do with a batch that is hundreds of entries long, which only gives us the opportunity for the first pages to fall out of cache by the time we get to the end.

v1: make the hugetlb migration strategy consistent

As discussed in the previous thread [1], there is an inconsistency when handling hugetlb migration. When handling the migration of freed hugetlb, it prevents fallback to other NUMA nodes in alloc_and_dissolve_hugetlb_folio(). However, when dealing with in-use hugetlb, it allows fallback to other NUMA nodes in alloc_hugetlb_folio_nodemask(), which can break the per-node hugetlb pool and might result in unexpected failures when node-bound workloads don’t get what is assumed to be available.

v2: mm: make folio_pte_batch available outside of mm/memory.c

madvise, mprotect and some others might need folio_pte_batch to check if a range of PTEs are completely mapped to a large folio with contiguous physical addresses. Let’s make it available in mm/internal.h.

v1: mm/zsmalloc: simplify synchronization between zs_page_migrate() and free_zspage()

free_zspage() has to hold the locks of all pages, since the zs_page_migrate() path relies on the page lock to protect against the race with zs_free(), so that it can safely get the zspage from page->private.

v1: mm/zsmalloc: don’t need to save tag bit in handle

We only need to save the position (pfn + obj_idx) in the handle; there is no need to save the tag bit in the handle. So one more bit can be used for obj_idx.
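A minimal sketch of packing a (pfn, obj_idx) position into a single integer shows why freeing the tag bit doubles the number of addressable objects. The bit widths and function names below are hypothetical and are not zsmalloc’s actual constants.

```python
OBJ_INDEX_BITS = 10  # hypothetical width, for illustration only


def encode_location(pfn, obj_idx, obj_index_bits=OBJ_INDEX_BITS):
    """Pack pfn in the high bits and obj_idx in the low bits."""
    if not 0 <= obj_idx < (1 << obj_index_bits):
        raise ValueError("obj_idx does not fit in the index field")
    return (pfn << obj_index_bits) | obj_idx


def decode_location(value, obj_index_bits=OBJ_INDEX_BITS):
    """Inverse of encode_location(): returns (pfn, obj_idx)."""
    return value >> obj_index_bits, value & ((1 << obj_index_bits) - 1)


def max_objects(total_bits, pfn_bits, tag_bits):
    """How many obj_idx values fit once the pfn and tag bits are reserved."""
    return 1 << (total_bits - pfn_bits - tag_bits)


if __name__ == "__main__":
    print(decode_location(encode_location(0x1234, 5)))
    print(max_objects(64, 52, 1), max_objects(64, 52, 0))
```

Dropping one reserved tag bit leaves one more bit for the index field, so the index range doubles.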

v1: mm: export folio_pte_batch as a couple of modules might need it

madvise and some others might need folio_pte_batch to check if a range of PTEs are completely mapped to a large folio with contiguous physical addresses. Let’s export it for others to use.

v5: Split a folio to any lower order folios

File folio supports any order and multi-size THP is upstreamed [1], so both file and anonymous folios can be >0 order. Currently, split_huge_page() only splits a huge page to order-0 pages, but splitting to orders higher than 0 might better utilize large folios, if done properly. In addition, Large Block Sizes in XFS support would benefit from it during truncate [2]. This patchset adds support for splitting a large folio to any lower order folios. The patchset is on top of mm-everything-2024-02-24-02-40.
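The arithmetic of such a split is simple to sketch. The helper below assumes a uniform split, where every resulting folio has the same order; the function name is hypothetical.

```python
def split_result(old_order, new_order):
    """Return (folio_count, pages_per_folio) for a uniform split to new_order."""
    if not 0 <= new_order < old_order:
        raise ValueError("new_order must be lower than old_order and non-negative")
    folio_count = 1 << (old_order - new_order)
    pages_per_folio = 1 << new_order
    return folio_count, pages_per_folio


if __name__ == "__main__":
    print(split_result(9, 0))  # split all the way down to single pages
    print(split_result(9, 2))  # keep order-2 folios instead
```

Splitting an order-9 folio to order 2 leaves 128 folios of 4 pages each instead of 512 single pages.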

v2: Cover a guard gap corner case

For v2, the notable change is a bug fix to not clobber the MMF_TOPDOWN during fork. In the RFC this resulted in fork() children that didn’t exec getting the map up behavior, which included the stress-ng bigheap test. It turns out much of the 4% improvement seen was due to the bottomup mapping direction. With the fix, the performance benefit was a less surprising

v1: vfio/type1: unpin PageReserved page

We met the following warning:
WARNING: CPU: 99 PID: 1766859 at mm/gup.c:209 try_grab_page.part.0+0xe8/0x1b0
CPU: 99 PID: 1766859 Comm: qemu-kvm Kdump: loaded Tainted: GOE 5.10.134-008.2.x86_64 #1
Hardware name: Foxconn AliServer-Thor-04-12U-v2/Thunder2, BIOS 1.0.PL.FC.P.031.00 05/18/2022

File Systems

v2: qnx6: convert qnx6 to use the new mount api

Convert the qnx6 filesystem to use the new mount API.
Mostly untested, since there is no qnx6 fs image readily available. Testing did include parsing of the mmi_fs option.

GIT PULL: xfs: Code changes for 6.8

Please pull this branch with changes for xfs for 6.8-rc7. The changes are limited to just one patch where we drop experimental warning message when mounting an xfs filesystem on an fsdax device. We now consider xfs on fsdax to be stable.

v1: ext4: Add direct-io atomic write support using fsawu

This RFC series adds support for atomic writes to ext4 direct-io using filesystem atomic write unit. It’s built on top of John’s “block atomic write v5” series which adds RWF_ATOMIC flag interface to pwritev2() and enables atomic write support in underlying device driver and block layer.

v7: tracing: Support to dump instance traces by ftrace_dump_on_oops

Currently ftrace only dumps the global trace buffer on an oops. For debugging a production use case, an instance trace is helpful for checking specific problems, since the global trace buffer may be used for other purposes.

v1: afs: Don’t cache preferred address

In the AFS fileserver rotation algorithm, don’t cache the preferred address for the server as that will override the explicit preference if a non-preferred address responds first.

v1: fanotify: move path permission and security check

In its current state, do_fanotify_mark() does path permission and security checking before doing the event configuration checks. In the case where the user configures mount and sb marks with a kernel-internal pseudo fs, security_path_notify() yields an EACCES and causes an earlier exit. Instead, this particular case should have been handled by fanotify_events_supported() and exited with an EINVAL.

v1: fs_parser: handle parameters that can be empty and don’t have a value

While investigating an ext4/053 fstest failure, I realised that when the flag ‘fs_param_can_be_empty’ is set in a parameter and its value is NULL, that parameter isn’t handled as a ‘flag’ type, even if its type is set to ‘fs_value_is_flag’. The first patch in this series changes this behaviour.

v2: qnx4: convert qnx4 to use the new mount api

Convert the qnx4 filesystem to use the new mount API.
Tested mount, umount, and remount using a qnx4 boot image.

v1: hugetlbfs: support idmapped mounts

Pass down the idmapped mount information to the different helper functions.
In contrast, hugetlb_file_setup() will continue to not have any mapping, since it is only used from contexts where idmapped mounts are not used.

v1: ext4: Do endio process under irq context for DIO overwrites

Recently we found an ext4 performance regression between 4.18 and 5.10 with the following test command on an x86 physical machine with NVMe:
fio -direct=1 -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -size=2G -numjobs=1 -time_based -runtime=60 -group_reporting -filename=/test/test -name=Rand_write_Testing --cpus_allowed=1

v1: buffered write path without inode lock (for bcachefs)

this is going in my for-next branch - it’s tested and I think all the corner cases are handled to my satisfaction (there are some fun ones!)

v1: virtiofs: don’t mark virtio_fs_sysfs_exit as __exit

Calling an __exit function from an __init function is not allowed and will result in undefined behavior when the code is built-in:

v1: fs: use inode_set_ctime_to_ts to set inode ctime to current time

The function inode_set_ctime_current simply retrieves the current time and assigns it to the field __i_ctime without any alterations. Therefore, it is possible to set ctime to now directly using inode_set_ctime_to_ts

v1: xarray: add guard definitions for xa_lock

Add DEFINE_GUARD definitions so that xa_lock can be used with guard() or scoped_guard().

v1: xfs: stop advertising SB_I_VERSION

The redefinition of how NFS wants inode->i_version to be updated is incompatible with the XFS i_version mechanism. The VFS now wants inode->i_version to only change when ctime changes (i.e. it has become a ctime change counter, not an inode change counter). XFS has

v5: block atomic writes

This series introduces a proposal for implementing atomic writes in the kernel for torn-write protection.
This series takes the approach of adding a new “atomic” flag to each of pwritev2() and iocb->ki_flags - RWF_ATOMIC and IOCB_ATOMIC, respectively. When set, these indicate that we want the write issued “atomically”.

v2: fuse: add support for explicit export disabling

open_by_handle_at(2) can fail with -ESTALE with a valid handle returned by a previous name_to_handle_at(2) for evicted fuse inodes, which is especially common when entry_valid_timeout is 0, e.g. when the fuse daemon is in “cache=none” mode.

v1: bcachefs disk accounting rewrite

here it is; the disk accounting rewrite I’ve been talking about since forever.
git link: https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-disk-accounting-rewrite

v1: blk: optimization for classic polling

This removes the dependency on interrupts to wake up the task. Set the task state to TASK_RUNNING if need_resched() returns true while polling for IO completion. Earlier, the polling task used to sleep, relying on an interrupt to wake it up. This made some IO take very long when interrupt coalescing is enabled in NVMe.

Networking

v9: net-next: net: ethernet: Rework EEE

Most MAC drivers get EEE wrong. The API to the PHY is not very obvious, which is probably why. Rework the API, pushing most of the EEE handling into phylib core, leaving the MAC drivers to just enable/disable support for EEE in their change_link callback.

v2: net-next: Add en8811h phy driver and devicetree binding doc

This patch series adds the driver and the devicetree binding documentation for the Airoha en8811h PHY.

v1: net-next: net: constify struct class usage

This is a simple and straightforward cleanup series that aims to make the class structures in net constant. This has been possible since 2023 [1].

v2: net-next: ethtool: ignore unused/unreliable fields in set_eee op

This function is used with the set_eee() ethtool operation. Certain fields of struct ethtool_keee() are relevant only for the get_eee() operation. In addition, in case of the ioctl interface, we have no guarantee that userspace sends sane values in struct ethtool_eee. Therefore explicitly ignore all fields not needed for set_eee(). This protects from drivers trying to use unchecked and unreliable data, relying on specific userspace behavior.

v1: net-next: net/nlmon: Cancel setting filelds of statistics to zero.

Since the fields of rtnl_link_stats64 have already been set to zero in the preceding dev_get_stats function, there is no need to set them again in the ndo_get_stats64 function.

v1: net-next: net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list()

Many syzbot reports show extreme rtnl pressure, and many of them hint that smc acquires rtnl in netns creation for no good reason [1].

v1: net-next: tools: ynl: add --dbg-small-recv for easier kernel testing

When testing netlink dumps I usually hack some user space up to constrain its user space buffer size (iproute2, ethtool or ynl). Netlink will try to fill the messages up, so since these apps use large buffers by default, the dumps are rarely fragmented.

v6: net-next: net: dsa: vsc73xx: Make vsc73xx usable

This patch series focuses on making vsc73xx usable.
The first patch was added in v2; it switches from a poll loop to read_poll_timeout.

v1: net-next: mptcp: userspace pm: ‘dump addrs’ and ‘get addr’

This series from Geliang adds two new Netlink commands to the userspace PM:
one to dump all addresses of a specific MPTCP connection:
feature added in patches 3 to 5
test added in patches 7, 8 and 10

v1: net-next: mptcp: add TCP_NOTSENT_LOWAT sockopt support

Patch 3 does the magic of adding TCP_NOTSENT_LOWAT support, all the other ones are minor cleanup seen along when working on the new feature.

v1: net-next: tools/net/ynl: Add support for nlctrl netlink family

This series adds a new YNL spec for the nlctrl family, plus some fixes and enhancements for ynl.

v2: net-next: net: ipa: simplify device pointer access

This version of this patch series fixes the bugs in the first patch (which were fixed in the second), where ipa_interrupt_config() had two remaining spots that returned a pointer rather than an integer.

v1: net-next: rxrpc: Miscellaneous changes and make use of MSG_SPLICE_PAGES

Here are some changes to AF_RXRPC:
(1) Cache the transmission serial number of ACK and DATA packets in the rxrpc_txbuf struct and log this in the retransmit tracepoint.

v2: iwl-next: XDP Tx Hardware Timestamp for igc driver

Implemented XDP transmit hardware timestamp metadata for igc driver.
This patchset is tested with tools/testing/selftests/bpf/xdp_hw_metadata on Intel ADL-S platform. Below are the test steps and results.

v2: net: netfilter: Add protection for bmp length out of range

UBSAN load reports an exception of BRK#5515 SHIFT_ISSUE: Bitwise shifts that are out of bounds for their data type.
vmlinux get_bitmap(b=75) + 712 <net/netfilter/nf_conntrack_h323_asn1.c:0>
vmlinux decode_seq(bs=0xFFFFFFD008037000, f=0xFFFFFFD008037018, level=134443100) + 1956 <net/netfilter/nf_conntrack_h323_asn1.c:592>

v1: net: hns: Use common error handling code in hns_mac_init()

Add a jump target so that a bit of exception handling can be better reused at the end of this function implementation.

v2: Add minimal XDP support to TI AM65 CPSW Ethernet driver

This patch adds XDP support to TI AM65 CPSW Ethernet driver.
The following features are implemented: NETDEV_XDP_ACT_BASIC, NETDEV_XDP_ACT_REDIRECT, and NETDEV_XDP_ACT_NDO_XMIT.

v4: net: ipv6: fib6_rules: flush route cache when rule is changed

When the rule policy is changed, the ipv6 socket cache is not refreshed. The sock’s skb still uses an outdated route cache and is sent to the wrong interface.

v3: net-next: MT7530 DSA Subdriver Improvements Act III

This is the third patch series with the goal of simplifying the MT7530 DSA subdriver and improving support for MT7530, MT7531, and the switch on the MT7988 SoC.

v1: net-next: ps3_gelic_net: Use napi routines for RX SKB

Convert the PS3 Gelic network driver’s RX SK buffers over to use the napi_alloc_frag_align and napi_build_skb routines.

Security Hardening

v3: string: Convert selftests to KUnit

I realized the string selftests hadn’t been converted to KUnit yet. Do that.

v2: Handle faults in KUnit tests

This patch series teaches KUnit to handle kthread faults as errors, and it brings a few related fixes and improvements.

v5: arm64: qcom: add AIM300 AIoT board support

Add AIM300 AIoT support along with usb, ufs, regulators, serial, PCIe, and PMIC functions. AIM300 Series is a highly optimized family of modules designed to support AIoT applications. It integrates QCS8550 SoC, UFS and PMIC chip etc.

v1: overflow: Allow non-type arg to type_max() and type_min()

A common use of type_max() is to find the max for the type of a variable. Using the pattern type_max(typeof(var)) is needlessly verbose.
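The idea of accepting either a type or a variable can be modeled with a small sketch. IntType and Var are hypothetical stand-ins for a C integer type and a variable of that type; this illustrates the calling convention only, not the kernel macro.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """Stand-in for a C integer type."""
    bits: int
    signed: bool


@dataclass
class Var:
    """Stand-in for a variable that knows its own type."""
    ctype: IntType
    value: int = 0


def _as_type(arg):
    # Accept a variable and fall back to its type, like typeof(var) would.
    return arg.ctype if isinstance(arg, Var) else arg


def type_max(arg):
    t = _as_type(arg)
    return (1 << (t.bits - 1)) - 1 if t.signed else (1 << t.bits) - 1


def type_min(arg):
    t = _as_type(arg)
    return -(1 << (t.bits - 1)) if t.signed else 0


if __name__ == "__main__":
    s16 = IntType(16, True)
    print(type_max(s16), type_max(Var(s16)), type_min(Var(s16)))
```

The caller can pass the variable directly instead of spelling out its type first.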

v1: compiler.h: Explain how __is_constexpr() works

The __is_constexpr() macro is dark magic. Shed some light on it with a comment to explain how and why it works.

v1: netdev: Use flexible array for trailing private bytes

Introduce a new struct net_device_priv that contains struct net_device but also accounts for the commonly trailing bytes through the “size” and “data” members.

v1: Run KUnit tests late and handle faults

This patch series moves KUnit test execution to the very end of kernel initialization, just before launching the init process. This opens the way to test any kernel code in its normal state (i.e. fully initialized).

v2: scsi: replace deprecated strncpy

This series contains multiple replacements of strncpy throughout the scsi subsystem.

v4: iio: core: New macros and making use of them

Added new macros to overflow.h and reused them in IIO. For the sake of examples a few more places were updated (requested by Kees). In case maintainers are okay, tags will be appreciated.

v1: lib: stackinit: Adjust target string to 8 bytes for m68k

For reasons I cannot understand, m68k moves the start of the stack frame for consecutive calls to the same function if the function’s test variable is larger than 8 bytes.

v2: x86, relocs: Ignore relocations in .notes section

When building with CONFIG_XEN_PV=y, .text symbols are emitted into the .notes section so that Xen can find the “startup_xen” entry point.

v1: thermal: core: Move initial num_trips assignment before memcpy()

This panic occurs because trips is counted by num_trips but num_trips is assigned after the call to memcpy(), so the fortify checks think the buffer size is zero because tz was allocated with kzalloc().
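The ordering problem can be reproduced with a toy model: a bounds check that trusts the counter field rejects a copy made before the counter is set. The class and helper names are hypothetical, and the sketch imitates the behavior described above rather than the kernel’s fortify implementation.

```python
class ThermalZoneModel:
    """Toy model of a zero-initialized struct whose array is bounded by num_trips."""

    def __init__(self, capacity):
        self._trips = [None] * capacity
        self.num_trips = 0  # zeroed at allocation, as with kzalloc()

    def checked_copy(self, trips):
        # The check trusts num_trips, not the real capacity of the storage.
        if len(trips) > self.num_trips:
            raise OverflowError(
                "copy of %d trips exceeds counted size %d" % (len(trips), self.num_trips)
            )
        self._trips[: len(trips)] = trips

    def trips(self):
        return self._trips[: self.num_trips]


def try_copy(zone, trips):
    """Return True if the checked copy succeeds, False if the check rejects it."""
    try:
        zone.checked_copy(trips)
        return True
    except OverflowError:
        return False


if __name__ == "__main__":
    zone = ThermalZoneModel(3)
    print(try_copy(zone, [10, 20, 30]))  # counter still zero
    zone.num_trips = 3
    print(try_copy(zone, [10, 20, 30]))  # counter set before the copy
```

Setting the counter before the copy is the whole fix; the storage itself was large enough all along.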

Asynchronous IO

[PATCH] io_uring/net: correct the type of variable

The namelen is of type int. It shouldn’t be made size_t which is unsigned. The signed number is needed for error checking before use.
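Why the signed type matters can be shown with a short model of unsigned conversion. The 64-bit size_t width and the function names are assumptions for illustration.

```python
SIZE_T_BITS = 64  # assumption: 64-bit size_t


def as_size_t(value):
    """Model the conversion of a signed value to an unsigned size_t."""
    return value & ((1 << SIZE_T_BITS) - 1)


def error_check_signed(namelen):
    """With a signed int, a negative length is detected."""
    return namelen < 0


def error_check_unsigned(namelen):
    """After conversion to size_t the same test can never be true."""
    return as_size_t(namelen) < 0


if __name__ == "__main__":
    print(error_check_signed(-22), error_check_unsigned(-22))
    print(as_size_t(-1))
```

A negative length stored in an unsigned variable turns into a very large positive number, so a `< 0` check silently stops working.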

v1: io_uring: get rid of intermediate aux cqe caches

With defer taskrun we store aux cqes into a cache array and then flush into the CQ, and we also maintain the ordering so aux cqes are flushed before request completions.

v10: io_uring: Statistics of the true utilization of sq threads.

Count the running time and actual IO processing time of the sqpoll thread, and output the statistical data to fdinfo.

v2: io_uring/net: improve the usercopy for sendmsg/recvmsg

We’re spending a considerable amount of the sendmsg/recvmsg time just copying in the message header. And for provided buffers, the known single entry iovec.

Rust For Linux

v5: kselftest: Add basic test for probing the rust sample modules

Add new basic kselftest that checks if the available rust sample modules can be added and removed correctly.

v2: Arc methods for linked list

This patchset contains two useful methods for the Arc type. They will be used in my Rust linked list implementation, which Rust Binder uses.

v1: Rewrite the VP9 codec library in Rust

This patch ports the VP9 library written by Andrzej into Rust as a proof-of-concept. This is so that we can evaluate the Rust in V4L2 initiative with source code in hand.

v4: rust: locks: Add get_mut method to Lock

Having a mutable reference guarantees that no other threads have access to the lock, so we can take advantage of that to grant callers access to the protected data without the cost of acquiring and releasing the locks.

v1: rust: add Module::as_ptr

This allows you to get a raw pointer to THIS_MODULE for use in unsafe code. The Rust Binder RFC uses it when defining fops for the binderfs component [1].

BPF

v1: libbpf: Correct debug message in btf__load_vmlinux_btf

In the function btf__load_vmlinux_btf, the debug message incorrectly refers to ‘path’ instead of ‘sysfs_btf_path’.

v1: bpf-next: selftests/bpf: extend uprobe/uretprobe triggering benchmarks

Settle on three “flavors” of uprobe/uretprobe, installed on different kinds of instruction: nop, push, and ret. All three are testing

v1: dwarves: btf_encoder: dynamically allocate the vars array for percpu variables

Use consistent method across allocating function and per-cpu variable representations, based around (re)allocating the arrays based on demand. This avoids issues where the number of per-CPU variables exceeds the hardcoded limit.
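
The "(re)allocate on demand" approach can be sketched as follows. This is a generic illustration and not the dwarves code: `struct var_info`, `struct var_table`, and `var_table_add()` are hypothetical names, and the doubling policy and initial capacity of 16 are assumptions.

```c
#include <stdlib.h>

/* Hypothetical record describing one variable. */
struct var_info {
    unsigned long addr;
    unsigned int size;
};

/* Hypothetical growable table, replacing a fixed-size array. */
struct var_table {
    struct var_info *vars;
    size_t count;     /* entries in use */
    size_t allocated; /* entries the buffer can hold */
};

/* Append one entry, growing the buffer when it is full.
 * Returns 0 on success, -1 if the allocation fails. */
int var_table_add(struct var_table *t, struct var_info v)
{
    if (t->count == t->allocated) {
        size_t new_cap = t->allocated ? t->allocated * 2 : 16;
        struct var_info *tmp = realloc(t->vars, new_cap * sizeof(*tmp));

        if (!tmp)
            return -1; /* the old buffer is still valid */
        t->vars = tmp;
        t->allocated = new_cap;
    }
    t->vars[t->count++] = v;
    return 0;
}
```

A table that starts zero-initialized (`struct var_table t = {0};`) can then take any number of entries, limited only by available memory rather than by a compile-time constant.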

v2: net: raise RCU qs after each threaded NAPI poll

We noticed task RCUs being blocked when threaded NAPIs are very busy at workloads: detaching any BPF tracing programs, i.e. removing a ftrace trampoline, will simply block for very long in rcu_tasks_wait_gp.

v2: net-next: Use per-task storage for XDP-redirects on PREEMPT_RT

In [0] I introduced explicit locking for resources which are otherwise implicitly locked by local_bh_disable(); this protection goes away if the lock in local_bh_disable() is removed on PREEMPT_RT.

v1: tools/testing/selftests/bpf/test_tc_tunnel.sh: Prevent client connect before server bind

On some systems, the netcat server can be slow to start listening. When this happens, the test can randomly fail at various points.

v1: bpf-next: bpftool: Mount bpffs on provided dir instead of parent dir

When pinning programs/objects under PATH (eg: during “bpftool prog loadall”) the bpffs is mounted on the parent dir of PATH in the following situations:
the given dir exists but it is not bpffs.
the given dir doesn’t exist and the parent dir is not bpffs.

v3: vhost: virtio: drivers maintain dma info for premapped vq

As discussed: http://lore.kernel.org/all/CACGkMEvq0No8QGC46U4mGsMtuD44fD_cfLcPaVmJ3rHYqRZxYg@mail.gmail.com

v6: bpf-next: Create shadow types for struct_ops maps in skeletons

This patchset allows skeleton users to change the values of the fields in struct_ops maps at runtime. It will create a shadow type pointer in a skeleton for each struct_ops map, allowing users to access the values of fields through these pointers.

v1: bpf: Choose RCU Tasks based on TASKS_RCU rather than PREEMPTION

The advent of CONFIG_PREEMPT_AUTO, AKA lazy preemption, will mean that even kernels built with CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY might see the occasional preemption, and that this preemption just might happen within a trampoline.

v2: net-next: tun: AF_XDP Tx zero-copy support

Now, some drivers support the zero-copy feature of AF_XDP sockets, which can significantly reduce CPU utilization for XDP programs.

[PATCH RFCv2 bpf-next 0/4] bpf: Introduce kprobe multi wrapper attach

Adds support for attaching both entry and return bpf programs on a single kprobe multi link. The first RFC patchset is in [0].

v2: perf lock contention: Account contending locks too

Currently it accounts for contention using the delta between timestamps in the lock:contention_begin and lock:contention_end tracepoints. This means both events for a lock must be seen during the monitoring period.

v1: bpf-next: Support kCFI + BPF on arm64

On ARM64 with CONFIG_CFI_CLANG, CFI warnings can be triggered by running the bpf selftests. This is because the JIT doesn’t emit proper CFI prologues for BPF programs, callbacks, and struct_ops trampolines.

v12: net-next: Introducing P4TC (series 1)

This is the first patchset of two. In this patchset we are submitting 15 patches, which cover the minimal viable P4 PNA architecture.

Related Technology Updates

Qemu

v1: target/riscv: move ratified/frozen exts to non-experimental

smaia and ssaia were ratified on August 25th 2023 [1].
zvfh and zvfhmin were ratified on August 2nd 2023 [2].

What riscv tracing tools do you recommend and how are they accurate for measurements?

Recently, I was planning to measure the performance of my application of interest for potential RISC-V hardware. I started my simulations with Spike to analyze dynamic instruction traces and instruction counts. However, since it does not support multithreading, I also started using Qemu to test my app.

v4: RISC-V: Modularize common match conditions for trigger

According to the ratified RISC-V Debug specification version 0.13 [1] (this also applies to version 1.0 [2], which has not been ratified yet), the enabled privilege levels of a trigger are a match condition common to all trigger types.

support on risc-v 128bits

Hi, I would like to develop my OS on 128-bit RISC-V. After searching, it appears that the support isn't fully operational.
How can I help and, at the same time, learn about 128-bit RISC-V?
