Last updated: 2026-07-05 05:01 UTC
All documents
| Author(s) | Title | Year | Publication | Keywords | Abstract | DOI |
|---|---|---|---|---|---|---|
| Yahuza Bello, Ahmed Refaey, Ping Yang | Secure Multi-Timescale Orchestration for Zero-Trust Cross-Datacenter Networks | 2026 | Early Access | Authentication Optimization Resource management Modeling Costing Costs Timing Data centers Learning (artificial intelligence) Security Zero trust architecture hierarchical deep reinforcement learning cross-datacenter networks multi-timescale optimization resource management | The widespread deployment of geographically distributed Data Centers (DCs) has intensified the need for scalable and secure access control mechanisms across Cross-Datacenter Networks (CDNs). Zero Trust Architecture (ZTA) addresses this need by enforcing continuous authentication and authorization through Policy Decision Points (PDPs); however, determining where to deploy PDPs and how to dynamically assign authentication requests in the CDNs remains a challenging and NP-hard problem. This challenge arises from the tight coupling between long-term placement decisions and short-term, stochastic authentication workloads. In this paper, we formulate a joint PDP placement and authentication assignment problem for zero-trust-enabled CDNs that minimizes deployment cost, authentication assignment cost, bandwidth consumption, and the number of active PDP instances under resource constraints. To efficiently solve the problem, we propose a Hybrid Hierarchical Deep Reinforcement Learning (HHDRL) framework that decomposes decision-making across multiple time scales. A high-level Double Deep Q-Network (DDQN) agent learns long-term PDP placement policies, while multiple low-level Asynchronous Advantage Actor–Critic (A3C) agents perform real-time authentication assignment within each DC. Extensive simulations demonstrate that the proposed DDQN–A3C framework converges reliably and consistently outperforms benchmark schemes, including DDQN–A2C, a single-agent DDQN approach, and a greedy baseline, achieving lower overall system cost and improved scalability with modest computational overhead. | 10.1109/TNSM.2026.3707392 |
| Madhura Adeppady, Yenchia Yu, Ali Rahmanian, Ahmed Ali-Eldin Hassan, Carla Fabiana Chiasserini | Efficient Management of Composite Heterogeneous Applications at the Network Edge | 2026 | Early Access | Central Processing Unit Servers Resource management Costing Costs Modeling Joining processes Timing Memory Measurement Mobile edge computing Stateless and stateful microservices Application deployment and migration Service management | Edge computing is a promising paradigm for deploying latency-sensitive applications (Apps) as it brings resources closer to end users. Edge Apps often adopt a microservice (MS) architecture, breaking monolithic Apps into lightweight, containerized MSs that can be dynamically and independently deployed. However, managing such Apps involves three key challenges: (i) optimizing the placement of MSs to reduce both response time and resource overhead, (ii) handling MS migration or relocation as users move while minimizing App service disruption (App downtime), and (iii) enabling MS sharing across Apps while ensuring performance guarantees. We formulate this as an optimization problem, named Multi-microservice Application Placement (MAP), prove its NP-hardness, and introduce STEP (State and Topology-aware Edge-MS Placement), a polynomial-time heuristic. STEP distinguishes itself from prior work by: (i) jointly considering stateful and stateless MS characteristics in deployment decisions, (ii) exploiting MS shareability to reduce resource usage, (iii) balancing response latency, App downtime, and resource utilization, and (iv) leveraging multiple versions of the same MS to adapt quality of service to available edge resources. Our results in a small-scale scenario show that STEP achieves near-optimal performance with only 7% higher CPU cost than the optimal solution. Large-scale real-time experiments on a Kubernetes cluster demonstrate that STEP consistently outperforms competing methods, achieving up to 50% lower deployment costs while delivering 50% gain in app quality and saving 15% in radio resources with over 90% request success rates. | 10.1109/TNSM.2026.3709656 |
| Shi-Xin Huang, Te-Chuan Chiu, Jing-Chih Lin, Cheng-Hsuan Kuo | EdgeCookie: A Mitigation Solution Against Threatening TCP DDoS Attack in Edge Cloud | 2026 | Early Access | Servers Switches TCP Floods Filtering Filters Architecture Computer architecture Security Kernel SYN Flood DRDoS Edge Computing Security | With the explosive growth of GenAI service requirements, the demand for digital infrastructure and cloud resources continues to increase. At the same time, distributed denial-of-service (DDoS) attacks – particularly TCP-based vectors such as SYN flood and emerging TCP distributed reflective denial-of-service (DRDoS) – have surged, posing a significant threat to service availability. Current mitigation strategies often fall short in effectively countering both attack types. Although the proliferation of edge computing offers opportunities to deploy mitigation closer to attack sources, it also introduces synchronization challenges across distributed edge servers. In this paper, we propose EdgeCookie, an edge-centric TCP flood attack mitigation architecture. EdgeCookie can mitigate TCP SYN floods, ACK floods, and emerging TCP reflection amplification attacks. Unlike existing switch-based defenses, EdgeCookie requires no specific hardware, making it suitable for running in resource-limited edge clouds. In the core mechanism, we introduce a novel HybridCookie that effectively solves synchronization challenges across distributed edge servers. Experimental results demonstrate that EdgeCookie can mitigate both TCP SYN flood and emerging TCP reflection amplification attacks without facing false positive issues, while maintaining high throughput and adding negligible latency to legitimate traffic. | 10.1109/TNSM.2026.3706627 |
| Soonbeom Kwon, Yusu Noh, Youngwoo Jang, Illyoung Choi, Byungchul Tak, In-geol Chun, Young-Kyoon Suh | Scalable and Robust Resource Provisioning via Adaptive Task Scheduling for Edge Devices | 2026 | Early Access | Schedules Scheduling Cloning Timing Educational institutions Computers Transcoding Videos Tail Edge computing Edge devices Edge server Resource augmentation Task distribution Kubernetes | Edge devices, such as wearables, drones, and CCTV systems, are vital for real-time data collection in urban intelligence. However, their limited computational and storage capacities pose significant challenges. While offloading to public clouds offers scalability, it often incurs high latency and operational costs. Conversely, centralizing workloads on edge servers may result in the underutilization of high-performance edge devices. To address these limitations, we introduce ERPF, a Kubernetes-based Edge Resource Provisioning Framework that augments the capabilities of heterogeneous edge environments. ERPF orchestrates dynamic volume provisioning, GPU-aware resource allocation, execution context migration, and adaptive task distribution to improve system flexibility and efficiency. Building on this, we propose a novel adaptive task scheduling technique, termed eATS, composed of three key mechanisms: (i) Partition Smoothing Scheme for stable task granularity control, (ii) Resilient Edge Reintegration for failure detection and task reassignment, and (iii) Competitive Task Cloning for speculative execution with fastest-result commitment. The proposed eATS scheme reduces task execution time by up to 27.6%, lowers partition size variability by 8.7×, and improves scheduling robustness across heterogeneous edge devices over the baseline. | 10.1109/TNSM.2026.3694238 |
| Deemah H. Tashman, Soumaya Cherkaoui | Trustworthy AI-Driven Dynamic Hybrid RIS: Joint Optimization and Reward Poisoning-Resilient Control in Cognitive MISO Networks | 2026 | Early Access | Reconfigurable intelligent surfaces Reliability Optimization Security MISO Array signal processing Vectors Satellites Reflection Interference Beamforming cascaded channels cognitive radio networks deep reinforcement learning dynamic hybrid reconfigurable intelligent surfaces energy harvesting poisoning attacks | Cognitive radio networks (CRNs) are a key mechanism for alleviating spectrum scarcity by enabling secondary users (SUs) to opportunistically access licensed frequency bands without harmful interference to primary users (PUs). To address unreliable direct SU links and energy constraints common in next-generation wireless networks, this work introduces an adaptive, energy-aware hybrid reconfigurable intelligent surface (RIS) for underlay multiple-input single-output (MISO) CRNs. Distinct from prior approaches relying on static RIS architectures, our proposed RIS dynamically alternates between passive and active operation modes in real time according to harvested energy availability. We also model our scenario under practical hardware impairments and cascaded fading channels. We formulate and solve a joint transmit beamforming and RIS phase optimization problem via the soft actor-critic (SAC) deep reinforcement learning (DRL) method, leveraging its robustness in continuous and highly dynamic environments. Notably, we conduct the first systematic study of reward poisoning attacks on DRL agents in RIS-enhanced CRNs, and propose a lightweight, real-time defense based on reward clipping and statistical anomaly filtering. Numerical results demonstrate that the SAC-based approach consistently outperforms established DRL baselines, and that the dynamic hybrid RIS strikes a superior trade-off between throughput and energy consumption compared to fully passive and fully active alternatives. We further show the effectiveness of our defense in maintaining SU performance even under adversarial conditions. Our results advance the practical and secure deployment of RIS-assisted CRNs, and highlight crucial design insights for energy-constrained wireless systems. | 10.1109/TNSM.2026.3660728 |
| Jeffrey Redondo, Nauman Aslam, Juan Zhang, Zhenhui Yuan | Optimising QoS in HD Map Updates: Cross-Layer Multi-Agent with Multi-task and Mixed-Dependence (MTMD) | 2026 | Early Access | Optimization Timing High definition video Quality of service Media Access Control Information rates Throughput Vehicles Modeling Videos Edge computing HD map hierarchical learning latency multi-agent offloading reinforcement learning | High-definition (HD) maps generated from autonomous vehicle (AV) sensor data are essential for enabling high levels of driving automation. However, offloading large volumes of raw sensory data to edge servers in dense vehicular ad hoc networks (VANETs) introduces significant latency due to network congestion and packet collisions. Existing solutions primarily focus on dynamically adjusting the minimum contention window (CWmin), while additional MAC-layer parameters — including the maximum contention window (CWmax) and interframe space number (IFSn) — remain largely underexplored. To address this, we propose a cross-layer multi-agent reinforcement learning (MARL) framework that jointly optimises CWmin–CWmax, IFSn, and transmission waiting time within IEEE 802.11p-compliant bounds. The proposed multi-task mixed-dependence (MTMD) framework decomposes the optimisation problem into specialised subtasks handled by selectively coupled agents, balancing coordination and scalability while avoiding the overhead of fully symmetric MARL or centralised hierarchical controllers. A lightweight orchestration layer coordinates agent interaction with the simulation environment via secure message exchange. Evaluated against standard EDCA and representative RL baselines, MTMD achieves latency reductions of 31%, 49%, 87.3%, and 64% for Voice, Video, HD Map, and Best-Effort traffic, respectively, confirming the effectiveness of structured multi-parameter optimisation for latency-critical vehicular applications. | 10.1109/TNSM.2026.3705270 |
| Ibirisol Fontes Ferreira, Eiji Oki | Forestall: A Prefetching Scheme for Domain Name System Resolver Cache Services | 2026 | Early Access | Prefetching Timing Servers Modeling Management Measurement Recording Ecosystems Tracking TV Domain name systems service architecture caching time-to-live renewal policy prefetching | The domain name system (DNS) is crucial to accessing Internet services by playing an essential role in facilitating this process for Internet users. Still, it affects the quality of experience within the Internet service chain. This impact includes the role of the resolver component, which can negatively influence the final user experience when consuming services. Some studies have developed strategies to reduce resolution time within the DNS resolver ecosystem by incorporating components into users’ devices to trigger resolution in advance, changing DNS service and cache algorithm implementation, or utilizing a complex and expensive service architecture that is not scalable for local DNS resolvers in edge deployments. This paper proposes a dynamic prefetching scheme called Forestall to reduce misses, including those caused by expired domain translation data, and to improve the overall performance of the resolver cache component. We model the prefetching scheme for DNS resolvers using DNS transactional information. We define a prefetching advising routine that advises on possible domains by observing past request patterns. We introduce two prefetching routines for efficient domain tracking and advising. We introduce miss-based metrics to measure the efficiency of the prefetching scheme and the potential resource trade-off associated with its deployment. The numerical results indicate that the prefetching scheme improves the performance of the DNS resolver cache component compared to well-deployed prefetching solutions on the Internet. Forestall reduces the miss ratio by more than 50%, depending on the dataset. In a specific workload, Forestall’s results with adjusted parameter combinations yield a decrease in the miss ratio of more than 16%, accompanied by a reasonable increase in additional fetches of around 35%. In terms of service latency that users perceive, Forestall achieves a reduction varying between 20% and 49%. | 10.1109/TNSM.2026.3704549 |
| Kunpeng Zheng, Huibin Zhang, Yongli Zhao, Yuan Cao, Wei Wang, Xin Li, Zhuangzhuang Ma, Lihan Zhao, Jie Zhang | Sun-Outage-Aware Topology Modeling and Adaptive Routing for Optical Satellite Networks | 2026 | Early Access | Sun Interrupters Joining processes Satellites Routing Algorithms Modeling Timing Topology Interference Optical inter-satellite links optical service connections optical satellite network sun outage topology modeling | Optical satellite networks, supported by optical inter-satellite links (OISLs), provide reliable and low-latency optical connectivity. However, periodic and predictable sun outage events significantly compromise OISL availability, leading to frequent OISL interruptions and reduced network reliability. Existing routing algorithms often overlook the regularity of sun outage-induced interrupts and their differentiated impacts on services, resulting in degraded service performance. To address this challenge, this paper proposes a sun outage-enhanced time discretization OISL model and introduces a sun outage link-aware routing (SOLR) algorithm. By incorporating joint awareness of sun outage patterns and service requirements, SOLR employs an adaptive optimization mechanism to dynamically adjust routing decisions within temporal windows. Experimental results demonstrate that SOLR extends stable path durations by 39.9%, reduces interruption rates by 28.5%, and decreases blocking rates by 36.4%, significantly outperforming link-state-based routing algorithms. By effectively mitigating the impact of sun outages, SOLR ensures continuous optical service connections. This interruption-tolerant framework bridges network modeling and service provisioning, offering a robust solution for mission-critical service in optical satellite networks. | 10.1109/TNSM.2026.3697856 |
| Masoumeh Safkhani, Mohammad Reza Servati, Fatemeh Rezaei | HEIoT: A Novel Three-Factor Authentication Protocol for Enhanced Security in IoT and Next-Generation Networks | 2026 | Early Access | Authentication Internet of Things Protocols Security Smart devices Elliptic curve cryptography Modeling Error correction codes Biometrics Costing of Yuan et al.’s Protocol Authentication Multi-factor authentication Desynchronization attack Insider adversary Traceability attack User impersonation attack Elliptic Curve Cryptography (ECC) | The Internet has a significant impact on contemporary society, enabling a wide range of applications, including advanced cellular networks such as 4G, 5G, and 6G. Since these communications occur over shared or open channels, ensuring secure data exchange is of critical importance, as any weakness in the communication infrastructure may compromise system reliability. Device authentication in the Internet of Things (IoT) and user authentication in smart environments, such as smart homes, remain fundamental security challenges. As the first line of defense, authentication mechanisms must be robust, since vulnerabilities at this stage can expose the entire system to serious threats. To address these challenges, numerous authentication schemes based on cryptographic primitives, including Elliptic Curve Cryptography (ECC), have been proposed. In this paper, we present a comprehensive security analysis of an ECC-based three-factor authentication protocol proposed by Yuan et al. Our analysis shows that the protocol is vulnerable to desynchronization, user impersonation, traceability, and insider attacks, all of which succeed with probability 1 by exploiting at most two protocol phases. To mitigate these weaknesses, we propose an improved authentication scheme, called HEIoT. The proposed scheme is formally analyzed under the Real-or-Random (RoR) model to establish session-key security and is further verified using the Scyther tool. Moreover, a Python-based implementation is provided to demonstrate the practicality of the proposed protocol. Comparative results indicate that HEIoT achieves stronger security while maintaining acceptable communication, computational, and storage overhead. | 10.1109/TNSM.2026.3702041 |
| Guangxia Xu, Zhuo Ye, Lu Wang, Xing Huang, Lei Liu, Shahid Mumtaz, Mohsen Guizani | GNN-OSS: A Capacity-Feasible Graph Learning Framework for Secure Blockchain Sharding in IIoT | 2026 | Early Access | Graph neural networks Sharding Information rates Throughput Industrial Internet of Things Learning (artificial intelligence) Robustness Topology Modeling Security Industrial Internet of Things (IIoT) Blockchain Sharding Network Resource Management Graph Neural Networks (GNNs) Trust Management | Effective scaling of blockchain-enabled Industrial Internet of Things (IIoT) requires sharding that simultaneously ensures transaction locality, strict committee-size feasibility, and robustness against malicious node concentration. Existing methods often fail to balance this trilemma, risking either infeasible deployments or increased shard-takeover vulnerabilities. To address this, we propose GNN-OSS, a deployable sharding framework that decouples topology-aware preference learning from hard constraint enforcement. It first employs a trust-repulsion graph neural network to learn locality-aware preferences while discouraging low-trust nodes from collapsing into the same representation region. A Post-Hoc Capacity-Constrained Projection (PH-CCP) then maps these soft preferences into strictly feasible shard assignments. Finally, an entropy-driven Overlapping Sparse Scheme (OSS) selectively replicates boundary nodes to reduce residual cross-shard overhead without altering primary consensus membership. Evaluations demonstrate that, under the evaluated settings, GNN-OSS achieves a favorable performance–security trade-off. Against 20% malicious nodes, it substantially mitigates shard-takeover risks. Furthermore, it improves throughput by up to 33% over strictly feasible baselines and lowers the cross-shard ratio from 6.4% to 4.4% with minimal per-epoch overhead. Overall, GNN-OSS provides a practical sharding framework for open or hybrid blockchain-enabled IIoT environments. | 10.1109/TNSM.2026.3709024 |
| Daishi Kondo, Yuya Shibuya, Rie S. Yamaguchi, Tomohiro Ishihara, Yuji Sekiya, Toshiyuki Nakata, Tohru Asami | Assessing the Adoption of Email Security Measures After Google’s New Sender Guidelines | 2026 | Early Access | Electronic mail Security Modeling Internet Search engines Companies Guidelines Recording Educational institutions Business DKIM DMARC Email authentication Internet measurements Security protocol adoption SPF | The email sender guidelines introduced by Google on October 3, 2023, mandate authentication protocols like Sender Policy Framework (SPF), DomainKeys Identified Mail (DKIM), and Domain-based Message Authentication, Reporting, and Conformance (DMARC) to enhance email security. However, how such platform-driven policies can effectively promote the adoption of security measures across the global email ecosystem remains unclear. In this measurement study, we analyze the impact of these guidelines by examining the adoption of email security measures across globally popular domains and country-specific subsets. Our results show that the adoption of SPF, DKIM, and DMARC has not yet achieved widespread uptake and exhibits significant regional disparities. In particular, domains associated with China, South Korea, and Japan exhibit consistently low adoption rates. While low adoption in China and South Korea can be partially explained by Gmail’s limited influence in these countries, Japan presents a striking contradiction, with low adoption persisting despite Google’s dominance. Focusing on Japanese-stock market-listed companies, we observe a significant increase in DMARC adoption following the introduction of the guidelines; however, a substantial proportion of entities remain non-compliant. These findings suggest that platform-driven policies alone are insufficient to achieve widespread security adoption and highlight the need for broader, ecosystem-level, multi-stakeholder initiatives. | 10.1109/TNSM.2026.3707567 |
| Emilio Paolini, Andrea Pinto, Luca Valcarenghi, Flavio Esposito | Programmable In-Network Aggregation for Communication-Aware Federated Learning in 5G RANs | 2026 | Early Access | Modeling Timing Training Federated learning Accuracy 5G mobile communication Convergence Aggregates Labeling Point cloud compression Federated Learning Mobile Networks Wireless In-Network Aggregation Grouping | Federated Learning (FL) enables collaborative model training without sharing raw data, making it attractive for privacy-preserving applications at the wireless edge. However, when executed over real 5G networks, FL performance degrades due to uplink congestion, heterogeneous client capabilities, and intermittent connectivity. Most existing approaches attempt to mitigate these issues indirectly by optimizing clients (through adaptive participation, local training, or selection strategies) or by optimizing models (via pruning, quantization, or compression), but they ignore potential network bottlenecks. This paper introduces FLAG, an FL architecture that embeds in-network aggregation directly into 5G gNodeBs, transforming the network into an active participant in the learning process. In particular, FLAG performs parameter aggregation at line rate within the 5G Service Data Adaptation Protocol layer and incorporates three mechanisms: Partial-Contribution Correction for loss-tolerant averaging, a timer-driven pipeline for real-time scheduling, and a deadline-based grouping strategy to mitigate stragglers. Experiments with realistic wireless emulation show that FLAG achieves up to 5.1× faster time-to-accuracy and maintains accuracy within 0.8% of a loss-free baseline, while reducing gNB-to-server bandwidth by aggregating per-gNB rather than per-client. FLAG requires no modifications to clients or the parameter server, demonstrating how 5G-aware system design can make federated learning scalable, efficient, and resilient under real-world wireless conditions. | 10.1109/TNSM.2026.3697723 |
| Gergely Dobreff, Nóra Szlovencsák, Alija Pašić | A Framework for Disaster-Tolerant Slice Placement in Future Networks | 2026 | Early Access | Costing Costs Codes Routing Modeling Joining processes Bandwidth Encoding Network slicing Delays network slicing resiliency placement resource allocation service function chaining (SFC) ILP heuristic | Autonomous vehicles and telesurgery are placing increasing pressure on network operators to ensure that 5G and beyond networks can support a wide range of services with diverse and stringent requirements. Technologies such as Software-Defined Networking (SDN), Network Function Virtualization (NFV), and network slicing are key enablers for building an ecosystem capable of meeting these demanding conditions. Ensuring not only classical Quality of Service (QoS) metrics but also network resiliency is crucial, as failures in shared infrastructures can severely impact critical services. This paper addresses the problem of resilient network slice placement under arbitrary disasters or attacks, modeled as Shared Risk Link Group (SRLG) failure patterns. We propose an approach that guarantees strict end-to-end delay, bandwidth, and computing requirements while minimizing overall resource usage by accounting for potential failure scenarios. To this end, we introduce a Disaster-Tolerant Slice Placement Framework that enables network operators to define their own resilience scenarios and optimize the network accordingly. Several - routing and network coding–based - strategies are proposed and analyzed. We formulate the problem as an Integer Linear Program (ILP), analyze its computational complexity, and develop efficient heuristic algorithms to obtain near-optimal solutions. Extensive simulations demonstrate the effectiveness of the proposed methods in achieving resource-efficient and resilient network slice placement. The results show that high levels of resiliency can be achieved without excessive over-provisioning, positioning the proposed framework as an effective offline planning and benchmarking tool for 5G and beyond network design. | 10.1109/TNSM.2026.3706661 |
| Juan Zhang, Yangjun Ma, Xunzheng Zhang, Zhao Huang, Qiuji Yi, Nauman Aslam | Multi-objective SFC Placement with Future Demand Awareness in Dynamic Cross-Domain Networks | 2026 | Early Access | Modeling Optimization Transformers Topology Resource management Tin Modules (abstract algebra) Availability Service function chaining Scalability Service function chaining cross-domain networks multi-objective optimization resource allocation predictive modeling | Efficient service function chain (SFC) placement is critical for optimizing network service delivery in dynamic cross-domain networks (CDNs), especially under resource-constrained and heterogeneous environments. However, existing approaches face fundamental limitations in achieving effective multi-objective optimization, particularly in balancing latency minimization with efficient resource utilization. These challenges are further compounded by the inability to capture future resource dynamics and limited visibility across multiple domains. To address these challenges, we propose a novel multi-objective framework for SFC placement that jointly considers latency and resource utilization. The framework integrates Transformer-based prediction with linear programming (LP) to explicitly model future deployability, enabling proactive and globally informed placement decisions. In addition, a dynamic modeling mechanism is developed using domain-aware detection and graph autoencoders (GAEs) to capture evolving network topologies and cross-domain structural dependencies. A Pareto-based optimization strategy is further employed to systematically balance latency and resource efficiency across heterogeneous domains and varying workload conditions. Extensive experiments across multiple network scales and diverse SFC configurations demonstrate that the proposed framework achieves a superior trade-off between latency and deployment capability, while improving scalability, robustness, and long-term resource efficiency in dynamic and large-scale CDN environments. | 10.1109/TNSM.2026.3708714 |
| Yiyang Li, Wei Wang, Yibo Wang, Qiaojun Hu, Weiliang Zhang, Yongli Zhao, Xiaoyu Wang, Jie Zhang | Computing-State Driven Proactive Congestion Control for AI Cluster Interconnect Networks | 2026 | Early Access | Timing Modeling Fluid flow Information rates Throughput Switches Training Data centers Conferences Joining processes large language model remote direct memory access congestion control algorithms distributed training | The rapid upgrade of computing power and the prosperity of large language model (LLM) in data center networks (DCNs) lead to a rigorous demand for ultra-low latency and high throughput. To mitigate the overhead of collective communication during distributed training (DT), Remote Direct Memory Access (RDMA) has been widely adopted in DCNs. Particularly, congestion control algorithms (CCAs) designed for RDMA have attracted much attention to mitigate performance deterioration under network congestion. However, through comprehensive analysis, we investigate that, due to sluggish end-to-end reaction and slow rate convergence, existing widely used reactive CCAs have several limitations in handling bursty traffic (e.g., AllReduce). Specifically, excessive packets are transmitted before senders activate the reaction and converge to the fair rate, which builds up a deep queue and may incur subsequent significant throughput loss. In this paper, we propose a computing-state driven proactive congestion control (CSPCC) with easy deployability. CSPCC consists of the congestion prediction module and the active congestion response module. It leverages current computing state to predict network congestion time and inform corresponding sources in advance. We provide a detailed introduction to the implementation of CSPCC. Then, we conducted small-scale hardware tests and large-scale simulations to evaluate the performance of CSPCC. On our testbed, under NCCL-TESTs, CSPCC improves throughput by 1.67%–13.35% and decreases switch queue occupancy by 28.33%–58.33% compared to DCQCN. Furthermore, under concurrent multi-job LLaMA training, it reduces end-to-end job completion time (JCT) by 5.3%–9.0%. | 10.1109/TNSM.2026.3705429 |
| Ashiqur Rahaman Ridoy, Arnab Kumar Biswas | Adaptive Intrusion Detection Systems: Leveraging Meta-Learning for Improved Cybersecurity | 2026 | Early Access | Modeling Fluid flow Labeling Accuracy Metalearning Learning (artificial intelligence) Training Timing Machine learning Optimization Intrusion Detection Systems Low-Shot Learning Anomaly Detection Network Security Metric-Based Adaptation | In the evolving landscape of cybersecurity, the integration of machine learning (ML) into Intrusion Detection Systems (IDS) has become critical for detecting both known and unknown attacks. This paper proposes a novel multi-stage hybrid IDS framework combining unsupervised anomaly detection, supervised classification, and low-shot adaptation for enhanced resilience to concept drift. The architecture comprises three interconnected stages: Stage 1 (unsupervised anomaly gating) and Stage 2 (supervised taxonomy learning) operate in parallel on a shared harmonized feature space; Stage 3 (Hybrid Low-Shot Adapter (H-LSA)) performs low-shot adaptation when the Stage 1 trigger fires, using transferred Stage 2 weights and a prototype-based cosine-kNN jury. Within the meta-learning family, we instantiate a metric-based low-shot adaptation approach eschewing second-order Model-Agnostic Meta-Learning (MAML) in favor of a partial-freeze, first-order protocol with a prototype-based cosine-kNN jury to enable rapid, low-resource adaptation. Extensive experiments were conducted on the CICIDS2017 (Source), CSE-CIC-IDS2018 (Target), and the modern BCCC-cPacket-Cloud-DDoS-2024 (Target) datasets (hereafter referred to as BCCC-2024). The results demonstrate that while static Stage 2 models suffer catastrophic failure under concept drift (dropping to 45.36% and 38.32% accuracy on CICIDS2018 and harmonized BCCC-2024, respectively), the proposed framework successfully adapts to new environments, achieving 90.64% accuracy on CICIDS2018 (Macro-F1: 0.8981) and 89.70% on BCCC-2024 (Macro-F1: 0.8801) with a low-resource support set of only 500 labeled samples per class. Furthermore, the system exhibits high computational efficiency, achieving a Stage 3 adapted inference latency between 0.0786 ms and 0.1667 ms per flow across diverse traffic profiles, proving its suitability for real-time, scalable deployment in modern cloud and edge network infrastructures. | 10.1109/TNSM.2026.3706597 |
| Behrooz Farkiani, Fan Liu, Ke Yang, John DeHart, Jyoti Parwatikar, Patrick Crowley | Hermes: A General-Purpose Proxy-Enabled Networking Architecture | 2026 | Early Access | Tunneling HTTP Joining processes Planing IP networks Internet TCP Architecture Computer architecture Servers Overlay Networking Proxy HTTP Architecture Tunneling Service Delivery MASQUE NDN Envoy | We introduce Hermes, a general-purpose networking architecture that aims to improve service delivery over the Internet. Hermes delegates networking responsibilities from applications and services to proxies and is designed as a portable, adaptable solution to four fundamental challenges of efficient service delivery over the Internet: end-to-end traffic management, backward compatibility, data-plane security and privacy models, and adaptable communication layers. The design centers on an overlay of reconfigurable proxies and HTTP tunneling and proxying techniques, utilizing assisting components to extend proxy functionality when needed. Through prototyping and emulation, we demonstrate that Hermes improves key performance metrics across multiple use cases: it provides backward compatibility through protocol translation and tunneling, improves reliability by delegating retry logic to proxies, enables unified policy-based Layer 3 routing across network segments, and serves as an efficient substrate for future architectures like NDN, facilitating their operation over the Internet. Beyond evaluating Hermes across various use cases, we measured the overhead of Hermes’ HTTP tunneling and proxying mechanisms and found it to be modest, typically under 2 ms per proxy pair traversal in an isolated collocated setup. Although the HTTP proxying and tunneling techniques used by Hermes increase single-connection processing overhead, we also show that, with up to 1,000 concurrent requests, proxies can amortize connection setup time and reduce end-to-end latency by utilizing connection pooling and multiplexing. | 10.1109/TNSM.2026.3705327 |
| Jing Zhang, Chao Luo, Rui Shao | MTG-GAN: A Masked Temporal Graph Generative Adversarial Network for Cross-Domain System Log Anomaly Detection | 2026 | Early Access | Anomaly detection Adaptation models Generative adversarial networks Feature extraction Data models Load modeling Accuracy Robustness Contrastive learning Chaos Log Anomaly Detection Generative Adversarial Networks (GANs) Temporal Data Analysis | Anomaly detection of system logs is crucial for the service management of large-scale information systems. Nowadays, log anomaly detection faces two main challenges: 1) capturing evolving temporal dependencies between log events to adaptively tackle emerging anomaly patterns, and 2) maintaining high detection capabilities across various data distributions. Existing methods rely heavily on domain-specific data features, making it challenging to handle the heterogeneity and temporal dynamics of log data. This limitation restricts the deployment of anomaly detection systems in practical environments. In this article, a novel framework, Masked Temporal Graph Generative Adversarial Network (MTG-GAN), is proposed for both conventional and cross-domain log anomaly detection. The model enhances the detection capability for emerging abnormal patterns in system log data by introducing an adaptive masking mechanism that combines generative adversarial networks with graph contrastive learning. Additionally, MTG-GAN reduces dependency on specific data distributions and improves model generalization by using diffused graph adjacency information derived from the temporal relevance of event sequences, which helps improve cross-domain detection performance. Experimental results demonstrate that MTG-GAN outperforms existing methods on multiple real-world datasets in both conventional and cross-domain log anomaly detection. | 10.1109/TNSM.2026.3654642 |
| Huijuan Zhu, Chenhao Zheng, Zhongyuan Liu, Yuan Zhang | Reliable Interpretations of Deep Learning-based Malware Detectors via Deep Q-Networks | 2026 | Early Access | Malware Signal detection Modeling Application programming interfaces Operating systems Androids Training Detectors Probability Conferences Android Malware detection Interpretation Deep Q-Networks | Deep learning has become widely used in Android malware detection, but its black-box nature raises trust concerns, limiting its use in critical security areas. To address this, various interpretation methods have been proposed. Unfortunately, these solutions often suffer from inconsistent results and poor adaptability to model updates. In this work, we propose XDQNMal, a Deep Q-Networks (DQN)-based global interpretation framework designed to uncover the critical features that drive decisions in deep learning-based malware detectors. To enhance the reliability of interpretation, XDQNMal captures API call frequency features derived from the runtime behavior of each application (App). Then, it unites a DQN model with the TabPFN detection model to work collaboratively, using variations in detection results as reward signals. These signals guide the DQN model to gradually identify the most impactful features as interpretations for the detection model’s decisions. Our experimental evaluation on real-world datasets demonstrates that the proposed XDQNMal framework generates reliable interpretation for deep learning-based malware detection models. For instance, suppressing the critical features identified by XDQNMal leads to an average decrease of 20.30% in the probability that the malicious sample is predicted as malicious, highlighting the pivotal role these features play in the model’s decision-making. | 10.1109/TNSM.2026.3699408 |
| Jiayi Liu, Jinshuo Wang, Yizhi Huang, Chen Wang | LLM Deployment Strategies on Mobile Edge Servers for Dynamic Uncertain User Requests | 2026 | Early Access | Modeling Large language models Internet of Things Timing Training Algorithms Costing Costs Delays Optimization LLM Agent MEC IoT LLM deployment Task offloading | By leveraging the task planning and solving capabilities of pretrained Large Language Models (LLMs), deploying LLM agents on Mobile Edge Computing (MEC) edge servers brings significant benefits to an Internet of Things (IoT) network, providing enhanced AI intelligence with acceptable delay. In this work, we consider the edge LLM deployment strategy in an end-edge-cloud LLM agent system for IoT services, which jointly determines the locations and number of LLM initializations and the user request offloading strategy in a dynamic network environment with stochastic user requests. We formulate this joint LLM Deployment and inference Tasks Offloading (LLMDTO) problem. Specifically, we design an LLM service performance evaluation mechanism that measures processing delay under stochastic user request arrivals using Stochastic Network Calculus (SNC). Due to the complexity of the LLMDTO problem, we decompose this joint optimization problem into two subproblems and propose an algorithm based on a Multi Agent Deep Reinforcement Learning (MADRL) scheme. To accelerate the training process of the DRL, a reward model is designed by applying Kolmogorov Arnold Networks (KAN) to return a fast reward estimation. Finally, we validate the proposed algorithm through extensive simulations, and the results show its effectiveness in lowering deployment cost and delay in a dynamic network environment. | 10.1109/TNSM.2026.3708677 |