Last updated: 2024-12-10 04:01 UTC
Number of pages: 131
Author(s) | Title | Year | Publication | Keywords | Abstract | DOI |
---|---|---|---|---|---|---|
Jin Ye, Tiantian Yu, Zhaoyi Li, Jiawei Huang | SAR: Receiver-driven Transport Protocol with Micro-burst Prediction in Data Center Networks | 2024 | Early Access | Bandwidth Transport protocols Switches Data centers Receivers Throughput Queueing analysis Data center receiver-driven mix-flows transport protocol | In recent years, motivated by new datacenter applications and the well-known shortcomings of TCP in data centers, many receiver-driven transport protocols have been proposed to provide ultra-low latency and zero packet loss by using proactive congestion control. However, in scenarios with mixed short and long flows, the short flows with an ON/OFF pattern generate micro-burst traffic, which significantly deteriorates the performance of existing receiver-driven transport protocols. First, when the short flows switch to ON mode, the long flows cannot immediately concede bandwidth to the short ones, resulting in queue buildup and even packet loss. Second, when the short flows change from ON to OFF mode, the released bandwidth cannot be fully utilized by the long flows, leading to serious bandwidth waste. To address these issues, we propose a new receiver-driven transport protocol, called SAR, which predicts the micro-bursts generated by short flows and adjusts the sending rate of long flows accordingly. With the aid of the micro-burst prediction mechanism, SAR mitigates the bandwidth competition caused by the arrival of short flows and alleviates the bandwidth waste when the short flows leave. Testbed and NS2 simulation experiments demonstrate that SAR reduces the average flow completion time (AFCT) by up to 66% compared to typical receiver-driven transport protocols. (An illustrative sketch follows the table.) | 10.1109/TNSM.2024.3450597 |
Ankur Mudgal, Abhishek Verma, Munesh Singh, Kshira Sagar Sahoo, Erik Elmroth, Monowar Bhuyan | FloRa: Flow Table Low-Rate Overflow Reconnaissance and Detection in SDN | 2024 | Early Access | Flora Denial-of-service attack Protocols Feature extraction Control systems Degradation Prevention and mitigation SDN low-rate attack flow table overflow | SDN has evolved to revolutionize next-generation networks, offering programmability for on-the-fly service provisioning, primarily supported by the OpenFlow (OF) protocol. The limited storage capacity of Ternary Content Addressable Memory (TCAM) for storing flow tables in OF switches introduces vulnerabilities, notably the Low-Rate Flow Table Overflow (LOFT) attacks. LOFT exploits the flow table's storage capacity by occupying a substantial amount of space with malicious flows, leading to a gradual degradation in the flow-forwarding performance of OF switches. To mitigate this threat, we propose FloRa, a machine learning-based solution designed for monitoring and detecting LOFT attacks in SDN. FloRa continuously examines and determines the status of the flow table by closely examining the features of the flow table entries. When suspicious activity is identified, FloRa promptly activates the machine-learning-based detection module. The module monitors flow properties, identifies malicious flows, and blacklists them, facilitating their eviction from the flow table. FloRa incorporates novel features such as Packet Arrival Frequency, Content Relevance Score, and Possible Spoofed IP, with CatBoost employed as the attack detection method. The proposed method significantly reduces CPU overhead, memory overhead, and classification latency, and achieves a detection accuracy of 99.49%, which, to the best of our knowledge, exceeds state-of-the-art methods. This approach not only protects the integrity of the flow tables but also guarantees the uninterrupted flow of legitimate traffic. Experimental results indicate the effectiveness of FloRa in LOFT attack detection, ensuring uninterrupted data forwarding and continuous availability of flow table resources in SDN. (An illustrative sketch follows the table.) | 10.1109/TNSM.2024.3446178 |
Kumar Prateek, Soumyadev Maity, Neetesh Saxena | QSKA: A Quantum Secured Privacy-Preserving Mutual Authentication Scheme for Energy Internet-Based Vehicle-to-Grid Communication | 2024 | Early Access | Privacy-Preserving Authentication Security Threats Vehicle-to-Grid | The Energy Internet is well known for enabling bidirectional V2G communication; however, with communication and computation capabilities, V2G systems become vulnerable to cyber-attacks and unauthorised access. An authentication protocol verifies the identity of an entity, establishes trust, and allows access to authorized resources while preventing unauthorized access. Research challenges for vehicle-to-grid authentication protocols include quantum security, privacy, resilience to attacks, and interoperability. The majority of authentication protocols in V2G systems are based on public-key cryptography and depend on hard problems such as integer factorization and discrete logarithms to guarantee security, which can be easily broken by a quantum adversary. Besides, ensuring both information security and entity privacy is equally crucial in V2G scenarios. Consequently, this work proposes a quantum-secured privacy-preserving key authentication and communication (QSKA) protocol using superdense coding and a hash function for unconditionally secure V2G communication and privacy. QSKA uses a password-based authentication mechanism, enabling V2G entities to securely transfer passwords using superdense coding. The QSKA security verification is performed in the proof assistant Coq. The security analysis and performance evaluation of QSKA show its resiliency against well-known security attacks and reveal its enhanced reliability and efficiency with respect to state-of-the-art protocols in terms of computation, communication, and energy overhead. | 10.1109/TNSM.2024.3445972 |
Cong Zhou, Baokang Zhao, Fengxiao Tang, Biao Han, Baosheng Wang | Dynamic Multi-objective Service Function Chain Placement Based on Deep Reinforcement Learning | 2024 | Early Access | Optimization Heuristic algorithms Vectors Training Deep reinforcement learning Computational modeling Service function chaining service function chain placement multi-objective deep reinforcement learning related zone decomposition | Service function chain placement is crucial to support service flexibility and diversity for different users and vendors. This problem is proven to be NP-hard. Existing deep reinforcement learning based methods either handle only a limited number of objectives or require excessively long training times. Moreover, they cannot cope when the number of objectives is dynamic. It is therefore necessary to model service function chain placement as a multi-objective problem, which can be decomposed into multiple subproblems by weight vectors. In this paper, we first reveal the relationship between weight vectors and solution positions, which can reduce the training time needed to obtain a better placement model. Then, we design a novel algorithm for the service function chain placement problem, called rzMODRL. The weight vectors are divided into zones for training in parallel, and an ordering is defined over the final models obtained at the end of training, which saves time and improves model quality. The dynamic objective placement method builds on the high-dimensional model to avoid retraining for a low-dimensional placement. Evaluation results show that the proposed algorithms improve the service acceptance ratio by up to 32% and the hyper-volume values by 14% in multi-objective service function chain placement, where hyper-volume is widely applied to evaluate convergence and diversity simultaneously in multi-objective optimization. It is also effective in solving the dynamic objective service function chain placement problem, with a difference in average hyper-volume values of 10.44%. | 10.1109/TNSM.2024.3446248 |
Mahsa Raeiszadeh, Amin Ebrahimzadeh, Roch H. Glitho, Johan Eker, Raquel A. F. Mini | Real-Time Adaptive Anomaly Detection in Industrial IoT Environments | 2024 | Early Access | Anomaly detection Industrial Internet of Things Real-time systems Concept drift Predictive models Autoregressive processes Accuracy Anomaly Detection Real-time Analytics Concept Drift Streaming Data Industrial Internet of Things (IIoT) | To ensure reliability and service availability, next-generation networks are expected to rely on automated anomaly detection systems powered by advanced machine learning methods capable of handling multi-dimensional data. Such multi-dimensional, heterogeneous data occurs mostly in today's Industrial Internet of Things (IIoT), where real-time detection of anomalies is critical to prevent impending failures and resolve them in a timely manner. However, existing anomaly detection methods often fall short of effectively coping with the complexity and dynamism of multi-dimensional data streams in IIoT. In this paper, we propose an adaptive method for detecting anomalies in IIoT streaming data utilizing a multi-source prediction model and concept drift adaptation. The proposed anomaly detection algorithm merges a prediction model into a novel drift adaptation method, resulting in accurate and efficient anomaly detection with improved scalability. Our trace-driven evaluations indicate that the proposed method outperforms state-of-the-art anomaly detection methods by achieving up to 89.71% accuracy (in terms of Area Under the Curve (AUC)) while meeting the given efficiency and scalability requirements. (An illustrative sketch follows the table.) | 10.1109/TNSM.2024.3447532 |
Tailai Song, Gianluca Perna, Paolo Garza, Michela Meo, Maurizio Matteo Munafò | Packet Loss in Real-Time Communications: Can ML Tame its Unpredictable Nature? | 2024 | Early Access | Packet loss Quality of experience Task analysis Retina Pandemics Machine learning Visualization Real-time communications RTP packet loss machine learning prediction classification | Due to the flourishing development of networks, and abetted by the Covid-19 pandemic, we have witnessed an exponential surge in the global proliferation of Real-Time Communications (RTC) applications in recent years. In light of this, the necessity for robust, scalable, and intelligent network infrastructures and technologies has become increasingly apparent. Among the principal challenges encountered in RTC lies the issue of packet loss. Indeed, the occurrence of losses leads to communication degradation and reallocation that adversely affect the Quality of Experience (QoE). In this paper, we investigate the feasibility of predicting packet loss phenomena through the utilization of machine learning techniques, solely based on statistics derived directly from packets. We provide different definitions of packet loss, subsequently focusing on the most critical scenario, which is defined as the first loss of a series. By delineating the concept of loss, we propose different problem formulations to determine whether there exists a mathematically advantageous scenario over others. To substantiate our analysis, we demonstrate that these phenomena can be correctly identified with a recall of up to 66%, leveraging three ample datasets of RTC traffic, which were collected under distinct conditions at different times, further solidifying the validity of our findings. | 10.1109/TNSM.2024.3442616 |
Shigen Shen, Xuanbin Hao, Zhengjun Gao, Guowen Wu, Yizhou Shen, Hong Zhang, Qiying Cao, Shui Yu | SAC-PP: Jointly Optimizing Privacy Protection and Computation Offloading for Mobile Edge Computing | 2024 | Early Access | Privacy Task analysis Entropy Protection Wireless communication Delays Cloud computing Mobile edge computing computation offloading privacy protection deep reinforcement learning | The emergence of mobile edge computing (MEC) imposes unprecedented pressure on privacy protection, although it improves computation performance, including energy consumption and computation delay, through computation offloading. In this context, we focus on privacy protection in an MEC system with a curious edge server. We present a deep reinforcement learning (DRL)-driven computation offloading strategy designed to concurrently optimize privacy protection and computation cost. We investigate the potential privacy breaches resulting from offloading patterns, propose an attack model of privacy theft, and correspondingly define an analytical measure to assess privacy protection levels. In pursuit of an ideal computation offloading approach, we propose an algorithm, SAC-PP, which integrates actor-critic, off-policy learning, and maximum entropy to improve the efficiency of the learning process. We explore the sensitivity of SAC-PP to hyperparameters, and the results demonstrate its stability, which facilitates application and deployment in real environments. The relationship between privacy protection and computation cost is analyzed with different reward factors. Compared with benchmarks, the empirical results from simulations illustrate that the proposed computation offloading approach exhibits enhanced learning speed and overall performance. | 10.1109/TNSM.2024.3447753 |
Amar Rasheed, Mohamed Baza, Gautam Srivastava, Narashimha Karpoor, Cihan Varol | IoTDL2AIDS: Towards IoT-Based System Architecture Supporting Distributed LSTM Learning for Adaptive IDS on UAS | 2024 | Early Access | Training Computational modeling Adaptation models Task analysis Internet of Things Systems architecture Long short term memory UAS IDS anomaly detection IoT drone | The rapid proliferation of Unmanned Aircraft Systems (UAS) introduces new threats to national security. UAS technologies have dramatically revolutionized legitimate business operations while providing powerful weaponizing systems to malicious actors and criminals. Due to their inherent wireless capabilities, they are an easy target for cyber threats. In response to this challenge, many Intrusion Detection Systems (IDS) that support anomaly detection on UAS have been proposed in the past. However, such systems often require offline training with heavy processing, making them unsuitable for UAS deployment. This is particularly pertinent for drone systems that support dynamic changes in mission operational tasks. This paper presents a novel system architecture that utilizes the sensing capabilities available on existing IoT infrastructure to support rapid in-field adaptive model training and parameter estimation services for UAS. We have devised a cluster-oriented distributed training algorithm based on LSTM with mini-batch gradient descent, with hundreds of IoT platforms per cluster collaboratively performing model parameter estimation tasks. The proposed architecture is based on deploying a multilayer system that facilitates secure dissemination of power consumption behavioral patterns of the flight sensing system between the UAS layer and the IoT layer. The model was implemented and deployed on a real IoT-enabled platform based on the NXP Kinetis K64 (120 MHz). Furthermore, model training and validation were performed by applying various datasets contaminated with different percentages of malicious data. Our anomaly detection model achieved high prediction accuracy with an ROC-AUC score of 0.9332. The model maintains minimal power consumption overhead and low training time during the processing of a data batch. | 10.1109/TNSM.2024.3448312 |
Sushmit Bhattacharjee, Konstantinos Alexandris, Thomas Bauschert | Multi-Domain TSN Orchestration & Management for Large-Scale Industrial Networks | 2024 | Early Access | Bridges Streams Protocols Quality of service Standards Real-time systems Resource management IEEE TSN Industry 4.0 inter-domain communication Software-Defined Networking orchestration network management CORECONF | The increasing demand for determinism in modern industrial communication, driven by Industry 4.0, has led to the development of IEEE Time-Sensitive Networking (TSN) standards. However, integrating and configuring interconnected heterogeneous industrial networks remains a significant challenge. This paper extends the novel hierarchical Software-Defined Networking (SDN)-based architectural design and control plane framework introduced in the context of orchestration and management of a multi-domain TSN network. We present relevant data models and a signaling schema essential for establishing end-to-end inter-domain time-sensitive streams within the proposed architecture. A proof-of-concept implementation validates the feasibility of the framework and demonstrates its performance advantages over the peer-to-peer model. The scalability of the framework for large-scale industrial networks is verified, and it ensures secure information encapsulation among domains, enabling seamless integration of multi-vendor heterogeneous applications. Furthermore, we investigate the use of CORECONF as a lightweight alternative to NETCONF for the network management of multi-domain TSN networks, providing experimental results. | 10.1109/TNSM.2024.3447789 |
Ehsan Nowroozi, Nada Jadalla, Samaneh Ghelichkhani, Alireza Jolfaei | Mitigating Label Flipping Attacks in Malicious URL Detectors Using Ensemble Trees | 2024 | Early Access | Training Accuracy Classification tree analysis Uniform resource locators Data models Radio frequency Toxicology Adversarial machine learning backdoor attacks corrupted training sets cybersecurity poisoning attacks label-flipping attacks AI for Security Security for AI | Malicious URLs present significant threats to businesses, such as transportation and banking, causing disruptions in business operations. It is essential to identify these URLs; however, existing Machine Learning models are vulnerable to backdoor attacks. These attacks involve manipulating a small portion of the training data labels, such as Label Flipping, which can lead to misclassification. Therefore, it is crucial to incorporate defense mechanisms into machine-learning models to protect against such attacks. The focus of this study is on backdoor attacks in the context of URL detection using ensemble trees. By illuminating the motivations behind such attacks, highlighting the roles of attackers, and emphasizing the critical importance of effective defense strategies, this paper contributes to the ongoing efforts to fortify machine-learning models against adversarial threats within the machine-learning domain in network security. We propose an innovative alarm system that detects the presence of poisoned labels and a defense mechanism designed to uncover the original class labels with the aim of mitigating backdoor attacks on ensemble tree classifiers. We conducted a case study using the Alexa and Phishing Site URL datasets and showed that label-flipping attacks can be addressed using our proposed defense mechanism. Our experimental results show that the Label Flipping attack achieved an Attack Success Rate between 50% and 65% within 2-5%, and the innovative defense method successfully detected poisoned labels with an accuracy of up to 100%. | 10.1109/TNSM.2024.3447411 |
Hui Wang, Zhenyu Yang, Ming Li, Xiaowei Zhang, Yanlan Hu, Donghui Hu | CoSIS: A Secure, Scalability, Decentralized Blockchain via Complexity Theory | 2024 | Early Access | Blockchains Protocols Security Scalability Viruses (medical) Peer-to-peer computing Bitcoin Blockchain peer-to-peer network complex network | As the origin of blockchains, the Nakamoto Consensus protocol is the primary protocol for many public blockchains (e.g., Bitcoin) used in cryptocurrencies. Blockchains need to be decentralized as a core feature, yet it is difficult to strike a balance between scalability and security. Many approaches to improving blockchain scalability result in diminished security or compromise the decentralized nature of the system. Inspired by network science, especially epidemic models, we try to solve this problem by modeling the propagation of transactions and blocks as two interacting epidemics, called the CoSIS model. We extend the transaction propagation process to increase the efficiency of block propagation, which reduces the number of unknown transactions. The reduction of block propagation latency ultimately increases blockchain throughput. The theory of complex networks is employed to offer an optimal boundary condition. Finally, the node scores are stored on the chain, which also provides a new incentive approach. Our experiments show that CoSIS accelerates block propagation and raises TPS by 20%-33% on average. At the same time, system security can be significantly improved, as the orphaned block rate is close to zero in the best cases. CoSIS enhances the scalability and security of the blockchain while ensuring that all changes do not compromise its decentralized nature. (An illustrative sketch follows the table.) | 10.1109/TNSM.2024.3449575 |
Xiwen Jie, Jiangping Han, Guanglei Chen, Hang Wang, Peilin Hong, Kaiping Xue | CACC: A Congestion-Aware Control Mechanism to Reduce INT Overhead and PFC Pause Delay | 2024 | Early Access | Delays Throughput Accuracy Fasteners Aggregates Data centers Convergence Data Center RDMA INT PFC | Nowadays, Remote Direct Memory Access (RDMA) is gaining popularity in data centers for its low CPU overhead, high throughput, and ultra-low latency. As one of the state-of-the-art RDMA Congestion Control (CC) mechanisms, HPCC leverages In-band Network Telemetry (INT) features to achieve accurate control and significantly shortens the Flow Completion Time (FCT) for short flows. However, redundant INT information increases the processing latency at switches and affects flow throughput. Besides, its end-to-end feedback mechanism is not timely enough to help senders cope well with bursty traffic, and there is still a high probability of triggering Priority-based Flow Control (PFC) pauses under large-scale incast. In this paper, we propose a Congestion-Aware (CA) control mechanism called CACC, which attempts to push CC toward theoretically minimal INT overhead and PFC pause delay. CACC introduces two CA algorithms to quantize switch buffer and egress port congestion separately, along with a fine-grained window size adjustment algorithm at the sender. Specifically, the buffer CA algorithm perceives large-scale congestion that may trigger PFC pauses and provides early feedback, significantly reducing the PFC pause delay. The egress port CA algorithm perceives the link state and selectively inserts useful INT data, achieving lower queue sizes and reducing the average overhead per packet from 42 bytes to 2 bits. In our evaluation, compared with HPCC, PINT, and Bolt, CACC shortens the average and tail FCT by up to 27% and 60.1%, respectively. | 10.1109/TNSM.2024.3449699 |
Hossein Taghizadeh, Bardia Safaei, Amir Mahdi Hosseini Monazzah, Elyas Oustad, Sahar Rezagholi Lalani, Alireza Ejlali | LANTERN: Learning-Based Routing Policy for Reliable Energy-Harvesting IoT Networks | 2024 | Early Access | Routing Internet of Things Measurement Reliability Energy efficiency Batteries Standards IoT Routing RPL Energy Harvesting Solar Reliability PDR Energy Consumption Network Lifetime | RPL is introduced to conduct path selection in Low-power and Lossy Networks (LLNs), including IoT. A routing policy in RPL is governed by its objective function, which corresponds to the requirements of the IoT application, e.g., energy efficiency and reliability in terms of Packet Delivery Ratio (PDR). In many applications, it is not possible to connect the nodes to a power outlet. Also, since nodes may be geographically inaccessible, replacing depleted batteries is infeasible. Hence, harvesters are an admirable replacement for traditional batteries to prevent the energy hole problem and, consequently, to enhance the lifetime and reliability of IoT networks. Nevertheless, the unstable level of energy absorption in harvesters necessitates developing a routing policy that considers harvesting aspects. Furthermore, since the rates of absorption and consumption are highly dynamic in different parts of the network, learning-based techniques can be employed in the routing process to provide energy efficiency. Accordingly, this paper introduces LANTERN, a learning-based routing policy for improving PDR in energy-harvesting IoT networks. In addition to the rates of energy absorption and consumption, LANTERN utilizes the remaining energy in its routing policy. In this regard, LANTERN introduces a novel routing metric called the Energy Exponential Moving Average (EEMA) to perform its path selection. Based on diversified simulations conducted in Cooja, LANTERN prolongs the network lifetime by 5.7x, mitigates the probability of the energy hole problem, and improves the PDR by up to 97% compared to the state-of-the-art. Also, the energy consumed per successfully delivered packet is reduced by 76%. (An illustrative sketch of an EEMA-style metric follows the table.) | 10.1109/TNSM.2024.3450011 |
Dan Tang, Xiaocai Wang, Keqin Li, Chao Yin, Wei Liang, Jiliang Zhang | FAPM: A Fake Amplification Phenomenon Monitor to Filter DRDoS Attacks With P4 Data Plane | 2024 | Early Access | Switches Servers IP networks Protocols Pipelines Prevention and mitigation Logic Attack mitigation distributed reflection denial-of-service fake amplification phenomenon P4 | Distributed Reflection Denial-of-Service (DRDoS) attacks have caused significant destructive effects by exploiting emerging protocol vulnerabilities and amplification advantages, and their intensity is increasing. The emergence of programmable data planes supporting line-rate forwarding provides a new opportunity for fine-grained and efficient attack detection. This paper proposes a lightweight DRDoS attack detection and mitigation system called FAPM, which is deployed at the victim end to detect the amplification behavior caused by the attack. It places the work of collecting and calculating reflection features on the data plane, operated by a "latter window assisting former window" mechanism, and arranges the complex identification and regulation logic on the control plane. This approach avoids the hardware constraints of programmable switches while leveraging their per-packet processing capability. It also reduces communication traffic significantly through feature compression and state transitions. Experiments show that FAPM has (1) fast response capability within seconds, (2) a memory footprint at the KB level and communication overhead of 1 Kbps, and (3) good robustness. (An illustrative sketch follows the table.) | 10.1109/TNSM.2024.3449889 |
Xiaohuan Li, Bitao Chen, Junchuan Fan, Jiawen Kang, Jin Ye, Xun Wang, Dusit Niyato | Cloud-Edge-End Collaborative Intelligent Service Computation Offloading: A Digital Twin Driven Edge Coalition Approach for Industrial IoT | 2024 | Early Access | Cloud computing Industrial Internet of Things Task analysis Optimization Games Computational modeling Heuristic algorithms Cloud-edge-end Digital Twin (DT) Coalition game Computation offloading | By using intelligent edge computing technologies, a large number of computing tasks from end devices in the Industrial Internet of Things (IIoT) can be offloaded to edge servers, which can effectively alleviate the burden on IIoT and enhance its performance. However, in large-scale, multi-service-oriented IIoT scenarios, offloading service resources are heterogeneous and offloading requirements are mutually exclusive and time-varying, which reduces offloading efficiency. In this paper, we propose a cloud-edge-end collaborative intelligent service computation offloading scheme based on a Digital Twin (DT) driven Edge Coalition Formation (DECF) approach to improve the offloading efficiency and the total utility of edge servers. Firstly, we establish a DT model to obtain accurate digital representations of heterogeneous end devices and network state parameters in dynamic and complex IIoT scenarios. The DT model can capture time-varying requirements in a low-latency manner. Secondly, we formulate two optimization problems to maximize the offloading throughput and the total system utility. Finally, we convert the multi-objective optimization problems into a Stackelberg coalition game model and develop a distributed coalition formation approach to balance the two optimization objectives. Simulation results indicate that, compared with the nearest-coalition scheme and the non-coalition scheme, the proposed approach achieves offloading throughput improvements of 11.5% and 148%, and enhances the overall utility by 12% and 170%, respectively. | 10.1109/TNSM.2024.3441231 |
Jinbin Hu, Zikai Zhou, Jin Zhang | Lightweight Automatic ECN Tuning Based on Deep Reinforcement Learning With Ultra-Low Overhead in Datacenter Networks | 2024 | Early Access | Tuning Degradation Throughput Heuristic algorithms Topology Servers Network topology Datacenter Network ECN Congestion Control Deep Reinforcement Learning | In modern datacenter networks (DCNs), mainstream congestion control (CC) mechanisms essentially rely on Explicit Congestion Notification (ECN) to reflect congestion. The traditional static ECN threshold performs poorly under dynamic scenarios, and setting a proper ECN threshold under various traffic patterns is challenging and time-consuming. The recently proposed reinforcement learning (RL) based ECN Tuning algorithm (ACC) consumes a large amount of computational resources, making it difficult to deploy on switches. In this paper, we present a lightweight and hierarchical automated ECN tuning algorithm called LAECN, which can fully exploit the performance benefits of deep reinforcement learning with ultra-low overhead. The simulation results show that LAECN improves performance significantly by reducing latency and increasing throughput in stable network conditions, and also shows consistently high performance in small-flow network environments. For example, LAECN effectively improves throughput by up to 47%, 34%, 32% and 24% over DCQCN, TIMELY, HPCC and ACC, respectively. | 10.1109/TNSM.2024.3450596 |
Xiaoyang Zhao, Chuan Wu, Xia Zhu | Dynamic Flow Scheduling for DNN Training Workloads in Data Centers | 2024 | Early Access | Training Servers Tensors Bandwidth Data centers Computer architecture Residual neural networks Machine Learning System Networking for AI Congestion Control Protocol | Distributed deep learning (DL) training constitutes a significant portion of workloads in modern data centers that are equipped with high computational capacities, such as GPU servers. However, frequent tensor exchanges among workers during distributed deep neural network (DNN) training can result in heavy traffic in the data center network, leading to congestion at server NICs and in the switching network. Unfortunately, none of the existing DL communication libraries support active flow control to optimize tensor transmission performance, instead relying on passive adjustments to the congestion window or sending rate based on packet loss or delay. To address this issue, we propose a flow scheduler per host that dynamically tunes the sending rates of outgoing tensor flows from each server, maximizing network bandwidth utilization and expediting job training progress. Our scheduler comprises two main components: a monitoring module that interacts with state-of-the-art communication libraries supporting parameter server and all-reduce paradigms to track the training progress of DNN jobs, and a congestion control protocol that receives in-network feedback from traversing switches and computes optimized flow sending rates. For data centers where switches are not programmable, we provide a software solution that emulates switch behavior and interacts with the scheduler on servers. Experiments with real-world GPU testbed and trace-driven simulation demonstrate that our scheduler outperforms common rate control protocols and representative learning-based schemes in various settings. | 10.1109/TNSM.2024.3450670 |
Marcos Carvalho, Daniel Soares, Daniel F. Macedo | QoE Estimation Across Different Cloud Gaming Services Using Transfer Learning | 2024 | Early Access | Quality of experience Cloud gaming Transfer learning Data models Quality of service Task analysis Context modeling Cloud Gaming Mobile Cloud Gaming Domain Adaptation Transfer Learning QoE Estimation Machine Learning | Cloud Gaming (CG) has become one of the most important cloud-based services in recent years by providing games to different end-network devices, such as personal computers (wired network) and smartphones/tablets (mobile network). CG services are challenging for network operators since they demand rigorous network Quality of Service (QoS). Nevertheless, ensuring proper Quality of Experience (QoE) keeps end-users engaged in CG services. However, several factors influence users' experience, such as context (i.e., game type/players) and the end-network type (wired/mobile). In this case, Machine Learning (ML) models have achieved state-of-the-art results in end-users' QoE estimation. Despite that, traditional ML models demand a large amount of data and assume that the training and test sets have the same distribution, which makes it hard for ML models to generalize to scenarios different from those they were trained on. This work employs Transfer Learning (TL) techniques to create QoE estimation models across different cloud gaming services (wired/mobile) and contexts (game type/players). We improved our previous work by performing a subjective QoE assessment with real users playing new games on a mobile cloud gaming testbed. Results show that transfer learning can decrease the average MSE by at least 34.7% compared to the source (wired) model's performance on mobile cloud gaming, and by up to 81.5% compared with a model trained from scratch. (An illustrative sketch follows the table.) | 10.1109/TNSM.2024.3451300 |
Qianqian Wu, Qiang Liu, Wenliang Zhu, Zefan Wu | Energy Efficient UAV-Assisted IoT Data Collection: A Graph-Based Deep Reinforcement Learning Approach | 2024 | Early Access | Autonomous aerial vehicles Data collection Task analysis Energy consumption Heuristic algorithms Energy efficiency Propulsion Data collection energy efficiency unmanned aerial vehicle (UAV) graph attention network deep reinforcement learning | With the advancements in technologies such as 5G, Unmanned Aerial Vehicles (UAVs) have exhibited their potential in various application scenarios, including wireless coverage, search operations, and disaster response. In this paper, we consider the utilization of a group of UAVs as aerial base stations (BS) to collect data from IoT sensor devices. The objective is to maximize the volume of collected data while simultaneously enhancing the geographical fairness among these points of interest, all within the constraints of limited energy resources. Therefore, we propose a deep reinforcement learning (DRL) method based on Graph Attention Networks (GAT), referred to as “GADRL”. GADRL utilizes graph convolutional neural networks to extract spatial correlations among multiple UAVs and makes decisions in a distributed manner under the guidance of DRL. Furthermore, we employ Long Short-Term Memory to establish memory units for storing and utilizing historical information. Numerical results demonstrate that GADRL consistently outperforms four baseline methods, validating its computational efficiency. | 10.1109/TNSM.2024.3450964 |
Haftay Gebreslasie Abreha, Houcine Chougrani, Ilora Maity, Youssouf DRIF, Christos Politis, Symeon Chatzinotas | Fairness-Aware VNF Mapping and Scheduling in Satellite Edge Networks for Mission-Critical Applications | 2024 | Early Access | Dynamic scheduling Satellites Delays Topology Heuristic algorithms Processor scheduling Quality of service Satellite Edge Computing Software Defined Networking (SDN) Network Function Virtualization (NFV) Virtual Network Function (VNF) VNF Scheduling Fairness | Satellite Edge Computing (SEC) is seen as a promising solution for deploying network functions in orbit to provide ubiquitous services with low latency and bandwidth. Software Defined Networks (SDN) and Network Function Virtualization (NFV) enable SEC to manage and deploy services more flexibly. In this paper, we study a dynamic and topology-aware VNF mapping and scheduling strategy within an SDN/NFV-enabled SEC infrastructure. Our focus is on meeting the stringent requirements of mission-critical (MC) applications, recognizing their significance in both satellite-to-satellite and edge-to-satellite communications while ensuring service delay margin fairness across various time-sensitive service requests. We formulate the VNF mapping and scheduling problem as an Integer Nonlinear Programming problem, with the objective of minimax fairness among specified requests while considering dynamic satellite network topology, traffic, and resource constraints. We then propose two algorithms for solving the problem: a Fairness-Aware Greedy Algorithm for Dynamic VNF Mapping and Scheduling and a Fairness-Aware Simulated Annealing-Based Algorithm for Dynamic VNF Mapping and Scheduling, which are suitable for low and high service arrival rates, respectively. Our extensive simulations demonstrate that both approaches are very close to the optimization-based solution and outperform the benchmark solution in terms of service acceptance rates. | 10.1109/TNSM.2024.3452031 |
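
The sketches below illustrate, in simplified form, mechanisms described in some of the abstracts above; they are hedged approximations written for this digest, not the authors' implementations. First, for "SAR: Receiver-driven Transport Protocol with Micro-burst Prediction in Data Center Networks" (Ye et al.), a toy receiver-side credit allocator that cuts long-flow credits when a micro-burst from short flows is predicted and hands the bandwidth back once the burst ends. The prediction input, the 50% reserve, and the even split are assumptions, not SAR's actual algorithm.

```python
# Toy receiver-driven credit allocation: reserve headroom for predicted
# micro-bursts of short flows, otherwise share the link among long flows.
def allocate_credits(link_capacity_gbps, long_flows, burst_predicted):
    reserve = 0.5 * link_capacity_gbps if burst_predicted else 0.0   # assumed reserve factor
    share = (link_capacity_gbps - reserve) / max(len(long_flows), 1)
    return {flow: share for flow in long_flows}, reserve

print(allocate_credits(100, ["f1", "f2"], burst_predicted=True))    # long flows concede bandwidth early
print(allocate_credits(100, ["f1", "f2"], burst_predicted=False))   # freed bandwidth is reclaimed
```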
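
For "FloRa: Flow Table Low-Rate Overflow Reconnaissance and Detection in SDN" (Mudgal et al.), a minimal sketch of training a CatBoost classifier on flow-table features resembling the ones the abstract names (Packet Arrival Frequency, Content Relevance Score, Possible Spoofed IP). The synthetic data, feature ranges, and hyperparameters are assumptions; only the choice of CatBoost and the feature names come from the abstract. Requires the `catboost` package.

```python
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)

# Synthetic flow-table entries: [packet_arrival_frequency, content_relevance_score, possible_spoofed_ip]
benign = np.column_stack([
    rng.normal(50, 10, 500),       # entries matched frequently by real traffic
    rng.uniform(0.6, 1.0, 500),    # high relevance of matched content
    np.zeros(500),                 # rarely spoofed sources
])
malicious = np.column_stack([
    rng.normal(2, 1, 500),         # low-rate flows crafted to linger in the TCAM
    rng.uniform(0.0, 0.3, 500),    # little useful traffic matches the entry
    rng.integers(0, 2, 500),       # often spoofed sources
])
X = np.vstack([benign, malicious])
y = np.array([0] * 500 + [1] * 500)

model = CatBoostClassifier(iterations=200, depth=4, verbose=False)
model.fit(X, y)

suspect_flow = [[1.5, 0.1, 1]]     # hypothetical entry observed on the switch
print("malicious probability:", model.predict_proba(suspect_flow)[0][1])
```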
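
For "Real-Time Adaptive Anomaly Detection in Industrial IoT Environments" (Raeiszadeh et al.), a minimal prediction-plus-drift-adaptation sketch: a residual test flags point anomalies, and a persistent residual bias is treated as concept drift that triggers a baseline rebuild. The running-mean predictor, window sizes, and thresholds are assumptions that stand in for the paper's multi-source prediction model and drift method.

```python
import numpy as np
from collections import deque

class DriftAwareDetector:
    def __init__(self, window=200, anomaly_z=4.0, drift_z=2.0):
        self.history = deque(maxlen=window)   # recent raw observations (baseline)
        self.residuals = deque(maxlen=30)     # recent prediction errors
        self.anomaly_z = anomaly_z
        self.drift_z = drift_z

    def update(self, x):
        if len(self.history) < 20:
            self.history.append(x)
            return "warmup"
        resid = x - np.mean(self.history)     # placeholder predictor: running mean
        sigma = np.std(self.history) + 1e-9
        self.residuals.append(resid)
        self.history.append(x)
        # persistent residual bias -> concept drift: rebuild the baseline
        if len(self.residuals) == self.residuals.maxlen and \
                abs(np.mean(self.residuals)) > self.drift_z * sigma:
            self.history = deque(list(self.history)[-20:], maxlen=self.history.maxlen)
            self.residuals.clear()
            return "drift-adapted"
        return "anomaly" if abs(resid) > self.anomaly_z * sigma else "normal"

rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(10, 1, 300), [25.0], rng.normal(20, 1, 300)])
det = DriftAwareDetector()
labels = [det.update(v) for v in stream]
print(labels.count("anomaly"), "anomalies,", labels.count("drift-adapted"), "drift adaptations")
```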
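
For "CoSIS: A Secure, Scalability, Decentralized Blockchain via Complexity Theory" (Wang et al.), a toy mean-field simulation of two coupled SI-style spreading processes, loosely illustrating the abstract's analogy of transaction and block propagation as interacting epidemics. The rates, node degree, and the coupling term (peers that already hold a transaction relay the block faster) are invented for illustration and are not the CoSIS model.

```python
# Discrete-time, mean-field spread of a transaction and its block over a P2P network.
def simulate(steps=30, beta_tx=0.35, beta_blk=0.15, boost=0.5, degree=8):
    tx, blk = 0.001, 0.001              # fractions of peers holding the tx / block
    history = []
    for _ in range(steps):
        # SI-like growth: contact rate * infected fraction * susceptible fraction
        tx_growth = beta_tx * degree * tx * (1 - tx)
        # coupling: peers that already hold the transaction relay the block more eagerly
        blk_growth = beta_blk * (1 + boost * tx) * degree * blk * (1 - blk)
        tx = min(1.0, tx + tx_growth)
        blk = min(1.0, blk + blk_growth)
        history.append((round(tx, 3), round(blk, 3)))
    return history

coverage = simulate()
print("final coverage (tx, block):", coverage[-1])
```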
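
For "LANTERN: Learning-Based Routing Policy for Reliable Energy-Harvesting IoT Networks" (Taghizadeh et al.), a small sketch of an exponential-moving-average routing metric over each candidate parent's energy state (harvested, consumed, remaining). The blend of the three quantities, the smoothing factor, and the parent-selection rule are assumptions; the exact EEMA definition is given in the paper, not here.

```python
# EEMA-style metric: smooth each parent's energy trend and prefer the healthiest one.
def update_eema(prev_eema, harvested_mj, consumed_mj, remaining_mj, alpha=0.3):
    """Blend the newest energy snapshot into the running average (assumed form)."""
    snapshot = remaining_mj + harvested_mj - consumed_mj
    return alpha * snapshot + (1 - alpha) * prev_eema

def pick_parent(candidates):
    """candidates: {node_id: eema}; choose the parent with the best energy trend."""
    return max(candidates, key=candidates.get)

eema = {"A": 120.0, "B": 120.0}
eema["A"] = update_eema(eema["A"], harvested_mj=5.0, consumed_mj=9.0, remaining_mj=110.0)
eema["B"] = update_eema(eema["B"], harvested_mj=12.0, consumed_mj=6.0, remaining_mj=115.0)
print("preferred parent:", pick_parent(eema))
```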
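
For "FAPM: A Fake Amplification Phenomenon Monitor to Filter DRDoS Attacks With P4 Data Plane" (Tang et al.), a rough victim-side sketch of per-window reflection counters in which the "latter" window is used to confirm a suspicious "former" window before raising an alert, echoing the windowing idea in the abstract. The byte-ratio test and the threshold are assumptions, and the sketch is plain Python rather than the paper's P4 data-plane logic.

```python
from collections import defaultdict

class AmplificationMonitor:
    def __init__(self, ratio_threshold=10.0):
        self.ratio_threshold = ratio_threshold
        self.windows = []                           # closed windows: src -> [bytes_in, bytes_out]
        self.current = defaultdict(lambda: [0, 0])  # window being filled

    def observe(self, src, direction, nbytes):
        # direction: "in" = traffic arriving from a suspected reflector, "out" = our requests to it
        idx = 0 if direction == "in" else 1
        self.current[src][idx] += nbytes

    def rotate_window(self):
        """Close the current window; alert only when two consecutive windows
        both show an amplification-like byte imbalance for the same source."""
        self.windows.append(dict(self.current))
        self.current = defaultdict(lambda: [0, 0])
        alerts = []
        if len(self.windows) >= 2:
            former, latter = self.windows[-2], self.windows[-1]
            for src, (inb, outb) in former.items():
                inb2, outb2 = latter.get(src, (0, 0))
                if inb / (outb + 1) > self.ratio_threshold and \
                        inb2 / (outb2 + 1) > self.ratio_threshold:
                    alerts.append(src)
        return alerts

mon = AmplificationMonitor()
for _ in range(2):
    mon.observe("198.51.100.7", "in", 50_000)   # large unsolicited responses
    mon.observe("198.51.100.7", "out", 200)     # almost no matching requests
    alerts = mon.rotate_window()
print("suspected reflectors:", alerts)
```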
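
For "QoE Estimation Across Different Cloud Gaming Services Using Transfer Learning" (Carvalho et al.), a minimal domain-adaptation sketch: a small MLP QoE regressor is trained on a synthetic "wired" source domain, then its feature layers are frozen and only the head is fine-tuned on a few "mobile" target samples. The synthetic data, architecture, and hyperparameters are assumptions; only the general transfer-learning recipe mirrors the abstract. Requires PyTorch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_domain(n, shift):
    x = torch.rand(n, 4)  # e.g. normalized bitrate, RTT, loss, jitter (assumed features)
    y = (4.0 - 3.0 * x[:, 2] - 1.0 * x[:, 1] + shift).clamp(1.0, 5.0).unsqueeze(1)  # MOS-like score
    return x, y

src_x, src_y = make_domain(2000, shift=0.0)   # "wired" cloud gaming (plentiful labels)
tgt_x, tgt_y = make_domain(100, shift=-0.5)   # "mobile" cloud gaming (scarce labels)

model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(),
                      nn.Linear(32, 32), nn.ReLU(),
                      nn.Linear(32, 1))
loss_fn = nn.MSELoss()

# 1) Train on the source domain.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(300):
    opt.zero_grad()
    loss_fn(model(src_x), src_y).backward()
    opt.step()

# 2) Transfer: freeze the shared feature extractor, fine-tune only the output head.
for p in model[:4].parameters():
    p.requires_grad = False
opt = torch.optim.Adam(model[4].parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(tgt_x), tgt_y).backward()
    opt.step()

print("target-domain MSE after fine-tuning:", loss_fn(model(tgt_x), tgt_y).item())
```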