
Abstract

The proliferation of Internet of Things (IoT) devices, projected to exceed 18.8 billion connected units by 2024, is critically constrained by the limited energy budgets of predominantly battery-powered deployments, which undermine operational reliability, sustainability, and scalability. Conventional energy-optimization techniques, including static duty-cycling protocols, fixed transmission-power configurations, and rule-based heuristics, cannot adapt to real-time variations in traffic load, channel quality, interference patterns, and the heterogeneous quality-of-service (QoS) requirements of applications ranging from real-time healthcare monitoring to periodic environmental sensing. The result is excessive energy expenditure, accelerated battery depletion, elevated packet loss, increased end-to-end latency, and degraded overall network performance. In this work, Q-learning was implemented within a simulated wireless sensor network of 10-100 battery-constrained nodes periodically transmitting environmental data to a central gateway, with the optimization problem formulated as a Markov Decision Process (MDP). Autonomous RL agents optimize transmission power (0-20 dBm), sampling frequency (1-10 Hz), and sleep-wake intervals (10 seconds to 5 minutes) over a state representation comprising residual battery capacity (1000-5000 mAh), current traffic density, channel-quality indicators (SNR, packet error rate), and application-specific QoS constraints. Simulations were conducted in the NS-3 network simulator with Python-based reinforcement learning frameworks (Gym, Stable Baselines3) across scenarios varying node density, traffic patterns, and environmental conditions, executing 10,000 episodes per configuration with tuned hyperparameters (learning rate α=0.1, discount factor γ=0.95, decaying ε-greedy exploration). Relative to fixed 30% duty-cycling and static-power baselines, the learned policies achieved statistically significant improvements: 35.7-42.3% reductions in total energy consumption (721-892 mJ/node), a 68% extension of network lifetime (239 hours until 20% node failure), packet delivery ratios exceeding 91%, and markedly better energy-per-bit efficiency, validated by t-tests and ANOVA (p<0.001). Action attribution analysis identified sleep-wake optimization (41% contribution), dynamic power control (37%), and sampling-rate adjustment (22%) as the primary efficiency drivers, positioning reinforcement learning as a transformative paradigm for sustainable large-scale IoT ecosystems, while motivating lightweight deep RL variants and edge-deployment optimizations to reduce training overhead.
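The tabular Q-learning loop described in the abstract can be sketched as follows. Only the learning rate (α=0.1), discount factor (γ=0.95), decaying ε-greedy exploration, and the action ranges (0-20 dBm power, 1-10 Hz sampling, 10 s to 5 min sleep) come from the abstract; the state discretization, reward weights, and toy transition dynamics are illustrative assumptions, not the paper's actual NS-3 environment.

```python
import numpy as np

# alpha, gamma, and decaying epsilon-greedy match the abstract; the state
# bins, reward weights, and toy dynamics below are illustrative assumptions.
ALPHA, GAMMA = 0.1, 0.95
EPS_START, EPS_DECAY, EPS_MIN = 1.0, 0.995, 0.05

N_BATT, N_TRAFFIC, N_CHAN = 5, 3, 3          # discretized state bins
# Actions: (tx power dBm, sampling Hz, sleep interval s), spanning the
# 0-20 dBm, 1-10 Hz, and 10 s - 5 min ranges from the abstract.
ACTIONS = [(p, f, s) for p in (0, 10, 20)
                     for f in (1, 5, 10)
                     for s in (10, 60, 300)]

rng = np.random.default_rng(0)
Q = np.zeros((N_BATT * N_TRAFFIC * N_CHAN, len(ACTIONS)))

def state_index(batt, traffic, chan):
    """Flatten the (battery, traffic, channel) bins into one table row."""
    return (batt * N_TRAFFIC + traffic) * N_CHAN + chan

def step(state, action):
    """Toy transition: higher power and sampling rate improve delivery
    but drain the battery; the reward trades QoS against energy cost."""
    batt, traffic, chan = state
    power, freq, sleep = ACTIONS[action]
    energy = (power + 1) * freq / sleep ** 0.5        # stylized energy cost
    delivered = min(1.0, (power + 5) / 25 * (chan + 1) / N_CHAN)
    reward = 5.0 * delivered - 0.1 * energy
    batt = max(0, batt - 1) if energy > 3 else batt   # drain one battery bin
    return (batt, rng.integers(N_TRAFFIC), rng.integers(N_CHAN)), reward, batt == 0

eps = EPS_START
for episode in range(500):
    state = (N_BATT - 1, rng.integers(N_TRAFFIC), rng.integers(N_CHAN))
    for _ in range(100):                              # cap episode length
        s = state_index(*state)
        a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
        state, r, done = step(state, a)
        # Q(s,a) += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[state_index(*state)]) - Q[s, a])
        if done:
            break
    eps = max(EPS_MIN, eps * EPS_DECAY)
```

After training, a node's greedy policy is simply `ACTIONS[np.argmax(Q[state_index(b, t, c)])]`, which is what makes the tabular variant attractive for resource-constrained devices: acting requires only a table lookup, while the training overhead noted in the abstract is confined to the learning phase.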

Keywords

Reinforcement Learning; IoT Energy Optimization; Q-learning; Wireless Sensor Networks; Markov Decision Process
