FAULT TOLERANCE PREDICTION IN DISTRIBUTED SYSTEMS USING GRU, TCN AND LSTM
DOI: https://doi.org/10.22452/
Keywords: LSTM, Fault tolerance, TCN, Distributed system, GRU, Deep learning techniques
Abstract
Disruptions in distributed systems can cause widespread failures and downtime, which cost companies money and lower productivity. Because the complexity, unpredictability, and opaque inner workings of distributed systems exacerbate failures, a recovery and fault tolerance system is essential. Predictive analytics has emerged as a tool for fault management, helping firms move from reactive to proactive fault handling. In this research, we examine an intelligent fault prediction approach that employs Gated Recurrent Units (GRU), Temporal Convolutional Networks (TCN), and Long Short-Term Memory (LSTM) networks to improve fault tolerance in distributed systems. GRU and LSTM models capture temporal, sequential data from a distributed system and track how that data changes over time. By combining these recurrent architectures with TCNs, companies can better recognize fault patterns that emerge over time, forecast future failures, and manage faults more efficiently. TCNs use causal and dilated convolutions to capture long-range temporal dependencies, and because they process a sequence in parallel rather than step by step, they can detect and respond to faults quickly in large-scale settings. The framework applies these deep learning models to system log and resource consumption data to identify likely failure symptoms and accurately predict future problems. The experimental results show that combining GRU, LSTM, and TCN models improves fault prediction accuracy and reduces unexpected downtime by identifying faults quickly so that preventive action can be taken. Additionally, because the framework continuously collects data from the distributed system and monitors the logged information in real time, users can make better decisions and respond proactively to failures, demonstrating that predictive analytics driven by deep learning strengthens intelligent fault tolerance in distributed systems.
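The causal, dilated convolution that TCNs rely on can be illustrated with a minimal pure-Python sketch. It assumes a single channel, a kernel of size 2, dilations 1, 2, 4, 8, and one weight vector shared across layers purely for illustration; the function names are hypothetical, and this is not the authors' implementation (practical TCNs also use residual connections, multiple channels, and learned per-layer weights). It shows two properties mentioned above: each output depends only on current and past inputs (causality), and stacking dilations grows the receptive field exponentially.

```python
from typing import List


def causal_dilated_conv1d(x: List[float], weights: List[float], dilation: int = 1) -> List[float]:
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Inputs before the start of the sequence are treated as zero (left padding),
    so y[t] never looks at x[t + 1], x[t + 2], ... (causality).
    weights[0] is applied to the current time step.
    """
    y = []
    for t in range(len(x)):  # no state carried between t values, so this loop is parallelizable
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of past time steps (including the current one) that influence one output."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def tcn_stack(x: List[float], weights: List[float], dilations: List[int]) -> List[float]:
    """Apply causal dilated convolutions layer by layer, each followed by a ReLU."""
    out = x
    for d in dilations:
        out = [max(0.0, v) for v in causal_dilated_conv1d(out, weights, d)]
    return out


if __name__ == "__main__":
    # An impulse at t = 0 reveals which later outputs it can reach.
    impulse = [1.0] + [0.0] * 19
    print(receptive_field(2, [1, 2, 4, 8]))  # 16
    print(tcn_stack(impulse, [1.0, 1.0], [1, 2, 4, 8]))
```

With four layers of kernel size 2, an impulse at time 0 influences outputs 0 through 15 and nothing after, matching the receptive field of 16 steps. A recurrent model such as a GRU or LSTM would need 16 sequential state updates to cover the same window.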
With 95% accuracy, 93% precision, 95% recall, a 94% F1-score, and a ROC-AUC of 0.97, the combined GRU + LSTM + TCN model outperforms all single- and dual-model configurations. Its accuracy is 7% higher than GRU alone, 5% higher than LSTM, and 4% higher than TCN, indicating that multi-model temporal feature fusion benefits fault prediction.
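For reference, the reported metrics can be computed from binary fault labels as in the sketch below. It uses the standard definitions of accuracy, precision, recall, and F1. The abstract does not say how ROC-AUC was computed, so the example assumes the pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve. The function names and the toy labels and scores are hypothetical. The sketch also checks that the reported precision (0.93) and recall (0.95) are consistent with the reported F1-score: 2 · 0.93 · 0.95 / (0.93 + 0.95) ≈ 0.94.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = fault, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random fault sample scores higher than a random healthy one.

    Ties count as half a win. O(P * N), which is fine for illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels, hard predictions, and predicted fault probabilities.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    print(classification_metrics(y_true, y_pred))  # every metric is 0.75 here
    print(roc_auc(y_true, scores))  # 0.9375
    print(round(f1_from(0.93, 0.95), 2))  # 0.94, consistent with the reported F1
```

Threshold-based metrics (accuracy through F1) depend on the cut-off used to turn scores into predictions, whereas ROC-AUC summarizes ranking quality across all thresholds, which is why both kinds are typically reported.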

