Digital manipulation tools such as deepfakes have grown more sophisticated owing to the rapid development of deep learning and artificial intelligence. Face swapping, in which one person's face is replaced with another's, is among the most alarming types of deepfake. The technique produces highly lifelike videos that can deceive viewers. Detecting these manipulated videos is crucial to mitigating their negative impact on privacy and security. This paper proposes an ensemble approach to detecting face-swap deepfakes that combines the Swin Transformer and Bidirectional Long Short-Term Memory (BiLSTM) with an attention mechanism. The Swin Transformer extracts spatial features from each frame, the BiLSTM captures temporal patterns between frames, and the attention mechanism focuses on the most relevant timesteps. The model is evaluated on the FaceForensics++ dataset, achieving a validation accuracy of 93.81% with a validation loss of 0.19, outperforming the Long Short-Term Memory (LSTM), Fully Convolutional Network (FCN