BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention

Yassine El Kheir

Deepfake Engineer, Researcher @ Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)

Dr.-Ing. Tim Polzehl

CEO & Founder of Gretchen AI,
Senior Researcher in the Speech and Language Technology (SLT) department at the Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Berlin

Prof. Dr.-Ing. Sebastian Möller

Head of the Speech and Language Technology research department at the Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)


Voice synthesis technology has reached a point where AI can create convincing imitations of anyone's voice from just a few minutes of audio.

This poses serious risks to:

  • Banking systems that rely on voice authentication
  • Legal proceedings where audio evidence is crucial
  • Personal security against impersonation attacks

and many more.
 
Traditional detection methods are struggling to keep pace with these rapidly evolving deepfake techniques.
Yassine El Kheir, Dr.-Ing. Tim Polzehl, and Prof. Dr.-Ing. Sebastian Möller from the Speech and Language Technology department at DFKI, Germany, and the Technical University of Berlin have developed BiCrossMamba-ST, a novel detection system that outperforms existing methods by substantial margins.
The breakthrough lies in its dual-perspective analysis: by processing spectral sub-bands and temporal intervals separately and then integrating their representations through cross-attention, BiCrossMamba-ST effectively captures the subtle cues of synthetic speech.
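The dual-perspective idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: mean pooling stands in for the paper's Mamba encoders, the projection matrices are random placeholders for learned layers, and all token counts and dimensions below are illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: one branch's tokens attend over the other's."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values

rng = np.random.default_rng(0)
spec = rng.standard_normal((80, 200))  # toy spectrogram: (freq_bins, time_frames)

# Spectral branch: tokens are frequency sub-bands
# (mean pooling stands in for a learned sequence encoder)
sub_bands = spec.reshape(8, 10, 200).mean(axis=1)       # (8 sub-band tokens, 200)

# Temporal branch: tokens are time intervals
intervals = spec.T.reshape(10, 20, 80).mean(axis=1)     # (10 interval tokens, 80)

# Project both token sets into a shared space (random stand-ins for linear layers)
W_s = rng.standard_normal((200, 64)) / np.sqrt(200)
W_t = rng.standard_normal((80, 64)) / np.sqrt(80)
spec_tokens = sub_bands @ W_s                           # (8, 64)
temp_tokens = intervals @ W_t                           # (10, 64)

# Bidirectional cross-attention: each branch queries the other
spec_fused = cross_attention(spec_tokens, temp_tokens)  # (8, 64)
temp_fused = cross_attention(temp_tokens, spec_tokens)  # (10, 64)

# Pool the fused representations into one utterance-level embedding
embedding = np.concatenate([spec_fused.mean(axis=0), temp_fused.mean(axis=0)])
print(embedding.shape)  # (128,)
```

The key point the sketch captures is that neither branch sees the full picture alone: spectral tokens summarize frequency structure, temporal tokens summarize dynamics, and the bidirectional cross-attention lets each view condition on the other before the final decision.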
 
The performance improvements are striking:
 
  • 28.2% fewer parameters while maintaining superior accuracy
  • Relative improvements of 67.74% and 26.3% over state-of-the-art models such as AASIST on the ASVspoof19 and ASVspoofDF21 benchmark datasets, respectively