BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention.


by Yassine El Kheir

Voice synthesis technology has reached a point where AI can create convincing imitations of anyone’s voice from just a few minutes of audio. This poses serious risks to:
 
  • Banking systems that rely on voice authentication
  • Legal proceedings where audio evidence is crucial
  • Personal security against impersonation attacks
and many other domains.
 
Traditional detection methods are struggling to keep pace with these rapidly evolving deepfake techniques.
 
Yassine El Kheir and fellow researchers from Speech and Language Technology at DFKI, Germany, and the Technical University of Berlin have developed BiCrossMamba-ST, a novel detection system that outperforms existing methods by substantial margins.
 
The breakthrough lies in its dual-perspective analysis: by processing spectral sub-bands and temporal intervals separately and then integrating their representations through cross-attention, BiCrossMamba-ST effectively captures the subtle cues of synthetic speech.
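To make the dual-branch idea more concrete, here is a minimal PyTorch sketch. It is not the authors' implementation: the GRU layers merely stand in for the paper's bidirectional Mamba blocks, and the dimensions, pooling strategy, and two-class output head are illustrative assumptions.

```python
# Illustrative sketch of a dual-branch spectro-temporal model with cross-attention.
# NOT the BiCrossMamba-ST code: GRUs stand in for bidirectional Mamba blocks,
# and all sizes below are hypothetical.
import torch
import torch.nn as nn

class CrossBranchFusion(nn.Module):
    """Two branches view the same feature map along different axes
    (spectral sub-bands vs. temporal intervals) and exchange information
    via cross-attention before classification."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # Stand-ins for the bidirectional sequence models in each branch.
        self.spectral_rnn = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        self.temporal_rnn = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        # Cross-attention: each branch queries the other branch's representation.
        self.spec_to_temp = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temp_to_spec = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 2)  # bona fide vs. spoof

    def forward(self, feats):
        # feats: (batch, freq_sub_bands, time_frames, dim)
        spec = self.spectral_rnn(feats.mean(dim=2))[0]   # sequence over sub-bands: (b, f, d)
        temp = self.temporal_rnn(feats.mean(dim=1))[0]   # sequence over time:      (b, t, d)
        # Each branch attends to the other branch's sequence.
        spec_fused, _ = self.spec_to_temp(spec, temp, temp)
        temp_fused, _ = self.temp_to_spec(temp, spec, spec)
        pooled = torch.cat([spec_fused.mean(dim=1), temp_fused.mean(dim=1)], dim=-1)
        return self.classifier(pooled)

model = CrossBranchFusion()
dummy = torch.randn(2, 20, 100, 64)  # 2 utterances, 20 sub-bands, 100 frames
print(model(dummy).shape)            # torch.Size([2, 2])
```

The key design point this sketch tries to convey is that neither view is collapsed into the other too early: spectral and temporal sequences are modeled separately and only fused once each has built its own representation.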
 
The performance improvements are striking:
 
  • 28.2% fewer parameters while maintaining superior accuracy
  • 67.74% and 26.3% improvements over state-of-the-art models such as AASIST on the ASVspoof19 and ASVspoofDF21 benchmark datasets, respectively
