Audio Deepfake Detection

Detect AI-generated voices, voice cloning, and audio manipulation to prevent social engineering attacks, CEO fraud, and phone-based identity theft.

Common Attack Vectors

Voice Cloning

AI Voice Synthesis:

  • Clone someone's voice from 3-10 seconds of audio
  • Tools: ElevenLabs, Descript, Play.ht
  • Convincing enough to fool family members
  • Used in CEO fraud, grandparent scams

How It Works:

  1. Attacker obtains voice sample (social media, voicemail, YouTube)
  2. Feed sample to text-to-speech AI model
  3. Model generates new speech in target's voice
  4. Call victim with cloned voice

Real-World Example:

  • 2019: CEO voice cloned, $243,000 stolen from UK energy company
  • 2020: Bank manager fooled by deepfake voice, transferred $35M

Text-to-Speech (TTS) Attacks

Synthetic Voice:

  • AI-generated voice (no real person)
  • Generic or custom voice profiles
  • Realistic prosody and intonation
  • Used in automated scam calls

Characteristics:

  • Unnatural speech patterns
  • Robotic transitions between words
  • Consistent tone (lack of emotion)
  • Background noise inconsistencies

Audio Splicing

Cut-and-Paste Audio:

  • Splice together real audio clips
  • Rearrange words/sentences
  • Create fake statements from real recordings
  • Detectable via frequency analysis

Background Noise Manipulation

Indicators of Fake Audio:

  • Inconsistent background noise
  • Abrupt changes in ambient sound
  • Silence where noise expected
  • Added artificial background to mask synthesis

Detection Capabilities

1. Spectral Analysis

Analyze frequency patterns to detect AI generation.

What We Check:

  • Frequency Range: AI voices often lack full human frequency range
  • Harmonic Patterns: Unnatural harmonic structures
  • Spectral Anomalies: Artifacts specific to TTS models
  • Noise Floor: Consistent vs. natural noise patterns

Detection Example:

{
  "spectralAnalysis": {
    "suspiciousPatterns": true,
    "findings": [
      "Limited frequency range (200-3500 Hz, natural is 80-12000 Hz)",
      "Unnatural harmonic spacing",
      "Consistent noise floor (indicates synthetic generation)"
    ],
    "confidence": 87
  }
}
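The frequency-range check above can be sketched as a simple heuristic. The helper below is illustrative only (it is not part of the VeriPlus API); it assumes FFT magnitude bins and flags occupied bandwidths well below the natural human span:

```javascript
// Illustrative frequency-range heuristic. `bins` is an array of
// { freqHz, magnitude } entries from an FFT; thresholds are assumptions.
function checkFrequencyRange(bins, noiseFloor = 0.01) {
  const active = bins.filter(b => b.magnitude > noiseFloor);
  if (active.length === 0) {
    return { suspicious: true, reason: 'no signal above noise floor' };
  }
  const minHz = Math.min(...active.map(b => b.freqHz));
  const maxHz = Math.max(...active.map(b => b.freqHz));
  // Natural speech spans roughly 80-12000 Hz; a narrow band suggests TTS.
  const suspicious = minHz > 150 || maxHz < 8000;
  return { minHz, maxHz, suspicious };
}
```

A 200-3500 Hz band like the one in the findings above would be flagged, while a full 80-12000 Hz range would pass.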

2. Prosody Analysis

Analyze natural speech rhythm and intonation.

Human Speech:

  • Variable pitch and tone
  • Natural pauses and emphasis
  • Emotion-driven variation
  • Breathing patterns

Synthetic Speech:

  • Consistent pitch/tone
  • Mechanical pauses
  • Lack of emotional variation
  • No breathing sounds (or artificial breathing)

Analysis:

{
  "prosodyAnalysis": {
    "pitchVariation": 12, // 0-100, low = suspicious
    "naturalPauses": false,
    "emotionalRange": 8, // Very low
    "breathingDetected": false,
    "verdict": "likely_synthetic"
  }
}
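A pitchVariation score like the one above can be approximated from a pitch contour. This sketch (the scaling constants are assumptions, not the production formula) uses the coefficient of variation of per-frame F0 values:

```javascript
// Illustrative pitchVariation score. `pitchHz` is a per-frame F0 contour
// with unvoiced frames removed; the 0-100 scaling is an assumption.
function pitchVariationScore(pitchHz) {
  const n = pitchHz.length;
  const mean = pitchHz.reduce((a, b) => a + b, 0) / n;
  const variance = pitchHz.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  const cv = Math.sqrt(variance) / mean; // coefficient of variation
  // Natural speech typically shows cv of roughly 0.2+; map 0..0.25 to 0..100.
  return Math.min(100, Math.round(cv * 400));
}
```

A near-flat synthetic contour scores very low, matching the "low = suspicious" interpretation above.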

3. Voice Biometrics

Compare voice to known sample for speaker verification.

Use Case: Verify caller is who they claim to be

Process:

  1. User provides known voice sample (enrollment)
  2. Subsequent calls analysed for voice match
  3. Deepfake detection + biometric matching

Result:

{
  "voiceBiometrics": {
    "matchScore": 42, // 0-100, should be 85+ for match
    "result": "no_match",
    "reason": "Spectral characteristics differ from enrolled sample",
    "deepfakeIndicators": [
      "Synthetic voice detected",
      "Voice characteristics inconsistent with enrollment"
    ]
  }
}
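The matching step can be sketched with cosine similarity between speaker embeddings. In practice the embeddings would come from a speaker-encoder model; here they are plain arrays, and the 0-100 scaling and 85 threshold mirror the example above but are otherwise assumptions:

```javascript
// Illustrative biometric matching: cosine similarity between the enrolled
// embedding and the caller's embedding, scaled to a 0-100 matchScore.
function matchScore(enrolled, sample) {
  let dot = 0, nA = 0, nB = 0;
  for (let i = 0; i < enrolled.length; i++) {
    dot += enrolled[i] * sample[i];
    nA += enrolled[i] ** 2;
    nB += sample[i] ** 2;
  }
  const cosine = dot / (Math.sqrt(nA) * Math.sqrt(nB)); // -1..1
  return Math.round(((cosine + 1) / 2) * 100);          // 0..100
}

function biometricResult(enrolled, sample, threshold = 85) {
  const score = matchScore(enrolled, sample);
  return { matchScore: score, result: score >= threshold ? 'match' : 'no_match' };
}
```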

4. Compression Artifacts

Detect digital manipulation through compression analysis.

Indicators:

  • Multiple compression layers (edited audio)
  • Inconsistent compression across file
  • Splice points with compression mismatch
  • Unusual codec combinations

Detection:

{
  "compressionAnalysis": {
    "multipleCompressionDetected": true,
    "layers": 3, // Indicates editing
    "splicePoints": [
      { "timestamp": "2.3s", "confidence": 89 },
      { "timestamp": "5.7s", "confidence": 92 }
    ],
    "verdict": "edited_audio"
  }
}

5. Background Consistency

Analyze ambient noise patterns.

Authentic Audio:

  • Consistent background noise
  • Natural environmental sounds
  • Smooth transitions

Manipulated Audio:

  • Abrupt background changes
  • Silence between words (noise removed)
  • Artificial background added
  • Mismatched environmental acoustics
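One of the indicators above, silence where noise is expected, can be sketched from per-frame energy: in authentic recordings the noise floor persists between words, so frames of near-total silence sandwiched between speech frames suggest noise was stripped during editing. Frame values and thresholds here are illustrative:

```javascript
// Illustrative check for "silence where noise expected": flag frames of
// near-total silence directly between two speech frames.
function findSuspiciousSilences(frameEnergies, silence = 0.001, speech = 0.05) {
  const gaps = [];
  for (let i = 1; i < frameEnergies.length - 1; i++) {
    if (frameEnergies[i] < silence &&
        frameEnergies[i - 1] > speech &&
        frameEnergies[i + 1] > speech) {
      gaps.push(i); // frame index of the suspicious gap
    }
  }
  return gaps;
}
```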

Analysis Process

Step 1: Upload Audio

const formData = new FormData();
formData.append('audio', audioFile);
formData.append('analysisType', 'voice_verification'); // or 'general'
formData.append('compareToSample', enrollmentAudioId); // Optional: voice biometrics

const response = await fetch('/api/v4/deepfake/analyse', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`
  },
  body: formData
});
const job = await response.json();

// Returns job ID
{
  "jobId": "job_audio123",
  "status": "processing",
  "estimatedTime": "8 seconds"
}

Step 2: Processing

Analysis Pipeline (5-10 seconds):

  1. Audio preprocessing (noise reduction, normalization)
  2. Spectral analysis
  3. Prosody and intonation analysis
  4. Compression artifact detection
  5. Background consistency check
  6. Voice biometrics (if enrollment sample provided)
  7. AI model fingerprinting
  8. Final risk scoring
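The final scoring step (step 8) can be pictured as a weighted combination of the per-stage scores. The weights below are invented for illustration; the real pipeline's weighting is not documented here:

```javascript
// Illustrative final risk scoring: combine per-stage 0-100 scores into one
// 0-100 risk figure. Stage names and weights are assumptions.
function combineRiskScores(stages) {
  const weights = { spectral: 0.3, prosody: 0.25, compression: 0.2, background: 0.15, biometrics: 0.1 };
  let total = 0, weightSum = 0;
  for (const [name, score] of Object.entries(stages)) {
    const w = weights[name];
    if (w === undefined) continue; // skip stages without a defined weight
    total += w * score;
    weightSum += w;
  }
  // Renormalise so missing stages (e.g. no enrollment sample) don't dilute the score.
  return weightSum ? Math.round(total / weightSum) : 0;
}
```

Renormalising over the weights actually present means an analysis run without biometrics still produces a comparable 0-100 score.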

Step 3: Receive Results

GET /api/v4/deepfake/jobs/:jobId
 
{
  "jobId": "job_audio123",
  "status": "completed",
  "audio": {
    "filename": "phone_call.mp3",
    "duration": "45 seconds",
    "format": "MP3",
    "sampleRate": "44100 Hz",
    "bitrate": "128 kbps"
  },
  "result": {
    "isDeepfake": true,
    "confidence": 91,
    "manipulationType": "voice_cloning",
    "aiModel": "ElevenLabs-like (suspected)",
    "riskScore": 88,
    "recommendation": "reject"
  },
  "analysis": {
    "spectralAnalysis": {
      "suspiciousPatterns": true,
      "confidence": 89
    },
    "prosodyAnalysis": {
      "pitchVariation": 15,
      "naturalPauses": false,
      "verdict": "synthetic"
    },
    "compressionAnalysis": {
      "multipleCompressionDetected": false
    },
    "backgroundConsistency": {
      "consistent": false,
      "issues": ["Abrupt silence between words"]
    }
  },
  "segments": [
    {
      "start": "0.0s",
      "end": "5.2s",
      "deepfakeConfidence": 94,
      "text": "This is John Smith calling about..."
    },
    {
      "start": "5.2s",
      "end": "12.8s",
      "deepfakeConfidence": 88,
      "text": "...the urgent transfer request..."
    }
  ]
}

Confidence Scores

Score | Assessment | Action
90-100% | Highly likely deepfake | Reject / hang up
70-89% | Likely deepfake | Manual verification required
40-69% | Uncertain | Additional authentication
0-39% | Likely authentic | Proceed
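The confidence table above can be expressed directly as a routing function. The action names are illustrative; wire them to your own call-handling logic:

```javascript
// Map a deepfake confidence score (0-100) to the action in the table above.
function deepfakeAction(confidence) {
  if (confidence >= 90) return 'reject';
  if (confidence >= 70) return 'manual_verification';
  if (confidence >= 40) return 'additional_authentication';
  return 'proceed';
}
```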

Audio Requirements

Technical Requirements

Requirement | Specification
Format | MP3, WAV, M4A, OGG
Min sample rate | 16 kHz
Recommended sample rate | 44.1 kHz
Min duration | 3 seconds
Max duration | 5 minutes
Max file size | 20 MB
Min bitrate | 64 kbps
Channels | Mono or stereo

Quality Checks

Reject If:

  • Sample rate < 16 kHz (too low for analysis)
  • Duration < 3 seconds (insufficient data)
  • Heavy noise (SNR < 10 dB)
  • Clipped/distorted audio
  • No speech detected
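A client-side pre-check mirroring these rules lets you reject obviously unusable files before spending credits. The field names on `meta` are assumptions for this sketch, not part of the VeriPlus API:

```javascript
// Illustrative pre-upload validation against the quality rules above.
function validateAudio(meta) {
  const errors = [];
  if (meta.sampleRateHz < 16000) errors.push('sample rate below 16 kHz');
  if (meta.durationSec < 3) errors.push('shorter than 3 seconds');
  if (meta.durationSec > 300) errors.push('longer than 5 minutes');
  if (meta.fileSizeBytes > 20 * 1024 * 1024) errors.push('larger than 20 MB');
  if (meta.bitrateKbps < 64) errors.push('bitrate below 64 kbps');
  if (meta.snrDb < 10) errors.push('too noisy (SNR below 10 dB)');
  return { ok: errors.length === 0, errors };
}
```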

Use Cases

Phone-Based Verification

Scenario: Bank customer calls to authorise wire transfer

Process:

  1. Customer enrolled voice sample on file
  2. Customer calls and speaks passphrase
  3. Voice biometrics: Verify speaker identity
  4. Deepfake detection: Check for voice cloning
  5. Approve/Deny based on combined score

Implementation:

{
  "verificationType": "phone",
  "enrollmentSampleId": "enroll_abc123",
  "passphrase": "My voice is my password",
  "requirePassphraseMatch": true,
  "deepfakeThreshold": 70,
  "biometricThreshold": 85
}
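The approve/deny step can be sketched as a combined decision over both signals. The thresholds mirror the configuration above; the function and field names are illustrative, not the VeriPlus SDK:

```javascript
// Illustrative combined decision for phone verification: deny if the
// deepfake confidence crosses its threshold, the biometric score falls
// short, or a required passphrase did not match.
function verifyCaller({ deepfakeConfidence, biometricScore, passphraseMatched },
                      { deepfakeThreshold = 70, biometricThreshold = 85,
                        requirePassphraseMatch = true } = {}) {
  if (deepfakeConfidence >= deepfakeThreshold) {
    return { approved: false, reason: 'suspected_deepfake' };
  }
  if (biometricScore < biometricThreshold) {
    return { approved: false, reason: 'biometric_mismatch' };
  }
  if (requirePassphraseMatch && !passphraseMatched) {
    return { approved: false, reason: 'passphrase_mismatch' };
  }
  return { approved: true, reason: 'verified' };
}
```

Checking the deepfake signal first means a cloned voice is rejected even if it happens to score well biometrically.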

CEO Fraud Prevention

Scenario: Employee receives call from "CEO" requesting urgent wire transfer

Red Flags:

  • Urgency and secrecy requested
  • Unusual request (CEO doesn't normally call)
  • Poor call quality (may hide deepfake artifacts)

Verification:

  1. Record call audio
  2. Run deepfake detection
  3. If suspicious, call CEO back on known number
  4. Implement dual-approval for unusual requests

Voice Authentication

Scenario: Customer service voice verification

Multi-Factor Check:

  1. Knowledge: Answer security questions
  2. Voice Biometrics: Match enrolled voice
  3. Deepfake Detection: Ensure voice is real
  4. Behavioral: Analyze speech patterns

Best Practices

  1. Enroll voice samples - Collect clean voice sample during onboarding
  2. Set 70%+ deepfake threshold - Balance false positives vs. fraud
  3. Combine with other factors - Voice + knowledge questions + SMS code
  4. Use passphrases - Harder to clone specific phrases
  5. Monitor call quality - Poor quality may hide artifacts
  6. Train staff - Recognise social engineering red flags
  7. Callback verification - For high-risk requests, call back on known number
  8. Time delays - Implement cooling-off period for unusual requests

Limitations

Detection Accuracy

Current Performance:

  • Known TTS models: 95%
  • Voice cloning: 90%
  • Audio splicing: 93%
  • Overall: 90%

Challenges:

  • High-quality voice clones (10+ minutes of training data)
  • Professional audio editing
  • Low-quality phone calls (masks artifacts)
  • Background noise interference

False Positives

Common Causes:

  • Poor phone connection (adds artifacts)
  • Background noise (hides natural speech patterns)
  • Non-native speakers (different prosody)
  • Medical conditions (affects voice characteristics)
  • Emotions (crying, stress alters voice)

Mitigation:

  • Use 70%+ threshold
  • Manual review for 70-89% range
  • Request callback if uncertain
  • Document legitimate reasons for unusual characteristics

Emerging Threats

Real-Time Voice Conversion

Technology:

  • Real-time voice-to-voice transformation
  • Latency < 100ms (imperceptible)
  • Maintains emotion and prosody
  • Very sophisticated

Detection:

  • Spectral anomalies still present
  • Slight latency in responses
  • Background noise inconsistencies

Emotion Synthesis

New Capability:

  • AI models that add emotion to synthetic speech
  • Crying, laughing, stress
  • Makes voice clones more convincing

Counter-Measures:

  • Analyze emotional transitions (synthetic often too perfect)
  • Check for natural vocal strain
  • Verify emotional context matches content

API Reference

Analyze Audio

POST /api/v4/deepfake/analyse
 
// multipart/form-data
{
  "audio": File,
  "analysisType": "voice_verification" | "general",
  "enrollmentSampleId": string, // Optional: for biometrics
  "returnSegmentAnalysis": boolean // Per-segment deepfake scores
}
 
// Returns
{
  "jobId": "job_audio123",
  "status": "processing",
  "estimatedTime": "8 seconds"
}

Enroll Voice Sample

POST /api/v4/voice/enroll
 
// multipart/form-data
{
  "audio": File,
  "applicantId": string,
  "passphrase": string // Optional: specific phrase
}
 
// Returns enrollment ID for future biometric matching
{
  "enrollmentId": "enroll_abc123",
  "quality": 94,
  "status": "active"
}

Verify Speaker

POST /api/v4/voice/verify
 
{
  "enrollmentId": "enroll_abc123",
  "audioSampleId": "job_audio123",
  "includeDeepfakeCheck": true
}
 
// Returns combined biometric + deepfake result
{
  "biometricMatch": {
    "score": 92,
    "result": "match"
  },
  "deepfakeDetection": {
    "isDeepfake": false,
    "confidence": 18
  },
  "overallResult": "verified",
  "confidence": 89
}

Pricing

Service | Processing Time | Cost
Audio deepfake detection | 5-10 seconds | 3 credits
Voice enrollment | 3-5 seconds | 3 credits
Voice verification (biometric + deepfake) | 8-12 seconds | Contact us

Regulatory Landscape

Biometric Data Privacy

GDPR (EU):

  • Voice is biometric data
  • Explicit consent required
  • Right to erasure applies
  • Encryption required

BIPA (Illinois, US):

  • Written policy required
  • User consent before collection
  • Cannot sell biometric data

CCPA (California):

  • Privacy notice required
  • Opt-out right
  • Deletion right

Deepfake Regulations

US State Laws:

  • California AB 730: Criminalises deepfake videos in elections
  • Texas HB 3004: Criminalises deepfake videos without disclosure
  • Virginia HB 2678: Criminalises non-consensual deepfake pornography

EU AI Act (Proposed):

  • Disclosure requirements for deepfakes
  • Transparency obligations
  • High-risk AI system regulations

Next Steps

See it in action

Experience the full power of VeriPlus compliance platform.

Start Free Trial
