Experiment results for sarcastic TTS

An overview of sarcastic speech synthesis

An overview of SarcasticTTS

Update on Feb 23

Part 1. Denoising effect

Original Samples

Denoised by VoiceFixer

Denoised by DaVinci Resolve

Denoised by Adobe (from Xiyuan)

Part 2. Fine-tuned FS2 on MUStARD (MUStARD++)

Fine-tuned FS2 (MUStARD, w/o denoising)

Fine-tuned FS2 (MUStARD, w/ VoiceFixer)

Fine-tuned FS2 (MUStARD, w/ DaVinci Resolve)

Fine-tuned FS2 (MUStARD++, w/ Adobe)

Fine-tuned FS2 (MUStARD++ (sarcastic), w/ Adobe)

Fine-tuned FS2 (MUStARD++ (non-sarcastic), w/ Adobe)

Pre-trained FS2 (LibriTTS)

Part 3. Friends without sarcasm labels

Speaker Amount
Chandler 2566
Joey 3018
Monica 2598
Phoebe 2642
Rachel 2870
Ross 2917
Total 16611

Pre-trained on Friends TV (w/o denoising)

On test set (from back-translated text)

On validation set (512)

Pre-trained on Friends TV (denoised by VoiceFixer)

On test set (from back-translated text, spk: Chandler)

On test set (from back-translated text, spk: Rachel)

On validation set (512)

Friends with sarcasm labels (update on May 10)

Methods | Example 1 (non-sarcastic) | Example 1 (sarcastic) | Example 2 (non-sarcastic) | Example 2 (sarcastic)
Fine-tuned FS2
FS2 + Sarcasm Label
FS2 + ReferenceEncoder

MUStARD++ with different sarcasm and emotion types (update on May 24)

Methods | Example 1 | Example 2
Fine-tuned FS2 (PRO)
Fine-tuned FS2 (ILL)
Fine-tuned FS2 (Neutral)
Fine-tuned FS2 (Happy)

MUStARD++ with reference speech (update on June 07)

Methods | Example 1 (reference 1) | Example 1 (reference 2) | Example 1 (reference 3) | Example 1 (reference 4) | Example 2 (reference 1) | Example 2 (reference 2) | Example 2 (reference 3) | Example 2 (reference 4)
Fine-tuned FS2 (w/o feedback)
Fine-tuned FS2 (w/o feedback)

Evaluation on unseen data: sarcastic tweet texts (update on June 14)

Text (from iSarcasm) | Pre-trained | FT | FT (sarcastic) | FT (BERT) | FT (ref) | FT (feedback) | FT (2feedback)
Text 1
Text 2
Text 3
Text 4
Text 5
Text 6
Text 7
Text 8
Text 9
Text 10

StyleSpeech on MUStARD++ (update on June 28)

Methods | Example
Ground truth
Pre-trained FS2
Fine-tuned FS2
Pre-trained StyleSpeech
Fine-tuned StyleSpeech