Overview

This dataset contains 5597 labeled segments. The clips were labeled with six stuttering-related event types and come with additional metadata. The labels in KSoF are compatible with the labels from the large English dataset SEP-28k, originally created by researchers at Apple. An extended and improved version of SEP-28k, SEP28k-E, can be found on GitHub.

Content

Files

The release contains this readme file, a copy of the KSoF EULA, a CSV file containing the labels, and a folder named segments, which contains one audio file for each clip in the dataset. The filename of each segment matches the column segment_id in the kassel-state-of-fluency-labels.csv file.

KSoF_release
├── Readme.md
├── KSoF_EULA.pdf
├── kassel-state-of-fluency-labels.csv
└── segments
    ├── 0000.wav
    ├── 0001.wav
    ├── 0002.wav
    ├── 0003.wav
    ├── 0004.wav
    ....
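
The link between the label file and the audio files can be sketched in a few lines of Python. This is a minimal sketch, not part of the release: the function name `load_segments` is hypothetical, and it assumes the CSV is comma-separated with a header row, is UTF-8 encoded, and that each audio file is named `<segment_id>.wav` as in the tree above.

```python
import csv
from pathlib import Path


def load_segments(release_dir):
    """Return (label_row, audio_path) pairs for a KSoF release folder."""
    release_dir = Path(release_dir)
    labels_csv = release_dir / "kassel-state-of-fluency-labels.csv"
    pairs = []
    with labels_csv.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Keep segment_id as a string so leading zeros ("0000") survive.
            segment_id = row["segment_id"]
            # Assumption: one file per clip, named <segment_id>.wav.
            audio_path = release_dir / "segments" / f"{segment_id}.wav"
            pairs.append((row, audio_path))
    return pairs
```

Reading segment_id as text rather than as a number matters here, because a numeric parse would turn `0000` into `0` and the filename lookup would fail.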

Labels

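| | segment_id | speaker | utterance | therapy_status | gender | recording_device | partition | Block | Prolongation | Sound Repetition | Word / Phrase Repetition | No dysfluencies | Modified/ Speech technique | Interjection | Natural pause | Unintelligible | Unsure | No Speech | Poor Audio Quality | Music (Background Noise) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0000 | 000 | 000 | nIK | m | dg | dvel | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0001 | 000 | 000 | nIK | m | dg | dvel | 1 | 0 | 0 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |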
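
The label columns hold small integers rather than yes/no flags. This readme does not state what the integers mean; the sketch below assumes they count how many annotators marked the event, as in SEP-28k, and turns them into binary labels with a vote threshold. The function name `binarize_labels` and the default threshold of 2 are illustrative choices, not part of the dataset.

```python
def binarize_labels(row, columns, min_votes=2):
    """Map per-column counts to 0/1 flags using a vote threshold.

    Assumption: each value is the number of annotators who marked the event.
    """
    return {name: int(int(row[name]) >= min_votes) for name in columns}


# Values taken from the second sample row (segment 0001) shown above.
sample_row = {"Block": "1", "Modified/ Speech technique": "1", "Interjection": "3"}
flags = binarize_labels(sample_row, list(sample_row))
# flags == {"Block": 0, "Modified/ Speech technique": 0, "Interjection": 1}
```

Passing the column names explicitly keeps the sketch independent of the exact header spelling in the CSV, which should be checked against the file itself.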

Columns:

Literature

For more details and baseline experiments, please see the pre-print on arXiv or the paper in the LREC conference proceedings.

Please cite:

@inproceedings{bayerl_KSoFKasselState_2022,
  title = {KSoF: The Kassel State of Fluency Dataset -- A Therapy Centered Dataset of Stuttering},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  author = {Bayerl, Sebastian Peter and {Wolff von Gudenberg}, Alexander and H{\"o}nig, Florian and Noeth, Elmar and Riedhammer, Korbinian},
  year = {2022},
  month = jun,
  pages = {1780--1787},
  publisher = {European Language Resources Association},
  address = {Marseille, France},
  keywords = {Computer Science - Computation and Language,Electrical Engineering and Systems Science - Audio and Speech Processing},
}

Further reading:

@incollection{bayerl_InfluenceDatasetPartitioning_2022,
  title = {The Influence of Dataset Partitioning on Dysfluency Detection Systems},
  booktitle = {Text, Speech, and Dialogue},
  author = {Bayerl, Sebastian P. and Wagner, Dominik and N{\"o}th, Elmar and Bocklet, Tobias and Riedhammer, Korbinian},
  editor = {Sojka, Petr and Kope{\v c}ek, Ivan and Pala, Karel and Hor{\'a}k, Ale{\v s}},
  year = {2022},
  url = {https://arxiv.org/abs/2206.03400},
  publisher = {Springer International Publishing}
}


@inproceedings{bayerl_DetectingDysfluenciesStuttering_2022,
  title = {Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0},
  author = {Bayerl, Sebastian Peter and Wagner, Dominik and N{\"o}th, Elmar and Riedhammer, Korbinian},
  booktitle = {Proc. Interspeech 2022},
  year = {2022},
  url = {https://arxiv.org/abs/2204.03417},
}


@inproceedings{lea_SEP28kDatasetStuttering_2021,
  title = {SEP-28k: A Dataset for Stuttering Event Detection from Podcasts with People Who Stutter},
  shorttitle = {SEP-28k},
  booktitle = {ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  author = {Lea, Colin and Mitra, Vikramjit and Joshi, Aparna and Kajarekar, Sachin and Bigham, Jeffrey P.},
  year = {2021},
  month = jun,
  pages = {6798--6802},
  publisher = {IEEE},
  address = {Toronto, ON, Canada},
  doi = {10.1109/ICASSP39728.2021.9413520},
}