  • speech data
VocADomA4H -- Acoustic recordings
This repository contains the acoustic signals of the VocADom@A4H dataset: https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/vocadoma4h/. This part of the data is restricted but can be accessed by signing a form.
Read me file
readme.txt
*The VocADom@A4H corpus*

This dataset contains the complementary files (acoustic files) of the VocADom@A4H corpus whose main website is:
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/vocadoma4h/

Further information is also available on the VOCADOM project website: http://vocadom.imag.fr

This dataset contains a corpus of about 12 hours of data from 11 different recording sessions in the Amiqual4Home smart home. The experiment was conducted between May and June 2017 as part of the VocADom project supported by the Agence Nationale de la Recherche under grant ANR-16-CE33-0006.

If you use the corpus or need more details, please refer to the following paper: Context-Aware Voice-based Interaction in Smart Home - VocADom@A4H Corpus Collection and Empirical Assessment of its Usefulness
@InProceedings{portet:hal-02165532,
author = "Portet, Fran{\c c}ois and Caffiau, Sybille and Ringeval, Fabien and Vacher, Michel and Bonnefond, Nicolas and Rossato, Solange and Lecouteux, Benjamin and Desot, Thierry",
title = "Context-Aware Voice-based Interaction in Smart Home - VocADom@A4H Corpus Collection and Empirical Assessment of its Usefulness",
booktitle = "17th IEEE International Conference on Pervasive Intelligence and Computing (PICom 2019)",
year = "2019",
location = "Fukuoka, Japan",
url = "https://hal.archives-ouvertes.fr/hal-02165532"
}


*Aims and protocol of the recording*
The experiment was performed to study voice commands in a multi-room smart home and in a multi-dweller setting. Usual home automation sensors (movement detector, door contact detector, temperature, etc.) as well as signals from microphone arrays were captured.
Eleven participants uttered voice commands while performing scripted activities of daily living for about one hour of recording per participant. At the beginning of each session for each participant, the voice command grammar was not imposed as the aim was to elicit spontaneous speech. Then, the participants had to follow an increasingly constrained grammar. Using a Wizard-of-Oz approach, out-of-sight experimenters enacted user commands, acting as a 'perfect' voice command system.
For each participant, the whole experiment session was recorded continuously without interruption. Within a session, 3 phases were identified:

Phase 1 - Graphical based instruction to elicit spontaneous voice commands (interaction with the home)
Phase 2 - Inhabitant scenario enacting a visit by a friend (interaction with the home and the visitor)
Phase 3 - Voice commands in noisy domestic environment (reading of voice commands in the home - no interaction)

This dataset is intended to be useful for the following (non-exhaustive) tasks:

multi-human localization
multi-human activity recognition
smart home context modeling
multi-channel voice activity detection
multi-channel automatic speech recognition
multi-channel spoken language understanding
multi-channel speaker recognition
multi-channel speech enhancement
multi-channel blind source separation
multi-channel automatic decision making


*What is in this dataset*

This dataset contains the recorded acoustic signals whose usage is restricted to research only. Beware that all recorded speech utterances were in French.

All the data is stored under the record/ directory, which contains 11 sub-directories named S[00-10]/. Each of these has the following structure:

mic_array/ (microphone array recordings) available after having signed the End-User License Agreement (EULA)

mic_headset/ (headset microphone recordings) available after having signed the End-User License Agreement (EULA)
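As a minimal sketch of the layout described above, the following Python snippet builds the relative paths one would expect for each session. The file names (channel_N.wav, Sxx.wav) are taken from the archive file listing on this page; the root name "record" follows the text, and the helper name session_files is hypothetical. The exclusion of S05 and S06 reflects the absence of mic_array files for these sessions in the listing.

```python
from pathlib import PurePosixPath

# Sessions S00..S10, as named in the text above.
SESSIONS = [f"S{i:02d}" for i in range(11)]
# Sessions with no mic_array files in the archive listing (see Known Issues).
NO_ARRAY = {"S05", "S06"}


def session_files(session, root="record"):
    """Return the relative paths expected for one session (hypothetical helper)."""
    base = PurePosixPath(root) / session
    paths = [base / "mic_headset" / f"{session}.wav"]
    if session not in NO_ARRAY:
        # 16 array channels, numbered from 1 to 16.
        paths += [base / "mic_array" / f"channel_{k}.wav" for k in range(1, 17)]
    return paths


all_files = [p for s in SESSIONS for p in session_files(s)]
print(len(all_files))  # 11 headset files + 9 * 16 array files = 155
```

The root directory inside the downloaded archive may differ from "record"; adjust the root argument to match what is actually extracted.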



mic_array:

The 16-channel recording of the experiment was made with 4 arrays of 4 microphones each, every array arranged in a square with 10 cm sides.
Each microphone was a t.bone LC 97 TWS (https://www.thomann.de/fr/the_tbone_lc97_tws.htm).
Recording was performed using Kristal Audio Engine version 1.0.1 (Jun 1 2004) on 64-bit Windows 8.1.
Each channel is a mono 16-bit signed integer PCM signal sampled at 16 kHz.
Array I (resp. II, III, IV) is composed of channel_[1-4].wav (resp. [5-8], [9-12], [13-16]).
The floor plan giving the precise location of these arrays is available on the main repository of the dataset.
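The channel grouping and audio format above can be checked with a short Python sketch using only the standard library. The function names array_of_channel and matches_documented_format are hypothetical and not part of any tooling shipped with the dataset.

```python
import wave


def array_of_channel(channel):
    """Map a channel number (1-16) to its array label (I-IV)."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be between 1 and 16")
    # Channels 1-4 -> I, 5-8 -> II, 9-12 -> III, 13-16 -> IV.
    return ("I", "II", "III", "IV")[(channel - 1) // 4]


def matches_documented_format(path):
    """Check that a WAV file is mono, 16-bit, 16 kHz as documented."""
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getnchannels() == 1
            and wav.getsampwidth() == 2  # 2 bytes per sample = 16 bits
            and wav.getframerate() == 16000
        )
```

For example, array_of_channel(10) returns "III", since channel_10.wav belongs to the third group of four channels.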

Known Issues:
- For S05 and S06, the microphone array recordings were damaged. These files have not been recovered and hence cannot be used with confidence.


mic_headset:

A wireless microphone worn by the participant.
It was a SENNHEISER HSP4 -ew-3 cardioid condenser microphone with a 3.5 mm jack.
The recording was performed using Audacity 2.0.5 on 64-bit Ubuntu 14.04 LTS.
Mono 16-bit signed integer PCM acquired at 16 kHz.

Known Issues:
- The first 15 minutes of the headset microphone recording of S04 are missing. They have been padded with silence, using the sox pad option.
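To skip the padded part of S04 when processing the headset recording, one option is to measure the silent prefix directly. The sketch below assumes that the padding consists of exactly zero-valued samples (an assumption about the padded file, not something stated in the readme); the function name leading_silence_seconds is hypothetical.

```python
import array
import wave


def leading_silence_seconds(path, chunk_frames=16000):
    """Return the duration of the all-zero prefix of a mono 16-bit WAV file."""
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = wav.getframerate()
        silent = 0  # number of zero samples seen so far
        while True:
            data = wav.readframes(chunk_frames)
            if not data:
                break
            samples = array.array("h", data)  # signed 16-bit samples
            for i, sample in enumerate(samples):
                if sample != 0:
                    return (silent + i) / rate
            silent += len(samples)
    # The whole file is silent.
    return silent / rate
```

If the assumption holds, the value returned for the S04 headset file should be close to the 15 minutes mentioned above.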


*What is NOT in this dataset*

All other data are freely available at:
https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/vocadoma4h/

Among other data, the End User will find the speech transcripts and the home automation logs of the recording sessions as well as documents about the smart home, participants and material used during the experiment.
2020 01 09
The size of this dataset is more than 4000 MB
Archive files
vocadoma4h_20191217.zip
2020 02 04
9.92 GB
  • S00 / mic_array / channel_10.wav 116 187 543 ko
  • S00 / mic_array / channel_11.wav 116 187 543 ko
  • S00 / mic_array / channel_12.wav 116 187 543 ko
  • S00 / mic_array / channel_13.wav 116 187 543 ko
  • S00 / mic_array / channel_14.wav 116 187 543 ko
  • S00 / mic_array / channel_15.wav 116 187 543 ko
  • S00 / mic_array / channel_16.wav 116 187 543 ko
  • S00 / mic_array / channel_1.wav 116 187 543 ko
  • S00 / mic_array / channel_2.wav 116 187 543 ko
  • S00 / mic_array / channel_3.wav 116 187 543 ko
  • S00 / mic_array / channel_4.wav 116 187 543 ko
  • S00 / mic_array / channel_5.wav 116 187 543 ko
  • S00 / mic_array / channel_6.wav 116 187 543 ko
  • S00 / mic_array / channel_7.wav 116 187 543 ko
  • S00 / mic_array / channel_8.wav 116 187 543 ko
  • S00 / mic_array / channel_9.wav 116 187 543 ko
  • S00 / mic_headset / S00.wav 116 187 543 ko
  • S01 / mic_array / channel_10.wav 90 656 293 ko
  • S01 / mic_array / channel_11.wav 90 656 293 ko
  • S01 / mic_array / channel_12.wav 90 656 293 ko
  • S01 / mic_array / channel_13.wav 90 656 293 ko
  • S01 / mic_array / channel_14.wav 90 656 293 ko
  • S01 / mic_array / channel_15.wav 90 656 293 ko
  • S01 / mic_array / channel_16.wav 90 656 293 ko
  • S01 / mic_array / channel_1.wav 90 656 293 ko
  • S01 / mic_array / channel_2.wav 90 656 293 ko
  • S01 / mic_array / channel_3.wav 90 656 293 ko
  • S01 / mic_array / channel_4.wav 90 656 293 ko
  • S01 / mic_array / channel_5.wav 90 656 293 ko
  • S01 / mic_array / channel_6.wav 90 656 293 ko
  • S01 / mic_array / channel_7.wav 90 656 293 ko
  • S01 / mic_array / channel_8.wav 90 656 293 ko
  • S01 / mic_array / channel_9.wav 90 656 293 ko
  • S01 / mic_headset / S01.wav 90 656 293 ko
  • S02 / mic_array / channel_10.wav 124 062 543 ko
  • S02 / mic_array / channel_11.wav 124 062 543 ko
  • S02 / mic_array / channel_12.wav 124 062 543 ko
  • S02 / mic_array / channel_13.wav 124 062 543 ko
  • S02 / mic_array / channel_14.wav 124 062 543 ko
  • S02 / mic_array / channel_15.wav 124 062 543 ko
  • S02 / mic_array / channel_16.wav 124 062 543 ko
  • S02 / mic_array / channel_1.wav 124 062 543 ko
  • S02 / mic_array / channel_2.wav 124 062 543 ko
  • S02 / mic_array / channel_3.wav 124 062 543 ko
  • S02 / mic_array / channel_4.wav 124 062 543 ko
  • S02 / mic_array / channel_5.wav 124 062 543 ko
  • S02 / mic_array / channel_6.wav 124 062 543 ko
  • S02 / mic_array / channel_7.wav 124 062 543 ko
  • S02 / mic_array / channel_8.wav 124 062 543 ko
  • S02 / mic_array / channel_9.wav 124 062 543 ko
  • S02 / mic_headset / S02.wav 124 062 543 ko
  • S03 / mic_array / channel_10.wav 129 718 793 ko
  • S03 / mic_array / channel_11.wav 129 718 793 ko
  • S03 / mic_array / channel_12.wav 129 718 793 ko
  • S03 / mic_array / channel_13.wav 129 718 793 ko
  • S03 / mic_array / channel_14.wav 129 718 793 ko
  • S03 / mic_array / channel_15.wav 129 718 793 ko
  • S03 / mic_array / channel_16.wav 129 718 793 ko
  • S03 / mic_array / channel_1.wav 129 718 793 ko
  • S03 / mic_array / channel_2.wav 129 718 793 ko
  • S03 / mic_array / channel_3.wav 129 718 793 ko
  • S03 / mic_array / channel_4.wav 129 718 793 ko
  • S03 / mic_array / channel_5.wav 129 718 793 ko
  • S03 / mic_array / channel_6.wav 129 718 793 ko
  • S03 / mic_array / channel_7.wav 129 718 793 ko
  • S03 / mic_array / channel_8.wav 129 718 793 ko
  • S03 / mic_array / channel_9.wav 129 718 793 ko
  • S03 / mic_headset / S03.wav 129 718 793 ko
  • S04 / mic_array / channel_10.wav 118 312 543 ko
  • S04 / mic_array / channel_11.wav 118 312 543 ko
  • S04 / mic_array / channel_12.wav 118 312 543 ko
  • S04 / mic_array / channel_13.wav 118 312 543 ko
  • S04 / mic_array / channel_14.wav 118 312 543 ko
  • S04 / mic_array / channel_15.wav 118 312 543 ko
  • S04 / mic_array / channel_16.wav 118 312 543 ko
  • S04 / mic_array / channel_1.wav 118 312 543 ko
  • S04 / mic_array / channel_2.wav 118 312 543 ko
  • S04 / mic_array / channel_3.wav 118 312 543 ko
  • S04 / mic_array / channel_4.wav 118 312 543 ko
  • S04 / mic_array / channel_5.wav 118 312 543 ko
  • S04 / mic_array / channel_6.wav 118 312 543 ko
  • S04 / mic_array / channel_7.wav 118 312 543 ko
  • S04 / mic_array / channel_8.wav 118 312 543 ko
  • S04 / mic_array / channel_9.wav 118 312 543 ko
  • S04 / mic_headset / S04.wav 118 312 543 ko
  • S05 / mic_headset / S05.wav 121 750 043 ko
  • S06 / mic_headset / S06.wav 103 812 543 ko
  • S07 / mic_array / channel_10.wav 118 843 793 ko
  • S07 / mic_array / channel_11.wav 118 843 793 ko
  • S07 / mic_array / channel_12.wav 118 843 793 ko
  • S07 / mic_array / channel_13.wav 118 843 793 ko
  • S07 / mic_array / channel_14.wav 118 843 793 ko
  • S07 / mic_array / channel_15.wav 118 843 793 ko
  • S07 / mic_array / channel_16.wav 118 843 793 ko
  • S07 / mic_array / channel_1.wav 118 843 793 ko
  • S07 / mic_array / channel_2.wav 118 843 793 ko
  • S07 / mic_array / channel_3.wav 118 843 793 ko
  • S07 / mic_array / channel_4.wav 118 843 793 ko
  • S07 / mic_array / channel_5.wav 118 843 793 ko
  • S07 / mic_array / channel_6.wav 118 843 793 ko
  • S07 / mic_array / channel_7.wav 118 843 793 ko
  • S07 / mic_array / channel_8.wav 118 843 793 ko
  • S07 / mic_array / channel_9.wav 118 843 793 ko
  • S07 / mic_headset / S07.wav 118 843 793 ko
  • S08 / mic_array / channel_10.wav 136 625 043 ko
  • S08 / mic_array / channel_11.wav 136 625 043 ko
  • S08 / mic_array / channel_12.wav 136 625 043 ko
  • S08 / mic_array / channel_13.wav 136 625 043 ko
  • S08 / mic_array / channel_14.wav 136 625 043 ko
  • S08 / mic_array / channel_15.wav 136 625 043 ko
  • S08 / mic_array / channel_16.wav 136 625 043 ko
  • S08 / mic_array / channel_1.wav 136 625 043 ko
  • S08 / mic_array / channel_2.wav 136 625 043 ko
  • S08 / mic_array / channel_3.wav 136 625 043 ko
  • S08 / mic_array / channel_4.wav 136 625 043 ko
  • S08 / mic_array / channel_5.wav 136 625 043 ko
  • S08 / mic_array / channel_6.wav 136 625 043 ko
  • S08 / mic_array / channel_7.wav 136 625 043 ko
  • S08 / mic_array / channel_8.wav 136 625 043 ko
  • S08 / mic_array / channel_9.wav 136 625 043 ko
  • S08 / mic_headset / S08.wav 136 625 043 ko
  • S09 / mic_array / channel_10.wav 145 031 293 ko
  • S09 / mic_array / channel_11.wav 145 031 293 ko
  • S09 / mic_array / channel_12.wav 145 031 293 ko
  • S09 / mic_array / channel_13.wav 145 031 293 ko
  • S09 / mic_array / channel_14.wav 145 031 293 ko
  • S09 / mic_array / channel_15.wav 145 031 293 ko
  • S09 / mic_array / channel_16.wav 145 031 293 ko
  • S09 / mic_array / channel_1.wav 145 031 293 ko
  • S09 / mic_array / channel_2.wav 145 031 293 ko
  • S09 / mic_array / channel_3.wav 145 031 293 ko
  • S09 / mic_array / channel_4.wav 145 031 293 ko
  • S09 / mic_array / channel_5.wav 145 031 293 ko
  • S09 / mic_array / channel_6.wav 145 031 293 ko
  • S09 / mic_array / channel_7.wav 145 031 293 ko
  • S09 / mic_array / channel_8.wav 145 031 293 ko
  • S09 / mic_array / channel_9.wav 145 031 293 ko
  • S09 / mic_headset / S09.wav 145 031 293 ko
  • S10 / mic_array / channel_10.wav 132 609 418 ko
  • S10 / mic_array / channel_11.wav 132 609 418 ko
  • S10 / mic_array / channel_12.wav 132 609 418 ko
  • S10 / mic_array / channel_13.wav 132 609 418 ko
  • S10 / mic_array / channel_14.wav 132 609 418 ko
  • S10 / mic_array / channel_15.wav 132 609 418 ko
  • S10 / mic_array / channel_16.wav 132 609 418 ko
  • S10 / mic_array / channel_1.wav 132 609 418 ko
  • S10 / mic_array / channel_2.wav 132 609 418 ko
  • S10 / mic_array / channel_3.wav 132 609 418 ko
  • S10 / mic_array / channel_4.wav 132 609 418 ko
  • S10 / mic_array / channel_5.wav 132 609 418 ko
  • S10 / mic_array / channel_6.wav 132 609 418 ko
  • S10 / mic_array / channel_7.wav 132 609 418 ko
  • S10 / mic_array / channel_8.wav 132 609 418 ko
  • S10 / mic_array / channel_9.wav 132 609 418 ko
  • S10 / mic_headset / S10.wav 132 609 418 ko
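The sizes in the listing are hard to read at face value: 116 187 543 ko per file would far exceed the 9.92 GB archive. One reading that fits the numbers, offered here as an assumption rather than a documented fact, is that a decimal separator was lost and each figure is expressed in units of 1024 bytes with three decimals (for example 116 187.543). The Python sketch below applies that assumption, together with an assumed 44-byte WAV header and the documented 16 kHz, 16-bit mono format, to estimate the duration of each session.

```python
SAMPLE_RATE = 16000    # Hz, from the format description
BYTES_PER_SAMPLE = 2   # 16-bit mono
HEADER_BYTES = 44      # assumed canonical PCM WAV header size


def listed_to_bytes(listed):
    """Convert a listed figure, read as thousandths of 1024 bytes, to bytes."""
    return round(listed * 1024 / 1000)


def duration_seconds(listed):
    """Estimate the audio duration of one file from its listed size."""
    payload = listed_to_bytes(listed) - HEADER_BYTES
    return payload / (SAMPLE_RATE * BYTES_PER_SAMPLE)


# One file size per session, copied from the listing (S00 to S10).
LISTED = [116187543, 90656293, 124062543, 129718793, 118312543, 121750043,
          103812543, 118843793, 136625043, 145031293, 132609418]

total_hours = sum(duration_seconds(x) for x in LISTED) / 3600
print(round(total_hours, 2))  # 11.89
```

Under these assumptions the eleven sessions add up to just under 12 hours, which is consistent with the "about 12 hours of data" stated in the readme.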
Other metadata
  • Subjects: Computer Science
  • Keywords: Speech processing, smart home, voice command
  • Corresponding tasks: spoken language translation, classification, pattern extraction, prediction, rule extraction, person detection, activity recognition, tracking
  • Encoding data format: wav files

François Portet, Sybille Caffiau, Fabien Ringeval, Michel Vacher, Nicolas Bonnefond (2020). VocADomA4H -- Acoustic recordings [Data set]. Published 2020 via Perscido-Grenoble-Alpes.