PerSCiDO facilitates the exploration of research datasets.

Share your research datasets using PerSCiDO!

Numbers
Datasets: 35
Downloaded: 1772
Refine

By Scientific Field

  • Computer Science (4)
  • Linguistics (4)
  • Information Technology
  • Arts and Medias
  • Behavioural Sciences
  • Mathematics (1)
  • Biology
  • social web
  • Engineering
  • Social Sciences
  • Medicine
  • Geography
  • Astrophysics and Astronomy
  • didactics
  • Environmental Science and Ecology
  • glaciology
  • Materials Science
  • Agriculture
  • Architecture
  • Biochemistry
  • Chemistry
  • Economy
  • Ethnology
  • Geology
  • History
  • Physics

By year

By DataType

  • experimental data (1)
  • graph data
  • textual data (1)
  • Web data
  • observation data
  • simulation data
  • Trace data
  • survey data
  • Software source code data
  • Video data
  • Image data (1)
  • speech data (1)
  • instrumentation data
  • Observation data simulation results

By Access

  • Open (4)
  • Restricted

Search results
4 results
  • Open
  • experimental data
  • Open
  • textual data
  • Sense Embeddings Models
  • Loïc Vial
  • This dataset contains the sense embedding models (sense vectors) produced for the article "Sense Embeddings in Knowledge-Based Word Sense Disambiguation" by Loïc Vial, Benjamin Lecouteux and Didier Schwab, in the proceedings of the 12th International Conference on Computational Semantics (IWCS 2017).
  • Open
  • Image data
  • SPEECH-COCO
  • Laurent Besacier
  • SPEECH-COCO is an augmentation of the MS-COCO dataset in which speech is added to the images and text. Speech captions were generated using text-to-speech (TTS) synthesis, resulting in 616,767 spoken captions (>600h) paired with images. Disfluencies and speed perturbation were added to the signal to make it sound more natural. Each speech signal (WAV) is paired with a JSON file containing the exact timecode of each word, syllable and phoneme in the spoken caption. Such a corpus could be used for Language and Vision (LaVi) tasks that take speech, instead of text, as input or output.
  • Open
  • speech data
  • Translation Augmented LibriSpeech Corpus
  • Laurent Besacier
  • A large-scale (>200h), publicly available corpus of read audiobooks. This corpus is an augmentation of the LibriSpeech ASR Corpus (1000h) and contains English utterances (from audiobooks) automatically aligned with French text. Our dataset offers ~236h of speech aligned to translated text. Speech recordings and source texts are originally from the Gutenberg Project, which is a digital library of public domain books read by volunteers. Our augmentation of LibriSpeech is straightforward: we automatically aligned e-books in a foreign language (French) with English utterances of LibriSpeech. We gathered open domain e-books in French and extracted the individual chapters available in the LibriSpeech Corpus. We then aligned the French chapters with the English utterances in order to provide a corpus of speech recordings aligned with their translations.