PerSCiDO facilitates the exploration of research datasets.

  • Web data
DM Authors
The DM-Authors dataset contains information about 4,906 researchers in the domain of data management. The dataset was crawled from DBLP in October 2014. For each researcher, it provides demographic attributes (gender, seniority, number of publications and publication rate) and activity attributes (the lists of venues and keywords that the researcher has contributed to).
Read me file
README.txt
The DM-Authors dataset contains information about 4,906 researchers in the domain of data management. The dataset was crawled from DBLP in October 2014. For each researcher, it provides demographic attributes (gender, seniority, number of publications and publication rate) and activity attributes (the lists of venues and keywords that the researcher has contributed to). More details are provided in the associated publication.

DM-Authors contains 3,189 male researchers, 459 female researchers and 1,259 researchers of unknown gender. The number of publications ranges from 3 to 885. Seniority (years since the first publication in DBLP) ranges from 1 to 67. Publication rate (number of publications per year) ranges from 0.18 to 59. The most frequent keyword in DM-Authors is, unsurprisingly, "data", with 37,987 occurrences. The number of unique keywords is 39,537.
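
As a minimal sketch of how the publication rate relates to the other two attributes, the function below divides the number of publications by the seniority in years. This is an assumption based on the definitions given above ("number of publications per year", "years since the first publication"); the exact computation used to build the dataset is not stated here, and the function name and the input values are hypothetical.

```python
def publication_rate(num_publications: int, seniority_years: int) -> float:
    """Publications per year, assuming rate = publications / seniority."""
    if seniority_years <= 0:
        # Seniority in DM-Authors starts at 1, so zero or negative is invalid.
        raise ValueError("seniority_years must be a positive number of years")
    return num_publications / seniority_years


# Hypothetical researcher: 36 publications over 12 years of activity.
rate = publication_rate(36, 12)
print(rate)
```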

DM-Authors contains four CSV files:
- authors.csv, where author ID, author name and other demographic information are available. This file contains 4,906 records.
- venues.csv, where each line contains the pair (author_id, venue), which means that the researcher identified by "author_id" has a publication in the conference/journal "venue". This file contains 179,508 records.
- keywords.csv, where each line contains the pair (author_id, keyword), which means that the researcher identified by "author_id" has contributed to the term "keyword" in his/her research. This file contains 1,131,556 records.
- groups.csv, where each line represents a user group built on top of DM-Authors. User groups are obtained using the LCM frequent itemset mining algorithm. Each line has the following structure: "items (support) members". The items are separated by commas and constitute the description of the group. The members are also separated by commas and give the names of the researchers who belong to that group. The support value is the number of members in the group. For instance, one group in the file "groups1.csv" is "senior, WWW, ICDE, SIGMOD Conference, CoRR abs/., (4) Narayanan Shivakumar, Philip Bohannon, Sihem Amer-Yahia, Zhe Zhao", which describes a group of four "senior" researchers who have published in "WWW", "ICDE", "SIGMOD" and "CoRR". Two group files are provided: "groups1.csv" contains a succinct list of 231,111 groups and "groups2.csv" contains 790,017 user groups. More details about generating user groups are provided in the associated publication.
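
The "items (support) members" structure of a group line can be parsed along the following lines. This is only a sketch: it assumes that the first parenthesised integer on a line is the support value and that neither items nor member names contain commas. The names Group and parse_group_line are hypothetical and are not part of the dataset.

```python
import re
from typing import List, NamedTuple


class Group(NamedTuple):
    items: List[str]    # description of the group (e.g. seniority, venues)
    support: int        # number of members, as written in the file
    members: List[str]  # names of the researchers in the group


# Matches the "(support)" token that separates items from members.
_SUPPORT = re.compile(r"\((\d+)\)")


def parse_group_line(line: str) -> Group:
    """Split one group line into items, support and members."""
    match = _SUPPORT.search(line)
    if match is None:
        raise ValueError(f"no '(support)' token found in line: {line!r}")
    # Empty tokens (e.g. from the comma just before the support) are dropped.
    items = [t.strip() for t in line[:match.start()].split(",") if t.strip()]
    members = [t.strip() for t in line[match.end():].split(",") if t.strip()]
    return Group(items, int(match.group(1)), members)


# The example group quoted in the README.
example = ("senior, WWW, ICDE, SIGMOD Conference, CoRR abs/., (4) "
           "Narayanan Shivakumar, Philip Bohannon, Sihem Amer-Yahia, Zhe Zhao")
group = parse_group_line(example)
print(group.items, group.support, group.members)
```

Comparing group.support with len(group.members) is a cheap sanity check when reading the full files, since the support is described as the number of members of each group.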

Dataset contributor: Behrooz Omidvar-Tehrani, Ohio State University

Dataset Affiliation: LIG

Associated publication: Behrooz Omidvar-Tehrani, Sihem Amer-Yahia, and Alexandre Termier. "Interactive user group analysis." Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM, 2015.

Keywords: user data, data management, user groups, prediction, recommendation, user data analysis

2017 03 17
The size of this dataset is between 100 and 500 MB.
Archive files
DM_authors.zip
2017 03 17
19.07 MB
  • .DS_Store (6.004 KB)
  • DM_authors/
  • DM_authors/._.DS_Store (0.173 KB)
  • authors.csv (151.225 KB)
  • DM_authors/._authors.csv (0.173 KB)
  • groups1.txt (31,669.397 KB)
  • DM_authors/._groups1.txt (0.173 KB)
  • groups2.txt (113,149.816 KB)
  • DM_authors/._groups2.txt (0.173 KB)
  • keywords.csv (15,591.976 KB)
  • DM_authors/._keywords.csv (0.173 KB)
  • venues.csv (3,308.250 KB)
  • DM_authors/._venues.csv (0.173 KB)
  • ._DM_authors (0.173 KB)
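
The archive listing mixes the data files with macOS metadata entries (".DS_Store" and files whose names start with "._"). A minimal sketch for listing only the data files with Python's standard zipfile module is given below. The helper names is_data_file and data_members are hypothetical, and the filter rule is an assumption based on the entry names shown above.

```python
import zipfile
from typing import List


def is_data_file(name: str) -> bool:
    """Return True for entries that are neither folders nor macOS metadata."""
    if name.endswith("/"):
        return False  # directory entry
    base = name.rsplit("/", 1)[-1]
    return not (base == ".DS_Store" or base.startswith("._"))


def data_members(archive) -> List[str]:
    """List the data files in a zip archive (a path or a file-like object)."""
    with zipfile.ZipFile(archive) as zf:
        return [name for name in zf.namelist() if is_data_file(name)]
```

For example, data_members("DM_authors.zip") would be expected to return only the CSV and group files, assuming the archive has been downloaded to the working directory.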
Other metadata
  • External Identifiers:
  • Subjects: Computer Science, social web
  • Keywords: prediction, data management, recommendation, user-data, user data analysis, user groups
  • Corresponding tasks: pattern extraction, prediction, clustering, visualisation
  • Encoding data format: csv

inproceedings{omidvar2015 interactive, title={Interactive user group analysis}, author={Omidvar-Tehrani, Behrooz and Amer-Yahia, Sihem and Termier, Alexandre}, booktitle={Proceedings of the 24th ACM International on Conference on Information and Kno, doi:10.18709/PERSCIDO.2016.10.DS32. Published 2017 via Perscido-Grenoble-Alpes;
