General Information
    • ISSN: 2010-3700 (Online)
    • Abbreviated Title: Int. J. Mach. Learn. Comput.
    • Frequency: Bimonthly
    • DOI: 10.18178/IJMLC
    • Editor-in-Chief: Dr. Lin Huang
    • Executive Editor: Ms. Cherry L. Chen
    • Abstracting/Indexing: Scopus (since 2017), EI (INSPEC, IET), Google Scholar, Crossref, ProQuest, Electronic Journals Library.
    • E-mail: ijmlc@ejournal.net

IJMLC 2013 Vol.3(2): 219-223 ISSN: 2010-3700
DOI: 10.7763/IJMLC.2013.V3.306

Data Extract: Mining Context from the Web for Dataset Extraction

Ayush Singhal and Jaideep Srivastava
Abstract—In this paper we address the problem of dataset extraction from research articles. With digital data repositories growing and demand for data-centric research rising in the data mining community, finding an appropriate dataset for a research problem has become an essential step in scientific research. However, given the wide variety of data used in scientific research, it is difficult to determine which datasets are most useful for a particular research topic. An automated dataset search engine would alleviate this problem. In this work we propose a novel approach that uses “web intelligence” from academic search engines and online dictionaries to mine dataset names from research articles. We also compare different sources of “web knowledge”, namely academic search engines such as Google Scholar and Microsoft Academic Search. The performance of the approach is evaluated using standard information retrieval metrics: precision, recall and F-measure. The approach achieves an F-measure of 80%, which is a significant result for an unsupervised method.
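
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes a set-based comparison between extracted dataset names and a hand-labelled gold set. The function name `precision_recall_f1` and the example names are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
def precision_recall_f1(extracted, gold):
    """Set-based precision, recall and F-measure for extracted names."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example data, not results from the paper.
gold_names = {"MovieLens", "DBLP", "Reuters-21578", "Enron", "MNIST"}
extracted_names = {"MovieLens", "DBLP", "Enron", "WordNet"}

p, r, f = precision_recall_f1(extracted_names, gold_names)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

In this toy case three of the four extracted names are correct and three of the five gold names are found, so precision is 0.75 and recall is 0.60.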

Index Terms—Dataset, information retrieval, web mining, search engines.

The authors are with the CS department at the University of Minnesota, Minneapolis, MN 55414, USA (e-mail: ayush@cs.umn.edu, srivas@cs.umn.edu).


Cite: Ayush Singhal and Jaideep Srivastava, "Data Extract: Mining Context from the Web for Dataset Extraction," International Journal of Machine Learning and Computing, vol. 3, no. 2, pp. 219-223, 2013.
