Distributed Relationship Mining over Big Scholar Data

Da Zhang, Mansur R. Kabuka

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


In this paper, we propose a system infrastructure that constructs big scholar data as a large knowledge graph, discovers the meta paths between entities, and calculates the relevancy between entities in the graph. The core infrastructure is built on the secure and private Amazon Elastic Compute Cloud (Amazon EC2) platform. It distributes the data evenly across the repositories, processes the data in parallel using the open-source Spark framework, manages computing resources optimally using YARN and Hadoop HDFS, and discovers relationships between different types of entities in a distributed manner. On top of this infrastructure, we incorporate four relationship discovery tasks: citation recommendation, potential collaborator discovery, similar venue measurement, and paper-to-venue recommendation. For the relationship mining tasks, we propose a mixed and weighted meta path (MWMP) method to explore potential relationships between different types of entities. To verify the accuracy and measure the parallelization speedup of our algorithm, we set up clusters on the Amazon EC2 platform.
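The abstract does not spell out how MWMP scores are computed, so the following is only a minimal sketch of the general idea of meta-path-based relevance in a heterogeneous graph, not the authors' method. It assumes a toy edge list, hypothetical relation names (`written_by`, `published_in`), hypothetical weights, and a simple weighted sum of path-instance counts; it runs on a single machine in plain Python and does not use Spark.

```python
from collections import defaultdict

# Hypothetical toy edges (source, relation, target); not data from the paper.
EDGES = [
    ("p1", "written_by", "a1"),
    ("p2", "written_by", "a1"),
    ("p2", "written_by", "a2"),
    ("p3", "written_by", "a2"),
    ("p1", "published_in", "v1"),
    ("p2", "published_in", "v1"),
    ("p3", "published_in", "v2"),
]


def build_index(edges):
    """Index neighbors by (node, relation); inverse relations are named '~rel'."""
    index = defaultdict(list)
    for src, rel, dst in edges:
        index[(src, rel)].append(dst)
        index[(dst, "~" + rel)].append(src)
    return index


def count_paths(index, start, meta_path):
    """Count path instances from `start` that follow the given relation sequence."""
    frontier = {start: 1}
    for rel in meta_path:
        next_frontier = defaultdict(int)
        for node, count in frontier.items():
            for neighbor in index.get((node, rel), []):
                next_frontier[neighbor] += count
        frontier = next_frontier
    return dict(frontier)


def relevance(index, start, weighted_meta_paths):
    """Combine several meta paths as a weighted sum of path counts (an assumption)."""
    scores = defaultdict(float)
    for weight, meta_path in weighted_meta_paths:
        for node, count in count_paths(index, start, meta_path).items():
            if node != start:  # skip trivial paths that return to the start node
                scores[node] += weight * count
    return dict(scores)


INDEX = build_index(EDGES)

# Hypothetical weights for two paper-to-paper meta paths.
PAPER_PATHS = [
    (0.7, ["written_by", "~written_by"]),      # paper -> author -> paper
    (0.3, ["published_in", "~published_in"]),  # paper -> venue -> paper
]

SCORES = relevance(INDEX, "p1", PAPER_PATHS)
print(SCORES)
```

In this toy graph, `p2` is the only paper that shares an author or a venue with `p1`, so it is the only paper that receives a score. A distributed version would partition the frontier expansion across workers, but that is outside what this sketch shows.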

Original language: English (US)
Article number: 8345598
Pages (from-to): 354-365
Number of pages: 12
Journal: IEEE Transactions on Emerging Topics in Computing
Issue number: 1
State: Published - Jan 1 2021


Keywords

  • Scholarly big data
  • distributed system
  • graph recommendation
  • heterogeneous information network

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications

