HSTOOL for Horizon Scanning of Scientific Literature

Authors:

  • Maja Karasalo
  • Johan Schubert

Publish date: 2019-05-07

Report number: FOI-R--4760--SE

Pages: 35

Written in: English

Keywords:

  • horizon scanning
  • scientometrics
  • Gibbs sampling
  • Dirichlet multinomial mixture model
  • entropy
  • clustering
  • HSTOOL

Abstract

In this report, we develop a methodology and a system for horizon scanning of scientific literature to discover scientific trends. Literature within a broadly defined field is automatically clustered by topic and ranked by scientific impact. We propose a method for determining the optimal number of clusters for the established Gibbs sampling Dirichlet multinomial mixture model (GSDMM) algorithm, along with a method for deriving descriptive and distinctive words for the discovered clusters. Furthermore, we propose a ranking methodology based on citation statistics to identify significant contributions within the discovered subject areas.
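
For readers unfamiliar with GSDMM, the following is a minimal sketch of the standard collapsed Gibbs sampler for a Dirichlet multinomial mixture over short tokenized documents. It is not the HSTOOL implementation, and it does not reproduce the report's method for choosing the number of clusters or for extracting descriptive words. The function name `gsdmm`, its parameters (`K`, `alpha`, `beta`, `n_iters`, `seed`), and the toy documents are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Illustrative GSDMM sketch: returns one cluster label per document.

    docs is a list of token lists; K is an upper bound on the cluster count.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    m = [0] * K                          # documents per cluster
    n = [0] * K                          # word tokens per cluster
    nw = [Counter() for _ in range(K)]   # per-cluster word counts
    z = []

    # Random initial assignment of every document to a cluster.
    for doc in docs:
        k = rng.randrange(K)
        z.append(k)
        m[k] += 1
        n[k] += len(doc)
        nw[k].update(doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            # Remove document d from its current cluster.
            k_old = z[d]
            m[k_old] -= 1
            n[k_old] -= len(doc)
            nw[k_old].subtract(doc)

            # Log of the unnormalized conditional for each cluster.
            counts = Counter(doc)
            log_p = []
            for k in range(K):
                lp = math.log(m[k] + alpha)
                for w, c in counts.items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                for i in range(len(doc)):
                    lp -= math.log(n[k] + vocab_size * beta + i)
                log_p.append(lp)

            # Sample a new cluster in proportion to exp(log_p).
            mx = max(log_p)
            weights = [math.exp(lp - mx) for lp in log_p]
            k_new = rng.choices(range(K), weights=weights)[0]

            z[d] = k_new
            m[k_new] += 1
            n[k_new] += len(doc)
            nw[k_new].update(doc)
    return z


if __name__ == "__main__":
    # Toy, made-up documents; real input would be tokenized titles or abstracts.
    toy_docs = [
        ["gibbs", "sampling", "mixture"],
        ["mixture", "model", "gibbs"],
        ["citation", "impact", "ranking"],
        ["ranking", "citation", "statistics"],
    ]
    labels = gsdmm(toy_docs, K=4)
    print("labels:", labels)
    print("non-empty clusters:", len(set(labels)))
```

In this kind of sampler, `K` acts only as an upper bound: clusters can empty out during sampling, so the number of non-empty clusters at the end is typically smaller than `K` and depends on `alpha`, `beta`, and the data.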