Semi-automatic data-driven web analysis: research, prototype development, and experiments

Authors:

  • Magnus Rosell
  • Ulrika Wickenberg Bolin
  • Joel Brynielsson
  • Marianela Garcia Lozano
  • David Gustafsson
  • Andreas Horndahl
  • Maja Karasalo
  • Hanna Lilja
  • Björn Pelzer
  • Karl-Göran Stenborg
  • Erik Valldor
  • Stefan Varga

Publish date: 2019-02-08

Report number: FOI-R--4692--SE

Pages: 73

Written in: Swedish

Keywords:

  • data-driven analysis
  • intelligence analysis
  • intelligence studies
  • deception
  • influence operations
  • social media
  • web analysis
  • text analysis
  • image analysis
  • AI
  • machine learning
  • language technology

Abstract

This report describes methods and techniques for semi-automatic data-driven intelligence analysis based on unstructured text and image data from the web. It summarizes the activities of the three-year project TIA (Technologies for Information Fusion and Analysis, 2016-2018). Simply put, we have studied computer science methods that may support intelligence analysts in their work, designed a prototype tool in which these methods are used, and conducted workshops and experiments in which analysts have tested and used this prototype. Within the project, we have conducted research and developed methods for automated text and image analysis, including methods based on deep neural networks and other types of machine learning. We have developed detectors for several types of objects in images. In automated text analysis, we have studied methods for assessing the sentiment of texts (positive or negative), assessing stance with respect to rumors, and assessing the credibility of texts. We have also developed two rule languages that enable detection of instances of expressions and combinations of facts in text. This facilitates the monitoring of many different themes, which we have also demonstrated in small studies with web data. Perhaps the most significant contribution of the project is the development of a prototype for semi-automatic data-driven intelligence analysis of web data. It consists of a number of components that cover the entire process, from data downloading, through analysis components for classifying content in text and image data, to visualization. New combinations of the components allow new ways to process and visualize data. The prototype has served multiple purposes for the project. By implementing several of the methods we have studied as components, we have been able to test how well they work on new and realistic data.
In presentations and workshops together with analysts, we have also been able to demonstrate advantages and limitations of the methods. Finally, we have conducted an initial usefulness experiment together with analysts to investigate how intelligence analysis is best supported by these methods. Throughout, the prototype and the underlying methods have been considered to be of interest. We intend to continue our research on automated text and image analysis, as well as the development of the prototype, in subsequent projects.