Towards Intelligent Web Context-Based Content On-Demand Extraction Using Deep Learning

Extracting information from, and reasoning over, massive high-dimensional data in dynamic contexts is in high demand, yet it is very hard to achieve in real time. Real-time management of a huge data resource for content and high-level information extraction is not impossible. However, the capability and efficiency of such a process may be limited by the available computational resources and the resulting power consumption. Conventional search mechanisms are often incapable of fetching predefined content from a data source in real time, even before considering the growing number of connected devices that contribute to the same source. In this work, we propose a concept for an efficient approach to online content searching that takes advantage of a) the structure of the data profiling employed at the related data source; and b) learning algorithms that extract its common features and generate a map of indices to data contents. This enables instant mapping of users’ requests, making the process as close to real time as possible. As a case study and a simplified example, we demonstrate the concept through an online application. The application takes two inputs, the first of which is a URL belonging to a target website. The main blocks of the adopted learning algorithm are built using several machine learning algorithms and deep learning models to capture the semantic features in the targeted context of data sentences. The preliminary results indicate that employing recurrent neural networks as the core of the learning algorithm and the GloVe pretrained model as the word embedding layer yielded highly acceptable levels of F1-score and prediction time.

Author(s): Bassem Mokhtar
