Computation of sparse representation of high dimensional time series data and experimental applications: Mathematics@IISc

Title: Computation of sparse representation of high dimensional time series data and experimental applications

Speaker: Sandeep K. Modi (IISc Mathematics)

Date: 26 April 2018

Time: 2:30 pm

Venue: LH-2, Mathematics Department

Obtaining a sparse representation of high dimensional data is often the first step towards its further analysis. Conventional Vector Autoregressive (VAR) modelling methods applied to such data results in noisy, non-sparse solutions with a too many spurious coefficients. Computing auxiliary quantities such as the Power Spectrum, Coherence and Granger Causality (GC) from such non-sparse models is slow and gives wrong results. Thresholding the distorted values of these quantities as per some criterion, statistical or otherwise, does not alleviate the problem.

We propose two sparse Vector Autoregressive (VAR) modelling methods that work well for high dimensional time series data, even when the number of time points is relatively low, by incorporating only statistically significant coefficients. In numerical experiments using simulated data, our methods show consistently higher accuracy compared to other contemporary methods in recovering the true sparse model. The relative absence of spurious coefficients in our models permits more accurate, stable and efficient evaluation of auxiliary quantities. Our VAR modelling methods are capable of computing Conditional Granger Causality (CGC) in datasets consisting of tens of thousands of variables with a speed and accuracy that far exceeds the capabilities of existing methods.

Using the Conditional Granger Causality computed from our models as a proxy for the weight of the edges in a network, we use community detection algorithms to simultaneously obtain both local and global functional connectivity patterns and community structures in large networks.

We also use our VAR modelling methods to predict time delays in many-variable systems. Using simulated data from non-linear delay differential equations, we compare our methods with commonly used delay prediction techniques and show that our methods yield more accurate results.

We apply the above methods to the following real experimental data:

Analysis of data from the Human Connectome Project (HCP): fMRI data from the HCP database is used to compute sparse brain functional connectivity networks. The network and community structures obtained are consistent over independent recording sessions and show good spatial correspondence with known functional and anatomical regions of the brain.
Analysis of ADHD-200 data: fMRI data from children with ADHD (Attention Deficit Hyperactive Disorder) is used to compute sparse brain functional connectivity networks. Analysis of the network measures obtained provide new ways of differentiating between ADHD and typically developing children using global and node-level network measures. They also enable refinement of published results relating to the rFIC-ACC interaction in fMRI resting state data.
Time-delay prediction from LFP recordings:When applied to Local Field Potential (LFP) recordings from the rat and monkey, our methods predict consistent delays across a range of sampling frequencies.
Application to the Hela gene interaction dataset: The network obtained by applying our methods to this dataset yields results that are at least as good as those from a specialized method for analysing gene interaction. This demonstrates that our methods can be applied to any time series data for which VAR modelling is valid.

In addition to the above methods, we apply non-parametric Granger Causality analysis (originally developed by A. Nedungadi, G. Rangarajan et al) to mixed point-process and real time-series data. Extending the computations to Conditional GC and by increasing the efficiency of the original computer code, we can compute the Conditional GC spectrum in systems consisting of hundreds of variables in a relatively short period. Further, combining this with VAR modelling provides an alternate faster route to compute the significance level of each element of the GC and CGC matrices. We use these techniques to analyse mixed Spike Train and LFP data from monkey electrocorticography (ECoG) recordings during a behavioural task. Interpretation of the results of the analysis is an ongoing collaboration.

Department of Mathematics

INDIAN INSTITUTE OF SCIENCE

PhD Thesis defence

Title: Computation of sparse representation of high dimensional time series data and experimental applications

Speaker: Sandeep K. Modi (IISc Mathematics)

Date: 26 April 2018

Time: 2:30 pm

Venue: LH-2, Mathematics Department