Publications

Analytical models and methods for anomaly detection in dynamic, attributed graphs

Published in:
Chapter 2, Computational Network Analysis with R: Applications in Biology, Medicine, and Chemistry, 2017, pp. 35-61.

Summary

This chapter is devoted to anomaly detection in dynamic, attributed graphs. There has been a great deal of research on anomaly detection in graphs over the last decade, with a variety of methods proposed. This chapter discusses recent methods for anomaly detection in graphs, with a specific focus on detection within backgrounds based on random graph models. This sort of analysis can be applied to a variety of background models, which can incorporate topological dynamics and attributes of vertices and edges. The authors have developed a framework for anomalous subgraph detection in random background models, based on linear algebraic features of a graph. This includes an implementation in R that exploits structure in the random graph model for computationally tractable analysis of residuals. This chapter outlines this framework within the context of analyzing dynamic, attributed graphs. The remainder of this chapter is organized as follows. Section 2.2 defines the notation used within the chapter. Section 2.3 briefly describes a variety of perspectives and techniques for anomaly detection in graph-based data. Section 2.4 provides an overview of models for graph behavior that can be used as backgrounds for anomaly detection. Section 2.5 describes our framework for anomalous subgraph detection via spectral analysis of residuals, after the data are integrated over time. Section 2.6 discusses how the method described in Section 2.5 can be efficiently implemented in R using open source packages. Section 2.7 demonstrates the power of this technique in controlled simulation, considering the effects of both dynamics and attributes on detection performance. Section 2.8 gives a data analysis example within this context, using an evolving citation graph based on a commercially available document database of public scientific literature. Section 2.9 summarizes the chapter and discusses ongoing research in this area.

Cross-domain entity resolution in social media

Published in:
4th Int. Workshop on Natural Language Processing for Social Media, SocialNLP with IJCAI, 11 July 2016.

Summary

The challenge of associating entities across multiple domains is a key problem in social media understanding. Successful cross-domain entity resolution provides integration of information from multiple sites to create a complete picture of user and community activities, characteristics, and trends. In this work, we examine the problem of entity resolution across Twitter and Instagram using general techniques. Our methods fall into three categories: profile-based, content-based, and graph-based. For the profile-based methods, we consider techniques based on approximate string matching. For content-based methods, we perform author identification. Finally, for graph-based methods, we apply novel cross-domain community detection methods and generate neighborhood-based features. The three categories of methods are applied to a large graph of users in Twitter and Instagram to understand challenges, determine performance, and understand fusion of multiple methods. Final results demonstrate an equal error rate of less than 1%.
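
As a rough illustration of the profile-based idea, the sketch below scores username pairs with approximate string matching. The summary does not say which string metric the authors used; this example substitutes the ratio from Python's standard `difflib.SequenceMatcher`, and the function names and sample usernames are hypothetical.

```python
from difflib import SequenceMatcher


def profile_similarity(name_a: str, name_b: str) -> float:
    """Similarity in [0, 1] between two usernames, ignoring case.

    Assumption: difflib's ratio stands in for whatever approximate
    string-matching metric the paper actually uses.
    """
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def best_match(query: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate most similar to `query`, with its score."""
    scored = [(c, profile_similarity(query, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical usernames: one from the first site, three from the second.
match, score = best_match("jsmith_42", ["j.smith42", "maria_k", "xx_gamer"])
print(match, round(score, 3))
```

In practice a score like this would be one feature among several, thresholded or fused with content and graph evidence rather than used alone.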

Charting a security landscape in the clouds: data protection and collaboration in cloud storage

Summary

This report surveys different approaches to securely storing and sharing data in the cloud based on traditional notions of security: confidentiality, integrity, and availability, with the main focus on confidentiality. An appendix discusses the related notion of how users can securely authenticate to cloud providers. We propose a metric for comparing secure storage approaches based on their residual vulnerabilities: attack surfaces against which an approach cannot protect. Our categorization therefore ranks approaches from the weakest (the most residual vulnerabilities) to the strongest (the fewest residual vulnerabilities). In addition to the security provided by each approach, we also consider their inherent costs and limitations. This report can therefore help an organization select a cloud data protection approach that satisfies their enterprise infrastructure, security specifications, and functionality requirements.

Balancing security and performance for agility in dynamic threat environments

Published in:
46th IEEE/IFIP Int. Conf. on Dependable Systems and Networks, DSN 2016, 28 June - 1 July 2016.

Summary

In cyber security, achieving the desired balance between system security and system performance in dynamic threat environments is a long-standing open challenge for cyber defenders. Typically, an increase in system security comes at the price of decreased system performance, and vice versa, easily resulting in systems that are misaligned to operator-specified requirements for system security and performance as the threat environment evolves. We develop an online, reinforcement-learning-based methodology to automatically discover and maintain desired operating postures in security-performance space even as the threat environment changes. We demonstrate the utility of our approach and discover parameters enabling an agile response to a dynamic adversary in a simulated security game involving prototype cyber moving target defenses.

Collaborative Data Analysis and Discovery for Cyber Security

Published in:
Proceedings of the 12th Symposium on Usable Privacy and Security (SOUPS 2016)

Summary

In this paper, we present the Cyber Analyst Real-Time Integrated Notebook Application (CARINA). CARINA is a collaborative investigation system that aids in decision making by co-locating the analysis environment with centralized cyber data sources, and providing next generation analysts with increased visibility to the work of others.

Channel compensation for speaker recognition using MAP adapted PLDA and denoising DNNs

Published in:
Odyssey 2016, The Speaker and Language Recognition Workshop, 21-24 June 2016.

Summary

Over several decades, speaker recognition performance has steadily improved for applications using telephone speech. A big part of this improvement has been the availability of large quantities of speaker-labeled data from telephone recordings. For new data applications, such as audio from room microphones, we would like to effectively use existing telephone data to build systems with high accuracy while maintaining good performance on existing telephone tasks. In this paper, we compare and combine approaches to compensate model parameters and features for this purpose. For model adaptation, we explore MAP adaptation of hyper-parameters, and for feature compensation, we examine the use of denoising DNNs. On a multi-room, multi-microphone speaker recognition experiment, we show a reduction of 61% in EER with a combination of these approaches while slightly improving performance on telephone data.

The MITLL NIST LRE 2015 Language Recognition System

Summary

In this paper we describe the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The submission features a fusion of five core classifiers, with most systems developed in the context of an i-vector framework. The 2015 evaluation presented new paradigms. First, the evaluation included fixed training and open training tracks for the first time; second, language classification performance was measured across 6 language clusters using 20 language classes instead of an N-way language task; and third, performance was measured across a nominal 3-30 second range. Results are presented for the overall performance across the six language clusters for both the fixed and open training tasks. On the 6-cluster metric the Lincoln system achieved overall costs of 0.173 and 0.168 for the fixed and open tasks respectively.

A vocal modulation model with application to predicting depression severity

Published in:
13th IEEE Int. Conf. on Wearable and Implantable Body Sensor Networks, BSN 2016, 14-17 June 2016.

Summary

Speech provides a potentially simple and noninvasive "on-body" means to identify and monitor neurological diseases. Here we develop a model for a class of vocal biomarkers exploiting modulations in speech, focusing on Major Depressive Disorder (MDD) as an application area. Two model components contribute to the envelope of the speech waveform: amplitude modulation (AM) from respiratory muscles, and AM from interaction between vocal tract resonances (formants) and frequency modulation in vocal fold harmonics. Based on the model framework, we test three methods to extract envelopes capturing these modulations of the third formant for synthesized sustained vowels. Using subsequent modulation features derived from the model, we predict MDD severity scores with a Gaussian Mixture Model. Performing global optimization over classifier parameters and number of principal components, we evaluate performance of the features by examining the root-mean-squared error (RMSE), mean absolute error (MAE), and Spearman correlation between the actual and predicted MDD scores. We achieved RMSE and MAE values of 10.32 and 8.46, respectively (Spearman correlation=0.487, p<0.001), relative to a baseline RMSE of 11.86 and MAE of 10.05, obtained by predicting the mean MDD severity score. Ultimately, our model provides a framework for detecting and monitoring vocal modulations that could also be applied to other neurological diseases.
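
To make the evaluation metrics concrete, the sketch below computes RMSE and MAE and builds a mean-predictor baseline of the kind the summary compares against. The function names are hypothetical and the scores are made-up illustrative values, not data from the paper; the baseline here uses the mean of the same scores it is evaluated on, which may differ from how the authors computed theirs.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between paired score lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between paired score lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_baseline(actual):
    """Predict the mean score for every subject (assumed baseline form)."""
    mean_score = sum(actual) / len(actual)
    return [mean_score] * len(actual)


# Illustrative severity scores only; not taken from the paper.
scores = [0, 10, 20, 30]
baseline = mean_baseline(scores)
print(rmse(scores, baseline), mae(scores, baseline))
```

A model is only informative if its RMSE and MAE fall below those of this constant predictor, which is the comparison the reported figures make.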

BubbleNet: A Cyber Security Dashboard for Visualizing Patterns

Published in:
Proceedings of the 2016 Eurographics Conference on Visualization (EuroVis)

Summary

The field of cyber security is faced with ever-expanding amounts of data and a constant barrage of cyber attacks. Within this space, we have designed BubbleNet as a cyber security dashboard to help network analysts identify and summarize patterns within the data.

Operational assessment of keyword search on oral history

Published in:
10th Language Resources and Evaluation Conf., LREC 2016, 23-28 May 2016.

Summary

This project assesses the resources necessary to make oral history searchable by means of automatic speech recognition (ASR). There are many inherent challenges in applying ASR to conversational speech: smaller training set sizes and varying demographics, among others. We assess the impact of dataset size, word error rate and term-weighted value on human search capability through an information retrieval task on Mechanical Turk. We use English oral history data collected by StoryCorps, a national organization that provides all people with the opportunity to record, share and preserve their stories, and control for a variety of demographics including age, gender, birthplace, and dialect on four different training set sizes. We show comparable search performance using a standard speech recognition system as with hand-transcribed data, which is promising for increased accessibility of conversational speech and oral history archives.