Publications

A data-stream classification system for investigating terrorist threats

Published in:
Proc. SPIE 9851, Next-Generation Analyst IV, 98510L (May 12, 2016); doi:10.1117/12.2224104.

Summary

The role of cyber forensics in criminal investigations has greatly increased in recent years due to the wealth of data that is collected and available to investigators. Physical forensics has also experienced a data volume and fidelity revolution due to advances in methods for DNA and trace evidence analysis. Key to extracting insight is the ability to correlate across multi-modal data, which depends critically on identifying a touch-point connecting the separate data streams. Separate data sources may be connected because they refer to the same individual, entity or event. In this paper we present a data source classification system tailored to facilitate the investigation of potential terrorist activity. This taxonomy is structured to illuminate the defining characteristics of a particular terrorist effort and designed to guide reporting to decision makers that is complete, concise, and evidence-based. The classification system has been validated and empirically utilized in the forensic analysis of a simulated terrorist activity. Next-generation analysts can use this schema to label and correlate across existing data streams, assess which critical information may be missing from the data, and identify options for collecting additional data streams to fill information gaps.

Feedback-based social media filtering tool for improved situational awareness

Published in:
15th Annual IEEE Int. Symp. on Technologies for Homeland Security, HST 2016, 10-12 May 2016.

Summary

This paper describes a feature-rich model of data relevance, designed to aid first responder retrieval of useful information from social media sources during disasters or emergencies. The approach is meant to address the failure of traditional keyword-based methods to sufficiently suppress clutter during retrieval. The model iteratively incorporates relevance feedback to update feature space selection and classifier construction across a multimodal set of diverse content characterization techniques. This approach is advantageous because the aspects of the data (or even the modalities of the data) that signify relevance cannot always be anticipated ahead of time. Experiments with both microblog text documents and coupled imagery and text documents demonstrate the effectiveness of this model on sample retrieval tasks, in comparison to more narrowly focused models operating in a priori selected feature spaces. The experiments also show that even relatively low feedback levels (i.e., tens of examples) can lead to a significant performance boost during the interactive retrieval process.

A key-centric processor architecture for secure computing

Published in:
2016 IEEE Int. Symp. on Hardware-Oriented Security and Trust, HOST 2016, 3-5 May 2016.

Summary

We describe a novel key-centric processor architecture in which each piece of data or code can be protected by encryption while at rest, in transit, and in use. Using embedded key management for cryptographic key handling, our processor permits mutually distrusting software written by different entities to work closely together without divulging algorithmic parameters or secret program data. Since the architecture performs encryption, decryption, and key management deeply within the processor hardware, the attack surface is minimized without significant impact on performance or ease of use. The current prototype implementation is based on the SPARC architecture and is highly applicable to small to medium-sized processing loads.

Storage and Database Management for Big Data

Published in:
Big Data: Storage, Sharing, and Security

Summary

Collecting and analyzing large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity, and variety. While there has been great progress in the world of database technologies in the past few years, there are still many fundamental considerations that must be made by scientists. For example, which of the seemingly infinite technologies is the best to use for my problem? Answers to such questions require careful understanding of the technology field in addition to the types of problems that are being solved. This chapter aims to address many of the pressing questions faced by individuals interested in using storage or database technologies to solve their big data problems.

Cryptography for Big Data security

Published in:
Chapter 10 in Big Data: Storage, Sharing, and Security, 2016, pp. 214-87.

Summary

This chapter focuses on state-of-the-art provably secure cryptographic techniques for protecting big data applications. We do not focus on more established and commonly available cryptographic solutions. The goal is to inform practitioners of new techniques to consider as they develop new big data solutions rather than to summarize the current best practice for securing data.

A reverse approach to named entity extraction and linking in microposts

Published in:
Proc. of the 6th Workshop on "Making Sense of Microposts" (part of: 25th Int. World Wide Web Conf., 11 April 2016), #Microposts2016, pp. 67-69.

Summary

In this paper, we present a pipeline for named entity extraction and linking that is designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while maintaining the use of traditional NER to identify mentions that are not co-referent with any entities in the knowledge base.

Named entity recognition in 140 characters or less

Published in:
Proc. of the 6th Workshop on "Making Sense of Microposts" (part of: 25th Int. World Wide Web Conf., 11 April 2016), #Microposts2016, pp. 78-79.

Summary

In this paper, we explore the problem of recognizing named entities in microposts, a genre with notoriously little context surrounding each named entity and inconsistent use of grammar, punctuation, capitalization, and spelling conventions by authors. In spite of the challenges associated with information extraction from microposts, it remains an increasingly important genre. This paper presents the MIT Information Extraction Toolkit (MITIE) and explores its adaptability to the micropost genre.

Blind signal classification via sparse coding

Published in:
IEEE Int. Conf. Computer Communications, IEEE INFOCOM 2016, 10-15 April 2016.

Summary

We propose a novel RF signal classification method based on sparse coding, an unsupervised learning method popular in computer vision. In particular, we employ a convolutional sparse coder that can extract high-level features by computing the maximal similarity between an unknown received signal and an overcomplete dictionary of matched filter templates. Such a dictionary can be either generated or trained in an unsupervised fashion from signal examples labeled with no ground truths. The computed sparse code is then used to train SVM classifiers to discriminate RF signals. As a result, the proposed approach can achieve blind signal classification that requires no prior knowledge (e.g., MCS, pulse shaping) about the signals present in an arbitrary RF channel. Since modulated RF signals undergo pulse shaping to aid the matched filter detection by a receiver for the same radio protocol, our method can exploit variability in relative similarity against the dictionary atoms as the key discriminating factor for SVM. We present an empirical validation of our approach. The results indicate that we can separate different classes of digitally modulated signals from blind sampling with 70.3% recall and 24.6% false alarm at 10 dB SNR. If a labeled dataset were available for supervised classifier training, we could enhance the classification accuracy to 87.8% recall and 14.1% false alarm.
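
As a rough illustration of the feature-extraction idea in this summary, the sketch below scores a received signal by its maximal sliding correlation against each dictionary atom and keeps only the strongest responses. It is a simplified stand-in, not the paper's convolutional sparse coder: the function names, the `keep` parameter, and the toy templates are assumptions made for illustration, and the SVM stage is omitted.

```python
import math


def _normalize(vector):
    """Scale a template to unit L2 norm (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def max_correlation_features(signal, dictionary, keep=1):
    """Return one feature per dictionary atom: the largest absolute
    sliding correlation between the signal and that unit-norm atom.
    Responses below the `keep`-th largest are zeroed, giving a crude
    sparse code (hypothetical simplification; ties may keep extra entries).
    """
    features = []
    for atom in dictionary:
        unit_atom = _normalize(atom)
        width = len(unit_atom)
        best = 0.0
        for start in range(len(signal) - width + 1):
            window = signal[start:start + width]
            score = abs(sum(w * a for w, a in zip(window, unit_atom)))
            best = max(best, score)
        features.append(best)
    threshold = sorted(features, reverse=True)[keep - 1]
    return [f if f >= threshold else 0.0 for f in features]


if __name__ == "__main__":
    # Two toy "matched filter" templates (assumed, not from the paper).
    templates = [[1, 1, -1, -1], [1, -1, 1, -1]]
    received = [0, 0, 1, 1, -1, -1, 0, 0]
    print(max_correlation_features(received, templates, keep=1))
```

In a fuller pipeline, feature vectors like these would be passed to a classifier; the summary names SVMs for that step.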

SoK: privacy on mobile devices - it's complicated

Summary

Modern mobile devices place a wide variety of sensors and services within the personal space of their users. As a result, these devices are capable of transparently monitoring many sensitive aspects of these users' lives (e.g., location, health, or correspondences). Users typically trade access to this data for convenient applications and features, in many cases without a full appreciation of the nature and extent of the information that they are exposing to a variety of third parties. Nevertheless, studies show that users remain concerned about their privacy and vendors have similarly been increasing their utilization of privacy-preserving technologies in these devices. Still, despite significant efforts, these technologies continue to fail in fundamental ways, leaving users' private data exposed. In this work, we survey the numerous components of mobile devices, giving particular attention to those that collect, process, or protect users' private data. Whereas the individual components have been generally well studied and understood, examining the entire mobile device ecosystem provides significant insights into its overwhelming complexity. The numerous components of this complex ecosystem are frequently built and controlled by different parties with varying interests and incentives. Moreover, most of these parties are unknown to the typical user. The technologies that are employed to protect the users' privacy typically only do so within a small slice of this ecosystem, abstracting away the greater complexity of the system. Our analysis suggests that this abstracted complexity is the major cause of many privacy-related vulnerabilities, and that a fundamentally new, holistic, approach to privacy is needed going forward. We thus highlight various existing technology gaps and propose several promising research directions for addressing and reducing this complexity.

Competing cognitive resilient networks

Published in:
IEEE Trans. Cognit. Commun. and Netw., Vol. 2, No. 1, March 2016, pp. 95-109.

Summary

We introduce the competing cognitive resilient network (CCRN) of mobile radios challenged to optimize data throughput and networking efficiency under dynamic spectrum access and adversarial threats (e.g., jamming). Unlike conventional approaches, CCRN features both communicator and jamming nodes in a friendly coalition to take joint actions against hostile networking entities. In particular, this paper showcases hypothetical blue force and red force CCRNs and their competition for open spectrum resources. We present state-agnostic and stateful solution approaches based on the decision theoretic framework. The state-agnostic approach builds on the multiarmed bandit model to develop an optimal strategy that enables exploratory-exploitative actions from sequential sampling of channel rewards. The stateful approach makes an explicit model of states and actions from an underlying Markov decision process and uses multiagent Q-learning to compute optimal node actions. We provide a theoretical framework for CCRN and propose new algorithms for both approaches. Simulation results indicate that the proposed algorithms outperform some of the most important algorithms known to date.
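
To make the state-agnostic idea concrete, the sketch below applies the standard UCB1 bandit rule to channel selection: each channel is an arm, and a reward of 1 means a transmission succeeded. This is a generic textbook algorithm, not the algorithm proposed in the paper; the per-channel success probabilities, the function name, and the Bernoulli reward model are assumptions made for illustration, and no adversary is modeled.

```python
import math
import random


def ucb1_channel_selection(success_probs, rounds, seed=0):
    """Run UCB1 over simulated channels.

    success_probs: assumed per-channel probability that a transmission
    succeeds (hypothetical stand-in for observed channel rewards).
    Returns (pull counts, empirical mean reward) per channel.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    counts = [0] * n
    means = [0.0] * n
    for t in range(1, rounds + 1):
        if t <= n:
            channel = t - 1  # sample every channel once to initialize
        else:
            # Exploit the empirical mean, plus an exploration bonus that
            # shrinks as a channel accumulates samples.
            channel = max(
                range(n),
                key=lambda c: means[c] + math.sqrt(2 * math.log(t) / counts[c]),
            )
        reward = 1.0 if rng.random() < success_probs[channel] else 0.0
        counts[channel] += 1
        # Incremental update of the running mean reward.
        means[channel] += (reward - means[channel]) / counts[channel]
    return counts, means


if __name__ == "__main__":
    counts, means = ucb1_channel_selection([0.2, 0.5, 0.8], rounds=2000)
    print(counts, [round(m, 2) for m in means])
```

Over enough rounds the rule concentrates its samples on the channel with the highest observed reward while still occasionally probing the others, which is the exploration-exploitation trade-off the summary refers to.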