Publications
A fun and engaging interface for crowdsourcing named entities
Summary
Summary
There are many current problems in natural language processing that are best solved by training algorithms on an annotated in-language, in-domain corpus. The more representative the training corpus is of the test data, the better the algorithm will perform, but also the less likely it is that such a corpus...
Enforced sparse non-negative matrix factorization
Summary
Summary
Non-negative matrix factorization (NMF) is a dimensionality reduction algorithm for data that can be represented as an undirected bipartite graph. It has become a common method for generating topic models of text data because it is known to produce good results, despite its relative simplicity of implementation and ease of...
LLMapReduce: multi-level map-reduce for high performance data analysis
Summary
Summary
The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce parallel programming model to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing...
Generating a multiple-prerequisite attack graph
Summary
Summary
In one aspect, a method to generate an attack graph includes determining if a potential node provides a first precondition equivalent to one of preconditions provided by a group of preexisting nodes on the attack graph. The group of preexisting nodes includes a first state node, a first vulnerability instance...
A data-stream classification system for investigating terrorist threats
Summary
Summary
The role of cyber forensics in criminal investigations has greatly increased in recent years due to the wealth of data that is collected and available to investigators. Physical forensics has also experienced a data volume and fidelity revolution due to advances in methods for DNA and trace evidence analysis. Key...
Feedback-based social media filtering tool for improved situational awareness
Summary
Summary
This paper describes a feature-rich model of data relevance, designed to aid first responder retrieval of useful information from social media sources during disasters or emergencies. The approach is meant to address the failure of traditional keyword-based methods to sufficiently suppress clutter during retrieval. The model iteratively incorporates relevance feedback...
A key-centric processor architecture for secure computing
Summary
Summary
We describe a novel key-centric processor architecture in which each piece of data or code can be protected by encryption while at rest, in transit, and in use. Using embedded key management for cryptographic key handling, our processor permits mutually distrusting software written by different entities to work closely together...
Storage and Database Management for Big Data
Summary
Summary
The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and user calls for innovative tools that address the challenges faced by big data volume, velocity, and verity. While there has been great progress in the world...
Cryptography for Big Data security
Summary
Summary
This chapter focuses on state-of-the-art provably secure cryptographic techniques for protecting big data applications. We do not focus on more established, and commonly available cryptographic solutions. The goal is to inform practitioners of new techniques to consider as they develop new big data solutions rather than to summarize the current...
A reverse approach to named entity extraction and linking in microposts
Summary
Summary
In this paper, we present a pipeline for named entity extraction and linking that is designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while maintaining the use of traditional NER to identify mentions...