Publications
Detecting virus exposure during the pre-symptomatic incubation period using physiological data
Summary
Summary
Early pathogen exposure detection allows better patient care and faster implementation of public health measures (patient isolation, contact tracing). Existing exposure detection most frequently relies on overt clinical symptoms, namely fever, during the infectious prodromal period. We have developed a robust machine learning method to better detect asymptomatic states during...
Benchmarking SciDB data import on HPC systems
Summary
Summary
SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to support advanced analytics in database, thus reducing the need for extracting...
A data-stream classification system for investigating terrorist threats
Summary
Summary
The role of cyber forensics in criminal investigations has greatly increased in recent years due to the wealth of data that is collected and available to investigators. Physical forensics has also experienced a data volume and fidelity revolution due to advances in methods for DNA and trace evidence analysis. Key...
D4M and large array databases for management and analysis of large biomedical imaging data
Summary
Summary
Advances in medical imaging technologies have enabled the acquisition of increasingly large datasets. Current state-of-the-art confocal or multi-photon imaging technology can produce biomedical datasets in excess of 1 TB per dataset. Typical approaches for analyzing large datasets rely on downsampling the original datasets or leveraging distributed computing resources where small...
Rapid sequence identification of potential pathogens using techniques from sparse linear algebra
Summary
Summary
The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing has rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D4RAGenS, a genetic sequence identification algorithm that exhibits the Big Data handling...
Using a big data database to identify pathogens in protein data space [e-print]
Summary
Summary
Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors/false negatives), or scale poorly on large datasets. This paper explores using big data database technologies to characterize very large metagenomic DNA sequences in protein space, with the ultimate...
Genetic sequence matching using D4M big data approaches
Summary
Summary
Recent technological advances in Next Generation Sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new opportunities to efficiently handle the increasing workload. We propose a new method...
Development and use of a comprehensive humanitarian assessment tool in post-earthquake Haiti
Summary
Summary
This paper describes a comprehensive humanitarian assessment tool designed and used following the January 2010 Haiti earthquake. The tool was developed under Joint Task Force -- Haiti coordination using indicators of humanitarian needs to support decision making by the United States Government, agencies of the United Nations, and various non-governmental...
Measurement of aerosol-particle trajectories using a structured laser beam
Summary
Summary
What is believed to be a new concept for the measurement of micrometer-sized particle trajectories in an inlet air stream is introduced. The technique uses a light source and a mask to generate a spatial pattern of light within a volume in space. Particles traverse the illumination volume and elastically...