Current computing technology allows scientific data to be collected rapidly and in vast amounts. Automated pattern-detection methods can help scientists sift through these data by flagging patterns of interest for closer examination. However, the size of the datasets may rule out conventional machine-learning techniques because of limited memory and/or CPU capacity. To overcome these limitations, we use two techniques. First, we decompose a large machine-learning problem into multiple sub-problems and combine the sub-problem solutions into an ensemble that forms a global solution. Second, we process the data incrementally, updating intermediate results as more input is presented. TRL: 3-4
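
The two techniques can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid learner, the fixed-size chunking of the input stream, and the majority vote are all assumptions chosen for brevity, and the names `IncrementalCentroid`, `train_ensemble`, and `ensemble_predict` are hypothetical. Each ensemble member sees only one chunk of the stream (decomposition), and each member keeps running sums instead of storing its examples (incremental update).

```python
from collections import Counter, defaultdict


class IncrementalCentroid:
    """Nearest-centroid classifier updated one example at a time.

    Assumed stand-in learner: it stores only per-class running sums and
    counts, so memory use does not grow with the number of examples.
    """

    def __init__(self):
        self.sums = {}                  # label -> per-feature running sum
        self.counts = defaultdict(int)  # label -> number of examples seen

    def update(self, x, label):
        # Fold one example into the intermediate result for its class.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
        running = self.sums[label]
        for i, value in enumerate(x):
            running[i] += value
        self.counts[label] += 1

    def predict(self, x):
        # Return the label whose centroid (sum / count) is closest to x.
        def squared_distance(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=squared_distance)


def train_ensemble(stream, chunk_size):
    """Train one member per chunk of `chunk_size` consecutive examples.

    `stream` is any iterable of (features, label) pairs; it is consumed
    once, so the full dataset never has to be held in memory.
    """
    members = []
    current = None
    for seen, (x, label) in enumerate(stream):
        if seen % chunk_size == 0:
            # Start a new sub-problem with a fresh member.
            current = IncrementalCentroid()
            members.append(current)
        current.update(x, label)
    return members


def ensemble_predict(members, x):
    """Combine the sub-problem solutions by majority vote."""
    votes = Counter(member.predict(x) for member in members)
    return votes.most_common(1)[0][0]
```

Because `train_ensemble` accepts any iterable, the stream could be a generator that reads records from disk one at a time. In this sketch a member can only vote for labels that appeared in its own chunk, which is one reason a real system might prefer a different partitioning or combination rule.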