24 article(s) in total. 74 co-authors, from 1 to 5 common article(s). Median position in authors list is 1.0.

[1] oai:arXiv.org:1708.00544 [pdf] - 1644119

Performance Measurements of Supercomputing and Cloud Storage Solutions

Jones, Michael; Kepner, Jeremy; Arcand, William; Bestor, David; Bergeron, Bill; Gadepally, Vijay; Houle, Michael; Hubbell, Matthew; Michaleas, Peter; Prout, Andrew; Reuther, Albert; Samsi, Siddharth; Monticiollo, Paul

Comments: 5 pages, 4 figures, to appear in IEEE HPEC 2017

Submitted: 2017-08-01

Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to inform decisions about which storage systems are best suited for particular tasks. This work provides these measurements for two of the most popular storage technologies: Lustre and Amazon S3. Lustre is an open-source, high performance, parallel file system used by many of the largest supercomputers in the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web Services offering, and offers a scalable, distributed option to store and retrieve data from anywhere on the Internet. Parallel processing is essential for achieving high performance on modern storage systems. The performance tests used span the gamut of parallel I/O scenarios, ranging from single-client, single-node Amazon S3 and Lustre performance to a large-scale, multi-client test designed to demonstrate the capabilities of a modern storage appliance under heavy load. These results show that, when parallel I/O is used correctly (i.e., many simultaneous read or write processes), full network bandwidth performance is achievable and ranged from 10 gigabits/s over a 10 GigE S3 connection to 0.35 terabits/s using Lustre on a 1200 port 10 GigE switch. These results demonstrate that S3 is well-suited to sharing vast quantities of data over the Internet, while Lustre is well-suited to processing large quantities of data locally.

[2] oai:arXiv.org:1707.03515 [pdf] - 1644114

Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

Byun, Chansup; Kepner, Jeremy; Arcand, William; Bestor, David; Bergeron, Bill; Gadepally, Vijay; Houle, Michael; Hubbell, Matthew; Jones, Michael; Klein, Anna; Michaleas, Peter; Milechin, Lauren; Mullen, Julie; Prout, Andrew; Rosa, Antonio; Samsi, Siddharth; Yee, Charles; Reuther, Albert

Comments: 6 pages; 9 figures; accepted to IEEE HPEC 2017

Submitted: 2017-07-11

Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running data analysis applications such as MATLAB and Octave. More recently, machine learning applications, such as the UC Berkeley Caffe deep learning framework, have become increasingly important to LLSC users. Thus, the performance of these applications on KNL systems is of high interest to LLSC users and the broader data analysis and machine learning communities. Our data analysis benchmarks of these applications on the Intel KNL processor indicate that single-core double-precision generalized matrix multiply (DGEMM) performance on KNL systems has improved by ~3.5x compared to prior Intel Xeon technologies. Our data analysis applications also achieved ~60% of the theoretical peak performance. Also, a performance comparison of a machine learning application, Caffe, between the two different Intel CPUs, Xeon E5 v3 and Xeon Phi 7210, demonstrated a 2.7x improvement on a KNL node.

[3] oai:arXiv.org:1606.05790 [pdf] - 1530862

Mathematical Foundations of the GraphBLAS

Kepner, Jeremy; Aaltonen, Peter; Bader, David; Buluc, Aydın; Franchetti, Franz; Gilbert, John; Hutchison, Dylan; Kumar, Manoj; Lumsdaine, Andrew; Meyerhenke, Henning; McMillan, Scott; Moreira, Jose; Owens, John D.; Yang, Carl; Zalewski, Marcin; Mattson, Timothy

Comments: 9 pages; 11 figures; accepted to IEEE High Performance Extreme Computing (HPEC) conference 2016

Submitted: 2016-06-18, last modified: 2016-07-13

The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
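
The link between incidence and adjacency matrices mentioned in this abstract can be sketched in a few lines. The example below is only an illustration, not GraphBLAS code: it assumes a directed graph, an "out" incidence matrix (one row per edge, marking the source vertex) and an "in" incidence matrix (marking the destination vertex), and computes the adjacency matrix as the product of the transposed "out" matrix with the "in" matrix. All names and the sample edge list are hypothetical.

```python
def adjacency_from_incidence(e_out, e_in):
    """Return A = E_out^T * E_in for edge-by-vertex incidence matrices."""
    n_edges = len(e_out)
    n_vertices = len(e_out[0])
    adjacency = [[0] * n_vertices for _ in range(n_vertices)]
    for k in range(n_edges):
        for i in range(n_vertices):
            if e_out[k][i]:
                for j in range(n_vertices):
                    # Edge k contributes to A[i][j] when it leaves i and enters j.
                    adjacency[i][j] += e_out[k][i] * e_in[k][j]
    return adjacency


# Hypothetical sample graph: three directed edges on three vertices.
EDGES = [(0, 1), (1, 2), (0, 2)]
N_VERTICES = 3

# Row k of E_OUT marks the source of edge k; row k of E_IN marks its destination.
E_OUT = [[1 if v == src else 0 for v in range(N_VERTICES)] for src, _ in EDGES]
E_IN = [[1 if v == dst else 0 for v in range(N_VERTICES)] for _, dst in EDGES]

A = adjacency_from_incidence(E_OUT, E_IN)
```

Each nonzero entry A[i][j] counts the edges from vertex i to vertex j, so repeated edges in the incidence matrices show up as counts greater than one.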

[4] oai:arXiv.org:1603.01876 [pdf] - 1530516

PageRank Pipeline Benchmark: Proposal for a Holistic System Benchmark for Big-Data Platforms

Dreher, Patrick; Byun, Chansup; Hill, Chris; Gadepally, Vijay; Kuszmaul, Bradley; Kepner, Jeremy

Comments: 9 pages, 7 figures, to appear in IPDPS 2016 Graph Algorithms Building Blocks (GABB) workshop

Submitted: 2016-03-06, last modified: 2016-06-03

The rise of big data systems has created a need for benchmarks to measure and compare the capabilities of these systems. Big data benchmarks present unique scalability challenges. The supercomputing community has wrestled with these challenges for decades and developed methodologies for creating rigorous scalable benchmarks (e.g., HPC Challenge). The proposed PageRank pipeline benchmark employs supercomputing benchmarking methodologies to create a scalable benchmark that is reflective of many real-world big data processing systems. The PageRank pipeline benchmark builds on existing scalable benchmarks (Graph500, Sort, and PageRank) to create a holistic benchmark with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. The linear algebraic nature of PageRank makes it well suited to being implemented using the GraphBLAS standard. The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed PageRank pipeline benchmark is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present-day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented and their single-threaded performance has been measured.
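
To make the "linear algebraic nature of PageRank" concrete, here is a minimal sketch of the textbook power iteration on a small adjacency matrix. It is not the benchmark's reference kernel: the damping factor of 0.85, the fixed iteration count, the handling of vertices without out-edges, and the sample graph are all assumptions chosen for illustration.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration on a dense adjacency matrix."""
    n = len(adjacency)
    out_degree = [sum(row) for row in adjacency]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Every vertex receives the same "teleport" share.
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_degree[i] == 0:
                # Assumed convention: a vertex with no out-edges spreads its rank evenly.
                share = damping * rank[i] / n
                for j in range(n):
                    new_rank[j] += share
            else:
                for j in range(n):
                    if adjacency[i][j]:
                        new_rank[j] += damping * rank[i] * adjacency[i][j] / out_degree[i]
        rank = new_rank
    return rank


# Hypothetical sample graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
GRAPH = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
RANKS = pagerank(GRAPH)
```

Each pass is a sparse matrix-vector product plus a constant, which is why the computation maps naturally onto matrix-based graph libraries.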

[5] oai:arXiv.org:1407.3859 [pdf] - 1047925

D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database

Kepner, Jeremy; Anderson, Christian; Arcand, William; Bestor, David; Bergeron, Bill; Byun, Chansup; Hubbell, Matthew; Michaleas, Peter; Mullen, Julie; O'Gwynn, David; Prout, Andrew; Reuther, Albert; Rosa, Antonio; Yee, Charles

Comments: 6 pages; IEEE HPEC 2013

Submitted: 2014-07-14

Non-traditional, relaxed consistency, triple store databases are the backbone of many web companies (e.g., Google Big Table, Amazon Dynamo, and Facebook Cassandra). The Apache Accumulo database is a high performance open source relaxed consistency database that is widely used for government applications. Obtaining the full benefits of Accumulo requires using novel schemas. The Dynamic Distributed Dimensional Data Model (D4M) [http://d4m.mit.edu] provides a uniform mathematical framework based on associative arrays that encompasses both traditional (i.e., SQL) and non-traditional databases. For non-traditional databases, D4M naturally leads to a general purpose schema that can be used to fully index and rapidly query every unique string in a dataset. The D4M 2.0 Schema has been applied with little or no customization to cyber, bioinformatics, scientific citation, free text, and social media data. The D4M 2.0 Schema is simple, requires minimal parsing, and achieves the highest published Accumulo ingest rates. The benefits of the D4M 2.0 Schema are independent of the D4M interface. Any interface to Accumulo can achieve these benefits by using the D4M 2.0 Schema.
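
As a rough illustration of what indexing "every unique string in a dataset" can look like, the sketch below turns each field/value pair into its own column key and keeps a transposed copy, so that a lookup by value becomes a lookup by row key. The `field|value` key format, the two-table layout, the function names, and the sample records are assumptions made for this example; they are not taken from the abstract, and this is neither the D4M interface nor the Accumulo API.

```python
from collections import defaultdict


def explode(records):
    """Build a row-oriented table and its transpose from {row_id: {field: value}}."""
    table = defaultdict(dict)
    transpose = defaultdict(dict)
    for row_id, fields in records.items():
        for field, value in fields.items():
            column = f"{field}|{value}"  # assumed key format
            table[row_id][column] = 1
            transpose[column][row_id] = 1
    return table, transpose


def rows_matching(transpose, field, value):
    """Return the sorted row ids whose record has field == value."""
    return sorted(transpose.get(f"{field}|{value}", {}))


# Hypothetical sample records.
RECORDS = {
    "r1": {"user": "alice", "lang": "en"},
    "r2": {"user": "bob", "lang": "en"},
    "r3": {"user": "alice", "lang": "fr"},
}
TABLE, TRANSPOSE = explode(RECORDS)
ALICE_ROWS = rows_matching(TRANSPOSE, "user", "alice")
```

Because every distinct string becomes a key in one of the two tables, any value can be found without scanning the records, at the cost of storing each entry twice.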

[6] oai:arXiv.org:1406.5751 [pdf] - 1047920

Computing on Masked Data: a High Performance Method for Improving Big Data Veracity

Kepner, Jeremy; Gadepally, Vijay; Michaleas, Pete; Schear, Nabil; Varia, Mayank; Yerukhimovich, Arkady; Cunningham, Robert K.

Comments: to appear in IEEE High Performance Extreme Computing 2014 (ieee-hpec.org)

Submitted: 2014-06-22

The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure the veracity of data can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed directly on masked data and ensuring that only authorized recipients can unmask the data. Using the sparse linear algebra of associative arrays, CMD can be performed with significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases with strong support of sparse operations, such as SciDB or Apache Accumulo, are ideally suited to this technique. Examples are shown for the application of CMD to a complex DNA matching algorithm and to database operations over social media data.

[7] oai:arXiv.org:1406.4923 [pdf] - 1047917

Achieving 100,000,000 database inserts per second using Accumulo and D4M

Kepner, Jeremy; Arcand, William; Bestor, David; Bergeron, Bill; Byun, Chansup; Gadepally, Vijay; Hubbell, Matthew; Michaleas, Peter; Mullen, Julie; Prout, Andrew; Reuther, Albert; Rosa, Antonio; Yee, Charles

Comments: 6 pages; to appear in IEEE High Performance Extreme Computing (HPEC) 2014

Submitted: 2014-06-18

The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a 216-node cluster running the MIT SuperCloud software stack. A peak performance of over 100,000,000 database inserts per second was achieved, which is 100x larger than the highest previously published value for any other database. The performance scales linearly with the number of ingest clients, number of database servers, and data size. The performance was achieved by adapting several supercomputing techniques to this application: distributed arrays, domain decomposition, adaptive load balancing, and single-program-multiple-data programming.

[8] oai:arXiv.org:astro-ph/0606464 [pdf] - 1048512

pMatlab Parallel Matlab Library

Bliss, Nadya; Kepner, Jeremy

Comments: 31 Pages, 17 Figures

Submitted: 2006-06-19

MATLAB has emerged as one of the languages most commonly used by scientists and engineers for technical computing, with ~1,000,000 users worldwide. The compute intensive nature of technical computing means that many MATLAB users have codes that can significantly benefit from the increased performance offered by parallel computing. pMatlab (www.ll.mit.edu/pMatlab) provides this capability by implementing Parallel Global Array Semantics (PGAS) using standard operator overloading techniques. The core data structure in pMatlab is a distributed numerical array whose distribution onto multiple processors is specified with a map construct. Communication operations between distributed arrays are abstracted away from the user and pMatlab transparently supports redistribution between any block-cyclic-overlapped distributions up to four dimensions. pMatlab is built on top of the MatlabMPI communication library (www.ll.mit.edu/MatlabMPI) and runs on any combination of heterogeneous systems that support MATLAB, which includes Windows, Linux, MacOSX, and SunOS. Performance is validated by implementing the HPC Challenge benchmark suite and comparing pMatlab performance with the equivalent C+MPI codes. These results indicate that pMatlab can often achieve comparable performance to C+MPI at usually 1/10th the code size. Finally, we present implementation data collected from a sample of 10 real pMatlab applications drawn from the ~100 users at MIT Lincoln Laboratory. These data indicate that users are typically able to go from a serial code to a well performing pMatlab code in about 3 hours while changing less than 1% of their code.

[9] oai:arXiv.org:astro-ph/0305090 [pdf] - 1048511

MatlabMPI

Kepner, Jeremy; Ahalt, Stan

Comments: Download software from http://www.ll.mit.edu/MatlabMPI, 12 pages including 7 color figures; submitted to the Journal of Parallel and Distributed Computing

Submitted: 2003-05-06

The true costs of high performance computing are currently dominated by software. Addressing these costs requires shifting to high productivity languages such as Matlab. MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI point-to-point communications standard. The key technical innovation of MatlabMPI is that it implements the widely used MPI "look and feel" on top of standard Matlab file I/O, resulting in an extremely compact (~250 lines of code) and "pure" implementation which runs anywhere Matlab runs, and on any heterogeneous combination of computers. The performance has been tested on both shared and distributed memory parallel computers (e.g. Sun, SGI, HP, IBM, Linux and MacOSX). MatlabMPI can match the bandwidth of C based MPI at large message sizes. A test image filtering application using MatlabMPI achieved a speedup of ~300 using 304 CPUs and ~15% of the theoretical peak (450 Gigaflops) on an IBM SP2 at the Maui High Performance Computing Center. In addition, this entire parallel benchmark application was implemented in 70 software-lines-of-code, illustrating the high productivity of this approach. MatlabMPI is available for download on the web (www.ll.mit.edu/MatlabMPI).
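
The idea of point-to-point messaging built on ordinary file I/O can be sketched as follows. This is a Python illustration of the general technique only, not MatlabMPI itself: the file naming scheme, the separate lock file that signals a complete message, the polling interval, and the function names `send` and `recv` are all assumptions made for the example.

```python
import os
import pickle
import tempfile
import time


def _base(comm_dir, src, dest, tag):
    """Build the shared path prefix for one message (assumed naming scheme)."""
    return os.path.join(comm_dir, f"msg_from_{src}_to_{dest}_tag_{tag}")


def send(comm_dir, src, dest, tag, data):
    """Write the payload to a buffer file, then create a lock file to signal it is complete."""
    base = _base(comm_dir, src, dest, tag)
    with open(base + ".buf", "wb") as buf:
        pickle.dump(data, buf)
    # The lock file is created last so a receiver never reads a partial buffer.
    with open(base + ".lock", "w"):
        pass


def recv(comm_dir, src, dest, tag, poll=0.01, timeout=5.0):
    """Poll for the lock file, then read and remove the buffer."""
    base = _base(comm_dir, src, dest, tag)
    deadline = time.monotonic() + timeout
    while not os.path.exists(base + ".lock"):
        if time.monotonic() > deadline:
            raise TimeoutError(f"no message from {src} with tag {tag}")
        time.sleep(poll)
    with open(base + ".buf", "rb") as buf:
        data = pickle.load(buf)
    os.remove(base + ".lock")
    os.remove(base + ".buf")
    return data


# Single-process demonstration: "rank 0" sends, "rank 1" receives.
with tempfile.TemporaryDirectory() as shared_dir:
    send(shared_dir, src=0, dest=1, tag=7, data={"rows": [1, 2, 3]})
    RECEIVED = recv(shared_dir, src=0, dest=1, tag=7)
    LEFTOVER = os.listdir(shared_dir)
```

In a real multi-process setting the directory would have to live on a file system visible to every participating process, and throughput would depend on that file system rather than on a network library.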

[10] oai:arXiv.org:astro-ph/0207389 [pdf] - 50532

300x Faster Matlab using MatlabMPI

Kepner, Jeremy; Ahalt, Stan

Comments: submitted to Supercomputing 2002; 10 pages; 8 figures

Submitted: 2002-07-18

The true costs of high performance computing are currently dominated by software. Addressing these costs requires shifting to high productivity languages such as Matlab. MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI point-to-point communications standard. The key technical innovation of MatlabMPI is that it implements the widely used MPI "look and feel" on top of standard Matlab file I/O, resulting in an extremely compact (~250 lines of code) and "pure" implementation which runs anywhere Matlab runs, and on any heterogeneous combination of computers. The performance has been tested on both shared and distributed memory parallel computers (e.g. Sun, SGI, HP, IBM and Linux). MatlabMPI can match the bandwidth of C based MPI at large message sizes. A test image filtering application using MatlabMPI achieved a speedup of ~300 using 304 CPUs and ~15% of the theoretical peak (450 Gigaflops) on an IBM SP2 at the Maui High Performance Computing Center. In addition, this entire parallel benchmark application was implemented in 70 software-lines-of-code (SLOC) yielding 0.85 Gigaflop/SLOC or 4.4 CPUs/SLOC, which are the highest values of these software price performance metrics ever achieved for any application. The MatlabMPI software will be available for download.

[11] oai:arXiv.org:astro-ph/0110383 [pdf] - 1942362

The Alignment Effect of Brightest Cluster Galaxies in the SDSS

Kim, Rita S. J.; Annis, Jim; Strauss, Michael A.; Lupton, Robert H.; Bahcall, Neta A.; Gunn, James E.; Kepner, Jeremy V.; Postman, Marc

Comments: 6 pages, 3 figures, to appear in "Where's the Matter? Tracing Dark and Bright Matter with the New Generation of Large Scale Surveys", June 2001, Treyer & Tresse Eds, Frontier Group.

Submitted: 2001-10-16

One of the most vital observational clues for unraveling the origin of Brightest Cluster Galaxies (BCG) is the observed alignment of the BCGs with their host cluster and its surroundings. We have examined the BCG-cluster alignment effect, using clusters of galaxies detected from the Sloan Digital Sky Survey (SDSS). We find that the BCGs are preferentially aligned with the principal axis of their hosts, to a much higher redshift (z >~ 0.3) than probed by previous studies (z <~ 0.1). The alignment effect strongly depends on the magnitude difference of the BCG and the second and third brightest cluster members: we find a strong alignment effect for the dominant BCGs, while less dominant BCGs do not show any departure from random alignment with respect to the cluster. We therefore claim that the alignment process originates from the same process that makes the BCG grow dominant, be it direct mergers in the early stage of cluster formation, or a later process that resembles the galactic cannibalism scenario. We do not find strong evidence for (or against) redshift evolution between 0<z<0.45, largely due to the insufficient sample size (< 200 clusters). However, we have developed a framework by which we can examine many more clusters in an automated fashion for the upcoming SDSS cluster catalogs, which will provide us with better statistics for systematic investigations of the alignment with redshift, richness and morphology of both the cluster and the BCG.

[12] oai:arXiv.org:astro-ph/0110259 [pdf] - 45316

Detecting Clusters of Galaxies in the Sloan Digital Sky Survey I : Monte Carlo Comparison of Cluster Detection Algorithms

Kim, Rita S. J.; Kepner, Jeremy V.; Postman, Marc; Strauss, Michael A.; Bahcall, Neta A.; Gunn, James E.; Lupton, Robert H.; Annis, James; Nichol, Robert C.; Castander, Francisco J.; Brinkmann, J.; Brunner, Robert J.; Connolly, Andrew; Csabai, Istvan; Hindsley, Robert B.; Ivezic, Zeljko; Vogeley, Michael S.; York, Donald G.

Comments: 38 pages, 15 figures, Accepted for publication in AJ

Submitted: 2001-10-10

We present a comparison of three cluster finding algorithms from imaging data using Monte Carlo simulations of clusters embedded in a 25 deg^2 region of Sloan Digital Sky Survey (SDSS) imaging data: the Matched Filter (MF; Postman et al. 1996), the Adaptive Matched Filter (AMF; Kepner et al. 1999) and a color-magnitude filtered Voronoi Tessellation Technique (VTT). Of the two matched filters, we find that the MF is more efficient in detecting faint clusters, whereas the AMF evaluates the redshifts and richnesses more accurately, therefore suggesting a hybrid method (HMF) that combines the two. The HMF outperforms the VTT when using a background that is uniform, but it is more sensitive to the presence of a non-uniform galaxy background than is the VTT; this is due to the assumption of a uniform background in the HMF model. We thus find that for the detection thresholds we determine to be appropriate for the SDSS data, the performance of both algorithms is similar; we present the selection function for each method evaluated with these thresholds as a function of redshift and richness. For simulated clusters generated with a Schechter luminosity function (M_r^* = -21.5 and alpha = -1.1) both algorithms are complete for Abell richness >= 1 clusters up to z ~ 0.4 for a sample magnitude limited to r = 21. While the cluster parameter evaluation shows a mild correlation with the local background density, the detection efficiency is not significantly affected by the background fluctuations, unlike previous shallower surveys.

[13] oai:arXiv.org:astro-ph/0107406 [pdf] - 43777

Parallel Programming with MatlabMPI

Kepner, Jeremy

Comments: 3 pages (including 2 color figures). Accepted to the High Performance Embedded Computing (HPEC 2001) workshop

Submitted: 2001-07-20

MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI point-to-point communications standard. The key technical innovation of MatlabMPI is that it implements the widely used MPI "look and feel" on top of standard Matlab file I/O, resulting in an extremely compact (~100 lines) and "pure" implementation which runs anywhere Matlab runs. The performance has been tested on both shared and distributed memory parallel computers. MatlabMPI can match the bandwidth of C based MPI at large message sizes. A test image filtering application using MatlabMPI achieved a speedup of ~70 on a parallel computer.

[14] oai:arXiv.org:astro-ph/0107084 [pdf] - 1048508

A Multi-Threaded Fast Convolver for Dynamically Parallel Image Filtering

Kepner, Jeremy

Comments: 25 pages including color figures. Submitted to the Journal of Parallel and Distributed Computing

Submitted: 2001-07-04

2D convolution is a staple of digital image processing. The advent of large format imagers makes it possible to literally "pave" with silicon the focal plane of an optical sensor, which results in very large images that can require a significant amount of computation to process. Filtering of large images via 2D convolutions is often complicated by a variety of effects (e.g., non-uniformities found in wide field of view instruments). This paper describes a fast (FFT based) method for convolving images, which is also well suited to very large images. A parallel version of the method is implemented using a multi-threaded approach, which allows more efficient load balancing and a simpler software architecture. The method has been implemented within a high level interpreted language (IDL), while also exploiting open standards vector libraries (VSIPL) and open standards parallel directives (OpenMP). The parallel approach and software architecture are generally applicable to a variety of algorithms and have the advantage of enabling users to obtain the convenience of an easy operating environment while also delivering high performance using a fully portable code.
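
The principle behind FFT based convolution can be shown with a small one-dimensional sketch: transform both inputs, multiply them pointwise, and transform back. The example below is a simplified illustration and not the paper's method, which is two-dimensional, multi-threaded, and built on IDL, VSIPL, and OpenMP. The recursive radix-2 FFT, the zero padding to a power of two, and all names are assumptions chosen to keep the example self-contained.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two. No 1/n scaling."""
    n = len(values)
    if n == 1:
        return list(values)
    sign = 1 if inverse else -1
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, kernel):
    """Linear convolution via zero-padded FFTs and a pointwise product."""
    size = len(signal) + len(kernel) - 1
    n = 1
    while n < size:
        n *= 2
    fa = fft([complex(v) for v in signal] + [0j] * (n - len(signal)))
    fb = fft([complex(v) for v in kernel] + [0j] * (n - len(kernel)))
    product = [a * b for a, b in zip(fa, fb)]
    result = fft(product, inverse=True)
    # Apply the 1/n scaling of the inverse transform and drop the padding.
    return [v.real / n for v in result[:size]]


def direct_convolve(signal, kernel):
    """Reference O(N*M) convolution used to check the FFT result."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


FAST = fft_convolve([1, 2, 3, 4], [1, 0, -1])
DIRECT = direct_convolve([1, 2, 3, 4], [1, 0, -1])
```

For large inputs the FFT route costs on the order of N log N operations instead of N times M, which is the reason it suits very large images; the 2D case applies the same steps along both image axes.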

[15] oai:arXiv.org:astro-ph/0104137 [pdf] - 1048507

Exploiting VSIPL and OpenMP for Parallel Image Processing

Kepner, Jeremy

Comments: 5 pages including 4 figures; to appear in the proceedings of ADASS X

Submitted: 2001-04-08

VSIPL and OpenMP are two open standards for portable high performance computing. VSIPL delivers optimized single processor performance while OpenMP provides a low overhead mechanism for executing thread based parallelism on shared memory systems. Image processing is one of the main areas where VSIPL and OpenMP can make a large impact. Currently, a large fraction of image processing applications are written in the Interactive Data Language (IDL) environment. The aim of this work is to demonstrate that the performance benefits of these new standards can be brought to the image processing community in a high level manner that is transparent to users. To this end, this talk presents a fast, FFT based algorithm for performing image convolutions. This algorithm has been implemented within the IDL environment using VSIPL (for optimized single processor performance) with added OpenMP directives (for parallelism). This work demonstrates that good parallel speedups are attainable using standards and can be integrated seamlessly into existing user environments.

[16] oai:arXiv.org:astro-ph/0009128 [pdf] - 37970

High Speed Interconnects and Parallel Software Libraries: Enabling Technologies for the NVO

Kepner, Jeremy

Comments: 5 pages, including 4 color figures, to appear in proceedings of "Virtual Observatories of the Future"

Submitted: 2000-09-07

The NVO's core data mining and archive federation activities are heavily dependent on the underlying data pipeline software necessary to translate the raw data into scientifically relevant source detections. The data pipeline software dictates: the raw data storage and retrieval mechanisms, the meaning and format of the fields in the source catalogs, and the ability of the NVO users to re-analyze raw data for their own purposes. Increasing the performance of the core data pipeline software so that it can address the needs of current and future high data rate surveys is an important activity that should be addressed in concert with the development of the NVO.

[17] oai:arXiv.org:astro-ph/0004304 [pdf] - 1048505

Cluster Detection in Astronomical Databases: the Adaptive Matched Filter Algorithm and Implementation

Kepner, Jeremy; Kim, Rita

Comments: Submitted to Data Mining and Knowledge Discovery; 25 pages including 13 color figures

Submitted: 2000-04-20

Clusters of galaxies are the most massive objects in the Universe and mapping their location is an important astronomical problem. This paper describes an algorithm (based on statistical signal processing methods), a software architecture (based on a hybrid layered approach) and a parallelization scheme (based on a client/server model) for finding clusters of galaxies in large astronomical databases. The Adaptive Matched Filter (AMF) algorithm presented here identifies clusters by finding the peaks in a cluster likelihood map generated by convolving a galaxy survey with a filter based on a cluster model and a background model. The method has proved successful in identifying clusters in real and simulated data. The implementation is flexible and readily executed in parallel on a network of workstations.

[18] oai:arXiv.org:astro-ph/9912134 [pdf] - 1048515

Interfacing Interpreted and Compiled Languages to Support Applications on a Massively Parallel Network of Workstations (MP-NOW)

Kepner, Jeremy; Gokhale, Maya; Minnich, Ron; Marks, Aaron; DeGood, John

Comments: To appear in Cluster Computing, 22 pages including 8 color figures

Submitted: 1999-12-07

Astronomers are increasingly using Massively Parallel Network of Workstations (MP-NOW) to address their most challenging computing problems. Fully exploiting these systems is made more difficult as more and more modeling and data analysis software is written in interpreted languages (such as IDL, MATLAB, and Mathematica) which do not lend themselves to parallel computing. We present a specific example of a very simple, but generic solution to this problem. Our example uses an interpreted language (IDL) to set up a calculation and then interfaces with a computational kernel written in a compiled language (C). The IDL code then calls the C code as an external library. We have added to the computational kernel an additional layer, which manages multiple copies of the kernel running on an MP-NOW and returns the results back to the interpreted layer. Our implementation uses The Next generation Taskbag (TNT) library developed at Sarnoff to provide an efficient means for implementing task parallelism. A test problem (taken from Astronomy) has been implemented on the Sarnoff Cyclone computer which consists of 160 heterogeneous nodes connected by a "fat" tree 100 Mb/s switched Ethernet running the RedHat Linux and FreeBSD operating systems. Our first results in this ongoing project have demonstrated the feasibility of this approach and produced speedups of greater than 50 on 60 processors.

[19] oai:arXiv.org:astro-ph/9710329 [pdf] - 1048513

Inside-out Galaxy Formation

Kepner, Jeremy

Comments: 21 pages including 6 figures

Submitted: 1997-10-28, last modified: 1999-02-28

Current theories of galaxy formation have tended to focus on hierarchical structure formation, which is the most likely scenario for cosmological models with lots of power at small scales (e.g. standard cold dark matter). Models with little small scale power lead to scenarios closer to spherical collapse. Recently favored power spectra (e.g. CDM+Lambda) lie somewhere in between, suggesting that both types of processes are important and may vary over time due to gaseous reheating. From this viewpoint this paper explores a very simple inside-out scenario for galaxy formation. This scenario is a natural result of synthesizing earlier work on DM halos, spherical collapse, and gas redistribution via angular momentum. Although this model is highly simplified and is not designed to accurately describe the detailed formation of any individual galaxy, it does (by design) predict the overall features of galaxies. In addition, old bulges and young disks are an almost unavoidable result of these very simple models. This scenario may provide a useful framework for both observers and theoreticians to think about galaxy formation.

[20] oai:arXiv.org:astro-ph/9901313 [pdf] - 104907

Absorption Line Signatures of Gas in Mini Dark Matter Halos

Kepner, Jeremy; Tripp, Todd; Abel, Tom; Spergel, David

Comments: 37 pages, 16 figures, accepted to the Astronomical Journal

Submitted: 1999-01-21

Recent observations and theoretical calculations suggest that some QSO absorption line systems may be due to gas in small dark matter halos with circular velocities on the order of 30 km/s. Additional observational evidence suggests that, in general, many absorption line systems may also be multi-phase in nature. Thus, computing the absorption lines of mini-halos, in addition to providing signatures of small halos, is a natural way to explore multi-phase behavior. The state of gas in mini-halos is strongly affected by the background UV radiation field. To address this issue a code was developed that includes many of the chemical and radiative processes found in CLOUDY and also incorporates spherically symmetric multi-wavelength radiative transfer of an isotropic field, non-equilibrium chemistry, heating, cooling and self-consistent quasi hydro-static equilibrium gas dynamics. With this code detailed simulations were conducted of gas in mini-halos using different types of background spectra. From these simulations the absorption line signatures of the gas were computed and compared with a variety of observations: high redshift metal lines, He lines and low redshift metal line systems. Based on these results the mini-halo model absorption line signatures appear to be consistent with many current observations given a sufficiently soft spectrum.

[21] oai:arXiv.org:astro-ph/9803125 [pdf] - 253525

An Automated Cluster Finder: the Adaptive Matched Filter

Kepner, Jeremy; Fan, Xiaohui; Bahcall, Neta; Gunn, James; Lupton, Robert; Xu, Guohong

Comments: 32 pages, 12 figures, accepted to ApJ

Submitted: 1998-03-10, last modified: 1999-01-15

We describe an automated method for detecting clusters of galaxies in imaging and redshift galaxy surveys. The Adaptive Matched Filter (AMF) method utilizes galaxy positions, magnitudes, and (when available) photometric or spectroscopic redshifts to find clusters and determine their redshift and richness. The AMF can be applied to most types of galaxy surveys: from two-dimensional (2D) imaging surveys, to multi-band imaging surveys with photometric redshifts of any accuracy (2.5D), to three-dimensional (3D) redshift surveys. The AMF can also be utilized in the selection of clusters in cosmological N-body simulations. The AMF identifies clusters by finding the peaks in a cluster likelihood map generated by convolving a galaxy survey with a filter based on a model of the cluster and field galaxy distributions. In tests on simulated 2D and 2.5D data with a magnitude limit of r' ~ 23.5, clusters are detected with an accuracy of Delta z ~ 0.02 in redshift and ~10% in richness to z < 0.5. Detecting clusters at higher redshifts is possible with deeper surveys. In this paper we present the theory behind the AMF and describe test results on synthetic galaxy catalogs.
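
The "convolve, then find peaks" step described in the abstract above can be illustrated with a deliberately simplified sketch. It is not the AMF itself: it uses a fixed 2D Gaussian as a stand-in for the cluster model, a constant field density, and plain galaxy counts on a grid, with no magnitudes or redshifts. The function names, the grid, and all parameter values are assumptions made for illustration.

```python
import math

def gaussian_kernel(radius, sigma):
    """Square Gaussian kernel of side 2*radius+1, normalised to unit sum.
    Stand-in for a projected cluster profile (an assumption, not the AMF model)."""
    weights = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
                for dx in range(-radius, radius + 1)]
               for dy in range(-radius, radius + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def filtered_map(counts, kernel, background):
    """Correlate the background-subtracted galaxy counts with the kernel.
    Cells outside the grid are skipped, i.e. treated as having zero excess."""
    ny, nx = len(counts), len(counts[0])
    r = len(kernel) // 2
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            total = 0.0
            for j in range(-r, r + 1):
                for i in range(-r, r + 1):
                    yy, xx = y + j, x + i
                    if 0 <= yy < ny and 0 <= xx < nx:
                        total += kernel[j + r][i + r] * (counts[yy][xx] - background)
            out[y][x] = total
    return out

def find_peaks(fmap, threshold):
    """Return (y, x, value) for cells above threshold that strictly exceed
    all of their (up to eight) neighbours."""
    ny, nx = len(fmap), len(fmap[0])
    peaks = []
    for y in range(ny):
        for x in range(nx):
            v = fmap[y][x]
            if v <= threshold:
                continue
            neighbours = [fmap[yy][xx]
                          for yy in range(max(0, y - 1), min(ny, y + 2))
                          for xx in range(max(0, x - 1), min(nx, x + 2))
                          if (yy, xx) != (y, x)]
            if all(v > n for n in neighbours):
                peaks.append((y, x, v))
    return peaks

# Toy survey: uniform field of 1 galaxy per cell plus one clump centred on (7, 7).
counts = [[1] * 15 for _ in range(15)]
counts[7][7] = 10
for y, x in [(6, 7), (8, 7), (7, 6), (7, 8)]:
    counts[y][x] = 5

fmap = filtered_map(counts, gaussian_kernel(2, 1.0), background=1)
peaks = find_peaks(fmap, threshold=1.0)
print(peaks)
```

In this toy setup the only cell reported as a peak is the clump centre. A method along the lines the abstract describes would replace the fixed kernel with a likelihood that also depends on magnitude and trial redshift, which this sketch does not attempt.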

[22] oai:arXiv.org:astro-ph/9607097 [pdf] - 95039

A New Statistic for Redshift Surveys: the Redshift Dispersion of Galaxies

Kepner, Jeremy; Summers, Frank; Strauss, Michael

Comments: LaTeX 27 pages, 8 Postscript figures, added one figure and some changes to the text

Submitted: 1996-07-18, last modified: 1998-01-02

We present a new statistic, the redshift dispersion, which may prove useful for comparing next-generation redshift surveys (e.g., the Sloan Digital Sky Survey) and cosmological simulations. Our statistic is specifically designed for the projection of phase space which is directly measured by redshift surveys. We find that the redshift dispersion of galaxies as a function of the projected overdensity has a strong dependence on the cosmological density parameter Omega. The redshift dispersion statistic is easy to compute and can be motivated by applying the Cosmic Virial Theorem to subsets of galaxies with the same local density. We show that the velocity dispersion of particles in these subsets is proportional to the product of Omega and the local density. Low resolution N-body simulations of several cosmological models (open/closed CDM, CDM+Lambda, HDM) indicate that the proportionality between velocity dispersion, local density and Omega holds over redshift scales in the range 50 km/s to 500 km/s. The redshift dispersion may provide an interesting means for comparing volume-limited subsamples of the Sloan Digital Sky Survey to equivalent N-body/hydrodynamics simulations.
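
As a rough illustration of what "dispersion as a function of projected overdensity" can mean in practice, the sketch below bins galaxies by a neighbour-count overdensity and reports the rms line-of-sight velocity difference to neighbours in each bin. The abstract does not give the paper's exact definition, so the neighbour criterion (projected `radius`, velocity `window`), the overdensity estimate, and the function name are all assumptions.

```python
import math

def redshift_dispersion_by_density(galaxies, radius, window, edges):
    """galaxies: list of (x, y, cz) with cz in km/s.
    A neighbour lies within projected `radius` and within `window` km/s in cz.
    Overdensity is the neighbour count divided by the mean neighbour count
    (a hypothetical estimator chosen for this sketch).
    Returns one rms |delta cz| per overdensity bin, or None for empty bins."""
    counts, sq_sums = [], []
    for i, (xi, yi, vi) in enumerate(galaxies):
        n, ss = 0, 0.0
        for j, (xj, yj, vj) in enumerate(galaxies):
            if i == j:
                continue
            if math.hypot(xi - xj, yi - yj) <= radius and abs(vi - vj) <= window:
                n += 1
                ss += (vi - vj) ** 2
        counts.append(n)
        sq_sums.append(ss)

    nbins = len(edges) - 1
    mean_n = sum(counts) / len(counts)
    if mean_n == 0:
        return [None] * nbins  # no galaxy has any neighbour

    pair_totals = [0] * nbins
    sq_totals = [0.0] * nbins
    for n, ss in zip(counts, sq_sums):
        overdensity = n / mean_n
        for b in range(nbins):
            if edges[b] <= overdensity < edges[b + 1]:
                pair_totals[b] += n
                sq_totals[b] += ss
                break
    return [math.sqrt(s / p) if p else None
            for s, p in zip(sq_totals, pair_totals)]

# Toy catalogue: one dense clump with a wide velocity spread, two isolated pairs.
clump = [(0.0, 0.0, 700), (0.1, 0.0, 820), (0.0, 0.1, 940),
         (0.1, 0.1, 1060), (0.05, 0.05, 1180), (-0.05, 0.05, 1300)]
pairs = [(10.0, 0.0, 1000), (10.5, 0.0, 1050),
         (20.0, 0.0, 1000), (20.5, 0.0, 960)]

dispersion = redshift_dispersion_by_density(
    clump + pairs, radius=1.0, window=1000.0, edges=[0.0, 1.0, math.inf])
print(dispersion)  # [low-density bin, high-density bin]
```

With this toy catalogue the high-density bin shows the larger dispersion, which is the qualitative trend the abstract relates to Omega; the numbers themselves carry no physical meaning.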

[23] oai:arXiv.org:astro-ph/9704076 [pdf] - 97038

The Delayed Formation of Dwarf Galaxies

Kepner, Jeremy; Babul, Arif; Spergel, David

Comments: 23 pages (including 8 figures). Figures 3 and 8 best viewed in color

Submitted: 1997-04-09

One of the largest uncertainties in understanding the effect of a background UV field on galaxy formation is the intensity and evolution of the radiation field with redshift. This work attempts to shed light on this issue by computing the quasi-hydrostatic equilibrium states of gas in spherically symmetric dark matter halos (roughly corresponding to dwarf galaxies) as a function of the amplitude of the background UV field. We integrate the full equations of radiative transfer, heating, cooling and non-equilibrium chemistry for nine species: H, H^+, H^-, H_2, H_2^+, He, He^+, He^{++}, and e^-. As the amplitude of the UV background is decreased, the gas in the core of the dwarf goes through three stages characterized by the predominance of ionized (H^+), neutral (H) and molecular (H_2) hydrogen. Characterizing the gas state of a dwarf galaxy with the radiation field allows us to estimate its behavior for a variety of models of the background UV flux. Our results indicate that a typical radiation field can easily delay the collapse of gas in halos corresponding to 1-$\sigma$ CDM perturbations with circular velocities less than 30 km/s.

[24] oai:arXiv.org:astro-ph/9307028 [pdf] - 90853

Hubble Space Telescope Images of the Subarcsecond Jet in DG Tau

Kepner, J.; Hartigan, P.; Yang, C.; Strom, S.

Comments: 6 pages, 3 figures included. All in postscript, please read instructions at the beginning of the file. Accepted by the Ap.J. Letters

Submitted: 1993-07-17

We have applied a new restoration technique to archival [O~I], H$\alpha$, and continuum HST images of DG~Tau. The restored [O~I] and H$\alpha$ images show that DG~Tau has a jet with a projected length of 25~AU and width $\leq$10~AU, and is already collimated at a projected distance of $\sim$~40~AU (0\farcs25) from the star. Such a narrow width and short collimation distance for a stellar jet places important constraints on theoretical models of jet formation.
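
The parenthetical pairing of $\sim$40 AU with 0\farcs25 can be unpacked with the small-angle relation: by the definition of the parsec, 1 AU at a distance of 1 pc subtends 1 arcsecond. The distance below is inferred from the two quoted numbers, not stated in the abstract, so it should be read as an approximate implied value.

```latex
\theta\,[\mathrm{arcsec}] = \frac{s\,[\mathrm{AU}]}{d\,[\mathrm{pc}]}
\quad\Longrightarrow\quad
d \approx \frac{40~\mathrm{AU}}{0.25''} = 160~\mathrm{pc},
\qquad
\theta(25~\mathrm{AU}) \approx \frac{25}{160}'' \approx 0.16''
```

At that implied distance, the quoted 25 AU jet length would correspond to roughly 0.16 arcseconds on the sky.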