Archive for the 'Research Computing' Category

PSearch: NACS and ICS Collaborate

Faculty and staff now have a powerful new tool for finding contacts through UCI’s online phone directory.  PSearch melds the directory data NACS maintains with state-of-the-art database research from the lab of ICS Professor Chen Li.

PSearch lets users enter whatever information they happen to have (first name, last name, department, phone number, etc.) and returns the matching entries from the campus phone directory. PSearch is error-tolerant, so you can find people even if you know only an approximate spelling of their name. It also works in real time: results are displayed and refined as you type.
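The Python sketch below shows what error-tolerant, type-ahead matching means. It is not the TASTIER algorithm, which relies on specialized index structures to stay fast on large data sets. This version simply scans every entry, which is only practical for small lists. It assumes that a typed keyword matches an entry if some prefix of one of the entry's words is within a small edit distance of the keyword. That distance limit is the hypothetical `max_errors` parameter, and the directory entries are made up.

```python
def prefix_edit_distance(query: str, word: str) -> int:
    """Smallest Levenshtein distance between `query` and any prefix of `word`."""
    # prev[j] = edit distance between the query characters seen so far and word[:j]
    prev = list(range(len(word) + 1))
    for i, qc in enumerate(query, 1):
        cur = [i]
        for j, wc in enumerate(word, 1):
            cur.append(min(prev[j] + 1,                 # drop a query character
                           cur[j - 1] + 1,              # skip a word character
                           prev[j - 1] + (qc != wc)))   # match or substitute
        prev = cur
    # The last row holds distances to every prefix word[:0] .. word[:len(word)]
    return min(prev)


def search(directory: list[str], query: str, max_errors: int = 1) -> list[str]:
    """Entries in which every typed keyword fuzzily prefix-matches some word."""
    keywords = query.lower().split()
    results = []
    for entry in directory:
        tokens = entry.lower().replace(",", " ").split()
        if all(any(prefix_edit_distance(k, t) <= max_errors for t in tokens)
               for k in keywords):
            results.append(entry)
    return results


# Hypothetical directory entries, for illustration only
DIRECTORY = [
    "Ada Lovelace, Mathematics",
    "Alan Turing, Computer Science",
    "Grace Hopper, NACS",
]

print(search(DIRECTORY, "lovlace"))  # ['Ada Lovelace, Mathematics']
```

A type-ahead interface would rerun the search on every keystroke. That is why a production system needs an index rather than a full scan.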

PSearch represents a collaboration between NACS and ICS.  Professor Li’s team offered the intelligent database search technology, and NACS offered the data and our user-interface experience.  Key contributors on Professor Li’s team include PhD student Rares Vernica at UCI and Guoliang Li, a visiting researcher from Tsinghua University, China.

PSearch is only one potential use of Dr. Li’s “type-ahead search” technology, which is featured on his TASTIER project web page. Future uses may involve other campus-wide or even UC-wide data sets. The technology makes it possible to support full-text (Google), quick-link, and directory searches simultaneously in a single query, as the search box on the ICS home page demonstrates.

New Computing Cluster

Last year, Broadcom graciously donated over 400 compute servers to UC Irvine. While the majority of the servers were distributed to campus researchers, NACS and the Bren School of Information and Computer Sciences have collaborated to bring a new general-purpose campus computing solution to researchers and graduate students at no charge.

Initially, the Broadcom Distributed Unified Cluster (BDUC) consists of 80 nodes: 40 nodes with 32-bit Intel processors and 40 nodes with 64-bit AMD processors. Broadcom is expected to donate newer servers over time, allowing nodes to be upgraded. NACS and ICS also plan to expand the cluster, subject to available staff and Data Center resources.

BDUC includes standard open-source compilers, debuggers, and libraries; in addition, the MATLAB Distributed Computing Engine (DCE) will soon be available.  In the near future, BDUC will offer priority queues for research groups that provide financial support or hardware to the cluster.

BDUC is now available to all faculty, staff, and graduate students, who log in with their UCInetID and password. To request an account, send e-mail to bduc-request@uci.edu. A how-to guide for new users is available on the NACS website at http://www.nacs.uci.edu/computing/bduc/newuser.html.

Moving Bulk Data

Data transfer is a routine activity for most faculty, whether it’s sharing research data with colleagues, downloading research databases, or backing up vital data. When the volume of data you’re transferring is in the tens or hundreds of megabytes, any tool can get the job done. When you have gigabytes or tens of gigabytes of data to move, you need more of a strategy.
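Some rough arithmetic shows why data volume changes the picture. The sketch below estimates transfer time from data size and link speed. The `efficiency` factor is a hypothetical assumption standing in for protocol overhead and competing traffic, and the link speeds in the usage loop are examples, not measurements of the campus network.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Rough transfer time in hours.

    size_gb:    data size in gigabytes (10**9 bytes)
    link_mbps:  nominal link speed in megabits per second
    efficiency: assumed fraction of the nominal speed actually achieved
    """
    bits = size_gb * 1e9 * 8                         # bytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)  # effective bits per second
    return seconds / 3600


# Example sizes (GB) and link speeds (Mb/s); purely illustrative
for size in (0.1, 10, 100):
    for link in (100, 1000):
        print(f"{size:>6} GB over {link:>4} Mb/s: {transfer_hours(size, link):8.3f} h")
```

At 100 Mb/s with perfect efficiency, 100 GB takes 8,000 seconds, or about 2.2 hours. A 100 MB file takes about 8 seconds.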

The tool and strategy you should use depends on the kind of data you have, the size of the data, whether you need to do the transfer once or repeatedly, and the computer and tools you’re most comfortable with.  Some ideas are outlined below, but NACS’s Research Computing Support maintains a detailed discussion with links to sites from which you can get data transfer tools.

Two basic strategies can reduce the volume of data you actually need to transfer: compression and synchronization. Unless your data is already in a compressed form (MP3 files, for example), compression can save a great deal of time and network capacity. Many transfer tools can even compress data on the fly. If your files contain sensitive information, you may also wish to encrypt the data in transit, although this imposes a small time penalty.
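The minimal sketch below uses Python's standard `tarfile` module to bundle a directory tree into a single gzip-compressed archive before sending it. The function name `pack_directory` is hypothetical. Tools such as `rsync -z` or `scp -C` compress during the transfer itself instead of producing an archive first.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: str, archive_path: str) -> int:
    """Bundle src_dir into a gzip-compressed tar archive; return its size in bytes."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory's own name,
        # so the archive unpacks as a single top-level folder
        tar.add(src, arcname=src.name)
    return Path(archive_path).stat().st_size
```

Repetitive data such as text tables of instrument readings often compresses tenfold or more. Already-compressed formats gain little.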

The second strategy, particularly when you’re regularly moving the same data, is to use a synchronization tool that recognizes that only part of your data is new and needs to be transferred.  This can be particularly convenient if you have an entire directory tree you wish to send over the network.
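The core idea of synchronization can be sketched with content checksums. The idea is to build a manifest of each file tree and send only the files that are new or whose contents differ. This is a simplified, file-level illustration with hypothetical function names. Real tools such as rsync go further and can transfer only the changed portions within a file.

```python
import hashlib
from pathlib import Path


def build_manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    manifest = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(base).as_posix()] = digest
    return manifest


def files_to_send(source: dict[str, str], destination: dict[str, str]) -> list[str]:
    """Files that are missing at the destination or whose contents changed."""
    return sorted(p for p, h in source.items() if destination.get(p) != h)
```

In practice the destination's manifest would be computed on the remote machine, so only the short list of checksums crosses the network before the transfer.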

A final technique that may apply in some cases is to make the best possible use of the network, either by setting up multiple parallel data-transfer streams or even by creating a special-purpose GridFTP node. RCS staff can help you analyze your data transfer needs, choose a method, and set up your system.
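The parallel-stream idea can be illustrated by splitting a file into byte ranges and moving each range independently. In the sketch below, a local file copy with one thread per range stands in for separate network connections. Tools such as GridFTP open multiple TCP streams for the same purpose. The function names are hypothetical, and each range is read into memory whole, which is only reasonable for a demonstration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size: int, streams: int) -> list[tuple[int, int]]:
    """Split [0, size) into `streams` contiguous (offset, length) chunks."""
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((offset, length))
        offset += length
    return ranges


def copy_range(src: str, dst: str, offset: int, length: int) -> None:
    # Each "stream" opens its own handles, so the streams never share a file position
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))


def parallel_copy(src: str, dst: str, streams: int = 4) -> None:
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)  # preallocate so every stream can write at its own offset
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(copy_range, src, dst, off, n)
                   for off, n in byte_ranges(size, streams)]
        for fut in futures:
            fut.result()  # re-raise any error from a worker thread
```

Over a real network, parallel streams help mainly when a single TCP connection cannot fill a long, high-bandwidth path.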

RCS staff will also coordinate with NACS Network Engineers to ensure they are aware of research data transfer needs in various campus locations.  This will help inform future network upgrade plans.  In addition, in a few cases, it may be possible to upgrade network connections to higher speed to support critical research requirements.

Online Geographic Information System Services

Geographic Information System (GIS) software has traditionally been used on desktop computers to develop, display, and analyze spatial data. Recent advances in web-based GIS software now allow researchers and instructors to upload their spatial data to online GIS services. Colleagues and students can then view, query, and even edit GIS data in a web browser, without having GIS software installed on their desktop computers.

NACS uses ESRI’s ArcGIS Server to provide online GIS services. Developing a new online GIS service is straightforward. Once an ArcGIS Desktop document is developed, the document and its associated GIS data files are uploaded to an ArcGIS Server. A GIS service is generated and custom data queries are assigned. The URL for the new GIS service can then be distributed to users.
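Once a service is published, programs as well as browsers can query it. ArcGIS Server exposes a REST interface in which a map-service layer accepts a `query` request with parameters such as `where`, `outFields`, `returnGeometry`, and `f`. The sketch below only builds such a URL. The host, service name, layer number, and field names are all hypothetical, and the exact parameters supported depend on the server version.

```python
from urllib.parse import urlencode


def layer_query_url(server: str, service: str, layer_id: int,
                    where: str, fields: list[str]) -> str:
    """Build an ArcGIS Server REST 'query' URL for one layer of a map service."""
    params = {
        "where": where,                  # SQL-style attribute filter
        "outFields": ",".join(fields),   # attributes to return
        "returnGeometry": "false",       # attributes only, no shapes
        "f": "json",                     # response format
    }
    return (f"{server}/arcgis/rest/services/{service}"
            f"/MapServer/{layer_id}/query?{urlencode(params)}")


# Hypothetical server, service, and fields, for illustration only
url = layer_query_url("https://gis.example.edu", "Fires/SoCal2008", 0,
                      "ACRES > 1000", ["NAME", "ACRES"])
print(url)
```

The resulting URL could then be fetched with `urllib.request.urlopen` and the JSON response parsed with `json.load`.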

NACS has been developing GIS services using ArcGIS Server for two years.  If you are interested in making your GIS data available online, we can develop a GIS service on our server using your data, or we can help you set up ArcGIS Server on your own or a departmental system.

Here are a few ArcGIS Server applications running on the NACS GIS server.  When viewing these GIS services, consider how your own spatial data might be displayed and explored using ArcGIS Server.

Recent Southern California fires (Freeway, Tea, and Sayre) using ESRI basemap data and fire perimeters collected by the Geospatial Multi-Agency Coordination.

California No Child Left Behind, within the UCI Department of Education.

History of North American Indians used for instruction within the UCI Department of History.

GIS Technology in UCI Research

Geographic Information System (GIS) technology is finding ever broader use in the UCI research community. NACS Research Computing Specialist Tony Soeller has been supporting GIS software, teaching workshops, and working directly with faculty and graduate students to apply GIS tools to their research projects. Here are some recent examples.

Professor Bradford Hawkins in Ecology and Evolutionary Biology is tracking global species diversity in birds. In a massive spatial data synthesis project on global bird ranges, GIS was used to georegister, digitize, and rasterize bird range maps, and then to count the number of bird species within discrete grid cells 27.5 to 220 km on a side. Numerous ArcGIS programs (ArcObjects and VBA) were written to help process the data.
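The rasterize-then-count step can be sketched in a few lines. In the sketch below, each range is simplified to a hypothetical axis-aligned box in a projected coordinate system measured in kilometers. A species counts toward a grid cell if the cell's center lies inside its range. Species richness per cell is then the number of ranges covering that cell. Real range maps are irregular polygons, and the project's actual processing was done in ArcGIS, so this only illustrates the counting logic. Changing `cell_km` (for example, from 27.5 to 220) changes the grain of the summary.

```python
import math
from collections import Counter


def rasterize_box(box: tuple[float, float, float, float], cell_km: float) -> set[tuple[int, int]]:
    """Grid cells whose centers fall inside a range box (xmin, ymin, xmax, ymax) in km."""
    xmin, ymin, xmax, ymax = box
    cells = set()
    for i in range(math.floor(xmin / cell_km), math.ceil(xmax / cell_km)):
        for j in range(math.floor(ymin / cell_km), math.ceil(ymax / cell_km)):
            cx, cy = (i + 0.5) * cell_km, (j + 0.5) * cell_km  # cell center
            if xmin <= cx <= xmax and ymin <= cy <= ymax:
                cells.add((i, j))
    return cells


def species_richness(ranges: dict[str, tuple[float, float, float, float]],
                     cell_km: float) -> Counter:
    """Number of species whose rasterized range covers each grid cell."""
    richness = Counter()
    for box in ranges.values():
        richness.update(rasterize_box(box, cell_km))  # each species adds 1 per cell
    return richness
```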

Cristiane Surbeck completed her Ph.D. studies in Professor Stan Grant’s lab in Chemical Engineering and Materials Sciences and is now an Assistant Professor at the University of Mississippi. Cris has been analyzing the Santa Ana River Watershed. Her research examined the biological and sediment constituents of runoff into the Santa Ana River from three storm events within the watershed and compared these data to rainfall volume and to the land-use types that contributed to the runoff. GIS was used to combine land-use data with rainfall data from the storm events, to delineate individual storm watersheds, and to determine the area of land within each land-use type and the amount of precipitation falling on each of those land-use types.
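The last step, tallying area and precipitation by land-use type inside a watershed, is a cross-tabulation over aligned grids. GIS packages provide tools for this; ArcGIS Spatial Analyst, for instance, includes zonal statistics tools. The plain-Python sketch below only illustrates the arithmetic. It assumes that the land-use, rainfall-depth (mm), and watershed-mask grids share the same cell layout and that every cell has the same hypothetical area.

```python
from collections import defaultdict


def landuse_summary(landuse, rain_mm, in_watershed, cell_area_m2: float):
    """Per land-use class: total area (m^2) and rainfall volume (m^3) inside the watershed.

    landuse, rain_mm, and in_watershed are equal-sized 2-D lists aligned cell for cell.
    """
    area = defaultdict(float)
    volume = defaultdict(float)
    for lu_row, rain_row, mask_row in zip(landuse, rain_mm, in_watershed):
        for lu, rain, inside in zip(lu_row, rain_row, mask_row):
            if inside:
                area[lu] += cell_area_m2
                volume[lu] += cell_area_m2 * rain / 1000.0  # depth in mm -> m
    return dict(area), dict(volume)
```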

Satish Vutuku, a student in Professor Donald Dabdub’s lab in Mechanical and Aerospace Engineering, is examining the impact of distributed power generation on air quality.

In one project, Satish assessed the atmospheric impact of emissions from distributed generation (DG) sources. DG refers to “on-site” power generation using technologies such as fuel cells and microturbines. Such DG installations emit pollutants across an urban area in a highly dispersed manner, in contrast to large conventional power plants, which emit pollutants as a concentrated plume far from urban areas. To analyze the effects of DG emissions, Satish created a set of “DG scenarios” projecting the adoption of DG technologies and the corresponding emissions. The DG scenarios were based on highly detailed land-use and population data. The land-use data were obtained as GIS files and, with help from Tony Soeller, formatted to fit the model grid and resolution.
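Fitting fine-grained land-use data to a coarser model grid often comes down to aggregation. Each model cell records the fraction of its area in each land-use class. The sketch below assumes the fine grid is already aligned with the model grid and that its resolution is an exact integer multiple (the hypothetical `factor`). Real regridding usually also involves reprojection, which is omitted here.

```python
from collections import Counter


def aggregate_to_model_grid(fine: list[list[str]], factor: int) -> list[list[dict[str, float]]]:
    """Coarsen a land-use grid by an integer factor.

    Each coarse cell maps land-use class -> fraction of that cell's area.
    """
    rows, cols = len(fine), len(fine[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    total = factor * factor  # fine cells per coarse cell
    coarse = []
    for r0 in range(0, rows, factor):
        row = []
        for c0 in range(0, cols, factor):
            counts = Counter(fine[r][c]
                             for r in range(r0, r0 + factor)
                             for c in range(c0, c0 + factor))
            row.append({lu: n / total for lu, n in counts.items()})
        coarse.append(row)
    return coarse
```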

This is just a sampling of the many projects at UCI that make use of GIS software and Tony’s expertise. Please contact NACS if you are interested in exploring how GIS could apply to your research project.

Mac Cluster Available

Apple has donated to UCI a small computational cluster based on its XServe product line.

This three-server cluster (two computational nodes and one control, or “head,” node) is built on the PowerPC chip. Each node has two 2 GHz PowerPC CPUs. The cluster also offers a 1.2 TB (1,200-gigabyte) disk array. The PowerPC architecture features high-performance, true 64-bit floating-point arithmetic and is particularly well suited to floating-point and vector calculations.

NACS and faculty originally evaluated batch processing systems for the cluster under the Mac OS X operating system. The cluster now runs Linux, both because faculty tend to be more familiar with it and to take advantage of the richer software development environment available under Linux.

GNU compilers for C, C++, and Fortran are available on the cluster, as well as the optimized IBM C/C++ compiler suite for PowerPC. Faculty may contact NACS for accounts, assistance with porting, and benchmarking.