Beowulf Cluster Computing with Linux, 1st Edition, by Thomas Sterling – Ebook PDF Instant Download/Delivery: 0262692740, 978-0262692748
Full download of Beowulf Cluster Computing with Linux, 1st Edition, is available after payment.

Product details:
ISBN 10: 0262692740
ISBN 13: 978-0262692748
Author: Thomas Sterling
A comprehensive guide to the latest Beowulf tools and methodologies.
Table of contents:
1 Introduction—Thomas Sterling
1.1 Definitions and Taxonomy
1.2 Opportunities and Advantages
1.3 A Short History
1.4 Elements of a Cluster
1.5 Description of the Book
I Enabling Technologies
2 An Overview of Cluster Computing—Thomas Sterling
2.1 A Taxonomy of Parallel Computing
2.2 Hardware System Structure
2.2.1 Beowulf Compute Nodes
2.2.2 Interconnection Networks
2.3 Node Software
2.4 Resource Management
2.5 Distributed Programming
2.6 Conclusions
3 Node Hardware—Thomas Sterling
3.1 Overview of a Beowulf Node
3.1.1 Principal Specifications
3.1.2 Basic Elements
3.2 Processors
3.2.1 Intel Pentium Family
3.2.2 AMD Athlon
3.2.3 Compaq Alpha
3.2.4 IA64
3.3 Motherboard
3.4 Memory
3.4.1 Memory Capacity
3.4.2 Memory Speed
3.4.3 Memory Types
3.4.4 Memory Hierarchy and Caches
3.4.5 Package Styles
3.5 BIOS
3.6 Secondary Storage
3.7 PCI Bus
3.8 Example of a Beowulf Node
3.9 Boxes, Shelves, Piles, and Racks
3.10 Node Assembly
3.10.1 Motherboard Preassembly
3.10.2 The Case
3.10.3 Minimal Peripherals
3.10.4 Booting the System
3.10.5 Installing the Other Components
3.10.6 Troubleshooting
4 Linux—Peter H. Beckman
4.1 What Is Linux?
4.1.1 Why Use Linux for a Beowulf?
4.1.2 A Kernel and a Distribution
4.1.3 Open Source and Free Software
4.1.4 A Linux Distribution
4.1.5 Version Numbers and Development Methods
4.2 The Linux Kernel
4.2.1 Compiling a Kernel
4.2.2 Loadable Kernel Modules
4.2.3 The Beowulf Kernel Diet
4.2.4 Diskless Operation
4.2.5 Downloading and Compiling a New Kernel
4.2.6 Linux File Systems
4.3 Pruning Your Beowulf Node
4.3.1 inetd.conf
4.3.2 /etc/rc.d/init.d
4.3.3 Other Processes and Daemons
4.4 Other Considerations
4.4.1 TCP Messaging
4.4.2 Hardware Performance Counters
4.5 Final Tuning with /proc
4.6 Conclusions
5 Network Hardware—Thomas Sterling
5.1 Interconnect Technologies
5.1.1 The Ethernets
5.1.2 Myrinet
5.1.3 cLAN
5.1.4 Scalable Coherent Interface
5.1.5 QsNet
5.1.6 Infiniband
5.2 A Detailed Look at Ethernet
5.2.1 Packet Format
5.2.2 NIC Architecture
5.2.3 Hubs and Switches
5.3 Network Practicalities: Interconnect Choice
5.3.1 Importance of the Interconnect
5.3.2 Differences between the Interconnect Choices
5.3.3 Strategies to Improve Performance over Ethernet
5.3.4 Cluster Network Pitfalls
5.3.5 An Example of an Ethernet Interconnected Beowulf
5.3.6 An Example of a Myrinet Interconnected Cluster
6 Network Software—Thomas Sterling
6.1 TCP/IP
6.1.1 IP Addresses
6.1.2 Zero-Copy Protocols
6.2 Sockets
6.3 Higher-Level Protocols
6.3.1 Remote Procedure Calls
6.3.2 Distributed Objects: CORBA and Java RMI
6.4 Distributed File Systems
6.4.1 NFS
6.4.2 AFS
6.4.3 Autofs: The Automounter
6.5 Remote Command Execution
6.5.1 BSD R Commands
6.5.2 SSH—The Secure Shell
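
Chapter 6's treatment of TCP/IP and sockets is the foundation for every higher layer in the stack. As a rough illustration of the sockets layer (a minimal sketch, not code from the book; the address 192.168.1.100 and port 7777 are made-up examples), a TCP client in C might look like:

    /* Illustrative TCP client sketch: connect to a server on the
       cluster network and exchange a few bytes over a socket. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);     /* TCP socket */

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(7777);                  /* example port */
        inet_pton(AF_INET, "192.168.1.100", &addr.sin_addr);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            char buf[64];
            write(fd, "ping", 4);
            ssize_t n = read(fd, buf, sizeof(buf));
            printf("got %zd bytes back\n", n);
        }
        close(fd);
        return 0;
    }
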
7 Setting Up Clusters: Installation and Configuration—Thomas Sterling and Daniel Savarese
7.1 System Access Models
7.1.1 The Standalone System
7.1.2 The Universally Accessible Machine
7.1.3 The Guarded Beowulf
7.2 Assigning Names
7.2.1 Statically Assigned Addresses
7.2.2 Dynamically Assigned Addresses
7.3 Installing Node Software
7.3.1 Creating Tar Images
7.3.2 Setting Up a Clone Root Partition
7.3.3 Setting Up BOOTP
7.3.4 Building a Clone Boot Floppy
7.4 Basic System Administration
7.4.1 Booting and Shutting Down
7.4.2 The Node File System
7.4.3 Account Management
7.4.4 Running Unix Commands in Parallel
7.5 Avoiding Security Compromises
7.5.1 System Configuration
7.5.2 Restricting Host Access
7.5.3 Secure Shell
7.5.4 IP Masquerading
7.6 Job Scheduling
7.7 Some Advice on Upgrading Your Software
8 How Fast Is My Beowulf?—David Bailey
8.1 Metrics
8.2 Ping-Pong Test
8.3 The LINPACK Benchmark
8.4 The NAS Parallel Benchmark Suite
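
Section 8.2's ping-pong test is the standard way to measure point-to-point latency between two nodes: rank 0 bounces a message off rank 1 and times the round trips. A minimal sketch in C with MPI (illustrative only, not the book's benchmark code):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank;
        char buf[1] = {0};            /* 1-byte message; grow it to measure bandwidth */
        const int reps = 1000;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("avg round trip: %g microseconds\n", (t1 - t0) / reps * 1e6);
        MPI_Finalize();
        return 0;
    }

Varying the message size in the same loop turns the latency test into a bandwidth test.
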
II Parallel Programming
9 Parallel Programming with MPI—William Gropp and Ewing Lusk
9.1 Hello World in MPI
9.1.1 Compiling and Running MPI Programs
9.1.2 Adding Communication to Hello World
9.2 Manager/Worker Example
9.3 Two-Dimensional Jacobi Example with One-Dimensional Decomposition
9.4 Collective Operations
9.5 Parallel Monte Carlo Computation
9.6 Installing MPICH under Linux
9.6.1 Obtaining and Installing MPICH
9.6.2 Running MPICH Jobs with the ch_p4 Device
9.6.3 Starting and Managing MPD
9.6.4 Running MPICH Jobs under MPD
9.6.5 Debugging MPI Programs
9.6.6 Other Compilers
9.7 Tools
9.7.1 Profiling Libraries
9.7.2 Visualizing Parallel Program Behavior
9.8 MPI Implementations for Clusters
9.9 MPI Routine Summary
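
Chapter 9 begins with the traditional first MPI program (Section 9.1). A minimal "Hello World" sketch in C, along the lines the chapter covers (not the book's exact listing):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* start up MPI */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's number */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count */
        printf("Hello World from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

With MPICH (Section 9.6) this would typically be compiled with mpicc and launched with something like mpirun -np 4 ./hello.
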
10 Advanced Topics in MPI Programming—William Gropp and Ewing Lusk
10.1 Dynamic Process Management in MPI
10.1.1 Intercommunicators
10.1.2 Spawning New MPI Processes
10.1.3 Revisiting Matrix-Vector Multiplication
10.1.4 More on Dynamic Process Management
10.2 Fault Tolerance
10.3 Revisiting Mesh Exchanges
10.3.1 Blocking and Nonblocking Communication
10.3.2 Communicating Noncontiguous Data in MPI
10.4 Motivation for Communicators
10.5 More on Collective Operations
10.6 Parallel I/O
10.6.1 A Simple Example
10.6.2 A More Complex Example
10.7 Remote Memory Access
10.8 Using C++ and Fortran 90
10.9 MPI, OpenMP, and Threads
10.10 Measuring MPI Performance
10.10.1 mpptest
10.10.2 SKaMPI
10.10.3 High Performance LINPACK
10.11 MPI-2 Status
10.12 MPI Routine Summary
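
Section 10.3.1 contrasts blocking and nonblocking communication for mesh exchanges. The nonblocking pattern posts all receives and sends first and only then waits, so neighboring processes cannot deadlock on each other. A sketch in C (the function, variable names, and single-element messages are assumptions for illustration, not the book's code):

    /* Illustrative nonblocking halo exchange for a 1-D decomposition. */
    #include <mpi.h>

    void exchange_halo(double *u, int nx, int up, int down, MPI_Comm comm)
    {
        /* u[0] and u[nx+1] are ghost cells; u[1]..u[nx] are owned.
           up/down are neighbor ranks, or MPI_PROC_NULL at the edges. */
        MPI_Request req[4];

        MPI_Irecv(&u[0],      1, MPI_DOUBLE, up,   0, comm, &req[0]);
        MPI_Irecv(&u[nx + 1], 1, MPI_DOUBLE, down, 1, comm, &req[1]);
        MPI_Isend(&u[1],      1, MPI_DOUBLE, up,   1, comm, &req[2]);
        MPI_Isend(&u[nx],     1, MPI_DOUBLE, down, 0, comm, &req[3]);

        /* Both directions proceed concurrently; no send can stall
           waiting for a matching receive, unlike naive blocking code. */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    }
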
11 Parallel Programming with PVM—Al Geist and Stephen Scott
11.1 Overview
11.2 Program Examples
11.3 Fork/Join
11.4 Dot Product
11.5 Matrix Multiply
11.6 One-Dimensional Heat Equation
11.7 Using PVM
11.7.1 Setting Up PVM
11.7.2 Starting PVM
11.7.3 Running PVM Programs
11.8 PVM Console Details
11.9 Host File Options
11.10 XPVM
11.10.1 Network View
11.10.2 Space-Time View
11.10.3 Other Views
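
Section 11.3's fork/join example has a parent task spawn workers and then collect a message from each as they finish. A compressed sketch of that pattern in C using PVM 3 calls (illustrative; the executable name "forkjoin", worker count, and tag 0 are assumptions, not the book's listing):

    #include <stdio.h>
    #include <pvm3.h>

    #define NWORKERS 4

    int main(void)
    {
        int mytid = pvm_mytid();          /* enroll this task in PVM */

        if (pvm_parent() == PvmNoParent) {
            /* Parent: fork NWORKERS copies of this executable, then join. */
            int tids[NWORKERS];
            pvm_spawn("forkjoin", NULL, PvmTaskDefault, "", NWORKERS, tids);
            for (int i = 0; i < NWORKERS; i++) {
                int who;
                pvm_recv(-1, 0);          /* any worker, message tag 0 */
                pvm_upkint(&who, 1, 1);
                printf("joined worker tid %x\n", who);
            }
        } else {
            /* Worker: report our task id back to the parent. */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&mytid, 1, 1);
            pvm_send(pvm_parent(), 0);
        }
        pvm_exit();
        return 0;
    }
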
12 Fault-Tolerant and Adaptive Programs with PVM—Al Geist and Jim Kohl
12.1 Considerations for Fault Tolerance
12.2 Building Fault-Tolerant Parallel Applications
12.3 Adaptive Programs
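
Chapter 12's fault-tolerance approach rests on PVM's notification mechanism: a task can ask the virtual machine to send it a message whenever a monitored task exits, then respawn the lost worker. A hedged sketch in C (tag 99 and the print-only recovery policy are assumptions, not the book's code):

    #include <stdio.h>
    #include <pvm3.h>

    /* Watch a set of worker tasks and report each exit as it happens. */
    void watch_workers(int *tids, int ntasks)
    {
        /* Deliver a tag-99 message to this task when any listed task exits. */
        pvm_notify(PvmTaskExit, 99, ntasks, tids);

        for (int done = 0; done < ntasks; done++) {
            int deadtid;
            pvm_recv(-1, 99);             /* blocks until some task exits */
            pvm_upkint(&deadtid, 1, 1);   /* notify message carries the tid */
            printf("task %x exited; a fault-tolerant app could respawn it\n",
                   deadtid);
        }
    }
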
III Managing Clusters
13 Cluster Workload Management—James Patton Jones, David Lifka, Bill Nitzberg, and Todd Tannenbaum
13.1 Goal of Workload Management Software
13.2 Workload Management Activities
13.2.1 Queueing
13.2.2 Scheduling
13.2.3 Monitoring
13.2.4 Resource Management
13.2.5 Accounting
14 Condor: A Distributed Job Scheduler—Todd Tannenbaum, Derek Wright, Karen Miller, and Miron Livny
14.1 Introduction to Condor
14.1.1 Features of Condor
14.1.2 Understanding Condor ClassAds
14.2 Using Condor
14.2.1 Roadmap to Using Condor
14.2.2 Submitting a Job
14.2.3 Overview of User Commands
14.2.4 Submitting Different Types of Jobs: Alternative Universes
14.2.5 Giving Your Job Access to Its Data Files
14.2.6 The DAGMan Scheduler
14.3 Condor Architecture
14.3.1 The Condor Daemons
14.3.2 The Condor Daemons in Action
14.4 Installing Condor under Linux
14.5 Configuring Condor
14.5.1 Location of Condor’s Configuration Files
14.5.2 Recommended Configuration File Layout for a Cluster
14.5.3 Customizing Condor’s Policy Expressions
14.5.4 Customizing Condor’s Other Configuration Settings
14.6 Administration Tools
14.6.1 Remote Configuration and Control
14.6.2 Accounting and Logging
14.6.3 User Priorities in Condor
14.7 Cluster Setup Scenarios
14.7.1 Basic Configuration: Uniformly Owned Cluster
14.7.2 Using Multiprocessor Compute Nodes
14.7.3 Scheduling a Distributively Owned Cluster
14.7.4 Submitting to the Cluster from Desktop Workstations
14.7.5 Expanding the Cluster to Nondedicated (Desktop) Computing Resources
14.8 Conclusion
15 Maui Scheduler: A Multifunction Cluster Scheduler—David B. Jackson
15.1 Overview
15.2 Installation and Initial Configuration
15.2.1 Basic Configuration
15.2.2 Simulation and Testing
15.2.3 Production Scheduling
15.3 Advanced Configuration
15.3.1 Assigning Value: Job Prioritization and Node Allocation
15.3.2 Fairness: Throttling Policies and Fairshare
15.3.3 Managing Resource Access: Reservations, Allocation Managers, and Quality of Service
15.3.4 Optimizing Usage: Backfill, Node Sets, and Preemption
15.3.5 Evaluating System Performance: Diagnostics, Profiling, Testing, and Simulation
15.4 Steering Workload and Improving Quality of Information
15.5 Troubleshooting
15.6 Conclusions
16 PBS: Portable Batch System—James Patton Jones
16.1 History of PBS
16.1.1 Acquiring PBS
16.1.2 PBS Features
16.1.3 PBS Architecture
16.2 Using PBS
16.2.1 Creating a PBS Job
16.2.2 Submitting a PBS Job
16.2.3 Getting the Status of a PBS Job
16.2.4 PBS Command Summary
16.2.5 Using the PBS Graphical User Interface
16.2.6 PBS Application Programming Interface
16.3 Installing PBS
16.4 Configuring PBS
16.4.1 Network Addresses and PBS
16.4.2 The Qmgr Command
16.4.3 Nodes
16.4.4 Creating or Adding Nodes
16.4.5 Default Configuration
16.4.6 Configuring MOM
16.4.7 Scheduler Configuration
16.5 Managing PBS
16.5.1 Starting PBS Daemons
16.5.2 Monitoring PBS
16.5.3 Tracking PBS Jobs
16.5.4 PBS Accounting Logs
16.6 Troubleshooting
16.6.1 Clients Unable to Contact Server
16.6.2 Nodes Down
16.6.3 Nondelivery of Output
16.6.4 Job Cannot Be Executed
17 PVFS: Parallel Virtual File System—Walt Ligon and Rob Ross
17.1 Introduction
17.1.1 Parallel File Systems
17.1.2 Setting Up a Parallel File System
17.1.3 Programming with a Parallel File System
17.2 Using PVFS
17.2.1 Writing PVFS Programs
17.2.2 PVFS Utilities
17.3 Administering PVFS
17.3.1 Building the PVFS Components
17.3.2 Installation
17.3.3 Startup and Shutdown
17.3.4 Configuration Details
17.3.5 Miscellanea
17.4 Final Words
18 Chiba City: The Argonne Scalable Cluster—Remy Evard
18.1 Chiba City Configuration
18.1.1 Node Configuration
18.1.2 Logical Configuration
18.1.3 Network Configuration
18.1.4 Physical Configuration
18.2 Chiba City Timeline
18.2.1 Phase 1: Motivation
18.2.2 Phase 2: Design and Purchase
18.2.3 Phase 3: Installation
18.2.4 Phase 4: Final Development
18.2.5 Phase 5: Early Users
18.2.6 Phase 6: Full Operation
18.3 Chiba City Software Environment
18.3.1 The Computing Environment
18.3.2 Management Environment
18.4 Chiba City Use
18.5 Final Thoughts
18.5.1 Lessons Learned
18.5.2 Future Directions
19 Conclusions—Thomas Sterling
19.1 Future Directions for Hardware Components
19.2 Future Directions for Software Components
19.3 Final Thoughts
People also search for:
beowulf computing cluster
what is a beowulf cluster
beowulf clusters of computer system uses
beowulf cluster for gaming
beowulf cluster applications
Tags:
Thomas Sterling, Beowulf Cluster, Computing with Linux



