The last decade has seen a substantial increase in commodity computing and network performance, mainly as a result of faster hardware and more sophisticated software. These commodity technologies have been used to develop low-cost high-performance computing systems, popularly called clusters, to solve resource-intensive problems in a number of application domains. In the scientific arena in particular, the availability of powerful computing resources has allowed scientists to broaden their simulations and experiments to take into account more parameters than ever before. Fast networks have made it possible to share data from instruments and results of experiments with collaborators around the globe almost instantaneously. Recently, research bodies have begun to launch ambitious programs that facilitate the creation of such collaborations to tackle large-scale scientific problems. Collectively, such programs are termed eScience to denote the pivotal role played by the computational infrastructure in enabling collaborative research. A typical eScience scenario is shown in Figure 1. eScience also envisages sharing scientific instruments, such as particle accelerators (for example, the CERN Large Hadron Collider), that are commissioned as national or international infrastructure because of their high cost of ownership.
As a consequence of the large collaborations and the increased computational power, the data generated and analyzed within eScience programs are both massive and inherently distributed. Therefore, the challenges of such environments revolve around managing data: its access, distribution, processing and storage. These challenges motivate the creation of a computational infrastructure that couples wide-area distributed resources such as databases, storage servers, high-speed networks, supercomputers and clusters for solving large-scale problems, leading to what is popularly known as Grid computing. The name is an analogy to the electrical power grid, which provides consistent, pervasive, dependable and transparent access to electric power irrespective of its source. As a large number of projects around the world are developing Grids for different purposes and at different scales, definitions of the Grid abound. The Globus Project defines the Grid as "an infrastructure that enables the integrated, collaborative use of high-end computers, networks, databases, and scientific instruments owned and managed by multiple organizations." Another definition, based on the notion of utility and put forward by the Gridbus Project, is "Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed 'autonomous' resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements". The development of the Grid infrastructure, both hardware and software, has become the focus of a large community of researchers and developers in both academia and industry. The major problems being addressed by Grid developments are the social problems involved in collaborative research:
• improving distributed management whilst retaining full control over locally managed resources;
• improving the availability of data and identifying problems and solutions to data access patterns; and
• providing researchers with a uniform user-friendly environment that enables access to a wider range of physically distributed facilities, improving productivity.
A high-level view of the activities involved within a seamless and scalable Grid environment is shown in Figure 2. Grid resources are registered within one or more Grid information services. The end users submit their application requirements to the Grid resource broker, which then discovers suitable resources by querying the information services,