जैविक अनुसंधान क्षेत्रों में उच्च निष्पादन कम्प्यूटिंग (एचपीसी) का अनुप्रयोग

Full form of HPC is High Performance Computing. The article focuses on the application of HPC in agriculture and biological research to address complex computational issues, such as monitoring agricultural activities, assessing the impact of climate change, and providing weather-related information. Application of HPC in different agriculture and biological areas

 What is HPC?

High Performance Computing/Computer (HPC) refers to computational activities that require more than one computer to perform a task, and its objective is to achieve higher performance by aggregating computing power. The main focus of HPC is on developing parallel processing algorithms and software that can divide programs into fragments to be executed simultaneously by separate processors.

HPC found application in various areas, like, product development and optimization, data analysis and storage, large-scale research, simulations, and modeling of complex processes. The use of High-Performance Computing (HPC) enables researchers to tackle intricate scientific problems through applications that require high-speed connectivity, minimal delay, and abundant computing power.

Traditionally, scientists and engineers face challenges such as waiting in long lines to access shared computing resources or the high cost of acquiring advanced hardware. HPC overcomes these obstacles and provides a more accessible and cost-effective solution for scientific problem-solving.

Table. History of HPC

Year

Name of Computing Power

1942

 Atanasoff-Berry Computer (ABC)

1946

 ENIAC

1949

 EDVAC and EDSAC

1951

 Manchester/Ferranti Mark I

1952

 Whirlwind

1961

 IBM 7030 (STRETCH)

1964

 CDC 6600

1969

 CDC 7600

1974

 ILLIAC IV

1976

 Cray-1

1979

 Goodyear MPP

1985

 Cray-2, Caltec Cosmic Cube and Connection Machine

1990

 Intel 80486

1995

 Intel Pentium Pro (P6 processor core)

Modern High Performance Computing

Advantages of using HPC resources:

  1. Cost-effective: By using HPC services, users can save on the expenses associated with procuring, setting up, and managing HPC clusters. They can also choose from various pricing options, such as On Demand, Reserved, or Spot Instances, to optimize their costs.
  2. Flexibility: The ability to quickly add or reduce computing resources to meet specific needs and workloads.
  3. Convenient Access: Users can easily launch computing tasks using APIs or management tools, streamlining workflows and maximizing efficiency and scalability. With quick access to HPC resources, innovation can be accelerated without the need to wait in long queues.

Application of High Performance Computing in agriculture and Biology:

The applications of High-Performance Computing (HPC) are numerous and wide-ranging, encompassing many areas of molecular medicine, personalized medicine, preventative medicine, gene therapy, drug development, microbial genome application, antibiotic resistance, crop improvement, evolutionary analysis, veterinary, comparative analysis, data warehousing, gene and protein annotation, SQL interface, and more.

The capabilities of HPC are being leveraged to tackle complex and data-intensive problems in these and other fields. The article focuses on the application of HPC in agriculture and biological research to address complex computational issues, such as monitoring agricultural activities, assessing the impact of climate change, and providing weather-related information. Application of HPC in different agriculture and biological areas is briefly described below:

1. Role of HPC in Climate change research and weather prediction:

The Earth's climate has changed due to the complex interaction of various factors such as the oceans, land masses, atmosphere, solar flux, and biosphere. HPC technologies are crucial in understanding the local and regional effects of these changes, such as weather patterns, storm surges, and flooding. Through the use of high-performance computing tools, complex models can be developed, validated, and tested to produce the data necessary for public policy formulation, regional planning, and effective decision-making. For instance, HPC can help answer questions related to rising sea levels, coastal waves during storms, and construction zoning. The benefits of using HPC in such scenarios are immeasurable, including protection of environmentally sensitive coastal regions, reduction of property loss, and saving lives during severe weather.

2. Role of HPC in Farm/Land Assessment:

To manage the haphazard and non-scientific farming practices and to optimize crop production and minimize environmental impact, it is necessary to manage variations in soil fertility and crop conditions. Geographical Information System (GIS) provides the solution by collecting, analyzing and presenting geographical data. By utilizing GIS, the soil attributes and crop yield can be determined, allowing for monitoring of various soil and crop characteristics such as soil moisture, growth, nutrient deficiencies, crop diseases, and insect infections. To address these issues, High Performance Computing (HPC) is utilized in GIS applications.

3. Role of HPC in Bioinformatics

The field of bioinformatics plays a crucial role in solving complex problems in biological sciences by using the tools of computer science to interpret the vast amount of data gathered by biologists. The focus of bioinformatics and computational biology is to examine biological phenomena and data through the use of mathematics, informatics, statistics, and computer science.

These approaches provide a means for large-scale and quantitative analysis of data from multiple disciplines and assist agriculture researchers in their analysis of genomics and proteomics data. Bioinformatics encompasses a wide range of studies and techniques aimed at analyzing and interpreting biological data, particularly in the realm of genomics and proteomics.

These areas of study can involve using mathematical and computational methods to understand protein structure and function, simulate protein motion, predict interactions, and analyze whole genomes or transcriptomes, and more. Essentially, bioinformatics helps researchers gain insights into various biological systems through large-scale and quantitative analyses.

The field of genetics investigates the inheritance and variation of individuals based on DNA, while genomics examines the genome's structure and function. Bioinformatics and computational techniques are integral to both, using data gathered through DNA/RNA sequencing and other cutting-edge methods.

The rapid growth of data generated from these technologies presents challenges for informatics and computational methods to keep pace. High performance computing systems are essential for large-scale data analysis in the fields of genetics and genomics.

In recent times, there has been an incredible progress in DNA sequencing technology, enabling the rapid and cost-effective sequencing of DNA for understanding molecular processes. This breakthrough has presented opportunities for sequencing applications and has also created the challenge of handling and analyzing vast amounts of sequence data. Previously, DNA sequencing was limited to short fragments and preserving sequence accuracy for longer reads. But, the sequencing of large DNA fragments with improved accuracy is a significant milestone towards more comprehensive data analysis and increasing the power of sequencing beyond its current capability.

Sequencing a genome using a cluster takes about a week to finish and generates massive amounts of data per run. Storing and processing this data requires a large-scale computer system. To handle these data and ensure that results are not lost, a robust and fault-tolerant high-performance computing system is necessary. The need for high performance clusters or HPC in DNA sequencing is imperative to accommodate the large data storage and data analysis requirements.

4. Role of HPC in Assembly and annotation:

The advent of DNA sequencing technology has greatly expanded the scale and breadth of sequencing data analysis. These massive DNA sequence data are used in a variety of biological studies, including genome and transcriptome sequencing, comparative genomics and transcriptomics, and personalized medicine. However, the sheer size and complexity of these data sets pose significant challenges in their analysis.

In response, computational biology has turned to the advancements in high performance computing to handle the large amounts of data. Two key computations in computational biology, DNA fragment alignment and assembly, can be optimized through the use of high performance computing. There are both free and commercial bioinformatics software available for assembly and alignment, and many of them require high performance computing for effective data analysis.

The task of genome assembly in bioinformatics is among the most challenging and computationally demanding. Assembly of a genome sequence is a complex and computationally intensive task. There are two different categories in genome sequence assembly: De-novo and Mapping. De-novo assembly is performed without a reference, while mapping assembly is reference-based. The De-novo approach is more demanding in terms of processing time and memory, as every read must be compared with every other read. These two methods require ample storage and a considerable amount of computing power, which can only be achieved through the use of High Performance Computing (HPC).

Annotating biological information to DNA sequences is referred to as genome annotation. This involves identifying non-coding portions of the genome, recognizing elements such as gene prediction, and assigning relevant biological information to these elements. Due to the constant increase in genome sequences and the limitations of analysis software and computational capacity, many researchers are turning to High Performance Computing (HPC) for storage, analysis, and data sharing. As the number of sequenced genomes continues to grow, so does the need for computing power to identify genes, determine their functions, and analyze their relationships. This exponential increase in computational demand necessitates the use of supercomputing resources.

 

5. Role of HPC in Evolutionary Analysis

The evolutionary relationship between completed and ongoing genome sequences is a topic of interest. How can similarities among these sequences be used to understand their evolutionary history? One way to answer this question is by constructing a phylogenetic tree, which is a visual representation of the evolutionary relationships between species, based on the comparison of their sequences or gene data.

This type of analysis has numerous practical applications, such as in drug discovery by pharmaceutical companies and the development of improved strains of crops by government laboratories. Moreover, reconstructing large phylogenies can provide new insights into the process of evolution itself. To carry out phylogenetic analysis, several algorithms provided in various software, which demand high performance computing to manage the vast amount of sequencing data. Hence, HPC is crucial for successful downstream phylogenetic analysis.

Remote Accessibility of High-Performance Computing systems

The advancement in technology has led to a change in CPU design, with a shift from single-core processors to parallel processing techniques using multi-core processors, which enhance computational performance and energy efficiency. Remote access is a solution that enables individuals to access data, programs, and computer services from a location that is separate from where the computer is stored.

This technology is ideal for providing support to remote employees or clients, which saves travel costs and improves service quality. For instance, remote access allows for the sharing of shipping documents with multiple clients during a web conference, or enables access to high-volume biological data from anywhere without the need to carry a bulky computer or hard drive. The advantage of remote access to high-performance computing (HPC) allows users to run programs and process data from a remote location, making it possible to analyze biological data and store the results on their computer.

Conclusion

High-Performance Computing (HPC) plays a critical role in advancing agricultural and biotechnological research. It provides the necessary computational support for researchers to carry out high-quality investigations in the agriculture and biological research sector. The outcome of these research efforts results in the development of superior crops, commodities, and processes, which contribute to the enhancement of agricultural productivity on a sustainable basis. Ultimately, HPC helps to address the food security challenges facing the country by fostering innovation in the agriculture sector.

 


Authors:

Ratna Prabha1, DP Singh2,*

1Division of Agricultural Bioinformatics, ICAR - Indian Agricultural Statistics Research Institute, New Delhi- 110012 (India)

2ICAR-Indian Institute of Vegetable Research, Varanasi - 221305 (India)

Email: This email address is being protected from spambots. You need JavaScript enabled to view it.

Related articles