Modern DNA and RNA sequencing technologies allow fast, parallel reading of multiple DNA fragments. While sequencing the first genome took 32 years, today's Next Generation Sequencing technologies can sequence 40 genomes in about 2 days, producing 4 TB of text data (a file of about 100 GB per genome). This capability poses a challenge to computing infrastructures, which must ingest this volume of data and process it through efficient genomics pipelines, exploiting heterogeneous resources such as CPUs, GPUs, HPC clusters and storage systems offering different Quality of Service (QoS) levels, to perform the analysis with the optimal cost-performance balance.
This is the case for the Computational Genomic platform under development within the collaboration between INFN and IRCCS AOU Sant’Orsola (the main research hospital in Bologna, Italy). The platform has been deployed on EPIC (Enhanced PrIvacy and Compliance) Cloud, the high-security partition of INFN Cloud, certified to ISO/IEC 27001, 27017 and 27018. There, some sample genomic pipelines have been implemented and the security measures needed to guarantee GDPR compliance have been adopted.