Hi! My name is Razvan Ioan Panea and I am a Lead Genomic Data Engineer at Regeneron Genetics Center. I currently lead the engineering efforts in the Genome Informatics group to ensure an at-scale, at-speed genomics production environment.
Razvan Panea
PhD in Computational Biology and Bioinformatics • August 2014 - May 2020
Thesis: A Cloud-Based Infrastructure for Cancer Genomics
B.S. in Applied and Computational Mathematics • August 2011 - May 2014
Thesis: Metabolic network modeling of Theobroma Cacao
Lead Genomic Data Engineer • August 2020 - Present
Optimized and innovated genomics workflows that process and analyze over 500,000 samples per year. Led the software engineering efforts by applying software development best practices, specifically version control using Git, software development life cycle tracking using Jira, and continuous integration, delivery and testing through Amazon Web Services CodePipeline. Worked with bioinformaticians and data scientists to develop tools on data platforms such as DNAnexus to facilitate at-scale, at-speed genomics pipelines for QC, variant calling and multi-omics analysis. Worked closely with other data engineers to integrate production with distributed compute environments. Interacted with software engineers and external technology partners to ensure 24/7 production uptime.
Research Software Engineer • September 2015 - May 2020
Led a group of 5 people to develop CloudConductor, a modular, scalable, elastic, parallelizable and extensible framework for generating different analysis pipelines. Researched cloud-based services for optimizing and improving the performance of analysis pipelines. Integrated over 50 bioinformatics tools into the newly developed framework. Integrated and tested the framework on Google Cloud Platform and worked toward making the framework platform-agnostic. Extended CloudConductor into a complete infrastructure, using Kubernetes for automatic deployment and Django for the user interface.
Research Assistant • January 2018 - November 2019
Sequenced the whole genome and transcriptome of 101 Burkitt Lymphoma cases of different subtypes. Identified genetic alterations in the sequencing data using the newly developed cloud-based framework. Identified mutations in driver genes, such as MYC, ID3 and DDX3X, the presence of the Epstein-Barr virus (EBV) in the samples, and translocations characteristic of Burkitt Lymphoma. Identified associations between lymphoma subtypes and EBV status. The work was published in the journal Blood.
Software Developer • June 2013 - August 2014
Interned and continued working at the Max Planck Institute for Marine Microbiology to implement features for the SILVA project. Researched SWIG and implemented SWIG interfaces in the SILVA project to bind and call methods implemented in Python on C++ objects. Benchmarked, compared and deployed a new, more efficient phylogenetic tree generator. Implemented additional features to ensure software stability.
Research Assistant • March 2013 - June 2014
Developed software in Java to aggregate and analyze the expression levels of all neighboring genes at different time points in the E. coli life cycle. Identified a correlation between the differential expression of two neighboring genes and their transcriptional orientation relative to each other.
Breakout session • July 2018
Title: Speeding up Research in Genomics
Poster presentation • October 2017
Title: A Cloud-Based Framework for Cancer Genomics
Poster presentation • June 2016
Title: Defining the Microbiome of Lymphomas
Poster presentation • October 2015, October 2016
Title: Defining the Microbiome of Lymphomas
Programming: Python, C, R, Java, C++, SQL
Computing Platforms: Google Cloud, SLURM, Kubernetes, AWS
Libraries/Packages: Apache Libcloud, Plotly, Pandas, SQLAlchemy
Operating Systems: Linux, Windows, OS X, Chrome OS
Software: PyCharm, Jupyter Notebook, RStudio, Adobe Illustrator, Microsoft Suite