Genomic Medicine Special Lectures

General Remarks on Genomic Medicine

Advances in genome sequencing technology have made it possible to analyze individual genomes. Cancer is caused by abnormalities in certain genes, such as tumor suppressor genes, and especially in driver genes. Analyzing DNA extracted from tissue containing cancer therefore makes it possible to find the abnormal parts of these genes. It is also known that, for treatment to be effective, it should be selected according to the abnormal gene rather than the organ in which the cancer arose. This approach is called personalized cancer medicine with molecularly targeted drugs.

 

However, evidence has not yet been established for every genetic abnormality and drug selection. Treatment is therefore sometimes conducted as a clinical trial, after consultation with the relevant specialists and with the consent of the patient. In addition, artificial intelligence (AI) is expected to play an active role in genomic medicine research, facilitating automatic updating of literature databases, discovery of new driver genes and biomarkers, and prediction of gene expression.

Clinical Genetic Medicine in the Field of Neurology

Gene-related tests in healthcare comprise three types: tests that detect the genes (nucleic acids) of pathogens such as hepatitis viruses, HIV, and the tuberculosis bacterium; somatic cell genetic tests that detect acquired gene mutations arising after birth, as in leukemia; and germline genetic tests that search for mutations related to genetic disorders, which never change throughout life and are transmitted to offspring.

 

Germline mutations, the last category, are specific to the individual, lifelong, and shared with relatives; they carry the highest risk of discrimination and are often unexpected. They therefore demand strong privacy protection, careful examination and diagnosis, and psychological support for the person tested. On the other hand, curative treatments for hereditary diseases are being developed rapidly, which has increased the importance of early detection, prevention, and treatment of genetic diseases.

 

However, many hereditary diseases remain unidentified. Recognizing one requires demonstrating clinically specific signs, previously undescribed pathological findings, and the absence of known genetic mutations. In such cases, artificial intelligence (AI) is expected to play a major role in the extremely difficult task of identifying the causative genes.

Pathogenicity Assessment of Variant and Genetic Counseling

Genetic counseling is especially needed when testing for germline mutations is conducted. Genetic testing sometimes reveals unexpected information, particularly genetic mutations other than those the test was originally intended to diagnose. Disclosure of such incidental findings is recommended if treatments and preventive methods for them have been clinically established, the information is beneficial for the health management of the patient and the patient's relatives, and the mutations (variants) are causative with high certainty. The genes concerned are designated in the ACMG (American College of Medical Genetics and Genomics) recommendations. However, the patient's wishes regarding disclosure must be confirmed before testing.

 

The pathogenicity of variants that are not designated in this way is assessed with databases or the ACMG-AMP scoring criteria. Sharing information about variants with relatives is also considered step by step and on a continuing basis. Artificial intelligence (AI) is expected to contribute to data collection, risk assessment, and similar tasks for variants.

Oncology Genomic Medicine and AI

At present, the role of artificial intelligence (AI) in cancer genomic medicine is growing. In particular, AI can diagnose and classify cancers from gene mutations with considerably high accuracy, so that examination of genes can support early diagnosis of cancer. AI also makes it possible to estimate the proportion of tumor cells from pathological images, and improvements in AI technology have led to detection of tumor cells with accuracy equivalent to that of a pathologist.

 

To practice such genomic medicine, accumulation and classification of a huge amount of data are indispensable. Furthermore, among all the mutated genes in each patient, it is necessary to find the driver gene that causes the cancer or the variant that can serve as a drug target. This process requires not only big data and AI but also interpretation and judgment of the results by specialists.

 

The emergence of big data and AI has changed drug development methodologies. Precision medicine, in which treatment is provided according to the characteristics of individual patients, is becoming mainstream.

Oncology Genomic Medicine from the Standpoint of a Pathologist

Chromosomal abnormalities are classified into deletion, duplication, and inversion, which occur within the same chromosome, and insertion and translocation, which occur between different chromosomes. Gene-level abnormalities include missense mutations, in which a base substitution in a codon changes the encoded amino acid; nonsense mutations, which replace an amino acid codon with a termination codon; and frameshift mutations, in which the reading frame shifts because bases are inserted or deleted in numbers not divisible by three.
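The gene-level categories above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and DNA codons written with the bases T, C, A, and G; the function names classify_substitution and classify_indel are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(ref_codon, alt_codon):
    """Classify a base substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid, no protein change
    if ref_aa == "*":
        return "stop-loss"   # termination codon replaced by an amino acid
    if alt_aa == "*":
        return "nonsense"    # amino acid codon replaced by a termination codon
    return "missense"        # one amino acid replaced by another


def classify_indel(ref_length, alt_length):
    """Classify an insertion or deletion by its effect on the reading frame."""
    if (alt_length - ref_length) % 3 == 0:
        return "in-frame"
    return "frameshift"


print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("CAG", "TAG"))  # Gln -> stop
print(classify_indel(1, 2))                 # one inserted base
```

The sketch looks at a single codon in isolation; a real annotation tool would also need the transcript, strand, and position of the variant.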

 

Normally, cells have mechanisms to repair such DNA damage, including base excision repair, nucleotide excision repair, mismatch repair, and double-strand break repair. Failure of these repair mechanisms can allow mutations in driver genes to persist and lead to carcinogenesis. Therefore, at present, cancer treatment is selected for each driver gene rather than for each organ.

 

However, the selection of therapeutic agents is complicated because cancer cells can also acquire resistance to drugs. For this reason, a support vector machine, which is one type of artificial intelligence (AI), is sometimes employed to infer genetic findings from the morphological diagnosis when selecting treatment.

Genome and AI

Cancer-promoting gene mutations are classified into: 1, production of normal amounts of an overactive protein due to a mutation in the coding region; 2, overproduction of a normal protein by gene amplification; and 3, chromosomal rearrangements that cause either overproduction of a normal protein under the influence of nearby regulatory DNA or production of an overactive fusion protein by fusion with an actively transcribed gene.

 

On the other hand, a tumor suppressor gene generally gives rise to cancer only when both copies of the gene, one inherited from each parent, are mutated. A mutation in one copy alone can nevertheless cause cancer if the remaining normal copy is lost or inactivated, through complete loss of the normal chromosome, deletion of the region containing the normal gene, a loss-of-function mutation in the normal gene, or suppression of the normal gene's activity by an epigenetic change.

 

Not all gene mutations cause disease, and the pathological significance of a mutation (variant) is determined from existing databases and the published literature. Variants for which no such information exists are called variants of uncertain significance (VUS). Various artificial intelligence (AI) methods are used to discover the pathological significance of these variants.
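As a minimal sketch of this lookup, the Python fragment below labels each variant from a small knowledge base and falls back to VUS when no entry exists. The dictionary KNOWN_VARIANTS, the gene names GENE_A to GENE_C, and the variant strings are all placeholders invented for illustration; they are not real database records.

```python
# Hypothetical in-memory knowledge base; every entry is a placeholder.
KNOWN_VARIANTS = {
    ("GENE_A", "c.100A>T"): "pathogenic",
    ("GENE_B", "c.250G>C"): "benign",
}


def annotate(variants, knowledge_base=KNOWN_VARIANTS):
    """Attach the recorded significance to each variant, or 'VUS' if none exists."""
    return [
        (gene, change, knowledge_base.get((gene, change), "VUS"))
        for gene, change in variants
    ]


for row in annotate([("GENE_A", "c.100A>T"), ("GENE_C", "c.5del")]):
    print(row)
```

In practice the knowledge base would be a curated external database, and the variants left as VUS are the ones passed on for further assessment.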

Medical Big Data and AI

At present, the genes of patients, particularly cancer patients, are compared with normal genes to search for the gene mutations that cause cancer, especially in driver genes, and to discover drugs that target them. Although a treatment has not been established for every genetic abnormality, the driver gene mutation most relevant to each patient is identified so that the drug that appears to be the most effective based on past evidence can be administered. Under some conditions, coverage by medical insurance and financial support from companies may be available.

 

However, identification of the causative gene mutation is not simple, and the effectiveness of the therapeutic drug depends on the combination of various gene mutations, pathological findings, radiological findings, blood test findings, and so forth. Therefore, search for effective therapeutic agents requires construction and analysis of medical big data containing different kinds of information. This analysis also requires artificial intelligence (AI). At the moment, there are a lot of issues in the use of medical big data and AI, such as the utilization of electronic medical records and privacy measures.

Genome and AI 2

At present, artificial intelligence (AI) is used in gene analysis, and the key topics are dimension reduction, predictive models, machine learning, regularization, and multiomics. Principal component analysis (PCA) is the most common method of dimension reduction and involves axis rotation and matrix factorization. Among predictive models, supervised machine learning is the mainstay; it analyzes, for example, the relationship between gene expression levels and patient subtypes. Typical examples are decision trees, random forests, and logistic regression. Random forests additionally provide measures of variable importance.
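To make the idea of axis rotation concrete, the following Python sketch finds the first principal component of a small expression matrix by power iteration on the covariance matrix. This is only one of several ways to compute PCA and is used here because it needs nothing beyond the standard library. The function name first_principal_component and the toy values are assumptions for illustration, not data from the lecture.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of samples, each a list of gene expression values.
    Assumes the data have nonzero variance.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the genes (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered sample onto the new axis.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy data: five samples, two correlated genes (made-up values).
samples = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.9]]
direction, scores = first_principal_component(samples)
print(direction)
print(scores)
```

With real expression data, a linear algebra library would normally be used, and several components rather than one would be kept.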

 

In gene analysis, genes serve as the predictor variables. However, using all genes as predictors causes the model to overfit the training data, so the trade-off between bias and variance must be considered. Regularization methods can also prevent overfitting of the model.
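As a minimal sketch of regularization, the Python code below fits a linear model by gradient descent with an L2 (ridge) penalty, one common form of regularization; the lecture does not say which method it has in mind. The function name ridge_fit, the learning rate, the step count, and the toy data are assumptions for illustration.

```python
def ridge_fit(X, y, penalty, learning_rate=0.01, steps=5000):
    """Minimize mean squared error plus penalty * (sum of squared weights)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [
            sum(w[j] * X[i][j] for j in range(d)) - y[i] for i in range(n)
        ]
        for j in range(d):
            # Gradient of the data term plus gradient of the L2 penalty.
            grad = 2 * sum(residuals[i] * X[i][j] for i in range(n)) / n
            grad += 2 * penalty * w[j]
            w[j] -= learning_rate * grad
    return w


# Toy data (made up): the response equals 1 * gene1 + 2 * gene2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]

print(ridge_fit(X, y, penalty=0.0))  # unregularized fit
print(ridge_fit(X, y, penalty=1.0))  # weights shrunk toward zero
```

A larger penalty shrinks the weights further, which raises bias but lowers variance; in practice the penalty is usually chosen by cross-validation.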

 

Nowadays, it is also possible to obtain multidimensional omics datasets, such as RNA-seq, DNA-seq, and BS-seq, from the same sample in order to analyze mRNA transcription and gene expression, mutations, DNA methylation, and so forth.

Clinical Genetics / Genomic Medicine and Medical Ethics

Genes create a diversity of biological features. Mendel's laws of inheritance consist of the law of dominance, the law of segregation, and the law of independent assortment, and some genetic diseases follow these laws. Autosomal dominant disorders are generally mild, and their occurrence and presentation are affected by new mutations, reproductive fitness, penetrance, gonadal mosaicism, expressivity, and codominant inheritance. Autosomal recessive disorders are infrequent but often severe, and can sometimes be caused by compound heterozygosity or uniparental disomy. X-linked recessive disorders typically appear in men as patients and in women as carriers. X-linked dominant disorders occur more often in women and are more severe in men. Non-Mendelian genetic diseases include maternally inherited mitochondrial disorders, multifactorial genetic disorders, chromosomal disorders, and epigenetic disorders.
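The law of segregation can be worked through with a short Python sketch that enumerates a Punnett square for a single gene. The allele symbols "A" (normal) and "a" (disease-causing, recessive) and the function name cross are assumptions made for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Return offspring genotype probabilities for a single-gene cross.

    Each parent is a two-letter genotype such as "Aa"; each parent passes
    on one of its two alleles with equal probability.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(c, total) for genotype, c in counts.items()}


# Two carriers of an autosomal recessive disorder.
for genotype, probability in sorted(cross("Aa", "Aa").items()):
    print(genotype, probability)
```

For two carriers, this gives a one-in-four chance of an affected child (aa) and a one-in-two chance of a carrier (Aa). It does not model penetrance, new mutations, or X-linked inheritance.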

 

Genetic information is characterized by lifelong immutability, commonality among relatives, predictability of future onset, ambiguity of disease onset, and feasibility of testing. It is useful for health management, but at the same time raises issues of privacy protection and prevention of discrimination. The use of artificial intelligence (AI) enables screening of patients with genetic diseases as well as basic education on genes.

Genomic Medicine and Transition of Social Medical Information Infrastructure

Genomic medicine presupposes an environment for accumulating and using biological information. Such a resource is called a biobank; it holds samples, such as DNA, together with the digital data concerning those samples. It therefore requires freezers to hold the samples and storage to hold the digital data.

 

Furthermore, genomic medicine requires the following basic information infrastructure elements: 1, standardization for information sharing, exchange, and effective use; 2, information security for personal information protection; and 3, an information infrastructure for translational research.

 

Information standardization simplifies information sharing and exchange between organizations by unifying names, description methods, ordering, file formats, and so on, so that the information can be used as a unified database. Today, medical information is undergoing international standardization.

 

At present, intentional or inadvertent privacy infringements are increasing, for which an individual or a medical organization may be held criminally, civilly, and administratively responsible. This has led to recognition of the importance of information security measures.

Oncology Genomic Medicine x AI

In the field of oncology genomic medicine, artificial intelligence (AI) is particularly expected to contribute to genomic data analysis, natural language processing for knowledge databases, and annotation in report creation.

 

Genomic data analysis comprises analysis of the raw genome data obtained from a next-generation sequencer, mapping of variants to genes, and detection of variants that appear to be medically important based on past data. GPUs and FPGAs are used to accelerate this pipeline.
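The variant detection step can be sketched in Python as a naive pileup: bases from reads that are already aligned are counted at each reference position, and positions where a non-reference base is frequent enough are reported. This is a deliberately simplified illustration and not how production variant callers work; the function name call_variants, the thresholds min_depth and min_fraction, the 0-based coordinates, and the toy sequences are all assumptions.

```python
from collections import Counter, defaultdict


def call_variants(reference, aligned_reads, min_depth=3, min_fraction=0.3):
    """Report (position, ref_base, alt_base, alt_fraction) for candidate variants.

    aligned_reads: list of (start_position, sequence) pairs, assumed to be
    already mapped to the reference without gaps.
    """
    pileup = defaultdict(Counter)
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            pileup[start + offset][base] += 1

    variants = []
    for position in sorted(pileup):
        counts = pileup[position]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        ref_base = reference[position]
        for base, count in counts.items():
            if base != ref_base and count / depth >= min_fraction:
                variants.append((position, ref_base, base, count / depth))
    return variants


reference = "ACGTACGT"
reads = [(0, "ACGTAC"), (2, "GTTCGT"), (1, "CGTTCG"), (0, "ACGTTC")]
print(call_variants(reference, reads))
```

Real pipelines also handle base quality, insertions and deletions, and sequencing error models, which is part of why hardware acceleration matters.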

 

Next, for the knowledge database, AI integrates multiple databases from around the world into a system that can be searched collectively.

 

In report creation, AI supports the preparation of explanatory supplementary documents both for the report itself and for patients. By referring to the knowledge database, it can create decision-making aids that support the assessment of clinical significance and provide them to the expert panel as an annotation service. It will also be possible to create illustrated explanatory materials that are easy for patients and their relatives to understand.

Efforts for Medical AI and Points for Commercializing Medical AI

Artificial intelligence (AI) is generally researched and developed for purposes other than healthcare, but such AI can sometimes be applied effectively to healthcare as well. For example, technology for recognizing the faces of passersby has been applied to tumor diagnostic imaging. For AI, face detection and matching amount to the same kind of task as tumor detection and matching.

 

However, in healthcare it is important that the grounds for a judgment can be made clear. Where possible, simple decision trees and regression analysis therefore tend to be preferred to complex, black-box deep learning. Methods that combine decision trees with regression analysis, so that different regression equations are used for different cases, are also being studied.

 

To commercialize an AI medical device, it is necessary to carry out clinical trials on safety and efficacy in accordance with the regulations set for each risk class, and then to submit documentation that includes the data, AI algorithms, and hyperparameters that were used to develop the healthcare AI product.

From the Point of Informatics

Machine learning is often used in the study of genetic disorders. In supervised learning, decision trees and random forests are particularly convenient: they provide feature importance and high predictive accuracy, and in many cases the data can be analyzed with default settings. In unsupervised learning, principal component analysis is typical and is useful for visualizing multidimensional data. The R language is particularly suitable for machine learning and is characterized by simple programming and data visualization. However, Python may be more popular for complex deep learning.
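To show where a tree's feature importance comes from, the Python sketch below fits a decision stump, which is a decision tree with a single split, and reports how much the chosen split reduces Gini impurity. It is not a random forest, which would average many deeper trees. The function names gini and best_stump, the binary labels, and the toy data are assumptions for illustration.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_stump(X, y):
    """Return (feature_index, threshold, impurity_decrease) of the best split."""
    n = len(y)
    parent = gini(y)
    best = (None, None, 0.0)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between adjacent observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y[i] for i in range(n) if X[i][j] <= threshold]
            right = [y[i] for i in range(n) if X[i][j] > threshold]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted
            if gain > best[2]:
                best = (j, threshold, gain)
    return best


# Toy data (made up): two gene expression levels per patient, label 1 = disease.
X = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
y = [0, 0, 1, 1]
print(best_stump(X, y))
```

The impurity decrease of the chosen feature is the quantity that tree-based methods accumulate across splits to rank features by importance.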

 

Prediction of disease from protein structure data involves first modeling the mutant structure and then calculating the relationship between the mutant and disease from the structural changes between the wild type and the mutant. Decision trees, random forests, and similar methods are then applied to the data obtained. With existing libraries, this can be done in just a few lines of R. Moreover, deep learning has recently come to be used to predict protein structure, and easy-to-use software has been developed for it.