Ishani Mondal
Abstract
Off-the-shelf biomedical embeddings obtained from recently released pre-trained language models (such as BERT and XLNet) have demonstrated state-of-the-art accuracy on various natural language understanding (NLU) tasks in the biomedical domain. Relation Classification (RC) is one of the most critical of these tasks. In this paper, we explore how to incorporate domain knowledge about biomedical entities (such as drugs, diseases, and genes), obtained from Knowledge Graph (KG) embeddings, for predicting Drug-Drug Interactions from a textual corpus. We propose a new method, BERTKG-DDI, which combines drug embeddings obtained from their interactions with other biomedical entities with a domain-specific BioBERT embedding-based RC architecture. Experiments conducted on the DDIExtraction 2013 corpus indicate that this strategy improves over other baseline architectures by 4.1% macro F1-score.
Introduction
When multiple drugs are administered concurrently to a patient, the combination may help cure an ailment or may lead to serious side effects. These types of interactions are known as Drug-Drug Interactions (DDIs). Predicting DDIs is a difficult task, as it requires understanding the underlying mechanism of action of the interacting drugs. Researchers have recently made numerous efforts towards the automatic extraction of DDIs from textual corpora (Sahu and Anand 2018), (Liu et al. 2016), (Sun et al. 2019), (Li and Ji 2019), (Mondal 2020) and towards predicting unknown DDIs from KGs (Purkayastha et al. 2019). Automatic extraction of DDIs from text helps to maintain large-scale databases and thereby assists medical experts in their diagnosis.
In parallel with the progress of DDI extraction from text, researchers have recently proposed various strategies for augmenting models with the chemical structure information of drugs and with textual descriptions of drugs (Zhu et al. 2020) to improve DDI prediction from corpora and Knowledge Graphs. Earlier work framed DDI prediction from text as a relation classification problem (Sahu and Anand 2018), (Liu et al. 2016), (Sun et al. 2019), (Li and Ji 2019) using CNN- or RNN-based neural networks.
Motivated by the recent success of pre-trained language models (Devlin et al. 2019), (Yang et al. 2019) in many NLP classification tasks, we formulate DDI classification as a relation classification task that leverages both entity and contextual information. We propose a model that leverages both domain-specific contextual embeddings (BioBERT) (Lee et al. 2019) of the target entities (drugs) and external information about them. In recent years, representation learning has played a pivotal role in solving various machine learning tasks.
In this work, we explore the direction of augmenting a text-based model with graph embeddings to predict the relation between two drugs mentioned in a textual corpus. We use an in-house Knowledge Graph (Bio-KG), built by curating the interactions among drugs, diseases and genes from multiple ontologies. In order to capture the complex underlying mechanism of interactions among the biomedical entities, we train translation-based and semantics-preserving heterogeneous graph embeddings on Bio-KG and jointly use the resulting entity representations to train the relation classification model. Experiments conducted on the DDIExtraction 2013 corpus (Herrero-Zazo et al. 2013) reveal that this method outperforms the existing baseline models and is in line with the new research direction of fusing various sources of information for DDI prediction. In a nutshell, the major contributions of this work are as follows:
1. We propose a novel method that jointly leverages textual and external knowledge information to classify the relation type between the drug pairs mentioned in the text, showing the efficacy of external entity-specific information.
2. Our method achieves new state-of-the-art performance on the DDIExtraction 2013 corpus.
Problem Statement
Given an input instance or sentence $s$ with two target drug entities $d_1$ and $d_2$, the task is to classify the type of relation ($y$) the drugs hold between them, $y \in \{y_1, \ldots, y_N\}$. Here $N$ denotes the number of relation types.
Methodology
Text-based Relation Classification
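Our model for extracting DDIs from text is based on the pre-trained BERT-based relation classification model of (Wu and He 2019). Given a sentence $s$ with drugs $d_1$ and $d_2$, let $H$ be the final hidden state output of the BERT module. Let the vectors $H_i$ to $H_j$ be the final hidden state vectors from BERT for entity $d_1$, and $H_k$ to $H_m$ the final hidden state vectors from BERT for entity $d_2$. An average operation is applied to obtain the vector representation of each drug entity. A tanh activation followed by a fully connected layer is then applied to each of the two vectors, and the outputs for $d_1$ and $d_2$ are $H'_1$ and $H'_2$ respectively.
$$H'_1 = W_1\left[\tanh\left(\frac{1}{j-i+1}\sum_{t=i}^{j} H_t\right)\right] + b_1 \tag{1}$$
$$H'_2 = W_2\left[\tanh\left(\frac{1}{m-k+1}\sum_{t=k}^{m} H_t\right)\right] + b_2 \tag{2}$$
The weight ($W_1 = W_2$) and bias ($b_1 = b_2$) parameters are shared. For the final hidden state vector $H_0$ of the first token ("[CLS]"), we also add an activation operation and a fully connected layer, which is formally expressed as:
$$H'_0 = W_0\left(\tanh(H_0)\right) + b_0 \tag{3}$$
Matrices $W_0$, $W_1$, $W_2$ have the same dimensions, i.e. $W_0 \in \mathbb{R}^{d \times d}$, $W_1 \in \mathbb{R}^{d \times d}$, $W_2 \in \mathbb{R}^{d \times d}$, where $d$ is the hidden state size from BERT. We concatenate $H'_0$, $H'_1$ and $H'_2$ and then add a fully connected layer and a softmax layer, which is expressed as:
$$h'' = W_3\left[\mathrm{concat}(H'_0, H'_1, H'_2)\right] + b_3 \tag{4}$$
$$p = \mathrm{softmax}(h'') \tag{5}$$
$W_3 \in \mathbb{R}^{N \times 3d}$, and $p$ is the softmax probability output over the $N$ relation types. In Equations (1), (2), (3), (4) the bias vectors are $b_1$, $b_2$, $b_0$, $b_3$. We use cross entropy as the loss function. We denote this text-based architecture as BERT-Text-DDI.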
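The text-based head described above (averaging the hidden states of each entity span, tanh, a shared fully connected layer, concatenation with the [CLS] output, and a softmax) can be sketched in NumPy as follows. This is only an illustrative sketch: random arrays stand in for the BERT hidden states, and the function name `text_head`, the parameter names, the span indices and the toy sizes are all assumptions made for the example rather than details from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def text_head(H, span1, span2, params):
    """H: (seq_len, d) final hidden states; spans are inclusive (start, end) token indices."""
    i, j = span1
    k, m = span2
    e1 = H[i:j + 1].mean(axis=0)          # average over the tokens of drug 1
    e2 = H[k:m + 1].mean(axis=0)          # average over the tokens of drug 2
    h0 = params["W0"] @ np.tanh(H[0]) + params["b0"]        # [CLS] branch
    h1 = params["W_ent"] @ np.tanh(e1) + params["b_ent"]    # entity branch
    h2 = params["W_ent"] @ np.tanh(e2) + params["b_ent"]    # same (shared) weights
    logits = params["W_out"] @ np.concatenate([h0, h1, h2]) + params["b_out"]
    return softmax(logits)

# Toy sizes (assumed): hidden size 8, 5 relation types, 12 tokens.
rng = np.random.default_rng(0)
d, n_rel, seq_len = 8, 5, 12
params = {
    "W0": 0.1 * rng.normal(size=(d, d)), "b0": np.zeros(d),
    "W_ent": 0.1 * rng.normal(size=(d, d)), "b_ent": np.zeros(d),
    "W_out": 0.1 * rng.normal(size=(n_rel, 3 * d)), "b_out": np.zeros(n_rel),
}
H = rng.normal(size=(seq_len, d))         # stand-in for BERT output
probs = text_head(H, span1=(2, 3), span2=(7, 9), params=params)
gold = 1                                  # hypothetical gold relation index
loss = -np.log(probs[gold])               # cross-entropy for one instance
```

In a real system the hidden states would come from the fine-tuned encoder and all parameters would be learned jointly; the sketch only shows the shape of the computation.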
Entity Representation from KG
To infuse external information about the entities into the relation classification task, we obtain a representation of the two drug entities mentioned in each input instance. We use an in-house heterogeneous biomedical Knowledge Graph (Bio-KG) consisting of target-target, drug-drug, drug-disease, drug-target, disease-disease and disease-target interactions drawn from a large number of ontologies such as DrugBank (https://go.drugbank.com/), BioSNAP (http://snap.stanford.edu/biodata/) and UniProt (https://www.uniprot.org/) (The UniProt Consortium 2016). The overall statistics of Bio-KG are given in Table 1. The real-world facts observed in Bio-KG are stored as a collection of triples of the form $(h, r, t)$. Each triple is composed of a head entity $h \in E$, a tail entity $t \in E$, and a relation $r \in R$ between them, e.g., (paracetamol, treats, fever), which stores the fact that paracetamol is effective in curing fever. Here $E$ denotes the set of entities and $R$ denotes the set of relations. There are three different types of entities in $E$ (drugs, diseases, targets) and five different types of relations in $R$ (target-target, drug-disease, drug-target, disease-disease and disease-target interactions).
Table 1: Overall statistics of Bio-KG.

Node Types | Count | Edge Types | Count |
---|---|---|---|
Drug | 6512 | Drug-Target | 15245 |
Target | 30098 | Target-Target | 77108 |
Disease | 23458 | Drug-Disease | 84745 |
| | | Disease-Disease | 35382 |
| | | Disease-Target | 31161 |
Total Nodes | 60068 | Total Edges | 243641 |

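As a small illustration of how such typed triples can be stored and summarised, the sketch below keeps facts as (head, relation, tail) tuples and derives an edge type from the node types of the two endpoints. Apart from the (paracetamol, treats, fever) example, every entity name and relation label here is a hypothetical placeholder; the node and edge counts are the ones reported in the statistics table, and the code only checks that they add up to the stated totals.

```python
from collections import Counter

# Toy node typing; "target_A" and "target_B" are hypothetical placeholders.
NODE_TYPE = {
    "paracetamol": "drug",
    "fever": "disease",
    "target_A": "target",
    "target_B": "target",
}

# Facts stored as (head, relation, tail) triples.
triples = [
    ("paracetamol", "treats", "fever"),
    ("paracetamol", "interacts_with", "target_A"),
    ("target_A", "interacts_with", "target_B"),
]

# Fixed ordering so that an edge type is named the same way in both directions.
TYPE_ORDER = {"drug": 0, "disease": 1, "target": 2}

def edge_type(head, tail):
    a, b = sorted((NODE_TYPE[head], NODE_TYPE[tail]), key=TYPE_ORDER.get)
    return f"{a}-{b}"

edge_counts = Counter(edge_type(h, t) for h, _, t in triples)

# Counts as reported for Bio-KG.
reported_nodes = {"drug": 6512, "target": 30098, "disease": 23458}
reported_edges = {
    "drug-target": 15245,
    "target-target": 77108,
    "drug-disease": 84745,
    "disease-disease": 35382,
    "disease-target": 31161,
}
total_nodes = sum(reported_nodes.values())
total_edges = sum(reported_edges.values())
```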
The aim of a Knowledge Graph embedding (KGE) is to embed the entities and relations into a low-dimensional continuous vector space, so as to simplify computations on the KG. KGE methods mostly use the facts in the KG to perform the embedding task, enforcing the embeddings to be compatible with those facts. They provide a generalizable context about the overall KG that can be used to infer relations. In this work, we employ some off-the-shelf KG embeddings to encode the representation of each drug in terms of its relationships with other entities. The embeddings are computed so that they satisfy certain properties, i.e., they follow a given KGE model. Each KGE model defines a score function that measures the distance between two entities relative to their relation type in the low-dimensional embedding space. These score functions are used to train the KGE models so that entities connected by a relation are close to each other while entities that are not connected are far apart. The KGEs used in our experiments are explained below:
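- TransE (Bordes et al. 2013): Given a fact $(h, r, t)$, the relation in TransE is interpreted as a translation vector $\mathbf{r}$ so that the embedded entities $\mathbf{h}$ and $\mathbf{t}$ can be connected by $\mathbf{r}$, i.e., $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ when $(h, r, t)$ holds. The scoring function is defined as the (negative) distance between $\mathbf{h} + \mathbf{r}$ and $\mathbf{t}$, measured with the L1 or L2 norm, i.e.,
$$f_r(h, t) = -\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{1/2} \tag{6}$$
- TransR (Lin et al. 2015): Given a fact $(h, r, t)$, TransR first projects the entity representations $\mathbf{h}$ and $\mathbf{t}$ into the space specific to relation $r$, i.e., $\mathbf{h}_{\perp} = \mathbf{M}_r \mathbf{h}$ and $\mathbf{t}_{\perp} = \mathbf{M}_r \mathbf{t}$. Here $\mathbf{M}_r$ is a projection matrix from the entity space to the relation space of $r$. The scoring function is:
$$f_r(h, t) = -\lVert \mathbf{h}_{\perp} + \mathbf{r} - \mathbf{t}_{\perp} \rVert_2^2 \tag{7}$$
- RESCAL (Nickel, Tresp, and Kriegel 2011): Each relation in RESCAL is represented as a matrix which models pairwise interactions between latent factors. The score of a fact $(h, r, t)$ is defined by a bilinear function, where $\mathbf{h}$, $\mathbf{t}$ are vector representations of the entities and $\mathbf{M}_r$ is a matrix associated with the relation. This score captures pairwise interactions between all components of $\mathbf{h}$ and $\mathbf{t}$:
$$f_r(h, t) = \mathbf{h}^{\top} \mathbf{M}_r \mathbf{t} = \sum_{i=0}^{d-1} \sum_{j=0}^{d-1} [\mathbf{M}_r]_{ij} \cdot [\mathbf{h}]_i \cdot [\mathbf{t}]_j \tag{8}$$
- DistMult (Yang et al. 2015): DistMult simplifies RESCAL by restricting $\mathbf{M}_r$ to diagonal matrices. For each relation $r$, it introduces a vector embedding $\mathbf{r}$ and requires $\mathbf{M}_r = \mathrm{diag}(\mathbf{r})$. The scoring function is defined as:
$$f_r(h, t) = \mathbf{h}^{\top} \mathrm{diag}(\mathbf{r})\, \mathbf{t} = \sum_{i=0}^{d-1} [\mathbf{r}]_i \cdot [\mathbf{h}]_i \cdot [\mathbf{t}]_i \tag{9}$$
This score captures pairwise interactions between only the components of $\mathbf{h}$ and $\mathbf{t}$ along the same dimension, and reduces the number of parameters to $O(d)$ per relation.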
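The four scoring functions listed above can be written compactly in NumPy. The sketch below follows their standard definitions; the vector sizes, the random inputs and the function names are illustrative assumptions, and no training loop is shown.

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    # Negative L1 (norm=1) or L2 (norm=2) distance between h + r and t.
    return -np.linalg.norm(h + r - t, ord=norm)

def transr_score(h, r, t, M_r):
    # Project both entities into the relation space, then negative squared L2 distance.
    return -np.sum((M_r @ h + r - M_r @ t) ** 2)

def rescal_score(h, M_r, t):
    # Bilinear form h^T M_r t with a full relation matrix.
    return h @ M_r @ t

def distmult_score(h, r, t):
    # RESCAL restricted to a diagonal relation matrix diag(r).
    return np.sum(h * r * t)

# Toy setup (assumed sizes): entity dimension 4, relation-space dimension 3.
rng = np.random.default_rng(0)
d, k = 4, 3
h, t = rng.normal(size=d), rng.normal(size=d)
r_d = rng.normal(size=d)            # relation vector in the entity space
r_k = rng.normal(size=k)            # relation vector in the TransR relation space
M_proj = rng.normal(size=(k, d))    # TransR projection matrix
M_rel = rng.normal(size=(d, d))     # RESCAL relation matrix

scores = {
    "TransE": transe_score(h, r_d, t),
    "TransR": transr_score(h, r_k, t, M_proj),
    "RESCAL": rescal_score(h, M_rel, t),
    "DistMult": distmult_score(h, r_d, t),
}
```

Two sanity checks follow directly from the definitions: DistMult gives the same score as RESCAL when the relation matrix is `np.diag(r_d)`, and the TransE score is zero when the tail equals `h + r_d` exactly.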
From Bio-KG, we train these KG embeddings and obtain the representations of all the nodes. In our case, we are only interested in the representations of the drug nodes. We denote the KG representation of drug $d_i$ as $kg_i$.
BERTKG-DDI
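From the input instance $s$ with two tagged target drug entities $d_1$ and $d_2$, we obtain the KG embedding representations of the two drugs, denoted $kg_1$ and $kg_2$ respectively, using Bio-KG. We concatenate these two embeddings and pass them through a fully connected layer as represented below:
$$H_{kg} = W_{kg}\left[\mathrm{concat}(kg_1, kg_2)\right] + b_{kg} \tag{10}$$
$W_{kg}$ and $b_{kg}$ are the parameters of the fully connected layer applied to the KG representations of $d_1$ and $d_2$. The final layer of the BERTKG-DDI model concatenates all the previous text-based outputs (the [CLS] output $H'_0$ and the two entity outputs $H'_1$ and $H'_2$ of BERT-Text-DDI) with the drug representation from the KG, as expressed below:
$$o = W_4\left[\mathrm{concat}(H'_0, H'_1, H'_2, H_{kg})\right] + b_4 \tag{11}$$
$$p' = \mathrm{softmax}(o) \tag{12}$$
Here $W_4$ and $b_4$ are the parameters of the final fully connected layer and $p'$ is the probability output over the relation types.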
Finally the training optimization is achieved using the cross-entropy loss.
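The fusion step described in this section (concatenate the two drug KG embeddings, pass them through a fully connected layer, concatenate the result with the text-based outputs, then apply a final fully connected layer and softmax) can be sketched as below. It is a minimal sketch under assumptions: the output size of the KG layer, the toy dimensions, the parameter names and the random inputs are all chosen for illustration and are not specified in the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def bertkg_head(h0, h1, h2, kg1, kg2, params):
    """Fuse text-based outputs (h0, h1, h2) with the KG embeddings of the two drugs."""
    h_kg = params["W_kg"] @ np.concatenate([kg1, kg2]) + params["b_kg"]
    fused = np.concatenate([h0, h1, h2, h_kg])
    return softmax(params["W_out"] @ fused + params["b_out"])

def cross_entropy(probs, gold):
    # Negative log-likelihood of the gold relation for one instance.
    return -np.log(probs[gold])

# Toy sizes (assumed): text dim 8, KG embedding dim 6, KG layer output 4, 5 relations.
rng = np.random.default_rng(0)
d, kg_dim, kg_out, n_rel = 8, 6, 4, 5
params = {
    "W_kg": 0.1 * rng.normal(size=(kg_out, 2 * kg_dim)), "b_kg": np.zeros(kg_out),
    "W_out": 0.1 * rng.normal(size=(n_rel, 3 * d + kg_out)), "b_out": np.zeros(n_rel),
}
h0, h1, h2 = (rng.normal(size=d) for _ in range(3))       # stand-ins for text outputs
kg1, kg2 = rng.normal(size=kg_dim), rng.normal(size=kg_dim)  # stand-ins for KG vectors

probs = bertkg_head(h0, h1, h2, kg1, kg2, params)
loss = cross_entropy(probs, gold=2)   # hypothetical gold relation index
```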
Experimental Setup
Dataset and Pre-processing
We follow the task setting of Task 9.2 in the DDIExtraction 2013 shared task (Herrero-Zazo et al. 2013) for evaluation. The corpus consists of MEDLINE documents annotated with drug mentions and five types of interactions: Mechanism, Effect, Advice, Interaction and Other. The task is a multi-class classification problem in which each drug pair in a sentence is classified into one of these types. We evaluate using three standard metrics: Precision (P), Recall (R) and F1-score (F1).
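For reference, macro-averaged precision, recall and F1 can be computed from gold and predicted labels as in the sketch below. The toy label sequences are made up, and which classes enter the macro average (here the four types other than Other) is an assumption of this example, not something stated in the task description above.

```python
def macro_prf(gold, pred, labels):
    """Return macro-averaged (precision, recall, F1) over the given labels."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Hypothetical gold and predicted labels for six drug pairs.
gold = ["Effect", "Advice", "Other", "Mechanism", "Effect", "Other"]
pred = ["Effect", "Other", "Other", "Mechanism", "Advice", "Other"]
labels = ["Advice", "Effect", "Mechanism", "Interaction"]  # assumed averaging set

macro_p, macro_r, macro_f1 = macro_prf(gold, pred, labels)
```

Classes with no gold or predicted instances contribute zero here; other conventions (for example skipping such classes) would give different averages.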
During pre-processing, we obtain the drug mentions in the corpus and map them to unique DrugBank (https://go.drugbank.com/) identifiers. Converting drug mentions into their respective DrugBank IDs is an entity linking step (Mondal et al. 2019), (Leaman, Dogan, and Lu 2013). This mention normalization is performed based on the longest overlap between the drug mentions and the drug names in DrugBank, and the resulting identifiers are used to map the drugs to the different knowledge sources used to construct Bio-KG.
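One possible reading of longest-overlap normalization is sketched below: each mention is compared against every name in a lexicon, and the identifier whose name shares the longest common substring with the mention is chosen. This interpretation, the minimum-overlap threshold `min_overlap`, and the tiny lexicon with placeholder identifiers are all assumptions made for illustration; the actual matching procedure is not specified in more detail in the text.

```python
from difflib import SequenceMatcher

def overlap(a, b):
    # Length of the longest common contiguous substring of a and b.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size

def normalize(mention, lexicon, min_overlap=4):
    """Return the identifier whose name overlaps the mention most, or None."""
    mention = mention.lower().strip()
    best_id, best_size = None, 0
    for drug_id, names in lexicon.items():
        for name in names:
            size = overlap(mention, name.lower())
            if size > best_size:
                best_id, best_size = drug_id, size
    # Mentions with too little overlap are left non-normalized.
    return best_id if best_size >= min_overlap else None

# Hypothetical lexicon; the identifiers are placeholders, not real DrugBank IDs.
lexicon = {
    "ID_PARACETAMOL": ["paracetamol", "acetaminophen"],
    "ID_ASPIRIN": ["aspirin", "acetylsalicylic acid"],
}

linked = normalize("Aspirin tablets", lexicon)
unlinked = normalize("xyz", lexicon)
```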
Training Details
For our experiments, we initialize the model with various pre-trained contextual embeddings. Specifically, we use bert-base-cased (https://huggingface.co/bert-base-cased), scibert-scivocab-uncased (Beltagy, Lo, and Cohan 2019) (https://github.com/allenai/scibert), and the domain-specific biobert v1.0 pubmed pmc and biobert v1.1 pubmed (https://github.com/dmis-lab/biobert) as the initialization of the transformer encoder in BERTKG-DDI. We keep the maximum sequence length at 300 for all the embedding ablations and train for 5 epochs. For the KG embeddings, we use an embedding dimension of 200. Stochastic Gradient Descent (SGD) is used for optimization with an initial learning rate of 0.0001, and the KG embedding model is trained for 300 epochs. After training the embeddings, we obtain the final representation of each drug. For the drugs mentioned in the input instance, we make use of the obtained embeddings as shown in Equation 11. We initialize the non-normalized drugs using pre-trained word2vec vectors (of dimension 200, the same as the KG embeddings) trained on PubMed (http://evexdb.org/pmresources/ngrams/PubMed/).
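The lookup logic implied by the last two sentences (use the KG embedding when the drug was normalized, otherwise fall back to a word2vec vector of the same dimension) might look like the sketch below. The dictionaries, the placeholder identifier and the zero-vector default for mentions found in neither resource are assumptions of this example.

```python
import numpy as np

DIM = 200  # shared dimension of the KG and word2vec vectors

def drug_representation(mention, mention_to_id, kg_vectors, w2v_vectors):
    """Return the KG vector for a normalized drug, else a word2vec fallback."""
    key = mention.lower()
    drug_id = mention_to_id.get(key)
    if drug_id is not None and drug_id in kg_vectors:
        return kg_vectors[drug_id]
    if key in w2v_vectors:
        return w2v_vectors[key]
    # Assumption: mentions unknown to both resources get a zero vector.
    return np.zeros(DIM)

# Toy resources with random vectors; "ID_ASPIRIN" is a placeholder identifier.
rng = np.random.default_rng(0)
mention_to_id = {"aspirin": "ID_ASPIRIN"}
kg_vectors = {"ID_ASPIRIN": rng.normal(size=DIM)}
w2v_vectors = {"newdrug": rng.normal(size=DIM)}

v_kg = drug_representation("Aspirin", mention_to_id, kg_vectors, w2v_vectors)
v_w2v = drug_representation("newdrug", mention_to_id, kg_vectors, w2v_vectors)
v_unk = drug_representation("unknown", mention_to_id, kg_vectors, w2v_vectors)
```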
Table 2: Ablation of contextual embeddings on BERT-Text-DDI.

Embeddings on BERT-Text-DDI | Test set Macro F1 |
---|---|
bert-base-cased | 0.806 |
scibert-scivocab-uncased | 0.812 |
biobert v1.0 pubmed pmc | 0.818 |
biobert v1.1 pubmed | 0.822 |

Table 3: Ablation of KG embeddings on BERTKG-DDI.

KG Embeddings on BERTKG-DDI | Test set Macro F1 |
---|---|
BERTKG-DDI w/ TransE | 0.826 |
BERTKG-DDI w/ TransR | 0.829 |
BERTKG-DDI w/ RESCAL | 0.834 |
BERTKG-DDI w/ DistMult | 0.840 |

Table 4: Effect of adding KG information to the text-based model.

Models | Contextual Embeddings | Macro F1 |
---|---|---|
BERT-Text-DDI | biobert v1.0 pubmed pmc | 0.818 |
BERTKG-DDI | biobert v1.0 pubmed pmc | 0.831 |
BERT-Text-DDI | biobert v1.1 pubmed | 0.822 |
BERTKG-DDI | biobert v1.1 pubmed | 0.840 |

Table 5: Comparison with existing baselines on the DDIExtraction 2013 corpus (F1 scores).

Methods | Advice F1 | Effect F1 | Mechanism F1 | Interaction F1 | Total F1 |
---|---|---|---|---|---|
(Zhang et al. 2017) | 0.80 | 0.71 | 0.74 | 0.54 | 0.72 |
(Vivian et al. 2017) | 0.85 | 0.76 | 0.77 | 0.57 | 0.77 |
(Asada, Miwa, and Sasaki 2018) | 0.81 | 0.71 | 0.73 | 0.45 | 0.72 |
(Sun et al. 2019) | 0.80 | 0.73 | 0.78 | 0.58 | 0.75 |
(Zhu et al. 2020) | 0.86 | 0.80 | 0.84 | 0.56 | 0.80 |
Our method (BERTKG-DDI) | 0.88 | 0.81 | 0.87 | 0.59 | 0.84 |

Results and Discussion
In this section, we provide a detailed analysis of the various results and findings that we have observed during experiments. We show empirical results based on BERTKG-DDI for both text and KG information.
Ablation of Embeddings on BERT-Text-DDI: During ablation analysis, we observe that the incorporation of domain-specific information in biobert v1.1 pubmed boosts predictive performance in terms of macro F1-score (across all relation types) by 2.3% compared to bert-base-cased. Moreover, the scibert-scivocab-uncased embeddings, owing to the scientific details obtained during fine-tuning, achieve a reasonable boost in performance. BERT-Text-DDI based on biobert v1.1 pubmed is the best-performing text-based relation classification model. The results are enumerated in Table 2.
Ablation analysis of KG Embeddings on BERTKG-DDI: In Table 3, we compare the different KG embeddings for drugs obtained from Bio-KG when they are used to augment the BERT-Text-DDI model. Semantic-matching models such as RESCAL and DistMult measure the plausibility of facts by matching the latent semantics of relations and entities in their vector space. In our experiments, they outperform the translation-based KGEs, TransE and TransR, by about 1% macro F1-score on average.
Advantage of KG information on BERTKG-DDI: We also measure how much performance is gained by augmenting the text-based model with KG embeddings. From the macro F1-scores over all relation types in Table 4, we observe that the best-performing BERT-Text-DDI model gains 1.8% when KG information is added in BERTKG-DDI.
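The gains quoted in the two KG-related analyses above can be checked directly from the reported macro F1 values. The short calculation below uses only numbers from the result tables; the variable names are illustrative.

```python
# Macro F1 of BERTKG-DDI with each KG embedding (reported values).
kg_f1 = {"TransE": 0.826, "TransR": 0.829, "RESCAL": 0.834, "DistMult": 0.840}

translation_avg = (kg_f1["TransE"] + kg_f1["TransR"]) / 2    # translation-based models
semantic_avg = (kg_f1["RESCAL"] + kg_f1["DistMult"]) / 2     # semantic-matching models
semantic_gap = semantic_avg - translation_avg                # about 0.01 macro F1

# Best text-only model versus the same encoder with KG information (reported values).
text_only_f1 = 0.822
with_kg_f1 = 0.840
kg_gain = with_kg_f1 - text_only_f1                          # 0.018 macro F1
```

Both differences are absolute differences in macro F1, so the 1% and 1.8% figures are best read as percentage points.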
Comparison with the existing baselines: We compare our best-performing model with some of the best-performing existing baselines. (Asada, Miwa, and Sasaki 2018) proposed a neural method to extract DDIs from text using external drug molecular structure information. They encode textual drug pairs with convolutional neural networks and their molecular pairs with graph convolutional networks (GCNs), and then concatenate the outputs of these two networks. (Vivian et al. 2017) proposed a model that classifies DDIs from the literature by combining an attention mechanism with a recurrent neural network with long short-term memory (LSTM) units. (Zhang et al. 2017) presented a hierarchical recurrent neural network (RNN)-based method that integrates the shortest dependency path (SDP) and the sentence sequence for the DDI extraction task. (Sun et al. 2019) proposed a recurrent hybrid convolutional neural network (RHCNN) for DDI extraction from biomedical literature. In its embedding layer, the text mentioning two entities is represented as a sequence of semantic embeddings and position embeddings, where the complete semantic embedding is obtained by fusing a word embedding with its contextual information learnt by a recurrent structure. Recently, (Zhu et al. 2020) proposed multiple entity-aware attentions with various entity information to strengthen the representations of drug entities in sentences. They integrate drug descriptions from Wikipedia and DrugBank into their model to enhance the semantic information of drug entities. They also modified the output of the BioBERT model and showed that this is better than using the BioBERT model directly. In comparison, our method achieves state-of-the-art performance on the DDIExtraction 2013 corpus in terms of the F1-scores of all the relation types, as shown in Table 5.
Conclusion
In this paper, we propose an approach, BERTKG-DDI, for DDI relation classification based on pre-trained language models and Knowledge Graph embeddings of the drug entities. Experiments conducted on a benchmark DDI dataset demonstrate the effectiveness of the proposed method. Possible directions for further research include exploring other external drug representations, such as chemical structures and textual descriptions, for predicting DDIs from textual corpora.
References
- Asada, M.; Miwa, M.; and Sasaki, Y. 2018. Enhancing Drug-Drug Interaction Extraction from Texts by Molecular Structure Information. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 680–685. Melbourne, Australia: Association for Computational Linguistics. doi:10.18653/v1/P18-2108. URL https://www.aclweb.org/anthology/P18-2108.
- Beltagy, I.; Lo, K.; and Cohan, A. 2019. SciBERT: A Pretrained Language Model for Scientific Text. In EMNLP/IJCNLP.
- Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; and Yakhnenko, O. 2013. Translating Embeddings for Modeling Multi-relational Data. In NIPS.
- Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT.
- Herrero-Zazo, M.; Segura-Bedmar, I.; Martínez, P.; and Declerck, T. 2013. The DDI corpus: An annotated corpus with pharmacological substances and drug–drug interactions. Journal of Biomedical Informatics 46(5): 914–920. ISSN 1532-0464. doi:10.1016/j.jbi.2013.07.011. URL http://www.sciencedirect.com/science/article/pii/S1532046413001123.
- Leaman, R.; Dogan, R.; and Lu, Z. 2013. DNorm: Disease Name Normalization with Pairwise Learning to Rank. Bioinformatics (Oxford, England) 29. doi:10.1093/bioinformatics/btt474.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C. H.; and Kang, J. 2019. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4): 1234–1240. ISSN 1367-4803. doi:10.1093/bioinformatics/btz682. URL https://doi.org/10.1093/bioinformatics/btz682.
- Li, D.; and Ji, H. 2019. Syntax-aware Multi-task Graph Convolutional Networks for Biomedical Relation Extraction. In Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), 28–33. Hong Kong: Association for Computational Linguistics. doi:10.18653/v1/D19-6204. URL https://www.aclweb.org/anthology/D19-6204.
- Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; and Zhu, X. 2015. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI'15, 2181–2187. AAAI Press. ISBN 0262511290.
- Liu, S.; Tang, B.; Chen, Q.; and Wang, X. 2016. Drug-Drug Interaction Extraction via Convolutional Neural Networks. Computational and Mathematical Methods in Medicine 2016: 1–8. doi:10.1155/2016/6918381.
- Mondal, I. 2020. BERTChem-DDI: Improved Drug-Drug Interaction Prediction from text using Chemical Structure Information. In Proceedings of Knowledgeable NLP: the First Workshop on Integrating Structured Knowledge and Neural Networks for NLP, 27–32. Suzhou, China: Association for Computational Linguistics. URL https://www.aclweb.org/anthology/2020.knlp-1.4.
- Mondal, I.; Purkayastha, S.; Sarkar, S.; Goyal, P.; Pillai, J.; Bhattacharyya, A.; and Gattu, M. 2019. Medical Entity Linking using Triplet Network. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, 95–100. Minneapolis, Minnesota, USA: Association for Computational Linguistics. doi:10.18653/v1/W19-1912. URL https://www.aclweb.org/anthology/W19-1912.
- Nickel, M.; Tresp, V.; and Kriegel, H.-P. 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning, ICML'11, 809–816. Madison, WI, USA: Omnipress. ISBN 9781450306195.
- Purkayastha, S.; Mondal, I.; Sarkar, S.; Goyal, P.; and Pillai, J. K. 2019. Drug-Drug Interactions Prediction Based on Drug Embedding and Graph Auto-Encoder. In 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), 547–552.
- Sahu, S. K.; and Anand, A. 2018. Drug-drug interaction extraction from biomedical texts using long short-term memory network. Journal of Biomedical Informatics 86: 15–24. ISSN 1532-0464. doi:10.1016/j.jbi.2018.08.005. URL http://www.sciencedirect.com/science/article/pii/S1532046418301606.
- Sun, X.; Dong, K.; Ma, L.; Sutcliffe, R.; He, F.; Chen, S.; and Feng, J. 2019. Drug-Drug Interaction Extraction via Recurrent Hybrid Convolutional Neural Networks with an Improved Focal Loss. Entropy 21(1): 37. ISSN 1099-4300. doi:10.3390/e21010037. URL http://dx.doi.org/10.3390/e21010037.
- The UniProt Consortium. 2016. UniProt: the universal protein knowledgebase. Nucleic Acids Research 45(D1): D158–D169. ISSN 0305-1048. doi:10.1093/nar/gkw1099. URL https://doi.org/10.1093/nar/gkw1099.
- Vivian, V.; Lin, H.; Luo, L.; Zhao, Z.; Zhengguang, L.; Yijia, Z.; Yang, Z.; and Wang, J. 2017. An attention-based effective neural model for drug-drug interactions extraction. BMC Bioinformatics 18. doi:10.1186/s12859-017-1855-x.
- Wu, S.; and He, Y. 2019. Enriching Pre-trained Language Model with Entity Information for Relation Classification. CoRR abs/1905.08284. URL http://arxiv.org/abs/1905.08284.
- Yang, B.; Yih, W.-t.; He, X.; Gao, J.; and Deng, L. 2015. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. CoRR abs/1412.6575.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; and Le, Q. V. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In NeurIPS.
- Zhang, Y.; Zheng, W.; Lin, H.; Wang, J.; Yang, Z.; and Dumontier, M. 2017. Drug–drug interaction extraction via hierarchical RNNs on sequence and shortest dependency paths. Bioinformatics 34(5): 828–835. ISSN 1367-4803. doi:10.1093/bioinformatics/btx659. URL https://doi.org/10.1093/bioinformatics/btx659.
- Zhu, Y.; Li, L.; Lu, H.; Zhou, A.; and Qin, X. 2020. Extracting drug-drug interactions from texts with BioBERT and multiple entity-aware attentions. Journal of Biomedical Informatics 106: 103451. ISSN 1532-0464. doi:10.1016/j.jbi.2020.103451. URL http://www.sciencedirect.com/science/article/pii/S1532046420300794.