How AstraZeneca Is Using a Netflix-like Knowledge Graph to Discover New Drugs

Drug development is a long and expensive process. Source: Spark AI Summit 2020
A trivial knowledge graph
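
To make the idea of a trivial knowledge graph concrete, here is a minimal sketch in R using igraph. The triples, the column names (`from`, `to`, `relation`) and the variable names are assumptions chosen for illustration, not data or code from AstraZeneca. Entities become nodes and each relation becomes a directed, labelled edge.

```r
library(igraph)

# Illustrative (subject, object, relation) triples, written for this example
triples <- data.frame(
  from     = c("Atorvastatin", "Atorvastatin", "Amoxicillin", "Azithromycin"),
  to       = c("high cholesterol", "statin", "antibiotic", "antibiotic"),
  relation = c("treats", "is_a", "is_a", "is_a"),
  stringsAsFactors = FALSE
)

# The first two columns define the edges; extra columns become edge attributes
kg <- graph_from_data_frame(triples, directed = TRUE)

# A simple query: which entities have an edge pointing at "antibiotic"?
antibiotics <- as_ids(neighbors(kg, "antibiotic", mode = "in"))
print(antibiotics)
```

Even at this size the graph can answer a question by following edges rather than by searching text, which is the property larger knowledge graphs rely on.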
Process of building a knowledge graph
library(dplyr)
library(tidytext)
library(igraph)
library(tidyr)
#' Get information about drugs from NHS website
nhs_url <- 'https://www.nhs.uk/medicines/'
drugs <- c('Atorvastatin', 'Azithromycin', 'Amoxicillin')
#' List to collect one data frame of information per drug
datalist <- list()
#' For every drug, get information from the NHS website
for (i in seq_along(drugs)) {

  drug <- drugs[i]

  #' Read the drug's page (NHS medicine URLs use lowercase names) and clean up the text
  drug_info <- xml2::read_html(paste0(nhs_url, tolower(drug))) %>%
    rvest::html_nodes(xpath = '//*[@class="nhsuk-grid-column-two-thirds"]') %>%
    rvest::html_text() %>%
    gsub(pattern = "\t|\n|\r", replacement = "") %>%
    gsub(pattern = "\\s+", replacement = " ")

  #' Keep text as character so it can be tokenised later
  dat <- data.frame(x = drug, y = drug_info, stringsAsFactors = FALSE)

  datalist[[i]] <- dat # add it to the list
}
#' bind the list into a dataframe
drugs_info = do.call(rbind, datalist)
colnames(drugs_info) <- c('Drug', 'Info')
#' extract bigrams from the free text
drugs_bigrams <- drugs_info %>%
  unnest_tokens(bigram, Info, token = "ngrams", n = 2)
#' split each bigram into its two words
drugs_bigrams_df <- drugs_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")
#' drop bigrams in which either word is a stop word
drugs_bigrams_filtered <- drugs_bigrams_df %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)
#' new bigram counts:
drug_bigram_counts <- drugs_bigrams_filtered %>%
  count(word1, word2, sort = TRUE)
#' keep only relatively common combinations and build a graph from them
drug_info_graph <- drug_bigram_counts %>%
  filter(n > 1) %>%
  graph_from_data_frame()
drug_info_graph
#' Plot graph
library(ggraph)
set.seed(2020)
ggraph(drug_info_graph, layout = "fr") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1)
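
Beyond plotting, one simple way to read a graph like this is to rank words by how many edges touch them. The sketch below assumes a small, made-up table of bigram counts with the same columns as `drug_bigram_counts` (`word1`, `word2`, `n`); the words, the counts and the names `toy_counts`, `toy_graph` and `top_word` are hypothetical and are not results from the NHS pages.

```r
library(igraph)

# Hypothetical bigram counts, shaped like drug_bigram_counts
toy_counts <- data.frame(
  word1 = c("chest", "ear", "heart", "blood"),
  word2 = c("infections", "infections", "disease", "pressure"),
  n     = c(3, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Same threshold as above: keep bigrams seen more than once
toy_graph <- graph_from_data_frame(toy_counts[toy_counts$n > 1, ])

# Degree = number of edges touching each word; sort to find the hubs
node_degree <- sort(degree(toy_graph), decreasing = TRUE)
top_word <- names(node_degree)[1]
print(node_degree)
```

In this toy table "infections" is the only word that takes part in two of the retained bigrams, so it ranks first. Applied to `drug_info_graph`, the same `degree()` call would show which terms link the most phrases across the scraped drug descriptions.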

sid dhuri


I am a data scientist by trade. I love to write about data science, marketing and economics. I am also the founder of Orox.ai, a marketing AI and automation platform.