Prof. Dr. Naomie Salim
Universiti Teknologi Malaysia
Keynote Talk: Methods for mining chemical and document databases to support the computer-aided drug design and development process
The vast amount of data in chemical databases and drug-related document databases offers many opportunities to aid the process of drug design and development. For instance, searching for molecules that are structurally similar to a promising lead compound can help us discover a better lead compound. The bioactivity of unknown compounds can also be predicted from their structural similarity to known drug compounds. The similarity measures used for these searches can also be used to build focused libraries against specific targets. Traditionally, a Vector Space Model using bit-string representations of compounds and the Tanimoto coefficient has been used to rank molecules by their structural similarity to query compounds. However, we have shown that the Tanimoto coefficient is not necessarily the best coefficient, and that fusing certain coefficients can place a higher number of similarly active compounds among the top-ranked compounds. In this talk, a number of approaches to enhance molecular search will be discussed. They include modification of the Simple Matching Similarity Measure with bit-string re-weighting, probabilistic-based compound searching, Bayesian network-based similarity measures, fragment selection, fragment weighting, relevance feedback, fuzzy coefficients, quantum-based similarity searching, Multilevel Neighborhoods of Atoms molecular structure descriptors, shape-based similarity measures and deep learning. A new method for selecting representative subsets of compounds from chemical databases, based on an improved chemical space representation and alpha shape theory, has also been proposed.
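As a rough illustration of the similarity ranking and coefficient fusion mentioned above, the following minimal Python sketch ranks a toy library against a query fingerprint. It is not the speaker's implementation. The fingerprints are stored as plain integer bit masks, the second coefficient (cosine) and the rank-sum fusion rule are assumptions chosen for simplicity, and all function names and data are hypothetical.

```python
from math import sqrt


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    common = bin(a & b).count("1")
    total = bin(a | b).count("1")
    return common / total if total else 0.0


def cosine(a: int, b: int) -> float:
    """Cosine coefficient for binary fingerprints (assumed here as a second measure)."""
    common = bin(a & b).count("1")
    size_a = bin(a).count("1")
    size_b = bin(b).count("1")
    return common / sqrt(size_a * size_b) if size_a and size_b else 0.0


def rank(query: int, library: dict, coefficient) -> list:
    """Return compound names ordered from most to least similar to the query."""
    return sorted(library, key=lambda name: coefficient(query, library[name]), reverse=True)


def fuse_by_rank_sum(rankings: list) -> list:
    """Fuse several rankings by summing each compound's positions (lower sum is better)."""
    totals = {}
    for ranking in rankings:
        for position, name in enumerate(ranking):
            totals[name] = totals.get(name, 0) + position
    return sorted(totals, key=lambda name: totals[name])


# Hypothetical 4-bit fingerprints; real fingerprints have hundreds or thousands of bits.
library = {"A": 0b1111, "B": 0b1100, "C": 0b0001}
query = 0b1110

fused = fuse_by_rank_sum([rank(query, library, tanimoto), rank(query, library, cosine)])
print(fused)
```

Rank-based fusion is used here because it avoids having to rescale scores from different coefficients onto a common range; other fusion rules could be substituted in `fuse_by_rank_sum`.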
In addition, screening of compounds in a virtual library can be made more efficient if the compounds are first clustered and representatives are then selected from each cluster. The talk will present results from a number of techniques for clustering chemical compound databases and show how consensus clustering is used to improve the clustering results.
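One common way to build a consensus from several clusterings is a co-association (evidence accumulation) scheme, sketched below in Python. This is only an assumed illustration and may differ from the consensus method presented in the talk. The function names, the 0.5 threshold and the toy partitions are hypothetical.

```python
from itertools import combinations


def co_association(partitions: list) -> dict:
    """Fraction of partitions in which each pair of compounds shares a cluster."""
    items = sorted(partitions[0])
    return {
        (i, j): sum(p[i] == p[j] for p in partitions) / len(partitions)
        for i, j in combinations(items, 2)
    }


def consensus_clusters(partitions: list, threshold: float = 0.5) -> list:
    """Merge compounds connected by pairs that co-cluster in more than `threshold` of the partitions."""
    items = sorted(partitions[0])
    parent = {item: item for item in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), share in co_association(partitions).items():
        if share > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# Three hypothetical clusterings of four compounds (cluster labels are arbitrary).
partitions = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 2, "d": 2},
]
clusters = consensus_clusters(partitions)
print(clusters)
```

A representative for screening could then be taken from each consensus cluster, for example its first member or the member most similar to the rest of the cluster.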
Finally, instead of relying on manual inspection, the automatic detection of adverse drug effects from medical reports can help regulatory authorities screen and extract information rapidly, accelerating the generation of medical decision support and safety alerts. In this talk, we will share two extraction methods for this purpose. The first method uses rules augmented with lexical information (cue words) to mine the syntactic dependency paths connecting drug and medical condition entities, and then extracts the corresponding relation. The second method is a case-based reasoning model that uses linguistic patterns learned automatically from the dependency paths linking drug and medical condition entities to identify the relations. We will also describe a classification model, based on automatically learned and manually curated linguistic patterns, that detects sentences holding drug-adverse effect information without relying on a named entity recognition module to identify the entities in the input sentences.
Professor Salim’s main research goal is to design new algorithms that improve the effectiveness of searching and mining new knowledge from various kinds of datasets, including unstructured, semi-structured and structured databases. The current focus of her research is on chemical databases and text databases to support the process of computer-aided drug design, text summarisation, plagiarism detection, automatic information extraction, sentiment analysis and recommendation systems. The output of the research has been incorporated into a number of software systems, such as UTMChem Workbench and NADI Natural Products Database System to support the drug design and drug optimisation process, UTMCLPD Cross Language Plagiarism Detection System to summarise documents and check for plagiarism, and Oricheck for cross-language idea similarity checking and plagiarism detection. The systems can be used by pharmaceutical scientists to search, retrieve, optimise and discover new drug compounds from chemical and natural product databases, and they help academic institutions preserve academic integrity by supporting the detection of intelligent idea plagiarism across different languages.
Professor Salim has been involved in 53 research projects, of which she heads 21. She has authored over 170 journal articles, of which 160 are indexed in Scopus and 84 in Web of Science. Her Google Scholar h-index is 21, and she has 1931 citations to date. Her Scopus h-index is 14, with 733 Scopus-indexed citations.
Among the research and innovation awards received by Professor Salim are the PECIPTA 2011 Gold Medal award for her UTMCLP cross-language semantic plagiarism detection system, the I-inova 2010 Gold Medal award for her Islamic Ontology-based Quran search engine, the BioInnovation 2011 Bronze Award for the UTMChem Workbench Molecular Database System, the iPhex Gold Medal Award for innovation in teaching and learning, the UTM 2011 Best Research Award, the UTM 2014 Best Research Award and the INATEX Distinction Award (1998). She has also won the UTM Citra Karisma Indexed Journal Paper Award for 2009, 2011, 2012, 2013 and 2014.
She is a fellow of the Japan Society for the Promotion of Science (JSPS), the head of the Soft Computing Research Group UTM, an Associate Member of the UTM Big Data Centre and a UTM Senate Member.