Platform Technology

Data-Driven Technology-Based Drug Discovery Solution


Figure 1. Concept of Sapientia as a novel drug discovery platform.

Sapientia is an integrated drug discovery platform that combines machine learning (ML) with wet-lab experimentation to substantially accelerate drug discovery, from target identification to lead candidate selection [Figure 1]. The in silico domain (the left circuit of the diagram) acts as a virtual source of druggable gene targets and new chemical hits or leads, expanding the pool of therapeutically promising candidates. The experimental domain (the right circuit of the diagram) acts as the validator, testing whether the designed or suggested compounds affect disease models from medicinal-chemistry and biological perspectives. The two circuits exchange information at several contact points, such as QSAR (quantitative structure-activity relationship) modeling and chemical synthesis, to sustain a high probability of success.

Figure 2. Details of Sapientia.

Sapientia operates through four core parts: (1) SapientiaTID, a genomic workflow from target mining to the selection of a new druggable gene; (2) SapientiaHITS, a workflow from in silico screening to hit selection; (3) SapientiaLEAD and SapientiaOPT, integrated workflows of compound design and synthesis for lead identification and optimization; and (4) SapientiaCore, a set of shared computational resources including StructureDB, SapientiaDB, and SapientiaML [Figure 2].

SapientiaTID identifies novel druggable genes for the development of first-in-class drugs. Drug discovery and development generally begin with target identification and characterization, and success depends on how well the clinical spectrum of a disease is understood and on which genes are potentially ‘druggable’ therapeutic targets. Most drug discovery targets (i.e., genes) are identified from the scientific literature and public databases. Alternatively, new druggable targets can be uncovered by target deconvolution (identifying the molecular target of a compound already known to be biologically effective) or by target discovery through high-throughput screening (HTS). Sapientia's target identification and validation (TID) facility was established to make target discovery successful and efficient by combining genomic and experimental approaches. Genomic data obtained from patients, model animals, and cell cultures by single-cell RNA sequencing (scRNA-seq), total RNA sequencing (RNA-seq), genome-wide association studies (GWAS), and similar methods are analyzed from multiple angles with statistical approaches to understand the course of pathogenesis and to propose a set of candidate causative genes. TID then validates the suggested genes experimentally, specifically with silencing-based assays in established model systems (e.g., normal lung fibroblasts as an experimental model of pulmonary fibrosis). During or after validation, the final gene(s) of interest, called a NOVEL TARGET, is selected according to three criteria: (1) novelty, (2) druggability, and (3) feasibility.
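The statistical prioritization and three-criteria selection described above can be illustrated with a minimal Python sketch. It is not Sapientia's pipeline. It assumes a toy expression table, uses an exact permutation test on mean expression as a stand-in for whatever statistics TID actually applies, controls the false discovery rate with the Benjamini-Hochberg procedure, and represents novelty, druggability, and feasibility as hypothetical boolean flags. `GeneEvidence`, `select_targets`, and the gene names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations
from statistics import mean


@dataclass
class GeneEvidence:
    """Toy evidence record for one candidate gene (hypothetical structure)."""
    name: str
    disease: list[float]   # expression values in disease samples
    control: list[float]   # expression values in control samples
    novel: bool            # assumed flag: criterion (1) novelty
    druggable: bool        # assumed flag: criterion (2) druggability
    feasible: bool         # assumed flag: criterion (3) feasibility


def permutation_p_value(a: list[float], b: list[float]) -> float:
    """Exact two-sided permutation test on the difference in group means."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    hits = total = 0
    # Enumerate every way to relabel len(a) pooled samples as "disease".
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [v for i, v in enumerate(pooled) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point summation noise.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_targets(genes: list[GeneEvidence], alpha: float = 0.05) -> list[str]:
    """Keep genes significant after FDR control that also pass all three flags."""
    qvals = benjamini_hochberg([permutation_p_value(g.disease, g.control) for g in genes])
    passing = [
        (q, g.name)
        for g, q in zip(genes, qvals)
        if q <= alpha and g.novel and g.druggable and g.feasible
    ]
    return [name for _, name in sorted(passing)]


if __name__ == "__main__":
    genes = [
        GeneEvidence("GENE_A", [5.1, 5.3, 5.2, 5.4, 5.0], [1.0, 1.2, 1.1, 0.9, 1.3], True, True, True),
        GeneEvidence("GENE_B", [4.8, 4.9, 5.0, 5.2, 4.7], [2.0, 2.1, 1.9, 2.2, 1.8], True, False, True),
        GeneEvidence("GENE_C", [3.0, 3.1, 2.9, 3.2, 2.8], [3.05, 2.95, 3.15, 2.85, 3.1], True, True, True),
    ]
    print(select_targets(genes))  # prints ['GENE_A']
```

GENE_B is strongly differential but fails the druggability flag, and GENE_C is not significant, so only GENE_A survives. Exact enumeration grows combinatorially with sample size, so real scRNA-seq or RNA-seq cohorts would call for dedicated differential-expression methods; the sketch only shows the shape of the filtering logic.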

Hit identification and selection, designated SapientiaHITS, begins with Sapientia's versatile in silico screening strategy supported by SapientiaCORE, which combines conventional computer-aided drug discovery (CADD) methods, namely ligand-based and structure-based drug discovery (LBDD and SBDD, respectively), with StructureDB, SapientiaDB, and SapientiaML. In particular, SapientiaDB, a virtual compound repository built from in-house designs and syntheses as well as public databases such as ChEMBL, PubChem, ZINC, and Mcule, supplies a diverse compound pool for the initial screening step. SapientiaML further supports virtual hit identification with machine learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The hits identified through SapientiaCORE are then evaluated experimentally to determine whether they are effective and worth advancing as starting points for the next discovery step.
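Ligand-based screening of the kind SapientiaHITS draws on can be sketched as a similarity search. The sketch below assumes each compound is already encoded as a set of "on" fingerprint bits (in practice these would come from a cheminformatics toolkit such as RDKit), scores each library compound by its maximum Tanimoto similarity to any known active (MAX group fusion), and keeps those above a cutoff. The compound names, bit sets, and the 0.5 threshold are hypothetical and do not describe SapientiaML's models.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

# A fingerprint is modeled as the set of bit positions that are "on".
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen_library(
    actives: Sequence[Fingerprint],
    library: Dict[str, Fingerprint],
    threshold: float = 0.5,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Rank library compounds by max similarity to any known active (MAX fusion)."""
    if not actives:
        raise ValueError("at least one reference active is required")
    hits = []
    for name, fp in library.items():
        score = max(tanimoto(fp, ref) for ref in actives)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first; ties broken by name for reproducibility.
    hits.sort(key=lambda item: (-item[1], item[0]))
    return hits if top_k is None else hits[:top_k]


if __name__ == "__main__":
    actives = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
    library = {
        "cmpd_1": frozenset({1, 2, 3, 5}),
        "cmpd_2": frozenset({10, 11, 12, 13}),
        "cmpd_3": frozenset({20, 21}),
    }
    print(screen_library(actives, library))  # [('cmpd_2', 0.75), ('cmpd_1', 0.6)]
```

MAX fusion lets a library compound qualify by resembling any one reference active, which suits chemically diverse active sets; structure-based docking and ML scoring would be separate, complementary filters.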

SapientiaLEAD and SapientiaOPT drive lead identification and lead optimization, respectively, through an interdisciplinary cycle that interweaves chemical design, synthesis, and biological validation. The key capability of both modules is ChemDesign, which unifies CADD with statistical ML approaches and can generate diverse series of novel, effective compounds that have not been published elsewhere, even when the three-dimensional structure of the target is unknown. In particular, ChemDesign uses ML approaches to generate a variety of chemically reasonable scaffolds. Sapientia's in-house facility for rapid compound synthesis further shortens the design-validation cycle, and computational QSAR models help decide whether a compound should advance to the next step.
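The QSAR gate mentioned above (deciding whether a compound advances) can be sketched as a linear model on molecular descriptors. This pure-Python example assumes two hypothetical numeric descriptors per compound and a pIC50 activity label, fits ridge regression through the normal equations, and advances a compound when its predicted pIC50 reaches an assumed cutoff of 6.0. It illustrates the decision logic only; the source does not describe Sapientia's actual QSAR models, and `fit_ridge` and `should_advance` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_ridge(X: Sequence[Sequence[float]], y: Sequence[float],
              lam: float = 1e-6) -> Tuple[List[float], float]:
    """Ridge regression y ≈ w·x + b; the intercept b is not penalized."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations on centered data: (Xc^T Xc + lam*I) w = Xc^T yc
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    w = _solve(A, rhs)
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted activity (e.g., pIC50) for one descriptor vector."""
    return b + sum(wj * xj for wj, xj in zip(w, x))


def should_advance(w: Sequence[float], b: float, x: Sequence[float],
                   cutoff: float = 6.0) -> bool:
    """Gate: advance the compound if predicted activity meets the cutoff."""
    return predict(w, b, x) >= cutoff


if __name__ == "__main__":
    # Toy training set generated from pIC50 = 2*d1 - 0.5*d2 + 3 (hypothetical).
    X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
    y = [4.0, 6.5, 7.0, 9.5, 10.5]
    w, b = fit_ridge(X, y)
    print(should_advance(w, b, [6, 6]))  # True
    print(should_advance(w, b, [0, 6]))  # False
```

The small ridge penalty keeps the solve stable when descriptors are correlated; a real QSAR workflow would also validate the model on held-out compounds before trusting the gate.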