Algorithm of Capi Server
Datasets
We obtained 2305 experimentally validated molecules from NCBI PubChem BioAssay that had been tested against Plasmodium as apicoplast inhibitors. After careful examination of all compounds, we removed every active compound whose AC value was > 1 µM. This left 462 compounds at the 48 h time point and 450 compounds at the 96 h time point; these molecules were downloaded from NCBI in SDF format. We created two datasets from them:
1) Dataset-1: This dataset is composed of 118 inhibitors and 344 non-inhibitors against the 48 h model of Plasmodium apicoplast.
2) Dataset-2: This dataset is composed of 210 inhibitors and 240 non-inhibitors against the 96 h model of Plasmodium apicoplast.
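The activity cut-off described above can be sketched as a small labelling step. This is a minimal illustration, not the authors' pipeline. The record fields (`cid`, `outcome`, `ac_um`) are hypothetical. The sketch also assumes two things the text does not state: active compounds with an AC value ≤ 1 µM become inhibitors, and inactive compounds become non-inhibitors.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

AC_CUTOFF_UM = 1.0  # actives with AC > 1 µM are discarded, as stated in the text


@dataclass
class Record:
    cid: int                # hypothetical PubChem compound identifier field
    outcome: str            # hypothetical field: "active" or "inactive"
    ac_um: Optional[float]  # AC value in µM (None when not reported)


def build_dataset(records: List[Record]) -> List[Tuple[int, int]]:
    """Return (cid, label) pairs: 1 = inhibitor, 0 = non-inhibitor."""
    dataset = []
    for r in records:
        if r.outcome == "active":
            # Keep only potent actives; weak actives (or missing AC) are dropped.
            if r.ac_um is not None and r.ac_um <= AC_CUTOFF_UM:
                dataset.append((r.cid, 1))
        else:
            # Assumption: inactive compounds form the non-inhibitor class.
            dataset.append((r.cid, 0))
    return dataset


if __name__ == "__main__":
    toy = [Record(1, "active", 0.2), Record(2, "active", 5.0), Record(3, "inactive", None)]
    print(build_dataset(toy))  # [(1, 1), (3, 0)]
```

Applied to the real bioassay records, the resulting class counts would be compared with the 118/344 and 210/240 splits listed above.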
We calculated descriptors for these molecules using PaDEL, open-source software that can calculate about 10 different types of binary fingerprints along with 1D, 2D and 3D descriptors. In this work, we used four classes of fingerprints (FP) and 1D/2D descriptors:
4) Substructure
5) 1D2D Descriptors
Feature Selection: For efficient model building, selecting an appropriate subset of molecular descriptors is an important step. Descriptors were selected using the CfsSubsetEval attribute evaluator with the BestFirst search method in Weka.
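CfsSubsetEval scores a descriptor subset with Hall's correlation-based merit, k·mean|r_cf| / sqrt(k + k(k−1)·mean|r_ff|). This favours descriptors that correlate with the class but not with each other. The sketch below is a simplified illustration, not Weka's implementation. It uses absolute Pearson correlation, whereas Weka uses its own correlation measures. It also replaces BestFirst, which can backtrack, with plain greedy forward selection. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(cols: List[Sequence[float]], y: Sequence[float], subset: List[int]) -> float:
    """Hall's CFS merit: k*mean|r_cf| / sqrt(k + k(k-1)*mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(cols[i], y)) for i in subset) / k
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = (sum(abs(pearson(cols[i], cols[j])) for i, j in pairs) / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_cfs(cols: List[Sequence[float]], y: Sequence[float]) -> List[int]:
    """Repeatedly add the feature that most improves merit; stop when none does.
    Simplification: no backtracking, unlike Weka's BestFirst search."""
    selected: List[int] = []
    best_merit = 0.0
    while True:
        best_idx: Optional[int] = None
        for i in range(len(cols)):
            if i in selected:
                continue
            m = cfs_merit(cols, y, selected + [i])
            if m > best_merit:  # strict '>' keeps the lowest index on ties
                best_idx, best_merit = i, m
        if best_idx is None:
            break
        selected.append(best_idx)
    return selected


if __name__ == "__main__":
    y = [1, 1, 0, 0]
    # Column 1 duplicates column 0 (redundant); column 2 is uncorrelated with y.
    cols = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]]
    print(greedy_forward_cfs(cols, y))  # [0]
```

The redundant duplicate column is not added because it raises the inter-feature correlation without raising the feature-class correlation. This is the behaviour that makes CFS useful for pruning highly redundant fingerprint bits.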
We used Weka to build models for predicting inhibitors at the two time points (48 h and 96 h) of the Plasmodium apicoplast assay. Weka is a software package that provides numerous algorithms for data pre-processing, feature selection and model building.
Performance Measures: Once a classification model was constructed, its goodness of fit and statistical significance were assessed using sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC).
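The four performance measures can be computed from a binary confusion matrix, with inhibitors as the positive class. This is a minimal sketch. The function name is hypothetical, and the counts in the usage line are invented for illustration; they only match the Dataset-1 class sizes (118 inhibitors, 344 non-inhibitors) and are not reported results.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of inhibitors predicted as inhibitors
    specificity = tn / (tn + fp)  # fraction of non-inhibitors predicted correctly
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Invented counts: tp + fn = 118 inhibitors, tn + fp = 344 non-inhibitors.
    print(classification_metrics(tp=90, fn=28, tn=300, fp=44))
```

MCC is included alongside accuracy because both datasets are imbalanced. On Dataset-1 in particular, a model that predicts every compound as a non-inhibitor would still reach about 74% accuracy but an MCC of 0.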