Druggability index as a guide for compound synthesis.

Chemical space for low-molecular-weight compounds is estimated to be of the order of 10^100 structures. This is an enormous number, so the search for suitable drug-like compounds must be guided by knowledge. The first step in this direction was Lipinski's rule of five, which narrowed the search to some extent. Sets of medicinal-chemistry and structural filters were later introduced to narrow chemical space further. Over the last decade, many vendors have synthesized and commercialized screening compound collections, making millions of compounds available for high-throughput screening (HTS). Yet analyzing such large collections is a considerable time- and resource-consuming task.

For this reason, criteria for assessing the druggability of compound databases have become a pressing need. Databases offered by commercial suppliers should be screened in silico to find the most promising candidates. To address this problem, Sirois et al. (Sirois, S.; Hatzakis, G.; Wei, D.; Du, Q.; Chou, K.C. Assessment of chemical libraries for their druggability. Computational Biology and Chemistry, 2005, 29, 55-67) recently proposed a function of 12 parameters. It accounts for physical, chemical and structural properties as well as the presence of undesirable functional groups. As a result, the drug-likeness of an entire collection is expressed as a single number. The authors analyzed 44 databases, including the Enamine database. The main limitation of this comparison was that many of the analyzed databases were out of date (e.g., the Enamine database dated from 2002).

Starting in 2000, Enamine implemented a new strategy for the synthesis of drug-like compounds, named the REAL Database. Briefly, it can be described as Enamine's validated drug-like chemical space. We have developed more than 40 chemical reactions that use 25 000 building blocks for the design and synthesis of screening compounds. Before synthesis, the target compounds are analyzed for drug-likeness. The strategy is now in its sixth year and has proved its flexibility and efficacy. The REAL Database has been offered to many customers, who selected compounds using their own criteria. Using this approach, we have increased our stock collection in a very short time, from 80 000 compounds in 2002 to 2 000 000 today.
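To illustrate the pre-synthesis drug-likeness check, the sketch below screens enumerated virtual products against Lipinski's rule-of-five thresholds (MW ≤ 500 Da, logP ≤ 5, at most 5 H-bond donors, at most 10 H-bond acceptors). This is a minimal sketch under stated assumptions, not Enamine's actual pipeline. The text does not say which criteria are applied. The `VirtualProduct` class, the compound IDs and all property values are hypothetical, and the properties are assumed to have been precomputed elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VirtualProduct:
    """Hypothetical enumerated product with precomputed properties."""
    compound_id: str
    mol_weight: float   # molecular weight, Da
    clogp: float        # calculated logP
    hb_donors: int
    hb_acceptors: int


def rule_of_five_violations(p: VirtualProduct) -> int:
    """Count how many of Lipinski's four thresholds the product exceeds."""
    return sum([
        p.mol_weight > 500,
        p.clogp > 5,
        p.hb_donors > 5,
        p.hb_acceptors > 10,
    ])


def select_for_synthesis(products: List[VirtualProduct],
                         max_violations: int = 0) -> List[VirtualProduct]:
    """Keep only products with at most `max_violations` rule-of-five violations."""
    return [p for p in products if rule_of_five_violations(p) <= max_violations]


# Hypothetical virtual products (IDs and values are illustrative only).
candidates = [
    VirtualProduct("VP-1", 342.4, 2.8, 1, 5),
    VirtualProduct("VP-2", 528.6, 4.1, 2, 7),
    VirtualProduct("VP-3", 611.7, 6.3, 3, 9),
]

print([p.compound_id for p in select_for_synthesis(candidates)])     # ['VP-1']
print([p.compound_id for p in select_for_synthesis(candidates, 1)])  # ['VP-1', 'VP-2']
```

Lipinski's original formulation flags poor absorption as more likely when more than one threshold is exceeded, so `max_violations=1` matches that reading. The stricter default of 0 is an arbitrary choice made for illustration.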

Using the approach proposed in that article, we calculated the druggability coefficients of the 2003 stock collection, the three-year stock update (2003-2006), and the REAL Database as of 1 January 2006:

Descriptor            REAL Database   Stock collection of 2003   Stock update 2003-2006
Molecular weight      0.009424        0.04                       0.013
logP                  0.061183        0.217                      0.085
Rotatable bonds       0.029235        0.02                       0.011
HB acceptors          0.103994        0.054                      0.065
HB donors             0.000077        0.0007877                  0.00045
Number of N,O atoms   0.000023        0.0006126                  0.00003184
Halogens              0.0000125       0.00177                    0.00011
Chain length          0.3784763       0.184                      0.211
Number of rings       0.000386        0.00629                    0.00108
CFnum                 0.0             0.0                        0.0
Big ring              0.001733        0.002107                   0.00165
Reactive              0.097181        0.08                       0.06
Druggability index    0.682           0.601                      0.448
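The REAL Database and stock-update columns suggest that the druggability index is simply the sum of the twelve per-descriptor contributions. Under that assumption, the totals can be checked directly from the table. The sketch does not compute the contributions themselves, which in Sirois et al. come from the compound properties. It only adds up the published values.

```python
# Per-descriptor contributions copied from the table above, in column order:
# (REAL Database, Stock collection of 2003, Stock update 2003-2006).
CONTRIBUTIONS = {
    "Molecular weight":    (0.009424,  0.04,      0.013),
    "logP":                (0.061183,  0.217,     0.085),
    "Rotatable bonds":     (0.029235,  0.02,      0.011),
    "HB acceptors":        (0.103994,  0.054,     0.065),
    "HB donors":           (0.000077,  0.0007877, 0.00045),
    "Number of N,O atoms": (0.000023,  0.0006126, 0.00003184),
    "Halogens":            (0.0000125, 0.00177,   0.00011),
    "Chain length":        (0.3784763, 0.184,     0.211),
    "Number of rings":     (0.000386,  0.00629,   0.00108),
    "CFnum":               (0.0,       0.0,       0.0),
    "Big ring":            (0.001733,  0.002107,  0.00165),
    "Reactive":            (0.097181,  0.08,      0.06),
}
COLUMNS = ("REAL Database", "Stock collection of 2003", "Stock update 2003-2006")
REPORTED = (0.682, 0.601, 0.448)


def druggability_index(column: int) -> float:
    """Sum one column's descriptor contributions (assumes the index is additive)."""
    return sum(values[column] for values in CONTRIBUTIONS.values())


for i, name in enumerate(COLUMNS):
    print(f"{name}: summed {druggability_index(i):.3f}, reported {REPORTED[i]:.3f}")
```

The REAL Database and update columns reproduce 0.682 and 0.448 exactly. The 2003 column sums to about 0.607 rather than the listed 0.601, presumably because several of its per-descriptor values are shown rounded.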

For the stock collection, the largest differences were observed for the MW, rotatable bond and logP descriptors. This indicates that the newly synthesized compounds fit the rule of five better, and it reflects Enamine's effort to produce mainly drug-like screening compounds. As of January 2006, only 9% of the compounds exceeded the 500 Da molecular-weight limit, and 20% exceeded the logP limit of 5. The trends in the other descriptors also confirm that the stock has been enriched in drug-like compounds: on average, each descriptor decreased about two-fold. However, the HB acceptor and chain length descriptors increased slightly, and the CFnum descriptor was zero in all sets.
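The comparison between the 2003 stock and the 2003-2006 update can be reproduced from the table. The sketch below computes, for each descriptor, the signed decrease (2003 value minus update value) and the fold decrease, then ranks the descriptors by the size of the absolute change. Only the table values come from the source. The ranking criterion and the output layout are illustrative choices.

```python
from typing import Optional, Tuple

# Contributions from the table: (Stock collection of 2003, Stock update 2003-2006).
STOCK = {
    "Molecular weight":    (0.04,      0.013),
    "logP":                (0.217,     0.085),
    "Rotatable bonds":     (0.02,      0.011),
    "HB acceptors":        (0.054,     0.065),
    "HB donors":           (0.0007877, 0.00045),
    "Number of N,O atoms": (0.0006126, 0.00003184),
    "Halogens":            (0.00177,   0.00011),
    "Chain length":        (0.184,     0.211),
    "Number of rings":     (0.00629,   0.00108),
    "CFnum":               (0.0,       0.0),
    "Big ring":            (0.002107,  0.00165),
    "Reactive":            (0.08,      0.06),
}


def compare(old: float, new: float) -> Tuple[float, Optional[float]]:
    """Return (signed decrease old - new, fold decrease); fold is None when new == 0."""
    fold = old / new if new else None
    return old - new, fold


# Sort descriptors by the size of their absolute change, largest first.
rows = sorted(STOCK.items(), key=lambda kv: abs(kv[1][0] - kv[1][1]), reverse=True)
for name, (old, new) in rows:
    delta, fold = compare(old, new)
    fold_text = f"{fold:.2f}x" if fold is not None else "n/a"
    trend = "decrease" if delta > 0 else ("increase" if delta < 0 else "no change")
    print(f"{name:<20} {delta:+.5f} {fold_text:>7}  {trend}")
```

Ranked by absolute change, logP leads by a wide margin. HB acceptors and chain length are the only descriptors that increase. The largest fold changes belong to descriptors with very small contributions, such as the N,O atom count (about 19-fold) and halogens (about 16-fold).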

Analysis of stock growth using drug-like filters revealed that the fraction of drug-like compounds roughly doubled over the last five years, now reaching 55%:

Year   Number of compounds   Number of drug-like compounds   % of drug-like compounds
2002   102 574               29 664                          29
2003   168 614               56 499                          34
2004   305 398               143 273                         47
2005   402 150               214 489                         53
2006   627 520               341 001                         55
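The percentages in the table can be recomputed from the compound counts. The sketch below does that and also computes the fold increase in the drug-like share between 2002 and 2006. The counts come from the table. The function name is illustrative.

```python
# (total compounds, drug-like compounds) per year, from the table above.
STOCK_BY_YEAR = {
    2002: (102_574, 29_664),
    2003: (168_614, 56_499),
    2004: (305_398, 143_273),
    2005: (402_150, 214_489),
    2006: (627_520, 341_001),
}


def druglike_share(total: int, druglike: int) -> float:
    """Percentage of drug-like compounds in the stock."""
    return 100.0 * druglike / total


for year, (total, druglike) in STOCK_BY_YEAR.items():
    print(f"{year}: {druglike_share(total, druglike):.1f}% drug-like")

first = druglike_share(*STOCK_BY_YEAR[2002])
last = druglike_share(*STOCK_BY_YEAR[2006])
print(f"fold increase 2002 -> 2006: {last / first:.2f}")
```

For 2002 to 2005, the recomputed shares (28.9%, 33.5%, 46.9%, 53.3%) round to the listed values. The 2006 row comes out at about 54.3%, slightly below the 55% shown. The overall rise is about 1.9-fold, consistent with the "two-fold" increase stated in the text.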

Given its size, currently of the order of 11 million structures, the REAL Database is a remarkable source of screening compounds. Its estimated druggability index is 0.682. Moreover, about 4 million of these compounds have a druggability index of zero, making the REAL Database the largest collection of drug-like compounds in the world.

