Open Access

Table B.1

Features of molecules used for ML

Molecule name Molecule formula Mass #atoms –OH –C(O)– –COOH –C(O)O– –O– –NH2 –CN –N–C(O)– Valence electrons H–bond acceptor H–bond donor TPSA
amu # # # # # # # # # # # Å²
Methane CH4 16 5 8 0 0 0
Ammonia NH3 17 4 8 1 1 35
Water H2O 18 3 1 8 0 0 32
Acetylene C2H2 26 4 10 0 0 0
Hydrogen cyanide HCN 27 3 1 10 1 0 24
Carbon monoxide CO 28 2 12 1 0 17
Ethylene C2H4 28 6 12 0 0 0
Ethylene glycol (CH2OH)2 62 10 2 12 2 2 40
Dinitrogen N2 28 2 10 2 0 48
Ethane C2H6 30 8 14 0 0 0
Formaldehyde H2CO 30 4 1 12 1 0 17
Methylamine CH3NH2 31 7 1 14 1 1 26
Methanol CH3OH 32 6 1 14 1 1 20
Oxygen O2 32 2 8 0 0 32
Hydroxylamine NH2OH 33 5 1 1 14 2 2 46
Methylacetylene CH3CCH 40 7 16 0 0 0
Acetonitrile CH3CN 41 6 1 16 1 0 24
Methylisocyanide CH3NC 41 6 1 16 0 0 4
Propene CH2CHCH3 42 9 18 0 0 0
Isocyanic acid HNCO 43 4 16 2 1 41
Acetaldehyde CH3CHO 44 7 1 18 1 0 17
Carbon dioxide CO2 44 3 16 2 0 34
Ethylene oxide c–C2H4O 44 7 1 18 1 0 13
Nitrous oxide N2O 44 3 16 2 0 51
Propane C3H8 44 11 20 0 0 0
Formamide NH2CHO 45 6 1 18 1 1 43
Dimethylether CH3OCH3 46 9 1 20 1 0 9
Ethanol CH3CH2OH 46 9 1 20 1 1 20
Formic acid HCOOH 46 5 1 18 1 1 37
Nitrogen dioxide NO2 46 3 17 3 0 54
Cyanoacetylene HC3N 51 5 1 18 1 0 24
Acrylonitrile CH2CHCN 53 7 1 20 1 0 24
Propionitrile CH3CH2CN 55 9 1 22 1 0 24
Acetone CH3COCH3 58 10 1 24 1 0 17
Allyl alcohol CH2CHCH2OH 58 10 1 24 1 1 20
Propionaldehyde CH3CH2CHO 58 10 1 24 1 0 17
Acetic acid CH3COOH 60 8 1 24 1 1 37
Glycolaldehyde HOCH2CHO 60 8 1 1 24 2 1 37
Glycolonitrile HOCH2CN 57 7 1 1 24 2 1 44
Methyl formate CH3OCHO 60 8 1 24 2 0 26
Pentane C5H12 72 17 32 0 0 0
N,N–DMF (CH3)2NCHO 73 12 1 30 1 0 20
Ethyl formate CH3CH2OCHO 74 11 1 30 2 0 26
Dicyanoacetylene NCCCCN 76 6 2 26 2 0 48
Benzene C6H6 78 12 30 0 0 0
Hexane C6H14 86 20 38 0 0 0
Toluene CH3C6H5 92 15 36 0 0 0
1,1-Dichloroethane CH3CHCl2 98 8 26 0 0 0
Heptane C7H16 100 23 44 0 0 0
Benzonitrile c–C6H5CN 103 13 1 38 1 0 24
Benzaldehyde C6H5CHO 106 14 1 40 1 0 17
Ethylbenzene CH3CH2C6H5 106 18 42 0 0 0
o–Xylene (CH3)2C6H4 106 18 42 0 0 0
Cytosine C4H5N3O 111 13 1 1 42 3 2 72
Octane C8H18 114 26 50 0 0 0
Trichloromethane CHCl3 118 5 26 0 0 0
Thymine C5H6N2O2 126 15 1 48 2 2 66
Naphthalene C10H8 128 18 48 0 0 0
Nonane C9H20 128 29 56 0 0 0
Adenine C5H5N5 135 15 1 50 4 2 80
Decane C10H22 142 32 62 0 0 0
1,2–Dichlorobenzene C6H4Cl2 146 12 42 0 0 0
Guanine C5H5N5O 151 16 1 1 56 4 3 100
Undecane C11H24 156 35 68 0 0 0
Dodecane C12H26 170 38 74 0 0 0
Tridecane C13H28 184 41 80 0 0 0
Tetradecane C14H30 198 44 86 0 0 0
Pentadecane C15H32 212 47 92 0 0 0
Hexadecane C16H34 226 50 98 0 0 0
Heptadecane C17H36 240 53 104 0 0 0
2–Deoxyadenosine C10H13N5O3 251 31 2 1 1 96 8 3 119
Octadecane C18H38 254 56 110 0 0 0
2–Deoxyguanosine C10H13N5O4 267 32 2 1 1 1 102 8 4 139
Nonadecane C19H40 268 59 116 0 0 0
Icosane C20H42 282 62 122 0 0 0
Henicosane C21H44 296 65 128 0 0 0
Coronene C24H12 300 36 108 0 0 0
Docosane C22H46 310 68 134 0 0 0
Tricosane C23H48 324 71 140 0 0 0
Tetracosane C24H50 338 74 146 0 0 0
Pentacosane C25H52 352 77 152 0 0 0
Hexacosane C26H54 366 80 158 0 0 0
Heptacosane C27H56 380 83 164 0 0 0
Octacosane C28H58 394 86 170 0 0 0
Nonacosane C29H60 408 89 176 0 0 0
Triacontane C30H62 422 92 182 0 0 0
Hentriacontane C31H64 436 95 188 0 0 0
Dotriacontane C32H66 450 98 194 0 0 0
Tritriacontane C33H68 464 101 200 0 0 0
Tetratriacontane C34H70 478 104 206 0 0 0
Pentatriacontane C35H72 492 107 212 0 0 0
Hexatriacontane C36H74 506 110 218 0 0 0
Heptatriacontane C37H76 520 113 224 0 0 0
Octatriacontane C38H78 534 116 230 0 0 0
Nonatriacontane C39H80 548 119 236 0 0 0
Tetracontane C40H82 562 122 242 0 0 0
Hentetracontane C41H84 576 125 248 0 0 0
Dotetracontane C42H86 590 128 254 0 0 0
Tritetracontane C43H88 604 131 260 0 0 0
Tetratetracontane C44H90 618 134 266 0 0 0
Pentatetracontane C45H92 632 137 272 0 0 0
Hexatetracontane C46H94 646 140 278 0 0 0
Heptatetracontane C47H96 660 143 284 0 0 0
Octatetracontane C48H98 674 146 290 0 0 0
Nonatetracontane C49H100 688 149 296 0 0 0
Pentacontane C50H102 702 152 302 0 0 0
Henpentacontane C51H104 716 155 308 0 0 0
Dopentacontane C52H106 730 158 314 0 0 0
Tripentacontane C53H108 744 161 320 0 0 0
Tetrapentacontane C54H110 758 164 326 0 0 0
Pentapentacontane C55H112 772 167 332 0 0 0
Hexapentacontane C56H114 786 170 338 0 0 0
Heptapentacontane C57H116 800 173 344 0 0 0
Octapentacontane C58H118 814 176 350 0 0 0
Nonapentacontane C59H120 828 179 356 0 0 0
Hexacontane C60H122 842 182 362 0 0 0

Notes. This table lists only some of the most significant features of the molecules; it does not give the full feature list used for training.
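Three of the columns above (Mass, #atoms, and Valence electrons) can in principle be derived from the molecular formula alone. The following is a minimal Python sketch of such a derivation, not the procedure used to build the table. It assumes nominal integer masses (mass number of the most abundant isotope), which appear to reproduce the Mass column, and main-group valence counts. The names `NOMINAL_MASS`, `VALENCE_ELECTRONS`, `count_atoms`, and `formula_features` are hypothetical, and prefixes such as "c–" must be stripped from a formula before parsing. Functional-group counts, H-bond acceptors and donors, and TPSA depend on molecular structure and cannot be obtained this way.

```python
import re

# Assumption: nominal (integer) masses and main-group valence electron counts
# for the elements that occur in Table B.1.
NOMINAL_MASS = {"H": 1, "C": 12, "N": 14, "O": 16, "Cl": 35}
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6, "Cl": 7}

# An element with an optional count, an opening parenthesis,
# or a closing parenthesis with an optional multiplier.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")


def count_atoms(formula):
    """Return {element: count} for a formula such as 'CH3OH' or '(CH2OH)2'."""
    stack = [{}]
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, count, opening, closing, multiplier = match.groups()
        if element:
            n = int(count) if count else 1
            stack[-1][element] = stack[-1].get(element, 0) + n
        elif opening:
            stack.append({})
        else:
            # Closing parenthesis: fold the finished group into its parent.
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {formula!r}")
            group = stack.pop()
            factor = int(multiplier) if multiplier else 1
            for el, n in group.items():
                stack[-1][el] = stack[-1].get(el, 0) + n * factor
        pos = match.end()
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {formula!r}")
    return stack[0]


def formula_features(formula):
    """Nominal mass (amu), atom count, and valence electron count."""
    atoms = count_atoms(formula)
    return {
        "mass": sum(NOMINAL_MASS[el] * n for el, n in atoms.items()),
        "n_atoms": sum(atoms.values()),
        "valence_electrons": sum(
            VALENCE_ELECTRONS[el] * n for el, n in atoms.items()
        ),
    }


print(formula_features("CH3OH"))
# {'mass': 32, 'n_atoms': 6, 'valence_electrons': 14}
```

For methanol this gives the same mass, atom count, and valence electron count as the corresponding table row. Some tabulated values may have been computed with a different convention, so any disagreement with this sketch should be checked against the original feature set.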
