This project seeks to demonstrate that data of sufficient chemical diversity are available to construct a multi-class energetic materials toxicity database (EMTD) that can then be evaluated using machine learning (ML)/artificial intelligence (AI) analyses. When analyzed with ML, the EMTD will reveal novel insights for predicting toxicological outcomes or physicochemical properties of new energetic compounds. ML allows complex, non-linear analysis and can overcome many limitations of traditional quantitative structure-activity relationship methods. Regardless of their inherent sophistication, however, AI/ML approaches are only as good as the quality of the underlying data set, and a sufficiently robust EMTD does not yet exist. This project has three interrelated goals:
(i) Consolidate existing data and available knowledge on energetic compounds into a single EMTD with a clearly defined set of quantitative and qualitative descriptors for (1) physicochemical properties; (2) toxicity models (molecular through higher-organism endpoints); and (3) toxicity values.
(ii) Significantly expand the EMTD of (i) by identifying and including available toxicity data on structural/functional analogs of known energetics and related compounds.
(iii) Iteratively test the growing EMTD of (i) and (ii) with selected ML approaches to monitor its predictive capability, which will directly reflect its underlying depth and quality.
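The three descriptor groups named in goal (i) can be pictured as one record per compound. The following is a minimal sketch only, not the project's actual schema: every class name, field name, and value below is a hypothetical placeholder introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToxicityRecord:
    """One toxicity observation. All field names are hypothetical."""
    model: str                      # test system, from molecular assay to higher organism
    endpoint: str                   # what was measured in that test system
    value: Optional[float] = None   # quantitative result, if one was reported
    unit: Optional[str] = None      # unit of the quantitative result
    category: Optional[str] = None  # qualitative result, if one was reported
    source: str = ""                # database or publication the record came from


@dataclass
class CompoundEntry:
    """One compound with its physicochemical and toxicity descriptors."""
    name: str
    is_energetic: bool              # False for structural/functional analogs (goal ii)
    physchem: Dict[str, float] = field(default_factory=dict)
    toxicity: List[ToxicityRecord] = field(default_factory=list)

    def endpoints(self) -> List[str]:
        """Return the distinct endpoints recorded for this compound, sorted."""
        return sorted({record.endpoint for record in self.toxicity})


# Placeholder entry; the compound and all numbers are made up for illustration.
entry = CompoundEntry(name="placeholder-analog", is_energetic=False)
entry.physchem["descriptor_a"] = 1.0
entry.toxicity.append(ToxicityRecord(model="placeholder organism",
                                     endpoint="endpoint_b", value=10.0, unit="mg/kg"))
entry.toxicity.append(ToxicityRecord(model="placeholder assay",
                                     endpoint="endpoint_a", category="positive"))
print(entry.endpoints())
```

Keeping quantitative and qualitative results in the same record type is one possible design choice; it lets a single entry carry both kinds of descriptor without forcing every observation into a numeric value.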
For goal i, the project team will begin by compiling all existing, accessible energetic materials toxicity data into a single database. These data will be collected from the Army Public Health Center and other Department of Defense agencies, along with easily accessible literature data. Once this initial EMTD is compiled, it will be tested by ML to provide a baseline assessment of predictive capacity. For goal ii, the project team will build on and significantly expand the initial EMTD through manual and computer-driven searches of academic, environmental, government, and commercial databases, along with deep searching of the published literature for data on structural and functional analogs of known and predicted energetic compounds. These data will be curated and continuously added to the EMTD. To accomplish goal iii, the project team will continuously test both the initial and growing EMTD using select ML approaches to confirm its predictive capabilities against a specifically set-aside test/query compound set with known properties. This will allow the project team to evaluate whether the EMTD has attained sufficient depth and diversity of content to enable high-quality ML analyses that can predict both toxicity and physicochemical properties of new materials. Unique to the approach is the use of subject matter experts at each critical point to help define which compounds are included, to curate the toxicity data entered into the EMTD, and to perform the ML analyses.
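The iterative testing described for goal iii amounts to scoring a fixed, set-aside compound set against successively larger snapshots of the database. The sketch below illustrates that loop under loud assumptions: the data are synthetic placeholders, the descriptors and target are made up, and a simple k-nearest-neighbor predictor stands in for whichever ML approaches the project team selects. All function names are hypothetical.

```python
import random


def knn_predict(train, x, k=3):
    """Predict the mean target of the k training rows nearest to x."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_abs_error(train, test, k=3):
    """Average absolute prediction error over the set-aside test compounds."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in test) / len(test)


def learning_curve(pool, test, sizes, k=3):
    """Score the fixed test set against successively larger database snapshots."""
    return [(n, mean_abs_error(pool[:n], test, k)) for n in sizes]


def synthetic_rows(n, rng):
    """Placeholder data: two made-up descriptors and a made-up non-linear target."""
    rows = []
    for _ in range(n):
        x = (rng.uniform(0, 1), rng.uniform(0, 1))
        y = x[0] ** 2 + 0.5 * x[1] + rng.gauss(0, 0.05)
        rows.append((x, y))
    return rows


rng = random.Random(0)
test_set = synthetic_rows(50, rng)   # set aside once and never added to the pool
pool = synthetic_rows(400, rng)      # stands in for the growing database
curve = learning_curve(pool, test_set, sizes=[25, 100, 400])
for n, mae in curve:
    print(f"{n:4d} entries -> MAE {mae:.3f}")
```

Because the test set never changes, any change in error between snapshots can be attributed to the added database content rather than to a different evaluation target.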
This research will establish an EMTD to facilitate ML analyses and ultimately provide the sought-after in silico predictive capabilities. Once established, the EMTD will be continuously tested by ML approaches to help define its predictive capability and how that capability improves as the database is expanded. The power of the ML approach will be significantly augmented by expanding EMTD content with data from many structurally similar compounds that are not necessarily energetics themselves. The approach is enabled by drawing from the entire body of available toxicity data. Moreover, ML-based analytical approaches can incorporate multiple types of toxicity data along with categorical and quantitative data attributes. The EMTD is not static and can be continuously expanded and augmented even after this project ends. The project team intends to make the EMTD public and to share it and collaborate with others funded to pursue the same research goals.
DISTRIBUTION A. Approved for public release: distribution unlimited.