This work is in submission to the National High School Journal of Science.
Eric Su Zhang1, Benjamin Joseph Michael Standefer1, Stewart Mayer2
1. St. Mark’s School of Texas, Dallas, Texas
2. Department of Science, St. Mark’s School of Texas, Dallas, Texas
Automated Machine Learning (AutoML) has emerged as a popular field of research. We present a literature review of existing AutoML papers and survey five popular AutoML frameworks. Typically, these frameworks perform model selection by running and assessing every candidate model, a process that is both time-intensive and computationally expensive. In response, we propose a novel framework, Smart Model Elimination Machine Learning (SMEML), that strategically eliminates models that are unlikely to yield high accuracy. We trained a multi-label XGBoost regression model on attributes extracted from 268 datasets. Given the attributes of a new dataset, our framework ranks the candidate models by how well it expects them to perform and eliminates the lower-ranked ones. SMEML achieves accuracy comparable to a traditional brute-force approach while significantly reducing the time required: compared to the brute-force approach, SMEML is 135.2% faster on average. We believe this innovation is particularly beneficial to machine learning in healthcare, where efficient and accurate disease prediction is crucial.
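The rank-then-eliminate idea described above can be illustrated with a minimal sketch. It is not the SMEML implementation: the three attributes, the function names, the candidate model names, and the scores returned by `toy_meta_model` are all assumptions made for illustration, and `toy_meta_model` is a hand-written stand-in for the trained XGBoost meta-model.

```python
from typing import Callable, Dict, List, Sequence


def extract_attributes(rows: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical dataset attributes: row count, column count, overall mean."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    total = sum(sum(row) for row in rows)
    mean = total / (n_rows * n_cols) if n_rows and n_cols else 0.0
    return [float(n_rows), float(n_cols), mean]


def toy_meta_model(attributes: List[float]) -> Dict[str, float]:
    """Stand-in for a trained meta-model: one predicted accuracy per candidate.

    The numbers are invented for illustration; a real meta-model would be
    fitted on attributes extracted from many datasets.
    """
    _, n_cols, _ = attributes
    return {
        "random_forest": 0.80,
        "logistic_regression": 0.75 if n_cols <= 10 else 0.60,
        "knn": 0.65,
    }


def rank_models(
    attributes: List[float],
    meta_model: Callable[[List[float]], Dict[str, float]],
) -> List[str]:
    """Order candidate models from highest to lowest predicted accuracy."""
    scores = meta_model(attributes)
    return sorted(scores, key=scores.get, reverse=True)


def eliminate(ranking: List[str], keep: int) -> List[str]:
    """Keep only the top `keep` models; the rest are never trained."""
    return ranking[:keep]


rows = [[1.0, 2.0], [3.0, 4.0]]
attributes = extract_attributes(rows)
ranking = rank_models(attributes, toy_meta_model)
shortlist = eliminate(ranking, keep=2)
```

Only the models in `shortlist` would then be trained and evaluated on the actual dataset, which is where the time saving over a brute-force search would come from. The cutoff `keep` is a free parameter in this sketch.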