Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/110698
Title : SemTab: A Hybrid Framework for Semantic Feature Generation on Tabular Data
Author : Chen, Olivia
Chou, Kara
Nagpal, Rashmi
Palacios Hielscher, Rafael
Gupta, Amar
Publication date : 26-May-2026
Publisher : Massachusetts Institute of Technology; Institute of Electrical and Electronics Engineers (Cambridge, United States of America)
Abstract : Machine learning models trained on tabular datasets often struggle to capture the contextual relationships between features, which can limit their accuracy. We propose SemTab, a hybrid framework for generating semantic features that uses an open-source Large Language Model (LLM). We evaluated the framework on three benchmark datasets (Adult Income, German Credit, and Bank Marketing) and compared its performance against several off-the-shelf LLMs. The results show that SemTab achieved the highest accuracy across all the classification tasks. For instance, on the Bank Marketing dataset, SemTab achieved an accuracy of 80%, an improvement of approximately 20% over the baseline models. This work highlights that a hybrid architecture is a practical approach for applying language models to structured tabular data, yielding accurate and interpretable results for various downstream tasks.
Description : Book chapters
URI : http://hdl.handle.net/11531/110698
Appears in collections: Articles

Files in this item:
File               Size        Format
IIT-25-413C.pdf    199.36 kB   Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.