Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/110698
| Title : | SemTab: A Hybrid Framework for Semantic Feature Generation on Tabular Data |
| Author : | Chen, Olivia; Chou, Kara; Nagpal, Rashmi; Palacios Hielscher, Rafael; Gupta, Amar |
| Publication date : | 26-May-2026 |
| Publisher : | Massachusetts Institute of Technology; Institute of Electrical and Electronics Engineers (Cambridge, United States of America) |
| Abstract : | Machine learning models on tabular datasets often struggle to understand the context between features, which can limit their accuracy. We propose SemTab, a hybrid framework for generating semantic features that utilizes an open-source Large Language Model (LLM). We evaluated our framework on three benchmark datasets (Adult Income, German Credit, and Bank Marketing) and compared its performance against several off-the-shelf LLMs. The results show that SemTab achieved the highest accuracy across all the classification tasks. For instance, on the Bank Marketing dataset, SemTab achieved an accuracy of 80%, which is approximately a 20% improvement over the baseline models. This work highlights that a hybrid architecture is a practical approach for applying language models to structured tabular data, yielding accurate and interpretable results for various downstream tasks. |
| Description : | Book chapters |
| URI : | http://hdl.handle.net/11531/110698 |
| Appears in collections: | Articles |
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| IIT-25-413C.pdf | 199.36 kB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.