Please use this identifier to cite or link to this item:
http://hdl.handle.net/11531/110742
| Title: | SemTab: A Hybrid Framework for Semantic Feature Generation on Tabular Data |
| Authors: | Chen, Olivia; Chou, Kara; Nagpal, Rashmi; Palacios Hielscher, Rafael; Gupta, Amar |
| Abstract: | Machine learning models trained on tabular datasets often struggle to capture the context between features, which can limit their accuracy. We propose SemTab, a hybrid framework for generating semantic features that uses an open-source Large Language Model (LLM). We evaluated the framework on three benchmark datasets (Adult Income, German Credit, and Bank Marketing) and compared its performance against several off-the-shelf LLMs. The results show that SemTab achieved the highest accuracy across all the classification tasks. For instance, on the Bank Marketing dataset, SemTab achieved an accuracy of 80%, an improvement of approximately 20% over the baseline models. This work highlights that a hybrid architecture is a practical approach for applying language models to structured tabular data, yielding accurate and interpretable results for various downstream tasks. |
| URI: | http://hdl.handle.net/11531/110742 |
| Appears in collections: | Documentos de Trabajo |
Files in this item:
| File | Size | Format |
|---|---|---|
| IIT-25-413C.pdf | 199.36 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.