SemTab: A Hybrid Framework for Semantic Feature Generation on Tabular Data
Date
2026-05-26
Status
info:eu-repo/semantics/publishedVersion
Abstract
Machine learning models trained on tabular datasets often fail to capture the contextual relationships between features, which can limit their accuracy. We propose SemTab, a hybrid framework for generating semantic features that uses an open-source Large Language Model (LLM). We evaluated the framework on three benchmark datasets: Adult Income, German Credit, and Bank Marketing, and compared its performance against several off-the-shelf LLMs. The results show that SemTab achieved the highest accuracy across all the classification tasks. For instance, on the Bank Marketing dataset, SemTab achieved an accuracy of 80%, an improvement of approximately 20% over the baseline models. This work shows that a hybrid architecture is a practical approach for applying language models to structured tabular data, yielding accurate and interpretable results for various downstream tasks.
Activity Type
Book chapters
Subjects / categories / SDGs
Instituto de Investigación Tecnológica (IIT)
Keywords
Tabular Data, Semantic Feature Generation, LLMs, Model Interpretability

