SemTab: A Hybrid Framework for Semantic Feature Generation on Tabular Data
Date
2026-05-26
Status
info:eu-repo/semantics/publishedVersion
Abstract
Machine learning models on tabular datasets often struggle to capture the context between features, which can limit their accuracy. We propose SemTab, a hybrid framework for generating semantic features that uses an open-source Large Language Model (LLM). We evaluated the framework on three benchmark datasets (Adult Income, German Credit, and Bank Marketing) and compared its performance against several off-the-shelf LLMs. The results show that SemTab achieved the highest accuracy across all the classification tasks. For instance, on the Bank Marketing dataset, SemTab achieved an accuracy of 80%, an improvement of approximately 20% over the baseline models. This work highlights that a hybrid architecture is a practical approach for applying language models to structured tabular data, yielding accurate and interpretable results for various downstream tasks.
Activity Type
Book chapters
Subjects / categories / SDGs
Instituto de Investigación Tecnológica (IIT)
Keywords
Tabular Data, Semantic Feature Generation, LLMs, Model Interpretability

