
SemTab: A Hybrid Framework for Semantic Feature Generation on Tabular Data

View/Open
IIT-25-413C.pdf (199.3 KB)
Date
2026-05-26
Authors
Chen, Olivia
Chou, Kara
Nagpal, Rashmi
Palacios Hielscher, Rafael
Gupta, Amar
Status
info:eu-repo/semantics/publishedVersion

Abstract
Machine learning models trained on tabular datasets often struggle to capture the contextual relationships between features, which can limit their accuracy. We propose SemTab, a hybrid framework for semantic feature generation that uses an open-source Large Language Model (LLM). We evaluated the framework on three benchmark datasets (Adult Income, German Credit, and Bank Marketing) and compared its performance against several off-the-shelf LLMs. SemTab achieved the highest accuracy on all of the classification tasks. For instance, on the Bank Marketing dataset, SemTab reached an accuracy of 80%, an improvement of approximately 20% over the baseline models. This work highlights that a hybrid architecture is a practical approach for applying language models to structured tabular data, yielding accurate and interpretable results for various downstream tasks.
 
URI
http://hdl.handle.net/11531/110698
Activity Type
Book chapters
Subjects / Categories / SDGs
Instituto de Investigación Tecnológica (IIT)
Keywords
Tabular Data, Semantic Feature Generation, LLMs, Model Interpretability
Collections
  • Articles

Universidad Pontificia Comillas Repository, copyright © 2015. Powered by DSpace Software