
dc.contributor.author: Chen, Olivia [es-ES]
dc.contributor.author: Chou, Kara [es-ES]
dc.contributor.author: Nagpal, Rashmi [es-ES]
dc.contributor.author: Palacios Hielscher, Rafael [es-ES]
dc.contributor.author: Gupta, Amar [es-ES]
dc.date.accessioned: 2026-06-12T06:49:28Z
dc.date.available: 2026-06-12T06:49:28Z
dc.date.issued: 2026-05-26 [es_ES]
dc.identifier.uri: http://hdl.handle.net/11531/110698
dc.description: Book chapters [es_ES]
dc.description.abstract: Machine learning models on tabular datasets often struggle to understand the context between features, which can limit their accuracy. We propose SemTab, a hybrid framework for generating semantic features that utilizes an open-source Large Language Model (LLM). We evaluated our framework on three benchmark datasets (Adult Income, German Credit, and Bank Marketing) and compared its performance against several off-the-shelf LLMs. The results show that SemTab achieved the highest accuracy across all the classification tasks. For instance, on the Bank Marketing dataset, SemTab achieved an accuracy of 80%, which is approximately a 20% improvement over the baseline models. This work highlights that a hybrid architecture is a practical approach for applying language models to structured tabular data, yielding accurate and interpretable results for various downstream tasks. [en-GB]
dc.format.mimetype: application/pdf [es_ES]
dc.language.iso: en-GB [es_ES]
dc.publisher: Massachusetts Institute of Technology; Institute of Electrical and Electronics Engineers (Cambridge, United States of America) [es_ES]
dc.rights: [es_ES]
dc.rights.uri: [es_ES]
dc.source: Book: Undergraduate Research Technology Conference - MIT URTC 2025, First page: 1-5, Last page: [es_ES]
dc.subject.other: Instituto de Investigación Tecnológica (IIT) [es_ES]
dc.title: SemTab: A Hybrid Framework for Semantic Feature Generation on Tabular Data [es_ES]
dc.type: info:eu-repo/semantics/bookPart [es_ES]
dc.description.version: info:eu-repo/semantics/publishedVersion [es_ES]
dc.rights.accessRights: info:eu-repo/semantics/restrictedAccess [es_ES]
dc.keywords: Tabular Data, Semantic Feature Generation, LLMs, Model Interpretability [en-GB]


This item appears in the following Collection(s)

  • Articles
    Published journal articles, book chapters, and conference contributions.
