| dc.contributor.author | Chen, Olivia | es-ES |
| dc.contributor.author | Chou, Kara | es-ES |
| dc.contributor.author | Nagpal, Rashmi | es-ES |
| dc.contributor.author | Palacios Hielscher, Rafael | es-ES |
| dc.contributor.author | Gupta, Amar | es-ES |
| dc.date.accessioned | 2026-06-12T06:49:28Z | |
| dc.date.available | 2026-06-12T06:49:28Z | |
| dc.date.issued | 2026-05-26 | es_ES |
| dc.identifier.uri | http://hdl.handle.net/11531/110698 | |
| dc.description | Book chapters | es_ES |
| dc.description.abstract | Machine learning models on tabular datasets often struggle to understand the context between features, which can limit their accuracy. We propose SemTab, a hybrid framework for generating semantic features that utilizes an open-source Large Language Model (LLM). We evaluated our framework using three benchmark datasets: Adult Income, German Credit, and Bank Marketing. We compared its performance against several off-the-shelf LLMs. The results show that SemTab achieved the highest accuracy across all the classification tasks. For instance, on the Bank Marketing dataset, SemTab achieved an accuracy of 80%, which is approximately a 20% improvement over the baseline models. This work highlights that a hybrid architecture is a practical approach for applying language models to structured tabular data, yielding accurate and interpretable results for various downstream tasks. | es-ES |
| dc.description.abstract | Machine learning models on tabular datasets often struggle to understand the context between features, which can limit their accuracy. We propose SemTab, a hybrid framework for generating semantic features that utilizes an open-source Large Language Model (LLM). We evaluated our framework using three benchmark datasets: Adult Income, German Credit, and Bank Marketing. We compared its performance against several off-the-shelf LLMs. The results show that SemTab achieved the highest accuracy across all the classification tasks. For instance, on the Bank Marketing dataset, SemTab achieved an accuracy of 80%, which is approximately a 20% improvement over the baseline models. This work highlights that a hybrid architecture is a practical approach for applying language models to structured tabular data, yielding accurate and interpretable results for various downstream tasks. | en-GB |
| dc.format.mimetype | application/pdf | es_ES |
| dc.language.iso | en-GB | es_ES |
| dc.publisher | Massachusetts Institute of Technology; Institute of Electrical and Electronics Engineers (Cambridge, United States of America) | es_ES |
| dc.rights | | es_ES |
| dc.rights.uri | | es_ES |
| dc.source | Book: Undergraduate Research Technology Conference - MIT URTC 2025, First page: 1-5, Last page: | es_ES |
| dc.subject.other | Instituto de Investigación Tecnológica (IIT) | es_ES |
| dc.title | SemTab: A Hybrid Framework for Semantic Feature Generation on Tabular Data | es_ES |
| dc.type | info:eu-repo/semantics/bookPart | es_ES |
| dc.description.version | info:eu-repo/semantics/publishedVersion | es_ES |
| dc.rights.accessRights | info:eu-repo/semantics/restrictedAccess | es_ES |
| dc.keywords | Tabular Data, Semantic Feature Generation, LLMs, Model Interpretability | es-ES |
| dc.keywords | Tabular Data, Semantic Feature Generation, LLMs, Model Interpretability | en-GB |