References

European Union Agency for Fundamental Rights (2019). Data quality and artificial intelligence: mitigating bias and error to protect fundamental rights. Publications Office. https://data.europa.eu/doi/10.2811/546219
Austermühl, F. (2001). Electronic tools for translators. Routledge.
Bonet-Jover, A., Sepúlveda-Torres, R., Saquete, E., Martínez-Barco, P., and Nieto-Pérez, M. (2024). RUN-AS: a novel approach to annotate news reliability for disinformation detection. Language Resources and Evaluation, 58(2), 609-639.
Botella-Gil, B., Espinosa-Zaragoza, I., Moreda, P., and Palomar, M. (2024). GPLSI: Corpus ClearSim.
Botella-Gil, B., Sepúlveda-Torres, R., Bonet-Jover, A., Martínez-Barco, P., and Saquete, E. (2024). Semi-automatic dataset annotation applied to automatic violent message detection. IEEE Access, 12, 19651-19664.
Cooke, A. (2001). A guide to finding quality information on the Internet: selection and evaluation strategies (2nd ed.). Library Association.
Creswell, J. W., and Plano Clark, V. L. (2018). Designing and conducting mixed methods research (3rd ed.). SAGE.
Jiménez Piano, M., and Ortiz-Repiso Jiménez, V. (2007). Evaluación y calidad de sedes web. Ediciones Trea.
Li, J., Fang, A., Smyrnis, G., Ivgi, M., Jordan, M., Gadre, S. Y., Bansal, H., Guha, E., Keh, S., Arora, K., Garg, S., Xin, R., Muennighoff, N., Heckel, R., Mercat, J., Chen, M., Gururangan, S., Wortsman, M., Albalak, A., Bitton, Y., Nezhurina, M., Abbas, A., Hsieh, C.-Y., Ghosh, D., Gardner, J., Kilian, M., Zhang, H., Shao, R., Pratt, S., Sanyal, S., Ilharco, G., Daras, G., Marathe, K., Gokaslan, A., Zhang, J., Chandu, K., Nguyen, T., Vasiljevic, I., Kakade, S., Song, S., Sanghavi, S., Faghri, F., Oh, S., Zettlemoyer, L., Lo, K., El-Nouby, A., Pouransari, H., Toshev, A., Wang, S., Groeneveld, D., Soldaini, L., Koh, P. W., Jitsev, J., Kollar, T., Dimakis, A. G., Carmon, Y., Dave, A., Schmidt, L., and Shankar, V. (2024). DataComp-LM: In search of the next generation of training sets for language models. Advances in Neural Information Processing Systems, 37, 14200-14282.
Miró-Maestre, M., Estevanell-Valladares, E. L., Sepúlveda-Torres, R., and Suárez-Cueto, A. (2025). Enhancing pragmatic processing: A two-dimension approach to detecting intentions in Spanish. Procesamiento del Lenguaje Natural, 74, 263-276.
Miró-Maestre, M., Martínez-Murillo, I., Lloret, E., Moreda, P., and Suárez-Cueto, A. (2024). COCOTEROS: A Spanish corpus with contextual knowledge for natural language generation. In 40th Annual Conference of the Spanish Association for Natural Language Processing.
Penedo, G., Kydlíček, H., Lozhkov, A., Mitchell, M., Raffel, C. A., Von Werra, L., and Wolf, T. (2024a). The FineWeb datasets: Decanting the web for the finest text data at scale. Advances in Neural Information Processing Systems, 37, 30811-30849.
Penedo, G., Kydlíček, H., Sabolčec, V., Messmer, B., Foroutan, N., Jaggi, M., von Werra, L., and Wolf, T. (2024b). FineWeb2: A sparkling update with 1000s of languages. HuggingFace. https://huggingface.co/datasets/HuggingFaceFW/fineweb-2
Penedo, G., Malartic, Q., Hesslow, D., Cojocaru, R., Cappelli, A., Alobeidli, H., Pannier, B., Almazrouei, E., and Launay, J. (2023). The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only. arXiv preprint arXiv:2306.01116.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1-67.
Soboleva, D., Al-Khateeb, F., Myers, R., Steeves, J. R., Hestness, J., and Dey, N. (2023). SlimPajama: A 627B token cleaned and deduplicated version of RedPajama. https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama/
Together Computer (2023). RedPajama: An open-source reproduction of LLaMA training dataset. https://www.together.xyz/blog/redpajama
Zhou, C., Liu, P., Xu, P., Iyer, S., Sun, J., Mao, Y., Ma, X., Efrat, A., Yu, P., Yu, L., Zhang, S., Ghosh, G., Lewis, M., Zettlemoyer, L., and Levy, O. (2023). LIMA: Less is more for alignment (arXiv:2305.11206). arXiv. https://arxiv.org/abs/2305.11206