I had the task of storing a .CSV file in Amazon S3, with the prerequisite that the file first be converted to Apache Parquet format. I assume that one of the reasons for this conversion is cost reduction: Parquet is a compressed, columnar format, so the files take up less storage, and query services that bill by the amount of data scanned, such as Amazon Athena, can read only the columns a query actually needs. AWS Glue is well suited to this kind of conversion. I intended to describe and show my implementation here, but I realized that it closely follows several existing how-tos, so instead I list them below as my sources:
Convert CSV / JSON files to Apache Parquet using AWS Glue
Three AWS Glue ETL job types for converting data to Apache Parquet
Format Options for ETL Inputs and Outputs in AWS Glue
AWS Glue | CSV to Parquet transformation | Getting started
AWS: How to use AWS Glue ETL to convert CSV to Parquet – Tutorial
Hi! I am Bruno, Brazilian born and bred and a naturalized Swedish citizen. I am a former Oracle ACE and, on the academic side, a Computer Scientist with an MSc in Data Science and another MSc in Software Engineering. I have over ten years of experience working with companies such as IBM, Epico Tech, and Playtech across three countries (Brazil, Hungary, and Sweden), and I have joined projects remotely in many others. I am excited to share my interests in Databases, Cybersecurity, Cloud, Data Science, Data Engineering, Big Data, AI, Programming, Software Engineering, and data in general.