Running a marathon is a monumental undertaking. Even though I had been preparing for 1.8 years, it wasn’t until a few weeks ago that I truly grasped the significance of this feat. The turning point came when I decided to run for a charity organization in Brazil, specifically Casa da Criança Paralítica de Campinas (https://www.ccp.org.br/web/). This organization provides vital medical, dental, and pedagogical support to children with physical and mental disabilities in Campinas, São Paulo State, Brazil. Moreover, during my preparations, I delved into the origin of the…
Category: Data Analytics (Hadoop/Streaming Data/Machine Learning/Data and information visualization/Big Data)
Exam Preparation – AWS Certified Machine Learning – Specialty
I have been studying not only for this Machine Learning certification but also for others in Data Engineering since I started my master’s studies in Data Science a few years ago. The reason is that I was already working with the AWS console and was overwhelmed by the changes in data management I saw as a database administrator. I still remember everyone talking about this Machine Learning thing, and I had no clue back then what it was. I saw many database administrators mistakenly changing their role…
Transforming a .CSV file into Apache Parquet format via AWS Glue
I had the task of storing a .CSV file in Amazon S3, but the prerequisite was converting it to Apache Parquet format first. I assume that one of the reasons for this conversion is to reduce costs, and AWS Glue is well suited to the task. I thought of describing and showing my implementation here, but I realized that I had followed many existing how-tos, whose links are shared below as sources:…
Exam Preparation – Google Cloud Certified – Professional Data Engineer
The Google Professional Data Engineer exam is one of the most exciting and interesting exams I have aimed to pass. For most of my career, I worked mainly with database management, until I realized something had changed: relational databases (RDBMS) were no longer the inevitable choice for data management. A Data Engineer is expected to work not only with databases but also with streaming technologies, microservices, third-party APIs, data pipelines, big data technologies, etc. Back then, I had no idea what was happening. So what does one do when one does not know…
Exam Preparation – Google Cloud Certified – Professional Cloud Database Engineer
I have aimed to take a Google certification since starting my career. However, I only had the opportunity to work with Google’s products in the last 3 years, when I began working with GCP. Therefore, I started my preparation in the area that is my domain: databases. I also wanted to go deeper into a product that, like Athena on AWS and Oracle among RDBMSs, has become one of my favorite products to work with: BigQuery, Google’s fully managed, serverless data warehouse. To my…
Exam Preparation – AWS Certified Data Analytics – Specialty
Many people don’t know that the idea of learning from data is not something from the current decade. It was pointed out over 50 years ago by John Tukey in “The Future of Data Analysis” (Donoho, 2017). More and more companies have a Data Warehouse (or a Data Lake, or both combined as a Data Lakehouse) that can handle massive amounts of data, such as 900 TB, with data imported constantly and SQL-like queries and operators accessing it throughout the day. What is the reason behind this? Identifying and visualizing patterns…
Analyzing my personal running history data of 2022
2022 was an odd year. I started running in 2021, but only in 2022 did I begin to compete seriously as an amateur. I competed in 12 official races, of which 9 were 10 km races and 3 were half-marathons (21 km each). However, using gadgets that helped me record my results, I ended up with 98 unique records, covering all the official races and all the training. Bearing this in mind, what can we do with all this saved data? The answer is simple: analyzing and visualizing…
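As a quick sanity check on the figures quoted above, here is a minimal Python sketch that tallies them. The variable names are illustrative, the distances are the nominal ones from the text, and it assumes each official race corresponds to exactly one of the 98 records.

```python
# Race counts and nominal distances (km) as quoted in the excerpt.
races = {"10 km": (9, 10), "half-marathon": (3, 21)}
total_records = 98  # official races plus training sessions

# Number of official races and the nominal distance covered in them.
race_count = sum(count for count, _ in races.values())
race_km = sum(count * km for count, km in races.values())

# Assumption: one record per race, so the remainder are training sessions.
training_records = total_records - race_count

print(f"official races: {race_count}")
print(f"nominal race distance: {race_km} km")
print(f"training records: {training_records}")
```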
Export Data From Nike Run Club in GPX File Type for Data Analytics
I came up with the idea of creating a dataset to analyze my runs from 2021 to 2022. As I was using Nike Run Club, I tried to extract the data from the application. Unfortunately, I only managed to do it using third-party software. In this case, I used the RunGAP Workout Data Manager for iOS to get the complete training history from the Nike Run Club app and upload it to the Dropbox file storage and sharing service. From Dropbox, I successfully downloaded…
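Once a GPX file has been exported, it can be read with Python's standard library alone. The sketch below is illustrative rather than the approach used in the post: it assumes the file follows the GPX 1.1 schema, uses a hypothetical two-point track in place of a real export, and the function names are made up for the example.

```python
import math
import xml.etree.ElementTree as ET

# GPX 1.1 namespace (assumption: an export may declare a different version).
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Hypothetical two-point track standing in for a real exported file.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="0.0" lon="0.0"><time>2022-01-01T08:00:00Z</time></trkpt>
    <trkpt lat="0.0" lon="0.01"><time>2022-01-01T08:06:00Z</time></trkpt>
  </trkseg></trk>
</gpx>"""


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def track_distance_km(gpx_text):
    """Sum the distances between consecutive track points in a GPX document."""
    root = ET.fromstring(gpx_text)
    points = [
        (float(pt.get("lat")), float(pt.get("lon")))
        for pt in root.findall(".//gpx:trkpt", GPX_NS)
    ]
    return sum(haversine_km(*a, *b) for a, b in zip(points, points[1:]))


print(f"{track_distance_km(SAMPLE_GPX):.3f} km")
```

For a real export, the same function could be given the contents of the downloaded file, for example via `open(path, encoding="utf-8").read()`.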
Error Forbidden: 403 Forbidden 453 while building a dataset from Twitter using Python tweepy
I am developing an application that collects information from Twitter to provide some analysis through big data analytics. The first step was creating a dataset with all the information collected from the platform, then coding (I chose the Python programming language), and then applying the machine learning techniques that I thought would be helpful in my project. However, while working on the first step to collect the data, I received a 403 Forbidden error from the Twitter platform. To solve this problem, after some googling, I discovered that there is a…