Many people don’t know that the idea of learning from data is not new to the current decade: John Tukey pointed it out over 50 years ago in “The Future of Data Analysis” (Donoho, 2017). More and more companies run a Data Warehouse, a Data Lake, or both combined (a Data Lakehouse) capable of handling massive volumes of data, such as 900 TB, with data imported constantly and SQL-like queries and operators hitting it throughout the day. Why? Because identifying and visualizing patterns can give a business a competitive advantage (Bose, 2009). I have covered several data analytics examples in previous posts on this blog:
As for me, I have been working in this field for the last few years and completed a master’s program in Data Science in 2022, yet only now have I decided to pursue a certification on the topic: the AWS Certified Data Analytics – Specialty. To prepare for the exam, I covered every product in the AWS Data Analytics category, listed below:
- Amazon Athena
- Amazon CloudSearch
- Amazon EMR
- Amazon FinSpace
- Amazon Kinesis
- Amazon Kinesis Data Firehose
- Amazon Kinesis Data Analytics
- Amazon Kinesis Data Streams
- Amazon Kinesis Video Streams
- Amazon OpenSearch Service
- Amazon Redshift
- Amazon Redshift Serverless
- Amazon QuickSight
- AWS Data Exchange
- AWS Data Pipeline
- AWS Glue
- AWS Lake Formation
- Amazon Managed Streaming for Apache Kafka (Amazon MSK)
As a matter of fact, I can’t get enough of Amazon Athena; it is one of my favorite products so far.
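Since Athena runs standard SQL directly against data in S3, a minimal sketch of submitting an ad hoc query with boto3 might look like the following. The database name, table, and result bucket are hypothetical placeholders, not real resources from this post:

```python
# Hypothetical resource names -- replace with your own.
DATABASE = "sales_db"
OUTPUT_LOCATION = "s3://my-athena-results-bucket/queries/"

def build_top_products_query(limit: int = 10) -> str:
    """Build an ad hoc aggregation query over an S3-backed table."""
    return (
        "SELECT product_id, SUM(quantity) AS total_sold "
        "FROM orders "
        "GROUP BY product_id "
        "ORDER BY total_sold DESC "
        f"LIMIT {limit}"
    )

def run_query(sql: str) -> str:
    """Submit the query to Athena and return its execution id."""
    import boto3  # imported here so the query builder works without the SDK installed

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
    )
    return response["QueryExecutionId"]

if __name__ == "__main__":
    print(build_top_products_query())
```

Athena executes asynchronously: `start_query_execution` returns immediately with an id, and you poll `get_query_execution` (or read the CSV written to the output location) for results.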
Additionally, I enjoy reading books, so I acquired the AWS Certified Data Analytics Study Guide: Specialty (DAS-C01) Exam, 1st Edition, by Asif Abbasi. On top of that, I worked through some case studies based on AWS posts that I found helpful in my preparation:
- Integrating MongoDB’s Application Data Platform with Amazon Kinesis Data Firehose: https://aws.amazon.com/blogs/big-data/integrating-the-mongodb-cloud-with-amazon-kinesis-data-firehose/
- Difference between data lake and data warehouse: https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/ and https://aws.amazon.com/data-warehouse/
- Apache HBase: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase.html
- Installing Kibana and how it differs from Grafana (I use Grafana at work): https://aws.amazon.com/what-is/elk-stack/
- Using AWS Glue to prepare data and build a dataset: https://aws.amazon.com/glue/
- Apache Spark SQL on Amazon EMR: https://aws.amazon.com/emr/features/spark/
- Amazon Redshift database encryption: https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-db-encryption.html
- Build, Train, and Deploy a Machine Learning Model with Amazon SageMaker: https://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/
- Data Mart: https://aws.amazon.com/what-is/data-mart/
- Perform ad hoc queries using Amazon Athena: https://www.youtube.com/watch?v=Dmw7HOOmiJQ
- Write prepared data directly into JDBC-supported destinations using AWS Glue DataBrew: https://aws.amazon.com/blogs/big-data/write-prepared-data-directly-into-jdbc-supported-destinations-using-aws-glue-databrew/
- Apache Flink: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-flink.html
- Supported formats for Amazon S3 manifest files: https://docs.aws.amazon.com/quicksight/latest/user/supported-manifest-file-format.html
- Data ingestion methods: https://docs.aws.amazon.com/whitepapers/latest/building-data-lakes/data-ingestion-methods.html
- Using the Parquet format in AWS Glue: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-parquet-home.html
- Run a Spark SQL-based ETL pipeline with Amazon EMR on Amazon EKS: https://aws.amazon.com/blogs/big-data/run-a-spark-sql-based-etl-pipeline-with-amazon-emr-on-amazon-eks/
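Several of the resources above (Athena, AWS Glue, the Parquet format, data ingestion into a data lake) revolve around one idea: laying out objects in S3 so query engines can prune partitions instead of scanning everything. As a rough illustration, with a hypothetical bucket prefix and file name, a Hive-style partitioned key can be built like this:

```python
from datetime import date

def partition_key(prefix: str, d: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key (year=/month=/day=),
    the layout Athena and Glue crawlers recognize for partition pruning."""
    return f"{prefix}/year={d.year:04d}/month={d.month:02d}/day={d.day:02d}/{filename}"

# Hypothetical prefix and file name, for illustration only.
key = partition_key("datalake/orders", date(2022, 7, 5), "part-0000.parquet")
print(key)  # datalake/orders/year=2022/month=07/day=05/part-0000.parquet
```

With this layout, a query filtering on `year`, `month`, and `day` columns only reads the matching prefixes, which is a large part of why columnar formats like Parquet plus partitioning cut both cost and latency in Athena.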
After waiting for my results to be published, I received confirmation from Amazon that I passed the exam. I hope this post helps you, and have a fantastic learning journey!
References:
- Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4), 745–766.
- Bose, R. (2009). Advanced analytics: opportunities and challenges. Industrial Management & Data Systems.
Hi! I am Bruno, a Brazilian born and bred, and I am also a naturalized Swedish citizen. I am a former Oracle ACE and, to keep up with academic research, I am a Computer Scientist with an MSc in Data Science and another MSc in Software Engineering. I have over ten years of experience working with companies such as IBM, Epico Tech, and Playtech across three different countries (Brazil, Hungary, and Sweden), and I have joined projects remotely in many others. I am super excited to share my interests in Databases, Cybersecurity, Cloud, Data Science, Data Engineering, Big Data, AI, Programming, Software Engineering, and data in general.