Week 7 Worklog
Week 7 Objectives:
- Master AWS Analytics services: Kinesis, Glue, Athena, QuickSight
- Understand DynamoDB fundamentals and design patterns
- Learn data processing with AWS Glue, DataBrew, and EMR
- Build end-to-end data pipelines and analytics solutions
- Practice data visualization with QuickSight dashboards
Tasks to be carried out this week:
| Day | Task | Start Date | Completion Date | Reference Material |
|---|---|---|---|---|
| 1 | Study the AWS Analytics services overview (Kinesis for streaming data; Glue for ETL; Athena for querying; QuickSight for visualization). Learn DynamoDB basics | 2025/10/20 | 2025/10/20 | https://docs.aws.amazon.com/kinesis/ |
| 2 | Lab 35: Data streaming pipeline (create an S3 bucket and a Kinesis Firehose delivery stream; generate sample data; create a Glue crawler; query with Athena; visualize with QuickSight) | 2025/10/21 | 2025/10/21 | https://000035.awsstudygroup.com/ |
| 3 | Lab 39: DynamoDB hands-on (explore the DynamoDB console; create tables and items; configure backups; apply advanced design patterns; build serverless applications) | 2025/10/22 | 2025/10/22 | https://000039.awsstudygroup.com/ |
| 4 | Lab 40: DynamoDB cost optimization (prepare and build the database; analyze data and costs; configure tagging for cost allocation; monitor usage patterns) | 2025/10/23 | 2025/10/23 | https://000040.awsstudygroup.com/ |
| 5 | Lab 60: AWS CLI and SDK (use CloudShell; practice AWS Console operations; work with the AWS SDK). Lab 70: AWS Glue DataBrew (create a Cloud9 instance; upload a dataset to S3; profile and transform data) | 2025/10/24 | 2025/10/24 | https://000060.awsstudygroup.com/ https://000070.awsstudygroup.com/ |
| 6 | Lab 72: Complete data pipeline (ingest and store data; catalog with Glue; transform with Glue, DataBrew, and EMR; analyze with Athena and Kinesis Data Analytics; serve with Lambda; warehouse with Redshift) | 2025/10/25 | 2025/10/25 | https://000072.awsstudygroup.com/ |
| 7 | Lab 73: QuickSight dashboards (build a basic dashboard; add improvements; create interactive visualizations). Weekly review and cleanup | 2025/10/26 | 2025/10/26 | https://000073.awsstudygroup.com/ |
Week 7 Achievements:
AWS Analytics Services:
- Understood Kinesis Data Streams, Firehose, and Analytics
- Learned AWS Glue for serverless ETL
- Used Athena to run SQL queries directly on data stored in S3
- Explored QuickSight for business intelligence
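Kinesis Data Streams routes each record by its partition key: the key is MD5-hashed into a 128-bit integer, and the record lands on the shard whose hash key range contains that value. The Python sketch below illustrates that mapping. It assumes the hash key space is split evenly across shards (a simplification; resharding can leave ranges uneven), and the `sensor-*` keys are hypothetical.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 digests are 128 bits


def hash_key(partition_key: str) -> int:
    """Return the MD5 hash of a partition key as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the hash key space evenly (assumption: no resharding has happened)."""
    size = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the ranges cover the whole space.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


if __name__ == "__main__":
    ranges = shard_ranges(4)
    for key in ("sensor-1", "sensor-2", "sensor-3"):
        print(key, "-> shard", shard_for(key, ranges))
```

Because the mapping depends only on the key, one very busy partition key always lands on the same shard, which is the streaming counterpart of a DynamoDB hot partition.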
Data Streaming Pipeline (Lab 35):
- Created Kinesis Firehose delivery stream
- Generated and ingested sample streaming data
- Configured Glue Crawler to catalog data
- Queried data with Athena SQL
- Built QuickSight visualizations and dashboards
- Implemented data transformation in Firehose
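Firehose data transformation invokes a Lambda function with a batch of base64-encoded records and expects every `recordId` back with a `result` of `Ok`, `Dropped`, or `ProcessingFailed` plus base64-encoded `data`. The handler below is a minimal sketch of that contract. Normalizing each record to newline-delimited JSON is an assumed transformation chosen for illustration, not necessarily what Lab 35 does.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation handler: return one output entry per input recordId."""
    output = []
    for record in event["records"]:
        try:
            raw = base64.b64decode(record["data"]).decode("utf-8")
            payload = json.loads(raw)
        except ValueError:
            # Bad base64, bad UTF-8, and bad JSON all raise ValueError subclasses.
            # Firehose treats ProcessingFailed records as transformation failures
            # and, for S3 destinations, writes them to an error location.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        # Compact JSON plus a trailing newline gives newline-delimited JSON in S3,
        # which a Glue crawler and Athena can read one record per line.
        line = json.dumps(payload, separators=(",", ":")) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

The handler can be exercised locally by passing a dict shaped like `{"records": [{"recordId": "1", "data": "<base64 JSON>"}]}` and `None` as the context.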
DynamoDB Mastery (Labs 39, 40):
- Created DynamoDB tables with partition and sort keys
- Implemented global secondary indexes (GSIs) and local secondary indexes (LSIs) for flexible queries
- Configured on-demand and provisioned capacity modes
- Set up point-in-time recovery and backups
- Learned DynamoDB design patterns (single-table design)
- Built serverless applications with DynamoDB
- Implemented DynamoDB Streams for event-driven architecture
- Optimized costs with tagging and capacity planning
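Single-table design stores several entity types in one table and relies on key prefixes so that a single Query returns related items together. The sketch below uses a hypothetical customer/order model with `PK`/`SK` attribute names and an in-memory stand-in for the table, so it runs without AWS. Against a real table the same access pattern would be a Query with the key condition `PK = :pk AND begins_with(SK, :prefix)`.

```python
def customer_key(customer_id: str) -> dict[str, str]:
    """Profile item for a customer (hypothetical key format)."""
    return {"PK": f"CUSTOMER#{customer_id}", "SK": "PROFILE"}


def order_key(customer_id: str, order_date: str, order_id: str) -> dict[str, str]:
    """Order item stored under its customer's PK; ISO dates in SK sort chronologically."""
    return {"PK": f"CUSTOMER#{customer_id}", "SK": f"ORDER#{order_date}#{order_id}"}


class InMemoryTable:
    """Stand-in for a DynamoDB table keyed by (PK, SK); not a DynamoDB client."""

    def __init__(self) -> None:
        self.items: dict[tuple[str, str], dict] = {}

    def put_item(self, item: dict) -> None:
        # Like PutItem, writing the same primary key replaces the existing item.
        self.items[(item["PK"], item["SK"])] = item

    def query(self, pk: str, sk_prefix: str = "") -> list[dict]:
        """Mimic PK = :pk AND begins_with(SK, :prefix), in ascending SK order."""
        matches = [
            item
            for (p, s), item in self.items.items()
            if p == pk and s.startswith(sk_prefix)
        ]
        return sorted(matches, key=lambda item: item["SK"])
```

Putting the order date in the sort key returns a customer's orders in date order without an extra index; an access pattern that does not start from the customer (for example, all orders with a given status) would still need a GSI.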
Data Processing (Labs 60, 70, 72):
- Used AWS CloudShell for CLI operations
- Worked with AWS SDK for programmatic access
- Created Cloud9 development environment
- Uploaded and managed datasets in S3
- Profiled data with AWS Glue DataBrew
- Cleaned and transformed data with DataBrew recipes
- Processed data with AWS Glue interactive sessions
- Used Glue Studio GUI for visual ETL
- Ran big data processing with EMR
- Analyzed streaming data with Kinesis Data Analytics
- Served data via Lambda functions
- Loaded data into Redshift for warehousing
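For datasets stored in S3 and cataloged by Glue, a Hive-style `key=value` prefix layout lets the crawler register named partition columns, and Athena can then skip prefixes that a query's `WHERE` clause excludes. (Firehose's default `YYYY/MM/dd/HH` prefix has no `key=` names, so a crawler names those columns `partition_0`, `partition_1`, and so on.) The helpers below are a sketch; the bucket name and the year/month/day scheme are hypothetical.

```python
from datetime import datetime, timezone


def partition_prefix(base: str, ts: datetime) -> str:
    """Return a Hive-style prefix such as <base>/year=2025/month=10/day=21/."""
    if ts.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so partitions are unambiguous")
    ts = ts.astimezone(timezone.utc)  # partition on UTC, as Firehose does by default
    return f"{base.rstrip('/')}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


def partition_values(key: str) -> dict[str, str]:
    """Parse the key=value segments of an S3 object key into partition values."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(partition_prefix("s3://example-bucket/events", now))
    # An Athena filter such as WHERE year = '2025' AND month = '10' on these
    # columns lets the engine read only the matching prefixes.
```

Storing the files under these prefixes as Parquet combines partition pruning with columnar reads.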
Data Visualization (Lab 73):
- Connected QuickSight to multiple data sources
- Built interactive dashboards with filters and parameters
- Created various chart types (bar, line, pie, heat maps)
- Implemented drill-down capabilities
- Shared dashboards with stakeholders
- Scheduled dashboard refreshes
Challenges Encountered:
- Kinesis Firehose Buffer: Data did not appear in S3 immediately → Learned that Firehose delivers a batch only when the buffer size or the buffer interval threshold is reached, whichever comes first
- Glue Crawler Scheduling: Crawler did not detect new data → Configured a proper crawler schedule and S3 event triggers
- DynamoDB Hot Partition: Throttling on a high-traffic partition key → Redesigned the partition key for better distribution
- Athena Query Performance: Slow queries on large datasets → Implemented partitioning and a columnar format (Parquet)
- QuickSight Permissions: QuickSight could not access S3 data → Granted QuickSight the required IAM permissions on the S3 bucket
- DataBrew Recipe: Transformation did not work as expected → Tested the recipe on sample data before the full run
- EMR Cluster Costs: High costs from an idle cluster → Used transient clusters and Spot Instances
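One common way to redesign a hot DynamoDB partition key is write sharding: append a suffix so writes for one logical key spread across several partition key values. The sketch below uses a calculated suffix (a hash of a hypothetical item id) so a single item can still be read directly by recomputing its key. The `DATE#...` key format and `SHARD_COUNT` of 10 are assumptions, not the values used in the labs.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 10  # hypothetical; size it to the write throughput the key must absorb


def sharded_pk(base_key: str, item_id: str, shard_count: int = SHARD_COUNT) -> str:
    """Append a calculated suffix (hash of the item id mod shard_count) to a hot key."""
    digest = hashlib.md5(item_id.encode("utf-8")).hexdigest()
    return f"{base_key}#{int(digest, 16) % shard_count}"


def all_shard_keys(base_key: str, shard_count: int = SHARD_COUNT) -> list[str]:
    """Every suffixed key; reading the whole logical key means querying each one."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


if __name__ == "__main__":
    # Spread 1,000 hypothetical writes for one busy day across the suffixed keys.
    counts = Counter(sharded_pk("DATE#2025-10-23", f"order-{i}") for i in range(1000))
    for key in sorted(counts):
        print(key, counts[key])
```

The trade-off is on the read side: fetching everything for `DATE#2025-10-23` now takes one Query per key from `all_shard_keys`, with the results merged in the application.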