Yousry Mohamed in Level Up Coding · Delta Lake Liquid Clustering — A visual explanation · How to optimize lakehouse data storage layout with minimal effort. · 10 min read · Jan 28, 2024
Yousry Mohamed in Level Up Coding · Building a lakehouse on Google Cloud sans Databricks · Combining Delta Lake, Iceberg and BigLake · 16 min read · Dec 17, 2023
Yousry Mohamed in Level Up Coding · Navigating the Void: Unraveling the Mysteries and Pitfalls of the ‘Void’ Data Type in Apache Spark · Exploring how a void column appears in Spark DataFrames and what the implications could be. · 8 min read · Dec 5, 2023
Yousry Mohamed in Level Up Coding · Setting up a PySpark local development environment for Dataproc serverless · Smooth dependency management for local development and production jobs · 11 min read · Aug 29, 2023
Yousry Mohamed in Level Up Coding · dbt tests vs Delta Live Tables expectations : a click bait to Spark observable metrics · Comparing dbt and DLT test performance and correlating it to Spark observable metrics · 7 min read · Aug 5, 2023
Yousry Mohamed in Level Up Coding · Delta Lake Universal Format — A First Look · Write as delta lake — Read as iceberg · 8 min read · Jul 15, 2023
Yousry Mohamed in Level Up Coding · Back to basics : Spark caching key ideas! · Foundational concepts about how Spark caches DataFrames vs RDDs · 6 min read · Mar 13, 2023
Yousry Mohamed · Delta lake Z-Ordering from A to Z · Understand how to optimise delta lake tables for high cardinality queries. · 15 min read · Sep 19, 2022
Yousry Mohamed in Towards Data Science · Idempotent writes to delta lake tables · A walkthrough using open source delta lake · 10 min read · Jul 8, 2022
Yousry Mohamed in Analytics Vidhya · Spark Session and the singleton misconception! · What structured streaming reveals about Spark Session · 7 min read · Mar 30, 2022