Yousry Mohamed – Medium

Yousry Mohamed

Published in
Level Up Coding

Resurrecting Scala in Spark: Another tool in your toolbox when Python and Pandas suffer

The Spark Dataset API is still useful for handling edge cases that require extra flexibility while still running super fast.

Jan 3

Published in
Level Up Coding

Stop using plain PySpark UDFs: No one likes slow cars! Part II

Quick insights about how Pandas UDFs work

Sep 7, 2024

Published in
Level Up Coding

Stop using plain PySpark UDFs: No one likes slow cars!

How complex logic can still be implemented using out-of-the-box Spark functions with lightning-fast performance.

Jul 18, 2024

Published in
Level Up Coding

Delta Lake Liquid Clustering — A visual explanation

How to optimize lakehouse data storage layout with minimal effort.

Jan 28, 2024

Published in
Level Up Coding

Building a lakehouse on Google Cloud sans Databricks

Combining Delta Lake, Iceberg and BigLake

Dec 17, 2023

Published in
Level Up Coding

Navigating the Void: Unraveling the Mysteries and Pitfalls of the ‘Void’ Data Type in Apache Spark

Exploring how a void column appears in Spark DataFrames and what the implications could be.

Dec 5, 2023

Published in
Level Up Coding

Setting up a PySpark local development environment for Dataproc serverless

Smooth dependency management for local development and production jobs

Aug 29, 2023

Published in
Level Up Coding

dbt tests vs Delta Live Tables expectations: a click bait to Spark observable metrics

Comparing dbt and DLT test performance and correlating it to Spark observable metrics

Aug 5, 2023

Published in
Level Up Coding

Delta Lake Universal Format — A First Look

Write as delta lake — Read as iceberg

Jul 15, 2023

Published in
Level Up Coding

Back to basics: Spark caching key ideas!

Foundational concepts about how Spark caches DataFrames vs RDDs

Mar 13, 2023

Yousry Mohamed

Yousry is a principal data engineer working for Mantel Group. He is very passionate about all things data, including Big Data, Machine Learning and AI.