Published in Level Up Coding · Stop using plain PySpark UDFs: No one likes slow cars! Part II · Quick insights about how Pandas UDFs work · Sep 7
Published in Level Up Coding · Stop using plain PySpark UDFs: No one likes slow cars! · How complex logic can still be implemented using out-of-the-box Spark functions with lightning-fast performance. · Jul 18
Published in Level Up Coding · Delta Lake Liquid Clustering — A visual explanation · How to optimize lakehouse data storage layout with minimal effort. · Jan 28
Published in Level Up Coding · Building a lakehouse on Google Cloud sans Databricks · Combining Delta Lake, Iceberg and BigLake · Dec 17, 2023
Published in Level Up Coding · Navigating the Void: Unraveling the Mysteries and Pitfalls of the ‘Void’ Data Type in Apache Spark · Exploring how a void column appears in Spark DataFrames and what the implications could be. · Dec 5, 2023
Published in Level Up Coding · Setting up a PySpark local development environment for Dataproc Serverless · Smooth dependency management for local development and production jobs · Aug 29, 2023
Published in Level Up Coding · dbt tests vs Delta Live Tables expectations: a clickbait to Spark observable metrics · Comparing dbt and DLT test performance and correlating it to Spark observable metrics · Aug 5, 2023
Published in Level Up Coding · Delta Lake Universal Format — A First Look · Write as Delta Lake — Read as Iceberg · Jul 15, 2023
Published in Level Up Coding · Back to basics: Spark caching key ideas! · Foundational concepts about how Spark caches DataFrames vs RDDs · Mar 13, 2023
Delta Lake Z-Ordering from A to Z · Understand how to optimise Delta Lake tables for high-cardinality queries. · Sep 19, 2022