Dinesh Kumar Sai M





I am a Data Engineering professional with experience in Cloud Data Warehouse technologies: I build ETL pipelines, perform data and dimensional modelling, and work on Business Intelligence, Data Quality, analysis, and validation frameworks, which I document, analyze, design, implement, and test across various projects. My experience includes Azure Synapse Analytics, Azure Data Factory, Microsoft SQL Server and T-SQL, data lakes on Azure Data Lake Storage, transformations with PySpark, Azure Databricks, Snowflake Data Warehouse, DBT, and Python. I am also a Microsoft and Databricks Certified Data Engineer Associate.


Experience

Senior Software Engineer - Data Engineering and Data Warehousing

Tiger Analytics

  1. Cloud Data Warehousing with Multiple Data Sources project for a fintech client
    • Promoted within the same project and continued to manage the data warehouse and ETL pipelines in Synapse Analytics
    • Created stored procedures for the aggregated fact table loads and SCD-supported loads, and built materialized views on top of the aggregated facts for Power BI datasets
    • Implemented raw data validation and created error containers in ADLS and Delta Lake to prevent pipeline failures caused by erroneous or invalid data
    • Designed and implemented schema comparison frameworks between the PostgreSQL and MongoDB data sources and the data warehouse tables
  2. Data Quality and Profiling Application for Cloud Data Warehouses and Multiple Data Sources
    • As a developer on an internal data quality accelerator application, I build Python scripts that run data quality and validation checks with the Great Expectations library whenever they are invoked from the front end (a minimal sketch appears after this list)
    • The script reads data from any Great Expectations-supported cloud data warehouse or data source, primarily Snowflake, and creates a Great Expectations Datasource from it
    • Based on the test cases to be validated against that Datasource, an Expectation Suite is created and the column-to-test-case mappings are stored
    • Once the Expectation Suite is saved, the data sources are validated and the validation results are stored in PostgreSQL and Snowflake tables
    • Created small Flask-based APIs to GET and POST data between the front end and the databases
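
A minimal sketch of the validation step described above, using the classic pandas-style Great Expectations API. The function name, column name, and sample data are illustrative, the exact API varies across Great Expectations versions, and persisting the results to PostgreSQL/Snowflake is left out.

    # Illustrative only: wrap a DataFrame with Great Expectations and run two checks.
    import great_expectations as ge
    import pandas as pd

    def run_basic_checks(df: pd.DataFrame):
        ge_df = ge.from_pandas(df)                                 # classic pandas-style API
        ge_df.expect_column_values_to_not_be_null("customer_id")   # hypothetical column
        ge_df.expect_column_values_to_be_unique("customer_id")
        return ge_df.validate()                                    # validation result object

    if __name__ == "__main__":
        sample = pd.DataFrame({"customer_id": [1, 2, 2, None]})
        result = run_basic_checks(sample)
        print(result.success)                                      # False for this sample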

July 2022 - Present

Software Engineer - Data Engineering and Data Warehousing

Tiger Analytics

Cloud Data Warehousing with Multiple Data Sources project for a fintech client

  • Azure Synapse Analytics was provisioned through the Azure portal along with Azure Data Lake Storage Gen2, a Dedicated SQL Pool (data warehouse), and a Spark Pool for transformations, specifically for the development environment. For the other environments (STG, PERF, QA1, SIT1, UAT, and PROD), ARM templates were prepared and used to provision the same resources
  • Created Linked Services in Synapse Analytics for the PostgreSQL and MongoDB servers using both the Azure Integration Runtime and a Self-Hosted Integration Runtime, along with the required Integration Datasets and staging tables
  • For the historical data load, a config file holds month-wise queries covering the past two years of data. A Synapse pipeline loads from the sources and stores the raw data in Azure Data Lake Storage (Parquet for PostgreSQL, JSON for MongoDB); the JSON data is flattened with PySpark code on the Spark pool and written back to ADLS, and the processed data is then loaded into staging tables and upserted into the facts and dimensions in the Dedicated SQL Pool (a PySpark sketch of the flattening step follows this list)
  • For the incremental / daily data load, a control log table was created to drive the incremental date filter. As with the historical load, a Synapse pipeline loads from the sources, stores the raw data in ADLS (Parquet for PostgreSQL, JSON for MongoDB), flattens the JSON with PySpark on the Spark pool, writes the flattened data back to ADLS, loads the processed data into staging tables, and upserts it into the facts and dimensions in the Dedicated SQL Pool
  • Designed and built the audit log and error log framework for the pipeline triggers; for trigger monitoring, Azure Monitor and an Azure Logic App were set up to notify stakeholders when a pipeline trigger fails
  • Performed a proof of concept and feasibility check of the data governance tool Microsoft Purview across the data sources, the Synapse data warehouse, and the Power BI reports, and presented the findings to the client's Design Committee
  • To secure the sources' connection strings and server passwords, an Azure Key Vault was created and the credentials stored as secrets, rather than embedding them in the Synapse Linked Services
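
A minimal sketch of the JSON-flattening step described above, assuming a Spark session on the Synapse Spark pool; the storage paths, column names, and nested fields are placeholders, not the project's actual schema.

    # Illustrative only: flatten nested MongoDB JSON and write Parquet back to ADLS.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    raw = spark.read.json("abfss://raw@<storage-account>.dfs.core.windows.net/mongodb/orders/")

    flattened = (
        raw
        .withColumn("item", F.explode_outer("items"))            # hypothetical array column
        .select(
            "order_id",                                           # hypothetical top-level field
            F.col("customer.id").alias("customer_id"),            # hypothetical nested struct field
            F.col("item.sku").alias("sku"),
            F.col("item.qty").alias("qty"),
        )
    )

    flattened.write.mode("overwrite").parquet(
        "abfss://processed@<storage-account>.dfs.core.windows.net/mongodb/orders/"
    )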

August 2021 - June 2022

Data Engineering Intern

Tiger Analytics

  1. Learned the fundamentals of Data Engineering - SQL, Data Modelling, Data Warehousing, Python, Big Data, Spark and Azure Cloud
  2. Completed a mini project using Python and MySQL
    • In Python, connected to a MySQL server and ingested a CSV file into a table in the database
    • Wrote the SQL queries into multiple files and read them in Python
    • Executed the queries read from the files with the Python connector and printed the results (a minimal sketch appears after this list)
  3. Completed an aviation-based data analysis case study
    • Examined aviation-related data to provide solutions (visualizations) for business insight problems
    • Transferred the datasets from local storage to an Azure SQL database and created the raw layer of the data warehouse on Databricks
    • Cleaned and transformed the raw layer data and loaded it into staging layer tables
    • Analyzed and Visualized the data for business insights using Databricks and Power BI
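
A minimal sketch of the mini project in item 2, using mysql-connector-python; the connection details, file names, table, and columns are placeholders.

    # Illustrative only: load a CSV into MySQL, then run queries read from a file.
    import csv
    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="root",
                                   password="***", database="demo")  # placeholder credentials
    cur = conn.cursor()

    # Ingest a CSV file into a table (table and columns are hypothetical).
    with open("input.csv", newline="") as f:
        rows = list(csv.reader(f))
    cur.executemany("INSERT INTO sales (order_id, amount) VALUES (%s, %s)", rows[1:])
    conn.commit()

    # Read SQL queries stored in a file and print each result set.
    with open("queries.sql") as f:
        for query in f.read().split(";"):
            if query.strip():
                cur.execute(query)
                print(cur.fetchall())

    cur.close()
    conn.close()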

February 2021 - July 2021

Education

Sri Venkateswara College of Engineering (Autonomous), Anna University

Bachelor of Engineering
Computer Science and Engineering

CGPA: 9.45

July 2017 - June 2021

Dr. VGN Matriculation and Higher Secondary School

Higher Secondary School Education

Percentage: 95.5%

June 2016 - April 2017

Skills

Programming Languages and Databases:
  • Python
  • Microsoft SQL Server - T-SQL
  • MySQL

Technologies and Stacks:
  • Cloud Data Warehousing
  • Azure Synapse Analytics with Dedicated SQL Pool and Spark Pool
  • Azure Data Lake Storage
  • Apache Spark - PySpark
  • APIs with Python Flask and SQLAlchemy
  • Data Modelling - Dimensional Modelling
  • ETL / ELT
  • Snowflake Data Warehousing
  • Databricks
  • DBT

Certifications & Accreditations

Microsoft Certifications:
  • Microsoft Certified: Azure Fundamentals
  • Microsoft Certified: Azure Data Fundamentals
  • Microsoft Certified: Azure Data Engineer Associate

Databricks Certifications & Accreditations:
  • Databricks Certified Data Engineer Associate
  • Databricks Certified Associate Developer for Apache Spark 3.0
  • Academy Accreditation - Databricks Lakehouse Fundamentals

Snowflake Badges:
  • Hands On Essentials Badge - Data Warehouse
  • Hands On Essentials Badge - Data Applications
  • Hands On Essentials Badge - Data Sharing
  • Hands On Essentials Badge - Data Lake
  • Hands On Essentials Badge - Data Engineering

Contact

      dineshvenkatm@gmail.com

      +91 96262 10026