Dinesh Kumar Sai M





I am a Data Engineering professional with experience in Cloud Data Warehouse technologies: I build ETL pipelines, perform data and dimensional modelling, and work on Business Intelligence, Data Quality, analysis, and validation frameworks, which I document, analyze, design, implement, and test across various projects. My experience includes Azure Synapse Analytics, Azure Data Factory, Microsoft SQL Server and T-SQL, data lakes on Azure Data Lake Storage, transformations with PySpark, Azure Databricks, Snowflake Data Warehouse, DBT, and Python. I am also a Microsoft and Databricks Certified Data Engineer Associate.


Experience

Senior Software Engineer - Data Engineering and Data Warehousing

Tiger Analytics

  1. Cloud Data Warehousing with Multiple Data Sources project for a fintech client
    • Promoted within the same project and continued to manage the data warehouse and ETL pipelines in Synapse Analytics
    • Created stored procedures for the aggregated fact table loads and SCD-supported loads, and built materialized views on top of the aggregated facts for Power BI datasets
    • Implemented raw data validation and created error containers in ADLS and Delta Lake to prevent pipeline failures caused by erroneous or invalid data
    • Designed and implemented schema comparison frameworks between the PostgreSQL and MongoDB data sources and the data warehouse tables
  2. Data Quality and Profiling Application for Cloud Data Warehouses and Multiple Data Sources
    • As a developer on an internal data quality accelerator application, I build Python scripts that run data quality and validation checks with the Great Expectations library whenever they are invoked from the front end (a minimal sketch appears after this list)
    • The script reads data from any Great Expectations-supported cloud data warehouse or data source, primarily Snowflake, and creates a Great Expectations Datasource from it
    • Based on the test cases to be validated against that Datasource, an Expectation Suite is created and the column-to-test-case mappings are stored
    • Once the Expectation Suite is saved, the data sources are validated and the validation results are stored in PostgreSQL and Snowflake tables
    • Created small Flask-based APIs to GET and POST data between the front end and the databases
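
A minimal sketch of the validation step described above, using the classic pandas-style Great Expectations API. The function name, column name, and sample data are illustrative, the exact API varies across Great Expectations versions, and persisting the results to PostgreSQL/Snowflake is left out.

    # Illustrative only: wrap a DataFrame with Great Expectations and run two checks.
    import great_expectations as ge
    import pandas as pd

    def run_basic_checks(df: pd.DataFrame):
        ge_df = ge.from_pandas(df)                                 # classic pandas-style API
        ge_df.expect_column_values_to_not_be_null("customer_id")   # hypothetical column
        ge_df.expect_column_values_to_be_unique("customer_id")
        return ge_df.validate()                                    # validation result object

    if __name__ == "__main__":
        sample = pd.DataFrame({"customer_id": [1, 2, 2, None]})
        result = run_basic_checks(sample)
        print(result.success)                                      # False for this sample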

July 2022 - Present

Software Engineer - Data Engineering and Data Warehousing

Tiger Analytics

Cloud Data Warehousing with Multiple Data Sources project for a fintech client

  • Azure Synapse Analytics was provisioned through the Azure portal along with Azure Data Lake Storage Gen2, a Dedicated SQL Pool (data warehouse), and a Spark Pool for transformations, specifically for the development environment. For the other environments (STG, PERF, QA1, SIT1, UAT, and PROD), ARM templates were prepared and used to provision the same resources
  • Created Linked Services in Synapse Analytics for the PostgreSQL and MongoDB servers using both the Azure Integration Runtime and a Self-Hosted Integration Runtime, along with the required Integration Datasets and staging tables
  • For the historical data load, a config file holds month-wise queries covering the past two years of data. A Synapse pipeline loads from the sources and stores the raw data in Azure Data Lake Storage (Parquet for PostgreSQL, JSON for MongoDB); the JSON data is flattened with PySpark code on the Spark pool and written back to ADLS, and the processed data is then loaded into staging tables and upserted into the facts and dimensions in the Dedicated SQL Pool (a PySpark sketch of the flattening step follows this list)
  • For the incremental / daily data load, a control log table was created to drive the incremental date filter. As with the historical load, a Synapse pipeline loads from the sources, stores the raw data in ADLS (Parquet for PostgreSQL, JSON for MongoDB), flattens the JSON with PySpark on the Spark pool, writes the flattened data back to ADLS, loads the processed data into staging tables, and upserts it into the facts and dimensions in the Dedicated SQL Pool
  • Designed and built the audit log and error log framework for the pipeline triggers; for trigger monitoring, Azure Monitor and an Azure Logic App were set up to notify stakeholders when a pipeline trigger fails
  • Performed a proof of concept and feasibility check of the data governance tool Microsoft Purview across the data sources, the Synapse data warehouse, and the Power BI reports, and presented the findings to the client's Design Committee
  • To secure the sources' connection strings and server passwords, an Azure Key Vault was created and the credentials stored as secrets, rather than embedding them in the Synapse Linked Services
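
A minimal sketch of the JSON-flattening step described above, assuming a Spark session on the Synapse Spark pool; the storage paths, column names, and nested fields are placeholders, not the project's actual schema.

    # Illustrative only: flatten nested MongoDB JSON and write Parquet back to ADLS.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    raw = spark.read.json("abfss://raw@<storage-account>.dfs.core.windows.net/mongodb/orders/")

    flattened = (
        raw
        .withColumn("item", F.explode_outer("items"))            # hypothetical array column
        .select(
            "order_id",                                           # hypothetical top-level field
            F.col("customer.id").alias("customer_id"),            # hypothetical nested struct field
            F.col("item.sku").alias("sku"),
            F.col("item.qty").alias("qty"),
        )
    )

    flattened.write.mode("overwrite").parquet(
        "abfss://processed@<storage-account>.dfs.core.windows.net/mongodb/orders/"
    )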

August 2021 - June 2022

Data Engineering Intern

Tiger Analytics

  1. Learned the fundamentals of Data Engineering - SQL, Data Modelling, Data Warehousing, Python, Big Data, Spark and Azure Cloud
  2. Completed a mini project using Python and MySQL
    • In Python, connected to a MySQL server and ingested a CSV file into a table in the database
    • Wrote the SQL queries into multiple files and read them in Python
    • Executed the queries read from the files with the Python connector and printed the results (a minimal sketch appears after this list)
  3. Completed an aviation-based data analysis case study
    • Examined aviation-related data to provide solutions (visualizations) for business insight problems
    • Transferred the datasets from local storage to an Azure SQL database and created the raw layer of the data warehouse on Databricks
    • Cleaned and transformed the raw layer data and loaded it into staging layer tables
    • Analyzed and Visualized the data for business insights using Databricks and Power BI
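
A minimal sketch of the mini project in item 2, using mysql-connector-python; the connection details, file names, table, and columns are placeholders.

    # Illustrative only: load a CSV into MySQL, then run queries read from a file.
    import csv
    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="root",
                                   password="***", database="demo")  # placeholder credentials
    cur = conn.cursor()

    # Ingest a CSV file into a table (table and columns are hypothetical).
    with open("input.csv", newline="") as f:
        rows = list(csv.reader(f))
    cur.executemany("INSERT INTO sales (order_id, amount) VALUES (%s, %s)", rows[1:])
    conn.commit()

    # Read SQL queries stored in a file and print each result set.
    with open("queries.sql") as f:
        for query in f.read().split(";"):
            if query.strip():
                cur.execute(query)
                print(cur.fetchall())

    cur.close()
    conn.close()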

February 2021 - July 2021

Education

Sri Venkateswara College of Engineering (Autonomous), Anna University

Bachelor of Engineering
Computer Science and Engineering

CGPA: 9.45

July 2017 - June 2021

Dr. VGN Matriculation and Higher Secondary School

Higher Secondary School Education

Percentage: 95.5%

June 2016 - April 2017

Skills

Programming Languages and Databases:
  • Python
  • Microsoft SQL Server - T-SQL
  • MySQL

Technologies and Stacks:
  • Cloud Data Warehousing
  • Azure Synapse Analytics with Dedicated SQL Pool and Spark Pool
  • Azure Data Lake Storage
  • Apache Spark - PySpark
  • APIs with Python Flask and SQLAlchemy
  • Data Modelling - Dimensional Modelling
  • ETL / ELT
  • Snowflake Data Warehousing
  • Databricks
  • DBT

Certifications & Accreditations

Microsoft Certifications:
  • Microsoft Certified: Azure Fundamentals
  • Microsoft Certified: Azure Data Fundamentals
  • Microsoft Certified: Azure Data Engineer Associate

Databricks Certifications & Accreditations:
  • Databricks Certified Data Engineer Associate
  • Databricks Certified Associate Developer for Apache Spark 3.0
  • Academy Accreditation - Databricks Lakehouse Fundamentals

Snowflake Badges:
  • Hands On Essentials Badge - Data Warehouse
  • Hands On Essentials Badge - Data Applications
  • Hands On Essentials Badge - Data Sharing
  • Hands On Essentials Badge - Data Lake
  • Hands On Essentials Badge - Data Engineering

Contact

      dineshvenkatm@gmail.com

      +91 96262 10026