Praveen Kumar M
Data Engineer
United States | LinkedIn: Praveen Kumar
214-***-**** | ************@*****.***
SUMMARY
Data Engineer with 4+ years of experience, AWS and Azure certified, adept at using cloud platforms such as AWS (S3, Redshift, Glue, EMR, Lambda, Kinesis), Azure (Data Factory, Synapse, ADLS, Blob Storage, Logic Apps, Databricks, Fabric, Purview), and Google Cloud Platform, along with dbt and Airflow.
Strong programming and big data skills in Snowflake, Hadoop, SQL, Python, PySpark, ETL/ELT, and Excel, with expertise in data modeling and in visualization tools such as Tableau and Power BI.
Proven track record of delivering metadata-driven data solutions; successfully led teams and completed projects across the regulated Banking, Finance, Healthcare, Insurance, and Learning domains.
EXPERIENCE
Regions Bank, Birmingham, AL Feb 2024 - Present Data Engineer
• Conducted detailed financial data analysis to support strategic planning and risk management, applying a strong foundation in data quality, data governance, and data management tools.
• Reviewed customer accounts for fraud detection and prevention and assisted customers in initiating both fraud and non-fraud claims; developed ETL jobs to monitor accounts and applications in Cloudera Workbench.
• Implemented business analytics using SQL, Excel, and Power BI to design dashboards tracking key banking metrics and KPIs. Developed and optimized data pipelines using Snowflake to blend, load, and transform data.
• Utilized Spark, Hadoop (HDFS), Hive, and Impala to manage, query, and optimize large-scale banking transaction data, enhancing analytics, ensuring regulatory compliance, and improving scalability.
• Optimized ETL pipelines with PySpark, Python, and SQL, processing 100GB+ of banking data from multiple systems into Snowflake, and implemented Delta Lake for ACID-compliant storage and data consistency.
• Designed and architected scalable, cloud-native distributed systems platforms using Python, Kafka, Spark, Hadoop, Delta Lake storage, handling 10+ GB of data, supporting daily transactions with 99.95% uptime.
• Managed enterprise-scale AWS cloud ecosystems (Redshift, Athena, S3, SNS, SQS, Lambda, Glue, DMS, EMR, EC2, Batch, CloudFormation, Step Functions, CloudWatch) and optimized database performance across Redshift, MySQL, and Snowflake through advanced SQL tuning and query optimization, ensuring high efficiency.
• Designed and implemented a data platform strategy; developed comprehensive data models (3NF, Dimensional, Data Vault) for improved data trust, and normalized and automated processes to optimize data reliability.
• Analyzed and documented complex data sources and PySpark flows, boosting team understanding by 80%, while delivering automated, high-performance solutions in Python and SQL for 5 key initiatives.
• Applied data warehousing and SQL to design scalable architectures for reports; collaborated with stakeholders to document lineage and improve governance; ensured security and compliance with GDPR, CCPA standards.
• Developed and launched new data-driven products by applying statistical modeling, machine learning, and deep learning expertise, and visualizing insights with Power BI to solve complex business problems.
• Designed, developed, and optimized end-to-end BI solutions, including Redshift, Oracle, and NoSQL data pipelines, enabling real-time analysis over large datasets and improving data availability for operational teams.
• Engineered scalable ETL pipelines leveraging SQL, Python, and AWS Glue, integrating multiple structured and semi-structured data sources to ensure 99.9% data accuracy and drive better cost optimization in workflows.
• Experienced in Agile/Scrum, Lean/Six Sigma, and Waterfall SDLC methodologies and processes.
Cognizant Technology Solutions, Hyderabad, India June 2020 - Aug 2022 Data Engineer
• Architected and deployed enterprise-scale data solutions on Azure, leveraging Data Factory, Databricks, and Synapse Analytics to process 5TB+ of data daily, reducing storage costs by 35% with efficiency.
• Ensured data quality and integrity using Azure Data Factory and PySpark by implementing validation checks, reducing discrepancies by 95%, and resolving issues to achieve 99.9% uptime for critical data services.
• Built and maintained datasets with validation frameworks, leveraging Azure Stream Analytics and Cosmos DB to support cloud-based data workflows, scalable architecture, and automated data quality checks.
• Developed scalable data solutions using Azure Events and Logic Apps, while mentoring developers globally on PySpark and Delta Lake best practices for cloud data integration projects.
• Built and maintained robust data infrastructure using tools such as Apache Spark, Hadoop, Kafka, Airflow, SQL/NoSQL databases, and cloud data platforms, ensuring data integrity and facilitating iteration of AI/ML models.
• Leveraged analytical and problem-solving skills to optimize big data using Alteryx and SAS, driving 30% performance and cost reduction through statistical analysis, data transformation, and cross-functional collaboration.
• Optimized batch processing frameworks and high-performance data orchestration platforms using Apache Spark with Delta Lake, HDFS, Apache Airflow, and Python scripts, managing 100+ daily ETL workflows.
• Developed interactive dashboards and reports using Power BI and Tableau, enhancing data visualization capabilities by 40% and facilitating data-driven decision-making, resulting in a 20% increase in operational efficiency.
• Engineered big data processing using Spark for distributed computing on 50+ TB daily, optimizing ETL pipelines with PySpark and Python to boost ingestion efficiency by 40% and cut processing time by 10 hours.
• Developed and deployed 150+ ETL/ELT pipelines leveraging Azure Data Factory, dbt, and Databricks, processing 500GB of data with a 99.8% success rate using agile development methodologies, effectively serving in a team lead role.
• Collaborated cross-functionally with product managers, operations leaders, and data scientists to define BI requirements, translate business problems into analytical solutions, and deliver data-driven decisions.
Electronics Corporation of India Limited, Hyderabad, India June 2019 – May 2020 Data Analyst
• Ensured data quality and integrity through robust data validation processes in SQL, Excel, and data governance platforms, promoting best practices as a Data Analytics Council participant, driving business growth.
• Architected and deployed scalable data solutions, including SSIS packages for automated ETL and relational databases with advanced design techniques, to optimize data integration and support business applications.
• Implemented Python and PySpark scripts to automate data extraction, transformation, and loading processes from APIs, flat files, and databases, enhancing data accuracy by 20% and significantly reducing manual errors.
• Developed and maintained Star, Snowflake, and relational database models to support scalable data warehousing solutions, improving report generation and analytical insights and driving informed business decisions.
• Designed and implemented scalable data pipelines using Hadoop, Spark, and Python, leveraging Hive for data analysis to drive business growth, and developed scalable solutions using HDFS, MapReduce, and YARN.
• Collaborated in Agile Scrum teams using Jira and GitLab for continuous integration and deployment (CI/CD), ensuring delivery of data solutions and maintaining 99.9% system uptime.
EDUCATION
Master of Science in Computer Science, University of Alabama at Birmingham, AL Aug 2022 - Dec 2023
SKILLS
Programming skills: Python (NumPy, Pandas, Matplotlib, SciPy, Scikit-learn), SQL, Java, JavaScript, Spark, HTML/CSS, Docker, PySpark, R, ETL, DBT, A/B testing, ML, Linux, Unix, Shell Scripting, Looker Studio.
Cloud Services: AWS (S3, EC2, Lambda, Redshift, EMR, Glue, Kinesis), Microsoft Azure (AD, VM, DevOps, Data Factory, Snowflake, ADLS Gen2, Synapse Analytics, Blob Storage), GCP, Hive, Hadoop (HDFS), Impala.
Software Tools: Jupyter Notebook, GitHub, JIRA, Google Analytics and BigQuery, Power BI, SAS, Tableau.
Database Management: Relational databases (MySQL, Oracle, PostgreSQL), AWS RDS, NoSQL (MongoDB, DynamoDB), SQL Server (tables, functions, T-SQL, views, stored procedures, commands/clauses), Snowflake.
CERTIFICATIONS
AWS Certified Data Engineer – Associate
Microsoft Certified: Fabric Data Engineer Associate
Cisco Networking Academy Data Analytics Essentials
Cyber Security certification from UAB
HP Certified in Python Programming and Data Science and Analytics
SAS Certified Specialist: Base Programming Using SAS 9.4
Lean Six Sigma White Belt from CSSC