Data Engineer Engineering

Location:
Herndon, VA
Posted:
March 06, 2025

Sai Movva

Sr Data Engineer/ ETL Developer

Dallas, TX 219-***-**** *****.******@*****.***

7+ years of professional experience in information technology as a Data Engineer with expertise in the areas of Database Development, ETL Development, Data Modeling, Report Development, and Big Data Technologies.

Experience in Data Integration and Data Warehousing using ETL tools such as Informatica PowerCenter, AWS Glue, DataStage, SQL Server Integration Services (SSIS), and Talend.

Hands-on experience working with AWS services such as EC2, Lambda, S3, Athena, Redshift, DynamoDB, Step Functions, SNS, SQS, and IAM.

Experience in the integration of various data sources such as Oracle, SQL Server, Salesforce Cloud, Teradata, JSON, XML files, flat files, and XSDs, along with API integration.

Experience coding in Python with solid CS fundamentals including data structure and algorithm design.

Experience with data transformations utilizing SnowSQL and Python in Snowflake.

Extensive experience in creating complex mappings in Talend using transformation and big data components.

Implemented Spark SQL and the DataFrame API to connect to Hive, read data, and distribute processing to make it highly scalable.

Demonstrated skills in integrating data science libraries such as Pandas, NumPy, Scikit-Learn, and TensorFlow into data engineering workflows for advanced analytics and machine learning applications.

Strong skills in visualization tools such as Power BI and Microsoft Excel, including formulas, pivot tables, charts, and DAX commands.

Experience in developing web applications using Python, Django, C++, XML, CSS, HTML, JavaScript, and jQuery.

Proficient in Hadoop, HDFS, Hive, MapReduce, Pig, and NoSQL databases such as MongoDB, HBase, and Cassandra, with expertise in applying data mining and optimization techniques.

Expertise in Linux systems and Bash scripting, contributing to the automation and optimization of data processes.

Experience in database development and implementation using different RDBMS such as ORACLE, MySQL, PostgreSQL and NoSQL databases such as MongoDB, DynamoDB.

Experience in designing and creating RDBMS Tables, Views, User Created Data Types, Indexes, Stored Procedures, Cursors, Triggers and Transactions.

Hands-on experience deploying Kafka Connect in standalone and distributed modes and creating containers using Docker.

Experience in job/workflow scheduling and monitoring tools like Oozie, AWS Data Pipeline, and Autosys.

Proficient with Azure Data Lake Storage (ADLS), Databricks, and Jupyter notebooks.

Skilled in cloud provisioning tools such as Terraform and CloudFormation.

Extensively involved in all phases of data acquisition, data collection, data cleansing, model development, model validation, and visualization to deliver data science solutions.

Hands-on experience setting up Kubernetes Clusters for running microservices.

Strong SQL development skills include writing Stored Procedures, Triggers, Views, and User Defined functions.

Top Skills

Languages: Python, PySpark, Scala, Java, SQL, T-SQL, Unix Shell Scripting

Databases: MySQL, Oracle, PostgreSQL, PL/SQL, SQL Server, Teradata, MongoDB, DynamoDB

Amazon Web Services: EC2, EMR, S3, Redshift, Lambda, Athena, Lake Formation, Glue

ETL: Informatica, DataStage, SSIS, Talend, AWS Glue

Reporting Tools: Power BI, Tableau

Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Spark

Data Engineering: Snowflake, Apache Kafka, Apache Airflow, Azure Databricks

Operating Systems: Windows, MacOS, Unix, Linux

Other Software Tools: MS Office, Visio, HPALM, JIRA, Splunk

Build Tools: Maven, Ant, Docker, Kubernetes, EKS

Experience

Sr DATA ENGINEER/ Snowflake Data Engineer Mission Squared, FL, US September 2022 – Present

Interacting with business and customers to understand the architecture of the applications.

Design, implement and maintain all AWS infrastructure and services within a managed service environment.

Worked extensively with AWS services like EC2, S3, Lambda, Route 53, IAM, CloudWatch, CloudFormation, SNS.

Used the AWS Glue Data Catalog with crawlers to catalog data in S3 and performed SQL queries on it using AWS Athena.
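
A minimal sketch of this pattern with boto3, assuming a Glue-cataloged table reachable from Athena; the database, table, and result-bucket names are hypothetical placeholders.

    # Run an Athena query against a Glue-cataloged S3 table and poll for completion.
    import time
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    response = athena.start_query_execution(
        QueryString="SELECT order_id, amount FROM sales_db.orders WHERE ds = '2022-09-01'",
        QueryExecutionContext={"Database": "sales_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    query_id = response["QueryExecutionId"]

    # Wait for a terminal state before fetching results.
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]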

Experience with ETL workflow management tools such as Apache Airflow, including significant experience writing Python scripts to implement workflows.

Developed PySpark-based pipelines using Spark DataFrame operations to load data into the EDL, using EMR for job execution and AWS S3 as the storage layer.
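
A short illustrative sketch of an EMR-style PySpark load, assuming raw CSV landed in S3 and a curated Parquet layer; bucket paths and column names are placeholders.

    # Read raw data from S3, apply DataFrame transformations, write curated Parquet back to S3.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("edl-load").getOrCreate()

    raw = spark.read.option("header", "true").csv("s3://example-raw-bucket/orders/")

    curated = (
        raw.withColumn("amount", F.col("amount").cast("double"))
           .filter(F.col("amount") > 0)
           .withColumn("load_date", F.current_date())
    )

    curated.write.mode("overwrite").partitionBy("load_date").parquet("s3://example-curated-bucket/orders/")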

Developed AWS Lambda functions in Python and used Step Functions to orchestrate data pipelines.
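
A hedged sketch of that orchestration hand-off: a Lambda handler starting a Step Functions state machine; the environment variable and payload fields are hypothetical.

    # Lambda entry point that kicks off a Step Functions execution for the pipeline.
    import json
    import os
    import boto3

    sfn = boto3.client("stepfunctions")

    def lambda_handler(event, context):
        execution = sfn.start_execution(
            stateMachineArn=os.environ["PIPELINE_STATE_MACHINE_ARN"],
            input=json.dumps({"source_prefix": event.get("source_prefix", "incoming/")}),
        )
        return {"executionArn": execution["executionArn"]}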

Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake SnowSQL, writing SQL queries against Snowflake.
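
A minimal sketch of the Snowflake side using the Python connector, assuming a named stage and file format already exist; the account, credentials, and object names are placeholders.

    # Copy staged files into a Snowflake table and run a verification query.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="example_account",
        user="ETL_USER",
        password="********",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="STAGING",
    )
    cur = conn.cursor()
    try:
        cur.execute("COPY INTO staging.orders FROM @orders_stage FILE_FORMAT = (FORMAT_NAME = csv_fmt)")
        cur.execute("SELECT COUNT(*) FROM staging.orders")
        print(cur.fetchone()[0])
    finally:
        cur.close()
        conn.close()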

Worked on and executed Snowflake procedures using VS Code.

Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.

Hands-on experience migrating datasets and ETL workloads with Python from on-premises systems to AWS cloud services.

Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.

Worked on Snowflake Schemas and Data Warehousing.

Sound working experience configuring Snowflake multi-cluster warehouse sizes and managing credit usage.

Created tasks and streams for scheduling and automating Snowflake jobs.

Created file formats, table structures, stages, and Snowpipe definitions, and deployed them from DEV to all remaining environments.

Created data models for information systems by applying formal Data Modeling techniques.

Used various Teradata load utilities for data loads and UNIX shell scripting for the file validation process.

Built Python and PySpark automation scripts to read Informatica XML exports and assess mapping complexity and code criticality.
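
An illustrative script in that spirit, assuming a standard PowerCenter export layout with MAPPING and TRANSFORMATION elements; the file name is a placeholder.

    # Count transformations per mapping in an exported Informatica XML as a rough complexity measure.
    from collections import Counter
    import xml.etree.ElementTree as ET

    tree = ET.parse("mapping_export.xml")
    counts = Counter()

    for mapping in tree.iter("MAPPING"):
        name = mapping.get("NAME", "UNKNOWN")
        counts[name] = sum(1 for _ in mapping.iter("TRANSFORMATION"))

    for mapping_name, n_transformations in counts.most_common():
        print(f"{mapping_name}: {n_transformations} transformations")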

Improved daily job performance using data cleaning, query optimization, and table partitioning.

Worked with the Informatica Data Quality toolkit for analysis, data cleansing, data matching, data conversion, exception handling, reporting, and monitoring.

Built a data lake using tools like Hadoop and Spark, making data easy to find and analyze for smarter decision-making.

Automated deployment pipelines using Jenkins, ensuring continuous integration and delivery for enhanced development workflows.

Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.
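
A minimal sketch of such a daily workflow, assuming Airflow 2.x; the DAG id, schedule, and script paths are hypothetical.

    # Daily DAG: run the extract script, then the load script.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_sales_pipeline",
        start_date=datetime(2022, 9, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="python /opt/etl/extract_sales.py")
        load = BashOperator(task_id="load", bash_command="python /opt/etl/load_sales.py")
        extract >> load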

Automated Datadog dashboards for the stack through Terraform scripts.

Developed entire frontend and backend modules using Python on the Django web framework, creating the user interface (UI) with JavaScript, Bootstrap, and HTML5/CSS, backed by Cassandra and MySQL.

Responsible for building Scala distributed data solutions using Hadoop.

Used Apache Kafka features such as distribution, partitioning, and the replicated commit log service for messaging.

Used various sources to pull data into Power BI, such as SQL Server, Excel, Oracle, and SQL Azure.

Published developed reports to Power BI Service (Cloud).

Created an automation process in Talend for the distribution group that receives inventory and sales data and sends activation reports.

Created Talend mappings to populate data into dimension and fact tables.

Worked on scaling microservices applications using Kubernetes and Docker.

Used Jenkins and pipelines which helped us drive all microservices builds out to the Docker registry and then deployed to Kubernetes.

Wrote Spark programs in Scala for data quality checks.

Utilized DevOps methodologies to implement CI/CD pipelines for data engineering projects, streamlining the testing, deployment, and monitoring processes of data pipelines for enhanced efficiency and reliability.

Participated in all phases of data mining, data collection, data cleaning, model development, data validation, and data visualization, and performed gap analysis.

Designed various Jenkins jobs to continuously integrate the processes and executed CI/CD pipeline using Jenkins.

Demonstrated proficiency in Agile methodologies, particularly Scrum, ensuring efficient collaboration within and across teams.

DATA ENGINEER Centene Corporation, St Louis, MO May 2021 – Sep 2022

Worked closely with a project team to gather the business requirements and interacted with business analysts to translate business requirements into technical specifications.

Demonstrated strong analytical skills by designing and implementing data validation processes, ensuring data accuracy and integrity throughout the ETL pipeline.

Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.

Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.

Created a Lambda function and configured it to receive events from an S3 bucket.
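
A minimal sketch of an S3-triggered Lambda handler; the bucket layout and downstream processing are assumptions.

    # Handle an S3 put event: fetch the new object and pass it on for processing.
    import urllib.parse
    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            # Placeholder for the real transformation / DynamoDB write.
            print(f"processed s3://{bucket}/{key} ({len(body)} bytes)")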

Developed a data ingestion pipeline that reads and writes multiple data formats (JSON, CSV, ORC, Parquet) on HDFS using PySpark.
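
A short sketch of that kind of multi-format ingestion, assuming JSON and CSV landing zones on HDFS; paths and column names are placeholders.

    # Read JSON and CSV from HDFS, normalize the schema, persist ORC and Parquet copies.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("multi-format-ingest").getOrCreate()

    json_df = spark.read.json("hdfs:///landing/claims/json/")
    csv_df = spark.read.option("header", "true").csv("hdfs:///landing/claims/csv/")

    cols = ["claim_id", "member_id", F.col("amount").cast("double").alias("amount")]
    claims = json_df.select(*cols).unionByName(csv_df.select(*cols))

    claims.write.mode("overwrite").orc("hdfs:///curated/claims_orc/")
    claims.write.mode("overwrite").parquet("hdfs:///curated/claims_parquet/")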

Wrote complex SQL scripts to analyze data in different databases and data warehouses such as Snowflake, Teradata, and Redshift.

Sound working experience using AWS services such as Kinesis, AWS Glue, and Lambda to stream data into Snowflake.

Conducted independent data analysis and gap analysis, wrote and interpreted SQL queries, and generated graphical reports per specifications.

Executed ETL processes using AWS Glue, transforming raw data from diverse sources into structured formats for analytical purposes.

Created AWS Lambda functions in Python for deployment management in AWS, and designed and implemented public-facing websites on AWS integrated with other application infrastructure.

Built an ETL framework for data migration from on-premises data sources such as Hadoop and Oracle to AWS using Apache Airflow, Apache Sqoop, and Apache Spark.

Transformed raw data into actionable insights by incorporating various statistical techniques, data mining, data cleansing, data quality, integrity utilizing Python (Scikit-Learn, NumPy, Pandas, and Matplotlib) and SQL.

Worked on Docker and Kubernetes to containerize and deploy applications.

Developed high level data dictionary of ETL data mapping and transformations from a series of complex Talend data integration jobs.

Monitored and optimized system performance using AWS CloudWatch, ensuring reliable and scalable data processing.

Excellent experience in designing and maintaining complex SQL queries and developing PL/SQL stored procedures.

Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, schema design, etc.

Built data pipelines using AWS Data Pipeline for automating and scheduling regular data movement and transformation.

Using Informatica PowerCenter created mappings and mapplets to transform the data according to the business rules.

Utilized Apache Airflow to schedule and monitor ETL (Extract, Transform, Load) processes, improving data pipeline visibility, and maintenance efficiency.

Developed and deployed stacks using AWS CloudFormation templates (CFT) and Terraform.

Scheduled refreshes through the Power BI gateway to refresh dashboards at regular intervals with the appropriate dependencies.

Built a configurable Scala and Spark based framework to connect to common data sources such as MySQL, Oracle, Postgres, and SQL Server and load the data into BigQuery.

Performed code quality checks using Jenkins and peer reviews in an Agile methodology.

Wrote Dockerfiles and Docker Compose files to create various databases and run the Flask application in both local and dev environments.

Performed data analysis and ETL (Extract, Transform, Load) operations using Spark on AWS EMR cluster.

Built real-time data pipelines by developing Kafka producers and Spark Streaming applications for consumption.
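
An illustrative sketch of both sides, assuming a kafka-python producer and a Spark job with the Kafka connector available; broker addresses, topic names, and sink paths are placeholders.

    # Producer: publish JSON events to a Kafka topic.
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("claims-events", {"claim_id": "C123", "amount": 250.0})
    producer.flush()

    # Consumer: a separate Spark Structured Streaming job reading the same topic.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims-stream").getOrCreate()

    stream = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "claims-events")
        .load()
        .select(F.col("value").cast("string").alias("payload"))
    )

    query = (
        stream.writeStream.format("parquet")
        .option("path", "s3://example-stream-sink/claims/")
        .option("checkpointLocation", "s3://example-stream-sink/checkpoints/claims/")
        .start()
    )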

Developed dashboards and visualizations to help business users analyze data and provide insight to upper management, with a focus on Microsoft products such as SQL Server Reporting Services (SSRS) and Power BI.

Applied strong problem-solving skills and critical thinking abilities to overcome challenges in the data engineering process.

DATA ENGINEER Fidelity Investments, Bengaluru, India Nov 2018 – Dec 2020

Collaborated with data engineers and the operations team to implement the ETL process.

Wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.

Developed complex ETL Packages using SQL Server 2008 Integration Services to load data from various sources like Oracle/SQL Server/DB2 to Staging Database and then to Data Warehouse.

Designed and built multi-terabyte, full end-to-end data warehouse infrastructure from the ground up on Confidential Redshift, handling millions of records every day at large scale.

Performed logical data modeling, physical Data modeling (including reverse engineering) using the Erwin Data modeling tool.

Created job flows using Airflow in Python and automated the jobs; Airflow has separate stacks for deploying DAGs and runs jobs on EMR or EC2 clusters.

Wrote Terraform scripts from scratch for building Dev, Staging, Prod, and DR environments.

Used Scala to convert Hive / SQL queries into RDD transformations in Apache Spark.

Performed ETL operations using Databricks for transforming big data into a suitable format for analysis.

Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build a solution that optimizes data quality and performance.

Handled initial, delta, and incremental data as well as migration data loaded into Teradata.

Created automated solutions using Databricks, Spark, Python, Snowflake, HTML.

Created Airflow DAGs to sync files from Box, analyze data quality, and alert on missing files.

Developed REST APIs in Python using the Flask framework and integrated various data sources including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.
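
A minimal Flask sketch of such an API; sqlite3 stands in here for the real RDBMS, and the table and route names are hypothetical.

    # Simple read endpoint over a relational source.
    import sqlite3
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/customers/<int:customer_id>")
    def get_customer(customer_id):
        conn = sqlite3.connect("warehouse.db")
        row = conn.execute(
            "SELECT id, name, segment FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        conn.close()
        if row is None:
            return jsonify({"error": "not found"}), 404
        return jsonify({"id": row[0], "name": row[1], "segment": row[2]})

    if __name__ == "__main__":
        app.run(debug=True)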

Designed ETL Process using Informatica to load data from Flat Files, and Excel Files to target Oracle Data Warehouse database.

Collaborated with teams using Tableau for data visualization, creating insightful reports and dashboards.

Used Git version control to manage the source code and integrated Git with Jenkins to support build automation and integrated with Jira to monitor the commits.

Configured AWS EC2 instances, IAM users, and roles, and created S3 data pipes using the Boto API to load data from internal data sources.
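
A brief boto3 sketch of the S3 load step, assuming extracted files on local disk; the bucket name, prefix, and file names are placeholders.

    # Push extracted files to S3 under a dated prefix.
    from datetime import date
    import boto3

    s3 = boto3.client("s3")
    prefix = f"internal-extracts/{date.today():%Y/%m/%d}"

    for local_file in ["accounts.csv", "positions.csv"]:
        s3.upload_file(
            Filename=f"/data/exports/{local_file}",
            Bucket="example-ingest-bucket",
            Key=f"{prefix}/{local_file}",
        )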

Developed entire frontend and backend modules using Python on the Django web framework.

Developed sequencers and parallel jobs based on the mappings following Agile Scrum methodology.

Jr DATA ENGINEER Excel Media, Telangana, India Feb 2017 – Oct 2018

Worked on Data Mining with large data sets of Structured and Unstructured data, Data Acquisition, Data Validation, Predictive modeling, Data Visualization.

Worked on ETL testing and used SSIS Tester automated tools for unit and integration testing.

Wrote efficient PL/SQL code for seamless integration with Oracle databases, ensuring data integrity and performance.

Prepared technical specification documents for the ETL code process.

Involved in validating and ETL testing of data from source to target, implementing corrective steps where necessary.

Performed Unit testing & Test Cases for DataStage jobs.

Used GitHub to save and share code, review changes made by teammates, and keep track of project tasks, making it easier to work together and manage project progress.

Batch processed data through a REST API gateway to apply transformations, resulting in a 30% reduction in data processing time.
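
An illustrative sketch of that batching approach using the requests library; the endpoint URL, payload shape, and batch size are assumptions.

    # Post records in chunks rather than one call per record.
    import requests

    API_URL = "https://api.example.com/v1/transform/batch"

    def post_in_batches(records, batch_size=500):
        for start in range(0, len(records), batch_size):
            batch = records[start:start + batch_size]
            resp = requests.post(API_URL, json={"records": batch}, timeout=30)
            resp.raise_for_status()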

Conducted statistical analysis and created an interactive dashboard to present findings in Tableau, which was used by stakeholders to make better decisions.

Highest Qualification

Master of Science in Information Technology, Valparaiso University, Valparaiso, IN, USA; GPA 3.9, 2022.


