Data Engineer

Location:
Atlanta, GA
Salary:
$130k per annum
Posted:
July 30, 2023


Swati Dogra

Data Engineer

Cell: 404-***-**** | Email: adyl09@r.postjobfree.com

EXPERIENCE SUMMARY

8+ years of total IT experience across all phases of the software development life cycle, including 4+ years of industry experience as a Data Engineer/Analyst involved in the design and development of distributed systems using SQL, Python, AWS, PySpark, and Tableau.

Experience working with rich financial data from sources such as Moody's, Bloomberg, JPMorgan Chase, Credit Suisse, and S&P, and making that data available for analysis.

Experience developing SQL scripts to upload, retrieve, manipulate, and handle sensitive data in MySQL and SQL Server Management Studio.

Used SQL and Excel to ensure data quality and integrity; identified and eliminated duplicate and inaccurate data records.

Expertise in creating optimized Hive tables with proper partitioning and bucketing for faster I/O, and in implementing Change Data Capture (CDC), deduplication, incremental loads, and data quality governance, balance, and control.
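As a minimal illustration of this pattern (all table, column, and view names below are hypothetical), a partitioned, bucketed Hive table plus a window-function dedup for incremental loads might look like:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Partitioned by date, bucketed on the business key for faster scans/joins.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS trades (
            trade_id   STRING,
            amount     DECIMAL(18, 2),
            updated_at TIMESTAMP
        )
        PARTITIONED BY (trade_date DATE)
        CLUSTERED BY (trade_id) INTO 16 BUCKETS
        STORED AS ORC
    """)

    # Dedup for an incremental load: keep only the latest row per business key.
    spark.sql("""
        SELECT trade_id, amount, updated_at, trade_date
        FROM (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY trade_id
                                         ORDER BY updated_at DESC) AS rn
            FROM staging_trades
        ) t
        WHERE rn = 1
    """).createOrReplaceTempView("trades_deduped")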

Experience creating AWS Lambda functions to process delta files.

Experience using PySpark to extract, filter, and transform data in data pipelines.
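A minimal PySpark extract-filter-transform sketch, assuming CSV input on S3 (the bucket, paths, and column names are placeholders):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Extract: read raw CSV from the landing zone.
    df = spark.read.option("header", True).csv("s3://example-bucket/landing/")

    # Filter and transform: drop bad rows, add an audit column, project columns.
    result = (df
              .filter(F.col("amount").cast("double") > 0)
              .withColumn("load_date", F.current_date())
              .select("client_id", "amount", "sector", "load_date"))

    # Load: write curated Parquet.
    result.write.mode("overwrite").parquet("s3://example-bucket/curated/")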

Proficient in AWS services including S3, EventBridge, Step Functions, RDS, IAM, Lambda, EMR, AWS Glue, Athena, CloudWatch, Redshift, and QuickSight.

Hands-on experience with Amazon Web Services (AWS), using Elastic MapReduce (EMR), Redshift, and EC2 for data processing.

Good understanding of dimensional and relational data modeling concepts such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables.

Developed a QuickSight dashboard to monitor real-time monthly and quarterly transactional patterns of clients by business unit, transaction type, and sector.

Proficient in data collection, cleaning, and analysis, utilizing a wide range of statistical techniques and data visualization tools to deliver actionable recommendations.

Experience building calculations with different types of functions, including logical, numeric, string, aggregate, and date functions.

Experience deploying source code to every environment.

Experience in functional and database testing using Selenium WebDriver and pytest.

Experience performing database/backend testing by writing SQL and Oracle queries for data validation and integrity.
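For illustration, a minimal pytest-style backend validation; the driver choice (PyMySQL), connection details, and the duplicate-check query are all assumptions:

    import pymysql  # hypothetical driver choice
    import pytest

    @pytest.fixture
    def conn():
        # Connection parameters are placeholders.
        c = pymysql.connect(host="localhost", user="qa",
                            password="secret", database="app")
        yield c
        c.close()

    def test_no_duplicate_clients(conn):
        with conn.cursor() as cur:
            cur.execute("""
                SELECT client_id, COUNT(*) FROM clients
                GROUP BY client_id HAVING COUNT(*) > 1
            """)
            assert cur.fetchall() == (), "duplicate client_id rows found"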

Strong knowledge of the Software Development Life Cycle, expertise in detailed design documentation, and experience working on agile teams.

TECHNICAL SKILLS

Scripting and Query Languages: Python, PySpark, SQL, Spark SQL

Databases: MySQL, SQL Server, Oracle, PostgreSQL

Tools: Eclipse, PyCharm, IntelliJ, Subversion (SVN), JIRA, Bitbucket, Git

AWS Services: S3, EventBridge, Step Functions, Lambda, RDS, EMR, IAM, AWS Glue, Athena, CloudWatch, Redshift, Kinesis, QuickSight

CERTIFICATIONS

AWS Cloud Practitioner (CLF-C01, Validation Number QJZLLB5L3MQEQBGL)

ISTQB – Certified Tester Foundation Level (CTFL) (Certification No- ITB-CTFL-75123)

EXPERIENCE

Fitch Ratings – AWS Data Engineer

New York – NY (Remote)

Jan 2023 – Present

Responsibilities:

Collaborated with various teams to gather the information needed for data pipeline creation and was involved in all aspects: data collection, data cleaning, model development, and visualization.

Ingested files in a range of formats, including XML, CSV, and plain text.

Created SFTP connections to retrieve zip files from different folders and place them in the landing-zone bucket.

Created EventBridge rules with payloads that drive fetching the files from the SFTP server.
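A sketch of wiring such a rule with boto3; the schedule, target ARN, and payload fields are hypothetical:

    import json
    import boto3

    events = boto3.client("events")

    # Nightly rule; a payload on the target tells the Lambda what to pull.
    events.put_rule(
        Name="sftp-pull-nightly",
        ScheduleExpression="cron(0 2 * * ? *)",  # 02:00 UTC daily
        State="ENABLED",
    )
    events.put_targets(
        Rule="sftp-pull-nightly",
        Targets=[{
            "Id": "sftp-pull-lambda",
            "Arn": "arn:aws:lambda:us-east-1:123456789012:function:sftp-pull",
            "Input": json.dumps({"remote_dir": "/outbound",
                                 "bucket": "landing-zone"}),
        }],
    )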

Built Glue crawlers to create the table structures in the Glue Data Catalog and read the files.

Created a common Glue job to pull the zip files, unzip them, and generate trigger files.
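The unzip-and-signal step could look roughly like the following; the bucket, prefixes, and trigger-object convention are assumptions:

    import io
    import zipfile
    import boto3

    s3 = boto3.client("s3")

    def unzip_and_signal(bucket, zip_key, out_prefix):
        # Pull the archive into memory and expand each member back to S3.
        body = s3.get_object(Bucket=bucket, Key=zip_key)["Body"].read()
        with zipfile.ZipFile(io.BytesIO(body)) as zf:
            for name in zf.namelist():
                s3.put_object(Bucket=bucket,
                              Key=f"{out_prefix}/{name}",
                              Body=zf.read(name))
        # An empty trigger object signals downstream jobs that extraction is done.
        s3.put_object(Bucket=bucket, Key=f"{out_prefix}/_TRIGGER", Body=b"")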

Created an event-based rule that starts the actual data processing by triggering the Step Functions workflow.

Created Glue jobs that process the files and build the partitioned tables.

Created Lambda functions for the ETL of delta files.

Ingested real-time streaming data using Kafka and performed data transformation using Python.

Created pandas DataFrames for delta files and applied transformations.
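Combining the last two points, a hypothetical S3-triggered Lambda that loads a delta file into pandas, transforms it, and writes the result back (the bucket layout, key convention, column names, and CSV format are all assumptions):

    import urllib.parse
    import boto3
    import pandas as pd

    s3 = boto3.client("s3")

    def handler(event, context):
        # Locate the object named in the S3 event notification.
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["object"]["key"])

        df = pd.read_csv(s3.get_object(Bucket=bucket, Key=key)["Body"])

        df = df.drop_duplicates(subset=["record_id"])    # dedup on business key
        df["processed_at"] = pd.Timestamp.now(tz="UTC")  # audit column

        out_key = key.replace("incoming/", "processed/")
        s3.put_object(Bucket=bucket, Key=out_key,
                      Body=df.to_csv(index=False).encode("utf-8"))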

Experience creating Parquet files using AWS Wrangler (the AWS SDK for pandas).
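Assumed usage of the awswrangler library for a partitioned Parquet dataset (the path and columns are placeholders):

    import awswrangler as wr
    import pandas as pd

    df = pd.DataFrame({"client_id": [1, 2], "sector": ["energy", "tech"]})

    wr.s3.to_parquet(
        df=df,
        path="s3://example-bucket/curated/clients/",
        dataset=True,               # write as a partitioned dataset
        partition_cols=["sector"],  # one folder per sector value
        mode="append",
    )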

Created Glue jobs that build Iceberg tables and perform merge operations.
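A sketch of such a merge in a Glue Spark session, assuming the job is configured with the Iceberg extensions and a Glue catalog (all names are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # supplied by the Glue job at runtime

    # Expose the incoming delta records as a temporary view.
    (spark.read.parquet("s3://example-bucket/processed/trades/")
          .createOrReplaceTempView("updates"))

    # Upsert into the Iceberg table: update existing keys, insert new ones.
    spark.sql("""
        MERGE INTO glue_catalog.analytics.trades AS t
        USING updates AS s
        ON t.trade_id = s.trade_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)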

Created DynamoDB inserts and updates for control-table changes.
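Illustrative boto3 calls against a hypothetical control table:

    import boto3

    table = boto3.resource("dynamodb").Table("pipeline_control")

    # Insert a control record when a file arrives.
    table.put_item(Item={"file_name": "trades_20230115.csv",
                         "status": "RECEIVED"})

    # Update the status once processing completes ("status" is a reserved
    # word in DynamoDB, hence the expression attribute name).
    table.update_item(
        Key={"file_name": "trades_20230115.csv"},
        UpdateExpression="SET #s = :done",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":done": "PROCESSED"},
    )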

Created and presented reports, charts, and dashboards in QuickSight to communicate analytical findings to internal stakeholders such as rating analysts and senior management.

Created Step Functions state machines to orchestrate the Glue jobs and Lambda functions.
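Kicking off such an orchestration from Python might look like this (the state machine ARN and input payload are placeholders):

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # Each execution carries the context the downstream Glue jobs
    # and Lambdas need to locate their input.
    sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:etl",
        input=json.dumps({"bucket": "example-bucket", "prefix": "processed/"}),
    )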

Environment: MySQL, Oracle, AWS (Lambda, Glue, S3, Athena, Step Functions, EventBridge), Python, pandas, PySpark, Tableau.

Suniksha Technologies – Data Engineer

Dallas – Texas (Remote)

Jan 2020 – Dec 2022

Responsibilities:

Collaborated with various teams to gather the information needed for data analysis and was involved in all aspects: data collection, data cleaning, model development, and visualization.

Developed, modified, and managed databases, schemas, and tables; used joins and subqueries to simplify complex queries involving multiple tables.

Used PySpark for extracting, filtering, and transforming data in data pipelines.

Executed Hadoop/Spark jobs on AWS EMR, with data stored in S3 buckets.

Developed ETL jobs using AWS Glue and queried the data using Athena.
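A hypothetical Athena query from Python via awswrangler (the database and table names are placeholders):

    import awswrangler as wr

    # Runs in Athena and returns the result as a pandas DataFrame.
    df = wr.athena.read_sql_query(
        sql="SELECT sector, SUM(amount) AS total FROM trades GROUP BY sector",
        database="analytics",
    )
    print(df.head())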

Managed IAM users by creating new users, granting them limited access as needed, and assigning roles and policies to specific users.

Performed end-to-end architecture and implementation assessments of various AWS services, including Amazon EMR, S3, and Redshift.

Used SQL and Excel to ensure data quality and integrity; identified and eliminated duplicate and inaccurate data records.

Experienced with a range of Python modules, libraries, and packages, such as the pandas DataFrame.

Used pandas DataFrames for better analysis and data cleansing.

Environment: MySQL, SQL Server, AWS (EMR, Glue, S3, Athena, Redshift), Python, PySpark, Tableau.

Motherson Technology Services Limited - Data Engineer

Gurgaon – India

Project - Sumitomo Electric Wiring Systems

Mar 2018 – May 2019

Responsibilities:

Involved in gathering functional and non-functional requirements.

Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.

Used AWS EMR to transfer and move large amounts of data into and out of other AWS data stores and databases, such as Amazon S3 and Amazon Redshift.

Performed end-to-end architecture and implementation assessment of various AWS services like Amazon EMR, Redshift, and S3.

Imported data from sources such as HDFS into Spark RDDs and performed computations using PySpark to generate output responses.

Used PySpark for extracting, filtering, and transforming data in data pipelines.

Used SQL and Excel to ensure data quality and integrity; identified and eliminated duplicate and inaccurate data records.

Analyzed the data with SQL queries and generated monthly and quarterly reports.

Used the Python pandas library for data visualization and data cleansing.

Analyzed trouble reports (UAT results) to identify issue behavior.

Environment: MySQL, AWS, Python, PySpark, Tableau, TFS, Redmine

Espire Infolabs Pvt Ltd – Data Engineer/Analyst Jan 2017 – Jan 2018

Gurgaon – India

Project - Customer Communication Management (Clydesdale Bank)

Responsibilities:

Worked with the Product and Data Science teams to define campaign-measurement KPIs; designed logging specs to capture the data required to power KPIs and metrics; worked with the Software Engineering team to get logging in place; and validated logged data against the logging spec.

Developed, modified, and managed databases, schemas, and tables; used joins and subqueries to simplify complex queries involving multiple tables.

Used SQL and Excel to ensure data quality and integrity; identified and eliminated duplicate and inaccurate data records.

Developed data pipelines using Python and Hive to load data into the data lake, and performed data analysis and data mapping for several data sources.

Built data pipelines to clean and process campaign data, analyzed the campaign data with Hive queries, and loaded the cleaned data into Hive tables.

Developed Python scripts to connect to various databases (MySQL, SQL Server, and Oracle) in order to retrieve and consolidate the data available in CYBG's databases for monthly data analysis for corporate planning.

Ensured database integrity, stability, and consistency by performing manual testing to check that the data loaded without issues.

Responsible for creating optimized Hive DDLs and partitioned and bucketed Hive tables.

Worked on multiple fast-paced ad-hoc business data requests.

Environment: HDFS, Hive, MySQL, Python

Espire Infolabs Pvt Ltd – Senior Software Engineer Nov 2015 – Dec 2016

Gurgaon – India

Project - Business as Usual (Yorkshire Bank)

Responsibilities:

Tested new functionality based on test cases and coordinated with the development team to fix issues.

Provided release support: released to QA and tested the release process.

Wrote SQL queries for data validation and integrity.

Created test data for the Dev, QA, and Pre-Prod environments according to the requirements.

Verified the outputs against the variables passed in the test data.

Automated database testing processes and procedures.

Collaborated with database development team for testing implementation.

Created regression, smoke, and sanity test cases.

Participated in ongoing requirements and review meetings with developers and users, and modified the test cases accordingly.

Produced complete automation reporting (HTML and log output).

Met all project deadlines and provided business end-user support.

Environment: SQL Server 2008, Python, Selenium/Webdriver, SOAPUI, GitHub

NIIT LTD - Software Engineer

Gurgaon – India

Project - HMH, Skillsoft

Mar 2013 – Oct 2015

Responsibilities:

Created e-learning solutions using the Lectora authoring tool.

Provided release support: released to QA and tested the release process.

Wrote SQL queries for data validation and integrity.


Witty Brains Software Technologies Pvt Ltd - QA Analyst

Noida – India

Project – ON2

Jan 2012 – Oct 2012

Responsibilities:


Tested the new functionality based on test cases and coordinated with the development team to fix issues.

Provided release support: released to QA and tested the release process.

Created regression, smoke, and sanity test suites.

Participated in ongoing requirements and review meetings with developers and users, and modified the test cases per user requirements as part of User Acceptance Testing (UAT).

EDUCATIONAL QUALIFICATION

Master of Computer Application - 2011

Uttar Pradesh Technical University, India

Bachelor of Science - 2008

Kurukshetra University, India


