Data Engineer Information Technology

Location:
Overland Park, KS
Salary:
90000
Posted:
December 21, 2023

Hemanth Mereddy

ad1405@r.postjobfree.com

Phone: 913-***-****

Data Engineer

PROFESSIONAL SUMMARY:

4+ years of professional experience in Information Technology, specializing as a Data Engineer. During this time, I have developed a comprehensive understanding of data architecture, ETL processes, and data modeling, which has enabled me to design and implement effective data solutions for various business needs.

Experienced in utilizing PySpark and SparkSQL for data processing and analysis, showcasing proficiency in leveraging big data technologies to derive meaningful insights and enhance decision-making processes.

Hands-on expertise with AWS cloud services such as EC2, VPC, S3, Glue, IAM, EBS, DynamoDB, CloudWatch, CloudFormation, SNS, SQS, Step Functions, EventBridge, Secrets Manager, Lambda, DataSync, and EFS.

Proficient in Autosys job scheduling, demonstrating expertise in creating and managing complex job workflows, ensuring timely execution, and optimizing system performance.

Experience in using SDLC methodologies like Waterfall, Agile Scrum for design and development.

Excellent programming skills at a high level of abstraction using Scala and Python, including development against AWS services.

Highly experienced in creating complex Informatica mappings and workflows using the major transformations.

Strong experience working with databases such as Oracle 10g, DB2, SQL Server 2008, and MySQL, and proficiency in writing complex SQL queries.

Experience in automation and building CI/CD pipelines using Jenkins and Chef.

Extensively used Python Scripts for Data Quality Reporting.

Extensively used Databricks for Data Analysis and Visualization.

Technical Skills:

Scripting Languages: SQL, Python, HQL, PySpark, SparkSQL, Shell Scripting

Databases: Oracle 11g, MySQL, MongoDB, PostgreSQL

Cloud Services: AWS (S3, Glue, Lambda, Redshift, Athena, DynamoDB, CloudWatch, RDS, EC2)

Tools: Jira, Git, GitHub

Hadoop Ecosystem: HDFS, YARN, Spark, Sqoop, Hive

IDEs: Eclipse, Jupyter Notebook, Spyder, PyCharm, IntelliJ

PROFESSIONAL EXPERIENCE:

Client: Matrix-IFS, New Jersey, USA

Role: AWS Data Engineer Duration: Aug 2022 to Present

Project Description: Designed and developed efficient data pipelines for a leading investment banking firm, incorporating various data feeds, performing ETL (Extract, Transform, Load) processing, and storing the output in the corresponding database tables on the AWS (Amazon Web Services) cloud platform.

Responsibilities:

Collaborated with Business Analysts to understand the business logic and ETL transformations needed to take raw data from upstream to downstream systems.

Automated file arrival from remote Unix machines to AWS EC2 and transferred raw files to the corresponding AWS S3 buckets using Unix scripting and AWS DataSync.

Ran the entire file arrival process using Autosys.

Developed transformations using AWS Glue with Spark.
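
A minimal sketch of this kind of Glue/Spark transformation, assuming hypothetical catalog database, table, and bucket names:

```python
# Minimal AWS Glue (PySpark) job sketch; database, table, and bucket names are hypothetical.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw feed registered in the Glue Data Catalog
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_feeds", table_name="trades_raw")

# Apply column-level renames and type conversions
mapped = ApplyMapping.apply(
    frame=raw,
    mappings=[("trade_id", "string", "trade_id", "string"),
              ("trade_dt", "string", "trade_date", "date")])

# Write the curated output back to S3 as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://curated-bucket/trades/"},
    format="parquet")

job.commit()
```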

Leveraged DynamoDB to store ETL and business logic as dynamic content, ensuring high scalability and performance.
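
A hedged sketch of retrieving mapping rules stored as DynamoDB items; the table name, key, and attribute names are illustrative assumptions:

```python
# Sketch: fetch ETL mapping rules kept as a DynamoDB item; names are hypothetical.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("etl_business_rules")

# Each feed's transformation rules are stored under its feed name
response = table.get_item(Key={"feed_name": "equity_trades"})
rules = response.get("Item", {}).get("column_mappings", [])

for rule in rules:
    print(rule["source_column"], "->", rule["target_column"])
```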

Enhanced data processing efficiency and analysis of large datasets by utilizing Glue crawlers to read and extract raw data efficiently.

Utilized Glue tables to seamlessly read structured and semi-structured data, ensuring flexibility in data retrieval. Employed the boto3 Python SDK and the AWS command-line interface (CLI) to access and interact with AWS components and services, streamlining operations and workflows.
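
For illustration, a small boto3 sketch of inspecting Glue Data Catalog tables and raw S3 objects; the database, bucket, and prefix names are assumptions:

```python
# Sketch: list Glue catalog tables and raw files in S3 via boto3; names are hypothetical.
import boto3

glue = boto3.client("glue")
s3 = boto3.client("s3")

# Enumerate tables registered in a catalog database
for table in glue.get_tables(DatabaseName="raw_feeds")["TableList"]:
    print("table:", table["Name"])

# Enumerate raw files landed in the ingestion bucket
resp = s3.list_objects_v2(Bucket="raw-ingest-bucket", Prefix="incoming/")
for obj in resp.get("Contents", []):
    print("object:", obj["Key"], obj["Size"])
```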

Proficiently worked with multiple file formats, including CSV, JSON, and XML, within AWS S3 buckets to optimize data storage and retrieval processes.
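
A hedged sketch of reading these formats with Spark; the S3 paths are hypothetical, and the XML read assumes the spark-xml (com.databricks.spark.xml) package is available:

```python
# Sketch: reading CSV, JSON, and XML feeds from S3 with Spark; paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-format-reader").getOrCreate()

csv_df = spark.read.option("header", "true").csv("s3://raw-ingest-bucket/feeds/positions/*.csv")
json_df = spark.read.json("s3://raw-ingest-bucket/feeds/orders/*.json")
xml_df = (spark.read.format("xml")          # requires the spark-xml package
          .option("rowTag", "trade")
          .load("s3://raw-ingest-bucket/feeds/trades/*.xml"))

print(csv_df.count(), json_df.count(), xml_df.count())
```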

Validated ETL and business logic in AWS Athena to ensure the mappings saved in DynamoDB were reflected in downstream tables.
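
A minimal sketch of running such a validation query through the Athena API via boto3; the database, table, and results location are assumptions:

```python
# Sketch: run a validation query in Athena via boto3; names and locations are hypothetical.
import time
import boto3

athena = boto3.client("athena")

execution = athena.start_query_execution(
    QueryString="SELECT count(*) FROM curated_db.trades WHERE trade_date = date '2023-01-31'",
    QueryExecutionContext={"Database": "curated_db"},
    ResultConfiguration={"OutputLocation": "s3://athena-results-bucket/validation/"})

query_id = execution["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(rows)
```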

As a developer, designed and wrote mappings, performed unit tests, set up system test environments, generated test cases, ran regression tests where needed, and supported User Acceptance Testing.

Implemented SCD Type 1 and Type 2 methodologies when loading ODS tables to keep historical data in the data warehouse. Designed an incremental loading process to load data into staging tables. Created worklets, workflows, and tasks to schedule the loads at the required frequency using Workflow Manager.
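
The actual loads were built in Informatica; the following is only a simplified PySpark illustration of the Type 2 history-preserving pattern, with hypothetical table and column names:

```python
# Simplified SCD Type 2 sketch in PySpark; table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

current = spark.table("ods.customer_dim").where("is_current = true")
staged = spark.table("staging.customer_updates")

# Rows whose tracked attribute changed since the last load
changed = (current.alias("c").join(staged.alias("s"), "customer_id")
           .where(F.col("c.address") != F.col("s.address")))

# Type 2: close out the old version of each changed row...
expired = (changed.select("customer_id", "c.address")
           .withColumn("is_current", F.lit(False))
           .withColumn("end_date", F.current_date()))

# ...and append the new version with an open-ended effective date
new_rows = (changed.select("customer_id", "s.address")
            .withColumn("is_current", F.lit(True))
            .withColumn("start_date", F.current_date())
            .withColumn("end_date", F.lit(None).cast("date")))

expired.show()
new_rows.show()
```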

Worked with pre-session and post-session UNIX scripts to automate ETL jobs. Also involved in the migration/conversion of ETL processes from development to production.

Monitored all cloud activities using AWS CloudWatch.

Environment: ETL, EC2, VPC, S3, Glue, IAM, EBS, DynamoDB, CloudWatch, CloudFormation, SNS, SQS, Step Functions, EventBridge, Secrets Manager, Lambda, DataSync, EFS, MS SQL Server, Autosys, Unix, NoSQL, Python, Spark, and Amazon States Language

Client: Aircom Solutions, Bengaluru, India

Role: Data Engineer Duration: Aug 2019 to Oct 2021

Responsibilities:

Worked on building centralized Data Lake on AWS Cloud utilizing primary services like S3, EMR, Redshift and Athena.

Documented the complete process flow describing program development, logic, testing, implementation, application integration, and coding.

Recommended structural changes and enhancements to systems and Databases.

Created test plan documents for all back-end database modules.

Used MS Excel, MS Access, and SQL to write and run various queries.

Worked extensively on creating tables, views, and SQL queries in MS SQL Server.

Coordinated with business users to design new reporting solutions in an appropriate, effective, and efficient way, based on user needs and the existing functionality.

Worked on migrating datasets and ETL workloads from on-premises systems to AWS Cloud services.

Built a series of Spark applications and Hive scripts to produce various analytical datasets needed by digital marketing teams.

Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to the cloud.

Worked extensively on fine-tuning Spark applications and providing production support for various pipelines running in production.

Worked on writing different RDD (Resilient Distributed Datasets) transformations and actions using Scala and Python.
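
A short PySpark sketch of the kind of RDD transformations and actions described above; the file path and record layout are assumptions:

```python
# Sketch: basic RDD transformations and actions in PySpark; path and parsing are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("s3://analytics-bucket/raw/clickstream/*.tsv")

# Transformations are lazy: parse, filter, and re-key the records
events = (lines.map(lambda line: line.split("\t"))
          .filter(lambda fields: len(fields) >= 3)
          .map(lambda fields: (fields[0], 1)))

# Actions trigger execution: count events per key and collect a sample
counts = events.reduceByKey(lambda a, b: a + b)
print(counts.take(10))
```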

Executed multiple ETL jobs using AWS Step Functions and Lambda, and used AWS Glue for loading and preparing data for customer analytics.
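
A hedged sketch of a Lambda handler that a Step Functions state could invoke to kick off a Glue job; the job name and argument keys are hypothetical:

```python
# Sketch of a Lambda handler invoked from Step Functions to start a Glue job run;
# job and argument names are hypothetical.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    run = glue.start_job_run(
        JobName="curate-customer-feed",
        Arguments={"--feed_date": event.get("feed_date", "")})
    # Step Functions can poll this run id (e.g., in a Wait/Choice loop) before moving on
    return {"job_run_id": run["JobRunId"]}
```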

Environment: AWS S3, EMR, Lambda, Redshift, Athena, Glue, Spark, SQL, PySpark, Scala, Python, Java, Hive, Kafka, SQL Server, MS Office, PostgreSQL, MySQL

Client: Ray Solutions, Hyderabad, India

Role: SQL Developer Duration: October 2018 to June 2019

Responsibilities:

Involved in complete Software Development Lifecycle (SDLC).

Wrote complex SQL queries, stored procedures, triggers, views, and indexes using DML and DDL commands and user-defined functions to implement the business logic.

Advised on query optimization by reviewing execution plans for better database tuning.

Performed Normalization & De-normalization on existing tables for faster query results.

Wrote T-SQL Queries and procedures to generate DML Scripts that modified database objects dynamically based on inputs.

Created SSIS packages to import and export data from various CSV files, flat files, Excel spreadsheets, and SQL Server.

Designed and developed different types of reports like matrix, tabular, chart reports using SSRS.

Involved in the migration of SQL Server 2012 databases to SQL Server 2014.

Maintained positive communication and working relationships with all business levels.

Coordinated with onshore/offshore and stakeholder teams for task clarification, fixes, and reviews.

Reviewed, analyzed, and implemented necessary changes in appropriate areas to enhance and improve existing systems.

EDUCATION:

Lindsey Wilson College, Columbia, Kentucky April 2023

Major: M.S. Technology Management GPA: 3.68


