*+ years of professional experience as a data engineer working with SQL, Tableau, Hadoop, Python, Spark, Apache Airflow, Docker, and Hive to build end-to-end pipelines with integration across various data management platforms.

SKILLS AND CERTIFICATIONS
Programming Languages: Python, Unix shell scripting, SQL, Spark, MapReduce, Pig, Hive
Databases: Oracle 12c, PostgreSQL, MySQL
Tools: Docker, Oracle Developer, AWS, GCP, HDFS, Tableau, Kafka, Flume, Qlik
Certifications: Microsoft Azure Fundamentals, Data Engineering with Google Cloud Platform

WORK EXPERIENCE
Data Engineer Intern Jan 2020 - Present
HCL Global Systems, MI, USA.
• Migrating a legacy SQL Server data warehouse to Google Cloud using custom ETL code.
• Supporting legacy users by loading data from GCP into an on-premises SQL Server instance via outbound API requests.
• Created BigQuery views with policy tags to restrict access to PII data for data analysts and business users.
• Extensively evaluated GCP Cloud Functions, Cloud Run, and containerized VMs to determine the best fit for lightweight batch processing.
• Built GitLab CI/CD pipelines using runners to build and deploy applications on Google Cloud.
• Automated cloud infrastructure creation using Terraform scripts.
• Created alerts from log-based metrics to notify the team of production job errors and failures.

Data Engineer Aug 2017 - July 2019
Infor Global Solutions, India.
• Designed secure data pipelines and resolved cluster scalability, data distribution, and processing concerns.
• Designed and developed big data ETL pipelines to process, clean, and ingest semi-structured and structured data.
• Built scatter-plot visualizations in Python to identify the correlation between product deployment environment and storage issues.
• Automated data flow processes in Airflow, ensuring timely delivery of data to analysts and clients and contributing to a 15% increase in business value.
• Designed various use cases and functions to extract useful parameters from raw data.
• Documented and presented various phases and resulting insights of the project to the company.
• Built data marts from data lakes and time-variant sources.
• Used the JIRA ticketing system to manage and track all stages of data pipeline development and testing.
• Transformed data from S3 using PySpark and loaded the aggregated data into Amazon Redshift for reporting, dashboarding, and analysis per the business use case.
• Facilitated detailed monitoring with CloudWatch and Lambda functions and provided user support.
• Strong commitment to team dynamics, with the ability to contribute expertise and follow leadership directives when appropriate.
• Used various Spark transformations and actions to cleanse input data; monitored Spark jobs and captured their logs via the Spark application master, and analyzed jobs in the Spark UI to improve performance.
• Built data processes to support production engineering, data science, and business intelligence systems.
• Created PL/SQL functions, stored procedures, and packages to migrate data from one database to another, working closely with the development team on any alterations.
• Performed data cleaning, feature scaling, featurization, and feature engineering, and deployed the data to AWS S3 and Athena.
• Wrote Docker Compose files and Dockerfiles to containerize the application.
• Actively involved in deployment processes using Jenkins, AWS EMR, and AWS ECS.

Big Data Developer July 2016 - Aug 2017
• Loaded, transformed, and analyzed data from various sources into HDFS using Sqoop and Hive.
• Translated the analysis into actionable insights using Python visualizations and presented strategies for a permanent solution to the stakeholders on site.
• Designed and developed complex data pipelines to retrieve data from XML documents related to medical data.
• Exported the analyzed and processed data to an RDBMS using Sqoop for visualization and generation of reports for the BI team.
• Proficient in developing packages, stored procedures, functions, triggers, and complex SQL statements.
• Developed various ETL jobs to extract, map, and load data from flat files and heterogeneous database sources such as Oracle and DB2.
• Excellent personal motivation, with a proven ability to work both collaboratively in a strong team environment and independently.
• Developed code with complete test coverage through unit testing and end-to-end integration testing.

Hadoop Developer Intern Feb 2015 - Jun 2016
HAPP Technologies Pvt Ltd., India
• Developed Spark jobs using PySpark to create a generic framework for processing multiple file formats, such as JSON and plain text.
• Managed and reviewed Hadoop log files to identify issues when jobs failed; used Hue for UI-based Pig script execution and automatic scheduling.
• Interpreted and analyzed sales information data for marketing purposes using Tableau and provided dashboard reports to the management.
• Developed and deployed Python-on-Spark applications for data aggregation and analysis.

EDUCATION
M.Eng., Computer Science (Data Science), GPA 3.8, May 2021
University of Cincinnati, Cincinnati, OH
Coursework: ML, Deep Learning, Intelligent Data Analysis, Cloud Computing, Information Retrieval

B.E., Computer Science and Engineering, GPA 3.2, May 2017
G Narayanamma Institute of Technology and Sciences, Hyderabad, India