
Data Engineer Senior

Location:
Tinley Park, IL
Posted:
April 24, 2024


Resume:

PAVAN KALYAN

Email: ad48zg@r.postjobfree.com

LinkedIn: linkedin.com/in/pavan-kalyan-2207182a9
Phone: 312-***-****

Senior Data Engineer

PROFESSIONAL SUMMARY:

• 8+ years of experience designing data-intensive applications using Big Data frameworks across different stages of data processing, building cloud data warehouses/data marts for efficient data analytics, and delivering data quality solutions using data catalog and ETL tools.

• Experience collaborating with organization-wide business teams, stakeholders, and data stewards to understand requirements and maintain a Data Catalog for business users.

• Experience in building and updating robust data frameworks/pipelines to ingest batch and streaming data using EMR, EC2, Glue, Lambda, Kafka, Step Functions, CloudWatch, and SNS/SQS, adopting and integrating various AWS cloud resources to eliminate dependencies on teams such as Operations and Lifecycle Management.

• Profound experience performing data ingestion and data processing (transformations, enrichment, and aggregations). Strong knowledge of distributed-systems architecture and parallel processing, with an in-depth understanding of Spark programming and the Spark execution framework.

• Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications. Worked on various automation tools like Git, Terraform, and Ansible.

• Hands-on experience in handling database issues and connections with SQL and NoSQL databases like MongoDB, Redis, and DynamoDB by installing and configuring various packages in Python.

• Good knowledge of writing and building different kinds of tests, such as unit tests with Pytest.

• Expertise in Build Automation and Continuous Integration tools such as Jenkins.

• Experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the sketch after this summary). Experience with different file formats like Avro, Parquet, ORC, JSON, and XML.

• Well-versed in Agile with Scrum, the Waterfall model, and Test-Driven Development (TDD) methodologies.

• Strong experience in developing SOAP and RESTful web services with the Python programming language.

• Good Experience in Linux Bash scripting.
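
A minimal sketch of the partitioning/bucketing and managed vs. external table concepts above, assuming a SparkSession with Hive support; the database, table, and HDFS path names (sales_db, orders_ext, orders_bucketed) are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-tables-sketch")
             .enableHiveSupport()   # required for Hive external tables
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS sales_db")

    # External, partitioned table: data stays at the given location and survives
    # DROP TABLE; partition pruning limits what a query has to scan.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.orders_ext (
            order_id    STRING,
            customer_id STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        STORED AS PARQUET
        LOCATION 'hdfs:///data/sales/orders_ext'    -- hypothetical path
    """)

    # Managed, bucketed table: the warehouse owns the data; bucketing on customer_id
    # co-locates rows to speed up joins and aggregations on that key.
    (spark.table("sales_db.orders_ext")
        .write
        .mode("overwrite")
        .bucketBy(32, "customer_id")
        .sortBy("customer_id")
        .saveAsTable("sales_db.orders_bucketed"))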

TECHNICAL SKILLS:

Programming Languages: Python (Boto3, Pandas, Matplotlib, httplib2, urllib2, Beautiful Soup, NumPy), R, and SQL

Operating Systems: Windows 98/2000/XP/7/8, Mac OS, and Linux CentOS

Cloud Infrastructure:
AWS Cloud – AWS S3, AWS Lambda, Step Functions, DynamoDB, AWS Athena, CodePipeline, AWS EMR, AWS EC2, AWS Glue, AWS SimpleDB, Amazon MQ, AWS RDS, AWS CloudWatch, AWS SQS, AWS Identity and Access Management (IAM)
Azure Cloud – Databricks, Azure Data Factory (ADF), Azure Data Lake, Azure Pipelines, Azure Functions, Blob Storage
Google Cloud – Cloud Dataflow, Cloud ML, BigQuery, Dataproc, Bigtable

Big Data Tools: Hadoop, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Kafka, YARN, Apache Spark

Databases/Servers: Snowflake, MySQL, Redis, SFTP, FTP, PostgreSQL, Aurora DB, DynamoDB, MongoDB

Machine Learning & AI Frameworks:
Data Mining – Logistic Regression, Linear Regression, Support Vector Machines, Decision Trees, Time Series, Neural Networks, and other advanced statistical techniques
Deep Learning – Neural Networks, Deep Neural Networks (CNN & RNN), and LSTM
Deep Learning Frameworks – Keras & TensorFlow

Web Services/Protocols: TCP/IP, UDP, FTP, HTTP/HTTPS, SOAP, REST, RESTful

Business Intelligence: Kibana, Tableau, Qlik Sense, Google Cloud Data Studio, Advanced Microsoft Excel, and Power BI

Version Control: Git, SVN, Bitbucket

PROFESSIONAL EXPERIENCE:

Exabeam, Foster City, CA November 2022 to Present

Senior Data Engineer

Responsibilities:

• Involved in all phases of SDLC from requirement gathering, design, development, testing, Production, user training, and support for the production environment.

• Design, review, implement, and optimize data transformation processes in Oracle and Hadoop (primary), utilizing Python and Hive.

• Collaborate with Solution Architects, Principal Engineers, DevOps team, and business Analysts to understand the precise business needs of the acceptance criteria and ensure the deliverables of data products.

• Used Enterprise GitHub Repos for version control.

• Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.

• Responsible for extracting, transforming, maintaining quality, and ensuring the Integrity of data utilizing Big Data technologies like Hadoop, MapReduce, Pig, Hive, Flume, Sqoop, Spark, and AWS cloud services like S3, Lambda, EC2, EMR, RedShift, DynamoDB, Glue.

• Involved in designing and optimizing Spark SQL queries and DataFrames: importing data from data sources, performing transformations, performing read/write operations, and saving the results to the output directory in HDFS.

• Developed Spark scripts by writing custom RDDs in PySpark for data transformations and performing actions on RDDs.

• Worked on Kafka REST API to collect and load the data on the Hadoop file system and used Sqoop to load the data from relational databases.

• Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the results in Parquet format in HDFS (see the sketch after this section).

• Successfully loaded files to Hive and HDFS from Oracle and SQL Server using Sqoop.

Environment: AWS services, Hadoop, MapReduce, Hive, Pig, Sqoop, Spark, Teradata, Parquet, Oracle 12c, SQL*Plus, SQL*Loader, SQL Developer, PL/SQL, Designer, Workflow Manager, Workflow Monitor, Repository Manager, Kafka, GitHub, Shell Scripts, UNIX, Windows XP.
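
A minimal sketch of the Kafka-to-Parquet streaming flow mentioned above, using Spark Structured Streaming; the broker address, topic name, schema fields, and HDFS paths are placeholders, and the spark-sql-kafka connector package is assumed to be available on the cluster.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-parquet-sketch").getOrCreate()

    # Hypothetical schema for the incoming JSON messages.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_time", TimestampType()),
        StructField("payload", StringType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
           .option("subscribe", "security-events")            # placeholder topic
           .load())

    # Kafka values arrive as bytes; cast to string and parse the JSON payload.
    events = (raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
                 .select("e.*"))

    # Continuously append the parsed records to Parquet files in HDFS.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/events/parquet")              # placeholder path
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .start())
    query.awaitTermination()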

Mayo Clinic, Rochester, MN June 2020 to October 2022

Data Engineer

Responsibilities:

• Worked with Python 3.8 and the Azure serverless stack.

• Used Terraform for Infrastructure as Code.

• Focused on building VNETs from scratch using Azure Resource Manager (ARM) templates: creating private and public subnets and network security groups, configuring internet gateways and VPN, creating custom images, and handling user access management (role-based access, multi-factor authentication, and API access).

• Used SQL on the reports generated by companies to judge the complexity of data and datatypes.

• Created Azure Cosmos DB tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.

• Loading processed data to Azure Synapse Analytics (formerly Azure SQL Data Warehouse).

• Triggered Python scripts written as Azure Functions and monitored logs in Azure Monitor.

• Assisted with the configuration of Azure Monitor and Azure Security Center for logs and metrics.

• Loaded the data into Azure Databricks and performed in-memory computation to generate the output (see the sketch after this section).

• Monitored the use of data files and regulated access to protect secure information.

• Provisioned Azure Landing Zones to create a customized baseline of Azure accounts, networks, and security policies.

• Involved in loading data from the UNIX file system to Azure Data Lake Storage.

• Developed Data Pipelines using Azure Stream Analytics to stream and process real-time data.

• Implemented Partition, Bucketing, and salting on various tables for better performance.

• Worked on various file formats like Parquet, JSON, and CSV.

• Worked with external vendors/partners (e.g., Salesforce) to onboard external data into the target Azure Data Lake.

Environment: Python 3.8, Shell Scripting, PySpark, Spark 3.0, Spark SQL, Spark DF/DS, Azure Blob Storage, Cosmos DB, UNIX, PowerShell, Terraform, Azure Monitor, Azure Resource Manager (ARM) templates, Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Azure Data Lake Storage, Azure Stream Analytics.
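
A minimal sketch of the Databricks in-memory computation step referenced above, assuming a Databricks notebook where spark is already available and storage credentials are configured on the cluster; the ADLS paths and column names are hypothetical.

    from pyspark.sql import functions as F

    # Read raw records from a hypothetical Azure Data Lake Storage path.
    records = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/records/")

    # Cache in memory so the repeated aggregations below reuse the same dataset.
    records.cache()

    daily_summary = (records
                     .groupBy("record_date", "department")        # hypothetical columns
                     .agg(F.count("*").alias("record_count"),
                          F.sum("amount").alias("total_amount")))

    # Write the curated output back to the lake for downstream loading into Synapse.
    (daily_summary.write
        .mode("overwrite")
        .parquet("abfss://curated@examplelake.dfs.core.windows.net/daily_summary/"))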

Truist Bank, Charlotte, NC October 2018 to May 2020

AWS Data Engineer

Responsibilities:

• Deploying, automating, maintaining, and managing AWS cloud-based production system, to ensure the availability, performance, scalability, and security of production systems.

• Set up Infrastructure as Code (IaC) in the AWS cloud platform from scratch through CloudFormation templates by configuring and integrating appropriate AWS services as per the business requirement.

• Performed critical patches to the Windows servers every month through AWS Patch Manager.

• Designed and implemented the data migration from SQL Server (on-prem) to the AWS cloud (Redshift), which resulted in significant cost savings for the organization (see the sketch after this section).

• Built data models and schema designs based on business requirements, interacting with different clients to design and model efficiently.

• Working with SQL queries to QC the load on the DB with the ETL tool and managing appropriate changes to tables for query performance improvement.

Environment: AWS, AWS Lambda, AWS EMR, SQL, ETL, Redshift, Informatica, SageMaker, DevOps, Jenkins, Cloudera (CDH3), HDFS, Pig 0.15.0, Hive 2.2.0, Kafka, Sqoop, Shell Scripting, Spark 1.8, Linux CentOS, MapReduce, Python 2, Eclipse 4.6.
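
A minimal sketch of the SQL Server-to-Redshift migration pattern above, assuming the SQL Server extracts have already been staged as CSV files in S3; the cluster endpoint, credentials, bucket, IAM role, and table names are placeholders.

    import psycopg2

    # Placeholder connection details for the target Redshift cluster.
    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="********",
    )
    conn.autocommit = True

    # Bulk-load one staged extract into a Redshift staging table.
    copy_sql = """
        COPY staging.customer
        FROM 's3://example-migration-bucket/sqlserver/customer/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy-role'
        FORMAT AS CSV
        IGNOREHEADER 1
        TIMEFORMAT 'auto';
    """

    with conn.cursor() as cur:
        cur.execute(copy_sql)
    conn.close()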

Brio Technologies Private Limited, Hyderabad, India April 2017 to July 2018

AWS Developer/Data Engineer

Responsibilities:

• Developed an end-to-end data pipeline solution in the AWS cloud that reads data from Redshift, performs transformations using AWS Glue (Spark), and writes data to Neo4j, driven by SQS events from upstream data.

• Orchestrated Glue jobs using Step Functions, used RDS and DynamoDB for audit purposes, and wrote Lambdas to invoke the Step Function along with Python code for RDS database updates and queries (see the sketch after this section).

• Developed Glue jobs with Python code to query data from Aurora DB and create tables in Redshift.

• Worked with PySpark classes, traits, and objects, using DataFrames and RDDs in Spark for data aggregation and queries, and writing data back into the graph database Neo4j.

• Created and managed S3 buckets in AWS, imported data from AWS S3 into Spark RDDs, and performed transformations on the RDDs.

Environment: AWS services, S3, SQL, Hadoop, Spark, Terraform, GitHub, Python, Agile-Scrum, Parquet, Avro, PySpark, Jenkins, Ctrl-M.
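
A minimal sketch of the SQS-triggered orchestration above: a Lambda handler that starts one Step Functions execution per SQS message; the state machine ARN environment variable and message fields are hypothetical.

    import json
    import os

    import boto3

    sfn = boto3.client("stepfunctions")
    STATE_MACHINE_ARN = os.environ["STATE_MACHINE_ARN"]  # assumed to be set on the function

    def handler(event, context):
        # SQS delivers messages to Lambda in batches under the "Records" key.
        records = event.get("Records", [])
        for record in records:
            payload = json.loads(record["body"])
            sfn.start_execution(
                stateMachineArn=STATE_MACHINE_ARN,
                input=json.dumps({
                    "source_table": payload.get("source_table"),  # illustrative fields
                    "run_date": payload.get("run_date"),
                }),
            )
        return {"started_executions": len(records)}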

IBing Software Solutions Private Limited, Hyderabad, India August 2015 to March 2017

Data Analyst

Responsibilities:

• Performed column mapping and data mapping, and maintained data models and data dictionaries.

• Built a system to perform real-time data processing using Spark Streaming and Kafka (see the sketch after this section).

• Involved in retrieving multi-million records for data loads using SSIS and by querying against Heterogeneous Data Sources like SQL Server, Oracle, Text files, and some Legacy systems.

• Expertise in using different Transformations like Lookups, Derived Column, Merge Join, Fuzzy Lookup, Loop, For Each Loop, Conditional Split, Union all, Script component, etc.

• Transferred data from various data sources/business systems including MS Excel, MS Access, and Flat Files to SQL Server using SSIS/DTS packages using various features.

• Involved in performance tuning of ETL transformations, data validations, and stored procedures.

Environment: SQL, Azure Data Factory, Kafka, SQL Server, MS Excel, Microsoft.
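
A minimal sketch of the Spark Streaming + Kafka processing mentioned above, using the DStream API of that era (pyspark.streaming.kafka, removed in Spark 3.0); the broker and topic names are placeholders, and the spark-streaming-kafka package is assumed to be on the classpath.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="realtime-processing-sketch")
    ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

    # Direct stream from Kafka; each record arrives as a (key, value) pair.
    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["transactions"],                              # placeholder topic
        kafkaParams={"metadata.broker.list": "broker:9092"},  # placeholder broker
    )

    # Simple per-batch processing: count the records in every micro-batch.
    stream.map(lambda kv: kv[1]).count().pprint()

    ssc.start()
    ssc.awaitTermination()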

EDUCATION:

Bachelor of Technology in Electronics and Communication Engineering, Lovely Professional University, 2011-2015.


