
Data Engineer

Location: Nashville, TN


BHARATHI KANTETI

OBJECTIVE

Seeking a challenging position in data engineering and software development, where I can leverage my knowledge of data architecture, software engineering principles, and programming languages to design and develop scalable, high-performance systems. I am passionate about building end-to-end data solutions that enable businesses to make data-driven decisions. I am eager to work with a team of talented engineers to design and implement software systems that solve complex business problems.

PROFESSIONAL SUMMARY

● Pioneered the development of a transpiler that is part of the Cognizant AI migration tool for converting legacy ETL code into cloud-compatible Big Data scripts.

● Analyzed ETL (Extraction, Transformation, and Loading) logic in on-premise ETL tools and developed the equivalent logic in PySpark and Python.

● Created and executed Spark scripts in cloud environments such as AWS Glue and Azure Databricks to efficiently move data from diverse sources into AWS storage.

● Led transpiler development and provided training to colleagues to facilitate their learning and development.

● Proficient in comprehending business requirements and translating them into ETL logic.

● Coordinated and delivered automated scripts for client projects, monitoring progress to ensure timely and complete project delivery.

● Proficient in writing optimized, complex SQL queries, including conditional clauses (WHERE, HAVING), subqueries, joins, aggregators, filters, and grouping data from multiple tables.

● Expertise in AWS Cloud Administration, encompassing services such as EC2, S3, EBS, VPC, ELB, SNS, IAM, Route 53, Auto Scaling, CloudWatch, Security Groups, CloudFormation, and CodePipeline, with a focus on enhancing scalability, reliability, performance, and availability of cloud infrastructure.

● Experience in implementing, updating, and managing infrastructure, such as virtual machines, load balancers, and databases, using Terraform infrastructure-as-code solutions on cloud providers like AWS and Azure.

● Proficient in working with Microsoft Azure Administrator, including configuring availability sets, virtual machine scale set (VMSS) with load balancers, virtual networks, network security groups (NSG), Docker, and Kubernetes.

● Experience in utilizing Azure DevOps tools such as Boards, Repos, Artifacts, and Pipelines for building, testing, and deploying applications, including executing manual and automated tests using test plans.

● Experience in creating conversational agents using IBM Watson, Node.js, and its REST API, integrated with a MongoDB-powered backend system.

CONTACT

571-***-****

adxj24@r.postjobfree.com

linkedin.com/in/bharathi-kanteti

TECHNICAL SKILLS:

Programming Languages: SQL, PySpark, Spark SQL, Python, Node.js REST API, Java

Clouds: AWS, Microsoft Azure

Relational Databases: Oracle, MySQL

Non-Relational Databases: MongoDB

Build Automation Tools: Jenkins

Automation Tools: Terraform, Ansible

IDEs: Visual Studio Code, Eclipse, PyCharm, Jupyter, Spyder

Version Control: Git, Bitbucket, AWS CodeCommit

Monitoring and Alerting: AWS CloudWatch, Azure Monitor, Prometheus, Dynatrace, New Relic, Splunk

Orchestration: Airflow

Containerization: Docker, Kubernetes, AKS, EKS

Code Quality and Security: SonarQube, Veracode Scan

Frameworks and Libraries: Node.js

Documentation Tools: Confluence

Ticketing Tools: Jira, Zendesk, ServiceNow

Methodologies: Agile

CERTIFICATIONS

Microsoft Azure Administrator (AZ-104)

Microsoft Azure DevOps Solutions (AZ-400)

● Good knowledge of integrating GitHub Enterprise with Amazon Web Services, including AWS CodeBuild and AWS CodeDeploy, to provide a streamlined software development workflow.

● Well-versed in Git's core functionality, such as branching and merging. Used GitHub Actions to streamline continuous integration and delivery (CI/CD) and to automate testing, building, deploying code, and managing releases.

● Experience in writing Dockerfiles to build Docker images and run containers. Worked with Docker Registry and Docker Hub to store and distribute Docker images, Docker Compose to define and run multi-container Docker applications, Docker Swarm for container orchestration, and volumes to store and share data between containers.

● Handled continuous integration using Jenkins to automate the building and testing of software every time a commit is made to a shared repository, helping ensure the code is always in a working state and can be deployed to production.

● Accomplished continuous delivery using Jenkins to automate deployment of software to various environments, such as development, testing, staging, and production, which helps reduce errors and enable more frequent releases.

● Experience in using monitoring tools like AWS CloudWatch, New Relic, Splunk, Datadog, and Dynatrace to track system performance, metrics, and logs, and in setting up alerts based on thresholds and conditions.

EDUCATION

● Master's in Computer Science, University of Dayton, Dayton, Ohio, USA (August 2021 - May 2023)

Relevant Coursework: Database Management Systems (data models, entity relationships, relational algebra, normal forms), Advanced Programming and Data Structures using Python, Algorithm Design, Deep Learning, Advanced Computer Vision, Operating Systems, Data Science and Pattern Recognition, Language-Based Security, Web Semantics (SPARQL, GraphDB)

● B.Tech in Electronics and Communication, Jawaharlal Nehru Technological University, Hyderabad, Telangana, India (September 2013 - May 2017)

ACADEMIC PROJECTS

● ETL Pipeline using PySpark (Spring 2023):

As a Master's student, I developed an ETL pipeline using PySpark to process a sample JSON dataset. First, I extracted the data from the source, which was in JSON format, using PySpark to read the JSON files and convert them into a DataFrame. I then transformed the data through operations such as filtering, aggregating, and joining, which included data cleaning and normalization as well as implementing business logic to derive new columns and features. Once the data was transformed, I loaded it into the target system, a relational database, using PySpark to write the transformed data into a database table. To optimize performance, I partitioned the data into smaller chunks and used PySpark's parallel processing to handle them simultaneously. Finally, I tested the ETL pipeline on the sample dataset and validated the results against the expected output.
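
A minimal sketch of the pipeline described above, assuming a local SparkSession, an input path of data/events.json, a MySQL target reached over JDBC, and illustrative column names (user_id, amount, event_ts) that stand in for the actual dataset:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("json-etl").getOrCreate()

    # Extract: read the raw JSON files into a DataFrame
    raw = spark.read.json("data/events.json")

    # Transform: clean, filter, and derive new columns
    cleaned = (
        raw.dropna(subset=["user_id"])
           .filter(F.col("amount") > 0)
           .withColumn("event_date", F.to_date("event_ts"))
    )
    summary = (
        cleaned.groupBy("user_id", "event_date")
               .agg(F.sum("amount").alias("total_amount"))
    )

    # Load: repartition for parallelism and write to the relational target via JDBC
    # (requires the MySQL JDBC driver on the Spark classpath)
    (summary.repartition(8)
            .write.format("jdbc")
            .option("url", "jdbc:mysql://localhost:3306/etl_demo")
            .option("driver", "com.mysql.cj.jdbc.Driver")
            .option("dbtable", "events_summary")
            .option("user", "etl_user")
            .option("password", "etl_password")
            .mode("append")
            .save())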

● Azure Cloud Computing (Fall 2022):

As a master's student, I created Azure resources using Terraform and configured CI/CD pipelines by developing YAML templates to deploy a Node.js application through Azure DevOps build and multistage release pipelines. These pipelines included configurations such as pre-approvers and pipeline variables and targeted Azure Kubernetes Service, Azure Function Apps, and Azure App Services. Additionally, I used Azure DevOps services such as Azure Repos, Azure Boards, and Azure Test Plans to organize tasks and collaborate on code development, builds, and application deployment.

● AWS Cloud Computing (Fall 2022):

As a master's student, I created an AWS infrastructure that included a VPC with subnets, custom route tables, and an internet gateway (IGW). I configured EC2 instances on both Linux and Windows with Elastic Block Storage (EBS), and set up Elastic Load Balancers with target groups, health checks, and SSL certificates on listeners to restrict traffic. After launching the instances, I installed Docker and wrote Dockerfiles to host an NGINX web application.
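
One way the VPC, subnet, and internet gateway portion of this setup could be provisioned programmatically is sketched below with boto3; the region, CIDR ranges, and resource layout are illustrative assumptions rather than the exact configuration used in the coursework:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create the VPC and a public subnet (CIDR ranges are illustrative)
    vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
    vpc_id = vpc["Vpc"]["VpcId"]
    subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")
    subnet_id = subnet["Subnet"]["SubnetId"]

    # Attach an internet gateway and route outbound traffic through it
    igw = ec2.create_internet_gateway()
    igw_id = igw["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

    route_table = ec2.create_route_table(VpcId=vpc_id)
    rt_id = route_table["RouteTable"]["RouteTableId"]
    ec2.create_route(RouteTableId=rt_id, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
    ec2.associate_route_table(RouteTableId=rt_id, SubnetId=subnet_id)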

● Limitless - A Music Database (Spring 2022):

As a master's student, I designed and implemented a highly scalable MySQL database for a course project, built to store and retrieve data about songs performed by multiple artists. First, I created a conceptual data model and drew an ER diagram to represent the relationships between the entities. I then created a logical data model by translating the ER diagram into a set of normalized tables, and used SQL to create the tables, define relationships between them, and enforce referential integrity constraints. After inserting sample data, I wrote complex MySQL queries using WHERE clauses, subqueries, joins, aggregators, filters, and grouping across multiple tables to extract meaningful insights. To optimize performance, I identified columns used frequently in queries and created corresponding indexes, and I implemented security measures to restrict access to sensitive data and prevent SQL injection attacks.
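
A small illustration of the kind of normalized schema and aggregate query this project involved, run here from Python through the mysql-connector driver; the database name and the artists/songs/song_artists tables are hypothetical stand-ins for the actual schema, and the parameterized query reflects the SQL-injection-safe style mentioned above:

    import mysql.connector

    conn = mysql.connector.connect(
        host="localhost", user="app_user", password="app_password", database="limitless"
    )
    cur = conn.cursor()

    # Normalized many-to-many relationship between songs and artists (illustrative)
    cur.execute("""
        CREATE TABLE IF NOT EXISTS artists (
            artist_id INT PRIMARY KEY AUTO_INCREMENT,
            name      VARCHAR(100) NOT NULL
        )""")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS songs (
            song_id INT PRIMARY KEY AUTO_INCREMENT,
            title   VARCHAR(200) NOT NULL
        )""")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS song_artists (
            song_id   INT NOT NULL,
            artist_id INT NOT NULL,
            PRIMARY KEY (song_id, artist_id),
            FOREIGN KEY (song_id) REFERENCES songs(song_id),
            FOREIGN KEY (artist_id) REFERENCES artists(artist_id)
        )""")

    # Parameterized aggregate query: artists with at least a given number of songs
    cur.execute("""
        SELECT a.name, COUNT(*) AS song_count
        FROM artists a
        JOIN song_artists sa ON sa.artist_id = a.artist_id
        GROUP BY a.name
        HAVING COUNT(*) >= %s
        ORDER BY song_count DESC""", (2,))
    for name, song_count in cur.fetchall():
        print(name, song_count)

    conn.close()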

● Object Detection using TensorFlow (Fall 2021):

As a master's student, I developed an Android application as part of a course project, designed to detect cats and dogs with bounding boxes using a TFLite model. To build the dataset, I collected images from a Kaggle dataset and labeled them with bounding boxes using the labelImg tool from GitHub. I then created TFRecords and used the TensorFlow Object Detection API to train the model, starting from the pre-trained SSD MobileNet FPNLite model. After training, the checkpoint files containing the optimized weights were exported for deployment in the Android application.
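
A brief sketch of how a TFLite detection model of this kind can be exercised from Python before being bundled into the Android app; the model path, dummy input, and output tensor ordering are assumptions that vary with how the model was exported:

    import numpy as np
    import tensorflow as tf

    # Load the exported TFLite model (path is illustrative)
    interpreter = tf.lite.Interpreter(model_path="detect.tflite")
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # SSD-style detectors usually expect a fixed-size image batch; use a dummy frame here
    height, width = input_details[0]["shape"][1:3]
    frame = np.zeros((1, height, width, 3), dtype=input_details[0]["dtype"])

    interpreter.set_tensor(input_details[0]["index"], frame)
    interpreter.invoke()

    # Output ordering (boxes, classes, scores, count) depends on the export; check output_details
    boxes = interpreter.get_tensor(output_details[0]["index"])
    scores = interpreter.get_tensor(output_details[2]["index"])
    print(boxes.shape, scores.shape)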

WORK EXPERIENCE

Data Engineer June 2019 - August 2021

Cognizant Technology Solutions, Hyderabad, India

Scope of the Project:

Providing intelligent migration solutions to convert code from one ETL tool to another using a collection of AI/ML techniques. Pre-built components, tools, accelerators, and methods enabled faster conversion with high accuracy and efficiency.

Roles and Responsibilities:

● Worked on Big Data technologies such as Apache Spark, and on cloud services and storage platforms such as AWS Glue, S3, Azure Databricks, and ADLS.

● Used Python for data migration tasks such as reading data from various sources, performing data cleaning and transformation, and writing data into the target data source.

● Created Spark jobs that read and write data from different databases, with transformations such as Expression, Filter, Join, Aggregator, Sorter, Update Strategy, and SQL Overrides.

● Developed Spark transformations by understanding client requirements and their technical documentation.

● Executed Spark scripts in different cloud environments such as AWS Glue and Azure Databricks, connecting to upstream sources like Hive for reading and writing data.

● Worked with Parquet, Avro, CSV, and JSON file formats as file system reads, and explored different data write modes to send data to target locations.

● Responsible for developing and maintaining Directed Acyclic Graphs (DAGs) in Airflow.

● Implemented data validation checks and data cleaning techniques to ensure that the data is of high quality.

● Responsible for monitoring data pipelines to ensure they ran smoothly on a daily basis. Additionally, I performed data reconciliation on daily data loads.

● Documented the processes involved in building and maintaining the data pipeline. This includes documenting data sources, data transformations, and data validation processes.

● Used EMR to run Spark pipelines that ingest data from various sources, such as S3, Hive, and Oracle.

● Set up the configuration for each data pipeline based on the size of data feeds coming from upstream.

● Optimized the time taken by Spark jobs by manipulating cluster configurations.

AI Developer March 2018 - May 2019

Cognizant Technology Solutions, Chennai, India

Scope of the Project:

To build a chatbot application that uses NLP to provide a conversational interface for employees to inquire about company policies, raise tickets, and connect with live agents; to design the chatbot for high availability and scalability using AWS and implement security measures to protect sensitive data; and finally, to test the chatbot to ensure it can handle a high volume of concurrent users without degradation in functionality or performance.

Roles and Responsibilities:

● Developed use cases for the chatbot using intents, entities, and dialog flows in IBM Watson Assistant, used the Watson Conversation API, and automated the data ingestion process into IBM Watson Discovery.

● Web scraped the data from a client’s website using Node.js and ingested the scraped data into MongoDB.

● Built infrastructure services such as Lambda functions, API Gateway endpoints, IAM roles, VPCs, subnets with respective CIDR ranges, and EC2 instances, and configured Elastic Load Balancers (ELB) with target groups, health checks, and SSL certificates on their listeners.

● Configured IAM by creating roles, users, and groups, and implemented MFA to provide additional security for AWS accounts and their resources.

● Configured AWS Lambda to trigger applications based on user requests through the chatbot, such as raising Remedy tickets, and configured Amazon Connect to route chatbot users to live agents.

● Created and configured a GitHub repository with a branching strategy to store the project's source code, set up access controls, and configured hooks and scripts to automate tasks such as code reviews and merge requests; also configured Git repositories for version control.

● Created test scripts for unit testing using Jasmine and Karma.

● Created CI/CD pipelines via AWS CodePipeline, including CodeBuild and CodeDeploy stages. Configured CodeBuild actions in the test stage to run the unit test cases, code quality checks using SonarQube, and vulnerability management tests (SAST and DAST tools such as Veracode Scan) on the frontend and backend source code.

● Developed Dockerfiles with their dependencies to produce Docker images and configured CodeBuild to push the artifacts into Elastic Container Registry (ECR).

● Created Kubernetes deployment manifests referencing the Docker image in the ECR repository and deployed them to an EKS cluster.

● Configured AWS CloudWatch monitoring and set up alerts and notifications to detect instance downtime and trigger the respective Lambda functions to restart services.

● Configured CloudTrail to monitor and detect unusual activity on AWS services.

● Built a confusion matrix on the chatbot responses and created pivot charts based on the analysis of the confusion matrix.
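
A minimal sketch of how such a confusion matrix over chatbot responses can be computed, assuming lists of expected and predicted intent labels; the intent names below are hypothetical:

    from sklearn.metrics import confusion_matrix

    # Expected vs. predicted intents for a batch of test utterances (illustrative labels)
    expected = ["policy_query", "raise_ticket", "live_agent", "policy_query"]
    predicted = ["policy_query", "raise_ticket", "policy_query", "policy_query"]

    labels = ["policy_query", "raise_ticket", "live_agent"]
    matrix = confusion_matrix(expected, predicted, labels=labels)
    print(matrix)  # rows: expected intent, columns: predicted intent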


