
Machine Learning Data Engineer

Location: Overland Park, KS
Posted: September 07, 2023


Laxma Reddy Nalla

+1-913-***-**** | adzjwq@r.postjobfree.com | LinkedIn | GitHub

Programming Skills

Languages and Frameworks: Python, Java, C, C++, PySpark, SQL, Spark, Data Analysis, Databricks, Docker, HDFS, Hive

Web Technologies: HTML, CSS, JavaScript, JSON, XML.

Tools and Utilities: Databricks, IntelliJ, Git, MySQL, VS Code, Snowflake, Bitbucket, Oracle SQL, Postgres, PyCharm, shell, zshell, bash, Hadoop, Apache Spark

Cloud: Azure, GCP, AWS, Azure ML, Distributed computing

File Formats: CSV, Parquet, AVRO, JSON

OS: Windows, Linux, Mac OS

Professional Summary

With nearly 4 years of experience in data engineering and machine learning, I am dedicated to delivering data-driven, action-oriented solutions to complex business challenges. My strong grasp of statistical concepts related to machine learning, including confidence intervals, correlation, probability, and hypothesis testing, enables me to extract insights from complex datasets. Proficient in Python scripting and SQL, I manipulate data from diverse databases while harnessing libraries such as NumPy, Matplotlib, Pandas, and scikit-learn for analysis and visualization. My portfolio includes predictive modeling with frameworks such as Keras, TensorFlow, and Spark ML. I collaborate with business and product managers to translate problems into mathematical models within the business context, and my proficiency in Linux environments and cloud architecture enables me to handle substantial data volumes efficiently in the cloud.

Experience

Everest Reinsurance Warren, NJ (Remote)

Data Engineer Lead April 19, 2023 – Present

Providing data-driven, action-oriented solutions to challenging business problems.

Strong understanding of statistics concepts related to machine learning, including confidence intervals, correlation, significance, probability, distributions, and hypothesis testing. Familiar with collecting data from various databases and cleaning it for statistical analysis models.
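
For illustration, a minimal sketch of this kind of statistical check in Python (the sample values and group names are made up, not taken from any actual project):

```python
import numpy as np
from scipy import stats

# Hypothetical samples, e.g. a metric observed for two customer segments
group_a = np.array([12.1, 11.8, 12.5, 12.0, 11.9, 12.3])
group_b = np.array([12.9, 13.1, 12.7, 13.0, 12.8, 13.2])

# Two-sample t-test: is the difference in means statistically significant?
result = stats.ttest_ind(group_a, group_b)

# 95% confidence interval for the mean of group A
ci = stats.t.interval(
    0.95,
    df=len(group_a) - 1,
    loc=group_a.mean(),
    scale=stats.sem(group_a),
)
print(f"t={result.statistic:.3f}, p={result.pvalue:.4f}, 95% CI for group A mean: {ci}")
```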

Proficient in Python scripting and SQL for data manipulation.

Worked on statistical functions with NumPy, visualization with Matplotlib, Pandas for organizing data, and machine learning frameworks such as scikit-learn, Keras, TensorFlow, and Spark ML.

Used scikit-learn Python packages for predictions.
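
A minimal sketch of the scikit-learn prediction workflow referenced above (the synthetic feature matrix and target are illustrative only; real inputs would come from cleaned business data):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Illustrative feature matrix and target
X = np.random.rand(200, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + np.random.normal(0, 0.1, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("R^2 on held-out data:", model.score(X_test, y_test))
```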

Led the Data Engineering team in implementing a Databricks architecture for data pipelines.

Implemented Change Data Capture using checkpointing and Change Data Feed on top of Delta Lake tables.
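
A hedged sketch of reading a Delta Change Data Feed in Databricks (the table name and starting version are placeholders; `spark` is the active Databricks session, and the source table must have `delta.enableChangeDataFeed` set):

```python
from pyspark.sql import functions as F

# Read the change feed from a checkpointed version onward
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 12)   # placeholder checkpointed version
    .table("silver.policies")        # placeholder table name
)

# Keep only the latest insert/update rows and drop CDC metadata columns
latest = (
    changes.filter(F.col("_change_type").isin("insert", "update_postimage"))
    .drop("_change_type", "_commit_version", "_commit_timestamp")
)
```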

Worked with PySpark DataFrames to cleanse and transform data.

Implemented the medallion architecture to process data in Databricks Delta tables.
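
A simplified sketch of a medallion (bronze/silver/gold) flow in Delta tables; the table names, columns, and transformations are hypothetical, and `spark` is the Databricks session:

```python
from pyspark.sql import functions as F

# Bronze: raw ingested data, stored as-is
bronze = spark.read.table("bronze.claims_raw")

# Silver: cleansed and conformed records
silver = (
    bronze.dropDuplicates(["claim_id"])
    .filter(F.col("claim_amount").isNotNull())
    .withColumn("ingest_date", F.to_date("ingest_ts"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.claims")

# Gold: business-level aggregates for reporting
gold = silver.groupBy("line_of_business").agg(F.sum("claim_amount").alias("total_claims"))
gold.write.format("delta").mode("overwrite").saveAsTable("gold.claims_by_lob")
```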

Strong conceptual and applied knowledge of ML techniques including linear regression, logistic regression, classification, decision trees, clustering, and random forests.

Work with business and product managers to frame problems mathematically within the business context.

Comfortable working in Linux environments, with experience architecting cloud solutions and processing large amounts of data in the cloud.

Advoco Pte Ltd Singapore (Remote)

Data Engineer Dec 2020 – July 2022

Implemented a chatbot for insurance-domain clients using the open-source Rasa NLP platform.

Worked with NLP and NLU to build a chatbot that generates leads for insurance clients.

Attended daily stand-up (Scrum), estimation, and requirement review meetings to analyze requirements for each story card in a sprint.

Worked on an insurance-domain NLP chatbot to build human-like conversational AI.

Built Rasa NLP predictive models to make the chatbot interactive for users.

Wrote Python and PySpark scripts to implement data validation and processing.

Implemented Spark Dataproc clusters in GCP to support distributed file processing.
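
One way to create such a cluster with the google-cloud-dataproc Python client, following the documented client usage; the project, region, cluster name, and machine sizes below are placeholders, not the actual setup:

```python
from google.cloud import dataproc_v1

region = "asia-southeast1"  # placeholder region
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": "my-project",           # placeholder project
    "cluster_name": "spark-etl-cluster",  # placeholder name
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
    },
}

operation = client.create_cluster(
    request={"project_id": "my-project", "region": region, "cluster": cluster}
)
operation.result()  # blocks until the cluster is provisioned
```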

Built a BigQuery data warehousing system to query and load data.
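
A minimal sketch of loading and querying warehouse data with the BigQuery Python client; the project, dataset, table, and GCS URI are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

table_id = "my-project.analytics.leads"  # placeholder dataset/table
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Load Parquet files staged in Cloud Storage into the warehouse table
load_job = client.load_table_from_uri(
    "gs://my-bucket/exports/leads/*.parquet",  # placeholder URI
    table_id,
    job_config=job_config,
)
load_job.result()  # wait for the load to finish

# Query the loaded data
rows = client.query("SELECT COUNT(*) AS n FROM `my-project.analytics.leads`").result()
```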

Worked on GCP to build infrastructure and Dockerized containers for chatbot deployment.

Created a centralized data lake on the Azure cloud platform. Developed data pipelines using Azure Data Factory to process transactional and user profile data from on-premises data warehouses using PySpark and Scala.

Automated Data Factory pipeline deployment to QA and production using GitHub CI/CD pipelines.

Utilized ADLS Gen2 to store data from Data Factory ETL pipeline.

Built Delta Tables on top of Data Lake using Databricks ETL pipelines.

Implemented the medallion architecture to process data in Databricks Delta tables.

Wrote PySpark and Scala transformations for data loaded in Data Lake Storage.

Used the Synapse Dedicated SQL Pool Connector to write processed data from Spark to a dedicated SQL pool.
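
A hedged sketch of the connector's write call; this assumes the Azure Synapse Spark runtime, where the dedicated SQL pool connector exposes a synapsesql() method on the DataFrame writer, and the source and target names are placeholders:

```python
# Runs inside an Azure Synapse Spark pool; `spark` is the session provided by the runtime
processed = spark.read.table("gold.claims_by_lob")  # placeholder processed table

# Write the Spark DataFrame into a dedicated SQL pool table (three-part name)
processed.write.synapsesql("reporting_db.dbo.claims_by_lob")  # placeholder target
```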

NoFrdz Hyderabad, India

Software Engineer May 2020 – May 2021

Responsible for building Python APIs using Flask.
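
A minimal Flask sketch of this kind of API endpoint; the route and payload handling are illustrative, not the actual service:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/v1/users", methods=["POST"])
def create_user():
    payload = request.get_json()
    # In practice the payload would be validated and persisted to MySQL/PostgreSQL
    return jsonify({"status": "created", "user": payload}), 201

if __name__ == "__main__":
    app.run(debug=True)
```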

Implemented and modified SQL queries, functions, stored procedures, cursors, and triggers as per requirements.

Worked with multiple databases, using PyMySQL to connect to MySQL and psycopg2 to connect to PostgreSQL.
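
Illustrative connection snippets for the two drivers; the hosts, credentials, and table names below are placeholders:

```python
import pymysql
import psycopg2

# MySQL via PyMySQL
mysql_conn = pymysql.connect(
    host="mysql.example.internal", user="app", password="***", database="orders"
)
with mysql_conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM orders")
    print(cur.fetchone())

# PostgreSQL via psycopg2
pg_conn = psycopg2.connect(
    host="pg.example.internal", user="app", password="***", dbname="analytics"
)
with pg_conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone())
```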

Developed, stored, and evaluated testing data sets for various application scenarios.

Worked with Postman for API testing.

Prepared and stored testing data sets for various business scenarios.

Maintained and developed a data pipeline for ingesting data from various sources using S3, AWS RDS, and Python boto3.
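
A small sketch of a boto3-based ingestion step; the bucket, object keys, and column names are placeholders:

```python
import boto3
import pandas as pd

s3 = boto3.client("s3")

# Pull a raw extract from S3 into a local file for processing
s3.download_file("raw-ingest-bucket", "exports/2021/orders.csv", "/tmp/orders.csv")

df = pd.read_csv("/tmp/orders.csv")
df = df.dropna(subset=["order_id"])  # simple validation before loading downstream

# Upload the cleaned file back to a curated prefix
df.to_csv("/tmp/orders_clean.csv", index=False)
s3.upload_file("/tmp/orders_clean.csv", "raw-ingest-bucket", "curated/2021/orders.csv")
```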

Utilized continuous integration and automated deployments with Docker Containers and Docker Compose.

Techtuts Hyderabad, India

ML Tutor May 2020 – Aug 2020

Delivered Python lectures via an open-source learning platform.

Delivered lessons on machine learning basics and supervised and unsupervised machine learning models.

Delivered content on linear and logistic regression, decision trees (DT), random forests, PCA, TensorFlow, and Dockerization.

Worked with NumPy, Pandas, and other Python packages.

Education

University of Central Missouri, Lee's Summit, MO

Master's in Computer Science; GPA 3.55/4.0, January 2022 – May 2023

Course Work

Advanced Algorithms, Advanced Database Management Systems, Advanced Operating Systems, Compiler Design, Big Data, Machine Learning, Artificial Intelligence, Advanced Application Programming in Java, Statistical Programming with Python

Guru Nanak Institute of Technical Campus Hyderabad, India

Bachelor's in Computer Science; GPA 7.75/10

Course Work

Operating Systems, Software Engineering, Database Management Systems, Computer Networks, Design and Analysis of Algorithms, Web Technologies, Machine Learning and Pattern Recognition, Cloud Computing, Java.

Certifications:

-Delta Lakehouse Fundamentals

-AI-900 Azure AI Fundamentals: https://bit.ly/3Yx0XSF

-Google Cloud (GCP) Essentials

-AZ-305 Azure Infrastructure Solution Designer: https://bit.ly/3EdhcvR

-DP-203 Azure Data Engineer Associate: https://bit.ly/3K9NawR

-DP-900 Azure Data Fundamentals: https://bit.ly/3jZLPhA

-AZ-900 Azure Fundamentals: https://bit.ly/3K9bJtK

-SC-900 Security, Compliance, and Identity Fundamentals: https://bit.ly/3YyPn9k

-Snowflake Hands-On Essentials: https://bit.ly/3lU9eld


