Data Engineer Machine Learning

Location: Hayward, CA

Posted: September 17, 2024


Resume:

Nikhila Veldurthi, Data Engineer

************@*****.*** 475-***-**** Hayward, CA LinkedIn

Profile

Results-oriented Data Engineer with 4 years of experience designing and implementing ETL processes, optimizing data pipelines, and enhancing business intelligence. Skilled in SQL and Python, with practical knowledge of data modeling, database management, Spark, and data build tools. Proven success implementing scalable cloud solutions on Azure and AWS, along with a solid understanding of machine learning concepts.

Skills

Languages: Python, R, SQL, NoSQL, Scala, JavaScript, Shell

Databases: MS SQL, MySQL, PostgreSQL, MongoDB, Oracle, Cosmos DB, DynamoDB

Data Processing, Streaming & Libraries: PySpark, Apache Kafka, Pandas, NumPy, scikit-learn, Matplotlib, spaCy, Airflow, Snowflake, T-SQL, PL/SQL, SSIS, SSRS, ETL, MS Excel, Azure SQL, Azure Data Factory (ADF), Google Analytics, Databricks, Hadoop, Synapse Analytics, Data Warehouse

Other Tools: Git, GitHub, Docker, CI/CD pipelines, Agile, Jenkins, Power BI, Tableau, IntelliJ, Machine Learning, KNN, Logistic Regression, Decision Trees, JIRA, Django, Flask

Professional Experience

Data Engineer, Civis Analytics | 01/2023 – Present | Remote, USA

Project: Real-Time Customer Behavior Analysis for E-Commerce

Developed a real-time data pipeline to analyze customer behavior on an e-commerce platform, enhancing user experience and increasing sales conversions.

Engineered robust data pipelines using Apache Kafka and Apache Spark (Scala) for real-time data ingestion and processing.

Architected and optimized a Snowflake data warehouse for scalable, efficient data storage and access.

Developed interactive Power BI and Tableau dashboards for real-time visualization of key metrics.

Implemented predictive models for customer churn and product recommendations using Azure Machine Learning Services.

Integrated with AWS services (S3, EC2, IAM) for secure, scalable cloud resources, and automated CI/CD pipelines using Azure DevOps.

Tuned the performance of Spark jobs and implemented monitoring tools for continuous improvement.

Achieved a 15% increase in sales conversions and a 10% reduction in customer churn through real-time data insights and predictive analytics.

Improved operational efficiency by automating data processing and model deployment, enabling rapid iteration and innovation.

Data Engineer, Genpact | 03/2020 – 07/2022 | Hyderabad, India

Project: Scalable Data Integration and Real-Time Analytics Platform

Developed a scalable data integration and real-time analytics platform to streamline data processing and strengthen the company's data-driven decision-making.

Architected and optimized ETL pipelines using PySpark on Azure Databricks, accelerating data processing by 40% and ensuring high data quality across 10+ TB of data from diverse sources.

Integrated machine learning models into pipelines to automate forecasting and customer behavior prediction, improving accuracy by 30% and saving 100+ person-hours per month.

Implemented CI/CD pipelines with Azure DevOps for continuous integration and deployment of data solutions, reducing deployment time by 50% and doubling deployment frequency to meet changing business demands.

Managed Azure-based data warehouse infrastructure (Azure Blob Storage, Azure Synapse Analytics, Azure Data Factory), achieving 99.99% uptime and enabling real-time data access and analysis for 50+ global stakeholders.

Education

Master of Science in Business Analytics, California State University, East Bay | 01/2022 – 12/2023 | California, USA

Bachelor of Science in Computer Science Engineering, Sreenidhi Institute of Science and Technology | 07/2017 – 09/2021 | Hyderabad, India

Certifications

AWS Certified Solutions Architect – Associate
