Data Engineer Machine Learning

Location:
Denton, TX, 76205
Salary:
70000
Posted:
September 15, 2025

Resume:

Sandeep Kumar Nandyala

Data Engineer

+1-940-***-**** | *******.*@**********.*** | Texas 76205 | LinkedIn

SUMMARY

Data Engineer with 3+ years of experience designing, developing, and optimizing scalable data solutions across cloud and on-premises environments. Expertise in SQL, Python, AWS, and building robust ETL/ELT pipelines and data warehousing architectures. Background includes data modeling with star and snowflake schemas, RDBMS concepts, and data migration, with hands-on experience using Apache Spark and Spark SQL in Databricks to develop efficient data pipelines for customer behavior analysis. Applies machine learning techniques to extract insights from structured and unstructured data, and builds interactive dashboards in AWS QuickSight, Power BI, and Tableau to visualize KPIs from Amazon Redshift and enhance business intelligence.

SKILLS

Programming Languages: Python, R, Scala, SQL, NoSQL
Big Data Ecosystem: Apache Spark, Apache Kafka, Apache Nessie, Hadoop, Hive, HDFS, ZooKeeper
Machine Learning/Statistics: Linear/Logistic Regression, Classification, SVM, Random Forests, Naive Bayes, KNN, A/B Testing
Cloud: AWS (EC2, S3, RDS, Lambda, Glue, Athena, AWS Data Pipeline, Redshift, DynamoDB), Azure, GCP
Visualization: Tableau, Power BI, Excel, Alteryx

Packages: NumPy, Pandas, Matplotlib, Seaborn, PySpark, Scikit-learn, TensorFlow
ETL/ELT Tools: Fivetran, SSIS, Informatica, dbt (Data Build Tool), Apache Airflow, Talend, Airbyte
Databases & Tools: GitHub, Git, GitLab, SQL Server, PostgreSQL, Terraform, Cassandra, MySQL, Snowflake

EXPERIENCE

Data Engineer, Blue Cross Blue Shield, USA Jul 2024 – Present

• Deployed automated data pipelines using Apache Airflow to orchestrate and monitor ingestion and transformation tasks, reducing manual intervention by 40% while enhancing data accuracy and overall pipeline efficiency.
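
A minimal sketch of this orchestration pattern (Airflow 2.4+ Python API); the DAG name, schedule, and task callables are hypothetical placeholders rather than the actual pipeline:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    ...  # placeholder: pull raw files from the source system

def transform():
    ...  # placeholder: clean the data and load it into the warehouse

with DAG(
    dag_id="claims_ingest_daily",     # hypothetical name
    start_date=datetime(2024, 7, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    ingest_task >> transform_task     # transform runs only after ingest succeeds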

• Leveraged Azure Data Factory, Azure Synapse Analytics, and Azure Data Lake to design and deploy scalable ETL pipelines and data warehouses, enabling real-time analytics and reducing data latency.

• Engineered an efficient Apache Spark framework in collaboration with cross-functional teams, enhancing structured and unstructured data processing by 30% through optimized partitioning, caching, and Spark configurations.
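
An illustrative PySpark sketch of the partitioning and caching pattern above; the path and column names are assumptions:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("events-processing")
    .config("spark.sql.shuffle.partitions", "200")  # tune shuffle parallelism
    .getOrCreate()
)

events = spark.read.parquet("s3://example-bucket/events/")  # hypothetical path

# Repartition on the join key so downstream joins avoid extra shuffles,
# and cache the hot dataset that several aggregations reuse.
events = events.repartition("customer_id").cache()

daily_counts = events.groupBy("event_date").count()
daily_counts.write.mode("overwrite").parquet("s3://example-bucket/daily_counts/")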

• Built scalable data lakes on Amazon S3 to manage diverse data types, supporting big data analytics, reporting, and machine learning while improving data retrieval speed through effective lifecycle management strategies.
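
For illustration, a boto3 lifecycle rule of the kind used for data-lake tiering; the bucket name, prefix, and day thresholds are assumptions:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                # Move aging objects to cheaper storage classes, then expire them.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)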

• Implemented incremental data models using DBT, tailored for large-scale financial datasets, significantly improving pipeline efficiency, reducing cloud compute costs, and enhancing the timeliness and accuracy of financial reporting and analytics.

• Employed RBAC and masking policies within Snowflake to meet compliance requirements such as GDPR and PCI-DSS, ensuring secure access to personally identifiable and financial data.
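
An illustrative sketch of a masking policy applied from Python with the Snowflake connector; the connection values, role, policy, table, and column are hypothetical, and the DDL follows standard Snowflake syntax:

import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",  # placeholder credentials
    user="example_user",
    password="...",
)
cur = conn.cursor()
# Reveal the column only to an authorized role; everyone else sees a mask.
cur.execute("""
    CREATE MASKING POLICY IF NOT EXISTS mask_ssn AS (val STRING)
    RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('COMPLIANCE_ANALYST') THEN val
           ELSE 'XXX-XX-XXXX' END
""")
cur.execute("ALTER TABLE members MODIFY COLUMN ssn SET MASKING POLICY mask_ssn")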

• Improved ETL workflows through rigorous testing and debugging of SQL scripts and Python code, resulting in a 50% increase in data processing efficiency and ensuring seamless data integration with downstream systems.

Data Engineer, Citus Infotech, India May 2020 – Jul 2022

• Led the migration of legacy reporting systems to Power BI, partnering with stakeholders to define requirements and enhance data visualization, reducing report generation time by 40% and improving decision-making.

• Developed and automated AWS Data Pipeline workflows to streamline data integration from multiple sources, reducing manual intervention by 25% and improving data flow reliability through enhanced error handling and validation processes.

• Leveraged Spark SQL and MLlib to perform complex data analysis and machine learning tasks, generating actionable insights that improved business decision-making and stakeholder engagement.
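
A minimal Spark SQL plus MLlib sketch of this kind of analysis; the table, feature columns, and label are illustrative only:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("churn-analysis").getOrCreate()

# Spark SQL for feature extraction (hypothetical table and columns).
df = spark.sql("""
    SELECT age, monthly_spend, support_calls, churned
    FROM customer_activity
""")

features = VectorAssembler(
    inputCols=["age", "monthly_spend", "support_calls"], outputCol="features"
).transform(df)

model = LogisticRegression(featuresCol="features", labelCol="churned").fit(features)
print(model.summary.areaUnderROC)  # quick check of model quality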

• Secured clusters coordinating distributed Kafka brokers, ensuring high availability and consistency for critical data streams.

• Built various ETL/ELT pipelines using Databricks notebooks and SQL to extract, transform, and load data from multiple sources into the data warehouse and lakehouse environments.
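
An illustrative Databricks-notebook-style ETL step in PySpark; the paths, column names, and Delta lakehouse table are assumptions:

# In a Databricks notebook, spark is the SparkSession provided by default.
orders = (
    spark.read.option("header", True).csv("/mnt/raw/orders/")  # extract (hypothetical path)
    .withColumnRenamed("order_ts", "order_timestamp")          # transform
    .dropDuplicates(["order_id"])
)
orders.write.format("delta").mode("append").saveAsTable("lakehouse.orders")  # load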

• Integrated Airflow with AWS to monitor multi-stage ML workflows, with tasks running on Amazon SageMaker, and contributed to CI/CD solutions using Git and Jenkins for setting up and configuring the big data architecture on AWS.

• Collaborated with Data Governance teams to enforce metadata standards and lineage tracking within Talend Data Catalog, improving data traceability and trustworthiness for internal stakeholders and auditors.

• Streamlined Hive queries by applying partitioning, bucketing, and compression techniques, and implemented MapReduce-based data processing systems to manage large-scale datasets and improve overall workload performance.
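
An illustrative Hive DDL, issued here through Spark's Hive support, showing the partitioning, bucketing, and compression pattern; the table and column names are assumed:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_bucketed (
        order_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE
    )
    PARTITIONED BY (sale_date STRING)            -- prune scans by date
    CLUSTERED BY (customer_id) INTO 32 BUCKETS   -- speed up joins on customer_id
    STORED AS ORC                                -- columnar storage
    TBLPROPERTIES ('orc.compress'='SNAPPY')      -- compression
""")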

• Utilized Google Cloud Platform (GCP) services such as BigQuery, Cloud Storage, and Dataflow to build scalable data pipelines, improving processing efficiency and reducing data latency.
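
A short google-cloud-bigquery sketch of this kind of pipeline step; the project, dataset, and query are placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

query = """
    SELECT event_date, COUNT(*) AS events
    FROM `example-project.analytics.events`
    GROUP BY event_date
"""
job = client.query(query)   # runs asynchronously in BigQuery
for row in job.result():    # blocks until the query finishes
    print(row.event_date, row.events)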

EDUCATION

Master’s in Information Systems and Technology, University of North Texas, Denton, Texas, Aug 2022 – May 2024

Bachelor’s in Computer Science and Engineering, SRM Institute of Science and Technology, Tamil Nadu, India, Jul 2017 – May 2021


