YASHASRI KANCHUKATLA
+1-913-***-**** *********************@*****.*** LinkedIn_Yashasri
Technical Skills:
SCM Tools: Subversion, Git, GitLab, Bitbucket, TFS
Cloud Platforms: AWS, Azure, OpenShift
Build & Release Tools: Bamboo, Jenkins, Hudson, Docker, Anthill Pro, Ant, Maven, Gradle, MSBuild
Orchestration Tools: Ansible, Chef
Infrastructure Tools: Terraform, AWS CloudFormation (CFT), Azure ARM Templates
Container Management: Kubernetes, Docker
Bug Tracking & Testing: JIRA, Bugzilla, JUnit, HP Quality Center
Project Management Tools: MS Project, MS SharePoint, Atlassian Tools, Team Foundation Server, Agile, Scrum, Waterfall
Servers: JBoss, Apache Tomcat, Oracle WebLogic, IBM WebSphere, IIS Server
Scripting & Programming: Shell Script, Bash, Python, PowerShell
Databases: Aurora, DynamoDB, SQL Server 2000/2005/2008, Oracle 9i/10g (PL/SQL), REST APIs
Operating Systems: UNIX, CentOS 6/7, Linux 4/5, Ubuntu, Windows 98/NT/XP/Vista/7/8/10
PROFESSIONAL EXPERIENCE
AWS BIG DATA ENGINEER SEPT 2023 - PRESENT
CGI, Lafayette, Louisiana
Responsibilities
Developing Python scripts to create IAM roles, S3 buckets, EMR clusters, Glue ETL jobs, Glue crawlers, and Lambda functions in Amazon Web Services (AWS) for Big Data applications.
Designing and orchestrating complex data workflows using AWS Step Functions, creating and managing state machines to automate and streamline data processing pipelines.
Implemented serverless computing using AWS Lambda functions to execute code in response to Step Functions state transitions.
Orchestrated Lambda functions within Step Functions to trigger data processing tasks, ensuring efficient and scalable execution (see the illustrative sketch at the end of this section).
Partnered with cross-functional stakeholders to capture evolving data needs and translate them into scalable architectural blueprints.
Engineered resilient infrastructure to support analytics workloads, including ingestion, transformation, and real-time monitoring pipelines.
Designed modular data solutions encompassing pipelines, semantic models, and workflow automation for high-throughput environments.
Built and maintained hybrid data platforms, including data lakes, warehouses, and lakehouses, to accommodate structured and unstructured datasets.
Developed custom scripts and automation routines to streamline data engineering tasks and accelerate delivery cycles.
Conducted performance audits across data systems, identifying bottlenecks and implementing optimizations to enhance throughput and reliability.
Provisioned infrastructure for model deployment and monitoring, enabling seamless integration of analytics into production systems.
Created internal libraries and utilities to support model lifecycle management and reproducible experimentation.
Automated ML workflows using DevSecOps principles, embedding security checks and compliance gates into CI/CD pipelines.
Collaborated with DevOps teams to build integrated solutions that unify data ingestion, transformation, and model deployment pipelines.
Designed and managed APIs and data flows between upstream systems, ensuring seamless communication and data fidelity.
Diagnosed and resolved issues related to data latency, schema drift, and integration failures across distributed systems.
Facilitated integration planning with both technical and business teams, ensuring alignment on data delivery and system interoperability.
Authored detailed documentation for data services, APIs, and infrastructure components to support maintainability and onboarding.
Worked closely with IT and compliance teams to align infrastructure with governance policies and evolving business requirements.
Instituted rigorous data validation and integrity checks across pipelines to uphold enterprise data quality standards.
Tracked project milestones and deliverables using tools like Jira, Confluence, and Azure DevOps to ensure timely execution and visibility.
Integrated Azure Functions and APIs to support event-driven data workflows and external system connectivity.
Developing solutions and offering guidance on code development, integration, and maintenance of Big Data analytics applications and systems to support business needs and meet IT standards.
Maintaining 24x7 highly critical business applications running in a Hadoop production environment.
Leveraging DevOps techniques and practices such as Continuous Integration, Continuous Deployment, Test Automation, Build Automation, and Test-Driven Development (TDD) to enable rapid delivery of end-user capabilities.
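Illustrative only: a minimal boto3 sketch of the Step Functions pattern referenced in the bullets above, in which a state machine invokes a Lambda task for one processing step. The region, state machine name, function name, and ARNs are placeholders, not values from the actual project.

    # Illustrative sketch: register a Step Functions state machine whose single
    # task state invokes a Lambda function for one data-processing step.
    # The region, names, and ARNs below are placeholders, not project values.
    import json
    import boto3

    sfn = boto3.client("stepfunctions", region_name="us-east-1")

    # Amazon States Language definition with one Lambda task state and a retry.
    definition = {
        "Comment": "Illustrative data-processing step",
        "StartAt": "TransformBatch",
        "States": {
            "TransformBatch": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform-batch",
                "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
                "End": True,
            }
        },
    }

    response = sfn.create_state_machine(
        name="example-data-pipeline",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/example-stepfunctions-role",
    )
    print(response["stateMachineArn"])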
AZURE DATA ENGINEER MAY 2021 - JUNE 2023
Mercedes-Benz
Responsibilities
Implemented robust monitoring and alerting frameworks using Azure Monitor, Application Insights, and Log Analytics to proactively identify performance degradation and system anomalies—resulting in improved uptime and faster incident response.
Optimized cloud resource utilization, implementing autoscaling policies and resource tagging strategies, reducing infrastructure costs by up to 30% while maintaining high availability.
Led infrastructure modernization, transitioning legacy applications and environments to ARM template-based deployments with immutable infrastructure practices, ensuring consistency across production and staging environments.
Enforced security and governance standards, implementing role-based access control (RBAC), Azure Key Vault integrations, and CI/CD pipeline security checks, supporting compliance and reducing vulnerabilities.
Streamlined application migrations, leveraging scripts and automated tooling to shift on-prem workloads to Azure VMs and AKS clusters, achieving 90% migration success with zero downtime for critical services.
Automated financial operations workflows, such as escrow management, investor reporting, and receivables processing, improving accuracy and turnaround time.
Engineered reliable, end-to-end data flows using Spark SQL and PySpark notebooks within Databricks, supporting scalable transformation and analytics pipelines (see the illustrative sketch at the end of this section).
Orchestrated and scheduled batch and streaming jobs using Azure Data Factory, ensuring timely and fault-tolerant data delivery across environments.
Developed and maintained dbt models to standardize transformations, enforce documentation, and promote modular, testable data workflows.
Implemented data cataloging, lineage tracking, and governance practices aligned with pharma-compliance requirements, ensuring traceability and regulatory adherence.
Collaborated with analysts to optimize queries and refine schemas, driving consistency in metrics and enabling a unified source of truth across dashboards.
Actively contributed to agile ceremonies, including backlog refinement, sprint reviews, and retrospectives, championing a culture of data craftsmanship and continuous improvement.
Established self-service developer platforms, creating reusable deployment modules and dashboards, boosting developer productivity by approximately 20% and reducing dependency bottlenecks.
Cultivated DevOps best practices, championing documentation, wiki usage, and YAML pipelines, enabling team-wide visibility into build processes and improving onboarding times.
Mentored junior engineers, providing guidance on IaC, CI/CD, and AKS patterns, helping strengthen team capabilities and expanding cross-functional expertise.
Accelerated delivery by approximately 50–75%, enhanced reliability with a 60% reduction in deployment failures, and enabled scalable operations, significantly cutting deployment time, error rates, and operational bottlenecks while strengthening problem-solving and cross-team communication.
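Illustrative only: a minimal PySpark sketch of the kind of Databricks batch transformation referenced above. The source and target paths, column names, and partitioning scheme are assumptions for the example, not the actual pipeline.

    # Illustrative sketch: batch transformation in PySpark writing curated Delta output.
    # Paths, column names, and partitioning are assumptions for the example.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-curation").getOrCreate()

    # Read raw events landed by an upstream ingestion job.
    raw = spark.read.format("delta").load("/mnt/raw/example_events")

    # Standardize the timestamp, stamp a processing date, and drop duplicate events.
    curated = (
        raw.withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("processing_date", F.current_date())
           .dropDuplicates(["event_id"])
    )

    # Write to a curated Delta location partitioned by processing date.
    (
        curated.write.format("delta")
               .mode("overwrite")
               .partitionBy("processing_date")
               .save("/mnt/curated/example_events")
    )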
DATA ENGINEER MAY 2019 - DEC 2019
Creatick Solutions
Responsibilities
Managed Amazon Web Services (AWS) offerings such as EC2, S3, RDS, EMR, IAM, and CloudFormation to build scalable data lake and analytics ecosystems, automating deployments using Python (boto3), Ansible, and CI/CD pipelines.
Designed and automated data ingestion pipelines from diverse sources (databases, APIs, flat files, streaming) into Snowflake using AWS Glue, Lambda, and Snowpipe for real-time and batch loads.
Architected ingestion pipelines in GCP to process unstructured policy documents, sentiment data, and federal energy records using scalable cloud-native components.
Leveraged Dagster and Cloud Functions to orchestrate ETL workflows tailored for climate-related data aggregation and multi-resolution analysis.
Refined BigQuery table structures and query logic to support dynamic policy evaluation and trend identification in renewable energy initiatives.
Established secure API connections with municipal and federal data sources, enabling real-time synchronization and enrichment of climate datasets.
Built analytical models in BigQuery to quantify policy impact, assess replicability across regions, and evaluate energy infrastructure feasibility.
Created automated dashboards and narrative-driven reports to visualize climate data insights and support strategic decision-making.
Applied Retrieval-Augmented Generation (RAG) techniques to enhance policy document search and energy site recommendation systems.
Conducted exploratory analysis to detect correlations between community sentiment and renewable policy adoption rates.
Participated in the design of RAG-based architectures to enable flexible adaptation and domain-specific customization.
Instituted governance protocols and validation checks to ensure integrity and reliability of datasets sourced from public institutions.
Developed health checks and alerting mechanisms to monitor pipeline performance and ensure accurate policy tracking.
Engaged in fast-paced development cycles aligned with climate advocacy goals, contributing to iterative improvements and stakeholder feedback loops.
Integrated Apache Kafka with Snowflake using Kafka Connect and Snowpipe to enable real-time data streaming and ingestion of high-volume event data (see the illustrative sketch at the end of this section).
Built streaming pipelines with Kafka topics to capture, process, and deliver near real-time insights, supporting use cases like customer behavior analytics and operational monitoring.
Built database schemas and pipelines tailored for financial and real estate datasets, enabling consistent reporting and analytics across business units.
Collaborated with research, product, and operations teams to enhance analytical toolkits, streamline workflows, and deliver actionable business insights.
Conducted data analysis on large aggregated datasets (e.g., demographic, geographic) to support strategic planning and improve decision-making.
Partnered with data scientists and analytics teams to deliver reusable data tools and self-service solutions, accelerating ML model development and experimentation.
Collaborated with data scientists and analysts to deliver tailored solutions for advanced analytics, reporting, and machine learning use cases.
Utilized R for statistical analysis and data modeling to complement Python and SQL-based workflows.
Researched and adopted emerging Databricks and Azure capabilities to drive continuous improvement in pipeline performance and scalability.
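Illustrative only: a minimal sketch of the producer side of the Kafka-to-Snowflake flow described above, using the kafka-python client. The broker address, topic name, and event schema are placeholders; downstream loading into Snowflake would be handled by the Kafka Connect sink with Snowpipe, which is not shown.

    # Illustrative sketch: publish JSON events to a Kafka topic that a Snowflake
    # Kafka Connect sink (with Snowpipe) would ingest downstream.
    # The broker address, topic name, and event schema are placeholders.
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    event = {"event_id": "abc-123", "event_type": "page_view", "user_id": 42}

    # Key by user so related events land in the same partition.
    producer.send(
        "customer-events",
        key=str(event["user_id"]).encode("utf-8"),
        value=event,
    )
    producer.flush()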
Education:
Master's in Computer Science - University of Central Missouri, 2025
PROFESSIONAL SUMMARY
AWS Data Engineer and Cloud/DevOps specialist with 4 years of hands-on experience designing and deploying production-grade data pipelines, infrastructure automation, data architecture, and integrations. Proficient in orchestrating ETL/ELT workflows using AWS Glue, Lambda, Step Functions, EMR/Spark, S3, IAM, and Terraform/CloudFormation.
Expert in building real-time and serverless workflows with robust monitoring, security, and cost-optimization measures. Delivered measurable impact by reducing latency, improving resilience, and enabling scalable analytics for clients such as CGI and Mercedes-Benz. Ready to bring end-to-end ownership of cloud data solutions to new opportunities.
Coordinated with clients, development, and testing teams, finalizing business requirements, technical design, development, SIT, and UAT, resulting in delivery of requirements without defects.
Designed and deployed applications using the AWS stack (including EC2, Route 53, S3, ELB, EBS, VPC, RDS, DynamoDB, SNS, SQS, IAM, KMS, Lambda, and Kinesis), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation and OpsWorks, along with security practices (IAM, CloudWatch, CloudTrail).
Built data pipeline automation using AWS Glue and Apache Airflow to automate ETL/ELT (Extract, Transform, Load) processes, streamlining data ingestion and ensuring data quality.
Developed serverless data processing solutions using AWS Lambda, integrating with other AWS services for efficient and cost-effective operations (see the illustrative sketch at the end of this summary).
Orchestrated complex data workflows using AWS Step Functions, enabling scalable and reliable data processing.
Managed and optimized EMR clusters for big data processing, including data ingestion, transformation, and analysis, while adhering to best practices for scalability and performance.
Proficient in implementing IAM policies and access controls to secure data assets, with experience in monitoring and auditing using AWS CloudWatch for compliance and security.
Expert in server builds, installs, upgrades, patches, configuration, and performance tuning in Red Hat Linux and VMware environments.
Experienced in Linux administration, including network-based installation, RAID, LVM, disk quotas, user/group administration, and configuration of DHCP, DNS, NTP, iptables, Nagios, and proxy servers.
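Illustrative only: a minimal sketch of the serverless processing pattern summarized above, in which a Lambda handler reacts to S3 object-created events. The JSON-lines payload format and the placeholder processing step are assumptions for the example.

    # Illustrative sketch: Lambda handler triggered by S3 object-created events.
    # The JSON-lines format and the print placeholder stand in for real
    # transformation/loading logic; no actual project resources are referenced.
    import json
    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]

            # Fetch the newly landed object and parse it as JSON lines.
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            rows = [json.loads(line) for line in body.splitlines() if line.strip()]

            # Placeholder for downstream transformation / loading.
            print(f"Processed {len(rows)} records from s3://{bucket}/{key}")

        return {"statusCode": 200}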