Alekhya Kondrakunta
+1-937-***-**** ******************@*****.***
PROFILE SUMMARY
Data engineer with 5+ years of experience and hands-on expertise in Linux and cloud infrastructure. Proven ability to troubleshoot complex systems and manage data pipelines, with a strong interest in hardware and networking. Passionate about building robust, scalable solutions and collaborating effectively to solve data center challenges.
TECHNICAL SKILLS
•Data Storage: Data Lakes, Data Warehousing, SQL & NoSQL Databases, Cloud Storage
•Data Processing: ETL/ELT Development, Data Pipelines, Batch & Stream Processing, Data Transformation
•Big Data Technologies: Hadoop, Spark, Distributed Computing, Large-scale Data Processing
•Cloud Platforms: Azure, AWS, Google Cloud Platform (GCP)
•Programming Languages: Python, SQL, Java, Scala
•ETL/ELT Tools: Apache NiFi, Informatica, Talend, Apache Airflow, Azure Data Factory, AWS Glue
•Analytics & Reporting: Data Visualization, BI Tools (Tableau, Power BI), Descriptive & Predictive Analytics
•Orchestration: Workflow Automation, Job Scheduling, Data Pipeline Orchestration
•Machine Learning & Artificial Intelligence: Model Deployment, Feature Engineering, Integration with Data Pipelines, TensorFlow, PyTorch, Decision Trees and Random Forests, NLP, SVM, Ethical AI, AI Safety
•DevOps & CI/CD: Version Control (Git), Continuous Integration/Continuous Deployment (CI/CD), Infrastructure as Code (IaC)
•Networking: Data Integration, API Management, Cloud Networking, Data Migration
•Data Quality: Data Profiling, Data Cleansing, Data Validation, Data Lineage
•Data Warehousing: OLAP, Dimensional Modelling, Schema Design, Data Marts
•Google Cloud Platform (GCP): BigQuery, Cloud Storage, Cloud Composer, Pub/Sub, Google Dataform, Cloud Data Fusion, Cloud SQL, Dataproc, Dataflow, GCP REST API Integration
•Database/DBMS: BigQuery, Azure SQL, Amazon Redshift, SQL Server, Oracle, HBase, Hive, DB2
•Orchestration & DevOps: Airflow, Jenkins, GitHub Actions, Cloud Composer, Jira, Confluence
•API Integration: REST APIs, SOAP Webservices, Python requests, Cloud Functions for endpoint execution
WORK EXPERIENCE
ProMedica Apr 2024 - Present
Azure Data Engineer Toledo, Ohio, USA
Designed an effective and sustainable data system for healthcare by processing and analyzing large-scale data using Azure tools, while maintaining a strong focus on infrastructure reliability and troubleshooting.
•Developed data architectures using Azure Data Lake, Azure SQL Database, and Azure Synapse Analytics to support robust healthcare data storage and analytics, and built CI/CD pipelines with Jenkins to facilitate data ingestion and visualization via Power BI for improved decision-making.
•Developed Python scripts to call external REST APIs for ingesting real-time health data; familiar with SOAP for legacy system integration.
•Managed data pipelines with Azure Data Factory to automate ETL processes, ensuring efficient data movement and transformation across healthcare systems and emphasizing data quality for clinical analysis.
•Implemented CI/CD pipelines using Azure DevOps for automated deployment and testing of healthcare data solutions, while leveraging Linux OS environments to support and troubleshoot continuous delivery in compliance with healthcare regulations.
•Utilized Python to create scripts for transforming, cleaning, and preprocessing healthcare data, integrating information from diverse clinical and administrative systems into Azure Data Lake for comprehensive analysis.
•Built scalable data pipelines with Azure services and Hadoop ecosystem tools, ensuring high-performance processing and analytics of healthcare data while maintaining an interest in data center hardware and infrastructure.
•Engineered real-time data processing applications with Spark Streaming to consume healthcare data from Kafka topics (a minimal sketch follows this list), ensuring timely insights for operational decisions through rapid data access in HBase.
•Applied Power BI to design interactive data visualizations and reporting, enabling leadership to make informed, data-driven decisions based on real-time patient metrics and clinical performance indicators.
•Developed and maintained Microservices deployed on Docker and Kubernetes within Linux-based container environments, enhancing the scalability of patient data processing systems and ensuring high availability.
•Optimized ETL workflows using Python, SQL, and Scala to extract, transform, and load large volumes of healthcare data, effectively integrating data across clinical, operational, and financial systems.
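A minimal sketch of the Kafka-consumption pattern referenced in the Spark streaming bullet above, written with Spark Structured Streaming (the current PySpark streaming API). The broker address, topic name, event schema, and storage paths are illustrative assumptions, not the production configuration.

# Minimal sketch: consume healthcare events from a Kafka topic and
# land them in the data lake. All names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("vitals-stream").getOrCreate()

# Assumed event schema, for illustration only.
schema = StructType([
    StructField("patient_id", StringType()),
    StructField("metric", StringType()),
    StructField("value", StringType()),
    StructField("recorded_at", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "patient-vitals")             # placeholder topic
    .load()
    # Kafka delivers the payload as bytes; decode and parse the JSON body.
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Write parsed events as parquet; the checkpoint lets the stream resume
# after restarts without duplicating output files.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/mnt/datalake/vitals")            # placeholder path
    .option("checkpointLocation", "/mnt/checkpoints/vitals")
    .start()
)
query.awaitTermination()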
American Financial Group Apr 2023 - Mar 2024
AWS Data Engineer Cincinnati, Ohio, USA
Designed and implemented scalable data infrastructure using AWS services such as Amazon Redshift, Amazon S3, AWS Glue, and Amazon Kinesis to manage financial data pipelines and enable real-time analytics.
•Engineered robust data architecture solutions with AWS services and monitoring tools like CloudWatch, while incorporating Linux-based troubleshooting techniques to ensure smooth data flow across financial systems.
•Formulated complex SQL queries to extract, aggregate, and analyze financial data from Amazon Redshift and other databases, supporting timely financial reporting and risk analysis.
•Developed Python scripts for transforming, cleaning, and preprocessing large data volumes, ensuring data integrity before loading into Amazon Redshift, and utilized libraries such as Pandas, NumPy, and PySpark for detailed analysis.
•Authored Terraform scripts to automate AWS infrastructure deployments, leveraging serverless architectures to enhance scalability and manage financial data processing in the cloud.
•Employed Docker to containerize financial data applications, ensuring portability and scalability in cloud environments while integrating Linux OS practices for system reliability.
•Utilized Elasticsearch and Kibana for real-time indexing and visualization of financial data, providing actionable insights for data-driven decision-making.
•Enhanced Java/Scala Spark jobs to optimize large-scale financial data analytics and processing, ensuring efficient data handling for reporting and forecasting.
•Leveraged Hadoop ecosystem tools including HDFS, Hive, and MapReduce to process extensive financial datasets, ensuring high-performance analytics and efficient data retrieval for various reporting needs.
•Configured tools such as Sqoop, Impala, and Zookeeper to support distributed data system operations, and implemented real-time data streaming with Kafka, processed via PySpark and orchestrated using Airflow for end-to-end financial data pipelines (an illustrative DAG follows this list).
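An illustrative Airflow DAG for the orchestration pattern described in the final bullet above. The DAG id, schedule, and script paths are hypothetical stand-ins for the actual jobs; the structure (a Spark transformation followed by a Redshift load, with retries) is the point.

# Hypothetical Airflow 2.x DAG: transform with PySpark, then load to Redshift.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="financial_pipeline",          # placeholder name
    start_date=datetime(2023, 4, 1),
    schedule_interval="@hourly",
    catchup=False,
    default_args=default_args,
) as dag:
    # Submit the PySpark transformation job; the script path is assumed.
    transform = BashOperator(
        task_id="transform_transactions",
        bash_command="spark-submit /opt/jobs/transform_transactions.py",
    )

    # Load the cleaned output into Redshift via a COPY wrapper script.
    load = BashOperator(
        task_id="load_to_redshift",
        bash_command="python /opt/jobs/copy_to_redshift.py",
    )

    transform >> load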
Citibank Sep 2021 - Dec 2022
GCP Data Engineer Chennai, India
Citibank is a leading global bank offering a wide range of banking products and financial services to individuals, businesses, and corporations. Contributed to development, resource allocation, maintenance, and cost-effective use of the primary cloud services' features, including developing a data model, maintaining a data warehouse and analytics environment, and writing scripts.
•Designed and implemented scalable data pipelines using Google Cloud Dataflow, Cloud Composer, and Pub/Sub to ingest and process high volumes of real-time transactional and financial data from multiple sources, ensuring data is accessible for timely analytics and reporting across business units.
•Utilized Google BigQuery for analytical querying and cost-effective data warehousing; developed SQL-based transformations using Google Dataform for modular, version-controlled modeling.
•Automated structured and unstructured data ingestion into BigQuery from diverse sources such as Google Cloud Storage, Cloud SQL, and REST APIs, enhancing accessibility for analytics and reporting.
•Created visual ETL pipelines in Cloud Data Fusion, integrating on-premises RDBMS sources such as SQL Server, Oracle, and DB2 with GCP services, ensuring secure and consistent data replication.
•Implemented and managed storage solutions using Google Cloud Storage for staging, archival, and raw data lake setup, enabling efficient pipeline processing and lineage.
•Developed Python-based connectors to call external RESTful APIs for transactional data ingestion, and integrated with legacy systems via SOAP webservices for backward compatibility.
•Built an event-driven architecture using Google Pub/Sub and Cloud Functions, streaming data into real-time operational dashboards and predictive analytics models (a minimal sketch follows this role's technology list).
•Managed orchestration workflows and dependencies using Cloud Composer to schedule ETL jobs, monitor failures, and trigger alerting mechanisms for critical data pipelines.
•Designed dashboards in Looker and Power BI by querying BigQuery datasets, supporting banking operations, forecasting, and executive decision-making.
•Wrote optimized SQL queries for business-critical reporting, reducing latency in querying billions of rows across partitioned and clustered tables.
•Collaborated with cross-functional teams using Jira for sprint planning, issue tracking, and user stories, and maintained technical documentation and architecture diagrams in Confluence.
•Adopted GitHub for source control, implementing version-controlled CI/CD pipelines to deploy GCP infrastructure and Dataform SQL workflows securely.
•Integrated multiple Google Cloud services (Pub/Sub, BigQuery, GCS, Cloud SQL, Vertex AI) into unified, modular pipeline architecture for scalable, fault-tolerant analytics.
•Used Terraform and GCP's Deployment Manager to automate provisioning of infrastructure resources, optimizing cost and ensuring compliance.
Technologies: GCP (BigQuery, Cloud Storage, Pub/Sub, Dataform, Cloud Composer, Data Fusion), SQL, Python, REST, SOAP, GitHub, Jira, Confluence, Looker, Power BI, Terraform
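A hedged sketch of the event-driven Pub/Sub-to-BigQuery flow referenced above, written as a Pub/Sub-triggered Cloud Function (the first-generation background-function signature). The table id and payload fields are assumptions for illustration, not the production schema.

# Hypothetical Cloud Function: decode a Pub/Sub message and stream
# the row into BigQuery. Table and field names are placeholders.
import base64
import json

from google.cloud import bigquery

client = bigquery.Client()
TABLE_ID = "my-project.analytics.transactions"  # placeholder table


def ingest_event(event, context):
    """Pub/Sub-triggered entry point: parse the base64 payload and
    write it to BigQuery via a streaming insert."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    errors = client.insert_rows_json(TABLE_ID, [payload])
    if errors:
        # Raising surfaces the failure in Cloud Logging and triggers retry.
        raise RuntimeError(f"BigQuery insert failed: {errors}")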
Volkswagen India Pvt. Ltd. Jun 2019 - Aug 2021
Data Engineer Bangalore, India
Volkswagen India Pvt. Ltd. is a wholly owned subsidiary of the Volkswagen Group, one of the world's largest automobile manufacturers. Optimized manufacturing processes through predictive analytics and real-time monitoring, reducing waste, enhancing quality control, and enabling more efficient production lines; customer data analysis enabled customized offerings and richer interactions, improving engagement, satisfaction, and loyalty.
•Designed and implemented scalable data pipelines using Apache Kafka, Apache Spark, and Azure Data Factory to process and transform large volumes of data from multiple sources (a minimal batch sketch follows this list), and built and optimized ETL workflows in Azure Synapse Analytics and Azure Data Lake, supporting data-driven decision-making in manufacturing operations, vehicle production, and supply chain management.
•Developed and deployed real-time data streaming solutions using Apache Flink, enabling instant processing of data from production lines and sensors, providing real-time insights into quality control and manufacturing efficiency.
•Implemented and managed data warehouses with Azure SQL Database, ensuring efficient data storage and fast retrieval for key business reports and dashboards used by manufacturing, engineering, and supply chain teams.
•Created and maintained CI/CD pipelines using Jenkins and GitHub Actions for automated deployment of data processing workflows, ensuring seamless integration and continuous delivery of new features for manufacturing data solutions.
•Collaborated with business intelligence teams to integrate Power BI and Tableau for data visualization, providing actionable insights from data pipelines and supporting strategic decisions in vehicle manufacturing.
•Optimized complex SQL queries for high-performance data retrieval from Azure SQL and Google BigQuery, reducing query latency and enabling faster access to critical operational data for production planning and inventory management.
•Leveraged Apache Hive and Apache HBase to manage and query large datasets within the Hadoop ecosystem, optimizing big data analytics for manufacturing performance, supply chain analytics, and vehicle production quality.
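A minimal batch-style PySpark sketch of the sensor-data transformation work described in the first bullet of this role. The input and output paths, column names, and window size are illustrative assumptions.

# Hypothetical batch job: aggregate production-line sensor readings
# into 15-minute quality-control summaries. All names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, window

spark = SparkSession.builder.appName("line-quality").getOrCreate()

# Raw sensor readings landed by the ingestion pipeline (assumed layout).
readings = spark.read.parquet("/data/raw/sensor_readings")

# Aggregate per production line over 15-minute windows to surface drift
# in quality-control metrics.
summary = (
    readings
    .groupBy("line_id", window(col("event_time"), "15 minutes"))
    .agg(avg("torque").alias("avg_torque"),
         avg("temperature").alias("avg_temperature"))
)

summary.write.mode("overwrite").parquet("/data/curated/line_quality")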
EDUCATION
Wright State University Master's, Computer Science
2023 - 2024