BALAKRISHNA MEKALA
Northfield, Illinois USA
331-***-**** ************@*****.***
Objective
Seasoned Data Engineer with 8 years of experience designing, implementing, and optimizing data solutions on AWS, Azure, and GCP. Proven track record of leveraging cloud platforms to manage large-scale data pipelines, ensure data integrity, and enhance performance.
Work Experience
Medline Industries Azure Data Engineer Oct 2023 - Present
Objective: Medline Industries, LP is an American private healthcare company. I design, build, and maintain scalable ETL/ELT pipelines using Azure Data Factory (ADF) and Azure Synapse Analytics, and automate data ingestion and transformation for structured, semi-structured, and unstructured data.
•Designed and deployed scalable ETL/ELT pipelines using Azure Data Factory, Databricks, and Snowflake (GCP), improving data processing speed by 30% for structured and semi-structured data.
•Led the cross-cloud migration of SAP data warehouse to Azure Synapse and Snowflake (GCP), integrating SAS-based metrics and enabling enterprise-wide reporting.
•Developed real-time data pipelines between Azure Synapse and Snowflake (GCP), improving availability and decision-making across business units by 40%.
•Engineered scalable ETL pipelines using Python and Snowflake, processing structured and semi-structured data and improving query performance by 30% through optimized SnowSQL and Hive queries.
•Automated CI/CD workflows by integrating T-SQL with Azure DevOps, reducing deployment time by 25%, and optimized Spark (Java/Scala) jobs to improve data pipeline reliability.
•Designed and optimized cloud data warehouses using Snowflake and Azure Synapse, enabling 99.9% data availability and reducing query execution time by 40%.
•Engineered SQL and NoSQL databases (DynamoDB, Cassandra), ensuring ANSI SQL compliance and improving data retrieval speed by 35%.
•Developed complex SQL queries, optimizing performance and reducing query execution time by 50%.
•Built Spark Streaming applications to process Kafka topic data and store results in HBase, reducing stream latency by 20%. Developed optimized Databricks PySpark jobs for table-to-table transformations (a minimal streaming sketch appears at the end of this section).
•Developed custom aggregate functions using Spark SQL and performed interactive querying.
•Integrated non-relational databases (DynamoDB, Cassandra) into data pipelines, supporting both structured and unstructured data needs.
•Developed and deployed machine learning pipelines with scikit-learn, TensorFlow, and MLflow, improving model scalability and automation and reducing model training time by 15%.
•Developed and deployed RESTful APIs, enabling cross-platform data access and reducing manual reporting efforts by 20%.
•Fetched Twitter feeds for specific keywords of interest using the python-twitter library.
•Involved in the entire project lifecycle, including design, development, deployment, testing, implementation, and support. Managed large datasets using Pandas DataFrames and SQL.
•Secured sensitive credentials by implementing Azure Key Vault with ADF and Databricks, enhancing data security and compliance.
•Used Jira for ticketing and tracking issues and Jenkins for continuous integration and continuous deployment.
•Implemented Azure Data Lake, Azure Data Factory, and Azure Databricks to move and conform data from on-premises to the cloud to serve the company's analytical needs. Developed container-based applications with Docker; worked with Docker images, Docker Hub, Docker registries, and Kubernetes.
•Developed and managed CI/CD pipelines using Bitbucket and Bamboo for seamless integration and delivery of cloud infrastructure and application code.
•Integrated business intelligence tools (Power BI, Tableau), enhancing reporting and data visualization capabilities.
•Automated cloud infrastructure provisioning using Terraform and AWS CloudFormation, reducing manual configuration time by 40% and ensuring secure IAM compliance.
•Developed and executed a migration strategy to move the data warehouse from SAP to Azure Synapse Analytics; developed metrics based on SAS scripts on the legacy system and migrated them to Snowflake (Google Cloud).
•Built Snowflake data marts on GCP for financial and operations teams, reducing query latency by 20% and enhancing multi-source analytics.
•Automated infrastructure provisioning across Azure and GCP using Terraform and CloudFormation, reducing setup time by 40%.
Environment: Azure, Azure Synapse Analytics, Azure Data Factory, Azure Data Lake, CI/CD, Docker, Elasticsearch, ETL, Git, HBase, Hive, Java, Jenkins, Jira, Kafka, Kubernetes, PySpark, Python, SAS, Scala, Snowflake, Spark, Spark SQL, SQL
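For illustration, a minimal sketch of the kind of Databricks/PySpark structured-streaming job described above. The broker address, topic name, event schema, and Delta/checkpoint paths are hypothetical placeholders, not the actual Medline pipeline.

```python
# Sketch: read a Kafka topic with Spark Structured Streaming and land parsed
# records in a Delta table. All names and paths below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-to-delta-sketch").getOrCreate()

# Assumed event schema, for illustration only.
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("sku", StringType()),
    StructField("quantity", DoubleType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "orders")                      # placeholder topic
    .load()
)

parsed = raw.select(from_json(col("value").cast("string"), event_schema).alias("e")).select("e.*")

query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders")  # placeholder path
    .outputMode("append")
    .start("/mnt/delta/orders")                               # placeholder path
)
```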
Arthur J. Gallagher & Co. AWS Data Engineer Oct 2022 - Sep 2023
Objective: Arthur J. Gallagher & Co. (AJG) is an American global insurance brokerage and risk management services firm. Developed serverless workflows using AWS Lambda, Kinesis Data Streams, and Kinesis Firehose. Built and maintained CI/CD pipelines using AWS CodePipeline, CodeBuild, and CodeDeploy for data workflows.
•Ingested and processed high-volume datasets using Java APIs and HBase, streamlining ETL workflows and reducing data transformation time by 20%.
•Automated data ingestion pipelines with AWS Lambda, PySpark, and Scala, enabling seamless integration from APIs, S3, Teradata, and Redshift, and reducing manual intervention by 30%.
•Developed MapReduce programs in Hadoop to generate automated insurance claim reports, reducing manual report preparation time by 40%.
•Built real-time streaming pipelines with Kinesis, Firehose, and Lambda, enabling near real-time analytics and improving decision-making speed by 35% (a minimal Lambda sketch appears at the end of this section).
•Transformed and optimized large datasets using AWS Glue ETL, Redshift, and S3, reducing reporting query execution time by 25%.
•Automated and monitored AWS infrastructure with Terraform for high availability and reliability, reducing infrastructure management time by 90% and improving system uptime.
•Led full SDLC for data engineering projects, ensuring 100% on-time delivery of scalable data pipelines.
•Optimized SQL transformations using SSIS, DTS, and ANSI SQL, reducing ETL execution time by 20% and improving data quality.
•Implemented Continuous Integration/Continuous Delivery (CI/CD) for end-to-end automation of the release pipeline using DevOps tools such as Jenkins.
•Architected and managed AWS cloud infrastructure, including S3, Redshift, EC2, Lambda, and Glue to process large datasets.
•Built and configured Jenkins agents for parallel job execution. Installed and configured Jenkins for continuous integration and performed continuous deployments. Implemented automated data pipelines for data migration, ensuring a smooth and reliable transition to the cloud environment.
•Enhanced security and access control mechanisms, using IAM policies and API Gateway authentication.
•Implemented containerized data applications, using Docker and Kubernetes to streamline deployments.
•Worked on Jenkins pipelines to run various steps, including unit tests, integration tests, and static analysis tools.
•Developed Kibana dashboards based on Logstash data and integrated various source and target systems into Elasticsearch for near-real-time log analysis and end-to-end transaction monitoring.
•Ensured compliance with governance frameworks by managing IAM roles and permissions for data access.
•Developed metrics based on SAS scripts on the legacy system and migrated them to Snowflake (AWS S3).
•Built real-time streaming pipelines using AWS Kinesis, Firehose, and Lambda, reducing decision latency by 35%.
•Developed scalable ETL workflows with AWS Glue, Redshift, and PySpark, optimizing transformation time by 25%.
•Implemented full CI/CD automation using CodePipeline, CodeBuild, and Jenkins, reducing deployment errors by 40%.
•Integrated Qlik Replicate to stream transactional data into AWS S3, supporting near-real-time analytics.
Environment: APIs, AWS, CI/CD, Data Lake, Elasticsearch, ETL, HBase, Java, Jenkins, Kafka, Lambda, MapR, PySpark, Redshift, S3, SAS, Scala, Snowflake, Spark, Spark Streaming, SQL, SSIS, Teradata
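A minimal sketch of a Lambda handler of the kind described above: it decodes Kinesis records and forwards them to a Firehose delivery stream. The delivery stream name and record shape are illustrative assumptions, not the actual AJG configuration.

```python
# Sketch: AWS Lambda handler for a Kinesis event source, forwarding decoded
# records to Kinesis Firehose. Names and payload fields are placeholders.
import base64
import json
import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = "claims-events-delivery"  # placeholder stream name


def handler(event, context):
    records = []
    for rec in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded; assume JSON bodies here.
        payload = json.loads(base64.b64decode(rec["kinesis"]["data"]))
        payload["partition_key"] = rec["kinesis"].get("partitionKey")
        records.append({"Data": (json.dumps(payload) + "\n").encode("utf-8")})

    if records:
        # put_record_batch accepts up to 500 records per call; chunking is
        # omitted for brevity in this sketch.
        firehose.put_record_batch(DeliveryStreamName=DELIVERY_STREAM, Records=records)

    return {"processed": len(records)}
```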
Capital One (via Wipro) GCP Data Engineer Jan 2020 - Aug 2021
Objective: Capital One Financial Corporation is an American bank holding company. Organized, optimized, and secured data storage in Google Cloud Storage (GCS). Managed and optimized databases such as BigQuery, Cloud SQL, Firestore, and Spanner.
•Using Spark, performed various transformations and actions; saved the resulting data back to HDFS and loaded it into the target Snowflake database.
•Ingested and processed data in Azure Data Lake, Azure SQL, and Databricks, enabling seamless cloud data integration.
•Created Data Studio billing reports, optimizing query performance and reducing cloud costs by 15%.
•Migrated Talend workflows across environments with zero downtime, improving operational efficiency by 20%.
•Created ELT pipelines with dbt, optimizing query performance and reducing processing costs. Implemented API-based data ingestion workflows, integrating external financial data sources.
•Designed and optimized data pipelines using BigQuery, Dataproc, and Pub/Sub on GCP, enabling real-time analytics and improving reporting speed by 35%.
•Created Databricks job workflows that extract data from SQL Server and upload the files to SFTP using PySpark and Python. Coordinated with the team to develop a framework for generating daily ad hoc reports and extracts from enterprise data in BigQuery.
•Used Cloud Shell for various tasks and for deploying services. Experienced in tuning Spark applications for the proper batch interval time, parallelism level, and memory usage.
•Used Python libraries such as Beautiful Soup to perform data extraction from websites.
•Experienced with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK.
•Well versed with various aspects of ETL processes used in loading and updating Oracle data warehouse.
•Built a common SFTP download or upload framework using Azure Data Factory and Databricks.
•Enhanced reporting dashboards with Power BI and Google Data Studio, reducing manual reporting effort by 25% and improving financial decision-making speed.
•Managed production Linux environments, ensuring system stability and performance.
•Built real-time streaming pipelines using Pub/Sub, Cloud Dataflow, and BigQuery, processing 2M+ events daily with sub-second latency.
•Used Apache Airflow in a GCP Composer environment to build data pipelines, using operators such as the Bash operator, Hadoop operators, Python callables, and branching operators (a minimal DAG sketch appears at the end of this section).
•Designed BigQuery data models with clustering and partitioning, improving performance and reducing cost by 30%.
•Automated ELT workflows using dbt, Dataform, and Cloud Composer, reducing manual overhead by 40%.
Environment: Azure, Azure Data Lake, Azure Data Factory, Beautiful Soup, BigQuery, Databricks, EC2, EMR, ETL, GCP, HDFS, Oracle, PySpark, Python, S3, Cloud SDK, Snowflake, Spark, SQL, VPC
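A minimal sketch of a Composer/Airflow DAG using the operator types mentioned above, assuming Airflow 2.x import paths and context handling. The DAG id, schedule, task names, and branching rule are hypothetical.

```python
# Sketch: Airflow DAG with Bash, Python, and branching operators.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator


def _load_to_bigquery():
    # Placeholder for the actual load step (e.g., a BigQuery load job).
    print("loading daily extract into BigQuery")


def _choose_path(ds, **_):
    # Branch on the run date: weekend runs skip the load. `ds` is Airflow's
    # built-in "YYYY-MM-DD" date string for the run.
    weekday = datetime.strptime(ds, "%Y-%m-%d").weekday()
    return "skip_load" if weekday >= 5 else "load_to_bigquery"


with DAG(
    dag_id="daily_reporting_sketch",   # placeholder DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting source files")
    branch = BranchPythonOperator(task_id="branch", python_callable=_choose_path)
    load = PythonOperator(task_id="load_to_bigquery", python_callable=_load_to_bigquery)
    skip = BashOperator(task_id="skip_load", bash_command="echo skipping weekend load")

    extract >> branch >> [load, skip]
```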
Onida Electronics Data Engineer Jun 2017 - Dec 2019
Objective: Onida Electronics is an Indian multinational electronics and home appliances manufacturing company. Integrated the data from various sources, such as on-premises databases, third-party APIs, and cloud-native systems. Automated the deployment and testing of data pipelines and analytics models.
•Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop. Implemented AJAX, JSON, and JavaScript to create interactive web screens.
•Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
•Assessed the infrastructure needs for each application and deployed it on the Azure platform. Configured Spark Streaming to get ongoing information from Kafka and store the stream data in DBFS.
•Migrated large datasets from ADLS Gen2 to Databricks using ADF pipelines, improving data processing speed by 35% and optimizing cluster resource usage.
•Managed relational database services in which Azure SQL handles reliability, scaling, and maintenance; integrated data storage solutions. Designed and implemented infrastructure as code using Terraform, enabling automated provisioning and scaling of cloud resources on Azure.
•Designed business intelligence dashboards, using Tableau and Power BI to visualize key research insights.
•Implemented security best practices, ensuring HIPAA compliance and data privacy.
•Designed the business requirement collection approach based on project scope and SDLC methodology.
•Developed Spark scripts and UDFs using both the Spark DSL and Spark SQL for data aggregation and querying, writing data back into the RDBMS through Sqoop (a minimal UDF sketch appears at the end of this section). Used PySpark, Apache Flink, Kafka, and Hive on a distributed Hadoop cluster to build real-time streaming applications.
•Engineered data ingestion pipelines using ADF, Spark SQL, and U-SQL, improving Azure DW loading efficiency by 30%.
•Successfully completed a POC for Azure implementation, with the larger goal of migrating on-premises servers and data to the cloud. Built PySpark code to validate data from raw sources into Snowflake tables.
•Led Agile development efforts, collaborating with cross-functional teams to enhance data accessibility.
•Designed and deployed a Kubernetes-based containerized infrastructure for data processing and analytics, leading to a 20% increase in data processing capacity. Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications.
•Migrated analytics workloads from on-prem Hadoop to Azure Databricks, improving query execution by 35%.
•Built Spark Streaming pipelines using Kafka and Flink, supporting real-time product monitoring.
•Developed batch ingestion from Oracle and Teradata into HDFS using Sqoop; automated processing with Oozie and Hive.
Environment: Azure, Azure Data Lake, CI/CD, Data Factory, EMR, HDFS, Hive, Java, JavaScript, Kafka, Kubernetes, MapReduce, Oozie, Oracle, PySpark, RDBMS, Redshift, S3, Snowflake, Spark, Spark SQL
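A minimal sketch of the Spark UDF plus Spark SQL aggregation pattern described above, with the result written back to an RDBMS over JDBC. The source table, column names, JDBC URL, and credentials are placeholders for illustration.

```python
# Sketch: register a UDF, aggregate with Spark SQL, and write the result
# back to a relational database via JDBC. All names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-aggregation-sketch").getOrCreate()

# Hypothetical source table already available in the metastore (e.g., loaded via Sqoop + Hive).
orders = spark.table("staging.orders")

def value_band(amount):
    # Bucket order values into coarse bands; illustrative logic only.
    if amount is None:
        return "unknown"
    return "high" if amount >= 1000 else "low"

value_band_udf = udf(value_band, StringType())

banded = orders.withColumn("band", value_band_udf(col("order_amount")))
banded.createOrReplaceTempView("banded_orders")

summary = spark.sql("""
    SELECT band, COUNT(*) AS order_count, SUM(order_amount) AS total_amount
    FROM banded_orders
    GROUP BY band
""")

# Write the aggregate back to an RDBMS; connection details are placeholders.
(summary.write
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
    .option("dbtable", "analytics.order_band_summary")
    .option("user", "etl_user")
    .option("password", "REPLACE_ME")
    .mode("overwrite")
    .save())
```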
Technical Skills
•Azure: Microsoft Azure Cloud, Azure Databricks, Azure Data Factory, Azure Synapse Analytics/SQL (Data Warehouse), Logic Apps, Azure Data Lake, Azure Analysis Services, Azure Key Vault
•Databases: Snowflake, MySQL, Teradata, Oracle, MS SQL SERVER, PostgreSQL, DB2, MongoDB, Cassandra
•Big Data Ecosystem: Hadoop Map Reduce, Impala, HDFS, Hive, Pig, HBase, Flume, Storm, Sqoop, Oozie, Kafka, Spark, Zookeeper
•AWS: EC2, Amazon S3, ECS, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, CloudFront, Auto Scaling, CloudWatch, Redshift, Kinesis, Firehose, Lambda, EMR
•Machine Learning: Linear Regression, Logistic Regression, Decision Tree, Random Forest, SVM, XGBoost, PCA, Naïve Bayes, K-Means, LDA, Neural Network, KNN
•Continuous Integration / Containerization: Jenkins, Docker, Kubernetes
•Version Control: Git, SVN, Bitbucket
•Operating Systems: Windows, Linux, Mac OS
•IDE Tools: Eclipse, PyCharm, Jupyter
•Business Intelligence & Reporting: Power BI, Tableau, Google Data Studio
•Google Cloud Platform: GCP Composer, BigQuery, GCS, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Dataflow, Cloud Data fusion, Cloud Pub/Sub
Education
Lindsey Wilson College, USA Master's, Computer Science (Jan 2022 - April 2023)