THARUN KATTULA
DATA ENGINEER
+1-859-***-**** **********@*****.*** www.linkedin.com/in/tharunk9
PROFILE SUMMARY
Data Engineer with 4+ years of experience building scalable data solutions across AWS, Azure, and GCP. Proficient in big data processing (Spark, Kafka, Flink), ETL pipelines, and SQL-based data warehousing. Skilled in Python, Java, and CI/CD automation with Terraform, Docker, and Airflow, ensuring secure, high-performance data ecosystems. Strong understanding of object-oriented programming (OOP) principles for scalable, modular development. Experienced in performance tuning, unit testing, real-time analytics, BI tools (Power BI, Tableau, Looker), and cloud infrastructure for data-driven decision-making.
TECHNICAL SKILLS
Cloud & Data Platforms
• AWS: S3, Redshift, Lambda, EMR, EC2, AWS Glue Studio, Kinesis, AWS Glue Data Catalog, QuickSight, Athena.
• Azure: Azure Data Lake, Synapse Analytics, Databricks, Azure ML Studio, Azure Data Factory.
• GCP: BigQuery, Cloud Dataflow, Cloud Storage, Vertex AI, Dataproc, Data Catalog, Analytics Hub.
• Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake, Vertica, Teradata, Delta Lake.
Databases & Query Optimization
• Relational Databases: MySQL, PostgreSQL, SQL Server, Oracle, DB2, Netezza.
• NoSQL Databases: MongoDB, Cassandra, HBase, DynamoDB.
• SQL Expertise: Advanced Query Optimization, T-SQL, PL/SQL, UDFs, Stored Procedures.
Big Data & Streaming Technologies
• Big Data Processing: Apache Spark, PySpark, Apache Flink, Apache Hadoop, Apache Beam, HDFS, Hue, Impala.
• Streaming & Messaging: Kafka, Kinesis, NATS.
• Distributed Computing: MapReduce, Apache Zookeeper, Cloudera Manager.
• Data Integration & Metadata Management: Databricks, Debezium (CDC), Apache Airflow, Apache NiFi, Airbyte, Atlan, HCatalog.
Data Engineering & ETL Tools
• ETL & Data Pipelines: Informatica, AWS Glue, Talend, DBT, Dataform.
• Data Ingestion & Security: Apache Sqoop, Apache Flume, Impala, Hue, Cloudera Manager, Kerberos.
DevOps, CI/CD & Infrastructure Automation
• Infrastructure as Code (IaC) & Orchestration: Terraform, Docker, Kubernetes, Argo, Tekton.
• CI/CD & Automation: GitHub Actions, GitLab CI, Jenkins, Tekton Pipelines, JIRA, Azure DevOps, Agile, Scrum.
• Version Control & Build Tools: Git, Maven, SBT, CBT.
Programming, AI & Visualization
• Programming Languages: Python, C, Java, Scala.
• Machine Learning & AI: TensorFlow, Scikit-learn, PyTorch, GCP AI Platform, Vertex AI, Azure ML Studio.
• BI & Data Visualization: Tableau, Looker, Power BI, Apache Superset.
• Web & Application Servers: Apache Tomcat, WebLogic, WebSphere.
PROFESSIONAL EXPERIENCE
Tempus AI Chicago, USA
Azure Data Engineer Feb 2024 – Present
• Launched a scalable Azure-based data lakehouse integrating Azure Data Lake, Databricks, and PySpark, centralizing terabytes of healthcare data and accelerating analytics.
• Directed a real-time streaming analytics initiative, leveraging Kafka, NATS & Azure Stream Analytics, slashing data latency by 40% and enhancing healthcare decision-making speed.
• Engineered an end-to-end automated CI/CD pipeline, deploying Azure DevOps, Terraform, and ARM Templates, integrating unit and integration testing frameworks to enhance deployment reliability.
• Formalized robust security frameworks by integrating Azure Key Vault, Cosmos DB, and Data Encryption, safeguarding patient data and achieving HIPAA compliance.
• Charted distributed data processing strategies, harnessing Spark clusters & Hadoop, expediting batch jobs, and enabling real-time patient analytics and predictive modeling.
• Developed containerized data pipelines with Azure Kubernetes Service (AKS) for scalable deployment.
Environment: Azure Data Lake, Azure Stream Analytics, Azure DevOps, Delta Lake, CI/CD, Azure SQL, PySpark, Scala, Git, Azure Cosmos DB, Python, AKS, Azure Data Factory, Terraform, ARM Templates, Kafka, Hadoop, Spark, Databricks.
Alliance Bernstein Nashville, USA
Cloud Data Engineer July 2023 – Jan 2024
• Initiated and led the deployment of a cloud-native data pipeline, integrating AWS Glue, Spark, and RDS, accelerating data ingestion and reducing annual cloud costs.
• Engineered a real-time financial data streaming solution, capitalizing on AWS Kinesis, NATS & Lambda, slashing processing delays by 50% and empowering portfolio risk assessment.
• Expedited big data analytics workflows, leveraging Amazon EMR & Hadoop, optimizing Spark jobs, and tuning complex SQL queries using indexing, partitioning, and query refactoring to reduce query latency.
• Facilitated CI/CD pipeline automation, deploying Terraform, AWS Lambda, and GitHub Actions, centralizing DevOps workflows, and expanding cloud automation efficiency.
• Influenced AWS security best practices, implementing IAM roles, KMS encryption, and Glue Data Catalog, advancing compliance with financial regulations (GDPR, SEC, PCI-DSS).
• Capitalized on S3 Intelligent-Tiering, reducing storage costs by dynamically archiving low-access data.
Environment: AWS Kinesis, Amazon EMR, AWS Lambda, Python, CI/CD, Terraform, Apache Hadoop, Apache Spark, Amazon S3, Amazon RDS, AWS Glue Studio, Informatica, Talend.
Marico Ltd Bangalore, India
GCP Data Engineer July 2021 – Nov 2022
• Formalized a high-speed data analytics platform, architecting Google BigQuery-based solutions, accelerating complex queries, and enabling real-time marketing insights.
• Charted an advanced CRM integration strategy, engineering Cloud Pub/Sub & API Gateway pipelines, expediting CRM data processing for improved customer engagement tracking.
• Engineered machine learning pipelines, building predictive models with Vertex AI, TensorFlow, and Scikit-learn, advancing customer segmentation accuracy.
• Expedited ETL batch processes, optimizing Apache Spark & Presto jobs, reducing processing times by 60%, facilitating near-instant decision-making for supply chain analytics.
• Enabled interactive reporting dashboards, deploying Google Data Studio, centralizing real-time insights for sales, finance, and operations teams.
• Consolidated data governance policies, leveraging GCP Data Catalog & security frameworks, fortifying GDPR compliance & enterprise-wide data observability.
Environment: Google Cloud Platform, Google BigQuery, CRM, Cloud Pub/Sub, Google Data Studio, Apache Spark, GCP Machine Learning, Vertex AI, Scikit-learn, TensorFlow, Apache Hive, Presto, Looker, DBT, Dataform.
Citibank Bangalore, India
Data Engineer Jan 2020 – May 2021
• Advanced Citibank’s enterprise-wide ETL strategy, capitalizing on Apache Spark & Hadoop, expediting transaction processing workflows and enhancing fraud detection.
• Engineered high-availability data pipelines, leveraging SQL Server, PostgreSQL, MySQL, optimizing data retrieval, debugging SQL query bottlenecks, and reducing query execution time.
• Pioneered a scalable data warehousing solution, architecting Snowflake, Redshift, and Teradata, centralizing data access across business units and streamlining BI reporting.
• Administered real-time data pipelines, deploying Apache NiFi, centralizing ingestion workflows, and reducing manual intervention.
• Cultivated advanced financial analytics dashboards, designing Power BI & Tableau solutions, empowering executives with real-time risk assessment insights.
• Orchestrated enterprise-wide data security enhancements, implementing GDPR-compliant data governance policies, enabling seamless regulatory compliance.
Environment: ETL, SQL Server, PostgreSQL, MySQL, Python, Apache Spark, Hadoop, Snowflake, Redshift, Teradata, Apache NiFi, Power BI, Tableau, Azure, Cloudera Manager, HDFS, Hue, Impala, Kerberos.
EDUCATION
Wright State University, Master of Science in Computer Science.