HARINI VALLABHANENI
Data Engineer
Ph: +1-972-***-**** Email: *************@*****.***
LinkedIn: www.linkedin.com/in/harini-v-582b1b358
PROFESSIONAL SUMMARY
4+ years of experience in data engineering, analytics, and business intelligence across the banking and consulting domains.
Expertise in designing and implementing scalable data pipelines using Apache Spark, Kafka, and cloud platforms (AWS, Azure, GCP).
Proficient in ETL/ELT processes, data warehousing, and real-time streaming architectures for large-scale data processing.
Strong programming experience in Python, SQL, and Scala, with advanced knowledge of big data technologies.
Skilled in data modeling, schema design, and optimization techniques for data lakes and data warehouse solutions.
Hands-on experience with cloud-native services including AWS EMR, Redshift, S3, and Azure Data Factory.
Experienced in Google Cloud Platform services including BigQuery, Dataflow, Cloud Composer, and Pub/Sub for enterprise-grade data solutions.
Expertise in implementing CI/CD pipelines using Jenkins, Git, Docker, and Kubernetes for data engineering workflows.
Proficient in data visualization tools including Tableau and Power BI, creating executive dashboards for business insights.
Strong background in data governance, quality assurance, and compliance frameworks in regulated industries.
Excellent communication, problem-solving, and adaptability skills; a collaborative team player passionate about learning innovative technologies.
TECHNICAL SKILLS
Big Data Technologies: Apache Spark, Kafka, Hadoop, HDFS, MapReduce, Apache Airflow, Apache NiFi
Cloud Platforms: AWS (EMR, S3, Redshift, Lambda, Glue, Kinesis), Azure (Data Factory, Synapse, Databricks), GCP (BigQuery, Dataflow, Pub/Sub)
Databases: PostgreSQL, MySQL, Oracle, MongoDB, Cassandra, DynamoDB, Snowflake, Teradata
Data Warehousing: Amazon Redshift, Azure Synapse Analytics, Google BigQuery, Snowflake, Apache Hive
ETL/ELT Tools: Apache Airflow, Talend, Informatica, SSIS, AWS Glue, Azure Data Factory, Pentaho
Streaming Technologies: Apache Kafka, Amazon Kinesis, Azure Event Hubs, Apache Storm, Spark Streaming
Programming Languages: Python, SQL, Scala, Java, R, Shell Scripting, PySpark, HiveQL
Containerization & Orchestration: Docker, Kubernetes, Apache Mesos, Docker Compose
Version Control & CI/CD: Git, Jenkins, GitHub Actions, Azure DevOps, GitLab CI, Bamboo
Data Visualization: Tableau, Power BI, Looker, Grafana, Apache Superset, QlikView, Google Data Studio (now Looker Studio)
Operating Systems: Windows 10, Windows 11, Windows Server, Linux (Ubuntu, CentOS), macOS
Monitoring & Testing: Apache Kafka Manager, Prometheus, Grafana, Great Expectations, pytest, unittest
Development Frameworks: Apache Beam, Dask, Pandas, NumPy, Scikit-learn, TensorFlow, Apache Zeppelin
PROFESSIONAL WORK EXPERIENCE
Citigroup | Data Engineer | New York, NY | Nov 2023 - Present
Responsibilities:
Architected and implemented end-to-end data pipelines processing 500TB+ of daily transaction data using Apache Spark and AWS EMR, reducing processing time by 45% (illustrative pipeline sketch after this section).
Designed real-time streaming solutions using Apache Kafka and Amazon Kinesis for fraud detection systems, improving detection accuracy by 32% (streaming sketch after this section).
Developed scalable ETL workflows using Apache Airflow and AWS Glue, automating data ingestion from 200+ source systems (DAG sketch after this section).
Optimized data warehouse performance on Amazon Redshift through distribution-key and sort-key strategies, achieving a 60% query performance improvement.
Implemented data quality frameworks using Great Expectations and custom validation rules, ensuring 99.9% data accuracy (validation sketch after this section).
Established automated data validation pipelines using Google Cloud Composer (Apache Airflow) for orchestrating complex data quality workflows across multi-cloud environments.
Built automated CI/CD pipelines using Jenkins and Docker for data engineering workflows, reducing deployment time by 70%.
Created comprehensive data lineage and metadata management solutions using Apache Atlas and the AWS Glue Data Catalog.
Developed Python-based data transformation modules for regulatory reporting, ensuring compliance with financial regulations.
Deployed comprehensive monitoring solutions using Google Cloud Operations Suite for real-time pipeline monitoring and alerting across GCP resources.
Established monitoring and alerting systems using CloudWatch and Grafana for proactive pipeline health management.
Collaborated with cross-functional teams to deliver data solutions for risk management and customer analytics initiatives.
Migrated legacy COBOL-based data processing systems to modern cloud-native architecture using AWS services.
Mentored junior engineers on best practices for data engineering and cloud technology implementation.
Environment: Python, PySpark, Apache Kafka, AWS (EMR, S3, Redshift, Glue, Lambda), GCP, Apache Airflow, Jenkins, Docker, PostgreSQL, Git, Tableau
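Illustrative sketch for the Spark/EMR pipeline bullet above: a minimal PySpark batch job under assumed inputs. The S3 paths and column names (txn_id, amount, event_ts) are hypothetical placeholders, not the production pipeline.

```python
# Minimal PySpark batch pipeline sketch: read raw transactions, clean and
# deduplicate them, and write date-partitioned Parquet for downstream jobs.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-transaction-pipeline").getOrCreate()

# Read the raw daily extract (placeholder bucket/path).
raw = spark.read.parquet("s3://example-bucket/raw/transactions/")

# Standardize types, derive a partition column, and drop duplicate records.
cleaned = (
    raw.withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .withColumn("txn_date", F.to_date("event_ts"))
       .dropDuplicates(["txn_id"])
       .filter(F.col("amount").isNotNull())
)

# Partitioning by date lets downstream queries prune partitions cheaply.
cleaned.write.mode("overwrite").partitionBy("txn_date").parquet(
    "s3://example-bucket/curated/transactions/"
)
```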
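A minimal Spark Structured Streaming sketch in the spirit of the Kafka fraud-detection bullet, assuming the spark-sql-kafka connector is on the classpath; the broker address, topic name, schema, and the naive amount threshold are all illustrative stand-ins for a real scoring model.

```python
# Structured Streaming sketch: consume a Kafka topic, parse JSON payloads,
# and flag events, standing in for a real fraud-scoring step.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("fraud-stream").getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
    .option("subscribe", "transactions")                # placeholder topic
    .load()
)

# Kafka delivers bytes; cast the value to string and parse the JSON payload.
events = (
    stream.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Naive threshold rule as a placeholder for the real fraud model.
flagged = events.withColumn("suspicious", F.col("amount") > 10000)

# Console sink for the sketch; production would write to Kafka, S3, or a DB.
query = flagged.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```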
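A minimal Airflow 2.x DAG sketch matching the orchestration bullets above; the dag_id, schedule, and task callables are hypothetical, and real tasks would trigger Glue jobs or Spark submits rather than print statements.

```python
# Minimal Airflow DAG sketch: a daily extract -> load dependency chain.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull one day's data from a source system.
    print("extracting for", context["ds"])


def load(**context):
    # Placeholder: load transformed data into the warehouse.
    print("loading for", context["ds"])


with DAG(
    dag_id="daily_ingest_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Load runs only after a successful extract.
    extract_task >> load_task
```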
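A small validation sketch for the data-quality bullet, assuming the legacy (v2-style) great_expectations Pandas API; newer releases organize expectations into suites and checkpoints instead. Column names and bounds are illustrative.

```python
# Data-quality sketch: wrap a DataFrame and assert simple expectations.
import pandas as pd
import great_expectations as ge

df = pd.DataFrame({
    "txn_id": ["t1", "t2", "t3"],
    "amount": [10.0, 25.5, 99.9],
})

gdf = ge.from_pandas(df)

# Each expectation returns a result object with a boolean `success` flag.
r1 = gdf.expect_column_values_to_not_be_null("txn_id")
r2 = gdf.expect_column_values_to_be_between("amount", min_value=0,
                                            max_value=1_000_000)
print(r1.success, r2.success)
```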
Deloitte | Data Analyst/Data Engineer | India | May 2020 - Mar 2023
Responsibilities:
Engineered robust data pipelines using Apache Spark and Azure Data Factory, processing 100TB+ of monthly client data with a 25% efficiency improvement.
Developed comprehensive ETL processes using SSIS and Talend, integrating data from diverse sources including SAP, Oracle, and flat files.
Implemented data lake architecture on Azure using Databricks and Delta Lake, enabling advanced analytics for multiple client engagements (Delta Lake sketch after this section).
Created interactive dashboards and reports using Power BI and Tableau, delivering actionable insights to C-level executives.
Optimized SQL queries and stored procedures for data extraction and transformation, reducing execution time by 40%.
Designed dimensional data models and star schema for data warehouse solutions supporting business intelligence requirements.
Automated data quality checks and validation processes using Python scripts, ensuring 99.5% data integrity across projects (quality-check sketch after this section).
Collaborated with business analysts to translate requirements into technical specifications for data solutions.
Maintained version control using Git and implemented code review processes for data engineering team.
Performed data profiling and analysis to identify trends, patterns, and anomalies for client business decisions.
Environment: Python, SQL, Apache Spark, Azure (Data Factory, Databricks, Synapse), Power BI, Tableau, SSIS, Talend, Oracle, SQL Server, Git, Excel
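Illustrative Delta Lake landing sketch for the data-lake bullet above, assuming a Databricks or delta-spark-enabled Spark session; the mount paths and table name are hypothetical.

```python
# Delta Lake sketch: land a raw CSV extract as a Delta table so downstream
# analytics get ACID writes and time travel.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-landing").getOrCreate()

# Placeholder input path (e.g., an ADLS mount on Databricks).
df = spark.read.option("header", True).csv("/mnt/raw/clients.csv")

# Write as Delta; overwrite keeps the sketch idempotent for reruns.
df.write.format("delta").mode("overwrite").save("/mnt/lake/clients")

# Register the location as a queryable table (name is illustrative).
spark.sql(
    "CREATE TABLE IF NOT EXISTS clients USING DELTA LOCATION '/mnt/lake/clients'"
)
```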
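A minimal sketch of the scripted quality checks mentioned above; the input file, key column, and required columns are placeholder names.

```python
# Data-integrity sketch: report duplicate keys and nulls in required columns.
import pandas as pd


def check_integrity(df: pd.DataFrame, key: str, required: list) -> dict:
    """Return simple counts a pipeline can compare against thresholds."""
    return {
        "row_count": len(df),
        "duplicate_keys": int(df[key].duplicated().sum()),
        "null_counts": {c: int(df[c].isna().sum()) for c in required},
    }


if __name__ == "__main__":
    df = pd.read_csv("client_extract.csv")  # placeholder extract
    print(check_integrity(df, key="client_id", required=["client_id", "region"]))
```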
EDUCATION
Master of Science in Business Analytics, East Texas A&M University, Commerce, TX, USA.
Bachelor of Technology in Electronics and Communication Engineering, Sasi Institute of Technology (JNTU Kakinada), India.