SUDHAKAR ANNAM
*********@*****.***
PROFESSIONAL SUMMARY:
Results-driven Lead Data Engineer / ETL Developer with 14+ years of experience designing, building, and leading enterprise-scale data integration and analytics solutions.
Strong expertise in ETL/ELT, Big Data, Cloud (AWS & Azure), Ab Initio, PySpark, Spark Scala, and SQL.
Proven ability to modernize legacy ETL platforms, optimize high-volume pipelines, and lead offshore/onshore teams in banking and financial services environments.
Adept at translating business requirements into scalable, secure, and high-performance data architectures.
Strong knowledge of data warehousing concepts, fact and dimension tables, slowly changing dimensions, and dimensional modeling (star and snowflake schemas).
Experienced in requirements gathering, system analysis, resolving business and technical issues, and communicating with both business and technical users.
Hands-on experience across the complete Software Development Life Cycle (SDLC) using Agile and hybrid methodologies.
Extensive experience developing Kafka producers and consumers that stream millions of events per minute using PySpark, Python, and Spark Streaming (a streaming sketch follows this summary).
Experience analyzing data across the big data ecosystem, including HDFS, Hive, HBase, ZooKeeper, Pig, and Sqoop.
Designed ETL using Hive internal/external tables, storing data in Parquet format for efficiency.
Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments with both traditional and non-traditional source systems.
Wrote Python scripts using pandas and NumPy to read and write large CSV files, perform data aggregations, and compare data column by column (a pandas sketch follows this summary).
Involved in the Agile Development process (Scrum and Sprint planning).
Created various models for predictive analysis using machine learning with Python/Spark.
Developed multiple Spark jobs in Python for data cleaning and preprocessing.
Implemented data integrity checks in the load process and between database distribution layers.
Integrated Hadoop, Hive, and AWS S3 with Ab Initio to streamline data processing and analytics workflows.
Involved in the complete software development lifecycle, from business analysis through development, testing, deployment, and documentation, using both waterfall and Agile methodologies.
Hands-on experience writing wrapper shell scripts to initialize variables, run graphs, and perform error handling.
Strong experience in data analysis using SQL, Power BI, and MS Excel.
Strong experience with SQL database programming and performance tuning of complex SQL queries.
Strong communication, analytical, and problem-solving skills for client-facing roles and end-user interactions.
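Illustrative sketch: a minimal PySpark Structured Streaming consumer of the kind described in the Kafka bullet above. The broker address, topic, event schema, and output paths are assumptions for illustration, not details from a specific engagement.

```python
# Minimal PySpark Structured Streaming consumer for a Kafka topic.
# Requires the spark-sql-kafka package on the classpath; the broker,
# topic, schema, and paths below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-events").getOrCreate()

# Hypothetical schema for JSON event payloads.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
       .option("subscribe", "txn-events")                  # assumed topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; cast and parse the JSON value.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Land parsed events as Parquet; the checkpoint enables restart recovery.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/curated/txn_events")       # assumed path
         .option("checkpointLocation", "/chk/txn_events")  # assumed path
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()
```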
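Illustrative sketch: a minimal pandas version of the CSV aggregation-and-compare scripts described above; file names, key column, and measure are assumptions.

```python
# Minimal pandas sketch: read two large CSVs in chunks, aggregate by key,
# and compare the aggregates column by column. Names are assumptions.
import pandas as pd

def aggregate(path: str) -> pd.DataFrame:
    # Stream the file in chunks so a large CSV never sits fully in memory.
    chunks = pd.read_csv(path, chunksize=500_000)
    parts = [c.groupby("account_id", as_index=False)["amount"].sum() for c in chunks]
    return pd.concat(parts).groupby("account_id", as_index=False)["amount"].sum()

source = aggregate("source_extract.csv")  # assumed file
target = aggregate("target_extract.csv")  # assumed file

# Outer-join on the key; flag keys missing on one side or with differing sums.
cmp = source.merge(target, on="account_id", how="outer",
                   suffixes=("_src", "_tgt"), indicator=True)
mismatches = cmp[(cmp["_merge"] != "both") |
                 (cmp["amount_src"] != cmp["amount_tgt"])]  # exact compare; real checks may use a tolerance
mismatches.to_csv("recon_mismatches.csv", index=False)
```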
CORE TECHNICAL SKILLS:
Data Engineering & ETL: Ab Initio (GDE, Co>Op, EME, PDL, Metadata Hub), SSIS, Informatica (exposure), AWS Glue, Azure Data Factory
Big Data & Streaming: PySpark, Spark Scala, Apache Spark, Hadoop (HDFS, Hive, Sqoop), Kafka, Spark Streaming
Programming & Scripting: Python, SQL, Shell Scripting
Cloud Platforms: AWS (S3, Redshift, Glue, Lambda), Azure (ADF)
Databases & Warehousing: Oracle, Teradata, MSSQL, Hive
Reporting & Analytics: Power BI
DevOps & Methodologies: Agile/Scrum, GitHub, CI/CD, Automation, Data Quality Frameworks
PROFESSIONAL EXPERIENCE:
Senior Technical Lead Engineer Virtusa Consulting Services – Citibank Tampa, FL
Oct 2023 – Present
Lead the design and development of large-scale Ab Initio ETL pipelines processing high-volume financial data with strict SLA requirements.
Architect and optimize complex batch and near-real-time ETL workflows, improving performance and scalability by 30%.
Own end-to-end Ab Initio infrastructure setup across Dev, QA, UAT, and Production environments.
Implement and govern Metadata Hub solutions to derive end-to-end business and technical data lineage.
Drive SSIS to Ab Initio migration initiatives, standardizing enterprise ETL frameworks and coding standards.
Modernize legacy ETL workloads by migrating to PySpark and Spark Scala, reducing batch execution time by 35%.
Design and implement Python-based automation frameworks for data quality checks, reconciliation, and monitoring.
Build and support real-time ingestion pipelines using Kafka and Spark Streaming for fraud detection use cases.
Contribute to AWS-based cloud data solutions, optimizing S3 storage, compute usage, and overall cost by 20%.
Perform code reviews, performance tuning, and root cause analysis for production issues.
Lead Agile ceremonies, mentor junior engineers, and coordinate with global onshore/offshore teams.
Perform performance tuning in Databricks (cluster sizing, caching, Z-ordering, vacuuming) to reduce job runtimes and improve query efficiency (a Delta Lake tuning sketch follows this section).
Develop Python scripts and SQL, T-SQL, and PL/SQL queries for automation, validation, and custom utilities.
Build and maintain data pipelines using Azure Data Factory (ADF), Databricks, Hive, Sqoop, and Spark to move, transform, and process large datasets.
Set up and manage Databricks workspaces and clusters across Dev, Test, and Prod with autoscaling, runtime tuning, and workload-specific configurations to reduce costs while improving performance.
Leverage Hadoop components like HDFS (storage) and YARN (resource management) for handling large datasets efficiently.
Work closely with business users, architects, and offshore teams to plan tasks, deliver on time, and share knowledge.
Perform proofs of concept on Ab Initio Metadata Hub operational/business lineage using SQL Server.
Environment: Ab Initio GDE, VS Code, GitHub, Lightspeed, PySpark, Python, Control Centre, HDFS, Hive
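Illustrative sketch: Delta Lake maintenance commands of the kind behind the Databricks tuning bullet above; the table and Z-order column names are assumptions.

```python
# Delta Lake maintenance on Databricks: compaction, Z-ordering, vacuuming,
# and caching. Table and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate rows on a frequently filtered column.
spark.sql("OPTIMIZE finance.txn_events ZORDER BY (account_id)")

# Remove data files no longer referenced by the table (7-day retention).
spark.sql("VACUUM finance.txn_events RETAIN 168 HOURS")

# Cache a hot dimension table for repeated joins within the job.
spark.table("finance.dim_account").cache().count()
```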
Senior Software Engineer Virtusa Consulting Services – Citibank Hyderabad, India
Feb 2020 – Oct 2023
Designed and developed complex Ab Initio graphs for large-scale banking and regulatory data transformations.
Analyzed business requirements and translated them into scalable ETL design specifications.
Participated in multiple proof-of-concept initiatives, including SSIS-to-Python conversion frameworks.
Built reusable ETL components and templates to improve development productivity and consistency.
Carried out data quality checks using PySpark and Hive to ensure data matched between source and target systems (a reconciliation sketch follows this section).
Built and supported data pipelines using ADF, Hive, Sqoop, and Spark to move and transform large volumes of data.
Developed and optimized PySpark pipelines integrated with enterprise data platforms.
Leveraged Spark Scala for distributed joins, aggregations, and performance tuning.
Implemented data validation, reconciliation, and exception handling mechanisms.
Supported UAT and production deployments, ensuring smooth releases and minimal downtime.
Collaborated with BI teams to deliver Power BI dashboards and analytics datasets.
Actively participated in Agile/Scrum ceremonies, sprint planning, and retrospectives.
Provided production support, defect fixes, and performance optimizations.
Environment: Ab Initio GDE, VS Code, GitHub, Lightspeed, SQL, SSIS, Unix Shell scripting, Python, Control Centre, HDFS, Hive, Azure Databricks
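Illustrative sketch: a minimal PySpark version of the source-to-target checks described in this section; the table names and hashing approach are assumptions.

```python
# Minimal PySpark source-to-target reconciliation: row counts plus a
# full-row hash diff. Table names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

src = spark.table("staging.customer")  # assumed source table
tgt = spark.table("curated.customer")  # assumed target table

# Row-count check.
src_n, tgt_n = src.count(), tgt.count()
assert src_n == tgt_n, f"row counts differ: {src_n} vs {tgt_n}"

def hashed(df):
    # Hash every row over a stable column order. Note: concat_ws skips
    # nulls, so a production check would null-guard each column first.
    return df.select(F.sha2(F.concat_ws("||", *sorted(df.columns)), 256).alias("h"))

only_in_src = hashed(src).subtract(hashed(tgt))
only_in_tgt = hashed(tgt).subtract(hashed(src))
print("src-only rows:", only_in_src.count(), "tgt-only rows:", only_in_tgt.count())
```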
Senior Technical Lead Virtusa Consulting Services – BMO (Bank of Montreal) Hyderabad, India
Feb 2018 – Jan 2020
Built reusable Ab Initio components and parameterized frameworks to improve delivery consistency.
Developed and optimized PySpark pipelines integrated with Hive for analytics use cases.
Used Spark Scala for distributed joins, aggregations, and performance tuning.
Implemented data validation, balancing, reconciliation, and exception handling mechanisms.
Supported UAT and production releases, ensuring minimal downtime.
Responsible for designing table architectures to fulfill requirements.
Responsible for troubleshooting, identifying, and resolving data problems.
Experienced in Databricks, Hive SQL, Azure CI/CD pipelines, Delta Lake, the Hadoop file system, and Snowflake.
Leveraged Azure Databricks notebooks for interactive data exploration, visualization, and analysis.
Created Databricks notebooks using Python (PySpark) and Spark SQL to transform data stored in Azure Data Lake Storage Gen2 from Raw to Stage and Curated zones (a sketch follows this section).
Developed multiple Spark jobs in Python for data cleaning and preprocessing.
Environment: Ab Initio GDE, VS Code, Unix Shell scripting, Python, Control Centre, HDFS, Hive, Oracle, Spark SQL, Apache Spark DataFrames
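Illustrative sketch: the Raw-to-Stage-to-Curated flow described above, with an assumed storage account, container, and schema.

```python
# Sketch of a Raw -> Stage -> Curated flow on Azure Data Lake Storage Gen2.
# The storage account, container, and column names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
base = "abfss://datalake@mystorageacct.dfs.core.windows.net"  # assumed account

# Raw zone: land the source CSV as-is.
raw = spark.read.option("header", True).csv(f"{base}/raw/transactions/")

# Stage zone: apply types and drop exact duplicates.
stage = (raw.withColumn("amount", F.col("amount").cast("double"))
            .withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
            .dropDuplicates())
stage.write.mode("overwrite").parquet(f"{base}/stage/transactions/")

# Curated zone: business-level aggregate, partitioned for downstream reads.
curated = (stage.groupBy("txn_date", "account_id")
                .agg(F.sum("amount").alias("daily_amount")))
(curated.write.mode("overwrite")
        .partitionBy("txn_date")
        .parquet(f"{base}/curated/daily_transactions/"))
```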
Data Engineer Tata Consultancy Services – JPMorgan Chase Hyderabad, India
Jul 2015 – Feb 2018
Designed and developed Ab Initio ETL solutions handling high-volume, multi-source financial data.
Created detailed technical designs and mapping documents for ETL implementations.
Built PySpark pipelines integrated with Hive for enterprise analytics use cases.
Used Spark Scala for distributed data processing on billions of financial records.
Optimized ETL workflows through performance tuning and parallelization techniques.
Implemented data quality checks, balancing, and reconciliation reports.
Collaborated with business analysts, QA, and production support teams.
Supported daily batch operations and resolved production issues within SLA timelines.
Participated in code reviews and ensured adherence to development standards.
Assisted in data migration and platform modernization initiatives.
Environment: Ab Initio GDE, Unix Shell scripting, Python, Control Centre, Oracle, Spark SQL
ETL Developer Cognizant Technology Solutions – American Express Hyderabad, India
Dec 2010 – Jul 2015
Designed and developed Ab Initio ETL workflows for enterprise data warehousing and reporting systems.
Analyzed source-to-target mappings and implemented complex transformations.
Tuned SQL queries and Ab Initio graphs to improve batch performance.
Developed reusable ETL components and parameterized frameworks.
Conducted unit testing and integration testing for ETL jobs.
Supported production deployments and resolved data-related issues.
Worked closely with onsite-offshore teams to meet Agile sprint commitments.
Ensured data accuracy, completeness, and consistency across systems.
Maintained ETL documentation and technical specifications.
Participated in performance optimization and capacity planning activities.
Well versed in Ab Initio parallelism techniques; implemented graphs using data parallelism and Multi File System (MFS) techniques.
Integrated Hadoop, Hive, and AWS S3 with Ab Initio to streamline data processing and analytics workflows.
Involved in the complete software development lifecycle, from business analysis through development, testing, deployment, and documentation, using both waterfall and Agile methodologies.
Hands-on experience writing wrapper shell scripts to initialize variables, run graphs, and perform error handling (a Python analogue is sketched after this section).
Environment: Ab Initio GDE, Unix Shell scripting, Sybase, Control Centre, Oracle.
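The wrapper scripts referenced above were Unix shell scripts; a minimal Python analogue of the same control flow (initialize parameters, run the graph, trap and report errors) is sketched below. The run command and paths are assumptions.

```python
# Python analogue of a graph wrapper: initialize variables, launch the
# graph, and handle errors. The command and paths are assumptions.
import os
import subprocess
import sys

def run_graph(target: str, run_date: str) -> int:
    env = os.environ.copy()
    env["RUN_DATE"] = run_date          # parameter the graph reads
    env["AI_SORT_MAX_CORE"] = "262144"  # example tuning variable

    # 'air sandbox run' is a common way to launch a deployed plan/graph;
    # treat the exact command as an assumption here.
    result = subprocess.run(["air", "sandbox", "run", target],
                            env=env, capture_output=True, text=True)

    if result.returncode != 0:
        # Error handling: log stderr and propagate a nonzero exit code.
        sys.stderr.write(f"{target} failed (rc={result.returncode}):\n{result.stderr}\n")
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_graph("/sandbox/mp/load_customers.plan", "2024-01-31"))
```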
EDUCATION:
Bachelor of Computer Science Engineering
Jawaharlal Nehru Technological University (JNTU), Hyderabad – 2009
CERTIFICATIONS:
AWS Certified Solutions Architect – Associate
Ab Initio Certifications (L0, L1, L2 – Internal)
Python Certified Associate Programmer (PCAP-31-03)
Artificial Intelligence & Career Empowerment – University of Maryland
PySpark / Spark Scala (Internal – L0, L1)
AWS Cloud Assessment, Agentic AI & Gen AI Assessments (Internal)