SUDHAKAR ANNAM
*********@*****.***
PROFESSIONAL SUMMARY:
Results-driven Lead Data Engineer / ETL Developer with 14+ years of experience designing, building, and leading enterprise-scale data integration and analytics solutions.
Strong expertise in ETL/ELT, Big Data, Cloud (AWS & Azure), Ab Initio, PySpark, Spark Scala, and SQL.
Proven ability to modernize legacy ETL platforms, optimize high-volume pipelines, and lead offshore/onshore teams in banking and financial services environments.
Adept at translating business requirements into scalable, secure, and high-performance data architectures.
Strong knowledge of data warehousing concepts, fact and dimension tables, slowly changing dimensions, and dimensional modeling (star and snowflake schemas).
Experienced in requirements gathering, system analysis, resolving business and technical issues, and communicating with both business and technical users.
Hands-on experience across the complete Software Development Life Cycle (SDLC) using Agile and hybrid methodologies.
Extensive experience developing Kafka producers and consumers that stream millions of events per minute using PySpark, Python, and Spark Streaming (a streaming sketch follows this summary).
Experience analyzing data across the big data ecosystem, including HDFS, Hive, HBase, ZooKeeper, Pig, and Sqoop.
Designed ETL using Hive internal/external tables, storing data in Parquet format for efficiency.
Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments with both traditional and non-traditional source systems.
Wrote Python scripts using pandas and NumPy to read and write large CSV files, perform data aggregations, and compare data column by column (a pandas sketch follows this summary).
Involved in the Agile Development process (Scrum and Sprint planning).
Created various models for predictive analysis using machine learning with Python/Spark.
Developed multiple Spark jobs in Python for data cleaning and preprocessing.
Implemented data integrity checks in the load process and between database distribution layers.
Integrated Hadoop, Hive, and AWS S3 with Ab Initio to streamline data processing and analytics workflows.
Involved in the complete software development lifecycle, from business analysis through development, testing, deployment, and documentation, using both waterfall and Agile methodologies.
Hands-on experience writing wrapper shell scripts to initialize variables, run graphs, and perform error handling.
Strong experience in data analysis using SQL, Power BI, and MS Excel.
Strong experience with SQL database programming and performance tuning of complex SQL queries.
Strong communication, analytical, and problem-solving skills for client-facing roles and end-user interactions.
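Illustrative sketch: a minimal PySpark Structured Streaming consumer of the kind described in the Kafka bullet above. The broker address, topic, event schema, and output paths are assumptions for illustration, not details from a specific engagement.

```python
# Minimal PySpark Structured Streaming consumer for a Kafka topic.
# Requires the spark-sql-kafka package on the classpath; the broker,
# topic, schema, and paths below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-events").getOrCreate()

# Hypothetical schema for JSON event payloads.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
       .option("subscribe", "txn-events")                  # assumed topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; cast and parse the JSON value.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Land parsed events as Parquet; the checkpoint enables restart recovery.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/curated/txn_events")       # assumed path
         .option("checkpointLocation", "/chk/txn_events")  # assumed path
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()
```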
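Illustrative sketch: a minimal pandas version of the CSV aggregation-and-compare scripts described above; file names, key column, and measure are assumptions.

```python
# Minimal pandas sketch: read two large CSVs in chunks, aggregate by key,
# and compare the aggregates column by column. Names are assumptions.
import pandas as pd

def aggregate(path: str) -> pd.DataFrame:
    # Stream the file in chunks so a large CSV never sits fully in memory.
    chunks = pd.read_csv(path, chunksize=500_000)
    parts = [c.groupby("account_id", as_index=False)["amount"].sum() for c in chunks]
    return pd.concat(parts).groupby("account_id", as_index=False)["amount"].sum()

source = aggregate("source_extract.csv")  # assumed file
target = aggregate("target_extract.csv")  # assumed file

# Outer-join on the key; flag keys missing on one side or with differing sums.
cmp = source.merge(target, on="account_id", how="outer",
                   suffixes=("_src", "_tgt"), indicator=True)
mismatches = cmp[(cmp["_merge"] != "both") |
                 (cmp["amount_src"] != cmp["amount_tgt"])]  # exact compare; real checks may use a tolerance
mismatches.to_csv("recon_mismatches.csv", index=False)
```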
CORE TECHNICAL SKILLS:
Data Engineering & ETL: Ab Initio (GDE, Co>Op, EME, PDL, Metadata Hub), SSIS, Informatica (exposure), AWS Glue, Azure Data Factory
Big Data & Streaming: PySpark, Spark Scala, Apache Spark, Hadoop (HDFS, Hive, Sqoop), Kafka, Spark Streaming
Programming & Scripting: Python, SQL, Shell Scripting
Cloud Platforms: AWS (S3, Redshift, Glue, Lambda), Azure (ADF)
Databases & Warehousing: Oracle, Teradata, MSSQL, Hive
Reporting & Analytics: Power BI
DevOps & Methodologies: Agile/Scrum, GitHub, CI/CD, Automation, Data Quality Frameworks
PROFESSIONAL EXPERIENCE:
Senior Technical Lead Engineer Virtusa Consulting Services – Citibank Tampa, FL
Oct 2023 – Present
Lead the design and development of large-scale Ab Initio ETL pipelines processing high-volume financial data with strict SLA requirements.
Architect and optimize complex batch and near-real-time ETL workflows, improving performance and scalability by 30%.
Own end-to-end Ab Initio infrastructure setup across Dev, QA, UAT, and Production environments.
Implement and govern Metadata Hub solutions to derive end-to-end business and technical data lineage.
Drive SSIS to Ab Initio migration initiatives, standardizing enterprise ETL frameworks and coding standards.
Modernize legacy ETL workloads by migrating to PySpark and Spark Scala, reducing batch execution time by 35%.
Design and implement Python-based automation frameworks for data quality checks, reconciliation, and monitoring.
Build and support real-time ingestion pipelines using Kafka and Spark Streaming for fraud detection use cases.
Contribute to AWS-based cloud data solutions, optimizing S3 storage, compute usage, and overall cost by 20%.
Perform code reviews, performance tuning, and root cause analysis for production issues.
Lead Agile ceremonies, mentor junior engineers, and coordinate with global onshore/offshore teams.
Perform performance tuning in Databricks (cluster sizing, caching, Z-ordering, vacuuming) to reduce job runtimes and improve query efficiency (a Delta Lake tuning sketch follows this section).
Develop Python scripts and SQL, T-SQL, and PL/SQL queries for automation, validation, and custom utilities.
Build and maintain data pipelines using Azure Data Factory (ADF), Databricks, Hive, Sqoop, and Spark to move, transform, and process large datasets.
Set up and manage Databricks workspaces and clusters across Dev, Test, and Prod with autoscaling, runtime tuning, and workload-specific configurations to reduce costs while improving performance.
Leverage Hadoop components like HDFS (storage) and YARN (resource management) for handling large datasets efficiently.
Work closely with business users, architects, and offshore teams to plan tasks, deliver on time, and share knowledge.
Perform proofs of concept on Ab Initio Metadata Hub operational/business lineage using SQL Server.
Environment: Ab Initio GDE, VS Code, GitHub, Lightspeed, PySpark, Python, Control Centre, HDFS, Hive
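Illustrative sketch: Delta Lake maintenance commands of the kind behind the Databricks tuning bullet above; the table and Z-order column names are assumptions.

```python
# Delta Lake maintenance on Databricks: compaction, Z-ordering, vacuuming,
# and caching. Table and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate rows on a frequently filtered column.
spark.sql("OPTIMIZE finance.txn_events ZORDER BY (account_id)")

# Remove data files no longer referenced by the table (7-day retention).
spark.sql("VACUUM finance.txn_events RETAIN 168 HOURS")

# Cache a hot dimension table for repeated joins within the job.
spark.table("finance.dim_account").cache().count()
```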
Senior Software Engineer Virtusa Consulting Services – Citibank Hyderabad, India
Feb 2020 – Oct 2023
Designed and developed complex Ab Initio graphs for large-scale banking and regulatory data transformations.
Analyzed business requirements and translated them into scalable ETL design specifications.
Participated in multiple proof-of-concept initiatives, including SSIS-to-Python conversion frameworks.
Built reusable ETL components and templates to improve development productivity and consistency.
Carried out data quality checks using PySpark and Hive to ensure data matched between source and target systems (a reconciliation sketch follows this section).
Built and supported data pipelines using ADF, Hive, Sqoop, and Spark to move and transform large volumes of data.
Developed and optimized PySpark pipelines integrated with enterprise data platforms.
Leveraged Spark Scala for distributed joins, aggregations, and performance tuning.
Implemented data validation, reconciliation, and exception handling mechanisms.
Supported UAT and production deployments, ensuring smooth releases and minimal downtime.
Collaborated with BI teams to deliver Power BI dashboards and analytics datasets.
Actively participated in Agile/Scrum ceremonies, sprint planning, and retrospectives.
Provided production support, defect fixes, and performance optimizations.
Environment: Ab Initio GDE, VS Code, GitHub, Lightspeed, SQL, SSIS, Unix Shell scripting, Python, Control Centre, HDFS, Hive, Azure Databricks
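Illustrative sketch: a minimal PySpark version of the source-to-target checks described in this section; the table names and hashing approach are assumptions.

```python
# Minimal PySpark source-to-target reconciliation: row counts plus a
# full-row hash diff. Table names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

src = spark.table("staging.customer")  # assumed source table
tgt = spark.table("curated.customer")  # assumed target table

# Row-count check.
src_n, tgt_n = src.count(), tgt.count()
assert src_n == tgt_n, f"row counts differ: {src_n} vs {tgt_n}"

def hashed(df):
    # Hash every row over a stable column order. Note: concat_ws skips
    # nulls, so a production check would null-guard each column first.
    return df.select(F.sha2(F.concat_ws("||", *sorted(df.columns)), 256).alias("h"))

only_in_src = hashed(src).subtract(hashed(tgt))
only_in_tgt = hashed(tgt).subtract(hashed(src))
print("src-only rows:", only_in_src.count(), "tgt-only rows:", only_in_tgt.count())
```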
Senior Technical Lead Virtusa Consulting Services – BMO (Bank of Montreal) Hyderabad, India
Feb 2018 – Jan 2020
Built reusable Ab Initio components and parameterized frameworks to improve delivery consistency.
Developed and optimized PySpark pipelines integrated with Hive for analytics use cases.
Used Spark Scala for distributed joins, aggregations, and performance tuning.
Implemented data validation, balancing, reconciliation, and exception handling mechanisms.
Supported UAT and production releases, ensuring minimal downtime.
Responsible for designing table architectures to fulfill requirements.
Responsible for troubleshooting, identifying, and resolving data problems.
Experienced in Databricks, Hive SQL, Azure CI/CD pipelines, Delta Lake, the Hadoop file system, and Snowflake.
Leveraged Azure Databricks notebooks for interactive data exploration, visualization, and analysis.
Created Databricks notebooks using Python (PySpark) and Spark SQL to transform data stored in Azure Data Lake Storage Gen2 from Raw to Stage and Curated zones (a sketch follows this section).
Developed multiple Spark jobs in Python for data cleaning and preprocessing.
Environment: Ab Initio GDE, VS Code, Unix Shell scripting, Python, Control Centre, HDFS, Hive, Oracle, Spark SQL, Apache Spark DataFrames
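Illustrative sketch: the Raw-to-Stage-to-Curated flow described above, with an assumed storage account, container, and schema.

```python
# Sketch of a Raw -> Stage -> Curated flow on Azure Data Lake Storage Gen2.
# The storage account, container, and column names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
base = "abfss://datalake@mystorageacct.dfs.core.windows.net"  # assumed account

# Raw zone: land the source CSV as-is.
raw = spark.read.option("header", True).csv(f"{base}/raw/transactions/")

# Stage zone: apply types and drop exact duplicates.
stage = (raw.withColumn("amount", F.col("amount").cast("double"))
            .withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
            .dropDuplicates())
stage.write.mode("overwrite").parquet(f"{base}/stage/transactions/")

# Curated zone: business-level aggregate, partitioned for downstream reads.
curated = (stage.groupBy("txn_date", "account_id")
                .agg(F.sum("amount").alias("daily_amount")))
(curated.write.mode("overwrite")
        .partitionBy("txn_date")
        .parquet(f"{base}/curated/daily_transactions/"))
```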
Data Engineer Tata Consultancy Services – JPMorgan Chase Hyderabad, India
Jul 2015 – Feb 2018
Designed and developed Ab Initio ETL solutions handling high-volume, multi-source financial data.
Created detailed technical designs and mapping documents for ETL implementations.
Built PySpark pipelines integrated with Hive for enterprise analytics use cases.
Used Spark Scala for distributed data processing on billions of financial records.
Optimized ETL workflows through performance tuning and parallelization techniques.
Implemented data quality checks, balancing, and reconciliation reports.
Collaborated with business analysts, QA, and production support teams.
Supported daily batch operations and resolved production issues within SLA timelines.
Participated in code reviews and ensured adherence to development standards.
Assisted in data migration and platform modernization initiatives.
Environment: Ab Initio GDE, Unix Shell scripting, Python, Control Centre, Oracle, Spark SQL
ETL Developer Cognizant Technology Solutions – American Express Hyderabad, India
Dec 2010 – Jul 2015
Designed and developed Ab Initio ETL workflows for enterprise data warehousing and reporting systems.
Analyzed source-to-target mappings and implemented complex transformations.
Tuned SQL queries and Ab Initio graphs to improve batch performance.
Developed reusable ETL components and parameterized frameworks.
Conducted unit testing and integration testing for ETL jobs.
Supported production deployments and resolved data-related issues.
Worked closely with onsite-offshore teams to meet Agile sprint commitments.
Ensured data accuracy, completeness, and consistency across systems.
Maintained ETL documentation and technical specifications.
Participated in performance optimization and capacity planning activities.
Well versed in Ab Initio parallelism techniques; implemented graphs using data parallelism and Multi File System (MFS) techniques.
Integrated Hadoop, Hive, and AWS S3 with Ab Initio to streamline data processing and analytics workflows.
Involved in the complete software development lifecycle, from business analysis through development, testing, deployment, and documentation, using both waterfall and Agile methodologies.
Hands-on experience writing wrapper shell scripts to initialize variables, run graphs, and perform error handling (a Python analogue is sketched after this section).
Environment: Ab Initio GDE, Unix Shell scripting, Sybase, Control Centre, Oracle.
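The wrapper scripts referenced above were Unix shell scripts; a minimal Python analogue of the same control flow (initialize parameters, run the graph, trap and report errors) is sketched below. The run command and paths are assumptions.

```python
# Python analogue of a graph wrapper: initialize variables, launch the
# graph, and handle errors. The command and paths are assumptions.
import os
import subprocess
import sys

def run_graph(target: str, run_date: str) -> int:
    env = os.environ.copy()
    env["RUN_DATE"] = run_date          # parameter the graph reads
    env["AI_SORT_MAX_CORE"] = "262144"  # example tuning variable

    # 'air sandbox run' is a common way to launch a deployed plan/graph;
    # treat the exact command as an assumption here.
    result = subprocess.run(["air", "sandbox", "run", target],
                            env=env, capture_output=True, text=True)

    if result.returncode != 0:
        # Error handling: log stderr and propagate a nonzero exit code.
        sys.stderr.write(f"{target} failed (rc={result.returncode}):\n{result.stderr}\n")
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_graph("/sandbox/mp/load_customers.plan", "2024-01-31"))
```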
EDUCATION:
Bachelor of Computer Science Engineering
Jawaharlal Nehru Technological University (JNTU), Hyderabad – 2009
CERTIFICATIONS:
AWS Certified Solutions Architect – Associate
Ab Initio Certifications (L0, L1, L2 – Internal)
Python Certified Associate Programmer (PCAP-31-03)
Artificial Intelligence & Career Empowerment – University of Maryland
PySpark / Spark Scala (Internal – L0, L1)
AWS Cloud Assessment, Agentic AI & Gen AI Assessments (Internal)