PROFESSIONAL SUMMARY:
• Experienced Data Engineer with 5 years in Big Data technologies, ETL development, and data pipeline design; strong understanding of the software development lifecycle from planning to deployment, delivering scalable solutions under both Waterfall and Agile methodologies.
• Hands-on experience with cloud platforms such as GCP (BigQuery, Dataproc, Cloud Storage, Composer) and Azure (Data Factory, Blob Storage, Databricks, Cosmos DB, Azure SQL, SQL Server) for building end-to-end, cloud-native data architectures and solutions.
• Proficient in Apache Spark, Flink, Hadoop (HDFS, MapReduce, Hive, Sqoop), Kafka, Kinesis, and Apache Beam, with real-time data processing and manipulation using Spark Streaming and PySpark to optimize large-scale, data-driven workflows.
• Strong expertise in Snowflake, Data Vault 2.0, relational databases, and SQL, including writing optimized queries, dimensional data modeling, and data mining techniques for uncovering insights from complex datasets; experience migrating Teradata objects, handling nested JSON, and building ELT pipelines with Airflow and Python (see the Airflow sketch at the end of this summary), supporting enterprise-grade software systems and solutions.
• Skilled in CI/CD implementation using AWS CodePipeline, Docker, Kubernetes (for containerization), and Terraform, along with monitoring solutions such as Splunk and CloudWatch for infrastructure, data security, log analysis, quality assurance, and security policies.
• Proficient in Python, Java, and Scala, and in working with RESTful APIs for data extraction, processing, and automation in data science-driven, cloud-based microservices architectures, applying core engineering principles. Expertise in performance tuning, data validation, data profiling, and data migration while optimizing large-scale data processing systems using Snowflake, Spark SQL, and the Hadoop ecosystem.
• Developed and maintained dbt models in Snowflake, implementing business logic and ensuring alignment with data architecture standards.
• Designed and managed scalable data pipelines and a robust data warehouse to support reporting, analytics, and operational use cases.
• Conducted data analysis to validate assumptions, troubleshoot issues, and improve data quality, data structures, and data lineage visibility.
• Certified Data Vault 2.0 Practitioner (CDVP2.0) and Certified Business Intelligence Professional (CBIP) with strong experience in data governance, data curation, data analytics, business intelligence tools such as Looker, and fintech and enterprise data solutions.
• Skilled in workflow management and root cause analysis, with experience optimizing information systems. Proven ability to build scalable data pipelines and architectures, supporting cross-functional teams in fast-paced environments with a focus on technical knowledge sharing.
• Skilled in AWS cloud services such as SNS and EC2, with experience in serverless computing using Lambda and scripting in Bash.
• Proficient in DynamoDB and Postgres, with a strong foundation in statistics and informatics for optimizing cloud services.
• Implemented inclusive digital experiences by applying WCAG 2.2 AA standards, leveraging assistive technologies, and integrating digital accessibility best practices to ensure seamless access and usability for all users, with consideration for scalable system design principles.
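Illustrative example for the Airflow/Python ELT work noted above: a minimal sketch assuming Airflow 2.4+, flattening a nested-JSON payload and handing it to a load step. The DAG, task, and table names are hypothetical stand-ins, not actual project objects.

import json
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def flatten_events(**context):
    # In a real pipeline the nested JSON would be read from cloud storage;
    # a small inline payload keeps this sketch self-contained.
    raw = {"order_id": 1, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}
    rows = [{"order_id": raw["order_id"], **item} for item in raw["items"]]
    context["ti"].xcom_push(key="rows", value=json.dumps(rows))

def load_to_snowflake(**context):
    # Stand-in for the load step; a production task would use a Snowflake hook or COPY INTO.
    rows = json.loads(context["ti"].xcom_pull(key="rows", task_ids="flatten_events"))
    print(f"would load {len(rows)} rows into RAW.ORDERS_FLAT")  # hypothetical target table

with DAG(
    dag_id="elt_orders_daily",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    flatten = PythonOperator(task_id="flatten_events", python_callable=flatten_events)
    load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)
    flatten >> load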
Technical Skills:
Programming Languages: Python, SQL, Java, JavaScript, TypeScript, Scala, PHP, XML, Node.js, Golang, OOP, Unity, Ruby, Perl
Data Warehousing & Databases:
• Relational Databases (RDBMS): MySQL, PostgreSQL, ANSI SQL, MS SQL Server, stored procedures.
• NoSQL Databases: MongoDB, Cassandra, HBase.
• Data Warehousing: Amazon Redshift, Google BigQuery, Snowflake, dbt.
Big Data Technologies: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, Zookeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos, PySpark, Airflow, YAML, Snowflake, Spark, GraphQL, Alteryx, Dataiku, GraphDB, TensorFlow, Distributed Systems
Cloud Computing Platforms / Cloud Technologies: Public cloud (Azure, GCP, AWS)
• Azure: Services like Azure Data Lake, Azure DevOps, Azure Synapse Analytics, and Azure Databricks.
• Google Cloud Platform (GCP): BigQuery, Cloud Dataflow, and Cloud Storage.
• AWS: Services like S3 (storage), Redshift (data warehousing), RDS (Relational Database Service), Lambda (serverless computing), and EMR (Hadoop/Spark).
Data Management: Data Architecture, Text Mining, Data Visualization, Data Cleaning
Visualization & ETL Tools: Tableau dashboards, Microsoft Power BI, Power Apps, Informatica, Talend, SSIS, PLM, reporting software tools
Web/Application Servers: Apache Tomcat, WebLogic, WebSphere, Gradle, GUI, Web Architectures, Custom Web Development
Machine Learning Algorithms: Regression, Decision Trees, NLP, Artificial Intelligence (AI), Scikit-learn, Pandas, LLMs
Version Control Systems & Tools: Git, Maven, SBT, SSRS, CBT, GitHub, GitLab, Jenkins, Matplotlib, Seaborn, Control-M
Work Experience:
Client: Key Bank | Role: Data Engineer | Mar 2023 - Present
Responsibilities:
• Implemented Azure Data Factory (ADF) extensively for data ingestion from legacy source systems, both relational and unstructured, to meet business analytics, business intelligence, business strategy, and provisioning requirements in financial services.
• Designed and developed batch and real-time processing solutions using ADLS, ADF, and Databricks clusters, with stream processing and log analysis in compliance with GDPR, enabling cost and impact analysis and supporting consumer web data use cases.
• Created 25+ data pipelines in Azure Data Factory v2 to ingest data from disparate source systems, including Genesys data, using activities such as Move & Transform, Copy, Filter, ForEach, and Databricks, supporting enterprise solutions.
• Maintained and optimized Azure data infrastructure, data flows, and pipelines using ADF and PySpark, handling complex data transformations, debugging, performance tuning, data encryption, and capacity planning for scalable batch and real-time processing, with Ansible-based automation.
• Automated data preparation jobs using Event, Schedule, and Tumbling Window triggers in ADF, reducing manual effort by 30%; provisioned Databricks clusters, notebooks, jobs, and autoscaling, contributing to performance improvements on cloud infrastructure.
• Performed application monitoring, automation, and refinement of data engineering solutions in a Scrum setup and scheduled automated business workflows and metadata management using Microsoft Azure Logic Apps, ensuring compliance with regulatory requirements.
• Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues, and created linked services to connect external resources to ADF for data representation, contributing to RFC documentation.
• Migrated on-premises data (Oracle/Teradata) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF v1/v2).
• Implemented Artifactory as a universal repository manager to support major package formats and improve the company's Continuous Integration / Continuous Delivery (CI/CD) pipeline, supporting database development and administration tasks.
• Created and provisioned the Databricks clusters needed for batch and continuous streaming data processing, data editing, and related streaming technologies, installed the required libraries on the clusters, and applied project management skills to coordinate resources and timelines. Utilized JavaScript, Spring Boot, and front-end frameworks (React, Nucleus) for data visualization and interactive user interfaces.
• Implemented the Kimball methodology to construct data warehouse designs within an Agile environment. Developed custom UDFs in Python for use in Spark SQL (see the sketch following this role's bullets). Utilized Elasticsearch and Kibana for indexing and visualization with efficient resource management.
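Illustrative example for the Python UDFs referenced in the last bullet above: a minimal sketch that registers a masking function for Spark SQL. The function, column, and view names are hypothetical, not the client's actual code.

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_sketch").getOrCreate()

def mask_account(account_number):
    # Mask all but the last four characters of a (hypothetical) account number.
    if account_number is None:
        return None
    return "*" * max(len(account_number) - 4, 0) + account_number[-4:]

# Register the Python function so it can be called from Spark SQL.
spark.udf.register("mask_account", mask_account, StringType())

# Small in-memory example registered as a temp view for the SQL call.
df = spark.createDataFrame([("1234567890",), ("9876543210",)], ["account_number"])
df.createOrReplaceTempView("accounts")
spark.sql("SELECT mask_account(account_number) AS masked_account FROM accounts").show()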
Client: Cigna | Role: Data Engineer | Mar 2020 - Jul 2022
Responsibilities:
• Designed, developed, and maintained scalable data pipelines using GCP services such as Dataflow, Pub/Sub, and Cloud Composer to ingest, process, and store large volumes of HIPAA-compliant health insurance data from various sources, with integrated quality checks (see the pipeline sketch following this role's bullets).
• Implemented and managed data storage solutions such as BigQuery, Cloud Storage, and Cloud SQL, using Perl where needed, to ensure efficient, secure, and cost-effective storage of structured and unstructured data, with a strong focus on data privacy.
• Experienced with integration tools and orchestration/scheduling platforms such as Airflow, Prefect, Autosys, and Automic, with hands-on expertise in GCP services like Dataproc, GCS, and Cloud Functions, and a strong ability to quickly adopt and integrate new technologies.
• Demonstrated experience in all aspects of development including, but not limited to, requirements gathering, development of technical components related to process scope, testing support, JIRA tracking, and post-implementation support, leveraging Linux.
• Experience building Power BI reports on MS Analysis Services for better performance, using Excel for reporting. Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery. Coordinated with the team to develop a framework for generating daily ad hoc reports and extracts from enterprise data, reducing reporting time by 40% through automation and monitoring instrumentation.
• Proven track record of delivering impactful products and realizing product vision, with working knowledge of telemetry systems and real-time data processing to enhance performance and insights across industries, including healthcare and hospitality.
• Expertise in cloud deployment of code, Hadoop clusters, and various Big Data analytics and modeling tools, including Pig, Hive, and Sqoop.
• Experienced in implementing and managing backup and recovery systems to ensure data integrity and availability in large-scale data engineering environments. Skilled in re-engineering practices, process documentation, predictive analytics, propensity modeling, technical analysis, and programming to optimize workflows, enhance system performance, and support ERP integration.
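Illustrative example for the Dataflow-style ingestion described in the first bullet of this role: a minimal Apache Beam sketch that reads newline-delimited JSON from Cloud Storage and appends it to BigQuery. The project, bucket, table, and field names are hypothetical placeholders, not actual client resources.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.bigquery import WriteToBigQuery, BigQueryDisposition

def parse_record(line):
    # Keep only the fields the (hypothetical) target table expects.
    rec = json.loads(line)
    return {
        "claim_id": rec.get("claim_id"),
        "member_id": rec.get("member_id"),
        "amount": float(rec.get("amount", 0)),
    }

def run():
    # Runner, project, and temp location are assumptions for this sketch.
    opts = PipelineOptions(
        runner="DataflowRunner",
        project="example-project",
        region="us-central1",
        temp_location="gs://example-bucket/tmp",
    )
    with beam.Pipeline(options=opts) as p:
        (
            p
            | "ReadFromGCS" >> beam.io.ReadFromText("gs://example-bucket/claims/*.json")
            | "ParseJson" >> beam.Map(parse_record)
            | "WriteToBigQuery" >> WriteToBigQuery(
                "example-project:analytics.claims",
                write_disposition=BigQueryDisposition.WRITE_APPEND,
                create_disposition=BigQueryDisposition.CREATE_NEVER,
            )
        )

if __name__ == "__main__":
    run()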
EDUCATION:
• Master’s in Computer Science from the University of Michigan, USA