Siva K
+1-469-***-**** *********@*****.*** www.linkedin.com/in/siva-koritela/
Professional Summary:
Cloud Data Engineer with 5+ years of experience in building robust data ingestion and processing pipelines using Python, PySpark, and Scala. Proven track record in delivering large-scale data engineering and data warehousing solutions using AWS and Snowflake ecosystems, including expertise in real-time streaming, data migrations, and high-throughput pipelines.
Programming Languages: Python, PySpark, Scala, SQL, PL/SQL
Strong proficiency in AWS services, including S3, IAM, EC2, EMR, Kinesis, VPC, DynamoDB, Redshift, RDS (Aurora), Lambda, DMS, ELB, Auto Scaling, CloudWatch, SNS, and SQS. Hands-on experience with AWS analytics services such as Athena, Glue, Glue Data Catalog, and QuickSight for large-scale data processing and visualization.
Experience using Python libraries such as NumPy, Pandas, Matplotlib, and Scikit-Learn.
Skilled in designing data models, implementing Snowpipe for real-time data ingestion, and creating ELT pipelines using Snowflake's multi-cluster architecture.
Proficient in SQL and PL/SQL for creating stored procedures, triggers, and packages.
Experienced with relational databases such as Oracle and MySQL, NoSQL databases such as MongoDB and Cassandra, and with performing complex data transformations and optimizations.
Proficient in data ingestion using Sqoop and experienced in scheduling workflows with Oozie and performing data analysis with Hive and Spark SQL.
Hands-on experience with real-time data streaming tools such as Kafka and Flume.
Expert in data warehousing techniques, including SCD table creation, data migration, and building ETL pipelines. Successfully led legacy data migration from on-premises to AWS Cloud and Snowflake, ensuring data consistency and reliability.
Developed and orchestrated ETL workflows using Apache Airflow and Matillion, and implemented data transformation processes with dbt to enhance data quality and accessibility in a cloud-based data warehouse environment.
Skilled in processing structured and semi-structured data using JSON, XML, CSV, ORC, and Parquet file formats. Hands-on experience with Spark for performing actions, transformations, and streaming.
Exposure to Google Cloud Platform (GCP) services such as BigQuery, GCS buckets, and Cloud Functions for data engineering projects.
Proficient in integrating data from multiple source systems into Snowflake, including loading nested JSON data into Snowflake tables, utilizing Snowpipe, and applying Snowflake Clone and Time Travel features.
Advanced skills in BI reporting tools such as Tableau, Power BI, and Cognos for creating interactive dashboards and visualizing complex data.
Skilled in front-end development using HTML, CSS, JavaScript, and Ajax with cross-browser compatibility.
Experienced in Agile/Scrum and Waterfall practices, contributing to sprint planning, daily stand-ups, and review meetings.
Good experience in handling errors and exceptions and debugging issues in large-scale applications.
Ability to learn and adapt quickly to emerging technologies and paradigms.
TECHNICAL SKILLS
Programming Languages
Python, Scala, R, SQL and PL/SQL.
Hadoop/Big Data Technologies
HDFS, Sqoop, Hive, Pig, HBase, MapReduce, Spark, Airflow, dbt, Oozie
AWS Cloud Technologies
IAM, S3, EC2, VPC, EMR, Glue, DynamoDB, RDS, Redshift, CloudWatch, CloudTrail, CloudFormation, Kinesis, Lambda, Athena, EBS, DMS, Elasticsearch, SQS, SNS, KMS, QuickSight, ELB, Auto Scaling
Version Control
Git, GitHub, SVN, CVS
Databases/Datawarehouse
Oracle, SQL Server, MySQL, DB2, MongoDB, PostgreSQL, Teradata, Snowflake
Database Modelling
Dimension Modeling, ER Modeling, Star Schema Modeling, Snowflake Modeling
Reporting Tools
Tableau, Power BI, SSRS, QuickSight
Operating Systems
UNIX, Linux, Windows
Methodologies
Agile Scrum, Waterfall.
EDUCATION
Master of Science in Information Systems and Business Analytics, Park University – Kansas City, USA
Bachelor of Computer Science & Engineering, Acharya Nagarjuna University, Guntur, INDIA
WORK EXPERIENCE
Fidelity Investments, Irving, USA Nov 2023 - Present
AWS Data Engineer
Responsibilities
Developed robust batch and real-time data pipelines using AWS Glue, Kinesis, Apache Spark, and Python for data ingestion, transformation, and processing (an illustrative sketch follows below).
Implemented scalable ELT processes in Snowflake using Snowpipe for real-time data ingestion and Snowflake tasks for batch processing.
Designed and built enterprise data lakes using AWS S3 to support analytics, storage, and reporting on large datasets.
Created optimized PySpark-based data pipelines on AWS EMR and leveraged Apache Spark on Snowflake for parallel processing and performance optimization.
Migrated data from RDBMS to NoSQL and Snowflake, unifying datasets and enabling seamless integration using Snowflake’s data sharing and replication features.
Developed data pipelines using Apache Spark, Python, and Airflow DAGs, integrating AWS Lambda and Step Functions for automated workflows.
Led data governance initiatives by implementing Snowflake's role-based access controls, data masking, and fine-grained security frameworks for AWS S3 using AWS Lambda.
Orchestrated batch processing using Apache Airflow and real-time streaming with Apache Spark Structured Streaming.
Migrated ETL workloads from on-prem to AWS Cloud, optimizing data workflows and performance.
Worked with Amazon SageMaker to analyze machine learning processes and integrate ML models into data pipelines.
Environment: AWS EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, SageMaker, Apache Spark, Kafka, Hive, Python, Tableau, Kibana, Informatica, UNICA.
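Illustrative pipeline sketch (not taken from the production code): a minimal PySpark job of the kind described above, reading raw JSON from S3, applying basic cleansing, and writing curated Parquet back to S3. The bucket names, paths, and column names below are hypothetical.

    # Illustrative sketch only; bucket names, paths, and columns are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-batch-pipeline").getOrCreate()

    # Read raw JSON landed in the data lake (hypothetical path).
    raw = spark.read.json("s3://example-datalake/raw/orders/")

    # Basic cleansing: drop duplicates, fill NULLs, derive a partition date.
    curated = (
        raw.dropDuplicates(["order_id"])
           .na.fill({"order_status": "UNKNOWN"})
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Write curated data back to S3 as partitioned Parquet for downstream analytics.
    (curated.write
            .mode("overwrite")
            .partitionBy("order_date")
            .parquet("s3://example-datalake/curated/orders/"))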
Innovaccer Inc, Hyderabad, India April 2020 – Aug 2021
Data Engineer/Data Analyst
Responsibilities
Provisioned key AWS Cloud services and configured them for scalability, flexibility, and cost optimization in a multi-region, multi-zone infrastructure.
Designed and automated deployment of AWS resources such as EC2, S3, EFS, EBS, and IAM, along with Jenkins, using Terraform scripts and CloudFormation templates.
Created and managed VPCs, subnets, and NAT gateways to support a multi-region, multi-zone infrastructure.
Built cloud data stores in S3 with a layered architecture (raw, curated, transformed) for effective data management and reporting.
Developed data ingestion modules and ETL pipelines using AWS Glue, with transformations and cleansed data loaded into S3.
Configured and managed S3 bucket policies and lifecycle rules to meet compliance requirements.
Implemented CI/CD pipelines using Jenkins and Git for automated build and deployment of Python/PySpark code.
Built Docker images to run Apache Airflow locally for testing ETL pipelines; managed Docker container clusters with Kubernetes for CI/CD system runtime, including build, test, and deployment.
Built Glue jobs for data cleansing, such as deduplication and NULL value handling, and for standard transformations (date, string, and math operations) required by business users.
Leveraged Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics to collect, process, and analyze streaming data.
Created Athena data sources for ad hoc querying and integrated with QuickSight and Tableau for business intelligence and dashboarding.
Used Lambda and Step Functions to trigger Glue jobs and orchestrate end-to-end data pipelines (an illustrative sketch follows below).
Copied fact/dimension and aggregated data from S3 to Redshift for historical data analysis and reporting with Tableau.
Supported production environments and troubleshot issues using Splunk logs to ensure smooth operations and timely resolution of problems.
Utilized PyCharm IDE for Python/PySpark development and Git for version control and repository management.
Environment: AWS (EC2, VPC, S3, EBS, ELB, Lambda, CloudWatch, Glue, Athena, QuickSight), Terraform, Jenkins, Git, Python, PySpark, Shell scripting.
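Illustrative orchestration sketch (not taken from the production code): a minimal Lambda handler that starts a Glue job when a new object lands in S3, in the spirit of the Lambda/Glue orchestration above. The Glue job name, argument names, and event wiring are hypothetical.

    # Illustrative sketch only; the Glue job name and arguments are hypothetical.
    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        # Triggered by an S3 put event; pass the new object's location to the Glue job.
        record = event["Records"][0]["s3"]
        response = glue.start_job_run(
            JobName="curate-orders-job",  # hypothetical job name
            Arguments={
                "--source_bucket": record["bucket"]["name"],
                "--source_key": record["object"]["key"],
            },
        )
        return {"JobRunId": response["JobRunId"]}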
Mars Telecom Services Ltd, India Mar 2018 – April 2020
Hadoop- AWS Developer
Responsibilities
Participated in requirements gathering and translated them into technical specifications for data processing workflows.
Developed data ingestion pipelines using Spring XD into HDFS and created MapReduce jobs for processing large datasets.
Utilized HDFS for data storage and worked extensively with Hive to create internal and external tables, perform data analysis, and validate data using HiveQL.
Developed Hive queries for loading data from HDFS and used Sqoop to import/export data between Hive and Netezza.
Built and automated data pipelines using Oozie for job scheduling, including the orchestration of MapReduce, Hive, and Sqoop jobs, and configured workflows with XML and property files.
Implemented an AWS proof of concept for transferring data from local file systems to S3, created EMR clusters, and developed Glue jobs for data transformations.
Created Kinesis streams for real-time data processing and integrated with downstream systems.
Worked on Spark to aggregate data from Netezza and created RDDs and HiveQL queries for data processing and analysis (an illustrative sketch follows below).
Scheduled Oozie workflows for job management, deployed applications across development, staging, and production environments, and validated jobs using CLI and HUE.
Collaborated in Agile environments, participated in daily stand-ups, sprint demos, and client reviews, providing input during PSI planning.
Wrote JUnit and MRUnit test cases for MapReduce jobs and used Cucumber for integration testing, with continuous integration through Jenkins.
Utilized Maven as a build tool for creating application JARs and SVN for version control.
Conducted data validation in Netezza and downstream applications like DB2 to ensure accuracy and consistency.
Environment: Hadoop, MapReduce, Java, Spark, Hive, Sqoop, Oozie, HDFS, Netezza, AWS (EMR, Glue, S3, Kinesis), Informatica, DB2, Oracle, Maven, SVN, Jenkins, JUnit.
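Illustrative aggregation sketch (not taken from the production code, which also used Java MapReduce): a simplified Spark SQL job over a Hive table, in the spirit of the Hive/Spark analysis above. The database, table, and column names are hypothetical.

    # Illustrative sketch only; database, table, and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("daily-usage-aggregation")
             .enableHiveSupport()   # read Hive tables registered in the metastore
             .getOrCreate())

    # Aggregate call detail records stored in a Hive table and persist a daily summary.
    daily_usage = spark.sql("""
        SELECT subscriber_id,
               to_date(call_ts)      AS call_date,
               SUM(duration_seconds) AS total_seconds,
               COUNT(*)              AS call_count
        FROM   telecom_db.call_detail_records
        GROUP  BY subscriber_id, to_date(call_ts)
    """)

    daily_usage.write.mode("overwrite").saveAsTable("telecom_db.daily_usage_summary")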