Veneesha Jureddy
DATA ENGINEER
Email ID: **********@*****.***  Contact No: +1-845-***-****
LinkedIn: veneesha
PROFESSIONAL SUMMARY
Over 3 years of experience in designing, developing, and optimizing data pipelines, ETL processes, and scalable data architectures using modern technologies like Spark, DBT, Databricks, and Snowflake.
Skilled in implementing cloud-based solutions on AWS and managing big data ecosystems with services like Glue, EMR, S3, and Redshift, achieving optimized performance and cost efficiency.
Strong expertise in creating robust database models and stored procedures, and in integrating data across platforms including Snowflake, PostgreSQL, and NoSQL databases such as DynamoDB and MongoDB.
Improved query performance, reduced ETL processing time, and enhanced data accessibility by leveraging Spark SQL, partitioning strategies, and scalable pipelines.
Delivered near-real-time analytics by integrating Kafka, Snowpipe, and Spark Streaming, enabling faster decision-making and improved data freshness.
Designed and managed CI/CD pipelines, leading to a significant reduction in deployment time while improving system reliability and scalability.
Experienced in gathering requirements, collaborating with stakeholders, and delivering business-driven data solutions, fostering teamwork and effective communication.
Proficient in handling large-scale datasets and streaming data using tools like Hive, Kafka, Cassandra, and Airflow to streamline workflows and enhance operational efficiency.
Supported advanced analytics with medallion architectures, ELT pipelines in DBT, and integration of Snowflake with data lakes, driving improved insights and business outcomes.
Successfully managed projects in Agile environments, ensuring timely delivery of scalable, secure, and high-performance data solutions.
TECHNICAL SKILLS
Programming Language
SQL, Python, R, Scala, UNIX Shell Scripting, PowerShell, YAML
Application/Web Servers
WebLogic, Apache Tomcat 5.x/6.x/7.x/8.x
Hadoop Distributions
Hortonworks, Cloudera
Hadoop/Big Data Ecosystem
HDFS, MapReduce, Hive, Impala, Pig, Sqoop, YARN, Kafka, Flume, Oozie, Zookeeper, Spark, Spark SQL, Airflow
Big Data (Azure)
Azure Storage, Azure Data Factory, Azure Analysis Services, Azure Database
Cloud Platforms
AWS: Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, DynamoDB
Azure: Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, Azure Data Lake, Azure HDInsight
Other: GCP, OpenStack
Data Transformation Tools
DBT, Databricks
ETL Tools
Informatica, Data Studio
Reporting Tools
Power BI, Tableau, SSRS
Virtualization
Citrix, VDI, VMware
PROFESSIONAL EXPERIENCE
Client: Fidelity Investments, Remote Feb 2024 – Present
Role: Data Engineer
Responsibilities:
Developed a medallion architecture, building bronze, silver, gold, and platinum models in DBT and improving data processing efficiency.
Developed ELT pipelines using DBT and Databricks, reducing processing time by 40% and improving data transformations.
Configured DBT Cloud with Databricks to optimize data modeling, lineage tracking, and transformation workflows.
Implemented AWS Glue-based ETL workflows, leveraging Spark SQL and PySpark for large-scale data transformation, aggregation, and schema conversion.
Designed and optimized Amazon Redshift schemas, improving query performance and data efficiency.
Enhanced Redshift cluster performance by tuning queries, optimizing distribution styles, and managing sort keys.
Integrated Snowflake with AWS S3 using Matillion, streamlining data ingestion and reducing transformation time by 25%.
Implemented Snowflake Snowpipe for automated near-real-time data ingestion, reducing latency for analytics workloads.
Automated data workflows using AWS Step Functions and AWS Lambda, reducing manual intervention in ETL tasks.
Assisted in automating infrastructure provisioning with Terraform, ensuring scalable and cost-efficient cloud resource deployment.
Configured AWS CloudWatch and DataDog for real-time monitoring, logging, and performance alerting of AWS services and pipelines.
Managed AWS Secrets Manager to securely store and rotate credentials.
Collaborated on IAM policy design, implementing best practices for secure AWS resource access management.
Contributed to CI/CD pipeline maintenance, improving deployment efficiency for data workflows.
Built interactive dashboards in Looker Studio, providing actionable insights and self-service reporting capabilities.
Worked with cross-functional teams to define data requirements and design scalable data solutions for analytics and reporting.
Environment: AWS (Glue, Redshift, S3, Lambda, Step Functions, EventBridge, Secrets Manager, CloudWatch, Transfer Family), DBT, Databricks, Apache Spark (PySpark, Spark SQL), Matillion, Kafka, Snowflake (Snowpipe), SQL (Redshift SQL, PostgreSQL, Spark SQL), Terraform, Scalr, Git, Jenkins, DataDog, Dynatrace, Looker Studio.
Client: Pristyn Care, India Aug 2021 – Jan 2022
Role: Data Engineer
Responsibilities:
Utilized AWS Glue, S3, and EMR for data migration processes.
Collaborated effectively with cross-functional teams to set up new Hadoop users, including creating Linux users and testing HDFS, Hive, Pig, and MapReduce access.
Integrated Snowflake with data lake solutions using Apache Spark to support advanced analytics, reducing data processing time for large datasets by 50%.
Managed data synchronization from HDFS to S3, and subsequently from S3 to Amazon Redshift.
Created and optimized Hive tables with partitioning and bucketing, improving query performance.
Authored ETL tasks using Spark SQL to deliver transformed and aggregated data metrics to application users.
Utilized Spark APIs to perform the transformations and actions required to build data models aligned with business requirements.
Developed and maintained scalable and efficient data models using both SQL and NoSQL databases, enhancing data querying and management practices.
Environment: AWS Glue, S3, Redshift, HDFS, Hive, MapReduce, Kafka, Amazon DynamoDB, PostgreSQL, Spark SQL, Python, Scala, SSIS, ELK.
Client: Paytm, India Mar 2019 – Aug 2021
Role: Data Engineer
Responsibilities:
Efficiently ingested gigabytes of clickstream data daily from FTP servers and S3 buckets using custom-built input adapters, ensuring seamless data integration and availability.
Enhanced data accessibility by creating Sqoop scripts for seamless data import/export between RDBMS and S3 data stores.
Improved data processing and analytics capabilities by developing and managing ELT pipelines using Spark and AWS Glue.
Achieved significant data transformation and enrichment by developing Spark applications using Scala, improving data quality and usability.
Boosted processing efficiency and reliability by troubleshooting and fine-tuning Spark applications, reducing overall processing time.
Streamlined real-time data streaming by creating Kafka producer APIs for live JSON data and developing Spark-Streaming applications to process and store data in HBase.
Enhanced system scalability and resource efficiency by orchestrating containerized big data tools, including Spark and Kafka, using Kubernetes.
Maintained historical data accuracy by implementing SCD-II logic and integrating Change Data Capture (CDC) data into the target warehouse using PowerExchange.
Optimized data handling and analysis by creating and managing Hive tables, implementing partitioning, dynamic partitions, and buckets.
Improved performance for large dataset operations using Spark's in-memory capabilities and broadcast variables for efficient data joins and transformations.
Successfully managed big data workflows in the AWS cloud environment, leveraging EMR clusters and S3 for enhanced performance and scalability.
Ensured timely delivery and adaptability to changes by following Agile methodologies throughout the project lifecycle.
Created and maintained conceptual, logical, and physical database models to support data management and analysis.
Environment: AWS, Scala, Hive, HDFS, Apache Spark, Apache Airflow, Oozie, Sqoop, Cassandra, Shell Scripting, Power BI, MongoDB, Jenkins, UNIX, JIRA, Git.
EDUCATION
Master's in Information Technology Jan 2022 – Dec 2023
University of West Florida, GPA: 3.50