Data Engineer

Location: East Newark, NJ

Posted: November 05, 2024


Resume:

Venkata Anil Kumar Poka

Data Engineer

East Newark, New Jersey • ***********************@*****.*** • +1-201-***-**** • LinkedIn • GitHub

PROFESSIONAL SUMMARY

Data Engineer with a Master of Science in Data Science from Pace University and over 3 years of experience designing and optimizing scalable data pipelines, ETL processes, data migrations, and big data architectures. Proficient in Python, SQL, and NoSQL databases, with hands-on experience in AWS, Azure, and GCP and in core Hadoop-ecosystem and streaming tools (Hive, PySpark, Kafka, Sqoop, Flume, Impala). Skilled in building RESTful APIs that support real-time data exchange and system interoperability. Strong background in data governance, Agile methodologies, and version control (Git), with a focus on delivering reliable, timely data solutions. Known for analytical, problem-solving, and communication skills that turn data into strategic business insights.

WORK EXPERIENCE

Go Digit General Insurance Private Limited, Bangalore, India
Associate Data Engineer, July 2021 - January 2023

Engineered and scaled data solutions on AWS and big data technologies, speeding up data processing and improving efficiency to support changing business needs.

Led cross-functional work to design data models and deploy RESTful APIs, increasing data accessibility by 40% and improving system interoperability for real-time data updates by 25%.

Orchestrated the migration of legacy SQL Server and Oracle databases to AWS PostgreSQL using AWS Database Migration Service (DMS), maintaining 100% data integrity and reducing operational disruptions by 30%.

Developed and optimized high-performance ETL pipelines processing over 2 billion records, enabling real-time data visualization and improving processing efficiency by 35%.

Modernized real-time streaming pipelines for pre-inspection logs using Kafka, Python, and AWS Lambda, reducing manual processing time by 30% and making the log data available for analysis.

Deployed a data lake architecture for flight delay insurance, centralizing data sources to enhance claims processing efficiency by 40% and improve reporting accuracy by 50%.

Used Hadoop ecosystem tools (HDFS, Hive, Sqoop, and Flume) for large-scale data storage and optimized ingestion workflows, improving data processing efficiency by 20%.

Designed and automated interactive dashboards, empowering stakeholders with data-driven insights and delivering a 25% increase in workforce productivity.

Led project milestones within Agile frameworks, performed root cause analysis, and drove a 20% improvement in system reliability and operational resilience.

Enhanced Technical Design Documents (TDD), led training initiatives, and performed cost-benefit analyses to optimize resource allocation, shortening project delivery timelines by 15%.

Technical Environment: Python, Scala, Java, SQL, Spark, Linux Shell, Hadoop (HDFS, Hive, Sqoop, Flume, HBase, Impala), MongoDB, Oracle, MySQL, PostgreSQL, Database Design, AWS (DMS, EC2, EMR, S3, Redshift, API Gateway), Kafka, Azure Blob, Azure Functions, GCP, Pub/Sub, Tableau, Agile methodologies, Unit Testing, REST APIs, Excel.

Prospecta Software Solutions Private Limited, Gurugram, India
Technical Trainee, August 2020 - July 2021

Built RESTful and SOAP APIs that streamlined communication between internal platforms and third-party services, improving operational efficiency and reducing average latency by 30%.

Assisted in building and maintaining ETL pipelines for data extraction, transformation, and loading, contributing to a 30% reduction in data processing time.

Conducted unit and integration testing to verify API functionality and ensure data accuracy, resulting in a 20% decrease in errors reported in production.

Revamped existing API and data process documentation and introduced version control to improve its accuracy and accessibility, which led to a 50% decrease in support queries caused by outdated information.

Worked with data analysts and project managers to align on data needs and project objectives, contributing to successful project delivery within deadlines.

Technical Environment: Python, Java, SQL, Spring Boot, Hibernate, Linux Shell, RESTful APIs, SOAP APIs, MongoDB, PostgreSQL, MS SQL, AWS, Swagger, Git, Agile Methodology, Eclipse, IntelliJ.

TECHNICAL SKILLS

Programming & Scripting: Python, Java, Scala, Shell Scripting (Linux, Unix, PowerShell), SQL, PL/SQL

Cloud Platforms: AWS (DMS, EC2, S3, Redshift, Lambda, API Gateway), GCP (BigQuery, Cloud Pub/Sub, Cloud Storage, Dataflow), Azure (Data Factory, Blob Storage, Synapse Analytics, Functions, Cosmos DB)

Big Data & Streaming: Hadoop, Spark, HDFS, Kafka, Hive, Sqoop, Oozie, Impala, HBase, Flume, Flink

Data Engineering: Database Design, Data Modeling, ETL Mapping, Architecture Designing, Data Warehousing, Data Pipeline Automation

Databases: MySQL, PostgreSQL, Oracle, MongoDB, DynamoDB, Cassandra

Data Warehousing & BI: AWS Redshift, Snowflake, Databricks, BigQuery, Power BI, Tableau, IBM Cognos

Version Control & Dev Tools: Git, Bitbucket, IntelliJ, VS Code, Jupyter Notebook, Excel, Jenkins

Methodologies: Agile, Scrum

EDUCATION

PACE UNIVERSITY New York, NY

Master of Science in Data Science 2023-2024

LOVELY PROFESSIONAL UNIVERSITY Punjab, India

Bachelor of Engineering, Major in Computer Science; Minor in Data Science 2017-2021

ADDITIONAL INFORMATION

Languages: English; Telugu (native, fluent); Hindi (conversational)

Awards: Tech Titan (2023); Wall of Awesomeness (2023)

Assisted a professor on a research project during the master's program.

Mentored classmates during the master's program on data pipelines and data architecture to build their technical expertise.
