
Data Engineer/Analyst

Location:
Queens, NY
Posted:
April 28, 2025



Januka Pandey

Data Engineer/Analyst Queens, New York

929-***-**** | ****************@*****.*** | linkedin.com/in/januka-pandey

Professional Summary

Detail-oriented and results-driven Data Engineer/Analyst with over 5 years of experience in data analysis, engineering, and management. Proficient in manipulating large datasets using SQL and Python, with expertise in developing and optimizing ETL pipelines, data warehousing, and data pipeline automation. Skilled in cloud platforms such as Google Cloud Platform (GCP), AWS, and Snowflake, with hands-on experience in data governance and ensuring data integrity.

Strong background in collaborating with cross-functional teams to gather data requirements, resolve data-related incidents, and support data-driven business decisions. Experienced in working with ServiceNow for incident management, GitHub for version control, and Rally for task tracking. Adept at automating reports using Excel VBA and macros to streamline processes and enhance reporting accuracy. Additionally, proficient in data visualization tools such as Tableau and Power BI, with the ability to create interactive dashboards and reports to present complex data insights. Known for strong problem-solving and communication skills, combined with a passion for transforming data into meaningful insights that drive operational efficiency and strategic growth.

Skills

Programming & Scripting: SQL, Python, R
Data Engineering & ETL: Hadoop, Spark, Apache Kafka, Airflow, Apache Oozie, Data Pipelines, Data Integration, Data Cleansing
Data Analysis & Visualization: Tableau, Power BI, Excel VBA, Excel Macros, A/B Testing, Time-Series Analysis, Regression Analysis
Databases & Storage: MySQL, SQL Server, Snowflake, BigQuery, HBase, MongoDB, DB2, Hive
Cloud Platforms & Tools: AWS, GCP
Version Control & Project Management: GitHub, Rally, ServiceNow

Education

ASA College, Associate Degree in Medical Assisting
Capella University, Bachelor of Science in Information Technology

Work Experience

Data Engineer/Analyst, UnitedHealth Group, New York, NY | January 2023 – Present

Responsibilities:

Managed and maintained data pipelines using SQL and SSMS to support healthcare data operations and analytics.

Resolved data-related incidents through ServiceNow by investigating root causes, implementing fixes, and ensuring data integrity.

Collaborated with stakeholders and business partners to gather data requirements, address requests, and provide timely solutions.

Utilized GitHub for version control and Rally for tracking project progress and task management.

Performed ETL tasks to process and integrate data from various sources, ensuring smooth data flow and consistency.

Developed and automated reports using Excel VBA to minimize manual work and enhance reporting accuracy.

Cleaned, validated, and transformed large datasets using SQL and Python, ensuring data quality and reliability.

Worked closely with cross-functional teams to streamline data processes and support business decisions.

Contributed to data architecture discussions, helping to improve data storage and retrieval processes.

Trained junior team members on SQL best practices and data visualization techniques.

Technology Used: SQL, Python, SSMS, Snowflake, ServiceNow, GitHub, Rally, and Excel VBA
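Illustrative sketch only (not code from the role above): a minimal example of the SQL-plus-Python cleaning and validation pattern these bullets describe. The connection string, table, and column names are hypothetical placeholders.

```python
# Minimal sketch of SQL extraction plus pandas cleaning/validation.
# Connection details, table, and column names are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:password@claims_dsn")  # hypothetical DSN

# Pull a slice of data with SQL, then clean and validate it in pandas.
query = """
    SELECT claim_id, member_id, service_date, paid_amount
    FROM claims
    WHERE service_date >= '2024-01-01'
"""
df = pd.read_sql(query, engine)

# Basic cleaning: drop duplicates, normalize types, flag bad rows.
df = df.drop_duplicates(subset="claim_id")
df["service_date"] = pd.to_datetime(df["service_date"], errors="coerce")
df["paid_amount"] = pd.to_numeric(df["paid_amount"], errors="coerce")

# Simple data-quality checks before the data moves downstream.
issues = df[df["service_date"].isna() | (df["paid_amount"] < 0)]
if not issues.empty:
    issues.to_csv("claims_quality_exceptions.csv", index=False)

clean = df.drop(issues.index)
clean.to_sql("claims_clean", engine, if_exists="replace", index=False)
```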

Data Engineer, AT&T, New York, NY | June 2020 – October 2022

Responsibilities:

Modeled complex problems and identified opportunities using advanced analytics, algorithms, and visualization techniques.

Integrated and prepared large, diverse datasets; designed databases and computing environments.

Developed Spark SQL scripts and configured Spark Streaming for real-time data ingestion.

Used Apache Kafka for streaming data into Spark and publishing processed results.

Imported/exported data between HDFS and Hive using Spark, optimized workflows.

Implemented dynamic partitioning, managed and external Hive tables for performance gains.

Created Hive tables, loaded data, and wrote queries invoking Spark jobs.

Converted raw data to Avro format via Spark for Hive table integration.

Automated tasks with Apache Oozie and Spark workflows.

Managed semi-structured data using HBase and MongoDB; created HBase tables.

Wrote Hive UDFs and provided business analysis via Hive queries.

Transferred data from DB2 to HDFS and optimized data lake integration.

Designed partitioned and bucketed Hive tables for query optimization.

Utilized Python, NumPy, and Pandas libraries to manipulate and analyze large datasets, enhancing data processing efficiency.

Implemented SQL queries for data extraction, transformation, and loading, ensuring data accuracy and integrity.

Gained basic knowledge of Apache Airflow for orchestrating data pipelines and workflows.

Technologies Used: SQL, Python, Spark, Apache Kafka, HDFS, Hive, HBase, MongoDB, DB2, Apache Oozie, Apache Airflow, NumPy, Pandas
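Illustrative sketch only (not code from the role above): a minimal PySpark example of the Kafka-to-Spark Structured Streaming ingestion pattern described in the bullets. The brokers, topic, schema, and paths are hypothetical placeholders.

```python
# Minimal sketch: read JSON events from Kafka with Spark Structured Streaming
# and land them as Parquet for Hive. Topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = (SparkSession.builder
         .appName("kafka_ingest_sketch")
         .enableHiveSupport()
         .getOrCreate())

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("account_id", StringType()),
    StructField("usage_gb", DoubleType()),
])

# Read the Kafka topic as a streaming DataFrame.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical brokers
       .option("subscribe", "usage_events")                  # hypothetical topic
       .load())

# Parse the message value from JSON into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Write parsed events as Parquet files that an external Hive table can read.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/usage_events")                # hypothetical HDFS path
         .option("checkpointLocation", "/checkpoints/usage_events")
         .start())

query.awaitTermination()
```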

Junior Data Engineer, Northern Trust, Chicago, IL | April 2018 – March 2020

Responsibilities:

Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.

Monitored and managed Hadoop log files for troubleshooting and optimization.

Processed large structured, semi-structured, and unstructured datasets by loading and transforming them into the Hadoop ecosystem.

Implemented partitioning, dynamic partitions, and buckets in Hive for efficient query performance.

Developed Java MapReduce programs to analyze sample log files stored in the Hadoop cluster.

Designed workflows with Apache Oozie to handle job dependencies and managed resources using YARN.

Extracted data from SQL-based databases into HDFS using Sqoop.

Built efficient Hive and MapReduce scripts for data analysis and processing.

Coordinated cluster activities using Zookeeper to enhance system stability.

Loaded data from UNIX file systems into HDFS for distributed processing.

Installed and configured Hive; developed and implemented custom Hive UDFs.

Automated data ingestion from FTP servers to Hive tables using Oozie workflows.

Created and managed Hive tables, loaded data, and executed Hive queries internally mapped to MapReduce.

Exported processed data to relational databases via Sqoop, enabling visualization and report generation for the BI team.

Technologies Used: Hadoop, HDFS, Java, MapReduce, Apache Oozie, Sqoop, Hive, Zookeeper, SQL
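Illustrative sketch only (not code from the role above): a small Python wrapper around the kind of Sqoop import from a relational database into HDFS described in the bullets. The JDBC URL, credentials file, table, and target directory are hypothetical placeholders.

```python
# Minimal sketch: invoke a Sqoop import to pull a relational table into HDFS.
# JDBC URL, credentials file, table, and paths are hypothetical.
import subprocess

def sqoop_import(table: str, target_dir: str) -> None:
    """Run a Sqoop import for one source table into an HDFS directory."""
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost:3306/finance",     # hypothetical source DB
        "--username", "etl_user",
        "--password-file", "/user/etl/.db_password",
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", "4",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Pull a source table into HDFS so downstream Hive tables can read it.
    sqoop_import("ACCOUNT_BALANCES", "/data/raw/account_balances")
```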



REFERENCES AVAILABLE UPON REQUEST