
Data Engineer

Location:
Mount Pleasant, MI, 48858
Posted:
April 10, 2024


VIJENDAR

DATA ENGINEER

Phone: 989-***-**** | Email: ad4wwi@r.postjobfree.com | LinkedIn

SUMMARY

• 5+ years of experience as a Data Engineer working with large sets of structured and unstructured data, spanning Data Acquisition, Data Validation, Predictive Modeling, Statistical Modeling, Data Modeling, and Data Visualization.

• Developed enterprise solutions utilizing batch processing with Apache Pig and streaming frameworks, including Spark Streaming, Apache Kafka, and Apache Flink, ensuring high throughput and low latency in data processing.

• Leveraged Big Data technologies such as Hadoop, Spark, HDFS, HBase, MapReduce, Hive, Pig, and AWS DynamoDB for scalable data solutions and analytics.

• Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.

• Strong knowledge of RDBMS concepts, Data Modeling (Facts and Dimensions, Star/Snowflake schemas), Data Migration, Data Cleansing, and ETL Processes.
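The data-cleansing and ETL work summarized above can be sketched in plain Python. This is an illustrative example only (record layout, field names, and rules are hypothetical); production pipelines of this kind would typically run on Spark or a dedicated ETL framework:

```python
from datetime import datetime

def clean_records(raw_rows):
    """Illustrative cleansing step: deduplicate rows by id, trim and
    normalize names, and convert M/D/YYYY dates to ISO-8601.

    raw_rows: list of dicts with 'id', 'name', and 'date' keys.
    Keeps the first copy of each id.
    """
    seen = set()
    cleaned = []
    for row in raw_rows:
        if row["id"] in seen:  # drop duplicate ids
            continue
        seen.add(row["id"])
        cleaned.append({
            "id": row["id"],
            "name": row["name"].strip().title(),
            "date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
        })
    return cleaned

rows = [
    {"id": 1, "name": "  alice ", "date": "4/10/2024"},
    {"id": 1, "name": "alice",    "date": "4/10/2024"},  # duplicate id
    {"id": 2, "name": "BOB",      "date": "1/2/2023"},
]
print(clean_records(rows))
```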

TECHNICAL SKILLS

Programming Language: Scala, Python, SQL

IDEs: PyCharm, Jupyter Notebook

Big Data Ecosystem: Hadoop, MapReduce, Hive, Pig, DynamoDB, HDFS, Spark

Machine Learning: Linear Regression, Logistic Regression, Decision Tree, SVM, K-Means, Random Forest

Cloud Technologies: AWS (EC2, S3 Bucket, Amazon Redshift, Lambda, IAM, Kinesis), Azure (Azure Data Lake Storage, Azure Data Factory, Azure SQL databases, Azure Databricks, Azure DevOps, Azure Stream Analytics, Azure Synapse)

Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow, Kafka, PySpark, Apache Airflow

CI-CD/Reporting Tools: Jenkins, Tableau, Power BI (DAX), SSRS

Database: SQL Server, PostgreSQL, MongoDB, MySQL

Operating Systems: Windows, MacOS

EXPERIENCE

CVS Health, MI Jan 2023 – Present

Data Engineer

• Import data from diverse sources, transform it using Hive, Pig, and MapReduce, and maintain data integrity within HDFS.

• Use Spark to accelerate data processing, achieving a 25% increase in processing speed for daily batches of up to 50GB and enhancing the team's data analysis capability.

• Utilize Azure DevOps for continuous integration and deployment, leading to a 30% reduction in software release cycles.

• Build efficient data pipelines that parse and store raw data into partitioned Hive tables, improving data retrieval for reporting and analysis by 20%.

• Implement Azure Data Lake Storage, leading to a 30% improvement in data storage and retrieval efficiency for large-scale datasets.

• Work on CI/CD solutions using Git and Jenkins to set up and configure the big data architecture on the Azure cloud platform.

• Automate data pipelines using Apache Airflow, reducing manual intervention by 40% and ensuring seamless data flow between systems.

• Collaborate with cross-functional teams to implement automated dashboards and reporting using Tableau, empowering stakeholders with data-driven insights.

• Improve data pipeline efficiency, achieving a 35% reduction in manual intervention through the use of Azure Data Factory.

Itinfolab Technologies, India Jan 2018 – Nov 2021

Data Engineer

• Developed a data normalization and consolidation process using AWS Glue, reducing data redundancy by 40% and improving data quality scores by 50%, which directly increased the accuracy of machine learning models by 15%.

• Leveraged Spark SQL for the preprocessing, cleansing, and joining of large datasets, ensuring data quality and readiness for analysis.

• Conducted comprehensive architecture and implementation assessments for AWS services, including Amazon EMR, Redshift, and S3, ensuring optimal cloud solutions.

• Designed and developed Scala workflows to pull data from cloud-based systems and apply transformations to it.

• Worked on Dimensional Modeling (Star Schema, Snowflake Schema), Data Warehousing, and OLAP tools.

• Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with tasks running on Amazon SageMaker.

• Developed a Power BI dashboard to visualize key business KPIs, resulting in a weekly time savings of 10 hours on manual reporting tasks.

• Designed SSIS packages to transfer data between servers, load data into databases in a SQL Server environment, and deploy the packages.
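The Spark SQL preprocessing, cleansing, and join work described in this role follows a common shape: filter out bad rows, join fact and dimension tables, then aggregate. The sketch below runs that same query shape against an in-memory SQLite database (all table and column names are hypothetical), since Spark SQL itself requires a cluster session:

```python
import sqlite3

# In-memory database standing in for a Spark SQL session (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 99.5), (2, 10, 20.0), (3, 11, NULL);
    INSERT INTO customers VALUES (10, 'midwest'), (11, 'east');
""")

# Cleanse (drop NULL amounts), join orders to customers, and aggregate
# per region -- the same pattern as a Spark SQL preprocessing step.
rows = conn.execute("""
    SELECT c.region, ROUND(SUM(o.amount), 2) AS total
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.amount IS NOT NULL
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(rows)
```

The NULL filter removes the 'east' order before the join, so only the 'midwest' total survives aggregation.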

EDUCATION

Master's in Information Technology, Central Michigan University, Mount Pleasant, MI, May 2023
