Varun Goti
**** ******** ****, ****** ** ***** ***********@*****.*** 281-***-****
EDUCATION
Governor's State University, Chicago, Illinois
Master of Science in Computer Science, GPA: 3.7/4, May 2021
Jawaharlal Nehru Technological University, Hyderabad, India
Bachelor of Technology in Computer Science Engineering, GPA: 3.6/4, May 2019
TECHNICAL SKILLS
Skills: Data Visualization, Data Extraction, Data Migration, Data Modelling, Data Analysis, Data Mining, Shell Scripting, Data Mapping, Application Development, Pipeline Creation, Encryption & Decryption, ETL Development.
Tools: Hadoop, Infoworks, BigQuery, Cloud Composer, Dataproc, Airflow, Google Cloud Platform, Cloud Shell, AWS, Amazon Aurora, PostgreSQL, Azure Databricks, Azure SQL Database, Python, Hive, Spark, Teradata, Tableau, SSIS, SQL Management Studio, Visual Studio, Oracle.
Data Engineer
PROFESSIONAL SUMMARY:
• 4+ years of IT experience across a variety of industries working with Big Data, spanning operational and analytical technologies. Hadoop working environment includes Hadoop, Spark, MapReduce, Hive and Sqoop.
• Fluent programming experience with Python, SQL, T-SQL.
• Hands-on experience in developing and deploying enterprise applications using major Hadoop ecosystem components like Hive, Sqoop and Spark SQL.
• Experience in migration of data from various data sources such as Hive, SQL server, Teradata and flat files.
• Adept at configuring and installing Hadoop/Spark Ecosystem Components.
• Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX and Spark Streaming for processing and transforming complex data using in-memory computing capabilities written in Scala. Worked with Spark to improve the efficiency of existing algorithms using Spark Context, Spark SQL, Spark MLlib, DataFrames, Pair RDDs and Spark on YARN (see the sketch after this summary).
• Experience integrating various data sources such as Oracle SE2, SQL Server, flat files and unstructured files into a data warehouse.
• Able to use Sqoop to migrate data between RDBMS, NoSQL databases and HDFS.
• Experience in Extraction, Transformation and Loading (ETL) data from various sources into Data Warehouses, as well as data processing like collecting, aggregating and moving data from various sources using Infoworks and Microsoft SSIS.
• Hands-on experience with Hadoop architecture and its components, including the Hadoop Distributed File System (HDFS), Job Tracker, Task Tracker, Name Node, Data Node and Hadoop MapReduce programming.
• Comprehensive experience in developing simple to complex MapReduce and Streaming jobs using Scala and Java for data cleansing, filtering and data aggregation, along with detailed knowledge of the MapReduce framework.
• Ample knowledge of data architecture including data ingestion pipeline design, Hadoop/Spark architecture, data modeling, data mining, machine learning and advanced data processing.
• Proficient in troubleshooting and debugging database environments, with a focus on performance tuning and slow query optimization.
• Strong understanding of database engine fundamentals, optimization techniques, and Linux system commands for database administration.
• Familiarity with cloud computing platforms such as AWS, Azure, and Google Cloud, applying cloud-native solutions to database management.
• Knowledgeable in NoSQL technologies, including DynamoDB and MongoDB, for distributed data storage and retrieval.
• Experience working with GCP services like BigQuery, Cloud Composer, Dataproc and Cloud Shell.
• Experience working with Azure services like Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse and Azure Data Factory.
• Developed Spark Applications that can handle data from various RDBMS (MySQL, Oracle Database) and Streaming sources.
• Proficient SQL experience in querying, data extraction/transformations and developing queries for a wide range of applications.
• Capable of processing large sets (gigabytes) of structured, semi-structured or unstructured data.
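Illustrative only: a minimal PySpark sketch of the Spark SQL / DataFrame cleansing and aggregation work described above; the database, table, column names and output path are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive-enabled session so source tables can be read from the Hive metastore.
spark = (
    SparkSession.builder
    .appName("transactions_cleansing")   # hypothetical job name
    .enableHiveSupport()
    .getOrCreate()
)

# Read a raw source table registered in Hive (database/table are hypothetical).
raw = spark.sql("SELECT * FROM staging_db.transactions_raw")

# Basic cleansing and aggregation with the DataFrame API.
daily_totals = (
    raw
    .filter(F.col("amount").isNotNull())
    .withColumn("txn_date", F.to_date("txn_ts"))
    .groupBy("txn_date", "account_id")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("txn_count"),
    )
)

# Write the curated result back to the warehouse layer (path is hypothetical).
daily_totals.write.mode("overwrite").partitionBy("txn_date").parquet(
    "hdfs:///warehouse/curated/transactions_daily"
)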
WORK EXPERIENCE
Data Engineer - CVS Health, Irving, Texas July 2021 – Present
• Analyze and organize raw data; build data systems and pipelines. Perform data analysis, data migration, data mapping, data cleansing, transformation, integration, and data import/export using Python.
• Performed data engineering functions: data extract, transformation, loading and integration in support of enterprise data infrastructures- data warehouse, operational data stores and master data management.
• Working on migration of data from various sources (Hive, Teradata, DB2, SQL Server, Excel and flat files) via GCS buckets to Google BigQuery.
• Working on migration of data from the Hadoop environment using an egress framework, an internal framework that uses a JSON config and a shell script to egress (move) data into GCS buckets in .DAT file format.
• Build JSON scripts for the egress framework and trigger them using shell scripts to move data (full-refresh and incremental-refresh tables) from source (Hadoop) to target (GCS buckets).
• Migration of data from GCS buckets and direct sources such as Hive, Teradata, DB2, SQL Server and Excel to the target (BigQuery) through Infoworks, an ETL tool used for ingestion of data into Google BigQuery.
• Create pipelines for movement of data from Infoworks to Google BigQuery.
• Automating CI/CD pipelines by creating workflows in Infoworks and using the scheduler to automate loads based on each table's refresh frequency. Trigger workflows at scheduled times using Python wrapper scripts.
• Schedule, monitor and manage workflows using Cloud Composer. Designed several DAGs in Airflow (see the DAG sketch at the end of this section).
• Create BigQuery authorized views for row level security.
• Working on SQL databases and using languages such as Python, Hive and Spark.
• Ingesting data in file formats such as CSV and Parquet into Infoworks (an ETL tool used to migrate data from sources such as Hive/Hadoop, Teradata, DB2 and others).
• Administered database engines by performing installations, configuring environments, and optimizing performance through tuning and query optimization.
• Provided technical support and troubleshooting for database systems, including Oracle, PostgreSQL, and Amazon Aurora PostgreSQL, ensuring minimal downtime and high availability.
• Conducted in-depth debugging and issue resolution for technical systems, leveraging knowledge of Linux OS file manipulation and system monitoring commands.
• Utilized networking expertise, including the OSI model, to diagnose and resolve network-related database issues.
• Implemented database migration strategies, ensuring seamless transitions with minimal impact on operational workflows.
• Designed and deployed monitoring and alerting solutions for database environments, improving issue detection and resolution times.
• Perform data validation by creating count scripts on the source side and BigQuery procedures on the target side.
• Participated in full software development lifecycle with requirements, solution design, development, QA implementation and product support using Scrum and other Agile methodologies.
• Perform Google Cloud Development using Google Cloud Platform (GCP) services like BigQuery, Cloud Shell, Cloud Composer and DataProc.
• Working closely with business users to understand requirements and align the implementation to meet them.
Environment: Python, BigQuery, Cloud Shell, Teradata, DB2, JSON, Microsoft Excel, T-SQL, Spark, SQL, ETL, Hive/Hadoop, Data Pipeline, Infoworks, Cloud Composer, Dataproc, GCS Buckets, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS), AWS
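Illustrative only: a minimal Cloud Composer (Airflow 2.x) DAG sketch of the scheduled egress-to-GCS, Infoworks ingestion and BigQuery count-validation flow described above; the DAG name, script paths, wrapper script and dataset/table names are hypothetical.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

default_args = {"retries": 1, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="egress_to_bigquery_daily",      # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 6 * * *",          # daily refresh frequency
    catchup=False,
    default_args=default_args,
) as dag:

    # Step 1: run the egress shell script that moves .DAT files from Hadoop
    # to GCS buckets (script path and JSON config are hypothetical).
    egress_to_gcs = BashOperator(
        task_id="egress_to_gcs",
        bash_command="bash /opt/egress/run_egress.sh claims_daily.json",
    )

    # Step 2: trigger the Infoworks ingestion workflow through a Python
    # wrapper script (wrapper and workflow name are hypothetical).
    trigger_infoworks = BashOperator(
        task_id="trigger_infoworks_workflow",
        bash_command="python /opt/wrappers/trigger_infoworks.py --workflow claims_daily",
    )

    # Step 3: validate row counts in the BigQuery target
    # (dataset and table names are hypothetical).
    validate_counts = BigQueryInsertJobOperator(
        task_id="validate_counts",
        configuration={
            "query": {
                "query": "SELECT COUNT(*) AS row_count FROM `analytics.claims_daily`",
                "useLegacySql": False,
            }
        },
    )

    egress_to_gcs >> trigger_infoworks >> validate_counts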
Data Engineer - Grey Campus, Hyderabad, India April 2019 – January 2020
• Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.
• Architect & implement BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks).
• Migration of data to the Azure cloud platform from various sources like Hive, DB2, Teradata, Excel and SQL Server (see the loading sketch at the end of this section).
• Design & implement migration strategies for traditional systems on Azure (lift and shift, Azure Migrate, and other third-party tools).
• Pulled data into Power BI from various sources such as SQL Server, Excel, Oracle and SQL Azure.
• Created Airflow Scheduling scripts in Python.
• Identify and implement best practices, tools and standards.
• Design, set up, maintain and administer Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse and Azure Data Factory.
• Working with the design and development of Tableau visualization solutions.
• Analyzed requirements for various reports, dashboards and scorecards using Tableau Desktop.
• Involved in creating dashboards and reports in Tableau. Engaged with business users to gather requirements and design visualizations.
Environment: Python, Azure SQL, Data Lake, Databricks, Microsoft Excel, Oracle, Tableau.
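Illustrative only: a minimal pandas/pyodbc sketch of loading an Excel extract into Azure SQL Database, similar to the source-to-Azure loads described above; the server, credentials, source file, columns and target table are hypothetical placeholders.

import pandas as pd
import pyodbc

# Hypothetical Excel extract to migrate.
df = pd.read_excel("enrollments_2019.xlsx")

# Connection string values are placeholders, not real credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=reporting;"
    "UID=etl_user;PWD=***"
)
cursor = conn.cursor()
cursor.fast_executemany = True  # speeds up bulk inserts

# Target table and column list are hypothetical.
cursor.executemany(
    "INSERT INTO dbo.enrollments (student_id, course_id, enrolled_on) VALUES (?, ?, ?)",
    list(df[["student_id", "course_id", "enrolled_on"]].itertuples(index=False, name=None)),
)
conn.commit()
conn.close()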
CERTIFICATIONS
• Machine Learning Internship: For Data Science with Real Exercises from Microsoft.
• Microsoft Technology Associate (Python Programming): For Data Science with Real Exercises from Microsoft.