Manikanta Jarugula
EMAIL: ************@*****.*** Mobile:949-***-****
Professional Summary
With over 6 years of experience as a Data/Software Engineer, I have developed deep expertise in managing and optimizing data warehouse environments using technologies such as PySpark, Data Lake, and various database systems including Oracle, Teradata, and SQL Server. I am proficient in leveraging AWS services and data visualization tools like Tableau to deliver actionable insights.
Experienced in designing and developing data pipelines for ETL/ELT processes, handling unstructured data (JSON, logs), and implementing data modeling techniques such as Star Schema, Snowflake Schema, and Fact-Dimension tables for cloud and Exadata platforms. Proficient in using Spark (Scala, PySpark, Spark SQL), Python, and shell scripting for data processing, as well as Kafka for designing efficient streaming data pipelines.
Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
I am skilled in creating interactive dashboards using Tableau with data blending, filter actions, and best practices for visual analytics. Experience with MS SQL Server (SSIS, SSRS, T-SQL), optimizing database storage, and performance tuning. Familiar with Object-Oriented Analysis and Design (OOAD), Agile methodologies, and evaluating technology stacks to build cloud-based analytics solutions.
Experienced Data Engineer skilled in developing and managing data pipelines, covering all ETL and ELT phases. Proficient in transforming unstructured data (JSON, logs) into structured formats to support data analysis and insights.
Expertise in data modeling, migration, and ETL pipeline design for cloud and Exadata environments, with a strong grasp of data warehousing techniques, including Star and Snowflake schemas, as well as Fact and Dimension table structures.
I am skilled in building ETL applications for processing large datasets using tools such as MapReduce, Spark (Scala, PySpark, Spark-SQL), and Pig.
Experienced in creating DAX expressions, date-to-year columns, and partitions in Tabular Models for Power BI.
Sound knowledge of database architecture for OLTP and OLAP applications, data analysis, ETL processes, data modeling, and dimensional modeling.
Applied data governance practices for secure, compliant, and efficient data management, using RBAC and ABAC to enforce controlled access based on roles and attributes and minimize security risks.
Maintained data lineage for visibility into data flow, supporting transparency and compliance; implemented DLP and data security measures to prevent unauthorized data access and ensure regulatory adherence; and used cloud audit and monitoring for real-time security, risk mitigation, and compliance enforcement.
Proficient in setting up and refining Continuous Integration (CI) pipelines using Jenkins, GitLab CI, and GitHub Actions to automate the deployment, testing, and integration of code.
Proficient in implementing robust Continuous Deployment (CD) strategies, using tools like Kubernetes, Helm, and Terraform to automate deployment pipelines.
Experience publishing Power BI reports and dashboards to Power BI Server and scheduling dataset refreshes for incoming data.
Integrated Power BI reports into other applications using embedded analytics such as the Power BI service (SaaS) or API automation.
Adept at creating interactive dashboards with Tableau, utilizing features like filter and highlight actions, dynamic display with URL actions, and data blending to enhance visualization.
Familiar with implementing best practices for Tableau report development to ensure clarity and actionable insights.
Involved in the migration of applications to AWS and Azure clouds
Expert in creating Hive UDFs in Java to analyze data sets with complex aggregation requirements.
Experience in developing ETL applications on large volumes of data using different tools: MapReduce, Spark (Scala, PySpark, Spark SQL), and Pig.
Experience in using Sqoop for importing and exporting data from RDBMS to HDFS and Hive.
Experience on MS SQL Server, including SSRS, SSIS, and T-SQL.
Knowledge of Kafka and Snowflake.
AWS Certified Data Engineer – Associate
TECHNICAL SKILLS:
Programming Languages: SAS, Python, R, Scala
Scripting Languages: Unix, Python, Windows PowerShell
Big Data Technologies: Hadoop, Hive, Pig, Spark, Kafka, HDFS, MapReduce, NiFi, Flume
Clouds: AWS, GCP, Azure
Data Warehousing: Data Modelling, Data Warehousing, Data Mart, ETL Data Pipelines, Snowflake
Database Management: SQL (MySQL, MS SQL Server, PostgreSQL, Oracle), NoSQL (MongoDB)
Data Visualization Tools: Tableau, Power BI, MicroStrategy
Data Integration Tools: Informatica, Apache Airflow
Data Quality: Data Profiling, Data Cleansing, Quality Checks, Optimizing Query Performance
Version Control: GitHub
Documentation: Metadata repositories, data dictionaries, report specifications
DIRECTV, El Segundo, CA Nov 2023 to Present
Sr. Data Engineer
Responsibilities:
●Worked on the full lifecycle (SDLC) of Data Warehouse projects, including dimensional data modeling. Involved in data analysis, data profiling, data modeling, data governance, production support, and ETL development.
●Developed predictive models using Python & R to predict customer churn and classification of customers.
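A minimal sketch of the kind of churn-classification workflow referenced above, using scikit-learn in Python; the input file, feature columns, and model choice here are illustrative assumptions rather than the exact production setup.
```python
# Hypothetical churn-classification sketch; feature names and file path are illustrative only.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

df = pd.read_csv("customers.csv")                                   # assumed customer extract
features = ["tenure_months", "monthly_charges", "support_calls"]    # illustrative columns
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42, stratify=df["churned"]
)

# Scale features, then fit a simple logistic-regression classifier for churn.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```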
●Designed and implemented migration strategies for traditional systems on Azure.
●Worked with Azure Cloud Services, Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
●Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization. Assessed the current production state of applications and determined the impact of new implementations on existing business processes.
●Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
●Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
●Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back to those sources.
●Created tabular models on Azure Analysis Services to meet business reporting requirements.
●Worked on data center migration, Azure data services, and data visualization.
●Used Spark with Python for parallel data processing and improved performance.
●Used Spark Streaming to divide streaming data into batches as an input to the Spark engine for batch processing.
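To illustrate the streaming-to-batch pattern in the bullet above, a minimal PySpark Structured Streaming sketch that pulls records from Kafka and hands them to the Spark engine in micro-batches; the topic, broker, paths, and trigger interval are placeholders, not the actual job configuration.
```python
# Illustrative micro-batch streaming sketch; topic, broker, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-to-batches").getOrCreate()

events = (spark.readStream
          .format("kafka")                                   # requires the spark-sql-kafka package
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS value"))

# Each micro-batch is handed to the Spark engine as a regular batch DataFrame.
query = (events.writeStream
         .trigger(processingTime="30 seconds")               # micro-batch interval
         .option("checkpointLocation", "/mnt/chk/events")
         .foreachBatch(lambda batch_df, batch_id:
                       batch_df.write.mode("append").parquet("/mnt/landing/events"))
         .start())
query.awaitTermination()
```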
●Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
●Developed Spark applications using Scala and Python, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
●Developed Spark Programs using Scala and performed transformations and actions on RDDs.
●Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala, and Python. Created Databricks notebooks using PySpark, Scala, and Spark SQL to read and write JSON, CSV, and Parquet.
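A simplified version of the multi-format read/write logic used in such Databricks notebooks; the mount paths, column names, and join key are illustrative assumptions.
```python
# Simplified multi-format read/write sketch; paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("multi-format-io").getOrCreate()

orders = spark.read.json("/mnt/raw/orders/*.json")
customers = (spark.read.option("header", "true").option("inferSchema", "true")
                  .csv("/mnt/raw/customers/*.csv"))

# Join, filter, and aggregate with Spark SQL-style expressions, then persist as Parquet.
enriched = (orders.join(customers, "customer_id")
                  .where(col("order_total") > 0)
                  .groupBy("customer_id")
                  .sum("order_total"))
enriched.write.mode("overwrite").parquet("/mnt/curated/order_totals")
```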
●Used visualization tools such as Power View for Excel and Power BI for visualizing and generating reports.
Environment: Python, NumPy, SciPy, Matplotlib, Pandas, MS Azure, SQL Azure, ADF, Azure Databricks, Data Lake, Azure Blob, HDInsight, U-SQL, Big Data, Snowflake, SnowSQL, Snowpipe, Spark, Spark Streaming, Spark RDD, PySpark, Spark SQL, Scala, Hive.
Anthem, Inc., San Diego, CA Sep 2022 to Oct 2023
Data Engineer
Responsibilities:
Worked on analyzing the Hadoop cluster and different big data analytical and processing tools, including Sqoop, Hive, Spark, Kafka, and PySpark.
Worked on the MapR platform team on performance tuning of Hive and Spark jobs for all users, using the Hive Tez engine to increase application performance.
Worked on incidents raised by users with the platform team on Hive and Spark issues, monitoring Hive and Spark logs and either resolving the issues or raising MapR support cases.
Analyzed large data sets to determine the optimal way to aggregate and report on them. Tested cluster performance using the cassandra-stress tool to measure and improve reads/writes.
Worked on the Hadoop Data Lake, ingesting data from different sources such as Oracle and Teradata through the Infoworks ingestion tool.
Worked on Arcadia to create analytical views on top of tables, so that reporting points to the Arcadia view and is unaffected by table locks even while a batch load is in progress.
Worked on a Python API for converting assigned group-level permissions to table-level permissions using MapR ACEs, creating a unique role and assigning it through the EDNA UI.
Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
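A stripped-down sketch of that Python-to-Snowflake ETL pattern using the snowflake-connector-python library; the account, credentials, stage, and table names are placeholders rather than the actual warehouse objects.
```python
# Illustrative Snowflake load sketch; credentials, stage, and table names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
)
cur = conn.cursor()
try:
    # Stage a local extract on the table stage, then COPY it into the target table.
    cur.execute("PUT file:///tmp/orders.csv @%ORDERS AUTO_COMPRESS=TRUE")
    cur.execute("COPY INTO ORDERS FROM @%ORDERS FILE_FORMAT=(TYPE=CSV SKIP_HEADER=1)")
    # Run a downstream SQL transformation inside Snowflake.
    cur.execute(
        "INSERT INTO ORDER_DAILY SELECT order_date, SUM(amount) FROM ORDERS GROUP BY order_date"
    )
finally:
    cur.close()
    conn.close()
```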
Utilized SSIS to extract data from Access, Siebel, and Excel forms and perform the transformation, cleansing, and loading into the data warehouse.
Created OLAP cubes from the relational data warehouse to develop measures, dimensions, and business-critical KPIs, representing aggregations in different ways (hierarchically and using custom groupings) using SSAS.
Provided the application analyses, data modeling, metadata management, and integration testing functions for existing applications and the new performance assessment and invoice visibility applications.
Created MDX queries according to business requirements.
Environment: Hadoop, Python, HDFS, Hive, Scala, MapReduce, Agile, Cassandra, Kafka, Storm, AWS, YARN, Spark, ETL, Teradata, NoSQL, Oozie, Java, Talend, LINUX, Kibana, HBase.
Reliance Digital, Hyderabad, India Jan 2020 to July 2022
Data Engineer
Responsibilities:
Responsible for implementation, administration and management of Hadoop infrastructures and worked on Amazon Web Services (AWS) cloud services to do machine learning on big data.
Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
Evaluated Hadoop infrastructure requirements and designed/deployed solutions (high availability, big data clusters); involved in cluster monitoring and troubleshooting Hadoop issues.
Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
Strong understanding of Hadoop ecosystem components such as HDFS, MapReduce, and HBase.
Utilized Amazon Web Services to perform big data analytics.
Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
Wrote MapReduce jobs to generate reports on the number of activities created on a particular day from data dumped from multiple sources, with the output written back to HDFS.
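The jobs themselves were Hadoop MapReduce; a minimal Hadoop Streaming-style mapper and reducer in Python that count activities per day convey the idea, with the log layout assumed for illustration.
```python
# mapper.py - Hadoop Streaming style: emit "date<TAB>1" per activity record (log layout assumed).
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 1:
        activity_date = fields[0][:10]      # assume the timestamp is the first field
        print(f"{activity_date}\t1")
```
```python
# reducer.py - sum counts per date; Hadoop Streaming delivers keys grouped and sorted.
import sys

current_date, count = None, 0
for line in sys.stdin:
    date, value = line.rstrip("\n").split("\t")
    if date != current_date:
        if current_date is not None:
            print(f"{current_date}\t{count}")
        current_date, count = date, 0
    count += int(value)
if current_date is not None:
    print(f"{current_date}\t{count}")
```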
Reviewed the HDFS usage and system design for future scalability and fault-tolerance. Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, Sqoop.
Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
Processed HDFS data and created external tables using Hive, to analyze visitors per day, page views and most purchased products.
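A sketch of that external-table-and-query pattern, expressed here through PySpark's Hive support rather than the Hive CLI used at the time; the location, schema, and metric query are placeholder assumptions.
```python
# External-table sketch via PySpark's Hive support; location, schema, and query are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("weblog-hive").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        visit_ts STRING, visitor_id STRING, page STRING, product_id STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    LOCATION '/data/weblogs/'
""")

# Visitors per day computed over the raw logs sitting in HDFS.
spark.sql("""
    SELECT substr(visit_ts, 1, 10) AS visit_day,
           COUNT(DISTINCT visitor_id) AS visitors
    FROM web_logs
    GROUP BY substr(visit_ts, 1, 10)
""").show()
```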
Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
Developed Hive queries for the analysts.
Environment: HDFS, MapReduce, Hive, AWS, Sqoop, HBase, Power BI.
Denken Solutions India Pvt Ltd Jan 2019 to Dec 2019
Data Analyst
Responsibilities:
Designed and created test cases based on the business requirements (also referring to the source-to-target detailed mapping document and transformation rules document).
Involved in extensive data validation using SQL queries and back-end testing
Used SQL for querying the database in a UNIX environment
Developed separate test cases for ETL process (Inbound & Outbound) and reporting
Involved with Design and Development team to implement the requirements.
Developed and Performed execution of Test Scripts manually to verify the expected results
Design and development of ETL processes using Informatica ETL tool for dimension and fact file creation
Performed functionality testing of email notifications for ETL job failures, aborts, and data issues.
Identified, assessed, and communicated potential risks associated with testing scope, product quality, and schedule
Created and executed test cases for ETL jobs to upload master data to repository.
Involved in Manual and Automated testing using QTP and Quality Center.
Conducted black-box testing (functional, regression, and data-driven) and white-box testing (unit and integration, positive and negative scenarios).
Created SSIS package to check XML file validation and data validation to match the requirements
Using Power BI as the front-end BI tool and Microsoft SQL Server 2012 as the back-end database, designed and developed workbooks, dashboards, a global filter page, and complex parameter-based calculations.
Optimized the database by creating clustered and non-clustered indexes and indexed views; also modified and created joins to improve database execution speed.
Designed and developed matrix, tabular, parameterized, drill-down, and drill-through reports with drop-down menu options using SSRS, scheduled to generate all daily, weekly, monthly, and quarterly reports including current status.
Used Power BI, Power Pivot to develop data analysis prototype, and used Power View to visualize reports.
Created, maintained, and scheduled various reports in SSRS, such as tabular, matrix, parameterized, sub-reports, cross-tabs, and drill-down reports.
Deployed and scheduled reports using SSRS to generate all daily, weekly, monthly, and quarterly reports, including current status.
Modified the automated scripts from time to time to accommodate the changes/upgrades in the application interface.
Tested the database for field size validation, check constraints, and stored procedures, cross-verifying the field sizes defined within the application against metadata.
Environment: Windows XP, Informatica Power Center 6.1/7.1, QTP 9.2, Test Director 7.x, Load Runner 7.0, Oracle 10g, UNIX AIX 5.2, PERL, Shell Scripting
Education
Bachelor of Technology in Electronics and Communication Engineering, JNTUK, India.
Master's in Computer Science, Wichita State University, Kansas, USA.