Rakesh Duguri
Data Engineer
+1-848-***-**** | ************@*****.*** | United States | https://www.linkedin.com/in/rakesh-duguri-0a5810261/
SUMMARY
Experienced Data Engineer with 7+ years of expertise in designing and implementing robust data pipelines and infrastructure across diverse industries. Proficient in Python, Java, and Scala, with a strong focus on optimizing data movement efficiency and processing times. Adept at leveraging a variety of data storage solutions, from SQL and NoSQL databases to cloud platforms such as Amazon Redshift, Google BigQuery, and Snowflake.
TECHNICAL SKILLS
• Programming Languages: Python, Java, Scala
• Data Storage: SQL, NoSQL databases, Amazon Redshift, Google BigQuery, Snowflake, Data Lakes
• Big Data Technologies: Apache Hadoop, Apache Spark, Databricks, Apache Flink
• Data Processing Frameworks: Apache Kafka, Apache NiFi
• ETL Tools: Apache Airflow, Talend, Pentaho, Informatica
• Cloud Platforms: Amazon Web Services, Microsoft Azure, Google Cloud Platform
• Containerization and Orchestration: Docker, Kubernetes
• Data Integration: Apache Camel, MuleSoft
• Version Control: Git
• Database Management Systems: MySQL, PostgreSQL, Oracle, Snowflake
• Data Quality and Profiling: Apache Griffin, Trifacta, Talend Data Quality
• Collaboration Tools: Jira, Confluence
• Monitoring and Logging: ELK Stack, Prometheus
• Scripting and Automation: Shell scripting, PowerShell
• Continuous Integration/Continuous Deployment: Jenkins, GitLab CI
• Data Governance: Apache Atlas, Collibra
• Machine Learning
PROFESSIONAL EXPERIENCE
UPS March 2023 – Present
Sandy Springs, Atlanta, GA
Data Engineer
• Designed and implemented robust data pipelines and infrastructure using Python and SQL, optimizing data movement efficiency and processing times.
• Executed comprehensive data quality measures and monitoring systems, ensuring compliance with regulations and enhancing data governance.
• Collaborated cross-functionally to translate complex business needs into innovative data solutions, improving data-driven decision-making accuracy.
• Engineered and optimized data pipelines using Airflow, Kafka, and Luigi, streamlining data flow and reducing latency.
• Implemented data quality tools such as Great Expectations and Trifacta, reducing errors and accelerating the identification of data anomalies.
• Implemented and automated complex data workflows using Pentaho’s graphical interface, improving data processing efficiency and reducing manual intervention.
• Utilized cloud technologies, including AWS as the primary platform, to implement cost-effective and scalable solutions.
• Utilized Pentaho for designing and deploying ETL processes, enhancing data integration and processing efficiency.
• Engaged in full systems life cycle management activities, providing key insights for analyses, technical requirements, and coding.
• Implemented version control using Git, ensuring seamless collaboration and tracking of code changes.
• Executed strategic optimizations across data warehousing solutions, including BigQuery, Snowflake, and Redshift.
• Played a pivotal role in the adoption of additional cloud platforms like GCP, expanding the company's cloud capabilities.
• Conducted training sessions for the team on Python and SQL integration for enhanced data manipulation.
• Implemented automated monitoring systems for IoT devices, reducing response time to device issues.
• Developed and maintained documentation for data pipelines and infrastructure, streamlining knowledge transfer.
• Collaborated with external vendors to seamlessly integrate APIs, reducing data integration time.
• Actively participated in industry conferences and forums, staying abreast of the latest trends and technologies.
• Assisted in the onboarding of new team members, providing mentorship and support.
Samsung January 2019 – December 2020
Data Engineer
• Spearheaded the development and implementation of scalable data pipelines, enhancing data ingestion, transformation, and storage processes and optimizing overall data processing efficiency.
• Designed and implemented ETL/ELT workflows on data warehouses and data lakes using technologies such as Snowflake and Redshift, improving data processing times and ensuring timely delivery of insights.
• Orchestrated complex data flows using tools like Airflow, ensuring seamless automation and monitoring of data pipelines, enhancing data pipeline reliability and quality.
• Collaborated closely with stakeholders to understand and translate business needs into technical requirements, fostering effective communication between data engineers, analysts, data scientists, and software engineers.
• Utilized Python and Java to develop and optimize data processing algorithms, reducing code execution time.
• Played a pivotal role in establishing infrastructure to strategically analyze big data, fostering a data-driven culture within the organization.
• Integrated SQL for efficient database management, improving data retrieval times.
• Created interactive reports and dashboards using Pentaho BI tools, enabling data-driven decision-making and enhancing business insights.
• Leveraged cloud platforms such as AWS, Azure, and GCP, demonstrating familiarity with relevant data services to enhance data processing capabilities.
• Leveraged Pentaho for building and optimizing ETL processes, facilitating efficient data extraction and transformation.
• Implemented Apache Airflow, Luigi, Prefect, Kafka, and NiFi for data pipeline orchestration, streamlining workflows and reducing manual intervention.
• Maintained version control using Git, ensuring code integrity and collaboration efficiency within the data engineering team.
• Streamlined data pipelines and improved performance by configuring and tuning Pentaho transformations and jobs for high efficiency and minimal latency.
• Optimized Linux-based systems to support data engineering tasks, improving system performance.
• Successfully collaborated with cross-functional teams, providing technical leadership and support to achieve common data goals and deliver valuable insights.
• Trained and mentored junior data engineers, fostering skill development and knowledge transfer within the team.
• Participated in performance tuning and optimization initiatives, optimizing resource utilization and costs.
• Demonstrated a keen focus on continuous learning and staying updated on the latest industry trends, contributing to the implementation of cutting-edge technologies in data engineering processes.
Cardinal Health Inc April 2015 – December 2018
Data Engineer
• Spearheaded the design and implementation of robust data pipelines, ensuring seamless ETL processes from diverse sources into data warehouses.
• Orchestrated the development and maintenance of data models, guaranteeing data consistency and accuracy, and enhancing accessibility for analysts and scientists.
• Automated critical data workflows, implementing cutting-edge code and tools that streamlined data pipelines and optimized overall data processing speed.
• Integrated Pentaho to automate and manage complex data workflows, enhancing pipeline reliability and performance.
• Implemented stringent data quality checks and security measures, ensuring compliance with industry regulations.
• Troubleshot and debugged data pipeline issues promptly, minimizing disruptions and ensuring continuous and smooth data flow.
• Collaborated seamlessly with analysts and scientists, understanding and addressing their data needs, fostering a cohesive team environment.
• Designed and executed cloud-based data solutions, leveraging AWS, Azure, and GCP for enhanced scalability.
• Developed tools and scripts for data ingestion, processing, and manipulation, enhancing automation and reducing data processing time.
• Authored comprehensive documentation for data pipelines and models, facilitating knowledge transfer and ensuring smooth onboarding of new team members.
• Contributed significantly to performance optimization and scalability efforts, enhancing system responsiveness.
• Leveraged BigQuery, Snowflake, and Redshift for data warehouses and data lakes, optimizing data storage and retrieval processes.
• Employed Airflow and Luigi for data processing automation, improving workflow efficiency.
• Demonstrated proficiency in Git for version control, ensuring effective collaboration and code management within the data engineering team.
• Applied database administration knowledge, including MySQL and PostgreSQL, for efficient query optimization.
EDUCATION
• Master of Science in Branch, Wilmington University, Delaware
• Bachelor of Technology in Branch, DRK College of Engineering and Technology, India
CERTIFICATIONS
• Google Cloud Certified Professional Data Engineer