
Data Engineer

Location:
Miami, FL
Posted:
January 25, 2013

Resume:

Ahmed Radwan

*.********@*****.***

Cell# (786) 543-7772

INTERESTS

My interests span cloud computing, data/metadata management, semantics, and data integration, with an emphasis on applying advances in these areas to build solutions that are useful for customers.

WORK EXPERIENCE

Cloudera Inc., Nov. 2010 - present, Palo Alto, CA

Senior R&D Engineer, Platforms

Designed and developed solutions for efficient transfer and processing of massive amounts of structured and unstructured data in distributed/cloud computing environments. This work involved devising efficient techniques for data extraction and loading using the optimized import/export interfaces supported by different databases and Enterprise Data Warehouses. Other challenges included metadata management and data integration across such autonomous distributed systems, and analyzing and optimizing the performance and scalability of these solutions. Investigated techniques for better resource utilization in MapReduce jobs through resource management and scheduling strategies.

Contributed to various open-source projects, including Apache Hadoop Common, MapReduce and YARN, Apache Sqoop, and Apache Flume.

Apache Sqoop committer and member of the Project Management Committee (PMC). Sqoop is an open-source tool designed for efficiently transferring bulk data between Hadoop and structured datastores such as relational databases.

Apache Flume committer and PMC member. Flume is an open-source distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

Yahoo! Inc., Nov. 2008 - Nov. 2010, Sunnyvale, CA

Senior Software Engineer, Cloud Computing

Built solutions for managing massive amounts of structured and unstructured data in distributed/cloud computing environments. Designed efficient models and techniques for metadata management, data processing, and performance optimization in the cloud.

Conceived, designed, and led the development of MapReduce-Legos, a data processing abstraction layer on top of Hadoop MapReduce. This layer provides a refined model for MapReduce jobs, enabling an optimized way of describing and running Extract, Transform, and Load (ETL) workflows on Hadoop MapReduce clusters. The project is used by production systems to process petabytes of data on a daily basis at Yahoo! Inc. The work was also presented and demoed at a number of internal Yahoo! conferences. A peer-reviewed article describing this work was published in the International Journal of Cloud Computing.

Designed and developed a declarative SQL query engine on top of Hadoop MapReduce and distributed file system.

IBM Research, May - Aug. 2006 and May - Aug. 2007, San Jose, CA

Graduate Research Intern, Information Integration (IBM Almaden Research Center)

Designed a novel similarity measure and top-k enumeration algorithm that quantify the distance between schema concepts in the schema integration problem, in order to efficiently calculate the best k candidate integrated schemas. This work is important for metadata management in cloud computing systems because it facilitates federating data from multiple autonomous data sources and generating a unified, non-redundant representation of the data. This work led to the publication of an article in the ACM SIGMOD conference.

Studied the problem of expressing Extract, Transform, and Load (ETL) dataflows using declarative mapping semantics, and vice versa. This work was adopted by IBM and is being productized as the FastTrack component of IBM Information Server. These contributions led to the submission of a patent disclosure describing this work. The work was also demoed at the IBM Information On Demand (IOD) conference, and an article detailing the work was published at IEEE ICDE.

University of Miami, Jan. 2005 - Oct. 2008, Miami, FL

Research/Teaching Assistant, Electrical and Computer Engineering

Conducted research on information integration in a grid environment with applications in bioinformatics.

Designed a web services-based data federation architecture for bioinformatics applications. The system is called BioFederator and was awarded the prestigious IBM Faculty Award in 2009. Based on collaborations with bioinformatics researchers, several domain-specific data federation challenges and needs were identified. The BioFederator addresses these challenges with an architecture that incorporates a series of utility services, which handle issues such as automatic workflow composition, domain semantics, and the distributed nature of the data. It also incorporates a series of data-oriented services that facilitate the actual integration of data. The BioFederator is deployed on a grid environment over the web. An article describing the design was published in the AAAI IIWeb workshop; additional details and applications were presented in a book chapter published by IOS Press.

Studied and developed novel data integration and processing techniques for data-intensive applications. This work was applied to a bioscience study in which, for the first time, we presented a whole-genome prediction of nucleosome exclusion regions for the human genome. The details of this work were published in an article in the BMC Genomics journal in 2008 and featured at the 57th Annual Meeting of the American Society of Human Genetics (ASHG). The results were also made available to the scientific community as part of the University of California at Santa Cruz (UCSC) Genome Browser custom data tracks.

Research team member of the Latin American Grid (LAGrid-BioGrid) project. BioGrid addresses research issues in enabling grid computing technologies for bioinformatics applications. My work focused on studying, designing, and developing Grid/Web services for bioinformatics applications.

Electronics Research Institute (ERI), Feb. 1999 - Aug. 2004, Cairo, Egypt

Researcher, Electrical and Computer Engineering

Member of the Parallel Processing Team and team leader for a number of projects sponsored by the National

Science Foundation (NSF) U.S.A. and the European Union.

Designed parallel/distributed texture segmentation techniques that were applied to real-time distributed surface inspection systems. I developed simple yet powerful parallel algorithms that advanced real-time distributed inspection systems. These contributions led to an article in the IEEE SMC conference and a more detailed article in the Elsevier PRL journal.

Mentor Graphics Corporation, Jul. 2001 - Jul. 2002, Cairo, Egypt

R&D Engineer, Modeling and Interconnectix (ICX)

Studied problems in Electronic Design Automation (EDA) and designed and developed EDA tools and simulation packages. These tools were used in the IBIS-to-SPICE converter to generate SPICE models from IBIS data sheet files. My studies and designs improved the modeling process in terms of time and accuracy.

PUBLICATIONS

Peer-reviewed Book Chapters:

Rosa Badia, Gargi Dasgupta, Onyeka Ezenwoye, Liana Fong, Howard Ho, Sawsan Khuri, Yanbin Liu, Steve Luis, Anthony Praino, Jean-Pierre Prost, Ahmed Radwan, Seyed Masoud Sadjadi, Shivkumar Shivaji, Balaji Viswanathan, Patrick Welsh, and Akmal Younis, "Innovative Grid Technologies Applied to Bioinformatics and Hurricane Mitigation," chapter in High Performance Computing and Grids in Action, pp. 436-462, IOS Press, ISBN 978-1-58603-839-7, Amsterdam, 2008.

Peer-reviewed Articles in Journals:

Ahmed Radwan, Akmal Younis, Santhosh Srinivasan and Abhay Gupta, "MR-LEGOS: A Refined MapReduce Model," International Journal of Cloud Computing (IJCC) 1(1), 2011, pp. 58-80.

Ahmed Radwan, Akmal Younis, Peter Luykx and Sawsan Khuri, "Prediction and analysis of nucleosome exclusion regions in the human genome," BMC Genomics, 9:186, 2008.

Ahmed Abouelela Radwan, Hazem M. Abbas, Hesham Eldeeb, Abdelmonem A. Wahdan and Salwa M. Nassar, "Automated Vision System for Localizing Structural Defects in Textile Fabrics," Elsevier Pattern Recognition Letters, 26, 2005, pp. 1435-1443.

Peer-reviewed Articles in Conferences:

Ahmed Radwan, Lucian Popa, Ioana Roxana Stanoi, Akmal A. Younis, "Top-k generation of integrated schemas based on directed and weighted correspondences," ACM SIGMOD Conference, 2009, pp. 641-654.

Stefan Dessloch, Mauricio A. Hernandez, Ryan Wisnesky, Ahmed Radwan, Jindan Zhou, "Orchid: Integrating Schema Mapping and ETL," IEEE International Conference on Data Engineering (ICDE), 2008, pp. 1307-1316.

Ahmed Radwan, Akmal Younis, Mauricio Hernandez, Howard Ho, Lucian Popa, Shivkumar Shivaji, and Sawsan Khuri, "BioFederator: A Data Federation System for Bioinformatics on the Web," Proc. AAAI Sixth Int. Workshop on Information Integration on the Web (IIWeb) 2007, pp. 92-97.

A. Abouelela Radwan, H. Abbas, H. Eldeeb, S. Nassar, "A statistical approach for textile fault detection," Proc. IEEE Conference on Systems, Man, and Cybernetics (SMC), 2000, pp. 2857-2861.

Presentations, Abstracts and Posters:

Ahmed Radwan, Santhosh Srinivasan and Kalyan Ayloo, MR-LEGOS: A Data Warehousing ETL Toolkit, Yahoo! TechPulse conference, 2010.

Ahmed Radwan, Ryota Egashira, Brian Keefe, A MapReduce Approach for Efficient Data Extraction from

Database Management Systems, Yahoo! TechPulse conference, 2010.

Ahmed Radwan and Abhay Gupta, Lotus MapReduce Legos, Yahoo! TechPulse conference, 2009.

Sawsan Khuri, Ahmed Radwan, Peter Luykx and Akmal Younis, Nucleosome Exclusion Regions across the Human Genome, American Society of Human Genetics (ASHG) 57th Annual Meeting, San Diego, California, 23-27 October 2007.

Ahmed Radwan, Lucian Popa and Ioana R. Stanoi, Calculating Confidences and A Cost Function for Ranking

Schema Integration Alternatives, IBM Almaden Research Center Intern Showcase, 2007.

Ahmed M. Radwan, Ryan Wisnesky, Jindan Zhou, Didier Garcia, Bo Shao, Stefan Dessloch, Mauricio A. Hernandez,

Lucian Popa and Howard Ho, Orchid: ETL Mapping Transformation with Clio, IBM Almaden Research Center

Intern Showcase, 2006.

EDUCATION

Doctor of Philosophy (Ph.D.) in Electrical and Computer Engineering.

Thesis title: Information Integration in a Grid Environment - Applications in the Bioinformatics Domain. University of Miami, U.S.A., Dec. 2010, GPA: 4.0.

Master of Science (MS) in Electrical and Computer Engineering.

Thesis title: Image Processing - Statistical Approach for Texture Segmentation - An Implementation on a Parallel Inspection System. Ain Shams University, Cairo, Egypt, Aug. 2002.

Bachelor of Science (BS) in Electrical and Computer Engineering.

Ain Shams University, Cairo, Egypt, 1998. Graduation Project: V-CAD: An FPGA-based Design Flow (Grade: Distinction). An electronic design automation tool including schematic capture, a VHDL netlister, automatic test pattern generation, and a PLA synthesis tool. The tool was featured at the Design Automation & Test in Europe (DATE) conference in 1999 and was developed using Visual C++.

HONORS

- Member of the Eta Kappa Nu (HKN) International Honor Society for Electrical Engineers (2006 - Present).

- Member of the Institute of Electrical and Electronics Engineers (IEEE) (2008 - Present).

- Member of the Association for Computing Machinery (ACM) (2009 - Present).

- My BioFederator research work was awarded the prestigious IBM Faculty Award in 2009.

SERVICE TO PROFESSION

Conference Reviewer: VLDB 2007, the 33rd Very Large Data Bases Conference.

Conference Reviewer: ICDE 2008, the IEEE 24th International Conference on Data Engineering.

Conference Reviewer: ICMT 2010, the International Conference on Model Transformation.

TECHNICAL SKILLS

Programming using Java, C/C++, Visual C++, Pascal, Prolog, x86 assembly, and network programming using sockets.

Familiar with the following programming environments: MS Win95/2000/XP, Solaris UNIX, Red Hat Linux,

PARIX (Parallel UNIX), and PVM (Parallel Virtual Machine).

Special purpose languages: VHDL, JavaCC, Lex & Yacc, SQL, XQuery, SPARQL, PHP, JSP, and MATLAB.

ETL and data warehousing tools (IBM DataStage).

Eclipse, Rational Rose, UML, EMF, Apache Axis, web services, HTML, XML, RDF and OWL semantic web

technologies, Hadoop MapReduce, Pig Latin.


