Experience
At Facebook, I work on the bottom layer of our distributed systems stack.
- Sharding-as-a-service: We built a holistic sharding-as-a-service solution that provides high
availability, load balancing, fault tolerance, automatic scaling, and manageability to a wide spectrum
of distributed applications. Our system is a core piece of our infrastructure and manages a large
number of services of different types: stateless, Google Spanner-like Paxos-based, caching, batch
processing, interactive, etc. See our SOSP'21 paper
Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications for details.
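The core placement problem such a system solves can be sketched in a few lines. The greedy least-loaded heuristic below, and every name in it, is an illustrative assumption for exposition, not Shard Manager's actual algorithm.

```python
# Toy sketch of shard placement: assign shards to servers so load stays
# balanced and every shard keeps enough replicas. Greedy least-loaded
# placement is illustrative only.
import heapq

def place_shards(shard_loads, servers, replicas=2):
    """shard_loads: dict shard id -> load; servers: list of server ids.
    Returns dict shard id -> list of servers hosting its replicas."""
    heap = [(0.0, s) for s in servers]  # (current load, server id)
    heapq.heapify(heap)
    placement = {}
    # Place heavy shards first so their replicas spread out.
    for shard, load in sorted(shard_loads.items(), key=lambda kv: -kv[1]):
        picked = [heapq.heappop(heap) for _ in range(replicas)]
        placement[shard] = [s for _, s in picked]
        for cur, s in picked:
            heapq.heappush(heap, (cur + load, s))
    return placement
```

Popping the least-loaded servers per shard guarantees a shard's replicas land on distinct machines, which is the fault-tolerance property the prose describes.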
Prior to joining Facebook, I pursued my Ph.D. at Indiana University, where I was a member of the Digital Science Center, supervised by Prof. Geoffrey Fox. My
research was on distributed systems, especially data-parallel systems, and science gateways/portals.
- Data locality and task scheduling in MapReduce:
I thoroughly investigated how to improve multiple critical aspects of MapReduce, such as data locality,
load balancing, resource utilization, and speculative execution. Given the huge volumes of data modern big
data systems process, data locality is crucial. I was among the first to deeply analyze how commonly used
factors such as the replication factor and the number of servers affect data locality. I quantified their
relationship with a mathematical model and proposed an innovative task scheduling algorithm that significantly
improves data locality. The resulting publications are
Investigation of data locality in MapReduce
and
Investigation of data locality and fairness in MapReduce.
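The locality idea can be conveyed with a toy scheduler: prefer giving each idle node a task whose input block it already stores. This greedy version is only an illustrative approximation; the papers' algorithm solves a global assignment problem, and all names here are hypothetical.

```python
# Greedy sketch of locality-aware task scheduling: assign each idle node
# a task whose input block has a local replica; fall back to a remote
# task only when no local one exists.

def schedule(idle_nodes, pending_tasks, block_locations):
    """idle_nodes: list of node ids.
    pending_tasks: dict task id -> input block id.
    block_locations: dict block id -> set of node ids holding a replica.
    Returns dict task id -> node id."""
    assignment = {}
    remaining = dict(pending_tasks)
    for node in idle_nodes:
        # Prefer a data-local task for this node.
        local = [t for t, b in remaining.items() if node in block_locations[b]]
        chosen = local[0] if local else next(iter(remaining), None)
        if chosen is None:
            break
        assignment[chosen] = node
        del remaining[chosen]
    return assignment
```

With a higher replication factor, more tasks have a replica on any given node, so the `local` list is nonempty more often; that is the replication-factor/locality relationship the model quantifies.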
Perceiving that the artificially imposed partitioning of resources into map/reduce slots causes
significant resource underutilization, I proposed multiple innovative improvements that boost efficiency.
In
Automatic task re-organization in MapReduce
, I presented mechanisms to dynamically split and consolidate tasks to cope with load imbalance
and break through the concurrency limit imposed by fixed task granularity. For the single-job case, I
proposed two algorithms, covering the situations where prior knowledge is and is not available. For the
multi-job case, I proposed a modified shortest-job-first strategy, which theoretically minimizes job
turnaround time when combined with task splitting.
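Why shortest-job-first plus task splitting helps turnaround time can be shown with an idealized model: if splitting lets each job use the whole cluster while it runs, jobs finish one after another, and running the shortest first minimizes the average finish time. The function and numbers below are hypothetical illustrations, not the paper's formulation.

```python
# Idealized turnaround model: jobs run shortest-first, each perfectly
# parallelized over all slots (the limit that task splitting approaches).

def sjf_turnaround(job_work, slots):
    """job_work: dict job id -> total work in slot-seconds.
    Returns dict job id -> finish (turnaround) time."""
    finish, clock = {}, 0.0
    for job, work in sorted(job_work.items(), key=lambda kv: kv[1]):
        clock += work / slots  # the job occupies the whole cluster
        finish[job] = clock
    return finish
```

Swapping any two jobs out of shortest-first order delays the shorter job by the longer job's full runtime, which only increases the average turnaround; this is the classic SJF optimality argument.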
In
Improving Resource Utilization in MapReduce
, I proposed resource stealing, which enables running tasks to steal resources reserved for idle slots
and give them back proportionally whenever new tasks are assigned. Resource stealing puts otherwise
wasted resources to full use without interfering with normal job scheduling. I also proposed
Benefit-Aware Speculative Execution (BASE), which evaluates the potential benefit of speculative tasks and
eliminates unnecessary runs. In
Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization
,
I investigated the performance of MapReduce in heterogeneous network environments and proposed a novel
network-heterogeneity-aware scheduling algorithm.
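The proportional give-back in resource stealing can be illustrated with a toy slot-allocation function. This is an assumption-laden sketch of the arithmetic, not the paper's mechanism.

```python
# Toy model of resource stealing: each running task keeps its own slot
# and evenly steals the slots reserved for idle tasks. As new tasks are
# assigned, num_running grows and each task's stolen share shrinks,
# i.e. stolen resources are returned proportionally.

def allocation(total_slots, num_running):
    """Return how many slots one running task uses under even stealing."""
    if num_running == 0:
        return 0
    idle = total_slots - num_running
    if idle <= 0:
        return 1  # fully loaded node: nothing to steal
    return 1 + idle // num_running
```

On a fully loaded node the function degenerates to one slot per task, so stealing never interferes with normal scheduling, matching the claim above.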
- Expanding MapReduce:
Observing the limited environments and application types MapReduce supports, I proposed new architectures
that greatly expand the scenarios where MapReduce can be used. I proposed a new paradigm, Hierarchical MapReduce,
which enables MapReduce to be deployed on top of geographically dispersed compute clusters across research
institutes and universities. The work was published as
A hierarchical framework for cross-domain MapReduce execution
. Some projects in our lab ran data processing pipelines consisting of multiple applications, which
needed to be manually scheduled to run on appropriate platforms, including Hadoop and Twister (iterative
MapReduce). I designed Hybrid MapReduce, which allows users to orchestrate complicated processing
workflows across multiple runtime platforms without worrying about implementation details (e.g.,
copying/transforming data between platforms). The work was published as
HyMR: a Hybrid MapReduce Workflow System.
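The hierarchical pattern (a local MapReduce per cluster, then a global reduce over the per-cluster results) can be sketched with a word-count toy. The functions are illustrative stand-ins, not the framework's API.

```python
# Toy sketch of Hierarchical MapReduce: each cluster runs a local
# MapReduce over its own partition of the input; a global reduce then
# merges the compact per-cluster results, so only small intermediate
# data crosses the wide-area network.
from collections import Counter

def local_mapreduce(lines):
    """Local word count inside one cluster (stand-in for a full job)."""
    return Counter(w for line in lines for w in line.split())

def global_reduce(partials):
    """Merge the per-cluster partial counts at the top level."""
    total = Counter()
    for p in partials:
        total += p
    return total
```

Shipping only the reduced `Counter` per cluster, rather than raw input, is what makes the scheme practical across geographically dispersed sites.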
- Science Gateways:
Besides backend distributed systems, I also worked intensively on revolutionizing science portal/gateway
development. I applied cutting-edge web technologies such as OpenID, OAuth, AJAX, OpenSocial, gadgets, and
widgets to science gateway development, which significantly improved reusability, flexibility, and
agility. I shared my work in the open source project
OGCE (Open Gateway Computing Environments),
which was the largest initiative to innovate the accessibility of large compute clusters and received
millions of dollars in funding from the National Science Foundation. OGCE has been used by researchers from
various fields, including
Geographic Information Systems, atmospheric discovery, earthquake modeling and simulation, macromolecule
data processing, SocialCloud (sustainable resource sharing), bioinformatics data analysis, and TeraGrid
OAuth (cross-platform authorization). A series of papers were published:
Building the PolarGrid Portal Using Web 2.0 and OpenSocial
,
Cyberaide JavaScript: A JavaScript Commodity Grid Kit
,
Investigating the Use of Gadgets, Widgets, and OpenSocial to Build Science Gateways
,
The QuakeSim Portal and Services: New Approaches to Science Gateway Development Techniques
,
Open Community Development for Science Gateways with Apache Rave
, and
Using Web 2.0 for Scientific Applications and Scientific Communities
.