Zhenhua (Gerald) Guo

Ph.D. in Computer Science
Infrastructure Engineer, Facebook
Google Scholar, LinkedIn

Academic Qualifications

Experience

At Facebook, I am working on the bottom layer of our distributed system stack.

Prior to joining Facebook, I pursued my Ph.D. at Indiana University, where I was a member of the Digital Science Center and was supervised by Prof. Geoffrey Fox. My research was on distributed systems, especially data-parallel systems and science gateways/portals.

  1. Data locality and task scheduling in MapReduce: I investigated how to improve several critical aspects of MapReduce, including data locality, load balancing, resource utilization, and speculative execution. Given the huge volume of data that modern big data systems process, data locality is crucial. I was among the first to analyze in depth how commonly used factors such as the replication factor and the number of servers affect data locality. I quantified their relationship with a mathematical model and proposed a task scheduling algorithm that significantly improves data locality. The resulting publications are Investigation of Data Locality in MapReduce and Investigation of Data Locality and Fairness in MapReduce. Observing that the artificial partitioning of resources into map and reduce slots causes significant resource underutilization, I proposed several improvements that boost efficiency. In Automatic Task Re-organization in MapReduce, I presented mechanisms that dynamically split and consolidate tasks to cope with load imbalance and to break through the concurrency limit imposed by fixed task granularity. For single-job systems, I proposed two algorithms, one for the case where prior knowledge is available and one for the case where it is not. For the multi-job case, I proposed a modified shortest-job-first strategy that, in theory, minimizes job turnaround time when combined with task splitting. In Improving Resource Utilization in MapReduce, I proposed resource stealing, which lets running tasks steal resources reserved for idle slots and return them proportionally whenever new tasks are assigned. Resource stealing puts otherwise wasted resources to use without interfering with normal job scheduling. I also proposed Benefit Aware Speculative Execution (BASE), which evaluates the potential benefit of speculative tasks and eliminates unnecessary runs. In Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization, I investigated the performance of MapReduce in heterogeneous network environments and proposed a novel network-heterogeneity-aware scheduling algorithm.
  2. Expanding MapReduce: Observing the limited environments and application types that MapReduce supports, I proposed new architectures that greatly expand the scenarios in which MapReduce can be used. I proposed a new paradigm, Hierarchical MapReduce, that enables MapReduce to be deployed on top of geographically dispersed compute clusters across research institutes and universities. The work was published as A Hierarchical Framework for Cross-Domain MapReduce Execution. Some projects in our lab ran data processing pipelines consisting of multiple applications that had to be manually scheduled to run on appropriate platforms, including Hadoop and Twister (iterative MapReduce). I designed Hybrid MapReduce, which allows users to orchestrate complicated processing workflows across multiple runtime platforms without worrying about implementation details (e.g., copying and transforming data between platforms). The work was published as HyMR: a Hybrid MapReduce Workflow System.
  3. Science Gateways: Besides backend distributed systems, I also worked intensively on science portal/gateway development. I applied web technologies such as OpenID, OAuth, AJAX, OpenSocial, gadgets, and widgets to science gateway development, which significantly improved reusability, flexibility, and agility. I shared my work in the open source project OGCE (Open Gateway Computing Environments), which was the largest initiative to improve the accessibility of large computer clusters and received millions of dollars in funding from the National Science Foundation. OGCE has been used by researchers from various fields, including Geographic Information Systems, atmospheric discovery, earthquake modeling and simulation, macromolecule data processing, SocialCloud (sustainable resource sharing), bioinformatics data analysis, and TeraGrid OAuth (cross-platform authorization). A series of papers was published: Building the PolarGrid Portal Using Web 2.0 and OpenSocial, Cyberaide JavaScript: A JavaScript Commodity Grid Kit, Investigating the Use of Gadgets, Widgets, and OpenSocial to Build Science Gateways, The QuakeSim Portal and Services: New Approaches to Science Gateway Development Techniques, Open Community Development for Science Gateways with Apache Rave, and Using Web 2.0 for Scientific Applications and Scientific Communities.

Patents

Patents that reference my work

Patent No.     Assignee                            URL
US8924978      IBM                                 https://www.google.com/patents/US8924978
US9020802      EMC                                 https://www.google.com/patents/US9020802
US9158843      EMC                                 https://www.google.com/patents/US9158843
US8645916      Microsoft                           https://www.google.com/patents/US8645916
US9176720      Google                              https://www.google.com/patents/US9176720
US9148429      Google                              https://www.google.com/patents/US9148429
CN103885835    Thomson Licensing                   https://www.google.com/patents/CN103885835A
US8924977      IBM                                 https://www.google.com/patents/US8924977
US9201690      IBM                                 https://www.google.com/patents/US9201690
US8959138      IBM                                 https://www.google.com/patents/US8959138
US9053067      IBM                                 https://www.google.com/patents/US9053067
US8539514      Verizon Patent And Licensing Inc.   https://www.google.com/patents/US8539514

Dissertation

International Conference and Journal Papers

  1. Sangmin Lee, Zhenhua Guo, et al.
    Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications
    SOSP'21
  2. Zhenhua Guo, Geoffrey Fox, Mo Zhou, Yang Ruan
    Improving Resource Utilization in MapReduce (IEEE, ACM)
    IEEE International Conference on Cluster Computing 2012 (CLUSTER'12).
  3. Zhenhua Guo, Geoffrey Fox and Mo Zhou
    Investigation of Data Locality and Fairness in MapReduce (ACM)
    The Third International Workshop on MapReduce and its Applications (MAPREDUCE'12)
  4. Yang Ruan, Zhenhua Guo, Yuduo Zhou, Judy Qiu, Geoffrey Fox
    HyMR: a Hybrid MapReduce Workflow System (ACM)
    The 3rd International Emerging Computational Methods for the Life Sciences Workshop (ECMLS'12)
  5. Zhenhua Guo, Geoffrey Fox
    Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization (ACM)
    The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'12).
  6. Zhenhua Guo, Geoffrey Fox, Mo Zhou
    Investigation of Data Locality in MapReduce (IEEE, ACM)
    The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'12).
  7. Yuan Luo, Beth Plale, Zhenhua Guo, Wilfred W. Li, Judy Qiu, Yiming Sun
    Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing (CCPE)
    Concurrency and Computation: Practice and Experience 2011
  8. Zhenhua Guo, Marlon Pierce, Geoffrey Fox, Mo Zhou
    Automatic Task Re-organization in MapReduce (IEEE)
    IEEE International Conference on Cluster Computing (CLUSTER'11) (PDF)
  9. Yuan Luo, Zhenhua Guo, Yiming Sun, Beth Plale, Judy Qiu, and Wilfred Li
    A Hierarchical Framework for Cross-Domain MapReduce Execution (ACM)
    The 2nd International Emerging Computational Methods for the Life Sciences Workshop (ECMLS'11) (PDF)
  10. Marlon Pierce, Suresh Marru, Carol Song, Sudhakar Pamidighantam, Shaowen Wang, Borries Demeler, Emre Brookes, Zhenhua Guo, Yan Liu, David Braun, Raminder Singh, Bruce Dubbs, Ye Fan and Lan Zhao
    Open Gateway Computing Environments: Tools for Science Gateway Development
    TeraGrid'11 (Poster)
  11. Marlon Pierce, Raminderjeet Singh, Zhenhua Guo, Suresh Marru, Pairoj Rattadilok, and Ankur Goyal
    Open Community Development for Science Gateways with Apache Rave (ACM)
    In Proceedings of the 2011 ACM workshop on Gateway computing environments (GCE '11). ACM, New York, NY, USA, 29-36
  12. Zhenhua Guo, Raminderjeet Singh, Marlon Pierce, Yan Liu
    Investigating the Use of Gadgets, Widgets, and OpenSocial to Build Science Gateways (ACM, IEEE)
    The 7th IEEE International Conference on e-Science (eScience'11)
  13. Zhenhua Guo and Marlon Pierce
    Lightweight OGCE Gadget Portal for Science Gateways
    TeraGrid'10 Student research poster
  14. Zhenhua Guo, R. Singh, and Marlon Pierce
    Building the PolarGrid Portal Using Web 2.0 and OpenSocial (ACM)
    The 5th Gateway Computing Environments workshop (GCE'09). (PDF)
  15. Marlon Pierce, Xiaoming Gao, Sangmi Pallickara, Zhenhua Guo, Geoffrey Fox
    QuakeSim Portal and Services: New Approaches to Science Gateway Development Techniques (ACM, CCPE)
    Concurrency and Computation: Practice and Experience Special Issue on Computation and Informatics in Earthquake Science: The ACES Perspective 6th ACES International workshop Cairns, Australia 11 - 16 May 2008 (PDF)
  16. Gregor von Laszewski, Fugang Wang, Andrew Younge, Xi He, Zhenhua Guo, Marlon Pierce
    Cyberaide JavaScript: A JavaScript Commodity Grid Kit (IEEE)
    The 4th Gateway Computing Environments workshop (GCE'08)
  17. Marlon Pierce, Geoffrey Fox, Jong Choi, Zhenhua Guo, Xiaoming Gao, and Yu Ma
    Using Web 2.0 for Scientific Applications and Scientific Communities (CCPE)
    Concurrency and Computation: Practice and Experience Special Issue for 3rd International Conference on Semantics, Knowledge and Grid SKG2007 Xian China, October 28-30 2007 (PDF)

Publication Review (Recent)

Technical Reports

Talks (incomplete)

Selected Services and Activities