Experience
At Facebook, I work on the bottom layer of our distributed systems stack.
- Sharding-as-a-service: We built a holistic sharding-as-a-service solution that provides high
availability, load balancing, fault tolerance, automatic scaling, and manageability to a wide spectrum
of distributed applications. Our system is a core piece of our infrastructure and manages a large
number of services of different types: stateless, Google Spanner-like Paxos-based, caching, batch
processing, interactive, etc. See our SOSP'21 paper
Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications for details.
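The core placement problem such a system solves can be sketched in a few lines. The greedy least-loaded heuristic below, and every name in it, is an illustrative assumption for exposition, not Shard Manager's actual algorithm.

```python
# Toy sketch of shard placement: assign shards to servers so load stays
# balanced and every shard keeps enough replicas. Greedy least-loaded
# placement is illustrative only.
import heapq

def place_shards(shard_loads, servers, replicas=2):
    """shard_loads: dict shard id -> load; servers: list of server ids.
    Returns dict shard id -> list of servers hosting its replicas."""
    heap = [(0.0, s) for s in servers]  # (current load, server id)
    heapq.heapify(heap)
    placement = {}
    # Place heavy shards first so their replicas spread out.
    for shard, load in sorted(shard_loads.items(), key=lambda kv: -kv[1]):
        picked = [heapq.heappop(heap) for _ in range(replicas)]
        placement[shard] = [s for _, s in picked]
        for cur, s in picked:
            heapq.heappush(heap, (cur + load, s))
    return placement
```

Popping the least-loaded servers per shard guarantees a shard's replicas land on distinct machines, which is the fault-tolerance property the prose describes.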
Prior to joining Facebook, I pursued my Ph.D. at Indiana University, where I was a member of the Digital Science Center, supervised by Prof. Geoffrey Fox. My
research was on distributed systems, especially data-parallel systems, and science gateways/portals.
- Data locality and task scheduling in MapReduce:
I thoroughly investigated how to improve multiple critical aspects of MapReduce, such as data locality,
load balancing, resource utilization, and speculative execution. Given the huge volumes of data modern big
data systems process, data locality is crucial. I was among the first to deeply analyze how commonly used
factors such as the replication factor and the number of servers affect data locality. I quantified their
relationship with a mathematical model and proposed an innovative task scheduling algorithm that significantly
improves data locality. The resulting publications are
Investigation of data locality in MapReduce
and
Investigation of data locality and fairness in MapReduce.
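The locality idea can be conveyed with a toy scheduler: prefer giving each idle node a task whose input block it already stores. This greedy version is only an illustrative approximation; the papers' algorithm solves a global assignment problem, and all names here are hypothetical.

```python
# Greedy sketch of locality-aware task scheduling: assign each idle node
# a task whose input block has a local replica; fall back to a remote
# task only when no local one exists.

def schedule(idle_nodes, pending_tasks, block_locations):
    """idle_nodes: list of node ids.
    pending_tasks: dict task id -> input block id.
    block_locations: dict block id -> set of node ids holding a replica.
    Returns dict task id -> node id."""
    assignment = {}
    remaining = dict(pending_tasks)
    for node in idle_nodes:
        # Prefer a data-local task for this node.
        local = [t for t, b in remaining.items() if node in block_locations[b]]
        chosen = local[0] if local else next(iter(remaining), None)
        if chosen is None:
            break
        assignment[chosen] = node
        del remaining[chosen]
    return assignment
```

With a higher replication factor, more tasks have a replica on any given node, so the `local` list is nonempty more often; that is the replication-factor/locality relationship the model quantifies.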
Perceiving that the artificially imposed partitioning of resources into map/reduce slots causes
significant resource underutilization, I proposed multiple innovative improvements that boost efficiency.
In
Automatic task re-organization in MapReduce
, I presented mechanisms to dynamically split and consolidate tasks to cope with load imbalance
and break through the concurrency limit imposed by fixed task granularity. For the single-job case, I
proposed two algorithms, covering the situations where prior knowledge is and is not available. For the
multi-job case, I proposed a modified shortest-job-first strategy, which theoretically minimizes job
turnaround time when combined with task splitting.
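Why shortest-job-first plus task splitting helps turnaround time can be shown with an idealized model: if splitting lets each job use the whole cluster while it runs, jobs finish one after another, and running the shortest first minimizes the average finish time. The function and numbers below are hypothetical illustrations, not the paper's formulation.

```python
# Idealized turnaround model: jobs run shortest-first, each perfectly
# parallelized over all slots (the limit that task splitting approaches).

def sjf_turnaround(job_work, slots):
    """job_work: dict job id -> total work in slot-seconds.
    Returns dict job id -> finish (turnaround) time."""
    finish, clock = {}, 0.0
    for job, work in sorted(job_work.items(), key=lambda kv: kv[1]):
        clock += work / slots  # the job occupies the whole cluster
        finish[job] = clock
    return finish
```

Swapping any two jobs out of shortest-first order delays the shorter job by the longer job's full runtime, which only increases the average turnaround; this is the classic SJF optimality argument.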
In
Improving Resource Utilization in MapReduce
, I proposed resource stealing, which enables running tasks to steal resources reserved for idle slots
and give them back proportionally whenever new tasks are assigned. Resource stealing puts otherwise
wasted resources to full use without interfering with normal job scheduling. I also proposed
Benefit-Aware Speculative Execution (BASE), which evaluates the potential benefit of speculative tasks and
eliminates unnecessary runs. In
Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization
,
I investigated the performance of MapReduce in heterogeneous network environments and proposed a novel
network-heterogeneity-aware scheduling algorithm.
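The proportional give-back in resource stealing can be illustrated with a toy slot-allocation function. This is an assumption-laden sketch of the arithmetic, not the paper's mechanism.

```python
# Toy model of resource stealing: each running task keeps its own slot
# and evenly steals the slots reserved for idle tasks. As new tasks are
# assigned, num_running grows and each task's stolen share shrinks,
# i.e. stolen resources are returned proportionally.

def allocation(total_slots, num_running):
    """Return how many slots one running task uses under even stealing."""
    if num_running == 0:
        return 0
    idle = total_slots - num_running
    if idle <= 0:
        return 1  # fully loaded node: nothing to steal
    return 1 + idle // num_running
```

On a fully loaded node the function degenerates to one slot per task, so stealing never interferes with normal scheduling, matching the claim above.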
- Expanding MapReduce:
Observing the limited environments and application types MapReduce supports, I proposed new architectures
that greatly expand the scenarios where MapReduce can be used. I proposed a new paradigm, Hierarchical MapReduce,
which enables MapReduce to be deployed on top of geographically dispersed compute clusters across research
institutes and universities. The work was published as
A hierarchical framework for cross-domain MapReduce execution
. Some projects in our lab ran data processing pipelines consisting of multiple applications, which
needed to be manually scheduled to run on appropriate platforms, including Hadoop and Twister (iterative
MapReduce). I designed Hybrid MapReduce, which allows users to orchestrate complicated processing
workflows across multiple runtime platforms without worrying about implementation details (e.g.,
copying/transforming data between platforms). The work was published as
HyMR: a Hybrid MapReduce Workflow System.
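The hierarchical pattern (a local MapReduce per cluster, then a global reduce over the per-cluster results) can be sketched with a word-count toy. The functions are illustrative stand-ins, not the framework's API.

```python
# Toy sketch of Hierarchical MapReduce: each cluster runs a local
# MapReduce over its own partition of the input; a global reduce then
# merges the compact per-cluster results, so only small intermediate
# data crosses the wide-area network.
from collections import Counter

def local_mapreduce(lines):
    """Local word count inside one cluster (stand-in for a full job)."""
    return Counter(w for line in lines for w in line.split())

def global_reduce(partials):
    """Merge the per-cluster partial counts at the top level."""
    total = Counter()
    for p in partials:
        total += p
    return total
```

Shipping only the reduced `Counter` per cluster, rather than raw input, is what makes the scheme practical across geographically dispersed sites.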
- Science Gateways:
Besides backend distributed systems, I also worked intensively on revolutionizing science portal/gateway
development. I applied cutting-edge web technologies such as OpenID, OAuth, AJAX, OpenSocial, gadgets, and
widgets to science gateway development, which significantly improved reusability, flexibility, and
agility. I shared my work in the open source project
OGCE (Open Gateway Computing Environments),
which was the largest initiative to innovate the accessibility of large compute clusters and received
millions of dollars in funding from the National Science Foundation. OGCE has been used by researchers from
various fields, including
Geographic Information Systems, atmospheric discovery, earthquake modeling and simulation, macromolecule
data processing, SocialCloud (sustainable resource sharing), bioinformatics data analysis, and TeraGrid
OAuth (cross-platform authorization). A series of papers were published:
Building the PolarGrid Portal Using Web 2.0 and OpenSocial
,
Cyberaide JavaScript: A JavaScript Commodity Grid Kit
,
Investigating the Use of Gadgets, Widgets, and OpenSocial to Build Science Gateways
,
The QuakeSim Portal and Services: New Approaches to Science Gateway Development Techniques
,
Open Community Development for Science Gateways with Apache Rave
, and
Using Web 2.0 for Scientific Applications and Scientific Communities
.