Research Papers
Storage Systems
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
- The Hadoop Distributed File System
- Megastore: Providing Scalable, Highly Available Storage for Interactive Services
- PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database
- Fast key-value stores: An idea whose time has come and gone
- Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
- Dynamo: Amazon’s Highly Available Key-value Store
- Spanner: Google’s Globally-Distributed Database
- Bigtable: A Distributed Storage System for Structured Data
- IPFS - Content Addressed, Versioned, P2P File System
Computational Frameworks
- MapReduce: Simplified Data Processing on Large Clusters
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- Spark: Cluster Computing with Working Sets
- Spark SQL: Relational Data Processing in Spark
- The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
- Twister: A Runtime for Iterative MapReduce
- WarpFlow: Exploring Petabytes of Space-Time Data
- Big Data normalization for massively parallel processing databases
Cluster Management
- ZooKeeper: Wait-free coordination for Internet-scale systems
- The Chubby lock service for loosely-coupled distributed systems
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
- Large-scale cluster management at Google with Borg
Streaming Data and Data Representation
- Storm @Twitter
- Apache Flink: Stream and Batch Processing in a Single Engine
- Scaling Big Data Mining Infrastructure: The Twitter Experience
- Thrift: Scalable Cross-Language Services Implementation
Algorithms
- Algorithmic Nuggets in Content Delivery
- Cuckoo Filter: Practically Better Than Bloom
- Less Hashing, Same Performance: Building a Better Bloom Filter
- Automatically Generating Interesting Facts from Wikipedia Tables
- The PageRank Citation Ranking: Bringing Order to the Web
- Random Sampling with a Reservoir