osdi 2021 accepted papers

The main contribution of this paper is GoJournal, a verified, concurrent journaling system that provides atomicity for storage applications, together with Perennial 2.0, a framework for formally specifying and verifying concurrent crash-safe systems. See www.cs.cmu.edu/~mmv/Veloso.html for her scientific publications. The paper review process is double-blind. The overhead of GPT is 5% for memory-intensive workloads (e.g., Redis) and negligible for CPU-intensive workloads (e.g., RV8 and Coremarks). At a high level, Addra follows a template in which callers and callees deposit and retrieve messages from private mailboxes hosted at an untrusted server. We implement and evaluate a suite of applications, including MICA, Raft and Set Algebra for document retrieval; and we demonstrate that the nanoPU can be used as a high performance, programmable alternative for one-sided RDMA operations. Instead of choosing among a small number of known algorithms, our approach searches in a "policy space" of fine-grained actions, resulting in novel algorithms that can outperform existing algorithms by specializing to a given workload. Camera-ready submission (all accepted papers): 2 April 2021; Main conference program: 27-28 April 2021; All deadline times are . We propose a new framework for computing the embeddings of large-scale graphs on a single machine. People often assume that blockchain has Byzantine robustness, so adding it to any system will make that system super robust against any calamity. Our further evaluation on 38 CVEs from 10 commonly-used programs shows that SanRazor reduced checks suffice to detect at least 33 out of the 38 CVEs. Furthermore, by combining SanRazor with an existing sanitizer reduction tool ASAP, we show synergistic effect by reducing the runtime cost to only 7.0% with a reasonable tradeoff of security. Our evaluation on the SPEC benchmarks shows that SanRazor can reduce the overhead of sanitizers significantly, from 73.8% to 28.062.0% for AddressSanitizer, and from 160.1% to 36.6124.4% for UndefinedBehaviorSanitizer (depending on the applied reduction scheme). Professor Veloso is on leave from Carnegie Mellon University as the Herbert A. Simon University Professor in the School of Computer Science, and the past Head of the Machine Learning Department. Thanks to selective profiling, DMons profiling overhead is 1.36% on average, making it feasible for production use. Zeph executes privacy-adhering data transformations in real-time and scales to thousands of data sources, allowing it to support large-scale low-latency data stream analytics. Moreover, as of October 2020, a review of the 50 most cited empirical papers that list personality as a keyword indicates that all 50 papers were authored by people with insti tutional affiliations in the United States, Canada, Germany, the UK, and New Zealand, and only three papers included samples outside of these regions (see Supplementary Precision Conservation: Linking Set-aside and Working Lands Policy PET then automatically corrects results to restore full equivalence. Federated Learning (FL) is an emerging direction in distributed machine learning (ML) that enables in-situ model training and testing on edge data. Overall, the OSDI PC accepted 31 out of 165 submissions. Paper Submission Information All submissions must be received by 11:59 PM AoE (UTC-12) on the day of the corresponding deadline. The 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21) will take place as a virtual event on July 1416, 2021. One important reason for the high cost is, as we observe in this paper, that many sanitizer checks are redundant the same safety property is repeatedly checked leading to unnecessarily wasted computing resources. This is unfortunate because good OS design has always been driven by the underlying hardware, and right now that hardware is almost unrecognizable from ten years ago, let alone from the 1960s when Unix was written. The key insight in blk-switch is that Linux's multi-queue storage design, along with multi-queue network and storage hardware, makes the storage stack conceptually similar to a network switch. Ethereum is the second-largest blockchain platform next to Bitcoin. Paper abstracts and proceedings front matter are available to everyone now. If you are uncertain about how to anonymize your submission, please contact the program co-chairs, osdi21chairs@usenix.org, well in advance of the submission deadline. We built an FPGA prototype of the nanoPU fast path by modifying an open-source RISC-V CPU, and evaluated its performance using cycle-accurate simulations on AWS FPGAs. First, it enables a caller to push a message to a callee in two hops, using a new way of assigning mailboxes to users that resembles how a post office assigns PO boxes to its customers. If you have any questions about conflicts, please contact the program co-chairs. Research Impact Score 9.24. . JEL codes: Q18, Q28, Q57 . Professor Veloso is the Past President of AAAI (the Association for the Advancement of Artificial Intelligence), and the co-founder, Trustee, and Past President of RoboCup. If the conference registration fee will pose a hardship for the presenter of the accepted paper, please contact conference@usenix.org. As a member of ACCT, I have served two years on the bylaws and governance committee and two years on the finance and audit committee. Further, Vegito can recover from cascading machine failures by using the columnar backup in less than 60 ms. Existing systems that hide voice call metadata either require trusted intermediaries in the network or scale to only tens of users. Widely used log-search tools like Elasticsearch and Splunk Enterprise index the logs to provide fast search performance, yet the size of the index is within the same order of magnitude as the raw log size. Fortunately, we observe that the backups for high availability in modern distributed OLTP systems can be retrofitted to bridge the analytical queries and transactions in HTAP workloads. (Registered attendees: Sign in to your USENIX account to download these files. Leveraging these information, Pollux dynamically (re-)assigns resources to improve cluster-wide goodput, while respecting fairness and continually optimizing each DL job to better utilize those resources. This budget is a scarce resource that must be carefully managed to maximize the number of successfully trained models. HotNets 2021: Call for Papers - sigcomm Second, Fluffy uses multiple existing Ethereum clients that independently implement the specification as cross-referencing oracles. Samantha Vaive - Member Board Of Trustees - Lansing Community College Some recent schedulers choose job resources for users, but do so without awareness of how DL training can be re-optimized to better utilize the provided resources. Extensive experiments show that GNNAdvisor outperforms the state-of-the-art GNN computing frameworks, such as Deep Graph Library (3.02 faster on average) and NeuGraph (up to 4.10 faster), on mainstream GNN architectures across various datasets. Pages should be numbered, and figures and tables should be legible in black and white, without requiring magnification. In some cases, the quality of these artifacts is as important as that of the document itself. For instance, FAST 21 and NSDI 21 have author-notification dates after the OSDI 21 abstract-registration deadline. We also welcome work that explores the interface to related areas such as computer architecture, networking, programming languages, analytics, and databases. As increasingly more sensitive data is being collected to gain valuable insights, the need to natively integrate privacy controls in data analytics frameworks is growing in importance. Session Chairs: Dushyanth Narayanan, Microsoft Research, and Gala Yadgar, TechnionIsrael Institute of Technology, Jinhyung Koo, Junsu Im, Jooyoung Song, and Juhyung Park, DGIST; Eunji Lee, Soongsil University; Bryan S. Kim, Syracuse University; Sungjin Lee, DGIST. PET discovers and applies program transformations that improve computation efficiency but only maintain partial functional equivalence. Existing algorithms are designed to work well for certain workloads. Weak Links in Authentication Chains: A Large-scale Analysis of Email Sender Spoofing Attacks Authors may submit a response to those reviews until Friday, March 5, 2021. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning, Oort: Efficient Federated Learning via Guided Participant Selection, PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections, Modernizing File System through In-Storage Indexing, Nap: A Black-Box Approach to NUMA-Aware Persistent Memory Indexes, Rearchitecting Linux Storage Stack for s Latency and High Throughput, Optimizing Storage Performance with Calibrated Interrupts, ZNS+: Advanced Zoned Namespace Interface for Supporting In-Storage Zone Compaction, DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling, CLP: Efficient and Scalable Search on Compressed Text Logs, Polyjuice: High-Performance Transactions via Learned Concurrency Control, Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing, The nanoPU: A Nanosecond Network Stack for Datacenters, Beyond malloc efficiency to fleet efficiency: a hugepage-aware memory allocator, Scalable Memory Protection in the PENGLAI Enclave, NrOS: Effective Replication and Sharing in an Operating System, Addra: Metadata-private voice communication over fully untrusted infrastructure, Bringing Decentralized Search to Decentralized Services, Finding Consensus Bugs in Ethereum via Multi-transaction Differential Fuzzing, MAGE: Nearly Zero-Cost Virtual Memory for Secure Computation, Zeph: Cryptographic Enforcement of End-to-End Data Privacy, It's Time for Operating Systems to Rediscover Hardware, DistAI: Data-Driven Automated Invariant Learning for Distributed Protocols, GoJournal: a verified, concurrent, crash-safe journaling system, STORM: Refinement Types for Secure Web Applications, Horcrux: Automatic JavaScript Parallelism for Resource-Efficient Web Computation, SANRAZOR: Reducing Redundant Sanitizer Checks in C/C++ Programs, Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads, GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs, Marius: Learning Massive Graph Embeddings on a Single Machine, P3: Distributed Deep Graph Learning at Scale. Call for Papers - EuroSys 2022 An evaluation of Addra on a cluster of 80 machines on AWS demonstrates that it can serve 32K users with a 99-th percentile message latency of 726 msa 7 improvement over a prior system for text messaging in the same threat model. She also invented the spanning tree algorithm, which transformed Ethernet from a technology that supported a few hundred nodes, to something that can support large networks. A.H. Hunter, Jane Street Capital; Chris Kennelly, Paul Turner, Darryl Gove, Tipp Moseley, and Parthasarathy Ranganathan, Google. See the USENIX Conference Submissions Policy for details. P3 exposes a simple API that captures many different classes of GNN architectures for generality. The device then "calibrates" its interrupts to completions of latency-sensitive requests. SC is being increasingly adopted by industry for a variety of applications. By monitoring the status of each job during training, Pollux models how their goodput (a novel metric we introduce that combines system throughput with statistical efficiency) would change by adding or removing resources. She also has made contributions in network security, including scalable data expiration, distributed algorithms despite malicious participants, and DDOS prevention techniques. SOSP Conference - Home - ACM Digital Library Memory allocation represents significant compute cost at the warehouse scale and its optimization can yield considerable cost savings. Welcome to the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21) submissions site. Third, GNNAdvisor capitalizes on the GPU memory hierarchy for acceleration by gracefully coordinating the execution of GNNs according to the characteristics of the GPU memory structure and GNN workloads. AI enables principled representation of knowledge, complex strategy optimization, learning from data, and support to human decision making. We focus on NVMe storage devices and show that it is natural to express these semantics in the kernel and the application and only requires a modest two-bit change to the device interface. For realistic workloads, KEVIN improves throughput by 68% on average. In particular, responses must not include new experiments or data, describe additional work completed since submission, or promise additional work to follow. Responses should be limited to clarifying the submitted work. All deadline times are 23:59 hrs UTC. The co-chairs may then share that paper with the workshops organizers and discuss it with them. We propose Marius, a system for efficient training of graph embeddings that leverages partition caching and buffer-aware data orderings to minimize disk access and interleaves data movement with computation to maximize utilization. In this paper, we propose Oort to improve the performance of federated training and testing with guided participant selection. They collectively make the backup fresh, columnar, and fault-tolerant, even facing millions of concurrent transactions per second. This approach misses possible optimization opportunities as transformations that only preserve equivalence on subsets of the output tensors are excluded. This paper demonstrates that it is possible to achieve s-scale latency using Linux kernel storage stack, even when tens of latency-sensitive applications compete for host resources with throughput-bound applications that perform read/write operations at throughput close to hardware capacity. We also verified a simple NFS server using GoJournals specs, which confirms that they are helpful for application verification: a significant part of the proof doesnt have to consider concurrency and crashes. We present Nap, a black-box approach that converts concurrent persistent memory (PM) indexes into NUMA-aware counterparts. Pollux promotes fairness among DL jobs competing for resources based on a more meaningful measure of useful job progress, and reveals a new opportunity for reducing DL cost in cloud environments. Uniquely, Dorylus can take advantage of serverless computing to increase scalability at a low cost. Existing frameworks optimize tensor programs by applying fully equivalent transformations, which maintain equivalence on every element of output tensors. Manuela M. Veloso is the Head of J.P. Morgan AI Research, which pursues fundamental research in areas of core relevance to financial services, including data mining and cryptography, machine learning, explainability, and human-AI interaction. First, Fluffy mutates and executes multi-transaction test cases to find consensus bugs which cannot be found using existing fuzzers for Ethereum. The biennial ACM Symposium on Operating Systems Principles is the world's premier forum for researchers, developers, programmers, and teachers of computer systems technology. My paper has accepted to appear in the EuroSys2020; I will have a talk at the Hotstorage'19; The Paper about GCMA Accepted to TC; The ZNS+ also allows each zone to be overwritten with sparse sequential write requests, which enables the LFS to use threaded logging-based block reclamation instead of segment compaction. Tao Luo, Mingen Pan, Pierre Tholoniat, Asaf Cidon, and Roxana Geambasu, Columbia University; Mathias Lcuyer, Microsoft Research. Abstract registrations that do not provide sufficient information to understand the topic and contribution (e.g., empty abstracts, placeholder abstracts, or trivial abstracts) will be rejected, thereby precluding paper submission. There are two major GNN training obstacles: 1) it relies on high-end servers with many GPUs which are expensive to purchase and maintain, and 2) limited memory on GPUs cannot scale to today's billion-edge graphs. Reviews will be available for response on Wednesday, March 3, 2021. Jiang Zhang, University of Southern California; Shuai Wang, HKUST; Manuel Rigger, Pinjia He, and Zhendong Su, ETH Zurich. We implement a variant of a log-structured merge tree in the storage device that not only indexes file objects, but also supports transactions and manages physical storage space. News Baris Kasikci's Home Page - Electrical Engineering and Computer Her robot soccer teams have been RoboCup world champions several times, and the CoBot mobile robots have autonomously navigated for more than 1,000km in university buildings. In this paper, we propose a software-hardware co-design to support dynamic, fine-grained, large-scale secure memory as well as fast-initialization. USENIX NSDI, 2021 Acceptance Rate: 15.99% Fluid: Resource-Aware Hyperparameter Tuning Engine P. Yu*, J. Liu*, M. Chowdhury (*Equal contribution) MLSys, 2021 Acceptance Rate: 23.53% NetLock: Fast, Centralized Lock Management Using Programmable Switches Z. Yu, Y. Zhang, V. Braverman, M. Chowdhury, X. Jin ACM SIGCOMM, 2020 Acceptance Rate: 21.6% Unfortunately, because devices lack the semantic information about which I/O requests are latency-sensitive, these heuristics can sometimes lead to disastrous results. In 2023 I started another two-year term on the . Of the 26 submitted artifacts: 26 artifacts received the Artifacts Available badge (100%). Used Zotero to organize papers about the stress and diffusion between anode and electrolyte and made a summary . Mingyu Li, Jinhao Zhu, and Tianxu Zhang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Cheng Tan, Northeastern University; Yubin Xia, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Sebastian Angel, University of Pennsylvania; Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China. To resolve the problem, we propose a new LFS-aware ZNS interface, called ZNS+, and its implementation, where the host can offload data copy operations to the SSD to accelerate segment compaction. HotCRP.com signin Sign in using your HotCRP.com account. Tej Chajed, MIT CSAIL; Joseph Tassarotti, Boston College; Mark Theng, MIT CSAIL; Ralf Jung, MPI-SWS; M. Frans Kaashoek and Nickolai Zeldovich, MIT CSAIL. We also propose two file system techniques for ZNS+-aware LFS. To enable FL developers to interpret their results in model testing, Oort enforces their requirements on the distribution of participant data while improving the duration of federated testing by cherry-picking clients. Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding, University of California, Santa Barbara. Authors must limit their responses to (a) correcting factual errors in the reviews or (b) directly addressing questions posed by reviewers. Software Systems Laboratory Wins Best Paper Award at OSDI 2022 As the emerging trend of graph-based deep learning, Graph Neural Networks (GNNs) excel for their capability to generate high-quality node feature vectors (embeddings). Web pages today commonly include large amounts of JavaScript code in order to offer users a dynamic experience. Prepublication versions of the accepted papers from the summer submission deadline are available below. Compared to a state-of-the-art fuzzer, Fluffy improves the fuzzing throughput by 510 and the code coverage by 2.7 with various optimizations: in-process fuzzing, fuzzing harnesses for Ethereum clients, and semantic-aware mutation that reduces erroneous test cases. . The file system performance of the proposed ZNS+ storage system was 1.33--2.91 times better than that of the normal ZNS-based storage system. Mothy joined the Computer Science Department ETH Zurich in January 2007 and was named Fellow of the ACM in 2013 for contributions to operating systems and networking research. Our evaluation shows that DistAI successfully verifies 13 common distributed protocols automatically and outperforms alternative methods both in the number of protocols it verifies and the speed at which it does so, in some cases by more than two orders of magnitude. OSDI brings together professionals from academic and industrial backgrounds in a premier forum for discussing the design, implementation, and implications of systems software. The NAL eliminates remote PM accesses to hot items without inducing extra local PM accesses. While several new GNN architectures have been proposed, the scale of real-world graphsin many cases billions of nodes and edgesposes challenges during model training. First, GNNAdvisor explores and identifies several performance-relevant features from both the GNN model and the input graph, and use them as a new driving force for GNN acceleration. This paper describes the design, implementation, and evaluation of Addra, the first system for voice communication that hides metadata over fully untrusted infrastructure and scales to tens of thousands of users. Commonly used log archival and compression tools like Gzip provide high compression ratio, yet searching archived logs is a slow and painful process as it first requires decompressing the logs. For conference information, see: . Proceedings Front Matter When further combined with a simple caching strategy, our evaluation shows that P3 is able to outperform existing state-of-the-art distributed GNN frameworks by up to 7. We demonstrate that Marius achieves the same level of accuracy but is up to one order of magnitude faster. NrOS is primarily constructed as a simple, sequential kernel with no concurrency, making it easier to develop and reason about its correctness. We develop a prototype of Zeph on Apache Kafka to demonstrate that Zeph can perform large-scale privacy transformations with low overhead. The abstractions we design for the privacy resource mirror those defined by Kubernetes for traditional resources, but there are also major differences. OSDI'20: 14th USENIX Conference on Operating Systems Design and ImplementationNovember 4 - 6, 2020 ISBN: 978-1-939133-19-9 Published: 04 November 2020 Sponsors: ORACLE, VMware, Google Inc., Amazon, Microsoft Get Alerts for this Conference Save to Binder Export Citation Bibliometrics Citation count 96 Downloads (6 weeks) 317 Downloads (12 months) DMons targeted optimizations provide 16.83% speedup on average (up to 53.14%), compared to a baseline that uses the highest level of compiler optimization. Report - Systems Research Artifacts A glance at this year's OSDI program shows that Operating Systems are a small niche topic for this conference, not even meriting their own full session. To this end, we propose GNNAdvisor, an adaptive and efficient runtime system to accelerate various GNN workloads on GPU platforms. The hybrid segment recycling chooses a proper block reclaiming policy between segment compaction and threaded logging based on their costs. Although the number of submissions is lower than the past, it's likely only due to the late announcement; being in my first OSDI PC, I think the quality of the submitted and accepted papers remains as high as ever. Typically, monolithic kernels share state across cores and rely on one-off synchronization patterns that are specialized for each kernel structure or subsystem. Concurrency control algorithms are key determinants of the performance of in-memory databases. See the Preview Session page for an overview of the topics covered in the program. We also show that Marius can scale training to datasets an order of magnitude beyond a single machine's GPU and CPU memory capacity, enabling training of configurations with more than a billion edges and 550 GB of total parameters on a single machine with 16 GB of GPU memory and 64 GB of CPU memory. Although SSDs can be simplified under the current ZNS interface, its counterpart LFS must bear segment compaction overhead. We develop MAGE, an execution engine for SC that efficiently runs SC computations that do not fit in memory. The papers will be available online to everyone beginning on the first day of the conference, July 14, 2021. Because DistAI starts with the strongest possible invariants, if the SMT solver fails, DistAI does not need to discard failed invariants, but knows to monotonically weaken them and try again with the solver, repeating the process until it eventually succeeds. Jaehyun Hwang and Midhul Vuppalapati, Cornell University; Simon Peter, UT Austin; Rachit Agarwal, Cornell University. Papers so short as to be considered extended abstracts will not receive full consideration. PLDI seeks outstanding research that extends and/or applies programming-language concepts to advance the field of computing. Welcome to the 2021 USENIX Annual Technical Conference (ATC '21) submissions site! One classical approach is to increase the efficiency of an allocator to minimize the cycles spent in the allocator code. Last year, 70% of accepted OSDI papers participated in the . We built a functional NFSv3 server, called GoNFS, to use GoJournal. NSDI 2021 - Research.com Sep 2021 - Present 1 year 7 months. We have made Fluffy publicly available at https://github.com/snuspl/fluffy to contribute to the security of Ethereum. For example, talks may be shorter than in prior years, or some parts of the conference may be multi-tracked. We present Storm, a web framework that allows developers to build MVC applications with compile-time enforcement of centrally specified data-dependent security policies. Kernel code requires manual memory management and type-unsafe code and must efficiently handle complex, asynchronous events. This paper presents Zeph, a system that enables users to set privacy preferences on how their data can be shared and processed. Authors are also encouraged to contact the program co-chairs, osdi21chairs@usenix.org, if needed to relate their OSDI submissions to relevant submissions of their own that are simultaneously under review or awaiting publication at other venues. This is especially true for DPF over Rnyi DP, a highly composable form of DP. She developed the technology for making network routing self-stabilizing, largely self-managing, and scalable. Submissions may include as many additional pages as needed for references but not for appendices. Across a wide range of pages, phones, and mobile networks covering web workloads in both developed and emerging regions, Horcrux reduces median browser computation delays by 31-44% and page load times by 18-37%. blk-switch evaluation over a variety of scenarios shows that it consistently achieves s-scale average and tail latency (at both 99th and 99.9th percentiles), while allowing applications to near-perfectly utilize the hardware capacity. To achieve low overhead, selective profiling gathers runtime execution information selectively and incrementally. This motivates the need for a new approach to data privacy that can provide strong assurance and control to users. This year, there were only 2 accepted papers from UK institutes. We propose PET, the first DNN framework that optimizes tensor programs with partially equivalent transformations and automated corrections. The 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21) will take place as a virtual event on July 14-16, 2021. If in doubt about whether your submission to OSDI 2021 and your upcoming submission to SOSP are the same paper or not, please contact the PC chairs by email. Our evaluation shows that NrOS scales to 96 cores with performance that nearly always dominates Linux at scale, in some cases by orders of magnitude, while retaining much of the simplicity of a sequential kernel. The copyback-aware block allocation considers different copy costs at different copy paths within the SSD. USENIX new Date().getFullYear()>document.write(new Date().getFullYear()); Grants for Black Computer Science Students Application, Propose an interesting, compelling solution, Demonstrate the practicality and benefits of the solution, Clearly describe the paper's contributions, Clearly articulate the advances beyond previous work. Youngseok Yang, Seoul National University; Taesoo Kim, Georgia Institute of Technology; Byung-Gon Chun, Seoul National University and FriendliAI. Manuela will present examples and discuss the scope of AI in her research in the finance domain. Questions? Session Chairs: Nadav Amit, VMware Research Group, and Ada Gavrilovska, Georgia Institute of Technology, Stephen Ibanez, Alex Mallery, Serhat Arslan, and Theo Jepsen, Stanford University; Muhammad Shahbaz, Purdue University; Changhoon Kim and Nick McKeown, Stanford University. OSDI - Guide Proceedings Currently, for large graphs, CPU servers offer the best performance-per-dollar over GPU servers.
Shim Sham Step Sheet, Articles O