# Distributed Systems

## Common pattern in distributed systems research papers

1. Define the problem
2. Implement the solution
3. Test the solution

## Challenges in this field

- The overhead
  - Cold start of containers
    - Solution: bypass the container and use a sandbox instead
  - Initialization of workers/executors
- Better [[scheduling]]
- Exploiting data locality

## Two types of papers

Each year, thousands of papers on distributed systems are published. They fall into two categories:

### Prototyping

Most papers are in this category. It's okay if nobody is using the system. The key questions are:

- Is your problem really a problem?
- Do your ideas work out?

### Production

Only a few papers are in this category. Some projects write a paper only after they have gained a solid user base. These papers usually get accepted "automatically", as their value has already been well proven.

Note that these production systems are extremely challenging to [[maintenance|maintain]]!

- [[parsl|Parsl]] employed full-time developers to implement the system.
- [[spark|Apache Spark]] started as a prototyping project, and its paper was rejected at first. The team then employed 3 engineers to refactor the whole system and make it production-ready.
- [[cctools|ND CCTools]] has senior software engineer Ben Tovar to help maintain the project.
- [[tensorflow|TensorFlow]] had 300 software engineers working on it for 2 years.

Funding matters! No funding, no engineers.