Open Source as an Engineering Strategy

Over the past few years, our research team at PLAYERUNKNOWN Productions built a range of internal machine learning systems, datasets, and research projects. Like many engineering teams, most of that work stayed internal. It worked, but it was fragmented, difficult to maintain, and largely inaccessible outside the team.

At the same time, we started running into recurring issues. Systems were hard to reproduce, onboarding new contributors required significant context, and knowledge was effectively locked inside the organization. Valuable work existed, but it was not easy to reuse or extend.

This led us to a simple question:

What if we treated research and machine learning systems the same way we treat production software: structured, versioned, reproducible, and designed to be shared?

Exploring that question pushed us toward open source, not as a final step after a project is done, but as a way to improve how we design, build, and maintain systems from the beginning.

This article is the first in a series about moving from internal research and engineering projects to open source machine learning systems. Here, we focus on the strategic layer: why open source matters, what changes when you take it seriously, and how to approach it in a structured way.

Author: Hirad Emamialagha

Rethinking Open Source

Our initial assumption was simple: open source means publishing code on GitHub with a README.

That assumption did not hold for long.

Well-structured open source projects, especially in machine learning, are more than a repository. In practice, they rarely live in a single repository at all; they form small ecosystems of related components.

These systems typically include:

  • Code

  • Model weights

  • Datasets or data pipelines

  • Documentation

  • Research papers or reports

  • Examples and tutorials

  • Tooling and CI/CD

  • Community and governance

Releasing only code is rarely enough. Without context, reproducibility, and usable artifacts, most projects are difficult to adopt and easy to abandon.

A more useful definition is:

Open source is not just sharing code. It is packaging a system so others can understand it, run it, and build on top of it.

Open Source Improves Internal Engineering

One of the most important insights we had was that open source is not only an external activity. It has strong internal effects.

Preparing a project for open source is one of the fastest ways to expose engineering debt.

It forces you to:

  • Remove hidden dependencies

  • Eliminate environment assumptions

  • Clean up repository structure

  • Write real documentation

  • Make the system reproducible from scratch

In practice, this often improves the internal quality of the project significantly, regardless of whether it is ever released publicly.

This leads to a shift in mindset:

Projects should be built as if they could be open sourced later, even if that decision has not been made yet.

That single constraint changes how you design systems.
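As a small illustration of eliminating environment assumptions, the sketch below resolves a data directory from an environment variable and otherwise falls back to a path relative to the repository, instead of a hardcoded, machine-specific location. The variable name `PROJECT_DATA_DIR` and the `data/` folder are hypothetical conventions, not something the projects described here necessarily use.

```python
import os
from pathlib import Path

# Hypothetical variable name; pick one that is specific to your project.
DATA_DIR_ENV = "PROJECT_DATA_DIR"


def resolve_data_dir(repo_root, env=None):
    """Locate the data directory without assuming a particular machine.

    1. If the environment variable is set, use it (users can point anywhere).
    2. Otherwise fall back to `data/` inside the repository, never to an
       absolute path that only exists on one developer's workstation.
    """
    env = os.environ if env is None else env
    override = env.get(DATA_DIR_ENV)
    if override:
        return Path(override).expanduser()
    return Path(repo_root) / "data"
```

In a script, a call such as `resolve_data_dir(Path(__file__).resolve().parent)` keeps the default next to the code, while users on other machines can point `PROJECT_DATA_DIR` wherever their data lives.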

Open Source as a Long-Term Strategy

From an engineering perspective, open source is not a one-time release. It is a long-term strategy that operates across multiple dimensions.

For engineers, open source becomes a portfolio of real work:

  • Code quality

  • System design

  • Documentation

  • Engineering practices

Unlike a resume, it shows how you actually build systems.

For organizations, open source contributes to:

  • Technical reputation

  • Hiring and talent attraction

  • Collaboration opportunities

  • Visibility in specific domains

For the broader ecosystem, it enables:

  • Reproducibility

  • Shared tooling

  • Accelerated research and development

Open source sits at the intersection of all three.

Before You Open Source Anything

Not every project should be open sourced, and not every project is ready.

Over time, we found that most issues fall into a small number of categories: legal and licensing, maintenance commitment, and repository quality. To keep things simple, we distilled them into a checklist that we run before any release.

1. Legal and Licensing

You need to ensure that:

  • No proprietary components are exposed

  • No internal infrastructure is referenced

  • No restricted data is included

You also need to choose a license that defines how others can use the project.
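Parts of this review can be automated. The sketch below is a deliberately small line-by-line scan for obvious leaks such as cloud credentials, private keys, and references to internal hosts. The patterns are illustrative assumptions (the `corp.example.com` domain stands in for an internal domain), and a scan like this complements, rather than replaces, a dedicated secret scanner and a manual review of data and licensing.

```python
import re
from pathlib import Path

# Illustrative patterns only; tailor the list to your own infrastructure.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Hypothetical internal domain; replace with the one your organization uses.
    "internal_hostname": re.compile(r"\b[\w.-]+\.corp\.example\.com\b"),
}

TEXT_SUFFIXES = (".py", ".yaml", ".yml", ".json", ".md", ".cfg", ".toml", ".txt")


def scan_text(text):
    """Return (line_number, pattern_name) pairs for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_repo(root, suffixes=TEXT_SUFFIXES):
    """Scan text files under `root`; map each file path to its findings."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report
```

Running `scan_repo(".")` from the repository root returns a dictionary from file path to findings; an empty dictionary only means these particular patterns did not match. The scan also covers only the current files: anything committed earlier is still present in the Git history.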

2. Maintenance Commitment

Open sourcing a project creates expectations.

Even small projects can generate:

  • Issues

  • Questions

  • Feature requests

  • Contributions

Open source is not just publishing. It is committing to a certain level of ongoing maintenance and communication.

3. Repository Quality

Internal repositories often contain:

  • Experimental code

  • Hardcoded paths

  • Implicit assumptions

  • Missing documentation

External users have none of the internal context.

If someone cannot clone your repository and run it on a clean machine, it is not ready.

Open Source Readiness Checklist

Before making a project public, we now use a simple checklist:

  • No secrets or internal references

  • License selected and added

  • Installation works on a clean environment

  • Documentation is complete and clear

  • Examples or notebooks are provided

  • Basic validation or tests exist

  • Contribution guidelines are defined

If these are not satisfied, the project is not ready, regardless of how good the code is.
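Several items on this list are structural and easy to check automatically; others, such as whether the documentation is clear or whether installation works on a clean environment, need human review or a CI job on a fresh machine. A minimal sketch of the structural part follows. It assumes conventional names (`LICENSE*`, `README*`, `CONTRIBUTING*` at the top level, plus `tests/` and `examples/` or `notebooks/` directories), which are assumptions rather than fixed rules.

```python
from pathlib import PurePosixPath


def check_readiness(files):
    """Map each automatable checklist item to True or False.

    `files` is an iterable of repository-relative POSIX paths,
    for example the output of `git ls-files`.
    """
    paths = [PurePosixPath(f) for f in files]
    top_level_files = {p.name.lower() for p in paths if len(p.parts) == 1}
    top_level_dirs = {p.parts[0].lower() for p in paths if len(p.parts) > 1}
    return {
        "license": any(n.startswith("license") for n in top_level_files),
        "readme": any(n.startswith("readme") for n in top_level_files),
        "contributing": any(n.startswith("contributing") for n in top_level_files),
        "tests": "tests" in top_level_dirs,
        "examples": bool({"examples", "notebooks"} & top_level_dirs),
    }


def missing_items(files):
    """List the checklist items that are not yet satisfied."""
    return [item for item, ok in check_readiness(files).items() if not ok]
```

The input can come from `git ls-files`, which lists tracked files relative to the repository root. Checking tracked files rather than the working directory avoids counting local files that would never be published.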

The Four Pillars of Open Source ML

A useful way to think about open sourcing machine learning projects is through four core pillars:

  1. Code: training, evaluation, inference

  2. Models: weights, checkpoints, configurations

  3. Datasets: the data itself or reproducible data pipelines

  4. Research and Documentation: papers, reports, explanations

Most projects only release code. That is usually insufficient.

The most useful and widely adopted projects combine at least:

  • Code

  • Model artifacts

  • Documentation

  • Examples

Without these, reproducibility breaks down.
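One way to make this concrete is to describe each release as a manifest grouped by pillar and flag anything missing from the recommended minimum of code, model artifacts, documentation, and examples. The `ReleaseManifest` class and its field names are a hypothetical sketch, not a standard format.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseManifest:
    """Hypothetical description of what a release ships, grouped by pillar."""

    code: list[str] = field(default_factory=list)      # training, evaluation, inference
    models: list[str] = field(default_factory=list)    # weights, checkpoints, configs
    data: list[str] = field(default_factory=list)      # datasets or pipeline scripts
    docs: list[str] = field(default_factory=list)      # papers, reports, explanations
    examples: list[str] = field(default_factory=list)  # notebooks, tutorials

    # Plain class attribute (no annotation), so it is not a dataclass field.
    REQUIRED = ("code", "models", "docs", "examples")

    def gaps(self):
        """Return the required parts of the release that are still empty."""
        return [name for name in self.REQUIRED if not getattr(self, name)]
```

A release with only code, `ReleaseManifest(code=["train.py"])`, reports `['models', 'docs', 'examples']` as gaps, which mirrors the point that code alone is usually insufficient.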

What We Tested in Practice

We started small by publishing internal tools and utilities rather than full research systems. This reduced risk, shortened feedback loops, and helped us build internal experience with:

  • Repository preparation

  • Documentation standards

  • Licensing decisions

  • CI/CD for public projects

We also experimented with publishing models through platforms like Hugging Face. This introduced a different set of requirements:

  • Model cards

  • Usage examples

  • Configuration clarity

Releasing models is not the same as releasing code.
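On Hugging Face, a model card is the model repository's `README.md`, typically a YAML metadata header followed by Markdown prose. The helper below is a hypothetical sketch that renders a minimal card covering the three requirements above (a description, a usage example, and explicit configuration). Its section headings and parameter names are assumptions; Hugging Face's own model card template covers considerably more, such as intended uses, limitations, and training details.

```python
FENCE = "`" * 3  # built programmatically so the example is easy to embed


def render_model_card(name, license_id, summary, usage_snippet, config, tags=()):
    """Render a minimal model card: YAML metadata header plus Markdown body."""
    lines = ["---", f"license: {license_id}"]
    if tags:
        lines.append("tags:")
        lines.extend(f"- {tag}" for tag in tags)
    lines += ["---", "", f"# {name}", "", summary, ""]
    # A copy-pasteable usage example, which is what most users look for first.
    lines += ["## Usage", "", FENCE + "python", usage_snippet, FENCE, ""]
    # Explicit configuration so nobody has to reverse-engineer hyperparameters.
    lines += ["## Configuration", ""]
    lines.extend(f"- `{key}`: {value}" for key, value in sorted(config.items()))
    return "\n".join(lines) + "\n"
```

Writing the returned string to `README.md` in the model repository lets the Hub read the metadata header; for richer cards, the `huggingface_hub` library also provides a `ModelCard` class.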

In parallel, we explored reproducible research workflows using version-controlled LaTeX projects (Research-PaperOps). Treating research papers like software artifacts, with versioning, CI builds, and structured repositories, proved to be a powerful approach.

Open source, in this sense, became part of a broader effort toward reproducible engineering and research.

Challenges We Encountered

The process was not trivial.

One of the biggest challenges was uncovering hidden assumptions. Many internal systems depended on:

  • Specific directory structures

  • Internal services

  • Undocumented environment setups

Making these systems runnable in a clean environment required significant effort.
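One way to surface such assumptions is to check prerequisites up front and report all of them at once, rather than letting the code fail deep inside a training run. The sketch below assumes prerequisites can be expressed as environment variables and directories; the names used with it are hypothetical.

```python
import os
from pathlib import Path


def check_environment(required_vars=(), required_dirs=(), env=None):
    """Collect every missing prerequisite instead of stopping at the first."""
    env = os.environ if env is None else env
    problems = []
    for var in required_vars:
        if not env.get(var):
            problems.append(f"environment variable {var} is not set")
    for directory in required_dirs:
        if not Path(directory).is_dir():
            problems.append(f"directory {directory} does not exist")
    return problems


def require_environment(required_vars=(), required_dirs=(), env=None):
    """Raise one readable error listing everything that needs fixing."""
    problems = check_environment(required_vars, required_dirs, env)
    if problems:
        details = "\n".join(f"  - {p}" for p in problems)
        raise RuntimeError(f"Environment is not ready:\n{details}")
```

Calling, for example, `require_environment(required_vars=("MODEL_CACHE",), required_dirs=("data",))` at the top of a training script turns an undocumented setup into an explicit, testable contract; the same list of prerequisites also belongs in the installation documentation.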

Documentation was another major challenge.

Internally, many things feel obvious. Externally, nothing is.

Writing clear documentation often required as much effort as implementing parts of the system itself.

We also had to rethink how we structure repositories:

  • What belongs together

  • What should be separated

  • How to organize multi-repository systems

These challenges were not drawbacks. They were signals. In most cases, addressing them improved both external usability and internal quality.

Strategic Considerations

Open source is first and foremost a strategic decision, and it is not always the right one. Treating it as a default can be as problematic as ignoring it completely.

Releasing a project can expose:

  • Architectural decisions

  • System limitations

  • Areas still under development

In some cases, this transparency is beneficial. In others, it may conflict with business or competitive considerations.

The decision to open source should balance:

  • Engineering benefits

  • Organizational goals

  • Long-term maintenance capacity

Key Takeaways

Open source should be treated as an engineering and research strategy, not a place to upload code.

Preparing a project for open source improves its structure, documentation, and reproducibility. The most useful projects go beyond code and include models, datasets, and clear documentation. And open source is not a one-time action. It is an ongoing commitment.

A simple test is this:

If someone outside your organization can understand, run, and extend your project without additional explanation, you are not just ready to open source it; you have already engineered it properly.

What Comes Next

In the next article, we will focus on the practical side:

  • Repository structure

  • Licensing decisions

  • Documentation standards

  • CI/CD pipelines for research and ML projects

References and Resources