Oss | Mark Chmarny

DevPulse: Community Health Analytics for the Rest of Us

If you maintain an open source project, you’ve probably wondered: who’s actually contributing? Are we retaining contributors or burning through a revolving door? Is one person carrying the whole project? These questions matter, and commercial platforms answer them well. They do come with trade-offs, though. Hosted analytics means sending contributor data from potentially private repos to a third party. Pricing models often assume enterprise budgets. And for a maintainer running a project with a handful of repos and no funding, the friction of setting up yet another SaaS integration is enough to never bother. ...

AI Cluster Runtime: Reproducible Configs for GPU-Accelerated Kubernetes Clusters

GPU Kubernetes is hard. Aligning kernels, drivers, container runtimes, operators, and Kubernetes versions is a version compatibility minefield. A single misconfigured component can take down an entire GPU fleet, and root cause analysis can take days. Typically, these known-good configurations live as tribal knowledge in “runbooks” and internal pipelines, not as portable, reproducible artifacts. ...

Reputation scoring for open source contributors: what reputer measures and why

Every dependency you install, every pull request you merge, carries an implicit trust decision. You trust that the person behind the commit is who they claim to be, that their account hasn’t been compromised, and that their contribution is genuine. Most of the time, that trust is warranted. But supply chain attacks like the xz utils backdoor remind us that trust without verification is a vulnerability. ...

Complexity can be learned but abstractions come at a long-term cost

All complexity needs to be abstracted, right? This reductionist statement misses the nuance around the inherent cost/benefit tradeoffs, especially when you consider them over time. Don’t get me wrong, there often are good reasons for additional layers that make things simpler (growing adoption, lowering toil, removing friction, etc.). Still, these layers come at a long-term cost that often is not part of the evaluation process. ...

How to debug container image content

When dealing with file permissions in a non-root image or building apps that include static content (like CSS or templates), I sometimes get an error because the final image content does not match my expectations. Most of the time the errors are pretty obvious, and a simple fix and rebuild will do. Sometimes though, you want to take a look into the image and understand what the actual layout looks like in there. ...

Knative momentum continues…

I wrote a new post on the Google blog about the momentum behind the Knative project and how the community reached another adoption milestone, doubling the number of its contributors. Another data point underscoring the Knative momentum is the month-over-month contributions, which have increased over 45% since the 0.1 release and now represent more than a dozen different companies. ...

Build and manage modern serverless workloads using Knative on Kubernetes

By now, Kubernetes should be the default target for your deployments. Yes, there are still use-cases where Kubernetes is not the optimal choice, but these represent an ever smaller number of modern workloads. The main value of Kubernetes is that it abstracts away much of the infrastructure management pain. The broad support amongst virtually all major Cloud Service Providers (CSPs) also means that your workloads are portable. Combined with the already vibrant ecosystem of Kubernetes-related tools, this means that the experience of the operator, the person responsible for managing Kubernetes, is now pretty smooth. ...

Service, not Volume - data explosion and how to amplify its value

Data is growing at an exponential pace. Based on recent numbers from IDC, the total amount of data in 2015 (4.4ZB) will grow to 44ZB in 2020. Frankly, the actual size of a zettabyte is almost inconsequential. What is shocking is that all of the data generated since the beginning of time (at least the electronic part) will grow 10x in just the next four years! ...

HDFS has won, now de facto standard for centralized data storage

The “high-priests” of Big Data have spoken. Hadoop Distributed File System (HDFS) is now the de facto standard platform for data storage. You may have heard this “heresy” uttered before. But, for me, it wasn’t until the recent Strata conference that I began to really understand how prevalent this opinion actually is. ...

Data-related investments shift from tech to skills — talent new differentiator

Over the last decade, access to best-of-breed data technologies has become easier. This is due mainly to the increasing popularity of open source software (OSS). While this phenomenon holds true in other areas like operating systems, application servers, development frameworks or even monitoring tools, it is perhaps most prevalent in the area of data. ...