PVCS Tools

TL;DR: I have just created and published an open source library + command line interface (CLI) to extract information and file content from PVCS log files - if this interests you, the code is on sourcehut as usual: https://git.sr.ht/~phlash/pvcstools

So err why?

If you were around for the early days of revision control software on MS-DOS in the late 80s -> early 90s, you might remember the delightful Polytron Version Control System, aka Polytron VCS, aka PVCS.

PVCS was a popular commercial tool that provided similar capabilities to RCS (which was open source and ~1 year older by release date, but very much Unix / Posix based, so not available as a port to MS-DOS).

I used it both in work at BT for my day job, and at home to manage my own stuff. So far, so dull…

At the monthly tech meetup I attend (aka “nerd beers”), a friend reminded me of some toy software we had created that linked message boards together across Netware file servers (the message board package was Threadz Pinboard), to create a message “inter-network”. Later that evening I had a rummage in my archives and found the code - great, I can now pop that on a 3.5" floppy and present my friend with it next month as a surprise / late present from 1992 :)

Now the yak shaving began… the code was managed using PVCS but there were no checked out source files, all I had were the control files, aka log files, in their proprietary binary format. No problem I thought, I’ll go grab an old archive of PVCS stuff from the Inter-webs and extract the source - err nope, it turns out PVCS still exists and is now an expensive (£1500/user) corporate service platform owned / sold by Micro Focus! Surely there is something from the past on archive.org I can use… also nope (but see later). Oh well, it’s a 40-year-old file format, there will be open source tooling for that, right… also nope!

So here I am needing to prank an old friend but without the tools to do it… what’s an old hacker to do?

Reversing the format

I bite and decide to write some tooling that can extract source code from the log files. I have several hundred samples to look at locally, some large and with lots of changes, others smaller and with only one revision in them. Time to dump some hex…

It turns out that PVCS is very similar to RCS: it keeps the latest file content, plus a series of reverse deltas to reconstruct earlier revisions. It also supports symbolic names / labels (we might say tags in a modern VCS). There is a list of revision locks and some control metadata (such as the original file name, the originating user, etc.). All of this is visible in a logfile as plain text blocks, with some surrounding binary goo. Comparing files more closely shows a consistent structure, with a fixed header, then a series of Type-Length-Value (TLV) blocks, much like ASN.1 or Protobuf encoding. I work out how to step through these blocks, and to identify the one holding the latest revision of the content, extracting that to stdout. Job done for the prank, but…
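
To make the "step through the TLV blocks" idea concrete, here is a minimal Python sketch. Everything about the layout is an assumption made up for illustration: the 8 byte header, the 1 byte type, the 2 byte little-endian length and the type numbers are all hypothetical, and this is neither the real PVCS layout nor the pvcstools code.

```python
import struct
from typing import Iterator, Tuple

# Hypothetical layout, for illustration only (NOT the real PVCS format):
# a fixed-size file header, then blocks of <type:1 byte><length:2 bytes LE><value>.
HEADER_SIZE = 8
BLOCK_HEAD = struct.Struct("<BH")


def iter_blocks(data: bytes, offset: int = HEADER_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) for each TLV block found after the fixed header."""
    while offset < len(data):
        if offset + BLOCK_HEAD.size > len(data):
            raise ValueError(f"truncated block header at offset {offset}")
        block_type, length = BLOCK_HEAD.unpack_from(data, offset)
        offset += BLOCK_HEAD.size
        end = offset + length
        if end > len(data):
            raise ValueError(f"block at offset {offset} runs past end of file")
        yield block_type, data[offset:end]
        offset = end  # the next block starts right after this value


def make_block(block_type: int, value: bytes) -> bytes:
    """Build one block in the same made-up layout (handy for test data)."""
    return BLOCK_HEAD.pack(block_type, len(value)) + value


# A fake "logfile": header, a file name block, then a content block.
sample = (
    b"\x00" * HEADER_SIZE
    + make_block(1, b"HELLO.C")
    + make_block(2, b"int main(void) { return 0; }\n")
)
blocks = list(iter_blocks(sample))
for block_type, value in blocks:
    print(block_type, value)
```

The useful property of any TLV scheme is visible here: a reader can skip over blocks it does not understand, because the length says where the next block starts.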

The one that got away

While looking for PVCS downloads in the wayback machine, I discover that Borland C/C++ V5 for Windows supported PVCS natively as a built-in version control backend, and there is a CD image on archive.org :) Yoink!

After much unpacking (nested 3 layers deep - thanks Borland), I find a specific win32 DLL that contains the PVCS logic, which appears to have been supplied by Polytron to Borland (according to the copyright and version table strings within). I throw this into Ghidra and take a quick look - it’s quite complicated, so I decide to put that aside and see what I can learn from the raw files first, which turns out to be all I needed to extract source code - I might return to the decompiler if I ever want / need to know more.

Making a useful tool

I’m enjoying the process of understanding the format, there are likely to be others out there facing a similar issue (always assume another nerd shares your pain), and there are no open source tools for PVCS. I feel obliged to finish the job and extend the tooling such that it is possible to read useful metadata and retrieve any revision from a logfile, then publish for the good of geekdom - hello!

What can be done

Read-only extraction of metadata and content revisions (by revision number or label). It’s unlikely ever to be possible to write a logfile as there is too much unknown stuff (although some persistence with the decompiler might get there).
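
For a flavour of how retrieving an older revision from reverse deltas works, here is a minimal Python sketch. The delta representation is an assumption: it is loosely modelled on RCS-style "delete lines / append lines" edit commands rather than on PVCS's actual encoding, it ignores branches, and the names (`apply_delta`, `checkout`, the `RELEASE_1` label) are hypothetical and not the pvcstools API.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical edit commands, loosely RCS-like (NOT the real PVCS encoding):
#   ("d", first_line, count)   delete `count` lines starting at 1-based `first_line`
#   ("a", after_line, lines)   append `lines` after 1-based `after_line` (0 = at top)
# Line numbers refer to the text BEFORE the delta is applied, in ascending order.
Op = Tuple
Delta = Sequence[Op]


def apply_delta(lines: List[str], delta: Delta) -> List[str]:
    """Apply one delta to `lines`, returning the text of the next older revision."""
    out: List[str] = []
    cursor = 0  # how many source lines have been consumed so far
    for op in delta:
        if op[0] == "d":
            _, first, count = op
            out.extend(lines[cursor:first - 1])  # keep everything before the cut
            cursor = first - 1 + count           # skip the deleted lines
        elif op[0] == "a":
            _, after, new_lines = op
            out.extend(lines[cursor:after])      # keep up to the insertion point
            cursor = after
            out.extend(new_lines)
        else:
            raise ValueError(f"unknown delta command {op[0]!r}")
    out.extend(lines[cursor:])                   # keep whatever is left
    return out


def checkout(tip_rev: str, tip_lines: List[str],
             reverse_deltas: Sequence[Tuple[str, Delta]],
             wanted: str, labels: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the text of `wanted`, given as a revision number or a label.

    `reverse_deltas` is ordered newest to oldest; each entry turns the
    previous (newer) text into the named (older) revision.
    """
    rev = (labels or {}).get(wanted, wanted)  # resolve a label, else use as-is
    lines = list(tip_lines)
    if rev == tip_rev:
        return lines
    for delta_rev, delta in reverse_deltas:
        lines = apply_delta(lines, delta)
        if delta_rev == rev:
            return lines
    raise KeyError(f"no such revision or label: {wanted}")


tip = ["int main(void)\n", "{\n", "    return 0;\n", "}\n"]    # revision 1.2
to_1_1 = [("d", 1, 1), ("a", 1, ["main()\n"]), ("d", 3, 1)]    # 1.2 -> 1.1
old = checkout("1.2", tip, [("1.1", to_1_1)], "RELEASE_1", {"RELEASE_1": "1.1"})
print("".join(old), end="")
```

Storing the newest text whole and the deltas in reverse is what makes the common case cheap: the latest revision needs no delta work at all, and only older revisions pay for the walk back.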

Have fun and let me know if you use this!