Fear the Cowboy

Life of Microsoft Open Source Developer

Crafting an Optimized PHP Build Process on Windows (Part IV)

June 23, 2009 19:16 by author Garrett Serack

Previously, I had discussed what it took to use PGO on the Windows PHP build. That led to me building automated build scripts…

Automation as the root of all evil

"Anything that can be done for you, automatically, can be done to you, automatically." – David C. Wyland

First, I had to get the entire dependency stack into the mix. While some of the dependent libraries had VCProject files, some didn't. Worse, even if they had them, you couldn't tell with any degree of certainty that they were compiled with the same settings, which is what would enable them to take advantage of PGO. I began taking each project, updating (or creating, using the Trace and mkProject tools) the Visual C++ project files so that they would use the same settings as the rest, and eventually came up with a solution file that had 74 projects in it (some of the projects generated more than one binary).

Next, I had to actually automate the process of creating the vcproject files. Once you've got the right dependencies, the PHP build process cranks out over 30 binaries when you include the PHP extensions that get built as part of the core.  After what seemed like a million compile-verify-tweak iterations, I had the tools that could generate VCProject files for the core PHP and all the extensions, provided it was all in the right place.

Next I wrote a .cmd batch script that went step-by-step, checking out the source, compiling the dependent libraries, building the PHP makefile, compiling PHP like the community did—and logging what it was doing, then switching to instrumentation, rebuilding the dependencies again, building the stack, PGO training it with test data and some applications (Wordpress, MediaWiki and phpBB) and then relinking it with optimization.
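The shape of such a step-by-step script can be sketched in a few lines. This is only an illustration in Python, not the original .cmd or JScript code: the step names follow the order described above, while `run_pipeline`, `placeholder`, and the commands themselves are hypothetical stand-ins that just print a label.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("php-build")


def run_pipeline(steps, runner=subprocess.call):
    """Run (name, command) steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, command in steps:
        log.info("starting: %s", name)
        exit_code = runner(command)
        if exit_code != 0:
            log.error("failed: %s (exit code %d)", name, exit_code)
            break
        completed.append(name)
    return completed


def placeholder(label):
    """Hypothetical command: prints a label instead of doing real work."""
    return [sys.executable, "-c", "print(%r)" % label]


# Assumed step list; a real script would call the version-control client,
# the compiler, and the training scenarios instead of placeholders.
STEPS = [
    ("check out source", placeholder("checkout")),
    ("compile dependent libraries", placeholder("deps")),
    ("build the PHP makefile", placeholder("configure")),
    ("compile PHP the community way", placeholder("baseline")),
    ("rebuild everything instrumented", placeholder("instrument")),
    ("PGO training runs", placeholder("train")),
    ("relink with optimization", placeholder("optimize")),
]
```

Calling `run_pipeline(STEPS)` runs the placeholders in order and logs each one. Stopping at the first non-zero exit code is one way to keep a long build from silently continuing after a broken step.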

I got the .cmd script almost working, but it was fairly fragile.  At that point I decided to switch batch scripting strategies, and in about a week, rewrote the batch script in JScript, which was far more flexible, and a lot more reliable.

What's next…

"The future always arrives too fast... and in the wrong order." –Alvin Toffler

During this process, I've tweaked the generated build process quite a bit and added a few more applications to the PGO training, which cranks the performance up further. Now, I can add in more scripts to assist with the training pretty trivially, but it still takes some effort to package up an entire application like MediaWiki or Wordpress and include it in the build process. Even once I've added in an application, I end up doing a whole slew of comparative testing to see what impact it has on the final executables.

As time goes forward, I'm sure there's more tweaking to be done, but in all likelihood, any significant performance gains are going to be the result of some modification of the PHP codebase itself.




Crafting an Optimized PHP Build Process on Windows (Part III)

June 18, 2009 14:18 by author Garrett Serack

Previously, I had talked about using PGO in the PHP build process. In order to use it I had to observe…

The Heisenberg build process

"A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it." – The First law of Mentat, quoted by Paul Atreides to Reverend Mother Gaius Helen Mohiam

Really, what I needed was a tool in two parts. The first would watch what happens during the build process, and the second would take that data and spit out some .vcproj files.

When I want to see what's happening on my own system I use ProcMon, a Sysinternals tool that monitors processes, what files they touch, what commands get executed, etc. I grabbed that and tried to watch what happens when you run NMake on the makefile when building PHP. It turns out that there are a few problems with that: ProcMon isn't very scriptable (making it tricky to automate), and even if it were, it chops off the command line in its log files once it's past a certain length.

I found nothing else that did quite what I needed, so I started thinking about how to write a tool that does the same thing.  In the past I have used Detours (an API detouring library built by Microsoft Research) to build a couple quick-and-dirty snoop/debugging tools.  Starting with a sample that came from the Detours library, I cobbled together a tool that would watch a process and its children, recording every file written or read, every command issued, and dump it into an XML file which I could process later.
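To make the idea of the trace output concrete, here is a minimal sketch of what dumping such a record to XML could look like. It does not use Detours and is not the actual tool: the element names (`trace`, `process`, `read`, `write`) and the sample data are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def trace_to_xml(processes):
    """Serialize recorded process activity to an XML string.

    `processes` is a list of dicts with a "command" key and optional
    "reads", "writes" and "children" keys; children have the same shape.
    """
    root = ET.Element("trace")
    for proc in processes:
        _append_process(root, proc)
    return ET.tostring(root, encoding="unicode")


def _append_process(parent, proc):
    """Add one process node, then recurse into its child processes."""
    node = ET.SubElement(parent, "process", command=proc["command"])
    for path in proc.get("reads", []):
        ET.SubElement(node, "read", path=path)
    for path in proc.get("writes", []):
        ET.SubElement(node, "write", path=path)
    for child in proc.get("children", []):
        _append_process(node, child)


# Hypothetical sample: nmake spawning a single compiler invocation.
sample = [{
    "command": "nmake",
    "reads": ["Makefile"],
    "children": [{
        "command": "cl.exe /c main\\main.c /FoRelease\\main.obj",
        "reads": ["main\\main.c"],
        "writes": ["Release\\main.obj"],
    }],
}]

xml_text = trace_to_xml(sample)
```

Nesting child processes inside their parent keeps the full command line of every compiler and linker call next to the files it touched, which is the information a project-file generator would need later.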

Creating the project files

At the same time, I began working on a tool that would generate .vcproj files from the data gathered during the make process. I first tried just putting together a tool which assembled the .vcproj XML file from what I knew about the layout of the project file, but as the build got trickier, it became harder to make sure the XML came out the way that Visual Studio expected. I turned to the Visual Studio SDK to see if there were any COM objects I could use to manipulate project files. There were, but they aren't documented in great detail, and they were really designed to be used inside Visual Studio for automation. Having scoured the planet, I found some examples of using the VCProjectEngine to generate project files.
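For a sense of the first approach (assembling the XML by hand), a heavily simplified sketch follows. It is not the mkProject tool and does not use VCProjectEngine. The element and attribute names reflect the general layout of Visual Studio 2008 project files as best understood here and should be checked against a real .vcproj; `build_vcproj` is a hypothetical helper, and a usable project needs many more settings, such as the compiler and linker tool options.

```python
import xml.etree.ElementTree as ET


def build_vcproj(name, sources, config="Release|Win32"):
    """Return a bare-bones .vcproj-style XML string listing source files."""
    project = ET.Element("VisualStudioProject", {
        "ProjectType": "Visual C++",
        "Version": "9.00",  # assumed version string for Visual Studio 2008
        "Name": name,
    })
    platforms = ET.SubElement(project, "Platforms")
    ET.SubElement(platforms, "Platform", Name="Win32")
    configurations = ET.SubElement(project, "Configurations")
    ET.SubElement(configurations, "Configuration", Name=config)
    files = ET.SubElement(project, "Files")
    group = ET.SubElement(files, "Filter", Name="Source Files")
    # De-duplicate and sort so repeated runs produce identical output.
    for path in sorted(set(sources)):
        ET.SubElement(group, "File", RelativePath=path)
    return ET.tostring(project, encoding="unicode")
```

For example, `build_vcproj("php", ["main\\main.c", "Zend\\zend.c"])` yields a skeleton with two `File` entries. Every per-file compiler option observed during the build would have to be added on top of this, which gives a sense of why hand-written XML became hard to keep in the form Visual Studio expected.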

For a couple of weeks solid, I worked on the tool to generate project files, compiling, testing, tweaking, etc. I finally reached a point where I could generate a complete project file that would compile php.exe and php5.dll. Having finally arrived at this point, I built PHP using PGO instrumentation, ran the bench.php script from the PHP source directory, and then re-linked the project. This first time, I saw about an 18% improvement in speed over the previous version!
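A percentage like this can be read two ways, and the text above does not say which one was measured. The sketch below shows both calculations; the `improvement` helper is hypothetical, and the timings in the usage note are invented purely to show the arithmetic, not measured results.

```python
def improvement(baseline_seconds, optimized_seconds):
    """Compare two benchmark timings.

    Returns (time_saved_percent, throughput_gain_percent):
    - time saved: how much shorter the optimized run is
    - throughput gain: how much more work fits in the same time
    """
    time_saved = (baseline_seconds - optimized_seconds) / baseline_seconds * 100
    throughput_gain = (baseline_seconds / optimized_seconds - 1) * 100
    return time_saved, throughput_gain
```

With invented timings of 10.0 seconds before and 8.0 seconds after, the run is 20% shorter but the throughput is 25% higher, so it is worth stating which figure is being quoted when comparing builds.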

That moment

"It ain't over 'til it's over, and maybe not then, either. " – Slovotsky's Law #29

Well, as anyone who's done software development will tell you, there's the moment when you finally get your program to do what you want under very controlled conditions, and then—quite some time later—there's the moment that you can give the fruits of that labor to someone else so they can do the same thing.

Now that I had passed the point where I'd finally proven that it was worth the effort to build a PGO-optimized version of PHP, I had to get it scripted so that it could be done in an automated fashion, not just on my computer, or a computer in our Lab.

In the final part, I wrap up with the automation of the build and look to where we might go next in PHP.




Crafting an Optimized PHP Build Process on Windows (Part II)

June 12, 2009 12:12 by author Garrett Serack

Last time, I talked about getting started in building the PHP stack; now I’m taking it…

One step further

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." – Donald Knuth

A chance conversation I had last summer at OSCON with Trent Nelson—who was building Python on Windows—had planted the seeds of how to get PHP on Windows optimized further.  Trent was using the PGO features of Visual Studio to generate Python binaries that run faster.  Rather than spend a lot of time optimizing all the little bits of PHP itself, I thought that this would be an ideal way to improve the overall speed of PHP, provided I could find the right scenarios to train PHP with.  Little did I know that finding the right scenarios wasn't the hardest part.

What is PGO? (from Wikipedia)

Profile-guided optimization (PGO) is a compiler optimization technique in computer programming to improve program runtime performance. In contrast to traditional optimization techniques that solely use the source code, PGO uses the results of test runs of the instrumented program to optimize the final generated code. The compiler is used to access data from a sample run of the program across a representative input set. The data indicates which areas of the program are executed more frequently, and which areas are executed less frequently. All optimizations benefit from profile-guided feedback because they are less reliant on heuristics when making compilation decisions.

Adding PGO to the existing build process

"I have not failed, I've just found 10,000 ways that won't work." – Thomas Edison

I had downloaded the source to the dependent libraries off the PHP wiki, checked out the PHP source code, and began the process of adding in PGO support to the existing build process. This proved to be extremely difficult. Even limiting the scope to just the core of PHP itself, without the dependent libraries, I ran into trouble trying to compile using PGO instrumentation and then re-linking after running some tests. The makefile that gets generated by the configure.js script (a JScript version of the automake configure script for the Windows platform) was just not built with what I had in mind.
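For readers unfamiliar with the mechanics, the instrument, train, and re-link cycle with the Visual C++ toolchain boils down to three kinds of command line. The sketch below only builds those command lines as lists and does not run them. The `/GL`, `/LTCG:PGINSTRUMENT`, `/LTCG:PGOPTIMIZE` and `/PGD` switches are documented Visual C++ options, while the `pgo_phases` helper and the file names in the usage note are hypothetical; this is not the PHP makefile's actual content.

```python
import ntpath


def pgo_phases(sources, output, pgd):
    """Return the command lines for a PGO build as argument lists.

    Between the "instrument" and "optimize" links, the instrumented binary
    has to be run through training scenarios so profile data gets recorded.
    """
    # Without an explicit /Fo, cl writes each .obj to the current directory
    # under the source file's base name.
    objects = [ntpath.splitext(ntpath.basename(src))[0] + ".obj"
               for src in sources]
    return {
        # /GL defers code generation to link time, which PGO requires.
        "compile": ["cl", "/c", "/O2", "/GL"] + list(sources),
        # First link: produce an instrumented binary that records profiles.
        "instrument": ["link", "/LTCG:PGINSTRUMENT", "/PGD:" + pgd,
                       "/OUT:" + output] + objects,
        # Second link: reuse the same objects, guided by the recorded data.
        "optimize": ["link", "/LTCG:PGOPTIMIZE", "/PGD:" + pgd,
                     "/OUT:" + output] + objects,
    }
```

Calling `pgo_phases(["main.c"], "php.exe", "php.pgd")` shows that the object files are compiled once and linked twice. A makefile that was written for a single compile-and-link pass has no natural place for the training step in the middle.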

I spent the better part of two weeks trying different approaches to tweaking the makefile so that I could use PGO to improve the PHP executable, but I kept running into roadblocks. Worse, the closer I got to a makefile that did what I wanted, the farther away from the current build process I was getting, and I wasn't sure that what I would end up with would even be close to what was being built today.

The long dark winter road

"Only the meek get pinched. The bold survive." – Ferris Bueller

I came to the conclusion that I'd have to build new Visual Studio project files from scratch. What worried me was that this would end up being a completely different build process and I'd never get the community to abandon what was already working, so I'd better be able to rebuild these new project files easily. I started looking (inside Microsoft and out) for any tools which generated Visual C++ project files. I found someone internally who had used some JScript to create project files from text files, but after some experimentation, I found this was nowhere near what I needed. What I really needed was a way to convert the generated Makefile into a .vcproj file, and not just 'wrap' it.

Once I found there was no such tool*, I began trying to figure out how to create one. I had this idea a few times in the last decade or so: watch how a program was compiled, and create a project file that does the same thing. Having tossed around the idea in my head before, I knew it wasn't going to be trivial, but without it, I couldn't do what needed to be done.

* Let me tell you: you never want to think about writing a tool to parse out what a makefile does.  It's rather like making a tool that tells you how sausage is made, in excruciating detail. Ugh.

In Part III, I’ll talk about the trouble with observing the build process.




Crafting an Optimized PHP Build Process on Windows (Part I)

June 9, 2009 15:18 by author Garrett Serack

For the last several months, I’ve been working very deeply with PHP, specifically compiling the PHP core itself and looking for avenues for optimization. This is the first of four posts about the journey I’ve been on with PHP.


I get started building PHP

"It is a bad plan that admits of no modification" – Publilius Syrus

I started working with building PHP itself about a year ago. Initially, I was trying to put together an environment to compile up the PHP stack so that I could do some debugging, and track down a few faults that we were encountering in some of the PHP applications that we were trying to modify to use the SQL Server PHP driver that the SQL Server team here at Microsoft was creating.

Once I began to work with the source code, I found out very quickly that on top of having a hard time recreating the exact same binaries that the community build process generated, there were a large number of dependent libraries that were available in binary-only form which were kept in a zip file that was passed around from developer to developer. That seemed a little odd for an open-source project but I can certainly understand that over time, unless someone is working hard to keep it all together, these things happen.

Around the same time, the community had started to invest time and effort to 'clean up' the dependencies for building PHP on Windows, and move towards supporting VC9 (Visual Studio 2008) as an officially supported compiler.

In order to help in this process, I built out some testing environments in our Lab, which would let me compile up PHP on Windows and Linux, in order to get decent and reliable test results which we could use to identify any shortcomings that we could address. This includes benchmarking not just the core PHP executable, but replicable and comparable testing of PHP applications such as Wordpress, MediaWiki, Gallery and phpBB.

PHP 5.3 on Windows: Not your father's PHP

"I'm looking for a lot of men who have an infinite capacity to not know what can't be done." – Henry Ford

For PHP 5.3, Pierre (and others) had gone out and found up-to-date versions of all the dependencies, brought them together, and managed to get them compiling with VC6 and VC9.  They had posted these in binary and source form to the PHP Windows Internals site, which allows anyone to rebuild the PHP stack on Windows, and theoretically, get the same results as the 'official' build.

Jumping in at that point was much easier than it had been, as all you had to do was download the binaries of the libraries, check out the source code, and run a few commands at the command line, and presto, you had your PHP executables.

At this point Pierre and I played around with the build flags on VC9 and found some settings that gave some pretty significant improvements to the speed of PHP vs. the speed of the VC6 version, and a lot of speed improvements vs. the old 5.2.x line of PHP.

In Part II, I’ll talk about going one step further with optimization.




