Yapip - Yet Another [HTTP] Proxy in Perl.
--------------------------------------------------------------------------------

Introduction
--------------------------------------------------------------------------------
Kyle R. Burton <mortis@voicenet.com>
Fri Aug 17 09:37:08 EDT 2001

I've started a secondary project in Pas to include an integrated
regression-testing tool.  Justin Bedard and I created a tool for
regression testing that included a [modified] CGI proxy, which
recorded what you did in your browser as a Perl program that could
play back the actions you took.  Starting from the recorded script,
you could modify it into a regression test for specific functionality
on the website.  Our original work is available here:

  http://www.bgw.org/projects/guts/

The CGI-based proxy is fairly slow (through no fault of the program
itself, but because it is CGI based), so I've started work on a custom
proxy that is object oriented and uses I/O multiplexing instead of
forking to handle connections.

Starting from scratch has provided a few benefits over the proxy
programs I found while searching the web.  Since the proxy is an
object, it can easily be subclassed to implement new functionality (to
record regression-testing scripts, for example).  This should allow
the proxy to lead several different lives: the primary one as a simple
proxy, and the main secondary one as a tool for recording and
generating regression tests.
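
As a sketch of that subclassing idea (the class and method names below
are illustrative only, not the actual Yapip API):

```perl
use strict;
use warnings;

# A minimal, self-contained sketch of the subclassing idea.  The class
# and method names are illustrative only -- not the actual Yapip API.
package Proxy;
sub new { return bless { }, shift }
sub handle_request {
    my ($self, $uri) = @_;
    return "forwarded $uri";          # a plain proxy just passes through
}

package RecordingProxy;
our @ISA = ('Proxy');
sub handle_request {
    my ($self, $uri) = @_;
    push @{ $self->{log} }, $uri;               # record for a test script
    return $self->SUPER::handle_request($uri);  # then act as a plain proxy
}

package main;
my $proxy = RecordingProxy->new;
print $proxy->handle_request('http://www.bgw.org/'), "\n";
print scalar(@{ $proxy->{log} }), " request(s) recorded\n";
```

The recording subclass behaves exactly like the base proxy as far as
the browser can tell; it only adds a side effect before delegating.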

The I/O-multiplexing design of this implementation is significantly
faster than CGI or forking style proxies.  Because it multiplexes I/O,
there is only one process, which makes it easier to track what its
clients are doing, and also makes a 'control' style connection
possible.  The drawback is that while the proxy is busy handling data
from one connection, all the others block.  This is acceptable for our
purposes: the proxy is designed mainly as a testing tool, not as a
high-performance gateway or caching proxy.  The blocking will not be
an issue unless there are many simultaneous users on the proxy.  In
most instances where we see this software being used (tinkering,
testing, observation), it should perform extremely well.
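
The single-process loop might look like the following sketch using
IO::Select (illustrative only; a socketpair stands in for a browser
connection, and the real Yapip loop is more involved):

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# Sketch of a single-process select() loop (not the actual Yapip code).
# A socketpair stands in for a browser connection; IO::Select watches
# the handles, and we service whichever one becomes readable.  While a
# handle is being serviced the others wait -- the blocking trade-off
# described above.
socketpair(my $browser, my $proxy_side, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

syswrite($browser, "GET / HTTP/1.0\r\n\r\n");

my $select = IO::Select->new($proxy_side);
for my $fh ($select->can_read(5)) {        # wait for a readable handle
    sysread($fh, my $request, 4096);
    print "saw request: $request";
    syswrite($fh, "HTTP/1.0 200 OK\r\n");  # answer on the same handle
}
sysread($browser, my $reply, 4096);
print "client got:  $reply";
```

A real proxy would add every accepted client socket (and each upstream
server socket) to the same IO::Select set and loop forever.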

The RFC for HTTP includes information on how HTTP proxies should
behave.  This proxy was not created based on that RFC, though it
probably should be.  It attempts to modify the HTTP requests as little
as possible, rewriting only the request URI as necessary.  The RFC for
HTTP/1.1 can be found here:

  http://www.faqs.org/rfcs/rfc2068.html
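
For illustration, that one rewrite looks roughly like this (a sketch,
not the actual Yapip code): a browser talking to a proxy sends an
absolute request-URI, which must be reduced to a path before the
request is forwarded to the origin server.

```perl
use strict;
use warnings;

# Sketch of the request-line rewrite (illustrative, not Yapip's code).
sub rewrite_request_line {
    my ($line) = @_;
    if ($line =~ m{^(\w+)\s+http://([^/\s]+)(/\S*)?\s+(HTTP/\d\.\d)$}) {
        my ($method, $host, $path, $ver) = ($1, $2, $3 || '/', $4);
        return ("$method $path $ver", $host);  # relative URI + target host
    }
    return ($line, undef);    # already relative; pass through unchanged
}

my ($rewritten, $host) =
    rewrite_request_line('GET http://www.bgw.org/projects/guts/ HTTP/1.0');
print "$rewritten (connect to $host)\n";
# prints: GET /projects/guts/ HTTP/1.0 (connect to www.bgw.org)
```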

Http Sniffer by Tim Meadowcroft <tim@schmerg.com> (another proxy written
in Perl) was used extensively as an example when writing this code.  Http
Sniffer is available from:

  http://www.schmerg.com/HttpSniffer.pl.txt



Yapip - Yet Another [HTTP] Proxy in Perl.
------------------------------------------------------------------------------
- more of a pass-through than a full HTTP proxy
- ideal for subclassing to perform extended
  tasks (like logging, request observation, etc.)
- captures HTTP headers and POST data separately
- currently captures client requests
- will capture server responses

The following environment variables need to be set before you can
successfully run the proxy:

  PAS_BASE=/path/to/the/installation/of/pas
  PERL5LIB=$PAS_BASE/src

To run it, use the proxy shell:

  [mortis@malevolence pas]$ perl -MOrg::Bgw::HTTP::Proxy::Shell -e shell
  args: 
  trying to connect to proxy localhost:8081...failed : Connection refused
  
  You are not connected to a proxy, try 'help' or 'connect'
  
  proxy> help
  proxy> start
  launching server...started
  trying to connect to proxy localhost:8081..connected
  proxy> status
  
     Pid:                        6091
     Sid:                        6091
     Current Time:               Fri Aug 17 09:52:53 2001
     Start Time:                 Fri Aug 17 09:52:43 2001
     Elapsed Time:               10
     Connections:                0
     Control Connections:        1
     Requests Processed:         0
  
     Logfile:                    /home/mortis/projects/pas//logs/pas.log
     Loglevel:                   9
      
  proxy> stop
  shutting down...
  proxy> quit
  exiting...
  [mortis@malevolence pas]$ 



You can also run the proxy directly (without the shell):

  [mortis@malevolence pas]$ perl -MOrg::Bgw::HTTP::Proxy -e run_proxy
  args: 
  daemonizing...forked.  Child is: 6095.
  [mortis@malevolence pas]$ 


For debugging, run it from the command line and instruct it not to 
daemonize:

  [user@host dir]$ perl -MOrg::Bgw::HTTP::Proxy -e run_proxy -- -nodaemon


Once the proxy is running, you can use the shell to establish a control
connection to it.  Starting or stopping the shell has no effect on the
proxy; the shell can disconnect from and reconnect to the proxy
arbitrarily.

The proxy outputs information to the PAS logfile.  Tail the logfile to
watch what it's doing:

  [mortis@malevolence pas]$ tail -f $PAS_BASE/logs/pas.log
  
To utilize the proxy, configure your browser to use an HTTP proxy, and
point it at the proxy on the port reported by the proxy when it started
(or in the status).


TODO
--------------------------------------------------------------------------------
- implement response header parsing
- implement client ip tracking
  . reportable through the control connection
- implement request/response tracking by ip
  . initiate 'recording' through the shell interface
- streaming monitor connections
  . connect to the proxy, issue a 'stream ip' request, and any
    requests made by that ip are sent directly to you as they
    come in - turns the connection from an interactive one to
    a streaming connection.
- implement regression script recording
  . initiate 'recording' through the shell interface
- For the app server, switch on 'test mode' or 'debug mode'; then,
  from inside Pas's base page object, we can establish a connection
  to the proxy and communicate with it to record request/response
  information (there is a lot of potential in this idea)
- how else can we facilitate the regression testing process?
  . automatically parsing out HTML?  form elements?  Possibly with
    HTML::Parser (though this is slow)
    - lists of anchor tags (for further spidering?)
    - names (and sizes) of textfields
    - names and contents of select elements (popup/dropdown lists)
    - names and contents of submit buttons
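
The HTML::Parser idea above might look like this sketch (assumes
HTML::Parser from CPAN; only anchors and input names are collected
here, as examples of the items in the list):

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN module, as mentioned in the TODO above

# Sketch: walk a captured response body and collect the anchors and
# form-field names a regression script would need.
my (@links, @fields);
my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ sub {
        my ($tag, $attr) = @_;
        push @links,  $attr->{href} if $tag eq 'a'     && $attr->{href};
        push @fields, $attr->{name} if $tag eq 'input' && $attr->{name};
    }, 'tagname,attr' ],
);
$p->parse(<<'HTML');
<a href="/projects/guts/">guts</a>
<form><input type="text" name="user"><input type="submit" name="go"></form>
HTML
$p->eof;
print "links:  @links\n";
print "fields: @fields\n";
```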