harbour-core/contrib/hbhttpd/readme.txt

Date: Fri, 12 Jun 2009 19:47:37 +0300
From: Mindaugas Kavaliauskas <dbtopas@dbtopas.lt>
To: "Harbour Project Main Developer List."
Subject: uhttpd v0.2


Hello,


I want to share some more ideas (and code) about uhttpd development.
All pro and cons, and any brainstorming is very welcome.

Sources can be obtained from:
https://dbtopas.lt/hrb/uhttpd-0.2.zip

You can test running demo application at (I'll try to keep it running
for some time): https://dbtopas.lt:8001/


I also want to add answer about one question. uhttpd support and
upload into Harbour source repository. I expected and wrote some time
before:
------
I just have some ideas how to extend it, but I'm not sure if these
ideas will be similar to repository changes by other people. It can
happen, that after some time I will propose something completely
different and incompatible from repository.
------
I see many backward incompatible changes in my uhttpd, and I'm going
to do development in this incompatible way. I'm just experimenting
with my simple applications, and I want to find a best web application
architecture solution. I'm not interested in showcounter sample,
or uhttp_cookie object, so, I do not want to do any changes to
repository uhttpd sample. Feel free to pick the features you like
and put it into repository.


Regards,
Mindaugas


The main idea
=============
I've implemented sessions for uhttpd. This session model is different
from other WWW servers. In database oriented web applications, server
has to open database file read/write some data, generate html output,
close database, send response to client. This requires database
opening/closing for each request. The goal of this implementation was
to avoid this opening/closing and other per request initialization/
exit operations. This could be done by keeping a separate thread for
each session. Every request of some session is processed by the same
thread and this thread keeps databases open.
This approach makes web server  a little similar to terminal server:
each application has "its own" thread in web server (just like each
application has its own process in case remote terminal). Some remote
terminal protocol is used to send keyboard data to and receive screen
image from terminal server, here we use HTTP protocol for this
purpose.


Sessions
========
Main thread waits for connections. Accept connections are put into
common request queue. This queue is processed by some threads. These
threads reads http request from socket and analyzes session
information. If request corresponds to some session and session
information is required to handle particular request, the request is
redirected to sessioned request queue of corresponding session. These
sessioned requests are processed by sessioned threads. Each active
session has one sessioned thread to handle request. This thread keeps
databases open. Some request (for example, .css file request) does not
requires session data even if client has active session with server,
these request may be processed by threads of common request queue.

If keep-alive connections are used, after sessioned request is
processed, connection is put into common request queue. This helps to
move receiving of request and processing of static content responses
to common queue threads, thus leaving sessioned threads available for
generation of dynamic content for another keep-alive connection of the
same session.
                                               Common keep-alive conn.
                                              +----------------------+
                  Accepted                    |                      |
  +-------------+ connection                  V  Common queue        |
  | Main thread | -----------+------------->  ###################    |
  +-------------+            ^                                  V    |
                             |         +------+ +------+ +------+    |
                             |         |Thread| |Thread| |Thread| ---+
                             |         +------+ +------+ +------+    |
                             |                                       |
                   Keep-alive|                                       |
                  connections|     Sessioned request queues          |
                             |                                       |
                             |     ####  <---------------------------+
                             |     V                                 |
                             |     +------+                          |
                             +---- |Thread|                          |
                             |     +------+                          |
                             |                                       |
                             |                     Sessioned request |
                             |     ####  <---------------------------+
                             |     V
                             |     +------+
                             +---- |Thread|
                                   +------+


Thread circulation
==================
The figure above shows connection/request circulation. Thread
circulation is done in very similar way. All children threads are
created by main thread. These child threads wait for connections in
common request queue. If sessioned request is received, thread finds
the corresponding sessioned thread, passes request to it, and starts
to wait for new request. If sessioned thread is not found (the first
session request is received), this thread initializes session data
and becomes a sessioned thread. After processing of the first session
request, it does not return to common request queue, but waits for
requests of this session only. After session is destroyed, this
thread returns to common request queue.


Mounting table
==============
Traditional web servers exposes directory tree (DocumentRoot) to the
clients. Server side scripts are a regular files having executable
attribute or some kind of extension (ex., .php) inside directory
tree (or some aliased directory). uhttpd is oriented to be a single
compiled application and dynamic web pages are generated not by
external files (thought, we can add such possibility using .hrb or
.prg files), but generated by function linked into final executable.
Thus some "table" is needed to convert requested URL to server script.
This table is called mounting table in my uhttpd implementation. It
allows to mount a single URL or URL subtree to a particular handler
(function or codeblock).
Mounting table is hash, having this structure:
   oServer:hMount := { url => { handler, sessioned }, ... }
URL can a single URL path, or path containing '*' wildchar in the end.
Example:
    /app/login   - single URL match https://host/app/login
    /files/*     - the whole URL subtree from https://host/files/
    /*           - the whole URL tree https://host/

NOTE: '*' should be placed after '/' symbol to match URL subtree.
Usage of '/files*' is invalid and do not match '/files1', '/filesa'
or '/files/x'. The requested URL path is checked by deleting last
slashed part until URL is found in mounting table. If no URL found
in mounting table, 404 Not Found error is returned.
Example 1. If '/files/dir/aaa' is requested, '/files/dir/aaa',
'/files/dir/*', '/files/*', and '/*' will be checked before 404
error is returned.
Example 2. If '/files/dir/' is requested, '/files/dir/',
'/files/dir/*', '/files/*', and '/*' will be checked before 404
error is returned.

NOTE 2: if you want to use a slash-less URL address as a synonym for
the directory you may need an extra redirection rule. Ex.,
     "/files"   => { {|| URedirect( "/files/" ) }, .F. }
     "/files/*" => { {| x | UProcFiles( DocumentRoot + x ) }, .F. }


Widgets
=======
The implementation described above can be used to develop web
applications with a power comparable to plain php, but we will need
some framework/toolkit on top of basic uhttp server to allow a quick
application development. UWidgets is used for this purpose. It allows
to use some objects (browse, etc.) instead of plain:
    UWrite('<table>')
    DO WHILE ! Eof()
       UWrite( '<tr><td>' + FIELD->NAME + '</td><td align="right">' +
               hb_ntos( FIELD->AGE ) + '</td></tr>' )
       dbSkip()
    ENDDO
    UWrite('</table>')

To use UWigets under some URL subtree, you should add an entry to
server mounting table specifying standard widgets handler:
    "app/*" => { {| x | UProcWidgets( x, s_aMap ) }, .T. }
You can see UWidgets handler requires requests to be sessioned.

s_aMap is one more table similar to server mounting table. Actually,
these two tables can be merged is widgets are implemented inside
server itself, but I want to keep widgets implementation separate,
thus, allowing an alternative implementations. s_aMap is hash
containing the mapping of URL subtree into handler functions. Ex.,
   s_aMap := { "login"        => @proc_login(), ;
               "main"         => @proc_main(), ;
               "account"      => @proc_account(), ;
               "items"        => @proc_items(), ;
               "items/edit"   => @proc_items_edit(), ;
               "logout"       => @proc_logout() }

Page handler functions receives a parameter indicating received
event/method. Handler has a structure:

STATIC FUNCTION proc_handler( cMethod )
   DO CASE
   CASE cMethod == "INIT"
      // This code is executed on entering URL (first call to this URL)
      // Here we open databases used to process queries
   CASE cMethod == "POST"
      // Process HTTP POST request
   CASE cMethod == "GET"
      // Process HTTP GET request
   CASE cMethod == "EXIT"
      // This code is executed on leaving URL (before first call to
      // another URL)
      // Here we close databases opened in INIT method, etc.
   ENDCASE
   RETURN .T.

As you can see this handler reminds the structure of traditional GUI
based application message/event handler, for example in windows, we
have:

STATIC FUNCTION WndProc( hWnd, uMsg, wParam, lParam )
   DO CASE
   CASE uMsg == WM_CREATE
   CASE uMsg == WM_PAINT
   CASE uMsg == WM_DESTROY
   ENDCASE
   RETURN ...

I hope this similarity will help to develop (or convert) event based
GUI applications to web easier.

The widgets are created on INIT method. The main widget is UWMain
object. Creation of widgets is done using a function following
Clipper convention: <object_name>New(). So,
     oM := UWMainNew()
creates a main widget of web page. This main widget acts as a
layout/container in for example, GTK+ library. It has :Add() method
and other widgets can be included inside of it. Ex.,
     oM := UWMainNew()
     oM:Add( UWLabelNew( "Hello, Widgets World!" ) )

UWidgets keeps main widget (and its children) inside session variable
and produces html output for it upon GET (or POST) requests. Main
widget "renders" all its child widgets, until the whole web page
content is generated. This html "rendering" is performed by
UWDefaultHandler().

POST method is usually used to perform some action on user data. I use
URedirect() function to do "redirect after post" and solve the problem
of from resubmitting, etc.


Modal page handlers
===================
Page handler has INIT, GET/POST, and EXIT messages. INIT and EXIT
methods are called only after you request of new page.

Ex.,

GET request "items" executes:
   page_items( "INIT" )
   page_items( "GET" )

Next GET request "items" executes:
   page_items( "GET" )

If you issue a GET "account" request, it will execute:
   page_items( "EXIT" )
   page_account( "INIT" )
   page_account( "GET" )

A tree structure of URL is transfered into page handles INIT, EXIT
logic. It helps make some feeling of modal structure of handler. I call
it "modal" because of idea how event are processed in event handlers of
modal dialogs. Let's have event handler function items_handler() for
items dialog, and items_edit_handler() function for item_edit dialog.

   PROCEDURE items_handler()
      ...

      IF event == "edit button pressed"
         dialog := create_new_modal_dialog() // create items_edit dialog
         dialog:handler := @items_edit_handler()
         process_event_loop() // until dialog is closed
         destroy_dialog( dialog )
      ENDIF
      ...
      RETURN

during process_event_loop() events are processed inside
items_edit_handler() function, and this event handler can access
workareas opened in a parent dialog (item dialog), private variables
of item_hadler(), etc.
The similar effect was tried to reach in page hadlers. Let's continue our
sample (last query was "account").

GET request for "items" executes:
   page_account( "EXIT" )
   page_items( "INIT" )
   page_items( "GET" )

GET request for "items/edit" executes:
   page_items_edit( "INIT" )  // no page_items( "EXIT" ) !!!
   page_items_edit( "GET" )

GET request for "account" executes:
   page_items_edit( "EXIT" )
   page_items( "EXIT" )
   page_account( "INIT" )
   page_account( "GET" )

GET request for "account/edit" executes:
   page_account_edit( "INIT" )
   page_account_edit( "GET" )
etc...


Other major changes
===================
- dropped underscore and changed style to lower case for "global"
   memvars: server, get, post, cookie, session. These variables are
   accessed very often, and it is not some kind of internals marked by
   underscore. Code looked very UPPERCASE SCREAMING before;
- uhttpd rewritten to be an object. Server does not occupies main()
   function any more, and can be included into any application (or even
   a few servers in one application);
- implemented missing HTTP/1.1 headers;
- implemented keep-alive connections to reduce TCP handshake overhead;
- implemented HTTP/1.1 Last-Modified and co., to avoid resend of
   unmodified static content;
- session expiration;
- socket.c module supports linux (thanks Przemek for detailed
   instructions);
- socket.c is more multithread GC friendly (using hb_vm[Un]Lock());
- error handling. Runtime errors in "user" code does not cause server
   crash;


Pro and Cons
============

One thread per session
----------------------
Pro:
   * OK, if there is a limited number of clients actively using web
     application
Cons:
   * not scalable solution for sites with thousands low activity
     users (will keep a large number of inactive threads)

The implementation of sessioned threads was started from idea: it's
nice to have a prepared open aliases, positioned records, etc, on
request processing instead of opening, positioning and closing it
on every request. Some alias caching or another data model can solve
this problem.

Sessioned threads
-----------------
Pro:
   * Keeps alias state unchanged
   * Solves race condition problem for accessing session vartiables
Cons:
   * A little complicated and not standard architecture
   * Request should be divided into sessioned and non-sessioned

Modal page handlers
-------------------
Pro:
   * helps to have aliases opened in parent (as in modal application)
Cons:
   * disables to handle request for a few unrelated parts of web
     application. For example, if unrelated part having a different
     URL branch is opened in popup window. The old page handler
     receives EXIT message and state (aliases, etc) is lost.

Widgets and layouts
--------------------
Pro:
   * It's OK for compilated AJAX widgets, like browse
Cons:
   * For simple widgets (ex., UWHTML,. UWLabel, UWInput) is much easier
     to write a plain HTML code, than widget creation code


Roadmap
=======
I'm not sure if I will not delete the whole sessioned threads, modal
page handlers, and widgets in next version of uhttpd. But there are
a few things I know I will implement:
   * Templates
     Perhaps this can change widgets a lot. Only complicated widgets
     will remain. Simple widgets and layouts will move to templates.