It could have been easier, but I'm not sure by how much.  The hard
part, it seems to me, stems from basic differences between the
application domains.  SIP forking proxies, for instance, are doing
something fundamentally different from HTTP proxies --- they are
searching for a person who moves without warning, rather than for a
bytestream that can always be retrieved from a known address.  And
even at the level of what HTTP calls an origin server, SIP servers
have to maintain persistent state and connections to streaming
engines; plain HTTP servers do not.
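
To make the proxy contrast concrete, here is a minimal sketch in Python.
Everything in it is made up for illustration and not taken from the RFC:
the addresses, the ORIGIN, CONTACTS and ANSWERING tables, and the function
names.  A real forking proxy may also try its branches in parallel and
cancel the losers, which this sketch does not attempt.

```python
# Hypothetical illustration only; all names and data here are invented.

# HTTP-style lookup: one resource, one known address.
ORIGIN = {"http://example.com/report": b"report bytes"}


def http_fetch(url):
    """Retrieve a bytestream from the single address that names it."""
    return ORIGIN[url]


# SIP-style lookup: one user, several places they might currently be.
CONTACTS = {
    "sip:alice@example.com": [
        "sip:alice@desk.example.com",
        "sip:alice@laptop.example.com",
        "sip:alice@mobile.example.com",
    ],
}

# Which devices would answer right now; this can change without warning.
ANSWERING = {"sip:alice@mobile.example.com"}


def fork_invite(user):
    """Try each registered contact in turn; return the first that answers.

    Returns None if the user cannot be found at any known contact.
    """
    for contact in CONTACTS.get(user, []):
        if contact in ANSWERING:
            return contact
    return None
```

The HTTP side is a single lookup that either works or does not; the SIP
side is a search whose answer depends on where the person happens to be.
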

The SIP folks could, of course, have made it easier to get a working
SIP server by writing code for all that, and plugging it into some
existing HTTP server via its Apache API/NSAPI/ISAPI-filter/servlet
style interface.  However, it seems to me (again, on the basis of only
a fairly quick skim of the RFC) that most of the protocol features
that make that approach impossible stem, directly or indirectly, from
the decision to support multicast and unreliable transport for SIP
itself, and not just for the underlying data streams.  Whether that
was a wise decision, I don't know, but once you make it, a lot of
other stuff follows from it.
rst