Friday, July 26, 2002
Clustering with Tomcat
This article describes how Web applications can benefit from clustering and presents a clustering solution that we developed for the Jakarta Tomcat Servlet Engine to provide high scalability, load-balancing, and high availability using JavaSpaces technology.
Very sexy. #
Blogging Makes it to the Cover of InformationWeek [via robb]
It's a bit overkill. As they are now, I don't find anything interesting about blogs from a technological perspective; they're personal websites. When blogs start serving up XML, of course, they will become the foundation for the Semantic Web. #
Covariance in .NET
Great. I too think covariance is a good thing. #
More on Schema and Strong Typing (from weakliem)
Weakliem brings up lots of good points. I have to think about them some more. It seems like there's a bit of ambiguity around the problem that needs to be resolved. The XML Schema Recommendation is very clear that the primary purpose of schemas is to provide a formalized mechanism for defining and enforcing constraints on an XML document. This is beyond debate.
What got me thinking was a previous discussion in which I complained about the nature of schemas and particularly the notion of datatyping XML and decided that the only real utility of schema was in mapping XML to objects.
Then Vasters made some really smart observations on the nature of schema. He concluded that a schema does carry semantics. In fact, he declared that most XML must implicitly carry semantics because XML is a serialization format for well-defined, semantically understood things. I'm not sure I agree with this statement, but if you do, there are consequences, and it'd be interesting to explore them.
Weakliem introduces two important notions: the idea of a 'document' vs an 'object' and of strongly typed data vs operations on data. These are both ideas which seem to pop up everywhere, particularly in the REST and RDF worlds.
I think the first distinction is valid. A document is an object on which the only valid operation is 'read' (analogous to HTTP GET), and that operation is safe: it causes no change to the object (which also makes it idempotent). An object (in the OO sense) is an object which exposes operations that change its state. This also reminds me of the distinction between 'data' and 'information'.
The second distinction is not so valid. I don't think there's such a thing as 'untyped' data. All information (and by extension all XML) must be typed. The type may change (sometimes '7' is an integer representing an order quantity, sometimes it's a string describing the name of a movie) but it's always there, implicitly or explicitly, and if it's somehow not, what you've got is nonsense (or, as others might say, a meaningless sequence of numbers). This might not make too much sense, but it's true. I'd go even further and say that every meaningful XML document, regardless of whether it declares a schema and regardless of whether the declared schema exists or not, must follow a schema though that schema may be 'implicit' in the document structure.
Now, if all data is implicitly strongly typed (because untyped data isn't data), then I might say the existence of a strong (that is, well-understood) type implies that I can perform state-changing operations upon that data. For example, if a schema tells me that the value of an attribute must be an integer between 1 and 10, then it follows that anywhere I see that attribute I can change its value to an integer between 1 and 10. For air availability documents, if I'm told that an element contains a flight number, then it makes sense that I should be able to perform flight-number operations on that element, such as booking it or retrieving information on it. So I might conclude that an air availability document (and indeed any document typed by a schema; since all documents are at least implicitly typed by a schema, I really mean all documents) does indeed represent a type of object.
I'm not sure I'm ready to draw this conclusion yet, but to some extent I think it may be inevitable. I don't think you can talk about data without talking about operations on that data. If this is the case, the OO paradigm does in fact supersede all other paradigms, including the document-centric XML paradigm.
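To make the "type implies operations" step concrete, here is a minimal Python sketch, assuming a constraint like the one above (an integer between 1 and 10, loosely modeled on XSD's minInclusive/maxInclusive facets). The names IntRange and Rating are hypothetical; the point is only that the same constraint that validates a document also defines the legal write operation.

```python
# Hypothetical sketch: derive a state-changing operation from a schema-style
# constraint ("an integer between 1 and 10"). IntRange and Rating are
# illustrative names, not taken from any real schema or library.
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """A simple facet, loosely like xs:minInclusive/xs:maxInclusive on xs:integer."""
    low: int
    high: int

    def check(self, value: object) -> int:
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"expected an integer, got {value!r}")
        if not self.low <= value <= self.high:
            raise ValueError(f"{value} is outside [{self.low}, {self.high}]")
        return value


class Rating:
    """An 'object' whose only write operation is the one its type implies."""
    constraint = IntRange(1, 10)

    def __init__(self, value: int) -> None:
        self._value = self.constraint.check(value)

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, new_value: int) -> None:
        # The type tells us exactly which writes are legal.
        self._value = self.constraint.check(new_value)


r = Rating(3)
r.value = 7          # allowed: still an integer in [1, 10]
try:
    r.value = 11     # rejected by the same constraint that validates documents
except ValueError as err:
    print(err)       # 11 is outside [1, 10]
```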
This is all very interesting because I was just talking to a colleague about whether the HTTP protocol is 'complete' or needs to be seriously extended. I think HTTP is complete, and I might 'prove' this by working backwards, deriving operations from data. If I saw a URL (the data) on the side of a bus, I think I could naturally derive the GET method. Once the GET method exists, PUT and DELETE follow. POST would be a hack, but a useful one which I'd stumble upon sooner or later. Except I would probably rename the fundamental verbs READ, WRITE, DELETE and EXECUTE. The point is I can start with the data and work backwards to operations on that data.
I guess what I'm trying to say is that while Schemas were not meant to define semantics, they do. They define a strong type and they define a context (through the namespace), and this is probably enough.
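The four verbs derived above can be sketched against an in-memory store. This is a minimal Python illustration, assuming a dictionary keyed by URI and treating POST as "append to a collection"; ResourceStore is a hypothetical name. It also shows the HTTP/1.1 distinction between safe methods (GET changes nothing) and idempotent ones (PUT and DELETE change state, but repeating them leaves the same state), while POST is neither.

```python
# Hypothetical sketch: READ/WRITE/DELETE/EXECUTE mapped onto GET/PUT/DELETE/POST
# over an in-memory store. The store and the "append to a collection" meaning
# of POST are illustrative assumptions.
from typing import Any, Dict, Optional


class ResourceStore:
    def __init__(self) -> None:
        self._resources: Dict[str, Any] = {}
        self._next_id = 1

    def get(self, uri: str) -> Optional[Any]:
        """READ: safe (no state change) and therefore idempotent."""
        return self._resources.get(uri)

    def put(self, uri: str, representation: Any) -> None:
        """WRITE: changes state, but repeating it leaves the same state."""
        self._resources[uri] = representation

    def delete(self, uri: str) -> None:
        """DELETE: idempotent; deleting twice leaves the resource just as gone."""
        self._resources.pop(uri, None)

    def post(self, collection_uri: str, representation: Any) -> str:
        """EXECUTE: neither safe nor idempotent; each call creates a new resource."""
        uri = f"{collection_uri.rstrip('/')}/{self._next_id}"
        self._next_id += 1
        self._resources[uri] = representation
        return uri

    def snapshot(self) -> Dict[str, Any]:
        return dict(self._resources)


store = ResourceStore()
store.put("/orders/42", {"qty": 7})
store.put("/orders/42", {"qty": 7})          # same state as after the first PUT
first = store.post("/orders", {"qty": 1})    # "/orders/1"
second = store.post("/orders", {"qty": 1})   # "/orders/2": a second, distinct resource
```

Nothing beyond the URI and the four verbs is needed to use this store, which is the sense in which the operations fall out of the data.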
Anyways back to work. #
HTTP Resource Identification
Consider a small example: suppose that Culinary Press establishes
Very, very, very interesting point. #
Basic XML and RDF techniques for knowledge management, pt 7
Uche Ogbuji takes a moment to review in a broader context the relevance of the XML/RDF techniques he has been presenting. He discusses the importance of XML/RDF interchange, of specialized RDF query, and of applying lessons from RDF modeling to overall application development. He also shows how this thread of the Thinking XML column relates to the parallel thread on developments toward semantic transparency.#
RDF & Schema Data Typing
Is this a good idea? I have my doubts. There seem to be two schools of RDF users emerging: those who are interested in using RDF to store concrete data (I call them the personal information managers) and those who are interested in using RDF to define relationships between URIs. I'm part of the second group, but clearly, now that data typing is being introduced, the first group is winning. I'm not sure why this is... perhaps because people have so many different notions of what constitutes 'metadata'. My doubts about RDF's ability to solve the metadata exchange problem are still growing. Perhaps RDF simply attempts to do too much. In my opinion, its failure to distinguish between attributes (of an instance object) and relations (between two sets of objects) is a major flaw. #
More on Schemas and Schema Inheritance
Schema doesn't map to a class in OO.
Technically this is true, but I'm not sure it's 100% true anymore. There are a lot of tools out there that will consume a class and spit out a schema, or consume a schema and spit out a class. This may be approaching the 90% barrier. If it's true that 90% of the time the schema I'm writing or using is tightly correlated to a class, then I'd say a particular schema instance within a namespace does map to a class.
Further, if you believe that "XML is just a tool to negotiate semantically well understood stuff between people at two ends of a long pipe", then it's not such a stretch to say that a Schema maps a collection of XML data into well-defined business objects. Classes in OO are representations of those same well-defined business objects, so a particular subset of a schema for a particular set of XML does correspond directly to an object.
Schema describes just a bunch of types; the schema itself is just a scope which defines a common identity for a concrete set of concrete type definitions across space and time. This makes it necessary to change the identity of all types once one of them changes or the set changes.
This seems unfortunate. Maybe a mechanism for increasing the granularity of a schema could be introduced. The problem of having a schema defined by a namespace which defines a collection of types might be solved just by using fragment identifiers. For example, urn:schemas-example-com:mySchema#myType.
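Here is a minimal Python sketch of the fragment-identifier idea, assuming types are keyed by (schema URI, type name) instead of by schema URI alone. TypeRegistry and the "myType-2" revision name are hypothetical conventions for illustration; nothing here is part of XML Schema itself.

```python
# Hypothetical sketch: per-type identity via a fragment identifier on the schema
# URI. The registry and the "-2" revision naming convention are assumptions.
from typing import Dict, Tuple


def split_type_uri(type_uri: str) -> Tuple[str, str]:
    """Split 'urn:...:mySchema#myType' into (schema URI, type name)."""
    schema_uri, sep, type_name = type_uri.partition("#")
    if not sep or not type_name:
        raise ValueError(f"no type fragment in {type_uri!r}")
    return schema_uri, type_name


class TypeRegistry:
    """Stores definitions per (schema URI, type name), not per schema URI."""

    def __init__(self) -> None:
        self._definitions: Dict[Tuple[str, str], dict] = {}

    def define(self, type_uri: str, definition: dict) -> None:
        key = split_type_uri(type_uri)
        if key in self._definitions:
            raise ValueError(f"{type_uri} already defined; give the revision its own fragment")
        self._definitions[key] = definition

    def lookup(self, type_uri: str) -> dict:
        return self._definitions[split_type_uri(type_uri)]


type_registry = TypeRegistry()
type_registry.define("urn:schemas-example-com:mySchema#myType", {"fields": ["id", "name"]})
type_registry.define("urn:schemas-example-com:mySchema#otherType", {"fields": ["code"]})
# Revising myType mints one new identity ('myType-2' is a hypothetical naming
# convention); otherType keeps its identity untouched.
type_registry.define("urn:schemas-example-com:mySchema#myType-2",
                     {"fields": ["id", "name", "email"]})
```

The design choice is simply that identity moves from the whole set of types to each individual type, so a change to one type no longer forces a new identity on its siblings.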
You won't benefit from them fixing their stuff, because that'd change their Schema and hence that'd change your Schema.
This is true. If the parent schema evolved, the derived schema wouldn't evolve automatically. This might be solved by implementing a strict versioning scheme: the parent schema simply wouldn't be allowed to evolve.
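A strict "parent schemas never evolve" policy can be sketched as a write-once registry. This is a minimal Python illustration under the assumption that a published schema URI is bound to its text forever and any change must ship under a new URI; FrozenSchemaRegistry and the dated example URIs are hypothetical.

```python
# Hypothetical sketch: published schema URIs are write-once. Changing a
# published schema in place is refused; the change ships under a new URI.
from typing import Dict

V1 = "<xs:schema><!-- purchase order, version 1 --></xs:schema>"
V2 = "<xs:schema><!-- purchase order, version 2 --></xs:schema>"


class FrozenSchemaRegistry:
    """Same URI, same content, forever."""

    def __init__(self) -> None:
        self._schemas: Dict[str, str] = {}

    def publish(self, schema_uri: str, schema_text: str) -> None:
        existing = self._schemas.get(schema_uri)
        if existing is not None and existing != schema_text:
            # Evolving a published schema in place is refused outright.
            raise ValueError(f"{schema_uri} is already published; publish a new URI")
        self._schemas[schema_uri] = schema_text  # re-publishing identical text is harmless

    def get(self, schema_uri: str) -> str:
        return self._schemas[schema_uri]


published = FrozenSchemaRegistry()
published.publish("urn:example:po:2002-07-24", V1)
try:
    published.publish("urn:example:po:2002-07-24", V2)
except ValueError as err:
    print(err)
# The change ships under a new, dated URI; schemas derived from the old one are untouched.
published.publish("urn:example:po:2002-07-26", V2)
```

The trade-off is the one the text points at: derived schemas never break, but they also never pick up the parent's fixes unless someone re-derives them from the new URI.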
XSD doesn't really map to OO -- only very few aspects do.
This is true, but I wonder. It's a given that the original goal of the Schema Recommendation had little to do with OO, but that's not really how it's evolved. This is one case where I might apply the Unix philosophy and look for the 90% solution. 90% of the time it seems like schemas are used to validate a collection of XML data which must be mapped back into an object (validating the XML data along the way). This does seem to be implied in SOAP, and it's explicit in .NET. If this is the case, then the goal of mapping XSD onto OO might make a lot of sense.
Then there is the problem of what I've termed context transitions. Suppose I want to write a service that processes a set of XML documents (e.g. purchase orders) in a rather generic fashion. The problem is that these XML documents are all coming from a diverse array of sources (each largely duplicates the others, but there are subtle, significant differences). The current solution would be for me to define my own schema and then require clients to transform their XML into what I'm expecting (or for all clients to simply agree on a common schema). While this may be necessary, I don't think it's optimal 90% of the time. It would make sense to factor out the common elements into a common structure (the parent schema) and have clients take this common structure and add elements to it in a manner that doesn't violate the parent schema. In this way, you could develop services that automagically deal with highly customized documents from diverse sources by viewing them through a base, common structure. There might be other benefits to such a scheme, particularly if you're creating standardized schemas (such as the ebXML set) but need to add customizations.
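Here is a minimal Python sketch of "viewing documents through a base, common structure", assuming an un-namespaced purchase order whose hypothetical parent schema defines orderId, sku and quantity, and where each client appends its own elements after those. The service reads only the base elements and ignores the rest; element names and the client documents are invented for illustration.

```python
# Hypothetical sketch: project customized purchase orders onto an assumed
# parent structure, ignoring client-specific additions. Element names are
# invented; namespaces are omitted for brevity.
import xml.etree.ElementTree as ET
from typing import Dict

# Elements the hypothetical parent purchase-order schema defines.
BASE_FIELDS = ("orderId", "sku", "quantity")


def view_as_base(xml_text: str) -> Dict[str, str]:
    """Project a purchase order onto the common base structure."""
    root = ET.fromstring(xml_text)
    view = {}
    for name in BASE_FIELDS:
        child = root.find(name)
        if child is None:
            raise ValueError(f"missing base element <{name}>")
        view[name] = (child.text or "").strip()
    return view


# Two clients, each extending the common structure with its own elements.
acme = """<purchaseOrder><orderId>A-1</orderId><sku>X9</sku>
<quantity>7</quantity><giftWrap>yes</giftWrap></purchaseOrder>"""
globex = """<purchaseOrder><orderId>G-5</orderId><sku>Y2</sku>
<quantity>3</quantity><costCenter>44</costCenter></purchaseOrder>"""

for doc in (acme, globex):
    print(view_as_base(doc))
```

A real implementation would validate against the parent schema rather than hard-code a field list; the sketch only shows why one generic service can consume both documents without an XSLT per client.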
More on ASP.NET State Management
Yup, it won't work. There is no clean way for a module or anything else to get a handle to the page that a request is destined for. I still don't like the inheritance requirement, so I think the best one can hope for is to move all of the functions in the StateManagingClass into static functions of a 'PageStateManager' class. Then clients will have the choice of either deriving from StateManagedPage or simply using the functions of PageStateManager. Then again, there might be a better way. I'll search for a more flexible solution when I get back.
The only attribute I was really interested in was TransientPageState, since I tend to avoid storing things in sessions or cookies whenever possible. It'd be nice if the client could choose whether to store such transient-persistent values in the Session or in the page's viewstate. As for the redirect problem, I believe there's an overload of Response.Redirect which allows the current page to finish executing, which could help. As for the base class problem, it seems inescapable.
For the webapplet framework I'm working on, it'd be neat if the client could install global event listeners that listen to the lifecycle events (particularly init and pre-render) and execute code on the page for every page within a webapplet. Then clients who wanted their pages to have extra abilities wouldn't have to derive from classes like StateManagedPage or FormBindablePage; they could just declare their attributes and trust that the framework will receive their lifecycle events and do its magic. Of course, pages that wanted to use these extra abilities, such as automagic state management and form binding, would still have to derive from some base class, say AbilityAwarePage, which passed the lifecycle events to the registered global event listeners. #
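The "static helpers plus optional base class" split can be sketched outside ASP.NET. This is a Python analogue, not .NET code: a plain dictionary stands in for viewstate, and WizardPage, SearchPage and the state_fields tuple are hypothetical. The only idea it carries over is that StateManagedPage becomes a thin convenience wrapper over PageStateManager, so pages that refuse to inherit can call the helpers directly.

```python
# Hypothetical Python analogue: state logic lives in static functions on
# PageStateManager; StateManagedPage is only a convenience base class that
# delegates to them. A dict stands in for ASP.NET viewstate.
from typing import Any, Dict, Iterable


class PageStateManager:
    """All the state logic, as static functions any page can call."""

    @staticmethod
    def save(page: Any, names: Iterable[str], view_state: Dict[str, Any]) -> None:
        # Copy the named fields from the page into the state bag.
        for name in names:
            view_state[name] = getattr(page, name)

    @staticmethod
    def load(page: Any, names: Iterable[str], view_state: Dict[str, Any]) -> None:
        # Restore only the fields that were actually saved.
        for name in names:
            if name in view_state:
                setattr(page, name, view_state[name])


class StateManagedPage:
    """Optional base class that simply delegates to PageStateManager."""
    state_fields: tuple = ()

    def save_state(self, view_state: Dict[str, Any]) -> None:
        PageStateManager.save(self, self.state_fields, view_state)

    def load_state(self, view_state: Dict[str, Any]) -> None:
        PageStateManager.load(self, self.state_fields, view_state)


class WizardPage(StateManagedPage):   # option 1: derive from the base class
    state_fields = ("current_step",)

    def __init__(self) -> None:
        self.current_step = 1


class SearchPage:                     # option 2: no base class, call the helpers
    def __init__(self) -> None:
        self.query = ""


wizard_state: Dict[str, Any] = {}
wizard = WizardPage()
wizard.current_step = 3
wizard.save_state(wizard_state)
next_request = WizardPage()           # a fresh page object on the next request
next_request.load_state(wizard_state)

search_state: Dict[str, Any] = {}
search = SearchPage()
search.query = "tomcat"
PageStateManager.save(search, ["query"], search_state)
```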
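The global-listener idea for the webapplet framework can be sketched as a plain observer pattern. This is a Python analogue under stated assumptions: a module-level registry holds listeners per lifecycle event, AbilityAwarePage forwards its init and pre-render events to them, and a class attribute (transient_page_state) stands in for a declarative .NET attribute such as TransientPageState. All names other than AbilityAwarePage and TransientPageState are hypothetical.

```python
# Hypothetical sketch: framework-level lifecycle listeners. AbilityAwarePage
# forwards lifecycle events to globally registered listeners; pages opt into
# an ability by declaring an attribute rather than inheriting behaviour.
from typing import Callable, Dict, List

Listener = Callable[["AbilityAwarePage"], None]
# Global listener registry, keyed by lifecycle event name.
_listeners: Dict[str, List[Listener]] = {"init": [], "pre_render": []}


def register(event: str, listener: Listener) -> None:
    _listeners[event].append(listener)


class AbilityAwarePage:
    """The one base class pages still need: it forwards lifecycle events."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def run(self) -> None:
        self._raise("init")
        # The page's own processing would happen between the two events.
        self._raise("pre_render")

    def _raise(self, event: str) -> None:
        for listener in _listeners[event]:
            listener(self)


# A state-management "ability": it only acts on pages that declare the marker
# attribute (a rough Python stand-in for a .NET attribute like TransientPageState).
def restore_transient_state(page: AbilityAwarePage) -> None:
    if getattr(page, "transient_page_state", False):
        page.log.append("restored state")


def save_transient_state(page: AbilityAwarePage) -> None:
    if getattr(page, "transient_page_state", False):
        page.log.append("saved state")


register("init", restore_transient_state)
register("pre_render", save_transient_state)


class CheckoutPage(AbilityAwarePage):
    transient_page_state = True       # declared, not implemented, by the page


class AboutPage(AbilityAwarePage):
    pass


checkout, about = CheckoutPage(), AboutPage()
checkout.run()
about.run()
```

New abilities (form binding, say) would be added by registering more listeners, without touching the page class hierarchy.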
Thursday, July 25, 2002
ASP.NET State Management With Style
The problem with this code is that it requires you to derive all your classes from a common Page class. It'd be nicer if you could just stick an HTTP Module into the request pipeline and implement a marker interface (IStateManagedPage). I think I might make the necessary changes. There'll be problems with private variables. #
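The module-plus-marker-interface design can be sketched as a Python analogue (not ASP.NET code). Assumptions: a StateModule object stands in for the HTTP module, it runs before and after each handler, and it only touches handlers that inherit the empty IStateManagedPage marker class. It captures only attributes whose names do not start with an underscore, which sidesteps rather than solves the private-variable problem mentioned in the entry.

```python
# Hypothetical Python analogue of an HTTP module that only touches handlers
# carrying a marker interface. StateModule, process_request and the page
# classes are illustrative assumptions.
from typing import Any, Dict


class IStateManagedPage:
    """Marker class: it declares no members; inheriting it just opts a page in."""


class StateModule:
    """Stands in for an HTTP module sitting in the request pipeline."""

    def __init__(self) -> None:
        self.store: Dict[str, Dict[str, Any]] = {}

    def before_handler(self, handler: Any, session_key: str) -> None:
        if isinstance(handler, IStateManagedPage):
            for name, value in self.store.get(session_key, {}).items():
                setattr(handler, name, value)

    def after_handler(self, handler: Any, session_key: str) -> None:
        if isinstance(handler, IStateManagedPage):
            # Attributes starting with "_" are skipped in this sketch.
            self.store[session_key] = {
                name: value for name, value in vars(handler).items()
                if not name.startswith("_")
            }


def process_request(module: StateModule, handler: Any, session_key: str) -> None:
    module.before_handler(handler, session_key)
    handler.process()
    module.after_handler(handler, session_key)


class CartPage(IStateManagedPage):
    def __init__(self) -> None:
        self.visits = 0

    def process(self) -> None:
        self.visits += 1


class PlainPage:
    def __init__(self) -> None:
        self.visits = 0

    def process(self) -> None:
        self.visits += 1


module = StateModule()
for _ in range(3):
    cart = CartPage()                 # a new page object per request
    process_request(module, cart, "session-1")
plain = PlainPage()
process_request(module, plain, "session-1")
```

The page's state survives across requests even though each request gets a fresh page object, and PlainPage, which lacks the marker, is left alone.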
Take Advantage of Streams and Formatters [via drew]
When you first start learning VB.NET, one of the first things you may notice is the absence of "traditional" file I/O support in .NET. Microsoft has replaced the classic IO operations by stream operations. A stream is a simple concept that originated in the Unix world.#
dotMSN: .NET Messenger Library [via gentile]
dotMSN is a class library to make use of the MSN Messenger Service. The library is built in C# and can therefore be used by all languages the .NET environment supports. The library is easy to use because of a clean object oriented approach which is offered by a modern language like C#. You can use this library for example to create MSN bots or for use in already existing applications who need to communicate through the Messenger service.#
SOAP is like...
Or as a friend of mine likes to say "SOAP is like Oakland, there's no there, there".
REST and SQL
Very interesting. I still think the primary value proposition of REST is that it provides a common interface, architecture and methodology for interacting with complex systems, just as SQL does for RDBMSs. This is a huge concern; the problem with message-oriented architectures is that everybody is allowed to develop their own application-level protocol (portTypes), message formats and what not, so complexity is, in effect, unbounded. From system to system there are very few common denominators, which reduces interop potential. But within all REST systems the semantics of the four basic verbs are well defined and quickly understood. This means you can look at a REST system, examine the resources it exposes, and, for the most part, just 'get it'. With SOAP systems you have to deal with what is in effect a new protocol for every application: the classic protocol (semantics) within a protocol (SOAP formatting) within a protocol (HTTP transport) that many apps suffer from.
This idea of genericity and "few verbs", as Prescod says, is very powerful, methinks. #
Web Services Security - HTTP Digest Authentication without Active Directory
In this article, I will present an interoperable implementation of Digest authentication, also implemented in .NET managed code, without the use of the built-in IIS implementation and Active Directory. As with the Basic sample, this code will run even on a shared server. Note that both samples can run simultaneously on the same server, and your site can then support both Basic and Digest.#
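The core of Digest authentication that such an implementation has to reproduce is the request-digest computation from RFC 2617. Here is a minimal Python sketch for qop="auth" with the MD5 algorithm. It covers only the hash arithmetic; nonce generation, nonce-count tracking, stale-nonce handling and the password (or H(A1)) store are left out, and a real server would need all of them.

```python
# Minimal sketch of the RFC 2617 request-digest (qop="auth", algorithm=MD5).
# Only the hash arithmetic is shown; nonce issuing/expiry, nonce-count checks
# and the credential store are out of scope.
import hashlib
import hmac
from typing import Dict


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str, method: str,
                    uri: str, nonce: str, nc: str, cnonce: str,
                    qop: str = "auth") -> str:
    """Compute the request-digest a client puts in the 'response' field."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # H(A1), A1 = user:realm:password
    ha2 = md5_hex(f"{method}:{uri}")                   # H(A2), A2 = method:digest-uri
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


def verify(fields: Dict[str, str], method: str, stored_password: str) -> bool:
    """Server side: recompute the digest from the stored password and compare."""
    expected = digest_response(fields["username"], fields["realm"], stored_password,
                               method, fields["uri"], fields["nonce"],
                               fields["nc"], fields["cnonce"], fields.get("qop", "auth"))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, fields["response"])
```

With the sample values from RFC 2617, section 3.5 (user "Mufasa", realm "testrealm@host.com", password "Circle Of Life", GET /dir/index.html), digest_response should reproduce the response value printed in the RFC. Because only H(A1) is needed, a server can store that hash instead of the plain password, which is part of what makes Digest workable without a directory service.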
Sam Gentile's Blogroll
Sam has removed his blogroll. I'd like to take a moment to remember what was certainly one of the finest blogrolls around. Every day I used to wake up and waste a good half-hour going down that extensive blogroll. In fact, the reason I started this weblog was that I wanted to add these weblogs to Sam's extensive blogroll, but I thought it might be a bit rude to ask a stranger to include blogs he didn't read on his blogroll for my convenience.
Still, that was a damn fine blogroll. Here's to Sam's blogroll. #
Wednesday, July 24, 2002
Very Long Transactions
The transaction model itself is too biased towards a procedural worldview. Transactions in the business and database sense (as if there's much of a difference) simply might not scale to internet-class applications (where guarantees of any sort can rarely be made). What are the alternatives? #
Windley's Comments on Staying Sane
That's exactly what a schema does. As mentioned in the previous post, a schema locates an XML document (a set of tags) within a well defined context in a specific application model. You might say that a Schema provides 'strong typing' (in accordance with the notion of strongly typed programming models) to XML and that declaration of a schema is equivalent to a type declaration. Once an XML document has been typed then it is up to the application model to define what operations may be carried out upon the type and the semantics of those operations.
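Treating a namespace declaration as a type declaration can be sketched in Python: the root element's namespace selects an application-level type, and that type decides which operations are available on the document. The namespaces, element names and operations below are hypothetical; the "7" that is a flight number in one context and a movie title in another echoes the earlier point about implicit typing.

```python
# Hypothetical sketch: the root element's namespace acts as the type
# declaration, and the application model maps each type to its operations.
# Namespaces, element names and operations are illustrative assumptions.
import xml.etree.ElementTree as ET
from typing import Callable, Dict

Operations = Dict[str, Callable[[ET.Element], str]]

TYPES: Dict[str, Operations] = {
    "urn:example:flights": {
        "describe": lambda e: f"flight {e.findtext('{urn:example:flights}number')}",
        "book": lambda e: f"booked {e.findtext('{urn:example:flights}number')}",
    },
    "urn:example:movies": {
        "describe": lambda e: f"movie {e.findtext('{urn:example:movies}title')}",
    },
}


def namespace_of(elem: ET.Element) -> str:
    # ElementTree spells qualified names as '{namespace}local'.
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


def operations_for(xml_text: str) -> Dict[str, Callable[[], str]]:
    """Return the operations the document's type allows, bound to the document."""
    root = ET.fromstring(xml_text)
    ops = TYPES.get(namespace_of(root), {})
    return {name: (lambda f=f: f(root)) for name, f in ops.items()}


flight = '<f:flight xmlns:f="urn:example:flights"><f:number>UA 7</f:number></f:flight>'
movie = '<m:movie xmlns:m="urn:example:movies"><m:title>7</m:title></m:movie>'
print(sorted(operations_for(flight)))        # ['book', 'describe']
print(operations_for(movie)["describe"]())   # movie 7
```

An un-namespaced document gets no operations at all, which is one way to read the worry in the next paragraph: once the namespace carries the type, the XML is acting as the serialization of an object.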
This is a somewhat disturbing thought. XML is supposed to promote interop by simply structuring data. Services built upon XML achieve maximum interop by passing data around. But when you begin implementing things like Schemas (and possibly Namespaces) you've crossed a line; the data being passed back and forth is no longer XML, it is simply a serialization format for a strongly typed object. There's a subtle but important difference.
This may lead to the inevitable conclusion that the Object-Oriented paradigm, in which data and behavior are encapsulated within the same entity, is inherently superior to the XML (data-oriented) paradigm. If this is so, what are the implications for REST vs SOAP? Does this validate SOAP as an interop mechanism? And what will be the interop implications of the widespread use of tools such as VS.NET's xsd.exe which can consume a Schema and spit out a Strongly Typed Class?
More from Vasters on XML Web Services [Schema Inheritance]
A description of a "customer" complex type (I stick with this most-abused example for clarity) expressed in XML Schema is context-free and without any associated semantics until I say targetNamespace="urn:schemas-newtelligence-com:banking:collections:client:2002-07-24". With that I am binding what used to be a purely technical description to well known and well defined business semantics and gain the freedom to adjust that distinct understanding of customer to changing business needs. So, at least in my world, XSD does carry semantics.
Very interesting point. I've always been against XML schemas (particularly the typing system), but this is a clear advantage. A schema does connote semantics, and declaring a schema does transform an XML document from a meaningless infoset into a well-defined business object. Declaring a schema is equivalent to saying 'this XML document corresponds to business object X, so you can do Y and Z with it.' At the same time, you can do the same thing with the plain old XML Namespaces Recommendation.
Further, it seems that if we're going to use Schemas and Namespaces to connote semantics we should aim for maximum flexibility and allow for the same document to exist in multiple Namespaces or Schemas. This suggests that neither Schemas nor Namespaces are the optimal solution to the problem of embedding semantics into an XML document or placing an XML document within a well defined context (in which the tags have well defined meanings).
The problem might be solved by exposing Schema and Namespace inheritance in the XML document itself. It'd be nice if it were possible to explicitly declare in the document itself that 'My Schema derives from that Schema. Any document that conforms to My Schema probably conforms to that Schema'. This would probably solve 90% of the problems of moving XML documents between well-defined contexts, encourage extensibility and OO design, and it would certainly be easier to use Schema/Namespace inheritance than to write an XSLT document for every imaginable context transition.
More thoughts. Such an inheritance mechanism would certainly go a long way towards reaping the power of OOAD and unifying the XML and OO paradigms. It'd be very nice if I could receive a document conforming to the schema urn:schemas-newtelligence-com:banking:collections:client:2002-07-24 and 'upcast' it to a document conforming to the schema urn:schemas-fdic-com:banking:collections:client. Further, any document which conforms to a schema deriving from urn:schemas-fdic-com:banking:collections:client should be able to be treated like such a document. That is, XML documents, like the business objects they correspond to, should be able to be treated polymorphically. This is because the notion of Object Inheritance (in which a subclass can do everything a base class can do, plus some extra stuff) is logically equivalent to the notion of Schema Inheritance (in which an XML document subclass has every element of the base class, plus some extra stuff).
Schemas already have a built-in mechanism for inheritance (type derivation through xs:extension and xs:restriction, with xs:import bringing in types from other namespaces); the question is really how to expose this inheritance metadata to the parser/client. Exposing such metadata might be very difficult (particularly for large, complex hierarchies), and this might make the whole notion DOA, but I certainly think that if you accept that Schemas/Namespaces transfer semantics, then Schema/Namespace inheritance is a much easier and nicer way to allow for context transitions than writing an XSLT document. #
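The "upcasting" idea can be sketched in Python, assuming the derivation relationship is available to the consumer as a simple table. The table below, saying the newtelligence client schema derives from the FDIC one, is a hypothetical relationship taken from the example above, not something either party publishes; conforms_to and handle_client_document are illustrative names.

```python
# Hypothetical sketch of schema "upcasting": a consumer written for a base
# schema accepts any document whose schema derives from it, directly or
# transitively. The derivation table is assumed purely for illustration.
from typing import Dict, Optional

NEWTELLIGENCE_CLIENT = "urn:schemas-newtelligence-com:banking:collections:client:2002-07-24"
FDIC_CLIENT = "urn:schemas-fdic-com:banking:collections:client"

# Hypothetical inheritance table: derived schema URI -> parent schema URI.
DERIVES_FROM: Dict[str, Optional[str]] = {
    NEWTELLIGENCE_CLIENT: FDIC_CLIENT,
    FDIC_CLIENT: None,
}


def conforms_to(schema: str, base: str) -> bool:
    """True if `schema` is `base` or derives from it, directly or transitively."""
    current: Optional[str] = schema
    seen = set()
    while current is not None and current not in seen:   # `seen` guards against cycles
        if current == base:
            return True
        seen.add(current)
        current = DERIVES_FROM.get(current)
    return False


def handle_client_document(schema: str) -> str:
    """A consumer written once against the base schema."""
    if not conforms_to(schema, FDIC_CLIENT):
        raise TypeError(f"{schema} does not derive from {FDIC_CLIENT}")
    return f"handled as {FDIC_CLIENT}"
```

As with class inheritance, the check only goes one way: a derived document can be treated as a base document, but not the reverse.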
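XSD already records derivation in the schema document: a complex type defined with xs:complexContent and xs:extension (or xs:restriction) names its base type. Here is a minimal Python sketch of reading that metadata with ElementTree, using a small invented schema. It returns the base as the raw QName string from the attribute; resolving the prefix to a namespace URI would need the schema's namespace declarations, which this sketch does not track.

```python
# Minimal sketch: read complex-type derivation metadata from an XSD document.
# The schema text and type names are invented for illustration; base QNames
# are returned unresolved (e.g. 'tns:Customer').
import xml.etree.ElementTree as ET
from typing import Dict

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="urn:example:banking" targetNamespace="urn:example:banking">
  <xs:complexType name="Customer">
    <xs:sequence><xs:element name="name" type="xs:string"/></xs:sequence>
  </xs:complexType>
  <xs:complexType name="PreferredCustomer">
    <xs:complexContent>
      <xs:extension base="tns:Customer">
        <xs:sequence><xs:element name="discount" type="xs:decimal"/></xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""


def base_types(schema_text: str) -> Dict[str, str]:
    """Map each derived complex type name to the QName of its base type."""
    root = ET.fromstring(schema_text)
    bases = {}
    for ctype in root.iter(f"{XS}complexType"):
        for kind in ("extension", "restriction"):
            node = ctype.find(f"{XS}complexContent/{XS}{kind}")
            if node is not None:
                bases[ctype.get("name")] = node.get("base")
    return bases


print(base_types(SCHEMA))   # {'PreferredCustomer': 'tns:Customer'}
```

This shows the metadata exists and is machine-readable; the open question in the text is getting it in front of the consumer of an instance document, which only names its own namespace.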
The Daily Chump Bot is an IRC bot which allows you to create a collaborative weblog from IRC chat.#
Staying Sane in an XML Web Services World
For the most part, this seems right on target to me.
Second, I think Clemens' rule #3 doesn't necessarily encourage people to use DataSets; I think it just extends his rules to the cases where someone does use a DataSet. However, I certainly agree - DataSets are evil in public, interoperable web services.
DataSets aren't evil just in public web services; they suck in any web service. AFAIK, there is no standard DataSet API, so the DataSet schema/serialization protocol is a secret, proprietary Microsoft format that may change at any time. Further, sending a miniature copy of the database (and this is especially true of typed DataSets, which exist for a reason I cannot fathom) probably breaks several rules of n-tier development. Your clients probably shouldn't be aware of an implementation detail as low-level and as likely to change as your database schema. If you're going to pass DataSets around, you might as well just stick with the binary serialization scheme; it's faster.
I agree with this except for the case where you're adding a method. Without thinking about this too awfully much, I think adding a new method would not necessarily invalidate the contract, or change the semantics of the existing methods.
As I see it, a WSDL document is just like any other XML document. Adding elements to an XML document is always allowed (provided, of course, you don't break the schema). #
Tuesday, July 23, 2002
Introduction to Native XML Databases
Introduction to dbXML
Introduction to XML:DB API
Getting Started With Cocoon 2
The True Meaning of Service
Part of the debate between services and semantics is a replay of the debate about what makes the Web an interesting place: commerce or content? In the conventional wisdom, services represent the commerce part of the Web, while semantics represent its content.
Introduction to DAML: Part II
DAML Reference
RDF Primer Primer #
Getting the Most Out of Your WSDL
Outlines headers and structured data updates to the Favorites Service sample Web Service to take better advantage of SOAP, and XML Schema added for displaying favorites and report information to the Web Service definitions.#
.NET Architecture Center
The .NET Architecture Center is a new site devoted to business, software, and infrastructure architects.#
Fielding on fragment identifiers
The TAG gave me an action item to describe some of the design history and rationale for fragment identifiers. This is my attempt to write it down in a "few" paragraphs.#
Monday, July 22, 2002
Action Comics no. 1
Wow. How cool is this? I think this "Internet" thing may be a good thing after all. #
Soap Extension Walkthrough
RPC and Document SOAP from one .NET Web Service #
Unify the Role-Based Security Models for Enterprise and Application Domains with .NET
Role-based security allows administrators to assign access permissions to users based on the roles they play rather than on their individual identities. These privileges can be used to control access to objects and methods, and are easier to identify and maintain than user-based security. The .NET Framework provides two role-based security models, which are exposed as two namespaces: System.EnterpriseServices and System.Security.Permissions.#
J2EE 1.4 Specification First Public Draft Available for Review
The first public draft of the J2EE 1.4 specification is available for review. This new release of the platform is primarily aimed at Web Services integration. It defines the component model, deployment, packaging, and container requirements for J2EE components to be exposed as or to use Web Services. It also includes EJB 2.1, JMS 1.1, Servlets 2.4, and JSP 1.3.#
TheServerSide Interviews Microsoft's Doug Purdy on .NET and J2EE
TSS presents a new Hard Core Tech Talk interview with Doug Purdy, a Program Manager with Microsoft's XML Web Services Team. The intention of the interview was to learn about .NET from the perspective of a J2EE architect; what followed was a fascinating comparison of both features and application design strategies of .NET and J2EE.
A side note: I love "attributal programming"--though I call it attribute-based programming which makes it sound less dangerous. #
Email Interface Design 101 [via rebelutionary]
And, what have I been coding all day? A natural language parser for treating emails sent to a common address like todo@yourdomain.com as task list items -- but inferring properties like Priority, Status, Project, Categories, Who to Assign them to, etc. Serendipity in action. It's all database-driven, a pretty cool piece of code that knows that "Paolo" = www.evectors.com = IdeaTools and that tasks from Paolo have a higher priority than tasks from other sources, etc. More details as it gets more features and such.#
Using Amazon's Web Services #
Design By Contract Framework [via gentile] #
Sharing Types [via gentile]
This column addresses a common problem with Microsoft® Visual Studio® .NET Web service development: sharing data types across Web services. This issue arises when a developer creates a set of Web services with what appears to be well-thought-out portTypes and data types. Then, things quickly go awry when creating a client for that Web service.#
FOAF [via dj] #
Sunday, July 21, 2002
SOAP Web Method Specification #
REST + SOAP
Interesting. Unfortunately, his advice to never use PUT or DELETE represents a flawed understanding of REST. #
HTTP Handlers and HTTP Modules in ASP.NET
HttpHandlers and HttpModules