The madness of Samba setup on Ubuntu

Yeah, Ubuntu has thrown a few medium-sized annoyances at me recently. Tonight I wanted to set up Samba for sharing files with our local Windows machines. I've set up Samba dozens of times, but this is one of the more confusing episodes I've come across recently.

First of all I looked and saw that I had a package named "samba-common", and yet this did not mean I actually had Samba. A little confusing, but I just had to fire up Synaptic and search for "samba". So I installed the package and figured I'd have a serviceable default config. No dice. Rather than hack at /etc/smb.conf I decided to use swat, the Web control panel for Samba, which I've used and like. I installed it and tried to surf http://localhost:901/, the usual way to access swat. No response. I read the swat man page, which said it runs through inetd. The man page specifically mentions editing /etc/inetd.conf. I looked there and found but two lines, one of which was:

#<off>#swat            stream  tcp     nowait.400      root    /usr/sbin/tcpd  /usr/sbin/swat

How quaint! Old school inetd. How odd! Only 2 managed services. Oh well. I uncommented the line and then tried to restart it. Umm, no trace of inetd besides that config file.

At this point I did some googling. I just wanted a simple Ubuntu-specific HOWTO. I wasn't really in much of a mood for pulling out all the hacking stops. Google had almost nothing to offer except others bitching about how hard Samba is to set up on Ubuntu. I went back to work. I checked Synaptic and it turned out I had no inetd package installed. I selected xinetd, which is what I'd been expecting in a modern distro, anyway.

Now I had to configure swat to run behind it, but I was on my own. This is why a package such as swat should depend on a package such as xinetd, so that it can provide a plug-and-play config. I'm pretty sure it does on Fedora Core. And certainly there shouldn't be phantom inetd.conf files floating around to trip up users.

I created /etc/xinetd.d/swat, with these contents:

# description: SAMBA SWAT
service swat
{
    disable         = no
    socket_type     = stream
    protocol        = tcp
    # should use a more limited user here
    user            = root
    wait            = no
    server          = /usr/sbin/swat
}

And then I ran dpkg-reconfigure xinetd to restart xinetd (and do any other magic needed after updating config). Finally I got swat up. If you want to use swat to administer Samba, you'll need a proper root password. No sudo to bail you out. If you log in as your user, you get a crippled, peon swat console for mere mortal users. On one machine I got my Ubuntu root by booting to single user mode and then setting the password. On this machine I used Automatix. Either way, use that root login to get the full plate of admin options. From that point it's fairly smooth sailing. swat isn't perfect, but it's as close as you get to simple Samba administration.

I hope this helps someone else. Maybe it's my imagination, but this seems to be much harder than I remember it ever being on Fedora Core or Mandrake. Oddly enough, I'm getting the impression that Ubuntu rocks for user desktop stuff, but gets really clumsy when it comes to the server management goods that should be bread and butter for Linux.

[Uche Ogbuji]

via Copia

Merging Atom 1.0 feeds with Python

mergeatom.py

At the heart of Planet Atom is the mergeatom module. I've updated mergeatom a lot since I first released it. It's still a simple Python utility for merging multiple Atom 1.0 feeds into an aggregated feed. Some of the features:

  • Reads in a list of Atom URLs, files or content strings to be merged into a given target document
  • Puts out a complete, merged Atom document (duplicates by atom:id are suppressed).
  • Collates the entries according to date, allowing you to limit the total. WARNING: Entries from the original Atom feed may be deleted according to ID duplicate removal or entry count limits.
  • Allows you to set the sort order of resulting entries
  • Uses atom:source elements, according to the spec, to retain key metadata from the originating feeds
  • Normalizes XML namespace prefixes for output Atom elements (uses atom:*)
  • Allows you to limit contained entries to a date range
  • Handles base URI fixup intelligently (base URIs on feed elements are migrated on to copied entries so that contained relative links remain correct)

It requires atomixlib 0.3.0 or more recent, and Amara 1.1.6 or more recent.
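To give a flavor of the core idea in the feature list above, here is a rough sketch of merging, de-duplicating by atom:id, collating by date and trimming. It is not mergeatom's code or API: it uses only the standard library's ElementTree rather than Amara and atomixlib, the names merge_feeds and q are made up for illustration, and it assumes every atom:updated value uses the same UTC "Z" form so that plain string comparison orders them correctly. It also skips base URI fixup and date-range limits entirely.

```python
import copy
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements with the atom: prefix, whatever prefix the inputs used.
ET.register_namespace("atom", ATOM_NS)


def q(name):
    """Return the qualified ElementTree tag name for an Atom element."""
    return "{%s}%s" % (ATOM_NS, name)


def merge_feeds(feed_texts, feed_id, title, limit=None, newest_first=True):
    """Hypothetical helper: merge Atom 1.0 documents given as strings."""
    seen_ids = set()
    entries = []
    for text in feed_texts:
        feed = ET.fromstring(text)
        for entry in feed.findall(q("entry")):
            entry_id = entry.findtext(q("id"))
            if entry_id in seen_ids:
                continue  # duplicate by atom:id; the first one seen wins
            seen_ids.add(entry_id)
            if entry.find(q("source")) is None:
                # Keep key metadata from the originating feed in atom:source.
                source = ET.SubElement(entry, q("source"))
                for name in ("id", "title", "updated"):
                    original = feed.find(q(name))
                    if original is not None:
                        source.append(copy.deepcopy(original))
            entries.append(entry)

    def updated(entry):
        # Assumption: timestamps share one UTC "Z" format, so strings sort by date.
        return entry.findtext(q("updated"), default="")

    entries.sort(key=updated, reverse=newest_first)
    if limit is not None:
        entries = entries[:limit]  # entries past the limit are dropped

    merged = ET.Element(q("feed"))
    ET.SubElement(merged, q("id")).text = feed_id
    ET.SubElement(merged, q("title")).text = title
    latest = max((updated(e) for e in entries), default="")
    if latest:
        ET.SubElement(merged, q("updated")).text = latest
    merged.extend(entries)
    return merged
```

Calling ET.tostring(merge_feeds([feed_a, feed_b], "urn:example:merged", "Merged", limit=20), encoding="unicode") would then give the aggregate document as a string, where feed_a and feed_b stand for Atom documents you have already read in.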

[Uche Ogbuji]

via Copia

CherryPy QOTW

OK there are many fun zingers from the latest CherryPy/WSGI/Paste tiff, but this one resonated soundly with me:

In times BC (Before CherryPy ;) I would simply write stuff in PHP because it was easier than fitting my square mind in some Python framework's tetrahegon shaped hole.

—Christian Dowski, Captain Buffet

Such an L7. :-) And I don't know why that puts in my head the nonce construction "techno-hadron".

[Uche Ogbuji]

via Copia

Planet Atom

Planet Atom is now live.

Planet Atom focuses Atom streams from authors with an affinity for syndication and Atom-specific issues. This site was developed by Sylvain Hellegouarch, Uche Ogbuji, and John L. Clark. Please visit the Planet Atom development site if you are interested in the source code. The complete list of sources is maintained in XBEL format (with some experimental extensions); please contact one of the site developers if you want to suggest a modification to this list.

John, Sylvain and I have been working at this on and off for over a month now (we've all been swamped with other things; the actual development of the site was fairly straightforward). Planet Atom is built on an aggregation from Atom 1.0 feeds into one larger feed (with entries collated, trimmed etc.). It's built on 4Suite (for XSLT processing), CherryPy (for Web serving), Amara (for Atom feed slicing and dicing), atomixlib (for building the aggregate feed) and dateutil (for date wrangling), with Python and XML as the twin foundations, of course. Thanks to folks on the #atom and #swhack IRC channels for review and feedback.

[Uche Ogbuji]

via Copia

ElementTree in Python stdlib

The big news in Python/XML in December was the checking in of ElementTree into Python's standard library, which means the package will be part of Python 2.5. The python-dev summary has a good account of the move. I'm surprised it took so long. Even though I've been involved in at least one ElementTree versus 4Suite battle (I also wrote the earliest and possibly only comprehensive treatment of ElementTree), I've always appreciated that ElementTree's relative weightlessness makes it a decent option for Pythoneers who really don't care much about all the intricacies of XML and just have to reluctantly deal with a mess of angle brackets. That's the sort of library that ends up in the stdlib, and for good reason. And for sure, developers have needed a stdlib alternative to SAX and DOM for ages.

A very useful side effect of this move is a step towards merging PyXML into Python SVN, and eliminating the _xmlplus hack that was used to have PyXML installs shadow the stdlib originals. XML-SIG has argued about this for years, but there was never a stimulus for anyone to actually get something done about it. To quote from my article "Gems from the Mines: 2002 to 2003":

[In early 2003] Martijn Faassen kicked off a long discussion on the future of PyXML by complaining about the "_xmlplus hack" that PyXML uses to serve as a drop-in replacement for the Python built-in XML libraries. After he reiterated the complaint the discussion turned to a very serious one that underscored the odd in-between status of PyXML, and in what ways the PyXML package continued to be relevant as a stand-alone. Most of these issues are still not resolved, so this thread is an important airing of considerations that affect many Python/XML users to this day.

Congrats to Fredrik on this culmination of his hard work. And here's to the continued diversity of developer options in the very complex field of XML processing.

[Uche Ogbuji]

via Copia

Updates to Python style guidelines (PEP 8)

There has been a lot of discussion about updates to PEP 8. The contributing discussion is summarized in this python-dev summary, although the link to the work-in-progress summary is broken. I use the Subversion snapshot for the PEP instead. A lot of the discussion makes sense to me, although I personally am in the "deprecate leading double underscore as private" camp, despite Tim Peters's sensible arguments. I think the Timbot's arguments point instead to a need for a clearer declarative mechanism for private variables. I'll be burned at the stake for making a suggestion that could whiff of C++ or Java envy, but I'll counter that even if the language has such a mechanism, I'll likely never use it.
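For anyone who hasn't bumped into the behavior being argued over, here is a minimal sketch of what a leading double underscore actually does. The class and attribute names are invented for illustration. The interpreter mangles the name with the class name, which avoids accidental clashes in subclasses but does not make anything truly private.

```python
class Account:
    def __init__(self, balance):
        # Single underscore: private by convention only, no interpreter support.
        self._balance = balance
        # Double underscore: stored under the mangled name _Account__pin.
        self.__pin = "1234"


class Savings(Account):
    def __init__(self, balance):
        super().__init__(balance)
        # Stored as _Savings__pin, so it does not clobber the parent's attribute.
        self.__pin = "9999"


acct = Savings(100)
# The unmangled name does not exist as an attribute from outside the class...
assert not hasattr(acct, "__pin")
# ...but anyone who knows the mangling rule can still reach both values.
assert acct._Account__pin == "1234"
assert acct._Savings__pin == "9999"
```

So the mechanism is really about name clash avoidance, which is part of why it makes for such an awkward "private" marker.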

[Uche Ogbuji]

via Copia

Expat 2.0 (featured in 4Suite CVS)

Expat 2.0 has made it to the world, after a long incubation. Expat is, of course, the very popular XML parser in C originally developed by James Clark. The first I learned of this development was an announcement by Jeremy Kloth. His announcement also mentioned that current 4Suite CVS includes Expat 2.0, and that it's probably the first outside project to do so. In fact, the most recent 4Suite beta release included an Expat CVS snapshot that was for all practical purposes 2.0.

This is a very important milestone as it will allow Expat development to move on to more innovative pastures. And of course it adds the essential—support for AmigaOS (I'm going to get hate mail from my Amiga booster friends from college).

[Uche Ogbuji]

via Copia

Inventing XML for music

I got an interesting message in response to "Learn how to invent XML languages, then do so". Michael Good of Recordare LLC wrote:

I enjoyed your response to Tim Bray's piece on inventing XML languages.

I hope that when listing important XML vocabularies in the future, you will consider including the MusicXML language for music notation:

For 20 years people had tried to invent a better format than MIDI for exchanging music notation between applications. MIDI was not designed for this purpose, even though it was used this way. It could do the bare basics but not much more, and was really inadequate to the task. The two major attempts, NIFF and SMDL, failed in attracting any significant industry support (in SMDL's case, any industry support whatsoever). MusicXML is the first such language to succeed.

MusicXML is supported by over 50 applications, including the market leaders in music notation editors (Finale and Sibelius) and all the major players in music scanning. It has been adopted by commercial and open source projects; by industry developers, hobbyists, and academic researchers; by established products and innovative new applications. Consumers can finally exchange digital sheet music files between applications, and the barriers to entry for innovative new applications in the market has been dramatically lowered (e.g. see the entry of MuseBook, OrganMuse, Notion, and musicRAIN into the market).

I'm sure there are other examples where XML has successfully (not just potentially) broken down barriers between document interchange in specialized fields. But for now, MusicXML is the most dramatic success story I know. I do want to better understand how MusicXML measures up to XML vocabularies in other industries in terms of adoption rate. If you have pointers to other work in this area, anything you could send on would be most appreciated.

This just underscores my first reaction whenever I hear someone discouraging people from inventing XML formats: how can that be the product of anything but the narrowest world view? XML's strength is in providing a syntactic framework that can be used across innumerable domains. There is no reason why a musical XML format should not be as important as, say, DocBook. I don't know anything about XML formats in the music field, but I'm certainly happy to see that there is room in the XML universe for MusicXML as well as Country Dance (Folk Dancing) animation language, to grab another example plucked from XML.com.

I was curious, so I browsed the landscape a bit for Music and XML. There were some interesting nuggets in the usual sea of noise in this Slashdot article on MusicXML, including this comment:

[Don't forget] the archival value of MusicXML -- [people] criticize it for "re-inventing the wheel," but they're only looking at the value for music composers and consumers.

The true value of MusicXML is as a universally understood format for describing musical scores digitally. The music libraries of the future aren't going to be made of paper, don't you think?

This speaks very sensibly to the overall value proposition of XML. The universal syntax allows data formats to evolve that enhance the longevity of stored data. Longevity that comes from transparency. (OK, so you have to have long-lived storage media as well, but that's a different topic). Such longevity is further enhanced by (once again) the closeness of the expression to the domain model.

There are other XML specifications in the area:

Actually, just go straight to the indefatigable Robin Cover on the topic.

[Uche Ogbuji]

via Copia

Learn how to invent XML languages, then do so

There has been a lot of chatter about Tim Bray's piece "Don’t Invent XML Languages". Good. I'm all for anything that makes people think carefully about XML language design and problems of semantic transparency (communicated meaning of XML structure). I'm all for it even though I generally disagree with Tim's conclusions. Here are some quick thoughts on Tim's essay, and some of the responses I've seen.

Here’s a radical idea: don’t even think of making your own language until you’re sure that you can’t do the job using one of the Big Five: XHTML, DocBook, ODF, UBL, and Atom.—Bray

This is a pretty biased list, and happens to make sense for the circles in which he moves. Even though I happen to move in much the same circles, the first thing I'd say is that there could hardly ever be an authoritative "big 5" list of XML vocabs. There is too much debate and diversity, and that's too good a thing to sweep under the rug. MS Office XML or ODF? OAGIS or UBL? RSS 2.0 or Atom? Sure I happen to plump for the latter three, as Tim does, but things are not so clear cut for the average punter. (I didn't mention TEI or DocBook because it's much less of a head-to-head battle.)

I made my own list in "A survey of XML standards: Part 3—The most important vocabularies" (IBM developerWorks, 2004). It goes:

  • XHTML
  • DocBook
  • XSL-FO
  • SVG
  • VoiceXML
  • MathML
  • SMIL
  • RDF
  • XML Topic Maps

And in that article I admit I'm "just scratching the surface". The list predates first full releases of Atom and ODF, or they would have been on it. I should also mention XBEL, which is, I think, not as widely trumpeted, but just about as important as those other entrants. BTW, see the full cross-reference of my survey of XML standards.

Designing XML Languages is hard. It’s boring, political, time-consuming, unglamorous, irritating work. It always takes longer than you think it will, and when you’re finished, there’s always this feeling that you could have done more or should have done less or got some detail essentially wrong.—Bray

This is true. It's easy to be flip and say "sure, that's true of programming, but we're not being advised to write no more programs". But then I think this difficulty is even more true of XML design than of programming, and it's worth reminding people that a useful XML vocabulary is not something you toss off in the spare hour. Simon St.Laurent has always been a sound analyst of the harm done by programmers who take shortcuts and abuse markup in order to suit their conventions. The lesson, however, should be to learn best practices of markup design rather than to become a helpless spectator.

If you’re going to design a new language, you’re committing to a major investment in software development. First, you’ll need a validator. Don’t kid yourself that writing a schema will do the trick; any nontrivial language will have a whole lot of constraints that you can’t check in a schema language, and any nontrivial language needs an automated validator if you’re going to get software to interoperate.

If people would just use decent schema technology, this point would be very much weakened. Schema designers rarely see beyond plain W3C XML Schema or RELAX NG. Too bad. RELAX NG plus Schematron (with XPath 1.0/XSLT 1.0 drivers) covers a huge number of constraints. Add in EXSLT 1.0 drivers for Schematron and you can cover probably 95+% of Atom's constraints (probably more, actually). Throw in user-defined extensions and you have a very powerful and mostly declarative validation engine. We should do a better job of rendering such goodness to XML developers, rather than scaring them away with duct-tape-validator bogeymen.
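To make the kind of constraint concrete, here is a rough sketch of two Atom rules from RFC 4287 that are awkward to state in a grammar-based schema: a feed needs an atom:author unless every entry has one, and an entry with no atom:content needs an alternate link. Real Schematron would express these declaratively as XPath assertions; this is only a stand-in that mimics the rule/assert shape in plain Python with the standard library's ElementTree. The names RULES and validate, and the message wording, are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def feed_has_authors(feed):
    """Feed carries an author, or every one of its entries does."""
    if feed.find(ATOM + "author") is not None:
        return True
    # Note: a feed with no entries at all passes this check vacuously.
    return all(e.find(ATOM + "author") is not None
               for e in feed.findall(ATOM + "entry"))


def entry_has_content_or_alternate(entry):
    """Entry has atom:content, or at least one alternate link."""
    if entry.find(ATOM + "content") is not None:
        return True
    # A link with no rel attribute is treated as rel="alternate".
    return any(link.get("rel", "alternate") == "alternate"
               for link in entry.findall(ATOM + "link"))


# Each rule: (context element name, assertion, message if the assertion fails)
RULES = [
    (ATOM + "feed", feed_has_authors,
     "feed needs an author unless every entry has its own"),
    (ATOM + "entry", entry_has_content_or_alternate,
     "entry without content needs an alternate link"),
]


def validate(root):
    """Run every rule against every matching context; return failure messages."""
    errors = []
    for context_tag, assertion, message in RULES:
        for node in root.iter(context_tag):
            if not assertion(node):
                errors.append(message)
    return errors
```

Calling validate(ET.fromstring(feed_text)) returns a list of messages, empty when both rules hold. The point is how little machinery the rule table needs, and a proper Schematron schema gets you the same thing without writing the engine yourself.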

Yes, XHTML is semantically weak and doesn’t really grok hierarchy and has a bunch of other problems. That’s OK, because it has a general-purpose class attribute and ignores markup it doesn’t know about and you can bastardize it eight ways from center without anything breaking. The Kool Kids call this “Microformats”...

This understated bit is, I think, the heart of Tim's argument. The problem is that I still haven't been able to figure out why Microformats have any advantage in semantic transparency over new vocabularies. Despite the fuzzy claims of μFormatters, a microformat requires just as much specification as a small, standalone format to be useful. It didn't take me long kicking around XOXO to solve a real-world problem before this became apparent to me.

Some interesting reactions to the piece

Dare Obasanjo. Dare indirectly brought up that Ian Hickson had argued against inventing XML vocabularies in 2003. I remember violently and negatively reacting to the idea that everyone should stick to XHTML and its elite companions. Certainly such limitations make sense for some, but the general case is more nuanced (thank goodness). Side note: another pioneer of the pessimistic side of this argument is Mark Pilgrim (http://www.xml.com/pub/au/164). Needless to say I disagree with many of his points as well.

I've always considered it a gross hack to think that instead of having an HTML web page for my blog and an Atom/RSS feed, instead I should have a single HTML page with <div class="rss:item"> or <h3 class="atom:title"> embedded in it instead. However given that one of the inventors of XML (Tim Bray) is now advocating this approach, I wonder if I'm simply clinging to old ways and have become the kind of intellectual dinosaur I bemoan.—Obasanjo

Dare is, I think, about as stubborn and tart as I am, so I'm amazed to see him doubting his convictions in this way. Please don't, Dare. You're quite correct. Microformats are just a hair away from my pet reductio ad absurdum: <tag type="title"> rather than just <title>. I still haven't heard a decent argument for such periphrasis. And I don't see how the fact that tag is semantically anchored does anything special for the stepchild identifier title in the microformats scenario.

BTW, there is a priceless quote in comments to Dare:

OK, so they're saying: don't create new XML languages - instead, create new HTML languages. Because if you can't get people to [separate presentation from data], hijack the presentation!—"Steve"

Wot he said. With bells on.

Danny Ayers.

I think most XML languages have been created by one of three processes - translating from a legacy format; mapping directly from the domain entities to the syntax; creating an abstract model from the domain, then mapping from that to the XML. The latter two of these are really on a greyscale: a language designer probably has the abstract entities and relationships in mind when creating the format, whether or not they have been expressed formally.—Ayers

I've had my tiffs with RDF gurus lately, but this is the sort of point you can trust an RDF guru to nail, and Danny does so. XML languages are, like all languages, about expression. The farther the expression lies from the abstraction being expressed (the model), the more expensive the maintenance. Punting to an existing format that might have some vague ties to the problem space is a much worse economic bet than the effort of designing a sound and true format for that problem space.

To slightly repurpose another Danny quote towards XML,

...in most cases it’s probably best to initially make up afresh a new representation that matches the domain model as closely as possible(/appropriate). Only then start looking to replacing the new terms with established ones with matching semantics. But don’t see reusing things as more important than getting an (appropriately) accurate model.—Ayers

Ned Batchelder. He correctly identifies that Tim Bray's points tend to be most applicable to document-style XML. I've long since come to the conclusion (again with a lot of influence from Simon St.Laurent) that XML is too often the wrong solution for programmer-data-focused formats (including software configuration formats). Yeah, of course I've already elaborated in the Python context.

[Uche Ogbuji]

via Copia