Today marks a decade since I made my first commit to CPython's repository on Sat, 19 Apr 2003 04:00:56 +0000 (python-checkins, hg.python.org). According to Ohloh, I currently sit as the 16th most prolific committer based on commit count, which I can hardly believe. Boy have times changed over the past decade!
Back in April 2003, we were still on CVS on SourceForge (I somewhat foolishly took on projects to change both of those). Guido gave me my commit privileges himself (now I hand them out which is a bit scary =). It was less than a month after the first PyCon (or at least the first Python conference officially called PyCon; I've managed to attend every single one since and now have my wife asking if she can come) and my election to the Python Software Foundation (whose board of directors I later served on for a time). This was before Python 2.3 was released (and now we are working on Python 3.4). Back then, Python was becoming popular and had an upward trend, heading towards its current position as the top dynamic language out there that isn't embedded in a browser (I would say it didn't become really obvious this was going to happen until about 2005, so I got my wagon hitched at just the right time =).
But this post is not about reminiscing. It's for thanking the people and community who have made contributing to Python so enjoyable that I have actually wanted to do it for a whole decade (and will continue to do so for the foreseeable future).
I want to first thank python-dev. I have always said I truly learned how to program from my fellow core developers. Getting to work on CPython's interpreter core and the stdlib showed me how to manage complexity in APIs, keep my code clean and readable, when to optimize and when to go with the easier to read solution, etc. Pretty much everything that you would want to know when programming in the wild I didn't learn from a class or a book but from my fellow open source programmers. You just can't buy that experience. This is the reason I have always done what I could to make the lives of people who wanted to contribute as easy as possible (sometimes at the expense of other core devs depending on how you fall down on the svn -> hg transition).
I also want to thank the Python community. When I first started contributing I was doing it to gain experience in programming in the real world by contributing to a top-notch codebase with world-class programmers. But as time went on the things I gained in terms of experience dwindled. But what I lost in terms of fulfillment from what I learned was more than made up for in terms of the interactions I had with the community. Meeting people who have benefited from my code and said "thanks" for volunteering my time truly does inspire me to keep contributing, especially when I don't want to backport a bug fix. =)
But through the community I have also been able to gain great friends from across the globe. While I may only get to see my "open source friends" about once a year for a week at PyCon (which is the key reason I look forward to the conference as soon as the last one finished), they are truly friends. They are people I would let crash in my spare bedroom, give them a key, and say "welcome" without hesitation (if any of them were ever so inclined to visit Toronto, let alone Guelph). Those friendships are truly important to me and what will keep me coming back for years on in the future no matter how much or little I am able to contribute in my spare time.
So thanks to everyone reading this. By being a part of this great community of nice, caring individuals I continue to come back to contribute and participate however I can.
Coder Who Says Py
A place for me to babble on about Python development, Python itself, and coding in general. The title is inspired by some knights who enjoy a good shrubbery.
2013-04-12
Why I'm signing up for Gittip
While at PyCon I heard about plans to integrate Gittip into rubygems.org and so I decided to have another look at Gittip. For those that don't know, it's a website where you can give and/or receive money to/from others on a weekly basis. You can give as little as $0.25/week ($13/year) up to $24/week ($1,248/year). Being on a weekly schedule allows you to say "I appreciate the time and effort you put into open source; keep it up!", compared to bounties which are goal-specific and don't recognize people who make contributions that have no direct financial benefit or make contributions year-round.
As I was poking around the site I noticed that my friend Jesse Noller was the top recipient. I read his page on Gittip which listed his vast accomplishments that he has made in his spare time for no pay beyond gratitude from others and any feeling of accomplishment his hard work gives him. But the other thing his Gittip page mentions is what receiving tips means for him.
It basically boils down to a way for people to thank his family for letting him do his open source work. That sentiment really struck a chord with me. Like most open source contributors, I do it because I derive some enjoyment from it. It's a feeling of accomplishment, it's the camaraderie with my various friends that I have in the Python community, etc. In other words it's all very intangible but I do get something from doing my open source work.
But my family doesn't get any of the benefit that I get. Since I am not paid to do my open source work I need to take personal time to do it. That means I have to take time away from my wife to do this rather solitary work of contributing to open source. While my wife understands why I do what I do for Python, her benefit of getting to be proud of me is indirect and very diluted compared to what I get from it (although she is starting to increase her participation by attending PyCon).
But having people express gratitude through Gittip gives more direct benefit to one's family. When I asked on Twitter and Google+ for people to tip Jesse to thank him for all that he does through the year (and especially for PyCon in the past two years), he got a nice bump in his tips, and so he was able to take his daughter and family out bowling that night.
Tips then are a way for the community to thank someone's family for letting them share their loved one with open source. For instance, tips for me would be a way of thanking my wife for letting me spend the hours I do contributing to Python in my various ways by letting me treat my wife to a night out so neither of us has to cook. It also doesn't hurt that it acts like a small form of blackmail; "yes, Andrea, I do need to get this patch in and you should let me put the time in since the Python community treated you to a nice dinner last night" =) .
All of these reasons are why I'm joining Gittip. It has actually now reached the point of legitimacy to have Heroku as a company start leaving tips and Read the Docs is trying to pay for various expenses through Gittip; it's no longer just a bunch of individuals. So please consider signing up to both receive and send tips if you have the financial means to thank those in open source for their diligent and hard work.
2013-03-21
PyCon 2013 report
PyCon 2013 is now over and it was awesome (as usual)! As seems to happen every year, there were a few themes at the conference.
Packaging
For those of you who don't know, people are giving it another go to try and straighten out packaging in the Python world. The difference compared to the previous attempt is that Nick Coghlan, who is leading this endeavour, is working directly with pre-existing tools to gain consensus on things instead of trying to get the stdlib to handle it all. This means, for instance, he is working with the installer projects (e.g. pip) to agree on what should (and should not) happen in the evolution of packaging. This seems to have done a good job in energizing key people into supporting Nick's overall view (more on that later).
This means the stdlib is not going to try and solve all problems. The current thinking seems to be that the stdlib should house modules for which PEPs exist and then tools are to be built on top of that. This allows for all tools to act on metadata and such in a uniform way, letting them innovate on higher-level details (and keep the stdlib out of the installer game). Think PEPs 425, 426, and 427 details being handled by distlib.
He is also working from the top down on the stack. This means installer now, build-related stuff later. This has the nice benefit that the thing that most people directly interact with the most should get fixed first, with the behind-the-scenes details handled afterwards.
What does all of this mean? Eventually people will be able to get an installer, be able to securely install from the Cheeseshop (or any other package index of their choosing), and have it all bootstrap up on their system easily. A proposal (PEP 439) even went out this week to basically include a pip bootstrap script in Python which will install the real pip if it has not already happened and then continue on with the installation, making it all seamless. You can follow the discussion of this specific proposal on distutils-sig.
If all of this interests you I suggest you watch the packaging panel when the video goes up.
Python 3.3
I gave my Python 3.3 > Python 2.7 talk again (video here; PyCon Argentina video here although I think I like the US one more) where basically I pointed out all the wonderful features of Python 3.3 and that performance-wise you don't have to care which version you use (unless you have memory issues in which case you will want to use Python 3.3). I honestly was expecting some pushback since I have become a little jaded over the past 4+ years of Python 3's existence. But you know what? No pushback at all (but maybe it's because Armin wasn't there this year =). It was a really nice change of pace to not have to defend something I believe in and have worked hard to foster.
I heard numerous people tell me that they had finally been able to start using Python 3 and that they really enjoyed it. Jacob Kaplan-Moss of Django fame gave a talk on porting Django apps to Python 3 (no video yet) and told me that he not only liked Python 3, but that the no-argument version of super() made him "irrationally excited". David Beazley said that since he wrote the 3rd edition of the Python Cookbook for Python 3.3 he finds Python 2.7 a bit painful to use. It continues to be the case that almost everyone who gives Python 3 a fair shake ends up really liking it.
Diversity & Outreach
Watch Jesse Noller's opening statements. Then watch Eben Upton's keynote. Then realize that 20% of attendees were women. Then realize there was also a Raspberry Pi programming class for kids. Then really make sure you watch Jesse's opening statements if you ignored that initial link. Makes me want to be a better person and try to help people even more.
Everything else
I gave my "How Import Works" talk (US video here, Argentina video here and this time I prefer the latter thanks to having more time and thus feeling more relaxed).
The language summit happened. You can find numerous other summaries of what happened out there (Nick Coghlan, Kushal Das), so I won't rehash it here.
I wasn't able to stick around for the sprints this year (first time since the founding of the conference) past half of the first day. But hopefully next year I will be able to make it work out.
As I said, overall it was a great conference. Thanks to Jesse and everyone else who volunteered to help make it a great week.
2013-02-17
Resolving a TOOWTDI interface problem for attributes
TL;DR: choose one way to signify the lack of information in an API either as making the attribute optional or setting a default value (e.g. None), but not both.
When you read the docs for importlib.find_loader() you find that an exception is raised when __loader__ is set to None. But if you read the docs for importlib.abc.Loader.load_module() you will notice that __loader__ "should be set" (italics mine). So one part of the docs says having a value of None is fine while another says the attribute doesn't even have to exist. So the former is a LBYL version of the API while the latter is a EAFP version. While that's technically fine, I do like the concept of TOOWTDI in Python, so I would prefer choosing one of the approaches as the definitive way to signal that a module's loader is not known.
Does long-term (think in timescales of years) backwards-compatibility suggest a preference of one over the other? As it stands now, one must do:
if getattr(module, '__loader__', None) is not None:
    # Use loader

That handles both the LBYL and EAFP approaches of either not setting the attribute or setting it to None. If this were to translate to LBYL it would become:

if module.__loader__ is not None:
    # Use loader

Not a huge difference, just easier to read. The EAFP approach would be:

try:
    loader = module.__loader__
except AttributeError:
    pass
else:
    # Use loader

Longer, but still totally readable and psychologically makes more sense since the attribute is set more often than not (importlib actually sets the attribute along with __package__ after importing if they are not already set).

But since most code that cares whether __loader__ is set already uses the getattr() approach, the None value approach is the least disruptive to changing to the eventual idiom.
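To make the difference between the three idioms concrete, here is a minimal runnable sketch. The function names and the stand-in objects are hypothetical (they are not part of importlib); types.SimpleNamespace is used in place of real modules so that the "attribute missing" case can be shown regardless of what a given Python version sets on module objects by default.

```python
import types


def loader_via_getattr(module):
    """Current idiom: copes with a missing attribute and with None."""
    return getattr(module, '__loader__', None)


def loader_lbyl(module):
    """LBYL idiom: assumes the attribute always exists; None means unknown."""
    if module.__loader__ is not None:
        return module.__loader__
    return None


def loader_eafp(module):
    """EAFP idiom: treats the attribute as optional; absence means unknown."""
    try:
        loader = module.__loader__
    except AttributeError:
        return None
    else:
        return loader


# Stand-in objects (assumptions for illustration, not real modules).
sentinel_loader = object()
with_loader = types.SimpleNamespace(__loader__=sentinel_loader)
set_to_none = types.SimpleNamespace(__loader__=None)
attribute_missing = types.SimpleNamespace()
```

Note that loader_lbyl(attribute_missing) raises AttributeError, which is exactly why the LBYL form only becomes safe once the attribute is guaranteed to exist on every module.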
But the thing that tipped the scales for me is I don't want the attribute to be optional but be required in the long run (think Python 4 long run; side-effect of how long Python versions last), so I plan to change the default attributes on the module type to always have __loader__ and __package__ and set them to None by default in Python 3.4. That means the optional approach won't mean anything going forward, so that makes the LBYL approach the one I plan to go with even if I personally prefer the EAFP approach for optional API attributes; I don't want this part of the API being viewed as optional by loader authors.
If you care about any of this specific API cleanup, you can follow issue #17115 as I clean up importlib's mixed approach to __loader__.
2013-02-01
Remember that the "BC" in ABC means "Base Class"
[UPDATE: had a talk with Thomas Wouters on IM which has caused me to rethink things. New thoughts up top, original post after the jump break]
I had mis-heard a comment Thomas Wouters made in a meeting about not raising NotImplementedError and using ABCs, which led to me thinking about the problem of having ABCs in your MRO which were not at the bottom and defined methods which would override methods you wanted to access farther down the inheritance chain. I had thought that calling super() in your ABCs in some manner was the solution. But after discussing things with Thomas I believe I was in the wrong and I had badly misheard what he had said. =)
Because importlib has a bunch of overlapping ABCs which inherit from each other I thought that the situation might come up where you inherited from two different classes which had overlap in methods but for which you would want them to build off of each other. But as Thomas pointed out to me, ABCs are meant to be at the bottom of an MRO; the BC stands for "Base Class" for a reason. If you are trying to interleave methods between two different classes implementing the same interface then either the granularity of the ABC is wrong or you shouldn't be inheriting from the ABC to begin with. I tried to come up with counter-examples but they were so convoluted and leading to bad API design in order to justify that I gave up and admitted they were stupid.
What does all of this mean for you when you are writing an ABC? You should just treat your ABCs as the bottom of your MRO. That means you should have all of your methods, even the abstract ones, do something sensible in case they are somewhat blindly reached through a super() call. If you have a default return value, return that. If that does not exist then you should raise the exception which signifies failure as defined by the API. But raising NotImplementedError is not the right thing to do when it can be avoided with a sensible default reaction (which I have not been doing in importlib.abc). This also has a nice side benefit of making sure you clearly define what the default reaction is for a call to the method.
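As a rough illustration of that advice, here is a small sketch of an ABC whose abstract methods still do something sensible when reached through super(). The Cache and DictCache classes are hypothetical and made up for this example; they are not taken from importlib.abc.

```python
import abc


class Cache(abc.ABC):
    """Hypothetical ABC that sits at the bottom of the MRO."""

    @abc.abstractmethod
    def lookup(self, key):
        # No sensible default value exists, so raise the exception that
        # this (made-up) API defines as failure instead of
        # NotImplementedError.
        raise KeyError(key)

    @abc.abstractmethod
    def keys(self):
        # A sensible default return value exists, so return it.
        return []


class DictCache(Cache):
    """Concrete implementation that defers to the ABC when it has no answer."""

    def __init__(self, data):
        self._data = dict(data)

    def lookup(self, key):
        if key in self._data:
            return self._data[key]
        # Falling through to the ABC signals failure the way the API says.
        return super().lookup(key)

    def keys(self):
        return sorted(self._data) + super().keys()
```

Because the abstract methods are still decorated with abc.abstractmethod, subclasses must override them, but a subclass that blindly calls super() gets the documented failure mode rather than a surprising NotImplementedError.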
2012-12-09
How much of Python can be written in Python?
Now I don't mean in the PyPy sense where you can bootstrap yourself with another Python installation. No, I'm talking about all you have is a checkout of the CPython repository and a C compiler. How far could you go in writing stuff for Python in Python and not C (from my perspective, for maintainability; for others, perhaps ease of extensibility)? In Python 3.3 we now have import written in Python (technically the main import loop that is used is implemented in C to save 5% at startup, but that is entirely optional as equivalent pure Python code still exists) and it's actually faster than the C version from Python 3.2 thanks to directory content caching. So it is not entirely ridiculous to think about how far one could push the idea of replacing C code in CPython with Python code.
What restrictions do we have for this thought experiment? One is that CPython needs to continue to be performant. That means either that the feature is not executed constantly or can be made to work as close to C code as possible. The other requirement is that it can't really have dependencies on the stdlib beyond built-in modules. Since this concept works based on freezing Python bytecode into C-level char arrays you don't want to have to pull in half the stdlib just to make something work. But that's pretty much it.
The first possibility is the parser. If you either generated the parser like the one CPython uses (that has not really changed much since Guido wrote it way back when) or wrote a recursive descent one by hand, it could probably be written in Python. The real problem is how performance might be hit. Now if you are working off of bytecode files then this really is only a one-time cost per bytecode file creation. But if you are working primarily with modules that you specify on the command line then they get parsed every time you invoke the interpreter and that could be costly if you can't get performance to be good enough.
Going down the compiler chain, you could also go from CST (concrete syntax tree) to AST (abstract syntax tree) in pure Python. You can already get to the CST from the parser module, so the work to expose the CST at the Python level is done. And with the ast module already exposed it then becomes a matter of creating the AST nodes from the CST. But once again, it's a question of performance since this is invoked every time source code is compiled.
Next would be transforming the AST to bytecode. The AST is already exposed to Python code, so once again the initial work is done for access. But also once again there is the question of performance as this is also on the critical path if you are continually compiling Python source code because you are executing scripts instead of importing code which was previously stored as a bytecode file.
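For a feel of how much of this pipeline is already reachable from Python code, here is a small sketch using the ast module and the built-in compile(), which accepts an AST object as well as source text. The source string and filename are made up for illustration, and the compilation itself still happens in CPython's C code; this only shows the access points, not a pure Python compiler.

```python
import ast

# A made-up one-line program to push through the pipeline.
source = "result = 6 * 7"

# Source text -> AST (exposed to Python via the ast module).
tree = ast.parse(source, filename="<example>", mode="exec")

# AST -> code object holding bytecode (compile() accepts an AST).
code = compile(tree, filename="<example>", mode="exec")

# Code object -> execution by the eval loop.
namespace = {}
exec(code, namespace)
```

After this runs, namespace["result"] holds the value computed by the compiled code.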
You can't do anything for the interpreter eval loop as that becomes a bootstrap issue. If you really wanted to push this you could do a basic eval loop to bootstrap a more complex one, but that seems like more work than it's worth.
I suspect most of Python's builtins could be re-implemented in pure Python without any trouble. Re-implementing something like any(), map(), etc. is not exactly difficult. In this instance, though, performance definitely becomes a key issue due to the extensive use of builtin functions. And in the case of exceptions you have to worry about the C API surrounding them on top of any possible performance issue from exception raising (although I'm willing to bet this can easily be alleviated by just caching at the interpreter level the builtin exception classes so that at the C level it's still just PyObject pointers instead of having to extract them dynamically every time from the builtin module).
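As a quick sketch of how little code that can take, here are simplified pure Python stand-ins for any() and map(). The py_ prefixed names are hypothetical, and these versions skip details of the real builtins (for example, the real map() requires at least one iterable and is a class rather than a generator function).

```python
def py_any(iterable):
    """Return True as soon as one item is truthy, else False."""
    for item in iterable:
        if item:
            return True
    return False


def py_map(func, *iterables):
    """Lazily apply func across the iterables, stopping at the shortest."""
    for args in zip(*iterables):
        yield func(*args)
```

Writing them is the easy part; as noted above, the open question is whether versions like these could ever be fast enough given how often builtins are called.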
And as always every single module in the stdlib does not have to be implemented in C code if it doesn't wrap other C code. In that instance it is simply taking the time to either copy over and get working the pure Python versions of modules that other VMs have written or writing one from scratch. But thanks to PEP 399 this is only an issue for pre-existing modules (which is also why no one has bothered to backfill all of those modules as the other VMs have already done the work for themselves so no one really needs this to happen; I opened issue 16651 to find out exactly what modules don't have a pure Python version).
In other words, various possibilities for technically writing more of CPython in pure Python exist, but performance considerations will quite possibly not make it worth pursuing (but I would be quite happy if proved wrong =).
2012-05-13
My (very shallow) thoughts on Dart
Being the language nerd that I am, I actually find it fun to learn new programming languages. Now typically this is nothing more than me reading all of the official documentation and writing some toy examples that give me a very shallow, quick-and-dirty feel for a language. Since I have been involved in language design for nearly a decade (started participating on python-dev in June 2002) and have done toy examples now in 18 languages (17 actually still run; I have never bothered to get Forth to work again after a gforth change broke my code), this is actually usually enough for me to grasp the inspirations for a language and thus understand its essence.
At work I have been doing some JavaScript work for an internal Chrome extension and dashboard and so that led me to want to look into what Dart had to offer over JavaScript. I know the language is only at version 0.09 (and still changing weekly), but the fundamentals are there so I wanted to see what the general feel of the language is (and will continue to be).
I also know Dart is somewhat controversial for some people. Personally, I fall on the "competition is good" side of the argument, not the "OMG fragmentation" side. I want ECMAScript Harmony to still happen and give me a cleaner, tighter, more functional JavaScript, but that doesn't mean Dart doesn't have a place in the world as a cleaner OO language for the web. Besides, me thinking otherwise would make me a massive hypocrite as I began working on Python before it was cool (I feel like I need a hipster meme for that statement, but I digress) and I have worked hard to convert people to Python from other languages. Hell, I have tried to foster competition between the Python VMs to get them to push each other to perform better and be ever more interoperable. IOW I don't totally buy this fragmentation argument.
Going into learning Dart I knew who was involved with the language which is what will inherently define how a language feels. I knew Lars Bak of V8 helped design the language, which meant it would have some design restrictions put on it to make it have a damn fast VM. Josh Bloch has been helping to design Dart's library which meant some JDK feel to it. I also know Jim Hugunin is involved which should also help with the VM speed. So fast with an API designed like the JDK.
What did I find? A language with a damn fast VM and a standard library that felt like the JDK. =) Take OO as a Python programmer would expect (e.g. pure OO where everything is an object, not dogmatic OO like Java where everything has to be in a class definition), make types entirely optional for testing and tooling purposes but with enough support to use interfaces and generics, and then toss in abilities based on what JavaScript allows and then you have a good idea of what Dart offers.
So, Dart has optional typing. In case you have not heard, Dart does not use type information at runtime for performance, and it does not throw any form of fit if a type doesn't match what is specified unless you run in checked mode. If you do that then you get warnings about possible type issues. But Dart's type system is unsound so don't expect typing to catch every error that a more strict type system might even when you run in checked mode. Dart views types as helpful documentation and a way to help tools assist with things, period. I actually find it rather refreshing to have a language that treats types as just documentation since that is really what they are for the programmer (VMs can use it for performance, but it isn't required for good performance and type safety only saves you from a minor set of bugs which every Python programmer probably realizes eventually =).
But that's even if you bother with types! You can write all of your code without types and everything will run without issue. Even generics are optional, so you can declare that a function accepts a plain List or a List with a type argument; Dart doesn't care either way and it alleviates covariance/contravariance headaches by not caring if you don't care either. It's actually rather nice to have non-library code be written quickly using dynamic typing and only add in the type information for library code where you care about what interface is expected. IOW I think Dart strikes a nice balance with how it does typing and I actually feel fine using types when I know what I expect to accept in my own code that I don't expect anyone else to rely upon.
Dart is OO, not prototypical like JavaScript. It's single-inheritance, which I'm fine with. It does have interfaces as one would expect in a statically typed language, but it softens their expense by allowing one to define a default implementation of an interface. What this means is that the Map interface will also give you a HashMap instance if you call new Map(). I suspect they snagged the idea from Scala where you have the Map class which hides HashMap from the user if you simply don't care about what Map implementation you use.
It does have a modicum of privacy by using a leading underscore for signaling something is private, much like Python. But the privacy is enforced at the library level; otherwise everything is public, period. Every field automatically has a getter and setter defined for it, so there is no way to force a private field (which I think is a good thing since I find private privacy bloody annoying). I also like that getters and setters are directly supported by the language with automatic generation, so you don't ever have to see a setSomething()/getSomething() function call just to read/write a field, but you can still do something like Python's properties very easily.
The standard libraries are fine and just feel like the JDK. Things are very much LBYL rather than EAFP. I am willing to bet (although I have not tested this) that exceptions are a little expensive in Dart (since exceptions are hard to optimize) and so they would rather go the LBYL way. But they still went a little overboard in my opinion on some things (e.g. the list interface has a last() method instead of supporting negative indexes). But there is nothing there that is making me run away screaming.
One place I do think Dart could use some improvement is simplifying its constructor rules. Upfront, Dart has some nice syntactic sugar for constructors where you directly specify how a constructor's arguments map to instance fields, avoiding having to declare the constructor parameters and then also write an assignment. OK, I like that.
Dart also has initializer lists which let you initialize final fields. OK, that's cool and a nice idea taken from C++.
Constructors are not inherited. OK, that's fine since you probably want to be explicit about how you tweak stuff. But there is an exception: the default, no-argument constructor calls the superclass' no-argument constructor. So while not technically inherited, it might as well be in that single instance. And all defined constructors will automatically call the superclass' default constructor; if that isn't defined, you must explicitly call a constructor somehow (probably in the initializer list of your constructor). Um, OK...
And you have named constructors. This gets you around the lack of type-based method overloading for constructors. OK, I can go with that.
You also have constant constructors, where fields can only be initialized to compile-time constant values. Fine, that's for performance and determinism in instance creation, so I can grasp the desire for that.
And then you have factory constructors. OK, this is where I go "WTF people". This is so that you can have a constructor that actually doesn't create a new instance but instead can return something else other than a new instance (think of Python's __new__() or any of Java's static factory methods). But this lets you use the new keyword on a factory constructor instead of using a static method. And that to me seems unneeded.
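To make the Python comparison concrete, here is a minimal sketch of __new__() handing back an existing object instead of creating a new one. This is Python, not Dart, and the Symbol class and its cache are hypothetical names made up purely for illustration; it only shows the kind of behaviour a factory constructor allows.

```python
class Symbol:
    """Hypothetical class whose construction may return a cached instance."""

    _cache = {}  # maps a name to the one instance created for it

    def __new__(cls, name):
        # Return the previously created instance when there is one ...
        if name in cls._cache:
            return cls._cache[name]
        # ... otherwise create, remember, and return a fresh instance.
        instance = super().__new__(cls)
        instance.name = name
        cls._cache[name] = instance
        return instance


first = Symbol('spam')
second = Symbol('spam')
print(first is second)  # both names refer to the same cached object
```

The caller still writes Symbol('spam') as if a new object were being made, which is roughly what using the new keyword on a factory constructor buys you over calling a static method.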
So let's recap what constructor options we have. We have regular constructors, default and defined, which support initializer lists. You have named constructors. There are constant constructors. And you also have factory constructors. If you don't count the default constructor as special, that means Dart has four types of constructors. WTF!?! I realize that Java's FactoryFactoryOfFactories crap has probably spooked the crap out of the Dart designers, all the while having Java influences making them think they need the new keyword for anything that would return an instance of a class, but this seems a bit much. Dart's function definitions are rich enough to allow for optional arguments, etc., which would suggest that the typical constructor can do the job of named constructors, with static methods picking up the slack, where absolutely necessary, in the places factory constructors are used. Maybe I'm missing something here, but I think they tried to design for everything that is bad about Java's constructor mess without stopping to think what their function definitions already buy them, all while making sure the new keyword was used.
Luckily that is the only bit of Dart that I found poorly designed. Everything else is reasonable and something any JavaScript programmer will be somewhat familiar with or quickly grasp.
Now as I said, I only did toy examples in Dart beyond reading the docs from beginning to end. If I had had more time this weekend I might have done one more coding example that was more involved, but I ran out of time. But based on what I have read and what I learned, I am happy with Dart and would be content using it for programming for the Internet. I would also be totally happy being asked to use it in a situation where others wanted to use types (e.g. I would be fine ditching Java for Dart if people really felt the need to hold on to their types).
2012-05-12
Thoughts on using function signatures as a DSL for CLI parsers
I have no idea why, but this morning I thought about a decorator for delineating which function should be treated as the main function (e.g. using a decorator instead of the traditional if __name__ == '__main__' idiom). Now I solved it in my head on the spot, and then immediately realized someone had to have solved this already. Turns out various people have done things as nuts as examine stack levels to detect the __main__ name, but the most straightforward solution I found doesn't do anything nearly as nuts or CPython-specific and is basically what I came up with. There was a red herring, though, in everyone's solution where they claim the decorator has to be on the last function in your module. While technically true when using the decorator as a decorator only, you can just as easily not decorate the function and instead, at the end of your module, do something like main(func) since that is the same as decorating func with main.
A really simple expansion of this idea of helping out with defining what function is the main function, is to pass in sys.argv and to return a value to signify exit status: sys.exit(func(sys.argv[1:])). So now you have made the decorator more useful than replacing the old __name__ idiom.
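A minimal sketch of what such a decorator might look like follows. It assumes the check is done by comparing the function's __module__ attribute to '__main__', which is one possible approach and not necessarily what any of the existing solutions do.

```python
import sys


def main(func):
    """Run func as the program's entry point when its module is the script."""
    if func.__module__ == '__main__':
        # Hand over the arguments (minus the program name) and use the
        # function's return value as the process exit status.
        sys.exit(func(sys.argv[1:]))
    # When the module is merely imported, leave the function untouched.
    return func
```

Used as @main this has to sit on the last function in the module since the call happens at decoration time; calling main(func) on the module's final line sidesteps that restriction.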
But while that is nice and helps deal with the very common case, I wanted more. Why can't you introspect on the arguments the function takes and use that to automatically generate a command-line parser? I did a search and the best I could find is entrypoint, but it doesn't go far enough for me. What I want is to use the full expressiveness of function parameters in Python to express as much about what should/could be given on the command-line along with passing in as little as possible to the decorator in order to replicate the common case of command-line parsing; think just as easy as getopt but more powerful by using as much of argparse as you can without coming up with complicated rules about how things should work (since once you pass a certain complexity threshold you should just build the argument parser using argparse's API directly and stop trying to optimize for it like I'm suggesting).
So what do we have at our disposal to build such a decorator? We have positional arguments so we know how many arguments are required without some specific qualifier. We have variable positional arguments (e.g. *args) to take an optional number of extra arguments at the end of the command-line. We have keyword arguments which are optional flags that one can specify. You could even have variable keyword arguments for major flexibility, but that just seems like a total lack of structure that CLIs just don't typically provide. With all of that you can reproduce getopt without any issue for long-form names. For short names, I would say you need to pass in a mapping of short names to long names into the decorator. The same goes for a mapping of long names to help strings (you can use the function's docstring for the main help for the app itself).
But where things get really interesting is when you take into consideration function annotations. That opens up the possibility of going beyond getopt and potentially supporting argparse's action, nargs, and type options. Take the type option as an example. You could say limit:int=10 to have a command-line option called --limit which only accepted an integer and defaulted to 10. This obviously could also work with float or any other type where you can just pass in a string to the constructor to get back an instance of the type. So you have a general case which can be useful, but can you potentially special-case some things to get enhanced functionality where it doesn't make sense to simply take in a string?
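The general case can be sketched roughly as below. This assumes inspect.signature() is available (Python 3.3 and later) and that every annotation is something you can call with a single string; build_parser and fetch are hypothetical names, and short names and help strings are left out entirely.

```python
import argparse
import inspect


def build_parser(func):
    """Derive an argparse parser from func's signature (illustrative only)."""
    parser = argparse.ArgumentParser(description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        options = {}
        if param.annotation is not inspect.Parameter.empty:
            # Assumption: the annotation converts a string, like int or float.
            options['type'] = param.annotation
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            # **kwargs is skipped; CLIs rarely offer that little structure.
            continue
        if param.kind is inspect.Parameter.VAR_POSITIONAL:
            # *args soaks up any extra positional arguments.
            parser.add_argument(name, nargs='*', **options)
        elif param.default is inspect.Parameter.empty:
            # No default means a required positional argument.
            parser.add_argument(name, **options)
        else:
            # A default means an optional --flag.
            parser.add_argument('--' + name, default=param.default, **options)
    return parser


def fetch(url, *extras, limit: int = 10):
    """Fetch a URL (hypothetical example function)."""


args = build_parser(fetch).parse_args(['http://example.com', '--limit', '5'])
print(args.url, args.limit)
```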
Lists pose an interesting option as argparse provides both nargs for specifying the number of arguments to a single option, or the append action for accepting multiple instances of the same option and accumulating them. In my mind both can be expressed in a way that I think makes sense but some might view as too magical. If you specify names:list=[], then that supports the append action, e.g. --names Brett --names Andrea leads to names being set to ['Brett', 'Andrea']. But if you were to do names:['+']=[], then that would get the same result from --names Brett Andrea. In other words, the list type specifies the append action while a list instance specifies using the nargs option with the single item in the list acting as the value to set to nargs.
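Both list spellings map onto argparse calls that already exist, as this sketch suggests. The helper name add_list_option is made up for illustration, and the sketch assumes the annotation and default have already been pulled off the function's parameter.

```python
import argparse


def add_list_option(parser, name, annotation, default):
    """Translate the two proposed list annotations into argparse options."""
    if annotation is list:
        # names:list=[] -> every --names occurrence is accumulated.
        parser.add_argument('--' + name, action='append', default=default)
    elif isinstance(annotation, list) and len(annotation) == 1:
        # names:['+']=[] -> the single item is handed to nargs.
        parser.add_argument('--' + name, nargs=annotation[0], default=default)
    else:
        raise ValueError('not a list annotation: {!r}'.format(annotation))


appending = argparse.ArgumentParser()
add_list_option(appending, 'names', list, [])
print(appending.parse_args(['--names', 'Brett', '--names', 'Andrea']).names)

multiple = argparse.ArgumentParser()
add_list_option(multiple, 'names', ['+'], [])
print(multiple.parse_args(['--names', 'Brett', 'Andrea']).names)
```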
For booleans, I would want the use of the bool type to mean use either the store_true or store_false action based on what the default argument was. So turn_on:bool=True would use the store_false action since the argument is meant to be a boolean and its default value is True, meaning that if the option was specified it represents the reverse.
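The boolean rule is small enough to sketch in a few lines. The helper name add_bool_option is hypothetical, and the option keeps the parameter's name as-is (underscore included) rather than guessing at a naming convention.

```python
import argparse


def add_bool_option(parser, name, default):
    """Pick store_true or store_false so that giving the flag flips the default."""
    action = 'store_false' if default else 'store_true'
    parser.add_argument('--' + name, action=action, default=default)


parser = argparse.ArgumentParser()
add_bool_option(parser, 'turn_on', True)
print(parser.parse_args([]).turn_on)             # flag absent: default kept
print(parser.parse_args(['--turn_on']).turn_on)  # flag present: default reversed
```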
Finally, the tricky bit is for files since that is a common command-line argument and you might as well open the file and close it for the function. The solution argparse uses is a specific FileType class where you can pass specific arguments to use when opening the file. The problem is that it doesn't support everything open() does, e.g. encoding. So what I would want to do instead is provide a partial function that took everything but the file path and then when it came time to call the main function, passed in the file path to the partial function, passed the returned file to contextlib.closing(), and then passed it on to the main function. You could even generalize a lot of this and simply say that whatever is specified as the function annotation, if it isn't a special-case like lists, then you call the annotation with what came from the command-line and if it provides a context manager it is used before calling the main function.
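Here is a rough sketch of the generalized version, simplified to a main function taking a single argument. The names call_with_converted and utf8_reader are hypothetical; since file objects are already context managers, the sketch uses a with statement directly instead of going through contextlib.closing().

```python
import functools


def call_with_converted(func, annotation, raw):
    """Convert raw with annotation, managing the result if it is a context manager."""
    value = annotation(raw)
    if hasattr(value, '__enter__') and hasattr(value, '__exit__'):
        # E.g. an open file: it is closed again once func returns.
        with value as managed:
            return func(managed)
    return func(value)


# An annotation for a file argument: open() with everything but the path
# filled in, including the encoding that argparse.FileType lacked.
utf8_reader = functools.partial(open, mode='r', encoding='utf-8')
```

A main function declared as something like def main(source: utf8_reader) would then receive an already opened file and never have to worry about closing it.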
So those are my thoughts on using function parameters as a DSL for getopt++/argparse-- functionality on a Saturday morning. Honestly the most complicated bit would be constructing the arguments to pass to the main function in the right order, otherwise it's just introspecting on a function's parameters and making the proper call to argparse. But then again the real question is whether anyone thinks this at all sounds reasonable enough to code it up.
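As for getting the arguments back into the right order, one possible approach is to walk the signature and sort each parsed value into positional or keyword arguments. In this sketch bind_arguments is a made-up name, and values is assumed to be a dict of parsed results such as vars() of an argparse namespace.

```python
import inspect


def bind_arguments(func, values):
    """Split parsed values into (args, kwargs) matching func's signature."""
    args, kwargs = [], {}
    for name, param in inspect.signature(func).parameters.items():
        if param.kind is inspect.Parameter.VAR_POSITIONAL:
            # *args: splice the collected extras in after the named positionals.
            args.extend(values.get(name, ()))
        elif param.kind is inspect.Parameter.KEYWORD_ONLY:
            kwargs[name] = values[name]
        elif param.kind is inspect.Parameter.VAR_KEYWORD:
            # **kwargs is not supported by this sketch.
            continue
        else:
            # Signature order guarantees these come before any *args.
            args.append(values[name])
    return args, kwargs


def report(source, level=1, *extras, verbose=False):
    return source, level, extras, verbose


parsed = {'verbose': True, 'extras': ['x', 'y'], 'level': 5, 'source': 'log'}
args, kwargs = bind_arguments(report, parsed)
print(report(*args, **kwargs))
```

Passing every ordinary parameter positionally, defaults included, avoids the clash you would otherwise get when a parameter with a default is given by keyword while *args is also non-empty.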
2012-04-28
Playing with the Ninja build system
Whenever I learn a new programming language I end up writing some toy examples to try to get a feel for what the language is about. This leads to the need to build code using many different compilers with their own flags, quirks, etc. Up until today I had used SCons for my build setup. But honestly, it always seemed like overkill to me. Because I only had about 5 programs to build per language with at most two files used to produce each program, a full-blown build system was never really needed. Add to that the fact that I am building for languages that no build system would have built-in support for, and it led me to always have a wandering eye for another build system I could use.
This past week someone on Google+ shared a post comparing configure+make, cmake+make, and cmake+ninja. I had never heard of Ninja, so I decided to have a look. It turns out someone had written a build tool whose only explicit job was to take a DAG, figure out what needed to be built, and then execute the commands for the build. No crazy metadata checks like Make, or fanciful features, just bare-bones building. Ninja was actually designed to be a target for other higher-level build systems like cmake which can do the pre-computation of what the DAG should be, leaving it to Ninja to drive the needed compilation.
What attracted me to it was that it was fast and the syntax was simple. I have code examples for 16 languages, of which 10 have build rules (one happens to be Python 2.7 as I pre-compile the .pyo files). It turned out to be a pretty straightforward process to take my custom SCons commands and just translate them to the corresponding shell commands that Ninja would execute for me. They are a tad verbose in order to make sure that the ninja -t clean command would clean up all intermediary files (I'm looking at you OCaml, Haskell, Java, and Scala). But as I said, I typically never have more than 5 programs to build per language, so it wasn't that much of a burden. And if I really cared I could have written a Python script to auto-generate the Ninja files for me, but I decided the effort of writing the code would be just as much as writing the build files by hand.
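For the curious, such a generator script could be as small as the sketch below. The function name ninja_file is made up, and the gcc command is only a stand-in, not one of my actual build rules; the output uses Ninja's basic rule and build statements with the built-in $in and $out variables.

```python
def ninja_file(rules, targets):
    """Render rule and build statements as the text of a Ninja build file."""
    lines = []
    for name, command in rules.items():
        lines.append('rule ' + name)
        # Variables bound to a rule are indented under it.
        lines.append('  command = ' + command)
    for output, (rule, inputs) in targets.items():
        lines.append('build {}: {} {}'.format(output, rule, ' '.join(inputs)))
    return '\n'.join(lines) + '\n'


# Stand-in rule and target purely for illustration.
text = ninja_file({'cc': 'gcc -o $out $in'}, {'hello': ('cc', ['hello.c'])})
print(text, end='')
```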
I realize I could have used Make, but I honestly am not enamoured with that tool; requiring tabs just rubs me the wrong way. Plus it's rather slow in the common case of only changing a file or two compared to a complete build from scratch.
Overall, for my weird case Ninja worked out. For something more complex, though, I will consider looking at cmake+ninja as a build solution.
2012-02-14
The re-launch of py3ksupport!
The reason the past few blog posts I have written have been App Engine-themed is because I have re-launched py3ksupport! I did a complete rewrite of the code to make it more efficient (since I'm paying $9/month for the site) and at the same time moved over to HRD so as to guarantee the site is always up.
Before I discuss some unique features of py3ksupport, I want to point out that right now 56-60% of the top 50 projects based on downloads of their latest PyPI release support Python 3. The reason for the range is that some projects had in-development support last time I looked and since that can change underneath me I wanted to cover the possibility the data was stale. But the key point is that 8 of the top 10 projects support Python 3 and one of them has support under development, along with over half of the top 50 projects.
So I'm sure the site explains itself (and the FAQ fills in gaps), but I figured I should explain some of the more unique features of the site. One is the metadata rating given to each project. Basically I wanted to shame project owners into updating their project metadata. LOTS of projects don't bother to specify the Python support metadata for their projects which makes my life difficult and is unfortunate for users. For instance, if projects bothered to specify the exact versions of Python they support then users could easily tell from the Cheeseshop (nee PyPI) whether they could use the project based on what version of Python they are tied to. I might have to do some public shaming at PyCon if the situation doesn't improve. =)
The other key point is that I personally keep the front page up to date. If a new project shows up on the front page that does not have the proper metadata specified then I personally search online to find out the status of the Python 3 support. Since I get emailed each day when this happens the situation tends to get fixed that day and will be noticed within an hour of me fixing it (the length of time I have things cached).
I'm hoping to eventually move to measuring a project's popularity based on the download rate for the lifetime of the project (instead of just the latest release) to give a more accurate reflection of how popular a project is.