Sunday, October 28, 2012

Keyword Argument Unpacking in Python

Recently in Software Engineering we learned about "keyword argument unpacking" in Python. Whenever I learn quirky language features like this, I think about how the feature can shape my program flow in a natural way, more natural than if the feature did not exist in the first place.

Keyword argument unpacking works by "decorating" the name of a dictionary with two prefixed asterisks (**) in a function or method call. Each key of the dictionary must be a string matching the name of one of the function's parameters, and the corresponding value is passed as that argument.

So basically, say we have the following function:

def f(x, y, z):
    return [x, y, z]

We can make the following call:

f(**dict([('x', 1), ('y', 2), ('z', 4)]))

Or the following call:

f(**dict([('y', 2), ('z', 4), ('x', 1)]))

Or in any other order, and still get the same list back: [1, 2, 4]
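To make the matching rule concrete, here is a small sketch that repeats the same `f` and uses dictionary literals instead of `dict([...])`. Key order is irrelevant, but a key that is not a parameter name, or a missing required one, raises a `TypeError`:

```python
def f(x, y, z):
    return [x, y, z]

# Dictionary order does not matter; each key is matched to a parameter name.
assert f(**{'x': 1, 'y': 2, 'z': 4}) == [1, 2, 4]
assert f(**{'z': 4, 'x': 1, 'y': 2}) == [1, 2, 4]

# An unknown key ('w') or a missing one ('z') is rejected with a TypeError.
for bad in ({'x': 1, 'y': 2, 'w': 4}, {'x': 1, 'y': 2}):
    try:
        f(**bad)
    except TypeError as e:
        print('TypeError: %s' % e)
```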

Last week this proved particularly useful when dealing with Google App Engine models, mainly because a model can often be constructed with only a subset of its possible arguments. If you want a single call to the model constructor while the number of arguments varies at runtime, building a dictionary of argument names and values and unpacking it into the call is a very elegant solution. Before I thought to use it, I had a very ugly block of if and else statements creating different models depending on which constructor arguments were valid. It does have limits (for one, every key must match a parameter name the constructor accepts), but I do believe I have found one of my new favorite features of Python.
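Here is a minimal sketch of that pattern. It does not use the real App Engine API: `Movie` is a hypothetical plain class standing in for a model whose constructor takes a few optional arguments, and `build_movie` and `ALLOWED_FIELDS` are made-up names. The point is that the dictionary is filtered at runtime and then unpacked into one constructor call, with no if/else ladder:

```python
class Movie(object):
    """Hypothetical stand-in for an App Engine model: only `title` is required."""

    def __init__(self, title, year=None, director=None, rating=None):
        self.title = title
        self.year = year
        self.director = director
        self.rating = rating


# Hypothetical list of keyword arguments the constructor accepts.
ALLOWED_FIELDS = ('title', 'year', 'director', 'rating')


def build_movie(fields):
    """Construct a Movie from whichever known fields were actually supplied."""
    # Keep only keys the constructor accepts and values that are not None.
    kwargs = {k: v for k, v in fields.items()
              if k in ALLOWED_FIELDS and v is not None}
    # A single constructor call, however many arguments survived the filter.
    return Movie(**kwargs)


movie = build_movie({'title': 'Alien', 'year': 1979, 'director': None})
```

Filtering against a known set of field names keeps a stray key from turning into a `TypeError`, though a missing required argument (here, `title`) will still raise one.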

Sunday, October 21, 2012

Editors vs. IDEs in a Group Environment

Last semester in Object Oriented Programming I touched on this subject a bit in my blog for that class, but now that I've had the chance to assess it from a group's perspective, I thought I would give it another shot.

Both editors and IDEs (Integrated Development Environments) have their place. People will moan about how editors are "so superior" to IDEs in every way, but the simple truth is that some things are just better done in a full-blown IDE. For smaller projects, and most projects I do by myself, I choose vim (the best editor :) ). I can search and/or replace faster than you can blink, I can move around a document painlessly, and basically any Unix machine around will have it. Combine it with a terminal multiplexer like tmux (which is also installed on all the CS Linux machines, thanks to me) and you have yourself a serious development setup.

The thing is, though, this really only works for small jobs. By small, I mean a project that takes a week, maybe a couple of weeks, to finish. I love working with vim, but when projects get large, with oodles of different files and formats, using a terminal with a lightweight editor just for the hell of it can get a little cumbersome. The truth is that IDE developers have packed in too many awesome features for large-scale projects to pass up. In a group environment, some people are often just not comfortable with editors yet, and to be honest, editors can be a little intimidating to those who are not used to them.

The current project for Software Engineering calls for the creation of an XML schema, an XML file that uses the schema, and the translation of the XML into Google App Engine models, which will be used to display web pages that interface with the models. In my experience, it is extremely important to establish the group's working environment early on, so that members can help each other with any environment problems and the rest of the time can be spent on development.

So what did we choose? Eclipse, of course! Specifically, we chose Eclipse for Java EE Developers, an Eclipse package that includes a ton of awesome tools for web development. Eclipse is cross-platform, so we can use it on our Linux, Windows, and OS X machines without a problem. We then all installed EGit, a Git plugin for Eclipse, so we could communicate with our Git repository on GitHub. This made creating the XML a breeze and let us all work in essentially the same environment regardless of what systems we were running. We also installed the PyDev plugin for Eclipse and the Google App Engine SDK so we could run whatever Python code we needed from within Eclipse. As an extra touch, I also installed a vi-binding plugin so that I could get a similar feel to vim within Eclipse.

Could we have done this all in our own separate environments? Probably. Could we have done it as quickly? Questionable. Granted, it did take me quite a bit of time to establish what I thought would be a good environment for the team based on my experience, but it definitely paid off. We were able to work in a more uniform manner, communicate more effectively, and use some awesome new tools!

Sunday, October 14, 2012

Pickles in Python

In the last project for Software Engineering, Netflix, we were asked to predict the ratings different users would give to different movies, based on a large dataset of users, movies, and ratings. Sadly, I had to work alone on this project, and I knew I needed to be as efficient as possible to finish in time. The project called for several caches built from the data: average ratings for each user, average ratings for each movie, the standard deviation of each user's ratings from their mean rating, and each user's average rating per decade (grouped by the decade the movie came out). Needless to say, parsing all this data could have been a real chore if I had not been careful, and not just parsing it, but also outputting (caching) it in a format my top-level application could read later.
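As a rough illustration of the first two caches, here is a minimal sketch. It assumes the ratings are already in memory as hypothetical `(user_id, movie_id, rating)` triples; the real Netflix data came in its own file format, which is not shown here, and `average_ratings` is a made-up helper name:

```python
from collections import defaultdict


def average_ratings(ratings):
    """Given (user_id, movie_id, rating) triples, return per-user and per-movie means."""
    user_totals = defaultdict(lambda: [0, 0])   # user_id -> [sum, count]
    movie_totals = defaultdict(lambda: [0, 0])  # movie_id -> [sum, count]
    for user, movie, rating in ratings:
        user_totals[user][0] += rating
        user_totals[user][1] += 1
        movie_totals[movie][0] += rating
        movie_totals[movie][1] += 1
    # float() keeps the division exact on Python 2 as well as Python 3.
    user_avg = {u: float(s) / n for u, (s, n) in user_totals.items()}
    movie_avg = {m: float(s) / n for m, (s, n) in movie_totals.items()}
    return user_avg, movie_avg


ratings = [(1, 10, 4), (1, 11, 2), (2, 10, 5)]
user_avg, movie_avg = average_ratings(ratings)
# user_avg == {1: 3.0, 2: 5.0}, movie_avg == {10: 4.5, 11: 2.0}
```

A single pass accumulates sums and counts, so each cache costs one walk over the data regardless of how many users or movies there are.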

Recently at my job, the application I am creating called for object serialization, which is a way for object instances in a running application to be "serialized" and written out through some kind of data stream, usually a file. I knew Python had to have something for object serialization, and I found the pickle module, which is exactly that.

Now to the fun stuff...

Pickle allows us to write almost any Python data structure to a file like so:

import pickle
my_list = ['f', 'o', 'o', 'e', 'y']
# Pickles are binary data, so open the file in 'wb' mode; 'with' closes it for us.
with open('my_list.p', 'wb') as f:
    pickle.dump(my_list, f)

That's it! And it is an awfully nice way to store the caches for the Netflix project. We can read these serialized objects back in just as simply:

import pickle
# Read back in binary mode, to match how the file was written.
with open('my_list.p', 'rb') as f:
    my_list = pickle.load(f)

Again, that's it! Immediately our list (or any other data structure we save) is loaded right back into memory, ready to be used by the application. There is no ugly parsing of my own hacked-together data format, it is just the beauty of serialized objects and me saving a ton of time.
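Applied to one of the Netflix caches, the same idea might look like the sketch below. The cache contents and the file name `movie_avg.p` are hypothetical, and a temporary directory is used so the example does not litter the working directory. Passing `pickle.HIGHEST_PROTOCOL` selects the newest binary protocol, which is typically more compact and faster than Python 2's default text-based protocol 0:

```python
import os
import pickle
import tempfile

# Hypothetical cache: movie id -> average rating.
movie_avg = {10: 4.5, 11: 2.0}

path = os.path.join(tempfile.mkdtemp(), 'movie_avg.p')

# Write the whole dictionary in one call, using the newest binary protocol.
with open(path, 'wb') as f:
    pickle.dump(movie_avg, f, pickle.HIGHEST_PROTOCOL)

# Later (for example, at application start-up), load it straight back.
with open(path, 'rb') as f:
    loaded = pickle.load(f)

assert loaded == movie_avg
```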

Speaking of saving time, we can actually save even more time in the code by changing the import statement from:

import pickle

to:

import cPickle as pickle

This imports the cPickle module instead of the standard pickle module. The difference is that cPickle is written in C and, according to the Python 2 documentation, can be "up to 1000 times faster" than the pure-Python pickle. You lose the ability to subclass its Pickler and Unpickler, but hey, you can't argue with up to 1000 times faster.
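One caveat worth knowing: `cPickle` only exists in Python 2. Python 3 removed it, and the standard `pickle` module there uses its C accelerator automatically when one is available. A common way to write code that runs under both is a guarded import, sketched below with a quick in-memory round trip via `dumps`/`loads` (the `data` dictionary is just a hypothetical example):

```python
try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: no cPickle; pickle uses its C accelerator when available

data = {'user_avg': {1: 3.0}, 'movie_avg': {10: 4.5}}

# dumps/loads work on an in-memory byte string instead of a file.
blob = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
assert pickle.loads(blob) == data
```

Either branch binds the name `pickle`, so the rest of the program never needs to know which implementation it got.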

Object serialization really is a beautiful thing!

Cheers

Monday, October 8, 2012

Design Trade-Offs

After reading Martin Fowler's "Is Design Dead?", I have come to realize how much debate exists over the effort and thoroughness that should go into design in XP. XP encourages design to emerge indirectly through the development of user stories, and holds that designs should not be "set in stone" when development begins.

When I started my first large project in the summer with ARM, I more or less took the XP approach. I had no idea what issues I would come across in development, and I really had no idea what classes and structures were necessary to complete the job. And that's ok! So I thought. The design would sort itself out in the end.

There's a balance, though. There needs to be some application of design patterns and some up-front structure in place so that refactoring and scalability will not be an issue later. You can't just go aimlessly writing hundreds of lines of code implementing spike solutions to user stories. There still needs to be some thought put into design so that when it comes time to improve the solution, you won't have to rewrite the entire thing.

Again, for my project, I had no idea what problems I would come across, so I formed a "design-as-you-go" mentality: keep the design as simple and as scalable as possible. The design should make sense from a high-level view and apply fluently at the source code level. As the project grows in scope, it can be difficult to balance the work put into design against the work put into user stories. How much time do I spend refactoring code to adhere to a better design? In my experience, if I think a certain piece of code will need to be expanded on later and I believe there is a better design for it, I tend to spend a good amount of time refactoring it. Oftentimes there was simply no way I could have known up front why the new design was better, so there was no way I could have designed it that way in the first place.

Sometimes people get really attached to predetermined designs and force code to adhere to them. The problem is that the code begins to lose its flow, and explaining it to someone else becomes more and more confusing. If you can't explain the overall flow to someone, the design is probably at fault, and it needs to be modified.

Just as people shouldn't get too attached to code, don't get too attached to design! In my opinion, software design is just as important as the code itself, and if it needs to be changed for the better, by all means just do it! Cross it out with a pen! If you force yourself to adhere to a bad pre-determined design, you are going to have a bad time.

Monday, October 1, 2012

Test Driven Development

Last week a visitor from National Instruments came and spoke to our class about the company and how they implement different forms of test driven development. In every previous project where it was used, it was a great success, but he pointed out a few issues that can make it somewhat difficult to implement, ones I have had trouble with in the past as well:

1. Getting everybody on board with test driven development can be a real chore, especially if the project manager is hard-coded with old-fashioned methodologies that say to test later.

2. Sometimes it is difficult to test code that belongs to someone else. When developing a front end or an application that depends on other databases, it is sometimes impossible to test the database communication functions independently. Sometimes only top-level, acceptance-type tests are possible, and that's ok! At least the entire system is testable.

3. It is difficult to test when the project outcome was not well defined before test writing began. Say, for instance, you write 100 tests based on a particular understanding of what the correct output is. If the project manager keeps coming in and saying, "I know we created those stories a month ago, but the output needs to change again," it starts to get very annoying. When the customer/manager/person who wants the application to exist keeps dramatically changing his or her mind about the scope of the project, developers' will to test effectively can really suffer. Establish scope first!

These are just a few, but the first and third are two of the reasons I personally have trouble with test driven development. Still, I will say it is the way to go. The representative from NI (National Instruments) showed me that I need to push myself to communicate better with my manager and learn to persuade the company of things I truly believe will help. TDD is definitely one of those things and should make my job much easier.