Software Engineering Adventures

Sunday, October 28, 2012

Keyword Argument Unpacking in Python

Recently in Software Engineering we learned about "keyword argument unpacking" in Python. When I learn language quirks and features like this, I often think about how the feature can help my program flow in a more natural way than if the feature did not exist in the first place.

Keyword argument unpacking basically works by prefixing a dictionary with two asterisks in a function or method call. Each key of the dictionary must be a string that matches one of the function's parameter names, and each value is passed as the argument for that parameter.

So basically, say we have the following function:

def f(x, y, z):
    return [x, y, z]

We can make the following call:

f(**dict([('x', 1), ('y', 2), ('z', 4)]))

Or the following call:

f(**dict([('y', 2), ('z', 4), ('x', 1)]))

Or pass the keys in any other order and still get the same list back: [1, 2, 4]

Last week this proved to be particularly useful in dealing with Google App Engine models, mainly because the models can often be constructed with a subset of the possible arguments. If you only want one call to the model constructor and the number of arguments is only known at runtime, then simply building a dictionary with the argument names and values of the call seems to be a very elegant solution. Before I thought to use it, I had a very ugly block with many if and else statements creating different models depending on which arguments to the model constructor were valid. Again, this way is much more elegant. It is limited, but I do believe I have found one of my new favorite features of Python.
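
Here is a minimal sketch of that idea. It does not use the real App Engine API: Movie is a made-up plain class standing in for a model with optional fields, and build_movie and the field names are hypothetical too.

```python
class Movie(object):
    """Hypothetical stand-in for a model whose fields are all optional."""

    def __init__(self, title=None, year=None, rating=None):
        self.title = title
        self.year = year
        self.rating = rating


def build_movie(raw):
    """Build a Movie from whichever fields are actually present in raw."""
    kwargs = {}
    for field in ('title', 'year', 'rating'):
        value = raw.get(field)
        if value is not None:
            kwargs[field] = value
    # One constructor call, however many arguments were collected.
    return Movie(**kwargs)


# Only two of the three fields are supplied; rating keeps its default.
movie = build_movie({'title': 'Brazil', 'year': 1985})
print(movie.title)
print(movie.rating)
```

The loop replaces the chain of if and else statements: the dictionary only ever contains the arguments that are valid for this particular call.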

Sunday, October 21, 2012

Editors vs IDEs in a Group Environment

Last semester in Object Oriented Programming I touched on this subject a bit in my blog there, but now that I've gotten the chance to assess this subject from a group's perspective, I thought I would give it another shot.

Both editors and IDEs (Integrated Development Environments) have their place. People will moan over how editors are "so superior" to IDEs in every way, but the simple truth is that some things are just better done in a full-blown IDE. In smaller projects and most projects done by myself, I will choose to do them with vim (the best editor :) ). I can search and/or replace faster than you can blink, I can move around the document painlessly, and basically any Unix machine around will have it. Combine it with a terminal multiplexer like tmux (which is also installed on all the CS Linux machines thanks to me) and you have got yourself a serious development setup.

The thing is though, this really only works for small jobs. When I say small, I mean a project that takes a week, maybe a couple of weeks, to finish. I love working with vim, but when the projects get large with oodles of different files and formats, using a terminal with a lightweight editor just for the hell of it can get a little cumbersome. The truth is that the developers of these IDEs just put too many awesome features in them for large-scale projects to pass up. In a group environment, some people are often just not comfortable with editors yet, and to be honest editors can be a little intimidating to those who are not used to them.

The current project for Software Engineering calls for the creation of an XML schema, an XML file which uses the schema, and the translation of the XML into Google App Engine models, which will be used to display web pages interfacing with the models. In my experience, it is extremely important to establish the environments the group will be working with early on, so that the members of the group can help each other with any environment problems and the rest of the time can be spent on development.

So what did we choose? Eclipse of course! In our case, we chose Eclipse for Java EE Developers, an Eclipse package that includes a ton of awesome tools for web development. Eclipse is cross-platform, so we can use it on our Linux, Windows, and OS X machines without a problem. We then all installed EGit, a Git plugin for Eclipse, so we can communicate with our Git repository on GitHub. This really made the creation of the XML a breeze and allowed us all to work in essentially the same environment regardless of what systems we were running. We also installed the PyDev plugin for Eclipse and the Google App Engine SDK so we could run whatever Python code we needed from within Eclipse. As an extra touch I also installed a vi-binding plugin so that I could get a similar feel to vim within Eclipse.

Could we have done this all in our own separate environments? Probably. Could we have done it as quickly? Questionable. Granted, it did take me quite a bit of time to establish what I thought would be a good environment for the team based on my experience, but it definitely paid off. We were able to work in a more uniform manner, communicate more effectively, and use some awesome new tools!

Sunday, October 14, 2012

Pickles in Python

In the last project for Software Engineering, Netflix, we were asked to guess the ratings different users would give to different movies based on information from a large dataset of users, movies, and ratings. Sadly, for this project I was forced to work alone, and I knew I needed to be as efficient as possible in order to get done in time. The project called for different caches built from the data: average ratings for different users, average ratings for different movies, standard deviations of users' ratings from their mean rating, and each user's average rating per decade (the decade the movie was in). Needless to say, parsing all this data could have been a real chore if I was not careful, and not just parsing it, but outputting it (caching it) into a format my top-level application could read later.

Recently at my job, the application I am creating called for object serialization, which is basically a way for object instances in a running application to be "serialized" and written out through some kind of data stream, usually a file. I knew something had to exist in Python for object serialization, and I found the pickle module, which is exactly that.

Now to the fun stuff...

The pickle module basically allows us to write most Python data structures to a file like so:

import pickle
my_list = ['f', 'o', 'o', 'e', 'y']
# Pickle files should be opened in binary mode.
with open('my_list.p', 'wb') as f:
    pickle.dump(my_list, f)

That's it! And it is an awfully nice way to store the caches for the Netflix project. We can simply read these serialized objects back in like so:

import pickle
# Read the file back in binary mode.
with open('my_list.p', 'rb') as f:
    my_list = pickle.load(f)

Again, that's it! Immediately our list (or any other data structure we save) is loaded right back into memory, ready to be used by the application. There is no ugly parsing of my own hacked-together data format. It is just the beauty of serialized objects and me saving a ton of time.
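
To make this concrete for the kind of caches described above, here is a rough sketch. The ratings list, the variable names, and the file name are all made up for illustration and are not the actual project data or code.

```python
import os
import pickle
import tempfile

# Hypothetical (movie_id, rating) pairs standing in for the real dataset.
ratings = [(1, 4), (1, 5), (2, 3), (2, 2), (2, 4)]

# Build the cache: movie_id -> average rating.
totals = {}
for movie_id, rating in ratings:
    count, total = totals.get(movie_id, (0, 0))
    totals[movie_id] = (count + 1, total + rating)

movie_averages = {}
for movie_id, (count, total) in totals.items():
    movie_averages[movie_id] = float(total) / count

# Write the cache once...
cache_path = os.path.join(tempfile.mkdtemp(), 'movie_averages.p')
with open(cache_path, 'wb') as f:
    pickle.dump(movie_averages, f)

# ...and read it back later without any custom parsing.
with open(cache_path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == movie_averages)  # True
```

The dictionary comes back with the same keys and values it was saved with, so the top-level application can use it directly.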

Speaking of saving time, we can actually save even more time in the code by changing the import statement from:

import pickle

to:

import cPickle as pickle

This imports the cPickle module instead of the standard pickle module. The difference is that cPickle is written in C and is "up to 1000 times faster" than the Python version of pickle. You lose the ability to subclass Pickler and Unpickler, because cPickle implements them as functions rather than classes, but hey, you can't argue with up to 1000 times faster.
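
One caveat: cPickle only exists in Python 2. In Python 3 the standard pickle module already uses its C implementation when one is available. A small sketch of an import that works either way (the data dictionary is made up for illustration):

```python
try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: uses the C accelerator automatically if present

# Hypothetical cache contents, just to show a round trip in memory.
data = {'user_averages': {42: 3.5}}
restored = pickle.loads(pickle.dumps(data, pickle.HIGHEST_PROTOCOL))
print(restored == data)  # True
```

The rest of the code keeps calling pickle.dump and pickle.load exactly as before, whichever module ended up being imported.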

Object serialization really is a beautiful thing!

Cheers

Monday, October 8, 2012

Design Trade-Offs

After reading Is Design Dead?, I have come to realize how much debate exists over the amount of effort and thoroughness that should be put into the design stage of XP. XP encourages design to emerge indirectly through the development of user stories, and holds that designs should not be "set in stone" when development begins.

When I started my first large project in the summer with ARM, I more or less took the XP approach. I had no idea what issues I would come across in development, and I really had no idea what classes and structures were necessary to complete the job. And that's ok! So I thought. The design would sort itself out in the end.

There's a balance though. There needs to be some application of design patterns and some structural forethought so that refactoring and scalability will not be an issue later. You can't just go aimlessly writing hundreds of lines of code implementing spike solutions to user stories. There still needs to be some thought put into design so that when it comes time to improve the solution, you won't have to rewrite the entire thing.

Again, for my project, I had no idea what problems I would come across, so I sort of formed a "design-as-you-go" mentality. Keep the design as simple as possible and as scalable as possible. The design should make sense from a high-level view and should translate cleanly to the source code. As the project grows in scope, it can be difficult to balance the work put into design and the work put into user stories. How much time do I spend refactoring code in order to adhere to a better design? In my experience, if I think that a certain piece of code will need to be expanded on later, I tend to spend a good amount of time refactoring it if I believe there is a better design. Often there is just no way I could have known up front why the new design was better, so there is no way I could have designed it that way in the first place.

Sometimes people can get really attached to pre-determined designs and force code to adhere to them. The problem is that the code will begin to lose its flow, and explaining it to someone else will become more and more confusing. If you can't explain the overall flow to someone, it's probably a bad idea, and if that is due to bad design, the design needs to be modified.

Just as people shouldn't get too attached to code, don't get too attached to design! In my opinion, software design is just as important as the code itself, and if it needs to be changed for the better, by all means just do it! Cross it out with a pen! If you force yourself to adhere to a bad pre-determined design, you are going to have a bad time.

Monday, October 1, 2012

Test Driven Development

Last week we had a visitor from National Instruments come and speak to our class about the company and how they implement different forms of test driven development. In every previous project where it was used it was a great success, but he showed a few issues that can make it somewhat difficult to implement (ones I have had issues with in the past as well):

1. Getting everybody on board with test driven development can be a real chore, especially if the project manager is hard-wired with old-fashioned methodologies which say to test later.

2. Sometimes it is difficult to test code which belongs to someone else. When developing a front end or application which depends on other databases, it is sometimes impossible to test the database communication functions independently. Sometimes only top-level, acceptance-type tests are possible, and that's ok! At least the entire system is testable.

3. It is difficult to test when the project outcome was not defined well enough when the test writing began. Let's say for instance you write 100 tests based on a particular understanding of what the correct output is. If the project manager keeps coming in and saying, "I know we created those stories a month ago, but the output needs to change again", it starts to get very annoying. When the consumer/manager/person who wants the application to exist keeps dramatically changing his or her mind about the scope of the project, the developers' will to test effectively can really be impacted. Establish scope first!

These are just a few, but the first and third of these examples are two of the reasons why I personally have trouble with test driven development. I will say it is the way to go. The representative from NI (National Instruments) really encouraged me to push myself to communicate better with my manager and to learn to persuade the company of things I truly believe will help. TDD is definitely one of those things and should make my job much easier.

Saturday, September 22, 2012

Some Python Nuances

My partner and I have been struggling a little bit with the current project due to some misunderstood Python nuances. Our current solution to the Project File Dependencies problem on Sphere calls for a list of lists. Seems like standard procedure, right? We should just use Python's overloaded * operator to initialize the data structure like so:

matrix_size = 5
matrix = [[]] * matrix_size

If we print out the matrix, we get the following output:

[[], [], [], [], []]

It seems like it is working just fine, but if we do a simple operation, like appending an element to the list at index 0, funny things start to happen. See the following operation:

matrix[0].append("?")

and the contents of the matrix afterwards:

[['?'], ['?'], ['?'], ['?'], ['?']]

How does this make sense? What were the creators of Python thinking? At first I did not understand why this is the case. It is acting like each element of the matrix is just a reference to one and the same list, and that is exactly what is happening: the * operator repeats the reference to the single inner list instead of copying it. In order to get the functionality we wanted we had to do something like the following:

matrix = [[] for i in range(matrix_size)]

which seems to be working just fine because a new list is created for each index.
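
A quick way to see the difference between the two is to compare the elements with Python's is operator. This is just a small sketch, and the names shared and separate are mine:

```python
matrix_size = 5

shared = [[]] * matrix_size                  # five references to ONE list
separate = [[] for _ in range(matrix_size)]  # five distinct lists

print(shared[0] is shared[1])      # True: same object
print(separate[0] is separate[1])  # False: different objects

shared[0].append("?")
separate[0].append("?")

print(shared)    # [['?'], ['?'], ['?'], ['?'], ['?']]
print(separate)  # [['?'], [], [], [], []]
```

The * operator is still perfectly safe for immutable elements, for example [0] * matrix_size, because numbers cannot be modified in place.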

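Another issue we were having lies in the annoyance that Python has no increment or decrement operators! And if you try to use the prefix forms, nothing happens! Writing ++i or --i is valid Python syntax, because it just applies the unary plus or minus operator twice and leaves i unchanged. (The postfix forms i++ and i-- are syntax errors.) Use i += 1 or i -= 1 instead.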
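
A small sketch of what is going on. The variable names are arbitrary, and compile is only used here so the syntax error can be caught and shown:

```python
i = 5

++i       # parsed as +(+i): computes 5 and throws the result away
--i       # parsed as -(-i): also computes 5 and throws it away
print(i)  # 5

i += 1    # the actual way to increment
print(i)  # 6

# The postfix form does not even parse.
try:
    compile("i++", "<example>", "exec")
    postfix_is_valid = True
except SyntaxError:
    postfix_is_valid = False
print(postfix_is_valid)  # False
```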

It is all ok though. With every new language comes a little bit of headache. As far as interpreted languages go, Python is coming to be one of my favorites so far, and as long as I can minimize future headaches like the first one in this post I think I will be quite alright.

Monday, September 17, 2012

Write Tests Before Development

In my recent programming travels at work I have come across a growing problem: communicating the importance of testing to the project manager. Sometimes, oddly enough, the manager does not allocate time for testing in the development cycle, and simply accepts a spike solution as the final solution and asks for more features.

To a project group not accustomed to the methods of Extreme Programming, how does one communicate the effectiveness of the programming methodology to his or her peers? In my opinion it is simple: you just do it anyway.

Sure, the manager might moan about how it is taking you a little longer to get things "complete", but the difference is code that "somewhat provably works in all possible cases" versus code that "works in a small demo and might work in more cases". Writing tests first makes sense for a variety of reasons. First, it allows you to define the exact purpose of a method or function by checking different inputs against defined outputs. Every time I change or refactor some code in a class, I don't want to have to open up the GUI and click through different execution paths just to test some small method. Writing the tests first also allows you to double check a story with the "customer" who created the story (in my case, the manager) and make sure the expected output is really what is expected. My final reason for writing the tests first is that you won't be tempted into not writing them later. A lot of times when we write code and build on it, we can get the illusion that the building-block classes underneath a tested outer-level class work correctly, which might not necessarily be the case. If someone writes another API or class that utilizes an untested class, it opens up the door for new execution paths which can introduce new bugs.
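
As a rough sketch of what "tests first" can look like with Python's built-in unittest module: the function average_rating and its expected behavior are made up for illustration, not taken from my work project.

```python
import unittest


# Step 1: write the tests first. They pin down what the (hypothetical)
# function average_rating must do before any implementation exists.
class TestAverageRating(unittest.TestCase):
    def test_typical_ratings(self):
        self.assertEqual(average_rating([4, 5, 3]), 4.0)

    def test_single_rating(self):
        self.assertEqual(average_rating([2]), 2.0)

    def test_no_ratings(self):
        self.assertRaises(ValueError, average_rating, [])


# Step 2: write just enough code to make the tests pass.
def average_rating(ratings):
    if not ratings:
        raise ValueError("no ratings to average")
    return float(sum(ratings)) / len(ratings)


# Run the tests without opening any GUI.
suite = unittest.TestLoader().loadTestsFromTestCase(TestAverageRating)
result = unittest.TextTestRunner().run(suite)
```

The three test methods double as a written-down version of the story, so they are also something concrete to show the "customer" before the implementation is finished.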

Always test your code. Test first, and your life will be much easier later on in the development cycle. I can't stress it enough!