U2.com > Welcome

So, I can imagine there will be a few posts about my music preferences scattered here and there; maybe it won't all be about data.  That being said, here is one.

I am a huge fan of U2…actually a fanatic.  The first concert I ever went to was a U2 concert when I was 15 at the old Gillette Stadium.  It was a summer concert, HORRIBLE seats, warm weather, stadium, what seemed like 60K plus U2 fans…just euphoric.

Every tour since has been an arena tour, or played venues probably half that size at best: “more intimate venues.”

Well, this new tour is underway, and they are playing stadiums again.  I am massively excited, but I have been trying to avoid “all things about the concert” as much as possible. For shits and giggles, I ventured onto U2's site and saw this “lead in” picture on the landing page.  It is a picture from the tour.

I haven't seen setlists and have barely seen pictures from the tour, but I have a childlike excitement for this concert!

Oh I hope they dig deep for the setlist, and from the picture, they might have!

Retrieve Files from the Web

One of the things I have been trying to do is learn Python.  I use SPSS every day and am beginning to realize that, with a little bit of effort, I can make my life as a data analyst a lot easier if I learn to retrieve data from the web, pre-process it, and load it into SPSS or my database of choice (remember:  Data Management is a huge issue that many analysts overlook!).

In short, I am trying to learn the tool by applying it to problems I am interested in.  This introductory post will focus on sports, more specifically, NBA data.  Dougstats.com is a great resource for daily and annual data.  He posts text files that aggregate player data for the season.  My objective is to use Python to retrieve the data and throw it into MySQL, so I can use R and SPSS to analyze it.  Essentially, my first task is to learn how to use Python to get external data and structure it so I can use it in a host of tasks.

Hopefully, as time goes on, my coding will improve, but I imagine this first code block is very primitive.  It simply uses Python to navigate to each file and save it to my computer.  This isn't that radical of an idea, but if you are like me and are trying to figure out ways to be more efficient, this should make you smile.   I have yet to figure out how to parse the files into MySQL, but that will come.

Hope this helps:

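# urlretrieve moved in Python 3; this import works on either version.
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

# Season labels as they appear in the Dougstats file names.
years = ['08-09','07-08','06-07','05-06','04-05','03-04','02-03','01-02','00-01']

for year in years:
    # Build the local destination and the remote URL for this season.
    path = r'C:\Sports\NBA\Dougs Stats\%s.txt' % year
    url = r'http://www.dougstats.com/%sRD.Each.txt' % year
    urlretrieve(url, path)  # download the file and save it to path
    print('retrieved file for %s' % year)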

If you are like me, coming to coding from MS Office products and VBA, this code may seem weird, but the logic is hopefully intuitive.  In short, I define a list of season labels.  The loop takes each label in turn, and string substitution (%s inserted into a string) drops it into the path and the URL; the value that follows the % sign, year, is what gets substituted on each pass.  It is as simple as that.
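To see what the substitution actually produces without downloading anything, here is a small sketch. The helper name build_targets is made up for illustration; the path and URL patterns are the same ones used in the script above.

```python
# Sketch: how "%s" substitution builds one path and one URL per season.
# build_targets is a hypothetical helper, not part of the original script.
years = ['08-09', '07-08']

def build_targets(year):
    """Return the (local path, URL) pair for one season label."""
    path = r'C:\Sports\NBA\Dougs Stats\%s.txt' % year
    url = r'http://www.dougstats.com/%sRD.Each.txt' % year
    return path, url

for year in years:
    path, url = build_targets(year)
    print(url)   # first pass prints http://www.dougstats.com/08-09RD.Each.txt
    print(path)  # first pass prints C:\Sports\NBA\Dougs Stats\08-09.txt
```

Printing the strings first is a cheap way to catch a typo in the pattern before any files are requested.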

How to Set Up and Use Eclipse for Python

In addition to PHP, the other language I have been trying to learn is Python. Primarily, I have been trying to pick a new language because I think I have outgrown VBA. I looked at Perl, but there seems to be a ton of help for Python, and since SPSS can be controlled from within the language, I decided to give it a shot. For the most part, I want to use it to get data off of the web and write it into a database or SPSS. In the end, I think I might be able to get dynamic data, model it in SPSS and write reports (maybe into Word?). I am pretty excited about the prospects, but I have a lot to learn.

When I learned VBA, I got used to the editor environment. When I moved to Python, I was trying to find something similar. In Python terms, I was looking for an IDE. You don’t need one of course, as you can just type in commands one by one in the Interactive shell, but that is not exactly the best way to go. I tried PythonWin, but it just wasn’t cutting it.

Up comes Eclipse. I have heard a lot of people talk about it, or reference it, but I couldn't quite figure out what I was supposed to do. Again, I am not a computer scientist; hobby programmer at best. Everything I searched for appeared to be geared towards those who could invent a whole new language, so I felt like it wasn't for me. However, after some searching and patience, I think I figured it out, and after just playing around for an hour or so, I am excited about the possibilities. As such, I am going to outline the basic way I set up the environment and highlight some of the things that stand out to me.

  1. Download Eclipse here: http://www.eclipse.org/downloads/. My first stumbling block was “which one to select?” I first downloaded Eclipse Classic. While I think that was a fine choice, I realized that I probably would want to try my hand at PHP as well, so I subsequently downloaded the PHP developer package and added PyDev to it as described below.  Now I can write PHP and Python code from the same tool!
  2. I unzipped the folder and saved a shortcut to my desktop.
  3. I do not believe that Eclipse natively supports Python, but you can install external modules that expand its capability. To do this, go to Help > Install New Software.
  4. At this screen, you will add a site. The Python functionality sits in a module called PyDev. Point to this address: http://www.fabioz.com/pydev/updates
  5. PyDev will appear as a selection. Select it, accept the terms, and install it. You have to restart Eclipse, but after that, voila, good to go.
  6. You need to set the interpreter. Go to Window > Preferences, select PyDev > Interpreter - Python, click New, and point to the python.exe file in your install folder. On Windows, it is probably C:\Python2X. Select the file and give it a name. NOTE: If you Google “Configure Eclipse Python” or “Setup Eclipse Python”, you should find some help.
  7. Finally, to create a new Python script as you would in PythonWin or any other IDE, go to New and select PyDev Project. Give it a name and select OK (more about projects in a second). Eclipse will put it in the project tree on the left. Right-click the project and select New > PyDev Module. From this point, you can start coding in Python!
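
Once the module exists, a tiny script like the one below is a quick way to check that Eclipse is really running the interpreter you pointed it at. This is only a sketch: the function name interpreter_info is made up for illustration, and it relies on nothing beyond the standard sys module.

```python
import sys

def interpreter_info():
    """Return the path and version of the interpreter running this script."""
    return {
        'executable': sys.executable,          # full path to the python binary in use
        'version': sys.version.split()[0],     # version number only, without build details
    }

if __name__ == '__main__':
    info = interpreter_info()
    print('Interpreter: %s' % info['executable'])
    print('Version: %s' % info['version'])
```

If the path printed in the Eclipse console is not the one you selected in step 6, the interpreter setting is the first place to look.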

Now, as to why I like Eclipse. First, it flags errors as you type, before you run your code. Second, it tells you if a variable or imported module is never used, or is not even available in your Python environment. This is extremely helpful, especially when you are trying to modify code that you found on the Internet.

Also, and this wasn't obvious to me at first, I LOVE the idea of projects. If I am testing code, learning new tricks, or looking at code examples that may be applicable for a task, I save the scripts in the relevant project. For example, I want to collect a lot of data for professional sports, so I created a project for each sport of interest. Essentially, it allows me to organize my work. I can even insert text files as a reminder of things I want to do, need to learn or have learned. Now, it goes without saying that this probably isn't the proper use, but hey, it works for me.

Either way, I think I am really going to like developing in Eclipse. My errors are obvious (I had a hard time deciphering errors in PythonWin) so fixing my code will be easier. There are “tooltips” that help you complete code, which is something I was used to when writing VBA. Finally, I love that I can keep all of my scripts organized together. I will keep you posted if my opinion changes, but as of now, I would GREATLY recommend Eclipse for someone who is trying to learn a programming language.