Monday, November 2, 2009

Setting Up Your Own Jython 2.5.1 Environment

In this post I'll show you how to set up your own Jython environment.

As a prerequisite, you must have a Java Runtime Environment (JRE) installed on your system. If your browser already runs Java applets or Java Web Start applications, chances are you already have a JRE installed.
Get the latest JRE from http://java.com/en/download/manual.jsp.

First, download the Jython binary installer. The latest version is currently 2.5.1, which you can download from this link.

Once the download completes you should have a file named jython_installer-2.5.1.jar. Open the file, or run it from the command line:

[sourcecode lang="shell"]
$ java -jar jython_installer-2.5.1.jar
[/sourcecode]


Or, on Windows:

[sourcecode lang="shell"]
c:\> java -jar jython_installer-2.5.1.jar
[/sourcecode]


Now add your Jython installation folder to your OS PATH variable, so that the shell or command-line interpreter can find Jython. In my sample Windows installation:

[sourcecode lang="shell"]
C:\> set PATH=%PATH%;c:\opt\env\jython2.5.1
[/sourcecode]


Alternatively, if you are running the bash shell (assuming you installed Jython to the /opt/env/jython2.5.1 folder):

[sourcecode lang="shell"]
$ export PATH=$PATH:/opt/env/jython2.5.1
[/sourcecode]
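Before launching Jython, you can confirm that the PATH change took effect by asking a separate CPython interpreter where it would find the launcher. This is a minimal sketch that assumes CPython 3.3 or later is also installed (it relies on `shutil.which`). `find_jython` is a hypothetical helper name and is not part of Jython.

[sourcecode lang="python"]
import shutil


def find_jython(search_path=None):
    """Return the full path of the jython launcher on PATH, or None if absent.

    search_path is an optional os.pathsep-separated string of folders; when it
    is None, shutil.which falls back to the PATH environment variable.
    """
    # On Windows, shutil.which also tries the PATHEXT extensions (such as .BAT),
    # so it can locate jython.bat in the installation folder.
    return shutil.which("jython", path=search_path)


if __name__ == "__main__":
    location = find_jython()
    print(location if location else "jython not found on PATH")
[/sourcecode]

If this prints "jython not found on PATH", open a new shell and check the folder name in the PATH setting again.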


Now you may proceed with the first invocation of Jython.

[sourcecode lang="shell"]
C:\> jython
*sys-package-mgr*: processing new jar, 'C:\opt\env\jython2.5.1\jython.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\resources.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\rt.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\jsse.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\jce.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\charsets.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\dnsns.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\localedata.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\QTJava.zip'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\sunjce_provider.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\sunmscapi.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\sunpkcs11.jar'
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_16
Type "help", "copyright", "credits" or "license" for more information.
>>>
[/sourcecode]



The jython command actually invokes a Windows batch file, jython.bat. When it is invoked for the first time, the Jython interpreter activates sys-package-mgr, which processes the standard Java library jars listed above so that their classes can be imported as Python packages.

After the installation, play around with something simple, such as:

[sourcecode lang="shell"]
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_16
Type "help", "copyright", "credits" or "license" for more information.
>>> from java.lang import System
>>> System.out.println("Hello, world!")
Hello, world!
>>> System.out.println('Hello, world!')
Hello, world!
>>> System.currentTimeMillis()
1257145977410L
>>> from java.lang import String
>>> s = String("abc")
>>> s.toString()
u'abc'
[/sourcecode]


Notice that System.out.println accepts both single-quoted and double-quoted Python strings, and that the java.lang.String returned by a Java method (s.toString()) comes back as a Python Unicode string (u'abc').
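The same Java interoperability can be wrapped in a script that also runs under plain CPython. This is a minimal sketch, and it assumes that checking `sys.platform` for a `java` prefix is enough to detect Jython (the banner above ends with `on java1.6.0_16`, which is that value). `greet` and `running_on_jython` are hypothetical helpers, and the syntax is kept compatible with Jython 2.5.

[sourcecode lang="python"]
import sys


def running_on_jython():
    """True when sys.platform looks like Jython's, e.g. 'java1.6.0_16'."""
    return sys.platform.startswith("java")


def greet(message):
    """Print message through java.lang.System on Jython, else via sys.stdout."""
    if running_on_jython():
        # Java packages are importable as Python packages only under Jython.
        from java.lang import System
        System.out.println(message)
    else:
        sys.stdout.write(message + "\n")


greet("Hello, world!")
[/sourcecode]

Keeping the `java.lang` import inside the Jython branch means CPython never tries to import a package it doesn't have.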

Wednesday, October 21, 2009

PyDev 1.5 on Eclipse Galileo 3.5 SR1

I've finally had more time to explore Eclipse Galileo Service Release 1 (Eclipse 3.5.1). This time I want to set it up for Python development using the PyDev plugin.

PyDev is an Eclipse plugin for developing Python applications. It supports the CPython, Jython and IronPython runtimes, and it supports both interactive and non-interactive Python development in the Eclipse IDE.

The Eclipse IDE (and platform) has become a de facto industry-standard IDE. More and more makers of free, open source, and proprietary software build their systems on the Eclipse platform. Galileo is the codename for the 3.5 release, and the latest one is now SR1 (Service Release 1).

There have been interesting developments in PyDev since its last version, 1.4.7. PyDev has moved under the umbrella of Aptana, and PyDev and PyDev Extension have been merged, with both now open source (previously PyDev Extension, the advanced version of PyDev, was not open source).

The update site has moved from the www.fabioz.com domain (http://www.fabioz.com/pydev) to the pydev.org domain (http://pydev.org/updates). This development looks good to me: it means PyDev is now maintained by its own organization, hopefully with more serious effort.

In Eclipse Galileo, you need to go to the menu and choose: Help > Install New Software...


It will show the Install dialog.


Click the [Add] button near the top of the dialog. The "Add Site" dialog will appear.

Type in the name and location:
Name: "PyDev Update Site" (you can use any name you like)
Location: http://pydev.org/updates


Back in the "Install" dialog, the site you just added fills in the "Work with" field for you.


Now tick the "PyDev for Eclipse" checkbox under "PyDev". You may also tick "Pydev Mylyn Integration"; in this example we leave it unticked.

Click the [Next >] button. It will show you the Install Details:


Proceed by clicking the [Next >] button. The Review Licenses dialog will be displayed.


Click the radio button "I accept the terms of the license agreement" if you agree with the licenses. Proceed by clicking the [Finish] button.


You will see a progress bar. Partway through the installation, you will be asked whether to install software that contains unsigned content.

Click [OK] to proceed (we have no choice as long as the provider of this plugin hasn't signed the content).


Click the [Yes] button to restart your Eclipse IDE.

Tuesday, September 15, 2009

Chardet Python Library: Determining Character Encoding of Text File or Text Stream

Recently I got into a disagreement with a fellow programmer. He had written a few simple programs in the different programming languages he knows: C, C++, Perl, Python, PHP, and Java (I think it was for self-actualization, obviously not for money or wealth). It turned out that he had written XML parsing code in each of those languages.

The code didn't produce the expected output, except, of course, in Python. He reasoned that this was because most of those languages use the libxml2 library (from the GNOME project), and he concluded that the Java library might also use libxml2, which I believe is most unlikely: Java wouldn't normally pick up GNOME libraries for its own class library. It hurt my pride as a Java developer, and I needed to prove that the library was not at fault; instead, the input file was not correctly encoded.

In the Western world most text files play along well most of the time, but characters beyond the ordinary (ASCII) set may be displayed incorrectly. This happens most often in countries whose languages use such special characters!

When I first started working in Singapore, I quickly learned that the ability to display these special characters is an important feature for a web application, especially one catering to the Asian market! You will be asked to do i18n and to display text in Traditional Chinese, Simplified Chinese, Tamil, and Thai characters, to name a few...

It brought me to this question:
If I have an arbitrary text file, how could I know which encoding the text uses?

I googled around and found the Universal Encoding Detector (chardet) library. The chardet site shows how to use it on web sites:

[sourcecode lang="python"]
>>> import urllib
>>> urlread = lambda url: urllib.urlopen(url).read()
>>> import chardet
>>> chardet.detect(urlread("http://google.cn/"))
{'encoding': 'GB2312', 'confidence': 0.99}

>>> chardet.detect(urlread("http://yahoo.co.jp/"))
{'encoding': 'EUC-JP', 'confidence': 0.99}
[/sourcecode]


That brings me to the next question: if the input is not a web site but a file, how do I detect its encoding?

So here is the solution:

[sourcecode lang="python"]
>>> import chardet
>>> fileread = lambda filename: open(filename, "rb").read()
>>> chardet.detect(fileread("italian-english.xml"))
{'confidence': 0.9899999999999999, 'encoding': 'utf-8'}

>>> chardet.detect(fileread("utf8_demo.txt"))
{'confidence': 0.9899999999999999, 'encoding': 'utf-8'}
[/sourcecode]


This way, I confirmed that the text files are most likely UTF-8 encoded.

By the way, the detection engine was ported from Mozilla's auto-detection code, which is widely used, so I believe it is mature and capable enough for most uses.

Another thing to note is that the code above uses lambda expressions, a feature Python borrowed from functional programming.
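Files that begin with a byte order mark (BOM) can be identified without any statistical guessing, so a BOM check is a cheap first step before calling chardet. The sketch below uses a hypothetical helper, `guess_encoding`, which is not part of chardet. It checks the BOM constants from the standard `codecs` module and only falls back to `chardet.detect()` when there is no BOM. chardet is assumed to be installed; if it isn't, the helper reports no guess.

[sourcecode lang="python"]
import codecs

# (BOM, codec name) pairs. The UTF-32 entries come first because the UTF-32 LE
# BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE). The "utf-8-sig",
# "utf-16" and "utf-32" codecs strip the BOM when decoding.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data):
    """Guess the encoding of a bytes object, returning a chardet-style dict."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return {"encoding": name, "confidence": 1.0}
    try:
        import chardet  # third-party package; may not be installed
    except ImportError:
        return {"encoding": None, "confidence": 0.0}
    return chardet.detect(data)


def read_text(filename):
    """Read a file as bytes, guess its encoding, and decode it (UTF-8 if unknown)."""
    with open(filename, "rb") as f:
        data = f.read()
    encoding = guess_encoding(data)["encoding"] or "utf-8"
    return data.decode(encoding)
[/sourcecode]

Opening the file in "rb" mode matters here because both the BOM check and chardet work on raw bytes, not decoded text.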

Thursday, August 20, 2009

Python 3.1 File I/O open() Is No Longer Binary Mode By Default

After a long break for my wedding preparations, the big day, and the honeymoon, I'm back in shape.

Some things happened during my time off, such as the acquisition of SpringSource by VMware, Inc.

I tried to run a piece of XML parsing code like this in Python 3.1:

[sourcecode lang="python"]
import xml.parsers.expat
import sys
import os.path

filename = "companies.xml"
parser = xml.parsers.expat.ParserCreate()
f = open(filename, "r")
parser.ParseFile(f)
f.close()
[/sourcecode]

The code above throws an error like this:

[sourcecode lang="shell"]
TypeError: read() did not return a bytes object (type=str)
[/sourcecode]

This code works in Python 2.6 but fails on Python 3.1.

After comparing the documentation for the built-in open() function in the two versions (2.6 and 3.x), I found a change in Python 3.x that is not backward compatible (unlike transitions between 2.x versions, the transition from 2.x to 3.x may break backward compatibility).

Python 3.x documents the open() mode characters "t" for text (the default), "b" for binary, "+" for updating (reading and writing), and "U" for universal newlines.

It turns out that Python 3.1 no longer opens files as binary by default. Text mode ("t") is now the default, so read() returns str (Unicode text), whereas in Python 2.6 read() returns a byte string whether or not "b" is given. In Python 3.x, open() is implemented by the new io module instead of being a thin wrapper around the C stdio library.
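Reading the same bytes both ways shows the difference. This is a minimal Python 3 sketch; the temporary file and the `read_both_ways` helper are illustrative assumptions and are not part of the original code.

[sourcecode lang="python"]
import os
import tempfile


def read_both_ways(path):
    """Return the file's contents read in text mode and in binary mode."""
    with open(path, "r") as f:   # "r" is the same as "rt": text mode, returns str
        text = f.read()
    with open(path, "rb") as f:  # binary mode, returns bytes
        raw = f.read()
    return text, raw


fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(b"<companies/>")
text, raw = read_both_ways(path)
print(type(text).__name__, type(raw).__name__)  # prints: str bytes
os.remove(path)
[/sourcecode]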

So in Python 3.x we must add "b" to the mode string to get binary access, so that read() returns bytes; otherwise it returns str.

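[sourcecode lang="python"]
import xml.parsers.expat

xmlFilename = "companies.xml"
p = xml.parsers.expat.ParserCreate()
# "rb" opens the file in binary mode, so read() returns the bytes ParseFile() expects;
# the with statement closes the file even if parsing fails
with open(xmlFilename, "rb") as f:
    p.ParseFile(f)
[/sourcecode]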

Now ParseFile() gets what it asks for: a file object whose read() method returns bytes instead of str.
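ParseFile() accepts any object whose read() returns bytes, so the rule is easy to check with in-memory files. This is a small sketch that uses an illustrative XML snippet in place of companies.xml. `element_names` is a hypothetical helper.

[sourcecode lang="python"]
import io
import xml.parsers.expat


def element_names(binary_file):
    """Parse XML from a binary file-like object and list start-tag names in order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once for every start tag; attrs is a dict of the tag's attributes.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.ParseFile(binary_file)
    return names


print(element_names(io.BytesIO(b"<companies><company/></companies>")))
# prints: ['companies', 'company']

try:
    # A text stream returns str from read(), which ParseFile() rejects.
    element_names(io.StringIO("<companies/>"))
except TypeError as exc:
    print("TypeError:", exc)
[/sourcecode]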

Saturday, July 18, 2009

PyDev 1.4.7 on Eclipse Galileo

WARNING:

The content of this blog post is now obsolete, as the newer PyDev 1.5 has been released. Please see my latest blog post on it.

I've finally had more time to explore Eclipse Galileo (3.5) again. This time I want to set it up for Python development using the PyDev plugin. The current PyDev plugin version is 1.4.7.

I assume you already have a working installation of Galileo and have opened a workspace.

The update site for PyDev is http://pydev.sourceforge.net/updates

In Galileo, choose: Help > Install New Software...

[caption id="attachment_605" align="aligncenter" width="313" caption="Adding the PyDev Plugin to Eclipse Galileo"]Adding the PyDev Plugin to Eclipse Galileo[/caption]

The "Install" dialog will show up.


Click the [Add] button at the top right of the dialog. You should see the "Add Site" dialog.

Type in the data:

Name: Py Dev Update Site
Location: http://www.fabioz.com/pydev/


Click the [OK] button.

Tick the "PyDev for Eclipse" checkbox under "PyDev", then click the [Next >] button.

Click [Next >].


Click the radio button "I accept the terms of the license agreement" if you agree, then click the [Finish] button.

You will see the progress bar.


On completion of the installation process, Eclipse will ask you whether you want to restart.

Answer by clicking the [Yes] button.

Done.

Thursday, July 9, 2009

Python's Lazy Evaluation on Exception Handling

I tried a simple code like this:

[sourcecode lang="python"]
try:
    while True:
        print('yipee')
except KeyboardInterrupt:
    print('w00t')
[/sourcecode]

It runs successfully from the command-line interface and keeps printing "yipee" until the user presses Ctrl-C. Then I made a typo, misspelling "KeyboardInterrupt" as "KeybaordInterrupt".

[sourcecode lang="python"]
try:
    while True:
        print('yipee')
except KeybaordInterrupt:
    print('w00t')
[/sourcecode]

To my surprise, it still ran fine and kept printing "yipee" to the screen. Only when I pressed Ctrl-C did Python throw this error:

[sourcecode lang="shell"]
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
NameError: name 'KeybaordInterrupt' is not defined
[/sourcecode]

I looked at the Python 3.1 documentation, which says:
The try statement works as follows.

First, the try clause (the statement(s) between the try and except keywords) is executed.

  • If no exception occurs, the except clause is skipped and execution of the try statement is finished.

  • If an exception occurs during execution of the try clause, the rest of the clause is skipped. Then if its type matches the exception named after the except keyword, the except clause is executed, and then execution continues after the try statement.

  • If an exception occurs which does not match the exception named in the except clause, it is passed on to outer try statements; if no handler is found, it is an unhandled exception and execution stops with a message as shown above.


So exception handling in Python is lazy in this sense: the expression after except is only evaluated when an exception actually reaches that clause, which is why a misspelled exception name goes unnoticed until then!
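You can make the lazy evaluation visible by putting a function call in the except clause. This is a minimal Python 3 sketch. `handler_type`, `divide`, `misspelled`, and `evaluations` are hypothetical names used only for this demonstration.

[sourcecode lang="python"]
evaluations = []


def handler_type():
    """Record each time the except-clause expression is evaluated."""
    evaluations.append("evaluated")
    return ZeroDivisionError


def divide(a, b):
    try:
        return a / b
    except handler_type():  # evaluated only if the try clause raises
        return None


print(divide(6, 3), evaluations)  # prints: 2.0 []
print(divide(1, 0), evaluations)  # prints: None ['evaluated']


def misspelled(fail):
    try:
        if fail:
            raise KeyError("boom")
        return "ok"
    except KeybaordInterrupt:  # deliberate typo, looked up only when an exception occurs
        return "handled"


print(misspelled(False))  # prints: ok
[/sourcecode]

Calling `misspelled(True)` raises NameError, just like pressing Ctrl-C in the loop above, because the misspelled name is only looked up once there is an exception to match.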

Monday, February 23, 2009

Print command now as a function in Python 3.0

Recently I managed to play around with Python again. Python 3.0 (I'm now using 3.0.1) treats print as an ordinary function.

Previously it was a special statement:

[sourcecode lang="python"]
print 'hello, world!'
[/sourcecode]

which was treated as a different animal from all other Python functions. Now the Python 3.0 team has brought print back into the mainstream:

[sourcecode lang="python"]
print('hello, world')
[/sourcecode]

The odd trailing-comma behavior of the GW-BASIC-like print statement has also been fixed. In Python 2.x we had to append a comma to tell the interpreter not to print a newline character after a print statement:

[sourcecode lang="python"]
print "I don't want to skip to",
print "next line"
[/sourcecode]

which prints the following (note the space inserted between "to" and "next line"):

[sourcecode lang="shell"]
I don't want to skip to next line
[/sourcecode]

Sometimes, or even most of the time, this behavior is considered weird. Of course it is not weird to a former GW-BASIC/BASICA programmer like me, but to the rest of the world, who only know Pascal, C, Java, C++, etc., it is weird syntax!

Now Python 3.0 comes with the print as a normal Python function:

[sourcecode lang="python"]
print("I don't want to skip to", end="")
print("next line")
[/sourcecode]

which will print:

[sourcecode lang="shell"]
I don't want to skip tonext line
[/sourcecode]
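Because print() is now an ordinary function, its keyword arguments can also reproduce the old trailing-comma result or redirect output. This is a minimal sketch that uses the built-in `sep`, `end`, and `file` parameters. `render` is a hypothetical helper that captures the output as a string.

[sourcecode lang="python"]
import io


def render(*items, sep=" ", end="\n"):
    """Return exactly what print(*items, sep=sep, end=end) would write."""
    buffer = io.StringIO()
    print(*items, sep=sep, end=end, file=buffer)
    return buffer.getvalue()


# end="" suppresses the newline, as in the example above.
print(repr(render("I don't want to skip to", end="") + render("next line")))
# prints: "I don't want to skip tonext line\n"

# end=" " gives the same visible result as the Python 2.x trailing comma.
print(repr(render("I don't want to skip to", end=" ") + render("next line")))
# prints: "I don't want to skip to next line\n"

# sep replaces the single space placed between items.
print(repr(render(2009, 2, 23, sep="-")))
# prints: '2009-2-23\n'
[/sourcecode]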

It's good to see that some of these inconsistencies have now been fixed. Such changes take a lot of consideration, since the Python developers normally avoid breaking compatibility with previous versions. The 3.0 release is the exception: it was deliberately released as a breaking version, with some backward incompatibilities with the previous 2.x versions.