How good? Let me tell you...
I am a big Alexandre Dumas fan. He's the direct ancestor of Neal Stephenson, so many of you should like him too. So I used one of his best books to try some automatic typesetting of Project Gutenberg texts.
No, the whole book did not convert without errors, and yes, there is some manual work in what you are about to see, but hey, take a look.
Here's a distant view of the first two pages:
And here's some detail of the typesetting:
Yes, the typesetting is not really LaTeX quality, but it's not bad, either.
Compare it with the HTML version at Project Gutenberg. The typesetting is a thing of beauty compared to that :-(
The image is a picture of Chateau d'If from flickr, released under Creative Commons. The title font is Scriptina; I chose it because it looks 19th century but modern.
I did a release yesterday, and another today of my rst-to-pdf-without-latex tool. What's new? Here's an incomplete list:
New in 0.4
New in 0.3
Of course, since I said I would release something every Friday, this means I need to find something else to release ;-)
This mini-sprint is doing wonders for rst2pdf. Now on SVN: pygments-based syntax highlighting. Example here: rst2pdf's code, in a PDF by rst2pdf.
Following my new policy of one release every Friday, in 6 days you will see a rst2pdf release. But not just any release: a great release.
What will be new?
I intend to call this release 0.3.0, but maybe I will jump higher, since there is not much more left to implement.
Since revision 17 you can display page numbers in headers and footers (only!) by using this syntax:
.. header:: This is the header. Page ###Page###

This is the content

.. footer:: This is the footer. Page ###Page###
It has some issues if your page number is bigger than 99999999999 or your header/footer is a little longer than one line when using the placeholder, because the space required is calculated with the placeholder instead of with the number, but those are really marginal cases.
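A small sketch of how a program might emit that syntax. The helper name `with_page_numbers` and its parameters are made up for illustration; only the `###Page###` placeholder and the header and footer directives come from rst2pdf.

```python
PAGE = "###Page###"  # placeholder that rst2pdf replaces with the page number


def with_page_numbers(body, header="", footer="Page " + PAGE):
    """Wrap reST body text with header and footer directives.

    Hypothetical helper: the name and arguments are illustrative only.
    The header and footer texts are expected to be single lines.
    """
    parts = []
    if header:
        parts.append(".. header:: " + header)
    parts.append(body.strip())
    if footer:
        parts.append(".. footer:: " + footer)
    # Blank lines separate the directives from the body.
    return "\n\n".join(parts) + "\n"


print(with_page_numbers("This is the content", header="My report. Page " + PAGE))
```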
Next in line, a decent way to define custom stylesheets.
As for "time-based releases", I intend to release a new version of something every Friday.
Since I have about a dozen projects in different stages of usability, I expect this will push me a bit more towards showing this stuff instead of letting it rot on my hard drive and in unknown svn repos.
This article is inspired by a thread in the PyAr mailing list. Here's the original question (translated):
From: Daniel Padula
I need some advice. I need to create an application for schools that takes student data (personal information, subjects, grades, etc) and produces their grade report. I need to create a printed copy, and keep a historic record.
As a first step, I thought of generating them in PDF via reportlab, but I want opinions. For example, I can generate the PDF, print it and regenerate it if I need to reprint it. What other options do you see? It's basically text with tables. Reportlab? LaTeX? Some other tool?
To this I replied suggesting Restructured Text, which, if you follow my blog, should surprise no one at all ;-)
In this story I will try to bring together all the pieces to turn a chunk of Python data into a nice PDF report. Hope it's useful for someone!
Here's an example I posted in that thread: how to create a PDF with two paragraphs, using restructured text:
This is a paragraph. It has several lines, but what it says does not matter.
I can press enter anywhere, because
it ends only on a blank line. Like this.

This is another paragraph.
And here's what you need to do in reportlab:
# -*- coding: utf-8 -*-
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch

styles = getSampleStyleSheet()

def go():
    doc = SimpleDocTemplate("phello.pdf")
    Story = [Spacer(1,2*inch)]
    style = styles["Normal"]
    p = Paragraph('''This is a paragraph. It has several lines, but what it says does not matter.
I can press enter anywhere, because
it ends when the string ends.''', style)
    Story.append(p)
    p = Paragraph('''This is another paragraph.''', style)
    Story.append(p)
    Story.append(Spacer(1,0.2*inch))
    doc.build(Story)

go()
Of course, you could write a program that takes text separated in paragraphs as its input, and creates the reportlab Paragraph elements, puts them in the Story and builds the document.... but then you are reinventing the restructured text parser, only worse!
Restructured text is data. Reportlab programs are code. Data is easier to generate than code.
You create a file with the data in it and process it via one of the many rst->pdf paths (I suggest my rst2pdf script, but feel free to use the other 9 alternatives).
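A minimal sketch of that workflow, assuming rst2pdf is installed and on the PATH. The function names and file names are invented for the example; the `rst2pdf input -o output` invocation is the tool's usual command line.

```python
import shutil
import subprocess


def paragraphs_to_rst(paragraphs):
    """Join plain-text paragraphs into reST: one blank line between each."""
    return "\n\n".join(p.strip() for p in paragraphs) + "\n"


def build_pdf(rst_text, rst_path="report.rst", pdf_path="report.pdf"):
    """Save the reST source, then convert it if rst2pdf is available."""
    with open(rst_path, "w", encoding="utf-8") as f:
        f.write(rst_text)
    if shutil.which("rst2pdf") is None:
        return False  # keep the .rst file; it can be converted later
    subprocess.run(["rst2pdf", rst_path, "-o", pdf_path], check=True)
    return True


text = paragraphs_to_rst(["This is a paragraph.", "This is another paragraph."])
print(text)
```

Calling `build_pdf(text)` writes `report.rst` and, when the tool is found, `report.pdf` next to it.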
Suppose you have the following data:
frobtimes = [[1,3],[3,5],[9,8]]
And you want to produce this report:
Frobniz performance
===================

* 1 frobniz: 3 seconds
* 3 frobniz: 5 seconds
* 9 frobniz: 8 seconds
You could do it this way:
print('''Frobniz performance
===================
''')
for ft in frobtimes:
    print('* %d frobniz: %d seconds\n' % (ft[0], ft[1]))
What you want is to use, say, Mako (or whatever). It's going to be better than your homebrew solution anyway. Here's the template for the report:
${title('Frobniz Performance')}
% for ft in frobtimes:
* ${ft[0]} frobniz: ${ft[1]} seconds
% endfor
This uses a function title defined thus:
title = lambda text: text + '\n' + '=' * len(text) + '\n\n'
You could generalize it to support multiple heading levels:
title = lambda text, level: text + '\n' + '=-~_#%^'[level] * len(text) + '\n\n'
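The same idea written as an ordinary function, in case a lambda feels cramped. This is only a sketch: the default level and the range check are assumptions added for the example, not part of the one-liner.

```python
UNDERLINES = "=-~_#%^"  # one underline character per heading level


def title(text, level=0):
    """Return text as a reST section title for the given heading level."""
    if not 0 <= level < len(UNDERLINES):
        raise ValueError("unsupported heading level: %r" % level)
    # The underline must be at least as long as the title text.
    return text + "\n" + UNDERLINES[level] * len(text) + "\n\n"


print(title("Frobniz Performance"))
print(title("Details", level=1))
```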
One very common feature of reports is tables. In fact, it would be more natural to present our frobniz report as a table. The bad news is what tables look like in restructured text:
+---------+----------------+
| Frobniz | Time (seconds) |
+---------+----------------+
|        1|              3 |
+---------+----------------+
|        3|              5 |
+---------+----------------+
|        9|              8 |
+---------+----------------+
Which is very pretty, but not exactly trivial to generate. But don't worry, there is a simple solution for this, too: CSV tables:
.. csv-table:: Frobniz time measurements
   :header: Frobniz,Time(seconds)

   1,3
   3,5
   9,8
Produces this:
| Frobniz | Time(seconds) |
|---|---|
| 1 | 3 |
| 3 | 5 |
| 9 | 8 |
And of course, there is Python's csv module if you want to be fancy and avoid trouble with delimiters, escaping and so on:
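import csv
from io import StringIO

def table(title, header, data):
    head = StringIO()
    body = StringIO()
    # The csv module takes care of delimiters and escaping
    csv_writer = csv.writer(head, dialect='excel')
    csv_writer.writerow(header)
    head = head.getvalue().strip()
    csv_writer = csv.writer(body, dialect='excel')
    for row in data:
        csv_writer.writerow(row)
    # Indent every row so it stays inside the directive
    body = '\n'.join('   ' + line for line in body.getvalue().splitlines())
    return '''.. csv-table:: %s
   :header: %s

%s
''' % (title, head, body)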
This will produce neat, ready-to-use csv-table directives for restructured text.
This Python program is really generic. All it has to do is combine a template (an external text file) with data in the form of a bunch of Python variables.
But how do we get the data? Well, from a database, usually. But it can come from anywhere. You could be making a report about your del.icio.us bookmarks, or about files in a folder, this is really generic stuff.
What would I use to get the data? I would use JSON in the middle. I would make my report generator take the following arguments:
That way, the program will be completely generic.
So, put all this together, and there's the superduper magical report generator.
Once you get rst, pass it through something to create PDFs, but store only the rst, which is (almost) plain text, searchable, easy to store, and much smaller.
I don't expect such a report generator to be over 50 lines of code, including comments.
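To make that estimate concrete, here is a rough sketch of such a generator. Several things in it are assumptions rather than a description of a finished tool: the JSON layout (`title`, `header`, `rows`), the function names, and the use of the standard library's `string.Template` as a stand-in for Mako so the example has no dependencies.

```python
import csv
import io
import json
from string import Template


def heading(text, char="="):
    """reST section title: the text underlined with char."""
    return text + "\n" + char * len(text) + "\n"


def csv_table(title, header, rows):
    """reST csv-table directive; the csv module handles quoting."""
    def line(values):
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(values)
        return buf.getvalue().rstrip("\n")

    out = [".. csv-table:: " + title, "   :header: " + line(header), ""]
    out.extend("   " + line(row) for row in rows)
    return "\n".join(out) + "\n"


def render(template_text, json_text):
    """Fill the template with values computed from the JSON data."""
    data = json.loads(json_text)
    context = {
        "title": heading(data["title"]),
        "table": csv_table(data["title"], data["header"], data["rows"]),
    }
    return Template(template_text).substitute(context)


# Assumed input layout; a real report would read both from files.
TEMPLATE = "$title\n$table"
DATA = json.dumps({
    "title": "Frobniz performance",
    "header": ["Frobniz", "Time (seconds)"],
    "rows": [[1, 3], [3, 5], [9, 8]],
})

print(render(TEMPLATE, DATA))
```

The rendered reST is what gets stored; the PDF can be regenerated from it whenever a printed copy is needed.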
Because of a thread in the PyAr list about generating reports from Python, I suggested using ReST and my rst2pdf script.
This caused a few things:
So, if you want the simplest way to generate PDF files from a program in the entire pythonic universe... give it a look.
A little over a month ago, on July 15th, I opened a Google Code project called uRSSus. Here's the commit. My goal was to try building a desktop application as if I were building a web application, using an ORM, templating, generic views, and other things.
The first thing I learned is that it was more fun to just write the application and see it grow than spending time writing the framework needed to do what I wanted, so I just kept the ORM, and the rest is pretty traditional code.
The second thing I learned is that for a hobbyist programmer, this is a golden age. I am not exactly an awesome programmer myself, and with today's tools, I could almost wish my app into existence. When I started programming on a PC, I had to swap floppies to change from the IDE to the compiler [1]. And if I made a mistake, the computer crashed. No, not the program. The computer crashed.
Now? I get a pretty dialog, a link to the position, a stack dump, etc, etc, etc. Not missing the old days at all.
Another way this is a golden age is that there is a lot of code out there. I literally had to learn my code from books. I first "got" C by reading the help for a pirated copy of Autodesk Animator's POCO extension language. There were no collections of code I could look at and learn from. There were not even any large libraries of code I could legally use!
And that's another reason why this is a golden age: Open Source and Free Software. You really can be a programmer through will and effort alone. You will not lack tools, you will find users (if you are good), you will find helpers (if you are lucky), you will find free infrastructure (svn repos, free wikis, free file hosting, free everything), you will find libraries you can use!
The third thing I learned is that Python does come with batteries included. Many things that would be annoying effort in other languages are just there, ready to be used. Add the internet, and it's a Mr. Fusion instead of a battery.
The application I developed is a news aggregator. Thanks to Mark Pilgrim I had Feed Parser, and thanks to Troll Tech (now Nokia) I had Qt for the UI, and many, many other things. I could focus on application logic, not on parsing and drawing.
The fourth thing I learned is that a month is a long time when you have productive tools. uRSSus (that's my application) was functional (but awful) in a day or two. It was not awful in 2 weeks. It was pretty good in 3. In a month? Download it and see for yourself. I like it. The SVN version is much better most of the time; try revision 619 ;-)
The fifth thing I learned is that Python performance is good enough. I don't see much performance difference between uRSSus and, say, Akregator, which is C++, except in places which are obviously broken. Sure, the database is C, the UI toolkit is C++... they are all black boxes to me here. I code Python. My pieces do well.
The last thing I learned is that I can still code free software. I had not written a useful/usable large free software application in perhaps 8 years. I am 36.9 years old... excuse me if I feel middle-aged, surrounded by youngsters who are faster, more dedicated and actually have free time.
Because of the productivity of the tools, I managed to code just a couple of hours a day for the first weeks, and progress was still good, so I didn't get discouraged. Discouragement is the worst enemy of free software.
It has been a fun experiment, hopefully it will be a fun ongoing hobby.
[1] Can you guess what I was using?
Just sent it to Python Argentina:
http://mx.grulic.org.ar/lurker/message/20080813.155700.6b988bc8.es.html