Blog posts tagged: google-app-engine
News and other things I find interesting
Converting a Django site to Google App Engine
Last modified: Sunday, May 01, 2011
Since my company was acquired, I needed to get my website off of the new company's servers.
My site was built with Django, and I didn't want to have to pay for hosting.
At the same time I was reading a book on Google App Engine (GAE), and realized it would be a great fit. GAE offers generous initial quotas for free, and I think I can stay within the free quota limits for a while. The free quotas equate to 1 GB of persistent storage and enough CPU and bandwidth for about 5 million page views a month.
As of a few days ago, the site was fully converted to a live GAE site. The page you are reading was served from GAE.
The entire conversion process took only a day, plus a bit of bug fixing, and that includes coding an administrative back-end. To import the data from the old site I used the GAE shell, which interfaced with sqlite and my local GAE instance, and then used the GAE bulk exporter on my local datastore and the bulk importer to load the remote datastore. Alternatively, I could have used the Remote API if I wanted to go directly to the remote site. If this all sounds foreign to you, and you are interested in GAE, buy the book from the review below!
Like most Django websites, my existing site was based on a relational database (in particular sqlite). You can't use relational databases with GAE, so most of the work in converting the site was in converting the Django models and queries to the datastore's equivalents. In particular I had to re-code the blog's database, comments, tags, and syndication using Google's datastore and GQL.
In addition, the Django administration console didn't work since it relies heavily on the database back-end. I made a small administration section myself using Django forms.
GQL, by the way, looks a lot like SQL, but it doesn't allow joins and it can't select partial entities from your datastore.
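Because GQL has no joins, relating two kinds of entities usually means running one query per kind and combining the results in application code. Here is a minimal sketch of that idea, using plain Python lists and dicts as stand-ins for the results of two datastore queries. The `posts` and `comments` data and the `attach_comments` helper are hypothetical illustrations, not the code this site runs.

```python
from collections import defaultdict

# Stand-ins for the results of two separate datastore queries (hypothetical data).
posts = [
    {"key": "post-1", "title": "Converting a Django site to Google App Engine"},
    {"key": "post-2", "title": "Review of Programming Google App Engine"},
]
comments = [
    {"post_key": "post-1", "text": "Have you heard of Django-nonrel?"},
    {"post_key": "post-1", "text": "I mentioned it in my book review."},
]


def attach_comments(posts, comments):
    """Group comments by the post they reference, then pair them with posts."""
    by_post = defaultdict(list)
    for comment in comments:
        by_post[comment["post_key"]].append(comment["text"])
    # A post without comments is paired with an empty list.
    return [(post["title"], by_post.get(post["key"], [])) for post in posts]


joined = attach_comments(posts, comments)
```

The grouping step does the work a SQL join would do on the database side, which is why it pays to keep the number of entities fetched per page small.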
GAE doesn't support Django 1.3 yet, so I used 1.2 together with the Google App Engine Django project, which makes GAE and Django work together easily. I was using Django 0.96 on my old site, so this was already an improvement. It is a great project which gives access to many Django features, including manage.py, sending mail, and Django's test framework.
The only downside of GAE is that if Google ever stops offering the service, I'd need to go through another conversion. There are already several replacement systems that implement the same platform and that you can host yourself, although I've heard all of them are still missing some functionality.
After watching the launch videos of Google App Engine at Google Campfire (6 part video), I was surprised to hear that Guido van Rossum works on the Google App Engine team. Apparently 50% of his time is still spent working on managing the Python language itself.
Tags: python django google-app-engine
2 comment(s)
Have you heard of Django-nonrel? It supports Django models, the admin UI and a lot of other Django features, so you could've reused much more code.
Ya I actually mentioned the non rel project in my critique of the GAE book in my previous blog post. It seems further ahead than the Google App Engine Django project which I used and which the book recommended. I wanted to get a tighter feel for GAE and the datastore anyway though, so I didn't mind.
Review of Programming Google App Engine
Last modified: Tuesday, October 11, 2011
★★★★☆ (4 stars out of 5)
I purchased the Programming Google App Engine book by Dan Sanderson because I wanted to learn more about Google App Engine (GAE) on Python.
GAE is a platform and set of services for building web applications.
GAE scales your web application automatically as long as you work within their set of restrictions such as using a non relational datastore.
Although the online GAE docs have great coverage, I wanted something that could provide a little more depth. Looking things up as you need them in the GAE docs doesn't give you a high-level understanding of what's possible and how to do it.
I am left feeling confident coding for GAE after reading this book. The author has a great understanding of GAE and explains everything clearly. I was never bored reading it, and the examples, which deal with game avatars, were great.
After reading this book you will be ready to take on a GAE project to solidify what you learnt. You will have a great understanding of the GAE runtime, the datastore, index design, datastore transactions, memcache, queues, scheduled tasks, the webapp module, Django on GAE, the remote API, bulk data operations, incoming/outgoing email, incoming/outgoing XMPP and more.
You will also understand exactly what will never be possible on this platform (such as writing to the filesystem), and what will one day be possible (such as different languages and runtimes).
This book is a quick read: it is 333 pages not including the index, and if you are only interested in Python you will be skipping over about 1/3 of it. I would have preferred the book to be split in two, a Python version and a Java version, so that the Python edition didn't need to mention Java at all.
In the one and a half years since the book was released (GAE itself was released on April 7, 2008), there have already been several additions to GAE, but everything mentioned in the book is still sound. The book will give you enough of a foundation in Google App Engine to continue on.
I hope future editions will include coverage of OAuth, Prospective Search, DoS Protection, and the Django-nonrel project, along with updated content throughout.
Tags: django google-app-engine book review python
The big picture of how Khan Academy development works
Last modified: Friday, April 22, 2011
If you haven't heard of Khan Academy yet, you need to start reading more news. I first heard of Khan Academy when they were announced as a winner of Google's Project 10^100 and have been telling people and tweeting about them ever since. I didn't start looking into how their development process works until last night though.
Khan Academy is a nonprofit started by Salman Khan with the mission of educating the world. Sal himself has created over 2,000 videos on a range of topics from history to mathematics and everything in between. The videos are nothing short of amazing and are broken down into 10 minute chunks, a length that originally came from the YouTube limit imposed on Sal.
Shantanu is the President and COO of the Khan Academy and also has a strong mathematical background like Sal.
Khan Academy has a reputation/energy and badge system in place which makes the site just as addictive as StackOverflow. The badge system is especially cool, offering real time badge awards, something not easily done with a NoSQL implementation and a huge dataset behind the scenes.
Khan Academy is hosted on Google Code and uses Kiln Hg for source control (they upgraded from Subversion).
There are currently 11 committers to the project. The most active by far (with over a half dozen commits even on a Saturday afternoon) is Ben Kamens (@kamens).
Ben previously worked at Joel Spolsky's company, Fog Creek Software, and has a great blog with some interesting insight into how Khan Academy works. He accomplished a lot within just a few months of working at Khan Academy. He also develops a couple of cool iPhone apps, one called RulerPhone and the other Precorder.
Khan Academy runs on Google App Engine (GAE), which means they must use either Java or Python 2.5 (sandboxed, and without the ability to run C extension modules). Khan Academy uses Python 2.5 along with GAE's default webapp module. Since webapp does not include a template engine, they use the Django 0.96 template engine, which the GAE runtime includes by default. As with all GAE applications, the site's URL patterns are mapped to handlers in a YAML configuration file. GAE has a great getting started guide if you are interested. I was.
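To make the URL mapping idea concrete: as I understand it, GAE tries the patterns in the order they appear in the YAML file, and the first pattern that matches the request path handles it. Below is a rough sketch of that first-match rule in plain Python. The `HANDLERS` list and the `resolve` function are made up for illustration and are not Khan Academy's actual configuration.

```python
import re

# Hypothetical URL patterns, in the order they would appear in the config file.
HANDLERS = [
    (r"/static/.*", "static files"),
    (r"/api/.*", "api.py"),
    (r"/.*", "main.py"),  # catch-all, so it must come last
]


def resolve(path):
    """Return the target of the first pattern that matches the whole path."""
    for pattern, target in HANDLERS:
        if re.fullmatch(pattern, path):
            return target
    return None  # nothing matched


target = resolve("/api/playlists")
```

Order matters here: if the catch-all pattern were listed first, every request would end up in main.py.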
GAE works off of a datastore which is automatically replicated and scaled; it is based on BigTable and hence on the Google File System (GFS). GAE does not let you host a relational database, and it does not give you write access to the filesystem either. Instead of using SQL, you query the datastore with the Google Query Language (GQL). GQL looks a lot like SQL, but you can't do joins and you can't select partial entities: you must select either just the keys or the entire entity.
GAE applications such as Khan Academy make use of caching so that the datastore does not need to be contacted on each page load. This caching is typically handled with the memcache service included in the GAE API. Typically, each model is saved to memcache whenever it is written to the datastore, and reads try memcache first before falling back to the datastore.
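Here is a small sketch of that caching pattern. To keep it runnable anywhere, two plain dicts stand in for memcache and the datastore, and the `CachedStore` class is a hypothetical name of my own. On GAE itself the cache calls would go through `memcache.get` and `memcache.set` from `google.appengine.api` instead.

```python
class CachedStore:
    """Write-through on save, cache-first on load (dicts stand in for GAE services)."""

    def __init__(self):
        self.cache = {}           # stand-in for the memcache service
        self.datastore = {}       # stand-in for the datastore
        self.datastore_reads = 0  # counts how often the cache was bypassed

    def save(self, key, entity):
        """Write to the datastore, then refresh the cached copy."""
        self.datastore[key] = entity
        self.cache[key] = entity

    def load(self, key):
        """Try the cache first; on a miss, read the datastore and repopulate."""
        entity = self.cache.get(key)
        if entity is None:
            self.datastore_reads += 1
            entity = self.datastore.get(key)
            if entity is not None:
                self.cache[key] = entity
        return entity
```

Memcache can evict entries at any time, so the fallback to the datastore is not optional: the cache is only ever a faster copy, never the source of truth.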
Khan Academy does expose an HTTP JSON API but only for getting a list of playlists and videos per playlist.
It would be great to see additional APIs for read only access to the energy and badge system.
The backup system used by Khan Academy takes around 3 days to complete and is run on an Amazon EC2 instance.
I think this could be improved by doing incremental or differential backups, and by using deduplication.
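As a rough illustration of what I mean (this is my own sketch, not how Khan Academy's backup works), content hashes can cover both ideas: a manifest of key to hash lets a run skip entities that have not changed, and storing content under its hash means identical content is written only once. The `fingerprint` and `backup` functions and the dict-based storage are hypothetical.

```python
import hashlib
import json


def fingerprint(entity):
    """Stable content hash of an entity (a JSON-serializable dict in this sketch)."""
    payload = json.dumps(entity, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def backup(entities, manifest, blobs):
    """Back up changed entities only and return how many blobs were written.

    manifest maps entity key -> content hash from earlier runs (incremental part).
    blobs maps content hash -> content, so duplicates share one copy (dedup part).
    Both dicts are updated in place.
    """
    written = 0
    for key, entity in entities.items():
        digest = fingerprint(entity)
        if manifest.get(key) == digest:
            continue  # unchanged since the last run
        manifest[key] = digest
        if digest not in blobs:
            blobs[digest] = entity
            written += 1
    return written
```

After the first full run, each later run only pays for what actually changed, which is where the time savings would come from.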
Khan Academy tries to fix all bugs before adding new features, which is a great mantra to have. Other than GAE, they use a few very cool JavaScript libraries and tools under the hood:
- jQuery (who doesn't use jQuery?)
- ASCIIMathML to format math equations. It works by automatically converting any math equation written between backtick characters.
- ASCIIsvg Graphing is accomplished using an iframe which contains generated SVG code (hurray for IE9 finally getting native SVG support)
- JavaScript InfoVis: Provides tools for creating Interactive Data Visualizations for the Web. Used for the old knowledge map.
- YUICompressor to compress the JavaScript, although better ratios could be achieved using the Google Closure compiler.
- Google Maps API v3 is used for the exercise dashboard using a custom map type and some other customizations on the controls and zoom. Another cool aspect is that you are actually zooming around images from the Hubble telescope.
- Google Analytics is a tracking tool for stats on your visitors
- Highcharts JS: Interactive JavaScript charts. They use this for user profile charts.
- Raphaël—JavaScript Library: Used for the scratchpad when doing exercises, and for exercise drawings. Raphaël is a Javascript library for creating SVG graphics, every graphic object is a DOM object which can be manipulated
- MathJax: Math visualization library for inputs of MathML and LaTeX
Khan Academy uses HTML5, as shown by their HTML doctype declaration; however, in the exercise modules some simple changes could improve the user interface while staying compatible across all browsers and platforms.
Simply declaring input boxes as <input type="number"> would make all popular mobile phones display a numeric keypad by default. Old browsers that don't understand HTML5 fall back to type="text" when they see a type they don't know, so nothing breaks for them.
Sal himself started the code, but I would imagine most of his time today is spent creating the actual content videos, handling press, and doing thousands of other things. Dean Brettle and Omar Rizwan are also notable developers (sorry if I missed others). Dean, amongst other things, handles release management and created the scratch pad used in exercises. Omar has contributed at least 16 exercise modules. Jason Rosoff (Jason's blog, @jasonrr) is also extremely involved in the project; he is known as the lead designer and also does some coding. Marcia Lee (@marcia_lee) is a recent hire and makes frequent commits.
If you are interested in helping with the Khan Academy project you can get started by:
- Reading the Khan developer's guide
- Taking a look at the open issues or the full issue list
- Checking out the project:
https://khanacademy.kilnhg.com/Repo/Website/Group/stable (previously http://khanacademy.googlecode.com/svn/trunk)
- Creating a new module from the list of modules pending development
- Starting work on exercise modules and bug fixes!
Tags: google-app-engine khan-academy javascript python
7 comment(s)
Added information on the Google Maps API v3 used for the exercise dashboard.
Added a note on HTML5 usage and a suggested improvement.
Added Highcharts JS: Interactive JavaScript charts. They use this for user profile charts.
Hi! Thank you so much for putting this excellent overview together! I wish I had one of these every time I was joining a new project. Great work!
Updated info for svn conversion to Hg
Very nice post! It's interesting to see what goes on behind the scenes behind major projects, and outlines like this are useful for determining whether or not my skill set matches what they need.
I was hoping to write something like this but this is a top quality summary!