Recently modified blog posts
News and other things I find interesting

Wikipedia contributions
Last modified: Monday, March 01, 2010
I finally signed up for an account at Wikipedia. I've made several anonymous changes in the past. You can see a list of my contributions at my Wikipedia user page: http://en.wikipedia.org/wiki/User:Netzen
One of the joys of Python - Generators
Last modified: Saturday, December 19, 2009
This post will use Fibonacci numbers as a tool to build up to explaining the usefulness of Python generators.
This post will feature both C++ and Python code.
Fibonacci numbers are defined as the sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ....
Or in general:
F(0) = 0
F(1) = 1
F(n) = F(n-1) + F(n-2)
This can be translated into a C++ function extremely easily:
#include <cstddef>
size_t Fib(size_t n)
{
    //Fib(0) = 0
    if(n == 0)
        return 0;
    //Fib(1) = 1
    if(n == 1)
        return 1;
    //Fib(N) = Fib(N-2) + Fib(N-1)
    return Fib(n-2) + Fib(n-1);
}
But if you want to print the first 6 Fibonacci numbers, you will be recalculating a lot of the values with the above function.
For example: Fib(3) = Fib(2) + Fib(1), but Fib(2) also recalculates Fib(1). The higher the value you want to calculate, the worse off you will be.
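To get a feel for how quickly the repeated work grows, here is a small sketch in Python (the other language this post uses) that counts how many times the naive recursive definition gets invoked. The names fib_calls and count_calls are made up for this illustration; they are not part of the C++ code above.

```python
def fib_calls(n, counter):
    """Naive recursive Fibonacci that records every invocation in counter[0]."""
    counter[0] += 1
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib_calls(n - 2, counter) + fib_calls(n - 1, counter)


def count_calls(n):
    """Return (Fib(n), number of calls it took to compute it)."""
    counter = [0]
    value = fib_calls(n, counter)
    return value, counter[0]


if __name__ == "__main__":
    for n in (5, 10, 20):
        value, calls = count_calls(n)
        print("Fib(%d) = %d took %d calls" % (n, value, calls))
```

The call count grows about as fast as the Fibonacci numbers themselves, which is why the recursive version gets worse the higher you go.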
So one may be tempted to re-write the above by keeping track of the state in main.
#include <cstddef>
#include <iostream>
//Not supported for the first 2 elements of Fib
size_t GetNextFib(size_t &pp, size_t &p)
{
    size_t result = pp + p;
    pp = p;
    p = result;
    return result;
}
int main(int argc, char *argv[])
{
    size_t pp = 0;
    size_t p = 1;
    //The first 2 elements are printed by hand
    std::cout << "0 " << "1 ";
    //4 more values gives us the first 6 Fibonacci numbers
    for(size_t i = 0; i < 4; ++i)
    {
        size_t fibI = GetNextFib(pp, p);
        std::cout << fibI << " ";
    }
    return 0;
}
But this is very ugly, and it complicates our logic in main. It would be better to not have to worry about state in our main function.
We could return a vector of values and use an iterator to iterate over that set of values, but this requires a lot of memory all at once for a large number of return values.
So back to our old approach, what happens if we wanted to do something else besides print the numbers? We'd have to copy and paste the whole block of code in main and change the output statements to whatever else we wanted to do.
And if you copy and paste code, then you should be shot. You don't want to get shot do you?
To solve these problems, and to avoid getting shot, we may re-write this block of code using a callback function. Every time a new Fibonacci number is encountered, we would call the callback function.
#include <cstddef>
#include <iostream>
void GetFibNumbers(size_t max, void(*FoundNewFibCallback)(size_t))
{
    if(max-- == 0) return;
    FoundNewFibCallback(0);
    if(max-- == 0) return;
    FoundNewFibCallback(1);
    size_t pp = 0;
    size_t p = 1;
    for(;;)
    {
        if(max-- == 0) return;
        size_t result = pp + p;
        pp = p;
        p = result;
        FoundNewFibCallback(result);
    }
}
void foundNewFib(size_t fibI)
{
    std::cout << fibI << " ";
}
int main(int argc, char *argv[])
{
    GetFibNumbers(6, foundNewFib);
    return 0;
}
This is clearly an improvement. Your logic in main is not as cluttered, and you can do anything you want with the Fibonacci numbers by simply defining new callbacks.
But this is still not perfect. What if you wanted to only get the first 2 Fibonacci numbers, and then do something, then get some more, then do something else?
Well we could go on like we have been, and we could start adding state again into main, allowing GetFibNumbers to start from an arbitrary point.
But this will further bloat our code, and it already looks too big for a simple task like printing Fibonacci numbers.
We could implement a producer and consumer model via a couple threads. But this complicates the code even more.
Instead let's talk about generators.
Python has a very nice language feature, called generators, that solves problems like these.
A generator allows you to execute a function, stop at an arbitrary point, and then continue again where you left off, returning a value each time it stops.
Consider the following code that uses a generator:
def fib():
    pp, p = 0, 1
    while True:
        yield pp
        pp, p = p, pp+p
g = fib()
for i in range(6):
    print(next(g))
Which gives us the results:
0
1
1
2
3
5
The yield statement is used in conjunction with Python generators. It saves the state of the function and returns the yielded value. The next time you call next() on the generator, it will continue where the yield left off.
This is far cleaner than the callback function code. We have cleaner code, smaller code, and not to mention much more functional code (Python allows arbitrarily large integers).
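Remember the earlier wish to get the first 2 Fibonacci numbers, do something, and then get some more? Here is a minimal sketch of that with a generator. The take helper is a name I made up for this example; it is just a thin wrapper around itertools.islice from the standard library.

```python
from itertools import islice


def fib():
    """Yield the Fibonacci numbers forever, one per request."""
    pp, p = 0, 1
    while True:
        yield pp
        pp, p = p, pp + p


def take(gen, count):
    """Pull the next `count` values from a generator into a list."""
    return list(islice(gen, count))


if __name__ == "__main__":
    g = fib()
    first_two = take(g, 2)   # the generator pauses after two values
    print("first two:", first_two)
    # ... do something else here; no Fibonacci state lives in this code ...
    next_four = take(g, 4)   # and resumes exactly where it stopped
    print("next four:", next_four)
```

All of the state lives inside the generator object, so the calling code never has to track pp and p itself.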
Forward declaring enums in C++
Last modified: Sunday, December 13, 2009
Forward declaring things in C++ is very useful because it dramatically speeds up compilation time. You can forward declare several things in C++ including: struct, class, function, etc...
But can you forward declare an enum in C++?
No you can't.
But why not allow it? If it were allowed you could define your enum type in your header file, and your enum values in your source file. Sounds like it should be allowed right?
Wrong.
In C++ there is no default type for enum like there is in C# (int). In C++ your enum type will be determined by the compiler to be any type that will fit the range of values you have for your enum.
What does that mean?
It means that your enum's underlying type cannot be fully determined until you have all of the values of the enum defined, which means you cannot separate the declaration and definition of your enum. And therefore you cannot forward declare an enum in C++.
The ISO C++ standard, §7.2/5:
The underlying type of an enumeration is an integral type that can represent all the enumerator values defined in the enumeration. It is implementation-defined which integral type is used as the underlying type for an enumeration except that the underlying type shall not be larger than int unless the value of an enumerator cannot fit in an int or unsigned int. If the enumerator-list is empty, the underlying type is as if the enumeration had a single enumerator with value 0. The value of sizeof() applied to an enumeration type, an object of enumeration type, or an enumerator, is the value of sizeof() applied to the underlying type.
You can determine the size of an enumerated type in C++ by using the sizeof operator. The size of the enumerated type is the size of its underlying type. In this way you can guess which type your compiler is using for your enum.
What if you specify the type of your enum explicitly like this:
enum Color : char { Red=0, Green=1, Blue=2};
assert(sizeof(Color) == 1);
Can you then forward declare your enum?
No. But why not?
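Specifying the type of an enum is not actually part of the current C++ standard (C++03). It is a VC++ extension. It is part of C++0x though, which was published as C++11 and which does allow forward declaring an enum whose underlying type is specified this way.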
Pure virtual function call errors and related behavior
Last modified: Thursday, December 10, 2009
What are pure virtual functions?
Pure virtual functions, also called abstract functions, are functions whose implementation is not yet specified.
They are useful because:
- They allow you to define an interface without defining an implementation.
- A base class may not have a specific default definition for a function, but you know that derived types will.
In C++ both interfaces and undefined base functions are implemented via pure virtual functions. In C# there are different constructs for interfaces (interface) and undefined base functions (abstract).
This post discusses what pure virtual function call errors are, and how they work across the following languages: C++, C#, and Python.
What is a pure virtual function call error?
Pure virtual function call errors can happen in a programming language that allows you to create partially implemented classes, although not every such language can produce them.
Pure virtual function call errors occur when a call is made to a pure virtual function. Since an abstract base type cannot be created in most languages, they will typically occur before a derived type is fully created, or after a derived type is already destroyed. The call is therefore usually called from the base type. Pure virtual function call errors could potentially also occur when using a pointer to call a function of an already deleted object.
Can C++ have pure virtual function errors?
Yes.
Consider the order of construction for the following C++ code:
class Animal
{
public:
virtual ~Animal() {}
virtual void Speak() = 0;
Animal() {}
};
class Dog : public Animal
{
public:
virtual void Speak() { }
};
//....
Dog leia;
When you create an instance of Dog the following happens:
- Construct Animal
- Construct Dog
When the instance of Dog named leia falls out of scope, the following happens on destruction:
- Destruct Dog
- Destruct Animal
If you happen to call Speak() in the destructor of Animal, or in the constructor of Animal, then a pure virtual function error will occur. Most C++ toolchains will reject, or at least warn about, such a direct call at build time; however, you can get around this by calling a function that calls a pure virtual function.
Here is a code sample that will produce a pure virtual function runtime error in g++, Visual Studio 2005, and Visual Studio 2008.
class Animal
{
public:
virtual ~Animal() {}
virtual void Speak() = 0;
void SpeakPlease()
{
Speak();
}
Animal()
{
SpeakPlease();
}
};
class Dog : public Animal
{
public:
virtual void Speak() { }
};
int main(int argc, char* argv[])
{
Dog leia;
return 0;
}
Can C# have pure virtual function errors?
No.
C# allows you to create pure virtual functions by using the abstract keyword on each of your abstract function/methods. And if you have even one abstract function/method in your class you must also use abstract before your class declaration.
C# gets around pure virtual function calls though, but arguably in a worse way.
public abstract class Animal
{
public Animal()
{
Speak();
}
~Animal()
{
Speak();
}
public abstract void Speak();
}
public class Dog : Animal
{
public override void Speak()
{
Console.WriteLine("Woof!");
}
~Dog()
{
}
}
Dog::Speak() will be called in the destructor of Animal even though Dog is already destructed. Obviously this can lead to many problems.
Can Python have pure virtual function errors?
Kind of, and only if you follow certain conventions.
Python has no abstract keyword for functions; by convention you simply raise an exception of type NotImplementedError in the base class (the standard abc module is a stricter alternative).
In Python all functions/methods are virtual.
This is to say pure virtual function support is defined in Python simply by convention instead of language constructs.
Therefore, unlike C++ and C#, you can create objects of a class that has some of its functions/methods as abstract. In that sense you can have pure virtual function errors (via NotImplementedError exceptions).
But Python works like C# in the sense that even before the derived type is constructed, the base type will call into it. If nothing overrides the function, the end result is that it throws an exception that can be caught.
class Animal(object):
    def __init__(self):
        print("Constructing animal")
        self.Speak()
    def Speak(self):
        raise NotImplementedError
    def __del__(self):
        print("Destructing animal")
class Dog(Animal):
    def __init__(self):
        super(Dog, self).__init__()
        print("Constructing Dog")
    def Speak(self):
        print("Woof!")
    def __del__(self):
        print("Destructing dog")
        super(Dog, self).__del__()
def Test():
    leia = Dog()
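To make the two outcomes easier to see side by side, here is a smaller sketch of my own. Blob is a hypothetical subclass that forgets to override the abstract function, and try_create is a helper name invented for this example.

```python
class Animal(object):
    def __init__(self):
        # This runs before any derived __init__ body has finished,
        # yet the call below still dispatches to the derived override.
        self.sound = self.speak()

    def speak(self):
        # Abstract by convention only: nothing stops you from creating an Animal.
        raise NotImplementedError("speak() is abstract by convention")


class Dog(Animal):
    def speak(self):
        return "Woof!"


class Blob(Animal):
    """Hypothetical subclass that forgets to override speak()."""


def try_create(cls):
    """Return the sound recorded during construction, or the name of the error raised."""
    try:
        return cls().sound
    except NotImplementedError:
        return "NotImplementedError"


if __name__ == "__main__":
    for cls in (Dog, Blob, Animal):
        print(cls.__name__, "->", try_create(cls))
```

Dog works because the base constructor calls into the override; Blob and Animal itself fail at runtime with an exception you can catch, which is the Python flavor of a pure virtual function call error.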
Next time you get an error like: "R6025 Pure virtual function call", perhaps you will wonder less about the source of the error.
Slow compilation time in C/C++
Last modified: Thursday, December 10, 2009
Compilation in C/C++ is a very big operation due to C/C++'s complex grammar. Source files, typically .cpp files, are compiled only once per build; header files, typically .h files, are compiled again for every source file that includes them. Each header file needs to be recompiled because the preprocessor could make it expand differently each time.
Since an individual header file is often compiled many times, header compilation as a whole can make up a large part of your total C/C++ compilation time.
Two ways you can reduce this portion of compilation time are:
- Forward declarations
- Precompiled headers
Forward declarations
Using forward declarations extensively will give you the biggest improvement in compilation time.
Forward declaration means to declare something without defining it in a header file. Include the header file instead in the source file where it will be compiled and parsed only once.
c.h:
class C
{
public:
C()
{
}
};
d.h:
class C; //<--- This is a forward declaration
class D
{
public:
D()
{
}
C c;
};
Notice that d.h does not include c.h even though it uses a class declared in c.h.
main.cpp:
#include "c.h"
#include "d.h"
int main(int argc, char **argv)
{
D d;
return 0;
}
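In main.cpp it is important that you include c.h before d.h; otherwise, the compiler will complain about C being an undefined type. That is because D holds a C by value, so C must be fully defined by the time D is. If D held only a pointer or reference to C, the forward declaration alone would be enough for d.h.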
Note you can also perform forward declarations with template types:
template <typename T>
class CMyClass;
Precompiled headers
Precompiled headers allow you to speed up compile time when compiling C++ source code. You typically put anything in a precompiled header that doesn't change often or ever such as the standard library includes or boost includes.
Precompiled headers are available for most C++ compilers including GCC and Visual C++. Both of those implementations are similar.
Only 1 precompiled header can be used per compilation, and therefore per source file. But in a single project you can have several different precompiled headers.
In Visual C++ the compiled headers have an extension of .pch and in GCC they have an extension of .gch.
In GCC you compile headers just like any other file but you put the output inside a file with a suffix of .gch.
So for example if you precompile stdafx.h you will have a precompiled header that will be automatically searched for called stdafx.h.gch anytime you include stdafx.h
stdafx.h:
#include <string>
#include <stdio.h>
a.cpp:
#include "stdafx.h"
int main(int argc, char**argv)
{
std::string s = "Hi";
return 0;
}
Then compile as:
> g++ -c stdafx.h -o stdafx.h.gch
> g++ a.cpp
> ./a.out
Your compilation will work even if you remove stdafx.h after step 1.
Arrays are not the same as pointers!
Last modified: Thursday, November 12, 2009
A common mistake people make in C++ is thinking that arrays and pointers are the same thing. They're not.
const char *p = "hello";
char q[] = "hello";
These 2 lines are very different.
The first is a pointer to a string literal. The string literal is typically in read only memory, which is why the pointer should be a const char*. Changing p[i] for any index i (by casting away the const) is undefined.
The second is a char array initialized with 'h', 'e', 'l', 'l', 'o', '\0'. Changing q[i] for any index i in the range 0..5 is fine.
Consequently:
assert(sizeof(p) == sizeof(char*));
assert(sizeof(q) != sizeof(char*));
assert(sizeof(q) == 6);
Not only are pointers and arrays 2 different things completely, but you can also have pointers to arrays. Most people think that a char* is a pointer to an array. It's not.
char sz[12];
//This is fine, p points to sz's first element's address
char *p1 = sz;
//Compiling error, Can't convert a pointer to 12 elements to a pointer to a char
char *p2 = &sz;
//This is the correct way to create a pointer to an array
char (*x)[12] = &sz;
//Compiling error, can't convert a pointer to 12 elements to a pointer to 10 elements
char (*y)[10] = &sz;
And of course you can also create references to arrays. But the syntax is just as ugly as the syntax for pointers to arrays.
//r is now a reference to sz
char (&r)[12] = sz;
How VNC, Fog Creek Copilot and other remote control software works
Last modified: Tuesday, September 22, 2009
About 7 years ago my company and I created a program called Remote Task Manager. It was exactly like the Windows Task Manager, but it had an extra left bar which showed all the computers you wanted to control. And it had a button for View This Process.
The button made any process magically show up on your screen remotely.
This program didn't last long though. My company's focus went to backup software and we shelved the product at a young age. The program was built with a custom protocol and worked by taking screenshots and sending them from computer to computer.
This article investigates more proper ways for implementing remote-control-computer software.
An introduction to some terms:
Images on computers can be categorized into 2 groups:
- Vector graphics: Graphics that are built from stored instructions. These stored instructions are used to draw primitives such as points, lines, curves, shapes, polygons. Examples of vector graphic file formats include: SVG, Adobe Illustrator files.
- Raster graphics: Graphics that are built from a grid of pixels and their color values. Examples include: BMP files, GIF files, JPG files.
Likewise, displays can also be categorized into 2 groups:
- Vector graphic displays: Video output display whose source comes from vector graphics.
- A framebuffer display: Video output display whose source comes from a memory buffer containing a frame of data. That is to say it stores a raster graphic.
The VNC protocol works for any framebuffer display. That is to say, it relates to the second item in each list above (raster graphics and framebuffer displays), but not the first. That means it applies to just about every operating system whether it is Linux, Solaris, OS X, or Windows.
The remote framebuffer (RFB) protocol is a protocol used for remotely accessing a GUI. The VNC protocol is built from the RFB protocol. The RFB protocol, like most (but not all) other protocols can be divided into a client and server.
The RFB Client:
The computer that wants to control the remote computer is called the RFB Client. Programming an RFB Client is easier relative to programming an RFB server, mostly because the client is stateless.
Having a stateless protocol is an absolute gift from god. It means that when you disconnect, you can reconnect easily, whether on purpose, or by accident, or by hardware/software failure. An example of a completely stateless protocol is HTTP. An example of a completely stateful protocol is FTP.
After setting the frame format that the RFB Client wants, the RFB Client will request updated frames as it wants them from the RFB Server. For every update the RFB Client obtains, it displays it on the screen. The RFB Client does not need to request the entire frame each time. It can request an x,y position and width,height as well.
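As a rough illustration of such a request, here is a sketch in Python of how a client could pack one. The layout used (a message type of 3, an incremental flag, then x, y, width and height as 16-bit big-endian values) is my reading of the RFB specification's FramebufferUpdateRequest message; build_update_request is a name invented for this example.

```python
import struct

# Message type 3 is FramebufferUpdateRequest in the RFB specification.
FRAMEBUFFER_UPDATE_REQUEST = 3


def build_update_request(x, y, width, height, incremental=True):
    """Pack one request: type, incremental flag, then x, y, width, height (big-endian)."""
    return struct.pack(">BBHHHH",
                       FRAMEBUFFER_UPDATE_REQUEST,
                       1 if incremental else 0,
                       x, y, width, height)


if __name__ == "__main__":
    # Ask for the whole of an assumed 640x480 screen, non-incrementally.
    full = build_update_request(0, 0, 640, 480, incremental=False)
    print(len(full), "bytes:", full.hex())
```

A real client would write these bytes to its TCP connection after the handshake; that part is left out here.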
The RFB Client also sends Input events such as keyboard presses, mouse presses, mouse moves, and more to the RFB Server.
I won't go into much more detail about how things work at the RFB protocol level, but if you'd like to know more please read this document.
The RFB Server:
The computer that actually has the framebuffer that you want to see is called the RFB server. Programming this is harder relative to the RFB Client because it needs to 1) manage one or more RFB Clients, 2) respond to input events from the RFB Client, and 3) Provide updated frames to the RFB Client.
Several requests can occur from the RFB Client and the server may decide to simply send only one frame update.
The RFB Server only needs to send incremental updates to the RFB Client, unless the RFB Client specifically sets the Incremental value to false. Typically all requests from the RFB Client will have Incremental set to true, except of course for the first request and any first request after a reconnect.
Both the RFB Server and RFB Client can also notify each other about cut text, and the RFB Server can notify the RFB Client to ring a bell if it has one.
TCP hole punching:
One thing that the RFB protocol does not address directly is connecting 2 endpoints that are behind a router/NAT (henceforth referred to as a NAT).
Just about everyone on the internet nowadays has a NAT. A NAT connects multiple computers to 1 Internet connection. That means that each computer behind that NAT has an inside IP address of its own, but they all share the same outside IP address.
The NAT knows which inside computer should receive the data arriving at its outside IP by consulting a network address translation table.
When a computer behind a NAT initiates a connection to a server and a port, the NAT stores the internal IP and port and performs the connection for that computer. Any data coming back from that server then gets routed back to the original computer that initiated the connection.
This works nicely if the computer behind the NAT initiates the connection, but what if an outside computer wants to connect to one of your internal computers, and it wants to initiate the connection? It's not possible. The outside computer knows only about your outside IP and knows nothing about internal IPs of the computers behind that external IP.
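Here is a toy model of that translation table in Python. It is heavily simplified (a real NAT also tracks the remote endpoint, the protocol, and timeouts), and the Nat class, its method names, and the starting port number are all assumptions made for this sketch.

```python
class Nat(object):
    """Toy NAT: maps outside ports to the inside (ip, port) that opened them."""

    def __init__(self, outside_ip):
        self.outside_ip = outside_ip
        self.next_port = 40000   # arbitrary first outside port for this sketch
        self.table = {}          # outside port -> (inside ip, inside port)

    def outbound(self, inside_ip, inside_port):
        """An inside computer initiates a connection; remember how to route replies."""
        for port, owner in self.table.items():
            if owner == (inside_ip, inside_port):
                return (self.outside_ip, port)
        port = self.next_port
        self.next_port += 1
        self.table[port] = (inside_ip, inside_port)
        return (self.outside_ip, port)

    def inbound(self, outside_port):
        """Route incoming data; None means no inside computer asked for it."""
        return self.table.get(outside_port)


if __name__ == "__main__":
    nat = Nat("203.0.113.5")
    seen_by_server = nat.outbound("192.168.0.10", 5000)
    print("server sees:", seen_by_server)
    print("reply goes to:", nat.inbound(seen_by_server[1]))
    print("unsolicited packet goes to:", nat.inbound(12345))
```

The last line is the whole problem in miniature: an outside computer that initiates the connection hits a port with no table entry, so the NAT has nowhere to send the data.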
The way around this problem, if you have the source code to the programs you want to make the connections between, is known as TCP hole punching.
TCP hole punching means that both computers connect to a known server, and then the communication will continue after that between the 2 computers. I will not do it justice by explaining it; however, you can read a great article called Peer-to-Peer Communication Across Network Address Translators on the matter by Bryan Ford, Pyda Srisuresh, and Dan Kegel.
Getting across NATs is probably one of the hardest things when doing network programming. The problem is that just about everyone on the internet nowadays has a NAT. This problem is also shared with P2P protocols such as the Gnutella protocol. Google Talk is another example.
A commonly used library for doing all of this work for you and getting across NATs is STUNT. STUNT stands for: Simple Traversal of UDP Through NATs and TCP too.
Getting across firewalls:
If possible it is best to make your connection on a port like 443 (HTTPS) or port 80 (HTTP), because almost all firewalls will let you have outgoing socket connections on those ports.
Windows specific, sessions and more:
More advanced Windows specific functionality relating to remotely controlling computers has to do with sessions.
In Windows you can have 1 or more sessions. Each session represents one logged on user. A single user can log in multiple times and belong to multiple sessions. Each session can have one or more Desktops. An example of a Desktop is what you are looking at now; another example is your screensaver. Both your screensaver and what you are looking at now belong to the same session, but have different Desktops.
Every application in Windows can be started in any session and in any Desktop, but that Session and Desktop cannot be changed once the application is already started. For this reason, it is typical to see desktop software of all types that has a core program plus a viewer. The core program can be started in any session and in any desktop; it has no GUI. The viewer can then be started in one or more sessions and desktops; it communicates with the core program and displays a GUI to you.
In Windows Vista and later, Microsoft introduced something called Session 0 isolation. They discuss its impact in this article aptly entitled Impact of Session 0 Isolation on Services and Drivers in Windows Vista. Did it ever have an impact...
The Session 0 isolation change broke many programs that were compatible with Windows XP and Windows 2003. The change created hundreds of thousands of developer hours of work for 3rd party developers. Suddenly complete programs needed to be restructured to accommodate Session 0 isolation. In general this change means that all Windows services now run in Session 0. Session 0 cannot have any GUI associated with it. No GUI can be seen across sessions anymore. And the applications you see now cannot be in the same session as Session 0. The problems that this causes are far reaching, but for the most part most companies worked around them.
Sessions can be controlled by programmers using the WTS API, which is now called the Remote Desktop Services API. This article focuses on the RFB protocol, but you can also accomplish remote control in Windows by using the RDP protocol and related API.
If multiple Sessions and Desktops exist, the RFB Server must decide which one to use. This may or may not be the same as which session and desktop the RFB Server runs from.
Hooking user input:
Another important aspect of remotely controlling computers is called a Hook.
A Hook allows you to get feedback system wide, or per process wide about what events are happening on the system. A typical Hook that you would install is a keyboard and mouse hook. These hooks would be installed on the RFB Server so it can better detect changes to send to the RFB Client.
In Windows, Hooks are implemented in a DLL and Windows will load that DLL and notify it of the events of what the hook is registered to do.
Web Versions:
In some software for remotely controlling computers, the Client side is on the web. This is simply done by HTTP and a lot of AJAX. The web page itself makes the requests directly to the Server and updates the web browser dynamically with the content of the retrieved framebuffers.
Remote-Control-Computer Software:
There are many VNC client/server implementations. Most are open source and licensed under the GPL.
There are also VNC client/server implementations that are based on the VNC protocol but don't follow it exactly. An example is Fog Creek Copilot. I go into more detail about the Fog Creek Copilot project here.
Introducing the rope data structure!
Last modified: Monday, September 21, 2009
I've long known about the rope data structure, but it hasn't dawned on me until now in what cases you'd actually want to use it. I also didn't take the time to see how a rope was actually implemented until I looked it up tonight.
Here's an explanation:
The problem with a string data structure is that it's very hard to do insertions into the middle of the string. This means that if you are continually inserting text into the middle of your string, then you will need to re-allocate your buffer a lot of times.
Also to append 2 strings you are forced into copying the 2nd string into the first string's buffer. Assuming the first string's buffer is large enough. If the buffer is not large enough, then the first string will need to re-allocate and copy.
If you have a really, really large string, let's say 1GB worth of string data, then a simple insertion of a couple of words means you need to move the rest of the characters over and then null terminate. This is an O(N) operation, which means you may have to process up to 1GB worth of data for any insertion into a 1GB string.
A rope is a special case of a binary tree data structure where each leaf node holds a substring (or an array of characters). Nodes that are not leaf nodes do not hold an array of characters.
A left child that is a leaf is the left part of a string. A right child that is a leaf node is the right part of the string. To get the whole string you process the left child then the right child.
You can concatenate 2 ropes easily by creating a common root node and joining them together.
Example:
RootNode = "Hello"
I want to append " World" so first I create a new rope with a single RootNode....
RootNode = " World"
Then I combine both of those ropes:
RootNode:
- Left Child: "Hello"
- Right Child: " World"
Now if we want to make an insertion into the middle of the string to get "Hello Great World", we simply make a new node and join the leaf node "Hello" with this new node " Great".
RootNode:
LeftChild:
- LeftChild: "Hello"
- RightChild: " Great"
RightChild: " World"
To get the full string you simply process the tree from the root node, following the left subtree first.
Appending becomes an O(1) operation. Indexing becomes O(log n) as long as the tree is kept balanced, and printing everything is just as efficient, other than the rope having a constant factor overhead.
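Here is a minimal sketch of the rope described above, written in Python. The class names Leaf and Concat and the helper to_string are my own; this version caches each node's length so indexing can pick a side without scanning, and it makes no attempt to rebalance the tree.

```python
class Leaf(object):
    """A leaf node holds an actual piece of the string."""

    def __init__(self, text):
        self.text = text
        self.length = len(text)

    def index(self, i):
        return self.text[i]

    def pieces(self):
        yield self.text


class Concat(object):
    """An inner node holds no characters, only a left and a right child."""

    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.length = left.length + right.length   # cached for indexing

    def index(self, i):
        # Descend into whichever side contains position i.
        if i < self.left.length:
            return self.left.index(i)
        return self.right.index(i - self.left.length)

    def pieces(self):
        # Left subtree first, then the right subtree.
        for piece in self.left.pieces():
            yield piece
        for piece in self.right.pieces():
            yield piece


def to_string(rope):
    """Walk the tree from the root and join the leaf pieces."""
    return "".join(rope.pieces())


if __name__ == "__main__":
    rope = Concat(Leaf("Hello"), Leaf(" World"))            # append: one new node
    rope = Concat(Concat(rope.left, Leaf(" Great")), rope.right)   # insert in the middle
    print(to_string(rope))
    print(rope.index(6))
```

Notice that neither the append nor the insertion copied any characters; both only created new nodes that point at the existing leaves.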
If you are implementing your own rope data structure from scratch you can actually take advantage of the non leaf nodes. They can be used for any type of hierarchical data that would apply to all of its children. For example, perhaps special formatting. Getting a whole item's format would simply involve getting the parent of the nodes all the way up to the root.
You could also do things like make each leaf node a single word. Then you could implement a spell check pretty easily without having to process each node to find the distinct words in each node.
2 books I'm reading...
Last modified: Monday, September 21, 2009
I'm currently reading:
- Gray Hat Python: Python Programming for hackers and reverse engineers by Justin Seitz
- C# in Depth, 2nd edition by Jon Skeet
I'm about half way through both books. They're both great books so far.
STL mini tutorial - strings, vectors, and maps
Last modified: Monday, August 24, 2009
Vectors, strings, and maps are all part of the C++ standard template library (STL). You can use them in any C++ program. It doesn't matter what operating system you're programming on, if it has C++ support, then it has STL support.
Before using this tutorial you should have a good grasp of C++ templates, as well as everything that you should know before templates. A great tutorial for beginners is at cplusplus.com.
To use STL you need the following includes:
//If you want to use maps
#include <map>
//If you want to use vectors
#include <vector>
//If you want to use strings
#include <string>
And you will need to always prefix string, vector and map with std:: unless you put this:
using namespace std;
Here is an example on how to use STL strings:
string s = "hi";//s holds "hi"
s += " Brian";//s now holds "hi Brian"
size_t iSize = s.size();//iSize holds 8
const char *p = s.c_str();//p holds a pointer to an array of characters with a 0 at the end.
All character arrays, when holding strings, have a 0 at the end; this is very important. They don't have the character "0" at the end, but a byte with the value 0. This zero is used to indicate the end of the char array. When you call the string's c_str() method, the char array that it returns is guaranteed to have that 0 at the end. You could have also used s.data(), which since C++11 means the exact same thing as s.c_str(); before that, data() was not guaranteed to be zero-terminated.
Many people don't know this, but strings can also store binary data. By binary data I just mean any data that is not text, i.e. there may be zeros intermixed throughout the string.
So for example:
string s;
//The size of pBinaryData is uiLen1; copy a new string object into s that holds the binary data from pBinaryData
s = string(pBinaryData, uiLen1);
s.append(pMoreBinaryData, uiLen2);
assert(s.size() == uiLen1 + uiLen2);
In the example above s.data() will now point to the binary data from both calls, and s.size() gives its length. You no longer have to worry about forgetting to delete binary data that you allocate. If you created pBinaryData and pMoreBinaryData on the heap, you can delete[] them after the call to append. Anything that you put in a string will have its own memory; you don't need to worry about freeing it.
Here is an example on how to use STL vectors:
You use a vector to store a list of values. A value can be a number, string, or any other class object that you create. This list can grow to any number of elements, unlike an array.
vector<string> v;
v.push_back("hi");
v.push_back("hi2");
v.push_back("hi3");
assert(v.size() == 3);
for(size_t a = 0; a < v.size(); ++a)
{
    string s = v[a];
    s += "Brian";
}
//The above "for loop" is pointless obviously :). It will go through each element in the vector, and copy the vector element's data to a variable "s"
// When I go s += "Brian" it won't modify the vector contents in any way, because assigning v[a] to the variable "s" creates a copy of the string.
// After the for loop, the vector is left unchanged.
Notice how I wrote vector<string>: the type in the angle brackets is the type of each element.
Vector of your own class objects:
class Cat
{
public:
string str1;
string str2;
string str3;
int x;
};
vector<Cat> vMyCats;
Cat minou;
minou.str1 = "hi";
minou.str2 = "hi2";
minou.str3 = "hi3";
minou.x = 5;
vMyCats.push_back(minou);
assert(vMyCats.size() == 1);
//vMyCats holds a vector of Cat objects. Each cat has 3 strings and 1 integer.
Here is an example on how to use STL maps:
You use a map to store a lookup table. i.e. to associate something with something else.
map<string, string> myMap;
myMap.insert(pair<string,string>("firstName", "brian"));
myMap.insert(pair<string,string>("lastName", "bondy"));
myMap.insert(pair<string,string>("age", "24"));
string strFirstName = myMap["firstName"];
assert(strFirstName == "brian");
You can use a map with things other than a string: map<FROM, TO>
FROM can be any type that can be ordered with operator<, for example int or string (a class such as Cat needs an operator< of its own),
TO can be any type, for example int, Cat, string
The type you use for the FROM is the type you use in the brackets to do lookups.
The type you use for the TO is the type of the value that’s associated with it.
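As a small sketch of a map with a non-string FROM type, here is a lookup on a map<int, string>. Lookup is a hypothetical helper name. It uses find() rather than the brackets because the brackets insert an empty value when the key is missing, which is easy to overlook.

```cpp
#include <cassert>
#include <map>
#include <string>

// Looks up key without modifying the map.
// Returns fallback when the key is not present.
std::string Lookup(const std::map<int, std::string> &m, int key,
                   const std::string &fallback)
{
    std::map<int, std::string>::const_iterator it = m.find(key);
    if (it == m.end())
        return fallback;
    return it->second;
}
```

With a map that only contains the key 1, Lookup(m, 2, "?") returns "?" and leaves the map at one entry, whereas reading m[2] returns an empty string and grows the map to two entries.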
How email works?
Last modified: Monday, August 24, 2009
Here's a quick tutorial on how email works, SMTP, POP3, IMAP, Webmail, ...
What is a Standard?
A standard is a set of rules that are followed by all developers around the world. Some standards include HTTP, SMTP, POP3, …
There is official documentation that describes each individual standard and most standards have been around for 0 to 30 years.
Each standard document is a very detailed explanation of what the standard is and how it works. Typically a standard has an RFC number associated with it, but there are many different types of standards.
Protocols
SMTP and POP3 are ‘standards’. Each standard describes a different protocol. A protocol is a set of rules for communication between 2 or more computers.
What is SMTP?
SMTP is the ‘standards’ protocol that is used to send email. Your computer uses SMTP to send email. See RFC 821, August 1982.
What is POP3?
POP3 is the ‘standards’ protocol that is used to receive email. Your computer uses POP3 to receive email. POP3 is also referred to as simply POP. See RFC 1939, May 1996.
POP3 will typically connect to the mail server and download messages to your computer. It can then optionally delete the messages from the server (which it is usually set up to do).
How Email works
- User A wants to send an email to user B.
- User A writes up an email and presses send.
- User A’s computer uses SMTP to send the email to User A’s (yes A, not B) SMTP server.
- User A’s SMTP server sends the email to User B’s SMTP server, again using SMTP.
- User B, when he feels like it, contacts his mail server and uses POP3 to download the messages.
Some important notes:
The only way to send email is to use SMTP. (Actually you can also use MAPI and some other things but let's not get into that)
The only way to receive email is to use POP3. (Actually there is also IMAPv4, but we'll pretend that POP3 is the only way)
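To make the SMTP step in the walkthrough above a little more concrete, here is a rough sketch of the lines a client sends during a minimal SMTP session. The function name BuildSmtpSession and the example addresses are made up for this illustration; the commands themselves (HELO, MAIL FROM, RCPT TO, DATA, QUIT) come from RFC 821. The sketch only builds the text. It does not open a connection, and it assumes the body contains no line that starts with a period.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the lines a client would send in a minimal SMTP session.
// Server replies are not modelled; a real client must read and check each one.
std::vector<std::string> BuildSmtpSession(const std::string &clientHost,
                                          const std::string &from,
                                          const std::string &to,
                                          const std::string &body)
{
    std::vector<std::string> lines;
    lines.push_back("HELO " + clientHost + "\r\n");  // introduce ourselves
    lines.push_back("MAIL FROM:<" + from + ">\r\n"); // who the mail is from
    lines.push_back("RCPT TO:<" + to + ">\r\n");     // who the mail is for
    lines.push_back("DATA\r\n");                     // the message follows
    lines.push_back(body + "\r\n.\r\n");             // a line holding only "." ends the message
    lines.push_back("QUIT\r\n");                     // close the session
    return lines;
}
```

User A's mail program would send these lines to User A's SMTP server, and that server would then have a very similar conversation with User B's SMTP server.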
How Email Applications work:
SMTP communication is present on your computer, no matter what email client you use. Any time an email is sent out, your computer uses SMTP to send the email. It doesn’t matter if you're using Eudora, Outlook, Outlook Express, Mozilla Thunderbird, or a custom made program. All programs use SMTP to send emails.
By using standards you are guaranteed that, even though user A uses Outlook and user B uses Eudora, and they have different SMTP servers, both users will be able to communicate.
What is HTTP?
Before I can get to what web mail is, you first need to know what HTTP is. HTTP is just another standard protocol. But HTTP is meant to download files and web pages, unlike SMTP which is meant to send emails. See HTTP 1.1 RFC 2616, June 1999.
What is web mail?
Web mail is an online web page that allows you to send and receive emails using HTTP.
But wait a minute, didn’t I just say that the ONLY way to send email was using SMTP?
Yes! What the web page does, is provide you with a form that you fill out. Your computer doesn’t know that it is any different from a form that you fill out to enter your credit card information, or a form that you fill out to enter your home address, or a form that you fill out to sign into another web site. All your computer knows is that you are filling out a form.
When you press the send button, your web browser sends the form to the server. The server knows that this form is for email though. So the server interprets the form and extracts the needed information. The HTTP server then uses SMTP to send the message. Because the only way that a message is going to get from User A to User B is using SMTP.
What the web browser has done is fooled you into thinking that you are sending an email. But what’s really happening, is that your web browser is filling out a form, and then the web server is using SMTP to send your email.
Can you give me a web mail walkthrough?
- User A wants to send an email to User B, User A is going to use web mail.
- User A uses his browser to type in an internet address (for example: www.hotmail.com).
- User A’s computer uses HTTP to contact the server and ask for the web page that is used for web mail in this case.
- The server responds (using HTTP) to User A’s computer with a web page that gives him options to compose mail, check mail, …
- User A clicks on the compose a message link. Again User A’s computer uses HTTP to contact the server.
- The server responds (using HTTP) to User A’s computer with the web page (which contains a form) that allows User A to compose a message.
- User A fills in the web page and presses send. The page is sent back to the server using HTTP.
- In the background, unknown to User A, the web server uses SMTP to send the email to User B. Why? Because the only way to send an email is to use SMTP.
- The server responds (using HTTP) to User A’s computer with a web page that says the email was sent.
How does the web server use SMTP?
Since SMTP is a standard protocol it uses SMTP in the same way any program would use SMTP. See the section ‘How email works’.
What is IMAPv4?
I mentioned IMAPv4 earlier. IMAPv4 is a second method used by email clients to retrieve your emails. IMAPv4 is also referred to as more simply IMAP. IMAPv4 is more complex than POP3, but gives you the ability to work on your email from multiple computers. If you use more than one computer, and you'd like to access your email from both computers, IMAP is the way to go.
IMAP stores all of its data on the mail server. In that way each mail client from each different computer can be in sync. When you read an email from one computer, your work computer will also see that the message is read. Since data is stored on the server, IMAP email accounts are typically more expensive.
Windows 64Bit AMD and Intel processor rundown
Last modified: Monday, August 24, 2009
I just purchased a new 64-Bit AMD computer and installed Windows 64-Bit AMD for the first time.
Here’s a short rundown on what I found out during my first night of using it.
There are 2 main types of 64-bit processors, IA64 and x64, and they run different versions of 64-bit Windows.
That means that every driver built for Windows 64-Bit needs to be compiled twice.
The first IA64 processor was called the Itanium. It was a huge flop because of problems with the way they did the caching. It was replaced with the Itanium 2, which has a better design for cache. Itanium 2 does not mean dual processors; it means the 2nd design of the Itanium (version 2). The architecture is now referred to simply as IA64.
With x64 you can install either 32-bit Windows or 64-bit Windows. With IA64 you can only install the IA64 Windows. You can run 32-bit apps on all variants. On the 64-bit Windows versions, 32-bit applications run transparently through an emulation layer called Windows 32 on Windows 64 (WOW64).
The same does not hold for 32-bit drivers though. WOW64 is only for user mode applications.
On 64-bit machines there are actually 2 different registries. A 32-bit registry and a 64-bit registry. They are treated as different registries by 32-bit and 64-bit applications but the 32-bit registry is just a subkey for the 64-bit registry. Windows will automatically route an application to the proper registry depending on if it’s 32-bit or 64-bit.
Because of WOW64, any application that was compiled before on a 32-bit machine will still work in 64-bit Windows. This isn't because it supports it directly, but because Windows is using WOW64 to translate the 32-bit calls to 64-bit ones. If however a program depends on a driver, or if it integrates with the Windows shell, then that program will no longer work. I believe IPC is also translated on the fly by WOW64.
Some other useful information.
These 2 directories are both for 64-bit applications/drivers only. Pretty great naming! :), but I'm sure it will save many headaches for inf file authors.
C:\Windows\System32\
C:\Program Files\
These 2 directories are for 32-bit apps and drivers only, they use 32-bit emulation.
C:\Windows\SysWOW64\
C:\Program Files (x86)\
So developing for 64-bit machines can get a little tricky if you actually want it to run as a 64-bit program. If it's running as a 32-bit program, then your ints will still be 4 bytes, and everything else you hold true will also still hold.
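A quick way to tell which kind of build you are running is to look at the size of a pointer. The sketch below uses two made-up helper names, PointerBits and IsBuiltAs64Bit. On 64-bit Windows, int and long stay at 4 bytes even in a 64-bit build; it is pointers and size_t that grow to 8 bytes, so code that stores a pointer in an int is what usually breaks.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Number of bits in a pointer for the current build:
// 32 for a 32-bit build, 64 for a 64-bit build.
std::size_t PointerBits()
{
    return sizeof(void *) * CHAR_BIT;
}

// True when this code was compiled as a 64-bit program.
bool IsBuiltAs64Bit()
{
    return PointerBits() == 64;
}
```

A 32-bit program running under WOW64 still reports 32 here, because the check describes how the program was compiled and not which Windows it runs on.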
Right now, when I want to make a 64-bit program, I use a makefile instead of using Visual Studio directly. You can use different NSIS .exes to generate the different installers. I suggest building each project with 3 different versions of NSIS. Then if you want to, you can build a 32-bit installer wrapper that combines each of your installers.
FTP protocol overview
Last modified: Monday, August 24, 2009
The official FTP standard is defined in RFC 959 [Postel and Reynolds 1985]. This article is in no way complete nor should it be used as an implementation guide alone.
FTP uses two TCP connections to transfer files:
- The control connection
The FTP server listens for control connections on the FTP port. This port can differ, but the default is 21. The FTP server waits for a connection from the FTP client. The client does an active open from any port to port 21. Once established, the control connection remains open for the duration of the communication between client and server. The control connection is used to specify what to do.
- The data connection
Once the client uses the control connection to say that it would like to transfer a file, a data connection is established to actually transfer the file. The data connection is used for sending files, receiving files, and getting a list of files in the current directory.
FTP requests:
All FTP requests are 3 or 4 letter ASCII commands. Each command can have 0, 1, or more optional arguments. The most popular FTP commands are as follows:
- ABOR - abort a file transfer
- ACCT - send account information
- APPE - append to a remote file
- CDUP - CWD to the parent of the current directory
- CWD - change working directory
- HELP - return help on using the server
- DELE - delete a remote file
- LIST - list remote files
- MDTM - return the modification time of a file
- MKD - make a remote directory
- MODE - set transfer mode
- NLST - name list of remote directory
- NOOP - do nothing
- PASS - send password
- PASV - enter passive mode
- PORT - open a data port
- PWD - print working directory
- QUIT - terminate the connection
- RETR - retrieve a remote file
- REIN - reinitialize the connection
- RMD - remove a remote directory
- RNFR - rename from
- RNTO - rename to
- STAT - return server status
- SITE - site-specific commands
- SIZE - return the size of a file
- STOR - store a file on the remote host
- STOU - store a file uniquely
- STRU - set file transfer structure
- SYST - return system type
- TYPE - set transfer type
- USER - send username
FTP replies:
All FTP replies start with a 3 digit status code in ASCII. An optional message can follow this 3 digit status code.
The first (left most) digit in the status code has special meaning.
- 1xx: Used for positive preliminary replies.
- 2xx: Used for positive completion replies.
- 3xx: Used for positive intermediate replies, which means everything was OK, but another command is expected.
- 4xx: Transient negative completion reply.
- 5xx: Permanent negative completion reply.
The second (middle) digit in the status code also has special meaning.
- x0x: Syntax error
- x1x: Information
- x2x: Connection related
- x3x: Authentication related
- x4x: Unspecified
- x5x: Filesystem status
The third (right most) digit in the status code is just used to be more specific.
Most FTP replies are 1 line replies, and can be read up to the first CR LF. However, some may be multi-line replies, so you should check whether the 4th character is a hyphen. If the 4th character is a hyphen, then you should keep reading lines until you reach a line whose 4th character is not a hyphen. All lines in a multi-line reply will start with the same 3 digit status code. The final line in a multi-line reply will not have a hyphen.
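The reply rules above can be sketched in a few lines of code. FtpReplyLine, ParseFtpReplyLine and IsPositiveReply are hypothetical names invented for this illustration, and the sketch assumes the trailing CR LF has already been removed from the line. It is a starting point under those assumptions, not a complete FTP client.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical parsed form of one line of an FTP reply.
struct FtpReplyLine
{
    int code;         // 3 digit status code, or -1 if the line does not start with one
    bool moreFollows; // true when the 4th character is a hyphen
    std::string text; // optional message after the status code
};

// Parses a single reply line (without its trailing CR LF).
FtpReplyLine ParseFtpReplyLine(const std::string &line)
{
    FtpReplyLine r;
    r.code = -1;
    r.moreFollows = false;
    if (line.size() < 3)
        return r;
    for (int i = 0; i < 3; ++i)
        if (!std::isdigit(static_cast<unsigned char>(line[i])))
            return r;
    r.code = (line[0] - '0') * 100 + (line[1] - '0') * 10 + (line[2] - '0');
    r.moreFollows = line.size() > 3 && line[3] == '-';
    if (line.size() > 4)
        r.text = line.substr(4);
    return r;
}

// The first digit decides success or failure: 1xx, 2xx and 3xx are positive replies.
bool IsPositiveReply(int code)
{
    return code >= 100 && code < 400;
}
```

To read a whole reply you would call ParseFtpReplyLine on each line as it arrives and keep going while moreFollows is true.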
Using telnet to communicate with FTP:
As with many other ASCII protocols, you can communicate with an FTP server via telnet alone. To do this in Windows, simply go to Start menu | Run | command.com. In the prompt type:
>telnet myservername.com 21
220 Microsoft FTP service
>USER brian
331 Password required for user brian
>PASS mypassword
230 User brian logged in.
>SYST
215 Windows_NT
You won't be able to easily test your data connection this way, but you can test out many of the FTP commands and see the replies that they give.
Debugging FTP:
As with every other protocol, when implementing or learning FTP, it is a good idea to watch the traffic with Ethereal (now called Wireshark). You can download Ethereal/Wireshark. If you are using Ethereal 0.10.9 or above, you will not need to download WinPcap.
The new site is out!
Last modified: Saturday, August 22, 2009
Several months ago I decided that this website needed a revamping.
The result of the work since then is finally being released today. Work on the site is still slowly in progress, but I feel that enough of it is done to be better than the last site.
Some stats on the new site:
- Was developed on my mac
- Was written using vi for all editing
- Was built using Django 1.1
- Is running with Python 2.6
- Is served through IIS with PyISAPIe. (PyISAPIe is a great project created by Phillip Sitbon which brings Django to IIS.)
- jQuery was used for a couple of alterations that are made after the page is served
- The database is built using SQLite
- The blog posts with code samples are made with Google's code prettify
- All blog posts are rendered using Django Markdown (which makes for easy posting of great content)
Markdown is a hassle free, easy to use markup language that allows you to easily post links, post images, and format great blog posts with little work.
The old site was built in ASP .NET beta, and slowly upgraded over time to newer versions. Overall for me, coding in Django is a much better and more fun experience than coding in ASP .NET ever was.
Before deciding on Django for the new site, I also had an attempt at Ruby on Rails. I wasn't a fan from the start; the language was simply too different from what I am used to and know.
I like the new design a lot better than the old one. I hired Bradd Bezaire to do this design over 2 years ago in August of 2007. This design sat in my inbox since then because I've been extremely busy, but I'm glad his work didn't go to waste after all.
Wedding Anniversary
Last modified: Saturday, August 22, 2009
It's already Shannon's and my 2 year anniversary. We have 2 baby boys so we're averaging a child per year.
Time to go visit the wedding web page and browse the photo albums.
PyISAPIe with Django 1.1 and Python 2.6
Last modified: Saturday, August 22, 2009
The PyISAPIe help files currently claim that you have to use Python 2.5 and a pre 1.0 version of Django.
But with some small changes to one of the config files, you can actually have Django 1.1 and Python 2.6.
The only change from their help file that you need to do is replace their pyisapie.py with this file that I modified.
Overloading logical operators
Last modified: Saturday, August 22, 2009
It is no surprise that the following function returns true
bool test()
{
bool b1(true), b2(false);
return b1 || b2;
}
But why does it return true? Most people think that there are 2 expressions here, b1 and b2, and that if either of them is true then the language magically evaluates the total expression to true.
But what's actually happening is that there is a built-in operator, which you can think of as a global function:
bool operator||(bool, bool)
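Conceptually, this function is called for every 2 boolean values that you stick an || in between. (The built-in version also short-circuits: b2 is not evaluated when b1 is already true.)
This raises the question, if there is an || operator for boolean values, can you define some for other types too? The answer is: Yes! (But at least one of the operands has to be a class or enumeration type; you cannot redefine || for two built-in types.)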
It is not obvious, because most people haven't seen code like this, but you can also overload logical operators to work with types other than boolean values.
bool test()
{
A a1, a2;
return a1 || a2;
}
Are a1 and a2 implicitly converted to bool? No. What's happening here is that the class A has an overloaded logical operator.
Here's how to overload the logical OR operator:
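class A
{
public:
//Default argument so that "A a1, a2;" compiles; explicit so that a bool is never silently converted to an A
explicit A(bool b = false) : b(b)
{
}
bool operator||(const A & a) const
{
return b || a.b;
}
bool b;
};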
What we have above is an overloaded operator for logical OR that takes two operands of type A.
So what happens if you stick a bool value on the right hand side of a value with type A? It will give you a compile error.
bool test()
{
A a;
bool b(true);
return a || b; // Error: A has no operator|| that takes a bool on the right hand side
}
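One way to make the mixed case compile, sketched here under the assumption that you really do want a bool on either side, is to write the operators as free functions. Flag is a hypothetical stand-in for the class A above, and CountedFlag is a made-up helper that exists only to show one important difference from the built-in operator: an overloaded || always evaluates both of its operands, so there is no short-circuiting.

```cpp
#include <cassert>

// Hypothetical stand-in for the class A used in this post.
class Flag
{
public:
    explicit Flag(bool value = false) : value(value) {}
    bool value;
};

// Free functions, so a bool may appear on either side of ||.
bool operator||(const Flag &lhs, const Flag &rhs) { return lhs.value || rhs.value; }
bool operator||(const Flag &lhs, bool rhs) { return lhs.value || rhs; }
bool operator||(bool lhs, const Flag &rhs) { return lhs || rhs.value; }

// Counts how many times it is called, to show that both operands are evaluated.
Flag CountedFlag(bool value, int &calls)
{
    ++calls;
    return Flag(value);
}
```

With these in place, f || true and false || f both compile. Writing CountedFlag(true, calls) || CountedFlag(false, calls) leaves calls at 2, whereas the built-in || on two bools would have stopped after the first operand.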
What seems even more strange though, is that you can overload logical operators so that they do not return a bool type.
Why would you ever want to do that? One good reason comes to mind.
Consider you would like to define a new type similar in every way to boolean values, but it can have 3 states: true, false, and not_set. You could define a tristate type which works exactly the same way as a boolean value in every way, but that has 3 different states.
This is in fact how boost::tribool is implemented:
namespace boost {
namespace logic {
class tribool;
bool indeterminate(tribool, unspecified = unspecified);
tribool operator!(tribool);
tribool operator&&(tribool, tribool);
tribool operator&&(tribool, bool);
tribool operator&&(bool, tribool);
tribool operator&&(indeterminate_keyword_t, tribool);
tribool operator&&(tribool, indeterminate_keyword_t);
tribool operator||(tribool, tribool);
tribool operator||(tribool, bool);
tribool operator||(bool, tribool);
tribool operator||(indeterminate_keyword_t, tribool);
tribool operator||(tribool, indeterminate_keyword_t);
tribool operator==(tribool, tribool);
tribool operator==(tribool, bool);
tribool operator==(bool, tribool);
tribool operator==(indeterminate_keyword_t, tribool);
tribool operator==(tribool, indeterminate_keyword_t);
tribool operator!=(tribool, tribool);
tribool operator!=(tribool, bool);
tribool operator!=(bool, tribool);
tribool operator!=(indeterminate_keyword_t, tribool);
tribool operator!=(tribool, indeterminate_keyword_t);
}
}
Overloading logical operators can be powerful, but I would advise to only do it if your type can be logically and implicitly considered true or false.
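To show the idea of a logical operator that does not return bool in its smallest form, here is a sketch of a three state type. Tri, TriFalse, TriTrue and TriNotSet are names invented for this illustration. This is not how boost::tribool is written; it only demonstrates an operator|| whose result can be a third state.

```cpp
#include <cassert>

// Hypothetical three state type; NOT boost::tribool, only a sketch of the idea.
enum Tri
{
    TriFalse,
    TriTrue,
    TriNotSet
};

// Logical OR that returns a Tri rather than a bool.
Tri operator||(Tri lhs, Tri rhs)
{
    if (lhs == TriTrue || rhs == TriTrue)
        return TriTrue;  // one true operand is enough
    if (lhs == TriFalse && rhs == TriFalse)
        return TriFalse; // both operands are known to be false
    return TriNotSet;    // otherwise the result is unknown
}
```

The comparisons inside the function produce plain bools, so the || and && used there are the built-in ones and the function does not call itself.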
Easy taxes for Ontario Canada
Last modified: Tuesday, August 18, 2009
I used this program to do my taxes (and my wife's). Once done using the program, it will give you a .tax file which you can use to file your taxes online. You simply upload the .tax file to the government's online site here.
Parallels with Mac Book Pro
Last modified: Tuesday, August 18, 2009
I have multiple Macs and PCs. I like to use the Macs more, but I do most of my development in Windows. I tried using Parallels at first, but it turned out to be slow for several apps.
I've been using Parallels with my MBP for a few months now. My MBP had 2GB of RAM and has a 2.33GHz Intel Core 2 Duo. Parallels always made OS X slow, and running applications like Visual Studio was painfully slow.
I decided to replace one of my 1GB sticks of RAM with a 2GB stick to reach the maximum available 3GB of RAM in my MBP. Apparently if you try to use two 2GB sticks of RAM, then there could be memory address overlap and problems can arise.
What did I find? The performance difference after installing the extra 1GB of RAM is amazing. Windows runs extremely fast in Parallels, and there are no delays at all. I have Parallels configured to use 16MB of video memory and 1500MB of RAM. It runs better than my best PC. Before, I had Parallels configured to use 1GB of RAM.
I even keep Parallels open during my normal work day while in OS X just for convenience's sake. This is something I wouldn't dream of doing with my old configuration.
In conclusion, if you're going to buy a MBP and you are going to use Parallels, make sure you select the 3GB RAM maximum. It makes all the difference in the world.
New tools added and updated look and feel of the site
Last modified: Tuesday, August 04, 2009
Added tools and descriptions for the following topics: Base64 encoding, Base64 decoding, URL encoding, URL decoding, and obtaining HTTP headers. You can access these pages and tools via the 'Other' link on the left hand navigation bar.
The twins have arrived
Last modified: Sunday, August 02, 2009
February 5th 2009 my twin boys were born.
Ronald Brian Bondy was born at 4:34pm and weighed 5 pounds 11 ounces. Lincoln Edward Bondy was born at 4:35pm and weighed 5 pounds 10 ounces.
Both boys are home now and are both doing well. Ronnie has gained almost 2 pounds from his birth weight and Link has gained over 1 pound since his birth weight.
Myself I'm as proud as ever and love them to pieces.
How to change the port for RDP?
Last modified: Sunday, August 02, 2009
- Modify this registry entry: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp\PortNumber. Enter a new value for the port you would like.
- Turn off the option in My Computer properties to enable remote access.
- Turn on the option in My Computer properties to enable remote access.
- Verify via command prompt that RDP is working with the new information: telnet localhost followed by the new port number. If a blank prompt comes up, RDP is listening on the new port.
- Configure your firewall/NAT to allow outside connections to this port.
Solution Environment Variables for Visual Studio
Last modified: Sunday, August 02, 2009
I found a VS plug-in that allows you to define solution level build environment variables. This is something I've been wishing Visual Studio had built in for several years now. One of the reasons that it's so important is that you can easily define include directories for all of your projects while keeping them in sync. This is important when administering multiple build configurations.
Typically a simple project only has 2 configs (Debug, and Release) so the problem is not that apparent. But a production level product has at least (Release + Debug) * (x64 + x86 + ia64) = 6 configurations that you have to try and keep in sync. You can download the Solution Build Environment plug-in here.
Also it is important to patch Visual Studio because there is a bug in VS2005 SP1 where plug-ins don't work if you run via command line. You can download this fix here.
Why is math important?
Last modified: Sunday, August 02, 2009
A quote from Bjarne Stroustrup, creator of the C++ programming language:
I think of math as a splendid way to learn to think straight. Exactly what math to learn and exactly where what kinds of math can be applied is secondary to me.
I've always said that math teaches you above all how to think.
Microsoft - Yahoo
Last modified: Sunday, August 02, 2009
Everyone states that putting together 2 companies who are not leading will not overtake the leader, Google.
This is true, but only for search. Yahoo is a great company with a huge community that reaches much further than search. In many ways Yahoo beats Google, just not in search.
Most advertising money is spent on search. Even if Microsoft-Yahoo doesn't have the majority share of search, it doesn't matter. Aggregating both portals into the same advertising service will simplify advertisers' management of their budgets. It will also be a bigger market share, and so people will take more interest in advertising on Microsoft-Yahoo.
There exists a subset of users who advertise with Microsoft but not Yahoo. Likewise, there exists a subset of users who advertise with Yahoo, but not Microsoft. Putting Microsoft-Yahoo together will expand on both of these subsets of people, by broadening their advertising base.
It is a great move by Microsoft in my opinion. That is, just as long as they can properly integrate the parts of the company that should be integrated. For now, phase 1, this should only be the advertising portal.
Windsor Social sucks
Last modified: Sunday, August 02, 2009
Whatever you do, do not submit your email to them. After several phone calls, they assured me that my email would be removed, but it never was.
They continue to send unwanted SPAM. Companies like this should be pelted with rocks.
Funny Simpsons quote
Last modified: Sunday, August 02, 2009
Homer:
How is education supposed to make me feel smarter? Besides, every time I learn something new, it pushes some old stuff out of my brain. Remember when I took that home winemaking course, and I forgot how to drive?
Why do file copy dialogs suck so much?
Last modified: Sunday, August 02, 2009
In all operating systems that I've seen, the file copy dialog sucks. Why is it so hard to come up with a good design for this simple dialog that everyone uses every day?
Mac is better than all versions of Windows, but it still isn't ideal. Windows fails to give you a "do not replace, apply to all" option.
It would be nice to see a list of some sort detailing the status of each file that was copied. It would be even nicer to be able to select this list and copy it to the clipboard.
The main problem with Windows is that if a single file fails to copy, the entire copy operation is aborted.
I could go on with several more points, but it just comes down to the developers' lack of caring for the operating systems they develop.


