Blog posts tagged: stackexchange

News and other things I find interesting




Apr 26, 2011

StackExchange average age of users for each tag

Last modified: Thursday, April 28, 2011

I thought it would be interesting to calculate the average age of users on each StackExchange site, and even more interesting to break it down by tag within each site. I did the calculation using the April 2011 data dump and came up with the following data. I call the statistic the Expected Age of a tag, because it is calculated as an expected value.

Observations:

  • The expected age of the whole StackOverflow site is ~30 years old.
  • On StackOverflow, the tag with the youngest expected age is at 26 years old and the tag with the oldest is at 36. I was surprised they were so close together.
  • The sites with the youngest users on the StackExchange network are Gaming, then, surprisingly, Game Development, and then Ask Ubuntu.
  • The sites with the oldest users on the StackExchange network are Do It Yourself, followed by Photography, and then Geographic Information Systems.
  • A funny one: on ServerFault, one of the tags with the oldest expected age is old-hardware. Apparently older people know more about old hardware than anything else.
  • I'm not sure if this is true, but perhaps the tags with younger expected ages are more cutting edge. For example, vb6 and COBOL have expected ages of over 36 on Programmers SE. I don't think this holds in general, though.

And as for the other sites, the expected age is:

You can see the per user tag data by clicking on the site name in the above list.

You could probably say that the StackExchange network could use younger contributors. I've said this before, but I think it would be advantageous for the StackExchange team to hold some events at universities. When I previously helped with some Microsoft events at the University of Waterloo (the top computer science university in Canada, and one of the top in the world), several students didn't know what StackOverflow was.

How I made the calculations per tag

The calculations below were done with the April 2011 StackOverflow data dump.

For each StackExchange site, I calculated, per tag, the average age of the users who wrote the answers in that tag.

To do this, I calculated the Expected Age of each site and of each tag within it:

Expected Age = Summation over each age X of: P(X) * X

where P(X) is the probability that a given answer in the tag was written by a user of age X. You can estimate this probability as the number of answers in the tag written by users of age X, divided by the total number of answers in the tag.
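To make the per-tag calculation concrete, here is a minimal Python sketch. It assumes the dump has already been reduced to hypothetical (tag, age) pairs, one per answer in each tag, with age set to None when the answerer's profile has no age. The 13 to 100 plausibility range is also an illustrative assumption, since the post does not say which range was used.

```python
from collections import Counter, defaultdict

# Hypothetical plausibility bounds; the post does not say which range was used.
MIN_AGE, MAX_AGE = 13, 100


def expected_age_per_tag(answers):
    """Return {tag: sum over ages x of P(x) * x}.

    answers: iterable of (tag, age) pairs, one per answer in that tag,
    where age is the answerer's profile age or None if it is not set.
    """
    counts = defaultdict(Counter)  # tag -> Counter mapping age -> answer count
    for tag, age in answers:
        # Skip answers whose author has no age or an out-of-range age.
        if age is None or not (MIN_AGE <= age <= MAX_AGE):
            continue
        counts[tag][age] += 1

    expected = {}
    for tag, by_age in counts.items():
        total = sum(by_age.values())
        # P(x) = answers in this tag from users aged x / all counted answers.
        expected[tag] = sum((n / total) * age for age, n in by_age.items())
    return expected


sample = [("python", 25), ("python", 35), ("python", 35), ("cobol", 40), ("cobol", None)]
print(expected_age_per_tag(sample))  # python: (25 + 35 + 35) / 3, cobol: 40
```

Because P(X) is each age's share of the tag's answers, the result is simply the answerer's age averaged over every counted answer in the tag.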

I also only considered the top 3000 tags. These may not match the site's actual top tags exactly, since I only counted answers whose author has an age specified in their profile.
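A short sketch of how restricting to the top tags interacts with missing ages, again assuming the same hypothetical (tag, age) pairs: ranking only the answers that carry an age can reorder the tags compared with the site's real tag counts.

```python
from collections import Counter


def top_tags_with_ages(answers, n=3000):
    """Return the n tags with the most answers from users who list an age.

    answers: iterable of (tag, age) pairs; age is None when the profile has
    no age. Only aged answers are counted, so the ranking can differ from the
    site's real top-tag list.
    """
    counts = Counter(tag for tag, age in answers if age is not None)
    return [tag for tag, _ in counts.most_common(n)]


answers = [("c#", 30), ("c#", None), ("c#", None), ("java", 28), ("java", 31)]
print(top_tags_with_ages(answers, n=1))  # ['java'], although c# has more answers overall
```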

Other attempts at these stats

I initially tried computing this statistic by weighting each age by the user's reputation, but it did not produce interesting data: the result was dominated by the top 1% or so of users.
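The following sketch, using made-up numbers rather than dump data, illustrates why weighting by reputation was uninformative: a single very high-reputation user pulls the weighted average almost all the way to that user's age. The dictionary fields are hypothetical.

```python
def weighted_mean_age(users, weight):
    """Average of users' ages, each weighted by weight(user)."""
    total_weight = sum(weight(u) for u in users)
    return sum(weight(u) * u["age"] for u in users) / total_weight


# Made-up users: two typical ones and one with very high reputation.
users = [
    {"age": 24, "reputation": 150},
    {"age": 26, "reputation": 300},
    {"age": 45, "reputation": 250_000},
]

per_user = weighted_mean_age(users, lambda u: 1)
by_reputation = weighted_mean_age(users, lambda u: u["reputation"])
print(round(per_user, 1), round(by_reputation, 1))  # 31.7 45.0
```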

Limitations of this study

  • Many users don't enter their age in their profile, so answers from users without an age specified are not counted.
  • Very young and very old users may be less likely to enter their age.
  • Each user may be counted more than once, since every answer adds one count for its author's age.
  • Some users may be entering fake age values, although I ignored age values outside an acceptable range.
  • We are talking about averages here, so this doesn't mean there aren't a lot of younger and older contributors.
    For example, if an average is 20 years old, there could be an equal number of 10- and 30-year-olds answering, or there could be only 20-year-olds answering.
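The last point can be checked with Python's standard statistics module: both made-up groups below have a mean age of 20, and only a spread measure such as the population standard deviation (not part of the original analysis) tells them apart.

```python
from statistics import mean, pstdev

mixed = [10, 30] * 50  # equal numbers of 10- and 30-year-olds
only_20 = [20] * 100   # everyone is 20

# Both groups have the same average age...
print(mean(mixed), mean(only_20))
# ...but very different spreads (population standard deviation).
print(pstdev(mixed), pstdev(only_20))
```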



Andrew Steele on Tuesday, April 26, 2011 (04:04:23) says:

For some reason I find this data to be really interesting. This is not a study I would have performed, but I am glad you did. Kudos!





Apr 25, 2011

Twitter, LinkedIn, and Facebook lists updated for StackExchange April dumps

Last modified: Thursday, April 28, 2011

I refreshed my lists of social networking accounts (Twitter, LinkedIn, and Facebook) for StackExchange users. The lists are sorted by reputation and updated for the April 2011 data dump.

The data dumps are released every 2 months, so I will update the lists on my site at about the same frequency.

This month 7 new sites appear, since they came out of the StackExchange beta:

  • Android
  • Apple
  • Do It Yourself
  • Electronics
  • Geographic Information Systems
  • Unix
  • Wordpress

You can view all of the links for each list on this section of my site.

For the first time there are over 20 StackExchange sites, so I ran into Twitter's limit of 20 lists per account. For each site I use an automatically maintained list of the top 500 users.

I contacted Twitter support to raise my limit of 20 lists, but they could not help. I ended up getting my 2 sons to host the lists, so all of the automatic lists are up and there is room for another 36 StackExchange sites. Thanks @linkbondy and @ronniebondy.
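The workaround above amounts to spreading the per-site lists over several accounts. Here is a minimal sketch, assuming placeholder account and site names and a hypothetical user dictionary with a reputation field. The 24 sites are only an assumption, chosen because three accounts (60 list slots) then leave the 36 free slots mentioned above. Creating the actual lists through the Twitter API is not shown.

```python
LISTS_PER_ACCOUNT = 20  # Twitter's per-account list limit mentioned above
MEMBERS_PER_LIST = 500  # per-list member limit, so only the top 500 are kept


def assign_lists(sites, accounts):
    """Split the per-site lists across accounts, at most 20 lists each."""
    capacity = LISTS_PER_ACCOUNT * len(accounts)
    if len(sites) > capacity:
        raise ValueError(f"{len(sites)} lists exceed the {capacity}-list capacity")
    return {
        account: sites[i * LISTS_PER_ACCOUNT:(i + 1) * LISTS_PER_ACCOUNT]
        for i, account in enumerate(accounts)
    }


def list_members(users):
    """Highest-reputation users first, trimmed to one list's capacity."""
    ranked = sorted(users, key=lambda u: u["reputation"], reverse=True)
    return ranked[:MEMBERS_PER_LIST]


sites = [f"site{i}" for i in range(24)]  # placeholder site names
plan = assign_lists(sites, ["account_a", "account_b", "account_c"])
print({account: len(lists) for account, lists in plan.items()})
print(LISTS_PER_ACCOUNT * len(plan) - len(sites), "list slots left")
```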







Jan 6, 2011

Twitter, LinkedIn, and Facebook StackOverflow user lists sorted by reputation

Last modified: Friday, April 22, 2011

StackOverflow users sorted by reputation are now available for Twitter, LinkedIn, and Facebook!


A few days ago I blogged about lists of StackOverflow (and related StackExchange sites) users who have Twitter accounts, sorted by reputation.

A couple of days later I added extra info to those lists: following count, followers count, last tweet date, and Twitter description.

Today I added two more list types: LinkedIn and Facebook.

StackOverflow
- Twitter    - LinkedIn    - Facebook

ServerFault
- Twitter    - LinkedIn    - Facebook

SuperUser
- Twitter    - LinkedIn    - Facebook

Game Development
- Twitter    - LinkedIn    - Facebook

Webmasters
- Twitter    - LinkedIn    - Facebook

Web Applications
- Twitter    - LinkedIn    - Facebook

Ubuntu
- Twitter    - LinkedIn    - Facebook

Statistical Analysis
- Twitter    - LinkedIn    - Facebook

StackApps
- Twitter    - LinkedIn    - Facebook

Photography
- Twitter    - LinkedIn    - Facebook

Mathematics
- Twitter    - LinkedIn    - Facebook

Cooking
- Twitter    - LinkedIn    - Facebook

Gaming
- Twitter    - LinkedIn    - Facebook

Across all sites, users are twice as likely to list their Twitter info as their LinkedIn info.

Likewise, across all sites, users are twice as likely to list their LinkedIn info as their Facebook info.
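One way such ratios could be computed is to scan each profile's text for links to each network and compare the counts. This is a sketch only: the simple domain patterns and the idea of searching free-form profile text are assumptions, not the method used for these lists.

```python
import re

# Simple, illustrative patterns: a profile "has" a network if it links to it.
NETWORK_PATTERNS = {
    "twitter": re.compile(r"twitter\.com/", re.IGNORECASE),
    "linkedin": re.compile(r"linkedin\.com/", re.IGNORECASE),
    "facebook": re.compile(r"facebook\.com/", re.IGNORECASE),
}


def network_counts(profiles):
    """Count how many profiles link to each network at least once."""
    counts = {name: 0 for name in NETWORK_PATTERNS}
    for text in profiles:
        for name, pattern in NETWORK_PATTERNS.items():
            if pattern.search(text or ""):  # None means an empty profile
                counts[name] += 1
    return counts


def likelihood_ratio(counts, a, b):
    """How many times more profiles list network a than network b."""
    return counts[a] / counts[b] if counts[b] else float("inf")


profiles = [
    "http://twitter.com/a",
    "twitter.com/b and linkedin.com/in/b",
    "twitter.com/c linkedin.com/in/c facebook.com/c",
    None,
    "TWITTER.COM/d",
]
counts = network_counts(profiles)
print(counts, likelihood_ratio(counts, "twitter", "linkedin"))
```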



Joshua Kehn on Thursday, January 06, 2011 (08:01:46) says:

Very interesting, especially the Twitter vs. LinkedIn vs. FB likelihoods. I wonder if the likelihoods would increase on less tech-centric sites, though if 73% are from SO then I doubt that it would change.

Dan M on Tuesday, January 11, 2011 (05:01:35) says:

I wonder how the Twitter v LinkedIn v Facebook ratios vary between age groups?

Brian R. Bondy on Wednesday, January 12, 2011 (11:01:17) says:

I don't think the ratios have anything to do with who uses what; I think they just mean that people keep their Facebook the most private, followed by LinkedIn, while on Twitter they will let complete strangers in.





Jan 3, 2011

73 percent of StackExchange users from StackOverflow

Last modified: Friday, April 22, 2011

StackExchange is a group of Q&A sites created by StackOverflow (SO).

But what fraction of the users on the new StackExchange Q&A sites are new, and what fraction are shared with StackOverflow?

I mined the November 2010 data dump again and came up with some interesting stats.

To figure out the percentage of users shared between StackOverflow and each other site, I loaded the users of each site into memory and then found which users had the same email hash. Users on different sites with the same email hash can be considered the same user.
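A minimal sketch of the email-hash comparison, assuming each site's users have already been loaded as lists of email hash strings (a hypothetical input; the dump stores the hash inside each site's user records). Python sets make the in-memory intersection a one-liner.

```python
def overlap(site_hashes, base_hashes):
    """Compare one site's users with a base site's users by email hash.

    Returns (common, total, percent), where total is the number of distinct
    hashes on the site and common is how many also appear on the base site.
    Empty hashes are skipped (an assumption: they cannot be matched).
    """
    site = {h for h in site_hashes if h}
    base = {h for h in base_hashes if h}
    common = len(site & base)
    percent = 100.0 * common / len(site) if site else 0.0
    return common, len(site), percent


stackoverflow = ["h1", "h2", "h3", "h4"]
cooking = ["h2", "h4", "h9", "h9", ""]
print(overlap(cooking, stackoverflow))  # 2 of 3 distinct Cooking users in common
```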

I knew before doing this analysis that the percentage of StackExchange users also on StackOverflow would be high, because of the relative size of the StackOverflow community. I do fully expect this 73% to decrease in future data dumps, though, and it will be interesting to re-run these stats and compare when the next data dump comes out.

Here are the statistics per site:

  • Cooking: 2630 of 3155 in common (83.36%)
  • Game Development: 2497 of 2938 in common (84.99%)
  • Gaming: 3813 of 4418 in common (86.31%)
  • Mathematics: 2162 of 2965 in common (72.92%)
  • Photography: 1659 of 1916 in common (86.59%)
  • Server Fault: 28770 of 38434 in common (74.86%)
  • StackApps: 3656 of 3874 in common (94.37%)
  • Statistical Analysis: 1298 of 1728 in common (75.12%)
  • Super User: 31897 of 49157 in common (64.89%)
  • Ubuntu: 3245 of 5090 in common (63.75%)
  • WebApplications: 5575 of 6223 in common (89.59%)
  • WebMasters: 2612 of 2820 in common (92.62%)

Total: 73.19% in common, 26.81% distinct
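Pooling the per-site counts listed above reproduces the 73.19% total. This suggests the total is the overall ratio of shared users to all site users rather than an average of the per-site percentages; that unweighted average is roughly 80.8%, because the large Server Fault and Super User communities have lower overlap. The sketch below just redoes that arithmetic with the numbers from the list.

```python
# (users in common with StackOverflow, total users), copied from the list above.
SITES = {
    "Cooking": (2630, 3155),
    "Game Development": (2497, 2938),
    "Gaming": (3813, 4418),
    "Mathematics": (2162, 2965),
    "Photography": (1659, 1916),
    "Server Fault": (28770, 38434),
    "StackApps": (3656, 3874),
    "Statistical Analysis": (1298, 1728),
    "Super User": (31897, 49157),
    "Ubuntu": (3245, 5090),
    "WebApplications": (5575, 6223),
    "WebMasters": (2612, 2820),
}

common = sum(c for c, _ in SITES.values())
total = sum(t for _, t in SITES.values())
pooled = 100 * common / total  # overall share of users in common

# Unweighted mean of the per-site percentages, for comparison.
unweighted = sum(100 * c / t for c, t in SITES.values()) / len(SITES)

print(f"pooled: {pooled:.2f}% in common, {100 - pooled:.2f}% distinct")  # 73.19%, 26.81%
print(f"mean of per-site percentages: {unweighted:.2f}%")
```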

Of particular interest are sites with a very high common percentage and some questions that overlap with StackOverflow's, like the WebMasters StackExchange site.

Update:

What percentage of SO users come from the other sites? I checked the registration dates, and a surprising 5% of SO accounts were registered on another site first. This doesn't change the result above much, though. Almost all of these 5% of accounts come from Ask Ubuntu, Super User, and Server Fault.
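The registration-date check in this update could look roughly like the sketch below. It assumes a hypothetical mapping from each email hash to the account creation date on every site where that hash appears, and treats the site with the earliest registration as the one the user came from.

```python
from datetime import datetime


def origin_site(registrations):
    """registrations: {site name: account creation datetime} for one email hash.

    The site with the earliest registration is treated as where the user started.
    """
    return min(registrations, key=registrations.get)


def percent_joined_elsewhere_first(accounts, home="StackOverflow"):
    """accounts: {email hash: {site name: creation datetime}} (hypothetical shape).

    Returns the percentage of `home` accounts whose earliest registration is on
    another site.
    """
    home_accounts = [regs for regs in accounts.values() if home in regs]
    if not home_accounts:
        return 0.0
    elsewhere = sum(1 for regs in home_accounts if origin_site(regs) != home)
    return 100.0 * elsewhere / len(home_accounts)


accounts = {
    "a": {"StackOverflow": datetime(2009, 1, 1)},
    "b": {"StackOverflow": datetime(2010, 6, 1), "Super User": datetime(2010, 1, 1)},
    "c": {"Server Fault": datetime(2010, 2, 1)},
    "d": {"StackOverflow": datetime(2009, 3, 1), "Server Fault": datetime(2010, 3, 1)},
}
print(percent_joined_elsewhere_first(accounts))  # 1 of 3 StackOverflow accounts
```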



Truxton on Monday, January 03, 2011 (10:01:34) says:

I know it's only been six months since Area51 kicked off, but I see this as a worrying aspect of the whole Stack Exchange thing. They went from one extreme (far too expensive and not very good - SE never kept pace with the SO codebase) to the Area51 "incubation" thing where, let's face it, a noisy minority of [M]SO users decide whether a proposal is any good.

Jeff and Joel really need to get off their arrogant high horses ("making the internet a better place" - meh) and realise that they are never going to appeal to folks outside of the [M]SO clique unless they lower the barrier for entry.

The whole Area51 process is an utter f***ing joke and makes me weep. I wonder how long their VC's will allow this to carry on.

I referred a couple of internet savvy foodie pals to the cooking site, they took one look and decided not to participate. When I asked why, they felt that there were too many "bolt counters", self appointed officials and the whole thing seemed like a bunch of basement nerds writing a science paper on cooking. I myself participated for a while (I'm a long term SO user so I grok the rules) but felt there were far too many "Meta" types trying to call the shots on whether my questions were "subjective". I mean FFS, cooking, flavouring, tasting...these are all subjective. I gave up.

Why they can't accept that what people really want is a tenner a month (rising sensibly based on realistic resource consumption) hosted Q&A service with all the frills of SO and without some officious w*nkers from MSO (and I mean the ones who clearly have no interest in "community" but like making rules - the ones who have stupidly high MSO rep, but barely participate in the sites themselves) poking their noses in.

How hard can that be to provide? Maybe it's time that Jeff and Joel are sidelined because they sure aren't coming up with any "great" ideas of late?

jjnguy on Monday, January 03, 2011 (11:01:43) says:

I think it is too early to really take this data too seriously. I think that in the next few months we will see a large shift in that percentage.

Brian R. Bondy on Tuesday, January 04, 2011 (08:01:55) says:

I also think that we will see an improvement in the numbers. I think more VC money should be spent on going to each community's own conferences and shows, though.

Joshua Kehn on Tuesday, January 04, 2011 (11:01:42) says:

I think that the initial SO idea is great, but seeing how some of the more subjective sites (especially Programmers) are ruled with an iron fist of "Off Topic" and "Not Constructive" I wonder what the real purpose of these sites is. Sure there is excellent content and advice, but is the goal helping the users or the site? If we continue to persecute "bad" questions with this kind of zealousness where will we end up?

I take offense to the "expert level" criteria – not everyone is an expert and they should not be expected to be anyways.





Jan 1, 2011

Twitter accounts for all StackOverflow users by reputation

Last modified: Friday, April 22, 2011

Wondering who to follow on twitter to keep up to date on technology?

I mined the latest StackOverflow (SO) data dump for all users with twitter accounts, then calculated each user's top tags based on most votes, and finally sorted the lists by user reputation.

The end result is that you can now easily stay connected with the people in your Stack Exchange community.
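The mining steps described above (find users with Twitter accounts, compute each user's top tags, sort by reputation) might be sketched as below. The record shapes are hypothetical, and "top tags based on most votes" is interpreted here as the summed score of the user's answers in each tag, which is an assumption.

```python
from collections import Counter, defaultdict


def top_tags_by_votes(answers, k=3):
    """answers: iterable of (user_id, tags, score), one per answer.

    Sums each user's answer scores per tag and keeps the k highest tags.
    """
    votes = defaultdict(Counter)  # user_id -> Counter(tag -> summed score)
    for user_id, tags, score in answers:
        for tag in tags:
            votes[user_id][tag] += score
    return {uid: [tag for tag, _ in c.most_common(k)] for uid, c in votes.items()}


def twitter_directory(users, answers, k=3):
    """users: {user_id: {"name", "reputation", "twitter"}} (hypothetical shape).

    Returns (name, twitter, top tags) for users with a Twitter account,
    highest reputation first.
    """
    top_tags = top_tags_by_votes(answers, k)
    listed = [(uid, u) for uid, u in users.items() if u.get("twitter")]
    listed.sort(key=lambda item: item[1]["reputation"], reverse=True)
    return [(u["name"], u["twitter"], top_tags.get(uid, [])) for uid, u in listed]


users = {
    1: {"name": "ann", "reputation": 500, "twitter": "ann"},
    2: {"name": "bob", "reputation": 900, "twitter": "bob"},
    3: {"name": "cy", "reputation": 9999, "twitter": None},
}
answers = [(1, ["c#", ".net"], 10), (1, ["sql"], 25), (2, ["python"], 3)]
for row in twitter_directory(users, answers):
    print(row)
```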

Here is a screenshot of what the SO list looks like, containing over 2300 Twitter accounts:

Screenshot of the twitter list

If you'd like to have your account listed in the directories, simply make sure your twitter account is linked somewhere in your profile, and I'll update these lists again on a future data dump.

I also mined the available Stack Exchange data dumps and extracted those twitter accounts as well.

You can view the lists here:

Updates:

  • Created real Twitter lists that are self-updating via the Twitter API. You can access these Twitter lists from the lists linked above. Note: Twitter has a limit of 500 users per list, so I include only the top 500 users.
  • Removed some meta tags for the "Known By" list such as "mistakes" so that I don't show anyone as being known for mistakes :)
  • Fixed a bug where non-StackOverflow sites linked to the StackOverflow user pages.
  • Added better parsing to find twitter URLs
  • Added filtering of bad twitter URLs
  • Removed invalid twitter accounts that don't actually exist anymore
  • Added followers count, following count, last tweet date, and twitter description
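The Twitter URL parsing and filtering mentioned in these updates could be approached with a regular expression like the one below. The pattern and the list of non-account paths are illustrative assumptions (Twitter handles are limited to 15 letters, digits, and underscores). Checking whether an account still exists requires the Twitter API and is not shown.

```python
import re

# Illustrative pattern: twitter.com/<handle>, optionally with a scheme, "www.",
# the old "#!/" prefix, or "@". Handles are 1-15 letters, digits, or underscores.
TWITTER_URL = re.compile(
    r"(?:https?://)?(?:www\.)?twitter\.com/(?:#!/)?@?([A-Za-z0-9_]{1,15})\b",
    re.IGNORECASE,
)

# Paths that match the pattern but are Twitter pages, not accounts (partial list).
NOT_HANDLES = {"home", "share", "intent", "search", "login", "signup"}


def twitter_handles(profile_text):
    """Return the distinct Twitter handles linked in a profile, in order found."""
    handles, seen = [], set()
    for match in TWITTER_URL.finditer(profile_text or ""):
        handle = match.group(1)
        key = handle.lower()  # Twitter handles are case-insensitive
        if key in NOT_HANDLES or key in seen:
            continue  # filter out non-account paths and repeats
        seen.add(key)
        handles.append(handle)
    return handles


print(twitter_handles("Follow me: http://twitter.com/#!/Skilldrick or www.twitter.com/share"))
# ['Skilldrick']
```

The 15-character limit plus the trailing word boundary also rejects paths that are too long to be a handle.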



Josh Lee on Saturday, January 01, 2011 (06:01:23) says:

Would it be possible to add the description and profile from each Twitter account? That would add a bit of personality to the otherwise-dry list.

Brian R. Bondy on Saturday, January 01, 2011 (06:01:10) says:

I want to keep the list as small as possible; I have the "known for" to show you what they do. Otherwise it's harder to focus on what the list is for.

Bill the Lizard on Saturday, January 01, 2011 (10:01:30) says:

Fantastic use of the data dump! I followed a bunch of Stack Overflow users today, and now Twitter's "Who to Follow" algorithm is starting to auto-find more for me. :)

Kevin Montrose on Saturday, January 01, 2011 (11:01:42) says:

You can approximate this with the SO API. Benefit: it's real time. Mild annoyance: having to deal with request limits and quotas (which I completely punt on in my example).

Example
http://jsbin.com/odase3/50
Source
http://jsbin.com/odase3/50/edit

(The actual javascript is garbage I threw together, the important parts are the API calls)

http://api.stackoverflow.com/1.0/help

/users (sort=reputation) & /users/{id}/tags are the important ones for this.

Brian R. Bondy on Sunday, January 02, 2011 (01:01:00) says:

Thanks everyone.

I added real twitter lists now too for each user list, you can access the lists in each page linked above.

systempuntoout on Sunday, January 02, 2011 (11:01:44) says:

Wow that's a terrific job, thanks.

Nick Craver on Sunday, January 02, 2011 (02:01:07) says:

Taking what Kevin did above, I added a few features to the API version here:

http://jsfiddle.net/nick_craver/crvth/2/embedded/result,js,html,css/

granted, there's a lot more data that can be exposed this way, and it's live as well.

Brian R. Bondy on Wednesday, January 05, 2011 (09:01:31) says:

Implemented some new additions today:

- Added better parsing to find twitter URLs
- Added filtering of bad twitter URLs
- Removed invalid twitter accounts that don't actually exist anymore
- Added followers count, following count, last tweet date, and twitter description

Links are updated as per the above list links.

Chris on Thursday, January 06, 2011 (12:01:02) says:

How did you get the gravatars for the users? Was that a by-product of parsing out the Twitter information? I thought you needed an email address to resolve the gravatar, but I don't believe StackOverflow provides that in their data dump, thus my curiosity.

Brian R. Bondy on Thursday, January 06, 2011 (01:01:54) says:

@Chris: StackOverflow data dumps include the email hash of each user. This hash is used to obtain the gravatar.

As for all of the meta twitter info, that is obtained with the Twitter API.
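As the reply notes, the gravatar comes from the email hash. A small sketch, assuming the dump's hash is the MD5 of the trimmed, lowercased email address that Gravatar used for its avatar URLs; the dump does not contain the addresses themselves, so email_hash below only shows how such a hash is formed. The size and default-image query parameters are optional Gravatar settings chosen here for illustration.

```python
import hashlib


def email_hash(email):
    """MD5 hex digest of the trimmed, lowercased address (Gravatar's classic hash)."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def gravatar_url(hash_hex, size=48, default="identicon"):
    """Avatar URL for an email hash; s= sets the size, d= the fallback image."""
    return f"https://www.gravatar.com/avatar/{hash_hex}?s={size}&d={default}"


print(gravatar_url(email_hash(" Someone@Example.com ")))
```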

Skilldrick on Monday, January 10, 2011 (08:01:00) says:

I'm completely gutted because I'd be in the top 100 if I'd added Twitter (@skilldrick) to my bio :P

whatsthebeef on Monday, January 10, 2011 (02:01:33) says:

This is an excellent use of the data dump. I am seeing more and more constructive uses of information to assist actual engineers (as opposed to it being just a vehicle to advertise), http://www.redmonk.com for example.

I was surprised stackoverflow doesn't provide specific profile fields for twitter. I guess it may result in answers not being published to those interested.

Brian R. Bondy on Monday, January 10, 2011 (02:01:20) says:

@whatsthebeef I requested this in the past but they didn't implement it. I asked to have an attribute and value of rel="me" for twitter accounts etc.

By the way, there is a newer blog post on this topic that also includes LinkedIn and Facebook account lists: http://brianbondy.com/blog/id/107/twitter-linkedin-and-facebook-stackoverflow-user-lists-sorted-by-reputation

Paul on Tuesday, January 11, 2011 (11:01:29) says:

Brian,

Any way you can make an online utility for typing in *any* tag and getting an on-the-fly generated list of the top 10 people by reputation under that tag? Then make this a standard feature in twitter clients?

I'm surprised such a powerful feature is not an everyday feature for everyone by now.

Brian R. Bondy on Tuesday, January 11, 2011 (11:01:11) says:

Hi @Paul,

Ya, I plan on putting it in an SQL db, so at that point it'll be a simple query.
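As a rough illustration of the "simple query" in the reply above, here is a sketch using Python's built-in sqlite3 module. The two-table schema and sample rows are hypothetical, and "top people under a tag" is interpreted as users associated with the tag, ordered by overall reputation.

```python
import sqlite3

# Hypothetical schema: users plus one row per (user, tag) association.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, reputation INTEGER, twitter TEXT);
CREATE TABLE user_tags (user_id INTEGER REFERENCES users(id), tag TEXT);
"""

TOP_FOR_TAG = """
SELECT u.name, u.twitter, u.reputation
FROM users AS u
JOIN user_tags AS t ON t.user_id = u.id
WHERE t.tag = ? AND u.twitter IS NOT NULL
ORDER BY u.reputation DESC
LIMIT ?
"""


def top_users_for_tag(conn, tag, limit=10):
    """Up to `limit` users with a Twitter account for `tag`, by reputation."""
    return conn.execute(TOP_FOR_TAG, (tag, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(1, "ann", 500, "ann"), (2, "bob", 900, "bob"), (3, "cy", 9999, None)],
)
conn.executemany(
    "INSERT INTO user_tags VALUES (?, ?)",
    [(1, "python"), (2, "python"), (3, "python"), (2, "java")],
)
print(top_users_for_tag(conn, "python"))  # [('bob', 'bob', 900), ('ann', 'ann', 500)]
```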




