Blog posts tagged: stackexchange
News and other things I find interesting
StackExchange average age of users for each tag
Last modified: Thursday, April 28, 2011
I thought it would be interesting to calculate the average age of users on each StackExchange site, and even more interesting to break that down by tag within each site.
I ran the calculation on the April 2011 data dump and came up with the following data.
I call the statistic the Expected age of a tag because it is calculated using the Expected Value.
Observations:
- The expected age of the whole StackOverflow site is ~30 years old.
- On StackOverflow the tag with the youngest expected age is 26 years old, and the tag with the oldest is 36. I was surprised they were so close together.
- The sites with the youngest users on the StackExchange network are Gaming, then surprisingly Game Development, and Ask Ubuntu.
- The sites with the oldest users on the StackExchange network are Do It Yourself, followed by Photography, and then by Statistical Analysis.
- A funny one: on ServerFault one of the tags with the oldest expected age is old-hardware. Apparently older people know more about old-hardware than anything else.
- I'm not sure if this is true, but perhaps the tags with younger ages are more cutting edge. For example vb6 and COBOL have ages of over 36 on Programmers SE. I don't think this assertion is true in general though.
And as for the other sites, the expected age is:
- Android: 30.02 years old.
- Apple: 30.50 years old.
- Ask Ubuntu: 28.08 years old.
- Cooking: 33.18 years old.
- Do It Yourself: 35.68 years old.
- Electronics: 32.01 years old.
- English Language and Usage: 32.22 years old.
- Game Development: 27.72 years old.
- Gaming: 27.39 years old.
- Geographic Information Systems: 33.34 years old.
- Mathematics: 30.19 years old.
- Photography: 34.01 years old.
- Programmers: 32.26 years old.
- Server Fault: 31.63 years old.
- Stack Apps: 28.31 years old.
- Stack Overflow: 30.48 years old.
- Statistical Analysis: 33.67 years old.
- Super User: 30.09 years old.
- TeX - LaTeX: 30.86 years old.
- Theoretical Computer Science: 30.58 years old.
- Unix: 29.97 years old.
- Web Applications: 29.85 years old.
- Webmasters: 29.64 years old.
- Wordpress: 30.32 years old.
You can see the per user tag data by clicking on the site name in the above list.
You could probably say that the StackExchange network could use younger contributors. I've said this before, but I think it would be advantageous for the StackExchange team to do some events at universities. When I previously helped with some Microsoft events at the University of Waterloo (the top computer science university in Canada, and one of the top in the world), several students didn't know what StackOverflow was.
How I made the calculations per tag
The calculations below were made with the April 2011 StackOverflow data dump.
For each StackExchange site, I calculated the average age of the users whose answers fall under each tag.
To do this I calculated the Expected Age of each tag, and of each site as a whole.
Expected Age = Summation over each age X of: P(X) * X
Where P(X) is the probability that a given answer comes from a user of age X. You can calculate this probability by counting the answers written by users of age X and dividing by the total number of answers within that tag.
I also only considered the top 3000 tags. The top tags may not match up exactly since I only consider tags if the answerer has an age specified in their profile.
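As a rough illustration of the formula above, here is a minimal Python sketch. The input shape (one `(tag, age)` pair per answer), the function name, and the age bounds are assumptions made for the example; the post does not say which bounds were used, and this is not the original code behind these numbers.

```python
from collections import Counter, defaultdict

# Assumed bounds for an "acceptable" age; the post does not give the real ones.
MIN_AGE, MAX_AGE = 13, 90


def expected_age_per_tag(answers):
    """Return {tag: expected age} for an iterable of (tag, answerer_age) pairs.

    Each answer counts once, so a prolific answerer contributes their age
    once per answer. Answers with a missing or out-of-range age are skipped.
    """
    counts = defaultdict(Counter)  # tag -> Counter mapping age -> number of answers
    for tag, age in answers:
        if age is None or not (MIN_AGE <= age <= MAX_AGE):
            continue
        counts[tag][age] += 1

    result = {}
    for tag, by_age in counts.items():
        total = sum(by_age.values())
        # Expected Age = sum over each age X of P(X) * X, with P(X) = answers at X / total
        result[tag] = sum(age * n / total for age, n in by_age.items())
    return result


# Toy data, made up for the example.
sample = [
    ("python", 25), ("python", 35), ("python", 35),
    ("cobol", 40), ("cobol", None), ("cobol", 200),
]
ages = expected_age_per_tag(sample)
print(ages)
```

In the toy data the `python` tag has one answer at age 25 and two at age 35, so its expected age is 25 * 1/3 + 35 * 2/3. The two unusable `cobol` answers are dropped before the probabilities are computed.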
Other attempts at these stats
I initially tried to do this statistic by weighting each age by the reputation of each user, but it did not generate interesting data. The problem was that the result was dominated by the top 1% or so of users.
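To see why reputation weighting behaves this way, here is a small sketch with made-up numbers: one high-reputation user among a hundred. The user list and function name are hypothetical, chosen only to show the effect.

```python
def reputation_weighted_age(users):
    """Average age where each user's age is weighted by their reputation.

    users: list of (age, reputation) pairs.
    """
    total_rep = sum(rep for _, rep in users)
    return sum(age * rep for age, rep in users) / total_rep


# Hypothetical site: one 45 year old with 100,000 reputation
# and 99 users aged 25 with 100 reputation each.
users = [(45, 100_000)] + [(25, 100)] * 99

weighted = reputation_weighted_age(users)
unweighted = sum(age for age, _ in users) / len(users)
print(round(weighted, 1), round(unweighted, 1))
```

With these toy values the single top user holds most of the total reputation, so the weighted figure lands close to that one user's age while the plain average stays near 25.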
Limitations of this study
- Several users don't enter their age in their profile, so answers from users without an age specified are not counted.
- Users that are very young and users that are very old may be less likely to enter their age.
- Each user may be counted more than once, since I count +1 for a user's age every time that user answers a question.
- Some users may be entering fake age values, although I ignored age values out of an acceptable range.
- We are talking about averages here, so this doesn't mean there aren't a lot of younger and older contributors.
For example if an average is 20 years old, there could be an equal amount of 10 and 30 year olds answering, or there could be only 20 year olds answering.
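The last point can be checked with a few lines of Python using the standard `statistics` module. The two groups below are invented to mirror the example in the text.

```python
from statistics import mean, pstdev

# Two invented groups of answerers with the same average age.
mixed = [10, 30] * 50    # equal numbers of 10 and 30 year olds
uniform = [20] * 100     # only 20 year olds

print(mean(mixed), pstdev(mixed))      # same mean, large spread
print(mean(uniform), pstdev(uniform))  # same mean, no spread
```

Both groups average 20, but the standard deviation is 10 for the first and 0 for the second, which is exactly the information an average alone hides.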
Tags: stackexchange data-analysis stackoverflow
1 comment(s)
For some reason I find this data to be really interesting. This is not a study I would have performed, but I am glad you did. Kudos!
Twitter, LinkedIn, and Facebook lists updated for StackExchange April dumps
Last modified: Thursday, April 28, 2011
I refreshed my lists of social networking accounts (Twitter, LinkedIn, and Facebook) for StackExchange users. The lists are sorted by reputation and updated for the April 2011 data dump.
The data dumps are released every 2 months, so I will update the lists on my site at about the same frequency.
This month 7 new sites appear in the lists because they came out of the StackExchange beta:
- Android
- Apple
- Do It Yourself
- Electronics
- Geographic Information Systems
- Unix
- Wordpress
You can view all of the links for each list on this section of my site.
For the first time there are over 20 StackExchange sites, and so I ran into a problem of Twitter only allowing you to host 20 lists. For each site I use an automatically maintained list of the top 500 users.
I tried to contact Twitter support to raise my limit of 20 lists but they could not help. I ended up getting my 2 sons to host the lists, so I have all automatic lists up and room for another 36 StackExchange sites. Thanks @linkbondy and @ronniebondy.
Tags: stackoverflow stackexchange data-analysis
Twitter, LinkedIn, and Facebook StackOverflow user lists sorted by reputation
Last modified: Friday, April 22, 2011
StackOverflow users sorted by reputation are now available for Twitter, LinkedIn, and Facebook!
A few days ago I blogged about a list of StackOverflow (and related StackExchange sites) users sorted by reputation who had twitter accounts.
A couple of days later I added extra info to those lists showing the following count, followers count, last tweet date, and twitter description.
Today I added a couple more list types: LinkedIn and Facebook.
StackOverflow
- Twitter - LinkedIn - Facebook
ServerFault
- Twitter - LinkedIn - Facebook
SuperUser
- Twitter - LinkedIn - Facebook
Game Development
- Twitter - LinkedIn - Facebook
Webmasters
- Twitter - LinkedIn - Facebook
Web Applications
- Twitter - LinkedIn - Facebook
Ubuntu
- Twitter - LinkedIn - Facebook
Statistical Analysis
- Twitter - LinkedIn - Facebook
StackApps
- Twitter - LinkedIn - Facebook
Photography
- Twitter - LinkedIn - Facebook
Mathematics
- Twitter - LinkedIn - Facebook
Cooking
- Twitter - LinkedIn - Facebook
Gaming
- Twitter - LinkedIn - Facebook
Across all sites users are 2x more likely to list their twitter info than their LinkedIn info.
Likewise across all sites users are 2x more likely to list their LinkedIn info than their facebook info.
Tags: data-analysis stackexchange stackoverflow
3 comment(s)
Very interesting, especially the Twitter vs. LinkedIn vs. FB likelihoods. I wonder if the likelihoods would increase on less tech-centric sites, though if 73% are from SO then I doubt that it would change.
I wonder how the Twitter v LinkedIn v Facebook ratios vary between age groups?
I think the ratios don't have anything to do with who uses what, but it just means that people keep their facebook the most private, followed by linked in, and twitter they will let complete strangers in.
73 percent of StackExchange users from StackOverflow
Last modified: Friday, April 22, 2011
StackExchange is a group of Q&A sites created by StackOverflow (SO).
But how many of the users on the new StackExchange Q&A sites are new, and how many also have StackOverflow accounts?
I mined the November 2010 data dump again and came up with some interesting stats.
To figure out the common percentage between StackOverflow and other sites, I created in-memory lists of users for each site, and then figured out which users had the same email hash. Users on different sites with the same email hash can be considered the same user.
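A minimal sketch of that comparison in Python, assuming each site's users have already been loaded as a collection of email hash strings. The function name and the sample hashes are hypothetical; the post does not show the original code.

```python
def common_with_stackoverflow(so_hashes, site_hashes):
    """Count how many of a site's users also exist on StackOverflow.

    Both arguments are iterables of email hash strings. Two accounts with
    the same hash are treated as the same person.
    Returns (common, site_total, percent_common).
    """
    so = set(so_hashes)
    site = set(site_hashes)
    common = site & so  # set intersection: hashes present on both sites
    return len(common), len(site), 100.0 * len(common) / len(site)


# Made-up hashes for illustration only.
stackoverflow = ["a1", "b2", "c3", "d4", "e5"]
cooking = ["b2", "d4", "e5", "z9"]

common, total, pct = common_with_stackoverflow(stackoverflow, cooking)
print(f"{common} of {total} in common ({pct:.2f}%)")
```

Using sets keeps each lookup cheap, which matters when the StackOverflow side holds far more users than any other site.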
I knew before doing this analysis that the percentage of common users would be high because of the relative size of the StackOverflow community. I fully expect this 73% to decrease in future data dumps though, and it will be interesting to re-run these stats and compare when the next data dump comes out.
Here are the statistics per site:
- Cooking: 2630 of 3155 in common (83.36%)
- Game Development: 2497 of 2938 in common (84.99%)
- Gaming: 3813 of 4418 in common (86.31%)
- Mathematics: 2162 of 2965 in common (72.92%)
- Photography: 1659 of 1916 in common (86.59%)
- Server Fault: 28770 of 38434 in common (74.86%)
- StackApps: 3656 of 3874 in common (94.37%)
- Statistical Analysis: 1298 of 1728 in common (75.12%)
- Super User: 31897 of 49157 in common (64.89%)
- Ubuntu: 3245 of 5090 in common (63.75%)
- WebApplications: 5575 of 6223 in common (89.59%)
- WebMasters: 2612 of 2820 in common (92.62%)
Total: 73.19% in common, 26.81% distinct
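The total is consistent with pooling the rows above, that is, summing the common counts and dividing by the summed user counts rather than averaging the twelve percentages. A quick check in Python, using only the figures from the list:

```python
# (users in common with StackOverflow, total users) per site, from the list above.
per_site = {
    "Cooking": (2630, 3155),
    "Game Development": (2497, 2938),
    "Gaming": (3813, 4418),
    "Mathematics": (2162, 2965),
    "Photography": (1659, 1916),
    "Server Fault": (28770, 38434),
    "StackApps": (3656, 3874),
    "Statistical Analysis": (1298, 1728),
    "Super User": (31897, 49157),
    "Ubuntu": (3245, 5090),
    "WebApplications": (5575, 6223),
    "WebMasters": (2612, 2820),
}

common_total = sum(common for common, _ in per_site.values())
user_total = sum(total for _, total in per_site.values())
pct_common = 100.0 * common_total / user_total

print(f"{common_total} of {user_total} in common ({pct_common:.2f}%)")
print(f"{100.0 - pct_common:.2f}% distinct")
```

Because the rows are pooled, Server Fault and Super User dominate the total: together they account for most of the accounts counted.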
Of particular interest are the sites with a very high common percentage and some overlapping questions like the WebMasters StackExchange site.
Update:
What percentage of SO users come from the other sites? I checked the registration dates and a surprising 5% of SO accounts come from the other sites. This doesn't change the result above much though. Almost all of these 5% of distinct accounts come from Ask Ubuntu, Super User, and Server Fault.
Tags: data-analysis stackoverflow stackexchange
4 comment(s)
I know it's only been six months since Area51 kicked off but I see this as a worrying aspect of the whole Stack Exchange thing. They went from one extreme (far too expensive and not very good - SE never kept pace with the SO codebase) to the Area51 "incubation" thing where, let's face it, a noisy minority of [M]SO users decide on whether a proposal is any good. Jeff and Joel really need to get off their arrogant high horses ("making the internet a better place" - meh) and realise that they are never going to appeal to folks outside of the [M]SO clique unless they lower the barrier for entry. The whole Area51 process is an utter f***ing joke and makes me weep. I wonder how long their VCs will allow this to carry on. I referred a couple of internet savvy foodie pals to the cooking site, they took one look and decided not to participate. When I asked why, they felt that there were too many "bolt counters", self appointed officials and the whole thing seemed like a bunch of basement nerds writing a science paper on cooking. I myself participated for a while (I'm a long term SO user so I grok the rules) but felt there were far too many "Meta" types trying to call the shots on whether my questions were "subjective". I mean FFS, cooking, flavouring, tasting...these are all subjective. I gave up. Why they can't accept that what people really want is a tenner a month (rising sensibly based on realistic resource consumption) hosted Q&A service with all the frills of SO and without some officious w*nkers from MSO (and I mean the ones who clearly have no interest in "community" but like making rules - the ones who have stupidly high MSO rep, but barely participate in the sites themselves) poking their noses in. How hard can that be to provide? Maybe it's time that Jeff and Joel are sidelined because they sure aren't coming up with any "great" ideas of late?
I think it is too early to really take this data too seriously. I think that in the next few months we will see a large shift in that percentage.
I also think that we will see an improvement on the numbers. I think that more VC money should be spent going to each community's individual conferences and shows though.
I think that the initial SO idea is great, but seeing how some of the more subjective sites (especially Programmers) are ruled with an iron fist of "Off Topic" and "Not Constructive" I wonder what the real purpose of these sites is. Sure there is excellent content and advice, but is the goal helping the users or the site? If we continue to persecute "bad" questions with this kind of zealousness where will we end up? I take offense to the "expert level" criteria – not everyone is an expert and they should not be expected to be anyways.
Twitter accounts for all StackOverflow users by reputation
Last modified: Friday, April 22, 2011
Wondering who to follow on twitter to keep up to date on technology?
I mined the latest StackOverflow (SO) data dump for all users with twitter accounts, then calculated each user's top tags based on most votes, and finally sorted the lists by user reputation.
The end result is that you can now easily stay connected with the people in your Stack Exchange community.
Here is a screenshot of what the SO list looks like, containing over 2300 Twitter accounts:
If you'd like to have your account listed in the directories, simply make sure your twitter account is linked somewhere in your profile, and I'll update these lists again on a future data dump.
I also mined the available Stack Exchange data dumps and extracted those twitter accounts as well.
You can view the lists here:
- StackOverflow
- ServerFault
- SuperUser
- Game Development
- Webmasters
- Web Applications
- Ubuntu
- Statistical Analysis
- StackApps
- Photography
- Mathematics
- Cooking
- Gaming
Updates:
- Created real twitter lists which are self updating via the Twitter API. You can access these twitter lists from the lists linked above. Note: Twitter has a limit of 500 users per list so I include only the top 500 users.
- Removed some meta tags for the "Known By" list such as "mistakes" so that I don't show anyone as being known for mistakes :)
- Fixed a bug with non StackOverflow sites linking to the StackOverflow user pages.
- Added better parsing to find twitter URLs
- Added filtering of bad twitter URLs
- Removed invalid twitter accounts that don't actually exist anymore
- Added followers count, following count, last tweet date, and twitter description
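The parsing and filtering steps in the updates above could look roughly like the following Python sketch. The regular expression, the list of reserved paths, and the user record fields (`name`, `reputation`, `about`) are assumptions for illustration, not the parser actually used for these lists. The limit of 500 comes from the Twitter list limit mentioned above.

```python
import re

# Matches twitter.com/<handle>, including the old "#!/" URL style.
# Twitter handles are at most 15 characters of letters, digits, or underscore.
TWITTER_RE = re.compile(
    r"(?:https?://)?(?:www\.)?twitter\.com/(?:#!/)?@?"
    r"([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])",
    re.IGNORECASE,
)

# Paths that look like handles but are not accounts (illustrative, not complete).
RESERVED = {"home", "share", "search", "intent", "statuses"}


def extract_twitter_handle(profile_text):
    """Return the first plausible Twitter handle linked in a profile, or None."""
    for match in TWITTER_RE.finditer(profile_text or ""):
        handle = match.group(1)
        if handle.lower() not in RESERVED:
            return handle
    return None


def top_twitter_users(users, limit=500):
    """Return (reputation, name, handle) rows sorted by reputation, highest first."""
    found = []
    for user in users:
        handle = extract_twitter_handle(user.get("about"))
        if handle:
            found.append((user["reputation"], user["name"], handle))
    found.sort(key=lambda row: row[0], reverse=True)
    return found[:limit]


# Hypothetical user records.
users = [
    {"name": "alice", "reputation": 1200, "about": "See http://twitter.com/#!/alice_dev"},
    {"name": "bob", "reputation": 9000, "about": '<a href="https://twitter.com/bob_codes">me</a>'},
    {"name": "carol", "reputation": 5000, "about": "http://twitter.com/share?url=x"},
    {"name": "dave", "reputation": 300, "about": None},
]
ranked = top_twitter_users(users)
print(ranked)
```

A check like this only filters out malformed links. Confirming that an account still exists would need a lookup against Twitter itself.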
Tags: stackoverflow twitter data-analysis stackexchange
15 comment(s)
Would it be possible to add the description and profile from each Twitter account? That would add a bit of personality to the otherwise-dry list.
I want to keep the list as small as possible, I have the "known for" to show you what they do. Otherwise it's harder to focus on what the list is for.
Fantastic use of the data dump! I followed a bunch of Stack Overflow users today, and now Twitter's "Who to Follow" algorithm is starting to auto-find more for me. :)
You can approximate this with the SO API. Benefit of it being in real time. Mild annoyance of having to deal with request limits and quotas (which I completely punt on in my example). Example (The actual javascript is garbage I threw together, the important parts are the API calls) http://api.stackoverflow.com/1.0/help /users (sort=reputation) & /users/{id}/tags are the important ones for this.
Thanks everyone. I added real twitter lists now too for each user list, you can access the lists in each page linked above.
Wow that's a terrific job, thanks.
Taking what Kevin did above, I added a few features to the API version here: http://jsfiddle.net/nick_craver/crvth/2/embedded/result,js,html,css/ granted, there's a lot more data that can be exposed this way, and it's live as well.
Implemented some new additions today: added better parsing to find twitter URLs. Links are updated as per the above list links.
How did you get the gravatars for the users? Was that a by-product of parsing out the twitter information? I thought you had to have an email to resolve the gravatar but I don't believe StackOverflow provides that in their data dump, thus my curiosity.
@Chris: StackOverflow data dumps include the email hash of each user. This hash is used to obtain the gravatar. As for all of the meta twitter info, that is obtained with the Twitter API.
I'm completely gutted because I'd be in the top 100 if I'd added Twitter (@skilldrick) to my bio :P
This is an excellent use of the data dump, I am seeing more and more constructive uses of information to assist actual engineers (as opposed to it being just a vehicle to advertise), http://www.redmonk.com for example. I was surprised stackoverflow doesn't provide specific profile fields for twitter. I guess it may result in answers not being published to those interested.
@whatsthebeef I requested this in the past but they didn't implement it. I asked to have an attribute and value of rel="me" for twitter accounts etc. By the way there is a newer blog post for this topic but also including linked in and facebook account lists here: http://brianbondy.com/blog/id/107/twitter-linkedin-and-facebook-stackoverflow-user-lists-sorted-by-reputation
Brian, any way you can make an online utility of typing in *any* tag and getting an on-the-fly generated list of the top 10 people by reputation under that tag? Then make this a standard feature in twitter clients? I'm surprised such a powerful feature is not an everyday feature for everyone by now.
Hi @Paul, Ya I plan on putting it in an SQL db so at that point it'll be a simple query.
