Teamleague Forum

TL ratings are now inflated

posted at 2013-07-07 04:59 by LeifPetersen

I remember when I started playing TL back in 42, and the generated ratings were conservatively low. I think that was mostly a good thing, in particular it was easy to add new unproven players to a team.

After all the changes done to the generated TL ratings, the ratings are now too high. New players are fluke rated close to their rating peak. Experienced TL players with a low TL performance rating are also rated with a bias to the high side, leading to TL ratings maybe 100+ higher than their proven performance. I don't understand the high bias towards the very unstable FICS ratings.

This makes the new u1500 section of very limited use. It's harder to squeeze in a team there now than it was with a u1400 team earlier.

posted at 2013-07-07 05:46 by smallblackcat

There is actually less of a bias towards FICS ratings now than in the past. It used to be entirely FICS based, then at some stage we began using a mixed formula using a 70/30 split between FICS rating and TL performance respectively. Now it's a 50/50 split, with the added dimension that TL performance only counts the last 8 tourneys, with an increased weighting from recent tourneys. Therefore TL ratings today are based more on TL performance than at any time in the past.
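As a rough illustration of the mixed formula described above, here is a minimal Python sketch. Only the 50/50 split and the 8-tourney window come from this post; the function name, the input format, and the linear "more recent counts more" weights are assumptions made for illustration, not the actual TL script.

```python
def tl_rating(fics_rating, tourney_perfs, fics_share=0.5, max_tourneys=8):
    """Blend a FICS rating with recent TL performance.

    tourney_perfs: per-tourney TL performance ratings, oldest first.
    fics_share:    fraction of the result taken from the FICS rating
                   (0.5 for the 50/50 split; 0.7 would model the older 70/30 split).
    """
    # Only the most recent tourneys count; older results are dropped.
    recent = tourney_perfs[-max_tourneys:]
    if not recent:
        # No TL history in the window: fall back to the FICS rating alone.
        return float(fics_rating)

    # ASSUMPTION: linear weights 1, 2, ..., n so later tourneys weigh more.
    # The real weighting scheme is not stated in the thread.
    weights = range(1, len(recent) + 1)
    weighted_perf = sum(w * p for w, p in zip(weights, recent)) / sum(weights)

    return fics_share * fics_rating + (1 - fics_share) * weighted_perf
```

With these assumed weights, a player whose recent TL results are below their FICS rating ends up somewhere between the two, pulled towards the most recent tourneys.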

The ratings given to new players are always a bit problematic, since we can only go on FICS standard ratings. However, the calculation was done in the same way when I joined the admin team back in 2009. There has been no change in policy here.

It may be that you are looking at long-term TL performance, and taking that as your basis for saying that players are over-rated. As I hope I have explained above, this is not how the rating is calculated. TL results from more than 2 years ago no longer count. Nor should they, because some veteran TLers would get very strange ratings if they were. For instance my own TL performance rating of 1992.9 dates from 2007. I don't think many people would consider it fair if this was used as my TL rating now, considering my recent performance, and current FICS rating, are more than 100 points higher.

posted at 2013-07-07 06:36 by LeifPetersen

Thanks for a good answer. Perhaps you tinker a bit too much with the performance rating, with the "increasing weights". It would be great to have a simple and transparent rating calculation, with most weight on recent TL performance.
I still think that new players should be rated conservatively.

posted at 2013-07-07 08:44 by LeifPetersen

My post above doesn't make much sense, sorry, please ignore. What I meant was:

1) it's okay with me if you cut off years old TL performances, but weigh what's left evenly.
2) the performance rating listed in the TL finger should be equal to the one used in the rating calculation.
3) the TL rating calculation should be transparent, can you show in the finger how it is calculated step by step or something like that? At the moment some of the ratings seem to me like they've fallen from the sky.

Furthermore, I am not sure I believe you when you say the rating calculation of (new) players hasn't changed. Previously it seemed that players were rated according to their 100-game moving average; now it seems that the current rating is king.

posted at 2013-07-07 21:15 by smallblackcat

You are not sure whether you believe me about the rating of new players? Well in that case I may be wasting my time by replying, but here goes:

The practice has always been to use a player's current rating as their TL rating at the time that they join. The 100-game average and the all-time high rating are checked for consistency (i.e. to make sure a player's current rating is not significantly lower than past performance), but are not used to calculate most new players' TL rating.
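The check described above could be sketched like this in Python. The thresholds and names are hypothetical; the thread does not say how large a gap counts as "significantly lower", and in practice the decision is made by the admins rather than by a fixed rule.

```python
def initial_tl_rating(current, avg_100, all_time_high,
                      avg_margin=100, peak_margin=200):
    """Return (tl_rating, needs_review) for a new player.

    The current FICS rating is used as the TL rating. The 100-game average
    and the all-time high only decide whether a human should take a closer look.
    ASSUMPTION: avg_margin and peak_margin are made-up thresholds.
    """
    # Flag the player if the current rating is well below past performance.
    below_average = avg_100 - current > avg_margin
    below_peak = all_time_high - current > peak_margin
    return current, (below_average or below_peak)
```

A player whose current rating is close to both reference values is simply given the current rating; one who has recently dropped far below them is flagged for a manual decision.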

In point of fact, ficsgames.org with its excellent 100-game average was not around to consult until relatively recently, so in the past there was greater reliance on current FICS ratings than now.

posted at 2013-07-08 03:09 by LeifPetersen

It wasn't personal that I didn't believe you, I just thought you maybe had the facts wrong, can happen to the best. I'm sorry for the unintended insult.

What seems to be the case from your explanation is that new players' ratings are set by discretion, not automatically, and determining whether such a practice has changed or not can be difficult. My personal feeling is that over the years the attitude has changed towards "God forbid someone is rated too low".

Please elaborate your point about ficsgames.org. How long has it been around exactly?

Good luck with your u1500 section, unfortunately I can't fit a team in there myself.

posted at 2013-07-08 23:12 by smallblackcat

No offence taken, I just thought it was an odd choice of words.

According to http://ficsgames.org/about.html, gamebot, which is attached to ficsgames.org, has been running since 2008. Unless my memory is faulty, the website itself was not available until a year or two after that. I certainly don't recall using it back in 2009. We did start using it to check on players' ratings soon after it was introduced though, because its historical rating graphs are a useful tool for comparison.

You are correct that our vigilance with setting ratings has increased, partly because we now have more information. Letting new people in with potentially inaccurate or distorted ratings has always been a concern, but we now have better means for making an informed decision about a new player's rating. However, I don't think this can have caused significant overall rating inflation, because the current rating is still used for most new members.

What could be happening is that players who only play TL games find themselves facing a small pool of opponents (other TLers). This could cause some regression to the mean, with the lower ratings increasing and the higher ratings decreasing. However, since TL is not a closed system, and since new members are joining all the time, I cannot imagine this effect is significant either.

I am genuinely sorry that you can't fit your team in the U1500 section, since we need all the teams we can get to make that section viable. I'm also curious if any other former U1400 captains have opinions on this issue, and whether it is in fact harder to form such teams now than in the past.

posted at 2013-07-15 04:13 by greatsachin

Good info on how it is calculated. After 25-odd TL games, FICS standard ratings should not even be considered. I say this because FICS standard is based on 15 0, and it becomes a different game altogether. This would largely rule out the possibility of deflating standard rating just before TL. What say?

posted at 2013-07-15 11:49 by KRMCHESS

If anything, I think the problem with TL ratings is that they inherit from FICS standard ratings and that they are probably inflated. For example, I find that FICS standard ratings are often 200-300 higher than FICS blitz ratings. This means that, for example, a 1100 in blitz can get a 1400ish rating in standard, and if a player who is really 1400ish beats them, they end up at 1800ish as an initial rating, which naturally has a knock-on effect. Having said that, it's hardly a new issue, since rating inflation has been around for a while.

I would say the main issue is social, as most players in TL are in the 1700 to 2000 range and most people in teams will invite opponents they consider strong or who they are on friendly terms with. It wouldn't surprise me if the strongest correlation to the average rating in a team is actually the rating of the captain(s), since they will probably recruit new members based upon who they faced.

posted at 2013-07-16 00:17 by smallblackcat

Well this thread is starting to head in different directions, but I will briefly respond to GreatSachin's suggestion:

It's a perfectly good suggestion for TL veterans, but the present system has the advantage of applying a similar standard to all ratings. A TL regular might play 25 games in a year, but many won't even play this many. Even 25 games is not a very large sample size to base a rating on.

posted at 2013-11-13 03:52 by PankracyRozumek

Just out of curiosity, could you share what are the correlations:
- between TL ratings difference and result of matches,
- and between FICS ratings difference and result of matches?

I have the impression that TL ratings much better predict the outcome, but would be great to see the actual numbers. A correlation based on games from a single season (even half a season) should be enough.
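For what it's worth, once the data is collected the calculation itself is short. A minimal Python sketch, assuming each game is available as a tuple of (white TL rating, black TL rating, white FICS rating, black FICS rating, white score); that tuple layout and the function names are made up for illustration, and no real TL data is included here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def compare_predictors(games):
    """Return (r_tl, r_fics): correlation of each rating difference with the result.

    games: iterable of (white_tl, black_tl, white_fics, black_fics, white_score),
           where white_score is 1, 0.5 or 0 (HYPOTHETICAL input format).
    """
    games = list(games)
    scores = [g[4] for g in games]
    tl_diffs = [g[0] - g[1] for g in games]      # white minus black, TL ratings
    fics_diffs = [g[2] - g[3] for g in games]    # white minus black, FICS ratings
    return pearson(tl_diffs, scores), pearson(fics_diffs, scores)
```

Whichever rating gives the higher correlation over a season's games would be the better predictor, at least by this simple measure.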

(Well, I could calculate the correlations myself for TL56, but to have the data, I need to parse TL teams page, TL results page for each round, ficsgames database etc, so I guess it would be much easier for you :). But anyway please ignore this request if it means too much work.)

posted at 2013-11-26 13:45 by wmahan

Unfortunately such an analysis isn't a trivial project, because we don't store the FICS ratings of players. If someone wanted to try, I could provide the results of games and the TL fixed ratings of players. Getting the FICS standard ratings would be more work; it would have to be done by fingering players on FICS or querying FICSgames.org, either manually or with a script.

I'm sure the current fixed rating calculation could be improved. It's basically an ad-hoc script written by seberg and modified by me. Having empirical data to evaluate the rating system would be nice, it's just not a top priority for me right now.

posted at 2015-05-26 03:48 by crazyblue

I was going to write something about the rating system, when I saw it has been discussed before. So I just want to give an actual example of myself. My TL rating was 1899 before this season. Performance rating 1888. Then at the start of the season it was changed to 1992 because I'm usually between 2000-2100 in FICS standard rating (highest 2104). With such a high TL rating I got stronger opponents and made 1.0/6 points without a win. Kinda frustrating. So my TL rating should go down more now. But again my FICS rating will go up (it already has) and then next season I fear the same thing will happen again. The 15 or 20 min standard games are completely different to the 45 45 tournament situation. Of course this is just one example. I'm sure there are opposite examples too. Just think, maybe it's worth discussing this thing again. If it stays like this, it's kinda hard for me to get motivated to play (and keep losing) again.

posted at 2015-05-26 12:14 by PankracyRozumek

Hi crazyblue,

There are also reverse situations: people winning almost everything in TL, but keeping their ratings down due to between-season 15 0 casual games. It is even worse when multiple players on the team are underrated.

If there is any interest from TL, I could calculate slow chess ratings based only on TL and SnailBucket games (and stcbunch, if that is still alive).

Michal

posted at 2015-05-27 02:19 by smallblackcat

crazyblue,

A point you may not have considered is that your TL performance rating goes back quite a long way (all the way back to T34, which was 8 years ago), whereas your TL rating only takes account of the 8 most recent tourneys. If I remember the formula correctly, your TL games could account for as much as 80% of your TL rating (depending on how many games you played in recent editions). So basically all that changed was a less recent set of TL results being removed, and the latest results being counted in your TL rating. Also, whatever difference there was in your average FICS rating between this year and last would have been calculated and made up the balance of your rating. It's not like your FICS rating suddenly bumped up your TL rating for this edition; it would have done that last time as well. Just thought I'd clarify that.

Michal,

I think using all organised long time control games for setting such ratings is a good idea, but the question is how to implement it. Currently all data is collected by the TL bot (obviously it tracks all TL games, but it also updates players' FICS ratings at various points). I have no idea how easy or difficult it would be to have it capture snailbucket results as well.

posted at 2015-05-27 06:09 by PankracyRozumek

sbc,

I am thinking about updating the ratings periodically like FIDE does e.g. after each TL season. If you would be willing to calculate the ratings on your system, we (snailbucket) would send you either a pgn of all games or a simple csv with results after each tournament. Otherwise, if you prefer to calculate the ratings on our system, you would send us the necessary data.

There is glicko2 implementation in python:
https://code.google.com/p/pyglicko2/

Regardless of who would be hosting the script to calculate the ratings, I would be happy to write the necessary code and open source it. (As long as it is in python :)). I can have this ready before TL60.

The only use for FICS ratings in this system would be to establish provisional long time control rating.

Michal

posted at 2015-05-29 08:26 by samuraigoroh

I used to worry about my rival's normal rating vs TL rating, but now I take that information lightly. If my rating was higher, I would expect to win with not much trouble (or at least draw) and same the other way around; if my rating was lower, I would expect to lose (or pull a draw)... Playing with that mentality will undoubtedly make you play with a poor performance.

Nowadays I just prepare myself by analyzing my rival's TL games and solving puzzles. I stopped playing chess for a while, so despite my rating going up, I don't feel I'm a +1700 rated player right now (in my last game I made a few blunders that I missed, but luckily so did my rival).
http://freechess.statistics.at.schwarzes.net/show.php?PLAYER=samuraigoroh

My RD (ratings deviation) right now is still high, which will make my rating fluctuate a lot. The only requirement to join TL, it seems, is to have played 20 games ever (Section 3 - Members Eligibility).

If a different category for long games would ever be implemented, I feel it would fall into this issue (it can't be too reliable because players won't/can't have an active RD and their rating would fluctuate too much).

In conclusion, I don't know how useful it would be to find a more accurate rating system, but either way that's just a guideline and not an oracle. You won't know the outcome until you play the game.

samurai goroh