The database is suffering under the heavy load of many client connections.
We are trying to reduce the load by temporarily disabling some services. The Buddylist and other services may be affected.
Update 11:00: We are still optimizing components. The situation is improving gradually, but full recovery will take time.
Update 13:00: Operation has resumed, but some services remain disabled: Buddylist status updates, points for the weekend (big sorry), the Toplist, and some less visible components. Most importantly, chat works and people can meet each other. The world is back online.
The DB server will be locked for maintenance for some time (expected: about 1 hour). The websites will be affected and no new logins will be possible. Users who stay logged in and do not navigate away will be able to continue their chats.
Evidently, the improved DB connection did not help as much as expected: there was a 2-hour outage on one of the cluster nodes.
This is a growth problem: not only does the server load grow, but so does the effective coupling between subsystems through their increasingly loaded interfaces. Events that were once isolated now begin to propagate from one subsystem to another.
An investigation is under way. In addition, an alternative solution will be implemented today.