Friday 19 December 2008

MUC update successful

The MUC service for muc1.virtual-presence.org has been updated successfully and works fine.

MUC update

We're updating the muc1.virtual-presence.org service. During the update, most of the virtual world will be offline for a few seconds.

Thursday 18 December 2008

Points migration finished

The points migration has finished, but the Toplins page will stay offline for a few days.

Tuesday 16 December 2008

MUC server back

The muc1.virtual-presence.org service is up again and works normally.

MUC server issue

The muc1.virtual-presence.org server has a connection issue, so most of the virtual world is offline.
We're working on it.

Wednesday 10 December 2008

Points migration

We are currently transferring the points to another code base, so users may notice differences in their point totals while the migration is running.

Tuesday 9 December 2008

Load balancer problems

Our load balancers have a problem with too many connections. Our website is not available at the moment, so several services, such as login, wousle, and the contact list, are unavailable too.

We're working on it.

Update: All services are unavailable...
2nd Update: All services are back online but there's still work to do with the load balancers.

Thursday 27 November 2008

Contactlist online

Today, we've reactivated the contact list. For new contacts the new infrastructure is already running.

At the moment, we're migrating the old contacts to the new infrastructure. This will take a few hours.

Wednesday 26 November 2008

XMPP Failure

One of our XMPP services failed and needed to be restarted. A part of the virtual world was unavailable for about 4 minutes.

Wednesday 19 November 2008

Moving Users Between Clusters

We're moving some users between the clusters again to balance the load between them. Users will notice a brief, normal reconnect when they are moved.

Thursday 13 November 2008

Moving Users Between Clusters

We're currently moving some users between the XMPP clusters to balance the load between them. Users will notice a brief, normal reconnect when they are moved.

Wednesday 12 November 2008

Chat Server Lag

A room server has very high lag. We are investigating.

Update: only the connection between XMPP cluster 1 and one chat server lags.

Update: during the investigation the room server crashed due to strange memory conditions. After the restart everything is back to normal.

Scheduled XMPP downtime

We will have a scheduled XMPP maintenance window today from 16:15 to 16:20 UTC to add performance updates.

Tuesday 11 November 2008

XMPP failure in Cluster 1, restarted

XMPP cluster 1 failed. Restarted multiple times to reestablish all connections.

XMPP failure in Cluster 1, restarted

The XMPP service in cluster 1 failed. It has been restarted; all affected users will reconnect automatically.

Saturday 8 November 2008

XMPP restart of Cluster 1

We experienced a memory shortage on XMPP cluster 1 and restarted it to free some leaked memory.

Friday 7 November 2008

New XMPP Cluster

A second XMPP cluster has been installed. New users will be registered on the new cluster. This will reduce load on the first cluster.

Thursday 6 November 2008

Preparation for XMPP Cluster Changes

We are preparing for XMPP cluster config changes in the next 24 hours. We hope that the preparation will not affect operation. In the worst case we expect an XMPP reboot, but we hope to avoid it.

Tuesday 4 November 2008

XMPP restart

XMPP servers have been restarted for maintenance.

Monday 3 November 2008

XMPP maintenance restart, new database service

The XMPP servers got their own database service. A restart is necessary to enable the new configuration.

Sunday 2 November 2008

XMPP Maintenance restart

Due to high load, the XMPP cluster will be restarted as a safety measure at 8:30pm UTC.

XMPP Maintenance restart

Due to high load, a small part of the virtual world failed and the rest of the world has been restarted as a safety measure.

Saturday 1 November 2008

XMPP Failure and maintenance restart

Because of high load on the XMPP cluster, a small part of the virtual world was unavailable. To avoid side effects, some other parts have been restarted as a safety measure.

Friday 31 October 2008

Router Crash

The router failed. The secondary took over, but not all services are reachable from the outside world.

Update: the problem is solved. The resolution took very long. The network problem interfered with alert procedures. We are working to improve alerting in these cases.

Saturday 25 October 2008

New registrations will be delayed.

New Weblin registrations will be delayed for a while. All other users can continue.

Update: Registrations should arrive soon now. Email delivery is back to normal.

Thursday 23 October 2008

Network Problems

A network component is showing errors, but not enough of them to trigger the failover. This causes random connection losses. We are working to identify the component.

Update: networking has been repaired.

Wednesday 22 October 2008

XMPP Server Failure [Update]

The XMPP cluster failed because of high load. Clients keep trying to reconnect, so operation will continue once the cluster is back.

Update: The cluster is back and running after 4 minutes.

MUC Server Failure

The Multi User Chat cluster partly failed and about half of the layered virtual world was offline.

Server has been restarted and the world is online again.

Tuesday 21 October 2008

Upgrading XMPP Cluster

A new XMPP cluster is being installed to cope with the growing load. The cluster will be rebooted (possibly several times) during the installation. Connections to chat servers might be interrupted until everything is up and running.

The cluster has been prepared in advance, but the switch might still be bumpy because the size and version upgrades happen at the same time.

Friday 17 October 2008

XMPP Maintenance Restart

XMPP has been restarted as a safety measure.

Thursday 16 October 2008

XMPP Maintenance Restart

XMPP has been restarted as a safety measure.

Monday 13 October 2008

XMPP Maintenance Shutdown [Update]

To introduce a new database the XMPP service will be restarted at about 11:10pm UTC.

Update: The XMPP services are back and running.

Partial XMPP Maintenance Restart

Part of the XMPP cluster has been restarted as a safety measure.

Forum Down [Update]

Due to a server malfunction, the forum is currently down. We are working to repair the server.

Update: The forum service is back and running.

XMPP Maintenance Restart

XMPP has been restarted as a safety measure.

Saturday 11 October 2008

Partial XMPP Maintenance Restart

Part of the XMPP cluster has been restarted as a safety measure.

Comment: weblin is experiencing very high load, which is a good thing. Due to the recent events, we are very closely watching the XMPP condition and restart occasionally to avoid dangerous conditions. New (more and larger) servers will be online soon.

Friday 10 October 2008

XMPP Maintenance Restart

XMPP has been restarted as a safety measure.

Wednesday 1 October 2008

XMPP Service Restart

We needed to restart the XMPP service to start a new module to improve our services.

Monday 29 September 2008

Parameter change in XMPP servers

We changed a parameter in our XMPP servers.
Main operation continues, clients automatically reconnect to other cluster hosts.

Saturday 27 September 2008

XMPP Server Failure

A part of the XMPP cluster failed.

Main operation continues, clients automatically reconnect to other cluster hosts.

Monday 22 September 2008

Scheduled XMPP downtime [Update]

We will have a scheduled XMPP maintenance window tonight from 22:00 to 22:05 UTC to add performance updates.

Update: The XMPP service is up and running again.

Friday 19 September 2008

Buddylist updates off

Buddylist updates remain deactivated.

Comment: the XMPP cluster seems to become unstable under combined high load from Web and XMPP. This is a special case, but we prefer to keep buddy list updates disabled until the condition is fixed, rather than risk general XMPP operation.

Thursday 18 September 2008

Load Problems

We are experiencing high load. This results in very slow Web access and even XMPP cluster failure.

Stopping some subsystems, e.g. buddylist updates, topcloud.

Comment: most developers are working on optimizations. We are reaching new all-time highs every day and we are trying to keep up with the growth. This is an ongoing process.

Tuesday 16 September 2008

XMPP Server Failure

A part of the XMPP cluster failed.

Main operation continues, clients automatically reconnect to other cluster hosts.

Operation completely restored after 3 minutes.

Sunday 14 September 2008

Topcloud briefly activated

Topcloud has been activated for 1 hour to check for high-traffic sites in order to add more random rooms (see the LMS Operation Log for more). It is now disabled again until the rewrite is completed.

Tuesday 9 September 2008

TopCloud Offline

Topcloud updates have been disabled because the processing might affect chat operation.

Comment: Topcloud processing will be changed. It is expected to resume early next week.

Sunday 7 September 2008

Chat Server Failure

location.virtual-presence.org failed. Most of the layered virtual world is offline.

Update: Server restart at 17:15

Comment: chat operation will partially be moved to other chat servers to reduce the load on location.virtual-presence.org

XMPP Server Failure

A part of the XMPP cluster failed.

Main operation continues, clients automatically reconnect to other cluster hosts. However, follow-up errors affect client notifications, i.e. nickname changes and messages are not propagated to clients.

Update: Operation completely restored at 14:15

Wednesday 20 August 2008

Communication Problem

There is an s2s (server-to-server) communication problem between the XMPP cluster and other servers.

Update 16:10: solved by XMPP server restart.

Tuesday 19 August 2008

New Primary Database

We have a new, even more powerful primary database. The old primary, with its hardware patched, is now the first secondary. A second secondary will also be available.

Monday 18 August 2008

Operation resumed

The hardware has been patched up and is running again.

Hardware Failure

The current problem seems to be a hardware error on the main server and another hardware problem on the secondary.

We are aware that this is virtually impossible. Nevertheless, the primary failed due to a hardware error and the secondary did not take over because of a different hardware-related problem.

The hardware on the main server has been changed. Recovery is under way. We expect that one of the DB servers will resume operation soon.

Website and client login down

There is a serious database problem. The failover did not come up automatically. We are working to recover.

Saturday 16 August 2008

Database Issues

The database is suffering under the high load of many client connections.

We are trying to reduce the load by disabling services temporarily. Buddylist and others may be affected.

Update 11:00: still optimizing components. The situation improves gradually, but will nevertheless take time.

Update 13:00: Operation resumed, but some services are disabled: Buddylist status updates, points for the weekend (big sorry), Toplist, and some less visible components. Most important: chat works and people can meet each other. The world is back online.

Wednesday 13 August 2008

Database down for Maintenance

The DB server will be locked for some time (expected: 1h) for maintenance. The websites will be affected and no new logins will be possible. Users who stay logged in and do not navigate will be able to continue their chats.

Update 02:12: Back online.

Tuesday 12 August 2008

DB Maintenance

The portal is very slow due to an additional, unscheduled DB backup.

Sunday 10 August 2008

XMPP Server Reboot

Server reboot to apply new database server config.

About the XMPP Authentication Refused Problem

Obviously the improved DB connection did not help as expected. There was a 2 hour outage on one of the cluster nodes.

This is a growth problem: not only does server load grow, but so does the effective coupling of subsystems by way of their increasingly loaded interfaces. Events that were once isolated begin to propagate between subsystems.

Investigation is under way. In addition, an alternate solution will be implemented today.

Friday 8 August 2008

XMPP Server Reboot

Reboot to add an improved DB connection module. Let's see if this makes the connections more stable.

XMPP Authentication Refused

One of the XMPP servers refuses to authenticate clients. DB connection problem.

Restarted the XMPP server. A solution is in the works. The upcoming client release will also be part of the solution.

Wednesday 6 August 2008

Release Overload

A portal software release has resulted in unexpectedly high load. Please be patient and try not to overload the web site.

Update: normal operation restored.

Update: Topcloud intentionally offline

Friday 1 August 2008

XMPP Authentication Refused

One of the XMPP servers refuses to authenticate clients. DB connection problem.

(Restarting XMPP server + Taking actions to prevent the annoying client dialog box in the upcoming version. Improving DB connection.)

Update: Operation restored. Clients reconnect.

Thursday 31 July 2008

XMPP Authentication Refused

One of the XMPP servers refuses to authenticate clients. DB connection problem.

(Restarting XMPP server)

Update: Clients reconnect, but initially affected clients show a dialog box.

Thursday 19 June 2008

Server restart

Database and XMPP servers have been restarted for maintenance.

(Comment: we are thinking about a regular scheduled downtime, but no decision yet)

Sunday 15 June 2008

Cache malfunction

A caching server is offline. The problem affects a part of the population.

Update: problems solved, but startup is still (too) slow while the cache warms up.

Tuesday 27 May 2008

Operation resumed

The contact list may be distorted until all users have changed their status at least once.

System malfunction

A malfunction in the cache subsystem requires partial shutdown. We are working to bring it back up quickly.

Update: Status checks need longer than expected.

Monday 19 May 2008

Reconnect test

The XMPP server cluster will be rebooted under max load to test automatic reconnect of all clients.

Thursday 8 May 2008

1st

Hello World