Opened 11 years ago

Closed 10 years ago

#64 closed task (fixed)

Intermittent authentication failure

Reported by: johren@bbn.com
Owned by: jack.hong@nicta.com.au
Priority: major
Milestone: GEC19
Component: Authentication/Authorization
Version: Sprint6
Keywords:
Cc:
Dependencies:

Description

We are seeing intermittent authentication failures in Labwiki. The login succeeds if you click "try again". This is the error message received:

Authentication failed. Server https://portal.geni.net/server/server.php responds that the 'check_authentication' call is not valid
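
For context, the 'check_authentication' call named in this error appears to be the OpenID direct verification step, in which the client sends the signed response fields back to the server and asks whether the signature is valid. The following is a minimal sketch of that exchange, assuming the OpenID Authentication 2.0 flow; the function names are hypothetical and this is not ruby-openid's or the portal's actual code.

```python
from urllib.parse import urlencode


def build_check_authentication(assertion):
    """Build the form-encoded POST body of a direct verification request.

    Assumption: `assertion` holds the query parameters the provider
    returned to the client after login.
    """
    # Copy every openid.* field unchanged, then override only openid.mode.
    fields = {k: v for k, v in assertion.items() if k.startswith("openid.")}
    fields["openid.mode"] = "check_authentication"
    return urlencode(fields)


def parse_key_value(body):
    """Parse a key-value form response: one 'key:value' pair per line."""
    result = {}
    for line in body.splitlines():
        if not line:
            continue
        # Split on the first colon only, since values may contain colons.
        key, _, value = line.partition(":")
        result[key] = value
    return result


def is_valid(body):
    """Return True only if the provider answered is_valid:true."""
    return parse_key_value(body).get("is_valid") == "true"
```

Under these assumptions, a reply of is_valid:false (or a reply the client cannot parse) would be reported as a failed verification, so capturing the exact request and response bodies on a failing attempt would show whether the fields sent back match what the server signed.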

Change History (11)

comment:1 Changed 10 years ago by johren@bbn.com

Response from Tom:

Jeanne and I spent some time looking at this issue. It is easy to reproduce. Unfortunately there’s nothing in the logs on my side. The error does not produce any log messages.

I googled for “responds that the 'check_authentication' call is not valid” and turned up: https://github.com/openid/ruby-openid/issues/1 Is that the client you are using?

I also turned up these as the next four hits on Google:

https://github.com/nbudin/devise_openid_authenticatable/issues/7
https://github.com/openid/ruby-openid/issues/51
http://developer.yahoo.com/forum/OpenID-General-Discussion/check-authentication-issue-with-ruby-openid/1264356406000-83688213-2436-39d0-85c0-ef1b75ff8ba3
http://stackoverflow.com/questions/1740138/openid-check-authentication-not-working

The first four reference ruby-openid, and may have some useful suggestions and/or patches.

Note: the error may still be on our server side, I don’t know. To find out, we would have to instrument our server with log messages, which we really shouldn't do this close to the GEC. We are under a code freeze at this time (as of Oct. 13).

I opened a ticket for us: http://trac.gpolab.bbn.com/proto-ch/ticket/869 to track this issue.

Please let me know if any of this helps.

Sorry for the trouble,
Tom

comment:2 Changed 10 years ago by jack.hong@nicta.com.au

Thanks Tom,

We do indeed use ruby-openid. I had already gone through this information on the web and implemented some of the workarounds suggested there. Unfortunately, judging by the replies, none of the solutions is a definite fix (they work for some OpenID providers but not for others, for example).

I will investigate further in the near future. I could try different OpenID options and also capture the data exchanged between the client and different OpenID providers. I hope to eventually spot something useful that helps with debugging on both sides.

cheers.

j.

comment:3 Changed 10 years ago by johren@bbn.com

Milestone: GEC18 → GEC19
Version: WrapUp → Backlog

comment:4 Changed 10 years ago by johren@bbn.com

Version: Backlog → Sprint2

This is being tracked within GPO as http://trac.gpolab.bbn.com/proto-ch/ticket/869. I am adding the link here to record the association, even though most people will not be able to access that ticket.

comment:5 Changed 10 years ago by jack.hong@nicta.com.au

After many attempts, it seems I cannot reproduce this issue any more. I am wondering whether any changes were made on the server side, or whether I am just extremely lucky.

This little app here http://geniopenidtest.herokuapp.com/ can be used to test, and I tried instances on emmy9 as well.

comment:6 Changed 10 years ago by johren@bbn.com

This morning, I tried 20 times using Labwiki (port 4000) with Chrome and 20 times using Labwiki with Firefox. I hit the failure 7 out of 20 times on Chrome and 6 out of 20 times on FF. I did see as many as 7 successful logins in a row before hitting this failure.

I tried again using your app. This time I got 8/20 failures on Chrome and 6/20 on FF. The failures produced the following error message:

Bad signature in response from https://portal.geni.net/server/server.php

Let me know if there is any other information I can gather when reproducing. Or I can set up remote access to a system that reproduces the error if that helps at all.
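
As a rough summary of the counts reported in this comment, the failure rates can be tallied as in the sketch below. The dictionary layout and names are illustrative only; the numbers are the ones given above.

```python
# Failure counts from the runs above: (client, browser) -> (failures, attempts)
results = {
    ("Labwiki", "Chrome"): (7, 20),
    ("Labwiki", "Firefox"): (6, 20),
    ("test app", "Chrome"): (8, 20),
    ("test app", "Firefox"): (6, 20),
}

# Per-combination failure rate as a fraction of attempts.
rates = {key: failures / attempts for key, (failures, attempts) in results.items()}

# Overall failure rate across all runs.
total_failures = sum(failures for failures, _ in results.values())
total_attempts = sum(attempts for _, attempts in results.values())
overall_rate = total_failures / total_attempts

for (client, browser), rate in rates.items():
    print(f"{client} / {browser}: {rate:.0%}")
print(f"overall: {total_failures}/{total_attempts} = {overall_rate:.1%}")
```

The rates are similar across both clients and both browsers, which suggests the failure does not depend on which of the two clients or browsers is used.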

comment:7 Changed 10 years ago by johren@bbn.com

Hmmmm, I think I've stumbled upon something interesting. I tried again to reproduce the problem after the portal upgrade to make sure that didn't have any impact on the issue. I was still able to reproduce the failure with my account (johren). However, I also tried testing this with my second "test" account (johren2). I was NOT able to reproduce the problem after over 20 tries when I used the johren2 account. I was using user johren in Chrome and johren2 in FF. I then tried swapping browsers (johren2 in Chrome and johren in FF) and I found that the problem followed the johren user.

So, what information do we need to gather to shed more light on this problem? Is it something that Tom or I can gather or do we need to find another user for Jack to use to reproduce the problem?

comment:8 Changed 10 years ago by johren@bbn.com

Version: Sprint2 → Sprint3

comment:9 Changed 10 years ago by johren@bbn.com

Version: Sprint3 → Sprint5

Moving to sprint 5 when Jack returns from vacation.

comment:10 Changed 10 years ago by johren@bbn.com

Version: Sprint5 → Sprint6

comment:11 Changed 10 years ago by johren@bbn.com

Resolution: fixed
Status: new → closed