cannot reproduce the reported best result "2Channel2logit" #2
I've managed to train a model using the firmas (4000 signers) dataset, but the training has some problems. I did the following preprocessing steps:

1. `preprocess_image.py` to binarize firmas invertedly (black background and white signature strokes)
2. `generate_list_firmas.py`
3. `run.py` with correct path settings (training hyperparameters unchanged)

The training threw an exception at `model.py` line 266, and I suspect the reason is that `tf.div`'s divisor is zero. After fixing this bug, the training could continue, but the loss stops improving just after step 300:

evaluation_auc = 0.4823507, global_step = 0, loss = 0.6020629, sec_at_spe = 0.086208425
...
global_step = 300, negative_distance = 0.0, positive_distance = 0.0

Thanks a million if you could advise how to reproduce your solution.
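The inverted binarization in step 1 can be sketched as follows. This is not the repo's `preprocess_image.py`; it is a minimal NumPy illustration of "black background, white strokes", and the threshold value is an assumption:

```python
import numpy as np

def binarize_inverted(gray, threshold=128):
    """Map a grayscale image (uint8, white paper / dark ink) to an
    inverted binary image: strokes -> 255, background -> 0.
    The threshold of 128 is an assumed default, not the repo's value."""
    gray = np.asarray(gray, dtype=np.uint8)
    # Ink pixels are darker than the threshold; invert so strokes become white.
    strokes = gray < threshold
    return np.where(strokes, 255, 0).astype(np.uint8)

# Tiny example: a 2x2 "image" with one dark ink pixel.
img = np.array([[250, 20], [240, 255]], dtype=np.uint8)
print(binarize_inverted(img))  # -> [[0 255] [0 0]]
```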
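The kind of fix applied at `model.py` line 266 can be sketched as guarding the divisor against zero, e.g. `tf.div(a, tf.maximum(b, eps))` in TF1. Shown here in NumPy so it runs standalone; the epsilon value and function name are assumptions, not the repo's code:

```python
import numpy as np

EPS = 1e-8  # assumed epsilon; the actual value used in the fix may differ

def safe_divide(numer, denom, eps=EPS):
    # Clamp the divisor away from zero so the quotient can never be inf/nan,
    # mirroring tf.div(numer, tf.maximum(denom, eps)) in TF1.
    denom = np.maximum(denom, eps)
    return numer / denom

print(safe_divide(np.array([1.0, 0.0]), np.array([0.0, 0.0])))
```

With a zero divisor the unguarded division would yield `inf`/`nan` and poison every downstream gradient, which matches the stalled `negative_distance = 0.0, positive_distance = 0.0` symptom above.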
Hi, the work can be reproduced on TensorFlow 1.7. Did you run the code on TF 2.0?
Please tell me if it still doesn't work on TF 1.7.
Actually, binarizing and inverting have only a little effect.
I'm using TF 1.14, not 2.0. It would require big changes to switch to 2.0.
Let me try to reproduce with TF 1.7.
BTW, I also noticed some messy code in `input_fn`; will open another issue to clarify. Thanks!

After I changed to TF 1.7.1 with CUDA 9.0, I got the following error, which may be due to the same `nan`, I think:
It looks like the loss has become nan after the first batch.

I have never seen this before; training has always been stable for me.
What's your learning rate? Have you tried a smaller lr?
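Besides lowering the learning rate, gradient clipping is a common remedy when a loss goes nan after the first batch. In TF1 this is usually `tf.clip_by_global_norm` before `apply_gradients`; the same arithmetic is sketched below in NumPy (the clip value and function name are assumptions, not this repo's code):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm=5.0):
    """Scale all gradients so their joint L2 norm is at most clip_norm,
    mirroring tf.clip_by_global_norm. clip_norm=5.0 is an assumed value."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Only shrink, never amplify; the tiny floor avoids dividing by zero.
    scale = min(1.0, clip_norm / max(global_norm, 1e-12))
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]  # global norm = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
print(norm)  # -> 13.0
```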
I think the author has not open-sourced the 2Channel2logit loss; the `_loss_inception_2logits` loss is not 2Channel2logit.
Actually, it is...
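For context, one plausible reading of "2Channel2logit" (this is an interpretation, not the author's confirmed code) is: stack the reference and query signatures as a 2-channel input, and end the network in 2 logits (genuine/forged) trained with softmax cross-entropy instead of a distance loss. A minimal sketch of that loss arithmetic in NumPy, with all names assumed:

```python
import numpy as np

def two_logit_loss(logits, labels):
    """Softmax cross-entropy over 2 logits per signature pair.
    logits: (N, 2); labels: (N,), 0 = forged, 1 = genuine.
    Names and shapes are assumptions for illustration only."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Hypothetical 2-channel input: reference and query stacked along channels.
ref_img = np.zeros((150, 220), dtype=np.float32)
query_img = np.ones((150, 220), dtype=np.float32)
pair = np.stack([ref_img, query_img], axis=-1)  # shape (150, 220, 2)

print(two_logit_loss(np.array([[0.0, 0.0]]), np.array([0])))  # -> ln 2 ~ 0.693
```

Notably, a loss near ln 2 ≈ 0.693 is what this formulation gives for uninformative logits, which is in the same range as the stalled `loss = 0.6020629` reported above.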
Any solutions or updates on this error? I get the same error with the same TensorFlow versions using the CEDAR dataset. Trying smaller learning rates or changing the batch size did not help.