L3A composite product - nothing is happening

Good afternoon,

After downloading 2 months of data and obtaining the L2A products, I have not been able to get sen2agri to generate an L3A Composite. The scheduled product creation is still on hold, and when I try to produce one manually, the job gets scheduled but nothing happens.

Can you help?

Thank you

Hello,

Could you please check the status (and, if possible, the logs produced while the L3A Composite is being launched) of the following services:

  • sudo journalctl -fu sen2agri-orchestrator
  • sudo journalctl -fu sen2agri-executor
  • sudo journalctl -fu sen2agri-scheduler

I assume that the site and season are still enabled.

Another check would be to execute the following:

  • Log in as user “sen2agri-service”: sudo su -l sen2agri-service
  • Execute “srun ls”. Does it execute OK?

Also, could you tell us whether you executed a “scheduled job” from the dashboard or ran it from “Custom Jobs”?
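
Because journalctl -f keeps following the log until it is interrupted, a one-off snapshot can be easier to paste into a reply. A minimal sketch, assuming the three unit names listed above; the helper name snapshot_cmd and the 100-line default are illustrative choices and not part of Sen2Agri:

```shell
#!/bin/sh
# Service units named in this thread.
UNITS="sen2agri-orchestrator sen2agri-executor sen2agri-scheduler"

# snapshot_cmd UNIT [LINES]: print a journalctl command that shows the
# last LINES entries for UNIT and then exits (unlike -f, which follows).
snapshot_cmd() {
    printf 'journalctl --no-pager -n %s -u %s\n' "${2:-100}" "$1"
}

# Print one command per unit; nothing is executed here.
for unit in $UNITS; do
    snapshot_cmd "$unit" 100
done
```

The script only prints the commands, so they can be reviewed first; piping its output to "sudo sh" would run them.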

Best regards,
Cosmin

Hi,

Here it goes:
Could you please check the status (and, if possible, the logs produced while the L3A Composite is being launched) of the following services:

sudo journalctl -fu sen2agri-orchestrator

[jonaszed@localhost ~]$ sudo journalctl -fu sen2agri-orchestrator
-- Logs begin at Wed 2019-02-27 18:41:43 EST. --
Mar 02 14:41:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Thu Sep 10 2015, end = Sun Jul 10 2016, current=Wed Aug 16 2017)
Mar 02 14:41:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sat Sep 10 2016, end = Mon Jul 10 2017, current=Wed Aug 16 2017)
Mar 02 14:41:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sun Sep 10 2017, end = Tue Jul 10 2018, current=Wed Aug 16 2017)
Mar 02 14:41:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Mon Sep 10 2018, end = Wed Jul 10 2019, current=Wed Aug 16 2017)
Mar 02 14:41:48 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler L3B: Error getting season start dates for site 1 for scheduled date Wed Aug 16 00:00:00 2017!
Mar 02 14:41:52 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 3 ms
Mar 02 14:42:02 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 3 ms
Mar 02 14:42:12 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 3 ms
Mar 02 14:42:22 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 1 ms
Mar 02 14:42:32 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 1 ms
Mar 02 14:42:42 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 3 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 6 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteDescriptions took 3 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetProducts took 7 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: Scheduled job for L4A and site ID 1 with start date Sat Sep 10 00:00:00 2016 and end date Wed Mar 15 00:00:00 2017 will not be executed (no products)!
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 5 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteDescriptions took 4 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 5 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler CropType: Error no shapefile found for site 1!
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 5 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteDescriptions took 3 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 5 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler CropType: Error no shapefile found for site 1!
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 6 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Thu Sep 10 2015, end = Sun Jul 10 2016, current=Mon Aug 15 2016)
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Thu Sep 10 2015, end = Sun Jul 10 2016, current=Mon Aug 15 2016)
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sat Sep 10 2016, end = Mon Jul 10 2017, current=Mon Aug 15 2016)
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sun Sep 10 2017, end = Tue Jul 10 2018, current=Mon Aug 15 2016)
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Mon Sep 10 2018, end = Wed Jul 10 2019, current=Mon Aug 15 2016)
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler L3B: Error getting season start dates for site 1 for scheduled date Mon Aug 15 00:00:00 2016!
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 6 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteDescriptions took 3 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetProducts took 7 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: Scheduled job for L4A and site ID 1 with start date Thu Sep 10 00:00:00 2015 and end date Tue Mar 15 00:00:00 2016 will not be executed (no products)!
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler L3A: Getting season dates for site 1 for scheduled date Wed Sep 30 00:00:00 2015!
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler L3A: Extracted season dates: Start: Thu Sep 10 00:00:00 2015, End: Sun Jul 10 00:00:00 2016!
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 6 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetProducts took 7 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: Scheduled job for L3A and site ID 1 with start date Thu Sep 10 00:00:00 2015 and end date Fri Sep 25 00:00:00 2015 will not be executed (no products)!
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 6 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sat Sep 10 2016, end = Mon Jul 10 2017, current=Wed Aug 16 2017)
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Thu Sep 10 2015, end = Sun Jul 10 2016, current=Wed Aug 16 2017)
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sat Sep 10 2016, end = Mon Jul 10 2017, current=Wed Aug 16 2017)
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sun Sep 10 2017, end = Tue Jul 10 2018, current=Wed Aug 16 2017)
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Mon Sep 10 2018, end = Wed Jul 10 2019, current=Wed Aug 16 2017)
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler L3B: Error getting season start dates for site 1 for scheduled date Wed Aug 16 00:00:00 2017!
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:48 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 5 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteDescriptions took 3 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 5 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler CropType: Error no shapefile found for site 1!
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler L3A: Getting season dates for site 1 for scheduled date Fri Sep 30 00:00:00 2016!
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler L3A: Extracted season dates: Start: Sat Sep 10 00:00:00 2016, End: Mon Jul 10 00:00:00 2017!
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 5 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetProducts took 7 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: Scheduled job for L3A and site ID 1 with start date Sat Sep 10 00:00:00 2016 and end date Sun Sep 25 00:00:00 2016 will not be executed (no products)!
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler L3A: Getting season dates for site 1 for scheduled date Mon Apr 30 00:00:00 2018!
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler L3A: Extracted season dates: Start: Sun Sep 10 00:00:00 2017, End: Tue Jul 10 00:00:00 2018!
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 6 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetProducts took 8 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: Scheduled job for L3A and site ID 1 with start date Tue Mar 6 00:00:00 2018 and end date Wed Apr 25 00:00:00 2018 will not be executed (no products)!
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sun Sep 10 2017, end = Tue Jul 10 2018, current=Fri Aug 31 2018)
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Thu Sep 10 2015, end = Sun Jul 10 2016, current=Fri Aug 31 2018)
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sat Sep 10 2016, end = Mon Jul 10 2017, current=Fri Aug 31 2018)
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sun Sep 10 2017, end = Tue Jul 10 2018, current=Fri Aug 31 2018)
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Mon Sep 10 2018, end = Wed Jul 10 2019, current=Fri Aug 31 2018)
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler CropMask: Error getting season start dates for site 1 for scheduled date Fri Aug 31 00:00:00 2018!
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 6 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteDescriptions took 3 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 6 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler CropType: Error no shapefile found for site 1!
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetConfigurationParameters took 6 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: GetSiteSeasons took 4 ms
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sun Sep 10 2017, end = Tue Jul 10 2018, current=Thu Aug 16 2018)
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Thu Sep 10 2015, end = Sun Jul 10 2016, current=Thu Aug 16 2018)
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sat Sep 10 2016, end = Mon Jul 10 2017, current=Thu Aug 16 2018)
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Sun Sep 10 2017, end = Tue Jul 10 2018, current=Thu Aug 16 2018)
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: IsInSeason: Date not in season (start = Mon Sep 10 2018, end = Wed Jul 10 2019, current=Thu Aug 16 2018)
Mar 02 14:42:49 localhost.localdomain sen2agri-orchestrator[7439]: Scheduler L3B: Error getting season start dates for site 1 for scheduled date Thu Aug 16 00:00:00 2018!
Mar 02 14:42:52 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 3 ms
Mar 02 14:43:02 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 3 ms
Mar 02 14:43:12 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 3 ms
Mar 02 14:43:22 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 3 ms
Mar 02 14:43:32 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 3 ms
Mar 02 14:43:42 localhost.localdomain sen2agri-orchestrator[7439]: GetNewEvents took 3 ms
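
Most of the orchestrator output above consists of timing entries ("... took N ms"), which makes the scheduler decisions hard to spot. A small filter can strip them; this is only a sketch, and the function names drop_timing_lines and l3a_lines are hypothetical:

```shell
# drop_timing_lines: read journal text on stdin and drop the
# "<call> took <N> ms" timing entries, keeping everything else.
drop_timing_lines() {
    grep -Ev ' took [0-9]+ ms$'
}

# l3a_lines: keep only entries that mention the L3A processor.
l3a_lines() {
    grep -E 'L3A'
}

# Example (not run here):
#   sudo journalctl --no-pager -n 500 -u sen2agri-orchestrator \
#       | drop_timing_lines | l3a_lines
```

Applied to the log above, the remaining L3A entries are the season-date messages and the "will not be executed (no products)!" lines.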

sudo journalctl -fu sen2agri-executor

[jonaszed@localhost ~]$ sudo journalctl -fu sen2agri-executor
-- Logs begin at Wed 2019-02-27 18:41:43 EST. --
Feb 28 00:01:02 localhost.localdomain sen2agri-executor[668]: MarkStepPendingStart took 2 ms
Mar 01 00:01:14 localhost.localdomain sen2agri-executor[668]: GetProcessorDescriptions took 3 ms
Mar 01 00:01:14 localhost.localdomain sen2agri-executor[668]: GetConfigurationParameters took 5 ms
Mar 01 00:01:14 localhost.localdomain sen2agri-executor[668]: GetProcessorDescriptions took 3 ms
Mar 01 00:01:14 localhost.localdomain sen2agri-executor[668]: GetConfigurationParameters took 5 ms
Mar 01 00:01:14 localhost.localdomain sen2agri-executor[668]: HandleStartProcessor: Executing command srun with params --qos qoslai --job-name TSKID_27683_STEPNAME_BVInputVariableGeneration_0 /usr/bin/sen2agri-processor-wrapper SRV_IP_ADDR=127.0.0.1 SRV_PORT_NO=7777 WRP_SEND_RETRIES_NO=3600 WRP_TIMEOUT_BETWEEN_RETRIES=1000 WRP_EXECUTES_LOCAL=1 JOB_NAME=TSKID_27683_STEPNAME_BVInputVariableGeneration_0 PROC_PATH=/usr/bin/otbcli PROC_PARAMS BVInputVariableGeneration -samples 40000 -out /mnt/archive/orchestrator_temp/l3b/128/27683-lai-bv-input-variable-generation/out_bv_dist_samples.txt -minlai 0.0 -maxlai 5.0 -modlai 0.5 -stdlai 1.0 -minala 5.0 -maxala 80.0 -modala 40.0 -stdala 20.0
Mar 01 00:01:14 localhost.localdomain sen2agri-executor[668]: HandleStartProcessor: Executing command sbatch with params --job-name TSKID_27683_STEPNAME_BVInputVariableGeneration_0 --qos qoslai /tmp/sen2agri-executor.vrA668
Mar 01 00:01:14 localhost.localdomain sen2agri-executor[668]: HandleStartProcessor: Sbatch command returned: "Submitted batch job 202
Mar 01 00:01:14 localhost.localdomain sen2agri-executor[668]: "
Mar 01 00:01:14 localhost.localdomain sen2agri-executor[668]: MarkStepPendingStart took 6 ms

sudo journalctl -fu sen2agri-scheduler

[jonaszed@localhost ~]$ sudo journalctl -fu sen2agri-scheduler
-- Logs begin at Wed 2019-02-27 18:41:43 EST. --
Mar 02 14:51:48 localhost.localdomain sen2agri-scheduler[5372]: UpdateScheduledTasksStatus took 33 ms
Mar 02 14:51:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 5, siteId: 1 cannot be started now as is invalid
Mar 02 14:51:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 3, siteId: 1 cannot be started now as is invalid
Mar 02 14:51:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 6, siteId: 1 cannot be started now as is invalid
Mar 02 14:51:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 2, siteId: 1 cannot be started now as is invalid
Mar 02 14:51:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 6, siteId: 1 cannot be started now as is invalid
Mar 02 14:51:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 6, siteId: 1 cannot be started now as is invalid
Mar 02 14:51:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 5, siteId: 1 cannot be started now as is invalid
Mar 02 14:51:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 3, siteId: 1 cannot be started now as is invalid
Mar 02 14:51:49 localhost.localdomain sen2agri-scheduler[5372]: UpdateScheduledTasksStatus took 26 ms
Mar 02 14:52:48 localhost.localdomain sen2agri-scheduler[5372]: GetScheduledTasks took 5 ms
Mar 02 14:52:48 localhost.localdomain sen2agri-scheduler[5372]: UpdateScheduledTasksStatus took 32 ms
Mar 02 14:52:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 5, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 6, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 6, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 3, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 5, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 2, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 3, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:48 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 6, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:49 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 2, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:49 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 2, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:49 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 5, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:49 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 6, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:49 localhost.localdomain sen2agri-scheduler[5372]: The job for processor: 3, siteId: 1 cannot be started now as is invalid
Mar 02 14:52:49 localhost.localdomain sen2agri-scheduler[5372]: UpdateScheduledTasksStatus took 24 ms
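
The scheduler output above repeats the same "cannot be started now as is invalid" message for several processor IDs. A rough sketch for tallying those messages per processor ID; the function name count_invalid_jobs is hypothetical, and the thread does not say which processor each ID refers to:

```shell
# count_invalid_jobs: read sen2agri-scheduler journal text on stdin and
# print "<processor id> <count>" for every "is invalid" message.
count_invalid_jobs() {
    awk '/cannot be started now as is invalid/ {
             for (i = 1; i < NF; i++)
                 if ($i == "processor:") {
                     id = $(i + 1)
                     sub(/,$/, "", id)   # strip the trailing comma
                     n[id]++
                 }
         }
         END { for (id in n) print id, n[id] }' | sort -n
}

# Example (not run here):
#   sudo journalctl --no-pager -n 200 -u sen2agri-scheduler | count_invalid_jobs
```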

I assume that the site and season are still enabled.
Yes, the site exists with 4 seasons enabled, each of them corresponding to a yearly season.

Another check would be to execute the following:

log in as user “sen2agri-service” : sudo su -l sen2agri-service

Execute “srun ls”. Does it execute OK?

The output is:

[sen2agri-service@localhost ~]$ srun ls
srun: Required node not available (down, drained or reserved)
srun: job 204 queued and waiting for resources

Also, could you tell us if you executed a “scheduled job” from dashboard or you executed it from “Custom Jobs”?
I had scheduled monthly jobs (the default) and, as nothing was happening, I tried 4 custom jobs.

Best regards,

João

Help anyone? I am really stuck…

Hello,

It seems that your problem is not actually related to the processors but to SLURM.

srun: Required node not available (down, drained or reserved)
srun: job 204 queued and waiting for resources

Did you run out of disk space on the system partition at some point (or are you perhaps still running out of disk space on this partition)?
If so, make sure you have enough disk space and then restart the SLURM services.
You can check for SLURM errors in its log files: /var/log/slurm/slurm.log and /var/log/slurm/slurmd.log.
You can try restarting the following services with systemctl:

slurmd, slurmdbd, slurmctld and mariadb

After that you can try:

sudo -u sen2agri-service scontrol update NodeName=localhost State=RESUME
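
Before and after the RESUME command, the node state reported by "scontrol show node" can confirm whether the node was actually down or drained. A rough sketch; node_state and needs_resume are hypothetical helper names, and the parsing assumes the usual Key=Value layout of scontrol output:

```shell
# node_state: read "scontrol show node <name>" output on stdin and
# print the value of the first State= field (e.g. IDLE, DOWN+DRAIN).
node_state() {
    tr ' ' '\n' | sed -n 's/^State=//p' | head -n 1
}

# needs_resume STATE: succeed when STATE mentions DOWN or DRAIN.
needs_resume() {
    case $1 in
        *DOWN*|*DRAIN*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Example (not run here):
#   state=$(scontrol show node localhost | node_state)
#   if needs_resume "$state"; then
#       sudo -u sen2agri-service scontrol update NodeName=localhost State=RESUME
#   fi
```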

Hope this helps.

Best regards,
Cosmin

Hi,

First and foremost I would like to thank you for your support.

Hello,

It seems that your problem is not actually related to the processors but to SLURM.

srun: Required node not available (down, drained or reserved)
srun: job 204 queued and waiting for resources

Did you run out of disk space on the system partition at some point (or are you perhaps still running out of disk space on this partition)?
In this case, make sure you have enough disk space and then restart the SLURM services.

I did actually run out of space a few weeks ago. I increased the volume group when I noticed it, and I currently still have a few TB available.

You can check the errors that you have with SLURM checking its log files: /var/log/slurm/slurm.log, /var/log/slurm/slurmd.log.

Can I send the log files to you?

You can try restarting with systemctl the following services:

slurmd, slurmdbd, slurmctld and mariadb

I have restarted them using systemctl restart, and they all appear as running.

After that you can try:

sudo -u sen2agri-service scontrol update NodeName=localhost State=RESUME

Command executed without any errors.

How can I check if things are working?

Regards,

João

Hello,

First, you can try running

srun ls

again under the sen2agri-service user account.
If this command is successful, you can check that the system is processing your jobs using:
journalctl -fu sen2agri-executor
You should see activity here if everything is OK with SLURM.
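
Note that when the node is unavailable, srun does not fail immediately but queues the job and waits, as in the "queued and waiting for resources" output earlier in this thread. A sketch that puts a time limit on the test using the coreutils timeout command; the probe helper and the 30-second limit are assumptions made for illustration:

```shell
# probe LIMIT CMD...: run CMD with a time limit (seconds) and classify
# the outcome. GNU timeout exits with status 124 when the limit is hit.
probe() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    case $? in
        0)   echo "ok" ;;
        124) echo "timed out" ;;
        *)   echo "failed" ;;
    esac
}

# Example (not run here), as the sen2agri-service user:
#   probe 30 srun ls
```

A "timed out" result here would suggest the job is still waiting for resources rather than running.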

Best regards,
Cosmin

Working like a charm, thank you, you are a star

Dear Cosmin,
what should I do if I do not get a successful result from the command
srun ls?

  • Unable to allocate resources:: Unable to contact slurm controller (connect failure).

Here are my logs and errors:

slurmctld.service - Slurm controller daemon
Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/slurmctld.service.d
└─slurmctld.override.conf
Active: failed (Result: exit-code) since Sun 2021-01-03 23:43:41 CET; 3 days ago
Main PID: 12680 (code=exited, status=1/FAILURE)

tail: cannot open ‘/var/log/slurm/slurmdbd.log’ for reading: No such file or directory
==> /var/log/slurm/slurmd.log <==
[2021-01-07T16:49:24.933] error: Unable to register: Unable to contact slurm controller (connect failure)
[2021-01-07T16:49:34.937] error: Unable to register: Unable to contact slurm controller (connect failure)
[2021-01-07T16:49:44.940] error: Unable to register: Unable to contact slurm controller (connect failure)
[2021-01-07T16:49:54.944] error: Unable to register: Unable to contact slurm controller (connect failure)
[2021-01-07T16:50:04.948] error: Unable to register: Unable to contact slurm controller (connect failure)
[2021-01-07T16:50:14.953] error: Unable to register: Unable to contact slurm controller (connect failure)
[2021-01-07T16:50:24.959] error: Unable to register: Unable to contact slurm controller (connect failure)
[2021-01-07T16:50:34.964] error: Unable to register: Unable to contact slurm controller (connect failure)
[2021-01-07T16:50:44.967] error: Unable to register: Unable to contact slurm controller (connect failure)
[2021-01-07T16:50:54.971] error: Unable to register: Unable to contact slurm controller (connect failure)

==> /var/log/slurm/slurm.log <==
[2021-01-03T23:43:42.031] read_slurm_conf: backup_controller not specified.
[2021-01-03T23:43:42.031] Reinitializing job accounting state
[2021-01-03T23:43:42.031] Ending any jobs in accounting that were running when controller went down on
[2021-01-03T23:43:42.031] cons_res: select_p_reconfigure
[2021-01-03T23:43:42.031] cons_res: select_p_node_init
[2021-01-03T23:43:42.031] cons_res: preparing for 2 partitions
[2021-01-03T23:43:42.031] Running as primary controller
[2021-01-03T23:43:42.031] Registering slurmctld at port 6817 with slurmdbd.
[2021-01-03T23:43:42.190] error: slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn’t been added to accounting yet)
[2021-01-03T23:43:42.191] fatal: You need to add this cluster to accounting if you want to enforce associations, or no jobs will ever run.
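
The last line of slurm.log above says the cluster has not been added to accounting. In a stock SLURM setup, clusters are registered with slurmdbd through sacctmgr, using the ClusterName value from slurm.conf. The following is only a sketch of that generic step, not a Sen2Agri-specific procedure; the helper names and the /etc/slurm/slurm.conf path are assumptions:

```shell
# cluster_name: read slurm.conf text on stdin and print the value of
# the first ClusterName= setting.
cluster_name() {
    sed -n 's/^[[:space:]]*ClusterName[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' | head -n 1
}

# add_cluster_cmd NAME: print (do not run) the sacctmgr command that
# registers NAME; -i commits without the interactive confirmation.
add_cluster_cmd() {
    printf 'sacctmgr -i add cluster %s\n' "$1"
}

# Example (not run here; the config path is an assumption):
#   sacctmgr show cluster          # clusters already known to slurmdbd
#   add_cluster_cmd "$(cluster_name < /etc/slurm/slurm.conf)"
```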

thanks a lot,
eva

Dear Eva,

It seems that SLURM is not starting for some reason.
A reinstall and re-creation of all SLURM structures (code extracted from the install script) solved the problem.
I will also include this script in the next version of the system.

Best regards,
Cosmin

Hello Cosmin,

Great to see that the project is still going on.

Since version 2.0.3 we have also had problems with the L3A data. The custom jobs are always displayed as “submitted” but are never processed. The job appears in the logs, but with no start or error.

Running /usr/bin/composite_processing.py manually worked only after “-pmxml”, firstMasterFile was removed from line 307. Is this parameter important?

Hello,
When the jobs/steps remain in the “submitted” status, there is usually an issue with SLURM executing the jobs.
You can try running:

sudo su -l sen2agri-service
srun ls -al

If you obtain an error and at some point you ran out of disk space on the root partition, you can try the following command:

sudo -u sen2agri-service scontrol update NodeName=localhost State=RESUME

Please let me know if this solves your issue.

Best regards,
Cosmin

Hello Cosmin,

Thank you very much for your reply!
In fact, running out of disk space seems to be a serious problem, because since then we have not been able to run any of the L3 processes again.
We tried your recommendations (after cleaning the root partition) but without success. In the end we reinstalled the test machine (pretty easy using Ansible: Sen2agri ansible).

We are now aware of this issue and make sure there is sufficient disk space. The error has not occurred again.
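
Since a full root partition was the trigger in several posts above, a periodic check can give an early warning. A minimal sketch that parses POSIX "df -P" output; the function names and the 90% threshold are arbitrary assumptions:

```shell
# usage_percent: read "df -P <path>" output on stdin and print the
# Capacity column of the first data row as a bare integer.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_ok PERCENT LIMIT: succeed while usage is below LIMIT.
disk_ok() {
    [ "$1" -lt "$2" ]
}

# Report on the root partition.
pct=$(df -P / | usage_percent)
if disk_ok "$pct" 90; then
    echo "root partition at ${pct}%: OK"
else
    echo "root partition at ${pct}%: consider freeing space"
fi
```

Run from cron, a check like this could flag the problem before the partition actually fills up.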