Sunday, June 20, 2021

Solving lwip troubles.

Some time after introducing lwip core locking into the ChibiOS port, I started to notice that MQTT still tends to disconnect and then reconnect right after, the same problem as before core locking. It is not very good that all other lwip processes seem to work well while MQTT keeps reconnecting. lwip is a big and complex library to jump into and decode all possible problems, so I decided to start with the MQTT client part only. There are quite a few debugging facilities in lwip, and I turned on many of them to start seeing the whole picture.

A nice, informative way to see what lwip is doing is to enable LWIP_STATS. There are many options for what to show; for example, you can examine possible memory problems by defining:
#define MEMP_STATS 1
#define PBUF_STATS 1
#define SYS_STATS 1

And adding:

  stats_display(); // Show lwip statistics

into the while (true) loop of the lwip thread.
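
Note that stats_display() only prints something when statistics and their display are both compiled in. A minimal lwipopts.h fragment, assuming the stock lwip option names LWIP_STATS and LWIP_STATS_DISPLAY (check opt.h of the lwip version in use, since defaults can differ):

```c
/* lwipopts.h fragment: master switches for lwip statistics (assumed stock names) */
#define LWIP_STATS          1  /* collect statistics at all */
#define LWIP_STATS_DISPLAY  1  /* compile in stats_display() output */
```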

I started to torture the OHS application again with frequent radio packets resulting in MQTT publish messages. All was fine, even with heavy usage of the web interface and MQTT subscribe. But after 5 days the statistics started to show an error on MEMP TCP PCB, and the number of PCBs in use did not decrease even when there was no TCP communication other than MQTT. I increased MEMP_NUM_TCP_PCB to 50 and started again. 5 days later, the same problem.

Examining the MQTT library or searching forums did not help much. In the logs I found that the client receives an empty pbuf, which forces the client to disconnect. The Mosquitto broker then does not see the periodic keep-alive packet and also closes the connection. But I started to measure the exact time when MQTT runs out of pbufs, and it surprised me a bit that it happens exactly after 4.9 days. To be precise, after 2^32 / 10000 = 429496.7296 seconds (about 4.97 days). There are timers in lwip that perform various time-dependent tasks, like removing used pbufs or DHCP. Diving deep into it, I found out that the problem is not in lwip but in the ChibiOS port. The sys_now() function does not overflow on the u32_t boundary but actually much earlier, causing the timers to wait forever.
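
The patch itself is not shown in this post, so the following is only a self-contained sketch of how this kind of failure can arise. It assumes a 32-bit tick counter running at 10 kHz (which is what 2^32 / 10000 suggests). All names in it (naive_now_ms, accum_now_ms, ms_clock_t, time_reached) are made up for the illustration and are not the ChibiOS or lwip code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TICK_HZ 10000u /* assumed tick rate, inferred from 2^32 / 10000 */

/* Naive clock: milliseconds computed straight from a 32-bit tick counter.
 * It restarts from 0 whenever the ticks wrap, i.e. every 2^32 / 10 ms,
 * so it never covers the full 32-bit range of a millisecond timestamp. */
static uint32_t naive_now_ms(uint32_t ticks) {
    return (uint32_t)(((uint64_t)ticks * 1000u) / TICK_HZ);
}

/* Accumulating clock: adds elapsed milliseconds to a running 32-bit
 * counter, which therefore wraps only at the 2^32 boundary. */
typedef struct {
    uint32_t last_ticks; /* tick value seen on the previous call */
    uint32_t ms;         /* running millisecond counter */
    uint32_t rem;        /* leftover below one millisecond (ticks * 1000) */
} ms_clock_t;

static uint32_t accum_now_ms(ms_clock_t *c, uint32_t ticks) {
    assert(c != NULL);
    uint32_t delta = ticks - c->last_ticks; /* modulo 2^32, survives tick wrap */
    uint64_t total = (uint64_t)delta * 1000u + c->rem;
    c->last_ticks = ticks;
    c->ms += (uint32_t)(total / TICK_HZ);
    c->rem = (uint32_t)(total % TICK_HZ);
    return c->ms;
}

/* Wrap-safe "has the deadline passed?" test on 32-bit timestamps. */
static int time_reached(uint32_t now, uint32_t deadline) {
    return (uint32_t)(now - deadline) < 0x80000000u;
}
```

A deadline set shortly before the tick wrap can land above the largest value the naive clock ever returns (429496729 ms), so time_reached() stays false from then on. The accumulating variant passes such a deadline normally, provided it is called at least once per tick wrap period.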

A patch of the sys_now() function seems to fix the problem completely. It comes with the new OHS version 1.3.8, along with a few cosmetic enhancements described in the particular commits on GitHub.

It is recommended to upgrade if you use MQTT.

Wednesday, April 7, 2021

Fixing MQTT

There were lately a lot of changes done in the gateway MQTT processing. First of all, I've introduced a lwip_assert_core_locked() function. Its helper macro is present throughout the lwip network stack, but by default it is defined as empty. This macro is quite a useful feature, especially if lwip is running multiple sockets. ChibiOS has very functional and well-defined bindings to lwip, but it lacks the lwip_assert_core_locked() function definition. To benefit from this check, I've added the following code to lwipopts.h:

void lwip_assert_core_locked(void);
#define LWIP_ASSERT_CORE_LOCKED() lwip_assert_core_locked()

And the actual function to /arch/sys_arch.c:

void lwip_assert_core_locked(void) {
  // If the mutex hasn't been initialized yet, then give it a pass.
  if (lock_tcpip_core == SYS_SEM_NULL) return;
  // Ensure that the mutex is currently taken (locked).
  if (chSemWaitTimeoutS(lock_tcpip_core, TIME_IMMEDIATE) == MSG_OK) {
    // The wait succeeded, so nobody was holding the lock.
    chSysHalt("TCPIP core is not locked!");
  }
}

Then you basically need to wrap every lwip API call with the core locking macros.
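
The macro pair itself is not listed in this post. In lwip, the core lock is normally taken and released with LOCK_TCPIP_CORE() and UNLOCK_TCPIP_CORE() from lwip/tcpip.h, so the wrapping pattern can be sketched as below. To keep the sketch buildable without lwip, the two macros are replaced here by simple stand-ins, and demo_raw_api_call() and demo_publish() are hypothetical names, not functions from lwip or OHS.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins so the sketch builds without lwip. In a real build the two
 * macros come from lwip (lwip/tcpip.h) and must not be redefined. */
static bool core_locked = false;
#define LOCK_TCPIP_CORE()   do { core_locked = true;  } while (0)
#define UNLOCK_TCPIP_CORE() do { core_locked = false; } while (0)

static int unlocked_calls = 0; /* raw API calls made without the lock */

/* Hypothetical raw API call standing in for e.g. an MQTT publish.
 * The check plays the role of LWIP_ASSERT_CORE_LOCKED(), but counts
 * the violation instead of halting the system. */
static int demo_raw_api_call(void) {
    if (!core_locked) {
        unlocked_calls++;
        return -1;
    }
    return 0;
}

/* The wrapping pattern: lock, call into the stack, unlock. */
static int demo_publish(void) {
    int err;
    assert(!core_locked); /* this demo lock is not recursive */
    LOCK_TCPIP_CORE();
    err = demo_raw_api_call();
    UNLOCK_TCPIP_CORE();
    return err;
}
```

Calling demo_raw_api_call() directly, without the wrapper, is exactly the kind of mistake the assert function above is meant to catch.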

This helped a lot, since without the locking the gateway stopped sending ACKs to the MQTT broker after some time and was forced to reconnect every 5 or so minutes.

Second, the lwip source was updated to the latest dev branch, which has some fixes for MQTT in it as well.

Last but not least, there are several changes introduced to the MQTT thread. The connection/re-connection function was moved there as well from the service thread.

All this is pushed to GitHub as version 1.3.6.

Saturday, March 20, 2021

Release notification

Just a short post about OHS firmware. In case you would like to be notified about a new version being available for download, you can subscribe directly in GitHub with the "Watch" button at the top right. Either watch all, or choose "custom" and then just the releases.

Speaking of releases, I've just pushed a few commits to GitHub, releasing 1.3.5. It adds more support for remote zones, adds a debug message to the alarm thread, and fixes a few small bugs.

Tuesday, March 16, 2021

Authentication node changes

As you may know, the OHS gateway 2.x offers, besides the standard arm away mode, also an arm home mode. This allows the gateway, while in the arm home state, to monitor only the zones that you manually set up out of a specific group. This change also required some changes in the authentication mechanism. As the authentication nodes I offer do not have any keyboard to differentiate between the arm home and arm away states, they do it based on the time the iButton is connected to the probe. That is, with a short touch the group receives an arm away command, while with a long (over 1 second) touch of the probe the group receives an arm home command.

This works quite well, but after using it for several months I found out that the disarm command also waits for the end of the scanning period (1.4 seconds), trying to recognize an arm home / away command. That many times had me wondering whether I had put the iButton on correctly while disarming. Then I realized it does not have to wait at all. There is simply no need, as the disarm command does not differ the way the arm commands do. Luckily, the node already knows which state the group is in. Realizing all that, I've updated the authentication node sketches and put a new version on GitHub. Now the scanning period has two different time values: 1.4 seconds for the arm state and 0.4 seconds for the disarm state. It just feels more comfortable to disarm if you receive instant feedback.
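
The decision logic described above can be sketched roughly as follows. This is not the code from the node sketches; the names (scan_period_ms, classify_touch, auth_cmd_t) are hypothetical. It also assumes that "disarm state" means the group is currently armed, so the next touch can only be a disarm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCAN_PERIOD_ARM_MS    1400u /* long window: tell arm home from arm away */
#define SCAN_PERIOD_DISARM_MS  400u /* short window: disarm needs no distinction */
#define LONG_TOUCH_MS         1000u /* touches longer than this mean arm home */

typedef enum { CMD_ARM_AWAY, CMD_ARM_HOME, CMD_DISARM } auth_cmd_t;

/* Pick the scanning period from the state the group is currently in. */
static uint32_t scan_period_ms(bool group_armed) {
    return group_armed ? SCAN_PERIOD_DISARM_MS : SCAN_PERIOD_ARM_MS;
}

/* Turn a finished touch into a command for the group. */
static auth_cmd_t classify_touch(bool group_armed, uint32_t touch_ms) {
    assert(touch_ms <= SCAN_PERIOD_ARM_MS); /* a touch is measured within one scan */
    if (group_armed) {
        return CMD_DISARM; /* touch length is irrelevant when disarming */
    }
    return (touch_ms > LONG_TOUCH_MS) ? CMD_ARM_HOME : CMD_ARM_AWAY;
}
```

The point of the split is that only the arm path needs the long window; the disarm path can answer as soon as the short scan ends.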

While at it, I've also updated the set default function to make it clearer what is set and where. And then I've used these changes to propagate the values of the elements, hopefully making it simpler to understand. Feel free to update.

One more small hardware change: I've changed the iButton probe connector and the mating PCB connector to JST X2.54. It is much flatter than the previously used connector, allowing it to fit in narrower spaces.

Sunday, February 7, 2021

Zone expansion board

Over the last few weeks I was playing with a new addition to OHS, a zone expansion board. The board has 8 analog (balanced) and 1 digital (unbalanced) ports. The analog ports are hybrid, meaning that they can be switched in software to unbalanced. Connection can be established via wire (RS485) or radio (RFM69). It has an onboard relay and a power detection circuit similar to the one on the gateway. There are 2 free pins left that are brought out to allow some user application, like an additional relay board.

The relay present on the expansion board can be added as a basic relay output, or it can be registered as a remote Horn/Siren, by the configuration definition described in the expansion board sketch. New functionality was also added into the gateway firmware, which is now able to recognize such remote Horn/Siren nodes and turn them on when the associated group is in the alarm state.

The sketch is, as always, present on GitHub. And the board is also available for sale.

Wednesday, February 3, 2021

2.0.4 gateway footprint

In case someone would like to drill holes to mount the gateway in a metal casing, or get a better understanding of the connectors and components on the 2.0.4 PCB, I've put a PDF file on Google Drive for printing.

Tuesday, February 2, 2021

A7600E modem tests

The new SimCom A7600E modem is pin-to-pin compatible and works well in the gateway; so far no software changes are needed. It supports both 2G and 4G/LTE. One benefit is that it comes up and also shuts down faster than the 7600E or G.