problem:
single-plane raw formats are not part of RFC 4175, and as such are second-class citizens in gstreamer, RTSP, or anywhere else.

solution:
an image format is just a convention; internally it is a stream of bytes (octets). we can re-package a data stream as another format, as long as we mind our strides.

view an image format as just a stream of octets. here is our stream in Y8 (GRAY8) format, one octet per pixel

     +--+--+--+--+ +--+--+--+--+
     |Y0|Y1|Y2|Y3| |Y4|Y5|Y6|Y7| ...
     +--+--+--+--+ +--+--+--+--+

and here is the same data quartered: every group of four octets re-interpreted as one 8-bit-per-channel pixel, with a pixel stride of 4

     +--+--+--+--+ +--+--+--+--+
     |R0|G0|B0|A0| |R1|G1|B1|A1| ...
     +--+--+--+--+ +--+--+--+--+

note that the two RGBA pixels above are built from the same eight octets that hold eight Y8 pixels.
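
as a sanity check, using the default stride formulas quoted further down (RU4 = round up to a multiple of 4):

     GRAY8, width=4, height=4 : pstride = 1, rstride = RU4(4*1) = 4, size = rstride*height = 16 octets
     RGBA,  width=1, height=4 : pstride = 4, rstride = 1*4      = 4, size = rstride*height = 16 octets

so a 4x4 GRAY8 frame and a 1x4 RGBA frame are the exact same 16 octets, just described differently.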

the trick is to use gstreamer's built-in format specifications and let them handle the conventions.

the hammer

(pipeline diagram: y8torgba.png)

as easy as

gst-launch-1.0 -q \
  videotestsrc pattern=white num-buffers=1 \
! video/x-raw, format=GRAY8, width=4, height=4 \
! identity dump=1 \
! rawvideoparse format=rgba width=1 height=4 \
! identity dump=1 \
! fakesink
00000000 (0x7f369c00a8c0): eb eb eb eb eb eb eb eb eb eb eb eb eb eb eb eb  ................
00000000 (0x556228700560): eb eb eb eb eb eb eb eb eb eb eb eb eb eb eb eb  ................

note: why is the white pixel 0xeb and not 0xff (255)?
gstreamer uses YUV colorimetry when generating a GRAY8 test image,
and video-range YUV does not use the entire 0-255 space (white maps to Y = 235 = 0xeb). issue
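
for comparison, dumping the same white test pattern as RGBA (a quick sketch in the spirit of the pipeline above) should show full-range ff for every octet:

gst-launch-1.0 -q \
  videotestsrc pattern=white num-buffers=1 \
! video/x-raw, format=RGBA, width=4, height=1 \
! identity dump=1 \
! fakesink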

note 2: this example was run on linux/debian.
to adapt it to windows powershell, replace the trailing \ with ` and
put quotes around video/x-raw, format=GRAY8, width=4, height=4, as sketched below.
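
a sketch of the same pipeline adapted for powershell (not verified here):

gst-launch-1.0 -q `
  videotestsrc pattern=white num-buffers=1 `
! "video/x-raw, format=GRAY8, width=4, height=4" `
! identity dump=1 `
! rawvideoparse format=rgba width=1 height=4 `
! identity dump=1 `
! fakesink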

gray8 raw video transported as an RTP payload via a “fake” rgba parsing step,
conforming to RFC 4175.
pay

gst-launch-1.0 -v \
  videotestsrc pattern=white is-live=1 \
  ! video/x-raw, format=GRAY8, width=4, height=4, framerate=30/1 \
  ! rawvideoparse width=1 height=4 format=rgba \
  ! rtpvrawpay \
  ! queue \
  ! udpsink host=localhost port=5000

depay and back to gray8

gst-launch-1.0 -v \
  udpsrc port=5000 \
  ! "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)RGBA, depth=(string)8, width=(string)1, height=(string)4, colorimetry=(string)SMPTE240M, payload=(int)96" \
  ! rtpvrawdepay \
  ! queue \
  ! rawvideoparse width=4 height=4 format=gray8 \
  ! videoconvert \
  ! fpsdisplaysink
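
to check the round trip at the byte level, the receiver can end in an identity dump instead of a display sink (a sketch reusing the caps above); the white frames should come back as the same eb octets seen earlier:

gst-launch-1.0 -q \
  udpsrc port=5000 \
  ! "application/x-rtp, media=(string)video, clock-rate=(int)90000, encoding-name=(string)RAW, sampling=(string)RGBA, depth=(string)8, width=(string)1, height=(string)4, colorimetry=(string)SMPTE240M, payload=(int)96" \
  ! rtpvrawdepay \
  ! queue \
  ! rawvideoparse width=4 height=4 format=gray8 \
  ! identity dump=1 \
  ! fakesink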

there are caveats to this approach (the YUV colorimetry mentioned above, among others). get out the hammer and study the design document.

pstride (pixel stride) only exists for simple packed pixel formats. it represents the distance in bytes between two consecutive pixels. this value, when it exists, is constant for a given format.

rstride (row stride) is the size, in bytes, of one image line. in general it must be larger than or equal to pstride * width, when pstride exists for your format. the design document outlines the calculation of the default strides and offsets [channels\planes] within the libgstvideo library. notice the use of RU4 and RU2, which are round-ups; they ensure that each row starts at an aligned pointer, which simplifies memory access in software processing.
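
a quick worked example of those defaults:

     GRAY8, width=4 : pstride = 1, rstride = RU4(4*1) = 4    (no padding)
     GRAY8, width=5 : pstride = 1, rstride = RU4(5*1) = 8    (3 padding octets per row)
     RGBA,  width=5 : pstride = 4, rstride = 5*4      = 20   (already a multiple of 4)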

when you use VAAPI, you are using HW accelerators. these accelerators have wider memory-alignment requirements, so the end result is that strides will be larger, and sometimes there will also be a couple of padding rows. question
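
purely as an illustration (the exact alignment is driver-specific and assumed here), an allocator that aligns rows to 64 bytes would turn the width=5 example into

     GRAY8, width=5 : software default rstride = RU4(5*1) = 8
     GRAY8, width=5 : 64-byte aligned  rstride = 64           (59 padding octets per row)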

GRAY8
8-bit grayscale, “Y800” same as “GRAY8”

Component  color  depth  pstride  offset  rstride      size
0          Y      8      1        0       RU4 (width)  rstride * height

+--+--+--+--+ +--+--+--+--+
|Y0|Y1|Y2|Y3| |Y4|Y5|Y6|Y7| ... (eight pixels, one octet each)
+--+--+--+--+ +--+--+--+--+

RGBA
rgb with alpha channel last

Component  color  depth  pstride  offset  rstride  size
0          R      8      4        0       width*4  rstride * height
1          G      8      4        1
2          B      8      4        2
3          A      8      4        3

+--+--+--+--+ +--+--+--+--+
|R0|G0|B0|A0| |R1|G1|B1|A1| ... (two pixels, four octets each)
+--+--+--+--+ +--+--+--+--+

extended format view

"GRAY8" 8-bit grayscale "Y800" same as "GRAY8"
        Component 0: Y
          depth:           8
          offset:          0
          pstride:         1
          default rstride: RU4 (width)
          default size:    rstride (component0) * height
 
        Image
          default size:    size (component0)
 
"RGBA" rgb with alpha channel last
       +--+--+--+--+ +--+--+--+--+
       |R0|G0|B0|A0| |R1|G1|B1|A1| ...
       +--+--+--+--+ +--+--+--+--+
 
        Component 0: R
          depth:           8
          pstride:         4
          offset:          0
 
        Component 1: G
          depth:           8
          pstride:         4
          offset:          1
 
        Component 2: B
          depth:           8
          pstride:         4
          offset:          2
 
        Component 3: A
          depth:           8
          pstride:         4
          offset:          3
 
        Image
          default rstride: width * 4
          default size:    rstride (image) * height

more digging

note how GRAY8's pack/unpack format in libgstvideo is GST_VIDEO_FORMAT_AYUV, which ties in with the YUV colorimetry note above.

/gst-plugins-base/gst-libs/gst/video/video-format.c
#define PACK_GRAY8 GST_VIDEO_FORMAT_AYUV, unpack_GRAY8, 1, pack_GRAY8
static void
unpack_GRAY8 (const GstVideoFormatInfo * info, GstVideoPackFlags flags,
    gpointer dest, const gpointer data[GST_VIDEO_MAX_PLANES],
    const gint stride[GST_VIDEO_MAX_PLANES], gint x, gint y, gint width)
{
  const guint8 *restrict s = GET_LINE (y);
 
  s += x;
 
  video_orc_unpack_GRAY8 (dest, s, width);
}
 
static void
pack_GRAY8 (const GstVideoFormatInfo * info, GstVideoPackFlags flags,
    const gpointer src, gint sstride, gpointer data[GST_VIDEO_MAX_PLANES],
    const gint stride[GST_VIDEO_MAX_PLANES], GstVideoChromaSite chroma_site,
    gint y, gint width)
{
  guint8 *restrict d = GET_LINE (y);
 
  video_orc_pack_GRAY8 (d, src, width);
}
 
/* bits per component, number of components, shift, depth */
#define DPTH8 8, 1, { 0, 0, 0, 0 }, { 8, 0, 0, 0 }
/* pixel stride per component */
#define PSTR1 { 1, 0, 0, 0 }
/* number of planes, plane index per component */
#define PLANE0 1, { 0, 0, 0, 0 }
/* byte offset of each component within a pixel */
#define OFFS0 { 0, 0, 0, 0 }
/* horizontal / vertical subsampling per component */
#define SUB4 { 0, 0, 0, 0 }, { 0, 0, 0, 0 }
/* unpack format, unpack function, pack lines, pack function */
#define PACK_GRAY8 GST_VIDEO_FORMAT_AYUV, unpack_GRAY8, 1, pack_GRAY8
MAKE_GRAY_FORMAT (GRAY8, "raw video", DPTH8, PSTR1, PLANE0, OFFS0, SUB4,
      PACK_GRAY8),

video-format.c

fourcc on linux
MSDN on stride