OPEN ACCESS

Group Theory of Messenger RNA Metabolism and Disease

  • Michel Planat1,* ,
  • Marcelo Amaral2,
  • David Chester2,
  • Fang Fang2,
  • Raymond Aschheim2 and
  • Klee Irwin2
Gene Expression   2024;23(4):264-272

doi: 10.14218/GE.2023.00079


Citation: Planat M, Amaral M, Chester D, Fang F, Aschheim R, Irwin K. Group Theory of Messenger RNA Metabolism and Disease. Gene Expr. 2024;23(4):264-272. doi: 10.14218/GE.2023.00079.

Abstract

Background and objectives

Our recent work has focused on the application of infinite group theory and related algebraic geometric tools in the context of transcription factors and microRNAs. We were able to differentiate between “healthy” nucleotide sequences and disrupted sequences that may be associated with various diseases. In this paper, we extend our efforts to the study of messenger RNA (mRNA) metabolism, showcasing the power of our approach.

Methods

To achieve this, we used: (a) infinite (finitely generated) groups, with generators representing the distinct nucleotides and a relation between them given by a short nucleotide sequence [e.g., (i) the consensus sequence in mRNA translation, (ii) the poly(A) signal, and (iii) the microRNA seed]; (b) aperiodicity theory, which connects healthy groups to free groups Fr of rank r and their profinite completion F̂r; and (c) the representation theory of groups over the space-time-spin group SL2(C), highlighting the role of surfaces with isolated singularities in the character variety.

Results

We investigate (1) mRNA translation in prokaryotes and eukaryotes, (2) polyadenylation in eukaryotes, which is crucial for nuclear export, translation, stability, and splicing of mRNA, (3) microRNAs involved in RNA silencing and post-transcriptional regulation of gene expression, and (4) identification of disrupted sequences that could lead to potential illnesses.

Conclusion

Our approach could potentially contribute to the understanding of the molecular mechanisms underlying various diseases and help develop new diagnostic or therapeutic strategies.

Keywords

RNA metabolism, MicroRNAs, Diseases, Finitely generated group, SL2(C) character variety, Aperiodicity

Introduction

Genome-scale metabolic pathways, genome-environment interactions, the immune response, post-transcriptional regulatory mechanisms, and oncohistones are aspects of a research field connecting the heritable genetic code to other biological codes.1–6 The genetic code itself is, precisely, a noninjective map from the 64 codons to the 20 amino acids (plus the stop signal). Both finite groups and quantum groups play leading roles in modeling this code.7–10 More explicitly, according to Planat et al.,8 complete quantum information is encoded in the 22 irreducible characters of the small group (240,105) ≅ Z5 ⋊ 2O, where 2O is the binary octahedral group. The characters are put in correspondence with the DNA multiplets encoding the proteinogenic amino acids, and the multiplicity is reflected in the dimension of the character representation. Another study by Planat et al.11 showed that the small group (336,118) ≅ Z7 ⋊ 2O is another model of the genetic code, reflecting the symmetry of the Lsm–7 complexes in the spliceosome. The eight-fold symmetric histone complex was subsequently investigated by Planat et al.12 with the character table of the group (384,5589) ≅ Z8 ⋊ 2O.

These latest studies were the first to describe the role of a specific algebraic surface, called the Kummer surface, in the quantum modeling of the genetic code. Hereafter, we refer to the epigenetic code as the set of all processes that reveal and execute gene expression. This includes DNA methylation,13 the preparation of messenger RNA (mRNA) translation, the poly(A) tail, the RNA-induced silencing complex (a vital tool in gene regulation involving single-stranded RNA and double-stranded small interfering RNA), and other regulatory nucleotide sequence fragments that are discarded after splicing. Ultimately, this involves a relation between the epigenetic code and morphogenesis.14

Chemical modifications of RNA also drive the metabolism of the transcribed genetic information. The study of such post-transcriptional modifications, which regulate gene expression, is known as epitranscriptomics. More than 170 types of RNA modifications are known, and the most common in eukaryotic mRNA is N6-methyladenosine (m6A), which occurs on sites with the specific short sequence RRACH (R = A or G, H = A, U, or C).15–17
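As a small illustration of what the RRACH motif means operationally, the sketch below scans an RNA string for every RRACH site, reading R = A/G and H = A/C/U as stated above. The central adenosine (third position) is the candidate m6A site. This is only a pattern search: it says nothing about whether a site is actually methylated, and the function name `rrach_sites` and the toy sequences are hypothetical.

```python
import re

# RRACH in IUPAC notation: R = A or G, H = A, C, or U.
# Two sites can share one base, so a lookahead (?=...) is used to report all of them.
RRACH = re.compile(r"(?=([AG][AG]AC[ACU]))")


def rrach_sites(rna):
    """Return (start, motif, m6A_candidate_index) for every RRACH site in an RNA string."""
    sites = []
    for match in RRACH.finditer(rna.upper()):
        start = match.start()
        # The third base of RRACH is the adenosine that may carry m6A.
        sites.append((start, match.group(1), start + 2))
    return sites


if __name__ == "__main__":
    print(rrach_sites("UGGACUAAACA"))
```

On the toy string `UGGACUAAACA`, the scan reports sites starting at positions 1 and 6 (0-based).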

To study the epigenetic code (hereinafter referred to as the e-code), we used infinite (finitely generated) groups, denoted by fp, and their representations over the group SL2(C) of (2 × 2) complex matrices with determinant 1.18,19 This group is significant throughout physics because it is the space-time-spin group (the double cover of the restricted Lorentz group). In this study, we applied algebraic geometry to define the e-code, which, to our knowledge, has not been done before.

Our key observation is that an fp group associated with a healthy sequence usually approximates a free group Fr (in the sense that both have the same card seq, defined below), where the rank r equals the number of distinct nucleotides minus one. A sequence deviating from this may suggest a potential e-code deregulation leading to a disease. However, an fp group closely resembling a free group does not provide sufficient assurance against a disease. An additional examination of the SL2(C) representations of fp, whose characters form the character variety, and specifically of its Groebner basis G, is necessary. The factors of G define a set of surfaces. A surface within G containing isolated singularities indicates another potential disease that can be identified specifically, e.g., one related to an oncogene or a neurological disorder.19 The e-code we define comprises such algebraic geometric characteristics.

An additional attribute of healthy sequences (those leading to a group fp that approximates the free group Fr), not mentioned in the study of Planat et al.,19 is their connection to aperiodicity. Schrödinger proposed that the hereditary material of living cells is an aperiodic crystal.20 Planat et al.19 characterized aperiodic DNA sequences.21 We advanced this concept by introducing the so-called profinite completion F̂r of the free group Fr. A sequence fp(l) of finitely generated groups approaching Fr emerges by applying l repeated substitutions to the generators of fp. All distinct groups fp(l) should then possess the same profinite completion F̂r. The profinite groups F̂1 (corresponding to sequences containing two distinct nucleotides) and F̂2 (corresponding to sequences containing three distinct nucleotides) have been examined in a prominent algebraic geometry treatise.22 We present the details below in a manner that is accessible to a nonspecialist reader. In the Methods section, we illustrate our mathematical concepts through a few simple pedagogical examples. In the Results section, we apply these concepts to cases of mRNA translation, microRNAs (miRNAs), and m6A methylation. In the Discussion, we provide additional comments, a summary diagram, and perspectives.

Methods and preliminary results

Infinite finitely generated groups fp and free groups Fr

TATA box

We start with a simple example of an infinite, finitely generated group taken from the context of gene promoters. The DNA sequence located in the core promoter region of many eukaryotic genes is the Goldberg–Hogness sequence, also known as the TATA box. This noncoding sequence consists of repeated T and A base pairs. The TATA box serves as the binding site for the TATA-binding protein and other transcription factors in some eukaryotic genes. Its consensus sequence takes the form rel = TATAAAA. Variations in this consensus sequence, resulting from genetic polymorphism, can lead to diseases such as Gilbert’s syndrome and immune suppression (https://en.wikipedia.org/wiki/TATA_box ).

In our methodology, we define the group fp = 〈A,T|rel〉, which contains an infinite number of elements. There are numerous ways to investigate this group; we opted for a specific one: calculating the number of conjugacy classes of subgroups of index d of fp, for d = 1, 2, 3, … (a sequence we refer to as the card seq of fp). The card seq of fp for the selected TATA sequence is [1,1,2,3,2,8,7,10,18,28,…]. Interestingly, the group H3 = 〈A, T|A² = T³〉 has the same card seq (at least up to the highest index we could reach in the calculations). The group H3, as defined, is the fundamental group of the trefoil knot complement; its quotient by the central element A² = T³ is the modular group PSL(2,Z), the projective special linear group of (2 × 2) integer matrices with determinant 1. Thus, fp is close to H3 in the sense that both groups have the same card seq, but one can easily verify that fp and H3 are not isomorphic. According to Planat et al.,23 the Hecke groups Hq = 〈A, T|A² = T^q〉, with q = 3 or 4, have a card seq corresponding to healthy TATA box sequences. An fp group for a TATA box whose card seq resembles that of a Hecke group with q ∉ {3, 4}, or that of groups slightly different from H3 and H4, may signal Gilbert’s syndrome.
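For small indices, the card seq can be computed without special software, using the standard correspondence between conjugacy classes of index-d subgroups and transitive actions of the group on d points, counted up to relabelling of the points. The brute-force sketch below enumerates every assignment of permutations of {0, …, d − 1} to the generators, keeps those that satisfy the relation and act transitively, and counts them up to simultaneous conjugation. It is an illustration only, not the authors' implementation: its cost grows like (d!)^k for k generators, so only very small d are reachable. The convention that a lowercase letter denotes the inverse of a generator is our own, introduced so that H3 can be written as the relator A A T⁻¹ T⁻¹ T⁻¹.

```python
from itertools import permutations, product


def compose(p, q):
    """Permutation 'apply p, then q' on the points 0..d-1."""
    return tuple(q[i] for i in p)


def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)


def evaluate(word, assignment, d):
    """Image of a word; uppercase = generator, lowercase = its inverse (our convention)."""
    result = tuple(range(d))
    for letter in word:
        perm = assignment[letter.upper()]
        result = compose(result, perm if letter.isupper() else inverse(perm))
    return result


def is_transitive(perms, d):
    """True if the permutations generate a group acting transitively on 0..d-1."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for p in perms:
            if p[i] not in seen:
                seen.add(p[i])
                stack.append(p[i])
    return len(seen) == d


def relabel(p, s):
    """Conjugate p by the bijection s (point i is renamed s[i])."""
    q = [0] * len(p)
    for i in range(len(p)):
        q[s[i]] = s[p[i]]
    return tuple(q)


def card_seq(generators, relator, max_index):
    """Conjugacy classes of subgroups of index d = 1..max_index in <generators | relator>."""
    seq = []
    for d in range(1, max_index + 1):
        sym = list(permutations(range(d)))
        identity = tuple(range(d))
        classes = set()
        for perms in product(sym, repeat=len(generators)):
            if evaluate(relator, dict(zip(generators, perms)), d) != identity:
                continue
            if not is_transitive(perms, d):
                continue
            # Smallest relabelled tuple = canonical representative of the conjugacy orbit.
            classes.add(min(tuple(relabel(p, s) for p in perms) for s in sym))
        seq.append(len(classes))
    return seq


if __name__ == "__main__":
    print(card_seq("AT", "TATAAAA", 3))  # TATA box: <A, T | TATAAAA>
    print(card_seq("AT", "AAttt", 3))    # H3: <A, T | A^2 = T^3>
```

Both calls should print [1, 1, 2], matching the first three terms quoted above. Reaching index 10 this way is impractical. Dedicated low-index subgroup algorithms, such as those in computer algebra systems like GAP or Magma, are the practical route for longer card seqs.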

Polyadenylation signals

For our second example, we select a sequence from the context of eukaryotic polyadenylation (https://en.wikipedia.org/wiki/Polyadenylation ). Polyadenylation is the addition of a poly(A) tail to an RNA transcript, usually an mRNA. A consensus poly(A) signal takes the form rel1 = AAUAAA, which corresponds to the two-generator group fp = 〈A, U|rel1〉. The card seq of this group is [1,1,1,1,1,1,1,1,1,1,…], i.e., there is a single conjugacy class of subgroups for each index. The free group F1 = 〈A, U|AU〉 of rank 1 has the same card seq. In fact, because U occurs only once in rel1, it can be eliminated (U = A⁻⁵), so this fp group is isomorphic to F1. Another consensus poly(A) sequence takes the form rel2 = UGUAA, which corresponds to the three-generator group fp = 〈A, U, G|rel2〉. The card seq of this group is [1,3,7,26,97,624,4163,…], which is that of the free group F2 = 〈A, U, G|AUG〉 of rank 2. Here too, G occurs only once in rel2 and can be eliminated (G = U⁻¹A⁻²U⁻¹), so the two groups are isomorphic. From our perspective, DNA/RNA sequences that lead to fp groups closely resembling a free group are considered healthy sequences.19,21,23 The standard poly(A) sequences mentioned earlier play a regulatory role in the production of mature mRNA. Sequences that generate an fp group diverging from a free group Fr may be indicative of a disease.
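Whether a one-relator group such as 〈A, U|rel1〉 is actually free can sometimes be settled by a Tietze transformation. If a generator x occurs exactly once in a single relator u x v, then x = u⁻¹v⁻¹, and the group is free on the remaining generators. The sketch below applies this check to positive relators (words without inverse letters). As an assumption for illustration, it writes a lowercase letter for the inverse of a generator. The function names are hypothetical.

```python
def invert(word):
    """Formal inverse of a word: reverse it and swap case (lowercase = inverse generator)."""
    return word[::-1].swapcase()


def eliminable_generators(generators, relator):
    """Generators occurring exactly once in a positive single relator."""
    return [g for g in generators if relator.count(g) == 1]


def solve_for(relator, x):
    """If relator = u x v = 1, return the word for x = u^-1 v^-1."""
    i = relator.index(x)
    return invert(relator[:i]) + invert(relator[i + 1:])


if __name__ == "__main__":
    for gens, rel in [("AU", "AAUAAA"), ("AUG", "UGUAA"), ("AT", "TATAAAA")]:
        free = eliminable_generators(gens, rel)
        if free:
            x = free[0]
            print(rel, "->", x, "=", solve_for(rel, x), "; free of rank", len(gens) - 1)
        else:
            print(rel, "-> no generator occurs exactly once (test inconclusive)")
```

For rel1 and rel2, the check returns U = A⁻⁵ (printed as `aaaaa`) and G = U⁻¹A⁻²U⁻¹ (printed as `uaau`). For the TATA box relator TATAAAA, no generator occurs only once. The check is only a sufficient condition, so in that case it is inconclusive.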

Aperiodic sequences, their attached groups fp and free groups

In this subsection, we explain how a group fp, with a card seq identified to be close to that of a free group Fr, can be linked to an aperiodic sequence and to the profinite completion F̂r. We introduced the concept of aperiodic groups and sequences in our earlier papers.21,23 Consider the motif rel = TTTATTA, which serves as a consensus sequence for the transcription factor of the DBX gene in Drosophila melanogaster (fruit fly). This gene is involved in neuronal specification and differentiation. The group fp = 〈A, T|rel〉 has the same card seq as the free group F1 of rank 1. Furthermore, we split rel into two segments, rel = relA relT, and apply the substitution maps A → relA = TTTA, T → relT = TTA. This generates the substitution sequence SDBX = A, T, AT, TTTATTA, TTATTATTATTTATTATTATTTA, …. One can check that all finitely generated groups fp(l) defined by the relations AT, TTTATTA, TTATTATTATTTATTATTATTTA, …, respectively, have the card seq of F1.
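The substitution sequence SDBX can be generated mechanically. Start from the word AT, whose image is rel = relA relT. At each step, every A is replaced by TTTA and every T by TTA. Below is a minimal sketch; the function names are ours.

```python
def substitute(word, rules):
    """Apply a letter-to-word substitution to every letter of a word."""
    return "".join(rules[letter] for letter in word)


def substitution_sequence(start, rules, steps):
    """[start, s(start), s(s(start)), ...] with `steps` applications of the substitution."""
    words = [start]
    for _ in range(steps):
        words.append(substitute(words[-1], rules))
    return words


if __name__ == "__main__":
    dbx_rules = {"A": "TTTA", "T": "TTA"}  # A -> relA, T -> relT
    for word in substitution_sequence("AT", dbx_rules, 2):
        print(word)
```

The three printed words are AT, TTTATTA, and TTATTATTATTTATTATTATTTA, i.e., the relations of the groups fp(l) listed above. Passing a different rule dictionary gives the Kozak and mir-122 substitutions discussed later.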

Following Planat et al.,23 a substitution rule is considered aperiodic if it satisfies two conditions. (1) The substitution matrix M must be primitive: M is non-negative and M^k is strictly positive (all entries > 0) for some integer k ≥ 1, which we denote M^k ≫ 0 (a primitive matrix is, in particular, irreducible). (2) Its Perron–Frobenius eigenvalue λPF must be irrational. Recall that the Perron–Frobenius eigenvector of an irreducible non-negative matrix is, up to scaling, the only eigenvector whose entries are all positive. The aforementioned sequence has the substitution matrix:

$$M = \begin{pmatrix} 1 & 3 \\ 1 & 2 \end{pmatrix}.$$

One can verify that M is primitive, as M² ≫ 0, and that $\lambda_{PF} = (3 + \sqrt{13})/2$. Conditions (1) and (2) are satisfied, implying that the substitution SDBX is aperiodic. Of note, numerous other genes have transcription factors with a motif rel generating an aperiodic sequence.21
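Conditions (1) and (2) can be checked numerically. The sketch below builds the substitution matrix with the convention M[i][j] = number of occurrences of letter i in the image of letter j. For DBX this is the transpose of the matrix printed above, but transposition changes neither primitivity nor eigenvalues. The sketch then tests primitivity with Boolean matrix powers up to Wielandt's bound (n − 1)² + 1 and estimates λPF by power iteration. Irrationality is decided from the fact that λPF is a root of a monic integer polynomial (the characteristic polynomial). By the rational root theorem, λPF is therefore rational only if it is an integer k with det(M − kI) = 0. The function names and the tolerance are our own choices, not part of the authors' method.

```python
def substitution_matrix(rules, alphabet):
    """M[i][j] = occurrences of alphabet[i] in the image of alphabet[j]."""
    return [[rules[b].count(a) for b in alphabet] for a in alphabet]


def primitivity_exponent(M):
    """Smallest k with M^k > 0 entrywise, or None if M is not primitive."""
    n = len(M)
    B = [[x > 0 for x in row] for row in M]
    P = B
    for k in range(1, (n - 1) ** 2 + 2):  # Wielandt: k <= (n - 1)^2 + 1 suffices
        if all(all(row) for row in P):
            return k
        P = [[any(P[i][l] and B[l][j] for l in range(n)) for j in range(n)]
             for i in range(n)]
    return None


def perron_frobenius(M, iterations=500):
    """Estimate the Perron-Frobenius eigenvalue of a primitive matrix by power iteration."""
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        lam = s / sum(v)
        v = [x / s for x in w]
    return lam


def det(A):
    """Exact integer determinant by cofactor expansion (fine for small matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def pf_is_irrational(M, tol=1e-9):
    """lambda_PF is rational only if it is an integer k with det(M - kI) = 0."""
    lam = perron_frobenius(M)
    k = round(lam)
    n = len(M)
    shifted = [[M[i][j] - (k if i == j else 0) for j in range(n)] for i in range(n)]
    return not (abs(lam - k) < tol and det(shifted) == 0)


def is_aperiodic(rules, alphabet):
    M = substitution_matrix(rules, alphabet)
    return primitivity_exponent(M) is not None and pf_is_irrational(M)


if __name__ == "__main__":
    dbx = {"A": "TTTA", "T": "TTA"}
    M = substitution_matrix(dbx, "AT")
    print(M, primitivity_exponent(M), perron_frobenius(M), is_aperiodic(dbx, "AT"))
```

For the DBX rules, the matrix is already strictly positive (exponent 1), and the estimate agrees with (3 + √13)/2 ≈ 3.3028. A substitution such as A → AB, B → AB, whose fixed point ABAB… is periodic, fails condition (2) because λPF = 2.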

Aperiodic sequences and the profinite groups F^r

This subsection can be skipped without affecting the comprehension of the rest of the paper. It offers a tentative answer to the question of why the aforementioned groups fp(l) produce the same card seq as the free group Fr: the profinite completion of all groups fp(l) is the profinite group F̂r (F̂1 in the DBX example). Finitely generated groups with isomorphic profinite completions have the same finite quotients and hence the same card seq. With this observation, we align the aperiodicity of sequences with profinite groups. Profinite groups play a central role in Grothendieck's work in algebraic geometry.22 Here, we describe the necessary ingredients for the nonspecialist, focusing first on F̂1 and then on F̂2, and their relevance to our present work.

A group G can be considered a topological group by endowing it with the discrete topology, in which the elements of G are isolated points and every subset is open. A profinite group is a topological group that, in a certain sense, is assembled from a system of finite groups and group homomorphisms between them. Given a group G, its profinite completion is defined as the inverse limit $\hat{G} = \varprojlim_N G/N$ of the finite groups G/N, where N runs through the normal subgroups of G of finite index. (A normal subgroup is a subgroup that remains invariant under conjugation by elements of the group.) Each finite quotient group corresponds to a normal subgroup N of G, and the profinite completion Ĝ can be perceived as containing an analog of each of these normal subgroups. A profinite group is compact and Hausdorff. It is residually finite, meaning that for any nonidentity element g in Ĝ there exists a finite quotient of Ĝ in which g is not the identity. It is also totally disconnected, meaning that the only connected subsets of Ĝ are singletons (sets containing only one element). In general, Ĝ is difficult to describe explicitly. However, F̂1 and F̂2 are not too complex to handle.

We begin with F̂1. The free group F1 on a single generator, say a, has no relations: its elements are all finite words formed from a and its inverse, so it is the infinite cyclic group Z = {1, a, a⁻¹, a², a⁻², a³, a⁻³, …}. Its finite quotients are the cyclic groups Z/nZ, so its profinite completion is $\widehat{F}_1 = \widehat{\mathbb{Z}} = \varprojlim_n \mathbb{Z}/n\mathbb{Z} \cong \prod_p \mathbb{Z}_p$. This is the product, over all primes p, of the additive groups of p-adic integers Zp. The p-adic integers are a special class of numbers used in number theory and algebraic geometry.

We now briefly discuss F̂2, a topic first described by Grothendieck.22 The subject is complex and connected to the so-called Belyi theorem. This fundamental result links algebraic curves defined over the algebraic closure Q̄ of the rationals with certain rational functions called Belyi functions. A smooth projective complex algebraic curve can be represented as a covering of the Riemann sphere (the complex projective line P1(C)) branched only over three points (usually taken as 0, 1, and ∞) if and only if the curve can be defined over a number field, i.e., a finite extension of the field of rational numbers Q.

In other words, the Belyi theorem implies that an algebraic curve defined over a number field can be mapped to the Riemann sphere in such a way that the ramification (branching) is restricted to just three points. The rational functions that provide these branched coverings are known as Belyi functions. The significance of the Belyi theorem lies in the fact that it provides a method to study algebraic curves defined over number fields by analyzing their ramified coverings and the associated ‘dessins d’enfants’, which are combinatorial objects encoding the ramification data. Specifically, we have the crucial result that:

$$\hat{\pi}_1\big(\mathbb{P}^1(\mathbb{C}) \setminus \{0, 1, \infty\}\big) \cong \widehat{F}_2,$$
i.e., the so-called étale fundamental group of the projective line punctured at three points is the profinite group F̂2.

SL2(C) representations of groups fp and a Groebner basis G

While the previous subsection on profinite groups showcases the importance of algebraic geometry in the context of DNA/RNA sequences, it remains somewhat abstract. A more concrete route is to consider the representations of an fp group over the space-time-spin group SL2(C), as we did in previous studies.18,19,21 Representations of fp in SL2(C) are homomorphisms ρ: fp → SL2(C), with character κρ(g) = tr(ρ(g)), g ∈ fp, where tr(ρ(g)) denotes the trace of the matrix ρ(g). The characters determine an algebraic set, obtained by taking the quotient of the set of representations ρ by the group SL2(C), which acts on representations by conjugation.24,25 In these papers, we showed that the character variety of fp is the algebraic set defined by a family X of multivariate polynomials. A particular basis related to X is the Groebner basis G(X), whose factors define hypersurfaces.
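To make the trace coordinates concrete, consider a two-generator group with generators A and B. A standard choice of coordinates on the SL2(C) character variety is x = tr ρ(A), y = tr ρ(B), z = tr ρ(AB), and every other character is a polynomial in x, y, z. The classical Fricke identity tr ρ(ABA⁻¹B⁻¹) = x² + y² + z² − xyz − 2 is the prototype of the cubic surfaces met below. The sketch checks this identity numerically on random SL2(C) matrices. It is a sanity check of the trace calculus under these assumed coordinates, not a reproduction of the Groebner-basis computations in this paper.

```python
import random


def matmul(P, Q):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[P[0][0] * Q[0][0] + P[0][1] * Q[1][0], P[0][0] * Q[0][1] + P[0][1] * Q[1][1]],
            [P[1][0] * Q[0][0] + P[1][1] * Q[1][0], P[1][0] * Q[0][1] + P[1][1] * Q[1][1]]]


def inv_sl2(P):
    """Inverse of a determinant-1 matrix: [[d, -b], [-c, a]]."""
    return [[P[1][1], -P[0][1]], [-P[1][0], P[0][0]]]


def tr(P):
    return P[0][0] + P[1][1]


def random_sl2(rng):
    """Random complex 2 x 2 matrix with determinant 1 (up to rounding)."""
    def rand_complex():
        return complex(rng.uniform(-2, 2), rng.uniform(-2, 2))
    a = rand_complex()
    while abs(a) < 0.5:  # keep away from a = 0 so that the last entry stays moderate
        a = rand_complex()
    b, c = rand_complex(), rand_complex()
    return [[a, b], [c, (1 + b * c) / a]]  # a*d - b*c = 1


def fricke_gap(A, B):
    """tr(A B A^-1 B^-1) minus the Fricke polynomial x^2 + y^2 + z^2 - xyz - 2."""
    x, y, z = tr(A), tr(B), tr(matmul(A, B))
    commutator = matmul(matmul(A, B), matmul(inv_sl2(A), inv_sl2(B)))
    return tr(commutator) - (x ** 2 + y ** 2 + z ** 2 - x * y * z - 2)


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(3):
        A, B = random_sl2(rng), random_sl2(rng)
        print(abs(fricke_gap(A, B)))  # should be tiny: floating-point rounding only
```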

Our earlier paper focused on a possible algebraic approach to topological quantum computing.18 In two subsequent papers,19,21 we investigated SL2(C) representations of short DNA/RNA sequences (e.g., the consensus sequence of a transcription factor or the seed of a miRNA) and related them to potential diseases. For a two-generator group fp, the factors of G(X) are polynomials in three variables x, y, z, i.e., surfaces in three-dimensional space. In general, these surfaces can be classified into five categories by mapping them to a rational surface.19 Frequently encountered surfaces are degree p Del Pezzo surfaces, where 1 ≤ p ≤ 9. A rational surface may be nonsingular, almost nonsingular, have only isolated singularities, or be singular. Almost nonsingular surfaces are key in our context. A simple singularity is referred to as an A-D-E singularity and must be of type An (n ≥ 1), Dn (n ≥ 4), E6, E7, or E8. The A-D-E type is mirrored in our notation: for instance, S(lA1, mA2, nA3, …) denotes a surface containing l singularities of type A1, m of type A2, n of type A3, etc. A typical example is the Cayley cubic encountered in our previous papers, defined as S(4A1) = xyz + x² + y² + z² − 4.19
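The notation S(4A1) for the Cayley cubic can be checked directly. Its singular points solve f = 0 and ∇f = 0, with f = xyz + x² + y² + z² − 4. The gradient equations yz = −2x, xz = −2y, xy = −2z force x² = y² = z² = t with t² = 4t. Apart from the origin, which is not on the surface, the only candidates are therefore the points (±2, ±2, ±2) with xyz = −8. The sketch below recovers these four points by a small integer search. It then checks that the Hessian is nondegenerate at each point, which makes each one an ordinary double point, i.e., an A1 singularity.

```python
from itertools import product


def f(x, y, z):
    """Cayley cubic S(4A1) in the form used in the text."""
    return x * y * z + x ** 2 + y ** 2 + z ** 2 - 4


def gradient(x, y, z):
    return (y * z + 2 * x, x * z + 2 * y, x * y + 2 * z)


def hessian_det(x, y, z):
    """Determinant of the Hessian [[2, z, y], [z, 2, x], [y, x, 2]]."""
    return 2 * (4 - x * x) - z * (2 * z - x * y) + y * (z * x - 2 * y)


# Integer search in a small box; the algebra above shows nothing is missed.
singular_points = [p for p in product(range(-3, 4), repeat=3)
                   if f(*p) == 0 and gradient(*p) == (0, 0, 0)]

if __name__ == "__main__":
    for p in singular_points:
        print(p, "Hessian determinant:", hessian_det(*p))
```

The search returns (−2, −2, −2), (−2, 2, 2), (2, −2, 2), and (2, 2, −2), each with a nonzero Hessian determinant, hence four A1 points.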

For a three-generator group fp, the factors of G(X) are hypersurfaces Sa,b,c,d(x,y,z) in seven variables. Some of them belong to the Fricke family,19 which is associated with the four-punctured sphere. For a chosen set of parameters a, b, c, d, such a hypersurface reduces to an ordinary surface in x, y, z. For a four-generator group fp, the factors of G(X) are hypersurfaces in 14 variables. For selected choices of eight parameters, they contain four copies of surfaces of the form S(x,y,z), S(x,u,v), S(y,u,v), and S(z,v,w).

Groebner basis of the TATA box

The Groebner basis for the character variety associated with the fp group with relation rel = TATAAAA (the TATA box discussed above) is:

GTATA = (z4xy2xyz + x2 + y2 + yz − 3z2 + x − 2) (x2zxyxz + yz) S(A2)S(A4) (x3z2 − 3x + 2),
where S(A2) = x²y − z³ − xz − y + 3z and S(A4) = xz² − x² − yz − x + 2 are degree 3 Del Pezzo surfaces. The first two factors of GTATA are a degree 2 Del Pezzo surface (Fig. 1a) and a rational scroll. Both surfaces are singular. The next two factors are the surfaces with simple singularities of type A2 and A4, respectively. The last term is a curve (not a surface).

Fig. 1  Two types of Del Pezzo surfaces.

(a) Degree 2 Del Pezzo surface within GTATA. (b) Degree 3 Del Pezzo surface S(A1) within Grel1.

Groebner basis for polyadenylation signals

For the first polyadenylation signal considered in the subsection on infinite finitely generated groups, the relation of the fp group is rel1 = AAUAAA. The corresponding Groebner basis is:

Grel1 = 3 rational scrolls × P2 × S(4A1) × S(A1) × curve.

The Groebner basis Grel1 contains three rational scrolls, a surface birationally equivalent to the projective plane P2, the Cayley cubic S(4A1), the degree 3 Del Pezzo surface S(A1) = x2yxz2xz + yz + xy (Fig. 1b) and a curve.

For the second polyadenylation signal considered above in the subsection on groups fp and Fr, the relation of the fp group is rel2 = UGUAA. The factors of G(X) are hypersurfaces Sa,b,c,d(x,y,z) in seven variables. By fixing the parameters, e.g., S0,0,0,0(x,y,z) or S1,1,1,1(x,y,z), we obtain ordinary surfaces in x, y, z. These were found to be degree 3 Del Pezzo surfaces with simple singularities of the form S(lA2) (with l = 1, 2, or 3), quadrics, or curves.

Groebner basis of the transcription factor of DBX gene

For the DBX gene studied in the subsection on aperiodic sequences, the Groebner basis takes the form GDBX = scroll × P2 × S(A4) × S(A2) × S(4A1) × curve. The factors scroll = y2z − xyyz + xz and P2 = z4x2y + xz − 4z2 + y + 2 are singular. The other factors are degree 3 Del Pezzo (DP3) surfaces with isolated singularities, namely S(A4) = yz2y2xzy2, S(A2) = z3xy2 + yz + − 3z, and the Cayley cubic S(4A1), together with curve = y3z2 − 3y + 2.

Further results

In this section, we describe additional results related to mRNA metabolism and miRNA.

Algebraic geometry of mRNA translation

Shine-Dalgarno box

Ribosomal RNA is a type of noncoding RNA and the main component of a macromolecular machine, the ribosome, whose role is to carry out mRNA translation. The initiation of translation requires the recognition of appropriate sequences on the mRNA by the ribosome. A major factor in this recognition is an mRNA–ribosomal RNA interaction first proposed by Shine and Dalgarno.26 They proposed that ribosomal RNA nucleotides recognize the complementary purine-rich sequence rel3 = AGGAGGU. This sequence is found approximately eight bases upstream of the start codon AUG in a number of mRNAs of viruses that infect Escherichia coli.
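The positional rule (a Shine-Dalgarno core a few bases upstream of AUG) can be turned into a simple scan. The sketch below looks for exact copies of rel3 = AGGAGGU and reports the first downstream AUG within a maximum gap. Several choices here are illustrative assumptions: the exact-match requirement, the gap definition (bases between the end of the motif and the A of AUG), the 15-base cutoff, and the toy sequence. Real Shine-Dalgarno sites are often only partial matches to this core.

```python
import re

SD_CORE = "AGGAGGU"  # rel3


def sd_to_start_spacings(mrna, max_gap=15):
    """(motif_start, start_codon_index, gap) for each SD core followed by AUG within max_gap bases."""
    hits = []
    for match in re.finditer(SD_CORE, mrna):
        start_codon = mrna.find("AUG", match.end())
        if start_codon != -1 and start_codon - match.end() <= max_gap:
            hits.append((match.start(), start_codon, start_codon - match.end()))
    return hits


if __name__ == "__main__":
    toy = "CC" + "AGGAGGU" + "ACUCUACA" + "AUG" + "GCU"  # hypothetical sequence
    print(sd_to_start_spacings(toy))  # [(2, 17, 8)]
```

In the toy sequence, the core starts at position 2, and the start codon lies eight bases downstream of its end.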

Let us study the group fp = 〈A, G, U|rel3〉. The card seq of fp is found to be the same as that of the free group F2. The SL2(C) character variety is a scheme X whose Groebner basis G(X) comprises hypersurfaces Sa,b,c,d(x,y,z) in seven variables. By fixing the parameters, one obtains surfaces such as S0,0,0,0(x,y,z) and S1,1,1,1(x,y,z), as in the subsection on SL2(C) representations of groups fp. We find degree 3 Del Pezzo surfaces with isolated singularities, as well as quadrics and curves. The surfaces of type A1 are x²y + yz² + 4xz + 4y and x²y + yz² + x + z² + 6xz + 5y − 6z − 7. The others are S(A2) = xyz + 2x² + z² + 4 and S(A4) = xyz + 3x² + z² − 5z.

Kozak consensus sequence

The Kozak consensus sequence is a nucleotide motif that functions as the protein translation initiation site in most eukaryotic mRNA transcripts.27 The small (40S) subunit of the eukaryotic ribosome binds initially at the capped 5′ end of the mRNA. It then migrates along the mRNA, stopping at the first AUG codon in a favorable context for initiating translation. In eukaryotes, the Kozak sequence ensures that a protein is correctly translated from the genetic message, mediating ribosome assembly and translation initiation. We take the consensus rel4 = ACCAUGGC from the sequence logo of the most conserved bases around the initiation codon AUG in human mRNAs, shown in the first figure of the Wikipedia entry on the Kozak consensus sequence (https://en.wikipedia.org/wiki/Kozak ).

Let us study the group fp = 〈A, C, G, U|rel4〉. The card seq of fp is found to be the same as that of the free group F3 of rank 3. This group can be linked to an aperiodic sequence by following the steps given in the subsection on aperiodic sequences. We split rel4 into four segments, rel4 = relC relA relU relG, and apply the substitution maps C → relC = A, A → relA = CCAUG, U → relU = G, G → relG = C. This generates the substitution sequence SKozak = C, A, U, G, CAUG, ACCAUGGC, CCAUGAACCAUGGCCA, ….

One can check that all finitely generated groups fp(l) defined by the relations CAUG, ACCAUGGC, CCAUGAACCAUGGCCA, …, respectively, have the card seq of F3. The aforementioned sequence has the substitution matrix (rows and columns ordered C, A, U, G):

$$M = \begin{pmatrix} 0 & 2 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}.$$

One can verify that M is primitive, as M⁴ ≫ 0. Its characteristic polynomial is x⁴ − x³ − 2x² − x − 1 = (x + 1)(x³ − 2x² − 1), and λPF ≈ 2.2055694 is the only real (and irrational) root of x³ − 2x² − 1 = 0. Conditions (1) and (2) for aperiodic sequences are satisfied, implying that the substitution SKozak is aperiodic. Rittaud discussed the connection of the latter Perron–Frobenius eigenvalue to random Fibonacci sequences.28
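The characteristic polynomial of the Kozak substitution matrix can be computed exactly with the Faddeev-LeVerrier recursion. Synthetic division then exposes the factor x + 1 and the cubic whose real root is λPF. The rational root test on the cubic, whose only candidates are ±1, confirms that λPF is irrational. Below is a self-contained sketch with exact rational arithmetic; the helper names are ours.

```python
from fractions import Fraction


def charpoly(M):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(xI - M), via Faddeev-LeVerrier."""
    n = len(M)
    A = [[Fraction(v) for v in row] for row in M]
    coeffs = [Fraction(1)]
    Mk = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I, then c_{n-k} = -tr(A M_k) / k
        Mk = [[sum(A[i][l] * Mk[l][j] for l in range(n)) + (coeffs[-1] if i == j else 0)
               for j in range(n)] for i in range(n)]
        trace = sum(sum(A[i][l] * Mk[l][i] for l in range(n)) for i in range(n))
        coeffs.append(-trace / k)
    return [int(c) for c in coeffs]


def divide_by_linear(coeffs, r):
    """Synthetic division of a polynomial (highest degree first) by (x - r)."""
    out = [coeffs[0]]
    for c in coeffs[1:]:
        out.append(c + r * out[-1])
    return out[:-1], out[-1]  # quotient, remainder


def evaluate(coeffs, x):
    """Horner evaluation, highest degree first."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def integer_roots(coeffs):
    """Integer roots of a monic integer polynomial with nonzero constant term.
    By the rational root theorem, these are its only rational roots."""
    c0 = abs(coeffs[-1])
    divisors = [d for d in range(1, c0 + 1) if c0 % d == 0]
    return [r for d in divisors for r in (d, -d) if evaluate(coeffs, r) == 0]


def root_by_bisection(coeffs, lo, hi, steps=100):
    """Bisection on [lo, hi], assuming the polynomial changes sign there."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if (evaluate(coeffs, lo) < 0) == (evaluate(coeffs, mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    M_kozak = [[0, 2, 0, 1], [1, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]]
    p = charpoly(M_kozak)
    cubic, remainder = divide_by_linear(p, -1)  # divide by (x + 1)
    print(p, cubic, remainder, integer_roots(cubic), root_by_bisection(cubic, 2, 3))
```

The same functions applied to the mir-122 matrix give λ³ − 2λ² − 2λ + 2, which has no integer roots and whose largest root lies between 2 and 3.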

Mutation of the purine at position −3 relative to the AUG codon is known to be associated with diseases, including a type of thalassemia caused by defective translation initiation of alpha-globin.27 In our approach, the mutation from rel4 to rel4′ = CCCAUGGC leads to a substitution matrix M′ that is no longer primitive, so the sequence loses its aperiodicity. However, the card seq of the associated fp group is still that of the free group F3. We could not find any other substitution for the sequence rel4′ that restores aperiodicity.

Algebraic geometry of miRNAs

miRNAs are small, single-stranded, noncoding RNA molecules of approximately 22 nucleotides. miRNAs play crucial roles in RNA silencing and post-transcriptional regulation of gene expression by specifically targeting certain mRNAs for degradation or translational repression (https://en.wikipedia.org/wiki/MicroRNA ).29 miRNA genes are typically transcribed by RNA polymerase II (Pol II). Pol II binds to a promoter located near the DNA sequence encoding what will become the hairpin loop of a precursor miRNA (pre-miRNA). Pre-miRNAs are approximately 70 nucleotides long and fold into imperfect stem-loop structures. The mature miRNA duplex comprises two strands (−5p and −3p). A single strand is loaded into the RNA-induced silencing complex, where it serves as a guide for recognizing complementary mRNA.30,31 For the miRNA sequences, we use the miRBase database (https://www.mirbase.org/ ).32,33 It should be emphasized that a given miRNA may have hundreds of different mRNA targets and that a single target may be regulated by multiple miRNAs. For previous discussions of how to define an fp group from the seed of a miRNA, the reader may consult a recent review.19 Below, we focus on other examples.

miRNA hsa-mir-122

mir-122 is a tissue-specific miRNA that is highly expressed in the liver.34 It is involved in cholesterol accumulation and fatty acid metabolism, and it has a leading role in controlling the hepatitis C virus.35,36 The seed region of mir-122-5p is seed0 = GGAGUGU. The corresponding group fp = 〈A, G, U|seed0〉 has the card seq of the free group F2. Let us first check whether the seed sequence is aperiodic. We split seed0 into three segments, seed0 = seedA seedG seedU, and apply the substitution maps A → seedA = GG, G → seedG = AGU, U → seedU = GU. One can check that the finitely generated groups fp(l) with relations GGAGUGU, AGUAGUGGAGUGUAGUGU, … possess the card seq of the free group F2. Following the method described in the subsection on aperiodic sequences and their attached groups, one obtains the (primitive) substitution matrix (rows and columns ordered A, G, U):

$$M = \begin{pmatrix} 0 & 1 & 0 \\ 2 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix},$$
whose characteristic polynomial λ³ − 2λ² − 2λ + 2 has three real roots. The largest one is the (irrational) Perron–Frobenius eigenvalue λPF ≈ 2.481194. One concludes that the sequence seed0 is aperiodic.

Let us now look at the Groebner basis for the SL2(C) representation of fp with the method described above. One obtains:

Gmir-122−5p(0,0,0,0) = 8yz(2 − z2) and Gmir-122−5p(1,1,1,1) = −4 z2(x − z2 +z + 1) (y + z3z2 − 2z)

One can check that, for all values of the parameters, Ga,b,c,d(x, y, z) contains only factors that are curves, not surfaces.

miRNA hsa-mir-503

The slowest evolving miRNA gene in the human species (hsa) is hsa-mir-503 (https://www.mirbase.org/ ). It regulates gene expression in various pathological processes, including carcinogenesis, angiogenesis, tissue fibrosis, and oxidative stress.37 The seed region of mir-503-5p is seed1 = AGCAGCGG. The corresponding group fp = 〈A, C, G|seed1〉 has the card seq of the free group F2. For this group, the Groebner basis with parameters (a,b,c,d) = (0,0,0,0) is quite simple: Gmir−503−5p(0,0,0,0) = S(4A1)(x,y,z), the Cayley cubic already mentioned. For (a,b,c,d) = (1,1,0,0), Gmir−503−5p(1,1,0,0) = −3xyzκ3(x,y,z), where κ3(x,y,z) is the Fricke surface described by Planat et al.38 For (a,b,c,d) = (1,1,1,1), there are several more polynomials, one of which defines the Fricke surface xyz + x2+ y2+z2 2x − y – 2 = 0. The seed region considered for mir-503-3p is GGGUAUU. In this case, the surfaces in the Groebner basis are very simple, and none of them contains simple singularities.

miRNA hsa-mir-146a

mir-146 is primarily involved in the regulation of inflammation and other processes of the innate immune system, and it has a role in neuropathogenesis. The seed region considered for hsa-mir-146a-5p is seed2 = GAGAAC (https://www.mirbase.org/ ). Again, the corresponding group fp = 〈A, C, G|seed2〉 has the card seq of the free group F2. The Groebner basis with parameters (a,b,c,d) = (0,0,0,0) is Ghsa-146a−5p(0,0,0,0) = (xz + y + 2) (yz2 + 2)2 (x2 + z2 − 2y − 4) S(3A2), where S(3A2) = z3xy − 2yz − 2x − 4z. The Groebner basis with parameters (a,b,c,d) = (1,1,1,1) is of the form Ghsa-146a−5p(1,1,1,1) = DP4 × f(2A2) × quadric × curves, where DP4 is a degree 4 Del Pezzo surface.

miRNAs and disease

As described previously,19 a potential disease is associated with fp groups that fail at least one of three requirements: (1) the card seq of fp should be that of a free group Fr; (2) the generating sequence should be aperiodic; and (3) the Groebner basis of the SL2(C) character variety of fp should contain no surface with isolated singularities, even when the fp group has the card seq of a free group.19 Following these criteria, the sequence hsa-mir-122-5p is healthy. The sequences hsa-mir-503-5p and hsa-mir-146a-5p are not, because criterion (3) is not satisfied. Additional examples can be found in our previous study.19

Besides surfaces with isolated singularities, the Groebner basis may contain surfaces whose singularities are not simple. The DP4 surface in Ghsa-146a−5p(1,1,1,1) is an example of such a singular surface. Further mathematical evaluation is required to investigate these surfaces;39 we do not consider them here.

Discussion

Figure 2 summarizes our key results. The input is a short DNA/RNA sequence rel: a consensus sequence of a transcription factor, the seed of a miRNA, or a relevant sequence in mRNA recognition and processing. From rel, we construct a finitely generated group fp. The subgroup structure of this group (its card seq) is computed as described in the subsection on infinite finitely generated groups fp. If the card seq of fp matches that of the free group Fr (of rank r = nt − 1, where nt is the number of distinct nucleotides), we proceed to path four. Otherwise, a potential disease could be in sight (path three). After reaching path four, the next step involves checking the aperiodicity of rel and the corresponding fp group, as described in the subsection on aperiodic sequences and their attached groups fp. The final step is to examine the presence (or absence) of isolated singularities in the Groebner basis G of the SL2(C) character variety associated with fp, as outlined in the subsection on SL2(C) representations of groups fp. For a healthy sequence, the path concludes at six, while a potential disease may be indicated if the path ends at three, seven, or eight.

Fig. 2  Diagram of the main results discussed in the text.
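The decision procedure of Figure 2 can be sketched in a few lines of Python. This is a simplified reading of the diagram, not the software used in this study: the three boolean inputs (hypothetical names `card_seq_is_free`, `aperiodic`, `has_isolated_singularity`) stand for the outcomes of criteria (1) to (3), which in practice come from the card seq, aperiodicity, and Groebner basis computations. Because all three checks can be performed even when a path is interrupted, every failed check here contributes its terminal node (3, 7, or 8), and node 6 is reached only when no check fails.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Checks:
    """Outcomes of the three criteria (hypothetical container, for illustration)."""
    card_seq_is_free: bool          # criterion 1: card seq of fp equals that of F_r
    aperiodic: bool                 # criterion 2: rel is aperiodic
    has_isolated_singularity: bool  # criterion 3 fails when this is True


def figure2_path(c: Checks) -> tuple[list[int], frozenset[int]]:
    """Return the visited nodes and the terminal node(s) of the Figure 2 diagram."""
    visited = [1, 2]
    if c.card_seq_is_free:
        visited.append(4)          # card seq matches that of F_r
        if c.aperiodic:
            visited.append(5)      # rel is aperiodic
    terminals = set()
    if not c.card_seq_is_free:
        terminals.add(3)
    if not c.aperiodic:
        terminals.add(7)
    if c.has_isolated_singularity:
        terminals.add(8)
    if not terminals:
        terminals.add(6)           # healthy: every check passed
    return visited, frozenset(terminals)


def format_path(visited: list[int], terminals: frozenset[int]) -> str:
    """Render a path in the notation of Table 1, e.g. '1→2→4→{7,8}'."""
    ends = sorted(terminals)
    last = str(ends[0]) if len(ends) == 1 else "{" + ",".join(map(str, ends)) + "}"
    return "→".join([str(n) for n in visited] + [last])


if __name__ == "__main__":
    # Check outcomes inferred from the paths reported for EGR1, DBX, and Nanog.
    print(format_path(*figure2_path(Checks(True, True, False))))  # 1→2→4→5→6
    print(format_path(*figure2_path(Checks(True, True, True))))   # 1→2→4→5→8
    print(format_path(*figure2_path(Checks(True, False, True))))  # 1→2→4→{7,8}
```

This simplified rule does not reproduce the {6,8} terminations reported in Table 1 for four-nucleotide sequences such as FOS and Kozak rel4, which is why it should be read only as an illustration of the diagram's logic.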

For example, for the transcription factor EGR1, with rel = GCGTGGGCG [25, Section 4.1.2], the path is 1 → 2 → 4 → 5 → 6, indicating no risk of disease. In contrast, for the transcription factor DBX (see the subsections on aperiodic sequences and on SL2(C) representations of groups), with rel = TTTATTA, the path is 1 → 2 → 4 → 5 → 8, indicating a potential disease (see Table 1).

Table 1 gives several examples of paths.23,31,36,37,40 All three checks can be performed even when path four or five is not followed. For instance, the termination {7,8} signifies that the sequence is neither aperiodic nor devoid of isolated singularities. For sequences with four distinct nucleotides, such as the sequence of the transcription factor FOS or the Kozak sequence rel4, it is difficult to draw a conclusion about the risk of disease. The generic Groebner basis G(x,y,z) always contains a surface with isolated singularities, such as S(4A1) or S(3A1), and there are four copies of them. The termination {6,8} applies in this case.

Table 1

A few possible paths in the Figure 2 diagram that terminate at path six (healthy) or at three, seven, or eight (potential disease)

Sequence | rel | Path
EGR1 [23] | GCGTGGGCG | 1→2→4→5→6
FOS [23] | TGAGTCA | 1→2→4→5→{6,8}
Nanog [23] | TAATGG | 1→2→4→{7,8}
DBX | TTTATTA | 1→2→4→5→8
TATA | TATAAAA | 1→2→3→{7,8}
Poly(A) (rel1) | AAUAAA | 1→2→3→{7,8}
Poly(A) (rel2) | UGUAA | 1→2→4→{7,8}
Shine-Dalgarno (rel3) | AGGAGGU | 1→2→4→5→8
Kozak (rel4) | ACCAUGGC | 1→2→4→5→{6,8}
Kozak (rel4′) | CCCAUGGC | 1→2→4→7
hsa-mir-122-5p (seed0) [36] | GGAGUGU | 1→2→4→5→6
hsa-mir-132-5p (https://en.wikipedia.org/wiki/MiR-132) | CCGUGGC | 1→2→4→5→6
mir-503-5p (seed1) [37] | AGCAGCGG | 1→2→5→8
mir-146a-5p (seed2) [40] | GAGAAC | 1→2→{7,8}
hsa-mir-7-5p (https://fr.wikipedia.org/wiki/Micro-ARN_7) | GGAAGA | 1→2→{3,7,8}
hsa-mir-7-5p | GGAAGAC | 1→2→4→5→6
hsa-mir-7-3p | AACAAAU | 1→2→4→7
hsa-mir-155-3p [31,40] | UCCUAC | 1→2→4→{7,8}
hsa-mir-155-3p | UCCUACA | 1→2→3
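The reference free group Fr for each sequence follows from the rule r = nt − 1. A minimal sketch, assuming that nt is the number of distinct nucleotides in rel (each distinct nucleotide giving one generator of fp), computes nt and r for a few Table 1 sequences and flags the four-nucleotide ones, such as FOS and Kozak rel4, for which a conclusion is difficult. The helper names are ours, for illustration only.

```python
def distinct_nucleotides(rel: str) -> int:
    """Number nt of distinct letters in a DNA/RNA sequence rel."""
    return len(set(rel.upper()))


def expected_free_rank(rel: str) -> int:
    """Rank r = nt - 1 of the reference free group F_r for rel."""
    return distinct_nucleotides(rel) - 1


# A few rel sequences copied from Table 1.
TABLE1 = {
    "EGR1": "GCGTGGGCG",
    "FOS": "TGAGTCA",
    "DBX": "TTTATTA",
    "Poly(A) (rel1)": "AAUAAA",
    "Kozak (rel4)": "ACCAUGGC",
    "hsa-mir-122-5p (seed0)": "GGAGUGU",
}

if __name__ == "__main__":
    for name, rel in TABLE1.items():
        nt = distinct_nucleotides(rel)
        flag = "  <- four nucleotides" if nt == 4 else ""
        print(f"{name:24} {rel:10} nt={nt} F_{expected_free_rank(rel)}{flag}")
```

The same rule gives F2 for GCCAG and F3 for GGACU, in agreement with the Group column of Table 2.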

Algebraic geometry of m6A modifications

As mentioned in the Introduction, a subfield of epigenetics deals with post-transcriptional mRNA modifications. m6A is the most frequent internal mRNA modification in most eukaryotes, but it is also present in bacteria, with the consensus motif GCCAG.41,42 Interestingly, the bacterial mRNA m6A motif differs from the eukaryotic consensus motif (RRACH). This difference highlights the distinct evolutionary machinery present in the last eukaryotic common ancestor compared with the last universal common ancestor.43 Table 2 details the group generated by each of these sequences, whether the sequence is aperiodic, and whether the Groebner basis of its character variety contains isolated singularities. The corresponding path in the diagram of Figure 2 is given in the last column of Table 2.
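The eukaryotic rows of Table 2 are concrete instances of the degenerate motif RRACH. A short sketch (the helper `expand_motif` and its symbol table are ours, for illustration) enumerates all instances using the meanings given in the Table 2 caption, R = A or G and H = A, U, or C. It yields 12 sequences, of which Table 2 lists the six with the same purine at positions 1 and 2.

```python
from itertools import product

# Degenerate symbols used in RRACH (RNA alphabet): R = A or G, H = A, C, or U.
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "H": "ACU"}


def expand_motif(motif: str) -> list[str]:
    """All concrete sequences matched by a degenerate motif, in lexicographic order."""
    choices = [IUPAC_RNA[symbol] for symbol in motif]
    return ["".join(letters) for letters in product(*choices)]


if __name__ == "__main__":
    instances = expand_motif("RRACH")
    print(len(instances))  # 12
    # Instances with equal purines at positions 1 and 2 (the eukaryotic rows of Table 2).
    print([s for s in instances if s[0] == s[1]])
```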

Table 2

Detailed group-theoretical analysis of m6A motifs in bacteria (sequence GCCAG) and eukaryotes (sequence RRACH, where R = A or G and H = A, U, or C)

Sequence | Group | Aperiodic | Groebner basis | Path
Bacterial
GCCAG | F2 | 1.83928 | No | 1→2→4→5→6
Eukaryote
AAACA | F1 | No | | 1→2→4→{7,8}
AAACC | H3 | No | | 1→2→{3,7}
AAACU | F2 | No | S(A2), S(A1A2) No | 1→2→4→7
GGACA | F2 | 1.83928 | No | 1→2→4→5→8
GGACC | F2 | No | S(A2), S(A2A2) No | 1→2→4→7
GGACU | F3 | No | Unknown | 1→2→4→7

Only the bacterial sequence leads to a path terminating at six in the diagram of Figure 2. For GGACA, the eukaryotic sequence closest to it from the viewpoint of group analysis, isolated singularities are found, such as in the degree 3 del Pezzo surface S(A2A2) defined by $y^3 - 2xz - 4y = 0$. The other eukaryotic sequences are not aperiodic. From a biological point of view, an appropriate level of m6A methylation is known to be beneficial, but driving it artificially may be risky because it could disrupt the delicate balance of regulation within the mRNA.
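As a worked check of the label S(A2A2), one can read the polynomial $y^3 - 2xz - 4y$ as defining an affine cubic surface and take its projective closure (this reading is our assumption). Under it, the affine part is smooth, and the two singular points of the cubic lie at infinity, each of type A2:

```latex
\begin{aligned}
f &= y^3 - 2xz - 4y, \qquad
\nabla f = \bigl(-2z,\; 3y^2 - 4,\; -2x\bigr) = 0
\;\Rightarrow\; x = z = 0,\ y^2 = \tfrac{4}{3},\\
f(0, y, 0) &= y\,(y^2 - 4) = -\tfrac{8}{3}\,y \neq 0
\quad\text{(no affine singular point)},\\[4pt]
F &= y^3 - 2xzw - 4yw^2 \quad\text{(homogenization)},\qquad
F = 0,\ w = 0 \;\Rightarrow\; y = 0,\\
\nabla F\big|_{w = y = 0} &= \bigl(0,\; 0,\; 0,\; -2xz\bigr)
\;\Rightarrow\; \text{singular points } [1{:}0{:}0{:}0] \text{ and } [0{:}0{:}1{:}0],\\
F\big|_{x = 1} &= -2zw + y^3 - 4yw^2 = -2w\,(z + 2yw) + y^3
= -2wz' + y^3, \qquad z' = z + 2yw .
\end{aligned}
```

The last line is the normal form of an A2 singularity, and since F is symmetric in x and z the same holds at [0:0:1:0], giving the 2A2 configuration of a cubic (degree 3 del Pezzo) surface.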

Conclusions

Our approach is comprehensive and can be applied in numerous contexts beyond those we have considered thus far. It has the potential to impact the search for the underlying causes of diseases and to aid in the discovery of therapeutic strategies. The e-code, the set of processes that reveal and execute gene expression, has a sophisticated structure that our mathematical approach aims to elucidate.

Abbreviations

m6A: N6-methyladenosine
mRNA: messenger RNA
miRNA: microRNA

Declarations

Acknowledgement

The first author would like to acknowledge the contribution of the COST Action CA21169, supported by COST (European Cooperation in Science and Technology).

Data share statement

Computational data are available from the authors upon reasonable request.

Funding

Funding was obtained from Quantum Gravity Research in Los Angeles, CA, USA.

Conflict of interest

The authors declare that they have no conflicts of interest.

Authors’ contributions

Conceptualization (MP, FF, KI), methodology (MP, DC, RA), software (MP), validation (RA, FF, DC, MMA), formal analysis (MP, MMA), investigation (MP, DC, FF, MMA), writing and original draft preparation (MP), writing, review and editing (MP), visualization (FF, RA), supervision (MP, KI), project administration (KI), and funding acquisition (KI). All authors have read and approved the final version of the manuscript.

References

  1. Gu C, Kim GB, Kim WJ, Kim HU, Lee SY. Current status and applications of genome-scale metabolic models. Genome Biol 2019;20(1):121.
  2. Romão L. mRNA metabolism in health and disease. Biomedicines 2022;10(9):2262.
  3. Peedicayil J. Genome-environment interactions and psychiatric disorders. Biomedicines 2023;11(4):1209.
  4. Scharf S, Ackermann J, Bender L, Wurzel P, Schäfer H, Hansmann ML, et al. Holistic view on the structure of immune response: petri net model. Biomedicines 2023;11(2):452.
  5. Marques AR, Santos JX, Martiniano H, Vilela J, Rasga C, Romão L, et al. Gene variants involved in nonsense-mediated mRNA decay suggest a role in autism spectrum disorder. Biomedicines 2022;10(3):665.
  6. Wan YCE, Chan KM. Histone H2B mutations in cancer. Biomedicines 2021;9(6):694.
  7. Fimmel E, Giannerini S, Gonzalez DL, Strüngmann L. Circular codes, symmetries and transformations. J Math Biol 2015;70(7):1623-1644.
  8. Planat M, Aschheim R, Amaral MM, Fang F, Irwin K. Complete quantum information in the DNA genetic code. Symmetry 2020;12:1993.
  9. Sanchez R, Barreto J. Genomic abelian finite groups. bioRxiv [Preprint] 2023.
  10. Frappat L, Sciarrino A, Sorba P. Crystalizing the genetic code. J Biol Phys 2001;27(1):1-34.
  11. Planat M, Chester D, Aschheim R, Amaral MM, Fang F, Irwin K. Finite groups for the Kummer surface: the genetic code and quantum gravity. Quantum Rep 2021;3:68-79.
  12. Planat M, Aschheim R, Amaral MM, Fang F, Irwin K. Quantum information in the protein codes, 3-manifolds and the Kummer surface. Symmetry 2021;13:1146.
  13. Sanchez R, Mackenzie SA. On the thermodynamics of DNA methylation process. Sci Rep 2023;13(1):8914.
  14. Bessonov N, Butuzova O, Minarsky A, Penner R, Soulé C, Tosenberger A, et al. Morphogenesis software based on epigenetic code concept. Comput Struct Biotechnol J 2019;17:1203-1216.
  15. Vissers C, Sinha A, Ming GL, Song H. The epitranscriptome in stem cell biology and neural development. Neurobiol Dis 2020;146:105139.
  16. Wang S, Lv W, Li T, Zhang S, Wang H, Li X, et al. Dynamic regulation and functions of mRNA m6A modification. Cancer Cell Int 2022;22(1):48.
  17. Widagdo J, Wong JJ, Anggono V. The m(6)A-epitranscriptome in brain plasticity, learning and memory. Semin Cell Dev Biol 2022;125:110-121.
  18. Planat M, Amaral MM, Fang F, Chester D, Aschheim R, Irwin K. Character varieties and algebraic surfaces for the topology of quantum computing. Symmetry 2022;14:915.
  19. Planat M, Amaral MM, Irwin K. Algebraic morphology of DNA-RNA transcription and regulation. Symmetry 2023;15:770.
  20. Schrödinger E. What Is Life? The Physical Aspect of the Living Cell. Cambridge: Cambridge University Press; 1944.
  21. Planat M, Amaral MM, Fang F, Chester D, Aschheim R, Irwin K. DNA sequence and structure under the prism of group theory and algebraic surfaces. Int J Mol Sci 2022;23(21):13290.
  22. Grothendieck A. Lecture Series of the London Mathematical Society. Cambridge: Cambridge University Press; 1997, 243-283.
  23. Planat M, Amaral MM, Fang F, Chester D, Aschheim R, Irwin K. Group theory of syntactical freedom in DNA transcription and genome decoding. Curr Issues Mol Biol 2022;44(4):1417-1433.
  24. Goldman WM. Trace coordinates on Fricke spaces of some simple hyperbolic surfaces. Eur Math Soc 2009;13:611-684.
  25. Ashley C, Burelle JP, Lawton S. Rank 1 character varieties of finitely presented groups. Geom Dedicata 2018;192:1-19.
  26. Jacob WF, Santer M, Dahlberg AE. A single base change in the Shine-Dalgarno region of 16S rRNA of Escherichia coli affects translation of many proteins. Proc Natl Acad Sci U S A 1987;84(14):4757-4761.
  27. Kozak M. The scanning model for translation: an update. J Cell Biol 1989;108(2):229-241.
  28. Rittaud B. On the average growth of random Fibonacci sequences. J Int Seq 2007;10:07.2.4.
  29. Fang Y, Pan X, Shen HB. Recent deep learning methodology development for RNA-RNA interaction prediction. Symmetry 2022;14:1302.
  30. Medley JC, Panzade G, Zinovyeva AY. microRNA strand selection: unwinding the rules. Wiley Interdiscip Rev RNA 2021;12(3):e1627.
  31. Dawson O, Piccinini AM. miR-155-3p: processing by-product or rising star in immunity and cancer? Open Biol 2022;12(5):220070.
  32. Kozomara A, Birgaoanu M, Griffiths-Jones S. miRBase: from microRNA sequences to function. Nucleic Acids Res 2019;47(D1):D155-D162.
  33. Fromm B, Billipp T, Peck LE, Johansen M, Tarver JE, King BL, et al. A uniform system for the annotation of vertebrate microRNA genes and the evolution of the human microRNAome. Annu Rev Genet 2015;49:213-242.
  34. Ludwig N, Leidinger P, Becker K, Backes C, Fehlmann T, Pallasch C, et al. Distribution of miRNA expression across human tissues. Nucleic Acids Res 2016;44(8):3865-3877.
  35. Girard M, Jacquemin E, Munnich A, Lyonnet S, Henrion-Caude A. miR-122, a paradigm for the role of microRNAs in the liver. J Hepatol 2008;48(4):648-656.
  36. Hu J, Xu Y, Hao J, Wang S, Li C, Meng S. MiR-122 in hepatic function and liver diseases. Protein Cell 2012;3(5):364-371.
  37. He Y, Cai Y, Pai PM, Ren X, Xia Z. The causes and consequences of miR-503 dysregulation and its impact on cardiovascular disease and cancer. Front Pharmacol 2021;12:629611.
  38. Planat M, Chester D, Amaral M, Irwin K. Fricke topological qubits. Quantum Rep 2022;4:523-532.
  39. Planat M, Amaral MM, Chester D, Irwin K. SL(2,C) scheme processsing of singularities in quantum computing and genetics. Axioms 2023;12:233.
  40. Sonkoly E, Ståhle M, Pivarcsi A. MicroRNAs and immunity: novel players in the regulation of normal immune function and inflammation. Semin Cancer Biol 2008;18(2):131-140.
  41. Deng X, Chen K, Luo GZ, Weng X, Ji Q, Zhou T, et al. Widespread occurrence of N6-methyladenosine in bacterial mRNA. Nucleic Acids Res 2015;43(13):6557-6567.
  42. Gao R, Tsui PH, Wu S, Tai DI, Bin G, Zhou Z. Ultrasound entropy imaging based on the kernel density estimation: a new approach to hepatic steatosis characterization. Diagnostics (Basel) 2023;13(24):3646.
  43. Liu C, Cao J, Zhang H, Yin J. Evolutionary history of RNA modifications at N6-adenosine originating from the R-M system in eukaryotes and prokaryotes. Biology (Basel) 2022;11(2):214.