How to calculate Follow set of a production rule that has only ONE symbol on the right side

So I have this grammar:

S -> (D)
D -> EF
E -> a|b|S
F -> *D | +D | ε

First of all, the book's solution applies the rule "for P -> pBq, First(q) - {ε} is a subset of FOLLOW(B)" to the production D -> EF. But that production has only 2 symbols on the right side. Do we assume an ε in front of E (ε being the p in pBq)?

And secondly, I can't understand how to calculate Follow(E).



Solution 1:[1]

FOLLOW(E) consists of every terminal symbol which can immediately follow E in some derivation step. That's the precise definition; it's not very complicated.

For a simple grammar, you should be able to figure out all the FOLLOW sets just by looking at the grammar and applying a little bit of common sense. It would probably be a good idea to do that, since it will give you a better idea of how the algorithm works.

As a side note, it's maybe worth mentioning that ε is not a thing. Or at least, it's not a grammar symbol. It's one of several conventions used to make the empty sequence visible, just like 0 is a way to make nothing visible. Sometimes that's useful, but it's important to not let it confuse you. (Abuse of notation is endemic in mathematics, which can be frustrating.)

So, what can follow E? E only appears in one place on the right-hand side of that grammar, in the production D → E F. So clearly any symbol which can be the first symbol of F must be in FOLLOW(E). The symbols which could be at the start of F are + and *, since as mentioned, ε is not a grammar symbol. (Many definitions of FIRST allow ε to be a member of that set, along with any actual terminal symbol. That's an example of the abuse of notation I was talking about in the previous paragraph, since it makes it look like ε is a terminal symbol. But it isn't. It's nothing.)

F is what we call a "nullable" non-terminal, because it can derive the empty sequence (which was written as ε so that you can see it). In other words, it's possible for F to disappear completely in a derivation step. And if it does disappear, then E might be at the end of the production D → E F. If E is at the end of D, then it can be followed by anything which could follow D, which includes ). D can also appear at the end of a derivation of F, which means that D could be followed by anything which could follow F. But F is always at the end of D, so F can in turn be followed by anything which could follow D: a circular constraint which adds no information whatsoever.

So it's easy to see that FOLLOW(E) = {*, +, )}, and you can use that to check your understanding of any algorithm to compute follow sets.
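The two facts that argument leans on (F is nullable, and the only terminals which can start F are * and +) can also be checked mechanically. Here is a minimal sketch in Python, not taken from any book: the names GRAMMAR, nullable_nonterminals and first_sets are my own, and the empty alternative is represented by an empty tuple precisely so that no ε symbol is needed.

```python
# Grammar from the question. Each alternative is a tuple of symbols;
# the empty tuple () stands for the empty sequence.
GRAMMAR = {
    "S": [("(", "D", ")")],
    "D": [("E", "F")],
    "E": [("a",), ("b",), ("S",)],
    "F": [("*", "D"), ("+", "D"), ()],
}


def nullable_nonterminals(grammar):
    """Return the non-terminals which can derive the empty sequence."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            if head in nullable:
                continue
            # all() over an empty alternative is True, so F -> () qualifies.
            if any(all(sym in nullable for sym in alt) for alt in alternatives):
                nullable.add(head)
                changed = True
    return nullable


def first_sets(grammar, nullable):
    """Return FIRST for each non-terminal, containing terminals only."""
    first = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for sym in alt:
                    # A terminal starts with itself; a non-terminal
                    # contributes whatever is in its FIRST set so far.
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[head]:
                        first[head] |= new
                        changed = True
                    # Only look past sym if it can disappear.
                    if sym not in nullable:
                        break
    return first


NULLABLE = nullable_nonterminals(GRAMMAR)
FIRST = first_sets(GRAMMAR, NULLABLE)
print(NULLABLE)            # {'F'}
print(sorted(FIRST["F"]))  # ['*', '+']
```

Keeping nullability in a separate set, rather than putting an ε marker inside FIRST, is just one way to organise this; it matches the point that the empty sequence is not a grammar symbol.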

Now, I don't know what book you are referring to (and it would have been courteous to mention that in your original question; sources should always be correctly cited). But the book I happen to have in front of me, the Dragon Book, has a pretty similar algorithm. The Dragon Book uses a simple convention for writing statements like that. Probably your book does, too, but it might not be the same convention. You should check what it says and make sure that you copied the statement correctly, respecting whatever formatting is used to indicate what the symbols stand for.

In the Dragon Book, some of the conventions include:

  • Lower case characters at the start of the alphabet (a, b, c, …) are terminals (as well as actual symbols like * and +).
  • Upper case characters at the start of the alphabet (A, B, C, …) are non-terminals.
  • S is the start symbol.
  • Upper case characters at the end of the alphabet (X, Y, Z) stand for arbitrary grammar symbols (either terminals or non-terminals).
  • $ is the marker used to indicate the end of the input.
  • Lower-case Greek letters (α, β, γ, …) are possibly-empty strings of grammar symbols. The phrase "possibly empty" is very important, so I'm repeating it.

With that convention, they write the rules for computing the FOLLOW set:

  1. Place $ in FOLLOW(S).
  2. For every production A → αBβ, copy everything from FIRST(β) except ε into FOLLOW(B).
  3. If there is a production A → αB or a production A → αBβ where FIRST(β) contains ε, place everything in FOLLOW(A) into FOLLOW(B). As mentioned above, α is a possibly-empty string of grammar symbols. So it might not be visible.
  4. Keep doing steps 2 and 3 until no new symbols are added to any follow set.
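
To see the four steps run on the grammar from the question, here is a rough sketch in Python. It is my own illustration rather than code from the Dragon Book: the names EPS, END, first_of_string, compute_first and compute_follow are made up for the example, and I'm assuming the usual convention that FIRST of a string contains the marker ε when the whole string can vanish (including when the string is empty, which covers the A → αB case).

```python
EPS = "ε"  # marker for the empty sequence; not a grammar symbol
END = "$"  # end-of-input marker

GRAMMAR = {
    "S": [["(", "D", ")"]],
    "D": [["E", "F"]],
    "E": [["a"], ["b"], ["S"]],
    "F": [["*", "D"], ["+", "D"], []],  # [] is the ε alternative
}
START = "S"


def first_of_string(symbols, first):
    """FIRST of a possibly-empty string of grammar symbols."""
    result = set()
    for sym in symbols:
        # Non-terminals are the keys of `first`; anything else is a terminal.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol was nullable, or the string was empty
    return result


def compute_first(grammar):
    first = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {head: set() for head in grammar}
    follow[start].add(END)                        # step 1
    changed = True
    while changed:                                # step 4
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue                  # terminals have no FOLLOW set
                    # alt[:i] plays the role of α and alt[i + 1:] of β.
                    beta_first = first_of_string(alt[i + 1:], first)
                    new = beta_first - {EPS}      # step 2
                    if EPS in beta_first:         # step 3: β empty or nullable
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, START, FIRST)
for head in GRAMMAR:
    print(head, sorted(FOLLOW[head]))
# S ['$', ')', '*', '+']
# D [')']
# E [')', '*', '+']
# F [')']
```

Note that nothing special is done for D → E F when handling E: α is simply the empty slice alt[:0], which is the "possibly empty" part of the convention at work.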

I'm pretty sure that the algorithm in your book differs only in notation conventions.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 rici