Which avalon_mm_slave and how to use it?

There are several avalon_mm entities in the repository, but it’s not entirely clear to me which one should be used.

There is the

avalon_mm_bus_arbitrer

which splits the master into several slaves. The number of slaves can be set with the NHI generic (number of slaves is 2 ** NHI). Right so far? Why do I need the NM generic?
But what is the

avalon_mm_ic_merged

for? The interface is pretty much the same…

And then there are several slave implementations,

avalon_mm_slave_vererr
avalon_mm_slave_xl
avalon_mm_slave

According to the comments, the functionality of the first two is now included in the last one, right? So the *_xl and *_vererr are outdated by now?

So if I want to use all the addresses coming into the user_logic (24-bit address range), I instantiate one

avalon_mm_bus_arbitrer

where I set NHI to 24. NM I would keep at 1, and AWIDTH I don’t know, maybe also 24? Then, for example, S_WAITREQ would be a 16,777,216-bit vector (2 ** 24). Does this make sense? I don’t really think so.

Ok, then the other way around, the lowest level is the

avalon_mm_slave

According to the width of RSTVAL, I can decode at most 64 addresses with one slave. For this case, MODE_LG would be 64 and AWIDTH = 6 (2 ** 6 = 64)?

Ok, how do I get a set of 6bit addresses out of the avalon_mm_bus_arbitrer? In line 179 of the source code, maddr(AWIDTH-NHI - 1 downto 0) is assigned to saddrlo which is later assigned to S_ADDR. So to get 6bit address, AWIDTH - NHI has to be 6, so for AWIDTH = 24 (we want all addresses), NHI needs to be 18.
With that I would conclude that I can have up to 18 (or 2 ** 18) slave instances with 1 master using all the available addresses. In total, up to 18 * 64 = 1152 (or 2 ** 18 * 64 = 16,777,216) registers can be used. Is this correct? Maybe the documentation could be extended to answer those questions a bit more clearly. It took me half a day to dig through the code :frowning:

Another question: what if I want to trigger some action from the host PC, for example a write_enable which is high for one clock cycle to write previously transmitted data to a local LUT? How can I do that with the avalon_mm_slaves? Sure, I could check if the value at one address has changed and use that to generate the pulse. But this requires that the user always writes a different value to the relevant address, which might not be the case… Is there a different way which I overlooked?

Sorry for the long post, but these avalon_master/slaves are still a bit confusing for me…

@costaf @bourrion do you read through this from time to time, or should I better just mention your names here so that a mail is sent to you?

Hello,

I was notified, this time and previously also.

About the avalon, we are about to completely remove the following from the repo:

avalon_mm_slave_vererr
avalon_mm_slave_xl
avalon_mm_ic_merged

And from now on, only the following should be used; these are the only ones documented in the README.md:

avalon_mm_bus_arbitrer
avalon_mm_slave

Now about the details, they should be provided in pcoresng-cru/cru_misc/README.md

  • NM : will be always one in your case, for us, it is 2 and was even 3 before (2 PCIE + JTAG master)
  • The number of slaves is set by 2 ** NHI

Both the new slave and the new master have a testbench in the sim directory; it is an example usage you may want to look at. Also grep the repo, you’ll see many usages of these new ones.

AWIDTH discussion (master only)
The purpose of the new master is to always work with a 32-bit address at the input, which allows the use of arrays of vectors. To select the part of the address to decode, and the number of bits used for that, we use AWIDTH and NHI.

  • For instance AWIDTH=24 and NHI=2 means you decode 4 slaves with bit 23 and 22
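As an illustration of the AWIDTH/NHI split described above, here is a small self-checking sketch. It is not taken from the repository: the entity name and the helper functions slave_index and slave_addr are hypothetical, and the only thing assumed about avalon_mm_bus_arbitrer is what is stated in this thread (the top NHI bits of the AWIDTH-bit window select the slave, the remaining AWIDTH-NHI bits are passed on to it).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity addr_split_tb is
end entity;

architecture sim of addr_split_tb is
  constant AWIDTH : natural := 24;  -- width of the decoded address window
  constant NHI    : natural := 2;   -- top NHI bits of the window select the slave

  -- Hypothetical helper: index of the selected slave,
  -- taken from bits AWIDTH-1 downto AWIDTH-NHI (here 23 downto 22).
  function slave_index(addr : std_logic_vector(31 downto 0)) return natural is
  begin
    return to_integer(unsigned(addr(AWIDTH-1 downto AWIDTH-NHI)));
  end function;

  -- Hypothetical helper: address handed to the slave,
  -- i.e. the remaining AWIDTH-NHI low bits (here 21 downto 0).
  function slave_addr(addr : std_logic_vector(31 downto 0)) return std_logic_vector is
  begin
    return addr(AWIDTH-NHI-1 downto 0);
  end function;
begin
  check : process
    constant A : std_logic_vector(31 downto 0) := x"00C00010";
  begin
    -- Bits 23 and 22 are both '1', so slave 3 is selected.
    assert slave_index(A) = 3 report "wrong slave index" severity failure;
    -- The low 22 bits carry the value 16#10#.
    assert unsigned(slave_addr(A)) = 16 report "wrong local address" severity failure;
    wait;
  end process;
end architecture;
```

Changing AWIDTH or NHI in the constants moves the select bits accordingly, for example AWIDTH=24 and NHI=4 would use bits 23 downto 20.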

MODE_LG discussion (slave only)
This is a bit redundant with AWIDTH, but in practice you can set AWIDTH to 8 bits, and MODE_LG is then used to check how many bits are really needed for decoding; it can’t be larger than 64.

About your last question, I don’t understand your request. I understand you want to write to an address, but not what the problem is.

I am watching the post and I get an email as notification that someone wrote on the forum.
So it should be ok.

PiPPo

ok, for the master NM stays at 1. And other configuration examples would be

  • AWIDTH=24 and NHI=3 means, I decode 8 slaves with bit 23 downto 21
  • AWIDTH=24 and NHI=4 means, I decode 16 slaves with bit 23 downto 20
  • AWIDTH=20 and NHI=4 means, I decode 16 slaves with bit 19 downto 16

But in principle I don’t care which bits are really used; I just have to ensure that they are not used multiple times, and for that I have to set the AWIDTH generic correctly.

And for the slave I can just set the AWIDTH to 8 and don’t care about it anymore and use the MODE_LG to specify the number of 32bit data words I want to decode with this slave (up to 64).

Is this correct so far?

About the second question: Imagine I want to write some configuration into a local RAM. Usually I would transmit the data I want to set (via some address) and the RAM address where I want to store the data (via another address). But then the write_enable flag is missing and I don’t see how I can generate it. Usually I would generate a pulse in some register_file whenever a specific address is written to (it doesn’t matter what is written, the write_enable of the bus alone would trigger it). But with the avalon_mm_slaves I only get the final value in a register, not the information that it is new (imagine writing the same value to a register multiple times, how do I see that?).

Just thinking about it, a change in the address register could be used in that specific case to trigger the write_enable (but then the order matters: first the data has to be written, then the address…). I have to check if I can use that…

Ok, then I will not mention you explicitly.
Greetings,
Sebastian

Hello,

For the first part, it is all OK, I’ll add some more information in the README.md to further clarify, your example for instance.

Then, did you notice that some usercs, userrd and userwr outputs are available for the user? You may want to use a dedicated address (one of your registers: probably the last one that you access in your sequence) and just AND the corresponding usercs with userwr.
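A minimal sketch of that suggestion follows. The port names usercs and userwr come from this post, but their types are assumptions (usercs as one select bit per register, userwr as a single bit that is high during a write access); the entity name, NREGS, TRIG_REG and lut_we are hypothetical. Check the actual port declarations of avalon_mm_slave before reusing it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity write_strobe is
  generic (
    NREGS    : positive := 64;  -- assumed number of registers in the slave
    TRIG_REG : natural  := 63   -- hypothetical: last register written in the sequence
  );
  port (
    usercs : in  std_logic_vector(NREGS-1 downto 0);  -- assumed: one select bit per register
    userwr : in  std_logic;                           -- assumed: high during a write access
    lut_we : out std_logic                            -- high while TRIG_REG is being written
  );
end entity;

architecture rtl of write_strobe is
begin
  -- The strobe fires on every write to TRIG_REG, whatever value is written,
  -- so writing the same value twice still produces two strobes.
  lut_we <= usercs(TRIG_REG) and userwr;
end architecture;
```

The resulting strobe lives in the same clock domain as usercs and userwr.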

Cheers

Ok, I think, I have now a more or less working simulation test bench. But there is one statement in the avalon_mm_slave.vhd which I don’t understand. It’s line 186,

wordaddr <= unsigned(ADDR(wordaddr'range));

what does this do and why is it necessary? All I can see in the simulation is that it removes the last 2 bits from the address.

I’m testing the following: I have set up a bus mux

busmux : entity work.avalon_mm_bus_arbitrer                                      
  generic map (                                                                    
    AWIDTH => 6 + NUM_ADDRESS_BITS,                                                
    NHI => NUM_ADDRESS_BITS                                                        
  )                                                                                
  port map (                                                                       
    CLK             => MMS_CLK,                                                    
    RST             => MMS_RESET,                                                  
    --                                                                             
    M_WAITREQ(0)    => MMS_WAITREQ,                                                
    M_ADDR(0)       => (MMS_ADDR'range => MMS_ADDR, others => '0'),                
    M_WR(0)         => MMS_WR,                                                     
    M_WRDATA(0)     => MMS_WRDATA,                                                 
    M_RD(0)         => MMS_RD,                                                     
    M_RDVAL(0)      => MMS_RDVAL,                                                  
    M_RDDATA(0)     => MMS_RDDATA,                                                 
    --                                                                             
    S_WAITREQ       => sx_waitreq,                                                 
    S_ADDR          => sx_addr,                                                    
    S_WR            => sx_wr,                                                      
    S_WRDATA        => sx_wrdata,                                                  
    S_RD            => sx_rd,                                                      
    S_RDVAL         => sx_rdval,                                                   
    S_RDDATA        => sx_rddata                                                   
  );    

Where NUM_ADDRESS_BITS is set to 4, so I can have 16 slaves connected, each decoding 64 addresses.
If I have now the first slave connected like the following (with NUM_ADDRESSES set to 64)

gen_cfg : entity work.avalon_mm_slave                                       
generic map (                                                               
  MODE_LG => NUM_ADDRESSES,                                                 
  AWIDTH => 8,                                                              
  MODE => (others => x"B"),                                           
  RSTVAL => (others => (others => '0'))                                     
)                                                                           
port map (                                                                  
  CLK             => MMS_CLK,                                               
  RESET           => MMS_RESET,                                             
  WAITREQ         => sx_waitreq(0),                                         
  ADDR            => sx_addr(0)(7 downto 0),                         
  WR              => sx_wr(0),                                              
  WRDATA          => sx_wrdata(0),                                          
  RD              => sx_rd(0),                                              
  RDVAL           => sx_rdval(0),                                           
  RDDATA          => sx_rddata(0),                                          
  --                                                                        
  ALTCLK          => gen_cfg_clk,                                           
  --                                                                        
  qout(NUM_ADDRESSES-1 downto 0) => av_sl_qout,                             
  --                                                                        
  din(NUM_ADDRESSES-1 downto 0)   => (others => (others => '0'))                
);  

I would expect that writing to address 0 gives me some data in av_sl_qout(0), to address 1 something in av_sl_qout(1), to address 2 something in av_sl_qout(2) and so on. But apparently this is not the case, because the last two address bits are removed by the line I mentioned above. So to get the desired behaviour, I have to write to target_address*4 or replace the line

ADDR            => sx_addr(0)(7 downto 0),  

with

ADDR            => sx_addr(0)(5 downto 0) & "00",  

so just setting the bits which will be removed to 0 and shifting the address accordingly. Is this intended or a bug in the slave implementation?

In addition, if I don’t want to use all the 64 addresses of the slave, but fewer (let’s say 12 or 5 or something), the addresses to which I have to write are again different.

With 12 addresses, I have to write to addresses 48 to 59 to get some data in the outputs 0 to 11
With 5 addresses I have to write to addresses 56 to 60 to get values in the outputs 0 to 4.

By the way, no, I did not notice the usercs /-wr /-rd outputs yet.
Is there some documentation, how to use them or how the waveforms look like? I don’t see anything in the README.md of cru_misc.

Ciao … Olivier is on holidays … I will try to answer to your question later.
Cheers

Ciao Sebastian,
here is the answer from the developer of the new AVALON interface:

"
wordaddr <= unsigned(ADDR(wordaddr'range)); is necessary because this assignment determines the internal address space of the avalon_mm_slave module.
The ADDR input address is expressed in bytes, while the avalon_mm_slave module works on 32-bit words; that is why the low-order address bits A0 and A1 are removed.

The usercs, userwr and userrd outputs work in the same way on 32-bit words
"
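The effect of that assignment can be reproduced in a few lines. This is only a sketch: it assumes that wordaddr is declared with a range that starts at bit 2 (the declaration is not shown in this thread), and the testbench name is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity wordaddr_tb is
end entity;

architecture sim of wordaddr_tb is
  constant AWIDTH : natural := 8;
  signal ADDR     : std_logic_vector(AWIDTH-1 downto 0) := (others => '0');
  -- Assumed declaration: the range stops at bit 2, so ADDR(1 downto 0) is never copied.
  signal wordaddr : unsigned(AWIDTH-1 downto 2);
begin
  -- Same form as the line quoted from avalon_mm_slave.vhd.
  wordaddr <= unsigned(ADDR(wordaddr'range));

  check : process
  begin
    ADDR <= std_logic_vector(to_unsigned(4, AWIDTH));  -- byte address 4
    wait for 1 ns;
    assert wordaddr = 1 report "byte address 4 should be word 1" severity failure;

    ADDR <= std_logic_vector(to_unsigned(7, AWIDTH));  -- still inside word 1
    wait for 1 ns;
    assert wordaddr = 1 report "byte address 7 should be word 1" severity failure;

    ADDR <= std_logic_vector(to_unsigned(8, AWIDTH));  -- first byte of word 2
    wait for 1 ns;
    assert wordaddr = 2 report "byte address 8 should be word 2" severity failure;
    wait;
  end process;
end architecture;
```

Under this assumption, register n of the slave is reached at byte address 4*n.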

Is it more clear now?

Hi PiPPo,

thanks, a bit more clear, but not entirely. I will simulate the module a bit more and try to find out by myself, how it works.

Greetings,
Sebastian

Maybe a last comment concerning the usercs, userwr and userrd: I can’t use them for my purpose, because they are in the wrong clock domain (CLK instead of ALTCLK).

Sure, I could do the CDC on my own and then a rising-edge detection afterwards to trigger whatever I want, but that’s too much effort for me at the moment. So in the end I leave it to the user to first write a 0 to the corresponding register and afterwards a meaningful number, and then I do the rising-edge detection on the register itself to generate the trigger.
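The rising-edge detection on the register could look roughly like this. It is a sketch only: the entity and port names are hypothetical, and it assumes that reg_bit (for example one bit of a qout word) is already synchronous to ALTCLK.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_rise_detect is
  port (
    altclk  : in  std_logic;
    reg_bit : in  std_logic;  -- assumed to be synchronous to altclk already
    trigger : out std_logic   -- registered, high for one altclk cycle per 0 -> 1 change
  );
end entity;

architecture rtl of reg_rise_detect is
  signal reg_bit_d : std_logic := '0';  -- value of reg_bit one cycle earlier
begin
  process (altclk)
  begin
    if rising_edge(altclk) then
      reg_bit_d <= reg_bit;
      -- High only in the cycle where reg_bit is '1' and was '0' before.
      trigger   <= reg_bit and not reg_bit_d;
    end if;
  end process;
end architecture;
```

This is why the register has to be written with 0 before the next trigger: without a 0 to 1 change there is no edge to detect.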

Maybe it could be implemented in the future that also those signals are directly in the desired clock domain if an ALTCLK is provided.

Or, quickly scrolling through the Altera Avalon manual, there is something like an Avalon-MM Clock Crossing Bridge which

transfers Avalon-MM commands and responses between different clock domains. You can also use the Avalon-MM Clock Crossing Bridge to bridge between AXI masters and slaves of different clock domains.

Since I need my registers all in a fast clock domain (and therefore I always provide an ALTCLK), maybe it would be a solution to have the slaves directly in the target clock domain. Could you provide a corresponding implementation?

Ciao,
I have opened an issue with your request, so the developers are aware.

You can follow the progress here

https://gitlab.cern.ch/alice-cru/pcoresng-cru/issues/84


Thanks!

Hello,

I’m in for a few days, I’ll change the logic a bit. I intend to

  • remove the USERCS bus
  • Make a bus from USERWR and another one from USERRD (one bit per 32-bit register)

The fix is easy for the writing side: when a 32-bit address is accessed, the corresponding USERWR signal will pulse high for one clock cycle, after the CDC if there is one. It can be seen as a pipeline.
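For readers unfamiliar with moving a one-cycle pulse across clock domains, a common generic technique is a toggle synchronizer, sketched below. This is not the implementation planned for the repository, only an illustration; all names are hypothetical, and it assumes that consecutive source pulses are several destination clock cycles apart.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pulse_cdc is
  port (
    src_clk   : in  std_logic;
    src_pulse : in  std_logic;  -- one src_clk cycle wide
    dst_clk   : in  std_logic;
    dst_pulse : out std_logic   -- one dst_clk cycle wide
  );
end entity;

architecture rtl of pulse_cdc is
  signal toggle : std_logic := '0';                               -- flips on every source pulse
  signal sync   : std_logic_vector(2 downto 0) := (others => '0');
begin
  -- Source domain: turn the pulse into a level change.
  process (src_clk)
  begin
    if rising_edge(src_clk) then
      if src_pulse = '1' then
        toggle <= not toggle;
      end if;
    end if;
  end process;

  -- Destination domain: two synchronizer stages plus one delay stage.
  process (dst_clk)
  begin
    if rising_edge(dst_clk) then
      sync <= sync(1 downto 0) & toggle;
    end if;
  end process;

  -- A level change shows up as a difference between the last two stages
  -- for exactly one dst_clk cycle.
  dst_pulse <= sync(2) xor sync(1);
end architecture;
```

In a real design the first two sync registers would also need the usual synchronizer constraints of the target toolchain.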

But for the reading side it is trickier, as we did not implement the WAITREQ feature of the avalon bus (no need in our slow control case). Could we imagine using USERRD as a user read confirm instead of a real read request? This would be much simpler to implement and require fewer resources.

Cheers

That sounds nice!

Concerning the userrd, I can’t imagine a use case at the moment, so this is not that important; from my side, you are free to define it as a read confirm.